Smltutorial
Robert Harper
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213-3891
Copyright © 1986-1993 Robert Harper.
All rights reserved.
Acknowledgements
Several of the examples were cribbed from Luca Cardelli's introduction to his
dialect of ML [3], from Robin Milner's core language definition [5], from Dave
MacQueen's modules paper [6], and from Abelson and Sussman's book [1].
Joachim Parrow, Don Sannella, and David Walker made many helpful
suggestions.
Chapter 1
Introduction
These notes are an introduction to the Standard ML programming language.
Here are some of the highlights of Standard ML:
• ML is a functional programming language. Functions are first-class
data objects: they may be passed as arguments, returned as results,
and stored in variables. The principal control mechanism in ML is
recursive function application.
• ML is an interactive language. Every phrase read is analyzed, compiled,
and executed, and the value of the phrase is reported, together with its
type.
• ML is strongly typed. Every legal expression has a type which is
determined automatically by the compiler. Strong typing guarantees that
no program can incur a type error at run time, a common source of
bugs.
• ML has a polymorphic type system. Each legal phrase has a uniquely
determined most general typing that determines the set of contexts in
which that phrase may be legally used.
• ML supports abstract types. Abstract types are a useful mechanism for
program modularization. New types together with a set of functions on
objects of that type may be defined. The details of the implementation
are hidden from the user of the type, achieving a degree of isolation
that is crucial to program maintenance.
ML prompts with "- ", and precedes its output with "> ". The user entered
the phrase "3+2". ML evaluated this expression and printed the value, "5",
of the phrase, together with its type, "int".
Various sorts of errors can arise during an interaction with ML. Most
of these fall into three categories: syntax errors, type errors, and run-time
faults. You are probably familiar with syntax errors and run-time errors from
your experience with other programming languages. Here is an example of
what happens when you enter a syntactically incorrect phrase:
- let x=3 in x end;
Parse error: Was expecting "in" in ... let <?> x ...
1 The details of the interaction with the ML top level vary from one implementation to
another, but the overall "feel" is similar in all systems known to the author. These notes
were prepared using the Edinburgh compiler, circa 1988.
the right a value of type τ', is a type. More significantly, the set of functions
mapping one type to another forms a type. In addition to these and other
basic types, ML allows for user-defined types. We shall return to this point
later.
Expressions in ML denote values in the same way that numerals denote
numbers. The type of an expression is determined by a set of rules that
guarantee that if the expression has a value, then the value of the expression
is a value of the type assigned to the expression (got that?). For example,
every numeral has type int since the value of a numeral is an integer. We
shall illustrate the typing system of ML by example.
2.2.1 Unit
The type unit consists of a single value, written (), sometimes pronounced
"unit" as well. This type is used whenever an expression has no interesting
value, or when a function is to have no arguments.
2.2.2 Booleans
The type bool consists of the values true and false. The ordinary boolean
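negation is available as not; the boolean connectives andalso and orelse
are also provided as primitive. Unlike ordinary functions, they evaluate
their right operand only when the left one does not already determine the result.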
The conditional expression, if e then e1 else e2, is also considered
here because its first argument, e, must be a boolean. Note that the else
clause is not optional! The reason is that this "if" is a conditional
expression, rather than a conditional command, such as in Pascal. If the else
clause were omitted, and the test were false, then the expression would have
no value! Note too that both the then expression and the else expression
must have the same type. The expression
if true then true else ()
is type incorrect, or ill-typed, since the type of the then clause is bool ,
whereas the type of the else clause is unit.
- not true;
> false : bool
- false andalso true;
> false : bool
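A minimal sketch, assuming a Standard ML '97 system (the function names sign and dividesEvenly are hypothetical): because if is an expression, it can appear wherever a value is expected, and andalso can guard a computation that would otherwise fail.

```sml
(* Sketch assuming Standard ML '97; sign and dividesEvenly are hypothetical names. *)

(* The conditional is an expression: every branch has type string,
   so the whole conditional has type string. *)
fun sign (n : int) : string =
  if n < 0 then "negative"
  else if n = 0 then "zero"
  else "positive"

(* andalso evaluates its right operand only when the left one is true,
   so n mod d is never computed when d = 0 (which would raise Div). *)
fun dividesEvenly (n : int, d : int) : bool =
  d <> 0 andalso n mod d = 0
```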
2.2.3 Integers
The type int is the set of (positive and negative) integers. Integers are
written in the usual way, except that negative integers are written with the
tilde character "~" rather than a minus sign.
- 75;
> 75 : int
- ~24;
> ~24 : int
- (3+2) div 2;
> 2 : int
- (3+2) mod 2;
> 1 : int
The usual arithmetic operators, +, -, *, div, and mod, are available, with
div and mod being integer division and remainder, respectively. The usual
relational operators, <, <=, >, >=, =, and <>, are provided as well. They each
take two expressions of type int and return a boolean according to whether
or not the relation holds.
- 3<2;
> false : bool
- 3*2 >= 12 div 6;
> true : bool
- if 4*5 mod 3 = 1 then 17 else 51;
> 51 : int
Notice that the relational operators, when applied to two integers, evaluate
to either true or false, and therefore have type bool.
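The examples above use only non-negative operands. As a sketch of how div and mod behave on negative numbers, assuming the Standard ML '97 Basis Library (where div rounds toward negative infinity and mod takes the sign of the divisor; divModIdentity is a hypothetical name):

```sml
(* Sketch assuming the Standard ML '97 Basis Library semantics of div and mod. *)
val q = ~7 div 2   (* ~4: the quotient ~3.5 is rounded toward negative infinity *)
val r = ~7 mod 2   (* 1: the remainder has the sign of the divisor *)

(* Hypothetical helper: div and mod satisfy (n div d) * d + n mod d = n for d <> 0. *)
fun divModIdentity (n : int, d : int) : bool =
  (n div d) * d + n mod d = n
```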
2.2.4 Strings
The type string consists of the set of finite sequences of characters. Strings
are written in the conventional fashion as characters between double quotes.
The double quote itself is written \".
- "Fish knuckles";
> "Fish knuckles" : string
- "\"";
> """ : string
Special characters may also appear in strings, but we shall have no need of
them. Consult the ML language definition [7] for the details of how to build
such strings.
The function size returns the length, in characters, of a string, and the
function ^ is an infix append function.
- "Rhinocerous " ^ "Party";
> "Rhinocerous Party" : string
- size "Walrus whistle";
> 14 : int
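A minimal sketch combining ^ and size (surround and quoted are hypothetical names): the escape \" places a double quote inside a string literal, and size counts it as one character.

```sml
(* Sketch; surround and quoted are hypothetical names. *)

(* ^ concatenates two strings, so this puts delim on both sides of s. *)
fun surround (s : string, delim : string) : string = delim ^ s ^ delim

val quoted = surround ("Party", "\"")   (* Party between two double-quote characters *)
val n = size quoted                      (* 7: five letters plus two quote characters *)
```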
2.2.5 Reals
- 3E2;
> 300.0 : real
- 3.14159E2;
> 314.159 : real
The usual complement of basic functions on the reals is provided. The
arithmetic functions ~, +, -, and * may be applied to real numbers, though
one may not mix and match: a real can only be added to a real, and not to
an integer. The relational operators =, <>, <, and so on, are also defined for
the reals in the usual way. Neither div nor mod is defined for the reals, but
the function / denotes ordinary real-valued division. In addition there are
functions such as sin, sqrt, and exp for the usual mathematical functions.
The function real takes an integer to the corresponding real number, and
floor truncates a real to the greatest integer less than or equal to it.
- 3.0+2.0;
> 5.0 : real
- (3.0+2.0) = real(3+2);
> true : bool
- floor(3.2);
> 3 : int
- floor(~3.2);
> ~4 : int
- cos(0.0);
> 1.0 : real
- cos(0);
Type clash in: (cos 0)
Looking for a: real
I have found a: int
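Because a real can only be combined with a real, mixing the two types needs an explicit conversion. A minimal sketch using real and floor as described above, assuming a Standard ML '97 system (mean is a hypothetical name; in the '97 Basis Library the square root is Math.sqrt):

```sml
(* Sketch assuming Standard ML '97; mean is a hypothetical name. *)

(* Convert the int sum with real before using real division /. *)
fun mean (a : int, b : int) : real = real (a + b) / 2.0

(* floor converts back from real to int, rounding toward negative infinity. *)
val m = floor (mean (3, 4))    (* mean (3, 4) is 3.5, so m is 3 *)
val root = Math.sqrt 16.0      (* 4.0 : real *)
```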
This completes the set of atomic types in ML. We now move on to the
compound types, those that are built up from other types.
2.2.6 Tuples
The type τ*τ', where τ and τ' are types, is the type of ordered pairs whose
first component has type τ and whose second component has type τ'. Ordered
pairs are written (e1,e2), where e1 and e2 are expressions. Actually, there's
2.2.7 Lists
The type τ list consists of finite sequences, or lists, of values of type τ. For
instance, the type int list consists of lists of integers, and the type bool
list list consists of lists of lists of booleans. There are two notations for
lists, the basic one and a convenient abbreviation. The first is based on the
following characterization of lists: a list is either empty, or it consists of
a value of type τ followed by a τ list. This characterization is reflected in the
following notation for lists: the empty list is written nil and a non-empty
list is written e::l, where e is an expression of some type τ and l is some
τ list. The operator :: is pronounced "cons", after the LISP list-forming
function by that name.
If you think about this definition for a while, you'll see that every
non-empty list can be written in this form:
the intuitive meaning of a list of values of a given type. The role of nil is to
serve as the terminator for a list: every list has the form illustrated above.
This method of defining a type is called a recursive type definition. Such
definitions characteristically have one or more base cases, or starting points,
and one or more recursion cases. For lists, the base case is the empty list,
nil, and the recursion case is cons, which takes a list and some other value
and yields another list. Recursively-defined types occupy a central position
in functional programming because the organization of a functional program
is determined by the structure of the data objects on which it computes.
Here are some examples of using nil and :: to build lists:
- nil;
> [] : 'a list
- 3 :: 4 :: nil;
> [3,4] : int list
- ( 3 :: nil ) :: ( 4 :: 5 :: nil ) :: nil;
> [[3],[4,5]] : int list list
- ["This", "is", "it"];
> ["This","is","it"] : string list
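A minimal sketch of a recursive function that builds a list using only nil and :: (upto is a hypothetical name; it returns the integers from m to n in increasing order). Its two branches mirror the base case and the recursion case of the list type.

```sml
(* Sketch; upto, xs and nested are hypothetical names. *)

(* Base case (m > n): the empty list. Recursion case: cons m onto the rest. *)
fun upto (m : int, n : int) : int list =
  if m > n then nil else m :: upto (m + 1, n)

val xs = upto (3, 5)                      (* [3,4,5], i.e. 3 :: 4 :: 5 :: nil *)
val nested = [upto (1, 1), upto (2, 3)]   (* [[1],[2,3]] : int list list *)
```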
2.2.8 Records
The last compound type that we shall consider in this section is the record
type. Records are quite similar to Pascal records and to C structures (and to
similar features in other programming languages). A record consists of a finite
set of labelled fields, each with a value of any type (as with tuples, different
fields may have different types). Record values are written by giving a set of
equations of the form l = e, where l is a label and e is an expression, enclosed
in curly braces. The equation l = e sets the value of the field labelled l to the
value of e. The type of such a value is a set of pairs of the form l : τ where
l is a label and τ is a type, also enclosed in curly braces. The order of the
equations and typings is completely immaterial: components of a record are
identified by their label, rather than their position. Equality is
componentwise: two records are equal if their corresponding fields (determined by label)
are equal.
- {name="Foo",used=true};
> {name="Foo", used=true} : {name:string, used:bool}
- {name="Foo",used=true} = {used=not false,name="Foo"};
> true : bool
- {name="Bar",used=true} = {name="Foo",used=true};
> false : bool
Tuples are special cases of records. The tuple type τ*τ' is actually
shorthand for the record type {1:τ, 2:τ'} with two fields labeled
"1" and "2". Thus the expressions (3,4) and {1=3,2=4} have precisely the
same meaning.
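A minimal sketch of records and tuples side by side, assuming Standard ML '97. The field selector #label is part of Standard ML but is not discussed in the text above; item, itemName, second and same are hypothetical names.

```sml
(* Sketch assuming Standard ML '97; the names bound here are hypothetical. *)

(* Field order in the written record does not matter. *)
val item = {used = true, name = "Foo"}

(* #label selects a field; it also works on tuples, whose labels are 1, 2, ... *)
val itemName = #name item            (* "Foo" *)
val second = #2 (3, 4)               (* 4 *)

(* A pair is the record whose fields are labelled 1 and 2. *)
val same = ((3, 4) = {1 = 3, 2 = 4}) (* true *)
```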
This completes our introduction to the basic expressions, values, and
types in ML. It is important to note the regularity in the ways of forming
values of the various types. For each type there are basic expression forms
for denoting values of that type. For the atomic types, these expressions
are the constants of that type. For example, the constants of type int are
the numerals, and the constants of type string are the character strings,
enclosed in double quotes. For the compound types, values are built using
value constructors, or just constructors, whose job is to build a member of
a compound type out of the component values. For example, the pairing
constructor, written ( , ), takes two values and builds a member of a tuple
type. Similarly, nil and :: are constructors that build members of the list
type, as do the square brackets. The record syntax can also be viewed as a
(syntactically elaborate) constructor for record types. This view of data as
being built up from constants by constructors is one of the fundamental
principles underlying ML and will play a crucial role in much of the development
below.
There is one more very important type in ML, the function type. Before
we get to the function type, it is convenient to take a detour through the
declaration forms of ML, and some of the basic forms of expressions. With
that under our belt, we can more easily discuss functions and their types.
- val pair = ( x, s );
> val pair = (20,"Abcdef") : int * string
The second binding for x hides the previous binding, and does not affect
the value of y. Whenever an identifier is used in an expression, it refers to
the closest textually enclosing value binding for that identifier. Thus the
occurrence of x in the right-hand side of the value binding for z refers to
the second binding of x, and hence has value true, not 17. This rule is no
different from that used in other block-structured languages, but it is worth
emphasizing that it is the same.
Multiple identifiers may be bound simultaneously, using the keyword
"and" as a separator:
- val x = 17;
> val x = 17 : int
- val x = true and y = x;
> val x = true : bool
val y = 17 : int
Notice that y receives the value 17, not true! Multiple value bindings joined
by and are evaluated in parallel: first all of the right-hand sides are
evaluated, then the resulting values are all bound to their corresponding left-hand
sides.
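A minimal sketch contrasting simultaneous bindings joined by and with ordinary sequential bindings (a, b, c and d are hypothetical names): evaluating all right-hand sides before binding makes a one-line swap possible.

```sml
(* Sketch; a, b, c and d are hypothetical names. *)
val a = 1
val b = 2

(* Both right-hand sides are evaluated first (using a = 1, b = 2),
   and only then are the new bindings made, so the values are swapped. *)
val a = b and b = a      (* a = 2, b = 1 *)

val c = 1
val d = 2
val c = d                (* sequential: c is now 2 ... *)
val d = c                (* ... so d is also 2 *)
```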
In order to facilitate the following explanation, we need to introduce some
terminology. We said that the role of a declaration is to define an identifier
for use in a program. There are several ways in which an identifier can be
used, one of which is as a variable. To declare an identifier for a particular
use, one uses the binding form associated with that use. For instance, to
declare an identifier as a variable, one uses a value binding (which binds
a value to the variable and establishes its type). Other binding forms will
be introduced later on. In general, the role of a declaration is to build an
environment, which keeps track of the meaning of the identifiers that have
been declared. For instance, after the value bindings above are processed,
the environment records the fact that the value of x is true and that the
value of y is 17. Evaluation of expressions is performed with respect to this
environment, so that the value of the expression x can be determined to be
true.
Just as expressions can be combined to form other expressions by using
functions like addition and pairing, so too can declarations be combined,
using semicolon.
When two declarations are combined with semicolon, ML first evaluates the
left-hand declaration, producing an environment E, and then evaluates the
right-hand declaration (with respect to E), producing environment E'. The
second declaration may hide the identifiers declared in the first, as indicated
above.
It is also useful to be able to have local declarations whose role is to
assist in the construction of some other declarations. This is accomplished
as follows:
- local
val x = 10
in
val u = x*x + x*x
val v = 2*x + (x div 5)
end;
> val u = 200 : int
val v = 22 : int
The binding for x is local to the bindings for u and v, in the sense that x is
available during the evaluation of the bindings for u and v, but not thereafter.
This is reflected in the result of the declaration: only u and v are declared.
It is also possible to localize a declaration to an expression using let:
- let
val x = 10
in
x*x + 2*x + 1
end;
> 121 : int
3 The semicolon is syntactically optional: two sequential bindings are considered to be
separated by a semicolon.
The declaration of x is local to the expression occurring after the in, and is
not visible from the outside. The body of the let is evaluated with respect
to the environment built by the declaration occurring before the in. In
this example, the declaration binds x to the value 10. With respect to this
environment, the value of x*x+2*x+1 is 121, and this is the value of the whole
expression.
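A minimal sketch contrasting the two forms (base, scale, w, h and area are hypothetical names): local hides helper bindings from a group of declarations, while let hides them from a single expression.

```sml
(* Sketch; base, scale, w, h and area are hypothetical names. *)

(* local: base is visible to the declarations between in and end,
   but only scale is declared afterwards. *)
local
  val base = 10
in
  fun scale (n : int) : int = base * n
end

(* let: w and h are visible only in the body w * h. *)
val area = let val w = 3 val h = 4 in w * h end   (* 12 *)
```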
Exercise 2.3.1 What is the result printed by the ML system in response to
the following declarations? Assume that there are no initial bindings for x,
y or z.
2.4 Patterns
You may have noticed that there is no means of obtaining, say, the first
component of a tuple, given only the expressions defined so far. Compound values
are decomposed via pattern matching. Values of compound types are
themselves compound, built up from their component values by the use of value
constructors. It is natural to use this structure to guide the decomposition
of compound values into their component parts.
Suppose that x has type int*bool. Then x must be some pair, with the
left component an integer and the right component a boolean. We can obtain
the value of the left and right components using the following generalization
of a value binding.
- val x = ( 17, true );
> val x = (17,true) : int*bool
- val ( left, right ) = x;
> val left = 17 : int
val right = true : bool
The left-hand side of the second value binding is a pattern, which is built up
from variables and constants using value constructors. That is, a pattern is
just an expression, possibly involving variables. The difference is that the
variables in a pattern are not references to previously-bound variables, but
rather variables that are about to be bound by pattern-matching. In the
above example, left and right are two new value identifiers that become
bound by the value binding. The pattern matching process proceeds by
traversing the value of x in parallel with the pattern, matching corresponding
components. A variable matches any value, and that value is bound to that
identifier. Otherwise (i.e., when the pattern is a constant) the pattern and
the value must be identical. In the above example, since x is an ordered pair,
the pattern match succeeds by assigning the left component of x to left,
and the right component to right.
Notice that the simplest case of a pattern is a variable. This is the form
of value binding that we introduced in the previous section.
It does not make sense to pattern match, say, an integer against an
ordered pair, nor a list against a record. Any such attempt results in a type
error at compile time. However, it is also possible for pattern matching to
fail at run time:
- val x=(false,17);
> val x = (false,17) : bool*int
- val (false,w) = x;
> val w = 17 : int
- val (true,w) = x;
Failure: match
Notice that in the second and third value bindings, the pattern has a
constant in the left component of the pair. Only a pair with this value as left
component can match this pattern successfully. In the case of the second
binding, x in fact has false as left component, and therefore the match
succeeds, binding 17 to w. But in the third binding, the match fails because
true does not match false. The message Failure: match indicates that
a run-time matching failure has occurred.
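In Standard ML '97 implementations, the failure reported above as "Failure: match" surfaces as the built-in exception Bind, and compilers typically warn that such a binding is not exhaustive. A minimal sketch, assuming Standard ML '97, that catches the exception (firstIsTrue is a hypothetical name):

```sml
(* Sketch assuming Standard ML '97, where a failed val binding raises Bind;
   firstIsTrue is a hypothetical name. *)
val x = (false, 17)

(* Succeeds: the constant false in the pattern matches the left component. *)
val (false, w) = x                      (* w = 17 *)

(* The inner binding fails whenever the left component is false;
   the resulting Bind exception is caught and turned into false. *)
fun firstIsTrue (p : bool * int) : bool =
  (let val (true, _) = p in true end) handle Bind => false
```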
Pattern matching may be performed against values of any of the types
that we have introduced so far. For example, we can get at the components
of a three element list as follows:
Here hd is bound to the first element of the list l (called the head of l), and
tl is bound to the list resulting from deleting the first element (called the
tail of the list). The type of hd is string and the type of tl is string list.
The reason is that :: constructs lists out of a component (the left argument)
and another list.
Exercise 2.4.1 What would happen if we wrote val [hd,tl] = l; instead
of the above? (Hint: expand the abbreviated notation into its true form, then
match the result against l).
Suppose that all we are interested in is the head of a list, and are not
interested in its tail. Then it is inconvenient to have to make up a name
for the tail, only to be ignored. In order to accommodate this "don't care"
case, ML has a wildcard pattern that matches any value whatsoever, without
creating a binding.
- val l = ["Lo", "and", "behold"];
> val l = ["Lo","and","behold"] : string list
- val hd::_ = l;
> val hd = "Lo" : string
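A minimal sketch of two further uses of patterns (second, ll, lr and r are hypothetical names): wildcards on both sides of a variable to pick out the second element of a list, and a nested pattern whose left component is itself a pair.

```sml
(* Sketch; second, ll, lr and r are hypothetical names. *)
val l = ["Lo", "and", "behold"]

(* _ matches without binding: skip the head, bind the second element,
   ignore the remaining tail. Compilers typically warn this is not exhaustive. *)
val _ :: second :: _ = l               (* second = "and" *)

(* Patterns nest: the left component is matched against (ll, lr). *)
val ((ll, lr), r) = ((1, 2), true)     (* ll = 1, lr = 2, r = true *)
```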
Pattern matching proceeds as before, binding l and r to the left and right
components of x, but in addition the binding of l is further matched against
the pattern (ll,lr), binding ll and lr to the left and right components of
l. The results are printed as usual.
Before you get too carried away with pattern matching, you should realize
that there is one significant limitation: patterns must be linear, that is, a given
pattern variable may occur only once in a pattern. This precludes the
possibility of writing a pattern (x,x) which matches only symmetric pairs, those
for which the left and right components have the same value. This restriction
causes no difficulties in practice, but it is worth pointing out that there are
limitations.
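Since (x,x) is not a legal pattern, the usual workaround is to bind the two components to distinct variables and compare them explicitly. A minimal sketch (symmetric is a hypothetical name):

```sml
(* Sketch; symmetric is a hypothetical name. The non-linear pattern (x, x)
   is rejected, so bind both components separately and compare with =. *)
fun symmetric (x, y) = (x = y)
(* symmetric : ''a * ''a -> bool, where ''a ranges over types admitting equality *)
```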
Exercise 2.4.2 Bind the variable x to the value 0 by constructing patterns
to match against the following expressions.
For example, given the expression (true,"hello",0), the required pattern
is (_,_,x).
1. { a=1, b=0, c=true }
2. [ ~2, ~1, 0, 1, 2 ]
3. [ (1,2), (0,1) ]
identifier. ML has no such restriction. Functions are perfectly good values,
and so may be designated by arbitrarily complex expressions. Therefore the
general form of an application is e e', which is evaluated by first evaluating e,
obtaining some function f, then evaluating e', obtaining some value v, and
finally applying f to v. In the simple case that e is an identifier, such as size,
the evaluation of e is quite simple: simply retrieve the value of size, which
had better be a function. But in general, e can be quite complex and require
any amount of computation before returning a function as value. Notice
that this rule for evaluation of function application uses the call-by-value
parameter passing mechanism, since the argument to a function is evaluated
before the function is applied.
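A minimal sketch of both points (n, double and m are hypothetical names): the function position of an application can be any expression of function type, and the argument is evaluated to a value before the call.

```sml
(* Sketch; n, double and m are hypothetical names. *)

(* The function position is a conditional of type string -> int;
   it is evaluated first, then the argument, then the call is made. *)
val n = (if true then size else (fn _ => 0)) "abc"   (* 3 *)

(* Call-by-value: the argument 2 + 3 is reduced to 5 before double runs. *)
fun double (x : int) : int = x + x
val m = double (2 + 3)                               (* 10 *)
```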
How can we guarantee that in an application e e', e will in fact evaluate
to a function?
A function type has the form τ->τ', pronounced "τ to τ'," where τ and τ'
are types. An expression of this type has as value a function that whenever
it is applied to a value of type τ, returns a value of type τ', provided that
it terminates (unfortunately, there is no practical means of ensuring that all
functions terminate for all arguments). The type τ is called the domain type
of the function, and τ' is called its range type. An application e e' is legal
only if e has type τ->τ' and e' has type τ, that is, only if the type of the
argument matches the domain type of the function. The type of the whole
expression is then τ', which follows from the definition of the type τ->τ'.
For example,
- size;
> size = fn : string -> int
- not;
> not = fn : bool -> bool
- not 3;
Type clash in: not 3
Looking for a: bool
I have found a: int
The type of size indicates that it takes a string as argument and returns
an integer, just as we might expect. Similarly, not is a function that takes a
boolean and returns a boolean. Functions have no visible structure, and so
print as "fn". The application of not to 3 fails because the domain type of
not is bool, whereas the type of 3 is int.
Since functions are values, we can bind them to identifiers using the value
binding mechanism introduced in the last section. For example,
- val len = size;
> val len = fn : string -> int
- len "abc";
> 3 : int
The identifier size is bound to some (internally-defined) function with type
string->int. The value binding above retrieves the value of size, some
function, and binds it to the identifier len. The application len "abc" is
processed by evaluating len to obtain some function, evaluating "abc" to
obtain a string (itself), and applying that function to that string. The result
is 3, the length of the string.
Functions are defined using function bindings that are introduced by the keyword fun.
Any call to f will loop forever, calling itself over and over.
- fun append(nil,l) = l
| append(hd::tl,l) = hd :: append(tl,l);
> val append = fn : ( 'a list * 'a list ) -> 'a list
There are two cases to consider, one for the empty list and one for a
non-empty list, in accordance with the inductive structure of lists. It is trivial to
append a list l to the empty list: the result is just l. For non-empty lists,
we can append l to hd::tl by cons'ing hd onto the result of appending l to
tl.
The type of append is a polytype; that is, it is a type that involves the
type variable 'a. The reason is that append obviously works no matter what
the type of the elements of the list is: the type variable 'a stands for the
type of the elements of the list, and the type of append ensures that both
lists to be appended have the same type of elements (which is the type of the
elements of the resulting list). This is an example of a polymorphic function;
it can be applied to a variety of lists, each with a different element type.
Here are some examples of the use of append:
- append([],[1,2,3]);
> [1,2,3] : int list
- append([1,2,3],[4,5,6]);
> [1,2,3,4,5,6] : int list
- append(["Bowl","of"],["soup"]);
> ["Bowl", "of", "soup"] : string list
Notice that we used append for objects of type int list and of type string
list.
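The Standard ML Basis Library provides an infix operator @ that appends two lists; it is not mentioned in the text above. A minimal sketch checking that the append defined above agrees with it (ok and words are hypothetical names):

```sml
(* Sketch; ok and words are hypothetical names; @ is the Basis Library append. *)
fun append (nil, l) = l
  | append (hd :: tl, l) = hd :: append (tl, l)

val ok = (append ([1, 2, 3], [4, 5, 6]) = [1, 2, 3] @ [4, 5, 6])   (* true *)
val words = append (["Bowl", "of"], ["soup"])                      (* a string list *)
```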
In general ML assigns the most general type that it can to an expression.
By "most general", we mean that the type reflects only the commitments
that are made by the internal structure of the expression. For example, in
the definition of the function append, the first argument is used as the target
of a pattern match against nil and ::, forcing it to be of some list type.
The type of the second argument must be a list of the same type since it
is potentially cons'd with an element of the first list. These two constraints
imply that the result is a list of the same type as the two arguments, and
hence append has type ('a list * 'a list) -> 'a list.
Returning to the example above of a function f(x) defined to be f(x), we
see that the type is 'a->'b because, aside from being a function, the body of
f makes no commitment to the type of x, and hence it is assigned the type
'a, standing for any type at all. The result type is similarly uncommitted,
and so is taken to be 'b, an arbitrary type. You should convince yourself
that no type error can arise from any use of f, even though it has the very
general type 'a->'b.
Function bindings are just another form of declaration, analogous to the
value bindings of the previous section (in fact, function bindings are just a
special form of value binding). Thus we now have two methods for building
declarations: value bindings and function bindings. This implies that a
function may be defined anywhere that a value may be declared; in particular,
local function definitions are possible. Here is the definition of an efficient
list reversal function:
- fun reverse l =
let fun rev(nil,y) = y
| rev(hd::tl,y) = rev(tl,hd::y)
in
rev(l,nil)
end;
> val reverse = fn : 'a list -> 'a list
The function rev is a local function binding that may be used only within
the let. Notice that rev is defined by recursion on its first argument, and
reverse simply calls rev, and hence does not need to decompose its argu-
ment l.
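One plausible reading of why reverse is called efficient: a naive version appends a one-element list at every step, and since each append walks the whole list built so far, it takes time quadratic in the length of the list, whereas rev does a single cons per element. A minimal sketch for comparison (naiveReverse is a hypothetical name; @ is the Basis Library append):

```sml
(* Sketch; naiveReverse is a hypothetical name used only for comparison. *)

(* Quadratic: each @ copies the already-reversed tail. *)
fun naiveReverse nil = nil
  | naiveReverse (hd :: tl) = naiveReverse tl @ [hd]

(* Linear, as in the text: rev moves one element onto the accumulator y per step. *)
fun reverse l =
  let fun rev (nil, y) = y
        | rev (hd :: tl, y) = rev (tl, hd :: y)
  in rev (l, nil) end
```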
Functions are not restricted to using parameters and local variables:
they may freely refer to variables that are available when the function is
defined. Consider the following definition:
- fun pairwith(x,l) =
let fun p y = (x,y)
in map p l
end;
> val pairwith = fn : 'a * 'b list -> ('a*'b) list
- val l=[1,2,3];
> val l = [1,2,3] : int list
- pairwith("a",l);
> [("a",1),("a",2),("a",3)] : ( string * int ) list
The local function p has a non-local reference to the identifier x, the
parameter of the function pairwith. The same rule applies here as with other
non-local references: the nearest enclosing binding is used. This is exactly
the same rule that is used in other block structured languages such as Pascal
(but differs from the one used in most implementations of LISP).
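A minimal sketch of this rule (k, addK, adder, add3, r1 and r2 are hypothetical names): a function refers to the binding in force where it was defined, so a later binding of the same name does not change it, and a function returned as a result keeps access to the parameter of the function that created it.

```sml
(* Sketch; all names here are hypothetical. *)
val k = 10
fun addK (x : int) : int = x + k   (* refers to the binding k = 10 *)
val k = 100                        (* a new binding; addK is unaffected *)
val r1 = addK 1                    (* 11, not 101 *)

(* The returned function remembers the value of n it was created with. *)
fun adder (n : int) = fn (x : int) => x + n
val add3 = adder 3
val r2 = add3 4                    (* 7 *)
```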
Exercise 2.5.6 A "perfect number" is one that is equal to the sum of all
its factors (including 1 but not including itself). For example, 6 is a perfect
number because 6 = 3 + 2 + 1. Define the predicate isperfect to test for
perfect numbers.
It was emphasized above that in ML functions are values; they have the
same rights and privileges as any other value. In particular, this means that
functions may be passed as arguments to other functions, and applications
may evaluate to functions. Functions that use functions in either of these
ways are called higher order functions. The origin of this terminology is
somewhat obscure, but the idea is essentially that functions are often taken
to be more complex data items than, say, integers (which are called "first
order" objects). The distinction is not absolute, and we shall not have need
to make much of it, though you should be aware of roughly what is meant
by the term.
First consider the case of a function returning a function as result.
Suppose that f is such a function. What must its type look like? Let's suppose
that it takes a single argument of type τ. Then if it is to return a function
as result, say a function of type σ->ρ, then the type of f must be τ->(σ->ρ).
This reflects the fact that f takes an object of type τ, and returns a function
whose type is σ->ρ. The result of any such application of f may itself be
applied to a value of type σ, resulting in a value of type ρ. Such a
successive application is written f(e1)(e2), or just f e1 e2; this is not the same
as f(e1,e2)! Remember that (e1,e2) is a single object, consisting of an
Notice how the type of map reflects the correlation between the type of the
list elements and the domain type of the function, and between the range
type of the function and the result type.
Here are some examples of using map:
- val l = [1,2,3,4,5];
> val l = [1,2,3,4,5] : int list
- map twice l;
> [2,4,6,8,10] : int list
- fun listify x = [x];
> val listify = fn : 'a -> 'a list
- map listify l;
> [[1],[2],[3],[4],[5]] : int list list
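The definitions of twice and map are not reproduced in this section. A minimal sketch, assuming twice doubles an integer (consistent with the output of map twice l above), that writes a curried map by hand (mymap is a hypothetical name, chosen to avoid clashing with the built-in map) and contrasts it with a function taking a single pair (addPair is also hypothetical):

```sml
(* Sketch; assumes twice doubles its argument. mymap, doubleAll, ys,
   addPair and s are hypothetical names. *)
fun twice (x : int) : int = 2 * x

(* Curried: mymap takes f and returns a function on lists. *)
fun mymap f nil = nil
  | mymap f (hd :: tl) = f hd :: mymap f tl

val doubleAll = mymap twice          (* int list -> int list *)
val ys = doubleAll [1, 2, 3]         (* [2,4,6] *)

(* Tupled: one argument that happens to be a pair. *)
fun addPair (a : int, b : int) : int = a + b
val s = addPair (3, 4)               (* 7 *)
```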
Let's walk through this carefully. The function compose takes a pair of
functions as argument and returns a function; this function, when applied
to x returns f(g(x)). Since the result is f(g(x)), the type of x must be
the domain type of g; since f is applied to the result of g(x), the domain
type of f must be the range type of g. Hence we get the type printed
above. The function fourtimes is obtained by applying compose to the pair
(twice,twice) of functions. The result is a function that, when applied to
x, returns twice(twice(x)); in this case, x is 5, so the result is 20.
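A minimal sketch consistent with this description, assuming twice doubles an integer and compose takes its two functions as a pair. The Basis Library's infix o performs the same composition.

```sml
(* Sketch; assumes twice doubles its argument and compose takes a pair. *)
fun twice (x : int) : int = 2 * x

(* Returns the function that maps x to f (g x);
   inferred type: ('a -> 'b) * ('c -> 'a) -> 'c -> 'b *)
fun compose (f, g) = fn x => f (g x)

val fourtimes = compose (twice, twice)
val r = fourtimes 5                  (* twice (twice 5) = 20 *)
```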
Now that you've gained some familiarity with ML, you may feel that it
is a bit peculiar that declarations and function values are intermixed. So far
there is no primitive expression form for functions: the only way to designate
a function is to use a fun binding to bind it to an identifier, and then to refer
to it by name. But why should we insist that all functions have names?
There is a good reason for naming functions in certain circumstances, as we
shall see below, but it also makes sense to have anonymous functions, or
lambdas (the latter terminology comes from LISP and the λ-calculus).
Here are some examples of the use of function constants and their
relationship to clausal function definitions:
- fun listify x = [x];
> val listify = fn : 'a->'a list
- val listify2 = fn x=>[x];
> val listify2 = fn : 'a->'a list
- listify 7;
> [7] : int list
- listify2 7;
> [7] : int list
- (fn x=>[x])(7);
> [7] : int list
- val l=[1,2,3];
> val l = [1,2,3] : int list
- map (fn x=>[x]) l;
> [[1],[2],[3]] : int list list
The clauses that make up the definition of the anonymous function are
collectively called a match.
The very anonymity of anonymous functions prevents us from writing
down an anonymous function that calls itself recursively. This is the reason
why functions are so closely tied up with declarations in ML: the purpose of
the fun binding is to arrange that a function have a name for itself while it
is being defined.
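Standard ML also offers val rec, which lets an anonymous function expression refer to the name it is being bound to; it is not described in the text above. A minimal sketch, assuming Standard ML '97 (square, fact and fact' are hypothetical names):

```sml
(* Sketch assuming Standard ML '97; square, fact and fact' are hypothetical names. *)

(* An anonymous function bound by an ordinary val binding. *)
val square = fn (x : int) => x * x

(* val rec makes fact visible inside its own fn expression. *)
val rec fact = fn 0 => 1
                | n => n * fact (n - 1)

(* The same function written as a clausal fun binding. *)
fun fact' 0 = 1
  | fact' n = n * fact' (n - 1)
```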
Exercise 2.5.8 Consider the problem of deciding how many different ways
there are of changing £1 into 1, 2, 5, 10, 20 and 50 pence coins. Suppose
that we impose some order on the types of coins. Then it is clear that the
following relation holds
Number of ways to change amount a using n kinds of coins
= Number of ways to change amount a using all but the first kind of coin
+ Number of ways to change amount a-d using all n kinds of coins,
where d is the denomination of the first kind of coin.
This relation can be transformed into a recursive function if we specify the
degenerate cases that terminate the recursion. If a = 0, we will count this as
one way to make change. If a < 0, or n = 0, then there is no way to make
change. This leads to the following recursive denition to count the number
of ways of changing a given amount of money.
(* Denomination, in pence, of coin kind 1..6 *)
fun first_denom 1 = 1
  | first_denom 2 = 2
  | first_denom 3 = 5
  | first_denom 4 = 10
  | first_denom 5 = 20
  | first_denom 6 = 50;
(* cc(amount, kinds): ways to change amount using coin kinds 1..kinds;
   the "first kind" of the relation is kind number kinds *)
fun cc(0,_) = 1
  | cc(_,0) = 0
  | cc(amount, kinds) =
      if amount < 0 then 0
      else
        cc(amount-(first_denom kinds), kinds)
        + cc(amount, (kinds-1));
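The first_denom table only serves to number the coin kinds. A variant that some readers find easier to follow passes the denominations as a list; it computes the same recurrence (either skip the first kind of coin, or use one coin of that kind and recurse on a - d). The names countChange and ukCoins are hypothetical, and this is a sketch rather than the intended answer to the exercise.

```sml
(* countChange (a, denoms): ways to change amount a using the coins in
   denoms. Same degenerate cases as cc: a = 0 counts as one way; no coins
   left, or a < 0, counts as no way. *)
fun countChange (0, _) = 1
  | countChange (_, []) = 0
  | countChange (a, denoms as d :: rest) =
      if a < 0 then 0
      else countChange (a, rest)           (* never use coin d *)
           + countChange (a - d, denoms)   (* use one coin d, keep all kinds *)

(* The six coin denominations of the exercise, in pence. *)
val ukCoins = [50, 20, 10, 5, 2, 1]
val waysForAPound = countChange (100, ukCoins)
```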
for a wide class of types. For example, the type of append was seen to be
'a list * 'a list -> 'a list, reflecting the fact that append does not
care what the component values of the list are, only that the two arguments
are both lists having elements of the same type. The type of a polymorphic
function is always a polytype, and the collection of types for which it is
defined is the infinite collection determined by the instances of the polytype.
For example, append works for int list's and bool list's and (int*bool)
list's, and so on ad infinitum. Note that polymorphism is not limited to
functions: the empty list nil is a list of every type, and thus has type 'a
list.
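As a small check of the claim, the sketch below defines append by pattern matching (written out here so the block stands alone) and applies that one function at three different instances of 'a list * 'a list -> 'a list.

```sml
(* A single polymorphic append: the element type never matters,
   only that both lists have the same one. *)
fun append (nil, l) = l
  | append (h :: t, l) = h :: append (t, l)

val ints  = append ([1, 2], [3])                 (* int list *)
val bools = append ([true], [false, true])       (* bool list *)
val pairs = append ([(1, true)], [(2, false)])   (* (int * bool) list *)
```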
This phenomenon is to be contrasted with another notion, known as over-
loading. Overloading is a much more ad hoc notion than polymorphism be-
cause it is more closely tied up with notation than it is with the structure of
a function's definition. A fine example of overloading is the addition func-
tion, +. Recall that we write 3+2 to denote the sum of two integers, 3 and
2, and that we also write 3.0+2.0 for the addition of the two real numbers
3.0 and 2.0. This may seem like the same phenomenon as the appending
of two integer lists and the appending of two real lists, but the similarity is
only apparent: the same append function is used to append lists of any type,
but the algorithm for addition of integers is different from that for addition
of real numbers. (If you are familiar with typical machine representations
of integers and floating point numbers, this point is fairly obvious.) Thus
the single symbol + is used to denote two different functions, and not a sin-
gle polymorphic function. The choice of which function to use in any given
instance is determined by the type of the arguments.
This explains why it is not possible to write fun plus(x,y)=x+y in ML:
the compiler must know the types of x and y in order to determine which
addition function to use, and therefore is unable to accept this definition. The
way around this problem is to explicitly specify the type of the argument to
plus by writing fun plus(x:int,y:int)=x+y so that the compiler knows
that integer addition is intended. It is an interesting fact that in the absence
of overloaded identifiers such as +, it is never necessary to include explicit
type information (except occasionally when using partial record patterns, as
in fun f {x,...} = x). But in order to support overloading and to allow you to
explicitly write down the intended type of an expression as a double-checking
measure, ML allows you to qualify a phrase with a type expression. Here are
some examples:
- fun plus(x,y) = x+y;
Unresolvable overloaded identifier: +
- fun plus(x:int,y:int) = x+y;
> val plus = fn : int*int->int
- 3 : bool;
Type clash in: 3 : bool
Looking for a: bool
I have found a: int
- (plus,true): (int*int->int) * bool;
> (fn, true) : (int*int->int) * bool
- fun id(x:'a) = x;
> val id = fn : 'a -> 'a
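One hedge on the transcript above: the 1997 revision of Standard ML gives an unresolved overloaded arithmetic operator a default type of int, so current compilers accept fun plus(x,y) = x+y with type int * int -> int instead of rejecting it. An annotation is still needed to select real addition, and a single constraint anywhere in the definition is enough. The function names below are illustrative.

```sml
(* One annotation on an argument resolves + to integer addition. *)
fun plusInt (x : int, y) = x + y          (* int * int -> int *)

(* A result-type constraint resolves + to real addition. *)
fun plusReal (x, y) : real = x + y        (* real * real -> real *)

(* No annotation at all: SML '97 defaults + to int. *)
fun plusDefault (x, y) = x + y            (* int * int -> int *)
```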
Note that one can write polytypes just as they are printed by ML: type
variables are identifiers preceded by a single quote.
Equality is an interesting "in-between" case. It is not a polymorphic
function in the same sense that append is, yet, unlike +, it is defined for
arguments of (nearly) every type. As discussed above, not every type admits
equality, but for every type that does admit equality, there is a function =
that tests whether or not two values of that type are equal, returning true
or false, as the case may be. Now since ML can tell whether or not a
given type admits equality, it provides a means of using equality in a "quasi-
polymorphic" way. The trick is to introduce a new kind of type variable,
written ''a, which may be instantiated to any type that admits equality (an
"equality type", for short). The ML type checker then keeps track of whether
a type is required to admit equality, and reflects this in the inferred type of
a function by using these new type variables. For example,
- fun member( x, nil ) = false
| member( x, h::t ) = if x=h then true else member(x,t);
> val member = fn : ''a * ''a list -> bool
The occurrences of ''a in the type of member limit the use of member to those
types that admit equality.
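The sketch below restates member (using orelse in place of the equivalent if-then-else) and applies it at several equality types. Under the 1997 revision of the language, real is not an equality type, so a call such as member(1.0, [1.0]) is rejected at compile time.

```sml
(* ''a ranges only over equality types, because member uses =. *)
fun member (x, nil) = false
  | member (x, h :: t) = x = h orelse member (x, t)

val foundInt    = member (3, [1, 2, 3])            (* ''a = int *)
val foundString = member ("b", ["a", "c"])         (* ''a = string *)
val foundList   = member ([1], [[2], [1]])         (* ''a = int list *)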
This declaration declares the identifier color to be a new data type, with
nullary constructors Red, Blue, and Yellow.
Functions may be defined over a user-defined data type by pattern match-
ing, just as for the primitive types. The value constructors for that data type
determine the overall form of the function definition, just as nil and :: are
used to build up patterns for functions defined over lists. For example,
- fun favorite Red = true
| favorite Blue = false
| favorite Yellow = false ;
> val favorite = fn : color->bool
- val color = Red;
> val color = Red : color
- favorite color;
> true : bool
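A self-contained version of the example, assuming color is declared with exactly the three nullary constructors that favorite matches on. The last two clauses of favorite could equally be collapsed into a single wildcard clause, favorite _ = false.

```sml
datatype color = Red | Blue | Yellow

(* One clause per constructor of color. *)
fun favorite Red = true
  | favorite Blue = false
  | favorite Yellow = false

(* The type name color and a variable named color can coexist:
   types and values live in separate name spaces. *)
val color = Red
val likesIt = favorite color
```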
This example also illustrates the use of the same identifier in two different
ways. The identifier color is used as the name of the type defined above,
and as a variable bound to Red. This mixing is always harmless (though
perhaps confusing) since the compiler can always tell from context whether
the type name or the variable name is intended.
6 Nullary constructors (those with no arguments) are sometimes called constants.
Not all user-defined value constructors need be nullary:
- datatype money = nomoney | coin of int | note of int |
check of string*int ;
> type money
con nomoney : money
con coin : int->money
con note : int->money
con check : string*int->money
- fun amount(nomoney) = 0
| amount(coin(pence)) = pence
| amount(note(pounds)) = 100*pounds
| amount(check(bank,pence)) = pence ;
> val amount = fn : money->int
The type money has four constructors, one a constant, and three with ar-
guments. The function amount is defined by pattern-matching using these
constructors, and returns the amount in pence represented by an object of
type money.
What about equality for user-defined data types? Recall the definition
of equality of lists: two lists are equal iff either they are both nil, or they
are of the form h::t and h'::t', with h equal to h' and t equal to t'. In
general, two values of a given data type are equal iff they are "built the same
way" (i.e., they have the same constructor at the outside), and corresponding
components are equal. As a consequence of this definition of equality for data
types, we say that a user-defined data type admits equality iff each of the
domain types of each of the value constructors admits equality. Continuing
with the money example, we see that the type money admits equality because
both int and string do.
- nomoney = nomoney;
> true : bool
- nomoney = coin(5);
> false : bool
- coin(5) = coin(3+2);
> true : bool
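The money example as a single compilable unit, with a hypothetical total as a usage example (the bank name "MyBank" is made up). The comparisons at the end rely on money admitting equality, which holds because int and string both do.

```sml
datatype money = nomoney
               | coin of int
               | note of int
               | check of string * int

(* Amount in pence; the bank named on a check does not affect its value. *)
fun amount nomoney = 0
  | amount (coin pence) = pence
  | amount (note pounds) = 100 * pounds
  | amount (check (_, pence)) = pence

val total = amount (note 5) + amount (coin 20) + amount (check ("MyBank", 250))

(* Structural equality: same outer constructor, equal components. *)
val sameCoin = coin 5 = coin (3 + 2)
val differ   = nomoney <> coin 5
```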
Notice how the definition parallels the informal description of a binary tree.
The function countleaves is defined recursively on btree's, returning the
number of leaves in that tree.
There is an important pattern to be observed here: functions on
recursively-defined data values are defined recursively. We have seen this pattern
before in the case of functions such as append which is defined over lists.
The built-in type list can be considered to have been defined as follows:7
- datatype 'a list = nil | :: of 'a * 'a list ;
> type 'a list
con nil : 'a list
con :: : ('a * ('a list)) -> ('a list)
This example illustrates the use of a parametric data type declaration: the
type list takes another type as argument, defining the type of the members
of the list. This type is represented using a type variable, 'a in this case, as
argument to the type constructor list. We use the phrase "type constructor"
because list builds a type from other types, much as value constructors build
values from other values.
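Since nil and :: are already bound to the built-in list constructors, a runnable version of the parametric declaration is easiest to write with fresh, prefix constructor names. In the sketch below, seq, Nil, Cons, seqAppend and toList are hypothetical names; seqAppend follows the same recursive pattern as append.

```sml
(* A user-defined counterpart of the built-in list type. *)
datatype 'a seq = Nil | Cons of 'a * 'a seq

(* Functions on a recursively defined type are defined recursively,
   with one clause per value constructor. *)
fun seqAppend (Nil, s) = s
  | seqAppend (Cons (h, t), s) = Cons (h, seqAppend (t, s))

(* Convert to a built-in list, to make results easy to inspect. *)
fun toList Nil = []
  | toList (Cons (h, t)) = h :: toList t

val joined = seqAppend (Cons (1, Cons (2, Nil)), Cons (3, Nil))
```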
7 This example does not account for the fact that :: is an infix operator, but we will
neglect that for now.
The function frontier takes a tree as argument and returns a list consisting
of the values attached to the leaves of the tree.
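The sketch below assumes a particular form for the tree type and for countleaves and frontier: a parameterized tree with a value at each leaf (the constructor names empty, leaf and node are assumptions). Each function has one clause per constructor, mirroring the data type.

```sml
datatype 'a tree = empty
                 | leaf of 'a
                 | node of 'a tree * 'a tree

(* Number of leaves in a tree. *)
fun countleaves empty = 0
  | countleaves (leaf _) = 1
  | countleaves (node (l, r)) = countleaves l + countleaves r

(* Values attached to the leaves, from left to right. *)
fun frontier empty = []
  | frontier (leaf x) = [x]
  | frontier (node (l, r)) = frontier l @ frontier r

val t = node (node (leaf 1, empty), node (leaf 2, leaf 3))
```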
Exercise 2.7.1 Design a function samefrontier(x,y) which returns true
if the frontiers of x and y contain the same elements in the same order,
regardless of the internal structure of x and y, and returns false otherwise.
A correct, but unsatisfactory definition is
fun samefrontier(x,y) = (frontier x) = (frontier y)
This is a difficult exercise, the problem being to avoid flattening a huge tree
when its frontier differs from that of the tree with which it is being compared.
ML also provides a mechanism for defining abstract types using an abstype
binding.8 An abstract type is a data type with a set of functions defined on
it. The data type itself is called the implementation type of the abstract type,
and the functions are called its interface. The type defined by an abstype
binding is abstract because the constructors of the implementation type are
hidden from any program that uses the type (called a client): only the inter-
face is available. Since programs written to use the type cannot tell what the
implementation type is, they are restricted to using the functions provided
by the interface of the type. Therefore the implementation can be changed
at will, without affecting the programs that use it. This is an important
mechanism for structuring programs so as to prevent interference between
components.
8 Abstract types in this form are, for the most part, superseded by the modules system
described in the next chapter.
Here is an example of an abstract type declaration.
- abstype color = blend of int*int*int
with val white = blend(0,0,0)
and red = blend(15,0,0)
and blue = blend(0,15,0)
and yellow = blend(0,0,15)
fun mix(parts:int, blend(r,b,y),
parts':int, blend(r',b',y')) =
if parts<0 orelse parts'<0 then white
else let val tp=parts+parts'
val rp = (parts*r+parts'*r') div tp
val bp = (parts*b+parts'*b') div tp
val yp = (parts*y+parts'*y') div tp
in blend(rp,bp,yp)
end
end;
> type color
val white = - : color
val red = - : color
val blue = - : color
val yellow = - : color
val mix = fn : int*color*int*color->color
- val green = mix(2, yellow, 1, blue);
> val green = - : color
- val black = mix(1, red, 2, mix(1, blue, 1, yellow));
> val black = - : color
There are several things to note about this declaration. First of all, the
type equation occurring right after abstype is a data type declaration: ex-
actly the same syntax applies, as the above example may suggest. Following
fun singleton (e: 'a): 'a set = ...
fun union(s1: 'a set, s2: 'a set): 'a set = ...
fun member(e: 'a, s: 'a set): bool = ...
  | member(e, set (h::t)) = (e = h)
      orelse member(e, set t)
fun intersection(s1: 'a set, s2: 'a set): 'a set = ...
end;
2.8 Exceptions
Suppose that we wish to define a function head that returns the head of a list.
The head of a non-empty list is easy to obtain by pattern-matching, but what
about the head of nil? Clearly something must be done to ensure that head
is defined on nil, but it is not clear what to do. Returning some default value
is undesirable, both because it is not at all evident what value this might be,
and furthermore it limits the usability of the function (if head(nil) were
defined to be, say, nil, then head would apply only to lists of lists).
In order to handle cases like this, ML has an exception mechanism. The
purpose of the exception mechanism is to provide the means for a function to
"give up" in a graceful and type-safe way whenever it is unable or unwilling
to return a value in a certain situation. The graceful way to write head is as
follows:
- exception Head;
> exception Head
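A minimal sketch of the idea, assuming head raises Head on nil and otherwise returns the first element; headOr is a hypothetical caller that supplies its own default by handling the exception. (The Standard ML Basis Library's hd behaves the same way, raising the predefined exception Empty.)

```sml
exception Head

(* head is defined on every list: on nil it gives up by raising Head. *)
fun head nil = raise Head
  | head (h :: _) = h

(* The choice of a default is left to the caller, not baked into head. *)
fun headOr (default, l) = head l handle Head => default
```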
then raise that exception. Notice that the type of e and the type of e' must
be the same; otherwise, the entire expression would have a different type
depending on whether or not the left-hand expression raised an exception.
This explains why the type of head2 is int list->int, even though l does
The function foo may fail in one of two ways: by dividing by zero, causing
the exception Div to be raised, or by having an odd argument, raising the
exception Odd. The function bar is defined so as to handle either of these
contingencies: if foo(m) raises the exception Odd, then bar(m) returns 0; if
it raises Div, it returns 9999; otherwise it returns the value of foo(m).
Notice that the syntax of a multiple-exception handler is quite like the
syntax used for a pattern-matching definition of a lambda. In fact, one
can think of an exception handler as an anonymous function whose domain
type is exn, the type of exceptions, and whose range type is the type of the
expression appearing to the left of handle. From the point of view of type
checking, exceptions are nothing more than constructors for the type exn,
just as nil and cons are constructors for types of the form 'a list.
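The sketch below makes the two-exception handler concrete. foo here is hypothetical: it raises Odd on an odd argument and otherwise computes 10 div (m - 2), so foo 2 raises the predefined exception Div. bar handles both exceptions with one clause each, as described above.

```sml
exception Odd

(* Hypothetical foo: fails with Odd on odd input, and with Div when m = 2. *)
fun foo m =
  if m mod 2 = 1 then raise Odd
  else 10 div (m - 2)

(* A multiple-exception handler: a match over values of type exn. *)
fun bar m = foo m handle Odd => 0
                       | Div => 9999
```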
It follows that exceptions can carry values, simply by declaring them to
take an argument of the appropriate type. The attached value of an exception
can be used by the handler of the exception. An example will illustrate the
point.
- exception oddlist of int list and oddstring of string;
> exception oddlist of int list
exception oddstring of string
- ... handle oddlist(nil) => 0
| oddlist(h::t) => 17
| oddstring("") => 0
| oddstring(s) => size(s)-1
> 8 : int
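A runnable sketch of exceptions that carry values. evenLength, measure and three are hypothetical; the point is that a handler's patterns can match on, and bind, the value attached to the exception.

```sml
exception oddlist of int list and oddstring of string

(* Hypothetical check: reject odd-length strings, passing the string along. *)
fun evenLength s =
  if size s mod 2 = 1 then raise oddstring s else size s

(* The handler inspects the string carried by oddstring. *)
fun measure s =
  evenLength s
  handle oddstring "" => 0
       | oddstring t => size t - 1

(* The value attached to oddlist is bound to l in the handler. *)
val three = (raise oddlist [1, 2, 3]) handle oddlist l => length l
```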
Since exceptions are really values of type exn, the argument to a raise
expression need not be simply an identifier. For example, the function f
above might have been defined by
- fun f(x) = raise (if x=0 then Mine else Theirs);
> val f = fn : int -> 'a
Despite appearances, the outer handler cannot handle the exception raised
by the raise expression in the body of the let, for the inner Exc is a distinct
exception that cannot be caught outside of the scope of its declaration other
than by a wild-card handler.
Exercise 2.8.1 Explain what is wrong with the following two programs.
1. exception exn: bool;
fun f x =
let exception exn: int
in if x > 100 then raise exn with x else x+1
end;
f(200) handle exn with true => 500 | false => 1000;
2. fun f x =
let exception exn
in if p x then a x
else if q x then f(b x) handle exn => c x
else raise exn with d x
end;
f v;
Exercise 2.8.3 Modify your program so that it returns all solutions to the
problem.
- val x = ref 0;
> val x = ref(0) : int ref
- !x;
> 0 : int
- x := 3;
> () : unit
- !x;
> 3 : int
9 At present must be a monotype, though it is expected that one of several proposed
methods of handling polymorphic references will soon be adopted.
All reference types admit equality. Objects of a ref type are heap
addresses, and two such objects are equal iff they are identical. Note that this
implies that they have the same contents, but the converse doesn't hold: we
can have two unequal references to the same value.
- val x = ref 0 ;
> val x = ref 0 : int ref
- val y = ref 0 ;
> val y = ref 0 : int ref
- x=y ;
> false : bool
- !x = !y ;
> true : bool
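The sketch below collects the basic operations on references: creation with ref, dereferencing with !, assignment with :=, and equality, which compares cells rather than their contents. The names are illustrative.

```sml
val x = ref 0
val y = ref 0

val sameCell     = (x = y)      (* false: two distinct cells *)
val sameContents = (!x = !y)    (* true: both currently contain 0 *)

(* Assignment changes the contents of a cell, not its identity. *)
val () = x := 3
val z = x                       (* z is an alias for the cell x *)
val () = z := !z + 1            (* so this also updates !x *)
```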
Exercise 2.9.1 The following abstract type may be used to create an infinite
stream of values.
abstype 'a stream = stream of unit -> ('a * 'a stream)
with fun next(stream f) = f()
val mkstream = stream
end;
Given a stream s, next s returns the first value in the stream, and a stream
that produces the rest of the values. This is illustrated by the following
example:
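A sketch of how next and mkstream fit together, using two hypothetical helpers: naturalsFrom builds the stream n, n+1, n+2, and so on, and take collects a finite prefix of a stream into a list. Nothing beyond the requested prefix is computed, because each tail stays suspended until next applies it.

```sml
abstype 'a stream = stream of unit -> ('a * 'a stream)
with
  fun next (stream f) = f ()
  val mkstream = stream
end

(* Hypothetical helper: the infinite stream n, n+1, n+2, ... *)
fun naturalsFrom n = mkstream (fn () => (n, naturalsFrom (n + 1)))

(* Hypothetical helper: the first k elements of a stream, as a list. *)
fun take (0, _) = []
  | take (k, s) =
      let val (x, rest) = next s
      in x :: take (k - 1, rest) end
```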
Write a function that returns the infinite list of prime numbers in the form
of a stream.
Exercise 2.9.2 The implementation of the stream abstract type given above
can be very inefficient if the elements of the stream are examined more than
once. This is because the next function computes the next element of the
stream each time it is called. This is wasteful for an applicative stream (such
as the prime numbers example), as the value returned will always be the
same. Modify the abstract type so that this inefficiency is removed by using
references.
Exercise 2.9.3 Modify your stream abstract type so that streams can be
finite or infinite, with a predicate endofstream to test whether the stream has
finished.
Chapter 3
The Modules System
3.1 Overview
The ability to decompose a large program into a collection of relatively inde-
pendent modules with well-defined interfaces is essential to the task of build-
ing and maintaining large programs. The ML modules system supplements
the core language with constructs to facilitate building and maintaining large
programs.
Many modern programming languages provide for some form of modular
decomposition of programs into relatively independent parts. Exactly what
constitutes a program unit and how they are related is by no means estab-
lished in the literature, and consequently there is no standard terminology.
Program components are variously called, among other things, "modules",
"packages", and "clusters"; in ML we use the term "structure", short for
"environment structure". This choice of terminology is telling: ML's con-
ception of a program unit is that it is a reified environment. Recall that the
environment is the repository of the meanings of the identifiers that have
been declared in a program. For example, after the declaration val x=3,
the environment records the fact that x has value 3, which is of type int.
Now the fundamental notion underlying program modularization is that the
aim is to partition the environment into chunks that can be manipulated
relatively independently of one another. The reason for saying \relatively" is
that if two modules constitute a program, then there must be some form of
interaction between them, and there must be some means of expressing and
and signatures.
The expression S.x is a qualified name that refers to the value identifier x in
the structure S. Its value, as you might expect, is 3. Similarly, S.f designates
the function f defined in the structure S, the factorial function. When it is
applied to S.x (that is, to 3), it returns 6. Reference to the identifiers defined
by S is not limited to values: the last example illustrates the use of the type
identifier S.t, defined in S to be int.
If you are writing a bit of code that refers to several components of a
single structure, it can get quite tedious to continually use qualified names.
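The sketch assumes a structure S with the components described above (x bound to 3, f the factorial function, t equal to int); the exact declaration may differ. It uses qualified names first, then a local open, which is the standard way in Standard ML to bring a structure's components into scope without qualification.

```sml
structure S =
struct
  val x = 3
  fun f 0 = 1
    | f n = n * f (n - 1)          (* factorial *)
  type t = int
end

val six = S.f S.x                  (* qualified names *)
val y : S.t = 4                    (* S.t is int *)

(* Opening S locally makes x and f visible unqualified inside the let. *)
val alsoSix = let open S in f x end
```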
1 For technical reasons some implementations of ML rearrange the environment before
printing.
> structure S =
struct
val x = 4
val b = true
end
:
sig
val x : int
val b : bool
end
In this fanciful example, the type information for the variables appears in the
signature, whereas the value appears in the structure. This accords with our
intuitive idea of a signature as a description of a value, the structure. One
can see that the val binding format is rather awkward for \fat" objects like
structures, so the actual ML system prints an amalgamation of the structure
and its signature in response to a structure binding.
The expression bracketed by sig and end in the above example is called a
signature, the body of which is called a specification. A specification is similar
to a declaration, except that it merely describes an identifier (by assigning it
a type) rather than giving it a value (and implicitly a type). For the present
we consider only val specifications, adding the other forms as we go along.
In the above example, x is specified to have type int and b type bool.
Signature expressions are not limited to the output of the ML compiler.
They play a crucial role in the use of the modules system, particularly in
functor declarations, and therefore are often typed directly by the user.
Signatures may be bound to signature identifiers using signature bindings in
much the same way that types may be bound to type identifiers using type
bindings. Signature bindings are introduced with the keyword signature,
and may only appear at top level.
- signature SIG =
sig
val x : int
val b : bool
end;
> signature SIG =
sig
val x : int
val b : bool
end;
The output from a signature binding is not very enlightening, and so I'll omit
it from future examples.
The primary significance of signatures lies in signature matching. A struc-
ture matches a signature if, roughly, the structure satisfies the specification
in the signature. Since specifications are similar to types, the idea is simi-
lar to type checking in the core language, though the details are a bit more
complex. One use of signatures is to attach them to structure identifiers in
structure bindings as a form of correctness check in which we specify that
the structure being bound must match the given signature.
- structure S : SIG =
struct
val x = 2+1
val b = x=7
end;
> structure S =
struct
val x = 3 : int
val b = false : bool
end
The notation :SIG on the structure binding indicates that the encapsulated
declaration on the right of the equation must match the signature SIG.
Since ML accepted the above declaration, it must be that the structure
does indeed match the given signature. Why is that the case? The given
structure matches SIG because
1. S.x is bound to 3, which is of type int, as required by SIG,
and
2. S.b is bound to false, which is of type bool.
In short, if a variable x is assigned a type τ in a signature, then the corre-
sponding expression bound to x in the structure must have type τ.
The signature may require less than the structure presents. For example,
- structure S : SIG =
struct
val x = 2+1
val b = false
val s = "Garbage"
end;
> structure S =
struct
val x = 3 : int
val b = false : bool
end
Here the structure bound to S defines variables x, b, and s, while the signature
SIG only requires x and b. Not only is the type of s immaterial to the
signature matching, but it is also removed from the structure by the process
of signature matching. The idea is that SIG defines a view of the structure
consisting only of x and b. Other signatures may be used to obtain other
views of the same structure, as in the following example:
- structure S =
struct
val x = 2+1
val b = false
val s = "String"
end;
> structure S =
struct
val x = 3 : int
val b = false : bool
val s = "String" : string
end
- signature SIG' =
sig
val x : int
val b : bool
end
and SIG'' =
sig
val b : bool
val s : string
end;
- structure S' : SIG' = S and S'' : SIG'' = S;
> structure S' =
struct
val x = 3 : int
val b = false : bool
end
structure S'' =
struct
val b = false : bool
val s = "String" : string
end
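The same two views, written as one self-contained unit: S' and S'' are ascribed different signatures over the same structure S, and each exposes only the components its signature lists.

```sml
structure S =
struct
  val x = 2 + 1
  val b = false
  val s = "String"
end

signature SIG' = sig val x : int  val b : bool end
signature SIG'' = sig val b : bool  val s : string end

(* Two views of one structure. S'.s would be rejected by the compiler,
   because s is not part of the view SIG'. *)
structure S' : SIG' = S
structure S'' : SIG'' = S
```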
Create structures for ordered integers and (real*string) pairs to match this
signature.
If a value in a structure has polymorphic type, then it satisfies a speci-
fication only if the polymorphic type has the specified type as an instance. So,
for example, if x is bound in some structure to nil, which has type 'a list,
then x satisfies the specifications int list and bool list list, as should be
obvious by now. But what happens if the specification type
is polymorphic? Let's suppose that an identifier f is specified to have type
'a list->'a list. In order to satisfy this specification, a structure must
bind a value to f that can take an arbitrary list to another list of that type.
Thus it is not good enough that f be of type, say, int list->int list, for
the specification requires that f work for bool list as well. The general
principle is that the value in the structure must be at least as general as that
end
> structure S =
struct
type 'a t = 'a * bool
val x = (3,true) : int * bool
end
The structure bound to S matches SIG because S.t is a unary (one argument)
type constructor, as specified in SIG.
If a signature specifies a type constructor, then that type constructor may
be used in the remainder of the specification. Here's an example:
- signature SIG =
sig
type 'a t
val x: int t
end;
This signature specifies the class of structures that define a unary type con-
structor t and a variable of type int t (for that type constructor t).
Now let's return to the structure S above, and consider whether or not
it matches this signature SIG. According to the informal reading of SIG just
given, S ought to match SIG. More precisely, S matches SIG because
1. S.t is a unary type constructor, as required;
2. The type of S.x is int*bool. Now int t is equal to int*bool,
by definition of S.t, and therefore S.x satisfies the speci-
fication int t.
It is important to realize that during signature matching, all of the type
identifiers in the signature are taken to refer to the corresponding identifiers
in the structure, so that the specification int t is taken to mean int S.t.
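The matching just described, as a compilable sketch: S is ascribed SIG transparently (with :), so the compiler still knows that int S.t is int * bool, and the components of S.x can be selected directly.

```sml
signature SIG =
sig
  type 'a t
  val x : int t
end

(* S.t is a unary type constructor, and S.x : int * bool = int S.t. *)
structure S : SIG =
struct
  type 'a t = 'a * bool
  val x = (3, true)
end
```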
Exercise 3.2.3 Which signatures match the following structure?
structure S =
struct
type 'a t = 'a * int
val x = (true, 3)
end
Notice that 'a List is no longer a data type, and that Nil and Cons are
simply variables, not value constructors.
The other possibility is to specify the constructors as constructors so that
the structure of a type is visible. The way to do this is with the data type
specification, which is syntactically identical to the data type declaration.
Here's an example:
- signature SIG =
sig
datatype 'a List = Nil | Cons of 'a * 'a List
val x = 7
end;
> structure T =
struct
val x = 7 : int
end
- structure S =
struct
val y = T.x + 1
end;
> structure S =
struct
val y = 8 : int
end
It is clear that S can be used independently of T, even though S was defined
by reference to T. This form of dependence is sometimes called dependence
by construction.
Essential dependence is much more important. One form of essential
dependence occurs when T declares an exception that can be raised by a
function in S. For example,
- structure T =
struct
exception Barf
fun foo(x) = if x=0 then raise Barf else 3 div x
end;
> structure T =
struct
exception Barf
val foo = fn : int->int
end
- structure S =
struct
fun g(x) = T.foo(x) + 1
end
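The dependence can be seen in a small sketch (safeG is a hypothetical client): code that wants to recover from S.g 0 has to name the exception as T.Barf, so it needs T to be in scope.

```sml
structure T =
struct
  exception Barf
  fun foo x = if x = 0 then raise Barf else 3 div x
end

structure S =
struct
  fun g x = T.foo x + 1
end

(* Handling the failure of S.g requires naming T's exception. *)
fun safeG x = S.g x handle T.Barf => ~1
```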
Since S.g(0) raises the exception Barf, the use of S is limited to contexts in
which T is available, for otherwise one cannot handle the exception. Therefore
- structure S =
struct
structure T =
struct
datatype 'a List = Nil | Cons of 'a * 'a List
fun len(Nil) = 0
| len(Cons(h,t)) = 1 + len(t)
end
val len = T.len
end;
> structure S =
struct
structure T =
struct
type 'a List
con Nil : 'a List
con Cons : 'a * 'a List -> 'a List
val len = fn : 'a List -> int
end
val len = fn : 'a T.List -> int
end
- signature SIGS =
sig
structure T : SIGT
val len : 'a T.List -> int
end;
Notice the structure specification in SIGS, which asserts that the substructure
T is to match signature SIGT. Note also that the specification of len in SIGS
mentions T.List, which is local to SIGS by virtue of the fact that T is a
substructure of S.
Exercise 3.2.6 Define a structure Exp that implements a datatype of ex-
pressions with associated operations. It should satisfy the signature
- signature EXP =
sig
datatype id = Id of string
datatype exp = Var of id
| App of id * (exp list)
end
3.3 Abstractions
We noted above that the process of signature matching "cuts down" struc-
tures so that they have only the components present in the signature. The
ascription of a signature to a structure provides a "view" of that structure,
so that signature matching provides a limited form of information hiding by
restricting access to only those components that appear in the signature.
One reason to make such restrictions is that it can be helpful in program
maintenance to precisely define the interface of each program module. Simi-
lar concerns are addressed by abstract types in the core language: one reason
to use an abstract type is to ensure that all uses of that type are indepen-
dent of the details of the implementation. Signature matching can provide
some of the facilities of abstract types since with it one can "throw away"
the constructors of a data type, thereby hiding the representation. But this
turns out to be a special case of a more general information hiding construct
in ML, called an abstraction.
The fundamental idea is that we would like, in certain circumstances, to
limit the view of a structure to being exactly what is specified in the signature.
The following example illustrates the point:
- signature SIG =
sig
type t
val x : t -> t
end;
- structure S : SIG =
struct
type t = int
val x = fn x => x
end;
> structure S =
struct
type t = int
val x = fn : t -> t
end
- S.x(3);
> 3 : int
- S.x(3) : S.t;
> 3 : int : S.t
Note that S.t is int, even though SIG makes no mention of this fact.
The purpose of an abstraction is to suppress all information about the
structure other than what explicitly appears in the signature.
- abstraction S : SIG =
struct
type t = int
val x = fn x => x
end;
> abstraction S : SIG
- S.x(3);
> 3 : int
- S.x(3) : S.t;
Type error in: S.x(3) : S.t
Looking for a: int
I have found a: S.t
This declaration defines a type 'a set with operations empty_set and union.
The constructor set for sets is hidden in order to ensure that the type is
abstract (i.e., that no client can depend on the representation details).
In general, an abstype declaration defines a type and a collection of
operations on it, while hiding the implementation type. Abstractions provide
another way of accomplishing the same thing, as the following example shows.
- signature SET =
sig
type 'a set
val empty_set : 'a set
val union : 'a set * 'a set -> 'a set
end;
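In the 1997 revision of Standard ML the role of abstraction is played by opaque signature ascription, written structure S :> SIG. The sketch below uses it with a list representation. To give clients a way to observe a set, it adds two operations that are not in SET above, singleton and member; these, and the name ListSet, are hypothetical.

```sml
signature SET =
sig
  type 'a set
  val empty_set : 'a set
  val union : 'a set * 'a set -> 'a set
  val singleton : 'a -> 'a set              (* hypothetical addition *)
  val member : ''a * ''a set -> bool        (* hypothetical addition *)
end

(* :> hides the fact that 'a set is 'a list, so a client cannot,
   for example, compare a set with [1, 2] or take its length. *)
structure ListSet :> SET =
struct
  type 'a set = 'a list
  val empty_set = []
  fun union (s1, s2) = s1 @ s2
  fun singleton x = [x]
  fun member (x, s) = List.exists (fn y => y = x) s
end

val s = ListSet.union (ListSet.singleton 1, ListSet.singleton 2)
```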
Abstractions are more flexible than abstract types in one sense, and a
bit less flexible in another. The flexibility comes from the fact that the
abstraction needn't fit the "data type with operations" mold imposed by
abstract types. For example, no type need be declared at all, or if so, it
needn't be a data type. Abstract types are marginally more flexible in that
they are ordinary declaration forms, and may therefore appear anywhere
that a declaration may appear, whereas abstractions are subject to the same
limitations as structure bindings: they may only appear at top level or within
an encapsulated declaration. This limitation does not appear to be unduly
restrictive as it is customary to define all types at top level anyway.2
3.4 Functors
ML programs are hierarchical arrangements of interrelated structures. Func-
tors, which are functions on structures, are used to manage the dynamics of
program development in ML. Functors play the role of a linking loader in
many programming languages: they are the means by which a program is
assembled from its component parts.
Functors are defined using functor bindings, which may only occur at
top level. The syntax of a functor binding is similar to the clausal form of
function definition in the core language. Here is an example:
- signature SIG =
sig
type t
val eq : t * t -> bool
end;
- functor F( P: SIG ) : SIG =
struct
type t = P.t * P.t
fun eq((x,y),(u,v)) = P.eq(x,u) andalso P.eq(y,v)
end;
> functor F( P: SIG ): SIG
2 It is advisable to avoid abstype's in ML because they are being phased out in favor
of abstractions.
The signature SIG specifies a type t with a binary relation eq. The functor F
defines a function that, given any structure matching signature SIG, returns
another structure, which is required to match SIG as well. (Of course, the
result signature may, in general, differ from the parameter signature.)
Functors are applied to structures to yield structures.
- structure S : SIG =
struct
type t = int
val eq : t*t->bool = op =
end;
> structure S =
struct
type t = int
val eq = fn : t*t->bool
end
- structure SS : SIG = F(S);
> structure SS =
struct
type t = int * int
val eq = fn : t * t -> bool
end
Here we have created a structure S that matches signature SIG. The functor F,
when applied to structure S, builds another structure of the same signature,
but with t being the type of pairs of integers, and the equality function
defined on these pairs. Notice how SS is built as a function of S by F.
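Because the result of F again matches SIG, F can be applied to its own output. A minimal sketch in Standard ML '97 syntax, repeating the declarations above so that it stands alone (the structure name SSSS is hypothetical):

```sml
signature SIG =
sig
  type t
  val eq : t * t -> bool
end

(* Pairs of t's, compared componentwise. *)
functor F (P : SIG) : SIG =
struct
  type t = P.t * P.t
  fun eq ((x, y), (u, v)) = P.eq (x, u) andalso P.eq (y, v)
end

structure S : SIG =
struct
  type t = int
  val eq : t * t -> bool = op =
end

structure SS : SIG = F (S)       (* t = int * int *)
structure SSSS : SIG = F (SS)    (* hypothetical name; t = (int * int) * (int * int) *)
```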
Functors enjoy a degree of polymorphism that stems from the fact that
signature matching is defined to allow the structure to have more information
than is required by the signature (which is then thrown away, as discussed
above). For example,
- structure T : SIG =
struct
type t = string * int
val eq : t * t -> bool = op =
fun f(x:t)=(x,x)
end;
> structure T =
struct
type t = string * int
val eq = fn : t * t -> bool
end
- structure TT : SIG = F(T);
> structure TT =
struct
type t = (string*int)*(string*int)
val eq = fn : t * t -> bool
end
It is worth noting that the functor I is not the identity function, for if S is
a structure matching SIG but with more components than are mentioned in
SIG, then the result of the application I(S) will be the cut-down view of S,
and not S itself. For example,
- structure S =
struct
type t = int
val eq : int * int -> bool = op =
fun f(x) = x
end;
> structure S =
struct
type t = int
val eq = fn : int * int -> bool
val f = fn : 'a -> 'a
end
- structure S' = I( S );
> structure S' =
struct
type t = int
val eq = fn : t * t -> bool
end
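The definition of I does not appear in this excerpt. Assuming it is the functor whose body simply returns its argument, the behavior described above can be reproduced as follows (Standard ML '97 syntax; the annotation on eq gives it type int * int -> bool rather than an equality-polymorphic type):

```sml
signature SIG =
sig
  type t
  val eq : t * t -> bool
end

(* Assumed definition of I: the body is just the argument. *)
functor I (S : SIG) : SIG = S

structure S =
struct
  type t = int
  val eq : t * t -> bool = op =
  fun f x = x
end

(* S' keeps only what SIG mentions: S'.eq exists, but S'.f does not,
   so a declaration such as  val g = S'.f  would be rejected. *)
structure S' = I (S)
```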
is divided into four units, one for the parser, one for the abstract syntax tree
management routines, one for the symbol table, and one to manage symbols.
Here are the signatures of these four units:
- signature SYMBOL =
sig
type symbol
val mksymbol: string -> symbol
val eqsymbol: symbol * symbol -> bool
end;
- signature ABSTSYNTAX =
sig
structure Symbol : SYMBOL
type term
val idname: term -> Symbol.symbol
end;
- signature SYMBOLTABLE =
sig
structure Symbol : SYMBOL
type entry
type table
val mktable : unit -> table
val lookup : Symbol.symbol * table -> entry
end;
- signature PARSER =
sig
structure AbstSyntax : ABSTSYNTAX
structure SymbolTable : SYMBOLTABLE
val symtable : SymbolTable.table
val parse: string -> AbstSyntax.term
end;
Of course, these signatures are abbreviated and idealized, but it is hoped that
they are sufficiently plausible to be convincing and informative. Please note
the hierarchical arrangement of these structures. Since the parser module
uses both the abstract syntax module and the symbol table module in an
essential way, it must include them as substructures. Similarly, both the
abstract syntax module and the symbol table module include the symbol
module as a substructure.
Now let's consider how we might build a parser in this configuration. Forgetting
about the algorithms and representations, we might think of writing
down a collection of structures such as the following:
- structure Symbol : SYMBOL =
struct
datatype symbol = symbol of string * ...
fun mksymbol(s) = symbol( s, ... )
fun eqsymbol( sym1, sym2 ) = ...
end;
- structure AbstSyntax : ABSTSYNTAX =
struct
structure Symbol : SYMBOL = Symbol
datatype term = ...
fun idname( term ) = ...
end;
- structure SymbolTable : SYMBOLTABLE =
struct
structure Symbol : SYMBOL = Symbol
type entry = ...
type table = ...
fun mktable() = ...
fun lookup(sym,table) = ...
end;
- structure Parser : PARSER =
struct
structure AbstSyntax : ABSTSYNTAX = AbstSyntax
structure SymbolTable : SYMBOLTABLE = SymbolTable
val symtable = SymbolTable.mktable();
fun parse(str) =
... SymbolTable.lookup(AbstSyntax.idname(t), symtable) ...
end;
virtue of the fact that AbstSyntax and SymbolTable include the same structure
Symbol. Were there to be two structures matching signature SYMBOL,
one bound into SymbolTable and the other bound into AbstSyntax, then this
line of code would not type check. Keep this fact in mind in what follows.
Now this organization of our compiler seems to be OK, at least so far
as the static structure of the system is concerned. But if you imagine that
there are umpteen other structures around, each with a few thousand lines of
code, then one can easily imagine that this approach would become somewhat
unwieldy. Suppose that there is a bug in the symbol table code, which we
fix, and now we would like to rebuild the system with the new symbol table
module installed. This requires us to recompile the above set of structure
expressions (along with all the others that are affected as a consequence) in
order to rebuild the system. Clearly some form of separate compilation and
linking facility is needed. What we are aiming at is to be able to recompile
any one module in isolation from the others, and then relink the compiled
forms into the desired static configuration. Of course, this idea is not new;
the point is to see how it's done in ML.
The key is never to write down a structure explicitly, but rather to organize
the system as a set of functors, each taking the structures it depends on as
arguments (and taking no arguments if it depends on nothing). Then to link the
system, one merely applies the functors so as to construct the appropriate static
configuration. For our example, the functors will look like this:
- functor SymbolFun(): SYMBOL =
struct
datatype symbol = symbol of string * ...
fun mksymbol(s) = symbol( s, ... )
fun eqsymbol( sym1, sym2 ) = ...
end;
- functor AbstSyntaxFun( Symbol: SYMBOL ): ABSTSYNTAX =
struct
structure Symbol : SYMBOL = Symbol
datatype term = ...
fun idname( term ) = ...
end;
- functor SymbolTableFun( Symbol: SYMBOL ): SYMBOLTABLE =
struct
of the fact that SymbolTable and AbstSyntax have the same substructure
Symbol, and hence the same type of symbols. Now in ParserFun, the function
parse knows only the signatures of these two structures, and not that they
are implemented in a compatible way. Therefore the compiler is forced to
reject ParserFun, and our policy of using functors to support modularity
appears to be in trouble.
There is a way around this, called the sharing specification. The idea is
to attach a set of equations to the signature PARSER_PIECES that guarantees
that only a compatible pair of symbol table and abstract syntax structures
can be passed to ParserFun. Here is a revised definition of PARSER_PIECES
that expresses the requisite sharing information:
- signature PARSER_PIECES =
sig
structure SymbolTable : SYMBOLTABLE
structure AbstSyntax : ABSTSYNTAX
sharing SymbolTable.Symbol = AbstSyntax.Symbol
end;
The sharing clause ensures that only compatible pairs of symbol table and
abstract syntax modules may be packaged together as PARSER_PIECES (where
"compatible" means "having the same Symbol module"). Using this revised
signature, the declaration of ParserFun is now legal, and can be used to
construct the configuration of structures that we described above.
There are, in general, two forms of sharing specification, one for types and
one for structures. In the above example we used a structure sharing
specification to insist that two components of the parameters be equal structures.
Two structures are equal if and only if they result from the same evaluation
of the same struct expression or functor application. For example, the
following attempt to construct an argument for ParserFun fails because the
sharing specification is not satisfied:
- structure Pieces : PARSER_PIECES =
struct
structure SymbolTable = SymbolTableFun( SymbolFun() )
structure AbstSyntax = AbstSyntaxFun( SymbolFun() )
end;
Type equality is similar to structure equality in that two data types are equal
if and only if they result from the same evaluation of the same declaration.
So, for example, if we have two syntactically identical data type declarations,
the types they define are distinct.
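A small illustration of this generativity, assuming Standard ML '97 syntax and a hypothetical functor MkColor: each application of the functor evaluates its datatype declaration again and so yields a new type, whereas binding a new name to an existing structure does not.

```sml
(* Hypothetical functor whose body declares a datatype. *)
functor MkColor () =
struct
  datatype color = Red | Green
end

structure A = MkColor ()
structure B = MkColor ()   (* a second evaluation: B.color is a new type *)
structure C = A            (* no new evaluation: C.color is A.color *)

val same = (C.Red = A.Red)   (* accepted: one and the same type *)

(* Rejected if uncommented: A.color and B.color are distinct types.
   val bad = (A.Red = B.Red) *)
```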
Returning to our motivating example, suppose that we wish to fix a bug
in the symbol manipulation routines. How, then, is our program to be
reconstructed to reflect the change? First, we fix the bug in SymbolFun, and
re-evaluate the functor binding for SymbolFun. Then we repeat the above
sequence of functor applications in order to rebuild the system with the new
symbol routines. The other functors needn't be recompiled, only reapplied.
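The whole configuration can be assembled as a runnable sketch in Standard ML '97 syntax. Everything hidden behind "..." in the text is replaced here by hypothetical stand-ins: symbols are wrapped strings, a term is just an identifier, `mkid` is an invented term constructor, the symbol table interns symbols and returns each one's insertion index (`type entry = int`), and the PARSER result signature is omitted for brevity.

```sml
(* Linking by functor application (SML '97). Representations and mkid
   are hypothetical stand-ins for the elided parts of the text. *)
signature SYMBOL =
sig
  type symbol
  val mksymbol : string -> symbol
  val eqsymbol : symbol * symbol -> bool
end

signature ABSTSYNTAX =
sig
  structure Symbol : SYMBOL
  type term
  val mkid : string -> term                (* hypothetical constructor *)
  val idname : term -> Symbol.symbol
end

signature SYMBOLTABLE =
sig
  structure Symbol : SYMBOL
  type entry = int                         (* hypothetical: insertion index *)
  type table
  val mktable : unit -> table
  val lookup : Symbol.symbol * table -> entry
end

signature PARSER_PIECES =
sig
  structure SymbolTable : SYMBOLTABLE
  structure AbstSyntax : ABSTSYNTAX
  sharing SymbolTable.Symbol = AbstSyntax.Symbol
end

functor SymbolFun () : SYMBOL =
struct
  datatype symbol = symbol of string
  fun mksymbol s = symbol s
  fun eqsymbol (symbol a, symbol b) = a = b
end

functor AbstSyntaxFun (Symbol : SYMBOL) : ABSTSYNTAX =
struct
  structure Symbol = Symbol
  datatype term = Id of Symbol.symbol
  fun mkid s = Id (Symbol.mksymbol s)
  fun idname (Id sym) = sym
end

functor SymbolTableFun (Symbol : SYMBOL) : SYMBOLTABLE =
struct
  structure Symbol = Symbol
  type entry = int
  (* Interned symbols, newest first. *)
  type table = Symbol.symbol list ref
  fun mktable () = ref []
  fun lookup (sym, tbl) =
    let
      (* The symbols after s in the list are older, so their count
         is s's 0-based insertion index. *)
      fun find [] = NONE
        | find (s :: older) =
            if Symbol.eqsymbol (s, sym) then SOME (length older)
            else find older
    in
      case find (!tbl) of
        SOME i => i
      | NONE => (tbl := sym :: !tbl; length (!tbl) - 1)
    end
end

(* The body type checks only because PARSER_PIECES requires both
   pieces to share one Symbol structure. *)
functor ParserFun (Pieces : PARSER_PIECES) =
struct
  structure AbstSyntax = Pieces.AbstSyntax
  structure SymbolTable = Pieces.SymbolTable
  val symtable = SymbolTable.mktable ()
  fun parse str =
    let val t = AbstSyntax.mkid str
    in (t, SymbolTable.lookup (AbstSyntax.idname t, symtable)) end
end

(* Link: one Symbol structure, passed to both of its dependents. *)
structure Symbol = SymbolFun ()
structure Parser =
  ParserFun (struct
               structure SymbolTable = SymbolTableFun (Symbol)
               structure AbstSyntax = AbstSyntaxFun (Symbol)
             end)
```

Linking happens in the last two declarations. Replacing the two uses of `Symbol` there by separate applications `SymbolFun ()`, as in the Pieces example above, makes the argument fail the sharing specification, because each application creates a new symbol type. After a fix to SymbolFun, rebuilding means re-evaluating these applications; the other functor bindings stay as they are.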
Chapter 4
Input-Output
ML provides a small collection of input/output primitives for performing
simple character I/O to files and terminals. The fundamental notion in the ML
I/O system is the character stream, a finite or infinite sequence of characters.
There are two types of stream, instream for input streams, and outstream
for output streams. An input stream receives its characters from a producer,
typically a terminal or disk file, and an output stream sends its characters
to a consumer, also often a terminal or disk file. A stream is initialized by
connecting it to a producer or consumer. Input streams may or may not
have a definite end, but in the case that they do, ML provides primitives for
detecting this condition.
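For comparison, in the Basis Library of Standard ML '97 the corresponding primitives live in the structure TextIO rather than in BasicIO. The sketch below assumes that library: `TextIO.endOfStream` plays the role of the end-of-stream test described above, and the functions `countChars` and `roundTrip` are illustrative names, not part of any library.

```sml
(* Count the characters on an input stream, stopping once
   TextIO.endOfStream reports that the producer is exhausted. *)
fun countChars (ins : TextIO.instream) : int =
  let
    fun loop n =
      if TextIO.endOfStream ins then n
      else (ignore (TextIO.input1 ins); loop (n + 1))
  in
    loop 0
  end

(* Write text to a fresh temporary file through an outstream, then
   read it back through an instream and count its characters. *)
fun roundTrip (text : string) : int =
  let
    val name = OS.FileSys.tmpName ()
    val outs = TextIO.openOut name
    val () = TextIO.output (outs, text)
    val () = TextIO.closeOut outs
    val ins = TextIO.openIn name
    val n = countChars ins
  in
    TextIO.closeIn ins;
    OS.FileSys.remove name;
    n
  end
```

For example, `roundTrip "hello\n"` evaluates to 6.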
The fundamental I/O primitives are packaged into a structure BasicIO
with signature BASICIO, defined as follows:
- signature BASICIO = sig
(* Types and exceptions *)
type instream
type outstream
exception io_failure: string
(* Stream creation *)
Appendix A
Answers
Answer 2.3.1:
1. Unbound value identifier: x
2. > val x = 1: int
> val y = 3: int
> val z = 2: int
3. > 3: int
Answer 2.4.1:
The computer would match hd::tl::nil against
"Eat"::"the"::"walnut"::nil. The lists are of different length
so the pattern matching would fail.
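A quick check of this answer, assuming Standard ML '97 (the names twoElements, threeWords, twoWords, and bindFails are illustrative):

```sml
(* hd::tl::nil matches exactly the two-element lists. *)
fun twoElements (hd :: tl :: nil) = true
  | twoElements _ = false

val threeWords = twoElements ["Eat", "the", "walnut"]   (* false *)
val twoWords = twoElements ["Eat", "the"]               (* true *)

(* In a val binding, the failed match raises the exception Bind. *)
val bindFails =
  (let val hd :: tl :: nil = ["Eat", "the", "walnut"] in false end)
  handle Bind => true
```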
Answer 2.4.2:
1. { b=x, ... }
2. _::_::x::_ or [_, _, x, _, _]
3. [_, (x,_) ]
Answer 2.5.1:
local val pi = 3.141592654
in fun circumference r = 2.0 * pi * r
fun area r = pi * r * r
end
Answer 2.5.2:
fun abs x = if x < 0.0 then ~x else x
Answer 2.5.3:
To evaluate fact(n), the system must evaluate newif(n=0,1,fact(n-1)).
The arguments to this function must be evaluated before the call
to the function. This involves evaluating fact(n-1), even when
n<= 0. The function will therefore loop.
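One way to make such a conditional usable, sketched here as a hypothetical variant of newif, is to pass each branch as a function of type unit -> 'a, so that only the selected branch is ever evaluated:

```sml
(* Hypothetical variant of newif: branches are delayed as thunks. *)
fun newif (b, thenPart : unit -> 'a, elsePart : unit -> 'a) =
  if b then thenPart () else elsePart ()

(* fact(n-1) is now evaluated only when n <> 0, so fact terminates
   for non-negative n. *)
fun fact n = newif (n = 0, fn () => 1, fn () => n * fact (n - 1))
```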
Answer 2.5.5:
This is an inefficient definition of a function to reverse the order
of the elements in a list.
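For comparison, the standard way to reverse a list in linear time uses an accumulating parameter. This sketch is not the definition from the exercise; reverse is an illustrative name, and the Basis function rev does the same job:

```sml
(* Linear-time reversal: each element is moved once onto acc. *)
fun reverse l =
  let
    fun loop ([], acc) = acc
      | loop (h :: t, acc) = loop (t, h :: acc)
  in
    loop (l, [])
  end
```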
Answer 2.5.6:
fun isperfect n =
let fun addfactors(1) = 1
| addfactors(m) =
if n mod m = 0
then m + addfactors(m-1) else addfactors(m-1)
in (n > 1) andalso (addfactors(n div 2) = n) end;
Answer 2.5.7:
fun cons h t = h::t
Answer 2.5.8:
fun cc(0,_) = 1
| cc(_,[]) = 0
| cc(amount, kinds as (h::t)) =
if amount < 0 then 0
else cc(amount-h,kinds) + cc(amount, t);
Answer 2.5.9:
fun nth(0,l) = l | nth(n,h::t) = nth(n-1,t);
fun count(amount,table) =
let fun count_using([],l) = l
| count_using(h::t,h1::t1) =
let val t1' as ((c::_)::_) =
count_using(t,t1)
val diff = amount - h
val cnt = c + if diff < 0 then 0
else if diff = 0 then 1
else hd(nth(h-1,h1))
in (cnt::h1)::t1'
end
in if amount > sum then hd(hd table)
else count(amount+1,count_using(coins,table))
end
in count(0, initial_table coins) end;
Answer 2.5.10:
local
fun move_disk(from, to) = (from, to);
in
fun tower_of_hanoi(n) = transfer("A","B","C",n)
end;
An alternative solution that explicitly models the disks and checks for illegal
moves could be written as follows.
local
fun incl(m,n) = if m>n then [] else m::incl(m+1,n)
Answer 2.7.1:
fun samefrontier(empty,empty) = true
| samefrontier(leaf x, leaf y) = x = y
| samefrontier(node(empty,t1), node(empty,t2)) =
samefrontier(t1,t2)
| samefrontier(node(leaf x,t1), node(leaf y,t2)) =
x = y andalso samefrontier(t1,t2)
| samefrontier(t1 as node _, t2 as node _) =
samefrontier(adjust t1, adjust t2)
| samefrontier(_,_) = false
Answer 2.7.2:
abstype 'a set = set of 'a list
with val emptyset = set []
fun singleton e = set [e]
fun union(set l1, set l2) = set(l1@l2)
fun member(e, set []) = false
| member(e, set (h::t)) =
(e = h) orelse member(e, set t)
fun intersection(set [], s2) = set []
| intersection(set(h::t), s2) =
let val tset as (set tl) = intersection(set t, s2)
in if member(h,s2) then set(h::tl) else tset end
end;
Answer 2.7.3:
abstype 'a set = set of ( 'a list *
Answer 2.8.1:
1. The exception bound to the outer exn is distinct from that
bound to the inner exn; thus the exception raised by f(200),
exception conflict;
fun addqueen(i,n,place) =
let fun tryqueen(j) =
( if conflict((i,j), place) then raise conflict
else if i=n then (i,j)::place
else addqueen(i+1,n,(i,j)::place) )
handle conflict =>
if j = n then raise conflict else tryqueen(j+1)
in tryqueen(1) end;
Answer 2.8.3:
exception conflict: ((int * int) list) list;
fun addqueen(i,n,place,places) =
let fun tryqueen(j, places) =
( if conflict((i,j), place)
then raise conflict with places
else if i=n
then raise conflict with ((i,j)::place)::places
else addqueen(i+1,n,(i,j)::place,places) )
handle conflict with newplaces =>
if j = n then raise conflict with newplaces
else tryqueen(j+1, newplaces)
in tryqueen(1,places) end;
fun allqueens(n) =
addqueen(1,n,[],[]) handle conflict with places => places;
Answer 2.9.1:
val primes =
let fun nextprime(n,l) =
let fun check(n,[]) = n
| check(n,h::t) =
if (n mod h) = 0 then check(n+1,l)
else check(n,t)
in check(n,l) end
fun primstream (n,l) =
mkstream(fn () => let val n' = nextprime(n,l)
in (n', primstream(n'+1,n'::l)) end)
in primstream(2,[]) end;
Answer 2.9.2:
abstype 'a stream = stream of (unit -> ('a * 'a stream)) ref
with fun next(stream f) =
let val res = (!f)() in (f := fn () => res; res) end
fun mkstream f = stream(ref f)
end;
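The effect of the memoization can be observed by counting how often the underlying thunk actually runs. The counter `calls` and the stream builder `from` are hypothetical additions for this illustration; the abstype is repeated, in Standard ML '97 syntax, so that the block stands alone.

```sml
(* The stream type of this answer, in SML '97 syntax. *)
abstype 'a stream = stream of (unit -> ('a * 'a stream)) ref
with
  fun next (stream f) =
    let val res = (!f) () in (f := (fn () => res); res) end
  fun mkstream f = stream (ref f)
end

(* Hypothetical counter: how many times a thunk has really run. *)
val calls = ref 0

(* Hypothetical stream of the integers n, n+1, ...; each thunk bumps calls. *)
fun from n =
  mkstream (fn () => (calls := !calls + 1; (n, from (n + 1))))

val s = from 0
val (a, _) = next s   (* runs the thunk: !calls = 1 *)
val (b, _) = next s   (* memoized result: !calls is still 1 *)
```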
Answer 2.9.3:
abstype 'a stream = stream of (unit -> ('a * 'a stream)) ref
with local exception endofstream in
fun next(stream f) =
let val res = (!f)()
in (f := fn () => res; res)
end
fun mkstream f =
stream(ref f)
fun emptystream() =
stream(ref(fn () => raise endofstream))
fun endofstream(s) =
(next s; false) handle endofstream => true
end
end;
Answer 3.2.1:
structure INTORD: ORD =
struct
type t = int
val le: int * int -> bool = op <=
end
Answer 3.2.2:
The signature requires the type of n to be an 'a list, i.e. if a
structure T matches SIG, then true::(T.n) should be legitimate.
This cannot be the case if we were allowed to supply a value for
n with a more specific type such as int list. Therefore the
declaration is disallowed.
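The rule can be checked directly, assuming the signature in question is `sig val n : 'a list end` (its exact text is not reproduced in this excerpt); the names bs and ns are illustrative:

```sml
(* Assumed form of the signature from the exercise. *)
signature SIG = sig val n : 'a list end

(* Accepted: [] has the fully polymorphic type 'a list. *)
structure T : SIG = struct val n = [] end

val bs = true :: T.n   (* T.n used at type bool list *)
val ns = 1 :: T.n      (* and at type int list *)

(* Rejected if uncommented: int list is less general than 'a list.
   structure U : SIG = struct val n = [1] end *)
```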
Answer 3.2.3:
sig type 'a t val x: bool * int end
and
sig type 'a t val x: bool t end
Answer 3.2.4:
Only sig type t val f: t -> t end satisfies the signature
closure rule (the others contain free references to the structure
A).
Answer 3.2.5:
signature STACK =
sig
datatype 'a stack = nilstack | push of 'a * 'a stack
exception pop: unit and top: unit
val empty: 'a stack -> bool
and pop: 'a stack -> 'a stack
and top: 'a stack -> 'a
end
signature SUBST =
sig
structure E: EXP
type subst
val subst: (E.id * E.exp) list -> subst
val lookup: E.id * subst -> E.exp
val substitute: subst -> E.exp -> E.exp
end
Answer 3.4.1:
signature ORD =
sig
type elem
val eq: elem * elem -> bool
val le: elem * elem -> bool
end
signature SET =
sig
type set
structure O: ORD
val emptyset: set
val singleton: O.elem -> set
val member: O.elem * set -> bool
val union: set * set -> set
val intersect: set * set -> set
end
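One possible functor implementing these signatures, sketched under the assumption that a set is represented as a list that is strictly increasing with respect to O.le. The names SetFun, IntOrd, and IntSet are hypothetical, and the signatures are repeated so that the block stands alone. The ascription to SET is transparent, so `set` is visibly a list outside the functor; opaque ascription (`:>`) would hide it.

```sml
signature ORD =
sig
  type elem
  val eq : elem * elem -> bool
  val le : elem * elem -> bool
end

signature SET =
sig
  type set
  structure O : ORD
  val emptyset : set
  val singleton : O.elem -> set
  val member : O.elem * set -> bool
  val union : set * set -> set
  val intersect : set * set -> set
end

(* Hypothetical implementation: a set is a strictly increasing list. *)
functor SetFun (O : ORD) : SET =
struct
  structure O = O
  type set = O.elem list
  val emptyset = []
  fun singleton x = [x]
  fun member (_, []) = false
    | member (x, y :: ys) =
        if O.eq (x, y) then true
        else if O.le (x, y) then false   (* x < y: x cannot occur later *)
        else member (x, ys)
  (* Merge two increasing lists, keeping one copy of common elements. *)
  fun union ([], ys) = ys
    | union (xs, []) = xs
    | union (xs as x :: xs', ys as y :: ys') =
        if O.eq (x, y) then x :: union (xs', ys')
        else if O.le (x, y) then x :: union (xs', ys)
        else y :: union (xs, ys')
  fun intersect ([], _) = []
    | intersect (_, []) = []
    | intersect (xs as x :: xs', ys as y :: ys') =
        if O.eq (x, y) then x :: intersect (xs', ys')
        else if O.le (x, y) then intersect (xs', ys)
        else intersect (xs, ys')
end

structure IntOrd : ORD =
struct
  type elem = int
  fun eq (a : int, b) = a = b
  fun le (a : int, b) = a <= b
end

structure IntSet = SetFun (IntOrd)
```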
Answer 4.0.1:
local
fun incl(m,n) = if m>n then [] else m::incl(m+1,n)
Answer 4.0.2:
fun printboard(place,n,s) =
let fun present(pos: (int*int), []) = false
| present(pos, h::t) = (pos=h) orelse present(pos,t)
fun printcolumn(i,j) =
if j > n then ()
else
( output(s,if present((i,j), place)
then " Q " else " . ");
printcolumn(i,j+1) )
fun printrow(i) =
if i > n then ()
else (printcolumn(i,1);
output(s,"\n");
printrow(i+1))