CMSC 22100/32100: Programming Languages An Overview of Standard ML M. Blume October 2, 2008
An Overview of Standard ML
M. Blume October 2, 2008
Contents
1 What is SML
3 Variables
5 Functions
5.1 Function values and function definitions
5.2 Function application
5.3 Curried function definitions
6 Other types
6.1 The unit type
6.2 References: the ref type
6.3 Text: the types char and string
7 Block structure
7.1 Simultaneous bindings and mutual recursion
10 Using Files
10.1 Function use
10.2 A note on polymorphism and type inference
10.3 Generativity of datatype definitions
10.4 CM: the SML/NJ compilation manager
A Reserved words
1 What is SML
Standard ML is a strongly typed, impure, strict functional language.
Strongly typed: Every value and every expression in the language has a type (int, real,
bool etc.). The compiler rejects a program that does not conform to the
type system of the language.
Functional: Each expression evaluates to a value. Some of these values are
functions. In fact, every function in ML is a value. Like other values,
functions can be bound to variables, passed as arguments to other functions,
returned as values from function calls, and stored in data structures.
Impure: Unlike in other functional languages such as Haskell, the evaluation
of expressions in ML can incur side-effects, e.g., assignment to locations
within mutable data structures or I/O.
Strict: Unlike “lazy” languages such as Haskell, arguments to ML functions are
evaluated before the function call is performed. This means that if one of
the arguments loops forever, then so will the entire program, regardless
of whether or not the function actually needed that argument. Similarly,
all side-effects caused by the evaluation of the arguments occur before any
side-effects caused by the evaluation of the function body.
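The strictness property can be observed with a small experiment. The following is only a sketch; the names events, note, and ignoreArg are hypothetical and chosen for illustration. It records side effects in a list so that the order of evaluation becomes visible.

```sml
(* A log of evaluation events, most recent first. *)
val events : string list ref = ref []
fun note s = events := s :: !events

(* This function ignores its argument entirely. *)
fun ignoreArg (_ : int) = (note "body"; 0)

(* The argument expression is evaluated anyway, and its side effect
   takes place before the function body runs. *)
val result = ignoreArg (note "argument"; 42)

(* Events in chronological order. *)
val order = rev (!events)
```

Here order lists the argument's effect first, even though ignoreArg never uses its argument.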
In this class, we will use the SML/NJ compiler. SML/NJ generates reasonably
fast executable code, although some other compilers, e.g., MLton, outperform
it on a regular basis. However, SML/NJ can be used interactively and
comes with a mature programming environment. (Some people use SML/NJ
for development and use MLton to speed up the final version. However, in this
class, runtime performance will not be an issue.)
1 This is a misnomer; they are really finite precision numbers.
$ sml
Standard ML of New Jersey v110.68 [built: Mon Sep 8 13:47:59 2008]
- (* Integers: *)
- 1;
val it = 1 : int
- 2;
val it = 2 : int
- 1+2;
val it = 3 : int
-
- (* Reals: *)
- 1.0;
val it = 1.0 : real
- 2.0;
val it = 2.0 : real
- Math.sqrt(3.0*3.0+4.0*4.0);
[autoloading]
[library $SMLNJ-BASIS/basis.cm is stable]
[autoloading done]
val it = 5.0 : real
- (* Booleans: *)
- false;
val it = false : bool
- true;
val it = true : bool
- if true then 1 else 2;
val it = 1 : int
Some notes:
• Comments in Standard ML are enclosed within comment brackets: (* ... *).
Comments are nestable.
• Many library functions such as Math.sqrt are automatically loaded by
the SML/NJ interactive system on an on-demand basis. This is what
accounts for the [autoloading ... done] sequence of messages.
3 Variables
Variables in Standard ML are identifiers that name values. Once a binding for
a variable is established, the variable continues to name the same value until it
goes out of scope. In other words, Standard ML variables are immutable.
Imperative programming with assignment, while possible using reference values
(see below), is not encouraged and cannot be done using variables alone.
Assignment to variables can usually be avoided by writing iterative algorithms
in recursive style. In this style, instead of changing the value of a variable,
one simply establishes a new binding. The new binding can use the same
identifier, in which case the old binding will be shadowed and go out of scope.
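A minimal sketch of shadowing, with the hypothetical names getX and old, may help to see that a new binding does not modify the old one:

```sml
val x = 1
fun getX () = x      (* refers to the binding of x established above *)
val x = x + 1        (* a new binding; the old one is shadowed, not changed *)
val old = getX ()    (* getX still sees the original binding of x *)
```

After these declarations x names 2, while getX continues to return the value of the earlier binding.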
As an example, consider the following iterative version of the factorial function,
written in tail-recursive style:
- fun fac_loop (n, f) = if n = 0 then f else fac_loop (n-1, f*n);
val fac_loop = fn : int * int -> int
- fun fac n = fac_loop (n, 1);
val fac = fn : int -> int
- fac 10;
val it = 3628800 : int
- NONE;
val it = NONE : 'a option
- SOME 1;
val it = SOME 1 : int option
- SOME 1.0;
val it = SOME 1.0 : real option
- SOME false;
val it = SOME false : bool option
parentheses.
empty list is called nil. The identifier nil is special in that it is interchangeable
with the notation []. A non-empty list consists of a head element and a tail
list (of the same type). If h is the head and t is the tail, then the entire list is
h :: t. The infix constructor :: is sometimes pronounced “cons.” It associates
to the right.
A k-element list x1 :: ... :: xk :: nil can alternatively be written
[x1, ..., xk].
- nil;
val it = [] : 'a list
- 1 :: nil;
val it = [1] : int list
- true :: false :: nil;
val it = [true,false] : bool list
- [1.0,2.0,4.0];
val it = [1.0,2.0,4.0] : real list
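As a small illustration of the notations just described (the names a, b, c, and allEqual are made up for this sketch), the following three bindings denote the same list; the second makes the right associativity of :: explicit.

```sml
val a = 1 :: 2 :: 3 :: nil          (* :: associates to the right *)
val b = 1 :: (2 :: (3 :: []))       (* the same, fully parenthesized *)
val c = [1, 2, 3]                   (* bracket notation *)
val allEqual = (a = b) andalso (b = c)
```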
However, to be precise, one must also declare :: as an infix operator and establish
the alternative syntax using square brackets. The former would be possible
using a Standard ML infix declaration (actually, since we need right associativity,
it would be an infixr declaration), but the latter is built-in syntax, which
is why the list type itself must be built-in and cannot be defined from first
principles.
4.3 Tuples
The built-in type constructor *, which is written in infix notation and may
be repeated as in int * real * bool to form arbitrary n-ary tuple type constructors,
produces types that correspond to Cartesian products of other types.
The values inhabiting these types are called tuples and are written as
comma-separated sequences enclosed in round parentheses.
- (1, 2);
val it = (1,2) : int * int
- (1.0, 2.0);
val it = (1.0,2.0) : real * real
- (1, 2.0, true, NONE);
val it = (1,2.0,true,NONE) : int * real * bool * 'a option
- #1 (1, 2.0);
val it = 1 : int
- #2 (1, 2.0);
val it = 2.0 : real
4.4 Records
Records generalize tuples by making it possible to give names (labels) to fields
instead of relying on positional information. (In fact, Standard ML defines
tuples to be special cases of records where the labels are numeric and form an
uninterrupted sequence from 1 to some natural number n.) Projection from
records uses the notation #l where l is the label name.
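The following sketch illustrates record construction and projection; the record person and its labels name and age are hypothetical. The last two bindings show that a record with labels 1 and 2 is the same thing as a pair.

```sml
val person = {name = "Ada", age = 36}
val n = #name person                          (* projection by label *)
val a = #age person

(* A record whose labels are 1 and 2 is a 2-tuple. *)
val pair : int * bool = {1 = 7, 2 = true}
val first = #1 pair
```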
Record types have the form
{l1 : t1, ..., ln : tn}
where the li are the labels and the ti are the corresponding types of individual
record fields. Record expressions have the form:
{l1 = e1, ..., ln = en}
2. When a constructor ci is within a pattern (see below), it serves the
purpose of determining whether or not a given value was formed by
injecting some vi into the datatype using the constructor ci. If so,
the pattern match also recovers vi.
• Datatype definitions can define recursive types. That is, the name of the
defined type may be used on the right-hand side of the definition.
• Some (or all) variants of a datatype can be constructors that do not carry
a type. These constructors become constants, i.e., brand-new values that
inhabit the newly defined type. If all constructors are constants, then a
datatype definition effectively becomes the definition of an enumeration
type.
• Datatypes are type constructors, i.e., they can have type arguments. In the
definition, the formal parameter list consisting of type variables precedes
the type constructor name. Type variables are identifiers that start with
the ' character (apostrophe).
Examples:
$ sml
Standard ML of New Jersey v110.68 [built: Mon Sep 8 13:47:59 2008]
- (* enumerations *)
- datatype color = Red | Green | Blue;
datatype color = Blue | Green | Red
-
- (* integer trees with values on leaves *)
- datatype itree = Leaf of int | Node of itree * itree;
datatype itree = Leaf of int | Node of itree * itree
-
- (* real trees with values on internal nodes *)
- datatype rtree = RLeaf | RNode of real * rtree * rtree;
datatype rtree = RLeaf | RNode of real * rtree * rtree
-
- (* trees with integer values on leaves and real values on internal
= nodes *)
- datatype irtree = IRLeaf of int | IRNode of real * irtree * irtree;
datatype irtree = IRLeaf of int | IRNode of real * irtree * irtree
-
- (* integer lists *)
- datatype ilist = INil | ICons of int * ilist;
datatype ilist = ICons of int * ilist | INil
-
- (* our own list type constructor; 'a is the formal argument *)
- datatype 'a mylist = MyNil | MyCons of 'a * 'a mylist;
datatype 'a mylist = MyCons of 'a * 'a mylist | MyNil
-
- (* our own option type constructor *)
- datatype 'a myoption = MyNONE | MySOME of 'a;
datatype 'a myoption = MyNONE | MySOME of 'a
The option constructor that we have seen is just a (built-in) datatype. The
built-in list type constructor is essentially isomorphic to mylist.3 Similarly,
the bool type constructor is also a built-in datatype consisting precisely of
the two constants false and true. The type is built-in because Standard ML
provides syntactic conveniences in the form of if-expressions.4
We use case to dispatch on the constructor of a datatype value:
- val x = Red;
val x = Red : color
- case x of
= Red => "rouge"
= | Green => "vert"
= | Blue => "bleu";
val it = "rouge" : string
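Constructors that carry values work the same way: they build values when used in expressions and take them apart when used in patterns. As a sketch, here is a function over the itree type from the examples above; sumLeaves, t, and total are hypothetical names, and clause-form fun is used as a shorthand for a case on the argument.

```sml
datatype itree = Leaf of int | Node of itree * itree

(* Add up the integers stored at the leaves. *)
fun sumLeaves (Leaf n) = n
  | sumLeaves (Node (l, r)) = sumLeaves l + sumLeaves r

val t = Node (Leaf 1, Node (Leaf 2, Leaf 3))
val total = sumLeaves t
```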
The same trick also works well in conjunction with the use of record types. In
Standard ML, projecting from a record—or, more generally, the use of flexible
record patterns—requires that the compiler knows the “shape” of the record
type, i.e., the set of all its labels, including those not being selected. The
constructor of a single-variant datatype can serve as a light-weight annotation
that provides precisely this information:
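The original example is not reproduced here, so the following is only a plausible sketch with a hypothetical point type. Because the constructor Point fixes the full record type, both the projection #x and the flexible record pattern {y, ...} type-check without further annotations.

```sml
datatype point = Point of {x : int, y : int}

(* The constructor tells the compiler the complete set of labels. *)
fun getX (Point r) = #x r
fun getY (Point {y, ...}) = y

val p = Point {x = 3, y = 4}
val px = getX p
val py = getY p
```

Without the constructor (or an explicit type annotation), a definition such as fun getX r = #x r would be rejected because the record type of r cannot be determined.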
list.
4 There is more, in particular: andalso and orelse, which are short-circuiting forms logical
5 Functions
5.1 Function values and function definitions
Functions in Standard ML take a single argument and produce a single result.
Functions that we think of as taking multiple arguments can be defined in a
number of ways, namely
A function that maps arguments of type t1 to results of type t2 itself has type
t1 -> t2. Functions do not need to have names; they can be written anonymously
using the fn/=> syntax:
- fn x => x*2.0;
val it = fn : real -> real
Functions (like all other values) can be bound to names using the val keyword:
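A minimal sketch of such bindings follows; the names double, addPair, and addCurried are chosen for illustration only. The last two show two common ways of expressing a two-argument function with single-argument functions: taking a tuple, or returning another function.

```sml
val double = fn x => x * 2.0
val four = double 2.0

(* "Two arguments" as a single tuple argument. *)
val addPair = fn (a, b) => a + b

(* "Two arguments" as a function that returns a function. *)
val addCurried = fn a => fn b => a + b
```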
- fun fac n = if n = 0 then 1 else n * fac (n-1);
val fac = fn : int -> int
- fun append (nil, ys) = ys
= | append (x :: xs, ys) = x :: append (xs, ys);
val append = fn : 'a list * 'a list -> 'a list
This is equivalent to
And since this definition does not actually use recursion, it is also the same as:
- val p1 = plus 1;
val p1 = fn : int -> int
- val p2 = plus 2;
val p2 = fn : int -> int
- p1 3;
val it = 4 : int
- p2 3;
val it = 5 : int
5 The data structure used internally to capture such free variables is called a closure, refer-
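The definition of plus is not shown in this part of the text. Judging from the results above, it is presumably curried integer addition; under that assumption, a sketch of it and of an equivalent definition using nested fn-expressions (plus' is a hypothetical name) looks like this:

```sml
(* Curried definition: plus takes x and returns a function awaiting y. *)
fun plus x y = x + y

(* The same function written with explicit nested fn-expressions. *)
val plus' = fn x => fn y => x + y

(* Partial application fixes the first argument. *)
val p1 = plus 1
val p2 = plus 2
```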
6 Other types
6.1 The unit type
The built-in type unit has precisely one value, written (). The unit type is
used for computations that do not naturally have a meaningful result and are
performed for effect only.
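As a brief sketch (greet is a hypothetical function), printing is a typical computation whose result is the unit value:

```sml
val u : unit = ()

(* print has type string -> unit, so greet has type string -> unit. *)
fun greet name = print ("Hello, " ^ name ^ "\n")

val r = greet "world"   (* r is bound to () after the text is printed *)
```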
for tabulator. Strings can be concatenated using the ^ infix operator. To send
a string to standard output, use print.
White space within strings that is enclosed by escape characters is ignored.
Since white space includes newlines, this can be used to conveniently spread long
literals across multiple source lines (even without losing proper indentation).
Characters can also be handled individually. They have type char. Character
literals look like one-element string literals with a preceding #.
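The string and character features mentioned above can be sketched as follows; all names are illustrative. The binding of long uses a backslash, white space (including a newline), and another backslash to continue a literal on the next source line.

```sml
val greeting = "Hello, " ^ "world"      (* concatenation *)

(* The white space between the two backslashes is ignored. *)
val long = "first part, \
           \second part"

val c = #"a"                            (* a character literal *)
val tabbed = "a\tb"                     (* \t is the tabulator escape *)

val () = print (greeting ^ "\n")        (* send a string to standard output *)
```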
7 Block structure
We have seen a number of ways of declaring things at the interactive toplevel:
value bindings (including functions) using val, val rec, and fun, algebraic
datatypes using datatype, and type abbreviations using type. There are a few
more that have not been explained in detail: exceptions using exception as
well as infix status and operator precedence using infix and infixr.
These are the declaration forms of the Standard ML core language, and all of
them can be used at arbitrary nesting levels. New nested blocks are established
by let-expressions.
The general form of a let-expression is
let
    decl 1
    ...
    decl n
in
    exp 1;
    ...
    exp m
end
where n ≥ 0 and m ≥ 1. The bindings established by the declarations come
into effect incrementally, i.e., a binding established by decl i is visible in the
right-hand side of every decl j where j > i. Declarations of the form datatype,
val rec and fun are recursive, meaning that their right-hand sides can see the
bindings established on their own left-hand sides.
There can be more than one ;-separated expression between in and end.
All of them see all the bindings established by the list of declarations. All of
them except the last are evaluated for effect only. Their types should be unit
(although, unfortunately, the language does not enforce this convention). The
value exp m becomes the value of the entire let-expression.
As an example, here is an alternative definition of the iterative factorial function
that does not expose the binding of the “helper” function that implements
the iterative (i.e., tail-recursive) loop:
- val fac =
= let fun loop f n = if n = 0 then f else loop (f*n) (n-1)
= in loop 1
= end;
val fac = fn : int -> int
- fac 10;
val it = 3628800 : int
This example also shows a use of currying and a partial application of the curried
loop function.
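The general let form also admits incremental declarations and several ;-separated expressions. A small sketch with hypothetical names: the declaration of square sees side, the print call is evaluated for effect only, and the last expression gives the value of the whole let-expression.

```sml
val area =
  let
    val side = 4
    val square = side * side      (* sees the earlier binding of side *)
  in
    print "computing area\n";     (* evaluated for effect; has type unit *)
    square                        (* value of the entire let-expression *)
  end
```

Neither side nor square is visible after the end keyword.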
- val x = 1 and y = 2;
val x = 1 : int
val y = 2 : int
- fun even n = n = 0 orelse odd (n-1)
= and odd n = n <> 0 andalso even (n-1);
val even = fn : int -> bool
val odd = fn : int -> bool
- datatype t = A of u
= and u = B of t | C;
datatype t = A of u
datatype u = B of t | C
- type t = int and u = real;
type t = int
type u = real
Notice that and replaces the usual declaration keyword in all but the first individual
declaration.
If a declaration form is not recursive (e.g., val, or type, or exception),
then using and causes simultaneous binding, meaning that none of the right-hand
sides can see any of the new bindings.
If a declaration form is recursive (e.g., datatype, or fun, or val rec), then
using and causes mutual recursion, meaning that all of the right-hand sides can
see all of the new bindings.
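Both behaviors can be seen in a short sketch (isEven and isOdd are hypothetical names). With val, the right-hand sides refer to the old bindings, so the third declaration swaps x and y; with fun, each function can call the other.

```sml
val x = 1
val y = 2

(* Simultaneous binding: both right-hand sides see the OLD x and y. *)
val x = y and y = x

(* Mutual recursion: each right-hand side sees both new bindings. *)
fun isEven n = n = 0 orelse isOdd (n - 1)
and isOdd n = n <> 0 andalso isEven (n - 1)
```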
Matches are used in fn-expressions, where they define an anonymous function
of type t -> u, and in case-expressions, where they are used to scrutinize
a value of type t and dispatch into one or more distinct branches of control,
each returning a value of type u.
The syntax
case e of p1 => e1 | ... | pn => en
is actually syntactic sugar for
(fn p1 => e1 | ... | pn => en) e
The pattern language mirrors (to some extent) the expression language;
patterns can be made from parts that contain other patterns:
wildcard The pattern _ (underscore) has an arbitrary type t and matches all
values of that type. The actual runtime value is discarded, i.e., it is not
bound to any variable.
variable Any variable name x (except those currently bound to datatype constructors)
can be used as a pattern. Such a pattern has an arbitrary type
t (which becomes the type of x within its scope) and matches any value of
that type. Within each branch of a pattern (see or-patterns below), there
can be at most one occurrence of each variable. If a value v matches the
pattern, then x becomes bound to the part of v that corresponds to the
position of x within the overall pattern.
list If p1, ..., pn are patterns of type t, then [p1, ..., pn] is a pattern of type
t list. It matches list values [v1, ..., vn] that are precisely n elements
long provided that each element vi matches its corresponding sub-pattern
pi.
integer Any integer literal i is a pattern of type int that matches precisely the
value i.
string Any string literal s is a pattern of type string that matches precisely
the value s.
data constructor Let ci be a datatype constructor of type ti -> t, and let pi
be a pattern of type ti. Then ci pi is a pattern of type t. It will match
values of the form ci vi where vi matches pi. (These are the values of type
t that were formed by applying the constructor ci to vi.)
If ci has infix status, then it must be written as an infix operator: px ci py
instead of ci (px, py). A frequently occurring example of this is the “cons”
constructor :: of type list.
or pattern (This is a conservative extension to Standard ML implemented by
SML/NJ.) If p1 and p2 are patterns of the same type t, then p1 | p2 is
also a pattern of type t. It matches the union of the values matched by
p1 and p2. The sub-patterns p1 and p2 must agree precisely on the set
of variables that are bound by them (same set of variables, same types).
Matching proceeds from left to right, meaning that if a value matches both
p1 and p2 at the same time, then the bindings from p1 will go into effect.
as-pattern If p is a pattern of type t and x is a variable pattern, then x as p
is also a pattern of type t. It matches the same values v that p alone
would match. In this case, in addition to the variable bindings that are
established within p, the variable x is bound to the respective v itself.
reference If p is a pattern of type t, then ref p is a pattern of type t ref.
It matches a value v if v denotes a location that at the time the match is
performed contains a value v' matching p. Reference patterns are unusual
in that they incur a side-effect (reading of a mutable location).
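A few of these pattern forms are sketched below with hypothetical function names. The example sticks to forms that are part of Standard ML proper and therefore leaves out or-patterns.

```sml
(* List, wildcard, and infix constructor patterns. *)
fun describe [] = "empty"
  | describe [_] = "one element"
  | describe (_ :: _ :: _) = "two or more"

(* as-pattern: whole names the entire list, x names its head. *)
fun dupHead (whole as (x :: _)) = x :: whole
  | dupHead [] = []

(* Reference and integer patterns: inspect the current contents of a cell. *)
fun isZero (ref 0) = true
  | isZero _ = false
```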
8.1 Examples
As mentioned above, patterns are used in case-expressions and fn-expressions.
They are also used in function definitions in clause form (using keyword fun).
Examples:
- fun fac n =
= case n of
= 0 => 1
= | _ => n * fac (n-1);
val fac = fn : int -> int
-
- (* several equivalent ways of defining "reverse-and-append": *)
- fun revappend (xs, ys) =
= case xs of
= [] => ys
= | x :: xs => revappend (xs, x :: ys);
val revappend = fn : 'a list * 'a list -> 'a list
-
- fun revappend2 ([], ys) = ys
= | revappend2 (x :: xs, ys) = revappend2 (xs, x :: ys);
val revappend2 = fn : 'a list * 'a list -> 'a list
-
- val rec revappend3 =
= fn ([], ys) => ys | (x :: xs, ys) => revappend3 (xs, x :: ys);
val revappend3 = fn : 'a list * 'a list -> 'a list
Notice that in case- and fn-expressions the syntax requires the use of the
double-arrow =>, while function definitions in clause form (using fun) use an
equal sign =.
The definition of a curried function in clause form
fun f x1 ... xn =
  case (x1, ..., xn) of
      (p11, ..., p1n) => b1
    | (p21, ..., p2n) => b2
      ...
    | (pm1, ..., pmn) => bm
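As a concrete instance of this correspondence, here is a sketch of a curried function written once in clause form and once with an explicit case on a tuple of its arguments. The names take and take' are hypothetical and do not refer to library functions.

```sml
(* Clause form: one pattern per curried argument. *)
fun take 0 _ = []
  | take _ [] = []
  | take n (x :: xs) = x :: take (n - 1) xs

(* The same function with a single case over the tuple of arguments. *)
fun take' n xs =
  case (n, xs) of
      (0, _) => []
    | (_, []) => []
    | (n, x :: xs) => x :: take' (n - 1) xs
```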
9 Other language features
9.1 Polymorphism
9.2 Modules
One of the most distinguishing features of Standard ML is its module system.
It deserves a much more complete introduction. However, we will use only
relatively little of its full power, so the following overview should suffice:
Structures
(First-order) Modules in Standard ML are called structures. Structures can be
bound to module names much like values can be bound to variables. A module
binding is established by a declaration that starts with the keyword structure.
The right-hand side of a structure declaration is often a structure expression,
i.e., a sequence of declarations enclosed within struct and end. Any
declaration that can appear at top level can also appear within a structure.
One of the roles that structures play is that of name space management.
Declarations that appear within a module are not visible directly. Any reference
to a name bound within a module, when used from elsewhere, must be qualified
with the module’s name.
Structures are bound
Example:
- structure A = struct
= type t = int
= type u = t * t
= fun f (x : t) = (x, x+1) : u
= end;
structure A :
sig
type t = int
type u = t * t
val f : t -> u
end
- A.f 10;
val it = (10,11) : A.u
Signatures
Much like values that are classified by types, structures are classified by signatures.
Programmers can explicitly write signatures, bind them to signature
names, and ascribe them to structures.
For every structure (even in the absence of a signature ascription) the compiler
infers a principal signature, which in some sense is the “default” signature
of the structure. An ascribed signature may elide some elements and give more
specific types for some values, but it must otherwise match the principal signature.
An opaque signature ascription can also hide the identity of certain types,
thus rendering them abstract. This is Standard ML’s primary way of defining
abstract types. An opaque signature ascription uses the symbol :>, while its
counterpart, transparent signature ascription, uses :.
Examples:
First define the signature and bind it to S:
- signature S = sig
= type t
= val x : t
= val f : t -> int
= end;
signature S =
sig
type t
val x : t
val f : t -> int
end
- structure M = struct
= type t = int
= val x = 10
= fun f y = y (* still polymorphic *)
= val a = "hello"
= end;
structure M :
sig
type t = int
val x : int
val f : 'a -> 'a
val a : string
end
- M.x;
val it = 10 : int
- M.f M.x;
val it = 10 : int
- M.f 3;
val it = 3 : int
- M.f true;
val it = true : bool
The transparent ascription of S to M hides a and makes the type of f less general:
- structure MT = M : S;
structure MT : S
- MT.x;
val it = 10 : M.t
- MT.f MT.x; (* MT.t is the same as int *)
val it = 10 : int
- MT.f 3;
val it = 3 : int
- MT.f true; (* MT.f is specialized to int now *)
stdIn:1.1-1.10 Error: operator and operand don't agree [tycon mismatch]
operator domain: M.t
operand: bool
in expression:
MT.f true
If we use opaque ascription instead, the identity of type t (i.e., the fact that it
is defined to be int) becomes hidden:
- structure MO = M :> S;
structure MO : S
- MO.x;
val it = - : MO.t
- MO.f MO.x; (* MO.f and MO.x still have matching types *)
val it = 10 : int
- MO.f 3; (* but MO.t is not the same as int anymore *)
stdIn:36.1-36.7 Error: operator and operand don't agree [literal]
operator domain: MO.t
operand: int
in expression:
MO.f 3
- MO.f true; (* and, of course, it is not bool either *)
stdIn:1.1-1.10 Error: operator and operand don't agree [tycon mismatch]
operator domain: MO.t
operand: bool
in expression:
MO.f true
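Opaque ascription is typically applied directly in the structure declaration to define an abstract type together with its operations. The following sketch uses a hypothetical COUNTER signature and Counter structure; clients can only create and inspect counters through the operations listed in the signature.

```sml
signature COUNTER = sig
  type t                      (* abstract: the representation is hidden *)
  val zero : t
  val inc : t -> t
  val toInt : t -> int
end

structure Counter :> COUNTER = struct
  type t = int                (* known only inside the structure *)
  val zero = 0
  fun inc n = n + 1
  fun toInt n = n
end

val two = Counter.toInt (Counter.inc (Counter.inc Counter.zero))
```

An expression such as Counter.inc 5 would be rejected, since outside the structure Counter.t is not known to be int.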
Functors
In Standard ML, a functor is a “function” from structures to structures. Functors
make it possible to write generic code that is parametrized not only over
other values but also over other types. In this course we will have little need for
writing our own functors, but we will sometimes make use of functors defined
in libraries.
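To give an impression of what a functor looks like, here is a minimal sketch. All names (ORDERED, MaxFn, IntOrd, IntMax) are hypothetical and not taken from any library. The functor is parametrized over a structure that supplies both a type and a comparison function on that type.

```sml
signature ORDERED = sig
  type t
  val less : t * t -> bool
end

(* A functor: maps any structure matching ORDERED to a new structure. *)
functor MaxFn (O : ORDERED) = struct
  fun max (a, b) = if O.less (a, b) then b else a
end

structure IntOrd = struct
  type t = int
  fun less (a : int, b) = a < b
end

(* Functor application yields an ordinary structure. *)
structure IntMax = MaxFn (IntOrd)
val m = IntMax.max (3, 7)
```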
9.3 Exceptions
9.4 Infix declarations
10 Using Files
When you start SML/NJ using the sml command, you find yourself in the
interactive toplevel loop. Code you type here is compiled to machine code on
the fly and then executed right away. A pretty-printing mechanism displays
results or gives a summary of the definitions that were entered.
Code entered at the interactive level is ephemeral, since the system does not
save it for you. Therefore, any program longer than a few lines of code should
first be saved to a text file and then read into the system. While it would
be possible to use your favorite GUI’s cut&paste feature to achieve the latter,
SML/NJ provides a number of facilities to make the process easier.
10.2 A note on polymorphism and type inference
Notice that the type of myLength is 'a mylist -> int. Implicitly, the type
variable 'a is universally quantified. This means that myLength can be used
with any instantiation of mylist. (Indeed, the implementation of myLength
completely ignores the contents carried by the MyCons nodes of the list.)
The type of myLength (or any other polymorphic value) is automatically
instantiated appropriately as needed wherever it is used. This process is part
of type inference, a mechanism implemented by the compiler designed to relieve
the programmer from having to write most of the type annotations that are
necessary in many other statically typed programming languages.
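The definition of myLength is not shown in this part of the text. A plausible reconstruction, offered only as an assumption, is given below together with two uses at different instantiations of mylist; n1 and n2 are hypothetical names.

```sml
datatype 'a mylist = MyNil | MyCons of 'a * 'a mylist

(* No type annotations are needed; 'a mylist -> int is inferred. *)
fun myLength MyNil = 0
  | myLength (MyCons (_, rest)) = 1 + myLength rest

val n1 = myLength (MyCons (1, MyCons (2, MyNil)))    (* used at int mylist *)
val n2 = myLength (MyCons ("a", MyNil))              (* used at string mylist *)
```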
Apparently this works without problem. But now, if we want to run another
test of myLength, we find trouble:
This rather cryptic error message indicates that the expected argument type of
myLength does not match the type of the actual argument! Why is that?
The reason for this behavior is that reloading the definition of mylist has the
effect of creating a brand-new type with brand-new constructors. But myLength,
which had not been reloaded, still has the old type! In fact, as indicated by
the question marks ? in the error message, it now has a type that cannot even
be named anymore, since the new but identically named definition of mylist
shadows it.
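The same effect can be reproduced without files by entering a datatype definition twice. The sketch below uses a hypothetical type t with a single constructor A; the rejected expression is left in a comment so that the code as a whole still compiles.

```sml
datatype t = A
fun f A = 1          (* f accepts values of the first type t *)

datatype t = A       (* a brand-new type that happens to reuse the names *)
fun g A = 2          (* g accepts values of the second type t *)

val ok = g A         (* fine: this A belongs to the second t *)

(* f A would be rejected here: this A has the new type t,
   while f still expects the old, now shadowed one. *)
```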
6 The details here are actually somewhat more complicated, but we will stick to this first
approximation of an explanation.
A Reserved words
The following identifiers are reserved words in Standard ML. They cannot be
used as names of user-defined variables, constructors, types, or modules:
abstype and andalso as case datatype do else end eqtype exception
fn fun functor handle if in include infix infixr let local nonfix of
op open orelse raise rec sharing sig signature struct structure then
type val where with withtype while ( ) [ ] { } , : ; ... _ | = =>
-> # :>
The SML/NJ implementation of Standard ML adds the following two additional
reserved words to the above list:
abstraction funsig