An Introduction to SETL, Set Theoretic Language - Kennedy


Comp. & Maths with Appls, Vol. 1, pp. 97-119. Pergamon Press, 1975. Printed in Great Britain.

AN INTRODUCTION TO THE SET THEORETICAL LANGUAGE SETL*
K. KENNEDY
Department of Mathematical Sciences, Rice University, Houston, Texas, U.S.A.
and
J. SCHWARTZ
Department of Computer Science, Courant Institute of Mathematical Sciences, New York
University, New York, U.S.A.

Communicated by Ervin Y. Rodin

(Received 6 February 1974)

Abstract: The problem of programming is discussed and a two-stage approach to programming
is proposed. This approach would first produce abstract algorithms ignoring efficiency considera-
tions; then it would produce concrete algorithms by the addition of data structuring. The set-
theoretic language (SETL) is introduced as a suitable vehicle for the formulation of abstract
algorithms. An extensive description of SETL and examples of its use are included.

I. INTRODUCTION

The problem of programming is that of defining a problem in terms comprehensible to a
computer. The increasing complexity of problems being attacked makes it more and more
difficult to produce well-designed, correct programs rapidly. The development of improved
programming techniques is therefore an issue of central importance. Many approaches
have been suggested. Some, like the use of carefully "structured programming", are partly
administrative; others, like the use of novel control structures, are highly technical. In the
present article we shall describe an approach to the general problem of programming ease
which emphasizes the use of powerful operations on very general data structures.
Most programming languages make it impossible to separate issues of problem formulation
from those of efficiency. For this reason, the process of problem formulation often
becomes entangled with the problem of choosing data structures which will lead to highly
efficient program realizations. While in a complete implementation both problems must
be faced, it is well to have available a mechanism which allows these two problems to be
treated separately during the initial phase of work on a complex program. Observe also
that too-early choice of data structures can create subsequent difficulties. In particular,
necessary algorithmic changes discovered after the start of programming can require that
the data structures originally specified be modified at great cost to a project. The result can
be a poorly formed system which can take a great deal of time to debug or which may never
work properly. The same comments apply to the specification of interfaces between modules.
Many of the difficulties just noted can be avoided if low-level efficiency-related decisions
are postponed until after algorithms are designed and debugged. Putting this another way,
if the programmer can first build and verify his system, representing it coarsely, he can
subsequently design data structures and interfaces which fit his algorithmic scheme well.
This approach might be thought of as a two-stage development process. The first stage
produces an abstract algorithm in which low-level efficiency considerations are ignored;
the second stage converts the abstract algorithm to a concrete algorithm by the addition of
data structuring.

*Work supported by the National Science Foundation, Office of Computing Activities, Contract NSF-GJ-1202X.

From this point of view, a programming language based on set theory has great attraction,
since such a language is bound to encourage a simple yet fundamental view of data. Speci-
fically, a set-theoretic programming language will tend to view data as constituting sets
of objects and mappings defined on these sets. For example, in such a language a linked
list will be a set of nodes, together with a mapping "next" which chains the list together; a
binary tree will be a set of nodes on which left and right descendant functions are defined,
etc.
This paper provides an introduction to one set-theoretic language (SETL) (see Ref. [1])
developed at New York University. We do not give a complete description of the language;
rather it is hoped that this introduction will give the reader some idea of the power of
languages which use sets as their primary data structure. The existence of other set-
theoretically oriented languages [2-4] may be noted.

II. BASIC ENTITIES


Every language has certain basic entities which can be manipulated. In SETL these
entities are "atoms", "sets" and "tuples".
A. Atoms
SETL atoms include most of the elementary data types found in other languages:
integers, reals, Boolean values, bit strings, character strings, labels, subroutines, and
functions. In addition, there are two special types of atoms which play an important role
in the language: blank atoms and the undefined atom. An element may be tested for atom
status by the operator atom a which returns "true" if a is an atom.
Integers. All the standard arithmetic operators are provided for integer values: addition
(+), subtraction (-), multiplication (*), division (/) and remainder (//), along with the
dyadic operators max and min and the monadic operator abs. Integer values may be
compared to produce Boolean values using the operators eq, ne, lt, gt, le and ge.
Real values. The operators +, -, *, /, exp (exponential), max, min and abs are provided
for real numbers. If x is real, top x is the least integer exceeding x and bot x is the greatest
integer not exceeding x. All comparison operators available for integers are also available
for reals.
Boolean values. The special symbols t and f denote the two possible Boolean values. As
we have seen, Boolean values can be produced by certain operations on integers, reals, and
other basic entities. In addition, they will be produced by quantified Boolean expressions,
as discussed in Section III. The standard Boolean operators and, or, not, implies and exor
are provided; and, not and implies may be abbreviated as a, n and imp. Boolean values are
identified with bit strings of length 1.
Bit and character strings. All the Boolean operations apply on a bit-by-bit basis to bit
strings. If two strings of unequal length are combined by these operations, the shorter is
extended by leading zeros to the length of the longer. The special symbol nulb denotes the
empty bit string.
The characters allowable in a character string are all the normal members of a fairly
standard character set, plus a few characters which denote special SETL constants and
operators, plus some additional characters that play a special role in input/output (which
will not be discussed here). The special symbol nulc denotes the empty character string.
The length of any (bit or character) string s is denoted by #s. If n is an integer and s is
a string, then n * s denotes the result of joining n copies of s end-to-end. The elements of a
string are numbered beginning with 1, so strings may be indexed: s(n) denotes the nth
element of a string s. Strings may also be "sliced"; s(n1:n2) denotes the substring of s
beginning with the n1-th element and having a length of n2 elements. The special notation
s(n1:) denotes the substring beginning with the n1-th element and containing all the remaining
elements in s. Strings may also be concatenated: s1 + s2 denotes the concatenation of
strings s1 and s2. These notations are essentially the same as the corresponding notations
for tuples (see below). Strings may be tested for equality by using the operators eq and ne.
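The string notations above can be mirrored in Python as a rough sketch. This is not SETL code; the helper names `elem`, `slice_len` and `slice_from` are hypothetical, chosen only to make the 1-based indexing and the (start : length) slice convention explicit.

```python
# Python analog of the SETL string operations described above.
# The helper names are hypothetical; they are not part of SETL.

def elem(s: str, n: int) -> str:
    """s(n): the n-th element, counting from 1."""
    return s[n - 1]

def slice_len(s: str, n1: int, n2: int) -> str:
    """s(n1:n2): the substring starting at position n1 with length n2."""
    return s[n1 - 1 : n1 - 1 + n2]

def slice_from(s: str, n1: int) -> str:
    """s(n1:): the substring from position n1 to the end."""
    return s[n1 - 1 :]

s = "settheory"
length = len(s)              # #s
repeated = 2 * "ab"          # n * s
third = elem(s, 3)           # s(3)
middle = slice_len(s, 4, 3)  # s(4:3)
rest = slice_from(s, 4)      # s(4:)
joined = "set" + "theory"    # s1 + s2
```

Note that the second slice argument is a length, not an end index, which is the main difference from Python's own slice syntax.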
Labels. A label is declared by its appearance within SETL code followed by a colon:
lab:x = y.
No operations combining labels, except the equality and inequality comparisons, exist;
however, labels may be members of sets and tuples and the result of applying a function
may be a label. A label-valued expression may appear in a go to statement (see Section
IV.B); this type of construction can be used to obtain a "calculated go to" effect.
Subroutines and functions. In this section we are concerned with programmed functions;
another type of function, the "tabular" function, will be discussed in Section III.C. Sub-
routines and functions are legitimate SETL atoms. Thus, they may be assigned, tested for
equality, appear as members of sets and tuples, and be produced as the results of other
functions. A subroutine (or function) declaration has the following form:
define name(arg1, arg2, ..., argn); block; end name;
(where "definef" is substituted for "define" if a function is being declared). Such a declara-
tion may be viewed as initializing the variable "name" to the subroutine atom defined by
the <block> of statements comprising the routine body. After being defined, subroutines
and functions can be called by writing the subroutine name followed by the list of actual
parameters in parentheses:
name(e1, e2, ..., en).
Of course, function calls can appear within expressions. Arguments are elaborated at the
point of call; changes of argument value made by a subroutine are transmitted back to the
calling routine. All subroutines and functions are recursive.
Blank atoms. Blank atoms are provided to be used as structural markers in complex
objects built up in the course of a SETL computation. SETL uses blank atoms in many
situations in which a pointer-oriented language would use machine addresses or pointers
to data blocks.
No operations combining blank atoms, except the equality and inequality comparisons
exist; however, blank atoms may be members of sets and tuples and, like atoms of any
other kind, can be the result of functions.
Blank atoms are created by the built-in SETL function newat, which produces a new
blank atom each time it is called. Note, for example, that the expression <newat, newat>
designates an ordered pair consisting of two distinct blank atoms.
The undefined atom. The undefined atom Ω is a particular blank atom related to various
SETL operations in rather special ways. Ω is not allowed to be a member of any set, so that
any attempt to form a combination such as
{Ω} or {Ω, a}
will lead to an error. In addition, Ω will cause an error in most contexts within expressions.
This helps to locate bugs in SETL programs because many situations in which the actual
form of data differs from the assumed form will rapidly lead to the occurrence of Ω and
from thence to an error.
There are several important contexts in which Ω is legal.
(a) Ω is allowed in the combinations x eq Ω and x ne Ω.
(b) Ω may be a component of a tuple.
(c) Ω may appear on the right-hand side of an assignment such as
f(x) = Ω;
the uses of this construction will be discussed in Section III.C.
B. Sets
A set is an unordered finite collection of distinct basic entities. Thus sets may contain
atoms, tuples, and other sets. Any of these types may be mixed within a set. Consistently
with set theory, a set may be formed in two ways:
(1) by enumeration: {1, 2, 3}   {A, B}
(2) by using a general set former construction.
The simplest type of set former is
{x ∈ s | C(x)}
which forms the set of all those elements x of the set s which satisfy the Boolean condition
C(x) (i.e. for which C(x) is true). A slightly more general form is
{e(x), x ∈ s | C(x)}   (1)
where e(x) is an expression involving x. The evaluation of this construct is simple: the
expression s is evaluated to produce a set; then, for each x in this set such that C(x) is true,
one calculates e(x) and gathers the resulting values into a new set which is the value of (1).
In the form (1), the Boolean condition is optional.
To orient the reader we consider a few examples. Suppose ints is the set {1, 3, 5, 7, 9}.
(1) {i ∈ ints | i lt 4} creates the set of integers which are elements of ints and less than 4,
i.e. {1, 3}.
(2) {i * i, i ∈ ints} creates the set of squares of elements of ints, i.e. {1, 9, 25, 49, 81}.
An allowed variant of the construction (1) is
{e(i), min ≤ i ≤ max | C(i)}.   (2)
The form (2) is evaluated as follows. For each integer i between min and max such that C(i)
is true, e(i) is evaluated and all the resulting values are collected into the resulting set.
In equation (2), max and min may be any integer expressions; C(i) is optional.
Example: {i * 2 - 1, 1 ≤ i ≤ 5} yields the set {1, 3, 5, 7, 9}.
More general multiply-iterative set formers are also allowed. A general form is
{e(x_1, x_2, ..., x_n), x_1 ∈ s_1, x_2 ∈ s_2(x_1), ..., x_n ∈ s_n(x_1, ..., x_{n-1}) | C(x_1, ..., x_n)}.   (3)
Example: {i * j, i ∈ {2, 3}, j ∈ {4 * i, 5 * i}} yields the set {16, 20, 36, 45}.
The individual restrictions x_j ∈ s_j(x_1, ..., x_{j-1}) appearing in equation (3) are called range
restrictions and may also have the forms
min(x_1, ..., x_{j-1}) ≤ x_j ≤ max(x_1, ..., x_{j-1})
max(x_1, ..., x_{j-1}) ≥ x_j ≥ min(x_1, ..., x_{j-1}).
For additional information on the set former construction in SETL see Ref. [1].
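For readers more familiar with a current language, the set former examples above correspond closely to set comprehensions. The following Python sketch reproduces them; it is an analogy rather than SETL code, and the variable names are illustrative only.

```python
# Python set comprehensions mirroring the SETL set former examples.
ints = {1, 3, 5, 7, 9}

# {i ∈ ints | i lt 4}
small = {i for i in ints if i < 4}

# {i * i, i ∈ ints}
squares = {i * i for i in ints}

# {i * 2 - 1, 1 ≤ i ≤ 5}; the SETL range includes both bounds
odds = {i * 2 - 1 for i in range(1, 5 + 1)}

# {i * j, i ∈ {2, 3}, j ∈ {4 * i, 5 * i}}; the inner set depends on i
products = {i * j for i in {2, 3} for j in {4 * i, 5 * i}}
```

As in form (3), the set ranged over by the second iterator may depend on the value bound by the first.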
A number of operations on sets are provided:
x ∈ s is a Boolean expression which is true if x is a member of the set s;
∋s is the choice function: it selects an arbitrary member of the set s;
#s yields the number of elements in s;
s1 + s2 forms the union of sets s1 and s2;
s1 * s2 forms the intersection of s1 and s2;
s1 - s2 forms the set theoretic difference of s1 and s2;
pow(s) designates the set of all subsets of s;
s with a designates the set s + {a};
s less a designates the set s - {a};
b from s denotes the composite operation b = ∋s; s = s less b; i.e. an arbitrary element of
s is selected, removed from s, and assigned to b.
Sets may be compared for equality using the operators eq and ne and for set inclusion
using the operator incs. The special notation nl indicates the empty set.
Examples: {1, 2, 3} + {3, 5} yields {1, 2, 3, 5}
{2 * i - 1, 1 ≤ i ≤ 5} - {2, 3, 4, 5} yields {1, 7, 9}
1 ∈ {1, 2, 3} yields t
{1, 2, 3} * nl yields nl
#{2, 3} yields 2
pow({1, 2}) yields {{1, 2}, {1}, {2}, nl}.
All basic entities have a "type". The type of an entity can be determined by the operator
type. If s is a set, then
(type s) eq set yields t.
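The set operations listed above have near-direct counterparts in Python's built-in set type. The sketch below is an analogy only; `pow_set` and `take_from` are hypothetical helper names standing in for pow and from, and frozensets are used because Python sets cannot contain ordinary (mutable) sets.

```python
from itertools import chain, combinations

def pow_set(s):
    """pow(s): all subsets of s, as frozensets so they can be set members."""
    items = list(s)
    subsets = chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1)
    )
    return {frozenset(c) for c in subsets}

def take_from(s):
    """b from s: remove an arbitrary element of s (in place) and return it."""
    return s.pop()

union = {1, 2, 3} | {3, 5}                                       # s1 + s2
difference = {2 * i - 1 for i in range(1, 6)} - {2, 3, 4, 5}     # s1 - s2
member = 1 in {1, 2, 3}                                          # x ∈ s
intersection = {1, 2, 3} & set()                                 # s1 * nl
count = len({2, 3})                                              # #s
subsets = pow_set({1, 2})                                        # pow(s)
with_a = {1, 2} | {3}                                            # s with a
less_a = {1, 2} - {2}                                            # s less a
includes = {1, 2, 3} >= {1, 2}                                   # s1 incs s2

work = {10, 20, 30}
b = take_from(work)   # some element of work; which one is unspecified
```

Like the SETL choice function, `set.pop` makes no promise about which element is selected.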
C. Tuples
A tuple is a special type of object fully defined by an ordered sequence of components
<c_1, c_2, c_3, ...>,
all but a finite number of which are identical with the undefined atom Ω.
Any object may be a tuple component. The components of a tuple are ordered, and two
identical components may appear within the tuple. Two tuples are equal if and only if all
their components are equal. The length of a tuple (written #t) is the index of its last defined
component; thus the tuple <c_1, c_2, c_3, ..., c_{n-1}, c_n, Ω, Ω, ...> will usually be written
<c_1, c_2, ..., c_n>.
The tuple with no defined components, which has length 0, will be written as nult. The type
of a tuple is tupl, i.e. if t is a tuple then (type t) eq tupl yields t (true).
Let t = <c_1, c_2, ..., c_n> be a tuple. Then t(k) denotes the kth component c_k of t. Tuples
may also be "sliced". The notation t(i:j) denotes the tuple <c_i, c_{i+1}, ..., c_{i+j-1}>; i.e. in the
notation t(i:j), i indicates the initial index and j indicates the length of the subtuple. The
special notation t(i:) denotes the tuple which contains all components of t from the ith:
<c_i, c_{i+1}, ..., c_n>.
Several operators are available for tuples. #t is the length of t; hd t yields t(1); tl t yields
t(2:); and if t1 and t2 are tuples, t1 + t2 is their concatenation. The special operator
pair x yields t if x is a tuple of length 2, and f otherwise.
Examples: Suppose that t1 = <1, 2, 3, 4> and t2 = <2, 1, 2>, then
t1(3) yields 3
t2(3) yields 2
t1(2:2) yields <2, 3>
t2(2:) yields <1, 2>
#t1 yields 4
hd t2 yields 2
tl t2 yields <1, 2>
t1 + t2 yields <1, 2, 3, 4, 2, 1, 2>.
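The tuple examples can be checked against a small Python model. This sketch uses Python tuples and hypothetical helper names (`comp`, `sub`, `sub_from`, `hd`, `tl`, `is_pair`); it does not model the infinite tail of Ω components, only the defined prefix.

```python
# Python model of the SETL tuple operations (1-based, slice = start : length).
# Helper names are hypothetical; trailing Ω components are not modeled.

def comp(t, k):
    """t(k): the k-th component."""
    return t[k - 1]

def sub(t, i, j):
    """t(i:j): the subtuple starting at index i with length j."""
    return t[i - 1 : i - 1 + j]

def sub_from(t, i):
    """t(i:): all components from the i-th onward."""
    return t[i - 1 :]

def hd(t):
    """hd t: the first component, t(1)."""
    return t[0]

def tl(t):
    """tl t: everything after the first component, t(2:)."""
    return t[1:]

def is_pair(x):
    """pair x: true exactly when x is a tuple of length 2."""
    return isinstance(x, tuple) and len(x) == 2

t1 = (1, 2, 3, 4)
t2 = (2, 1, 2)
concatenated = t1 + t2   # t1 + t2
```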
There is no direct analog for tuples of the general set former discussed above; however,
such a "tuple former" can be coded using the SETL compound operator which will be
described below.
D. Names and constants
SETL names are formed in the usual way: strings of alphabetic and numeric characters,
beginning with an alphabetic character. A name may have any basic entity as its value.
SETL also provides boldface names which are used to represent infix and prefix operators
and special constants (such as nl). In a few cases, keywords are boldface. Four types of
constants are available: signed integers, real constants, character string constants, and
octal constants. Signed integers are formed in the usual way, character string constants are
enclosed in single quotes, and octal constants consist of an octal integer followed by the
suffix B. Real constants are similar to those of FORTRAN.
Examples:
signed integers: -1, 26, -346
real constants: -3.14, 2.56E-10
character strings: 'abcd', 'set theory'
octal constants: 00237B, 776B

III. SPECIAL EXPRESSION FORMS


The availability within SETL of sets and tuples gives the language much of its semantic
power, as well as its special flavor. Various special syntactic forms allow important opera-
tions on sets and tuples to be invoked conveniently. We shall now describe several of these
syntactic constructions.
A. Quantified Boolean and conditional expressions
Most programming languages permit Boolean expressions such as
((a gt b) and (b lt e)) or (b eq 0)
which correspond roughly to first-order logic. SETL also allows quantified expressions to
be written in notations borrowed from the predicate calculus. If s is a set and C(x) is a
Boolean formula, then a formula of either of the forms
∃x ∈ s | C(x)   (4)
∀x ∈ s | C(x)   (5)
represents a Boolean value. The value of the first of these forms, called the existentially
quantified form, is obtained by calculating the value of C(x) for each element of a set s
in turn and by assigning the value true on first obtaining a true result, but assigning the
value false if no such result is found. If the result is true, x will be set equal to the first
element of the set for which C(x) is true. Existentially quantified expressions are therefore
convenient for the expression of "search loops" in SETL. Suppose for example that we
wish to search a set of integers ints for a number which is divisible by 3, and to perform
some action on that number if it is found. The search for such a number is coded simply as
∃x ∈ ints | (x // 3) eq 0.
This expression can then be included in a conditional statement (see Section IV) such as:
if ∃x ∈ ints | (x // 3) eq 0 then perform action
else error return.
The value of expression (5), called a "universally quantified" expression, is true if C(x)
is true for every x in s and false otherwise. This construction allows convenient representa-
tion of "verification loops" ; that is, loops which verify that a certain condition holds for all
elements of a set. Suppose for example that we want to see if a set of integers ints contains
only odd numbers. This can be checked by evaluating
∀x ∈ ints | (x // 2) eq 1.
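The search-loop and verification-loop readings of the two quantifiers can be sketched in Python as follows. The functions `exists` and `forall` are hypothetical names; `exists` returns the witness alongside the Boolean result to imitate the way SETL leaves x set to the element that was found.

```python
def exists(s, cond):
    """∃x ∈ s | C(x): (True, witness) for the first satisfying element
    encountered, or (False, None) if there is none."""
    for x in s:
        if cond(x):
            return True, x
    return False, None

def forall(s, cond):
    """∀x ∈ s | C(x): True when cond holds for every element of s."""
    return all(cond(x) for x in s)

ints = {1, 3, 5, 7, 9}

# ∃x ∈ ints | (x // 3) eq 0   (SETL's // is the remainder operator)
found, x = exists(ints, lambda v: v % 3 == 0)

# ∀x ∈ ints | (x // 2) eq 1
all_odd = forall(ints, lambda v: v % 2 == 1)
```

Because a set is unordered, which of several satisfying elements becomes the witness is not determined; here it may be 3 or 9.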
The Boolean expression C(x) in a quantified expression can be completely general (i.e.
it is allowed to contain other quantified expressions) with one small restriction. The occur-
rence of the name "x" in ∀x or ∃x is known as a bound occurrence because x is the variable
to which the quantifier attaches. In a construction like
∀x ∈ s | C(x),
C(x) may contain no bound occurrence of the name x. This restriction rules out ambiguous
constructions such as
∀x ∈ s | (D(x) or (∃x ∈ s | E(x)))
which are set-theoretic versions of programming sequences, illegal in other languages, in
which an iteration variable is modified within the iteration itself.
Several forms of quantified expressions related to (4) and (5) are also provided.
min ≤ ∃k ≤ max | C(k)   (6a)
max ≥ ∃k ≥ min | C(k)   (6b)
min ≤ ∀k ≤ max | C(k)   (7a)
max ≥ ∀k ≥ min | C(k)   (7b)


In these formulae max and min are integer expressions in which k does not occur. The
values of these quantified expressions are calculated in a manner similar to (4) and (5),
except that the values of k are selected for testing in the order implied by the range. Thus,
5 ≤ ∃k ≤ 9 | (k // 2) eq 0
will yield the value "true" and k will be set to 6 (since 6 is the first integer in the specified
range satisfying the condition).
More generalized quantified forms such as
∃x_1 ∈ s_1, ∀x_2 ∈ s_2(x_1), ... | C(x_1, ..., x_n)
are provided. These have meanings which should be evident. We will be content to give an
example to show their usefulness. Suppose ints is a set of integers. Then
∃x ∈ ints, ∀y ∈ (ints - {x}) | x lt y
is a (highly inefficient) search loop which finds the minimal element of ints and assigns it to x.
SETL Boolean expressions can be used within the ALGOL-like conditional expressions
which SETL provides. These conditional expressions have the form
if bool_1 then expr_1 else if bool_2 then expr_2 ... else expr_n.

The meaning of such an expression will be obvious to the reader.


The following is an expression which calculates the minimal element of a set ints of
integers if that set is non-empty and produces Q otherwise:
if (∃x ∈ ints, ∀y ∈ (ints - {x}) | x lt y) then x else Ω.
The outer parentheses are redundant in this case and are included for clarity.
As a final example, we show how the PL/I index function can be coded as a conditional
expression. Suppose "bigstring" and "pattern" are character strings and we wish to see if
pattern is a substring of bigstring. If it is, we wish to return the index in bigstring of the first
character of pattern, otherwise we return 0. The following expression accomplishes this:
if 1 ≤ ∃k ≤ (#bigstring - #pattern + 1) | bigstring(k: #pattern) eq pattern then k else 0.
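The two conditional expressions above (minimal element, and the index function) can be spelled out as explicit loops. The Python sketch below is one possible reading, with `None` standing in for Ω; the function names `min_element` and `index` are hypothetical.

```python
def min_element(ints):
    """if ∃x ∈ ints, ∀y ∈ (ints - {x}) | x lt y then x else Ω.
    None stands in for Ω. Deliberately as inefficient as the SETL form."""
    for x in ints:
        if all(x < y for y in ints - {x}):
            return x
    return None

def index(bigstring, pattern):
    """if 1 ≤ ∃k ≤ (#bigstring - #pattern + 1)
          | bigstring(k: #pattern) eq pattern then k else 0.
    Candidates k are tried in increasing order, as the range implies."""
    n = len(pattern)
    for k in range(1, len(bigstring) - n + 2):
        if bigstring[k - 1 : k - 1 + n] == pattern:
            return k   # 1-based position of the first match
    return 0
```

For instance, `index("set theory", "theory")` gives 5, the 1-based position at which the pattern starts.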
B. Compound operators
The compound operator is another SETL expression form which can be used to avoid
the use of loops. Its form is
[op: x ∈ s | C(x)] e(x)   (8)
where op is any dyadic operator, s is a set, C(x) is an (optional) Boolean expression, and e(x)
is an expression involving x. The value of this construction is calculated as follows: form
the set s1 = {e(x), x ∈ s | C(x)}; select and remove an element y from s1; then for each
remaining element x1 in s1, apply op to y and x1 to produce a new value for y. The result is
the final value of y when s1 is exhausted. This process can be clarified by an example:
[+: x ∈ {1, 2, 3}] (x * x) yields 1 + 4 + 9 = 14.
The compound operator in this example is the SETL analog of the mathematical summation
Σ. The maximum of a set of integers ints can be computed by another compound operator
[max: x ∈ ints] x.
The maximum odd number in ints is
[max: x ∈ ints | (x // 2) eq 1] x.
Alternate forms of the compound operator are provided:
[op: min ≤ k ≤ max | C(k)] e(k)   (9a)
[op: max ≥ k ≥ min | C(k)] e(k).   (9b)
The meanings of these forms should be evident in the light of previous discussion.
Quantified expressions allow a concise formulation of search loops; compound operators
allow concise formulation of more general computation loops. As an example, we give an
expression which reverses the order of the characters in a string str (recall that "+" can
designate string concatenation)
[+: #str ≥ k ≥ 1] str(k).
This operator takes the last character of str, concatenates the next-to-last, and so on, thus
reversing the string.
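In present-day terms the compound operator is a fold (reduction) of a dyadic operator over a sequence of values. A minimal Python sketch follows, using `functools.reduce`; the wrapper name `compound` is hypothetical, and as a simplification it folds directly over the generated values instead of first collecting them into a set.

```python
from functools import reduce
import operator

def compound(op, values):
    """[op: ...] e(x): fold the dyadic operator op over the values e(x),
    in the order they are produced. reduce raises TypeError on empty input."""
    return reduce(op, values)

# [+: x ∈ {1, 2, 3}] (x * x)
sum_squares = compound(operator.add, (x * x for x in {1, 2, 3}))

ints = {4, 7, 10, 13}
# [max: x ∈ ints] x
largest = compound(max, ints)
# [max: x ∈ ints | (x // 2) eq 1] x
largest_odd = compound(max, (x for x in ints if x % 2 == 1))

# [+: #str ≥ k ≥ 1] str(k), iterating k downward to reverse the string
s = "setl"
reversed_s = compound(operator.add, (s[k - 1] for k in range(len(s), 0, -1)))
```

The descending range in the last line matters: with a non-commutative operator such as concatenation, the iteration order determines the result.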
More general compound operators are provided. The general form is
[op: x_1 ∈ s_1, x_2 ∈ s_2(x_1), ..., x_n ∈ s_n(x_1, ..., x_{n-1}) | C(x_1, ..., x_n)] e(x_1, ..., x_n)
which has an evident meaning. An example: to form the Cartesian product of sets s1 and s2,
we may write
[+: x ∈ s1, y ∈ s2] {<x, y>}.
We mentioned that no direct analog for tuples of the general set former is provided by SETL.
However, such a "tuple-former" can be defined in terms of the general compound operator
construction. If we wish to form a tuple of all elements x in a set s such that C(x) is true,
we may use the following construction:
[+: x ∈ s | C(x)] <x>.
(Recall that + designates concatenation when applied to tuples.)
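Both constructions, the Cartesian product and the tuple former, can be imitated in Python by folding set union or tuple concatenation over singleton values. This is a sketch by analogy; the sample sets are illustrative assumptions.

```python
from functools import reduce
import operator

s1 = {1, 2}
s2 = {"a", "b"}

# [+: x ∈ s1, y ∈ s2] {<x, y>}: union of singleton sets of pairs
product = reduce(operator.or_, ({(x, y)} for x in s1 for y in s2))

# [+: x ∈ s | C(x)] <x>: concatenation of one-component tuples
s = {1, 2, 3, 4, 5, 6}
evens = reduce(operator.add, ((x,) for x in s if x % 2 == 0))
```

Since s is a set, the order of components in `evens` follows whatever order the set is traversed in; only its contents are determined.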

C. Tabular functions
In set theory, a function f is defined as a set of ordered pairs <x, y>, where x is an element
of the domain of f and y is the corresponding element of the range. SETL allows sets of
pairs to be used as "tabular functions" in just this way.
In particular, SETL allows any set f of ordered pairs to be used as a mapping or relation.
If the first component of an ordered pair in f defines the pair uniquely, then f is single
valued; otherwise f is multiple valued. Let us first consider single-valued functions. As an
example, we take
f = {<1, 1>, <2, 4>, <3, 9>}.
Then f(2) designates the second component of the unique pair in f whose first element is 2,
so that f(2) yields 4. Similarly
f(1) yields 1 and f(3) yields 9.
What happens when we try to evaluate f(4)? Since there is no pair in f with 4 as its first
component, the result is the SETL undefined atom Ω.
Tabular functions are sets, and, like all other sets, can be modified. Suppose we wish to
modify the function f so as to make the image of 4 under f be 16. This could be done by
the assignment (see Section IV)
f = f + {<4, 16>}
which adds the desired pair to the set. However, SETL allows the diction
f(4) = 16
to be used with the same effect. This same construction can be used to change the value of f
for elements already in the domain of f. For example, the assignment
f(3) = 10
would change f to
{<1, 1>, <2, 4>, <3, 10>}.
In summary, the assignment f(x) = y causes the following actions to take place: if there are
any pairs in f with x as their first component, these pairs are deleted; then <x, y> is added
to f. The special construction
f(x) = Ω
causes all pairs whose first component is x to be removed from f.
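The retrieval and assignment rules for single-valued tabular functions can be modeled on a plain set of pairs. In the Python sketch below, `None` stands in for Ω, and the function names `apply` and `assign` are hypothetical; `assign` returns a new set rather than updating in place, which is a simplification.

```python
OMEGA = None  # stand-in for the undefined atom Ω

def apply(f, x):
    """f(x): second component of the unique pair starting with x, else Ω."""
    images = [b for (a, b) in f if a == x]
    return images[0] if len(images) == 1 else OMEGA

def assign(f, x, y):
    """f(x) = y: delete every pair starting with x, then add <x, y>.
    Assigning Ω only deletes. Returns the updated set of pairs."""
    g = {(a, b) for (a, b) in f if a != x}
    if y is not OMEGA:
        g.add((x, y))
    return g

f = {(1, 1), (2, 4), (3, 9)}
f = assign(f, 4, 16)      # f(4) = 16
f = assign(f, 3, 10)      # f(3) = 10
g = assign(f, 4, OMEGA)   # f(4) = Ω removes the pair for 4
```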
We pointed out in the Introduction that SETL encourages one to view data abstractly,
i.e. as a collection of objects and of mappings defined on those objects. Tabular functions
are a prime mechanism supporting this approach. For example, consider the mini-tree

[Figure: a binary tree with root A; A has left descendant B and right descendant C; B has left descendant D and right descendant E.]

This tree can be represented in SETL by the set of its nodes {A, B, C, D, E} and by two
functions: lson, which maps nodes onto their left descendants; and rson, which maps nodes
onto their right descendants. In the specific case considered we would have
lson = {<A, B>, <B, D>}
rson = {<A, C>, <B, E>}.
To add the node F to the left of node C we write
lson(C) = F
and add F to the set of nodes. To delete node E from the tree, we remove it from the set
of nodes and write
rson(B) = Ω.
Tabular functions can be also used to build up control structures. An example is the switch.
Suppose l_1, l_2, ..., l_n are labels, and that sw is defined as
{<1, l_1>, <2, l_2>, ..., <n, l_n>}.
Then
go to sw(i)
causes a transfer of control to the ith label l_i.



SETL supports several significant extensions of the basic tabular function construction.
Suppose f is a function and s is a set; then
f[s]
denotes the set of all images of elements of s under f, i.e. denotes the set
{f(x), x ∈ s}.
Thus if
f = {<1, 1>, <2, 4>, <3, 9>}
then
f[{1, 2}] is {1, 4}.
The square-bracket notation can be used with programmed functions and standard opera-
tors as well. For example, the domain of a function f may be calculated by
hd[f]
and its range by
f[hd[f]]
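The square-bracket (image) notation and the domain and range computations can be sketched in Python over the same set-of-pairs representation. The function names `image`, `domain` and `range_of` are hypothetical.

```python
def image(f, s):
    """f[s]: the set of images under f of the elements of s."""
    return {b for (a, b) in f if a in s}

def domain(f):
    """hd[f]: apply hd to every pair of f, giving the first components."""
    return {pair[0] for pair in f}

def range_of(f):
    """f[hd[f]]: the image of the whole domain."""
    return image(f, domain(f))

f = {(1, 1), (2, 4), (3, 9)}
img = image(f, {1, 2})   # f[{1, 2}]
dom = domain(f)
rng = range_of(f)
```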
SETL also supports an extension of standard functional notation which allows multi-
valued functions to be used conveniently. Suppose that f is a set of ordered pairs defining
a multi-valued function. Then f(x) is only defined if there exists a unique pair p in f with x
its first component; in cases of nonuniqueness, the value of f(x) is Ω. However, in all cases,
the notation f{x} denotes the set of all images of x under f. Thus if
f = {<1, 1>, <1, 2>, <2, 3>, <2, 4>}
then
f{2} yields {3, 4}.
The assignment
f{x} = s;
where s is a set, deletes from f all pairs in f with x as their first component and adds the
set of pairs
{<x, y>, y ∈ s}
to f. The square-bracket notation can also be used with multivalued maps: if s is a set, f[s]
is the set of all images of elements of s under f, i.e. is the set
[+: x ∈ s] f{x}.
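The multi-valued notations can be modeled in the same way. In this Python sketch, `None` stands in for Ω and the function names are hypothetical; the sample f is the four-pair map used in the example above, read as {<1, 1>, <1, 2>, <2, 3>, <2, 4>}.

```python
OMEGA = None  # stand-in for the undefined atom Ω

def images(f, x):
    """f{x}: the set of all images of x under f."""
    return {b for (a, b) in f if a == x}

def apply(f, x):
    """f(x): defined only when x has exactly one image, otherwise Ω."""
    ys = images(f, x)
    return next(iter(ys)) if len(ys) == 1 else OMEGA

def assign_images(f, x, s):
    """f{x} = s: drop all pairs starting with x, then add <x, y> for y in s."""
    return {(a, b) for (a, b) in f if a != x} | {(x, y) for y in s}

def image_of_set(f, s):
    """f[s] = [+: x ∈ s] f{x}: union of the image sets."""
    return set().union(*(images(f, x) for x in s))

f = {(1, 1), (1, 2), (2, 3), (2, 4)}
```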
Multivalued maps are useful in such application areas as graph theory. The directed graph

[Figure: a directed graph on the nodes A, B, C, D, E.]

can be represented by the set of its nodes {A, B, C, D, E} and by an immediate successor
map, cesor, where
cesor = {<A, B>, <A, C>, <A, D>, <B, D>, <B, E>, <C, D>, <C, A>, <E, A>}.
The set of all direct successors of B in the graph is then given by
cesor{B},
which yields {D, E}.
Tabular maps with more than one argument (i.e. sets of tuples of length greater than 2)
are also provided in SETL. Suppose "map" is a set of ordered triples. If there exists a
unique triple in map whose first two components are a and b, the value of the third com-
ponent of this triple is computed by
map(a, b).
If such a unique triple does not exist, the value of this expression is Ω. An alternative con-
struction yielding the same value is
(map{a})(b).
Note that map{a} yields the set of ordered pairs which are tails of triples beginning with a,
and that map{a} may itself be used as a tabular function. Even if map is a multi-valued
function of two variables, the set of third components of triples beginning with a and b
can be written
map{a, b}.
Alternatively we may write
(map{a}){b}
to obtain the same value. We note finally that the square-bracket notation
map[s1, s2]
yields the set of third components of triples in map whose first component is a member of
s1 and whose second component is a member of s2.
Extensions of these notations to functions of more than two variables are straightforward.
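For the two-argument case, the notations map{a}, map{a, b}, map(a, b) and map[s1, s2] can be sketched in Python over a set of triples. The function names and the sample data `m` are hypothetical, and `None` again stands in for Ω.

```python
OMEGA = None  # stand-in for the undefined atom Ω

def tails(m, a):
    """map{a}: the pairs that are tails of triples beginning with a."""
    return {t[1:] for t in m if t[0] == a}

def images2(m, a, b):
    """map{a, b}: third components of the triples beginning with a, b."""
    return {t[2] for t in m if t[0] == a and t[1] == b}

def apply2(m, a, b):
    """map(a, b): the third component if the triple is unique, else Ω."""
    zs = images2(m, a, b)
    return next(iter(zs)) if len(zs) == 1 else OMEGA

def image2(m, s1, s2):
    """map[s1, s2]: third components with first in s1 and second in s2."""
    return {t[2] for t in m if t[0] in s1 and t[1] in s2}

# Illustrative data only: single-valued at (1, 2) and (1, 3), multi-valued at (2, 2).
m = {(1, 2, "p"), (1, 3, "q"), (2, 2, "r"), (2, 2, "s")}
```

Here `tails(m, 1)` is itself a set of pairs, so it can be treated as a one-argument tabular function, which is the point of the (map{a})(b) form.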

IV. STATEMENTS

Many SETL statement forms resemble those of ALGOL or PL/I. Assignments, transfer
of control, subroutine calls, etc. are all provided. In addition, SETL provides iteration
forms which allow iterations over sets to be written easily. We will now describe the four
basic statement types provided. Note that all statements in SETL are terminated by a
semicolon.

A. Assignments

Assignment statements in SETL have the form:


lexpr = rexpr;
where rexpr is any expression which yields a value and lexpr is an expression which yields
an object to which assignment can be made. SETL allows fairly general constructions to be
used on the left-hand side of assignments, and we will not attempt to give a complete des-
cription of those constructions here (such a description is found in Ref. [1]). Instead, we will
present a few examples. A name may appear on the left, as in
x = {1, 2, 3, 4}.
We have already seen that the form f(x), where f is a tabular function, may appear on the
left:
f(x) = y.
If t is a tuple whose value is <1, 2, 3>, then
hd t = 4
is legal, and changes t to <4, 2, 3>. Similarly
t(2) = 4
is allowed, and changes t to <1, 4, 3>. Operators which can appear on the left may be
compounded; for example
hd tl t = 4
is allowed, and happens to have the same effect as t(2) = 4. Conditionals may appear on the
left, as in
if b then x else y = expr.
"Slices" of strings may also appear, as in
string(i: s) = pattern.
An interesting form of assignment can be achieved by using tuples on the left. Consider
<x1, x2> = <y1, y2>.
This has the effect of performing the two assignments x1 = y1 and x2 = y2 in parallel. Thus,
the values of x and y can be interchanged by
<x, y> = <y, x>.
If t is a tuple, then
<x, y> = t
means x = hd t; y = tl t.


From these examples, one can see that a rule of thumb is: if an expression seems sensible
on the left of an assignment then it may appear there. Of course this is no substitute for
the explicit discussion of left-hand sides given in Ref. [1].
Assignments can be made as side effects of the evaluation of expressions by use of the
special SETL operator is. The form of such an assignment is
expr is name
where expr is the expression to be evaluated and name is the variable to which the assign-
ment is made. The value of this expression is the value of expr; in addition, it assigns this
value to the variable name.
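A loosely comparable facility in Python (version 3.8 or later) is the assignment expression written with `:=`. The sketch below is an analogy only; note that Python places the name on the left of the operator, whereas the SETL form places it on the right, and the names `values`, `n` and `message` are illustrative.

```python
values = [3, 1, 4, 1, 5]
message = ""

# The expression len(values) is evaluated, bound to n, and its value is then
# used in the comparison, much as "expr is name" yields the value of expr.
if (n := len(values)) > 3:
    message = f"{n} values"
```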

B. Transfer of control
SETL provides a "go to" statement (although because of the generality of the other
control structures of SETL this statement is not often needed). The form of the go to state-
ment is
go to labelexpr ;
where labelexpr is an expression which produces a label as its value.
Another type of control transfer statement, which is associated with subroutines and
functions, is the "return" statement. (Subprocedure calls were discussed in Section II.)
These have the form
return; (10)
and
return expr;. (11)
The first form is used for returning from subroutines; the second is used to return from a
function, and delivers the value yielded by expr as the function value.

C. Conditional statements
SETL provides an ALGOL-like conditional statement of the form
if bool_1 then block_1 else if bool_2 then block_2 ... else block_n;
which may also have the slightly simpler form
if bool_1 then block_1 else if bool_2 then block_2 ... else if bool_{n-1} then block_{n-1};

Here bool_1, ..., bool_{n-1} must be Boolean expressions; each of block_1, ..., block_n is an
arbitrary sequence of valid SETL statements (followed by semicolons), possibly including
go to statements and other if statements.
Each statement block appearing in an if-statement, with the exception of the last such
statement block, is terminated by the occurrence of the next following keyword "else" or
"then". The last block is terminated by a semicolon. Since the last statement of the last
block will itself be terminated by a semicolon, the visible sign of an if-statement termination
will often be a double semicolon, as in
if x gt 0 then sum = sum + x;;. (12)
This convention is adequate for short if-statements, but may cause confusion in longer ones,
especially if they are nested. Therefore, several alternative terminators are provided,
specifically
end if;
or
end if tokens;
where "tokens" can be the first few tokens following the "if" opening the conditional
statement being terminated. For example, statement (12) above might also be written
if x gt 0 then sum = sum + x ; end if x ;
or
if x gt 0 then sum = sum + x; end if x gt 0;.
SETL also has an interesting conditional statement called the "flow" statement. This
statement, which we will not describe here, provides a flowchart-like, two-dimensional
syntax for the description of complex conditional sequences of actions.

D. Iteration
Two forms of iterators are allowed in SETL : set-theoretic iterators and "while" iterators.
Set-theoretic iterators. The basic form of a set-theoretic iteration statement is
(∀x ∈ s | C(x)) block;
this has the effect of executing the collection of statements block once for each of those
elements x of the set s for which the (optional) Boolean condition C(x) is true. For example,
if ints is a set of integers, then
(∀x ∈ ints | x//2 ne 0) sum = sum + x;;
will cause all odd integers in the set ints to be added to sum. As in the case of the if-statement,
the extra semicolon terminating the scope of an iterator may be replaced by
end; or end ∀tokens;
so that the above loop could also appear as
(∀x ∈ ints | x//2 ne 0) sum = sum + x; end ∀x;
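As a rough present-day analogue, a set iterator with a Boolean restriction corresponds to a Python loop over a set with a filter. This is a sketch for comparison only; the sample set and the names `total` and `total_expr` are illustrative, and `%` plays the role of the SETL remainder operator `//`.

```python
ints = {1, 2, 3, 4, 5, 7}

# Statement form: visit each element, act only on those passing the condition.
total = 0
for x in ints:
    if x % 2 != 0:
        total += x

# Expression form of the same restricted iteration.
total_expr = sum(x for x in ints if x % 2 != 0)
```

As in SETL, the order in which the elements of the set are visited is not specified, which is harmless here because addition is commutative.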
Iterators involving alternate forms of range restriction are available, e.g.
(min ≤ ∀k ≤ max | C(k))
(max ≥ ∀k ≥ min | C(k)).
Similarly, multiple range restrictions (of either of the types shown above) may be used; the
general form of the set-theoretic iterator is
(∀x_1 ∈ s_1, x_2 ∈ s_2(x_1), ..., x_n ∈ s_n(x_1, ..., x_{n-1}) | C(x_1, ..., x_n)) block;
Example: Let "triang" be a triangular matrix of size n (represented by a tabular function
of two variables). The elements are stored in the upper right half of the matrix, so that
column > row for each non-zero element. The following loop sets all entries in the matrix
to zero.
(1 ≤ ∀i ≤ n, i ≤ ∀j ≤ n) triang(i,j) = 0; end ∀i;
While-iterators. The elementary form of this type of iteration is
(while C) block;.

This executes block repeatedly until the Boolean condition C becomes false. The termin-
ating semicolon may be replaced by
end;
or by
end while;
or by
end while tokens;

where "tokens" is used in the same sense as above. For example, the following (quite
inefficient) while-loop takes the character string str and produces its reversal backstr.

backstr = nulc;
(while str ne nulc)
    backstr = backstr + str(#str);
    str = str(1: #str-1);
end while str;

In order to improve the readability of while loops, SETL allows bookkeeping operations
to be moved from the end of such a loop to a position near the loop header. This form of
loop has the following appearance:

(while C doing blocka) blockb;.

Using this notation, the above example can be recast as follows.

backstr = nulc;
(while str ne nulc doing str = str(1: #str-1);)
    backstr = backstr + str(#str);
end while str;
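For comparison, the same deliberately inefficient reversal can be sketched in Python. The function name `reverse_string` is illustrative. Python has no counterpart of the `doing` clause, so the bookkeeping step stays at the end of the loop body.

```python
def reverse_string(s: str) -> str:
    """Reverse s by repeatedly moving its last character onto the result."""
    backstr = ""
    while s != "":
        backstr = backstr + s[-1]   # append the last character of s
        s = s[:-1]                  # bookkeeping: drop that character from s
    return backstr
```

Each pass copies both strings, so the running time grows quadratically with the length of the input, which is the inefficiency the text alludes to.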

Iteration escapes. In order to avoid proliferating go to statements and labels within
iterative statements, SETL provides two types of iteration escapes.
The "quit" statement, which may be used with either type of iterative statement, may
have any of the following forms:

(1) quit;
(2) quit while;
(3) quit ∀tokens;
(4) quit while tokens ;
Form (1) causes a transfer to the first statement outside the range of the innermost iterative
statement; form (2) causes an escape from the innermost while iteration; form (3) causes an
escape from the innermost ∀ iteration whose first few tokens match the tokens following
quit; form (4) is the analog of (3) for while loops.
The "continue" statement, which has forms like those of the quit statement, causes a
transfer not out of but to the end of the range of an iterative statement. Thus it allows a
single iteration to be bypassed without terminating the iterative loop itself.
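The simplest forms of these escapes resemble `break` and `continue` in Python, as the sketch below suggests. The data and the name `odd_total` are illustrative. Python offers no direct counterpart of the token-matching forms, which select an enclosing loop other than the innermost one.

```python
odd_total = 0
for x in [4, 9, 15, 22, 7]:
    if x % 2 == 0:
        continue        # like "continue;": bypass the rest of this iteration
    if x > 10:
        break           # like "quit;": leave the innermost loop entirely
    odd_total += x
```

Here 4 is skipped, 9 is added, and the loop is abandoned on reaching 15, so the later elements are never examined.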

V. TWO ELEMENTARY EXAMPLES OF THE USE OF SETL

Before proceeding to give more extensive examples in the next section, we shall give a
few very simple functions which will show how various types of applications might be
treated in SETL, and which will also illustrate some of the language features we have been
discussing.
Our first example involves character manipulation; specifically, text editing. Suppose
we have a line of text and we wish to replace the first instance of a certain substring called
"old" with the string "new". The function returns the edited line as its value.
definef replace(line, old, new);
    /* search line for the old string */
    if (1 ≤ ∃k ≤ #line - #old + 1 | line(k: #old) eq old) then
        return (line(1: k - 1) + new + line(k + #old:));
    else /* no editing */ return line;
end replace;
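A Python rendering of the same idea is sketched below for comparison; the name `replace_first` is illustrative. It assumes the existential search is meant to deliver the leftmost match, and it uses 0-based slicing where SETL uses a 1-based start position and a length.

```python
def replace_first(line: str, old: str, new: str) -> str:
    """Return line with the first occurrence of old replaced by new."""
    # Try every start position at which old could still fit inside line.
    for k in range(len(line) - len(old) + 1):
        if line[k:k + len(old)] == old:
            return line[:k] + new + line[k + len(old):]
    return line  # no occurrence found: no editing
```

For instance, `replace_first("the cat sat", "cat", "dog")` yields `"the dog sat"`.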
Next we give a simple example from number theory, specifically a function which, given
an integer n, returns the tuple of its prime factors in increasing order.
definef primefacts(n);
    facts = nult; /* start with null vector */
    m = n;
    (while m gt 1)
        factor = if (2 ≤ ∃k < m | (m//k) eq 0) then k else m;
        /* if this finds a factor, attach it to the tuple of factors, and divide m by the new
           factor */
        facts(#facts + 1) = factor; m = m/factor;
    end while;
    return facts;
end primefacts;
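The factorization routine translates almost line for line into Python. The sketch below assumes that the existential search yields the smallest qualifying k, which is what makes the factors come out in increasing order; `next` with a default value stands in for the conditional expression.

```python
def primefacts(n: int) -> list[int]:
    """Return the prime factors of n in increasing order, with multiplicity."""
    facts = []          # start with the empty list
    m = n
    while m > 1:
        # Smallest k in [2, m) dividing m; if there is none, m itself is prime.
        factor = next((k for k in range(2, m) if m % k == 0), m)
        facts.append(factor)
        m //= factor
    return facts
```

For example, `primefacts(60)` gives `[2, 2, 3, 5]`. Like the SETL original, this restarts the search from 2 after every factor, so it favours clarity over speed.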

VI. THE ABSTRACT REPRESENTATION OF ALGORITHMS


The best way to illustrate the important characteristic features of a language is often to
show its application to some well-known problems. With this intent, we give additional
examples of the use of our set-theoretic language.

A. Knuth's topological sort


James Morris has used this well known problem (see Ref. [5]) to illustrate the Los Alamos
language MADCAP (see Ref. [2]); we recast it in SETL. The problem is this: given a
partially ordered set, produce a total order on the set which respects the partial order.
That is, if item1 precedes item2 in the partial order then item1 must precede item2 in the
imposed total order.

The SETL algorithm that we shall describe assumes that the partial order is represented
by a set of ordered pairs "partord". A pair <a, b> belongs to partord if a precedes b in the
partial order.
Knuth defines the method to be employed:

"There is a very simple way to do topological sorting: We start by taking an object which is not
preceded by any other object in the ordering. This object may be placed first in the output. Now we
remove this object from the set s. The resulting set is again partially ordered, and the process can be
repeated until the whole set has been sorted."

In implementing this method we use two functions, which respectively produce the
domain (dom) and range (range) of a tabular function f.
definef dom(f); return hd[f]; end dom;
definef range(f); return f[dom(f)]; end range;
The set of all nodes in the partial order can then be computed by
nodeset = dom(partord) + range(partord);.
After initializing the output tuple order, we loop through nodeset, selecting an element
which is not preceded by any other node in the partial order. Such an element will always
exist if nodeset is not empty. Then the element x is placed at the end of order and eliminated
from nodeset.
This procedure is represented by the following code.
definef topsort(partord);
    /* create nodeset and initialize order */
    nodeset = dom(partord) + range(partord);
    order = nult;
    /* loop while there are elements in nodeset, choosing a maximal element */
    (while ∃x ∈ nodeset | (∀y ∈ nodeset | not (<y, x> ∈ partord)))
        /* the existential test will also find x */
        nodeset = nodeset - {x};
        /* add x to order */
        order = order + <x>;
    end while;
    return order;
end topsort;
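The abstract algorithm can be mirrored quite directly in Python, as in the sketch below. The name `topsort_naive` is illustrative, and partord is assumed to be a set of (a, b) pairs. As in the SETL version, each pass rescans the whole of nodeset, and the loop simply stops if no predecessor-free element remains.

```python
def topsort_naive(partord):
    """Order the nodes of partord, a set of (a, b) pairs meaning a precedes b."""
    nodeset = {a for a, _ in partord} | {b for _, b in partord}
    order = []
    while True:
        # Elements of nodeset with no predecessor still in nodeset.
        candidates = [x for x in nodeset
                      if all((y, x) not in partord for y in nodeset)]
        if not candidates:
            break
        x = candidates[0]
        nodeset.remove(x)
        order.append(x)
    return order
```

When several elements qualify at once, which one is chosen depends on set iteration order, so different runs may return different valid orderings.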

Upon reading this code, which is clearly faithful to the Knuth description, one is im-
mediately suspicious. It seems too easy; there must be some hidden inefficiencies. In fact
there are, but for someone who is writing a one-shot program this program would suffice.
However, if one were designing a production system using SETL, one would want to
recognize and eliminate sources of algorithmic inefficiency. The next step therefore is to
recast the above algorithm in a form allowing a more efficient representation in a language
such as ALGOL-68.
The operation in the above algorithm which implies inefficiency is the repeated search
of nodeset to find elements with no predecessor. One way of eliminating this inefficiency
is to introduce an auxiliary data structure which allows maximal elements of nodeset to be
detected more rapidly. In that context, it is useful to maintain a table "count" always
showing the number of elements in nodeset which are predecessors of a given element.
In updating count each time an element is removed from nodeset, we will detect new
maximal elements as cases in which count has become zero. The improved algorithm has
the following SETL form:

definef topsort(partord);
    /* create nodeset and initialize order */
    nodeset = dom(partord) + range(partord);
    order = nult;
    /* initialize count to zero */
    count = nl; (∀x ∈ nodeset) count(x) = 0;;
    /* now build true count of number of predecessors */
    (∀p ∈ partord) count(p(2)) = count(p(2)) + 1;;
    /* initialize outset to be those nodes whose count is zero */
    outset = {x ∈ nodeset | count(x) eq 0};
    /* loop while there are nodes to be output */
    (while outset ne nl) x from outset;
        order = order + <x>;
        /* update counts of successors */
        (∀s ∈ partord{x})
            count(s) = count(s) - 1;
            /* add node to outset if count goes to 0 */
            if count(s) eq 0 then
                outset = outset + {s};
            end if;
        end ∀s;
    end while;
    return order;
end topsort;
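A Python sketch of the count-based refinement follows. The name `topsort_counts` and the auxiliary dictionary `successors` are illustrative; the dictionary is built once and plays the part of the SETL expression partord{x}. As before, partord is assumed to be a collection of (a, b) pairs.

```python
def topsort_counts(partord):
    """Topological sort that tracks, for each node, its remaining predecessor count."""
    nodeset = {a for a, _ in partord} | {b for _, b in partord}
    count = {x: 0 for x in nodeset}          # number of predecessors of each node
    successors = {x: [] for x in nodeset}    # successors[x] stands in for partord{x}
    for a, b in partord:
        count[b] += 1
        successors[a].append(b)
    outset = {x for x in nodeset if count[x] == 0}
    order = []
    while outset:
        x = outset.pop()                     # like "x from outset"
        order.append(x)
        for s in successors[x]:
            count[s] -= 1
            if count[s] == 0:                # s has no remaining predecessors
                outset.add(s)
    return order
```

Each pair of the partial order is now examined a bounded number of times, rather than once per pass over nodeset.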

In realizing this algorithm in a language of the level of ALGOL-68, one will want to
eliminate all uses of sets from it. In doing so, one will have to proceed down one of two
paths, barely distinguishable at the SETL level but quite different in most non-set theoretic
languages (SNOBOL is an exception). If the elements of nodeset are represented as integers
filling a range from 1 to some highest n, it is easy to eliminate all uses of sets. For example,
one can represent "partord" by an array (i.e. tuple) of lists, the kth of which gives all the
successors of node k. Then count can be represented by an array of n integers, outset by a
list, and order by a tuple. On the other hand, if the elements of nodeset are not a tightly
clustered collection of integers, one may choose either to issue clustered serial numbers
to the elements of nodeset (i.e. to use a hash technique); or to keep the elements x of nodeset
together with the associated quantities count(x) on a list or in a sorted array; or, if the
elements of nodeset are pointers to areas in which an integer may be stored, to keep elements
on a list and to store count(x) in the storage area referenced by x.
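The first of these options, issuing serial numbers through a hash technique, might look as follows in Python, whose dictionaries are hash tables. This is a sketch under stated assumptions: the function name `to_successor_lists` is hypothetical, partord is taken to be a collection of (a, b) pairs, and serial numbers are issued in the order in which nodes are first encountered.

```python
def to_successor_lists(partord):
    """Number the nodes 1..n and return (serial, successors).

    successors[k - 1] is the list of serial numbers of the successors of node k.
    """
    serial = {}                                  # node -> serial number
    for pair in partord:
        for node in pair:
            if node not in serial:
                serial[node] = len(serial) + 1
    successors = [[] for _ in serial]
    for a, b in partord:
        successors[serial[a] - 1].append(serial[b])
    return serial, successors
```

The result is the array-of-lists representation of the partial order in which every node is a small integer.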
In the following code we make the simplest assumption, namely that nodeset is the set
of integers from 1 to # nodeset. The code illustrates one method for handling lists in SETL.
We use a LISP-like representation, in which a list (a1, a2, ..., an) is a tuple <a1, ..., an>.
The LISP CAR is then our hd; the LISP CDR is our tl.

definef topsort(partord);
    /* partord is a tuple whose kth element is a list of all the successors of node k */
    /* initialize order, and create the array count */
    order = nult; count = nult;
    /* initialize count to zero */
    (1 ≤ ∀k ≤ #partord) count(k) = 0;;
    /* now build true count of the number of predecessors of each node */
    (1 ≤ ∀k ≤ #partord) list = partord(k);
        (while list ne nult doing list = tl list;) count(hd list) = count(hd list) + 1;
        end while;
    end ∀k;
    /* let outset be the list of all those elements whose count is zero */
    outset = nult;
    (1 ≤ ∀k ≤ #partord | count(k) eq 0) outset = <k, outset>;;
    /* loop while there are nodes to be output */
    (while outset ne nult)
        x = hd outset; outset = tl outset;
        order = order + <x>;
        /* update counts of successors */
        list = partord(x);
        (while list ne nult doing list = tl list;)
            s = hd list;
            count(s) = count(s) - 1;
            /* add node to outset if count goes to 0 */
            if count(s) eq 0 then
                outset = <s, outset>;
            end if;
        end while list;
    end while outset;
    return order;
end topsort;
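The set-free version can be sketched in Python with an array of counts and a LISP-like list for outset. The representation chosen here is an assumption made for illustration: a list is either None or a pair (head, tail), and partord[k - 1] is an ordinary Python list holding the successors of node k. The name `topsort_lists` is illustrative.

```python
def topsort_lists(partord):
    """Topological sort for nodes 1..n; partord[k - 1] lists the successors of node k."""
    n = len(partord)
    count = [0] * (n + 1)              # count[k] = predecessors of k; index 0 unused
    for succs in partord:
        for s in succs:
            count[s] += 1
    outset = None                      # empty LISP-like list
    for k in range(1, n + 1):
        if count[k] == 0:
            outset = (k, outset)       # push k onto the front
    order = []
    while outset is not None:
        x, outset = outset             # x = hd outset; outset = tl outset
        order.append(x)
        for s in partord[x - 1]:
            count[s] -= 1
            if count[s] == 0:
                outset = (s, outset)
    return order
```

Because outset behaves as a stack, `topsort_lists([[2, 3], [4], [4], []])` returns `[1, 3, 2, 4]`.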

B. Tree construction from node lists


This is another problem discussed by Knuth. Suppose we have lists of the nodes of a
tree in preorder and postorder. Our problem is to construct the tree from these lists. For
example, the two lists
preorder    postorder

A           G
B           D
D           H
G           B
H           E
E           A
C           C
F           F
result in the tree:

From this example we can see three things:


(1) the first element in the preorder list will always be the root (head) of the tree;
(2) in the postorder list, every node which precedes the root is in the left-hand subtree of
the given tree and every node which follows the root is in its right-hand subtree;
(3) in the preorder list, all the nodes of the left-hand subtree precede all the nodes of the
right-hand subtree.
Thus we can
(i) locate the head in the postorder list;
(ii) separate the postorder list p into two parts p1 and p2,
p1 being the part of p preceding the head, and
p2 being the part of p following the head;
(iii) separate the preorder list into two parts q1 and q2,
q1 following the head and being equal in length to p1,
q2 following q1 and being equal in length to p2;
(iv) use (p1, q1) to construct the left-hand subtree of the desired tree t and (p2, q2) to construct
t's right-hand subtree. Finally we can attach these two subtrees to the "head" node
to obtain t.
The following SETL subroutine "buildtree" uses exactly this procedure. Its "input"
arguments are two tuples "prelist" and "postlist" which contain the nodes in appropriate
orders. It has also two "output" arguments "lson" and "rson" which, on return from
buildtree, will be maps defining left descent and right descent in the desired tree. When
buildtree (which is recursive) is initially called, the lson and rson arguments should be
supplied as nl, nl.
define buildtree(prelist, postlist, lson, rson);
    if (#prelist) eq 0 then /* tree lacks all nodes so */ return;
    head = prelist(1);
    /* now we use a SETL existential as a locator */
    must = 1 ≤ ∃n ≤ #postlist | postlist(n) eq head;
    p1 = prelist(2: n - 1); p2 = prelist(n + 1:);
    q1 = postlist(1: n - 1); q2 = postlist(n + 1:);
    /* now build left-hand and right-hand subtree */
    buildtree(p1, q1, lson, rson); buildtree(p2, q2, lson, rson);
    /* finally make the heads of these two trees descendants of head if they are nontrivial */
    lson(head) = if n eq 1 then Ω else prelist(2);
    rson(head) = if n eq #prelist then Ω else prelist(n + 1);
    return;
end buildtree;
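The recursive construction can be sketched in Python as below, for comparison. Two points are assumptions of this sketch rather than statements from the paper: an absent child is recorded as None, and node labels are taken to be distinct. Note also that the traversal the text calls "postorder" (left subtree, then root, then right subtree) is what is now usually called inorder; the parameter keeps the paper's name.

```python
def buildtree(prelist, postlist, lson, rson):
    """Fill the dicts lson and rson from the two traversal lists of a binary tree."""
    if not prelist:
        return                              # tree has no nodes
    head = prelist[0]
    n = postlist.index(head)                # 0-based position; also size of left subtree
    p1, p2 = prelist[1:n + 1], prelist[n + 1:]
    q1, q2 = postlist[:n], postlist[n + 1:]
    buildtree(p1, q1, lson, rson)           # left-hand subtree
    buildtree(p2, q2, lson, rson)           # right-hand subtree
    lson[head] = p1[0] if p1 else None
    rson[head] = p2[0] if p2 else None
```

Called with the two example lists and two empty dictionaries, this records B and C as the left and right descendants of A, D and E as those of B, and G and H as those of D.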
We may now make the following remarks concerning the optimization of the foregoing
algorithm (at the abstract level):
(a) All the vectors p_i, q_i which are formed are subvectors of the originally given prelist and
postlist. Thus it is not really necessary to form these vectors; we need merely use two
integers to indicate their upper and lower limits.
(b) If we make available a mapping which gives the position, in postlist, of each of the
items appearing in prelist, then the existential search carried out in the above algorithm
(which is its most time-consuming inner loop) may be omitted. Such a mapping is
easily computed.
(c) With the buildtree algorithm revised as suggested by (a) and (b), we need merely trans-
mit to buildtree the starting and finishing indices of the prelist sublist representing a
subtree to be constructed, and the starting index of the corresponding subpart of
postlist. The tuples prelist and postlist themselves, as well as the maps lson and rson
which are being built up, can be transmitted globally.
Making these revisions, we obtain an optimized algorithm, which should be called in
the following way:
postinverse = {<postlist(n), n>, 1 ≤ n ≤ #postlist};
lson = nl; rson = nl;
buildtree(1, #prelist, 1);
The revised buildtree algorithm itself has the following form:
define buildtree(prestart, preend, poststart);
    if preend lt prestart then /* tree has no nodes so */ return;
    end if;
    /* now find relative position in postlist of subtree head */
    n = postinverse(prelist(prestart)) - poststart;
    ps1 = prestart + 1; pe1 = prestart + n; qs1 = poststart;
    ps2 = pe1 + 1; pe2 = preend; qs2 = poststart + n + 1;
    /* now build left-hand and right-hand subtree */
    buildtree(ps1, pe1, qs1); buildtree(ps2, pe2, qs2);
    /* finally, make the heads of these two trees descendants of head, if they are nontrivial */
    head = prelist(prestart);
    lson(head) = if ps1 gt pe1 then Ω else prelist(ps1);
    rson(head) = if ps2 gt pe2 then Ω else prelist(ps2);
    return;
end buildtree;
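An index-based version in the same spirit is sketched below in Python. Several choices are assumptions of the sketch: the wrapper name `build_tree_indexed` is hypothetical, indices are 0-based and inclusive, None marks an absent child, node labels are distinct, and the shared data are captured by a nested function instead of being passed globally. A dictionary serves as postinverse.

```python
def build_tree_indexed(prelist, postlist):
    """Return (lson, rson) without copying any sublists."""
    postinverse = {node: i for i, node in enumerate(postlist)}
    lson, rson = {}, {}

    def build(prestart, preend, poststart):
        if preend < prestart:
            return                                   # subtree has no nodes
        head = prelist[prestart]
        n = postinverse[head] - poststart            # size of the left subtree
        ps1, pe1, qs1 = prestart + 1, prestart + n, poststart
        ps2, pe2, qs2 = pe1 + 1, preend, poststart + n + 1
        build(ps1, pe1, qs1)                         # left-hand subtree
        build(ps2, pe2, qs2)                         # right-hand subtree
        lson[head] = prelist[ps1] if ps1 <= pe1 else None
        rson[head] = prelist[ps2] if ps2 <= pe2 else None

    build(0, len(prelist) - 1, 0)
    return lson, rson
```

Each node is now handled with a constant number of dictionary lookups and index computations, in place of a linear search and four slice copies.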
At this point, it is not hard to select data structures appropriate to an efficient low-level
implementation of this algorithm. The crucial structure is that representing postinverse.
The use in our abstract algorithm of a SETL map is best imitated in a low level language by
using a hash table, which allows the postlist position of a node to be retrieved rapidly if the
node is given.
If this approach is used, then the SETL statement forming postinverse will be represented
by a lower level procedure which inserts tree nodes and their positions into the hash table.
If the nodes happen to be numbered from 1 to k, things are even simpler, since in this case
postinverse can simply be an array of length k, and can be initialized simply by putting n
in location postlist(n) for all 1 ≤ n ≤ k.

VII. SUMMARY

We have described the semantic constructs available in the set-theoretic language (SETL)
and have presented enough of its syntax to give the flavor of the language. Set-theoretic
languages of this kind are powerful tools for attacking complex problems because they
allow efficiency-related considerations (for example, data structure choices) to be post-
poned until after abstract algorithms for a problem have been designed.
The authors believe that the advantages of set theoretic languages may in the near future
make them prime vehicles for algorithm specification, large system development, and
one-shot programming projects.

REFERENCES

1. J. T. Schwartz, On Programming: An Interim Report on the SETL Project. Installment 1: Generalities.
Installment 2: The SETL Language and Examples of its Use. Computer Science Department, Courant
Institute of Mathematical Sciences, New York University (1973).
2. J. B. Morris, A Comparison of MADCAP and SETL, Preliminary draft, Los Alamos Scientific Laboratory,
University of California.
3. A. W. Elcock, et al., ABSET, a Programming Language Based on Sets: Motivation and Examples, Machine
Intelligence 6, Edinburgh University Press (1971).
4. J. A. Feldman and P. D. Rovner, An ALGOL-based associative language, Commun. ACM 12 (8), 439-449
(1969).
5. D. E. Knuth, The Art of Computer Programming, Vol. 1, Addison-Wesley, London (1969).
