a discipline of programming
edsger w. dijkstra
"For a long time I have wanted to write a book somewhat along the lines of this one: on the one hand I knew that programs could have a compelling and deep logical beauty, on the other hand I was forced to admit that most programs are presented in a way fit for mechanical execution but, even if of any beauty at all, totally unfit for human appreciation."
A DISCIPLINE
OF PROGRAMMING
EDSGER W. DIJKSTRA
Burroughs Research Fellow,
Professor Extraordinarius,
Technological University, Eindhoven
PRENTICE-HALL, INC.
FOREWORD ix
PREFACE xiii
0 EXECUTIONAL ABSTRACTION
5 TWO THEOREMS 37
11 ARRAY VARIABLES 94
17 AN EXERCISE ATTRIBUTED TO R. W. HAMMING 129
27 IN RETROSPECT 209
FOREWORD
In the older intellectual disciplines of poetry, music, art, and science, histo-
rians pay tribute to those outstanding practitioners, whose achievements have
widened the experience and understanding of their admirers, and have
inspired and enhanced the talents of their imitators. Their innovations are
based on superb skill in the practice of their craft, combined with an acute
insight into the underlying principles. In many cases, their influence is en-
hanced by their breadth of culture and the power and lucidity of their expres-
sion.
This book expounds, in its author's usual cultured style, his radical new
insights into the nature of computer programming. From these insights, he
has developed a new range of programming methods and notational tools,
which are displayed and tested in a host of elegant and efficient examples.
This will surely be recognised as one of the outstanding achievements in the
development of the intellectual discipline of computer programming.
C.A.R. HOARE
PREFACE
For a long time I have wanted to write a book somewhat along the lines of
this one: on the one hand I knew that programs could have a compelling and
deep logical beauty, on the other hand I was forced to admit that most pro-
grams are presented in a way fit for mechanical execution but, even if of any
beauty at all, totally unfit for human appreciation. A second reason for dis-
satisfaction was that algorithms are often published in the form of finished
products, while the majority of the considerations that had played their role
during the design process and should justify the eventual shape of the finished
program were often hardly mentioned. My original idea was to publish a
number of beautiful algorithms in such a way that the reader could appreciate
their beauty, and I envisaged doing so by describing the -real or imagined-
design process that would each time lead to the program concerned. I have
remained true to my original intention in the sense that the long sequence of
chapters, in each of which a new problem is tackled and solved, is still the
core of this monograph; on the other hand the final book is quite different
from what I had foreseen, for the self-imposed task to present these solutions
in a natural and convincing manner has been responsible for so much more,
that I shall remain grateful forever for having undertaken it.
When starting on a book like this, one is immediately faced with the
question: "Which programming language am I going to use?", and this is not
a mere question of presentation! A most important, but also a most elusive,
aspect of any tool is its influence on the habits of those who train themselves
in its use. If the tool is a programming language, this influence is -whether
we like it or not- an influence on our thinking habits. Having analyzed that
influence to the best of my knowledge, I had come to the conclusion that
none of the existing programming languages, nor a subset of them, would
suit my purpose; on the other hand I knew myself so unready for the design
of a new programming language that I had taken a vow not to do so for the
next five years, and I had a most distinct feeling that that period had not
yet elapsed! (Prior to that, among many other things, this monograph had to
be written.) I have tried to resolve this conflict by only designing a mini-
language suitable for my purposes, by making only those commitments
that seemed unavoidable and sufficiently justified.
This hesitation and self-imposed restriction, when ill-understood, may
make this monograph disappointing for many of its potential readers. It will
certainly leave all those dissatisfied who identify the difficulty of program-
ming with the difficulty of cunning exploitation of the elaborate and baroque
tools known as "higher level programming languages" or -worse!- "pro-
gramming systems". When they feel cheated because I just ignore all those
bells and whistles, I can only answer: "Are you quite sure that all those bells
and whistles, all those wonderful facilities of your so-called "powerful" pro-
gramming languages belong to the solution set rather than to the problem
set?". I can only hope that, in spite of my usage of a mini-language, they will
study my text; after having done so, they may agree that, even without the
bells and the whistles, so rich a subject remains that it is questionable whether
the majority of the bells and the whistles should have been introduced in the
first place. And to all readers with a pronounced interest in the design of pro-
gramming languages, I can only express my regret that, as yet, I do not feel
able to be much more explicit on that subject; on the other hand I hope that,
for the time being, this monograph will inspire them and will enable them
to avoid some of the mistakes they might have made without having read it.
be a most helpful discovery that the same program text always admits two
rather complementary interpretations, the interpretation as a code for a
predicate transformer, which seems the more suitable one for us, versus the
interpretation as executable code, an interpretation I prefer to leave to the
machines! The second surprise was that the most natural and systematic
"codes for predicate transformers" that I could think of would call for non-
deterministic implementations when regarded as "executable code". For a
while I shuddered at the thought of introducing nondeterminacy already in
uniprogramming (the complications it has caused in multiprogramming were
only too well known to me!), until I realized that the text interpretation as
code for a predicate transformer has its own, independent right of existence.
(And in retrospect we may observe that many of the problems multiprogram-
ming has posed in the past are nothing else but the consequence of a prior
tendency to attach undue significance to determinacy.) Eventually I came to
regard nondeterminacy as the normal situation, determinacy being reduced
to a -not even very interesting- special case.
After having laid the foundations, I started with what I had intended to
do all the time, viz. solve a long sequence of problems. To do so was an
unexpected pleasure. I experienced that the formal apparatus gave me a much
firmer grip on what I was doing than I was used to; I had the pleasure of
discovering that explicit concerns about termination can be of great heuristic
value-to the extent that I came to regret the strong bias towards partial
correctness that is still so common. The greatest pleasure, however, was that
for the majority of the problems that I had solved before, this time I ended
up with a more beautiful solution! This was very encouraging, for I took it
as an indication that the methods developed had, indeed, improved my pro-
gramming ability.
How should this monograph be studied? The best advice I can give is to
stop reading as soon as a problem has been described and to try to solve it
yourself before reading on. Trying to solve the problem on your own seems
the only way in which you can assess how difficult the problem is; it gives
you the opportunity to compare your own solution with mine; and it may
give you the satisfaction of having discovered yourself a solution that is
superior to mine. And, by way of a priori reassurance: be not depressed when
you find the text far from easy reading! Those who have studied the manu-
script found it quite often difficult (but equally rewarding!); each time, how-
ever, that we analyzed their difficulties, we came together to the conclusion
that not the text (i.e. the way of presentation), but the subject matter itself
was "to blame". The moral of the story can only be that a nontrivial algorithm
is just nontrivial, and that its final description in a programming language is
highly compact compared to the considerations that justify its design: the
shortness of the final text should not mislead us! One of my assistants made
the suggestion -which I faithfully transmit, as it could be a valuable one-
that little groups of students should study it together. (Here I must add a
parenthetical remark about the "difficulty" of the text. After having devoted
a considerable number of years of my scientific life to clarifying the pro-
grammer's task, with the aim of making it intellectually better manageable,
I found this effort at clarification to my amazement (and annoyance) repeat-
edly rewarded by the accusation that "I had made programming difficult".
But the difficulty has always been there, and only by making it visible can we
hope to become able to design programs with a high confidence level, rather
than "smearing code", i.e., producing texts with the status of hardly sup-
ported conjectures that wait to be killed by the first counterexample. None
of the programs in this monograph, needless to say, has been tested on a
machine.)
integer coordinates x and y, satisfying 0 ≤ x ≤ 500 and 0 ≤ y ≤ 500. For
all the points (x, y) with positive coordinates only, i.e. excluding the points
on the axes, we can write down at that position the value of GCD(x, y); we
propose a two-dimensional table with 250,000 entries. From the point of view
of usefulness, this is a great improvement: instead of a mechanism able to
supply the greatest common divisor for a single pair of numbers, we now
have a "mechanism" able to supply the greatest common divisor for any
pair of the 250,000 different pairs of numbers. Great, but we should not get
too excited, for what we identified as our second drawback -"Why should
we believe that the mechanism produces the correct answer?"- has been
multiplied by that same factor of 250,000: we now have to have a tremendous
faith in the manufacturer!
So let us consider a next mechanism. On the same cardboard with the
grid points, the only numbers written on it are the values 1 through 500
along both axes. Furthermore the following straight lines are drawn:
is that the same argument is applicable to each of the 500 points of the answer
line. Thirdly -and again this is not difficult- we have to show that for any
initial position (X, Y) a finite number of steps will indeed bring the pebble on
the answer line, and again the important observation is that the same argu-
ment is equally well applicable to any of the 250,000 initial positions (X, Y).
Three simple arguments, whose length is independent of the number of grid
points: that, in a nutshell, shows what mathematics can do for us! Denoting
with (x, y) any of the pebble positions during a game started at position
(X, Y), our first theorem allows us to conclude that during the game the
relation
GCD(x, y) = GCD(X, Y)
will always hold or -as the jargon says- "is kept invariant". The second
theorem then tells us that we may interpret the x-coordinate of the final
pebble position as the desired answer and the third theorem tells us that the
final position exists (i.e. will be reached in a finite number of steps). And this
concludes the analysis of what we could call "our abstract machine".
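By way of illustration -an instance of a game, not part of the analysis itself- start the pebble at (X, Y) = (12, 33). Each move replaces the larger coordinate by the difference of the two, so one possible game runs through
(12, 33), (12, 21), (12, 9), (3, 9), (3, 6), (3, 3)
after which no move is left. The final x-coordinate, 3, is indeed GCD(12, 33), and at every intermediate position the relation GCD(x, y) = GCD(12, 33) = 3 can be checked to hold.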
Our next duty is to verify that the board as supplied by the manufacturer
is, indeed, a fair model. For this purpose we have to check the numbering
along both axes and we have to check that all the straight lines have been
drawn correctly. This is slightly awkward as we have to investigate a number
of objects proportional to N if N (in our example 500) is the length of the
side of the square, but it is always better than N², the number of possible
computations.
An alternative machine would not work with a huge cardboard but with
two nine-bit registers, each capable of storing a binary number between 0
and 500. We could then use one register to store the value of the x-coordinate
and the other to store the value of the y-coordinate as they correspond to
"the current pebble position". A move then corresponds to decreasing the
contents of one register by the contents of the other. We could do the arith-
metic ourselves, but of course it is better if the machine could do that for us.
If we then want to believe the answer, we should be able to convince ourselves
that the machine compares and subtracts correctly. On a smaller scale the
history repeats itself: we derive once and for all, i.e. for any pair of n-digit
binary numbers, the equations for the binary subtractor and then satisfy
ourselves that the physical machine is a fair model of this binary subtractor.
If it is a parallel subtractor, the number of verifications -proportional
to the number of components and their interactions- is proportional to
n = log₂ N. In a serial machine the trading of time against equipment is
carried still one step further.
given the rules of the game in terms of the rules for performing "a step"
together with a criterion whether "the step" has to be performed another
time. (As a matter of fact, the step has to be repeated until a state has been
reached in which the step is undefined.) In other words, even a single game is
allowed to be generated by repeatedly applying the same "sub-rule".
This is a very powerful device. An algorithm embodies the design of the
class of computations that may take place under control of it; thanks to the
conditional repetition of "a step" the computations from such a class may
greatly differ in length. It explains how a short algorithm can keep a machine
busy for a considerable period of time. Alternatively we may see it as a first
hint as to why we might need extremely fast machines.
It is a fascinating thought that this chapter could have been written while
Euclid was looking over my shoulder.
1 THE ROLE OF PROGRAMMING LANGUAGES
In the case of Euclid's algorithm, one can argue that it is so simple that we
can come away with an informal description of it. The power of a formal
notation should manifest itself in the achievements we could never do without
it!
The second advantage of a formal notation technique is that it enables
us to study algorithms as mathematical objects; the formal description of the
algorithm then provides the handle for our intellectual grip. It will enable
us to prove theorems about classes of algorithms, for instance, because their
descriptions share some structural property.
Finally, such a notation technique should enable us to define algorithms
so unambiguously that, given an algorithm described by it and given the
values for the arguments (the input), there should be no doubt or uncertainty
as to what the corresponding answers (the output) should be. It is then con-
ceivable that the computation is carried out by an automaton that, given
(the formal description of) the algorithm and the arguments, will produce
the answers without further human intervention. Such automata, able to
carry out the mutual confrontation of algorithm and argument with each
other, have indeed been built. They are called "automatic computers".
Algorithms intended for automatic execution by computers are called "pro-
grams" and since the late fifties the formal techniques used for program
notation are called "programming languages". (The introduction of the term
"language" in connection with notation techniques for programs has been a
mixed blessing. On the one hand it has been very helpful in as far as existing
linguistic theory now provided a natural framework and an established ter-
minology ("grammar", "syntax", "semantics", etc.) for discussion. On the
other hand we must observe that the analogy with (now so-called!) "natural
languages" has also been very misleading, because natural languages, non-
formalized as they are, derive both their weakness and their power from their
vagueness and imprecision.)
Historically speaking, this last aspect, viz. the fact that programming
languages could be used as a vehicle for instructing existing automatic compu-
ters, has for a long time been regarded as their most important property. The
efficiency with which existing automatic computers could execute programs
written in a certain language became the major quality criterion for that
language! As a regrettable result, it is not unusual to find anomalies in
existing machines truthfully reflected in programming languages, this at the
expense of the intellectual manageability of the programs expressed in such
a language (as if programming without such anomalies was not already diffi-
cult enough!). In our approach we shall try to redress the balance, and we
shall do so by regarding the fact that our algorithms could actually be carried
out by a computer as a lucky accidental circumstance that need not occupy a
central position in our considerations. (In a recent educational text addressed
to the PL/I programmer one can find the strong advice to avoid procedure
So much for a single wheel. Let us now turn our attention to a register
with eight of such wheels in a row. Because each of these eight wheels is in
one of ten different states, this register, considered as a whole, is in one of
100,000,000 possible, different states, each of which is suitably identified by
the number (or rather by the row of eight digits) displayed through the
window.
If the state for each of the wheels is given, then the state of the register as
a whole is uniquely determined; conversely, from each state of the register
as a whole, the state of each individual wheel is determined uniquely. In this
case we say (in an earlier chapter we have already used the term) that we
get (or build) the state space of the register as a whole by forming the
"Cartesian product" of the state spaces of the eight individual wheels. The
total number of points in that state space is the product of the number of
points in the state spaces from which it has been built (that is why it is
called the Cartesian product).
Whether such a register is considered as a single variable with 10⁸ different
possible values, or as a composite variable composed out of eight different
ten-valued variables called "wheels" depends on our interest in the thing. If
we are only interested in the value displayed, we shall probably regard the
register as an unanalyzed entity, whereas the maintenance engineer who has
to replace a wheel with a worn tooth will certainly regard the register as
a composite object.
We have seen another example of building up a state space as the Car-
tesian product of much smaller state spaces when we discussed Euclid's
algorithm and observed that the position of the pebble somewhere on the
board could equally well be identified by two half-pebbles, each somewhere
on an axis, that is, by the combination (or more precisely, an ordered pair)
of two variables "x" and "y". (The idea of identifying the position of a point
in a plane by the values of its x- and y-coordinates comes from Descartes when
he developed the analytical geometry, and the Cartesian product is named
that way in honour of him.) The pebble on the board has been introduced as
a visualization of the fact that an evolving computational process -such as
the execution of Euclid's algorithm- can be viewed as the system travelling
through its state space. In accordance with this metaphor, the initial state
is also referred to as "the starting point".
In this book we shall mainly, or perhaps even exclusively, occupy our-
selves with systems whose state space will eventually be regarded as being
built up as a Cartesian product. This is certainly not to be interpreted as
my suggesting that state spaces built by forming Cartesian products are the
one and final answer to all our problems, for I know only too well that this
is not true. As we proceed it will become apparent why they are worthy of
so much of our attention and, simultaneously, why the concept plays such
a central role in many programming languages.
3 THE CHARACTERIZATION OF SEMANTICS
because the game terminates when x = y, but that is not part of our require-
ment when we decide to accept the final value of x as our "answer".
We call condition (1) the (desired) "post-condition"-"post" because it
imposes a condition upon the state in which the system must find itself after
its activity. Note that the post-condition could be satisfied by many of the
possible states. In that case we apparently regard each of them as equally
satisfactory and there is then no reason to require that the final state be a
unique function of the initial state. (As the reader will be aware, it is here
that the potential usefulness of a nondeterministic mechanism presents itself.)
In order to use such a system when we want it to produce an answer,
say "reach a final state satisfying post-condition (J) for a given set of values
of X and Y", we should like to know the set of corresponding initial states,
more precisely, the set of initial states such that activation will certainly
result in a properly terminating happening leaving the system in a final state
satisfying the post-condition. If we can bring the system without computa-
tional effort into one of these states, we know how to use the system to pro-
duce for us the desired answer! To give the example for Euclid's cardboard
game: we can guarantee a final state satisfying the post-condition (1) for any
initial state satisfying
GCD(x, y) = GCD(X, Y) and 0 < x ≤ 500 and 0 < y ≤ 500    (2)
(The upper limits have been added to do justice to the limited size of the
cardboard. If we start with a pair (X, Y) such that GCD(X, Y) = 713, then
there exists no pair (x, y) satisfying condition (2), i.e. for those values of X
and Y condition (2) reduces to F; and that means that the machine in question
cannot be used to compute the GCD(X, Y) for that pair of values of X and
Y.)
For many (X, Y) combinations, many states satisfy (2). In the case that
0 < X ≤ 500 and 0 < Y ≤ 500, the trivial choice is x = X and y = Y.
It is a choice that can be made without any evaluation of the GCD-function,
even without appealing to the fact that the GCD-function is a symmetric
function of its arguments.
The condition that characterizes the set of all initial states such that
activation will certainly result in a properly terminating happening leaving
the system in a final state satisfying a given post-condition is called "the
weakest pre-condition corresponding to that post-condition". (We call it
"weakest", because the weaker a condition, the more states satisfy it and we
aim here at characterizing all possible starting states that are certain to lead
to a desired final state.)
If the system (machine, mechanism) is denoted by "S" and the desired
post-condition by "R", then we denote the corresponding weakest pre-con-
dition by
wp(S, R)
If the initial state satisfies wp(S, R), the mechanism is certain to establish
eventually the truth of R. Because wp(S, R) is the weakest pre-condition,
we also know that if the initial state does not satisfy wp(S, R), this guarantee
cannot be given, i.e. the happening may end in a final state not satisfying R
or the happening may even fail to reach a final state at all (as we shall see,
either because the system finds itself engaged in an endless task or because
the system has got stuck).
We take the point of view that we know the possible performance of
the mechanism S sufficiently well, provided that we can derive for any post-
condition R the corresponding weakest pre-condition wp(S, R), because then
we have captured what the mechanism can do for us; and in the jargon the
latter is called "its semantics".
Two remarks are in order. Firstly, the set of possible post-conditions is
in general so huge that this knowledge in tabular form (i.e. in a table with
an entry for each R wherein we would find the corresponding wp(S, R))
would be utterly unmanageable, and therefore useless. Therefore the defi-
nition of the semantics of a mechanism is always given in another way, viz.
in the form of a rule describing how for any given post-condition R the
corresponding weakest pre-condition wp(S, R) can be derived. For a fixed
mechanism S such a rule, which is fed with the predicate R denoting the
post-condition and delivers a predicate wp(S, R) denoting the corresponding
weakest precondition, is called "a predicate transformer". When we ask for
the definition of the semantics of the mechanism S, what we really ask for
is its corresponding predicate transformer.
Secondly -and I feel tempted to add "thank goodness"- we are often
not interested in the complete semantics of a mechanism. This is because it
is our intention to use the mechanism S for a specific purpose only, viz. for
establishing the truth of a very specific post-condition R for which it has
been designed. And even for that specific post-condition R, we are often not
interested in the exact form of wp(S, R); often we are content with a stronger
condition P, that is, a condition for which we can show that
P ⇒ wp(S, R) for all states    (3)
holds. (The predicate "P ⇒ Q" (read "P implies Q") is only false in those
points in state space where P holds, but Q does not, and it is true everywhere
else. By requiring that "P ⇒ wp(S, R)" holds for all states, we just require
that wherever P is true, wp(S, R) is true as well: P is a sufficient pre-condition.
In terms of sets it means that the set of states characterized by P is a subset of
the set of states characterized by wp(S, R).) If for a given P, S, and R rela-
tion (3) holds, this can often be proved without explicit formulation -or,
if you prefer, "computation" or "derivation"- of the predicate wp(S, R).
And this is a good thing, for except in trivial cases we must expect that the
explicit formulation of wp(S, R) will defy at least the size of our sheet of paper.
for at least three months. And even after I had given in (I had been flattered
out of my resistance!) I was highly uncomfortable. When the prototype was
becoming kind of operational I had my worst fears fully confirmed: a bug
in the program could evoke the erratic behaviour so strongly suggestive of
an irreproducible machine error. And secondly -and that was in the time
that for deterministic machines we still believed in "debugging"- it was
right from the start quite obvious that program testing was quite ineffective
as a means for raising the confidence level.
For many years thereafter I have regarded the irreproducibility of the
behaviour of the nondeterministic machine as an added complication that
should be avoided whenever possible. Interrupts were nothing but a curse
inflicted by the hardware engineers upon the poor software makers! Out of
this fear of mine the discipline for "harmoniously cooperating sequential
processes" has been born. In spite of its success I was still afraid, for our
solutions -although proved to be correct- seemed ad hoc solutions to the
problem of "taming" (that is the way we felt about it!) special forms of non-
determinacy. The background of my fear was the absence of a general meth-
odology.
Two circumstances have changed the scene since then. The one is the
insight that, even in the case of fully deterministic machines, program testing
is hardly helpful. As I have now said many times and written in many places:
program testing can be quite effective for showing the presence of bugs, but
is hopelessly inadequate for showing their absence. The other one is the
discovery that in the meantime it has emerged that any design discipline
must do justice to the fact that the design of a mechanism that is to have a
purpose must be a goal-directed activity. In our special case it means that
we can expect our post-condition to be the starting point of our design
considerations. In a sense we shall be "working backwards". In doing so we
shall find that the implication of property 4 is the essential part; for the equal-
ity of property 4' we shall have very little use.
Once the mathematical equipment needed for the design of nondeter-
ministic mechanisms achieving a purpose has been developed, the nondeter-
ministic machine is no longer frightening. On the contrary! We shall learn
to appreciate it, even as a valuable stepping stone in the design of an ulti-
mately fully deterministic mechanism.
and some post-condition R each initial state falls in one of three disjoint
sets, according to the following, mutually exclusive, possibilities:
(a) activation will lead to a properly terminating happening that leaves the system in a final state satisfying R;
(b) activation will lead to a properly terminating happening that leaves the system in a final state not satisfying R;
(c) activation will not lead to a properly terminating happening.
The first set is characterized by wp(S, R), the second set by wp(S, non R),
their union by
(wp(S, R) or wp(S, non R)) = wp(S, R or non R) = wp(S, T)
and therefore the third set is characterized by non wp(S, T).
To give the complete semantic characterization of a nondeterministic
system requires more. With respect to a given post-condition R we have
again the three possible types of happenings as listed above under (a), (b),
and (c). But in the case of a nondeterministic system an initial state need
not lead to a unique happening, which by definition is one out of the three
mutually exclusive categories; for each initial state the possible happenings
may now belong to two or even to all three categories.
In order to describe them we can use the notion of "a liberal pre-condi-
tion". Earlier we considered pre-conditions such that it was guaranteed that
"the right result", i.e. a final state satisfying R, would be reached. A liberal
pre-condition is weaker: it only guarantees that the system won't produce
the wrong result, i.e. will not reach a final state not satisfying R, but non-
termination is left as an alternative. Also for liberal pre-conditions we can
introduce the concept of "the weakest liberal pre-condition"; let us denote
it by wlp(S, R). Then the initial state space is, in principle, subdivided into
seven mutually exclusive regions, none of which need to be empty. (Seven,
because from three objects one can make seven nonempty selections.) They
are all easily characterized by three predicates, viz. wlp(S, R), wlp(S, non R),
and wp(S, T).
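(In passing: from these definitions one can derive the relation wp(S, R) = (wlp(S, R) and wp(S, T)), i.e. the certainty of a right result is the certainty that no wrong result is produced, combined with the certainty of proper termination.)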
FIGURE 3.1
The above analysis has been given for completeness' sake and also
because in practice the notion of a liberal pre-condition is a quite useful
one. If one implements, for instance, a programming language, one will not
prove that the implementation executes any correct program correctly; one
should be happy and content with the assertion that no correct program will
4 THE SEMANTIC CHARACTERIZATION OF A PROGRAMMING LANGUAGE
In the previous chapter we have taken the position that we know the
semantics of a mechanism S sufficiently well if we know its "predicate trans-
former", i.e. a rule telling us how to derive for any post-condition R the
corresponding weakest pre-condition, which we have denoted by "wp(S, R)",
for the initial state such that attempted activation will lead to a properly
terminating activity that leaves the system in a final state satisfying R. The
question is: how does one derive wp(S, R) for given S and R?
So much, for the time being, about a single, specific mechanism S. A
program written in a well-defined programming language can be regarded
as a mechanism, a mechanism that we know sufficiently well provided that
we know the corresponding predicate transformer. But a programming lan-
guage is only useful provided that we can use it for the formulation of many
different programs and for all of them we should like to know their corre-
sponding predicate transformers.
Any such program is defined by its text as written in that well-defined
programming language and that text should therefore be our starting point.
But now we see suddenly two completely different roles for such a program
text! On the one hand the program text is to be interpreted by a machine
whenever we wish the program to be executed automatically, whenever we
wish a specific computation to be performed for us. On the other hand the
program text should tell us how to construct the corresponding predicate
transformer, how to accomplish the predicate transformation that will derive
wp(S, R) for any given post-condition R that has caught our fancy. This
observation tells us what we mean by "a well-defined programming language"
as far as we are concerned. While the semantics of a specific mechanism
(program) are given by its predicate transformer, we consider the semantic
with our previous result that all initial states would establish the final truth
of "a = 7" and therefore the final falsity of "a -::;t::. 7".) Also
wp("a:= 7'', b = bO) = {b = bO}
i.e. if we wish to guarantee that after the assignment "a:= 7" the variable
b has some value bO, then b should have that value already at the initial state.
In other words, all variables other than "a" are not tampered with, they keep
the value they had; the assignment "a:= 7" moves the point in state space
corresponding to the current system state parallel to the a-axis such that
"a = 7" finally holds.
Instead of choosing a constant for the expression E, we could also have
a function of the initial state. This is illustrated in the following examples:
wp("a:= 2 *b + l", a= 13) = {2 *b + 1 = 13} = {b = 6}
wp("a:= a+ l", a> 10) ={a+ 1 > JO}= {a> 9}
wp("a:= a - b'', a> b) ={a - b > b} ={a> 2 * b}
There is a slight complication if we allow the expression E to be a partial
function of the initial state, i.e. such that its attempted evaluation with an
initial state that lies outside its domain will not lead to a properly terminating
activity; if we wish to cater to that situation as well, we must sharpen our
definition of the semantics of the assignment operator and write
wp("x:= E", R) = {D(E) cand RE-•x}
Here the predicate D(E) means "in the domain of E"; the boolean expression
"Bl cand B2" (the so-called "conditional conjunction") has the same value
as "Bl and B2" where both operands are defined, but is also defined to have
the value "false" where Bl is "false", the latter regardless of the question
whether B2 is defined. Usually the condition D(E) is not mentioned explicitly,
either because it is = T or because we have seen to it that the assignment
statement will never be activated in initial states outside the domain of E.
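By way of illustration -an example chosen here, not one from the surrounding text- take for E the quotient y/z, with D(E) given by z ≠ 0; then
wp("x:= y/z", x = 1) = {z ≠ 0 cand y/z = 1}
a condition that is everywhere defined: in initial states with z = 0 it is false on account of the cand, although the second operand of the cand is then undefined.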
ith expression from the right-hand list, such that, for instance, for given
x1, x2, E1, and E2
x1, x2:= E1, E2
is semantically equivalent with
x2, x1:= E2, E1
The concurrent assignment allows us to prescribe that the two variables x
and y interchange their values by means of
x,y:=y,x
an operation that is awkward to describe otherwise. This, the fact that it
is easily implemented, and the fact that it allows us to avoid some over-
specification, are the reasons for its popularity. If the lists become long, the
resulting program becomes very hard to read.
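With the expression list substituted simultaneously for the variable list -the natural extension of the substitution rule for the single assignment- we find, for instance,
wp("x, y:= y, x", x > y) = {y > x}
i.e. the swap establishes x > y in precisely those initial states where y > x holds.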
The true BNF addict will extend his syntax by providing two alternative
forms for the assignment statement, viz.:
<assignment statement> ::= <variable> := <expression> |
                           <variable>, <assignment statement>, <expression>
This is a so-called "recursive definition", because one of the alternative
forms for a syntactic unit called "assignment statement" (viz. the second one)
contains as one of its components again the same syntactic unit called
"assignment statement", i.e., the syntactic unit we are defining! At first
sight such a cyclic definition seems frightening, but upon closer inspection
we can convince ourselves that, at least from a syntactic point of view, there
is nothing wrong with it. For instance, because according to the first alterna-
tive
x2:= E1
is an instance of an assignment statement, the formula
x1, x2:= E1, E2
admits a parsing of the form
x1, <assignment statement>, E2
and is therefore, according to the second alternative, also an assignment
statement. From a semantic point of view, however, it is a horror because
it suggests that E2 is associated with x1 instead of with x2.
Compared with the two-statement language with only "skip" and "abort"
our language with the assignment statement is considerably richer: there is
no upper bound anymore on the number of different instances of the syn-
tactic unit "assignment statement". Yet it is clearly insufficient for our pur-
pose; we need the ability to build more sophisticated programs, more
EXERCISE
Verify that
"xl:= El; x2:= E2" and "x2:= E2; xl:= El"
are semantically equivalent if the variable xl does not occur in the expression E2
while, also, the variable x2 does not occur in the expression El. As a matter of fact,
they are then both semantically equivalent to the concurrent assignment "xl, x2: =
El, E2". (This equivalence is one of the arguments for promoting the concurrent
assignment; its use enables us to avoid sequential overspecification and, even more,
in the concurrent assignment it is clear that the two expressions El and E2 could be
evaluated concurrently, a fact that for some implementation techniques could be of
interest. Besides that we have the perhaps more interesting possibility that "xl,
x2:= El, E2" is semantically equivalent neither to "xl:= El; x2:= E2" nor to
"x2:= E2; xl:= El".) (End of Exercise.)
but for reasons that need not concern us now, I prefer the syntax that intro-
duces the concept of the guarding head.)
In this connection the boolean expression preceding the arrow is called
"a guard". The idea is that the statement list following the arrow will only
be executed provided initially the corresponding guard is true. The guard
enables us to prevent execution of a statement list under those initial circum-
stances under which execution would be undesirable or, if partial operations
are involved, impossible.
The truth of the guard is a necessary initial condition for the execution of
the guarded command as a whole; it is, of course, not sufficient, because in
some way or another -we shall meet two of them- it must also potentially
be "its turn". That is why a guarded command is not considered as a state-
ment: a statement is irrevocably executed when its turn has arrived, the
guarded command can be used as a building block for a statement. More
precisely: we shall propose two different ways of composing a statement of
a set of guarded commands.
After some reflection it is quite natural to consider a set of guarded com-
mands. Suppose that we are requested to construct a mechanism such that,
if the initial state satisfies Q, the final state will satisfy R. Suppose furthermore
that we cannot find a single statement list that will do the job in all cases.
(If there existed such a statement list, we should use just that one and there
would be no need for guarded commands.) We may, however, be able to
find a number of statement lists, each of which will do the job for a subset of
possible initial states. To each of these statement lists we can attach as guard
a boolean expression characterizing the subset for which it is adequate and
when we have enough sufficiently tolerant guards such that the truth of Q
implies the truth of at least one guard, we have for each initial state satisfying
Q a mechanism that will bring the system in a state satisfying R, viz. one of
the guarded commands whose guard is initially true.
In order to express this we define first
<guarded command set> ::= <guarded command> { □ <guarded command> }
where the symbol "□" (pronounce "bar") acts as a separator between other-
wise unordered alternatives. One of the ways to form a statement from a
guarded command set is by embracing it by the bracket pair "if ... fi", i.e.
our syntax for the syntactic category called "statement" is extended with
a next form:
<statement> ::= if <guarded command set> fi
It indicates a special way in which we can combine a number of guarded
commands into a new mechanism. We can view the activity that will take
place when this mechanism is activated as follows. One of the guarded com-
mands whose guard is true is selected and its statement list is activated.
1. It is assumed that all guards are defined; if not, i.e. if the evaluation of
a guard may lead to a not properly terminating activity, then the whole
construct is allowed to fail to terminate properly.
2. In general our construct will give rise to nondeterminacy, viz. for each
initial state for which more than one guard is true, because it is left
undefined which of the corresponding statement lists will then be selected
for activation. No nondeterminacy is introduced if any two guards
exclude each other.
3. If the initial state is such that none of the guards is true, we are faced
with an initial state for which none of the alternatives, and therefore
the construct as a whole, caters. Activation in such an initial
state will lead to abortion.
Note. If we allow the empty guarded command set as well, the state-
ment "if fi" is therefore semantically equivalent with our earlier statement
"abort". (End of note.)
(In the following formal definition of the weakest pre-condition for the
if-fi-construct we shall restrict ourselves to the case that all the guards are
total functions. If this is not the case, the expression should be pre-fixed,
with a cand, by the additional requirement that the initial state lies in the
domain of all the guards.)
Let "IF" be the name of the statement
if B1 → SL1 □ B2 → SL2 □ ... □ Bn → SLn fi
and let BB denote (E j: 1 ≤ j ≤ n: Bj); then
wp(IF, R) = BB and (A j: 1 ≤ j ≤ n: Bj ⇒ wp(SLj, R))
For those values of j for which Bj is false,
this implication is true regardless of the value of wp(SLj, R), i.e. for those
values of j, apparently it does not matter what SLj would do. Our implemen-
tation reflects this by not selecting for activation an SLj with an initially
false guard Bj. For those values of j for which Bj is true, this implication can
only be true if wp(SLj, R) is true as well. As our formal definition requires
the truth of the implication for all values of j, our implementation is indeed
free to choose when more than one guard is true.
The if-fi-construct is only one of the two ways in which we can build
a statement from a guarded command set. In the if-fi-construct, a state in
which all guards are false leads to abortion; in our second form we allow
the state in which no guards are true to lead to proper termination, and
because then no statement list is activated, it is only natural that it will then
be semantically equivalent to the empty statement; the counterpart of this
permission to terminate properly when no guard is true, however, is that
the activity is not allowed to terminate as long as one of the guards is true.
That is, upon activation the guards are inspected. The activity terminates if
there are no true guards; if there are true guards one of the corresponding
statement lists is activated and upon its termination the implementation
starts all over again inspecting the guards. This second construct is denoted
by embracing the guarded command list by the bracket pair "do ... od".
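As a small illustration of the construct -again ahead of the formal treatment- consider, for integer variables q and r and given values X ≥ 0 and Y > 0, the statement
q, r:= 0, X;
do r ≥ Y → q, r:= q + 1, r - Y od
Upon termination -and terminate it must, since each selection decreases r by the positive amount Y- we have both r < Y (all guards false) and, kept invariant, X = q * Y + r with r ≥ 0: q and r are the quotient and the remainder of the division of X by Y.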
The formal definition of the weakest pre-condition for the do-od-construct
is more complicated than the one for the if-fi-construct; as a matter of fact
the first one is expressed in terms of the second one. We shall first give the
formal definition and then its explanation. Let "DO" be the name of the
statement
do B1 → SL1 □ B2 → SL2 □ ... □ Bn → SLn od
and let "IF" be the name of the statement formed by embracing the same
guarded command set by the bracket pair "if ... fi". The conditions Hk(R)
are given by
H0(R) = R and non (E j: 1 ≤ j ≤ n: Bj)
and for k > 0:
Hk(R) = wp(IF, Hk-1(R)) or H0(R)
then
wp(DO, R) = (E k: k ≥ 0: Hk(R))
Here the intuitive understanding of Hk(R) is: the weakest precondition
such that the do-od-construct will terminate after at most k selections of a
guarded command, leaving the system in a final state satisfying the post-
condition R.
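A worked instance -a loop chosen here for its simplicity, not taken from the preceding text- may help. For integer x, take
DO: do x > 0 → x:= x - 1 od
and R: (x = 0). Then BB = (x > 0), and
H0(R) = (x = 0 and non (x > 0)) = (x = 0)
H1(R) = wp(IF, H0(R)) or H0(R) = (x > 0 and x - 1 = 0) or (x = 0) = (0 ≤ x ≤ 1)
and in general Hk(R) = (0 ≤ x ≤ k), so that
wp(DO, R) = (E k: k ≥ 0: 0 ≤ x ≤ k) = (x ≥ 0)
as one would expect: for initial x ≥ 0 the construct terminates with x = 0 after exactly x selections, while for initial x < 0 it terminates at once in a final state violating R.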
For k = 0 it is required that the do-od-construct will terminate without
selecting any guarded command, i.e. there may not exist a true guard, as is
expressed by the second term; and the initial truth of R is then clearly the
5 TWO THEOREMS
and Q implies on account of (1) the first term on the right-hand side, (3)
is proved if on account of (2) we can conclude that
Q ⇒ (A j: 1 ≤ j ≤ n: Bj ⇒ wp(SLj, R))    (4)
holds for all states. For any state for which Q is false, (4) is true by defini-
tion of the implication. For any state for which Q is true and for any j we
distinguish two cases: either Bj is false, but then Bj ⇒ wp(SLj, R) is true
by definition of the implication, or Bj is true, but then on account of (2),
wp(SLj, R) is true and therefore Bj ⇒ wp(SLj, R) is true as well. As a
result (4) and therefore (3) has been proved.
Note. In the special case of binary choice (n = 2) and B2 = non B1,
we have BB = T and the weakest pre-condition reduces to
(B1 ⇒ wp(SL1, R)) and (non B1 ⇒ wp(SL2, R)) =
(non B1 or wp(SL1, R)) and (B1 or wp(SL2, R)) =
(B1 and wp(SL1, R)) or (non B1 and wp(SL2, R))    (5)
The last reduction is possible because, of the four cross-terms, the term
"non B1 and B1" = F and can be omitted, while the term "wp(SL1, R) and
wp(SL2, R)" can be omitted as well: in every state in which it is true, exactly
one of the two remaining terms of (5) is true as well, and thus it can be omitted
from that disjunction. Formula (5) is closely related to the way in which C.A.R.
Hoare has given the semantics for the if-then-else of ALGOL 60. Because
here BB = T and is implied by everything, we can conclude (3) on the
weaker assumption
((Q and B1) ⇒ wp(SL1, R)) and ((Q and non B1) ⇒ wp(SL2, R)).
(End of Note.)
The theorem for the alternative construct is of special importance in the
case that the predicate pair Q and R can be written as
R = P
Q = P and BB
In that case the antecedent (1) is fulfilled automatically while the antecedent
(2) reduces -because (BB and Bj) = Bj- to
(A j: 1 ≤ j ≤ n: (P and Bj) ⇒ wp(SLj, P))    (6)
from which we can conclude, on account of (3)
(P and BB) ⇒ wp(IF, P) for all states    (7)
a relation that will form the antecedent for our next theorem.
THEOREM
(Basic Theorem for the Repetitive Construct.) If (P and BB) ⇒ wp(IF, P)
for all states, then for the corresponding repetitive construct DO we have
(P and wp(DO, T)) ⇒ wp(DO, P and non BB) for all states.    (8)
The equality in the first line follows from (10), the equality in the second
line follows from the fact that any wp(IF, R) ⇒ BB, the implication in the
third line follows from (7), the equality in the fourth line from property 3
for predicate transformers, the implication of the fifth line follows from
property 2 for predicate transformers and (13) assumed for k = K - 1, and
the last line follows from (12). Thus (13) has now been proved for k = K
and therefore for all k ≥ 0.
Finally, for any point in state space we have -thanks to (13)-
P and wp(DO, T) = (E k: k ≥ 0: P and Hk(T))
⇒ (E k: k ≥ 0: Hk(P and non BB))
= wp(DO, P and non BB)
and thus (8), the basic theorem for the repetitive construct, has been proved.
The basic theorem for the repetitive construct derives its extreme usefulness
from the fact that neither in the antecedent nor in the consequent the actual
number of times a guarded command has been selected is mentioned. As a
result it allows assertions even in those cases in which this number is not
determined by the initial state.
6 ON THE DESIGN OF PROPERLY TERMINATING CONSTRUCTS
The basic theorem for the repetitive construct asserts for a condition P
that is kept invariantly true that
(P and wp(DO, T)) ⇒ wp(DO, P and non BB)
Here the term wp(DO, T) is the weakest pre-condition such that the
repetitive construct will terminate. Given an arbitrary construct DO it is in
general very hard -if not impossible- to determine wp(DO, T); I therefore
suggest to design our repetitive constructs with the requirement of termina-
tion consciously in mind, i.e. to choose an appropriate proof for termination
and to make the program in such a way that it satisfies the assumptions of
the proof.
Let, again, P be the relation that is kept invariant, i.e.
(P and BB) ⇒ wp(IF, P) for all states,    (1)
let furthermore t be a finite integer function of the current state such that
(P and BB) ⇒ (t > 0) for all states    (2)
and furthermore, for any value t0 and for all states
(P and BB and t ≤ t0 + 1) ⇒ wp(IF, t ≤ t0)    (3)
And these two implications can be combined (from A ⇒ C and B ⇒ D we
may conclude that (A or B) ⇒ (C or D) holds):
(P and t ≤ K + 1) ⇒ wp(IF, HK(T)) or H0(T) = HK+1(T)
and thus the truth of (6) has been established for all k ≥ 0. Because t is a
finite function, we have
-the occurrence of the free variable t0 in both predicates is the reason why
we have talked about "a predicate pair"- tells us, that we can conclude that
(3) holds if
(A j: 1 ≤ j ≤ n: (P and Bj and t ≤ t0 + 1) ⇒ wp(SLj, t ≤ t0))
In other words, we have to prove for each guarded command that the selec-
tion will cause an effective decrease of t. Bearing in mind that t is a function
of the current state, we can consider
wp(SLj, t ≤ t0)    (8)
This is a predicate involving, besides the coordinate variables of the state
space, also the free variable t0. Up till now we have regarded such a predicate
as a predicate characterizing a subset of states. For any given state, however,
we can also regard it as a condition imposed upon t0. Let t0 = tmin be the
minimum solution for t0 of equation (8); we can then interpret the value
tmin as the lowest upper bound for the final value of t. Remembering that,
just as t itself, tmin also is a function of the current state, the predicate
tmin ≤ t - 1
can be interpreted as the weakest pre-condition such that execution of SLj
is guaranteed to decrease the value of t by at least 1. Let us denote this pre-
condition, where -we repeat- the second argument t is an integer valued
function of the current state, by
wdec(SLj, t);
then the invariance of P and the effective decrease of t is guaranteed if we
have for all j:
(P and Bj) ⇒ (wp(SLj, P) and wdec(SLj, t))    (9)
A usually practical way for finding a suitable Bj is the following. Equa-
tion (9) is of the type
(P and Q) ⇒ R
where a -practically computable!- Q must be found for given P and R
(a worked instance follows the observations below). We observe that
1. Q = R is a solution.
2. If Q = (Q1 and Q2) is a solution and P ⇒ Q2, then Q1 is a solution
as well.
3. If Q = (Q1 or Q2) is a solution and P ⇒ non Q2 (or, what amounts to
the same thing: (P and Q2) = F), then Q1 is a solution as well.
4. If Q is a solution and Q1 ⇒ Q, then Q1 is a solution as well.
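To see these observations at work -a small anticipation of the next chapter, phrased in the present terms- suppose that P implies y > 0 and that we seek a guard Q such that (P and Q) ⇒ R with
R = wp("x:= x - y", x > 0 and y > 0) = (x - y > 0 and y > 0)
By observation 1, Q = (x - y > 0 and y > 0) is a solution; because P ⇒ (y > 0), observation 2 allows us to drop the second conjunct, and the simpler guard Q1 = (x > y) is a solution as well.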
7 EUCLID'S ALGORITHM REVISITED
At the risk of boring my readers I shall now devote yet another chapter
to Euclid's algorithm. I expect that in the meantime some of my readers will
already have coded it in the form
x,y:= X, Y;
do x ≠ y → if x > y → x:= x - y
            □ y > x → y:= y - x
           fi
od;
print(x)
where the guard of the repetitive construct ensures that the alternative con-
struct will not lead to abortion. Others will have discovered that the algorithm
can be coded more simply as follows:
x,y:= X, Y;
do x > y → x:= x - y
 □ y > x → y:= y - x
od;
print(x)
Let us now try to forget the cardboard game and let us try to invent
Euclid's algorithm for the greatest common divisor of two positive numbers
X and Y afresh. When confronted with such a problem, there are in principle
always two ways open to us.
The one way is to try to follow the definition of the required answer as
closely as possible. Presumably we could form a table of the divisors of X;
this table would only contain a finite number of entries, among which would
be 1 as the smallest and X as the largest entry. (If X = 1, smallest and largest
entry will coincide.) We could then also form a similar table of the divisors
of Y. From those two tables we could form a table of the numbers occurring
in both of them; this then is the table of the common divisors of X and Y
and is certainly nonempty, because it will contain the entry 1. From this
third table we therefore can select (because it is also finite!) the maximum
entry and that would be the greatest common divisor.
Sometimes following the definition closely, as sketched above, is the best
thing we can do. There is, however, an alternative approach to be tried if we
know (or can find) properties of the function to be computed. It may be that
we know so many properties that they together determine the function and
we may try to construct the answer by exploiting those properties.
In the case of the greatest common divisor we observe, for instance, that,
because the divisors of -x are the same as those for x itself, the GCD(x, y)
is also defined for negative arguments and not changed if we change the sign
of arguments. It is also defined when just one of the arguments is =0; that
argument has an infinite table of divisors (and we should therefore not try
to construct that table!), but because the other argument (≠ 0) has a finite
table of divisors, the table of common divisors is still nonempty and finite.
So we come to the conclusion that GCD(x, y) is defined for each pair (x, y)
such that (x, y) ≠ (0, 0). Furthermore, on account of the symmetry of the
notion "common", the greatest common divisor of two numbers is a sym-
metric function of its two arguments. A little more reasoning can convince
us of the fact that the greatest common divisor of two arguments is unchanged
if we replace one of them by their sum or difference. Collecting our know-
ledge we can write down: for (x, y) ≠ (0, 0)
(a)  GCD(x, y) = GCD(y, x)
(b)  GCD(x, y) = GCD(-x, y)
(c)  GCD(x, y) = GCD(x + y, y) = GCD(x - y, y)
(d)  GCD(x, x) = abs(x)
Let us suppose for the sake of argument that the above four properties
represent our only knowledge about the GCD-function. Do they suffice?
You see, the first three relations express the greatest common divisor of x
and y in that of another pair, but the last one expresses it directly in terms
of x. And this is strongly suggestive of an algorithm that, to start with,
establishes the truth of
P = (GCD(X, Y) = GCD(x, y))
(this is trivially achieved by the assignment "x, y:= X, Y"), whereafter we
"massage" the value pair (x, y) in such ways, that according to (a), (b) or
(c) relation P is kept invariant. If we can manage this massaging process so
as to reach a state satisfying x = y, then, according to (d), we have found
our answer by taking the absolute value of x.
Because our ultimate goal is to establish under invariance of P the truth
of x = y we could try as monotonically decreasing function t = abs(x - y).
In order to simplify our analysis -always a laudable goal!- we observe
that, when starting with nonnegative values for x and y, there is nothing to
be gained by introducing a negative value: if the assignment x:= E would
have established x < 0, the assignment x:= -E would never have given rise
to a larger final value of t (because y > 0). We therefore sharpen our relation
P to be kept invariant:
P = (P1 and P2)
with
P1 = (GCD(X, Y) = GCD(x, y))
and
P2 = (x > 0 and y > 0)
This means that we have lost all usage for the operations x:= -x and
y:= -y, the massagings permissible on account of property (b). We are
left with
from (a):  x, y:= y, x
from (c):  x:= x + y     y:= y + x
           x:= x - y     y:= y - x
           x:= y - x     y:= x - y
Let us deal with them in turn and start with x, y:= y, x:
wp("x, y:= y, x", abs(x - y) < tO) = (abs(y - x) < tO)
therefore
tmin(x, y) = abs(y - x)
hence
wdec("x, y:= y, x", abs(x - y)) = (abs(y - x) < abs(x - y) - 1) = F.
And here -for those who would not believe it without a formal deriva-
tion- we have proved (or, if you prefer, discovered) by means of our cal-
culus that the massaging operation x, y:= y, x is no good because it fails to
cause an effective decrease of our t as chosen.
The next trial is x:= x + y and we find, again applying the calculus of
the preceding chapters:
wp("x:= x + y", abs(x - y) < tO) = (abs(x) < tO)
tmin(x, y) = abs(x) = x (we confine ourselves to states satisfying P)
EXERCISES
prints the greatest common divisor of X and Y, followed by their smallest com-
mon multiple. (End of exercises.)
Finally, if our little algorithm is activated with a pair (X, Y) that does
not satisfy our assumption X > 0 and Y > 0, unpleasant things will happen:
if (X, Y) = (0, 0), it will produce the erroneous result zero, and if one of the
arguments is negative, activation will set an endless activity in motion. This
can be prevented by writing
if X > 0 and Y > 0 -->
    x, y:= X, Y;
    do x > y --> x:= x - y □ y > x --> y:= y - x od;
    print(x)
fi
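For concreteness, the guarded repetition above can also be rendered in an
executable notation; the following Python sketch assumes, as above, X > 0
and Y > 0:

    def gcd(X, Y):
        # P: GCD(X, Y) = GCD(x, y) and x > 0 and y > 0 is kept invariant
        x, y = X, Y
        while x != y:
            if x > y:
                x = x - y
            else:
                y = y - x
        return x   # on termination x = y, and GCD(x, x) = x since x > 0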
8 THE FORMAL TREATMENT OF SOME SMALL EXAMPLES
FIGURE 8-1
The point is that at the entry the good choice must be made so as to
guarantee that upon completion R(m) holds. For this purpose we "push the
post-condition through the alternatives":
FIGURE 8-2
Second example.
For a fixed value of n (n > 0) a function f(i) is given for 0 ≤ i < n.
Establish the truth of R:
0 ≤ k < n and (A i: 0 ≤ i < n: f(k) ≥ f(i))
Because our program must work for any positive value of n it is hard to
see how R can be established without a loop; we are therefore looking for
a relation P that is easily established to start with and such that eventually
(P and non BB) => R. In search of P we are therefore looking for a relation
weaker than R; in other words, we want a generalization of our final state.
A standard way of generalizing a relation is the replacement of a constant
by a variable -possibly with a restricted range- and here my experience
suggests that we replace the constant n by a new variable, j say, and take
for P:
0 < k <j < n and (Ai: 0 < i <j:f(k) > f(i))
where the condition j < n has been added in order to do justice to the finite
domain of the functionf Then, with such a generalization, we have trivially
(Pandj=n)~R
and we venture the following structure for our program (comments are added
between braces):
k, j:= 0, 1 {P has been established};
do j ≠ n -->
    if f(j) ≤ f(k) --> j:= j + 1 {P has not been destroyed}
    □ f(j) ≥ f(k) --> k, j:= j, j + 1 {P has not been destroyed}
    fi
od {R has been established}
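In an executable notation the same search might read as follows (a Python
sketch; f and n are those of the problem statement, and where the two guards
overlap, i.e. where f(j) = f(k), the sketch arbitrarily keeps the current k):

    def max_position(f, n):
        # P: 0 <= k < j <= n and f(k) >= f(i) for all i with 0 <= i < j
        k, j = 0, 1
        while j != n:
            if f(j) > f(k):
                k = j
            j = j + 1
        return k   # R: f(k) is a maximal value of f over 0 <= i < n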
A final remark is not so much concerned with our solution as with our
considerations. We have had our mathematical concerns, we have had our
engineering concerns, and we have accomplished a certain amount of separa-
tion between them, now focussing our attention on this aspect and then on
that aspect. While such a separation of concerns is absolutely essential when
dealing with more complicated problems, I must stress that focussing one's
attention on one aspect does not mean completely ignoring the others. In
the more mathematical part of the design activity we should not head for a
mathematically correct program that is so badly engineered that it is beyond
salvation. Similarly, while "trading" we should not introduce errors through
sloppiness, we should do it carefully and systematically; also, although the
mathematical analysis as such has been completed, we should still understand
enough about the problem to judge whether our considered changes are
significant improvements.
Note. Prior to my getting used to these formal developments I would
always have used "j < n" as the guard for this repetitive construct, a
habit I still have to unlearn, for in a case like this, the guard "j ≠ n" is
certainly to be preferred. The reason for the preference is twofold. The
guard "j ≠ n" allows us to conclude j = n upon termination without an
appeal to the invariant relation P and thus simplifies the argument about
what the whole construct achieves for us compared with the guard
"j < n". Much more important, however, is that the guard "j ≠ n" makes
termination dependent upon (part of) the invariant relation, viz. j ≤ n,
and is therefore to be preferred for reasons of robustness. If the addition
j:= j + 1 would erroneously increase j too much and would establish
j > n, then the guard "j < n" would give no alarm, while the guard
"j ≠ n" would at least prevent proper termination. Even without taking
machine malfunctioning into account, this argument seems valid. Let
a sequence x_0, x_1, x_2, ... be given by a value for x_0 and for i > 0 by
x_i = f(x_(i-1)), where f is some computable function, and let us carefully
and correctly keep the relation X = x_i invariant. Suppose that we have
in a program a monotonically increasing variable n such that for some
values of n we are interested in x_n. Provided n ≥ i, we can always establish
X = x_n by
do i ≠ n --> i, X:= i + 1, f(X) od
If -due perhaps to a later change in the program with the result that it
is no longer guaranteed that n can only increase as the computation
proceeds- the relation n > i does not necessarily hold, the above con-
struct would (luckily!) fail to terminate, while the use of the terminating
do i < n--> i, X:= i + l,f(X) od
would have failed to establish the relation X = x •. The moral of the story
is that, all other things being equal, we should choose our guards as weak
as possible. (End of note.)
Third example.
For fixed a (a> 0) and d (d > 0) it is requested to establish R:
O<r<danddl(a-r)
(Here the vertical bar "|" is to be read as "is a divisor of".) In other words
we are requested to compute the smallest nonnegative remainder r that is
left after division of a by d. In order that the problem be a problem, we have
to restrict ourselves to addition and subtraction as the only arithmetic opera-
tions. Because the term d | (a - r) is satisfied by r = a, an initialization that,
on account of a ≥ 0, also satisfies 0 ≤ r, it is suggested to choose as invariant
relation P:
0 ≤ r and d | (a - r)
For the function t, the decrease of which should ensure termination, we
choose r itself. Because the massaging of r must be such that the relation
d | (a - r) is kept invariant, r may only be changed by a multiple of d, for
instance d itself. Thus we find ourselves invited to evaluate
wp("r:= r - d", P) and wdec("r:= r - d", r) =
0 < r - d and di (a - r + d) and d > 0
Because the term d > 0 could have been added to the invariant relation
P, only the first term is then not implied; we find the corresponding guard
"r > d" and the tentative program:
ifa>Oandd>O-->
r:= a;
do r > d --> r: = r - d od
fi
Upon completion the truth of P and non r > d has been established, a
relation that implies R and thus the problem has been solved.
Suppose now that in addition it would have been required to assign to
q such a value that finally we also have
a = d * q + r
in other words it is requested to compute the quotient as well, then we can
try to add this term to our invariant relation. Because
(a = d * q + r) => (a = d * (q + 1) + (r - d))
we are led to the program:
if a ≥ 0 and d > 0 -->
    q, r:= 0, a;
    do r ≥ d --> q, r:= q + 1, r - d od
fi
EXERCISE
Modify also our second program in such a way that it computes the quotient as well
and give a formal correctness proof for your program. (End of exercise.)
Let us assume next that there is a small number, 3 say, by which we are
allowed to multiply and to divide and that these operations are sufficiently
fast so that they are attractive to use. We shall denote the product by "m * 3"
-or by "3 * m"- and the quotient by "m / 3"; the latter expression will
only be called for evaluation provided initially 3 | m holds. (We are working
with integer numbers, aren't we?)
and then let dd grow until it is large enough and r < dd is satisfied as well.
The following program would do:
ifa>Oandd>O-->
r, dd:= a, d;
do r > dd --> dd: = dd * J od;
do dd =1= d __. dd:= dd / J;
do r > dd--> r: = r - dd od
od
fi
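A Python sketch of the same computation (a ≥ 0 and d > 0 assumed; only
addition, subtraction, and multiplication and exact division by 3 are applied
to dd, mirroring the program above):

    def remainder(a, d):
        r, dd = a, d
        while r >= dd:        # let dd = d * 3**k grow until r < dd
            dd = dd * 3
        while dd != d:        # invariant: 0 <= r < dd and d | (a - r)
            dd = dd // 3      # exact, since 3 | dd holds here
            while r >= dd:
                r = r - dd
        return r              # now 0 <= r < d and d | (a - r)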
EXERCISE
Modify also the above program in such a way that it computes the quotient as well
and give a formal correctness proof for your program. This proof has to demon-
strate that whenever dd/3 is computed, originally 3 | dd holds. (End of exercise.)
Fourth example.
For fixed Q1, Q2, Q3, and Q4 it is requested to establish R where R
is given as R1 and R2 with
R1:  The sequence of values (q1, q2, q3, q4) is a permutation
     of the sequence of values (Q1, Q2, Q3, Q4)
R2:  q1 ≤ q2 ≤ q3 ≤ q4
Taking R1 as relation P to be kept invariant, a possible solution is
q1, q2, q3, q4:= Q1, Q2, Q3, Q4;
do q1 > q2 --> q1, q2:= q2, q1
□ q2 > q3 --> q2, q3:= q3, q2
□ q3 > q4 --> q3, q4:= q4, q3
od
The first assignment obviously establishes P and no guarded command
destroys it. Upon termination we have non BB, and that is relation R2. The
way in which people convince themselves that it does terminate depends
largely on their background: a mathematician might observe that the number
of inversions decreases, an operations researcher will interpret it as maximiz-
ing q1 + 2*q2 + 3*q3 + 4*q4, and I, as a physicist, just "see" the center of
gravity moving in the one direction (to the right, to be quite precise). The
program is remarkable in the sense that, whatever we would have chosen
for the guards, never would there be the danger of destroying relation P:
the guards are in this example a pure consequence of the requirement of
termination.
Note. Observe that we could have added other alternatives such as
qi > q3--> qi, q3:= q3, qi
as well; they cannot be used to replace one of the given three.
(End of note.)
It is a nice example of the kind of clarity that our nondeterminacy has
made possible to achieve; needless to say, however, I do not recommend
sorting a large number of values in an analogous manner.
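The nondeterminacy can be mimicked in an executable sketch by choosing at
random among the true guards (Python; the list q plays the role of q1
through q4):

    import random

    def sort4(q):
        # R1 is kept invariant: the only modification of q is a swap
        while True:
            guards = [i for i in range(3) if q[i] > q[i + 1]]
            if not guards:                  # non BB: q is nondecreasing (R2)
                return q
            i = random.choice(guards)       # any true guard may be taken
            q[i], q[i + 1] = q[i + 1], q[i]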
Fifth example.
We are requested to design a program approximating a square root;
more precisely: for fixed n (n ≥ 0) the program should establish
R: a² ≤ n and (a + 1)² > n
One way of weakening this relation is to drop one of the terms of the
conjunction, e.g. the last one, and focus upon
P: a² ≤ n
a relation that is obviously satisfied by a = 0, so that the initialization need
not bother us. We observe that if the second term is not satisfied this is due
to the fact that a is too small and we could therefore consider the statement
"a:= a+ l". Formally we find
wp("a:= a+ l", P) =((a+ 1) 2 < n)
Taking this condition as -the only!- guard, we have (P and non BB) = R
and therefore we are invited to consider the program
if n > 0---->
a:= 0 {P has been established};
do (a+ 1) 2 < n--> a:= a+ 1 {P has not been destroyed} od
{R has been established}
fi {R has been established}
all under the assumption that the program terminates, which is what it does
thanks to the fact that the square of a nonnegative number is a monotonically
increasing function: we can take for t the function n - a².
This program is not very surprising; it is not very efficient either: for large
values of n it could be rather time-consuming. Another way of generalizing
R is by the introduction of another variable (b say, again with a restricted
range) that is to replace part of R, for instance
P: a² ≤ n and b² > n and 0 ≤ a < b
By the way this has been chosen it has the pleasant property that
(P and (a + 1 = b)) => R
Thus we are led to consider a program of the form (from now on omitting
the if n ≥ 0 --> ... fi)
a, b:= 0, n + 1 {P has been established};
do a + 1 ≠ b --> decrease b - a under invariance of P od
{R has been established}
Each time the guarded command is executed let d be the amount by which
the difference b - a is decreased. Decreasing this difference can be done by
either decreasing b or increasing a or both. Without loss of generality we
can restrict ourselves to such steps in which either a or b is changed, but not
both: if a is too small and b is too large and in one step only b is decreased,
then a can be increased in a next step. This consideration leads to a program
of the following form.
a, b:= 0, n + 1 {P has been established};
do a + 1 ≠ b -->
    d:= ... {d has a suitable value and P is still valid};
    if ... --> a:= a + d {P has not been destroyed}
    □ ... --> b:= b - d {P has not been destroyed}
    fi {P has not been destroyed}
od {R has been established}
Now
a, b:= 0, n + 1;
do a + 1 ≠ b --> d:= ...;
    if (a + d)² ≤ n --> a:= a + d
    □ (b - d)² > n --> b:= b - d
    fi {P has not been destroyed}
od {R has been established}
We are still left with a suitable choice for d. Because we have chosen b - a
(actually, b - a - 1) as our function t, effective decrease implies that d
must satisfy d > 0. Furthermore the following alternative construct may not
lead to abortion, i.e. at least one of the guards must be true. That is, the
negation of the first, (a+ d) 2 > n, must imply the other, (b - d) 2 > n;
this is guaranteed if
a+d<b-d
or
2*d<b-a
Besides a lower bound we have also found an upper bound for d. We could
choose d = 1, but the larger d is, the faster the program, and therefore we
propose:
a, b:= 0, n + 1;
do a + 1 ≠ b --> d:= (b - a) div 2;
    if (a + d)² ≤ n --> a:= a + d
    □ (b - d)² > n --> b:= b - d
    fi
od
and the program (in which the roles of c and d have coincided):
a, c:= 0, 1;
do c² ≤ n --> c:= 2 * c od;
do c ≠ 1 --> c:= c / 2;
    if (a + c)² ≤ n --> a:= a + c
    □ (a + c)² > n --> skip
    fi
od
Note. This program is very much like the last program for the third
example, the computation of the remainder under the assumption that
we could multiply and divide by 3. The alternative construct in our above
program could have been replaced by
do (a+ c) 2 < n--> a:= a+ cod
If the condition for the remainder 0 < r < d would have been rewritten
as r < d and (r + d) > d, the similarity would be even more striking.
(End of note.)
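In executable form the last version might read (a Python sketch; n ≥ 0
assumed):

    def isqrt(n):
        a, c = 0, 1
        while c * c <= n:          # let c grow until (a + c)**2 > n holds
            c = 2 * c
        while c != 1:              # P: a*a <= n < (a + c)**2
            c = c // 2
            if (a + c) ** 2 <= n:
                a = a + c
        return a                   # R: a*a <= n < (a + 1)**2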
Under admission of the danger of beating this little example to death,
I would like to submit the last version to yet another transformation. We
have written the program under the assumption that squaring a number is
among the repertoire of available operations; but suppose it is not and
suppose that multiplying and dividing by (small) powers of 2 are the only
(semi-)multiplicative operations at our disposal. Then our last program as
it stands is no good, i.e. it is no good if we assume that the values of the
variables as directly manipulated by the machine are to be equated to the
values of the variables a and c if this computation were performed "in
abstracto". To put it in another way: we can consider a and c as abstract
variables whose values are represented -according to a convention more
complicated than just identity- by the values of other variables that are in
fact manipulated by the machine. Instead of directly manipulating a and c,
we can let the machine manipulate p, q, and r, such that
p = a * c
q = c²
r = n - a²
It is a coordinate transformation and to each path through our (a,c)-space
corresponds a path through our (p,q,r)-space. This is not always true the
other way round, for the values of p, q, and r are not independent: in terms of
p, q, and r we have redundancy and therefore the potential to trade some
storage space against not only computation time but even against the need
to square! (The transformation from a point in (a,c)-space to a point in
(p,q,r)-space has quite clearly been constructed with that objective in mind.)
We can now try to translate all boolean expressions and moves in (a,c)-space
into the corresponding boolean expressions and moves in (p,q,r)-space. If
this can be done in terms of the permissible operations there, we have been
successful. The transformation suggested is indeed adequate and the follow-
ing program is the result (the variable h has been introduced for a very local
optimization):
p, q, r:= 0, 1, n;
do q ≤ n --> q:= q * 4 od;
do q ≠ 1 -->
    q:= q / 4; h:= p + q; p:= p / 2 {h = 2 * p + q};
    if r ≥ h --> p, r:= p + q, r - h
    □ r < h --> skip
    fi
od
This fifth example has been included because it relates -in an embellished
form- a true design history. When the youngest of our two dogs was only
a few months old I walked with both of them one evening. At the time, I was
preparing my lectures for the next morning, when I would have to address
students with only a few weeks exposure to programming, and I wanted a
simple problem such that I could "massage" the solutions. During that
one-hour walk the first, third, and fourth programs were developed in that
order, but for the fact that the correct introduction of h in the last program
was something I could only manage with the aid of pencil and paper after
I had returned home. The second program, the one manipulating a and b,
which here has been presented as a stepping stone to our third solution, was
only discovered a few weeks later-be it in a less elegant form than presented
here. A second reason for its inclusion is the relation between the third and
the fourth program: with respect to the latter one the other one represents
our first example of so-called "representational abstraction".
Sixth example.
For fixed X (X ≥ 1) and Y (Y ≥ 0) the program should establish
R: z = X^Y
under the -obvious- assumption that exponentiation is not among the
available repertoire. This problem can be solved with the aid of an "abstract
variable", h say; we shall do it with a loop, for which the invariant relation is
P: h * z = X^Y
and our (equally "abstract") program could be
h, z:= X^Y, 1 {P has been established};
do h ≠ 1 --> squeeze h under invariance of P od
{R has been established}

Representing h by the pair x, y with h = x^y, we arrive at
x, y, z:= X, Y, 1;
do y ≠ 0 --> if non 2 | y --> y, z:= y - 1, z * x
             □ 2 | y --> skip
             fi;
             x, y:= x * x, y / 2
od
This latter program is very well known; it is a program that many of us have
discovered independently of each other. Because the last squaring of x when
y has reached the value 0 is clearly superfluous, this program has often been
cited as supporting the need for what were called "intermediate exits". In
view of our second program I come to the conclusion that this support is
weak.
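In executable form this well-known program reads (a Python sketch; X and
Y as above):

    def power(X, Y):
        # invariant: z * x**y = X**Y and y >= 0
        x, y, z = X, Y, 1
        while y != 0:
            if y % 2 != 0:          # non 2 | y
                y, z = y - 1, z * x
            x, y = x * x, y // 2    # the last squaring is superfluous
        return z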
Seventh example.
For a fixed value of n (n ≥ 0) a function f(i) is given for 0 ≤ i < n.
Assign to the boolean variable "allsix" the value such that eventually
R: allsix = (A i: 0 ≤ i < n: f(i) = 6)
holds. (This example shows some similarity to the second example of this
chapter. Note, however, that in this example, n = 0 is allowed as well. In
that case the range for i for the all-quantifier "A" is empty and allsix = true
should hold.) Analogous to what we did in the second example the invariant
relation
P: (allsix = (A i: 0 ≤ i < j: f(i) = 6)) and 0 ≤ j ≤ n
suggests itself, because it is easily established for j = 0, while (P and j = n)
=> R. The only thing to do is to investigate how to increase j under invari-
ance of P. We therefore derive
wp("j:= j + 1", P) =
(allsix = (A i: 0 ≤ i < j + 1: f(i) = 6)) and 0 ≤ j + 1 ≤ n
The last term is implied by P and j ≠ n; it presents no problem because we
had already decided that j ≠ n as a guard is weak enough to conclude R
upon termination. The weakest pre-condition such that the assignment
allsix:= allsix and f(j) = 6
will establish the other term, is
(allsix and f(j) = 6) = (A i: 0 ≤ i < j + 1: f(i) = 6)
a condition that is implied by P. We thus arrive at the program
allsix, j:= true, 0;
do j ≠ n --> allsix:= allsix and f(j) = 6;
             j:= j + 1
od
68 THE FORMAL TREATMENT OF SOME SMALL EXAMPLES
(In the guarded command we have not used the concurrent assignment for
no particular reason.)
By the time that we read this program -or perhaps sooner- we should
get the uneasy feeling that as soon as a function value ≠ 6 has been found,
there is not much point in going on. And indeed, although (P and j = n) => R,
we could have used the weaker
(P and (j = n or non allsix)) => R
leading to the stronger guard "j ≠ n and allsix" and to the program
allsix, j:= true, 0;
do j ≠ n and allsix --> allsix, j:= f(j) = 6, j + 1 od
(Note the simplification of the assignment to allsix, a simplification that is
justified by the stronger guard.)
EXERCISE
if n = 0 --> allsix:= true
□ n > 0 --> j:= 0;
            do j ≠ n - 1 and f(j) = 6 --> j:= j + 1 od;
            allsix:= f(j) = 6
fi
and also for the still more tricky program (that does away with the need to invoke
the function f from more than one place in the program)
j:= 0;
do j ≠ n cand f(j) = 6 --> j:= j + 1 od;
allsix:= j = n
(Here the conditional conjunction operator "cand" has been used in order to do
justice to the fact that f(n) need not be defined.) The last program is one that some
people like very much. (End of exercise.)
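Since the "and" of Python is conditional in exactly the sense of "cand", the
last program carries over almost literally (a sketch; f and n assumed given):

    def allsix(f, n):
        j = 0
        while j != n and f(j) == 6:   # f(j) is not evaluated when j = n
            j = j + 1
        return j == n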
Eighth example.
Before I can state our next problem, I must first give some definitions
and a theorem. Let p = (p_0, p_1, ..., p_(n-1)) be a permutation of n (n > 1)
different values p_i (0 ≤ i < n), i.e. (i ≠ j) => (p_i ≠ p_j). Let q = (q_0, q_1, ...,
q_(n-1)) be a different permutation of the same set of n values. By definition
"permutation p precedes q in the alphabetic order" if and only if for the mini-
mum value of k such that p_k ≠ q_k we have p_k < q_k.
The so-called "alphabetic index" of a permutation of n different values
is the ordinal number given to it when we number the n! possible permuta-
as this is easily established initially (viz. by "s:= 0") and (P1 and s = r) => R.
Again we ask whether we can think of restricting the range of s and in
view of its initial value we might try
P1: index(c_0, c_1, ..., c_(n-1)) = s and 0 ≤ s ≤ r
which would lead to a program of the form
Our next concern is what to choose for "a suitable amount". Because
our increase of s must be accompanied by a rearrangement of the cards in
order to keep P1 invariant, it seems wise to investigate whether we can find
conditions under which a single cardswap corresponds to a known increase
of s. For a value of k satisfying 1 < k < n, let
Cn-k < Cn-k; I < · · · < Cn-1
In order to find "the suitable amount" for a major step, the machine
first determines the largest smaller value of k for which r < s + k! no
longer holds (c_i with i = n - k - 1 is then too small, but values to the left
of it are all OK) and then increases s by the minimum multiple of k!
needed to make r < s + k! hold again; this is done in "minor steps" of k!
at a time, simultaneously increasing c_i with cards to the right of it. In the
following program we introduce the additional variable kfac, satisfying
P3: kfac = k!
and for the second inner repetition i and j, such that i = n - k - 1 and
either j = n or i < j < n and c_j > c_i and c_(j-1) < c_i.
s:= 0 {P1 has been established};
kfac, k:= 1, 1 {P3 has been established as well};
do k ≠ n --> kfac, k:= kfac * (k + 1), k + 1 od
{P2 has been established as well};
do s ≠ r --> {s < r, i.e. at least one and therefore
              at least two cards have not reached their
              final position}
    do r < s + kfac --> kfac, k:= kfac / k, k - 1 od
    {P1 and P3 have been kept true, but in P2
     the last term is replaced by
     s + kfac ≤ r < s + (k + 1) * kfac};
    i, j:= n - k - 1, n - k;
    do s + kfac ≤ r --> {n - k ≤ j < n}
        s:= s + kfac; cardswap(i, j); j:= j + 1
    od {P2 has been restored again: P1 and P2 and P3}
od {R has been established}
EXERCISE
Convince yourself of the fact that also the following rather similar program would
have done the job:
s:= 0; kfac, k:= 1, 1;
do k ≠ n --> kfac, k:= kfac * (k + 1), k + 1 od;
do k ≠ 1 -->
    kfac, k:= kfac / k, k - 1;
    i, j:= n - k - 1, n - k;
    do s + kfac ≤ r -->
        s:= s + kfac; cardswap(i, j); j:= j + 1
    od
od
(Hint: the monotonically decreasing function t ≥ 0 for the outer repetition is
t = r - s + k - 1.) (End of exercise.)
9 ON NONDETERMINACY BEING BOUNDED
Because of (1) and the fact that SL_j' enjoys property 2, we conclude that
wp(SL_j', C_s) => wp(SL_j', C_(s+1))
and thus we conclude from (4) that in point X we also have
(E s': s' ≥ 0: (A s: s ≥ s': wp(SL_j', C_s)))     (5)
Let s' = s'(j') be the minimum value satisfying (5). We now define smax as
the maximum value of s'(j') taken over the (at most n, and therefore the
maximum exists!) values j' for which B_j'(X) = true. In point X then holds
on account of (3) and (5)
BB and (A j: 1 ≤ j ≤ n: B_j => wp(SL_j, C_smax)) =
(by definition of the semantics of the alternative construct)
wp(IF, C_smax)
But the truth of the latter relation in state X implies that there also holds
(E s: s ≥ 0: wp(IF, C_s))
but as X was an arbitrary state satisfying (3), for S = IF the fact that the
left-hand side of (2) implies its right-hand side as well has been proved, and
thus the alternative construct enjoys property 5 as well. Note the essential
role played by the antecedent (1) and the fact that a guarded command set
is a finite set of guarded commands.
Property 5 is proved for the repetitive construct by mathematical induc-
tion.
Base: Property 5 holds for H_0.
Induction step: From the assumption that property 5 holds for H_k and H_0 it
follows that it holds for H_(k+1).
H_(k+1)(E r: r ≥ 0: C_r) =
(by virtue of the definition of H_(k+1))
wp(IF, H_k(E r: r ≥ 0: C_r)) or H_0(E r: r ≥ 0: C_r) =
(because property 5 is assumed to hold for H_k and for H_0)
wp(IF, (E r': r' ≥ 0: H_k(C_r'))) or (E s: s ≥ 0: H_0(C_s)) =
(because property 5 holds for the alternative construct and property 2 is
enjoyed by H_k)
Here property (a) expresses the requirement that activation of S is guaranteed
to terminate with x equal to some positive value, property (b) expresses that
S is a mechanism of unbounded nondeterminacy, i.e. that no a priori upper
bound for the final value of x can be given. For such a program S, we could,
however, derive now:
T = wp(S, x > 0)
  = wp(S, (E r: r > 0: 0 < x < r))
  = (E s: s > 0: wp(S, 0 < x < s))
  = (E s: s > 0: F)
  = F
our formalism for the repetitive construct gives wp(S, T) = (x ≥ 0), while I
expect most of my readers to conclude that under the assumption of the
existence of "set x to any positive integer" for x < 0 termination would be
guaranteed as well. But then the interpretation of wp(S, T) as the weakest
pre-condition guaranteeing termination would no longer be justified. How-
ever, when we substitute our first would-be implementation:
S: do x > 0 __. x: = x - I
Dx < 0-> go on:= true; x:= I;
+
do go on --> x: = x I
a go on--> go on:= false
od
od
10 AN ESSAY ON THE NOTION: "THE SCOPE OF VARIABLES"
for our ability to cope (mentally) with the program as a whole, is the more
vital the larger the total number of variables involved. The question is whether
(and, if so, how) such "separations" should be reflected more explicitly in our
program texts.
Our first "autocoders" (the later ones of which were denoted by misno-
mers such as "automatic programming systems" or -even worse- "high level
programming languages") certainly did not cater to such possibilities. They
were conceived at a time when it was the general opinion that it was our
program's purpose to instruct our machines, in contrast to the situation of
today in which more and more people are leaning towards the opinion that
it is our machines' purpose to execute our programs. In those early days it
was quite usual to find all sorts of machine properties directly reflected in
the write-up of our programs. For instance, because machines had so-called
"jump instructions'', our programs used "go to statements". Similarly, be-
cause machines had constant size stores, computations were regarded as
evolving in a state space with a constant number of dimensions, i.e. manipu-
lating the values of a constant set of variables. Similarly, because in a random
access store each storage location is equally well accessible, the programmer
was allowed to refer at any place in the program text to any variable of the
program.
This was all right in the old days when, due to the limited storage sizes,
program texts were short and the number of different variables referred to
was small. With growing size and sophistication, however, such homogeneity
becomes, as a virtue, subject to doubt. From the point of view of flexibility
and general applicability the random access store is of course a splendid
invention, but there comes the moment that we must realize that each flexibility,
each generality of our tools requires a discipline for its exploitation. That
moment has come. Let us tackle the "free accessibility" first.
In FORTRAN's first version there were two types of variables, integer
variables and floating point variables, and the first letter of their name
decided -according to a fixed convention- the type, and any occurrence of
a variable name anywhere in the program text implied at run time the per-
manent existence of a variable with that name. In practice this proved to be
very unsafe: if in a program operating on a variable named "TEST" a single
misspelling occurred, erroneously referring to "TETS" instead of to "TEST",
no warning could be generated; another variable called "TETS" would be
introduced. In ALGOL 60 the idea of so-called "declarations" was introduced
and as far as catching such silly misspellings was concerned, this proved to be
an extremely valuable form of redundancy. The basic idea of the explicit
declaration of variables is that statements may only refer to variables that
have been explicitly declared to exist: an erroneous reference to a variable by
the name of"TETS" is then caught automatically if no variable with the name
"TETS" has been declared. The declarations of ALGOL 60 served a second
that at that time it had been recently discovered how a one-pass assembler
could use a stack for coping with textually nested different meanings of the
same name, may have had something to do with its adoption.
From the user's point of view, however, the convention is less attractive,
for it makes the variables declared in his outermost block extremely vulner-
able. If he discovers to his dismay that the value of one of these variables has
been tampered with in an unintended and as yet unexplained way, he is in
principle obliged to read all the code of all the inner blocks, including the
ones that should not refer to the variable at all -for precisely there such a
reference would be erroneous! Under the assumption that the programmer
does not need to refer everywhere to anything he seems to be better served
by more explicit means for restricting the textual scope of names than the
more or less accidental re-declaration.
A first step in this direction, which maintains the notion of textually
nested contexts is the following. For each block we postulate for its level
(i.e. its text with the exception of its inner blocks) a textual context, i.e. a
constant nomenclature in which all names have a unique meaning. The
names occurring in a block's textual context are either "global", i.e. inherited
with their meaning from the immediate surroundings, or "local", i.e. with
its meaning only pertinent to the text of this block. The suggestion is to
enumerate after the block's opening bracket (with the appropriate separators)
the names that together form its textual context, for instance first the global
names (if any) and then the local names (if any); obviously all these names
must be different.
Confession. The above suggestion was only written down after long hesi-
tations, during which I considered alternatives that would enable the pro-
grammer to introduce in a block one or more local names without also being
obliged to enumerate all the global names the block would inherit, i.e. alter-
natives that would indicate (with the same compactness) the change of nomen-
clature of the ALGOL 60 block (without "re-declaration of identifiers"), i.e.
a pure extension of the nomenclature. This would give the programmer
the possibility to indicate contraction of the nomenclature, i.e. limited
inheritance from the immediate surroundings, but not the obligation. And
for some time I thought that this would be a nice, nonpaternalistic attitude
for a language designer; I also felt that this would make my scope rules
more palatable, because I feared that many a programmer would object
to explicit enumeration of the inheritance whenever he felt like introducing a
local variable. This continued until I got very cross with myself: too many
language designs have been spoiled by fear of nonacceptance and I know of
only one programmer who is going to program in this language and that is
myself! And I am enough of a puritan to oblige myself to indicate the inheri-
tance explicitly. Even stronger: not only will my inner blocks refer only to
global variables explicitly inherited, but also the inheritance will not mention
any global variables not referred to. The inheritance will give a complete
description of the block's possible interference with the state space valid in
its surroundings, no more and no less! When I discovered that I had allowed
my desire "to please my public" -which, I think, is an honourable one- to
influence not only the way of presentation, but also the subject matter itself,
I was frightened, cross with, and ashamed of myself. (End of confession.)
Besides having a name, variables have the unique property of being able
to have a value that may be changed. This immediately raises the question
"What will be the value of a local variable upon block entry?". Various
answers have been chosen. ALGOL 60 postulates that upon block entry the
values of its local variables are "undefined", i.e. any effort to evaluate their
value prior to an assignment to them is regarded as "undefined". Failure to
initialize local variables prior to entry of a loop turned out to be a very
common error and a run-time check against the use of the undefined value
of local variables, although expensive, proved in many circumstances not to
be a luxury. Such a run-time check is, of course, the direct implementation
of the pure mathematician's answer to the question of what to do with a
variable whose value is undefined, i.e. extend its range with a special value,
called "UNDEFINED", and initialize upon block entry each local variable
with that unique, special value. Any effort to evaluate the value of a variable
having that unique, special value "UNDEFINED" can then be honoured
by program abortion and an error message.
Upon closer scrutiny, however, this simple proposal leads to logical
problems; for instance, it is then impossible to copy the values of any set of
variables. Efforts to remedy that situation include, for instance, the possibility
to inspect whether a value is defined or not. But such ability to manipulate
the special value -e.g. the value "NIL" for a pointer pointing nowhere-
easily leads to confusions and contradictions: one might discover a case of
bigamy when meeting two bachelors married to the same "nobody".
Another way out, abolishing the variables with undefined values, has
been the implicit initialization upon block entry not with a very special, but
with a very common, almost "neutral" value (say "zero" for all integers and
"true" for all booleans). But this, of course, is only fooling oneself; now
detection of a very common programming error has been made impossible
by making all sorts of nonsensical programs artificially into legal ones. (This
proposal has been mitigated by the convention that initialization with the
"neutral" value would only occur "by default", i.e. unless indicated other-
wise, but such a default convention is clearly a patch.)
A next attack on the problem of catching the use of variables with still
undefined values has been the performance of (automatic) flow analysis of
the program text that could at least warn the programmer that at certain
places variables would -or possibly could- be used prior to the first assign-
ment to them. In a sense my proposal can be regarded as being inspired by
that approach. I propose such a rigid discipline that:
One way of achieving this would be to make the initialization of all local
variables obligatory upon block entry; together with the wish not to initialize
with "meaningless" values -a wish that implies that local variables should
only be introduced at a stage that their meaningful initial value is available-
this, I am afraid, will lead to confusingly high depths of nesting of textual
scopes. Besides that we would have to "distribute" the block entry over the
various guarded commands of an alternative construct whenever the initial-
ization should be done by one of the guarded commands of a set. These two
considerations made me look for an alternative that would require less (and
more unique) block boundaries. The following proposal seems to meet our
requirements.
First of all we insist that upon block entry the complete nomenclature
(both inherited and private) is listed. Besides the assignment statement that
destroys the current value of a variable by assigning a new one to it, we have
initializing statements, by syntactical means recognizable as such, that give a
private variable its first value since block entry. (If we so desire we can regard
the execution of its initializing statement as coinciding in time with the vari-
able's "creation"; the earlier mentioning of its name at block entry can then
be regarded as "reserving its identifier for it".)
In other words, the textual scope of a variable private (i.e. local) to a
block extends from the block's opening "begin" until its corresponding
closing "end" with the exception of the texts of inner blocks that do not
inherit it. We propose to divide its textual scope into what we might call
"the passive scope", where reference to it is not allowed and the variable is
not regarded as a coordinate of the local state space, and "the active scope",
where the variable can be referenced. Passive and active scopes of a variable
will always be separated by an initializing statement for that variable, and
initializing statements for a variable have to be placed in such a way that,
independent of values of guards:
1. after block entry exactly one initializing statement for it will be executed
before the corresponding block exit;
2. between block entry and the execution of the initializing statement no
statement from its active scope can be executed.
The following discipline guarantees that the above requirements are met. To
start with we consider the block at the syntactic grain where the enumeration
of its private nomenclature is followed by a list of statements mutually sepa-
rated by semicolons. Such a statement list must have the following properties:
For the BNF-addicts the following syntax (where initialization refers to one
private variable) may be helpful:
Note. In using ALGOL 60 the sheer size of the brackets "begin" and
"end" has caused discomfort; having abolished ALGOL 60's compound
statement, we expect to need fewer of them. (End of note.)
The corresponding repetitive construct is not included as a permissible
form of the <initializing statement) because its inclusion would violate our
first requirement: regardless of the sequencing initialization must occur
exactly once. Such a restriction does not occur in a programming language
like ALGOL 60 in which simply the (dynamically) first assignment is taken as
"the initialization". The price paid for the greater freedom in an ALGOL-like
language is that with programs written in such a language we cannot neces-
sarily decide statically (i.e. for all computations) at each semicolon which
any difficulties either: it may not inherit the variable from its surroundings.
The third case, where a block begins in the passive scope of a variable
and ends in its active one deserves still some further attention: it means no
more and no less than the block's obligation to initialize the variable. It has
inherited what we could call "a virgin variable" for the purpose of its initial-
ization. Both in context IN and in context OUT we can ask the question
whether the variable is changeable in its active scope. In the case of the
inheritance of a virgin variable, these two questions are, however, fully inde-
pendent; the circumstance that at the textual level of context OUT a variable
will not have its value changed after initialization does not exclude that
the initialization itself (in an inner block) is a multistep affair, viz. a multistep
affair when we consider the initialization not as a single, undivided act, but
-at a smaller grain of interest- as a sequential process, building up the initial
value.
After these explorations the time has come to be as precise as possible.
We recall that we introduced the name "the textual context" of a block for
the constant nomenclature (in which all names have a unique meaning) per-
taining to the block's "level", i.e. its text with the exception of its inner
blocks. We now consider two nested blocks, an inner one (with the context
called "IN") and an outer one (with the context called "OUT"); with respect
to IN we have referred to OUT as "the surrounding context".
Names of a context are of two kinds: either they are private to the block,
i.e. unrelated to anything outside the block, or they are "inherited" from the
surrounding context. In the case of inheritance, we must distinguish two
cases: the context IN may inherit a name of a variable from the context OUT
with or without the obligation for the inner block to initialize, when activated,
the variable inherited. We shall distinguish these three ways by
A variable can belong to more than one textual context: to start with it
belongs to the textual context of the block to which it is private and further-
more it belongs to the textual contexts of all inner blocks that inherit it from
their surrounding contexts. The scope of a variable extends over the levels
of all blocks to whose textual contexts the variable belongs. The scope of a
variable is always subdivided into two parts, its passive scope and its active
scope, and the way in which initializing statements for a variable may occur
in the text has been restricted so as to guarantee with respect to each variable
in time the succession:
IN                OUT
privar, pricon    not applicable
glovar            privar, virvar (only if inner block
                  fully within active scope) or
                  glovar (without restriction)
glocon            privar, pricon, virvar, vircon (only if
                  inner block fully within active scope) or
                  glovar, glocon (without restriction)
virvar, vircon    privar, pricon, virvar, vircon (only if
                  inner block begins in passive scope)
"virvar table" will be created and built up. Once the execution of such an
initializing inner block has been completed, the value of "table" will remain
constant throughout the execution of the outer block (and its further
inner blocks, if any). (End of note 2.)
The remaining decisions, although far from unimportant (they determine
what our texts look like, how easily they write and read), have less far-
reaching consequences; they are purely concerned with syntax. We have to
decide upon notations for the <nomenclature> and for the <primitive initial-
izing statement>. I propose for the nomenclature a notation very similar to
ALGOL 60's <block head>, a syntax with which I have always been perfectly
happy.
<nomenclature)::= <nomenclature element){; <nomenclature element)}
<nomenclature element)::= <nomenclature header) <variable)
{,<variable)}
<nomenclature header): : = privar Ipricon I virvar Ivircon I
glovar I glocon
Admittedly with a view to later extensions I propose to derive the initial-
izing statements from the assignment statements by post-fixing the variable
at the left-hand side by the special character "vir" -indicating that we deal
with a virgin variable- followed by the name of its type:
<primitive initializing statement)::= <variable) vir <type)
:=<expression)
where, as far as types are concerned, we have confined ourselves up till now
to integers and booleans:
<type> ::= int | bool
Note 1. The extension to concurrent initialization and concurrent assign-
ment and initialization is left to the ambitious reader. (End of note 1.)
Note 2. The expression(s) at the right-hand side are to be regarded as
still in the passive scope of the variables being initialized. (End of note 2.)
As an example we give the inner block that initializes the global integer
variable x with GCD(X, Y), where with regard to the inner block, X and
Y are positive constants. The block uses a private variable called y.
begin glocon X, Y; virvar x; privar y;
    x vir int:= X; y vir int:= Y;
    do x > y --> x:= x - y
    □ y > x --> y:= y - x
    od
end
An indication like (1) seems too specific, for it is only applicable to types
whose values have a natural ordering. An indication like (2) seems also too
specific. What about a pair of global variables whose initial values only influ-
ence their own and each other's final value? An indication like (3), however,
could be meaningful. Our indication "con" then emerges as "the subset of
allowed modifiers is empty".
11 ARRAY VARIABLES
If we go that route we are clearly piling one logical patch upon another.
However, I have now come to the conclusion that it is not the concurrent
assignment, but the notion of the subscripted variable that is to be blamed.
In the axiomatic definition of the assignment statement via "substitution of
a variable" one cannot afford -as in, I guess, all parts of logic- any uncer-
tainty as to whether two variables are the same or not.
The moral of the story is that we must regard the array in its entirety as a
single variable, a so-called "array variable", in contrast to the "scalar vari-
ables" discussed so far. In the following I shall restrict myself to array vari-
ables that are the analogue of one-dimensional arrays.
We can regard (the value of) a variable of type "integer" as an integer-
valued function without arguments (i.e. defined on a domain consisting of
a single, anonymous point), a function that does not change unless explicitly
changed (usually by an assignment). It is perhaps unusual to consider func-
tions without arguments, but we mention the viewpoint for the sake of the
analogy. For, similarly, we can regard (the value of) a variable of type "integer
array" as an integer-valued function of one argument with a domain in the
integers, a function, again, that does not change unless explicitly changed.
But the value of a variable of type "integer array" cannot be any integer-
valued function defined on a domain in the integers, for I shall restrict myself
to such types that, given two variables of that type, we can write an algorithm
establishing whether or not the two variables have the same value. If x and y
are scalar variables of type "integer", then this algorithm boils down to the
boolean expression x = y, i.e. both functions are evaluated at the only
(anonymous) point of their domain and these integer values are then com-
pared. Similarly, if ax and ay are two variables of type "integer array", their
values are equal if and only if, as functions, they have the same domain and
in each point of the domain their values are equal to each other. In order
that all these comparisons are possible, we must restrict ourselves to finite
domains. And what is more, besides being finite, the domains must be avail-
able in one way or another to the algorithm that is to compare the values of
the array variables ax and ay.
For practical purposes I shall restrict myself to domains consisting of
consecutive integers (when not empty). But even then there are at least two
possibilities. In ALGOL 60 the domain is fixed by giving in the declaration
-e.g. "boolean array A[l: JO], B[l: 5]"- the lower and upper bounds for
the subscript value. As a type determines the class of possible values for a
variable of that type, we must come to the conclusion that the two arrays
A and B in the above example are of different type: A may have 1024 different
values, B only 32. In ALGOL 60 we have as many different types "boolean
array" as we can have bound pairs (and, as the bound pair may contain
expressions, the type is in principle only determined upon block entry).
Besides that, the necessary knowledge about the domain must be provided
Remark 1. In other contexts, i.e. not following the dot, the same names
may be used with completely different meaning. We could introduce an array
variable named "dom" and in its active scope we could refer to "dom.lob'',
"dom.hib" and even "dom.dom" ! Such perversities are not recommended and
therefore I have tried to find subordinate names that, although of some
mnemonic value, are unlikely candidates for introduction by the programmer
himself. (End of remark 1.)
Remark 2. A further reason for using the dot notation rather than the
function notation -e.g. "dom(ax)", etc.- is that, unless we introduce differ-
ent sets of names for these functions defined on boolean arrays and integer
arrays respectively (which would be awkward) we are forced to introduce
functions of an argument that may be of more than one type, something I
would like to avoid as long as possible. (End of remark 2.)
For the sake of convenience we introduce two further functions; for the
array variable ax they are defined if ax.dom > 0. They are
ax.low, defined to be equal to ax(ax.lob)
and
ax.high, defined to be equal to ax(ax.hib)
They denote the function values at the lowest and the highest point of the
domain respectively. They are nothing really new and are defined in terms
of concepts already known; in the definition of the semantics of operations
on array values we do not need to mention the effect on them explicitly.
As stated above, a scalar variable can be regarded as a function (without
argument) that can be changed by assigning a new value to it: such an assign-
ment destroys the information stored as "its old value" completely. We also
need operations to change the value of an array variable (without them it
would always be an array constant!) but the assignment of a new value to
it that is totally unrelated to its old value will play a less central role. It is
not that the assignment to an array variable presents any logical difficulties -
on the contrary, I am tempted to add- but there is something wrong with
its economics. With a large domain size the amount of information stored as
"the value of an array variable" can be very large, and neither copying nor
destroying such large amounts of information are considered as "nice"
operations. On the contrary: in many programming tasks the core of the
problem consists of building up an array value gradually, i.e. in a number of
steps, each of which can be considered as a "nice" operation, "nice" in the
sense that the new value of the array can be regarded as a "pleasant" deriva-
tion of its old value. What makes such operations "nice" or "pleasant"
depends essentially on two aspects: firstly, the relation between the old and
the new value should be mathematically manageable, otherwise the opera-
tions are too cumbersome for us to use; secondly, its implementation should
not be too expensive for the kind of hardware that we intend to instruct
with our program. The extent to which we are willing to take the latter hard-
ware constraints into account is not a scientific question, but a political one,
and as a consequence I don't feel obliged to give an elaborate justification
of my choices. For the sake of convenience I shall be somewhat more liberal
than many programmers would be, particularly those that are working
daily with machinery, the conceptual design of which is ten or more years
old; on the other hand I hope to be sufficiently aware of the possible technical
consequences of my choices that they remain, if not realistic, at least not
totally unrealistic.
Our first modification of the value of an array variable, ax say, does not
change the domain size, nor the set of function values, nor their order; it
only shifts the domain over a number of places, k say, upwards along the
number line. (If k < 0 it is a shift over -k places in the other direction; if
k = 0 it is the identity transformation, semantically equivalent to "skip".)
We denote it by
ax:shift(k)
Here we have introduced the colon ":". Its lowest dot indicates in the usual
manner that the following name is subordinate to the type of the variable
mentioned to its left; the upper dot is just an embellishment (inspired by the
assignment operator ": = "), indicating that the value of the variable men-
tioned to its left is subject to redefinition.
Immediately we are confronted with the question whether we can give
an axiomatic definition of the predicate transformer wp("ax:shift(E)", R).
Well, it must be a predicate transformer similar to the one of the axiom of
assignment to a scalar variable, but more complicated -and this will be
true as well for all the other modifiers of array values- because the value of
a scalar variable is fully defined by one (elementary) value, while the value of
an array variable involves the domain itself and a function value for all
points of the domain. Because the value of the array variable ax is fully
determined by
the value of ax.lob,
the value of ax.dom and
the value of ax(i) for ax.lob ≤ i < ax.lob + ax.dom
we can, in principle at least, restrict ourselves to post-conditions R referring
to the array value only in terms of "ax.lob", "ax.dom" and "ax(arg)" where
"arg" may be any integer-valued expression. For such a post-condition R
the corresponding weakest pre-condition
wp("ax:shift(E)", R)
is derived from R by simultaneously replacing
For the definition of our further operators we shall follow the latter
technique: it describes more clearly how the final value ax' depends on the
initial value ax.
The next operators extend the domain at either the high or the low end
with one point. The function value in the new point is given as parameter
which must be of the so-called "base type" of the array, i.e. boolean for a
boolean array, etc. The operators are of the form
ax:hiext(x) or ax:loext(x)
The semantic definition of hiext is given by
wp("ax:hiext(x)", R) = R(ax → ax')
i.e. R with every occurrence of ax replaced by ax', where
ax'.lob = ax.lob
ax'.hib = ax.hib + 1
ax'.dom = ax.dom + 1
ax'(arg) = x for arg = ax.hib + 1
         = ax(arg) for arg ≠ ax.hib + 1
The next two operators remove a point from the domain at either the
high or the low end. They are only defined when initially dom > 0 holds
for the array to which they are applied; when applied to an array with dom =
0, they lead to abortion. They destroy information in the sense that one of
the function values gets lost.
The semantic definition of hirem is given by
wp("ax:hirem", R) = (ax.dom > 0 and R(ax → ax'))
where
ax'.lob = ax.lob
ax'.hib = ax.hib - 1
ax'.dom = ax.dom - 1
ax'(arg) = undefined for arg = ax.hib
         = ax(arg) for arg ≠ ax.hib
The semantic definition of lorem is given by
wp("ax:lorem", R) = (ax.dom > 0 and R(ax → ax'))
where
ax'.lob = ax.lob + 1
ax'.hib = ax.hib
ax'.dom = ax.dom - 1
ax'(arg) = undefined for arg = ax.lob
         = ax(arg) for arg ≠ ax.lob
For the sake of convenience we introduce two further operations, the
first of which is denoted by ax:swap(i, j). Its semantic definition is given by
wp("ax:swap(i, j)", R) = (ax.lob ≤ i ≤ ax.hib and ax.lob ≤ j ≤ ax.hib
                          and R(ax → ax'))
where
ax'.lob = ax.lob
ax'.hib = ax.hib
ax'.dom = ax.dom
ax'(arg) = ax(j) for arg = i
         = ax(i) for arg = j
         = ax(arg) for arg ≠ i and arg ≠ j
Note. Initially i ≠ j is not required: if initially i = j holds, the value of
the array variable remains unaffected. (End of note.)
The second of the two is denoted by ax:alt(i, x); its semantic definition is
given by
wp("ax:alt(i, x)", R) = (ax.lob ≤ i ≤ ax.hib and R(ax → ax'))
where
ax'.lob = ax.lob
ax'.hib = ax.hib
ax'.dom = ax.dom
ax'(arg) = x for arg = i
         = ax(arg) for arg ≠ i
The operation denoted above as "ax:alt(i, x)" is semantically equivalent
to what FORTRAN or ALGOL 60 programmers know as "the assignment
to a subscripted variable". (They would write "AX(!)= X" and "ax[i] := x"
respectively.) I have introduced this operation in the form "ax:alt(i, x)" in
order to stress that such an operation affects the array ax as a whole: two
functions with the same domain are different functions if they differ in at
least one point of the domain. The "official" -or, if you prefer, "puritan"-
notation "ax:alt(i, x)" is, however, even to my taste too cumbersome and too
unfamiliar and I therefore propose (I too have my weaker moments!) to
use instead
ax:(i)= x
a notation which is somewhat shorter, reminiscent of the so much more
familiar assignment statement, and still reflects by its opening "ax:" that we
must view it as affecting the array variable ax. (The decision to write "ax:(i)=
x" is not much different from the decision to write "ax(i)" instead of the
more pompous "ax.val(i)".)
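To make the algebra of these operators concrete, here is a small executable
model of such an array variable (a Python sketch; the class and its internals
are illustrative assumptions, and only the operator names come from the
text):

    from collections import deque

    class ArrayVariable:
        # models a function on the consecutive integers lob, ..., lob + dom - 1
        def __init__(self, lob, values):
            self.lob = lob
            self.vals = deque(values)

        @property
        def dom(self):
            return len(self.vals)

        @property
        def hib(self):
            return self.lob + self.dom - 1

        def __call__(self, i):            # ax(i)
            assert self.lob <= i <= self.hib
            return self.vals[i - self.lob]

        def shift(self, k):               # ax:shift(k)
            self.lob = self.lob + k

        def hiext(self, x):               # ax:hiext(x)
            self.vals.append(x)

        def loext(self, x):               # ax:loext(x)
            self.vals.appendleft(x)
            self.lob = self.lob - 1

        def hirem(self):                  # ax:hirem, requires dom > 0
            assert self.dom > 0
            self.vals.pop()

        def lorem(self):                  # ax:lorem, requires dom > 0
            assert self.dom > 0
            self.vals.popleft()
            self.lob = self.lob + 1

        def swap(self, i, j):             # ax:swap(i, j)
            a, b = i - self.lob, j - self.lob
            self.vals[a], self.vals[b] = self.vals[b], self.vals[a]

        def alt(self, i, x):              # ax:(i)= x
            assert self.lob <= i <= self.hib
            self.vals[i - self.lob] = x

A deque keeps the shift, the extensions, and the removals cheap, in line with
the "nice" operations discussed above.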
None of the previous operators can be used for initialization. They can
only change the value of an array under the assumption that it has already
a value; they can only occur in the active scope of the array variable. We
have not yet introduced the assignment
ax:= bx
a construct that would do the job. I am, however, very hesitant to do so,
because in its full generality "assignment of a value" usually implies "copying
a value" and ifthe domain of the function bx is large, this is not to be regarded
as a "nice" operation in present technology. Not that I am absolutely unwill-
ing to introduce "unpleasant" operations, but if I do so, I would not like
them to appear on paper as innocent ones. A programming language in
which "x:= y" should be regarded as "nice" but "ax:= bx" should have to
be regarded as "unpleasant" would be misleading; it would at least mislead
me. A way out of this dilemma is to admit as the right-hand side of the
but I would like to reject that inner block as a worthy substitute, not so much
on account of the length of the text, but on account of its inefficiency. I will
not even regard "ax:(5)= 7" as an abbreviation of the above inner block.
With the possible exception of the assignment of an enumerated value,
I assume in particular that the price of each operation is independent of the
values of the arguments supplied to it: the price of executing ax:shift(k) will be
independent of the value of k, the price of executing ax:swap(i, j) will be
independent of the values of i and j, etc. With present-day technology these
assumptions are not unrealistic.
It is in such considerations that the justification is to be found for my
willingness to introduce otherwise superfluous names; we could have restricted

12 THE LINEAR SEARCH THEOREM
i.e. i is the minimum value > 0, such that non B(i) holds. In other words,
when we look for the minimum value (at least equal to some lower bound)
that satisfies some criterion, our program investigates values (starting at that
lower bound) in increasing order. Searching in increasing order translates
the first satisfactory value encountered into the smallest satisfactory value
existing. Similarly, when looking for a maximum value, we shall search in
decreasing order.
Very often, the two statements have the form

   x := xnought;
   do B(x) --> x := F(x) od
This program searches in the sequence of values given by

   x_0 = xnought
   for i > 0:  x_i = F(x_{i-1})

the value x_i with the minimum value of i (i ≥ 0), such that non B(x_i) holds.
(More formal proofs of the above are left as an exercise to the industrious
reader, if so inclined.)
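As a sketch in Python (the parameters B and F stand for the text's guard and function; this is an illustration, not a prescribed implementation):

   def linear_search(xnought, B, F):
       # returns the first value in xnought, F(xnought), F(F(xnought)), ...
       # for which non B holds; searching in this order makes it the minimal one
       x = xnought
       while B(x):
           x = F(x)
       return x

   # e.g. the smallest integer x >= 0 with x * x > 1000:
   # linear_search(0, lambda x: x * x <= 1000, lambda x: x + 1)  yields 32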
The insights described in this chapter are referred to as the "Linear
Search Theorem". In the next chapter we shall use it as part of our reasoning
for actually finding a solution; simple as it is, the Linear Search Theorem
has often proved to be of significant heuristic value.
THE PROBLEM
13 OF THE NEXT PERMUTATION
Having found i, we must find from "the tail", i.e. among the values c(j)
with i + 1 ≤ j ≤ n, the new value c(i). Because we are looking for the
immediate successor, we must find that value of j in the range i + 1 ≤ j ≤ n,
such that c(j) is the smallest value satisfying
c(j) > c(i)
Having found j, we can see to it that c(i) gets adjusted to its final value
by "c:swap(i,j)". This operation has the additional advantage that the total
sequence remains a permutation of the numbers from 1 through n; the final
operation is to rearrange the values in the tail in monotonically increasing
order. The overall structure of the program we are considering is now
determine i;
determine j;
c:swap(i,j);
sort the tail
(In our example i = 6, j = 8 and the final result would be reached via the
intermediate sequence 1 4 6 2 9 7 8 5 3.)
When determining i, we look for a maximum value of i; the Linear Search
Theorem tells us that we should investigate the potential values for i in
decreasing order.
When determining j, we look for a minimum value c(j); the Linear Search
Theorem tells us that we must investigate c(j) values in increasing order.
Because the tail is a monotonically decreasing function (on account of the
way in which i was determined), this obligation boils down to inspecting
c(j) values in decreasing order of j.
The operation "c:swap(i,j)" does not destroy the monotonicity of the
function values in the tail (prove this!) and "sort the tail" reduces to inverting
the order. (In doing so, our program "borrows" the variables i and j that
have done their job. Note that the way in which the tail is reflected works
equally well with an even number as with an odd number of values in the
tail.)
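The development so far condenses into a few lines of Python (a sketch with 0-based indexing instead of the text's 1-based one; it transforms the list c in place and reports whether a successor existed):

   def next_permutation(c):
       n = len(c)
       i = n - 2
       while i >= 0 and c[i] >= c[i + 1]:   # determine i: search in decreasing
           i -= 1                           #   order (Linear Search Theorem)
       if i < 0:
           return False                     # c was the last permutation
       j = n - 1
       while c[j] <= c[i]:                  # determine j: inspect c(j) values
           j -= 1                           #   in decreasing order of j
       c[i], c[j] = c[j], c[i]              # c:swap(i, j)
       c[i + 1:] = reversed(c[i + 1:])      # sort the tail: invert the order
       return True

Note that both guards include equality, exactly as discussed in Remark 1 below.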
Remark 1. Nowhere have we used the fact that the values c(1), c(2), ...,
c(n) were all different from each other. As a result one would expect that
this program would correctly transform the initial sequence into its immediate
alphabetic successor also if some values occurred more than once in the
sequence. It does indeed, thanks to the fact that, while determining i and j,
we have formed our guards by "mechanically" negating the required condi-
tion c(i) < c(i + 1) and c(j) > c(i) respectively. I once showed this program,
when visiting a university, to an audience that absolutely refused to accept
my guards with equality included. They insisted on writing, when you knew
that all values were different
   do c(i) > c(i + 1) --> ...
and
   do c(j) < c(i) --> ...
Their unshakable argument was "that it was much more expensive to test
for equality as well". I gave up, wondering by what kind of equipment on the
campus they had been brainwashed. (End of remark 1.)
machine code programming (even without index registers: in the good old
von Neumann tradition, programs had to modify their own instructions in
store!). And I also remember that, after a vain struggle of more than two
hours, I gave up! And that at a moment when I was already an experienced
programmer! A few years ago, needing an example for lecturing purposes,
I suddenly remembered that old problem and solved it without hesitation
(and could even explain it the next morning to a fairly inexperienced audience
within twenty minutes). That now one can explain within twenty minutes to
an inexperienced audience what twenty years before an experienced pro-
grammer could not find shows the dramatic improvement of the state of the
art (to the extent that it is now even hard to believe that then I could not
solve this problem!). (End of remark 4.)
THE PROBLEM OF
14 THE DUTCH NATIONAL FLAG
The constant N is a global constant from the context in which our pro-
gram is to be embedded as an inner block. Our program, however, has to
meet three special requirements:
1. It must be able to cope with all possible forms of "degeneration" as
presented by missing colours: the buckets may have been filled with
pebbles of two colours only, of one colour only, or of no colour at all
(if N = 0).
2. The mini-computer has a very small store compared with the values of
N it should be able to cope with, and therefore we are not allowed to
introduce arrays of any sort, only a fixed number of variables of the
types "integer" and/or variables of type "colour". (With variables of
type integer we mean here variables that cannot take on much more
than N different values.)
3. The program may direct the "eye" at most once upon each pebble (it
is assumed that the input operation is so time-consuming that looking
twice at the same pebble would lead to an unacceptable loss of time).
Furthermore, regarding programs of the same degree of complication,
the one that needs (on the average) the fewer swaps is to be preferred.
Although our pebbles are of only three different colours, the fact that
our eye can only inspect pebbles one at a time, together with requirement
(3), implies that halfway through the rearrangement process, we have to
distinguish between pebbles of four different categories, viz. "established
red" (ER), "established white" (EW), "established blue" (EB), and "as yet
uninspected" (X). Requirement (2) excludes that pebbles of these different
categories lie arbitrarily mixed: inside the mini-computer we then cannot
store "who is what". Our only way out is to divide the row of buckets into
a fixed number of (possibly empty) zones of consecutively numbered buckets,
each zone being reserved for pebbles of a specific category. Because four
different zones is the minimum, the introduction of just four zones seems
the first thing to try. But in what order? I found that many programmers
tend to decide without much thinking upon the order "ER", "EW", "EB",
"X", but this is a rash decision. As soon as anyone is of the opinion that it
is attractive to place the zone "ER" at the low end, considerations of sym-
metry should suggest that the zone "EB" at the high end is equally attractive.
Still sticking to our earlier decision of only four different zones, we come to
the conclusion that the zones "EW" and "X" should be in the middle in
some order (convince yourself that it is now immaterial in which order!),
for instance:
"ER", "X", "EW", "EB"
Once we have chosen the above "general situation", our problem is
essentially solved, for here we have a general situation of which both the
initial state (all buckets in zone "X") and the final state (zone "X" empty)
are special cases! We can establish it, and then a repetitive construct has to
decrease the size of zone "X" while maintaining this general situation. In
our mini-computer we need three integer variables for keeping track of the
place of the zone boundaries, e.g. "r", "w", and "b" with the meanings
1 <k< r: the kth bucket is in zone "ER"
(number of buckets r - I > 0)
the kth bucket is in zone "X"
(number of buckets w - r + 1 > 0)
w< k < b: the kth bucket is in zone "EW"
(number of buckets b - w > 0)
b <k<N: the kth bucket is in zone "EB"
(number of buckets N - b > 0)
Establishing the relation P to be kept invariant means initializing these
three variables in accordance with "all buckets in zone "X"", and the overall
structure of our program could be:

   r vir int, w vir int, b vir int := 1, N, N {P established};
   do w ≥ r --> "decrease the number of buckets in zone X under invariance of P" od
Immediately we are faced with the question: by which amount shall the
guarded statement decrease the number of buckets in zone "X"? There are
three arguments -and as the reader will notice, they are of a fairly general
nature- in favour of trying first whether we can come away with "decrease
the number of buckets in zone "X" by 1". The arguments are:
1. Decreasing by 1 is sufficient.
2. As we have chosen our guard "w ≥ r", we can guarantee the presence
of at least one bucket in zone "X"; for two, we would have needed the
guard "w > r".
3. The one pebble inspected will face us with three different cases, inspect-
ing two confronts us already with nine different cases; this multiplicative
building up of cases to be considered should be interpreted, in principle,
as a heavy price to pay for whatever we can gain by it.
The next question to be settled is: which one of the uninspected pebbles
will be looked at? This question is not necessarily irrelevant, because in the
meantime in the ordering "ER", "X", "EW", "EB" an asymmetric situation
Note. The program is robust in the sense that it will lead to abortion
when fed with erroneous data such as one of the pebbles being green.
(End of note.)
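A Python sketch of a solution of this shape (my transcription, under the assumption -supported by the refinement below- that the uninspected pebble at the high end of zone "X" is the one inspected; buckets are 1-based as in the text, with buck[0] unused):

   RED, WHITE, BLUE = 'red', 'white', 'blue'

   def dutch_flag(buck, N):
       # zones: ER = 1..r-1, X = r..w, EW = w+1..b, EB = b+1..N
       r, w, b = 1, N, N                  # initially all buckets in zone "X"
       while w >= r:
           colour = buck[w]               # inspect the high end of zone "X"
           if colour == RED:
               buck[r], buck[w] = buck[w], buck[r]; r += 1
           elif colour == WHITE:
               w -= 1
           elif colour == BLUE:
               buck[w], buck[b] = buck[b], buck[w]; w -= 1; b -= 1
           else:
               raise ValueError("erroneous data: not a flag colour")  # abortion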
In the case that all pebbles are red and no swaps are necessary, our pro-
gram will prescribe N swaps and as conscious programmers we should
investigate how complicated a possibly more refined solution becomes:
perhaps we have acted too cowardly in rejecting it. (As a general strategy
I would recommend not to try the more refined solution before having con-
structed the more straightforward one; that strategy gives us, besides a
working program, an inexpensive indication of what the considered refine-
ment as such has to compete with.) I have always thought the above solution
perfectly satisfactory, and up till now I have never considered a more com-
plicated one. So, here we go!
Inspecting just one can be extended to "inspecting one or two" or "inspect-
ing as many as we can conveniently place". In view of the case "all pebbles
red" something along the latter line seems indicated. Before inspecting the
uninspected pebble at the high end we could try to move the boundary
indicated by r to the high end as much as we possibly can without swapping,
because it seems a pity to replace a red pebble in a perfectly OK-position by
another red pebble. The outer repetition could then begin with
   do w ≥ r -->
      begin glovar buck, r, w, b; privar colr;
         colr vir colour := buck(r);
         do colr = red and r < w --> r := r + 1; colr := buck(r) od;
The inner repetition stops, either because all pebbles have been inspected
(r = w) or because we have hit a nonred pebble. The case r = w, where
colr may have one of three different values, reduces to the alternative clause
of the earlier program but for the fact that the "buck:swap(r, w)" -r and w
being equal to each other- can be omitted. The case r < w implies
colr ≠ red; now we must be willing to inspect another pebble, for otherwise
our solution reduces to the one which always inspects the pebble at the low
end of the zone "X" and of that solution we know that on the average it
generates more swaps than our first effort. Because r < w, there is indeed
another uninspected pebble and the one in the wth bucket is the obvious
candidate.
Again, with colr = white, it seems a pity to swap the pebble in the rth
bucket with a white one in the wth bucket and in case of r < w it seems
indicated to enter a new inner block
We have now for colr the two possibilities white or blue, for colw the
three possibilities red, white, or blue, and the set of uninspected pebbles
may be empty or not. For a moment I feared that I might have to distinguish
between about 12 cases! But after looking at it for a long time (and after
one false start), I discovered that the way to proceed now is first to place the
pebble now in the wth bucket and to see to it that in all three cases the pebble
originally in the rth bucket is left in the (new) wth bucket. Then the three
alternatives can merge and a single text deals uniformly with the second
pebble, the colour of which is still given by colr.
Remark. For pedagogical reasons I slightly regret that the final treat-
ment of "two inspected pebbles in wrong buckets" did not turn out to be
worse; perhaps I should have resisted the temptation to do even this messy
job as decently as I could. (End of remark.)
UPDATING
15 A SEQUENTIAL FILE

When the guarded commands had emerged and the word got around, a
graduate student that was occupying himself mainly with business-oriented
computer applications expressed his doubts as to whether our approaches
were of any value outside (what he called) the scientific/technical applications
area. To support his doubt he confronted us with what he regarded as a
typical business application, viz. the updating of a sequential file. (For a
more precise statement of his problem, see below.) He could show the flow-
chart of a program that was supposed to solve the problem, but that had
arrows going from one box to another in such a wild manner that that solu-
tion (if it was one, for we could never find out!) was considered a kludge by
both of us. With some pride he showed us a decision table -his decision
table- that, according to him, would solve the problem; but the size of that
transition table terrified us. As the gauntlet was thrown to us, the only thing
left for us to do was to pick it up. Our first two efforts, although cleaner
than anything we had seen, did not yet satisfy W.H.J. Feijen, whose solution
I shall describe in this chapter. It turned out that our first efforts had been
relatively unsuccessful because by the way in which the problem had been
presented to us, we had erroneously been led to believe that this special nut
was particularly hard to crack. Quod non. I include the treatment of the file
updating problem in this monograph for three reasons. Firstly, because it
gives us the opportunity to publish Feijen's neat solution to a common type
of problem for which, apparently, very messy solutions are hanging around.
Secondly, because it can be found by exactly the same argument that led to
the elegant solution of the problem of the Dutch national flag. Finally, it
gives us the opportunity to cast some doubts on the opinion that business
There is given a file, i.e. ordered sequence, of records or, more precisely,
of values of type "record". If x is (the value of) a variable of type record, a
boolean function x.norm and an integer function x.key are defined, such that
for some constant inf
   x.norm ≡ (x.key < inf)
   (non x.norm) ≡ (x.key = inf)
The given file of records is called "oldfile" and successive records in the
sequence have monotonically increasing values of their "key"; only for the
last record of oldfile "x.norm" is false.
Furthermore there is given a file, called "transfile", which is an ordered
sequence of transactions or, more precisely, of values of type "transaction".
If y is (the value of) a variable of type transaction, the boolean y.norm and
the integer y.key are defined, such that for the same constant inf as above
   y.norm ≡ (y.key < inf)
   (non y.norm) ≡ (y.key = inf)
Successive transactions of "transfile" have monotonically nondecreasing
values of their "key", only the last transaction is abnormal and has "y.key =
inf". If y.norm is true, three further booleans are defined, viz. "y.upd", "y.del"
and "y.ins", such that always exactly one of them is true.
Furthermore, with x of type record and y of type transaction, such that
y.norm is true, three operations modifying x are defined:
initialization with meaningful values. In the case x.key < y.key both xx
and xxnorm can be given meaningful initial values; in the case x.key > y.key,
however, only
xxnorm vir bool := false
would be really meaningful: our conventions would oblige us to initialize
xx then with some dummy value. We have circumvented this problem by
assuming the type record to include the abnormal value as well, thus postpon-
ing a discussion of the linguistic means that would be needed for the introduc-
tion of new types. (End of remark 1.)
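The program itself is not reproduced above, but the specification admits the following compact sketch in Python, one loop that processes a single "current key" per turn (the tuple encodings of records and transactions, and the sentinel INF, are conveniences of the sketch, which should not be mistaken for Feijen's own text):

   INF = float('inf')

   def update(oldfile, transfile):
       # oldfile: (key, data) pairs, strictly increasing keys, ending (INF, None)
       # transfile: (key, kind, data) triples, nondecreasing keys, kind in
       #            {'upd', 'del', 'ins'}, ending (INF, None, None)
       newfile, oi, ti = [], 0, 0
       while oldfile[oi][0] < INF or transfile[ti][0] < INF:
           key = min(oldfile[oi][0], transfile[ti][0])
           if oldfile[oi][0] == key:            # fetch the record, if any
               present, data = True, oldfile[oi][1]; oi += 1
           else:
               present, data = False, None
           while transfile[ti][0] == key:       # apply this key's transactions
               kind, tdata = transfile[ti][1], transfile[ti][2]; ti += 1
               if   kind == 'ins' and not present: present, data = True, tdata
               elif kind == 'upd' and present:     data = tdata
               elif kind == 'del' and present:     present = False
               # remaining combinations are erroneous transactions
           if present:
               newfile.append((key, data))
       return newfile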
MERGING PROBLEMS
16 REVISITED
We denote the empty set by "0", i.e. z = 0 if and only if z contains no
element at all.
We now consider the task of computing for fixed sets X and Y the value
Z, given by
Z = X + Y
(In the course of this discussion X and Y, and therefore Z, are regarded as
constants: Z is the desired final value of a variable to be introduced later.)
Our program is requested to do so by manipulating -i.e. inspecting, chang-
ing, etc.- sets element by element.
Before proceeding to think in more detail about the algorithm, we realize
that halfway through the computational process, some of the elements of Z
will have been found and some not, that is, there exists for the set Z a parti-
tioning
   Z = z1 ∔ z2
Here the symbol "∔" is a shorthand for
   Z = z1 + z2 and z1 * z2 = 0
(We may think of z1 as the set of elements whose membership of Z has been
definitely established, and of z2 as the set of Z's remaining elements.)
Similarly, halfway through the computational process, the sets X and Y
can be partitioned
   X = x1 ∔ x2 and Y = y1 ∔ y2
(Here we may think of the sets x1 and y1 as those elements of X and Y
respectively which do not need to be taken into account anymore, as they
have been successfully processed.)
These interpretations of the partitionings of Z, X, and Y are, however,
of later concern. We shall first prove, quite independent of what might be
happening during the execution of our program, a theorem about such
partitionings.
THEOREM.
If     Z = X + Y                                          (1)
       X = x1 ∔ x2                                        (2)
       Y = y1 ∔ y2                                        (3)
       z1 = x1 + y1                                       (4)
       z2 = x2 + y2,                                      (5)
then   Z = z1 ∔ z2  ⇔  (x1 * y2 = 0 and y1 * x2 = 0)      (6)
Proof. To show that the left-hand side of (6) implies its right-hand side,
we argue as follows:
   Z = z1 ∔ z2  ⇒  z1 * z2 = 0
and, because x1 * x2 = 0 and y1 * y2 = 0,
   0 = z1 * z2 = (x1 + y1)*(x2 + y2) = (x1 * y2) + (y1 * x2)
from which x1 * y2 = 0 and y1 * x2 = 0 follow. To show the implication in
the other direction, we derive from the right-hand side of (6):
   z1 * z2 = (x1 + y1)*(x2 + y2)
           = (x1 * x2) + (x1 * y2) + (y1 * x2) + (y1 * y2)
           = 0 + 0 + 0 + 0 = 0
   z1 + z2 = (x1 + y1) + (x2 + y2)
           = x1 + x2 + y1 + y2
           = X + Y = Z                    (End of proof)
If relations (1) through (5) hold, the right-hand side of (6), and, therefore,
   Z = z1 ∔ z2
is also implied by
   z1 * x2 = 0 and z1 * y2 = 0                            (7)
and from (1) through (5) and (7) it follows that if the partitioning of Z has
been chosen, the other two partitionings are uniquely defined.
Armed with the above knowledge, we return to our original problem, in
which our program should establish
R: z = Z
Not unlike our treatment of earlier problems, we introduce two variables
x and y and could try the invariant relation
   P: z + (x + y) = Z
which has the pleasant property that it is trivially satisfied by
   P0: z = 0 and x = X and y = Y
while, together with (x + y) = 0, it implies R.
Our theorem now suggests to identify x with x2, y with y2, and z with
z1 (the asymmetry reflecting that X and Y are known sets, while Z has to
be computed). After this identification, (2) through (5) define all sets in terms
of x and y. If we now synchronize the shrinking of x and y in such a way as
to keep the right-hand side of (6) or
   z * x = 0 and z * y = 0                                (7')
invariant as well, then we know that
   Z = X + Y = z1 ∔ z2
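For sets represented as increasingly ordered sequences, a Python sketch of this synchronized shrinking of x and y (my rendering; z absorbs the minimum of the two front elements each turn):

   def set_union(X, Y):
       # X, Y: increasingly ordered sequences; z grows while x = X[i:] and
       # y = Y[j:] shrink, keeping z + (x + y) = Z and z*x = 0 and z*y = 0
       z, i, j = [], 0, 0
       while i < len(X) or j < len(Y):
           if j == len(Y) or (i < len(X) and X[i] < Y[j]):
               z.append(X[i]); i += 1          # minimum lies in x only
           elif i == len(X) or Y[j] < X[i]:
               z.append(Y[j]); j += 1          # minimum lies in y only
           else:
               z.append(X[i]); i += 1; j += 1  # in both: shrink x and y together
       return z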
EXERCISES
1. Modify this program such that it will establish u = U as well, where U is given
by U = X * Y.
2. Make a similar program, such that it will establish z = Z, where Z is given by
Z = W + X + Y.
3. Make a similar program, such that it will establish z = Z, where Z is given by
Z = W + (X * Y).
4. Make a program establishing z = X + Y, but without assuming (nor intro-
ducing!) the value "inf" marking the high ends of the domains; empty sets may
be detected by ax.dom = 0 and ay.dom = 0 respectively. (End of exercises.)
At the expense of still some more formal machinery we could have played
our formal game in extenso.
Let, for any predicate P(z), the semantics of "z := x ∔ y" be given by
   wp("z:= x ∔ y", P(z)) = (x * y = 0 and P(x + y))
here the first term expresses that the intersection of x and y should be empty
for x ∔ y to be defined.
Let, for any predicate P(z), the semantics of "z := x ∸ y" be given by
   wp("z:= x ∸ y", P(z)) = (x * y = y cand P(x ∸ y))
where the first term expresses that y should be fully contained in x for x ∸ y
to be defined and x ∸ y then represents the unique solution of (x ∸ y) ∔ y = x.
Eliminating x1, x2, y1, y2, z1, and z2, we find that we have to maintain
in terms of x, y, and z the relations:
   P1: X * x = x
   P2: Y * y = y
   P3: z = (X ∸ x) + (Y ∸ y)
   P4: x * (Y ∸ y) = 0
   P5: y * (X ∸ x) = 0
and we can ask ourselves under what initial circumstances the execution of
S: z, x := z ∔ {e}, x ∸ {e}
will leave these relations invariant for some element e. We begin by investigat-
ing when this concurrent assignment is defined, i.e. when wp(S, T) holds.
Because
   (P1 and x * {e} = {e}) ⇒ (X ∸ x)*{e} = 0
   (P2 and P4 and x * {e} = {e}) ⇒ (Y ∸ y)*{e} = 0
we conclude
   Q ⇒ wp(S, T) with
   Q = (P1 and P2 and P3 and P4 and x * {e} = {e})
It is now not too difficult to establish
   Q ⇒ wp(S, P1 and P2 and P3 and P4)
However:
   wp(S, P5) = (wp(S, T) and y * (X ∸ (x ∸ {e})) = 0)
             = (wp(S, T) and y * ((X ∸ x) ∔ {e}) = 0)
             = (wp(S, T) and P5 and y * {e} = 0)
and consequently, the guard for S such that P1 through P5 remain invariant
is
   x * {e} = {e} and y * {e} = 0
i.e. e should be an element of x, but not of y, et cetera.
The above concludes my revisiting of merging problems. In the last two
chapters I have given treatments of different degrees of formality and which
one of them my reader will prefer will depend as much on his needs as on
his mood. But it seems instructive to go through the motions at least once!
(As the result in all probability shows, the writing of this chapter created
considerably more difficulties than anticipated. It is at least the fifth version;
that, in itself, already justifies its inclusion.)
AN EXERCISE ATTRIBUTED
17 TO R.W. HAMMING
The way the problem reached me was: "To generate in increasing order
the sequence 1, 2, 3, 4, 5, 6, 8, 9, 10, 12, ... of all numbers divisible by no
primes other than 2, 3, or 5." Another way of stating which values are in the
sequence is by means of three axioms:
Axiom 1. The value 1 is in the sequence.
Axiom 2. If x is in the sequence, so are 2 * x, 3 * x, and 5 * x.
Axiom 3. The sequence contains no other values than those that belong
to it on account of Axioms 1 and 2.
(We leave to the number theorists the task of establishing the equivalence of
the two above definitions.)
We include this exercise because its structure is quite typical for a large
class of problems. Being interested only in terminating programs, we shall
make a program generating only the, say, first 1000 values of the sequence.
Let
PO(n, q) mean: the value of "q" represents the ordered set of the first "n"
values of the sequence.
Under the assumption that we can extend a sequence with a value "xnext",
provided that the value "xnext" is known, the main problem of "increase n
by 1 under invariance of PO(n, q)" is how to determine the value "xnext".
Because the value 1 is already in q, xnext > 1, and xnext's membership of
the sequence must therefore rely on Axiom 2. Calling the maximum value
occurring in q "q.high", xnext is the minimum value > q.high, that is, of the
form 2 * x or 3 * x or 5 * x such that x occurs in the sequence. But because
2 * x, 3 * x, and 5 * x are all functions whose value is > x for x > 0, that
value of x must satisfy x < xnext; furthermore, x cannot satisfy x > q.high,
for then we would have
q.high < x < xnext
which would contradict that xnext is the minimum value > q.high. Therefore
we have x < q.high, i.e. x must already occur in q, and we can sharpen our
definition of xnext: xnext is the minimum value > q.high, that is of the form
2 * x or 3 * x or 5 * x, such that x occurs in q. (It is for the sake of the above
analysis that we have initialized PO(n, q) for n = 1; initialization for n = 0
would have been just as easy, but then q.high would not be defined.)
A straightforward implementation of the above analysis would lead to
the introduction of the set qq, where qq consists of all values xx > q.high,
such that xx can be written as
   xx = 2 * x, with x in q,
or as
   xx = 3 * x, with x in q,
or as
   xx = 5 * x, with x in q
The set qq is nonempty and xnext would be the minimum value occurring
in it. But upon closer inspection, this is not too attractive, because the adjust-
ment of qq would imply (in the notation of the previous chapter)
   qq := (qq ∸ {xnext}) + {2 * xnext, 3 * xnext, 5 * xnext}
where the "+" means "forming the union of two sets". Because we have to
determine the minimum value occurring in qq, it would be nice to have the
elements of q ordered; forming the union in the above adjustment would
then require an amount of reshuffling, which we would like to avoid.
A few moments of reflection, however, will suffice for the discovery that
we do not need to keep track of the whole set qq, but can select xnext as the
minimum value occurring in the much smaller set
qqq = {x2, x3, x5}, where x2 is the minimum value of the form 2 * x with
x in q such that x2 > q.high, and x3 and x5 are defined similarly for the
multipliers 3 and 5; let Pl(q, x2, x3, x5) mean that x2, x3, and x5 have these
values. The program then gets the structure:
   "establish PO(n, q) for n = 1";
   do n ≠ 1000 -->
"establish Pl(q, x2, x3, x5) for the current value of q";
"increase n by I under invariance of PO(n, q), i.e.
extend q with min(x2, x3, x5)"
od
A program along the above lines would be correct, but now "establish
Pl(q, x2, x3, x5) for the current value of q" would be the nasty operation,
even if -what we assume- the elements of the ordered set q are as accessible
as we desire. The answer to this is a standard one: instead of computing x2,
x3, and x5 as a function of q afresh when we need them, we realize that the
value of q only changes "slowly" and try to "adjust" the values, which are a
function of q, whenever q changes. This is such a standard technique that it
is good to have a name for it; let us call it "taking the relation outside (the
repetitive construct)". Its application is reflected in the program of the follow-
ing structure:
The re-establishment of Pl(q, x2, x3, x5) has to take place after extension
of q, i.e. after increase of q.high; as a result, the adjustment of x2, x3, and x5
is either the empty operation, or an increase, viz. a replacement by the corre-
sponding multiple of a higher x from q. Representing the ordered set q by
means of an array aq, i.e. as the values aq(1) through aq(n) in monotonically
increasing order, we introduce three indices i2, i3, and i5, and extend Pl with
... and x2 = 2 * aq(i2) and x3 = 3 * aq(i3) and x5 = 5 * aq(i5)
Our inner block, initializing the global array variable aq with the desired
final value could be:
begin virvar aq; privar i2, i3, i5, x2, x3, x5;
   aq vir int array := (1, 1); {PO established}
   i2 vir int, i3 vir int, i5 vir int := 1, 1, 1;
   x2 vir int, x3 vir int, x5 vir int := 2, 3, 5; {Pl established}
   do aq.dom ≠ 1000 -->
      if x3 ≥ x2 ≤ x5 --> aq:hiext(x2)
      □ x2 ≥ x3 ≤ x5 --> aq:hiext(x3)
      □ x2 ≥ x5 ≤ x3 --> aq:hiext(x5)
      fi {aq.dom has been increased by 1 under invariance of PO};
      do x2 ≤ aq.high --> i2 := i2 + 1; x2 := 2 * aq(i2) od;
      do x3 ≤ aq.high --> i3 := i3 + 1; x3 := 3 * aq(i3) od;
      do x5 ≤ aq.high --> i5 := i5 + 1; x5 := 5 * aq(i5) od
      {Pl has been re-established}
   od
end
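The same program, transcribed into Python (0-based list indices; aq.high appears as aq[-1]):

   def hamming(count=1000):
       aq = [1]                               # PO established for n = 1
       i2 = i3 = i5 = 0
       x2, x3, x5 = 2, 3, 5                   # Pl established
       while len(aq) != count:
           aq.append(min(x2, x3, x5))         # extend q with min(x2, x3, x5)
           while x2 <= aq[-1]: i2 += 1; x2 = 2 * aq[i2]
           while x3 <= aq[-1]: i3 += 1; x3 = 3 * aq[i3]
           while x5 <= aq[-1]: i5 += 1; x5 = 5 * aq[i5]
       return aq                              # Pl re-established each turn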
I prefer, however, not to do so, and not to combine the guarded com-
mands into a single set when the execution of one guarded statement list
cannot influence the truth of other guards from the set. The fact that the
three repetitive constructs, separated by semicolons, now appear in an
arbitrary order does not worry me: it is the usual form of over-specifica-
tion that we always encounter in sequential programs prescribing things
in succession that could take place concurrently. (End of note 2.)
Note that if nothing about the functions f, g, and h were given, the prob-
lem could not be solved!
EXERCISES
1. f(x,y)> x
2. (yl > y2) ~ (f(x, yl) > f(x, y2))
(End of exercises.)
The inventive reader who has done the above exercises successfully can
think of further variations himself.
THE PATTERN MATCHING
18 PROBLEM
The problem that is solved in this chapter is a very famous one and has
been tackled independently by many programmers. Yet we hope that our
treatment gives some pleasure to even those of my readers who considered
themselves thoroughly familiar with the problem and its various solutions.
We consider as given two sequences of values
p(0), p(1), ..., p(N - 1) with N ≥ 1
and
x(0), x(1), ..., x(M - 1) with M ≥ 0
(usually M is regarded as being many times larger than N). The question to
be answered is: how many times does the "pattern", as given by the first
sequence, occur in the second sequence?
Using
(Ni: 0 < i < m: B(i))
to denote "the number of different values of i in the range 0 < i < m for
which B(i) holds", a more precise description of the final relation R that is
to be established is
R: count = (N i: 0 ≤ i ≤ M - N: match(i))
where the function match(i) is given by
for 0 < i < M - N: match(i) = (Aj: 0 <j < N:p(j) = x(i + j))
for i < 0 or i > M - N: match(i) =false
(To define match(i) = false for those further values of i, thus making it a
total function, is a matter of convenience.)
If we take as invariant relation
   Pl: count = (N i: 0 ≤ i < r: match(i)) and r ≥ 0
we have one which is trivially established by "count, r := 0, 0" and, further-
more, is such that
   (Pl and r > M - N) ⇒ R
(The "matter of convenience" referred to above is that now the above inequal-
ity will do the job.) This gives a sketch for the program:
   count, r := 0, 0;
   do r ≤ M - N --> "increase r under invariance of Pl" od
and the reader is invited to work out for himself the refinement in which r
is always increased by 1; in the worst case, the time taken by the execution
of that program will be proportional to M * N.
Depending on the pattern, however, much larger increases of r seem
sometimes possible: if, for instance, the pattern is (1, 2, 3, 4, 5) and match(r)
has been found to hold, "count, r := count + 1, r + 5" would leave Pl invari-
ant! Considering the invariant relation
   P2: (A j: 0 ≤ j < k: p(j) = x(r + j)) and 0 ≤ k ≤ N
(which can be expected to play a role in the repetitive construct computing
match(r )), we can investigate what we can gain by taking that relation outside
the repetitive construct, i.e. we consider:
   count, r, k := 0, 0, 0;
   do r ≤ M - N --> "increase r under invariance of Pl and P2" od
(relation P2 being vacuously satisfied by k = 0).
In view of the validity of relation P2 and the formula for match(r), the
most natural thing to start the repeatable statement with is to try to determine
match(r); as the truth of match(r) can be concluded from P2 and k = N, we
prescribe that k be increased as long as is necessary and possible:
   do k ≠ N cand p(k) = x(r + k) --> k := k + 1 od          (1)
upon termination of which -and termination is guaranteed- we have
   P2 and (k = N cor p(k) ≠ x(r + k))
from which we can conclude that match(r) = (k = N). Thus it is known
whether increasing r by 1 should be accompanied by "count := count + 1"
or not. We would like to know by how much r can be increased without
further increase of count and without taking any further x-values into account.
(The taking into account of x-values is done in statement (1); to do so is its
specific purpose! Here we are willing to exploit only properties of the -
constant- pattern.)
If k = 0, we conclude (because N > 0) that match(r) = false; the relation
Pl then justifies an increase of r by 1 (leaving Pl invariant by leaving count
unchanged) but P2 does not justify any higher increase of r, and k = 0
(making P2 vacuously true) is maintained.
i.e. for k = k2, d(k1) is also a solution for i of (2), but not necessarily the
smallest! From that we conclude that d(k) is a monotonically nondecreasing
function of k. And the algorithm therefore investigates increasing values of
i, each time deciding whether for one or more k-values i = d(k) can be con-
cluded (should be established). More precisely, let j(i) for given value of i
be the maximum value ≤ N - i, such that
   (A j: 0 ≤ j < j(i): p(j) = p(i + j))
then d(k) = i for all k such that k - i ≤ j(i) (or k ≤ j(i) + i), for which no
solution d(k) < i exists. As the values of i will be tried in increasing order
and, upon identification as minimal solution, will be recorded in the mono-
tonically nondecreasing function d, the condition is
d.hib < k <j(i) +i
and we get the following program:
"initialize d":
begin glocon p, N; virvar d; privar i;
dvir int array, i vir int:=(/), O;
do d.hib =I= N ____.
begin glocon p, N; glovar d, i; privar j;
j vir int:= 0; i: = i + I;
do j < N - i cand p(j) = p(i + j) ____. j: = j + I od;
do d.hib < j + i ____. d:hiext(i) od
end
od
end
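The matching program that exploits d can be sketched in Python as follows; the initialization of d is a transcription of the program above, while the use of d[k] as the justified increase of r is my reading of its definition (0-based indices throughout):

   def count_matches(p, x):
       N, M = len(p), len(x)
       # d[k], 1 <= k <= N: the least shift i >= 1 after which the first k - i
       # of k matched pattern characters are guaranteed to match again
       d, i = [0], 0                           # d[0] is a dummy entry
       while len(d) - 1 != N:
           i += 1
           j = 0
           while j < N - i and p[j] == p[i + j]:
               j += 1
           while len(d) - 1 < j + i:
               d.append(i)
       count = r = k = 0
       while r <= M - N:
           while k != N and p[k] == x[r + k]:  # statement (1)
               k += 1
           if k == N:
               count += 1                      # match(r) holds
           if k == 0:
               r += 1                          # Pl justifies an increase by 1
           else:
               r, k = r + d[k], k - d[k]       # P2 is maintained
       return count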
EXERCISES
Remark (Small Superset). Suppose that we had not been asked to count the number
of matches, but to generate the sequence of r-values for which match(r) holds.
When a program has to generate the members of a set A, there are
(roughly) only two situations. Either we have a simple, straightforward "suc-
cessor function" by means of which a next member of A can be generated
-and then the whole set can be trivially generated by means of repeated
application of that successor function- or we do not have a function like
that. In the latter case, the usual technique is to generate the members of a
set B instead, where B is a superset of A whose members can be generated
straightforwardly, and to reject each generated member of B that turns out
not to belong to A.
The trained problem solver, aware of the above, will consciously look
for a smaller set B than the obvious one. In this example, the set of all r-values
satisfying 0 < r < M - N is the obvious one. Note that in the previous
chapter "An Exercise Attributed to R. W. Hamming" the replacement of
the set "qq" by the much smaller set "qqq" was another application of the
principle of the Search for the Small Superset. And besides "taking a relation
outside the repetitive construct" this illustrates the second strategical similar-
ity between the solutions presented in the current and in the previous chapter.
(End of remark.)
WRITING A NUMBER
19 AS THE SUM OF TWO SQUARES
Suppose we are requested to design a program that will generate for any
given r > 0 all the essentially different ways in which r can be written as the
sum of two squares, more precisely, it has to generate all pairs (x, y), such
that
   x² + y² = r and x ≥ y ≥ 0                               (1)
The answer will be delivered in two array variables xv and yv, such that
for i from xv.lob (= yv.lob) through xv.hib (= yv.hib) the pairs (xv(i), yv(i))
will enumerate all solutions of (1). The standard way of ensuring that our
sequential algorithm will find all solutions to (1) is to order the solutions of
(1) in some way, and I propose to order the solutions of (1) in the order of
increasing value of x (no two different solutions having the same x-value, this
ordering is unique). We propose to keep the following relation invariant
Pl: xv(i) will be a monotonically increasing function with the same domain
    as the monotonically decreasing function yv(i), such that the pairs
    (xv(i), yv(i)) are all solutions of (1) with xv(i) < x
From this program we conclude that the invariant relation is really the
stronger relation
   Pl': Pl and 2 * x² ≥ r
It is too much to hope to determine for each value of x the value y, such
that x² + y² = r, for such a value need not exist. What we can do is establish
   x² + y² ≤ r and x² + (y + 1)² > r
From that relation we can conclude not only that if x² + y² = r, a solution
of (1) has been found, but also that if x² + y² < r, for that value of x no
value y exists that would complete the pair. Taking the relation
P2: x2 + (y + 1)2 > r
as invariant relation for an inner repetitive construct, we can program
Observing, however, that the last alternative construct will not destroy the
validity of P2, we can improve the efficiency of this program considerably by
taking the relation P2 outside the outer repetitive construct:
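A Python sketch of the resulting program (the transcription is mine, not the text's; the two array variables xv and yv are collected into one list of pairs):

   def two_squares(r):
       pairs = []                      # the solutions of (1), increasing in x
       x = 0
       while 2 * x * x < r:            # establish Pl': 2 * x*x >= r
           x += 1
       y = x                           # x >= y and x*x + y*y >= r now hold
       while y >= 0:
           s = x * x + y * y
           if s == r:
               pairs.append((x, y))    # a solution of (1) has been found
               x += 1; y -= 1
           elif s < r:
               x += 1                  # P2 holds: no y completes this x
           else:
               y -= 1                  # restore x*x + y*y <= r
       return pairs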
THE PROBLEM OF THE SMALLEST
20 PRIME FACTOR OF A LARGE NUMBER

In this chapter we shall tackle the problem of finding the smallest prime
factor of a large number N > 1 (by "large" I mean here a number of the
order of magnitude of, say, 10^16), under the assumption that the program is
intended for a small machine whose additive operations and comparisons are
assumed to be very fast compared with arbitrary multiplications and divi-
sions. (Nowadays, these assumptions are realistic for most so-called "mini-
computers"; the algorithm to be described was developed years ago for what,
in spite of its physical size, would now be called "a micro-computer".)
A straightforward application of the Linear Search Theorem tells us that,
when looking for the smallest prime factor of N, we should investigate prime
numbers as possible factors in increasing order of magnitude. Because a
composite number has at least one prime factor not exceeding its square root,
the investigation need not go beyond the square root; if then still no factor
has been found, the number N must be prime. An algorithm of the following
structure would do the job:
The main trouble with this algorithm is that we have only assumed that
additive operations and comparisons would be fast, but have allowed the
computations of N mod f and of (f + 1)² to be so slow as to be avoided in the
inner cycle, if possible.
The only way out seems to find some way of applying the technique of
"taking a relation outside the repetitive construct'', i.e. seeking to store and
maintain such information that after the computation of r = N mod/, the
computation of the next value of r (for f + 1) can profit from it. What can
we store?
We can start with the observation that r = N mod f is the solution of the
equation
   N = f * q + r and 0 ≤ r < f,
and we could store q as well. Then we know that
   N = (f + 1) * q + (r - q)
and, in general, we can expect to have "gained" in the sense that (r - q) will
be closer to zero than the original N and, therefore, "easier" to reduce modulo
(f + 1). But, particularly for smaller values of f (and r) and -therefore-
larger values of q, we cannot expect to have gained very much.
   q_{-1} = N
   for 0 ≤ i ≤ n:  q_{i-1} = (f + i)*q_i + r_i,  with  q_n = 0.
Eliminating the q's we get:
   N = r_0 +
       f*r_1 +
       f*(f + 1)*r_2 +
       f*(f + 1)*(f + 2)*r_3 + ...                         (1)
but for the fact that it would not do the job as far as the inequalities
0 < r1 <f + i
are concerned: r's could become negative. But this is easily remedied, because
(1) shows that an increase r_0 := r_0 + f can be compensated by a decrease
r_1 := r_1 - 1. In general: r_i := r_i + (f + i) is compensated by r_{i+1} := r_{i+1} - 1.
As a result, the complete transformation is correctly described by
   f := f + 1; i := 0;
   do i < n -->
      r_i := r_i - (i + 1)*r_{i+1};
      do r_i < 0 -->
         r_i := r_i + (f + i); r_{i+1} := r_{i+1} - 1
      od;
      i := i + 1
   od;
   do r_n = 0 --> n := n - 1 od
Under the assumption that multiplication by small integers presents no
serious problems -that could be done by repeated addition- the computa-
tion of successive values of N mod f has been reduced to the repertoire of
admissible operations. Furthermore, the test (f² ≤ N in our earliest version,
(f + 1)² ≤ N in our next version) whether it is still worthwhile to proceed or
that the square root has been reached, can be replaced by n > 1, for (n ≤ 1)
⇒ (N < (f + 1)²).
With ar(k) = r_k for 0 ≤ k ≤ ar.hib, we arrive at the following program:
begin glocon N; virvar p; privar f, ar;
   begin glocon N; virvar ar; privar x, y;
      ar vir int array := (0); x vir int, y vir int := N, 2;
      do x ≠ 0 --> ar:hiext(x mod y); x, y := x div y, y + 1 od
   end {ar has been initialized};
   f vir int := 2 {relation (1) has been established};
   do ar(0) ≠ 0 and ar.hib > 1 -->
      begin glovar f, ar; privar i;
         f := f + 1; i vir int := 0;
         do i ≠ ar.hib -->
            begin glocon f; glovar ar, i; pricon j;
               j vir int := i + 1; ar:(i) = ar(i) - j * ar(j);
               do ar(i) < 0 --> ar:(i) = ar(i) + f + i;
                                ar:(j) = ar(j) - 1
               od;
               i := j
            end
         od
      end;
      do ar.high = 0 --> ar:hirem od
   od;
   if ar(0) = 0 --> p vir int := f
   □ ar(0) ≠ 0 --> p vir int := N
   fi
end
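The same computation, transcribed into Python (the array variable ar becomes a list; note that the loops use only additions, subtractions, comparisons, and multiplications by small integers):

   def smallest_prime_factor(N):
       # digits of N in the mixed radix of relation (1):
       # N = ar[0] + f*ar[1] + f*(f+1)*ar[2] + ...  with f = 2 initially
       ar, x, y = [], N, 2
       while x != 0:
           ar.append(x % y); x, y = x // y, y + 1
       f = 2
       while ar[0] != 0 and len(ar) - 1 > 1:   # ar.hib > 1
           f += 1
           for i in range(len(ar) - 1):
               ar[i] -= (i + 1) * ar[i + 1]    # j = i + 1
               while ar[i] < 0:                # bring r_i within range again
                   ar[i] += f + i; ar[i + 1] -= 1
           while ar[-1] == 0:                  # do ar.high = 0 --> ar:hirem od
               ar.pop()
       return f if ar[0] == 0 else N           # otherwise N itself is prime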
Remark 1. One might think that the algorithm could be speeded up by
a factor 2 by separately dealing with even values of N and with odd values of
N. For the latter we then could restrict the sequence of f-values to 3, 5, 7, 9,
11, 13, 15, ... ; the analogy of
   r_i := r_i - (i + 1)*r_{i+1}
then becomes
   r_i := r_i - 2*(i + 1)*r_{i+1}
and this more violent disturbance will cause the next loop, which has to
bring r_i within range again, to be repeated on the average about twice as
many times. The change is hardly an improvement and has mainly the effect
of messing up the formulae. (End of remark 1.)
THE PROBLEM OF THE
21 MOST ISOLATED VILLAGES

   for i ≠ j:  0 < f(i, j) < M
   for i = j:  f(i, j) = M
(Here f(i, j) can be interpreted as the distance from i to j; the rule f(i, i) =
M has been added for the purpose of the above simplification.)
We are requested to determine the set of maximally isolated villages,
i.e. the set of all values of k such that the isolation degree of village k -the
minimum of f(k, j) taken over all j ≠ k- is maximal.
Note that eventually all values 1 ≤ miv.dom ≤ n are possible.
A very simple and straightforward program computes the n isolation
degrees in succession and keeps track of their maximum value found thus
far. On account of the bounds for f(i, j) we can take as the minimum of an
empty set the value M and as the maximum of an empty set the value 0.
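The straightforward program, sketched in Python (f is given as a function and the villages are numbered from 0 through n - 1, both conveniences of the sketch):

   def most_isolated(f, n, M):
       maxdeg, miv = 0, []             # maximum of an empty set: 0
       for k in range(n):
           deg = M                     # minimum of an empty set: M
           for j in range(n):
               if j != k:
                   deg = min(deg, f(k, j))
           if deg > maxdeg:
               maxdeg, miv = deg, [k]  # a new maximum: restart the set
           elif deg == maxdeg:
               miv.append(k)
       return miv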
This can be catered for by introducing an array, b say, such that for k satisfy-
ing i < k < n:
   for i = 0:  b(k) = M
   for i > 0:  b(k) = minimum f(k, h)
                      0 ≤ h < i
(In words: b(k) is the minimum distance connecting village k that has been
computed thus far.)
The result of Optimization 2 is also fairly straightforward.
The innermost loop can now terminate with j < n; the values b(k) with
j ≤ k < n for which updating is still of possible interest are now the ones
with b(k) > max, the other ones are already small enough. The following
insertion will do the job:
do j =I= n-->
if b(j) <max--> j:= j + I
a b(j) >max-->
begin glocon i; glovar j, b; privar ff;
ff vir int := f (i, j);
do ff < b(j) --> b :(j) =ff od;
j:=j +I
end
fi
od
The best place for this insertion is immediately preceding "i := i + 1", but
after the adjustment of max; the higher max, the larger the probability that
a b(k) does not need any more adjustments.
The two optimizations that we have combined are of a vastly different
nature. Optimization 2 is just "avoiding redoing work known to have been
done", and its effectiveness is known a priori. Optimization 1, however, is
a strategy whose effectiveness depends on the unknown values of f: it is just
one of the many possible strategies in the same vein.
We are looking for those rows of the distance matrix whose minimum
element value S exceeds the minimum elements of the remaining rows and
the idea of Optimization 1 is that for that purpose we do not need to compute
for the remaining rows the actual minimum if we can find for each row an
upper bound B_i for its minimum, such that B_i < S. In an intermediate stage
of the computation, for some row(s) the minimum S is known because all
its/their elements have been computed; for other rows we only know an
upper bound Bi. And now the strategic freedom is quite clear: do we first
compute the smallest number of additional matrix elements still needed to
determine a new minimum, in the hope that it will be larger than the minimum
we had and, therefore, may exceed a few more B's? Or do we first compute
unknown elements in rows with a high B in the hope of cheaply decreasing
that upper bound? Or any mixture?
My original version combining the two strategies postponed the "updat-
ing of the remaining b(k)" somewhat longer, in the hope that in the meantime
max would have grown still further, but whether it was a more efficient
program than the one published in this chapter is subject to doubt. It was
certainly more complicated, needing yet another array for storing a sequence
of village numbers. The published version was only discovered when writing
this chapter.
In retrospect I consider my ingenuity spent on my original program as
wasted: if it was "more efficient" it could only be so "on the average". But
on what average? Such an average is only defined provided that we postulate
-quite arbitrarily!- a probability distribution for the distance matrix
f(i,j). On the other hand it was not my intention to tailor the algorithm
to a specific subclass of distance matrices!
The moral of the story is that, in making a general program, we should
hesitate to yield to the temptation to incorporate the smart strategy that
would improve the performance in cases that might never occur, if such
incorporation complicates the program notably: simplicity of the program
is a less ambiguous target. (The problem is that we are often so proud of our
smart strategies that it hurts to abandon them.)
Remark. Our final program combines two ideas and we have found it
by first considering-as "stepping stones", so to speak- two programs, each
incorporating one of them, but not the other. In many instances I found
such stepping stones most helpful. (End of remark.)
THE PROBLEM
OF THE SHORTEST
22 SUBSPANNING TREE
Any two of these properties imply that the graph is a tree and, therefore,
also enjoys the third property.)
But now we have the framework for an algorithm, provided that we can
find an initial subtree to colour red. Once we have that, we can select the
shortest violet branch, colour it and its blue endpoint red, etc., letting the
red tree grow until there are no more blue points. To start the process, it
suffices to colour an arbitrary point red:
As it stands, the main task will be: "select the shortest now violet branch",
because the number of violet branches may be quite large, viz. k *(N - k),
where k = number of red points. If "select the shortest now violet branch"
were executed as an isolated operation, it would require on the average a
number of comparisons proportional to N² and the amount of work to be
done by the algorithm as a whole would grow as N³. Observing, however,
that the operation "select the shortest now violet branch" does not occur in
isolation, but as component of a repetitive construct, we should ask ourselves
whether we can apply the technique of "taking a relation outside the repeti-
tive construct", i.e. whether we can arrange matters in such a way that
subsequent executions of "select the shortest now violet branch" may profit
from the preceding one. There is considerable hope that this may be possible,
because one set of violet branches is closely related to the next: the set of
violet branches is defined by the way in which the points have been parti-
tioned in red ones and blue ones, and this partitioning is each time only
changed by painting one blue point red.
Hoping for a drastic reduction in searching time when selecting the
shortest branch from a set means hoping to reduce the size of that set; in
other words, what we are looking for is a subset of the violet branches -call
it the "ultraviolet" ones- that will contain the shortest one and can be used
to transmit helpful information from one selection to the next. We are envis-
aging a program of the structure:
1. The set of ultraviolet branches always contains the shortest violet
   branch.
2. The set of ultraviolet branches is, on the average, much smaller than
the set of violet ones.
3. The operation "adjust the set of ultraviolet branches" is relatively cheap.
(We require the first property because then our new algorithm is correct
as well; we require the second and the third properties because we would
like our new algorithm to be more efficient than the old one.)
Can we find such a definition of the notion "ultraviolet"? Well, for lack
of further knowledge, I can only suggest that we try. Considering that the
set of violet branches leading from k red points to N - k blue points, has
k *(N - k) members, and observing our first criterion, two obvious possible
subsets immediately present themselves:
1. Make for each red point the shortest violet branch ending in it ultravio-
   let; the set of ultraviolet branches has then k members.
2. Make for each blue point the shortest violet branch ending in it ultravio-
   let; the set of ultraviolet branches has then N - k members.
Our aim is to keep the ultraviolet subset small, but we won't get a clue
from their sizes: for the first choice the size will run from 1 through N - 1,
for the second choice it will be the other way round. So, if there is any chance
of deciding, we must find it in the price of the operation "adjust the set of
ultraviolet branches".
Without trying different adjustments, however, there is one observation
that suggests a strong preference for the second choice. In the first choice,
different ultraviolet branches may lead to the same blue point and then we
know a priori that at most one of them will be coloured red; with the second
choice each blue point is connected in only one way to the red tree, i.e. red
and ultraviolet branches form all the time a subspanning tree between the
N points. Let us therefore explore the consequences of the second definition
for our notion "ultraviolet".
Consider the stage in which we had a red subtree R and in which from
the set of corresponding ultraviolet branches (according to the second
definition; I shall no longer repeat that qualification) the shortest one and
its originally blue endpoint P have been coloured red. The number of ultra-
violet branches has been decreased by 1 as it should be. But, are the remain-
ing ones the correct ones? They represent for each blue point the shortest
possible connection to the originally red tree R, they should represent the
shortest possible connection to the new red tree R + P. But this question is
settled by means of one simple comparison for each blue point B: if the
branch BP is shorter than the ultraviolet branch connecting B to R, the latter
is to be replaced by BP, otherwise it is maintained as, apparently, the growth
of the red tree did not result in a shorter way of connecting B with it. As a
EXERCISE
Convince yourself that the rejected alternative for the concept "ultraviolet" is not
so helpful. (End of exercise.)
means of the convention that the hth ultraviolet branch connects the red
point "from(h)" with the blue point "to(h)".
In the following program, point N is chosen as the arbitrary point that
is initially coloured red.
Note. With respect to the array "from" and the scalar variable "suv"
one could argue that we have not been able to avoid meaningless initial-
izations; they refer, however, to a virtual point 0 at a distance "inf" from
all the others and to an equally virtual 0th ultraviolet branch. (End
of note.)
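In Python the algorithm is indeed little more than a loop inside a loop (a sketch: dist is given as a function, the points are numbered 1 through N, and the from/to bookkeeping of the text is kept in a dictionary):

   def shortest_subspanning_tree(dist, N):
       # uv[b] = (length, r): the ultraviolet branch of blue point b, i.e.
       # its shortest known connection to a red point r
       uv = {b: (dist(N, b), N) for b in range(1, N)}  # point N coloured red
       tree = []
       while uv:                                        # blue points remain
           p = min(uv, key=lambda b: uv[b][0])          # shortest ultraviolet
           length, frm = uv.pop(p)                      #   branch: colour it
           tree.append((frm, p, length))                #   and its point p red
           for b in uv:                                 # adjust: p may connect
               if dist(p, b) < uv[b][0]:                #   a blue point more
                   uv[b] = (dist(p, b), p)              #   shortly than before
       return tree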
In spite of the simplicity of the final program -it is not much more than
a loop inside a loop- the algorithm it embodies is generally not regarded as
a trivial one. It is, as a matter of fact, well-known for being highly efficient,
both with respect to its storage utilization and with respect to the number of
comparisons of branch lengths that are performed. It may, therefore, be
rewarding to review the major steps that led to its ultimate discovery.
The first crucial choice has been to try to restrict ourselves to such inter-
mediate states that the red branches, i.e. the ones known to belong to the
final answer, always form a tree by themselves. The clerical gain is that then
the number of red points exceeds the number of red branches by exactly one
and that we are allowed to conclude that a branch leading from one red
point to another is never a candidate for being coloured red: it would erro-
neously close a cycle. (And now we see an alternative algorithm: sort all
branches in the order of increasing length and process them in that order,
where processing means if the branch, together with the red branches, forms
a cycle, reject it, otherwise colour it red. Obviously, this algorithm establishes
the membership of the final answer for the red branches in the order of
increasing length. The algorithm is less attractive because, firstly, we have
to sort all the branches and, secondly, the determination of whether a new
branch will close a cycle is not too attractive either. That problem will be
the subject of a next chapter.) The moral of the story is that the effort to
reach the final goal via "simple" intermediate states is usually worth trying!
A second crucial step was the formulation of the conjecture that the
shortest violet branch could be painted red as well. Again, that conjecture
has not been pulled out of a magic hat; if we wish to "grow" a red tree, a
violet branch is what we should be looking for, and the fact that then the
shortest one is a likely candidate is hardly surprising.
The decision not to be content with an N³-algorithm -a decision which
led to the notion "ultraviolet"- is sometimes felt to be the most unexpected
one. People argue: "But suppose that it had not entered my head to investi-
gate whether I could find a better algorithm?" Well, that decision came at a
moment that we had an algorithm, and the mathematical analysis of the
original problem essentially had been done. It was only an optimization for
which, for instance, no further knowledge of graph theory was anymore
required! Besides that, it was an optimization that followed a well-known
pattern: taking a relation outside the loop. The moral of the story is that
once one has an algorithm, one should not be content with it too soon, but
investigate whether it can still be massaged. When one has made such recon-
siderations a habit, it is unlikely that the notion of "ultraviolet" would in
this example have escaped one's attention.
Note. It follows from the algorithm that the shortest subspanning tree
is unique if no two different branches have equal lengths. Verify that if
there is more than one shortest subspanning tree, our algorithm may
construct any of them. (End of note.)
A very different algorithm places the branches in arbitrary order, but,
whenever after placement of a branch, a cycle is formed, the (or a) longest
branch of that cycle is removed before the next branch is placed.
REM'S ALGORITHM
FOR THE RECORDING
23 OF EQUIVALENCE CLASSES
In a general graph (which need not be a tree) the points are usually called
"vertices" and the connections are usually called "edges" rather than
branches. A graph is called "connected" if and only if it contains only one
vertex or its edges provide at least one path between any two different vertices
from it. Because many of the possible edges of a graph with N vertices may
be missing, a graph need not be connected. But each graph, connected or
not, can always be partitioned uniquely into connected subgraphs, i.e. the
vertices of the graph can be partitioned in subsets, such that any pair from
the same subset is connected, while the edges provide no path between any
two vertices taken from two different subsets. (For the mathematicians:
"being connected" is a reflexive, symmetric, and transitive relation, which,
therefore, generates equivalence classes.)
We consider N vertices, numbered from 0 through N - 1, where N is
supposed to be large (10,000, say), and a sequence of graphs G_0, G_1, G_2, ...
which result from connecting these vertices via the edges of the sets E_0,
E_1, E_2, ..., where E_0 is empty and E_{i+1} = E_i + {e_i} and e_0, e_1, e_2, ... is a
given sequence of edges. The edges e_0, e_1, e_2, ... have to be processed in that
order and when n of them have been processed (i.e. the last edge processed,
if any, is e_{n-1}) we must be able to determine for any pair of vertices whether
they are connected in G_n or not. The main problem to be solved in this
chapter is: "How do we store the relevant information as derived from the
edge sequence e_0, e_1, ..., e_{n-1}?"
We could, of course, store the edge sequence "e_0, e_1, ..., e_{n-1}" itself, but
that is not a very practical solution. For, firstly, it stores a lot of irrelevant
information; e.g. if the edges {7, 8} and {12, 7} have been processed, a new
edge {12, 8} does not tell us anything new! And, secondly, the answer to the
   f^0(p) = p,
and for i > 0
   f^i(p) = f(f^{i-1}(p))
f(qs) = ps should hold. Processing the edge {p, q} can therefore be done by
the following inner block:
Although correct, the above processing of a new edge is not too attractive,
as its worst case performance can become very bad: the cycles may have to
be repeated very many times. It seems advisable to "clean up the tree"; for
both vertex p and vertex q we now know the current identifying vertex and it
seems a pity to trace those paths possibly over and over again. To remedy
this situation, we offer the following inner block (we have also incorporated
an effort to reduce the number of f-evaluations)
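(Again the block itself is garbled in this copy. In Python the cleaned-up processing could look as follows -a sketch under the same assumptions as before, now redirecting every vertex met on either path directly to the identifying vertex, so that later traversals of those paths cost a single f-evaluation.)

# Sketch: find the identifying vertex, then clean up both paths.
ps = p
while f[ps] != ps:
    ps = f[ps]
qs = q
while f[qs] != qs:
    qs = f[qs]
f[qs] = ps                   # as before
for v in (p, q):             # second scan: "clean up the tree"
    while f[v] != ps:
        nxt = f[v]
        f[v] = ps            # redirect v straight to the identifying vertex
        v = nxt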
The above algorithm has not been included for its beauty. As a matter
of fact, I would not be amazed if it left the majority of my readers dissatisfied.
It is, for instance, annoying that it is not a trivial task to estimate how much
we have gained by going from the first to the second version. It has been
included for two other reasons.
Firstly, it gave me in a nicely compact setting the opportunity to discuss
alternative ways for representing information and to illustrate the various
economic considerations. Secondly, it is worthwhile to point out that we
allow as intermediate state a nonunique representation of the (unique) cur-
rent partitioning and allow the further destruction of irrelevant information
to be postponed until a more convenient moment arrives. I have encountered
such use of nonunique representations as the basis for a few, otherwise very
surprising, inventions.
When M. Rem read the above text, he became very pensive -my solution,
indeed, left him very dissatisfied- and shortly afterwards he showed me
another solution. Rem's algorithm is in a few respects such a beauty that
I could not resist the temptation to include it as well. (The following reason-
ing about it is the result of a joint effort, in which W.H.J. Feijen participated
as well.)
In the previous solution it is not good form that, starting at p, the path
to the root of that tree is traced twice, once to determine ps, and then, with
the knowledge of ps, a second time in order to clean it up. Furthermore it
destroys the symmetry between p and q.
The two scans of the path from p were necessary because we wanted to
perform a complete cleaning up of it. Without knowing the number of the
identifying vertex we could, however, at least do a partial cleaning up in
a single scan, if we knew a direction towards "a cleaner tree". It is therefore
suggested to exploit the ordering relation between the vertex numbers and to
choose for each subset as identifying vertex number, say, the minimum value;
then, the smaller the f-values, the cleaner the tree. Because initially f(k) = k
for all k and our target has now become to decrease f-values, we should
restrict ourselves to representations of the partitioning satisfying

f(k) ≤ k for 0 ≤ k < N.
This restriction has the further advantage that it is now obvious that the
only cycles present are the stationary points for which f(k) = k (i.e. the
identifying vertices).
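In Python terms (a sketch, with f as a list) the identifying vertex of the subset containing k is then found by chasing f down to its stationary point:

def identifying_vertex(f, k):
    # f[k] <= k everywhere; each subset's minimum is its stationary point
    while f[k] != k:
        k = f[k]
    return k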
For the purpose of a more precise treatment we introduce the following
notation: a function "part" and a binary operator "$".

part(f) denotes the partitioning represented by f
part(f)$(p, q) denotes the partitioning resulting when in part(f) the
subset containing p and the subset containing q are
combined into a single one. If and only if in part(f) the
vertices p and q are already in the same subset, we have
part(f) = part(f)$(p, q)
Denoting the initial value of f by finit, the processing of an edge (p, q)
can be described as establishing the relation

R: part(f) = part(finit)$(p, q)

This is done (in the usual fashion!) by introducing two local variables, p0
and q0 say, (or, if you prefer, a local edge) satisfying the relation

P: part(f)$(p0, q0) = part(finit)$(p, q)

which is trivially established by the initialization

p0 vir int, q0 vir int := p, q

After the establishment of P the algorithm should massage f, p0, and q0
under invariance of P until we can conclude that

Q: part(f) = part(f)$(p0, q0)

holds, as (Q and P) => R.
Relation R has in general many solutions for f, but the cleaner the one
we can get, the better; it is therefore proposed to look for a massaging process
such that each step decreases at least one f-value. Then termination is
ensured (as monotonically decreasing variant function we can take the sum
of the N f-values) and we must try to find enough steps such that BB, the
disjunction of the guards, is weak enough so that (non BB) => Q.
We can change the value of the function f in point p0, say, by

f:(p0)= something

but in order to ensure an effective decrease of the variant function, that
"something" must be smaller than the original value of f(p0), i.e. smaller
than p1 if we introduce

P1: p1 = f(p0) and q1 = f(q0)

(The second term has been introduced for reasons of symmetry.)
Because part(f)$(p0, q0) has to remain constant, obvious candidates for
the "something" are q0 and q1; but because q1 ≤ q0, the choice q1 will in
general be more effective, and we are led to consider

q1 < p1 --> f:(p0)= q1
where the guard is fully caused by the requirement of effective decrease of
the variant function. The next question is whether after this change off
we can readjust (p0, q0) so as to restore the possibly destroyed relation P.
The connection (from p0) to p1 being removed, a safe readjustment has to
reestablish the (possibly) destroyed connection with p1. After the change of
f caused by f:(p0)= q1, we know that

part(f)$(p1, x) = part(finit)$(p, q)

for x equal to p0, q0, or q1. The relation P is most readily re-established
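(The remainder of the derivation and the program text itself are garbled in this copy. The following Python sketch is my reconstruction of Rem's algorithm along the lines derived above, with the names p0, q0, p1, q1 of the text; after the change of f, P is re-established by shifting p0 to p1.)

def process_edge(f, p, q):
    # Rem's algorithm for one edge (p, q); f[k] <= k throughout
    p0, q0 = p, q                # establishes P
    p1, q1 = f[p0], f[q0]        # establishes P1
    while p1 != q1:
        if q1 < p1:              # this step decreases f(p0)
            f[p0] = q1           # f:(p0)= q1
            p0, p1 = p1, f[p1]   # re-establish P and P1
        else:                    # p1 < q1: the symmetric step
            f[q0] = p1
            q0, q1 = q1, f[q1]
    # now p1 = q1, hence f(p0) = f(q0): relation Q holds

Note that each iteration decreases at least one f-value, so that the sum of the N f-values indeed serves as variant function.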
The repetitive construct has been constructed in such a way that termination
is ensured: upon completion we can conclude p1 = q1, which on account of
P1 implies f(p0) = f(q0), from which Q follows! Q.E.D.
Note. For the controlled derivation of this program -even for an a
posteriori correctness proof- the introduction of "part" and "$", or a
similarly powerful notation, seems fairly essential. Those readers who
doubt this assertion are invited to try for themselves a proof in terms of
the N values of f(k), rather than in terms of a nicely captured property of
the function f as a whole, as we have done. (End of note.)
Advice. Those readers who have not fully grasped the compelling beauty
of Rem's algorithm should reread this chapter very carefully. (End of
advice.)
24 THE PROBLEM OF THE CONVEX HULL IN THREE DIMENSIONS
In order to forestall the criticism that I only show examples that admit
a nice, convincing solution, and, in a sense, do not lead to a difficult program,
we shall now tackle a problem which I am pretty sure will turn out to be
much harder. (For my reader's information: while starting to write this
chapter, I have never seen a program solving the problem we are going to
deal with.)
Given a number of different points on a straight line -by their coordi-
nates, say- we can ask to select those points P, such that all other points lie
at the same side of P. This problem is simple: scanning the coordinates once
we determine their minimum and their maximum value.
Given, by means of their x-y-coordinates, a number of different points
in a plane such that no three points lie on the same straight line, we can ask
to select those points P through which a straight line can be drawn such that
all other points lie at the same side of that line. These points P are the vertices
of what is called "the convex hull of the given points". The convex hull itself
is a cyclic ordering of these points with the property that the straight line
connecting two successive ones has all remaining points at one of its sides.
The convex hull is the shortest closed line such that each point lies either
on it or inside it.
In this chapter we shall tackle the analogous problem for three dimen-
sions. Given, by their x-y-z-coordinates, N different points (N large) such
that no four different points lie in the same plane, select all those points P
through which a plane can be "drawn", such that all other points lie at the
same side of that plane. These points are the vertices of the convex hull
around those N points, i.e. the minimal closed surface such that each point
lies either on it or inside it. (The restriction that no four points lie in the
same plane has been introduced to simplify the problem; as a result all the
faces of the convex hull will be triangles.)
For the time being we leave the question open as to whether the convex
hull should be produced as the collection of triangles forming its faces, or as the
graph of its vertices and edges -where the edges are the lines through two
different vertices such that a plane through that line can be "drawn" with
all other points at the same side of it.
The reason why we postpone this decision is exactly the reason why the
problem of the three-dimensional convex hull is such a hairy one. In the
two-dimensional case, the convex hull is one-dimensional and its "process-
ing" (scanning, building up, etc.) is quite naturally done by a sequential
algorithm with a linear store. In the three-dimensional case, however, neither
the representation of the "two-dimensional" answer with the aid of a linear
store nor the "sequencing" in manipulating it are obvious.
All algorithms that I know for the solution to the two-dimensional
problem can be viewed as specific instances of the abstract program:
current hull. Instead of scanning all vertices, we can also exploit that a point
lies inside the convex hull only if it lies inside a triangle between three of its
vertices and we can try to find such a triple of vertices according to some
strategy aiming at "quickest inclusion". Some of these strategies can, on
"an" average, be speeded up quite considerably by some additional book-
keeping, by trading storage space versus computation time.
From the above we can only expect that for the three-dimensional case,
the collection of algorithms worthy of the predicate "reasonable" will be
of a dazzling variety. It would be vain to try anything approaching an
exhaustive exploration of that class, and I promise that I will be more than
happy if I can find one, perhaps two "reasonable" algorithms that do not
seem to be unduly complicated.
Personally I find such a global exploration of the potential difficulty,
as given above, very helpful, as it indicates how humbly I should approach
the problem. In this case I interpret the indications as pointing towards a
very humble approach and it seems wise to refrain from as much complicat-
ing sophistication as we possibly can, before we have discovered -if ever!
- that, after all, the problem is not as bad as it seemed at first sight. The
most drastic simplification I can think of is confining our attention to the topo-
logy, and refraining from all strategic considerations based upon expecta-
tion values for the numbers of points inside given volumes.
It seems that the most sensible thing to do is to look at the various ways
of solving the two-dimensional problem and to investigate their generaliza-
tion to three dimensions.
A simple solution of the two-dimensional problem deals with the points
in arbitrary order, and it maintains the convex hull for the first n points,
initializing with n = 3. Whenever a next point is taken into consideration
two problems have to be solved:
1. It has to be decided whether the new point lies inside or outside the
current hull.
2. If it is found to lie outside the current hull, the current hull should be
adjusted.
One way of doing this is to search for the set of k consecutive (!) edges
such that the new point lies at their wrong side. Those k edges (and k - 1
points) have to be removed during the adjustment; they will be replaced by
1 point and 2 edges. If the search fails to find such a set, the new point lies
inside.
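The following Python sketch (mine, not the book's; it represents the hull as a counter-clockwise list of points rather than with predecessor and successor functions) carries out exactly this search-and-replace per new point:

def cross(o, a, b):
    # positive iff the turn o -> a -> b is counter-clockwise
    return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])

def convex_hull(points):
    # incremental two-dimensional hull; assumes no three points collinear
    hull = list(points[:3])
    if cross(hull[0], hull[1], hull[2]) < 0:
        hull[1], hull[2] = hull[2], hull[1]   # orient counter-clockwise
    for p in points[3:]:
        m = len(hull)
        # edge i runs from hull[i] to hull[(i+1) % m]; the new point lies
        # at its wrong side when the turn is clockwise
        bad = [cross(hull[i], hull[(i+1) % m], p) < 0 for i in range(m)]
        if not any(bad):
            continue              # p lies inside the current hull
        k = sum(bad)              # the k wrong edges are consecutive
        i = next(j for j in range(m) if bad[j] and not bad[j-1])
        # the k edges and k-1 points are replaced by 1 point and 2 edges
        hull = [hull[(i+k+j) % m] for j in range(m-k+1)] + [p]
    return hull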
Assuming the current vertices cyclically arranged, such that for each
vertex we can find its predecessor as well as its successor, the least sophisti-
cated program investigates these edges in order. In the three-dimensional
problem the equivalents of the edges are the triangular faces, the equivalent
As a result "i'', "suc(i)" and "suc(suc(i))" are the numbers of the edges
forming in that order a clockwise boundary of a face. Because all faces are
triangles, we shall have
suc(suc(suc(i))) = i
The functions "inv" and "sue" give the complete topological description
of the convex hull in terms of directed edge names. If we want to go from
there to point numbers, we can introduce a third function
end(i) = the number of the point in which the directed edge nr. i ends.
We then have end(inv(i)) = end(suc(suc(i))), because both expressions
denote the number of the point in which the directed edge nr. i begins; the
left-hand expression is to be preferred, not so much because it is simpler,
but because it is independent of the assumption that all faces are triangles.
To find for a given point, say nr. k, the set of edges ending in it is very
awkward, and therefore must be avoided. Rather than storing "k", we must
store "ek" such that end(ek) = k. Then, for instance,
ek:= inv(suc(ek))
will switch ek to the next edge ending in k; by repeated application of that
transformation we shall be able to rotate ek along all edges ending in point
nr. k (and thus we have access to all faces with point nr. k on their boundary).
Note. As inv(inv(i)) = i and I expect to be rather free in assigning num-
bers to edges, we can probably assign numbers ≠ 0 and introduce the
convention inv(i) = -i; in that case we do not need to store the function
inv at all. (End of note.)
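In Python terms the convention amounts to using signed keys (a sketch; the dictionaries suc and end would be filled in as the hull is built):

# Sketch of the edge administration with the convention inv(i) = -i:
suc = {}            # suc[i]: next edge, clockwise, of the face along i
end = {}            # end[i]: number of the point in which edge i ends

def inv(i):
    return -i       # needs no storage at all

def rotate(ek):
    # switch ek to the next edge ending in the same point end(ek)
    return inv(suc[ek])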
Our task is to try to separate with respect to the new point the current
hull into two caps. I take the position that establishing at which side of a
face the new point lies will be the time-consuming operation and would like
to confront the new point at most once with each face (and possibly with
some faces not at all).
To use a metaphor, we can regard the new point as a lamp that illumi-
nates all faces on which its rays fall from outside. Let us call them the "light"
faces and all the other ones the "dark" faces. The light faces have to be
removed; only if the point lies inside the current hull, does it leave all faces
dark.
After the colouring of the first face I only expect to colour new faces
adjoining a coloured one. The two main questions that I find myself ponder-
ing now are:
1. Do we insist that, as long as only faces of the same colour have been
found, the next face to investigate should form a cap with the already
coloured ones, or do we admit "holes"? The reason is that the test
whether a new face, when added, will give rise to a hole does not seem
too attractive.
2. As soon as two colours have been used, we know that the new point
lies outside the current hull and we also know an edge between two
differently coloured faces. Should we change strategy and from then on
try to trace the boundary between dark and light faces, rather than
colour all faces of our initial colour?
Can we come away with that? We observe that the intersection of K and H
is empty, because no edge i can be a member of both K and H.
Can we maintain both sets K and H if the new face is right, and also if
the new face is wrong? Let x be an edge of set K and let us consider again
the three edges y for y = -x, y = suc(-x), and y = suc(suc(-x)). In the
following, B is the set of discovered edges of the clockwise boundary of the
right cap.
If the new face is right, its three edges y have each to be processed by:
1. We do not need to treat the first face inspected separately; its three edges
y can be treated as those of any other face.
2. The two alternative constructs for dealing with the three edges of a
newly inspected face (for a right face and for a wrong face respectively)
as given above can be mapped on a single alternative construct.
3. We don't need to invert the edges of B if it was the clockwise boundary
of the dark cap.
The further details will be postponed until we have explored the next
operation: adjusting the convex hull so as to include the new point as well.
Axiom 3. The set S(B) contains only those elements that belong
to it on account of Axioms 1 and 2.
The admission of holes implies that there may be (even will be) values of i
such that finally 0 < abs(i) ≤ suc.hib, while i is not the number of an edge
of the final answer; the introduction of the variable "start" allows us not to
make any commitments regarding the value of suc(i) and end(i) for such a
value of i. Symbolically, we can now describe the function of the inner block
to be designed by

"(suc, end, start) vir hull := convex hull of (x, y, z)".
As far as our external commitments are concerned, we could restrict
ourselves to the introduction of a single private variable, "np" say, and the
invariant relation
P1: (suc, end, start) = convex hull of the first np points of (x, y, z)
and a block
but this is not sufficient for two reasons. Firstly, inside the repeatable state-
ment we want to reuse holes, and therefore we should introduce the variable
yh for the youngest hole. Secondly, the increase of np implies scanning the
(faces along) edges of the convex hull. We could introduce in the repeatable
statement an array that each time could be initialized with a special value for
each edge, meaning "not yet confronted with the new point". Logically, this
would be perfectly correct, but it seems more efficient to bring that array
outside the repeatable statement and to see to it that, as the scanning pro-
ceeds, it ends up with the same neutral values with which it started. We shall
call this array "set" and choose the neutral value= 0. Summing up, we intro-
duce the invariant relation
P2: (suc, end, start) = convex hull of the first np points of (x, y, z)
    and yh = youngest hole
    and (i is a hole ≠ 0) => suc(i) is the next oldest hole
    and (i is an edge of (suc, end, start)) => set(i) = 0
has been completed, all edges i that have had set(i) = ±1 will have set(i)
reset to 0 or to 2.
Besides recording the boundary as "the edges with set(i) = 2", it is
helpful to have a list of these edge numbers in cyclic order, because that
comes in handy when the edges to and from the new point have to be added.
Because the optimization that switches to a linear search as soon as the
first edge of the boundary has been found finds the edges in cyclic order,
our version will produce that list as well. We propose to record the numbers
of the edges of the boundary in cyclic order in an array, "b" say; b.dom = 0
can then be taken as an indication that no boundary has been found. Our
coarsest design becomes:
To establish the boundary in "set" and "b" would require two steps: in
the first step all faces are confronted with the new point and the boundary
is established in "set", and in the second step we would have to trace the
boundary and place its edges in cyclic order in the list "b". Although it was
my original intention to do so, and to leave the transition to the more linear
search as soon as the first edge of the boundary has been found as an exercise
in optimization to my readers, I now change my mind, because I have dis-
covered a hitherto unsuspected problem: the first inspection of a face that
reveals an edge of the boundary may reveal more than one boundary edge.
If the faces are triangles, they will be adjacent boundary edges, but the
absence of faces with more than three edges is only due to the restriction
that we would not have four points in a plane. I did not intend to allow this
restriction to play a very central role and as a result I refuse to exploit this
more or less accidental adjacency of the boundary edges that may be revealed
at the inspection of a new face. And the potential "simultaneous" discovery
of nonadjacent boundary edges was something I had not foreseen; adjacency
plays a role if we want to discover boundary edges in cyclic order, i.e. place
their edge numbers in cyclic order in array b (with "b:hiext"). The moral of
the story seems to be to separate the "discovery" of a boundary edge -while
scanning the edges of the newly inspected face- from the building up of the
value of "b". Because the discovered boundary edges have to be separated
in those "processed", i.e. stored as a function value of b, and those still
unprocessed, some more information needs to be stored in the array "set".
I propose:
set(i) = 1 and set(-i) = 0:   the face along edge i has been established
                              light, the face along edge -i has not yet
                              been confronted with the current new point
set(i) = -1 and set(-i) = 0:  the face along edge i has been established
                              dark, the face along edge -i has not yet
                              been confronted with the current new point
set(i) = 1 and set(-i) = -1:  edge i is an unprocessed edge of the
                              clockwise boundary of the light cap
set(i) = 2 and set(-i) = 0:   edge i is a processed edge of the clockwise
                              boundary of the light cap
set(i) = 0 and set(-i) = 0:   the faces along i and -i are both unin-
                              spected or have been established to have
                              the same colour.
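The table invites a direct transcription. In Python (a sketch; "set" stored as a dictionary on signed edge numbers) the five states become:

def light_half_inspected(s, i):  return s[i] == 1 and s[-i] == 0
def dark_half_inspected(s, i):   return s[i] == -1 and s[-i] == 0
def unprocessed_boundary(s, i):  return s[i] == 1 and s[-i] == -1
def processed_boundary(s, i):    return s[i] == 2 and s[-i] == 0
def neutral(s, i):               return s[i] == 0 and s[-i] == 0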
We stick to our original principle only to inspect (after the first time) a
face along (the inverse of) a half-inspected edge, say along -xx (i.e. set(xx) =
±1 and set(-xx) = 0); we can then use the value xx = 0 to indicate that
no more inspection is necessary. The relation "b.dom = 0" can be used to
indicate that up till now no boundary edges have been discovered; the first
boundary edge to be discovered will just be placed in b (thereby causing
b.dom = 1), the remaining boundary edges will then be placed by careful
extension of b. A first sketch (at this level still too rough) is
(The different names "refresh xx" and "reassign xx" have been used in order
to indicate that rather different strategies are involved.)
I have called this sketch too rough: "reassign xx" has to assign to xx
the number of a half-inspected edge (if it can find one, otherwise zero). It
would be very awkward indeed if this implied a search over the current hull,
which would imply again an administration to prevent the algorithm from
visiting the same edge twice during that search! Therefore we introduce an
array c (short for "candidates") and will guarantee that
if i is the number of a half-inspected edge, then either i
occurs among the function values of c, or i = xx.
(Note that function values of c -which will always be edge numbers- may
also equal the number of an edge that, in the meantime, has been totally
inspected.) In view of the fact that zero is the last xx-value to be produced,
it will turn out to be handy to store the value zero at the low end of c upon
initialization (it is as if we introduce a "virtual edge" with number zero;
this is a standard coding trick). Our new version becomes:
Because for "reassign xx" initially edge xx is not half-inspected (for the
face along -xx has just been inspected!) and
abs(set(xx)) = 1 and set(-xx) = 0
is the condition for being half-inspected, the last subalgorithm is coded
quite easily:
"reassign xx" :
do xx -::;t::. 0 and non (abs(set(xx)) = 1 and set(-xx) = 0) __.
xx,c: hipop
od
We are left with the task of refining "adjust the hull'', where the edges of
B, the clockwise boundary of the light cap, are given in two ways:
1. In cyclic order as the function value of the array b: this representation
facilitates tracing the boundary, doing something for all edges of B, etc.
2. set(i) = 2 holds if and only if edge i belongs to B; this facilitates answer-
ing the question "Does edge i belong to B?"
Because the number of edges that have to disappear (i.e. the inner edges
of the light cap) and the number of edges that have to be added (i.e. the
edges connecting the new point with the vertices on the clockwise boundary)
are totally unrelated, it seems best to separate the two activities:
"adjust the hull":
"removal of edges";
"addition of edges"
In the first one it does not seem too attractive to merge identification of
the inner edges with their removal: during the identification process the
light cap of the current hull has to be scanned, and I would rather not mess
with that structure by removing edges before the scanning has been carried
out completely. (This does not mean to imply that it could not be done, it
just says that I am currently no longer in the mood to investigate the possi-
bility.)
Because, on account of our "holes", inner edge i and inner edge - i have
to be removed simultaneously, it suffices for the removal to build up a list of
"undirected" edge numbers, i.e. the value i will be used to indicate that both
edge i and edge -i should disappear. Calling that list rm, we are led to
"removal of edges":
begin glocon b; glovar suc, yh, set; privar rm;
"initialize rm with list of inner edges";
"removal of edges listed in rm"
end
In accordance with our earlier conventions about holes, the second one
is now coded quite easily
"removal of edges listed in rm" :
do rm.dom > 0 __.sue: (rm.high)= yh; yh, rm: hipop od
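The convention that suc(i) of a hole points to the next older hole, with yh the youngest, is the classic free list; in Python terms (a sketch with the names of the text):

def free_edge(suc, yh, i):
    # make undirected edge i the youngest hole; yields the new yh
    suc[i] = yh
    return i

def reuse_hole(suc, yh):
    # take the youngest hole for a fresh edge; yields (edge, new yh)
    return yh, suc[yh]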
The initialization of rm is, after our earlier analysis, no longer difficult.
The relation set(i) = set(-i) = 3 will be used to represent "i and -i have
been detected as inner edges" and enables us to test whether edge i belongs
to what we have called V (boundary + inner edges) by set(i) ≥ 2. Again we
use an array variable, called "e", to list the candidates.
"addition of edges":
begin glovar sue, end, start, yh, set; glocon np, b;
privar t, k;
t vir int, k vir int := 0, b.lob;
do k < b.hib -->
begin glovar sue, end, yh, set, t, k; glocon np, b;
pricon e;
4) compute lumen
3) extend b and refresh xx
3) reassign xx
2) adjust the hull
3) removal of edges
4) initialize rm with list of inner edges
4) removal of edges listed in rm
3) addition of edges
whose shortest connection goes through the interior and we could try to
locate the intersection of the convex hull with the plane through the new
point and the two points of such a pair. Our searches would then be linear.
(End of note 2.)
Note 3. In the program as published above, Mark Bebie has found an
error. In order to maintain the convention
"set(i) = 2 and set(-i) = 0 edge i is a processed edge of the
clockwise boundary of the light cap"
the values of set(i) and set(-i) have to be adjusted when edge i is proc-
essed, i.e. b:hiext(i) takes place. In "extend b and refresh xx" this has
been done (in the third line), in "inspect face along -xx", however, it
has erroneously been omitted. Its tenth line
do b.dom = 0 --> b:hiext(lumen * yy) od

should therefore be replaced by

do b.dom = 0 --> b:hiext(lumen * yy);
   set:(b.high)= 2; set:(-b.high)= 0
od
(End of note 3.)
25 FINDING THE MAXIMAL STRONG COMPONENTS IN A DIRECTED GRAPH
Given a directed graph, i.e. a set of vertices and a set of directed edges,
each leading from one vertex to another, it is requested to partition the ver-
tices into so-called "maximal strong components". A strong component is a
set of vertices such that the edges between them provide a directed path from
any vertex of the set to any vertex of the set and vice versa. A single vertex
is a special case of a strong component; then the path can be empty. A maxi-
mal strong component is a strong component to which no further vertices
can be added.
In order to establish this partitioning, we have to be able to make two
kinds of assertions: the assertion that vertices belong to the same strong com-
ponent, but also -because we have to find maximal strong components- the
assertion that vertices do not belong to the same strong component.
For the first type of assertion, we may use the following
THEOREM 2. If the vertices are subdivided into two sets svA and svB such
that there exist no edges originating in a vertex of svA and terminating in a
vertex of svB, then
firstly: the set of maximal strong components does not depend on the
presence or absence of edges originating in a vertex of svB and
terminating in a vertex of svA, and
secondly: no strong component comprises vertices from both sets.
THEOREM 2A. A strong component whose outgoing edges, if any, are all ingo-
ing edges of maximal strong components is itself a maximal strong compo-
nent.
Or, to put it in another way, once the first maximal strong component
without outgoing edges -the existence of which is guaranteed by Corollary
1- has been found (identified as such by being a strong component without
outgoing edges), the remaining maximal strong components can be found by
solving the problem for the graph consisting of the remaining vertices and
only the given edges between them. Or, to put it in still another way, the
maximal strong components of a graph can be ordered according to "age",
such that each maximal strong component has outgoing edges only to "older"
ones.
In order to be able to be a little bit more precise, we denote by
sv: the given set of vertices (a constant)
se: the given set of edges (a constant)
pv: a partitioning of the vertices of sv.
The final relation to be established can then be written as
R: pv = MSC(se)
in which for the fixed set sv the function MSC, i.e. the partitioning into Maxi-
mal Strong Components, is regarded as a function of the set of edges se.
a time", the remaining vertices have to be separated a little bit more subtly,
viz. into two disjoint subsets, sv2 and sv3 say (with sv = svl :::+=: sv2 :::+=: sv3),
where sv3 contains the totally unprocessed vertices,
P2: no edge in sel begins or ends at a vertex in sv3
(sv3 is initially equal to sv and finally empty).
Transfer from sv3 to sv1 can then take place in two steps: from sv3 to
sv2 (one at a time) and from sv2 to sv1 (together with all other vertices from
the same definite maximal strong component).
In other words, among the vertices of sv2 we shall try to build up (by
enlarging se1) the next maximal strong component of MSC(se) to be trans-
ferred to sv1. The maximal strong components in MSC(se1) -note the argu-
ment!- are such that they comprise either vertices from sv1 only, or vertices
from sv2 only, or a (single) vertex from sv3. We propose a limitation on the
connections that the edges of se1 provide between the maximal strong com-
ponents in MSC(se1) that contain nodes from sv2 only: between those maxi-
mal strong components the edges of se1 shall provide no more and no less
than a single directed path, leading from the "oldest" to the "youngest" one.
We call these maximal strong components "the elements of the chain". This
choice is suggested by the following considerations.
Firstly, we are looking for a cyclic path that would allow us to apply
Theorem 1 or 1A in order to decide that different vertices belong to the same
maximal strong component. Under the assumption that we are free to pre-
scribe which edge will be the next one to be added to se1, there does not
seem to be much advantage in introducing disconnected maximal strong
components in MSC(se1) among those built up from vertices of sv2.
Secondly, the directed path from the "oldest" to the "youngest" com-
ponent in the chain -as "cycle in statu nascendi"- is easily maintained, as
is shown by the following analysis.
Suppose that se2 contains an edge that is outgoing from one of the ver-
tices of the youngest maximal strong component in the chain. Such an edge
"e" is then transferred from se2 to se1, and the state of affairs is easily main-
tained:
3. If e leads to a vertex from sv3, that latter vertex is transferred to sv2 and
as new youngest element (a maximal strong component in MSC(se1) all
by itself) it is appended to the chain, whose length is increased by one.
If there exists no such edge "e", there are two possibilities. Either the
chain is nonempty, but then Theorem 2A tells us that this maximal strong
component of MSC(se1) is a maximal strong component of MSC(se) as well:
the youngest element is removed from the chain and its vertices are trans-
ferred from sv2 to sv1. Or the chain is empty: if sv3 is not empty, an arbitrary
element of sv3 can be transferred to sv2, otherwise the computation is
finished.
In the above degree of detail we can describe our algorithm as follows:

se1, se2, sv1, sv2, sv3 := empty, se, empty, empty, sv;
do sv3 ≠ empty --> {the chain is empty}
   transfer a vertex v from sv3 to sv2 and initialize the chain with {v};
   do sv2 ≠ empty --> {the chain is nonempty}
      do se2 contains an edge starting in a vertex of the youngest
               element of the chain -->
         transfer such an edge e from se2 to se1;
         if e leads to a vertex v in sv1 --> skip
         [] e leads to a vertex v in sv2 --> compaction
         [] e leads to a vertex v in sv3 --> extend chain and transfer v
                                             from sv3 to sv2
         fi
      od; {the chain is nonempty}
      remove youngest element and transfer its vertices from sv2 to sv1
   od {the chain is again empty}
od
Note 1. As soon as vertices are transferred from sv2 to sv1, their incom-
ing edges (if any) that are still in se2 could be transferred simultaneously
from se2 to sel, but the price for this "advanced" processing (the gain of
which is doubtful) is that we have to be able to select for a given vertex
the set of its incoming edges. As the algorithm is described, we only need
to find for each vertex its outgoing edges. Hence the above arrangement.
(End of note 1.)
Note 2. Termination of the innermost repetition is guaranteed by de-
crease of the number of edges in se2; termination of the next embracing
repetition is guaranteed by decrease of the number of vertices in sv2 +
sv3; termination of the outer repetition is guaranteed by decrease of the
number of vertices in sv3. The mixed reasoning, sometimes in terms of
In the meantime we have somewhat lost trace of the identity of the ver-
tices in the chain. If, for instance, we would like to transfer the vertices of
the youngest element of the chain from sv2 to svl, our current tabulations
would force us to scan the function rank for all values of v, such as to find
those satisfying cc.high ≤ rank(v) ≤ nvc. We would not like to do that,
but thanks to the fact that at least for the vertices in sv2, all values of rank(v)
are different, we can also store the inverse function:

for 1 ≤ r ≤ nvc: rank(v) = r <=> knar(r) = v
So much for keeping track of the vertices; let us now turn our attention
to the edges. The most crucial question with regard to the edges is, of course,
the guard of the innermost repetitive construct: "se2 contains an edge starting
in a vertex of the youngest element of the chain". That guard is evaluated
easily with the aid of a list of edges from se2 outgoing from the vertices of
the youngest element of the chain. One of the ways in which the youngest in
the chain may change, however, is compaction; in order to maintain that list
we, therefore, also need the corresponding lists for the older elements of the
chain. Because for those edges we are interested only in the identity of their
"target vertex", we introduce as the next part of our chain administration
two further array variables -with domain = 0 when the chain is empty-
called "tv" (for "target vertices") and "tvb" (for "tv-bounds").
The domain of tvb will have one point for each element of the chain: its
value equals the number of outgoing edges of se2 from vertices of older ele-
ments in the chain (the domain of tvb is all the time equal to that of cc,
which also stores one value for each chain element). Each time a new vertex
v is transferred from sv3 to sv2, the array tvb is extended at the high end with
the value of tv.dom, whereafter tv is extended at the high end with the target
vertices of the outgoing edges of v. Denoting that latter operation with
"extend tv with targets of v", the whole inner repetition now becomes (taking
knar, tv, and tvb into account as well)
"inner loop":
do tv.dom > tvb.high --->
v, tv :hipop;
if rank(v) > 0--->
do cc.high> rank(v)---> cc:hirem; tvb:hirem od
0rank(v) = 0--->
nvc:= nvc + 1; rank:(v) = nvc; knar:hiext(v);
cc :hiext(nvc); tvb :hiext(tv .dom);
"extend tv with targets of v"
fi
od
With the variable "strno" (initially = 0), we can now code the
"middle loop":
do cc.dom > 0 -->
"inner loop";
strno: = strno + 1;
do nvc > cc.high -->
nvc:= nvc - l; rank:(knar.high) =NV+ strno;
knar:hirem; svlcount:= svlcount + 1
od;
cc :hirem; tvb :hirem
od
begin glocon edge, edgeb, NV; virvar rank; privar sv1count, cand, strno;
   rank vir int array:= (1); do rank.dom ≠ NV --> rank:hiext(0) od;
   sv1count vir int, cand vir int, strno vir int := 0, 1, 0;
   do sv1count ≠ NV -->
      begin glocon edge, edgeb, NV; glovar rank, sv1count, cand, strno;
         privar v, cc, tv, tvb, knar, nvc;
         do rank(cand) ≠ 0 --> cand:= cand + 1 od; v vir int:= cand;
         nvc vir int:= 1; rank:(v)= 1; knar vir int array:= (1, v);
         cc vir int array:= (1, 1); tvb vir int array:= (1, 0);
         tv vir int array:= (1);
         "extend tv with targets of v";
         "middle loop"
      end
   od
end
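For readers who wish to execute the program, here is a transliteration into Python (mine, not the book's text): the arrays become lists, hiext/hirem/hipop become append and pop, the vertices are numbered from 0 onwards, and the pair edge/edgeb is replaced by an adjacency list.

def strong_components(targets):
    # targets[v] lists the endpoints of the edges outgoing from vertex v;
    # returns, per vertex, the number of its maximal strong component
    NV = len(targets)
    rank = [0] * NV       # 0: in sv3; 1..nvc: in sv2; > NV: in sv1
    sv1count, cand, strno = 0, 0, 0
    while sv1count != NV:
        while rank[cand] != 0:       # pick a totally unprocessed vertex
            cand += 1
        v = cand
        nvc = 1
        rank[v] = 1
        knar = [v]                   # knar[r-1] = the vertex with rank r
        cc = [1]                     # cc[-1]: rank of the oldest vertex of
                                     # the youngest chain element
        tvb = [0]                    # tvb[-1]: number of tv-entries owned
                                     # by the older chain elements
        tv = list(targets[v])        # pending target vertices
        while cc:                                # "middle loop"
            while len(tv) > tvb[-1]:             # "inner loop"
                w = tv.pop()
                if rank[w] > 0:
                    # w met before; if it lies in an older chain element,
                    # a cycle has been closed: compact the chain
                    while cc[-1] > rank[w]:
                        cc.pop(); tvb.pop()
                else:
                    # w is fresh: append it as new youngest element
                    nvc += 1
                    rank[w] = nvc
                    knar.append(w)
                    cc.append(nvc)
                    tvb.append(len(tv))
                    tv.extend(targets[w])        # extend tv with targets of w
            # the youngest element has no outgoing edges left: by
            # Theorem 2A it is a maximal strong component
            strno += 1
            while nvc >= cc[-1]:
                nvc -= 1
                rank[knar.pop()] = NV + strno    # transfer to sv1
                sv1count += 1
            cc.pop(); tvb.pop()
    return [rank[v] - NV for v in range(NV)]

# e.g. strong_components([[1], [0, 2], []]) yields [2, 2, 1]: vertices 0
# and 1 form one component, vertex 2 an "older" component of its own.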
Note 1. A very similar algorithm has been developed independently by
Robert Tarjan. (End of note 1.)
Note 2. In retrospect we see that the variable "nvc" is superfluous,
because nvc = knar.dom. (End of note 2.)
Note 3. The operation "extend tv with the targets of v" is used twice.
(End of note 3.)
Remark 1. The reader will have noticed that in this example the actual
code development took place in a different order than in the development of
the program for the convex hull in three dimensions. The reason is, I think,
the following. In the case of the convex hull, the representation had already
been investigated very carefully as part of the logical analysis of the problem.
In this example the logical analysis had been largely completed when we
faced the task of selecting a representation that would admit an efficient
execution of the algorithm we had in mind. It is then natural to focus one's
attention on the most crucial part first, i.e. the innermost loop.
(End of remark 1.)
Remark 2. It is worth noticing the various steps in which we arrived at
our solution. In the first stage our main concern has been to process each
edge only once, forgetting for the time being about the dependence of the
computation time on the number of vertices. This is fully correct, because, in
general, the number of edges can be expected to be an order of magnitude
larger than the number of vertices. (As a matter of fact, my first solution for
this problem -not recorded in this chapter- was linear in the number of
edges but quadratic in the number of vertices.) It was only in the second
stage that we started to worry about linear dependence on the number of
vertices as well. How effective this "separation of concerns" has been is strik-
ingly illustrated by the fact that in that second stage graph theory did no
longer enter our considerations at all! (End of remark 2.)
26 ON MANUALS AND IMPLEMENTATIONS
If the HSLM carries the computation out to the end, we can deduce from the
nonoccurrence of the refusal that the embodied SLM has been large enough.
From the above it is clear that explicit refusal by the HSLM, whenever
asked to do something exceeding its capacity, is a vital feature of the HSLM:
it is necessary for our ability to do the experiment. There exist, regretfully
enough, machines in which the continuous check that the simulation of the
behaviour of the UM is not beyond their capacity is so time-consuming, that
this check is suppressed for the supposed sake of efficiency: whenever the
capacity would be exceeded by a correct execution, they just continue -for
the supposed sake of convenience- incorrectly. It is very difficult to use such
a machine as a reliable tool, for the justification of our belief in the correct-
ness of the answers produced requires in addition to the proof of the pro-
gram's correctness a proof that the computation is not beyond the capacity
of the machine, and, compared to the first one, this second proof is a rather
formidable obligation. We would need an (axiomatic) definition of the pos-
sible happenings in the UM, while up till now it sufficed to prescribe the net
effects; besides that, the precise constraints imposed by the actual machine's
finiteness are often very hard to formulate. We therefore consider such
machines that do not check whether simulating the UM exceeds their capacity
as unfit for use and ignore them in the sequel.
Thanks to its explicit refusal to continue, recognizable as such, the HSLM
is a safe tool, but it would not be a very useful one if it refused too often!
In practice, a programmer does not only want to make a program that would
instruct the UM to produce the desired result, he also wants to reduce the
probability (or even to exclude the possibility) that the HSLM refuses to
simulate the UM. If, for a given HSLM, this desire to reduce the probability
of refusal entails very awkward obligations for the programmer (or, also, if
the programmer has a hard time in estimating how effective measures that he
considers to take will turn out to be) this HSLM is just awkward to use.
od;
printbool(refuted)
end
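(The opening lines of this program are garbled in this copy. Its intent -hunting for a counterexample to Goldbach's Conjecture that every even number from 4 onwards is the sum of two primes- can be sketched in Python as follows; this is a hypothetical reconstruction, shown with the bounded guard of the corrected version discussed below.)

def is_prime(k):
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return k >= 2

def goldbach_holds(n):
    # is the even number n the sum of two primes?
    return any(is_prime(p) and is_prime(n - p) for p in range(2, n // 2 + 1))

refuted, n = False, 4
while not refuted and n < 1000000:    # the corrected, bounded guard
    refuted = not goldbach_holds(n)
    n = n + 2
print(refuted)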
Because I have not proved that Goldbach's Conjecture is false, I have not
proved that wp(S, T) is initially true; therefore, the UM may act as it pleases
and I am, therefore, not allowed to conclude that Goldbach's Conjecture is
wrong when it prints "true" and stops. I would be allowed to draw that
surprising conclusion, however, if the third line had been changed into
"do non refuted and n < 1 000 000 -->"
27 IN RETROSPECT

Once the automatic computer was there, it was not only a new tool, it
was also a new challenge and, if the tool was without precedent, so was the
challenge. The challenge was -and still is- a two-fold one.
Firstly we are faced with the challenge of discovering new (desirable)
applications, and this is not easy, because the applications could be as revo-
lutionary as the tool itself. Ask the average computing scientist: "If I were
to put a ten-megabuck machine at your disposal, to be installed for the benefit
of mankind, how and to what problem would you apply it?", and you will
discover that it will take him a long time to come up with a sensible answer.
This is a serious problem that puts great demands on our fantasy and on our
powers of imagination. This challenge is mentioned for the sake of complete-
ness; this monograph does not address it.
Secondly, once an (hopefully desirable!) application has been discovered,
we are faced with the programming task, i.e. with the problem of bending the
general tool to our specific purpose. For the relatively small and slow mach-
ines of the earlier days the programming problem was not too serious, but
when machines at least a thousand times as powerful became generally
available, society's ambition in applying them grew in proportion and the
programming task emerged as an intellectual challenge without precedent.
The latter challenge was the incentive to write this monograph.
On the one hand the mathematical basis of programming is very simple.
Only a finite number of zeros and ones are to be subjected to a finite number
of simple operations, and in a certain sense programming should be trivial.
On the other hand, stores with a capacity of many millions of bits are so
unimaginably huge and processing these bits can now occur at so unimagin-
ably high speeds that the computational processes that may take place -and
that, therefore, we are invited to invent- have outgrown the level of triviality
by several orders of magnitude. It is the unique combination of basic sim-
plicity and ultimate sophistication which is characteristic for the programm-
ing task.
We realize what this combination implies when we compare the program-
mer with, say, a surgeon who does an advanced operation. Both should
exercise the utmost care, but the surgeon has fulfilled his obligations in this
respect when he has taken the known precautions and is then allowed to
hope that circumstances outside his control will not ruin his work. Nor is
the surgeon blamed for the incompleteness of his control: the unfathomed
complexity of the human body is an accepted fact of life. But the programmer
can hardly exonerate himself by appealing to the unfathomed complexity
of his program, for the latter is his own construction! With the possibility
of complete control, he also gets the obligation: it is the consequence of the
basic simplicity.
One consequence of the power of modern computers must be mentioned
here. In hierarchical systems, something considered as an undivided, unana-
lyzed entity at one level is considered as something composite at the next
lower level of greater detail; as a result the natural grain of time or space
that is appropriate for each level decreases by an order of magnitude each
time we shift our attention from one level to the next lower one. As a con-
sequence, the maximum number of levels that can be distinguished meaning-
fully in a hierarchical system is more or less proportional to the logarithm
of the ratio between the largest and the smallest grain, and, therefore, we
cannot expect many levels unless this ratio is very large. In computer pro-
gramming our basic building block, the instruction, takes less than a micro-
second, but our program may require hours of computation time. I do not
know of any other technology than programming that is invited to cover a
grain ratio of 10^10 or more. The automatic computer, by virtue of its fan-
tastic speed, was the first to provide an environment with enough "room"
for highly hierarchical artifacts. And in this respect the challenge of the
programming task seems indeed without precedent. For anyone interested
in the human ability to think difficult thoughts (by avoiding unmastered
complexity) the programming task provides an ideal proving ground.
at least without becoming barren. But not any odd collection of scraps of
knowledge and an equally odd collection of skills, even of the right size,
constitute a viable scientific discipline! There are two other requirements.
The internal requirement is one of coherence: the skills must be able to
improve the knowledge and the knowledge must be able to refine the skills.
And finally there is the external requirement -we would call it "a narrow
interface"- that the subject matter can be studied in a reasonably high degree
of isolation, not at any moment critically dependent on developments in other
areas.
The analogy is not only useful to explain "modularization" to the layman,
conversely it gives us a clue as to how we should try to arrange our thoughts
when programming. When programming we are faced with similar problems
of size and diversity. (Even when programming at the best of our ability, we
can sometimes not avoid that program texts become so long that their sheer
length causes (for instance, clerical) problems. The possible computations
may be so long or so varied that we have difficulty in imagining them. We
may have conflicting goals such as high throughput and short reaction times,
etc.) But we cannot solve them by just splitting the program to be made into
"modules".
To my taste the main characteristic of intelligent thinking is that one
is willing and able to study in depth an aspect of one's subject matter in
isolation, for the sake of its own consistency, all the time knowing that one
is occupying oneself with only one of the aspects. The other aspects have to
wait their turn, because our heads are so small that we cannot deal with
them simultaneously without getting confused. This is what I mean by
"focussing one's attention upon a certain aspect"; it does not mean com-
pletely ignoring the other ones, but temporarily forgetting them to the extent
that they are irrelevant for the current topic. Such separation, even if not
perfectly possible, is yet the only available technique for effective ordering
of one's thoughts that I know of.
I usually refer to it as "a separation of concerns", because one tries to
deal with the difficulties, the obligations, the desires, and the constraints
one by one. When this can be achieved successfully, we have more or less
partitioned the reasoning that had to be done -and this partitioning may
find its reflection in the resulting partitioning of the program into "modules"
- but I would like to point out that this partitioning of the reasoning to be
done is only the result, and not the purpose. The purpose of thinking is to
reduce the detailed reasoning needed to a doable amount, and a separation
of concerns is the way in which we hope to achieve this reduction.
The crucial choice is, of course, what aspects to study "in isolation",
how to disentangle the original amorphous knot of obligations, constraints,
and goals into a set of "concerns" that admit a reasonably effective separa-
tion. To arrive at a successful separation of concerns for a new, difficult
problem area will nearly always take a long time of hard work; it seems
unrealistic to expect it to be otherwise. But even without five rules of thumb
for doing so (after all, we are not writing a brochure on "How to Think Big
Thoughts in Ten Easy Lessons"), the knowledge of the goal of "separation
of concerns" is a useful one: we are at least beginning to understand what
we are aiming at.
Not that we don't have a rule of thumb! It says: don't lump concerns
together that were perfectly separated to start with! This rule was applied
before we started this monograph. The original worry was that we would
end up with unreliable systems that either would produce the wrong result
that could be taken for the correct one, or would even fail to function at all.
If such a system consists of a combination of hardware and software, then,
ideally, the software would be correct and the hardware would function
flawlessly and the system's performance would be perfect. If it does not,
either the software is wrong or the hardware has malfunctioned, or both.
These two different sources of errors may have nearly identical effects: if,
due to a transient error, an instruction in store has been corrupted or if,
due to a permanent malfunctioning, a certain instruction is permanently
misinterpreted, the net effect is very similar to that of a program bug. Yet
the origins of these two failures are very different. Even a perfect piece of
hardware, because it is subject to wear and tear, needs maintenance; software
either needs correction, but then it has been wrong from the beginning, or
modification because, at second thought, we want a different program. Our
rule of thumb tells us not to mix the two concerns. On the one hand we may
ponder about increasing the confidence level of our programs (as it were,
under the assumption of execution by a perfect machine). On the other hand
we may think about execution by not fully reliable machines, but during that
stage of our investigations we had better assume our programs to be perfect.
This monograph deals with the first of the two concerns.
In this case, our rule of thumb seems to have been valid: without the
separation of hardware and software concerns, we would have been forced
to a statistical approach, probably using the concept MTBF ( = "Mean Time
Between Failures", where "Mean Time Between Manifested Errors" would
have been more truthful), and the theory described in this monograph could
never have been developed.
Before embarking upon this monograph, a further separation of concerns
was carried through. I quote from a letter from one of my colleagues:
"There is a third concern in programming: after the preparation of "the pro-
gram text as a static, rather formal, mathematical object'', and after the
engineering considerations of the computational processes intended to be
evoked by it under a specific implementation, I personally find hardest actually
achieving this execution: converting the human-readable text, with its slips
which are not seen by the eye which "sees what it wishes to see", into machine-
readable text, and then achieving the elusive confidence that nothing has been
lost during this conversion."
(From the fact that my colleague calls the third concern the "hardest" we
may conclude that he is a very competent programmer; also an honest one!
I can add the perhaps irrelevant information that his handwriting is, however,
rather poor.) This third concern is not dealt with in this monograph, not
because it is of no importance, but because it can (and, therefore, should)
be separated from the others, and is dealt with by very different, specific
precautions (proof reading, duplication, triplication, or other forms of
redundancy). I mentioned this third concern because I found another col-
league -he is an engineer by training- so utterly obsessed by it that he
could not bring himself to consider the other two concerns in isolation from
it and, consequently, dismissed the whole idea of proving a program to be
correct as irrelevant. We should be aware of the fact, independent of whether
we try to explain or understand the phenomenon, that the act of separating
concerns tends to evoke resistance, often voiced by the remark that "one is
not solving the real problems". This resistance is by no means confined to
pragmatic engineers, as is shown by Bertrand Russell's verdict: "The advan-
tages of the method of postulation are great; they are the same as the advan-
tages of theft over honest toil."
The next separation of concerns is carried through in the book itself:
it is the separation between the mathematical concerns about correctness
and the engineering concerns about execution. And we have carried this
separation through to the extent that we have given an axiomatic definition
of the semantics of our programming languages which allows us, if we so
desire, to ignore the possibility of execution. This is done in the book itself
for the simple reason that, historically speaking, this separation has not been
suggested by our rule of thumb; the operational approach, characterized by
"The semantics itself is given by an interpreter that describes how the state
vector changes as the computation progresses." (John McCarthy, 1965) was
the predominant one during most of the sixties, from which R.W. Floyd
(1967) and C.A.R. Hoare (1969) were among the first to depart.
Such a separation takes much more time, for even after having the inkling
that it might be possible and desirable, there are many ways in which one
can go. Depending on one's temperament, one's capacities, and one's evalu-
ation of the difficulties ahead, one can either be very ambitious and tackle the
problem for as universal a programming language as possible, or one can
be cautious and search consciously for the most effective constraints. I have
clearly opted for the second alternative, and not including procedures (as
such, or also as parameters or even as results) seemed an effective simplification,
so drastic, as a matter of fact, that some of my readers may lose interest
in the "trivial" stuff that remains.
How does one settle them? The fact that the derivation of weakest
pre-conditions instead of strongest post-conditions seemed to give a smoother
formalism may be obvious to others; I had to discover it by trying both.
When starting from the desired post-condition seemed more convenient,
that settled the matter in my mind, as it also seemed to do more justice to
the fact that programming is a goal-directed activity.
The decision to concentrate on just pre-conditions rather than liberal
pre-conditions took longer. I wished to do so, because as long as predicate
transformers deriving weakest liberal pre-conditions are the only carrier for
our definition of the semantics, we shall never be able to guarantee termina-
tion: such a system seemed too weak to be attractive. The matter was settled
by the possibility of defining the wp(DO, R) in terms of the wp(IF, R).
The decision to incorporate nondeterminacy was only taken gradually.
After the analogy between synchronizing conditions in multiprogramming
and the sequencing conditions in sequential programming had suggested the
guarded command sets and had prepared me for the inclusion of nondeter-
minacy in sequential programs as well, my growing dislike for the asymme-
tric "if B then SJ else S2 fi", which treats S2 as the default -and defaults I
have learned to mistrust- did the rest. The symmetry and elegance of
if x > y --> m: = x a y > x- m: = y fi
and the fact that I could derive this program systematically settled this
question.
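In more conventional notation the nondeterminate selection can be mimicked
as follows; this is a minimal sketch of mine in Python, not the book's
notation, in which the daemon is played by a pseudo-random choice among the
alternatives whose guards are true:

    import random

    def guarded_if(alternatives):
        # alternatives: a list of (guard, action) pairs, where each guard is
        # a boolean and each action a parameterless function.
        enabled = [action for guard, action in alternatives if guard]
        if not enabled:
            raise RuntimeError("abort: no guard is true")  # IF aborts here
        random.choice(enabled)()  # the erratic daemon picks an enabled one

    # the maximum program: if x >= y -> m := x [] y >= x -> m := y fi
    x, y = 3, 3
    state = {}
    guarded_if([(x >= y, lambda: state.update(m=x)),
                (y >= x, lambda: state.update(m=y))])
    print(state["m"])  # prints 3, whichever alternative was chosen

With x = y both guards hold and either assignment may be selected; with no
guard true the construct aborts, as it should.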
For one day -and this was a direct consequence of my experience with
multiprogramming, where "individual starvation" is usually to be avoided-
I thought it wise to postulate that the daemon should select "in fair random
order", i.e. without permanent neglect of one of the permissible alternatives.
This fair random order was postulated at the stage when I had only given an
operational description of how I thought to implement the repetitive con-
struct. The next day, when I considered a formal definition of its semantics,
I saw my mistake and the daemon was declared to be totally erratic.
In short, of course after the necessary exploratory experiments, ques-
tions (1) through (4) have mainly been settled by the same yardstick: formal
simplicity.
My interest in formal correctness proofs was, and mainly still is, a derived
one. I had witnessed many discussions about programming languages and
programming style that were depressingly inconclusive. The cause of the
difficulty in coming to a consensus was the absence of a few effective yardsticks
in whose relevance we could all believe. (Too much we tried to settle in the
name of convenience for the user, but too often we confused "convenient"
with "conventional", and that latter criterion is too much dependent on each
person's own past.) During that muddle, the suggestion that what we called
"elegant" was nearly always what admitted a nice, short proof came as a
gift from heaven; it was immediately accepted as a reasonable hypothesis
and its effectiveness made it into a cherished criterion. And, above all, length
of a formal proof is an objective criterion: this objectivity has probably been
more effective in reaching a comfortable consensus than anything else, cer-
tainly more effective than eloquence could ever have been. The primary
interest was not in formal correctness proofs, but in a discipline that would
assist us in keeping our programs intelligible, understandable, and intellec-
tually manageable.
I have dealt with the examples in different degrees of formality. This
variation was intended, as I would not like to give my readers the impression
that a certain, fixed degree of formality is "the right one". I prefer to view
formal methods as tools, the use of which might be helpful.
I have tried to present programming rather as a discipline than as a
craft. For centuries we have known two main techniques for transmitting
knowledge and skills to the next generation. The one technique is characteristic of
the guilds: the young apprentice works for seven years with a master, all
knowledge is transferred implicitly, the apprentice absorbs, by osmosis so
to speak, until he may call himself a master too. (This implicit transfer makes
the knowledge vulnerable: old crafts have been lost!) The other technique
has been promoted by the universities, whose rise coincided (not accidentally!)
with the rise of the printing press; here we try to formulate our knowledge
and, by doing so, try to bring it into the public domain. (Our actual teaching
at the universities often occupies an in-between position: in mathematics,
for instance, mathematical results are published and taught quite explicitly,
the teaching of how to do mathematics is often largely left to osmosis,
not necessarily because we are unwilling to be more explicit, but because we
feel ourselves unable to teach the "how" above the level of motherhood
statements.)
While dealing with the examples I have been as explicit as I could
(although, of course, I have not always been able to buffer the shock of
invention); the examples were no more than a vehicle for that goal of explicit-
ness.
We have formulated a number of theorems about alternative and repeti-
tive constructs. That was the easy part, as it concerns knowledge. With the
aid of examples we have tried to show how a conscious effort to apply this
knowledge can assist the programming process, and that was the hard part,
for it concerns skill. (I am thinking, for instance, of the way in which the
knowledge of the Linear Search Theorem assisted us in solving the problem
of the next permutation.) We have tried to make a few strategies explicit,
such as the Search for the Small Superset, and a few techniques for "massag-
ing" programs, such as bringing a relation outside a repetitive construct. But
these are techniques that are rather closely tied to (our form of) programming.
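To recall the first of those examples in executable form: the sketch below
is mine, in Python, and is not the program as it was derived in the chapter
concerned; the two searches from the right are the places where the Linear
Search Theorem does its work.

    def next_permutation(a):
        # transform the list a into its lexicographic successor; return
        # False if a is the last permutation (non-increasing throughout).
        n = len(a)
        i = n - 2
        # linear search from the right for the first i with a[i] < a[i+1];
        # by the Linear Search Theorem this is the largest such i.
        while i >= 0 and a[i] >= a[i + 1]:
            i -= 1
        if i < 0:
            return False
        # linear search from the right for the first element exceeding a[i]
        j = n - 1
        while a[j] <= a[i]:
            j -= 1
        a[i], a[j] = a[j], a[i]
        a[i + 1:] = reversed(a[i + 1:])  # the tail was non-increasing
        return True

    perm = [1, 3, 2]
    next_permutation(perm)
    print(perm)  # [2, 1, 3]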
Between the lines the reader may have caught a few more general mes-
sages. The first message is that it does not suffice to design a mechanism of
which we hope that it will meet its requirements, but that we must design it
in such a form that we can convince ourselves -and anyone else for that
matter- that it will, indeed, meet its requirements. And, therefore, instead
of first designing the program and then trying to prove its correctness, we
develop correctness proof and program hand in hand. (In actual fact, the
correctness proof is developed slightly ahead of the program: after having
chosen the form of the correctness proof we make the program so that it
satisfies the proof's requirements.) This, when carried out successfully,
implies that the design remains "intellectually manageable". The second
message is that, if this constructive approach to the problem of program
correctness is to be our plan, we had better see to it that the intellectual
labour involved does not exceed our limited powers, and quite a few design
decisions fell under that heading. In the problem of the Dutch national flag,
for instance, we have been warned against the case analysis in which the number
of cases to be distinguished between is built up multiplicatively: as soon as
we admit that, we are quickly faced with a case analysis exceeding our abilities.
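By way of illustration: the sketch below is mine, in Python, one of several
ways to maintain such an invariant, and not a transcription of the program
derived in the chapter concerned. The case analysis stays additive, three
cases, because at each step only the colour of a single inspected pebble has
to be distinguished.

    def dutch_flag(a):
        # invariant: a[:r] is red, a[r:w] is white, a[w:b] is uninspected,
        # and a[b:] is blue; each step inspects one pebble in a[w:b].
        r, w, b = 0, 0, len(a)
        while w < b:
            if a[w] == 'red':
                a[r], a[w] = a[w], a[r]
                r += 1
                w += 1
            elif a[w] == 'white':
                w += 1
            else:  # 'blue'
                b -= 1
                a[w], a[b] = a[b], a[w]
        return a

    print(dutch_flag(['white', 'blue', 'red', 'white', 'red']))
    # ['red', 'red', 'white', 'white', 'blue']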
In the problem of the shortest subspanning tree, we have seen how a restric-
tion of the class of admissible intermediate states (here, the "red" branches
always forming a tree) could simplify the analysis considerably. But most
helpful of all -it can be regarded as a separation of concerns- has been the
stepwise approach, in which we try to deal with our various objectives one
after the other. In the problem of the shortest subspanning tree we found,
by the time that we started to worry about computation time, the N²-algorithm
as an improvement of the N³-algorithm. In the problem of the maximal
strong components, we first found an algorithm linear in the number of edges,
and only the next refinement guaranteed a fixed maximum amount of process-
ing per vertex as well. In the problem of the most isolated villages, our crude
solution was independently subjected to two very different optimizations,
and, after they had been established, it was not difficult to combine them.
One may finally ask whether this book will teach its readers to think
effectively. If I answer "No", one may well ask why I have written it in the
first place; if I answer "Yes" to this question, I would make a fool of myself,
and the only answer left to me is "Up to a point ...". It seems vain to hope
-to put it mildly- that a book
could be written that we could give to young people, saying "Read this, and
afterwards you will be able to think effectively", and replacing the book by
a beautiful, interactive system for Computer-Aided Instruction ("CAI" for
the initiated) will not make this hope less vain.
But insofar as people try to understand (at first subconsciously), strive
after clarity, and attempt to avoid unmastered complexity, I believe in the
possibility of assisting them significantly by making them aware of the
human inability "to talk of many things" (at any one moment, at least), by
making them alert to how complexity is introduced. To the extent that a
professor of music at a conservatoire can assist his students in becoming
familiar with the patterns of harmony and rhythm, and with how they
combine, it must be possible to assist students in becoming sensitive to
patterns of reasoning and to how they combine. The analogy is not far-fetched
at all: a clear argument can make one catch one's breath, like a Mozart
adagio can.