
Prentice-Hall Series in Automatic Computation

a discipline of programming

edsger w. dijkstra
"For a long time I have wanted to write a
book somewhat along the lines of this one: on
the one hand I knew that programs could have
a compelling and deep logical beauty, on the
other hand I was forced to admit that most
programs are presented in a way fit for mechan-
ical execution but, even if of any beauty at all,
totally unfit for human appreciation."
A DISCIPLINE
OF PROGRAMMING

EDSGER W. DIJKSTRA
Burroughs Research Fellow,
Professor Extraordinarius,
Technological University, Eindhoven

PRENTICE-HALL, INC.

ENGLEWOOD CLIFFS, N.J.


Library of Congress Cataloging in Publication Data
Dijkstra, Edsger Wybe.
A discipline of programming.
1. Electronic digital computers-Programming.
I. Title.
QA76.6.D54 001.6'42 75-40478
ISBN 0-13-215871-X

© 1976 by Prentice-Hall, Inc.


Englewood Cliffs, New Jersey

All rights reserved. No part of this book
may be reproduced in any form or by any means
without permission in writing
from the publisher.

Printed in the United States of America

PRENTICE-HALL INTERNATIONAL, INC., London
PRENTICE-HALL OF AUSTRALIA PTY. LIMITED, Sydney
PRENTICE-HALL OF CANADA, LTD., Toronto
PRENTICE-HALL OF INDIA PRIVATE LIMITED, New Delhi
PRENTICE-HALL OF JAPAN, INC., Tokyo
PRENTICE-HALL OF SOUTHEAST ASIA PTE. LTD., Singapore

CONTENTS

FOREWORD ix

PREFACE xiii

0 EXECUTIONAL ABSTRACTION 1
1 THE ROLE OF PROGRAMMING LANGUAGES 7
2 STATES AND THEIR CHARACTERIZATION 10
3 THE CHARACTERIZATION OF SEMANTICS 15
4 THE SEMANTIC CHARACTERIZATION OF A PROGRAMMING LANGUAGE 24
5 TWO THEOREMS 37
6 ON THE DESIGN OF PROPERLY TERMINATING CONSTRUCTS 41
7 EUCLID'S ALGORITHM REVISITED 45
8 THE FORMAL TREATMENT OF SOME SMALL EXAMPLES 51
9 ON NONDETERMINACY BEING BOUNDED 72
10 AN ESSAY ON THE NOTION: "THE SCOPE OF VARIABLES" 79
11 ARRAY VARIABLES 94
12 THE LINEAR SEARCH THEOREM 105
13 THE PROBLEM OF THE NEXT PERMUTATION 107
14 THE PROBLEM OF THE DUTCH NATIONAL FLAG 111
15 UPDATING A SEQUENTIAL FILE 117
16 MERGING PROBLEMS REVISITED 123
17 AN EXERCISE ATTRIBUTED TO R. W. HAMMING 129
18 THE PATTERN MATCHING PROBLEM 135
19 WRITING A NUMBER AS THE SUM OF TWO SQUARES 140
20 THE PROBLEM OF THE SMALLEST PRIME FACTOR OF A LARGE NUMBER 143
21 THE PROBLEM OF THE MOST ISOLATED VILLAGES 149
22 THE PROBLEM OF THE SHORTEST SUBSPANNING TREE 154
23 REM'S ALGORITHM FOR THE RECORDING OF EQUIVALENCE CLASSES 161
24 THE PROBLEM OF THE CONVEX HULL IN THREE DIMENSIONS 168
25 FINDING THE MAXIMAL STRONG COMPONENTS IN A DIRECTED GRAPH 192
26 ON MANUALS AND IMPLEMENTATIONS 201
27 IN RETROSPECT 209

FOREWORD

In the older intellectual disciplines of poetry, music, art, and science, histo-
rians pay tribute to those outstanding practitioners, whose achievements have
widened the experience and understanding of their admirers, and have
inspired and enhanced the talents of their imitators. Their innovations are
based on superb skill in the practice of their craft, combined with an acute
insight into the underlying principles. In many cases, their influence is en-
hanced by their breadth of culture and the power and lucidity of their expres-
sion.
This book expounds, in its author's usual cultured style, his radical new
insights into the nature of computer programming. From these insights, he
has developed a new range of programming methods and notational tools,
which are displayed and tested in a host of elegant and efficient examples.
This will surely be recognised as one of the outstanding achievements in the
development of the intellectual discipline of computer programming.

C.A.R. HOARE

PREFACE

For a long time I have wanted to write a book somewhat along the lines of
this one: on the one hand I knew that programs could have a compelling and
deep logical beauty, on the other hand I was forced to admit that most pro-
grams are presented in a way fit for mechanical execution but, even if of any
beauty at all, totally unfit for human appreciation. A second reason for dis-
satisfaction was that algorithms are often published in the form of finished
products, while the majority of the considerations that had played their role
during the design process and should justify the eventual shape of the finished
program were often hardly mentioned. My original idea was to publish a
number of beautiful algorithms in such a way that the reader could appreciate
their beauty, and I envisaged doing so by describing the -real or imagined-
design process that would each time lead to the program concerned. I have
remained true to my original intention in the sense that the long sequence of
chapters, in each of which a new problem is tackled and solved, is still the
core of this monograph; on the other hand the final book is quite different
from what I had foreseen, for the self-imposed task to present these solutions
in a natural and convincing manner has been responsible for so much more,
that I shall remain grateful forever for having undertaken it.
When starting on a book like this, one is immediately faced with the
question: "Which programming language am I going to use?'', and this is not
a mere question of presentation! A most important, but also a most elusive,
aspect of any tool is its influence on the habits of those who train themselves
in its use. If the tool is a programming language, this influence is -whether
we like it or not- an influence on our thinking habits. Having analyzed that
influence to the best of my knowledge, I had come to the conclusion that
none of the existing programming languages, nor a subset of them, would
suit my purpose; on the other hand I knew myself so unready for the design
of a new programming language that I had taken a vow not to do so for the
next five years, and I had a most distinct feeling that that period had not
yet elapsed! (Prior to that, among many other things, this monograph had to
be written.) I have tried to resolve this conflict by only designing a mini-
language suitable for my purposes, by making only those commitments
that seemed unavoidable and sufficiently justified.
This hesitation and self-imposed restriction, when ill-understood, may
make this monograph disappointing for many of its potential readers. It will
certainly leave all those dissatisfied who identify the difficulty of program-
ming with the difficulty of cunning exploitation of the elaborate and baroque
tools known as "higher level programming languages" or -worse!- "pro-
gramming systems". When they feel cheated because I just ignore all those
bells and whistles, I can only answer: "Are you quite sure that all those bells
and whistles, all those wonderful facilities of your so-called "powerful" pro-
gramming languages belong to the solution set rather than to the problem
set?". I can only hope that, in spite of my usage of a mini-language, they will
study my text; after having done so, they may agree that, even without the
bells and the whistles, so rich a subject remains that it is questionable whether
the majority of the bells and the whistles should have been introduced in the
first place. And to all readers with a pronounced interest in the design of pro-
gramming languages, I can only express my regret that, as yet, I do not feel
able to be much more explicit on that subject; on the other hand I hope that,
for the time being, this monograph will inspire them and will enable them
to avoid some of the mistakes they might have made without having read it.

During the act of writing -which was a continuous source of surprise
and excitement- a text emerged that was rather different from what I had
originally in mind. I started with the (understandable) intention to present
my program developments with a little bit more formal apparatus than I used
to use in my (introductory) lectures, in which semantics used to be introduced
intuitively and correctness demonstrations were the usual mixture of rigorous
arguments, handwaving, and eloquence. In laying the necessary foundations
for such a more formal approach, I had two surprises. The first surprise was
that the so-called "predicate transformers" that I had chosen as my vehicle
provided a means for directly defining a relation between initial and final
state, without any reference to intermediate states as may occur during pro-
gram execution. I was very grateful for that, as it affords a clear separation
between two of the programmer's major concerns, the mathematical correct-
ness concerns (viz. whether the program defines the proper relation between
initial and final state-and the predicate transformers give us a formal tool
for that investigation without bringing computational processes into the pic-
ture) and the engineering concerns about efficiency (of which it is now clear
that they are only defined in relation to an implementation). It turned out to
be a most helpful discovery that the same program text always admits two
rather complementary interpretations, the interpretation as a code for a
predicate transformer, which seems the more suitable one for us, versus the
interpretation as executable code, an interpretation I prefer to leave to the
machines! The second surprise was that the most natural and systematic
"codes for predicate transformers" that I could think of would call for non-
deterministic implementations when regarded as "executable code". For a
while I shuddered at the thought of introducing nondeterminacy already in
uniprogramming (the complications it has caused in multiprogramming were
only too well known to me!), until I realized that the text interpretation as
code for a predicate transformer has its own, independent right of existence.
(And in retrospect we may observe that many of the problems multiprogram-
ming has posed in the past are nothing else but the consequence of a prior
tendency to attach undue significance to determinacy.) Eventually I came to
regard nondeterminacy as the normal situation, determinacy being reduced
to a -not even very interesting- special case.
After having laid the foundations, I started with what I had intended to
do all the time, viz. solve a long sequence of problems. To do so was an
unexpected pleasure. I experienced that the formal apparatus gave me a much
firmer grip on what I was doing than I was used to; I had the pleasure of
discovering that explicit concerns about termination can be of great heuristic
value-to the extent that I came to regret the strong bias towards partial
correctness that is still so common. The greatest pleasure, however, was that
for the majority of the problems that I had solved before, this time I ended
up with a more beautiful solution! This was very encouraging, for I took it
as an indication that the methods developed had, indeed, improved my pro-
gramming ability.

How should this monograph be studied? The best advice I can give is to
stop reading as soon as a problem has been described and to try to solve it
yourself before reading on. Trying to solve the problem on your own seems
the only way in which you can assess how difficult the problem is; it gives
you the opportunity to compare your own solution with mine; and it may
give you the satisfaction of having discovered yourself a solution that is
superior to mine. And, by way of a priori reassurance: be not depressed when
you find the text far from easy reading! Those who have studied the manu-
script found it quite often difficult (but equally rewarding!); each time, how-
ever, that we analyzed their difficulties, we came together to the conclusion
that not the text (i.e. the way of presentation), but the subject matter itself
was "to blame". The moral of the story can only be that a nontrivial algorithm
is just nontrivial, and that its final description in a programming language is
highly compact compared to the considerations that justify its design: the
shortness of the final text should not mislead us! One of my assistants made
the suggestion -which I faithfully transmit, as it could be a valuable one-
that little groups of students should study it together. (Here I must add a
parenthetical remark about the "difficulty" of the text. After having devoted
a considerable number of years of my scientific life to clarifying the pro-
grammer's task, with the aim of making it intellectually better manageable,
I found this effort at clarification to my amazement (and annoyance) repeat-
edly rewarded by the accusation that "I had made programming difficult".
But the difficulty has always been there, and only by making it visible can we
hope to become able to design programs with a high confidence level, rather
than "smearing code", i.e., producing texts with the status of hardly sup-
ported conjectures that wait to be killed by the first counterexample. None
of the programs in this monograph, needless to say, has been tested on a
machine.)

I owe the reader an explanation why I have kept the mini-language so
small that it does not even contain procedures and recursion. As each next
language extension would have added a few more chapters to the book and,
therefore, would have made it correspondingly more expensive, the absence
of most possible extensions (such as, for instance, multiprogramming) needs
no further justification. Procedures, however, have always occupied such a
central position and recursion has been for computing science so much the
hallmark of academic respectability, that some explanation is due.
First of all, this monograph has not been written for the novice and, con-
sequently, I expect my readers to be familiar with these concepts. Secondly,
this book is not an introductory text on a specific programming language
and the absence of these constructs and examples of their use should there-
fore not be interpreted as my inability or unwillingness to use them, nor as
a suggestion that anyone else who can use them well should not do so. The
point is that I felt no need for them in order to get my message across, viz.
how a carefully chosen separation of concerns is essential for the design of,
in all respects, high-quality programs: the modest tools of the mini-language
gave us already more than enough latitude for nontrivial, yet very satisfactory
designs.
The above explanation, although sufficient, is, however, not the full story.
In any case I felt obliged to present repetition as a construct in its own right,
as such a presentation seemed to me overdue. When programming languages
emerged, the "dynamic" nature of the assignment statement did not seem to
fit too well into the "static" nature of traditional mathematics. For lack of
adequate theory mathematicians did not feel too easy about it, and, because
it is the repetitive construct that creates the need for assignment to variables,
mathematicians did not feel too easy about repetition either. When pro-
gramming languages without assignments and without repetition -such as
pure LISP- were developed, many felt greatly relieved. They were back on
the familiar grounds and saw a glimmer of hope of making programming an
activity with a firm and respectable mathematical basis. (Up to this very day
there is among the more theoretically inclined computing scientists still a
widespread feeling that recursive programs "come more naturally" than repe-
titive ones.)
For the alternative way out, viz. providing the couple "repetition" and
"assignment to a variable" with a sound and workable mathematical basis,
we had to wait another ten years. The outcome, as is demonstrated in this
monograph, has been that the semantics of a repetitive construct can be
defined in terms of a recurrence relation between predicates, whereas the
semantic definition of general recursion requires a recurrence relation be-
tween predicate transformers. This shows quite clearly why I regard general
recursion as an order of magnitude more complicated than just repetition,
and it therefore hurts me to see the semantics of the repetitive construct
"while B do S"
defined as that of the call
"whiledo(B, S)"
of the recursive procedure (described in ALGOL 60 syntax):
procedure whiledo (condition, statement);
begin if condition then begin statement;
whiledo (condition, statement) end
end
Although correct, it hurts me, for I don't like to crack an egg with a
sledgehammer, no matter how effective the sledgehammer is for doing so.
For the generation of theoretical computing scientists that became involved
in the subject during the sixties, the above recursive definition is often not
only "the natural one'', but even "the true one". In view of the fact that we
cannot even define what a Turing machine is supposed to do without appeal-
ing to the notion of repetition, some redressing of the balance seemed indi-
cated.
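
To give the flavour of such a direct definition -an illustration of mine that
anticipates the wp-notation of the following chapters, not a quotation from
the book- the recurrence relation between predicates for "while B do S" can
be sketched as follows, where H_k(R) characterizes those initial states from
which the repetition is certain to terminate within k iterations in a final
state satisfying R:

H_0(R) = non B and R
H_(k+1)(R) = (B and wp(S, H_k(R))) or H_0(R)
wp("while B do S", R) = (E k: k >= 0: H_k(R))

Note that this recurrence relates predicates to predicates; the semantics of
the recursive whiledo above, in contrast, would require a recurrence between
predicate transformers.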
For the absence of a bibliography I offer neither explanation nor apology.
Acknowledgements. The following people have had a direct influence on this
book, either by their willingness to discuss its intended contents or by com-
menting on (parts of) the finished manuscript: C. Bron, R.M. Burstall,
W.H.J. Feijen, C.A.R. Hoare, D.E. Knuth, M. Rem, J.C. Reynolds,
D.T. Ross, C.S. Scholten, G. Seegmüller, N. Wirth and M. Woodger. It is
a privilege to be able to express in print my gratitude for their cooperation.
Furthermore I am greatly indebted to Burroughs Corporation for providing
me with the opportunity and necessary facilities, and thankful to my wife for
her unfailing support and encouragement.
EDSGER W. DIJKSTRA
Nuenen,
The Netherlands

0 EXECUTIONAL ABSTRACTION

Executional abstraction is so basic to the whole notion of "an algorithm"
that it is usually taken for granted and left unmentioned. Its purpose is to
map different computations upon each other. Or, to put it in another way, it
refers to the way in which we can get a specific computation within our
intellectual grip by considering it as a member of a large class of different
computations; we can then abstract from the mutual differences between the
members of that class and, based on the definition of the class as a whole,
make assertions applicable to each of its members and therefore also to the
specific computation we wanted to consider.
In order to drive home what we mean by "a computation" let me just
describe a noncomputational mechanism "producing" -intentionally I avoid
the term "computing"- say, the greatest common divisor of I 11 and 259. It
consists of two pieces of cardboard, placed on top of each other. The top one
displays the text "GCD(111, 259) ="; in order to let the mechanism produce
the answer, we pick up the top one and place it to the left of the bottom one,
on which we can now read the text "37".
The simplicity of the cardboard mechanism is a great virtue, but it is
overshadowed by two drawbacks, a minor one and a major one. The minor
one is that the mechanism can, indeed, be used for producing the greatest
common divisor of 111 and 259, but for very little else. The major drawback,
however, is that, no matter how carefully we inspect the construction of the
mechanism, our confidence that it produces the correct answer can only be
based on our faith in the manufacturer: he may have made an error, either in
the design of his machine or in the production of our particular copy.
In order to overcome our minor objection we could consider on a huge
piece of cardboard a large rectangular array of the grid points with the
integer coordinates x and y, satisfying 0 ≤ x ≤ 500 and 0 ≤ y ≤ 500. For
all the points (x, y) with positive coordinates only, i.e. excluding the points
on the axes, we can write down at that position the value of GCD(x, y); we
propose a two-dimensional table with 250,000 entries. From the point of view
of usefulness, this is a great improvement: instead of a mechanism able to
supply the greatest common divisor for a single pair of numbers, we now
have a "mechanism" able to supply the greatest common divisor for any
pair of the 250,000 different pairs of numbers. Great, but we should not get
too excited, for what we identified as our second drawback -"Why should
we believe that the mechanism produces the correct answer?"- has been
multiplied by that same factor of 250,000: we now have to have a tremendous
faith in the manufacturer!
So let us consider a next mechanism. On the same cardboard with the
grid points, the only numbers written on it are the values 1 through 500
along both axes. Furthermore the following straight lines are drawn:

1. the vertical lines (with the equation x = constant);
2. the horizontal lines (with the equation y = constant);
3. the diagonals (with the equation x + y = constant);
4. the "answer line" with the equation x = y.

In order to operate this machine, we have to follow the following instruc-
tions ("play the game with the following rules"). When we wish to find the
greatest common divisor of two values X and Y, we place a pebble -also
provided by the manufacturer- on the grid point with the coordinates x = X
and y = Y. As long as the pebble is not lying on the "answer line", we
consider the smallest isosceles rectangular triangle with its right angle
coinciding with the pebble and one sharp angle (either under or to the left of
the pebble) on one of the axes. (Because the pebble is not on the answer line,
this smallest triangle will have only one sharp angle on an axis.) The pebble
is then moved to the grid point coinciding with the other sharp angle of the
triangle. The above move is repeated as long as the pebble has not yet arrived
on the answer line. When it has, the x-coordinate (or the y-coordinate) of the
final pebble position is the desired answer.
What is involved when we wish to convince ourselves that this machine
will produce the correct answer? If (x, y) is any of the 249,500 points not on
the answer line and (x', y') is the point to which the pebble will then be moved
by one step of the game, then either x' = x and y' = y - x or x' = x - y
and y' = y. It is not difficult to prove that GCD(x, y) = GCD(x', y'). The
important point here is that the same argument applies equally well to each
of the 249,500 possible steps! Secondly -and again it is not difficult- we can
prove for any point (x, y) where x = y (i.e. such that (x, y) is one of the 500
points on the answer line) that GCD(x, y) = x. Again the important point
is that the same argument is applicable to each of the 500 points of the answer
line. Thirdly -and again this is not difficult- we have to show that for any
initial position (X, Y) a finite number of steps will indeed bring the pebble on
the answer line, and again the important observation is that the same argu-
ment is equally well applicable to any of the 250,000 initial positions (X, Y).
Three simple arguments, whose length is independent of the number of grid
points: that, in a nutshell, shows what mathematics can do for us! Denoting
with (x, y) any of the pebble positions during a game started at position
(X, Y), our first theorem allows us to conclude that during the game the
relation
GCD(x, y) = GCD(X, Y)

will always hold or -as the jargon says- "is kept invariant". The second
theorem then tells us that we may interpret the x-coordinate of the final
pebble position as the desired answer and the third theorem tells us that the
final position exists (i.e. will be reached in a finite number of steps). And this
concludes the analysis of what we could call "our abstract machine".
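
By way of illustration -an addition of mine, not part of the original text-
the rules of the game transcribe directly into a program. The sketch below,
in Python for concreteness, plays the game from an arbitrary initial position
and checks the invariant of the first theorem at every move:

from math import gcd   # used only to check the invariant

def play_gcd_game(X, Y):
    # Play the pebble game: as long as the pebble (x, y) is not on
    # the answer line x = y, replace the larger coordinate by the
    # difference of the two.
    assert X > 0 and Y > 0
    x, y = X, Y                    # initial pebble position
    while x != y:                  # pebble not yet on the answer line
        if x > y:
            x = x - y              # one move of the pebble
        else:
            y = y - x
        assert gcd(x, y) == gcd(X, Y)   # theorem 1: the invariant
    return x                       # theorem 2: on the line, GCD(x, y) = x

print(play_gcd_game(111, 259))     # prints 37

Termination -the third theorem- is visible here as well: each move strictly
decreases x + y, and that cannot go on forever for positive integers.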
Our next duty is to verify that the board as supplied by the manufacturer
is, indeed, a fair model. For this purpose we have to check the numbering
along both axes and we have to check that all the straight lines have been
drawn correctly. This is slightly awkward as we have to investigate a number
of objects proportional to N if N (in our example 500) is the length of the
side of the square, but it is always better than N^2, the number of possible
computations.
An alternative machine would not work with a huge cardboard but with
two nine-bit registers, each capable of storing a binary number between 0
and 500. We could then use one register to store the value of the x-coordinate
and the other to store the value of the y-coordinate as they correspond to
"the current pebble position". A move then corresponds to decreasing the
contents of one register by the contents of the other. We could do the arith-
metic ourselves, but of course it is better if the machine could do that for us.
If we then want to believe the answer, we should be able to convince ourselves
that the machine compares and subtracts correctly. On a smaller scale the
history repeats itself: we derive once and for all, i.e. for any pair of n-digit
binary numbers, the equations for the binary subtractor and then satisfy
ourselves that the physical machine is a fair model of this binary subtractor.
If it is a parallel subtractor, the number of verifications -proportional
to the number of components and their interactions- is proportional to
n = log2 N. In a serial machine the trading of time against equipment is
carried still one step further.

Let me try, at least for my own illumination, to capture the highlights of
our preceding example.

Instead of considering the single problem of how to compute the
GCD(111, 259), we have generalized the problem and have regarded this as
a specific instance of the wider class of problems of how to compute the
GCD(X, Y). It is worthwhile to point out that we could have generalized the
problem of computing GCD(111, 259) in a different way: we could have
regarded the task as a specific instance of a wider class of tasks, such as
the computation of GCD(111, 259), SCM(111, 259), 111 * 259, 111 + 259,
111/259, 111 - 259, 111^259, the day of the week of the 111th day of the 259th
year B.C., etc. This would have given rise to a "111-and-259-processor" and
in order to let that produce the originally desired answer, we should have had
to give the request "GCD, please" as its input! We have proposed a "GCD-
computer" instead, that should be given the number pair "111, 259" as its
input if it is to produce the originally desired answer, and that is a quite differ-
ent machine!
In other words, when asked to produce one or more results, it is usual
to generalize the problem and to consider these results as specific instances
of a wider class. But it is no good just to say that everything is a special
instance of something more general! If we want to follow such an approach
we have two obligations:

1. We have to be quite specific as to how we generalize, i.e. we have to
choose that wider class carefully and to define it explicitly, because our
argument has to apply to that whole class.
2. We have to choose ("invent" if you wish) a generalization that is helpful
to our purpose.

In our example I certainly prefer the "GCD-computer" above the "111-
and-259-processor" and a comparison between the two will give us a hint as
to what characteristics make a generalization "helpful for our purpose". The
machine that upon request can produce as answer the value of all sorts of
funny functions of 111 and 259 becomes harder to verify as the collection of
functions grows. This is in sharp contrast with our "GCD-computer".
The GCD-computer would have been equally bad if it had been a table
with 250,000 entries containing the "ready-made" answers. Its unique feature
is that it could be given in the form of a compact set of "rules of a game"
that, when played according to those rules, will produce the answer.
The tremendous gain is that a single argument applied to these rules
allows us to prove the vital assertions about the outcome of any of the games.
The price to be paid is that in each of the 250,000 specific applications of these
rules, we don't get our answer "immediately": each time the game has to be
played according to the rules!
The fact that we could give such a compact formulation of the rules of
the game such that a single argument allowed us to draw conclusions about
any possible game is intimately tied to the systematic arrangement of the
250,000 grid points. We would have been powerless if the cardboard had
shown a shapeless, chaotic cloud of points without a systematic nomencla-
ture! As things are, however, we could divide our pebble into two half-pebbles
and move one half-pebble downward until it lies on the horizontal axis and
the other half-pebble to the left until it lies on the vertical axis. Instead of
coping with one pebble with 250,000 possible positions, we could also deal
with two half-pebbles with only 500 possible positions each, i.e. only 1000
positions in toto! Our wealth of 250,000 possible states has been built up by
the circumstance that any of the 500 positions of the one half-pebble can be
combined with any of the 500 positions of the other half-pebble: the number
of positions of the undivided pebble equals the product of the number of
positions of the half-pebbles. In the jargon we say that "the total state space
is regarded as the Cartesian product of the state spaces of the variables x
andy".
The freedom to replace one pebble with a two-dimensional freedom of
position by two half-pebbles with a one-dimensional freedom of position is
exploited in the suggested two-register machine. From a technical point of
view this is very attractive; one only needs to build registers able to distin-
guish between 500 different cases ("values") and by just combining two
such registers, the total number of different cases is squared! This multi-
plicative rule enables us to distinguish between a huge number of possible
total states with the aid of a modest number of components with only a
modest number of possible states each. By adding such components the size
of the state space grows exponentially but we should bear in mind that we
may only do so provided that the argument justifying our whole contraption
remains very compact; by the time that the argument grows exponentially
as well, there is no point in designing the machine at all!
Note. A perfect illustration of the above can be found in an invention
which is now more than ten centuries old: the decimal number system!
This has indeed the fascinating property that the number of digits needed
only grows proportional to the logarithm of the largest number to be
represented. The binary number system is what you get when you ignore
that each hand has five fingers. (End of note.)
In the above we have dealt with one aspect of multitude, viz. the great
number of pebble positions ( = possible states). There is an analogous multi-
plicity, viz. the large number of different games ( = computations) that can
be played according to our rules of the game: one game for each initial posi-
tion to be exact. Our rules of the game are very general in the sense that they
are applicable to any initial position. But we have insisted upon a compact
justification for the rules of the game and this implies that the rules of the
game themselves have to be compact. In our example this has been achieved
by the following device: instead of enumerating "do this, do that" we have
given the rules of the game in terms of the rules for performing "a step"
together with a criterion whether "the step" has to be performed another
time. (As a matter of fact, the step has to be repeated until a state has been
reached in which the step is undefined.) In other words, even a single game is
allowed to be generated by repeatedly applying the same "sub-rule".
This is a very powerful device. An algorithm embodies the design of the
class of computations that may take place under control of it; thanks to the
conditional repetition of "a step" the computations from such a class may
greatly differ in length. It explains how a short algorithm can keep a machine
busy for a considerable period of time. Alternatively we may see it as a first
hint as to why we might need extremely fast machines.
It is a fascinating thought that this chapter could have been written while
Euclid was looking over my shoulder.

1 THE ROLE OF PROGRAMMING LANGUAGES

In the chapter "Executional Abstraction" I have given an informal
description of various "machines" designed to compute the greatest common
divisor of two positive (and not too large) integers. One was in terms of a
pebble moving over a cardboard, another was in terms of two half-pebbles,
each moving along the axes, and the last one was in terms of two registers,
each capable of holding an integer value (up to a certain bound). Physically,
these three "machines" are very different; mathematically, however, they are
very similar: the major part of the argument that they are capable of com-
puting the greatest common divisor is the same for all three of them. This is
because they are only different embodiments of the same set of "rules of the
game" and it is really this set of rules that constitute the core of the inven-
tion, the invention which is known as "Euclid's algorithm".
In the previous chapter Euclid's algorithm was described verbally in a
rather informal way. Yet it was argued that the number of possible computa-
tions corresponding to it was so large that we needed a proof of its correct-
ness. As long as an algorithm is only given informally, it is not a very proper
object for a formal treatment. For the latter we need a description of the
algorithm in some suitable formal notation.
The possible advantages of such a formal notation technique are numer-
ous. Any notation technique implies that whatever is described by it is pre-
sented as a specific member of the (often infinite) class of objects describable
by it. Our notation technique has, of course, to cater for an elegant and concise
description of Euclid's algorithm, but once that has been achieved it will
indeed have been presented as a member of a huge class of all sorts of algo-
rithms. And in the description of some of these other algorithms we may
expect to find the more interesting applications of our notation technique.

In the case of Euclid's algorithm, one can argue that it is so simple that we
can get away with an informal description of it. The power of a formal
notation should manifest itself in the achievements we could never do without
it!
The second advantage of a formal notation technique is that it enables
us to study algorithms as mathematical objects; the formal description of the
algorithm then provides the handle for our intellectual grip. It will enable
us to prove theorems about classes of algorithms, for instance, because their
descriptions share some structural property.
Finally, such a notation technique should enable us to define algorithms
so unambiguously that, given an algorithm described by it and given the
values for the arguments (the input), there should be no doubt or uncertainty
as to what the corresponding answers (the output) should be. It is then con-
ceivable that the computation is carried out by an automaton that, given
(the formal description of) the algorithm and the arguments, will produce
the answers without further human intervention. Such automata, able to
carry out the mutual confrontation of algorithm and argument with each
other, have indeed been built. They are called "automatic computers".
Algorithms intended for automatic execution by computers are called "pro-
grams" and since the late fifties the formal techniques used for program
notation are called "programming languages". (The introduction of the term
"language" in connection with notation techniques for programs has been a
mixed blessing. On the one hand it has been very helpful in as far as existing
linguistic theory now provided a natural framework and an established ter-
minology ("grammar", "syntax", "semantics", etc.) for discussion. On the
other hand we must observe that the analogy with (now so-called!) "natural
languages" has also been very misleading, because natural languages, non-
formalized as they are, derive both their weakness and their power from their
vagueness and imprecision.)
Historically speaking, this last aspect, viz. the fact that programming
languages could be used as a vehicle for instructing existing automatic compu-
ters, has for a long time been regarded as their most important property. The
efficiency with which existing automatic computers could execute programs
written in a certain language became the major quality criterion for that
language! As a regrettable result, it is not unusual to find anomalies in
existing machines truthfully reflected in programming languages, this at the
expense of the intellectual manageability of the programs expressed in such
a language (as if programming without such anomalies was not already diffi-
cult enough!). In our approach we shall try to redress the balance, and we
shall do so by regarding the fact that our algorithms could actually be carried
out by a computer as a lucky accidental circumstance that need not occupy a
central position in our considerations. (In a recent educational text addressed
to the PL/I programmer one can find the strong advice to avoid procedure
calls as much as possible "because they make the program so inefficient".
In view of the fact that the procedure is one of PL/l's main vehicles for
expressing structure, this is terrible advice, so terrible that I can hardly call
the text in question "educational". If you are convinced of the usefulness
of the procedure concept and are surrounded by implementations in which
the overhead of the procedure mechanism imposes too great a penalty, then
blame these inadequate implementations instead of raising them to the level
of standards! The balance, indeed, needs redressing!)
I view a programming language primarily as a vehicle for the description
of (potentially highly sophisticated) abstract mechanisms. As shown in
the chapter "Executional Abstraction", the algorithm's outstanding virtue is
the potential compactness of the arguments on which our confidence in the
mechanism can be based. Once this compactness is lost, the algorithm has
lost much of its "right of existence" and therefore we shall consciously aim
at retaining that compactness. Also our choice of programming language
shall be aimed at that goal.

2 STATES AND THEIR CHARACTERIZATION

For many centuries man has characterized natural numbers. I imagine that
in prehistoric times, when the notion of "a number" dawned upon our ances-
tors, they invented individual names for each number they found they wanted
to refer to; they must have had names for numbers in very much the same
way as we have the names "one, two, three, four, etcetera."
They are truly "names" in the sense that by inspecting the sequence
"one, two, three" no rule enables us to derive that the next one will be "four".
You really must know that. (At an age that I knew perfectly well how to count
-in Dutch- I had to learn how to count in English, and during a school
test no clever inspection of the words "seven" and "nine" would enable me
to derive how to spell "eight", let alone how to pronounce it!)
It is obvious that such a nonsystematic nomenclature enables us to
distinguish only between a very limited number of different numbers; in order
to overcome that limitation each language in the civilized world has intro-
duced a (more or less) systematic nomenclature for the natural numbers and
learning to count is mainly discovering the system underlying the nomencla-
ture. When a child has learned how to count up to a thousand, he has not
learned those thousand names (in order!) by heart, he knows the rules:
there comes the moment that the child has discovered how to go from any number
to the next and therefore also from "four hundred and thirty-nine" to "four
hundred and forty".
The ease of manipulation with numbers is greatly dependent on the
nomenclature we have chosen for them. It is much harder to establish that
twelve times a dozen = a gross
eleven plus twelve = twenty-three
XLVII + IV = LI
than it is to establish that
12 * 12 = 144
11 + 12 = 23
47 + 4 = 51
because the latter three answers can be generated by a simple set of rules
that any eight-year-old child can master.
In mechanical manipulation of numbers, the advantages of the decimal
number system are much more pronounced. For centuries already, we have
had mechanical adding machines, displaying the answer in a window behind
which there are a number of wheels with ten different positions, each wheel
showing one decimal digit in each of its positions. (It is no longer a problem
to display "00000019", to add 4, and then to display "00000023"; it would
be a problem -at least by purely mechanical means- to display "nineteen"
and "twenty-three" instead!)
The essential thing about such a wheel is that it has ten different, stable
positions. In the jargon this is expressed in a variety of ways. For instance,
the wheel is called "a ten-valued variable" and if we want to be more explicit
we even enumerate the values: from 0 through 9. Here each "position" of the
wheel is identified with a "value" of the variable. The wheel is called "a
variable" because, although the positions are stable, the wheel can be turned
into another position: the "value" can be changed. (This term is, I am sorry
to say, rather misleading in more than one respect. Firstly, such a wheel
which is (almost) always in one of its ten positions and therefore (almost)
always represents a "value", is a concept widely different from what mathe-
maticians call a "variable", because, as a rule, a mathematical variable
represents no specific value at all; if we say that for each whole number n
the assertion n^2 ≥ 0 is true, then this n is a variable of quite a different nature.
Secondly, in our context we use the term "variable" for something existing
in time, whose value, unless something is done about it, remains constant!
The term "changeable constant" would have been better, but we shall not
introduce it and shall stick to the by now firmly established tradition.)
Another way in which the jargon tries to capture the essentials of such a
wheel that is (almost) always in one of ten different positions or "states" is
to associate with the wheel "a state space of ten points". Here each state
(position) is associated with "a point" and the collection of these "points"
is called -and this is in accordance with mathematical tradition- a "space"
or if we want to be more specific "a state space". Instead of saying that the
variable has (almost) always one of its possible values one can now express
this by saying that the system consisting of this single variable is (almost)
always in one of the points of its state space. The state space describes the
amount of freedom of the system; it just has nowhere else to go.

So much for a single wheel. Let us now turn our attention to a register
with eight of such wheels in a row. Because each of these eight wheels is in
one of ten different states, this register, considered as a whole, is in one of
100,000,000 possible, different states, each of which is suitably identified by
the number (or rather by the row of eight digits) displayed through the
window.
If the state for each of the wheels is given, then the state of the register as
a whole is uniquely determined; conversely, from each state of the register
as a whole, the state of each individual wheel is determined uniquely. In this
case we say (in an earlier chapter we have already used the term) that we
get (or build) the state space of the register as a whole by forming the
"Cartesian product" of the state spaces of the eight individual wheels. The
total number of points in that state space is the product of the number of
points in the state spaces from which it has been built (that is why it is
called the Cartesian product).
Whether such a register is considered as a single variable with 10^8 different
possible values, or as a composite variable composed out of eight different
ten-valued variables called "wheels" depends on our interest in the thing. If
we are only interested in the value displayed, we shall probably regard the
register as an unanalyzed entity, whereas the maintenance engineer who has
to replace a wheel with a worn tooth will certainly regard the register as
a composite object.
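
As an aside -an illustration of mine, not the author's- the multiplicative
rule is easily made concrete. In the Python sketch below (the names are of
course my own choice) two ten-valued "wheels" are combined with
itertools.product, and the number of points of the combined state space is
the product of the numbers of points of its components:

from itertools import product

wheel = range(10)                  # one wheel: ten stable positions, 0..9

# The register's state space is the Cartesian product of its wheels;
# for two wheels that gives 10 * 10 = 100 points.
two_wheel_register = list(product(wheel, wheel))
assert len(two_wheel_register) == 10 * 10

# An eight-wheel register has 10**8 points; we need not enumerate
# them to know their number: that is the multiplicative rule.
print(10 ** 8)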
We have seen another example of building up a state space as the Car-
tesian product of much smaller state spaces when we discussed Euclid's
algorithm and observed that the position of the pebble somewhere on the
board could equally well be identified by two half-pebbles, each somewhere
on an axis, that is, by the combination (or more precisely, an ordered pair)
of two variables "x" and "y". (The idea of identifying the position of a point
in a plane by the values of its x- and y-coordinates comes from Descartes when
he developed the analytical geometry, and the Cartesian product is named
that way in honour of him.) The pebble on the board has been introduced as
a visualization of the fact that an evolving computational process -such as
the execution of Euclid's algorithm- can be viewed as the system travelling
through its state space. In accordance with this metaphor, the initial state
is also referred to as "the starting point".
In this book we shall mainly, or perhaps even exclusively, occupy our-
selves with systems whose state space will eventually be regarded as being
built up as a Cartesian product. This is certainly not to be interpreted as
my suggesting that state spaces built by forming Cartesian products are the
one and final answer to all our problems, for I know only too well that this
is not true. As we proceed it will become apparent why they are worthy of
so much of our attention and, simultaneously, why the concept plays such
a central role in many programming languages.

Before proceeding I mention one problem that we shall have to face.
When we construct a state space by forming a Cartesian product, it is by
no means certain that we shall have good use for all its points. The usual
example to illustrate this is characterizing the days of a given year by a pair
(month, day), where "month" is a 12-valued variable (running from "Jan"
through "Dec") and "day" a 31-valued variable (ranging from "l" through
"31"). We have then created a state space with 372 points, while no year has
more than 366 days. What do we do with, say, (Jun, 31)? Either we disallow
it, thereby catering for "impossible dates" and thus enabling in a sense the
system to contradict itself, or we allow it as an alternative name for one of
the "true" days, e.g. equating it to (Jul, 1). The phenomenon of "unused
points of the state space" is bound to arise whenever the number of different
possible values between which we want to distinguish happens to be a prime
number.
The nomenclature that is automatically introduced when we form a
state space as a Cartesian product enables us to identify a single point; for
instance, I can now state that my birthday is (May, 11). Thanks to Descartes,
however, we now know of another way of stating this fact: I have my birth-
day on the date (month, day) whenever they are a solution of the equation
(month = May) and (day = 11)
The above equation has only one solution and is therefore a rather complic-
ated way of specifying that single day in the year. The advantage of using an
equation, however, is that we can use it to characterize the set of all its
solutions, and such a set can be much larger than just a single point. A
trivial example would be the definition of Christmas
(month = Dec) and ((day = 25) or (day = 26))
a more striking example is the definition of the set of days on which my
monthly salary is paid
(day = 23)
and this, indeed, is a much more compact specification than an enumeration
like "(Jan, 23), (Feb, 23), (Mar, 23)," etc.
From the above it is clear that the ease with which we use such equations
to characterize sets of states depends on whether the sets we wish to charac-
terize "match" the structure of the state space, i.e., "match" the coordinate
system introduced. In the above coordinate system it would, for instance, be
somewhat awkward to characterize the set of days that fall on the same day
of the week as (Jan, 1). Many a programmer's decisions have to do with the
introduction of state spaces with coordinate systems that are appropriate for
his goal and the latter requirement will often lead him to the introduction of
state spaces with a number of points many times larger than the number of
different possible values he has to distinguish between.

We have seen another example of using an equation to characterize a set
of states in our description of the cardboard machine for the computation
of GCD(X, Y), viz.
x=y
characterizing all the points of what we called the "answer line"; it is the set
of final states, i.e. the computation stops if and only if a state satisfying the
equation x = y has been reached.
Besides the coordinates of the state space, i.e. the variables in terms of
whose values the computational process evolves, we have seen in our equa-
tions constants (such as "May" or "23"). Besides those, we may also have so-
called "free variables" which you may think of as "unspecified constants". We
use them specifically to relate different states as they occur at successive stages
of the same computational process. For instance, during a specific execution
of Euclid's algorithm with starting point (X, Y), all states (x, y) will satisfy
GCD(x, y) = GCD(X, Y) and 0 < x ≤ X and 0 < y ≤ Y
Here the X and Y are not variables such as x and y. They are "their initial
values", they are constants for a specific computation, but unspecified in
the sense that we could have started Euclid's algorithm with any point of
the grid as initial position of our pebble.
Some final terminology. I shall call such equations "conditions" or
"predicates". (I could, and perhaps should, distinguish between them, reserv-
ing the term "predicate" for the formal expression denoting the "condition":
we could then, for instance, say that the two different predicates "x = y"
and "y = x" denote the same condition. Knowing myself I do not expect to
indulge very much in such a mannerism.) I shall use synonymously expres-
sions such as "a state for which a predicate is true" and "a state that satisfies
a condition" and "a state in which a condition is satisfied" and "a state in
which a condition holds", etc. If a system is certain to arrive at a state satisfy-
ing a condition P, we shall say that the system is certain "to establish the
truth of P".
Each predicate is assumed to be defined in each point of the state space
under consideration: in each point the value of a predicate is either "true"
or "false'', and the predicate is used to characterize the set of all points for
which (or where) the predicate is true.
We call two predicates P and Q equal (in formula: "P = Q") when they
denote the same condition, i.e. when they characterize the same set of states.
Two predicates will play a special role and we reserve the names "T" and
"F" for them.
T is the predicate that is true in all points of the state space concerned:
the corresponding set is the universe.
F is the predicate that is false in all points of the state space: it corresponds
to the empty set.

3 THE CHARACTERIZATION OF SEMANTICS

We are primarily interested in systems that, when started in an "initial
state", will end up in a "final state" which, as a rule, depends on the choice
of the initial state. This is a view that is somewhat different from the idea of
the finite state automaton that on the one hand absorbs a stream of input
characters and on the other hand produces a stream of output characters. To
translate that in our picture we must assume that the value of the input (i.e.
the argument) is reflected in the choice of the initial state and that the value
of the output (i.e. the answer) is reflected in the final state. Our view relieves
us from all sorts of peripheral complications.
The first section of this chapter deals almost exclusively with so-called
"deterministic machines", whereas the second section (which can be skipped
at first reading) deals with so-called "nondeterministic machines". The
difference between the two is that for the deterministic machine the happening
that will take place upon activation of the mechanism is fully determined by
its initial state. When activated twice in identical initial states, identical hap-
penings will take place: the deterministic machine has a fully reproducible
behaviour. This is in contrast to the nondeterministic machine, for which
activation in a given initial state will give rise to one out of a class of possible
happenings, the initial state only fixing the class as a whole.
Now I assume that the design of such a system is a goal-directed activity,
in other words that we want to achieve something with the system. For
instance, if we want to make a machine capable of computing the greatest
common divisor, we could demand of the final state that it satisfies
x = GCD(X, Y) (1)
In the machine we have been envisaging, we shall also have y = GCD(X, Y)
because the game terminates when x = y, but that is not part of our require-
ment when we decide to accept the final value of x as our "answer".
We call condition (1) the (desired) "post-condition"-"post" because it
imposes a condition upon the state in which the system must find itself after
its activity. Note that the post-condition could be satisfied by many of the
possible states. In that case we apparently regard each of them as equally
satisfactory and there is then no reason to require that the final state be a
unique function of the initial state. (As the reader will be aware, it is here
that the potential usefulness of a nondeterministic mechanism presents itself.)
In order to use such a system when we want it to produce an answer,
say "reach a final state satisfying post-condition (J) for a given set of values
of X and Y", we should like to know the set of corresponding initial states,
more precisely, the set of initial states such that activation will certainly
result in a properly terminating happening leaving the system in a final state
satisfying the post-condition. If we can bring the system without computa-
tional effort into one of these states, we know how to use the system to pro-
duce for us the desired answer! To give the example for Euclid's cardboard
game: we can guarantee a final state satisfying the post-condition (1) for any
initial state satisfying
GCD(x, y) = GCD(X, Y) and 0 < x ≤ 500 and 0 < y ≤ 500 (2)
(The upper limits have been added to do justice to the limited size of the
cardboard. If we start with a pair (X, Y) such that GCD(X, Y) = 713, then
there exists no pair (x, y) satisfying condition (2), i.e. for those values of X
and Y condition (2) reduces to F; and that means that the machine in question
cannot be used to compute the GCD(X, Y) for that pair of values of X and
Y.)
For many (X, Y) combinations, many states satisfy (2). In the case that
0 < X ≤ 500 and 0 < Y ≤ 500, the trivial choice is x = X and y = Y.
It is a choice that can be made without any evaluation of the GCD-function,
even without appealing to the fact that the GCD-function is a symmetric
function of its arguments.
The condition that characterizes the set of all initial states such that
activation will certainly result in a properly terminating happening leaving
the system in a final state satisfying a given post-condition is called "the
weakest pre-condition corresponding to that post-condition". (We call it
"weakest", because the weaker a condition, the more states satisfy it and we
aim here at characterizing all possible starting states that are certain to lead
to a desired final state.)
If the system (machine, mechanism) is denoted by "S" and the desired
post-condition by "R", then we denote the corresponding weakest pre-con-
dition by
wp(S, R)
If the initial state satisfies wp(S, R), the mechanism is certain to establish
eventually the truth of R. Because wp(S, R) is the weakest pre-condition,
we also know that if the initial state does not satisfy wp(S, R), this guarantee
cannot be given, i.e. the happening may end in a final state not satisfying R
or the happening may even fail to reach a final state at all (as we shall see,
either because the system finds itself engaged in an endless task or because
the system has got stuck).
We take the point of view that we know the possible performance of
the mechanism S sufficiently well, provided that we can derive for any post-
condition R the corresponding weakest pre-condition wp(S, R), because then
we have captured what the mechanism can do for us; and in the jargon the
latter is called "its semantics".
Two remarks are in order. Firstly, the set of possible post-conditions is
in general so huge that this knowledge in tabular form (i.e. in a table with
an entry for each R wherein we would find the corresponding wp(S, R))
would be utterly unmanageable, and therefore useless. Therefore the defi-
nition of the semantics of a mechanism is always given in another way, viz.
in the form of a rule describing how for any given post-condition R the
corresponding weakest pre-condition wp(S, R) can be derived. For a fixed
mechanism S such a rule, which is fed with the predicate R denoting the
post-condition and delivers a predicate wp(S, R) denoting the corresponding
weakest pre-condition, is called "a predicate transformer". When we ask for
the definition of the semantics of the mechanism S, what we really ask for
is its corresponding predicate transformer.
Secondly -and I feel tempted to add "thank goodness"- we are often
not interested in the complete semantics of a mechanism. This is because it
is our intention to use the mechanism S for a specific purpose only, viz. for
establishing the truth of a very specific post-condition R for which it has
been designed. And even for that specific post-condition R, we are often not
interested in the exact form of wp(S, R); often we are content with a stronger
condition P, that is, a condition for which we can show that
P => wp(S, R) for all states (3)
holds. (The predicate "P => Q" (read "P implies Q") is only false in those
points in state space where P holds, but Q does not, and it is true everywhere
else. By requiring that "P => wp(S, R)" holds for all states, we just require
that wherever P is true, wp(S, R) is true as well: P is a sufficient pre-condition.
In terms of sets it means that the set of states characterized by P is a subset of
the set of states characterized by wp(S, R).) If for a given P, S, and R rela-
tion (3) holds, this can often be proved without explicit formulation -or,
if you prefer, "computation" or "derivation"- of the predicate wp(S, R).
And this is a good thing, for except in trivial cases we must expect that the
explicit formulation of wp(S, R) will defy at least the size of our sheet of
paper, our patience, or our (analytical) ingenuity (or any combination of them).
The meaning of wp(S, R), i.e. "the weakest pre-condition for the initial
state such that activation will certainly result in a properly terminating hap-
pening, leaving the system S in a final state satisfying the post-condition R",
allows us to conclude that, considered as a function of the post-condition R,
the predicate transformer has a number of properties.

PROPERTY 1. For any mechanism S we have
wp(S, F) = F (4)
Suppose that this was not true; under that assumption there would be at
least one state satisfying wp(S, F). Take such a state as the initial state for
the mechanism S; then, according to our definition, activation would result
in a properly terminating happening, leaving the system S in a final state
satisfying F. But this is a contradiction, for by definition there are no states
satisfying F and thus relation (4) has been proved. Property 1 is also known
under the name of the "Law of the Excluded Miracle".

PROPERTY 2. For any mechanism S and any post-conditions Q and R such
that
Q => R for all states (5)
we also have
wp(S, Q) => wp(S, R) for all states (6)
Indeed, any initial state satisfying wp(S, Q) will upon activation establish
the truth of Q by definition; on account of (5) it will therefore establish the
truth of R as well and as initial state it will therefore satisfy wp(S, R) as well,
as expressed in (6). Property 2 is a property of monotonicity.

PROPERTY 3. For any mechanism S and any post-conditions Q and R we
have
(wp(S, Q) and wp(S, R)) = wp(S, Q and R) (7)
In every point of the state space the left-hand side of (7) implies the right-
hand side, because for any initial state satisfying both wp(S, Q) and wp(S, R)
we have the combined knowledge that a final state will be established satisfy-
ing both Q and R. Furthermore, because by definition
(Q and R) => Q for all states
property 2 allows us to conclude
wp(S, Q and R) => wp(S, Q) for all states
similarly,
wp(S, Q and R) => wp(S, R) for all states
But from A => B and A => C, propositional calculus tells us that we may
conclude A => (B and C); therefore the right-hand side of (7) implies the
left-hand side in every point of the state space. Both sides implying each
other everywhere, they must be equal and thus property 3 has been proved.

PROPERTY 4. For any mechanism S and any post-conditions Q and R we
have
(wp(S, Q) or wp(S, R)) => wp(S, Q or R) for all states (8)
Because by definition
Q => (Q or R) for all states
property 2 allows us to conclude
wp(S, Q) => wp(S, Q or R) for all states
similarly,
wp(S, R) => wp(S, Q or R) for all states
But from A => C and B => C, propositional calculus tells us that we may
conclude (A or B) => C, and thus (8) has been proved. In general, the implica-
tion in the other direction does not hold: the certainty that a pregnant woman
will give birth to a son is nil, similarly the certainty that she will give birth
to a daughter is nil, the certainty that she will give birth to a son or a daughter,
however, is absolute. For deterministic mechanisms, however, we have the
stronger property which follows.

PROPERTY 4'. For any deterministic mechanism S and any post-conditions
Q and R we have
(wp(S, Q) or wp(S, R)) = wp(S, Q or R)
We have to show the implication to the left. Consider an initial state satisfy-
ing wp(S, Q or R); to this initial state corresponds a unique final state, satisfy-
ing either Q, or R, or both; the initial state therefore must satisfy either
wp(S, Q) or wp(S, R) or both respectively, i.e. it must satisfy (wp(S, Q) or
wp(S, R)). And this proves property 4'.
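By way of illustration, the following minimal sketch in Python (the tiny state space, the LOOP marker, and the mechanism "flip" are all assumptions of our own choosing, not part of the formal development) models a mechanism as a function from an initial state to the set of its possible outcomes and computes wp by enumeration; it exhibits property 4 holding while the equality of property 4' fails for a nondeterministic mechanism.

LOOP = "loop"                       # our marker for nontermination
STATES = range(4)                   # a tiny state space, for illustration only

def wp(S, R):
    # The weakest pre-condition, represented as the set of initial states
    # from which every possible outcome terminates in a state satisfying R.
    return {s for s in STATES
            if all(t != LOOP and R(t) for t in S(s))}

def flip(s):
    # A nondeterministic mechanism: from any state, end in 0 or in 1.
    return {0, 1}

Q = lambda t: t == 0
R = lambda t: t == 1

assert wp(flip, lambda t: False) == set()     # property 1: wp(S, F) = F
# Property 4: the left-hand side implies the right-hand side ...
assert wp(flip, Q) | wp(flip, R) <= wp(flip, lambda t: Q(t) or R(t))
# ... but the equality of property 4' fails for this nondeterministic S:
assert wp(flip, lambda t: Q(t) or R(t)) == set(STATES)
assert wp(flip, Q) == set() and wp(flip, R) == set()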
In this book -and that may turn out to be one of its distinctive features-
I shall treat nondeterminacy as the rule and determinacy as the exception: a
deterministic machine will be regarded as a special case of the nondeter-
ministic one, as a mechanism for which property 4' holds rather than the
somewhat weaker property 4. This decision reflects a drastic change in my
own thinking. Back in 1958 I was one of the first to develop the basic software
for a machine with an I/O interrupt and the irreproducibility of the behaviour
of such a -to all intents and purposes: nondeterministic- machine was a
traumatic experience. When the idea of the I/O interrupt was first suggested
I was so terrified at the thought of having to build reliable software for such
an intractable beast that I delayed the decision to incorporate the feature
for at least three months. And even after I had given in (I had been flattered
out of my resistance!) I was highly uncomfortable. When the prototype was
becoming kind of operational I had my worst fears fully confirmed: a bug
in the program could evoke the erratic behaviour so strongly suggestive of
an irreproducible machine error. And secondly -and that was in the time
that for deterministic machines we still believed in "debugging"- it was
right from the start quite obvious that program testing was quite ineffective
as a means for raising the confidence level.
For many years thereafter I have regarded the irreproducibility of the
behaviour of the nondeterministic machine as an added complication that
should be avoided whenever possible. Interrupts were nothing but a curse
inflicted by the hardware engineers upon the poor software makers! Out of
this fear of mine the discipline for "harmoniously cooperating sequential
processes" has been born. In spite of its success I was still afraid, for our
solutions -although proved to be correct- seemed ad hoc solutions to the
problem of "taming" (that is the way we felt about it!) special forms of non-
determinacy. The background of my fear was the absence of a general meth-
odology.
Two circumstances have changed the scene since then. The one is the
insight that, even in the case of fully deterministic machines, program testing
is hardly helpful. As I have now said many times and written in many places:
program testing can be quite effective for showing the presence of bugs, but
is hopelessly inadequate for showing their absence. The other one is the
discovery, which has emerged in the meantime, that any design discipline
must do justice to the fact that the design of a mechanism that is to have a
purpose must be a goal-directed activity. In our special case it means that
we can expect our post-condition to be the starting point of our design
considerations. In a sense we shall be "working backwards". In doing so we
shall find that the implication of property 4 is the essential part; for the equal-
ity of property 4' we shall have very little use.
Once the mathematical equipment needed for the design of nondeter-
ministic mechanisms achieving a purpose has been developed, the nondeter-
ministic machine is no longer frightening. On the contrary! We shall learn
to appreciate it, even as a valuable stepping stone in the design of an ulti-
mately fully deterministic mechanism.

(The remainder of this chapter can be skipped at first reading.) We have
stated our position that we know the possible performance of the mechanism
S sufficiently well, provided that we know how its associated predicate trans-
former wp(S, R) acts upon any post-condition R. If we also know that the
mechanism is deterministic, the knowledge of this predicate transformer
fixes its possible behaviour completely. For a deterministic mechanism S
and some post-condition R each initial state falls in one of three disjoint
sets, according to the following, mutually exclusive, possibilities:

(a) Activation of S will lead to a final state satisfying R.
(b) Activation of S will lead to a final state satisfying non R.
(c) Activation of S will not lead to a final state, i.e. the activity will fail to
terminate properly.

The first set is characterized by wp(S, R), the second set by wp(S, non R),
their union by
(wp(S, R) or wp(S, non R)) = wp(S, R or non R) = wp(S, T)
and therefore the third set is characterized by non wp(S, T).
To give the complete semantic characterization of a nondeterministic
system requires more. With respect to a given post-condition R we have
again the three possible types of happenings as listed above under (a), (b),
and (c). But in the case of a nondeterministic system an initial state need
not lead to a unique happening, which by definition is one out of the three
mutually exclusive categories; for each initial state the possible happenings
may now belong to two or even to all three categories.
In order to describe them we can use the notion of "a liberal pre-condi-
tion". Earlier we considered pre-conditions such that it was guaranteed that
"the right result", i.e. a final state satisfying R, would be reached. A liberal
pre-condition is weaker: it only guarantees that the system won't produce
the wrong result, i.e. will not reach a final state not satisfying R, but non-
termination is left as an alternative. Also for liberal pre-conditions we can
introduce the concept of "the weakest liberal pre-condition"; let us denote
it by wlp(S, R). Then the initial state space is, in principle, subdivided into
seven mutually exclusive regions, none of which need to be empty. (Seven,
because from three objects one can make seven nonempty selections.) They
are all easily characterized by three predicates, viz. wlp(S, R), wlp(S, non R),
and wp(S, T).

(a) wp(S, R) = (wlp(S, R) and wp(S, T))
Activation will establish the truth of R.
(b) wp(S, non R) = (wlp(S, non R) and wp(S, T))
Activation will establish the truth of non R.
(c) wlp(S, F) = (wlp(S, R) and wlp(S, non R))
Activation will fail to lead to a properly terminating activity.
(ab) wp(S, T) and non wlp(S, R) and non wlp(S, non R)
Activation will lead to a terminating activity, but the initial state does
not determine whether the final state will satisfy R or not.
(ac) wlp(S, R) and non wp(S, T)
If activation leads to a final state, that final state will satisfy R, but the
initial state does not determine whether the activity will terminate or
not.
(bc) wlp(S, non R) and non wp(S, T)
If activation leads to a final state, that final state will not satisfy R,
but the initial state does not determine whether the activity will termi-
nate or not.
(abc) non (wlp(S, R) or wlp(S, non R) or wp(S, T))
The initial state does not determine whether activation will lead to a
terminating activity, nor whether, in the case of termination, R will be
satisfied or not.

The last four possibilities only exist for nondeterministic machines.
From the definition of wlp(S, R) it follows that
wlp(S, T) = T
it is also clear that
(wlp(S, F) and wp(S, T)) = F
If it were not, there would be an initial state for which both termination and
nontermination could be guaranteed.
Figure 3.1 gives a pictorial representation of the initial state space with
the insides of the rectangles satisfying wlp(S, R), wlp(S, non R) and wp(S, T)
respectively.

FIGURE 3.1

The above analysis has been given for completeness' sake and also
because in practice the notion of a liberal pre-condition is a quite useful
one. If one implements, for instance, a programming language, one will not
prove that the implementation executes any correct program correctly; one
should be happy and content with the assertion that no correct program will
be processed incorrectly without warning -provided, of course, that the
class of programs that indeed will be processed correctly is sufficiently large
to make the implementation of any interest.
For the time being, however, we shall pay no attention to the concept of
the liberal pre-condition and shall confine ourselves to the characterization
of initial states that guarantee that the right result will be produced. Once
this tool has been developed, we shall consider how it can be bent into one
allowing us to talk about liberal pre-conditions to the extent we are interested
in them.
4 THE SEMANTIC CHARACTERIZATION OF A PROGRAMMING LANGUAGE

In the previous chapter we have taken the position that we know the
semantics of a mechanism S sufficiently well if we know its "predicate trans-
former", i.e. a rule telling us how to derive for any post-condition R the
corresponding weakest pre-condition, which we have denoted by "wp(S, R)",
for the initial state such that attempted activation will lead to a properly
terminating activity that leaves the system in a final state satisfying R. The
question is: how does one derive wp(S, R) for given S and R?
So much, for the time being, about a single, specific mechanism S. A
program written in a well-defined programming language can be regarded
as a mechanism, a mechanism that we know sufficiently well provided that
we know the corresponding predicate transformer. But a programming lan-
guage is only useful provided that we can use it for the formulation of many
different programs and for all of them we should like to know their corre-
sponding predicate transformers.
Any such program is defined by its text as written in that well-defined
programming language and that text should therefore be our starting point.
But now we see suddenly two completely different roles for such a program
text! On the one hand the program text is to be interpreted by a machine
whenever we wish the program to be executed automatically, whenever we
wish a specific computation to be performed for us. On the other hand the
program text should tell us how to construct the corresponding predicate
transformer, how to accomplish the predicate transformation that will derive
wp(S, R) for any given post-condition R that has caught our fancy. This
observation tells us what we mean by "a well-defined programming language"
as far as we are concerned. While the semantics of a specific mechanism
(program) are given by its predicate transformer, we consider the semantic
characterization of a programming language given by the set of rules that
associate the corresponding predicate transformer with each program written
in that language. From that point of view we can regard the program as "a
code" for a predicate transformer.
If one so desires one can approach the problem of programming language
design from out of that corner. In such an approach the -rather formal-
starting point is that the rules for constructing predicate transformers must
be such that whatever can be constructed by applying them must be a predic-
ate transformer enjoying the properties 1 through 4 from the previous chapter
"The Characterization of Semantics'', for if they don't, you are just massag-
ing predicates in a way such that they can no longer be interpreted as post-
conditions and corresponding weakest preconditions respectively.
Two very simple predicate transformers that satisfy the required pro-
perties immediately present themselves.
There is, to begin with, the identity transformation, i.e. the mechanism
S such that for any post-condition R we have wp(S, R) = R. This mechanism
is known to and beloved by all programmers: they know it as "the empty
statement" and in their program text they often denote it by writing nothing
at a place in the text where syntactically a statement is required. This is not
a particularly good convention (a compiler only "sees" it by not seeing a
statement that should be there) and we shall give it a name, say "skip". The
semantics of the statement named "skip" are therefore given by:
wp(skip, R) = R for any post-condition R
(As everybody does, I shall use the term "statement" because it has found its
firm place in the jargon; when people suggested that "command" was perhaps
a more appropriate term, it was already too late!)
Note. Those who think it a waste of characters to introduce an explicit
name such as "skip" for the empty statement while "nothing" expresses
its semantics so eloquently, should realize that the decimal number system
was only possible thanks to the introduction of the character "O" for the
concept zero. (End of note.)
Before going on I would not like to miss the opportunity of pointing out
that in the meantime we have defined a programming language! Admittedly
it is a rather rudimentary one: it is a one-statement language in which only
one mechanism can be defined and the only thing that mechanism can do
for us is "leaving things as they are" (or "doing nothing", but on account of
the negation that is a dangerous use of language; see next paragraph).
The next simple predicate transformer is the one which leads to a constant
weakest pre-condition that does not depend on the post-condition R at all.
As constant predicates we have two, T and F. A mechanism S such that
wp(S, R) = T for all R cannot exist, for it would violate the Law of the
Excluded Miracle; a mechanism S such that wp(S, R) = F for all R has,
however, a predicate transformer that satisfies all the necessary properties.
We shall also give this statement a name, say "abort". The semantics of the
statement named "abort" are therefore given by
wp(abort, R) = F for any post-condition R
This one cannot even "do nothing" in the sense of "leaving things as they
are"; it really cannot do a thing. If we take R = T, i.e. imposing beyond its
existence no further requirement upon the final state, even then there is no
corresponding initial state. When evoked, the mechanism named "abort"
will therefore fail to reach a final state: its attempted activation is interpreted
as a symptom of failure. (It need not concern us here (and not even later!)
that later we shall present frameworks of statements that contain the semantic
equivalents of "skip" and "abort" as special cases.)
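The two defining equations are so simple that they can be transcribed directly; the following fragment is a minimal sketch in Python (the representation of predicates as boolean functions of the state is a choice of ours, not a notation of this book).

def wp_skip(R):
    return R                        # wp(skip, R) = R

def wp_abort(R):
    return lambda state: False      # wp(abort, R) = F

R = lambda state: state["a"] == 7
assert wp_skip(R)({"a": 7})         # skip establishes R exactly where R holds
assert not wp_abort(R)({"a": 7})    # no initial state will do for abort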
Now we have a (still very rudimentary!) two-statement programming
language in which we can define two mechanisms, one doing nothing and the
other always failing. Since the publication of the famous "Report on the
Algorithmic Language ALGOL 60" in 1960, no self-respecting computing
scientist can reach this stage without giving a formal definition of the syntax
of his language thus far developed in the notational technique called "BNF"
(short for "Backus-Naur-Form"), viz.:
<statement> ::= skip | abort
(To be read as: "An element of the syntactic category called "statement"
(that is what the funny brackets "<" and ">" stand for) is defined as (that
is what "::=" stands for) "skip" or (that is what the vertical bar "|" stands
for) "abort".". Great! But don't worry; more impressive applications of BNF
as notational technique will follow in due time!)

A class of definitely more interesting predicate transformers is based
upon substitution, i.e. replacing all occurrences of a variable in the formal
expression for the post-condition R by (the same) "something else". If in a
predicate R all occurrences of the variable x are replaced by some expression
(E), then we denote the result of this transformation by R_{E->x}. Now we can
consider for given x and E a mechanism such that for all post-conditions R
we have wp(S, R) = R_{E->x}, where x is a "coordinate variable" of our state
space and E is an expression of the appropriate type.
Note. Such a transformation by substitution satisfies the properties 1
through 4 from the previous chapter. We shall not try to demonstrate this
and leave it to the reader's taste whether he will regard this as a trivial or
as a deep mathematical result. (End of note.)
The above pattern introduces a whole class of predicate transformers,
a whole class of mechanisms. They are denoted by a statement that is called
"an assignment statement" and such a statement has to specify three things:
1. the identity of the variable to be replaced;
2. the fact that substitution is the corresponding rule for predicate trans-
formation;
3. the expression which is to replace every occurrence of the variable to be
replaced in the post-condition.

If the variable x is to be replaced by the expression (E), the usual way to
write such a statement is:
x:= E
(where the so-called assignment operator ":=" should be read as "becomes").
This can be summarized by defining
wp("x:= E", R) = RE~x for any post-condition R
which for any coordinate variable x and any expression E of the appropriate
type can, if we so desire, be viewed as the semantic definition of the assign-
ment operator.
Revelling, as we do, in the use of BNF we can extend our formal syntax
to read:
<statement> ::= skip | abort | <assignment statement>
<assignment statement> ::= <variable> := <expression>
where the last line should be read as "An element of the syntactic category
called "assignment statement" is defined as an element of the syntactic
category called "variable", followed by the assignment operator ":=",
followed by an element of the syntactic category called "expression".".
Before proceeding it seems wise to verify that our formal definition of the
semantics of the assignment statement indeed captures our intuitive under-
standing of the assignment statement-if we have one! Let us consider a
state space with the two integer coordinate variables "a" and "b". Then
wp("a:= 7'', a = 7) = {7 = 7}
and because the boolean expression at the right-hand side is true for all
values of a and b, i.e. for all points in the state space, we can simplify to
wp("a:= 7'', a= 7) = T
i.e., each initial state will guarantee that the assignment "a:= 7" will establish
the truth of "a= 7". Similarly
wp("a:= 7", a= 6) = {7 = 6}
and because the boolean expression is false for all values of a and b, we find
wp("a:= 7", a= 6) = F
This means that there is no initial state for which we can guarantee that the
assignment "a:= 7" establishes the truth of "a = 6". (This is in accordance
with our previous result that all initial states would establish the final truth
of "a = 7" and therefore the final falsity of "a ≠ 7".) Also
wp("a:= 7", b = b0) = {b = b0}
i.e. if we wish to guarantee that after the assignment "a:= 7" the variable
b has some value b0, then b should have that value already at the initial state.
In other words, all variables other than "a" are not tampered with, they keep
the value they had; the assignment "a:= 7" moves the point in state space
corresponding to the current system state parallel to the a-axis such that
"a = 7" finally holds.
Instead of choosing a constant for the expression E, we could also have
a function of the initial state. This is illustrated in the following examples:
wp("a:= 2 *b + l", a= 13) = {2 *b + 1 = 13} = {b = 6}
wp("a:= a+ l", a> 10) ={a+ 1 > JO}= {a> 9}
wp("a:= a - b'', a> b) ={a - b > b} ={a> 2 * b}
There is a slight complication if we allow the expression E to be a partial
function of the initial state, i.e. such that its attempted evaluation with an
initial state that lies outside its domain will not lead to a properly terminating
activity; if we wish to cater to that situation as well, we must sharpen our
definition of the semantics of the assignment operator and write
wp("x:= E", R) = {D(E) cand RE-•x}
Here the predicate D(E) means "in the domain of E"; the boolean expression
"Bl cand B2" (the so-called "conditional conjunction") has the same value
as "Bl and B2" where both operands are defined, but is also defined to have
the value "false" where Bl is "false", the latter regardless of the question
whether B2 is defined. Usually the condition D(E) is not mentioned explicitly,
either because it is = T or because we have seen to it that the assignment
statement will never be activated in initial states outside the domain of E.
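The short-circuit boolean operators of present-day programming languages behave exactly like cand; the sketch below is ours (the expression chosen for E is a hypothetical example) and uses Python's left-to-right "and", which never evaluates its second operand when the first one is false.

def pre(state):
    # D(E) cand (substituted R), for the partial expression E = a / b:
    a, b = state["a"], state["b"]
    return b != 0 and a // b > 2    # when b = 0: false, and no division occurs

assert pre({"a": 9, "b": 3})        # within the domain of E, and true
assert not pre({"a": 9, "b": 0})    # outside the domain of E: false, no error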

A natural extension of the assignment statement, beloved by some pro-
grammers, is the so-called "concurrent assignment". Here a number of
different variables can be substituted simultaneously; the concurrent assign-
ment statement is denoted by a list of the different variables to be substituted
(mutually separated by commas) at the left-hand side of the assignment
operator and an equally long list of expressions (also mutually separated by
commas) at its right-hand side. Thus one is allowed to write
x1, x2:= E1, E2
x1, x2, x3:= E1, E2, E3
Note that the ith variable from the left-hand list is to be replaced by the
ith expression from the right-hand list, such that, for instance, for given
x1, x2, E1, and E2
x1, x2:= E1, E2
is semantically equivalent with
x2, x1:= E2, E1
The concurrent assignment allows us to prescribe that the two variables x
and y interchange their values by means of
x, y:= y, x
an operation that is awkward to describe otherwise. This, the fact that it
is easily implemented, and the fact that it allows us to avoid some over-
specification, are the reasons for its popularity. If the lists become long, the
resulting program becomes very hard to read.
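Some present-day languages offer the concurrent assignment under the name of "tuple assignment"; the fragment below is a small sketch of ours in Python, showing both the swap and the sequential overspecification that it avoids.

# All right-hand sides are evaluated in the initial state before any
# variable changes, so "x, y := y, x" needs no temporary variable:
x, y = 3, 5
x, y = y, x
assert (x, y) == (5, 3)

# The naive sequential version overspecifies the order and goes wrong:
x, y = 3, 5
x = y
y = x                               # too late: the old value of x is gone
assert (x, y) == (5, 5)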
The true BNF addict will extend his syntax by providing two alternative
forms for the assignment statement, viz.:
<assignment statement> ::= <variable> := <expression> |
<variable>, <assignment statement>, <expression>
This is a so-called "recursive definition", because one of the alternative
forms for a syntactic unit called "assignment statement" (viz. the second one)
contains as one of its components again the same syntactic unit called
"assignment statement", i.e., the syntactic unit we are defining! At first
sight such a cyclic definition seems frightening, but upon closer inspection
we can convince ourselves that, at least from a syntactic point of view, there
is nothing wrong with it. For instance, because according to the first alterna-
tive
x2:= E1
is an instance of an assignment statement, the formula
x1, x2:= E1, E2
admits a parsing of the form
x1, <assignment statement>, E2
and is therefore, according to the second alternative, also an assignment
statement. From a semantic point of view, however, it is a horror because
it suggests that E2 is associated with x1 instead of with x2.

Compared with the two-statement language with only "skip" and "abort"
our language with the assignment statement is considerably richer: there is
no upper bound anymore on the number of different instances of the syn-
tactic unit "assignment statement". Yet it is clearly insufficient for our pur-
pose; we need the ability to build more sophisticated programs, more
elaborate mechanisms. For the construction of potentially elaborate mech-
anisms we follow the pattern that can be described recursively by
<mechanism> ::= <primitive mechanism> |
<proper composition of <mechanism>'s>
For this pattern to be of any use at all, two conditions must be satisfied: we
must have "primitive mechanisms" to start with and, secondly, we must
know how to "compose properly". The statements introduced thus far can be
taken as the primitive mechanisms, and it is with the act of properly com-
posing a new mechanism out of given ones that the remainder of this chapter
is concerned. The new mechanism, in its turn, can act as part of a still larger
composite object.
Whenever an object has been composed of parts, we can view the resulting
object in two ways. Either we view it as "an unanalyzed whole" having its
properties more or less by magic (or by faith or postulate); in this view only
its properties are relevant, it is irrelevant how it has been composed from
which parts. In this view any two mechanisms having the same properties
are equivalent. Alternatively we view it as "a composite object" such that we
can understand why it has the properties stated. Then we regard the parts as
"little" unanalyzed wholes of which only tl:e properties count. The latter
view makes clear what we mean by "compo,-ition". The composition must
define how the properties of the whole follow f1om the properties of the parts.
After these general remarks we return to our specific mechanisms, whose
properties we consider captured by their predicate transformers. More
specifically, given two mechanisms S1 and S2, whose predicate transformers
are known, can we think of a rule for deriving a new predicate transformer
from the two given ones? If so, we can regard this resulting predicate trans-
former as describing the properties of a composite object, built in a special
way from the parts S1 and S2.
One of the simplest ways of deriving a new function from two given ones
is the so-called "functional composition", i.e. supplying the value of the one
as argument to the other. It is tradition to denote the composite object cor-
responding to that predicate transformer by "S1; S2" and we define
wp("S1; S2", R) = wp(S1, wp(S2, R))
which, if we so desire, can be viewed as the semantic definition of the semi-
colon.
Note. From the fact that the predicate transformers for S1 and S2
enjoy the properties 1 through 4 of the previous chapter, we can derive
that also the predicate transformer for "S1; S2" as defined above has
these four properties. For instance, because for S1 and S2 the Law of the
Excluded Miracle holds:
wp(S1, F) = F and wp(S2, F) = F
we conclude, by substituting F for R in the above definition,
wp("S1; S2", F) = wp(S1, wp(S2, F))
              = wp(S1, F)
              = F
The verification that the other three properties hold as well is left as an
exercise for the reader. (End of Note.)
Before proceeding we shall convince ourselves that our formal definition
of the semantics of the semicolon captures our intuitive understanding of
it (if we have one!), viz. that the composite mechanism "S1; S2" can be
implemented by the rule "first activate S1 and upon termination of this
activity, activate S2". Indeed, in our definition of wp("S1; S2", R) we supply
R -the post-condition for the composite mechanism- as the post-condition
to the predicate transformer for S2 and that reflects that the total activity
of "S1; S2" can end with the activity of S2; the corresponding weakest
pre-condition for S2, viz. wp(S2, R), is supplied as post-condition to the
predicate transformer for S1, i.e. we apparently identify the initial state for
S2 with the final state for S1. But this is exactly mirrored when the activity
of S1 is followed in time by the activation of S2.
Let us, just to be sure, consider an example. Let "S1; S2" be
"a:= a + b; b:= a * b"
and let our post-condition be some predicate R(a, b). In that case
wp(S2, R(a, b)) = wp("b:= a * b", R(a, b))
              = R(a, a * b)
and
wp("S1; S2", R(a, b)) = wp(S1, wp(S2, R(a, b)))
                    = wp(S1, R(a, a * b))
                    = wp("a:= a + b", R(a, a * b))
                    = R(a + b, (a + b) * b)
i.e., we can guarantee a relation R between the final values of a and b,
provided initially the same relation holds between a + b and (a + b) * b
respectively.
Finally, because functional composition is associative, it does not matter
whether we parse "S1; S2; S3" as either "[S1; S2]; S3" or "S1; [S2; S3]",
i.e. we are indeed entitled to regard the semicolon as a concatenation symbol
and there is no ambiguity when we write down a statement list of the form
"S1; S2; S3; ... ; Sn" and we shall freely do so when the opportunity pres-
ents itself.
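The example invites mechanical checking. The sketch below is ours (it repeats the semantic wp_assign of the earlier assignment sketch); it composes the two predicate transformers and confirms the derivation for one particular choice of R.

def wp_assign(x, E, R):
    return lambda s: R({**s, x: E(s)})

def wp_seq(wp_S1, wp_S2, R):
    return wp_S1(wp_S2(R))          # wp("S1; S2", R) = wp(S1, wp(S2, R))

# wp("a := a + b; b := a * b", a = 5 and b = 10):
R = lambda s: s["a"] == 5 and s["b"] == 10
wp1 = lambda Q: wp_assign("a", lambda s: s["a"] + s["b"], Q)
wp2 = lambda Q: wp_assign("b", lambda s: s["a"] * s["b"], Q)
pre = wp_seq(wp1, wp2, R)
assert pre({"a": 3, "b": 2})        # indeed a + b = 5 and (a + b) * b = 10
assert not pre({"a": 5, "b": 10})   # R itself is not the weakest pre-condition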
EXERCISE

Verify that
"xl:= El; x2:= E2" and "x2:= E2; xl:= El"
are semantically equivalent if the variable xl does not occur in the expression E2
while, also, the variable x2 does not occur in the expression El. As a matter of fact,
they are then both semantically equivalent to the concurrent assignment "xl, x2: =
El, E2". (This equivalence is one of the arguments for promoting the concurrent
assignment; its use enables us to avoid sequential overspecification and, even more,
in the concurrent assignment it is clear that the two expressions El and E2 could be
evaluated concurrently, a fact that for some implementation techniques could be of
interest. Besides that we have the perhaps more interesting possibility that "xl,
x2:= El, E2" is semantically equivalent neither to "xl:= El; x2:= E2" nor to
"x2:= E2; xl:= El".) (End of Exercise.)

Before the introduction of the semicolon we could only write single-state-
ment programs; with the aid of the semicolon we can write programs as a
concatenation of n (n > 0) statements: "S1; S2; S3; ... ; Sn". Intermediate
nontermination excluded, the execution of such a program always implies
the time-succession of n statement executions, first S1, then S2, etc. until Sn.
From our example of the cardboard game implementing Euclid's algorithm
we know, however, that we must be able to describe a wider class of "rules
of the game": each game will exist of a succession of moves, where each move
is either "x:= x - y" or "y:= y - x", but the way in which these moves
alternate in time and even their total number will differ from game to game;
it depends on the initial position of the pebble, it depends on the initial state
of the system. If the semicolon is our only means for composing a new whole
of given parts, we are unable to express this and we must therefore look for
something new.
As long as the semicolon is the only connective we have, the one and
only circumstance under which one of the constituent mechanisms Si (i > 1)
is activated is proper termination of the (lexicographically) preceding one.
In order to achieve the flexibility we need, it must be possible to make the
activation of a (sub)mechanism co-dependent on the current state of the
system. For this purpose we introduce -in two steps- the notion of a
"guarded command", the syntax for which is given by:
<guarding head> ::= <boolean expression> -> <statement>
<guarded command> ::= <guarding head> {; <statement>}
where the braces "{" and "}" should be read as: "followed by zero or more
instances of the enclosed".
(An alternative syntax for a guarded command would have been:
<statement list> ::= <statement> {; <statement>}
<guarded command> ::= <boolean expression> -> <statement list>
but for reasons that need not concern us now, I prefer the syntax that intro-
duces the concept of the guarding head.)
In this connection the boolean expression preceding the arrow is called
"a guard". The idea is that the statement list following the arrow will only
be executed provided initially the corresponding guard is true. The guard
enables us to prevent execution of a statement list under those initial circum-
stances under which execution would be undesirable or, if partial operations
are involved, impossible.
The truth of the guard is a necessary initial condition for the execution of
the guarded command as a whole; it is, of course, not sufficient, because in
some way or another -we shall meet two of them- it must also potentially
be "its turn". That is why a guarded command is not considered as a state-
ment: a statement is irrevocably executed when its turn has arrived, the
guarded command can be used as a building block for a statement. More
precisely: we shall propose two different ways of composing a statement of
a set of guarded commands.
After some reflection it is quite natural to consider a set of guarded com-
mands. Suppose that we are requested to construct a mechanism such that,
if the initial state satisfies Q, the final state will satisfy R. Suppose furthermore
that we cannot find a single statement list that will do the job in all cases.
(If there existed such a statement list, we should use just that one and there
would be no need for guarded commands.) We may, however, be able to
find a number of statement lists, each of which will do the job for a subset of
possible initial states. To each of these statement lists we can attach as guard
a boolean expression characterizing the subset for which it is adequate and
when we have enough sufficiently tolerant guards such that the truth of Q
implies the truth of at least one guard, we have for each initial state satisfying
Q a mechanism that will bring the system in a state satisfying R, viz. one of
the guarded commands whose guard is initially true.
In order to express this we define first
<guarded command set> ::= <guarded command> {[] <guarded command>}
where the symbol "[]" (pronounce "bar") acts as a separator between other-
wise unordered alternatives. One of the ways to form a statement from a
guarded command set is by embracing it by the bracket pair "if ... fi", i.e.
our syntax for the syntactic category called "statement" is extended with
a next form:
<statement> ::= if <guarded command set> fi
It indicates a special way in which we can combine a number of guarded
commands into a new mechanism. We can view the activity that will take
place when this mechanism is activated as follows. One of the guarded com-
mands whose guard is true is selected and its statement list is activated.
Before we proceed to give a formal definition of the semantics of our new
construct, three remarks are in order.

1. It is assumed that all guards are defined; if not, i.e. if the evaluation of
a guard may lead to a not properly terminating activity, then the whole
construct is allowed to fail to terminate properly.
2. In general our construct will give rise to nondeterminacy, viz. for each
initial state for which more than one guard is true, because it is left
undefined which of the corresponding statement lists will then be selected
for activation. No nondeterminacy is introduced if any two guards
exclude each other.
3. If the initial state is such that none of the guards is true, we are faced
with an initial state for which none of the alternatives caters, and there-
fore neither does the construct as a whole. Activation in such an initial
state will lead to abortion.
Note. If we allow the empty guarded command set as well, the state-
ment "if fi" is therefore semantically equivalent with our earlier statement
"abort". (End of note.)
(In the following formal definition of the weakest pre-condition for the
if-fl-construct we shall restrict ourselves to the case that all the guards are
total functions. If this is not the case, the expression should be pre-fixed,
with a cand, by the additional requirement that the initial state lies in the
domain of all the guards.)
Let "IF" be the name of the statement
if B1 -> SL1 [] B2 -> SL2 [] ... [] Bn -> SLn fi

then for any post-condition R
wp(IF, R) = (E j: 1 <= j <= n: Bj) and
            (A j: 1 <= j <= n: Bj => wp(SLj, R))
This formula should be read as follows: wp(IF, R) is true for every point in
state space where there exists at least one j in the range 1 <= j <= n such that
Bj is true and where furthermore for all j in the range 1 <= j <= n such that
Bj is true, wp(SLj, R) is true as well. Using the "..." as we have done in
the definition of IF itself, we could have given the alternative form
wp(IF, R) = (B1 or B2 or ... or Bn) and
            (B1 => wp(SL1, R)) and
            (B2 => wp(SL2, R)) and ... and
            (Bn => wp(SLn, R))
It is not too difficult to understand these formulae. The requirement that
at least one of the guards is true reflects abortion in the case that all guards
are false. Furthermore we require for each initial state satisfying wp(IF, R)
that Bj => wp(SLj, R) for all j. For those values of j for which Bj is false,
this implication is true regardless of the value of wp(SLj, R), i.e. for those
values of j, apparently it does not matter what SLj would do. Our implemen-
tation reflects this by not selecting for activation an SLj with an initially
false guard Bj. For those values of j for which Bj is true, this implication can
only be true if wp(SLj, R) is true as well. As our formal definition requires
the truth of the implication for all values of j, our implementation is indeed
free to choose when more than one guard is true.
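The formula can be evaluated quite literally. In the sketch below (entirely ours) a guarded command set is a list of (guard, predicate transformer) pairs; an if-fi construct establishing m = max(a, b), with overlapping guards for a = b, serves as a test case.

def wp_assign(x, E, R):
    return lambda s: R({**s, x: E(s)})

def wp_if(commands, R):
    # There must exist a true guard, and every true guard's statement
    # list must be certain to establish R:
    return lambda s: (any(B(s) for B, _ in commands) and
                      all(not B(s) or wpSL(R)(s) for B, wpSL in commands))

# if a >= b -> m := a [] b >= a -> m := b fi
IF = [(lambda s: s["a"] >= s["b"],
       lambda R: wp_assign("m", lambda s: s["a"], R)),
      (lambda s: s["b"] >= s["a"],
       lambda R: wp_assign("m", lambda s: s["b"], R))]
R = lambda s: s["m"] == max(s["a"], s["b"])
pre = wp_if(IF, R)
assert pre({"a": 2, "b": 7, "m": 0})    # exactly one guard true
assert pre({"a": 7, "b": 7, "m": 0})    # both guards true: either choice is fine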
The if-fi-construct is only one of the two ways in which we can build
a statement from a guarded command set. In the if-fi-construct, a state in
which all guards are false leads to abortion; in our second form we allow
the state in which no guards are true to lead to proper termination, and
because then no statement list is activated, it is only natural that it will then
be semantically equivalent to the empty statement; the counterpart of this
permission to terminate properly when no guard is true, however, is that
the activity is not allowed to terminate as long as one of the guards is true.
That is, upon activation the guards are inspected. The activity terminates if
there are no true guards; if there are true guards one of the corresponding
statement lists is activated and upon its termination the implementation
starts all over again inspecting the guards. This second construct is denoted
by embracing the guarded command list by the bracket pair "do ... od".
The formal definition of the weakest pre-condition for the do-od-construct
is more complicated than the one for the if-fi-construct; as a matter of fact
the first one is expressed in terms of the second one. We shall first give the
formal definition and then its explanation. Let "DO" be the name of the
statement
do B1 -> SL1 [] B2 -> SL2 [] ... [] Bn -> SLn od
and let "IF" be the name of the statement formed by embracing the same
guarded command set by the bracket pair "if ... fi". The conditions Hk(R)
are given by
H0(R) = R and non (E j: 1 <= j <= n: Bj)
and for k > 0:
Hk(R) = wp(IF, Hk-1(R)) or H0(R)
then
wp(DO, R) = (E k: k >= 0: Hk(R))
Here the intuitive understanding of Hk(R) is: the weakest pre-condition
such that the do-od-construct will terminate after at most k selections of a
guarded command, leaving the system in a final state satisfying the post-
condition R.
For k = 0 it is required that the do-od-construct will terminate without
selecting any guarded command, i.e. there may not exist a true guard, as is
expressed by the second term; and the initial truth of R is then clearly the
necessary and sufficient additional condition for the final truth of R, as is
expressed by the first term.
For k > 0 we have to distinguish two cases: either none of the guards
is true, but then R must hold and this leads to the second term; or at least
one of the guards is true, but what then happens starts as if the statement
"IF" is activated once (in an initial state not leading to immediate abortion
due to lack of true guards). But after that execution, in which one guarded
command has been selected, we must be sure to arrive in a state such that
at most k - 1 further selections are needed to ensure termination in a final
state satisfying R. According to our definition, this post-condition for the
statement "IF" is Hk-1(R).
The last line, defining wp(DO, R), expresses that there must exist a value
of k such that at most k selections will be needed to ensure termination in a
final state satisfying the post-condition R.
Note. If we allow the empty guarded command set as well, the statement
"do od" is therefore semantically equivalent with our earlier statement
"skip". (End of note.)
5 TWO THEOREMS

In this chapter we derive two theorems concerning the statements we
build from guarded command sets. The minor theorem concerns the alterna-
tive if-fi-construct, the major one the repetitive do-od-construct. In this
chapter we shall discuss the constructs derived from the guarded command
set
B1 -> SL1 [] B2 -> SL2 [] ... [] Bn -> SLn
We shall denote by "IF" and "DO" respectively the statements constructed
by embracing the above guarded command set by the bracket pairs "if ... fi"
and "do ... od" respectively. We shall furthermore use the abbreviation
BB = (E j: 1 <= j <= n: Bj)
THEOREM

The basic theorem for the alternative construct.


Using the notational conventions just described, we can formulate the
basic theorem for the alternative construct:
Let the alternative construct IF and a predicate pair Q and R be such
that
Q => BB (1)
and
(A j: 1 <= j <= n: (Q and Bj) => wp(SLj, R)) (2)
both hold for all states, then
Q => wp(IF, R) (3)
holds for all states as well.
Because by definition
wp(IF, R) = BB and (A j: 1 <= j <= n: Bj => wp(SLj, R))

and Q implies on account of (1) the first term on the right-hand side, (3)
is proved if on account of (2) we can conclude that
Q => (A j: 1 <= j <= n: Bj => wp(SLj, R)) (4)
holds for all states. For any state for which Q is false, (4) is true by defini-
tion of the implication. For any state for which Q is true and for any j we
distinguish two cases: either Bj is false, but then Bj => wp(SLj, R) is true
by definition of the implication, or Bj is true, but then on account of (2),
wp(SLj, R) is true and therefore Bj => wp(SLj, R) is true as well. As a
result (4) and therefore (3) has been proved.
Note. In the special case of binary choice (n = 2) and B2 = non B1,
we have BB = T and the weakest pre-condition reduces to
(B1 => wp(SL1, R)) and (non B1 => wp(SL2, R)) =
(non B1 or wp(SL1, R)) and (B1 or wp(SL2, R)) =
(B1 and wp(SL1, R)) or (non B1 and wp(SL2, R)) (5)
The last reduction is possible because, of the four cross-terms, "B1 and non
B1" = F and can be omitted, while "wp(SL1, R) and wp(SL2, R)" can be
omitted as well: in every state such that it is true, exactly one of the
two terms of (5) must be true and thus it can be omitted from that
disjunction. Formula (5) is closely related to the way in which C.A.R.
Hoare has given the semantics for the if-then-else of ALGOL 60. Because
here BB = T and is implied by everything, we can conclude (3) on the
weaker assumption
((Q and B1) => wp(SL1, R)) and ((Q and non B1) => wp(SL2, R)).
(End of Note.)
The theorem for the alternative construct is of special importance in the
case that the predicate pair Q and R can be written as
R = P
Q = P and BB
In that case the antecedent (1) is fulfilled automatically while the antecedent
(2) reduces -because (BB and Bj) = Bj- to
(A j: 1 <= j <= n: (P and Bj) => wp(SLj, P)) (6)
from which we can conclude, on account of (3)
(P and BB) => wp(IF, P) for all states (7)
a relation that will form the antecedent for our next theorem.

THEOREM

The basic theorem for the repetitive construct.


Let a guarded command set with its derived alternative construct IF and
a predicate P be such that
(P and BB) => wp(IF, P) (7)
holds for all states; then for the corresponding repetitive construct DO we
can conclude that
(P and wp(DO, T)) => wp(DO, P and non BB) (8)
for all states.
This theorem is also referred to as the "Fundamental Invariance Theorem
for Loops" and it is intuitively not difficult to understand. Our antecedent
(7) tells us that if P holds initially and one of the guarded commands is
selected for execution, then after its execution, P is still true. In other words,
the guards ensure that the execution of the corresponding statement lists will
not destroy the validity of P when initially valid. No matter how often a
guarded command of the set is selected, P will therefore hold at each new
inspection of the guards. Upon completion of the whole repetitive construct,
when none of the guards is true, we shall therefore end in a final state satisfy-
ing P and non BB. The question is: will it terminate properly? Yes, it will,
provided that wp(DO, T) holds initially as well; as any state satisfies T,
wp(DO, T) is by definition the weakest pre-condition for the initial state
such that activation of the statement DO will lead to a properly terminating
activity.
The formal proof of the basic theorem for the repetitive construct relies
on the formal definition of its semantics (see the previous chapter) from which
we derive
H0(T) = non BB (9)
for k > 0: Hk(T) = wp(IF, Hk-1(T)) or non BB (10)
H0(P and non BB) = P and non BB (11)
for k > 0: Hk(P and non BB) = wp(IF, Hk-1(P and non BB)) or
                              P and non BB (12)

We start by proving via mathematical induction that the antecedent (7)
guarantees that
for k >= 0: (P and Hk(T)) => Hk(P and non BB) (13)
for all states.
Relations (9) and (11) tell us that (13) holds for k = 0. We shall show
that relation (13) can be proved for k = K (K > 0) on the assumption that
(13) holds for k = K - 1.

P and HK(T) = (P and wp(IF, HK-1(T))) or (P and non BB)
            = (P and BB and wp(IF, HK-1(T))) or (P and non BB)
           => (wp(IF, P) and wp(IF, HK-1(T))) or (P and non BB)
            = wp(IF, P and HK-1(T)) or (P and non BB)
           => wp(IF, HK-1(P and non BB)) or (P and non BB)
            = HK(P and non BB)
The equality in the first line follows from (10), the equality in the second
line follows from the fact that any wp(IF, R) => BB, the implication in the
third line follows from (7), the equality in the fourth line from property 3
for predicate transformers, the implication of the fifth line follows from
property 2 for predicate transformers and (13) assumed for k = K - 1, and
the last line follows from (12). Thus (13) has now been proved for k = K
and therefore for all k >= 0.
Finally, for any point in state space we have -thanks to (13)-
P and wp(DO, T) = (E k: k >= 0: P and Hk(T))
               => (E k: k >= 0: Hk(P and non BB))
                = wp(DO, P and non BB)
and thus (8), the basic theorem for the repetitive construct, has been proved.
The basic theorem for the repetitive construct derives its extreme usefulness
from the fact that neither the antecedent nor the consequent mentions the
actual number of times a guarded command has been selected. As a
result it allows assertions even in those cases in which this number is not
determined by the initial state.
6 ON THE DESIGN OF PROPERLY TERMINATING CONSTRUCTS

The basic theorem for the repetitive construct asserts for a condition P
that is kept invariantly true that
(P and wp(DO, T)) => wp(DO, P and non BB)
Here the term wp(DO, T) is the weakest pre-condition such that the
repetitive construct will terminate. Given an arbitrary construct DO it is in
general very hard -if not impossible- to determine wp(DO, T); I therefore
suggest to design our repetitive constructs with the requirement of termina-
tion consciously in mind, i.e. to choose an appropriate proof for termination
and to make the program in such a way that it satisfies the assumptions of
the proof.
Let, again, P be the relation that is kept invariant, i.e.
(P and BB) => wp(IF, P) for all states, (1)
let furthermore t be a finite integer function of the current state such that
(P and BB) => (t > 0) for all states (2)
and furthermore, for any value t0 and for all states
(P and BB and t <= t0 + 1) => wp(IF, t <= t0) (3)

Then we shall prove that
P => wp(DO, T) for all states (4)
from which, together with the basic theorem for repetition we can conclude
that we have for all states
P => wp(DO, P and non BB) (5)

We show this by proving first via mathematical induction that
(P and t <= k) => Hk(T) for all states (6)
holds for all k >= 0. We first establish the truth of (6) for k = 0. As H0(T) =
non BB, we have to show that
(P and t <= 0) => non BB for all states (7)
But (7) is no other expression than (2): both are equal to
non P or non BB or (t > 0)
and thus (6) holds for k = 0.
We now assume that (6) holds for k = K; then
(P and BB and t <= K + 1) => wp(IF, P and t <= K)
                          => wp(IF, HK(T));
(P and non BB and t <= K + 1) => non BB
                              = H0(T)
And these two implications can be combined (from A => C and B => D we
may conclude that (A or B) => (C or D) holds):
(P and t <= K + 1) => wp(IF, HK(T)) or H0(T) = HK+1(T)
and thus the truth of (6) has been established for all k >= 0. Because t is a
finite function, we have
(E k: k >= 0: t <= k)
and
P => (E k: k >= 0: P and t <= k)
  => (E k: k >= 0: Hk(T))
  = wp(DO, T)
and thus (4) has been proved.


Intuitively the theorem is quite clear. On the one hand P will remain true
and therefore t > 0 will remain true as well; on the other hand relation (3)
expresses that each selection of a guarded command will cause an effective
decrease of t by at least 1. An unbounded number of selections of a guarded
command would decrease t below any limit, which would lead to a con-
tradiction.
The applicability of this theorem relies upon the validity of (2) and (3).
Relation (2) is rather straightforward, relation (3) is more tricky. Our basic
theorem for the alternative construct with
Q = (P and BB and t <= t0 + 1)
R = (t <= t0)
-the occurrence of the free variable t0 in both predicates is the reason why
we have talked about "a predicate pair"- tells us that we can conclude that
(3) holds if
(A j: 1 <= j <= n: (P and Bj and t <= t0 + 1) => wp(SLj, t <= t0))
In other words, we have to prove for each guarded command that the selec-
tion will cause an effective decrease of t. Bearing in mind that t is a function
of the current state, we can consider
wp(SLj, t <= t0) (8)
This is a predicate involving, besides the coordinate variables of the state
space, also the free variable t0. Up till now we have regarded such a predicate
as a predicate characterizing a subset of states. For any given state, however,
we can also regard it as a condition imposed upon t0. Let t0 = tmin be the
minimum solution for t0 of equation (8); we can then interpret the value
tmin as the lowest upper bound for the final value of t. Remembering that,
just as t itself, tmin also is a function of the current state, the predicate
tmin <= t - 1
can be interpreted as the weakest pre-condition such that execution of SLj
is guaranteed to decrease the value of t by at least 1. Let us denote this pre-
condition, where -we repeat- the second argument t is an integer valued
function of the current state, by
wdec(SLj, t);
then the invariance of P and the effective decrease of t is guaranteed if we
have for all j:
(P and Bj) => (wp(SLj, P) and wdec(SLj, t)) (9)
A usually practical way for finding a suitable Bj is the following. Equa-
tion (9) is of the type

      (P and Q) => R

where a -practically computable!- Q must be found for given P and R.
We observe that

1. Q = R is a solution.
2. If Q = (Q1 and Q2) is a solution and P => Q2, then Q1 is a solution
   as well.
3. If Q = (Q1 or Q2) is a solution and P => non Q2 (or, what amounts to
   the same thing: (P and Q2) = F), then Q1 is a solution as well.
4. If Q is a solution and Q1 => Q, then Q1 is a solution as well.

Note 1. If, in doing so, we arrive at a candidate Q for Bj such that
P => non Q, this candidate can further be simplified (according to step
(3) from above, because for every Q we have Q = (false or Q)) to Q =
false; this means that the guarded command under consideration is no
good; it can be omitted from the set because it will never be selected.
(End of Note 1.)
Note 2. It is often practical to split equation (9) into the two equations

      (P and Bj) => wp(SLj, P)                                      (9a)
and
      (P and Bj) => wdec(SLj, t)                                    (9b)

and deal with them separately. Thus one separates the two concerns: (9a)
is concerned with what remains invariant, while (9b) is concerned with
what ensures progress. If, while dealing with equation (9a), we arrive
at a Bj such that P => Bj, then it is certain that that condition will not
satisfy (9b), because with such a Bj the invariance of P would ensure
nontermination. (End of Note 2.)
   Thus we can make a mechanism DO such that

      P => wp(DO, P and non BB)

Our Bj's must be strong enough so as to satisfy the implications (9), and as
a result the now guaranteed post-condition P and non BB might be too weak
to imply the desired post-condition R. In that case we have not solved our
problem yet and we should consider other possibilities.
7 EUCLID'S ALGORITHM
REVISITED

At the risk of boring my readers I shall now devote yet another chapter
to Euclid's algorithm. I expect that in the meantime some of my readers will
already have coded it in the form

      x, y := X, Y;
      do x ≠ y → if x > y → x := x - y
                  □ y > x → y := y - x
                 fi
      od;
      print(x)

where the guard of the repetitive construct ensures that the alternative con-
struct will not lead to abortion. Others will have discovered that the algorithm
can be coded more simply as follows:

      x, y := X, Y;
      do x > y → x := x - y
       □ y > x → y := y - x
      od;
      print(x)
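
In a conventional deterministic language the second version could be tran-
scribed as follows -a sketch in Python; the name gcd is mine- where the
if-else arbitrarily resolves what the guarded command set leaves open:

      def gcd(X, Y):
          assert X > 0 and Y > 0
          x, y = X, Y
          while x != y:            # the disjunction of the two guards
              if x > y:
                  x = x - y
              else:                # y > x
                  y = y - x
          return x

      print(gcd(111, 259))         # prints 37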

Let us now try to forget the cardboard game and let us try to invent
Euclid's algorithm for the greatest common divisor of two positive numbers
X and Y afresh. When confronted with such a problem, there are in principle
always two ways open to us.
The one way is to try to follow the definition of the required answer as
closely as possible. Presumably we could form a table of the divisors of X;
this table would only contain a finite number of entries, among which would
be 1 as the smallest and X as the largest entry. (If X = 1, smallest and largest
entry will coincide.) We could then also form a similar table of the divisors
of Y. From those two tables we could form a table of the numbers occurring
in both of them; this then is the table of the common divisors of X and Y
and is certainly nonempty, because it will contain the entry 1. From this
third table we therefore can select (because it is also finite!) the maximum
entry and that would be the greatest common divisor.
Sometimes following the definition closely, as sketched above, is the best
thing we can do. There is, however, an alternative approach to be tried if we
know (or can find) properties of the function to be computed. It may be that
we know so many properties that they together determine the function and
we may try to construct the answer by exploiting those properties.
In the case of the greatest common divisor we observe, for instance, that,
because the divisors of -x are the same as those of x itself, GCD(x, y)
is also defined for negative arguments and not changed if we change the sign
of the arguments. It is also defined when just one of the arguments = 0; that
argument has an infinite table of divisors (and we should therefore not try
to construct that table!), but because the other argument (≠ 0) has a finite
table of divisors, the table of common divisors is still nonempty and finite.
So we come to the conclusion that GCD(x, y) is defined for each pair (x, y)
such that (x, y) ≠ (0, 0). Furthermore, on account of the symmetry of the
notion "common", the greatest common divisor of two numbers is a sym-
metric function of its two arguments. A little more reasoning can convince
us of the fact that the greatest common divisor of two arguments is unchanged
if we replace one of them by their sum or difference. Collecting our know-
ledge we can write down:
for (x, y) ≠ (0, 0):

(a) GCD(x, y) = GCD(y, x).
(b) GCD(x, y) = GCD(-x, y).
(c) GCD(x, y) = GCD(x + y, y) = GCD(x - y, y), etc.
(d) GCD(x, y) = abs(x) if x = y.

Let us suppose for the sake of argument that the above four properties
represent our only knowledge about the GCD-function. Do they suffice?
You see, the first three relations express the greatest common divisor of x
and y in that of another pair, but the last one expresses it directly in terms
of x. And this is strongly suggestive of an algorithm that, to start with,
establishes the truth of
P = (GCD(X, Y) = GCD(x, y))
(this is trivially achieved by the assignment "x, y:= X, Y"), whereafter we
"massage" the value pair (x, y) in such ways, that according to (a), (b) or
(c) relation Pis kept invariant. If we can manage this massaging process so
as to reach a state satisfying x = y, then, according to (d), we have found
our answer by taking the absolute value of x.
Because our ultimate goal is to establish under invariance of P the truth
of x = y we could try as monotonically decreasing function t = abs(x - y).
In order to simplify our analysis -always a laudable goal!- we observe
that, when starting with nonnegative values for x and y, there is nothing to
be gained by introducing a negative value: if the assignment x: = E would
have established x < 0, the assignment x:= -Ewould never have given rise
to a larger final value oft (because y > 0). We therefore sharpen our relation
P to be kept invariant:
P =(Pl and P2)
with
Pl= (GCD(X, Y) = GCD(x,y))
and
P2 = (x > 0 and y > 0)
This means that we have lost all usage for the operations x: = -x and
y:= -y, the massagings permissible on account of property (b). We are
left with
from (a): x,y:=y,x
from (c): x:= x +y y:= y +x
x:= x - y y:=y-x
x:= y- x y:= x - y
Let us deal with them in turn and start with x, y := y, x:

      wp("x, y := y, x", abs(x - y) ≤ t0) = (abs(y - x) ≤ t0)
therefore
      tmin(x, y) = abs(y - x)
hence
      wdec("x, y := y, x", abs(x - y)) = (abs(y - x) ≤ abs(x - y) - 1) = F.

And here -for those who would not believe it without a formal deriva-
tion- we have proved (or, if you prefer, discovered) by means of our cal-
culus that the massaging operation x, y := y, x is no good because it fails to
cause an effective decrease of our t as chosen.
   The next trial is x := x + y and we find, again applying the calculus of
the preceding chapters:

      wp("x := x + y", abs(x - y) ≤ t0) = (abs(x) ≤ t0)
      tmin(x, y) = abs(x) = x   (we confine ourselves to states satisfying P)
      wdec("x := x + y", abs(x - y)) = (tmin(x, y) ≤ t(x, y) - 1)
                                     = (x ≤ abs(x - y) - 1)
                                     = (x + 1 ≤ abs(x - y))
                                     = (x + 1 ≤ x - y or x + 1 ≤ y - x)

Because P implies the negation of the first term and furthermore P =>
wp("x := x + y", P), the equation for our guard

      (P and Bj) => (wp(SLj, P) and wdec(SLj, t))

is satisfied by the last term and we have found our first and -for reasons of
symmetry also- our second guarded command:

      x + 1 ≤ y - x → x := x + y
and
      y + 1 ≤ x - y → y := y + x

Similarly we find (the formal manipulations are left as an exercise for the
industrious reader)

      1 ≤ y and 3 * y ≤ 2 * x - 1 → x := x - y
and
      1 ≤ x and 3 * x ≤ 2 * y - 1 → y := y - x
and
      x + 1 ≤ y - x → x := y - x
and
      y + 1 ≤ x - y → y := x - y
Investigating what we have got, we must come to the sad conclusion that,
in the manner mentioned at the close of our previous chapter, we have failed
to solve our problem: P and non BB does not imply x = y. (For instance,
for (x, y) = (5, 7) all the guards are false.) The moral of the story is, of course,
that our six steps do not always provide a path from initial state to final
state, such that abs(x - y) is monotonically decreasing. So we must try
"other possibilities".
To start with, we observe that there is no harm in making P2 a little
stronger:

      P2 = (x > 0 and y > 0)

for the initial values of x and y satisfy it and, also, there is no point in generat-
ing a value = 0, for this value can only be generated by subtraction in a state
where x = y and then the final state has already been reached. But this is
only a minor modification; the major modification must come from a new
function t, and I suggest taking a t that is only bounded from below thanks
to the invariant relation P. An obvious example is

      t = x + y
We find for the concurrent assignment

      wdec("x, y := y, x", x + y) = F

so the concurrent assignment is rejected.
   We find for the assignment x := x + y

      wdec("x := x + y", x + y) = (y < 0)

an expression the truth of which is excluded by the truth of the invariant
relation P, and therefore (together with y := y + x) also this one is rejected.
   For the next assignment x := x - y, however, we find

      wdec("x := x - y", x + y) = (y > 0)

a condition that is implied by P (which I have strengthened for this reason).
Full of hope we investigate

      wp("x := x - y", P) = (GCD(X, Y) = GCD(x - y, y) and
                             x - y > 0 and y > 0)

the outermost terms can be dropped as they are implied by P and we are left
with the middle one; thus we find

      x > y → x := x - y
and
      y > x → y := y - x

and now we could stop the investigation, for when both guards have become
false, our desired relation x = y holds. If we were to proceed we would
find a third and a fourth alternative:

      x > y - x and y > x → x := y - x
and
      y > x - y and x > y → y := x - y

but it is not clear what could be gained by their inclusion.

EXERCISES

1. Investigate for the same P the choice t = max(x, y).

2. Investigate for the same P the choice t = x + 2 * y.

3. Prove that for X > 0 and Y > 0 the following program, operating on four
   variables

      x, y, u, v := X, Y, Y, X;
      do x > y → x, v := x - y, v + u
       □ y > x → y, u := y - x, u + v
      od;
      print((x + y)/2); print((u + v)/2)

   prints the greatest common divisor of X and Y, followed by their smallest
   common multiple. (End of exercises.)
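
Before attempting the requested proof, the reader may care to watch the
third program run; a direct transcription into Python -the name gcd_scm is
mine- suggests the claimed result, and the comment records an invariant
on which a proof can be based:

      def gcd_scm(X, Y):
          assert X > 0 and Y > 0
          x, y, u, v = X, Y, Y, X
          while x != y:                # invariant: x*u + y*v = 2*X*Y
              if x > y:
                  x, v = x - y, v + u
              else:
                  y, u = y - x, u + v
          return (x + y) // 2, (u + v) // 2

      print(gcd_scm(12, 18))           # prints (6, 36)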

Finally, if our little algorithm is activated with a pair (X, Y) that does
not satisfy our assumption X > 0 and Y > 0, unpleasant things will happen:
if (X, Y) = (0, 0), it will produce the erroneous result zero, and if one of the
arguments is negative, activation will set an endless activity in motion. This
can be prevented by writing

      if X > 0 and Y > 0 →
            x, y := X, Y;
            do x > y → x := x - y  □ y > x → y := y - x od;
            print(x)
      fi

By providing only one alternative in the alternative construct we have clearly
expressed the conditions under which this little program is expected to work.
In this form it is a well-protected and rather self-contained piece with the
more pleasant property that attempted activation outside its domain will
lead to immediate abortion.
8 THE FORMAL TREATMENT
OF SOME SMALL EXAMPLES

In this chapter I shall give the formal development of a series of small
programs solving simple problems. This chapter should not be interpreted
as my suggestion that these programs must or should be developed in such
a way: such a suggestion would be somewhat ridiculous. I expect most of
my readers to be familiar with most of the examples and, if not, they can
probably write down a program, hardly aware of having to think about it.
The development, therefore, is given for quite other reasons. One reason
is to make ourselves more familiar with the formalism as far as it has been
developed up till now. A second reason is to convince ourselves that, in
principle at least, the formalism is able to make explicit and quite rigorous
what is often justified with a lot of hand-waving. A third reason is precisely
that most of us are so familiar with them that we have forgotten how, a
long time ago, we have convinced ourselves of their correctness: in this
respect this chapter resembles the beginning lessons in plane geometry that
are traditionally devoted to proving the obvious. Fourthly, we may occa-
sionally get a little surprise and discover that a little familiar problem is not
so familiar after all. Finally it may shed some light on the feasibility, the
difficulties, and the possibilities of automatic program composition or mechan-
ical assistance in the programming process. This could be of importance
even if we do not have the slightest interest in automatic program composi-
tion, for it may give us a better appreciation of the role that our inventive
powers may or have to play.
In my examples I shall state requirements of the form "for fixed x,
y, ..."; this is an abbreviation for "for any values x0, y0, ... a post-condi-
tion of the form x = x0 and y = y0 and ... should give rise to a pre-condi-
tion implying x = x0 and y = y0 and ...". We shall guarantee this by
treating such quantities as "temporary constants"; they will not occur to
the left of an assignment statement.
First example.
   Establish for fixed x and y the relation R(m):

      (m = x or m = y) and m ≥ x and m ≥ y

For general values of x and y the relation m = x can only be established
by the assignment m := x; as a consequence (m = x or m = y) can only be
established by activating either m := x or m := y. In flow-chart form:

      FIGURE 8-1

The point is that at the entry the good choice must be made so as to
guarantee that upon completion R(m) holds. For this purpose we "push the
post-condition through the alternatives":

      FIGURE 8-2

and we have derived the guards! As

      R(x) = ((x = x or x = y) and x ≥ x and x ≥ y) = (x ≥ y)
and
      R(y) = ((y = x or y = y) and y ≥ x and y ≥ y) = (y ≥ x)

we arrive at our solution:

      if x ≥ y → m := x  □ y ≥ x → m := y fi

Because (x ≥ y or y ≥ x) = T, the program will never abort (and in passing
we have given an existence proof: for any values x and y there exists an m
satisfying R(m)). Because (x ≥ y and y ≥ x) ≠ F, our program is not
necessarily deterministic. If initially x = y, it is undetermined which of the
two assignments will be selected for execution; this nondeterminacy is fully
correct, because we have shown that the choice does not matter.
   Note. If the function "max" had been an available primitive, we could
have coded m := max(x, y) because R(max(x, y)) = T. (End of note.)
The program we have derived is not very impressive; on the other hand
we observe that in the process of deriving the program from our post-condi-
tion, next to nothing has been left to our invention.
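
The nondeterminacy itself can be made tangible. In the following sketch
-Python of my own; the names alt and set_max are merely illustrative- an
arbitrary one among the true guards is selected, and for x = y either
selection establishes R(m):

      import random

      def alt(state, guarded_commands):    # the construct if ... fi
          enabled = [c for g, c in guarded_commands if g(state)]
          assert enabled                   # no true guard means abortion
          random.choice(enabled)(state)

      def set_max(s):
          alt(s, [(lambda s: s['x'] >= s['y'], lambda s: s.update(m=s['x'])),
                  (lambda s: s['y'] >= s['x'], lambda s: s.update(m=s['y']))])

      s = {'x': 3, 'y': 3}
      set_max(s)                           # either assignment may be selected
      print(s['m'])                        # prints 3 in both cases
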

Second example.
   For a fixed value of n (n > 0) a function f(i) is given for 0 ≤ i < n.
Establish the truth of R:

      0 ≤ k < n and (A i: 0 ≤ i < n: f(k) ≥ f(i))

Because our program must work for any positive value of n it is hard to
see how R can be established without a loop; we are therefore looking for
a relation P that is easily established to start with and such that eventually
(P and non BB) => R. In search of P we are therefore looking for a relation
weaker than R; in other words, we want a generalization of our final state.
A standard way of generalizing a relation is the replacement of a constant
by a variable -possibly with a restricted range- and here my experience
suggests that we replace the constant n by a new variable, j say, and take
for P:

      0 ≤ k < j ≤ n and (A i: 0 ≤ i < j: f(k) ≥ f(i))

where the condition j ≤ n has been added in order to do justice to the finite
domain of the function f. Then, with such a generalization, we have trivially

      (P and j = n) => R

In order to verify whether this choice of P can be used, we must have an
easy way of establishing it to start with. Well, because

      (k = 0 and j = 1) => P

we venture the following structure for our program (comments are added
between braces).

      k, j := 0, 1 {P has been established};
      do j ≠ n → a step towards j = n under invariance of P od
      {R has been established}

Again my experience suggests choosing as monotonically decreasing
function t of the current state t = (n - j), which, indeed, is such that P =>
(t ≥ 0). In order to ensure this monotonic decrease of t, I propose to subject
j to an increase by 1 and we can develop

      wp("j := j + 1", P) =
         0 ≤ k < j + 1 ≤ n and (A i: 0 ≤ i < j + 1: f(k) ≥ f(i)) =
         0 ≤ k < j + 1 ≤ n and (A i: 0 ≤ i < j: f(k) ≥ f(i)) and f(k) ≥ f(j)

The first two terms are implied by P and j ≠ n (for (j ≤ n and j ≠ n) =>
(j + 1 ≤ n), and this is the reason why we decided to increase j only by 1).
Therefore

      (P and j ≠ n and f(k) ≥ f(j)) => wp("j := j + 1", P)

and we can take the last condition as guard. The program

      k, j := 0, 1;
      do j ≠ n → if f(k) ≥ f(j) → j := j + 1 fi od

will indeed give the correct answer when it terminates properly. Proper
termination, however, is not guaranteed, because the alternative construct
might lead to abortion -and it will certainly do so if k = 0 does not satisfy
R. If f(k) ≥ f(j) does not hold, we can make it hold by the assignment
k := j and therefore our next investigation is

      wp("k, j := j, j + 1", P) =
         0 ≤ j < j + 1 ≤ n and (A i: 0 ≤ i < j + 1: f(j) ≥ f(i)) =
         0 ≤ j < j + 1 ≤ n and (A i: 0 ≤ i < j: f(j) ≥ f(i))

To our great relief we see that

      (P and j ≠ n and f(k) ≤ f(j)) => wp("k, j := j, j + 1", P)

and the following program will do the job without the danger of abortion:

      k, j := 0, 1;
      do j ≠ n → if f(k) ≥ f(j) → j := j + 1
                  □ f(k) ≤ f(j) → k, j := j, j + 1 fi od
A few remarks are in order. The first one is that, as the guards of the
alternative construct do not necessarily exclude each other, the program
harbours the same kind of internal nondeterminacy as the first example.
Externally it may display this nondeterminacy as well. The function f could
be such that the final value of k is not unique; in that case our program can
deliver any acceptable value!
The second remark is that having developed a correct program does not
mean that we are through with the problem. Programming is as much a
mathematical discipline as an engineering discipline; correctness is as much
our concern as, say, efficiency. Under the assumption that the computation
of a value of the function f for a given argument is a relatively time-consum-
ing operation, a good engineer should observe that in all probability this
program will often ask for many re-computations of f(k) for the same value
of k. If this is the case, the trading of some storage space against some com-
putation time is indicated. The effort to make our program more time-
efficient, however, should never be an excuse to make a mess of it. (This is
obvious, but I state it explicitly because so much messiness is so often defend-
ed by an appeal to efficiency considerations. However, upon closer inspection
the defense is always invalid: it must be, for a mess is never defensible.) The
orderly technique for trading storage space versus computation time is the
introduction of one or more redundant variables, the value of which can be
used because some relation is kept invariant. In this example the observation
of the possibly frequent re-computation of f(k) for the same value of k sug-
gests the introduction of a further variable, max say, and to extend the
invariant relation with the further term

      max = f(k)
This relation must be established upon initialization of k and be kept invari-
ant -by explicit assignment to max- upon modification of k. We arrive at
the following program

      k, j, max := 0, 1, f(0);
      do j ≠ n → if max ≥ f(j) → j := j + 1
                  □ max ≤ f(j) → k, j, max := j, j + 1, f(j) fi od

This program is probably much more efficient than our previous version.
If it is, a good engineer does not stop here, because he will now observe that
for the same value of j he might order a number of times the computation
of f(j). It is suggested to introduce a further variable, h say (short for "help"),
and to keep

      h = f(j)

invariant. This, however, is something that we cannot do on the same global
level as with our previous term: the value j = n is not excluded and for that
value f(j) is not necessarily defined. The relation h = f(j) is therefore re-
established every time j ≠ n has just been checked; upon completion of the
outer guarded command -"just before the od" so to speak- we have
h = f(j - 1), but we don't bother and leave it at that.

      k, j, max := 0, 1, f(0);
      do j ≠ n → h := f(j);
                 if max ≥ h → j := j + 1
                  □ max ≤ h → k, j, max := j, j + 1, h fi od
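
In Python the last version could read as follows -the name maxpos is
mine- with one evaluation of f per repetition:

      def maxpos(f, n):
          assert n > 0
          k, j, mx = 0, 1, f(0)    # mx = f(k) and f(k) maximal on 0 <= i < j
          while j != n:
              h = f(j)             # h = f(j), computed once
              if mx >= h:
                  j = j + 1
              else:                # mx <= h; for mx = h either step is correct
                  k, j, mx = j, j + 1, h
          return k

      print(maxpos(lambda i: 7 * i % 5, 6))    # prints 2: f(2) = 4 is maximal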

A final remark is not so much concerned with our solution as with our
considerations. We have had our mathematical concerns, we have had our
engineering concerns, and we have accomplished a certain amount of separa-
tion between them, now focussing our attention on this aspect and then on
that aspect. While such a separation of concerns is absolutely essential when
dealing with more complicated problems, I must stress that focussing one's
attention on one aspect does not mean completely ignoring the others. In
the more mathematical part of the design activity we should not head for a
mathematically correct program that is so badly engineered that it is beyond
salvation. Similarly, while "trading" we should not introduce errors through
sloppiness, we should do it carefully and systematically; also, although the
mathematical analysis as such has been completed, we should still understand
enough about the problem to judge whether our considered changes are
significant improvements.
Note. Prior to my getting used to these formal developments I would
always have used "j < n" as the guard for this repetitive construct, a
habit I still have to unlearn, for in a case like this, the guard "j ≠ n" is
certainly to be preferred. The reason for the preference is twofold. The
guard "j ≠ n" allows us to conclude j = n upon termination without an
appeal to the invariant relation P and thus simplifies the argument about
what the whole construct achieves for us compared with the guard
"j < n". Much more important, however, is that the guard "j ≠ n" makes
termination dependent upon (part of) the invariant relation, viz. j ≤ n,
and is therefore to be preferred for reasons of robustness. If the addition
j := j + 1 would erroneously increase j too much and would establish
j > n, then the guard "j < n" would give no alarm, while the guard
"j ≠ n" would at least prevent proper termination. Even without taking
machine malfunctioning into account, this argument seems valid. Let
a sequence x0, x1, x2, ... be given by a value for x0 and for i > 0 by
xi = f(xi-1), where f is some computable function, and let us carefully
and correctly keep the relation X = xi invariant. Suppose that we have
in a program a monotonically increasing variable n such that for some
values of n we are interested in xn. Provided n ≥ i, we can always establish
X = xn by

      do i ≠ n → i, X := i + 1, f(X) od

If -due perhaps to a later change in the program with the result that it
is no longer guaranteed that n can only increase as the computation
proceeds- the relation n ≥ i does not necessarily hold, the above con-
struct would (luckily!) fail to terminate, while the use of the terminating

      do i < n → i, X := i + 1, f(X) od

would have failed to establish the relation X = xn. The moral of the story
is that, all other things being equal, we should choose our guards as weak
as possible. (End of note.)

Third example.
   For fixed a (a ≥ 0) and d (d > 0) it is requested to establish R:

      0 ≤ r < d and d | (a - r)

(Here the vertical bar "|" is to be read as "is a divisor of".) In other words
we are requested to compute the smallest nonnegative remainder r that is
left after division of a by d. In order that the problem be a problem, we have
to restrict ourselves to addition and subtraction as the only arithmetic opera-
tions. Because the term d | (a - r) is satisfied by r = a, an initialization that,
on account of a ≥ 0, also satisfies 0 ≤ r, it is suggested to choose as invariant
relation P:

      0 ≤ r and d | (a - r)

For the function t, the decrease of which should ensure termination, we
choose r itself. Because the massaging of r must be such that the relation
d | (a - r) is kept invariant, r may only be changed by a multiple of d, for
instance d itself. Thus we find ourselves invited to evaluate

      wp("r := r - d", P) and wdec("r := r - d", r) =
         0 ≤ r - d and d | (a - r + d) and d > 0

Because the term d > 0 could have been added to the invariant relation
P, only the first term is then not implied; we find the corresponding guard
"r ≥ d" and the tentative program:

      if a ≥ 0 and d > 0 →
            r := a;
            do r ≥ d → r := r - d od
      fi

Upon completion the truth of P and non r ≥ d has been established, a
relation that implies R, and thus the problem has been solved.
   Suppose now that in addition it would have been required to assign to
q such a value that finally we also have

      a = d * q + r

in other words it is requested to compute the quotient as well; then we can
try to add this term to our invariant relation. Because

      (a = d * q + r) => (a = d * (q + 1) + (r - d))

we are led to the program:
      if a ≥ 0 and d > 0 →
            q, r := 0, a;
            do r ≥ d → q, r := q + 1, r - d od
      fi

The above programs are, of course, very time-consuming if the quotient
is large. Can we speed it up? The obvious way to do that is to decrease r
by larger multiples of d. Introducing for this purpose a new variable, dd say,
the relation to be established and kept invariant is

      d | dd and dd ≥ d

We can speed up our first program by replacing "r := r - d" by a
possibly repeated decrease of r by dd, while dd, initially = d, is allowed to
grow rather rapidly, e.g. by doubling it each time. So we are led to consider
the following program

      if a ≥ 0 and d > 0 →
            r := a;
            do r ≥ d →
                  dd := d;
                  do r ≥ dd → r := r - dd; dd := dd + dd od
            od
      fi

The relation 0 ≤ r and d | (a - r) is clearly kept invariant, and therefore this
program establishes R if it terminates properly, but does it? Of course it
does, because the inner loop, which terminates on account of dd > 0, is only
activated in initial states satisfying r ≥ dd, and therefore the decrease
r := r - dd is performed at least once for every repetition of the outer loop.
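
Rendered in Python -the name remainder is mine- the sped-up program
reads:

      def remainder(a, d):
          assert a >= 0 and d > 0
          r = a
          while r >= d:            # invariant: 0 <= r and d | (a - r)
              dd = d
              while r >= dd:       # r decreases at least once per outer step
                  r = r - dd
                  dd = dd + dd     # dd grows rapidly by doubling
          return r                 # R: 0 <= r < d and d | (a - r)

      print(remainder(100, 7))     # prints 2
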
But the above reasoning -although convincing enough!- is a very
informal one and because this chapter is called "a formal treatment" we can
try to formulate and prove the theorem to which we have appealed intuitively.
   With the usual meanings of IF, DO, and BB, let P be the relation that is
kept invariant, i.e.

      (P and BB) => wp(IF, P)   for all states                      (1)

and furthermore let t be an integer function such that for any value of t0
and for all states

      (P and BB and t ≤ t0 + 1) => wp(IF, t ≤ t0)                   (2)

or, in an equivalent formulation,

      (P and BB) => wdec(IF, t)   for all states,                   (3)

then for any value of t0 and for all states

      (P and BB and wp(DO, T) and t ≤ t0 + 1) => wp(DO, t ≤ t0)     (4)

or, in an equivalent formulation,

      (P and BB and wp(DO, T)) => wdec(DO, t)                       (5)

In words: if the relation P that is kept invariant guarantees that each selected
guarded command causes an effective decrease of t, then the repetitive
construct will cause an effective decrease of t if it terminates properly after
at least one execution of a guarded command. The theorem is so obvious
that it would be a shame if it were difficult to prove, but luckily it is not.
We shall show that from (1) and (2) it follows that for any value t0 and all
states

      (P and BB and Hk(T) and t ≤ t0 + 1) => Hk(t ≤ t0)             (6)

for all k ≥ 0. It holds for k = 0, because (BB and H0(T)) = F, and we have
to derive from the assumption that (6) holds for k = K that it holds for
k = K + 1 as well.

      (P and BB and HK+1(T) and t ≤ t0 + 1)
         => wp(IF, P) and wp(IF, HK(T)) and wp(IF, t ≤ t0)
         =  wp(IF, P and HK(T) and t ≤ t0)
         => wp(IF, (P and BB and HK(T) and t ≤ t0 + 1) or (t ≤ t0 and non BB))
         => wp(IF, HK(t ≤ t0) or H0(t ≤ t0))
         =  wp(IF, HK(t ≤ t0))
         => wp(IF, HK(t ≤ t0)) or H0(t ≤ t0)
         =  HK+1(t ≤ t0)

The first implication follows from (1), the definition of HK+1(T), and (2);
the equality in the third line is obvious; the implication in the fourth line
is derived by taking the conjunction with (BB or non BB) and then weakening
both terms; the implication in the fifth line follows from (6) for k = K and
the definition of H0(t ≤ t0); the rest is straightforward. Thus relation (6)
has been proved for all k ≥ 0, and from it results (4) and (5) follow imme-
diately.

EXERCISE

Modify also our second program in such a way that it computes the quotient as well
and give a formal correctness proof for your program. (End of exercise.)

Let us assume next that there is a small number, 3 say, by which we are
allowed to multiply and to divide and that these operations are sufficiently
fast so that they are attractive to use. We shall denote the product by "m * 3"
-or by "3 * m"- and the quotient by "m / 3"; the latter expression will
only be called for evaluation provided initially 3 | m holds. (We are working
with integer numbers, aren't we?)
   Again we try to establish the desired relation R by means of a repetitive
construct, for which the invariant relation P is derived by replacing a constant
by a variable. Replacing the constant d by the variable dd whose values will
be restricted to d * (a power of 3), we come to the invariant relation P:

      0 ≤ r < dd and dd | (a - r) and (E i: i ≥ 0: dd = d * 3^i)

We shall establish the relation and then try to reach, while keeping it invari-
ant, a state satisfying d = dd.
   In order to establish it, we need a further repetitive construct: first we
establish

      0 ≤ r and dd | (a - r) and (E i: i ≥ 0: dd = d * 3^i)

and then let dd grow until it is large enough and r < dd is satisfied as well.
The following program would do:

      if a ≥ 0 and d > 0 →
            r, dd := a, d;
            do r ≥ dd → dd := dd * 3 od;
            do dd ≠ d → dd := dd / 3;
                        do r ≥ dd → r := r - dd od
            od
      fi

EXERCISE

Modify also the above program in such a way that it computes the quotient as well
and give a formal correctness proof for your program. This proof has to demon-
strate that whenever dd/3 is computed, originally 3 | dd holds. (End of exercise.)

The above program exhibits a quite common characteristic. On the outer
level we have two repetitive constructs in succession; when we have two or
more repetitive constructs on the same level in succession, the guarded com-
mands of the later ones tend to be more elaborate than those of the earlier
ones. (This is known as "Dijkstra's Law", which does not always hold.) The
reason for this tendency is clear: each repetitive construct adds its "and non
BB" to the relation it keeps invariant and that additional relation has to be
kept invariant by the next one as well. But for the inner loop, the second
one is exactly the inverse of the first one; and it is precisely the function of
the added statement

      do r ≥ dd → r := r - dd od

to restore the potentially destroyed relation r < dd, i.e. the achievement of
the first loop.

Fourth example.
   For fixed Q1, Q2, Q3, and Q4 it is requested to establish R, where R
is given as R1 and R2 with

      R1: The sequence of values (q1, q2, q3, q4) is a permutation
          of the sequence of values (Q1, Q2, Q3, Q4)
      R2: q1 ≤ q2 ≤ q3 ≤ q4

Taking R1 as the relation P to be kept invariant, a possible solution is

      q1, q2, q3, q4 := Q1, Q2, Q3, Q4;
      do q1 > q2 → q1, q2 := q2, q1
       □ q2 > q3 → q2, q3 := q3, q2
       □ q3 > q4 → q3, q4 := q4, q3
      od

The first assignment obviously establishes P and no guarded command
destroys it. Upon termination we have non BB, and that is relation R2. The
way in which people convince themselves that it does terminate depends
largely on their background: a mathematician might observe that the number
of inversions decreases, an operations researcher will interpret it as maximiz-
ing q1 + 2*q2 + 3*q3 + 4*q4, and I, as a physicist, just "see" the center of
gravity moving in the one direction (to the right, to be quite precise). The
program is remarkable in the sense that, whatever we would have chosen
for the guards, never would there be the danger of destroying relation P:
the guards are in this example a pure consequence of the requirement of
termination.
   Note. Observe that we could have added other alternatives, such as

      q1 > q3 → q1, q3 := q3, q1

as well; they cannot be used to replace one of the given three.
(End of note.)
   It is a nice example of the kind of clarity that our nondeterminacy has
made possible to achieve; needless to say, however, I do not recommend
sorting a large number of values in an analogous manner.
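
Nevertheless, for the curious, a sketch in Python -the name sort4 is mine-
that makes the arbitrary selection among the true guards explicit:

      import random

      def sort4(q):                      # q: a list of four different values
          while True:
              enabled = [i for i in range(3) if q[i] > q[i + 1]]
              if not enabled:            # non BB, i.e. R2 holds
                  return q
              i = random.choice(enabled) # any true guard will do
              q[i], q[i + 1] = q[i + 1], q[i]

      print(sort4([7, 4, 2, 9]))         # prints [2, 4, 7, 9]
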
Fifth example.
   We are requested to design a program approximating a square root;
more precisely: for fixed n (n ≥ 0) the program should establish

      R: a^2 ≤ n and (a + 1)^2 > n

One way of weakening this relation is to drop one of the terms of the
conjunction, e.g. the last one, and focus upon

      P: a^2 ≤ n

a relation that is obviously satisfied by a = 0, so that the initialization need
not bother us. We observe that if the second term is not satisfied, this is due
to the fact that a is too small and we could therefore consider the statement
"a := a + 1". Formally we find

      wp("a := a + 1", P) = ((a + 1)^2 ≤ n)

Taking this condition as -the only!- guard, we have (P and non BB) = R
and therefore we are invited to consider the program

      if n ≥ 0 →
            a := 0 {P has been established};
            do (a + 1)^2 ≤ n → a := a + 1 {P has not been destroyed} od
            {R has been established}
      fi {R has been established}

all under the assumption that the program terminates, which it does
thanks to the fact that the square of a nonnegative number is a monotonically
increasing function: we can take for t the function n - a^2.
This program is not very surprising; it is not very efficient either: for large
values of n it could be rather time-consuming. Another way of generalizing
R is by the introduction of another variable (b say -and again restricting its
range) that is to replace part of R, for instance

      P: a^2 ≤ n and b^2 > n and 0 ≤ a < b

By the way this has been chosen it has the pleasant property that

      (P and (a + 1 = b)) => R

Thus we are led to consider a program of the form (from now on omitting
the if n ≥ 0 → ... fi)

      a, b := 0, n + 1 {P has been established};
      do a + 1 ≠ b → decrease b - a under invariance of P od
      {R has been established}

Each time the guarded command is executed, let d be the amount by which
the difference b - a is decreased. Decreasing this difference can be done by
either decreasing b or increasing a or both. Without loss of generality we
can restrict ourselves to such steps in which either a or b is changed, but not
both: if a is too small and b is too large and in one step only b is decreased,
then a can be increased in a next step. This consideration leads to a program
of the following form.

      a, b := 0, n + 1 {P has been established};
      do a + 1 ≠ b →
            d := ... {d has a suitable value and P is still valid};
            if ... → a := a + d {P has not been destroyed}
             □ ... → b := b - d {P has not been destroyed}
            fi {P has not been destroyed}
      od {R has been established}
Now

      wp("a := a + d", P) = ((a + d)^2 ≤ n and b^2 > n)

which, because P implies the second term, leads to the first term as our first
guard; the second guard is derived similarly and our next form is

      a, b := 0, n + 1;
      do a + 1 ≠ b → d := ...;
            if (a + d)^2 ≤ n → a := a + d
             □ (b - d)^2 > n → b := b - d
            fi {P has not been destroyed}
      od {R has been established}

We are still left with a suitable choice for d. Because we have chosen b - a
(actually, b - a - 1) as our function t, effective decrease implies that d
must satisfy d > 0. Furthermore the following alternative construct may not
lead to abortion, i.e. at least one of the guards must be true. That is, the
negation of the first, (a + d)^2 > n, must imply the other, (b - d)^2 > n;
this is guaranteed if

      a + d ≤ b - d
or
      2 * d ≤ b - a

Besides a lower bound we have also found an upper bound for d. We could
choose d = 1, but the larger d is, the faster the program, and therefore we
propose:

      a, b := 0, n + 1;
      do a + 1 ≠ b → d := (b - a) div 2;
            if (a + d)^2 ≤ n → a := a + d
             □ (b - d)^2 > n → b := b - d
            fi
      od

where n div 2 is given by n/2 if 2 | n and by (n - 1)/2 if 2 | (n - 1).
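
A transcription into Python -the name isqrt is mine- of this interval-
halving version:

      def isqrt(n):
          assert n >= 0
          a, b = 0, n + 1          # P: a^2 <= n and b^2 > n and 0 <= a < b
          while a + 1 != b:
              d = (b - a) // 2     # d > 0 and 2*d <= b - a
              if (a + d) ** 2 <= n:
                  a = a + d
              else:                # then (b - d)^2 > n: no abortion
                  b = b - d
          return a                 # R: a^2 <= n and (a + 1)^2 > n

      print(isqrt(24), isqrt(25))  # prints 4 5
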


The use of the operator div suggests that we should see what happens if
we impose upon ourselves the restriction that whenever d is computed, b - a
should be even. Introducing c = b - a and eliminating the b, we get the
invariant relation

      P: a^2 ≤ n and (a + c)^2 > n and (E i: i ≥ 0: c = 2^i)

and the program (in which the roles of c and d have coincided)

      a, c := 0, 1; do c^2 ≤ n → c := 2 * c od;
      do c ≠ 1 → c := c / 2;
            if (a + c)^2 ≤ n → a := a + c
             □ (a + c)^2 > n → skip
            fi
      od

Note. This program is very much like the last program for the third
example, the computation of the remainder under the assumption that
we could multiply and divide by 3. The alternative construct in our above
program could have been replaced by

      do (a + c)^2 ≤ n → a := a + c od

If the condition for the remainder 0 ≤ r < d had been rewritten
as r < d and (r + d) ≥ d, the similarity would be even more striking.
(End of note.)
Under admission of the danger of beating this little example to death,
I would like to submit the last version to yet another transformation. We
have written the program under the assumption that squaring a number is
among the repertoire of available operations; but suppose it is not and
suppose that multiplying and dividing by (small) powers of 2 are the only
(semi-)multiplicative operations at our disposal. Then our last program as
it stands is no good, i.e. it is no good if we assume that the values of the
variables as directly manipulated by the machine are to be equated to the
values of the variables a and c if this computation were performed "in
abstracto". To put it in another way: we can consider a and c as abstract
variables whose values are represented -according to a convention more
complicated than just identity- by the values of other variables that are in
fact manipulated by the machine. Instead of directly manipulating a and c,
we can let the machine manipulate p, q, and r, such that

      p = a * c
      q = c^2
      r = n - a^2
It is a coordinate transformation and to each path through our (a,c)-space
corresponds a path through our (p,q,r )-space. This is not always true the
other way round, for the values of p, q, and r are not independent: in terms of
p, q, and r we have redundancy and therefore the potential to trade some
storage space against not only computation time but even against the need
to square! (The transformation from a point in (a,c)-space to a point in
(p,q,r)-space has quite clearly been constructed with that objective in mind.)
We can now try to translate all boolean expressions and moves in (a,c)-space
into the corresponding boolean expressions and moves in (p,q,r)-space. If
this can be done in terms of the permissible operations there, we have been
successful. The transformation suggested is indeed adequate and the follow-
ing program is the result (the variable h has been introduced for a very local
optimization):

      p, q, r := 0, 1, n; do q ≤ n → q := q * 4 od;
      do q ≠ 1 → q := q / 4; h := p + q; p := p / 2 {h = 2 * p + q};
            if r ≥ h → p, r := p + q, r - h
             □ r < h → skip
            fi
      od {p has the value desired for a}
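
In Python the transformed program could read as follows -the name isqrt2
is mine- using, besides addition and subtraction, only multiplication and
division by small powers of 2:

      def isqrt2(n):
          assert n >= 0
          p, q, r = 0, 1, n            # p = a*c, q = c^2, r = n - a^2
          while q <= n:
              q = q * 4                # c := 2 * c
          while q != 1:
              q = q // 4               # c := c / 2
              h = p + q
              p = p // 2               # h = 2*p + q
              if r >= h:               # i.e. (a + c)^2 <= n
                  p, r = p + q, r - h  # a := a + c
          return p                     # with c = 1, p = a is the answer

      print(isqrt2(24), isqrt2(25))    # prints 4 5
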

This fifth example has been included because it relates -in an embellished
form- a true design history. When the youngest of our two dogs was only
a few months old I walked with both of them one evening. At the time, I was
preparing my lectures for the next morning, when I would have to address
students with only a few weeks exposure to programming, and I wanted a
simple problem such that I could "massage" the solutions. During that
one-hour walk the first, third, and fourth programs were developed in that
order, but for the fact that the correct introduction of h in the last program
was something I could only manage with the aid of pencil and paper after
I had returned home. The second program, the one manipulating a and b,
which here has been presented as a stepping stone to our third solution, was
only discovered a few weeks later -albeit in a less elegant form than presented
here. A second reason for its inclusion is the relation between the third and
the fourth program: with respect to the latter one the other one represents
our first example of so-called "representational abstraction".

Sixth example.
   For fixed X (X ≥ 1) and Y (Y ≥ 0) the program should establish

      R: z = X^Y

under the -obvious- assumption that exponentiation is not among the
available repertoire. This problem can be solved with the aid of an "abstract
variable", h say; we shall do it with a loop, for which the invariant relation is

      P: h * z = X^Y

and our (equally "abstract") program could be

      h, z := X^Y, 1 {P has been established};
      do h ≠ 1 → squeeze h under invariance of P od
      {R has been established}

The last conclusion is justified because (P and h = 1) => R. The above
program will terminate under the assumption that a finite number of applica-
tions of the operation "squeeze" will have established h = 1. The problem,
of course, is that we are not allowed to represent the value of h by that of a
concrete variable directly manipulated by the machine; if we were allowed
to do that, we could have assigned the value of X^Y immediately to z, not
bothering about introducing h at all. The trick is that we can introduce two
-at this level, concrete- variables, x and y say, to represent the current value
of h, and our first assignment suggests as convention for this representation

      h = x^y

The condition "h ≠ 1" then translates into "y ≠ 0" and our next task
is to discover an implementable operation "squeeze". Because the product
h * z must remain invariant under squeezing, we should divide h by the same
value by which z is multiplied. In view of the way in which h is represented,
the current value of x is the most natural candidate. Without any further
problems we arrive at the translation of our abstract program

      x, y, z := X, Y, 1 {P has been established};
      do y ≠ 0 → y, z := y - 1, z * x {P has not been destroyed} od
      {R has been established}
Looking at this program we realize that the number of times control
goes through the loop equals the original value Y and we can ask ourselves
whether we can speed things up. Well, the guarded command now has the
task of bringing y down to zero; without changing the value of h, we can
investigate whether we can change the representation of that value, in the
hope of decreasing the value of y. We are just going to try to exploit the fact
that the concrete representation of a value of h as given by x^y is by no means
unique. If y is even, we can halve y and square x, and this will not change h
at all. Just before the squeezing operation we insert the transformation
towards the most attractive representation of h and here is the next program:

      x, y, z := X, Y, 1;
      do y ≠ 0 → do 2 | y → x, y := x * x, y / 2 od;
                 y, z := y - 1, z * x
      od {R has been established}

There exists one value that can be halved indefinitely without becoming odd
and that is the value 0; in other words: the outer guard ensures that the inner
repetition terminates.
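
In Python -the name power is mine- the logarithmic version reads:

      def power(X, Y):
          assert X >= 1 and Y >= 0
          x, y, z = X, Y, 1            # P: z * x^y = X^Y
          while y != 0:
              while y % 2 == 0:        # 2 | y: change the representation of h
                  x, y = x * x, y // 2
              y, z = y - 1, z * x      # the squeezing operation proper
          return z                     # R: z = X^Y

      print(power(3, 10))              # prints 59049
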
I have included this example for various reasons. The discovery that a
mere insertion of what on the abstract level acts like an empty statement
could change an algorithm invoking a number of operations proportional
to Y into one invoking a number of operations only proportional to log( Y)
startled me when I made it. This discovery was a direct consequence of my
forcing myself to think in terms of a single abstract variable. The exponentia-
tion program I knew was the following:

      x, y, z := X, Y, 1;
      do y ≠ 0 → if non 2 | y → y, z := y - 1, z * x  □ 2 | y → skip fi;
                 x, y := x * x, y / 2
      od

This latter program is very well known; it is a program that many of us have
discovered independently of each other. Because the last squaring of x when
y has reached the value 0 is clearly superfluous, this program has often been
cited as supporting the need for what were called "intermediate exits". In
view of our second program I come to the conclusion that this support is
weak.

Seventh example.
   For a fixed value of n (n ≥ 0) a function f(i) is given for 0 ≤ i < n.
Assign to the boolean variable "allsix" the value such that eventually

      R: allsix = (A i: 0 ≤ i < n: f(i) = 6)

holds. (This example shows some similarity to the second example of this
chapter. Note, however, that in this example, n = 0 is allowed as well. In
that case the range for i for the all-quantifier "A" is empty and allsix = true
should hold.) Analogous to what we did in the second example, the invariant
relation

      P: (allsix = (A i: 0 ≤ i < j: f(i) = 6)) and 0 ≤ j ≤ n

suggests itself, because it is easily established for j = 0, while (P and j = n)
=> R. The only thing to do is to investigate how to increase j under invari-
ance of P. We therefore derive

      wp("j := j + 1", P) =
         (allsix = (A i: 0 ≤ i < j + 1: f(i) = 6)) and 0 ≤ j + 1 ≤ n

The last term is implied by P and j ≠ n; it presents no problem because we
had already decided that j ≠ n as a guard is weak enough to conclude R
upon termination. The weakest pre-condition such that the assignment

      allsix := allsix and f(j) = 6

will establish the other term is

      (allsix and f(j) = 6) = (A i: 0 ≤ i < j + 1: f(i) = 6)

a condition that is implied by P. We thus arrive at the program

      allsix, j := true, 0;
      do j ≠ n → allsix := allsix and f(j) = 6;
                 j := j + 1
      od

(In the guarded command we have not used the concurrent assignment, for
no particular reason.)
   By the time that we read this program -or perhaps sooner- we should
get the uneasy feeling that as soon as a function value ≠ 6 has been found,
there is not much point in going on. And indeed, although (P and j = n) => R,
we could have used the weaker

      (P and (j = n or non allsix)) => R

leading to the stronger guard "j ≠ n and allsix" and to the program

      allsix, j := true, 0;
      do j ≠ n and allsix → allsix, j := f(j) = 6, j + 1 od

(Note the simplification of the assignment to allsix, a simplification that is
justified by the stronger guard.)

EXERCISE

Give for the same problem the correctness proof for

      if n = 0 → allsix := true
       □ n > 0 → j := 0;
                 do j ≠ n - 1 and f(j) = 6 → j := j + 1 od;
                 allsix := f(j) = 6
      fi

and also for the still more tricky program (that does away with the need to invoke
the function f from more than one place in the program)

      j := 0;
      do j ≠ n cand f(j) = 6 → j := j + 1 od;
      allsix := j = n

(Here the conditional conjunction operator "cand" has been used in order to do
justice to the fact that f(n) need not be defined.) The last program is one that some
people like very much. (End of exercise.)
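
A rendering in Python of the version with the strengthened guard -the
name allsix is mine- including the empty case n = 0:

      def allsix(f, n):
          assert n >= 0
          b, j = True, 0       # P: b = (A i: 0 <= i < j: f(i) = 6) and j <= n
          while j != n and b:
              b, j = f(j) == 6, j + 1
          return b

      print(allsix(lambda i: 6, 0))                  # prints True
      print(allsix(lambda i: [6, 6, 5, 6][i], 4))    # prints False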

Eighth example.
   Before I can state our next problem, I must first give some definitions
and a theorem. Let p = (p0, p1, ..., pn-1) be a permutation of n (n ≥ 1)
different values pi (0 ≤ i < n), i.e. (i ≠ j) => (pi ≠ pj). Let q = (q0, q1, ...,
qn-1) be a different permutation of the same set of n values. By definition
"permutation p precedes q in the alphabetic order" if and only if for the mini-
mum value of k such that pk ≠ qk we have pk < qk.
   The so-called "alphabetic index" of a permutation of n different values
is the ordinal number given to it when we number the n! possible permuta-
tions arranged in alphabetic order from 0 through n! - 1. For instance, for
n = 3 and the set of values 2, 4, and 7 we have

      index3(2, 4, 7) = 0
      index3(2, 7, 4) = 1
      index3(4, 2, 7) = 2
      index3(4, 7, 2) = 3
      index3(7, 2, 4) = 4
      index3(7, 4, 2) = 5

Let (p0 § p1 § ... § pn-1) denote the permutation of the n different values in
monotonically increasing order, i.e. indexn((p0 § p1 § ... § pn-1)) = 0. (For
example, (4 § 7 § 2) = (2, 4, 7) but also (7 § 2 § 4) = (2, 4, 7).)
   With the above notation we can formulate the following theorem for
n > 1:

      indexn(p0, p1, ..., pn-1) =
         indexn(p0, (p1 § p2 § ... § pn-1)) + indexn-1(p1, p2, ..., pn-1)

(for example, index3(4, 7, 2) = index3(4, 2, 7) + index2(7, 2) = 2 + 1
= 3).
In words: the indexn of a permutation of n different values is the indexn of
the alphabetically first one with the same leftmost value, increased by the
indexn-1 of the permutation of the remaining rightmost n - 1 values. As a
corollary: from

      pn-k < pn-k+1 < ... < pn-1

it follows that indexn(p0, p1, ..., pn-1) is a multiple of k! and vice versa.
   After these preliminaries, we can now describe our problem. We have a
row of n positions (n ≥ 1) numbered in the order from left to right from 0
through n - 1; in each position lies a card with a value written on it such
that no two different cards show the same value.
   When at any moment ci (0 ≤ i < n) denotes the value on the card in
position i, we have initially

      c0 < c1 < ... < cn-1

(i.e. the cards lie sorted in the order of increasing value). For a given value of
r (0 ≤ r < n!) we have to rearrange the cards such that

      R: indexn(c0, c1, ..., cn-1) = r

The only way in which our mechanism can interfere with the cards is via the
execution of the statement

      cardswap(i, j)   with 0 ≤ i, j < n

that will interchange the cards in positions i and j if i ≠ j (and will do nothing
if i = j).

In order to perform this transformation we must find a class of states -
all satisfying a suitable condition P1- such that both initial and final states
are specific instances of that class. Introducing a new variable, s say, an
obvious candidate for P1 is

      indexn(c0, c1, ..., cn-1) = s

as this is easily established initially (viz. by "s := 0") and (P1 and s = r) => R.
Again we ask whether we can think of restricting the range of s, and in
view of its initial value we might try

      P1: indexn(c0, c1, ..., cn-1) = s and 0 ≤ s ≤ r

which would lead to a program of the form

      s := 0 {P1 has been established};
      do s ≠ r → {P1 and s < r}
            increase s by a suitable amount under
            invariance of P1 {P1 still holds}
      od {R has been established}

Our next concern is what to choose for "a suitable amount". Because
our increase of s must be accompanied by a rearrangement of the cards in
order to keep P1 invariant, it seems wise to investigate whether we can find
conditions under which a single cardswap corresponds to a known increase
of s. For a value of k satisfying 1 ≤ k < n, let

      cn-k < cn-k+1 < ... < cn-1

hold; this assumption is equivalent with the assumption k! | s (read "k!
divides s"). Let i = n - k - 1, i.e. ci is the value on the card to the immediate
left of this sequence. Furthermore let ci < cn-1 and let cj be, for j in the range
n - k ≤ j < n, the minimum value such that ci < cj (i.e. cj is the smallest
value to the right of ci exceeding the latter). In that case the operation card-
swap(i, j) leaves the rightmost k values in the same monotonic order and
our theorem about permutations and their indices tells us that k! is the cor-
responding increase of s. It also tells us that when, besides k! | s, we have

      s ≤ r < s + k!

c0 through cn-k-1 have attained their final value.
   I therefore suggest we strengthen our original invariant relation P1 with
the additional relation P2 (fixing the function of a new variable k),

      P2: 1 ≤ k ≤ n and k! | s and r < s + k!

which means that the rightmost k cards still show monotonically increasing
values, while the leftmost n - k cards are in their final positions. We have
decided upon the "major steps" in which we shall walk towards our destina-
tion.

In order to find "the suitable amount" for a major step, the machine
first determines the largest smaller value of k for which r < s + k! no
longer holds (ci with i = n - k - 1 is then too small, but values to the left
of it are all OK) and then increases s by the minimum multiple of k!
needed to make r < s + k! hold again; this is done in "minor steps" of k!
at a time, simultaneously increasing ci with cards to the right of it. In the
following program we introduce the additional variable kfac, satisfying

      P3: kfac = k!

and for the second inner repetition i and j, such that i = n - k - 1 and
either j = n, or i < j < n and cj > ci and cj-1 < ci.

      s := 0 {P1 has been established};
      kfac, k := 1, 1 {P3 has been established as well};
      do k ≠ n → kfac, k := kfac * (k + 1), k + 1 od
      {P2 has been established as well};
      do s ≠ r → {s < r, i.e. at least one and therefore
                  at least two cards have not reached their
                  final position}
            do r < s + kfac → kfac, k := kfac / k, k - 1 od
            {P1 and P3 have been kept true, but in P2
             the last term is replaced by
             s + kfac ≤ r < s + (k + 1) * kfac};
            i, j := n - k - 1, n - k;
            do s + kfac ≤ r → {n - k ≤ j < n}
                  s := s + kfac; cardswap(i, j); j := j + 1
            od {P2 has been restored again: P1 and P2 and P3}
      od {R has been established}
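
A transcription into Python -the name rearrange is mine; cardswap becomes
an exchange of two list elements- may help in following the major and
minor steps:

      def rearrange(n, r):
          assert n >= 1
          c = list(range(n))               # sorted cards: index 0
          s, kfac, k = 0, 1, 1
          while k != n:                    # establish P2 and P3: kfac = n!
              kfac, k = kfac * (k + 1), k + 1
          assert 0 <= r < kfac             # 0 <= r < n!
          while s != r:
              while r < s + kfac:          # find the k of the next major step
                  kfac, k = kfac // k, k - 1
              i, j = n - k - 1, n - k
              while s + kfac <= r:         # minor steps of k! each
                  s = s + kfac
                  c[i], c[j] = c[j], c[i]  # cardswap(i, j)
                  j = j + 1
          return c

      print([rearrange(3, r) for r in range(6)])
      # prints the six permutations of [0, 1, 2] in alphabetic order
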

EXERCISE

Convince yourself of the fact that also the following rather similar program would
have done the job:

      s := 0; kfac, k := 1, 1;
      do k ≠ n → kfac, k := kfac * (k + 1), k + 1 od;
      do k ≠ 1 →
            kfac, k := kfac / k, k - 1;
            i, j := n - k - 1, n - k;
            do s + kfac ≤ r →
                  s := s + kfac; cardswap(i, j); j := j + 1
            od
      od

(Hint: the monotonically decreasing function t ≥ 0 for the outer repetition is
t = r - s + k - 1.) (End of exercise.)
9 ON NONDETERMINACY
BEING BOUNDED

This is again a very formal chapter. In the chapter "The Characterization
of Semantics" we have mentioned four properties that wp(S, R), for any S
considered as a function of R, should have if its interpretation as the weakest
pre-condition for establishing R is to be feasible. (For nondeterministic
mechanisms the fourth property was a direct consequence of the second one.)
In the next chapter, "The Semantic Characterization of a Programming
Language", we have given ways for constructing new predicate transformers,
pointing out that these constructions should lead only to predicate trans-
formers having the aforementioned properties (i.e. if the whole exercise is
to continue to make sense). For every basic statement ("skip", "abort", and
the assignment statements) one has to verify that they enjoy the said proper-
ties; for every way of building up new statements from component statements
(semicolon, alternative, and repetitive constructs) one has to show that the
resulting composite statements enjoy those properties as well, in which
demonstration one may assume that the component statements enjoy them.
We verified this up to and including the Law of the Excluded Miracle for the
semicolon, leaving the rest of the verifications as an exercise to the reader.
We leave it at that: in this chapter we shall prove a deeper property of our
mechanisms, this time verifying it explicitly for the alternative and repetitive
constructs as well. (And the structure of the latter verifications can be taken
as an example for the omitted ones.) It is also known as the "Property of
Continuity".

PROPERTY 5. For any mechanism S and any infinite sequence of predicates
C0, C1, C2, ... such that

      for r ≥ 0:   Cr => Cr+1   for all states                      (1)

we have for all states

      wp(S, (E r: r ≥ 0: Cr)) = (E s: s ≥ 0: wp(S, Cs))             (2)

For the statements "skip" and "abort" and for the assignment statements,
the truth of (2) is a direct consequence of their definitions, assumption (1) not
even being necessary. For the semicolon we derive

      wp("S1; S2", (E r: r ≥ 0: Cr)) =
         (by definition of the semantics of the semicolon)
      wp(S1, wp(S2, (E r: r ≥ 0: Cr))) =
         (because property 5 is assumed to hold for S2)
      wp(S1, (E r': r' ≥ 0: wp(S2, Cr'))) =
         (because S2 is assumed to enjoy property 2, so that wp(S2, Cr') =>
          wp(S2, Cr'+1), and S1 is assumed to enjoy property 5)
      (E s: s ≥ 0: wp(S1, wp(S2, Cs))) =
         (by definition of the semantics of the semicolon)
      (E s: s ≥ 0: wp("S1; S2", Cs))   Q.E.D.
For the alternative construct we prove (2) in two steps. The easy step is
that the right-hand side of (2) implies its left-hand side. For, consider an
arbitrary point X in state space, such that the right-hand side of (2) holds,
i.e. there exists a nonnegative value, s' say, such that in point X the relation
wp(S, Cs') holds. But because Cs' => (E r: r ≥ 0: Cr) and any S enjoys
property 2, we conclude that

    wp(S, (E r: r ≥ 0: Cr))

holds in point X as well. As X was an arbitrary state satisfying the right-hand
side of (2), the latter implies the left-hand side of (2). For this argument,
antecedent (1) has not been used, but we need it for proving the implication
in the other direction.

    wp(IF, (E r: r ≥ 0: Cr)) =
        (by definition of the semantics of the alternative construct)
    BB and (A j: 1 ≤ j ≤ n: Bj => wp(SLj, (E r: r ≥ 0: Cr))) =
        (because the individual SLj are assumed to enjoy property 5)
    BB and (A j: 1 ≤ j ≤ n: Bj => (E s: s ≥ 0: wp(SLj, Cs)))        (3)

Consider an arbitrary state X for which (3) is true, and let j' be a value
for j such that Bj'(X) = true; then we have in point X

    (E s: s ≥ 0: wp(SLj', Cs))                                      (4)

Because of (1) and the fact that SLj' enjoys property 2, we conclude that

    wp(SLj', Cs) => wp(SLj', Cs+1)

and thus we conclude from (4) that in point X we also have

    (E s': s' ≥ 0: (A s: s ≥ s': wp(SLj', Cs)))                     (5)

Let s' = s'(j') be the minimum value satisfying (5). We now define smax as
the maximum value of s'(j') taken over the (at most n, and therefore the
maximum exists!) values j' for which Bj'(X) = true. In point X then holds
on account of (3) and (5)

    BB and (A j: 1 ≤ j ≤ n: Bj => wp(SLj, Csmax)) =
        (by definition of the semantics of the alternative construct)
    wp(IF, Csmax)

But the truth of the latter relation in state X implies that there also

    (E s: s ≥ 0: wp(IF, Cs))

holds; but as X was an arbitrary state satisfying (3), for S = IF the fact that the
left-hand side of (2) implies its right-hand side as well has been proved, and
thus the alternative construct enjoys property 5 as well. Note the essential
role played by the antecedent (1) and the fact that a guarded command set
is a finite set of guarded commands.
Property 5 is proved for the repetitive construct by mathematical
induction.

Base: Property 5 holds for H0.

    H0(E r: r ≥ 0: Cr) =
    (E r: r ≥ 0: Cr) and non BB =
    (E s: s ≥ 0: Cs and non BB) =
    (E s: s ≥ 0: H0(Cs))    Q.E.D.

Induction step: From the assumption that property 5 holds for Hk and H0 it
follows that it holds for Hk+1.

    Hk+1(E r: r ≥ 0: Cr) =
        (by virtue of the definition of Hk+1)
    wp(IF, Hk(E r: r ≥ 0: Cr)) or H0(E r: r ≥ 0: Cr) =
        (because property 5 is assumed to hold for Hk and for H0)
    wp(IF, (E r': r' ≥ 0: Hk(Cr'))) or (E s: s ≥ 0: H0(Cs)) =
        (because property 5 holds for the alternative construct and property 2
        is enjoyed by Hk)
    (E s: s ≥ 0: wp(IF, Hk(Cs))) or (E s: s ≥ 0: H0(Cs)) =
    (E s: s ≥ 0: wp(IF, Hk(Cs)) or H0(Cs)) =
        (by virtue of the definition of Hk+1)
    (E s: s ≥ 0: Hk+1(Cs))    Q.E.D.
From base and induction step we conclude that property 5 holds for all
Hk, and hence

    wp(DO, (E r: r ≥ 0: Cr)) =
        (by definition of the semantics of the repetitive construct)
    (E k: k ≥ 0: Hk(E r: r ≥ 0: Cr)) =
        (because property 5 holds for all Hk)
    (E k: k ≥ 0: (E s: s ≥ 0: Hk(Cs))) =
        (because this expresses the existence of a (k, s)-pair)
    (E s: s ≥ 0: (E k: k ≥ 0: Hk(Cs))) =
        (by definition of the semantics of the repetitive construct)
    (E s: s ≥ 0: wp(DO, Cs))    Q.E.D.

Property 5 is of importance on account of the semantics of the repetitive
construct

    wp(DO, R) = (E k: k ≥ 0: Hk(R));

such a pre-condition could be the post-condition for another statement.
Because

    for k ≥ 0:   Hk(R) => Hk+1(R)   for all states

(this is easily proved by mathematical induction), the conditions under which
property 5 is relevant are satisfied. We can, for instance, prove that in all
initial states in which BB holds

    do B1 -> SL1 [] B2 -> SL2 [] ... [] Bn -> SLn od

is equivalent to

    if B1 -> SL1 [] B2 -> SL2 [] ... [] Bn -> SLn fi;
    do B1 -> SL1 [] B2 -> SL2 [] ... [] Bn -> SLn od

(In initial states in which BB does not hold, the first program would have acted
as "skip", the second one as "abort".) That is, we have to prove that

    (BB and wp(DO, R)) = (BB and wp(IF, wp(DO, R)))

    BB and wp(IF, wp(DO, R)) =
        (on account of the semantics of the repetitive construct)
    BB and wp(IF, (E k: k ≥ 0: Hk(R))) =
        (because property 5 holds for IF)
    BB and (E s: s ≥ 0: wp(IF, Hs(R))) =
        (because (BB and H0(R)) = F)
    BB and (E s: s ≥ 0: wp(IF, Hs(R)) or H0(R)) =
        (on account of the recurrence relation for the Hk(R))
    BB and (E s: s ≥ 0: Hs+1(R)) =
        (because (BB and H0(R)) = F)
    BB and (E k: k ≥ 0: Hk(R)) =
        (on account of the semantics of the repetitive construct)
    BB and wp(DO, R)    Q.E.D.
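For a deterministic instance this equivalence can be tried out directly; a sketch in Python, with the single guard x > 0 and SL: x := x - 1 chosen by us for the test:

    def DO(x):
        while x > 0:              # do x > 0 -> x := x - 1 od
            x = x - 1
        return x

    def IF_then_DO(x):
        if not (x > 0):           # if x > 0 -> x := x - 1 fi acts as abort
            raise RuntimeError("abort")
        return DO(x - 1)

    # in all initial states in which BB holds the two programs agree:
    assert all(DO(x) == IF_then_DO(x) for x in range(1, 50))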
Finally, we would like to draw attention to a very different consequence
of the fact that all our mechanisms enjoy property 5. We could try to make
the program S: "set x to any positive integer" with the properties:

(a) wp(S, x > 0) = T
(b) (A s: s ≥ 0: wp(S, 0 < x < s) = F)

Here property (a) expresses the requirement that activation of S is guaranteed
to terminate with x equal to some positive value; property (b) expresses that
S is a mechanism of unbounded nondeterminacy, i.e. that no a priori upper
bound for the final value of x can be given. For such a program S, we could,
however, derive now:

    T = wp(S, x > 0)
      = wp(S, (E r: r ≥ 0: 0 < x < r))
      = (E s: s ≥ 0: wp(S, 0 < x < s))
      = (E s: s ≥ 0: F)
      = F

This, however, is a contradiction: for the mechanism S "set x to any
positive integer" no program exists!
As a result, any effort to write a program for "set x to any positive integer"
must fail. For instance, we could consider:

    go on := true; x := 1;
    do go on -> x := x + 1
     [] go on -> go on := false
    od

This construct will continue to increase x as long as the first alternative is
chosen; as soon as the second alternative has been chosen once, it terminates
immediately. Upon termination x may indeed be "any positive integer" in
the sense that we cannot think of a positive value X such that termination
with x = X is impossible. But termination is not guaranteed either! We can
enforce termination: with N some large, positive constant we can write

    go on := true; x := 1;
    do go on and x < N -> x := x + 1
     [] go on -> go on := false
    od

but then property (b) is no longer satisfied.
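Both constructs are easily simulated; a sketch in Python, in which a coin flip stands in for the unspecified choice between two enabled alternatives:

    import random

    def set_x_unbounded():
        # the first construct: any positive result is possible, but a
        # scheduler that keeps favouring the first alternative never stops
        go_on, x = True, 1
        while go_on:
            if random.random() < 0.5:
                x = x + 1
            else:
                go_on = False
        return x

    def set_x_bounded(N):
        # the second construct: termination is now enforced after at most
        # N steps, but the result never exceeds N
        go_on, x = True, 1
        while go_on:
            if x < N and random.random() < 0.5:
                x = x + 1
            else:
                go_on = False
        return x

Under coin flips the first version terminates with probability 1, which is precisely not the same thing as guaranteed termination.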


The nonexistence of a program for "set x to any positive integer" is
reassuring in more than one sense. For, if such a program could exist, our
definition of the semantics of the repetitive construct would have been
subject to doubt, to say the least. With

    S: do x > 0 -> x := x - 1
        [] x < 0 -> "set x to any positive integer"
       od

our formalism for the repetitive construct gives wp(S, T) = (x ≥ 0), while I
expect most of my readers to conclude that under the assumption of the
existence of "set x to any positive integer" for x < 0 termination would be
guaranteed as well. But then the interpretation of wp(S, T) as the weakest
pre-condition guaranteeing termination would no longer be justified. How-
ever, when we substitute our first would-be implementation:

    S: do x > 0 -> x := x - 1
        [] x < 0 -> go on := true; x := 1;
                    do go on -> x := x + 1
                     [] go on -> go on := false
                    od
       od

wp(S, T) = (x ≥ 0) is fully correct, both intuitively and formally.


The second reason for reassurance is of a rather different nature. A
mechanism of unbounded nondeterminacy yet guaranteed to terminate would
be able to make within a finite time a choice out of infinitely many possibil-
ities: if such a mechanism could be formulated in our programming language,
that very fact would present an insurmountable barrier to the possibility of
the implementation of that programming language.

Acknowledgement. I would like to express my great indebtedness to
John C. Reynolds for drawing my attention to the central role of property
5 and to the fact that the nonexistence of a mechanism "set x to any positive
integer" is essential for the intuitive justification of the semantics of the
repetitive construct. He is, of course, in no way to be held responsible for
any of the above. (End of acknowledgement.)
10 AN ESSAY ON THE NOTION:
"THE SCOPE OF VARIABLES"

Before embarking on what the notion "the scope of a variable" could or
should mean, it seems wise to pose a preliminary question first, viz. "Why
did we introduce variables in the first place?". This preliminary question is
not as empty as it might seem to someone with programming experience
only. For instance, in the design of sequential circuitry, it is not unusual to
design, initially, a finite-state automaton in which the different possible states
of the automaton to be built are just numbered "0, 1, 2, ..." in the order
in which the designer becomes aware of the desirability of their inclusion. As
the design proceeds, the designer builds up a so-called "transition table",
tabulating for each state the successor state as function of the incoming
symbol to be processed. He recalls, only when the transition table has been
completed, that he has only binary variables (he calls them "flip-flops") at
his disposal and that, if the number of states is, say, between 33 and 64
(bounds included) he needs at least six of them. Those six binary variables
span a state space of 64 points and the designer is free in his choice how to
identify the different states of his finite-state automaton with points in the
64-point state space. That this choice is not irrelevant becomes clear as soon
as we realize that a state transition of the finite-state automaton has to be
translated in combinations of boolean variables being operated upon under
control of boolean values. To circuit designers this choice is known as the
"state assignment problem" and technical constraints or optimization goals
may make it a hairy one-so hairy, as a matter of fact, that one may be
tempted to challenge the adequacy of the design methodology evoking it. A
critical discussion of circuit design methodologies, however, lies beyond the
limits of my competence and therefore falls outside the scope of this mono-
graph; we only mentioned it in order to show that there exist design traditions


in which, apparently, the introduction of "variables" right at the start does
not come naturally.
For two reasons the programmer lives in another world than the circuit
designer, particularly the old-fashioned circuit designer. There has been a
time that technical considerations exerted a very strong pressure towards
minimization of the number of flip-flops used and to use more than six flip-
flops to build a finite-state machine with at most 64 states was at that time
regarded, if not as a crime, at least as a failure. At any rate, flip-flops being
expensive, the circuit designer confined his attention to the design of (sub)-
mechanisms of which the number of possible states were extremely modest
compared with the number of internal states of the mechanisms program-
mers are now invited to consider. The programmer lives in a world where the
bits are cheaper and this has two consequences: firstly his mechanisms may
have many times more internal states, secondly he can allow larger portions
of his state space to remain "unused". The second reason why the program-
mer's world differs from that of the circuit designer is that the larger number
of times that (groups of) bits change translates into a longer computation
time: the cheapest thing a programmer can prescribe is leaving large sections
of the store unaffected!
Comparing the programmer's world with that of the circuit designer,
who, initially, just "names" his states by ordinal number of introduction, we
immediately see why the programmer wishes to introduce, right at the start,
variables that he regards as Cartesian coordinates of a state space of as many
dimensions as he feels bound to introduce: the number of different states to
be introduced is so incredibly large that a nonsystematic terminology would
make the design utterly unmanageable. Whether (and, if so, how!) the design
can be kept "manageable" by introducing variables -by way of "nomen-
clature"- is, of course, the crucial question.
The basic mechanism for changing the state is the assignment statement,
redefining (until further notice, that is the next assignment to it) the value of
one variable. In state space it is a movement parallel to one of the coordinate
axes. This can be embellished, leading to the concurrent assignment; from a
small number of assignment statements we can form a program component,
but such a program component will always affect a modest number of vari-
ables. The net effect of such a program component can be understood in
terms of a movement in the subspace spanned by the few variables that are
affected by it (if you wish, in terms of the projection upon that subspace).
As such, it is absolutely independent of all other variables in the system:
we can "understand" it without taking any of the other variables into account,
even without being aware of their existence. The possibility of this separation,
which is also referred to as "factorization", is not surprising in view of the
fact that our total state space has been built up as a Cartesian "product" in
the first place. The important thing to observe is that this separation, vital
for our ability to cope (mentally) with the program as a whole, is the more
vital the larger the total number of variables involved. The question is whether
(and, if so, how) such "separations" should be reflected more explicitly in our
program texts.
Our first "autocoders" (the later ones of which were denoted by misno-
mers such as "automatic programming systems" or -even worse- "high level
programming languages") certainly did not cater to such possibilities. They
were conceived at a time when it was the general opinion that it was our
program's purpose to instruct our machines, in contrast to the situation of
today in which more and more people are leaning towards the opinion that
it is our machines' purpose to execute our programs. In those early days it
was quite usual to find all sorts of machine properties directly reflected in
the write-up of our programs. For instance, because machines had so-called
"jump instructions'', our programs used "go to statements". Similarly, be-
cause machines had constant size stores, computations were regarded as
evolving in a state space with a constant number of dimensions, i.e. manipu-
lating the values of a constant set of variables. Similarly, because in a random
access store each storage location is equally well accessible, the programmer
was allowed to refer at any place in the program text to any variable of the
program.
This was all right in the old days when, due to the limited storage sizes,
program texts were short and the number of different variables referred to
was small. With growing size and sophistication, however, such homogeneity
becomes, as a virtue, subject to doubt. From the point of view of flexibility
and general applicability the random access store is of course a splendid
invention, but comes the moment that we must realize that each flexibility,
each generality of our tools requires a discipline for its exploitation. That
moment has come. Let us tackle the "free accessibility" first.
In FORTRAN's first version there were two types of variables, integer
variables and floating point variables, and the first letter of their name
decided -according to a fixed convention- the type, and any occurrence of
a variable name anywhere in the program text implied at run time the per-
manent existence of a variable with that name. In practice this proved to be
very unsafe: if in a program operating on a variable named "TEST" a single
misspelling occurred, erroneously referring to "TETS" instead of to "TEST",
no warning could be generated; another variable called "TETS" would be
introduced. In ALGOL 60 the idea of so-called "declarations" was introduced
and as far as catching such silly misspellings was concerned, this proved to be
an extremely valuable form of redundancy. The basic idea of the explicit
declaration of variables is that statements may only refer to variables that
have been explicitly declared to exist: an erroneous reference to a variable by
the name of"TETS" is then caught automatically if no variable with the name
"TETS" has been declared. The declarations of ALGOL 60 served a second
purpose besides enumerating the "vocabulary" of names of variables
permissible in statements; they also coupled each introduced name to a specific
type, thereby abolishing the original FORTRAN convention that the type
was determined by the first letter of the name. As people tended more and
more to choose names of mnemonic significance, this greater freedom of
nomenclature was greatly appreciated.
A greater departure from FORTRAN was ALGOL 60's so-called "block
concept" that was used to limit the so-called "textual scope" of the declara-
tions. A "block" in ALGOL 60 extends from an opening bracket "begin"
to the corresponding closing bracket "end" and this bracket pair marks the
boundaries of a textual level of nomenclature. Following the opening bracket
"begin" one or more new names are declared (in the case of more than one
name they must all be different) to mean something and those names with
those new meanings are said to be "local" to that block: all usage of those
names with those new meanings must occur in statements between the preced-
ing "begin" and the corresponding "end". If our whole program is just one
single block, every variable is accessible from everywhere in the text and the
protection against misspellings (as referred to in the previous paragraph) is
the only gain. A block, however, is one of the possible forms of a statement
and therefore blocks may occur nested inside each other and this gives a
certain amount of protection because variables local to an inner block are
inaccessible from outside.
The ALGOL 60 scope rules protect the local variables of an inner block
from outside interference; in the other direction, however, they provide no
protection whatsoever. From the interior of an inner block everything out-
side is in principle accessible, i.e. everything also accessible in the textually
embracing block. I said "in principle", for there is one exception, an exception
that in those days was regarded as a great invention but that, upon closer
scrutiny 15 years later, looks more like a logical patch. The exception is the
so-called "re-declared identifier". If at the head of an inner block a declaration
occurs for a name -ALGOL 60 calls them "identifiers"- which (accidentally)
has already a meaning in the embracing block, then this outside meaning of
that name is textually dormant in the inner block, whose local declaration for
that same meaning overrules the outside one. The idea behind this "priority
of innermost declarations" was that the composer of an inner block needs
only to be aware of the global names in the surrounding context to which he
is actually referring, but that any other global names should not restrict him
in the freedom of the choice of his own names with local significance only.
The fact that this convention smoothly catered to the embedding of inner
blocks within a "growing" environment (e.g. user programs implicitly
embedded in an implementation-supplied outermost block containing the
standard procedure library) has for a long time been regarded as a sufficient
justification for the convention of priority of innermost declarations. The fact
that at that time it had been recently discovered how a one-pass assembler
could use a stack for coping with textually nested different meanings of the
same name, may have had something to do with its adoption.
From the user's point of view, however, the convention is less attractive,
for it makes the variables declared in his outermost block extremely vulner-
able. If he discovers to his dismay that the value of one of these variables has
been tampered with in an unintended and as yet unexplained way, he is in
principle obliged to read all the code of all the inner blocks, including the
ones that should not refer to the variable at all-for precisely there such a
reference would be erroneous! Under the assumption that the programmer
does not need to refer everywhere to anything, he seems to be better served
by more explicit means for restricting the textual scope of names than the
more or less accidental re-declaration.
A first step in this direction, which maintains the notion of textually
nested contexts is the following. For each block we postulate for its level
(i.e. its text with the exception of its inner blocks) a textual context, i.e. a
constant nomenclature in which all names have a unique meaning. The
names occurring in a block's textual context are either "global", i.e. inherited
with their meaning from the immediate surroundings, or "local", i.e. with
its meaning only pertinent to the text of this block. The suggestion is to
enumerate after the block's opening bracket (with the appropriate separators)
the names that together form its textual context, for instance first the global
names (if any) and then the local names (if any); obviously all these names
must be different.

Confession. The above suggestion was only written down after long hesi-
tations, during which I considered alternatives that would enable the pro-
grammer to introduce in a block one or more local names without also being
obliged to enumerate all the global names the block would inherit, i.e. alter-
natives that would indicate (with the same compactness) the change of nomen-
clature of the ALGOL 60 block (without "re-declaration of identifiers"), i.e.
a pure extension of the nomenclature. This would give the programmer
the possibility to indicate contraction of the nomenclature, i.e. limited
inheritance from the immediate surroundings, but not the obligation. And
for some time I thought that this would be a nice, nonpaternalistic attitude
for a language designer; I also felt that this would make my scope rules
more palatable, because I feared that many a programmer would object
to explicit enumeration of the inheritance whenever he felt like introducing a
local variable. This continued until I got very cross with myself: too many
language designs have been spoiled by fear of nonacceptance and I know of
only one programmer who is going to program in this language and that is
myself! And I am enough of a puritan to oblige myself to indicate the inheri-
tance explicitly. Even stronger: not only will my inner blocks refer only to
global variables explicitly inherited, but also the inheritance will not mention
any global variables not referred to. The inheritance will give a complete
description of the block's possible interference with the state space valid in
its surroundings, no more and no less! When I discovered that I had allowed
my desire "to please my public" -which, I think, is an honourable one- to
influence not only the way of presentation, but also the subject matter itself,
I was frightened, cross with, and ashamed of myself. (End of confession.)

Besides having a name, variables have the unique property of being able
to have a value that may be changed. This immediately raises the question
"What will be the value of a local variable upon block entry?". Various
answers have been chosen. ALGOL 60 postulates that upon block entry the
values of its local variables are "undefined", i.e. any effort to evaluate their
value prior to an assignment to them is regarded as "undefined". Failure to
initialize local variables prior to entry of a loop turned out to be a very
common error and a run-time check against the use of the undefined value
of local variables, although expensive, proved in many circumstances not to
be a luxury. Such a run-time check is, of course, the direct implementation
of the pure mathematician's answer to the question of what to do with a
variable whose value is undefined, i.e. extend its range with a special value,
called "UNDEFINED", and initialize upon block entry each local variable
with that unique, special value. Any effort to evaluate the value of a variable
having that unique, special value "UNDEFINED" can then be honoured
by program abortion and an error message.
Upon closer scrutiny, however, this simple proposal leads to logical
problems; for instance, it is then impossible to copy the values of any set of
variables. Efforts to remedy that situation include, for instance, the possibility
to inspect whether a value is defined or not. But such ability to manipulate
the special value -e.g. the value "NIL" for a pointer pointing nowhere-
easily leads to confusions and contradictions: one might discover a case of
bigamy when meeting two bachelors married to the same "nobody".
Another way out, abolishing the variables with undefined values, has
been the implicit initialization upon block entry not with a very special, but
with a very common, almost "neutral" value (say "zero" for all integers and
"true" for all booleans). But this, of course, is only fooling oneself; now
detection of a very common programming error has been made impossible
by making all sorts of nonsensical programs artificially into legal ones. (This
proposal has been mitigated by the convention that initialization with the
"neutral" value would only occur "by default", i.e. unless indicated other-
wise, but such a default convention is clearly a patch.)
A next attack on the problem of catching the use of variables with still
undefined values has been the performance of (automatic) flow analysis of
the program text that could at least warn the programmer that at certain
places variables would-or possibly could-be used prior to the first assign-
ment to them. In a sense my proposal can be regarded as being inspired by
that approach. I propose such a rigid discipline that:

1. the flow analysis involved is trivial;
2. at no place where a variable is referenced can there exist uncertainty as
to whether this variable has been initialized.

One way of achieving this would be to make the initialization of all local
variables obligatory upon block entry; together with the wish not to initialize
with "meaningless" values -a wish that implies that local variables should
only be introduced at a stage that their meaningful initial value is available-
this, I am afraid, will lead to confusingly high depths of nesting of textual
scopes. Besides that we would have to "distribute" the block entry over the
various guarded commands of an alternative construct whenever the initial-
ization should be done by one of the guarded commands of a set. These two
considerations made me look for an alternative that would require less (and
more unique) block boundaries. The following proposal seems to meet our
requirements.
First of all we insist that upon block entry the complete nomenclature
(both inherited and private) is listed. Besides the assignment statement that
destroys the current value of a variable by assigning a new one to it, we have
initializing statements, by syntactical means recognizable as such, that give a
private variable its first value since block entry. (If we so desire we can regard
the execution of its initializing statement as coinciding in time with the vari-
able's "creation"; the earlier mentioning of its name at block entry can then
be regarded as "reserving its identifier for it".)
In other words, the textual scope of a variable private (i.e. local) to a
block extends from the block's opening "begin" until its corresponding
closing "end" with the exception of the texts of inner blocks that do not
inherit it. We propose to divide its textual scope into what we might call
"the passive scope", where reference to it is not allowed and the variable is
not regarded as a coordinate of the local state space, and "the active scope",
where the variable can be referenced. Passive and active scopes of a variable
will always be separated by an initializing statement for that variable, and
initializing statements for a variable have to be placed in such a way that,
independent of values of guards:

1. after block entry exactly one initializing statement for it will be executed
before the corresponding block exit;
2. between block entry and the execution of the initializing statement no
statement from its active scope can be executed.

The following discipline guarantees that the above requirements are met. To
start with we consider the block at the syntactic grain where the enumeration
of its private nomenclature is followed by a list of statements mutually sepa-
rated by semicolons. Such a statement list must have the following properties:

(A) It must contain a unique statement initializing a given private variable.


(B) Statements (if any) preceding the initializing statement are in the passive
scope of the variable, and if they are inner blocks they are not allowed to
inherit it.
(C) Statements (if any) following the initializing statement are in the active
scope of the variable, and if they are inner blocks they may inherit it.
(D) For the initializing statement itself there are three possibilities:
(1) It is a primitive initializing statement.
(2) It is an inner block. In this case it inherits the variable in question
and the statement list following the enumeration of the inner nomen-
clature has again the properties (A), (B), (C), and (D).
(3) It is an alternative construct. In this case the statement lists following
its guards all have the properties (A), (B), (C), and (D).

For the BNF-addicts the following syntax (where initialization refers to one
private variable) may be helpful:

<block> ::= begin <nomenclature>; <initializing statement list> end
<initializing statement list> ::= {<passive statement>;}
        <initializing statement> {; <active statement>}
<initializing statement> ::= <primitive initializing statement> |
        begin <nomenclature>; <initializing statement list> end |
        if <guard> -> <initializing statement list>
        {[] <guard> -> <initializing statement list>} fi

Note. In using ALGOL 60 the sheer size of the brackets "begin" and
"end" has caused discomfort; having abolished ALGOL 60's compound
statement, we expect to need fewer of them. (End of note.)
The corresponding repetitive construct is not included as a permissible
form of the <initializing statement) because its inclusion would violate our
first requirement: regardless of the sequencing initialization must occur
exactly once. Such a restriction does not occur in a programming language
like ALGOL 60 in which simply the (dynamically) first assignment is taken as
"the initialization". The price paid for the greater freedom in an ALGOL-like
language is that with programs written in such a language we cannot neces-
sarily decide statically (i.e. for all computations) at each semicolon which
variables have defined values, as is shown in the following examples in
ALGOL 60.
If B is a global boolean

    begin integer x, y; if B then x := 1 else y := 2; .....

and if N is a global integer

    begin integer i, x; for i := 1 step 1 until N do x := i; .....

In the first example the value of x is only defined provided that B is true; in
the second example only provided N ≥ 1.
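For contrast, the same two hazards can be written down in Python, where they are caught only at run time; B and N stand for the globals of the ALGOL 60 examples:

    def first_example(B):
        if B:
            x = 1
        else:
            y = 2
        return x      # NameError when B is false: x was never initialized

    def second_example(N):
        for i in range(1, N + 1):
            x = i
        return x      # NameError when N < 1: the loop body never ran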
Obviously, neither of the two constructs can be translated into our language
and we have to convince ourselves
that we can live with the restrictions we have imposed upon ourselves, or,
even stronger, that blocks beginning as in the two examples just given are not
only avoidable but, in a sense, even pointless. The intuitive argument that
inside a repetitive construct we never need to execute the dynamically first
assignment to a variable whose value is of relevance upon termination is as
follows: such a variable must occur in either the invariant relation or the
guards or both and therefore it had better be defined in the initial state of
the repetitive construct as well. (In the next chapter we shall see that this
position shall force us to reconsider the notion of what constitutes "a vari-
able".)
The fact that we shall restrict ourselves to a language such that at each
semicolon it is statically known which variables constitute the current state
space implies that we never need to doubt -if we would write our programs
in an ALGOL-like language- whether an assignment to a private variable is
the dynamically first one since block entry, and as a result it is no additional
restriction to require that the syntactic form of a <primitive initializing
statement> differs from that of the true assignment statement. Upon second
thought it seems only honest to use different notations, for they are very
different operations: while the initializing statement is in a sense a creative
one (it creates a new variable with an initial value), the true assignment state-
ment is a destructive one in the sense that it destroys information, viz. the
variable's former value.
Note. The special role of the repetitive construct should not surprise us
too much, because as long as we have a language without repetition, we
would never need the true assignment; we could come away with initial-
ization only: without repetition we could program in terms of constants
that are initialized once. It is only with the introduction of the repetition
that we need variables in the true sense of the word, i.e. that we need that
at different repetition steps the same name can evoke different values. It
is the notion of the repetition that calls for the notion of variables whose
value can be changed by means of an assignment statement. It is in the
repetition that we need true assignments to variables existing outside; it
is also there that we can not allow initialization of such variables.
(End of note.)
A few things have still to be decided. One is where and how to indicate
the type of private variables. Two possible places present themselves: either
-as in ALGOL 60- at the beginning of the block to which the variable is
private, where its name occurs anyhow in the enumeration of the nomenclature,
or at its primitive initializing statement(s).
There are two reasons, however, why I feel that we should depart from
the ALGOL 60 convention and should restrict the function of the enumera-
tion of the nomenclature at the beginning of the block purely to the definition
of the textual scope of names, at least for not burdening this enumeration
with type information. We shall develop the argument for inherited names
and private names separately.
If we prescribe that each inherited name in the nomenclature should carry
its type explicitly with it, what price do we pay and what have we gained?
The price is not only longer texts, but also the fact that a characteristic of
the outer environment, viz. the type of a global variable, diffuses all through
the text. At scattered places, viz. in all nomenclatures inheriting the name,
we find a copy of that type information. This is most unattractive if we would
like to change the type of such a global variable to another one on which
the same set of functions and operations is defined. Furthermore we had
decided that the function of the explicit inheritance would be a protection of
the outside world, a condensation of how the inner block can interact with
its environment that we are supposed to know. Repeating the types does not
tell us in condensed form anything new or anything interesting about an
inner block. The only gain seems to be that the block with the global type
information can to a larger extent be understood in isolation. But I am not
so sure that the relevant information about the environment can be restricted
to just the type of the global variables: relations which may be assumed to
hold between initial values could also be essential. If we want to understand
what the inner block is good for, a fuller description of the interface is
required and as long as we do not intend to mechanize correctness proofs,
suitable comments seem a more adequate vehicle. (Any reader who now
shouts "But what about independent compilation?" is asked to realize that
that remark has significance only in a very specific style of implementation
and is invited to imagine himself-as I do-in an environment in which that
question is utterly meaningless.)
Now we turn to the private variables. If the type is stated in the enumera-
tion, each activation of the block will initialize the variable with that type.
We have seen, however, that if the initializing statement is an alternative con-
struct, at each activation of the block, one out of a set of primitive initializing
statements will be executed. As far as I am concerned they could initialize
variables of different types, something no one can object to as long as the
same functions and operations are defined on them. Whether such a freedom
of choice for the type of a private variable is a useful thing or not is some-
thing that seems hard to discuss at this stage; at any rate it seems unwise to
choose now a notational convention that does not leave the option open.
(As the attentive reader will have noticed, the option that the type of a private
variable may differ between different activations of the block to which it is
private gives another reason for not copying type information when describ-
ing the inheritance, viz. for its inner blocks.)
The next thing we have to decide is whether at block entry we give any
indication as to the nature of the interference with the inherited variables.
I recall that one of the reasons for enumerating at the block's beginning the
variables of the surrounding context it refers to, was to restrict the amount
of text that should be studied when one of these variables has been interfered
with in an ill-understood manner. If such a variable acts for each activation
of an inner block as "a global constant", we should like to see, right at the
block's begin, that it cannot tamper with its value; more precisely, that it
can inspect, but not change its value.
Let us restrict for a moment our attention to an inner block (with the
context "IN"), fully within the active scope of a variable that it inherits from
the immediately surrounding block (with the context "OUT").
If at the level of the context OUT, the variable is changeable, we have
seen the need for two possible ways of inheritance: either it is changeable in
context IN as well -i.e. the inner block may contain assignment statements,
assigning new values to the variable- or the inner block inherits it as a
global constant. This last fact, however, has its consequences for both con-
texts. For context OUT it means that when we are interested in "what can
influence the value of the variable" we can skip the text of the inner block.
If, however, we want to read the text of the inner block -more or less in
isolation of its surroundings- and want to understand its achievement, then
the fact that such a global constant is a constant and not a variable is of
crucial importance. Already in the first example of the chapter "The Formal
Treatment of Some Small Examples" we started with "Establish for fixed x
and y the relation R(m)
(m = x or m = y) and m ≥ x and m ≥ y"
The fact that x and y are to be considered fixed is an essential aspect of its
task.
If, at the level of the context OUT the variable is already a constant, the
inner block can only inherit it in one way: again as a constant. The inner
block fully in the active scope of an inherited variable has been dealt with
satisfactorily.
An inner block fully in the passive scope of a variable does not present
any difficulties either: it may not inherit the variable from its surroundings.
The third case, where a block begins in the passive scope of a variable
and ends in its active one deserves still some further attention: it means no
more and no less than the block's obligation to initialize the variable. It has
inherited what we could call "a virgin variable" for the purpose of its initial-
ization. Both in context IN and in context OUT we can ask the question
whether the variable is changeable in its active scope. In the case of the
inheritance of a virgin variable, these two questions are, however, fully inde-
pendent; the circumstance that at the textual level of context OUT a variable
will not have its value changed after initialization does not exclude that
the initialization itself (in an inner block) is a multistep affair, viz. a multistep
affair when we consider the initialization not as a single, undivided act, but
-at a smaller grain of interest- as a sequential process, building up the initial
value.
After these explorations the time has come to be as precise as possible.
We recall that we introduced the name "the textual context" of a block for
the constant nomenclature (in which all names have a unique meaning) per-
taining to the block's "level", i.e. its text with the exception of its inner
blocks. We now consider two nested blocks, an inner one (with the context
called "IN") and an outer one (with the context called "OUT"); with respect
to IN we have referred to OUT as "the surrounding context".
Names of a context are of two kinds: either they are private to the block,
i.e. unrelated to anything outside the block, or they are "inherited" from the
surrounding context. In the case of inheritance, we must distinguish two
cases: the context IN may inherit a name of a variable from the context OUT
with or without the obligation for the inner block to initialize, when activated,
the variable inherited. We shall distinguish these three ways by

pri ("private", i.e. unrelated to the surrounding context)


vir ("virgin", i.e. inherited from the surrounding context with the obligation
to initialize)
glo ("global", i.e. inherited from the surrounding context without the obli-
gation nor the permission to initialize).

A variable can belong to more than one textual context: to start with it
belongs to the textual context of the block to which it is private and further-
more it belongs to the textual contexts of all inner blocks that inherit it from
their surrounding contexts. The scope of a variable extends over the levels
of all blocks to whose textual contexts the variable belongs. The scope of a
variable is always subdivided into two parts, its passive scope and its active
scope, and the way in which initializing statements for a variable may occur
in the text has been restricted so as to guarantee with respect to each variable
in time the succession:
entry to the block to which it is private;
zero or more executions of statements from its passive scope;
one initialization for it;
zero or more executions of statements from its active scope;
corresponding block exit.
Whether a variable is enumerated under the heading "pri" or "vir" makes
no difference as regards the block's rights and obligations with respect to that
variable; in both cases it has the obligation to initialize it. The only difference
is that while the variable under the heading "pri" has no relation to the con-
text OUT, the variable under the heading "vir" must occur in the context
OUT and the inner block must start in the variable's passive scope and end
in its active scope. A variable under the heading "glo" must occur in the
context OUT and the inner block must lie entirely within its active scope.
Besides the block's external aspects as described by "pri", "vir", or "glo",
there are internal aspects, viz. whether at the block's level in the active scope
of the variable its value may be changed or not. If the variable's value may
be changed, this is indicated with "var" (from "variable"); if it may not be
changed it is indicated by "con" (from "constant"). Each of the three external
aspects can be combined with each of the two internal aspects, giving the six
possibilities "privar", "pricon", "virvar", "vircon", "glovar'', and "glocon".
For the six headings in the context IN we give the permissible headings in
the context OUT:

    IN                  OUT
    privar, pricon      not applicable
    glovar              privar, virvar (only if inner block
                        fully within active scope) or
                        glovar (without restriction)
    glocon              privar, pricon, virvar, vircon (only if
                        inner block fully within active scope) or
                        glovar, glocon (without restriction)
    virvar, vircon      privar, pricon, virvar, vircon (only if
                        inner block begins in passive scope)

Note 1. As a consequence of the above, the aspect "con" in the context
OUT excludes that after initialization inner blocks can change the value
of the variable. (End of note 1.)
Note 2. The aspect "con" in context OUT does not exclude the initial-
ization by an inner block, at the level of which the variable may enjoy the
aspect "var": in context OUT we may have a "pricon table", the initial-
ization of which is delegated to an inner block, in whose context the
"virvar table" will be created and built up. Once the execution of such an
initializing inner block has been completed, the value of"table"will remain
constant throughout the execution of the outer block (and its further
inner blocks, if any). (End of note 2.)
The remaining decisions, although far from unimportant (they determine
what our texts look like, how easily they write and read), have less far-
reaching consequences; they are purely concerned with syntax. We have to
decide upon notations for the <nomenclature> and for the <primitive
initializing statement>. I propose for the nomenclature a notation very
similar to ALGOL 60's <block head>, a syntax with which I have always been
perfectly happy.

<nomenclature> ::= <nomenclature element> {; <nomenclature element>}
<nomenclature element> ::= <nomenclature header> <variable>
        {, <variable>}
<nomenclature header> ::= privar | pricon | virvar | vircon |
        glovar | glocon
Admittedly with a view to later extensions I propose to derive the initial-
izing statements from the assignment statements by post-fixing the variable
at the left-hand side by the special character "vir" -indicating that we deal
with a virgin variable- followed by the name of its type:
<primitive initializing statement> ::= <variable> vir <type>
        := <expression>

where, as far as types are concerned, we have confined ourselves up till now
to integers and booleans:

<type> ::= int | bool
Note 1. The extension to concurrent initialization and concurrent assign-
ment and initialization is left to the ambitious reader. (End of note 1.)
Note 2. The expression(s) at the right-hand side are to be regarded as
still in the passive scope of the variables being initialized. (End of note 2.)
As an example we give the inner block that initializes the global integer
variable x with the GCD(X, Y), where with regard to the inner block, X and
Y are positive constants. The block uses a private variable called y.

    begin glocon X, Y; virvar x; privar y;
        x vir int, y vir int := X, Y;
        do x > y -> x := x - y
         [] y > x -> y := y - x
        od
    end
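For comparison, a rendering of the same computation in Python; the scope headings have no Python counterpart, so comments mark the correspondence (a sketch, assuming X > 0 and Y > 0 as stated):

    def gcd_block(X, Y):       # X and Y play the role of the glocon constants
        x, y = X, Y            # x vir int, y vir int := X, Y
        while x != y:          # do x > y -> x := x - y [] y > x -> y := y - x od
            if x > y:
                x = x - y
            else:
                y = y - x
        return x               # the virvar x now holds GCD(X, Y)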

I would like to point out that my decision to indicate "var" or "con" in
the nomenclature is not without arbitrariness. On the one hand someone
could argue that the indication is superfluous. If he decides to abolish the
convention, he is free to do so; he just reduces the headings to "pri", "vir",
and "glo" by omitting all the var's and con's from my programs. If my
programs were correct according to my conventions, his versions will be
equally correct according to his conventions. On the other hand someone
could have the wish to give more precise information than I provide the
means for. He could wish to indicate, for instance,

1. that the value of a variable can only increase;
2. that the initial value of a global variable will only influence its own final
value but no others;
3. that of all the ways in which the value of a global variable can be modi-
fied, only a subset will be used.

An indication like (1) seems too specific, for it is only applicable to types
whose values have a natural ordering. An indication like (2) seems also too
specific. What about a pair of global variables whose initial values only influ-
ence their own and each other's final value? An indication like (3), however,
could be meaningful. Our indication "con" then emerges as "the subset of
allowed modifiers is empty".
11 ARRAY VARIABLES

I have been trained to regard an array in the ALGOL 60 sense as a finite
set of elementary, consecutively numbered variables, whose "identifiers"
could be "computed". But for two reasons this view does not satisfy me
anymore.
The first reason is my abhorrence of variables with undefined values.
In the previous chapter we solved this problem by introducing for each
variable a passive scope and an active scope, separated by a syntactically
recognizable initialization for that variable. But when we regard the array
as a collection of (subscripted) variables, that solution breaks down.
The second reason is of a combinatorial nature and more fundamental.
In ALGOL 60 the compound statement that causes the variables x and y
to interchange their values needs an additional variable, h say,
h:= x; x:= y; y:= h
which is cumbersome and ugly compared with the concurrent assignment
x, y := y, x
For the concurrent assignment we have insisted that all variables at the left-
hand side should be different: it would be foolish to attach to "x, x := 1, 2"
any other meaning than "error". For a long time, however, I hesitated to
adopt the concurrent assignment on account of the problems it causes in
cases like
A[i], A[j]:= x, y
Should this be allowed when i ≠ j, but not when i = j? Or is, perhaps,
i = j permissible if x = y holds as well, as for instance in
A[i], A[j] := A[j], A[i] ?
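Python's simultaneous assignment, which assigns to the left-hand sides one after the other, shows how silently such a collision can go through instead of being rejected:

    A = [10, 20]
    i = j = 0
    A[i], A[j] = 1, 2           # accepted without complaint
    assert A == [2, 20]         # the 1 was silently overwritten by the 2
    A[i], A[j] = A[j], A[i]     # harmless here, since both sides coincide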


If we go that route we are clearly piling one logical patch upon another.
However, I have now come to the conclusion that it is not the concurrent
assignment, but the notion of the subscripted variable that is to be blamed.
In the axiomatic definition of the assignment statement via "substitution of
a variable" one cannot afford -as in, I guess, all parts of logic- any uncer-
tainty as to whether two variables are the same or not.
The moral of the story is that we must regard the array in its entirety as a
single variable, a so-called "array variable", in contrast to the "scalar vari-
ables" discussed so far. In the following I shall restrict myself to array vari-
ables that are the analogue of one-dimensional arrays.
We can regard (the value of) a variable of type "integer" as an integer-
valued function without arguments (i.e. defined on a domain consisting of
a single, anonymous point), a function that does not change unless explicitly
changed (usually by an assignment). It is perhaps unusual to consider func-
tions without arguments, but we mention the viewpoint for the sake of the
analogy. For, similarly, we can regard (the value of) a variable of type "integer
array" as an integer-valued function of one argument with a domain in the
integers, a function, again, that does not change unless explicitly changed.
But the value of a variable of type "integer array" cannot be any integer-
valued function defined on a domain in the integers, for I shall restrict myself
to such types that, given two variables of that type, we can write an algorithm
establishing whether or not the two variables have the same value. If x and y
are scalar variables of type "integer", then this algorithm boils down to the
boolean expression x = y, i.e. both functions are evaluated at the only
(anonymous) point of their domain and these integer values are then com-
pared. Similarly, if ax and ay are two variables of type "integer array", their
values are equal if and only if, as functions, they have the same domain and
in each point of the domain their values are equal to each other. In order
that all these comparisons are possible, we must restrict ourselves to finite
domains. And what is more, besides being finite, the domains must be avail-
able in one way or another to the algorithm that is to compare the values of
the array variables ax and ay.
For practical purposes I shall restrict myself to domains consisting of
consecutive integers (when not empty). But even then there are at least two
possibilities. In ALGOL 60 the domain is fixed by giving in the declaration
-e.g. "boolean array A[l: JO], B[l: 5]"- the lower and upper bounds for
the subscript value. As a type determines the class of possible values for a
variable of that type, we must come to the conclusion that the two arrays
A and B in the above example are of different type: A may have 1024 different
values, B only 32. In ALGOL 60 we have as many different types "boolean
array" as we can have bound pairs (and, as the bound pair may contain
expressions, the type is in principle only determined upon block entry).
Besides that, the necessary knowledge about the domain must be provided
by other means: without further information it is impossible to write in
ALGOL 60 an inner block determining whether two global boolean arrays
A and B are equal!
The alternative is to introduce only one type "integer array" and only
one type "boolean array" and to regard "the domain" as part (aspect) of
any value of such type; we must then be able to extract that aspect from any
such value. Let ax be an array variable; in its active scope I propose to extract
the bounds of the domain from its value by means of two integer-valued
functions, denoted by "ax.lob" and "ax.hib" respectively, with the understand-
ing that the domain of the function "ax(i)" extends over all integers i satisfy-
ing

    ax.lob ≤ i ≤ ax.hib

Besides those two I propose a third (dependent) one, "ax.dom", equal to the
number of points in the domain. The three functions satisfy

    ax.dom = ax.hib - ax.lob + 1 ≥ 0

(Note that even the empty domain, dom = 0, has a place along the number
line; lob and hib remain defined and they then satisfy hib = lob - 1.)
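The proposal is easily modelled; a sketch in Python, the class name and representation being our own choices: an array value is an integer-valued function on a domain of consecutive integers, carrying its own bounds.

    class ArrayValue:
        def __init__(self, lob, values):
            self.lob = lob                 # lowest point of the domain
            self.values = list(values)     # the function values, in order

        @property
        def hib(self):                     # hib = lob - 1 for the empty domain
            return self.lob + len(self.values) - 1

        @property
        def dom(self):                     # dom = hib - lob + 1 >= 0
            return len(self.values)

        def __call__(self, i):             # ax(i), for lob <= i <= hib only
            if not (self.lob <= i <= self.hib):
                raise IndexError("argument outside the domain")
            return self.values[i - self.lob]

        def __eq__(self, other):           # equal domains and equal values
            return self.lob == other.lob and self.values == other.values

The comparison asked for above is now expressible: ArrayValue(1, [True]) == ArrayValue(2, [True]) yields False, the two domains being different.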
We have used here a new notation, the dot as in "ax.lob", "ax.hib" and
"ax.dom". The names following the dot are what is called "subordinate to
the type of the variable whose name precedes the dot". Following the dot
that follows a variable, only names subordinate to the type of that variable
may occur and their meaning will be as defined with respect to that type.

Remark 1. In other contexts, i.e. not following the dot, the same names
may be used with completely different meaning. We could introduce an array
variable named "dom" and in its active scope we could refer to "dom.lob'',
"dom.hib" and even "dom.dom" ! Such perversities are not recommended and
therefore I have tried to find subordinate names that, although of some
mnemonic value, are unlikely candidates for introduction by the programmer
himself. (End of remark 1.)

Remark 2. A further reason for using the dot notation rather than the
function notation -e.g. "dom(ax)", etc.- is that, unless we introduce differ-
ent sets of names for these functions defined on boolean arrays and integer
arrays respectively (which would be awkward) we are forced to introduce
functions of an argument that may be of more than one type, something I
would like to avoid as long as possible. (End of remark 2.)

Remark 3. The expression "ax(i)" is used to denote the function value
in point i. Only when the value of "ax(i)" is required does the argument i
need to be defined and to satisfy
    ax.lob ≤ i ≤ ax.hib
In view of the dot notation we could regard "ax(i)" as an abbreviation for
"ax.val(i)", where "val" is the subordinate name indicating evaluation in
the point as indicated by the value of the further argument i. For each type,
such an abbreviation can be introduced just once! Note that also the type
"integer" could have a subordinate name "val" that would enable us to write
a little bit more explicitly
x:= y.val
instead of the usual and somewhat sloppy x:= y. (End of remark 3.)

For the sake of convenience we introduce two further functions; for the
array variable ax they are defined if ax.dom > 0. They are
ax.low, defined to be equal to ax(ax.lob)
and
ax.high, defined to be equal to ax(ax.hib)
They denote the function values at the lowest and the highest point of the
domain respectively. They are nothing really new and are defined in terms
of concepts already known; in the definition of the semantics of operations
on array values we do not need to mention the effect on them explicitly.
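In terms of the sketch given earlier the two conveniences are one-liners, defined -as stated- only if ax.dom > 0:

    def low(ax):      # ax.low, i.e. ax(ax.lob)
        return ax(ax.lob)

    def high(ax):     # ax.high, i.e. ax(ax.hib)
        return ax(ax.hib)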
As stated above, a scalar variable can be regarded as a function (without
argument) that can be changed by assigning a new value to it: such an assign-
ment destroys the information stored as "its old value" completely. We also
need operations to change the value of an array variable (without them it
would always be an array constant!) but the assignment of a new value to
it that is totally unrelated to its old value will play a less central role. It is
not that the assignment to an array variable presents any logical difficulties -
on the contrary, I am tempted to add- but there is something wrong with
its economics. With a large domain size the amount of information stored as
"the value of an array variable" can be very large, and neither copying nor
destroying such large amounts of information are considered as "nice"
operations. On the contrary: in many programming tasks the core of the
problem consists of building up an array value gradually, i.e. in a number of
steps, each of which can be considered as a "nice" operation, "nice" in the
sense that the new value of the array can be regarded as a "pleasant" deriva-
tion of its old value. What makes such operations "nice" or "pleasant"
depends essentially on two aspects: firstly, the relation between the old and
the new value should be mathematically manageable, otherwise the opera-
tions are too cumbersome for us to use; secondly, its implementation should
not be too expensive for the kind of hardware that we intend to instruct
with our program. The extent to which we are willing to take the latter hard-
ware constraints into account is not a scientific question, but a political one,
and as a consequence I don't feel obliged to give an elaborate justification
of my choices. For the sake of convenience I shall be somewhat more liberal
98 ARRAY VARIABLES

than many programmers would be, particularly those that are working
daily with machinery, the conceptual design of which is ten or more years
old; on the other hand I hope to be sufficiently aware of the possible technical
consequences of my choices that they remain, if not realistic, at least not
totally unrealistic.
Our first modification of the value of an array variable, ax say, does not
change the domain size, nor the set of function values, nor their order; it
only shifts the domain over a number of places, k say, upwards along the
number line. (If k < 0 it is a shift over -k places in the other direction; if
k = 0 it is the identity transformation, semantically equivalent to "skip".)
We denote it by
ax:shift(k)
Here we have introduced the colon ":". Its lowest dot indicates in the usual
manner that the following name is subordinate to the type of the variable
mentioned to its left; the upper dot is just an embellishment (inspired by the
assignment operator ": = "), indicating that the value of the variable men-
tioned to its left is subject to redefinition.
Immediately we are confronted with the question whether we can give
an axiomatic definition of the predicate transformer wp("ax:shift(E)", R).
Well, it must be a predicate transformer similar to the one of the axiom of
assignment to a scalar variable, but more complicated -and this will be
true as well for all the other modifiers of array values- because the value of
a scalar variable is fully defined by one (elementary) value, while the value of
an array variable involves the domain itself and a function value for all
points of the domain. Because the value of the array variable ax is fully
determined by
the value of ax.lob,
the value of ax.dom and
the value of ax(i) for ax.lob ≤ i < ax.lob + ax.dom
we can, in principle at least, restrict ourselves to post-conditions R referring
to the array value only in terms of "ax.lob", "ax.dom" and "ax(arg)" where
"arg" may be any integer-valued expression. For such a post-condition R
the corresponding weakest pre-condition
wp("ax:shift(E)", R)
is derived from R by simultaneously replacing

1. all occurrences of ax.lob by (ax.lob + (E)) and


2. all occurrences of (sub)expressions of the form ax(arg) by ax((arg) - (E))
Note. If E itself depends on the value of ax, the safest way is to evaluate
first for the given R with a completely new name, K say, wp("ax:shift(K)",
R), in which then the actual expression E is substituted for K. We have
already encountered the same complication when applying the axiom of
assignment to statements such as x:= x + f(x). (End of note.)

We give a few examples. Let R be ax.lob = 10; then
wp("ax:shift(ax.lob)", R) = (ax.lob + ax.lob = 10)
                          = (ax.lob = 5)
Let R be (A i: 0 ≤ i < ax.dom: ax(ax.lob + i) = i); then
wp("ax:shift(1)", R) = (A i: 0 ≤ i < ax.dom: ax(ax.lob + 1 + i - 1) = i)
                     = R
An alternative way of formulating the weakest pre-condition is
wp("ax:shift(E)", R) = Rax'-ax
(i.e. a copy of R, in which every occurrence of ax is replaced by ax'), where
ax'.!ob =ax.lob+ E
ax'.dom = ax.dom
ax'(arg) = ax(arg - E) for any value of arg.
From these three definitions it follows that
ax' .hib = ax.hib +E
ax' .low = ax.low
ax'.high =ax.high
Note. Such equalities are meant to imply that if the right-hand side is
undefined, the left-hand side is undefined as well. (End of note.)

For the definition of our further operators we shall follow the latter
technique: it describes more clearly how the final value ax' depends on the
initial value ax.
The next operators extend the domain at either the high or the low end
with one point. The function value in the new point is given as parameter
which must be of the so-called "base type" of the array, i.e. boolean for a
boolean array, etc. The operators are of the form
ax:hiext(x) or ax:loext(x)
The semantic definition of hiext is given by
wp("ax:hiext(x)", R) = Rax'-ax
where
ax'./ob = ax.lob
ax'.hib= ax.hib + 1
ax'.dom = ax.dom + 1
1 00 ARRAY VARIABLES

ax'(arg) = x for arg = ax.hib +I


= ax(arg) for arg -=f:: ax.hib + I
The semantic definition of loext is given by
wp("ax:loext(x)", R) = Rax'-ax
where
ax'.lob =ax.lob - I
ax'.hib = ax.hib
ax'.dom = ax.dom +I
ax'(arg) = x for arg = ax.lob - I
= ax(arg) for arg -=f:: ax.lob - I
Note. Our earlier remark that also the empty domain would have its
place along the number line was to ensure that the extension operators
hiext and loext are also defined when applied to an array variable with
dom = 0. (End of note.)

The next two operators remove a point from the domain at either the
high or the low end. They are only defined when initially dom > 0 holds
for the array to which they are applied; when applied to an array with dom =
0, they lead to abortion. They destroy information in the sense that one of
the function values gets lost.
The semantic definition of hirem is given by
wp("Qx:hirem", R) = (ax.dom > 0 and Rax'-ax>
where
ax'.lob =ax.lob
ax' .hib = ax.hib - I
ax' .dom = ax.dom - I
ax'(arg) =undefined for arg = ax.hib
= ax(arg) for arg -=f:: ax.hib
The semantic definition of lorem is given by
wp("ax:lorem", R) = (ax.dom > 0 and Rax'-aJ
where
ax'.lob =ax.lob + I
ax'.hib= ax.hib
ax'.dom = ax.dom - I
ax'(arg) =undefined for arg = ax.lob
= ax(arg) for arg -=f:: ax.lob
For the sake of convenience we introduce two further operations, the
semantics of which can be expressed in terms of the functions and operations
already introduced; they are
x, ax:hipop, semantically equivalent to "x:= ax.high; ax:hirem"
and
x, ax:lopop, semantically equivalent to "x:= ax.low; ax:lorem"
They are given in a notation which is reminiscent of the one for the concur-
rent assignment; the name following the ":" must be subordinate to the type
of the variable immediately before the ":". Obviously, the other variable x
must be of the base type of the array variable ax.
The above modifiers all change the domain of the function, either only
its place along the number line or also its size. Two further modifiers will
be introduced, modifiers that leave the domain as it stands but only affect
one or two function values.
A very important modifier does not introduce new function values, but
only rearranges them. It is of the form
ax:swap(i, j)
It leads to abortion when invoked without both i and j lying in the domain.
Its semantics are given by
wp("ax:swap(i,j)", R) = (ax.lob< i < ax.hib and
ax.lob < j < ax.hib and

where
ax'.lob = ax.lob
ax'.hib = ax.hib
ax' .dom = ax.dom
ax'(arg) = ax(.i) for arg = i
= ax(i) for arg =j
= ax(arg) for arg =I= i and arg =I= j
Note. Initially i ≠ j is not required: if initially i = j holds, the value of
the array variable remains unaffected. (End of note.)

Our last modifier redefines a single function value; it is of the form


ax:alt(i, x)
It leads to abortion when invoked without i lying in the domain; the second
parameter x must be of the array variable's base type. Its semantics are given
by
wp("ax:alt(i, x)", R) = (ax.lob ≤ i ≤ ax.hib and Rax'→ax)
where
ax'.lob = ax.lob
ax'.hib = ax.hib
ax'.dom = ax.dom
ax'(arg) = x       for arg = i
         = ax(arg) for arg ≠ i
The operation denoted above as "ax:alt(i, x)" is semantically equivalent
to what FORTRAN or ALGOL 60 programmers know as "the assignment
to a subscripted variable". (They would write "AX(!)= X" and "ax[i] := x"
respectively.) I have introduced this operation in the form "ax:alt(i, x)" in
order to stress that such an operation affects the array ax as a whole: two
functions with the same domain are different functions if they differ in at
least one point of the domain. The "official" -or, if you prefer, "puritan"-
notation "ax:alt(i, x)" is, however, even to my taste too cumbersome and too
unfamiliar and I therefore propose (I too have my weaker moments!) to
use instead
ax:(i) = x
a notation which is somewhat shorter, reminiscent of the so much more
familiar assignment statement, and still reflects by its opening "ax:" that we
must view it as affecting the array variable ax. (The decision to write "ax:(i)=
x" is not much different from the decision to write "ax(i)" instead of the
more pompous "ax.val(i)".)
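The modifiers can be added to the illustrative Python model introduced
earlier in this chapter; again the method names are ad hoc choices, and an
assertion failure plays the role of abortion.

    class DArray(DArray):                    # extending the earlier sketch
        def shift(self, k):                  # ax:shift(k)
            self._lob += k                   # move the domain k places up
        def hiext(self, x):                  # ax:hiext(x)
            self._vals.append(x)             # one new point at the high end
        def loext(self, x):                  # ax:loext(x)
            self._vals.insert(0, x); self._lob -= 1
        def hirem(self):                     # ax:hirem
            assert self.dom > 0; self._vals.pop()
        def lorem(self):                     # ax:lorem
            assert self.dom > 0; self._vals.pop(0); self._lob += 1
        def hipop(self):                     # x, ax:hipop returns ax.high
            x = self.high; self.hirem(); return x
        def lopop(self):                     # x, ax:lopop returns ax.low
            x = self.low; self.lorem(); return x
        def swap(self, i, j):                # ax:swap(i, j)
            vi, vj = self(i), self(j)        # aborts unless i and j lie in the domain
            self._vals[i - self._lob] = vj
            self._vals[j - self._lob] = vi
        def alt(self, i, x):                 # ax:alt(i, x), i.e. ax:(i)= x
            assert self.lob <= i <= self.hib
            self._vals[i - self._lob] = x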
None of the previous operators can be used for initialization. They can
only change the value of an array under the assumption that it has already
a value; they can only occur in the active scope of the array variable. We
have not yet introduced the assignment
ax:= bx
a construct that would do the job. I am, however, very hesitant to do so,
because in its full generality "assignment of a value" usually implies "copying
a value" and ifthe domain of the function bx is large, this is not to be regarded
as a "nice" operation in present technology. Not that I am absolutely unwill-
ing to introduce "unpleasant" operations, but if I do so, I would not like
them to appear on paper as innocent ones. A programming language in
which "x:= y" should be regarded as "nice" but "ax:= bx" should have to
be regarded as "unpleasant" would be misleading; it would at least mislead
me. A way out of this dilemma is to admit as the right-hand side of the
ARRAY VARIABLES 103

assignment to an array variable only enumerated constants, e.g. of the form


(<integer> {, <value of the base type>})
such that
bx:= (5, true, true, false, true)
would establish
bx.lob = 5     bx(5) = true
bx.hib = 8     bx(6) = true
bx.dom = 4     bx(7) = false
               bx(8) = true
The consequence of such a restriction is that assignment of or initialization
with a value with a large domain size cannot be written down unnoticed. My
expectation is that most initializations will be with values with dom = 0.
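In the Python model above the enumerated constant corresponds to ordinary
construction; a purely illustrative check:

    bx = DArray(5, [True, True, False, True])   # bx:= (5, true, true, false, true)
    assert (bx.lob, bx.hib, bx.dom) == (5, 8, 4)
    assert bx(5) and bx(6) and not bx(7) and bx(8)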
A few concluding remarks are in order.
There is, to start with, the question of economics. My basic assumption
is that all operations mentioned in this chapter can be performed at roughly
the same price. Some assumption of this nature has to be made, for without
it the programming task does not make sense. For instance, instead of writing
ax:(5)= 7
we could have written the inner block

begin glovar ax; privar bx;


if ax.lob ≤ 5 and 5 ≤ ax.hib -->
    bx vir int array:= (0);
    do ax.hib ≠ 5 --> bx:hiext(ax.high); ax:hirem od;
    ax:hirem; ax:hiext(7);
    do bx.dom ≠ 0 --> ax:hiext(bx.high); bx:hirem od
fi
end

but I would like to reject that inner block as a worthy substitute, not so much
on account of the length of the text, but on account of its inefficiency. I will
not even regard "ax:(5)= 7" as an abbreviation of the above inner block.
With the possible exception of the assignment of an enumerated value,
I assume in particular the price of all operations independent of the values
of the arguments supplied to it: the price of executing ax:shift(k) will be
independent of the value of k, the price of executing ax:swap(i,j) will be
independent of the values of i and j, etc. With present-day technology these
assumptions are not unrealistic.
It is in such considerations that the justification is to be found for my
willingness to introduce otherwise superfluous names; we could have restricted
ourselves to ax.lob and ax.dom, for whenever we would need ax.hib, we


could write
ax.lob + ax.dom - 1
instead, but that would make the effective use of ax.hib "twice as expensive"
as the effective use of ax.lob and our consciousness of this fact could easily
twist our thinking (worse, it is guaranteed to do so).
I said that the prices are of the same order of magnitude. What I also
mean is "of the same order of magnitude as other things that we consider
as primitive". If the array operations were orders of magnitude more expen-
sive than other operations, we would, for instance, find ourselves invited to
replace
ax:swap(i, j)
by
if i ≠ j --> ax:swap(i, j) □ i = j --> skip fi
and very quickly we should need to know both the exact price ratios and a
very good estimate for the probability of hitting the case "i = j" in order to
be able to decide whether our replacement of ax: swap(i,j) by the alternative
construct is actually an improvement or not. I know of mathematicians who
revel in such optimization problems, sometimes thinking that they constitute
the central problems of computer programming. I leave these problems
gladly to them if they are happy with them; the operations that we prefer
to consider as primitive should not confront us with such conflicts. I like to
believe that we have more important problems to worry about.
A final remark about implementation. It is conceivable that upon initial-
ization of the array variable ax some limits are given: a lower limit for ax.lob,
or an upper limit for ax.hib, or both, or perhaps only an upper limit for
ax.dom. If such "hints to the compiler" are included, a wealth of traditional
storage management techniques becomes exploitable. I prefer, however, to
regard such "hints to the compiler" not as part of the program. They only
make (on some equipment!) a cheaper implementation possible; they repre-
sent for the implementation the permission (but not the obligation!) to abort
a program execution in which such a stated limit is exceeded.
12 THE LINEAR SEARCH THEOREM

Let B be a boolean function defined on the integers and consider the


following beginning of a block:
begin privar i;
i vir int:= 0;
do B(i) --> i:= i + 1 od;

For this repetitive construct we can formulate the invariant relation


P(i): (A j: 0 ≤ j < i: B(j))
Proof. As P(0) is true, P(i) is established upon initialization. Furthermore
(P(i) and B(i)) ⇒ P(i + 1)
                = wp("i:= i + 1", P(i))        Q.E.D.
Without further knowledge about the boolean function B we cannot
prove that the repetitive construct will terminate. If, however,
(E j: j ≥ 0: non B(j))
the assumption of nontermination leads to the usual contradiction: the
existence of at least one value j ≥ 0 such that
non B(j)
holds is sufficient to guarantee termination. Upon termination, however,
we know
P(i) and non B(i) =
(A j: 0 ≤ j < i: B(j)) and non B(i)

i.e. i is the minimum value ≥ 0, such that non B(i) holds. In other words,
when we look for the minimum value (at least equal to some lower bound)
that satisfies some criterion, our program investigates values (starting at that
lower bound) in increasing order. Searching in increasing order translates
the first satisfactory value encountered into the smallest satisfactory value
existing. Similarly, when looking for a maximum value, we shall search in
decreasing order.
Very often, the two statements have the form
x:= xnought;
do B(x) --> x:= F(x) od
This program searches in the sequence of values given by
x_0 = xnought
for i > 0: x_i = F(x_{i-1})
the value x_i with the minimum value of i (≥ 0), such that non B(x_i) holds.
(More formal proofs of the above are left as an exercise to the industrious
reader, if so inclined.)
The insights described in this chapter are referred to as the "Linear
Search Theorem". In the next chapter we shall use it as part of our reasoning
for actually finding a solution; simple as it is, the Linear Search Theorem
has often proved to be of significant heuristic value.
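For the reader who likes to experiment, the general schema "x:= xnought;
do B(x) --> x:= F(x) od" admits a direct Python rendering; the sketch below
assumes B and F supplied as functions and leaves termination as the caller's
obligation:

    def linear_search(xnought, B, F):
        # Return the first value of xnought, F(xnought), F(F(xnought)), ...
        # for which B no longer holds.
        x = xnought
        while B(x):                 # do B(x) --> x:= F(x) od
            x = F(x)
        return x

    # e.g. the smallest i >= 0 such that i * i > 50:
    assert linear_search(0, lambda i: i * i <= 50, lambda i: i + 1) == 8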
13 THE PROBLEM OF THE NEXT PERMUTATION

We are requested to write an inner block operating on a global integer


array variable, named c, with
c.lob = 1 and c.hib = n
for some constant value of n (> 1). Furthermore it is given that the ordered
sequence of values c(1), c(2), ..., c(n) is some permutation of the values
from 1 through n, but not the alphabetically last one: n, n - 1, ..., 1. The
inner block has to transform the sequence c(1), c(2), ..., c(n) into its imme-
diate alphabetic successor. (For the notion of "alphabetic order", see the last
example of the chapter "The Formal Treatment of Some Small Examples".)
For instance, with n = 9, the sequence
1 4 6 2 9 5 8 7 3
should be transformed into
1 4 6 2 9 7 3 5 8
As the above example shows, we may have at the low end a number of
function values that remain unaffected. The transformation to be performed
is restricted to permuting the values at the high end and our first duty seems
to be to find that split, i.e. to determine the value of i, such that
c(k) remains unaffected for 1 ≤ k < i
c(k) is changed for k = i
That value of i is characterized as the maximum value of i ( < n) such that
c(i) < c(i + 1)
(It could not be larger, for then we would be restricted to permuting an


initially monotonically decreasing sequence, an operation that cannot give
rise to a sequence that is higher in the alphabetic order; it should not be
smaller either, because then we would never generate the immediate alphabe-
tic successor.)
Note. The fact that the initial sequence is not the alphabetically last
one guarantees the existence of an i (0 < i < n) such that c(i) < c(i + 1).
(End of note.)

Having found i, we must find from "the tail", i.e. among the values c(j)
with i + 1 ≤ j ≤ n, the new value c(i). Because we are looking for the
immediate successor, we must find that value of j in the range i + 1 ≤ j ≤ n,
such that c(j) is the smallest value satisfying
c(j) > c(i)
Having found j, we can see to it that c(i) gets adjusted to its final value
by "c:swap(i,j)". This operation has the additional advantage that the total
sequence remains a permutation of the numbers from 1 through n; the final
operation is to rearrange the values in the tail in monotonically increasing
order. The overall structure of the program we are considering is now
determine i;
determine j;
c:swap(i,j);
sort the tail
(In our example i = 6, j = 8 and the final result would be reached via the
intermediate sequence 1 4 6 2 9 7 8 5 3.)
When determining i, we look for a maximum value of i; the Linear Search
Theorem tells us that we should investigate the potential values for i in
decreasing order.
When determining j, we look for a minimum value c(j); the Linear Search
Theorem tells us that we must investigate c(j) values in increasing order.
Because the tail is a monotonically decreasing function (on account of the
way in which i was determined), this obligation boils down to inspecting
c(j) values in decreasing order of j.
The operation "c:swap(i,j)" does not destroy the monotonicity of the
function values in the tail (prove this!) and "sort the tail" reduces to inverting
the order. (In doing so, our program "borrows" the variables i and j that
have done their job. Note that the way in which the tail is reflected works
equally well with an even number as with an odd number of values in the
tail.)

begin glovar c; privar i, j;


i vir int:= c.hib - 1; do c(i) ≥ c(i + 1) --> i:= i - 1 od;
j vir int:= c.hib; do c(j) ≤ c(i) --> j:= j - 1 od;
c:swap(i, j);
i:= i + 1; j:= c.hib;
do i < j --> c:swap(i, j); i, j := i + 1, j - 1 od
end
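
A transcription of this program into Python -with the inevitable change to
indices starting at 0- may be found helpful; it is an illustration, not part of
the text proper:

    def next_permutation(c):
        # Transform list c, not the alphabetically last permutation,
        # into its immediate alphabetic successor (in place).
        i = len(c) - 2
        while c[i] >= c[i + 1]:          # determine i, searching downwards
            i -= 1
        j = len(c) - 1
        while c[j] <= c[i]:              # determine j, searching downwards
            j -= 1
        c[i], c[j] = c[j], c[i]          # c:swap(i, j)
        i, j = i + 1, len(c) - 1
        while i < j:                     # sort the tail, i.e. invert its order
            c[i], c[j] = c[j], c[i]
            i, j = i + 1, j - 1

    c = [1, 4, 6, 2, 9, 5, 8, 7, 3]
    next_permutation(c)
    assert c == [1, 4, 6, 2, 9, 7, 3, 5, 8]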

Remark 1. Nowhere have we used the fact that the values c(1), c(2), ...,
c(n) were all different from each other. As a result one would expect that
this program would correctly transform the initial sequence into its immediate
alphabetic successor also if some values occurred more than once in the
sequence. It does indeed, thanks to the fact that, while determining i and j,
we have formed our guards by "mechanically" negating the required condi-
tion c(i) < c(i + 1) and c(j) > c(i) respectively. I once showed this program,
when visiting a university, to an audience that absolutely refused to accept
my guards with equality included. They insisted on writing, when you knew
that all values were different
do c(i) > c(i + 1) --> ...
and
do c(j) < c(i) --> ...

Their unshakable argument was "that it was much more expensive to test
for equality as well". I gave up, wondering by what kind of equipment on the
campus they had been brainwashed. (End of remark 1.)

Remark 2. Programmers unaware of the Linear Search Theorem often


code "determine j" erroneously in the following form:
j vir int:= i + 1; do c(j + 1) > c(i) --> j:= j + 1 od
They argue that this program will only assign to j the value j + 1 after it
has been established that this new value is acceptable in view of the goal
c(j) > c(i). Analyze why their version of "determine j" may fail to work pro-
perly. (End of remark 2.)

Remark 3. One time I had the unpleasant obligation to examine a


student whose inventive powers I knew to be strictly limited. Because he had
studied the above program I asked him to write a program transforming the
initial permutation, known not to be the alphabetically first, into its immedi-
ate alphabetic predecessor. I hope that this exercise takes you considerably
less than the hour he needed. (End of remark 3.)

Remark 4. This program is a particular friend of mine, because I remem-


ber having tackled this problem in my student days, in the Stone Age of
machine code programming (even without index registers: in the good old
von Neumann tradition, programs had to modify their own instructions in
store!). And I also remember that, after a vain struggle of more than two
hours, I gave up! And that at a moment when I was already an experienced
programmer! A few years ago, needing an example for lecturing purposes,
I suddenly remembered that old problem and solved it without hesitation
(and could even explain it the next morning to a fairly inexperienced audience
within twenty minutes). That now one can explain within twenty minutes to
an inexperienced audience what twenty years before an experienced pro-
grammer could not find shows the dramatic improvement of the state of the
art (to the extent that it is now even hard to believe that then I could not
solve this problem!). (End of remark 4.)

Remark 5. Equivalent to our criterion for i ("the maximum value of


i (< n), such that c(i) < c(i + 1)") is "the maximum value of i (< n) such
that (E j: i < j ≤ n: c(i) < c(j))". The latter criterion is, however, less easily
usable and whoever starts with the latter one had better discover the other
one (in one way or another). (End of remark 5.)
14 THE PROBLEM OF THE DUTCH NATIONAL FLAG

There is a row of buckets numbered from 1 through N. It is given that


P1: each bucket contains one pebble
P2: each pebble is either red, white, or blue.
A mini-computer is placed in front of this row of buckets and has to
be programmed in such a way that it will rearrange (if necessary) the pebbles
in the order of the Dutch national flag, i.e. in order from low to high bucket
number first the red, then the white, and finally the blue pebbles. In order to
be able to do so, the mini-computer has been equipped with one output
command that enables it to interfere with pebble positions, viz.
"buck :swap(i,j)" for 1 < i < N and J < j < N:
for i = j: the pebbles are left as they are
for i ::;t:. j: two computer-controlled hands pick up the pebbles
from buckets nrs. i and j respectively and then drop
them in each other's bucket respectively. (This opera-
tion leaves relations P1 and P2 invariantly true.)
and one input command that can inspect the colour of a pebble, viz.
"buck(i)" for 1 < i < N:
when the computer program prescribes the evalua-
tion of this function of type "colour", a movable
"eye" is directed upon bucket nr. i, and delivers to the
mini-computer as the value of the function the colour
(i.e. red, white, or blue) of the pebble currently lying in
the bucket, the contents of which is inspected by the
"eye".


The constant N is a global constant from the context in which our pro-
gram is to be embedded as an inner block. Our program, however, has to
meet three special requirements:
1. It must be able to cope with all possible forms of "degeneration" as
presented by missing colours: the buckets may have been filled with
pebbles of two colours only, of one colour only, or of no colour at all
(if N = 0).
2. The mini-computer has a very small store compared with the values of
N it should be able to cope with, and therefore we are not allowed to
introduce arrays of any sort, only a fixed number of variables of the
types "integer" and/or variables of type "colour". (With variables of
type integer we mean here variables that cannot take on much more
than N different values.)
3. The program may direct the "eye" at most once upon each pebble (it
is assumed that the input operation is so time-consuming that looking
twice at the same pebble would lead to an inacceptable loss of time).
Furthermore, regarding programs of the same degree of complication,
the one that needs (on the average) the fewer swaps is to be preferred.
Although our pebbles are of only three different colours, the fact that
our eye can only inspect pebbles one at a time, together with requirement
(3), implies that halfway through the rearrangement process, we have to
distinguish between pebbles of four different categories, viz. "established
red" (ER), "established white" (EW), "established blue" (EB), and "as yet
uninspected" (X). Requirement (2) excludes that pebbles of these different
categories lie arbitrarily mixed: inside the mini-computer we then cannot
store "who is what". Our only way out is to divide the row of buckets into
a fixed number of (possibly empty) zones of consecutively numbered buckets,
each zone being reserved for pebbles of a specific category. Because four
different zones is the minimum, the introduction of just four zones seems
the first thing to try. But in what order? I found that many programmers
tend to decide without much thinking upon the order "ER", "EW", "EB",
"X", but this is a rash decision. As soon as anyone is of the opinion that it
is attractive to place the zone "ER" at the low end, considerations of sym-
metry should suggest that the zone "EB" at the high end is equally attractive.
Still sticking to our earlier decision of only four different zones, we come to
the conclusion that the zones "EW" and "X" should be in the middle in
some order (convince yourself that it is now immaterial in which order!),
for instance:
"ER", "X", "EW", "EB"
Once we have chosen the above "general situation", our problem is
essentially solved, for here we have a general situation of which both the
initial state (all buckets in zone "X") and the final state (zone "X" empty)
are special cases! We can establish it, and then a repetitive construct has to
decrease the size of zone "X" while maintaining this general situation. In
our mini-computer we need three integer variables for keeping track of the
place of the zone boundaries, e.g. "r", "w", and "b" with the meanings
1 ≤ k < r: the kth bucket is in zone "ER"
           (number of buckets r - 1 ≥ 0)
r ≤ k ≤ w: the kth bucket is in zone "X"
           (number of buckets w - r + 1 ≥ 0)
w < k ≤ b: the kth bucket is in zone "EW"
           (number of buckets b - w ≥ 0)
b < k ≤ N: the kth bucket is in zone "EB"
           (number of buckets N - b ≥ 0)
Establishing the relation P to be kept invariant means initializing these
three variables in accordance with "all buckets in zone "X"", and the overall
structure of our program could be:

begin glovar buck; glocon N; privar r, w, b;


r vir int, w vir int, b vir int := 1, N, N; {P has been established}
do w ≥ r --> "decrease number of buckets in zone "X"
under invariance of P"
od
end

Immediately we are faced with the question: by which amount shall the
guarded statement decrease the number of buckets in zone "X"? There are
three arguments -and as the reader will notice, they are of a fairly general
nature- in favour of trying first whether we can come away with "decrease
the number of buckets in zone "X" by J". The arguments are:

1. Decreasing by 1 is sufficient.
2. As we have chosen our guard "w ≥ r", we can guarantee the presence
of at least one bucket in zone "X"; for two, we would have needed the
guard "w > r".
3. The one pebble inspected will face us with three different cases, inspect-
ing two confronts us already with nine different cases; this multiplicative
building up of cases to be considered should be interpreted, in principle,
as a heavy price to pay for whatever we can gain by it.

The next question to be settled is: which one of the uninspected pebbles
will be looked at? This question is not necessarily irrelevant, because in the
meantime in the ordering "ER", "X", "EW", "EB" an asymmetric situation
has been created. No experienced programmer will suggest an arbitrary


one, the ones at the low and the high end respectively are the most likely
candidates. With equal probabilities for the three colours, an inspection of
the pebble in the rth bucket will give rise to (0 + 1 + 2)/3 = 1 swap; inspec-
tion of the pebble in the wth bucket, however, will give rise only to
(1 + 0 + 1)/3 = 2/3 swap, and this settles the choice. Thus we arrive at
the following program:

begin glovar buck; glocon N; privar r, w, b;


r vir int, w vir int, b vir int := 1, N, N;
do w ≥ r -->
    begin glovar buck, r, w, b; pricon col;
        col vir colour := buck(w);
        if col = red --> buck:swap(r, w); r:= r + 1
        □ col = white --> w:= w - 1
        □ col = blue --> buck:swap(w, b); w, b := w - 1, b - 1
fi
end
od
end

Note. The program is robust in the sense that it will lead to abortion
when fed with erroneous data such as one of the pebbles being green.
(End of note.)
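
For comparison, the same program rendered in Python (0-based indices; the
representation of the colours is an ad hoc choice of this sketch):

    RED, WHITE, BLUE = "red", "white", "blue"

    def dutch_flag(buck):
        # Rearrange buck in place into the order red, white, blue;
        # every pebble is inspected exactly once.
        r, w, b = 0, len(buck) - 1, len(buck) - 1   # all buckets in zone "X"
        while w >= r:
            col = buck[w]                           # inspect one pebble
            if col == RED:
                buck[r], buck[w] = buck[w], buck[r]; r += 1
            elif col == WHITE:
                w -= 1
            elif col == BLUE:
                buck[w], buck[b] = buck[b], buck[w]; w -= 1; b -= 1
            else:
                raise ValueError("pebble of unknown colour")  # abortion

    p = [BLUE, RED, WHITE, RED, BLUE, WHITE]
    dutch_flag(p)
    assert p == [RED, RED, WHITE, WHITE, BLUE, BLUE]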

In the case that all pebbles are red and no swaps are necessary, our pro-
gram will prescribe N swaps and as conscious programmers we should
investigate how complicated a possibly more refined solution becomes:
perhaps we have acted too cowardly in rejecting it. (As a general strategy
I would recommend not to try the more refined solution before having con-
structed the more straightforward one; that strategy gives us, besides a
working program, an inexpensive indication of what the considered refine-
ment as such has to compete with.) I have always thought the above solution
perfectly satisfactory, and up till now I have never considered a more com-
plicated one. So, here we go!
Inspecting just one can be extended to "inspecting one or two" or "inspect-
ing as many as we can conveniently place". In view of the case "all pebbles
red" something along the latter line seems indicated. Before inspecting the
uninspected pebble at the high end we could try to move the boundary
indicated by r to the high end as much as we possibly can without swapping,
because it seems a pity to replace a red pebble in a perfectly OK-position by
another red pebble. The outer repetition could then begin with
do w ≥ r -->
begin glovar buck, r, w, b; privar colr;
colr vir colour := buck(r);
do colr = red and r < w --> r:= r + 1; colr:= buck(r) od;

The inner repetition stops, either because all pebbles have been inspected
(r = w) or because we have hit a nonred pebble. The case r = w, where
colr may have one of three different values, reduces to the alternative clause
of the earlier program but for the fact that the "buck :swap(r, w)" -r and w
being equal to each other- can be omitted. The case r < w implies
colr ≠ red; now we must be willing to inspect another pebble, for otherwise
our solution reduces to the one which always inspects the pebble at the low
end of the zone "X" and of that solution we know that on the average it
generates more swaps than our first effort. Because r < w, there is indeed
another uninspected pebble and the one in the wth bucket is the obvious
candidate.
Again, with colr = white, it seems a pity to swap the pebble in the rth
bucket with a white one in the wth bucket and in case of r < w it seems
indicated to enter a new inner block

begin glovar buck, r, w, b; glocon colr; privar colw;


colw vir colour := buck(w);
do colw = white and w > r + 1 --> w:= w - 1; colw:= buck(w) od;

We have now for colr the two possibilities white or blue, for colw the
three possibilities red, white, or blue, and the set of uninspected pebbles
may be empty or not. For a moment I feared that I might have to distinguish
between about 12 cases! But after looking at it for a long time (and after
one false start), I discovered that the way to proceed now is first to place the
pebble now in the wth bucket and to see to it that in all three cases the pebble
originally in the rth bucket is left in the (new) wth bucket. Then the three
alternatives can merge and a single text deals uniformly with the second
pebble, the colour of which is still given by colr.

if colw = red --> buck:swap(r, w); r:= r + 1
□ colw = white --> w:= w - 1
□ colw = blue --> buck:swap(w, b); w, b := w - 1, b - 1;
                  buck:swap(r, w)
fi;
if colr = white --> w:= w - 1
□ colr = blue --> buck:swap(w, b); w, b := w - 1, b - 1
fi
Note 1. It is nice that we could achieve that the concatenation of two


alternative constructs caters to the 2 * 3 colour-combinations! (End
of note 1.)
Note 2. Convince yourself that the case colw = white has been correctly
dealt with, because in this case w = r + 1 initially. (End of note 2.)

Remark. For pedagogical reasons I slightly regret that the final treat-
ment of "two inspected pebbles in wrong buckets" did not turn out to be
worse; perhaps I should have resisted the temptation to do even this messy
job as decently as I could. (End of remark.)

I leave the final composition of our second program to the reader, if he


still so desires (I hope he does not!); I think the point has been made. I have
carried the case-analysis up to this stage in order to drive home the message
that such a case-analysis, also known under the name "combinatorial explo-
sion", should nearly always be avoided like the plague. It lengthens the pro-
gram texts, and that may easily impair the efficiency! It always impairs the
program's reliability, for it imposes upon the poor programmer's shoulders
a burden under which he usually succumbs (not in the last place because
the work becomes so boring). The reader may interpret the above exercise
as a hint to software managers not to grade their programmers by the number
of lines of code produced per month; I would rather let them pay for the
punched cards they use out of their own pocket.
I have solved this problem often with students, either arriving at zones
in the order "ER", "X", "EW", "EB" or in the order "ER", "EW", "X",
"EB". When asked which pebble to inspect, their first suggestion had always
been "the leftmost one". I had the idea that this preference could be traced
to our habit of reading from left to right. Later I encountered students that
suggested first the rightmost one: one was an Israeli computing scientist,
the other one was of Syrian origin. It is somewhat frightening to discover
the devious ways in which our habits influence our thinking!
And this concludes my treatment of the problem of the Dutch national
flag, a problem that I owe to W.H.J. Feijen.
15 UPDATING A SEQUENTIAL FILE

When the guarded commands had emerged and the word got around, a
graduate student that was occupying himself mainly with business-oriented
computer applications expressed his doubts as to whether our approaches
were of any value outside (what he called) the scientific/technical applications
area. To support his doubt he confronted us with what he regarded as a
typical business application, viz. the updating of a sequential file. (For a
more precise statement of his problem, see below.) He could show the flow-
chart of a program that was supposed to solve the problem, but that had
arrows going from one box to another in such a wild manner that that solu-
tion (if it was one, for we could never find out!) was considered a kludge by
both of us. With some pride he showed us a decision table -his decision
table- that, according to him, would solve the problem; but the size of that
transition table terrified us. As the gauntlet was thrown to us, the only thing
left for us to do was to pick it up. Our first two efforts, although cleaner
than anything we had seen, did not yet satisfy W.H.J. Feijen, whose solution
I shall describe in this chapter. It turned out that our first efforts had been
relatively unsuccessful because by the way in which the problem had been
presented to us, we had erroneously been led to believe that this special nut
was particularly hard to crack. Quod non. I include the treatment of the file
updating problem in this monograph for three reasons. Firstly, because it
gives us the opportunity to publish Feijen's neat solution to a common type
of problem for which, apparently, very messy solutions are hanging around.
Secondly, because it can be found by exactly the same argument that led to
the elegant solution of the problem of the Dutch national flag. Finally, it
gives us the opportunity to cast some doubts on the opinion that business

programs are of a special nature. (If there is something special, it might be


the nature of business programmers .... )

There is given a file, i.e. ordered sequence, of records or, more precisely,
of values of type "record". If xis (the value of) a variable of type record, a
boolean function x.norm and an integer function x.key are defined, such that
for some constant inf
x.norm ⇒ (x.key < inf)
(non x.norm) ⇒ (x.key = inf)
The given file of records is called "oldfile" and successive records in the
sequence have monotonically increasing values of their "key"; only for the
last record of oldfile "x.norm" is false.
Furthermore there is given a file, called "transfile", which is an ordered
sequence of transactions or, more precisely, of values of type "transaction".
If y is (the value of) a variable of type transaction, the boolean y.norm and
the integer y.key are defined, such that for the same constant inf as above
y.norm ⇒ (y.key < inf)
(non y.norm) ⇒ (y.key = inf)
Successive transactions of "transfile" have monotonically nondecreasing
values of their "key", only the last transaction is abnormal and has "y.key =
inf". If y.norm is true, three further booleans are defined, viz. "y.upd", "y.del"
and "y.ins", such that always exactly one of them is true.
Furthermore, with x of type record and y of type transaction, such that
y.norm is true, three operations modifying x are defined:

x:update(y)  only defined if x.norm and y.upd and (x.key = y.key);
             upon completion x.norm and (x.key = y.key) still holds
x:delete(y)  only defined if x.norm and y.del and (x.key = y.key);
             upon completion x.norm = false
x:insert(y)  only defined if non x.norm and y.ins;
             upon completion x.norm and (x.key = y.key) holds

With x of type record, we have furthermore the operation "x:setabnorm"
which leaves x.norm = false.
The program has to generate a record file, called "newfile", whose final
value depends on the initial value of the input files "oldfile" and "transfile"
in the following way. We can merge the records of oldfile and the transactions
of transfile in the order of nondecreasing key with the rule that if a given
key-value occurs (once!) in oldfile and also (once or more) in transfile, the
record with that key-value precedes the transaction(s) with that key-value
in the merged sequence. The internal order of the transactions with the same
key-value is not to be destroyed by this merging process. As long as there is
still a transaction in the merged sequence, the merged sequence is subjected
to the transformation to be described below; the remaining sequence, consist-
ing of records only, is the desired final value of "newfile". The transformation
is described as follows.
Let y be the first transaction in the merged sequence, let x be the imme-
diately preceding record (if any):
if y.upd and there is a preceding record x with x.key = y.key, the latter
is modified by x:update(y) and y is removed from the sequence;
if y.del and there is a preceding record x with x.key = y.key, both x and
y are removed from the sequence;
if y.ins and there is not a preceding record x with x.key = y.key, y is
replaced by a record as results from the insert-operation;
if non y.norm, transaction y is removed;
if the first transaction satisfies none of these four criteria, it is removed
from the sequence under execution of the operation "error message", which,
for the purpose of this discussion, needs no further description.
We shall model this as an inner block operating on three global arrays:

record array oldfile,        only referred to by "lopop" and eventually
                             empty (i.e. oldfile.dom = 0)
transaction array transfile, only referred to by "lopop" and eventually
                             empty (i.e. transfile.dom = 0)
record array newfile,        only referred to by "hiext" and initially
                             empty (i.e. newfile.dom = 0)

We see some of the complications of this problem when we realize that


the merged sequence might contain a sequence of transactions, all with the
same key-value, and, for instance, successively characterized by the truth of
ins, ins, upd, del, del, upd, ins, upd, del
where the second "ins", the second "del" and the second "upd" will all give
rise to "error message" and the whole sequence contributes nothing at
all to the final value of newfile; such a sequence may occur with or without
a normal record following it.
As lengths of the files are unknown, our program will consist of a prelude,
one or more repetitive constructs, and possibly a (nonrepetitive) coda
closing the newfile with an abnormal record. In general a new record for
newfile can only be generated provided that the record with the next higher
key from oldfile and also the transaction with the next higher key from
transfile have already been "seen". (Think about a new record derived from
the transaction file only.) For this reason, we introduce a variable x of type
"record" and a variable y of type "transaction", in which the record and the
transaction with these next higher keys will be stored; upon creation of a
record of newfile, "x, followed by the remaining records of oldfile" will
represent the still unprocessed records, "y, followed by the remaining transac-
tions of transfile" will represent the still unprocessed transactions. This applies
when a new normal record has been added to newfile; the final abnormal
record will be attached by the coda. Because an unknown number of new
records can still be generated as long as one of the two input files is not yet
exhausted, the overall structure suggested for our program is now

begin glovar oldfile, transfile, newfile; privar x, y;
x vir record, oldfile:lopop;
y vir transaction, transfile:lopop;
do x.norm or y.norm -->
"process at least one record or transaction such as to
ensure progress"
od;
"extend newfile with abnormal record"
end

In designing repetitive constructs the crucial decision is how to synchro-


nize the progress of the computation with the cycling, the decision regarding
how much shall be done by each guarded statement list when selected for
execution. In this example we get into trouble when we synchronize with
oldfile or transfile, because either of them may already be exhausted; we
get into similar trouble when we try to synchronize with the generation of
records for newfile, because, though none of the input files is exhausted, we
may have generated the last normal record for newfile. The answer to the
question of synchronization is given in exactly the same manner as with
the problem of the Dutch national flag, when the number of pebbles to be
inspected had to be decided: the guarded statement will do exactly as much
work -no more and no less- as needs to be done and can be done, when
its guard is true.
If our guard x.norm or y.norm is true, the only conclusion we can draw
is that the unprocessed parts of the input files contain for at least one key-
value such that key < inf a record and/or transactions: the guarded state-
ment list will therefore process all records and/or transactions with that
key-value!
In the first statement -an alternative construct- that key-value is deter-
mined and recorded in the local constant "ckey" (short for "current key").
Also the variable "xx" of type record is initialized; it is in this variable that
the new record that may result is being built up. If the current key is derived
from oldfile, the only record with that key is processed, otherwise oldfile will
be left untouched.
In the second statement -a repetitive construct- all transactions with
that key-value are processed; as the processing of transactions is fully


restricted to this repetitive construct, the distinction between the various
kinds of transactions is well concentrated.
In the third statement -an alternative construct- a new record may be
added to newfile.

begin glovar oldfile, transfile, newfile; privar x, y;
x vir record, oldfile:lopop;
y vir transaction, transfile:lopop;
do x.norm or y.norm -->
    begin glovar oldfile, transfile, newfile, x, y;
          pricon ckey; privar xx;
    if x.key ≤ y.key --> ckey vir int:= x.key; xx vir record:= x;
                         x, oldfile:lopop
    □ x.key > y.key --> ckey vir int:= y.key; xx vir record:setabnorm
    fi {ckey < inf and ckey < x.key and ckey ≤ y.key};
    do y.key = ckey --> {y.key < inf, i.e. y.norm}
        if y.upd and xx.norm --> xx:update(y)
        □ y.del and xx.norm --> xx:delete(y)
        □ y.ins and non xx.norm --> xx:insert(y)
        □ y.ins = xx.norm --> error message
        fi; y, transfile:lopop
    od {ckey < x.key and ckey < y.key};
    if xx.norm --> newfile:hiext(xx) □ non xx.norm --> skip fi
    end
od; newfile:hiext(x) {newfile closed with abnormal record}
end
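
A Python paraphrase of this program may help in checking one's
understanding; the representation of records as pairs (key, data) and of
transactions as triples (key, kind, data) is an assumption of this sketch, not
part of the problem statement:

    INF = float("inf")                      # plays the role of the constant inf

    def update(oldfile, transfile):
        # oldfile: (key, data) pairs, strictly increasing keys, closed by
        # (INF, None); transfile: (key, kind, data) triples, nondecreasing
        # keys, kind in {'upd', 'del', 'ins'}, closed by (INF, None, None).
        oldfile, transfile = iter(oldfile), iter(transfile)
        newfile = []
        x, y = next(oldfile), next(transfile)       # x, oldfile:lopop etc.
        while x[0] < INF or y[0] < INF:
            if x[0] <= y[0]:                        # a record with this key exists
                ckey, xx = x[0], x
                x = next(oldfile)
            else:                                   # xx vir record:setabnorm
                ckey, xx = y[0], None
            while y[0] == ckey:                     # all transactions with ckey
                if y[1] == 'upd' and xx is not None:
                    xx = (ckey, y[2])
                elif y[1] == 'del' and xx is not None:
                    xx = None
                elif y[1] == 'ins' and xx is None:
                    xx = (ckey, y[2])
                else:
                    print("error message:", y)
                y = next(transfile)
            if xx is not None:
                newfile.append(xx)                  # newfile:hiext(xx)
        newfile.append(x)                           # close with abnormal record
        return newfile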

Remark 1. In the way in which we have presented this problem, one of


the possible values of a variable x of type record was the abnormal value
with x.norm = false; in the problem statement this value was used to mark
the end of oldfile. Thanks to that convention it was possible to restrict refer-
ences to oldfile to the operation "lopop"; if, however, "oldfile.dom" would
also have been available, such an abnormal value as end marker of the file
would not have been necessary and the whole problem could have been
stated in terms of a type "normal record". For the purpose of this program
we would then need linguistic means for introducing the type "record" as
used in our solution. Programming languages without such means force its
users to meet such a need for instance by introducing a pair of variables,
say a variable "xx" of type "normal record" and a boolean variable "xxnorm",
and then to write a program explicitly manipulating the variables of this
pair; xxnorm has then to indicate whether the value of xx is significant.
Such "conditional significance" is incompatible with our idea of explicit
initialization with meaningful values. In the case x.key ≤ y.key both xx
and xxnorm can be given meaningful initial values; in the case x.key > y.key,
however, only
xxnorm vir bool := false
would be really meaningful: our conventions would oblige us to initialize
xx then with some dummy value. We have circumvented this problem by
assuming the type record to include the abnormal value as well, thus postpon-
ing a discussion of the linguistic means that would be needed for the introduc-
tion of new types. (End of remark 1.)
16 MERGING PROBLEMS REVISITED

In the two preceding chapters we have followed a pattern of reasoning


that was rather different from our earlier formal treatments, where we derived
the guards once the invariant relation and the variant function (for the repeti-
tive construct) or the post-condition (for the alternative construct) and the
statement (list) to be guarded had been decided. In dealing with the problem
of the Dutch national flag and the file updating problem, however, we started
our reasoning about the repetitive construct with the guard and from there
argued what the guarded statement could be like, a more classical approach.
Particularly with respect to the file updating program I can imagine that
earlier chapters have made some of my readers so suspicious that they wonder
why I dare to believe that last chapter's program is correct; against all rules,
I have not even mentioned an invariant relation! The current chapter is
included, among other reasons, to supply the omitted material; simulta-
neously it will give us an opportunity to deal with a somewhat wider class of
problems in a more abstract manner.
Let x, y, and z be sets (and we can restrict ourselves without loss of
generality to sets of integers -there are enough of them!). We recall that, by
definition of the notion "set", all its elements are different from each other;
in this respect a "set of integers" differs from "a collection of numbers that
may contain duplicates". Two sets are equal (x = y), if and only if every
element of one set occurs in the other as well and vice versa.
We denote the union of two sets by a "+", i.e. if z = x + y, then z
contains an element if and only if that element occurs in either x, or y, or
both.
We denote the intersection of two sets by a "*", i.e. if z = x * y, then z
contains an element if and only if that element occurs in both x and y.


We denote the empty set by "∅", i.e. z = ∅ if and only if z contains no
element at all.
We now consider the task of computing for fixed sets X and Y the value
Z, given by
Z = X + Y
(In the course of this discussion X and Y, and therefore Z, are regarded as
constants: Z is the desired final value of a variable to be introduced later.)
Our program is requested to do so by manipulating -i.e. inspecting, chang-
ing, etc.- sets element by element.
Before proceeding to think in more detail about the algorithm, we realize
that halfway through the computational process, some of the elements of Z
will have been found and some not, that is, there exists for the set Z a parti-
tioning
Z = z1 ⊕ z2
Here the symbol "⊕" is a shorthand for
Z = z1 + z2 and z1 * z2 = ∅
(We may think of z1 as the set of elements whose membership of Z has been
definitely established, and of z2 as the set of Z's remaining elements.)
Similarly, halfway through the computational process, the sets X and Y
can be partitioned
X = x1 ⊕ x2 and Y = y1 ⊕ y2
(Here we may think of the sets x1 and y1 as those elements of X and Y
respectively which do not need to be taken into account anymore, as they
have been successfully processed.)
These interpretations of the partitionings of Z, X, and Y are, however,
of later concern. We shall first prove, quite independent of what might be
happening during the execution of our program, a theorem about such
partitionings.

THEOREM.

If      Z = X + Y                                           (1)
        X = x1 ⊕ x2                                         (2)
        Y = y1 ⊕ y2                                         (3)
        z1 = x1 + y1                                        (4)
        z2 = x2 + y2,                                       (5)
then    Z = z1 ⊕ z2  ⇔  (x1 * y2 = ∅ and y1 * x2 = ∅)       (6)
Proof. To show that the left-hand side of (6) implies its right-hand side,
we argue as follows:
Z = z1 ⊕ z2 ⇒ z1 * z2 = ∅
substituting (4) and (5), we find
(x1 + y1) * (x2 + y2) =
(x1 * x2) + (x1 * y2) + (y1 * x2) + (y1 * y2) = ∅
which implies the right-hand side of (6). To show that the right-hand side
of (6) implies its left-hand side, we have to show that it implies

z1 * z2 = ∅ and Z = z1 + z2
z1 * z2 = (x1 + y1) * (x2 + y2)
        = (x1 * x2) + (x1 * y2) + (y1 * x2) + (y1 * y2)
        = ∅ + ∅ + ∅ + ∅ = ∅
z1 + z2 = (x1 + y1) + (x2 + y2)
        = x1 + x2 + y1 + y2
        = X + Y = Z                          (End of proof.)

If relations (1) through (5) hold, the right-hand side of (6), and, therefore,
Z = z1 ⊕ z2
is also implied by
z1 * x2 = ∅ and z1 * y2 = ∅                              (7)
and from (1) through (5) and (7) it follows that if the partitioning of Z has
been chosen, the other two partitionings are uniquely defined.
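The theorem is, by the way, small enough to be checked mechanically; the
following Python fragment -purely illustrative- verifies equivalence (6)
exhaustively for one small choice of X and Y:

    from itertools import combinations

    def subsets(s):
        return [set(c) for r in range(len(s) + 1) for c in combinations(s, r)]

    X, Y = {1, 2, 3}, {2, 3, 4}
    Z = X | Y                               # Z = X + Y
    for x1 in subsets(X):
        for y1 in subsets(Y):
            x2, y2 = X - x1, Y - y1         # the partitionings (2) and (3)
            z1, z2 = x1 | y1, x2 | y2       # definitions (4) and (5)
            lhs = (z1 & z2 == set())        # Z = z1 ⊕ z2; z1 + z2 = Z holds anyhow
            rhs = (x1 & y2 == set()) and (y1 & x2 == set())
            assert lhs == rhs               # equivalence (6)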
Armed with the above knowledge, we return to our original problem, in
which our program should establish
R: z = Z
Not unlike our treatment of earlier problems, we introduce two variables
x and y and could try the invariant relation
P: z + (x + y) = Z
which has the pleasant property that it is trivially satisfied by
P0: z = ∅ and x = X and y = Y
while, together with (x + y) = ∅, it implies R.
Our theorem now suggests to identify x with x2, y with y2, and z with
z1 (the asymmetry reflecting that X and Y are known sets, while Z has to
be computed). After this identification, (2) through (5) define all sets in terms
of x and y. If we now synchronize the shrinking of x and y in such a way as
to keep the right-hand side of (6) or
z * x = ∅ and z * y = ∅                                  (7')
invariant as well, then we know that
Z = X + Y = z1 ⊕ z2
Extending z (= z1) with an element e implies, because z1 ⊕ z2 is a parti-
tioning of the constant set Z, taking away that element e from x + y (= z2).
In order that such an element exists, the union should not be empty, i.e.
(x + y) ≠ ∅ or, equivalently, "x ≠ ∅ or y ≠ ∅"; the element e should
either be a member of x but not of y, or a member of y but not of x, or a
member of both x and y, and it should be taken away either from x or from
y or from both respectively. The program structure we are considering is:
x, y, z := X, Y, ∅;
do x ≠ ∅ or y ≠ ∅ --> "transfer an element from (x + y) to z" od
We now assume (the elements of) the set x to be represented by the values
ax(i) with ax.lob ≤ i < ax.hib and (the elements of) the set y by the values
ay(i) with ay.lob ≤ i < ay.hib, where ax and ay are monotonically increasing
integer functions with ax.high = ay.high = inf. The advantage of that addi-
tional value inf is that, even if x or y is empty, ax.low and ay.low are still
defined. The advantage of the monotonically increasing order is that for
a nonempty set x, ax.low equals its minimum element, and similarly for y.
As a result, if not both sets are empty, there is one element from the union,
for which it is very easy to determine whether it belongs to x, to y, or to
both, viz.
min(ax.low, ay.low)
If, for instance ax.low < ay.low, the element equals ax.low and occurs in
x but cannot occur in y because all y's elements are larger than it; its removal
from x is then duly represented by ax:lorem, leaving x's remaining elements
still represented by a monotonically increasing function.
If (the elements of) the set z must be represented according to the same
convention, we choose to represent it by all values az(i) with az.lob ≤ i <
az.hib; the last value inf can then be added only at the end of the computa-
tion. The resulting function az will be monotonically increasing if its value
is only changed by az:hiext(K), such that initially either az.dom = 0 or
az.dom > 0 and az.high < K. Our program will satisfy those constraints.
Assuming the array variables ax and ay properly initialized (i.e. with
ax.high = ay.high = inf) and az with az.dom = 0, the following two state-
ments would perform the desired transformation:

do ax.low ≠ inf or ay.low ≠ inf --> {min(ax.low, ay.low) < inf}
    if ax.low < ay.low --> az:hiext(ax.low); ax:lorem
    □ ax.low > ay.low --> az:hiext(ay.low); ay:lorem
    □ ax.low = ay.low --> az:hiext(ax.low); ax:lorem; ay:lorem
    fi {az.high < min(ax.low, ay.low)}
od {az.dom = 0 cor (az.dom > 0 and az.high < inf)};
az:hiext(inf)
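
In Python the same two statements might read as follows, a sketch under
the same representation conventions, with INF as the end marker:

    INF = float("inf")

    def set_union(ax, ay):
        # Merge two strictly increasing lists, each closed by INF,
        # into their set union, again closed by INF.
        az, i, j = [], 0, 0
        while ax[i] != INF or ay[j] != INF:
            if ax[i] < ay[j]:
                az.append(ax[i]); i += 1    # az:hiext(ax.low); ax:lorem
            elif ax[i] > ay[j]:
                az.append(ay[j]); j += 1
            else:                           # the element occurs in both sets
                az.append(ax[i]); i += 1; j += 1
        az.append(INF)
        return az

    assert set_union([1, 4, 9, INF], [2, 4, INF]) == [1, 2, 4, 9, INF]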

EXERCISES

1. Modify this program such that it will establish u = U as well, where U is given
   by U = X * Y.
2. Make a similar program, such that it will establish z = Z, where Z is given by
   Z = W + X + Y.
3. Make a similar program, such that it will establish z = Z, where Z is given by
   Z = W + (X * Y).
4. Make a program establishing z = X + Y, but without assuming (nor intro-
   ducing!) the value "inf" marking the high ends of the domains; empty sets may
   be detected by ax.dom = 0 and ay.dom = 0 respectively. (End of exercises.)

At the expense of still some more formal machinery we could have played
our formal game in extenso.
Let, for any predicate P(z), the semantics of "z := x + y" be given by

   wp("z := x + y", P(z)) = (x * y = 0 and P(x + y))

where the first term expresses that the intersection of x and y should be empty
for x + y to be defined.
Let, for any predicate P(z), the semantics of "z := x - y" be given by

   wp("z := x - y", P(z)) = (x * y = y cand P(x - y))

where the first term expresses that y should be fully contained in x for x - y
to be defined; x - y then represents the unique solution of (x - y) + y = x.
Eliminating x1, x2, y1, y2, z1, and z2, we find that we have to maintain
in terms of x, y, and z the relations:

   P1:  X * x = x
   P2:  Y * y = y
   P3:  z = (X - x) + (Y - y)
   P4:  x * (Y - y) = 0
   P5:  y * (X - x) = 0
and we can ask ourselves under what initial circumstances the execution of

   S:  z, x := z + {e}, x - {e}

will leave these relations invariant for some element e. We begin by investigat-
ing when this concurrent assignment is defined, i.e.

   wp(S, T) = (z * {e} = 0 and x * {e} = {e})

Because

   (P1 and x * {e} = {e}) ⇒ (X - x) * {e} = 0
   (P2 and P4 and x * {e} = {e}) ⇒ (Y - y) * {e} = 0

we have Q ⇒ wp(S, T) with

   Q = (P1 and P2 and P3 and P4 and x * {e} = {e})

It is now not too difficult to establish

   Q ⇒ wp(S, P1 and P2 and P3 and P4)

However:

   wp(S, P5) = (wp(S, T) and y * (X - (x - {e})) = 0)
             = (wp(S, T) and y * ((X - x) + {e}) = 0)
             = (wp(S, T) and P5 and y * {e} = 0)

and consequently, the guard for S such that P1 through P5 remain invariant
is

   x * {e} = {e} and y * {e} = 0
i.e. e should be an element of x, but not of y, et cetera.
The above concludes my revisiting of merging problems. In the last two
chapters I have given treatments of different degrees of formality; which
one of them my reader will prefer will depend as much on his needs as on
his mood. But it seems instructive to go through the motions at least once!
(As the result in all probability shows, the writing of this chapter created
considerably more difficulties than anticipated. It is at least the fifth version;
that, in itself, already justifies its inclusion.)
17  AN EXERCISE ATTRIBUTED TO R.W. HAMMING

The way the problem reached me was: "To generate in increasing order
the sequence 1, 2, 3, 4, 5, 6, 8, 9, 10, 12, ... of all numbers divisible by no
primes other than 2, 3, or 5." Another way of stating which values are in the
sequence is by means of three axioms:

Axiom 1. The value 1 is in the sequence.


Axiom 2. If x is in the sequence, so are 2 * x, 3 * x, and 5 * x.
Axiom 3. The sequence contains no other values than those that
belong to it on account of Axioms 1 and 2.

(We leave to the number theorists the task of establishing the equivalence of
the two above definitions.)
We include this exercise because its structure is quite typical for a large
class of problems. Being interested only in terminating programs, we shall
make a program generating only the, say, first 1000 values of the sequence.
Let

P0(n, q) mean: the value of "q" represents the ordered set of the first "n"
values of the sequence.

Then Axiom 1 tells us that 1 is in the sequence and, as 2 * x, 3 * x, and 5 * x
are functions whose value is > x for x > 0, Axiom 2 tells us that 1 is the
minimum value whose membership of the sequence can be established on
account of the first two axioms. Axiom 3 then tells us that 1 is the minimum
value occurring in the sequence and therefore P0(n, q) is easily established for
n = 1: "q" then contains the value 1 only. The obvious program structure is:

"establish PO(n, q) for n = J";


don =F 1000-.
"increase n by 1 under invariance of PO(n, q)"
od

Under the assumption that we can extend a sequence with a value "xnext",
provided that the value "xnext" is known, the main problem of "increase n
by 1 under invariance of P0(n, q)" is how to determine the value "xnext".
Because the value 1 is already in q, xnext > 1, and xnext's membership of
the sequence must therefore rely on Axiom 2. Calling the maximum value
occurring in q "q.high", xnext is the minimum value > q.high, that is, of the
form 2 * x or 3 * x or 5 * x such that x occurs in the sequence. But because
2 * x, 3 * x, and 5 * x are all functions whose value is > x for x > 0, that
value of x must satisfy x < xnext; furthermore, x cannot satisfy x > q.high,
for then we would have
q.high < x < xnext
which would contradict that xnext is the minimum value > q.high. Therefore
we have x < q.high, i.e. x must already occur in q, and we can sharpen our
definition of xnext: xnext is the minimum value > q.high, that is of the form
2 * x or 3 * x or 5 * x, such that x occurs in q. (It is for the sake of the above
analysis that we have initialized P0(n, q) for n = 1; initialization for n = 0
would have been just as easy, but then q.high would not be defined.)
A straightforward implementation of the above analysis would lead to
the introduction of the set qq, where qq consists of all values xx > q.high,
such that xx can be written as
xx= 2 * x, with x in q,
or as
xx= 3 * x, with x in q,
or as
xx= 5 * x, with x in q
The set qq is nonempty and xnext would be the minimum value occurring
in it. But upon closer inspection, this is not too attractive, because the adjust-
ment of qq would imply (in the notation of the previous chapter)
   qq := (qq - {xnext}) + {2 * xnext, 3 * xnext, 5 * xnext}
where the "+" means "forming the union of two sets". Because we have to
determine the minimum value occurring in qq, it would be nice to have the
elements of q ordered; forming the union in the above adjustment would
then require an amount of reshuffling, which we would like to avoid.
A few moments of reflection, however, will suffice for the discovery that
we do not need to keep track of the whole set qq, but can select xnext as the
minimum value occurring in the much smaller set

   qqq = {x2} + {x3} + {x5},

where

   x2 is the minimum value > q.high, such that x2 = 2 * x
      and x occurs in q,
   x3 is the minimum value > q.high, such that x3 = 3 * x
      and x occurs in q, and
   x5 is the minimum value > q.high, such that x5 = 5 * x
      and x occurs in q.

The above relation between q, x2, x3, and x5 is denoted by P1(q, x2, x3, x5).

A next sketch for our program is therefore:

"establish PO(n, q) for n = I";


don :::/= 1000 -->

"establish Pl(q, x2, x3, x5) for the current value of q";
"increase n by I under invariance of PO(n, q), i.e.
extend q with min(x2, x3, x5)"
od

A program along the above lines would be correct, but now "establish
P1(q, x2, x3, x5) for the current value of q" would be the nasty operation,
even if -what we assume- the elements of the ordered set q are as accessible
as we desire. The answer to this is a standard one: instead of computing x2,
x3, and x5 as a function of q afresh when we need them, we realize that the
value of q only changes "slowly" and try to "adjust" the values, which are a
function of q, whenever q changes. This is such a standard technique that it
is good to have a name for it; let us call it "taking the relation outside (the
repetitive construct)". Its application is reflected in the program of the follow-
ing structure:

"establish PO(n, q) for n = I";


"establish PJ(q, x2, x3, x5) for the current value of q";
do n :::/= 1000 -->
"increase n by I under invariance of PO(n, q), i.e.
extend q with min(x2, x3, x5)";
"re-establish Pl(q, x2, x3, x5) for the new value of q"
od

The re-establishment of P1(q, x2, x3, x5) has to take place after extension
of q, i.e. after increase of q.high; as a result, the adjustment of x2, x3, and x5
is either the empty operation, or an increase, viz. a replacement by the corre-
sponding multiple of a higher x from q. Representing the ordered set q by
means of an array aq, i.e. as the values aq(1) through aq(n) in monotonically
increasing order, we introduce three indices i2, i3, and i5, and extend P1 with

   ... and x2 = 2 * aq(i2) and x3 = 3 * aq(i3) and x5 = 5 * aq(i5)
Our inner block, initializing the global array variable aq with the desired
final value could be:

   begin virvar aq; privar i2, i3, i5, x2, x3, x5;
      aq vir int array := (1, 1); {P0 established}
      i2 vir int, i3 vir int, i5 vir int := 1, 1, 1;
      x2 vir int, x3 vir int, x5 vir int := 2, 3, 5; {P1 established}
      do aq.dom ≠ 1000 -->
         if x3 ≥ x2 ≤ x5 --> aq:hiext(x2)
         [] x2 ≥ x3 ≤ x5 --> aq:hiext(x3)
         [] x2 ≥ x5 ≤ x3 --> aq:hiext(x5)
         fi {aq.dom has been increased by 1 under invariance of P0};
         do x2 ≤ aq.high --> i2 := i2 + 1; x2 := 2 * aq(i2) od;
         do x3 ≤ aq.high --> i3 := i3 + 1; x3 := 3 * aq(i3) od;
         do x5 ≤ aq.high --> i5 := i5 + 1; x5 := 5 * aq(i5) od
         {P1 has been re-established}
      od
   end
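For readers who wish to run the algorithm, the following Python transcription
is mine (illustrative names; a 0-origin list replaces the 1-origin array aq,
so that i2, i3, i5 start at 0 and aq.high becomes aq[-1]):

   def hamming(n=1000):
       aq = [1]                    # P0: the first len(aq) values, in order
       i2 = i3 = i5 = 0
       x2, x3, x5 = 2, 3, 5        # P1: least multiples exceeding aq[-1]
       while len(aq) != n:
           aq.append(min(x2, x3, x5))   # increase n under invariance of P0
           while x2 <= aq[-1]:          # each of these three loops is
               i2 += 1; x2 = 2 * aq[i2] # executed at most once per round
           while x3 <= aq[-1]:
               i3 += 1; x3 = 3 * aq[i3]
           while x5 <= aq[-1]:
               i5 += 1; x5 = 5 * aq[i5]
       return aq

hamming(10) indeed returns [1, 2, 3, 4, 5, 6, 8, 9, 10, 12].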

In the above version it is clearly expressed that after re-establishing P1
we have x2 > aq.high and x3 > aq.high and x5 > aq.high. Apart from that
we could have used "... = aq.high" instead of "... ≤ aq.high" as well.
Note 1. In the last three inner repetitive constructs each guarded state-
ment list is selected for execution at most once. Therefore, we could have
coded them

   if x2 = aq.high --> i2 := i2 + 1; x2 := 2 * aq(i2)
   [] x2 > aq.high --> skip
   fi; etc.
When I start to think about this choice, I come out with a marked pre-
ference for the repetitive constructs, for what is so particular about the
fact that a repetition terminates after zero or one execution as to justify
expression by syntactic means? Very little, I am afraid. Any hesitation to
recognize "zero or one times" as a special instance of "at most k times"
is probably due to our linguistic inheritance, as all Western languages
distinguish between singular and plural forms. (If we had been classical
Greeks (i.e. used to thinking in terms of a dual form as well) we might
have felt obliged to introduce in addition special syntactical gear for

expressing termination after at most two executions!) To end in "Updat-
ing a sequential file" with

   do xx.norm --> newfile:hiext(xx); xx:setabnorm od

instead of with

   if xx.norm --> newfile:hiext(xx) [] non xx.norm --> skip fi

would, in a sense, have been more "honest", for the output obligation as
expressed by xx.norm has been met. (End of note 1.)
Note 2. The last three inner repetitive constructs could have been com-
bined into a single one:

   do x2 ≤ aq.high --> i2 := i2 + 1; x2 := 2 * aq(i2)
   [] x3 ≤ aq.high --> i3 := i3 + 1; x3 := 3 * aq(i3)
   [] x5 ≤ aq.high --> i5 := i5 + 1; x5 := 5 * aq(i5)
   od

I prefer, however, not to do so, and not to combine the guarded com-
mands into a single set when the execution of one guarded statement list
cannot influence the truth of other guards from the set. The fact that the
three repetitive constructs, separated by semicolons, now appear in an
arbitrary order does not worry me: it is the usual form of over-specification
that we always encounter in sequential programs prescribing things
in succession that could take place concurrently. (End of note 2.)

The exercise solved in this chapter is a specific instance of a more general
problem, viz. to generate the first N values of the sequence given axiomatically
by

Axiom 1. The value 1 is in the sequence.


Axiom 2. If x is in the sequence, so are f(x), g(x), and h(x), where
f, g, and h are monotonically increasing functions with the
property f(x) > x, g(x) > x, and h(x) > x.
Axiom 3. The sequence contains no other values than those that
belong to it on account of Axioms 1 and 2.

Note that if nothing about the functions f, g, and h were given, the prob-
lem could not be solved!

EXERCISES

1. Solve the problem if Axiom 2 is replaced by:


Axiom 2. If x is in the sequence, so are f(x) and g(x), where f and g have
the property f(x) > x and g(x) > x.

2. Solve the problem if Axiom 2 is replaced by:


Axiom 2. If x and y are in the sequence, so is f(x, y), where f has the
properties

   1. f(x, y) > x
   2. (y1 > y2) ⇒ (f(x, y1) > f(x, y2))
(End of exercises.)

The inventive reader who has done the above exercises successfully can
think of further variations himself.
18  THE PATTERN MATCHING PROBLEM

The problem that is solved in this chapter is a very famous one and has
been tackled independently by many programmers. Yet we hope that our
treatment gives some pleasure to even those of my readers who considered
themselves thoroughly familiar with the problem and its various solutions.
We consider as given two sequences of values

   p(0), p(1), ... , p(N - 1) with N ≥ 1

and

   x(0), x(1), ... , x(M - 1) with M ≥ 0

(usually M is regarded as being many times larger than N). The question to
be answered is: how many times does the "pattern", as given by the first
sequence, occur in the second sequence?
Using

   (N i: 0 ≤ i < m: B(i))

to denote "the number of different values of i in the range 0 ≤ i < m for
which B(i) holds", a more precise description of the final relation R that is
to be established is

   R:  count = (N i: 0 ≤ i ≤ M - N: match(i))

where the function match(i) is given by

   for 0 ≤ i ≤ M - N:  match(i) = (A j: 0 ≤ j < N: p(j) = x(i + j))
   for i < 0 or i > M - N:  match(i) = false

(To define match(i) = false for those further values of i, thus making it a
total function, is a matter of convenience.)
If we take as invariant relation

   P1:  count = (N i: 0 ≤ i < r: match(i)) and r ≥ 0


we have one which is trivially established by "count, r := 0, 0" and, further-
more, is such that

   (P1 and r > M - N) ⇒ R

(The "matter of convenience" referred to above is that now the above inequal-
ity will do the job.) This gives a sketch for the program:

   count, r := 0, 0;
   do r ≤ M - N --> "increase r under invariance of P1" od

and the reader is invited to work out for himself the refinement in which r
is always increased by 1; in the worst case, the time taken by the execution
of that program will be proportional to M * N.
Depending on the pattern, however, much larger increases of r seem
sometimes possible: if, for instance, the pattern is (1, 2, 3, 4, 5) and match(r)
has been found to hold, "count, r := count + 1, r + 5" would leave P1 invari-
ant! Considering the invariant relation

   P2:  (A j: 0 ≤ j < k: p(j) = x(r + j)) and 0 ≤ k ≤ N

(which can be expected to play a role in the repetitive construct computing
match(r)), we can investigate what we can gain by taking that relation outside
the repetitive construct, i.e. we consider:

   count, r, k := 0, 0, 0;
   do r ≤ M - N --> "increase r under invariance of P1 and P2" od

(relation P2 being vacuously satisfied by k = 0).
In view of the validity of relation P2 and the formula for match(r), the
most natural thing to start the repeatable statement with is to try to determine
match(r); as the truth of match(r) can be concluded from P2 and k = N, we
prescribe that k be increased as long as is necessary and possible:

   do k ≠ N cand p(k) = x(r + k) --> k := k + 1 od     (1)

upon termination of which -and termination is guaranteed- we have

   P2 and (k = N cor p(k) ≠ x(r + k))

from which we can conclude that match(r) = (k = N). Thus it is known
whether increasing r by 1 should be accompanied by "count := count + 1"
or not. We would like to know by how much r can be increased without
further increase of count and without taking any further x-values into account.
(The taking into account of x-values is done in statement (1); to do so is its
specific purpose! Here we are willing to exploit only properties of the -
constant- pattern.)
If k = 0, we conclude (because N > 0) that match(r) = false; the relation
P1 then justifies an increase of r by 1 (leaving P1 invariant by leaving count
unchanged) but P2 does not justify any higher increase of r, and k = 0
(making P2 vacuously true) is maintained.

For general k, however, there is the following argument. Define for
0 ≤ i ≤ k ≤ N the boolean function

   dif(i, k) = (E j: 0 ≤ j < k - i: p(j) ≠ p(i + j))

From this it follows that dif(k, k) = false. If, however, dif(i, k) = true, we
conclude -because 0 ≤ i + j < k- on account of the truth of P2

   (E j: 0 ≤ j < k - i: p(j) ≠ x(r + i + j))

that is, dif(i, k) ⇒ non match(r + i). Therefore, the variable "count" needs
no further adjustments (besides the one on account of the value of match(r))
when r is increased by d(k), where d(k) is the minimum solution for i with
0 < i ≤ k of the equation dif(i, k) = false, or

   (A j: 0 ≤ j < k - i: p(j) = p(i + j))     (2)

The fact that d(k) is a solution of (2) implies

   (A j: 0 ≤ j < k - d(k): p(j) = p(d(k) + j))

which, with P2, amounts to

   (A j: 0 ≤ j < k - d(k): p(j) = x(r + d(k) + j))

and as a result (besides the adjustment of "count" as implied by the value of
match(r)) both P1 and P2 are kept invariant by "r, k := r + d(k), k - d(k)".
Because the minimum solution of (2) depends on k and p only, we find:

   begin glocon p, N, x, M; virvar count; privar r, k; pricon d;
      "initialize d";
      count vir int, r vir int, k vir int := 0, 0, 0;
      do r ≤ M - N -->
         do k ≠ N cand p(k) = x(r + k) --> k := k + 1 od;
         if k = N --> count := count + 1; r, k := r + d(k), k - d(k)
         [] 0 < k < N --> r, k := r + d(k), k - d(k)
         [] k = 0 --> r := r + 1
         fi
      od
   end
The only job left is the initialization of the array variable d, i.e. to establish
for each k satisfying 1 ≤ k ≤ N the minimum solution for i of (2). The
Linear Search Theorem tells us that we should try i-values in increasing order.
It pays, however, to realize that this minimum value for i has to be determined
for a whole sequence of k-values. Let k1 > k2 and let d(k1) be the minimum
solution for i of (2) with k = k1. From

   (A j: 0 ≤ j < k1 - d(k1): p(j) = p(d(k1) + j)) and k1 > k2

follows:

   (A j: 0 ≤ j < k2 - d(k1): p(j) = p(d(k1) + j))

i.e. for k = k2, d(k1) is also a solution for i of (2), but not necessarily the
smallest! From that we conclude that d(k) is a monotonically nondecreasing
function of k. And the algorithm therefore investigates increasing values of
i, each time deciding whether for one or more k-values i = d(k) can be con-
cluded (should be established). More precisely, let j(i) for given value of i
be the maximum value ≤ N - i, such that

   (A j: 0 ≤ j < j(i): p(j) = p(i + j))

then d(k) = i for all k such that k - i ≤ j(i) (or k ≤ j(i) + i), for which no
solution d(k) < i exists. As the values of i will be tried in increasing order
and, upon identification as minimal solution, will be recorded in the mono-
tonically nondecreasing function d, the condition is

   d.hib < k ≤ j(i) + i
and we get the following program:

"initialize d":
begin glocon p, N; virvar d; privar i;
dvir int array, i vir int:=(/), O;
do d.hib =I= N ____.
begin glocon p, N; glovar d, i; privar j;
j vir int:= 0; i: = i + I;
do j < N - i cand p(j) = p(i + j) ____. j: = j + I od;
do d.hib < j + i ____. d:hiext(i) od
end
od
end
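Taken together, the counting program and the initialization of d translate
into the Python sketch below (a transcription of mine, with 0-origin
sequences, so that the book's d(k) lives at index k - 1; the three guarded
alternatives are folded into an if/else of the same effect):

   def count_matches(p, x):
       N, M = len(p), len(x)
       # initialize d: d[k-1] is the minimum i of (2) for 1 <= k <= N
       d = []
       i = 0
       while len(d) != N:
           i += 1
           j = 0
           while j < N - i and p[j] == p[i + j]:
               j += 1
           while len(d) < j + i:       # d.hib < j + i --> d:hiext(i)
               d.append(i)
       count = r = k = 0
       while r <= M - N:
           while k != N and p[k] == x[r + k]:
               k += 1
           if k == 0:
               r += 1
           else:                       # shift by d(k), keeping P1 and P2
               if k == N:
                   count += 1
               r, k = r + d[k - 1], k - d[k - 1]
       return count

For example, count_matches([1, 2], [1, 2, 1, 2, 1]) returns 2, and no x-value
is inspected more than a bounded number of times.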

EXERCISES

1. Give a formal correctness proof for the above initialization.


2. With "r, k: = r + d(k), r - d(k)" for 0 < k, our algorithm adjusts r and k
without changing r + k. Investigate the slight gain that is possible for 0 < k
< N if it is known that the x-values are two-valued. (End of exercises.)

Remark. Our final algorithm is one whose execution time I consider to
grow proportional to M + N. Once one has set his goal to find, if possible,
an algorithm with such a performance, its actual development does not seem
to require much more than the usual care; the crucial point seems the refusal
to be satisfied (without further investigation) with the obvious M * N-algo-
rithm, the development of which I have left as an exercise to the reader. A
slight reformulation of the problem, however, enables us to recognize also
here a general design principle, which might be called the Search for the

Small Superset. Suppose that we had not been asked to count the number
of matches, but to generate the sequence of r-values for which match(r) holds.
When a program has to generate the members of a set A, there are
(roughly) only two situations. Either we have a simple, straightforward "suc-
cessor function" by means of which a next member of A can be generated
-and then the whole set can be trivially generated by means of repeated
application of that successor function- or we do not have a function like
that. In the latter case, the usual technique is to generate the members of a
set B instead, where:

1. Each member of A is a member of B as well.
2. There exists a generator for successive members of B.
3. There exists a test whether a member of B belongs to A as well.
The algorithm then generates and inspects all members of B in turn.

If this technique is to lead to a satisfactory performance, three conditions
should be satisfied:

1. The members of set B should be reasonably efficient to generate.
2. The test whether an element of B belongs to A as well should be reason-
ably efficient (particularly in the case that it does not, for, usually, B is
an order of magnitude larger than A).
3. Set B should not be unnecessarily large.

The trained problem solver, aware of the above, will consciously look
for a smaller set B than the obvious one. In this example, the set of all r-values
satisfying 0 ≤ r ≤ M - N is the obvious one. Note that in the previous
chapter "An Exercise Attributed to R. W. Hamming" the replacement of
the set "qq" by the much smaller set "qqq" was another application of the
principle of the Search for the Small Superset. And besides "taking a relation
outside the repetitive construct" this illustrates the second strategical similar-
ity between the solutions presented in the current and in the previous chapter.
(End of remark.)
19  WRITING A NUMBER AS THE SUM OF TWO SQUARES

Suppose we are requested to design a program that will generate for any
given r > 0 all the essentially different ways in which r can be written as the
sum of two squares; more precisely, it has to generate all pairs (x, y), such
that

   x² + y² = r and x ≥ y ≥ 0     (1)

The answer will be delivered in two array variables xv and yv, such that
for i from xv.lob (= yv.lob) through xv.hib (= yv.hib) the pairs (xv(i), yv(i))
will enumerate all solutions of (1). The standard way of ensuring that our
sequential algorithm will find all solutions to (1) is to order the solutions of
(1) in some way, and I propose to order the solutions of (1) in the order of
increasing value of x (no two different solutions having the same x-value, this
ordering is unique). We propose to keep the following relation invariant

   P1:  xv(i) will be a monotonically increasing function with the same domain
        as the monotonically decreasing function yv(i), such that the pairs
        (xv(i), yv(i)) are all solutions of (1) with xv(i) < x

P1 is easily established by initializing both xv and yv with an empty
domain and choosing x not too large. If the pair (xv(i), yv(i)) is a solution of
(1), we shall always have 2 * xv(i)² ≥ xv(i)² + yv(i)² = r, and, therefore,
because xv(i) < x, the smallest value x ≥ 0 such that 2 * x² ≥ r is not too
large. This smallest value for x can be established by using the Linear Search
Theorem. However, because each xv(i) will satisfy xv(i)² ≤ r, we know that
P1 and x² > r implies that all solutions have been recorded.
Our first sketch can therefore be:


   begin glocon r; virvar xv, yv; privar x;
      x vir int := 0; do 2 * x² < r --> x := x + 1 od;
      xv vir int array, yv vir int array := (1), (1);
      do x² ≤ r --> "increase x under invariance of P1" od
   end

From this program we conclude that the invariant relation is really the
stronger relation

   P1':  P1 and 2 * x² ≥ r

It is too much to hope to determine for each value of x the value y, such
that x² + y² = r, for such a value need not exist. What we can do is establish

   x² + y² ≤ r and x² + (y + 1)² > r

From that relation we can conclude not only that if x² + y² = r, a solution
of (1) has been found, but also that if x² + y² < r, for that value of x no
value y exists that would complete the pair. Taking the relation

   P2:  x² + (y + 1)² > r

as invariant relation for an inner repetitive construct, we can program

"increase x under invariance of Pl'":


begin glocon r; glovar xv, yv, x; privar y;
y vir int:= x; {on account of Pl', P2 has been established}
do x 2 + y 2 > r--> y:= y - 1 od; {x 2 + y 2 <rand P2}
if x 2 + y 2 = r --> xv :hiext(x); yv :hiext(y); x: = x + 1
nx 2 + y 2 < r--> x:= x + 1
fi
end

Observing, however, that the last alternative construct will not destroy the
validity of P2, we can improve the efficiency of this program considerably by
taking the relation P2 outside the outer repetitive construct:

   begin glocon r; virvar xv, yv; privar x, y;
      x vir int, y vir int := 0, 0;
      do x² + y² < r --> x, y := x + 1, y + 1 od;
      xv vir int array, yv vir int array := (1), (1);
      do x² ≤ r -->
         do x² + y² > r --> y := y - 1 od;
         if x² + y² = r --> xv:hiext(x); yv:hiext(y); x := x + 1
         [] x² + y² < r --> x := x + 1
         fi
      od
   end
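As a quick check, a Python transcription of mine of this final program:

   def two_squares(r):
       # all pairs (x, y) with x*x + y*y = r and x >= y >= 0,
       # generated in order of increasing x
       xv, yv = [], []
       x = y = 0
       while x * x + y * y < r:      # establish P1' and P2 with y = x
           x += 1; y += 1
       while x * x <= r:
           while x * x + y * y > r:  # re-establish x^2 + y^2 <= r
               y -= 1
           if x * x + y * y == r:
               xv.append(x); yv.append(y)
           x += 1
       return list(zip(xv, yv))

two_squares(25), for instance, yields [(4, 3), (5, 0)].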

The latter improvement is the outcome of a Search for a Small Superset,
viz. for y; it has been implemented by taking a relation outside a repetitive
construct, viz. relation P2.

Note. Obvious improvements, such as testing whether r mod 4 = 3, and
exploiting the recurrence relation (x + 1)² = x² + (2 * x + 1), are left as
exercises. (End of note.)

Remark. The above program, which is due to W. H. J. Feijen, is dis-
tinctly superior to the program as we wrote it a few years ago, when, for
instance, the demonstration that no solutions had been missed always re-
quired a drawing. (End of remark.)
20  THE PROBLEM OF THE SMALLEST PRIME FACTOR OF A LARGE NUMBER

In this chapter we shall tackle the problem of finding the smallest prime
factor of a large number N > 1 (by "large" I mean here a number of the
order of magnitude of, say, 10¹⁶), under the assumption that the program is
intended for a small machine whose additive operations and comparisons are
assumed to be very fast compared with arbitrary multiplications and divi-
sions. (Nowadays, these assumptions are realistic for most so-called "mini-
computers"; the algorithm to be described was developed years ago for what,
in spite of its physical size, would now be called "a micro-computer".)
A straightforward application of the Linear Search Theorem tells us that,
when looking for the smallest prime factor of N, we should investigate prime
numbers as possible factors in increasing order of magnitude. Because a
divisible number has at least one prime factor not exceeding its square root,
the investigation need not go beyond the square root; if then still no factor
has been found, the number N must be prime. An algorithm of the following
structure would do the job:

   begin glocon N; virvar p; privar f;
      f vir int := 2;
      do N mod f ≠ 0 and f² < N -->
         "increase f to the next prime number"
      od;
      if N mod f ≠ 0 --> p vir int := N
      [] N mod f = 0 --> p vir int := f
      fi
   end
This algorithm, however, is "begging the question", for how do we intend
to increase f to the next prime number? We have assumed a small machine
and that is supposed to exclude storing a table of successive primes up to 10⁸.
(In a straightforward technique, that would require 5 * 10⁷ bits and that is
not what is called -not even today!- "a small memory".) Instead of deter-
mining the next prime number by looking it up in a stored table, it could be
computed; but the usual way to do that is, in principle, to investigate the
sequence

   f + 1, f + 2, f + 3, ...

until the first prime is found. But the investigation of whether a number is
a prime is usually reduced to the question of whether it equals its smallest
prime factor!
There is an absolutely unsophisticated way out of this dilemma. For
N > 1, the smallest prime factor of N is also the smallest natural number
≥ 2 dividing N. This property gives us a method for finding the smallest
prime factor of N without referring to the concept "prime number" any more:

   begin glocon N; virvar p; privar f;
      f vir int := 2;
      do N mod f ≠ 0 and (f + 1)² ≤ N --> f := f + 1 od;
      if N mod f ≠ 0 --> p vir int := N
      [] N mod f = 0 --> p vir int := f
      fi
   end

The main trouble with this algorithm is that we have only assumed that
additive operations and comparisons would be fast, but have allowed the
computations of N mod f and of (f + 1)² to be so slow as to be avoided in the
inner cycle, if possible.
The only way out seems to find some way of applying the technique of
"taking a relation outside the repetitive construct", i.e. seeking to store and
maintain such information that after the computation of r = N mod f, the
computation of the next value of r (for f + 1) can profit from it. What can
we store?
We can start with the observation that r = N mod f is the solution of the
equation

   N = f * q + r and 0 ≤ r < f

and we could store q as well. Then we know that

   N = (f + 1) * q + (r - q)

and, in general, we can expect to have "gained" in the sense that (r - q) will
be closer to zero than the original N and, therefore, "easier" to reduce modulo
(f + 1). But, particularly for smaller values of f (and r) and -therefore-
larger values of q, we cannot expect to have gained very much.

As far as that new value of r is concerned, viz. (r - q) mod (f + 1), we
are, however, not interested in q itself at all! Any smaller value, congruent
to q modulo (f + 1), would be equally welcome and decreasing r by it would
have disturbed it less. In other words: we would prefer to decrease r not by
q but, say, by q mod (f + 1). So why not store that? Repeating the argument,
we are led to write down the equations:

   N    = f * q(0) + r(0)
   q(0) = (f + 1) * q(1) + r(1)
   q(1) = (f + 2) * q(2) + r(2)
   ...
   q(n-1) = (f + n) * q(n) + r(n)
   q(n) = 0

with 0 ≤ r(i) < f + i for 0 ≤ i ≤ n.
Eliminating the q's we get:

   N = r(0) +
       f * r(1) +
       f * (f + 1) * r(2) +
       f * (f + 1) * (f + 2) * r(3) +
       ...
       f * (f + 1) * (f + 2) * ... * (f + n - 1) * r(n)     (1)


which clearly shows how N is fully determined by f and the finite sequence of
r's.
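For instance -a worked instance of my own- with f = 2 and N = 25 the
successive divisions read

   25 = 2 * 12 + 1,  12 = 3 * 4 + 0,  4 = 4 * 1 + 0,  1 = 5 * 0 + 1

so that (r(0), r(1), r(2), r(3)) = (1, 0, 0, 1) and, indeed,
25 = 1 + 2 * 0 + 2 * 3 * 0 + 2 * 3 * 4 * 1.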
Replacing f by f + 1 and compensating the increase in each line by a
decrease of the term immediately above it, we get for N the alternative repre-
sentation:

   N = (r(0) - r(1)) +
       (f + 1) * (r(1) - 2 * r(2)) +
       (f + 1) * (f + 2) * (r(2) - 3 * r(3)) +
       (f + 1) * (f + 2) * (f + 3) * (r(3) - 4 * r(4)) +
       ...
The above transformation is effected by the program

   f := f + 1; i := 0;
   do i < n --> r(i) := r(i) - (i + 1) * r(i + 1); i := i + 1 od

but for the fact that it would not do the job as far as the inequalities

   0 ≤ r(i) < f + i

are concerned: r's could become negative. But this is easily remedied, because
(1) shows that an increase r(0) := r(0) + f can be compensated by a decrease
r(1) := r(1) - 1. In general: r(i) := r(i) + (f + i) is compensated by
r(i + 1) := r(i + 1) - 1. As a result, the complete transformation is correctly
described by

   f := f + 1; i := 0;
   do i < n -->
      r(i) := r(i) - (i + 1) * r(i + 1);
      do r(i) < 0 -->
         r(i) := r(i) + (f + i); r(i + 1) := r(i + 1) - 1
      od;
      i := i + 1
   od;
   do r(n) = 0 --> n := n - 1 od
Under the assumption that multiplication by small integers presents no
serious problems -that could be done by repeated addition- the computa-
tion of successive values of N mod f has been reduced to the repertoire of
admissible operations. Furthermore, the test (f² < N in our earliest version,
(f + 1)² ≤ N in our next version) whether it is still worthwhile to proceed or
that the square root has been reached, can be replaced by n > 1, for (n ≤ 1)
⇒ (N < (f + 1)²).
With ar(k) = r(k) for 0 ≤ k ≤ ar.hib, we arrive at the following program:
   begin glocon N; virvar p; privar f, ar;
      begin glocon N; virvar ar; privar x, y;
         ar vir int array := (0); x vir int, y vir int := N, 2;
         do x ≠ 0 --> ar:hiext(x mod y); x, y := x div y, y + 1 od
      end {ar has been initialized};
      f vir int := 2 {relation (1) has been established};
      do ar(0) ≠ 0 and ar.hib > 1 -->
         begin glovar f, ar; privar i;
            f := f + 1; i vir int := 0;
            do i ≠ ar.hib -->
               begin glocon f; glovar ar, i; pricon j;
                  j vir int := i + 1; ar:(i) = ar(i) - j * ar(j);
                  do ar(i) < 0 --> ar:(i) = ar(i) + f + i;
                     ar:(j) = ar(j) - 1
                  od;
                  i := j
               end
            od
         end;
         do ar.high = 0 --> ar:hirem od
      od;
      if ar(0) = 0 --> p vir int := f
      [] ar(0) ≠ 0 --> p vir int := N
      fi
   end
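Since the notation is dense, here is a Python sketch of mine of the same
computation, with ar as a 0-origin list (so ar.hib corresponds to
len(ar) - 1) and ordinary integer arithmetic standing in for the purely
additive operations of the small machine:

   def smallest_prime_factor(N):
       # establish (1): N = ar[0] + f*ar[1] + f*(f+1)*ar[2] + ... with f = 2
       ar = []
       x, y = N, 2
       while x != 0:
           ar.append(x % y)
           x, y = x // y, y + 1
       f = 2
       while ar[0] != 0 and len(ar) - 1 > 1:   # ar(0) != 0 and ar.hib > 1
           f += 1
           for i in range(len(ar) - 1):
               ar[i] -= (i + 1) * ar[i + 1]    # replace f by f + 1 ...
               while ar[i] < 0:                # ... and bring ar[i] back
                   ar[i] += f + i              #     within range
                   ar[i + 1] -= 1
           while ar[-1] == 0:                  # do ar.high = 0 --> ar:hirem od
               ar.pop()
       return f if ar[0] == 0 else N

For example, smallest_prime_factor(10403) returns 101 (10403 = 101 * 103),
and a prime argument is returned unchanged.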
Remark 1. One might think that the algorithm could be speeded up by
a factor of 2 by separately dealing with even values of N and with odd values
of N. For the latter we then could restrict the sequence of f-values to 3, 5, 7,
9, 11, 13, 15, ... ; the analogue of

   r(i) := r(i) - (i + 1) * r(i + 1)

then becomes

   r(i) := r(i) - 2 * (i + 1) * r(i + 1)

and this more violent disturbance will cause the next loop, which has to
bring r(i) within range again, to be repeated on the average about twice as
many times. The change is hardly an improvement and has mainly the effect
of messing up the formulae. (End of remark 1.)

Remark 2. A well-known technique for discovering where to invest one's
energy in optimizing code is executing a program by an implementation that,
upon completion of the computation, will print the complete histogram,
indicating how many times each statement has been executed: it tells where
one should optimize. If that technique is applied to this program, it will
indicate that the innermost loop, bringing ar(i) again within range, absorbs
most of the computation time, particularly when the operations on and func-
tions of the array variable are relatively time-consuming. In this example,
this information is not only trivial, but also misleading in the sense that one
should not decide to start optimizing that inner loop! (Optimizing it, reducing
the number of "subscriptions", is easily done by introducing two scalar vari-
ables, ari and arj, equal to ar(i) and ar(j) respectively, but again it makes the
text longer and more messy.) The point is that the computation is only time-
consuming if N is prime or its smallest prime factor is near its square root.
In that case we shall have for the major part of the computation time ar.hib
= 2 (i.e. as soon as f has passed the cube root of N). The technique is to
replace in the outer loop the guard by

   ar(0) ≠ 0 and ar.hib > 2

and to deal in the final stage with three scalar variables r0, r1, and r2. In the
coding of that final stage one can then invest all one's ingenuity.
(End of remark 2.)

Remark 3. If the outcome of this computation is p = N, that result is
very hard to check. Even in the face of a computer which is not ultra-reliable,
this algorithm gives us the opportunity to increase the confidence level of the
result considerably: upon completion, the relation N = f * r(1) + r(0) should
hold! If somewhere in the arithmetic something has gone wrong, it is highly
improbable that, in spite of that, the last equality eventually holds. From the
point of view of reliability this algorithm, all the time computing the new r's
as a function of all the old ones, is far superior to the one that selects the
next prime number as trial factor from a table: the table may be corrupted
and, besides that, each divisibility test is a completely isolated computation.
It seems worth noticing that this tremendous gain in safety has been made
possible by "taking a relation outside the repetitive construct" and by nothing
else. The algorithm described in this chapter has not only been used to
produce highly reliable factorizations (of the type p = N), it has also been
used to check the reliability of a machine's arithmetic unit.
(End of remark 3.)

Consolation. Those of my readers who found this a difficult chapter will
be pleased to hear that it took even my closest collaborators more than an
hour to digest it: programs can be very compact. (End of consolation.)
21  THE PROBLEM OF THE MOST ISOLATED VILLAGES

We consider n villages (n > 1), numbered from 0 through n - 1; for
0 ≤ i < n and 0 ≤ j < n, a computable function f(i, j) is given, satisfying
for some given positive constant M:

   for i ≠ j:  0 < f(i, j) < M
   for i = j:  f(i, j) = M

For the ith village, its isolation degree "id(i)" is given by

   id(i) = minimum over j ≠ i of f(i, j) = minimum over all j of f(i, j)

(Here f(i, j) can be interpreted as the distance from i to j; the rule f(i, i) =
M has been added for the purpose of the above simplification.)
We are requested to determine the set of maximally isolated villages,
i.e. the set of all values of k such that

   (A h: 0 ≤ h < n: id(h) ≤ id(k))

The program is expected to deliver this set of values as

   miv(miv.lob), ... , miv(miv.hib)

Note that eventually all values 1 ≤ miv.dom ≤ n are possible.
A very simple and straightforward program computes the n isolation
degrees in succession and keeps track of their maximum value found thus
far. On account of the bounds for f(i, j) we can take as the minimum of an
empty set the value M and as the maximum of an empty set 0.


   begin glocon n, M; virvar miv; privar max, i;
      miv vir int array := (0); max vir int, i vir int := 0, 0;
      do i ≠ n -->
         begin glocon n, M; glovar miv, max, i; privar min, j;
            min vir int, j vir int := M, 0;
            do j ≠ n -->
               do f(i, j) < min --> min := f(i, j) od;
               j := j + 1
            od {min = id(i)};
            if max > min --> skip
            [] max = min --> miv:hiext(i)
            [] max < min --> miv := (0, i); max := min
            fi;
            i := i + 1
         end
      od
   end

The above is a very unsophisticated program: in the innermost loop
the value of min is monotonically nonincreasing in time, and the following
alternative construct will react equivalently to any value of min satisfying
min < max. Combining these two observations, we conclude that there is
only a point in continuing the innermost repetition as long as min ≥ max.
We can replace the line "do j ≠ n -->" therefore by

   "do j ≠ n and min ≥ max -->"

and the assertion after the corresponding od by

   {id(i) ≤ min < max or id(i) = min ≥ max}
Let us call the above modification "Optimization 1".
A very different optimization is possible if it is given that

   f(i, j) = f(j, i)

and, because the computation of f is assumed to be time-consuming, it is
requested never to compute f(i, j) for such values of the argument that
f(j, i) has already been computed. Starting from our original program we
can achieve that for each unordered argument pair the corresponding f-value
will only be computed once by initializing j each time with i + 1 instead of
with 0 -only scanning the upper triangle of the symmetric distance matrix,
so to speak. The program is then only guaranteed to compute min correctly
provided that we initialize min, instead of with M, with

   minimum over 0 ≤ h < i of f(i, h)

This can be catered for by introducing an array, b say, such that for k satisfy-
ing i ≤ k < n:

   for i = 0:  b(k) = M
   for i > 0:  b(k) = minimum over 0 ≤ h < i of f(k, h)

(In words: b(k) is the minimum distance connecting village k that has been
computed thus far.)
The result of Optimization 2 is also fairly straightforward.

   begin glocon n, M; virvar miv; privar max, i, b;
      miv vir int array := (0); max vir int, i vir int := 0, 0;
      b vir int array := (0); do b.dom ≠ n --> b:hiext(M) od;
      do i ≠ n -->
         begin glocon n; glovar miv, max, i, b; privar min, j;
            min vir int := b:lopop; j vir int := i + 1;
            do j ≠ n -->
               begin glocon i; glovar min, j, b; privar ff;
                  ff vir int := f(i, j);
                  do ff < min --> min := ff od;
                  do ff < b(j) --> b:(j) = ff od;
                  j := j + 1
               end
            od {min = id(i)};
            if max > min --> skip
            [] max = min --> miv:hiext(i)
            [] max < min --> miv := (0, i); max := min
            fi;
            i := i + 1
         end
      od
   end

To try to combine these two optimizations presents a problem. In Opti-
mization 1 the scanning of a row of the distance matrix is aborted if min
has become small enough; in Optimization 2, however, the scanning of
the row is also the scanning of a column and that is done to keep the
values of b(k) up to date. Let us apply Optimization 1 and replace the line
"do j ≠ n -->" by

   "do j ≠ n and min ≥ max -->"

The innermost loop can now terminate with j < n; the values b(k) with
j ≤ k < n for which updating is still of possible interest are now the ones
with b(k) ≥ max; the other ones are already small enough. The following
insertion will do the job:

   do j ≠ n -->
      if b(j) < max --> j := j + 1
      [] b(j) ≥ max -->
         begin glocon i; glovar j, b; privar ff;
            ff vir int := f(i, j);
            do ff < b(j) --> b:(j) = ff od;
            j := j + 1
         end
      fi
   od
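In a Python sketch of mine the combined program reads as follows (mx plays
the role of max, f is the given symmetric distance function with f(i, i) = M,
and the list b stays full length, reading b[i] where the program pops it):

   def most_isolated(n, M, f):
       miv = []        # maximally isolated villages found so far
       mx = 0          # their isolation degree (max of the empty set: 0)
       b = [M] * n     # b[k]: least f(k, h) over the rows h scanned so far;
                       # kept exact only as long as b[k] >= mx
       for i in range(n):
           mn = b[i]   # min vir int := b:lopop
           j = i + 1
           while j < n and mn >= mx:     # Optimization 1: abort early
               ff = f(i, j)
               if ff < mn:
                   mn = ff
               if ff < b[j]:             # Optimization 2: update column
                   b[j] = ff
               j += 1
           if mn == mx:
               miv.append(i)
           elif mn > mx:
               miv, mx = [i], mn
           while j < n:                  # the insertion
               if b[j] >= mx:
                   ff = f(i, j)
                   if ff < b[j]:
                       b[j] = ff
               j += 1
       return miv

Note that the trailing loop -the insertion- indeed comes after the adjustment
of mx.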

The best place for this insertion is immediately preceding "i := i + 1", but
after the adjustment of max; the higher max, the larger the probability that
a b(k) does not need any more adjustments.
The two optimizations that we have combined are of a vastly different
nature. Optimization 2 is just "avoiding redoing work known to have been
done", and its effectiveness is known a priori. Optimization 1, however, is
a strategy whose effectiveness depends on the unknown values of f: it is just
one of the many possible strategies in the same vein.
We are looking for those rows of the distance matrix whose minimum
element value S exceeds the minimum elements of the remaining rows, and
the idea of Optimization 1 is that for that purpose we do not need to compute
for the remaining rows the actual minimum if we can find for each row an
upper bound B(i) for its minimum, such that B(i) < S. In an intermediate stage
of the computation, for some row(s) the minimum S is known because all
its/their elements have been computed; for other rows we only know an
upper bound B(i). And now the strategic freedom is quite clear: do we first
compute the smallest number of additional matrix elements still needed to
determine a new minimum, in the hope that it will be larger than the minimum
we had and, therefore, may exceed a few more B's? Or do we first compute
unknown elements in rows with a high B in the hope of cheaply decreasing
that upper bound? Or any mixture?
My original version combining the two strategies postponed the "updat-
ing of the remaining b(k)" somewhat longer, in the hope that in the meantime
max would have grown still further, but whether it was a more efficient
program than the one published in this chapter is subject to doubt. It was
certainly more complicated, needing yet another array for storing a sequence
of village numbers. The published version was only discovered when writing
this chapter.
In retrospect I consider my ingenuity spent on my original program as

wasted: if it was "more efficient" it could only be so "on the average". But
on what average? Such an average is only defined provided that we postulate
-quite arbitrarily!- a probability distribution for the distance matrix
f(i,j). On the other hand it was not my intention to tailor the algorithm
to a specific subclass of distance matrices!
The moral of the story is that, in making a general program, we should
hesitate to yield to the temptation to incorporate the smart strategy that
would improve the performance in cases that might never occur, if such
incorporation complicates the program notably: simplicity of the program
is a less ambiguous target. (The problem is that we are often so proud of our
smart strategies that it hurts to abandon them.)

Remark. Our final program combines two ideas and we have found it
by first considering -as "stepping stones", so to speak- two programs, each
incorporating one of them, but not the other. In many instances I found
such stepping stones most helpful. (End of remark.)
22  THE PROBLEM OF THE SHORTEST SUBSPANNING TREE

Two points can be connected by one point-to-point connection; three
points can be interconnected by two point-to-point connections; in general,
N points can be fully interconnected by N - 1 point-to-point connections.
Such a set of interconnections is called a "tree" or, if we wish to stress that
it connects the N points to each other, a "subspanning tree", and the connec-
tions are called "its branches". Cayley has been the first to prove that the
number of different possible trees between N given points equals N^(N-2).
(Verify this for N = 4, but not for N = 5.)
We now assume that the length of each of the N * (N - 1)/2 possible
branches has been given. Defining the length of a tree as the sum of the
lengths of its branches, we can ask ourselves how to determine the shortest
tree between those N points. (For the time being, we assume that the given
lengths are such that the shortest tree is unique.)

Note. The points are not necessarily in a Euclidean plane; the given
lengths need not have any relation to a Euclidean distance. (End of note.)
An apparently straightforward solution would generate all trees between
the N points, would determine their lengths, and select the shortest one.
But Cayley's theorem tells us that, as N increases, this rapidly becomes very
expensive, even prohibitively so. We would like to find a more efficient
algorithm.
When faced with such a problem (we have already done so quite explicitly
when solving the problem of the Dutch national flag), it is often very instruc-
tive, keeping in mind that we try to design a sequential algorithm, to consider
what intermediate states of the computation to expect. As our final answer
consists of N - I branches whose membership of the shortest tree will -


hopefully- be established in turn, we are led to investigate what we can say
when a number of branches of the shortest tree are known.
Immediately we are faced with the choice whether we shall consider the
general case in which an arbitrary subset of these branches is known, or
whether we confine our attention to certain types of subsets only, in the
hope that these will turn out to be the only subsets that will occur in our
computation and, also, that this restriction will simplify our analysis. This
is at least a choice that presents itself if we can think of a natural type of
subset. In this example, the natural type of subset that suggests itself is that
the known branches, instead of being randomly distributed over the points,
themselves already form a tree. As this special case indeed seems simpler
than the general one, we rephrase our question. Is there anything helpful
that we can say when a subtree of the shortest subspanning tree is known?
For the sake of this discussion we colour red the branches of the known
subtree and the points connected by it and colour all remaining points blue.
Can we then think of a branch that must belong to the shortest tree as well?
Because the final tree connects all points with each other, the final tree must
contain at least one branch connecting a red point to a blue one. Let us call
the branches between a red point and a blue one "the violet branches". The
obvious conjecture is that the shortest violet branch belongs to the shortest
tree as well.
The correctness of this conjecture is easily proved. Consider a tree T
between the N points that contains all the red branches but not the shortest
violet one. Add the shortest violet branch to it. This closes a cycle with at
least one red and at least one blue point, a cycle which, therefore, contains
at least one other (and longer) violet branch. Remove such a longer violet
branch from the cycle. The resulting graph is again a tree. In tree T we have
replaced a longer violet branch by a shorter one, and, therefore, tree T cannot
have been the shortest one. (A tree between N points is a graph with the
following three properties:

1. It interconnects the N points.
2. It has N - 1 branches.
3. It has no cycles.

Any two of these properties imply that the graph is a tree and, therefore,
also enjoys the third property.)
But now we have the framework for an algorithm, provided that we can
find an initial subtree to colour red. Once we have that, we can select the
shortest violet branch, colour it and its blue endpoint red, etc., letting the
red tree grow until there are no more blue points. To start the process, it
suffices to colour an arbitrary point red:

   colour an arbitrary point red and the remaining points blue;
   do number of red points ≠ N -->
      select the shortest now violet branch;
      colour it and its blue endpoint red
   od

As it stands, the main task will be: "select the shortest now violet branch",
because the number of violet branches may be quite large, viz. k * (N - k),
where k = number of red points. If "select the shortest now violet branch"
were executed as an isolated operation, it would require on the average a
number of comparisons proportional to N², and the amount of work to be
done by the algorithm as a whole would grow as N³. Observing, however,
that the operation "select the shortest now violet branch" does not occur in
isolation, but as component of a repetitive construct, we should ask ourselves
whether we can apply the technique of "taking a relation outside the repeti-
tive construct", i.e. whether we can arrange matters in such a way that
subsequent executions of "select the shortest now violet branch" may profit
from the preceding one. There is considerable hope that this may be possible,
because one set of violet branches is closely related to the next: the set of
violet branches is defined by the way in which the points have been parti-
tioned in red ones and blue ones, and this partitioning is each time only
changed by painting one blue point red.
Hoping for a drastic reduction in searching time when selecting the
shortest branch from a set means hoping to reduce the size of that set; in
other words, what we are looking for is a subset of the violet branches -caII
it· the "ultraviolet" ones- that will contain the shortest one and can be used
to transmit helpful information from one selection to the next. We are envis-
aging a program of the structure:

   colour an arbitrary point red and the remaining ones blue;
   determine the set of ultraviolet branches;
   do number of red points ≠ N -->
      select the shortest now ultraviolet branch;
      colour it and its blue endpoint red;
      adjust the set of ultraviolet branches
   od

where the notion "ultraviolet" should be chosen in such a way that:

1. It is guaranteed that the shortest violet branch is among the ultraviolet


ones.
THE PROBLEM OF THE SHORTEST SUBSPANNING TREE 157

2. The set of ultraviolet branches is, on the average, much smaller than
the set of violet ones.
3. The operation "adjust the set of ultraviolet branches" is relatively cheap.

(We require the first property because then our new algorithm is correct
as well; we require the second and the third properties because we would
like our new algorithm to be more efficient than the old one.)
Can we find such a definition of the notion "ultraviolet"? Well, for lack
of further knowledge, I can only suggest that we try. Considering that the
set of violet branches, leading from k red points to N - k blue points, has
k * (N - k) members, and observing our first criterion, two obvious possible
subsets immediately present themselves:

1. Make for each red point the shortest violet branch ending in it ultravio-
   let; the set of ultraviolet branches has then k members.
2. Make for each blue point the shortest violet branch ending in it ultravio-
   let; the set of ultraviolet branches has then N - k members.

Our aim is to keep the ultraviolet subset small, but we won't get a clue
from their sizes: for the first choice the size will run from 1 through N - 1,
for the second choice it will be the other way round. So, if there is any chance
of deciding, we must find it in the price of the operation "adjust the set of
ultraviolet branches".
Without trying different adjustments, however, there is one observation
that suggests a strong preference for the second choice. In the first choice,
different ultraviolet branches may lead to the same blue point and then we
know a priori that at most one of them will be coloured red; with the second
choice each blue point is connected in only one way to the red tree, i.e. red
and ultraviolet branches form all the time a subspanning tree between the
N points. Let us therefore explore the consequences of the second definition
for our notion "ultraviolet".
Consider the stage in which we had a red subtree R and in which from
the set of corresponding ultraviolet branches (according to the second
definition; I shall no longer repeat that qualification) the shortest one and
its originally blue endpoint P have been coloured red. The number of ultra-
violet branches has been decreased by 1, as it should be. But, are the remain-
ing ones the correct ones? They represent for each blue point the shortest
possible connection to the originally red tree R, they should represent the
shortest possible connection to the new red tree R + P. But this question is
settled by means of one simple comparison for each blue point B: if the
branch BP is shorter than the ultraviolet branch connecting B to R, the latter
is to be replaced by BP, otherwise it is maintained as, apparently, the growth
of the red tree did not result in a shorter way of connecting B with it. As a

result, both selection of the shortest ultraviolet branch and adjustment of
the set of ultraviolet ones have become operations in price proportional to
N; the price of the total algorithm grows with N², and the introduction of
this concept "ultraviolet" has, indeed, accomplished the savings we were
hoping for.

EXERCISE

Convince yourself that the rejected alternative for the concept "ultraviolet" is not
so helpful. (End of exercise.)

Because "adjust the set of ultraviolet branches" implies checking for


each ultraviolet branch whether it has to be kept or has to be replaced, it is
tempting to combine this process with the selection of the shortest one of
the adjusted set into one single loop. Instead of transmitting the adjusted
set of ultraviolet branches to the next repetition, we shall transmit the unad-
justed set together with the "/er'', the ordinal number of the point lastly
coloured red. The initialization problem can be solved nicely if we know that
all branches have a length < inf; we shall assume the knowledge of this
constant available and shall initialize the ultraviolet branches of the unad-
justed set all with length inf (and connect th<: blue points to a hypothetical
point with ordinal number 0).
We assume the N given points numbered from 1 through N, and we
assume the length of the branch between points p and q given by the sym-
metric function "dist(p, q)". The answer required is a tree of N - 1 branches,
each branch being identified by the ordinal numbers of its endpoints; the
answer is an (unordered) set of (unordered) pairs. We shall represent them
by two arrays, "from" and "to" respectively, such that for 1 ≤ h ≤ N - 1
"from(h)" and "to(h)" are the (ordinal numbers of the) two endpoints of the
hth branch. In our final answer the branches will be ordered (by h); the only
order that makes sense is the order in which they have been coloured red.
This suggests using, during our computation, the arrays "from" and "to"
for representing the ultraviolet branches as well, with the aid of the conven-
tion that

   for 1 ≤ h < k:  the hth branch is red
   for k ≤ h < N:  the hth branch is ultraviolet.

A local array uvl ("ultraviolet length") is introduced in order to avoid
recomputation of lengths of ultraviolet branches, i.e. we shall maintain

   for k ≤ h < N:  uvl(h) = length of the hth branch.
Furthermore, because there is a one-to-one correspondence between ultra-
violet branches and blue points, we can represent which points are blue by

means of the convention that the hth ultraviolet branch connects the red
point "from(h)" with the blue point "to(h)".
In the following program, point N is chosen as the arbitrary point that
is initially coloured red.

Note. With respect to the array "from" and the scalar variable "suv"
one could argue that we have not been able to avoid meaningless initial-
izations; they refer, however, to a virtual point 0 at a distance "inf" from
all the others and to an equally virtual 0th ultraviolet branch. (End
of note.)

begin glocon N, inf; virvar from, to; privar uvl, k, ler;
    from vir int array := (1); to vir int array := (1);
    uvl vir int array := (1);
    do from.dom ≠ N - 1 →
        from:hiext(0); to:hiext(from.dom); uvl:hiext(inf)
    od;
    k vir int, ler vir int := 1, N;
    do k ≠ N →
        begin glocon N, inf; glovar from, to, uvl, k, ler; privar suv, min, h;
            suv vir int, min vir int, h vir int := 0, inf, k;
            do h ≠ N →
                begin glocon to, ler; glovar from, uvl, suv, min, h; privar len;
                    len vir int := dist(ler, to(h));
                    if len < uvl(h) → uvl:(h)= len; from:(h)= ler
                    □ len ≥ uvl(h) → len:= uvl(h)
                    fi {len = uvl(h)};
                    do len < min → min:= len; suv:= h od {min = uvl(suv)};
                    h:= h + 1
                end
            od {the suv-th branch is the shortest ultraviolet one};
            from:swap(k, suv); to:swap(k, suv); uvl:swap(k, suv);
            ler:= to(k); k:= k + 1; uvl:lorem
        end
    od
end
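
As an illustration, the program admits the following transcription into Python (an editorial sketch, not part of the original text: 0-based lists replace the arrays, "dist" is taken to be the Euclidean distance, "from" is renamed "frm" because it is a reserved word in Python, and "uvl:lorem" is rendered by simply ignoring the red prefix of uvl):

    import math

    def shortest_subspanning_tree(points):
        # points: list of (x, y) pairs for the points nr. 1 .. N
        N = len(points)
        inf = float("inf")

        def dist(p, q):              # the symmetric branch length
            (x1, y1), (x2, y2) = points[p - 1], points[q - 1]
            return math.hypot(x1 - x2, y1 - y2)

        # branch h connects red point frm[h] with blue point to[h]; initially
        # every blue point hangs on the virtual point 0 at distance inf
        frm = [0] * (N - 1)
        to = list(range(1, N))
        uvl = [inf] * (N - 1)
        k, ler = 0, N        # branches 0 .. k-1 red; ler = point last coloured red
        while k != N - 1:
            suv, minimum = k, inf
            for h in range(k, N - 1):
                ln = dist(ler, to[h])
                if ln < uvl[h]:      # the grown red tree offers a shorter connection
                    uvl[h], frm[h] = ln, ler
                else:
                    ln = uvl[h]
                if ln < minimum:     # select the shortest ultraviolet branch so far
                    minimum, suv = ln, h
            # colour the suv-th branch red by swapping it into position k
            frm[k], frm[suv] = frm[suv], frm[k]
            to[k], to[suv] = to[suv], to[k]
            uvl[k], uvl[suv] = uvl[suv], uvl[k]
            ler = to[k]
            k = k + 1
        return list(zip(frm, to))

    print(shortest_subspanning_tree([(0, 0), (0, 1), (1, 0), (1, 1)]))

Applied to the four corners of a unit square it delivers three branches of length 1, as it should.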

In spite of the simplicity of the final program -it is not much more than
a loop inside a loop- the algorithm it embodies is generally not regarded as
a trivial one. It is, as a matter of fact, well-known for being highly efficient,
both with respect to its storage utilization and with respect to the number of
comparisons of branch lengths that are performed. It may, therefore, be
rewarding to review the major steps that led to its ultimate discovery.
The first crucial choice has been to try to restrict ourselves to such inter-
mediate states that the red branches, i.e. the ones known to belong to the
final answer, always form a tree by themselves. The clerical gain is that then
the number of red points exceeds the number of red branches by exactly one
and that we are allowed to conclude that a branch leading from one red
point to another is never a candidate for being coloured red: it would erro-
neously close a cycle. (And now we see an alternative algorithm: sort all
branches in the order of increasing length and process them in that order,
where processing means if the branch, together with the red branches, forms
a cycle, reject it, otherwise colour it red. Obviously, this algorithm establishes
the membership of the final answer for the red branches in the order of
increasing length. The algorithm is less attractive because, firstly, we have
to sort all the branches and, secondly, the determination of whether a new
branch will close a cycle is not too attractive either. That problem will be
the subject of the next chapter.) The moral of the story is that the effort to
reach the final goal via "simple" intermediate states is usually worth trying!
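
The rejected alternative of the parenthesis above can also be rendered in a few lines of Python (an editorial sketch; the subset administration used for the cycle test is the naive form of what the next chapter develops, and "length" is assumed to map each branch to its length):

    def sorted_tree(N, branches, length):
        # process the branches in the order of increasing length; colour a
        # branch red unless, with the red ones, it would close a cycle
        f = list(range(N + 1))       # subset administration for vertices 1 .. N
        def subset(p):
            while p != f[p]:
                p = f[p]
            return p
        red = []
        for (p, q) in sorted(branches, key=length):
            ps, qs = subset(p), subset(q)
            if ps != qs:             # the branch does not close a cycle
                f[qs] = ps
                red.append((p, q))
        return red

    lengths = {(1, 2): 1.0, (2, 3): 2.0, (1, 3): 4.0}
    print(sorted_tree(3, list(lengths), lengths.get))    # [(1, 2), (2, 3)]
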
A second crucial step was the formulation of the conjecture that the
shortest violet branch could be painted red as well. Again, that conjecture
has not been pulled out of a magic hat; if we wish to "grow" a red tree, a
violet branch is what we should be looking for, and the fact that then the
shortest one is a likely candidate is hardly surprising.
The decision not to be content with an N^3 algorithm -a decision which
led to the notion "ultraviolet"- is sometimes felt to be the most unexpected
one. People argue: "But suppose that it had not entered my head to investi-
gate whether I could find a better algorithm?" Well, that decision came at a
moment when we had an algorithm, and the mathematical analysis of the
original problem essentially had been done. It was only an optimization for
which, for instance, no further knowledge of graph theory was required
anymore! Besides that, it was an optimization that followed a well-known
pattern: taking a relation outside the loop. The moral of the story is that
once one has an algorithm, one should not be content with it too soon, but
investigate whether it can still be massaged. When one has made such recon-
siderations a habit, it is unlikely that the notion of "ultraviolet" would in
this example have escaped one's attention.
Note. It follows from the algorithm that the shortest subspanning tree
is unique if no two different branches have equal lengths. Verify that if
there is more than one shortest subspanning tree, our algorithm may
construct any of them. (End of note.)
A very different algorithm places the branches in arbitrary order but,
whenever a cycle is formed after the placement of a branch, removes the
(or a) longest branch of that cycle before the next branch is placed.
23  REM'S ALGORITHM FOR THE RECORDING OF EQUIVALENCE CLASSES

In a general graph (which need not be a tree) the points are usually called
"vertices" and the connections are usually called "edges" rather than
branches. A graph is called "connected" if and only if it contains only one
vertex or its edges provide at least one path between any two different vertices
from it. Because many of the possible edges of a graph with N vertices may
be missing, a graph need not be connected. But each graph, connected or
not, can always be partitioned uniquely into connected subgraphs, i.e. the
vertices of the graph can be partitioned into subsets, such that any pair from
the same subset is connected, while the edges provide no path between any
two vertices taken from two different subsets. (For the mathematicians:
"being connected" is a reflexive, symmetric, and transitive relation, which,
therefore, generates equivalence classes.)
We consider N vertices, numbered from 0 through N - 1, where N is
supposed to be large (10,000, say), and a sequence of graphs G_0, G_1, G_2, ...
which result from connecting these vertices via the edges of the sets E_0,
E_1, E_2, ..., where E_0 is empty and E_{i+1} = E_i + {e_i}, and e_0, e_1, e_2, ... is a
given sequence of edges. The edges e_0, e_1, e_2, ... have to be processed in that
order and when n of them have been processed (i.e. the last edge processed,
if any, is e_{n-1}) we must be able to determine for any pair of vertices whether
they are connected in G_n or not. The main problem to be solved in this
chapter is: "How do we store the relevant information as derived from the
edge sequence e_0, e_1, ..., e_{n-1}?"
We could, of course, store the edge sequence "e_0, e_1, ..., e_{n-1}" itself, but
that is not a very practical solution. For, firstly, it stores a lot of irrelevant
information; e.g. if the edges {7, 8} and {12, 7} have been processed, a new
edge {12, 8} does not tell us anything new! And, secondly, the answer to the
question "Are vertices p and q connected?" -our goal!- is deeply hidden.


At the other extreme we could store the complete connectivity matrix,
i.e. N^2 bits (or, if we exploit its symmetry, N * (N + 1)/2 bits), such that the
question whether p and q are connected is the stored boolean value conn_{p,q}.
(The fact that we talk about a matrix, while the array variable more resembles
a vector, need not bother us; we could use an array variable, "acon" say,
and maintain

    conn_{p,q} = acon(N * p + q) for 0 ≤ p < N and 0 ≤ q < N.)
I called this "the other extreme" because it stores indeed the readymade
answer to any possible question of the form "Are vertices p and q con-
nected?"; but it stores too much, because many of these answers are in
general not independent of each other. The decision to store the information
in such a redundant form should not be taken lightly, because the more
redundant information we store, the more may have to be updated when a
new edge has to be processed.
A more compact way to represent which vertices are connected to each
other is to name (for instance, to number) the subsets of the current parti-
tioning. Then the question whether vertex p is connected to vertex q boils
down to the question of whether they belong to the same subset, i.e. to the
value of the boolean expression
subset(p) = subset(q)
Because there are at most N different subsets, this does not require more
than N * log(N) bits (compared to the N^2 of the complete connectivity
matrix).
To answer the question whether p and q are connected, an array variable
storing the function "subset" would be extremely helpful, but ... its updating
can be very painful! For, suppose that subset(p) = P and subset(q) = Q
with P ≠ Q and suppose that the next edge to be processed is {p, q}. As a
result, the two subsets have to be combined into a single one and, if the sub-
sets are already large, that requires a lot of updating. Besides that, if the array
variable "subset" is the only thing we have, how do we find, for instance, all
values of k such that subset(k) = Q (assuming that P will be the name of the
new subset formed by the combination) otherwise than by scanning the func-
tion "subset" over its complete domain?
The way out of this dilemma is not to store the function "subset", but a
different function, "f" say, with the properties that

1. For a given value of p, the knowledge of the function f allows, at least on
the average, an easy computation of the value subset(p).
2. The processing of a new edge {p, q} allows, at least on the average, an
easy updating of the function f.

Can we invent such a function?


Because the primary information given by a new edge to be processed is
that from now on the one vertex belongs to the same subset as the other one,
it is suggested to store a function f, with f(k) meaning

    "vertex nr. k belongs to the same subset as vertex nr. f(k)"

i.e. the argument of f and the function value of f are both interpreted as
vertex numbers.
Because, at the beginning, E_0 is empty, each vertex is then the only mem-
ber of the subset it belongs to, and this function, therefore, must be initialized:

    f(k) = k for 0 ≤ k < N

and the most natural nomenclature for the N different subsets is then the
number of the only vertex each contains. The obvious generalization to subsets
of more vertices is that each subset will be identified by the vertex number of
one of the vertices it contains -which one need not be considered now;
with this convention, we can now consider a function f such that

    if f(k) = k, vertex nr. k belongs to subset nr. k, and
    if f(k) ≠ k, vertex nr. k belongs to the same subset as vertex nr. f(k).
Repeated computation of the function f will lead us from one vertex of
the subset to another vertex of the same subset. If we can see to it that this
will not lead us into a cycle before we have been led to the "identifying vertex
of the subset", i.e. the one whose number has been inherited by the subset
as a whole, the knowledge of the function f will indeed enable us to compute
subset(p) for any value of p.
More precisely, with the notational convention

    f^0(p) = p, and for i > 0: f^i(p) = f(f^{i-1}(p)),

there exists for a vertex p belonging to subset ps a value j such that

    for i < j: f^i(p) ≠ ps
    for i ≥ j: f^i(p) = ps.

From this it follows that for 0 ≤ j1 < j2 ≤ j: f^{j1}(p) ≠ f^{j2}(p), and

    "ps vir int := p; do ps ≠ f(ps) → ps := f(ps) od"

will terminate with ps = subset(p).
Because the function f leads -possibly after repeated application-
for all points of subset nr. qs to the same identifying vertex nr. qs, merging
that subset with subset nr. ps and identifying the result with nr. ps only
requires a minute change in the function f; instead of f(qs) = qs, eventually
f(qs) = ps should hold. Processing the edge {p, q} can therefore be done by
the following inner block:

begin glocon p, q; glovar f; privar ps, qs;
    ps vir int := p; do ps ≠ f(ps) → ps := f(ps) od; {ps = subset(p)}
    qs vir int := q; do qs ≠ f(qs) → qs := f(qs) od; {qs = subset(q)}
    f:(qs)= ps
end
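
In Python the inner block reads as follows (a sketch: f is assumed to be a 0-based list):

    def process_edge(f, p, q):
        ps = p
        while ps != f[ps]:
            ps = f[ps]               # ps = subset(p)
        qs = q
        while qs != f[qs]:
            qs = f[qs]               # qs = subset(q)
        f[qs] = ps                   # subset qs is merged into subset ps

    f = list(range(6))
    process_edge(f, 0, 1); process_edge(f, 1, 2)
    print(f)    # [0, 0, 0, 3, 4, 5]: vertices 0, 1 and 2 now share subset nr. 0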

Although correct, the above processing of a new edge is not too attractive,
as its worst case performance can become very bad: the cycles may have to
be repeated very many times. It seems advisable to "clean up the tree"; for
both vertex p and vertex q we now know the current identifying vertex and it
seems a pity to trace those paths possibly over and over again. To remedy
this situation, we offer the following inner block (we have also incorporated
an effort to reduce the number of f-evaluations):

begin glocon p, q; glovar f; pricon ps;
    begin glocon p, f; virvar ps; privar ps0;
        ps0 vir int, ps vir int := p, f(p);
        do ps0 ≠ ps → ps0, ps := ps, f(ps) od
    end {ps = subset(p)};
    begin glocon p, ps; glovar f; privar ps0, ps1;
        ps0 vir int, ps1 vir int := p, f(p);
        do ps0 ≠ ps1 → f:(ps0)= ps; ps0, ps1 := ps1, f(ps1) od
    end {vertices encountered from nr. p "cleaned up"};
    begin glocon q, ps; glovar f; privar qs0, qs1;
        qs0 vir int, qs1 vir int := q, f(q);
        do qs0 ≠ qs1 → f:(qs0)= ps; qs0, qs1 := qs1, f(qs1) od;
        f:(qs0)= ps
    end {vertices encountered from nr. q "cleaned up" and the
         whole subset combined with nr. ps}
end
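
The same block, transcribed into Python under the same assumptions as before; the first scan only reads f, the two cleaning scans redirect every vertex encountered directly to ps:

    def process_edge(f, p, q):
        ps0, ps = p, f[p]
        while ps0 != ps:             # establish ps = subset(p)
            ps0, ps = ps, f[ps]
        ps0, ps1 = p, f[p]
        while ps0 != ps1:            # clean up the vertices encountered from p
            f[ps0] = ps
            ps0, ps1 = ps1, f[ps1]
        qs0, qs1 = q, f[q]
        while qs0 != qs1:            # clean up the vertices encountered from q ...
            f[qs0] = ps
            qs0, qs1 = qs1, f[qs1]
        f[qs0] = ps                  # ... and combine the whole subset with nr. ps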

EXERCISES

1. Give a more formal proof of the correctness of the above algorithm.


2. We have quite arbitrarily decided that subset(p) would remain unchanged and
that subset(q) would be redefined. Can you exploit this freedom to advantage?
(End of exercises.)

The above algorithm has not been included for its beauty. As a matter
of fact, I would not be amazed if it left the majority of my readers dissatisfied.
It is, for instance, annoying that it is not a trivial task to estimate how much
we have gained by going from the first to the second version. It has been
included for two other reasons.
Firstly, it gave me in a nicely compact setting the opportunity to discuss
alternative ways for representing information and to illustrate the various
economic considerations. Secondly, it is worthwhile to point out that we
allow as intermediate state a nonunique representation of the (unique) cur-
rent partitioning and allow the further destruction of irrelevant information
to be postponed until a more convenient moment arrives. I have encountered
such use of nonunique representations as the basis for a few, otherwise very
surprising, inventions.

When M. Rem read the above text, he became very pensive -my solution,
indeed, left him very dissatisfied- and shortly afterwards he showed me
another solution. Rem's algorithm is in a few respects such a beauty that
I could not resist the temptation to include it as well. (The following reason-
ing about it is the result of a joint effort, in which W.H.J. Feijen participated
as well.)
In the previous solution it is not good form that, starting at p, the path
to the root of that tree is traced twice, once to determine ps, and then, with
the knowledge of ps, a second time in order to clean it up. Furthermore it
destroys the symmetry between p and q.
The two scans of the path from p were necessary because we wanted to
perform a complete cleaning up of it. Without knowing the number of the
identifying vertex we could, however, at least do a partial cleaning up in
a single scan, if we knew a direction towards "a cleaner tree". It is therefore
suggested to exploit the ordering relation between the vertex numbers and to
choose for each subset as identifying vertex number, say, the minimum value;
then, the smaller the f-values, the cleaner the tree. Because initially f(k) = k
for all k and our target has now become to decrease f-values, we should
restrict ourselves to representations of the partitioning satisfying

    f(k) ≤ k for 0 ≤ k < N.
This restriction has the further advantage that it is now obvious that the
only cycles present are the stationary points for which f(k) = k (i.e. the
identifying vertices).
For the purpose of a more precise treatment we introduce the following
notation: a function "part" and a binary operator "$".

    part(f)         denotes the partitioning represented by f
    part(f)$(p, q)  denotes the partitioning resulting when in part(f) the
                    subset containing p and the subset containing q are
                    combined into a single one. If and only if in part(f) the
                    vertices p and q are already in the same subset, we have
                    part(f) = part(f)$(p, q)

Denoting the initial value of f by f_init, the processing of an edge (p, q)
can be described as establishing the relation

    R: part(f) = part(f_init)$(p, q)

This is done (in the usual fashion!) by introducing two local variables, p0
and q0 say, (or, if you prefer, a local edge) satisfying the relation

    P: part(f)$(p0, q0) = part(f_init)$(p, q)

which is trivially established by the initialization

    p0 vir int, q0 vir int := p, q

After the establishment of P the algorithm should massage f, p0, and q0
under invariance of P until we can conclude that

    Q: part(f) = part(f)$(p0, q0)

holds, as (Q and P) ⇒ R.
Relation R has in general many solutions for f, but the cleaner the one
we can get, the better; it is therefore proposed to look for a massaging process
such that each step decreases at least one f-value. Then termination is
ensured (as monotonically decreasing variant function we can take the sum
of the N f-values) and we must try to find enough steps such that BB, the
disjunction of the guards, is weak enough so that (non BB) ⇒ Q.
We can change the value of the function f in point p0, say, by

    f:(p0)= something

but in order to ensure an effective decrease of the variant function, that
"something" must be smaller than the original value of f(p0), i.e. smaller
than p1 if we introduce

    P1: p1 = f(p0) and q1 = f(q0)

(The second term has been introduced for reasons of symmetry.)
Because part(f)$(p0, q0) has to remain constant, obvious candidates for
the "something" are q0 and q1; but because q1 ≤ q0, the choice q1 will in
general be more effective, and we are led to consider

    q1 < p1 → f:(p0)= q1

where the guard is fully caused by the requirement of effective decrease of
the variant function. The next question is whether after this change of f
we can readjust (p0, q0) so as to restore the possibly destroyed relation P.
The connection (from p0) to p1 being removed, a safe readjustment has to
reestablish the (possibly) destroyed connection with p1. After the change of
f caused by f:(p0)= q1, we know that

    part(f)$(p1, x) = part(f_init)$(p, q)

for x equal to p0, q0, or q1. The relation P is most readily re-established
(because for x, q0 may be chosen) by "p0:= p1", which, in view of P1, is
coded as

    p0, p1 := p1, f(p1)
Thus we are led to the following program, the second guarded command
being justified by symmetry considerations:

begin glocon p, q; glovar f; privar p0, p1, q0, q1;
    p0 vir int, q0 vir int := p, q {P has been established};
    p1 vir int, q1 vir int := f(p0), f(q0) {P1 has been established};
    do q1 < p1 → f:(p0)= q1; p0, p1 := p1, f(p1)
    □ p1 < q1 → f:(q0)= p1; q0, q1 := q1, f(q1)
    od
end

The repetitive construct has been constructed in such a way that termination
is ensured; upon completion we can conclude p1 = q1, which on account of
P1 implies f(p0) = f(q0), from which Q follows! Q.E.D.
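
A Python transcription of Rem's algorithm (an editorial sketch: f is a 0-based list satisfying f[k] ≤ k, the identifying vertex of each subset being its minimum):

    def process_edge(f, p, q):
        p0, q0 = p, q
        p1, q1 = f[p0], f[q0]
        while p1 != q1:
            if q1 < p1:
                f[p0] = q1           # an f-value decreases: the variant function drops
                p0, p1 = p1, f[p1]
            else:                    # p1 < q1: the symmetric guarded command
                f[q0] = p1
                q0, q1 = q1, f[q1]

    def connected(f, p, q):          # trace f to the stationary points
        while p != f[p]: p = f[p]
        while q != f[q]: q = f[q]
        return p == q

    f = list(range(6))
    process_edge(f, 4, 5); process_edge(f, 3, 4)
    print(connected(f, 3, 5), connected(f, 0, 5))    # True False
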
Note. For the controlled derivation of this program -even for an a
posteriori correctness proof- the introduction of "part" and "$", or a
similarly powerful notation, seems fairly essential. Those readers who
doubt this assertion are invited to try for themselves a proof in terms of
the N values of f(k), rather than in terms of a nicely captured property of
the function f as a whole, as we have done. (End of note.)

Advice. Those readers who have not fully grasped the compelling beauty
of Rem's algorithm should reread this chapter very carefully. (End of
advice.)
24  THE PROBLEM OF THE CONVEX HULL IN THREE DIMENSIONS

In order to forestall the criticism that I only show examples that admit
a nice, convincing solution, and, in a sense, do not lead to a difficult program,
we shall now tackle a problem which I am pretty sure will turn out to be
much harder. (For my reader's information: when starting to write this
chapter, I had never seen a program solving the problem we are going to
deal with.)
Given a number of different points on a straight line -by their coordi-
nates, say- we can ask to select those points P, such that all other points lie
at the same side of P. This problem is simple: scanning the coordinates once
we determine their minimum and their maximum value.
Given, by means of their x-y-coordinates, a number of different points
in a plane such that no three points lie on the same straight line, we can ask
to select those points P through which a straight line can be drawn such that
all other points lie at the same side of that line. These points P are the vertices
of what is called "the convex hull of the given points". The convex hull itself
is a cyclic ordering of these points with the property that the straight line
connecting two successive ones has all remaining points at one of its sides.
The convex hull is the shortest closed line such that each point lies either
on it or inside it.
In this chapter we shall tackle the analogous problem for three dimen-
sions. Given, by their x-y-z-coordinates, N different points (N large) such
that no four different points lie in the same plane, select all those points P
through which a plane can be "drawn", such that all other points lie at the
same side of that plane. These points are the vertices of the convex hull
around those N points, i.e. the minimal closed surface such that each point
lies either on it or inside it. (The restriction that no four points lie in the
same plane has been introduced to simplify the problem; as a result all the
faces of the convex hull will be triangles.)
For the time being we leave the question open as to whether the convex
hull should be produced as the collection of triangles forming its faces, or as the
graph of its vertices and edges -where the edges are the lines through two
different vertices such that a plane through that line can be "drawn" with
all other points at the same side of it.
The reason why we postpone this decision is exactly the reason why the
problem of the three-dimensional convex hull is such a hairy one. In the
two-dimensional case, the convex hull is one-dimensional and its "process-
ing" (scanning, building up, etc.) is quite naturally done by a sequential
algorithm with a linear store. In the three-dimensional case, however, neither
the representation of the "two-dimensional" answer with the aid of a linear
store nor the "sequencing" in manipulating it is obvious.
All algorithms that I know for the solution to the two-dimensional
problem can be viewed as specific instances of the abstract program:

construct the convex hull for two or three points;
do there exists a point outside the current hull →
    select a point outside the current hull;
    adjust the current hull so as to include
        the selected point as well
od

The known algorithms differ in various respects. A subclass of these
algorithms selects in the action "select a point outside the current hull" only
points that are a vertex of the final answer. These fall into two subsubclasses:
in the one subsubclass not only the new vertex but also one of the new edges
will belong to the final answer; in the other subsubclass the new vertex of the
final answer is selected with the aim of reducing the number of points still
outside the hull as much as possible. The worst case and best case effectiveness
of such a strategy is, however, dependent on the positions of the given points
and its "average effectiveness" is only defined with respect to a population of
sets of points from which our input can be regarded as a random sample.
(In general, an algorithm with widely different worst and best case perfor-
mance is not an attractive tool; the trouble is that often these two bounds
can only be brought more closely together by making all performances
equally bad.)
The second aspect in which the various known algorithms for the two-
dimensional case differ is the way in which it is discovered which points lie
within the current hull (and, therefore, can be discarded). To determine
whether an arbitrary point lies within the current hull, we can scan the edges
(in sequence, say): if it lies at the inner side of all of them, it lies inside the
current hull. Instead of scanning all vertices, we can also exploit the fact that
a point lies inside the convex hull only if it lies inside a triangle between three
of its vertices, and we can try to find such a triple of vertices according to some
strategy aiming at "quickest inclusion". Some of these strategies can, on
"an" average, be speeded up quite considerably by some additional book-
keeping, by trading storage space versus computation time.
From the above we can only expect that for the three-dimensional case,
the collection of algorithms worthy of the predicate "reasonable" will be
of a dazzling variety. It would be vain to try anything approaching an
exhaustive exploration of that class, and I promise that I will be more than
happy if I can find one, perhaps two "reasonable" algorithms that do not
seem to be unduly complicated.
Personally I find such a global exploration of the potential difficulty,
as given above, very helpful, as it indicates how humbly I should approach
the problem. In this case I interpret the indications as pointing towards a
very humble approach and it seems wise to refrain from as much complicat-
ing sophistication as we possibly can, before we have discovered -if ever!
- that, after all, the problem is not as bad as it seemed at first sight. The
most drastic simplification I can think of is confining our attention to the topo-
logy, and refraining from all strategic considerations based upon expecta-
tion values for the numbers of points inside given volumes.
It seems that the most sensible thing to do is to look at the various ways
of solving the two-dimensional problem and to investigate their generaliza-
tion to three dimensions.
A simple solution of the two-dimensional problem deals with the points
in arbitrary order, and it maintains the convex hull for the first n points,
initializing with n = 3. Whenever a next point is taken into consideration
two problems have to be solved:

1. It has to be decided whether the new point lies inside or outside the
current hull.
2. If it is found to lie outside the current hull, the current hull should be
adjusted.

One way of doing this is to search for the set of k consecutive (!) edges
such that the new point lies at their wrong side. Those k edges (and k - 1
points) have to be removed during the adjustment; they will be replaced by
1 point and 2 edges. If the search fails to find such a set, the new point lies
inside.
Assuming the current vertices cyclically arranged, such that for each
vertex we can find its predecessor as well as its successor, the least sophisti-
cated program investigates these edges in order. In the three-dimensional
problem the equivalents of the edges are the triangular faces, the equivalent
of a set of consecutive edges is a set of connected triangles which are topolog-
ically equivalent to a circle. Let us call this "a cap".
The convex hull for the two-dimensional problem has to be divided into
two "sections", the set of edges to be maintained or to be rejected respectively,
where the two sections are separated by two points; in the three-dimensional
problem the convex hull has to be divided into two caps separated by a
cyclic arrangement of points and edges.
We could start confronting the new point with an arbitrary face and
take this as the starting face of one of the two caps; that cap can be extended
one face at a time, and the process will terminate either because the cap
comprises all faces (and the other cap is empty; the new point lies inside the
current hull) or because it is fully surrounded by faces belonging to the other
cap.
The key problem seems to be to maintain a floating population of faces
in such a way that given a cap we can find a face with which we can extend
it such that the new set of faces again forms a cap. We can do so by maintain-
ing (in cyclic order) the points (and the edges) of the cap boundary. A new
face to be added must have an edge, and therefore two points in common
with the cap boundary: if the third point occurs in the cap boundary as well,
it must in the old boundary be adjacent to one of the other two.
I would not like the effort to find a face with which the cap can be ex-
tended to imply a search through the remaining faces, i.e. given an edge of
the boundary, I would prefer quick access to the necessary data defining the
other face to which this edge belongs. A desire to tabulate -we would like
to use arrays- suggests that we should maintain a population of numbered
edges as well.
The fact that an edge defines two points as well as two faces, and that a
face is defined equally well by its three points as by its three edges, suggests
that we should try to carry out as much of the administration as we can in
terms of edges. We seem to get the most symmetrical arrangement if we
number each undirected edge twice, i.e. regard it as a superposition of two
numbered, directed edges.
Let i be the number of a directed edge of the convex hull; then we can
define
inv(i) = the number of the directed edge that connects the same points
as the directed edge nr. i, but in the inverse direction.
Furthermore, when we associate with each face the three directed edges
of its clockwise boundary, each directed edge is a clockwise edge of exactly
one face, and we can give the complete topology by one further function
suc(i) = the number of the directed edge that is the next clockwise edge
of the face of which nr. i is a clockwise edge; with "next" is
meant that the edge "suc(i)" begins where the edge "i" ends.

As a result "i", "suc(i)" and "suc(suc(i))" are the numbers of the edges
forming in that order a clockwise boundary of a face. Because all faces are
triangles, we shall have

    suc(suc(suc(i))) = i
The functions "inv" and "suc" give the complete topological description
of the convex hull in terms of directed edge names. If we want to go from
there to point numbers, we can introduce a third function
end(i) = the number of the point in which the directed edge nr. i ends.
We then have end(inv(i)) = end(suc(suc(i))), because both expressions
denote the number of the point in which the directed edge nr. i begins; the
left-hand expression is to be preferred, not so much because it is simpler,
but because it is independent of the assumption that all faces are triangles.
To find for a given point, say nr. k, the set of edges ending in it is very
awkward, and therefore must be avoided. Rather than storing "k", we must
store "ek" such that end(ek) = k. Then, for instance,
ek:= inv(suc(ek))
will switch ek to the next edge ending in k; by repeated application of that
transformation we shall be able to rotate ek along all edges ending in point
nr. k (and thus we have access to all faces with point nr. k on their boundary).
Note. As inv(inv(i)) = i and I expect to be rather free in assigning num-
bers to edges, we can probably assign numbers ≠ 0 and introduce the
convention inv(i) = -i; in that case we do not need to store the function
inv at all. (End of note.)
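
To make these conventions concrete, here is a small Python model (an editorial illustration): suc and end describe the degenerate "triangle with two faces" on the points 1, 2 and 3 that will serve as initialization further on, and inv(i) = -i is realized by plain negation:

    suc = {1: 2, 2: 3, 3: 1, -1: -3, -2: -1, -3: -2}
    end = {1: 3, 2: 1, 3: 2, -1: 2, -2: 3, -3: 1}

    def begins(i):                   # end(inv(i)): the point where edge i begins
        return end[-i]

    def edges_ending_in(k, ek):      # rotate ek via ek := inv(suc(ek))
        assert end[ek] == k
        result, e = [], ek
        while True:
            result.append(e)
            e = -suc[e]
            if e == ek:
                return result

    for i in (1, 2, 3):
        assert suc[suc[suc[i]]] == i     # all faces are triangles
    print(edges_ending_in(3, 1))         # [1, -2]: the edges ending in point 3
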
Our task is to try to separate with respect to the new point the current
hull into two caps. I take the position that establishing at which side of a
face the new point lies will be the time-consuming operation and would like
to confront the new point at most once with each face (and possibly with
some faces not at all).
To use a metaphor, we can regard the new point as a lamp that illumi-
nates all faces on which its rays fall from outside. Let us call them the "light"
faces and all the other ones the "dark" faces. The light faces have to be
removed; only if the point lies inside the current hull, does it leave all faces
dark.
After the colouring of the first face I only expect to colour new faces
adjoining a coloured one. The two main questions that I find myself ponder-
ing now are:
1. Do we insist that, as long as only faces of the same colour have been
found, the next face to investigate should form a cap with the already
coloured ones, or do we admit "holes"? The reason is that the test
whether a new face, when added, will give rise to a hole does not seem
too attractive.

2. As soon as two colours have been used, we know that the new point
lies outside the current hull and we also know an edge between two
differently coloured faces. Should we change strategy and from then on
try to trace the boundary between dark and light faces, rather than
colour all faces of our initial colour?

It took me a very long time to find a suitable answer to these questions.
Because we have taken the position that we shall confront each face at most
once with a new point, we have in general three kinds of faces: "dark",
"light", and "undecided yet". And a certain amount of this information has
to be recorded in order to avoid reconfrontation of the same face with the
new point. But faces have no names! A face is only known as "the face along
edge i", i.e. with edge i on its clockwise boundary, and for each face there
are three such edges. So it seems more advantageous to carry out the admin-
istration not so much in terms of faces, but in terms of edges, the more so
because in the first phase we are searching for an edge between a light and a
dark face.
For a new point to be confronted with the faces we introduce for the
purpose of this discussion two constants, called "right" and "wrong":
"right" is the colour (dark or light respectively) of the (arbitrarily chosen)
first face that has been confronted with the new point, "wrong" (light or
dark respectively) is the other colour.
Let K be the set of edges i such that
1. the face along edge i is right (and this has been established);
2. the face along edge -i has not been inspected yet.
The initialization of K consists then of the three edges of the clockwise bound-
ary of the first face confronted. As long as the set K is not empty, we select
an arbitrary edge from it, say edge x, and the face along edge -x is con-
fronted (for the first time, because edge x satisfied criterion (2)) with the new
point. If the new face is wrong, the first edge between two differently coloured
faces has been found and we enter the second phase, the discussion of which
is postponed. If the new face, however, is also right we consider the edges
y for y = -x, y = suc(-x) and y = suc(suc(-x)), i.e. the edges of the
clockwise boundary of the new right face, and for each of the three values
we perform

    if edge -y is in K → remove edge -y from K
    □ edge -y is not in K → add edge y to K
    fi
For, in the first case the edge -y no longer satisfies criterion (2), and in the
second case edge y (which cannot be in K already because it did not satisfy
criterion (1)) now satisfies criterion (1) and also criterion (2) (for if the face
along -y had been inspected earlier, edge -y would still have been in K).

In the meantime we have, at least tentatively, made up our mind about
questions 1 and 2: we do not insist that the established right faces always form
a cap and we do intend to change strategy as soon as the first edge of the
boundary between the caps has been found. (These decisions were made
during a period that I did not write; it included an evening walk through the
village. My first remark to myself was that tracing the boundary between the
two caps as soon as one of its edges has been found reduces a two-dimensional
search to a one-dimensional one and is, therefore, not to be discarded lightly.
Then I realized that as soon as it has been decided to adopt that strategy,
there does not seem to be much of a point anymore in sticking during the
first phase to established right faces always forming a cap. Then for some time
I tried to figure out how to carry out the first phase in terms of faces being
right or uninspected, until I remembered what I had temporarily forgotten,
viz. that I had already decided to express the structure in terms of edges.
Confident that now it could be worked out I returned to my desk and wrote
the above.)
Now we must tackle the second phase. Let x be an edge of the clockwise
boundary of the right cap. How do we find, for instance, the next edge of
that clockwise boundary? We have seen how to scan the faces having a
point in common; the only question is whether the knowledge of set K is
sufficient to stick to our principle that no face should be confronted with the
new point more than once.
Well, that seems more than we can hope for, because in the meantime
we have found a wrong face, and on the one hand we cannot expect to deal
definitely with it, i.e. with all its adjoining faces, and on the other hand we
would not like to confront the wrong face more than once with the new
point either. The simplest suggestion seems to be to maintain a similar adminis-
tration for edges with respect to wrong faces, viz. let H be the set of edges i
such that

1. the face along edge i has been established wrong;


2. the face along edge -i has not been inspected yet.

Can we come away with that? We observe that the intersection of K and H
is empty, because no edge i can be a member of both K and H.
Can we maintain both sets K and H if the new face is right, and also if
the new face is wrong? Let x be an edge of set K and let us consider again
the three edges y for y = -x, y = suc(-x), and y = suc(suc(-x)). In the
following, B is the set of discovered edges of the clockwise boundary of the
right cap.
If the new face is right, its three edges y have each to be processed by:

    if edge -y is in K → remove edge -y from K
    □ edge -y is in H → remove edge -y from H and
                        add edge y to B
    □ edge -y is not in (H + K) → add edge y to K
    fi

If the new face is wrong, its three edges y have each to be processed by:

    if edge -y is in K → remove edge -y from K and
                         add edge -y to B
    □ edge -y is in H → remove edge -y from H
    □ edge -y is not in (H + K) → add edge y to H
    fi
Looking at the above, I have one of the many surprises of my life! When
H is still empty and the new face is right, the adjustment reduces -and that
is not very surprising- to the adjustment of the first phase. But nowhere
have we used that x was a member of K: if x had been a member of
H, the algorithm would have worked equally well. We only insist that x
should be a member of the union H + K in order to avoid reconfrontation
and thus to ensure termination! As far as logical considerations are con-
cerned, we can initialize K with the edges of the clockwise boundary of the
first face and initialize H and B empty. Then we can select as long as possible
an arbitrary edge x from the union H + K and subject the clockwise edges
y of the face along the edge -x to the above treatment. This process ends
when H + K is empty; the new point is already inside the current hull if
and only if finally B is still empty!
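
The process just described can be captured in a short Python sketch (an editorial addition in which the geometry is mocked out: "colour" is assumed to assign +1 (light) or -1 (dark) to the face along each directed edge, suc is as before and inv(i) = -i):

    def find_boundary(first_edge, suc, colour):
        K, H = set(), set()      # light / dark faces whose other side is uninspected
        B = set()                # discovered boundary edges, light side
        def inspect(x):          # confront the face along edge x once
            for y in (x, suc[x], suc[suc[x]]):
                same, other = (K, H) if colour[x] > 0 else (H, K)
                if -y in same:
                    same.remove(-y)
                elif -y in other:
                    other.remove(-y)
                    B.add(y if colour[x] > 0 else -y)
                else:
                    same.add(y)
        inspect(first_edge)
        while K | H:
            x = (K | H).pop()    # any member of the union H + K will do
            inspect(-x)
        return B                 # empty if and only if the new point lies inside

    # the two-faced triangle again: face (1, 2, 3) light, face (-1, -2, -3) dark
    suc = {1: 2, 2: 3, 3: 1, -1: -3, -2: -1, -3: -2}
    colour = {1: 1, 2: 1, 3: 1, -1: -1, -2: -1, -3: -1}
    print(sorted(find_boundary(1, suc, colour)))     # [1, 2, 3]
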
Before proceeding, a few retrospective remarks seem in order. I have
introduced the restriction to confront each face only once with the new
point for reasons of efficiency, after having made the logically irrelevant
assumption that that would be the time-consuming operation. I should not
have worried about efficiency so early in the game; I should have focussed
my attention more on the logical requirement of guaranteed termination. If
I had done so, I would never have worried about "caps" -a herring that is
becoming redder and redder- and I would not have considered the two
phases as logically different! (My long hesitation as to how to answer ques-
tions (1) and (2) should have made me more suspicious!) For what remains?
It boils down to the observation that as soon as the edges in a nonempty B
form a cycle, we can stop, even if H + K is not empty yet; and knowing that,
we can try to exploit our freedom in choosing x from H + K so as to find all
edges of that cycle as quickly as possible. In our effort to keep things as sim-
ple as possible we feel now entitled to postpone the conscious tracing of the
boundary between the two caps as an optimization that can be expected to
be easily plugged in at a later stage.

The introduction of the concepts right/wrong also seems to have been a
mistake, seeing how we can select an edge x from the union H + K. (It was
not only a mistake, it was a stupid mistake. I hesitated very long before I
did so and then I had unusual difficulties in finding suitable names for them
and in phrasing their definitions, but I ignored this warning, fooling myself
into believing that I had done something "smart".) We shall drop these con-
cepts and carry our administration out in the original concepts light/dark.
With this, we must be able to accomplish three simplifications:

1. We do not need to treat the first face inspected separately; its three edges
y can be treated as those of any other face.
2. The two alternative constructs for dealing with the three edges of a
newly inspected face (for a right face and for a wrong face respectively)
as given above can be mapped on a single alternative construct.
3. We don't need to invert the edges of B if it was the clockwise boundary
of the dark cap.

The further details will be postponed until we have explored the next
operation: adjusting the convex hull so as to include the new point as well.

Having found a nonempty boundary we must remove the inner edges of
the light cap, if any, and add the edges connecting the new point to the
points on the cap boundary. As the number of edges to be removed is totally
unrelated to the number of edges to be added, we propose to deal with these
two tasks separately. In the unrefined search for the boundary, in the course
of which all faces have been inspected, simultaneous removal of all inner
edges of the light cap could easily have been incorporated. Because I would
like to keep the option for optimization of the search for the boundary open,
we have to solve the problem of finding the set of inner edges of the white
cap when its nonempty clockwise boundary B is given, regardless of how
that boundary has been found.
Because the problem of finding the set of inner edges of the light cap is a
(not so special) instance of a more general problem -of which the exercise
attributed to R.W. Hamming and the traversal of a binary tree are more
special instances- we shall tackle the general problem (which is really the
problem of finding the transitive closure) first.
Given a (finite) set of elements. For each element x zero or more other
elements are given as "the consequences of x". For any set B, a set S(B) is
given by the following three axioms:

Axiom 1. Each element of B belongs to S(B).


Axiom 2. If x belongs to S(B), then the consequences of x, if any,
belong to S(B) as well.
Axiom 3. The set S(B) contains only those elements that belong
to it on account of Axioms 1 and 2.

For given B it is required to establish the relation

    R: V = S(B)
The idea of the algorithm is quite simple: initialize with V:= B, and as long
as V contains an element x whose consequences do not all belong to V,
extend V with those consequences and stop when no such element can be
found anymore. A refinement of this algorithm can be found by observing
that once for a given element y it has been established that all its consequences
are in Vas well, that relation will continue to hold, because the only modi-
fication Vis subjected to is the extension with new elements. The refinement
consists of keeping track of the subset of such elements y, because their
consequences need not to be taken into consideration anymore. The set V
is therefore split into two parts, which we may call C and V::::: C. Here
V::::: C contains all elements of which it has been established that their
consequences are already all in V, while C (possible "causes") contains all
remaining elements which may have a consequence outside V. The algorithm
deals with one element of Cat a time. Let c be an element of C; it is removed
from C (and, therefore, added to V::::: C) and all its consequences outside
V are added to V and C. The algorithm stops when C is empty. Termination
is guaranteed for finite S(B) because, although the number of elements of
C may increase, the number of elements of V::::: C increases by one at each
step, and this number is bounded from above by S(B). (The number of steps
is independent of the choice of the element c from C!)
The above informal description of the intended computational process
may satisfy some of my readers -it satisfied me for more than fifteen years.
I hope that in the meantime it will dissatisfy some of them as well. For
without stating the invariance relation and without stating clearly the nature
of the inductive argument, the best we can reach is: "Well, I don't see how
the above computational process could fail to accomplish the desired result."
The argument should express clearly what is meant by the intuitive
remark that there is no point in investigating the consequences of a given
element more than once. For this purpose we introduce the functions S_C(B)
which are defined by the same axioms as S(B) but for another set of conse-
quences, viz.

    if x belongs to C, x has no consequences,
    if x does not belong to C, x has the consequences as originally given.

Because S_∅(B) = S(B), it is suggested to choose as invariant relations

    P1: V = S_C(B) and P2: C is a subset of V,

relations which are easily established by the assignment V, C := B, B.

The crucial properties of the function S_C(B) are the following:

1. S_C(B) is a monotonic function of C; that is, if C2 is a subset of C1, then
S_{C1}(B) is a subset of S_{C2}(B).
2. S_C(B) is not changed if C is extended with elements not belonging
to S_C(B).

Let c be an arbitrary element of C; extension of C with the consequences
of c that do not belong to V leaves the relation P1 undisturbed on account of
the second property. Subsequent removal of c from C means that, on account
of the first property, all elements already in V remain in V, which (main-
taining P1 and restoring P2) is adjusted by

    V := V + {the set of consequences of c}

(where "+" is used to denote set union). The forming of the set union boils
down to extension of V with the consequences of c not belonging to V, i.e.
the elements that just have been added to C; but as these new elements "have
no consequences", this completes the adjustment.
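
Rendered in Python (a sketch; the consequences are assumed given as a dictionary from element to list of elements), the algorithm with its invariants P1 and P2 becomes:

    def closure(B, consequences):
        V = set(B)               # P1: V = S_C(B)
        C = set(B)               # P2: C is a subset of V
        while C:
            c = C.pop()          # any element of C will do
            new = set(consequences.get(c, ())) - V
            V |= new             # extend V with the consequences of c ...
            C |= new             # ... which become potential "causes" in turn
        return V

    print(closure({1}, {1: [2, 3], 3: [4]}))     # {1, 2, 3, 4}
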
Remark. The algorithm for the transitive closure is of general interest,
not in the last place because it shows how the usual recursive solution for
traversing a binary tree is a special case of this one. The role of C is then
taken over by the stack and the test whether a consequence of c does already
belong to V can be omitted because absence of loops in the tree guarantees
that each node will be encountered only once. This observation casts some
doubts upon opinions held high by some members of the Artificial Intel-
ligentsia. The one is that recursive solutions, thought of as in some sense
"more basic", come more naturally than repetitive ones, an opinion which has
given rise to the development of (semi-)automatic program rewriting systems
that translate recursive solutions into -presumably more efficient- repeti-
tive ones. The other opinion is that in the case of searching through a finite
portion of a potentially infinite set the linguistic tool of recursive co-routines
is needed. The point is that from a nonempty C any element may be chosen
to act as "c" and any operating system designer knows that a last-in-first-out
strategy engenders the danger of individual starvation; any operating system
designer also knows how to exorcize this danger. It is not excluded that later
generations will pronounce as their judgment that in the last fifteen years,
recursive solutions have been placed upon too high a pedestal. If this judg-
ment is correct, the phenomenon itself will be explained by the fact that up
to some years ago we had no adequate mathematical tool for dealing with
repetition and assignment to variables. (End of remark.)
In our example we can take for B (as used in the above general treatment)
the B as used in our treatment of the convex hull, i.e. the edges of the clock-
wise boundary of the light cap. As "consequences of edge i" we can take

1. none, if suc(i) belongs to B;
2. suc(i) and -suc(i) if suc(i) does not belong to B.

The final value of V is then B + {the set of inner edges of the white cap}.
Removing the inner edges of the white cap presents us with a minor
clerical problem. Because we want to tabulate the functions suc(i) and end(i),
we would like to number the edges of the convex hull -zero excluded- with
consecutive numbers ranging from -n through +n; and of course we can
begin to do so (extending the arrays at both ends as n grows), but as soon as
edges are removed, we get "holes" in our numbering system. Two ways out
present themselves: either we renumber the remaining edges of the convex
hull so as to use again consecutive numbers, or we keep track of the "holes"
and use them -if present- as names for edges to be added. Renumbering
the remaining edges is, in general, not a very attractive solution, because it
means the updating of all references to the edge being renumbered. Although
in this case the updating of the administration seems quite feasible, I prefer
to use the more general technique of keeping track of the "holes", i.e. the
unused names in the range from -n through +n.
The standard technique is to use holes in last-in-first-out fashion, and to
exploit one of the function values, e.g. suc, for keeping track of the stack of
holes. More precisely:

    i = 0 is permanently the oldest hole
    if i (≠ 0) is a hole, then suc(i) is the next older hole.
Finally we introduce one integer variable, yh say (short for "youngest hole").
Removal of edge i, i.e. making a hole of it, is performed by

    suc:(i)= yh; yh:= i;

if there are no holes, except zero, we have yh = 0; otherwise we have yh ≠ 0
and assigning to i the value of the youngest hole is then done by

    i:= yh; yh:= suc(i)

In order not to duplicate the administration we shall associate with the
single hole i the edge pair {+i, -i}.
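
In Python the hole administration amounts to no more than the following (a sketch in which a dictionary stands in for the array suc and only the bookkeeping of the holes is shown):

    suc = {0: 0}                 # hole 0 is permanently the oldest hole
    yh = 0                       # the youngest hole

    def remove_edge(i):          # turn the edge pair {+i, -i} into a hole
        global yh
        suc[i] = yh              # suc(i) := the next older hole
        yh = i

    def reuse_hole():            # deliver the youngest hole as a fresh edge name
        global yh
        assert yh != 0           # yh = 0 means: no holes except zero
        i, yh = yh, suc[yh]
        return i

    remove_edge(5); remove_edge(7)
    print(reuse_hole(), reuse_hole())    # 7 5: last-in-first-out
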
We seem to approach the moment of coding, and of fixing the ways in
which the argument is given and the answer is to be delivered. Given is a
cloud of N ≥ 4 points, by means of three array constants x, y, and z, such
that

    x.lob = y.lob = z.lob = 1
    x.hib = y.hib = z.hib = N

and x(i), y(i), and z(i) are for 1 ≤ i ≤ N the x-, y- and z-coordinates respec-
tively of point nr. i.

We propose to deliver the convex hull by means of two arrays, "suc"
and "end" and a single integer "start" such that

    suc.lob = end.lob = -suc.hib = -end.hib

and

(a) start is the number of a directed edge of the convex hull.
(b) If i is the number of a directed edge of the convex hull:
    (b1) -i is the number of the directed edge in the opposite direction;
    (b2) end(i) is the number of the point towards which the directed edge
         i points;
    (b3) suc(i) is the number of the "next" directed edge of the clockwise
         boundary of the face "along" edge i (for the definitions of "next"
         and "along" in this context, see far above).

The admission of holes implies that there may be (even will be) values of i
such that finally 0 < abs(i) ≤ suc.hib, while i is not the number of an edge
of the final answer; the introduction of the variable "start" allows us not to
make any commitments regarding the value of suc(i) and end(i) for such a
value of i. Symbolically, we can now describe the function of the inner block
to be designed by

    "(suc, end, start) vir hull := convex hull of (x, y, z)".
As far as our external commitments are concerned, we could restrict
ourselves to the introduction of a single private variable, "np" say, and the
invariant relation

    P1: (suc, end, start) = convex hull of the first np points of (x, y, z)

and a block

"(suc, end, start) vir hull := convex hull of (x, y, z)":
    begin glocon x, y, z; virvar suc, end, start; privar np;
        "initialize P1 for small value of np";
        do np ≠ x.hib → "increase np by 1 under invariance of P1" od
    end

but this is not sufficient for two reasons. Firstly, inside the repeatable state-
ment we want to reuse holes, and therefore we should introduce the variable
yh for the youngest hole. Secondly, the increase of np implies scanning the
(faces along) edges of the convex hull. We could introduce in the repeatable
statement an array that each time could be initialized with a special value for
each edge, meaning "not yet confronted with the new point". Logically, this
would be perfectly correct, but it seems more efficient to bring that array
outside the repeatable statement and to see to it that, as the scanning pro-
ceeds, it ends up with the same neutral values with which it started. We shall
call this array "set" and choose the neutral value = 0. Summing up, we intro-
duce the invariant relation

    P2: (suc, end, start) = convex hull of the first np points of (x, y, z)
        and yh = youngest hole
        and (i is a hole ≠ 0) ⇒ suc(i) is the next older hole
        and (i is an edge of (suc, end, start)) ⇒ set(i) = 0

and the version from which I propose to proceed is

"(sue, end, start) vir hull : = convex hull of (x, y, z)":


begin glocon x, y, z; virvar sue, end, start; privar np, yh, set;
"initialize P2 for small value of np";
do np =F x.hib --> "increase np by 1 under invariance of P2" od
end

The initialization is very simple as soon as we realize that it suffices to do
so for np = 3, just a triangle with two faces:

"initialize P2 for small value of np":
    begin virvar suc, end, start, np, yh, set;
        suc vir int array := (-3, -2, -1, -3, 0, 2, 3, 1);
        end vir int array := (-3, 1, 3, 2, 0, 3, 1, 2);
        start vir int := 1; np vir int := 3; yh vir int := 0;
        set vir int array := (-3, 0, 0, 0, 0, 0, 0, 0)
    end

The refinement of "increase np by 1 under invariance of P2" -the only
thing left to be done- is more complicated. The increase of np and the choice
of point np as the new point (npx, npy, and npz will be introduced as its
coordinates) is easy. The next steps are to determine the boundary and, if
any, to adjust the hull. In the first step we shall use values set(i) = ±1 to
indicate "half-inspected" edges, i.e.
set(i) = +1 means: the face along edge i has been established light, the
face along edge -i has not yet been confronted with
the current new point
set(i) = -1 means: the face along edge i has been established dark, the
face along edge -i has not yet been confronted with
the current new point
set(i) = +2 means: the edge i has been established to belong to the
clockwise boundary of the light cap.
For edges of the convex hull not belonging to any of the above categories,
we shall have set(i) = 0 during the search for the boundary. When this search
has been completed, all edges i that have had set(i) = ±1 will have set(i)
reset to 0 or to 2.
Besides recording the boundary as "the edges with set(i) = 2", it is
helpful to have a list of these edge numbers in cyclic order, because that
comes in handy when the edges to and from the new point have to be added.
Because the optimization that switches to a linear search as soon as the
first edge of the boundary has been found finds the edges in cyclic order,
our version will produce that list as well. We propose to record the numbers
of the edges of the boundary in cyclic order in an array, "b" say; b.dom = 0
can then be taken as an indication that no boundary has been found. Our
coarsest design becomes:

"increase np by 1 under invariance of P2" :


begin glocon x, y, z; glovar sue, end, start, np, yh, set;
pricon npx, npy, npz; privar b;
np:= np + 1;
npx vir int, npy vir int, npz vir int := x(np), y(np), z(np);
b vir int array := (O);
"establish boundary in set and b";
if b.dom = 0 --> skip
a b.dom > 0 --> "adjust the hull"
fi
end

To establish the boundary in "set" and "b" would require two steps: in
the first step all faces are confronted with the new point and the boundary
is established in "set", and in the second step we would have to trace the
boundary and place its edges in cyclic order in the list "b". Although it was
my original intention to do so, and to leave the transition to the more linear
search as soon as the first edge of the boundary has been found as an exercise
in optimization to my readers, I now change my mind, because I have dis-
covered a hitherto unsuspected problem: the first inspection of a face that
reveals an edge of the boundary may reveal more than one boundary edge.
If the faces are triangles, they will be adjacent boundary edges, but the
absence of faces with more than three edges is only due to the restriction
that we would not have four points in a plane. I did not intend to allow this
restriction to play a very central role and as a result I refuse to exploit this
more or less accidental adjacency of the boundary edges that may be revealed
at the inspection of a new face. And the potential "simultaneous" discovery
of nonadjacent boundary edges was something I had not foreseen; adjacency
plays a role if we want to discover boundary edges in cyclic order, i.e. place
their edge numbers in cyclic order in array b (with "b:hiext"). The moral of
the story seems to be to separate the "discovery" of a boundary edge -while
scanning the edges of the newly inspected face- from the building up of the
value of "b". Because the discovered boundary edges have to be separated
into those "processed", i.e. stored as a function value of b, and those still
unprocessed, some more information needs to be stored in the array "set".
I propose:
set(i) = 1 and set(-i) = 0     the face along edge i has been established
                               light, the face along edge -i has not yet
                               been confronted with the current new point
set(i) = -1 and set(-i) = 0    the face along edge i has been established
                               dark, the face along edge -i has not yet
                               been confronted with the current new point
set(i) = 1 and set(-i) = -1    edge i is an unprocessed edge of the
                               clockwise boundary of the light cap
set(i) = 2 and set(-i) = 0     edge i is a processed edge of the clockwise
                               boundary of the light cap
set(i) = 0 and set(-i) = 0     the faces along i and -i are both uninspected
                               or have been established to have the same
                               colour.
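
The five cases are perhaps most quickly checked against a restatement in present-day notation; the following Python predicates are ours, introduced for illustration only (the dictionary set_ stands for the array "set", the names occur nowhere in the program):

   # set_ maps each (signed) edge number to its current code
   def half_inspected(set_, i):
       # the face along i is known (light or dark), that along -i is not
       return abs(set_[i]) == 1 and set_[-i] == 0

   def unprocessed_boundary(set_, i):
       # light on one side, dark on the other: i is a boundary edge not yet in b
       return set_[i] == 1 and set_[-i] == -1

   def processed_boundary(set_, i):
       # i has already been appended to b
       return set_[i] == 2 and set_[-i] == 0

   def uninvolved(set_, i):
       # both faces uninspected, or established to have the same colour
       return set_[i] == 0 and set_[-i] == 0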
We stick to our original principle only to inspect (after the first time) a
face along (the inverse of) a half-inspected edge, say along -xx (i.e. set(xx) =
±1 and set(-xx) = 0); we can then use the value xx = 0 to indicate that
no more inspection is necessary. The relation "b.dom = 0" can be used to
indicate that up till now no boundary edges have been discovered; the first
boundary edge to be discovered will just be placed in b (thereby causing
b.dom = 1), the remaining boundary edges will then be placed by careful
extension of b. A first sketch (at this level still too rough) is

"establish boundary in set and b":


begin glocon x, y, z, suc, end, start, npx, npy, npz;
glovar set, b; privar xx; xx vir int := start;
do xx ≠ 0 -->
"inspect face along -xx";
if b.dom > 0 --> "extend b and refresh xx"
□ b.dom = 0 --> "reassign xx"
fi
od
end

(The different names "refresh xx" and "reassign xx" have been used in order
to indicate that rather different strategies are involved.)
I have called this sketch too rough: "reassign xx" has to assign to xx
the number of a half-inspected edge (if it can find one, otherwise zero). It
would be very awkward indeed if this implied a search over the current hull,
which would imply again an administration to prevent the algorithm from
visiting the same edge twice during that search! Therefore we introduce an
array c (short for "candidates") and will guarantee that
if i is the number of a half-inspected edge, then either i
occurs among the function values of c, or i = xx.
(Note that function values of c -which will always be edge numbers- may
also equal the number of an edge that, in the meantime, has been totally
inspected.) In view of the fact that zero is the last xx-value to be produced,
it will turn out to be handy to store the value zero at the low end of c upon
initialization (it is as if we introduce a "virtual edge" with number zero;
this is a standard coding trick). Our new version becomes:

"establish boundary in set and b":


begin glocon x, y, z, suc, end, start, npx, npy, npz;
glovar set, b; privar xx, c;
xx vir int := start; c vir int array := (0, 0);
do xx ≠ 0 --> "inspect face along -xx";
if b.dom > 0 --> "extend b and refresh xx"
□ b.dom = 0 --> "reassign xx"
fi
od
end

Because for "reassign xx" initially edge xx is not half-inspected (for the
face along -xx has just been inspected!) and
abs(set(xx)) = 1 and set(-xx) = 0
is the condition for being half-inspected, the last subalgorithm is coded
quite easily:

"reassign xx" :
do xx ≠ 0 and non (abs(set(xx)) = 1 and set(-xx) = 0) -->
xx, c: hipop
od

The algorithm for "extend b and refresh xx" is more complicated. If we
focus our attention on the search for the edge with which we would like to
extend b, we realize that we are looking for an edge, xx say, such that
P1: it begins where edge b.high ends, i.e. end(b.high) = end(-xx)
P2: the face along xx has been established to be light
P3: the face along -xx has been established to be dark.
Because xx:= suc(b.high) establishes the first two, we take P1 and P2 as
invariant relation. We can now distinguish four -and the largeness of four
explains why this algorithm is more complicated- different cases.
1. set(xx) = 1 and set(-xx) = -1: in that case, xx is the next edge of the
clockwise boundary of the light cap to be processed; it can be processed
and after that xx:= suc(xx) then re-establishes P1 and P2.
2. set(xx) = 0 and set(-xx) = 0: in that case, because the face along edge
xx has been established light, also the face along -xx has been estab-
lished light; therefore xx:= suc(-xx) gives a new xx satisfying P1
and P2.
Operation (1) can only happen a finite number of times because there
are only a finite number of edges of the clockwise boundary of the light
cap to be processed. Operation (2) can only be repeated a finite number of
times, because end(-xx) is a vertex of a face that has been established to
be dark. When none of the two can take place, we have one of the following
two cases:
3. set(xx) = 1 and set(-xx) = 0: in this case the face along -xx has not
been inspected yet and xx has an acceptable value for controlling the
next face inspection.
4. set(xx) = 2 and set(-xx) = 0: in this case xx is a processed edge of the
boundary and the loop must have been closed: we must have xx = b.low,
and the search is completed, i.e. xx = 0 is the only acceptable final value
for xx. We must, however, see to it that no edges i with set(i) = ±1
remain.
In the following program, for safety's sake guards have been made stronger
than strictly necessary.
"extend b and refresh xx":
xx:= suc(b.high);
do set(xx) = 1 and set(-xx) = -1 -->
set: (xx)= 2; set: (-xx)= 0; b: hiext(xx); xx:= suc(xx)
□ set(xx) = 0 and set(-xx) = 0 --> xx:= suc(-xx)
od;
if set(xx) = 1 and set(-xx) = 0 --> skip
□ set(xx) = 2 and set(-xx) = 0 and xx = b.low -->
xx, c: hipop;
do xx ≠ 0 -->
do abs(set(xx)) = 1 --> set: (xx)= 0 od;
xx, c: hipop
od
fi
Under the assumption that "compute lumen" will initialize lumen = +1
if the face along -xx is light and = -1 if it is dark, the coding of "inspect
face along -xx" is now fairly straightforward.
"inspect face along -xx":
begin glocon x, y, z, suc, end, npx, npy, npz, xx;
glovar set, b, c; pricon lumen; privar yy, round;
"compute lumen";
yy vir int := -xx; round vir bool := false;
do non round -->
if set(-yy) = 0 --> set: (yy)= lumen; c: hiext(yy)
□ set(-yy) = lumen --> set: (-yy)= 0
□ set(-yy) = -lumen -->
set: (yy)= lumen;
do b.dom = 0 --> b: hiext(lumen * yy) od
fi;
yy:= suc(yy); round:= (yy = -xx)
od
end
The programming of the computation of "lumen" is not difficult, but
it is tedious: it boils down to the computation of the signed volume of
the tetrahedron with the vertices np, end(-xx), end(suc(-xx)) and
end(suc(suc(-xx))). In passing we choose a sense of rotation for our clock.
"compute lumen":
begin glocon x, y, z, suc, end, npx, npy, npz, xx;
vircon lumen; privar pt;
pricon vol, x1, y1, z1, x2, y2, z2, x3, y3, z3;
pt vir int := end(-xx);
x1 vir int, y1 vir int, z1 vir int :=
x(pt) - npx, y(pt) - npy, z(pt) - npz;
pt:= end(suc(-xx));
x2 vir int, y2 vir int, z2 vir int :=
x(pt) - npx, y(pt) - npy, z(pt) - npz;
pt:= end(suc(suc(-xx)));
x3 vir int, y3 vir int, z3 vir int :=
x(pt) - npx, y(pt) - npy, z(pt) - npz;
vol vir int := x1 * (y2 * z3 - y3 * z2) +
x2 * (y3 * z1 - y1 * z3) +
x3 * (y1 * z2 - y2 * z1);
if vol > 0 --> lumen vir int := -1
□ vol < 0 --> lumen vir int := +1
fi
end
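
For the reader who prefers to see the arithmetic in isolation, a sketch of ours in Python of the same determinant follows; the function name and the coordinate-triple representation of the points are invented here for illustration only:

   def lumen(np_, p1, p2, p3):
       # p1, p2, p3 are the coordinate triples of end(-xx), end(suc(-xx)),
       # and end(suc(suc(-xx))); np_ is the coordinate triple of the new point
       x1, y1, z1 = (p1[i] - np_[i] for i in range(3))
       x2, y2, z2 = (p2[i] - np_[i] for i in range(3))
       x3, y3, z3 = (p3[i] - np_[i] for i in range(3))
       # six times the signed volume of the tetrahedron (np_, p1, p2, p3)
       vol = (x1 * (y2 * z3 - y3 * z2)
            + x2 * (y3 * z1 - y1 * z3)
            + x3 * (y1 * z2 - y2 * z1))
       # vol = 0 is excluded by the restriction that no four points lie in a plane
       return -1 if vol > 0 else +1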
We are left with the task of refining "adjust the hull", where the edges of
B, the clockwise boundary of the light cap, are given in two ways:
1. In cyclic order as the function value of the array b: this representation
facilitates tracing the boundary, doing something for all edges of B, etc.
2. set(i) = 2 holds if and only if edge i belongs to B; this facilitates answer-
ing the question "Does edge i belong to B?"
Because the number of edges that have to disappear (i.e. the inner edges
of the light cap) and the number of edges that have to be added (i.e. the
edges connecting the new point with the vertices on the clockwise boundary)
are totally unrelated, it seems best to separate the two activities:
"adjust the hull":
"removal of edges";
"addition of edges"
In the first one it does not seem too attractive to merge identification of
the inner edges with their removal: during the identification process the
light cap of the current hull has to be scanned, and I would rather not mess
with that structure by removing edges before the scanning has been carried
out completely. (This does not mean to imply that it could not be done, it
just says that I am currently no longer in the mood to investigate the possi-
bility.)
Because, on account of our "holes", inner edge i and inner edge - i have
to be removed simultaneously, it suffices for the removal to build up a list of
"undirected" edge numbers, i.e. the value i will be used to indicate that both
edge i and edge -i should disappear. Calling that list rm, we are led to

"removal of edges":
begin glocon b; glovar suc, yh, set; privar rm;
"initialize rm with list of inner edges";
"removal of edges listed in rm"
end

In accordance with our earlier conventions about holes, the second one
is now coded quite easily:
"removal of edges listed in rm" :
do rm.dom > 0 --> suc: (rm.high)= yh; yh, rm: hipop od
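
For a reader to whom this hole administration is new, a small Python sketch of ours of the same free-list manipulation may help; suc is taken here to be a mutable mapping from edge numbers to edge numbers, rm a list, and yh the number of the youngest hole (0 when there is none):

   def removal_of_edges_listed_in_rm(suc, yh, rm):
       # thread each removed (undirected) edge onto the list of holes;
       # suc doubles as the "next hole" link, 0 terminates the list
       while rm:               # rm.dom > 0
           h = rm.pop()        # yh, rm: hipop
           suc[h] = yh         # suc: (rm.high)= yh
           yh = h
       return yh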
The initialization of rm is, after our earlier analysis, no longer difficult.
The relation set(i) = set(-i) = 3 will be used to represent "i and -i have
been detected as inner edges" and enables us to test whether edge i belongs
to what we have called V (boundary + inner edges) by set(i) ≥ 2. Again we
use an array variable, called "c", to list the candidates.
"initialize rm with list of inner edges":


begin glocon b, suc; glovar set; virvar rm; privar c, k;
c vir int array := (0); k vir int := b.lob;
do c.dom ≠ b.dom --> c: hiext(b(k)); k:= k + 1 od;
rm vir int array := (0);
do c.dom > 0 -->
begin glocon suc; glovar set, rm, c; pricon h;
h vir int := suc(c.high); c: hirem;
if set(h) ≥ 2 --> skip
□ set(h) = 0 --> set: (h)= 3; set: (-h)= 3; rm: hiext(h);
c: hiext(h); c: hiext(-h)
fi
end
od
end

Finally, we have to refine "addition of edges". If we deal with one edge
of the boundary at a time, connecting its end to the new point, then we have
some difficulty with the first triangle: the number of the edge connecting its
begin with the new point has not yet been decided. To start with, we can
proceed as if this was the virtual edge, nr. zero, and initialize t (for "trail")
accordingly and close the loop at the end. Before the local constant e is
initialized with the number of the youngest hole, the algorithm checks
whether there is such a hole, otherwise one is made by extending the arrays
concerned at both ends. At the end of the algorithm it is assured that the
value of "start" again is the number of an edge of the convex hull. (This is
typically the kind of statement to forget; I had done so, as a matter of fact,
for originally it was my intention to include this resetting in "removal of
edges" because that is the action that may cause "start" to become equal to
the number of an edge that is no longer part of the convex hull. My attention
was drawn to that omission when I investigated the environment of "addition
of edges", which is the environment of "adjust the hull" and there I found
start among the global variables! Too lazy to make a change in the already
typed-out part of my manuscript, I am seduced to a typical program patch!)

"addition of edges":
begin glovar suc, end, start, yh, set; glocon np, b;
privar t, k;
t vir int, k vir int := 0, b.lob;
do k ≤ b.hib -->
begin glovar suc, end, yh, set, t, k; glocon np, b;
pricon e;
do yh = 0 --> suc: hiext(0); suc: loext(0);
end: hiext(0); end: loext(0);
set: hiext(0); set: loext(0);
yh:= suc.hib
od;
e vir int := yh; yh:= suc(e);
suc: (b(k))= e; set: (b(k))= 0;
suc: (e)= t; end: (e)= np; set: (e)= 0;
suc: (t)= b(k); end: (t)= end(-b(k)); set: (t)= 0;
t:= -e; k:= k + 1
end
od;
suc: (suc(b.low))= t;
suc: (t)= b.low; end: (t)= end(-b.low); set: (t)= 0;
suc: (0)= 0; end: (0)= 0; set: (0)= 0;
start:= b.low
end

The above completes the historical description of the first algorithm I
have made that should construct the convex hull in three dimensions. It is
only my first effort and the algorithm I arrived at does not seem to be a very
efficient one; its "quality" is certainly no justification for its publication. (To
try to develop more efficient algorithms is an exercise that I gladly leave to
my readers. I stop here, for, after all, this book is not a treatise on convex
hull algorithms.) This chapter is included for other reasons.
The first reason has been stated at the beginning of this chapter: to fore-
stall the criticism that I only deal with easy, little problems.
The second reason was that I wanted to show how such algorithms can
be arrived at by means of a process of step-wise refinement. To be quite
honest, I wanted to show more, viz. that the method of step-wise refinement
seems one of the most promising ways of developing programs of -ultimately
- considerable yet mastered complexity. I hope that with this example I
have succeeded in doing so. I think that this example deserves to be called
"of considerable yet mastered complexity" because the ultimate program
consists of thirteen named chunks of code, whose (what we might call) "sub-
stitution hierarchy" is five layers deep! It is displayed in the following indent-
ed table:

0) (suc, end, start) vir hull := convex hull of (x, y, z)
1)   initialize P2 for small value of np
1)   increase np by 1 under invariance of P2
2)     establish boundary in set and b
3)       inspect face along -xx
4)         compute lumen
3)       extend b and refresh xx
3)       reassign xx
2)     adjust the hull
3)       removal of edges
4)         initialize rm with list of inner edges
4)         removal of edges listed in rm
3)       addition of edges

The method of step-wise refinement was developed in a conscious effort
to do justice to the smallness of our human skulls. This implies that when we
have to design an algorithm that ultimately will be represented by a large
text, we just have to invent the concepts that will enable us to describe the
algorithm at such a level of abstraction that that description comes out
short enough to be intellectually manageable. And as far as that goal is
concerned, the method of step-wise refinement seems adequate. It is of
course no guarantee for an in all respects high-quality program!
The third reason to include this chapter was that I wanted to be as fair
and, thereby, as convincing as possible. The first example that I have used to
demonstrate the method of step-wise refinement was the development of a
program generating a table of the first thousand prime numbers. Although
this was in many ways a very nice and compact example, its convincing power
was somewhat impaired by everyone's knowledge that I already knew how
to generate the first thousand prime numbers before I started to write that
essay. Therefore I wrote this chapter, not having the foggiest notion how
it would end when I started it. Whenever a few pages were written, I typed
them out before proceeding.
Note 1. After the above had been typed out, we found by inspection
three errors in the program text: a semicolon was missing, in "removal
of edges listed in rm" I had written "suc: (yh)= rm.high", and in "initial-
ize rm with list of inner edges" the line "c: hiext(h); c: hiext(-h)" had
been dropped. It is worth noticing that only the most trivial one of these
could have been detected by a syntactical checker. It is also worth noticing
that the other two occurred in sections of which I had announced that
they were "now coded quite easily" and "no longer difficult" respectively!
(End of note 1.)
Note 2. It is far from excluded that the efficiency of my program can be
improved considerably, even without introducing strategies based upon
expectation values of numbers of points inside given volumes: if the point
lies already inside the convex hull, one does not need to confront it with
all faces; if it lies outside, there should be (on the average, bah!) cheaper
ways of finding a first boundary edge of the light cap. If, for instance,
the convex hull has more than five points, there exists at least one pair
whose shortest connection goes through the interior and we could try to
locate the intersection of the convex hull with the plane through the new
point and the two points of such a pair. Our searches would then be linear.
(End of note 2.)
Note 3. In the program as published above, Mark Bebie has found an
error. In order to maintain the convention
"set(i) = 2 and set(-i) = 0 edge i is a processed edge of the
clockwise boundary of the light cap"
the values of set(i) and set(-i) have to be adjusted when edge i is proc-
essed, i.e. b:hiext(i) takes place. In "extend b and refresh xx" this has
been done (in the third line), in "inspect face along -xx", however, it
has erroneously been omitted. Its tenth line
do b.dom = 0 --> b:hiext(lumen * yy) od
should therefore be replaced by
do b.dom = 0 --> b:hiext(lumen * yy);
set:(b.high)= 2; set:(-b.high)= 0
od
(End of note 3.)
25 FINDING THE MAXIMAL STRONG COMPONENTS IN A DIRECTED GRAPH

Given a directed graph, i.e. a set of vertices and a set of directed edges,
each leading from one vertex to another, it is requested to partition the ver-
tices into so-called "maximal strong components". A strong component is a
set of vertices such that the edges between them provide a directed path from
any vertex of the set to any vertex of the set and vice versa. A single vertex
is a special case of a strong component; then the path can be empty. A maxi-
mal strong component is a strong component to which no further vertices
can be added.
In order to establish this partitioning, we have to be able to make two
kinds of assertions: the assertion that vertices belong to the same strong com-
ponent, but also -because we have to find maximal strong components- the
assertion that vertices do not belong to the same strong component.
For the first type of assertion, we may use the following

THEOREM 1. Cyclically connected vertices belong to the same strong component.
Besides (directed) connections between individual vertices, we can talk
about directed connections between different strong components; we say that
there is a connection from a strong component A to a strong component B
if there exists a directed edge from a vertex of A to a vertex of B. Because
A and B are strong components, there is then a path from any vertex of A
to any vertex of B. And as a result, Theorem 1 can be generalized into

THEOREM 1A. Vertices of cyclically connected strong components belong to the same strong component.

COROLLARY 1. A nonempty graph has at least one maximal strong component without outgoing edges.
So much for theorems asserting that vertices belong to the same strong
component. Because for different points to be in the same strong component
there must be paths between them in both ways, assertions that vertices do
not belong to the same strong component can be made on account of:

THEOREM 2. If the vertices are subdivided into two sets svA and svB such
that there exist no edges originating in a vertex of svA and terminating in a
vertex of svB, then

firstly: the set of maximal strong components does not depend on the
presence or absence of edges originating in a vertex of svB and
terminating in a vertex of svA, and
secondly: no strong component comprises vertices from both sets.

From Theorem 2 it follows that as soon as a strong component without
outgoing edges has been found, we can take its vertices as set svA and con-
clude that this strong component is a maximal strong component and that
all ingoing edges of svA can further be ignored. We conclude:

THEOREM 2A. A strong component whose outgoing edges, if any, are all ingo-
ing edges of maximal strong components is itself a maximal strong compo-
nent.
Or, to put it in another way, once the first maximal strong component
without outgoing edges -the existence of which is guaranteed by Corollary
1- has been found (identified as such by being a strong component without
outgoing edges), the remaining maximal strong components can be found by
solving the problem for the graph consisting of the remaining vertices and
only the given edges between them. Or, to put it in still another way, the
maximal strong components of a graph can be ordered according to "age",
such that each maximal strong component has outgoing edges only to "older"
ones.
In order to be able to be a little bit more precise, we denote by
sv: the given set of vertices (a constant)
se: the given set of edges (a constant)
pv: a partitioning of the vertices of sv.
The final relation to be established can then be written as
R: pv = MSC(se)
in which for the fixed set sv the function MSC, i.e. the partitioning in Maximal
Strong Components, is regarded as a function of the set of edges se.
The corresponding invariant relation is suggested by the standard tech-
nique of replacing a constant by a variable, se1 say, whose value will always
be a subset of se:
P: pv = MSC(se1)
Relation P is easily initialized for empty se1, i.e. each vertex of sv is a maximal
strong component all by itself. Because se1 is bounded in size by se, mono-
tonically increasing se1 guarantees termination; if we can accomplish this
under invariance of P, relation R has been established by the time that
se1 = se. In our discussions it will be convenient also to have a name, se2
say, for the remaining edges, i.e. se = se1 + se2.
Our task is clearly to discover the most convenient order in which edges
are to be added to se1, where "convenience" is related to the ease with which
the invariance of relation P is maintained. This, as we know, can also be
phrased as: what is our general intermediate state, what types of pv-values
do we admit? In order to describe such a general intermediate state, it seems
practical to group the vertices of sv also in disjoint subsets (as we have done
for the edges sel and se2). After all, we are interested in partitioning vertices!
The general intermediate state should be a generalization of both initial
state and final state. At the beginning, it has not been established for any of
the vertices to which maximal strong component in MSC(se) they belong,
eventually it has been established for all vertices. Analogous to sel we can
introduce (initially empty and finally comprising all vertices) sv1, where
sv1 contains all vertices of sv, for which the maximal strong component in
MSC(se) to which they belong has been identified.
We intend to use Theorem 2A for deciding that a strong component is a
maximal one; that is after having established something about all its outgoing
edges. When we now identify:
se1 with the set of all processed edges, and
se2 with the set of all unprocessed edges, i.e. edges whose presence has not
yet been taken into account,
then we see that
P1: all outgoing edges of vertices in sv1 are in se1
It is, however, too crude to group all remaining vertices in a single set
sv2. The way in which sv1 is defined implies that, each time a new maximal
strong component of MSC(se) has been identified, all the vertices of that
maximal strong component have to be transferred together to sv1. Between
two such transfers, in general a number of edges have to be processed (i.e.
transferred from se2 to se1), and for the description of the intermediate states
that have to be taken into account with respect to "processing one edge at
a time", the remaining vertices have to be separated a little bit more subtly,
viz. into two disjoint subsets, sv2 and sv3 say (with sv = sv1 + sv2 + sv3),
where sv3 contains the totally unprocessed vertices,
P2: no edge in se1 begins or ends at a vertex in sv3
(sv3 is initially equal to sv and finally empty).
Transfer from sv3 to sv1 can then take place in two steps: from sv3 to
sv2 (one at a time) and from sv2 to sv1 (together with all other vertices from
the same definite maximal strong component).
In other words, among the vertices of sv2 we shall try to build up (by
enlarging se1) the next maximal strong component of MSC(se) to be trans-
ferred to sv1. The maximal strong components in MSC(se1) -note the argu-
ment!- are such that they comprise either vertices from sv1 only, or vertices
from sv2 only, or a (single) vertex from sv3. We propose a limitation on the
connections that the edges of se1 provide between the maximal strong com-
ponents in MSC(se1) that contain nodes from sv2 only: between those maxi-
mal strong components the edges of se1 shall provide no more and no less
than a single directed path, leading from the "oldest" to the "youngest" one.
We call these maximal strong components "the elements of the chain". This
choice is suggested by the following considerations.
Firstly, we are looking for a cyclic path that would allow us to apply
Theorem 1 or 1A in order to decide that different vertices belong to the same
maximal strong component. Under the assumption that we are free to pre-
scribe which edge will be the next one to be added to se1, there does not
seem to be much advantage in introducing disconnected maximal strong
components in MSC(se1) among those built up from vertices of sv2.
Secondly, the directed path from the "oldest" to the "youngest" com-
ponent in the chain -as "cycle in statu nascendi"- is easily maintained, as
is shown by the following analysis.
Suppose that se2 contains an edge that is outgoing from one of the ver-
tices of the youngest maximal strong component in the chain. Such an edge
"e" is then transferred from se2 to sel, and the state of affairs is easily main-
tained:

1. If e leads to a vertex from sv1, it can be ignored on account of Theorem 2.
2. If e leads to a vertex from sv2, the youngest element of the chain can be
combined with zero or more next older elements to form the new young-
est element of the chain. More precisely, if e leads to a vertex in the
youngest element, it can be ignored; if it leads to an older element in the
chain, a cycle between strong components has been detected and then
Theorem 1A tells us that a number of the younger elements of the chain
have to be combined into a single one, thus reducing the length of the
chain, measured in number of elements.
196 FINDING THE MAXIMAL STRONG COMPONENTS IN A DIRECTED GRAPH

3. If e leads to a vertex from sv3, that latter vertex is transferred to sv2 and
as new youngest element (a maximal strong component in MSC(sel) all
by itself) it is appended to the chain, whose length is increased by one.

If there exists no such edge "e", there are two possibilities. Either the
chain is nonempty, but then Theorem 2A tells us that this maximal strong
component of MSC(se1) is a maximal strong component of MSC(se) as well:
the youngest element is removed from the chain and its vertices are trans-
ferred from sv2 to sv1. Or the chain is empty: if sv3 is not empty, an arbitrary
element of sv3 can be transferred to sv2, otherwise the computation is
finished.
In the above degree of detail we can describe our algorithm as follows:

se1, se2, sv1, sv2, sv3 := empty, se, empty, empty, sv;
do sv3 ≠ empty --> {the chain is empty}
   transfer a vertex v from sv3 to sv2 and initialize the chain with {v};
   do sv2 ≠ empty --> {the chain is nonempty}
      do se2 contains an edge starting in a vertex of the youngest
            element of the chain -->
         transfer such an edge e from se2 to se1;
         if e leads to a vertex v in sv1 --> skip
         □ e leads to a vertex v in sv2 --> compaction
         □ e leads to a vertex v in sv3 --> extend chain and transfer
               v from sv3 to sv2
         fi
      od; {the chain is nonempty}
      remove youngest element and transfer its vertices from sv2 to sv1
   od {the chain is again empty}
od

Note 1. As soon as vertices are transferred from sv2 to sv1, their incom-
ing edges (if any) that are still in se2 could be transferred simultaneously
from se2 to se1, but the price for this "advanced" processing (the gain of
which is doubtful) is that we have to be able to select for a given vertex
the set of its incoming edges. As the algorithm is described, we only need
to find for each vertex its outgoing edges. Hence the above arrangement.
(End of note 1.)
Note 2. Termination of the innermost repetition is guaranteed by de-
crease of the number of edges in se2; termination of the next embracing
repetition is guaranteed by decrease of the number of vertices in sv2 +
sv3; termination of the outer repetition is guaranteed by decrease of the
number of vertices in sv3. The mixed reasoning, sometimes in terms of
edges and sometimes in terms of vertices, is a symptom of the nontriviality
of the algorithm we are developing. (End of note 2.)

To the degree of detail in which we have described our algorithm, each
edge is transferred once from se2 to se1 and each vertex is transferred once
from sv3 via sv2 to sv1: as such our algorithm implies an amount of work
linear in the number of edges and vertices. In our next refinement we should
try not to spoil that pleasant property, as we would do if, for instance, the
test whether v is in sv1, sv2, or sv3 (which occurs within the innermost
repetition!) implied a search with a computation time proportional to the
number of vertices. The restricted way in which our vertex sets are manipu-
lated, in particular the fact that the vertices enter and leave the chain in
last-in-first-out fashion, can be exploited for this purpose.
We consider our vertices consecutively numbered and tabulate the func-
tion "rank(v)", where v ranges over all vertex numbers; we assume NV to
denote the number of vertices:
rank(v) = 0 means: vertex nr. v is in sv3
rank(v) > 0 means: vertex nr. v is in sv1 + sv2
(The sets sv2 and sv1 are, to start with, combined: one of the possible forms
of compaction is a skip!)
If nvc equals the "number of vertices in the chain", i.e. the number of
vertices in sv2, then
1 ≤ rank(v) ≤ nvc means: vertex v is in sv2
rank(v) ≥ NV + 1 means: vertex v is in sv1
All vertices in sv2 will have different rank-values, and as far as rank and nvc
are concerned, transferring vertex v from sv3 to sv2 will be coded by
"nvc:= nvc + l; rank:(v) = nvc"
i.e. the vertices in the chain are "ranked" in the order of decreasing "age in
the chain". The latter convention allows us to represent how the vertices of
sv2 are partitioned in strong components quite efficiently: vertices belonging
to the same element of the chain have consecutive values of rank; and for the
elements themselves, the rank of their oldest vertex is a decreasing function
of the element age. Using cc(i) to denote the rank of the oldest vertex of the
ith oldest element of the chain (we have then cc.dom = the number of ele-
ments in the chain), as far as rank, nvc, and cc are concerned, we can code
the alternative construct (combining the first two alternatives) as follows:
if rank(v) > 0 --> do cc.high > rank(v) --> cc:hirem od
□ rank(v) = 0 --> nvc:= nvc + 1; rank:(v)= nvc; cc:hiext(nvc)
fi
In the meantime we have somewhat lost trace of the identity of the ver-
tices in the chain. If, for instance, we would like to transfer the vertices of
the youngest element of the chain from sv2 to sv1, our current tabulations
would force us to scan the function rank for all values of v, such as to find
those satisfying cc.high ≤ rank(v) ≤ nvc. We would not like to do that,
but thanks to the fact that at least for the vertices in sv2, all values of rank(v)
are different, we can also store the inverse function:
for 1 ≤ r ≤ nvc: rank(v) = r ⇔ knar(r) = v
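
In other words, on the vertices of the chain, rank and knar are each other's inverse and grow in step; a two-line Python sketch of ours (0-based lists, so knar[r - 1] holds the vertex of rank r) of the transfer of a vertex from sv3 to sv2:

   def push_vertex(v, rank, knar):
       knar.append(v)           # knar:hiext(v)
       rank[v] = len(knar)      # nvc:= nvc + 1; rank:(v)= nvc
       # invariant: for 1 <= r <= len(knar): rank[knar[r - 1]] == r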
So much for keeping track of the vertices; let us now turn our attention
to the edges. The most crucial question with regard to the edges is, of course,
the guard of the innermost repetitive construct: "se2 contains an edge starting
in a vertex of the youngest element of the chain". That guard is evaluated
easily with the aid of a list of edges from se2 outgoing from the vertices of
the youngest element of the chain. One of the ways in which the youngest in
the chain may change, however, is compaction; in order to maintain that list
we, therefore, also need the corresponding lists for the older elements of the
chain. Because for those edges we are interested only in the identity of their
"target vertex", we introduce as the next part of our chain administration
two further array variables -with domain = 0 when the chain is empty-
called "tv" (for "target vertices") and "tvb" (for "tv-bounds").
The domain of tvb will have one point for each element of the chain: its
value equals the number of outgoing edges of se2 from vertices of older ele-
ments in the chain (the domain of tvb is all the time equal to that of cc,
which also stores one value for each chain element). Each time a new vertex
v is transferred from sv3 to sv2, the array tvb is extended at the high end with
the value of tv.dom, whereafter tv is extended at the high end with the target
vertices of the outgoing edges of v. Denoting that latter operation with
"extend tv with target of v" the whole inner repetition now becomes (taking
knar, tv, and tvb into account as well)

"inner loop":
do tv.dom > tvb.high -->
v, tv: hipop;
if rank(v) > 0 -->
do cc.high > rank(v) --> cc:hirem; tvb:hirem od
□ rank(v) = 0 -->
nvc:= nvc + 1; rank:(v)= nvc; knar:hiext(v);
cc:hiext(nvc); tvb:hiext(tv.dom);
"extend tv with targets of v"
fi
od
We had introduced for vertices v in sv1 the convention: rank(v) > NV.
We can make a stronger convention by numbering the maximal strong com-
ponents from 1 onwards (in the order in which they are detected) and intro-
ducing the convention that for a vertex v in sv1 we will have

rank(v) = NV + v's maximal strong component number

With the variable "strno" (initially = 0), we can now code the

"middle loop":
do cc.dom > 0 -->
"inner loop";
strno:= strno + 1;
do nvc ≥ cc.high -->
nvc:= nvc - 1; rank:(knar.high)= NV + strno;
knar:hirem; sv1count:= sv1count + 1
od;
cc:hirem; tvb:hirem
od

(The variable sv1count, initially = 0, counts the number of vertices in sv1;
then sv1count = NV will be the criterion for completion of the task.)
We assume the vertices numbered from 1 through NV, and the edges to
be given by means of two array constants "edge" and "edgeb", such that for
1 ≤ i ≤ NV the values of edge(j) for edgeb(i) ≤ j < edgeb(i + 1) give the
numbers of the vertices to which the edges outgoing from vertex nr. i lead.
We can then code

"extend tv with targets of v":


begin glocon edge, edgeb, v; glovar tv; privar j;
j vir int := edgeb(v + 1);
do j > edgeb(v) --> j:= j - 1; tv:hiext(edge(j)) od
end

The last problem to be solved is the selection of an arbitrary vertex v from
sv3 for the initialization of the chain. If each time the search would start at
vertex nr. 1, computation time could be proportional to NV², but again this
can be avoided by taking a relation outside the repetition and introducing at
the outer level a variable "cand" (initially = 1) with the property:

sv3 contains no vertex v with v < cand


begin glocon edge, edgeb, NV; virvar rank; privar sv1count, cand, strno;
rank vir int array := (1); do rank.dom ≠ NV --> rank:hiext(0) od;
sv1count vir int, cand vir int, strno vir int := 0, 1, 0;
do sv1count ≠ NV -->
begin glocon edge, edgeb, NV; glovar rank, sv1count, cand, strno;
privar v, cc, tv, tvb, knar, nvc;
do rank(cand) ≠ 0 --> cand:= cand + 1 od; v vir int := cand;
nvc vir int := 1; rank:(v)= 1; knar vir int array := (1, v);
cc vir int array := (1, 1); tvb vir int array := (1, 0);
tv vir int array := (1);
"extend tv with targets of v";
"middle loop"
end
od
end
Note 1. A very similar algorithm has been developed independently by
Robert Tarjan. (End of note 1.)
Note 2. In retrospect we see that the variable "nvc" is superfluous,
because nvc = knar.dom. (End of note 2.)
Note 3. The operation "extend tv with the targets of v" is used twice.
(End of note 3.)
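
For comparison with present-day notation, a transcription of the complete program into Python follows; it is a sketch of ours, not part of the development above. The adjacency structure is here given per vertex -a list adj in which adj[v] lists the targets of the edges outgoing from vertex v- instead of through the array constants "edge" and "edgeb":

   def strong_components(NV, adj):
       # vertices are numbered 1..NV; adj[v] lists the targets of v's edges
       rank = [0] * (NV + 1)   # 0: in sv3; 1..len(knar): in sv2; > NV: in sv1
       comp = [0] * (NV + 1)   # resulting component number per vertex
       strno = 0
       for cand in range(1, NV + 1):
           if rank[cand] != 0:
               continue                   # cand has already left sv3
           knar = [cand]                  # knar[r - 1] = vertex of rank r
           rank[cand] = 1
           cc = [1]      # rank of the oldest vertex of each chain element
           tvb = [0]     # tv[: tvb[-1]] belongs to the older chain elements
           tv = list(adj[cand])           # pending target vertices
           while cc:                      # "middle loop"
               while len(tv) > tvb[-1]:   # "inner loop"
                   v = tv.pop()
                   if rank[v] > 0:
                       # compaction; for v in sv1 nothing happens, since
                       # then rank[v] > NV >= cc[-1] and the edge is ignored
                       while cc[-1] > rank[v]:
                           cc.pop()
                           tvb.pop()
                   else:
                       # append v to the chain as its new youngest element
                       knar.append(v)
                       rank[v] = len(knar)
                       cc.append(rank[v])
                       tvb.append(len(tv))
                       tv.extend(adj[v])
               # the youngest element is a maximal strong component
               strno += 1
               while len(knar) >= cc[-1]:
                   w = knar.pop()
                   rank[w] = NV + strno
                   comp[w] = strno
               cc.pop()
               tvb.pop()
       return comp

For the graph with vertices 1, 2, 3 and edges from 1 to 2, from 2 to 1, and from 2 to 3, the call strong_components(3, [[], [2], [1, 3], []]) yields component number 1 for vertex 3 -the maximal strong component without outgoing edges is detected first- and component number 2 for the vertices 1 and 2.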
Remark 1. The reader will have noticed that in this example the actual
code development took place in a different order than in the development of
the program for the convex hull in three dimensions. The reason is, I think,
the following. In the case of the convex hull, the representation had already
been investigated very carefully as part of the logical analysis of the problem.
In this example the logical analysis had been largely completed when we
faced the task of selecting a representation that would admit an efficient
execution of the algorithm we had in mind. It is then natural to focus one's
attention on the most crucial part first, i.e. the innermost loop.
(End of remark 1.)
Remark 2. It is worth noticing the various steps in which we arrived at
our solution. In the first stage our main concern has been to process each
edge only once, forgetting for the time being about the dependence of the
computation time on the number of vertices. This is fully correct, because, in
general, the number of edges can be expected to be an order of magnitude
larger than the number of vertices. (As a matter of fact, my first solution for
this problem -not recorded in this chapter- was linear in the number of
edges but quadratic in the number of vertices.) It was only in the second
stage that we started to worry about linear dependence on the number of
vertices as well. How effective this "separation of concerns" has been is strik-
ingly illustrated by the fact that in that second stage graph theory no
longer entered our considerations at all! (End of remark 2.)
26 ON MANUALS AND IMPLEMENTATIONS

In the fifties I wrote for a number of machines of Dutch design what I
called "a functional description" -the type of document that I should en-
counter later under the name "reference manual"- and I still remember
vividly the pains I took to see that these functional descriptions were un-
ambiguous, complete, and accurate: I regarded it as my duty to describe the
machines as far as their properties were of importance for the programmer.
Since then my appreciation of the relation between machine and manual,
however, has undergone a change.
Eventually, there are two "machines". On the one hand there is the phys-
ical machine that is installed in the computer room, can go wrong, requires
power, air conditioning, and maintenance and is shown to visitors. On the
other hand there is the abstract machine as defined in the manual, the "think-
able" machine for which the programmer programs and with respect to which
the question of program correctness is settled.
Originally I viewed it as the function of the abstract machine to provide
a truthful picture of the physical reality. Later, however, I learned to consider
the abstract machine as the "true" one, because that is the only one we can
"think"; it is the physical machine's purpose to supply "a working model'',
a (hopefully!) sufficiently accurate physical simulation of the true, abstract
machine.
This change in attitude was accompanied by a change in terminology:
instead of "computer science" the term "computing science" came into use
and we no longer gave courses in "Programming for Electronic Computers"
-we could not care less whether the physical machine worked electronically,
pneumatically or by magic! But it was of course more than a mere play with
words; it was the symptom that slowly the programming profession was
becoming of age. It used to be the program's purpose to instruct our com-
puters; it became the computer's purpose to execute our programs.

The full force of an interface was emerging, and a discrepancy between
abstract and physical machine was no longer interpreted as an inaccuracy in
the manual, but as the physical machine not working according to specific-
ations. This presupposes, of course, that these specifications were not only
completely unambiguous, but also so "understandable" -and orders of mag-
nitude more simple than the engineering documentation of the machine-
that everybody could agree that they described what was intended. I remem-
ber the time that with respect to hardware specification such interfaces were
achieved with a clarity that satisfied everybody's needs: the hardware
designers knew what they had to achieve and the programmer had complete
control over his tool without the need for "experimental" programs for dis-
covering its properties. This rigorous and indispensable clarity did not only
extend itself over the hardware, but also over the basic software, such as
loaders, input and output routines, etc., no more than a few hundred, perhaps
a thousand instructions anyhow. (If I remember correctly, the ominous term
"software" had still to be invented.)

Sad remark. Since then we have witnessed the proliferation of baroque,
ill-defined and, therefore, unstable software systems. Instead of working with
a formal tool, which their task requires, many programmers now live in a
limbo of folklore, in a vague and slippery world, in which they are never
quite sure what the system will do to their programs. Under such regretful
circumstances the whole notion of a correct program -let alone a program
that has been proved to be correct- becomes void. What the proliferation of
such systems has done to the morale of the computing community is more
than I can describe. (End of sad remark.)

In the preceding chapters we have introduced predicates for the charac-
terization of sets of states and have shown how program texts could be inter-
preted as codes for predicate transformers, establishing a relation between
what we called "final and initial states". In accordance with that approach
we have presented the programming task as the construction of (a code for)
a predicate transformer establishing a desired relation between those two
states.
When introducing the various ways of constructing new predicate trans-
formers from existing ones -semicolon, alternative, and repetitive constructs
- we have given hints as to how these could be implemented. We would like
to stress, however, that, although these possibilities have certainly acted as
a source of inspiration, they have no defining function whatsoever. The fact
that our program texts admit the alternative interpretation of "executable
code" has played a role in our motivations, but plays no role in the definition
of the semantics of our programming language: our semantic definitions are
not based upon any "model of computation".
Note 1. We have aimed at a semantic definition independent of compu-
tational history with a very specific purpose in mind, viz. a separation of
concerns that strikes me as vital in the whole programming activity. The
concerns to be separated could be called "the mathematical concerns"
and "the engineering concerns". With the mathematical concerns I refer
to the program's correctness; with the engineering concerns I refer mainly
to the various cost aspects -such as storage space and computation time
- associated with program execution. These cost aspects are only defined
with respect to an implementation, and, conversely, the implementation
(or, stronger, the interpretation of the program text as executable code)
need only be taken into account when cost aspects are in discussion. As
long as we are interested in program correctness, it suffices to interpret
the text as a code for a predicate transformer and nothing is gained by
simultaneously remembering that the text can also be interpreted as exe-
cutable code. On the contrary! Hence our preference for an axiomatic
system for the semantics that is independent of any computational model
and that does not define the final state as "the output" of a computational
process with the initial state as "the input". (End of note 1.)
Note 2. The simplicity of the formal system presented, together with the
circumstance that to some extent it could be forged into a tool for the
formal derivation of programs of a certain type, is for me its greatest
attraction. The question of how much (or how little) of what should (or
could) be done with automatic computers can be captured in this way
falls outside the scope of this monograph. For the time being the part
that could be captured seems rich enough to justify exploration.
(End of note 2.)
Having defined semantics independent of any computational model raises
the question of to what extent we may hope that our programming language
can be implemented at all. From a very etheric point of view, we could dis-
miss the question, saying that this is none of our business, that this is the
implementer's concern, and that as long as we don't want our programs to
be executed the question is irrelevant anyhow. But to dismiss this question is
not my intention.
Actual machines have stores of a finite capacity and, left to themselves,
they behave in principle like finite state automata which after a sufficient
large number of steps must return to a state in which they have been before.
(If, in addition, the machine is deterministic, history will from then on repeat
itself.) For the overall behaviour of today's modern computers, the theory of
finite state automata is, however, of limited significance, because their number
of possible states is so incredibly huge that they could go on working for
years without returning to a state in which they have been before. As an
alternative we can study "the infinite state automaton" and drop the restric-
tion of a finite store capacity. This, apparently, is what we have done: we
have introduced a finite number of variables, but, having introduced variables
of type "integer", there is no bound on their number of possible values and,
therefore, no bound on the number of possible states. We have written pro-
grams for the Unbounded Machine "UM", and this is clearly something no
engineer can build for us!
He can, however, build a Sufficiently Large Machine "SLM" that, when
loaded with the same program S as the UM, will simulate the latter's behav-
iour successfully when UM embarks upon the execution of the program S
started at an initial state satisfying wp(S, T). Thanks to the fact that for each
initial state satisfying wp(S, T) both nondeterminacy and the number of com-
putational steps are bounded, the collection of integer values possibly mani-
pulated by the UM is finite and, therefore, they all fall within a finite range.
Provided that the SLM can manipulate integers in that finite range, it can
simulate the UM's behaviour on that computation (hopefully fast enough to
make the simulation interesting).
Note 3. Since Turing it is customary for mathematicians to regard the
SLM, i.e. the machine performing with a finite speed computational steps
in a bounded state space, as "feasible". It is in this connection wise to
remember that the engineer must simulate the SLM by analogue means
and that most theoretical physicists seem to believe that the probability
of erroneous simulation of the SLM differs from zero and increases with
the speed of operation. (End of note 3.)
The implementation of the nondeterminacy requires some special atten-
tion. When we consider the program part
S: do go on --> x:= x + 1
□ go on --> go on:= false
od
our formalism gives for all k ≥ 0: Hk(T) = non go on and, therefore,
wp(S, T) = non go on
We imagine that the hypothetical machine UM will execute the repetitive
construct by evaluating all the guards, terminating if they are all false, other-
wise selecting one of the alternatives with a true guard for execution, and
then, i.e. after successful completion of the latter, repeating the process. When
initially non go on holds, i.e. go on = false, our formalism tells us that initially
wp(S, T) is satisfied and this is in full accordance with our desire that termi-
nation is guaranteed: as a matter of fact, the UM will immediately stop
repeating. If, however, initially go on = true, wp(S, T) is initially not satisfied,
and if we wish to stick to our interpretation of wp(S, T) as the weakest pre-
condition guaranteeing termination, we must reject any would-be UM in
which the freedom of choosing would be exercised so "fairly" with respect to
the various alternatives that each possible alternative will be selected sooner
or later. We could think of a gambling device, such as tossing a coin, heads
for the first and tails for the second alternative. As no unbiased coin is obliged
sooner or later to turn up with "tails", it would satisfy our requirement. But
in a sense we are now over-specifying the UM. The trouble is that the then
natural assumption of an unbiased coin would tempt us to conclude that,
although termination is not exactly guaranteed, the probability of nontermi-
nation is 0 and that the expectation value for the amount by which x will
be increased equals 1. Such probabilistic studies can very soon become very
difficult, but, luckily, we should not embark on them, for there is something
wrong in the whole approach: we allowed nondeterminacy in those cases
that we did not care which way the UM would choose, but after that we
should not start caring! The most effective way out is to assume the UM not
equipped with an unbiased coin, but with a totally erratic daemon; such a
daemon makes all these probabilistic questions a priori void. (And we can
live with such a daemon in the machine; whenever, at second thought, we
do care which of the alternatives will be chosen and when, we had better
strengthen the guards.)
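
To make the metaphor concrete, here is a Python sketch of ours of a machine executing a repetitive construct, with random.choice playing the part of one possible quasi-daemon; any other choice rule would serve, as long as we refuse to know its properties:

   import random

   def do_od(alternatives, state):
       # alternatives: a list of (guard, command) pairs over a mutable state;
       # the daemon is modelled -quite arbitrarily- by random.choice
       while True:
           enabled = [cmd for guard, cmd in alternatives if guard(state)]
           if not enabled:
               return state     # all guards false: proper termination
           random.choice(enabled)(state)

   # the program S of the text:
   s = {'x': 0, 'go_on': True}
   do_od([(lambda st: st['go_on'], lambda st: st.update(x=st['x'] + 1)),
          (lambda st: st['go_on'], lambda st: st.update(go_on=False))], s)

Under random.choice this execution terminates with probability 1, yet no number of steps suffices to guarantee it; that is precisely why wp(S, T) = non go on.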
It might be thought that the simulation of the UM with its erratic daemon
presents some serious problems to the engineer who has to design the simu-
lating SLM. It would, indeed, be difficult, if it required the generation of truly
erratic choices-whatever that may be. But as the daemon is only a metaphor
embodying our ignorance, its implementation should not present the slightest
problem. If we know that our engineer of the SLM has ten different "quasi-
daemons" on the shelf, we give him complete freedom in his choice which
quasi-daemon he plugs in when we want to use the SLM and we solemnly
declare that we shall even not complain when he plugs in an eleventh one
that he may have concocted in the meantime. As long as we don't know and
refuse to know its properties, everything is all right.
We have now reached the stage that, given a program and given the
initial state, we could, at least in principle, select a sufficiently large machine
SLM, but such a course of action is somewhat unrealistic for two reasons.
Firstly, it is highly unusual to have a whole sequence of SLM's of increasing
size on the shelf; secondly, it is in general quite a job to determine for an
arbitrary program and a given initial state a priori the size of the SLM that
would do it. It is easier "to try".
The more realistic situation is that, instead of a sequence of SLM's, we
have one "hopefully sufficiently large machine" HSLM. The HSLM is two
things, merged into one. Besides acting as the largest SLM we can afford, it
checks, when called to execute a program, as the computation proceeds,
whether this SLM is large enough for the current computation. If so, it pro-
ceeds with the simulation of the UM's behaviour, otherwise it refuses to
continue. The HSLM enables us "to try", i.e. to do the experiment, whether
the SLM it embodies is really sufficiently large for the current computation.
If the HSLM carries the computation out to the end, we can deduce from the
nonoccurrence of the refusal that the embodied SLM has been large enough.
From the above it is clear that explicit refusal by the HSLM, whenever
asked to do something exceeding its capacity, is a vital feature of the HSLM:
it is necessary for our ability of doing the experiment. There exist, regretfully
enough, machines in which the continuous check that the simulation of the
behaviour of the UM is not beyond their capacity is so time-consuming, that
this check is suppressed for the supposed sake of efficiency: whenever the
capacity would be exceeded by a correct execution, they just continue -for
the supposed sake of convenience- incorrectly. It is very difficult to use such
a machine as a reliable tool, for the justification of our belief in the correct-
ness of the answers produced requires in addition to the proof of the pro-
gram's correctness a proof that the computation is not beyond the capacity
of the machine, and, compared to the first one, this second proof is a rather
formidable obligation. We would need an (axiomatic) definition of the pos-
sible happenings in the UM, while up till now it sufficed to prescribe the net
effects; besides that, the precise constraints imposed by the actual machine's
finiteness are often very hard to formulate. We therefore consider such
machines that do not check whether simulating the UM exceeds their capacity
as unfit for use and ignore them in the sequel.
Thanks to its explicit refusal to continue, recognizable as such, the HSLM
is a safe tool, but it would not be a very useful one if it refused too often!
In practice, a programmer does not only want to make a program that would
instruct the UM to produce the desired result, he also wants to reduce the
probability (or even to exclude the possibility) that the HSLM refuses to
simulate the UM. If, for a given HSLM, this desire to reduce the probability
of refusal entails very awkward obligations for the programmer (or, also, if
the programmer has a hard time in estimating how effective measures that he
considers to take will turn out to be) this HSLM is just awkward to use.

We now return to the notion of a so-called "liberal" pre-condition, which
has been discussed at the end of the chapter "The Characterization of Seman-
tics". A liberal pre-condition guarantees that a post-condition will be satisfied
provided that the computation has terminated properly, but does not guar-
antee the proper termination itself. In the intervening chapters we had no use
for this notion, because we defined the semantics as a relation between initial
and final state, independent of program execution, and it is only in connec-
tion with execution that the notion becomes meaningful. The possibility of
program execution only entered the picture when we introduced the un-
bounded machine UM. But of the UM we have only required that it would
execute our programs, i.e. that it would bring for any (unmentioned) post-
condition R itself in a state satisfying R provided that we started it in an
initial state satisfying wp(S, R); in particular, we only insisted upon proper
termination, provided that the initial state satisfied wp(S, T). For initial states
not satisfying wp(S, T) we have not cared to prescribe a behaviour of the
UM and to the UM the notion of a liberal pre-condition is therefore not
applicable. But we do introduce the notion with respect to the HSLM, by
requiring the latter to refuse would-be continuation when its capacity is
exceeded. In other words, we accept an HSLM that is only able to simulate
properly a subset of the computations that are guaranteed to terminate prop-
erly when executed by the UM.
Note 4. The notion of the liberal pre-condition is introduced here in
recognition of the fact that the HSLM is so bounded. This is in sharp
contrast to the very similar notion of "partial correctness", which has
been introduced in connection with unbounded machines (such as Turing
machines) because of the undecidability of the Halting Problem.
(End of note 4.)
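In fact the two notions fit into one simple identity. Writing wlp(S, R) for
the weakest liberal pre-condition, the two definitions are easily seen to give

    wp(S, R) = wp(S, T) and wlp(S, R) ,

i.e. total correctness is the conjunction of guaranteed proper termination
and partial correctness. (The identity is recalled here merely as a reminder
of the earlier chapter, not as new material.)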
One may raise the question -but I shall not answer it- of what the UM
will do when started at an initial state for which we don't know whether it
satisfies wp(S, T) or not. I take the position (from a philosophical point of
view probably very shaky) that as long as we have not proved that the initial
state satisfies wp(S, T), the UM may do as it likes, in the sense that we have
no right to complain. (We can imagine the following setup. In order to start
the UM, the user must push the start button, but in order to be able to do so,
he must be in the computer room. For reasons of protection of the air condi-
tioning, the UM can only be started provided that all doors of the computer
room are closed and for safety's sake, the UM (which has no stop button!)
will keep the doors locked while in operation. This environment shows how
deadly dangerous it can be to start a machine without having proved termi-
nation!)
This refusal to commit the UM to a well-defined behaviour if it has not
been proved that wp(S, T) is initially satisfied, has the consequence that we
cannot draw any conclusion from the fact of termination itself. We could try
to use a machine to search for a refutation of Goldbach's Conjecture that
each natural number n > 2 is the average of two primes, but would the fol-
lowing program do?

begin privar n, refuted;
    n vir int := 1; refuted vir bool := false;
    do non refuted -->
        begin glovar n, refuted; privar x, y;
            n := n + 1;
            x vir int, y vir int := 2, 2 * n;
            do x < y and x + y < 2 * n -->
                x := smallest prime larger than (x)
             □ x < y and x + y > 2 * n -->
                y := largest prime smaller than (y)
            od;
            refuted := (x + y ≠ 2 * n)
        end
    od;
    printbool(refuted)
end

Because I have not proved that Goldbach's Conjecture is false, I have not
proved that wp(S, T) is initially true; therefore, the UM may act as it pleases
and I am, therefore, not allowed to conclude that Goldbach's Conjecture is
wrong when it prints "true" and stops. I would be allowed to draw that
surprising conclusion, however, if the third line had been changed into
"do non refuted and n < 1 000 000 -->"

and, at second thought, I even prefer the modified program, because it is
more honest than the original version: no one starts a computation without
an upper bound for the time he is willing to wait for the answer.
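For the curious reader, a minimal sketch of the modified, bounded program
in present-day Python may be welcome. The helper functions are naive
stand-ins of my own for the two prime-yielding primitives that the program
text takes for granted, and the bound has been lowered so that the naive
primality test remains bearable:

def is_prime(k):
    # naive trial division: adequate for a sketch, not for serious search
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d = d + 1
    return True

def smallest_prime_larger_than(x):
    x = x + 1
    while not is_prime(x):
        x = x + 1
    return x

def largest_prime_smaller_than(y):
    y = y - 1
    while not is_prime(y):
        y = y - 1
    return y

def refuted_for(n):
    # mirror of the inner repetitive construct; its two guards are
    # mutually exclusive, so an ordinary if-else suffices here
    x, y = 2, 2 * n
    while x < y and x + y != 2 * n:
        if x + y < 2 * n:
            x = smallest_prime_larger_than(x)
        else:
            y = largest_prime_smaller_than(y)
    return x + y != 2 * n

n, refuted = 1, False
while not refuted and n < 10000:    # "n < 1 000 000" in the text; lowered here
    n = n + 1
    refuted = refuted_for(n)
print(refuted)
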
27 IN RETROSPECT

Once the automatic computer was there, it was not only a new tool, it
was also a new challenge and, if the tool was without precedent, so was the
challenge. The challenge was -and still is- a two-fold one.
Firstly we are faced with the challenge of discovering new (desirable)
applications, and this is not easy, because the applications could be as revo-
lutionary as the tool itself. Ask the average computing scientist: "If I were
to put a ten-megabuck machine at your disposal, to be installed for the benefit
of mankind, how and to what problem would you apply it?", and you will
discover that it will take him a long time to come up with a sensible answer.
This is a serious problem that puts great demands on our fantasy and on our
powers of imagination. This challenge is mentioned for the sake of complete-
ness; this monograph does not address it.
Secondly, once a (hopefully desirable!) application has been discovered,
we are faced with the programming task, i.e. with the problem of bending the
general tool to our specific purpose. For the relatively small and slow mach-
ines of the earlier days the programming problem was not too serious, but
when machines at least a thousand times as powerful became generally
available, society's ambition in applying them grew in proportion and the
programming task emerged as an intellectual challenge without precedent.
The latter challenge was the incentive to write this monograph.
On the one hand the mathematical basis of programming is very simple.
Only a finite number of zeros and ones are to be subjected to a finite number
of simple operations, and in a certain sense programming should be trivial.
On the other hand, stores with a capacity of many millions of bits are so
unimaginably huge and processing these bits can now occur at so unimagin-
ably high speeds that the computational processes that may take place -and


that, therefore, we are invited to invent- have outgrown the level of triviality
by several orders of magnitude. It is the unique combination of basic sim-
plicity and ultimate sophistication which is characteristic of the programm-
ing task.
We realize what this combination implies when we compare the program-
mer with, say, a surgeon who does an advanced operation. Both should
exercise the utmost care, but the surgeon has fulfilled his obligations in this
respect when he has taken the known precautions and is then allowed to
hope that circumstances outside his control will not ruin his work. Nor is
the surgeon blamed for the incompleteness of his control: the unfathomed
complexity of the human body is an accepted fact of life. But the programmer
can hardly exonerate himself by appealing to the unfathomed complexity
of his program, for the latter is his own construction! With the possibility
of complete control, he also gets the obligation: it is the consequence of the
basic simplicity.
One consequence of the power of modern computers must be mentioned
here. In hierarchical systems, something considered as an undivided, unana-
lyzed entity at one level is considered as something composite at the next
lower level of greater detail; as a result the natural grain of time or space
that is appropriate for each level decreases by an order of magnitude each
time we shift our attention from one level to the next lower one. As a con-
sequence, the maximum number of levels that can be distinguished meaning-
fully in a hierarchical system is more or less proportional to the logarithm
of the ratio between the largest and the smallest grain, and, therefore, we
cannot expect many levels unless this ratio is very large. In computer pro-
gramming our basic building block, the instruction, takes less than a micro-
second, but our program may require hours of computation time. I do not
know of any other technology than programming that is invited to cover a
grain ratio of 10¹⁰ or more. The automatic computer, by virtue of its fan-
tastic speed, was the first to provide an environment with enough "room"
for highly hierarchical artifacts. And in this respect the challenge of the
programming task seems indeed without precedent. For anyone interested
in the human ability to think difficult thoughts (by avoiding unmastered
complexity) the programming task provides an ideal proving ground.
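In round numbers (of my own choosing, merely to make the ratio concrete):

    smallest grain:  one instruction, about 10⁻⁶ s
    largest grain:   a run of a few hours, about 10⁴ s
    grain ratio:     10⁴ / 10⁻⁶ = 10¹⁰
    levels:          about log₁₀(10¹⁰) = 10, at one order of magnitude per level
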

When asked to explain to the layman what computing scientists call
"modularization", the easiest analogy to use is probably the way in which
the scientific world has parcelled out its combined knowledge and skills
over the various scientific disciplines. Scientific disciplines have a certain
"size" that is determined by human constants: the amount of knowledge
needed must fit into a human head, the number of skills needed may not be
more than a person can learn and maintain. On the other hand, a scientific
discipline may not be too small, too narrow either, for it should last a lifetime
at least without becoming barren. But not any odd collection of scraps of
knowledge and an equally odd collection of skills, even of the right size,
constitute a viable scientific discipline! There are two other requirements.
The internal requirement is one of coherence: the skills must be able to
improve the knowledge and the knowledge must be able to refine the skills.
And finally there is the external requirement -we would call it "a narrow
interface"- that the subject matter can be studied in a reasonably high degree
of isolation, not at any moment critically dependent on developments in other
areas.
The analogy is not only useful to explain "modularization" to the layman,
conversely it gives us a clue as to how we should try to arrange our thoughts
when programming. When programming we are faced with similar problems
of size and diversity. (Even when programming to the best of our ability, we
sometimes cannot avoid that program texts become so long that their sheer
length causes (for instance, clerical) problems. The possible computations
may be so long or so varied that we have difficulty in imagining them. We
may have conflicting goals such as high throughput and short reaction times,
etc.) But we cannot solve them by just splitting the program to be made into
"modules".
To my taste the main characteristic of intelligent thinking is that one
is willing and able to study in depth an aspect of one's subject matter in
isolation, for the sake of its own consistency, all the time knowing that one
is occupying oneself with only one of the aspects. The other aspects have to
wait their turn, because our heads are so small that we cannot deal with
them simultaneously without getting confused. This is what I mean by
"focussing one's attention upon a certain aspect"; it does not mean com-
pletely ignoring the other ones, but temporarily forgetting them to the extent
that they are irrelevant for the current topic. Such separation, even if not
perfectly possible, is yet the only available technique for effective ordering
of one's thoughts that I know of.
I usually refer to it as "a separation of concerns", because one tries to
deal with the difficulties, the obligations, the desires, and the constraints
one by one. When this can be achieved successfully, we have more or less
partitioned the reasoning that had to be done -and this partitioning may
find its reflection in the resulting partitioning of the program into "modules"
- but I would like to point out that this partitioning of the reasoning to be
done is only the result, and not the purpose. The purpose of thinking is to
reduce the detailed reasoning needed to a doable amount, and a separation
of concerns is the way in which we hope to achieve this reduction.
The crucial choice is, of course, what aspects to study "in isolation",
how to disentangle the original amorphous knot of obligations, constraints,
and goals into a set of "concerns" that admit a reasonably effective separa-
tion. To arrive at a successful separation of concerns for a new, difficult
problem area will nearly always take a long time of hard work; it seems
unrealistic to expect it to be otherwise. But even without five rules of thumb
for doing so (after all, we are not writing a brochure on "How to Think Big
Thoughts in Ten Easy Lessons"), the knowledge of the goal of "separation
of concerns" is a useful one: we are at least beginning to understand what
we are aiming at.
Not that we don't have a rule of thumb! It says: don't lump concerns
together that were perfectly separated to start with! This rule was applied
before we started this monograph. The original worry was that we would
end up with unreliable systems that either would produce the wrong result
that could be taken for the correct one, or would even fail to function at all.
If such a system consists of a combination of hardware and software, then,
ideally, the software would be correct and the hardware would function
flawlessly and the system's performance would be perfect. If it does not,
either the software is wrong or the hardware has malfunctioned, or both.
These two different sources of errors may have nearly identical effects: if,
due to a transient error, an instruction in store has been corrupted or if,
due to a permanent malfunctioning, a certain instruction is permanently
misinterpreted, the net effect is very similar to that of a program bug. Yet
the origins of these two failures are very different. Even a perfect piece of
hardware, because it is subject to wear and tear, needs maintenance; software
either needs correction, but then it has been wrong from the beginning, or
modification because, at second thought, we want a different program. Our
rule of thumb tells us not to mix the two concerns. On the one hand we may
ponder about increasing the confidence level of our programs (as it were,
under the assumption of execution by a perfect machine). On the other hand
we may think about execution by not fully reliable machines, but during that
stage of our investigations we had better assume our programs to be perfect.
This monograph deals with the first of the two concerns.
In this case, our rule of thumb seems to have been valid: without the
separation of hardware and software concerns, we would have been forced
to a statistical approach, probably using the concept MTBF ( = "Mean Time
Between Failures", where "Mean Time Between Manifested Errors" would
have been more truthful), and the theory described in this monograph could
never have been developed.
Before embarking upon this monograph, a further separation of concerns
was carried through. I quote from a letter from one of my colleagues:
"There is a third concern in programming: after the preparation of "the pro-
gram text as a static, rather formal, mathematical object", and after the
engineering considerations of the computational processes intended to be
evoked by it under a specific implementation, I personally find hardest actually
achieving this execution: converting the human-readable text, with its slips
which are not seen by the eye which "sees what it wishes to see", into machine-
readable text, and then achieving the elusive confidence that nothing has been
lost during this conversion."

(From the fact that my colleague calls the third concern the "hardest" we
may conclude that he is a very competent programmer; also an honest one!
I can add the perhaps irrelevant information that his handwriting is, however,
rather poor.) This third concern is not dealt with in this monograph, not
because it is of no importance, but because it can (and, therefore, should)
be separated from the others, and is dealt with by very different, specific
precautions (proof reading, duplication, triplication, or other forms of
redundancy). I mentioned this third concern because I found another col-
league -he is an engineer by training- so utterly obsessed by it that he
could not bring himself to consider the other two concerns in isolation from
it and, consequently, dismissed the whole idea of proving a program to be
correct as irrelevant. We should be aware of the fact, independent of whether
we try to explain or understand the phenomenon, that the act of separating
concerns tends to evoke resistance, often voiced by the remark that "one is
not solving the real problems". This resistance is by no means confined to
pragmatic engineers, as is shown by Bertrand Russell's verdict: "The advan-
tages of the method of postulation are great; they are the same as the advan-
tages of theft over honest toil."
The next separation of concerns is carried through in the book itself:
it is the separation between the mathematical concerns about correctness
and the engineering concerns about execution. And we have carried this
separation through to the extent that we have given an axiomatic definition
of the semantics of our programming languages which allows us, if we so
desire, to ignore the possibility of execution. This is done in the book itself
for the simple reason that, historically speaking, this separation has not been
suggested by our rule of thumb; the operational approach, characterized by
"The semantics itself is given by an interpreter that describes how the state
vector changes as the computation progresses." (John McCarthy, 1965) was
the predominant one during most of the sixties, from which R.W. Floyd
(1967) and C.A.R. Hoare (1969) were among the first to depart.
Such a separation takes much more time, for even after having the inkling
that it might be possible and desirable, there are many ways in which one
can go. Depending on one's temperament, one's capacities, and one's evalu-
ation of the difficulties ahead, one can either be very ambitious and tackle the
problem for as universal a programming language as possible, or one can
be cautious and search consciously for the most effective constraints. I have
clearly opted for the second alternative, and not including procedures (as such,
or also as parameters or even as results) seemed an effective simplification,
so drastic, as a matter of fact, that some of my readers may lose interest
in the "trivial" stuff that remains.
The remaining main questions to decide were the following ones:

1. whether to derive a weakest pre-condition from a desired post-condition
or to derive a strongest post-condition from a given pre-condition;
2. whether to focus on weakest pre-conditions -as we have done- or on
weakest liberal pre-conditions;
3. whether or not to include nondeterminacy;
4. whether the "daemon" should be erratic or in some sense "fair".

How does one settle them? The fact that the derivation of the weakest
pre-conditions instead of strongest post-conditions seemed to give a smoother
formalism may be obvious to others; I had to discover it by trying both.
When starting from the desired post-condition seemed more convenient,
that settled the matter in my mind, as it also seemed to do more justice to
the fact that programming is a goal-directed activity.
The decision to concentrate on just pre-conditions rather than liberal
pre-conditions took longer. I wished to do so, because as long as predicate
transformers deriving weakest liberal pre-conditions are the only carrier for
our definition of the semantics, we shall never be able to guarantee termina-
tion: such a system seemed too weak to be attractive. The matter was settled
by the possibility of defining the wp(DO, R) in terms of the wp(IF, R).
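For reference, that definition, as given in the chapter "The Semantic
Characterization of a Programming Language" and recalled here with BB
denoting the disjunction of the guards, reads

    H0(R) = R and non BB
    Hk(R) = wp(IF, Hk-1(R)) or H0(R)        for k > 0
    wp(DO, R) = (E k: k ≥ 0: Hk(R)) ,

so that proper termination is captured without any appeal to weakest
liberal pre-conditions.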
The decision to incorporate nondeterminacy was only taken gradually.
After the analogy between synchronizing conditions in multiprogramming
and the sequencing conditions in sequential programming had suggested the
guarded command sets and had prepared me for the inclusion of nondeter-
minacy in sequential programs as well, my growing dislike for the asymme-
tric "if B then SJ else S2 fi", which treats S2 as the default -and defaults I
have learned to mistrust- did the rest. The symmetry and elegance of
if x ≥ y --> m := x □ y ≥ x --> m := y fi
and the fact that I could derive this program systematically settled this
question.
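The operational reading of the construct is easily simulated. In the
following Python sketch the names are mine, and a pseudo-random choice
merely plays the erratic daemon (no fairness is implied): when no guard is
true the construct aborts, otherwise an arbitrary open alternative is
executed.

import random

def alt(*clauses):
    # clauses are (guard, body) pairs of parameterless functions;
    # abort when no guard is true, otherwise let the "daemon"
    # execute an arbitrarily chosen open alternative
    open_bodies = [body for guard, body in clauses if guard()]
    if not open_bodies:
        raise RuntimeError("abort: no guard true")
    return random.choice(open_bodies)()

def maximum(x, y):
    # the symmetric program above: for x = y both alternatives
    # are open and either choice establishes m = max(x, y)
    return alt(
        (lambda: x >= y, lambda: x),
        (lambda: y >= x, lambda: y),
    )

print(maximum(3, 7), maximum(5, 5))
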
For one day -and this was a direct consequence of my experience with
multiprogramming, where "individual starvation" is usually to be avoided-
I thought it wise to postulate that the daemon should select "in fair random
order", i.e. without permanent neglect of one of the permissible alternatives.
This fair random order was postulated at the stage when I had only given an
operational description of how I thought to implement the repetitive con-
struct. The next day, when I considered a formal definition of its semantics,
I saw my mistake and the daemon was declared to be totally erratic.
In short, of course after the necessary exploratory experiments, ques-
tions (1) through (4) have mainly been settled by the same yardstick: formal
simplicity.
My interest in formal correctness proofs was, and mainly still is, a derived
one. I had witnessed many discussions about programming languages and
programming style that were depressingly inconclusive. The cause of the
difficulty to come to a consensus was the absence of a few effective yardsticks
in whose relevance we could all believe. (Too much we tried to settle in the
name of convenience for the user, but too often we confused "convenient"
with "conventional", and that latter criterion is too much dependent on each
person's own past.) During that muddle, the suggestion that what we called
"elegant" was nearly always what admitted a nice, short proof came as a
gift from heaven; it was immediately accepted as a reasonable hypothesis
and its effectiveness made it into a cherished criterion. And, above all, length
of a formal proof is an objective criterion: this objectivity has probably been
more effective in reaching a comfortable consensus than anything else, cer-
tainly more effective than eloquence could ever have been. The primary
interest was not in formal correctness proofs, but in a discipline that would
assist us in keeping our programs intelligible, understandable, and intellec-
tually manageable.
I have dealt with the examples in different degrees of formality. This
variation was intended, as I would not like to give my readers the impression
that a certain, fixed degree of formality is "the right one". I prefer to view
formal methods as tools, the use of which might be helpful.
I have tried to present programming rather as a discipline than as a
craft. For centuries we have known two main techniques for transmitting knowl-
edge and skills to the next generation. The one technique is characteristic for
the guilds: the young apprentice works for seven years with a master, all
knowledge is transferred implicitly, the apprentice absorbs, by osmosis so
to speak, until he may call himself a master too. (This implicit transfer makes
the knowledge vulnerable: old crafts have been lost!) The other technique
has been promoted by the universities, whose rise coincided (not accidentally!)
with the rise of the printing press; here we try to formulate our knowledge
and, by doing so, try to bring it into the public domain. (Our actual teaching
at the universities often occupies an in-between position: in mathematics,
for instance, mathematical results are published and taught quite explicitly,
the teaching of how to do mathematics is often largely left to the osmosis,
not necessarily because we are unwilling to be more explicit, but because we
feel ourselves unable to teach the "how" above the level of motherhood
statements.)
While dealing with the examples I have been as explicit as I could
(although, of course, I have not always been able to buffer the shock of
invention); the examples were no more than a vehicle for that goal of explicit-
ness.
We have formulated a number of theorems about alternative and repeti-
tive constructs. That was the easy part, as it concerns knowledge. With the
aid of examples we have tried to show how a conscious effort to apply this
knowledge can assist the programming process, and that was the hard part,
for it concerns skill. (I am thinking, for instance, of the way in which the
knowledge of the Linear Search Theorem assisted us in solving the problem
of the next permutation.) We have tried to make a few strategies explicit,
such as the Search for the Small Superset, and a few techniques for "massag-
ing" programs, such as bringing a relation outside a repetitive construct. But
these are techniques that are rather closely tied to (our form of) programming.
Between the lines the reader may have caught a few more general mes-
sages. The first message is that it does not suffice to design a mechanism of
which we hope that it will meet its requirements, but that we must design it
in such a form that we can convince ourselves -and anyone else for that
matter- that it will, indeed, meet its requirements. And, therefore, instead
of first designing the program and then trying to prove its correctness, we
develop correctness proof and program hand in hand. (In actual fact, the
correctness proof is developed slightly ahead of the program: after having
chosen the form of the correctness proof we make the program so that it
satisfies the proof's requirements.) This, when carried out successfully,
implies that the design remains "intellectually manageable". The second
message is that, if this constructive approach to the problem of program
correctness is to be our plan, we had better see to it that the intellectual
labour involved does not exceed our limited powers, and quite a few design
decisions fell under that heading. In the problem of the Dutch national flag,
for instance, we have been warned against the case analysis in which the number
of cases to be distinguished between is built up multiplicatively: as soon as
we admit that, we are quickly faced with a case analysis exceeding our abilities.
In the problem of the shortest subspanning tree, we have seen how a restric-
tion of the class of admissible intermediate states (here, the "red" branches
always forming a tree) could simplify the analysis considerably. But most
helpful of all -it can be regarded as a separation of concerns- has been the
stepwise approach, in which we try to deal with our various objectives one
after the other. In the problem of the shortest subspanning tree, we found by
the time that we started to worry about computation time, the N²-algorithm
as an improvement of the N³-algorithm. In the problem of the maximal
strong components, we first found an algorithm linear in the number of edges,
and only the next refinement guaranteed a fixed maximum amount of process-
ing per vertex as well. In the problem of the most isolated villages, our crude
solution was independently subjected to two very different optimizations,
and, after they had been established, it was not difficult to combine them.

As remarked above, the purpose of thinking is to reduce the detailed
reasoning needed to a doable amount. The burning question is: can "think-
ing" in this sense be taught? If I answer "No" to this question, one may well
ask why I have written this book in the first place; if I answer "Yes" to this
question, I would make a fool of myself, and the only answer left to me is
"Up to a point ... ". It seems vain to hope -to put it mildly- that a book
could be written that we could give to young people, saying "Read this, and
afterwards you will be able to think effectively", and replacing the book by
a beautiful, interactive system for Computer-Aided Instruction ("CAI" for
the initiated) will not make this hope less vain.
But insofar as people try to understand (at first subconsciously), strive
after clarity, and attempt to avoid unmastered complexity, I believe in the
possibility of assisting them significantly by making them aware of the
human inability "to talk of many things" (at any one moment, at least), by
making them alert to how complexity is introduced. To the extent that a
professor of music at a conservatoire can assist his students in becoming
familiar with the patterns of harmony and rhythm, and with how they
combine, it must be possible to assist students in becoming sensitive to
patterns of reasoning and to how they combine. The analogy is not far-fetched
at all: a clear argument can make one catch one's breath, like a Mozart
adagio can.
