Computational Complexity
Foundations of Computing
Michael Garey and Albert Meyer, editors
Complexity Issues in VLSI: Optimal Layouts for the Shuffle-Exchange Graph and Other
Networks, Frank Thomson Leighton, 1983
Equational Logic as a Programming Language, Michael J. O'Donnell, 1985
General Theory of Deductive Systems and Its Applications, S. Yu Maslov, 1987
Resource Allocation Problems: Algorithmic Approaches, Toshihide Ibaraki and Naoki Katoh,
1988
Algebraic Theory of Processes, Matthew Hennessy, 1988
PX: A Computational Logic, Susumu Hayashi and Hiroshi Nakano, 1989
The Stable Marriage Problem: Structure and Algorithms, Dan Gusfield and Robert Irving,
1989
Realistic Compiler Generation, Peter Lee, 1989
Single-Layer Wire Routing and Compaction, F. Miller Maley, 1990
Basic Category Theory for Computer Scientists, Benjamin C. Pierce, 1991
Categories, Types, and Structures: An Introduction to Category Theory for the Working
Computer Scientist, Andrea Asperti and Giuseppe Longo, 1991
Semantics of Programming Languages: Structures and Techniques, Carl A. Gunter, 1992
The Formal Semantics of Programming Languages: An Introduction, Glynn Winskel, 1993
Hilbert's Tenth Problem, Yuri V. Matiyasevich, 1993
Exploring Interior-Point Linear Programming: Algorithms and Software, Ami Arbel, 1993
Theoretical Aspects of Object-Oriented Programming: Types, Semantics, and Language Design,
edited by Carl A. Gunter and John C. Mitchell, 1994
From Logic to Logic Programming, Kees Doets, 1994
The Structure of Typed Programming Languages, David A. Schmidt, 1994
Logic and Information Flow, edited by Jan van Eijck and Albert Visser, 1994
Circuit Complexity and Neural Networks, Ian Parberry, 1994
Control Flow Semantics, Jaco de Bakker and Erik de Vink, 1996
Algebraic Semantics of Imperative Programs, Joseph A. Goguen and Grant Malcolm, 1996
Algorithmic Number Theory, Volume I: Efficient Algorithms, Eric Bach and Jeffrey Shallit,
1996
Foundations for Programming Languages, John C. Mitchell, 1996
Computability and Complexity: From a Programming Perspective, Neil D. Jones, 1997
Jones, Neil D.
Computability and complexity: from a programming perspective / Neil D.
Jones.
p. cm. -- (Foundations of computing)
Includes bibliographical references and index.
ISBN 0-262-10064-9 (alk. paper)
1. Electronic digital computers -- programming. 2. Computational
complexity.
I. Title. II. Series.
QA76.6.J6658 1997
005.130 1--dc21
96-44043
CIP
Contents

Series Foreword
Preface

Part I Toward the Theory
1 Introduction

Part II Introduction to Computability
8 Robustness of Computability

Part III

Part IV Introduction to Complexity
21 Space-bounded Computations
22 Nondeterministic Computations

Part V Complete Problems

Part VI Appendix

Bibliography
Index
Series Foreword
Theoretical computer science has now undergone several decades of development. The
classical topics of automata theory, formal languages, and computational complexity
have become firmly established, and their importance to other theoretical work and to
practice is widely recognized. Stimulated by technological advances, theoreticians have
been rapidly expanding the areas under study, and the time delay between theoretical progress and its practical impact has been decreasing dramatically. Much publicity
has been given recently to breakthroughs in cryptography and linear programming, and
steady progress is being made on programming language semantics, computational geometry, and efficient data structures. Newer, more speculative, areas of study include
relational databases, VLSI theory, and parallel and distributed computation. As this list
of topics continues expanding, it is becoming more and more difficult to stay abreast
of the progress that is being made and increasingly important that the most significant
work be distilled and communicated in a manner that will facilitate further research and
application of this work. By publishing comprehensive books and specialized monographs
on the theoretical aspects of computer science, the series on Foundations of Computing
provides a forum in which important research topics can be presented in their entirety
and placed in perspective for researchers, students, and practitioners alike.
Michael R. Garey
Albert R. Meyer
Preface
This book is a general introduction to computability and complexity theory. It should
be of interest to beginning programming language researchers who are interested in computability and complexity theory, or vice versa.
In the other direction, the programming language community has a firm grasp of
algorithm design, presentation, and implementation, and several well-developed frameworks for making semantic concepts precise, covering a wide range of programming language concepts, e.g., functional, logic, and imperative programming, control operators, communication and concurrency, and object-orientation. Moreover, programming languages constitute computation models, some of which are more realistic in certain crucial aspects
than traditional models.
A concrete connection between computability and programming languages: the dry-as-dust s-m-n theorem has been known in computability since the 1930s, but seemed
only a technical curiosity useful in certain proofs. Nonetheless, and to the surprise of
many people, the s-m-n theorem has proven its worth under the alias partial evaluation or
program specialization in practice over the past 20 years: when implemented efficiently,
it can be used for realistic compiling, and when self-applied it can be used to generate
program generators as well.
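As a toy illustration of the program-specialization idea, here is a small sketch in Python; the example program and the specializer are invented for illustration and are not the book's construction.

def power(n, x):                  # a general two-input program
    result = 1
    for _ in range(n):
        result *= x
    return result

def specialize(p, n):             # fix the first argument of p ("s-1-1" style)
    return lambda x: p(n, x)      # a real partial evaluator would produce program text

cube = specialize(power, 3)       # the residual, specialized program
print(cube(5))                    # 125, the same answer as power(3, 5)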
Another cornerstone of computability, the universal machine, is nothing but a self-interpreter, well-known in programming languages. Further, the simulations seen in
introductory computability and complexity texts are mostly achieved by informal compilers or, sometimes, interpreters.
As for the second point above, a tension has long been felt between computability
and complexity theory on the one hand, and real computing on the other. This is at
least in part because one of the first results proven in complexity is the Turing machine
speedup theorem, which asserts a counterintuitive (but true) fact: that any Turing machine program running in superlinear time can be replaced by another running twice as
fast in the limit.2 The existence of efficient self-interpreters in programming language
theory leads to the opposite result: a hierarchy theorem showing, for a more realistic
computing model than the Turing machine, that constant time factors do matter. More
precisely, given time bound f (n), where n measures the size of a problem input, there
are problems solvable in time (1 + ε)f(n) which cannot be solved in time f(n). Thus
multiplying the available computing time by a constant properly increases the class of
problems that can be solved.
This and other examples using programming language concepts lead (at least for
computer scientists) to more understandable statements of theorems and proofs in computability and complexity, and to stronger results. Further new results include intrinsic
characterizations of the well-known problem classes logspace and ptime on the basis
2 The tension arises because the trick used for the Turing machine construction turns out to be
useless when attempting to speed up real computer programs.
of program syntax alone, without any externally imposed space or time bounds.
Finally, a number of old computability and complexity questions take on new life
and natural new questions arise. An important class of new questions (not yet fully
resolved) is: what is the effect of the programming styles we employ, i.e., functional style,
imperative style, etc., on the efficiency of the programs we write?
Goals, and chapters that can be touched lightly on first reading. The book's overall
computability goals are first: to argue that the class of all computably solvable problems is well-defined and independent of the computing devices used to define it, and second: carefully to explore the boundary zone between computability and uncomputability.
Its complexity goals are analogous, given naturally defined classes of problems solvable
within time or memory resource bounds.
The Church-Turing thesis states that all natural computation models are of equivalent
power. Powerful evidence for it is the fact that any two among a substantial class
of computation models can simulate each other. Unfortunately, proving this fact is
unavoidably complex since the various computation models must be precisely defined,
and constructions must be given to show how an arbitrary program in one model can be
simulated by programs in each of the other models.
Chapters 7 and 8 do just this: they argue for the Church-Turing thesis without
considering the time or memory required to do the simulations. Chapters 16, 17 and 18
go farther, showing that polynomial time-bounded or space-bounded computability are
similarly robust concepts.
Once the Church-Turing thesis has been convincingly demonstrated, a more casual
attitude is quite often taken: algorithms are just sketched, using whichever model is most
convenient for the task at hand. The reader may wish to anticipate this, and at first
encounter may choose only to skim chapters 7, 8, 16, 17 and 18.
Prerequisites
The reader is expected to be at the beginning graduate level having studied some theory,
or a student at the senior undergraduate level with good mathematical maturity. Specifically, the book uses sets, functions, graphs, induction, and recursive definitions freely.
These concepts are all explained in an appendix, but the appendix may be too terse
to serve as a first introduction to these notions. Familiarity with some programming
language is a necessity; just which language is much less relevant.
sets; Kleene's s-m-n, second recursion, and normal form theorems; recursion by fixpoints; Rogers' isomorphism theorem; and Gödel's incompleteness theorem.
Classical complexity results include study of the hierarchy of classes of problems:
logspace, nlogspace, ptime, nptime, pspace; the robustness of ptime, pspace and
logspace; complete problems for all these classes except the smallest; the speedup and
gap theorems from Blum's machine-independent complexity theory.
In contrast with traditional textbooks on computability and complexity, this treatment also features:
1. A language of WHILE programs with LISP-like data. Advantages: programming
convenience and readability in constructions involving programs as data; and freedom from storage management problems.
2. Stronger connections with familiar computer science concepts: compilation (simulation), interpretation (universal programs), program specialization (the s-m-n
theorem), existence or nonexistence of optimal programs.
3. Relation of self-application to compiler bootstrapping.
4. Program specialization in the form of partial evaluation to speed programs up, or
to compile and to generate compilers by specializing interpreters.
5. Speedups from self-application of program specializers.
6. Simpler constructions for robustness of fundamental concepts, also including
functional languages and the lambda calculus.
7. A construction to prove Kleene's second recursion theorem that gives more efficient
programs than those yielded by the classical proof.
8. Proof that constant time factors do matter for a computation model more realistic
than the Turing machine, by an unusually simple and understandable diagonalization proof.
9. A new and much more comprehensible proof of Levin's important result on the existence of optimal algorithms.
10. Intrinsic characterizations of the problem classes logspace and ptime by restricted
WHILE programs.
11. The use of programs manipulating boolean values to characterize complete or
hardest problems for the complexity classes mentioned above.
Items 7 through 11 above appear here for the first time in book form.
Acknowledgments
Many have helped with the preparation of the manuscript. Three in particular have made
outstanding contributions to its content, style, and editorial and pedagogical matters:
Morten Heine Sørensen, Amir Ben-Amram, and Arne John Glenstrup; and Jens Peter Secher and Jakob Grue Simonsen gave invaluable LaTeX assistance. DIKU (the Computer
Science Department at the University of Copenhagen) helped significantly with many
practical matters involving secretarial help, computing, and printing facilities. The idea
of using list structures with only one atom is due to Klaus Grue [57].
From outside DIKU I have received much encouragement from Henk Barendregt, Jim
Royer, and Yuri Gurevich. Invaluable feedback was given by the students attending the
courses at which earlier versions of the manuscript were used, and many have helped by
reading various parts, including Nils Andersen, Kristian Nielsen, and Jakob Rehof from
DIKU, Antanas Zilinskas from Vilnius, and anonymous referees from the MIT Press and
Addison-Wesley Publishing Co. Chapter 11 and the Appendix were written by Morten Heine Sørensen, chapter 20 was written by Amir Ben-Amram, and sections 9.3 and ?? were written by Torben Æ. Mogensen.
Part I
Toward the Theory
Introduction
This book is about computability theory and complexity theory. In this first chapter we
try to convey what the scope and techniques of computability and complexity theory
are. We are deliberately informal in this chapter; in some cases we will even introduce a
definition or a proposition which is not rigorous, relying on certain intuitive notions.
The symbol 3 will be used to mark these definitions or propositions. All such
definitions and propositions will be reintroduced in a rigorous manner in the subsequent
chapters before they occur in any development.
Section 1.1 explains the scope and goals of computability theory. Sections 1.2–1.3
concern questions that arise in that connection, and Section 1.4 gives examples of techniques and results of computability theory. Section 1.5 describes the scope and goals of
complexity theory. Section 1.6 reviews the historical origins of the two research fields.
The chapter ends with exercises; in general the reader is encouraged to try all the exercises. A final section gives more references to background material.
A small synopsis like this appears in the beginning of every chapter, but from now
on we will not mention the two sections containing exercises and references.
1.1
Computability theory asks questions such as: do there exist problems unsolvable by any effective procedure, that is, unsolvable by any program in any conceivable programming language on any computer?
Our programming intuitions may indicate a "no" answer, based on the experience that once a problem is made precise in the form of a specification, it is a more or less routine task to write a program to satisfy the specification. Indeed, a related intuition dominated the work of Hilbert and his school on the foundations of mathematics, as explained in Section 1.6: they conjectured that all of mathematics could be axiomatized. However, we shall see
that both of these intuitions are disastrously wrong. There are certain problems that
cannot be solved by effective procedures.
To prove this, we must make precise what is meant by an effective procedure and
what is meant by a problem. It is not a priori obvious that any single formalization of
effective procedure could be adequate; it might seem that any specific choice would be
too narrow because it would exclude computing devices with special capabilities. Thus,
1.2
There are various strategies one can employ in formalizing the notion of effective procedure. Of course, we are free to define notions as we please, but the definitions should
capture the intuitive notion of effective procedure; for example, it should not be the case
that some problem is unsolvable according to our theory, but nevertheless can be solved
on a real-world computer.
Therefore it will be useful to try to analyze the notion of effective procedure and devise a formalization so that every intuitively effective procedure can be carried out in the formalism, and such that all the formalism's computations are effective.
1.2.1
Alan Turing's analysis attempting to formalize the class of all effective procedures was
carried out in 1936 [170], resulting in the notion of a Turing machine. Its importance is
that it was the first really general analysis to understand how it is that computation takes
place, and that it led to a convincing and widely accepted abstraction of the concept of
effective procedure.
It is worth noting that Turing's analysis was done before any computers more powerful than desk calculators had been invented. His insights led, more or less directly, to John von Neumann's invention in the 1940s of the stored-program digital computer, a machine with essentially the same underlying architecture as today's computers.
We give the floor to Turing. Note that by a "computer" Turing means a human who is solving a computational problem in a mechanical way, not a machine.
Computing is normally done by writing certain symbols on paper. We may suppose
this paper is divided into squares like a child's arithmetic book. In elementary
arithmetic the two-dimensional character of the paper is sometimes used. But such
a use is always avoidable, and I think that it will be agreed that the two-dimensional character of paper is no essential of computation. I assume then that the
computation is carried out on one-dimensional paper, i.e., on a tape divided into
squares. I shall also suppose that the number of symbols which may be printed
is finite. If we were to allow an infinity of symbols, then there would be symbols
differing to an arbitrarily small extent1 . The effect of this restriction of the number
of symbols is not very serious. It is always possible to use sequences of symbols in
the place of single symbols. Thus an Arabic numeral such as 17 or 999999999999999
is normally treated as a single symbol. Similarly in any European language words
are treated as single symbols (Chinese, however, attempts to have an enumerable
infinity of symbols). The differences from our point of view between the single and
compound symbols is that the compound symbols, if they are too lengthy, cannot
be observed at one glance. This is in accordance with experience. We cannot tell
at a glance whether 9999999999999999 and 999999999999999 are the same.
The behaviour of the computer at any moment is determined by the symbols which
1 If we regard a symbol as literally printed on a square we may suppose that the square is 0 ≤ x ≤ 1, 0 ≤ y ≤ 1. The symbol is defined as a set of points in this square, viz. the set occupied by printer's ink. If these sets are restricted to be measurable, we can define the distance between two symbols as the cost of transforming one symbol into the other if the cost of moving a unit area of printer's ink unit distance is unity, and there is an infinite supply of ink at x = 2, y = 0. With this topology the symbols form a conditionally compact space. [Turing's note].
he is observing, and his state of mind at that moment. We may suppose that
there is a bound B to the number of symbols or squares which the computer can
observe at one moment. If he wishes to observe more, he must use successive
observations. We will also suppose that the number of states of mind which need
be taken into account is finite. The reasons for this are of the same character as
those which restrict the number of symbols. If we admitted an infinity of states
of mind, some of them will be arbitrarily close and will be confused. Again, the
restriction is not one which seriously affects computation, since the use of more
complicated states of mind can be avoided by writing more symbols on the tape.
Let us imagine the operations performed by the computer to be split up into simple
operations which are so elementary that it is not easy to imagine them further
divided. Every such operation consists of some change of the physical system
consisting of the computer and his tape. We know the state of the system if we
know the sequence of symbols on the tape, which of these are observed by the
computer (possibly with a special order), and the state of mind of the computer.
We may suppose that in a simple operation not more than one symbol is altered.
Any other changes can be split up into simple changes of this kind. The situation
in regard to the squares whose symbols may be altered in this way is the same as
in regard to the observed squares. We may, therefore, without loss of generality,
assume that the squares whose symbols are changed are always observed squares.
Besides these changes of symbols, the simple operations must include changes of
distribution of observed squares. The new observed squares must be immediately
recognizable by the computer. I think it is reasonable to suppose that they can only
be squares whose distance from the closest of the immediately previously observed
squares does not exceed a certain fixed amount. Let us say that each of the new
observed squares is within L squares of an immediately previously observed square.
In connection with immediate recognizability, it may be thought that there are
other kinds of squares which are immediately recognizable. In particular, squares
marked by special symbols might be taken as immediately recognizable. Now if
these squares are marked only by single symbols there can be only a finite number of
them, and we should not upset our theory by adjoining these marked squares to the
observed squares. If, on the other hand, they are marked by a sequence of symbols,
we cannot regard the process of recognition as a simple process. This is a fundamental point and should be illustrated. In most mathematical papers the equations
and theorems are numbered. Normally the numbers do not go beyond (say) 1000.
It is, therefore, possible to recognize a theorem at a glance by its number. But if
the paper was very long, we might reach Theorem 157767733443477; then, further
on in the paper, we might find ... hence (applying Theorem 157767733443477) we
have ... In order to make sure which was the relevant theorem we should have to
compare the two numbers figure by figure, possibly ticking the figures off in pencil
to make sure of their not being counted twice. If in spite of this it is still thought
that there are other immediately recognizable squares, it does not upset my contention so long as these squares can be found by some process of which my type of
machine is capable.
The simple operations must therefore include:
(a) Changes of the symbol on one of the observed squares.
(b) Changes of one of the squares observed to another square within L squares of
one of the previously observed squares.
It may be that some of these changes necessarily involve a change of state of mind.
The most general single operation must therefore be taken to be one of the following:
(A) A possible change (a) of symbol together with a possible change of state of
mind.
(B) A possible change (b) of observed squares, together with a possible change of
state of mind.
The operation actually performed is determined, as has been suggested [above] by
the state of mind of the computer and the observed symbols. In particular, they
determine the state of mind of the computer after the operation.
We may now construct a machine to do the work of this computer. To each state
of mind of the computer corresponds an m-configuration of the machine. The
machine scans B squares corresponding to the B squares observed by the computer.
In any move the machine can change a symbol on a scanned square or can change
any one of the scanned squares to another square distant not more than L squares
from one of the other scanned squares. The move which is done, and the succeeding
configuration, are determined by the scanned symbol and the m-configuration. The
machines just described do not differ very essentially from computing machines as
defined (previously) and corresponding to any machine of this type a computing
machine can be constructed to compute the same sequence, that is to say the
sequence computed by the computer.
1.2.2
The machines mentioned in Turing's analysis are called Turing machines. The wide-ranging identification of the intuitive notion of effective procedure with the mathematical
concept of Turing machine (and related identifications) has become well-known as the
Church-Turing thesis, named after Church and Turing, two pioneers of computability
[170, 22, 23].
The thesis is not amenable to mathematical proof since it identifies an intuitive notion with a mathematical concept; however we shall provide various kinds of evidence
supporting it. In one direction this is easy: the Turing machine (as well as other computational models we will introduce) is sufficiently simple that its computations are certainly
effective in any reasonable sense. In the other direction, Turing's analysis is a rather convincing argument for the Turing machine's generality.
There are many other notions of effective procedure than Turing machines, e.g.,
Recursive functions as defined by Kleene [98]
The lambda calculus approach to function definitions due to Church [22, 23].
Random access machines [163]
Markov algorithms [115]
Despite considerable differences in formalism, some common characteristics of these notions are [155]:
1. An effective procedure is given by means of a set of instructions of finite size. There
are only finitely many different instructions.
2. The computation is carried out in a discrete stepwise fashion, without the use of
continuous methods or analogue devices.
3. The computation is carried out deterministically, without resort to random methods
or devices, e.g., dice.
4. There is no a priori fixed bound on the amount of memory storage space or time
available, although a terminating computation must not rely on an infinite amount
of space or time.
5. Each computational step involves only a finite amount of data.
All of the above notions of effective procedure have turned out to be equivalent. In view
of this, the Church-Turing thesis is sometimes expressed in the following form:
1. All reasonable formalizations of the intuitive notion of effective computability are
equivalent;
1.2.3
Discussions of the question whether algorithms are hardware or software resemble those
of whether the chicken or the egg came first, but are nonetheless worthwhile since much
literature on computability, and especially on complexity theory, is implicitly biased
toward one or the other viewpoint. For example, the phrase "Turing machine" carries overtones of hardware, and the "states of mind" of Turing's argument seem to correspond
to machine states.
The hardware viewpoint states that an algorithm is a piece of machinery to realize
the desired computations. The set of instructions is a specification of its architecture. At any one point in time a total machine state comprises the instruction it is
currently executing and its memory state. Larger algorithms correspond to larger pieces
of hardware.
The problem of not limiting the amount of storage can be handled several ways:
Assume given an infinite separate storage unit, e.g., Turing's tape;
Assume an idealized hardware which is indefinitely expandable, though always
finite at any one point in time; or
Work with an infinite family of finite machines M1 , M2 , . . ., so larger input data is
processed by larger machines.
The last way corresponds to what is often called circuit complexity. One usually requires
the sequence M1 , M2 , . . . to be uniform, so progressively larger data are not processed by
completely disparate machines.
The software viewpoint states that the algorithm is a set or sequence of instructions.
For instance, an algorithm can simply be a program in one's favorite programming language. The computing agent then interprets the algorithm; it can be a piece of hardware, or it can be software: an interpreter program written in a lower-level programming
language. Operationally, an interpreter maintains a pointer to the current instruction
within the algorithm's instruction set, together with a representation of that algorithm's current storage state. Larger algorithms correspond to larger interpreted programs, but
the interpreter itself remains fixed, either as a machine or as a program.
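To make the software viewpoint concrete, here is a small sketch in Python using a made-up toy instruction set (not the book's WHILE language): the interpreter keeps a pointer to the current instruction plus a store, and the interpreter itself stays fixed no matter how large the interpreted program is.

def interpret(program, store):
    pc = 0                                     # pointer to the current instruction
    while pc < len(program):
        op, *args = program[pc]
        if op == "inc":                        # ("inc", var): var := var + 1
            store[args[0]] += 1
        elif op == "dec":                      # ("dec", var): var := var - 1
            store[args[0]] -= 1
        elif op == "jmpz" and store[args[0]] == 0:
            pc = args[1]                       # ("jmpz", var, target): jump if var = 0
            continue
        elif op == "jmp":
            pc = args[0]                       # ("jmp", target): unconditional jump
            continue
        pc += 1
    return store

copy = [("jmpz", "Y", 4),                      # 0: while Y is not 0:
        ("dec", "Y"),                          # 1:   Y := Y - 1
        ("inc", "X"),                          # 2:   X := X + 1
        ("jmp", 0)]                            # 3: repeat
print(interpret(copy, {"X": 0, "Y": 3}))       # {'X': 3, 'Y': 0}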
The first fully automatic computer was von Neumann's stored-program machine. It
consisted of a piece of hardware, the central processing unit (CPU), specifically designed
to interpret the program stored in its memory; and this memory was physically decoupled
from the CPU. Thus the software viewpoint was present from hardware's first days and characterizes most of today's computers. Nonetheless the distinction is becoming yet less clear because today's chip technology allows relatively easy construction of special-purpose digital hardware for rather complex problems, something which was impractical only a few years ago. Further, even though Turing's machine is described in hardware
terms, it was Alan Turing himself who proved the existence of a universal machine: a
single Turing machine capable of simulating any arbitrary Turing machine, when given
its input data and an encoding of its instruction set.
This book mostly takes the viewpoint of algorithm as software, though the random
access machine model will come closer to hardware.
1.3
What is a problem?
1.3.1
1.3.2
On data representation
It might seem that the definition of an effectively computable function depends on the
notation used to represent the arguments. For instance, the addition procedure above
uses the decimal representation of natural numbers.
However, this makes no difference as long as there is an effective procedure that
translates from one notation to another and back. Suppose we have an effective procedure
p which will compute f if the argument is expressed in notation B. The following effective
procedure will then compute f in notation A:
1. Given x in notation A, translate it into notation B, yielding y.
2. Apply procedure p to y, giving z = f (y), in notation B.
3. Translate z back into notation A, giving the result.
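As an illustration of these three steps, here is a small sketch in Python; the two notations and the procedure p are invented for the example.

def p(y):                      # effective procedure for f in notation B (unary strings)
    return y + y               # doubling a unary numeral is just concatenation

def a_to_b(x):                 # translate notation A (decimal string) into notation B
    return "1" * int(x)

def b_to_a(z):                 # translate notation B back into notation A
    return str(len(z))

def f_in_notation_a(x):        # the derived effective procedure for notation A
    return b_to_a(p(a_to_b(x)))

print(f_in_notation_a("21"))   # prints "42"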
In the remainder of this chapter we shall be informal about data representations.
1.3.3
We stress the important distinction between an algorithm and the mathematical function
it computes. A mathematical function is a set. For instance, the unary number-theoretic
function which returns its argument doubled is:
{(1, 2), (2, 4), (3, 6), . . .}
For convenience one always writes this function thus: f (n) = 2n. So, a function associates a result with each input, but does not say anything about how the result can be
computed.1
On the other hand, an algorithm is a text, giving instructions on how to proceed
from inputs to result. We can write algorithms which, when fed a representation of a
number as input, will compute the representation of another number as output, and the
connection between input and output can be described by a mathematical function. For
instance, an algorithm p may, from the representation of n, compute the representation
of 2n. In this case we say that p computes the function f (n) = 2n, and we write [[p]] = f .
We pronounce [[p]] the meaning of p.
Given a formalization of effective procedure, that is, given a programming language
L, we may ask: what mathematical functions can be computed by algorithms in the language? We say that the programming language defines the class of all such mathematical
functions:
{[[p]] | p is an L-program }
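To make the distinction concrete, here is a tiny sketch in Python (standing in for the language L): a program is a piece of text, and its meaning [[p]] is the mathematical function that the text denotes.

p = "lambda n: 2 * n"          # the algorithm: a concrete program text
meaning = eval(p)              # [[p]]: the function denoted by that text
assert meaning(3) == 6         # so the pair (3, 6) belongs to the function f(n) = 2n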
The relationship between algorithms and functions is a bit subtle. Consider, for instance,
the function f : IN → IN, defined by:

    f(n) = 1  if Goldbach's conjecture is true
    f(n) = 0  if Goldbach's conjecture is false

(Goldbach's conjecture states that every even number greater than 2 is the sum of two prime numbers. Whether the conjecture is true is not known [155].) There is an algorithm computing f; either it is the algorithm which always returns the representation of 0, or it is the algorithm which always returns the representation of 1, but we do not yet know which of the two it is.
Thus there are functions for which it has been proved that an algorithm exists, and
yet no concrete algorithm computing the function is known2 . There are also examples of
functions where it is not yet known whether corresponding algorithms exist at all, and
there are functions for which it is known that there definitely do not exist any algorithms
that compute them. We shall soon see an example of the last kind of function.
1 If the reader is not comfortable with the notion of a function simply being a certain set, Subsection A.3.1 may be consulted.
2 This can only happen if the proof is by classical logic; in intuitionistic logic proofs of existence are
always constructive.
1.3.4
How can we apply the idea of an effective procedure to the problem of definition of
sets? For example the set of prime numbers seems intuitively effective, in that given an
arbitrary number we can decide whether or not it is a prime.
Definition 1.3.2 3 Given a set D, and a subset S ⊆ D. S is effectively decidable iff there is an effective procedure which, when given an object x ∈ D, will eventually answer yes if x ∈ S, and will eventually answer no if x ∉ S.
2
Note that the procedure eventually halts for any input x.
The problem of deciding some set S can sometimes equally naturally be phrased as
the problem of computing a certain function, and vice versa, as we shall see later on. An
alternative notion is to call a set effective if its elements can be listed in an effective way.
Definition 1.3.3 3 Given a set D, and a subset S ⊆ D. S is effectively enumerable iff there is an effective procedure which, when given an object x ∈ D, will eventually answer yes if x ∈ S, and will answer no or never terminate if x ∉ S.
2
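To make the two notions concrete, here is a small sketch in Python; the particular sets are chosen only for illustration. A decision procedure always halts with yes or no, whereas a procedure in the style of Definition 1.3.3 answers yes on members but may run forever on non-members.

def is_prime(x):                          # decides the set of primes: always halts
    return x >= 2 and all(x % k for k in range(2, x))

def in_range_of(f, x):                    # semi-decides the set { f(0), f(1), f(2), ... }
    n = 0
    while True:                           # may loop forever if x is not in the set
        if f(n) == x:
            return True
        n += 1

print(is_prime(97))                       # True; is_prime(98) halts with False
print(in_range_of(lambda n: n * n, 49))   # True; with 50 instead it would never halt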
The collection of all subsets of any infinite set (for example IN ) is not countable (Exercise
1.3). This can be proven by diagonalization as introduced in the next section.
On the other hand, the collections of all effectively decidable (or effectively enumerable) subsets of IN are each countable, since for each nonempty set there exists a program
computing a function that decides it (enumerates it), and there is only one empty set.
We will see that there exist effectively enumerable sets which are not effectively decidable. This, too, can be proven by diagonalization; a formal version will be seen later,
as Corollary 5.6.2.
1.4
In this section we review some of the basic results and techniques of computability in an
informal manner.
1.4.1
1.4.2
Proof. We use Cantor's well-known diagonal argument. Suppose the set of all functions f : IN → IN were countable. Then there would be an enumeration f0, f1, f2, ... such that for any total function f : IN → IN, there is an i such that fi = f, i.e., fi(x) = f(x) for all x ∈ IN.
Consider the function g defined by:

    g(x) = fx(x) + 1

This is certainly a total function from IN to IN. Therefore g must be fi for some i. But this is impossible, as it implies, in particular, that

    fi(i) = g(i) = fi(i) + 1        (1.1)

and so 0 = 1, which is impossible. 2
The proof technique above, called diagonalization, has many applications in computability and complexity theory. To understand the name of the technique, imagine the values
of countably many functions f0 , f1 , f2 , . . . listed in an infinite table for the arguments
0, 1, 2, . . .:
    n     f0(n)    f1(n)    f2(n)    ...
    0     f0(0)    f1(0)    f2(0)    ...
    1     f0(1)    f1(1)    f2(1)    ...
    2     f0(2)    f1(2)    f2(2)    ...
    ...   ...      ...      ...
For instance, the first column defines f0 . Given a countable set of total functions from IN
to IN , the diagonal method constructs a new function which differs from the ith function
on the argument i in the diagonal. Thus from any enumeration of total functions from
IN to IN , at least one total function from IN to IN must be absent.
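The construction can also be phrased as a small sketch in Python; the enumeration below is invented just to have something concrete to diagonalize against.

def enum(i):
    # pretend this enumerated *all* total functions from IN to IN
    return lambda x: i * x + i          # f_i(x) = i*x + i, for illustration only

def g(x):
    return enum(x)(x) + 1               # differ from f_x at the diagonal argument x

for i in range(5):
    assert g(i) != enum(i)(i)           # g disagrees with every f_i at the argument i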
Note that diagonalization does not directly imply the uncountability of the set of
partial functions from IN to IN , since the analog of (1.1) for partial functions is not a
contradiction in case fi (i) is undefined.
Corollary 1.4.2 The following sets are also uncountable:
1. All partial functions f : IN → IN.
2. All total functions f : IN → {0, 1}.
3. All total functions f : A → B where A is infinite and B has at least two elements.
2
Proof. See the Exercises.
1.4.3
Proposition 1.4.3 3 The set of all effectively computable partial functions from IN to
IN is countable.
2
Proof. By the Church-Turing Thesis each effectively computable function is computed
by some Turing machine. A Turing machine can be represented as a finite string of
symbols over an alphabet consisting of English letters and mathematical and punctuation
symbols. The set of all finite strings over any finite alphabet is countable, so the set of
all Turing machines is countable; hence the set of all effectively computable functions
must be countable as well.
2
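The fact used here, that all finite strings over a finite alphabet can be listed, can be sketched in Python as follows: list the strings by increasing length, and in a fixed order within each length.

from itertools import count, product

def all_strings(alphabet):
    for length in count(0):                      # lengths 0, 1, 2, ...
        for chars in product(alphabet, repeat=length):
            yield "".join(chars)

gen = all_strings("ab")
print([next(gen) for _ in range(7)])             # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']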
Corollary 1.4.4 3 The set of all effectively computable total functions from IN to IN
is countable.
2
Proof. A subset of a countable set is countable.
Corollary 1.4.5 3
1. There exists an effectively uncomputable total function from IN to IN .
2. There exists an effectively uncomputable partial function from IN to IN .
Proof. By Corollary 1.4.2 there are uncountably many total and partial functions, but
by Proposition 1.4.3 and Corollary 1.4.4 only countably many of these are effectively
computable. If S is a countable subset of an uncountable set T then T \ S ≠ ∅.
2
It follows from this that the set of computable functions is small indeed, and that there
are uncountably many uncomputable functions.
The next two subsections give more examples.
1.4.4
The argument in the preceding subsection shows the existence of uncomputable functions,
but not in a constructive way, as no explicit well-defined but uncomputable function was
exhibited.
We now give a concrete example of an unsolvable problem: It is impossible effectively
to decide, given an arbitrary program p and input d, whether or not the computation
resulting from applying p to d halts. The following proof may be carried out in any
reasonable programming language. Assumptions:
1. Any program p has the form read X1 ,...,Xn ; C; write Y.
2. Any program p denotes a partial mathematical function [[p]] : Vⁿ → V for some n,
as sketched in subsection 1.3.3.
3. The value domain V contains at least two distinct elements, which we call true
and false.
17
4. There is an effective procedure that, given any program p and input value d from
V , will execute p on d and deliver the resulting output (the value of output variable
Y).6
Proposition 1.4.6 3 The total function

    halt(p, d) = true    if [[p]](d)↓, i.e., program p terminates on d
    halt(p, d) = false   if [[p]](d) = ⊥, i.e., program p does not terminate on d

is not computed by any program.
Proof. Suppose halt were computed by some program q, i.e., for any program p and input
value d
    [[q]](p, d) = true    if [[p]](d)↓
    [[q]](p, d) = false   if [[p]](d) = ⊥
By assumption this has the form: q = read P,D; C; write Y. Now consider the following program r, built from q:
read X;
Apply program q to input (X,X);    (* Does program X stop on input X?    *)
if Y then
    while Y do Y := Y;             (* Loop if X stops on input X          *)
write Y                            (* Terminate if X does not stop on X   *)
Now let us see what happens if we give r as input to the program just built, i.e., apply
r to itself: X = r.
Clearly one or the other of the two assertions [[r]](r)↓ or [[r]](r) = ⊥ must be true.
If [[r]](r)↓, then program q will yield Y = true on input (r, r). However Y = true implies that program r, when it reaches command while Y do Y := Y;, will not terminate on input r, a contradiction.
Conclusion: [[r]](r) = ⊥ must be true. But this implies that program q will yield Y = false on input (r, r). Thus command while Y do Y := Y; exits without looping, so program r will terminate on input r, another contradiction.
Thus every possibility leads to a contradiction. The only unjustified assumption above
was the existence of a program q that computes halt, so this must be false.7
2
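The shape of the argument can be sketched in Python, assuming, for the sake of contradiction, a function halts(p, d) that decides whether program text p halts on input d; no such function can actually be written.

def halts(p, d):
    raise NotImplementedError("assumed only for the sake of contradiction")

r_source = """
def r(x):
    if halts(x, x):      # does program x stop on input x?
        while True:      # ...then loop forever
            pass
    return None          # ...otherwise terminate
"""

# Feeding r its own text leads to a contradiction either way:
# if halts(r_source, r_source) were True, r would loop on r_source, so it does not halt;
# if it were False, r would return at once, so it does halt.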
1.4.5
The busy beaver function below, due to Rado [149] and related to the Richard paradox
[152], is mathematically well-defined. It is just as concrete as the halting problem just
seen, and is in a sense more elementary. Based on certain reasonable assumptions about
the language used to express computation, we will show that there is no algorithm which
computes it.
Assumptions: Any program p denotes a partial mathematical function [[p]] : IN → IN, as sketched in subsection 1.3.3. Any program p has a length |p| ∈ IN : the number of
symbols required to write p. For any n, there are only finitely many programs with
length not exceeding n.
We use programs in a small subset of Pascal [72, 174] with the following notation.
Programs have the form read X; C; write Y, where X, Y are variables. Commands
C can be either of V:=n, V:=W+1, V:=W-1, where V, W are variables and n is a number
in decimal representation (similar constructions can be carried through with unary and
other representations). Commands of the forms C;C and while X>0 do begin C end
have the usual meanings.
Observation: |p| ≥ 19 for any program p = read X;C;write Y.
Proposition 1.4.7 The total function

    BB(n) = max{ [[p]](0) | p is a program with |p| ≤ n, and [[p]](0)↓ }

is computed by no program.8  2
Proof. Suppose for the sake of contradiction that some program q computes BB:

    read X; C; write Y
The proof uses a form of diagonalization. We present the idea in three small steps.
Step 1. The idea in deriving a contradiction is to find a number K and a program r such that |r| ≤ K and [[r]](0) = [[q]](K) + 1. This implies

    [[q]](K) = BB(K)            since q computes BB
             ≥ [[r]](0)          since |r| ≤ K and [[r]](0)↓
             = [[q]](K) + 1      by definition of r

which is a contradiction.
8 Where we define max ∅ = 0.
1.4.6
It is not hard to write programs in the small subset of Pascal of the previous section
which do not halt, e.g.,
read X; X:=1; while X > 0 do begin X:=X end; write X
The following is another proof that it is impossible effectively to decide the halting
problem.
Corollary 1.4.8 3 The total function

    halt(p, n) = 1   if [[p]](n)↓
    halt(p, n) = 0   otherwise

is not effectively computable.
Proof. Suppose, for the sake of contradiction, that such a procedure does exist. Then
BB can also be computed by the following procedure:
1. Read n.
2. Set max = 0.
3. Construct {p1, ..., pk} = {p | p is a program and |p| ≤ n}.
4. For i = 1, 2, ..., k do: if [[pi]](0)↓ and max < [[pi]](0), then reassign max := [[pi]](0).
5. Write max.
Step 3 is effective since there are only finitely many programs of any given size, and step
4 is effective by assumption. By the Church-Turing thesis one can turn this procedure
into a program in our subset of Pascal. The conclusion that BB is computable by a
program in this language is in contradiction with Proposition 1.4.7, so the (unjustified)
assumption that q exists must be false.
2
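The reduction used in this proof can be sketched in Python; here halts, run, and the program enumeration are assumed oracles standing in for the hypothetical halting decider and the Pascal subset, and cannot all actually be implemented.

def halts(p, d):                    # assumed decision procedure for the halting problem
    raise NotImplementedError

def run(p, d):                      # assumed: execute program text p on input d
    raise NotImplementedError

def programs_of_size_at_most(n):    # the finitely many program texts of length <= n
    raise NotImplementedError

def BB(n):
    best = 0                                    # max of the empty set is 0
    for p in programs_of_size_at_most(n):
        if halts(p, 0) and run(p, 0) > best:    # only run programs known to halt on 0
            best = run(p, 0)
    return best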
1.4.7
We have just argued informally that the halting problem is not decidable by any program
of the same sort. This is analogous to the classical impossibility proofs, for example that
the circle cannot be squared using tools consisting of an unmarked ruler and a compass.
Such classical impossibility proofs, however, merely point out the need for stronger tools,
for instance a marked ruler, to solve the named problems.
Our busy beaver argument similarly asserts that one particular problem, the halting
problem, cannot be solved by means of any of a class of tools: programs in our Pascal
subset. But here a major difference arises because of the Church-Turing thesis. This
gives the undecidability of the halting problem much more weight since it implies that the
halting problem is not decidable by any intuitively effective computing device whatsoever.
1.5
Recall that computability theory is concerned with questions such as whether a problem is solvable at all, assuming one is given unlimited amounts of space and time. In
contrast, complexity theory is concerned with questions such as whether a problem can
be solved within certain limited computing resources, typically space or time. Whereas
computability theory is concerned with unsolvable problems and the boundary between
solvable and unsolvable problems, complexity theory analyzes the set of solvable problems.
To address such questions, one must have a precise definition of space and time costs.
Granted that, complexity theory asks questions such as:
Which problems can be solved within a certain limit of time or space, and which
cannot?
Are there resource limits within which a known combinatorial problem definitely
cannot be solved?
Are there problems which inherently need more resources than others?
What characteristics of problems cause the need for certain amounts of resources?
What is the class of problems solvable within certain resource limits, and what are
the basic properties of this class?
Given a problem, what is the complexity of its best algorithm?
Do best algorithms always exist?
Does adding more resources allow one to solve more problems?
1.5.1
Polynomial time
Similarly to the situation in computability theory, one might fear that one single definition
of resource accounting would not suffice, and in fact different models exist giving rise to
different theories of complexity. Specifically, the class of problems solvable within certain
sharp limits may vary from model to model.
However, we will see that many computation models define precisely the same class
ptime of problems decidable within time bounded by some polynomial function of the
length of the input. Many researchers identify the class of computationally tractable
problems with those that lie in ptime, thereby suggesting what could well be called Cook's
thesis, after Stephen C. Cook, a pathbreaking researcher in computational complexity:
1.5.2
Ideally, one would like to be able to make statements such as "the XXX problem can be solved in time O(n^3) (as a function of its input size); and it cannot be solved in time O(n^(3-ε)) for any ε > 0." Alas, such definitive statements can only rarely be proven. There
are a few problems whose exact complexity can be identified, but very few.
Because of this, a major goal of complexity theory is classification of problems by
difficulty. This naturally leads to a division of all problems into hierarchies of problem
classes. Standard classes of problems include: logspace, nlogspace, ptime, nptime, pspace. Each class is characterized by certain computational resource bounds. For
example, problems in logspace can be solved with very little storage; those in ptime
can be solved with unlimited storage, but only by algorithms running in polynomial time;
and those in nptime can be solved by polynomial time algorithms with an extra feature:
they are allowed to guess from time to time during their computations.
Various combinations of these resources lead to a widely encompassing backbone
hierarchy:
logspace ⊆ nlogspace ⊆ ptime ⊆ nptime ⊆ pspace = npspace ⊆ rec ⊆ re
Surprisingly, it is not known whether any one of the inclusions above is proper: for
example, the question ptime = nptime?, often expressed as P = NP?, has been open
for decades.
Nonetheless, this hierarchy has proven itself useful for classifying problems. A great
many problems have been precisely localised in this hierarchy. A typical example is SAT,
the problem of deciding whether a Boolean expression can be made true by assigning
truth values to the variables appearing in it. This problem is complete for nptime,
meaning the following. First, SAT is in nptime: There is a nondeterministic algorithm
that solves it and runs in polynomial time. Second, it is hardest among all problems in
nptime: If it were the case that SAT could be solved by a ptime algorithm, then every
problem in nptime would have a deterministic polynomial time solution, and ptime =
nptime would be true. This means that two stages of the hierarchy would collapse.
The last four chapters of this book concern complete problems for the various complexity classes.
1.6
Historical background
At the Paris Conference in 1900 D. Hilbert gave a lecture which was to have profound
consequences for the development of Mathematics, particularly Mathematical Logic, and
the not yet existing field of Computer Science. Hilbert's ambitions were high and his
belief in the power of mathematical methods was strong, as indicated by the following
quote from his lecture:
Occasionally it happens that we seek the solution under insufficient hypotheses or
in an incorrect sense, and for this reason do not succeed. The problem then arises:
to show the impossibility of the solution under the given hypotheses, or in the
sense contemplated. Such proofs of impossibility were effected by the ancients, for
instance when they showed the ratio of the hypotenuse to the side of an isosceles right triangle is irrational. In later mathematics, the question as to the impossibility of
certain solutions plays a preeminent part, and we perceive in this way that old and
difficult problems, such as the proof of the axiom of parallels, the squaring of the
circle, or the solution of equations of the fifth degree by radicals have finally found
fully satisfactory and rigorous solutions, although in another sense than originally
intended. It is probably this important fact along with other philosophical reasons
that gives rise to the conviction (which every mathematician shares, but which
no one has as yet supported by a proof) that every definite mathematical problem
various formalisms were soon proved by Kleene, Turing, and others. In fact, one can
write compilers that turn a program in one formalism into a program in one of the
other formalisms that computes the same function, supporting what we have previously
called the Church-Turing thesis. It should be noted that this correspondence between the
algorithms in the various formalisms is a stronger result than the fact that the various
formalisms define the same class of functions.
The initial work in complexity theory in the late 1920s and early 1930s was concerned
with subclasses of the effectively computable functions, e.g., the primitive recursive functions studied by Hilbert [69], Ackermann [1], and others. Subclasses of primitive recursive
functions were studied by Kalmar [92] and Grzegorczyk [58]. More programming language
oriented versions of these classes were later introduced by Meyer and Ritchie [125].
With the appearance of actual physical computers in the 1950s, an increasing interest
emerged in the resource requirements for algorithms solving various problems, and the
field of complexity as it is known today began around 1960. One of the first to consider
the question as to how difficult it is to compute some function was Rabin [145, 146].
Later, Blum [14] introduced a general theory of complexity independent of any specific
model of computation.
The first systematic investigation of time and space hierarchies is due to Hartmanis,
Lewis, and Stearns [65, 64, 109] in the 1960s, who coined the term computational
complexity for what we call complexity theory in this book.
Important results concerning the classes of problems solvable in polynomial time and
non-deterministic polynomial time were established by Cook [26] and Karp [95] who were
among the first to realize the importance of these concepts.
Exercises
1.1 Consider the set of all Turing machine programs. Does Turing argue that the tape
symbol alphabets of different programs should be uniformly bounded in size, or may
different machines each have their own alphabets, without any uniform size bound? 2
1.2 Again, consider the set of all Turing machine programs, and assume that the tape
symbol alphabets of different programs are uniformly bounded in size.
Could one reasonably argue that the set of states of mind should be uniformly
bounded as well? Hint: What would be the effect of bounding both of these on the
number of problems solvable by Turing machines?
2
1.3 Prove that P(IN ), the set of all subsets of IN , is uncountable, using the diagonal
method. Hint: if all of P(IN ) could be listed S1 , S2 , . . ., then one can find a new subset
of IN not in this list.
2
1.4 Prove that the set of all total functions IN → {0, 1} is not countable.
1.5 Let A and B be sets and let S be a non-empty set of partial functions from A into B, i.e., S ⊆ (A → B⊥). Show that the following conditions are equivalent.
1. S is countable.
2. There is a sequence f0, f1, ... so that g ∈ S if and only if g ≃ fi for some i.
3. There is a surjective function u : IN → S.
4. There is a function u : IN → (A → B⊥) such that g ∈ S if and only if g ≃ u(i) for some i.
5. There is a partial function u : (IN × A) → B⊥ such that g ∈ S if and only if there is an i ∈ IN such that g(a) ≃ u(i, a) for all a in A.
The reader should note that the f's, g's, etc. above are functions, and that these are not
necessarily computed by any algorithms.
2
1.6 Consider a language like the subset of Pascal in Subsection 1.4.5, but with the
following modification. Instead of commands of form
while X > 0 do begin C end
there are only commands of form
for X := 1 to n do begin C end
where n is a numerical constant, with the usual meaning. (It terminates immediately if
n < 1.) Variable X may not be re-assigned within command C.
Use a construction similar to the one in Subsection 1.4.5 to show that there is a
function which is not computable in this language. Is the function effectively computable
at all?
2
1.7 * Change the language of the previous exercise by expanding the iteration statements syntax to
for X := E1 to E2 do begin C end
where E1 and E2 are numerical expressions. (X may still not be assigned within command
C.) Consider two alternative ways to interpret this statement (using a goto syntax):
Semantics 1: the bound E2 is evaluated once, on entry to the loop.

    X := E1; M := E2;
 1: if X > M then goto 2;
    C;
    X := X + 1;
    goto 1;
 2: ...

Semantics 2: E2 is re-evaluated before every iteration.

    X := E1;
 1: if X > E2 then goto 2;
    C;
    X := X + 1;
    goto 1;
 2: ...

Show that every program terminates under semantics 1, but that some may loop under
semantics 2.
2
References
For more on the historical development of computability theory, in particular fuller discussions of the Church-Turing Thesis, see Gandy's paper [51] or Kleene's classical book [100]. A number of early papers on computability are reprinted in Davis' book [34] with comments. This includes an English translation of Gödel's paper. Presentations of Gödel's results for non-specialists appear in the books by Nagel and Newman [135] and Hofstadter [70].
More information about the scope and historical development of complexity theory
may be found in the surveys [15, 17, 29, 63, 147]. Broadly encompassing surveys of
complete problems may be found in the books by Garey and Johnson, and by Greenlaw,
Hoover, and Ruzzo [52, 56].
The notions of the introductory chapter, e.g., "effectively computable," were imprecise, because they relied on an intuitive understanding of the notion "effective procedure." We
now present a model of computation, or programming language, called WHILE, which
is used throughout the book. In subsequent chapters we define the intuitive notions
of the preceding chapter precisely, by identifying effective procedure with WHILE
program.
It may seem that we avoid the vagueness of intuitive argumentation by going to the
opposite extreme of choosing one model of computation which is too simple to model
realistic computing. Later chapters will argue that this is not the case, by proving the
equivalence of WHILE with a variety of other computation models.
The WHILE language has just the right mix of expressive power and simplicity.
Expressive power is important because we will be presenting many algorithms, some
rather complex, that deal with programs as data objects. The data structures of WHILE
are particularly well suited to this, and are far more convenient than the natural numbers
used in most theory of computation texts. Simplicity is essential since we will be proving
many theorems about programs and their behaviour. This rules out the use of larger,
more powerful languages, since proofs about them would necessarily be too complex to
be easily understood.
Section 2.1 describes the WHILE syntax and informally describes the semantics of
programs. Section 2.2 precisely describes the semantics. Section 2.3 shows that equality
tests may without loss of generality be restricted to atomic values, each taking constant
time. This will be relevant later, when discussing time-bounded computations.
2.1
The syntax of WHILE data structures and programs is described in Subsections 2.1.1–
2.1.2. Subsection 2.1.3 informally explains the semantics of WHILE-programs by means
of an elaborate example. Subsection 2.1.4 concerns conditionals and truth values in
WHILE, and Subsections 2.1.5-2.1.6 show how to compute with numbers and lists in
WHILE. Finally, Subsection 2.1.7 describes a useful macro notation.
2.1.1
Recall the idealized subset of Pascal that we used in Subsection 1.4.5 in which one can
compute with numbers.1 It has commands to assign an arbitrary number to a variable,
and to increment and decrement a variable by one.
The language WHILE is very similar but with one very important difference: instead
of computing with numbers, the language computes with certain trees built from a finite
set. For instance, a and (a.c) as well as (a.(b.c)) are trees built from the set {a,b,c}.
The objects a,b,c are called atoms (definition) because, unlike for instance (a.c), they
cannot be divided further into subparts. The reason we call these objects trees is
that they can be represented in a graphical form as trees with atoms as leaf labels, see
Figure 2.1.
[Figure 2.1: Two elements of ID drawn as binary trees, with atoms as leaf labels:
((a.((b.e).c)).d) and (a.((b.(c.e)).(d.e))).]
Several atoms can be simulated with the single atom nil, e.g., the atoms a, b, and c can be coded
using three distinct binary trees in their places, e.g., (nil.nil) for a, (nil.(nil.nil))
for b, and ((nil.nil).nil) for c.
In informal examples we will often, for the sake of human readability, use more atoms
than nil, as in Figure 2.1, but in formal definitions we only use the one atom nil.
Formally we define the set of trees ID as follows.
Definition 2.1.1 The set ID of trees is defined by:
1. The atom nil is an element of ID;
2. Whenever d1 and d2 are elements of ID, then so is (d1 .d2 ); and
3. ID is the smallest set satisfying the previous two points.
2
The size |d| of a value d ∈ ID is defined by:

    |d| = 1                  if d is an atom
    |d| = |d1| + |d2|        if d = (d1.d2)
In Figure 2.1, the leftmost value has size 5, and the rightmost value has size 6.
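For concreteness, here is a small Python sketch (an illustration, not part of the WHILE formalism) of elements of ID as nested pairs, with the size |d| computed as just defined; atom names other than nil are used only for readability, as in Figure 2.1.

NIL = "nil"

def cons(d1, d2):
    return (d1, d2)

def is_atom(d):
    return not isinstance(d, tuple)

def size(d):
    # |d| = 1 for an atom, |d1| + |d2| for a pair (d1.d2)
    if is_atom(d):
        return 1
    d1, d2 = d
    return size(d1) + size(d2)

# ((a.((b.e).c)).d) from Figure 2.1 has size 5:
assert size(cons(cons("a", cons(cons("b", "e"), "c")), "d")) == 5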
2.1.2
Expressions  E, F  ::=  X                 (for X ∈ Vars)
                    |   d                 (for atom d)
                    |   cons E F
                    |   hd E
                    |   tl E
                    |   =? E F

Commands     C, D  ::=  X := E
                    |   C; D
                    |   while E do C

Programs           ::=  read X; C; write Y
Here X and Y are the not necessarily distinct input and output variables.
We use indentation to indicate the scope of while and other commands. For instance,
consider the two commands:
while E do
C;
D
while E do
C;
D
The leftmost command repeatedly executes C as long as E is true and executes D once
when E has become false (what it means that an expression is true or false will be clear
later on). The rightmost command repeatedly executes first C and then D, as long as E
is true.
We also use braces to indicate scope, so the two above commands might have been
written {while E do C }; D and while E do {C;D}. Similarly we use parentheses to
explicate scope in expressions, such as cons (hd (tl X)) Y.
Note that a program always expects exactly one input. A program of, say, two inputs
can be expressed as a function which expects one input of form (d.e):2
read X;        (* X is (d.e) *)
Y := hd X;     (* Y is d     *)
Z := tl X;     (* Z is e     *)
C;
write Y
2 Comments are enclosed in (* and *), as in Pascal.
2.1.3 Informal semantics
We now explain the semantics of a simple program to reverse a list, which illustrates
most aspects of WHILE.
Example 2.1.4 Consider the following program, reverse:
read X;
Y := nil;
while X do
Y := cons (hd X) Y;
X := tl X;
write Y
The program consists of a read command, a body, and a write command. The idea is
that some input d ID is assigned to the variable X, and then the body is executed. At
any point during execution every variable is bound to an element of ID; the collection of
all such bindings at one point is a store. Initially X is bound to the input d ID, and all
other variables in the program are bound to nil. If execution of the body terminates,
the value e ID last bound to Y is the output.
For reverse, if X is initially bound to input
(d0.(d1.(...(dn-1.(dn.nil))...)))
then Y is bound to
(dn.(dn-1.(...(d1.(d0.nil))...)))
when execution reaches the final write command, and this latter element of ID is then the
output.
To bind a variable, say Y, to some f ∈ ID one uses the assignment Y:=f. So the second
line assigns nil to Y.3
More generally every expression E evaluates to some e ∈ ID, and Z := E assigns this
e to Z. Specifically, an atom d evaluates to d itself. As another example cons E F evaluates to (e.f)
if E evaluates to e and F evaluates to f. Further, hd E and tl E evaluate to e and f,
respectively, if E evaluates to (e.f). Finally, a variable Z evaluates to the value it is
currently bound to.
3 Since
all variables are initially bound to nil this command is superfluous. However it often happens
that one assigns some f ID to a variable without ever making use of the initial value nil. Therefore, if
one does want to make use of the initial value nil, it is good programming practice to enter an explicit
assignment Y := nil in the program.
The expression =? E F evaluates to true, if E and F evaluate to the same value, and to
false otherwise. Thus =? (nil.nil) (nil.nil) evaluates to true, and =? (nil.nil)
nil evaluates to false.
Turning to our program, the next thing that happens is that the while command
beginning in the third line is executed. The meaning of the command while E do C is
as follows. If E evaluates to nil proceed to the command following while E do C. In the
example this is the command write Y. However, if E evaluates to something other than
nil execute C, and test again whether E evaluates to nil. The outcome of this test may
be different from the first since the variables occurring in E may have been assigned new
values by the command C. If E evaluates to nil, go to the next command, and otherwise
execute C and test E again, etc.
So in the example program, the commands Y := cons (hd X) Y; X := tl X are
executed in sequence as long as X is not bound to nil. Before the first of these two
commands X is bound to (e.d) (otherwise execution would have proceeded to the write
command) and Y is bound to some f. After the first command Y is bound to (e.f), and
after the second command X is bound to d.
If we think of the value (d0 .(d1 .( .(dn1 .(dn .nil)) ))) as a list d0 , d1 , . . .,
dn1 , dn , then the program reverses lists; more about lists in Subsection 2.1.5.
2
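The informal semantics above can be mimicked directly in Python (a sketch under the pair model used in the earlier illustration; hd, tl and cons become tuple operations, and nil plays the role of false in the test):

NIL = "nil"

def reverse_program(x):
    # Y := nil; while X do { Y := cons (hd X) Y; X := tl X }; write Y
    y = NIL
    while x != NIL:
        y = (x[0], y)      # Y := cons (hd X) Y
        x = x[1]           # X := tl X
    return y

# (a.(b.(c.nil))) reverses to (c.(b.(a.nil))):
assert reverse_program(("a", ("b", ("c", NIL)))) == ("c", ("b", ("a", NIL)))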
2.1.4
As is apparent from the preceding example, whenever evaluating expressions in tests one
should think of nil as false and any other element of ID as true. This intuition is so
predominant that we explicate it in a definition:
Definition 2.1.5 We use the following abbreviations:
false = nil
true
= (nil.nil)
Conditional commands and boolean expressions. WHILE has no built-in conditional, but
conditional commands can be expressed by while-commands alone.
Example 2.1.6 The following compound command executes C if and only if E evaluates
to true. Variable Z must be chosen different from existing variables.
Z := E;
(* if E then C *)
while Z do { Z := false; C };
35
The next statement will execute C1 if E evaluates to true and otherwise C2.
Z := E;
(* if E then C1 else C2 *)
W := true;
while Z do { Z := false; W := false; C1 };
while W do { W := false; C2 };
2
The same idea may be applied to expressions, rather than just commands, thus expressing
conjunction E and F, disjunction E or F, or negation not E, etc..
2.1.5
Lists
As one can see from the example in subsection 2.1.3, elements of ID sometimes have
deeply nested parentheses that are hard to read; one has to resort to counting to parse
an element like ((a.(b.nil)).((d.(e.nil)).nil)).
Often the nesting has a certain regular structure, because we often express a list of
elements d0, d1, ..., dn-1, dn as the tree (d0.(d1.(...(dn-1.(dn.nil))...))). For
instance (a.(b.nil)) represents the list consisting of elements a, b. Therefore it would
be particularly convenient to have a short notation for this form. Hence the idea is to
use the notation (d0 ... dn) for the tree (d0.(d1.(...(dn-1.(dn.nil))...))). Then the
tree (a.(b.nil)) can be written (a b) in short notation and, as another example, the
tree ((a.(b.nil)).((d.(e.nil)).nil)) can be written ((a b) (d e)).
This is introduced in the following definition.
Definition 2.1.7 The list representation of d ∈ ID (written with an underline) is the string
of symbols from the alphabet {nil, (, ., )} defined recursively as follows:

    d  =  d               if d is an atom
    d  =  (d1 ... dn)     if d = (d1.(d2.(...(dn.nil)...)))

(on the right-hand side, each di stands for the list representation of di). We call
(d1 ... dn) a list of length l(d) = n; nil is the empty list of length 0. In general,
a length may be computed for any element of ID by induction:

    l(nil)      =  0
    l((d1.d2))  =  1 + l(d2)
Notice that every element of ID has exactly one list representation. Henceforth we will
omit the underlines and write all values in the list form. Figure 2.2 gives some examples
of elements in ID and their list representations.
Value d in ID                           Representation    |d|   l(d)
nil                                     nil                1     0
(a.(b.nil))                             (a b)              3     2
(a.((b.(c.nil)).(d.nil)))               (a (b c) d)        6     3
((a.(((b.nil).nil).nil)).nil)           ((a ((b))))        6     1

Figure 2.2: Some elements of ID and their list representations.
The first example in the preceding subsection can now be expressed as saying that the
program reverses lists: if X was initially bound to input (d0 dn ) then Y is bound to
(dn d0 ) when execution reaches the final write command.
2.1.6 Numbers
WHILE has only one atom, so how can we compute with numbers? One idea is to
represent the number n by a list of length n.
Definition 2.1.8 Define the representation of the number n (written with an underline) to
be nil^n, where

    nil^0      =  nil
    nil^(n+1)  =  (nil.nil^n)  =  (nil nil ... nil)      (n+1 occurrences of nil)

As a matter of convenience, we will omit underlines and simply write 0, 1, 2, ... instead
of nil^0, nil^1, nil^2, .... With the representation in this definition, while E
do C means: as long as E does not evaluate to 0, execute C. As two very simple examples,
the successor and predecessor functions are computed by:
read X; (* succ *)
Y := cons nil X;
write Y
read X; (* pred *)
Y:=tl X;
write Y
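The number representation can likewise be sketched in Python (illustration only, in the pair model used earlier): a number n becomes a list of n occurrences of nil, and successor, predecessor and addition act on that representation exactly as the WHILE programs do.

NIL = "nil"

def to_num(n):
    d = NIL
    for _ in range(n):
        d = (NIL, d)                   # nil^(n+1) = (nil.nil^n)
    return d

def from_num(d):
    n = 0
    while d != NIL:
        n, d = n + 1, d[1]
    return n

def succ(d): return (NIL, d)                      # Y := cons nil X
def pred(d): return NIL if d == NIL else d[1]     # Y := tl X

def add(x, y):
    while x != NIL:                    # add 1 to y once for each element of x
        y, x = (NIL, y), x[1]
    return y

assert from_num(add(to_num(3), to_num(4))) == 7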
Here is a program for adding two numbers (note that XY is a single variable, whose value
is a pair):
read XY;          (* XY is (m.n), the two numbers to add *)
X := hd XY;       (* X is m                              *)
Y := tl XY;       (* Y is n                              *)
while X do        (* add 1 to Y, m times                 *)
  Y := cons nil Y;
  X := tl X;
write Y
2.1.7
[The worked macro-notation example from this subsection was lost in extraction; its surviving
comments indicate a program in which X is (d.e), A is d, Y is e, and B becomes d reversed.]
We will also allow names to stand for sequences of commands. Thus from now on,
programs may make free use of conditionals.
2.2
Recall from the introductory chapter the important distinction between algorithms and
the mathematical functions they compute. In this section we show how any program
in WHILE can be used to define a partial function from ID to ID. The interpretation is
nothing more than a precise statement of the informal semantics mentioned in Subsection 2.1.3.
Subsection 2.2.1 formalizes the notion of a store that was mentioned in Example 2.1.4.
Subsections 2.2.22.2.3 then formalize the notions of evaluation of an expression and
execution of a command, also mentioned in Example 2.1.4. Finally, Subsection 2.2.4
puts together the pieces.
2.2.1 Stores
The notation [x1 ↦ d1, ..., xn ↦ dn] denotes the finite function f such that f(xi) = di.
The notation f[x ↦ d] denotes the function g such that g(x) = d, and g(y) = f(y) for
y ≠ x. See Subsection A.3.6 in the Appendix for more information.
Definition 2.2.1 Let p be a program whose variables are Vars(p) = {X, Z1, ..., Zm}, with
input variable X. A store for p is a function from Vars(p) to ID; given an input d ∈ ID,
the initial store σ₀ᵖ(d) is

    [X ↦ d, Z1 ↦ nil, ..., Zm ↦ nil]

Note that if Y and X are different, Y is among the Zi.
2
2.2.2 Evaluation of expressions
Given a store σ containing the values of the variables in an expression E, the function E
maps E and σ into the value E[[E]]σ = d in ID that E denotes. For example E[[cons X Y]]σ =
((nil.nil).nil) if σ = [X ↦ (nil.nil), Y ↦ nil].
    E[[X]]σ         =  σ(X)
    E[[d]]σ         =  d
    E[[cons E F]]σ  =  (E[[E]]σ . E[[F]]σ)
    E[[hd E]]σ      =  e       if E[[E]]σ = (e.f)
                       nil     otherwise
    E[[tl E]]σ      =  f       if E[[E]]σ = (e.f)
                       nil     otherwise
    E[[=? E F]]σ    =  true    if E[[E]]σ = E[[F]]σ
                       false   otherwise
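To make the equations concrete, here is a Python sketch of E (an illustration, using the pair model from the earlier sketches); expressions are written as tagged tuples such as ("cons", ("var", "X"), ("quote", NIL)), anticipating the programs-as-data representation of Chapter 3.

NIL = "nil"
TRUE = (NIL, NIL)          # true = (nil.nil), false = nil

def eval_expr(e, store):
    tag = e[0]
    if tag == "var":   return store[e[1]]                    # E[[X]]sigma = sigma(X)
    if tag == "quote": return e[1]                           # E[[d]]sigma = d
    if tag == "cons":  return (eval_expr(e[1], store), eval_expr(e[2], store))
    if tag == "hd":
        v = eval_expr(e[1], store)
        return v[0] if isinstance(v, tuple) else NIL
    if tag == "tl":
        v = eval_expr(e[1], store)
        return v[1] if isinstance(v, tuple) else NIL
    if tag == "=?":
        return TRUE if eval_expr(e[1], store) == eval_expr(e[2], store) else NIL
    raise ValueError("unknown expression form")

# E[[cons X Y]]sigma = ((nil.nil).nil) when sigma = [X -> (nil.nil), Y -> nil]:
assert eval_expr(("cons", ("var", "X"), ("var", "Y")),
                 {"X": (NIL, NIL), "Y": NIL}) == ((NIL, NIL), NIL)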
2.2.3 Execution of commands
Given a store σ, the relation C ⊢ σ → σ' expresses the fact that the new store is σ' after
executing the command C in the store σ. (If command C does not terminate in the given
store σ, then there will be no σ' such that C ⊢ σ → σ'.) For instance,

    X:=cons X Y ⊢ [X ↦ nil, Y ↦ nil] → [X ↦ (nil.nil), Y ↦ nil]

Definition 2.2.3 Define the relation · ⊢ · → · ⊆ Command × Store_p × Store_p to be the
smallest relation satisfying:

    X:=E ⊢ σ → σ[X ↦ d]             if E[[E]]σ = d
    C;D  ⊢ σ → σ''                  if C ⊢ σ → σ' and D ⊢ σ' → σ''
    while E do C ⊢ σ → σ            if E[[E]]σ = nil
    while E do C ⊢ σ → σ''          if E[[E]]σ ≠ nil, C ⊢ σ → σ', and while E do C ⊢ σ' → σ''
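A matching Python sketch of command execution (again an illustration; it reuses eval_expr and NIL from the previous sketch, and represents a store as a Python dictionary):

def exec_cmd(c, store):
    tag = c[0]
    if tag == ":=":                        # X := E : the store becomes store[X -> E[[E]]store]
        store = dict(store)
        store[c[1]] = eval_expr(c[2], store)
        return store
    if tag == ";":                         # C;D : run C, then D in the resulting store
        return exec_cmd(c[2], exec_cmd(c[1], store))
    if tag == "while":                     # while E do C : iterate as long as E is not nil
        while eval_expr(c[1], store) != NIL:
            store = exec_cmd(c[2], store)
        return store
    raise ValueError("unknown command form")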
2.2.4
The function [[·]] maps a program p and input value d into a value [[p]](d) = e ∈ ID if
the program terminates. (If the program does not terminate there will be no e ∈ ID with
[[p]](d) = e.) This is done by executing C in the initial store σ₀ᵖ(d) (as in Definition 2.2.1)
and writing the value σ'(Y) bound to Y in the new store σ' resulting from execution of C.

Definition 2.2.4 The semantics of a WHILE program p = read X; C; write Y is the partial
function [[p]]WHILE defined by

    [[p]]WHILE(d) = σ'(Y)    if C ⊢ σ₀ᵖ(d) → σ'
We write [[p]] instead of [[p]]WHILE when no confusion is likely to arise. If there is no e such
that [[p]](d) = e, then p loops on d;4 otherwise p terminates on d. We also say that p
computes [[p]].
2
Given the precise semantics of programs one can prove rigorously such properties as
[[reverse]](d1 dn ) = (dn d1 ), see the exercises.
2.2.5
Given a program p and an input d on which p does not loop, how can we find the
corresponding output [[p]](d)? According to Definition 2.2.4 we have to find a store σ
such that C ⊢ σ₀ᵖ(d) → σ, and then look up Y's value in σ.
How do we solve the problem, given some C and store σ0, of finding a σ such that
C ⊢ σ0 → σ? This can be done by applying the rules in Definition 2.2.3 as follows.
If C has form C;D we first solve the problem of finding a σ' such that C ⊢ σ0 → σ',
and then the problem of finding a σ'' such that D ⊢ σ' → σ'', and then we can use
σ = σ''.
If C has form X := E we calculate E[[E]]σ0 = d and then σ is the same as σ0 except
that X ↦ d.
If C has form while E do C we calculate E[[E]]σ0 = d. If d is nil then σ is σ0.
Otherwise, first solve the problem of finding a σ' such that C ⊢ σ0 → σ', and then
the problem of finding a σ'' such that while E do C ⊢ σ' → σ'', and then we can
use σ = σ''.
4 In
2.3
One could argue, as in Turing's analysis of Section 1.2.1, against our use of the tree
comparison operator =? on the grounds that it is not atomic enough. This can be
countered by showing how to test general equality without =?.
The following program assumes that its input is a pair (d.e), and tests the two
components for equality:
read X;
D := hd X; E := tl X;
GO := true; Y := false;
while GO do
  if D then
    D1 := hd D; D2 := tl D;
    if D1 then
      if E then
        E1 := hd E; E2 := tl E;
        if E1 then
          D := cons (hd D1) (cons (tl D1) D2);
          E := cons (hd E1) (cons (tl E1) E2)
        else GO := false
      else GO := false
    else
      if E then
        if (hd E) then GO := false
        else
          D := tl D; E := tl E
      else GO := false
  else
    if E then GO := false
    else
      Y := true; GO := false;
write Y
A few words on the correctness of this program are in order. First of all, termination is
ensured by the fact that a certain number gets smaller every time the body of the while
loop is executed; this is addressed in an exercise.
Assume that the values d and e have been assigned to program variables D and E.
Initially, Y is set to the most common output value false.
Case 1: If the cascade of tests if D, if D1, if E, if E1 are all true, then d and
e have forms ((d11 .d12 ).d2 ) and ((e11 .e12 ).e2 ). In this case D and E are re-assigned
values (d11 .(d12 .d2 )) and (e11 .(e12 .e2 )), and the loop is repeated. It is clear that the
new values for D and E are equal iff the original ones were equal.
The next two, Cases 2 and 3, both fail because d has form ((d11 .d12 ).d2 ) but e has
form (nil.e2 ) or nil, respectively. Then the two values cannot be equal, so the loop is
terminated by setting GO := false and not changing Y. Case 4 also fails: d has form
(nil.d2) but e has form ((e11.e12).e2).
If execution enters Case 5, d and e have form (nil.d2 ) and (nil.e2 ). For d and e
to be equal, d2 and e2 must be equal. Therefore D and E are re-assigned values d2 and
e2 , and the loop is repeated.
If execution enters Case 6, d and e have forms (nil.d2) and nil, which fails.
If execution enters Case 7, d and e have forms nil and (e1.e2), so the loop is terminated with output false. In the final Case 8, d and e are both nil, and comparison
terminates successfully by setting the output variable Y to true.
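The same algorithm can be written as a Python sketch (illustration only; in the pair model with nil as the only atom, two atoms are always equal):

NIL = "nil"

def tree_equal(d, e):
    while True:
        if isinstance(d, tuple) and isinstance(e, tuple):
            (d1, d2), (e1, e2) = d, e
            if isinstance(d1, tuple) and isinstance(e1, tuple):
                d = (d1[0], (d1[1], d2))        # Case 1: rotate both trees
                e = (e1[0], (e1[1], e2))
            elif not isinstance(d1, tuple) and not isinstance(e1, tuple):
                d, e = d2, e2                   # Case 5: both heads nil, compare the tails
            else:
                return False                    # Cases 2 and 4: heads differ in shape
        elif not isinstance(d, tuple) and not isinstance(e, tuple):
            return True                         # Case 8: both nil
        else:
            return False                        # Cases 3, 6 and 7

assert tree_equal((NIL, (NIL, NIL)), (NIL, (NIL, NIL)))
assert not tree_equal(NIL, (NIL, NIL))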
2.3.1
Rewrite rules: The logic of the nested if commands above is not easy to follow (one
has to parenthesize them). A convenient more compact form is to write nested ifs as
a sequence of rewrite rules of form rewrite [X1, X2,...,Xn] by Rule1;...;Rulem.
Here each Xi is a variable, and a rule Rulej may have one of two forms:
1. [pat1,...,patn] ⇒ [E1,...,En], or
2. [pat1,...,patn] ⇒ C;
where each pati is a pattern built from new variables using nil and the tree constructor
. and C is a command.
Informal semantics: if the current values of variables X1, X2,...,Xn match patterns
pat1,...,patn (in left-to-right order), then the rule is applied. If the rule has the first
format, Ei is an expression assigning a new value to variable Xi. If the second, C is a
command that may change Xi. The right-side expressions E1,. . . ,En or command C may
contain references to variables appearing in the patterns, though not to the left of :=.
For an example, the algorithm above could be expressed using rewrite rules as:
read X;
D := hd X; E := tl X;
GO := true; Y := false;
while GO do
  rewrite [D, E] by
    [((D11.D12).D2), ((E11.E12).E2)] ⇒ [(D11.(D12.D2)), (E11.(E12.E2))]
    [((D11.D12).D2), (nil.E2)]       ⇒ GO := false;
    [((D11.D12).D2), nil]            ⇒ GO := false;
    [(nil.D2),       ((E11.E12).E2)] ⇒ GO := false;
    [(nil.D2),       (nil.E2)]       ⇒ [D2, E2]
    [(nil.D2),       nil]            ⇒ GO := false;
    [nil,            (E1.E2)]        ⇒ GO := false;
    [nil,            nil]            ⇒ Y := true; GO := false;
write Y;
Such rules are easily expanded into nested if commands. For instance, the first rule
would naturally expand into:
if D then
if (hd D) then
if E then
if (hd E) then
D := cons (hd (hd D)) (cons (tl (hd D)) (tl D));
E := cons (hd (hd E)) (cons (tl (hd E)) (tl E))
and the next-to-last rule would expand to:
if D then skip
else
if E then
if (hd E) then skip
else
GO := false
The case statement: A similar construction to aid readability is the case statement,
with syntax
case E of
  pat1 ⇒ C1;
  ...
  patn ⇒ Cn;
Again, this expands into a sequence of nested if statements, and the commands C1,. . . ,Cn
may contain references to variables appearing in the patterns (though not to the left of
:=).
Exercises
2.1 Write a WHILE program that takes an input d and returns the list of atoms in d
from left to right. For instance, with d=((a.b).(a.(c.d))) the program should yield
(a b a c d) (i.e., (a.(b.(a.(c.(d.nil)))))).
2
2.2 Write a WHILE program that expects an input of the form (d1 ... dn) (a list of
values), and removes duplicate adjacent occurrences of the atom nil. For instance, if the
input is (nil (nil) nil nil ((nil)) nil), the program should yield
(nil (nil) nil ((nil)) nil).
2
2.3 Let σ = [X ↦ (nil.nil)], let C be while X do X:=X, and show that there is no σ' such
that C ⊢ σ → σ'.
2
2.4 Let d = (a b c), and let p = read X; C; write Y be the reverse program from
Subsection 2.1.3. Find a σ such that C ⊢ σ₀ᵖ(d) → σ. Explain in detail how σ is computed.
2
2.5 Prove that [[reverse]]((d1 ... dn)) = (dn ... d1). Hint: Proceed by induction on n.5
2.6 * This concerns the general program for testing equality in Section 2.3. Consider
the weight function w : ID → IN defined by:

    w(d)        =  |d| − r(d)
    r(nil)      =  1
    r((d1.d2))  =  1 + r(d2)

where r(d) is the length of the right spine of d.
Exercise: First, argue that this function decreases in each loop of the equality-testing
program of Section 2.3. Then find an upper bound on the running time of the equality-testing program.
2.7 Prove that the size |d| of a value d ID can be computed in time O(|d|). Hint:
modify the program for testing equality in section 2.3, so it compares d against itself,
and increases a counter niln each time a new . is found in d.
2
5 See
References
The data structure of WHILE is very similar to those of Scheme and LISP. The book
by Kent Dybvig [41] is a good introduction to Scheme. The semantics of the WHILE
language is in essence a natural semantics as one would find it in an introductory text
on programming language semantics, e.g., the books by Schmidt [158] or by Nielson and
Nielson [136].
Some other textbooks on computability and complexity use a language very similar to
WHILE, but in most cases the data structure used is numbers, rather than trees [97, 164].
The author has used structured data as well as structured programs for teaching for
several years at Copenhagen. The idea of restricting trees to the single atom nil was
due to Klaus Grue [57]. The same WHILE language was used in article [78], which
contains several results and definitions appearing later in this book.
In this chapter we are concerned with programs that take other programs as data. This
requires that programs be part of the data domain; we show how to achieve this in
Section 3.2. We then study three kinds of programs that have other programs as input
in Sections 3.33.6: compilers, interpreters, and specializers. The chapter concludes with
several simple examples of compilation in Section 3.7.
A compiler is a program transformer which takes a program and translates it into an
equivalent program, possibly in another language. An interpreter takes a program and
its input data, and returns the result of applying the program to that input. A program
specializer, like a compiler, is a program transformer but with two inputs. The first input
is a program p that expects two inputs X,Y. The other input to the program specializer
is a value s for X. The effect of the specializer is to construct a new program ps which
expects one input Y. The result of running ps on input d, is to be the same as that of
running p on inputs s and d.
The reason we emphasize these program types is that many proofs in computability
theory involve, either explicitly or implicitly, constructing an interpreter, a compiler, or
a specializer.
First we define what constitutes a programming language in Section 3.1.
3.1
More programming languages will be seen in later chapters. As was the case for
WHILE, we will drop L from the notation [[·]]L whenever L is clear from the context.
Imagine one has a computer with machine language M. How is it possible to run
programs written in another language L? We will answer this question in two steps. First,
we say what it means for language M to be able to simulate an arbitrary L program. (In
effect, this says M is at least as expressive as L.) Second, we will show how M can simulate
L, in two different ways: compilation and interpretation.
Definition 3.1.2 Suppose L-data = M-data. Language M can simulate language L if for
every p ∈ L-programs there is an M-program q such that for all d ∈ L-data we have

    [[p]]L(d) ≃ [[q]]M(d)
3.2
We have earlier given a syntax for WHILE-programs and WHILE-data. Suppose we want to
give a WHILE program as input to another WHILE program. Presently this is not possible simply because elements of WHILE-programs are not objects in WHILE-data. Therefore
we now give a programs-as-data representation for WHILE programs.
Definition 3.2.1 Let {:=, ;, while, var, quote, cons, hd, tl, =?, nil} denote 10
distinct elements of ID. The representation p of a WHILE program p is defined by the map
from WHILE-programs to WHILE-data shown in Figure 3.1.1
1 Recall that Vars = {V0, V1, ...}. While we often use X and Y to denote arbitrary elements of Vars, it
is convenient in the definition of the representation to know the index of the variable to be coded. We
assume that no program contains a variable with higher index than its output variable.
    read Vi; C; write Vj   =  ((var i) C (var j))

    C;D            =  (; C D)
    while E do C   =  (while E C)
    Vi := E        =  (:= (var i) E)

    Vi             =  (var i)
    d              =  (quote d)
    cons E F       =  (cons E F)
    hd E           =  (hd E)
    tl E           =  (tl E)
    =? E F         =  (=? E F)

Figure 3.1: Programs as data. (On the right-hand sides, C, D, E and F stand for the
representations of the corresponding subphrases.)
2
For example, if X and Y are the variables V1 and V2 , respectively, then the program written
as
read X;
Y := nil;
while X do
Y := cons (hd X) Y;
X := tl X
write Y;
would be translated to the value in ID:
(
(var 1)
(; (:= (var 2) (quote nil))
(while (var 1)
(; (:= (var 2) (cons (hd (var 1)) (var 2)))
(:= (var 1) (tl (var 1))))))
(var 2)
)
For readability we will continue to use the original syntax when writing programs, but
it should be understood that whenever a program p is input to another, it is the corresponding representation p that we have in mind.
Analogous ideas can be used for other languages L as well, though encoding programs
as data is harder if L-data is, as in classical computability texts, the set of natural
numbers.
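The representation of Definition 3.2.1 is easy to mimic in Python (a sketch only, with tag strings standing for the 10 distinct elements of ID and variables given by their index):

NIL = "nil"

def from_list(xs):                       # (x1 ... xn) as nested pairs
    d = NIL
    for x in reversed(xs):
        d = (x, d)
    return d

def enc_expr(e):
    tag = e[0]
    if tag == "var":   return from_list(["var", e[1]])
    if tag == "quote": return from_list(["quote", e[1]])
    if tag in ("hd", "tl"):   return from_list([tag, enc_expr(e[1])])
    if tag in ("cons", "=?"): return from_list([tag, enc_expr(e[1]), enc_expr(e[2])])

def enc_cmd(c):
    tag = c[0]
    if tag == ":=":    return from_list([":=", from_list(["var", c[1]]), enc_expr(c[2])])
    if tag == ";":     return from_list([";", enc_cmd(c[1]), enc_cmd(c[2])])
    if tag == "while": return from_list(["while", enc_expr(c[1]), enc_cmd(c[2])])

def enc_program(i, c, j):                # read Vi; C; write Vj  =>  ((var i) C (var j))
    return from_list([from_list(["var", i]), enc_cmd(c), from_list(["var", j])])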
3.3 Compilation
3.3.1
Note that we carefully distinguish between a compiling function, and a compiler, i.e. a
compiling program. Spelled out, a compiling function f satisfies for all p ∈ S-programs
and all d ∈ S-data:

    [[p]]S(d) ≃ [[f(p)]]T(d)
2 In
3.3.2 TI-diagrams
A TI-diagram is used to denote the set of compilers from S to T written in L. Suppose we are given a collection
of S-programs, nature unspecified. This set can be denoted by a diagram labelled S.
If we also have a compiler comp from source language S to target language T, written in
L, then we can perform translations, as described by the diagram:
[Diagram: the source program (an S-program) is given to the compiler (from S to T, written
in L), producing the target program (a T-program).]
In this book compilation will most often be described by informal constructions, and
if such diagrams are used, we could replace implementation language L above by H,
indicating human. In fact, all of our programming language translations could be
automated in principle, but going to that level of detail would be more appropriate to a
programming language course than to a theory course.
On the other hand interpreters, under the name universal programs, will be treated
more formally. They play a central role in theorems of both complexity and computability
theory. Since their structure and running times are so important, several interpreters
will be given in considerable detail by programs.
3.3.3
[Diagrams lost in extraction: commuting squares relating S-data and T-data via a coding
function c, with the source semantics [[p]]S along one side and the target semantics
[[f(p)]]T along the other.]

3.4 Interpretation

3.4.1
Definition 3.4.1 Assume that language S has programs-as-data and pairing, and that
L-data = S-data. Then:
1. A partial function i : L-data → L-data is an interpreting function of S if for all
p ∈ S-programs and d ∈ S-data:

    i(p.d) ≃ [[p]]S(d)

2. An L-program int is an interpreter of S written in L if [[int]]L is an interpreting
function of S.
3.4.2
In this example a source program is a linear sequence of commands built from true and
Boolean variables X0 ,. . . ,Xn using boolean operations and and not. Informal syntax
is defined by the following grammar:
Program  ::=  read X0; I1 I2 ... Im; write X0
I        ::=  Xi := true | Xi := Xj and Xk | Xi := not Xj

In the corresponding programs-as-data form, variables are represented by numbers:

Program  ::=  (I1 I2 ... Im)
I        ::=  (:=true X) | (:=and X Y Z) | (:=not X Y)
X        ::=  nil^0 | nil^1 | nil^2 | ...
Figure 3.2 shows an interpreter for Boolean programs. Explanation: The store =
[X0 7 d0 , , Xn 7 dn ] will be represented as a list (d0 d1 dn ). Two auxiliary functions
are detailed in the next section: lookup, which finds the value di , if given the store and i
as arguments; and update, which assigns a new value to variable Xi . Operators and and
not were defined in Section 2.1.4.
The interpreter first initializes the store by binding the input value d to variable
X0 using update. It then repeatedly dispatches on the form of the first instruction in
the remainder P of the program, and performs lookups or updates to implement the
languages three command forms. After the case command, P is reassigned to what
follows after the current instruction; so P decreases until empty.
Once the last command is executed, the value of X0 is looked up in the final store and
written out.
read PD;                   (* Input = program and value of X0  *)
P := hd PD; D := tl PD;    (* Extract program and data from PD *)
Store := update 0 D nil;   (* Initialize store: X0 equals D    *)
while P do
  { case hd P of
      (:=true X)    ⇒ Store := update X true Store;
      (:=and X Y Z) ⇒ Store := update X (lookup Y Store and lookup Z Store) Store;
      (:=not X Y)   ⇒ Store := update X (not (lookup Y Store)) Store;
    P := tl P };
V := lookup 0 Store;
write V

Figure 3.2: An interpreter for Boolean programs.
The auxiliary operation update, which replaces the value of variable number J in Store by V,
can be programmed as follows:

T := nil;                          (* T will hold the values popped off Store  *)
K := J;
while K do                         (* pop the first J values of Store onto T   *)
  T := cons (hd Store) T;
  Store := tl Store;
  K := pred K;
Store := cons V (tl Store);        (* replace the J-th value by V              *)
while T do                         (* push the saved values back onto Store    *)
  Store := cons (hd T) Store;
  T := tl T;

3.5
Diagrams such as the preceding one, and more complex ones with several interpreter
blocks, compiler blocks, or other blocks put together, can be thought of as describing one
or more computer runs. For example, suppose a Lisp system is processed interpretively
by an interpreter written in Sun RISC machine code (call this M). The machine code
itself is processed by the central processor (call this C) so two levels of interpretation are
involved, as described by Figure 3.3.
[Figure 3.3: Diagram of program execution with two interpretation levels: an L-program run
by an L-interpreter written in M, which is in turn run by the central processor C.]
Assume that certain languages are directly executable; typically a machine language
T, or an implementation language L for which one already has a compiler or interpreter
available. Then a composite diagram composed of several TI-diagrams is defined to be
directly executable if and only if every bottom-most diagram in it is implemented in
an executable language.
[Diagram lost in extraction; the languages appearing in the composed diagrams must satisfy
S1 ⊑ S, S2 ⊑ S and T ⊑ T2.]

3.6 Specialization
[Figure: a program specializer. The subject program p and the stage 1 input s are the two
inputs to the program specializer spec; its output is the specialized program ps, which is
then run on the stage 2 input d to produce the final output. In the figure, data values are
in ovals, and programs are in boxes. The specialized program ps is first considered as data
and then considered as code, whence it is enclosed in both. Further, single arrows indicate
program input data, and double arrows indicate outputs. Thus spec has two inputs while ps
has only one; and ps is the output of spec.]
3.7
In this section we consider some fragments of WHILE and show by means of translations
that the fragments are as expressive, in a certain sense, as the whole language. The first
section restricts the number of variables a program may use, and the second restricts the
size of expressions.
3.7.1
I is the same language as WHILE, except that its programs only contains one variable
X, which is also used for both input and output. Any WHILE program can be translated
into an I program with the same semantics.
Definition 3.7.1 The syntax of I is given by grammar of Figure 3.5. Program semantics
is as in Section 2.2.
2
Example 3.7.2 Recall the following program to reverse a list:
read X;
Y := nil;
while X do
Y := cons (hd X) Y;
X := tl X;
write Y
Expressions  E, F  ::=  X
                    |   nil
                    |   cons E F
                    |   hd E
                    |   tl E

Commands     C, D  ::=  X := E
                    |   C; D
                    |   while E do C

Programs           ::=  read X; C; write X

Figure 3.5: Syntax of the language I.
The program has two variables. To convert it into an equivalent 1-variable program
pack the two into one A=(cons X Y). Whenever we need X in some expression we take
(hd A), and whenever we need Y we take (tl A). Whenever we wish to assign E to X we
assign cons E (tl A) to A, and whenever we wish to assign E to Y we assign cons (hd
A) E to A. We thus arrive at the following program.
read A;
A := cons A nil;                                 (* now A = cons X Y *)
while (hd A) do
  A := cons (hd A) (cons (hd (hd A)) (tl A));
  A := cons (tl (hd A)) (tl A);
A := tl A;                                       (* A := Y, the value to write *)
write A
For the general translation we will pack the variables X1, . . . , Xn together by consing to
form a list (X1 Xn). More efficient translated programs could be obtained by packing
into balanced trees instead of lists.
Definition 3.7.3 Define tl^0 E = E and tl^(i+1) E = tl^i (tl E). Given a program p with input
variable X1 and output X2, apply the transformation defined in Figure 3.6.
Proposition 3.7.4
Writing C* and E* for the translations of a command C and an expression E, the rules of
Figure 3.6 are:

    (C1 ; C2)*        =  C1* ; C2*
    (while E do C)*   =  while E* do C*
    (Xi := E)*        =  A := cons T1 (cons T2 (... (cons Tn nil) ...))
                         where Ti = E* and Tj = Xj* for j ≠ i

    Xi*               =  hd (tl^(i-1) A)
    d*                =  d
    (cons E1 E2)*     =  cons E1* E2*
    (hd E)*           =  hd E*
    (tl E)*           =  tl E*
    (=? E1 E2)*       =  =? E1* E2*

Figure 3.6: Packing all variables into the single variable A.
3.7.2
Expressions  E, F  ::=  X
                    |   d
                    |   cons X Y
                    |   hd X
                    |   tl X
                    |   =? X Y

Commands     C, D  ::=  X := E
                    |   C1 ; C2
                    |   while X do C

Programs           ::=  read X; C; write Y
Note that in assignments the expression may contain at most one operator, and in while
loops the tested expression must contain no operators at all. The semantics and running
times are the same as for WHILE programs.
Any WHILE program p can be translated into a WHILE1op program with the same
semantics. The problem is to break complex expressions and while tests into simple ones.
This can be done systematically by introducing new variables and assignment statements.
Example 3.7.6 The program
read XY;
X := hd XY;
Y := tl XY;
while (hd X) do
Y := cons (hd X) (cons (tl Y) (hd Y));
X := tl X ;
write Y
can be translated into:
read XY;
X := hd XY;
Y := tl XY;
Z := hd X;
while Z do
A := hd X;
B := tl Y;
C := hd Y;
D := cons B C;
Y := cons A D;
X := tl X;
Z := hd X;
write Y
We state the general translation using the informal syntax, but it could clearly be expressed via the representation introduced earlier.
Definition 3.7.7 Given a program p, construct the transformed program p by applying
the rules given in Figure 3.7 recursively. Variables Y, Y1, Y2 are fresh variables, chosen
anew every time a rule containing them is used.
Proposition 3.7.8
Exercises
3.1 Show how one can compile from S-programs to L-programs, if given an S-interpreter
written in L and an L-specializer. State appropriate assumptions concerning the relationships between various input and output domains.
2
3.2 Prove Proposition 3.7.4.
    C1 ; C2          =  C1 ; C2              (each part translated recursively)
    while E do C     =  Y:=E ; while Y do {C ; Y:=E}
    Z:=Y             =  Z:=Y
    Z:=d             =  Z:=d
    Z:=cons E1 E2    =  Y1:=E1 ; Y2:=E2 ; Z:=cons Y1 Y2
    Z:=hd E          =  Y:=E ; Z:=hd Y
    Z:=tl E          =  Y:=E ; Z:=tl Y
    Z:= (=? E1 E2)   =  Y1:=E1 ; Y2:=E2 ; Z:= (=? Y1 Y2)

Figure 3.7: Breaking complex expressions into simple ones.
3.4 Can one compile an arbitrary WHILE program into an equivalent with only one variable and one operator per command, i.e. can one combine the results of Propositions 3.7.8
and 3.7.4?
A partial answer: explain what happens when these two compilations are combined. A
full answer: establish that such a compilation is possible (by a construction) or impossible
(by a proof).4
2
References
The practical and theoretical study of compilers and interpreters constitutes a branch
of Computer Science. An introduction to interpreters can be found in [93]. A good
introduction to compiler technology can be found in [3]. The compiler and interpreter
diagrams are due to Bratman [18]. As mentioned, interpretation, compilation, and
specialization all play important roles in computability and complexity theory, and we will
say more about all three types of programs in due course.
4 Hint, in case the answer is negative: To show that not every program in L-programs can be simulated
by some M-program, it is enough to give a property P(f) such that a) P([[p]]M) holds for all p ∈ M-programs,
i.e., P is satisfied by every function computable by any M-program; and b) exhibit an L-program q such
that [[q]]L does not satisfy property P.
This approach requires three things: first, find a suitable property P; second, show that it holds for
every function computed by any M-program; third, find an L-program q whose computed function fails
property P.
The practical study of specializers is yet another branch of Computer Science, also
called partial evaluation, see e.g. the textbook [89] or survey article [87].
Part II
Introduction to Computability
4.1
We first develop an interpreter in WHILE for WHILE programs that use only a single
variable, and then modify this interpreter so as to interpret the full WHILE language.
Let { :=, ;, while, var, quote, cons, hd, tl, =?, nil } denote 10 distinct elements
of ID mentioned in Definition 3.2.1, and let { dohd, dotl, docons, doasgn, dowh, do=? }
denote 6 more values in ID, distinct from the first 10 and from each other.
4.1.1
Proposition 4.1.1 There exists a WHILE program u1var such that [[u1var]](p.d) =
[[p]](d) for all p ∈ I-programs and all d ∈ WHILE-data.
2
Proof. The overall structure of the program is given in the following program fragment
where STEP is the sequence of commands in Figure 4.1 (explained below). Exercise 4.1
is to prove correctness of the algorithm.
read PD;                  (* Input (p.d)                         *)
P  := hd PD;              (* P = ((var 1) C (var 1))             *)
C  := hd (tl P);          (* program code is C                   *)
Cd := cons C nil;         (* Cd = (c.nil), Code to execute is c  *)
St := nil;                (* St = nil, Stack empty               *)
Vl := tl PD;              (* Vl = d, Initial value of var.       *)
while Cd do STEP;         (* do while there is code to execute   *)
write Vl;
Input is a program in the abstract syntax of Definition 3.2.1. (Input and output are
through the first and only variable, hence the (var 1)). The program uses three variables: Cd, St, Vl. The first is the code stack, Cd, holding the code to be executed. Intially
[Figure 4.1: the rewrite rules making up STEP. Each rule matches the pair [Cd, St] of code
stack and value stack and rewrites it; the cases cover pushing constants and the variable's
value (cons D St, cons Vl St), the expression forms (hd E), (tl E), (cons E1 E2) and
(=? E1 E2) together with their continuation markers dohd, dotl, docons and do=?, and the
command forms (; C1 C2), (:= (var 1) E) with doasgn, (while E C) with dowh, and the
empty code [nil, St].]
this is the whole program. The second is the value stack, St, holding intermediate results.
Finally, the third variable is Vl, the store holding the current value of the single program
variable. Initially this is d, the input to program p.
The effect of the sequence of commands STEP, programmed using the rewrite shorthand notation, is to test what the next instruction in Cd is and update variables Cd, St,
Vl accordingly. Recall the skip and cons* notations from Section 2.1.7.
Expression evaluation and command execution are based on the following invariants (where
⇒* means some number of iterations of STEP):

    [(E.Cd), St, d]  ⇒*  [Cd, (e.St), d]    iff    E[[E]][X ↦ d] = e
    [(C.Cd), St, d]  ⇒*  [Cd, St, e]        iff    C ⊢ [X ↦ d] → [X ↦ e]
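As an illustration of how such a stack-driven interpreter works (a Python sketch, not the book's exact STEP rules), here is a small interpreter for one-variable programs in the tagged-tuple syntax of the earlier sketches; cd is the code stack, st the value stack, and vl the value of the single variable.

NIL = "nil"

def u1var(cmd, d):
    cd, st, vl = [cmd], [], d
    while cd:
        x = cd.pop()
        tag = x[0]
        if tag == "quote":    st.append(x[1])
        elif tag == "var":    st.append(vl)
        elif tag == "hd":     cd += [("dohd",), x[1]]
        elif tag == "tl":     cd += [("dotl",), x[1]]
        elif tag == "cons":   cd += [("docons",), x[2], x[1]]
        elif tag == "=?":     cd += [("do=?",), x[2], x[1]]
        elif tag == ";":      cd += [x[2], x[1]]
        elif tag == ":=":     cd += [("doasgn",), x[2]]
        elif tag == "while":  cd += [("dowh", x), x[1]]
        elif tag == "dohd":   v = st.pop(); st.append(v[0] if isinstance(v, tuple) else NIL)
        elif tag == "dotl":   v = st.pop(); st.append(v[1] if isinstance(v, tuple) else NIL)
        elif tag == "docons": u, t = st.pop(), st.pop(); st.append((t, u))
        elif tag == "do=?":   u, t = st.pop(), st.pop(); st.append((NIL, NIL) if t == u else NIL)
        elif tag == "doasgn": vl = st.pop()
        elif tag == "dowh":
            if st.pop() != NIL:
                cd += [x[1], x[1][2]]    # run the body, then re-test the whole while
    return vl

# while X do X := tl X, started on (nil.(nil.nil)), ends with X = nil:
prog = ("while", ("var", 1), (":=", ("var", 1), ("tl", ("var", 1))))
assert u1var(prog, (NIL, (NIL, NIL))) == NIL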
4.1.2
We now show how the interpreter u1var for single-variable programs can be extended
to accommodate programs using several variables. For this it is useful to have available
certain techniques which we first develop. The construction is straightforward and uses
the lookup and update functions from Section 3.4.2.
Theorem 4.1.2 There exists a WHILE program u such that for all p ∈ WHILE-programs
and all d ∈ WHILE-data we have [[p]](d) = [[u]](p.d).
2
Proof. The overall structure of the program is given in the program fragment of Figure 4.2, where STEP is similar to the earlier command sequence.
read PD;                              (* Input (p.d)                        *)
Pgm := hd PD;                         (* Pgm = ((var i) c (var j))          *)
D   := tl PD;                         (* D = d (input value)                *)
I   := hd (tl (hd Pgm));              (* I = i (input variable)             *)
J   := hd (tl (hd (tl (tl Pgm))));    (* J = j (output variable)            *)
C   := hd (tl Pgm);                   (* C = c, program code                *)
Vl  := update I D nil;                (* (var i) initially d, others nil    *)
Cd  := cons C nil;                    (* Cd = (c.nil), Code to execute is c *)
St  := nil;                           (* St = nil, computation Stack empty  *)
while Cd do STEP;                     (* do while there is code to execute  *)
Out := lookup J Vl;                   (* Output is the value of (var j)     *)
write Out;
2
The program u is called a self-interpreter in programming language theory, because it
interprets the same language as it is written in. In computability theory u is called a
universal program, since it is capable of simulating any arbitrary program p.
4.2
Recall the interpreter u1var for one-variable WHILE programs constructed in Section 4.1.1.
We obtain a universal program for I by applying methods from Section 3.7 to u1var.
Program u1var is not a self-interpreter for I, since it itself uses more than one variable,
for example Cd and St. We now describe how a 1-variable universal program can be built,
using the example compilations from Section 3.7.
We now construct from u1var a true self-interpreter i for I. This is easily done,
since the packing technique of Proposition 3.7.4 translates program u1var into an
equivalent one-variable program i with [[i]]I = [[i]]WHILE = [[u1var]]WHILE. We have thus
proven
Theorem 4.2.1 There exists a self-interpreter i for I using the concrete syntax of Definition ??.
Exercises
4.1 * Prove that [[p]](d) = [[u1var]](p.d) for all 1-variable p ∈ WHILE-programs and all
d ∈ ID. This can be done by induction on the lengths of computations of program execution
and execution of the interpreter.
2
4.2 Show that for any WHILE-program p without any while commands and for all
d ∈ ID, it holds that [[u1var]](p.d)↓. This can be done by induction on the length of p. 2
4.3 Extend the WHILE language with a construction repeat C until E, with a Pascal-like semantics. Explain the semantics informally, e.g. when is E evaluated? Extend u1var
so as to interpret this new construction (still for programs with one variable).
2
References
A universal program first appeared in Turing's paper [170], and in practically every book
on computability published since then. The universal program for I much resembles the
one sketched in [85].
Chapter ?? set up our model WHILE of computation, Chapter 3 gave a way to pass
WHILE programs as input to other WHILE programs, and Chapter 4 showed the
existence of universal programs. We are now in a position to state and prove some of the
fundamental results of computability theory, including those that were informally proven
in Chapter 1.
Section 5.1 defines the notions of computable function and decidable set, and the two
related notions of semi-decidable and enumerable sets. Section 5.2 presents a specializer for WHILE programs. Section 5.3 proves that the halting problem is undecidable.
Section 5.4 proves that all properties of WHILE programs that depend only on the programs' input-output behaviour are undecidable. Section 5.5 proves some properties of
decidable and semi-decidable sets, and Section 5.6 shows that the halting problem is
semi-decidable. Section 5.7 proves some properties of enumerable and semi-decidable
sets.
5.1
A set A will be called decidable if the membership question for A can be answered by a
program that always terminates. If the program possibly loops on elements outside A,
the set will be called semi-decidable.
We will show semi-decidability equivalent to enumerability, where a set A is called
enumerable if there is some program that lists all and only the elements of A in some
order. This allows repetitions, and does not necessarily list A's elements in any specific
order, for instance the order need not be increasing or without repetitions.
Definition 5.1.2
73
1. A set A ⊆ ID is WHILE decidable iff there is a WHILE program p such that [[p]](d)↓
for all d ∈ ID, and moreover d ∈ A iff [[p]](d) = true.
2. A set A ⊆ ID is WHILE semi-decidable iff there is a WHILE-program p such that for
all d ∈ ID: d ∈ A iff [[p]](d) = true.
3. A set A ⊆ ID is WHILE enumerable iff A = ∅ or there is a WHILE program p such
that for all d ∈ ID: [[p]](d)↓, and A = {[[p]](d) | d ∈ ID}.
2
5.2
Recall from Chapter 3 the notion of a specializer. We now prove that there exists a
program specializer from WHILE to WHILE written in WHILE.
Theorem 5.2.1 There is a WHILE program spec such that for all p ∈ WHILE-programs
and s ∈ WHILE-data, [[spec]](p.s) ∈ WHILE-programs, and for all d ∈ WHILE-data

    [[ [[spec]](p.s) ]](d) = [[p]](s.d)
Proof. Given a program p:
read X; C; write Y
Given input s, consider the following program ps
read X; X := cons s X; C; write Y
It clearly holds that [[p]](s.d) = [[ps ]](d). It therefore suffices to write a program that
transforms the pair (p.s) into ps , when both ps and p are expressed as data values in
ID. The program p is expressed as data by:
((var i) C (var j))
where C is the data representation of C. Then ps expressed as data is:
((var i) (; (:= (var i) (cons (quote s) (var i))) C) (var j))
Transformation from p to ps is done using the following program, spec, which uses the
list notation of Section 2.1.7. The "cons", ":=" and ";" in ConsExp :=..., NewC:=...,
and AssignX:=... are distinct values in ID, as in Definition 3.2.1.
read PS;
P       := hd PS;                    (* P = ((var i) C (var j))  *)
S       := tl PS;                    (* S = s                    *)
Vari    := hd P;                     (* Vari = (var i)           *)
C       := hd (tl P);                (* C = program code         *)
Varj    := hd (tl (tl P));           (* Varj = (var j)           *)
QuoteS  := list "quote" S;
ConsExp := list "cons" QuoteS Vari;
AssignX := list ":=" Vari ConsExp;
NewC    := list ";" AssignX C;
NewP    := list Vari NewC Varj;
write NewP;
2
The same idea can be generalized to specialize programs accepting m + n arguments to
their first m arguments. This is known in recursive function theory as Kleene's s-m-n
theorem, and plays an important role there.
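The effect of specialization is easy to picture in Python terms (a sketch of the idea only: the book's spec manipulates program texts, whereas here a closure stands in for the specialized program, and append_pair is a made-up stand-in subject program):

def specialize(p, s):
    # p expects a pair (s, d); the result expects only d
    def p_s(d):
        return p((s, d))
    return p_s

def append_pair(sd):                 # a stand-in two-input subject program
    s, d = sd
    return list(s) + list(d)

append_abc = specialize(append_pair, "abc")
assert append_abc("de") == ["a", "b", "c", "d", "e"]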
The specializer above is quite trivial, as it just freezes the value of X by adding a
new assignment. It seems likely that spec could sometimes exploit its knowledge of p's
first input more extensively, by performing at specialization time all of p's computations
that depend only on s. This can indeed be done, and is known in the programming
languages community as partial evaluation. We return to this topic in the next part of
this book.
5.3
We now show that the unsolvability of the halting problem for WHILE-programs implies
that many other problems are unsolvable. This also includes many natural problems, as
we shall see in Chapter 10.
Theorem 5.3.1 The total function

    halt(a)  =  true     if a = (p.d) and [[p]](d)↓
                false    otherwise

is not computed by any WHILE-program.
The halting problem above is formulated as the problem of computing the function
halt; as such it is uncomputable. One can also formulate the same problem as one of
deciding membership of the subset of ID:

    HALT = {(p.d) | p ∈ WHILE-programs, d ∈ WHILE-data, and [[p]](d)↓}
It is easy to see that this set is undecidable. If it were WHILE decidable, it would follow
easily that halt is computable. Similarly, if halt were WHILE computable, it would
follow immediately that HALT is WHILE decidable.
5.4 Rice's theorem
Rice's theorem shows that the unsolvability of the halting problem is far from a unique
phenomenon; in fact, all nontrivial extensional program properties are undecidable.
Definition 5.4.1
1. A program property A is a subset of WHILE-programs.
2. A program property A is non-trivial if ∅ ≠ A ≠ WHILE-programs.
3. A program property A is extensional if for all p, q ∈ WHILE-programs such that
[[p]] = [[q]] it holds that p ∈ A if and only if q ∈ A.
2
In other words, a program property is specified by dividing the world of all programs into
two parts: those which have the property, and those which do not. A non-trivial program
property is one that is satisfied by at least one, but not all, programs. An extensional
program property depends exclusively on the programs input-output behaviour, and
so is independent of its appearance, size, running time or other so-called intensional
characteristics.
An example property of program p is the following: is [[p]](nil) = nil? This is
extensional, since [[p]] = [[q]] implies that [[p]](nil) = nil if and only if [[q]](nil) = nil.
On the other hand, the following program property is nonextensional: is the number of
variables in p more than 100? This is clear, since one can have two different programs p,
q that compute the same input-output function [[p]] = [[q]] : ID → ID, but such that one
has more than 100 variables and the other does not.
Theorem 5.4.2 If A is an extensional and nontrivial program property, then A is undecidable.
2
Proof. Assume that nontrivial A is both extensional and decidable. We will show that
this implies that the halting problem is decidable, which it is not. Let b be a program
computing the totally undefined function: [[b]](d) = ⊥ for all d ∈ ID, e.g.,
read X; while true do X := X; write Y
Assume to begin with that A contains b. By extensionality, A must also contain all other
programs computing the totally undefined function. By nontriviality of A there must be
a program c in WHILE-programs which is not in A.
We now show how the halting problem (is [[p]](e) = ⊥?) could be solved if one had a
decision procedure for A. Suppose we are given a program p in WHILE-programs and a value
e ∈ ID of its input, and we want to decide whether [[p]](e) = ⊥. Without loss of generality,
programs p and c have no variables in common (else one can simply rename those in p).
Construct the following program q (using the macro notation of Subsection 2.1.7):
read X;              (* Read X                                  *)
Resultp := p e;      (* First, run program p on the constant e  *)
Resultc := c X;      (* Then run program c on input X           *)
write Resultc
Clearly if [[p]](e)↑, then [[q]](d)↑ for all d ∈ ID. On the other hand, if [[p]](e)↓, then [[q]](d)
= [[c]](d) for all d ∈ ID. Thus

    [[q]] = [[b]]   if [[p]](e) = ⊥
    [[q]] = [[c]]   if [[p]](e) ≠ ⊥

If p does not halt on e then [[q]] = [[b]], so extensionality and the fact that b ∈ A implies
that q ∈ A. If p does halt on e then [[q]] = [[c]], and again by extensionality, c ∉ A implies
q ∉ A. Thus p halts on e if and only if q ∉ A, so decidability of A implies decidability of
the halting problem.
The argument above applies to the case b ∈ A. If b ∉ A then exactly the same
argument can be applied to the complement Ā = WHILE-programs \ A. Both cases imply the decidability
of the halting problem, so the assumption that A is decidable must be false.
2
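The construction of q in the proof can be pictured in Python terms (an illustration only; closures stand in for the program text that the proof builds):

def make_q(p, e, c):
    # q first runs p on the fixed input e (possibly looping forever),
    # and only then runs c on the actual input x
    def q(x):
        p(e)
        return c(x)
    return q

If p loops on e then q behaves like b, the totally undefined function; if p halts on e then q computes the same function as c. Deciding whether q has the property A would therefore decide whether p halts on e.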
In conclusion, all nontrivial questions about programs' input-output behaviour are
undecidable. For example, it is undecidable whether [[p]](nil) = nil, and whether [[p]] is
a total function (see the exercises).
5.5
In this section we present some results about WHILE decidable and semi-decidable sets.
In one of these results we encounter the first application of our interpreter u.
Theorem 5.5.1
1. Any finite set A ⊆ ID is decidable.
2. If A ⊆ ID is decidable then so is ID \ A.
3. Any decidable set is semi-decidable.
4. A ⊆ ID is decidable if and only if both A and ID \ A are semi-decidable.
Proof.
1. If A = {d1, ..., dn} ⊆ ID, then it can be decided by the program

read X;
if (=? X d1) then X := true else
if (=? X d2) then X := true else
  ...
if (=? X dn) then X := true else
X := false;
write X
4. "Only if" follows from 3 and 2. For "if", we use a technique called dovetailing. The
idea is to simulate two computations at once by interleaving their steps, one at a
time.1 Suppose now that A is semi-decided by program p:
read X1; C1; write R1
and that ID \ A is semi-decided by program q:
read X2; C2; write R2
where we can assume that C1 and C2 have no variables in common.
Given d ∈ ID, if d ∈ A then [[p]](d) = true, and if d ∈ ID \ A then [[q]](d) = true.
Consequently one can decide membership in A by running p and q alternately, one
step at a time, until one or the other terminates with output true.
This is easily done using the universal program for WHILE; the details are left
to the reader in an exercise.
2
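The dovetailing idea can be sketched in Python with generators standing in for step-by-step execution of the two programs (an illustration only; accept_after is a toy semi-decider):

def dovetail(steps_p, steps_q):
    # steps_p / steps_q yield None while running and yield True once they accept
    while True:
        if next(steps_p) is True:
            return True           # p accepted: d is in A
        if next(steps_q) is True:
            return False          # q accepted: d is in the complement of A

def accept_after(n):              # a toy semi-decider that accepts after n steps
    def gen():
        for _ in range(n):
            yield None
        while True:
            yield True
    return gen()

assert dovetail(accept_after(5), accept_after(3)) is False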
Theorem 5.5.2
1. If A, B are decidable then A ∪ B and A ∩ B are both decidable.
2. If A, B are semi-decidable then A ∪ B and A ∩ B are both semi-decidable.
5.6
Theorem 5.3.1 established that the halting problem is undecidable. Now we show that
it is semi-decidable.
Theorem 5.6.1 The halting problem for WHILE-programs is semi-decidable.
5.7
It is not hard (though not as easy as for IN ) to show that the elements of ID can be
enumerated in sequence, one at a time:
Lemma 5.7.1
1. There is an enumeration d0 , d1 , . . . of all elements of ID such that d0 = nil, and
no elements are repeated;
2. There are commands start and next such that for any i ≥ 0, the value of variable
New after executing [[start; next; ...; next]] (with i occurrences of next) is di.
Program start:
L := ();
N := (nil);
New := hd N;
Program next:
N := tl N;
Old := L;
Tmp := cons (cons New New) nil;
while Old do
Tmp := cons (cons New (hd Old)) Tmp;
Tmp := cons (cons (hd Old) New) Tmp;
Old := tl Old;
N := append N Tmp;
L := cons New L;
New := hd N;
Figure 5.1: Enumerating ID.
Proof. Figure 5.1 shows WHILE code for start and next. Explanation: they follow the
defining equation ID = {nil} ∪ (ID × ID), using the fact that if X ⊆ ID and d ∉ X, then

    (X ∪ {d}) × (X ∪ {d}) = (X × X) ∪ {(d.d)} ∪ {(d.x) | x ∈ X} ∪ {(x.d) | x ∈ X}
The trees created are placed on the list N. They are moved to the list L once they have
served their purpose in creating bigger trees, and New will always be the first element
of N. Thus initially, N contains the single tree nil and L is empty. Every time next is
performed, one tree New is removed from the list N and paired with all the trees that are
already in L as well as with itself. The trees thus created are added to N, and New itself
is added to L.
The following claims are easy to verify:
(1) Every iteration adds a single element to L.
(2) Every element of ID is eventually put on L.
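The same enumeration idea can be sketched as a Python generator (an illustration only): L holds the trees produced so far, N the queue of trees waiting to be produced, and each step pairs the next tree with itself and with everything already in L.

import itertools

NIL = "nil"

def enumerate_ID():
    L, N = [], [NIL]
    while True:
        new, N = N[0], N[1:]
        yield new
        N = N + [(new, new)] + [(new, old) for old in L] + [(old, new) for old in L]
        L = L + [new]

first = list(itertools.islice(enumerate_ID(), 4))
assert first[0] == NIL and (NIL, NIL) in first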
5.7.1
The program of Figure 5.2, given input d, computes [[p]](d0), [[p]](d1), ..., and compares d to each in turn. If d = di for some i, then the program terminates after writing
true. If d ≠ di for all i then it will loop infinitely, which suffices for 2.
2 ⇒ 3. Assume that A is semi-decided by program p of form read I; C; write R, and
construct the program q of Figure 5.3.
read I;
Save := I;
C;
if R then SKIP else while true do SKIP;
write Save
Figure 5.3: 2 3.
Clearly [[p]](d)↓ and [[p]](d) = true together imply [[q]](d) = d. On the other hand,
if either [[p]](d)↑ or [[p]](d) ≠ true, then [[q]](d)↑. Thus d ∈ A iff [[q]](d) = d, so
A = {[[q]](e) | e ∈ ID and [[q]](e)↓}.
3 ⇒ 1. If A = ∅ then 1 holds trivially, so assume A contains at least one member d0, and
that A is the range of partial function [[p]], where p = ((var nil^1) C (var nil^1)),
i.e. A = rng([[p]]). Define f such that f(nil) = d0 and

    f((e.d))  =  [[p]](d)   if p stops when applied to d within |e| steps
                 d0         otherwise

f is obviously total. Claim: A = rng(f). Proof of ⊆: if a ∈ A = rng([[p]]) then
a = [[p]](d) for some d ∈ ID. Thus p, when applied to d, terminates within some
number of steps, call it m. Then clearly

    f((1^m.d)) = [[p]](d) = a

so a ∈ rng(f). Proof of ⊇: Values in the range of f are either of form [[p]](d) and so
in the range of [[p]] and so in A, or are d0 which is also in A. Finally, the program
q of Figure 5.4, using the STEP macro from the universal program u, computes f.
2 4. A program p which semi-decides A can be modified to loop infinitely unless its
output is true, hence 2 implies 4. If p is as in 4, replacing its write command by
write true gives a program to semi-decide A.
2
read TD;                         (* Input (t.d)                                 *)
D := tl TD;                      (* D = d                                       *)
Vl := update nil D nil;          (* (var nil^1) initially d, others nil         *)
Cd := cons C nil;                (* Cd = (C.nil), Code to execute is C          *)
St := nil;                       (* St = nil, Stack empty                       *)
Time := hd TD;                   (* Time = t, Time bound is t                   *)
while Cd do                      (* Run p for up to t steps on d                *)
  STEP; Time := tl Time;
  if (=? Time nil) then Cd := nil;       (* Abort if time out                   *)
if Time                          (* Output d0 if time ran out, else nil^1 value *)
  then Out := lookup nil1 Vl else Out := d0;
write Out;
Figure 5.4: 3 ⇒ 1.
5.7.2
The preceding theorem justifies the following definition of two of the central concepts
of computability theory. Even though at this point only WHILE and I languages have
been considered, we will see as a result of the robustness results of Chapter 8 that the
concepts are invariant with respect to which computing formalism is used.
Definition 5.7.3 A set A is recursive (also called decidable) iff there is a program [[p]]
that decides the problem x A?. A set A is recursively enumerable (or just r.e., for
short) iff there is a program [[p]] that semi-decides the problem x A?.
Exercises
5.1 Consider a language WHILE-forloop which is just like WHILE, except that instead of
the while command, WHILE-forloop has a command
for X := alltails(E) do C
Its informal semantics: First, E is evaluated to yield a value d. If d = (d1.d2), then X is
first bound to d, and command C is executed once. The same procedure is now repeated
with X being bound to d2. In this way command C is executed repeatedly, until X is
bound to the atom nil (which must eventually happen). At that time the for command
terminates and control goes to the next command.
1. Define the semantics of WHILE-forloop by rules similar to those for the WHILE
semantics, and a semantic function [[p]]FL.
2. Show how WHILE-forloop programs can be translated into equivalent WHILE-programs.
3. Prove that your construction in (2) is correct using the semantics for WHILE and
your semantics from (1) for WHILE-forloop.
4. Is the halting problem decidable for WHILE-forloop-programs?
5. Can all computable functions be computed by WHILE-forloop-programs?
5.2 Define the total function g by: g(p) = not [[p]]FL(p) for any WHILE-forloop-program p.
Prove that g is not computable by any WHILE-forloop-program; and prove that g is
WHILE-program computable.
Consequence: the WHILE-forloop language cannot simulate the WHILE language. 2
5.3 Prove that it is undecidable whether a given program computes a total function. 2
5.4 Hint: show that it is undecidable whether a program computes the identity function,
and derive the more general result from this.
2
5.5 Use Rices theorem to prove that unnecessary code elimination is undecidable: given
a program p
read X; C1; while E do C; C2; write Y
with an identified while command, it is undecidable whether test E will be false every
time control reaches the command.
2
5.6 Prove Theorem 5.5.1 part 4. Hint: you will need two copies of the universal program.
2
5.7 * Prove Theorem 5.5.2. Hint: the results for decidable A, B are straightforward, as
is semi-decidability of A B. For semi-decidability of A B, use Theorem 5.7.2, or the
dovetailing technique of Theorem 5.5.1, Part 4.
2
5.8 List the first 10 elements of ID as given in Lemma 5.7.1.
5.9 Use induction to prove the two claims made about the enumeration of ID in the proof
of Lemma 5.7.1.
2
5.10 * The pairs in list Tmp (Lemma 5.7.1) are added to the end of list N by append.
Show that the simpler alternative of adding them to the start of N does not work. What
goes wrong in the proof of the previous Exercise 5.9 if this change is made?
2
5.11 Devise alternative start and next commands that take only O(n) time when next
is called, where n is the length of list L. Hint: find a faster way to achieve the effect of
append. More variables may be used, if convenient.
2
5.12 * Devise alternative start and next commands that take only constant time when
next is called. Hint: at each next call the only essential action is that a new element is
added to L. Find a way to defer the addition of elements to N until needed. One method
can be found in [21].
2
5.13 Show that if an infinite set is WHILE enumerable, then it is WHILE enumerable
without repetitions (i.e. the range of a one-to-one effective total function).
2
5.14 Let ID be ordered as in Lemma 5.7.1. Show that an infinite set A can be enumerated in increasing order (i.e., is the range of a strictly increasing function) if and only if
it is decidable.
2
5.15 Show that a set A 6= is decidable if it is
the range of a WHILE computable total monotonic function; or
the range of a WHILE computable total function greater than the identity.
5.16 * Show that any infinite WHILE enumerable set must contain an infinite WHILE
decidable subset. Hint: use the result of Exercise 5.14.
2
5.17 Show that there exists a fixed program p0 such that determination of whether
[[p0 ]](d) terminates for a given d ID is undecidable.
References
Most of the results proved in this chapter appear in classical papers by the pioneers
in computability theory. The s-m-n theorem was proved by Kleene in the paper [98],
and also appears in his book [100]. The halting problem was studied first by Kleene
[98], Turing [170], and Church [22, 23]. Rice [151] developed a general technique to
prove undecidability of sets. A universal program first appeared in Turings paper [170].
Properties of recursively decidable and enumerable sets, and their relationship, were
studied by Kleene [99] and Post [142, 143].
6.1
6.2
Interpretation overhead
In the first subsection we discuss overhead in practice, i.e. for existing interpreters, and
the second subsection is concerned with self-application of interpreters. It will be seen
that interpretation overhead can be substantial, and must be multiplied when one interpreter is used to interpret another one.
Section 6.4 will show how this overhead can be removed (automatically), provided
one has an efficient program specializer.
6.2.1
In the present and the next subsection, we are concerned with interpreters in practice,
and therefore address the question: how fast can an interpreter be, i.e. what are the lower
bounds for the running time of practical interpreters. Suppose one has an S-interpreter
int written in language L, i.e.
S
int
L
In practice, assuming one has both an L-machine and an S-machine at ones disposal,
interpretation often turns out to be rather slower than direct execution of S-programs.
If an S-machine is not available, a compiler from S to L is often to be preferred because
Interpretation overhead
89
the running time of programs compiled into L (or a lower-level language) is faster than
that of interpretively executed S-programs.
In practice, a typical interpreter ints running time on inputs p and d usually satisfies
a relation
p timeSp (d) timeLint (p.d)
for all d. Here p is a constant independent of d, but it may depend on the source
program p. Often p = c + f (p), where constant c represents the time taken for dispatch
on syntax and f (p) represents the time for variable access. In experiments c is often
around 10 for simple interpreters run on small source programs, and larger for more
sophisticated interpreters. Clever use of data structures such as hash tables, binary
trees, etc. can make p grow slowly as a function of ps size.
6.2.2
If the purpose is to execute S-programs, then it is nearly always better to compile than
to interpret. One extreme: if S = L, then the identity is a correct compiling function
and, letting q = [[comp]](p) = p, one has timeSp (d) = timeLq (d): considerably faster than
the above due to the absence of p . Less trivially, even when S 6= L, execution of a
compiled S-program is nearly always considerably faster than running the same program
interpretively.
6.2.3
Layers of interpretation
v
>L0
intL1
L0
6
**
L2
L2
L1
L1
L0
v
> L1
intL2
L1
v
L2
Time
consumption
Nested Interpreter
application
and
L1
12 timeL2
(p2.d)
p2 (d) time
int21
where 01 , 12 are constants representing the overhead of the two interpreters (often
sizable, as mentiond in the previous section).
Consequently replacing p1 in the first by int21 and d by p2.d, and multiplying the
second inequality by 01 we obtain:
91
(int21 .(p2.d))
(p2.d) timeL0
01 timeL1
int1
int2
0
L1
(p2.d)
01 12 timeL2
p2 (d) 01 time
int21
2
L0
Thus 01 12 timeL2
p2 (d) timeint1 (int1 .(p2.d)), confirming the multiplication of inter0
pretive overheads.
6.3
The term bootstrapping comes from the phrase to pull oneself up by ones bootstraps
and refers to the use of compilers to compile themselves. The technique is widely used
in practice, including industrial applications. Examples are numerous. We choose a
common one, that of extending an existing compiler for language S to accept a larger
language S0 , based on the following assumptions:
1. The new language S0 is a conservative extension of S. By definition this means that
every S-program p is also an S0 -program (so S-programs S0 -programs), and has
0
the same semantics in both languages (so [[p]]S = [[p]]S ).
2. We have a compiler h S-programs, from source language S to target language
S
T available in source form. By definition of compiler, [[p]]S = [[[[h]] (p)]]T for any
S-program p.
3. Further, we assume that we have an equivalent program t T-programs available
in target form, so [[h]]S = [[t]]T .
S
high-level compiler h
low-level compiler t
S
S0
high-level compiler
h0
This must be equivalent to h on the old source language S, so for all S-programs1
p, we have [[[[h]]S (p) ]]T = [[[[h0 ]]S (p) ]]T .
2. Now apply t to h0 to obtain an S0 compiler t10 in target language form:
S0
high-level compiler h0
low-level compiler t
S0
T
3 low-level compiler t10
Now we have obtained the desired extended compiler t10 = [[t]]T (h0 ) . It is easy to see
that it is a target program equivalent to h0 , since:
[[t10 ]]T
=
=
=
(substitution of equals)
Since t and h are equivalent
h compiles source program h0 from S to T.
S0
S0
T
3 low-level compiler t20
S0
S0
T
3 low-level compiler t30
1 Note that this does not require h and h0 to produce the same target code, just target code which
will have identical effects when run.
93
Combining these runs, we get a compound diagram like those seen in [3]:
h0
t30
S0
h0
S0
S0
S0
S0
h0
S0
0
t2
0
t1
=
=
=
[[t]]T (h0 )
[[t10 ]]T (h0 )
[[t20 ]]T (h0 )
Now t10 and t20 (and t30 ) are semantically equivalent since they are all obtained by
correct compilers from the same source program, h0 :
[[t10 ]]T
=
=
=
=
=
=
by definition of t10
Since t and h are equivalent
since h0 is a conservative extension of h
since t is a compiler from S to T
by definition of t10
by definition of t20
Note that t10 and t20 may not be textually identical, since they were produced by two
different compilers, t and t10 , and it is quite possible that the extended language S0 may
require different target code than S.
However, one easily sees that t20 and t30 are textually identical since the compilers
used to compile them are semantically equivalent:
t20
=
=
=
by definition of t20
Since t10 and t20 are equivalent: [[t10 ]]T = [[t20 ]]T
by definition of t30
The difference between being semantical and syntactical identity of the produced compilers stems from the relationship between the compilers we start out with: t and h are
equivalent in the sense that given the same input program they produce syntactically the
same output program. However h and h0 are equivalent on S programs only in the sense
that given the same program, the two output programs they produce are semantically
equivalent (natural: when one revises a compiler, the old target code may need to be
modified).
Note that bootstrapping involves self-application in the sense that (compiled versions
of) h0 are used to compile h0 itself. Note also that self-application is useful in that it
eases the tasks of transferring a compiler from one language to another, of extending a
compiler, or of producing otherwise modified versions.
6.4
read XY;
X := hd XY;
Y := tl XY;
P := 0;
while Y do
Y := pred Y;
T := X;
while T do
T := pred T;
P := succ P;
write P;
(*
(*
*)
*)
(*
(*
*)
*)
(*
*)
95
Suppose that we want to specialize this program so that X is 3 = nil3 . Then we could
get the following program:
read Y;
P := 0;
while Y do
Y := pred
P := succ
P := succ
P := succ
write P;
Y;
P;
P;
P;
Rather than calling the first program with arguments of form (3.d) it is clearly better
to use the second, more efficient program. A typical partial evaluator, i.e. specializer,
will be capable of transforming the former into the latter.
6.4.1
6.5
This section shows the sometimes surprising capabilities of partial evaluation for generating program generators. We will see that it is possible to use program specialization
to compile, if given an interpreter and a source program in the interpreted language; to
convert an interpreter into a compiler:
S
by specializing the specializer itself; and even to generate a compiler generator. This is
interesting for several practical reasons:
Interpreters are usually smaller, easier to understand, and easier to debug than
compilers.
An interpreter is a (low-level form of) operational semantics, and so can serve as
a definition of a programming language, assuming the semantics of L is solidly
understood.
The question of compiler correctness is completely avoided, since the compiler will
always be faithful to the interpreter from which it was generated.
The results are called the Futamura projections since they were discovered by Yoshihiko
Futamura in 1971 [48]. We consider for simplicity only specialization without change
in data representation. That is, we assume that all the languages below have concrete
syntax and pairing, and that all the data languages are the same. Suppose we are given
a specializer spec from L to T written in an implementation language Imp.
an interpreter int for S-programs which is written in language L; and
an arbitrary S-program source.
6.5.1
target = [[spec]]
(int.source)
97
is a T-program equivalent to S-program source, i.e. that one can compile by partial
evaluation. (This is a solution of Exercise 3.1.)
This equation is often called the first Futamura projection [48], and can be verified
as follows, where in and out are the input and output data of source.
S
Assumption
Definition 3.4.1 of an interpreter
Definition 3.6.1 of a specializer
Definition of target
In other words, one can compile a new language S to the output language of the specializer, provided that an interpreter for S is given in the input language of the specializer.
Assuming the partial evaluator is correct, this always yields target programs that are
correct with respect to the interpreter. This approach has proven its value in practice.
See [11, 90, 89] for some concrete speedup factors (often between 3 and 10 times faster).
A common special case used by the Lisp and Prolog communities is that Imp = T = L,
so one can compile from a new language S to L by writing an S-interpreter in L.
Speedups from specialization As mentioned before, compiled programs nearly always run faster than interpreted ones, and the same holds for programs output by the
first Futamura projection. To give a more complete picture, though, we need to discuss
two sets of running times:
1. Interpretation versus execution:
timeint (p.d) versus timeintp (d)
2. Interpretation versus specialization plus execution:
timeint (p.d) versus timespec (int.p) + timeintp (d)
If program p is to be run just once, then comparison 2 is the most fair, since it accounts
for what amounts to a form of compile time. If, however, the specialized program
intp is to be run often (e.g. as in typical compilation situations), then comparison 1 is
more fair since the savings gained by running intp instead of int will, in the long term,
outweigh specialization time, even if intp is only a small amount faster than int.
6.5.2
The second equation shows that one can generate an S to T compiler written in T, provided
that an S-interpreter in L is given and Imp = L: the specializer is written in its own input
language. Concretely, we see that
L
=
=
=
[[spec]] (int.source)
L
T
[[[[spec]] (spec.int)]] (source)
T
[[compiler]] (source)
Equation compiler = [[spec]] (spec.int) is called the second Futamura projection. The
compiler generates specialized versions of interpreter int. Operationally, constructing a
compiler this way is hard to understand because it involves self-application using spec
to specialize itself. But it gives good results in practice, and faster compilation than by
the first Futamura projection.
6.5.3
=
=
=
[[spec]] (spec.int)
L
T
[[[[spec]] (spec.spec)]] (int)
T
[[cogen]] (int)
The compilers so produced are versions of spec itself, specialized to various interpreters.
This projection is even harder to understand intuitively than the second, but also gives
good results in practice.
99
The following more general equation, also easily verified from Definition 3.6.1, sums
up the essential property of cogen (we omit language L for simplicity):
[[p]] (s.d) = [[[[spec]] (p.s) ]] d = ... = [[[[[[cogen]] p ]] s ]] d
Further, cogen can produce itself as output (Exercise 6.9.)
While the verifications above by equational reasoning are straightforward, it is far
from clear what their pragmatic consequences are. Answers to these questions form the
bulk of the book [89].
6.5.4
A variety of partial evaluators generating efficient specialized programs have been constructed. Easy equational reasoning from the definitions of specializer, interpreter, and
compiler reveals that program execution, compilation, compiler generation, and compiler
generator generation can each be done in two different ways:
out
target
compiler
cogen
=
=
=
=
[[int]](source.input)
[[spec]](int.source)
[[spec]](spec.int)
[[spec]](spec.spec)
=
=
=
=
[[target]](input)
[[compiler]](source)
[[cogen]](int)
[[cogen]](spec)
The exact timings vary according to the design of spec and int, and with the implementation language L. We have often observed in practical computer experiments [90, 89]
that each equations rightmost run is about 10 times faster than the leftmost. Moral:
self-application can generate programs that run faster!
6.5.5
The right side of Figure 6.2 illustrates graphically that partial evaluation can substantially reduce the cost of the multiple levels of interpretation mentioned in Section 6.2.3.
A literal interpretation of Figure 6.2 would involve writing two partial evaluators,
one for L1 and one for L0. Fortunately there is an alternative approach using only one
partial evaluator, for L0. For concreteness let p2 be an L2-program, and let in, out be
representative input and output data. Then
v
>L0
int10
v
v
v
>
>
int21
int10
spec
spec
?
?
v
v
v
L1
>
int21
v
L2
Two levels of
interpretation
Language
Language
Language
L2
L1
L0
time
consumption
satisfying
By partial evaluation of int20 , any L2-programs can be compiled to an equivalent L0-program. Better still, one may construct a compiler from L2 into L0 by
comp20 := cogenL0 (int20 )
The net effect is that metaprogramming may be used without orderofmagnitude loss
of efficiency. The development above, though conceptually complex, has actually been
realized in practice by partial evaluation, and yields substantial efficiency gains.
6.6
Totality
It is clearly desirable that specialization function [[spec]] is total, so every program p and
partial input s leads to a defined output ps = [[spec]](p.s).
101
Computational completeness
The significant speedups seen in the examples above naturally lead to another demand:
that given program p and partial data s, all of ps computations that depend only on its
partial input s will be performed.
Unfortunately this is in conflict with the desire that [[spec]] be total. Suppose, for
example, that program ps computations are independent of its second input d, and that
[[p]] is a partial function. Then computational completeness would require [[spec]](p.s) to
do all of pa computation on s, so it would also fail to terminate whenever [[p]](s.d) = .
This is a problem, since nobody likes compilers or other program transformers that
sometimes loop infinitely!
A typical example which is difficult to specialize nontrivially without having the
specializer fail to terminate is indicated by the program fragment
if
then
else
complex-but-always-true-condition-with-unavailable-input-d
X := nil
while true do S := cons S S;
One cannot reasonably expect the specializer to determine whether the condition will always be true. A specializer aiming at computational completeness and so less trivial than
that of Section 5.2 will likely attempt to specialize both branches of the if statement,
leading to nontermination at specialization time.
A tempting way out is to allow ps to be less completely specialized in the case that
[[p]](s.d) = , e.g. to produce a trivial specialization as in Section 5.2. This is, however,
impossible in full generality, as it would require solving the halting problem.
Some practical specializers make use of run-time nontermination checks that monitor
the static computations as they are being performed, and force a less thorough specialization whenever there seems to be a risk of nontermination. Such strategies, if capable
of detecting all nontermination, must necessarily be overly conservative in some cases;
for if perfect, they would have solved the halting problem.
Optimality
It is desirable that the specializer be optimal when used for compiling, meaning that
spec removes all interpretational overhead. This can be made somewhat more precise,
given a self-interpreter sint:
L
sint
L
By definition of interpreter and specialization (or by the first Futamura projection), for
every d ID
[[p]](d) = [[sintp ]](d)
where sintp = [[spec]](sint.p). Thus program sintp is semantically equivalent to p.
One could reasonably say that the specializer has removed all interpretational overhead
in case sintp is at least as efficient as p. We elevate this into a definition:
Definition 6.6.1 Program specializer spec is optimal for a self-interpreter sint in case
for every program p and data d, if sintp = [[spec]](sint.p) then
time sintp (d) time p (d)
This definition of optimality has proven itself very useful in constructing practical
evaluators [89]. For several of these, the specialized program sintp is identical up to
variable renaming to the source program p. Further, achieving optimality in this sense has
shown itself to be an excellent stepping stone toward achieving successful and satisfactory
compiler generation by self-application.
An open problem. Unfortunately there is a fly in the ointment. The condition just
proposed is a definition relative to one particular self-interpreter sint. It could therefore
be cheated, by letting spec have the following structure:
read Program, S;
if Program = sint
then Result := S
else Result := the trivial specialization of Program to S;
write Result
On the other hand, it would be too much to demand that spec yield optimal specializations of all possible self-interpreters. Conclusion: the concept of optimality is
pragmatically a good one, but one which mathematically speaking is unsatisfactory. This
problem has not been resolved at the time of writing, and so could be a research topic
for a reader of this book.
103
p=
6.7
Suppose program p expects input (s.d) and we know what s but not d will be. Intuitively, specialization is done by performing those of ps calculations that depend only on
s, and by generating code for those calculations that depend on the as yet unavailable
input d. A partial evaluator thus performs a mixture of execution and code generation
actions the reason Ershov called the process mixed computation [45], hence the
generically used name mix for a partial evaluator (called spec in Chapter 3). Its output
is often called the residual program, the term indicating that it is comprised of operations
that could not be performed during specialization.
For a simple but illustrative example, we will show how Ackermanns function (seen
earlier in Section 6.4.1) can automatically be specialized to various values of its first
parameter. Ackermanns function is useless for practical computation, but an excellent
vehicle to illustrate the main partial evaluation techniques quite simply. An example is
seen in Figure 6.3. (The underlines should be ignored for now.) Note that the specialized
program uses less than half as many arithmetic operations as the original.
Computing a(2,n) involves recursive evaluations of a(m,n) for m = 0, 1 and 2, and
various values of n. The partial evaluator can evaluate expressions m=0 and m-1 for the
needed values of m, and function calls of form a(m-1,...) can be unfolded (i.e. replaced
by the right side of the recursive definition above, after the appropriate substitutions).
More generally, three main partial evaluation techniques are well known from program
transformation: symbolic computation, unfolding function calls, and program point specialization. Program point specialization was used in the Ackermann example to create
specialized versions a0, a1, a2 of the function a.
On-line and Off-line Specialization. Figure 6.3 illustrates off-line specialization,
an approach that makes use of program annotations, indicated there by underlines. The
alternative is called on-line specialization: computing program parts as early as possible,
taking decisions on the fly using only (and all) available information.
These methods sometimes work better than off-line methods. Program p2 in Figure
6.3 is a clear improvement over the unspecialized program, but can obviously be improved
even more; a few online reductions will give:
a2(n) = if n=0 then 3 else a1(a2(n-1))
a1(n) = if n=0 then 2 else a1(n-1)+1
In particular, on-line methods often work well on structured data that is partially static
and partially dynamic. On the other hand they introduce new problems and the need
for new techniques concerning termination of specializers. For a deeper discussion of the
merits of each approach, see [89].
6.7.1
We assume given:
f1(s,d)
=
g(u,v,...) =
...
h(r,s,...) =
expression1
expression2
105
expressionm
2. Annotations that mark every function parameter, operation, test, and function
call as either eliminable: to be performed/computed/unfolded during specialization,
or residual: generate program text to appear in the specialized program.
In particular the parameters of any definition of a function f can be partitioned into
those which are static and the rest, which are dynamic. For instance m is static and n is
dynamic in the Ackermann example.
The specialized program will have the same form as the original, but it will consist
of definitions of specialized functions gstatvalues (program points), each corresponding to
a pair (g, statvalues) where g is defined in the original program and statvalues is
a tuple consisting of some values for all the static parameters of g. The parameters of
function gstatvalues in the specialized will be the remaining, dynamic, parameters of g.
A specialization algorithm
Assumptions:
1. The input program p is as above, with defining function given by f1(s,d) =
expression1, and static s and dynamic d.
2. Every part of p is annotated as eliminable (no underlines) or residual (underlined).
3. The value of s is given.
In the following, variables Seenbefore and Pending both range over sets of specialized
functions gstatvalues . Variable Target will always be a list of (residual) function definitions.
1. Read Program and S. (Program p and static input value s.)
2. Pending := {f1S }; Seenbefore := {};
3. While Pending is nonempty do the following:
4. Choose and remove a pair gstatvalues from Pending, and add it so Seenbefore if
not already there.
5. Find gs definition g(x1,x2,...)
6.
= g-expression.
xn) = f-expression
107
6.7.2
Where do the annotations used by the algorithm above come from? Their primal source
is knowledge of which inputs will be known when the program is specialized, for example
m but not n in the Ackermann example. There are two further requirements for the
algorithm above to succeed.
First, the internal parts of the program must be properly annotated (witness comments such as if . . . the annotation is incorrect). The point is that if any parameter
or operation has been marked as eliminable, then one needs a guarantee that it actually
will be so when specialization is carried out, for any possible static program inputs. For
example, an if marked as eliminable must have a test part that always evaluates to a
constant. This requirement (properly formalized) is called the congruence condition in
[89].
The second condition is termination: regardless of what the values of the static inputs
are, the specializer should neither attempt to produce infinitely many residual functions,
nor an infinitely large residual expression.
It is the task of binding-time analysis to ensure that these conditions are satisfied.
Given an unmarked program together with a division of its inputs into static (will be
known when specialization begins) and dynamic, the binding-time analyzer proceeds to
annotate the whole program. Several techniques for this are described in [89]. The
problem is complex for the following reason:
1. A specializer must account for all possible runtime actions, but only knows the
value of static data. It thus accounts for consequences one step into the future.
2. A binding-time analyzer must account for all possible runtime actions, but only
knows which input values will be static, but not what their values are. It thus
accounts for computational consequences two steps into the future.
The current state of the art is that congruence is definitely achieved, whereas bindingtime analyses that guarantee termination are only beginning to be constructed.
Exercises
6.1 Section 6.3 assumed one already had compilers for language S available in both source
form h and target form t. In practice, however, writing target code is both involved and
error-prone, so it would be strongly preferable only to write h, and the by some form of
bootstrapping obtain t satisfying [[h]]S = [[t]]T .
Explain how this can be done, assuming one only has a compiler for language S
available in source form h. Start by writing an interpreter int for S in some existing and
convenient executable language L.
2
6.2 Find another way to accomplish the same purpose.
6.5 Explain informally the results claimed in Section 6.5.4, e.g. why compilation
T
by target = [[compiler]] (source) should be faster than compilation by target =
L
[[spec]] (int.source).
2
6.6 Prove that [[p]] (s.d) = [[[[[[cogen]] (p) ]] (s) ]] (d)
6.7 * Apply the algorithm sketched in Section 6.7.1 to the program of Figure 6.3 with
static input m = 2.
2
6.8 Find an appropriate set of annotations (underlines) for the multiplication program
specialized In Section 6.4.
2
6.9 Prove that cogen = [[cogen]] (spec) .
109
References
As mentioned earlier, the possibility, in principle, of partial evaluation is contained in
Kleenes s-m-n Theorem [100] from the 1930s. The idea to use partial evaluation as a
programming tool can be traced back to work beginning in the late 1960s by Lombardi
and Raphael [112, 111], Dixon [39], Chang and Lee [20], and Sandewalls group [9],
Futamura showed the surprising equations which are nowadays called the Futamura
projections in a paper from 1971 [48]. Essentially the same discoveries were made independently in the 1970s by A.P. Ershov [43, 44, 45] and V.F. Turchin [168, 169]. Gl
uck
and others have described other ways of combining interpreters, compilers, and specializers, see e.g. [53]. The first implementation of a self-applicable partial evaluator was done
at Copenhagen in 1984 [90]. Much of the material in this chapter stems from [89].
In the 1980s and 1990s partial evaluation became a research field of its own, with
the first conference in 1988 [12]. For more historical information and references, see
[49, 46, 89, 86].
7.1
7.1.1
Data: trees built from one atom nil, strings built from
two symbols 0, 1
We assume without loss of generality that TM-data = {0, 1} , since a Turing machine with
a larger tape alphabet can be simulated with at most linear loss of time, by one that
works on symbols encoded as strings in {0, 1} by encoding each symbol in an k-symbol
alphabet as a block of dlog ke bits.
Our presentation of Turing machines is nonclassical because it has a programmed
control and a fixed tape alphabet {0, 1}. A later section on the speedup theorem will
use the classical model, defined in Section 7.6.
7.1.2
Control structures
Each of the computational models GOTO, TM, RAM, and CM has an imperative control
structure, expressible by a program which is a finite sequence of instructions: p
=
I1 I2 ... Im . Sometimes this will be written with explicit labels: p = 1: I1 2:
I2 ... m: Im m+1: . The exact form of each instruction I` will be different for the
various machine types. At any point in its computation, the program will be in a state
of form
s = (`, ) where
A terminal state has label ` = m + 1, indicating that the computation has terminated.
To describe computations we use the common judgment forms:
Judgment form:
Read as:
[[p]](x) = y
p ` s s0
p ` s s0
113
In any one run, the store will be initialized according to the program input, and the
programs computed result will be read out from the final store. Details differ from
machine to machine, so we assume given functions of the following types, to be specified
later for each model:
Readin :
Readout :
L-data
L-store
L-store
L-data
Finally, we can define the effect of running program p on input x by: [[p]](x) = y if
1. 0 = Readin(x)
2. p ` (1, 0 ) (m + 1, ),1 and
3. y = Readout()
7.2
::=
|
X := nil | X := Y | X := hd Y | X := tl Y
X := cons Y Z | if X goto ` else `0
(m + 1, ) is a terminal state.
1:
2:
3:
4:
5:
6:
7:
8:
Y := nil;
if X goto 4;
goto 8;
Z := hd X;
Y := cons Z Y;
X := tl X;
goto 2;
X:= Y
Note how the combination of if and goto simulates the effect of while.
Definition 7.2.2 Consider a program p = I1 ... Im . Let Vars(p)= {X,Z1...,Zn}
be the set of all variables in p, and let X be a distinguished input-output variable.
1. A store for p is a function from Vars(p) to ID. A state for p is a pair (`, ) where
1 ` m + 1 and is a store for p.
2. Readin(d) = [X 7 d, Z1 7 nil, . . . Zn 7 nil].
3. Readout() = (X).
4. The one-step transition rules for GOTO appear in Figure 7.1.
(`, )
(`, )
(`, )
(`, )
(`, )
(`, )
(`, )
(`, )
(`, )
(` + 1, [X 7 nil])
(` + 1, [X 7 (Y)])
(` + 1, [X 7 d])
(` + 1, [X 7 nil])
(` + 1, [X 7 e])
(` + 1, [X 7 nil])
(` + 1, [X 7 (d.e)])
(`0 , )
(`00 , )
If
If
If
If
If
If
If
If
If
I`
I`
I`
I`
I`
I`
I`
I`
I`
=
=
=
=
=
=
=
=
=
X:=nil
X:=Y
X:=hd Y and (Y) = (d.e)
X:=hd Y and (Y) = nil
X:=tl Y and (Y) = (d.e)
X:=tl Y and (Y) = nil
X:=cons Y Z and (Y) = d, (Z) = e
if X goto `0 else `00 and (X) 6= nil
if X goto `0 else `00 and (X) = nil
7.3
115
First, TM-data = {0, 1} , so an input is a bit string. A Turing machine has one
or more tapes. Each tape is a two-way infinite sequence of squares, where a square
contains a symbol from a finite tape alphabet A including the blank symbol B. During
a computation the squares contents may be tested or overwritten. At any time during
a computation there will only be finitely many nonblank symbols on any tape.
In the literature the tape alphabet can sometimes be arbitrarily large, but we use
{0, 1, B} for simplicity and because it only makes small constant changes in running
times: the same reasons for restricting the GOTO language to the one atom nil.
In a computational total state at some moment, each of the machines read/write
heads is scanning one current square on each tape, and it is about to perform one of
its program instructions. This directs the machine to do one of the following for one of
the tapes: write a new symbol on the tape, replacing the previous scanned tape squares
contents; move its read/write head one square to the left or to the right; or compare the
contents of its scanned square against a fixed symbol and then transfer control to one
instruction if it matches, and to another instruction if not.
...B B 0 1 0 1 B B ...
Finite
>
state
control
...B 1 1 0 0 B B B ...
(program)
:
p
..
...
X.X
XXX
XX
XXX
z
...B 0 0 1 1 1 1 B ...
Tape 1 (input)
...
The following grammar describes TM-prog by giving the syntax of both instructions and
data. Subscript j, 1 j k, indicates which tape is involved. For one-tape Turing
machines the subscript will be omitted.
I:
S, S0 :
L, R :
:
Instruction
Symbol
String
Tapes
Tape
::=
::=
::=
::=
::=
A store is a k-tuple of two-way conceptually infinite tapes. The tapes must be represented finitely in order to define the transition rules. One way is to include all nonblank
symbols, so a full tape is obtained by appending infinitely many blanks to each end of a
finite tape representation. A full storage state consists of a store in which the scanned
symbol will be underlined. Thus we define
TM-store = { (L1 S1 R1 , . . . , Lk Sk Rk ) | Li , Si , Ri as above }
Here the underlines mark the scanned symbols Si , and Li and Ri are (perhaps empty)
strings of symbols.
Inputs and outputs are strings in TM-data = {0, 1} , are found on the first tape,
and consist of all symbols to the right of the scanned symbol, extending up to but not
including the first blank. The store initialization and result readout functions are defined
as follows:
Readin(x)
= (B x, B, . . . , B) Start just left of input
Readout(L1 S1 R1 , L2 S2 R2 , . . . , Lk Sk Rk ) = Pfx (R1 )
Tape 1, right to first B
where
(
Pfx (R) =
if R = or if R begins with B
0
S Pfx (R ) if R = S R0 and S = 0 or 1
Finally, the effect of a one-tape Turing machine one-step transition is defined as in Figure
7.3, where I` is the instruction about to be executed and S, S0 {0, 1, B} are tape symbols.
Extension to multiple tapes is straightforward but notationally tedious, and so is omitted.
7.4
A counter machine program has as storage a finite number of counters (also called registers or cells) X0, X1, X2,. . . , each holding a natural number. Thus CM-data = IN .
Program instructions allow testing a counter for zero, or incrementing or decrementing
. = 0 and (x+1) 1
. = x for x IN ). All
a counters contents by 1 (where by definition 0 1
p ` (`, L S S0 R)
p ` (`, L S)
p ` (`, LS0 S R)
p ` (`, S R)
p ` (`, L S R)
p ` (`, L S R)
p ` (`, L S R)
(` + 1, LS S0 R)
(` + 1, LS B)
(` + 1, LS0 S R)
(` + 1, B SR)
(` + 1, L S0 R)
(`0 , L S R)
(`0 , L S R)
If
If
If
If
If
If
If
I`
I`
I`
I`
I`
I`
I`
=
=
=
=
=
=
=
117
right
right
left
left
write S0
if S goto `0
if S0 goto `0 else `00 and S 6= S0
::=
Xi := Xi + 1 | Xi := Xi .
- 1 | if Xi=0 goto ` else `0
= [0 7 x, 1 7 0, 2 7 0, . . .] Input in counter 0
= (0)
Output from counter 0
Any one program can only reference a fixed set of counters. Thus for any store used to
execute it, (i) = 0 will hold for all but a fixed finite set of indices. Finally, the counter
machine one-step transition rules are defined as in Figure 7.4.
7.5
This machine is an extension of the counter machine which more closely resembles current
machine languages. It has a number of storage registers containing natural numbers (zero
if uninitialized), and a much richer instruction set than the counter machine. The exact
p ` (`, ) (` + 1, [i 7 j + 1])
p ` (`, ) (` + 1, [i 7 j 1])
p ` (`, ) (` + 1, [i 7 0])
p ` (`, ) (`0 , )
p ` (`, ) (`00 , )
If
If
If
If
If
I`
I`
I`
I`
I`
=
=
=
=
=
Xi
Xi
Xi
if
if
:= Xi + 1 and (i) = j
:= Xi .
- 1 and (i) = j 6= 0
.
:= Xi - 1 and (i) = 0
Xi=0 goto `0 else `00 (i) = 0
Xi=0 goto `0 else `00 (i) 6= 0
Finite
state
control
(program)
91
Register 0
Register 1
13
Register 2
..
.
0
..
.
Register i
..
.
119
codes. One reason is that there is no built-in limit to word size or memory address
space: it has a potentially infinite number of storage registers, and each may contain
an arbitrarily large natural number. Even though any one program can only address
a constant number of storage registers directly, indirect addressing allows unboundedly
many other registers to be accessed.
The following grammar describes the SRAM instruction syntax.
I
::=
|
Xi := Xi + 1 | Xi := Xi .
- 1 | if Xi=0 goto ` else `0
Xi := Xj | Xi := <Xj> | <Xi> := Xj
While this machine resembles the counter machine, it is more powerful in that it allows
programs to fetch values from and store them into cells with computed addresses. The
intuitive meaning of Xi := <Xj> is an indirect fetch: register Xjs contents is some
number n; and that the contents of register Xn are to be copied into register Xi. Similarly,
the effect of <Xi> := Xj is an indirect store: register Xis contents is some number m;
and the contents of register Xj are to be copied into register Xm.
This version is nearly minimal, but will suffice for our purposes. More general RAM
models seen in the literature often have larger instruction sets including addition, multiplication, or even all functions computed by finite-state automata with output, operating on their argments binary representations. We will argue that such extensions
do not increase the class of computable functions. They can, however, affect the class
of polynomial-time solvable problems, as the more powerful instructions can allow constructing extremely large values within unrealistically small time bounds.
The RAM storage has the form
SRAM-store = { | : IN IN }
where (j) is the current contents of register Xj. Further,
Readin(x)
Readout()
= [0 7 x, 1 7 0, . . .]
= (0)
Input in register X0
From register X0
Even though one program can directly reference only a fixed set of registers, the indirect operations allow access to registers not appearing in the program text (perhaps
unboundedly many). On the other hand, the store is initialized to zero except for its
input register, so at any point during a computation only finitely many registers can
contain nonzero values. Consequently the machine state can be represented finitely (in
fact we will see that an SRAM can be simulated by a Turing machine).
The SRAM one-step transition rules are defined as in Figure 7.6.
p ` (`, )
p ` (`, )
p ` (`, )
p ` (`, )
p ` (`, )
(` + 1, [i 7 (i) + 1])
(` + 1, [i 7 (i) 1])
(` + 1, [i 7 0])
(` + 1, [i 7 0])
(`0 , )
If
If
If
If
If
p ` (`, )
(`00 , )
If
p ` (`, )
p ` (`, )
p ` (`, )
(` + 1, [i 7 (j)])
(` + 1, [i 7 ((j))])
(` + 1, [(i) 7 (j)])
If
If
If
I` = Xi := Xi+1
I` = Xi := Xi .
- 1 and (i) 6= 0
I` = Xi := Xi .
- 1 and (i) = 0
I` = Xi := 0
I` = if Xi=0 goto `0 else `00
and (i) = 0
I` = if Xi=0 goto `0 else `00
and (i) 6= 0
I` = Xi := Xj
I` = Xi := <Xj>
I` = <Xi>:= Xj
7.6
We will later on prove certain results for which it matters whether one chooses the
formulation of Turing machines above, or the classical formulation usually adopted in
the literature. Therefore we now briefly review the classical definition.
Definition 7.6.1 A k-tape classical Turing machine is a quintuple
(, Q, `init , `f in , T )
where
1. is a finite alphabet containing a distinguished symbol B;
2. Q is a finite set of states, including `init , `f in ; and
3. T is a set of tuples of form
(`, (a1 , b1 , M1 ), . . . , (ak , bk , Mk ), `0 )
where
(a) a1 , . . . , ak , b1 , . . . , bk ;
(b) M1 , . . . , Mk {, , }; and
(c) `, `0 Q and ` 6= `f in .
The Turing machine is deterministic if for every ` and a1 , . . . , ak there exists at most one
b1 , . . . , bk , M1 , . . . , Mk , and `0 such that (`, (a1 , b1 , M1 ), . . . , (ak , bk , Mk ), `0 ) T .
2
121
It is perhaps easiest to understand the definition by comparison with the previous definition of Turing machines. Whereas the previous definition insisted that every Turing
machine use the same tape alphabet {0, 1, B}, the present definition allows each machine
to have its own tape alphabet . Moreover, whereas the previous Turing machine was
controlled by a sequence of labeled commands, we now have instead a set of states Q,
and a set of transitions T between these states. Roughly, every state ` Q corresponds
to a label in the earlier definition, and every transition t T corresponds to a command.
Consider, for instance, a 1-tape Turing machine with transition:
(`, (a, b, M ), `0 )
Such a transition may also be written more simply as a quintuple:
(`, a, b, M, `0 )
The meaning of the transition is: in state `, if the scanned square contains a, then
overwrite this a with b, perform an action as specified by M , and go to state `0 , where
the different values of M are interpreted as follows:
:
:
:
The tape to the left and right of the scanned symbol are at all times finite. In the
situation where one moves, say, to the right and the tape to the right is empty, we simply
add a blank.
This is all made precise in the following definition.
Definition 7.6.2 Given a k-tape Turing machine M = (, Q, `init , `f in , T ).
1. A configuration of M is an element of Q ( )k .
2. One configuration C leads to another C 0 , notation C ; C 0 , if
C
C0
and there is a transition (`, (a1 , b1 , M1 ), . . . , (ak , bk , Mk ), `0 ) T such that for all
i = 1, . . . , k both i = ai , and:
(a) if Mi = then
i. if Li = then L0i = , i0 = B, and R0i = bi Ri ;
ii. if Li = with , then L0i = , i0 = , and R0i = bi Ri .
(b) if Mi = then L0i = Li , i0 = bi , and R0i = Ri
(c) if Mi = then
i. if Ri = then R0i = , i0 = B, and L0i = bi Li ;
ii. if Ri = with then R0i = , i0 = , and L0i = bi Li .
3. C leads to C 0 in m steps, notation C ;m C 0 , if there is a sequence of configurations
C1 , . . . , Cm+1 such that C = C1 and C 0 = Cm+1 .
4. For x, y (\{B}) we write M (x) = y, if for some m
(`init , (, B, x), (, B, ), . . . , (, B, )) ;m
(`f in , (L1 , 1 , yR1 ), (L2 , 2 , R2 ), . . . , (Lk , k , Rk ))
where R1 is either or begins with B.
5. M decides a set L (\{B}) , if
(
1 for every x L
M (x) =
0 for every x (\{B}) \L
2
Example 7.6.3 Here is a 1-tape Turing machine M that takes a number in the unary
number system as input, and returns its successor as output, i.e., M (x) = x1 for all unary
numbers x.
123
1. = {0, 1, B};
2. Q = {`1 , `2 , `3 , `4 };
3. `init = `1 ;
4. `f in = `4 ;
5. T = {(`1 , B, B, , `2 ), (`2 , 1, 1, , `2 ), (`2 , B, 1, , `3 ), (`3 , 1, 1, , `3 ), (`3 , B, B, , `4 )}
The machine is started with scanned symbol B, blank tape to the left, and the input
1 1 to the right. Therefore it begins (first transition) by moving one step to the right.
Then (second transition) it moves one step to the right as long as it sees 1s. When it
reaches a blank after the 1s, it replaces the blank by an extra 1 (third transition). It
then moves to the left to get back to the initial blank (fourth transition), and when it
arrives, it terminates (fifth transition).
Here is a more clever machine computing the same function:
1. = {0, 1, B};
2. Q = {`1 , `2 };
3. `init = `1 ;
4. `f in = `2 ;
2
5. T = {(`1 , B, 1, , `2 )}.
Note that every transition must write something to the scanned square. In order to
simply move the read/write head one must write the same symbol to the scanned square
as is already present. For instance,
(`1 , B, B, , `2 )
is the first transition in the example above which moves the read/head one square to the
right. It is convenient to let nop be an abbreviation for the triple (B, B, ). In case we
know the scanned square is a blank, this operation neither moves the read/write head
nor writes anything to the tapeit performs a no-operation.
Exercises
7.1 Show that a program with several one-dimensional arrays can be simulated in a RAM.
2
7.2 Show that it is not necessary to assume that every memory cell is initialized to 0.
7.6 * This exercise and the next concern the construction of a self-interpreter for SRAM
programs. Part 1: devise an appropriate way to represent the instruction sequence
comprising an SRAM program as SRAM data in memory. (Hint: you may wish to use more
than memory cell to contain one instruction.)
2
7.7 * Part 2: Sketch the operation of the self-interpreter for SRAM programs. This can
store the program to be interpreted in odd memory locations, and can represent memory
cell loc of the program being interpreted by the interpreters memory cell 2 loc.
2
7.8 Prove that the function f (x) = the largest u such that x = 3u y for some y is
CM-computable.
2
References
The data structure of GOTO is very similar to that of first-order LISP or Scheme, and
its control structure is very similar to early imperative languages, e.g. BASIC. Counter
and random access machines were first studied by Shepherdson and Sturgis [163], and
are now very popular in complexity theory, for instance in the book by Aho, Hopcroft
and Ullman [2].
125
The SRAM and equivalent storage modification machines were studied by Schonhage
[160]. Turing machines were introduced in [170] and are widely studied in computability
and complexity theory. The book by Papadimitriou [138] gives a large-scale introduction
to complexity theory and computation models, and [173] covers an even broader range.
Robustness of Computability
In this chapter we undertake the task of justifying the Church-Turing thesis, by proving
that all the different models introduced in the preceding chapter are equivalent to the
WHILE model introduced earlier.1 The result is that computability, without limitations
on resource bounds, is equivalent for all of: WHILE, I, GOTO, CM, 2CM, RAM, and TM. This
implies that many results about WHILE carry over to the other models directly. For
instance, the halting problem is undecidable for all of the above languages.
Section 8.1 presents an overview of the equivalence proof. Sections 8.2-8.7 then prove
the various equivalences by means of compilation and interpretation.
8.1
Overview
Figure 8.1 gives an overview of the translations and interpretations in this chapter. The
labels in the diagram sum up the techniques that are used. The proofs of equivalence
come in three variants:
1. Show for a language pair X, Y how to compile an arbitrary X-program p into an
equivalent Y-program q (possibly with a change in data representation, as in Definition 3.3.3).
2. Show for a language pair X, Y how to write an interpreter for X in Y.
3. The remaining arcs, labeled with , are trivial. For instance, every 2CM-program
is a CM-program with exactly the same computational meaning.
Figure 8.2 shows the form of data and store in each of the computation models. Compilation from WHILE to I was dealt with in Section 3.7.1; this involves coding multi-variable
programs into ones with only one variable X.
8.2
127
list of pairs
(address, contents)
WHILE
6
encode
TM
tape as
two lists
? expression split
I
GOTO
Q
Q
6 Bohm-Jacopini
Q
Q
interpret
SRAM
6
Q
tupling Q
functions Q
interpret
RAM
6
Q
Q
Q
s
Q
pairing -
CM
2CM
Language L
TM
GOTO
WHILE and I
CM
RAM
L-data
{0, 1}
ID
ID
IN
IN
L-store
=L S R
: IN ID
: IN ID
: IN IN
: IN IN
Input
BR
(0)
(0)
(0)
(0)
Output
R
(0)
(0)
(0)
(0)
read X;
C := 1;
while C do
if (=? C
if (=? C
if (=? C
if (=? C
if (=? C
if (=? C
if (=? C
if (=? C
write X
1)
2)
3)
4)
5)
6)
7)
8)
then
then
then
then
then
then
then
then
{
{
{
{
{
{
{
{
Y :=
if X
C :=
Z :=
Y :=
X :=
C :=
X :=
129
nil; C := 2 };
then C := 4 else C := 3};
8 };
hd X; C := 5 };
cons Z Y; C := 6 };
tl X; C := 7 };
2 };
Y; C := 0 };
8.3
The various remaining machine types have different forms of input-output data, which
necessitates transforming back and forth between different data domains. Figures 8.4, 8.5
show the encodings that we will use to represent one machine types data for simulation
by another machine type. (The notation < , > used for cpr will be defined shortly.)
{0, 1} = TM-data
6
c01B
=
ID = GOTO-data
Z
Z
Z
cpr ZZ
IN = 2CM-data
>
bin
c2CM
Z
Z
~
IN = RAM-data = CM-data
Figure 8.4: Data encodings between machine models.
8.3.1
We will mostly gloss over the correctness problem except where the construction is nontrivial, hoping the reader will find the other constructions sufficiently straightforward
not to need formal proof.
Compiling RAM to TM
Coding function c
c01B : {0, 1, B} ID
131
Definition of c
c01B (a1 a2 ...ak ) = (a1 a2 ak ) (in list notation)
where B = nil, 0 = (nil.nil), 1 = (nil.(nil.nil))
cpr
ID IN
bin
IN {0, 1}
c2CM
IN IN
c2CM (v) = 2v
8.4
Compiling RAM to TM
We begin with the most complex compilation, from the most complex machine type (the
RAM) to the a quite simple one (the Turing machine).
First, to simplify the construction we reduce RAM instructions to what might be called a
RISC or reduced instruction set version using register X0 as an accumulator, and with
instruction forms:
I
::=
|
|
X0 := X0 + 1 | X0 := X0 .
- 1 | if X0 = 0 goto `
X0 := Xi | Xi := X0 | X0 := <Xi> | <X0>:= Xi
Other operations: X0 := X0 Operation Xi
Clearly any RAM program can be converted to an equivalent reduced form program, slowing down its running time by at most a small constant factor.
The Turing machine simulating a RISC RAM program p will have 4 tapes as in the
following table, using the binary encoding bin : IN {0, 1} of Figure 8.5. With each
tape form we have marked, by underlining, the standard scan position. This is the
position the scan heads are to occupy between simulation of any two RAM instructions.
The first two tapes represent the locations and values of nonzero entries in the RAM
store = [a0 7 c0 , . . . , ak 7 ck ]. The third tape is the accumulator X0, the fourth is an
auxiliary scratch tape for various purposes.
Note that standard scan position can easily be located: since all number encodings
have at least one bit, it will always be the rightmost B in the first BB to the left of any
tapes nonblank contents.
Tape number
Tape name
Addresses
Contents
Accumulator X0
...B B...
Scratch
...B B...
Initialization code: the RAM program input bin(i) is on tape 1. This is first copied to tape
2 and 0 is placed on tape 1, signifying that cell 0 contains value bin(i). After this, both
heads are moved one position left to assume standard position. Termination code: the
first value bin(c0 ) on tape 2 is copied onto tape 1, and all other information is erased.
The simulation is described by three examples; the reader can fill in the rest.
1. X0 := X0 + 1:
Find the right end of the (nonblank portion of the) Accumulator tape 3. Repeatedly
replace 1 by 0 on it, and shift left one position, as long as possible. When a 0 or B
is found, change it to 1 and move one left to stop in standard position..
2. X23 := X0:
Scan right on tapes 1 and 2, one B block at a time, until the end of tape 1 is reached
or tape 1 contains a block B10111B. (Note: 10111 is 23 in binary notation.)
If the end of tape 1 was reached, location 23 has not been seen before. Add it, by
writing 10111 at the end of tape 1, and copy tape 3 (the value of X0) onto tape 2;
and return to standard position.
If, however, B10111B was found on tape 1, then bin(c23 ) is scanned on tape 2. In
this case it must be overwritten, done as follows:
copy bin(c24 ) ...B bin(ck ) B onto scratch tape 4;
copy tape 3 (the value of X0) in place of bin(c23 ) on tape 2;
write B and copy tape 4 onto tape 2, thus reinstating, after the new bin(c23 ),
the remainder bin(c24 ) ...B bin(ck ) B; and finally
return to standard position.
3. X0 := <X23>:
Starting at the left ends of tapes 1 and 2, scan right on both, one B block at a time,
until the end of tape 1 is reached or tape 1 contains a block with B10111B.
If the end is reached do nothing, as c23 = 0 and tape 3 already contains c0 .
Compiling TM to GOTO
133
If B10111B was found on tape 1, then bin(c23 ) is scanned on tape 2. Copy bin(c23 )
onto tape 4. As above, search tapes 1 and 2 in parallel until location B bin(c23 ) B is
found on tape 1, or tape 1s end is found. If the end was reached, write 0 on tape
3, since cc23 = c0 . Otherwise copy the tape 2 block corresponding to tape 1 onto
tape 3, as the tape 2 block contains bin(c(c23 ) ), and return to standard position.
Finally, other operations X0 := X0 Operation Xi can be simulated as long as they
denote Turing-computable functions on natural numbers. This holds for all operations
in the various RAM models which have been proposed.
8.5
Compiling TM to GOTO
For simplicity of notation we describe how to compile one-tape Turing machine programs
into equivalent GOTO programs; the extension to multiple tapes is obvious and simple. We
follow the common pattern. The encoding of Turing machine tapes as GOTO values
uses the encoding c01B defined in Figure 8.5.
A Turing machine store = L S R will be represented by three GOTO variables Lf,
C, Rt, whose values are related to the tape parts by C = S (notation defined in Figure
, where L
is L written backwards, last symbol first.
8.5), Rt = c01B (R), and Lf = c01B L
A Turing machine program p = I1 ;I2 ; . . . Ik is compiled into a simulating GOTO-program
p = I1 ;I2 ; . . . Ik , where each Ii is the sequence of GOTO commands defined next (with some
syntactic sugar for readability).
TM command
right
left
write S
C := d
if S goto `
if C = d then goto `
where d = S
where d = S
8.6
Compiling GOTO to CM
Extended CM code
Xi := 0
Xi := 2Xj 3Xk
Xi := u where Xi= 2u 3v
Xi := v where Xi= 2u 3v
if Xi = cpr (a) then goto `
Compiling CM to 2CM
135
X3 := X1; X0 := 1;
while X2 6= 0 do { X0 := X0 X3; X2 := X2-1 }
This nearly completes the proof that the functions used in the compilation are CMcomputable, except for m(x). This is left as Exercise 8.1.
2
8.7
Compiling CM to 2CM
Lemma 8.7.1 Suppose CM program p has one input and contains k variables X1,. . . ,Xk
where k 3. Then there is a CM program q with only two variables Y, Z such that2 [[p]]
can be implemented by [[q]] by encoding c2CM (x) = 2x .
2
Proof. Each command I` of p will be replaced by a sequence of commands I` in q.
Variables X1,. . . ,Xk are represented in q by two variables Y, Z. Letting h be the k-th
prime number, the following simulation invariant property will be maintained:
If variables X1, X2,. . . Xk have values x1 , x2 , . . . , xk (respectively) before execution of any p-instruction I` , then
Value of(Y) = 2x1 3x2 . . . hxk
will hold before execution of the corresponding q-instruction sequence I` .
Explanation of the simulation method: variable Z is used as an auxiliary. Assuming the
simulation invariant to hold, operations X2:= X2 + 1, X2 := X2 - 1, and X2=0? (for
instance) can be realized by replacing y by 3 y, or y 3, or deciding whether y is divisible
by 3. It is easy to see that these can be done with two counters; for example y 3 can
be computed by
while Y>0 do { Y:=Y-1; Z:=Z+1 }
while Z 4 do {Y := Y+1; Z := Z-3 }
where the test Z 4 and the operation Z := Z-3 are easily programmed. Operations on
the other Xi are similarly realized, completing the construction. Initially p has its input
x as value of X1, and every other variable has value 0. By the invariant this corresponds
to initial q value y = 2x 30 50 . . . = 2x of Y. Thus q is a 2-counter program which, given
input y = 2x , terminates with y = 2f (x) 30 50 . . . = 2f (x) , as required.
2
2 Recall
Definition 3.3.2.
Theorem 8.7.2 Any CM-computable function f (x) can be implemented by a 2CMcomputable function.
2
Corollary 8.7.3 The halting problem HALT-2CM for 2CM programs is undecidable. 2
Exercises
8.1 Show that the function: m(x) = max{y | z. x = cy z} can be computed by a counter
machine program, for any fixed c.
2
8.2 Give a compiling function from WHILE programs to GOTO programs. Illustrate
on a small example.
2
8.3 Give a compiling function from GOTO programs to WHILE programs. Illustrate
on a small example.
2
8.4 Prove Corollary 8.7.3.
References
Other kinds of support of the Church-Turing thesis include works by Gandy [51], Kleene
[100], Minsky [131], Rogers [155], and Shepherdsen and Sturgis [163].
(partly by T. . Mogensen)
9.1
The language WHILE is imperative. This means that WHILE programs have a global
store that they update by means of assignments. In contrast to this, functional languages
do not have a store. Instead they pass values to other functions in calls and receive values
when these functions return a result. This difference is reflected in the syntax of programs
in that a functional language typically has a syntactic category for expessions but, unlike
WHILE, none for commands.
9.1.1
The language F
The language F is a simple first order Lisp-like functional language whose programs have
one recursively defined function of one variable. It resembles language I of Section 4.2
in that data is built by cons from the single atom nil.
Definition 9.1.1 Let F-data = ID. We use the conventions d, e, f, . . . ID. Let X be a
variable. The informal syntax of programs is given by the following grammar:
Program
Expression
3P
3 E,B
::=
::=
|
|
|
|
|
|
E whererec f(X) = B
X
nil
hd E
tl E
cons E1 E2
if E then E1 else E2
f E
E[[X]]Bv
= v
E[[nil]]Bv
= nil
E[[cons E1 E2]]Bv
= (d1 .d2 )
E[[hd E]]Bv
= d1
E[[hd E]]Bv
= nil
E[[tl E]]Bv
= d2
E[[tl E]]Bv
= nil
E[[if E then E1 else E2]]Bv = d
E[[if E then E1 else E2]]Bv = d
E[[f E]]Bv
= w
P[[E0 whererec f(X) = B]]v
if
if
if
if
if
if
if
if
E[[E1]]Bv = d1 , E[[E2]]Bv = d2
E[[E]]Bv = (d1 .d2 )
E[[E]]Bv A
E[[E]]Bv = (d1 .d2 )
E[[E]]Bv A
E[[E]]Bv 6= or nil, E[[E1]]Bv = d
E[[E]]Bv = nil, E[[E2]]Bv = d
E[[E]]Bv = u and E[[B]]Bu = w
= E[[E0]]Bv
Example 9.1.3 The following is a tail recursive version of the reverse program in
F, esentially the imperative program of Example 2.1.4, written in functional style. The
expression reverse (cons X nil) returns the list X reversed. The program does so by
keeping the two variables X and Y from the corresponding WHILE program in packed
together in the single F variable, here called Z. An update of a variable in WHILE is
simulated by a function call in F.
rev (cons Z nil) whererec rev(Z) =
if (hd Z)
then reverse (cons (tl (hd Z)) (cons (hd (hd Z)) (tl Z)) )
else (tl Z)
139
2
9.1.2
The language F+
9.2
In this section we are concerned with the problem of writing interpreters for F in WHILE
or I, and vice versa. One half of this will be left to the Exercises:
Proposition 9.2.1 There exists an interpreter intIF for I written in F.
2
Proof. First we need a way to express F programs as data values. This is done in
Figure 9.2, where var0 , quote0 ,. . . ,doif0 are chosen to be elements of ID, all distinct.
E whererec f(X)=B =
X
d
cons E F
hd E
tl E
if E F G
f(E)
=
=
=
=
=
=
=
(E . B)
0
(var )
0
(quote d)
0
(cons E F)
0
(hd E)
0
(tl E)
0
(if E F G)
0
(call E)
9.3
A commonly used model of computation is the lambda calculus [?], [?]. It is, however,
seldom used in complexity texts as the notion of a computation cost is unclear. This is
both because the number of reduction steps depends heavily on the reduction strategy
used and because the basic reduction step in the lambda calculus -reduction is
considered too complex to be an atomic computation step. We will not investigate these
issues here, as our prime objective is to show that the lambda calculus has the same
computation power as the I language, and hence the same as WHILE.
Expressions in the lambda calculus are either variables, lambda-abstractions or applications:
::= x1 | x2 |
|
xi .
|
We will, for readability, often use (possibly subscripted) single-letter names for the xi .
In the expression x . E, the variable x is bound by the and has scope in the expression
read X;
(* X will be ((E.B).D) where D input
Cd := cons (hd hd X) nil ;
(* Cd top = Expression to be evaluated
B := tl hd X; (* Body of function definition
Vl := tl X;
(* Initial value of simulated X
St := nil;
(* Computation stack
while Cd do STEP;
X := hd St;
write X
141
*)
*)
*)
*)
*)
]
]
]
]
]
]
]
]
]
]
[
[
[
[
[
[
[
[
[
{
(U.(V.Sr)) ]
St ]
(nil.Sr) ]
(D.Sr) ]
[
[
[
St
St
St
(T.Sr)
St
(T.Sr)
St
(U.(T.Sr))
St
(W.Sr)
Cr, cons D St ]
Cr, cons Vl St ]
cons* E dohd0 Cr, St ]
Cr, cons (hd T) Sr]
cons* E dotl0 Cr, St ]
Cr, cons (tl T) Sr ]
cons* E1 E2 docons0 Cr, St]
Cr, cons (cons T U) Sr ]
cons* E docall0 Cr, St ]
Cd := cons* B return0 Cr;
St := cons Vl Sr; Vl := W }
Cd := Cr;
St :=cons U Sr; Vl:= V }
cons* E doif0 F G Cr, St ]
cons G Cr, Sr ]
cons F Cr, Sr ]
= a . b . c . A
= ((AB)...C)
if x 6= y
where z is a fresh variable
-reduction can be performed anywhere in a lambda expression, and we lift the notation
A B to mean that A reduces to B by a single -reduction anywhere in the term A.
The reflexive and transitive closure of is denoted by .
A term is said to be in (full) normal form if it contains no -redexes. A normal form
of a lambda calculus term A is a normal form term B that can be obtained from A by
repeated -reductions (i.e., A B). Not all terms have normal forms.
The Church-Rosser theorem [8] shows that the order of reductions in a certain sense
doesnt matter: if, starting with the same lambda expression, two sequences of reductions
lead to normal forms (lambda expressions where no -reductions can be performed), then
the same normal form will be reached by both reduction sequences. Hence, it makes sense
to talk about the normal form of a term, if any such exist.
Theorem 9.3.2 If A B and A C then there exists a term D such that B D
and C D.
A
B
R
C
R
D
143
The theorem actually proves more than the uniqueness of normal forms: If, starting
with a lambda term A, you perform two different reduction sequences and obtain the
terms B and C (which need not be in normal form), there exist a term D (also not
necessarily in normal form) such that both B and C reduce to D (using different reduction
sequences). A diagram:
Theorem 9.3.3 If lambda-expression M has a normal form at all, it can be obtained
by repeatedly locating and reducing the leftmost innermost redex (x . A) B.
A remark on efficiency. This stragey does not seem especially efficient, as it involves
repeatedly scanning the current -expression in search of redexes. Further, the order of
reduction can have significant impact on the number of reductions required to reach the
normal form, and indeed even on whether any normal form at all is reached. (Reductions
can continue indefinitely without terminating.)
Practical implementations of functional languages based on the lamda-calculus use
one of the following two alternatives to the above:
Definition 9.3.4 The call-by-value reduction strategy reduces an application M N as
follows:
1. First, reduce the operator M to a normal form. Stop if this is not of form x .P
2. Second, reduce the operand N to a normal form, call it N 0 .
3. Then -reduce ( x .P )N 0 .
Definition 9.3.5 The call-by-name reduction strategy reduces an application M N as
follows:
1. First, reduce the operator M to a normal form. Stop if this is not of form x .P
2. Then -reduce ( x .P )N .
Call-by-value is used by the languages LISP, SCHEME and ML. The HASKELL language
uses a variant on call-by-name known as call-by-need, or lazy evaluation.
Remark. If expression reduction terminates under call-by-value, then it also terminates
under call-by-name reductions.
Various notions of what is considered a value in lambda calculus have been used.
As we want to represent arbitrary tree-structures, we have chosen to let values be normal form lambda expressions. For a discussion on other choices, including weak head
normal forms and head normal forms and on the mathematical models of computation
these imply, see [8]. This also discusses properties of different strategies for the order of
reduction.
We define the language LAMBDA as having closed lambda calculus expressions as programs. Values in input and output are lambda calculus expressions in normal form. Running a program P with inputs X1 , . . . , Xn is done by building the application P X1 . . . Xn
and reducing this to normal form. The output of the program is the normal form obtained
this way.
9.4
9.4.1
We will now write an interpreter for I in LAMBDA. To do so we must first decide how to
represent syntax of I programs and values in the domain ID.
Values can be represented in many ways in the lambda calculus. We have chosen the
representation strategies of [133] and [132]. Values in ID are represented by a representation function d eID : ID , defined as follows:
dnileID
d(v.w)eID
= ab . a
= ab . b dveID dweID
145
::=
::=
|
|
Expression 3 E, F ::= X
|
|
|
|
Program
Command
read X; C; write X
X := E
C ; D
while E do C
nil
cons E F
hd E
tl E
X
E
nil
E
cons E F
E
hd E
E
tl E
=
=
=
=
=
x
dnileID
E E
cE F
E
hE
E
tE
Note that E uses the variables x, c, h and t bound by the in xcht . E . These variables
tag expressions with their kinds (x for X, c for cons, h for hd and t for tl). Note that
C
E
nil has no tag. We represent commands in a similar way: dCeC = xsw . C , where C is
defined by
C
X := E
C
C ; D
C
while E do C
= x dEeE
C C
= sC D
C
= w dEeE C
read X;
X := cons X nil;
while hd X do
X := cons (tl (hd X)) (cons (hd (hd X)) (tl X));
X := tl X;
write X
Above: an I program for reversing a list. Below: the encoding of the reverse program.
xsw .
s (x (xcht . c x (ab . a)))
(s (w (xcht . h x)
(x (xcht . c (t (h x)) (c (h (h x)) (t x)))))
x (xcht . t x))
The layout of the encoding reflects the layout of the I program.
Figure 9.4: An I program and its encoding
9.4.2
We will first construct a LAMBDA expression eval, which will take the representation of
an expression and the value of X and return the value of the expression. Using this, we
will construct a LAMBDA expression do, which will take the representation of a command
and the value of X and return the value of X after the command has been executed. This
will be our interpreter of I in LAMBDA.
Running an I program P on input v is done by running the LAMBDA program do on
inputs dPeP and dveID . This will yield a normal form dweID if and only if P given the
input v yields an output w.
Evaluation of expressions We now define the LAMBDA expression eval for evaluating
expressions:
eval = Ex . E x
(htab . b h t)
(d . d d (ht . h))
(d . d d (ht . t))
147
The idea is that when evaluating an expression every occurrence of the tag (variable) x
in the representation of an expression will be replaced by the current value of X, every
tag c by a function htab . b h t that can cons two arguments, every h by a function
that takes the head of a pair (and returns nil if given nil as argument) and every t by a
similar function returning the tail of a pair. Note that since the constant nil is untagged
in the representation of expressions, it is returned unchanged by eval.
Execution of commands We use a similar idea for the do function. The recursive
nature of the while command is a bit complex, though. We first present do, then explain
the details
do = C . C eval
(cdx . d (c x))
(Ec . W W)
where
W = wx . eval E x (w . x) (htw . w w (c x)) w
Occurrences of eval and W represent that the entire expressions these names represent
should be inserted in place of the names. This is just substitution of text. This means
that the variables E and c used in W are bound by the in the last line of the definition
of do (the line that uses W).
Similar to before, interpretation of a command C will replace occurrences of the
tagging variables x, s and w in the representation of C by functions that do the right
thing with the commands. For the assignment command X := E, this is eval, which
evaluates E and returns its value, which becomes the new value of X. For composite
commands C ; D, the function cdx . d (c x) will first execute C (by its execution function
c) and pass the new value of X to the execution function d of D, which then produces the
final value of X.
Execution of the while statement is more tricky. The function Ec . W W takes the
condition expression E of the while command while E do C and the execution function
c of the body C and then self-applies W. This yields
x . eval E x (w . x) (htw . w w (c x)) W
When this is given a value x0 of X, eval E x0 evaluates E with respect to this value. If
the result of the evaluation is nil, the situation becomes
(ab . a) (w . x0 ) (htw . w w (c x0 )) W
Termination and time complexity Readers familiar with the lambda calculus might
wonder why we havent used a fixed-point operator to model the while loop. The reason
is that the present method is more robust with respect to changes in reduction order: If an
I program P terminates on some input v, then (do dP eP dveID ) terminates regardless of
the order of reduction used to reduce lambda expressions. For certain reduction strategies
(including call-by-value), the number of -reductions required to reach the normal form
is proportional to the number of primitive operations performed by running P (as an I
program).
9.4.3
149
In our interpretation of the lambda calculus we are interested in reduction to normal form.
This means that we cant use the simple abstract machines for reducing lambda-terms to
weak head normal form (WHNF). Instead we will develop an abstract machine that does
a one-step parallel reduction (reduces all redexes, even under lambdas) and iterate this
until no redexes are left. Parallel reduction with systematic renaming of bound variables
can be expressed by the function R[[ ]] : Env IN , where Env : V ar
is a mapping from variables to LAMBDA terms. The number n IN is used for renaming
variables: A lambda at nesting level n will use the name xn . To obtain a normal form,
R[[ ]] must be applied repeatedly until no redexes remain, i.e., when the second rule isnt
used anywhere in the term.
R[[x]]n
R[[(x . e1 ) e2 ]]n
R[[e1 e2 ]]n
R[[x . e]]n
= x
= R[[e1 ]][x := R[[e2 ]]n]n
= (R[[e1 ]]n) (R[[e2 ]]n)
if e1 6= x . e0
= xn . (R[[e]][x := xn ](n + 1))
The notation (R[[e1 ]]n) (R[[e2 ]]n) indicates that an application expression is built from
the components R[[e1 ]]n and R[[e2 ]]n. Likewise, xn . (R[[e]][x := xn ](n + 1)) indicates
building of an abstraction expression. These should not be confused with semantic application and abstraction as used in denotational semantics.
With a suitable representation of syntax, numbers and environments, we can implement R[[ ]] in a language F+, which is the functional language F extended with multiple
functions and multiple parameters as suggested in exercise ?? and with a rewrite command similar to the one used for the while language. The extensions do not add power
to the language, which can be shown by providing a translation from F+ to F. We will
not do so here, though.
We will represent numbers as described in section ??, hence 0 = nil, 1 =
(nil.nil), 2 =(nil.(nil.nil)) etc. We represent the variable xi by the number
i. We represent terms by pairs of tags (numbers) and the components of the term:
bxi cE
bEFcE
bxi .EcE
= (0.i)
= (1.(bEcE .bFcE ))
= (2.(i.bEcE ))
normalize P whererec
normalize(P) =
normalize2(r(P,nil,0))
normalize2(Q) =
if tl Q then normalize(hd Q) else hd Q
r(E,R,N) =
rewrite [ E ] by
[ (0.X)
] => [ cons lookup(X,R) 0 ]
[ (1.((2.(X.E1)).E2)) ]
=> [ cons (hd r(E1,cons (cons X (hd r(E2,R,N))) R,N)) 1 ]
[ (1.(E1.E2)) ] => [ aux1(r(E1,R,N), r(E2,R,N)) [
[ (2.(X.E1)) ] => [ aux2(r(E1,cons (cons X N) R,cons nil N), N) ]
aux1(v,w) = cons (cons 1 (cons (hd v) (hd w)))
(if tl v then 1 else tl w)
aux2(v,N) = cons (cons 2 (cons N (hd v))) (tl v)
lookup(X,R) =
if equal(X,hd (hd R)) then tl (hd R) else lookup(X,tl R)
The functions normalize and normalize2 make up the main loop of the interpreter,
which call r until no further reduction is done. r implements the R[[ ]] function. In
addition to returning the reduced term, r also returns an indication of whether any
reduction occurs. This is done by pairing the reduced term with 1 if any reduction is
done and 0 if no reduction occurs. A few auxiliary functions are used to propagate this
information. The four cases in the case expression correspond to the four equations of
R[[ ]]. lookup fetches the value of a variable in an environment.
Time complexity While this interpreter is guaranteed to find (the representation of)
the normal form of a LAMBDA expression if one exists, we cannot say much about the
complexity of this: An arbitrary amount of work can be done in the interpreter between
each -reduction because we may need to traverse a large term to find a single -redex.
151
Also, just counting -reductions isnt a very precise measure of complexity of reducing
lambda terms, as substitution as defined in definition 9.3.1 can take time proportional
to the size of term. Cost measures for the lambda calculus not based on the number of
-reductions are discussed in [103] and [134].
Exercises
9.1 Consider an expanded version F0 of the language F, where programs contain several
functions, each with a single argument. Thus the syntax of programs should be
E whererec f1(X) = E1,..., fn(X) = En
Note that any function may call any other function, or itself.
Give a semantics similar to Figure 9.1.
9.2 Show how any F0 program can be translated into an equivalent F program.
9.3 Define a version F+ of the language F where programs contain several functions each
of which have several arguments. That is, the syntax of programs should be
E whererec f1(X1...Xk) = E1,..., fn(Y1,...,Ym) = En
Show how any F+ program can be translated into an equivalent F program.
9.4 * Prove Proposition 9.2.1. Hint: the pattern used in Section 4.2 will make this
easier: first, construct the needed interpreter using as many functions and variables as
is convenient. Then use the result of Exercise ??.
2
9.5 Find a lambda term , such that , i.e., such that it reduces to itself in one
step.
2
9.6 Find a family of lambda terms i , i IN such that i i+1 and such that
i = j i = j, i.e., they are all different.
2
9.7 * A lambda term is called linear if every variable in the term occurs exactly twice:
Once where it is bound and once inside the scope of that binding. Show that all linear
lambda terms have a normal form. Hint: Show that a -reduction strictly reduces the
size of a linear term.
2
References
The language F was introduced in [85]. It is very similar to first-order LISP or Scheme,
for example see [41]. The lambda calculus has long been investigated as a branch of pure
mathematics, and is now enjoying a renaissance due to its many applications within functional programming. Important works on this include those by Barendregt [8], Church
[24], and Turing [170].
10
We have seen earlier that there are problems that cannot be solved by programs in
WHILE or any other of the computation models we have considered. By the ChurchTuring thesis, these problems cannot be solved by any notion of effective procedures at
all.
Until now, the unsolvable problems we have considered all concern program properties: first, the Busy Beaver and halting problems, subsequently extended to all non-trivial
extensional program properties in Rices theorem. In this chapter we prove unsolvable
some other problems: Posts correspondence problem, and several problems concerning
context-free grammars (emptiness of intersection, ambiguity, and exhaustiveness). After
an introduction, each of the remaining sections is each devoted to one of these problems.
10.1
We have argued, we hope convincingly, for two points, one formal and one informal:
1. That the halting problem for WHILE programs is not decidable by any WHILE program.
2. That decidability of membership in a set A by any intuitively effective computing
device is exactly equivalent to decidability of membership in A by WHILE programs.
Point 1 is analogous to the classical impossibility proofs e.g. that the circle cannot be
squared using tools consisting of a ruler and a compass. It asserts that one particular
problem, the halting problem, cannot be solved be means of any of a powerful class of
tools: WHILE programs.
Point 2 is a version of the Church-Turing thesis. It cannot be proven, as it equates
an intuitive concept with a formal one. On the other hand it is widely believed, and
we have given evidence for it by showing a variety of different computing engines to be
equivalent.
If we assume the validity of point 2, then point 1 comes to carry much more significance. In particular, it implies that the halting problem for WHILE programs is not
decidable by any intuitively effective computing device whatsoever.
The halting problem is not completely unnatural in computer science, since an operating system designer could have a lively interest in knowing whether the programs to be
153
executed will run the risk of nontermination. Such knowledge about program behavior
in general is, alas, doomed to failure by the undecidability of the halting problem, and
Rices Theorem (see Theorem 5.4.2 and Exercise 5.5.)
The arguments used to prove point 1 use only accepted mathematical reasoning methods (even though the argument is subtle). Nonetheless, the halting problem for WHILE
programs is not a natural problem, as it is hard to imagine a daily mathematical context
in which one would want to solve it if one were not a computer scientist working in
operating systems or computability theory.
This discussion motivates the desire to see whether there also exist natural but undecidable problems. In this chapter we will present some simple problems that on the
surface seem to have nothing to do with Turing machines or WHILE programs, but that
are undecidable since if they were decidable, then one could also decide the halting problem. More generally, this technique is called reduction; a common definition in terms of
set membership problems is seen below.
In part III we present yet another undecidable problem, concerning diophantine equations (whether a polynomial equation possesses integer roots), that was among Hilberts
list of problems, posed in 1900. It was finally shown to be undecidable, but only in 1970
after years of effort by many mathematicians.
'
#
A
X $f
B
#
f
x
"!
- f (x)
"!
- f (x0 )
x
&
- Y
'
155
&
10.2
10.2.1
10.2.2
A
aAa
aacaa
Theorem 10.2.1 The following problem DERIV is undecidable: given a string rewriting
(or semi-Thue) system R over alphabet and two strings r, s , to decide whether or
not r s.
Proof. This is shown by reduction from the the halting problem for two-counter machines:
HALT-2CM DERIV. Recall that the instructions of a two-counter machine program
rec
We use reduction, showing that if the derivability problem were decidable, then the
halting problem for two-counter machines would also be decidable.
Suppose we are given a two-counter program p = L1 : I1 Lm : Im with variables X
(for input) and Y (initially zero), and natural number x as input. Begin by constructing,
from program p and input x, the string rewriting system R of Figure 10.2.
We will show that (p, x) HALT-2CM if and only if R DERIV.
157
#Cm #
::=
::=
::=
::=
#1 L1 #
Lm+1
Lm+1
Form of instruction L`
L` = X:=X+1
L` = X:=X-1
L` = X:=X-1
L` = if X=0 goto `0 else
L` = if X=0 goto `0 else
L` = Y:=Y+1
L` = Y:=Y-1
L` = Y:=Y-1
L` = if Y=0 goto `0 else
L` = if Y=0 goto `0 else
Case
`00
`00
X= 0
X6= 0
X= 0
X6= 0
`00
`00
Y= 0
Y6= 0
Y= 0
Y6= 0
Common rules:
Rules for instructions:
S
::= #11A 1A ::= 1B
C ::= D1
F1
::=
F
#A ::= #F D ::= E1
1F
::=
F
1B ::= C
1E ::= 1B
#F# ::=
#B ::= #C #E ::= #F
10.3
=
=
(a)(b)(ca)(a)(abc) = abcaaabc
(ab)(ca)(a)(ab)(c) = v1 v2 v3 v1 v4
On the other hand, one can verify that the sequence given by (a, ab), (b, ca), (ca, a),
(ab, cc) has no solution sequence at all.
A notational convenience: we will write ~ in place of index sequence i1 , . . . , im , and
u~ and v~ for (respectively) ui1 ui2 . . . , uim and vi1 vi2 . . . , vim . Given this the PCP can be
restated simply as: does u~ = v~ for some ~?
If u~ z = v~, then we call z the remainder of v~ over u~.
Theorem 10.3.2 The Post correspondence problem is undecidable.
undecidable. Afterwards, we show that DERIV PCP, and so the general PCP is also
rec
undecidable.
Although the constructions involved are simple, careful reasoning is needed to prove
that they behave as desired.
Index i
1
2
3
4
5
ui
#
abcba##
A
A
A
vi
#A#
#
aAa
bAb
c
Index i
6
7
8
9
10
ui
A
a
b
c
#
159
vi
A
a
b
c
#
Suppose we are given a string rewriting system R = {(u, v), (u0 , v 0 ), . . .} of pairs of
strings where each u, v , and strings r, s in . Its derivability problem is to decide
whether r s.
Construct the following RPCP problem P over alphabet {#} where # is a new
symbol not in ; and (u1 , v1 ) = (#, #r#), and (u2 , v2 ) = (s##, #):
P = {(#, #r#), (s##, #)} R {(a, a) | a or a = #}
| {z } | {z }
i=1
i=2
To distinguish the two sets of pairs R and P , we will use production notation u ::= v for
R instead of writing (u, v) R. Figure 10.4 shows the result of applying this construction
to the rewriting system with = {A, a, b, c}, r = A, and s = abcba, and rewrite rule set
R = {A ::= a A a, A ::= b A b, A ::= c}
The derivation A aAa abAba abcba is modelled in Figure 10.4 by the sequence
of pairs 1, 3, 10, 7, 4, 7, 10, 7, 8, 5, 8, 7, 10, 2 with
u1 u3 u10 u7 u4 u7 u10 u7 u8 u5 u8 u7 u10 u2
# A # a A a # a b A b a #
#A# aAa # a bAb a # a b c b
v1 v3 v10 v7 v4 v7 v10 v7 v8 v5 v8 v7 v10 v2
abcba##
a # #
=
=
=
r t
u1~ t# = v1~
u1~ t2 #t1 = v1~
Proof.
I implies II. We show by induction on m that r m t implies u1~ t# = v1~ for some
internal ~. The base case with m = 0 is r r, immediate with ~ = since (u1 , v1 ) =
(#, #r#).
Now assume r m+1 t, so r m xgy and xgy xhy = t for some g ::= h R. By
induction, u1~ x g y # = v1~ for some internal ~. Let x = a1 . . . ad , y = b1 . . . be where each
aj , bk .
The remainder of v1~ over u1~ is xgy#; so to extend this partial solution, at least
enough pairs (u, v) from P must be added to extend u1~ by xgy. It is easy to see that
u1~j x h y # = u1~ x g y # x h y # = v1~ x h y # = v1~j
by an appropriate index sequence extending the u and v strings:
~j =~ j1 . . . jd p k1 . . . ke q
where indices j1 . . . jd add pairs (a, a) that in effect copy x, index p of (g, h) adds g to the
u string and h to the v string, indices k1 . . . kq copy y by adding pairs (b, b), and index q
of pair (#, #) adds the final # to both.
II implies III.
III implies I is proven by induction, with inductive hypothesis IH(~) : for any t1 , t2
, if u1~ t2 #t1 = v1~ then r t1 t2 . Clearly this and t = t1 t2 imply I. The base case is
~ = , so we must show u1 t2 #t1 = v1 implies r t1 t2 . But
#t2 #t1 = u1 t2 #t1 = v1 = #r#
can only hold if t2 = r and t1 = . Thus t = t1 t2 = r, so r t is trivially true.
Inductively, suppose IH(~) holds. Consider internal index sequence ~j. We analyse
by cases over the pairs (uj , vj ) P .
161
Index i
1
2
3
4
5
6
7
8
9
10
ui
[#
abc ba# # ]
A
A
A
A
a
b
c
#
vi
[ # A #
#]
aAa
bAb
c
A
a
b
c
#
implies u~k s ## = v~k # (and u~j = v~j ). Thus u~k s # = v~k , which by Lemma 10.3.4 implies
r s.
2
Proof of Theorem 10.3.2. We actually show DERIV PCP, by modifying the construcrec
tion of Lemma 10.3.3. Suppose we are given a string rewriting system R over alphabet
, and r, s .
Let new , [, ] be new padding and grouping characters not in . For any x =
a1 a2 . . . an with each ai , define
x = a1 a2 . . . an and
x = a1 a2 . . . an
10.4
163
Theorem 10.4.1 The following problem is undecidable: given two context-free grammars Gi = (Ni , Ti , Pi , Si ) for i = 1, 2, to decide whether or not L(G1 ) L(G2 ) = .
Proof. This is shown by reduction from the PCP. Assume given a set of pairs
(u1 , v1 ), (u2 , v2 ), . . . , (un , vn ) in . Assuming disjointness of all the involved symbols and alphabets4 , we construct from this the two grammars, with N1 = {S1 , E},
N2 = {S2 , F }, T1 = T2 = {1, 2, . . . , n, } and production sets
P1 = {S1 ::= 1Eu1 | 2Eu2 | . . . | nEun } {E ::= | 1Eu1 | 2Eu2 | . . . | nEun }
P2 = {S2 ::= 1F v1 | 2F v2 | . . . | nF vn } {F ::= | 1F v1 | 2F v2 | . . . | nF vn }
Clearly S1 generates all strings of form im . . . i2 i1 ui1 ui2 . . . , uim , and S2 generates all of
form im . . . i2 i1 vi1 vi2 . . . , vim . Thus L(G1 ) L(G2 ) 6= if and only if there there exists
an index sequence i1 , . . . , im such that ui1 ui2 . . . , uim = vi1 vi2 . . . , vim . If it were possible
to decide emptiness of L(G1 ) L(G2 ) we could also decide the PCP, and so the halting
problem for two-counter machines. But this, as we know, is undecidable.
2
Theorem 10.4.2 The following problem CFAMB is undecidable: given a context-free
grammar G = (N, T, P, S), to decide whether or not G is ambiguous5 .
Proof. This is shown by reduction from the PCP. Given a set of correspondence pairs
(u1 , v1 ), . . . , (un , vn ) over alphabet , construct from this the grammar G = (N, , P, S),
with N = {S, S1 , E, S2 , F } and production set P as follows
S
S1
E
S2
F
::=
::=
::=
::=
::=
S 1 | S2
1Eu1 | 2Eu2 | . . . | nEun
| 1Eu1 | 2Eu2 | . . . | nEun
1F v1 | 2F v2 | . . . | nF vn
| 1F v1 | 2F v2 | . . . | nF vn
Clearly S1 , S2 derive just the same strings they did in G1 and G2 . Thus L(G1 )L(G2 ) 6=
if and only if the same string can be derived from both S1 and S2 . But this is true
4 This
5 See
if and only if G is ambiguous (all derivations are necessarily left-most since at most
one nonterminal is involved). As a consequence, decidability of ambiguity would imply
decidability of context-free interesection, in conflict with the preceding theorem.
2
Lemma 10.4.3 Given a sequence of strings U = (u1 , u2 , . . . , un ) over alphabet , the
following set is generated by some context-free grammar GU = (NU , T, PU , SU ) where
T = {1, 2, . . . , n, } :
{im . . . i2 i1 u | u 6= ui1 ui2 . . . uim }
Theorem 10.4.4 The following problem CFALL is undecidable: given a context-free
grammar G = (N, T, P, S), to decide whether L(G) = T .
Proof. Again we begin with the PCP. Given a sequence of pairs (u1 , v1 ), . . . , (un , vn ) over
alphabet , construct from this three context-free grammars
1. GU as by the preceding lemma with U = (u1 , u2 , . . . , un )
2. GV as by the preceding lemma with V = (v1 , v2 , . . . , vn ).
3. G0 with L(G0 ) = {x T | x is not of the form im . . . i2 i1 uj1 uj2 . . . ujk }
It is easy (and an exercise) to see that G0 exists, and in fact can be a regular grammar.
It is also easy to construct from these a single context-free grammar G = (N, T, P, S) with
L(G) = L(GU ) L(GV ) L(G0 ).
Claim: L(G) 6= T if and only if the PCP has a solution. To see this, suppose x T but x
/ L(G) = L(GU ) L(GV ) L(G0 ). Then x T \ L(G0 ) implies
x has the form x = im . . . i2 i1 uj1 uj2 . . . ujk . Further, x T \ L(GU ) implies x =
im . . . i2 i1 ui1 ui2 . . . uim , and x T \ L(GV ) implies x = im . . . i2 i1 vi1 vi2 . . . vim . Thus
ui1 ui2 . . . uim = vi1 vi2 . . . vim , so the PCP has a solution. Similarly, if the PCP has an
index sequence i1 , . . . , im as solution, then
x = im . . . i2 i1 vi1 vi2 . . . vim
/ L(G)
Thus L(G) 6= T if and only if the PCP has a solution, which is undecidable.
Exercises
10.1 Prove the assertion of Theorem 10.2.1.
165
10.2 Does the PCP with pairs (10, 101), (10, 011), (011, 11), (101, 011) have a solution?
2
10.3 Prove that the following problem is decidable: given a sequence of pairs (u1 , v1 ),
(u2 , v2 ), . . . , (uk , vk ) of nonempty strings over a finite alphabet , the problem is to
determine whether or not there exist two index sequences i1 , . . . , im and j1 , . . . , jn such
that
ui1 ui2 . . . uim = vj1 vj2 . . . vjn
Hint: the sets of left and right sides can be described by regular expressions.
10.4 * Complete the proof of Theorem 10.3.2 by showing that P has a rooted solution if
and only if P 0 has an unrestricted solution. Prove both the if and the only if parts.
2
10.5 Prove Lemma 10.4.3: construct the required context-free grammar GU .
10.6 Complete the proof of Theorem 10.4.4 (for example by showing that the set involved
is recognizable by a finite automaton).
2
10.7 Prove that it is undecidable, given two context-free grammars G, G0 , to determine
whether L(G) L(G0 ).
2
References
Posts correspondence problem was first formulated and shown unsolvable in [144].
Context-free ambiguity and some related problems were proven undecidable in [7]. The
book by Lewis and Papadimitriou, and the one by Rozenberg and Salomaa, contain a
broad selection of natural undecidable problems [108, 156].
Part III
Other Aspects of Computability Theory
11
(by M. H. Srensen)
11.1
Introduction
In the introduction to this book we mentioned Hilberts famous list of open problems at
the International Congress of Mathematicians in 1900. The tenth problem is stated as
follows:
Given a Diophantine equation with any number of unknown quantities and
with rational integral numerical coefficients: to devise a process according
to which it can be determined by a finite number of operations whether the
equation is solvable in rational integers.
In modern terms, the problem is to give an algorithm which for a polynomial equation
with integer coefficients can decide whether or not it has a solution in integers. An
equation of this form is called Diophantine, after the Greek mathematician Diophantus
from the third century, who was interested in such equations.
Hilberts Tenth problem is an example of a problem which is of independent interest in
another field than computability theory, namely number theory. For instance, Fermats
famous Last Theorem states that the equation
(p + 1)n+3 + (q + 1)n+3 = (r + 1)n+3
has no solution in natural numbers for p, q, r, n. Whether this is true has long been one
of the most famous open problems in number theory.1 For each fixed n, Fermats Last
Theorem is an instance of Hilberts Tenth problem, provided we restrict solutions to
the natural numbersthis restriction is not essential as we shall see shortly. Thus, an
algorithm deciding for a Diophantine equation whether any solution exists in the natural
numbers would prove or disprove Fermats Last Theorem for each fixed n. Conversely,
it has later been realized that unsolvability of Hilberts Tenth problem would imply
unsolvability of many other decision problems in number theory and analysis.
1 Wiles has recently given a proof of Fermats last Theorem which seems to be widely accepted, see
Annals of Mathematics, May 1995.
169
171
remained an open problem, believed by many to be false, until it was proved ten years
later in 1970 by the young Russian mathematician Matiyasevich [116].
In this chapter we give an account of the unsolvability of Hilberts Tenth problem,
leaving out the details of Matiyasevichs result. The first section introduces exponential Diophantine equations. The second section develops certain tools that are used in
the third section to prove the Davis-Putnam-Robinson Theorem. The fourth section
considers Hilberts Tenth problem.
11.2
Example 11.2.2
1. The function
f (x, y, z) = 3x + 5xy 71z 5
is a polynomial, where we write z 5 instead of z z z z z. Therefore,
3x + 5xy 71z 5 = 0
is a Diophantine equation, and the set of all natural numbers x such that there
exists y, z with 3x + 5xy 71z 5 = 0 is a Diophantine set.
2. The function
f (x, y) = x 2y
is a polynomial, so
x 2y = 0
is a Diophantine equation. Therefore the set of all even numbers is Diophantine;
indeed, it is the set of all natural numbers x such that there exists a y with x 2y =
0, i.e., x = 2y.
3. The function
f (p, q, r, n) = (p + 1)n+3 + (q + 1)n+3 (r + 1)n+3
is an exponential polynomial. Hence,
(p + 1)n+3 + (q + 1)n+3 (r + 1)n+3 = 0
is an exponential Diophantine equation. Therefore the set of all x, y, z > 0 such
that for some k 3, xk + y k = z k , is exponential Diophantine.
2
In the introduction Diophantine equations had integer solutions, but in the preceding definition their solutions were natural numbers. However, the problem of deciding whether
an arbitrary (exponential) Diophantine equation has solution in integers is equivalent to
the problem of deciding whether an arbitrary (exponential) Diophantine equation has
solution in natural numbers.
To reduce the former problem to the latter, note that there is a solution in integers
to the equation
f (x1 , . . . , xn ) = 0
if and only if there is a solution in natural numbers to the equation
f (p1 q1 , . . . , pn qn ) = 0
173
For the opposite reduction, recall that any natural number can be written as the sum
of four squares (see, e.g., the appendix to [123]). Hence, there is a solution in natural
numbers to the equation
f (x1 , . . . , xn ) = 0
if and only if there is a solution in integers to the equation
f (p21 + q12 + r12 + s21 , . . . , p2n + qn2 + rn2 + s2n ) = 0
In conclusion, we have simplified the problem inessentially by considering only natural
number solutions.
In a similar vein, we may allow equations of form
f (x1 , . . . , xn ) = g(y1 , . . . , ym )
where g(y1 , . . . , ym ) is not simply 0, since this is equivalent to
f (x1 , . . . , xn ) g(y1 , . . . , ym ) = 0
We may allow conjunctions of equations
f (x1 , . . . , xn ) = 0 g(y1 , . . . , ym ) = 0
since this conjunction of equations has a solution if and only if there is a solution to the
ordinary equation
f (x1 , . . . , xn ) f (x1 , . . . , xn ) + g(y1 , . . . , ym ) g(y1 , . . . , ym ) = 0
Similarly we may allow disjunctions of equations
f (x1 , . . . , xn ) = 0 g(y1 , . . . , ym ) = 0
since this disjunction of equations has a solution if and only if there is a solution to the
ordinary equation
f (x1 , . . . , xn ) g(y1 , . . . , ym ) = 0
11.3
There are several such techniques available. The best known, first employed by
Godel [54], uses the Chinese Remainder Theorem. In the present setting this technique
has the disadvantage that it makes it rather hard to express certain necessary operations as exponential Diophantine equations. Therefore another technique was invented
by Matiyasevich [118], which we present in this section.
Definition 11.3.1 For a, b IN , let
a=
n
X
ai 2i (0 ai 1),
b=
i=0
n
X
bi 2i (0 bi 1)
i=0
175
(B + 1) =
n
X
n
k=0
Bk
Note that nk is the kth digit of (B + 1)n written in base B, provided nk < B for all
k. This, in turn, holds if B > 2n (see the exercises). Hence, m = nk is exponential
Diophantine:
n
m=
B : B = 2n + 1 m = [(B + 1)n ]B
2
k
k
The second lemma necessary to prove that the bitwise less-than relation is exponential
Diophantine involves a bit of elementary number theory, which has been banished to the
exercises.
Lemma 11.3.3 n k nk is odd
2
If a b then a is also digitwise less than b using any other base B, provided the base is
a power of 2. The converse does not generally hold; it may be that B is a power of 2,
a is digitwise less than b in base B, and yet a 6 b. However, if B is a power of 2, a is
digitwise less than b in base B, and all the digits of b in B are 0 or 1, then also a b.
All this is perhaps best explained with an example.
Example 11.3.5 For instance, 34 43, as can be seen from the first two rows in Figure 11.1. Moreover, 34 is also digitwise less than 43 with base 4, as can be seen from the
last two rows in the figure. The reason is that every group of two coefficients x2i+1 +y 2i
in the base 2 representation is packed into a single coefficient x 2 + y in the base 4 representation. If, in the base 2 representation, all bits in a number are less than or equal
to those in another number, then the same holds in the base 4 representation; that is, if
x1 x2 and y1 y2 then x1 2 + y1 x2 2 + y2 .
43
34
=
=
1 25 + 0 24
1 25 + 0 24
+
+
1 23 + 0 22
0 23 + 0 22
+
+
1 21 + 1 20
1 21 + 0 20
43
34
=
=
2 42
2 42
+
+
2 41
0 41
+
+
3 40
2 40
=
=
1 25 + 0 24
0 25 + 1 24
+
+
1 23 + 0 22
0 23 + 0 22
+
+
1 21 + 1 20
1 21 + 0 20
43
18
=
=
2 42
1 42
+
+
2 41
0 41
+
+
3 40
2 40
11.4
In this section we show that any recursively enumerable set A is exponential Diophantine.
As mentioned in Section 11.1, the result is due to Davis, Putnam, and Robinson [38].
The present proof is due to Jones and Matiyasevich [74], and is somewhat more in the
spirit of this book than the original recursion-theoretic proof.
Any recursively enumerable set A can be represented by a counter machine p in the
sense that that x A iff [[p]](x) terminates. This follows from the fact that counter machines can express all partial recursive functions. The idea, then, is to formalize the execution of any counter machine p by an exponential Diophantine equation f (x, z1 , . . . , zk ) = 0
such that [[p]](x) terminates iff f (x, z1 , . . . , zk ) = 0 has a solution.
177
Before proceeding with the general construction it will be useful to review an example,
taken from [156], which illustrates how this is done.
Example 11.4.1 Consider the following counter machine p:
I1 :
I2 :
I3 :
I4 :
ifX1 = 0goto4;
X1 := X1 1;
ifX2 = 0goto1;
stop
6
0
0
1
0
0
0
5
0
0
0
0
1
0
4
1
0
0
1
0
0
3
1
0
1
0
0
0
2
1
0
0
0
1
0
1
2
0
0
1
0
0
0
2
0
1
0
0
0
=
=
=
=
=
=
=
t
x1,t
x2,t
i1,t
i2,t
i3,t
i4,t
where y = 7 is the length of the computation and b is a number larger than all the
numbers in the matrix. With this idea the whole matrix becomes the system of equations
in Figure 11.4.
0 b7 +
0 b7 +
0 b7 +
0 b7 +
0 b7 +
1 b7 +
0 b6 +
0 b6 +
1 b6 +
0 b6 +
0 b6 +
0 b6 +
0 b5 +
0 b5 +
0 b5 +
0 b5 +
1 b5 +
0 b5 +
1 b4 +
0 b4 +
0 b4 +
1 b4 +
0 b4 +
0 b4 +
1 b3 +
0 b3 +
1 b3 +
0 b3 +
0 b3 +
0 b3 +
1 b2 +
0 b2 +
0 b2 +
0 b2 +
1 b2 +
0 b2 +
2 b1 +
0 b1 +
0 b1 +
1 b1 +
0 b1 +
0 b1 +
2 b0
0 b0
1 b0
0 b0
0 b0
0 b0
=
=
=
=
=
=
x1
x2
i1
i2
i3
i4
179
Proof. Let A be any recursively enumerable set and p be a counter machine such that
x A iff [[p]](x) terminates. Suppose p has form
p = I1 . . . In (with counters X1 , . . . , Xm )
(11.1)
(11.2)
such that
[[p]](x) terminates (11.2) has a solution
(11.3)
More precisely we derive 12 equation schemes which can be combined into a single conjunction using the technique in Section 11.2.
1. First of all, we need a base b for the representation in Figure 11.4. Recall that b
must be larger than all the coefficients in order for the representation to be correct. Since
the initial value of counter X1 is x and the other counters are initialized to 0, no counter
value can exceed x + y where y is the number of steps of the computation. Therefore,
b = 2x+y+n
(1)
is large enough. We shall need later two additional facts about b, both satisfied by the
above choice: that b is a power of 2 and that b > n.
2. It will be convenient to have a number whose representation in base b is a string
of length y consisting entirely of 1s. This is the number by1 + + b + 1. This number
satisfies the equation
1 + bU = U + by
(2)
and it is the only number satisfying the equation; indeed, if U = (by 1)/(b 1) then
U = by1 + + b + 1.
3. It will be necessary later that the coefficients in Figure 11.4 are all strictly smaller
than b/2. This is enforced by the following equations.
xj (b/2 1)U
(j = 1, . . . , m)
(3)
Indeed, if xj is less than (b/2 1)U bitwise, then the same thing holds digitwise in base
b, since b is a power of 2 (see Example 11.3.5). But the digits in (b/2 1)U in base b are
exactly b/2 1.
4,5. In each computation step of p, exactly one command is executed. This is
expressed by the following equations.
il U
(l = 1, . . . , n)
(4)
U=
n
X
il
(5)
l=1
The first equation states that in the binary representation of the two numbers, all the
coefficients of il are smaller or equal than those of U . Since b is a power of 2, and all the
coefficients of U are 1, this is the same as requiring that in base b, all the coefficients of
il are smaller or equal than those of U , i.e., are 0 or 1 (see Example 11.3.5). That is, in
terms of Figure 11.4, all coefficients in il are 0 or 1.
The second equation similarly expresses the fact that in every il -column in Figure 11.4
there be exactly one coefficient which is 1. For this it is necessary that no carry occur in
the summation, and this is guaranteed by the fact that b > n.
6,7. In any computation with p, the first and last step are to execute command I1
and In , respectively. This is expressed as follows.
1 i1
(6)
in = by1
(7)
The first equation expresses that the rightmost coefficient of i1 in Figure 11.4 is 1. The
second states that the leftmost coefficient in in is 1.
8. After executing a command Il which is either a subtraction or an addition, the
next instruction should be Il+1 . This is expressed as follows.
bil ik + il+1
(8)
The equation states that in Figure 11.4, if the coefficient of bj in il is 1, then the coefficient
of bj+1 should be 1 in il+1 . Note how the multiplication with b represents a move to the
left in Figure 11.4.
9. After executing a conditional Il :if Xj =0 goto k the next instruction should be
either Ik or Il+1 . This is expressed as follows.
bil ik + il+1
(9)
The equation states that in Figure 11.4, if the coefficient of bj in il is 1, and Il is the
command if Xj =0 goto k, then the coefficient of bj+1 should be 1 in il+1 or ik (where
k 6= l + 1 by assumption).
10. Whenever executing a command Il :if Xj =0 goto k, the next command should
be Ik if Xj is 0, and Il+1 otherwise. This is expressed as follows.
bil il+1 + U 2xj
(10)
181
To see this, suppose that Xj = 0 before, and hence also after, step k, i.e.,
xj = . . . + 0 bk+1 + 0 bk + . . .
Then
2xj = . . . + 0 bk+1 + 0 bk + . . .
Here we made use of the fact that all coefficients are smaller than b/2, so that no bit of
the coefficient of bk1 is shifted into the coefficient of bk by the multiplication with 2.
Hence, the subtraction U 2xj looks as in Figure 11.5.
U
2xj
U 2xj
= 1 by1
=
=
+ + 1 bk+1
+ 0 bk+1
+ 1 bk+1
+ 1 bk
+ 0 bk
+
+
+
U =1 by1
2xj =
U 2xj =
+ + 6 1 bk+2
b
+
6 1 bk+1
+
2n bk+1
+ (b 2n) bk+1
+
+
+
b
1 bk
2n bk
+
+
xj = b(xj +
X
lA(j)
il
il )
lS(1)
(j = 2, . . . , m)
(12)
lS(j)
P
Indeed, consider (11). The sum lA(1) il is a number whose base b representation has
1 for every coefficient k where the kth step in the execution of p is X1 := X1 +1. Similarly
with the other sum. (11) now states that if X1 is n before the kthe step, and the
instruction executed in the kth step is an addition, then X1 is n + 1 before the k + 1th
step. For example, consider Figure 11.7.
In this example there is only a single addition to X1 during the whole execution, namely
in step 1, and a single subtraction in step 4. Before step 1, X1 has value x, hence after
it has value x + 1. Similarly with subtractions. This can be expressed by requiring that
P
P
if we add x1 to the sums lA(1) il and lS(1) il and shift the result one position to
the left, then the coefficients should match those in x1 . Note that multiplication with b
does not lead to overflow since it is assumed that all counters are 0 at the end.
Equation (12) is identical to Equation (11) except that the initial contents of xj is 0
rather than x, for j = 2, . . . , m. The whole set of equations is collected in Figure 11.8. It
is now a routine matter to verify that the claim (11.3) is indeed true.
2
Corollary 11.4.3 There is no algorithm that can decide for an exponential Diophantine
equation whether or not is has a solution in natural numbers.
x x
0
x x
x x
183
step
x = x1
P
0
0
0
1
0 =
lA(1) il
P
1
0
0
0
0 =
lS(1) il
P
P
x
x + 1 x + 1 x + 1 x = x1 + lA(1) il lS(1) il
P
P
x
x = x + b(x1 + lA(1) il lS(1) il )
x+1 x+1 x+1
(1)
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
(11)
(12)
b = 2x+y+n
1 + bU = U + by
xj (b/2 1)U
il U
Pn
U = l=1 il
1 i1
in = by1
bil il+1
bil il+1 + U 2xj
bil ik + il+1
P
P
x1 = x + b(x1 + lA(1) il lS(1) il )
P
P
xj = b(xj + lA(j) il lS(j) il )
(j = 1, . . . , m)
(l = 1, . . . , n)
(Il : Xj = Xj 1, Il : Xj = Xj +1)
(Il : ifXj = 0gotok)
(Il : ifXj = 0gotok)
(j = 2, . . . , m)
Proof. Let A IN be a recursive enumerable, non-recursive set (recall that such sets
do exist). By the Davis-Putnam-Robinson Theorem there is an exponential Diophantine
equation f (x, z1 , . . . , zn ) = 0 such that
x A f (x, z1 , . . . , zn ) = 0 has a solution
Since we can construct effectively the equation f (x, z1 , . . . , zn ) = 0 given x it follows that
an algorithm to decide for each x whether f (x, z1 , . . . zn ) has a solution would imply a
decision procedure for A, which is impossible since A is non-recursive.
2
11.5
In this section we briefly show that Hilberts Tenth problem is unsolvable, leaving out
almost all details. As mentioned, the following theorem, due to Matiyasevich [116], was
the final step in solving Hilberts Tenth problem.
Theorem 11.5.1 The relation u = v w is Diophantine.
2
Proof. By the Davis-Putnam-Robinson Theorem, there exists for every recursively enumerable set A an exponential Diophantine equation f (x, z1 , . . . , zn ) = 0 such that
x A z1 , . . . , zn : f (x, z1 , . . . , zn ) = 0
By Matiyasevichs theorem there is a Diophantine equation e(u, v, w, y1 , . . . , ym ) = 0 such
that
u = v w y1 , . . . , ym : e(u, v, w, y1 , . . . , ym ) = 0
Therefore every occurrence in f (x, z1 , . . . , zn ) of tt12 can be replaced by a variable u.
We must then add to the original equation f (x, z1 , . . . , zn ) = 0 the new equations v = t1 ,
w = t2 , and e(u, v, w, y1 , . . . , ym ) = 0. These can all be combined into a single Diophantine
equation using the technique in Section 11.2.
2
The following corollary then shows that Hilberts Tenth problem is unsolvable.
Corollary 11.5.3 There is no algorithm that can decide for a Diophantine equation
whether or not is has a solution in natural numbers.
Proof. Similar to the proof of Corollary 11.4.3 using the preceding corollary.
Exercises
11.1 Show that the non-strict less-than relation a b is Diophantine.
11.2 Show that the set of numbers that are not powers of 2 is Diophantine.
11.3 Show that the set of numbers that are not prime is Diophantine.
185
2
The following is adopted from [156]. For a different proof of Lemma 11.3.3, see [74].
11.5 * Prove that
kn
n
is odd
k
Hint: Prove the assertion for the cases k > n, k = n, and k < n. In the last case proceed
in the following steps.
1. Let m =
define
Pl
i
i=0 mi 2
ONE(m) =
EXP(m) =
n
EXP
= ONE(k) + ONE(n k) ONE(b)
k
Pl
Pl
5. Now let n = i=0 ni 2i and k = i=0 ki 2i . Prove that i : ki ni implies
n
EXP
=0
k
and hence the left-to-right direction in the overall assertion follows.
6. For the right-to-left direction prove that if i : ki > ni then
n
EXP
>0
k
as follows. Let i be the smallest index such that 0 = ni < ki = 1. Let Nj = kj [b
a]2j . Prove that
nj
n (= 0)
Pi l
j+1 nj
= Nj
< (2 =)Ni
Pl
1 + j+1 Nj
l
X
N j nj > 0
j=0
References
As mentioned, Hilberts Tenth Problem was presented at the International Congress of
Mathematicians in 1900. While it was not actually stated during his lecture, it appeared
in the published version, see Reids biography [150].
Several papers by Davis, Putnam, and Robinson were mentioned in Section 11.1.
Another classic recursion-theoretic presentation of the unsolvability of Hilberts Tenth
problem, with a historical appendix and more references, is due to Davis [35].
In Section 11.1 we also mentioned several papers by Matiyasevich. For more references and much more information about all aspects of Hilberts Tenth problem, consult
187
Matiyasevichs book [123]. The book discusses many applications; it infers from unsolvability of Hilberts Tenth problem the unsolvability of several other problems in number
theory and analysis. Its sections with commentaries at the end of each chapter give many
historical details.
In several places we have adopted technical and stylistic improvements from the recent
books by Floyd and Beigel [47] and Rozenberg and Salomaa [156].
12
189
12.1
Language semantics can be defined in (at least) two ways. One way is by Plotkins
structural operational semantics [140] or Kahns similar natural semantics [91]; both are
used by many researchers. By this approach, a language semantics is given by a collection
of inference rules that define how commands are excuted, how expressions are evaluated,
etc.
In an operational semantics a language definition is a set of inference rules and axioms
sufficient to execute programs. An inference rule consists of a set of premises which, if
true, allow one to conclude or deduce a conclusion. An axiom is a special case of an
inference rule one with an empty set of premises. We give some examples now, and a
more general framework later in Section 12.4.
The I semantics defined in Section 2.2 is in essence (though not in apperance) an
operational semantics. For example, the definition of C ` 0 is easy to re-express
using inference rules as in the next section (Exercise 12.2). According to such rules the
meaning of a recursive construction such as a while loop or a recursively defined function
is typically obtained by syntactic unfolding; an example will be seen below.
Another way to define semantics is by denotational semantics, first developed by Scott
[162]. (See Schmidt [158] for a gentle introduction.) By this approach, every syntactic
construction in the language is assigned a meaning in some domain: a set plus a partial
order on its elements, ordered according to their information content. For example,
the set IN IN is a domain, ordered by f v g iff for all x IN , either f (x) = g(x) or
191
f (x) = (see Section 14.1 for a sketch of this approach). The meaning of a recursive
construction such as a while loop or a recursively defined function is obtained by applying
the least fixed-point operator to a certain higher-order function.
12.1.1
`ev
` succ e plus(v, 1)
For an expression e1 + e2, if the subexpressions respectively have values u, v, then the
entire expression has value u + v. This is expressed by a two-premise inference rule:
` e1 u
` e2 v
` e1 + e2 plus(u, v)
This may look content-free but in fact is not, since it defines the meaning of the syntactic symbol + appearing to the left of the in terms of the already well-understood
mathematical operation of addition (the plus appearing to the right).
For another example consider boolean-valued expression e1 = e2, which tests two
values for equality. This is easily described by two rules, one for each case:
` e1 u
` e2 u
` e1 = e2 true
` e1 u
` e2 v
` e1 = e2 false
u 6= v
12.1.2
193
incrementing x from 0 to 1. Again, 1-x has to be evaluated, now yielding 0. Now only
the first `min rule can be used, leading to the conclusion that [x 7 1] `min 1 x 1 and
so that w = 1.
12.2
Predicates
The net effect of an inference system is to define a predicate, i.e. a relation among
values (for example, between expressions and their values in a given store). This section
introduces some terminology concerning predicates, and establishes some of their basic
properties.
The extensional view: predicates are sets
In this book a predicate over a set S is just a subset of S. It is common in logic to express
the fact that v S as S(v) is true, or sometimes even just to assert the statement
S(v).
If S = S1 Sn then P is called an n-ary predicate (0-ary or nullary, unary, binary,
ternary, etc. for n = 0, 1, 2, 3, . . .). Examples of predicates over IN :
1. binary: < is the set {(m, n) IN IN | m is smaller than n}.
2. binary: = is the set {(m, m) IN IN | m IN }.
3. unary: the set of prime numbers.
Operations on predicates
Suppose P and Q are n-ary predicates over S. Then the following are also n-ary predicates:
1. conjunction, or and: P Q = P Q. For s S n , s is in P Q iff s is in both P
and Q.
2. disjunction, or or: P Q = P Q. For s S n , s is in P Q iff s is in P or Q or
both.
3. implication, or if-then :P Q = {s S n | if s is in P then s is also in Q}.
4. negation, or not: P = S n \P . For s S n , s is in P iff s is not in P .
Some examples:
Predicates
195
1. If P is the set of prime numbers and O is the set of odd numbers, then P O is
the set of odd prime numbers.
2. If O is the set of odd numbers and E is the set of even numbers then E O = IN .
Recall that, although functions are just certain sets, we allow shorthand notations like
f (n, m) = n + m. Similarly we allow short hand notations for predicates, like P (x, y) is
the predicate x = y + 1 with the understanding that what we really mean is that P (x, y)
is the set {(1, 0), (2, 1), . . .}.
Suppose that P S1 Sn is an n-ary predicate. Then the following are (n 1)ary predicates:
1. Universal quantifier , or for all:
xi P = {(x1 , . . . , xi1 , xi+1 , . . . , xn ) S1 Si1 Si+1 Sn |
for all xi in Si , (x1 , . . . xn ) is in S1 Sn }
2. Existential quantifier , or there exists:
xi P = {(x1 , . . . , xi1 , xi+1 , . . . , xn ) S1 Si1 Si+1 Sn |
there is an xi in Si such that (x1 , . . . xn ) is in S1 Sn }
Examples:
1. If P (x, y) is the predicate over IN IN then yP (x, y) is the predicate over IN
which only contains 0. (0 is smaller than all other numbers.)
2. Further, xP (x, y) is the predicate over IN which contains no elements at all.
(There is no largest number).
3. If P (x, y) is the predicate x = y + 1 over IN O where O is the set of odd numbers then xP (x, y) is the predicate over IN containing exactly the even positive
numbers.
n-ary predicates over ID as subsets of ID
Since set ID is closed under pairing, we can represent an n-ary predicate P over ID as the
set of list values:
{(d1 . . . dn ) = (d1 .(d2 . . . .(dn .nil) . . .) | d1 , . . . , dn P }
Thus we may take over the terms recursive (decidable) and r.e. or recursively enumerable (semidecidable) for predicates over ID, without change from those concerning
sets. We will henceforth restrict our attention in some definitions and theorems to unary
predicates over ID, but use the freer n-ary notation where convenient.
12.3
197
12.4
We now simplify and generalize the examples of Section 12.1. The result is a framework
able to express the previous examples, and most logical proof systems as well. An
inference system I is a collection of inference rules which, acting together, define a
collection of provable judgments. The idea is to think of the set of values for which each
judgment is true as a predicate over ID. The system proves assertions of form P (d) where
P is a predicate name and d ID.
12.4.1
If
2. No set T hmsI contains any element of ID unless it can be shown so by some finite
number of applications of the preceding clause.
The premises of this application of rule Rr are P1 (d1 ), . . . , Pk (dk ), and P (d) is called its
conclusion. A special case: if k = 0, the rule is called an axiom. The effect of an axiom
is to place elements into set T hmsI with no need for premises.
2
12.4.2
Operational semantics. In previous sections we saw a definition of expression evaluation by two ternary (3-ary) predicates: ` expression value for normal evaluation,
and an auxiliary predicate `min expression value used for the minimization operator.
Horn clause deduction. Section 26.3 will describe the deduction of boolean variables
(also called propositional variables) from a set H of Horn clauses of form A1 ∧ A2 ∧ ... ∧
Ak ⇒ A0. This is an archetypical example of an inference system. In this context all
judgments have form ⊢ A where A is a propositional variable, and there is one inference rule for
each Horn clause A1 ∧ A2 ∧ ... ∧ Ak ⇒ A0 ∈ H:

⊢ A1    ⊢ A2  ...  ⊢ Ak
-----------------------
⊢ A0
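To see the rule in action, here is a small Python sketch (ours; the clause set H is invented for illustration) that computes the provable judgments of a Horn clause system by applying the rule repeatedly until nothing new can be added, exactly the finite-iteration idea behind Definition 12.4.1.

# Forward chaining for Horn clauses A1 ∧ ... ∧ Ak ⇒ A0, each written as
# (premises, conclusion).  Axioms are clauses with an empty premise list.
H = [([], "A"), ([], "B"), (["A", "B"], "C"), (["C"], "D"), (["E"], "F")]

def deducible(clauses):
    thms = set()
    changed = True
    while changed:                       # iterate to a fixpoint
        changed = False
        for premises, conclusion in clauses:
            if conclusion not in thms and all(p in thms for p in premises):
                thms.add(conclusion)
                changed = True
    return thms

print(deducible(H))   # {'A', 'B', 'C', 'D'}; F is not deducible since E is not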
systems of a certain strength, any consistent system must necessarily be incomplete; that
is, there must be true statements which are not provable.
⊢ P → (Q → P)
⊢ (R → S) → [(R → (S → T)) → (R → T)]
⊢ P → (Q → P ∧ Q)
⊢ P ∧ Q → P
⊢ P ∧ Q → Q
⊢ (P → R) → [(Q → R) → (P ∨ Q → R)]
⊢ (P → Q) → [(P → ¬Q) → ¬P]
⊢ P → P ∨ Q
⊢ Q → P ∨ Q
⊢ ¬¬P → P

together with the rule of modus ponens:

⊢ P    ⊢ P → Q
---------------
⊢ Q
Following is an example of its use: a proof that I → I for any propositional variable I
(the symbol ⊢ is omitted for compactness):
I → (I → I)
I → ((I → I) → I)
12.4.3
Define a proof tree t to be a proof tree form such that every subtree
r
12.5
A version of Gödel's incompleteness theorem
Gödel's original proof involved statements concerning arithmetic on the natural numbers.
Its pathbreaking achievement was to reveal a fundamental limitation in the power of
mathematical proof systems: that beyond a certain complexity level, there can be no
hope to have a proof system which is simultaneously complete and consistent.
As might be expected, we will instead manipulate values in ID. This gives a substantially simpler construction, both displaying the power of our framework, and stimulating
thought about the powers of logical systems in practice, for instance for reasoning about
program behavior.
12.5.1
201
We now introduce a tiny logical language in which one can make statements about values
in ID. Each such statement has an immediately natural reading or truth value. We will
then prove that no inference system as defined above can generate all true statements in
DL.
As is traditional in logic, we first give the syntax of DL expressions. For the sake of
preciseness, we will define exactly what it means for a DL statement to be true, leaving it
to the reader to check that this captures his or her intuitions about statements involving
values from ID.
An abstract syntax of DL. This is given by a grammar defining terms, which stand
for values in ID, and statements, which are assertions about relationships among terms.

Terms:       T  ::=  x (a variable)  |  d (a constant, d ∈ ID)  |  (T1 . T2)
Statements:  S  ::=  T = T1 ++ T2  |  ¬S  |  S1 ∧ S2  |  ∃x S
The symbol ++ stands for the append operation on list values. Logical operators
∨, ⇒, ∀, etc. can be defined from ¬, ∧, ∃ above as usual, and equality T = T′ can be
regarded as syntactic sugar for T = T′ ++ nil. Statements are intuitively interpreted in
the natural way; for example the relation "x is a sublist of y" could be represented by
the following statement S(x, y):

∃u ∃v ∃w (y = w ++ v ∧ w = u ++ x)
We now proceed to define "true statement" more formally and precisely.
First, a free occurrence of a variable x in statement S is any occurrence which does not
lie within any substatement ∃x T of S. The set Freevars(S) of free variables in statement
S is the set of all x which have at least one free occurrence in S. Finally, S is said to be
closed if Freevars(S) = {}. We will sometimes write S(x, y, ..., z) instead of S alone, to
indicate that its free variables are x, y, ..., z.
The operation of substitution is done by a function Subst(S, x, d), where d ∈ ID, which
yields the result of replacing by d every free occurrence of variable x within S. This
may also be applied to several variables, written Subst(S, (x1, ..., xn), (d1 ... dn)).
Definition 12.5.1 Let size(S) be the number of occurrences of operations ++, ¬, ∧, ∃ in
S. The set Ti of true closed statements of size i or less is given inductively by

Ti+1 = Ti ∪ { ¬S       | S is closed and S ∉ Ti }
           ∪ { F1 ∧ F2  | F1 ∈ Ti and F2 ∈ Ti }
           ∪ { ∃x S     | Subst(S, x, d) ∈ Ti for some d ∈ ID }
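The decidable core of this definition is easy to program. The Python sketch below uses our own encoding (None for nil, pairs for cons) and evaluates closed statements built from =, ++, ¬ and ∧ only; the ∃ clause is deliberately omitted, since it requires a search over all of ID, and that is precisely where the undecidability established below enters.

# ID values: None stands for nil, a pair (a, b) for (a.b).
# A closed quantifier-free statement is one of
#   ("eq", t1, t2, t3)  meaning  t1 = t2 ++ t3
#   ("not", S)          meaning  ¬S
#   ("and", S1, S2)     meaning  S1 ∧ S2
def append(x, y):                       # the ++ operation on list values
    return y if x is None else (x[0], append(x[1], y))

def true(stmt):
    tag = stmt[0]
    if tag == "eq":
        return stmt[1] == append(stmt[2], stmt[3])
    if tag == "not":
        return not true(stmt[1])
    if tag == "and":
        return true(stmt[1]) and true(stmt[2])
    raise ValueError("unknown statement form")

zero, one = None, (None, None)          # 0 = nil, 1 = (nil.nil)
lst = lambda *xs: None if not xs else (xs[0], lst(*xs[1:]))
print(true(("eq", lst(zero, one), lst(zero), lst(one))))   # True: (0 1) = (0) ++ (1)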
12.5.2
Representation of predicates in DL
Proof. Recall the semantics of Section 2.2. We only consider the sublanguage I of WHILE,
so the store used there can be replaced by the current value d of the single variable X.
We will show that for each I expression E and command C, there exist DL-statements
FE(d, d′) and GC(d, d′) that represent the binary predicates E[[E]]d = d′ and C ⊢ d → d′. This
suffices since if p is read X; C; write X, then dom([[p]]) is represented by the statement
∃d′ GC(d, d′).
Expressions.

F_nil(d, d′)        ≡  d′ = nil
F_X(d, d′)          ≡  d′ = d
F_(E1.E2)(d, d′)    ≡  ∃r ∃s (F_E1(d, r) ∧ F_E2(d, s) ∧ d′ = (r.s))
Commands.

G_X:=E(d, d′)       ≡  F_E(d, d′)
G_C1;C2(d, d′)      ≡  ∃e (G_C1(d, e) ∧ G_C2(e, d′))
12.5.3
Proof of a version of Gödel's incompleteness theorem
We now show that the set T of true DL statements is not recursively enumerable. On the
other hand, the set of all statements deducible in any inference system is recursively enumerable
by Theorem 12.4.3. As a consequence, any inference system that only deduces true DL
statements cannot deduce all of them, i.e. there must be at least one statement which is
true but not provable.
Stated another way: any inference system whatever must either be inconsistent: it
deduces some statements that are not true, i.e. not in T; or it must be incomplete, i.e. it
cannot deduce all true statements.
Theorem 12.5.5 (Gödel's incompleteness theorem) TID is not recursively enumerable.
Proof. Consider the set

HALT = {(p.d) | p ∈ WHILE-programs, d ∈ WHILE-data, and [[p]](d)↓}

Now HALT = dom([[u]]) where u is the universal program (self-interpreter) for I-programs, and so by Theorem 12.5.4 is representable in DL. By Corollary 5.6.2, its complement ID \ HALT is not recursively enumerable. By Lemma 12.5.3, ID \ HALT is also
representable, by some statement F(x), so

ID \ HALT = {(p.d) | Subst(F, x, (p.d)) ∈ TID}

Suppose TID were recursively enumerable. By Theorem 5.7.2 there must exist a program
q such that TID = dom([[q]]). Then for any I-program p and input d, we have

(p.d) ∈ ID \ HALT  iff  [[q]](Subst(F, x, (p.d)))↓

But this would imply that ID \ HALT is recursively enumerable, which is false.
it admittedly builds on the nontrivial difference between decidable and recursively enumerable problems).
Differences: first, this presentation does not involve Peano arithmetic at all, as Gödel's
original work did. Our use of ID instead gave simplified constructions, but it could well
be argued that the result is different since it concerns a different logical system (although
one which seems no more complex than Peano arithmetic). We believe that some form
of equivalence between Peano arithmetic and DL should not be difficult to establish.
Second, Gödel's theorem is often presented as: any logical system of a certain minimal
complexity must be either incomplete or inconsistent. We have avoided the problem of
dealing with "logical systems" as studied in mathematical logic by substituting a proper
generalization: inference systems. The assumption above that {d | P(d) ∈ ThmsI} ⊆
TID says in effect that I is consistent, and the proper inclusion we conclude expresses
incompleteness. On the other hand, the formulation above says nothing about minimal
complexity of I, just that the full truth of DL statements cannot be ascertained by
means of axioms and rules of logical deduction.
Third, Gödel's theorem begins with a logical system containing Peano arithmetic,
and works by diagonalization to construct a witness: an example of a statement S which
is true, but which cannot be proven. Gödel's original witness is (intuitively) true since
it in effect asserts "there is no proof in this system of S," so if the system were able to
prove S, it would be inconsistent!
Our version indeed uses diagonalization, but on I programs instead, and to prove
that the problem ID \ HALT is not recursively enumerable.
Exercises
12.1 Express the first example of Section 12.1.1 as an inference system I in the style of
Definition 12.4.1.
□
12.2 Construct an inference system which defines the semantics of WHILE programs.
Hint: rewrite the definitions of E[[E]]σ and C ⊢ σ → σ′.
□
12.3 Prove that the property "t is a proof tree for predicate P" is decidable. It suffices
to sketch an algorithm.
□
References
Gödel's incompleteness theorem appeared in 1931 [54], and overviews can be found in
[34, 37, 138]. The original proof was expressed in terms of provability and consistency,
rather than in terms of truth and the difference between recursive and r.e. sets, as we have
done. Post observed in [143] that this difference is the essence of Gödel incompleteness.
It may be relevant that computability theory and the Church-Turing thesis had not been
developed in 1931; an interesting historical account may be found in [36].
The articles [91, 140] by Kahn and Plotkin stimulated the use of inference systems in
Computer Science. They have been used widely since then, for example to define both
the static and dynamic semantics of the programming language ML [129].
13
Gödel numbers versus programs as data objects. Our approach differs from the
classical one in that programs are data values in our framework, and so need not be
classical one in that programs are data values in our framework, and so need not be
encoded in the form of natural numbers. For the sake of perspective we briefly outline
the beginning assumptions of classical recursive function theory; being entirely based on
natural numbers, it is necessary to encode programs and nonnumeric data structures
(e.g. n-tuples) as natural numbers.
A straightforward analogy can be made between IN and ID, the set of Lisp data
structures. In our framework programs are elements of ID, so the need to enumerate
programs by assigning each one a numerical index by an often complex Gödel numbering
scheme is completely circumvented.
13.1
An important early formalization of the concept of computability was the class of partial recursive functions or μ-recursive functions, defined and systematically investigated
largely by Kleene, but already implicit in Gödel's earlier pathbreaking work [54, 98, 100].
This is a purely mathematical characterization, with few computational aspects: The
partial recursive functions are defined to be the smallest class of functions containing
certain initial functions and closed under several operations on functions. For the sake
of completeness and links with other work in computability theory, we prove this class
equivalent to functions computable by counter machines.
The lambda notation used in this chapter is defined in Appendix A.3.8. An abbreviation: we write xn to stand for the tuple x1 , . . . , xn or (x1 , . . . , xn ).
13.2
This class is defined in stages, beginning with a simpler class of functions, all of which
are total.
13.2.1
h(xn) = ⊥
if some gi(xn) = ⊥
Definition 13.2.4 Function f is primitive recursive if it is obtainable from base functions λx.0 and λx.x + 1 by some finite number of applications of explicit transformation,
composition, and primitive recursion.
An easy induction shows that every primitive recursive function is total. The operations
of primitive recursion and explicit transformation may, however, be applied both to total
and to partial functions.
13.2.2
13.2.3
Definition of μ-recursiveness
13.3
s = 2^ℓ · 3^{v0} · 5^{v1} · ... · p_{k+2}^{vk}

f(x) = y where stp(g(x) − 1, s) = 2^ℓ · 3^y · ... · p_{k+2}^{vk}
13.4
A parallel development to that of Section 13.2 may be carried out using data set ID and
the WHILE language, using a standard enumeration d0, d1, ... of ID, for example as in
Lemma 5.7.1. We omit the details as they are exact analogues of the above for the CM
language.
Definition 13.4.1 Let f : ID^{n+1} → ID be a WHILE-computable total function. The partial function μt(f(t, xn) = true) : ID^n → ID is defined by

μt(f(t, xn) = true) =  di   if i is the least index such that f(di, xn) = true
                       ⊥    otherwise
Lemma 13.4.2 If f : ID^{n+1} → ID is WHILE-computable and total, then μt(f(t, xn) =
true) is a WHILE-computable partial function.
Proof. Given a program p to compute f, and start, next, New as in Lemma 5.7.1, the
following program will compute μt(f(t, xn) = true):

read X1, ..., Xn; start;
R := p New X1 ... Xn;
while not R do
  { next; R := p New X1 ... Xn };
write New
□
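The same search is immediate to express in Python. In the sketch below (an illustration only) the enumeration d0, d1, ... is taken to be 0, 1, 2, ...; for ID one would substitute the enumeration of Lemma 5.7.1. Like the WHILE program above, the search loops forever if no suitable di exists.

# Minimization over an enumeration d0, d1, ...: return the first di making f true.
# Here the enumeration is simply 0, 1, 2, ...; f is assumed total and computable.
def mu(f, *xs):
    i = 0
    while not f(i, *xs):
        i += 1
    return i

# Example: the least t with t * t >= n, i.e. the rounded-up integer square root.
print(mu(lambda t, n: t * t >= n, 10))   # 4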
The following is interesting because it shows that all recursive functions can be represented in a uniform way. Intuitively, it says that selection of a function f to compute
amounts to selecting the constant p below. Further, performing the computation on
input d amounts to searching for the unique c that makes function T (p, d, c) the value
true; and reading out the result is done by applying a very simple function U .
This is essentially Kleene's Normal Form theorem as in [100], but the result is somewhat stronger due to our use of structured data.
Theorem 13.4.3 There is a total function U : ID → ID and a total WHILE-computable
function T(p, d, c) such that
1. For all p, d ∈ ID there is at most one c ∈ ID such that T(p, d, c) = true.
2. A partial function f : ID → ID is recursive if and only if there is a p ∈ ID such that
for all d ∈ ID
Exercises
13.1 Prove that the class of CM-computable functions is closed under primitive recursion.
□
13.2 Explain why the construction used to prove Theorem 13.3.3 or Lemma 13.4.2 does
not necessarily show that the partial function μy.f is recursive when f is a partial computable
function.
□
13.3 Extend μ to functions on ID in the natural way using the enumeration d0, d1, ... of
Lemma 5.7.1. Then prove that μt . f(t, p, d) = true may be uncomputable when f is a
partial WHILE-computable function. Hint: let f(t, p, d) = true if t = true, or if both
t ≠ true and [[p]](d)↓, else undefined.
□
13.4 * Use the previous exercise to prove that μt . f(t, x) = 0 may be uncomputable when
f is a partial CM-computable function.
□
13.5 (Hilbert's choice function.)* Define g ≈ εy . f(x, y) to hold if for all x, whenever
there exists some y such that f(x, y) = 0, then f(x, g(x)) = 0. In other words, g(x)
produces some witness to the truth of ∃y. f(x, y) = 0, but not necessarily the least one as
was the case for μy(f(x, y) = 0).
Prove that if f is partial computable, there exists a partial computable function
g with g ≈ εy . f(x, y). Hint: use dovetailing as in Theorem 5.5.1.
□
References
Classic books on recursive function theory are the ones by Kleene, Davis, and Rogers [100, 33,
155]. More recent ones include a newer one by Davis et al. [37] and one by Sommerhalder
and van Westrhenen [164], which has a scope similar to that of this book.
14
14.1
In this section we describe one approach to defining the meaning of recursively defined
functions, by so-called fixpoint semantics.
First an example: consider the recursive definition
f(n) = (if n = 0 then 1 else n * f(n − 1))                                  (14.1)

As a second example, consider the recursive definition

g(n) = (if n = 0 then 0 else if n = 1 then g(1) else g(n − 2) + 2)          (14.2)

Given any function g, one may define a new function g′ (nonrecursively) by

g′(n) = (if n = 0 then 0 else if n = 1 then g(1) else g(n − 2) + 2)         (14.3)
¹ Recall that g ≃ g′ if and only if for all n ∈ IN, either g(n) and g′(n) are both undefined (⊥), or are
both in IN and equal.
Note that this definition is nonrecursive, since g′ is defined in terms of g and not in terms
of itself. Equation (14.3) defines a transformation from g to g′. Function transformers
are often called functionals, and one may write for example g′ = F(g) where F is the
functional defined by (14.3). In this case F has type (IN → IN) → (IN → IN), and is
defined by
F(g) = g′, where
g′(n) = (if n = 0 then 0 else if n = 1 then g(1) else g(n − 2) + 2)
For an example, F transforms function g3(n) = n² into
F(g3) = g3′, where
g3′(n) = (if n = 0 then 0 else if n = 1 then 1² else (n − 2)² + 2)
Satisfying equation (14.2) thus amounts to asserting g = F(g). Such a function is called
a fixpoint of F. Our goal is therefore to select as standard interpretation a unique
computable function g satisfying g = F(g).
def
Some examples (where we write =def for "equal by the definition of g"):

1. (14.2) is satisfied by g1(n) = n. Equation (14.2) is trivially true for n ≤ 1, and if n > 1 then

g1(n) =def n = (n − 2) + 2 =def g1(n − 2) + 2 =
(if n = 0 then 0 else if n = 1 then g1(1) else g1(n − 2) + 2)
2. (14.2) is also satisfied by g2(n) = (if n even then n else ⊥). Arguing again by
cases, equation (14.2) is trivially true for n ≤ 1. If n > 1 is even then

g2(n) =def n = (n − 2) + 2 =def g2(n − 2) + 2 =
(if n = 0 then 0 else if n = 1 then g2(1) else g2(n − 2) + 2)

and if n > 1 is odd then

g2(n) =def ⊥ = ⊥ + 2 =def g2(n − 2) + 2 =
(if n = 0 then 0 else if n = 1 then g2(1) else g2(n − 2) + 2)
i    fi       n=0   n=1   n=2   n=3   n=4   n=5, ...
0    f0       ⊥     ⊥     ⊥     ⊥     ⊥     ⊥, ...
1    f1       1     ⊥     ⊥     ⊥     ⊥     ⊥, ...
2    f2       1     1     ⊥     ⊥     ⊥     ⊥, ...
3    f3       1     1     2     ⊥     ⊥     ⊥, ...
4    f4       1     1     2     6     ⊥     ⊥, ...
...
     lim fi   1     1     2     6     24    120, ...
Remarks.
1. Function f0 is an initial (and very poor) approximation to the least fixpoint of F,
and f1 , f2 , . . . are successively better approximations.
2. The individual functions fi will most likely not be fixpoints (and the ones in the
table are not). But the limit of f0, f1, f2, ... will always exist, and will always be
F's least fixpoint.
3. More precisely fi ⊑ fi+1 for all i, where f ⊑ g iff for all x ∈ IN either f(x) = ⊥ or
f(x) = g(x) ∈ IN. The limit is the smallest function (with respect to partial order
⊑) f such that fi ⊑ f for all i.
4. This scheme works in principle and as a definition, but for practical implementation
a more efficient way to compute the same values would be used. For instance one
would only compute those values of f (x) that are needed for the final answer.
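The approximation sequence of the table can be computed directly. In the following Python sketch (ours), a partial function is represented as a dictionary, with a missing key playing the role of ⊥, and F is the functional determined by the factorial definition (14.1); only a finite window of arguments is tabulated.

# Successive approximations f0 ⊑ f1 ⊑ ... to the least fixpoint of the factorial
# functional.  A partial function is a dict; a missing key means ⊥.
def F(g):
    g_next = {}
    for n in range(8):                       # a finite window of arguments
        if n == 0:
            g_next[n] = 1
        elif n - 1 in g:                     # n * g(n-1), defined only if g(n-1) is
            g_next[n] = n * g[n - 1]
    return g_next

f = {}                                       # f0 = the everywhere-undefined function
for i in range(6):
    print(i, f)                              # f1 = {0: 1}, f2 = {0: 1, 1: 1}, ...
    f = F(f)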
Theoretical Basis. Putting this informal discussion on solid foundations requires some
mathematical machinery, for example as presented in [114, 158]. There it is shown
that the least fixpoint always exists and that the sequence f0 , f1 , f2 , . . . above always
converges toward it, provided functional F is continuous (in a sense different from
that of analysis). Fortunately, any recursive function definition one can write using
variables, constants, tests, function calls and continuous base functions (such as +, ,
and if-then-else) defines a continuous functional.
Mutual recursion. It is easy to generalize this approach to assign meaning to a collection of functions defined by mutual recursion, by computing the least fixpoint of a
functional on a cartesian product of sets of partial functions. (In Scotts domain theory
this is extended to a variety of other domain constructors, see [158].) For an example,
f(n) = (if n = 0 then true  else g(n − 1))
g(n) = (if n = 0 then false else f(n − 1))
defines functions f, g : IN → IB such that f(n) = true for even n and false for odd n, and
g(n) = false for even n and true for odd n.
14.2
14.2.1
Kleene's second recursion theorem [100] in essence guarantees the computability of functions defined by self-referential or reflexive algorithms. Given this ability, it is possible
to simulate recursion as a special case, without having it as a built-in language construct. The theorem thus gives an alternate way to assign meaning to recursive language
constructs. The recursion theorem has many applications in recursive function theory,
machine-independent computational complexity, and learning theory [14, 19]. It is valid
for any programming language computing all partial recursive functions, such that the
s-m-n Theorem holds and a universal program exists. Such a language is called an acceptable programming system.
Clearly [[q]] ≃ [[f(q)]] implies [[q]](d) = [[f(q)]](d) = q as desired.
Example 3: Elimination of recursion, using Kleene's version. Let L = I. Consider the
total computable function [[p]] : ID → ID defined by

[[p]](q.x) = [if x = 0 then 1 else x * [[q]](tl x)]

where x is assumed to be a numeral nil^n, as in Section 2.1.6. The call to q can be programmed as the call [[i]]((q.(tl x))), where i is the universal program for I-programs.
By Theorem 14.2.1, there is a fixed-point program e with the property

[[e]](x) = [[p]](e.x) = [if x = 0 then 1 else x * [[e]](tl x)]

Thus e, which was found without explicit use of recursion, computes the factorial function. More
generally, Kleene's theorem implies that any acceptable programming system is closed
under recursion.
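The flavour of this construction can be imitated in any language in which functions are values. The Python sketch below is only an analogy to the theorem, not an instance of it: the role of the universal program is played by ordinary function application, and fac_step plays the role of the program p above, receiving "itself" as an extra argument q and calling q where a recursive call would otherwise appear.

# Eliminating explicit recursion: fac_step takes "a program" q as extra data and
# uses it where a recursive call would appear (compare [[p]](q.x) above).
def fac_step(q, x):
    return 1 if x == 0 else x * q(q, x - 1)

# The fixed point e is obtained by self-application.
def e(x):
    return fac_step(fac_step, x)

print(e(5))   # 120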
Other examples, including the Blum Speedup Theorem involve computing time and
appear in Chapter 20.
14.2.2
Our first proofs of the Recursion theorems are indirect, applying only to a reflexive
programming language extending I.
Let I* be an extension of language I, with syntax as in Figure 14.1. This is an
abstract syntax; a concrete one is obtained as in Section 4.2 by encoding * and univ as
trees built from nil.
Informal semantics: the value of expression * is the text of the program currently
being executed. The value of expression univ(E, F) is the value of [[e]](f), where e is
the value of E and f is the value of F. In words: evaluation of E is expected to result in
a program text. This program is then run with the value of F as input, and the result of
this run is the value of univ(E, F). Somewhat more formally:
Definition 14.2.3 The I* language has I*-data = ID, and I*-programs are the result of
encoding the programs generated by the grammar of Figure 14.1 uniquely as elements of ID.
Expressions   E, F  ::=  X
                     |   nil | cons E F | hd E | tl E
                     |   *
                     |   univ(E1, E2)
Commands      C, D  ::=  X := E | C ; D | while E do C
Programs      P     ::=  read X; C; write X

Figure 14.1: Syntax of the reflexive language I*.
case Cd, St of
  ...
  ((hd E).Cd),        St            =>  Cd := cons* E dohd Cd;
  (dohd.Cd),          (T.St)        =>  St := cons (hd T) St;
  ...
  ((*).Cd),           St            =>  St := cons P St;
  ((univ E1 E2).Cd),  St            =>  Cd := cons* E1 E2 douniv Cd;
  (douniv.Cd),        (V2.(V1.St))  =>  Cd := cons* V1 clean Vl Cd;  Vl := V2;
  (clean.Old.Cd),     St            =>  St := cons Vl St;  Vl := Old;
  ...
  ((; C1 C2).Cd),     St            =>  Cd := cons* C1 C2 Cd;
  ...
read D;
X := cons * D;
C;
write Y
For any d ∈ ID, q first assigns (q.d) to X, and then writes [[p]](q.d) = [[q]](d).
Rogers' theorem: Given I-program p with f = [[p]], let q be the I*-program:

read D;
Tem := p *;
Y := univ(Tem, D);
write Y

For any d ∈ ID, [[q]](d) = [[r]](d) where r = [[p]](q) = f(q). Thus [[q]] = [[f(q)]].
□
14.3
A model-independent approach to computability
More recent recursive function theory, e.g. as formulated by Rogers [155], begins even
more abstractly: instead of an enumeration p0, p1, p2, ... of programs, one simply assumes that for each i ≥ 0 there is given a partial function φi : IN → IN. The starting
point is thus an enumeration φ0, φ1, φ2, ... of one-argument partial recursive functions
that are required to satisfy certain natural conditions. The definition given below captures properties sufficient for a development of computability theory which is entirely
independent of any particular model of computation.
The underlying theme is to avoid explicit construction of programs wherever possible,
so there is no formal definition of program at all; a program is merely an index in this
standard enumeration. Informal algorithm sketches, with liberal appeals to the Church-Turing thesis, are used to establish computability.
The approach emphasizes extensional properties expressed in terms of numbers and
mathematical functions, rather than intensional properties of programs, for example their
appearance, time efficiency, or storage consumption.
Our goals are somewhat different, though, partly because we start from Computer
Science, in which the exact nature of programs and their intensional properties is of
major concern. We are more interested in efficient problem solving by programs than
in exploring the outer regions of uncomputability. Nonetheless the interplay between
these two viewpoints is fascinating and well worth study, since the extensional viewpoint
focuses on what the problems are that are to be solved (computing functions, deciding
membership in sets, etc.), whereas the intensional viewpoint focuses on how they are to be
solved, by concrete programs running with measurable time and storage usage. Another
way to describe this is as a distinction between problem specification and problem solution
by means of programs.
14.3.1
In the following definitions, an n-argument function f : IN^n → IN is considered effectively computable iff for some total effectively computable tupling function
<x1, x2, ..., xn> there is a one-argument effectively computable g : IN → IN such that
for any x1, ..., xn ∈ IN

f(x1, ..., xn) = g(<x1, x2, ..., xn>)
For conciseness we will also write x . y instead of <x, y>. Henceforth we will write a
value ranging over IN in teletype font, e.g. p, when it clearly denotes an index used as a
program, otherwise in mathematical style, e.g. p.
Functions taking i to di and back are clearly computable by the Church-Turing thesis.
□
A simple result using this approach is that symbolic function composition is possible in
any language defining an acceptable enumeration. This generalizes the result of Theorem
13.2.5, that symbolic function composition can be done for CM programs.
Theorem 14.3.3 Given any acceptable enumeration φ, there is a total recursive function compose : IN² → IN such that for any indices p, q and x ∈ IN

φ_compose(p,q)(x) = φ_p(φ_q(x))

Proof. By the Church-Turing thesis, the function φ_p(φ_q(x)) is computable; our task is
to find an index for it by a uniform method. First, define f by

f(p, q, x) = φ_p(φ_q(x)) = univ(p, univ(q, x))

(equality holds by universality). By the Church-Turing thesis, this 3-argument function
is computable, and so by Turing completeness has some index r. The needed function
is then given by compose(p, q) = s²₁(r, p, q). Alternatively this can be done
using only one-argument specialization, by: compose(p, q) = s¹₁(s¹₁(r, p), q).
□
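The proof can be acted out concretely if we take programs to be source strings, build a universal function from Python's exec, and implement specialization by substituting constants into a program text. All names below (univ, compose, main) are our own; this is a sketch of the idea rather than the book's construction.

# Programs are Python source strings defining a one-argument function main(x).
def univ(p, x):                   # a universal function: run program text p on x
    ns = {}
    exec(p, ns)
    return ns["main"](x)

double = "def main(x):\n    return 2 * x\n"
succ   = "def main(x):\n    return x + 1\n"

# compose(p, q): a program text computing x |-> [[p]]([[q]](x)), obtained by
# specializing a generic "run p after q" program to the constants p and q.
def compose(p, q):
    return (
        "def main(x):\n"
        "    def univ(p, x):\n"
        "        ns = {}\n"
        "        exec(p, ns)\n"
        "        return ns['main'](x)\n"
        f"    return univ({p!r}, univ({q!r}, x))\n"
    )

r = compose(double, succ)
print(univ(r, 5))                 # 12 = 2 * (5 + 1)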
Remarks
Even though very natural from a computing viewpoint, these conditions are not guaranteed to be satisfied for an arbitrary sequence φ0, φ1, φ2, ... of partial recursive functions.
For example, suppose the indices i correspond to positions in a listing of all finite automata, and φi(x) is the result of applying finite automaton number i to input x expressed
as a bit string. This fails Turing completeness since it is well known that finite automata
cannot compute all computable functions. Similarly, there exist enumerations possessing
a partial recursive universal function but not a partial recursive s-m-n function, or vice
versa, or neither [113, 155].
14.3.2
Theorem 14.3.4 Kleene's recursion theorem holds for all acceptable enumerations φ:
For any program p ∈ IN there is a program e ∈ IN such that φ_e(x) = φ_p(e, x). We call
such an e a Kleene fixed-point for p.
14.3.3
A strength of the results just proven, in relation to that of Theorem 14.2.4, is that they
hold for any programming language defining an acceptable enumeration. In particular
they hold for I as well as for I*, and even for Turing machines.
A weakness, however, is that all nontrivial uses of either fixpoint theorem seem to
require the universal program. If the constructions seen above for Theorems 14.3.4
and 14.3.5 are carried out in practice, the resulting programs e or n turn out to be
unacceptably inefficient. For example, [62] reports several experiments to compute the
factorial function n!. Every program built, by either the Kleene or the Rogers method, had running
time greater than exponential in n. The reason is that carrying out the constructions
above literally leads to the use of n interpretation levels, each consisting of one universal
program interpreting the next, in order to compute n!
The reflective construction of Theorem 14.2.4, however, only takes approximately
time linear in n to compute n!, since only one interpretation level is ever involved. More
details on experiments and efficiency of fixpoint algorithms may be found in [78, 62].
14.4
Rogers' remarkable theorem is that there exists a compiling bijection between any two
programming languages L, M defining acceptable enumerations of the partial recursive
functions on IN : compiling functions which are total, computable, meaning-preserving,
one-to-one, and onto. The proof involves several steps:
1
[[i]](j, y) =
[[h(j)]](y)
This is recursively defined, since program i is used on both sides. Still, the right side
is clearly a computable function of i, j, y (Church-Turing!), so there is a program r such
that [[r]](i, j, y) equals the right side's value. Define f(i) = spec(r, i). Then certainly [[i]]
= [[f(i)]] is another way to express the equation.
By Rogers' version of the recursion theorem (since f is clearly total computable), [[i]]
= [[f(i)]] has a solution i. We now show that i is a 1-1 specialization index and [[h(x)]]
= [[spec(i, x)]] for all x. First, suppose for the sake of contradiction that i is not a 1-1
specialization index. Consider the smallest k such that spec(i, j′) = spec(i, k) for some
j′ > k, and let j be the smallest such j′ for this k. Then for any y we have
[[spec(i, j)]](y) = [[i]](j, y) = 0
Proof. Consider the pairing function pr(x, y) = <x, y> from Section 14.3, and let pr1
be its (computable) left inverse, so x = pr1(<x, y>) for any x, y ∈ IN. Choosing h = pr1
in Theorem 14.4.2 (and omitting superscripts), we obtain a program index i such that
[[pr1(z)]] = [[spec(i, z)]] for all z. By definition of spec this implies for any p, d that

[[p]] = [[pr1(<p, d>)]] = [[spec(i, <p, d>)]]

By Theorem 14.4.2 the function pad(p, d) = spec(i, <p, d>) is total computable, and one-to-one in <p, d>.
□
Using padding, we can now strengthen Proposition 14.4.3 to make the compiling functions
strictly monotonic.
Proposition 14.4.5 There is a total computable g : L-programs → M-programs such that
[[p]]^L = [[g(p)]]^M for p ∈ L-programs, and 0 < g(p) < g(p + 1) for all p.
Proof. Let r be the one-to-one function from Proposition 14.4.3, and pad a padding function
as just constructed. Define g as follows:

g(0)     = pad(r(0),     min{y | pad(r(0), y) > 0})
g(p + 1) = pad(r(p + 1), min{y | pad(r(p + 1), y) > g(p)})

Function g simply takes a program compiled from L into M by r, and pads it sufficiently
to exceed all of its own values on smaller arguments. It is clearly computable.
□
Finally, the crux of our development:
Theorem 14.4.6 There is a one-to-one, onto, total computable function f : L-programs →
M-programs such that [[p]]^L = [[f(p)]]^M for p ∈ L-programs.
Proof. Let g : L-programs → M-programs and h : M-programs → L-programs be compiling
functions from Proposition 14.4.5 such that [[p]]^L = [[g(p)]]^M, [[q]]^M = [[h(q)]]^L, and 0 <
g(p) < g(p + 1) and 0 < h(q) < h(q + 1) for all p, q.
Both functions are one-to-one, and p < g(p) and q < h(q); these will be key properties
in the following construction. The one-to-one property ensures that g⁻¹ and h⁻¹ are partial functions; the monotonicity of both implies that their inverses are also computable.
Define functions zig : L-programs → {true, false}, zag : M-programs → {true,
false}, and f as follows:
zig(p) = true            if p is not in the range of h
         zag(h⁻¹(p))     otherwise
zag(q) = false           if q is not in the range of g
         zig(g⁻¹(q))     otherwise
f(p)   = g(p)            if zig(p) = true
         h⁻¹(p)          if zig(p) = false

(The diagrams accompanying this definition trace the chains p, h⁻¹(p), g⁻¹(h⁻¹(p)), ... back and forth between L-programs and M-programs, illustrating the two cases f(p) = g(p) and f(p) = h⁻¹(p).)
Note that zig, zag are both total since g⁻¹ and h⁻¹ decrease, so each recursive call is on a
strictly smaller argument. Further, f(p) is always uniquely
defined. This is evident if zig(p) = true, as f(p) = g(p). The other case is zig(p) = false,
which can (by inspection of zig's definition) only occur if p is in the range of h, in which
case f(p) is the unique value of h⁻¹(p).
We must now show that f is a total computable isomorphism. From the remarks
above it should be clear that f is total and recursive.
Onto. Let q ∈ M-programs. Value zag(q) is either true or false. If true then q = g(p)
for some p ∈ L-programs for which zig(p) = true. This implies f(p) = q. If zag(q) is
false then zig(h(q)) = zag(q) = false, which implies f(h(q)) = h⁻¹(h(q)) = q.
Thus all M-programs are in the range of f.
One-to-one. Suppose f(p) = f(p′). As f is defined, there are two possibilities for each
(the then or the else branch above), giving four combinations. First: if f(p) = g(p)
and f(p′) = g(p′) then g(p) = g(p′), which implies p = p′ since g is one-to-one. Second: if
f(p) = h⁻¹(p) and f(p′) = h⁻¹(p′) then h⁻¹(p) = h⁻¹(p′), which implies p = p′ since h is
a single-valued function.
Third possibility: f(p) = g(p) and f(p′) = h⁻¹(p′), which by definition of f can only
happen if zig(p) = true and zig(p′) = false. But this is impossible since p′ = h(f(p′)) =
h(f(p)) = h(g(p)), which implies zig(p) = zig(p′). The fourth possibility is the same, just
with the roles of p and p′ reversed.
□
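With the (reconstructed) definitions of zig, zag and f given before the proof, the construction can be tested on small examples. In the Python sketch below, g and h are arbitrary strictly increasing injections on the natural numbers standing in for the two compiling functions, and inverses are found by brute-force search, which suffices for experimentation.

# The zig-zag construction: given increasing injections g (L -> M) and h (M -> L),
# build a function f (L -> M) agreeing with g or with h's inverse on each argument.
def make_f(g, h):
    def inv(fun, y, bound):                     # brute-force partial inverse
        return next((x for x in range(bound + 1) if fun(x) == y), None)

    def zig(p):
        j = inv(h, p, p)                        # h increases, so search up to p
        return True if j is None else zag(j)

    def zag(q):
        i = inv(g, q, q)
        return False if i is None else zig(i)

    def f(p):
        return g(p) if zig(p) else inv(h, p, p)
    return f

g = lambda p: 2 * p + 1                         # strictly increasing, p < g(p)
h = lambda q: 3 * q + 2
f = make_f(g, h)
print([f(p) for p in range(10)])
print(len({f(p) for p in range(10)}) == 10)     # injective on this range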
Exercises
14.1 Construct a self-reproducing WHILE-program directly, so [[p]](d) = p for all d.
References
Manna's book [114] has a lucid and elementary treatment of the fixpoint approach to
recursion, a subject treated from a more abstract viewpoint in denotational semantics
[162, 158]. The recursion theorem is originally due to Kleene [100], and Rogers gave an
alternate form involving program transformation in [155]. The isomorphism theorem is
from [154]; our proof is adapted from [113].
Part IV
Introduction to Complexity
15
15.1
Parts I, II and III of this book concerned understanding the nature of computability, and
delineating the boundary between problems that are effectively solvable (computable)
and those that are not. The problems studied involved computing partial functions and
deciding membership in sets.¹
Following Turings analysis of computation in general, we chose at first the WHILE
language as computing formalism, and proved several fundamental results using it in
Chapter 5. In particular Kleenes s-m-n theorem established the possibility of program
specialization, the halting problem was shown undecidable, and Rices theorem established the undecidability of all nontrivial extensional program properties. A universal
WHILE-program, able to simulate any WHILE-program at all, was constructed.
The boundary between those sets whose membership problems are decidable, semidecidable, and undecidable was explored, as were the relations among semidecidability of
set membership, effective enumerability, and the computability of possibly partial functions.
After that rather abstract chapter, relations to daily computing concepts were discussed informally in Chapter 6: compilers, interpreters, partial evaluation, compiler bootstrapping, and related computational time aspects. Time was, however, only treated in
a quite informal way.
The remainder of Part II very significantly broadened the scope, relevance, and applicability of the previous formal results, by showing that they hold not only for the
WHILE language, but also for several other computing formalisms: both flow chart and
functional analogues of the WHILE language; Turing machines; counter machines; random
access machines; and classically defined recursive functions. This was done by showing
all these formalisms to be mutually simulable, or by compilations. In particular, Chapter
8 on robustness introduced models CM, 2CM, RAM, SRAM, TM and proved their equivalences (see Figure 8.1 for an overview.) The main result was: computability, without
regard to resource bounds, is equivalent for all of: F, WHILE, GOTO, CM, 2CM, RAM, and TM.
A corollary: the halting problem is undecidable for any language L in this list.
Finally, some relatively natural and simple problems (at least in appearance!) were
15.2
Parts I through III concerned only what was computable, and paid no attention at all
(aside from the informal Chapter 6) to how much time or space was required to carry
out a computation. In the real computing world, however, computational resource usage
is of primary importance, as it can determine whether or not a problem is solvable at all
in practice.
In the remainder of the book we thus investigate computability in a world of limited
resources such as running time or memory space. We will develop a hierarchy of robust
subclasses within the class of all decidable sets. In some cases we will prove proper containments: that a sufficient resource increase will properly increase the class of problems
that can be solved. In other cases, questions concerning proper containments are still
unsolved, and have been for many years.
Finally, and in lieu of definitive answers, we will characterize certain problems as
complete for the class of all problems solvable within given resource bounds. A complete problem is both solvable within the given bounds and, in a precise technical sense,
hardest among all problems so solvable. Many familiar problems will be seen to be
complete for various of these complexity classes.
15.2.1
Complexity theory has as yet a great many unsolved open questions, but has evolved
a substantial understanding of just what the intrinsic complexity is of many interesting
general and practically motivated problems. This is reflected by a well-developed classification system for how decidable a problem is. Computability theory has similar
classification systems, for how undecidable a problem is [100], [155], but this subject
is beyond the scope of this book.
15.2.2
The concepts we attempt to capture all involve computational resources, the central ones
being time, space, and nondeterminism (the ability to guess).
First, we define what it means to decide a problem within a given time bound. Second,
we define what it means to decide a problem within a given space bound (i.e., a limit on
memory or storage). The third resource, nondeterminacy, will also be introduced and
discussed.
These definitions will require some discussion, since they are not entirely straightforward, partly
due to the multitude of our machine models, and partly due to some near-philosophical
questions about what a "fair" time or space cost is when input size grows toward
infinity. After carefully investigating fair resource measures, we will establish that:
1. Computability, up to linear differences in running time, is equivalent for F, WHILE,
and GOTO.
2. Computability, up to polynomial differences in running time, is equivalent for all
of: F, WHILE, GOTO, SRAM, and TM.
3. Computability, up to polynomial differences in memory usage, is equivalent for all
of: F, WHILE, GOTO, SRAM, and TM.
Invariance of polynomial-time computability
Conclusion 2 supports (or will, once proven) a strong robustness result: The class
ptime, consisting of all problems solvable in time polynomially bounded in the size of
the problem's input, is essentially independent of the computation model being used³.
The assertion that ptime^L is the same class of problems for all reasonable sequential
(that is, nonparallel) computational models L could well be called Cook's thesis, after
Stephen C. Cook, a pathbreaking researcher in computational complexity. A stronger
3 Conclusion 1 is not as strong since it involves fewer machine types, and it seems likely that the
property of linear time solvability in fact depends on the machine model used.
version, analogous to the Church-Turing thesis but most likely too strong, is to identify
ptime with the class of all tractable, or feasible problems.
The complexity equivalences in points 2 and 3 above do not concern either the counter
machine or the unrestricted RAM. Informal reasons: counter machines have so limited an
instruction set that solving even trivial problems can take nonpolynomial computation
time. The full RAM model has the opposite problem: it can solve some problems faster
than is realistic on actual computers (Section 16.5 contains a more detailed discussion).
Conclusions 2 and 3 will be shown by following arcs in the cycle SRAM → TM → GOTO
→ SRAM: We will show how, for each arc L → M in the cycle, to construct, for an arbitrary
L-program p, an equivalent M-program q whose running time is polynomially bounded in
the running time (resp. space usage) of p. In essence we will traverse the central loop of
Figure 8.1, with one exception: a GOTO program will be directly simulated by an SRAM
program, bypassing the counter machines of Figure 8.1.
15.3
We deal with such questions as: what is the most efficient way to solve a given problem?
Such a question is quite difficult to answer because it quantifies over all possible correct
algorithms for the problem. Nevertheless we will establish lower bounds on needed resources (time or space) for some problems: proofs that any algorithm solving the problem
within a certain programming language must use at least a certain amount of computing
resources.
Establishing that problem A cannot be solved in time f (n) amounts to proving that
no matter how any program p is written, if p solves A then it must take more than f (n)
amount of time on some inputs of size n. Such results can only be proven in precisely
defined contexts, and even then are not at all easy to obtain.
On the other hand, there exist some problems that have no best algorithm: The
famous Blum speedup theorem (Chapter 20) says that there are problems such that for
any program p whatever that solves the problem, there is another program q also solving
the problem that is much faster than p on all but finitely many inputs.
In this book part we are primarily concerned with the following question. When do
added computational resources provably increase problem-solving ability? For instance,
is there a problem P solvable by no algorithm whatsoever that runs in time n² (where
n is the size of the input data), but which can be solved by at least one algorithm that
runs in time n³? We will see that the answer is yes.
A similar question: given time resource bound function f, are there problems solvable
in time b · f(n), but not in time a · f(n), for some constants a < b? (Here, again, n is the
size of the input data.) In other words, do constant time factors matter for problems
solvable in time O(f(n))? We will see that the answer is yes for the programming
language I.
As a special case, we prove that constant time factors are indeed important, even
within linear-time solvable problems; thus confirming in theory what one tends to think
from practical experience. Practice can, however, only establish positive results such as:
problem A can be solved in time f(n). Negative results are much harder, as it is clearly
inadequate to say "I tried to solve this problem in this way ..., but failed."
What problems are solvable in bounded time or space?
Our goal is to investigate the relative computing power of the above mentioned models
for solving problems, given bounds on programs running times or space usage. This
leads first to asking the question: what is a problem?
If a problem is to compute a function f (x) there is a risk of a trivial answer: given
more time, more problems can be solved simply because a larger result f (x) can be
written out if more time is available(!). Such answers give little real insight into the
relation between available resources and problem-solving power, so we restrict ourselves
to decision problems.
A decision problem is given by a subset A ⊆ L-data for some language L. The problem
to be solved is, when given an input data value x ∈ L-data, to answer the question: Is
x ∈ A?
In the remainder of the book, for simplicity of exposition, we will, unless explicitly
stated otherwise, assume L is an imperative language as described in Section 7.1, with
programs of form p = I1 ... Ik. Thus a computation is a linear sequence of states
p ⊢ s1 → s2 → ... → st. This naturally describes computations by all the languages seen
so far, except for the functional languages.
Nondeterminism
Many practically interesting but apparently intractable problems lie in the class nptime,
a superset of ptime including, loosely speaking, programs that can "guess" (a precise
definition will appear later). Such programs can solve many challenging search or optimization problems by a simple-minded technique of guessing a possible solution and
then verifying, within polynomial time, whether or not the guessed solution is in fact a
correct solution.
15.4
An extension of Cook's thesis would be to argue that the class of all computationally
tractable problems comprises exactly those that lie in ptime. This is a useful working
assumption in many circumstances, but should not be taken too literally.
Identification of ptime with the computationally tractable problems is less solidly
founded than the Church-Turing thesis, which concerns computability in a world of
unlimited resources. Reasons for a certain skepticism include two facts:
A problem may lie in ptime only by virtue of an algorithm whose polynomial time
bound, for instance n^100, is far too large for the algorithm to be useful in practice.
There exist algorithms that run in superpolynomial time bounds in the worst case,
but which work quite well in practice and with small constant factors. Examples:
The Simplex method for linear programming can take exponential time in
the worst case, but works very well in practice for finding optimal solutions
to systems of linear inequalities. In this interesting case, there exist alternative algorithms that are truly polynomially time-bounded (e.g. the ellipsoid
method), but all seem to have unacceptably large constant time factors for
practical use.
Type inference in the programming language SML [129] has been proven to
take exponential time in the worst case, regardless of the algorithm used, but
again works well in practice.
There are, as well, a number of arguments in favour of identifying ptime with tractability. While admittedly not a perfect fit, the class ptime has good closure properties, so
reasonable operations on problems in ptime or on programs running in polynomial time
do not take us outside the class. Further, the class has many alternative characterizations
and theorems, making it mathematically appealing to work with.
15.5
The constant speedup theorem, well known from Turing machine based complexity theory,
in essence states that any program running in superlinear time can be rewritten so as
to run faster by any preassigned constant factor. This counterintuitive result will be
proven false for a natural imperative programming language I that manipulates tree-structured data. This relieves a long-standing tension between general programming
practice, where linear factors are essential, and complexity theory, where linear time
changes are traditionally regarded as trivial.
Specifically, there is a constant b such that for any a ≥ 1 there is a set X recognizable
in time a · b · n but not in time a · n (where n is the size of the input). Thus the collection
of all sets recognizable in linear time by deterministic I-programs, contains an infinite
hierarchy ordered by constant coefficients. Constant hierarchies also exist for larger
increases from time bounds T(n) to T′(n), provided the bounds are time-constructible
in a natural sense.
15.6
time and space, respectively, bounded by polynomial functions of the problem's input
size. Classes nrdonly, nptime, npspace denote the problems decidable within the
same bounds, but by nondeterministic algorithms that are allowed to guess; and rec,
re are the recursive and recursively enumerable classes of decision problems already
studied in Chapter 5.
Invariance of the hierarchy with respect to problem representation
The significance of this hierarchy is that a great number of practically interesting problems (e.g. maze searching, graph coloring, timetabling, regular expression manipulation,
context-free grammar properties) can be precisely located at one or another stage in this
progression.
Its significance is notably enhanced by the fact that the placement of a problem
within the hierarchy is in general quite independent of the way the problem is described.
For example, a problem concerning graphs will appear in the same complexity class
regardless of whether the graphs are represented by connection matrices or by adjacency
lists. (There are a few exceptions to this rule involving degenerate problem instances,
for example extremely sparse graphs, but such exceptions only seem to confirm that the
rule holds in general.)
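The reason such changes of representation are harmless is that they can themselves be computed well within the resource bounds at stake. For instance, converting an n-vertex graph between its adjacency-matrix and adjacency-list representations takes on the order of n² steps, as the small Python sketch below (function names ours) illustrates.

# Converting between adjacency-matrix and adjacency-list representations of a
# directed graph on vertices 0..n-1.  Both directions run in time O(n^2), so
# membership in classes like ptime does not depend on the choice.
def matrix_to_lists(m):
    return [[j for j, bit in enumerate(row) if bit] for row in m]

def lists_to_matrix(adj):
    n = len(adj)
    return [[1 if j in set(adj[i]) else 0 for j in range(n)] for i in range(n)]

m = [[0, 1, 1],
     [0, 0, 1],
     [0, 0, 0]]
adj = matrix_to_lists(m)          # [[1, 2], [2], []]
print(adj, lists_to_matrix(adj) == m)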
A collection of open problems
A long-standing open problem is whether the backbone inclusions are proper. Many
researchers think that every inclusion is proper, but proofs have remained elusive. All
that is known for sure is that nrdonly ⊊ pspace, a very weak statement.
15.7
In spite of the many unresolved questions concerning proper containments in the backbone, a great many problems have been proven to be complete for the various classes.
If such a problem P is complete for class c, then it is "hardest" in the sense that if it lay
within the next smaller class (call it b, with b ⊆ c), then every problem in class c would
also be in class b, i.e. the hierarchy would "collapse" there, giving b = c. Complete
problems are known to exist and will be constructed for every class in the backbone
except for rdonly (since no smaller class is present) and rec (for more subtle reasons.)
15.8
The classes logspace and ptime have been traditionally defined by imposing space,
respectively time, bounds on Turing machines. We will give two intrinsic characterizations, free of any externally imposed bounds. In particular, we will see that logspace
is identical to the class rdonly of problems solvable by WHILE-programs that do not
use the cons operation; and that ptime is identical to the class rdonlyrec of problems
solvable by the same programming language, extended by recursion. We anticipate the
first result briefly as follows.
Read-only Turing machine programs whose work space is bounded by k log(|d|) for
some k and all d
Read-only counter programs such that each counter value is bounded by |d|, or a
polynomial in |d|
GOTO programs without cons, i.e. that use no additional space at all, beyond the
input d
Further, all problems in this class will be seen to lie in ptime (though whether the class
is a proper subset of ptime is still an open question).
References
More information about the scope and historical development of complexity theory may
be found in the surveys [15, 17, 29, 63, 147]. Broadly encompassing surveys of complete
problems may be found in the books by Garey and Johnson, and by Greenlaw, Hoover,
and Ruzzo [52, 56]. The approach taken in this book stems from article [78], and [88]
contains a preview of some of its results on complexity.
16
Parts I-III concerned only the limits of computability and completely ignored questions
of running time and space, except for the very informal treatment of time in Chapter
6. In the remainder of the book we mainly consider imperative languages, as described
in Section 7.1. Two tree-manipulating imperative languages will be the main focus: the
languages GOTO and I of Sections 7.2 and 4.2.
We will need to be much more precise about running time and space than before,
partly to be able to prove theorems concerning what can or cannot be done within various
resource bounds, and partly to justify that these results reflect facts about real-world
computations (at least in contexts where resource bounds may be expanded whenever
needed).
16.1
16.1.1
Some simplifications
For technical convenience we make some small changes in the machine or programming
models seen earlier, and precisely define program running times in the revised computation models. The main changes are the following. None affect the class of problems that
can be solved, though some problems' representations may be encoded. Their aim is to
provide better descriptions of computations within limited time or space resources (with
fairer cost assignments, or being technically more manageable).
All the computation models will consistently use a fixed input set, namely L-data
= {0,1}* or L-data = ID01, a subset of ID isomorphic to {0,1}*. This will make it
easier to compare various models, without having continually to invoke data coding
and decoding functions.
The input to an SRAM program is a read-only string a1 ... an ∈ {0,1}*. Initially all
registers are set to zero, except that register R0 will be initialized to n.
16.1.2
Recall Definition 6.1.1 of a timed programming language L. The simplest time measure is
the unit cost time measure, quite commonly used in complexity theory:
Definition 16.1.1 For an imperative language L, the function time^L : L-programs →
(L-data → IN) is defined as follows, for any p ∈ L-programs, d ∈ L-data:

time^L_p(d) =  t   if p ⊢ s1 → s2 → ... → st is p's (terminating) computation on input d
               ⊥   if p does not terminate on input d
16.2
Before continuing, there is a difference in data sets that must be reconciled: Turing
machines read bit strings, and counter machines read numbers, whereas our WHILE, GOTO
and other languages read binary trees. Function bin of Chapter 8 maps numbers into bit
strings, so all we need is a way to represent a bit string in {0,1}* as a binary tree in ID,
and vice versa.
Isomorphism of {0,1}* and a subset of ID
We regard 0, 1 as standing for standard encodings in ID of nil and (nil.nil), respectively. Clearly any ID-value in the set ID01 generated by the following grammar

D01 ::= nil | (nil . D01) | ((nil.nil) . D01)
can be regarded as a string from {0,1}*. Further, a string a1 a2 ... ak ∈ {0,1}* with ai ∈ {0,1}
can be regarded as an element of ID01 by the coding c : {0,1}* → ID01 defined as follows,
using the list notation of Section 2.1.5:

c(a1 a2 ... ak) = (a1 a2 ... ak) ∈ ID01
Treating all of ID
Our restriction to the subset ID01 of ID makes things simpler, but is by no means essential.
A coding of arbitrary ID elements is easy to define and work with, for example
cID : ID → {0,1}*, representing d ∈ ID by its Polish prefix form in {0,1}*. This is
obtained by traversing its tree structure in preorder, writing 0 every time nil is seen,
and 1 every time an internal cons node is seen.
The constructions seen below could be carried out using the full ID, at the expense
of some complications (see Exercise 16.3).
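The coding cID is easy to program. In the Python sketch below (our own conventions: None for nil, a pair (l, r) for a cons node), encode produces the Polish prefix form of a tree and decode inverts it.

# Polish prefix coding cID : ID -> {0,1}*: preorder traversal, writing 1 at each
# cons node and 0 at each nil.
def encode(d):
    return "0" if d is None else "1" + encode(d[0]) + encode(d[1])

def decode(s, i=0):
    """Return (tree, next position) for the code starting at position i."""
    if s[i] == "0":
        return None, i + 1
    left, i = decode(s, i + 1)
    right, i = decode(s, i)
    return (left, right), i

d = ((None, None), (None, None))          # the tree ((nil.nil).(nil.nil))
code = encode(d)
print(code, decode(code)[0] == d)         # 1100100 True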
16.3
16.3.1
Comparing languages
Definition 16.3.1 Suppose one is given two timed programming languages, L and M,
with L-data = M-data. Then by definition¹

1. L ≼ptime M if for every L-program p there is an M-program q such that [[p]]^L = [[q]]^M,
and a polynomial f(n) such that for all d ∈ L-data

time^M_q(d) ≤ f(time^L_p(d))

In words: M can simulate L, up to a polynomial difference in time.

¹ To avoid trivial exceptions, the requirements only apply to programs p and languages S such that
|d| ≤ time^S_p(d) for all data d. This is not unreasonable, since a program running in time less than this
would be unable to examine all of its input data.
16.3.2
16.4
Tree-manipulating programs
16.4.1
In Section 2.3 it was shown that any program using tree comparison operator =? could be
replaced by an equivalent one using only tests on nil. Consequently in the remainder of
this book we assume that tree-manipulating programs compare values only against
the atom nil, and that such a comparison has unit time cost. Remark: this avoids any
need to have the operation =?, since its effects can be achieved using if and while.
16.4.2
GOTO revisited
The language GOTO will henceforth have the following syntax (slightly restricted) and
semantics, and running times:
Definition 16.4.1 Let program p = I1 ... Im, and let Vars be a countable set of variables. We use the conventions d, e ∈ ID and X, Y, Z ∈ Vars. The informal syntax of GOTO
is given by the following grammar for instruction forms, where d ∈ ID:

I ::= X := d | X := Y | X := hd Y | X := tl Y
    | X := cons Y Z | if X goto ℓ else ℓ′
16.4.3
Conceptually, this is very simple: one counts one time unit for each operation or test
performed on data during execution. Technically, we use parts of the definitions of E etc.
from Section 2.2.
Definition 16.4.2 Given a store σ containing the values of the variables in an expression
E, the function T maps E and σ into the time T[[E]]σ ∈ IN taken to evaluate E. Function
T : Expression → (Storep → IN) is defined by:

T[[X]]σ        = 1
T[[d]]σ        = 1
T[[hd E]]σ     = 1 + T[[E]]σ
T[[tl E]]σ     = 1 + T[[E]]σ
T[[cons E F]]σ = 1 + T[[E]]σ + T[[F]]σ
□
Given a store σ, the relation C ⊢ σ →time t expresses the fact that t time units are
expended while executing the command C, beginning with store σ. (If command C does
not terminate in the given store σ, then there will be no t such that C ⊢ σ →time t.) By
definition C ⊢ σ →time t is the smallest relation satisfying:

X := E ⊢ σ →time t + 1               if T[[E]]σ = t

C; D ⊢ σ →time t + t′                if C ⊢ σ →time t, C ⊢ σ → σ′,
                                     and D ⊢ σ′ →time t′

while E do C ⊢ σ →time t + 1         if T[[E]]σ = t and E[[E]]σ = nil

while E do C ⊢ σ →time t + t′ + 1    if T[[E]]σ = t and E[[E]]σ ≠ nil and
                                     C; while E do C ⊢ σ →time t′
□
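Definition 16.4.2 and the rules above transcribe almost literally into a small evaluator. The Python sketch below is only an illustration; the tuple-based abstract syntax and the function names are our own choices, not the book's notation. run returns both the final store and the time charged, following the rules above.

# Unit-cost timing for a WHILE fragment.  Expressions: ("var", X), ("quote", d),
# ("hd", E), ("tl", E), ("cons", E, F).  Commands: ("assign", X, E),
# ("seq", C, D), ("while", E, C).  A store is a dict; None represents nil.
def E(e, s):                                   # value of an expression
    tag = e[0]
    if tag == "var":   return s[e[1]]
    if tag == "quote": return e[1]
    if tag == "hd":    return E(e[1], s)[0]
    if tag == "tl":    return E(e[1], s)[1]
    if tag == "cons":  return (E(e[1], s), E(e[2], s))

def T(e, s):                                   # time to evaluate an expression
    tag = e[0]
    if tag in ("var", "quote"): return 1
    if tag in ("hd", "tl"):     return 1 + T(e[1], s)
    return 1 + T(e[1], s) + T(e[2], s)         # cons

def run(c, s):                                 # returns (new store, time)
    tag = c[0]
    if tag == "assign":
        return {**s, c[1]: E(c[2], s)}, T(c[2], s) + 1
    if tag == "seq":
        s1, t1 = run(c[1], s); s2, t2 = run(c[2], s1)
        return s2, t1 + t2
    if tag == "while":
        t = T(c[1], s)
        if E(c[1], s) is None:                 # nil: exit the loop
            return s, t + 1
        s1, t1 = run(c[2], s)
        s2, t2 = run(c, s1)
        return s2, t + 1 + t1 + t2

prog = ("seq", ("assign", "X", ("tl", ("var", "X"))),
               ("assign", "X", ("tl", ("var", "X"))))
store, steps = run(prog, {"X": (None, (None, (None, None)))})
print(store, steps)                            # {'X': (None, None)} 6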
16.5
Since all our programs are imperative, the most natural cost to assign to a computation
is the sum of the costs of its individual steps. The unit cost per operation model will
consistently be used unless other measures are specified. Thus Turing machines and the
GOTO language use unit cost, whereas WHILE and I use time as specified by Definition
16.4.2.
One-step instruction times for random access machines can be defined in more than
one way, and some are closer to daily computational practice than others.
Time measures on the counter machine CM do not give much insight. The problem
is that the CM instructions are too weak to solve interesting problems within reasonable
time, since in any one instruction, a counter may change in value by at most 1. We
will see, however, that a reasonable measure of computation space can be defined for a
counter machine.
The full RAM model has somewhat the opposite problem under the unit-cost model, if
memory cells are unlimited in value: its instruction set is typically too strong to yield
a reasonable time measure. The problem is one of data value size: if instructions such
as X:=Y+Z are allowed, executing X:=X+X k times will multiply X's value by 2^k; and an
instruction X:=X*X (allowed on many RAM models) can, if repeated, construct extremely
large values within short time.
A symptom of this problem is that some problems known to be NP-complete (presented later in this book) can be solved in polynomially many steps on the unlimited RAM
model [159]. One solution to this problem is to use a nonuniform cost measure, in effect
charging instructions according to how large the values are that they manipulate. This
leads to the logarithmic cost model discussed below.
Another solution, which we will use, is to limit the RAM model to be a successor RAM or SRAM, with indirect addressing to load and store data, but only with data computation instructions X:=Y+1 or X:=Y-1. We will see that this yields the same class ptime under unit time costing as Turing machines and other models. Further, it is essentially equivalent to impure Lisp, meaning Lisp with instructions to change already existing cells via operations such as SETCAR! or RPLACA. Another equivalent formulation is Schönhage's storage modification machine [160].
16.5.1
There is some controversy about what a fair charge should be for instruction times on a RAM, for at least two reasons. First, the model is close enough to actual machine hardware instruction sets to relate its computation times to those we deal with in practice (unlike, for example, the counter machine). Second, the model allows arbitrarily large natural numbers to be stored in its registers or memory cells, a feature in conflict with the first.
It is not easy to get around allowing arbitrarily large values in memory cells, since
if one assumes all cells are finite then the machine becomes a kind of finite automaton.
While interesting in themselves and useful for many purposes (e.g. lexical analysis in
a compiler), finite automata are not Turing complete, and daily Computer Science algorithms become quite unnatural when truncated to fit within finitely bounded word
sizes.
We here have a paradoxical situation: that the most natural model of daily computing
on computers, which we know to be finite, is by means of an infinite (i.e. potentially
unbounded) computation model. This question can be discussed at great length, which
we will not do here. One element in such a discussion, though, would surely be the
fact that we carefully design and build our computers to provide a faithful model of
a mathematical world, e.g. great attention is paid to ensure that an ADD instruction
behaves as closely as possible to the idealized mathematical addition function, as long as
overflow does not occur.
Consequently it would seem unnatural not to model our descriptions of computer
capacities on mathematical idealizations, at least until one exceeds limits due to word
size, run time cost, or memory capacity. It is also relevant that today's computers are
extremely fast and have very large memories, so such limitations are not encountered as
often as in the earlier days of our field.
Back to the point of assigning fair costs to the RAM model. Hardware factors relevant
to fair costing can include:
1. Should the size of the data being manipulated be considered?
One view: one data item fits into one machine word, which takes a constant time
to fetch or store.
Another view: very large data values take longer to manipulate, and this should
be accounted for in the instruction cost.
2. Should program-dependent factors be included?
A basic example is the address (index) of an explicitly named program variable.
Some other examples follow.
3. Should the time to locate the current instruction be included?
4. What effect does instruction pipelining have on times? Should a linear sequence of
instructions be charged less time than code with control transfers?
5. What about page faults, or data or instruction cache misses? These involve distinctions between data in local memory, e.g. on the chip with the CPU, and memory
held in a global store.
6. Computer circuits exist in three-dimensional space, so time O(n^{1/3}) is surely a lower bound in the limit for the time to access data at address n. (Actually time
16.5.2
Xi := Xi + 1
Xi := Xi - 1
Xi := 0
if Xi = 0 goto ℓ else ℓ′
Xi := Xj
Xi := <Xj>
<Xi> := Xj
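A minimal Python sketch of a unit-cost SRAM interpreter over these instruction forms may help fix ideas (a rendering of ours; the instruction encoding and function name are assumptions, not the book's definitions):

# Toy unit-cost SRAM interpreter (illustrative).  Memory maps cell numbers to
# natural numbers; a program is a list of instructions addressed by position.
#   ('succ', i)          Xi := Xi + 1
#   ('pred', i)          Xi := Xi - 1  (natural subtraction)
#   ('zero', i)          Xi := 0
#   ('if0', i, l1, l2)   if Xi = 0 goto l1 else l2
#   ('copy', i, j)       Xi := Xj
#   ('load', i, j)       Xi := <Xj>
#   ('store', i, j)      <Xi> := Xj
def run_sram(program, memory=None):
    mem, pc, steps = dict(memory or {}), 0, 0
    while 0 <= pc < len(program):
        op, steps = program[pc], steps + 1        # unit cost per instruction
        kind = op[0]
        if kind == 'succ':    mem[op[1]] = mem.get(op[1], 0) + 1
        elif kind == 'pred':  mem[op[1]] = max(0, mem.get(op[1], 0) - 1)
        elif kind == 'zero':  mem[op[1]] = 0
        elif kind == 'copy':  mem[op[1]] = mem.get(op[2], 0)
        elif kind == 'load':  mem[op[1]] = mem.get(mem.get(op[2], 0), 0)
        elif kind == 'store': mem[mem.get(op[1], 0)] = mem.get(op[2], 0)
        elif kind == 'if0':
            pc = op[2] if mem.get(op[1], 0) == 0 else op[3]
            continue
        pc += 1
    return mem, steps

mem, steps = run_sram([('succ', 1), ('succ', 1), ('succ', 1)])
assert mem[1] == 3 and steps == 3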
Which time measure is more realistic? We will see that, when discussing polynomial time bounds and the class ptime, it makes little difference which time measure is chosen. However, these factors become highly relevant if we discuss linear-time computability, or the effect of increasing both data and storage size toward infinity, or both.
The assumption that both data and storage size can grow toward infinity implies a
computing model where more and more hardware or circuitry is needed. It thus models
one aspect of distributed computing, e.g. situations involving very large data bases, but
not daily practice within a single stored-program computer.
In spite of the argument that one should charge time proportional to the address
length for access to a memory cell, or a dag or graph node, this is not the way people
think or count time when they program. Memories are now quite large and quite cheap
per byte, so most programmers need to take little account of the time to access memory
in external data storage.
Further, computer memories are carefully designed to make pointer access essentially
a constant time operation, so users rarely need to be conscious of address length in order
to make a program run fast enough. In practice computer hardware is fixed: word sizes
or memory capacities cannot practically be increased on demand.
An analogy is with arithmetic: even though the computer certainly cannot deal with arbitrary integers, it is carefully designed to model operations on them faithfully as long as they do not exceed, say, 32 bits. Given this fact, programmers have the freedom to assume that the computer faithfully realizes the world of arithmetical calculations, thinking about their problems and ignoring the computer's actual architecture unless boundary cases arise.
Exercises
16.1 Find a program-independent bound on the slowdown of the translation in Proposition ??.
2
16.2 Find a program-dependent bound on the slowdown of the translation in Proposition 3.7.4 as a function of p.
2
16.3 The purpose of this exercise is to show how to modify the coding between bit strings in {0,1}* and binary trees d ∈ ID01 of Section 16.2 so as to include all of ID. Coding cID represents d ∈ ID by its Polish prefix form. This is obtained by doing a preorder traversal of its tree structure, writing 0 every time nil is seen, and 1 every time an internal cons node is seen.
Formally it is defined by cID(nil) = 0, cID((d1.d2)) = 1 cID(d1) cID(d2). Figure 16.1 shows an example.
[Figure 16.1: an example tree and its Polish prefix coding; diagram not reproduced.]
The lemma gives a simple algorithm to determine whether a bit string corresponds to a tree: initialize a counter to 1, and scan the bit string from the left. Add 1 every time a 1 is seen and subtract 1 whenever a 0 is seen. The string represents a tree if and only if the counter never reaches zero before the end of the scan, and equals 0 exactly when the scan is finished.
Hint: Part 3 can be used for part 2.
2
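For readers who want to experiment, the coding cID and the counter test of Exercise 16.3 can be sketched in Python as follows (our rendering, with trees as nested pairs and nil as None):

# Polish prefix coding of trees over the single atom nil, and the counter check.
def c_id(tree):
    if tree is None:
        return '0'                               # nil      |->  0
    left, right = tree
    return '1' + c_id(left) + c_id(right)        # (d1.d2)  |->  1 c(d1) c(d2)

def is_tree_code(bits):
    # Start the counter at 1; add 1 on '1', subtract 1 on '0'.  The string codes
    # a tree iff the counter stays positive until the last symbol and ends at 0.
    counter = 1
    for i, b in enumerate(bits):
        counter += 1 if b == '1' else -1
        if counter <= 0 and i != len(bits) - 1:
            return False
    return counter == 0

t = ((None, None), (None, None))                 # ((nil.nil).(nil.nil))
assert c_id(t) == '1100100'
assert is_tree_code(c_id(t)) and not is_tree_code('010')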
References
The random access machine was introduced by Shepherdson and Sturgis in 1963 [163].
Discussions of the delicate issue of what is a fair time cost measure are found in the book by Aho, Hopcroft and Ullman [2], and in articles by Schönhage and by Jones [159, 78].
17
To deserve its name, complexity theory must concern realistic models of program behaviour. In this (admittedly low-level) chapter we examine several basic assumptions,
hoping that the discussion will give greater faith that our complexity models faithfully
capture intuitive complexity concepts.
As in the preceding chapter, nil will be the only atom used in any construction or definition henceforth, even though for the sake of readability we may use other atomic abbreviations in examples, for instance 0 and 1 as alternate ways to write nil and (nil.nil). Extension to multiple atoms is straightforward but more complex.
17.1.1
We have assumed every elementary operation cons, hd, etc. as well as every conditional
to take one time unit in GOTO, and similar costs appear in WHILE. These costs may
seem illogical and even unreasonable since, for example, command X := cons Y Y binds
to X a tree with more than twice as many nodes as that bound to Y.
In fact, it is reasonable to assign constant time to a cons operation and the others using the data-sharing implementation techniques common to Lisp and functional
programming languages. In this section we give such a semantics for GOTO.
The first subsection introduces a certain form of graphs. The second subsection
reveals the connection between these graphs and elements of ID. The third subsection uses
the graphs to state the new semantics and the fourth subsection proves the correctness
of the new GOTO semantics with respect to the standard GOTO-semantics. The last
subsection sketches a Pascal-like implementation of the semantics, which will be used in
later chapters.
Definition 17.1.1
1. A DAG is a directed acyclic graph.
2. A data-storage graph (DSG for short) is a DAG with the following properties:
261
(a) Every node has either no out-edges or two out-edges. A node with no out-edges is called an atom-node, and a node with two out-edges is called a cons-node.
(b) For every cons-node, one out-edge has label l and the other has label r. The
node pointed to by the edge with label l is called the left child of the cons-node,
and the node pointed to by the edge with label r is called its right child.
(c) There is only one atom-node, named node 0, to represent atom nil.
3. A rooted DSG is a DSG together with a designated node chosen as the root. A
DSG may have nodes that are unreachable from its root.
4. Suppose δ is a DSG with two nodes n1, n2, and let n be a fresh node not already in δ. Then add(δ, n1, n2, n) is the DSG obtained by adding the node n to δ, adding an edge from n to n1 labelled l, and adding an edge from n to n2 labelled r. For instance, the DSG on the right of Figure 17.1 could arise from the one to its left by an add(δ, n1, n2, n) operation.
2
Figure 17.1 shows two example DSGs; consider the leftmost one. (It represents
((nil.nil).(nil.nil)), which can also be written as (1 0) or even (1.1).) For
simplicity, the labels l and r have not been written; instead the same information is
represented by the physical horizontal relationship between the edges on paper. There is
in reality only one node labeled nil, but we have duplicated it to make it easier to read
the diagrams.
Figure 17.1: Two example DSGs (diagrams not reproduced).
17.1.2
In connection with the DAG semantics we shall view elements of ID as DSGs, and conversely. To view a DSG δ as an element of ID we unfold it from a given node n to give the value unf(δ, n) ∈ ID.
Definition 17.1.2 Given a d ∈ ID, define a DSG δ = dag(d, n) with root node n as follows.
1. If d is the atom nil then δ consists of the DAG with one node, named 0.
2. If d = (d1.d2) then δ has a root n which has edges to n1 and n2, where n1 and n2 are the roots of the DSGs for d1 and d2, respectively.
Definition 17.1.3 Given a DSG δ and a node n in δ, define d = unf(δ, n) ∈ ID as follows:

unf(δ, n) = nil        if n is the atom-node 0
unf(δ, n) = (d1.d2)    if n is a cons-node with left child n1 and right child n2, where d1 = unf(δ, n1) and d2 = unf(δ, n2)
For example, let δ be the leftmost DSG in Figure 17.1, and n its topmost node. Then unf(δ, n) = ((nil.nil).(nil.nil)) = (1.(0.nil)) = (1 0).
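A small Python sketch of the two directions of this correspondence (ours, not the book's; nodes are integers, node 0 is the shared nil node, and a DSG is a dictionary from cons-nodes to their child pairs):

# dag builds a DSG from a tree in ID; unf unfolds a node back into a tree.
def dag(d, dsg, next_node):
    if d is None:                                # the atom nil is the shared node 0
        return dsg, 0, next_node
    dsg, left, next_node = dag(d[0], dsg, next_node)
    dsg, right, next_node = dag(d[1], dsg, next_node)
    n = next_node                                # add(dsg, left, right, n), n fresh
    dsg[n] = (left, right)
    return dsg, n, next_node + 1

def unf(dsg, n):
    if n == 0:
        return None
    left, right = dsg[n]
    return (unf(dsg, left), unf(dsg, right))

tree = ((None, None), (None, None))              # ((nil.nil).(nil.nil)) = (1 0)
dsg, root, _ = dag(tree, {}, 1)
assert unf(dsg, root) == tree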
During execution in GOTO, the DAG semantics builds a DSG δ. Rather than binding a variable X to a value d ∈ ID, the execution binds X to a node in the DSG. To do this, it will use an environment ρ : Vars(p) → DagNodes. Instead of binding variable X to the atom nil, X will now be bound to atom-node 0, and instead of binding variable X to a pair (d1.d2), the new semantics will bind X to a cons-node. For an example, consider the reverse program seen in Section 7.2.
0: read X;
1: Y := nil;
2: if X then goto 4;
3: goto 8;
4: Z := hd X;
5: Y := cons Z Y;
6: X := tl X;
7: goto 2;
8: write Y
Consider this program, applied to input (1 0). The left part of Figure 17.2 illustrates the DSG at the start: X is bound to the DAG structure for (1 0) = ((nil.nil) nil), while Y and Z point to nil. At the end of execution two more nodes have been allocated, and Y points to the node denoting the result (0 1), the reverse of input (1 0).
Figure 17.2: First and last DSG in execution of reverse (diagrams not reproduced). In the final DSG, Y points to a node that unfolds to (0 1) = (nil (nil.nil)), while X and Z point to the nil node.
DAG semantics
In general the DAG semantics is as follows.
Definition 17.1.4 (DAG semantics for GOTO). Let program p = I1; ...; Im with both input and output via variable X, and let Vars(p) = {X, Z1, ..., Zn} be the set of all variables in p.

1. A store for p is a pair (δ, ρ) where δ is a DSG and ρ is a mapping from Vars(p) to nodes of δ. A state for p is a pair (ℓ, σ) where 1 ≤ ℓ ≤ m + 1 and σ is a store for p.

2. The initial store σ0_p(d) for p with input d is (δ0, ρ0) where δ0 = dag(d, l) and ρ0 = [X ↦ l, Z1 ↦ 0, . . . , Zn ↦ 0].

3. The rules for the DAG semantics of GOTO appear in Figure 17.3. We define [[p]]DAG(d) = e iff (1, (δ0, ρ0)) → . . . → (m + 1, (δ, ρ)) and unf(δ, ρ(Y)) = e.

4. We define the running time of p on input d by: time^DAG_p(d) = t iff the sequence (1, (δ0, ρ0)) → . . . → (m + 1, (δ, ρ)) comprises t + 1 states.
s(δ, ρ, a)        = (δ, 0)
s(δ, ρ, Y)        = (δ, ρ(Y))
s(δ, ρ, cons Z Y) = (add(δ, ρ(Z), ρ(Y), n), n)   where n is fresh
s(δ, ρ, hd Y)     = (δ, n)   if ρ(Y) is a cons-node with left child n, else (δ, 0) where 0 is δ's nil node
s(δ, ρ, tl Y)     = (δ, n)   if ρ(Y) is a cons-node with right child n, else (δ, 0) where 0 is δ's nil node
17.1.3
[[p]]DAG(d) = [[p]]GOTO(d)

Proof. First prove that the initial stores (δ0, ρ0) and σ0 of the DAG and standard semantics agree (use the property that unf(dag(d, n), n) = d). Then prove that if (δ, ρ) and σ agree, then

(ℓ, σ) → (ℓ′, σ′) for some σ′    iff    (ℓ, (δ, ρ)) → (ℓ′, (δ′, ρ′)) for some (δ′, ρ′)

It follows that either

1. neither the standard nor the DAG semantics ever arrives at label m + 1; or
2. both the standard and the DAG semantics arrive at label m + 1 in t steps, and the final states (m + 1, (δ, ρ)) and (m + 1, σ) agree. If so, then unf(δ, ρ(Y)) = σ(Y), so the final results in the two semantics are the same.
2
17.2
We now give a Pascal-like implementation, using arrays, of the DAG semantics of the flow chart language GOTO. This will be used for several purposes:
To justify the unit-cost timing used for GOTO programs, or that assigned in Section
16.4.3 to WHILE programs.
To prove that the problems solvable by functional F programs without cons are
exactly those solvable in polynomial time, in Section 24.2.
To prove that boolean program nontriviality and Horn clause satisfiability are
complete for ptime, meaning that they are in a sense most difficult among
all problems solvable in polynomial time (Chapter 26).
17.2.1
The first is now the main goal: to make it evident that each operation takes time bounded
by a constant. As usual we assume there is only one atom, nil (the technique is easily
extendible to any fixed finite set of atoms). The implementation technique is easier to
explain for an input-free program, so we begin assuming no input, and then explain how
to account for initialization for input data.
Given a GOTO program p = I1; ...; Im with output through variable X, let {X, Z1, ..., Zk} be the set of variables in p. Construct a Pascal-like simulating program as follows:
The idea is that the two parallel arrays Hd, Tl hold all pointers to hd and tl substructures. Variables assume only node pointers as values in this implementation. A variable X has value 0 if it is bound to nil, and otherwise points to a position in the Hd and Tl arrays. This position contains pointers to the first and second components of X's value. For simplicity we handle allocation by using variable Time to find an unused index in these arrays1. Command I′ℓ, which simulates command Iℓ for 1 ≤ ℓ ≤ m + 1, is defined in Figure 17.4. Note that each of the simulation sequences takes constant time, under the usual assumptions about Pascal program execution.
Instruction Iℓ                 Simulating instruction I′ℓ
Z := nil                       Z := 0; Time := Time + 1
Z := V                         Z := V; Time := Time + 1
Z := hd V                      Z := Hd[V]; Time := Time + 1
Z := tl V                      Z := Tl[V]; Time := Time + 1
Z := cons V W                  Hd[Time] := V; Tl[Time] := W;
                               Z := Time; Time := Time + 1
if Z = nil goto r else s       if Z = 0 then goto r else s
1 A more economical implementation could maintain a free list of unused memory cells.
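The same scheme can be rendered in Python roughly as follows (a sketch under our own naming, with Time advanced only on cons rather than on every simulated instruction; the Pascal-like code of Figure 17.4 is the book's actual construction):

# Hd/Tl array implementation of GOTO data (illustrative sketch).
# A variable holds 0 (nil) or an index into the parallel arrays Hd and Tl.
class GotoStore:
    def __init__(self, capacity=10000):
        self.Hd = [0] * capacity
        self.Tl = [0] * capacity
        self.Time = 1                            # cell 0 is reserved for nil

    def cons(self, v, w):                        # Z := cons V W, constant time
        n = self.Time
        self.Hd[n], self.Tl[n] = v, w
        self.Time += 1
        return n

    def hd(self, v):                             # Z := hd V; hd nil = nil since Hd[0] = 0
        return self.Hd[v]

    def tl(self, v):                             # Z := tl V
        return self.Tl[v]

store = GotoStore()
one = store.cons(0, 0)                           # (nil.nil)
x = store.cons(one, store.cons(0, 0))            # the list (1 0) = ((nil.nil).(nil.nil))
assert store.hd(x) == one and store.hd(store.tl(x)) == 0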
17.2.2 Data initialization
Suppose now that program p has input d = (d1 d2 . . . dn) ∈ ID. This data has to be stored into the Pascal data structures Hd, Tl. One way to describe this is to assume that variable X has been initialized by the following sequence of instructions, inserted at the start of p, where Zero indicates the always-present cell 0:

One := cons Zero Zero; X := Zero; Initn; ...; Init1;

where for 1 ≤ i ≤ n, Initi is X := cons Zero X if di = 0, and X := cons One X otherwise.
This adds n + 2 instructions and so has the effect of incrementing every instruction label in p by n + 3, so the simulation should now implement GOTO code if Z = nil goto r else s in p by Pascal-like code if Z = 0 then goto r+n+3 else s+n+3.
The following indicates the initial DAG built this way for input d = (1 0), coded as
((nil.nil) nil).
[Initial DAG for input (1 0) = ((nil.nil) nil); diagram not reproduced.]
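In the same spirit, a rough Python analogue of this initialization (our own sketch, not the Pascal-like readin of Exercise 17.2, and with a slightly different convention for how far Time advances):

# Fill parallel arrays Hd, Tl with the compressed form of input (d1 ... dn),
# di in {0,1}, and return the index playing the role of X.
def read_in(bits, capacity=100):
    Hd, Tl = [0] * capacity, [0] * capacity
    time = 1
    Hd[time], Tl[time] = 0, 0                    # One := cons Zero Zero
    one, time = time, time + 1
    x = 0                                        # X := Zero
    for d in reversed(bits):                     # Init_n; ...; Init_1
        Hd[time], Tl[time] = (one if d == 1 else 0), x
        x, time = time, time + 1
    return Hd, Tl, x, time

Hd, Tl, x, time = read_in([1, 0])                # input (1 0), coded ((nil.nil) nil)
assert Hd[x] == 1 and Hd[Tl[x]] == 0 and time == 4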
An alternative approach. Some readers may object to the approach of building the input
into the Pascal-like simulating program. While we will find this convenient later, there
is a simple alternative: Just replace the line
X := Zero; Z1 := 0;... Zn := 0;
above by
readin; Z1 := 0;... Zn := 0;
where procedure readin reads d = (d1 d2 . . . dn ) and initializes Hd, Tl and sets Time to
n+3 (all just as the initialization sequence above would do). This is Exercise 17.2.
Trace of an example simulation. Consider the reverse program seen before, and assume that it is given input X = (1 0), coded as ((nil.nil) nil), which is represented in the Hd, Tl table positions 0 through 4. This would give rise to the sequence of memory images in Figure 17.5, where Time_t is the value of Time, Instr_t the instruction being simulated, Hd_t and Tl_t the array entries written (if any), and X_t, Y_t, Z_t the values of the simulated variables, all at time t.
This models the right part of Figure 17.1, except that all of nil, a and b are represented
by cell number 0.
[Figure 17.5: the sequence of memory images for t = 0, . . . , 18, showing the data initialization at times t = 1, . . . , n+2 followed by the simulation of instructions 1, 2, 4, 5, 6, 7, 2, 4, 5, 6, 7, 2, 3, 8 of reverse, together with the array entries and the values of X, Y and Z at each step; table not reproduced.]
Memory reuse
Practical implementations of programs manipulating tree-structured data re-use memory
cells, in contrast to the method above which allocates a new cell every time the clock
ticks. This is often done by organizing all free cells into a single linked list called the
freelist. A cons operation can be implemented by detaching a cell from the freelist, and
assigning its two fields. When memory is exhausted (assuming it is finite, unlike in the
model above), a garbage collection phase ensues, in which cells that have no pointers
to them are located and collected together into a new freelist (assuming there are any
unused cells, else execution aborts). Describing such methods in more detail is beyond
the scope of this book.
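As a schematic illustration only (the book deliberately stops short of these details), a freelist-based allocator with a naive mark-and-rebuild collection might look like this in Python:

# Memory reuse via a freelist of cons cells (illustrative sketch).
class CellHeap:
    def __init__(self, capacity):
        self.Hd = [0] * capacity
        self.Tl = [0] * capacity
        self.free = list(range(capacity - 1, 0, -1))   # every cell except nil (0)

    def cons(self, v, w, roots):
        if not self.free:
            self.collect(roots)                  # garbage collect when exhausted
        if not self.free:
            raise MemoryError('heap exhausted')  # no unused cells: execution aborts
        n = self.free.pop()                      # detach a cell from the freelist
        self.Hd[n], self.Tl[n] = v, w
        return n

    def collect(self, roots):
        reachable, stack = {0}, list(roots)
        while stack:                             # mark: follow Hd/Tl pointers from roots
            n = stack.pop()
            if n not in reachable:
                reachable.add(n)
                stack.extend([self.Hd[n], self.Tl[n]])
        self.free = [n for n in range(1, len(self.Hd)) if n not in reachable]

heap = CellHeap(3)
a = heap.cons(0, 0, roots=[])
b = heap.cons(0, 0, roots=[])
c = heap.cons(0, 0, roots=[])                    # triggers collection; a and b are dead
assert c != 0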
Exercises
17.1 Write a Pascal-like program writeout(index). Its effect should be to print out
the value in ID denoted by position index in the Hd and Tl arrays.
2
17.2 Write a Pascal-like program readin. Its input should be a list (a1 . . . an) ∈ ID01. Its effect should be to initialise the Hd and Tl arrays so that cell n + 2 denotes the value (a1 . . . an).
2
References
The implementation ideas sketched in this chapter stem from McCarthy's original work on Lisp [124]. A more pedagogical treatment can be found in Henderson's book [67].
Relevant ideas are also discussed in [78, 159].
18
Robustness of Time-bounded Computation
In Chapter 8 the term robust had a precise meaning: that the classes of problems
decidable by a wide range of computation models are invariant, aside from inessential data
encodings. Computing in a resource-limited context leads to a new aspect of robustness.
Ideally, resource-bounded problem solvability should be:
1. invariant with respect to choice of machine model;
2. invariant with respect to size and kind of resource bound (e.g. quadratic time,
polynomial space, etc.); and
3. invariant with respect to problem representation (e.g. the choice to represent a
directed graph by an incidence matrix or by adjacency lists should not make a
complexity difference).
In this chapter we will affirm the first two points for polynomial time bounds, and leave
the last to Chapter 25. As before we are only interested in decision problems expressible
by a yes-no answer, and not in computation of functions.
18.1
We begin by defining a resource-bounded program class to be the set of programs that run
within a given resource bound. Next, we define the sets of problems solvable by programs
running within these classes; for instance the well-known class ptime is defined below to
be exactly the set of problems solvable by programs in WHILEptime .
Consequent to the discussion of Section 16.2, we assume L-data = {0,1}* for every language L. Recall that |d| is the size of a data value d: the number of symbols in it if d is a string in {0,1}*, and the number of leaves if d is a tree in ID.
Definition 18.1.1 Given a programming language L and a total function f : IN → IN, we define three sets of time-bounded programs:

Ltime(f(n)) = {p ∈ L-programs | time^L_p(d) ≤ f(|d|) for all d ∈ L-data}

Lptime = ∪ { Ltime(λn. p(n)) | p a polynomial }

Llintime = ∪_{k≥0} Ltime(λn. k·n)
The corresponding classes of decision problems solvable within limited time are easy to define.

Definition 18.1.2 Given a programming language L and a total function f : IN → IN:

1. The class of problems L-decidable in time f is:
   timeL(f) = {A ⊆ {0,1}* | A is decided by some p ∈ Ltime(f(n))}

2. The class ptime of problems L-decidable in polynomial time is:
   ptimeL = {A ⊆ {0,1}* | A is decided by some p ∈ Lptime}

3. The class lintime of problems L-decidable in linear time is:
   lintimeL = {A ⊆ {0,1}* | A is decided by some p ∈ Llintime}
18.2
Recall the several simulations and constructions from Chapter 8. We now do time analyses of some of them, and give another construction. Recall from Definition 16.3.1 that the notation lintime-pg-ind is used for linear-time simulation overhead with a program-independent constant factor a.
18.2.1
18.2.2
The running time of the Pascal-like program from the construction of Figure 17.4
is clearly at most linearly slower (by a program-independent constant factor) than the
GOTO program from which it was obtained. The construction can be refined to yield an
equivalent SRAM program, as in Exercise 18.6.
Given these representations, each GOTO operation is conceptually realized by a simple
operation on the DAG, realizable by a nonlooping sequence of SRAM operations. Further,
the running time of the SRAM program is shown in Exercise 18.6 to be slower than the
GOTO program from which it was obtained by at most a program-independent linear
factor.
2
18.2.3 Compiling SRAM to TM
I ::= Xi := Xi + 1 | Xi := Xi - 1 | if Xi=0 goto ℓ else ℓ′
    | Xi := Xj | Xi := <Xj> | <Xi> := Xj
tapes. Thus the t-th computation step of SRAM-program p is simulated by at most O(5·t·log t), i.e. at most a·t·log t Turing machine steps for a suitable constant a and all t.
Now, a total time analysis: let u = time^SRAM_p(d). No step of p takes more than a·u·log u Turing machine steps to simulate, so the entire u-step simulation takes time at most a·u²·log u Turing machine steps. This yields time^TM_q(d) = O(u² log u) where u = time^SRAM_p(d). This is polynomially bounded, as required.
2
18.2.4
18.3 Linear time
Some of the following results concern programs in the functional language F, leading
to the need to define its time usage function timeFp (d). Informally, this is just another
unit-cost measure, counting 1 for every operation, test or function call.
18.3.1
Consider program p = E0 whererec f(X) = B. The following uses the F semantic function E : Expression → Expression → ID ⇀ ID as defined in Figure 9.1. Given a value v of the variable X in an expression E, the function T maps E and v into the time T[[E]]v ∈ IN taken to evaluate E. Further, function P maps p and d into the time P[[p]]d ∈ IN taken to run p on d, i.e. P[[p]]d = time^F_p(d).
Definition 18.3.1 The functions T : Expression → Expression → ID ⇀ IN and P : F-program → ID ⇀ IN are defined by:

P[[E0 whererec f(x) = B]]d   = T[[E0]]B d

T[[X]]B v                    = 1
T[[d]]B v                    = 1
T[[hd E]]B v                 = 1 + T[[E]]B v
T[[tl E]]B v                 = 1 + T[[E]]B v
T[[cons E F]]B v             = 1 + T[[E]]B v + T[[F]]B v
T[[if E then E1 else E2]]B v = 1 + T[[E]]B v + T[[E1]]B v,  if E[[E]]B v ≠ nil
T[[if E then E1 else E2]]B v = 1 + T[[E]]B v + T[[E2]]B v,  if E[[E]]B v = nil
T[[f(E)]]B v                 = 1 + T[[E]]B v + T[[B]]B (E[[E]]B v)

2
18.3.2
Lemma 18.3.2 There exist two programs intIF and intFI and constants c, d such that for any p ∈ I-programs, q ∈ F-programs and all d ∈ ID:

time^F_intIF(p.d) ≤ c · time^I_p(d)   and   time^I_intFI(q.d) ≤ d · time^F_q(d)
Proof. This follows partly from the compilations of Propositions 8.2.1 and 8.2.2. In each case, the translated program q runs slower than the original p by at most a constant factor. For example, in going from WHILE to GOTO by Proposition 8.2.1, time^GOTO_q(d) ≤ a · time^WHILE_p(d) for some a and all d. The remainder follows from Lemma 18.3.2.
2
Theorem 18.3.3 states a form of robustness within linear-time decidable problems: the class lintimeL is stable across the cluster of programming languages L manipulating trees in ID that we have studied so far.
Robustness of the concept of linear time
The question of just which problems can be solved in linear time has aroused some controversy and many differences of opinion, as it depends critically on the exact computation model used (i.e. it is not as robust as the class ptime). One might hope that Theorem 18.3.3 could be extended, for example to
lintimeTM = lintimeGOTO = lintimeSRAM
but this seems false: the class of problems solvable in linear time is nonrobust since it
appears to be different for various models. In particular, the multitape Turing machine
model is unnatural for linear time, and seems unable to solve as many problems in linear
time as the SRAM.
18.4
The following material is included for historical interest, but is not central to our development. It should probably be skipped on first reading.
In the classical Turing machine model (described in Section 7.6), one-step transitions
are defined to cost one time unit each. The definition is unrealistic, as it ignores two
important program-dependent parameters: the number of tapes k, and the size of the tape alphabet Σ. The assumption that these can be chosen arbitrarily large is also questionable in view of Alan Turing's analysis of computation, cf. Exercise 1.1.
In this section we show that not accounting for these factors1 implies the well-known Turing machine constant speedup theorem. It in essence asserts that for any classical Turing machine running in superlinear time, there is an equivalent one that runs faster by any desired constant factor. The central idea in the proof is to replace the tape alphabet Σ by another alphabet Σ^m for a possibly large constant m.
There is some controversy as to the interpretation of the speed-up theorem and its proof. Papadimitriou [138] claims that advances in hardware make constants meaningless, since the proof shows that increasing the word size of the computer decreases the running time by a constant factor. Saying that a program runs in 2·n² time does not make sense, because while this may be true of today's computer, the program may run in n² time on the computer of tomorrow. Instead, one should simply say that the program runs in O(n²) time, thus abstracting from the constant factor.
This, however, does not account for the fact that constant factors may make a difference when considering programs that run on the same computer, i.e., when the word size is fixed. Indeed, claiming that every superlinear program's running time can be cut in half clearly contradicts daily programming experience. Moreover, a sign of a mismatch of theory with practice is seen in the proof which, in practical terms, amounts to increasing the word size. Intuitively speaking, the speedup is obtained by a change of hardware, unrealistic from a programming perspective. In any case, the physical realizability of this trick is dubious.
Further, it is not at all clear that the technique could be adapted to more familiar
machine architectures, even if one assumed that hardware could be increased in size
upon demand. The constant speedup theorem is in fact false for the I and F languages:
Theorem 19.3.1 shows that increased constant factors give a provable increase in decision
power for linear time bounds, and Theorem 19.5.3 does the same for a broad class of
so-called constructible time bounds. A consequence is that the classical Turing machine
computation model is provably different from I and F for problems solvable in linear and
many other time bounds. One view of this is that I and F are more faithful models of
computational practice than classical Turing machines.
Before proving the main result it may be useful to review a simple example illustrating
the essential idea in the speed-up theorem.
Example 18.4.1 The following Turing machine M decides the set of even unary numbers.

1. Σ = {0, 1, B};
2. Q = {ℓ0, . . . , ℓ3};
3. ℓinit = ℓ0;
4. ℓfin = ℓ3;
5. T = {(ℓ0, B, B, →, ℓ1), (ℓ1, 1, B, →, ℓ2), (ℓ1, B, 1, →, ℓ3), (ℓ2, 1, B, →, ℓ1), (ℓ2, B, 0, →, ℓ3)}
The machine first moves to the right of the initial blank and then reads past 1s. It is
in state `1 whenever it has read an even number of 1s, and in state `2 whenever it has
read an odd number of 1s. Therefore, if the blank following the input is arrived at in
`1 , the input is even and the output hence is 1. The machine requires around |x| steps
to compute its result, where x is the input, and |x| its length.
We will now consider an equivalent machine M′ which, apart from an initial setup phase, runs in half the time. The idea is to use an alphabet which allows us to express two consecutive occurrences of 1 in a single symbol 11. This allows us to read past two 1s in a single transition, and therefore the new machine will run twice as fast.
However, M′ receives its input in the same form as M and must therefore first transform it into the compressed format. We will use an extra tape to carry the compressed form of the input. Here is M′:2
1. Σ′ = {0, 1, B, 11, 1B};
2. Q′ = {ℓ0, . . . , ℓ5};
3. ℓ′init = ℓ0;
4. ℓ′fin = ℓ5;
5. T = {(`0 , (B, B, ), nop, `1 ),
(`1 , (1, B, ), nop, `2 ), (`2 , (1, B, ), (B, 11, ), `1 ),
(`1 , (B, B, ), (B, B, ), `3 ), (`2 , (B, B, ), (B, 1B, ), `3 ),
(`3 , nop, (11, 11, ), `3 ), (`3 , nop, (B, B, ), `4 ),
(`4 , nop, (11, B, ), `4 ), (`4 , (B, 0, ), (1B, B, ), `5 ), (`4 , (B, 1, ), (B, B, ), `5 )}
As usual the first transition just skips the initial blank. The next group of transitions
move the input to the second tape in compressed form. If the input does not have even
length, then it is necessary to pad an extra blank to the last 1, since we collect pairs
of symbols into single symbols. The symbol 1B is used for this. The third group of
transitions move to the start of the compressed input on the second tape (alternatively
we could have processed the input backwards). Finally, the last group of transitions
process the compressed input.
2 Remember
The last phase takes around ⌈|x|/2⌉ steps, so we have roughly reduced the running time by half. The price to be paid is that we need to compress the input and go back to the start, and this takes around |x| + ⌈|x|/2⌉ steps.
2
In this example, the total cost has been increased. However, this is just because M has linear running time. If M runs in superlinear time then the added linear time to compress the input may be outweighed by the halving of the superlinear running time, as the next theorem shows.
Theorem 18.4.2 Let M be a classical Turing machine deciding a set L in time f. For any ε > 0 there is a Turing machine deciding L in time g where g(n) = ε·f(n) + 2n + 4.

Proof. We shall prove that if

M = (Σ, Q, ℓinit, ℓfin, T)

is a 1-tape Turing machine running in time f and ε > 0, then there is a 2-tape machine

M′ = (Σ′, Q′, ℓ′init, ℓ′fin, T′)

running in time λn. ε·f(n) + 2n + 4. It is easy to modify the proof to show that if M is a k-tape machine, for k > 1, then M′ also is a k-tape machine.
The essential idea of the proof is similar to that of the example above. Each symbol
of M 0 encodes several symbols of M . As a consequence, several successive transitions in
M can be encoded by a single transition of M 0 .
More specifically, we shall encode m = ⌈6/ε⌉ symbols of M into a single symbol of M′ (the choice of m will be clear at the end of the proof). Thus Σ′ contains all m-tuples of symbols from Σ. Since M′ must be able to deal with the input to M, Σ′ must also include the alphabet of M. Hence:

Σ′ = Σ ∪ Σ^m
The transitions of M′ are divided into three phases: a compression phase, a simulation phase, and a decompression phase.
In the compression phase M′ reads the input x from tape 1 and stores it in compressed form of length ⌈|x|/m⌉ on the auxiliary tape, erasing tape 1 at the same time.3 Whenever m symbols σ1, . . . , σm have been read from tape 1, the single symbol (σ1, . . . , σm) ∈ Σ^m is written to the auxiliary tape. This can be done by recalling in the state the symbols that are read.
3 Note that in the general case where M is a k-tape machine, k > 1, such an auxiliary tape is available already in M′ which is also given k tapes.
A state component recording a tuple of symbols from Σ^i (0 ≤ i ≤ m − 1) has the following meaning:

()                        no symbols read from tape 1 yet
(σ1)                      σ1 read from tape 1
(σ1, σ2)                  σ1, σ2 read from tape 1
...
(σ1, . . . , σm−1)        σ1, . . . , σm−1 read from tape 1
The transitions to do the compression appear in Figure 18.1, to which the following numbers refer. As long as fewer than m symbols have been read from tape 1, another symbol is read and recorded in the state (1). When m symbols have been read from tape 1, the compressed symbol is written to tape 2, and control returns to the initial state (2). If the whole input has been read, the compression phase ends (3). In case the input ends in the middle of an m-tuple, additional blanks are padded (4). When the compression phase ends, the read/write head on tape 2 moves to the beginning of the input (5). All this takes 2 + |x| + ⌈|x|/m⌉ steps.
We are then ready for the simulation phase, in which all operations take place on the second tape. In the simulation phase M′ repeatedly simulates m transitions of M by at most 6 transitions. Such a simulation of m steps is called a stage. At every stage M′ moves one square to the left, two to the right, and one to the left again. Recalling the scanned tuples in the state, M′ now has sufficient information to predict the next m steps of M. These m steps can affect at most m successive squares, spanning over at most two consecutive m-tuples, and so M′ can implement the next m transitions of M by at most two transitions.
More specifically, at each stage, M′ begins in a state (q, j), where q represents the state of M and j is the position of M's read/write head within the m-tuple that M′ currently scans. This requires the addition to Q′:

Q′ = . . . ∪ Q × {1, . . . , m}

At the very first stage, control must be passed from the compression phase to the simulation phase (6). M′ now moves one square to the left (7), then two to the right (8-9), and one to the left again (10), recalling the scanned m-tuples in the state. This requires
[Figure 18.1, listing the compression transitions (1)-(5), and the simulation-phase transitions (numbered up to (15)) with their case conditions on the head position, are not reproduced here.]
the addition to Q′:

Q′ = . . . ∪ Q × {1, . . . , m} × Σ^m ∪ Q × {1, . . . , m} × Σ^2m ∪ Q × {1, . . . , m} × Σ^3m
The reader should not be surprised to see analogs of the preceding theorem with the
term 2n + 4 replaced by some other term. The term is sensitive to small changes in the
definition of Turing machines. For instance, some models only allow a machine to write
a symbol or move one square, but not both, in a single step, and this makes a difference.
Exercises
18.1 Show that L ≡lintime M implies ptimeL = ptimeM.
18.2 Show that the interpreter int of F by WHILE of Proposition 9.2.2 induces at most a program-independent constant slowdown: given any F-program p and input d, time^WHILE_int(p.d) ≤ b · time^F_p(d).
2
18.3 Complete Lemma 18.3.2 part 1 by showing that the interpreter int of Exercise
18.2 can be replaced by an I program.
2
18.4 Show that the interpreter int of I by F of Proposition 9.2.2 induces at most constant slowdown: for any I program p and input d, time^F_int(p.d) ≤ b · time^I_p(d). This finishes Lemma 18.3.2.
2
18.5 Show how to simulate a Pascal-like program with several arrays on a RAM, with no more than a constant overhead time per operation.
2
18.6 * The Pascal-like implementation of GOTO was not quite a SRAM program because
it had several arrays, and records as well. Prove that this is equivalent to an SRAM
program running at most linearly more slowly. Consequence: any GOTO program p can
be implemented by a SRAM program q which runs in time linear in p's running time.
Does the constant coefficient depend on program p?
2
18.7 Argue informally that SRAM ≡ptime TM under the logarithmic time cost measure for SRAM computations. Show that this implies ptime = ptimeSRAM.
2
18.8 Why can the proof method of Theorem 18.4.2 not be applied to WHILE or GOTO? 2
References
The random access machine was introduced by Shepherdson and Sturgis in 1963 [163].
The book by Aho, Hopcroft and Ullman contains a good discussion of robustness of
285
polynomial time [2]. This insight arose in work by several authors including Cobham, Edmonds, Cook, and Karp [25, 42, 26, 95].
19
19.1
Running times of I programs are defined just as in Section 16.4.3 (reasonable, since I is a subset of WHILE). We show that the universal program for I developed in Section 4.1.1 is efficient, a term we use with a definite technical meaning:
An efficient interpreter is one whose use costs at most a program-independent linear overhead, as in Section 16.3.2. Note that the constant a below is quantified before p, so the overhead caused by an efficient interpreter is independent of the programs it interprets.
Definition 19.1.1 An S-interpreter int written in L is efficient if there is a constant a such that for all p ∈ S-programs and d ∈ S-data:

time^L_int(p.d) ≤ a · time^S_p(d)
Constructing an efficient interpreter
Recall the interpreter u1var for one-variable WHILE programs constructed in Section
4.1.1. It had form:
read PD;
P := hd PD;
C := hd (tl P)
Cd := cons C nil;
St := nil;
Vl := tl PD;
while Cd do STEP;
write Vl;
(*
(*
(*
(*
(*
(*
(*
Input (p.d)
*)
P = ((var 1) c (var 1))
*)
C = c
program code is c
*)
Cd = (c.nil), Code to execute is c *)
St = nil,
Stack empty
*)
Vl = d
Initial value of var.*)
do while there is code to execute *)
where STEP is the large rewrite command of Figure 4.1. This program u1var is easily
seen to be efficient in the sense above:
Proposition 19.1.2 There exists an a such that for all p and d

time^WHILE_u1var(p.d) ≤ a · time^WHILE_p(d)
Proof. Note that the entire STEP command of Figure 4.1 is a fixed piece of noniterative
code. For any one operation of p, STEP finds the appropriate rule(s) to apply, by matching
the top of the control stack Cd and, in some cases, the top of the computation stack St.
For any one p operation it only takes a constant amount of time (independent of p and
d) to find the appropriate rewrite rule(s) and to realize its effect (their effects).
Any single step of the interpreted program is realized by applying at most two iterations of STEP. For example, the decision of whether while E do C should perform C
the first time takes one step in p (in addition to the time to evaluate E). The interpreter
realizes the action of while E do C by applying STEP twice: once to set up the code stack before evaluating the expression E, and once after E's evaluation, to check E's value to see whether to enter command C or to escape from the while loop.
This implies that there exists a uniform and program-independent upper bound on
the interpretation/execution time ratio for all computations.
Variable access in the simulated program p is simulated by actions in u1var. Since p
has at most one variable, their execution times are independent of p. They are dependent
on the interpreter u1var, but are independent of program p.
However it is not clear that a program-independent upper bound can exist if p is
allowed to be an arbitrary multiple-variable WHILE program. The problem is that if the
interpreted program has multiple variables, the actions to simulate variable access and
storage will take time depending on the number of variables in p.
2
Remark: u1var satisfies another natural inequality, in the opposite direction: there exists a constant b such that for each one-variable program p and input d:

time^WHILE_p(d) ≤ b · time^WHILE_u1var(p.d)
Such a bound is quite natural, because every single step of the interpreted program p is
simulated by several actions (always more than one) of u1var.
Although natural, such a constant b does not exist for all universal programs, since
there exist infinite classes of programs that can be simulated faster than they run. One
way this can be done is by remembering whether a certain subcomputation has been
performed before and, if so, fetching its result from memory rather than repeating the
computation. An example of this is Cook's construction involving stack programs [30, 6].
Proof. A correctness proof resembles that of Exercise 4.1. Each operation of the interpreted program is realized by a program-independent number of the interpreter's operations.
2
19.2
Definition 19.2.1 An I-program tu is a timed universal program if for all p ∈ I-programs, d ∈ ID and n ≥ 1:

1. If time_p(d) ≤ n then [[tu]](p.d.nil^n) = ([[p]](d).nil), and
2. If time_p(d) > n then [[tu]](p.d.nil^n) = nil.

The effect of [[tu]](p.d.nil^n) is to simulate p for min(n, time_p(d)) steps. If time_p(d) ≤ n, i.e. p terminates within n steps, then tu produces a non-nil value containing p's result. If not, the value nil is yielded, indicating that the time limit has been exceeded.
Similar to the terminology for interpreters, we say:
Definition 19.2.2 A timed universal I-program tu is efficient if there is a constant k such that for all p, d ∈ ID and n ≥ 1:

time_tu((p.d).nil^n) ≤ k · min(n, time_p(d))
We will now construct an efficient timed universal program tu for I.
Construction 19.2.3 Recall the universal program i for I in Section 19.1. It was built
by translating the WHILE program u1var (plus its STEP command) into I.
The idea in constructing tu is to start with u1var, to add an extra input: a time
limit of the form nil^n stored in a variable Cntr, and some extra clocking code. Every
time the simulation of one operation of program input p on data input d is completed,
the clocking code will decrement Cntr and test it, stopping when it reaches zero. Call
the resulting program tt.
Details: program tt is seen in Figure 19.1, which uses a shorthand notation for the
membership test. This is easily turned into actual I commands. Finally, let tu be the
result of translating tt from WHILE to one-variable I code as in Proposition 3.7.4.
Lemma 19.2.4 tu is an efficient timed universal I-program.
read X;                              (* X = ((p.d).nil^n)             *)
Cd := cons (hd (hd X)) nil;          (* Code to be executed           *)
Vl := tl (hd X);                     (* Initial value of simulated X  *)
Cntr := tl X;                        (* Time bound                    *)
St := nil;                           (* Computation stack             *)
while Cd do
  if Cntr
  then { if hd (hd Cd) ∈ {quote, var, do hd, do tl,
                          do cons, do asgn, do while}
         then Cntr := tl Cntr;
         STEP; X := cons Vl nil; }
  else { Cd := nil; X := nil };
write X

Figure 19.1: An efficient timed universal program tt.
Proof. To prove tu efficient, we must find a k such that for all p ∈ I-programs, d ∈ ID, and n we have both of:

time_tu((p.d).nil^n) ≤ k · time_p(d)
time_tu((p.d).nil^n) ≤ k · n
The first inequality holds by reasoning similar to that of Proposition 19.1.2. The second
is immediate from the form of tu, since Cntr decreases with each iteration. If k1 , k2
respectively satisfy the first and second, then max(k1 , k2 ) satisfies both.
2
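The mechanism of tu, namely interpreting while decrementing a step counter, can be sketched in Python with generators standing in for I programs (entirely our own framing, not the book's concrete syntax):

# Run a step-by-step computation for at most `budget` steps.  A "program" is a
# generator function that yields once per charged step and finally returns its result.
def timed_run(program, data, budget):
    gen = program(data)
    for _ in range(budget):
        try:
            next(gen)                            # one simulated step
        except StopIteration as done:
            return ('halted', done.value)        # analogous to (result.nil)
    return ('timeout', None)                     # analogous to tu's output nil

def double_each(xs):
    out = []
    for x in xs:                                 # one yield = one charged step
        out.append(2 * x)
        yield
    return out

assert timed_run(double_each, [1, 2, 3], 10) == ('halted', [2, 4, 6])
assert timed_run(double_each, [1, 2, 3], 2) == ('timeout', None)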
19.3
Theorem 19.3.1 There is a b such that for all a ≥ 1, there is a set A in timeI(a·b·n) that is not in timeI(a·n).
Proof. Let program diag be as in Figure 19.2. Claim: the set A decided by diag is in timeI(a·b·n) but not in timeI(a·n).
read X;
Timebound := nil^(a·|X|);
Arg := cons (cons X X) Timebound;
X := tu Arg;                    (* Run X on X for up to a·|X| steps          *)
if hd X then X := false         (* or until Timebound is reduced to zero     *)
        else X := true;
write X

Figure 19.2: Diagonalization program diag.
We now analyze the running time of program diag on input p. Since a is fixed, nil^(a·|d|) can be computed in time c·a·|d| for some c and all d. We implicitly assume that command Timebound := nil^(a·|X|) has been replaced by code to do this computation.
From Lemma 19.2.4 there exists k such that the timed universal program tu of Figure 19.1 runs in time time_tu((p.p).nil^n) ≤ k·min(n, time_p(p)). Thus the command X := tu Arg takes time at most

k·min(a·|p|, time_p(p)) ≤ k·a·|p|

so on input p, program diag runs in time at most

c·a·|p| + k·a·|p| + e

where c is the constant factor used to compute a·|X|, k is from the timed universal program, and e accounts for the time beyond computing Timebound and running tu. Now |p| ≥ 1, so

c·a·|p| + k·a·|p| + e ≤ a·(c + k + e)·|p|

which implies that A ∈ timeI(a·b·n) with b = c + k + e.
We prove that A ∉ timeI(a·n) by a diagonal argument. Suppose for the sake of contradiction that A ∈ timeI(a·n). Then there exists a program p which decides membership in A and satisfies time_p(d) ≤ a·|d| for all d ∈ ID. Now consider the effect of running p on itself as input, i.e., computing [[p]](p). The fact that time_p(p) ≤ a·|p| implies that tu in Figure 19.2 has sufficient time to simulate p to completion on input p. By Definition 19.2.1, this implies

[[tu]]((p.p).nil^(a·|p|)) = ([[p]](p).nil)
19.4
Theorem 19.4.1 The result of Theorem 19.3.1 holds for the one-variable, one-atom functional language F.

Proof. By Theorem 19.3.1, timeI(a·n) ⊊ timeI(a·b·n) for all a. Using this and the constants c, d from Lemma 18.3.2 we obtain a chain of inequalities:

timeF(a·n) ⊆ timeI(a·d·n) ⊊ timeI(a·b·d·n) ⊆ timeF(a·b·c·d·n)

so the result holds with b·c·d in place of the b of Theorem 19.3.1.
19.5
We showed earlier for languages I and F that within linear time bounds, increased time
gives provably greater decision power. The proof technique involved diagonalization. In
this section we carry the theme further, showing analogous results for other computation
models, and for other time bounds. In particular we will look at asymptotic complexity,
showing that when one functional time bound grows faster than another in the limit,
there are problems solvable in the larger time bound but not in the smaller.
First, a slight generalization of the construction seen earlier.
Construction 19.5.1 Given an I-program b, consider the program diag-b of Figure
19.3, where tu is the timed universal program of Lemma 19.2.4:
read X;
Timebound := b X;                       (* Insert body of b here                   *)
Arg       := cons (cons X X) Timebound;
X         := tu Arg;                    (* run X on input X until it stops,        *)
                                        (* or until Timebound is reduced to nil    *)
if hd X then X := false else X := true;
write X
Behavior: Suppose [[b]](d) always yields values of the form nil^m (as it always will in our applications). Then for any input p ∈ ID with [[b]](p) = nil^m:

[[diag-b]](p) = true    if time_p(p) ≤ m and [[p]](p) = false, or if time_p(p) > m
[[diag-b]](p) = false   if time_p(p) ≤ m and [[p]](p) ≠ false
Time analysis: Let k be the interpretation and counting overhead incurred by the timed universal program tu, and e the time to perform the final test above. Then for any p ∈ ID

time_diag-b(p) ≤ time_b(p) + k · min(|[[b]](p)|, time_p(p)) + e
             ≤ time_b(p) + k · |[[b]](p)| + e
Time-constructible functions
Motivation: For a time bound function f (n) to be useful, it must be possible when given
an input of size n to find out how much time f (n) is available by a computation not
taking more than the order of f (n) steps.
The following can be proven directly by diagonal constructions similar to that of Theorem 19.5.3, though more complex since self-interpreters are less easy to write for the languages TM or RAM than for GOTO. Alternatively, somewhat weaker versions may be proven using Theorem 19.5.4.

Theorem 19.5.6 Suppose functions f, g are time constructible, f(n) ≥ n, g(n) ≥ n for all n, and lim_{n→∞} f(n)/(g(n) log g(n)) = ∞. Then timeTM(f) \ timeTM(g) ≠ ∅.

Theorem 19.5.7 Suppose functions f, g are time constructible, f(n) ≥ n, g(n) ≥ n for all n, and lim_{n→∞} f(n)/g(n) = ∞. Then timeSRAM(f) \ timeSRAM(g) ≠ ∅.
Exercises
19.1 Why can the proof method of Theorem 19.3.1 not be applied to WHILE or GOTO? 2
19.2 Prove that there are problems solvable by WHILE programs in time n³ but not in time n². Hint: use the result of Theorem 19.5.4 together with a cost bound on the
simulation of WHILE programs by I programs.
2
19.3 Sketch the construction of a universal program for SRAM programs. This can store the program to be interpreted in odd memory locations, and can represent program memory cell loc in the interpreter's memory cell 2·loc. Discuss its running time in relation to that of the interpreted program, under the unit-cost assumption.
2
19.4 For the interpreter of the previous exercise, consider a logarithmic cost which also accounts for the cost of instruction access. Thus all times are as in the table given before for SRAM instruction times, but with a factor log ℓ added to execute an instruction in location ℓ.
Show that under this cost, the total interpretation time will be bounded by a program-independent constant times the interpreted program's running time.
2
19.5 Prove the unit-cost version of Theorem 19.3.2 from Exercise 19.3: that linear time
SRAM-decidable sets possess an infinite hierarchy ordered by constant coefficients, as in
Theorem 19.3.1.
2
19.6 Prove the logarithmic cost version of Theorem 19.3.2 from Exercise 19.4.
References
The earliest work on time-bounded hierarchies is from 1965, due to Hartmanis, Lewis and Stearns [64, 65]. The hierarchy result for linear time in the I language appeared in 1993 in [78]. Papers by Gurevich and Shelah, and by Schönhage, contain related work [60, 159].
20
(by A. M. Ben-Amram)
The previous chapter's hierarchy theorems (19.3.1, 19.4.1, 19.5.3) show that there exist programs whose running time cannot be improved beyond a constant multiplicative factor. We call such programs optimal1.
These theorems construct, from a given time bound T (n), a problem which is solvable
by an optimal program with running time cT (n) for some c and all n. In practice,
however, we are typically given a problem that we wish to solve by computer, rather than
a time bound. We attempt to write a program that will solve it as fast as possible. But
how fast can a given problem be solved?
The branches of Computer Science that deal with such questions are the design of
efficient algorithms and, on the negative side, lower-bound theory. (This book deals
mainly with the hierarchy and completeness results underlying lower-bound theory.) In
this chapter we consider what may be the most essential question to begin with: given
a problem, does there necessarily exist a fastest algorithm to solve it? In other words,
is the goal of algorithm design always well defined?
One of the major results in complexity theory, Blum's Speedup theorem, shows that there exist problems for which this goal cannot be achieved. For every algorithm to solve such a problem, there is another one that is significantly faster. These problems are, however, artificially constructed to prove the theorem. It is therefore edifying to discover that for an important class of problems that occur in practice an optimal algorithm does exist: one whose time cannot be improved by more than a constant multiplicative factor. This result is known as Levin's theorem. In this chapter we formulate and prove first Levin's theorem, and then Blum's theorem. We conclude with a theorem
of a somewhat different flavour, known as the Gap theorem. This theorem shows that
the results of the hierarchy theorems depend on the time bound T being a nice (that
is, time constructible) function: there exist functions t such that no program can be
designed to have running time inside some large zone lying just above t.
Remarks: Levin's theorem exploits the existence of an efficient interpreter. All of
these theorems can be proven in a general form that applies not only to running time
1 Actually,
but to other reasonable computing resources, e.g. space. We do not go into details of
this generalization here.
20.1 Levin's Theorem
(x, y) ∈ R has been found2. It is quite obvious that this strategy can yield an extremely inefficient program, since it may waste a lot of time on wrong candidates until it finds a witness. Levin's theorem states a surprising fact: for many interesting problems there is another brute-force search strategy that not only is efficient, but optimal up to constant factors. The difference is that Levin's strategy generates and tests not solutions, but programs.
Problems with easy witness checking. A common situation with many problems is that verifying membership of a pair (x, y) in R (checking a witness) is relatively straightforward, notwithstanding that producing a witness might be difficult. For example, verifying membership in RSAT amounts to evaluating the formula F under the candidate truth assignment; this can be done in linear time. On the other hand, finding a witness for F is at least as hard as just deciding whether the witness exists, a problem complete for nptime.
This situation holds for a great many problems. For example it has been open for many years whether SAT has any solution algorithm at all that runs in subexponential time. The beauty of Levin's theorem is that, even though no-one knows how fast (say) satisfiability can be decided, the construction nonetheless gives an algorithm for it that is asymptotically optimal (up to constant factors).
For Levin's theorem to be of interest, it suffices that we be able to check witnesses efficiently enough so that having the complexity of checking as a lower bound for witness searching is acceptable. However, in many cases, it can actually be proved that searching for a witness cannot be done asymptotically faster than checking; for instance, this is obvious when checking takes linear time (as in the SAT example).
This is a quite general phenomenon, which led to the formulation of the class nptime, also called np (to be discussed at length in Chapters 25 and 27). By definition, all problems in nptime can be solved by guess-and-verify algorithms, where both guessing and verification can be done in polynomial time. The only cause of superpolynomial time is that the number of possible guesses is typically exponential in the problem input size, and thus too large to enumerate.
A more sophisticated result that is relevant: by the version we saw of Kleene's Normal Form (Theorem 13.4.3), for any program p there is a predicate R, decidable in linear time, such that R(x, y) is true if and only if y is the computation of p on input x. In this case,
2 If R is decidable, this is straightforward by testing (x, y) ∈ R for all finite binary trees y, using a loop as in Lemma 5.7.1 to enumerate them. If R is semi-decidable but not decidable, then one could use a dovetailing of computations as in Theorem 5.5.1 to test (x, d0) ∈ R?, (x, d1) ∈ R?, . . . in parallel.
finding a witness for x is exactly equivalent to running p on x, and so can have arbitrarily high complexity.
Ease of witness checking is captured in the following definition. (Section A.3.11 explains the o( ) notation.)

Definition 20.1.3 We call a semi-decidable binary predicate R easy to check if there is a program r such that R = dom([[r]]), and no witness function f can be computed (on input x) in time o(time_r(x.f(x))).
2
Suppose R is easy to check, and that program r satisfies Definition 20.1.3. Then program
opt of Theorem 20.1.2 is asymptotically fastest (that is, up to a constant factor) among
all programs that compute witnesses for R.
Proof of Levin's theorem.
Proof. We make a simple, non-restrictive assumption on the program r: when run with input (x.y), if (x, y) ∈ R it gives y as output. Otherwise, it loops forever.
Recall that the concrete syntax for I programs uses only the atom nil. Enumerate
ID = {d0 , d1 , . . .} as in Lemma 5.7.1 by programs start and next. We build program opt
from these parts (a concrete program will be given shortly):
1. A main loop to generate all finite trees. At each iteration one new tree is added to the list L = (dn . . . d1 d0). Tree dn for n = 0, 1, 2, . . . will be treated as the command part of the n-th I program pn.

2. Iteration n will process programs pk for k = n, n − 1, . . . , 1, 0 as follows:

(a) Run pk on input x for a time budget of at most bk(n) = 2^(n−k) steps.
(b) If pk stops on x with output y, then run r on input (x.y), so that pk and r together have been executed for at most bk(n) steps.
(c) If pk or r failed to stop, then replace k by k − 1, double the time budget to b_{k−1}(n) = 2^(n−k+1) steps, and reiterate.

3. If running pk followed by r terminates within time budget bk(n), then output [[opt]](x) = y and stop; else continue with iteration n + 1.
Thus the programs are being interpreted concurrently, every one receiving some interpretation effort. We stop once any one of these programs has both solved our problem
and been checked, within its given time bounds. Note that opt will loop in case no witness
is found.
Levins Theorem
303
The keys to optimality of opt are the efficiency of STEP, plus a policy of allocating
time to the concurrent simulations so that the total time will not exceed, by more than a
constant factor, the time of the program that finishes first. The following table showing
the time budgets of the various runs may aid the reader in following the flow of the
construction and correctness argument.
Time budget
n=0
n=1
n=2
n=3
n=4
n=5
n=6
...
p0
1
2
4
8
16
32
64
...
p1
1
2
4
8
16
32
p2
1
2
4
8
16
p3
1
2
4
8
p4
1
2
4
p5
1
2
...
...
...
...
...
...
...
...
We first argue that the abstract algorithm just given is correct, then give it in concrete
program form, and finally analyze its time usage.
Correctness of the algorithm. Proving correctness of opt has two parts: showing
that opt produces only witnesses, and that it produces a witness for every x 1 R. First,
if [[opt]](x) = y then [[r]](x.y) terminates, so (x, y) R. Thus every output of opt is a
witness for its input.
Second, suppose x 1 R. Claim: there is a pair (n, k) with k n such that
1. timepk (x) 2nk ; and
2. timepk (x) + time r (x.y) 2nk where y = [[pk ]](x).
Proof of claim: since x 1 R there exists a pair (x, y) R. For this y, clearly [[r]](x.y)
terminates. Choose any program pk such that y = [[pk ]](x), and choose a value n large
enough so that 1 and 2 hold.
The computation of [[opt]](x) stops at iteration n or before. This implies [[opt]](x) =
[[r]](x.y) = y and (x, y) R, so opt has a witness as output for every input x 1 R.
read X; start;
Go := true;
while Go do
{
L1 := L;
T := (nil);
*)
*)
*)
*)
*)
while L1 do
(* Loop (2): set up to run pk on x
{
Cd := hd L1; St := nil; Vl := X;
T1 := T;
(* Copy time bound t
while T1 do
(* 2(a): Run pk on x for t steps
{STEP; T1 := tl T1;}
if Cd = nil
then
{Y := Vl;
Cd := r; St
*)
*)
*)
while T1 and Cd do
(*
Run r on (x.y)
{STEP; T1 := tl T1;} (* for remaining steps
*)
*)
if Cd = nil (*
If r stopped on x in time left
then {L1 := nil; Go := false;}
(*
then stop!
*)
*)
if Go then
(* (2c): If pk or r failed to stop
{L1 := tl L1;(*
k := k-1
T1 := T;
(*
Double time budget t := 2(nk)
while T1 do
{T1 := tl T1; T := cons nil T;}
}
(* End of if Go
next; L1 := cons New L1;
}
(* End of 2(a-b-c)
}
(* End of loop (1); try n := n+1
write Y
*)
*)
*)
*)
*)
*)
A program for opt. Let STEP be the WHILE macro used in Lemma ?? to execute an
arbitrary I program. This uses variables Cd, St and Vl to contain the control stack,
computation stack, and current value of the (unique) variable, respectively. By the proof
Levins Theorem
305
of Proposition 4.1.1, any single step of the interpreted program is simulated by at most
two applications of STEP.
Program opt is built from STEP and start, next of Lemma 5.7.1, and can be seen in
Figure 20.1. The list of all elements of ID considered to date is maintained in variable L,
with a local copy L1. The time budget is maintained in variable T, with a local copy T1.
The main loop of the program is (1). During its n-th iteration, the inner loop (2)
first applies STEP to simulate each program pk on L1 on input x for 2nk steps.
Program opt stops once one of the programs yields an output y (loop (2a)), provided
that value has been verified using r without overrunning the time budget (loop (2c)).
Faithfulness to the informal algorithm above should be clear.
Time analysis of opt. The following are easy to establish for n > 0. The phrase
simulation of pk includes running both pk and subsequently r (Steps 2(a) and 2(b)
above).
(1) The time for each iteration of the main loop, outside the code to simulate pk by
STEP or to double t, is bounded by c0 n where c0 is a constant and n is the iteration
number (cf. Exercises 5.11, 5.12).
(2) In iteration n, STEP is applied to n + 1 programs: pn , . . . , p1 , p0 .
(3) In iteration n, program pk is simulated for a number of interpretation steps, no
larger than 2nk .
(4) The total time to maintain time counter t is of the order of 1 + 2 + . . . + 2nk =
2nk+1 1, thus O(2n ).
(5) The total time for iteration n is bounded by the sum of the times for the pk :
c0 n +
n
X
c1 2nk + c2 2n c3 2n
k=0
20.2
Blums Speedup theorem involves two techniques: a diagonalization argument more subtle than that seen before in Theorem 5.3.1; and a search process executing programs
307
under a time budget, similar to that used in proving Levins theorem. Before proving
Blums result, we establish a simpler result that uses the same sort of diagonalization.
We define the following simplifying framework for the proof, only considering input of
the form niln . A program accepts a set of integers, in the sense that it program accepts
n if it outputs a non-nil value for input niln . The time complexity of program p, then,
can be expressed as a function on IN , namely tp (n) = time p (niln ).
On diagonalization. In Chapter 19 we used diagonalization to prove the hierarchy
theorem. In this chapter we use diagonalization in a slightly more involved manner, so
it may be useful to present first a general form of the diagonalization argument.
Let Q be a set of programs. We wish to construct a program p and ensure that p
/ Q.
We construct p so [[p]] 6= [[q]] for all q Q. More explicitly, p will be built so for every
q Q there is at least one input d such that [[p]](d) differs from [[q]](d).
Such a q will be said to have been killed. We construct p so every q Q will be
killed at some stage during ps computations, thus making p Q impossible. This is
done by inverting qs output for some input d, so [[p]](d) = true if [[q]](d) = false and
false otherwise.
The following shows that there exist problems arbitrarily hard to solve, no matter what
algorithm is used. The result is stronger than Theorem 19.3.1 since the lower bound on
run time applies to all but finitely many inputs.
Theorem 20.2.1 For every total recursive function g : IN IN there exists a total
recursive f : IN {true, false} such that if f = [[p]] for any program p, then tp (n) > g(n)
for all but finitely many n IN .
2
Proof. The proof uses some ideas from the proof of Levins theorem 20.1.2. We
assume that the reader is familiar with this, and now just give a sketch. Let p0 ,
p1 , p2 ,. . . enumerate all I-programs. Program pk can be generated by code start;
next;...; next with k occurrences of next (as in the proof of Levins theorem).
Call program p quick on m if tp (m) g(m). Our goal is to find a function f such
that f = p implies p
/ Q, where Q is the set of programs that are quick on infinitely
many inputs. This is done progressively. The value of any f (n) is computed in stages:
for each m = 0, 1, 2, . . . , n we construct two sets
Deadm
Quickm
=
=
read n;
Dead := ;
(* Programs that have been killed
*)
for m := 0 to n do
(* Compute f (0), ..., f (n)
*)
Quick := ;
(* Programs that run with time <= g *)
for k := 0 to m do
(* Iterate on different inputs
*)
if k
/ Dead and tpk (m) g(m)
(* Collect unkilled pgms
*)
then Quick := Quick {k};
(* quick on input m
*)
if Quick 6=
(* Now compute f (m)
*)
then k := the smallest index in Quick;
Dead := Dead {k};
Quick := Quick \ {k};
Answer := [[pk ]](m)
(* The value of f (m)
*)
else Answer := true;
(* End of all the loops *)
write Answer
Figure 20.2: A function that is hard to compute.
The set sequences will be monotone: r s implies Deadr Deads and Deadr Quickr
Deads Quicks .
The value of f (n) will be made different from pk (n) where k is the smallest index in
Quickn , assuming this set is nonempty. Function f is (by definition) computed by the
program of Figure 20.2. This program reads n, then computes Deadi , Quicki , f (i) in turn
for i = 0, 1, . . . , n, and finally writes f (n). It is evident that f is total.
In the program (which omits the subscripts on Quick and Dead) any index k such
that tpk (m) g(m) for some value k m n will be entered into Quick, unless already in
Dead.
For each n, the value of f (n) is defined so as to make f 6= [[pk ]] for a new pk in Quick.
(This happens provided Quick is nonempty, which will occur infinitely often.) When
program pk has been killed, it is removed from the set Quick and placed in set Dead.
Suppose now that f = pr . By construction [[pk ]] 6= f for every element k put into Dead,
so r is not in any set Deadm . Suppose further that program pr is fast on infinitely many
inputs. Then it is also fast on infinitely many inputs n0 , n1 , . . . larger than r (see Figure
20.3 for a pictorial representation). For every one of these of these, r will be entered
into Quickni (since r is not in Deadni ). Eventually r will be the smallest index in some
Quickni , at which point it will be added to Deadni . A contradiction arises because of the
309
Programs
6
6
r
pr
n0
r
n1
r
n2
r
n3
r
n4
r
n5
r-
An index k
- Inputs n
m
Figure 20.3: Program pr is quick infinitely often.
assumption that f = [[pr ]]:
f (ni ) = Answer = [[pr ]](ni ) = f (ni )
2
20.3
Theorem 20.3.1 For any total recursive function h there exists a total recursive function f such that for any program p computing f , there is another program p0 such that
f = [[p]] = [[p0 ]], and for all but finitely many d ID
timep (d) h(timep0 (d))
To appreciate the significance of this theorem, let h be a fast growing function such
as 2n . The theorem says that there is a function f such that, for every program p0 you
311
read n;
Dead := ;
(* Programs that have been killed
*)
for m := 0 to n do
(* Compute f (0), ..., f (n)
*)
Quick := ;
(* Programs to be killed
*)
T := 1;
(* Time budget t = h(mk) (1) for k = m
*)
for k := m, m-1,...,0 do
(* Iterate on different inputs *)
if k
/ Dead and tpk (m) T
(* Collect unkilled pgms
*)
then Quick := Quick {k};
(* that stopped in time
*)
T := h(T);
(* Increase time budget and decrease k *)
if Quick 6=
c2
n
X
i=n0 +1
313
20.4
The Gap theorem shows that for an arbitrarily chosen computable increase in time
bounds, there exist functions such that applying the increase to the bound does not
enlarge the class of decidable problems (in sharp contrast to the hierarchy results of the
last chapter). The theorem provides such a function that satisfies a pair of conditions,
one an arbitrarily chosen computable lower time bound g and another, h, that defines
the amount of increase to be applied.
Theorem 20.4.1 The Gap Theorem. For any (arbitrarily large) total recursive functions g : ID IN and h : IN IN such that (n) h(n) n, there is a total recursive
function t : ID IN such that (d) t(d) g(d) and for every I program p we have
timep (d) h(t(d)) = timep (d) t(d)
for all but finitely many values d.
Thus, time bound ht is not stronger than t when infinitely many inputs are considered.
Note that by the assumption on h, we have h t t, so the statement is significant. We
say that there is a complexity gap between t and h t.
Proof. First define a macro TEST that accepts as input a tree variable X and an integervalued variable N , and gives a Boolean result. Macro TEST generates I programs
p1 , p2 , . . . , pj until pj = X (this will happen because our enumeration process generates
all trees). Using the timed interpreter from the previous chapter, TEST runs each generated program for at most h(N ) steps on X. If any of these programs terminates within s
steps where N < s h(N ) the result of TEST is false. Otherwise its true.
We now use the macro TEST to write a program that computes a function t : ID IN .
On input X, the program computes n = g(X), then repeatedly applies TEST to X and
Exercises
20.1 The proof of Levins theorem assumes program q to be coded in language I, while
opt is a WHILE program. Explain why this discrepancy does not affect the result.
2
20.2 * What is the space complexity of opt? In particular, how does it relate to the
space consumption of a given program q for the problem in question?
2
315
20.7 Section 20.3 claimed that Blums theorem establishes the existence of a faster
program p0 , but there is no algorithm to construct it, given p. However, from the proof
of the theorem we know that blumk+1 is that faster program. Why doesnt the proof
imply an algorithm to obtain the faster program? In other words, why is the construction
of blumk+1 not effective?
2
20.8 Modify the proof of Theorem 20.4.1 to ensure that function t will increase when
|d| is increased.
2
20.9 * Let us restrict attention to time bounds which only depend on the size of the
input, t(d) = f (|d|). Demonstrate that for some constant a > 0, it is not possible to find
such a time bound t such that there is a gap between t and at, and 0 < f (n) n2 .
Hint: Design a program p1 such that for every odd n and 0 < i n
(d) i < timep1 (d) ai
for an appropriate constant a. Design another program p2 whose time similarly lies
between in and ain. Show, that for t, f as above, and for infinitely many inputs, one
of these programs will have its running time inside the intended gap. Remark: It is
actually possible to generalize this result to any polynomial function of n (instead of n2 ).
2
References
Levins theorem has been presented in a form quite similar to the above in an article
by Gurevich [59]. This is rather different from (and simpler than) the original Russian
article [107, 105].
Blums Speedup theorem is from [14]. The Gap theorem is attributed to two independent works, [16] and [167]. Both theorems can be found, together with an assortment
of related results, in [173].
The fields of designing efficient algorithms and of proving lower bounds for computational problems have been the issue of extensive literature, for example [2, 101] and
numerous more recent publications.
21
Space-bounded Computations
We have hitherto emphasized computation time. There is a similar but somewhat different way to classify problems according to how much memory space is required to solve
them. For simplicity of exposition we limit ourselves to imperative languages in which
a computation is a linear sequence of states, i.e. all the languages seen so far except the
functional languages1 .
For the computation models of Chapter 7 the input is contained in the initial store,
which always has length at least |d|, i.e. space linear in the size of the input. In other
words, there are no problems solvable in sublinear space in the models given earlier.
In general, linear space decidable sets can take exponential time to decide; and no
better bound is known (see Theorem 21.5.2). This time bound is intractable, i.e., well
beyond the running time of practically usable algorithms. This motivates a study of
space bounds that are small enough to give running times closer to practical interest,
i.e., the study of space bounds smaller than |d|, the length of the input d.
A solution to this problem is to use offline models that allow only read-only access
to an input value d and, when measuring program space consumption, to count only the
workspace that is used beyond the input length. (This is intuitively reasonable, since
read-only input will remain unchanged during the entire computation.) For the moment
we are only interested in decision problems expressible by a yes-no answer, and not in
computation of functions.
In order to study space-bounded computations, we will equip Turing, counter, or random access machines with a read-only input, instead of the earlier device of incorporating
the program input value into its initial state. A motivation is that it will become possible
to analyse computations in sublinear space, i.e. using space smaller than the size of the
program input, thus bringing space-limited computation nearer practically interesting
problems than before.
The models will later be extended to allow output as well. This will be write-only,
symmetric with the read-only restriction on input, in order to maintain the separation of
work storage from storage used for input-ouput data. Classes of functions computable in
limited space analogous to the above time-bounded decidable classes will turn out to be
quite useful for investigating complete, i.e. hardest problems for the various complexity
1 Functional languages can also be classified spacewise, but require more subtle definitions because of
implicit space usage caused by recursion.
317
21.1
21.1.1
21.1.2
The read-only Turing machine variant has read-only access to its input d. Further, only
the workspace that is used beyond the input data will be counted. (This is intuitively
reasonable, since read-only input will remain unchanged during the entire computation.)
A pictorial representation may be seen in Figure 21.1.
Definition 21.1.3 A read-only Turing machine TMro is a two-tape Turing machine
whose input is a string d in {0, 1} . Its instructions are as follows, where subscript
319
Tape 1 (input)
Finite
state
control
(program)
?
...B 0 0 1 1 1 1 B ...
Tape 1:
Tape 2:
Symbols:
I ::=
I ::=
S ::=
A tape together with its scanning position will be written as . . . BL1 S1 R1 B . . ., where the
underline indicates the scanned position. We assume the program never attempts to
move right or left beyond the blanks that delimit the input, unless a nonblank symbol
has first been written2 .
We define the length of a read-only TMro state s = (`, ), where ` is the instruction
counter and = (. . . BL1 S1 R1 B . . ., . . . BL2 S2 R2 B . . .), to be |s| = |L2 S2 R2 |, formally expressing
that only the symbols on work tape 2 are counted, and not those on tape 1.
2
Definition 21.1.4 A read-only counter machine CMro is a register machine whose input
is a string d in {0, 1} . Input access is by instruction if InCi = 0 goto ` else `0 , which
tests symbol ak in input d = a1 a2 ...an indirectly: index k is the value of counter Ci.
Data initialization sets counter C0 to n, giving the program a way to know how long
its input is.
I
::=
|
Ci := Ci + 1 | Ci := Ci .
- 1 | Ci := Cj
if Ci=0 goto ` else `0 | if InCi =0 goto ` else `0
2 This condition simplifies constructions, and causes no loss of generality in computational power, or
in time beyond a constant factor.
where log v is the number of bits required to represent v. This formally expresses that
only the space usage of nonempty registers (measured in bits) is counted.
2
Remark: This differs slightly from the counter machines seen earlier in Section 7.4, in
that input is a bit string instead of a number.
21.1.3
The following easily proven propositions assert that, as far as space usage is concerned,
multiple tapes are only essential when considering computations that use space less than
the length of the input.
Proposition 21.1.5 For any k-tape Turing machine p such that spaceTM
p (d) |d| for any
input d, there exists a 1-tape Turing machine q with [[p]]TM = [[q]]TM and a constant a such
TM
that spaceTM
q (d) a spacep (d) for any input d.
Corollary 21.1.6 If p is a read-only Turing machine such that spaceTMro
(d) |d| for all
p
inputs d, there is a 1-tape Turing machine q with [[p]]TMro = [[q]]TM , and a constant a such
TMro
that spaceTM
(d) for any input d {0, 1} .
q (d) a spacep
Proof. Exercises 21.1 and 21.2.
Essentially the same results hold for counter machines. Hints for the straightfoward
proofs are give in Exercises 21.3, 21.4.
321
Proposition 21.1.7 For any counter machine p as in Section 7.4 there exists a read-only
counter machine q and a constant a such that for any input v IN :
CMro
[[q]]
CM
Proposition 21.1.8 For any read-only counter machine p such that spaceCMro
(d) |d|
p
for any input d, there exists a counter machine q as in Section 7.4 and a constant a such
that for any input v IN :
CM
CMro
21.1.4
CMro
(cIN (v)) and space CM
(cIN (v))
q (v) a space p
3. Lpspace =
f a polynomial L
space(n . k log n)
k=0 L
space(f )
The corresponding classes of problems solvable within limited space are easy to define:
Definition 21.1.10 Given programming language L and a total function f : IN IN
1. The class of problems L-decidable in space f is:
spaceL (f ) = {A L-data | A is decided by some p Lspace(f (n) }
2. The class of problems L-decidable in logarithmic space is:
logspaceL = {A L-data | A is decided by some p Llogspace }
3. The class of problems L-decidable in polynomial space is:
pspaceL = {A L-data | A is decided by some p Lpspace }
21.2
We now show that Turing machines and counter machines are equivalent as regards space
usage. First-time readers may skip this section without loss of continuity.
Theorem 21.2.1 For any f with f (n) max(log n, 1)
[
[
spaceTMro (cf ) = spaceCMro (df )
c
Proof. The corollary is immediate from Theorem 21.2.1 and the preceding propositions.
Two constructions follow to prove Theorem 21.2.1, one building from an f -space-bounded
Turing machine program a corresponding counter machine operating in the desired size
bound, and another construction in the opposite direction. We leave it to the reader to
verify that the constructed programs decide the same sets as their sources, that is that
the simulations are faithful. This should not be surprising, as each program simulates
the operations of the other in exactly the same order, so it is only important to verify
that the desired space bounds are preserved, and that the two programs states continue
to correspond properly.
2
Construction 21.2.3 A spaceTMro (cf ) implies A
CMro
(df )).
d space
323
TMro
(cf ):
c space
The same effect can be achieved without the extra symbol 2 by a simple data encoding
into 0, 1, at most doubling the tape space. Since there is a fixed number k of CMro
variables, the total amount of work tape storage, including markers to separate the
blocks, is at most a constant times f (n) bits, as required.
Each CMro operation is straightforwardly simulable by the Turing machine. For example, command if InCi =0 goto ` else `0 can be realized by steps:
Locate the block containing the value j of Ci, and copy it into another block for
use as a counter c.
If 1 c n then continue, else goto the code simulating `.
Move to the left end of input tape 1 containing a1 a2 ...an .
If c = 1, the input symbol aj has been found and may be tested for zero.
If c > 1 then decrement it by 1, scan forward one symbol on the input tape, and
repeat from the previous step.
2
21.3
21.4
Robustness of pspace
Theorem 21.2.1 gives a pleasingly tight connection between the space used by Turing
machine computations and the sizes of counters used by counter machines solving the
same problems. Further, any counter machine is also a RAM, so we now briefly consider
the translation compiling RAM to TM from a memory usage perspective.
Robustness of pspace
325
The amount of Turing machine tape used by a translated program can be assumed to
be bounded by the sum of the lengths and addresses of the nonzero RAM memory cells3 .
Now every nonconstant address must have first appeared in a register; so if the RAM
program uses at most space f (d) bits of storage on input d, then the simulating Turing
machine uses at most linearly more space.
From this (informal) argument we can conclude pspaceTM = pspaceCM = pspaceRAM .
Therefore we henceforth often write pspace rather than pspaceTM .
Extending this result to GOTO programs has some complications that require a more
subtle implementation; the complications and an alternate implementation are sketched
below.
Storage usage in GOTO programs.
The original tree-based semantics gives unrealistically high space measures for two reasons. First, the tree model did not account for sharing, whereas an assignment such as
X:=cons X X should clearly not double the memory assigned to X.
A second problem is that even if the more realistic DAG model of Section 17.1.1 is
used, it often happens that nodes become inaccessible. For example, consider the translation compiling a Turing machine program to an equivalent GOTO seen in Section 18.2.
Without accounting for unreachable nodes, this would require space roughly proportional
to the simulated Turing machines running time, since every tape head motion is simulated by a cons. This is far in excess of what seems reasonable. The following seems to
be a fairer definition:
Definition 21.4.1 A space measure for the flow chart language GOTO: Consider the semantics of Section 17.1.1 in which the store is a DAG (, ) where maps Vars(p) to
nodes, and is a DSG that specifies the structure of the DAG. By definition, the size ||
of such a store is the number of nodes in the dag that can be reached from some node
variable, that is the number of nodes reachable via from the entry nodes in the range
of .
3 Using the construction of Chapter 8 , this could only fail if the RAM repeatedly stored first a nonzero
value, and then 0, in a great many cells. This would create many useless but space-consuming blocks
on the Turing machines tape. The problem is easy to circumvent; each time a register-changing RAM
instruction is performed, the simulating Turing machine checks to see whether the new value is zero.
If so, the address and value are removed from address and contents tapes, thus compacting the tape
storage. This yields the desired space bound.
21.5
timeTM (cf )
Proof. We show that if a one-tape Turing machine program p runs in space f and
terminates on its inputs, then it also runs in time cf for appropriate c.
Clearly p cannot repeat any computational state s = (`, . . . B L S R B . . .) in the computation on input d, since if this happened, p would loop infinitely on d. So to prove
our result it suffices to show that a terminating program running in space f has at most
cf (|d|) different states for some c.
Consider any computational state s reachable on input d. By the assumption on p,
|L S R| f (|d|). The total number of possible values of the nonblank tape contents LSR
327
with this space bound is bounded by 3f (|d|) , since each symbol in LSR must be 0, 1, or
B. Further, the scanning position where S is located has at most f (|d|) possibilities.
Combining these bounds, the total number of different possible values of the tape,
including both tape scanning position and contents, is bounded by
f (|d|) 3f (|d|)
Now n 2n for all n 1, so by the assumption that f (n) n we have
f (|d|) 3f (|d|) 2f (|d|) 3f (|d|) = 6f (|d|)
Finally, a total configuration of program p includes the control point and the state of its
tape. The number of these is bounded by (|p| + 1) 6f (|d|) cf (|d|) for all d where, for
example, c = 12|p| will do since
(|p| + 1) 6f (|d|) (2|p|)f (|d|) 6f (|d|) = (12|p|)f (|d|)
Since no state in p ` s0 s1 . . . st st+1 . . . can be repeated, the running time of p is
bounded by cf (|d|) . Thus A lies in time(cf (|d|) ).
2
21.6
For later usage in Chapter 26 (and for the sake of curiosity), we show that a number of
familiar functions can be computed in logarithmic space. The read-only Turing machine
has binary integers as inputs (multiple entries are separated by blanks), and is now
assumed equipped with a one-way write-only output tape to write function values.
Proposition 21.6.1 The following functions f : {0, 1} {0, 1} are Turing computable
in space log n:
. y, (x, y) . x y
1. (x, y) . x + y, (x, y) . x
2. (x, y) . x y
3. f (x1 , x2 , . . . xn ) = the same sequence sorted into nondecreasing order
Proof. Exercises 21.5, 21.6, 21.7.
Lemma 21.6.2 The following statements about a function f : {0, 1} {0, 1} are
equivalent, provided |f (d)| is bounded by some polynomial p(|d|) for all d:
329
Proof. The obvious approach is simply to compute g(x) and then apply f to this result.
Unfortunately this does not prove the theorem, because g(x) may occupy more that
k log n bits (for example, even if g is the identity function). The problem is that a
logspace f program cannot store all its input on a work tape, but is restricted only to
look at its input one symbol at a time. Our strategy is thus not to store g(x) explicitly
but rather virtually, using the result of Lemma 21.6.2. Let TM-program pf compute f ,
and assume program pg computes
(i, x) . the ith bit of g(x)
as in Lemma 21.6.2. We sketch the construction of a 6-tape Turing program r to compute
f (g(x)).
Tape number
1 (read-only input)
2
3
4
5
6 (write-only output)
Tape contents
x = a1 . . . an
Program pf s work tape
i = scan position on program pf s input tape
b = program pf s scanned input symbol from g(x)
Program pg s work tape
Program pf s output tape
*)
Finally, it must be seen that this code can be programmed on a Turing machine, and
that the resulting machine r works in logarithmically bounded space.
As to programming, command b := pg x i can be realized by modifying pg s program to use tape 5 as its work tape, and to take its input from tape 1 as long as it is
scanning the x part of its two-part input xBi, and to shift over to reading from tape 3
when reading from the i part.
As to rs space consumption, let n = |x|. Tape 4 is of constant size, and tape 5 is
pg s work tape on x and so is logarithmically bounded in n. The value of g(x), which is
pf s simulated input, must be bounded by some polynomial (n) by the running time
argument of Corollary 21.3.2. Thus 0 i 1 + (n), so tape 3 is logarithmically bounded
(assuming i to be represented in binary notation). Finally, tape 2 has length at most
k 0 log |g(x)| k 0 log((n)) = O(log n).
Tape 1 is not counted, and all 4 work tapes are logarithmically bounded. They can
all be combined into one work tape, also logarithmically bounded, which completes the
argument.
2
21.7
Very similar results to those seen earlier for time bounds can also be proven for space
bounds. The following is analogous to Definition 19.5.2.
Definition 21.7.1 Function f : IN IN is space-constructible if there is a TM program
f and a constant c > 0 such that for all n 0
TM
n
[[f]] (0n ) = bin(f (n)) and space TM
f (0 ) c f (n)
Many familiar monotone functions are space-constructible, e.g. all linear functions, all
polynomials, and f + g, f g, f g whenever f, g are time-constructible (Exercise21.8).
Theorem 21.7.2 For one-tape Turing machines: If f is space-constructible there exists
b > 0 such that pspaceTM (bf )\pspaceTM (f ) 6= .
Proof. The proof is very similar to that of Theorem 19.5.3 and so is just sketched here.
The technique used is again diagonalization to construct a program diag defining a set
A in pspaceTM (bf )\pspaceTM (f ) for suitable b.
331
There are, however, some differences. To begin with, we must assume that onetape Turing machine programs are encoded as strings over {0, 1} . The next step is to
construct a self-interpreter that uses such a description of a program by a string. This is
technically rather messy, and has been done in numerous books and articles, so we omit
the details.
The diagonalizing program diag is then a modification of the self-interpreter, just as
in Section 19.5.3. Program diag is constructed so that for any input p {0, 1} :
Proof. This is very similar in concept to the proof of Theorems 19.5.3 and 21.7.2.
Exercises
21.1 Prove Proposition 21.1.5.
333
References
The earliest work on space-bounded hierarchies is from 1965, due to Hartmanis, Lewis
and Stearns [64, 65]. Early results on sublinear space are found in papers by Savitch,
Meyer, Jones, and Jones, Lien and Laaser [157, 126, 84, 75, 80].
22
Nondeterministic Computations
A nondeterministic program is one that may guess, i.e. one whose next-state transition
relation is multivalued rather than a partial function, as has been the case hitherto. This
capacity may be added to any of the imperative computation models already seen by
adding a single instruction form `: goto `0 or `00 . Its semantics is to enlarge the state
transition relation of Figure 7.1 to also allow transitions
(`, ) (`0 , ) and
(`, ) (`00 , )
Correspondingly, one makes a while program nondeterministic by adding a choice command, for example
C
::=
choose C1 or C2
22.1
22.2
The problem is, given a directed graph G = (V, E, s, t) with edges E = {(u1 , v1 ),
(u2 , v2 ), . . .} and a source and target nodes s, t, to decide whether there exists a path
from s to t. The following nondeterministic WHILE program sketch assumes inputs s, t,
and that the graph G is given as a list ((u1 .v1 ) (u2 .v2 ) ...(un .vn )) in ID.
read S, T, G;
W := S;
while W 6= T do
(* Repeat until (if ever) T is reached
Copy := G;
while Copy do
(* This chooses an edge at random:
choose
Copy := tl Copy (* Either omit the first edge of Gs copy
or { Edge := hd Copy; Copy := nil };
(* or keep it
if
then
W = hd Edge
W := tl Edge;
write true
*)
*)
*)
*)
*)
*)
*)
22.3
Time and space usage are also interpreted angelically, taking the least possible values
over all accepting computations:
Definition 22.3.1 Given a computation C = p ` s1 s2 . . . st , its running time is
t (its number of states). The space usage of computation C is by definition |C| =
max{|s0 |, |s1 |, . . . , |st |}. The time usage (space usage) function of program p on input
d is the shortest length (minimum space) of any accepting computation:
timep (d) = min{t | p ` s1 st is an accepting computation on input d}
spacep (d) = min{|C| | C = p ` s1 st is an accepting computation on input d}
Definition 22.3.2 In the following, L- program p may be nondeterministic.
nptimeL
npspaceL
nlogspaceL
337
The symbol N in the classes above indicates nondeterminism. Note that, by definition
and in contrast to deterministic computation as defined before, if p fails to accept an
input d then it may enter an infinite loop (though it is not required to do so).
Proposition 22.3.3
ptimeL nptimeL , pspaceL npspaceL , and logspaceL nlogspaceL .
Proof. Immediate since every deterministic program is also nondeterministic, and uses
no more time nor space under the nondeterministic measure than under the deterministic
one.
2
Theorem 22.3.4 Aside from data encoding,
nptimeTM =nptimeSRAM =nptimeGOTO
npspaceTM =npspaceSRAM =npspaceGOTO =npspaceCM
nlogspaceTM =nlogspaceCM
Proof. The constructions seen earlier for deterministic programs can without modification
be applied to the nondeterministic ones.
2
Exercises
22.1 Prove that any set A {0, 1} that is accepted by a nondeterministic Turing machine p is recursively enumerable.
Hint: Let a choice sequence be a string cs = c1 c2 . . . cm {0, 1} . For each time step
t in ps computation, if the current instruction to execute is goto `0 or `00 , interpret ct
as advice on which branch to take: p should take branch `0 if ct = 0, else take branch `00 .
Consider the function (
f (d, cs) =
References
The earliest work on nondeterministic space-bounded computation is by Kuroda from
1964 [102], soon followed by Hartmanis, Lewis and Stearns [64, 65]. Edmonds explored
nondeterministic algorithms from a more practical viewpoint [42].
23
23.1
In this and following chapters many constructions start with a Turing machine program
p, deterministic or nodeterministic, that accepts a set A {0, 1} . These constructions
1 This
339
become technically more convenient if we can assume without loss of generality that
program p has been normalized so that acceptance of an input only occurs in a fixed
way, less general than as defined before, and so easier to manage in our constructions.
This is the content of
Proposition 23.1.1 For any Turing machine program p there is a program q = I1 . . . Im
such that for any d {0, 1}
1. p has a computation that accepts d if and only if q has a computation
Readin(d) = (0, 0 ) . . . (m, m ) (m, m ) . . .
where the work tape of m contains 1BB....
2. p has a computation that does not accept d if and only if q has a computation
Readin(d) = (0, 0 ) . . . (m 1, m1 ) (m 1, m1 ) . . .
where the work tape of m1 contains 0BB....
3. In the computations above, q first reaches configurations with label m or m 1
after using the same space as p on the same input, and time at most a constant
factor larger than that used by p on the same input.
Proof. First, let q be identical to p, but with instructions added at the end of its program
to clean up the work tape by writing blanks over all squares except for the answer (0
or 1), and then stopping there. Next, add to q the instructions
m-1: if 0 goto m-1;
m:
if 1 goto m
at its end, so q loops infinitely at control point m-1 if the answer is 0, else at control
point m.
Clearly the cleanup code costs no extra space, and uses time at most the length of
the nonblank part of ps work tape, which is of course bounded by ps run time. The
final code only adds a constant amount to time usage.
2
23.2
341
Definition 23.2.1 A concrete syntax for graphs. Graph G = (V, E, v0 , vend ) can be represented by listing its vertices, edges, and source and target as the following string over
the alphabet = {0, 1, [, ], (, ), }, where each vertex vi is represented by i as a binary
number:
0
23.3
The following apparently rather specialized problem will turn out to play a central role
in establishing several parts of the space-time complexity hierarchy.
Decision problem GAP (graph accessibility):
Input: a directed graph G = (V, E, v0 , vend ) as in the concrete syntax of Definition 23.2.1.
Output: true if G has a path v0 vend , else false.
We present no less than four algorithms for the problem. The first two are nondeterministic and use logarithmic space: one gives positive answers and the other, negative
answers. The others are deterministic. The third uses linear time, and linear space
as well; and the last runs in space O(log2 n). Each is expressed by giving an informal
procedure, after which its time or space usage on a Turing machine is analysed.
23.3.1
343
w := v0 ;
while w 6= vend do
choose an arbitrary node x with w x E;
w := x
write true
This straightforward nondeterministic program just guesses a path from v0 to vend . It
stores at most two vertices at any one time. Given r vertices in V , this alorithm requires
at most O(log r) bits of storage , which is at most O(log size(G)).
2
23.3.2
Surprisingly, the negation of this problem can also be solved within logarithmic space
using nondeterminism.
Theorem 23.3.2 The following set is in the class nlogspaceTM :
GAP = { G = (V, E, v0 , vend ) | graph G has no path from vertex v0 to vend }
Proof. Let G be a graph be as above. Let
ni = #{u | v0 i u}
be the number of nodes that can be reached from node v0 by a path of length at most
i. We will soon show how each ni can be computed. First, though, we show a nondeterministic algorithm which, assuming nr1 to be given in advance, can answer Nopath =
true iff G GAP. Consider the program sketch of Figure 23.1.
Assume that nr1 is given correctly. This program, for every node z, can either ignore
it, or guess that there exists a path from v0 to z. The next step is to see whether its
guess was correct, and to abort if the verification attempt fails2 . The number Count of
such verified guesses is counted. If it equals nr1 then every accessible node has been
examined.
In this case, the final value of Nopath is true if and only if there exists no path from
v0 to vend . In all other cases the program fails to terminate, so only correct answers are
ever produced.
The algorithm above uses several variables, each of value bounded by either a constant
or log r, and so runs in logarithmic space (assuming nr1 given in advance).
2 This
23.3.3
345
n := 1; i := 0;
repeat
(* Invariant here: n = ni *)
i := i + 1;
n := 0;
(* Search for all and only nodes u reachable *)
for u := 1 to r do
(* from v0 in i steps *)
Counter := n i; (* Find all nodes reachable in <i steps *)
Foundu := false;
for w := 1 to r do
(* Examine EVERY node w *)
choose
(* Guess w unreachable in <i steps *)
skip
or
(* Guess w reachable in <i steps
*)
if path v0 < i w
then Counter := Counter-1; (* w reached in <i steps *)
if w u then Foundu := true; (* If reachable *)
else abort;
if Counter 6= 0
then abort
(* Missed nodes reachable in <i steps
*)
if Foundu
then n := n + 1;
(* Another u reachable in i steps
*)
until i = r-1;
(* End of outermost loop *)
write n
Figure 23.2: Nondeterministic algorithm to compute nr .
the algorithms total run time is O(max(|V |, |E|)). A Turing machine implementation of
the algorithm takes time more than linear, but still a low-degree polynomial.
23.3.4
TM
k space (
k(log n)2 ).
Proof. Let G = (V, E, v0 , vend ). Correctness of the following algorithm is based on the
observation that x k y iff one of three cases holds: k = 0 and x = y; or k = 1 and
k
k
2
(x, y) E; or k > 1 and for some z V , both of x d 2 e z and z b 2 c y are true.
Algorithm Divide-and-conquer search.
This algorithm (Figure 23.4) uses recursion to decide whether there exists a path
from vertex i to vertex j of length at most `. Termination is ensured by dividing ` by
two at each recursive call. Space bound log2 r is understood to mean (log r)2 .
2
Space analysis: procedure Path can call itself recursively to a depth of at most O(log r),
as this is the number of times that r can be halved before reaching 1. The call stack
of traditional implementations thus has at most O(log r) stack frames, each containing 3
numbers between 0 and r (plus a return address, of constant size). Each number can be
347
represented in O(log r) bits, so the total storage requirement is at most O(log2 r) bits.
This bound is easily achieved on a Turing machine, by storing the call stack on its tape.
23.3.5
d;
length(d);
f(n);
{};
{};
(* Input size *)
(* Work tape space bound *)
(* No vertices initially *)
(* No edges initially *)
for ` := 1 to m+1 do
(* Compute the set of all vertices *)
for i := 0 to n+1 do
for j := 0 to z+1 do
forall strings w {0,1,B} with |w| z do
V := V {(`, i, j, w)};
write V;
forall c1 V do forall c2 V do (* Compute all edges *)
if c1 c2 by program p
then E := E {c1 c2};
write E;
v0 := (1,0,0,B); vend := (m,0,1,1); (* Initial and final *)
write v0 , vend ;
Figure 23.5: Build state transition graph.
Proof. First, configurations have form C = (`, i, j, W). Since p is f -bounded, their number
is at most (m + 1)(n + 2)(f (n) + 2)3 f (n) . This is O(g f (n) ) for appropriate g. Since f is
space constructible, step z := f(n); in Figure 23.5 can be performed in space f (n) and
so in time hf (n) for appropriate h.
The first nest of four loops takes time proportional to the number of configurations.
The second nest of two loops takes time at most quadratic in the number of configurations,
which only serves to increase the base of the exponent. The test if c1 c2 can be
done in time O(|c1| + |c2|).
Implementation of this algorithm on a Turing machine is straightforward. The only
effect of slow access to data stored on its tapes being to increase the value of c. This
completes the proof.
2
Lemma 23.3.7 If f is space constructible and f (n) log n for all n, then for a given
fixed program p there is a c such that for any d, graph Gp (d) can be constructed using
work space at most cf (|d|).
Proof. A slight modification of Construction 23.3.6 can be used. One change is that
instead of storing the vertices and edges of Gp (d) in memory, they are written on a
write-only output tape as they are constructed. Another is to find a way to avoid storing
all of V .
First, note that a single configuration C = (`, i, j, w) takes space at most
23.4
349
We have now done most of the work needed for the following result, which strengthens
that of Theorem 21.5.2.
Theorem 23.4.1 nspace(f ) time(cf ) for some constant c, if f is space constructible
and f (n) log n for all n.
Proof. Given p that runs in space f , Construction 23.3.6 yields its state transition graph
Gp (d) = (V, E, v0 , vend ) in time O(g f (n) ) for appropriate g, where n = |d|. We have shown
that p accepts d if and only if Gp (d) has a path from v0 to vend . This can be tested
by the depth-first graph searching algorithm of Section 23.3 in time polynomial in g f (n) ,
which is again exponential in f (n) (for example (g f (n) )k = (g kf (n) )).
2
Corollary 23.4.2 nlogspace ptime
Proof. ck log n = nk log c , so nspace(k log n) time(ck log n ) = time(nk log c ).
2
S
Theorem 23.4.3 nspace(f ) c space(c (f 2 )), provided f is space constructible and
f (n) log n for all n.
Proof. Suppose A nspace(f ) is accepted by program q. Let program p be as in
Proposition 23.1.1, and let Gp (d) be ps state transition graph. As observed before,
d A iff q accepts d, so d A iff Gp (d) has a path from v0 to vend . It thus suffices to
show that the existence of such a path can be tested within space (f (n)2 ), where n = |d|.
By Lemma 23.3.7 there is a c such that the function g(d) = Gp (d) can be constructed
using work space at most cf (|d|), and graph Gp (d) has at most r = cf (n) nodes. By the
result of Section 23.3.4, this graph can be tested to see whether a path from v0 to vend
exists in space O((log r)2 ). Finally
(log r)2 = (log(cf (n) ))2 = (f (n) log c)2 = (log c)2 f (n)2
Consequently the test for existence of a path from v0 to vend can be carried out in space
at most O(f (n)2 ).
2
Corollary 23.4.4 pspace = npspace
Proof. Left-to-right containment is immediate by definition. The opposite containment
follows from Theorem 23.4.3, since the square of any polynomial is also a polynomial. 2
23.5
An enigmatic hierarchy
For any k 1 we have limn k log2 n/n = 0, so by the hierarchy theorem for space
constructible bounds (Theorem 21.7.2), there exist problems in space(n) but not in
space(k log2 n) for any k, and so a fortiori not in nlogspace. Since n is certainly a
polynomial, there are problems in pspace but not in nlogspace.
2
An interesting and challenging fact is that, even after many years research, it is still
not known which of the inclusions above are proper inclusions. The undoubtedly bestknown of these several open questions is whether ptime = nptime, also known as the
P=NP? question.
Frustratingly, the result that nlogspace / pspace implies that at least one among
the inclusions
logspace nlogspace ptime nptime pspace
must be a proper inequality (in fact, one among the last three, since equality of all three
would violate nlogspace / pspace); but it is not known which ones are proper.
The gap in computational resources between, say, logspace and nptime seems to
be enormous. On the one hand, nptime allows both polynomially much time, and as
much space as can be consumed during this time, and as well the ability to guess. On
the other hand, logspace allows only deterministic program that move a fixed number
of pointers about, without changing their data at all. (This claim will be substantiated
in Section 24.1.)
Nonetheless, no one has been able either to prove that logspace = nptime, nor
to find a problem solvable in the larger class that is provably unsolvable in the smaller.
Many candidates exist that are plausible in a very strong sense, as will be seen in a later
chapter on complete problems, but the problems of proper inclusion remain open.
Theorem 23.5.2 If A nspace(f ) and f (n) log n is space-constructible, then A
nspace(c f ) for some c > 0, where A is the complement of A.
An enigmatic hierarchy
351
Exercises
23.1 Estimate the running time of the graph searching algorithm of Lemma 23.3.4. 2
23.2 Estimate the running time of the state transition graph-searching algorithm of
Theorem 23.3.2.
2
23.3 Prove carefully that GAP nlogspace.
23.4 Estimate the running time of the logspace algorithm of Theorem 23.3.2 for deciding membership in GAP.
2
References
The backbone hierarchy presented here is the result of work by many researchers. These
include the first works on space- and time-bounded computation by Hartmanis, Lewis and
Stearns [64, 65]; the role of nondeterminism as seen in theory and practice by Kuroda and
Edmonds [102, 42]; Savitchs pathbreaking works on logspace computation and reduction
plus later results by Meyer, Stockmeyer, Jones and others [157, 126, 84, 75, 80]; and
Immerman and Szelepcsenyis answers in 1987 to Kurodas question of 23 years before
[166, 71].
24
24.1.1
To establish the first point above, we show that the following all define the same decidable
problems (on inputs from ID01 for GOTOro programs):
Turing machine programs that run in space k log(|d|) for some k.
Read-only counter programs in which each counter is bounded in value by |d|, or a
polynomial in |d|, or even restricted so that no counter may be incremented.
GOTOro programs.
Frotr programs.
Proofs are by a series of lemmas progressing from GOTOro programs to the logspace
counter-length bounded machines of Corollary 21.3.1.
Lemma 24.1.3 A {0, 1} is decidable by a CM\C:=C+1 program iff A is decidable by a
GOTOro program.
Lemma 24.1.4 If A {0, 1} is decidable by a CMvalue(n) program then A is decidable
by a CM\C:=C+1 program.
Lemma 24.1.5 If A {0, 1} is decidable by a CMlogspace program then A is decidable
by a CMvalue(n) program.
355
24.1.2
We must now prove the three Lemmas. The following is an easy result on very limited
counter machines:
Proof. Lemma 24.1.3: we must show that any CM\C:=C+1 program p is equivalent to some
GOTOro program program, and conversely. Input to a CM-program is a string a1 a2 ...an ,
corresponding to input list
(an an1 ...ak ...a1 ) ID01
(using Lisp list notation) for a GOTOro-program. Each ai is nil or (nil.nil).
Suppose we are given a CM\C:=C+1 program p. Its counters Ci can only assume values
between 0 and n. Thus any Ci with value k can be represented by a GOTOro program
variable Xi which points to sublist (ak ...a1 ) (and to the nil at the end of the input list,
in case k = 0).
Counter command Ci := Cj can obviously be simulated by Xi := Xj. Command
Ci := Ci.
-1 can be simulated by Xi := tl Xi (recall that tl(nil) = nil). Command if
Ci = 0 goto ` else `0 can be simulated by if Xi goto `0 else ` (the test is reversed
since counter value 0 corresponds to the end of the list, which has list value nil = false).
Command if inCi = 0 goto ` else `0 can be simulated by if hd Xi goto `0 else `
(the test is again reversed since symbol 0 is coded as nil = false).
Conversely, suppose that we are given a GOTOro-program p and the input list
(an an1 ...ak ...a1 ) ID01 . We assume n > 0; a special case can be added to give the
correct answer if n = 0.
The variables X of p can only point to: one of three things: 1) a position (ai ...ak ...a1 )
within the list with i 1; or 2) the root of (nil.nil), encoding some ai = 1; or 3) the
atom nil.
Thus variable X may be represented by two counter variables X1, X2. In case 1) X1
has i 1 as value. In case 2) X1 has value 0 and X2 has value n. In case 3) both variables
have value 0.
Counter code to maintain these representation invariants is straighforward to construct, by enumerating the possible forms of GOTOro commands.
2
Proof. Lemma 24.1.4: we must show that any CMvalue(n) program p is equivalent to some
program q without C := C+1. All counters are by assumption bounded by n, so we need
not account for overflow. Recall that counter C0 is initialized to the length n of the
input.
We can simulate C := C+1 (without addition!) by using an auxiliary variable Tem
and exploiting the instruction Tem := C0 which assigns input length n to Tem. Let the
initial value of C be i.
The following works in two phases: first, variable Tem is initialized to n, and then
C and Tem are synchronously decremented by 1 until C = 0. Thus Tem ends at n i, at
which point it is decremented once again, to n i 1. For the second pass C is reset
to n, and Tem and C are again synchronously counted down until Tem = 0. Once this
happens, C is i + 1 = n (n i 1), as required. Note that if C = n, the effect is to leave
C unchanged.
Tem := C0;
while C 6= 0 do
{C := C-1; Tem := Tem-1};
Tem := Tem - 1;
C := C0;
while Tem 6= 0 do
{C := C - 1; Tem := Tem -
(* Tem := n *)
(* Tem := n i and C := 0 *)
(* Tem := n i 1 *)
(* C := n *)
(* C := i + 1 by decreasing Tem to 0 *)
1};
2
Proof. Corollary 24.1.5: We must show that any CMlogspace program p is equivalent to
some CMvalue(n) program q. We do this in two stages.
Representation of an n2 -bounded CM counter by a fixed number of 2n-bounded
counters. Consider the traditional enumeration of pairs of natural numbers:
{(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2), (0, 3), . . .}
357
as described in Appendix A.7. We represent any one counter Cz with value z by two
counters Cx, Cy with values x, y such that z is the position of the pair (x, y) in this
enumeration. Note that
z = (x + y)(x + y + 1)/2 + y = (x2 + 2xy + y 2 + x + 3y)/2
so 0 x2 , y 2 2z. Thus x, y 2n if z n2 .
Each CM operation on Cz is simulable by operation on Cx, Cy as in Figure 24.1. For
example, Cz:=Cz+1 involves moving Northwest one position along a diagonal unless x = 0,
in which case one moves to the start of the next diagonal.
We showed earlier that without loss of generality one may assume that test if InC = 0
goto ` else `0 is only performed when the value i of C satifies i n. This is harder, as it
involves reconstructing i from the representation Cx, Cy of C. First, Cx and Cy are copied
into Dx, Dy, giving representation of a variable we could call D. By manipulating Dx and
Dy the loop decrements D until it reaches 0 or n decrements have occurred, meanwhile
counting variable R up by 1 at each iteration. The net result is to set R to i = min(i, n),
and that input position is then tested.
This reduces the counter bound from n2 to 2n; the technique below can be used to
reduce this further to n.
2
The development above supports the intuition that logspace is precisely the class of all
problems solvable by read-only programs, which may move any fixed number of markers
around their input, but cannot use any other form of storage. The characterization by
GOTO programs is particularly elegant, although one has a suspicion that such programs
will take extra time due to the complexity of backing up to inspect an already-seen
input.
Representation of one 2n-bounded CM counter C by several n-bounded counters. We represent C containing x by counters Under and Over, where Under contains
min(x, n), and Over contains 0 if x n and x n otherwise. Each CM operation on C is
simulable as in Figure 24.2. Variable N is counter C0, initialized to the input length n
(again assumed to be posititve).
24.1.3
Wadlers treeless transformer, when applied to any of a quite useful class of firstorder programs, will automatically yield a linear-time equivalent program which builds
operation on C
Cz := Cz+1
Cz := Cz-1
if Cz 6= 0 goto `
if InC =
6 0 goto `
Simulation on C1, C2
if Cx 6= 0 then {Cx := Cx-1; Cy := Cy+1}
else {Cx := Cy+1; Cy := 0}
if Cy 6= 0 then {Cx := Cx+1; Cy := Cy-1}
else {Cy := Cx-1; Cx := 0}
if C1 6= 0 or C2 6= 0 then goto `
R := 0; S := C0; Dx := Cx; Dy := Cy;
while S 6= 0 and not(Cz1 = Cz2 = 0) do
{R := R+1; Code for D := D-1}
if InR 6= 0 goto `
Simulation
if Under = n
then Over := Over+1 else Under := Under+1
if Over = 0
then Under := Under-1 else Over := Over-1
if Over 6= 0 or Under 6= 0 then goto `
if InUnder 6= 0 goto `
24.2
359
We now prove that ptime is identical to the set of problems solvable by cons-free programs with recursion. This is analogous to the intrinsic characterization of logspace,
without reference to time or storage bounds.
24.2.1
globalvariables U1,...,Uu;
procedure P1; localvariables P11,...,P1v;
1:I1 2:I2 ... i:Ii
procedure P2;
1:J1 2:J2
.....
procedure Pm;
1:K1 2:K2
localvariables P21,...,P2w;
... j:Jj
localvariables Pm1,...,Pmx;
... k:Kk
24.2.2
As a first step we use the flow chart implementation of GOTO using arrays, as in Section
17.2 of Chapter 17. An example appears in Figure 17.5.
Lemma 24.2.2 Given a GOTO-program p = 1:I1 2:I2 ...m:Im and an input d ID01 .
Let (`1 , 1 ) . . . (`t , t ) . . . be the (finite or infinite) computation of p on d, where
`1 = 1 and 1 is the initial DAG for input d. Then for any t 0 and variable X the
equations in Figure 24.4 hold.
Proof. A simple induction on t, using the definitions from Figure 17.4.
Theorem 24.2.3 If V ID01 is decidable by a (recursive or nonrecursive) WHILEprogram p in polynomial time, then V is decidable by a CMlogspace+rec -program.
Proof. Suppose one is given a WHILE-program p that runs in time f (n) where f is a
polynomial, and an input d. The various functions Instrt , Hdt , Tlt , Xt are computable by
mutual recursion, at least down to t = n + 3 (the time used to build the initial DAG as in
Section 17.2.2). Further, the values of Hdt , Tlt for t = 0, 1, . . . , n + 2 are determined solely
by the program input d, and easily computed.
Regard each equation in Figure 24.4 as a definition of a function of one variable t.
This is always an integer, between 0 and f (n) + n + 3 where n = |d|.
The calls all terminate, since in each call the value of argument t decreases. Now t
is bounded by the running time, which is a polynomial in the size of d, hence p can be
simulated by a recursive counter machine with polynomial size bounds on its counters.
l0 : Il0
l0 : I 0
l
Instrt+1 =
l00 : Il00
l + 1: Il+1
Hdt+1
if Instrt = l:
if Instrt = l:
if Instrt = l:
otherwise
goto l
if X goto l else l00 and Xt 6= 0
if X goto l else l00 and Xt = 0
Yt
0
if Instrt = l:
otherwise
X := cons Y Z
Zt
0
if Instrt = l:
otherwise
X := cons Y Z
if Instrt =
if Instrt =
if Instrt =
if Instrt =
otherwise
X
X
X
X
Tlt+1
Xt+1
Yt
Hd(Yt )
= Tl(Yt )
t+1
Xt
l:
l:
l:
l:
361
:=
:=
:=
:=
Y
hd Y
tl Y
cons Y Z
Proof. (Sketch.) This is done by tabulation. Suppose we are given an F+ro program p,
and an input d0 = (a1 . . . an ) ID.
The idea is to collect a set M F G1 of triples of forms (f, , ) or (f, , d), where f is
the name of a function defined in p, is a tuple of arguments to f, and d ID. These
signify the following.
1. (f, , ) M F G: function f appearing in program p has been called, with argument
tuple . Computation of the value of f() is not yet finished.
2. (f, , d) M F G: function f appearing in program p has been called, with argument
tuple , and the value f() = d has been computed.
Since p is cons-free, the value that assigns to any variable X must be a pointer to some
part of d0 . There are at most n of these, and so there exist at most 2m nk+1 possible
triples in M F G, where m is the number of functions defined in p.
The simulation algorithm:
1. M F G := {(f1, [X1 7 (d0 )], )}, where the first function in p is f1 and has argument
X1.
2. Repeat steps 3 through 9 until M F G cannot be changed.
3. Pick a triple (f, , ) M F G, and find the definition f(X1,...,Xn) = Exp in program p.
4. Attempt to evaluate Exp with X1,...,Xn bound to the values in .
5. If the value of a call g(Exp1,...,Expm) is needed in order to evaluate Exp, try to
evaluate the arguments Exp1,...,Expm to yield a tuple 0 .
6. If argument evaluation fails, then abandon the current attempt to evaluate Exp.
7. If argument evaluation succeeds and M F G contains a triple (g, 0 , d0 ), then continue
to evaluate Exp with d0 as the value of the call g(Exp1,...,Expm).
8. If argument evaluation succeeds but M F G contains no triple (g, 0 , d0 ) with d0 ID,
then perform M F G := M F G {(g, 0 , )}, and abandon the current attempt to
evaluate Exp.
9. If evaluation of Exp with X1,...,Xn bound to the values in succeeds with result
value d, then replace (f, , ) M F G by (f, , d) M F G.
10. If (f, [X1 7 (d0 )], d) M F G, then [[p]](d0 ) = d, else [[p]](d0 ) = .
1M F G
363
M F G is used for two purposes while simulating program p. The first is as an oracle,
from which to fetch values of already computed function applications, rather than recomputing them. The second is as a repository in which the triple (f, , d) is placed
every time a new fact f() = d has been established. If this happens, the triple (f, , )
(which must already be in M F G) is replaced by the new (f, , d).
This process is repeated until M F G cannot be increased. If one ever adds a triple
(f1, [X1 7 d0 ], d), then we know that [[p]](d0 ) = d, and the computation stops. The entire
algorithm can be made terminating, since there exists only a polynomially bounded
number of possible triples to put in M F G.
Interestingly, the same technique also works if p is nondeterministic, and the method
applies as well if the functions are replaced by relations.
2
Further developments. Cook [30] proved similar results in the framework of auxiliary push-down automata. Further developments involving efficient memoization led to
the result that any 2DPDA (two-way deterministic pushdown automaton) can be simulated in linear time on a RAM ([27, 76, 6]). This in turn led to efficient pattern-matching
algorithms, in particular the Knuth-Morris-Pratt string matcher an interesting case
where investigations in pure theory led to a practically significant algorithm.
An interesting open problem. The results above can be interpreted as saying that,
in the absence of cons, functional programs are capable of simulating imperative ones;
but at a formidable cost in computing time, since results computed earlier cannot be
stored but must be recomputed. In essence, the heap can be replaced by the stack,
but at a high time cost.
It is not known, however, whether this cost is necessary. Proving that it is necessary
(as seems likely) would require proving that there exist problems which can be solved in
small time with general storage, but which require large time when computed functionally.
A simple but typical example would be to establish a nonlinear lower bound on the
time that a one-tape, no-memory two-way pushdown automaton [30] requires to solve
some decision problem. One instance would be to prove that string matching must take
superlinear time. We conjecture that such results can be obtained.
Exercises
24.1 Prove the missing part of Theorem 24.1.2. (Note that two inclusions need to be
established.)
2
24.2 Prove that it is possible to construct from any GOTOro program an equivalent WHro
program, and vice versa. (You may appeal to constructions seen earlier.)
2
24.3 Prove that it is possible to construct from any GOTO program an equivalent Fro
program.
2
24.4 Try to show how to construct from any Fro program an equivalent GOTOro or WHro
program. Reflect on the results of your attempt.
2
24.5 Assume that WHro programs are allowed a command write X whose effect is to
extend a write-only output string by 0 in case the value of X is nil, and to extend it by
1 otherwise. The output string is initially empty.
Denote by x the binary representation of number x, as a list of bits written in reverse
order, i.e. least significant bit first. Write a WHro program which, when given input
(x y), will write out x + y.
2
24.6 Assume WHro programs have outputs as described in the previous exercise. Write
a WHro program which, when given input (a1 a2 . . . an ) where each ai {0, 1}, will write
out its reversal (an an1 . . . a1 ).
2
References
Both of the main results of this chapter have been seen before in other forms.
It has long been a folklore theorem that logspace consists of exactly to the sets
decidable by a multihead, two-way read-only Turing machine. The result of Theorem
24.1.7 implies this, since such a Turing machine is essentially identical to a CMvalue(n)
program. Our result is a bit stronger since Theorem 24.1.7 can be read as saying that
the Turing machine could be restricted only to move its heads right, or to reset them
back to the start of the input tape.
More than 25 years ago Cook [27] used a somewhat different framework, auxiliary
push-down automata, to characterize ptime. In essence this is very close to our proof
that ptime equals the sets decidable by Frotr-programs, the main difference being that
365
our recursive programs have an implicit call stack in place of Cooks nonrecursive automata with an explicit single stack.
In comparison to these classical results, our program-oriented version seems more
natural from a programming viewpoint (both appear in [88], which sums up the results
of this chapter). In particular, the results are still of considerable interest as regards
relationships between time and space, or the power of cons in a functional language.
Part V
Complete Problems
25
25.1
Introduction
369
3. Even more interesting: Many natural and practically motivated problems have been
proven to be complete for one or another complexity class C.
25.1.1
Forms of reduction
The idea of reduction of one problem to another has been studied for many years, for
example quite early in Mathematical Logic as a tool for comparing the complexity of
two different unsolvable problems or undecidable sets. Many ways have been devised to
reduce one problem to another since Emil Posts pathbreaking work in 1944 [143].
A reduction A B where (say) A, B ID can be defined in several ways. First, the
reduction may be many-one: one shows that A B by exhibiting a total computable
function such that for any d ID we have d A if and only if f (d) B. Clearly, an
algorithm for deciding membership in B can be used to decide membership in A. (A
concrete example will be given shortly.) A stronger version is one-one, in which f is
required to be injective.
An alternative is truth-table reducibility, where one answers a question x A? by
asking several questions y1 B, . . . , yk B?, and then combining the truth values of
their answers in some preassigned way. Yet another variant is Turing reducibility, where
question x A? gives rise to a dialogue: a whole series of questions about membership
in B. The first question depends only on x. The second question (if any) can depend
both on x and the response (positive or negative) to the first question; and so forth. The
chief requirement on such a reduction is that the series is required to terminate for every
x and answer sequence.
If computability is being studied, the only essential requirement is that the reduction
be effective. Complexity classifications are naturally involve bounds on the complexity
of the questions that can be asked, for example of the function f used for many-one
reducibility. In order to study, say, the class nptime using many-one reducibility, it is
natural to limit ones self to questions that can be computed by deterministic algorithms
in time polynomial in |x|.
25.1.2
Appendix Section A.1 describes graphs, and boolean expressions and their evaluation.
We use the term CNF to stand for conjunctive normal form.
Definition 25.1.1
Introduction
371
Three combinatorial decision problems. Following are three typical and interesting problems which will serve to illustrate several points. In particular, each will be seen
to be complete, i.e. hardest, problems among all those solvable in a nondeterministic
time or space class. The problems:
GAP
{ (G, v0 , vend )
CLIQUE
{ (G, k)
SAT
{F
25.1.3
In this and the following chapters, we prove problems complete for various classes using
a novel approach. Supose we are given a decision problem H that we wish to show
complete for complexity class C. The most intricate part is usually to show that H is
Incidence matrix:
1 2 3 4 5
1 0 1 0 1 1
2 1 0 1 0 1
3 0 1 0 1 0
4 1 0 1 0 1
5 1 1 0 1 0
25.2
Before comparing problem complexities, we have to address a question: can the way a
problem is presented significantly affect the complexity of its solution? For one example,
a number n can be presented either in binary notation, or in the much longer unary
notation, such as the list form niln used before. Another example is that a directed or
2 Showing
373
CLIQUE algorithm for one representation would immediately imply the same for any of
the other representations.
From this viewpoint the most sensible problem representations are all equivalent, at
least up to polynomial-time computable changes in representation. The question of representation becomes trickier when one moves to lower complexity classes, and especially
so for linear time computation.
Recent work by Paige on the reading problem [137] shows that data formed from
finite sets by forming tuples, sets, relations, and multisets can be put into a canonical and
easily manipulable storage form in linear time on an SRAM3 . This ensures the independence
of many standard combinatorial algorithms from the exact form of problem presentation.
25.3
which, when given any CNF boolean expression F, will yield a pair f (F) = (G, k) such
that graph G has a k-clique if and only if F is a satisfiable expression.
This implies that CLIQUE is at least as hard to solve as SAT in polynomial time:
given a polynomial time algorithm p to solve CLIQUE, one could answer the question
is F satisfiable? by first computing f (F) and then running p on the result.
Construction 25.3.1 Given a conjunctive normal form boolean expression F = C1
. . . Ck , construct a graph f (F) = (G, k) where graph G = (V, E) and
1. V = the set of occurrences of literals in F
2. E = {(a, b) | a and b are not in the same conjunct of F, and neither is the negation
of the other}
For an instance, the expression
(A B) (B C) (A C)
3 The term pointer machine is sometimes used but imprecise, as argued in [10]. By most definitions,
the programs obtained by compiling GOTO programs into SRAM code are all pointer programs.
375
' $
' $
q
q
B P
LS PPP
B
PP
LS
PP
PP
L S
PPPqC
A
q L S
&
%
&
%
S
L
Z
Z L S
Z
S
L
Z
Z
S
L
#
S
L ZZ
L ZS
Sq
Lq
Z
A
C
"
!
Figure 25.2: The graph f ((A B) (B C) (A C)).
would give graph f (F) as in Figure 25.2. The expression F is satisfied by truth assignment [A 7 f alse, B 7 f alse, C 7 true], which corresponds to the 3-clique {A, B, C}.
More generally, if F has n conjuncts, there will be one n-clique in f (F) for every truth
assignment that satisfies F, and these will be the only n-cliques in f (F).
It is also possible to show that CLIQUE SAT, but by a less straightforward
ptime
25.3.1
AA
A B and B C implies A C
A B and B C implies A C
A B and B D implies A D
Reduction is reflexive
Reduction is transitive
C is downwards closed under reduction
D is downwards closed under reduction
4 For example we could have C = ptime and D = nptime. Generally, C and D will be two classes for
which we know that C D, but we do not know whether the inclusion is proper.
'
$
H
> ]
'
$
J
J
J
D A
JB
C
&
&
%
%
377
Many-one reductions
A common classification technique is by so-called many-one reduction functions. A function f that reduces A to B has the property that x A iff f (x) B for all x. Thus
the question is x A? can be answered by first computing f (x), and then asking is
f (x) B? Provided f is essentially simpler to compute than the problem of deciding
membership in A, this shows a way that answering one problem can help to answer
another.
Definition 25.3.6 Given a class F ns of total functions f : , define
A B if and only if f F ns(x . x A if and only if f (x) B)
F ns
The general idea is that F ns is a class of easy reduction functions, that can be used to
classify complex problems by reducing one to another. An example would be the function
f used to reduce SAT to CLIQUE in the example seen earlier.
Lemma 25.3.7 is a C, D-classifier, provided
F ns
F ns
F ns
rec
ptime
logs
Theorem 25.3.9 Consider the list of problem classes logspace, nlogspace, ptime,
nptime, pspace, rec, re.
1. is a rec, re-classifier
rec
2.
is a ptime, D-classifier for any D appearing later in the list than ptime.
ptime
3. is a logspace, D-classifier for any D appearing later in the list than logspace.
logs
The following shows that the complement of a complete problem is also complete, provided the class it is in is closed under complementation.
Theorem 25.3.10 Suppose that D is closed under complementation, meaning A D
implies \ A D. If is a C, D-classifier and problem H is -complete for D, then
\ H is also -complete for D.
Proof. Since H is -complete for D it is in D, which implies \ H is also in D. Note
that by completeness of H we have ( \ H) H. Further, it is immediate that A B
if and only if ( \ A) ( \ B) for any A, B. This implies H ( \ H).
To show hardness, consider an arbitrary problem A D. Then A H by hardness
of H and so A ( \ H) by transitivity of reduction. Thus \ H is -complete for D.
2
25.3.2
It may seem surprising that complete problems exist at all for our various complexity
classes. Interestingly, most of the classes mentioned before (excepting linear time) have
natural and interesting complete problems. The following chapters will discuss several in
detail
Existence of complete problems
Given a class D and an appropriate notion of problem reduction , a first question to ask
is whether or not D has at least one -complete problem, say, H. This can be technically
difficult since it involves showing that any problem A in D can be reduced to H, i.e. that
H is hard for D. The other part, showing that H D, is often (though not always)
fairly straightforward.
The usual way to show H to be -hard for D is to begin with an arbitrary Turing
machine (or other) program p that decides a problem A within the resource limits that
379
define D, and to show how, given an arbitrary p-input d, to construct a value f (d) such
that d A f (d) H. If f defines a -reduction, the task is completed since one has
shown A H for any A D.
A powerful and general way to prove the existence of such problems is to make
variants of the set accepted by the universal programs seen before (for instance we will
see that the halting problem HALT is complete for the recursively enumerable sets).
While this proves the existence of complete problems, the problems obtained this way
are often, however, somewhat unnatural and unintuitive. An example will be seen below
for nondeterministic linear time in Section 25.6.
Showing other problems complete
Once the existence of one -complete problem for D has been established, other problems
can be shown complete by Proposition 25.3.5: If H is -complete for D, and H B,
and B D, then B is also complete for D. This is usually much simpler since it does not
involve reasoning about arbitrary programs in a computation model. The technique has
been used extensively since Cooks pathbreaking work proving the existence of problems
-complete for nptime. Several hundred problems have been shown complete for
ptime
nptime and for ptime. Relevant books include [52] and [56].
However for this approach to be useful it is necessary that problem H be well-chosen:
simply stated, and such that H can easily be reduced to many interesting problems. It is
for this reason that the problems SAT and GAP have taken prominent roles within the
classes nptime and nlogspace, respectively. We will see similarly archetypical problems
for both ptime and pspace.
We begin with two examples: one obtained by a universal construction, and one
obtained from the state transition graphs used earlier.
25.4
Theorem 25.4.1 The following set is -complete for the class re:
rec
A re. By Theorem 5.7.2, A re implies that there exists a GOTO-program p such that
A = dom([[p]]). Thus for any d
d A if and only if [[p]](d) 6= if and only if (p.d) HALT
Thus A HALT by the (obviously recursive) reduction function f (d) = (p.d).
rec
We conclude that HALT is a hardest problem among all recursively enumerable problems. Further, for each problem X shown undecidable in Chapter 10, either X or its
complement is -complete for re:
rec
Theorem 25.4.2 The following sets are -complete for the class re:
rec
1.
2.
3.
4.
5.
6.
Proof. Chapter 8 showed that HALT HALT-2CM, and Chapter 10 had proofs that
rec
HALT X for each remaining set X in this list. By Theorem 25.4.1, A HALT for any
rec
rec
A re, so by Theorem 25.3.9, HALT X implies A X for any A re. Thus each of
rec
rec
rec
25.5
Theorem 25.5.1 The following set is -complete for the class nlogspace:
logs
381
Proof. Let G = (V, E, v0 , vend ) be a given graph with designated start and finish vertices v0 , vend and vertex set V = {v1 , . . . , vr }. Note that r size(G) for any natural
representation.
First, GAP nlogspace by Theorem 23.3.1. We now need to show that if A
nlogspace then A GAP. Let A = Acc(p) where p is a nondeterministic TMro-program
logs
nlogspace.
Proof. First, it is in nlogspace: Given regular grammar G = (N, T, P, S), build a graph
with edge from A to B whenever there is a production A ::= xB. Then L(G) 6= iff
there is a path from S to some C where C ::= x with x T is a production in P . In
Section 23.3 we saw that graph searching could be done by a nondeterministic algorithm
in logarithmic space. The graph built has size no larger than that of the grammar, so
this shows that the nonemptiness problem for regular grammars is in nlogspace.
Conversely, since the graph accessibility problem GAP is complete for nlogspace
it suffices by Proposition 25.3.5 to reduce GAP to the regular nonemptiness problem.
Given a graph accessibility problem instance (G, v0 , vend ), construct a grammar with
start symbol v0 , productions A ::= B for all edges A B of G, and a single terminal
production vend ::= . This regular grammar will generate the set {} if G has a path
from v0 to vend , and if there is no such path.
2
The following are immediate from Theorem 25.3.10.
Corollary 25.5.4 The following set is -complete for the class nlogspace:
logs
nlogspace.
25.6
We now show that nlintime has a hardest problem with respect to linear-time reductions. This problem is a variant of the set accepted by the universal program u; one of
the complete problem sources mentioned in Section 25.3.2.
A nondeterministic universal program
By definition A nlintime iff A is accepted by a nondeterministic program p which
runs in time bounded by a |d| for some a and all d ID.
Recall the universal program u of Chapter 4. Construct a universal program nu for
nondeterministic programs by extending the STEP macro by adding two rules to interpret
the instruction choose C1 or C2, as follows. These can be implemented simply by using
a choose instruction in the interpreter itself.
Code
(choose C1 C2).Cd
(choose C1 C2).Cd
Comp. stack
St
St
Value
Vl
Vl
Code
C1.Cd
C2.Cd
Comp. stack
St
St
Value
Vl
Vl
It is easy to see that nu is efficient as we have defined the term, and that (p.d) is accepted
iff p accepts d.
Definition 25.6.1 f : ID ID is linear time and size computable if there are a, b, p such
that f = [[p]], and time p (d) a |d|, and |f (d)| b |d| for all d ID.
Definition 25.6.2 Let L, M ID. Then L is reducible to M (written L M ) iff there
ltime
is a linear time and size computable function f such that d L iff f (d) M for all d in
ID.
Further, P ID is complete for nlintime iff P nlintime and L P for all L
ltime
nlintime.
Lemma 25.6.3
383
Proof. This is essentially the same as for Lemma 25.3.7. It is immediate that the
identity function is linear time computable. Further, the composition of any two linear
time computable functions is also linear time computable. Note that the size condition
|f (d)| b |d| for some b and all d ID is needed for this: Functions computable within
linear time without a size limit are not closed under composition since they can build
values exponentially larger than their argument, for example by repeatedly executing
X := cons X X.
2
Lemma 25.6.4 L M and M lintime implies L lintime.
ltime
Exercises
25.1 Prove that SAT CLIQUE, i.e. that the reduction described before can be done
logs
in logarithmic space.
25.2 Prove that GAP
path from v0 to vend . Show that GAP1 is also -complete for the class nlogspace.
logs
Hint: given graph G = (V, E), the triple (G, v0 , vend ) is in GAP iff G has a path from
v0 to vend . This path can have at most n 1 vertices, where n = |V |. Show how to
0
construct from G an acyclic graph G0 = (V 0 , E 0 ) that has a path from v00 to vend
iff G
has a path from v0 to vend .
2
25.3 Prove that CLIQUE
ptime
of all of all triples (G, S, k) such that S is a subset containing k of Gs nodes, such that
every edge of G includes a node from S as an endpoint. Hint: consider the complement
graph G, with the same vertices but with all and only the edges that are not edges of G.
2
25.4 Prove two parts of Theorem 25.3.9.
References
The approach used in Chapters 25 through 28, of reducing arbitrary computations to
computations of programs using only boolean variables, was first used (to our knowledge)
in [55], and was an organizing theme of two papers by Jones and Muchnick [81, 82].
The concepts of many-one reduction (and several other reductions) stem from recursive function theory. They are very clearly explained in Posts 1944 paper [143], which
also shows in essence that HALT is -complete for RE.
rec
independently in 1972 [106, 104], but this was unrecognized for several years due to its
terseness and inaccessibility. Cooks and Levins results were not widely remarked until
385
Karp showed that a great many familiar combinatorial problems are also complete for
nptime [95], at which time wide interest was aroused.
Somewhat earlier in 1970, Savitch had in essence shown that the GAP problem is
-complete for nlogspace in [157]; and in 1971 Cook had proved the path problem
logs
to be -complete for ptime [26]. In 1972 Meyer and Stockmeyer proved some problems
logs
6 This
was done independently by Cook, and by Meyer and Stockmeyer too, at around the same time.
26
The fact that SAT is complete for nptime is perhaps the most important result in
theoretical Computer Science. The technical breakthrough was Cooks realization that
questions about Turing machine computations could be expressed in terms of formulas
in propositional logic, i.e., boolean expressions.
As a convenient stepping-stone we will first reduce questions about Turing machine
computations to questions about boolean programs, and then reduce questions about their
computations further to ones about propositional logic. Although the questions we ask
are similar to the halting problem and so are all undecidable for Turing machines, boolean
programs have decidable properties since their entire state spaces can be computed.
In this and the next two chapters we prove theorems relating well-known complexity
classes to properties of boolean programs, and then use them as a basis to show several standard combinatorial, logical and linguistic problems to be complete for various
complexity classes.
Definition 26.0.7 (The language BOOLE and sublanguages.)
1. A boolean program is an input-free program q = I1 . . . Im where each instruction I
and expression E is of form given by:
I
::=
E
X
::=
::=
X | true | false | E1 E2 | E1 E2 | E | E1 E2 | E1 E2
X0 | X1 | . . .
2. Language SBOOLE (sequential BOOLE) is identical, except that programs may not
contain goto `.
3. Language MCIRCUIT (monotone BOOLE) has a very limited instruction format:
I
::=
X := Y | X := true | X := Y Z | X := Y Z
all variables are assigned to false in the initial store. Since there is no input, rather
than the form [[q]](d) used until now we instead write [[q]] and [[q]] to indicate that
the computation by q does or does not terminate; and notation [[q]] to denote the value
computed by q: the value stored by the last assignment done by program q, assuming
[[q]] ([[q]] is undefined if [[q]]).
26.1
Definition 26.1.1 The length |q| of a BOOLE program is the number obtained by counting one for every operator :=, ;,. . . , appearing in q, and adding 1 + dlog(i + 1)e for
each occurrence of a variable Xi in q.
This is consistent with assuming that variable Xi is represented by a tag (such as the
var used earlier) together with the value of i in binary notation.
Lemma 26.1.2 An SBOOLE program always terminates, and can be executed in time
polynomial in its length.
Proof. Immediate by a simple algorithm slightly extending that of Section 3.4.2.
Lemma 26.1.3 Let Turing machine program p run in polynomial time (n) on inputs
of length n. There is a logspace computable function f : {0, 1} {SBOOLE programs}
such that for any d {0, 1} :
SBOOLE
= true
Index range
1 ` m+1
(n) i (n)
389
T1 : a1 ;. . . ; Tn : an ;
Tn+1 : B; . . . ; T(n) : B;
T0 : B; T1 : B; . . . ; T(n) : B;
L1 := true; Accept := false;
STEP;(n)
Accept := Accept;
Input d = a1 . . . an on squares 1, 2, . . . , n;
blanks to the right; and
blanks to the left.
Start at I1 ; p has not yet accepted d.
Run p for (n) steps.
Accept = true iff p eventually accepted d.
for
for
for
for
Turing instruction: I`
Boolean sequence I`
right
left
write a
if a goto `0 else `00
RIGHT((n));
LEFT((n));
T0 : a;
L` := false;
Figure 26.1 contains (n) copies of the SBOOLE code STEP, which simulates a single step
of ps computation on d. Its
STEP =
form:
if L1 then I1 else
if L2 then I2 . . .
if Lm then Im else Accept := true
Lemma 26.1.4 Consider the state of Boolean program q just after the t-th execution
of the instructions in STEP, where 0 t (n). Then L` will be true for exactly one `,
and for each i with (n) i (n), Boolean variable Tai will be true for exactly one a.
Proof. Recall that any uninitialized variable has start value false. The result is immediate for t = 0 by the way q was constructed. Further, examination of the cases in Figure
26.2 shows that these properties are preserved in going from t to t + 1.
Lemma 26.1.5 (Correctness of simulation) Let p ` (`1 , 1 ) . . . (`r , r ) be the computation by TM-program p on input d. For any 0 t r let t = L S R where b0 b1 . . . br = SR
and br . . . b1 = L. Then SBOOLE program q, after executing STEP for the t-th interation,
b
will be in a state such that L`t = true, and Tt i = true for any i with r i r.
Corollary 26.1.6 At the end of execution, q will assign true to Accept if and only if
p accepts d.
Lemma 26.1.7 q = f (d) is constructible from d in space O(log |d|).
Proof. Construction of q begins by generating the initialization code for Ti , L1 and
Accept, done with one loop over i = (n), . . . , (n). The code for one occurrence of
STEP has O((n)) SBOOLE instructions, and each Turing machine instruction code I` , for
` = 1, . . . , m, can be generated in time O((n) log n). The STEP code is replicated (n)
times, followed by generation of Accept:=Accept. These several loops all involve indices
i, t that lie between (n) and (n) and so can be stored is space O(log n).
Lemma 26.1.2 implies that SbooleComp is in ptime. By that result and the hardness
property just shown, we have proven the following:
Theorem 26.1.8 The following set1 is -complete for ptime:
logs
26.2
391
Programs in the subset MCIRCUIT of BOOLE are so simple that they can be regarded as
circuits. We now show that these too yield a decision problem complete for ptime.
The key is to show that SBOOLE programs can be reduced still further, by eliminating
if and . The following proves this, plus another result we will use later (in Theorem
26.3.3): that no variable is assigned in two different commands.
Lemma 26.2.1 There is a logspace computable translation from any SBOOLE program
p to another q such that [[p]] = [[q]] and q has each of the following properties:
1. The right sides of assignments in q are all of form X, true, X Y, X Y, or X,
where X, Y are variables; and the if instruction tests only variables.
2. Property 1 plus: q has no if instructions, so it is just a sequence of assignments.
3. Properties 1 and 2 plus: no right side of any assignment in q contains .
4. Properties 1, 2 and 3 plus: q has the single-assignment property.
Proof. We prove these accumulatively, in order. Item 1 is quite straightforward by
adding extra assignments to simplify complex expressions (using methods seen before),
and expressing the operators and in terms of and . Any occurrence of false
can be replaced by an uninitialized variable.
Items 2 and 3 are less trivial. For item 2, suppose p = I1 . . . Im is an SBOOLE program.
We define an if-free translation I of each instruction I, and set program q to be:
Go := true; I1 ...Im
The translation I is given below. Variable S below is to be chosen as a new variable
for each occurrence of if in I, but the same Go is used for all of q. We have used
expressions with more than one right-side operator for readability, but they can obviously
be eliminated.
if U then I else J
I; J
X := E
=
=
=
Correctness is based on the following claim: For any simple or compound instruction
I, its translation I
has exactly the same effect as I on variables assigned in I, provided Go is true
when its execution begins, and
has no effect at all on variables assigned in I, provided Go is false when its execution begins.
First, it is easy to see that the translation of X:=E will make no change to X if variable Go
is false, and that it will effectuate the assignment X:=E if variable Go is true. Second,
if Go is false when execution of instruction if U then I else J or instruction I; J
begins, then it will remain false until its end, so I has no effect.
Third, assuming Go is initially true, the translation of if U then I else J will
execute the translations of both branches I and J, and in that order; and it will also set
Go to true in the branch to be executed, and to false in the other branch.
For item 3: Instructions of form X := Y can be eliminated by a straightforward
program transformation. The device is to represent every p variable X by two complementary variables, X0 and X00 , in its translation p0 . The idea is that each will always be the
negation of the other, and the value of X in p is the value of X0 . This property is ensured
at the start by prefixing the code of p0 by instructions X00 := true for every X occurring
in p (since all are initially false). The last step is to show how each of ps instructions
can be simulated while perserving this invariant representation property. This is easy, as
seen in the table of Figure 26.3. Variable Tem is used in the last line so an assignment X
:= X will not go wrong.
Instruction in p
X
X
X
X
X
:=
:=
:=
:=
:=
true
Y
Y Z
Y Z
Y
Translation in p0
X0 := true;
X0 := Y0 ;
X0 := Y0 Z0 ;
X0 := Y0 Z0 ;
Tem := Y0 ;
X00
X00
X00
X00
X0
:= Freshvariable
:= Y00
:= Y00 Z00
:= Y00 Z00
:= Y00 ; X00 := Tem
393
Logspace computability. This is straightforward; since logspace computable functions are closed under composition we need only argue that each individual transformation can be done in logarithmic space.
Item 1 is not difficult; the only trick is to use counters to keep track of expressions
nesting level (a fully parenthesized concrete syntax should be used). Item 2 is also
straightforward one must just assign unique S variables, which can be done by indexing
them 1, 2, etc. Item 3 can be done in two passes. Pass one finds all variables X in p,
and to generate X00 :=true for each to prefix the translation. Pass two translates each
instruction as described above.
Finally, item 4 (variable renaming): each instruction Ii in a given p = I1 . . . Im is an
assignment; denote it by Xi :=Ei . There may, however, be several assignments with the
same left side Xi = Xj even though i 6= j. Transformation: replace every Ii by Xi :=Ei
where X1 ,. . . ,Xm are new variables, and Ei is identical to Ei except that reference to any
Y in Ei is replaced as follows:
Trace backward from Ii until you first find an instruction Ii = Y:=..., or the
program start.
If Ii = Y:=... is found then replace Y by Xi , else leave Y unchanged.
This can be done using three pointers: one for the current Ii , one for tracing backward,
and one used to compare variables for equality.
2
Theorem 26.2.2 The monotone circuit value problem is -complete for ptime:
logs
for any problem A ptime. The construction just given implies SbooleComp MCV,
logs
26.3
Proof. HORN is in ptime: Consider the following simple marking algorithm. It is easy
to verify that it runs in polynomial (quadratic) time. The HORN problem can, in fact,
be solved in linear time on a pointer machine [40, 10].
395
Algorithm. Given (H, A), begin with every boolean variable being unmarked. Then
for each Horn clause A1 A2 . . . Ak A0 H with unmarked A0 , mark A0 if all of
A1 A2 , . . . , Ak are marked; and repeat until no more variables can be marked.
Clearly the algorithm works in time at most the square of the size of H. Correctness
is the assertion that H ` A iff A has been marked when the algorithm terminates (it
is clear that it does terminate since no variable is marked more than once). For if,
we use induction on the number of times the algorithm above performs for each Horn
clause. . .
Note that all axioms will be marked first, and these are trivially provable from H.
Now consider A1 A2 . . . Ak A0 in H, and suppose a mark has just been placed on
A0 . By the inductive assumption each left side variable is provable, so the right side will
also be provable (by Definition 26.3.1). In this way every provable variable will eventually
be marked, so if A has been marked when the algorithm terminates, then H ` A.
Similar reasoning applies in the other direction (only if), using induction on the
number of steps in a proof. The base case where A is an axiom is immediate. Assume
H ` A by a proof of n+1 steps whose last step uses Horn clause A1 , A2 . . .Ak A0 H.
By induction all of A1 , A2 . . . , Ak have been marked, so A0 will be marked if not already
so. Thus every variable that is provable from H will get marked.
HORN is hard for ptime: Suppose A ptime is decided by TM program p in polynomial time. For a given input d, consider the single-assignment MCIRCUIT program q
constructed from p, d in the proof of Theorem 26.2.2. It had the property that d A iff
[[q]] = true.
Construct from this a Horn problem H which has
1. An axiom X for every assignment X := true in q.
2. A clause Y X for every assignment X := Y in q.
3. A clause Y Z X for every assignment X := Y Z in q.
4. Clauses Y X and Z X for every assignment X := Y Z in q.
Exercise 26.3 is to show that this construction can be done in logarithmic space. Letting
A be the last variable assigned in q, the following Lemma 26.3.4 shows (H, A) has a
solution if and only if [[q]] = true.
2
Remark: the single-assignment property is essential for this, since there is no concept of
order in the application of Horn clauses to deduce new boolean variables. If applied to
an arbitrary monotone straightline BOOLE program, H can deduce as true every X that
the program makes true, but it could possibly also deduce more than just these, since
it need not follow qs order of executing instructions.
Lemma 26.3.4 Let q = 1:X1 :=E1 . . . m:Xm :=Em , and let
q ` (1, 0 ) . . . (m + 1, m )
be qs computation where 0 (X) = false for every q variable X. Then for every i [0, m]
we have H ` Xi if and only if i+1 (Xi ) = true.
Proof. This is by induction on i. Assume the statement holds for all k with 0 k < i m,
and consider the form of the ith instruction Xi :=Ei . If it is Xi :=true then i+1 (Xi ) =
true and H ` Xi since Xi is an axiom. Suppose the ith instruction is Xi :=Xj ; then
j < i by the single-assignment property as established in Lemma 26.2.1. By induction
j+1 (Xj ) = true iff H ` Xj .
One direction: suppose i+1 (Xi ) = true. This implies i (Xj ) = j+1 (Xj ) = true. Then
by induction H ` Xj , which implies H ` Xi by clause Xj Xi . The other direction:
suppose H ` Xi . This can only be deduced from clause Xj Xi because of qs singleassignment property, so H ` Xj holds, and j+1 (Xj ) = true by induction. Again by qs
single-assignment property, this implies i (Xj ) = true = i+1 (Xi ).
The other two cases are very similar and so omitted.
2
26.4
397
6=
logs
logs
a pair (H, B), construct a context-free grammar G whose nonterminals are the boolean
variables appearing in H, with start symbol B, and which has productions:
A ::= (the empty string)
A ::= A1 A2 ...Ak
if A is an axiom in H
if A1 A2 . . . Ak A H
where V ars is the set of boolean variables appearing in (H, A), and Axioms is the set of
clauses of form B in H, and
M
= {(A0 , A1 . . . Ak A0 ) | A1 . . . Ak A0 H}
{(A1 . . . Ak A0 , Ai ) | 1 i k}
In words: a position for player 1 is a variable, and a position for player 2 is a clause in
H. A move for player 1 from position A is to choose a clause implying A, and a move
for player 2 from A1 . . . Ak A0 is to choose a premise Ai to prove.
It is easy to verify (Exercise 26.4) that position A is winning for player 1 if and only
if A is deducible from H, i.e. G = (V ars, H, M, Axioms) GAME iff (H, A) HORN.
Further, it is easily seen that G is constructible from H in logarithmic space.
2
26.5
There seems to be a clear gap between those problems that are easy to solve using parallelism, and problems that are complete for ptime. A sketch follows, although parallelism
is outside the scope of this book.
The class nc (standing for Nicks Class) is the set of all problems that can be
solved on inputs of size n in time O(logk n) (very fast), provided one is given a number of
processors that is polynomial in n, and that these can communicate instantaneously (a
rather liberal assumption). Analogous to identifying ptime with the class of all feasibly
solvable problems, many researchers identify nc with the class of all problems that have
efficient parallel solutions. While the identification is not perfect, it gives a starting point,
and has been used in many investigations.
The classes logspace and nlogspage are easily seen to lie within nc, which certainly
lies within ptime. On the other hand, if any problem that is -complete for ptime lies
logs
in nc, then ptime = nc, that is all polynomial-time solvable problems have fast parallel
solutions. This would be a remarkable result, comparable in its significance to showing
that ptime = nptime.
Thus to show that certain problems are hard to parallelize, it suffices to show that
they are complete for ptime. This property is often used in the literature, and is a major
motivation of the book [56]. More details can be found in that book, or in the one by
Papadimitriou [138].
399
Exercises
26.1 Prove correctness of the translation of Lemma 26.2.1, using induction on program
length.
2
2
26.3 Complete the proof of Theorem 26.3.3 by showing that function f is computable
in logarithmic space.
2
2
26.4 Fill in the missing details of the proof of Proposition 26.4.4.
References
The first problem known to be -complete for ptime was Cooks path problem, delogs
scribed in [26]. The circuit value problem was proven complete for ptime by Goldschlager
in 1977 [55]. The remaining problems in this chapter were proven complete by Jones [79].
The book by Greenlaw, Hoover, and Ruzzo [56] has a very large collection of problems
complete for ptime, with particular emphasis on parallel computation.
27
In Chapter 26 we showed the Horn clause deducibility problem to be complete for ptime.
Hardness was proven by steps whose net effect is to reduce acceptance of an input by a
deterministic polynomial time Turing machine program to provability of a goal by a set
of Horn clauses. A variation on this construction is used to prove the central result of
this chapter: that SAT is -complete for nptime.
logs
2. Build from p and d an SBOOLE program q such that [[p]] (d) = true iff [[q]]BOOLE =
true.
Conclusion: Problem SbooleComp is -complete for ptime. Completeness of Horn
logs
[[Init; q]]
27.1
SBOOLE
= [[Init; q1 ]]
SBOOLE
= [[Init; q2 ]]
= true
and so complete.
1 By Proposition 26.3.2, Horn clause deducibility is a special case of nonsatisfiability, and so Horn
clause nondeducibility is a special case of satisfiability. By Theorem 25.3.10 Horn clause nondeducibility
is also complete for ptime, so phase twos result naturally extends that for ptime.
403
(n)
L1 := true; I ; . . . ;I
; Answer := Lm+1
Exactly the same construction from Figure 26.2 can be used, plus the following translation
of the nondeterministic choice instruction:
TM Instruction I`
goto `0 or `00
This is clearly a deterministic program. Construct q from q0 by prefixing it with instructions X:=false for every variable in q, except the oracle variables Ot . Clearly
{O0 , O1 , . . . , O(n) } includes all variables not assigned in q.
Now q has the property that TM program p has a terminating computation on input
d if and only if [[Init;q]] = true for some initialization assignment sequence Init. (This
was not true for q0 since its construction relied on the falsity of unassigned variables.)
An example initialization sequence:
Init = O0 := true; O1 := false; . . . ; O(n) := true;
Correctness of q. First, if [[Init;q]] = true for some initialization sequence Init, then
the accepting computation by Init;q clearly corresponds to an accepting computation
by p on d. Now consider any computation (there may be many) by p on d:
p ` (1, B d) (`t1 , t1 ) (`0t1 , t0 1 ) . . . (`tr , tr ) (`0tr , t0 1 ) . . .
where (`t1 , t1 ), (`t2 , t2 ), . . . is a list of all states (`t , t ) such that I`t has form goto `0
or `00 . For each such ti , let Init contain assignment Oti := true if branch `0ti is taken in
2 From
Chapter 22: a TM program which may also have instructions of form goto `0 or `00 .
27.2
Expression F, to be built from p and d, will have the same boolean variables as in the
previous section, plus new boolean oracle variables Ot , one for each point in time, i.e.,
polynomially many new variables.
A choice by p to transfer control to `0 at time t will amount to setting Ot = true
in F, and a transfer to `00 will amount to setting Ot = f alse in F. Each Ot is called an
oracle variable, since the choice of a satisfying truth assignment (Definition 25.1.1) for
F, in effect, predetermines the sequence of choices to be taken by program p, just as
the initialization sequence Init of the preceding section. The values of variables Ot will
not be uniquely determined: Since p may have many different computations on the same
input d, some accepting and others not, there may be many satisfying truth assignments.
F will be constructed from q very similarly to the way H was built, but a few
additional clauses will be needed to be certain that F can be satisfied only in ways
that correspond to correct computations.
Theorem 27.2.1 SAT is -complete for nptime.
logs
Proof. First, SAT nptime by a simple guess and verify algorithm. Given a boolean
expression F, a nondeterministic program can first select a truth assignment , using the
instruction goto ` or `0 to choose between assigning true or false to each variable.
Then evaluate (F) in polynomial time by Lemma 26.1.2. If true, accept the input, else
dont. All this can certainly be done by a nondeterministic polynomial time computation.
The next task is to show that A SAT for any set A ID01 in nptimeTM . This is
logs
done by modifying the construction of a Horn clause program from a TM program seen
before; details appear in the following section. After that, correctness and space usage
of the construction will be established.
2
27.2.1
405
Lemma 27.2.2 Let p be a nondeterministic TM program running in time g(|d|) for any
input d ID where g(n) is a polynomial. Then for any input d there exists a 3CNF
boolean expression3
F = C1 C2 . . . Ct
which is satisfiable iff p can accept d. Further, F can be constructed from p and d in
space O(log |d|).
Proof. Begin with SBOOLE program q from Lemma 27.1.2. Apply a construction in
Lemma 26.2.1 to q to construct an equivalent SBOOLE program q1 without conditional. Its
instructions can only have forms X:=true, X:=false, X:=Y, X:=Y, X:=YZ, or X:=YZ.
Next, apply another construction from Lemma 26.2.1 to q1 to construct an equivalent
single-assignment no-if, no-goto SBOOLE program q2 = I1 I2 . . . Im . Finally, construct
boolean expression
F = I1 . . . Im
from q2 , where each I is defined as follows:
BOOLE Instruction I
X := true
X := false
X := Y
X := Y
X := YZ
X := YZ
Clauses I
X
X
YX
Y X
(Y X) (Z X)
YZ X
3CNF equivalent
X
X
YX
YX
(YX) (ZY)
(YZX)
Expression F does not have the form of a set of Horn clauses because the negation
operator appears in two places. Further, we are asking a different question, satisfiability,
rather than Horn clause deducibility.
3 Meaning of 3CNF: each C is a disjunction of at most three literals. See Appendix Section A.1 for
i
terminology if unfamiliar.
27.3
Thousands of problems have been shown complete for nptime. For a large selection,
see [52]. Many of the first problems shown complete for nptime concern graphs, as
indicated by the following selection. However there is a wide variety in nearly all areas
where combinatorial explosions can arise. For historical reasons we now sometimes write
vertex where node has been used other places in the book; but the meaning is exactly
the same.
Corollary 27.3.1 The CLIQUE problem is -complete for nptime.
logs
Proof. First, CLIQUE nptime by a simple algorithm. Given graph G and number k,
just guess a subset of k of Gs vertices and check to see whether every pair is joined by
an edge of G. This takes at most quadratic time.
Second, we saw in Construction 25.3.1 how the SAT problem can be reduced to
CLIQUE. It is easy to see that the construction can be done in logarithmic space, so
SAT CLIQUE. By Proposition 25.3.5, CLIQUE is also -complete for nptime. 2
logs
logs
407
-complete for
logs
nptime:
Given: an undirected graph G = (V, E) and a number k.
To decide: is there a subset S V with size k such that every edge in E has an endpoint
in S?
Proof. It is clear that VertexCover is in nptime by a simple guess-and-verify algorithm.
Second, we show CLIQUE VertexCover which by the previous result and Proposition
logs
The reduction is as follows, given a CLIQUE problem instance (G, k) (does G = (V, E)
have k mutually adjacent vertices?). Construct the complement graph G = (V, E 0 )
where E 0 = {(v, w) | (v, w V, v 6= w, (v, w)
/ E}. Let n be the number of vertices in V .
Claim: C is a k-element clique of G if and only if S = V \ C is a (n k)-element vertex
cover of G. Assume C is a k-clique. An arbitrary edge (v, w) of G connects two distinct
vertices and is not in E. Thus at least one of v or w must not be in C, and so must be
in S \ C. Thus every edge has an endpoint in S, so S is an (n k)-element vertex cover
of G.
Now assume S is an (n k)-element vertex cover of G and v, w are any two distinct
vertices of C. If (v, w) were an edge in E 0 then one of them would be in S = V \ C. Thus
(v, w) is an edge in E, so C is a clique.
Thus (G, k) CLIQUE iff (G, n k) VertexCover. Further, (G, n k) can be constructed from (G, k) in logarithmic space, so CLIQUE VertexCover.
2
logs
Theorem 27.3.3 The following problem, called SetCover, is -complete for nptime:
logs
Si =
j=k
[
Sij
j=1
4 For example, by listing each as a string {v , . . . , v }, using binary integers to denote the various
m
1
elements vi .
Exercises
27.1 Prove Theorem 27.1.1. Hint: for hardness, show that SAT NonTrivial.
logs
27.2 Complete the proof of Theorem 27.2.1 by showing that function f is computable
in logarithmic space.
2
27.3 Verify the equivalence stated in Theorem 27.3.3.
References
Many thousands of combinatorial and other problems have been proven complete for
nptime. A wide-ranging survey may be found in the book by Garey and Johnson [52].
28
28.1
=
=
Lemma 28.1.2 Boolean programs can be executed in space at most a polynomial function of their length. Further, termination can be decided within polynomial space.
Proof. A simple interpreter slightly extending that of Section 3.4.2 can execute an arbitrary BOOLE program. Let the interpreted program p have m labels and k boolean
variables. Clearly m |p| and k |p|. The interpreter can store the current state (`, )
of p using O(log |p|) bits for the control point and one bit for the current value of each
program variable, O(|p|) bits in all.
This naive interpreter will of course loop infinitely if the interpreted program does
so. It can be modified always to terminate as follows. Observe that program p can enter
at most m 2k configurations without repeating one, and so going into an infinite loop.
Modify the interpreter to maintain a binary counter c consisting of r = kdlog me
boolean values (initially all false), and increase this counter by 1 every time an instruction
of p is simulated. If c becomes 2r 1 (all trues) then the interpreter stops simulation
and signals that p has entered an infinite loop. This is sufficient since 2r m 2k .
Clearly the modified interpreter uses O(|p| log |p|) space.
Theorem 28.1.1 in essence says that if boolean programs can be executed in polynomial time, then ptime = pspace. To show that BooleComp is hard for pspace we
409
BOOLE
= true
for some polynomial (n) and all d {0, 1} . For a fixed p we show how to construct
from d a Boolean program q = f (d) as desired. Without loss of generality we may assume
that p accepts d if and only if ps computation on d terminates at program control point
m+1.
Figure 28.1 shows the structure of the BOOLE program q that simulates ps computation on d. It uses the notation and the STEP macro from Chapter 26. The correctness
argument is just the same as that of Lemma 26.1.5 (the only difference is the use of the
WHILE loop in Figure 28.1.)
T1 : a1 ;. . . ; Tn : an ;
Tn+1 : B; . . . ; T(n) : B;
T0 : B; T1 : B; . . . ; T(n) : B;
L1 := true; Accept := false;
while not Accept do STEP;
Accept := true;
Input d = a1 . . . an on squares 1, 2, . . . , n;
blanks to the right; and
blanks to the left.
Start at I1 ; p has not yet accepted d.
Run p until it terminates.
Accept = true iff p eventually accepted d.
for pspace.
The following variant is a bit simpler, and so will be used in some later reductions to
prove problems hard for pspace.
411
28.2
::=
|
::=
X | true | false | E1 E2 | E1 E2 | E | E1 E2 | E1 E2
X . E | X . E
X0 | X1 | . . .
We say that E is closed if every variable X is bound, i.e. lies within the scope of some
quantifier X.E or X.E. The value of a closed quantified boolean expression E is either
true or false.
An expression of form X.E has value true if both E+ and E- have value true, where
E+, E- are obtained from E by replacing every unbound occurrence of X in E by true,
respectively false. Expression X.E has value true if E+ or E- have value true (or both),
and expressions E1 E2 , etc. are evaluated by combining the values of their components
in the usual way for boolean expressions.
2
Theorem 28.2.2 The set QBT of true quantified boolean expressions is -complete
logs
for pspace.
Proof. First, it should be clear that truth of a quantified boolean expression can be
established in linear space, by an algorithm that enumerates all combinations of values true, f alse of its quantified variables, and combines the results of subexpressions
according to the logical operators and quantifiers in E. This requires one bit per variable.
We next show Bterm QBT, so QBT is -complete for pspace by Theorems
logs
logs
1: X := true;
2: if Y goto 4 else 3
3: X := false
and similarly for the other forms.
One-step simulation We start out by constructing a quantified boolean expression
Nx(~X, ~L, ~X0 , ~L0 ) where ~X stands for the sequence X1 , . . . , Xk , ~L stands for L1 , . . . , Lm+1 , and
similarly for their primed versions. The expression will be such that
p ` (`, [1 7 v1 , . . . , k 7 vk ]) (`0 , [1 7 v10 , . . . , k 7 vk0 ])
if and only if
Nx(v1 , . . . , vk , f alse, . . . , true, . . . , f alse, v10 , . . . , vk0 , f alse, . . . , true, . . . , f alse)
evaluates to true, where the first sequence of truth values has true in position ` only, and
the second has true in position `0 only. Intention: L` = true (L0` = true ) if the current
control point (next control point) is instruction I` .
Some auxiliary notation: if vectors ~U, ~V have the same length s, then ~U ~V stands
for (U1 U1 ) . . . (Us Us ). Similarly, if I {1, 2, . . . , s}, then ~U I ~V stands for
V
iI (Ui Ui ). Finally, two more abbreviations, where we write [r, s) for {i | r i < s}:
Lab(`) stands for
Lab0 (`) stands for
V
L` i[1,`)(`,k] Li
V
L0` i[1,`)(`,k] L0i
goto `0
Xi := true
Xi := false
413
The size of this expression is clearly polynomial in m + k, and it is also evident that it is
logspace computable with the aid of a few counters bounded by k or m.
P (a, b)
P (a, b)
Claims: first, expression P 2 (a, b) will be true if and only if there exists a sequence
i1
a1 , a2 , . . . , a2i such that P 2 (ai , ai+1 ) holds for every i [1, 2i ). Second, the size of the
i
expression P 2 (a, b) is O(i+s) where s is the size of expression P (a, b), since each doubling
of the exponent only adds a constant number of symbols to the previous expression.
Now let r = dk log(m + 1)e, so 2r (m + 1)2k (the number of configurations p can
r
enter without looping). Consider quantified boolean expression Nx2 (~X, L~0 , X~0 , L~0 ). Value
2r is large enough so that if program p can go from state represented by (~X, ~L) to the
state represented by (X~0 , L~0 ) by any sequence of transitions, then it can do so in at most
2r transitions.
Consequently p terminates iff its start transition can reach one with control point
m + 1 within 2r steps. Thus [[p]] = true iff the following quantified boolean expression
is true1 :
r
~X~L . [~X false L1 ~L (1,m+1] false] Nx2 (~X, ~L, X~0 , L~0 ) L0m+1
Finally, a size analysis: by the argument above about P (a, b), the size of boolean expresr
sion Nx2 (...) is of the order of r times the size of Nx(~X, L~0 , X~0 , L~0 ). The latter has been
argued to be polynomial in the size of program p, so the total is polynomially bounded.
The final step, logspace computability of the reduction, is Exercise 28.4.
2
1 The
28.3
Theorem 28.3.1 The totality problem REGALL for regular expressions (is L(R) = ?)
is -complete for pspace.
logs
REGNOTALL is in pspace.  Given regular expression R over alphabet Σ, the property L(R) ≠ Σ* can be decided in linear space as follows. First, construct an NFA
M = (Q, Σ, m, q0, F) (nondeterministic finite automaton, see the appendix) such that
L(M) = L(R). This can be done so the size of M is linear in the size of R [3].

Then apply the usual subset construction [3] to define a DFA (deterministic finite
automaton) MD accepting the same set L(R) = L(M) = L(MD). Note that MD may
have a number of states exponential in the size of M, since each state is a subset of the
states of M.

The property L(MD) ≠ Σ* holds if and only if there is some path from the automaton's initial state {q0} to a nonaccepting state. As seen before, this can be done by a
nondeterministic search through MD's transition graph, storing at most one graph node
at a time (it is not necessary to build all of MD first). The natural way to represent a
state of automaton MD is by storing one bit for each M state, that is, as a bit vector of
size O(|R|). Thus the nondeterministic search can be done in at most linear space.

This shows the problem L(R) ≠ Σ* is in nspace(n), and so in pspace by Theorem
23.4.3.
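As an illustration only (not part of the proof), the search can be rendered deterministically in a few lines of Python: it explores the subset-construction DFA on the fly, never materializing MD in full, and reports whether some reachable subset contains no accepting state. The NFA representation used (a dict from (state, symbol) to sets of states, no ε-edges) is an assumption made for this sketch; run deterministically it may of course take exponential time, whereas the proof uses a nondeterministic linear-space search.

    from collections import deque

    def nfa_not_total(alphabet, delta, start, accepting):
        # L(M) != Sigma* iff some subset reachable from {start} in the
        # subset-construction DFA contains no accepting NFA state.
        initial = frozenset([start])
        seen, queue = {initial}, deque([initial])
        while queue:
            subset = queue.popleft()
            if not (subset & accepting):        # some string is rejected
                return True
            for a in alphabet:
                nxt = frozenset(q for s in subset for q in delta.get((s, a), set()))
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return False                            # every reachable subset accepts

    # Example: an NFA for a* over {a, b}; it rejects any string containing b.
    print(nfa_not_total({'a', 'b'}, {(0, 'a'): {0}}, 0, {0}))   # True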
REGNOTALL is hard for pspace.  We prove Bterm ≤logs REGNOTALL. Suppose
we are given a BOOLE program p = I1 ... Im with variables X1, ..., Xk. Without loss of
generality we may assume every instruction in p is of the form X := true, X := false,
or if X goto ℓ else ℓ′.
We will show how to construct a regular expression Rp over Σ = {#, 0, 1, t, f} which
generates all sequences that are not terminating computations by p. Thus L(Rp) = Σ*
iff p does not terminate (which implies every string in Σ* is a noncomputation), so p ∈
Bterm iff Rp is in REGNOTALL.
Represent a configuration C = (ℓ, [1 ↦ b1, ..., k ↦ bk]) by the following string C̄ over
alphabet Σ, of length m + 1 + k:

    C̄ = 0^{ℓ-1} 1 0^{m+1-ℓ} b̄1 ... b̄k

where b̄i = t if bi = true and b̄i = f if bi = false, for i = 1, ..., k. A computation trace
will be a string over alphabet Σ:

    Tracesp = { #C̄1# ... #C̄t# | p ⊢ C1 → ... → Ct and
                C1 = (1, [1 ↦ false, ..., k ↦ false]) and Ct = (m + 1, [...]) }
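To make the encoding concrete, here is a small Python sketch of the string C̄ and of a trace (function and variable names are mine, chosen only for this illustration):

    def encode_config(label, store, m, k):
        # 0^(label-1) 1 0^(m+1-label) followed by t/f for each variable.
        assert 1 <= label <= m + 1 and len(store) == k
        return ('0' * (label - 1) + '1' + '0' * (m + 1 - label)
                + ''.join('t' if b else 'f' for b in store))

    def encode_trace(configs, m, k):
        # A trace #C1#C2#...#Ct# over the alphabet {#, 0, 1, t, f}.
        return '#' + '#'.join(encode_config(l, s, m, k) for (l, s) in configs) + '#'

    # A two-instruction program (m = 2) with one variable (k = 1):
    print(encode_trace([(1, [False]), (2, [True]), (3, [True])], m=2, k=1))
    # prints #100f#010t#001t#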
Claim: for each BOOLE program p there is a yardstick regular expression Rp such that

1. L(Rp) = Σ* \ Tracesp
2. Rp is constructible in space O(|p|)
3. Rp = R1 | R2 | R3 | R4 where the Ri behave as follows:

    L(R1) = Σ* \ #[(0|1)^{m+1}(t|f)^k #]*        Wrong format
    L(R2) = Σ* \ #1 0^m f^k # Σ*                 Wrong start
    L(R3) = Σ* \ Σ* #0^m 1 (t|f)^k #             Wrong finish
    L(R4) = ...                                  Some Ci ↛ Ci+1
(Table: the regular expression Eℓ for each form of instruction Iℓ: goto ℓ′, Xi := true, Xi := false.)
A generalization: regular expressions with squaring.  Suppose the class of regular expressions is enriched by adding the operator R², where by definition L(R²) =
L(R)L(R). The totality problem for this class (naturally called REG²ALL) can by
essentially similar methods be shown complete for ∪c space(2^{cn}).

The ability to square makes it possible, by means of an extended regular expression
of size O(n), to generate all noncomputations of an exponentially space-bounded Turing
machine. Intuitively, the point is that an expression (...(Σ²)²...)² of size O(n) generates all
strings in Σ* of length 2^n, so the yardstick m + k + 1 used above can be made exponentially long by an extended regular expression of length O(n). This allows generation of
all noncomputations by an exponential-space Turing machine by a linear-length regular
expression with squaring.
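In equation form, the point about the yardstick can be summarized as follows (writing Σ also for a regular expression generating all one-symbol strings):

    L(R²) = L(R)L(R),  so with R0 = Σ and Ri+1 = Ri²:  L(Rn) = Σ^{2^n}  and  |Rn| = O(|Σ| + n)

that is, n nested squarings generate exactly the strings of length 2^n, while the expression itself grows only linearly in n.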
28.4 Game complexity
Blindfold games. Games such as Battleship, Kriegspiel (blindfold chess), and even
card games are based on imperfect information: No player is fully aware of the total game
state. Again, it might be expected that their complexity is higher. It is shown in [77]
that the one-token game shown complete for ptime in Theorem 26.4.4 becomes complete
for pspace in its natural blindfold version. The technique used is a simple reduction
from REGALL.
Exercises
28.1 Prove Corollary 28.1.4: the problem of deciding a Boolean program's termination
is complete for pspace.
28.2 Construct regular expressions for R1 , R2 , R3 without using set complement \. Give
bounds on their lengths in relation to the size of program p.
28.3 Prove that the membership problem for context-sensitive grammars is complete for
pspace.
28.4 Prove that the quantified boolean expression of the proof of Theorem 28.2.2 can
be built in logarithmic space.
References
The technique of reducing computations by arbitrary programs to ones using only boolean
variables was used extensively by Jones and Muchnick in [81, 82]. Completeness for
pspace of the REGALL and QBT problems is due to Meyer and to Stockmeyer [126, 165].
Part VI
Appendix
A.1
Boolean algebra
         pronounced         arity     precedence   associativity
  ¬      not                unary     5
  ∧      and                binary    4            left
  ∨      or                 binary    3            left
  ⇒      implies            binary    2            left
  ⇔      if and only if     binary    1            left
A.1.1
When we want to determine the truth value of a boolean expression, we must specify
how the variables in the expression are to be interpreted. To this end we let θ be a
truth assignment mapping boolean variables to truth values. If all the boolean variables
occurring in an expression E are in the domain of θ, then we define the value of E under
the truth assignment θ to be the result of applying the function eval : truth assignments
→ boolean expressions → truth values given by

    eval θ E =  true,      if E is true
                false,     if E is false
                θ(E),      if E is a variable
                p′ op q′,  if E is p op q and p′ = eval θ p and q′ = eval θ q
                ¬p′,       if E is ¬p and p′ = eval θ p

where the truth value of p′ op q′ is given by the following truth table:
where the truth value of p op q is given by the following truth table:
p
true
true
false
false
q
true
false
false
true
p
false
false
true
true
pq
true
false
false
false
pq
true
true
false
true
pq
true
false
true
true
pq
true
false
true
false
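The definition of eval is easy to transcribe into a program. The following Python sketch (the nested-tuple representation of expressions is my own choice for the example) mirrors it directly:

    def eval_bool(theta, e):
        # theta: dict from variable names to truth values.
        # e: True, False, a variable name, ('not', p), or (op, p, q)
        #    with op in {'and', 'or', 'implies', 'iff'}.
        if e is True or e is False:
            return e
        if isinstance(e, str):                      # a variable
            return theta[e]
        if e[0] == 'not':
            return not eval_bool(theta, e[1])
        op, p, q = e
        pv, qv = eval_bool(theta, p), eval_bool(theta, q)
        return {'and': pv and qv,
                'or': pv or qv,
                'implies': (not pv) or qv,
                'iff': pv == qv}[op]

    # (p ∧ ¬q) ⇒ q under theta(p) = true, theta(q) = false evaluates to false:
    print(eval_bool({'p': True, 'q': False}, ('implies', ('and', 'p', ('not', 'q')), 'q')))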
A.2 Sets
A.2.1 Definition and examples
A.2.2
If T and S are two sets then the union S ∪ T is the set of all those objects that are
elements in T or in S (or both). For example, {1, 3} ∪ {3, 5} = {1, 3, 5}. The intersection
S ∩ T is the set of all those objects that are elements in both T and S. For example,
{1, 3, 4} ∩ {3, 4, 5} = {3, 4}. S and T are disjoint if they have no members in common, i.e.
if S ∩ T = ∅. Finally, the difference S\T is the set of all those objects that belong to
S but not T. Thus {1, 2, 5}\{3, 5, 7} = {1, 2}.
An ordered pair is a sequence of two (not necessarily distinct) objects in parentheses
(a, b). The first component is a and the second component is b. If S and T are sets the
cartesian product S × T is the set of all ordered pairs where the first component belongs
to S and the second component belongs to T.
Similarly we speak of triples (a, b, c), quadruples (a, b, c, d), and in general n-tuples
(a1 , a2 , . . . , an ), and of the cartesian product of n sets S1 , S2 , . . . , Sn .
P(S) denotes the set of all subsets of S. For instance,
P({1, 2, 3}) = { ∅, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3} }
If S is a finite set we let | S | denote the number of elements in S.
A.2.3
An abbreviation
We use the vector notation ~xn to denote the sequence x1, x2, ..., xn (also when
x1, x2, ..., xn are numbers, graphs, etc.). Note that ~xn does not include parentheses,
so (~xn) means (x1, x2, ..., xn). Moreover, if ~xn denotes x1, x2, ..., xn and ~ym denotes
y1, y2, ..., ym then (~xn, ~ym) means (x1, x2, ..., xn, y1, y2, ..., ym).
A.3 Functions
A.3.1 Total Functions
A.3.2
Infinite sequences
Let S be some set. An infinite sequence of elements from S is a total function from IN
to S. For example, the identity function i : IN → IN defined by i(x) = x is a sequence,
and the function f : IN → IN × IN defined by f(i) = (i, 2i) is a sequence.
Instead of presenting a sequence by a function definition, one often simply writes the
first few values i(0), i(1), i(2), etc. when it is obvious how i is then defined. For instance,
the first sequence above would simply be written 0, 1, 2, . . . and the second would be
written (0, 0), (1, 2), (2, 4), (3, 6), . . .
A.3.3
Partial functions
For (a, b) ∈ g we write b = g(a); that is, we put parentheses around a. Thus above we should
have written f((3, 2)) = 2, rather than f(3, 2) = 2. However it is customary to drop one
set of parentheses, and we shall also do so.
For a partial function f : A → B the domain of f is the set

    dom(f) = {a ∈ A | f(a)↓}

In case f is total, dom(f) = A.
The codomain of a total or partial function from A into B is the set B.
The range of a total or partial function f from A into B is the set

    rng(f) = {b ∈ B | there is an a ∈ A such that f(a) = b}
A.3.4
Any total function is also a partial function. For a partial function f : A → B it may
happen that for all a ∈ A, f(a) is defined, i.e. dom(f) = A. In that case f is also a total
function.
There are two standard ways of obtaining a total function f′ from a partial function
f : A → B:
1. Remove all those elements of A on which f is undefined: Define f′ : dom(f) → B
by f′(a) = f(a) for all a ∈ dom(f).
2. Add a new element ⊥ to B and let that be the result whenever f is undefined:
Define f′ : A → (B ∪ {⊥}) by: f′(a) = f(a) for all a ∈ dom(f), and f′(a) = ⊥ for
a ∈ A\dom(f).
A.3.5
Recall that functions are just certain sets, and that two sets are equal if and only if they
contain the same elements. This implies that two total functions f, g : A → B are equal
if and only if they are the same sets of pairs. Equal total functions f and g thus satisfy
f(a) = g(a) for all a ∈ A.
Similarly, two partial functions f, g : A → B are equal iff dom(f) = dom(g) and for
all a ∈ dom(f): f(a) = g(a), i.e. iff for all a ∈ A:
1. f(a)↑ and g(a)↑; or
2. f(a)↓ and g(a)↓ and f(a) = g(a).
A.3.6
The function updating of two partial functions f, g : A → B is the partial function
f[g] : A → B defined by

    f[g](a) = g(a)   if a ∈ dom(g)
              f(a)   otherwise

Note that if both g and f are undefined on a ∈ A, then so is f[g].
A function f : A → B with finite domain dom(f) = {a1, a2, ..., an} is also written
[a1 ↦ b1, a2 ↦ b2, ..., an ↦ bn] where f(a1) = b1, f(a2) = b2, ..., f(an) = bn. (This is just
a slight variant of the notation {(a1, b1), (a2, b2), ..., (an, bn)} for f.) So (omitting a pair
of square brackets)

    f[a1 ↦ b1, a2 ↦ b2, ..., an ↦ bn]

is the function h : A → B such that h(a1) = b1, h(a2) = b2, ..., h(an) = bn, and h(a) = f(a)
for a ∈ A\{a1, a2, ..., an}.
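Finite-domain partial functions and the updating operation map directly onto dictionaries. A small Python sketch (purely illustrative):

    def update(f, g):
        # f[g]: use g(a) where g is defined, otherwise fall back to f(a).
        h = dict(f)
        h.update(g)
        return h

    f = {1: 'x', 2: 'y'}                 # the function [1 |-> x, 2 |-> y]
    print(update(f, {2: 'z', 3: 'w'}))   # f[2 |-> z, 3 |-> w] = {1: 'x', 2: 'z', 3: 'w'}

If a key is absent from both dictionaries the result is undefined on it, matching the remark about f[g] above.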
Let f, g : X → IR for some set X. Then
1. The sum f + g : X → IR is defined by:

    (f + g)(x) = f(x) + g(x)   if f(x)↓ and g(x)↓
                 undefined     otherwise

2. The product f · g : X → IR is defined by:

    (f · g)(x) = f(x) · g(x)   if f(x)↓ and g(x)↓
                 undefined     otherwise

3. The difference f − g : X → IR is defined by:

    (f − g)(x) = f(x) − g(x)   if f(x)↓ and g(x)↓
                 undefined     otherwise
4. The quotient f/g : X → IR is defined by:

    (f/g)(x) = f(x)/g(x)   if f(x)↓ and g(x)↓ and g(x) ≠ 0
               undefined   otherwise

5. Similar notation is used with a constant a ∈ IR in place of f. For instance, a · f :
X → IR is defined by (a · f)(x) = a · f(x).
In the special case where f, g are total functions (see Section A.3.4) the operations 1-3
and 5 give as a result a total function. In 4 the result may be a partial function even
when f, g are both total.
A.3.7
Higher-order functions
A.3.8
Lambda notation
Lambda notation is a device to define a function without giving it a name. For instance,
we have previously described the successor function as

    f : IN → IN,   f(n) = n + 1

Using the lambda notation this function could be written:

    λn . n + 1 : IN → IN

The notation λn . n + 1 should be read: the function that maps any n to n + 1.
In the usual notation we write for example f(3) = 3 + 1. What we do when we write
3 + 1 on the right hand side of this equality is that we take the definition of f, f(n) = n + 1,
and substitute 3 for n in the right hand side of the definition. In the lambda notation
we do something similar by writing

    (λn . n + 1) 3 = 3 + 1 = 4

Note the unusual bracketing in this expression.
We write functions of several variables, e.g. addition, as:

    (∗)    λ(m, n) . m + n : IN × IN → IN
    (∗∗)   λm . λn . m + n : IN → (IN → IN)
Whereas the first function expects a pair (m, n) and then gives m + n as result, the second
function expects a number and then gives a function as result. For instance,

    (λm . λn . m + n) 3 = λn . 3 + n

This function, "add 3", can itself be applied to some argument, for instance

    (λn . 3 + n) 4 = 3 + 4 = 7

Thus

    ((λm . λn . m + n) 3) 4 = (λn . 3 + n) 4 = 3 + 4 = 7
It is clear that for any two numbers k, l ∈ IN

    (λ(m, n) . m + n) (k, l) = ((λm . λn . m + n) k) l

This suggests that one can represent functions of several variables by means of functions
of just one variable. Indeed this holds in general, as was discovered independently by
several people. The transformation from a function like the one in (∗) to the one in (∗∗)
is called currying after H. B. Curry, one of the discoverers of the idea.
From now on multiple function applications associate to the left, so e1 e2 e3 means
(e1 e2 ) e3 .
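In a programming language the two versions of addition, and the passage from (∗) to (∗∗), can be written as follows (a Python sketch; the helper curry is defined here only for the example):

    def add_pair(pair):                 # the function in (*)
        m, n = pair
        return m + n

    def add_curried(m):                 # the function in (**)
        return lambda n: m + n

    def curry(f):                       # from the (*) form to the (**) form
        return lambda m: lambda n: f((m, n))

    print(add_pair((3, 4)))             # 7
    print(add_curried(3)(4))            # 7
    print(curry(add_pair)(3)(4))        # 7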
A.3.9
A.3.10
A.3.11
Below all functions are from IN into IR+. Given a total function f, we define:
1. O(f) (pronounced "big oh") is the set of all functions g such that for some r ∈ IR+,
and for all but finitely many n,

    g(n) < r · f(n)

2. Ω(f) is the set of all functions g such that for some r ∈ IR+ and for infinitely many
n,

    g(n) > r · f(n)

3. Θ(f) is the set of all functions g such that for some r1, r2 ∈ IR+ and for all but
finitely many n,

    r1 · f(n) ≤ g(n) ≤ r2 · f(n)

4. o(f) (pronounced "little oh") is the set of all functions g such that

    lim_{n→∞} g(n)/f(n) = 0
If g ∈ O(f) then for some r the graph of g is below that of r · f = λx . r · f(x) for all but
finitely many arguments. If g ∈ o(f) then the graph of g is below that of r · f = λx . r · f(x)
for all r > 0 and all but finitely many arguments.
If g ∈ Θ(f) then for some r1, r2 the graph of g stays between the graphs of r1 · f and
r2 · f for all but finitely many arguments.
The following properties are useful. Their proofs are left as exercises.
1. g ∈ Θ(f) iff g ∈ O(f) and f ∈ O(g)
2. g ∈ Θ(f) iff f ∈ Θ(g)
Some examples of the O-notation, whose proofs are also left as exercises:
1. λn . k ∈ O(λn . n), but λn . n ∉ O(λn . k), for any k ∈ IR+.
2. λn . log n ∈ O(λn . n), but λn . n ∉ O(λn . log n).
3. λn . n^a ∈ O(λn . b^n), but λn . b^n ∉ O(λn . n^a), for all a ∈ IR+ and all b > 1.
A common but sloppy notation is to write f = O(g) instead of f O(g). Such notation
is harmless as long as one keeps in mind that the = is neither symmetric nor transitive.
Thus if f = O(g) and h = O(g) one should conclude neither O(g) = f which is meaningless
nor f = h which may be plain wrong.
A.4
Graphs
A graph consists of a number of nodes and a number of edges between these nodes. For
instance the following graph has three nodes and three edges. The edges have arrows in
one direction, so this is a directed graph.
(Figure: a directed graph with three nodes 1, 2, 3 and edges 1 → 2, 2 → 3, and 3 → 1.)
More precisely, we define a directed graph to be a pair (V, E) where V is called the
set of nodes or vertices and E ⊆ V × V is called the set of edges. The graph above is
({1, 2, 3}, {(1, 2), (2, 3), (3, 1)}). An edge (x, y) ∈ E may also be written as x → y.
A.5
A.5.1
A finite non-empty set is sometimes called an alphabet, in which case the members of the
set are called symbols. If Σ = {a1, ..., ak} is an alphabet, a string over Σ is a sequence
b1 b2 ... bm where m ≥ 0 and each bi ∈ Σ. For example, if Σ = {0, 1}, then 11, 101, and
100011 are all strings over Σ. The empty string ε is the unique string with m = 0.
If x = b1 ... bm and y = c1 ... cn, then x and y are equal, written x = y, if m = n and
bi = ci for all i ∈ {1, ..., n}. If x = b1 ... bm and y = c1 ... cn, their concatenation is the
string xy = b1 ... bm c1 ... cn. If z = xy then we say x is a prefix of z, and that y is a suffix
of z. If z = xwy then we say w is a substring of z.
If A, B are two sets of strings over Σ, then we define

    AB  = {xy | x ∈ A, y ∈ B}
    A*  = {x1 x2 ... xn | n ≥ 0, x1, ..., xn ∈ A}
    A+  = {x1 x2 ... xn | n ≥ 1, x1, ..., xn ∈ A}    (so A* = A+ ∪ {ε})
A.5.2
Grammars
A grammar includes a rewrite system P (as defined in Section 10.2.1), used as a tool
to generate strings over an alphabet. We often write α ::= β instead of (α, β) ∈ P. For
instance
A ::= a A a
A ::= b A b
A ::= c
A ::= aca
with terminals {a, b, c} is a grammar. For conciseness we often group productions with the
same left side, separated by the symbol | (pronounced "or"). Thus the four productions
above could be expressed as one:
A ::= a A a | b A b | c | aca
The usage of a grammar is that one starts out with the start symbol S and then replaces
non-terminals A (in particular S) by the right hand sides of their productions, so the
preceding grammar, beginning with A, can generate strings over {a, b, c} like:
aacaa
aaabcbaaa
bbaacaabb
baacaab
(What is the underlying structure of all these strings?)
More formally, a grammar is a 4-tuple G = (N, T, P, S) where
1. N is an alphabet whose members are called nonterminals.
2. T is an alphabet, disjoint from N, whose members are called terminals.
3. P is a string rewriting system over N ∪ T such that (α, β) ∈ P implies α ∉ T*.
4. S is a member of N called the start symbol.
In the preceding example
1. N = {A}.
2. T = {a, b, c}.
3. P = {(A, a A a), (A, b A b), (A, c), (A, aca)}.
4. S = A.
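To see the generation process concretely, here is a brute-force Python sketch (written only for illustration) that expands nonterminals of this example grammar for a bounded number of steps and collects the terminal strings reached:

    def generate(productions, start, max_steps):
        # productions: dict mapping a nonterminal to its right-hand sides.
        results, forms = set(), {start}
        for _ in range(max_steps):
            new_forms = set()
            for form in forms:
                i = next((j for j, s in enumerate(form) if s in productions), None)
                if i is None:
                    results.add(form)            # only terminals left
                    continue
                for rhs in productions[form[i]]:
                    new_forms.add(form[:i] + rhs + form[i+1:])
            forms = new_forms
        return sorted(results)

    P = {'A': ['aAa', 'bAb', 'c', 'aca']}
    print(generate(P, 'A', 4))
    # ['aaacaaa', 'aacaa', 'abacaba', 'abcba', 'aca', 'baacaab', 'bacab', ...]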
A.5.3
Classes of grammars
Some classes of grammars are particularly interesting, and well-studied for programming
language applications.
A regular grammar G = (N, T, P, S) is a grammar in which every production is of
form A ::= x or A ::= xB where A, B ∈ N, x ∈ T*. Our example grammar above is not
regular.
A context-free grammar G = (N, T, P, S) is one such that in every production α ::= β
∈ P, α is a single nonterminal symbol. Our example grammar above is context-free.
Clearly every regular grammar is context-free, but not necessarily vice versa.
A context-sensitive grammar G = (N, T, P, S) is one such that in every production
α ::= β ∈ P, the length of β is larger than or equal to that of α, or α ::= β is S ::= ε and
S does not appear on the right side of any production in P.
Let G = (N, T, P, S) be a context-free grammar. There is a specific form of one-step
and multi-step rewriting where one always rewrites the left-most non-terminal. These are
called the left-most one-step derivation relation ⇒l and the left-most multi-step derivation relation ⇒l* and are defined as follows, where β, γ ∈ (N ∪ T)*:
1. xAγ ⇒l xβγ iff A ::= β ∈ P and x ∈ T*, γ ∈ (N ∪ T)*.
A.5.4
A.5.5 Regular expressions
One way to represent a set of strings is to find a grammar generating exactly that set.
Another way is to find a regular expression. Let Σ be an alphabet. The set of regular
expressions over Σ is defined as follows.
1. ∅ is a regular expression over Σ.
2. If a ∈ Σ then a is a regular expression over Σ.
3. If r, s are regular expressions over Σ then so are (r | s), (rs), and (r*).
To save parentheses we adopt the conventions that
1. * has the highest precedence;
2. concatenation has the second highest precedence, and associates to the left;
3. | has the lowest precedence, and associates to the left.
For instance the regular expression r = (((00)*) | (1((11)*))) can be written more shortly as
(00)* | 1(11)*.
As for grammars we define L(r), the set generated by the regular expression r, as
follows:
1. L(∅) = ∅;
2. L(a) = {a} for every a ∈ Σ;
3. L(r | s) = L(r) ∪ L(s);
4. L(rs) = L(r)L(s);
5. L(r*) = L(r)*
where L(r)L(s) and L(r)* are defined in Subsection A.5.1. For the regular expression r
above L(r) is the set of all strings consisting either of an even number of 0s or an odd
number of 1s.
The cautious reader may have noticed that a certain class of grammars was called
the regular grammars. This suggests some connection to the regular expressions. Indeed
the following property holds:
Proposition A.5.1
1. For any regular grammar G there is a regular expression r with L(G) = L(r).
2. For any regular expression r there is a regular grammar G with L(G) = L(r).
On the other hand there are certain sets of strings that are generated by a context-free
grammar but not by any regular expression or regular grammar. For instance, this is the
case with the set of strings consisting of n a's followed by n b's.
A.5.6
Grammars and regular expressions are compact representations of sets of strings. We now
introduce a third kind of representation of a set of strings, namely a non-deterministic
finite automaton, or NFA for short. Pictorially an NFA is a directed graph where every
edge has a label, one node is depicted as the start node, and zero, one or more nodes are
depicted as accept nodes. Here is an example where the start node stands out by having
an arrow labelled start into it, and where the single accepting node has two circles
rather than just one:
(Figure: an NFA with start node 1 and accepting node 4; its edges are labelled c from 1 to 2, a from 2 to 2, b from 2 to 3, a from 3 to 3, and c from 3 to 4.)
The idea of representing a set L of strings by this NFA is as follows. From the start node
1 we can read a c and then proceed to node 2. From this node we can read any number
of a's without leaving the state and then read a b, jumping to node 3. Again we can
read any number of a's and then a c, jumping to the accepting node. Thus altogether
we have read a string of form: ca...aba...ac. The set L consists of all the strings we
can read in this manner; in other words, L is the same set of strings as the set generated
by the regular expression ca*ba*c.
The reason why these automata are called non-deterministic is that there can be
two different edges out of a node with the same label, and there can be edges labelled ε,
as illustrated in the following NFA, which accepts the set of strings generated by ε | ab | ac:
(Figure: an NFA accepting the strings generated by ε | ab | ac; it has two different a-labelled edges out of the start node and an edge labelled ε.)
Proposition A.5.2 The following conditions are equivalent for any language L:
1. There is a DFA accepting L.
2. There is an NFA accepting L.
3. There is a regular expression generating L.
4. There is a regular grammar generating L.
Proofs of these properties can be found in [3].
In constructing 2 from 1, the number of states of the two automata is the same, since
any DFA may be converted into an equivalent NFA by a trivial change in the transition
function (to yield a singleton set of states instead of one state). In constructing 1 from
2, the DFA may have as many as 2^n states where n is the number of states of the NFA.
In constructing 2 from 3, the NFA has at most twice as many states as the size of the
regular expression.
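The construction of 1 from 2 is the subset construction used in the proof of Theorem 28.3.1. A compact Python sketch (assuming an NFA without ε-edges, given as a dict from (state, symbol) to sets of states):

    def nfa_to_dfa(alphabet, delta, start, accepting):
        # DFA states are the reachable sets of NFA states.
        start_set = frozenset([start])
        states, trans, todo = {start_set}, {}, [start_set]
        while todo:
            S = todo.pop()
            for a in alphabet:
                T = frozenset(q for s in S for q in delta.get((s, a), set()))
                trans[(S, a)] = T
                if T not in states:
                    states.add(T)
                    todo.append(T)
        return states, trans, {S for S in states if S & accepting}

    # The NFA for ca*ba*c above (states 1-4, accepting state 4):
    delta = {(1, 'c'): {2}, (2, 'a'): {2}, (2, 'b'): {3}, (3, 'a'): {3}, (3, 'c'): {4}}
    states, trans, acc = nfa_to_dfa({'a', 'b', 'c'}, delta, 1, {4})
    print(len(states))    # 5 here; in general the DFA may need up to 2^n states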
A.6 Induction
A.6.1 Inductive proofs
    1 + 2 + ... + n = n(n + 1)/2        (∗)
Is this equation true for all n ∈ IN?¹ If n = 0 it states that² 0 = (0 · 1)/2 which is true.
For n = 1 it states 1 = (1 · 2)/2 which is true. For n = 2, 3 it states that 1 + 2 = (2 · 3)/2
and 1 + 2 + 3 = (3 · 4)/2, which are both true, and so on.
The formula seems to be true for all examples. However this does not constitute
a proof that it really is true in all cases. It could be that the formula fails for some
number.³ On the other hand, if we don't know what n is, we need a general technique
to prove the equation.
Suppose we could prove the following.
1. (∗) holds for n = 0.
1 Recall
that predicates are certain sets. In this section we often discuss whether or not something
holds or is true. This always boils down to set membership, cf. Section 12.2.
2 By convention 1 + 2 + . . . + n = 0 when n = 0.
3 Allenby [5] mentions a striking example of this kind. Consider the following property that a number
n may or may not have: n can be written as n1³ + n2³ + n3³ + n4³ + n5³ + n6³ + n7³ + n8³ where n1, ..., n8 ∈ IN.
It turns out that the property holds for all natural numbers except 23 and 239.
2. Whenever (∗) holds for some n ∈ IN, then it also holds for n + 1.
Then (∗) would hold for all n ∈ IN. Point 1 has been checked above. For point 2, suppose
that (∗) holds for some n, that is, 1 + 2 + ... + n = n(n + 1)/2. Then

    1 + 2 + ... + n + (n + 1) = n(n+1)/2 + (n + 1)
                              = n(n+1)/2 + 2(n+1)/2
                              = (n(n+1) + 2(n+1))/2
                              = (n+1)(n+2)/2

which is exactly (∗) for n + 1.
A.6.2
Inductive definitions
One can define objects inductively (or recursively). For instance, the sum s(n) = 1 + 2 +
. . . + n can be defined as follows:
    s(0)     = 0
    s(n + 1) = (n + 1) + s(n)
More generally we may use:
Definition by Recursion.  If S is some set, a is an element of S, and g :
S × IN → S is a total function, then the function f : IN → S given by

    f(0)     = a
    f(n + 1) = g(f(n), n)

is well-defined.
In the preceding example S was IN , a was 0, and g(x, n) = (n + 1) + x.
Many variations of this principle exist. For instance:
1. f (n + 1) may use not only n and f (n), but all the values 0, . . . , n and f (0), . . . , f (n).
2. Function f may have more parameters than the single one from IN .
3. Several functions may be defined simultaneously.
As examples of the three variations:
1. The fibonacci function f : IN → IN is defined by:

    f(0)     = 1
    f(1)     = 1
    f(n + 2) = f(n + 1) + f(n)

2. The exponentiation function, with the base m as an extra parameter, can be defined by:

    m^0       = 1
    m^(n + 1) = m · m^n

3. The functions even : IN → {T, F} returning T iff the argument is even, and odd :
IN → {T, F} returning T iff the argument is odd can be defined by mutual recursion:

    even(0)     = T
    even(n + 1) = odd(n)
    odd(0)      = F
    odd(n + 1)  = even(n)
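These three variations are direct to transcribe into a programming language. A Python sketch:

    def fib(n):                      # variation 1: uses the two previous values
        return 1 if n < 2 else fib(n - 1) + fib(n - 2)

    def power(m, n):                 # variation 2: an extra parameter m
        return 1 if n == 0 else m * power(m, n - 1)

    def even(n):                     # variation 3: mutual recursion with odd
        return True if n == 0 else odd(n - 1)

    def odd(n):
        return False if n == 0 else even(n - 1)

    print([fib(n) for n in range(7)])      # [1, 1, 2, 3, 5, 8, 13]
    print(power(2, 10), even(7), odd(7))   # 1024 False True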
A.6.3
The set of strings generated by a grammar can be viewed as defined inductively. Here is
an example:
A parenthesis string is a string over the alphabet {(, )}. The set of all balanced
parenthesis strings is defined as the set of strings generated by the following grammar:
    S ::= ε
    S ::= SS
    S ::= (S)
Example strings generated by the grammar: () and (()()) and (()(())). Some examples,
not generated by the grammar: )( and ()(() and ())).
There is a well-known algorithm to test whether a parenthesis string is balanced. Let
l(x) and r(x) be the number of left and right parentheses in x, respectively. A prefix of
x is a string y such that x = yz for some z, i.e. an initial part of x. Claim: a parenthesis
string x is balanced iff l(x) = r(x) and l(y) ≥ r(y) for all prefixes y of x.
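The test is easy to program; a small Python sketch:

    def balanced(x):
        # l(y) >= r(y) for every prefix y, and l(x) = r(x).
        depth = 0
        for ch in x:
            depth += 1 if ch == '(' else -1
            if depth < 0:                 # some prefix has more ')' than '('
                return False
        return depth == 0

    print(balanced('(()())'), balanced(')('), balanced('()(('))   # True False False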
Actually we can prove correctness of this claim. This has two parts. First, that
any string x generated by the grammar satisfies the test; and second, that any string
satisfying the test is also generated by the grammar.
For the first part, the proof is by complete induction on n, the number of steps in the
derivation S ⇒ ... ⇒ x, with base case n = 1. So P(n) is: any string x in a derivation
S ⇒ ... ⇒ x with n steps satisfies the test.
Base case. If n = 1 then the derivation must be S ⇒ ε (remember that every derived
string consists only of terminals). Clearly, l(ε) = 0 = r(ε), and since the only prefix of ε
is ε itself, l(y) ≥ r(y) for all prefixes y.
Induction step: Suppose all strings generated in n or fewer steps from the grammar
satisfy the test, and consider some string x generated in n + 1 steps. The rewriting must
begin with either S ⇒ SS or S ⇒ (S).
We consider first the case beginning with S ⇒ SS. Here x has form uv where S ⇒ ... ⇒ u
and S ⇒ ... ⇒ v are derivations in n or fewer steps. By induction hypothesis the test holds
for both u and v. Then
l(x) = l(uv)
= l(u) + l(v)
= r(u) + r(v)
= r(x)
Now we only need to show that l(y) ≥ r(y) for any prefix y of x = uv, so let y be some
prefix of x. If y is a prefix of u then l(y) ≥ r(y) by induction hypothesis. If y is not a
prefix of u, then y = uw where w is a prefix of v. By the induction hypothesis l(u) = r(u)
and l(w) ≥ r(w), so

    l(y) = l(uw)
         = l(u) + l(w)
         = r(u) + l(w)
         ≥ r(u) + r(w)
         = r(uw)
         = r(y)

as required.
The case where the derivation begins with S (S) is left as an exercise, and the proof
of the remaining part, that any string x satisfying the test is generated by the grammar,
is also an exercise.
Induction proofs occur frequently in computability and complexity theory as well as
in other branches of theoretical computer science. The only way to master such
proofs is to try and do a number of them. Therefore the reader is strongly encouraged
to try out Exercises A.17 and A.18.
A.7 Pairing functions
For a more economical example in which pr is onto, consider the pairing decomposition
where the pairing function is pr3(x, y) = (x + y)(x + y + 1)/2 + y = (x² + 2xy + y² + x + 3y)/2.
This pairing is surjective.
This can be illustrated by the figure:
(Figure: the values pr3(x, y) for small x and y arranged in a grid; each diagonal x + y = c is filled with the consecutive values c(c+1)/2, ..., c(c+1)/2 + c.)
In both of the two last pairing decompositions the pairs in the sequence
{(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2), (0, 3), . . .}
receive increasing values by the pairing function, and in the last example these values
are even consecutive. Further, Pólya has proven that any surjective polynomial pairing
function must be identical to pr3 (x, y) or its converse pr4 (x, y) = pr3 (y, x).
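A Python sketch of pr3 and of the decomposition recovering (x, y) from a value (the inverse computation below is mine; it uses the fact that pr3 fills the diagonal x + y = c with consecutive values starting at c(c+1)/2):

    from math import isqrt

    def pr3(x, y):
        return (x + y) * (x + y + 1) // 2 + y

    def unpair3(z):
        # Find the diagonal c = x + y, then read off y and x.
        c = (isqrt(8 * z + 1) - 1) // 2          # largest c with c(c+1)/2 <= z
        y = z - c * (c + 1) // 2
        return c - y, y

    print([pr3(x, y) for (x, y) in [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1)]])  # [0, 1, 2, 3, 4]
    print(all(unpair3(pr3(x, y)) == (x, y) for x in range(50) for y in range(50)))  # True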
Exercises
A.1
1. Place the implicit parentheses in the boolean expression p q q p q
2. Convert the expression to CNF and indicate which equations you use.
3. Given the truth assignment θ(p) = true, θ(q) = false, what is the value of the
expression in question 1? What is the value of the CNF-converted expression?
A boolean expression is called satisfiable iff there exists a truth assignment for
the variables of the expression such that the value of the expression is true. It is
called valid iff the value of the expression is true under all truth assignments of the
variables.
4. Is the expression in question 1 satisfiable? Is it valid?
A.4 Prove that if f : A → B is a bijective function then there exists exactly one function
f⁻¹ : B → A such that: f(a) = b if and only if f⁻¹(b) = a. The function f⁻¹ is called the
inverse of f.
A.5 Prove that if f : A → B is an injective function then there exists exactly one function
f⁻¹ : rng(f) → A such that: f(a) = b if and only if f⁻¹(b) = a. The function f⁻¹ is again
called the inverse of f.
A.6 Prove that the inverse of an injective function is surjective.
A.11 Suppose f ∈ O(f′) and g ∈ O(g′). Which of the following are true?
1. f + g ∈ O(f′ + g′).
2. f · g ∈ O(f′ · g′).
3. f/g ∈ O(f′/g′).
4. Suppose that f ∘ g and f′ ∘ g′ are functions from IN into IR+. Then f ∘ g ∈
O(f′ ∘ g′).
A.12 Construct NFAs accepting the sets generated by the following regular expressions:
1. (a | b)
2. (a* | b*)
3. ((ε | a)b*)
A.14 Give a regular expression generating the language accepted by the following NFA:
(Figure: the NFA for this exercise, with states 1 to 5 and transitions labelled a, b, and c.)
A.15 Convert the NFA of the preceding exercise into a DFA.
References
Most of the contents of this appendix are covered by many books on discrete mathematics.
For more specialized texts, an excellent introduction to sets and functions can be found in
Halmos' book [61], and finite automata are covered by the classic text by Aho, Hopcroft,
and Ullman [2].
Bibliography
[1] W. Ackermann. Zum Hilbertschen Aufbau der reelen Zahlen. Mathematische Annalen,
99, 1928.
[2] A. Aho, J.E. Hopcroft, and J.D. Ullman. The Design and Analysis of Computer Algorithms. Computer Science and Information Processing. Addison-Wesley, 1974.
[3] A. Aho, R. Sethi, and J.D. Ullman. Compilers: Principles, Techniques, and Tools.
Addison-Wesley, 1986.
[4] A.V. Aho, editor. Currents in the Theory of Computing. Prentice-Hall, 1973.
[5] R.B.J.T. Allenby. Rings, Fields, and Groups. Edward Arnold, 1988.
[6] Nils Andersen and Neil D. Jones. Generalizing Cook's transformation to imperative stack
programs. In Juhani Karhumäki, Hermann Maurer, and Grzegorz Rozenberg, editors,
Results and Trends in Theoretical Computer Science, volume 812 of Lecture Notes in
Computer Science, pages 1–18. Springer-Verlag, 1994.
[7] Y. Bar-Hillel, M. Perles, and E. Shamir. On formal properties of simple phrase structure
grammars. Z. Phonetik, Sprachwiss. Kommunikationsforsch., 14:143172, 1961.
[8] H.P. Barendregt. The Lambda Calculus: Its Syntax and Semantics. North-Holland, second,
revised edition, 1984.
[9] L. Beckman, A. Haraldson, Ö. Oskarsson, and E. Sandewall. A partial evaluator and its
use as a programming tool. Artificial Intelligence, 7:319–357, 1976.
[10] Amir M. Ben-Amram. What is a pointer machine? SIGACT News, 26(2):8895, June
1995.
[11] A. Berlin and D. Weise. Compiling scientific code using partial evaluation. IEEE Computer, 23:2537, 1990.
[12] D. Bjrner, A.P. Ershov, and N.D. Jones, editors. Partial Evaluation and Mixed Computation. North-Holland, Amsterdam, 1988.
[13] L. Blum, M. Shub, and S. Smale. On a theory of computation over the real numbers,
np-completeness, and universal machines. Proc. IEEE Symposium on Foundations of
Computer Science, 29, 1988.
[14] M. Blum. A machine independent theory of the complexity of recursive functions. Journal
of the Association for Computing Machinery, 14:322336, 1967.
[15] P. van Emde Boas. Machine models and simulations. In J. van Leeuwen, editor, Handbook
of Theoretical Computer Science, vol. A. Elsevier, 1990.
[16] A.B. Borodin. Computational complexity and the existence of complexity gaps. Journal
of the Association for Computing Machinery, 19:158174, 1972.
[17] A.B. Borodin. Computational complexity theory and practice. In Aho [4], pages 3589.
449
450 Bibliography
[18] H. Bratman. An alternate form of the uncol diagram. Communications of the ACM,
4:3:142, 1961.
[19] J. Case and C. Smith. Comparison of identification criteria for machine inductive inference.
Theoretical Computer Science, 25:193220, 1983.
[20] C.-L. Chang and R.C.-T. Lee. Symbolic Logic and Mechanical Theorem Proving. Computer
Science and Applied Mathematics. Academic Press, 1973.
[21] T. Chuang and B. Goldberg. Real-time deques, multihead turing machines, and purely
functional programming. International Conference on Functional Programming Languages
and Computer Architecture, 6:289298, 1995.
[22] A. Church. A note on the Entscheidungsproblem. Journal of Symbolic Logic, 1:4041,
1936.
[23] A. Church. An unsolvable problem of elementary number theory. American Journal of
Mathematics, 58:345363, 1938.
[24] A. Church and J.B. Rosser. Some properties of conversion. Transactions of the American
Mathematical Society, 39:1121, 1936.
[25] A. Cobham. The intrinsic computational difficulty of functions. In Proceedings of the
Congress for Logic, Mathematics, and Philosophy of Science, pages 2430, 1964.
[26] S.A. Cook. Path systems and language recognition. In Proceedings of the 2nd Annual
ACM Symposium on on the Theory of Computing, pages 7072, 1970.
[27] S.A. Cook. Characterization of pushdown machines in terms of time-bounded computers.
Journal of the Association for Computing Machinery, 18(1):418, January 1971.
[28] S.A. Cook. The complexity of theorem proving procedures. In Proceedings of the 3rd
Annual ACM Symposium on on the Theory of Computing, pages 151158, 1971.
[29] S.A. Cook. An overview of computational complexity. Communications of the ACM,
26(6):401408, 1983.
[30] Stephen A. Cook. Linear-time simulation of deterministic two-way pushdown automata. In
V. Freiman C. editor, Information Processing 71, pages 7580. North-Holland Publishing
Company, 1972.
[31] C. Dahl and M. Hessellund. Determining the constant coefficients in a time hierarchy.
Technical report, Department of Computer Science, University of Copenhagen, feb 1994.
[32] M. Davis. Arithmetical problems and recursively enumerable predicates. Journal of Symbolic Logic, 18:3341, 1953.
[33] M. Davis. Computability and Unsolvability. McGraw-Hill, New York, 1958. Reprinted in
1982 with [35] as an appendix, by Dover Publications.
[34] M. Davis. The Undecidable. New York, Raven Press, 1960.
[35] M. Davis. Hilberts tenth problem is unsolvable. American Mathematical Monthly, 80:233
269, 1973.
451
[44] A.P. Ershov. On the essence of compilation. In E.J. Neuhold, editor, Formal Description
of Programming Concepts, pages 391420. North-Holland, 1978.
[45] A.P Ershov. Mixed computation: Potential applications and problems for study. Theoretical Computer Science, 18:4167, 1982.
[46] A.P. Ershov. Opening key-note speech. In Bjrner et al. [12].
[47] R. W. Floyd and R. Beigel. The Language of Machines. Freeman, 1994.
[48] Y. Futamura. Partial evaluation of computing process an approach to a compilercompiler. Systems, Computers, Controls, 2(5):4550, 1971.
[49] Y. Futamura. Partial computation of programs. In E. Goto, K. Furukawa, R. Nakajima,
I. Nakata, and A. Yonezawa, editors, RIMS Symposia on Software Science and Engineering, volume 147 of Lecture Notes in Computer Science, pages 135, Kyoto, Japan, 1983.
Springer-Verlag.
[50] Y. Futamura and K. Nogi. Generalized partial computation. In Bjrner et al. [12], pages
133151.
[51] R. Gandy. The confluence of ideas in 1936. In Herken [68], pages 55112.
[52] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to Theory of
NP-Completeness. Freeman, New York, 1979.
[53] R. Glück. On the generation of specializers. Journal of Functional Programming, 4(4):499–514, 1994.
[54] K. Gödel. Über formal unentscheidbare Sätze der Principia Mathematica und verwandter
Systeme. Monatshefte für Mathematik und Physik, 37:349–360, 1931.
452 Bibliography
[55] L.M. Goldschlager. The monotone and planar circuit value problems are logspace complete
for p. SIGACT News, 9:2529, 1977.
[56] R. Greenlaw, H.J. Hoover, and W.L. Ruzzo. Limits to parallel computation. Oxford
University Press, New York, 1995.
[57] K. Grue. Arrays in pure functional programming languages. Lisp and Symbolic Computation, 2:105113, 1989.
[58] A. Grzegorczyk. Some classes of recursive functions. Rozpraqy Mathematyczny, 4:145,
1953.
[59] Y. Gurevich. Kolmogorov machines and related issues: the column on logic in computer
science. Bull. of the EATCS, 35:7182, 1988.
[60] Y. Gurevich and S. Shelah. Nearly linear time. In Lecture Notes in Computer Science
363, pages 108118. Springer Verlag, 1989.
[61] P. Halmos. Naive Set Theory. Springer-Verlag, undergraduate texts in mathematics, 1974
edition, 1960.
[62] Torben Amtoft Hansen, Thomas Nikolajsen, Jesper Larsson Tr
aff, and Neil D. Jones.
Experiments with implementations of two theoretical constructions. In Lecture Notes in
Computer Science 363, pages 119133. Springer Verlag, 1989.
[63] J. Hartmanis and J.E. Hopcroft. An overview of the theory of computational complexity.
Journal of the Association for Computing Machinery, 18(3):444475, 1971.
[64] J. Hartmanis, P.M. Lewis II, and R.E. Stearns. Classification of computations by time
and memory requirements. In Proc. IFIO Congress 65, Spartan, N.Y., pages 3135, 1965.
[65] J. Hartmanis and R.E. Stearns. On the complexity of algorithms. Transactions of the
American Mathematical Society, 117:285306, 1965.
[66] J. Håstad. Computational Limitations for Small-depth Circuits (Ph.D. thesis). MIT Press,
Cambridge, MA, 1987.
[67] P. Henderson. Functional Programming: Application and Implementation. PH, 1980.
[68] R. Herken, editor. The Universal Turing Machine. A Half-Century Survey. Oxford University Press, 1988.
[69] D. Hilbert and P. Bernays. Grundlagen der Mathematik, volume I. Springer, 1934.
[70] D. Hofstadter. Gödel, Escher, Bach: An Eternal Golden Braid. Harvester Press, 1979.
[71] N. Immerman. Nondeterministic space is closed under complement. SIAM Journal of
Computing, 17:935938, 1988.
[72] K. Jensen and N. Wirth. Pascal User Manual and Report. Springer-Verlag, revised edition,
1985.
[73] D. S. Johnson. Approximation algorithms for combinatorial problems. Journal of Computer and Systems Sciences, 9:256278, 1974.
453
[74] J.P. Jones and Y.V. Matiyasevich. Register machine proof of the theorem on exponential
diophantine representation of enumerable sets. Journal of Symbolic Logic, 49(3):818829,
1984.
[75] N. D. Jones. Space-bounded reducibility among combinatorial problems. Journal of Computer and System Science, 11:6885, 1975.
[76] N. D. Jones. A note on linear time simulation of deterministic two-way pushdown automata. IPL: Information Processing Letters, 1977.
[77] N. D. Jones. Blindfold games are harder than games with perfect information. Bulletin
European Association for Theoretical Computer Science, 6:47, 1978.
[78] N. D. Jones. Constant time factors do matter. In Steven Homer, editor, STOC 93.
Symposium on Theory of Computing, pages 602611. ACM Press, 1993.
[79] N. D. Jones and W. Laaser. Complete problems for deterministic polynomial time. Theoretical Computer Science, 3:105117, 1977.
[80] N. D. Jones, E. Lien, and W. Laaser. New problems complete for nondeterministic log
space. Mathematical Systems Theory, 10:117, 1976.
[81] N. D. Jones and S. Muchnick. Even simple programs are hard to analyze. Journal of the
Association for Computing Machinery, 24(2):338350, 1977.
[82] N. D. Jones and S. Muchnick. Complexity of finite memory programs with recursion.
Journal of the Association for Computing Machinery, 25(2):312321, 1978.
[83] N. D. Jones and A. Mycroft. Data flow analysis of applicative programs using minimal function graphs. In Proceedings of the Thirteenth ACM Symposium on Principles of
Programming Languages, pages 296306, St. Petersburg, Florida, 1986.
[84] N.D. Jones. Reducibility among combinatorial problems in log n space. In Proceedings
7th Annual Princeton Conference on Information Sciences and Systems, pages 547551.
Springer-Verlag, 1973.
[85] N.D. Jones. Constant time factors do matter. In ACM Symposium on Theory of Computing
proceedings., pages 602611. Association for Computing Machinery, 1993.
[86] N.D. Jones. Mix ten years later. In William L. Scherlis, editor, Proceedings of PEPM 95,
pages 2438. ACM, ACM Press, 1995.
[87] N.D. Jones. An introduction to partial evaluation. Computing Surveys, ?:?, 1996.
[88] N.D. Jones. Computability and complexity from a programming perspective. TCS, ?:?,
1997.
[89] N.D. Jones, C.K. Gomard, and P. Sestoft. Partial Evaluation and Automatic Program
Generation. Prentice-Hall, 1993.
[90] N.D. Jones, P. Sestoft, and H. Søndergaard. An experiment in partial evaluation: the
generation of a compiler generator. In J.-P. Jouannaud, editor, Rewriting Techniques and
Applications, Dijon, France., volume 202 of Lecture Notes in Computer Science, pages
124140. Springer-Verlag, 1985.
454 Bibliography
455
[109] P.M. Lewis II, R.E. Stearns, and J. Hartmanis. Memory bounds for recognition of contextfree and context-sensitive languages. In Conf. Rec., IEEE 6th Annual Symposium on
Switching Circuit Theory and Logic Design, 1965.
[110] M. Li and P.M.B. Vitányi. Kolmogorov complexity and its applications. In Jan van
Leeuwen, editor, Handbook of Theoretical Computer Science, volume 1. Elsevier and MIT
Press, 1990.
[111] L.A. Lombardi. Incremental computation. In F. L. Alt and M. Rubinoff, editors, Advances
in Computers, volume 8, pages 247333. Academic Press, 1967.
[112] L.A. Lombardi and B. Raphael. Lisp as the language for an incremental computer. In
E.C. Berkeley and D.G. Bobrow, editors, The Programming Language Lisp: Its Operation
and Applications, pages 204219, Cambridge, Massachusetts, 1964. MIT Press.
[113] M. Machtey and P. Young. An Introduction to the General Theory of Algorithms. North-Holland, 1978.
[114] Z. Manna. Mathematical Theory of Computation. MH, 1974.
[115] A.A. Markov. The theory of algorithms. Technical report, Israeli Program for Scientific
Translations, Jerusalem, 1962. Translated from the Russian version which appeared in
1954.
[116] Y.V. Matiyasevich. Enumerable sets are diophantine. Doklady Akedemii Nauk SSSR,
191:279282, 1970. English translation in [117].
[117] Y.V. Matiyasevich. Enumerable sets are diophantine. Soviet Mathematics: Doklady,
11:354357, 1970.
[118] Y.V. Matiyasevich. Diofantova predstavlenie perechislimykh predikatov. Izvestia Akdemii
Nauk SSSR. Seriya Matematichekaya, 35(1):330, 1971. English translation in [122].
[119] Y.V. Matiyasevich. Diophantine representation of recursively enumerable predicates. In
J.E. Fenstad, editor, Proceedings of the Second Scandinavian Logic Symposium, volume 63
of Studies in Logic and the Foundations of Mathematics, pages 171177, Amsterdam,
North-Holland, 1971.
[120] Y.V. Matiyasevich. Diophantine representation of the set of prime numbers. Doklady
Akademii Nauk SSSR, 196:770773, 1971. English translation in [121].
[121] Y.V. Matiyasevich. Diophantine representation of the set of prime numbers. Soviet Mathematics: Doklady, 12:249254, 1971.
[122] Y.V. Matiyasevich. Diophantine representations of enumerable predicates. Mathematics
of the USSR. Izvestia, 15(1):128, 1971.
[123] Y.V. Matiyasevich. Hilberts Tenth Problem. MIT Press, 1993.
[124] J. McCarthy. Recursive functions of symbolic expressions and their computation by machine. CACM, 3(4):184195, 1960.
[125] A. Meyer and D.M. Ritchie. The complexity of loop programs. In Proceedings of the ACM
National Meeting, pages 465469, 1967.
456 Bibliography
[126] A. Meyer and L. Stockmeyer. The equivalence problem for regular expressions with squaring requires exponential space. In Proceedings of the IEEE 13th Annual Symposium on
Switching and Automata Theory, pages 125129, 1972.
[127] A. Meyer and L. Stockmeyer. The equivalence problem for regular expressions with squaring requires exponential space. In Proceedings of the IEEE 13th Annual Symposium on
Switching and Automata Theory, pages 125129, 1972.
[128] R. Milner. Operational and algebraic semantics of concurrent processes. Handbook of
Theoretical Computer Science, B:12031242, 1990.
[129] R. Milner, M. Tofte, and R. Harper. The Definition of Standard ML. MIT, Cambridge,
Massachusetts, 1990.
[130] P.B. Miltersen. Combinatorial Complexity Theory (Ph.D. thesis). BRICS, University of
Aarhus, Denmark, 1993.
[131] M. Minsky. Computation: Finite and Infinite Machines. Prentice-Hall Series in Automatic
Computation, 1967.
[132] T. Æ. Mogensen. Self-applicable online partial evaluation of the pure lambda calculus. In
William L. Scherlis, editor, Proceedings of PEPM '95, pages 39–44. ACM, ACM Press,
1995.
[133] Torben Æ. Mogensen. Efficient self-interpretation in lambda calculus. Journal of Functional Programming, 2(3):345–364, July 1992.
[134] Torben Æ. Mogensen. Linear-time self-interpretation of the pure lambda calculus. Higher-Order and Symbolic Computation, 13(3):217–237, September 2000.
[135] E. Nagel and J.R. Newman. Gödel's Proof. New York University Press, 1958.
[136] H.R Nielson and F. Nielson. Semantics with Applications. John Wiley & Sons, 1991.
[137] Robert Paige. Efficient translation of external input in a dynamically typed language. In
B. Pehrson and I. Simon, editors, Technology and Foundations Information Processing
94, volume 1 of IFIP Transactions A-51, pages 603608. North-Holland, 1994.
[138] C.H. Papadimitriou. Computational Complexity. Addison-Wesley Publishing Company,
1994.
[139] L.C. Paulson. ML for the Working Programmer. Cambridge University Press, 1991.
[140] Gordon D. Plotkin. A structural approach to operational semantics. Technical Report 19,
Aarhus University, 1981.
[141] E.L. Post. Finite combinatory processesformulation I. Journal of Symbolic Logic, 1:103
105, 1936.
[142] E.L. Post. Formal reductions of the general combinatorial decision problem. American
Journal of Mathematics, 65:197215, 1943.
[143] E.L. Post. Recursively enumerable sets of positive natural numbers and their decision
problem. Bulletin of the American Mathematical Society, 50:284316, 1944.
457
[144] E.L. Post. A variant of a recursively unsolvable problem. Bulletin of the American Mathematical Society, 50:264268, 1946.
[145] M.O. Rabin. Speed of computation and classification of recursive sets. In Third Convention
of the Scientific Society, Israel, pages 12, 1959.
[146] M.O. Rabin. Degree of difficulty of computing a function and a partial ordering of recursive
sets. Technical Report 1, O.N.R., Jerusalem, 1960.
[147] M.O. Rabin. Complexity of computations. Communications of the ACM, 20(9):625633,
1977.
[148] M.O. Rabin. Probabilistic algorithm for testing primality. Journal of Number Theory,
12:128138, 1980.
[149] T. Rado. On a simple source for non-computable functions. Bell System Technical Journal,
pages 877884, May 1962.
[150] C. Reid. Hilbert. Springer-Verlag, New York, 1970.
[151] H.G. Rice. Classes of recursively enumerable sets and their decision problems. Transactions
of the American Mathematical Society, 89:2559, 1953.
[152] J. Richard. Les principes des mathematiques et le probl`eme des ensembles. Acta Mathematica, 30:295296, 1906.
[153] J. Robinson. Existential definability in arithmetic. Transactions of the American Mathematical Society, 72:437449, 1952.
[154] H. Rogers Jr. G
odel numberings of partial recursive functions. Journal of Symbolic Logic,
23(3):331341, 1958.
[155] H. Rogers Jr. Theory of Recursive Functions and Effective Computability. McGraw-Hill,
1967.
[156] G. Rozenberg and A. Salomaa. Cornerstones of Undecidability. Prentice-Hall, 1993.
[157] W. Savitch. Relationship between nondeterministic and deterministic tape complexities.
Journal of Computing and Systems Sciences, 4(2):177192, 1970.
[158] D.A. Schmidt. Denotational Semantics. Boston, MA: Allyn and Bacon, 1986.
[159] A. Schönhage. On the power of random access machines. In H. A. Maurer, editor,
Proceedings of the 6th Colloquium on Automata, Languages and Programming, pages 520–529. LNCS 71. Springer, July 1979.
[160] A. Schönhage. Storage modification machines. SIAM Journal of Computing, 9:490–508,
1980.
[161] D.S. Scott. Some definitional suggestions for automata theory. Journal of Computer and
System Sciences, 1:187212, 1967.
[162] D.S. Scott. Lectures on a mathematical theory of computation. Technical Report PRG-19,
Programming Research Group, Oxford University, 1981.
458 Bibliography
[163] J.C. Shepherdson and H.E. Sturgis. Computability of recursive functions. Journal of the
Association for Computing Machinery, 10(2):217255, 1963.
[164] R. Sommerhalder and S.C. van Westrhenen. The Theory of Computability: Programs,
Machines, Effectiveness and Feasibility. International Computer Science Series. AddisonWesley, 1988.
[165] L. Stockmeyer. The polynomial time hierarchy. Theoretical Computer Science, 3:1–22,
1977.
[166] R. Szelepcsenyi. The method of forcing for nondeterministic automata. Bull. EATCS,
33:96100, 1987.
[167] B.A. Trakhtenbrot. Complexity of Algorithms and Computations (lectures for students of
the NGU). Novosibirsk, 1967. In Russian.
[168] V.F. Turchin. The language Refal, the theory of compilation and metasystem analysis.
Courant Computer Science Report 20, Courant Institute of Mathematical Sciences, New
York University, 1980.
[169] V.F. Turchin. The concept of a supercompiler. Transactions on Programming Languages
and Systems, 8(3):292325, 1986.
[170] A.M. Turing. On computable numbers with an application to the Entscheidungsproblem.
Proceedings of the London Mathematical Society, 42(2):230265, 1936-7.
[171] L. Valiant. General purpose parallel architectures. In J. van Leeuwen, editor, Handbook of
Theoretical Computer Science, vol. A. Elsevier, 1990.
[172] P. Wadler. Deforestation: Transforming programs to eliminate trees. In H. Ganzinger,
editor, Proceedings of the European Symposium on Programming, volume 300 of Lecture
Notes in Computer Science, pages 344358. Springer Verlag, 1988.
[173] K. Wagner and G. Wechsung. Computational Complexity. Reidel Publ. Comp., Dordrecht,
Boston, Lancaster, Tokyo, 1986.
[174] J. Welsh and J. Elder. Introduction to Pascal. International series in Computer Science.
Prentice-Hall, second edition, 1982.
List of Notation
[[p]]
ID
|d|
N
n
`
timeLp (d)
p ` s s0
p ` s s0
c01B
cpr
bin
c2CM
-complete
10
10
10
12, 47
31
31
36
36
40
48
88
112
112
121
121
121
131
131
131
131
154, 377
F ns
377
ptime
377
logs
IN
f (a)
f (a)
f (a) =
dom
rng
'
log(n) (n IN )
O(f )
(f )
(f )
o(f )
::=
|
G = (N, T, P, S)
L(G) (G a grammar)
r
L(r) (r a regular expression)
rec
`F
ID01
c : {0, 1} ID01
ptime
lintime
ptime
lintime
lintimepgind
lintimepgind
cID
Ltime(f (n))
Lptime
Llintime
-equivalent
-hard
376
377
174
189
208
225
250
251
251
252
252
252
252
253
259
272
272
272
376
376
459
421
421
421
421
421
423
426
426
426
427
427
427
429
431
431
431
431
432
433, 437
433
433
433
433
434
434, 437
434
435
437
437
Index
Church-Turing thesis, 4, 8, 127
Church-Turing-Kleene thesis, 207
circuit complexity, xv, 9
circuit, monotone, 391
CLIQUE, 371, 373, 374
CM, 111, 116
CM\C:=C+1 , 353
CM-computability, 127, 210
CM-computable, 134, 210
CMlogspace , 354
CMlogspace+rec , 360
CMro, 319, 322
CMvalue(n) , 353
CNF, 371, 374, 422
communicating systems, xv
compilation, 50, 59
for proving equivalence of languages,
127
versus interpretation, 89
with change of data, 52, 129
compiler, 50, 53, 231
bootstrapping, 91
diagrams, 51, 56
generation, 96
compiling function, 50
w.r.t. coding, 52
complete logical system, 198, 200
complete problems, 369, 376, 378
for nlogspace, 380, 384
for nptime, 401
for nlintime, 382
for pspace, 409
for ptime, 387
for re, 379
completeness, 369
complexity classes
ptime, 272, 275
lintime, 272, 277
equivalent, 48
functional, 137
imperative, 137
implementation, 53, 57
simulating one by another, 48
source, 53, 57
target, 57
left-most multi-step derivation relation,
435
left-most one-step derivation relation, 435
length (of a list), 35
length of a read-only TMro state, 319
length of a state, 318
linear time, 277, 293
linearly equivalent, 252
lintime, 277
lintime, 272
list, 35
list, 37
list representation, 35
literal, 422
logarithmic cost, 255, 257
logspace, 321, 324, 350, 355
logspace functions, 327
lookup, 54
Markov algorithms, 8
match, 57
MCV, 393
minimization, 210
model-independent, 225
multi-step derivation relation, 435
multi-step rewrite relation, 156
natural numbers, 423
natural semantics, 190
negation, 35, 194, 421
NFA, 438
nlintime, 382
nlogspace, 336, 349, 350
nodes, 432
non-deterministic finite automaton, 438
nondeterminism, 243, 335
nonterminals, 434
nonuniform complexity, xv
normal form theorem, 212
normalizations, 339
npspace, 336, 349, 350
nptime, 243, 336, 350
numerals, 36
O-notation, 431
o-notation, 432
omega-notation, 431
one-step derivation relation, 435
one-step rewrite relation, 155
operational semantics, 190
optimality of a specializer, 101
ordered pair, 424
overhead factor, 252
pairing, 47
pairing decomposition, 444
parallelism, xv
parallelism,ptime, 398
partial evaluation, 64, 75, 94
off-line, 104
techniques, 103
partial recursive, 207
partial recursive functions, 24
Pascal-like implementation of GOTO, 266
path finding, 336
PCP, 158
polynomial-time, 275
polynomially equivalent, 252
Post's correspondence problem, 158
predecessor, 36
predicate, 194, 197
prefix, 433
primitive recursion, 208
problem, 3
ambiguity, 163, 436
complete for nlogspace, 380
complete for nlintime, 382
complete for re, 379
completeness, 164, 436
membership, 436
natural unsolvable, 153
non-emptiness for, 436
representation, 271
representation of, 372
undecidable, 156
production, 155
program
cons-free, 353
boolean, 409
computes a function, 41
function computed by, 41
looping, 41
self-reproducing, 221
stack, 289
terminating, 41
time-bounded, 271
timed universal, 290
program padding, 233
program point specialization, 104
program property, 76
extensional, 76
intensional, 76
non-trivial, 76
program specializer, 57, 227
optimal, 102
program-dependent, 253, 256
program-independent, 253
programming language, 47
programs
cons-free, 359
programs-as-data, 47
programs-as-data representation, 48
proof tree, 200
propositional formulas, 421
provability, 204
pspace, 321, 324, 349, 350, 409
ptime, 244, 275, 324, 349, 350, 359
ptime, 272
pushdown automaton, 363
quadruples, 424
quantified boolean algebra, 411
RAM, 111
random access machine, 8, 117
read-only, 317
Readin, 113
Readout, 113
real numbers, xv, 423
positive, 423
recursion theorem, 220, 228
recursive, 83
recursive function theory, 207
recursive extension, 359
recursive function, 8, 207
recursively enumerable, 83, 197, 199, 204
reducing SAT to CLIQUE, 374
reduction, 154, 369, 370
reduction function, 372
reflexive extension, 222
REG≠, 436
REGALL, 436
REGAMB, 436
regular expression, 437
set generated by, 437
totality, 414
representable predicate, 202
resource bound, 271, 336
resource-bounded program class
definition of, 271
resources, 242
restriction
to one operator, 61
to one variable, 59
reverse, 33, 433
rewrite rule, 155
Rice's theorem, 76
robust, 275, 277, 324
robustness, 241, 271
computability, 127
Rogers, H., 226, 230
rooted DSG, 262
rule
rewrite, 43
running time
WHILE program, 254
theta-notation, 431
TI-diagrams, 51
time
linear, 277
superlinear, 293
time constructible, 295, 299
time usage, 336
time(f ), 272
timed universal program, 290
timed programming language, 87
TM, 111
TMro, 318
totality of a specializer, 100
tractable, 244
transition function, 438, 439
treeless transformer, 357
triples, 424
true, 421
true, 34
truth, 204
truth assignment, 422
truth table, 422
truth values, 421
n-tuples, 424
tupling function, 225
Turing completeness, 227
Turing machine, 5, 8, 114, 318
configuration, 122
deterministic, 120
enumeration, 225
program, 225
uncomputable functions, 15
undecidable, 76, 156
undirected graph, 433
unfolding, 193
unfolding function calls, 104
unit cost, 250, 257, 261
universal function, 207
universal function property, 227
universal program, 70
universal quantifier, 195
unnecessary code elimination, 84
unsolvable, 4
update, 137
update, 54
valid, 445
value assumption, 191
vector notation, 424
VERTEXCOVER, 384
vertices, 432
Wadler, P., 354, 357
WHILE, 29, 31
WHILEcomputable, 73
WHILE1op , 61
WHro, 353