Randomness & Complexity 2007 Calude PDF
Randomness & Complexity 2007 Calude PDF
8:57
Cristian S. Calude
book
8:57
book
8:57
book
8:57
book
8:57
Preface
for friends.
coined the name AIT; this name is becoming more and more popular.
2 Chaitin
book
8:57
VI
During its history of more than 40 years, AIT knew a significant variation in terminology. In particular, the main measures of complexity
studied in AIT were called Solomonoff-Kolmogorov-Chaitin complexity,
Kolmogorov-Chaitin complexity, Kolmogorov complexity, Chaitin complexity, algorithmic complexity, program-size complexity, etc. Solovays handwritten notes [22]3 , introduced and used the terms Chaitin complexity and
Chaitin machine.4 The book [21] promoted the name Kolmogorov complexity for both AIT and its main complexity.5
The main contribution shared by AIT founding fathers in the mid 1960s
was the new type of complexitywhich is invariant up to an additive
constantand, with it, a new way to reason about computation. Founding fathers subsequent contributions varied considerably. Solomonoffs
main interest and results are related to inductive inference, see [19]. Kolmogorovs main contributions to AIT were mainly indirect6 through the
works of his students, P. Martin-Lof, L. Levin, V. Uspenskij.7 Chaitins
contributionsspanning over four decadeson plain and program-size
complexity, algorithmic randomness (finite and infinite sequences), applications to G
odel incompleteness, and concrete versions of AIT (hands-on
programming), are central for the field. One can appreciate their lasting
impact by inspecting the forthcoming monograph [17] which also includes
the boom of results obtained in the last decade (due in part to the renaissance of Recursion Theory focussed on AIT).
While Chaitins main contributions are in AIT, he was engaged in other
research projects as well.
His first paper [5]published when he was 18was in automata theory. It significantly improves a theorem by Moore, which later became very
important for modelling quantum phenomena with automata, see [24]. In
fact, Chaitin was interested in the relation between computation and quantum physics since the early 1960s; he even wrote an APL2 course outline for
3 During
the research for my book [1] I was advised by Greg Chaitin to carefully read [22].
The problem was that the manuscript appeared to have vanished, certainly Chaitin and
Solovay didnt have copies. Eventually, C. Bennett kindly sent me a copy in early 1993
and I circulated it in the community. It had the lasting effect predicted by Chaitin; see
for example [17].
4 Solovay continued to use this terminology in his later paper [23]. I used this terminology
in [1].
5 See more about the early history of AIT in [25].
6 As Shen explained in Dagsthul, by the mid 1960s Kolmogorovs interests had more and
more focused on school mathematics education.
7 See [20] for a complete list of publications.
book
8:57
book
VII
physics, [8].8 His paper [6] initially included a direct reference to Natures
impact on how a Turing machine operates, but that part was removed at
the referees request. Here is how he put it recently in [3]:
If Nature really lets us toss a coin, then, with extremely
high probability, you can actually compute algorithmically irreducible strings of bits, but theres no way to do that in a
deterministic world.
Chaitin has been employed by IBM for 40 years.9 In the late 1970s and
early 1980s he was part of the group designing the RISC architecture. One
of his main contributions,10 the Chaitin style graph colouring algorithm
for optimal global register allocation, featured as a key innovation in the
IBM 801 computer; a very influential paper [7, 9] describes this algorithm.
Chaitins intense interest in hands-on programming is also visible in his
pioneering work on concrete AIT.
Chaitin has a long-time interest in philosophy, in general, and in the
philosophy of mathematics, in particular. For Chaitin [11] (p. xiii), programming is a reliable way to achieve mathematical understanding:
To me, you understand something only if you can program it.
(You, not someone else!)
He supports the view that mathematics is quasi-empirical11 and experimental mathematics should be used more freely. In recent years his
attention was captured by Leibniz, the man and the philosopher, whom he
sees as precursor of the idea of descriptional complexity:
Mais quand une regle est fort composee, ce qui luy est conforme,
passe pour irregulier.12
One may be led to think that philosophical interests and a focus on big
ideas signal a lack of steam for technical mathematical problems. This is
8 He
8:57
VIII
not the case as his most recent papers show [13, 14].13
My first contact with Chaitins work was through Martin Davis paper
[16], which gives a beautiful presentation of one of Chaitins informationtheoretic forms of incompleteness.14 In the late 1970s, I started a weekly
research seminar on AIT at the University of Bucharest which lasted till
my departure in 1992. In this seminar, we read many of Chaitins papers
and we presented some of our own; he was our main source of inspiration. I
first met Greg in January 1993he was my first visitor in Auckland, where
my family relocated in December 1992. Since then, I have been very privileged to continue meeting him regularly, to understand some of his results
not from papers, not from books, but from stimulating discussions, and to
cooperate on different projects (including a joint paper published in Nature, [2], which inspired a poem [15]).
Chaitin is an unconventional person as well as an unconventional thinker
and scientist. Radically new ideas are not easily accepted and more often
than not generate strong opposition and resentment. One of the best portraits of Chaitin in action was painted by John Horgan in [18]:
Stout, bald, and boyish, he wore neobeatnik attire: baggy white
pants with an elastic waistband, black T-shirt adorned with a
Matisse sketch, sandals over socks. He was younger than I expected; I learned later that his first paper had been published
when he was only 18, in 1965. His hyperactivity made him seem
younger still. His speech was invariably either accelerating, as
he became carried away by his words, or decelerating, perhaps
as he realized he was approaching the limits of human comprehension and ought to slow down. Plots of velocity and volume
of his speech would form overlapping sine waves. Struggling to
articulate an idea, he squeezed his eyes shut, and, with an agonized grimace, tipped his head forward, as if trying to dislodge
the words from his sticky brain.
recent paper [14] was inspired by Davis paper [16], perhaps not a random
coincidence.
14 This paper is reproduced in this book.
15 In an article included in this book, J.-P. Delahaye observes that the symbol had
been used in mathematics for a variety of purposes. But increasingly it was reserved for
Chaitins number, just as came exclusively to represent Archimedes constant at the
beginning of the 18th century.
book
8:57
book
IX
can get a mere16 glimpse of it. How could we hope to understand Chaitin?
The contributions included in this book have been grouped into the following categories: technical contributions (AIT and related areas, Physics,
Algebra, Automata Theory, Computer Architecture), papers on Philosophy,
essays, and reminiscences. The book also includes Chaitins own recollections on AIT, and pictures from the Chaitin celebration at the New Kind
of Science Conference (Burlington, 15 July 2007).
I am grateful to Prof. Kok Khoo Phua, Chairman of World Scientific,
for supporting this project and to Kim Tan from World Scientific for being
such an efficient editor. I wish to thank Springer and the science magazine
Pour la Science for allowing me to reprint articles initially published by
them. I also thank Jeff Grote, Sally McCay, Jacquie Meyer, David Reiss,
and Karl Svozil for permission to reproduce their pictures. A big thank
you goes to all contributors to this volume as well as to Wolfram Research,
University of Vienna, the Godel Society and IBM Research for organising
meetings in which the book was or will be presented.
Finally, to the Omega Man, to use the Time magazine formula, from all
of us, a Very Happy Birthday!
Cristian S. Calude
Auckland, July 2007
References
[1] C. S. Calude. Information and Randomness. An Algorithmic Perspective,
Springer-Verlag, Berlin, 1994. Second Edition, Revised and Extended, 2002.
[2] C. S. Calude, G. J. Chaitin. Randomness everywhere, Nature 400, 22 July
(1999), 319320.
[3] C. S. Calude, G. J. Chaitin. A dialogue on mathematics & physics, The
Rutherford Journal: The New Zealand Journal for the History and Philosophy of Science and Technology, Vol. 2, 20062007, www.rutherfordjournal.
org.
[4] C. S. Calude, M. J. Dinneen. Exact approximations of omega numbers, Int.
Journal of Bifurcation & Chaos 17, 6 (2007), to appear.
[5] G. J. Chaitin. An improvement on a theorem by E. F. Moore, IEEE Transactions on Electronic Computers, EC-14 (1965), 466467.
[6] G. J. Chaitin. On the length of programs for computing finite binary sequences, Journal of the ACM 13 (1966), 547569.
16 Finite,
ridiculously small.
8:57
[7] G. J. Chaitin. Register allocation and spilling via graph coloring (with
retrospective), Best of PLDI, 1982, 6674, http://doi.acm.org/10.1145/
989393.989403. Also in: Proc. SIGPLAN 82 Symp. Compiler Construction,
June 1982, 98105.
[8] G. J. Chaitin. An APL2 gallery of mathematical physicsA course outline,
Proceedings Japan 85 APL Symposium, N:GE1899480, IBM Japan, 1985,
156.
[9] G. J. Chaitin, M. A. Auslander, A. K. Chandra, J. Cocke, M. E. Hopkins
and P. W. Markstein. Register allocation via coloring, Computer Languages
6, (1981), 4757.
[10] G. J. Chaitin, C. H. Hoagland, M. J. Stephenson. System and method
for generating an object module in a first format and then converting
the first format into a format which is loadable into a selected computer, United States Patent 4791558, http://www.freepatentsonline.com/
4791558.html.
[11] G. J. Chaitin. Meta Math!, Pantheon, New York, 2005.
[12] G. J. Chaitin. Epistemology as information theory: from Leibniz to Omega,
Collapse 1 (2006), 2751.
[13] G. J. Chaitin. Probability and program-size for functions, Fundamenta Informaticae 71 (4) (2006), 367370.
[14] G. J. Chaitin. An algebraic characterization of the halting probability, Fundamenta Informaticae, 79 (1-2) (2007), 1723.
[15] R. M. Chute. Reading a note in the journal Nature I learn, Beloit Poetry
Journal 50 (3), Spring 2000, 8. Also in the book by the same author, Reading
Nature, JWB, Topsham, 2006.
[16] M. Davis. What is a Computation? in L. A. Steen, Mathematics Today:
Twelve Informal Essays, Springer-Verlag, New York, 1978, 241267.
[17] R. Downey and D. Hirschfeld. Algorithmic Randomness and Complexity,
Springer, Berlin, to appear in 2008.
[18] J. Horgan. The End of Science, Helix Books, Reading Mass., 1996, 227228.
[19] M. Hutter. Universal Artificial Intelligence: Sequential Decisions Based on
Algorithmic Probability, Springer, Berlin, 2004.
[20] A. N. Kolmogorov. Selected Works of A. N. Kolmogorov. Vol. III. Information Theory and the Theory of Algorithms, Kluwer, Dordrecht, 1993.
[21] M. Li and P. Vit
anyi, An Introduction to Kolmogorov Complexity and Its
Applications, Springer, Berlin, 1993. Second Edition, 1997.
[22] R. M. Solovay. Draft of a paper (or series of papers) on Chaitins work ...
done for the most part during the period of Sept.Dec. 1974, unpublished
manuscript, IBM Thomas J. Watson Research Center, Yorktown Heights,
New York, May 1975, 215 pp.
[23] R. M. Solovay. A version of for which ZF C can not predict a single bit,
in C.S. Calude, G. P
aun (eds.). Finite Versus Infinite. Contributions to an
Eternal Dilemma, Springer-Verlag, London, 2000, 323334.
[24] K. Svozil. Randomness & Undecidability in Physics, World Scientific, Singapore, 1993.
[25] Panel discussion: History of Algorithmic Randomness & Complex-
book
8:57
book
XI
8:57
book
XII
Contents
Technical Contributions
1.
2.
13
Francoise Chaitin-Chatelin
3.
JanusFaced Physics
25
69
P. C. W. Davies
5.
What is a Computation?
89
Martin Davis
6.
123
131
153
James Goodman
9.
A Berry-type Paradox
155
Gabriele Lolli
10.
in Number Theory
161
175
217
8:57
book
XIII
13.
231
Karl Svozil
14.
237
John Tromp
Philosophy
261
15.
263
16.
281
F. Walter Meyerstein
17.
287
Ugo Pagallo
Essays
299
18.
301
19.
Gods Number
321
Marcus Chown
20.
Omega Numbers
343
Jean-Paul Delahaye
21.
359
Tor Nrretranders
22.
367
Stephen Wolfram
23.
383
Doron Zeilberger
Reminiscences
411
24.
413
8:57
book
XIV
Andreea S. Calude
25.
417
John Casti
uvre
421
26.
423
Celebration
443
27.
445
8:57
Technical Contributions
book
8:57
book
8:57
Chapter 1
On Random and Hard-to-Describe Numbers
Charles H. Bennett1
IBM Watson Research Center, Yorktown Heights, NY 10598, USA;
[email protected]
The first essay discusses, in nontechnical terms, the paradox implicit in
defining a random integer as one without remarkable properties, and
the resolution of that paradox at the cost of making randomness a property which most integers have but cant be proved to have. The second
essay briefly reviews the search for randomness in the digit sequences
of natural irrational numbers like and artificial ones like Champernownes C = 0.12345678910111213 . . ., and discusses at length Chaitins
definable-but-uncomputable number , whose digit sequence is so random that no betting strategy could succeed against it. Other, Cabalistic
properties of are pointed out for the first time.
paper was written and widely circulated in 1979, but is published here for the first
time in its original form.
3
book
8:57
Charles H. Bennett
book
8:57
book
occurs with equal asymptotic frequency. It is easy to show that no rational number is normal to any base, and that almost all irrational numbers
are normal to every base; but the normality of these most famous irrational numbers remains open. The question cannot be settled by any finite
amount of statistical evidence, since an ultimately normal number might
begin abnormally (e.g. e = 2.718281828 . . .), or vice versa. Existing evidence [5] shows no significant departures from randomness in . e also
appears to be normal, though there is some evidence for other statistical
irregularities [6].
In contrast to , whose random-appearing digit sequence mocks the
attempt to prove it so, the following very non-random number:
C = 0.12345678910111213141516171819202122232425262728293031 . . .
is nevertheless provably normal, to base 10. This number, invented by D.
G. Champernowne [7], consists of the decimal integers written in increasing order (Benoit Mandelbrot has pointed out another number of this sort,
whose base-2 normality is implicit in an earlier paper by N. Wiener [8]).
Departures from equidistribution are large in the initial portion of Champernownes or Wieners number, but approach zero as the count is extended
over more and more of the sequence. It is apparently not known whether
these numbers are normal to every base.
Although the digit sequence of may be random in the sense of being
normal, it is definitely not random in the sense of being unpredictable: a
good gambler betting against it would eventually infer its rule and thereafter always win, and only a very inept gambler could lose many bets against
Champernownes number. Is there a sequence so random that no computable betting strategy, betting against it at fair odds, can win an infinite
gain? Any number that is random in this strong sense is also normal to
every base. It is a basic result of probability theory that almost all real
numbers are random in this strong sense [9], but here again we are seeking
a specific random number.
There is, of course, a sense in which no specifically definable real number can be random. Since there are uncountably many real numbers but
only countably many definitions, the mere fact that a real number is definable makes it atypical of real numbers in general. Here, however, we are
only seeking a number whose atypicality is unrecognizable by constructive
means. In particular, the number we are seeking must not be computable
from its definition; since if it were, that would already imply a perfect betting strategy. One may define an uncomputable real number K in terms
8:57
Charles H. Bennett
halting problem, i.e. the problem of distinguishing programs that come to a spontaneous halt from those that run on indefinitely, is the classic unsolvable problem of
computability theory. At first sight the problem might seem solvable since, if a program
halts, that fact can certainly be demonstrated by running the program long enough.
Moreover there are many programs which can easily be proven to halt or not to halt
even without running the program. The difficulty comes not in solving particular cases,
but in solving the problem in general. It can be shown that there is no effective prescription for deciding how long to run a program that waits long enough to reveal the
halting of all halting programs, nor any consistent system of axioms strong enough to
prove the non-halting of all non-halting ones. The unsolvability of the halting problem
can be derived from and indeed is equivalent to the fact that most random integers cant
be proved random.
book
8:57
book
<
n + 2n .
were a terminating binary rational, the expansion ending in infinitely many ones
should be used, making n < = n + 2n . In fact, this problem never arises, since, as
will be proved presently, is irrational, and so lies strictly between n and n + 2n .
4 Some well-known conjectures, e.g. that is normal, or that there are infinitely many
twin primes (consecutive odd primes like 3 and 5 or 17 and 19), or that there are only
finitely many primes of the form 2n +1, are not in principle decidable one way or the other
by any finite amount of direct evidence. Perhaps the most important conjecture of this
sort is the P 6= NP conjecture in computational complexity theory, which holds that there
8:57
Charles H. Bennett
book
8:57
book
domness of all integers of n + 1 bits or less. The procedure for doing this
is essentially the same as that used to solve the halting problem: to find
whether a given n + 1 bit integer x is algorithmically random, use n as
described earlier to find all n-bit programs that halt. If none of these
has x as its output, then by definition x is algorithmically random.
Let us now return to the senses in which itself is random: its incompressibility and the impossibility of successfully gambling against it. It may
appear strange that can contain so much information about the halting
problem and yet be computationally indistinguishable from a meaningless
random sequence generated by tossing a coin. In fact, is a totally informative message, a message which appears random because all redundancy
has been squeezed out of it, a message which tells us only things we dont
already know.
To show that is incompressible, let p be a program that for some n
computes n , the first n bits of . This program may be altered, increasing
its size by at most c bits (c a constant independent of n), so that instead of
printing n it finds and prints out the first algorithmically random (n + 1)bit number, as explained above. This would be a contradiction unless the
original program p were at least n c bits long.
No finitely describable computable gambling scheme can win an infinite
profit betting against the bits of . Let G be a gambling scheme, describable
in g bits, and able to multiply the gamblers initial capital 2k fold by betting
on some number n of initial bits of . Without loss of generality we may
suppose that the scheme includes a specification of the desired gain 2k , and
that it quits as soon as this gain is achieved, making no further bets. One
may imagine the same gambling scheme applied to other inputs besides .
On most of them it would fail to achieve its goal, but on some it would
succeed. Indeed one may use G to enumerate all the finite inputs on which
G would quit successfully. This set has total probability 2k or less, of
which 2n is contributed by n .
It can be shown [10] that about n k bits suffice to locate n within
this enumeration of successful inputs. Therefore n can be computed by
a program of approximately g + n k bits. This means, in turn, that
k cannot be much greater than g without violating the incompressibility
of . Therefore no g-bit gambling scheme betting on can multiply the
initial capital by more than about 2g , the amount one would win by simply
knowing g bits of and betting only on those bits.
Throughout history philosophers and mystics have sought a compact
key to universal wisdom, a finite formula or text which, when known and
8:57
10
Charles H. Bennett
understood, would provide the answer to every question. The Bible, the
Koran, the mythical secret books of Hermes Trismegistus, and the medieval Jewish Cabala have been so regarded. Sources of universal wisdom
are traditionally protected from casual use by being hard to find, hard to
understand when found, and dangerous to use, tending to answer more and
deeper questions than the user wishes to ask. Like God the esoteric book is
simple yet undescribable, omniscient, and transforms all who know It. The
use of classical texts to foretell mundane events is considered superstitious
nowadays, yet, in another sense, science is in quest of its own Cabala, a
concise set of natural laws which would explain all phenomena. In mathematics, where no set of axioms can hope to prove all true statements, the
goal might be a concise axiomatization of all interesting true statements.
is in many senses a Cabalistic number. It can be known of, but not
known, through human reason. To know it in detail, one would have to accept its uncomputable digit sequence on faith, like words of a sacred text. It
embodies an enormous amount of wisdom in a very small space, inasmuch
as its first few thousand digits, which could be written on a small piece
of paper, contain the answers to more mathematical questions than could
be written down in the entire universe, including all interesting finitelyrefutable conjectures. Its wisdom is useless precisely because it is universal:
the only known way of extracting from the solution to one halting problem, say the Fermat conjecture, is by embarking on a vast computation that
would at the same time yield solutions to all other equally simply-stated
halting problems, a computation far too large to be carried out in practice.
Ironically, although cannot be computed, it might accidentally be generated by a random process, e.g. a series of coin tosses, or an avalanche that
left its digits spelled out in the pattern of boulders on a mountainside. The
initial few digits of are thus probably already recorded somewhere in the
universe. Unfortunately, no mortal discoverer of this treasure could verify
its authenticity or make practical use of it.
The author has received reliable information, from a Source who wishes
to remain anonymous, that the decimal expansion of begins
0.9999998020554253273471801908 . . .
References
(1) Whitehead, A. N., and Russell, B., Principia Mathematica,
Vol. 1, Cambridge University Press, London (1925) p. 61.
book
8:57
book
11
Fig. 1. Method of using to solve the halting problem for all programs of
length n. Let the first n digits of be known; call them n . Place a weight
equal to n in the left pan of a balance. Meanwhile begin a systematic but
unending search for programs that halt, running one program then another
for greater and greater time in the manner of the song The Twelve Days
of Christmas. Every time a program of length k is found to halt, having
caused the computer to read neither more nor less than its full k bits in the
course of the computation, add a weight 2k to the right pan of the balance,
8:57
12
Charles H. Bennett
book
8:57
Chapter 2
Computing Beyond Classical Logic: SVD Computation in
Nonassociative Dickson Algebras
Francoise Chaitin-Chatelin
Universite Toulouse 1 and CERFACS,42 avenue G. Coriolis 31057
Toulouse Cedex 1, France; E-mail: [email protected]
This short note puts the fundamental work of G. Chaitin into an historical perspective about the multisecular evolution of the art of computing.
It recalls that each major step forward in the creation of new numbers
was met by strong opposition. It shows, by way of an example taken
from SVD computation in nonassociative Dickson algebras, why classical
logic cannot account for certain results which carry global information.
2.1. Introduction
The scientific uvre of Gregory Chaitin revolves around computation and
displays a remarkable unity of thought. More than 4 decades ago, Chaitin
began to explore the limits of computation within the paradigm of a Turing machine. This led him to the celebrated Omega number [1,2] which
expresses the ultimate in uncomputability `a la Turing.
The Turing thesis about computability is an axiomatic definition of
what can be computed (by a machine) within the limits of classical logic, the
rational logic based on elementary arithmetic. This axiom is now accepted
by computer scientists and logicians as a universal rule for computation.
Therefore the work of Chaitin, which questions this claim to universality
from within, has aroused passionate and antagonistic reactions, positive
and negative.
One of the main reasons for the irrational passion stirred by his work is
that it is rooted at a most fundamental level. Few questions reach deeper
into human understanding than What can rational computation achieve?.
From the point of view of classical logicians, the theoretical findings
of Chaitin about computation are unacceptable. Not because the mathematics are wrongthey are impeccable, but because the conclusions are
13
book
8:57
14
Francoise Chaitin-Chatelin
The discovery of each of these new numbers was a major step forward in
the evolution of the art of computing in the western world. This advance,
which spanned over seven centuries, was instrumental in the axiomatic
clarification of the foundations of mathematics which occured at the dawn
of the 20th Century. Thereafter, even associativity became an optional
feature for multiplication.
book
8:57
book
15
8:57
16
Francoise Chaitin-Chatelin
It is conventional wisdom that the lack of associativity is a severe limitation for computation in Ak , k 3. Nothing could be further from reality,
as this was shown in [3, 4]. Nonassociativity creates computational opportunities which are well exemplified by the Singular Value Decomposition
(SVD) for the left multiplication map La : x 7 a x, x Ak (Section 4).
Let [x, y, z] = (x y) z x (y z) denote the associator for x, y, z
in Ak , k 3.
Remark. Vectors in Dickson algebras have been called hypercomplex
numbers in the 19th Century. And computation on hypercomplex numbers is classically known as hypercomputation [3,4,7]. This mathematical notion should not be confused with a recent version of computation
designed by computer scientists to overcome some of Turings limitations
(see http://en.wikipedia.org/wiki/Hypercomputation.)
2.3.2. Alternative vectors in Ak , k 4
An alternative vector a in Ak satisfies the weakened associativity condition:
[a, a, x] = ||a||2 x + a (a x) = 0
for any x in Ak . The condition is identically satisfied for k 3, but not
for k 4. All canonical basis vectors ei , i = 0 to 2k 1, are alternative
for arbitrary k. Among them, the two vectors 1 = e0 and 1 = e2k1 have
stronger properties. They span the subalgebra C1 = lin(1, 1) isomorphic to
C. Any pair of vectors (x, y) in C1 satisfies
[x, x, y] = [x, y, y] = 0.
The vectors in C1 are fully alternative for k 4 [3].
2.3.3. The splitting Ak = C1 Dk , k 2
Let a be in Ak . It can be represented as the sum
a = + 1 + c
where h = +
1 C1 is the fully alternative head and c is the tail: c
belongs to the subspace Dk = C
of vectors with zero component on 1
1
and on
1=
1k . These vectors are called doubly pure. Such a splitting
plays an important role in non classical SVD calculations in Ak , k 3
(Section 4).
book
8:57
book
17
8:57
18
Francoise Chaitin-Chatelin
The classical derivation of the SVD for La from that for Lc yields a
generalization of Pythagoras theorem to ||h|| =
6 0 and to the singular values
for Lc , c 6= 0:
a = h + c = N = ||h||2 + > 0 for h 6= 0.
We have discovered (2005) that the nonassociative nature of multiplication in Dickson algebras for k 3 enables us to perform a non classical
derivation, which is a computational artifact in Ak , k 3 [3].
2.4.3. Nonclassical derivation from c to a, k 3
The nonclassical mode of derivation is defined in [3,Section 9]. It uses
the block-diagonal form of LTa La (with blocks of order 4) written in the
eigenbasis for L2c . In this nonclassical approach, the order in which the
addition of and
1 is performed matters. From an SVD point of view,
addition is not always associative in Ak , k 3, as we shall see.
When 6= 0, there are 3 different routes to go from c to a in C1 ,
as sketched on Figure 1: one can reach a either directly (diagonally) or
sideways through d =
1 + c, or through e = + c. When = 0, the
route is unique.
Figure 2.1.
book
8:57
book
19
where t 0, s R.
Theorem 2.2. For a = + 1 + c, c Dk , k 3, the nonclassical SVD
derivation yields the nonnegative values listed below in two columns:
0 < 6= 14
0
via d
2
direct or via e
2
N1 = + + 1
N 2
= 2 + (, )
N0 = 2 + 2
N1
2
N 2
+
2
= (, + )
N0 2 = ( )2
Table 1
Proof.
i. Once tamed, i = 1 found its way in almost all engineering calculations of the 19th Century which dealt with wave propagation (light, sound,
electricity,magnetism,...).
Warned by history, we should be extremely cautious. We should not
jump hastily to the obvious conclusion. Could it be possible that nonclassical SVD computation serves a purpose from a computational point of
view, and that it delivers useful information?
8:57
20
Francoise Chaitin-Chatelin
|||| = ||a|| = N1 .
Theorem 2.3. The eigenvalues of LT L are given in Table 1 by the left
most column. Their values equal the nonclassical eigenvalues for LTa La ,
computed in Ak via d =
1 + c. Their multiplicities are multiplied by 2.
Proof.
Let = ( + c, 1), and v = (x, y). Direct computation of
( v) shows that LT L has the 2 2 block representation
L=
M G
G M
0
K
4
0 < 6= 1 , the blocks are of the form N I8 + 2 J, with J =
K 0
0 I2
and K =
. K is antisymmetric, K T = K, K T K = K 2 = I4 .
I2 0
Its eigenvalues are i and its singular values are 1 quadruple. Thus the
eigenvalues of J are given
pair 1. And the eigenvalues
by the quadruple
T
4
of L L are N 2 for 0 < 6= 1 .
We have been able to interpret half of the seemingly meaningless singular values in Ak by the singular values of (a + c, 1) in the complexified
algebra Ak+1 = Ak Ak 1k+1 .
This is not a complete surprise. The interpretation of the nonclassical
singular values by induction from Ak to Ak+1 mimics, for k 3, the inter
pretation of 1 from R to C (k = 0). What seemed at first impossible
or absurd at a given level (dimension 2k ) can be resolved and understood
easily at the next level (dimension 2k+1 ).
However, this is just the tip of the iceberg, since any a in Ak can induce
4 or 8 different vectors in Ak+1 . A more complete study can be found in [6].
It sheds light on the role of nonclassical SVD in the process of creation by
hypercomputation.
book
8:57
book
21
2.6. Conclusion
The moral of this story about computation with hypercomplex numbers
has already been given by Leibniz more than 300 years ago: There is
hardly any paradox without its proper role. And history tells us that extreme caution should be used before judging, based on past experience, that
certain computations are absurd or impossible. Computation in nonassociative Dickson algebras begs for an extension of classical logic. It calls
for a dynamical logic where the results of a computation can be right and
wrong, depending on the point of view.
For example, in Ak , k 3, d = 1 + c in Im Ak is alternative iff c is
alternative in Dk . Thus d cannot be a zero divisor in Ak when we assume
c to be alternative. For || = ||c||, = (c, 1) is a zero divisor in Dk+1 [3].
This property is indicated by the nonclassical singular values: one is 0, the
other is 2||c||. These 2 values are wrong in relation with a, in Ak , but they
are the exact singular values forL in Dk+1 . The exact classical singular
value relative to a is, of course, 2||c|| = ||a|| = ||||, but it is mute about
the 2 other singular values for L .
This internal dynamical relativity of viewpoints created by induction
exists for each level k. The limit as k defines an evolution which is
clearly beyond the reach of any Turing machine [6].
If one wants to understand the manifested world, the moving, flexible
world that one sees and experiences, it is necessary to scrutinise the way
information is being dynamically processed during computation. This necessity was sensed by Gregory Chaitin already in the mid 1960s when he
conceived of his Algorithmic Information Theory (AIT). His theory explores
the limits of formal axiomatic reasoning based on the Turing paradigm. As
was mentioned in the introduction, Chaitin exposes the limitations from
within the paradigm. It is clear that Dicksons hypercomputation lies outside the paradigm, shedding a complementary light on the limitations from
without.
Time will come when it will be obvious that the Turing thesis is a
straight-jacket imposed on computation to make it mechanical. Time will
come when the message of Chaitin about the limitations of purely rational
computation and of axiomatic reasoning will be received by everyone [2].
There are many ways out of the evolutive dead-end that would result
from any axiomatically constrained computation, such as the one that was
imagined in the 20th Century by Hilbert (1900) and Turing (1936).
8:57
22
Francoise Chaitin-Chatelin
Acknowledgement
The author wishes to thank Gregory Chaitin for countless illuminating
discussions over the past 14 years. The views expressed above about the
Turing thesis on computability are entirely hers.
References
[1] G. Chaitin (2005) Meta Math! The Quest for Omega, Pantheon Books, New
York
[2] G. Chaitin (2006) The limits of reason, Scientific American 294, 74-81,
March 2006.
[3] F. Chaitin-Chatelin (2005). Inductive multiplication in Dickson algebras,
Technical Report TR/PA/05/56, CERFACS, Toulouse, France, 2005.
[4] F. Chaitin-Chatelin (2006). Calcul non lineaire dans les alg`ebres de Dickson,
Technical Report TR/PA/06/07, CERFACS, Toulouse, France, 2006.
[5] F. Chaitin-Chatelin (2006). Computation in nonassociative Dickson algebras,
Invited talk, July 2006, Symposium in honour of Prof. Jacqueline Fleckinger,
Univ. Toulouse 1.
[6] F. Chaitin-Chatelin (2007). About an organic logic ruling the continuous
book
8:57
[7]
[8]
[9]
[10]
[11]
[12]
[13]
1 All
book
23
8:57
24
Francoise Chaitin-Chatelin
book
8:57
Chapter 3
JanusFaced Physics: On Hilberts 6th Problem
N. C. A. da Costa1 , F. A. Doria
author
25
book
8:57
26
completeness results for the axiomatic systems which are developed out of a
simple set of concepts where we have cradled significant portions of physics
and show that they imply the undecidability of many interesting questions
in physics and the incompleteness of the corresponding formalized theories.
We give chaos theory as an example. Those results point towards the conceptual depth of the game of physics. Our presentation follows several
results that we have obtained in the last two decades.
Why axiomatize a scientific theory? Doesnt the extra rigor required
carry with itself an unwanted burden that hinders the understanding of the
theorys concepts?
We axiomatize a theory not only to better understand its inner workings but also in order to obtain metatheorems about that theory. We will
therefore be interested in, say, proving that a given axiomatic treatment for
some physical theory is incomplete (that is, the system exhibits the incompleteness phenomenon), among other things. As a followup, we would also
like to obtain examples, if any, of physically meaningful statements within
that theory that are formally independent of its axioms.
Out of the authors previous work [1518] we describe here a technique
that allows for such an axiomatization. Its main guidelines are:
First, the mathematical setting of the theory is clarified, and everything
is formulated within the usual standards of mathematical rigor.
We then formulate those results within an adequate axiomatic framework, according to the prescriptions we present in this paper.
We may also be interested in semantic constructions. As the required
step to obtain those results, we show here how to embed a significant portion of classical physics within a standard axiomatic set theory such as
the ZermeloFraenkel system together with the Axiom of Choice (ZFC set
theory). By classical physics we mean classical mechanics as seen through
the analyticocanonical (Lagrangian and Hamiltonian) formalism; electromagnetic theory; Diracs theory of the electron the Schrodinger theory
of the electron is obtained through a limiting procedure; general relativity;
classical field theories and gauge field theories in particular.
Then, it is possible to examine different models for that axiom system
and to look for sentences that are true or not depending on their interpretation and that hopefully have corresponding different physical interpretations.
book
8:57
JanusFaced Physics
book
27
It is obvious that the crucial idea is the rather loose concept of physically meaningful sentence. We will not try to define such a concept.
However we presume (or at least hope) that our main examples somehow
satisfy that criterion, as they deal with objects defined within physical
theories, and consider problems formulated within the usual intuitively understood mathematical constructions of physics. Chaitin says that mathematics is random at its core. We show here that mathematics and the
mathematicallybased sciences are also pervaded by undecidability and by
highdegree versions of incompleteness when axiomatized.
3.2. Hilberts 6th Problem
When we discuss the possibility of giving physics an axiomatic treatment we
delve into an old and important question about physical theories [12, 68].
The sixth problem in Hilberts celebrated list of mathematical problems
sketches its desirable contours [34]:
The Mathematical Treatment of the Axioms of Physics.
The investigations on the foundations of geometry suggest the problem:
to treat in the same manner, by means of axioms, those physical sciences
in which mathematics plays an important part; in the first rank are the
theory of probability and mechanics.
As to the axioms of the theory of probabilities, it seems to me to be desirable that their logical investigation be accompanied by a rigorous and
satisfactory development of the method of mean values in mathematical
physics, and in particular in the kinetic theory of gases.
Important investigations by physicists on the foundations of mechanics
are at hand; I refer to the writings of Mach. . . , Hertz. . . , Boltzmann. . . ,
and Volkman. . . It is therefore very desirable that the discussion of the
foundations of mechanics be taken up by mathematicians also. Thus
Boltzmanns work on the principles of mechanics suggests the problem
of developing mathematically the limiting processes, those merely indicated, which lead from the atomistic view to the laws of continua.
Conversely one might try to derive the laws of motion of rigid bodies
by a limiting process from a system of axioms depending upon the idea
of continuously varying conditions on a material filling all space continuously, these conditions being defined by parameters. For the question
as to the equivalence of different systems of axioms is always of great
theoretical interest.
If geometry is to serve as a model for the treatment of physical axioms,
we shall try first by a small number of axioms to include as large a class
as possible of physical phenomena, and then by adjoining new axioms
to arrive gradually at the more special theories. At the same time Lies
8:57
28
book
8:57
JanusFaced Physics
book
29
that derive from the use of metamathematical techniques which are among other
things applied to the mathematics that underlie physics had already been obtained by
MaitlandWright [49] in the early 1970s. That author investigated some aspects of the
development of Solovays mathematics [58], i. e., of a forcing model of set theory where
a weakened version of the axiom of choice holds, as well as the axiom every subset of
the real line is Lebesguemeasurable. Among other interesting results, it is shown that
the theory of Hilbert spaces based on that model does not coincide with the classical
version. Explorations of forcing models in physics can be found in [6, 7, 46].
3 This is the actual way most courses in theoretical physics are taught.
8:57
30
treatment of field theories, and then one applies those formalisms to electromagnetic theory, to Schr
odingers quantum mechanics which is obtained
out of geometrical optics and the eikonal equation, which in turn arise from
HamiltonJacobi theory and gravitation and gauge fields, which grow
out of the techniques used in the formalism of electromagnetic theory. Here
we use a variant of this approach.
Electromagnetism
The first conceptually unified view of electromagnetic theory is given in
Maxwells treatise, dated 1873 (for a facsimile of the 1891 edition see [50]).
Maxwells treatment was given a more homogeneous, more compact notation by J. Willard Gibbs, and a sort of renewed presentation of Maxwells
main conceptual lines appears in the treatise by Sir James Jeans (1925,
[38]). Next step is Strattons textbook with its wellknown list of difficult
problems [59], and then Jacksons book, still the main textbook in the 1970s
and 1980s [37].
When one looks at the way electromagnetic theory is presented in these
books one sees that:
The mathematical framework is calculus the socalled advanced calculus, plus some knowledge of ordinary and partial differential equations and linear algebra.
Presentation of the theorys kernel becomes more and more compact;
its climax is the use of covariant notation for the Maxwell equations.
However covariant notation only appears as a development out of
the set of Maxwell equations in the traditional Gibbsian gradient
divergencerotational vector notation.
So, the main trend observed in the presentation of electromagnetic theory is: the field equations for electromagnetic theory are in each case summarized as a small set of coordinateindependent equations with a very
synthetic notation system. When we need to do actual computations, we
fall back into the framework of classical, 19thcentury analysis, since for
particular cases (actual, realworld, situations), the field equations open
up in general to complicated, quite cumbersome differential equations to
be solved by mostly traditional techniques.
A good reference for the early history of electromagnetism (even if its
views of the subject matter are pretty heterodoxical) is ORahillys tract
book
8:57
JanusFaced Physics
book
31
[52].
General relativity and gravitation
The field equations for gravitation we use today, that is, the Einstein field
equations, are already born in a compact, coordinateindependent form
(1915/1916) [30]. We find in Einsteins original presentation an explicit
striving for a different kind of unification, that of a conceptual unification
of all domains of physics. An unified formalism at that moment meant
that one derived all different fields from a single, unified, fundamental field.
That basic field then naturally splits up into the several component fields,
very much like, or in the search of an analogy to, the situation uncovered
by Maxwell in electromagnetism, where the electric field and the magnetic
field are different concrete aspects of the same underlying unified electromagnetic field.
This trend starts with Weyls theory [67] in 1918 just after Einsteins
introduction in 1915 of his gravitation theory, and culminates in Einsteins beautiful, elegant, but physically unsound unified theory of the
nonsymmetric field (1946, see [29]). Weyls ideas lead to developments
that appear in the treatise by Corson (1953, [13]), and which arrive at the
gauge field equations, or YangMills equations (1954), which were for the
first time examined in depth by Utiyama in 1956 [65].
An apparently different approach appears in the KaluzaKlein unified
theories. Originally unpromising and clumsylooking, the blueprint for
these theories goes back to Kaluza (1921) and then to Klein (1926, [63]). In
its original form, the KaluzaKlein theory is basically the same as Einsteins
gravitation theory over a 5dimensional manifold, with several artificial
looking constraints placed on the fifth dimension; that extra dimension is
associated to the electromagnetic field.
The unpleasantness of having to deal with extraneous conditions that do
not arise out of the theory itself was elegantly avoided when A. Trautmann
in the late 1960s and then later Y. M. Cho, in 1975 [11], showed that the
usual family of KaluzaKleinlike theories arises out of a simile of Einsteins
theory over a principal fiber bundle on spacetime with a semisimple Lie
group G as the fiber. Einsteins Lagrangian density over the principal fiber
bundle endowed with its natural metric tensor splits up as Einsteins usual
gravitational Lagrangian density with the socalled cosmological term plus
an interacting gauge field Lagrangian density; depending on the group G
8:57
32
one gets electromagnetic theory, isospin theory, and so on. The cosmological constant arises in the ChoTrautmann model out of the Lie groups
structure constants, and thus gives a possible geometrical meaning to its
interpretation as dark energy.
Here, conceptual unification and formal unification go hand in hand,
but in order to do so we must add some higherorder objects (principal
fiber bundles and the associated spaces, plus connections and connection
forms) to get our more compact, unified treatment of gravitation together
with gauge fields, which subsume the electromagnetic field. We are but a
step away from a rigorous axiomatic treatment.
Classical mechanics
The first efforts towards an unification of mechanics are to be found in Lagranges Traite de Mecanique Analytique (1811) and in Hamiltons results.
But one may see Hertz as the author of the first unified, mathematically
welldeveloped presentation of classical mechanics in the late 1800s, in a
nearly contemporary mathematical language. His last book, The Principles
of Mechanics, published in 1894, advances many ideas that will later resurface not just in 20th century analytical mechanics, but also in general relativity [33]. Half a century later, in 1949, we have two major developments
in the field: C. Lanczos publishes The Variational Principles of Mechanics,
a brilliant mathematical essay [42] that for the first time presents classical
mechanics from the unified viewpoint of differential geometry and Riemannian geometry. Concepts like kinetic energy or Coriolis force are made into
geometrical constructs (respectively, Riemannian metric and affine connection); several formal parallels between mechanical formalism and that of
general relativity are established. However the style of Lanczos essay is
still that of late 19th century and early 20th century mathematics, and is
very much influenced by the traditional, tensororiented, over a local coordinate domain, presentations of general relativity.
New and (loosely speaking) higherorder mathematical constructs appear when Steenrods results on fiber bundles and Ehresmanns concepts of
connection and connection forms on principal fiber bundles are gradually
applied to mechanics; those concepts go back to the late 1930s and early
1940s, and make their way into the mathematical formulations of mechanics in the late 1950s. Folklore has that the use of symplectic geometry in
book
8:57
JanusFaced Physics
book
33
mechanics first arose in 1960 when a major unnamed mathematician4 circulated a letter among colleagues which formulated Hamiltonian mechanics
as a theory of flows over symplectic manifolds, that is, a Hamiltonian flow is
a flow that keeps invariant the symplectic form on a given symplectic manifold. The symplectic manifold was the old phase space; invariance of the
symplectic form directly led to Hamiltons equations, to Liouvilles theorem
on the incompressibility of the phase fluid, and to the wellknown Poincare
integrals and here the advantage of a compact formalism was made clear,
as the old, computational, very cumbersome proof for the Poincare invariants was substituted for an elegant twoline, strictly geometrical proof.
High points in this direction are Sternbergs lectures (1964, [60]), MacLanes monograph (1968, [47]) and then the AbrahamMarsden treatise,
Foundations of Mechanics [1]. Again one had at that moment a physical
theory fully placed within the domain of a rigorous (albeit intuitive) mathematical framework, as in the case of electromagnetism, gauge field theory
and general relativity. So, the path was open for an axiomatic treatment.
to be Richard Palais.
8:57
34
book
8:57
JanusFaced Physics
book
35
8:57
36
Bourbaki [8] is a syntactical one. One of the authors formulated [14] a semantical version of it which is fully equivalent to the original construction.
That second notion was called a Suppes predicate in the reference.
Loosely speaking, it can be described as follows. Let LZF be the language of ZF set theory. We construct a predicate
P (S, X0 , X1 , . . . , Xn ),
that is to say, a formula of LZF that defines a particular kind S of structures
based on the sets X1 , X2 , . . .. Predicate P is given by the conjunction of
two pieces:
First piece, P1 (S, X0 , X1 , . . . , Xn ), shows how structure S is built out
of the basic sets X0 , X1 , . . . , Xn .
Second piece, P2 (S, X0 , X1 , . . . , Xn ), is the conjunction of the axioms
that we wish S to satisfy.
We get:
P (S, X0 , X1 , . . . , Xn ) Def P1 (S, X0 , X1 , . . . , Xn ) P2 (S, X0 , X1 , . . . , Xn ).
Here P (S, X0 , X1 , . . . , Xn ) is called a species of structures on the basic
sets
X0 , . . . , Xn ,
and the predicate:
X0 , X1 , . . . , Xn P (S, X0 , X1 , . . . , Xn )
is called the class of structures that corresponds to P .
3.5. Axiomatization in mathematics
The preceding construction sketches the required background for our formal
treatment in this essay. It shows the way we will fit usual mathematical concepts like those of group or topological space within a formal framework
like ZF set theory. We will in general assume that those formalizations
have been done as in our examples; if required, each structure and species
of structures we deal with can easily be made explicit (even if with some
trouble). It is in general enough to know that we can axiomatize a theory
of our interest with a Suppes predicate.
An axiomatic theory starts out of some primitive (undefined) concepts
and out of a set of primitive propositions, the theorys axioms or postulates.
book
8:57
JanusFaced Physics
book
37
Other concepts are obtained by definition from the primitive concepts and
from defined concepts; theorems of the theory are derived by proof mechanisms out of the axioms.
Given the settheoretic viewpoint for our axiomatization, primitive concepts are sets which are related through the axioms. We adopt here the
views expressed with the help of Suppes slogan [62], slightly modified to
suit our presentation:
To axiomatize a theory in mathematics, or in the mathematicsbased
sciences, is to define a settheoretic predicate, that is to say, a species of
structures.
8:57
38
will proceed in an informal way, and leave to the archetypical interested reader the
toil and trouble of translating everything that we have done into a fully formal, rigorous,
treatment of our presentation.
book
8:57
JanusFaced Physics
book
39
to be adequate for them. But we may confidently say that our axiomatization covers the whole of classical mechanics, classical field theory and
firstquantized quantum mechanics.
We follow the usual mathematical notation in this subsection. In particular, Suppes predicates are written in a more familiar but essentially
equivalent way.
The species of structures of essentially all classical physical theories
can be formulated as particular dynamical systems derived from the triple
P = hX, G, i, where X is a topological space, G is a topological group,
and is a measure on a set of finite rank over X G and it is easy to put
it in the form of a species of structures.
Thus we can say that the mathematical structures of physics arise out
of the geometry of a topological space X. More precisely, physical objects
are (roughly) the elements of X that:
Exhibit invariance properties with respect to the action of G.
(Actually the main species of structures in classical theories can be
obtained out of two objects, a differentiable finitedimensional real
Hausdorff manifold M and a finitedimensional Lie group G.)
Are generic with respect to the measure for X.
(This means, we deal with objects of probability 1. So, we only deal
with typical objects, not the exceptional ones. This condition isnt
always used, we must note, but anyway measure allows us to identify
the exceptional situations in any construction.)
Lets now give all due details:
Definition 3.1. The species of structures of a classical physical theory
is given by the 9tuple
= hM, G, P, F, A, I, G, B, = i,
which is thus described:
(1) The Ground Structures. hM, Gi, where M is a finitedimensional
real differentiable manifold and G is a finitedimensional Lie group.
(2) The Intermediate Sets. A fixed principal fiber bundle P (M, G)
over M with G as its fiber plus several associated tensor and exterior
bundles.
(3) The Derived Field Spaces. Potential space A, field space F and
the current or source space I. A, F and I are spaces (in general,
8:57
40
book
8:57
JanusFaced Physics
book
41
8:57
42
F C k (F ), if we are dealing with C k crosssections (actually a submanifold in the usual C k topology due to the closure condition dF = 0).
Finally we have two group actions on F: the first one is the Lorentz
Poincare action L which is part of the action of diffeomorphisms of M ; then
we have the (here trivial) action of the group G 0 of gauge transformations of
P when acting on the field manifold F. As it is well known, its action is not
trivial in the nonAbelian case. Anyway it always has a nontrivial action
on the space A of all gauge potentials for the fields in F. Therefore we take
as our symmetry group G the product L G 0 of the (allowed) symmetries
of M and the symmetries of the principal bundle P .
We must also add the spaces A of potentials and of currents, I, as
structures derived from M and S 1 . Both spaces have the same underlying
topological structure; they differ in the way the group G 0 of gauge transformations acts upon them. We obtain I = 1 s1 (M ) and A = I = C k (I).
Notice that I/G 0 = I while A/G 0 6= A.
Therefore we can say that the 9tuple
hM, S 1 , P, F, A, G, I, B, = i
where M is Minkowski space, and B is a set of boundary conditions for our
field equations = , represents the species of mathematical structures
of a Maxwellian electromagnetic field, where P , F and G are derived from
M and S 1 . The Diraclike equation
=
should be seen as an axiomatic restriction on our objects; the boundary
conditions B are (i) a set of derived species of structures from M and S 1 ,
since, as we are dealing with Cauchy conditions, we must specify a local or
global spacelike hipersurface C in M to which (ii) we add sentences of the
form x C f (x) = f0 (x), where f0 is a set of (fixed) functions and the f
are adequate restrictions of the field functions and equations to C.
Consistency of the added axioms
Hamiltonian mechanics
Hamiltonian mechanics is here seen as the dynamics of the Hamiltonian
fluid [1, 3, 42]. Our ground structure for mechanics starts out of basic sets
which are a 2ndimensional real smooth manifold, and the real symplectic
group Sp(2n, R). Phase spaces in Hamiltonian mechanics are symplectic
book
8:57
JanusFaced Physics
book
43
8:57
44
General relativity
General relativity is a theory of gravitation that interpretes this basic force
as originated in the pseudoRiemannian structure of spacetime. That is
to say: in general relativity we start from a spacetime manifold (a 4
dimensional, real, adequately smooth manifold) which is endowed with an
pseudoRiemannian metric tensor. Gravitational effects originate in that
tensor.
Given any 4dimensional, noncompact, real, differentiable manifold M ,
we can endow it with an infinite set of different, nonequivalent pseudo
Riemannian metric tensors with a Lorentzian signature (that is, + ++).
That set is uncountable and has the power of the continuum. (By nonequivalent metric tensors we mean the following: form the set of all such metric
tensors and factor it by the group of diffeomorphisms of M ; we get a set
that has the cardinality of the continuum. Each element of the quotient set
is a different gravitational field for M .)
Therefore, neither the underlying structure of M as a topological
manifold, nor its differentiable structure determines a particular pseudo
Riemannian metric tensor, that is, a specific gravitational field. From the
strictly geometrical viewpoint, when we choose a particular metric tensor g
of Lorentzian signature, we determine a gdependent reduction of the general linear tensor bundle over M to one of its pseudoorthogonal bundles.
The relation
g 7 gdependent reduction of the linear bundle
to a pseudoorthogonal bundle
is 11.
We now follow our recipe:
We take as basic sets a 4dimensional real differentiable manifold of
class C k , 1 k +, and the Lorentz pseudoorthogonal group
O(3, 1).
We form the principal linear bundle L(M ) over M ; that structure is
solely derived from M , as it arises from the covariance properties of the
tangent bundle over M . From L(M ) we fix a reduction of the bundle
group L(M ) P (M, O(3, 1)), where P (M, O(3, 1)) is the principal
fiber bundle over M with the O(3, 1) group as its fiber.
Those will be our derived sets. We therefore inductively define a
Lorentzian metric tensor g on M , and get the couple hM, gi, which is
book
8:57
JanusFaced Physics
book
45
spacetime.
(Notice that the general relativity spacetime arises quite naturally out
of the interplay between the theorys general covariance aspects,
which appear in L(M ), and as we will see in the next section
its gaugetheoretic features, which are clear in P (M, O(3, 1)).)
Field spaces are:
The first is the set (actually a manifold, with a natural differentiable structure) of all pseudoRiemannian metric tensors,
M C k (2 T (M )), where C k (2 T (M )) is the bundle of all C k
symmetric covariant 2tensors over M .
Also out of M and out of adequate associated bundles we get
A, the bundle of all Christoffel connections over M , and F, the
bundle of all RiemannChristoffel curvature tensors over M .
We need the space of source fields, I, that includes energymomentum
tensors, and arise out of adequate associated tensor bundles over M .
G is the group of C k diffeomorphisms of M .
If K is any of the field spaces above, then K/G is the space of physically
distinct fields.
Finally the dynamics are given by Einsteins equations (there is also a
Diraclike formulation for those, first proposed by R. Penrose in 1960
as a neutrinolike equation; see [24]).
The quotient K/G is the way we distinguish concrete, physically diverse,
fields, as for covariant theories one has that any two fields related by an
element of G are the same field.
Classical gauge fields
The mathematics of classical gauge fields can be found in [5, 65]. We follow
here the preceding examples, and in particular the treatment of general
relativity:
The basic sets are a spacetime hM, gi, and a finite dimensional, semi
simple, compact Lie group G.
The derived set is a fixed principal bundle P (M, G) over M with G
as the fiber.
The group of gauge transformations G is the subgroup of all diffeomorphisms of P (M, G) that reduce to a diffeomorphism on M and to
the group action on the fiber.
If `(G) is the Lie algebra of G, we get:
8:57
46
book
8:57
JanusFaced Physics
book
47
8:57
48
book
8:57
JanusFaced Physics
book
49
8:57
50
Turing machine of G
odel number m enters an infinite loop over input a.
Then we can define the halting function :
(m, a) = 1 if and only if Mm (a) .
(m, a) = 0 if and only if Mm (a) .
(m, a) is the halting function for Mm over input a.
isnt algorithmic, of course [54, 64], that is, there is no Turing machine
that computes it.
Then, if is the sign function, (x) = 1 and (0) = 0:
Expressions for the Halting Function
Prop 3.10 (The Halting Function.). The halting function (n, q) is explicitly given by:
(n, q) = (Gn,q ),
Z
Gn,q =
Km
).
1 + Km
Remark 3.11. There is an expression for the Halting Function even within
a simple extension of PA. Let p(n, x) be a 1parameter universal polynomial; x abbreviates x1 , . . . , xp . Then either p2 (n, x) 1, for all x p , or
there are x in p such that p2 (n, x) = 0 sometimes. As (x) when restricted
to is primitive recursive, we may define a function (n, x) = 1p2 (n, x)
such that:
book
8:57
JanusFaced Physics
book
51
X (n, x)
],
q (x)!
q
(x)
where q (x) denotes the positive integer given out of x by the pairing
function : if q maps qtuples of positive integers onto single positive
integers, q+1 = (x, q (x)).
Undecidability and incompleteness
Our main undecidability (and the related incompleteness) results stem from
the following:
Lemma 3.12. There is a Diophantine set D so that
m D x1 , . . . , xn p(m, x1 , . . . , xn ) = 0,
p a Diophantine polynomial, and D is recursively enumerable but not recursive.
Corollary 3.13. For an arbitrary m there is no general decision procedure to check whether p(m, x1 , . . .) = 0 has a solution in the positive
integers.
Main undecidability and incompleteness result
Therefore, given such a p, and F = P (p), where P is an adequate Richardson transform:
Corollary 3.14. For an arbitrary m there is no general decision procedure to check whether, for F and G adequate realdefined and realvalued
functions:
(1) There are real numbers x1 , . . . , xn such that F (m, x1 , . . . , xn ) = 0;
(2) There is a real number x so that G(m, x) < 1;
(3) Whether we have x R (m, x) = 0 or x R (m, x) = 1 over the
reals.
(4) Whether for an arbitrary f (m, x) we have f (m, x) (m, x).
8:57
52
book
8:57
JanusFaced Physics
book
53
8:57
54
(2)
(3)
(4)
(5)
an actual converging computation with input y; if k 0 is the highest integer queried during one such computation, and if dA cA is an initial
segment of the characteristic function cA , we take as a standby for D
and D the initial segment dA where the length l(dA ) = k 0 + 1.
We can effectively list all oracle machines with respect to a fixed A, so
that, given a particular machine we can compute its index (or Godel
number) x, and given x we can recover the corresponding machine.
Given an Apartial recursive function A
x , we form the oracle Turing
machine that computes it. We then do the computation A
x (y) = z that
outputs z. The initial segment dy,A is obtained during the computation.
The oracle machine is equivalent to an ordinary twotape Turing machine that takes as input hy, dy,A i; y is written on tape 1 while dy,A is
written on tape 2. When this new machine enters state s0 it proceeds
as the oracle machine. (For an ordinary computation, no converging
computation enters s0 , and dy,A is empty.)
The twotape Turing machine can be made equivalent to a onetape
machine, where some adequate coding places on the single tape all the
information about hy, dy,A i. When this third machine enters s0 it scans
dy,A .
We can finally use the standard map that codes nples 11 onto and
add to the preceding machine a Turing machine that decodes the single
natural number (hy, dy,A i) into its components before proceeding to
the computation.
Let w be the index for that last machine; we note the machine w .
If x is the index for A
x , we write w = (x), where is the effective
11 procedure above described that maps indices for oracle machines into
indices for Turing machines. Therefore,
A
x (y) = (x) (hy, dy,A i).
Now let us note the universal polynomial p(n, q, x1 , . . . , xn ). We can
define the jump of A as follows:
A0 = {(z) : x1 , . . . , xn p((z), hz, dz,A i, x1 , . . . , xn ) = 0}.
With the help of the Richardson map described above, we can now form
a function modelled after the function that settles the Halting Problem;
it is the desired characteristic function:
c0 (x) = ((x), hx, dx,0 i).
book
8:57
JanusFaced Physics
book
55
8:57
56
For a > b, a b = a b.
For a < b, a b = 0.
In the next result, Z is the set of integers. The starting point is the
following consequence of a wellknown result which we now quote: let N
be a model, N |= T , and N makes T arithmetically sound. Then:
Prop 3.22. If T is arithmetically sound, then we can algorithmically
construct a polynomial expression q(x1 , . . . , xn ) over Z such that M |=
x1 , . . . , xn q(x1 , . . . , xn ) > 0}, but
T 6` x1 , . . . , xn q(x1 , . . . , xn ) > 0
and
T 6` x1 , . . . , xn q(x1 , . . . , xn ) = 0.
Proof: Let LT be an undecidable sentence obtained for T with
the help of G
odels diagonalization; let n be its Godel number and let
mT be the G
odel coding of proof techniques in T (of the Turing machine that enumerates all the theorems of T ). For an universal polynomial
p(m, q, x1 , . . . , xn ) we have:
q(x1 , . . . , xn ) = (p(mT , n , x1 , . . . , xn ))2 .
Corollary 3.23. If PA is consistent then we can find within it a polynomial p as in Proposition 3.22.
We can also state and prove a weaker version of Proposition 3.22:
Prop 3.24. If T is arithmetically sound, there is a polynomial expression
over Z p(x1 , . . . , xn ) such that N |= x1 , . . . , xn p(x1 , . . . , xn ) > 0,
while
T 6` x1 , . . . , xn p(x1 , . . . , xn ) > 0
and
T 6` x1 , . . . , xn p(x1 , . . . , xn ) = 0.
Proof: See [21]. If p(m, x1 , . . . , xn ), m = hq, ri, is an universal polynomial with being Cantors pairing function [54], then {m : x1 . . .
p(m, x1 , . . .) = 0} is recursively enumerable but not recursive. Therefore
there must be an m0 such that x1 . . . (p(m0 , x1 , . . .))2 > 0. (This
book
8:57
JanusFaced Physics
book
57
(m+1) = {x : x
(x)}
8:57
58
book
(x) if and
(m+1)
(n)
)) =
8:57
JanusFaced Physics
book
59
= + [1 (p(m0 (m ), x1 , . . . , xn ) + 1)],
where p(. . .) is as in Proposition 3.29.
Remark 3.33. Rogers discusses the rank within the arithmetical hierarchy
of wellknown open mathematical problems ( [54], p. 322), such as Fermats
Conjecturewhich in its usual formulation is demonstrably equivalent to a
01 problem,6 or unsettled questions such as Riemanns Hypothesis, which
is also stated as a 01 problem. On the other hand, the P < N P hypothesis in computer science is formulated as a 02 sentence that can be made
equivalent to an intuitive 01 sentence, while its negation, the P = N P conjecture, can be formalized as a 01 sentence within Peano Arithmetic [19].
Rogers conjectures that our mathematical imagination cannot handle
more that four or five alternations of quantifiers. However the preceding
result shows that any arithmetical nontrivial property within T can give
rise to intractable problems of arbitrarily high rank.
We stress the need for the extension T PA, since otherwise we
wouldnt be able to find an expression for the characteristic function of
a set with a high rank in the arithmetical hierarchy within our formal language.
An extension of the preceding result is:
Corollary 3.34. If T is arithmetically sound then, for any nontrivial P
there is a LT such that P () is arithmetically expressible, N |= P ()
but only demonstrably equivalent to a 0n+1 assertion and not to a lower
one in the hierarchy.
Proof: Put
= + (m+1) ,
where one uses Corollary 3.31.
6 The
question of whether Wiles proof can be fully formalized within ZFC is still open,
and so, while we know that Fermats Theorem is true of the standard integers, we dont
know which minimum axiomatic resources are required for its proof.
8:57
60
Beyond arithmetic
We recall:
Definition 3.35.
() = {hx, yi : x (y) },
for x, y .
Then:
Definition 3.36.
() (m) = c() (m),
where c() (m) is obtained as in Proposition 3.17.
Still,
Definition 3.37.
(+1) = (() )0 .
book
8:57
JanusFaced Physics
book
61
8:57
62
have within formal systems to handle expressions for the objects in those
systems.
(When we say the undecidability and incompleteness of dynamical systems, we are making an abuse of language: more precisely, we mean the
undecidability or incompleteness properties of the formal theory of those
systems developed with the help of the language of classical analysis.)
Undecidability in classical mechanics
Lets go straight to the point and ask three questions in order to give a
good example:
(1) Given a Hamiltonian h, do we have an algorithm that tells us whether
the associated Hamiltonian dynamical system Xh can be integrated by
quadratures?
(2) Given a Hamiltonian h such that Xh can be integrated by quadratures,
can we algorithmically find a canonical transformation that will do the
trick?
(3) Can we algorithmically check whether an arbitrary set of functions is
a set of first integrals for a Hamiltonian system?
The answer to those questions is, no. Proof follows from the techniques
developed in the previous sections [16].
Chaos theory is undecidable and incomplete
We finally reach the question that originally motivated our quest. Let X
be a smooth vectorfield on a differentiable manifold M . Can we algorithmically check whether X has some kind of chaotic behavior in any of
the usual meanings for that word, that is, given an arbitrary vectorfield X,
can we algorithmically decide whether X is chaotic?
This problem was explicitly discussed by M. Hirsch [35] when he makes
some remarks about the Lorenz system of equations [43]:
(. . . ) By computer simulation Lorenz found that trajectories seem to
wander back and forth between two particular stationary states, in a random, unpredictable way. Trajectories which start out very close together
eventually diverge, with no relationship between long run behaviors.
But this type of chaotic behavior has not been proved. As far as I am
aware, practically nothing has been proved about this particular system
(. . . )
book
8:57
JanusFaced Physics
book
63
8:57
64
For the third assertion, we know that geodesic flows on such an M are
Bernouillian. Then, if X is one such flow, we write
Zm = (m)X.
Again we cannot in general decide whether we have a trivial zero field or a
Bernouillian flow.
We conclude with an incompleteness theorem:
Prop 3.45. If T contains an axiomatization of dynamical system theory
then there is an expression X in the language of T so that
T ` T is a dynamical system
while
T 6` X is chaotic
and
T 6` (X is chaotic),
for any nontrivial predicate in the language of T that formally characterizes
chaotic dynamical systems.
That is to say, chaos theory is undecidable and incomplete (in its axiomatic version), no matter what nontrivial definition that we get for chaos.
3.11. Janusfaced physics
Theoretical physics has two faces. On one side, it allows us to do computations, to quantitatively obtain data that describe and predict the behavior
of realworld systems. On the other side it allows us to imagine the inner
workings of the phenomena out of the mathematical tools used in their
description. This is the conceptual game we mentioned before; we believe
that it clarifies and complements Chaitins vision of physical theories as
algorithmic devices. The plethora of incompleteness results weve offered
is of course a parallel to his vision of randomness as deeply ingrained into
mathematics.
Lets consider one aspect of our work as an example. We have axiomatized physics with axiom systems where the dynamical rule is given by
Diraclike equations, instead of the more commonplace variational principles. Diraclike equations are today an important tool in differential
book
8:57
JanusFaced Physics
book
65
geometry [57], where they are used in the classification of bundles over
4dimensional manifolds (and 4dimensional differential manifolds are our
current formal depiction of spacetimes). They appear in Ktheory and in
modern theories of cohomology. When one uses Diraclike equations in the
axiomatization of physical theories one wishes to stress the wideranging
relations that such a mathematical object has within todays geometry.
Diraclike equations are a kind of crossroadconcept in todays mathematics. They lead to manifold routes, many of them still unexplored.
Weve briefly mentioned 4dimensional differential manifolds. The problem of the physical meaning of exotic, fake spacetimes, if any, is still wide
open. Weve, again briefly, mentioned it before [15], when we pointed out
that there is an extra difficulty in that question: once we have uncountable
many exotic differentiable structures for some fixed adequate topological
4manifold, we will have uncountable many set theoretically generic differentiable structures for that manifold in adequate models for our theory.
What is their meaning?
We dont know. Our axiomatization for classical and firstquantized
physics opens up large vistas towards unknown, totally new, landscapes.
Such is its raison detre.
Acknowledgments
The authors wish to thank their institutions for support. N. C. A. da Costa
acknowledges support from CNPq, Philosophy Section; F. A. Doria thanks
C. A. Cosenza and S. Fuchs for their invitation to join respectively the
Fuzzy Sets Research Group and the Philosophy of Science Group at the
Production Engineering Program, COPPEUFRJ.
References
[1] R. Abraham, J. Marsden, Foundations of Mechanics, 2nd. ed., Addison
Wesley (1978).
[2] S. Albeverio, J. E. Fenstad, R. H
oghKrohn, T. Lindstr
om, Nonstandard
Methods in Stochastic Analysis and Mathematical Physics, Academic (1986).
[3] V. I. Arnold, Les Methodes Mathematiques de la Mecanique Classique, Mir,
Moscow (1976).
[4] C. J. Ash and J. Knight, Computable Structures and the Hyperarithmetical
Hierarchy, Elsevier (2000).
[5] M. F. Atiyah, Geometry of YangMills Fields, Lezioni Fermiane, Pisa (1979).
8:57
66
[6] P. A. Benioff, Models of Zermelo Frankel set theory as carriers for the
mathematics of physics. I, J. Math. Phys. 32, 618 (1976).
[7] P. A. Benioff, Models of Zermelo Frankel set theory as carriers for the
mathematics of physics. II, J. Math. Phys. 32, 629 (1976).
[8] N. Bourbaki, Set Theory, Hermann and AddisonWesley (1968).
[9] R. Carnap, The Logical Syntax of Language, Routledge and Kegan Paul
(1949).
[10] R. Carnap, Introduction to Symbolic Logic and its Applications, Dover
(1958).
[11] Y. M. Cho, Higherdimensional unifications of gravitation and gauge theories, J. Math. Physics 16, 2029 (1975).
[12] L. Corry, David Hilbert and the axiomatization of physics (18941905),
Arch. Hist. Exact Sciences 51, 83 (1997).
[13] E. M. Corson, Introduction to Tensors, Spinors and Relativistic Wave
Equations, Blackie & Sons. (1953).
[14] N. C. A. da Costa and R. Chuaqui, On Suppes settheoretical predicates,
Erkenntnis 29 95 (1988).
[15] N. C. A. da Costa and F. A. Doria, A Suppes predicate for general relativity
and settheoretically generic spacetimes, Int. J. Theoretical Phys. 29, 935
(1990).
[16] N. C. A. da Costa and F. A. Doria, Undecidability and incompleteness in
classical mechanics, Int. J. Theoretical Physics 30, 1041 (1991).
[17] N. C. A. da Costa and F. A. Doria, Suppes predicates and the construction
of unsolvable Problems in the axiomatized sciences, P. Humphreys, ed.,
Patrick Suppes, Scientific Philosopher, II, 151191 Kluwer (1994).
[18] N. C. A. da Costa and F. A. Doria, Computing the Future, in K. V.
Velupillai, ed., Computability, Complexity and Constructivity in Economic
Analysis, Blackwell (2005).
[19] N. C. A. da Costa, F. A. Doria and E. Bir, On the metamathematics of
P vs. N P , to appear in Appl. Math. Computation (2007).
[20] N. C. A. da Costa and S. French, Science and Partial Truth, Oxford (2003).
[21] M. Davis, Hilberts Tenth Problem is unsolvable, Amer. Math. Monthly
80, 233 (1973).
[22] M. Davis, Computability and Unsolvability, Dover (1982).
[23] P. A. M. Dirac, The Principles of Quantum Mechanics, Oxford U. P. (1967).
[24] F. A. Doria, A Diraclike equation for the gravitational field, Lett Nuovo
Cimento 14, 480 (1975).
[25] F. A. Doria, A Lagrangian formulation for noninteracting highspin fields,
J. Math. Phys. 18, 564 (1977).
[26] F. A. Doria, Informal and formal mathematics, to appear in Synth`ese
(2007).
[27] F. A. Doria, A. F. Furtado do Amaral, S. M. Abrah
ao, A Diraclike equation for gauge fields, Progr. theor. Phys 75, 1440 (1986).
[28] F. A. Doria and J. F. Costa, eds., Special issue on hypercomputation, Applied
Math. Computation 178 (2006).
[29] A. Einstein, The Meaning of Relativity, Methuen (1967).
book
8:57
JanusFaced Physics
book
67
8:57
68
book
8:57
book
Chapter 4
The Implications of a Cosmological Information Bound
for Complexity, Quantum Information and the Nature of
Physical Law
P. C. W. Davies
BEYOND: Center for Fundamental Concepts in Science
Arizona State University, USA; deepthought@ asu. edu
Whereof one cannot speak, thereof one must remain silent.
Ludwig Wittgenstein1
Is our universe a polynomial or an exponential place?
Scott Aaronson 2
8:57
70
P. C. W. Davies
question: What are the laws of physics and where do they come from?
The subsidiary question, Why do they have the form that they do? I have
discussed in detail elsewhere.4
First let me articulate the orthodox position, adopted by most theoretical physicists, which is that the laws of physics are immutable, absolute,
eternal, perfect mathematical relationships, infinitely precise in form. The
laws were imprinted on the universe at the moment of creation, i.e. at
the big bang, and have since remained fixed in both space and time. The
properties of the physical universe depend in an obvious way on the laws
of physics, but the basic laws themselves depend not one iota on what happens in the physical universe. There is thus a fundamental asymmetry: the
states of the world are affected by the laws, but the laws are completely
unaffected by the states a dualism that goes back to the foundation of
physics with Galileo and Newton. The ultimate source of the laws is left
vague, but it is tacitly assumed to transcend the universe itself, i.e. to
lie beyond the physical world, and therefore beyond the scope of scientific
inquiry. The proper task of the physicist, it is often said, is to discover the
forms of the laws using reason and experiment, adopt them pragmatically,
and get on with the job of determining their consequences. Inquiry into
their origin is discouraged as a quasi-religious quest.
The orthodox view of the nature of physical laws conforms well to the
mathematical doctrine of Platonism. Plato regarded mathematical forms
and relationships as enjoying a real existence in an otherworldly realm,
where mathematicians come upon them in a voyage of intellectual discovery.
A Platonist regards mathematics as possessing an existence independent of
the physical universe, rather than being a product of the human brain. An
essential quality of the Platonic heaven is that the mathematical forms it
contains are perfect. For example, circles are exactly round, in contrast to
circles in the physical universe, which are always flawed approximations to
the idealized Platonic forms.
Most theoretical physicists are by temperament Platonists. They envisage the laws of physics too as perfect idealized mathematical relationships
and operations that really exist, located in an abstract realm transcending
the physical universe. I shall call this viewpoint physical Platonism to distinguish it from mathematical Platonism. Newton was a physical Platonist,
and cast his laws of mechanics and gravitation in terms of what we would
now call real numbers and differentiable functions. Taking Newtons laws
seriously implies accepting infinite and infinitesimal quantities, and arbi-
book
8:57
book
71
trary precision. The idealized, Platonic notion of the laws of physics reached
its zenith with the famous claim of Laplace, concerning an omniscient demon. Laplace pointed out that the states of a closed deterministic system,
such as a finite collection of particles subject to the laws of Newtonian mechanics, are completely fixed once the initial conditions are specified5 .
We may regard the present state of the universe as the effect of its
past and the cause of its future. An intellect which at any given moment
knew all of the forces that animate nature and the mutual positions of the
beings that compose it, if this intellect were vast enough to submit the
data to analysis, could condense into a single formula the movement of the
greatest bodies of the universe and that of the lightest atom; for such an
intellect nothing could be uncertain and the future just like the past would
be present before its eyes.
If Laplaces argument is taken seriously, on the assumptions adopted,
then everything that happens in the universe, including Laplaces decision
to write the above words, my decision to write this article, Chaitins beautiful work on Omega, etc. are all preordained. The information about these
events is already contained in the state of the universe at any previous
time. To get some idea of the demons gargantuan task, note the following. If the demon overlooked the gravitational force of a single electron
located at the edge of the observable universe, then his prediction for the
motion of a given molecule of air in your living room would be rendered
completely uncertain after only 12 intermolecular collisions.6 This arresting example reveals how exquisitely sensitive to error predicting the future
can be. Laplaces vignette is based on classical mechanics, and is usually
dismissed by invoking quantum mechanics, or arguing that the universe is
an open system, but this misses the point. The real absurdity in Laplaces
statement is its implicit reliance on physical Platonism extrapolated to a
staggering degree, made without any experimental foundation whatever.In
spite of the fact that we now know Newtonian mechanics is only an approximation, physical Platonism remains the dominant philosophy among
theoretical physicists. The project of quantum cosmology, for example,
is predicated on the assumption that the laws of quantum mechanics and
general relativity exist independently of the universe, and may therefore
be invoked to explain how the universe came to exist from nothing. In the
fashionable subject of string/M theory, the string Lagrangian, or whatever
else serves to determine the unified dynamics, is assumed to somehow already exist, so that from it may (one day) flow an explanation for space,
8:57
72
P. C. W. Davies
book
8:57
book
73
(1)
The bound (1) is not fixed, but grows with time as the horizon expands
and encompasses more particles:
Iuniverse t2 .
(2)
(3)
where M and A are the mass and area of the black hole respectively, and the
other symbols have their usual meanings as various fundamental constants
of nature.
The fact that the entropy is a function of black hole area, as opposed to
volume, is deeply significant. In the case of a laboratory gas, for example,
entropy is additive: twice the volume of a (homogeneous) gas will have
twice the entropy. Evidently, when gravitation enters the picture, the rules
of the game change fundamentally. Entropy can been regarded as a measure
of information I (or information loss), through the relationship
S = k log2 I
(4)
8:57
74
P. C. W. Davies
length LP (G/~c3 )1/2 as a fundamental unit, and note that, using Eq.
(4), the information of the black hole is simply one quarter of the horizon
area in Planck units.
Early on, Bekenstein sought to generalize his result by postulating that
Eq. (1) serves as a universal bound on entropy (or information content)
applicable to any physical system.11 That is, the information content of a
physical system can never, he claims, exceed one quarter of the area of its
encompassing surface. The black hole saturates the Bekenstein bound, and
represents the maximum amount of information that can be packed into the
volume occupied by the hole, as befits the equilibrium end state of a gravitating system. A simple argument in support of the universal Bekenstein
bound is that if a system confined to a certain region of space possessed
an information content in excess of the bound, one could then add some
matter and induce this system to undergo gravitational collapse to a black
hole, thereby reducing its entropy and violating the second law of thermodynamics (suitably generalized to include event horizon area). However,
the Bekenstein bound remains a conjecture: a general proof is lacking.The
idea of associating entropy and information with horizon area was soon
extended to include all event horizons, not just those surrounding black
holes. For example, if the universe becomes dominated by dark energy,
which is what current astronomical observations suggest, it will continue
to expand at an accelerating rate (dark energy acts as a sort of antigravity
force). This creates a cosmological event horizon, which may be envisaged
as a roughly spherical surface that bounds the region of the universe to
which we can ever have causal and informational access. A similar horizon
characterizes the period of inflation, widely believed to have occurred at
about 1034 s after the big bang. Generalizations of horizon entropy have
been proposed for cosmological horizon area too, with de Sitter space (a
universe subject to dark energy alone) saturating the Bekenstein bound, by
Gibbons and Hawking12 , Bousso13 , and Davis and Davies14 . A number of
calculations support the proposal. Based on the foregoing ideas, t Hooft15
and Susskind16 have proposed the so-called holographic principle, according
to which the information content of the entire universe is captured by an
enveloping surface that surrounds it. The principle states that the total
information content of a region of space cannot exceed one quarter of the
surface area that confines it (other variants of the holographic principle
have been proposed, with different definitions of the enveloping area), and
that this limit is attained in the case of the cosmological event horizon. A
book
8:57
book
75
simple calculation of the size of our universes event horizon today based on
the size of the event horizon created by the measured value of dark energy
gives an information bound of 10122 bits, the same as found by Lloyd using
the particle horizon. The event horizon also expands with time, and at this
epoch is roughly the same radius as the particle horizon, but unlike the
latter, it asymptotes to a constant value not a lot greater than its present
value (assuming that the density of dark energy is constant). So whether
we take the particle horizon or the event horizon, or a more generalized
holographic principle, as the basis for the calculation, we discover an upper
bound like (1) on the information content of a causal region of the universe.
How might the bound affect physics and cosmology? The answer to
this question depends critically on ones assumptions about the nature of
information. The traditional logical dependence of laws, states of matter
and information is
A. laws of physics matter information.
Thus, conventionally, the laws of physics form the absolute and eternal
bedrock of physical reality and, as mentioned, cannot be changed by anything that happens in the universe. Matter conforms to the given laws,
while information is a derived, or secondary property having to do with
certain special states of matter. But several physicists have suggested that
the logical dependence should really be as follows:
B. laws of physics information matter.
In this scheme, often described informally by the dictum the universe
is a computer, information is placed at a more fundamental level than
matter. Nature is regarded as a vast information-processing system, and
particles of matter are treated as special states which, when interrogated
by, say, a particle detector, extract or process the underlying quantum
state information so as to yield particle-like results. It is an inversion famously encapsulated by Wheelers pithy phrase It from bit.17 Treating
the universe as a computer has been advocated by Fredkin18 , Lloyd8 and
Wolfram19 among others. An even more radical transformation is to place
information at the base of the logical sequence, thus
C. information laws of physics matter.
The attraction of scheme C is that, after all, the laws of physics are
informational statements.
8:57
76
P. C. W. Davies
For most purposes the order of logical dependence does not matter
much, but when it comes to the information bound on the universe, one is
forced to confront the status of information: is it ontological or epistemological? If information is simply a description of what we know about the
physical world, as is implied by Scheme A, there is no reason why Mother
Nature should care about the limit (1). Or, to switch metaphors, the
bedrock of physical reality according to Scheme A is sought in the perfect
laws of physics, which live elsewhere, in the realm of the gods the Platonic
domain they are held by tradition to inhabit where Mother Nature can
compute to arbitrary precision with the unlimited quantity of information
at her disposal. According to orthodoxy, the Platonic realm is the real
reality, while the world of information is but the shadow on Platos cave.
But if information underpins physical reality if, so to speak, it occupies
the ontological basement (as is implied in Scheme C and perhaps B) then
the bound on I universe represents a fundamental limitation on all reality,
not merely on states of the world that humans perceive.
Someone who advocated precisely this latter position was Rolf Landauer, a former colleague of Chaitins at IBM. He explicitly took the view
that the universe computes in the universe, because he believed, as he
was fond of declaring, that information is physical. And Landauer was
quick to spot the momentous consequences of this shift in perspective:
The calculative process, just like the measurement process, is subject
to some limitations. A sensible theory of physics must respect these limitations, and should not invoke calculative routines that in fact cannot be
carried out.20
In other words, in a universe limited in resources and time a universe
subject to the information bound (1) in fact concepts like real numbers,
infinitely precise parameter values, differentiable functions, the unitary evolution of a wave function are a fiction: a useful fiction to be sure, but
a fiction nevertheless, and with the potential to mislead. It then follows
that the laws of physics, cast as idealized infinitely precise mathematical
relationships inhabiting a Platonic heaven, are also a fiction when it comes
to applications to the real universe. Landauers proposal that our theories
should be constrained by the possibly finite resources of the universe
has been independently developed in recent years by Benioff.21
If one adopts Landauers philosophy, then some serious consequences
follow. In effect, one cannot justify the application of the laws of physics in
book
8:57
book
77
8:57
78
book
P. C. W. Davies
significant, and may have left a trace on the structure of the universe that
could be used to test the existence of the bound. Inflation is a brief episode
of exponential expansion thought to have occurred at about 1034 s after
the big bang. At that time, the horizon size was about 3 1024 cm,
yielding a surface of about 1019 Planck areas. The information bound
then implies for the cosmological scale factor change
a(taf ter )/a(tbef ore ) < 1019 .
(5)
8:57
book
79
(7)
The same result may be derived in a completely different way, by imposing the condition on the vacuum energy that at every scale of size L,
the energy density must not exceed the level at which the total mass within
a volume L3 is greater than the mass of a black hole of size L, otherwise
the vacuum energy would presumably undergo gravitational collapse. This
requirement may be expressed as follows:
c2 L3 < Mbh (L).
(8)
Substituting the right hand side of Eq. (5) for we obtain, to an order
or magnitude,
G~ 4 L3 /c7 < L
(9)
or
< c4 /GL2
(10)
(11)
where P is the Planck energy density and H is the Hubble energy density,
defined to be the energy density of a single quantum in a Hubble volume
with a wavelength equal to the Hubble radius.
This remarkable result that the cosmological information bound explains the magnitude of the dark energy comes at a price, however. The
same reasoning may be applied to the pressure of the vacuum, p, which for
a massless scalar field is
p = 21 ~cL1
(12)
8:57
80
book
P. C. W. Davies
i.e. p = , which is the necessary equation of state for the vacuum energy
to play the role of dark energy. Now recall that the information bound
varies with time in the manner indicated by Eq. (2). Hence the cut-off
in the summation in both Eqs. (5) and (12) will be time-dependent, so
the dark energy is also predicted to be time-dependent. This raises an
immediate difficulty with the law of energy conservation:
pda3 + d(a3 ) = 0
(13)
8:57
book
81
of the wave function, that describe the system. If it were possible to control
all the components, or branches, of the wave function simultaneously, then
the quantum system would be able to process information exponentially
more powerfully than a classical computer. This is the aspiration of the
quantum computation project.
Because the complexity of an entangled state rises exponentially with
the number of qubits (which is its virtue), large-scale quantum information
processing comes into conflict with the information bound. Specifically, a
quantum state with more components than about n = log2 Iuniverse will
require more bits of information to specify it than can be accommodated
in the entire observable universe! Using the bound given by inequality (1),
this yields a limit of approximately n = 400. In other words, a generic
entangled state of more than about 400 particles will have a quantum state
with more components than Iuniverse , evolving in a Hilbert space with more
dimensions than Iuniverse . The question therefore arises of whether this violation of the information bound (1) signals a fundamental physical limit.
It seems to me that it must.
On the face of it, the limit of 400 particles is stringent enough to challenge the quantum computation industry, in which a long-term objective
is to entangle many thousands or even millions of particles and control
the evolution of the quantum state to high precision. The foregoing analysis, however, is overly simplistic. First, note that the dimensionality of
the (non-redundant part of the) Hilbert space is not an invariant number:
by changing the basis, the number might be reduced. So specifying the
complexity of a quantum state simply by using the dimensionality of the
Hilbert space can be misleading. A more relevant criterion is the number of
independent parameters needed to specify inequivalent n-component quantum systems. This problem has been addressed, but it is a difficult one on
which only limited progress has so far been made.27 Second, the dimensionality of the Hilbert space serves to define the number of amplitudes needed
to specify a generic superposition. But the amplitudes themselves require
additional information to specify them; indeed, a single complex number
coefficient i will mostly contain an infinite number of bits of information.
If we are to take the bound (1) seriously, then it must be applied to the
total algorithmic information content of the amplitude set over the entire
Hilbert space. Following Chaitin, the algorithmic information measure of
a binary string X is defined as
H(X) = lnP (X)+O(1)
(14)
8:57
82
book
P. C. W. Davies
where P (X) is the probability that the proverbial monkey typing randomly
on a typewriter will generate a program which, when run on a universal
Turing machine, will output X. Applied to the amplitude set {i } of a
generic quantum state (plus any ancillary information needed to specify the
state, such as constraints), the cosmological information bound (1) may be
expressed as follows:
H({i }) < Aholo /L2P
(15)
where Aholo is the area of the appropriate holographic surface (e.g. a cosmological event horizon). Inequality (17) is a stronger constraint than (1),
appropriate to the interpretation of information as ontological and fundamental, and therefore including not merely a head-count of the degrees of
freedom, but the algorithmic information content of all the specifying parameters of the state too. This extra informational burden on the bound
will reduce somewhat the dimensionality of the Hilbert space at which unitary evolution is expected to break down.
A more subtle issue concerns the specific objectives of quantum computation, which is not to control the dynamical evolution of arbitrary entangled quantum states, but an infinitesimal subset associated with certain
mathematical problems of interest, such as factoring. It is trivially true that
it is impossible to prepare, even approximately, a state containing more
than 10122 truly independent parameters because it is impossible to even
specify such a state: there are not enough bits in the universe to contain
the specification. Almost all states fall into this category of being impossible to specify, prepare and control. So in this elementary sense, generic
quantum computation is obviously impossible. Less obvious, however, is
whether the subset of states (of measure zero) of interest to the computing
industry is affected by the cosmological information bound, for even if it is
the case that the number of independent amplitudes exceeds 10122 , there
may exist a compact mathematical algorithm to generate those amplitudes.
(The algorithm for generating the amplitudes that specify the initial state
should not be confused with the algorithm to be executed by the quantum computer dynamics.) For example, the amplitudes of the quantum
computers initial state could be the (unending) digits of , which can be
generated by a short algorithm. That is, the set of amplitudes may contain
an unbounded number of bits of information, but a finite (and even small)
number of bits might be sufficient to define the generating algorithm of the
amplitude set. So if the information bound on the universe is interpreted as
an upper limit on the algorithmic information (as opposed to the Shannon
8:57
book
83
8:57
84
P. C. W. Davies
book
8:57
book
85
the product of real computational processes (rather than existing independently in a Platonic realm) then there is a self-consistent loop: the laws
of physics determine what can be computed, which in turn determines the
informational basis of those same laws of physics. Benioff has considered
a scheme in which mathematics and the laws of physics co-emerge from
a deeper principle of mutual self-consistency,32 thus addressing Wigners
question of why mathematics is so unreasonably effective in describing
the physical world.33 I have discussed these deeper matters elsewhere.34
Acknowledgments
I should like to thank Scott Aaronson, Ted Jacobson, Gerard Milburn,
William Phillips, Sandu Popescu and Leonard Susskind for helpful comments, conversations and guidance.
Footnotes
(1) Wittgenstein, L. (1921) Tractatus Logico-Philosophicus, English translation: David Pears and Brian McGuinness (Routledge, London 1961).
(2) Aaronson, S. (2005) Are quantum states exponentially long vectors?
Proceedings of the Oberwolfach Meeting on Complexity Theory (to be
published).
(3) Chaitin, G. (2005) Meta Math! The Quest for Omega (Pantheon Books,
New York), 115.
(4) Cosmic Jackpot by Paul Davies (Houghton Mifflin, New York 2007).
(5) Laplace, P. (1825) Philosophical Essays on Probabilities (trans. F.L.
Emory and F.W. Truscott, Dover, New York 1985).
(6) I am grateful to Michael Berry for drawing my attention to this example.
(7) The limits of reason, by Gregory Chaitin, Scientific American March
2006, p. 74.
(8) Lloyd, S. (2002) Computational capacity of the universe, Phys. Rev.
Lett. 88, 237901;
Lloyd, S. (2006) The Computational Universe (Random House, New
York).
(9) Bekenstein, J. (1973) Phys. Rev. D 8, 2333.
(10) Hawking, S.W. (1975) Comm. Math. Phys. 43, 199.
(11) Bekenstein, J (1981) Phys. Rev. D 23, 287.
(12) Gibbons, G.W, and Hawking, S.W. (1977) Phys. Rev. D 15, 2738.,
Bousso, R. (1999) J. High Energy Phys. 7, 4., Davies, P.C.W. and
8:57
86
(13)
(14)
(15)
(16)
(17)
(18)
(19)
(20)
(21)
(22)
(23)
(24)
(25)
(26)
(27)
(28)
(29)
(30)
(31)
(32)
(33)
P. C. W. Davies
book
8:57
book
87
8:57
88
P. C. W. Davies
book
8:57
Chapter 5
What is a Computation?
Martin Davis1
Department of Computer Science, Courant Institute of Mathematical
Sciences, New York University, USA; martin@ eipye. com
On numerous occasions during the Second World War, members of the German high command had reason to believe that the allies knew the contents
of some of their most secret communications. Naturally, the Nazi leadership was most eager to locate and eliminate this dangerous leak. They were
convinced that the problem was one of treachery. The one thing they did
not suspect was the simple truth: the British were able to systematically
decipher their secret codes. These codes were based on a special machine,
the Enigma, which the German experts were convinced produced coded
messages that were entirely secure. In fact, a young English mathematician,
Alan Turing, had designed a special machine for the purpose of decoding
messages enciphered using the Enigma. This is not the appropriate place
to speculate on the extent to which the course of history might have been
different without Turings ingenious device, but it can hardly be doubted
that it played an extremely important role.
In this essay we will discuss some work which Alan Turing did a few years
before the Second World War whose consequences are still being developed.
What Turing did around 1936 was to give a cogent and complete logical
analysis of the notion of computation. Thus it was that although people
have been computing for centuries, it has only been since 1936 that we have
possessed a satisfactory answer to the question: What is a computation?
1 Originally published in L. A. Steen, Mathematics Today: Twelve Informal Essays,
Springer-Verlag, New York, 1978, pp. 241267. Re-published with the kind permission
of Springer.
89
book
8:57
90
Martin Davis
Alan M. Turing
Alan M. Turing was born in 1912, the second son in an upper class
English family. After a precocious childhood, he had a distinguished career as a student at Cambridge University. It was shortly after graduation
that Turing published his revolutionary work on computability. Turings
involvement in the deciphering of German secret codes during the Second
World War has only recently become public knowledge. His work has included important contributions to mathematical logic and other branches
of mathematics. He was one of the first to write about the possibility of
computer intelligence and his writings on the subject are still regarded as
fundamental. His death of cyanide poisoning in June 1954 was officially
adjudged suicide.
book
8:57
What is a Computation?
book
91
Turings analysis provided the framework for important mathematical investigations in a number of directions, and we shall survey a few of them.
Turings analysis of the computation process led to the conclusion that
it should be possible to construct universal computers which could be
programmed to carry out any possible computation. The existence of a
logical analysis of the computation process also made it possible to show
that certain mathematical problems are incapable of computational solution, that they are, as one says, unsolvable. Turing himself gave some
simple examples of unsolvable problems. Later investigators found that
many mathematical problems for which computational solutions had been
sought unsuccessfully for many years were, in fact, unsolvable. Turings
logical proof of the existence of universal computers was prophetic of the
modern all-purpose digital computer and played a key role in the thinking of such pioneers in the development of modern computers as John von
Neumann. (Likely these ideas also played a role in Turings seeing how
to translate his cryptographic work on the German codes into a working
machine.) Along with the development of modern computers has come a
new branch of applied mathematics: theory of computation, the application
of mathematics to the theoretical understanding of computation. Not surprisingly, Turings analysis of computation has played a pivotal role in this
development.
Although Turings work on giving a precise explication of the notion of
computation was fundamental because of the cogency and completeness of
his analysis, it should be stated that various other mathematicians were
independently working on this problem at about the same time, and that
a number of their formulations have turned out to be logically equivalent
to that of Turing. In fact the specific formulation we will use is closest to
one originally due to the American mathematician Emil Post.
The Turing Post Language
Turing based his precise definition of computation on an analysis of what a
human being actually does when he computes. Such a person is following a
set of rules which must be carried out in a completely mechanical manner.
Ingenuity may well be involved in setting up these rules so that a computation may be carried out efficiently, but once the rules are laid down, they
must be carried out in a mercilessly exact way. If we watch a human being
calculating something (whether he is carrying out a long division, perform-
8:57
92
Martin Davis
Emil L. Post
Emil L. Post was born in Poland in 1897, but arrived in New York City
at the age of seven, and lived there for the remainder of his life. His life
was plagued by tragic problems: he lost his left arm while still a child and
was troubled as an adult by recurring episodes of a disabling mental illness.
While still an undergraduate at City College he worked out a generalization of the differential calculus which later turned out to be of practical
importance. His doctoral dissertation at Columbia University initiated the
modern metamathematical method in logic. His researches while a postdoctoral fellow at Princeton in the early 1920s anticipated later work of Godel
and Turing, but remained unpublished until much later, partly because of
the lack of a receptive atmosphere for such work at the time, and partly because Post never completed the definitive development he was seeking. His
work on computability theory included the independent discovery of Turings analysis of the computational process, various important unsolvability
results, and the first investigations into degrees of unsolvability (which provide a classification of unsolvable problems). He died quite unexpectedly
in 1954 while under medical care.
book
8:57
book
What is a Computation?
93
52 + 780
832.
We suppose that the linear tape is marked off into individual squares and
that only one symbol can occupy a square. Again, this is a matter of
convenience and involves no particular limitations. So, our multiplication
example might look like this:
2
2 .
8:57
94
Martin Davis
The next restriction we impose (here we are actually going a bit further
than Turing did) is that the only symbols which may appear on our tape
are 0 and 1. Here we are merely making use of the familiar fact that all
information can be coded in terms of two symbols. It is this fact, for
example, which furnishes the basis for Morse code in which the letters of
the alphabet are represented as strings of dots and dashes. Another
example is binary arithmetic which forms the basis of modern digital computation.
Our next restriction has to do with the number of different symbols our
calculator can take note of (or as we shall say, scan) in a single observation. How many different symbols can a human calculator actually take
in at one time? Certainly no one will be able to take in at a glance the
distinction between two very long strings of zeros and ones which differ
only at one place somewhere in the middle. One can take in at a glance,
perhaps, five, six, seven, or eight symbols. Turings restriction was more
drastic. He assumed that in fact one can take in only a single symbol at
a glance. To see that this places no essential restriction on what our calculator can accomplish, it suffices to realize that whatever he does as a
result of scanning a group of, say, five symbols can always be broken up
into separate operations performed viewing the symbols one at a time.
What kinds of things can the calculator actually do? He can replace a 0
by a 1 or a 1 by a 0 on the square he is scanning at any particular moment,
or he can decide to shift his attention to another square. Turing assumed
that this shifting of attention is restricted to a square which is the immediate neighbor, either on the left or on the right, of the square previously
scanned. Again, this is obviously no essential restriction: if one wants to
shift ones attention to a square three to the right, one simply shifts one
to the right three successive times. Also the calculator may observe the
symbol in the square being scanned and make a decision accordingly. And
presumably this decision should take the form: Which instruction shall I
carry out next? Finally, the calculator may halt, signifying the end of the
computation.
To summarize: any computation can be thought of as being carried out
by a human calculator, working with strings of zeros and ones written on
a linear tape, who executes instructions of the form:
Write the symbol 1
Write the symbol 0
book
8:57
What is a Computation?
book
95
8:57
96
Martin Davis
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
PRINT 0
GO LEFT
GO TO STEP 2 IF 1 IS SCANNED
PRINT 1
GO RIGHT
GO TO STEP 5 IF 1 IS SCANNED
PRINT 1
GO RIGHT
GO TO STEP 1 IF 1 IS SCANNED
STOP
book
8:57
What is a Computation?
book
97
8:57
98
Martin Davis
example, the entire tape could be blank or there could be some ones but
all to the left of the initially scanned square.) In this case, the first two
steps will be carried out over and over again forever, since a 1 will never be
encountered. After step 2 is performed, step 1 will be performed again. This
makes it clear that a computation from a TuringPost program need not
actually ever halt. In the case of this simple three-step program it is very
easy to tell from the initial tape configuration whether the computation
will eventually halt or continue forever: to repeat, if there is a 1 to the
right of the initially scanned square the computation will eventually halt;
whereas if there are only blanks to the right of the initially scanned square
the computation will continue forever. We shall see later that the question
of predicting whether a particular TuringPost program will eventually halt
contains surprising subtleties.
Codes for Turing Post Programs
All of the dramatic consequences of Turings analysis of the computation process proceed from Turings realization that it is possible to encode a TuringPost program by a string of zeros and ones. Since such
a string can itself be placed on the tape being used by another (or even
the same) TuringPost program, this leads to the possibility of thinking
of TuringPost programs as being capable of performing computations on
other TuringPost programs.
There are many ways by which TuringPost programs can be encoded
by strings of zeros and ones. We shall describe one such way. We first
represent each TuringPost instruction by an appropriate sequence of zeros
and ones according to the following code:
Code
Instruction
000
001
010
011
101 |0 .{z
. . 0} 1
PRINT 0
PRINT 1
GO LEFT
GO RIGHT
GO TO STEP i IF 0 IS SCANNED
110 |1 .{z
. . 1} 0
GO TO STEP i IF 1 IS SCANNED
i
i
100
STOP
book
8:57
book
What is a Computation?
99
Step
1
000
Step
10
100
Step
2
010
Step
3
110110
Step
4
001
Step
5
011
Step
6
110111110
Step
7
001
Step
8
011
End
111
It is important to notice that the code of a TuringPost program can be deciphered in a unique, direct, and straightforward way, yielding the program
of which it is the code. First remove the initial 1 and the final 111 which
are just punctuation marks. Then, proceeding from left to right, mark off
the first group of 3 digits. If this group of 3 digits is 000, 001, 010, 011, or
100 the corresponding instruction is: PRINT 0, PRINT 1, GO LEFT, GO
RIGHT, or STOP, respectively. Otherwise the group of 3 digits is 101 or
110, and the first instruction is a GO TO. The code will then have one
of the forms:
101 |0 .{z
. . 0} 1
i
110 |1 .{z
. . 1} 0
i
corresponding to
GO TO STEP i IF 0 IS SCANNED
and
GO TO STEP i IF 1 IS SCANNED
respectively. Having obtained the first instruction, cross out its code and
continue the process, still proceeding from left to right. Readers who wish
to test their understanding of this process may try to decode the string:
8:57
100
book
Martin Davis
101000110100110000010101010111
The Universal Program
We are now ready to see how Turing's analysis of the computation process together with the method for coding Turing-Post programs we have just introduced leads to a conclusion that at first sight seems quite astonishing. Namely, there exists a single (appropriately constructed) Turing-Post program which can compute anything whatever that is computable. Such a program U (for universal) can be induced to simulate the behavior of any given Turing-Post program P by simply placing code(P), the string of zeros and ones which represents P, on a tape and permitting U to operate
on it. More precisely, the non-blank portion of the tape is to consist of
code(P ) followed by an input string v on which P can work. (For clarity,
we employ capital letters to stand for particular Turing-Post programs and
lowercase letters to stand for strings of zeros and ones.) For example, the
string
1   000010110110001011110111110001011110101001   111   11
(Begin)   (code of the doubling program)   (End)   (Input)
signifies that U should simulate the behavior of the doubling program when
11 is the input. Thus, at the end of the computation by U , the tape should
look just like the final tape in Figure 2.
Now, a universal Turing-Post program U is supposed to perform in this way not only for our doubling program, but for every Turing-Post program. Let us be precise: U is to begin its computation presented with a tape whose nonblank portion consists of code(P) for some Turing-Post program P (initially scanning the first symbol, necessarily 1, of this code) followed by a string v. U is then supposed to compute exactly the same result as the program P would get when starting with the string v as the nonblank part of the tape (scanning the initial symbol of v). Such a program U can then be used to simulate any desired Turing-Post program P by simply placing the string code(P) on the tape.
What reason do we have for believing that there is such a program U ?
To help convince ourselves, let us begin by thinking how a human calculator could do what U is supposed to do. Faced with the tape contents
on which U is supposed to work, such a person could begin by scanning
this string of zeros and ones, from left to right, searching for the first place
that 3 consecutive ones appear. This triple 111 marks the end of code(P )
and the beginning of the input string. Our human calculator can then
write code(P ) on one sheet of paper and the input string on another. As
already explained, he can decode the string code(P ) and obtain the actual
TuringPost program P . Finally, he can play machine, carrying out the
instructions of P , applied to the given input string in a robotlike fashion.
If and when the computation comes to a halt, our calculator can report the
final tape contents as output. This shows that a human calculator can do
what we would like U to do. But now, invoking Turing's analysis of the computation process, we are led to believe that there must be a Turing-Post program which can carry out the process we have just described, a universal Turing-Post program.
universal TuringPost program.
The evidence we have given for the existence of such a program is rather unsatisfactory because it depends on Turing's analysis of the computation process. It certainly is not a mathematical proof. But in fact, if one is willing to do some tedious but not very difficult work, one can circumvent the need to refer to Turing's analysis at all and can, in fact, write out in detail an explicit universal Turing-Post program. This was done in fact by Turing himself (in a slightly different, but entirely equivalent context) in his fundamental 1936 paper. And subsequently, it has been redone many times. The success of the construction of the universal program is in itself evidence for the correctness of Turing's analysis. It is not appropriate here to carry out the construction of a universal program in detail; we hope, merely, that the reader is convinced that such a program exists. (Experienced computer programmers will have no difficulty in writing their own universal program if they wish to do so.)
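One can also "play machine" by computer. The following Python sketch is an illustration of our own (it is an interpreter for Turing-Post programs written out as lists of instructions, not the universal program itself, and the representation chosen here is merely a convenient assumption); as a usage example it runs the doubling program of Figure 1 on a tape holding two ones.

def run(program, tape_ones, scanned=0, max_steps=100000):
    # The tape is held sparsely: only squares known to hold a 1 are stored; blanks are 0.
    tape = {i: "1" for i in tape_ones}
    pos, step = scanned, 1
    for _ in range(max_steps):
        instr = program[step - 1]
        if instr == "STOP":
            return tape, pos
        if instr == "PRINT 0":
            tape[pos] = "0"
        elif instr == "PRINT 1":
            tape[pos] = "1"
        elif instr == "GO LEFT":
            pos -= 1
        elif instr == "GO RIGHT":
            pos += 1
        elif instr.startswith("GO TO STEP"):
            parts = instr.split()              # e.g. GO TO STEP 2 IF 1 IS SCANNED
            target, symbol = int(parts[3]), parts[5]
            if tape.get(pos, "0") == symbol:
                step = target
                continue
        step += 1
    raise RuntimeError("no halt within the step bound")

doubling = ["PRINT 0", "GO LEFT", "GO TO STEP 2 IF 1 IS SCANNED",
            "PRINT 1", "GO RIGHT", "GO TO STEP 5 IF 1 IS SCANNED",
            "PRINT 1", "GO RIGHT", "GO TO STEP 1 IF 1 IS SCANNED", "STOP"]
tape, _ = run(doubling, tape_ones=[0, 1])      # two ones, scanning the leftmost
print(sorted(p for p, s in tape.items() if s == "1"))   # four squares now hold a 1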
We have conceived of Turing-Post programs as consisting of lists of written instructions. But clearly, given any particular Turing-Post program P, it would be possible to build a machine that would actually carry out the instructions of P in sequence. In particular, this can be done for our
universal program U . The machine we get in this way would be an example
of an all-purpose or universal computing machine. The code for a particular
program P placed on its tape could then be thought of as a program for
doing the computation which P does. Thus, Turing's analysis leads us, in
a very straightforward manner, to the concept of an all-purpose computer
which can be programmed to carry out any computation whatever.
The proof of the unsolvability of the halting problem is remarkably simple. It uses the method known as indirect proof or reductio ad absurdum.
That is, we suppose that what is stated in italics above is false, that in fact,
we possess a computing procedure which, given an initial tape configuration will enable us to determine whether or not the universal program will
eventually halt when started in that configuration. Then we show that this
supposition is impossible; this is done in the box on p. 114.
example turned up. Early in the century the Norwegian Axel Thue had
emphasized the importance of what are now called word problems. In
1947, Emil Post showed how the unsolvability of the halting problem leads
to the existence of an unsolvable word problem. Post's proof is discussed
in the box on p. 116. Here we merely explain what a word problem is.
In formulating a word problem one begins with a (finite) collection,
called an alphabet, of symbols, called letters. Any string of letters is called
a word on the alphabet. A word problem is specified by simply writing down
a (finite) list of equations between words. Figure 3 exhibits a word problem
specified by a list of 3 equations on the alphabet a, b, c. From the given
equations many other equations may be derived by making substitutions
in any word of equivalent expressions found in the list of equations. In the
example of Figure 3, we derive the equation bac = abcc by replacing the
part ba by abc as permitted by the first given equation.
We have explained how to specify the data for a word problem, but we
have not yet stated what the problem is. It is simply the problem of determining for two arbitrary given words on the given alphabet, whether one
can be transformed into the other by a sequence of substitutions that are
legitimate using the given equations. We show in the box on p. 116 that we
can specify a particular word problem that is unsolvable. In other words,
no computational process exists for determining whether or not two words
can be transformed into one another using the given equations. Work on
unsolvable word problems has turned out to be extremely important, leading to unsolvability results in different parts of mathematics (for example,
in group theory and in topology).
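The substitution process itself is easy to mechanise. The following Python sketch is an illustration of our own: only the equation ba = abc is taken from the text, the remaining equations being hypothetical stand-ins for the rest of Figure 3, and the search is cut off at a maximum word length, since in general the process need not terminate -- exactly the unsolvability phenomenon discussed above.

from collections import deque

def derivable(start, goal, equations, max_len=12):
    # Breadth-first search over words reachable by substituting either side
    # of an equation for the other, anywhere in the word.
    rules = list(equations) + [(b, a) for a, b in equations]
    seen, queue = {start}, deque([start])
    while queue:
        word = queue.popleft()
        if word == goal:
            return True
        for old, new in rules:
            i = word.find(old)
            while i != -1:
                nxt = word[:i] + new + word[i + len(old):]
                if len(nxt) <= max_len and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
                i = word.find(old, i + 1)
    return False    # not derivable within the length bound

equations = [("ba", "abc"), ("ac", "ca"), ("bc", "cb")]   # only the first is from the text
print(derivable("bac", "abcc", equations))                # True: replace ba by abc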
Another important problem that eventually turned out to be unsolvable
first appeared as the tenth in a famous list of problems given by David
Hilbert in 1900. This problem involves so-called Diophantine equations.
An equation is called Diophantine when we are only interested in solutions
in integers (i.e., whole numbers). It is easy to see that the equation
4x - 2y = 3
has no solutions in integers (because the left side would have to be even
while the right side is odd). On the other hand the equation
4x - y = 3
x^2 + y^2 = z^2
Undecidable Statements
The work of Bertrand Russell and Alfred North Whitehead in their three-volume magnum opus Principia Mathematica, completed by 1920, made it clear that all existing mathematical proofs could be translated into the specific logical system they had provided. It was assumed without question by most mathematicians that this system would suffice to prove or disprove any statement of ordinary mathematics. Therefore mathematicians were shocked by the discovery in 1931 by Kurt Gödel (then a young Viennese mathematician) that there are statements about the whole numbers which can neither be proved nor disproved in the logical system of Principia Mathematica (or similar systems); such statements are called undecidable. Turing's work (which was in part inspired by Gödel's) made it possible to understand Gödel's discovery from a different, and indeed a more general, perspective.
Let us write N (P, v) to mean that the Turing-Post program P will never halt when begun with v on its tape (as usual, scanning its leftmost symbol). So, for any particular Turing-Post program P and string v, N (P, v)
is a perfectly definite statement which is either true (in case P will never
halt in the described situation) or false (in case P will eventually halt).
When N (P, v) is false, this fact can always be demonstrated by exhibiting the complete sequence of tape configurations produced by P leading
to termination. However, when N (P, v) is true no finite sequence of tape
configurations will suffice to demonstrate the fact. Of course we may still
be able to prove that a particular N (P, v) is true by a logical analysis of
P's behavior.
Let us try to be very rigorous about this notion of proof. Suppose that certain strings of symbols (possibly paragraphs of English) have been singled out as proofs of particular statements of the form N (P, v). Suppose furthermore that we possess a computing procedure that can test an alleged proof Π that N (P, v) is true and determine whether Π is or is not actually such a proof. Whatever our rules of proof may be, this requirement is surely needed for communication purposes. It must be possible in principle to perform such a test in order that Π should serve its purpose of eliminating doubts concerning the truth of N (P, v). (In practice, published mathematical proofs are in highly condensed form and do not meet this strict requirement. Disputes are resolved by putting in more detail as needed. But it is essential that in principle it is always possible to include
well know, P can have an unsolvable halting problem (e.g., P could be the
universal program U ), we have arrived at a contradiction; this completes
the proof of Gödel's theorem.
Of course, Gödel's theorem does not tell us that there is any particular
pair P, v for which we will never be able to convince ourselves that N (P, v) is
true. It is simply that, for any given sound rules of proof, there will be a pair
P, v for which N (P, v) is true, but not provable using the given rules. There
may well be other sound rules which decide this undecidable statement.
But these other rules will in turn have their own undecidabilities.
Complexity and Randomness
A computation is generally carried out in order to obtain a desired answer. In our discussion so far, we have pretty much ignored the answer,
contenting ourselves with discussing only the gross distinction between a
computation which does at least halt eventually and one which goes on
forever. Now we consider the question: how complex need a Turing-Post program be to produce some given output? This straightforward question will lead us to a mathematical theory of randomness and then to a dramatic extension of Gödel's work on undecidability.
We will only consider the case where there are at least 2 ones on the
tape when the computation halts. The output is then to be read as consisting of the string of zeros and ones between the leftmost and rightmost ones
on the tape, and not counting these extreme ones. Some such convention
is necessary because of the infinite string of zeros and ones on the tape. In
effect the first and last one serve merely as punctuation marks.
To make matters definite suppose that we wish to obtain as output a
string consisting of 1022 ones. When we include the additional ones needed
for punctuation, we see that what is required is a computation which on
termination leaves a tape consisting of a block of 1024 ones and otherwise
blank. One way to do this is simply to write the 1024 ones on the tape
initially and do no computing at all. But surely we can do better. We can
get a slight improvement by using our faithful doubling program (Figure
1). We need only write 512 ones on the tape and set the doubling program
to work. We have already written out the code for the doubling program; it took 39 bits. (A bit is simply a zero or a one; the word abbreviates binary digit.) So we have a description of a string of 1022 ones which uses 39 + 512 = 551 bits. But surely we can do better. 1024 = 2^10, so we should
be able to get 1024 ones by starting with 1 and applying the doubling
program 10 times. In Figure 4 we give a 22-step program, the first nine
steps of which are identical to the first nine steps of the doubling program,
which accomplishes this. Beginning with a tape configuration
1 0 1 1 ... 1   (n ones after the 0)
this program will halt with a block of 2^(n+1) ones on the tape.
It is not really important that the reader understand how this program
works, but here is a rough account: the program works with two blocks of
ones separated by a zero. The effect of Steps 1 through 9 (which is just
the doubling program) is to double the number of ones to the left of the
0. Steps 10 through 21 then erase 1 of the ones to the right of the zero
and return to Step 1. When all of the ones to the right of the zero have
been erased, this will result in a zero being scanned at Step 11 resulting in
a transfer to Step 22 and a halt. Thus the number of ones originally to the
left of the zero is doubled as many times as there are ones originally to the
right of the zero.
The full code for the program of Figure 4 contains 155 bits. To obtain
the desired block of 1024 ones we need the input 10111111111. We are thus
down to 155 + 11 = 166 bits, a substantial improvement over 551 bits.
We are now ready for a definition: Let w be any string of bits. Then we
say that w has complexity n (or equivalently, information content n) and
write I(w) = n if:
(1) There is a program P and string v such that the length of code(P )
plus the length of v is n, and P when begun with v will eventually
halt with output w (that is with 1w1) occupying the nonblank part of
the tape, and
(2) There is no number smaller than n for which this is the case.
If w is the string of 1022 ones, then we have shown that I(w) <= 166. In general, if w is a string of bits of length n, then we can easily show that I(w) <= n + 9. Specifically, let the program P consist of the single instruction: STOP. Since this program does not do anything, if it begins with input 1w1, it will terminate immediately with 1w1 still on the tape. Since Code(P) = 1100111, it must be the case that I(w) is less than or equal to the length of the string 11001111w1, that is, less than or equal to n + 9.
(Naturally, the number 9 is just a technical artifact of our particular formulation and is of no theoretical importance.)
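The bound can be checked mechanically. The following few lines of Python are our own illustration, using the encoding conventions of this essay: they build the trivial description of a string w (the code of the STOP program followed by the input 1w1) and confirm that its length is n + 9.

def trivial_description(w):
    code_stop = "1" + "100" + "111"      # begin marker, STOP, end marker: 1100111
    return code_stop + "1" + w + "1"     # program code followed by the input 1w1

w = "1" * 1022                           # the string of 1022 ones used above
d = trivial_description(w)
print(len(d), len(w) + 9)                # both are 1031, so I(w) <= n + 9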
How many strings are there of length n such that, say, I(w) <= n - 10? (We assume n > 10; in the interesting cases n is much larger than 10.) Each such w would be associated with a program P and string v such that Code(P)v is a string of bits of length less than or equal to n - 10. Since the total number of strings of bits of length i is 2^i, there are only
2 + 4 + . . . + 2^(n-10)
strings of bits of length <= n - 10. This is the sum of a geometric series easily calculated to be 2^(n-9) - 2. So we conclude: there are fewer than 2^(n-9) strings of bits w of length n such that I(w) <= n - 10.
Since there are 2^n strings of bits of length n, we see that the ratio of the number of strings of length n with complexity <= n - 10 to the total number of strings of length n is no greater than
2^(n-9) / 2^n = 1/2^9 = 1/512 < 1/500.
This is less than 0.2%. In other words, more than 99.8% of all strings of length n have complexity > n - 10. Now the complexity of the string of 1022 ones is, as we know, less than or equal to 166, thus much less than 1022 - 10 = 1012. Of course, what makes this string so special is that the digit pattern is so regular that a comparatively short computational description is possible. Most strings are irregular or as we may say, random.
Thus we are led to an entirely different application of Turing's analysis of computation: a mathematical theory of random strings. This theory was developed around 1965 by Gregory Chaitin, who was at the time an undergraduate at City College of New York (and independently by the world famous A. N. Kolmogorov, a member of the Academy of Sciences of the U.S.S.R.). Chaitin later showed how his ideas could be used to obtain a dramatic extension of Gödel's incompleteness theorem, and it is with this reasoning of Chaitin's that we will conclude this essay.
Let us suppose that we have rules of proof for proving statements of
the form I(w) > n where w is a string of bits and n is a positive integer.
As before, we assume that we have a computing procedure for testing an
alleged proof to see whether it really is one. We assume that the rules
of proof are sound, so that if Π is a proof of the statement I(w) > n, then
the complexity of the string w really is greater than n. Furthermore, let
us make the very reasonable assumption that we have another computing
procedure which, given a proof of a statement I(w) > n, will furnish us
with the specific w and n for which I(w) > n has been proved.
We now describe a new computing procedure. We begin generating the sequence Π1, Π2, Π3, . . . of possible proofs as above. For each i we perform our test to determine whether or not Πi is a proof of a statement of the form I(w) > n. If the answer is affirmative we use our second procedure to find the specific w and n. Finally we check to see whether n > k0, where k0 is some fixed large number. If so, we report w as our answer; otherwise we go on to the next Πi. By Turing's analysis this entire procedure can be replaced by a Turing-Post program, where the fixed number k0 is to be chosen at least as large as the length of this program. (The fact that k0 can be chosen as large as this is not quite obvious; the basic reason is that far fewer than k0 bits suffice to describe the number k0.)
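The procedure just described can be laid out as pseudocode. The following Python sketch is our own illustration; the two helper functions merely stand for the computing procedures the argument assumes to exist (a proof checker, and an extractor of w and n from a proof), and are hypothetical placeholders rather than real implementations.

def is_proof_of_complexity_statement(candidate):
    raise NotImplementedError("assumed proof-checking procedure")

def extract_w_and_n(proof):
    raise NotImplementedError("assumed extraction procedure")

def all_strings():
    # Generate every finite string of bits: the candidate proofs Pi_1, Pi_2, ...
    from itertools import count, product
    for length in count(1):
        for bits in product("01", repeat=length):
            yield "".join(bits)

def search(k0):
    # If this search ever halted, it would output a string w together with a
    # proof that I(w) > n for some n > k0 -- contradicting the fact that the
    # search itself is a short program that computed w.
    for candidate in all_strings():
        if is_proof_of_complexity_statement(candidate):
            w, n = extract_w_and_n(candidate)
            if n > k0:
                return w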
Now, a little thought will convince us that this Turing-Post program can never halt: if it did halt we would have a string w for which we had a proof Πi that I(w) > n where n > k0. On the other hand this very program has length less than or equal to k0 (and hence less than n) and has computed w, so that I(w) < n, in contradiction to the soundness of our proof rules. Conclusion: our rules of proof can yield a proof of no statement of the form I(w) > n for which n > k0. This is Chaitin's form of Gödel's theorem: given a sound set of rules of proof for statements of the form I(w) > n, there is a number k0 such that no such statement is provable using the given rules for any n > k0.
To fully understand the devastating import of this result it is important
to realize that there exist rules of proof (presumably sound) for proving
statements of the form I(w) > n which include all methods of proof available in ordinary mathematics. (An example is the system obtained by using
the ordinary rules of elementary logic applied to a powerful system of axioms, of which the most popular is the so-called Zermelo-Fraenkel axioms
for set theory.) We are forced to conclude that there is some definite number k0 , such that it is in principle impossible, by ordinary mathematical
methods, to prove that any string of bits has complexity greater than k0 .
This is a remarkable limitation on the power of mathematics as we know
it.
Although we have discussed a considerable variety of topics, we have touched on only a tiny part of the vast amount of work which Turing's analysis of the computation process has made possible. It has become
statement. The only way out of this contradiction is to conclude that what
we were pretending is untenable. In other words, the halting problem for
U is not solvable.
q4 1 = q5 0,
corresponding to the next step of the computation. Suppose next that the
fifth instruction is
GO RIGHT.
It requires 6 equations to fully translate this instruction, of which two
typical ones are
q5 01 = 0q6 1,
q5 1h = 1q6 0h.
qn+1 0 = qn+1 ,   qn+1 1 = qn+1 ,   0qn+1 = qn+1 ,   1qn+1 = qn+1
serve to transform any Post word containing qn+1 into the word hqn+1 h.
Putting all of the pieces together we see how to obtain a word problem
which translates any given Turing-Post program.
Now let a Turing-Post program P begin scanning the leftmost symbol
of the string v; the corresponding Post word is hq1 vh. Then if P will
eventually halt, the equation
hq1 vh = hqn+1 h
will be derivable from the corresponding equations as we could show by
following the computation step by step. If on the other hand P will never
halt, it is possible to prove that this same equation will not be derivable.
(The idea of the proof is that every time we use one of the equations which translates an instruction, we are either carrying the computation forward, or, in case we substitute from right to left, undoing a step already taken. So, if P never halts, we can never get hq1 vh equal to any word containing qn+1 .)
qn+1 .) Finally, if we could solve this word problem we could use the solution
to test the equation
hq1 vh = hqn+1 h
and therefore to solve the halting problem for P . If, therefore, we start
with a Turing-Post program P which we know has an unsolvable halting
Tape Configuration      Program Step
. . . 001100 . . .      1
. . . 000100 . . .      2
. . . 000100 . . .      4
. . . 010100 . . .      5
. . . 010100 . . .      7
. . . 011100 . . .      8
. . . 011100 . . .      1
. . . 011000 . . .      2
. . . 011000 . . .      2
. . . 011000 . . .      2
. . . 011000 . . .      4
. . . 111000 . . .      5
. . . 111000 . . .      5
. . . 111000 . . .      5
. . . 111000 . . .      7
. . . 111100 . . .      8
. . . 111100 . . .      10
Julia B. Robinson
Julia B. Robinson was born in 1919 in St. Louis, Missouri, but has
lived most of her life in California. Her education was at the University
of California, Berkeley, where she obtained her doctorate in 1948. She has
always been especially fascinated by mathematical problems which involve
both mathematical logic and the theory of numbers. Her contributions
played a key role in the unsolvability proof for Hilbert's tenth problem. In
1975 she was elected to the National Academy of Sciences, the first woman
mathematician to be so honored.
1.  PRINT 0
    ...
9.  GO TO STEP 1 IF 1 IS SCANNED
10. GO RIGHT
11. GO TO STEP 22 IF 0 IS SCANNED
12. GO RIGHT
13. GO TO STEP 12 IF 1 IS SCANNED
14. GO LEFT
15. PRINT 0
16. GO LEFT
17. GO TO STEP 16 IF 1 IS SCANNED
18. GO LEFT
19. GO TO STEP 18 IF 1 IS SCANNED
20. GO RIGHT
21. GO TO STEP 1 IF 1 IS SCANNED
22. STOP
Chapter 6
On the Kolmogorov-Chaitin Complexity for short
sequences
Jean-Paul Delahaye and Hector Zenil
Laboratoire d'Informatique Fondamentale de Lille
Centre National de la Recherche Scientifique (CNRS)
Université des Sciences et Technologies de Lille;
[email protected], [email protected]
Among the several new ideas and contributions made by Gregory Chaitin to mathematics is his strong belief that mathematicians should transcend the millenary theorem-proof paradigm in favor of a quasi-empirical method based on current and unprecedented access to computational resources [3]. In accordance with that dictum, we present in this paper an experimental approach for defining and measuring the Kolmogorov-Chaitin complexity, a problem which is known to be quite challenging for short sequences, shorter for example than typical compiler lengths.
The Kolmogorov-Chaitin complexity (or algorithmic complexity) of a string s is defined as the length of its shortest description p on a universal Turing machine U, formally K(s) = min{l(p) : U(p) = s}. The major drawback of K, as a measure, is its uncomputability, so in practical applications it must always be approximated by compression algorithms. A string is incompressible if its shortest description is the original string itself. If a string is incompressible it is said to be random, since no pattern can be found in it. Among the 2^n different strings of length n, at least one will be completely random simply because there are not enough shorter strings. By the same argument it can also be deduced that most strings have maximal K-C complexity; many of them will therefore remain equal or very close to their original size after compression, and most of them will therefore be random. An important property of K is that it is nearly independent of the choice of U. However, when the strings are short, the dependence of K on a particular universal Turing machine U is greater, producing arbitrary results. In this paper we will suggest an empirical approach
better notation is the 3-tuple CA(t, c, j), with j indicating the number of symbols, but because we are only considering 2-symbol cellular automata we can take it for granted and avoid that complication.
2 Both enumeration schemes are implemented in Mathematica calling the functions CellularAutomaton and TuringMachine, the latter implemented in Mathematica version 6.0.
that the bottom of the distribution, and therefore all of it, will tend to stabilise by taking bigger samples. By analysing Figure 6.1 it can be deduced that the independent devices of computation (TM and CA) produce similar output frequency distributions. We conjecture that these systems of computation, and others of equivalent computational power, converge toward a single distribution when bigger samples are taken, by allowing a greater number of steps and/or bigger classes containing more and increasingly sophisticated computational devices. Such distributions should then match the value of m(s), and therefore K(s), by means of the convergence of what we call their experimental counterparts m_e(s) and K_e(s). If our method succeeds as we claim, it could be possible to give a stable definition of the K-C complexity for short sequences independent of any constant.
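A rough analogue of this kind of experiment can be run in a few lines of Python; the sketch below is our own illustration, not the authors' Mathematica implementation, and the step count, row width, block length and sample are illustrative choices. It runs all 256 elementary cellular automaton rules on a (pseudo)random initial row, reads off short output blocks, and ranks the blocks by frequency.

import random
from collections import Counter

def eca_step(row, rule):
    # One synchronous update of an elementary CA with periodic boundary.
    n = len(row)
    return [(rule >> (row[(i - 1) % n] * 4 + row[i] * 2 + row[(i + 1) % n])) & 1
            for i in range(n)]

def output_blocks(rule, width=20, steps=50, k=4, seed=0):
    random.seed(seed)
    row = [random.randint(0, 1) for _ in range(width)]
    for _ in range(steps):
        row = eca_step(row, rule)
    s = "".join(map(str, row))
    return [s[i:i + k] for i in range(len(s) - k + 1)]

counts = Counter()
for rule in range(256):
    counts.update(output_blocks(rule))

for block, freq in counts.most_common(8):
    print(block, freq)      # by the argument above, structured blocks should dominate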
Figure 6.1. The above diagram shows the convergence of the frequency distributions of the outputs of TM and ECA = CA(1, 1) for k = 4, after 200 steps, fed with (pseudo) random inputs. Matching strings are linked by a line. As one can observe, in spite of certain crossings, TM and ECA are strongly correlated and both successfully group equivalent output strings. Taking the six groups marked with brackets, the distribution frequencies differ only by one.
For instance, the strings 0101 and 1010 are grouped in second place. They therefore form the second group, after the group composed of the strings consisting of a sequence of zeros or ones and before all the other 2^k strings. And that is what would be expected according to what algorithmic probability predicts, since more structured non-random strings appear classified at the top (as our 0101 . . . example) while less structured random-looking strings appear classified at the bottom. In favour of our claims about the nature of these distributions as following the universal distribution m(s), and therefore approaching K(s), notice that all strings were correctly grouped with their equivalent category of complexity under the three possible symmetries preserving their K-C complexity, namely reversion (sy), complementation (co) and the composition of the two (syco). The fact that the method groups all the strings by their complexity category allowed us to apply a well-known lemma used in group theory to enumerate essentially different cases, which lets us consider only a single representative string for each of the complexity categories. For instance, for strings of length 10 (k = 10), the compressed distribution after the application of Burnside's lemma has 272 essentially different strings out of all 2^10 = 1024 original cases. The distribution below was built from CA(3, 2) after 200 steps and regular inputs (a 1 surrounded by 0s). The following table contains the top 20 strings with their respective frequencies of appearance.
string          frequency (%)
0000000000      25.2308
0101010101      4.92308
0111101111      1.84615
0011110011      1.84615
0101001010      1.84615
0001001000      1.84615
0110000110      1.84615
0010001000      1.53846
0101101101      1.53846
0000100000      1.53846
0110111111      1.53846
0100100100      1.23077
0010010000      1.23077
0011111111      1.23077
0100100000      1.23077
0001000000      1.23077
0110001100      1.23077
0000110000      1.23077
0010110100      1.23077
0011100000      0.923077
Even though each distribution obtained by different means favoured different symmetries, it turned out that all of them were strongly correlated with the others. Furthermore, we found that frequency distributions from several real-world data sources also approximate the same distribution, suggesting that they probably come from the same kind of computation, supporting contemporary claims about nature as performing computations [8, 11]. The extended paper available online contains more detailed results for strings of length k = 4, 5, 6, 10, as well as two metrics for measuring the convergence of TM(2, 2) and ECA and the real-world data frequency distributions extracted from several sources.3 Detailed papers with mathematical formulations and conjectures and the real-world data distribution results are currently in preparation.
References
[1] G.J. Chaitin, Algorithmic Information Theory, Cambridge University Press, 1987.
[2] G.J. Chaitin, Information, Randomness and Incompleteness, World Scientific, 1987.
[3] G.J. Chaitin, Meta-Math! The Quest for Omega, Pantheon Books, NY, 2005.
[4] C.S. Calude, Information and Randomness: An Algorithmic Perspective (Texts in Theoretical Computer Science. An EATCS Series), Springer, 2nd edition, 2002.
[5] W. Kirchherr, M. Li, and P. Vitányi, The miraculous universal distribution, Math. Intelligencer 19(4), 7-15, 1997.
[6] M. Li and P. Vitányi, An Introduction to Kolmogorov Complexity and Its Applications, Springer, 1997.
[7] A.K. Zvonkin and L.A. Levin, The complexity of finite objects and the algorithmic concepts of information and randomness, UMN = Russian Math. Surveys, 25(6):83-124, 1970.
[8] S. Lloyd, Programming the Universe, Knopf, 2006.
[9] R. Solomonoff, The Discovery of Algorithmic Probability, Journal of Computer and System Sciences, Vol. 55, No. 1, pp. 73-88, August 1997.
[10] R. Solomonoff, A Preliminary Report on a General Theory of Inductive Inference, (Revision of Report V-131), Contract AF 49(639)-376, Report ZTB-138, Zator Co., Cambridge, Mass., Nov. 1960.
[11] S. Wolfram, A New Kind of Science, Wolfram Media, 2002.
Chapter 7
Circuit Universality of Two Dimensional Cellular
Automata: A Review
Anahí Gajardo and Eric Goles
Universidad Adolfo Ibáñez,
Av. Diagonal Las Torres 2640, Peñalolén, Santiago, Chile
Instituto de Sistemas Complejos de Valparaíso (ISCV),
Av. Artillería 600B, Cº Artillería, Valparaíso, Chile; [email protected]
Universality of Cellular Automata (CA) is the ability to develop arbitrary computations, and is viewed as a complexity certificate. The concept has existed since the creation of CA by John von Neumann, and it has undergone several transformations and ramifications. We review a sample of models, starting with Banks's CA, where universality has been shown through the construction of arbitrary boolean circuits (Circuit Universality), in most but not all cases leading to proofs of Turing Universality.
7.1. Introduction
A d-dimensional Cellular Automaton (d-CA) is a dynamical system evolving in Z^d in discrete time, where the update of the lattice is synchronous and each site changes its state following a local rule which depends on the states of a fixed neighborhood.
Such a system can present very diverse and complex behaviors, the prediction of which is usually difficult. But what exactly is meant by this? If we completely know the initial configuration of a CA, we can compute its whole evolution up to any iteration t. On the other hand, if we only know the state of a finite part of the lattice, we can only update the state
of those cells whose neighbors' states are all known. In this case the set of cells that can be updated decreases over time, until an instant at which we cannot compute anything more.
Let us illustrate this with an example in Z. Consider the following rule: if the central cell is in state 1, it will change to 0 if any of its neighbors is in state 0; in any other case, it keeps its current state. (Figure 7.1 illustrates the evolution of this rule for a certain initial configuration.) Let us now suppose that we know the initial state of cells 1 to 10. Initially, we can update only cells 2 to 9. In the next iteration we can update only cells 3 to 8, and, in general, at iteration i we can only update cells 1 + i to 10 - i. At step 5, the state of every cell is unknown to us.
Figure 7.1. Evolution of the example rule; time goes upward.
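A few lines of Python make the shrinking window of computable cells explicit; this sketch is our own illustration, and the initial configuration chosen here is arbitrary.

def step(known):
    # 'known' maps cell index -> state; a cell can be updated only if both
    # of its neighbours are known.
    nxt = {}
    for i, s in known.items():
        if i - 1 in known and i + 1 in known:
            if s == 1 and (known[i - 1] == 0 or known[i + 1] == 0):
                nxt[i] = 0
            else:
                nxt[i] = s
    return nxt

known = {i: 1 for i in range(1, 11)}   # cells 1 to 10, an illustrative configuration
known[5] = 0
for t in range(6):
    print(t, sorted(known))            # the set of known cells shrinks by one on each side
    known = step(known)                # at step 5 nothing can be computed any more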
Figure 7.2. A boolean circuit built from AND and NOT gates, together with CROSSOVER and FANOUT elements, producing an OUTPUT.
Banks defined a wire: a line of cells in gray state, embedded in a background of cells in blank state. This is a stable configuration, but if two
neighboring cells in the chain are changed to blank and black, this perturbation propagates over the chain in the direction of the black cell (this is
a signal), and if two signals collide, they disappear; a cut wire is a dead
end for the signal. A junction of three or four wires is also a stable configuration; a grey cell is added in the three wires case, giving it a short cut
wire. When one or two signals (in right angle) enter a junction they exit by
the remaining wires. But if three signals enter a junction, they disappear.
Figure 7.4 shows simulations of some of these phenomena.
The OR gate is simply a junction of three wires, which can be also used
as a FANOUT. These gates can send signals in the direction of the inputs,
something that Banks preferred to avoid. He managed this by defining a
DIODE, shown in Figure 7.5(a), which only allows a signal to pass in one
direction. Banks also constructed a Timer that generates signals periodically, shown in Figure 7.5(b): the signal inside the cycle duplicates each
time it passes by the junction, sending a signal into the exiting wire. The
Figure 7.4. Top: A wire and the propagation of a signal. Middle: When one signal
enters a three wire junction. Bottom: When two signals enter a four wire junction.
NOT gate is composed of two Timers connected to a junction (see Figure 7.5(c)). If a signal A arrives at this junction at the same iteration as both Timer signals, they mutually annihilate.
Figure 7.5. (a) The Diode. (b) The Timer. (c) The NOT gate.
Figure 7.6. (a) A signal and the wire. (b) Two signals entering an Annihilating Junction.
Unfortunately, the XOR and the NOT gates are not enough for constructing the OR and the AND gates. In Banks's CA, the OR gate was obtained because two signals do not annihilate each other in a junction,
NOT q can be computed within this system, as shown in Figure 7.7(b).
In order to correctly use the gate, the input signals must be synchronized
to collide at point C (or anywhere between the hexagon and the input p).
If a signal arrives by p, it will exit. If a signal arrives by q, it will die at
the Annihilating Junction A. If two signals arrive at the same time they
will be mutually annihilated at point C and no output will be produced.
Any logical gate can be constructed by using the p AND NOT q gate, and
hence the cellular automaton is Circuit Universal.
Figure 7.7. The p AND NOT q gate: inputs p and q, Annihilating Junctions A, and the Output.
This collision emulates a gate with two outputs: p AND NOT q and
NOT p AND q, because each glider will continue its way if and only if the
other glider does not arrive. It is possible to kill one of the outputs with a
configuration called eater, which is a stable configuration that destroys the
gliders when they collide with it. We obtain in this way a p AND NOT
q gate. If we replace the input p by a periodic glider generator, which
book
8:57
book
141
exists (a glider gun), we obtain a NOT gate. With these two gates all the
other gates can be obtained.
To complete the construction, we still need to duplicate and to direct
the gliders, and this cannot be done with a logical gate. To achieve this,
another collision must be used: a collision called the Kickback Collision. When
a stream of gliders collides with a glider in a Kickback collision, the first
glider comes back to the stream and collides with the second glider. This
last collision can kill the third glider too. In short, a single glider can kill
three. This can be used to duplicate and change the direction of a logical
value, as shown in Figure 7.9. The FANOUT requires several gliders (1s),
which can be generated with glider guns.
Figure 7.9. A FANOUT. If A = 1, the first three 1s on the left are killed, and the 1s coming from the bottom survive, as well as the last 1, which finally exits on the right. If A = 0, the last three 1s on the left are killed and no glider goes up; only the first one goes on to kill the 1 coming from the bottom right, preventing the exit of gliders in any direction.
it stops.
With the collision we have a p AND NOT q gate and all the other gates,
including the CROSSOVER. If a ladder turns three times it collides with
its own trace and stops, and this can be used to stop unused outputs. The
FANOUT can also be constructed with this property (see Figure 7.10).
Since within this system we have no ladder gun, all 1s must be produced
at the beginning and delayed long enough to arrive at the desired iteration
to the gate where they are needed. Griffeath and Moore proved that this
can be done without making the circuit grow in an inconvenient way.
Figure 7.10. If a signal A passes before the 1 on the right, every other 1 passes. But if the signal A is 0, then the 1 on the right prevents the other 1s from going up.
to emulate wires and the Fredkin gate. Figure 7.11 shows the behavior of
the Fredkin gate.
Figure 7.11. The Fredkin gate is a three-input, three-output logical gate. If the upper input value is 1 (True), the outputs are equal to the inputs. If the upper input value is 0 (False), the upper output is also 0 but the other two inputs are permuted.
Figure 7.12. The collision of two balls can be viewed as a general logical gate with
multiple outputs. The variables X and Y represent the presence or the absence of a
travelling ball.
Figure 7.13. The Fredkin gate computed only with the BBM collision. The gray boxes represent either the two-input, four-output gate given by the collision or its inverse. The lines outside these gates represent particle trajectories, which are controlled by well placed mirrors. The circuit must be constructed in such a way that the particles do not interact when their trajectories cross.
Figure 7.17. NOT gate for the HPP lattice gas. If a signal comes in by A, it will collide with the 1 coming from above and two horizontal signals will exit. If no signal comes in by A, the 1 from below will collide with the other one; a signal will then exit and collide with the 1 from the right, to finally exit by the top.
7.6. Sandpiles
Bak et al [23] introduced a model, mostly known as the Sandpile Automaton, which captures important features of many nonlinear dissipative systems. Consider a set of sites in a d-dimensional lattice such that each cell is connected to the 2d nearest neighbors. A finite number of tokens, x_i(t) >= 0, is assigned to each cell i. Given the configuration at step t, if the site i has more than 2d tokens, it gives one of these to each neighbor. Since the updating is synchronous, the site i also receives one token from each neighbor with more than 2d tokens. In Figure 7.18 we give an example of the Sandpile dynamics in a two dimensional lattice.
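A minimal sketch of these dynamics in Python follows; it is our own illustration, and the grid size, the open boundary handling and the initial pile are illustrative choices. Following the description above, a site holding more than 2d = 4 tokens topples, giving one token to each of its four neighbours.

def step(grid):
    n, m = len(grid), len(grid[0])
    nxt = [row[:] for row in grid]
    for i in range(n):
        for j in range(m):
            if grid[i][j] > 4:                          # more than 2d tokens: topple
                nxt[i][j] -= 4
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    if 0 <= i + di < n and 0 <= j + dj < m:
                        nxt[i + di][j + dj] += 1        # tokens leaving the grid are lost
    return nxt

grid = [[0] * 5 for _ in range(5)]
grid[2][2] = 10                                         # a single large pile in the centre
while any(c > 4 for row in grid for c in row):          # iterate until the avalanche stops
    grid = step(grid)
for row in grid:
    print(row)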
If the number of tokens is finite, within a finite amount of time every
site will have less than d tokens (see, for example, C. Moore [24] for a formal
proof). Then, any finite configuration becomes stationary (a fixed point),
that is to say, every avalanche stops in a finite number of iterations. The
Figure 7.18. An example of the Sandpile dynamics in a two dimensional lattice.
Figure 7.19.
Figure 7.21. Logical gates constructed for a Sandpile with von Neumann neighborhood of radius two: the wire and the signal, the neighborhood, and the crossover. In this model, the critical threshold is 8.
7.7. Conclusions
The construction of boolean circuits in 2-CA is not always an easy task,
but it remains the best known path to proving Turing Universality. Turing
Universality is an important property, since it means being able to perform
any algorithmic calculation, and has strong consequences like the existence
Acknowledgments
We particularly want to thank Andrés Moreira for his careful reading and valuable comments.
References
[1] R. Greenlaw, H. Hoover, and W. Ruzzo, Limits to Parallel Computation: P-Completeness Theory. (Oxford University Press, 1995).
[2] R. E. Ladner, The circuit value problem is log space complete for P, SIGACT News. 7(1), 18-20, (1975).
[3] E. R. Banks. Information Processing and Transmission in cellular automata. PhD thesis, M.I.T., Cambridge, Mass., U.S.A., (1971).
[4] B. Durand and Z. Róka, The game of life: universality revisited, In eds. M. Delorme and J. Mazoyer, Cellular Automata: a parallel model, pp. 51-76. Kluwer Academic Pub., (1999).
lattice gas: transport properties and time correlation functions, Phys. Rev. A. 13, 1949-1960, (1976).
[19] K. Morita and S. Ueno, Computation-universal models of two-dimensional 16-state reversible cellular automata, IEICE Trans. Inf. and Syst. E75-D (1), 141-147, (1992).
[20] K. Morita, M. Margenstern, and K. Imai, Universality of reversible hexagonal cellular automata, RAIRO Theoretical Informatics and Applications. 33, 535-550, (1999).
[21] K. Imai and K. Morita, A computation-universal two-dimensional 8-state triangular reversible cellular automaton, Theoret. Comput. Sci. 231, 181-191, (2000).
[22] C. Moore and M. G. Nordahl. Predicting lattice gases is P-complete. Technical Report 034, Santa Fe Institute, Santa Fe, New Mexico (April 1997).
[23] P. Bak, C. Tang, and K. Wiesenfeld, Self-organized criticality: An explanation of 1/f noise, Phys. Rev. Lett. 59(4), 381-384, (1987).
[24] C. Moore and M. Nilsson, The computational complexity of Sandpiles, J. of Stat. Phys. 96, 205-224, (1999).
[25] E. Goles and M. Margenstern, Sand pile as a universal computer, Int. J. of Modern Physics C. 7(2), 113-122, (1996).
[26] A. Gajardo and E. Goles, Crossing information in two dimensional Sandpiles, Theor. Comput. Sci. 369(1-3), 463-469, (2006).
[27] C. G. Langton, Studying artificial life with cellular automata, Physica D. 22, 120-149, (1986).
[28] A. Gajardo, A. Moreira, and E. Goles, Complexity of Langton's ant, Discrete Applied Mathematics. 117, 41-50, (2002).
Chapter 8
Chaitin's Graph Coloring Algorithm
James Goodman
Computer Science Department, University of Auckland, Auckland, New
Zealand; [email protected]
utilized: hand-coded programs tested human limits by requiring the programmer to think about the efficient sharing of registers across different procedures that might or might not have concurrent scope. The development of an efficient global register allocation scheme was an important step forward, and justified the larger general register file typical of the new processors emerging in the early 1980s. Today compilers are able to do far better than humans can hope to do in allocating registers.
Chapter 9
A Berry-type Paradox
Gabriele Lolli
Department of Mathematics
University of Torino, Italy; [email protected]
In this brief note we want to present and make known a statement of the Berry paradox which has been ignored, buried in the graveyard of dead languages, and which is due to Beppo Levi, in 1908, independently from Russell. The Berry paradox should actually be called Russell's paradox, as Chaitin has observed in [3, pp. 8-9], on the basis of Alejandro Garciadiego's findings.
Beppo Levi's version is much more modern than Russell's informal one, since it is cast in arithmetical terms, with the appropriate numerical computations; although Levi's assessment of it is rather muddled, his paradox is ready for use when supplemented with the ideas which transform the epistemological paradoxes into positive arguments in the theory of undecidable problems, as foreseen by Gödel and as realized in [2] or [1].
Beppo Levi doesn't mention the Berry paradox and he doesn't cite the paper [10], where it was presented, which he quite certainly did not know; his only references are to [9]. He gives a mathematically rigorous version of the paradox in the course of a rather lengthy analysis of Richard's antinomy, where he follows and improves Peano's discussion of the latter.
1. Beppo Levi did not belong to Peano's school, although he graduated in Torino in 1896 with a dissertation in analysis. At the beginning of his career he worked in set theory, starting out with the perusal of Baire's thesis, then reverting to measure and category theory of the line after a failed attack at the continuum hypothesis. His name is mentioned in connection with the history of the axiom of choice, as he seems to have been the first to recognize (and formulate) the principle of partition in Bernstein's proof that there are continuum many closed sets; he also gave a new proof avoiding
choice. Azriel Levy, for example, gives as reference for the axiom of choice (Beppo Levi 1902; Erhard Schmidt 1904, see Zermelo 1904), in [6, p. 159], but Zermelo in 1904 only attributed, rightly, to Beppo Levi the principle of partition. Levi's precise contributions are spelled out in [8, pp. 78-80] and [7].
Further work by Beppo Levi in logic concerned mainly improvements and criticism of Peano's logic, and has only historical interest. As a mathematician he is best remembered for results in analysis related to Lebesgue theory, such as the theorem on passage to the limit under the integral sign (equivalent to the later Lebesgue dominated convergence theorem), and in number theory, as explained in [11].
Beppo Levi had a crystal clear conception of the new axiomatic method
as it had been recognized at the end of the nineteenth century; to him, on a
par with Pasch, Hilbert, Enriques and Pieri, are due some of the most neat
explications of the nature of mathematical theories, of the impossibility for
primitive notions to be completely determined and of the unavoidability of
multiple interpretations.
He also relied on these principles in the analysis of logical antinomies
he gave in [4].
2. In discussing Richard's antinomy Levi denotes by E the set of real numbers definable with a finite number of words, and by R an enumeration of E.
He concludes that E does not exist, but the reason he gives is different from that of Poincaré and Russell, who imputed the antinomy to the impredicative character of the definition. He also notices the circularity of the definition of E, which to him, from his axiomatic point of view, means that E and R are not to be considered defined and independent entities; they are primitive ideas subjected as a whole to some postulates, which he tries to identify:
(1) E is a set of numbers between 0 and 1;
(2) R is an ordering of the numbers of E;
(3) All numbers of E, and only they, can be expressed with a finite number of symbols, including E and R.
Richard's diagonal number N proves only, according to Levi, that the postulates imposed on E and R are not mutually consistent.
Levi does not pinpoint the source of the contradiction, observing that
when a set of postulates is inconsistent, the contradiction lies in their
union, not in any single one of them.
He dwells however on the ordering R, which is naturally thought of as
obtained by a lexicographic ordering of the set F of all statements by pruning it of those which do not define a number or define a number previously
listed. Let us call it briefly the lexicographic order.
Here comes the paradox (we had to change the counting with respect to Levi's because of the different length of sentences in Italian and in English).
Let us call B, as we did, the number of symbols needed to compose our statements: among these symbols we'll assume to be included, for simplicity, the symbol for exponentiation ("to the"). It is easy to see that B > 40: suppose we write it in the decimal system, and let β be the number of its digits (we can reasonably assume β = 2). Now consider the proposition
Il numero di posto B^B in R
References
[1] G. Boolos, A new proof of the Gödel incompleteness theorem, Notices AMS, 36 (1989), pp. 388-90.
[2] G. J. Chaitin, The Berry paradox, Complexity, 1, n. 1 (1995), pp. 26-30.
[3] G. J. Chaitin, The Unknowable, Springer, Singapore, 1999.
[4] B. Levi, Antinomie logiche?, Annali di Matematica Pura e Applicata, (III) (1908), XV, pp. 187-216, in [5, vol. 2, pp. 629-58].
[5] B. Levi, Opere 1897-1926, 2 vols., edited by Unione Matematica Italiana, Cremonese, Roma, 1999.
[6] A. Levy, Basic Set Theory, Springer, Berlin, 1979.
[7] G. Lolli, L'opera logica di Beppo Levi, in [5, vol. 1, pp. LXVII-LXXVI].
[8] G. H. Moore, Zermelo's Axiom of Choice, Springer, New York, 1982.
[9] B. Russell, The Principles of Mathematics I, Cambridge Univ. Press, Cambridge, 1903.
...three definitions of enumerations that one may wish to imagine substituted for that of n. 8. Richard's contradiction presents itself however one wishes to imagine R determined. [4, p. 638].
Chapter 10
Ω in Number Theory
abstract some of the details from these models and consider a programming
language in which the (partial) recursive functions are represented by finite
binary strings.1 These strings are just programs for a universal Turing machine (or universal lambda expression) and they take input in the form of
a binary string then output another binary string or diverge (fail to halt).
For convenience, we will often consider these inputs and outputs to encode
tuples of positive integers.
On top of this simplified picture of computation, we impose one restriction which is necessary for the development of algorithmic information
theory (and hence Ω). The set of strings that encode the recursive functions
must be prefix-free. This means that no program can be an extension of
another, and thus each program is said to be self-delimiting. As algorithmic
information theory is intricately linked with communication as well as computation, this is quite a natural constraintif you wish to use a permanent
binary communication channel, then you need to know when the end of a
message has been reached and this cannot be done if some messages are
extensions of others.
There are many prefix-free sets that one could choose and many recursive mappings between these and the recursive functions. These different choices of programming language lead to different values of Ω, but this does not matter much as almost all of its significant properties will remain the same regardless. However, to allow talk of Ω as a specific real number we will use the same language as Chaitin [3].
Now that we have explained what we mean by a programming language,
we can give a quick overview of computability in terms of programs. A program computes a set of n-tuples if, when provided with input ⟨x1, . . . , xn⟩, it returns 1 if this is a member of the set and 0 otherwise. A program
computes an infinite sequence if, when provided with input n, it returns
the value of the n-th element in the sequence. A program computes a real, r, if it computes a sequence of rationals {r_n} which converges to r and |r - r_n| < 1/2^n. These sets, sequences and reals that are computed by programs are said to be recursive.
There are also many sets, sequences and reals that cannot be computed,
but can be approximated in an important way. A program semi-computes a
set of n-tuples if, when provided with input hx1 , . . . , xn i, it returns 1 if this
is a member of the set and diverges otherwise. A program semi-computes
∃x1 . . . ∃xm  D(a1, . . . , an, x1, . . . , xm) = 0     (10.7)
where D is once again a polynomial, but now some of its variables are exponential functions of others. Davis, Putnam and Robinson [5] used this
additional flexibility to show that all r.e. sets are exponential Diophantine.
It had long been known that all exponential (and standard) Diophantine
sets are r.e. because it is trivial to write a program that searches for a solution to a given equation and halts if and only if it finds one. Therefore,
the new result meant that the exponential Diophantine sets were precisely
the r.e. sets.
In 1970, Yu. Matiyasevich [7] completed the final step, proving that all exponential Diophantine sets are also Diophantine and thus that the Diophantine sets are exactly the r.e. sets, a result now known as the DPRM Theorem.
χ1(k, N, x1, . . . , xm) = 0     (10.9)
that has solutions for given values of k and N if and only if P returns 1 when provided with these as input. For a given value of k, there are solutions for infinitely many values of N if and only if the k-th bit of Ω is 1.
Thus, by using a more subtle property of the family of Diophantine
equations, Chaitin was able to show that algorithmic randomness occurs
in number theory: as k is varied, there is simply no recursive pattern to
whether this family of equations has solutions for finitely or infinitely many
values of N .
By modifying Chaitin's method slightly, we can find a new way of expressing the bits of Ω through a family of Diophantine equations. We will present this method informally here, with complete details being found in [9] (see also [4]). Our result has now been extended by Matiyasevich [8].
Consider a new program, Q, that also takes inputs k and N, and begins to compute the sequence {Ωi}. For each value of Ωi, Q checks to see if it is greater than N/2^k, halting if this is so, and continuing through the sequence otherwise. Since {Ωi} approaches Ω from below, we can see that Ωi > N/2^k
implies that Ω > N/2^k and conversely, if Ω > N/2^k there must be some value of i such that Ωi > N/2^k. Therefore, Q will halt on k and N if and only if Ω > N/2^k. Alternatively, we could say that Q recursively enumerates the pairs ⟨k, N⟩ such that Ω > N/2^k.
Just as we could determine the k-th bit of Ω from the number of values of N that make P return 1, so we can determine it from the number of values of N for which Q halts. In what follows, we shall refer to these quantities as pk and qk respectively.
Unlike pk, qk is always finite. Indeed, an upper bound is easily found. Since Ω < 1, only values of k and N such that N/2^k < 1 can possibly be less than Ω and thus make Q halt. Since both k and N take only values from the positive integers, we also know that N/2^k > 0 and thus, for a given k, there are fewer than 2^k values of N for which Q halts and qk ∈ {0, 1, . . . , 2^k - 1}.
From the value of qk, it is quite easy to derive the first k bits of Ω. Firstly, note that qk is equal to the largest value of N such that N/2^k < Ω, unless there is no such N, in which case it equals 0. Either way, its value can be used to provide a very tight bound on the value of Ω: qk/2^k <= Ω < (qk + 1)/2^k. Since Ω is irrational, we can strengthen this to qk/2^k < Ω < (qk + 1)/2^k, which means that the first k bits of qk/2^k are exactly the first k bits of Ω.
This gives some nice results connecting qk and Ω. The first k bits of qk/2^k are just the bits of qk when written with enough leading zeros to make k digits in total. Thus qk, when written in this manner, provides the first k bits of Ω. Additionally, we can see that qk is odd if and only if the k-th bit of Ω is 1.
Now that we know the power and flexibility of qk , it is a simple matter
to follow Chaitin in bringing these results to number theory. The function
computed by Q is r.e. so, by the dprm Theorem, there must be a family of
Diophantine equations
2 (k, N, x1 , . . . , xm ) = 0
(10.10)
that has a solution for specified values of k and N if and only if Q halts
when given these values as inputs. Therefore, for a particular value of k,
this equation only has solutions for values of N between 0 and 2k 1 with
the number of solutions, qk , being odd if and only if the k-th bit of is 1.
This new family of Diophantine equations improves upon the original
one in a couple of ways. Whereas the first method expressed the bits of
in the fluctuations between a finite and infinite amount of values of N that
8:57
in Number Theory
book
171
give solutions, the second keeps this value finite and bounded, with the bits
of expressed through the more mundane property of parity. It is the fact
that this quantity is always finite that leads to many of the new features
of this family of Diophantine equations. pk is infinite when the k-th bit of
is 1 and, since there is only one way in which it can be infinite, it can
provide no more than this one bit of information. On the other hand, qk
can be odd (or even) in 2k1 ways, which is enough to give k 1 additional
bits of information, allowing the first k bits of to be determined.
The fact that qk is always finite also provides a direct reduction of the
problem of determining the bits of to Hilberts Tenth Problem. To find
the first k bits of , one need only determine for how many values of N
the new family of Diophantine equations has solutions. Since we know that
there can be no solutions for values of N greater than or equal to 2k , we
could determine the first k bits of from the solutions to 2k instances
of Hilberts Tenth Problem. In fact, we can lower this number by taking
advantage of the fact that if there is a solution for a given value of N then
there are solutions for all lower values. All we need is to find the highest
value of N for which there is a solution and we can do this with a bisection
search, requiring the solution of only k instances of Hilberts Tenth Problem.
Finally, the fact that qk is always finite allows the generalisation of these
results from binary to any other base, b. If we replace all above references
to 2k with bk we get a new program, Qb , with its associated family of Diophantine equations. For this family, the value of qk now gives us the first k
digits of the base b expansion of : it is simply the base b representation of
qk with enough leading zeroes to give k digits. The value of the k-th digit
of is simply qk mod b.
Chaitin [3] did not stop with his Diophantine representation of , but
instead moved to exponential Diophantine equations where his result could
be presented more clearly. He made this move to take advantage of the
theorem that all r.e. sets have singlefold exponential Diophantine representations, where a representation is singlefold if each equation in the family
has at most one solution.
We can denote the singlefold family of exponential Diophantine equations for the program P by
e1 (k, N, x1 , . . . , xm , 2x1 , . . . , 2xm ) = 0
(10.11)
For a given k, this equation will have exactly one solution for each of in-
8:57
172
book
finitely many values of N if the k-th bit of is 1 and exactly one solution
for each of finitely many values of N if the k-th bit of is 0. We can make
use of this to express the bits of through a more intuitive property.
If we treat N in this equation as an unknown instead of a parameter,
we get a new (very similar) family of exponential Diophantine equations
with only one parameter
e1 (k, x0 , x1 , . . . , xm , 2x1 , . . . , 2xm ) = 0
(10.12)
Since the previous family was singlefold and N has become another unknown, there will be exactly one solution to this single parameter family
for each value of N that gave a solution to the double parameter family.
Thus, (10.12) has infinitely many solutions if and only if the k-th bit of
is 1.
This same approach can be used with our method [9]. There is a twoparameter singlefold family of exponential Diophantine equations for Q and
this can be converted to a single parameter family of exponential Diophantine equations
e2 (k, x0 , x1 , . . . , xm , 2x1 , . . . , 2xm ) = 0
(10.13)
with between 0 and 2k 1 solutions, the quantity being odd if and only if
the k-th bit of is 1.
Finally, we have also shown [9] that both Chaitins finitude-based
method and our parity-based method can be used to generate polynomials
for . For a given family of Diophantine equations with two parameters,
D(k, N, x1 , . . . , xm ) = 0
(10.14)
(10.15)
Note that the parameter, N , is again treated as an unknown and thus denoted x0 .
If we restrict the values of the variables to positive integers then, for a
given k, this polynomial takes on exactly the set of all values of N for which
(10.14) has solutions. We can thus use this method on 1 = 0 and 2 = 0,
generating polynomials that express pk and qk in the number of distinct
positive integer values they take on for different values of k. We therefore
have a polynomial whose number of distinct positive integer values fluctuates from odd to even and back in an algorithmically random manner as a
8:57
in Number Theory
book
173
parameter k is increased.
Our result has now been further extended by Matiyasevich [8].
There are thus many ways in which algorithmic randomness is manifested in number theory. While finding whether solutions exists for certain equations is undecidable, finding the quantity of solutions or even just
whether this is finite or infinite, odd or even, is much harder. In the long
run, even the best computer program can do no better than chance.
References
[1] C.S. Calude. A characterization of c.e. random reals. Theoretical Computer
Science, 271:314, 2002.
[2] G.J. Chaitin. A theory of program size formally identical to information theory. Journal of the ACM, 22(3):329340, July 1975.
[3] G.J. Chaitin. Algorithmic Information Theory. Cambridge University Press,
Cambridge, 1987.
[4] G.J. Chaitin. Meta Math! The Quest for . Pantheon Books, New York,
2005.
[5] Martin Davis, Hilary Putnam, and Julia Robinson. The decision problem for
exponential Diophantine equations. Annals of Mathematics, Second Series,
74(3):425436, 1961.
[6] David Hilbert. Mathematical problems. lecture delivered before the International Congress of Mathematicians at Paris in 1990. Bulletin of the American
Mathematical Society, 8:437479, 1902.
[7] Yuri V. Matiyasevich. Hilberts Tenth Problem. MIT Press, Cambridge, 1993.
[8] Yuri V. Matiyasevich. Hilberts tenth problem and paradigms of computation.
In S.B. Cooper, B. L
owe, and L. Torenvliet, editors, New Computational
Paradigms, Proceedings of the First Conference on Computability in Europre,
pages 310321, Berlin Heidelberg, 2005. Springer-Verlag.
[9] T. Ord and T.D. Kieu. On the existence of a new family of Diophantine
equations for . Fundmenta Informaticae, 56:27384, 2003.
8:57
174
book
8:57
Chapter 11
The Secret Number. An Exposition of Chaitins Theory
11.1. Introduction
The purpose of this paper is to discuss formally some key issues dealing with
randomness, compressibility and undecidability. Our discussions lead also
to a comprehensive exposition of the properties of Chaitins secret number
and the remarkable connection with the undecidability of Diophantine
problems. Many of the subsequent arguments follow our exposition in [7].
Our presentation will be largely self-contained. The reader is assumed
to be familiar with the basics of formal languages, Turing machines and
recursively enumerable sets. For instance, [8] may be consulted in this respect.
A brief description about the contents of the paper follows. The main
result about exponential Diophantine representations will be given below
in this Introduction. Section 2 contains an informal discussion about randomness versus the compressibility of information. The informal discussion will continue in Section 10. Section 3 deals with Krafts inequality,
prefix-freeness and frequency indicators. The formal notion of a (Chaitin)
1 Corresponding
author
175
book
8:57
176
computer is presented in Section 4, where also the importance of being selfdelimited will become apparent. Section 5 shows the existence of universal
computers and, accordingly, the notion of (program-size) complexity is defined in the following Section. It is independent of the choice of the universal computer. Section 7 proves fundamental inequalities about program-size
complexity. Section 8 compares computational and information-theoretic
complexity and also discusses Kolmogorov complexity, where no prefixfreeness is required. Section 9 is about the construction of self-delimiting
programs. Section 11 is about the formal connection between probability
and complexity. It leads to the degree of randomness, Section 12. The formal apparatus suffices to present in Section 13 the remarkable properties of
, the halting probability of the universal computer. Resulting conclusions
about formal theories and Diophantine problems are presented in Section
14. The paper ends with some general remarks in Section 15.
We will end this Introduction with some well-known undecidability considerations needed for the main results in Section 14. A very good illustration about how everyday considerations in mathematics may lead to undecidability is Hilberts Tenth Problem. The problem, proposed by Hilbert
in a list of problems at the International Congress of Mathematicians in
1900, is to give an algorithm that will tell whether or not a polynomial
equation with integer coefficients has a solution in integers. Nothing was
known about undecidability in 1900 and, in view of Hilberts later attempts
to find general decision methods in logic, it remains questionable whether
he in 1900 really had in mind the possibility of a negative solution for the
Tenth Problem.
Thus, we consider equations of the form
(A)
P (x1 , . . . , xn ) = 0 ,
book
8:57
book
177
Definition 11.1. A set S of ordered n-tuples (a1 , . . . , an ), n 1, of nonnegative integers is termed Diophantine if and only if there are polynomials
P (a1 , . . . , an , x1 , . . . , xm ), Q(a1 , . . . , an , x1 , . . . , xm )
with nonnegative integer coefficients such that we have, for all nonnegative
integers a1 , . . . , an ,
(a1 , . . . , an ) S if and only if (x1 0) . . . (xm 0)
[P (a1 , . . . , an , x1 , . . . , xm ) = Q(a1 , . . . , an , x1 , . . . , xm )] .
Similarly, S is termed exponential Diophantine, briefly ED, if and only if
exponentiation is allowed in the formation of P and Q, that is, P and Q
are functions built up from a1 , . . . , an , x1 , . . . , xm and nonnegative integer
constants by the operations of addition A + B, multiplication AB and
exponentiation AB .
Some additional remarks are in order. For n > 1 we speak of Diophantine and exponential Diophantine relations. The terms unary and singlefold
are used in connection with exponential Diophantine representations. Here
singlefold means that, for any n-tuple (a1 , . . . , an ), there is at most one
m-tuple (x1 , . . . , xm ) satisfying the equation. Unary means that only oneplace exponentials, such as 2x , are used rather than two-place exponentials
y x . One of the easy observations due to number theory, [7], is that it is no
loss of generality to restrict the attention to unary representations.
In decidability theory one begins with an interesting set or relation,
such as the set of primes or the relation x is the greatest common divisor
of y and z, and looks for a Diophantine or an exponential Diophantine
equation representing it, in the sense of Definition 11.1.
Diophantine and exponential Diophantine sets and relations are recursively enumerable. Also the converses of these statements hold true. The
converse of the latter statement, often referred to as the Davis - Putnam Robinson Theorem can be stated as follows.
Theorem 11.2. Every recursively enumerable set and relation possesses a
singlefold exponential Diophantine representation.
The proof of Theorem 11.2 given in [7] is based on a sequence of constructions, each one showing that a set or relation is exponential Diophantine. The exponential Diophantine characterization is obtained via register
8:57
178
book
8:57
book
179
8:57
180
to tell which method is preferable for dealing with sequences such as those
in Construction 1.
There may be other ways to compress information than detected patterns. There is no pattern visible in tables of trigonometric functions. Even
tables of a modest size give rise to a rather long sequence of bits if everything is expressed as a single sequence. However, a much more compact way
to convey the same information is to provide instructions for calculating the
tables from the underlying trigonometric formulas. Such a description is
brief and, moreover, can be used to generate tables of any size.
Usually no such compression method can be devised for tables presenting empirical or historical facts. For instance, there are books presenting
the results of the gold medal winners in each event in each of the Olympic
Games since 1896. As regards such information, the amount for compression is negligible, especially if attention is restricted to the least significant
digits. Since the results tend to improve, there are regularities in the most
significant digits, even to the extent that predictions can be made for the
next Olympic Games. In general, as regards empirical data, compression
can be linked with inductive inference and inductive reasoning as it is employed in science: observations presented as sequences of bits are to be
explained and new ones are to be predicted by theories. For our purposes
in this paper, it is useful to view a theory as a computer program to reproduce the observations actually made. The scientist searches for minimal
programs, and then the amount of compression can be measured by comparing the size of the program with the size of the data. This leads also to
a plausible definition concerning what it means that the data are random:
no compression is possible. In other words, the most concise way of representing the data is just to list them.
Randomness will be a central theme in our considerations. We want
to mention one important aspect already at this point. There are various
statistical tests for disclosing deviations from randomness. We consider here
potentially infinite sequences of bits or digits, such as the decimal expansion
of . A sequence is normal if and only if each bit or digit, as well as each
block of any length of bits or digits, occurs with equal asymptotic frequency.
Clearly, if a sequence does not qualify as normal, then it is not intuitively
viewed as a random sequence. On the other hand, normality does not
guarantee randomness, no matter whether we view randomness intuitively
book
8:57
book
181
8:57
182
book
8:57
book
183
()
r
X
tj F I(Li ) = F I(Li )
j=1
r
X
tj
j=1
1 1 1
+ + + ... = 1 .
2 4 8
8:57
184
The equation holds also for some finite languages, for instance,
F I({a, b}) =
1 1
1 1 1
+ = F I({a, ba, bb}) = + +
2 2
2 4 4
1 1 1 1
+ + + =1.
4 4 4 4
All of these languages are maximal with respect to prefix-freeness: if a new
word over {a, b} is added to the language then the prefix-freeness is lost.
This follows either by a direct argument or by Theorem 11.4. Observe also
that the converse of Theorem 11.4 fails: the inequality F I(L) 1 does not
imply that L is prefix-free.
= F I({aa, ab, ba, bb}) =
book
8:57
book
185
8:57
186
book
0 1 1 0
0 1 1 0
6
6
initial
...
?
1 0
halt
...
...
?
0
0 ...
It can now be shown that this concrete machine model computes exactly
the partial recursive functions specified in the abstract definition of a computer given before Construction 2. Indeed, the partial recursive functions
computed by the machine model must satisfy the additional condition of
prefix-freeness. This follows because the machine is allowed neither to run
off the right end of the program tape nor to leave some part of the program
unread. Thus, C(u, v) and C(uu1 , v), u1 6= , can never both be defined.
Observe, however, that C(u, v) and C(uu1 , v 0 ) can both be defined.
Conversely, we show how a concrete machine C can simulate an abstract
computer C 0 . The basic idea is that C moves the reading head on the program tape only when it is sure that it should do so.
Given a program u and an input v, C first ignores u and starts generating on its work tape, just as an ordinary Turing machine, the recursively
enumerable set X = {x|C 0 (x, v) is defined}. This is done by dovetailing
through computations for all x. Denote by u1 the prefix of u already read;
initially u1 = . (C keeps u1 on its work tape.) All the time C keeps
checking whether or not u1 is a prefix of some element of X already generated. If it finds an x such that u1 = x, C goes to the halt state, after
first producing the output C 0 (u1 , v) on its work tape. If C finds an x such
that u1 is a proper prefix of x, then it reads the next symbol a from the
program tape and starts comparisons with u1 a.
This works. If C 0 (u, v) is defined, C will eventually find u. It then
compares step by step the contents of the program tape with u, and halts
with the output C 0 (u, v) if the program tape contains u. If the program tape
contains something else, C does not reach a correct halting configuration.
This is also the case if C 0 (u, v) is not defined. Observe that in this case
C 0 (u1 , v) may be defined, for some prefix u1 of u. Then C finds u1 and
8:57
book
187
goes to the halt state but the halting configuration is not the correct one:
a portion of the program tape remains unread.
11.5. Universal computer
We now continue to develop the abstract notions.
Definition 11.6. A computer U is universal if and only if, for every computer C, there is a simulation constant sim(C) such that, whenever C(u, v)
is defined, there is a program u0 (that is, a word over the binary alphabet
V = {0, 1}) such that
U (u0 , v) = C(u, v) and |u0 | |u| + sim(C) .
The following result shows that our definition is not vacuous.
Theorem 11.7. There is a universal computer.
Proof. Consider an enumeration of all computers, that is, of all tables
defining a computer: C0 , C1 , . . . . Clearly, the partial function F : N
V V V defined by
F (i, u, v) = Ci (u, v), i N ; u, v V ,
is partial recursive, and so is the partial function U : V V V defined
by
U (0i 1u, v) = Ci (u, v) .
Moreover, for each v, the domain of the projection Uv is prefix-free.
This follows because all projections of each Ci possess the required property of prefix-freeness. Consequently, U is a universal computer with
sim(Ci ) = i + 1, for each i.
Thinking of the machine model, after reading i 0s from the program
tape, U has on its work tape a description of Ci (part from the input v that
it has already at the beginning). When U meets the first 1 on the program
tape, it starts to simulate the computer Ci whose description it has at that
moment on its work tape. (Alternatively, U can store only i on its work
tape and, after seeing 1 on the program tape, compute Ci from i.) This
concludes the proof of Theorem 11.7.
Universal computers are by no means unique. A different U results, for
instance, from a different enumeration of the computers Ci . However, from
8:57
188
book
8:57
book
189
C, equals
HC (w) = min{|u| | u V and C(u, ) = w} .
Again, min is undefined if the set involved is empty. We denote briefly
H(w) = HU (w).
Thus, H(w) is the length of the minimal program for U to compute
w when started with the empty work tape. We can view the ratio between H(w) and |w| as the compressibility of w. We want to emphasize
that, intuitively, H(w) should be understood as the information-theoretic
(or program-size) complexity of w, as opposed to the computational complexity of w. This is also why we have chosen the notation H, standard in
information theory in entropy considerations. H(w) could also be referred
to as the algorithmic entropy.
We define next the conditional complexity of w, given t V .
Definition 11.11. For a computer C, the conditional complexity of a word
w V with respect to a word t V equals
HC (w/t) = min{|u| | u V and C(u, t ) = w} .
We denote briefly HU (w/t) = H(w/t), and speak of the conditional
complexity of w with respect to t. It is immediate by Theorem 11.9 that
H(w) and H(w/t) are defined for all w and t.
The complexities defined depend on the computer C. This is true also
as regards H(w) and H(w/t), because they depend on our chosen universal
computer. The next theorem asserts that words possess also an inherent
complexity, independent of the computer. This holds both for plain and
conditional complexity. Although easy to prove, this result is of fundamental importance for the whole theory. The result is often referred to as the
Invariance Theorem. It says that the universal computer is asymptotically
optimal. It does not say much about the complexity of an individual word
w, because the constant involved may be huge with respect to the length
of w.
Theorem 11.12. For every computer C, there is a constant AC such that
H(w) HC (w) + AC and H(w/t) HC (w/t) + AC ,
for all w and t.
8:57
190
for all w. The same result holds for the absolute value of the difference of
the conditional complexities as well. Observe, however, that an analogous
result does not hold for arbitrary computers C and C 0 .
The equation H(w) = |w | is an immediate consequence of the definition of w and Theorem 11.9.
Sometimes it is convenient to use the 0-notation for the order of magnitude, defined for functions on real numbers. By definition, f (x) = 0(g(x))
if and only if there are positive constants a and x0 such that
|f (x)| a|g(x)|
holds for all x x0 . Thus, the above inequality () can be written as
HU (w) = HU 0 (w) + 0(1) .
Let (x, y) be a pairing function, that is, a recursive bijection of V V
onto V . Such a pairing function is obtained, for instance, from the ordering
of V using a pairing function for nonnegative integers. We now present
the latter in detail. Consider the function (i, x) defined by the table
0 1 2 3 4 5 ...
0 0 1 3 6 10 15 . . .
1 2 4 7 11 . . .
2 5 8 12 . . .
.
3 9 13 ..
.
4 14 ..
.. ..
. .
book
8:57
book
191
Thus, values are plotted to the table in the increasing order of magnitude
along the diagonals. The function can be defined also by the quadratic
expression
(i, x) =
1
(i + x + 1)(i + x) + 1 .
2
8:57
192
H(w/w) = 0(1).
H(H(w)/w) = 0(1).
H(w) H(w, t) + 0(1).
H(w/t) H(w) + 0(1).
H(w, t) H(w) + H(t/w) + 0(1).
book
8:57
book
193
8:57
194
C(w u, ) = (w, t) .
Let us see how (v) follows from this claim. Indeed, by definition, HC (w, t)
is the length of the shortest program for C to compute (w, t) from the
empty word. Since, by (), w u is such a program, we obtain
HC (w, t) |w u| = |w | + |u| = H(w) + H(t/w) .
From this (v) follows because always H(w, t) HC (w, t) + sim(C).
It remains to verify the claim that there is such a computer C. We
follow the abstract definition. Let C(x, y) be the following partial recursive
function. For y 6= , C(x, y) is undefined. Let Y be the domain of U , that
is,
Y = {u V |U (u, ) is defined} .
The following effective procedure is now used to compute the value C(x, ).
Elements of the recursively enumerable set Y are generated until, if ever,
a prefix v of x is found. Then we write x in the form x = vu and simulate
the computations U (u, v) and U (v, ). If both of these computations halt,
we output
C(x, ) = (U (v, ), U (u, v)) .
Obviously C is partial recursive. We show that C satisfies the required
condition (), whenever u, w and t satisfy (). Denote x = w u, and
consider our algorithm for computing C(x, ). Since U (w , ) = w, we
conclude that w is in Y and, consequently, will eventually be found as a
prefix v of x, yielding the factorization x = vu = w u. Moreover, it is not
possible that another prefix v 0 of x would be found in this fashion, because
then one of the words v and v 0 would be a prefix of the other, where both
of the words v and v 0 are in Y . However, this would contradict our basic
assumption concerning computers: the domain of U is prefix-free.
book
8:57
book
195
IC (t : w) = HC (w) HC (w/t) ,
I(t : w) = IU (t : w) .
8:57
196
One would expect that the information contained in w about w is approximately the same as the information contained in w, and that the information contained in about w, and vice versa, amounts to nothing. Also
the other formal statements presented in the next theorem are plausible
from the intuitive point of view.
Theorem 11.16.
(i)
(ii)
(iii)
(iv)
(v)
I(t : w) 0(1).
I(t : w) H(t) + H(w) H(t, w) + 0(1).
I(w : w) = H(w) + 0(1).
I(w : ) = 0(1).
I( : w) = 0(1).
book
8:57
book
197
Construction 3. Many problems concerning regular languages, although decidable, are known to be hard from the point of view of computational complexity. For instance, it has been shown that for some such
problems no polynomial space bound exists. We use this fact to show that
sometimes the descriptional complexity is as low as it can be, whereas
the computational complexity is high.
Consider regular expressions over the alphabet {a, b}. Thus, we use
Boolean operations, catenation and star to the atoms a, b, . Let i ,
i = 0, 1, . . ., be an ordering of such regular expressions. For instance, we
may order the regular expressions first according to length and then alphabetically to get the order i . Let us call an index i saturated if and only if
the regular language denoted by i equals {a, b} , that is, its complement
is empty. A real number
r = .a0 a1 a2 . . .
in binary notation is now defined by the condition: ai = 1 if and only if i
is saturated.
Clearly, r > 0 because some indices i are saturated. On the other
hand, in our ordering that takes the length into account, the first saturated
index appears quite late, because at the beginning only languages with a
nonempty complement appear. We consider also the language consisting of
all prefixes of the binary expansion of r:
Lr = {w {0, 1}+ |w = a0 a1 . . . ai , for some i 0} .
It is obvious that there is a constant A such that the descriptional complexity of words in Lr is bounded from above by A, provided the length of
the word is given. This follows because the algorithm for computing bits
of the expansion of r, that is, for testing the emptiness of certain regular
languages can be carried out by a fixed computer. Such a computer can
be one of the computers C defined above. Then C starts with an empty
program tape and with the index i written in binary notation on its work
tape, and halts with the output a0 a1 . . . ai . Since the binary expansion of
i contains, roughly, log i bits we obtain the result H(w) log |w| + A for
all w in Lr . The same result is obtained also if r is the decimal or binary
expansion of . This can be viewed as the lowest possible descriptional
complexity: the same algorithm produces arbitrarily long prefixes of an
infinite sequence. This is a property shared by all computable (recursive)
sequences. The estimate H(w) log |w| + A can be further improved by
replacing |w| = i + 1 with its descriptional complexity. This yields the
8:57
198
book
8:57
book
199
8:57
200
book
8:57
w
01
101001
115
book
201
|w| M L(w)
SD(w)
10
1001
100101
110 101001 101001101001
1111 10101011 101010117
Consider now an arbitrary word x over {0, 1}. To present x in the form
x = M L(w)w, provided this is at all possible, it suffices to find the first bit
1 in x that occurs in an even-numbered position, counted from left to right.
The longer the word w is, the less is the contribution of M L(w) to the
length of SD(w). Asymptotically, we have
|SD(w)| = |w| + 2 log|w| .
Thus, the length increases only by an additive logarithmic term in the
transition from a word to its self-delimiting presentation. Such a difference
by a logarithmic term is often present when comparisons are made between
the Kolmogorov complexity and the self-delimiting Kolmogorov complexity
(Chaitin complexity). A typical example is Theorem 11.14 (vi). Because
K is not self-delimited, a logarithmic term is needed, but it is not needed
for H because it is already taken care of in the definitions.
To summarize, one can distinguish in the study of descriptional complexity the self-delimiting version (Chaitin complexity) and the non-selfdelimiting version (Kolmogorov complexity). As regards conditional complexity, there is a further possibility for a different definition, depending on
whether a word t itself, or the shortest program t for it is considered. (We
made the latter choice in our definition of complexity.) Different variations
in the basic definitions result in somewhat different theories but it is beyond
the scope of this contribution to investigate this matter in detail.
For our purposes, the given definition of complexity, using computers C
and the resulting prefix-freeness, is the most suitable. Our main purpose
is the discussion concerning the secret number , a very compact way of
encoding the halting problem (or any of the other undecidable problems,
[5]). In this discussion, probabilities play a central role: we will use the
Kraft inequality (Theorem 11.4). The self-delimiting version satisfies the
essential requirement for prefix-freeness.
8:57
202
book
8:57
book
203
Let us elaborate the latter claim. The claim is that, although most
words are random, we will never be able to explicitly exhibit a long word
that is demonstrably random. The reason is that the axioms and rules of
inference in any formal system T can be described in a certain number,
say n, of bits. The system T cannot be used to prove the randomness of
any word w much longer than n because, otherwise, a contradiction would
arise. If |w| = m is much larger than n and the randomness of w could
be proven within T , we could construct a computer C to check through
the proofs of T , until the correct one is found. The description of C takes
roughly n bits and, thus, we get a program much shorter than m describing
w, a contradiction. To put it very simply, we cannot construct a program p
to print a word w with |w| > |p| unprintable by programs q with |q| < |w|.
Chaitin has expressed the matter by saying that if one has ten pounds of
axioms and a twenty-pound theorem, then the theorem cannot be derived
from the axioms.
Thus, we cannot know a random number but we can still know of a
random number. In particular, we can know of the secret number : it
is the probability that the universal computer U halts when it is started
with an empty work tape. (Thus U is started with a pair (u, ), where the
program u is arbitrary.) Before going into formal details, we still present
another possibility to encode the halting problem.
Construction 5. We consider now ordinary Turing machines because
self-delimitedness is not important here. Let T M0 , T M1 , T M2 , . . . be a
numbering of all Turing machines, and define a number A by its binary
expansion
A = .a0 a1 a2 . . . ,
where, for all i, ai = 1 if and only if T Mi halts with the empty tape. We
already pointed out in Construction 3 above that computable sequences are
never random. However, A is clearly noncomputable and, thus, could be
random as far as this matter is concerned. But A is not random. A gambler
is able to make an infinite profit by using some infinite subclass of Turing
machines with a decidable halting problem. A formal argument concerning
the compressibility of A can be based on the following observation. Consider
any prefix of A of length n. Suppose we know the number m of 1s in this
prefix. Then we know also the prefix itself, because we can dovetail the
computations of the first n Turing machines, until have found m of them
8:57
204
that have halted. Eventually, this will happen. Thus, information about
the prefix of length n can be compressed to information about n and m.
11.11. Probability and complexity
We first present the formal definition of . We consider the binary alphabet V = {0, 1}. We use the definitions of a computer C and a universal
computer U given above. Moreover, exactly as was done above, a fixed
universal computer U will be considered all the time. Also the optimal
program t for t is defined as before.
We are now ready for the fundamental definition.
Definition 11.19. the probability of a word w V with respect to a
computer C equals
X
PC (w) =
2|u| .
uV
C(u,)=w
In this definition, instead of probabilities, we could speak also of algorithmic probabilities or information-theoretic probabilities. The justification
for the terminology and the interconnection with the classical probability
theory will be discussed below. We prove first some formal results.
book
8:57
book
205
Theorem 11.20. The following inequalities hold for all words w and t over
V and for all computers C.
(i)
(ii)
(iii)
(iv)
0 PC (w) 1,
0 PC (w/t) 1,
P
0 xV PC (x) 1,
P
0 xV PC (x/t) 1.
8:57
206
Proof. The inequalities 0 < P (w) and 0 < P (w/t) follow by Theorem
11.21 because H(w) and H(w/t) are always defined (see Theorem 11.9).
By Theorem 11.20, (iii),
X
P (x) 1 ,
xV
and each term in the sum is greater than 0, we must have P (w) < 1. The
inequality P (w/t) < 1 follows similarly by Theorem 11.20, (iv). Consequently, Theorem 11.22 follows.
11.12. Degree of randomness
We begin this section by establishing some upper bounds concerning the
cardinalities of certain sets defined in terms of H and P . Intuitively, if
randomness is understood as information-theoretic incompressibility, then
almost everything is fairly random.
Theorem 11.23. For all computers C, words t and integers m, n 1, we
have
(i)
(ii)
(iii)
(iv)
card{w
card{w
card{w
card{w
Proof. HC (w) is the length of the shortest u, if any, such that C(u, ) =
w. Each u gives rise to at most one w, and there are no more than 2m 1
possible us, since this is the total number of words shorter than m. Hence,
(i) follows. The proof for (ii) is similar. Arguing indirectly, we see that if
(iii) does not hold, then
1=
n m
m m
PC (w) 1 .
wV
Here the last inequality follows by Theorem 11.20, (iii), and the strict inequality holds because we have strictly increased every element in a sum
and possibly added new elements. The contradiction 1 < 1 shows that (iii)
holds. Again, the proof for (iv) is similar, and Theorem 11.23 follows.
book
8:57
book
207
8:57
208
the same random sequences as the definition given above, see [Li Vi] and
[Cal].
11.13. Magic bits: properties of the secret number
We now discuss the properties of , the halting probability of the universal
computer, the secret number. Taking C = U in Theorem 11.20, (iii),
we obtain first 0 1. Theorem 11.22 shows that the first inequality
is strict. That also the second inequality is strict is a consequence of the
fact that U cannot halt for all programs used in the proof of the Kraft
inequality, Theorem 11.4. Thus, we have
0<<1.
Let
= .b1 b2 b3 . . .
be the binary expansion of . Ambiguities are avoided by choosing the
non-terminating expansion whenever two expansions are possible. (We do
not want to exclude the possibility of being rational!) Informally, we
refer to the bits bi , i 1, in the expansion of as magic bits. Thus, is
a real number. We consider also the infinite sequence
B = b1 b2 b2 . . .
of letters of {0, 1}, as well as its prefix of length i,
Bi = b1 . . . bi ,
i1.
book
8:57
book
209
be determined for finitely many parameter values but never for infinitely
many, let alone all, of them.
The domain of U , that is, the set
DOM (U ) = {u V |U (u, ) is defined }
is recursively enumerable. We consider some fixed enumeration of it, where
repetitions do not occur. Such an enumeration is obtained by a method
customarily referred to as dovetailing. We have already hinted at this
method before: you order the steps in different computations in one sequence, as in the definition of the pairing function.
Thus, we obtain a total recursive injection g of the set of positive integers
into V . We define, for n 1,
n =
n
X
2|g(j)| .
j=1
bj 2j .
j=i+1
To prove the second sentence, assume that we know Bi . Hence, we are able
to compute i . We now compute the numbers j until we have found an n
such that n > i . By the properties of j , this is always possible because
we know that such an n eventually comes up.
Let u1 be a word over V of length i1 i. We claim that U (u1 , ) is
defined if and only if u1 is one of the words g(1), . . . , g(n). The if-part
of the claim is clear. To prove the only if -part, we assume the contrary:
u1 = g(m), where m > n. We obtain a contradiction by the following chain
of inequalities
> m n + 2i1 n + 2i > i + 2i .
8:57
210
Indeed, every word belonging to the left side belongs to the right side and,
conversely, if H(w) i then w has a program of length at most i.
Consider now the partial recursive function f from V into V , defined
as follows. Given x = x1 . . . xt , xj V , find the smallest m, if any, such
book
8:57
book
211
that
m >
t
X
xj 2j .
j=1
If such an m is found, f (x) is the first word (in lexicographical order) not
belonging to the set
{g(j)|1 j m} .
Let C be the computer defined by
C(x, ) = f (U (x, )) .
We now consider an arbitrary prefix Bi and infer
H(f (Bi )) HC (f (Bi )) + sim(C)
= min{|u| | C(u, ) = f (Bi )} + sim(C)
= min{|u| | f (U (u, )) = f (Bi )} + sim(C)
min{|u| | U (u, ) = Bi } + sim(C)
= H(Bi ) + sim(C) .
By the equation (A), H(f (Bi )) > i. Hence,
i sim(C) < H(Bi )
for all i. This means that B is random, by Definition 11.24 with A =
sim(C).
It is clear by the discussion after Theorem 11.25 that an upper bound
n is inherent in every formal theory, such that no prefix Bi with i > n
can be produced according to the theory. Theorem 11.26 shows that such
an upper bound concerns also the total number of bits of that can be
produced according to the theory.
8:57
212
P (i, x1 , . . . , xm ) = Q(i, x1 , . . . , xm )
book
8:57
()
book
213
P (i, n, x2 , . . . , xm ) = Q(i, n, x2 , . . . , xm ).
8:57
214
References
[1] C. Calude, Information and Randomness. An Algorithmic Perspective. Second Edition. Springer-Verlag, Berlin, Heidelberg, New York (2002).
[2] G. Chaitin, A theory of program size formally identical to information
theory. Journal of the Association for Computing Machinery 22 (1975) 329
340.
book
8:57
book
215
8:57
216
book
8:57
Chapter 12
The Complexity of the Set of Nonrandom Numbers
Frank Stephan
Department of Mathematics and School of Computing
National University of Singapore, 2 Science Drive 2, Singapore 117543;
[email protected]
Let C and H denote the plain and prefix-free description complexity,
respectively. Then the sets NRC of nonrandom numbers with respect to
C has neither a maximal nor an r-maximal superset. The set of NRH of
nonrandom numbers with respect to H has an r-maximal but no maximal
superset. Thus the lattices of recursively enumerable supersets (modulo
finite sets) of NRC and NRH are not isomorphic. Further investigations
deal with the related set NRW of numbers x with a stronger nonrandomness property: x 6= max{We } for any e < x where W0 , W1 , . . . is
derived from the underlying acceptable numbering of partial-recursive
functions. Friedman originally asked whether NRW T K for every underlying acceptable numbering and Davie provided a positive answer for
many underlying acceptable numberings. Later Teutsch asked whether
the set NRW can be r.e. or co-r.e.; as an answer to this question it is
shown that in the case that the underlying numbering is a Kolmogorov
numbering, NRW is not n-r.e. for any n. If one uses any acceptable
numbering instead of a Kolmogorov numbering, then the underlying
numbering can be chosen such that NRW is a co-2-r.e. set; but it cannot
be a 2-r.e. set for any acceptable numbering.
12.1. Introduction
Let C and H denote the plain and prefix-free description complexity, respectively. Furthermore, one can identify the numbers in the set In =
{2n 1, 2n , 2n + 1, . . . , 2n+1 2} with the binary strings of length n; a number x corresponds to a string iff x + 1 is the binary value of the string 1.
For having an easier connection to other fields of recursion theory, natural
numbers are used from now on. For any number x, the n with x In is
called the length of x, written |x|. The plain description complexity C is
217
book
8:57
218
Frank Stephan
defined as
C(x) = min{|p| : U (p) = x}
where U is a universal function. That is, the value C based on U satisfies
the following two conditions:
(1) The range of U is the set of all natural numbers, so that C(x) is defined
for all x.
(2) For every further unary partial-recursive function V there is a constant
c with C(V (p)) |p| + c for all p in the domain of V .
The other variant H is based on prefix-free machines. Chaitin [13] and
Levin [11] laid the foundations and showed the significance of this alternative approach which developed together with the original notion C to the
two best accepted concepts in description complexity. A prefix-free machine satisfies that all different strings p, q in its domain are incomparable
with respect to the string-prefix-relation; alternatively, one can also use the
Kraft-Chaitin-Theorem [13, 11] and say that a machine U is equivalent to
a prefix-free one iff
X
2|p| 1.
pdom(U )
A prefix-free machine U is universal iff the two conditions above hold where
in Condition 2 only prefix-free machines V are considered. Then
H(x) = min{|p| : U (p) = x}
where U is a prefix-free partial-recursive unary function which is universal
for the class of all prefix-free partial-recursive unary functions. Now the
two sets in question are defined as
NRC = {x : C(x) < |x|};
NRH = {x : H(x) < |x|}.
They are called the sets of nonrandom numbers with respect to C and
H or the sets of compressible strings with respect to C and H. These
sets are standard examples of simple sets, which do not arise through a
complicated construction but are just given as natural. Furthermore, they
are wtt-complete but not btt-complete. Kummer [9] showed that NRC is
tt-complete (for any given universal machine), but according to Muchnik
and Peretselski [14] the tt-completeness of NRH depends on the choice of
the universal machine. The existence of universal machines for which NRH
book
8:57
book
219
8:57
220
Frank Stephan
book
8:57
book
221
8:57
222
book
Frank Stephan
8:57
book
223
8:57
224
Frank Stephan
coinfinite by the preceding two paragraphs. For each e define the r.e. set
Ve = {n : Ln B We }.
Now let i, j be indices of r.e. sets such that Wj is the complement of Wi .
By the first part of the proof, all numbers n > i + j are in Vi Vj . As M is
maximal, M Ve for some e {i, j}. Then A We . Therefore, A and
A NRH are r-maximal.
The r-maximal set A NRH constructed has by Theorem 12.2 no hyperhypersimple superset. Thus it has no maximal superset. Such type of
r-maximal sets had been constructed by Robinson [18] in 1967 and by Lachlan [10] in 1968.
Corollary 12.1. There is an r-maximal set without a maximal superset.
A maximal set A has the property that the structure LA of its r.e. superset
modulo finite sets is just the 2-element Boolean Algebra. Hyperhypersimple sets are characterised as those sets where this structure is a Boolean
Algebra. Theorems 12.1 and 12.2 show that the two sets of nonrandom
numbers have a different superset structure.
Corollary 12.2. Given a set E, let (LE , ) be the partially ordered set
of the r.e. supersets of E modulo finite sets with the ordering induced from
set-inclusion. Then the structures (LNRC , ) and (LNRH , ) are not
isomorphic.
12.4. The Problems of Friedman and Teutsch
A numbering of partial recursive functions is called a Kolmogorov numbering iff for every further numbering of partial-recursive functions there
are constants c, d such that, for all x, every x equals some y with
y < cx + d. This translated of course to the domains Wx of x : for every
further numbering A0 , A1 , A2 , . . . of any r.e. sets there are constants c, d
such that for every x there is an y cx + d with Wy = Ax . Furthermore,
if W0 , W1 , W2 , . . . is derived from an acceptable numbering of partial recursive functions, then there is for every numbering A0 , A1 , A2 , . . . of any
r.e. sets a recursive function f such that Wf (x) = Ax for all indices x. Let
NRW denote the set
NRW = {x : y < x [x = max(Wy {0})]}
book
8:57
book
225
for a given fixed Kolmogorov numbering of the partial recursive functions. Friedman [6] asked whether NRW T K for every underlying acceptable numbering ; certainly it is easy to construct some for which is
true. Teutsch [20] added the question whether NRW can be r.e. or co-r.e.
for some acceptable numbering. The general case is indeed the more difficult one and therefore only Teutschs question can be answered for all
acceptable numberings.
In the following, a set is 2-r.e. iff it is the difference of two r.e. sets, it
is 3-r.e. iff it is the difference of an r.e. set minus a 2-r.e. set and so on.
Alternatively, one can say that a set A is n-r.e. iff there is an approximation
As of A such that A0 = and there are for every n at most n indices with
s with As+1 (x) 6= As (x). Recall the default approximation A0 , A1 , A2 , . . .
is that
x As y < x [{x} Wy,s {0, 1, . . . , s} {0, 1, . . . , x}].
This approximation makes for each x up to 2x mind changes; so it witnesses
that NRW is -r.e. but it does not witness that NRW is n-r.e. for any natural
number n. So the main question is whether NRW can be n-r.e. for some n.
The answer is affirmative if the underlying numbering is not requested to
be a Kolmogorov numbering.
Theorem 12.3. There is an acceptable numbering such that NRW is the
complement of some 2-r.e. set.
Proof. Fix the recursive partition J0 , {x0 }, J1 , {x1 }, J2 , {x2 } of the natural numbers satisfying |Jk | = 2k + 2 for all k; this is the partition given by
J0 = {0, 1, 2}, x0 = 3, J1 = {4, 5, 6, 7}, x1 = 8, J2 = {9, 10, 11, 12, 13, 14}
and so on. Furthermore, let for every k the number
yk = min{x Jk : z Jk [z x H(z) < k]}.
Note that the intervals Jk are chosen so large that every interval Jk contains
at least three numbers z with H(z) k and yk is just the maximum of all
these z. Furthermore, the yk are uniformly approximable from above as the
formula of their definition shows, let yk,s be the value of yk before stage s.
Now define As (x) = 1 iff there is a k with x Jk and s > 0 and x = yk,s . It
is easy to verify that this approximation witnesses that A is 2-r.e.: At the
beginning, As (x) = 0; if yk,s has come down and reached x then As (x) = 1;
if yk,s has gone below x then As (x) = 0 again.
The proof is completed by defining an acceptable enumeration
W0 , W1 , . . . such that NRW based on this numbering is a finite variant
8:57
226
Frank Stephan
book
8:57
book
227
8:57
228
book
Frank Stephan
y = max(Bx,s );
As (z) = NRW(z) for all z {0, 1, . . . , 3cy}; in particular, y As and
|{0, 1, . . . , 3yc} As | 3y.
for every u < x, there is an element larger than 3cy of the infinite set
Bu enumerated, that is, max(Bu,s ) > 3cy.
Now it would follow that Bx,s+1 = Bx,s {y + 1} by the construction of
the set Bx in contradiction to the assumption that y = max(Bx ). Hence
Bx has to be infinite.
Given n, let m be so large that every x 2cn + n + 1 satisfies the
condition max(Bx,0 ) < m. For these x, let sx be the first stage where
m+1 Bx,sx +1 . As max(Bx,sx+1 ) > 3mc and |{0, 1, . . . , 3mc}Asx | 3m,
there is for any y {m, m+1, . . . , 3mc} a stage t {sx , sx +1, . . . , sx+1 1}
with y = max(Bx,t ) and y < max(Bx,t+1 ), hence y At for that t. Hence
there at least 2m numbers y {m, m + 1, . . . , 3cm} for which there is a
t {sx , sx + 1, . . . , xx+1 1} with At (y) = 0 < At+1 (y) = 1. As this
applies to x = 0, 1, . . . , 2cn + n, there are 2m(2cn + n) pairs (y, t) with
y {m, m+1, . . . , 3cm} and At (y) < At+1 (y). So there is an number y with
At (y) < At+1 (y) for at least 2n stages t as |{m, m + 1, . . . , 3cm}| = 2cm + 1
and
2mn(2c + 1)
2m(2cn + n)
= 2n.
2mc + 1
(2c + 1)m
This contradicts to NRW being an n-r.e. set.
The last result of this paper gives a short alternative proof for Davies
first result that whenever NRW is based on a Kolmogorov numbering then
NRW T K. As NRW is -r.e., obviously NRW T K and this direction
is not included in the proof. After his first result, Davie corresponded
with Solovay who indicated to him how to generalize his first result such
that it answers Friedmans question for every usual underlying acceptable
numbering where usual means that there exists a polynomial p such that
for every recursively enumerable family A0 , A1 , . . . of r.e. sets and every x
there is an y < p(x) with Wy = Ax . The proof for Davies first result given
here uses Kummers Cardinality Theorem [7, 8], but it should be noted
that also Owings preliminary results [17] are sufficient to prove this result
as the sets En in the proof below are uniformly recursive relative to NRW.
Theorem 12.6. If the underlying numbering is a Kolmogorov numbering
then the set NRW is Turing complete; that is, NRW T K.
8:57
book
229
Proof. Note that the constant c from Remark 12.2 satisfies for every x
the condition {x, x+1, . . . , cx} 6 NRW. Now let J0 , J1 , . . . be a partitioning
of the natural numbers into intervals satisfying min(Jn )c < max(Jn ) for all
natural numbers n. Furthermore, let D0 , D1 , D2 , . . . be a canonical indexing
of all finite sets of natural numbers with D0 = . Define
Bm = {y : y m (|Dn | + 1) + |K Dn | for the n with m Jn }
and note that the Bm are uniformly r.e. finite sets. Now choose a constant
k such that for all n with |Dn | k and all m Jn , every set Bm has an
index u such that Wu = Bm and u < mk m (|Dn | + 1) + |K Dn |.
The constant exists since W0 , W1 , W2 , . . . is a Kolmogorov numbering of r.e.
sets. Now one can define the following sets in a way that they are uniformly
r.e. relative to NRW:
En = {u {0, 1, . . . , k} : m Jn [ |Dn | = k mk + u NRW]}.
By choice, En contains |K Dn | whenever |Dn | = k. Furthermore,
c(k + 1) min(Jn ) < (k + 1) max(Jn ), thus there is for every n some
u {0, 1, . . . , k} not contained in En . Now applying Kummers Cardinality Theorem [7, 8] to the cardinality function #K
k gives K T NRW.
Acknowledgments: The author wants to thank Martin Kummer for useful discussions and detailed comments. The author also thanks George
Davie, Carl Jockusch, Marcus Schaefer and Jason Teutsch for correspondence; Jason Teutsch permitted him to see an incomplete version of his
thesis including the open problems there.
References
[1] Gregory J. Chaitin. A theory of program size formally identical to information theory. Journal of the Association for Computing Machinery, 22:329
340, 1975.
[2] Gregory J. Chaitin. Information-theoretic characterizations of recursive infinite strings. Theoretical Computer Science, 2:4548, 1976.
[3] Gregory J. Chaitin. Algorithmic information theory, IBM Journal of Research and Development, 21:350359+496, 1977.
[4] George Davie. Foundations of Mathematics Recursion Theory Question.
http://cs.nyu.edu/pipermail/fom/2002-May/005535.html, This posting
answers Friedmans Question [6].
[5] Rodney G. Downey and Denis R. Hirschfeldt. Algorithmic Randomness and
Complexity. Manuscript, 2007.
8:57
230
Frank Stephan
book
8:57
Chapter 13
Omega and the Time Evolution of the n-Body Problem
Karl Svozil
Institut f
ur Theoretische Physik, University of Technology Vienna,
Wiedner Hauptstrae 8-10/136, A-1040 Vienna, Austria;
[email protected]
The series solution of the behavior of a finite number of physical bodies
and Chaitins Omega number share quasi-algorithmic expressions; yet
both lack a computable radius of convergence.
book
8:57
232
Karl Svozil
be able to predict with certainty the state of the Universe for any later time.
But [[ . . . ]] it can be the case that small differences in the initial values produce great differences in the later phenomena; a small error in the former
may result in a large error in the latter. The prediction becomes impossible
and we have a random phenomenon.
In what follows we present an even more radical departure from Laplacian determinism. A physical system of a finite number of bodies capable
of universal computation will be presented which has the property that
certain propositions remain not only provable intractable, but provable unknowable. Pointedly stated, our knowledge of any such system remains incomplete forever. For the sake of making things worse, we shall compress
and compactify this kind of physical incompleteness by considering physical observables which are truly random, i.e., algorithmically incompressible
and stochastic.
The methods of construction of physical nbody observables exhibiting
the above features turn out to be rather humble and straightforward. In
a first step, it suffices to reduce the problem to the halting problem for
universal computation. This can be achieved by embedding a universal
computer into a suitable physical system of a finite number of bodies. The
associated ballistic computation will be presented in the next section. In a
second reduction step, the universal computer will be directed to attempt to
compute Chaitins Omega number, which is provable random, and which
is among the most difficult tasks imaginable. Finally, consequences for
the series solutions [36] to the general n-body problem will be discussed.
13.2. Reduction by ballistic computation
In order to embed reversible universal computation into a quasi-physical
environment, Fredkin and Toffoli introduced a billiard ball model [7
10] based on the collisions of spheres as well as on mirrors reflecting the
spheres. Thus collisions and reflections are the basic ingredients for building
universal computation.
If we restrict ourselves to classical gravitational potentials without collisions, we do not have any repulsive interaction at our disposal; only attractive 1/r potentials. Thus the kinematics corresponding to reflections and
collisions has to be realized by purely attractive interactions. Fig. 13.1a)
depicts a Fredkin gate realized by attractive interaction which corresponds
to the analogue billiard ball configuration achieved by collisions (e.g., [8,
Fig. 4.5]). At points A and B and time ti , two bodies are either put at both
book
8:57
book
233
AB
AB
A? W
tf
ti
z
W
W
W
B?
AB
a)
AB
A
A
b)
Figure 13.1. Elements of universal ballistic computation realized by attractive 1/r potentials. a) Fredkins gate can perform logical reversibility: bodies will appear on the
right outgoing paths if and only if bodies came in at both A and B; b) Reflective mirror
element realized by a quasi-steady mass.
8:57
234
Karl Svozil
book
8:57
book
235
References
[1] P.-S. Laplace, Philosophical Essay on Probabilities. Translated from the fifth French edition of 1825. (Springer, Berlin, New York, 1995, 1998). ISBN 978-0-387-94349-7. URL http://www.archive.org/details/philosophicaless00lapliala.
[2] H. Poincaré, Wissenschaft und Hypothese. (Teubner, Leipzig, 1914).
[3] K. E. Sundman, Mémoire sur le problème de trois corps, Acta Mathematica. 36, 105–179, (1912).
[4] Q. D. Wang, The global solution of the n-body problem, Celestial Mechanics. 50, 73–88, (1991). doi: 10.1007/BF00048987. URL http://dx.doi.org/10.1007/BF00048987.
[5] F. Diacu, The solution of the n-body problem, The Mathematical Intelligencer. 18(3), 66–70, (1996).
[6] Q. D. Wang, Power series solutions and integral manifold of the n-body problem, Regular & Chaotic Dynamics. 6(4), 433–442, (2001). doi: 10.1070/RD2001v006n04ABEH000187. URL http://dx.doi.org/10.1070/RD2001v006n04ABEH000187.
[7] E. Fredkin and T. Toffoli, Conservative logic, International Journal of Theoretical Physics. 21(3-4), 219–253, (1982). doi: 10.1007/BF01857727. URL http://dx.doi.org/10.1007/BF01857727. Reprinted in [24, Part I, Chapter 3].
[8] N. Margolus, Physics-like model of computation, Physica. D10, 81–95, (1984). Reprinted in [24, Part I, Chapter 4].
[9] N. Margolus. Universal cellular automata based on the collisions of soft spheres. In ed. A. Adamatzky, Collision-based computing, pp. 107–134. Springer, London, (2002). URL http://people.csail.mit.edu/nhm/cca.pdf.
[10] A. Adamatzky. New media for collision-based computing. In ed. A. Adamatzky, Collision-based computing, pp. 411–442. Springer, London, (2002). URL http://people.csail.mit.edu/nhm/cca.pdf.
[11] E. M. Gold, Language identification in the limit, Information and Control. 10, 447–474, (1967). doi: 10.1016/S0019-9958(67)91165-5. URL http://dx.doi.org/10.1016/S0019-9958(67)91165-5.
[12] L. Blum and M. Blum, Toward a mathematical theory of inductive inference, Information and Control. 28(2), 125–155 (June, 1975).
[13] D. Angluin and C. H. Smith, A survey of inductive inference: Theory and methods, Computing Surveys. 15, 237–269, (1983).
[14] L. M. Adleman and M. Blum, Inductive inference and unsolvability, The Journal of Symbolic Logic. 56, 891–900 (Sept., 1991). doi: 10.2307/2275058. URL http://dx.doi.org/10.2307/2275058.
[15] M. Li and P. M. B. Vitányi, Inductive reasoning and Kolmogorov complexity, Journal of Computer and System Science. 44, 343–384, (1992). doi: 10.1016/0022-0000(92)90026-F. URL http://dx.doi.org/10.1016/0022-0000(92)90026-F.
[16] H. Rogers, Jr., Theory of Recursive Functions and Effective Computability.
Chapter 14
Binary Lambda Calculus and Combinatory Logic
John Tromp
CWI, Kruislaan 413, 1098 SJ Amsterdam, Netherlands;
[email protected]
In the first part, we introduce binary representations of both lambda calculus and combinatory logic terms, and demonstrate their simplicity by
providing very compact parser-interpreters for these binary languages.
Along the way we also present new results on list representations, bracket
abstraction, and fixpoint combinators. In the second part we review Algorithmic Information Theory, for which these interpreters provide a
convenient vehicle. We demonstrate this with several concrete upper
bounds on program-size complexity, including an elegant self-delimiting
code for binary strings.
14.1. Introduction
The ability to represent programs as data and to map such data back to
programs (known as reification and reflection [9]), is of both practical use
in metaprogramming [14] as well as theoretical use in computability and
logic [17]. It comes as no surprise that the pure lambda calculus, which
represents both programs and data as functions, is well equipped to offer
these features. Kleene [7] was the first to propose an encoding of lambda terms, mapping them to Gödel numbers, which can in turn be represented as so-called Church numerals. Decoding such numbers is somewhat cumbersome, and not particularly efficient. In search of simpler constructions, various alternative encodings have been proposed using higher-order abstract syntax [8] combined with the standard lambda representation of signatures [11]. A particularly simple encoding was proposed by Mogensen [22], for which the term λm.m (λx.x) (λx.x) acts as a self-interpreter. The prevalent data format, both in information theory and in practice, however, is
where M [x := N ] denotes the result of substituting N for all free occurrences of x in M (taking care to avoid variable capture by renaming bound
variables in M if necessary). For example,
(λx λy.y x) y ≡ (λx.(λz.z x)) y ≡ (λx λz.z x) y = λz.z y.
A term with no β-redex, that is, no subterm of the form (λx.M) N, is said
to be in normal form. Terms may be viewed as denoting computations of
which β-reductions form the steps, and which may halt with a normal form
as the end result.
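As a side illustration (a sketch of mine, not the chapter's code), capture-avoiding substitution is easily made concrete in Haskell with named variables; the chapter itself will soon switch to de Bruijn indices, which avoid the renaming altogether:

import Data.List (union, delete)

data Term = Var String | Lam String Term | App Term Term deriving Show

freeVars :: Term -> [String]
freeVars (Var x)   = [x]
freeVars (Lam x m) = delete x (freeVars m)
freeVars (App m n) = freeVars m `union` freeVars n

-- subst m x n computes m[x := n], renaming a bound variable whenever it would
-- capture a free variable of n.
subst :: Term -> String -> Term -> Term
subst (Var y) x n
  | y == x    = n
  | otherwise = Var y
subst (App p q) x n = App (subst p x n) (subst q x n)
subst (Lam y m) x n
  | y == x                 = Lam y m                 -- x is shadowed; nothing to do
  | y `notElem` freeVars n = Lam y (subst m x n)
  | otherwise              = Lam y' (subst (subst m y (Var y')) x n)
  where y' = head [v | v <- iterate (++ "'") (y ++ "'"),
                       v `notElem` (freeVars n `union` freeVars m)]

-- One beta step at the root: (\x.M) N reduces to M[x := N].
betaStep :: Term -> Maybe Term
betaStep (App (Lam x m) n) = Just (subst m x n)
betaStep _                 = Nothing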
14.2.1. Some useful lambda terms
Define (for any M, P, Q, . . . , R)
I ≡ λx.x
true ≡ λx λy.x
nil ≡ false ≡ λx λy.y
⟨P, Q, . . . , R⟩ ≡ λz.z P Q . . . R
M[0] ≡ M true
M[i + 1] ≡ (M false)[i]
Y ≡ λf.((λx.x x)(λx.f (x x)))
Ω ≡ (λx.x x)(λx.x x)
Note that
true P Q ≡ (λx λy.x) P Q = x[x := P] = P
false P Q ≡ (λx λy.y) P Q = y[y := Q] = Q,
justifying the use of these terms as representing the booleans.
A pair of terms like P and Q is represented by ⟨P, Q⟩, which allows one
to retrieve its parts by applying ⟨true⟩ or ⟨false⟩:
⟨true⟩ ⟨P, Q⟩ = ⟨P, Q⟩ true = true P Q = P
⟨false⟩ ⟨P, Q⟩ = ⟨P, Q⟩ false = false P Q = Q.
Repeated pairing is the standard way of representing a sequence of
terms:
⟨P, ⟨Q, ⟨R, . . .⟩⟩⟩.
A sequence is thus represented by pairing its first element with its tail, the
sequence of remaining elements. The ith element of a sequence M may be
selected as M[i]. To wit:
⟨P, Q⟩[0] ≡ ⟨P, Q⟩ true = true P Q = P,
⟨P, Q⟩[i + 1] ≡ (⟨P, Q⟩ false)[i] = Q[i].
The empty sequence, for lack of a first element, cannot be represented by
any pairing, and is instead represented by nil. A finite sequence P, Q, . . . , R
can thus be represented as ⟨P, ⟨Q, ⟨. . . , ⟨R, nil⟩ . . .⟩⟩⟩.
Our choice of nil allows for the processing of a possibly empty list s
with the expression
s M N,
which for s ≡ nil reduces to N, and for s ≡ ⟨P, Q⟩ reduces to M P Q N.
In contrast, Barendregt [13] chose I to represent the empty list, which
requires a much more complicated list processing expression like
s (λa λb λc.c a b) M X N, which for s ≡ nil reduces to N M X, and for
s ≡ ⟨P, Q⟩ reduces to M P Q X N.
Y is the fixpoint operator, satisfying
Y f = (λx.f (x x))(λx.f (x x)) = f (Y f).
This allows one to transform a recursive definition f = . . . f . . . into f =
Y (λf.(. . . f . . .)), which behaves exactly as desired.
Ω is the prime example of a term with no normal form, the equivalent
of an infinite loop.
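As an aside (not from the chapter), Haskell offers the same idiom directly: a function fix satisfying fix f = f (fix f) plays the role of Y, so a recursive definition becomes the fixpoint of a non-recursive functional:

fix :: (a -> a) -> a
fix f = f (fix f)   -- satisfies fix f = f (fix f), just like Y f = f (Y f)

-- A recursive definition  len xs = ... len ...  becomes the fixpoint of a functional:
len :: [a] -> Int
len = fix (\self xs -> case xs of { [] -> 0; (_:rest) -> 1 + self rest })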
14.2.2. Binary strings
Binary strings are naturally represented by boolean sequences, where true
represents 0 and false represents 1.
Definition 14.1. For a binary string s and lambda term M , (s : M )
denotes the list of booleans corresponding to s, terminated with M . Thus,
(s : nil) is the standard representation of string s.
For example, (011 : nil) ≡ ⟨true, ⟨false, ⟨false, nil⟩⟩⟩ represents the string
011. We represent an unterminated string, such as part of an input stream,
as an open term (s : z), where the free variable z represents the remainder
of input.
The interpreter works in continuation passing style [15]. Given a continuation and a bitstream containing an encoded term, it returns the continuation applied to the abstracted decoded term and the remainder of the
stream. The reason for the abstraction becomes evident in the proof.
The theorem is a special case of a stronger one that applies to arbitrary
de Bruijn terms. Consider a de Bruijn term M in which an index n occurs
Proof.
We take
E ≡ Y (λe λc λs.s (λa λt.t (λb.a E0 E1)))
E0 ≡ e (λx.b (c (λz λy.x ⟨y, z⟩)) (e (λy.c (λz.x z (y z)))))
E1 ≡ b (c (λz.z b)) (λs.e (λx.c (λz.x (z b))) t)
the continuation applied to the decoded term provided with the new bindings. In case b = false (application), it calls E recursively again, extracting
another decoded term y, and returns the continuation applied to the application of the decoded terms provided with shared bindings.
E1 , in case b = true, decodes to the 0 binding selector. In case b =
false, it calls E recursively on t (coding for an index one less) to extract a
binding selector x, which is provided with the tail z b of the binding list to
obtain the correct selector.
We continue with the formal proof, using induction on M .
Consider first the case where M = 0. Then
E C (M̂ : N) = E C (10 : N)
= ⟨false, ⟨true, N⟩⟩ (λa λt.t (λb.a E0 E1))
= ⟨true, N⟩ (λb.false E0 E1)
= (E1 N)[b := true]
= C (λz.z true) N,
as required. Next consider the case where M = n + 1. Then, by induction,
E C (M̂ : N) = E C (1^{n+2} 0 : N)
= ⟨false, ⟨false, (1^n 0 : N)⟩⟩ (λa λt.t (λb.a E0 E1))
= (λs.e (λx.C (λz.x (z false))) (1^{n+1} 0 : N)) (1^n 0 : N)
= E (λx.C (λz.x (z false))) (n̂ : N)
= (λx.C (λz.x (z false))) (λz.n_{z[]}) N
= C (λz.n_{(z false)[]}) N
= C (λz.(z false)[n]) N
= C (λz.z[n + 1]) N
= C (λz.(n + 1)_{z[]}) N,
as required. Next consider the case M = λM′. Then, by induction and
claim 14.1,
E C ((λM′)̂ : N) = E C (00 M̂′ : N)
= ⟨true, ⟨true, (M̂′ : N)⟩⟩ (λa λt.t (λb.a E0 E1))
= e (λx.(C (λz λy.x ⟨y, z⟩))) (M̂′ : N)
= (λx.(C (λz λy.x ⟨y, z⟩))) (λz.M′_{z[]}) N
= C (λz λy.M′_{⟨y,z⟩[]}) N
= C (λz.(λM′)_{z[]}) N,
as required. Finally, consider the case M = M′ M″. Then, by induction,
E C ((M′ M″)̂ : N) = E C (01 M̂′ M̂″ : N)
= ⟨true, ⟨false, (M̂′ M̂″ : N)⟩⟩ (λa λt.t (λb.a E0 E1))
= e (λx.(e (λy.C (λz.x z (y z))))) (M̂′ M̂″ : N)
= (λx.(e (λy.C (λz.x z (y z))))) (λz.M′_{z[]}) (M̂″ : N)
= e (λy.C (λz.(λz.M′_{z[]}) z (y z))) (M̂″ : N)
= (λy.C (λz.M′_{z[]} (y z))) (λz.M″_{z[]}) N
= C (λz.M′_{z[]} M″_{z[]}) N
= C (λz.(M′ M″)_{z[]}) N,
as required. This completes the proof of Theorem 14.1.
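The binary encoding used throughout the proof can be read off its four cases (an index n is coded as 1^{n+1}0, an abstraction as 00 followed by the code of its body, an application as 01 followed by the two codes). As an aside, here is that encoder in a few lines of Haskell (my sketch, not the chapter's):

data DB = Idx Int | Abs DB | App DB DB   -- de Bruijn terms

encode :: DB -> String
encode (Idx n)   = replicate (n + 1) '1' ++ "0"
encode (Abs m)   = "00" ++ encode m
encode (App m n) = "01" ++ encode m ++ encode n

-- encode (Abs (Idx 0)) == "0010": the 4-bit code of I = \x.x, the constant
-- behind the bound KS(x) <= l(x) + 4 in Section 14.5.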
(x not free in M)
(x ∉ M)
λx.(M N) ≡ S (λx.M) (λx.N)
14.3.1. Binary Combinatory Logic
Combinators have a wonderfully simple encoding as binary strings: encode
S as 00, K as 01, and application as 1.
Definition 14.3. We define the encoding C̃ of a combinator C as
S̃ ≡ 00
K̃ ≡ 01
(C D)~ ≡ 1 C̃ D̃
Again we call |C̃| the size of combinator C.
For instance, the combinator S(KSS) ≡ (S((KS)S)) is encoded as
10011010000. The size of a combinator with n K/Ss, which necessarily
has n − 1 applications, is thus 2n + n − 1 = 3n − 1.
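As a small aside (a sketch of mine, not part of the chapter), this code and a matching decoder fit in a few lines of Haskell; the decoder mirrors the structure of the interpreter F given below:

data CL = S | K | CL :@ CL deriving Show   -- :@ denotes application

encodeCL :: CL -> String
encodeCL S        = "00"
encodeCL K        = "01"
encodeCL (c :@ d) = "1" ++ encodeCL c ++ encodeCL d

-- Returns the parsed combinator together with the unread rest of the stream.
decodeCL :: String -> Maybe (CL, String)
decodeCL ('0':'0':rest) = Just (S, rest)
decodeCL ('0':'1':rest) = Just (K, rest)
decodeCL ('1':rest)     = do (c, rest')  <- decodeCL rest
                             (d, rest'') <- decodeCL rest'
                             Just (c :@ d, rest'')
decodeCL _              = Nothing

-- encodeCL (S :@ ((K :@ S) :@ S)) == "10011010000", the 11-bit example above.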
For such a simple language we expect a similarly simple interpreter.
Theorem 14.3. There is a cross-interpreter F of size 124, such that for
every combinator M and terms C, N we have
F C (M̃ : N) = C M N
Proof.
We take
F ≡ Y (λe λc λs.s (λa.a F0 F1))
F0 ≡ λt.t (λb.c (b S K))
F1 ≡ e (λx.e (λy.c (x y)))
of size 131 and note that a toplevel beta reduction saves 7 bits in size.
Given a continuation c and sequence s, it extracts the leading bit a
of s, and from the tail t the next bit b, and selects F0 to deal with a =
true (S or K), or F1 to deal with a = false (application). Verification is
straightforward and left as an exercise to the reader.
We conjecture F to be the smallest interpreter for any binary representation of CL. The next section considers translations of F which yield a
self-interpreter of CL.
(x ∉ M)
whenever possible. Now the size of F as a combinator is only 281, just over
half as big.
Turner [23] noticed that repeated use of bracket abstraction can lead to
a quadratic expansion on terms such as
X ≡ λa λb . . . λz.(a b . . . z) (a b . . . z),
of applicability:
λ²x.(S K M) ≡ S K (for all M)
λ²x.M ≡ K M (x ∉ M)
λ²x.x ≡ I
λ²x.(M x) ≡ M (x ∉ M)
λ²x.(x M x) ≡ λ²x.(S S K x M)
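As an illustrative aside (a sketch of mine, not the chapter's code), the listed rules translate directly into a small Haskell function on combinator terms extended with variables; the general rule λx.(M N) ≡ S (λx.M) (λx.N) shown earlier serves as the fallback case:

infixl 9 :$

data Expr = V String | Sc | Kc | Ic | Expr :$ Expr deriving (Eq, Show)

occurs :: String -> Expr -> Bool
occurs x (V y)    = x == y
occurs x (a :$ b) = occurs x a || occurs x b
occurs _ _        = False

-- abstract x m applies the special cases listed above in order, then falls back
-- on \x.(M N) = S (\x.M) (\x.N).
abstract :: String -> Expr -> Expr
abstract _ (Sc :$ Kc :$ _)                         = Sc :$ Kc   -- \x.(S K M) = S K
abstract x m | not (occurs x m)                    = Kc :$ m    -- \x.M = K M  (x not in M)
abstract x (V y) | x == y                          = Ic         -- \x.x = I
abstract x (m :$ V y) | x == y && not (occurs x m) = m          -- \x.(M x) = M (x not in M)
abstract x (V y :$ m :$ V y') | x == y && x == y'  = abstract x (Sc :$ Sc :$ Kc :$ V x :$ m)
abstract x (m :$ n)                                = Sc :$ abstract x m :$ abstract x n
abstract _ m                                       = m          -- remaining cases cannot occur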
Turing machines are an obvious choice, but turn out to be less than
ideal: The operating logic of a Turing machine, its finite control, is of an
irregular nature, having no straightforward encoding into a bitstring. This
makes construction of a universal Turing machine that has to parse and
interpret a finite control description quite challenging. Roger Penrose takes
up this challenge in his book [1], at the end of Chapter 2, resulting in a
universal Turing machine whose own encoding is an impressive 5495 bits in
size, over 26 times that of E.
The ominously named language Brainfuck, which advertises itself as
"An Eight-Instruction Turing-Complete Programming Language" [21], can
be considered a streamlined form of Turing machine. Indeed, Oleg Mazonka and Daniel B. Cristofani [16] managed to write a very clever BF self-interpreter of only 423 instructions, which translates to 423 · log(8) = 1269
bits (the alphabet used is actually ASCII at 7 or 8 bits per symbol, but
the interpreter could be redesigned to use 3-bit symbols and an alternative
program delimiter).
In [5], Levin stresses the importance of a (descriptional complexity)
measure, which, when compared with other natural measures, yields small
constants, of at most a few hundred bits. His approach is based on constructive objects (c.o.s) which are functions from and to lower ranked c.o.s.
Levin stops short of exhibiting a specific universal computer though, and
the abstract, almost topological, nature of algorithms in the model complicates a study of the constants achievable.
Gregory Chaitin [2] paraphrases John McCarthy about his invention of
LISP, as "This is a better universal Turing machine. Let's do recursive
function theory that way!" Later, Chaitin continues with "So I've done
that using LISP because LISP is simple enough, LISP is in the intersection
between theoretical and practical programming. Lambda calculus is even
simpler and more elegant than LISP, but it's unusable. Pure lambda calculus with combinators S and K, it's beautifully elegant, but you can't really
run programs that way, they're too slow."
There is however nothing intrinsic to λ-calculus or CL that is slow; only
such choices as Church numerals for arithmetic can be said to be slow, but
one is free to do arithmetic in binary rather than in unary. Frandsen and
Sturtivant [10] amply demonstrate the efficiency of λ-calculus with a linear
time implementation of k-tree Turing Machines. Clear semantics should be
a primary concern, and Lisp is somewhat lacking in this regard [4]. This
paper thus develops the approach suggested but discarded by Chaitin.
14.4.2. Monadic IO
The reason for preserving the remainder of input in the prefix case is
to facilitate the processing of concatenated descriptions, in the style of
monadic IO [19]. Although a pure functional language like λ-calculus cannot
define functions with side effects, as traditionally used to implement IO, it
can express an abstract data type representing IO actions; the IO monad.
In general, a monad consists of a type constructor and two functions, return
and bind (also written >>= in infix notation) which need to satisfy certain
axioms [19]. IO actions can be seen as functions operating on the whole
state of the world, and returning a new state of the world. Type restrictions
ensure that IO actions can be combined only through the bind function,
which according to the axioms, enforces a sequential composition in which
the world is single-threaded. Thus, the state of the world is never duplicated
or lost. In our case, the world of the universal machine consists of only the
input stream. The only IO primitive needed is readBit, which maps the
world onto a pair of the bit read and the new world. But a list is exactly
that; a pair of the first element and the remainder. So readBit is simply
the identity function! The return function, applied to some x, should map
the world onto the pair of x and the unchanged world, so it is defined by
return x ≡ λy.⟨x, y⟩. Finally, the bind function, given an action x and a
function f, should subject the world y to action x (producing some ⟨a, y′⟩)
followed by action f a, which is defined by bind x f ≡ λy.x y f (note that
⟨a, y′⟩ f = f a y′). One may readily verify that these definitions satisfy the
monad axioms. Thus, we can write programs for U either by processing the
input stream explicitly, or by writing the program in monadic style. The
latter can be done in the pure functional language Haskell [20], which is
essentially typed lambda calculus with a lot of syntactic sugar.
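To make the stream-as-world reading concrete, here is a small Haskell sketch of such a monad (the names StreamIO and runStreamIO are mine, not the chapter's); the world is simply the list of input bits still to be read:

newtype StreamIO a = StreamIO { runStreamIO :: [Bool] -> (a, [Bool]) }

instance Functor StreamIO where
  fmap f (StreamIO g) = StreamIO (\w -> let (a, w') = g w in (f a, w'))

instance Applicative StreamIO where
  pure x = StreamIO (\w -> (x, w))                 -- return: pair x with the unchanged world
  StreamIO f <*> StreamIO g = StreamIO (\w ->
    let (h, w') = f w; (a, w'') = g w' in (h a, w''))

instance Monad StreamIO where
  StreamIO g >>= f = StreamIO (\w ->               -- bind: thread the world through
    let (a, w') = g w in runStreamIO (f a) w')

-- readBit maps the world onto the pair of the bit read and the new world; on a
-- list this is just splitting off the head, i.e. essentially the identity on the
-- pair structure.
readBit :: StreamIO Bool
readBit = StreamIO (\(b : rest) -> (b, rest))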
∑_{i=0}^{l−1} (x_i + 1) 2^i = ∑_{i=0}^{l−1} 2^i + ∑_{i=0}^{l−1} x_i 2^i = 2^l − 1 + X.
0‾ = 0
(n + 1)‾ = 1 l(n)‾ n.
Figure 14.2 shows the codes as segments of the unit interval, where code
x covers all the real numbers whose binary expansion starts as 0.x.
14.5. Upper bounds on complexity
Having provided concrete definitions of all key ingredients of algorithmic
information theory, it is time to prove some concrete results about the
complexity of strings.
The simple complexity of a string is upper bounded by its length:
KS(x) ≤ |Î| + l(x) = l(x) + 4.
The prefix complexity of a string is upper bounded by the length of its
delimited version:
KP(x) ≤ |d̂elimit| + l(x̄) = l(x̄) + 413,
where delimit is a straightforward translation of the following Haskell
code into λ-calculus:

delimit = do bit <- readBit
             if bit then return []
                    else do len <- delimit
                            n <- readbits len
                            return (inc n)
  where
    readbits []  = return []
    readbits len = do bit <- readBit
                      x <- readbits (dec len)
                      return (bit:x)

    dec [True]       = []
    dec (True:rest)  = False:(dec rest)
    dec (False:rest) = True:rest

    inc []           = [True]
    inc (True:rest)  = False:rest
    inc (False:rest) = True:(inc rest)
The do notation is syntactic sugar for the binding operator >>=, as exemplified by the following de-sugared version of readbits:
readbits len = readBit >>= (\bit ->
readbits (dec len) >>= (\x ->
return (bit:x)))
The prefix complexity of a pair is upper bounded by the sum of individual prefix complexities, one of which is conditional on the shortest program
of the other:
K(x, y) ≤ K(x) + K(y|x*) + 1876.
This is the easy side of the fundamental Symmetry of information
theorem K(x) − K(x|y*) = K(y) − K(y|x*) + O(1), which says that y
contains as much information about x as x does about y.
Chaitin [3] proves the same theorem using a resource bounded evaluator,
which in his version of LISP comes as a primitive called try. His proof is
embodied in the program gamma:
(((lambda (loop) (((lambda (x*) (((lambda (x) (((lambda (y) (cons x (cons y nil))))
    (eval (cons ((read-exp)) (cons (cons (cons x* nil)) nil))))))
    (car (cdr (try no-time-limit ((eval (read-exp))) x*))))))
    (loop nil))))
  ((lambda (p) (if (= success (car (try no-time-limit ((eval (read-exp))) p)))
    p (loop (append p (cons (read-bit) nil)))))))
of length 2872 bits.
We constructed an equivalent of try from scratch. The constant
1876 is the size of the term pairup defined below, containing a symbolic
lambda calculus normal form reducer (which due to space restrictions is
only sparsely commented):
-- identity
I = \x x
-- bool x y represents if bool then x else y
true = \x\y x
false = \x\y y
-- allows for list processing as: list (\head\tail\x case-non-nil) case-nil
nil = false
-- unary number representation
zero = false
one = \s s zero
succ = \n\s s n
pred = \n n I
-- binary Lambda Calculus interpreter
intL = \cont\list list (\bit0\list1 list1 (\bit1 bit0
(intL (\exp bit1 (cont (\args\arg exp (\z z arg args)))
(intL (\exp2 cont (\args exp args (exp2 args))))))
(bit1 (cont (\args args bit1))
(\list2 intL (\var cont (\args var (args bit1))) list1))))
-- binary Lambda Calculus universal machine allowing open programs
uniL = intL (\x x x)
readvar = \cont\list list (\bit0 bit0
(cont (\suff \z z bit0 suff)nil )
(readvar (\pref\v cont (\suff \z z bit0 (pref suff)) (\s s v))))
-- binary Lambda Calculus parser
readc = \cont\list list (\bit0 bit0
(\list1 list1 (\bit1 readc (\pref1\exp1 bit1
(cont (\suff \z z bit0 (\z z bit1 (pref1 suff))) (\l\a\v l exp1))
(readc (\pr2\exp2 cont (\suff \z z bit0 (\z z bit1 (pref1 (pr2 suff))))
(\l\a\v a exp1 exp2) )))))
(readvar (\pref\var cont (\suff \z z bit0 (pref suff)) (\l\a\v v var))))
2104 bits, at the cost of introducing yet another primitive into his language.
Our program is 996 bits shorter than his first, and 228 bits shorter than
his second.
14.6. Future Research
It would be nice to have an objective measure of the simplicity and expressiveness of a universal machine. Sizes of constants in fundamental theorems are an indication, but one that is all too easily abused. Perhaps
diophantine equations can serve as a non-arbitrary language into which to
express the computations underlying a proposed definition of algorithmic
complexity, as Chaitin has demonstrated for relating the existence of infinitely many solutions to the random halting probability Ω. Speaking of Ω,
our model provides a well-defined notion of halting as well, namely when
U(p : z) = ⟨M, z⟩ for any term M (we might as well allow M without
normal form). Computing upper and lower bounds on the value of Ω, as
Chaitin did for his LISP-based Ω, and Calude et al. for various other languages, should be of interest as well. A big task remains in finding a good
constant for the other direction of the Symmetry of Information theorem,
for which Chaitin has sketched a program. That constant is bigger by an
order of magnitude, making its optimization an everlasting challenge.
14.7. Conclusion
The λ-calculus is a surprisingly versatile and concise language, in which not
only standard programming constructs like bits, tests, recursion, pairs and
lists, but also reflection, reification, and marshalling are readily defined,
offering an elegant concrete foundation of algorithmic information theory.
An implementation of Lambda Calculus, Combinatory Logic, along
with their binary and universal versions, written in Haskell, is available
at Tromp's website [24].
Acknowledgements
I am greatly indebted to Paul Vitányi for fostering my research into concrete
definitions of Kolmogorov complexity, and to Robert Solovay for illuminating discussions on my definitions and on the above symbolic reduction engine in particular, which not only revealed a bug but led me to significant
further reductions in size.
References
[1] R. Penrose, The Emperor's New Mind, Oxford University Press, 1989.
[2] G. Chaitin, An Invitation to Algorithmic Information Theory, DMTCS'96 Proceedings, Springer Verlag, Singapore, 1997, pp. 1–23 (http://www.cs.auckland.ac.nz/CDMTCS/chaitin/inv.html).
[3] G. Chaitin, Exploring Randomness, Springer Verlag, 2001 (http://www.cs.auckland.ac.nz/CDMTCS/chaitin/ait3.html).
[4] R. Muller, M-LISP: A representation-independent dialect of LISP with reduction semantics, ACM Transactions on Programming Languages and Systems 14(4), 589–616, 1992.
[5] L. Levin, On a Concrete Method of Assigning Complexity Measures, Doklady Akademii nauk SSSR, vol. 18(3), pp. 727–731, 1977.
[6] M. Li and P. Vitányi, An Introduction to Kolmogorov Complexity and Its Applications, Graduate Texts in Computer Science, second edition, Springer-Verlag, New York, 1997.
[7] S.C. Kleene, Lambda-Definability and Recursiveness, Duke Mathematical Journal, 2, 340–353, 1936.
[8] Frank Pfenning and Conal Elliot, Higher-Order Abstract Syntax, ACM SIGPLAN'88 Conference on Programming Language Design and Implementation, 199–208, 1988.
[9] D. Friedman and M. Wand, Reification: Reflection without Metaphysics, Proc. ACM Symposium on LISP and Functional Programming, 348–355, 1984.
[10] Gudmund S. Frandsen and Carl Sturtivant, What is an Efficient Implementation of the λ-calculus?, Proc. ACM Conference on Functional Programming and Computer Architecture (J. Hughes, ed.), LNCS 523, 289–312, 1991.
[11] J. Steensgaard-Madsen, Typed representation of objects by functions, TOPLAS 11-1, 67–89, 1989.
[12] N.G. de Bruijn, Lambda calculus notation with nameless dummies, a tool for automatic formula manipulation, Indagationes Mathematicae 34, 381–392, 1972.
[13] H.P. Barendregt, Discriminating coded lambda terms, in (A. Anderson and M. Zeleny, eds.) Logic, Meaning and Computation, Kluwer, 275–285, 2001.
[14] Francois-Nicola Demers and Jacques Malenfant, Reflection in logic, functional and object-oriented programming: a Short Comparative Study, Proc. IJCAI Workshop on Reflection and Metalevel Architectures and their Applications in AI, 29–38, 1995.
[15] Daniel P. Friedman, Mitchell Wand, and Christopher T. Haynes, Essentials of Programming Languages, 2nd ed., MIT Press, 2001.
[16] Oleg Mazonka and Daniel B. Cristofani, A Very Short Self-Interpreter, http://arxiv.org/html/cs.PL/0311032, 2003.
[17] D. Hofstadter, Gödel, Escher, Bach: an Eternal Golden Braid, Basic Books, Inc., 1979.
[18] H.P. Barendregt, The Lambda Calculus, its Syntax and Semantics, revised edition, North-Holland, Amsterdam, 1984.
[19] Simon Peyton Jones, Tackling the awkward squad: monadic input/output, concurrency, exceptions, and foreign-language calls in Haskell, in Engineering theories of software construction, ed. Tony Hoare, Manfred Broy, Ralf Steinbruggen, IOS Press, 47–96, 2001.
[20] The Haskell Home Page, http://haskell.org/.
[21] Brainfuck homepage, http://www.muppetlabs.com/~breadbox/bf/.
[22] Torben Æ. Mogensen, Linear-Time Self-Interpretation of the Pure Lambda Calculus, Higher-Order and Symbolic Computation 13(3), 217–237, 2000.
[23] D. A. Turner, Another algorithm for bracket abstraction, J. Symbol. Logic 44(2), 267–270, 1979.
[24] J. T. Tromp, http://www.cwi.nl/~tromp/cl/Lambda.lhs, 2004.
Philosophy
Chapter 15
Where Do New Ideas Come From? How Do They Emerge?
Epistemology as Computation (Information Processing)
Gordana Dodig-Crnkovic
Mälardalen University, Västerås, Sweden;
gordana.dodig-crnkovic@mdh.se
This essay presents arguments for the claim that in the best of all possible
worlds (Leibniz) there are sources of unpredictability and creativity for
us humans, even given a pancomputational stance. A suggested answer
to Chaitin's questions: Where do new mathematical and biological ideas
come from? How do they emerge? is that they come from the world
and emerge from basic physical (computational) laws. For humans as a
tiny subset of the universe, a part of the new ideas comes as the result
of the re-configuration and reshaping of already existing elements and
another part comes from the outside as a consequence of openness and
interactivity of the system. For the universe at large it is randomness
that is the source of unpredictability on the fundamental level. In order
to be able to completely predict the Universe-computer we would need
the Universe-computer itself to compute its next state; as Chaitin has already demonstrated, there are incompressible truths, which means truths
that cannot be computed by any other computer but the universe itself.
15.1. Introduction
The previous century had logical positivism and all that emphasis on the
philosophy of language, and completely shunned speculative metaphysics,
but a number of us think that it is time to start again. There is an emerging digital philosophy and digital physics, a new metaphysics associated
with names like Edward Fredkin and Stephen Wolfram and a handful of
like-minded individuals, among whom I include myself.
It was in June 2005 that I first met Greg Chaitin, at the E-CAP 2005 conference in Sweden, where he delivered the Alan Turing Lecture and presented
his book Meta Math! It was a remarkable lecture and a remarkable book
263
This is a very important result, which sheds a new light on epistemology. It sheds a new light on the meaning of Gödel's and Turing's negative
responses to Hilbert's program. What is scientific truth today after all,3 if
not even mathematics is able to prove every true statement within its own
domain? Chaitin offers a new and encouraging suggestion: mathematics
may not be as monolithic and a priori as Hilbert believed.
But we have seen that the world of mathematical ideas has infinite complexity; it cannot be explained with any theory having a finite number of
1 I
had the privilege to discuss the Turing Lecture article with Chaitin, while editing
the forthcoming book Dodig-Crnkovic G. and Stuart S., eds. (2007), Computation,
Information, Cognition The Nexus and The Liminal, Cambridge Scholars Publishing.
The present paper is meant as a continuation of that dialog.
2 For a detailed implementation of the idea of information compression, see Wolff (2006).
3 Tasic, in his Mathematics and the Roots of Postmodern Thought gives an eloquent
answer to this question in the context of human knowledge in general.
bits, which from a sufficiently abstract point of view seems much more like
biology, the domain of the complex, than like physics, where simple equations reign supreme.
The consequence is that the ambition of having one grand unified theory of mathematics must be abandoned. The domain of mathematics is
more like an archipelago consisting of islands of truths in an ocean of incomprehensible and uncompressible information. Chaitin, in an interview
in September 2003 says:
You see, you have all of mathematical truth, this ocean of mathematical
truth. And this ocean has islands. An island here, algebraic truths. An
island there, arithmetic truths. An island here, the calculus. And these
are different fields of mathematics where all the ideas are interconnected
in ways that mathematicians love; they fall into nice, interconnected
patterns. But what I've discovered is all this sea around the islands.
Here the interesting question of the nature of a vacuum is worth mentioning. A vacuum
in modern physics is anything but empty: it is simmering with continuous activity, with
virtual particles popping up from it and disappearing into it. Chaitin's ocean of the
unknown can be imagined as a vacuum full of the activity of virtual particles.
Normally one equates a new biological idea with a new species, but in
fact every time a child is born, that's actually a new idea incarnating; it's
reinventing the notion of human being, which changes constantly.
I have no idea how to answer this extremely important question; I wish
I could. Maybe you will be able to do it. Just try! You might have to keep
it cooking on a back burner while concentrating on other things, but don't
give up! All it takes is a new idea! Somebody has to come up with it. Why
not you? (Chaitin 2006)
That is where I want to start. After reading Meta Math! and a number
of Chaitin's philosophical articles,5 and after having written a thesis based
on the philosophy of computationalism/informationalism (Dodig-Crnkovic,
2006) I dare to present my modest attempt to answer the big question
above, as a part of a Socratic dialogue. My thinking is deeply rooted in
pancomputationalism, characterized by Chaitin in the following way:
5 A goldmine of articles may be found on Chaitin's web page. See especially www.cs.auckland.ac.nz/~chaitin/g.pdf, Thinking About Gödel & Turing.
And how about the entire universe, can it be considered to be a computer? Yes, it certainly can, it is constantly computing its future state
from its current state, it's constantly computing its own time-evolution!
And as I believe Tom Toffoli pointed out, actual computers like your PC
just hitch a ride on this universal computation! (Chaitin 2006)
6 The universe is a network of computing processes and its phenomena are info-computational. Both continuous and discrete, analogue and digital computing are parts
of the computing universe. (Dodig-Crnkovic, 2006). For the discussion about the necessity of both computational modes on the quantum mechanical level see Lloyd (2006).
generalized to mean natural computation. MacLennan 2004 defines natural computation as computation occurring in nature or inspired by that
in nature, which besides classical computation also includes quantum computing and molecular computation, and may be represented by either discrete or continuous models. Examples of computation occurring in nature
encompass information processing in evolution by natural selection, in the
brain, in the immune system, in the self-organized collective behavior of
groups of animals such as ant colonies, and in particle swarms. Computation inspired by nature includes genetic algorithms, artificial neural nets,
simulated immune systems, and so forth. There is a considerable synergy
gain in relating human-designed computing with the computing in nature.
Here we can illustrate Chaitin's claim that we only understand something
if we can program it: In the iterative course of modeling and computationally simulating (programming) natural processes, we learn to reproduce
and predict more and more of the characteristic features of the natural systems.
Classical ideal theoretical computers are mathematical objects and are
equivalent to algorithms, abstract automata (Turing machines or logical
machines as Turing called them), effective procedures, recursive functions,
or formal languages. Contrary to traditional Turing computation, in which
the computer is an isolated box provided with a suitable algorithm and
an input, left alone to compute until the algorithm terminated, interactive
computation (Wegner 1998, Goldin et al. 2006) presupposes interaction
i.e. communication of the computing process with the environment during
computation. Interaction consequently provides a new conceptualization
of computational phenomena which involves communication and information processing. Compared with new emerging computing paradigms, in
particular with interactive computing and natural computing, Turing machines form the proper subset of the set of information processing devices.
(Dodig-Crnkovic, 2006, paper B)
The Wegner-Goldin interactive computer is conceived as an open system in communication with the environment, the boundary of which is
dynamic, as in living biological systems and thus particularly suitable to
model natural computation. In a computationalist view, organisms may
be seen as constituted by computational processes; they are living computers. In the living cell an info-computational process takes place using
DNA, in an open system exchanging information, matter and energy with
the environment.
Burgin (2005) in his book explores computing beyond the Turing limit
and identifies three distinct components of information processing systems:
hardware (physical devices), software (programs that regulate its functioning and sometimes can be identical with hardware, as in biological computing), and infoware (information processed by the system). Infoware is
a shell built around the software-hardware core, which is the traditional
domain of automata and algorithm theory. Semantic Web is an example of
infoware that is adding a semantic component to the information present
on the web (Berners-Lee, Hendler and Lassila, 2001).
For the implementations of computationalism, interactive computing
is the most appropriate general model of natural computing, as it suits
the purpose of modeling a network of mutually communicating processes
(Dodig-Crnkovic 2006). It will be of particular interest to computational
accounts of epistemology, as a cognizing agent interacts with the environment in order to gain experience and knowledge. It also provides a unifying
framework for the reconciliation of classical and connectionist views of cognition.
15.4. Cognitive Agents Processing Data – Information – Knowledge
Our specific interest is in how the structuring from data to information
and knowledge develops on a phenomenological level in a cognitive agent
(biological or artificial) in its interaction with the environment. The central
role of interaction is expressed by Goertzel (1994) in the following way:
Today, more and more biologists are waking up to the sensitive
environment-dependence of fitness, to the fact that the properties which
make an organism fit may not even be present in the organism, but may
be emergent between the organism and its environment.
One can say that living organisms are about the environment, that
they have developed adaptive strategies to survive by internalizing environmental constraints. The interaction between an organism and its environment is realized through the exchange of physical signals that might be seen
as data, or when structured, as information. Organizing and mutually relating different pieces of information results in knowledge. In that context,
computationalism appears as the most suitable framework for naturalizing
epistemology.
Maturana and Varela (1980) presented a very interesting idea that even
the simplest organisms possess cognition and that their meaning-production
apparatus is contained in their metabolism. Of course, there are also non-metabolic interactions with the environment, such as locomotion, which also
generate meaning for an organism by changing its environment and providing new input data. We will take Maturana and Varela's theory as the
basis for a computationalist account of evolutionary epistemology.
At the physical level, living beings are open complex computational
systems in a regime on the edge of chaos,7 characterized by maximal informational content. Complexity is found between orderly systems with high
information compressibility and low information content and random systems with low compressibility and high information content. Living systems
are open, coherent, space-time structures maintained far from thermodynamic equilibrium by a flow of energy. (Chaisson, 2002)
Langton has compared these different regions to the different states of
matter. Fixed points are like crystals in that they are for the most part
static and orderly. Chaotic dynamics are similar to gases, which can be
described only statistically. Periodic behavior is similar to a non-crystal
solid, and complexity is like a liquid that is close to both the solid and
the gaseous states. In this way, we can once again view complexity and
computation as existing on the edge of chaos and simplicity. (Flake 1998)
Artificial agents may be treated analogously with animals in terms of
different degrees of complexity; they may range from software agents with
no sensory inputs at all to cognitive robots with varying degrees of sophistication of sensors and varying bodily architecture.
The question is: how does information acquire meaning naturally in the
process of an organism's interaction with its environment? A straightforward approach to naturalized epistemology attempts to answer this question via study of evolution and its impact on the cognitive, linguistic, and
social structures of living beings, from the simplest ones to those at highest
levels of organizational complexity (Bates 2005).
7 Bertschinger N. and Natschläger T. (2004) claim: Employing a recently developed
framework for analyzing real-time computations we show that only near the critical
boundary such networks can perform complex computations on time series. Hence, this
result strongly supports conjectures that dynamical systems which are capable of doing
complex computational tasks should operate near the edge of chaos, i.e. the transition
from ordered to chaotic dynamics.
This approach is not a hybrid dynamic/symbolic one, but interplay between analogue and digital information spaces, in an attempt to model
the representational behavior of a system. The focus on the explicitly
referential covariation of information between system and environment
is shifted towards the interactive modulation of implicit internal content and therefore, the resulting pragmatic adaptation of the system
via its interaction with the environment. The basic components of the
framework, its nodal points and their dynamic relations are analyzed,
aiming at providing a functional framework for the complex realm of
autonomous information systems (Arnellos et al. 2005)
preconditions for future inputs. Those processes are interactive and self-organizing. That makes for the essential speed-up in the process of getting
more and more complex structures.
as segmentation into components. The intermediate level handles grouping, shape detection and such; and the top level processes this information
symbolically, constructing an overall interpretation of the scene. This
three-level perceptual hierarchy appears to be an exceptionally effective
approach to computer vision.
We look for those objects that we expect to see and we look for those
shapes that we are used to seeing. If a level 5 process corresponds to an
expected object, then it will tell its children [i. e. sub-processes] to look
for the parts corresponding to that object, and its children will tell their
children to look for the complex geometrical forms making up the parts to
which they refer, et cetera. (Goertzel 1994)
Human intelligence is indivisible from its presence in a body (Dreyfus
1972, Gärdenfors 2000, 2005, Stuart 2003). When we observe, act and reason, we relate different ideas in a way that resembles the relation of our
body with various external objects. Cognitive structures of living organisms
are complex systems with evolutionary history (Gell-Mann 1995) evolved
in the interaction between first proto-organisms with the environment, and
evolving towards more and more complex structures, which is in complete agreement with the info-computational view, and the understanding
of human cognition as a part of this overall picture.
15.7. Conclusions
This essay attempts to address the question posed by Chaitin (2006)
about the origin of creativity and novelty in a computational universe.
To that end, an info-computationalist framework was assumed within
which information is the stuff of the universe while computation is its
dynamics. Based on the understanding of natural phenomena as infocomputational, the computer in general is conceived as an open interactive
system, and the Classical Turing machine is understood as a subset of a
general interactive/adaptive/self-organizing universal natural computer. In
a computationalist view, organisms are constituted by computational processes, implementing computation in vivo.
All cognizing beings are physical (informational) systems in constant interaction with their environment. The essential feature of cognizing living
organisms is their ability to manage complexity, and to handle complicated
environmental conditions with a variety of responses that are results of
adaptation, variation, selection, learning, and/or reasoning. Increasingly
15.8. Acknowledgements
I would like to thank Greg Chaitin for his inspiring ideas presented in his
Turing Lecture on epistemology as information theory and the subsequent
paper, and for his kindness in answering my numerous questions.
References
Arnellos, A., Spyrou, T. and Darzentas, J. The Emergence of Interactive Meaning Processes in Autonomous Systems. In: Proceedings of FIS 2005: Third International Conference on the Foundations of Information Science, Paris, July 4-7, 2005.
Bates, M. J. Information and Knowledge: An Evolutionary Framework for Information Science. Information Research 10, no. 4 (2005), InformationR.net/ir/10-4/paper239.html.
Bertschinger, N. and Natschläger, T. Real-Time Computation at the Edge of Chaos in Recurrent Neural Networks, Neural Comp. 16 (2004) 1413–1436.
Berners-Lee, T., Hendler, J. and Lassila, O. The Semantic Web. Scientific American, 284, 5, (2001), 34–43.
Bickhard, M. H. The Dynamic Emergence of Representation. In H. Clapin, P. Staines, P. Slezak (Eds.) Representation in Mind: New Approaches to Mental Representation, (2004), 71–90. Amsterdam: Elsevier.
Burgin, M. (2005) Super-Recursive Algorithms, Berlin: Springer.
Campbell, D. T. and Paller, B. T. Extending Evolutionary Epistemology to Justifying Scientific Beliefs (A sociological rapprochement with a fallibilist perceptual foundationalism?). In Issues in Evolutionary Epistemology, edited by K. Hahlweg and C. A. Hooker, (1989) 231–257. Albany: State University of New York Press.
Chaisson, E. J. (2001) Cosmic Evolution. The Rise of Complexity in Nature. Harvard University Press, Cambridge.
Chaitin, G. J. (1987) Algorithmic Information Theory, Cambridge University Press.
Chaitin, G. Epistemology as Information Theory, Collapse, (2006) Volume I, 27–51. Alan Turing Lecture given at E-CAP 2005, www.cs.auckland.ac.nz/~chaitin/ecap.html.
Chaitin, G. J. (1987) Information Randomness & Incompleteness: Papers on Algorithmic Information Theory, World Scientific.
Kornblith, H. ed. (1994) Naturalizing Epistemology, second edition, Cambridge: The MIT Press.
Kulakov, A. and Stojanov, G. Structures, Inner Values, Hierarchies And Stages: Essentials For Developmental Robot Architecture, 2nd International Workshop on Epigenetic Robotics, Edinburgh, 2002.
Leibniz, G. W. Philosophical Papers and Letters, ed. Leroy E. Loemker (Dordrecht, Reidel, 1969).
Lloyd, S. (2006) Programming the Universe: A Quantum Computer Scientist Takes on the Cosmos, Alfred A. Knopf.
Lorenz, K. (1977) Behind the Mirror. London: Methuen.
MacLennan, B. Natural computation and non-Turing models of computation, Theoretical Computer Science 317 (2004) 115–145.
Maturana, H. (1980) Autopoiesis and Cognition: The Realization of the Living. D. Reidel.
Maturana, H. and Varela, F. (1992) The Tree of Knowledge. Shambala.
Popper, K. R. (1972) Objective Knowledge: An Evolutionary Approach. Oxford: The Clarendon Press.
Stich, S. (1993) Naturalizing Epistemology: Quine, Simon and the Prospects for Pragmatism, in C. Hookway & D. Peterson, eds., Philosophy and Cognitive Science, Royal Inst. of Philosophy, Supplement no. 34 (Cambridge University Press) pp. 1–17.
Stonier, T. (1997) Information and Meaning. An Evolutionary Perspective, Berlin: Springer.
Stuart, S. (2003) The Self as an Embedded Agent, Minds and Machines, 13 (2): 187.
Tasic, V. (2001) Mathematics and the Roots of Postmodern Thought. Oxford University Press.
Toulmin, S. (1972) Human Understanding: The Collective Use and Evolution of Concepts. Princeton University Press.
Wegner, P. Interactive Foundations of Computing, Theoretical Computer Science 192 (1998) 315–351.
Whitehead, A. N. (1978) Process and Reality: An Essay in Cosmology. New York: The Free Press.
Wolff, J. G. (2006) Unifying Computing and Cognition, CognitionResearch.org.uk, www.cognitionresearch.org.uk/books/sp_book/ISBN0955072603_e3.pdf.
Wolfram, S. (2002) A New Kind of Science. Wolfram Science.
Chapter 16
The Dilemma Destiny/Free-Will
F. Walter Meyerstein
Calle Tavern 45, 08006 Barcelona, Spain; [email protected]
definite but unending series. It follows that the world of contingents, the
world of what can possibly exist or not exist, involves infinity. Note further
that for Leibniz space and time have no real existence: indeed, on this
principle he bases his assertion that space and time are distinguished only
by means of the predicates of things and not the other way around. The
final result is encapsulated in Leibniz' dictum that there are no two drops
of water perfectly alike. The law of continuity, on the other hand, asserts
that the causal chains of things form a series, so that every possible intermediate between the first and the last term is filled once, and only once.
Philosophers who have endeavored to establish the strict rules by which
the world is governed always had to face a vexing problem: how to accommodate human free-will in their world-system, that is, how can humans,
uniquely in the entire creation, interfere with the pre-established (by God?)
causal chains and start, so to say ex nihilo, a brand-new sequence of world-events? And how did Leibniz reconcile his thorough-going causal determinism with human free-will? His solution does not appear to be very brilliant:
he proposes that God has communicated to us a certain degree of his perfection and of his liberty. But how then, in this best of all possible worlds,
in this pre-established harmony, do some obviously evil actions of free
humans fit? Note that admitting a free-will, if it is really free, is equivalent
to introducing randomness into the world, as freedom of decision implies
previous indetermination of the choices. In fact, the unsolved problem of
conciliating a deterministic world that imposes a pre-established destiny on
humans with a free-will is a very old one as I will now briefly show.
In around 44 BC, probably after the assassination of Caesar, and shortly
before he himself was murdered, M. T. Cicero wrote an essay with the title
De Fato (On Fate) from which only a part has come down to us. The subject of this essay constitutes the analysis of the relation of human free-will
with a rigid destiny as resulting from universal causation à la Leibniz. To
show how this embarrassing contradiction was understood in the first century BC, I here cite a few passages from this work, using the translation of
H. Rackham in the Loeb Classical Library.
The essay is written as a dialogue between different parties, who in turn
quote previous philosophers to bolster their respective argumentations. Citing Carneades, head of the Platonic Academy in the second century BC,
we read: If everything takes place with antecedent causes, all events take
place in a closely knit web of natural interconnections; if this is true, all
things are caused by necessity; if this is true, nothing is in our power. But
something is in our power. Yet if all events take place by fate, there are
antecedent causes of all events. Therefore it is not the case that whatever
events take place take place by fate (XIV 31). But we read further on:
Even if it is admitted that nothing can happen without an antecedent
cause, what good would that be unless it be maintained that the cause in
question is a link in an eternal chain of causation? But a cause is that
which makes the thing of which it is the cause come about. However,
absurd situations result: For on these lines a well-dressed traveller also
will be said to have been the cause of the highwayman's robbing him of his
clothes (XIV 34).
From the extant fragments it rather clearly transpires that the problem
of destiny and free-will also escaped Cicero. A later philosopher, A. Gellius (second century AD), wrote this comment in his Noctes Atticae: In
the book that he wrote on the subject of fate Marcus Cicero says that th[is
question] is very obscure and involved, and he remarks that the philosopher Chrysippus (Stoic philosopher of the third century BC), finding himself
quite at sea in the difficulty of how to explain his combination of universal
fatalism with human free-will, ties himself up in a knot.
These very succinct brushstrokes may give you an idea of how, throughout the centuries, philosophers have struggled to unravel the mystery posed
by the randomness and complexity of the world, how they have endeavoured to reintroduce order, purpose, even a design into it, and how they
have tried to make mutually compatible the visibly contradictory concepts
of fate (destiny) and human free-will. Amazingly, but in fact not so surprisingly, the fascinating work of G. S. Chaitin, as exposed in many of his
books, but particularly in his 2005 META MATH!, directly impinges on
these questions, as I will now try to show.
First of all, let me remark that causality is an extremely difficult idea
as is corroborated by the many books and papers on this subject by recent
philosophers. However, here I assume causality to be a clear term, intuitively understood. Further, note that the causal chains determining destiny
are supposed to be continuous, each individual link connected uniquely to
a past and a future event: prior links are assumed to be the sufficient cause
of all posterior links. Apparently, only in this way have these philosophers
understood a rigidly pre-determined fate (cf. Leibniz' law of continuity).
It is obvious that any different solution introduces randomness into the
world making a rigid destiny an impossibility. Also note that cross-linking
or other interactions of the causal chains are not taken into consideration.
Chapter 17
Aliquid Est Sine Ratione: On some Philosophical
Consequences of Chaitin's Quest for Ω
Ugo Pagallo
Law School, University of Turin
Turin, Italy; [email protected]
Introduction
In order to examine some philosophical consequences of Gregory Chaitin's
quest for Ω, this paper comes in five sections. First of all (first section),
I consider Chaitin's interpretation of Leibniz's thought and how Chaitin's
halting probability Ω invalidates Leibniz's principle of sufficient reason.
Then (second section), I compare this analysis with a classic reading of
Leibniz, namely what Heidegger states in The Principle of Reason. Once
we have grasped Heidegger's criticism of the principle nihil est sine ratione
(section 3), I will stress some paramount differences between Heidegger's
thesis and Chaitin's theorems (section 4). By showing some flaws in the
German scholar's viewpoint, what I would like to lay emphasis on is the
impact of Chaitin's results in contemporary philosophical debate (section
5).
for example S. Wolfram [2002], whose thesis Chaitin has frequently discussed in
recent work.
9 See again D.C. Dennett [2003].
10 G. Chaitin [2004, 2].
11 Cf. G. Chaitin [2004, 3] and [2004, 133].
that Gödel's theorems of incompleteness should have made easier to understand.)
But, again, what about contingent truths? Leibniz's problem of deciding when sufficient reasons have been given appears quite similar to
Chaitin's question about how to decide if a computer program is elegant.
In order to shed some further light on this connection, let us proceed with
a strange German misunderstanding: Heidegger's, not Hilbert's!
17.2. A German misunderstanding
Martin Heidegger, one of the most influential philosophers of the 20th Century, dedicated his 1955–1956 course at the University of Freiburg to Leibniz's principle nihil est sine ratione.12 These lectures are of great importance not only because they offer a canonical reading of Leibniz, but also
because they focus on the Principle of Sufficient Reason, i.e., as Leibniz
himself explained to Arnauld in a letter on July 14th 1686: "nothing happens without a reason that one can always render as to why the matter has
run its course this way rather than that."13
Heidegger portrays Leibniz in a twofold way. On the one hand, Leibniz
is presented as a milestone in the development of modern logic into logistics:
Only through looking back on what Leibniz thought can we characterise
the present age, an age one calls the atomic age, as an age pervasively
bepowered by the power of the principium rationis sufficientis.14 On the
other hand, Leibniz should be understood in the light of German idealism
and of its metaphysical credo on the infinite self-knowing of the absolute
spirit.15 While Heidegger links the principle of reason to something he
calls the Destiny of Being,16 it would not be a mere coincidence if Leibniz, a German thinker, expressly posited that nothing is without reason
only 23 centuries after incessant Western philosophical tradition. Indeed,
metaphysics would be complete and philosophy accomplished at the very
moment in which nothing escapes from the quest for reason as in the case of
German idealism (from Leibniz to Hegel, so to say). What is mighty about
the principle of reason displays its power in that the principium reddendae
rationis, to all appearances only a Principle of cognition, also counts, pre-
Leibniz's principle of reason therefore would not fully pass the test that
the other basic principle we use in any demonstration passes, that is Aristotle's principle of contradiction, which is presented in the fourth book of the
Metaphysics as a not hypothetical principle (Γ 3, 1005 b 14). Again,
the motive depends on the leap that occurs from the principle of reason as
foundation of beings (nothing/without) to being qua being, that is, qua
ground/reason.24 We do not search for the ground/reason of the very
principle of reason and, hence, we avoid the regressio ad infinitum, only if
we admit a principle that gives us measures but still remains immeasurable.
We are dealing with a principle that grounds without having itself a ground
and that lets you calculate and provide reasons while remaining itself incalculable. The more you try to provide reasons for everything, including
Being, the less you understand that the very principle of reason is nothing but an uttering of being.25 By reversing the perspective, Heidegger
thus speaks about the groundless abyss of what is necessarily incalculable, immeasurable, groundless, otherwise it would not be possible to get
any grounds, reasons, measures, or computations of scientific reasoning. To
be explicit: Being, as what grounds, has no ground; as the abyss it plays
the play that, as Geschick, passes being and ground/reason to us.26
On this very basis, the aim of the German philosopher is to represent
our era as the outcome and triumph of Western philosophical tradition.
The first attempt to quantify relations between words, signs, and things,
namely Leibniz's research on the characteristica universalis, has led to a
world in which individuality vanishes at breakneck speed into total uniformity and all depends on the provision of atomic energy to establish Man's
domination over the World. The overall idea is that for the first time a
realm is opened up which is expressly oriented toward the possibility of rendering the ground of beings. (. . . ) This epoch characterises the innermost
essence of the age we call modernity.27 Heidegger's critique of modernity
and of its granddaughter, i.e., contemporary technology in the atomic
era, is thus grounded on the limits of the Mighty Principle, for it was this
very principle (is/reason) that turned out to guarantee calculability of objects (nothing/without) for instrumental cognition. The diagnosis is hence
complete: While the perfection of technology is only a simple echo of the
required completeness of rendering reasons, this very completeness (nothing/without) ends up in the oblivion of aliquid, namely Being, which est sine ratione (ground/reason). In Heidegger's phrase: "only beings have—and indeed necessarily—a ground/reason. A being is a being only when grounded. However, being, since it is itself ground/reason, remains without ground/reason."28 That is why something is but has no reason: aliquid sine ratione. Q.E.D.
17.4. A Post-modern turning point?
Heidegger has become the well-known hero of the contemporary philosophical approach which insists on the limits of instrumental reasoning and the end of metaphysics. Somehow this does not sound too foreign to Chaitin's own speculation. For example, in the conclusions of his Meta Math! we read that formal axiomatic systems are a failure in coping with creativity, since no mechanical process (rules of the game) can be really creative, because "in a sense anything that ever comes out was already contained in your starting point."29 Furthermore, with the case of the flip of a coin Chaitin highlights what rationalism violently opposes. Indeed, for a rationalist everything happens for a reason in the physical world, and in the world of mathematics everything is true for a reason. But an infinite series of independent tosses of a fair coin represents a horrible nightmare for any attempt to formulate a rational world view (. . . ) because each outcome is a fact that is true for no reason, that's true only by accident!30 (By the way, before exploring any connection between the halting probability and, say, quantum mechanics,31 this very idea was exploited in order to prove that determinism would be perfectly compatible with the principle that some things have no reason at all. In a nutshell, the idea of flipping a coin "accomplishes just the opposite of digitising in computers: Instead of absorbing all the micro-variation in the universe, it amplifies it, guaranteeing that the unimaginably large sum of forces acting at the moment will tip the digitiser into one of two states, heads or tails, but with no salient necessary conditions for either state.")32

However, there is a paramount difference between Heidegger's idea of
determinism and free will.37 In both cases, however, you have the problem
of pointing out, so to speak, something/without reason in the real world:
is that possible?
17.5. Hyper-modernity and some concluding remarks
Let me introduce the conclusions of this paper by summing up the results we have already obtained: Leibniz's principle of reason is untenable because Chaitin shows something that is true for no reason simpler than itself. Contrary to Heidegger and to part of contemporary philosophical thought, the mistake is not then in is/reason, namely scientific reasoning, but in nothing/without related to beings. That would automatically mean for Leibniz that some contingent events are groundless or without reason; for Chaitin, this "mean[s] that physical randomness, coin tossing, something non-mechanical, is the only possible source of creativity."38 This version of the principle of Insufficient reason as something/without has emerged in the contemporary debate on true randomness in the sense of indeterminism. On the one hand, Chaitin recalls Karl Svozil's opinion that some new, deeper, hidden-variable theory will eventually restore determinacy and law to physics.39 On the other, some scholars like Daniel Dennett would add that you do not really need indeterminism, that is, real randomness, in order to safeguard free will and get something/without cause.40

Chaitin has explained his viewpoint on what is something/without reason in physics and why he interprets unpredictability and chaos—that is, real intrinsic randomness—of quantum mechanics in a digital way.41 However, by adopting this perspective of 0s and 1s as a measure of complexity and information, we shed further light on a significant issue in the traditional philosophical debate as well as in the social sciences. If determinism is to be compatible with something/without cause, you do not have to grasp the complexity of social institutions as infinite—as occurs in mathematics—in order to get the same results of incompleteness.42 In particular, it is pretty clear that truth can be represented as a subset of environmental complexity in the social sciences as well. This is the case of legal systems with textbook
examples of fons extra ordinem, like revolutions or illegal customs in European civil law, or like the Bill of Indemnity tradition in the UK's common law. As suggested by Hayek's ideas and by contemporary research on social evolution, the principle of Insufficient reason states that something/without cause does exist in human interaction, for otherwise there would be no third, fruitful way between mere chaos and old historical determinism such as Marx's social laws or Hegel's philosophy of history.43

However, to admit that some things have no reason is in no way to admit a sort of narrative post-modern reasoning. Again, the mistake is not in is/reason as a demand of ground: The point is how to construe the connection between ground/reason and something/without in a digital way. If Heidegger presented Leibniz as responsible for the evils of modernity, Chaitin thinks of Leibniz as the main precursor of today's digital-philosophy paradigm—and, indeed, it took more than two centuries to grasp all of his hints! Once we have stressed the difference between Heidegger's version of the principle of insufficient reason—that is, Being without ground—and Chaitin's version of something/without, it is hereby clear why one should follow the second path. Instead of traditional metaphysics or pure speculations, we get truths that are such for no reason simpler than themselves. So, to be aware of the limits which the principle of reason encounters and, hence, to understand the strength of the principle of Insufficient reason through computational irreducibility and the maximally unknowable—both in Maths and in the social sciences—does not lead to post-modernism but to what I would like to call Hyper-modernity. Indeed, Heidegger was right when he claimed that it is reason as scientific reason, namely modernity, which unveils the core of the contemporary E-revolution and technology. However, the German thinker failed to comprehend that it is only through scientific reason that we focus on the very limits of reason itself. After Chaitin's analysis of both digital ontology and digital epistemology, it is then time to apply his theorems to metaphysics. The quest for "is" correlates to the traditional inquiry into being qua being: Aliquid est sine ratione.44
References
C. Calude [2002]: Information and Randomness, Springer: Berlin.
C. Calude, M.A. Stay [2005]: From Heisenberg to Gödel via Chaitin. In: International Journal of Theoretical Physics, 44, 7, pp. 1053–1065.
G. Chaitin [2004]: Leibniz, Information, Math and Physics. In: Wissen und Glauben/Knowledge and Belief. Akten des 26. Internationalen Wittgenstein-Symposiums 2003, Winfried/Weingartner: Wien, pp. 277–286.
G. Chaitin [2005]: Meta Math! The Quest for Omega, Pantheon: New York.
G. Chaitin [2006]: Teoria algoritmica della complessità, Giappichelli: Torino.
D.C. Dennett [2003]: Freedom Evolves, Penguin: London.
M. Heidegger [1991]: The Principle of Reason, edited by Reginald Lilly, Indiana University Press: Bloomington and Indianapolis.
R. Kane [2002]: The Oxford Handbook of Free Will, edited by Robert Kane, Oxford University Press: New York.
M.J. Loux, D.W. Zimmerman [2003]: The Oxford Handbook of Metaphysics, edited by Michael J. Loux and Dean W. Zimmerman, Oxford University Press: New York.
U. Pagallo [2005a]: Introduzione alla filosofia digitale. Da Leibniz a Chaitin, Giappichelli: Torino.
U. Pagallo [2005b]: Plato's Daoism and the Tübingen School. In: Journal of Chinese Philosophy, 32, 4, pp. 597–613.
U. Pagallo [2006a]: Teoria giuridica della complessità, Giappichelli: Torino.
U. Pagallo [2006b]: Chaitin e le scienze pratiche della complessità. In: Chaitin [2006, 79–102].
G. Parkes [1987]: Heidegger and Asian Thought, edited by Graham Parkes, University of Hawaii Press: Honolulu.
S. Wolfram [2002]: A New Kind of Science, Wolfram Media: Champaign, Ill.
Essays
Chapter 18
Proving and Programming
18.1. Introduction
The current paper, a continuation of [12], is devoted to an analysis of proof in mathematics from the perspective of the analogy between proving theorems in mathematics and writing programs in computer science. We will argue that:

(1) Theorems (in mathematics) correspond to algorithms and not to programs (in computer science); algorithms are subject to mathematical proofs (for example, for correctness).
(2) The role of proof in mathematical modelling is very small: adequacy is the main issue.
(3) Programs (in computer science) correspond to mathematical models. They are not subject to proofs, but to an adequacy and relevance analysis.
While ideally sound, this type of proof (called Hilbertian or monolithic [21]) cannot be found in mathematical articles or books (except for a few simple examples). However, most mathematicians believe that almost all real proofs, published in articles and books, can, with tedious work, be transformed into Hilbertian proofs. Why? Because real proofs look convincing to the mathematical community [21]. Going further, DeMillo, Lipton and Perlis argued that real proofs should be highly non-monolithic because they aim to be heard, read, assimilated, discussed, used and generalised by mathematicians—they are part of a social process.

Deductive rules are truth-preserving, but although the conclusion, generically termed a theorem, yields knowledge,2 there is no claim that it yields certain knowledge. The reason is simple: nothing certifies the
Earlier, in 2003, Goldston and Yildirim announced that there are infinitely many primes such that the gap to the next prime is very small. The proof looked convincing till A. Granville and K. Soundararajan discovered a tiny flaw which looked fatal (see [23] for the story). The flaw was discovered not by carefully checking Goldston and Yildirim's proof, but by extending it to show that there are infinitely many primes such that the gap to the next prime is less than 12 (the gap-12 theorem), a result which was too close to the twin prime conjecture to be true: they didn't believe it! B. Conrey, the director of the American Institute of Mathematics, which was close to this work, is quoted by Devlin [23] as saying that, without the unbelievable Granville and Soundararajan gap-12 theorem, "the Goldston–Yildirim proof would in all probability have been published and the mistake likely not found for some years."

How many proofs are wrong? Although many (most?) proofs are probably incomplete or benignly wrong—that is, they can in principle be fixed—it is almost impossible to make an educated guess about how many proofs are wrong. One reason is that many proofs are only superficially checked, because either they have limited interest or they never come to be used (or both).
…you think that this is a lot of money, then refer to the BBC announcement—broadcast as we are writing this paper—confirming that David Beckham will leave Real Madrid and join Major League Soccer side LA Galaxy at the end of the season; he will be paid $1 million per week [4].
6 Most mathematicians think he will.
"Flaw Reported in New Intel Chip" made the headlines of the Technology/Cybertimes section of the New York Times on May 6, 1997:

The Intel Corp. will not formally introduce its Pentium II microprocessor until Wednesday, but the World Wide Web is already buzzing with reports of a bug that causes the new chip to make errors in some complex mathematical calculations.

Only three years earlier, on December 20, 1994, Intel recalled its popular Pentium processor due to an FDIV bug discovered by Thomas Nicely, who was working on, guess what? He was calculating Brun's sum [38], the series formed with the reciprocals of the twin primes:
\[
\Bigl(\tfrac{1}{3}+\tfrac{1}{5}\Bigr)+\Bigl(\tfrac{1}{5}+\tfrac{1}{7}\Bigr)+\Bigl(\tfrac{1}{11}+\tfrac{1}{13}\Bigr)+\Bigl(\tfrac{1}{17}+\tfrac{1}{19}\Bigr)+\cdots \;<\; \infty.
\]
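Nicely's calculation amounts to enumerating twin prime pairs and accumulating their reciprocals. The sketch below is ours and purely illustrative (trial-division primality testing and a small cut-off, nothing like Nicely's optimised code):

    public class BrunSum {
        // Naive primality test by trial division; adequate for a small illustration.
        static boolean isPrime(long n) {
            if (n < 2) return false;
            for (long d = 2; d * d <= n; d++)
                if (n % d == 0) return false;
            return true;
        }

        public static void main(String[] args) {
            double sum = 0.0;
            // Add 1/p + 1/(p+2) for every twin prime pair (p, p+2) below the bound.
            for (long p = 3; p < 1_000_000; p += 2)
                if (isPrime(p) && isPrime(p + 2))
                    sum += 1.0 / p + 1.0 / (p + 2);
            System.out.printf("Partial Brun sum up to 10^6: %.9f%n", sum);
        }
    }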
Nicely worked with five PCs using Intel's 80486 microprocessor and a Pentium [37]. Comparing the results obtained with the old machines and
the new Pentium, he observed a discrepancy in the calculation of the reciprocals of the twin primes 824,633,702,441 and 824,633,702,443. Running various tests, he identified the source of the error in the floating point hardware unit of the Pentium CPU. Twenty-three other errors were found by Andreas Kaiser, while Tim Coe arrived at the simplest error instance: the division 4,195,835/3,145,727—which evaluates to 1.33382044…—appears on the Pentium to be 1.33373906… Coe's ultra-simple example moved the whole story from the Internet to the New York Times.

In contrast with errors found in mathematical proofs, which remain within the realm of mathematical experts, computer bugs attract the attention of a larger audience. For example, on January 17, 1995, Intel announced that it would spend $475 million to cover the recall of its Pentium chip to fix the problem discussed above, a problem that may affect only a few users.

Can bugs be avoided? More to the point of this article, can the use of rigorous mathematical proofs guarantee that software and hardware perform as expected?
18.4.2. From algorithms to programs
Bloch [6] identifies a bug in the Java implementation of a standard binary search.7 Here is Bloch's code (the binarySearch routine from java.util.Arrays, as given in [6]):

    1:  public static int binarySearch(int[] a, int key) {
    2:      int low = 0;
    3:      int high = a.length - 1;
    4:
    5:      while (low <= high) {
    6:          int mid = (low + high) / 2;
    7:          int midVal = a[mid];
    8:
    9:          if (midVal < key)
    10:             low = mid + 1;
    11:         else if (midVal > key)
    12:             high = mid - 1;
    13:         else
    14:             return mid; // key found
    15:     }
    16:     return -(low + 1); // key not found.
    17: }
with the explanation that the average value is truncated down to the nearest integer, a statement which is true for integers, but false for bounded integers. If the sum low + high is larger than $2^{31}-1$, then the value overflows to a negative value and stays negative after division by 2. How frequently can this situation appear? For arrays longer than $2^{30}$ elements—not uncommon for Google applications—the bug appears. Bloch [6] offers some fixes and an implicit complaint: how come the bug persisted so long when he, as a PhD student, was taught a correctness proof [5] of the binary search algorithm? Finally, he asks the crucial question: and now we know the binary search is bug-free, right?
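Bloch's remedy is to compute the midpoint in a way that cannot overflow. The sketch below is our illustration of the two usual variants, not a verbatim quotation of [6]:

    final class Midpoint {
        // Safe midpoint for 0 <= low <= high: the difference never overflows.
        static int midpoint(int low, int high) {
            return low + (high - low) / 2;
        }

        // Alternative: let the sum wrap around, then halve it with Java's
        // unsigned (logical) right shift, which reads the bits as an unsigned value.
        static int midpointShift(int low, int high) {
            return (low + high) >>> 1;
        }

        public static void main(String[] args) {
            int low = 2, high = Integer.MAX_VALUE - 1;
            System.out.println((low + high) / 2);          // -1073741824: the overflow bug
            System.out.println(midpoint(low, high));       //  1073741824: correct
            System.out.println(midpointShift(low, high));  //  1073741824: correct
        }
    }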
18.4.3. Bugs everywhere and Hoare's question
Computer bugs are, literally, everywhere and they may affect many users. Most important software companies maintain bug databases: bugs.sun.com/bugdatabase/index.jsp, bugs.kde.org, MySQL Bugs, bugzilla.mozilla.org, bugs.apache.org, etc. Here is a model of how to report bugs at Sun:

If we don't know about your problem, we can't fix it. If you've isolated a problem that you think we're causing, and you can't find it here, submit a bug! Make sure you include all the relevant information including all the details needed to reproduce the problem. Submissions will be verified and prioritized. (Please note that bug fixes are not guaranteed.)8
Bugs can be of different types, hence producing varying levels of inconvenience to the user of the program. The costs of some bugs may be almost incalculable. A bug in the code controlling the Therac-25 radiation therapy machine led to at least six deaths between 1985 and 1987 [60]. The European Space Agency's $1 billion prototype Ariane 5's first test flight on June 4, 1996, failed, with the rocket self-destructing 37 seconds after launch [3]; the reason was a software bug, arguably one of the most expensive bugs in history. More recently, a security flaw in PayPal was exploited by fraudsters to steal credit card numbers and other personal information
belonging to PayPal users (June 16, 2006); and the Y2K7 bug (January 3, 2007) affected Microsoft's preview version of Expression Design. Finally, a bug discovered by M. Schwartz [50] found no interest in the community, so he had to write a small script showing how to use it to get all the email addresses of members subscribing to a Google group; Google fixed the problem on January 5, 2007. Improperly coded software seems to have caused the Mars Global Surveyor failure in November 2006; in January 2007, NASA launched an investigation [62].

The list can easily be continued. Wired magazine maintains a history of the worst software bugs [28].
In spite of all the examples discussed above, bugs and faulty software have killed remarkably few people. They have caused embarrassment, nuisance and inconvenience, but many fewer catastrophes. Early in January 2007, a 6.7-magnitude earthquake in Taiwan produced serious interruptions of the internet in Asia [58]; this showed that the internet is far from shockproof, but the consequences were, again, not catastrophic. Finally, one more example: the Boeing 777, one of the most automated fly-by-wire air-planes, has flown since 1995 without any crashes or serious problems. So, we can ask with Hoare [34] the question: how did software get so reliable without proof?
18.5. Proving vs. programming: tomorrow
18.5.1. Theorems and programs
The practice of programming, by and large, produces discursive knowledge, a knowledge resulting from computing. Deductive knowledge, complementary to discursive knowledge, can be obtained by the mathematical analysis of the program (in some given context). These notions of knowledge correspond to Dijkstra's two approaches (see [24]) to programs: postulational and operational. Under the postulational approach, the program text is considered a mathematical object. The semantic equivalence of two programs means that they meet the same specification. According to the operational approach, reasoning about programs means building a computational model with respect to which the program text is interpreted as executable code.

According to Dijkstra:

The tragedy of today's world of programming is that, to the extent that it reasons about programs at all, it does so almost exclusively operationally.
The computer science analogy of the operational–postulational distinction corresponds to the difference—considered already at the beginning of the 19th century—between mathematics understood as calculation and mathematics as qualitative conceptual reasoning. In the analogy between proving and programming, theorems correspond to algorithms, not programs; programs correspond to mathematical models.
18.5.2. Mathematics = proof ?
The role of proof in mathematical modelling is very small: adequacy is the main issue! As mathematical modelling is closer to coding algorithms into programs, selecting algorithms to code, and designing specifications to implement, one can re-phrase the arguments against the idea of proof of correctness of programs [21, 26] as arguments against the idea of proof of correctness of mathematical models. Models evolve and become more and
more adequate to the reality they model; however, they are never true. Here is an illuminating description by Schwartz [49]:

. . . it may come as a shock to the mathematician to learn that the Schrödinger equation for the hydrogen atom . . . is not a literally correct description of this atom, but only an approximation to a somewhat more correct equation taking account of spin, magnetic dipole, and relativistic effects; that this corrected equation is itself only an ill-understood approximation to an infinite set of quantum field-theoretical equations; and finally that quantum field theory, besides diverging, neglects a myriad of strange-particle interactions whose strength and form are largely unknown. . . . The physicist, looking at the original Schrödinger equation, learns to sense it . . . and this sense inspires . . .
Let's do the following mental experiment: apply literally to mathematical practice Hilbert's requirement for proof stated in Section 18.2 (in logical terms, the proofs of a theory form a computable set). Then Anderson's question, posed at the end of Subsection 18.3.2, is not only not surprising, but should be answered in an affirmative way. This could be a reasonable motivation for the project Flyspeck.

Probabilistically checkable proofs are mathematical arguments that can be checked probabilistically by reading just a few of their bits. In the early 1990s it was proved that every proof can be effectively transformed into a probabilistically checkable proof with only a modest increase in the original proof length. However, the transformation itself was complex. Recently, a very simple procedure was discovered by Dinur; see the presentation [48].

Now, feeling a loss of certitude, we should remember that Thales was the first to encourage his disciples to criticise his assertions. This tradition was later lost, but recovered with Galilei. With Thales and Galilei we learned that human knowledge is essentially conjectural (see also [45]). Should mathematics and computer science accept being guided by this slogan, or is it adequate only for the natural and social sciences?
18.5.4. Communication and understanding
Of course, no theorem is validated before it is communicated to the mathematical community (orally and, eventually, in writing). Manin states it clearly:

Proof is not just an argument convincing an imaginary opponent. Not at all. Proof is the way we communicate mathematical truth.
An exponentially long quantum proof cannot be written down, since that would require an exponential amount of classical paper, but a quantum mind could directly perceive the proof [13].
The standard of correctness and completeness necessary to get a program to work at all is a couple of orders of magnitude higher than the mathematical community's standard of valid proofs.
When one considers how hard it is to write a computer program even
approaching the intellectual scope of a good mathematical paper, and
how much greater time and effort have to be put into it to make it
almost formally correct, it is preposterous to claim that mathematics
as we practice it is anywhere near formally correct.
But we can go further into the past. Old Greek mathematics, with Pythagoras, Plato and Euclid, was essentially conceptual, and this is the reason why the Greeks were able to invent what we call today a mathematical proof. Babylonian mathematics was exclusively operational. The move from an operational to a conceptual attitude in computer programming is similar to the evolution from Babylonian to Greek mathematics.

Coming to more recent periods in the history of mathematics, we observe the strong operational aspect of calculus in the 18th century, in contrast with the move to the predominantly conceptual aspect of mathematical analysis in the 19th century. Euler is a king of operational mathematics; Riemann and Weierstrass express par excellence the conceptual attitude. The transition from the former to the latter is represented by giants such as Abel and Cauchy. When Cauchy believed that he had proved the continuity of the limit of a convergent sequence of continuous functions, Abel, with no ironical intention, wrote: "Il semble que le théorème de Monsieur Cauchy admet des exceptions"10 (it seems that Monsieur Cauchy's theorem admits exceptions). But, at that moment, neither of them was able to invent the notion of uniform convergence and, as a matter of fact, neither convergence nor continuity was effectively clarified. Only the second half of the 19th century brought their full understanding, together with the idea of uniformity, either with respect to convergence or with respect to continuity. We see here all the characteristic features of a transition period, the transition from the operational to the conceptual attitude.

To stress the two facets of Cauchy's mathematics, one belonging to the intuitive-operational, the other to the rigorous-conceptual attitude, let us recall that, despite the fact that Cauchy is undoubtedly the founder of the exact differential calculus in the modern sense, he is also the mathematician who was convinced that the continuity of a function implies its differentiability and hence that any continuous function can be geometrically represented. We had to wait for Weierstrass and Riemann to understand
the gap existing between some mathematical concepts and their intuitive representation.

However, this evolution does not concern only calculus and analysis. It can be observed in all fields of mathematics, although the periods in which the transition took place may differ from field to field. For instance, in algebraic geometry it took place only in the 20th century, with the work of Oscar Zariski. In fact, any conceptual period reaches its maturity under the form of an operational one, which, in its turn, is looking for a new level of conceptual attitude. The whole treatise of Bourbaki is a conceptual reaction to an operational approach. Dirichlet's slogan asking us to replace calculations with ideas should be supplemented with another, complementary slogan, requiring us to detect an algorithmic level of concepts.

Can we expect a similar alternation of attitudes in respect to programming? Perhaps it is too early to answer, taking into account that the whole field is still too young. The question is not only academic, as the project Flyspeck reminds us.
18.5.6. Is it meaningful to speak about the truth of axioms?
In Section 18.2 we argued that mathematical proofs do not produce certain knowledge; they produce rational belief. The epistemological value of a proof resides in the degree of belief in its axioms. What, then, is the value of proof? Is it meaningful to speak about the truth of axioms?

First, a few more words should be said about axioms and primitive terms. Euclid avoids reference to primitive terms, but they exist in his Elements, hidden by pseudo-definitions such as "We call point what has no parts." Only modern axiomatic systems make explicit reference to primitive terms. Obviously, programs too could not be conceived in the absence of some primitive terms. A similar remark is in order about axioms. To what extent is it meaningful to raise the question about the truth of some axioms? Semantics is a matter of interpretation of a formal system, which, in its turn, has some primitive terms and some axioms among its bricks. Circularity is obvious. Gödel's (true) statements that cannot be proved could not be conceived in the absence of the respective formal system, which in its turn has among its bricks some primitive terms and some axioms. Maybe we can refer to another way to understand meaning, a way avoiding Hilbert's itinerary? For instance, the way it is understood in C. S. Peirce's semiotics or in modern linguistics. But do they have the rigour
18.6. Acknowledgements
We thank Douglas Bridges, Andreea Calude, Greg Chaitin, and Nick Hay
for many discussions and suggestions that improved our paper.
References
[1] P. Aczel, J. Barwise. Non-well-founded sets, Journal of Symbolic Logic 54, 3 (1989), 1111–1112.
[2] R. F. Arenstorf. There Are Infinitely Many Prime Twins, http://arxiv.org/abs/math.NT/0405509, 26 May 2004.
[3] Ariane Flight 501, http://spaceflightnow.com/cluster2/000714feature/ariane501_qt.html.
[4] Beckham agrees to LA Galaxy move, http://news.bbc.co.uk/sport2/hi/football/6248835.stm, January 11, 2007.
[5] J. Bentley. Programming Pearls, Addison-Wesley, New York, 1986; 2nd ed. 2000 (Chapter 5).
[6] J. Bloch. Nearly All Binary Searches and Mergesorts are Broken, Google
[7]
[8]
[9]
[10]
[11]
[12]
[13]
[14]
[15]
[16]
[17]
[18]
[19]
[20]
[21]
[22]
[23]
[24]
[25]
[26]
[27]
[28] History of the worst software bugs, Wired, 2005, http://www.wired.com/news/technology/bugs/0,2924,69355,00.html?tw=wn_tophead_1.
[29] D. Goldston, J. Pintz, C. Yildirim. Primes in tuples, http://arxiv.org/math.NT/0508185.
[30] R. K. Guy. Unsolved Problems in Number Theory, Springer-Verlag, New York, 1994, 2nd ed. (pp. 19–23).
[31] T. C. Hales. Sphere packings. I, Disc. Comput. Geom. 17 (1997), 1–51.
[32] T. C. Hales. A proof of the Kepler conjecture, Ann. Math. 162 (2005), 1065–1185.
[33] D. Hilbert. Mathematical problems (Lecture delivered before the International Congress of Mathematicians at Paris in 1900), in F. E. Browder (ed.). Proceedings of Symposia in Pure Mathematics of the AMS, AMS, 28, 1976, p. 25.
[34] C. A. R. Hoare. How did software get so reliable without proof?, in Proceedings of the Third International Symposium of Formal Methods Europe on Industrial Benefit and Advances in Formal Methods, Lect. Notes Comput. Sci., Vol. 1051, Springer-Verlag, London, 1996, 1–17.
[35] D. Kennedy. Breakthrough of the year, December 22, 2006, http://www.sciencemag.org/cgi/reprint/314/5807/1841.pdf.
[36] D. E. Knuth. Theory and practice, EATCS Bull. 27 (1985), 14–21.
[37] D. Mackenzie. Mechanizing Proof, MIT Press, Cambridge, Mass., 2001.
[38] T. R. Nicely. Enumeration to $10^{14}$ of the twin primes and Brun's constant, Virginia Journal of Science 46, 3 (Fall, 1995), 195–204.
[39] G. Odifreddi. The Mathematical Century, Princeton University Press, Princeton, 2004.
[40] G. Perelman. Ricci Flow and Geometrization of Three-Manifolds, Massachusetts Institute of Technology, Department of Mathematics Simons Lecture Series, September 23, 2004, http://www-math.mit.edu/conferences/simons.
[41] G. Perelman. The Entropy Formula for the Ricci Flow and Its Geometric Application, November 11, 2002, http://www.arxiv.org/abs/math.DG/0211159.
[42] G. Perelman. Ricci Flow with Surgery on Three-Manifolds, March 10, 2003, http://www.arxiv.org/abs/math.DG/0303109.
[43] H. Poincaré. Oeuvres de Henri Poincaré, tome VI, Gauthier-Villars, Paris, 1953, pp. 486 and 498.
[44] K. R. Popper. Indeterminism in quantum physics and in classical physics. Part II, The British Journal for the Philosophy of Science 1, 3 (1950), 173–195.
[45] K. R. Popper. Part 1 in David Miller (ed.). Popper Selections, Princeton University Press, 1985.
[46] G.-C. Rota. Indiscrete Thoughts, Birkhäuser, Boston, 1997, p. 96.
[47] J.-P. Serre. Interview, Libération, May 23 (2003).
[48] J. Radhakrishnan, M. Sudan. On Dinur's proof of the PCP theorem, Bull. AMS 44, 1 (2007), 19–61.
[49] J. Schwartz. The pernicious influence of mathematics on science, in M. Kac,
[50]
[51]
[52]
[53]
[54]
[55]
[56]
[57]
[58]
[59]
[60]
[61]
[62]
Chapter 19

God's Number: Where Can We Find the Secret of the Universe? In a Single Number!

Marcus Chown1
New Scientist, London, UK; MChown@compuserve.com
Some time ago a group of hyper-intelligent pan-dimensional beings decided to answer the great question of Life, The Universe and Everything. To this end they built an incredibly powerful computer, Deep Thought. After the great computer program had run for a few million years, the answer was announced. And the answer was . . .

.000000100000010000100000100001110111001100100111100010010011100 . . .

Come again? Surely it was 42? Well, in Douglas Adams' novel The Hitch Hiker's Guide to the Galaxy it certainly was. But, in the real world, rather than the world of Arthur Dent, Zaphod Beeblebrox and Ford Prefect, the answer to the question of Life, The Universe and Everything is very definitely . . .

.000000100000010000100000100001110111001100100111100010010011100 . . .
The number is called Omega and, remarkably, if you knew its first few thousand digits, you would know the answers to more mathematical questions than can ever be posed. What is more, the very existence of Omega is a demonstration that most mathematics cannot be discovered simply by applying logic and reasoning. The fact that mathematicians have little difficulty in discovering new mathematics may therefore mean that they are doing something—employing intuition perhaps—that no computer can do. It is tantalising evidence that the human brain is more than a jelly-and-water version of the PC sitting on your desktop.

1 This article was first published in the book Marcus Chown, The Never-Ending Days of Being Dead (Faber, London, 2007). It is re-published here with the kind permission of Faber.
Omega (Ω) actually crops up in a field of mathematics invented by an Argentinian-American called Gregory Chaitin. Algorithmic Information Theory attempts to define complexity.2

This is a very difficult concept to pin down precisely, yet a precise definition is extremely important in many fields. How else, for instance, can a biologist studying evolution objectively say that a human is more complex than a chimpanzee or even a jellyfish?

Chaitin invented AIT when he was 15, the same age Wolfram was when he began publishing papers in physics journals. His principal concern at the time was with numbers. But, in fact, AIT applies to much more. After all, as we all know today, information describing everything—from words to pictures to music—can ultimately be expressed in the form of numbers. We are living in a digital world.
Chaitin's key idea was that the complexity of a number can be measured by the length of the shortest computer program that can generate it. Take, for instance, a number which goes on forever, such as 919191. . . Although it contains an extremely large number of digits—it goes on forever, after all—it can be generated by a very short program:

Step 1: Write 91
Step 2: Repeat step 1

According to Chaitin's measure, therefore, the number 919191. . . is not very complex at all. The information it contains can be reduced, or compressed, into a much more concise form than the number itself—specifically, the two-line program above.
Actually, Chaitin is a bit more precise about what he means by the shortest computer program. He is a mathematician, after all. He means the shortest computer program encoded in binary that can generate a particular number, itself expressed in binary.

2 Many of the ideas of AIT were invented independently by the Russian Andrei Kolmogorov.
…there is not one Omega but a whole class of Omegas. This is because Omega depends on the particular type of computer language used to generate a number. It would not be the same, for instance, in two languages that used a different string of 0s and 1s to code for a task like "Repeat step 1."
ation at Bletchley Park and Cheltenham by the end of the war. But his enduring fame rests on work he carried out earlier, in the 1930s, on a far more theoretical type of computer—one he invented with the specific purpose of figuring out the limits of computers.

A Turing machine is simply a box. A 1-dimensional tape with a series of 0s and 1s inscribed on it is fed into the box and the same tape emerges from the box with a different series of 0s and 1s on it. The input is transformed into the output by a read/write head in the box. As the tape passes the head, one digit at a time, the head either leaves the digit unchanged or erases it, replacing a 0 by a 1, and vice versa. Exactly what the head does to each digit is determined by the internal state of the box at the time—what, in today's jargon, we would call a computer program.
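A box of this kind is easy to mimic in code. The following sketch is ours and purely illustrative: the "internal state of the box" is reduced to a tiny transition table, and the toy program simply flips every bit on the tape.

    import java.util.Map;

    public class TinyTuringMachine {
        // One rule: in a given state, reading a given symbol, write a symbol,
        // move the head (+1 right, -1 left) and switch to the next state.
        record Rule(char write, int move, String next) {}

        public static void main(String[] args) {
            char[] tape = "0110".toCharArray();
            Map<String, Rule> program = Map.of(
                "flip,0", new Rule('1', +1, "flip"),
                "flip,1", new Rule('0', +1, "flip"));

            String state = "flip";
            int head = 0;
            while (head >= 0 && head < tape.length) {        // run until the head leaves the tape
                Rule r = program.get(state + "," + tape[head]);
                if (r == null) break;                        // no applicable rule: halt
                tape[head] = r.write();
                head += r.move();
                state = r.next();
            }
            System.out.println(new String(tape));            // prints 1001
        }
    }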
With its input and output written in binary on a 1-dimensional tape, a Turing machine is a wildly impractical device. Practicality, however, was not the point. The point was that, with the Turing machine, Turing had invented—on paper, at least—a machine that could simulate the operation of absolutely any other machine.5
Nowadays, a machine that can simulate any other machine—a universal machine—is not considered remarkable at all. Such devices—capable of carrying out not one specialised task but any conceivable task—are ubiquitous features of the world. They are called computers. In the 1930s, however, the universal Turing machine appeared to be straight from the pages of science fiction. The only way computing machines of the day could carry out different tasks was if they were painstakingly rewired. Turing's genius was to see that this was unnecessary. With a universal machine—a general-purpose computer—it was possible to simulate any other machine simply by giving it a description of the other machine plus the computer program for the other machine. There was no need to change the wiring—the hardware—only the software.

Turing imagined the software for his universal Turing machine inscribed as a long series of 0s and 1s on a cumbersome 1-dimensional tape. Fortunately, today's computers are a bit more sophisticated than Turing's vision and their software comes in a considerably more user-friendly form!
In the universal Turing machine, however, Alan Turing can in fact be

5 And it could do this using only seven basic commands! (i) Read tape, (ii) Move tape left, (iii) Move tape right, (iv) Write 0 on tape, (v) Write 1 on tape, (vi) Jump to another command, and (vii) Halt.
The halting problem is uncomputable because, if there was a program that could compute it—one that could take another program and spit out, say, a 0 if it never halts and a 1 if it eventually halts—this halting program could be used to do something impossible: construct a program which stops if it doesn't stop and doesn't stop if it stops!

How? Such a program would have to incorporate the halting program as a sub-program, or subroutine, and apply it to itself. This sounds tricky but actually is not. Just engineer the program to output itself—a string of 0s and 1s identical to the binary code of the program—and then get the halting program to check whether the output halts or not. If it does halt, the program has instructions not to halt; and if it does not halt, the program has instructions to halt.

What has been concocted is an impossibility, a contradiction—all made possible by the existence of the halting program. For the sanity of the Universe, therefore, the halting program cannot exist.
To put it another way, Omega is the concentrated distillation of all conceivable halting problems. It contains the answer not just to one halting problem but to an infinite number!

Of course, individual cases of the halting problem are uncomputable. This was Turing's big discovery, after all. Consequently, Omega too is uncomputable. This is in fact not very surprising. Recall that it takes an infinitely long computer program to generate Omega, which is hardly a practical proposition!

Omega is maximally uncomputable, maximally unknowable. "Technically, this is because the first n bits would in theory enable us to answer Turing's infamous halting problem for all programs up to n bits in size—and this is impossible," says Chaitin. However, crudely speaking, the reason Omega is unknowable is that it's the probability of something happening—a computer program halting—which itself is unknowable!
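For reference, the halting probability that these paragraphs describe has a compact standard definition in AIT (the formula below is the textbook one, not a quotation from this chapter): for a fixed universal self-delimiting computer $U$,

\[
\Omega_U \;=\; \sum_{p \,:\, U(p)\ \text{halts}} 2^{-|p|},
\]

where $|p|$ is the length in bits of the program $p$. Each halting program of length $n$ contributes $2^{-n}$, which is exactly why knowing the first $n$ bits of $\Omega_U$ would settle the halting problem for all programs of up to $n$ bits.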
The deep and unexpected connection between Omega and all conceivable halting problems has an astonishing consequence. It comes about because of the remarkable fact that "most of the interesting problems in mathematics can be written as halting problems," says Cristian Calude of the University of Auckland.

Take, for example, the problem of finding a whole number that is not the sum of three square numbers. The number 6, for instance, fails. It can be written as $1^2 + 2^2 + 1^2$ and so is the sum of three square numbers. The first number that is not a sum of three squares is in fact 7.
A brute-force program to find numbers that are not the sum of three squares would simply step through the whole numbers, one at a time, stopping when it finds a number that cannot be written as the sum of three squares. Or, if all numbers can be written as the sum of three squares, it will keep going forever. "Does this ring any bells?" says Calude. "It's a halting problem!"
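A minimal sketch of that brute-force search (ours, purely illustrative); whether this loop ever halts is exactly the kind of yes/no question that Omega encodes:

    public class ThreeSquares {
        // Can n be written as a*a + b*b + c*c with non-negative integers a, b, c?
        static boolean isSumOfThreeSquares(long n) {
            for (long a = 0; a * a <= n; a++)
                for (long b = a; a * a + b * b <= n; b++) {
                    long c = (long) Math.sqrt((double) (n - a * a - b * b));
                    for (long d = Math.max(0, c - 1); d <= c + 1; d++)  // guard against rounding
                        if (a * a + b * b + d * d == n) return true;
                }
            return false;
        }

        public static void main(String[] args) {
            // Step through the whole numbers; stop at the first failure.
            for (long n = 0; ; n++)
                if (!isSumOfThreeSquares(n)) {
                    System.out.println(n);  // prints 7, then halts
                    return;
                }
        }
    }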
The amazing thing is that a host of other mathematical questions can also be re-cast as halting problems—if a particular program halts, the answer to the question is yes; if it doesn't halt, it is no. The consequence of this fact is scarcely believable. "The first few thousand digits of Omega contain the answers to more mathematical questions than could be written down in the entire universe!" says Charles Bennett of IBM in New York.

An example of such a question is whether Fermat's last theorem is
Of course, Omega may contain the secret of the universe but it is unknowable. In fact, it is worse than this. "Even if, by some kind of miracle, we get the first 10,000 digits of Omega, the task of solving the problems whose answers are embodied in these bits is computable but unrealistically difficult," says Calude. "Technically, the time it takes to find all halting programs of length less than n grows faster than any computable function of n."

In other words, we will be in the position of the characters in Adams' The Hitch Hiker's Guide to the Galaxy. They knew the answer to Life, the Universe and Everything is 42. Unfortunately, the hard part was knowing the question.

Determining all the digits of Omega is clearly an impossibility for lowly human beings. The number in its entirety is really knowable only by God. Incredibly, however, Calude has managed to calculate the first 64 digits of Omega—or at least an Omega. Those digits are the ones shown at the beginning of this chapter.

Calude was able to calculate 64 bits of a nominally uncomputable number because, contrary to everything that has been said up to now, the computing barrier discovered by Turing can actually be broken. This is because Turing defined the halting problem for a classical Turing machine—a familiar general-purpose computer. However, nature permits types of machines that Turing did not anticipate, such as quantum computers. These are accelerated Turing machines. It may be possible to use them to solve the halting problem and compute other apparently uncomputable things.
…show how a number could encapsulate the answers to all conceivable questions. Take the French alphabet, he said, including blanks, digits, punctuation marks, upper case, lower case, letters with accents, and everything. Then start making a list. Start off with all possible 1-character sequences, then all possible 2-character sequences, and so on. Eventually, you will get all possible successions of characters of any length, in alphabetical order. Most will of course be nonsense. Nevertheless, in the list, you will find every conceivable question in French—in fact, everything you can write in French.

Next, said Borel, number the sequences you have created. Then imagine a number $0.d_1 d_2 d_3\ldots$ whose $n$th digit $d_n$ is 1 if the $n$th element of the list is a valid yes/no question in French whose answer is yes, and whose $n$th digit is 2 if the $n$th element is a valid yes/no question whose answer is no. If the $n$th element of the list is garbage, not valid French, or valid French but not a yes/no question, then the $n$th digit is 0.

So Borel had a number that gives the answer to every yes/no question you can ask in French—about religion, about maths, about physics—and it is all in one number! Of course, such a number would contain an infinite amount of information, which would make actually ever knowing it a bit unrealistic. It would be just like Omega. In fact, Borel's number is actually related to Omega.
ical steps from the bedrock axioms and so is a bona fide theorem.

In short, what Hilbert had in mind was finding a proof-checking algorithm—a procedure for checking that each step in a given proof is logically watertight. If mathematicians possessed such a procedure they would in theory be able to run through all possible proofs, starting with the simple ones and progressing to more complex ones; check whether they are correct; and see what theorems they led to. In this way they would be able to generate an infinite list of all provable mathematical statements—that is, all theorems.

If a mathematical statement is true, Hilbert's mindless approach would therefore eventually find the proof. If a statement cannot be proved, Hilbert's mindless method would go on forever, unless a proof that the statement is false was found.
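The mindless procedure described above can be sketched in a few lines. In the sketch below (ours), checkProof and theoremOf are hypothetical placeholders—no real proof checker is this simple—and candidate proofs are simply enumerated as ever larger integers, i.e. ever longer bit strings:

    import java.math.BigInteger;

    public class TheoremEnumerator {
        // Hypothetical mechanical checker: does this bit string encode a valid proof?
        static boolean checkProof(BigInteger candidate) { return false; /* placeholder */ }

        // Hypothetical decoder: which statement does a valid proof establish?
        static String theoremOf(BigInteger candidate) { return ""; /* placeholder */ }

        public static void main(String[] args) {
            // Run through all candidate proofs, shortest encodings first, and print
            // every theorem found: in principle, an infinite list of all provable statements.
            for (BigInteger c = BigInteger.ZERO; ; c = c.add(BigInteger.ONE))
                if (checkProof(c))
                    System.out.println(theoremOf(c));
        }
    }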
The mechanical nature of Hilbert's proof-checking procedure was crucially important. After all, if it could be applied mindlessly, without any need to know how mathematics worked, then it would be something absolutely everyone could agree on. Hilbert would have taken the process of doing mathematics and set it in stone. He would have removed from the subject all the ambiguities of everyday language and reasoning. There would be no room left for contradictions such as the one that appeared to have cropped up in set theory.

Hilbert did not know it—could not have known it—but the mechanical proof-checking procedure he envisaged was nothing less than a computer program running on a computer! "How many people realise that the computer was actually conceived and born in the abstract field of pure mathematics?" says Chaitin.

Hilbert's programme to weed out the paradoxes from mathematics was hugely ambitious. He fully expected it to take decades to carry out. But what he did not realise—and nor did anyone else—was that the programme was impossible!
In 1931, an obscure Austrian mathematician called Kurt Gödel showed that, no matter what set of axioms you select as the ultimate bedrock of all mathematics, there will always be theorems—perfectly legitimate theorems—that you can never deduce from the axioms. Contrary to all expectations, the perfect world of mathematics is plagued by undecidable theorems—things which are true but which can never be proved to be true by logical, rational reasoning.
Gödel proved his result in the most ingenious way. He managed to embed in arithmetic—one of the most basic fields of mathematics—the self-referential declaration "this statement is unprovable". Since this required him to make a piece of arithmetic actually refer to itself, it was an immensely difficult task. However, by embedding the troublesome statement in arithmetic, Gödel had buried an atomic bomb in the very fabric of mathematics. "This statement is unprovable" is, after all, the liar's paradox in another guise. If it is false, mathematics admits false statements that can be proved—it is inconsistent. If it is true, it admits true statements that can never be proved—it is incomplete.

Incompleteness is very bad for mathematics but inconsistency is truly terrible. False statements would be like a plague of moths gnawing at its very fabric. There was no choice for mathematicians but to accept the lesser of Gödel's two evils. Mathematics must be terminally incomplete. To everyone's profound shock, it contained theorems that could never be proved to be true.

"All theorems rest on premises," declared Aristotle. Gödel's incompleteness theorem shows that the great man was sorely mistaken. High above the mathematical bedrock there are pieces of mathematical scaffolding floating impossibly in mid-air.
The obvious way to reach these free-floating theorems is by building up the bedrock—that is, adding more axioms. However, this will not help. According to Gödel's incompleteness theorem, no matter how many axioms are added, there will always be theorems floating in the sky, perpetually out of reach. There will always be theorems that are true but that can never be proved to be true, at least by logical, rational reasoning.

To say that Gödel's discovery was deeply distressing to mathematicians is a bit of an understatement. As pointed out, mathematicians had believed mathematics was a realm of certain truths, far from the messy uncertainty of everyday life. This is precisely what had attracted many of them to the field in the first place. But, contrary to expectations, mathematics turned out to be a realm where many things are up in the air, many things are messy, many things are uncertain.8 Some mathematicians could not hide
their despair at this unhappy revelation. "Gödel's result has been a constant drain on my enthusiasm and determination," wrote Hermann Weyl.

No matter how unpalatable it might be, however, Gödel's result was incontrovertible. Mathematicians had no choice but to get used to it—even to revere it. Many now consider the publication of Gödel's incompleteness theorem to be the most significant event in the whole of 20th-century mathematics. "Gödel's incompleteness theorem has the same scientific status as Einstein's principle of relativity, Heisenberg's uncertainty principle and Watson and Crick's double helix model of DNA," says Calude.
But, if things were bad in the world of mathematics after Gödel discovered incompleteness, they got a whole lot worse five years later. That was when Turing discovered uncomputability.9 Not only did mathematics contain things that were undecidable, it also contained things, such as the halting problem, that were uncomputable.

Undecidability is in fact deeply connected to uncomputability.10 Not only that, but both undecidability and uncomputability are also deeply connected to Chaitin's idea that the complexity of a number is synonymous with the shortest program that can generate the number.

This is not obvious at all. However, recall that Omega is the ultimate in irreducible information. This means it cannot be generated by a program shorter than itself, which is the same as saying it cannot be compressed into a shorter string of bits than itself. Now think of one of those free-floating theorems that Gödel discovered are an inevitable feature of mathematics. It cannot be reached by logical deduction from any axioms, which is the same as saying it cannot be deduced from any principles simpler than itself,
8 … like the position and speed of atoms. This uncertainty was quantified in Heisenberg's uncertainty principle. If Gödel dropped a bombshell in mathematics, Heisenberg can be said to have dropped one in physics.
9 The 29 March 1999 issue of Time magazine included both Gödel and Turing among their 20 greatest scientists and thinkers of the 20th century.
10 In fact, undecidability is a consequence of uncomputability. If it were always possible to start with some axioms and prove that a given program halts or that it never does, that would give you a way to compute in advance whether a program halts or not. How? You simply run through all possible proofs, starting with the simplest ones, checking which ones are correct, until either you find a proof that the program will halt eventually or you find a proof that it is never going to halt. Since Turing showed that computing in advance whether or not a program will halt is impossible, it follows that this procedure too is impossible. It follows that there must be proofs—such as the proof that a given program will halt—that cannot be found by this logical, step-by-step process. In other words, there are proofs that cannot be deduced from any conceivable axioms, and mathematics is incomplete.
could be written on the back of a postage stamp. "From the point of view of AIT, the search for the Theory of Everything is the quest for an ultimate compression of the world," says Chaitin.

"The most incomprehensible thing about the universe," Einstein famously said, "is that it is comprehensible." Chaitin, who equates comprehension with compression, would rephrase this: the most incomprehensible thing about the Universe is that it is compressible. This feature of the world is the reason we have been able to divine universal laws of nature, which apply in all places and at all times—laws which have enabled us to build computers and nuclear reactors and gain some degree of mastery over nature.

To Chaitin the compressibility of the Universe is a wonder. "For some reason, God employed the least amount of stuff to build the world," he says. "For some reason, the laws of physics are as simple and beautiful as they can be and allow us, intelligent beings, to evolve." This is a modern version of something noted by Leibniz: God has chosen the most perfect world, he wrote, the one which is "the most simple in hypotheses and the most rich in phenomena."

Though we do not know why the laws underpinning the Universe are simple, the faith that they are is a powerful driving force of science. According to Feynman: "It is possible to know when you are right way ahead of checking all the consequences. Truth is recognisable by its beauty and simplicity."
19.4. Randomness
Back to Gödel. Although he had shocked and depressed mathematicians by showing that mathematics contains theorems which are undecidable, surprisingly his result did not make any difference to the day-to-day doing of mathematics. Weyl's pessimism was misplaced. "Mathematicians, in their everyday work, simply do not come across results that state that they themselves are unprovable," says Chaitin. "Consequently, the places in mathematics where you get into trouble seem too remote, too strange, too atypical to matter."

A more serious worry was Turing's demonstration that there are things in the world which are completely uncomputable. This is a very concrete result. It refers, after all, to computer programs, which actually calculate
things. On the other hand, the program Turing considered merely tried to figure out whether another program halts or not. It is hardly typical of today's computer programs, which carry out word processing or surf the Internet. Not surprisingly, therefore, none of these programs turn out to be undermined in any discernible way by the uncomputability of the halting problem.

It would seem that uncomputability and undecidability are too esoteric to bother about, that they can be swept under the carpet and safely forgotten about. This is indeed how it appeared for a long while. All was tranquil and quiet in the garden of pure mathematics. But then the gate squeaked on its rusty hinges and in walked Chaitin.
From the time he had been a teenager, Chaitin had been convinced that Gödel's and Turing's results had far more serious implications for mathematics than anyone guessed. And he had resolved to find out what those implications were. It was this quest that had led him to invent AIT.

AIT is of course founded on the idea that the complexity, or information content, of a number is synonymous with the shortest computer program that can generate the number. However, at the core of AIT—just like at the core of Turing's and Gödel's work—there is a paradox. It is actually impossible to ever be sure you have found the shortest possible program!11

A shortest program, of course, exists. But this is not the point. The point is that, although it exists, you can never be sure you have found it. Determining whether you have turns out to be an uncomputable problem. AIT is founded on uncomputability. The whole field is as riddled with holes as a Swiss cheese. Uncomputability in fact follows from AIT.12
And so does Gödel's incompleteness theorem. This turns out to be
11 Suppose there is a program that can decide whether a given program, p, is the shortest possible program capable of producing a given output. Now consider a program, P, whose output is the output of the smallest program, p, bigger than P that is capable of producing the given output. But P is too small a program to produce the same output as p. There is a contradiction! Therefore, an algorithm for deciding if a program p is as small as possible cannot exist.
12 If you could always decide in advance whether a program halts or not, you could systematically check whether each small program halts or not and, if it does halt, run it and see what it computes, until you find the smallest program that computes a given number. But this would contradict Chaitin's result that you cannot ever be sure you have the smallest program for generating a given number. Consequently, there can be no general solution to the halting problem. It is uncomputable.
equivalent to the fact that it is impossible to prove that a sequence of digits is
incompressible, that is, that the shortest program has been found. "Everywhere
you turn in my theory you find incompleteness," says Chaitin. "Why? Because
the very first question you ask in my theory gets you into trouble. You measure
the complexity of something by the size of the smallest computer program for
calculating it. But how can you be sure that what you have is the smallest
computer program possible? The answer is that you can't!"
In Chaitin's AIT, undecidability and uncomputability take centre stage.
Most mathematical problems turn out to be uncomputable. Most mathematical
questions are not, even in principle, decidable. "Incompleteness doesn't just
happen in very unusual, pathological circumstances, as many people believed,"
says Chaitin. "My discovery is that its tendrils are everywhere."
In mathematics, the usual assumption is that, if something is true, it is
true for a reason. The reason something is true is called a proof, and the
object of mathematics is to find proofs, to find the reason things are true.
But the bits of Omega, AIT's crowning jewel, are random. Omega cannot be
reduced to anything smaller than itself. Its 0s and 1s are like mathematical
theorems that cannot be reduced or compressed down to simpler axioms. They
are like bits of scaffolding floating in mid-air high above the axiomatic bedrock.
They are like theorems which are true for no reason, true entirely by accident.
They are random truths. "I have shown that God not only plays dice in physics
but even in pure mathematics!" says Chaitin.
Chaitin has shown that Gödel's and Turing's results were just the tip of
the iceberg. Most of mathematics is composed of random truths. "In a nutshell,
Gödel discovered incompleteness, Turing discovered uncomputability, and I
discovered randomness; that's the amazing fact that some mathematical
statements are true for no reason, they're true by accident," says Chaitin.
Randomness is the key new idea. "Randomness is where reason stops, it's
a statement that things are accidental, meaningless, unpredictable and happen
for no reason," says Chaitin.
Chaitin has even found places where randomness crops up in the very
foundation of pure mathematics: number theory. "If randomness is even in
something as basic as number theory, where else is it?" says Chaitin. "My
hunch is it's everywhere."
Chaitin sees the mathematics which mathematicians have discovered so
far as confined to a chain of small islands. On each of the islands are provable
truths, the things which are true for a reason. For instance, on one island
there are algebraic truths and arithmetic truths and calculus. And everything
on each island is connected to everything else by threads of logic, so it is
possible to get from one thing to another simply by applying reason. However,
the island chain is lost in an unimaginably vast ocean. The ocean is filled with
random truths, theorems disconnected forever from everything else, tiny atoms
of mathematical truth.
Chaitin thinks that the Goldbach conjecture, which has stubbornly defied
all attempts to prove it true or false, may be just such a random truth. We
just happened to have stumbled on it by accident. If he is right, it will never
be proved right or wrong. There will be no way to deduce it from any
conceivable set of axioms. Sooner or later, in fact, the Goldbach conjecture
will have to be accepted as a shiny new axiom in its own right, a tiny atom
plucked from the vast ocean of random truths.
In this context, Calude asks an intriguing question: "Is the existence of
God an axiom or a theorem?!"
Chaitin is saying that the mathematical universe has infinite complexity
and is therefore not fully comprehensible to human beings. "There's this vast
world of mathematical truth out there, an infinite amount of information, but
any given set of axioms only captures a tiny, finite amount of this information,"
says Chaitin. This, in a nutshell, is why Gödel's incompleteness is natural and
inevitable rather than mysterious and complicated.
Not surprisingly, the idea that, in some areas of mathematics, mathematical
truth is completely random, unstructured, patternless and incomprehensible,
is deeply upsetting to mathematicians. Some might close their eyes and view
randomness as a cancer eating away at mathematics which they would rather
not look at, but Chaitin thinks that it is about time people opened their eyes.
And rather than seeing it as bad, he sees it as good. "Randomness is the real
foundation of mathematics," he says. It is a rev-
Chapter 20
Omega Numbers
Jean-Paul Delahaye1
Laboratoire d'Informatique Fondamentale de Lille
Centre National de la Recherche Scientifique (CNRS)
Université des Sciences et Technologies de Lille, France;
[email protected]
Omega numbers are disconcerting: they are both well defined and
uncomputable. Yet the closer you look, the more remarkable they appear.
1 This paper first appeared in French in Pour la Science, May 2002, no. 295, pp. 98-103.
Republished with the kind permission of Pour la Science.
Let us proceed step by step through the universe of real numbers, to examine
the definition and properties of omega numbers and to get a feeling for all
their eccentricities.
Real numbers are numbers that, written in base 10, for example, can
continue indefinitely (such as e = 2.7182818284590...). Those that cannot
continue indefinitely (like the famous 6.55957, the value of a euro in French
francs at the launch of the single currency in 2002) are decimal fractions. Of
those that do go on indefinitely, some do it in a periodic manner, for instance
24/110 = 0.21818181818... In addition, numbers that become periodic at a
certain point in their expansion are the quotients of two whole numbers (called
rational numbers). Irrational numbers such as √2 are not quotients of whole
numbers, and their decimals are not periodic.
Because the number of decimal places for real numbers is infinite, they
subtly introduce logical difficulties greater than we can imagine, and these
difficulties are shared by the omega numbers. Let us consider them.
First, the infinite number of decimal places means that real numbers are
not countable: there is no numbering r0, r1, ..., rn, ... that constitutes a
complete list of the real numbers (we have Cantor to thank for this proof).
Thus, the set of real numbers constitutes an infinite set larger than the infinite
sets of whole numbers and rational numbers (since, thanks again to Cantor,
the set of rational numbers is the same size as the set of whole numbers).
A sometimes neglected consequence of this uncountability is that we
cannot compute all the real numbers: computable real numbers are by
definition those for which there is a computer program that, allowed to run
indefinitely, produces the digits of the number one after the other. But the
number of programs is denumerable: we can, for example, count all of them if
we consider the sets of programs that have 1, 2, 3, ..., n, n + 1, ... symbols.
Each such set contains a finite number of programs, and accordingly the entire
set of all programs is countable. We will use this numbering of programs later.
It follows that there are not enough programs to compute all the real numbers.
Of course, every rational number is computable, as are the familiar constants
of classical mathematics: π, e, log 2, e + π, sin(1), and so on. In each case,
their definition (for example, the series e = 1 + 1/1! + 1/2! + 1/3! + ...)
dictates the
c := 1;
while c > 0 do c := c + 1;
end
A program faces only one alternative: once launched, either it stops after a
finite time, or it runs forever.
Let us draw up the list of all the possible programs P0, P1, ..., Pn, ...,
written, for example, in Java (a ubiquitous programming language), and
classify them by size as described above. Now let us consider the real number
whose decimal expansion is τ = 0.a0a1...an..., where each an equals 1 if the
program Pn stops, and 0 if it continues indefinitely.
The undecidability of the halting problem (it is impossible to write a
program A that, examining any other program, here the program Pn, returns
in finite time whether Pn halts or whether it runs indefinitely), demonstrated
in 1936 by Turing, implies that the real number τ is not computable.
Thus certain numbers, such as τ, are not computable, but they are known,
for they can be defined without the least ambiguity. Yet they are unknowable,
because no program can produce their string of digits. That is the way the
mathematical world works: some of its numbers can be seen (defined), but
not touched (computed).
Yet as soon as n becomes sufficiently large, neither of these two statements
can be proven. Briefly stated, if Ω is a Chaitin number, its digits, except for
a finite number of them, are undecidable.
A hierarchy of incalculabilities
René Daumal (a French poet, 1908-1944) imagined Mount Analogue, a
mysterious mountain symbolizing research, whose summit is by definition
unreachable despite an accessible base. The various numbers evoked in this
article are all comparable to the summit of Mount Analogue. Rational numbers
are computable and periodic, but their decimal places are infinite.
Transcendental numbers like π and e are computable: we can never know all
their digits, but the difference between these numbers and their approximations
is as small as one could wish.
The number τ, whose bits are 1 when the program associated with its nth
digit halts, and 0 when it doesn't. We know how to compute an infinite number
of these bits, but an infinity of others are unknowable.
The Chaitin Ω numbers, which indicate the probability that a program running
on a universal machine will come to a halt: we know how to compute only a
finite number of their digits.
The Solovay numbers, for which we cannot compute a single digit, although
the numbers are well defined.
Figure 1.
A mandala
The symbol Ω is repeated four times in the structure of this graphic symbol
of the universe, called a mandala. The Ω symbol appears in many other
mandalas, and represents a fitting artistic rendition of the omega numbers.
Figure 2.
obtain several new digits in going from x_n to x_{n+1}. Other approaches can
be slower. Number crunchers of course prefer more rapid techniques; and
you could say that, faced with mathematical constants, their skill consists
in inventing fast methods of convergence. When you are dealing with a
Chaitin omega, you know definitely and absolutely that this approach is
useless. Not only will no method be fast, but no means of approximation will enable you to know how rapidly it can supply bits for the omega
constant computed.
20.6. The properties of omega numbers
It is possible to imagine all the universal machines and all the omega
numbers associated with them. The infinite class of Chaitin omega numbers is
thus countable, as is that of Solovay omega numbers. Moreover, we know a lot
about them. There is no paradox in the fact that it is possible to demonstrate
specific properties of the omega numbers (including the Solovay omega
numbers), even though their bits are not computable. In the real world, to
have access to general knowledge (for example, that the average weight of
Americans is greater than the average weight of Europeans) you have to gather
detailed information. In the mathematical world, that is not always the case:
it is possible to know something general about a number Ω_U (for example,
that the frequency of 1s and 0s is the same in Ω_U's binary notation) and at
the same time not know a single specific bit of Ω_U. Yet another mathematical
enigma!
Here are some of the known properties of omega numbers:
All omega numbers are irrational and transcendental (no polynomial
equation with whole-number coefficients has an omega number as a solution).
The decimals of all omega numbers are uniformly distributed: the sequence
of their digits in base 10 carries a tenth of 0s, a tenth of 1s, ..., a tenth of 9s,
and there is an analogous property in every other numbering system.
Each omega number is a universal number in each base: every finite
sequence of digits is present in it. One could even say that each omega number
contains every finite sequence of n decimal digits with a frequency of 10^(-n)
(of course, there is an analogous property in all numbering systems).
Consequently, for all omega numbers, we know that somewhere there is a
series of a billion consecutive 0s (nothing
like that has been demonstrated for constants such as π and e).
All omega numbers are random in the strictest mathematical sense (the
technical term is Martin-Löf random, in honor of the Swedish mathematician
who introduced this concept in 1966). That implies, in particular, that (a) a
program for predicting the nth bit of an omega number based on the n - 1
initial bits can never do better than chance; (b) if we extract a subsequence of
the sequence of digits of some omega number using an algorithm (for example,
by retaining only the digits whose position number is a prime), this sequence
will be that of the digits of an irrational, transcendental, uniformly distributed,
random number, and in fact even of another Chaitin omega number.
All omega numbers are incompressible. Specifically, for each omega
number Ω_U there is a constant c such that the shortest program giving the
first n bits of Ω_U is at least as long as n - c (it is not possible to save more
than c bits of information when attempting to compress an initial segment of
Ω_U).
All omega numbers are uncomputable, and yet each is the limit of a
computable increasing sequence of rational numbers (they are said to be
approximable; see Figure 4 and the sketch following this list). This convergence
is slower than the convergence of any computable sequence of rational numbers
to a computable number.
An omega number can begin with any finite string of digits. Thus there is
an omega number that begins with 3.14, another that begins with 3.1415,
another that begins with 3.141592, and so on. Note, however, that the universal
machines that have those particular numbers as their omega numbers will be
artificially constructed.
If the sum of two omega numbers is less than 1, the sum is an omega
number; likewise for the product (these elegant properties do not hold for
irrational numbers or for transcendental numbers: for example,
π/4 + (2 - π/4) = 2).
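To make the approximation property concrete, here is a minimal Python sketch of the standard lower-bound construction: at stage n, add 2^(-|p|) for every program p of length at most n that has been seen to halt within n steps. The halting test below is only a toy, prefix-free placeholder (the sole halting programs are taken to be 1, 01, 001, ...); a genuine omega number would require actually running self-delimiting programs on a real universal machine.

    from fractions import Fraction

    def halts_within(program_bits, steps):
        # Placeholder halting test.  On a genuine prefix-free universal machine
        # this would run the self-delimiting program for at most `steps` steps.
        # Toy rule used here: the only halting programs are "1", "01", "001", ...
        # (k zeros followed by a single 1); such a program "halts" after k steps.
        if program_bits.count("1") != 1 or not program_bits.endswith("1"):
            return False
        return len(program_bits) - 1 <= steps

    def omega_stage(n):
        # Stage-n lower bound: add 2^(-|p|) for every program p of length <= n
        # that is seen to halt within n steps.  For a real universal machine these
        # rationals increase with n and converge to Omega from below, but with no
        # computable bound on how fast.
        total = Fraction(0)
        for length in range(1, n + 1):
            for code in range(2 ** length):
                p = format(code, "b").zfill(length)
                if halts_within(p, n):
                    total += Fraction(1, 2 ** length)
        return total

    for n in (2, 4, 8):
        print(n, omega_stage(n))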
example, that associated with the Java language that runs your computer).
A conjecture such as "Every even number greater than 2 can be written as
the sum of two primes" (Goldbach's conjecture) is essentially equivalent to a
program searching endlessly for a counterexample, a program that is only
several hundred bits long. All conjectures of the form "ZFC enables a proof of
P", where P is a fairly short statement, could also be resolved (theoretically)
by knowing a few hundred omega bits of a natural universal machine.
Thus omega numbers not only distill information about halting programs;
they are also concentrates of mathematical information.
To console us for the fact that we will never know even 1,000 bits of a
natural omega number, we can tell ourselves that extracting information from
omega numbers is a finite but incredibly long job (hence my use of
"theoretically" in the preceding paragraph). Consequently, even if you knew
1,000 bits of a natural Chaitin omega number, you could never really use
them. As Martin Gardner and Charles Bennett have written: "Omega is in
many senses a cabalistic number. It can be known of through human reason,
but not known. To know it in detail, one must accept its uncomputable
sequence of digits on faith, like words of a sacred text."
References
[1] C. H. Bennett and M. Gardner, The random number omega bids fair to
hold the mysteries of the universe, Scientific American, vol. 241, pp. 20-34,
1979.
[2] C. S. Calude, Chaitin Ω numbers, Solovay machines, and incompleteness,
Theoretical Computer Science, vol. 284, pp. 269-277, 2002. See
http://citeseer.ist.psu.edu/calude99chaitin.html for a coherent and exhaustive
treatment.
[3] J. P. Delahaye, Information, complexité et hasard, 2nd edn., Hermès Science
Publications, Paris, 1999.
[4] J. P. Delahaye, L'Intelligence et le calcul, Pour la Science, Belin, Paris,
2002.
[5] R. M. Solovay, A version of Ω for which ZFC cannot predict a single bit, in
C. S. Calude, G. Păun (eds.), Finite Versus Infinite: Contributions to an
Eternal Dilemma, pp. 323-334, Springer, London, 2000.
Chapter 21
Chaitin and Civilization 2.0
Tor Nørretranders
Strandvejen 413, DK-2930 Klampenborg, Denmark; tor@tor.dk
A short story on why stories have to be long: why Chaitin did not meet
Gödel or Leibniz or Plato during his random walk of life. Why he is to
become the hero of a new age, the Link Age, where everything is being
connected to everything else.
that. We might have hoped for purity and majestic harmony in math and pure
science. But no! Gödel has shown us that this is not the way it is, at least in
some extremely elegantly thought-out cases that he studied back in 1931. Greg
Chaitin has shown us that this result is not a weird detail, but the way things
are, or at least the way our descriptions are. It is the same wisdom that we find
in quantum physics: we can never give a description that is at the same time
complete and without contradiction. The Danish physicist Niels Bohr coined
the phrase "complementarity" to describe the fact that when it comes to
electrons and other sub-atomic entities, we have to use more than one concept
to catch all of their behaviour (like both the concept of waves and that of
particles). Yet at the same time we have to accept that these two concepts
contradict each other. We need them both, but they exclude each other.
Quantum mechanics and Gödel's proof are certainly not the same thing,
but they are more related than we usually see them. Epistemologically, they
tell the same story: any description is incomplete, we need more than one
description to grasp anything, but these different descriptions will never be
united, because they contradict each other.
And here was a guy with a red baseball cap walking a mile-long sandy
beach in the summer heat telling me about snow. "Snow is fantastic! Cross-country
skiing is just like a mathematical abstraction! There is only the blue sky and the
white, snow-covered mountain," explained Chaitin, telling me how a common
friend had taught him to ski. The snow-covered land is like pure math: the
contours and general outline of the landscape are there, but all the messy
details like bushes and small trenches and streams are evened out by the deep
snow cover. Snow abstracts away the landscape. Chaitin likes that. But he
didn't like the snow 35 years ago. It meant that he never met Gödel, even
though he did have an appointment.
Gödel's secretary called and said that Gödel was worried about his health
and wouldn't go out into the snow. Chaitin was to take the train from New
York to Princeton to meet the master. But there was snow. Soon after, he
had to leave for Buenos Aires, where his parents worked for the UN and he
worked for IBM.
That was the shortest answer Chaitin could give to the question of whether
he ever met Gödel. There was no simpler structure to the story; it was a chain
of coincidences, not structured in a simpler way.
take the simple structures and use them again and again, according to the
laws for their behaviour, and we can get the original random mess back! In
that case we feel we understand: we can take mess, reduce it to simplicity, do
some push-ups with the simplicity and get the mess back.
Let's translate that into randomness and order: we take randomness, find
some structure in it, manipulate the structure and get randomness back.
Let's translate it into science: we have reality (what a mess!) and we
extract simple principles, manipulate them and get the mess back.
Let's translate it into epistemology: we have a world with no structure,
we build structures in our head, and we can reconstruct the mess.
Chaitin meets Leibniz.
21.5. Abstractions?
Plato, the old Greek philosopher, insisted that there were ideas behind the
phenomena: all existing horses were the incarnation of the principle of the
horse. Ideas before phenomena. That is, order before randomness.
But Plato was wrong. The world is random and there are only approximate
concepts.
Or is it so? Most mathematicians (but not Chaitin) are closet Platonists.
They think mathematical objects exist before the messy and lowly world in
which we exist.
I am arguing here that the mess is the starting point; order evolved. Some
would argue that order was designed and came first.
But perhaps it was in fact designed: 10,000 years ago humans started
doing agriculture. Before that we lived a rich and long life as hunter-gatherers,
collecting a huge variety of self-grown, wild plants and fruits and hunting
down self-grown animals and fish.
But then, after a climate disaster at the end of the last Ice Age, ocean
water levels rose by 100 meters (!) and something new had to happen.
Agriculture became the answer.
Agriculture is dull: no longer was there an almost infinite variety of plants
and animals (with hundreds of species being caught or collected every day).
It was all reduced to a few high-yielding plants and domesticated animals.
Look at the field and say "Wheat!" You don't really have to say more.
Look at the wilderness and it will take you a long time to describe it.
Agriculture introduces the abstraction. The idea of wheat is in fact
primary to the wheat field. First was the idea, then came the reality. Idea
before phenomenon. Agriculture.
21.6. The Link Age
But now water levels are rising again and we have to rethink civilisation;
we have to invent a Civilisation 2.0, as I call it. We are entering what I call
The Link Age, the era of network links and everything being linked to
everything else. We are also leaving the starch-producing agricultural era of
a few grasses grown in huge quantities (grasses like wheat, barley, rye, rice,
corn, sugar cane, etc.). 10,000 years ago, a rich variety of different individual
plants in the self-grown wilderness of the hunter-gatherer culture was
transformed into a new reality of stereotypes of cultivated land.
Now, however, Civilisation 2.0 is on its way. It is being created these days,
by mostly unknowing windmill farmers and Web 2.0 collectives of co-operative
software producers. The wilderness, the distributed control and the sunshine
are coming back.
Agriculture created Civilisation 1.0 with all its real abstractions like the
wheat field or the chickens in the barn.
We are now seeing the advent of a Civilisation 2.0 based on the link as
the atom: networks, peer-to-peer on the net, social software, social technology,
keeping track of your relationships digitally; renewable energy, recycling,
environment. The age of the link, as opposed to the age of atoms and
substance. Not things but relationships between things; not individual people,
but links between people. The Link Age.
Not hierarchies, not central control, not lifeguards, but Web 2.0-style
spontaneous collaboration. High information content, mediated by machines.
Green and high tech at the same time.
Civilisation 2.0 is arising these days. The world is slowly understanding
the severity of the climate crisis and the fantastic opportunities offered by the
internet and a new epistemology. Randomness will reign again.
Civilisation 2.0 will have theoretical heroes. Greg Chaitin will be the
leading one. He discovered the nature of randomness, non-order, non-control,
the world as it is in all its richness. Chaitin transcended the logic of the
agricultural mind, without even trying. He just did it. He didn't know he did
it. He didn't even try. We should all be thankful. You could not give a shorter
version of the story than telling about all that Chaitin did. There was no
plan. His career has not been centrally organised and planned. It was a
crooked, zigzag way, information-rich and random. It
was what it was. Never close to the lifeguard. High risk. High importance.
Greg Chaitin's time is just about to come.
Congratulations, everyone.
Chapter 22
Some Modern Perspectives on the Quest for Ultimate
Knowledge
Stephen Wolfram
Wolfram Research
Dedicated to Gregory Chaitin on the occasion of his sixtieth birthday,
these remarks attempt to capture some of the kinds of topics we have
discussed over the course of many enjoyable hours and days during the
past twenty-five years.
The spectacular growth of human knowledge is perhaps the single greatest
achievement in the history of civilization. But will we ever know everything?
Three centuries ago, Gottfried Leibniz had a plan. In the tradition of Aristotle,
but informed by two more millennia of mathematical development, he wanted
to collect and codify all human knowledge, then formalize it so that everything
one could ever want to know could be derived by essentially mathematical
means. He even imagined that by forming all possible combinations of
statements, one could systematically generate all possible knowledge. So what
happened to this plan? And will it, or anything like it, ever be achieved?
Two major things went wrong with Leibniz's original idea. The first was
that human knowledge turned out to be a lot more difficult to formalize than
he expected. And the second was that it became clear that reducing things to
mathematics wasn't enough. Gödel's Theorem, in particular, got in the way,
and showed that even if one could formulate something in terms of
mathematics, there might still be no procedure for figuring out whether it
was true.
Of course, some things have gone better than Leibniz might have imagined.
A notable example is that it's become clear that all forms of information, not
just words, can be encoded in a uniform digital way.
Picture 1
Picture 2
Is there a way to shortcut what is happening here, to find the outcome
without explicitly following each step? In the first picture above, it's obvious
that there is. But in the second picture, I don't think there is. I think what is
happening here is a fundamentally computationally irreducible process.
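As a rough illustration of the contrast, the following Python sketch grows two elementary cellular automata from a single black cell. The specific rules, 254 and 30, are stand-ins chosen for illustration rather than the rules behind the two pictures: rule 254 just fills in a uniform triangle, whose outcome can be written down without running it, while rule 30 produces the kind of intricate, random-looking pattern being described here as computationally irreducible.

    def eca_step(cells, rule):
        # One synchronous update of an elementary cellular automaton.
        # `rule` is the Wolfram rule number (0-255); `cells` is a list of 0/1
        # values, treated as having permanent 0 cells beyond both ends.
        table = [(rule >> i) & 1 for i in range(8)]
        padded = [0] + cells + [0]
        return [table[(padded[i - 1] << 2) | (padded[i] << 1) | padded[i + 1]]
                for i in range(1, len(padded) - 1)]

    def show(rule, steps=16):
        width = 2 * steps + 1
        cells = [0] * width
        cells[steps] = 1                    # start from a single black cell
        print("rule", rule)
        for _ in range(steps):
            print("".join("#" if c else "." for c in cells))
            cells = eca_step(cells, rule)

    show(254)   # simple filled triangle: the outcome is easy to shortcut
    show(30)    # intricate, random-looking growth: no apparent shortcut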
If one traces each step explicitly, there is no problem working out what
will happen. But the point is that there is no general shortcut: no way to find
the outcome without doing essentially as much work as the system itself.
How can this be? We might have thought that as our methods of
mathematics and science got better, we would always be able to do
progressively better. But in a sense what that would mean is that we, as
computational systems, must always be able to become progressively more
powerful. And this is where universal computation comes in. Because what it
shows is that there is an upper limit to computational capability: once one is
a universal computer, one can't go any further. Because as a universal computer,
one can already emulate anything any other system can do. So if the system
one's looking at is a universal computer, it's inevitable that one can't find a
shortcut for what it does.
But the question for science, and for knowledge in general, is how often
the systems one's looking at are universal, and really behave in computationally
sophisticated ways.
The traditional successes of the exact sciences are about cases where the
systems one's looking at are computationally quite simple. And that's precisely
why traditional science has been able to do what it does with them. They're
computationally reducible, and so what the science has done is to find
reductions: to find things like exact formulas that give the outcome without
working through the steps.
But the reason I think science hasn't been able to make more progress is
precisely because there are lots of systems that aren't computationally
reducible. There are lots of systems that can, and do, perform sophisticated
computations. And that are universal. And that are just as computationally
sophisticated as any of the methods we're able to use to analyze them. So they
inevitably seem to us complex, and we can't work out what they will do except
with an irreducible amount of computational work.
It's a very interesting question of basic science just how ubiquitous
computational irreducibility really is. It's a little confusing for us, because
we're so used to concentrating on cases that happen to be computationally
reducible. Most of our existing engineering is built on systems that happen to
behave in computationally reducible ways, so that we can readily work out
what they'll do. Biological evolution, as well, tends to have an easier time
dealing with computationally reducible systems. And we as humans
what broader class of things still satisfy some particular favorite theorem.
So that's how one goes from integers to real numbers, complex numbers,
matrices, quaternions, and so on. But inevitably it's the kind of generalization
that still lets theorems be proved. And it's not reaching anything like all the
kinds of questions that could be asked, or that one would find just by
systematically enumerating possible questions.
One knows that there are lots of famous unsolved problems in mathematics.
Particularly in areas like number theory, where it's a bit easier to formulate
possible questions. But somehow there's always been optimism that as the
centuries go by, more and more of the unsolved problems will triumphantly
be solved.
I doubt it. I actually suspect that we're fairly close to the edge of what's
possible in mathematics. And that quite close at hand, and already in the
current inventory of unsolved problems, are plenty of undecidable questions.
Mathematics has tended to be rather like engineering: one constructs things
only when one can foresee how they will work. But that doesn't mean that
that's everything that's there. And from what I've seen in studying the
computational universe, my intuition is that the limits to mathematical
knowledge are close at hand, and can successfully be avoided only by carefully
limiting the scope of mathematics.
In mathematics there has been a great emphasis on finding broad methods
that in effect define whole swaths of computational reducibility. But the point
is that computational reducibility is in many ways the exception, not the rule.
So instead, one must investigate mathematics by studying, in more specific
terms, what particular systems do.
Sometimes it is argued that one can see the generality of mathematics by
the way in which it successfully captures what is needed in natural science.
But the only reason for this, I believe, is that natural science has been limited
too: in effect, to just those kinds of phenomena that can successfully be
captured by traditional mathematics!
Sometimes it is also said that, yes, there are many other questions that
mathematics could study, but those questions would not be interesting. But
really, what this is saying is just that those questions would not fit into the
existing cultural framework of mathematics. And indeed this is precisely why,
to use the title of my book, one needs a new kind of science to provide the
framework. And to see how the questions relate to questions of undeniable
practical interest in natural science and technology.
But OK, one can argue about what might or might not count as
mathematics. But in physics, it seems a bit more clear-cut. Physics should be
about how our universe works.
So the obvious question is: do we have a fundamental theory? Do we have
a theory that tells us exactly how our universe works?
Well, physics has progressed a long way. But we still don't have a
fundamental theory. Will we ever have one? I think we will. And perhaps even
soon.
For the last little while, it hasn't looked promising. In the nineteenth
century, it looked like everything was getting wrapped up, just with mechanics,
electromagnetism and gravity. Then there were little cracks. They ended up
showing us quantum mechanics. Then quantum field theory. And so on. In
fact, at every stage when it looked like everything was wrapped up, there'd
be some little problem that ended up not being so little, and inevitably making
our theory of physics more complicated.
And that's made people tend to think that there just can't be a simple
fundamental theory. That somehow physics is a bottomless pit.
Well, again, from studying the computational universe, my intuition has
ended up being rather different. Because I've seen so many cases where simple
rules end up generating immensely rich and complex behavior. And that's
made me think it's not nearly so implausible that our whole universe could
come from a simple rule.
It's a big question, though, just how simple the rule might be. Is it like
one or two lines of Mathematica code? Or a hundred? Or a thousand?
We've got some reason to believe that it's not incredibly complicated,
because in a sense then there wouldn't be any order in the universe: every
particle would get to use a different part of the rule and do different things.
But is it simple enough that, say, we could search for it? I don't know.
And I haven't figured out any fundamental basis for knowing. But it's certainly
not obvious that our universe isn't quite easy to find out in the computational
universe of possible universes. There are lots of technical issues. If there's a
simple rule for the universe, it, in a sense, can't have anything familiar already
built in. There just isn't room in the rule to, say, have a parameter for the
number of dimensions of space, or the mass of the electron. Everything has
to emerge. And that means the rule has to be about very abstract things. In
a sense below space, below time, and so on.
But I've got, I think, some decent ideas about ways to represent those
various abstract possible rules for universes. And I've been able to do a little
bit of universe hunting.
But, well, one quickly runs into a fundamental issue. Given a candidate
universe, it's often very obvious what it's like. Perhaps it has no notion of
time. Or some trivial exponential structure for all of space. Stuff that makes
it easy to reject as not being our universe. But then, quite quickly, one runs
into candidate universes that do very complicated things. And where it's really
hard to tell if they're our universe or not. As a practical matter, what one has
to do is in a sense to recapitulate the whole history of physics: to take a
universe, and by doing experiments and theory, work out the effective physical
laws that govern it. But one has to do this automatically, and quickly. And
there's a fundamental problem: computational irreducibility.
It's possible that in my inventory of candidate universes is our very own
universe. But we haven't been able to tell. Because going from that underlying
rule to the final behavior requires an irreducible amount of computational
work.
The only hope is that there are enough pieces of computational reducibility
to be able to tell whether what we have actually is our universe. It's a peculiar
situation: we could in a sense already have ultimate knowledge about our
universe, yet not know it.
One thing that often comes up in physics is the idea that somehow
eventually one can't ever know anything with definiteness: there always have
to be probabilities involved. Well, usually when one introduces probabilities
into a model, it's just a way to represent the fact that there's something missing
in the model, something one doesn't know about, and is just going to assume
is random. In quantum theory, probabilities get elevated to something more
fundamental, and we're supposed to believe that there can never be definite
predictions for what will happen. Somehow that fits with some people's beliefs.
But I don't think it scientifically has to be true. There are all kinds of technical
things, like Bell's inequality violations, that have convinced people that this
probabilistic idea is real. But actually there are technical loopholes, which I
increasingly think are what's actually going on. And in fact, I think it's likely
that there really is just a single, definite rule for our universe. That in a sense
deterministically specifies
molecules in materials.
So it'll be a peculiar picture: the computations we want to do happening
down at an atomic scale. With electrons whizzing around, pretty much just
like they do anyway in any material. But somehow in a pattern that is
meaningful with respect to the computations we want to do.
Now perhaps a lot of the time we may want to do pure computation, in
a sense just purely think. But sometimes we'll want to interact with the world,
find out knowledge from the world. And this is where our sensors come in.
But if they too are operating at an atomic scale, it'll be just as if some clump
of atoms somewhere in a material is affecting some clump of atoms somewhere
else, again pretty much just like they would anyway.
It's in a sense a disappointing picture. At the end of all of our technology
development, we're operating just like the rest of the universe. And from the
outside, there's nothing obvious we've achieved. You'd have to know the
history and the context to know that those electrons whizzing around, and
those atoms moving, were the result of the whole rich history of human
civilization, and its great technology achievements.
It's a peculiar situation. But in a sense I think it reflects a core issue about
ultimate knowledge.
Right now, the web contains a few billion pages of knowledge that humans
have collected with considerable effort. And one might have thought that it'd
be difficult to generate more knowledge.
But it isn't. In a sense that's what Leibniz found exciting about
mathematics: it's possible to use it systematically to generate new knowledge,
working out new formulas, or new results, in an essentially mechanical way.
But now we can take that idea much further. We have the whole computational
universe to explore, with all possible rules. Including, for example, I believe,
the rules for our physical universe. And out in the computational universe,
it's easy to generate new knowledge. Just by sampling the richness of what
even very simple programs can do. In fact, given the idea of computation
universality, and especially the Principle of Computational Equivalence, there
is a sense in which one can imagine systematically generating all knowledge,
subject only to the limitations of computational irreducibility.
But what would we do with all of this? Why would we care to have
knowledge about all those different programs out there in the computational
universe?
Well, in the past we might have said the same thing about different locations
on the Earth. Or different materials. Or different chemicals. But of course,
what has happened in human history is that we have systematically found
ways to harness these things for our various purposes. And so, for example,
over the course of time we have found ways to use a tremendous diversity,
say, of possible materials that we can mine from the physical world. To find
uses for magnetite, or amber, or liquid crystals, or rare earths, or radioactive
materials, or whatever. Well, so it will be with the computational universe.
It's just starting now. Within Mathematica, for example, many algorithms we
use were mined from the computational universe. Found by searching a large
space of possible programs, and picking ones that happen to be useful for our
particular purposes. In a sense defining pieces of knowledge from the sea of
possibilities that end up being relevant to us as humans.
It will be interesting to watch the development of technology, as well as
art and civilization in general, and to see how it explores the computational
universe of possible programs. I'm sure it'll be not unlike the case of physical
materials. There'll be techniques for mining, refining, combining. There'll be
"gold rushes" as particular rich veins of programs are found. And gradually
the domain of what's considered relevant for human purposes will expand to
encompass more and more of the computational universe.
But, OK, so there are all sorts of possible knowledge out there in the
computational universe. And gradually our civilization will make use of it.
But what about particular knowledge that we would like to have, today? What
about Leibniz's goal of being able to answer all human questions by somehow
systematizing knowledge?
Our best way of summarizing and communicating knowledge tends to be
through language. And when mathematics became formalized, it did so
essentially by emulating the symbolic structure of traditional human natural
language. And so it's interesting to see what's happened in the systematization
of mathematics.
In the early 1900s, it seemed like the key thing one wanted to do was to
emulate the process of mathematical proof. That one wanted in effect to find
nuggets of truth in mathematics, represented by proofs. But actually, this
really turned out not to be the point. Instead, what was really important about
the systematization of mathematics was that it let one specify calculations.
And that it let one systematically do mathematics
But if we just take language as it is, it defines a tiny slice of the
computational universe. It is in many ways an atypical slice. For it is highly
weighted towards computational reducibility. For we, as humans, tend to
concentrate on things that make sense to us, and that we can readily summarize
and predict. So at least for now only a small part of our language tends to be
devoted to things we consider "random", "complex", or otherwise hard for us
to make sense of.
But if we restrict ourselves to those things that we can describe with
ordinary language, how far can we go in our knowledge of them? In most
directions, computational irreducibility is not far away, providing in a sense
a fundamental barrier to our knowledge. In general, everyday language is a
very imprecise way to specify questions or ideas, being full of ambiguities and
incomplete descriptions. But there is, I suspect, a curious phenomenon that
may be of great practical importance. If one chooses to restrict oneself to
computationally reducible issues, then this provides a constraint that makes
it much easier to find a precise interpretation of language. In other words, a
question asked in ordinary language may be hard to interpret in general. But
if one chooses to interpret it only in terms of what can be computed, what
can be calculated, from it, it becomes possible to make a precise interpretation.
One is doing what I believe most of traditional science has done: choosing
to look only at those parts of the world on which particular methods can make
progress. But I believe we are fairly close to being able to build technology
that will let us do some version of what Leibniz hoped for: to take issues in
human discourse, and when they are computable, compute them.
The web, and especially web search, has defined an important transition.
It used to be that static human knowledge, while in principle accessible through
libraries and the like, was sufficiently difficult to access that a typical person
usually sampled it only very sparingly. But now it has become straightforward
to find known facts. Using the handle of language, we just have to search the
web for where those facts are described.
But what about facts that are not yet known? How can we access those?
We need to create the facts, by actual computation. Some will be fraught
with computational irreducibility, and in some fundamental sense be
inaccessible. But there will be others that we can access, at least if we can
find a realistic way for us humans to refer to them.
Chapter 23
An Enquiry Concerning Human (and Computer!)
[Mathematical] Understanding
Doron Zeilberger12
Department of Mathematics, Rutgers University (New Brunswick), Hill
Center-Busch Campus, 110 Frelinghuysen Rd., Piscataway, NJ
08854-8019, USA; zeilberg@math.rutgers.edu
erroneous and sometimes pure gibberish, the spirit of the critiques was very
well founded, since all that they were trying to say was that old standby,
going back at least to Socrates: we know that we don't know.
23.2. Skeptics
I have always admired skeptics, from Pyrrho of Elis all the way to Jacques
Derrida. But my two favorite skeptics are David Hume and Gregory
Chaitin, who so beautifully and eloquently described the limits of science and the limits of mathematics, respectively.
Constant, Ω:
\Omega := \sum_{p \ \mathrm{halts}} 2^{-|p|},
where the sum ranges over all self-delimiting programs p that halt when run on
some Universal Turing Machine. As Greg puts it so eloquently, Ω is the epitome
of mathematical randomness, and its digits are beautiful examples of random
mathematical facts, true for no reason. It also has the charming property of
being normal to all bases.
23.5. How Real Is Ω?
There is only one problem with Ω: it is a real number! As we all know, but
most of us refuse to admit, real numbers are not real, but purely fictional,
since they have infinitely many digits, and there is no such thing as infinity.
Worse, Ω is uncomputable, since we know, thanks to Turing, that there is no
way of knowing, a priori, whether p halts or not. It is true that many real
numbers, for example √2, π, e, etc., can be deconstructed in finite terms, by
renaming them algorithms, and we do indeed know that these are genuine
algorithms, since in each specific case we can prove that any particular digit
can be computed in a finite, pre-determined number of steps. But if you believe
in Ω, then you believe in God. God does know whether any program p will
eventually halt or not, because God lives for ever and ever (Amen), and can
also predict the future, so for God, Ω is as real as √2, or even 2, is for us mere
mortals. So indeed, if God exists, then Ω exists as well, and God knows all its
digits. That we, lowly mortals, will never know the digits of Ω is just a reflection
of our own limitations.
But what if you don't believe in God? Or, like myself, do not know for
sure, one way or the other?
23.6. Do I Believe in Ω?
Regardless of whether or not God exists, God has no place in mathematics,
at least in my book. My God does not know (or care) whether a program p
eventually halts or not. So Ω does not exist in my ultra-finitistic worldview.
But it does indeed exist as a symbol, and as a lovely metaphor; so, like
enlightened non-fundamentalist religious folks, we can still enjoy and believe
in the Bible, even without taking it literally. I can still love and
cherish and adore Chaitin's constant, Ω, the same way as I enjoy Adam and
Eve, or Harry Potter, and who cares whether they are real or fictional.
23.7. Greg Chaitin's Advice About Experimental Mathematics
One interesting moral Greg Chaitin draws from his brainchild, Algorithmic
Information Theory, and its crown jewel, Ω, is the advice to pursue Experimental
Mathematics. Since so much of mathematical truth is inaccessible, it is stupid
to insist on finding a proof for every statement, since, for one, the proof may
not exist (the statement may well be undecidable), or it may be too long and
complicated for us mere humans, and even for our computers. So Greg suggests
taking truths that we feel are right (on heuristic or experimental grounds) and
adopting them as new axioms, very much like physicists use Conservation of
Energy and the Uncertainty Principle as axioms. Two of his favorites are
P ≠ NP and the Riemann Hypothesis. Of course, by taking these as new
axioms we give up on one of the original meanings of the word axiom, that it
should be self-evident, but Hilbert already gave this up by making mathematical
deduction into a formal game.
23.8. Stephen Wolfram's Vision
Another, even more extreme, advocate of Experimental Mathematics is guru
Stephen Wolfram, whose New Kind of Science and New Kind of Mathematics
are completely computer-simulation-centric: let's dump traditional
equation-centric science and deduction-centric mathematics in favor of doing
computer experiments and watching the output.
23.9. Tweaking Chaitin's and Wolfram's Messages: The Many Shades of Rigor
I admire both Chaitin and Wolfram, but, like true visionary prophets, they
see the world as black and white. Since all truths that we humans can know
with old-time certainty are doomed to be trivial (or else we wouldn't have
been able to prove them completely), and, conversely, all the deep results will
never be provable by us completely, with traditional standards, they advise
us to abandon the old ways, and just learn how to ask our computers good
questions, watch their numerical output, and gain insight from it.
may be good media for love songs, but not for mathematics and computer
science. The lingua franca of theoretical computer science is the Turing
machine. There are also numerous equivalent models, which are sometimes
easier to work with. But even this is too vague, since we can't tell, thanks to
Turing, whether our TM will halt or not, in other words whether it is a genuine
algorithm or just an algorithm wannabe. Furthermore, even if it does halt, if
my super-short computer program would take googolplex to the power
googolplex years to generate my sequence, it can't do me much good. It is
true that for aesthetic reasons Greg Chaitin refused to enter time into his
marvelous theory, and he preempted the criticism by the disclaimer that his
theory is useless for applications. But I, for one, being, in part, a naturalist,
find it hard to buy this nonchalance. Life is finite (alas, way too finite), and it
would be nice to reconcile time complexity with program-size complexity.
Anyway, using Turing machines, or any of the other computational models
for which the halting problem is undecidable, makes this notion meaningless.
Of course it has a great metaphoric and connotative meaning!
So the notion of Turing machine-computable is way too general. Besides,
the Greek model, adopted by mathematicians and meta-mathematicians alike,
does not represent how most of mathematics is done in practice. Most of
mathematics, even logic, is done within narrow computational frameworks,
sometimes explicit, but more often implicit. And what mathematicians do is
symbol-crunching rather than logical deduction. Of course, formal logic is just
yet another such symbolic-computational framework, and in principle all proofs
can be phrased in that language, but this is unnatural, inefficient, and, worse,
sooo boring.
Let's call these computational frameworks ansatzes. In my humble opinion,
mathematics should abandon the Greek model, and should consciously try to
explicate more and more new ansatzes that formerly were only implicit. Once
they are made explicit, one can teach them to our computers, which can then
do much more than any human.
23.15. The Ansatz Ansatz
Indeed, lots of mathematics, as it is actually practiced today, can be placed
within well-defined computational frameworks that are provably algorithmic
and, of course, decidable. Sometimes the practitioners are aware of this, and
in that case new results are considered routine. For example,
the theorem
198765487 × 198873987 = 39529284877686669
is not very exciting today, since it belongs to the well-known class of explicit
arithmetical identities.
On the other hand, the American Mathematical Monthly still publishes papers
today in Euclidean Geometry, which, thanks to René Descartes, is reducible
to high-school algebra, which is also routinely provable, of course in principle,
but today also in practice, thanks to our powerful computer algebra systems.
The fact that multiplication identities are routinely provable is at least
5000 years old, and the fact that theorems in Plane Geometry are routinely
provable is at least 250 years old (and 40 years old in practice), but the fact
that an identity like
\sum_{k=-n}^{n} (-1)^k \binom{2n}{n+k}^3 = \frac{(3n)!}{n!^3},
discovered, and first proved, in 1904 by Dixon, is also routinely provable, is
only about 16 years old, and is part of so-called Wilf-Zeilberger Theory.
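In the experimental spirit advocated earlier, Dixon's identity can at least be checked mechanically for small n; the following Python sketch does exactly that (a verification of small cases only, not a Wilf-Zeilberger proof).

    from math import comb, factorial

    def dixon_lhs(n):
        # sum_{k=-n}^{n} (-1)^k * binom(2n, n+k)^3
        return sum((1 if k % 2 == 0 else -1) * comb(2 * n, n + k) ** 3
                   for k in range(-n, n + 1))

    def dixon_rhs(n):
        # (3n)! / (n!)^3
        return factorial(3 * n) // factorial(n) ** 3

    for n in range(8):
        assert dixon_lhs(n) == dixon_rhs(n)
    print("Dixon's identity verified for n = 0, ..., 7")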
In each of these cases it is nowadays routine to prove an identity of the
form A = B, since there is a canonical-form algorithm A → c(A), and all we
have to do is check that c(A) = c(B). In fact, to prove that A = B, it suffices
to have a normal-form algorithm, checking that A - B is equivalent to 0.
But before we can prove a statement of the form A = B, we have to find
an appropriate ansatz to which both A and B belong.
At this time of writing, there are only a few explicitly known ansatzes.
Let's first review one of my favorites.
23.16. The Polynomial Ansatz
David Hume is right that there is no formal, watertight proof that the sun
will rise tomorrow, since the Boolean-valued function
f(t) := evalb(The Sun Will Rise At Day t)
has not yet been proved to belong to any known ansatz. Indeed, we now know
that for t >> 0, f(t) is false, because the Sun will swallow Planet Earth, so
all we can prove are vague probabilistic statements for small t
On the other hand, the following proof of the identity
\sum_{i=1}^{n} i^3 = \left( \frac{n(n+1)}{2} \right)^2
is perfectly rigorous.
Proof: True for n = 0, 1, 2, 3, 4 (check!), hence true for all n.
QED
In order to turn this into a full-fledged proof, all you have to do is
mumble the following incantation:
Both sides are polynomials of degree 4, hence it is enough to check
the identity at five distinct values.
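The five-point check itself is entirely mechanical. A minimal Python sketch, taking the sum-of-cubes identity above as the degree-4 example:

    def lhs(n):
        # 1^3 + 2^3 + ... + n^3
        return sum(i ** 3 for i in range(1, n + 1))

    def rhs(n):
        # (n(n+1)/2)^2
        return (n * (n + 1) // 2) ** 2

    # Both sides are polynomials in n of degree 4, so agreement at the five
    # points n = 0, 1, 2, 3, 4 proves the identity for every n.
    assert all(lhs(n) == rhs(n) for n in range(5))
    print("checked at five points; by the polynomial ansatz, true for all n")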
23.17. An Ansatz-based Chaitin-Kolmogorov Complexity
So let's define the complexity of an infinite (or finite) sequence always relative
to a given ansatz, assuming that it indeed belongs to it. Our descriptive
language is then much more modest, but we can always determine its
complexity, and everything is decidable. It does not have the transcendental
beauty and universal insight of Chaitin's Algorithmic Information Theory, but,
on the other hand, we can always decide things, and nothing is unknowable
(at least in principle).
23.18. It All Depends on the Data Structure
Even within a specific ansatz, there are many ways of representing our objects.
For example, since a polynomial P of degree d is determined by its values at
any d + 1 points, we can represent it in terms of the finite sequence
[P(0), ..., P(d)], which requires d + 1 bits (units of information). Of course,
we can also express it in the usual way, as a linear combination of the powers
{1, n, n^2, ..., n^d}, or in terms of any other natural basis, for example
{\binom{n}{k}, k = 0...d}. Each of these data structures requires d + 1 bits
in general, but in specific cases we can sometimes compress in order to get
lower complexity. For example, it is much shorter to write n^1000 than to
write [0, 1, 2^1000, ..., 1000^1000] (without the "...", and spelled out).
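A minimal Python sketch of the value-list data structure: the d + 1 values [P(0), ..., P(d)] pin the polynomial down everywhere, and Lagrange interpolation recovers any other value exactly (the cube polynomial below is chosen purely for illustration).

    from fractions import Fraction

    def eval_from_values(values, x):
        # Lagrange interpolation: `values` are P(0), P(1), ..., P(d) for the
        # unique polynomial P of degree <= d; return P(x) exactly.
        d = len(values) - 1
        total = Fraction(0)
        for i, yi in enumerate(values):
            term = Fraction(yi)
            for j in range(d + 1):
                if j != i:
                    term *= Fraction(x - j, i - j)
            total += term
        return total

    # The degree-3 polynomial n^3 is pinned down by its 4 values at n = 0, 1, 2, 3:
    cubes = [0, 1, 8, 27]
    print([int(eval_from_values(cubes, n)) for n in range(8)])   # 0, 1, 8, 27, 64, ...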
23.19. The Strong N0 Property
An ansatz has the Strong N0 property if, given any two sequences A, B within
that ansatz, there exists an easily computable (say, polynomial-time in the
maximal size of A and B) number N0 = N0(A, B) such that, in order to prove
that A(n) = B(n) for all n, it suffices to prove it for any N0 distinct values
of n.
The iconic example of an ansatz having the strong N0 property, already
mentioned above, is the set of polynomials. For a polynomial P(x) of a single
variable, N0(P(x)) is deg P + 1. For a polynomial P(x1, ..., xn) of degree d,
N0(P) is \binom{d+n}{n}.
23.20. The Weak N0 Property
An ansatz has the Weak N0 property if, given any two sequences A, B within
that ansatz, there exists an easily computable (say, polynomial-time in the
maximal size of A and B) number N0 = N0(A, B) such that, in order to prove
that A(n) = B(n) for all n, it suffices to prove it for the first N0 values of n:
n = 1, n = 2, ..., n = N0.
A simple example of an ansatz that has the weak, but not the strong, N0
property is that of periodic sequences. If two sequences are known a priori to
have periods d1 and d2, then if they are equal for the first lcm(d1, d2) values,
they are identically equal. But the two sequences f(n) := 1 and g(n) := (-1)^n
coincide at infinitely many places (all the even integers), yet the two sequences
are not identically equal.
23.21. Back to Science: The PEL Model
In Hugh G. Gauch's excellent book on the scientific method, Scientific Method
in Practice, he proposes the PEL model, PEL standing for Presupposition,
Evidence, Logic. So Hume's objection disappears if we are willing to concede
that science is theory-laden, and we have lots of presuppositions.
If we presuppose, for example, that the function
f(t) := evalb(The Sun Will Rise At Day t)
belongs to the constant ansatz (at least for the next 100000 years), then
checking it in just one point, say t = today, proves that the sun will indeed
rise tomorrow.
On the other end, to prove that all emeralds are grue, presupposes that
the color of emeralds belong to the piece-wise constant ansatz, since the
notion of grue belongs to it. In that case, N0 > 2050, so indeed checking
it for many cases but before 2050, does not suffice, even non-rigorously, to
prove that all emeralds are grue.
23.22. The Probabilistic N0 Property
Sometimes N0 is way too big; in other words, getting complete certainty will take too long. Then you might want to consider settling for N0(p).
An ansatz has the probabilistic N0 property if, given any two sequences A, B within that ansatz, there exists an easily computable (say polynomial-time in the maximal size of A and B) number N0(p) = N0(A, B, p) such that, in order to prove that A(n) = B(n) for all n with probability p, it suffices to prove it for any N0(p) randomly chosen values of n.
The celebrated Schwartz-Zippel theorem establishes that multi-variable polynomials satisfy the N0(p) property (in addition to having the N0 property, of course), and that N0(.9999999) is much smaller than N0(1), so it is stupid to pay for full certainty.
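Here is what such a probabilistic test can look like in practice (a hedged Python sketch; the sample-set size, the trial count, and the function names are my own illustrative choices, not the precise constants of the Schwartz-Zippel theorem):

import random

def probably_equal(A, B, nvars, degree, p=0.9999999, trials=None):
    # Randomized identity test in the spirit of Schwartz-Zippel: if A != B
    # as polynomials of total degree <= degree, a point drawn uniformly from
    # S^nvars witnesses the difference with probability >= 1 - degree/|S|,
    # so independent trials drive the error below 1 - p.
    S = range(10 * degree + 1)          # sample set, |S| = 10*degree + 1
    err_per_trial = degree / len(S)     # <= 0.1
    if trials is None:
        trials = 1
        while err_per_trial ** trials > 1 - p:
            trials += 1
    for _ in range(trials):
        point = [random.choice(S) for _ in range(nvars)]
        if A(*point) != B(*point):
            return False                 # definitely not identical
    return True                          # identical with probability >= p

# Example: (x + y)^2 versus x^2 + 2xy + y^2 (two variables, degree 2).
A = lambda x, y: (x + y) ** 2
B = lambda x, y: x * x + 2 * x * y + y * y
print(probably_equal(A, B, nvars=2, degree=2))   # True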
23.23. An Embarrassing Paper of Mine
Can you envision a professional mathematician publishing a paper entitled "A bijective proof of 10 x 5 = 2 x 25", by concocting a nice bijection? Of course not! Today, all explicit arithmetical identities are known to be routinely provable.
Yet something analogous happened to me. In my web-journal, I pub-
$$\sum_{n=0}^{\infty} a(n)x^n,$$
it follows that G(x, 0; t), G(0, y; t), G(0, 0; t) are all algebraic:
$$Q(G(x, 0; t), x, 0, t) \equiv 0, \qquad (AlgEq')$$
$$Q(G(0, y; t), 0, y, t) \equiv 0, \qquad (AlgEq'')$$
$$Q(G(0, 0; t), 0, 0, t) \equiv 0. \qquad (AlgEq''')$$
Now
H(x, y, t) := A(x, y, t)G(x, y; t) + B(x, y, t)G(0, y; t) + C(x, y, t)G(x, 0; t)
Then C(m, 0) = 1, and you have the following recurrence, easily derived by
looking at the number of 2s to the left of the rightmost 1:
$$C(m, n) = \sum_{i=0}^{n} C(m-1, i), \qquad (1)$$
from which you can easily deduce the following special cases:
$$C(m, 1) = \binom{m+1}{1}, \quad C(m, 2) = \binom{m+2}{2}, \quad C(m, 3) = \binom{m+3}{3},$$
that naturally leads to the conjecture C(m, n) = D(m, n), where $D(m, n) = \binom{m+n}{n}$. It can be verified for $n \le 10$ easily by using (1) with specific n but general m, by only using polynomial summation. Now the more general
statement, C = D, is much more plausible. Besides, this more general
conjecture is much easier to prove, since you have more elbow room, and
it is easy to prove that both X = C and X = D are solutions of the linear
partial recurrence boundary-value problem:
$$X(m, n) = X(m-1, n) + X(m, n-1), \qquad X(m, 0) = 1, \qquad X(0, n) = 1.$$
So in this case finding the right generalization first made our conjecture
much more plausible, and then also made it easy to prove.
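A quick machine check of this (a Python sketch; the cutoff of 8 is my own choice) verifies that the solution of recurrence (1) with the stated boundary values agrees with D(m, n):

from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def C(m, n):
    # Recurrence (1): C(m, n) = sum_{i=0}^{n} C(m-1, i),
    # with the boundary values C(m, 0) = 1 and C(0, n) = 1.
    if m == 0 or n == 0:
        return 1
    return sum(C(m - 1, i) for i in range(n + 1))

def D(m, n):
    return comb(m + n, n)

# Both C and D solve X(m,n) = X(m-1,n) + X(m,n-1), X(m,0) = X(0,n) = 1:
assert all(C(m, n) == D(m, n) for m in range(8) for n in range(8))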
23.31. How to Do It the Hard Way
In order for you to appreciate how much trouble could be saved by introducing a more general conjecture, let's do it the hard way, sticking to the original one-parameter conjecture.
Let b(n) be the number of words in {1, 2} with exactly n 1s and with exactly n 2s such that, in addition, for any proper prefix, the number of 1s always exceeds the number of 2s. Analogously, let b'(n) be the number of words in {1, 2} with exactly n 1s and with exactly n 2s such that, in addition, the number of 2s always exceeds the number of 1s except at the beginning and end. By symmetry b(n) = b'(n).
Then we have the non-linear recurrence
$$a(n) = \sum_{m=0}^{n-1} a(m)\,\bigl(b(n-m) + b'(n-m)\bigr) = 2 \sum_{m=0}^{n-1} a(m)\, b(n-m), \qquad (2)$$
obtained by looking at the longest prefix with the same number of 1s and
2s. Also, using a standard combinatorial argument, b(n) can be shown to equal
$$\sum_{m=1}^{n-1} b(m)\, b(n-m),$$
from which you can crank out many values of b(n), that in turn enable you to crank out many values of a(n), and make your conjecture much more plausible. Using the above non-linear recurrence, you can generate the first few terms of the sequence $\{b(n)\}_{n=1}^{\infty}$: 1, 1, 2, 5, 14, 42, 132, ..., and easily guess that $b(n) = \frac{(2n-2)!}{(n-1)!\,n!}$, and to prove it rigorously, all you need is to verify
the binomial coefficient identity
$$\sum_{m=1}^{n-1} \frac{(2m-2)!}{(m-1)!\,m!} \cdot \frac{(2n-2m-2)!}{(n-m-1)!\,(n-m)!} = \frac{(2n-2)!}{(n-1)!\,n!},$$
that can be done automatically with the WZ method, and then prove the
identity
$$\binom{2n}{n} = 2 \sum_{m=0}^{n-1} \binom{2m}{m}\, \frac{(2n-2m-2)!}{(n-m-1)!\,(n-m)!},$$
that is likewise WZable.
Note: One can also do it, of course, with generating functions, staying within the Schützenberger ansatz rather than the holonomic ansatz. But it is still much harder than doing it via the 2-parameter generalization discussed above.
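For readers who want to reproduce the computation, here is a hedged Python sketch. It assumes a(n) = binomial(2n, n), the reading under which recurrence (2) matches the displayed WZable identity, and it checks both the guessed closed form for b(n) and recurrence (2) for small n:

from math import comb, factorial

def b_list(N):
    # b(1) = 1; for n >= 2, b(n) = sum_{m=1}^{n-1} b(m) * b(n-m)
    # (the non-linear recurrence quoted above).
    b = [0, 1]
    for n in range(2, N + 1):
        b.append(sum(b[m] * b[n - m] for m in range(1, n)))
    return b

N = 12
b = b_list(N)
# Guessed closed form b(n) = (2n-2)! / ((n-1)! n!), a shifted Catalan number.
assert all(b[n] == factorial(2 * n - 2) // (factorial(n - 1) * factorial(n))
           for n in range(1, N + 1))

# Recurrence (2), with a(n) taken to be binomial(2n, n) and a(0) = 1:
a = [comb(2 * n, n) for n in range(N + 1)]
assert all(a[n] == 2 * sum(a[m] * b[n - m] for m in range(n))
           for n in range(1, N + 1))
print(b[1:8])   # [1, 1, 2, 5, 14, 42, 132]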
23.32. Pólya's Ode to Incomplete Induction
In Pólya's masterpiece on the art of mathematical discovery, Induction and Analogy in Mathematics, he lauded the use of incomplete induction as a powerful heuristic for discovering mathematical conjectures, and as a tool for discovering possible proofs. In particular he cites approvingly the great Euler, who conjectured, long before he had a formal proof, many interesting results. For example:
$$\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6},$$
that he verified numerically to six decimal places, noting that this implies that the probability that the left-hand side and the right-hand side coincide by accident is less than one in a million. Many years later he found a complete proof.
$$\sum_{i=1}^{n} \mu(i),$$
satisfy
$$|M(n)| \le C(\varepsilon)\, n^{1/2+\varepsilon}.$$
Mertens, in 1897, made the stronger conjecture that $|M(n)| < n^{1/2}$, and it was verified for n up to a very large number. Yet in 1985, Andrew Odlyzko and Herman te Riele disproved it.
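A hedged Python sketch of the kind of empirical verification involved (the sieve and the range 10000 are my own choices; the actual counterexample to Mertens' conjecture is known to be astronomically large, so no contradiction appears in any feasible range):

def mobius_sieve(N):
    # Compute the Moebius function mu(1..N) with a simple sieve.
    mu = [1] * (N + 1)
    is_prime = [True] * (N + 1)
    for p in range(2, N + 1):
        if is_prime[p]:
            for m in range(p, N + 1, p):
                if m > p:
                    is_prime[m] = False
                mu[m] *= -1
            for m in range(p * p, N + 1, p * p):
                mu[m] = 0
    return mu

N = 10_000
mu = mobius_sieve(N)
M = 0
for n in range(1, N + 1):
    M += mu[n]
    # Mertens' inequality |M(n)| < sqrt(n) holds for 1 < n <= N (and far beyond).
    assert abs(M) < n ** 0.5 or n == 1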
Another notorious example concerns the Skewes Number, that is, the smallest n for which $\pi(n)$, the number of prime numbers $\le n$, is larger than li(n), the logarithmic integral. No one knows its exact value, but it seems to be very large.
23.34. Inequalities vs. Equalities
By hindsight, it is not surprising that both $\pi(n) < \mathrm{li}(n)$ and $|M(n)| < \sqrt{n}$
turned out to be false, even though they are true for so many values of n.
First, prime numbers are very hazardous, and since often we have log log
and log log log showing up, it is reasonable to suspect that what seems large
for us is really peanuts. But a better reason to distrust the ample empirical
evidence is that inequalities need much more evidence than equalities.
A trivial example is the following. To prove that P(x) = 0 for a polynomial P of degree d (say given in some complicated way that is not obviously 0, for example $(x^4 + 1)(x + 1) - x^5 - x^4 - x - 1$) it suffices to check d + 1 special cases, but consider the conjecture
$$\frac{x}{10000000000000} - 1 < 0.$$
The left side is a polynomial of degree 1 in x, and the conjecture is true
for the first 10000000000000 integer values of x, yet, of course, it is false in
general.
23.35. The Art of Plausible Reasoning
Given a conjecture P(n), depending on an integer parameter n, that has been verified for $1 \le n \le M$, how plausible is it?
If it has the form $A(n) \le B(n)$, then no matter how big M is, it would be very stupid to jump to conclusions, as shown in the above examples.
From now on we will assume that it can naturally be phrased in the form A(n) = B(n). Granted, every assertion P(n), even an inequality like
$|M(n)| < \sqrt{n}$
$$\sum_{k=0}^{n} F(n, k) = \sum_{k=0}^{n} G(n, k),$$
$$\sum_{k=0}^{n} F(n, k) \le \sum_{k=0}^{n} G(n, k),$$
$$\sum_{i} p_i(n)\, a(n + i) = 0,$$
$$\sum_{k} (-1)^k \binom{2n}{n+k}^3 = \frac{(3n)!}{n!^3}$$
requires a proof, since this is a general statement, valid for all n, but thanks
to WZ theory, there is just one object, a certain rational function R(n, k),
that certifies it. That certificate can be obtained empirically and algorithmically. So the proof is just one object, like the pair (3, 5) in the case of
the theorem that 15 is composite.
If desired, it is always possible to convert a certificate proof to a formal
logic proof, but this is very artificial, and unnecessary.
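The identity itself is easy to check empirically for small n (a Python sketch; this brute-force check is not the WZ certificate, just the kind of empirical evidence mentioned above):

from math import comb, factorial

def lhs(n):
    # sum over k of (-1)^k * C(2n, n+k)^3; terms with |k| > n vanish
    return sum((-1) ** (k % 2) * comb(2 * n, n + k) ** 3 for k in range(-n, n + 1))

def rhs(n):
    return factorial(3 * n) // factorial(n) ** 3

assert all(lhs(n) == rhs(n) for n in range(8))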
Let's conclude this manifesto with:
Mathematicians and meta-mathematicians of the world unite, you have nothing to lose but your logic chains! Let's work together to develop an ansatz-based mathematics and meta-mathematics.
Reminiscences
Chapter 24
In the Company of Giants
Andreea S. Calude
Department of Applied Language Studies and Linguistics, The University
of Auckland, Auckland, New Zealand, a.calude@auckland.ac.nz
As the overcrowded bus huddles into the hustle and bustle of Chinatown, after what feels more like a small lifetime than a four-hour journey,
you could be almost forgiven for thinking you are in Shanghai, but the signs
assure me it is indeed New York that I am finding myself in. Ten minutes
later, sitting in a Japanese cafe and sipping bubble Tapioca tea, I am acting just like one of the locals, well, almost . . . aside from my wide-eyed,
mesmerized expression.
So here I start my exploration of New York, the Big Apple, the city of
opportunities, the land of the yellow cabs, the finance hub, the temple of
the 9-11 pain, the house of fashion and art and theatre and music halls. I
decide to scrap my NY guide book, and just walk around, absorbing the
city that way. I am choosing the skyscrapers as guides instead.
I am due to spend three days in the company of these stirring skyscrapers and while my first visit to the Big Apple makes my heart skip a beat in
itself, I am really here for the chance to spend a weekend with a different
kind of giant: a self-taught mathematical prodigy who started on his quest
of exploring the limits of mathematics at the age of only fifteen, and who
discovered the famous Omega number: Greg Chaitin. Featured in countless popular science magazines, New Scientist, Scientific American, Pour
La Science, Chaitin published his ninth book in September this year. Entitled Meta Math!, the book is printed by Random House, not an academic press, but rather the publisher who gave us J. K. Rowling's Harry Potter, Dan Brown's The Da Vinci Code and Mark Haddon's The Curious Incident of the Dog in the Night-Time. Meta Math! is written for anyone with
Chapter 25
Gregory Chaitin: Mathematician, Philosopher, and Friend
John Casti
Wissenschaftszentrum Wien, IIASA, The Kenos Circle, Vienna, Austria; Castiwien@cs.com
In 1990, I was working on the manuscript of my book Searching for Certainty (Morrow, New York, 1991), a volume addressing the degree to which
the science of today can effectively predict and/or explain various real-world
phenomena like weather and climate, the outbreak of warfare and the movement of stock market prices. As part of this story, I wanted to include a
chapter on the prediction/explanation of mathematical theorems, in order
to open up a discussion of the philosophy of mathematics, especially what
we mean by mathematical truth and what really constitutes a proof.
Of special concern to me at the time was the question of the complexity of a
mathematical result, since it seemed clear that whatever limits might be in
place for how much science could tell us about the world should be greatly
affected by the complexity of the phenomena under consideration. In short,
are there phenomena that are simply too complex for the human mind
to grasp? In particular, are there theorems/mathematical truths that are
too complex for our axiomatic systems to actually prove or disprove? This naturally raises the question of what you mean by the complexity of some observed event, including a mathematical proposition. Enter Greg
Chaitin.
As part of my background research for the book, I had run across Greg's
papers on information theory and algorithmic complexity (later published
by World Scientific in his 1992 book, Information-Theoretic Incompleteness). Since the whole issue of how much juice could you get out of a set
of axioms was totally tied up with the notion of incompleteness à la Gödel, Greg's recasting of Gödel's results into the language of computing seemed to be just what I needed to address the 'How much complexity is too much
complexity?' question. So I sent an email to him and asked his view of the
matter. Not only did Greg reply (which many name-brand scientists did
not when I wrote them about other sections of the book), he generously sent
me a voluminous set of papers, references, and ideas about how to frame
and explain many of the ideas that ultimately appeared in that chapter.
Thus was set in motion an intellectual and personal friendship that is now
seventeen years old and counting.
In these seventeen years, Greg and I have met dozens of times in almost all the world's time zones. And at each one of those meetings I've come
away with some bit of knowledge or snippet of information that has caused
me to see the world just a bit differently than before. Let me give a rather
eclectic account of just a few of those occasions.
Limits to Scientific Knowledge (Santa Fe, NM 1992): In the spring of
1992, Joseph Traub and I organized a two-day workshop at the Santa Fe
Institute on the theme, Limits to Scientific Knowledge under the sponsorship of the Alfred P. Sloan Foundation. Among the many luminaries
at this meeting were biologist Robert Rosen, computer theorist Rolf Landauer, chaologist Otto Rössler, economist Brian Arthur, and Sloan Foundation President Ralph Gomory. But the person who contributed the most
to the discussion was, not surprisingly, Greg Chaitin! His booming voice
was heard regularly during the intense discussions, commenting on various
thoughts and presentations that were floating around the meeting room
like the seeds of pollen floating in the desert air of Santa Fe in those days.
Greg was a dynamic force that gave both substance and direction to that
meeting, and which ultimately led to a follow-up workshop on the same
theme in a venue about as far removed from Santa Fe as one can get and
still remain on the same planet.
Limits to Scientific Knowledge (Abisko, Sweden 1995): The village of
Abisko is located many kilometers north of the Arctic Circle, near the border between Norway and Sweden. For several years, the Swedish Council
for Planning and Coordination of Research sponsored an annual meeting
organized by Anders Karlqvist and myself on themes residing at the boundary between the natural sciences, philosophy and the humanities. In 1995,
Anders and I chose the theme of limits to scientific knowledge, in order to
capitalize on the intense, but too short, discussion of these matters in Santa
Fe two years earlier. In the Swedish environment, we had a full week of such
discussions in very intimate surroundings, as the venue for the meeting was
a research station of the Royal Swedish Academy of Sciences, in which all
the participants lived, ate and, in general, spent most of each day together.
Some of the participants in this meeting were the same as in Santa Fe (Greg, Bob Rosen, Joe Traub, Piet Hut, and myself), but several new faces
also appeared, including physicist Jim Hartle, biologist Harold Morowitz,
and astrophysicist John Barrow. The proceedings of these discussions were
published under the title Boundaries and Barriers: On the Limits to Scientific Knowledge (Addison-Wesley, Reading, MA, 1996), so I won't go into
them here.
What I most remember about this meeting is a conversation I had with
Greg Chaitin during a walk one afternoon in the breath-taking surroundings
of the research station. We were discussing the question of the complexity
of a mathematical theorem, and what one could possibly expect to get out
of a given set of consistent axioms. Greg had long before proved that there
must be theorems of arbitrarily great complexity, using the notion of algorithmic complexity as the measure, which was already a major extension of
Gödel's incompleteness result. But he then went on to state that, subject to
some technical conditions, it's basically the case that the complexity of the
set of axioms sets an upper bound to the complexity of any theorem you
can prove within that axiomatic framework. In short, you can't get more
out than what you put in.
While it seems self-evident in retrospect, I had never really considered
the world of mathematical truths from this perspective before. This result not only makes Gödel's results on the limitations of axiomatic systems
much more precise, the philosophical implication is enormous: mathematics is now both limited by the axiomatic system you employ, as well as
unlimited by the opportunity to cleverly introduce more axioms to create
bigger (i.e., more complex) systems that enable us to prove more complex
theorems. But no matter how complex the axiomatic system may be, no
single system will ever enable us to get it all. This is about as direct a
statement on the limits to knowledge as one will ever get.
The Infinite (Vienna, Austria, sometime later): During a visit to Vienna, Greg had dinner one evening with myself and his Viennese host, Karl
Svozil, in the restaurant Ofenloch in the old center of the city. At one point
in the conversation, I posed the question: What would a world be like that
had no G
odels Theorem? Of course, this was a provocative question, whose
answer rests upon what you believe about the notion of infinity, since in a
totally finite world, where conceptually infinite objects like the number π or the square root of 2 do not really exist, then there can be no such results
like those of Gödel or Turing. These types of limitative results depend in
an essential way upon at least the potentially infinite, if not the actually
realized version. So what I was aiming at with this question was really to
enquire as to the reality of mathematical objects, a long and venerable
area of concern in the philosophy of mathematics.
After some deliberation on the question, happily lubricated by some of
Ofenloch's fine selection of wines and designer beers, Greg made a remark
that I remember to this day. He turned to me and said, John, you have
to remember that the infinite is very powerful! Very powerful, indeed! So
powerful, in fact, that our entire view of the world would be turned upside
down if it could ever be proved that the universe is, in fact, strictly finite.
These are but a few of the many interactions I've had with Greg that
have impacted my intellectual and personal life in a major way. In the
end, the questions we have discussed and debated have all been much more
philosophical than mathematical. And though the world at large regards Greg as a mathematician, when you read his autobiographical volume Meta Math! (Pantheon, New York, 2005) it is impossible not to be struck by the deeply philosophical, and emotional, content of Greg's work.
So I salute you, Greg, on this the occasion of your 60th birthday. I'm happy to have the privilege of knowing you and to have learned so much
from our interactions. May you have at least sixty years more to reach
many more minds with your wisdom, intelligence, and never-ending set of
novel and imaginative ideas.
Œuvre
Chapter 26
Algorithmic Information Theory: Some Recollections
Gregory Chaitin
IBM Research, Yorktown Heights, USA; chaitin@us.ibm.com
Introduction
AIT is a theory that uses the idea of the computer, particularly the size of
computer programs, to study the limits of knowledge, in other words, what
we can know, and how. This theory can be traced back to Leibniz in 1686,
and it features a place in pure mathematics where there is absolutely no
structure, none at all, namely the bits of the halting probability Ω.
There are related bodies of work by other people going in other directions, but in my case the emphasis is on using the idea of algorithmic
complexity to obtain incompleteness results. I became interested in this as
a teenager and have worked on it ever since.
Let me tell you that story. History is extremely complicated, with many
different points of view. What will make my account simple is the unity of
purpose imposed on a field that is a personal creation, that has a central
spine, that pulls a single thread. What did it feel like to do that? In fact,
it's not something I did. It's as if the ideas wanted to be expressed through
me.
It is an overwhelming experience to feel possessed by promising new
ideas. This happened to me as a teenager, and I have spent the rest of my
life trying to develop the ideas that flooded my mind then. These ideas
were deep enough to merit 45 years of effort, and I feel that more work
is still needed. There are many connections with crucial concepts in other
fields: physics, biology, philosophy, theology, artificial intelligence. . . Let
me try to remember what happened to me... The history of a person's life, that's just gossip. But the history of a person's ideas, that is real, that is
important, that is where you can see creativity at work. That is where you
can see new ideas springing into being.
AIT in a Nutshell
Gödel discovered incompleteness in 1931 using a version of the liar paradox, 'This statement is unprovable.' I was fascinated by Gödel's work. I devoured Nagel and Newman, Gödel's Proof, when it was published in 1958.
I was also fascinated by computers, and by the computer as a mathematical concept. In 1936 Turing derived incompleteness from uncomputability.
My work follows in Turing's footsteps, not Gödel's, but adds the idea of
looking at the size of computer programs.
For example, let's call a program Q elegant if no program written in
the same language that is smaller than Q produces the same output. Can
we prove that individual programs are elegant? In general, no. Any given
formal axiomatic system can only enable us to show that finitely many programs are elegant.
It's easy to see that this must be so. Just consider a program P that
calculates the output of the first provably elegant program that is larger
than P . P runs through all the possible proofs in the formal axiomatic
system until it finds the first proof that an individual program Q larger
than P is elegant, and then P runs Q and returns Qs output as its (P s)
output.
If you assume that only true theorems can be proved in your formal
axiomatic system, then P is too small to be able to produce the same output as Q. If P actually succeeds in finding the program Q, then we have a
contradiction. Therefore Q is never found, which means that no program
that is bigger than P can be proven to be elegant.
So how big is P ? Well, it must include a big subroutine for running
through all the possible proofs of the formal axiomatic system. The rest
of P , the main program, is rather small; P is mostly that big subroutine.
Thats the key thing, to focus on the number of bits in that subroutine.
So let's define the algorithmic complexity of a formal axiomatic system to be the size in bits of the smallest program for running through all the proofs and producing all the theorems. Then we can state what we just proved like this: You can't prove that a program is elegant if its size is substantially larger than the algorithmic complexity of the formal axiomatic system.
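As a toy illustration of the definition of elegance only (a Python sketch; the "programming language" of arithmetic expressions, the alphabet, and the search bound are my own choices and have nothing to do with Chaitin's actual formalism), one can find elegant programs for small outputs by brute force. The point of the argument above is that proving elegance within a formal axiomatic system, not finding it by exhaustive search, is what is limited.

from itertools import product

ALPHABET = "0123456789+*"      # a toy "programming language": arithmetic expressions

def output_of(program):
    try:
        return eval(program, {"__builtins__": {}})    # "run" the program
    except Exception:
        return None                                    # syntax error: not a valid program

def elegant_program_for(target, max_len=5):
    # Brute-force search for a shortest program producing `target`,
    # i.e. an "elegant" program for it in this toy language.
    for length in range(1, max_len + 1):
        for chars in product(ALPHABET, repeat=length):
            program = "".join(chars)
            if output_of(program) == target:
                return program
    return None

print(elegant_program_for(65536))   # prints "4**8": four characters beat the five-digit literal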
1964: Summer vacation between high school and college, I try to find
an infinite set with no subset that is easier to generate than the entire
set. By easier I mean faster or simpler; at this point I am simultaneously exploring run-time complexity and program-size complexity. The
work goes well, but is not published until 1969 in the ACM Journal as
On the simplicity and speed of programs for computing infinite sets
of natural numbers.
Also that summer, I get the first incompleteness result that I will publish, UB1, an upper bound on the provable lower bounds on run-time
complexity in any given formal axiomatic system. This is published
in 1970 in a Rio de Janeiro Pontifícia Universidade Católica research
report, and only there.3
Another discovery that summer, UB2, is that one can diagonalize
over the output of all programs that provably calculate total functions f : N → N to obtain a faster growing computable total function F : N → N. That is to say, given any formal axiomatic system, one can construct a computer program from it that calculates a total function f : N → N, but the fact that this program calculates a total function f : N → N cannot be proved within the formal axiomatic system, because f goes to infinity too quickly. "Calculates a total function f : N → N" merely means that every time we give the program
f a natural number n as input, it eventually outputs a single natural
number f (n) and then halts.
The result UB1 is actually a corollary of UB2, since all lower bounds
on run-time complexity are computable total functions.
Now one would say that the proof of UB2 is an instance of Cantor
diagonalization, but in my opinion its really closer to Paul du BoisReymonds theorem on orders of infinity. His theorem is that for any
scale of rates of growth, any infinite list of functions that go to infinity
faster and faster, for example
$$f_0(n) = 2^n, \quad f_1(n) = 2^{2^n}, \quad f_2(n) = 2^{2^{2^n}}, \quad \ldots,$$
one can form a function $f_\omega(n)$ that goes to infinity even more quickly. As far as I know, Paul du Bois-Reymond's work was independent of Cantor's.
3 While writing up that report in Rio, I realize I can also obtain an upper bound on the provable lower bounds on program-size complexity.
Note the Cantor ordinal number ω as a subscript. We can then form
$$f_{\omega+1}(n) = 2^{f_\omega(n)}, \quad f_{\omega+2}(n) = 2^{f_{\omega+1}(n)}, \quad f_{\omega+3}(n) = 2^{f_{\omega+2}(n)}, \ \ldots$$
and then
$$f_{\omega \cdot 2}(n) = \max_{k \le n} f_{\omega+k}(n).$$
learned the calculus from Hardy's A Course of Pure Mathematics, and also enjoyed A Mathematician's Apology and Hardy and Wright, An Introduction to the Theory of Numbers.
5 One of the players is trying to match the other player's choice of heads or tails.
By the way, in theories (A) and (B), randomness definition (R1) does
not apply, because the size of programs is measured in states, not bits.
It is necessary to use a slightly different definition of randomness:
Definition of Randomness R2: A random n-bit string is one
that has maximum or near maximum complexity. In other words,
an n-bit string is random if its complexity is approximately equal
to the maximum complexity of any n-bit string.
In theory (C), (R1) works fine (but so does (R2), which is more general).
Theory (C) is essentially the same as the one independently proposed
by Kolmogorov at about the same time (1965).8
However, I am dissatisfied with theory (C); the absence of subadditivity disturbs me. What is subadditivity? The usual definition is
that a function f is subadditive if $f(x + y) \le f(x) + f(y)$. I mean
something slightly different. Subadditivity holds if the complexity of
computing two objects together (also known as their joint complexity) is bounded by the sum of their individual complexities.9 In other
words, subadditivity means that you can combine subroutines by concatenating them, without having to add information to indicate where
the first subroutine ends and the second one begins. This makes it easy
to construct big programs. Complexity is subadditive in theories (A)
and (B), but not in theory (C).
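In symbols, and in the notation used later in this chapter for theory (D), the property amounts to the following (my paraphrase, not a formula quoted from the text): $H(x, y) \le H(x) + H(y) + O(1)$, where H(x, y) denotes the joint complexity.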
Last but not least, On the length of programs for computing finite binary sequences contains what I would now call a Berry paradox proof
that program-size complexity is uncomputable. This seed was to grow
into my 1970 work on incompleteness, where I refer to the Berry paradox explicitly for the first time.
1966: Awarded by City College the Belden Mathematical Prize and
the Gitelson Medal for the pursuit of truth. Family moves back to
Buenos Aires.
1967: I join IBM Argentina, working as a computer programmer.
(C2) is the version of theory (C) given in On the length of programs for computing
finite binary sequences: statistical considerations (ACM Journal, 1969), the second of
the two papers put together from my 1965 randomness manuscript.
8 Solomonoff was the first person to publish the idea of program-size complexity, in fact (C), but he did not propose a definition of randomness.
9 In the case of joint complexity the computer has two outputs, or outputs a pair of
objects, whatever you prefer.
1969: Stimulated by von Neumann's posthumous Theory of Self-Reproducing Automata, I work on a mathematical definition of life using
program-size complexity. This is published in Spanish in Buenos Aires,
and the next year (1970) in English in the ACM SICACT News. This
is the first of what on the whole I regard as an unsuccessful series of
papers on theoretical biology.10
1970: I visit Brazil and, inspired by this tropical land, I realize that one
can get powerful incompleteness results using program-size arguments.
In fact, one can place upper bounds on the provable lower bounds on
run-time and program-size complexity in a formal axiomatic system.
And this provides a way to measure the power of that formal axiomatic
system.
This first information-theoretic incompleteness result is immediately
published in a Rio de Janeiro Pontifícia Universidade Católica research
report and also as an AMS Notices abstract, and comes out the next
year (1971) as a note in the ACM SIGACT News.
I obtain a LISP 1.5 Programmer's Manual in Brazil and start writing
LISP interpreters and inventing LISP dialects.11
1971: I write a longer paper on incompleteness, Information-theoretic
limitations of formal systems, which is presented at the Courant Institute Computational Complexity Symposium in New York City in
October 1971. A key idea in this paper is to measure the complexity
of a formal axiomatic system by the size in bits of the program that
generates all of the theorems by systematically running through the
tree of all possible proofs.
1973: I complete a greatly expanded version of Information-theoretic
limitations of formal systems. The expanded version appears in the
ACM Journal in 1974. A less technical paper on the same subject,
Information-theoretic computational complexity, is presented at the
IEEE International Symposium on Information Theory, in Ashkelon,
Israel, June 1973, and is published in 1974 as an invited paper in the
IEEE Transactions on Information Theory.12
10 The
conversation with Gödel and an appointment to meet him at the Princeton Institute for Advanced Study, an appointment that Gödel cancels at the last minute.
13 Solomonoff tried to define P(x) but could not get P(x) to converge since he wasn't working with self-delimiting programs.
14 This definition is a bit abstract. Here are two other ways of defining an Ω number. As a sum over programs p:
$$\Omega' = \sum_{p\ \mathrm{halts}} 2^{-|p|}.$$
As a sum over positive integers n:
$$\Omega'' = \sum_{n} 2^{-H(n)}.$$
Here |p| is the size in bits of the program p, and H(n) is the size in bits of the smallest program for calculating the positive integer n.
To get (∗), self-delimiting programs aren't enough, you also need the
right definition of relative complexity.17 I had used relative complexity in my big 1965 randomness manuscript, but had eliminated it
to save space. In my 1975 ACM Journal paper I take up relative complexity again, but define H(x|y), the complexity of x given y, to be the
size in bits of the smallest self-delimiting program for calculating x if
we are given for free, not y directly, but a minimum-size self-delimiting
program for y.
And (∗) implies that the extent to which computing two things together is cheaper than computing them separately, also known as the mutual information
$$H(x : y) \equiv H(x) + H(y) - H(x, y),$$
is essentially the same, within O(1), as the extent to which knowing x helps us to know y,
$$H(y) - H(y|x),$$
and this in turn is essentially the same, within O(1), as the extent to which knowing y helps us to know x,
$$H(x) - H(x|y).$$
This is so pretty that I decide never to use theory (C) again. For (D)
doesn't just restore subadditivity to (C), it reveals an elegant new landscape with sharp results instead of messy error terms. From now on,
theory (D) only.
1975: My first Scientific American article, Randomness and mathematical proof, appears. I move back to New York and join the IBM
Watson Lab.
In the paper Algorithmic entropy of sets (Computers & Mathematics with Applications, 1976, written at the end of 1975), I attempt to
extend the self-delimiting approach to programs for generating infinite
sets of output. Much remains to be done.18
This topic is important, because I think of a formal axiomatic system as
a computation that produces theorems. My measure of the complexity
17 Levin
claims to have published theory (D) first. However he missed this vital part of
theory (D).
18 If this interests you, please see the discussion of infinite computations in the last
chapter of Exploring Randomness (2001).
roughly the same time I fulfill a childhood dream by building my own telescope: I
join a telescope-making club and grind a 6 inch f/8 mirror for a Newtonian reflector in
a basement workshop at the Hayden Planetarium of the Museum of Natural History.
20 Besides learning the physics and some numerical analysis, I wanted to get a feel for
the algorithmic complexity of the laws of physics.
21 It's actually what's called an exponential diophantine equation.
I'm invited to present a paper at a philosophy congress in Bonn, Germany, in September. For this purpose, I begin to study philosophy, particularly Leibniz's work on complexity, which I am led to by a hint
in a book by Hermann Weyl.
My paper appears two years later (2004) in a proceedings volume published by the Academy Press of the Berlin Academy that was founded
by Leibniz. It is reprinted as the second appendix in my book Meta
Math! (2005).
2003: Lecture notes From Philosophy to Program Size published in
Estonia, based on a course I give there, winter 2003.
2004: Corresponding member, Académie Internationale de Philosophie
des Sciences.
Write Meta Math!, a high-level popularization of AIT, published the
following year by Pantheon Books (2005). This book is not just an
explanation of previous work; it presents a système du monde.
Tasić pointed out to me that Borel has a know-it-all real number, a 1927 version of the Ω number. My paper on the ontological status of real numbers is dedicated to Borel's memory.
24 Also published in Greek.
25 Also published in Japanese and Portuguese.
26 Also published in Japanese.
27 Also published in Japanese.
Celebration
Chapter 27
Chaitin Celebration at the NKS2007 Conference
1 http://www.wolframscience.com/conference/2007/program.html.
Stephen Wolfram, Gregory Chaitin, and the Omega cake (photo by Jeff
Grote)