INFORMATION, RANDOMNESS &
INCOMPLETENESS
Papers on Algorithmic
Information Theory
Second Edition
G J Chaitin
IBM, P O Box 704
Yorktown Heights, NY 10598
[email protected]
I. Prigogine, Université Libre de Bruxelles (Mondes en Développement)
Springer-Verlag (Open Problems in Communication and Computation)
Verlag Kammerer & Unverzagt (The Universal Turing Machine: A Half-Century Survey)
W. H. Freeman and Company (Sci. Amer.).
Preface
God not only plays dice in quantum mechanics, but even with the whole numbers! The discovery of randomness in arithmetic is presented in my book Algorithmic Information Theory, published by Cambridge University Press. There I show that to decide whether an algebraic equation in integers has finitely or infinitely many solutions is in some cases absolutely intractable. I exhibit an infinite series of such arithmetical assertions that are random arithmetical facts, and for which it is essentially the case that the only way to prove them is to assume them as axioms. This extreme form of Gödel's incompleteness theorem shows that some arithmetical truths are totally impervious to reasoning.
The papers leading to this result were published over a period of
more than twenty years in widely scattered journals, but because of
their unity of purpose they fall together naturally into the present
book, intended as a companion volume to my Cambridge University
Press monograph. I hope that it will serve as a stimulus for work on
complexity, randomness and unpredictability, in physics and biology as
well as in metamathematics.
For the second edition, I have added the article "Randomness in arithmetic" (Part I), a collection of abstracts (Part VII), a bibliography (Part VIII), and, as an Epilogue, two essays which have not been published elsewhere that assess the impact of algorithmic information theory on mathematics and biology, respectively. I should also like to point out that it is straightforward to apply to LISP the techniques used in Part VI to study bounded-transfer Turing machines. A few footnotes have been added to Part VI, but the subject richly deserves book-length treatment, and I intend to write a book about LISP in the near future.¹
Gregory Chaitin
Toward a mathematical definition of "life" 165
Epilogue 503
Undecidability & randomness in pure mathematics 503
Algorithmic information & evolution 517
RANDOMNESS AND
MATHEMATICAL PROOF
Scientific American 232, No. 5
(May 1975), pp. 47-52
by Gregory J. Chaitin
Abstract
Although randomness can be precisely defined and can even be measured, a given number cannot be proved to be random. This enigma establishes a limit to what is possible in mathematics.
The first is obviously constructed according to a simple rule; it consists of the number 01 repeated ten times. If one were asked to speculate on how the series might continue, one could predict with considerable confidence that the next two digits would be 0 and 1. Inspection of the second series of digits yields no such comprehensive pattern. There is no obvious rule governing the formation of the number, and there is no rational way to guess the succeeding digits. The arrangement seems haphazard; in other words, the sequence appears to be a random assortment of 0's and 1's.
The second series of binary digits was generated by flipping a coin 20 times and writing a 1 if the outcome was heads and a 0 if it was tails. Tossing a coin is a classical procedure for producing a random number, and one might think at first that the provenance of the series alone would certify that it is random. This is not so. Tossing a coin 20 times can produce any one of 2^20 (or a little more than a million) binary series, and each of them has exactly the same probability. Thus it should be no more surprising to obtain the series with an obvious pattern than to obtain the one that seems to be random; each represents an event with a probability of 2^-20. If origin in a probabilistic event were made the sole criterion of randomness, then both series would have to be considered random, and indeed so would all others, since the same mechanism can generate all the possible series. The conclusion is singularly unhelpful in distinguishing the random from the orderly.
Clearly a more sensible definition of randomness is required, one that does not contradict the intuitive concept of a "patternless" number. Such a definition has been devised only in the past 10 years. It does not consider the origin of a number but depends entirely on the characteristics of the sequence of digits. The new definition enables us to describe the properties of a random number more precisely than was formerly possible, and it establishes a hierarchy of degrees of randomness. Of perhaps even greater interest than the capabilities of the definition, however, are its limitations. In particular the definition cannot help to determine, except in very special cases, whether or not a given series of digits, such as the second one above, is in fact random or only seems to be random. This limitation is not a flaw in the definition; it is a consequence of a subtle but fundamental anomaly in the foundation of mathematics. It is closely related to a famous theorem devised and proved in 1931 by Kurt Gödel, which has come to be known as Gödel's incompleteness theorem. Both the theorem and the recent discoveries concerning the nature of randomness help to define the boundaries that constrain certain mathematical methods.
Algorithmic Definition
The new definition of randomness has its heritage in information theory, the science, developed mainly since World War II, that studies the transmission of messages. Suppose you have a friend who is visiting a planet in another galaxy, and that sending him telegrams is very expensive. He forgot to take along his tables of trigonometric functions, and he has asked you to supply them. You could simply translate the numbers into an appropriate code (such as the binary numbers) and transmit them directly, but even the most modest tables of the six functions have a few thousand digits, so that the cost would be high. A much cheaper way to convey the same information would be to transmit instructions for calculating the tables from the underlying trigonometric formulas, such as Euler's equation e^{ix} = cos x + i sin x. Such a message could be relatively brief, yet inherent in it is all the information contained in even the largest tables.
Suppose, on the other hand, your friend is interested not in trigonometry but in baseball. He would like to know the scores of all the major-league games played since he left the earth some thousands of years before. In this case it is most unlikely that a formula could be found for compressing the information into a short message; in such a series of numbers each digit is essentially an independent item of information, and it cannot be predicted from its neighbors or from some underlying rule. There is no alternative to transmitting the entire list of scores.
In this pair of whimsical messages is the germ of a new definition of randomness. It is based on the observation that the information embodied in a random series of numbers cannot be "compressed," or reduced to a more compact form. In formulating the actual definition it is preferable to consider communication not with a distant friend but with a digital computer. The friend might have the wit to make inferences about numbers or to construct a series from partial information or from vague instructions. The computer does not have that capacity, and for our purposes that deficiency is an advantage. Instructions given the computer must be complete and explicit, and they must enable it to proceed step by step without requiring that it comprehend the result of any part of the operations it performs. Such a program of instructions is an algorithm. It can demand any finite number of mechanical manipulations of numbers, but it cannot ask for judgments about their meaning.
The definition also requires that we be able to measure the information content of a message in some more precise way than by the cost of sending it as a telegram. The fundamental unit of information is the "bit," defined as the smallest item of information capable of indicating a choice between two equally likely things. In binary notation one bit is equivalent to one digit, either a 0 or a 1.
We are now able to describe more precisely the differences between the two series of digits presented at the beginning of this article:

01010101010101010101
01101100110111100010

The first could be specified to a computer by a very simple algorithm, such as "Print 01 ten times." If the series were extended according to the same rule, the algorithm would have to be only slightly larger; it might be made to read, for example, "Print 01 a million times." The number of bits in such an algorithm is a small fraction of the number of bits in the series it specifies, and as the series grows larger the size of the program increases at a much slower rate.

For the second series of digits there is no corresponding shortcut. The most economical way to express the series is to write it out in full, and the shortest algorithm for introducing the series into a computer would be "Print 01101100110111100010." If the series were much larger (but still apparently patternless), the algorithm would have to be expanded to the corresponding size. This "incompressibility" is a property of all random numbers; indeed, we can proceed directly to define randomness in terms of incompressibility: A series of numbers is random if the smallest algorithm capable of specifying it to a computer has about the same number of bits of information as the series itself.
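The smallest-algorithm size in this definition is not computable, but a general-purpose compressor gives a crude, computable upper bound on it. Here is a minimal sketch in Python, assuming only the standard zlib module; the compressed length stands in for the length of the smallest algorithm.

import random
import zlib

def crude_complexity(series: str) -> int:
    # Length in bytes of a compressed description of the series; an
    # upper bound on, and rough proxy for, the smallest-algorithm size.
    return len(zlib.compress(series.encode(), 9))

orderly = "01" * 10_000   # the analogue of "Print 01 ten thousand times"
random.seed(0)
patternless = "".join(random.choice("01") for _ in range(20_000))

print(crude_complexity(orderly))      # tiny: a short description suffices
print(crude_complexity(patternless))  # vastly larger: no shortcut was found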
This definition was independently proposed about 1965 by A. N. Kolmogorov of the Academy of Science of the U.S.S.R. and by me, when I was an undergraduate at the City College of the City University of New York. Both Kolmogorov and I were then unaware of related proposals made in 1960 by Ray J. Solomonoff of the Zator Company in an endeavor to measure the simplicity of scientific theories. During the past decade we and others have continued to explore the meaning of randomness. The original formulations have been improved and the feasibility of the approach has been amply confirmed.
Unprovable Statements
Gödel showed in his 1931 proof that Hilbert's plan for a completely systematic mathematics cannot be fulfilled. He did this by constructing an assertion about the positive integers in the language of the formal system that is true but that cannot be proved in the system. The formal system, no matter how large or how carefully constructed it is, cannot encompass all true theorems and is therefore incomplete. Gödel's technique can be applied to virtually any formal system, and it therefore demands the surprising and, for many, discomforting conclusion that there can be no definitive answer to the question "What is a valid proof?"
Gödel's proof of the incompleteness theorem is based on the paradox of Epimenides the Cretan, who is said to have averred, "All Cretans are liars" [see "Paradox," by W. V. Quine; Scientific American, April, 1962]. The paradox can be rephrased in more general terms as "This statement is false," an assertion that is true if and only if it is false and that is therefore neither true nor false. Gödel replaced the concept of truth with that of provability and thereby constructed the sentence "This statement is unprovable," an assertion that, in a specific formal system, is provable if and only if it is false. Thus either a falsehood is provable, which is forbidden, or a true statement is unprovable, and hence the formal system is incomplete. Gödel then applied a technique that uniquely numbers all statements and proofs in the formal system and thereby converted the sentence "This statement is unprovable" into an assertion about the properties of the positive integers. Because this transformation is possible, the incompleteness theorem applies with equal cogency to all formal systems in which it is possible to deal with the positive integers [see "Gödel's Proof," by Ernest Nagel and James R. Newman; Scientific American, June, 1956].
The intimate association between Gödel's proof and the theory of random numbers can be made plain through another paradox, similar in form to the paradox of Epimenides. It is a variant of the Berry paradox, first published in 1908 by Bertrand Russell. It reads: "Find the smallest positive integer which to be specified requires more characters than there are in this sentence." The sentence has 114 characters (counting spaces between words and the period but not the quotation marks), yet it supposedly specifies an integer that, by definition, requires more than 114 characters to be specified.

As before, in order to apply the paradox to the incompleteness theorem it is necessary to remove it from the realm of truth to the realm of provability. The phrase "which requires" must be replaced by "which can be proved to require," it being understood that all statements will be expressed in a particular formal system. In addition the vague notion of "the number of characters required to specify" an integer can be replaced by the precisely defined concept of complexity, which is measured in bits rather than characters.
The result of these transformations is the following computer program: "Find a series of binary digits that can be proved to be of a complexity greater than the number of bits in this program." The program tests all possible proofs in the formal system in order of their size until it encounters the first one proving that a specific binary sequence is of a complexity greater than the number of bits in the program. Then it prints the series it has found and halts. Of course, the paradox in the statement from which the program was derived has not been eliminated. The program supposedly calculates a number that no program its size should be able to calculate. In fact, the program finds the first number that it can be proved incapable of finding.

The absurdity of this conclusion merely demonstrates that the program will never find the number it is designed to look for. In a formal system one cannot prove that a particular series of digits is of a complexity greater than the number of bits in the program employed to specify the series.
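A sketch of that paradoxical program may make the argument concrete. This is a schematic rendering only: proofs_by_size and claim_of are hypothetical stand-ins for a proof enumerator and a proof checker of some fixed formal system, not real APIs.

def proofs_by_size():
    # Hypothetical: yields every proof of the formal system in order of size.
    yield from ()

def claim_of(proof):
    # Hypothetical: returns ("complexity exceeds", s, n) if the proof
    # establishes that series s has complexity greater than n, else None.
    return None

def find_provably_complex_series(program_size_bits: int):
    for proof in proofs_by_size():
        claim = claim_of(proof)
        if claim is None:
            continue                       # a proof of something else
        label, series, n = claim
        if label == "complexity exceeds" and n > program_size_bits:
            return series                  # print the series found and halt
    # The argument in the text shows this search can never succeed: no
    # proof establishes complexity beyond the searcher's own size in bits.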
A further generalization can be made about this paradox. It is not the number of bits in the program itself that is the limiting factor but the number of bits in the formal system as a whole. Hidden in the program are the axioms and rules of inference that determine the behavior of the system and provide the algorithm for testing proofs. The information content of these axioms and rules can be measured and can be designated the complexity of the formal system. The size of the entire program therefore exceeds the complexity of the formal system by a fixed number of bits c. (The actual value of c depends on the machine language employed.) The theorem proved by the paradox can therefore be stated as follows: In a formal system of complexity n it is impossible to prove that a particular series of binary digits is of complexity greater than n + c, where c is a constant that is independent of the particular system employed.
Illustrations
Algorithmic definition of randomness

(a) 10100 → Computer → 11111111111111111111

(b) 01101100110111100010 → Computer → 01101100110111100010

Algorithmic definition of randomness relies on the capabilities and limitations of the digital computer. In order to produce a particular output, such as a series of binary digits, the computer must be given a set of explicit instructions that can be followed without making intellectual judgments. Such a program of instructions is an algorithm. If the desired output is highly ordered (a), a relatively small algorithm will suffice; a series of twenty 1's, for example, might be generated by some hypothetical computer from the program 10100, which is the binary notation for the decimal number 20. For a random series of digits (b) the most concise program possible consists of the series itself. The smallest programs capable of generating a particular series are called the minimal programs of the series; the size of these programs, measured in bits, or binary digits, is the complexity of the series. A series of digits is defined as random if the series' complexity approaches its size in bits.
Formal systems

Alphabet, Grammar, Axioms, Rules of Inference
↓
Computer
↓
Theorem 1, Theorem 2, Theorem 3, Theorem 4, Theorem 5, …

Formal systems devised by David Hilbert contain an algorithm that mechanically checks the validity of all proofs that can be formulated in the system. The formal system consists of an alphabet of symbols in which all statements can be written; a grammar that specifies how the symbols are to be combined; a set of axioms, or principles accepted without proof; and rules of inference for deriving theorems from the axioms. Theorems are found by writing all the possible grammatical statements in the system and testing them to determine which ones are in accord with the rules of inference and are therefore valid proofs. Since this operation can be performed by an algorithm it could be done by a digital computer. In 1931 Kurt Gödel demonstrated that virtually all formal systems are incomplete: in each of them there is at least one statement that is true but that cannot be proved.
Inductive reasoning
Observations: 0101010101
Predictions: 01010101010101010101
Theory: Ten repetitions of 01
Size of Theory: 21 characters
Predictions: 01010101010000000000
Theory: Five repetitions of 01 followed by ten 0's
Size of Theory: 42 characters
Inductive reasoning as it is employed in science was analyzed mathematically by Ray J. Solomonoff. He represented a scientist's observations as a series of binary digits; the observations are to be explained and new ones are to be predicted by theories, which are regarded as algorithms instructing a computer to reproduce the observations. (The programs would not be English sentences but binary series, and their size would be measured not in characters but in bits.) Here two competing theories explain the existing data; Occam's razor demands that the simpler, or smaller, theory be preferred. The task of the scientist is to search for minimal programs. If the data are random, the minimal programs are no more concise than the observations and no theory can be formulated.
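The figure's rule can be stated as a one-line selection. Below is a toy version, with the two theories written as Python generators and the figure's character counts standing in for program size; the theory names and counts mirror the figure, everything else is illustrative.

observations = "0101010101"

theories = {
    # theory (its text length plays the role of program size): generator
    "Ten repetitions of 01": lambda: "01" * 10,
    "Five repetitions of 01 followed by ten 0's": lambda: "01" * 5 + "0" * 10,
}

# Keep only the theories that reproduce the observations made so far.
consistent = {name: gen for name, gen in theories.items()
              if gen().startswith(observations)}

# Occam's razor: prefer the smallest consistent theory; whatever it
# outputs beyond the observations is its prediction.
best = min(consistent, key=len)
print(best)                                    # Ten repetitions of 01 (21 characters)
print(consistent[best]()[len(observations):])  # predicted continuation: 0101010101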
Random sequences
[The illustration is a graph of the number of n-digit sequences as a function of their complexity. The curve grows exponentially from approximately 0 to approximately 2^n as the complexity goes from 0 to n.]

Random sequences of binary digits make up the majority of all such sequences. Of the 2^n series of n digits, most are of a complexity that is within a few bits of n. As complexity decreases, the number of series diminishes in a roughly exponential manner. Orderly series are rare; there is only one, for example, that consists of n 1's.
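The shape of that curve follows from counting programs, and the count is easy to check numerically. A sketch, with program length standing in for complexity: there are fewer than 2^(n-k) programs shorter than n - k bits, so at most a fraction of about 2^-k of the 2^n series of length n can be specified by such programs.

n = 20
total = 2 ** n                          # all binary series of length n
for k in range(1, 6):
    short_programs = 2 ** (n - k) - 1   # binary strings shorter than n - k bits
    print(k, short_programs / total)    # at most about 2^-k: 0.5, 0.25, 0.125, ...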
Three paradoxes
Russell Paradox
Consider the set of all sets that are not members of themselves.
Is this set a member of itself?
Epimenides Paradox
Consider this statement: "This statement is false."
Is this statement true?
Berry Paradox
Consider this sentence: "Find the smallest positive integer which to be specified requires more characters than there are in this sentence."
Does this sentence specify a positive integer?
Three paradoxes delimit what can be proved. The first, devised by Bertrand Russell, indicated that informal reasoning in mathematics can yield contradictions, and it led to the creation of formal systems. The second, attributed to Epimenides, was adapted by Gödel to show that even within a formal system there are true statements that are unprovable. The third leads to the demonstration that a specific number cannot be proved random.
Unprovable statements
(a) This statement is unprovable.
(b) The complexity of 01101100110111100010 is greater than 15 bits.
(c) The series of digits 01101100110111100010 is random.
(d) 10100 is a minimal program for the series 11111111111111111111.
Unprovable statements can be shown to be false, if they are false, but they cannot be shown to be true. A proof that "This statement is unprovable" (a) reveals a self-contradiction in a formal system. The assignment of a numerical value to the complexity of a particular number (b) requires a proof that no smaller algorithm for generating the number exists; the proof could be supplied only if the formal system itself were more complex than the number. Statements labeled c and d are subject to the same limitation, since the identification of a random number or a minimal program requires the determination of complexity.
Further Reading
A Profile of Mathematical Logic. Howard DeLong. Addison-Wesley, 1970.

Theories of Probability: An Examination of Foundations. Terrence L. Fine. Academic Press, 1973.

Universal Gambling Schemes and the Complexity Measures of Kolmogorov and Chaitin. Thomas M. Cover. Technical Report No. 12, Statistics Department, Stanford University, 1974.

"Information-Theoretic Limitations of Formal Systems." Gregory J. Chaitin in Journal of the Association for Computing Machinery, Vol. 21, pages 403-424; July, 1974.
RANDOMNESS IN
ARITHMETIC
Scientific American 259, No. 1
(July 1988), pp. 80-85
by Gregory J. Chaitin
Gregory J. Chaitin is on the staff of the IBM Thomas J. Watson Research Center in Yorktown Heights, N.Y. He is the principal architect of algorithmic information theory and has just published two books in which the theory's concepts are applied to elucidate the nature of randomness and the limitations of mathematics. This is Chaitin's second article for Scientific American.
Abstract
It is impossible to prove whether each member of a family of algebraic equations has a finite or an infinite number of solutions: the answers vary randomly and therefore elude mathematical reasoning.
What could be more certain than the fact that 2 plus 2 equals 4? Since the time of the ancient Greeks mathematicians have believed there is little, if anything, as unequivocal as a proved theorem. In fact, mathematical statements that can be proved true have often been regarded as a more solid foundation for a system of thought than any maxim about morals or even physical objects. The 17th-century German mathematician and philosopher Gottfried Wilhelm Leibniz even envisioned a "calculus" of reasoning such that all disputes could one day be settled with the words "Gentlemen, let us compute!" By the beginning of this century symbolic logic had progressed to such an extent that the German mathematician David Hilbert declared that all mathematical questions are in principle decidable, and he confidently set out to codify once and for all the methods of mathematical reasoning.
Such blissful optimism was shattered by the astonishing and profound discoveries of Kurt Gödel and Alan M. Turing in the 1930's. Gödel showed that no finite set of axioms and methods of reasoning could encompass all the mathematical properties of the positive integers. Turing later couched Gödel's ingenious and complicated proof in a more accessible form. He showed that Gödel's incompleteness theorem is equivalent to the assertion that there can be no general method for systematically deciding whether a computer program will ever halt, that is, whether it will ever cause the computer to stop running. Of course, if a particular program does cause the computer to halt, that fact can be easily proved by running the program. The difficulty lies in proving that an arbitrary program never halts.
I have recently been able to take a further step along the path laid out by Gödel and Turing. By translating a particular computer program into an algebraic equation of a type that was familiar even to the ancient Greeks, I have shown that there is randomness in the branch of pure mathematics known as number theory. My work indicates that, to borrow Einstein's metaphor, God sometimes plays dice with whole numbers!
This result, which is part of a body of work called algorithmic information theory, is not a cause for pessimism; it does not portend anarchy or lawlessness in mathematics. (Indeed, most mathematicians continue working on problems as before.) What it means is that mathematical laws of a different kind might have to apply in certain situations: statistical laws. In the same way that it is impossible to predict the exact moment at which an individual atom undergoes radioactive decay, mathematics is sometimes powerless to answer particular questions. Nevertheless, physicists can still make reliable predictions about averages over large ensembles of atoms. Mathematicians may in some cases be limited to a similar approach.
Are there other problems in other fields of science that can benefit from these insights into the foundations of mathematics? I believe algorithmic information theory may have relevance to biology. The regulatory genes of a developing embryo are in effect a computer program for constructing an organism. The "complexity" of this biochemical computer program could conceivably be measured in terms analogous to those I have developed in quantifying the information content of Ω.

Although Ω is completely random (or infinitely complex) and cannot ever be computed exactly, it can be approximated with arbitrary precision given an infinite amount of time. The complexity of living organisms, it seems to me, could be approximated in a similar way. A sequence of Ω_n's, which approach Ω, can be regarded as a metaphor for evolution and perhaps could contain the germ of a mathematical model for the evolution of biological complexity.
At the end of his life John von Neumann challenged mathematicians to find an abstract mathematical theory for the origin and evolution of life. This fundamental problem, like most fundamental problems, is magnificently difficult. Perhaps algorithmic information theory can help to suggest a way to proceed.
Further Reading
Algorithmic Information Theory. Gregory J. Chaitin. Cambridge University Press, 1987.

Information, Randomness & Incompleteness. Gregory J. Chaitin. World Scientific Publishing Co. Pte. Ltd., 1987.

The Ultimate in Undecidability. Ian Stewart in Nature, Vol. 332, No. 6160, pages 115-116; March 10, 1988.
ON THE DIFFICULTY OF
COMPUTATIONS
IEEE Transactions on Information Theory
IT-16 (1970), pp. 5-9
Gregory J. Chaitin¹
Abstract
Two practical considerations concerning the use of computing machinery are the amount of information that must be given to the machine for it to perform a given task and the time it takes the machine to perform it. The size of programs and their running time are studied for mathematical models of computing machines. The study of the amount of information (i.e., number of bits) in a computer program needed for it to put out a given finite binary sequence leads to a definition of a random sequence; the random sequences of a given length are those that require the longest programs. The study of the running time of programs for computing infinite sets of natural numbers leads to an arithmetic of computers, which is a distributive lattice.
¹ Manuscript received May 5, 1969; revised July 3, 1969. This paper was presented as a lecture at the Pan-American Symposium of Applied Mathematics, Buenos Aires, Argentina, August 1968. The author is at Mario Bravo 249, Buenos Aires, Argentina.
[Figure 1. A Turing machine: a black box with a finite number of internal states, with a read-write head positioned on an infinite tape divided into squares.]
Section I
The modern computing machine sprang into existence at the end of World War II. But already in 1936 Turing and Post had proposed a mathematical model of computing machines (figure 1).² The mathematical model of the computing machine that Turing and Post proposed, commonly referred to as the Turing machine, is a black box with a finite number of internal states. The box can read and write on an infinite paper tape, which is divided into squares. A digit or letter may be written on each square of the tape, or the square may be blank. Each second the machine performs one of the following actions. It may stop, it may shift the tape one square to the right or one square to the left, it may erase the square on which the read-write head is positioned, or it may write a digit or letter on the square on which the read-write head is positioned. The action it performs is determined solely by the internal state of the black box at the moment, and the current state of the black box is determined solely by its previous internal state and the character read on the square of the tape on which its read-write head was positioned.
Incredible as it may seem at first, a machine of such primitive design can multiply numbers written on its tape, and can write on its tape the successive digits of π. Indeed, it is now generally accepted that any calculation that a modern electronic digital computer or a human computer can do, can also be done by such a machine.

² Their papers appear in Davis [1]. As general references on computability theory we may also cite Davis [2]-[4], Minsky [5], Rogers [6], and Arbib [7].
Section II
How much information must be provided to a computer in order for it to perform a given task? The point of view we will present here is somewhat different from the usual one. In a typical scientific application, the computer may be used to analyze statistically huge amounts of data and produce a brief report in which a great many observations are reduced to a handful of statistical parameters. We would view this in the following manner. The same final result could have been achieved if we had provided the computer with a table of the results, together with instructions for printing them in a neat report. This observation is, of course, ridiculous for all practical purposes. For, had we known the results, it would not have been necessary to use a computer. This example, then, does not exemplify those aspects of computation that we will emphasize.
Rather, we are thinking of such scientific applications as solving the Schrödinger wave equation for the helium atom. Here we have no data, only a program, and the program will produce after much calculation a great deal of printout. Or consider calculating the apparent positions of the planets as observed from the earth over a period of years. A small program incorporating the very simple Newtonian theory for this situation will predict a great many astronomical observations. In this problem there are no data, only a program that contains, of course, a table of the masses of the planets and their initial positions and velocities.
Section III
Let us now consider the problem of the amount of information that it is necessary to provide to a computer in order for it to calculate a given finite binary sequence. A computing machine is defined for these purposes to be a device that accepts as input a program, performs the calculations indicated to it in the program, and finally puts out the binary sequence it has calculated. In line with the mathematical theory of information, it is natural for the program to be viewed as a sequence of bits or 0's and 1's. Furthermore, in computer engineering all programs and data are represented in the machine's circuits in binary form. Thus, we may consider a computer to be a device that accepts one binary sequence (the program) and emits another (the result of the calculation).

011001001 → Computer → 1111110010001100110100

As an example of a computer we would then have an electronic digital computer that accepts programs consisting of magnetized spots on magnetic tape and puts out its results in the same form. Another example is a Turing machine. The program is a series of 0's and 1's written on the machine's tape at the start of the calculation, and the result is a sequence of 0's and 1's written on its tape when it stops. As was mentioned, the second of these examples can do anything that the first can.
Section IV
We are interested in the amount of information that must be supplied to a computer M in order for it to calculate a given finite binary sequence S. We may now define this as the size or length of the smallest binary sequence that causes the machine M to calculate S. We denote the length of the shortest program for M to calculate S by L(M, S). It has been shown that there is a computing machine M that has the following three properties.³

1) L(M, S) ≤ k + 1 for all binary sequences S of length k.

In other words, any binary sequence of length k can be calculated by this computer M if it is given an appropriate program at most k + 1 bits in length. The proof is as follows. If no better way to calculate a binary sequence occurs to us, we can always include the binary sequence as a table in the program. This computer is so designed that we need add only a single bit to the sequence to obtain a program for computing it. The computer M emits the sequence S when it is given the program S0.

³ Solomonoff [8] was the first to employ computers of this kind.
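A minimal sketch of such a machine M, under the convention just described: programs ending in 0 are table look-ups, so every k-bit sequence S has the (k + 1)-bit program S0.

def M(program: str):
    # Property 1: a program of the form S + "0" is a table look-up for S.
    if program.endswith("0"):
        return program[:-1]   # emit the table, i.e. S itself
    return None               # programs ending in "1" are the simulations
                              # used for property 3, omitted in this sketch

assert M("110101" + "0") == "110101"   # a 7-bit program computes a 6-bit S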
2) Those binary sequences S for which L(M, S) < j are fewer than 2^j in number.

Thus, most binary sequences of length k require programs of about the same length k, and the number of sequences that can be computed by smaller programs decreases exponentially as the size of the program decreases. The proof is as follows. There are only 2^j - 2 binary sequences less than j in length. Thus, there are fewer than 2^j programs less than j in length, for each program is a binary sequence. At best, a program will cause the computer to calculate a single binary sequence. At worst, an error in the program will trap the computer in an endless loop, and no binary sequence will be calculated. As each program causes the computer to calculate at most one binary sequence, the number of sequences calculated must be smaller than the number of programs. Thus, fewer than 2^j binary sequences can be calculated by means of programs less than j in length.
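The count used in this proof is easy to verify numerically; the sketch below sums the nonempty binary sequences of each length below j.

for j in range(2, 16):
    count = sum(2 ** length for length in range(1, j))  # lengths 1 .. j-1
    assert count == 2 ** j - 2    # the "only 2^j - 2 sequences" of the proof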
3) For any other computer M′ there exists a constant c(M′) such that for all binary sequences S, L(M, S) ≤ L(M′, S) + c(M′).

In other words, this computer requires shorter programs than any other computer, or more exactly it does not require programs much longer than those required by any other computer. The proof is as follows. The computer M is designed to interpret the circuit diagrams of any other computer M′. Given a program for M′ and the circuit diagrams of M′, the computer M proceeds to calculate how M′ would behave, i.e., it proceeds to simulate M′. Thus, we need only add a fixed number of bits to any program for M′ in order to obtain a program that enables M to calculate the same result. This program for M is of the form PC1. The 1 at the right end of the program indicates to the computer M that this is a simulation, C is a fixed binary sequence of length c(M′) - 1 giving the circuit diagrams of the computer M′, which is to be imitated, and P is the program for M′.⁴
Section V
Kolmogorov [9] and the author [11], [12] have independently suggested that computers such as those previously described be applied to the problem of defining what is meant by a random or patternless finite binary sequence of 0's and 1's. In the traditional foundations of the mathematical theory of probability, as expounded by Kolmogorov in his classic [10], there is no place for the concept of an individual random sequence of 0's and 1's. Yet it is not altogether meaningless to say that the sequence

110010111110011001011110000010

is more random or patternless than the sequences

111111111111111111111111111111
010101010101010101010101010101

for we may describe these last two sequences as thirty 1's or fifteen 01's, but there is no shorter way to specify the first sequence than by just writing it all out.

We believe that the random or patternless sequences of a given length are those that require the longest programs. We have seen that most of the binary sequences of length k require programs of about length k. These, then, are the random or patternless sequences. Those sequences that can be obtained by putting into a computer a program much shorter than k are the nonrandom sequences, those that possess a pattern or follow a law. The more possible it is to compress a binary sequence into a short program calculation, the less random is the sequence.
⁴ How can the computer M separate PC into P and C? C has each of its bits doubled, except the pair of bits at its left end. These are unequal and serve as punctuation separating C from P.
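One concrete reading of this punctuation scheme can be sketched as an encoder and decoder pair. The fixed unequal pair "01", the bit-doubling, and the trailing 1 follow the footnote; how M would then simulate M′ from the recovered C is of course omitted.

def encode(P: str, C: str) -> str:
    # Program of the form PC1: P, then C with every bit doubled and the
    # unequal pair "01" marking C's left end, then the simulation flag "1".
    return P + "01" + "".join(bit + bit for bit in C) + "1"

def decode(program: str):
    assert program.endswith("1")      # the trailing 1 flags a simulation
    body = program[:-1]
    i = len(body) - 2
    bits = []
    while body[i] == body[i + 1]:     # doubled pairs belong to C
        bits.append(body[i])
        i -= 2
    # body[i:i+2] is the unequal punctuation pair; what precedes it is P
    return body[:i], "".join(reversed(bits))

P, C = "11010", "100"                 # program for M' and circuit diagrams
assert decode(encode(P, C)) == (P, C)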
As an example of this, let us consider those sequences of 0's and 1's in which 0's and 1's do not occur with equal frequency. Let p be the relative frequency of 1's, and let q = 1 - p be the relative frequency of 0's. A long binary sequence that has the property that 1's are more frequent than 0's can be obtained from a computer program whose length is only that of the desired sequence reduced by a factor H(p, q) = -p log2 p - q log2 q. For example, if 1's occur approximately 3/4 of the time and 0's occur 1/4 of the time in a long binary sequence of length k, there is a program for computing that sequence with length only about H(3/4, 1/4)k = 0.80k. That is, the program need be only approximately 80 percent the length of the sequence it computes. In summary, if 0's and 1's occur with unequal frequencies, we can compress such sequences into programs only a certain percentage (depending on the frequencies) of the size of the sequence. Thus, random or incompressible sequences will have about as many 0's as 1's, which agrees with our intuitive expectations.
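A few lines of Python make the arithmetic of this example explicit (the text's 0.80 rounds H(3/4, 1/4), which is about 0.811):

from math import log2

def H(p: float, q: float) -> float:
    # entropy of a two-symbol source, in bits per symbol
    return -p * log2(p) - q * log2(q)

print(H(3/4, 1/4))             # 0.8112..., the compression factor quoted above
k = 1_000_000
print(round(H(3/4, 1/4) * k))  # approximate program length for such a k-bit sequence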
In a similar manner it can be shown that all groups of 0's and 1's will occur with approximately the expected frequency in a long binary sequence that we call random; 01100 will appear 2^-5 k times in long sequences of length k, etc.⁵
Section VI
The definition of random or patternless finite binary sequences just presented is related to certain considerations in information theory and in the methodology of science.

The two problems considered in Shannon's classical exposition [15] are to transmit information as efficiently and as reliably as possible. Here we are interested in examining the viewpoint of information theory concerning the efficient transmission of information. An information source may be redundant, and information theory teaches us to code or compress messages so that what is redundant is eliminated and communications equipment is optimally employed. For example, let us consider an information source that emits one symbol (either an A or a B) each second. Successive symbols are independent, and A's are three times more frequent than B's. Suppose it is desired to transmit the messages over a channel that is capable of transmitting either an
⁵ Martin-Löf [14] also discusses the statistical properties of random sequences.
A or a B each second. Then the channel has a capacity of 1 bit per second, while the information source has entropy 0.80 bits per symbol; and thus it is possible to code the messages in such a way that on the average 1/0.80 = 1.25 symbols of message are transmitted over the channel each second. The receiver must decode the messages; that is, expand them into their original form.
In summary, information theory teaches us that messages from an information source that is not completely random (that is, which does not have maximum entropy) can be compressed. The definition of randomness is merely the converse of this fundamental theorem of information theory; if lack of randomness in a message allows it to be coded into a shorter sequence, then the random messages must be those that cannot be coded into shorter messages. A computing machine is clearly the most general possible decoder for compressed messages. We thus consider that this definition of randomness is in perfect agreement with, and indeed strongly suggested by, the coding theorem for a noiseless channel of information theory.
Section VII
This definition is also closely related to classical problems of the methodology of science.⁶

Consider a scientist who has been observing a closed system that once every second either emits a ray of light or does not. He summarizes his observations in a sequence of 0's and 1's in which a 0 represents "ray not emitted" and a 1 represents "ray emitted." The sequence may start

0110101110…

and continue for a few million more bits. The scientist then examines the sequence in the hope of observing some kind of pattern or law. What does he mean by this? It seems plausible that a sequence of 0's and 1's is patternless if there is no better way to calculate it than just by writing it all out at once from a table giving the whole sequence. The scientist might state:

⁶ Solomonoff [8] also discusses the relation between program lengths and the problem of induction.
My Scientific Theory: 0110101110…

This would not be considered an acceptable theory. On the other hand, if the scientist should hit upon a method by which the whole sequence could be calculated by a computer whose program is short compared with the sequence, he would certainly not consider the sequence to be entirely patternless or random. The shorter the program, the greater the pattern he may ascribe to the sequence.
There are many parallels between the foregoing and the way scientists actually think. For example, a simple theory that accounts for a set of facts is generally considered better or more likely to be true than one that needs a large number of assumptions. By "simplicity" is not meant "ease of use in making predictions." For although general relativity is considered to be the simple theory par excellence, very extended calculations are necessary to make predictions from it. Instead, one refers to the number of arbitrary choices that have been made in specifying the theoretical structure. One is naturally suspicious of a theory whose number of arbitrary elements is of an order of magnitude comparable to the amount of information about reality that it accounts for.
Section VIII
Let us now turn to the problem of the amount of time necessary for computations.⁷ We will develop the following thesis. Call an infinite set of natural numbers perfect if there is no essentially quicker way to compute infinitely many of its members than computing the whole set. Perfect sets exist. This thesis was suggested by the following vague and imprecise considerations.⁸

⁷ As general references we may cite Blum [16] and Arbib and Blum [17]. Our exposition is a summary of that of [13].
⁸ See Hardy and Wright [18], Sections 1.4 and 2.5, for the number-theoretic background of the following remarks.

One of the most profound problems of the theory of numbers is that of calculating large primes. While the sieve of Eratosthenes appears to be as quick an algorithm for calculating all the primes as is possible, in
recent times hope has centered on calculating large primes by calculating a subset of the primes, those that are Mersenne numbers. Lucas's test can decide the primality of a Mersenne number with rapidity far greater than is furnished by the sieve method. If there are an infinity of Mersenne primes, then it appears that Lucas has achieved a decisive advance in this classical problem of the theory of numbers.
An opposing point of view is that there is no essentially better way to calculate large primes than by calculating them all. If this is the case, it apparently follows that there must be only finitely many Mersenne primes.
These considerations, then, suggested that there are infinite sets of natural numbers that are arbitrarily difficult to compute, and that do not have any infinite subsets essentially easier to compute than the whole set. Here difficulty of computation refers to speed. Our development will be as follows. First, we define computers for calculating infinite sets of natural numbers. Then we introduce a way of comparing the rapidity of computers, a transitive binary relation, i.e., almost a partial ordering. Next we focus our attention on those computers that are greater than or equal to all others under this ordering, i.e., the fastest computers. Our results are conditioned on the computers having this property. The meaning of "arbitrarily difficult to compute" is then clarified. Last, we exhibit sets that are arbitrarily difficult to compute and do not have any subset essentially easier to compute than the whole set.
Section IX
We are interested in the speed of programs for generating the elements of an infinite set of natural numbers. For these purposes we may consider a computer to be a device that once a second emits a (possibly empty) finite set of natural numbers and that once started never stops. That is to say, a computer is now viewed as a function whose arguments are the program and the time and whose value is a finite set of natural numbers. If a program causes the computer to emit infinitely many natural numbers in size order and without any repetitions, we say that the computing machine calculates the infinite set of natural numbers that it emits.

A Turing machine can be used to compute infinite sets of natural numbers; it is only necessary to establish a convention as to when natural numbers are emitted. For example, we may divide the machine's tape into two halves, and stipulate that what is written on the right half cannot be erased. The computational scratchwork is done on the left half of the tape, and the successive members of the infinite set of natural numbers are written on the nonerasable squares in decimal notation, separated by commas, with no blank spaces permitted between characters. The moment a comma has been written, it is considered that the digits between it and the previous comma form the numeral representing the next natural number emitted by the machine. We suppose that the Turing machine performs a single cycle of activity (read tape; shift, write, or erase tape; change internal state) each second. Last, we stipulate that the machine be started scanning the first nonerasable square of the tape, that initially the nonerasable squares be all blank, and that the program for the computer be written on the first erasable squares, with a blank serving as punctuation to indicate the end of the program and the beginning of an infinite blank region of tape.
Section X
We now order the computers according to their speeds. C ≥ C′ is defined as meaning that C is not much slower than C′.

What do we mean by saying that computer C is not much slower than computer C′ for the purpose of computing infinite sets of natural numbers? There is a computable change of C's time scale that makes C as fast as C′ or faster. More exactly, there is a computable function f(n) (for example n!, or n^n^···^n with n exponents) with the following property.
Section XI
We now clarify what we mean by "arbitrarily difficult to compute."

Let f(n) be any computable function that carries natural numbers into natural numbers. Such functions can get big very quickly indeed. For example, consider the function n^n^···^n in which there are n^n^···^n exponents. There are infinite sets of natural numbers such that, no matter how the computer is programmed, at least f(n) seconds will pass before the computer emits all those elements of the set that are less than or equal to n. Of course, a finite number of exceptions are possible, for any finite part of an infinite set can be computed very quickly by including in the computer's program a table of the first few elements of the set. Note that the difficulty in computing such sets of natural numbers does not lie in the fact that their elements get very big very quickly, for even small elements of such sets require more than astronomical amounts of time to be computed. What is more, there are infinite sets of natural numbers that are arbitrarily difficult to compute and include 90 percent of the natural numbers.
We finally exhibit infinite sets of natural numbers that are arbitrarily difficult to compute, and do not have any infinite subsets essentially easier to compute than the whole set. Consider the following tree of natural numbers (figure 2).⁹ The infinite sets of natural numbers that we promised to exhibit are obtained by starting at the root of the tree (that is, at 0) and walking forward, including in the set every natural number that is stepped on.

⁹ This tree is used in Rogers [6], p. 158, in connection with retraceable sets. Retraceable sets are in some ways analogous to those sets that concern us here.
[Figure 2. The tree of natural numbers: each number n branches to 2n + 1 and 2n + 2.]

                7...
            3
                8...
        1
                9...
            4
                10...
    0
                11...
            5
                12...
        2
                13...
            6
                14...
Acknowledgment
The author wishes to express his gratitude to Prof. G. Pollitzer of the University of Buenos Aires, whose constructive criticism much improved the clarity of this presentation.
References
[1] M. Davis, Ed., The Undecidable. Hewlett, N.Y.: Raven Press, 1965.

[2] M. Davis, Computability and Unsolvability. New York: McGraw-Hill, 1958.

[3] M. Davis, "Unsolvable problems: A review," Proc. Symp. on Mathematical Theory of Automata. Brooklyn, N.Y.: Polytech. Inst. Brooklyn Press, 1963, pp. 15-22.

[4] M. Davis, "Applications of recursive function theory to number theory," Proc. Symp. in Pure Mathematics, vol. 5. Providence, R.I.: AMS, 1962, pp. 135-138.

[5] M. Minsky, Computation: Finite and Infinite Machines. Englewood Cliffs, N.J.: Prentice-Hall, 1967.

[6] H. Rogers, Jr., Theory of Recursive Functions and Effective Computability. New York: McGraw-Hill, 1967.

[7] M. A. Arbib, Theories of Abstract Automata. Englewood Cliffs, N.J.: Prentice-Hall (to be published).

[8] R. J. Solomonoff, "A formal theory of inductive inference," Inform. and Control, vol. 7, pp. 1-22, March 1964; pp. 224-254, June 1964.

[9] A. N. Kolmogorov, "Three approaches to the definition of the concept `quantity of information'," Probl. Peredachi Inform., vol. 1, pp. 3-11, 1965.

[10] A. N. Kolmogorov, Foundations of the Theory of Probability. New York: Chelsea, 1950.

[11] G. J. Chaitin, "On the length of programs for computing finite binary sequences," J. ACM, vol. 13, pp. 547-569, October 1966.

[12] G. J. Chaitin, "On the length of programs for computing finite binary sequences: statistical considerations," J. ACM, vol. 16, pp. 145-159, January 1969.

[13] G. J. Chaitin, "On the simplicity and speed of programs for computing infinite sets of natural numbers," J. ACM, vol. 16, pp. 407-422, July 1969.

[14] P. Martin-Löf, "The definition of random sequences," Inform. and Control, vol. 9, pp. 602-619, December 1966.

[15] C. E. Shannon and W. Weaver, The Mathematical Theory of Communication. Urbana, Ill.: University of Illinois Press, 1949.

[16] M. Blum, "A machine-independent theory of the complexity of recursive functions," J. ACM, vol. 14, pp. 322-336, April 1967.

[17] M. A. Arbib and M. Blum, "Machine dependence of degrees of difficulty," Proc. AMS, vol. 16, pp. 442-447, June 1965.

[18] G. H. Hardy and E. M. Wright, An Introduction to the Theory of Numbers. Oxford: Oxford University Press, 1962.

The following references have come to the author's attention since this lecture was given.

[19] D. G. Willis, "Computational complexity and probability constructions," Stanford University, Stanford, Calif., March 1969.

[20] A. N. Kolmogorov, "Logical basis for information theory and probability theory," IEEE Trans. Information Theory, vol. IT-14, pp. 662-664, September 1968.

[21] D. W. Loveland, "A variant of the Kolmogorov concept of complexity," Dept. of Math., Carnegie-Mellon University, Pittsburgh, Pa., Rept. 69-4.

[22] P. R. Young, "Toward a theory of enumerations," J. ACM, vol. 16, pp. 328-348, April 1969.

[23] D. E. Knuth, The Art of Computer Programming, vol. 2: Seminumerical Algorithms. Reading, Mass.: Addison-Wesley, 1969.

[24] 1969 Conf. Rec. of the ACM Symp. on Theory of Computing (Marina del Rey, Calif.).
INFORMATION-
THEORETIC
COMPUTATIONAL
COMPLEXITY
Invited Paper
IEEE Transactions on Information Theory
IT-20 (1974), pp. 10-15
Gregory J. Chaitin¹
Abstract
This paper attempts to describe, in nontechnical language, some of the concepts and methods of one school of thought regarding computational complexity. It applies the viewpoint of information theory to computers. This will first lead us to a definition of the degree of randomness of individual binary strings, and then to an information-theoretic version of Gödel's theorem on the limitations of the axiomatic method. Finally, we will examine in the light of these ideas the scientific method and von Neumann's views on the basic conceptual problems of biology.
Appendix
In this Appendix we try to give a more detailed idea of how the results concerning formal axiom systems that were stated are established.⁶

Two basic mathematical concepts that are employed are the concepts of a recursive function and a partial recursive function. A function is recursive if there is an algorithm for calculating its value when one is given the value of its arguments, in other words, if there is a Turing machine for doing this. If it is possible that this algorithm never terminates and the function is thus undefined for some values of its arguments, then the function is called partial recursive.⁷

In what follows we are concerned with computations involving binary strings. The binary strings are considered to be ordered in the following manner: Λ, 0, 1, 00, 01, 10, 11, 000, 001, 010, … The natural number n is represented by the nth binary string (n = 0, 1, 2, …). The length of a binary string s is denoted lg(s). Thus if s is considered to be a natural number, then lg(s) = [log2(s + 1)]. Here [x] is the greatest integer ≤ x.
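This ordering and length function are easy to realize. A minimal sketch: writing n + 1 in binary and dropping the leading 1 yields exactly the nth string, which makes lg(s) = [log2(s + 1)] immediate.

from math import floor, log2

def nth_string(n: int) -> str:
    # The n-th binary string in the ordering Λ, 0, 1, 00, 01, 10, 11, ...
    return bin(n + 1)[3:]         # strip "0b" and the leading 1

def lg(n: int) -> int:
    return floor(log2(n + 1))     # lg(s) = [log2(s + 1)]

assert [nth_string(n) for n in range(7)] == ["", "0", "1", "00", "01", "10", "11"]
assert all(len(nth_string(n)) == lg(n) for n in range(1000))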
Definition 1. A computer is a partial recursive function C(p). Its argument p is a binary string. The value of C(p) is the binary string output by the computer C when it is given the program p. If C(p) is undefined, this means that running the program p on C produces an unending computation.

Definition 2. The complexity I_C(s) of a binary string s is defined to be the length of the shortest program p that makes the computer C output s, i.e.,

I_C(s) = min_{C(p)=s} lg(p).

If no program makes C output s, then I_C(s) is defined to be infinite.

⁵ Chandrasekaran and Reeker [15] discuss the relevance of complexity to artificial intelligence.
⁶ See [11], [12] for different approaches.
⁷ Full treatments of these concepts can be found in standard texts, e.g., Rogers [16].
Definition 3. A computer U is universal if for any computer C and any binary string s, I_U(s) ≤ I_C(s) + c, where the constant c depends only on C.

It is easy to see that there are universal computers. For example, consider the computer U such that U(0^i 1 p) = C_i(p), where C_i is the ith computer, i.e., a program for U consists of two parts: the left-hand part indicates which computer is to be simulated, and the right-hand part gives the program to be simulated. We now suppose that some particular universal computer U has been chosen as the standard one for measuring complexities, and shall henceforth write I(s) instead of I_U(s).
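A sketch of this two-part construction, with a stub enumeration standing in for the C_i (the real enumeration of all computers is assumed, not shown):

COMPUTERS = [
    lambda p: p,          # C_0: the identity computer
    lambda p: p + p,      # C_1: a toy machine that doubles its program
]

def U(program: str):
    # U(0^i 1 p) = C_i(p): leading 0's select the machine, "1" separates,
    # and the remainder is the program to be simulated.
    i = 0
    while i < len(program) and program[i] == "0":
        i += 1
    if i == len(program):
        return None       # no separator: U(program) is undefined
    return COMPUTERS[i](program[i + 1:])

assert U("1101") == "101"    # i = 0: identity, at a cost of 1 extra bit
assert U("0110") == "1010"   # i = 1: doubling machine, 2 extra bits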
Definition 4. The rules of inference of a class of formal axiom systems is a recursive function F(a, h) (a a binary string, h a natural number) with the property that F(a, h) ⊆ F(a, h + 1). The value of F(a, h) is the finite (possibly empty) set of theorems that can be proven from the axioms a by means of proofs ≤ h characters in length. F(a) = ∪_h F(a, h) is the set of theorems that are consequences of the axioms a. The ordered pair ⟨F, a⟩, which implies both the choice of rules of inference and axioms, is a particular formal axiom system.

This is a fairly abstract definition, but it retains all those features of formal axiom systems that we need. Note that although one may not be interested in some axioms (e.g., if they are false or incomprehensible), it is stipulated that F(a, h) is always defined.
Theorem 1. a) There is a constant c such that I(s) ≤ lg(s) + c for all binary strings s. b) There are less than 2^n binary strings of complexity less than n.

Proof of a). There is a computer C such that C(p) = p for all programs p. Thus for all binary strings s, I(s) ≤ I_C(s) + c = lg(s) + c.

Proof of b). As there are less than 2^n programs of length less than n, there must be less than this number of binary strings of complexity less than n. Q.E.D.
Thesis. A random binary string s is one having the property that I(s) ≈ lg(s).
Theorem 2. Consider the rules of inference F. Suppose that a proposition of the form "I(s) ≥ n" is in F(a) only if it is true, i.e., only if I(s) ≥ n. Then a proposition of the form "I(s) ≥ n" is in F(a) only if n ≤ lg(a) + c, where c is a constant that depends only on F.
Proof. Consider that binary string s_k having the shortest proof from the axioms a that it is of complexity > lg(a) + 2k. We claim that I(s_k) ≤ lg(a) + k + c', where c' depends only on F. Taking k = c', we conclude that the binary string s_c' with the shortest proof from the axioms a that it is of complexity > lg(a) + 2c' is itself of complexity ≤ lg(a) + 2c', which is impossible. It follows that for k = c' no such proof exists, i.e., no proposition of the form "I(s) ≥ n" with n > lg(a) + 2c' is in F(a). Q.E.D.
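The search implicit in this proof can be sketched as a short program (a Berry-paradox machine). The theorem enumerator F(a, h) and the textual theorem format "I(s) >= n" assumed below are hypothetical stand-ins; the point is that this program's input is only the axioms a and the number k, roughly lg(a) + k bits for a suitable encoding of k, yet its output is a string whose proved complexity exceeds lg(a) + 2k.

from itertools import count
import re

def first_high_complexity_string(a, k, F):
    # search proofs in size order for the first theorem asserting
    # that some string has complexity greater than lg(a) + 2k
    target = len(a) + 2 * k
    for h in count(1):
        for theorem in F(a, h):
            m = re.match(r"I\((.*)\) >= (\d+)$", theorem)
            if m and int(m.group(2)) > target:
                return m.group(1)    # this is the string s_k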
Definition 5. A_n is defined to be the kth binary string of length n, where k is the number of programs p of length < n for which U(p) is defined, i.e., A_n has n and this number k coded into it.
Theorem 3. There are rules of inference F¹ such that for all n, F¹(A_n) is the union of the set of all true propositions of the form "I(s) = k" with k < n and the set of all true propositions of the form "I(s) ≥ n."
Proof. From A_n one knows n and for how many programs p of length < n U(p) is defined. One then simulates in parallel, running each program p of length < n on U until one has determined the value of U(p) for each p of length < n for which U(p) is defined. Knowing the value of U(p) for each p of length < n for which U(p) is defined, one easily determines each string of complexity < n and its complexity. What's more, all other strings must be of complexity ≥ n. This completes our sketch of how all true propositions of the form "I(s) = k" with k < n and of the form "I(s) ≥ n" can be derived from the axiom A_n. Q.E.D.
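The parallel simulation in this proof is the classic dovetailing technique. A minimal Python sketch, assuming a hypothetical step-bounded interpreter run(p, steps) that returns U's output on p, or None if U has not halted within the given number of steps:

from itertools import product

def strings_shorter_than(n):
    for length in range(n):
        for bits in product("01", repeat=length):
            yield "".join(bits)

def complexities_below(n, k, run):
    # A_n encodes n and k = number of programs of length < n that
    # halt; k tells the dovetailer when it has seen every halter
    halted = {}
    steps = 1
    while len(halted) < k:
        steps *= 2
        for p in strings_shorter_than(n):
            if p not in halted:
                out = run(p, steps)
                if out is not None:
                    halted[p] = out
    # I(s) = length of the shortest halting program with output s;
    # every string absent from this table has I(s) >= n
    complexity = {}
    for p, s in halted.items():
        complexity[s] = min(complexity.get(s, n), len(p))
    return complexity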
Recall that we consider the nth binary string to be the natural
number n.
Definition 6. The partial function B(n) is defined to be the biggest natural number of complexity ≤ n, i.e.,

B(n) = max { k : I(k) ≤ n } = max { U(p) : lg(p) ≤ n }.
Theorem 4. Let f be a partial recursive function that carries natural numbers into natural numbers. Then B(n) ≥ f(n) for all sufficiently great values of n.

Proof. Consider the computer C such that C(p) = f(p) for all p. Then

I(f(n)) ≤ I_C(f(n)) + c ≤ lg(n) + c = [log2(n + 1)] + c < n

for all sufficiently great values of n. Thus B(n) ≥ f(n) for all sufficiently great values of n. Q.E.D.
Theorem 5. Consider the rules of inference F. Let

F_n = ∪_a F(a, B(n)),

where the union is taken over all binary strings a of length ≤ B(n), i.e., F_n is the (finite) set of all theorems that can be deduced by means of proofs with not more than B(n) characters from axioms with not more than B(n) bits. Let s_n be the first binary string s not in any proposition of the form "I(s) = k" in F_n. Then I(s_n) ≤ n + c, where the constant c depends only on F.

Proof. We claim that there is a computer C such that if U(p) = B(n), then C(p) = s_n. As, by the definition of B, there is a p_0 of length ≤ n such that U(p_0) = B(n), it follows that I(s_n) ≤ I_C(s_n) + c ≤ lg(p_0) + c ≤ n + c. Q.E.D.
References
[1] J. van Heijenoort, Ed., From Frege to Gödel: A Source Book in Mathematical Logic, 1879–1931. Cambridge, Mass.: Harvard Univ. Press, 1967.
[2] M. Davis, Ed., The Undecidable: Basic Papers on Undecidable Propositions, Unsolvable Problems and Computable Functions. Hewlett, N.Y.: Raven Press, 1965.
[3] J. von Neumann and O. Morgenstern, Theory of Games and Economic Behavior. Princeton, N.J.: Princeton Univ. Press, 1944.
[4] —, "Method in the physical sciences," in John von Neumann: Collected Works. New York: Macmillan, 1963, vol. 6, no. 35.
[5] —, The Computer and the Brain. New Haven, Conn.: Yale Univ. Press, 1958.
[6] —, Theory of Self-Reproducing Automata. Urbana, Ill.: Univ. Illinois Press, 1966. (Edited and completed by A. W. Burks.)
[7] R. J. Solomonoff, "A formal theory of inductive inference," Inform. Contr., vol. 7, pp. 1–22, Mar. 1964; also, pp. 224–254, June 1964.
[8] A. N. Kolmogorov, "Logical basis for information theory and probability theory," IEEE Trans. Inform. Theory, vol. IT-14, pp. 662–664, Sept. 1968.
[9] G. J. Chaitin, "On the difficulty of computations," IEEE Trans. Inform. Theory, vol. IT-16, pp. 5–9, Jan. 1970.
[10] —, "To a mathematical definition of 'life'," ACM SICACT News, no. 4, pp. 12–18, Jan. 1970.
[11] —, "Computational complexity and Gödel's incompleteness theorem," (Abstract) AMS Notices, vol. 17, p. 672, June 1970; (Paper) ACM SIGACT News, no. 9, pp. 11–12, Apr. 1971.
[12] —, "Information-theoretic limitations of formal systems," presented at the Courant Institute Computational Complexity Symp., N.Y., Oct. 1971. A revised version will appear in J. Ass. Comput. Mach.
[13] M. Kac, Statistical Independence in Probability, Analysis, and Number Theory, Carus Math. Mono., Mathematical Association of America, no. 12, 1959.
[14] M. Eigen, "Selforganization of matter and the evolution of biological macromolecules," Die Naturwissenschaften, vol. 58, pp. 465–523, Oct. 1971.
[15] B. Chandrasekaran and L. H. Reeker, "Artificial intelligence: a case for agnosticism," Ohio State University, Columbus, Ohio, Rep. OSU-CISRC-TR-72-9, Aug. 1972; also, IEEE Trans. Syst., Man, Cybern., vol. SMC-4, pp. 88–94, Jan. 1974.
[16] H. Rogers, Jr., Theory of Recursive Functions and Effective Computability. New York: McGraw-Hill, 1967.
ALGORITHMIC
INFORMATION THEORY
Encyclopedia of Statistical Sciences, Volume 1, Wiley, New York, 1982, pp. 38–41
General References
[1] Chaitin, G. J. (1975). Sci. Amer., 232 (5), 47–52. (An introduction to algorithmic information theory emphasizing the meaning of the basic concepts.)
[2] Chaitin, G. J. (1977). IBM J. Res. Dev., 21, 350–359, 496. (A survey of algorithmic information theory.)
[3] Davis, M., ed. (1965). The Undecidable: Basic Papers on Undecidable Propositions, Unsolvable Problems and Computable Functions. Raven Press, New York.
[4] Davis, M. (1978). In Mathematics Today: Twelve Informal Essays. L. A. Steen, ed. Springer-Verlag, New York, pp. 241–267. (An introduction to algorithmic information theory largely devoted to a detailed presentation of the relevant background in computability theory and mathematical logic.)
[5] Fine, T. L. (1973). Theories of Probability: An Examination of Foundations. Academic Press, New York. (A survey of the remarkably diverse proposals that have been made for formulating probability mathematically. Caution: The material on algorithmic information theory contains some inaccuracies, and it is also somewhat dated as a result of recent rapid progress in this field.)
[6] Gardner, M. (1979). Sci. Amer., 241 (5), 20–34. (An introduction to algorithmic information theory emphasizing the fundamental role played by Ω.)
[7] Heijenoort, J. van, ed. (1977). From Frege to Gödel: A Source Book in Mathematical Logic, 1879–1931. Harvard University Press, Cambridge, Mass. (This book and ref. 3 comprise a stimulating collection of all the classic papers on computability theory and mathematical logic.)
[8] Hofstadter, D. R. (1979). Gödel, Escher, Bach: An Eternal Golden Braid. Basic Books, New York. (The longest and most lucid introduction to computability theory and mathematical logic.)
[9] Shannon, C. E. and Weaver, W. (1949). The Mathematical Theory of Communication. University of Illinois Press, Urbana, Ill. (The first and still one of the very best books on classical information theory.)
Additional References
[10] Chaitin, G. J. (1966). J. ACM, 13, 547–569; 16, 145–159 (1969).
[11] Chaitin, G. J. (1974). IEEE Trans. Inf. Theory, IT-20, 10–15.
[12] Chaitin, G. J. (1975). J. ACM, 22, 329–340.
[13] Chaitin, G. J. (1979). In The Maximum Entropy Formalism, R. D. Levine and M. Tribus, eds. MIT Press, Cambridge, Mass., pp. 477–498.
[14] Chaitin, G. J. and Schwartz, J. T. (1978). Commun. Pure Appl. Math., 31, 521–527.
[15] Church, A. (1940). Bull. AMS, 46, 130–135.
[16] Feistel, H. (1973). Sci. Amer., 228 (5), 15–23.
[17] Gács, P. (1974). Sov. Math. Dokl., 15, 1477–1480.
[18] Kac, M. (1959). Statistical Independence in Probability, Analysis and Number Theory. Mathematical Association of America, Washington, D.C.
[19] Kolmogorov, A. N. (1965). Problems of Inf. Transmission, 1, 1–7.
[20] Levin, L. A. (1974). Problems of Inf. Transmission, 10, 206–210.
[21] Martin-Löf, P. (1966). Inf. Control, 9, 602–619.
[22] Solomonoff, R. J. (1964). Inf. Control, 7, 1–22, 224–254.
(Entropy
Information Theory
Martingales
Monte Carlo Methods
Pseudo-Random Number Generators
Statistical Independence
Tests of Randomness)
G. J. Chaitin
ALGORITHMIC
INFORMATION THEORY
IBM Journal of Research and Development
21 (1977), pp. 350–359, 496
G. J. Chaitin
Abstract
This paper reviews algorithmic information theory, which is an attempt to apply information-theoretic and probabilistic ideas to recursive function theory. Typical concerns in this approach are, for example, the number of bits of information required to specify an algorithm, or the probability that a program whose bits are chosen by coin flipping produces a given output. During the past few years the definitions of algorithmic information theory have been reformulated. The basic features of the new formalism are presented here and certain results of R. M. Solovay are reported.
Historical Introduction
To our knowledge, the first publication of the ideas of algorithmic information theory was the description of R. J. Solomonoff's ideas given in 1962 by M. L. Minsky in his paper, "Problems of formulation for artificial intelligence" [1]:
Consider a slightly different form of inductive inference problem. Suppose that we are given a very long "data" sequence of symbols; the problem is to make a prediction about the future of the sequence. This is a problem familiar in discussion concerning "inductive probability." The problem is refreshed a little, perhaps, by introducing the modern notion of universal computer and its associated language of instruction formulas. An instruction sequence will be considered acceptable if it causes the computer to produce a sequence, perhaps infinite, that begins with the given finite "data" sequence. Each acceptable instruction sequence thus makes a prediction, and Occam's razor would choose the simplest such sequence and advocate its prediction. (More generally, one could weight the different predictions by weights associated with the simplicities of the instructions.) If the simplicity function is just the length of the instructions, we are then trying to find a minimal description, i.e., an optimally efficient encoding of the data sequence.

Such an induction method could be of interest only if one could show some significant invariance with respect to choice of defining universal machine. There is no such invariance for a fixed pair of data strings. For one could design a machine which would yield the entire first string with a very small input, and the second string only for some very complex input. On the brighter side, one can see that in a sense the induced structure on the space of data strings has some invariance in an "in the large" or "almost everywhere" sense. Given two different universal machines, the induced structures cannot be desperately different. We appeal to the "translation theorem" whereby an arbitrary instruction formula for one machine may be converted into an equivalent instruction formula for the other machine by the addition of a constant prefix text. This text instructs the second machine to simulate the behavior of the first machine in operating on the remainder of the input text. Then for data strings much larger than this translation text (and its inverse) the choice between the two machines cannot greatly affect the induced structure. It would be interesting to see if these intuitive notions could be profitably formalized.

Even if this theory can be worked out, it is likely that it will present overwhelming computational difficulties in application. The recognition problem for minimal descriptions is, in general, unsolvable, and a practical induction machine will have to use heuristic methods. [In this connection it would be interesting to write a program to play R. Abbott's inductive card game [2].]
Algorithmic information theory originated in the independent work of Solomonoff (see [1, 3–6]), of A. N. Kolmogorov and P. Martin-Löf (see [7–14]), and of G. J. Chaitin (see [15–26]). Whereas Solomonoff weighted together all the programs for a given result into a probability measure, Kolmogorov and Chaitin concentrated their attention on the size of the smallest program. Recently it has been realized by Chaitin and independently by L. A. Levin that if programs are stipulated to be self-delimiting, these two differing approaches become essentially equivalent. This paper attempts to cast into a unified scheme the recent work in this area by Chaitin [23, 24] and by R. M. Solovay [27, 28]. The reader may also find it interesting to examine the parallel efforts of Levin (see [29–35]). There has been a substantial amount of other work in this general area, often involving variants of the definitions deemed more suitable for particular applications (see, e.g., [36–47]).
Algorithmic Information Theory of Finite Computations [23]
Definitions
Let us start by considering a class of Turing machines with the following characteristics. Each Turing machine has three tapes: a program tape, a work tape, and an output tape. There is a scanning head on each of the three tapes. The program tape is read-only and each of its squares contains a 0 or a 1. It may be shifted in only one direction. The work tape may be shifted in either direction and may be read and erased, and each of its squares contains a blank, a 0, or a 1. The work tape is initially blank. The output tape may be shifted in only one direction. Its squares are initially blank, and may have a 0, a 1, or a comma written on them, and cannot be rewritten. Each Turing machine of this type has a finite number n of states, and is defined by an n × 3 table, which gives the action to be performed and the next state as a function of the current state and the contents of the square of the work tape that is currently being scanned. The first state in this table is by convention the initial state. There are eleven possible actions: halt, shift work tape left/right, write blank/0/1 on work tape, read square of program tape currently being scanned and copy onto square of work tape currently being scanned and then shift program tape, write 0/1/comma on output tape and then shift output tape, and consult oracle. The oracle is included for the purpose of defining relative concepts. It enables the Turing machine to choose between two possible state transitions, depending on whether or not the binary string currently being scanned on the work tape is in a certain set, which for now we shall take to be the null set.
From each Turing machine M of this type we define a probability P, an entropy H, and a complexity I. P(s) is the probability that M eventually halts with the string s written on its output tape if each square of the program tape is filled with a 0 or a 1 by a separate toss of an unbiased coin. By "string" we shall always mean a finite binary string. From the probability P(s) we obtain the entropy H(s) by taking the negative base-two logarithm, i.e., H(s) is -log2 P(s). A string p is said to be a program if when it is written on M's program tape and M starts computing scanning the first bit of p, then M eventually halts after reading all of p and without reading any other squares of the tape. A program p is said to be a minimal program if no other program makes M produce the same output and has a smaller size. And finally the complexity I(s) is defined to be the least n such that for some contents of its program tape M eventually halts with s written on the output tape after reading precisely n squares of the program tape; i.e., I(s) is the size of a minimal program for s. To summarize, P is the probability that M calculates s given a random program, H is -log2 P, and I is the minimum number of bits required to specify an algorithm for M to calculate s.
It is important to note that blanks are not allowed on the program tape, which is imagined to be entirely filled with 0's and 1's. Thus programs are not followed by endmarker blanks. This forces them to be self-delimiting; a program must indicate within itself what size it has. Thus no program can be a prefix of another one, and the programs for M form what is known as a prefix-free set or an instantaneous code. This has two very important effects: It enables a natural probability distribution to be defined on the set of programs, and it makes it possible for programs to be built up from subroutines by concatenation. Both of these desirable features are lost if blanks are used as program endmarkers. This occurs because there is no natural probability distribution on programs with endmarkers; one, of course, makes all programs of the same size equiprobable, but it is also necessary to specify in some arbitrary manner the probability of each particular size. Moreover, if two subroutines with blanks as endmarkers are concatenated, it is necessary to include additional information indicating where the first one ends and the second one begins.
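The natural probability distribution mentioned here rests on the Kraft inequality, which is easy to check on a small scale. A Python sketch of ours (not from the paper), using the simple self-delimiting encoding 0^n 1 s that also appears in the example machine below:

from itertools import product

def is_prefix_free(codes):
    # no code may be a proper prefix of another
    return not any(a != b and b.startswith(a) for a in codes for b in codes)

programs = ["0" * n + "1" + "".join(bits)
            for n in range(6)
            for bits in product("01", repeat=n)]

assert is_prefix_free(programs)
print(sum(2.0 ** -len(p) for p in programs))   # 0.984375 <= 1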
Here is an example of a specific Turing machine M of the above type. M counts the number n of 0's up to the first 1 it encounters on its program tape, then transcribes the next n bits of the program tape onto the output tape, and finally halts. So M outputs s iff it finds length(s) 0's followed by a 1 followed by s on its program tape. Thus P(s) = exp2[-2 length(s) - 1], H(s) = 2 length(s) + 1, and I(s) = 2 length(s) + 1. Here exp2[x] is the base-two exponential function 2^x. Clearly this is a very special-purpose computer which embodies a very limited class of algorithms and yields uninteresting functions P, H, and I.
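Here is a minimal Python sketch of M (ours; it ignores the three-tape formalism and simply reads program bits left to right), together with a rough Monte Carlo check of P; the step bound and trial count are arbitrary choices.

import random

def M(program):
    # count the 0's up to the first 1, then copy that many more bits
    i = 0
    while i < len(program) and program[i] == "0":
        i += 1
    if i == len(program):
        return None                    # no 1: M reads forever
    n = i
    data = program[i + 1 : i + 1 + n]
    return data if len(data) == n else None

assert M("0" * 3 + "1" + "101") == "101"   # the program 0^3 1 101

def M_on_coin_flips(max_reads=200):
    # feed M program bits by coin tossing, as in the definition of P
    n = 0
    while random.random() < 0.5:
        n += 1
        if n > max_reads:
            return None
    return "".join(random.choice("01") for _ in range(n))

trials = 100_000
freq = sum(M_on_coin_flips() == "1" for _ in range(trials)) / trials
print(freq)    # near P("1") = exp2[-3] = 0.125, as the text predicts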
On the other hand it is easy to see that there are "general-purpose" Turing machines that maximize P and minimize H and I; in fact, consider those universal Turing machines which will simulate an arbitrary Turing machine if a suitable prefix indicating the machine to simulate is added to its programs. Such Turing machines yield essentially the same P, H, and I. We therefore pick, somewhat arbitrarily, a particular one of these, U, and the definitive definition of P, H, and I is given in terms of it. The universal Turing machine U works as follows. If U finds i 0's followed by a 1 on its program tape, it simulates the computation that the ith Turing machine of the above type performs upon reading the remainder of the program tape. By the ith Turing machine we mean the one that comes ith in a list of all possible defining tables in which the tables are ordered by size (i.e., number of states) and lexicographically among those of the same size. With this choice of Turing machine, P, H, and I can be dignified with the following titles: P(s) is the algorithmic probability of s, H(s) is the algorithmic entropy of s, and I(s) is the algorithmic information of s. Following Solomonoff [3], P(s) and H(s) may also be called the a priori probability and entropy of s. I(s) may also be termed the descriptive, program-size, or information-theoretic complexity of s. And since P is maximal and H and I are minimal, the above choice of special-purpose Turing machine shows that P(s) ≥ exp2[-2 length(s) - O(1)], H(s) ≤ 2 length(s) + O(1), and I(s) ≤ 2 length(s) + O(1).
We have defined P(s), H(s), and I(s) for individual strings s. It is also convenient to consider computations which produce finite sequences of strings. These are separated by commas on the output tape. One thus defines the joint probability P(s_1, ..., s_n), the joint entropy H(s_1, ..., s_n), and the joint complexity I(s_1, ..., s_n) of an n-tuple s_1, ..., s_n. Finally one defines the conditional probability P(t_1, ..., t_m | s_1, ..., s_n) of the m-tuple t_1, ..., t_m given the n-tuple s_1, ..., s_n to be the quotient of the joint probability of the n-tuple and the m-tuple divided by the joint probability of the n-tuple. In particular P(t|s) is defined to be P(s, t)/P(s). And of course the conditional entropy is defined to be the negative base-two logarithm of the conditional probability. Thus by definition H(s, t) = H(s) + H(t|s). Finally, in order to extend the above definitions to tuples whose members may either be strings or natural numbers, we identify the natural number n with its binary expansion.
Basic Relationships
We now review some basic properties of these concepts. The relation

H(s, t) = H(t, s) + O(1)

states that the probability of computing the pair s, t is essentially the same as the probability of computing the pair t, s. This is true because there is a prefix that converts any program for one of these pairs into a program for the other one. The inequality

H(s) ≤ H(s, t) + O(1)

states that the probability of computing s is not less than the probability of computing the pair s, t. This is true because a program for s can be obtained from any program for the pair s, t by adding a fixed prefix to it. The inequality

H(s, t) ≤ H(s) + H(t) + O(1)

states that the probability of computing the pair s, t is not less than the product of the probabilities of computing s and t, and follows from the fact that programs are self-delimiting and can be concatenated. The inequality

O(1) ≤ H(t|s) ≤ H(t) + O(1)
is merely a restatement of the previous two properties. However, in view of the direct relationship between conditional entropy and relative complexity indicated below, this inequality also states that being told something by an oracle cannot make it more difficult to obtain t. The relationship between entropy and complexity is

H(s) = I(s) + O(1),

i.e., the probability of computing s is essentially the same as 1/exp2[the size of a minimal program for s]. This implies that a significant fraction of the probability of computing s is contributed by its minimal programs, and that there are few minimal or near-minimal programs for a given result. The relationship between conditional entropy and relative complexity is

H(t|s) = I_s(t) + O(1).

Here I_s(t) denotes the complexity of t relative to a set having a single element which is a minimal program for s. In other words,

I(s, t) = I(s) + I_s(t) + O(1).

This relation states that one obtains what is essentially a minimal program for the pair s, t by concatenating the following two subroutines: a minimal program for s, and a minimal program for calculating t using a minimal program for s.
Algorithmic Randomness
Consider an arbitrary string s of length n. From the fact that

H(n) + H(s|n) = H(n, s) = H(s) + O(1)

it is easy to show that H(s) ≤ n + H(n) + O(1), and that less than exp2[n - k + O(1)] of the s of length n satisfy H(s) < n + H(n) - k. It follows that for most s of length n, H(s) is approximately equal to n + H(n). These are the most complex strings of length n, the ones which are most difficult to specify, the ones with highest entropy, and they are said to be the algorithmically random strings of length n. Thus a typical string s of length n will have H(s) close to n + H(n), whereas if s has pattern or can be distinguished in some fashion, then it can be compressed or coded into a program that is considerably smaller. That H(s) is usually n + H(n) can be thought of as follows: In order to specify a typical string s of length n, it is necessary to first specify its size n, which requires H(n) bits, and it is necessary then to specify each of the n bits in s, which requires n more bits and brings the total to n + H(n). In probabilistic terms this can be stated as follows: the sum of the probabilities of all the strings of length n is essentially equal to P(n), and most strings s of length n have probability P(s) essentially equal to P(n)/2^n. On the other hand, one of the strings of length n that is least random and that has most pattern is the string consisting entirely of 0's. It is easy to see that this string has entropy H(n) + O(1) and probability essentially equal to P(n), which is another way of saying that almost all the information in it is in its length. Here is an example in the middle: If p is a minimal program of size n, then it is easy to see that H(p) = n + O(1) and P(p) is essentially 2^-n. Finally it should be pointed out that since H(s) = H(n) + H(s|n) + O(1) if s is of length n, the above definition of randomness is equivalent to saying that the most random strings of length n have H(s|n) close to n, while the least random ones have H(s|n) close to 0.
Later we shall show that even though most strings are algorithmically random, i.e., have nearly as much entropy as possible, an inherent limitation of formal axiomatic theories is that a lower bound n on the entropy of a specific string can be established only if n is less than the entropy of the axioms of the formal theory. In other words, it is possible to prove that a specific object is of complexity greater than n only if n is less than the complexity of the axioms being employed in the demonstration. These statements may be considered to be an information-theoretic version of Gödel's famous incompleteness theorem.
Now let us turn from finite random strings to infinite ones, or equivalently, by invoking the correspondence between a real number and its dyadic expansion, to random reals. Consider an infinite string X obtained by flipping an unbiased coin, or equivalently a real x uniformly distributed in the unit interval. From the preceding considerations and the Borel–Cantelli lemma it is easy to see that with probability one there is a c such that H(X_n) > n - c for all n, where X_n denotes the first n bits of X, that is, the first n bits of the dyadic expansion of x. We take this property to be our definition of an algorithmically random infinite string X or real x.
Algorithmic randomness is a clear-cut property for infinite strings, but in the case of finite strings it is a matter of degree. If a cutoff were to be chosen, however, it would be well to place it at about the point at which H(s) is equal to length(s). Then an infinite random string could be defined to be one for which all initial segments are finite random strings, within a certain tolerance.
Now consider the real number Ω defined as the halting probability of the universal Turing machine U that we used to define P, H, and I; i.e., Ω is the probability that U eventually halts if each square of its program tape is filled with a 0 or a 1 by a separate toss of an unbiased coin. Then it is not difficult to see that Ω is in fact an algorithmically random real, because if one were given the first n bits of the dyadic expansion of Ω, then one could use this to tell whether each program for U of size less than n ever halts or not. In other words, when written in binary the probability of halting Ω is a random or incompressible infinite string. Thus the basic theorem of recursive function theory that the halting problem is unsolvable corresponds in algorithmic information theory to the theorem that the probability of halting is algorithmically random if the program is chosen by coin flipping.
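The argument just given can be written out algorithmically. In the following Python sketch, halts(p, steps) is a hypothetical step-bounded interpreter for U whose valid programs are self-delimiting; given the first n bits of Omega, the dovetailer decides halting for every program of size less than n.

from fractions import Fraction
from itertools import count, product

def halting_programs_below(n, omega_bits, halts):
    # omega_n = 0.b_1...b_n, the n-bit truncation of Omega
    omega_n = Fraction(int(omega_bits[:n], 2), 2 ** n)
    halted = set()
    lower = Fraction(0)
    for stage in count(1):
        for length in range(1, stage + 1):     # dovetail all sizes
            for bits in product("01", repeat=length):
                p = "".join(bits)
                if p not in halted and halts(p, stage):
                    halted.add(p)
                    lower += Fraction(1, 2 ** length)
        if lower >= omega_n:
            # Omega < omega_n + 2^-n, so no program of size < n that
            # has not yet halted can ever halt: it would contribute at
            # least 2^-(n-1) and push Omega past its own value
            return {p for p in halted if len(p) < n}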
This concludes our review of the most basic facts regarding the probability, entropy, and complexity of finite objects, namely strings and tuples of strings. Before presenting some of Solovay's remarkable results regarding these concepts, and in particular regarding Ω, we would like to review the most important facts which are known regarding the probability, entropy, and complexity of infinite objects, namely recursively enumerable sets of strings.
There are exp2[n - H(n) + O(1)] singleton sets A with I(A) < n.
There are exp2[n - H(L_n) + O(log H(L_n))] sets A with I(A) < n.
There are exp2[n - H'(L_n) + O(log H'(L_n))] sets A with H(A) < n.

Here L_n is the set of natural numbers less than n, and H' is the entropy relative to the halting problem; if U is provided with an oracle for the halting problem instead of one for the null set, then the probability, entropy, and complexity measures one obtains are P', H', and I' instead of P, H, and I. Two final results:

I'(A, the complement of A) ≤ H(A) + O(1)
Acknowledgments
The quotation by M. L. Minsky in the first section is reprinted with the kind permission of the publisher American Mathematical Society from Mathematical Problems in the Biological Sciences, Proceedings of Symposia in Applied Mathematics XIV, pp. 42–43, copyright © 1962. We are grateful to R. M. Solovay for permitting us to include several of his unpublished results in the section entitled "More advanced results." The quotation by M. Gardner in the section on algorithmic information theory and metamathematics is reprinted with his kind permission, and the quotation by B. Russell in that section is reprinted with permission of the Johns Hopkins University Press. We are grateful to C. H. Bennett for permitting us to present his notion of logical depth in print for the first time in the section on algorithmic information theory and biology.
References
[1] M. L. Minsky, "Problems of Formulation for Artificial Intelligence," Mathematical Problems in the Biological Sciences, Proceedings of Symposia in Applied Mathematics XIV, R. E. Bellman, ed., American Mathematical Society, Providence, RI, 1962, p. 35.
[2] M. Gardner, "An Inductive Card Game," Sci. Amer. 200, No. 6, 160 (1959).
[3] R. J. Solomonoff, "A Formal Theory of Inductive Inference," Info. Control 7, 1, 224 (1964).
[4] D. G. Willis, "Computational Complexity and Probability Constructions," J. ACM 17, 241 (1970).
[5] T. M. Cover, "Universal Gambling Schemes and the Complexity Measures of Kolmogorov and Chaitin," Statistics Department Report 12, Stanford University, CA, October, 1974.
[6] R. J. Solomonoff, "Complexity Based Induction Systems: Comparisons and Convergence Theorems," Report RR-329, Rockford Research, Cambridge, MA, August, 1976.
[7] A. N. Kolmogorov, "On Tables of Random Numbers," Sankhya A25, 369 (1963).
[8] A. N. Kolmogorov, "Three Approaches to the Quantitative Definition of Information," Prob. Info. Transmission 1, No. 1, 1 (1965).
[9] A. N. Kolmogorov, "Logical Basis for Information Theory and Probability Theory," IEEE Trans. Info. Theor. IT-14, 662 (1968).
[10] P. Martin-Löf, "The Definition of Random Sequences," Info. Control 9, 602 (1966).
[11] P. Martin-Löf, "Algorithms and Randomness," Intl. Stat. Rev. 37, 265 (1969).
[12] P. Martin-Löf, "The Literature on von Mises' Kollektivs Revisited," Theoria 35, Part 1, 12 (1969).
[13] P. Martin-Löf, "On the Notion of Randomness," Intuitionism and Proof Theory, A. Kino, J. Myhill, and R. E. Vesley, eds., North-Holland Publishing Co., Amsterdam, 1970, p. 73.
[14] P. Martin-Löf, "Complexity Oscillations in Infinite Binary Sequences," Z. Wahrscheinlichk. verwand. Geb. 19, 225 (1971).
[15] G. J. Chaitin, "On the Length of Programs for Computing Finite Binary Sequences," J. ACM 13, 547 (1966).
[16] G. J. Chaitin, "On the Length of Programs for Computing Finite Binary Sequences: Statistical Considerations," J. ACM 16, 145 (1969).
[17] G. J. Chaitin, "On the Simplicity and Speed of Programs for Computing Infinite Sets of Natural Numbers," J. ACM 16, 407 (1969).
[18] G. J. Chaitin, "On the Difficulty of Computations," IEEE Trans. Info. Theor. IT-16, 5 (1970).
[19] G. J. Chaitin, "To a Mathematical Definition of 'Life'," ACM SICACT News 4, 12 (1970).
[20] G. J. Chaitin, "Information-theoretic Limitations of Formal Systems," J. ACM 21, 403 (1974).
[21] G. J. Chaitin, "Information-theoretic Computational Complexity," IEEE Trans. Info. Theor. IT-20, 10 (1974).
[22] G. J. Chaitin, "Randomness and Mathematical Proof," Sci. Amer. 232, No. 5, 47 (1975). (Also published in the Japanese and Italian editions of Sci. Amer.)
[23] G. J. Chaitin, "A Theory of Program Size Formally Identical to Information Theory," J. ACM 22, 329 (1975).
[24] G. J. Chaitin, "Algorithmic Entropy of Sets," Comput. & Math. Appls. 2, 233 (1976).
[25] G. J. Chaitin, "Information-theoretic Characterizations of Recursive Infinite Strings," Theoret. Comput. Sci. 2, 45 (1976).
[26] G. J. Chaitin, "Program Size, Oracles, and the Jump Operation," Osaka J. Math., to be published in Vol. 14, No. 1, 1977.
[27] R. M. Solovay, "Draft of a paper... on Chaitin's work... done for the most part during the period of Sept.–Dec. 1974," unpublished manuscript, IBM Thomas J. Watson Research Center, Yorktown Heights, NY, May, 1975.
[28] R. M. Solovay, "On Random R. E. Sets," Proceedings of the Third Latin American Symposium on Mathematical Logic, Campinas, Brazil, July, 1976. To be published.
[29] A. K. Zvonkin and L. A. Levin, "The Complexity of Finite Objects and the Development of the Concepts of Information and Randomness by Means of the Theory of Algorithms," Russ. Math. Surv. 25, No. 6, 83 (1970).
[30] L. A. Levin, "On the Notion of a Random Sequence," Soviet Math. Dokl. 14, 1413 (1973).
[31] P. Gács, "On the Symmetry of Algorithmic Information," Soviet Math. Dokl. 15, 1477 (1974). "Corrections," Soviet Math. Dokl. 15, No. 6, v (1974).
[32] L. A. Levin, "Laws of Information Conservation (Nongrowth) and Aspects of the Foundation of Probability Theory," Prob. Info. Transmission 10, 206 (1974).
[33] L. A. Levin, "Uniform Tests of Randomness," Soviet Math. Dokl. 17, 337 (1976).
[34] L. A. Levin, "Various Measures of Complexity for Finite Objects (Axiomatic Description)," Soviet Math. Dokl. 17, 522 (1976).
[35] L. A. Levin, "On the Principle of Conservation of Information in Intuitionistic Mathematics," Soviet Math. Dokl. 17, 601 (1976).
[36] D. E. Knuth, Seminumerical Algorithms. The Art of Computer Programming, Volume 2, Addison-Wesley Publishing Co., Inc., Reading, MA, 1969. See Ch. 2, "Random Numbers," p. 1.
[37] D. W. Loveland, "A Variant of the Kolmogorov Concept of Complexity," Info. Control 15, 510 (1969).
[38] T. L. Fine, Theories of Probability: An Examination of Foundations, Academic Press, Inc., New York, 1973. See Ch. V, "Computational Complexity, Random Sequences, and Probability," p. 118.
[39] J. T. Schwartz, On Programming: An Interim Report on the SETL Project. Installment I: Generalities, Lecture Notes, Courant Institute of Mathematical Sciences, New York University, 1973. See Item 1, "On the Sources of Difficulty in Programming," p. 1, and Item 2, "A Second General Reflection on Programming," p. 12.
[40] T. Kamae, "On Kolmogorov's Complexity and Information," Osaka J. Math. 10, 305 (1973).
[41] C. P. Schnorr, "Process Complexity and Effective Random Tests," J. Comput. Syst. Sci. 7, 376 (1973).
[42] M. E. Hellman, "The Information Theoretic Approach to Cryptography," Information Systems Laboratory, Center for Systems Research, Stanford University, April, 1974.
[43] W. L. Gewirtz, "Investigations in the Theory of Descriptive Complexity," Courant Computer Science Report 5, Courant Institute of Mathematical Sciences, New York University, October, 1974.
[44] R. P. Daley, "Minimal-program Complexity of Pseudo-recursive and Pseudo-random Sequences," Math. Syst. Theor. 9, 83 (1975).
[45] R. P. Daley, "Noncomplex Sequences: Characterizations and Examples," J. Symbol. Logic 41, 626 (1976).
[46] J. Gruska, "Descriptional Complexity (of Languages): A Short Survey," Mathematical Foundations of Computer Science 1976, A. Mazurkiewicz, ed., Lecture Notes in Computer Science 45, Springer-Verlag, Berlin, 1976, p. 65.
[47] J. Ziv, "Coding Theorems for Individual Sequences," undated manuscript, Bell Laboratories, Murray Hill, NJ.
[48] R. M. Solovay, "A Model of Set-theory in which Every Set of Reals is Lebesgue Measurable," Ann. Math. 92, 1 (1970).
[49] R. Solovay and V. Strassen, "A Fast Monte-Carlo Test for Primality," SIAM J. Comput. 6, 84 (1977).
[50] G. H. Hardy, A Course of Pure Mathematics, Tenth edition, Cambridge University Press, London, 1952. See Section 218, "Logarithmic Tests of Convergence for Series and Integrals," p. 417.
[51] M. Gardner, "A Collection of Tantalizing Fallacies of Mathematics," Sci. Amer. 198, No. 1, 92 (1958).
[52] B. Russell, "Mathematical Logic as Based on the Theory of Types," From Frege to Gödel: A Source Book in Mathematical Logic, 1879–1931, J. van Heijenoort, ed., Harvard University Press, Cambridge, MA, 1967, p. 153; reprinted from Amer. J. Math. 30, 222 (1908).
[53] M. Levin, "Mathematical Logic for Computer Scientists," MIT Project MAC TR-131, June, 1974, pp. 145, 153.
[54] J. von Neumann, Theory of Self-reproducing Automata, University of Illinois Press, Urbana, 1966; edited and completed by A. W. Burks.
[55] C. H. Bennett, "On the Thermodynamics of Computation," undated manuscript, IBM Thomas J. Watson Research Center, Yorktown Heights, NY.
[56] C. H. Bennett, "Logical Reversibility of Computation," IBM J. Res. Develop. 17, 525 (1973).
GÖDEL'S THEOREM AND
INFORMATION
International Journal of Theoretical
Physics 22 (1982), pp. 941–954
Gregory J. Chaitin
IBM Research, P.O. Box 218
Yorktown Heights, New York 10598
Abstract
Gödel's theorem may be demonstrated using arguments having an information-theoretic flavor. In such an approach it is possible to argue that if a theorem contains more information than a given set of axioms, then it is impossible for the theorem to be derived from the axioms. In contrast with the traditional proof based on the paradox of the liar, this new viewpoint suggests that the incompleteness phenomenon discovered by Gödel is natural and widespread rather than pathological and unusual.
1. Introduction
To set the stage, let us listen to Hermann Weyl (1946), as quoted by Eric Temple Bell (1951):

We are less certain than ever about the ultimate foundations of (logic and) mathematics. Like everybody and everything in the world today, we have our "crisis." We have had it for nearly fifty years. Outwardly it does not seem to hamper our daily work, and yet I for one confess that it has had a considerable practical influence on my mathematical life: it directed my interests to fields I considered relatively "safe," and has been a constant drain on the enthusiasm and determination with which I pursued my research work. This experience is probably shared by other mathematicians who are not indifferent to what their scientific endeavors mean in the context of man's whole caring and knowing, suffering and creative existence in the world.
And these are the words of John von Neumann (1963):

... there have been within the experience of people now living at least three serious crises... There have been two such crises in physics, namely, the conceptual soul-searching connected with the discovery of relativity and the conceptual difficulties connected with discoveries in quantum theory... The third crisis was in mathematics. It was a very serious conceptual crisis, dealing with rigor and the proper way to carry out a correct mathematical proof. In view of the earlier notions of the absolute rigor of mathematics, it is surprising that such a thing could have happened, and even more surprising that it could have happened in these latter days when miracles are not supposed to take place. Yet it did happen.
At the time of its discovery, Kurt Gödel's incompleteness theorem was a great shock and caused much uncertainty and depression among mathematicians sensitive to foundational issues, since it seemed to pull the rug out from under mathematical certainty, objectivity, and rigor. Also, its proof was considered to be extremely difficult and recondite. With the passage of time the situation has been reversed. A great many different proofs of Gödel's theorem are now known, and the result is now considered easy to prove and almost obvious: It is equivalent to the unsolvability of the halting problem, or alternatively to the assertion that there is an r.e. (recursively enumerable) set that is not recursive. And it has had no lasting impact on the daily lives of mathematicians or on their working habits; no one loses sleep over it any more.
Gödel's original proof constructed a paradoxical assertion that is true but not provable within the usual formalizations of number theory. In contrast I would like to measure the power of a set of axioms and rules of inference. I would like to be able to say that if one has ten pounds of axioms and a twenty-pound theorem, then that theorem cannot be derived from those axioms. And I will argue that this approach to Gödel's theorem does suggest a change in the daily habits of mathematicians, and that Gödel's theorem cannot be shrugged away.
To be more specific, I will apply the viewpoint of thermodynamics and statistical mechanics to Gödel's theorem, and will use such concepts as probability, randomness, entropy, and information to study the incompleteness phenomenon and to attempt to evaluate how widespread it is. On the basis of this analysis, I will suggest that mathematics is perhaps more akin to physics than mathematicians have been willing to admit, and that perhaps a more flexible attitude with respect to adopting new axioms and methods of reasoning is the proper response to Gödel's theorem. Probabilistic proofs of primality via sampling (Chaitin and Schwartz, 1978) also suggest that the sources of mathematical truth are wider than usually thought. Perhaps number theory should be pursued more openly in the spirit of experimental science (Pólya, 1959)!
I am indebted to John McCarthy and especially to Jacob Schwartz for making me realize that Gödel's theorem is not an obstacle to a practical AI (artificial intelligence) system based on formal logic. Such an AI would take the form of an intelligent proof checker. Gottfried Wilhelm Leibniz and David Hilbert's dream that disputes could be settled with the words "Gentlemen, let us compute!" and that mathematics could be formalized, should still be a topic for active research. Even though mathematicians and logicians have erroneously dropped this train of thought, dissuaded by Gödel's theorem, great advances have in fact been made "covertly," under the banner of computer science, LISP, and AI (Cole et al., 1981; Dewar et al., 1981; Levin, 1974; Wilf, 1982).
To speak in metaphors from Douglas Hofstadter (1979), we shall now stroll through an art gallery of proofs of Gödel's theorem, to the tune of Moussorgsky's pictures at an exhibition! Let us start with some traditional proofs (Davis, 1978; Hofstadter, 1979; Levin, 1974; Post, 1965).
References
Let me give a few pointers to the literature. The following are my previous publications on Gödel's theorem: Chaitin, 1974a, 1974b, 1975a, 1977, 1982; Chaitin and Schwartz, 1978. Related publications by other authors include Davis, 1978; Gardner, 1979; Hofstadter, 1979; Levin, 1974; Post, 1965. For discussions of the epistemology of mathematics and science, see Einstein, 1944, 1954; Feynman, 1965; Gödel, 1964; Pólya, 1959; von Neumann, 1956, 1963; Taub, 1961; Weyl, 1946, 1949.
Bell, E. T. (1951). Mathematics, Queen and Servant of Science, McGraw-Hill, New York.

Bennett, C. H. (1982). The thermodynamics of computation: a review, International Journal of Theoretical Physics, 21, 905–940.

Chaitin, G. J. (1974a). Information-theoretic computational complexity, IEEE Transactions on Information Theory, IT-20, 10–15.

Chaitin, G. J. (1974b). Information-theoretic limitations of formal systems, Journal of the ACM, 21, 403–424.

Chaitin, G. J. (1975a). Randomness and mathematical proof, Scientific American, 232 (5) (May 1975), 47–52. (Also published in the French, Japanese, and Italian editions of Scientific American.)

Chaitin, G. J. (1975b). A theory of program size formally identical to information theory, Journal of the ACM, 22, 329–340.

Chaitin, G. J. (1977). Algorithmic information theory, IBM Journal of Research and Development, 21, 350–359, 496.

Chaitin, G. J., and Schwartz, J. T. (1978). A note on Monte Carlo primality tests and algorithmic information theory, Communications on Pure and Applied Mathematics, 31, 521–527.

Chaitin, G. J. (1979). Toward a mathematical definition of "life," in The Maximum Entropy Formalism, R. D. Levine and M. Tribus (eds.), MIT Press, Cambridge, Massachusetts, pp. 477–498.

Chaitin, G. J. (1982). Algorithmic information theory, Encyclopedia of Statistical Sciences, Vol. 1, Wiley, New York, pp. 38–41.

Cole, C. A., Wolfram, S., et al. (1981). SMP: a symbolic manipulation program, California Institute of Technology, Pasadena, California.

Courant, R., and Robbins, H. (1941). What is Mathematics?, Oxford University Press, London.
Davis, M., Matijasevič, Y., and Robinson, J. (1976). Hilbert's tenth problem. Diophantine equations: positive aspects of a negative solution, in Mathematical Developments Arising from Hilbert Problems, Proceedings of Symposia in Pure Mathematics, Vol. XXVII, American Mathematical Society, Providence, Rhode Island, pp. 323–378.

Davis, M. (1978). What is a computation?, in Mathematics Today: Twelve Informal Essays, L. A. Steen (ed.), Springer-Verlag, New York, pp. 241–267.

Dewar, R. B. K., Schonberg, E., and Schwartz, J. T. (1981). Higher Level Programming: Introduction to the Use of the Set-Theoretic Programming Language SETL, Courant Institute of Mathematical Sciences, New York University, New York.

Eigen, M., and Winkler, R. (1981). Laws of the Game, Knopf, New York.

Einstein, A. (1944). Remarks on Bertrand Russell's theory of knowledge, in The Philosophy of Bertrand Russell, P. A. Schilpp (ed.), Northwestern University, Evanston, Illinois, pp. 277–291.

Einstein, A. (1954). Ideas and Opinions, Crown, New York, pp. 18–24.

Feynman, R. (1965). The Character of Physical Law, MIT Press, Cambridge, Massachusetts.

Gardner, M. (1979). The random number Ω bids fair to hold the mysteries of the universe, Mathematical Games Dept., Scientific American, 241 (5) (November 1979), 20–34.

Gödel, K. (1964). Russell's mathematical logic, and What is Cantor's continuum problem?, in Philosophy of Mathematics, P. Benacerraf and H. Putnam (eds.), Prentice-Hall, Englewood Cliffs, New Jersey, pp. 211–232, 258–273.

Hofstadter, D. R. (1979). Gödel, Escher, Bach: an Eternal Golden Braid, Basic Books, New York.
Levin, M. (1974). Mathematical Logic for Computer Scientists, MIT Project MAC report MAC TR-131, Cambridge, Massachusetts.

Pólya, G. (1959). Heuristic reasoning in the theory of numbers, American Mathematical Monthly, 66, 375–384.

Post, E. (1965). Recursively enumerable sets of positive integers and their decision problems, in The Undecidable: Basic Papers on Undecidable Propositions, Unsolvable Problems and Computable Functions, M. Davis (ed.), Raven Press, Hewlett, New York, pp. 305–337.

Russell, B. (1967). Mathematical logic as based on the theory of types, in From Frege to Gödel: A Source Book in Mathematical Logic, 1879–1931, J. van Heijenoort (ed.), Harvard University Press, Cambridge, Massachusetts, pp. 150–182.

Taub, A. H. (ed.) (1961). J. von Neumann: Collected Works, Vol. I, Pergamon Press, New York, pp. 1–9.

von Neumann, J. (1956). The mathematician, in The World of Mathematics, Vol. 4, J. R. Newman (ed.), Simon and Schuster, New York, pp. 2053–2063.

von Neumann, J. (1963). The role of mathematics in the sciences and in society, and Method in the physical sciences, in J. von Neumann: Collected Works, Vol. VI, A. H. Taub (ed.), Macmillan, New York, pp. 477–498.

von Neumann, J. (1966). Theory of Self-Reproducing Automata, A. W. Burks (ed.), University of Illinois Press, Urbana, Illinois.

Weyl, H. (1946). Mathematics and logic, American Mathematical Monthly, 53, 1–13.

Weyl, H. (1949). Philosophy of Mathematics and Natural Science, Princeton University Press, Princeton, New Jersey.

Wilf, H. S. (1982). The disk with the college education, American Mathematical Monthly, 89, 4–8.
RANDOMNESS AND
GÖDEL'S THEOREM

G. J. Chaitin
IBM Research Division
Abstract
Complexity, non-predictability and randomness not only occur in quantum mechanics and non-linear dynamics, they also occur in pure mathematics and shed new light on the limitations of the axiomatic method. In particular, we discuss a Diophantine equation exhibiting randomness, and how it yields a proof of Gödel's incompleteness theorem.
Our view of the physical world has certainly changed radically during the past hundred years, as unpredictability, randomness and complexity have replaced the comfortable world of classical physics. Amazingly enough, the same thing has occurred in the world of pure mathematics, in fact, in number theory, a branch of mathematics that is concerned with the properties of the positive integers. How can an uncertainty principle apply to number theory, which has been called the queen of mathematics, and is a discipline that goes back to the ancient Greeks and is concerned with such things as the primes and their properties?
Following Davis (1982), consider an equation of the form

P(x, n, y_1, ..., y_m) = 0,

where P is a polynomial with integer coefficients, and x, n, m, y_1, ..., y_m are positive integers. Here n is to be regarded as a parameter, and for each value of n we are interested in the set D_n of those values of x for which there exist y_1 to y_m such that P = 0. Thus a particular polynomial P with integer coefficients in m + 2 variables serves to define a set D_n of values of x as a function of the choice of the parameter n. The study of equations of this sort goes back to the ancient Greeks, and the particular type of equation we have described is called a polynomial Diophantine equation.
One of the most remarkable mathematical results of this century has been the discovery that there is a "universal" polynomial P such that by varying the parameter n, the corresponding set D_n of solutions that is obtained can be any set of positive integers that can be generated by a computer program. In particular, there is a value of n such that the set of prime numbers is obtained. This immediately yields a prime-generating polynomial

x [1 - (P(x, n, y_1, ..., y_m))^2]

whose set of positive values, as the values of x and y_1 to y_m vary over all the positive integers, is precisely equal to the primes. This is a remarkable result that surely would have amazed Fermat and Euler, and it is obtained as a trivial corollary to a much more general theorem!
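The trick that converts solutions into positive values is simple enough to demonstrate with a toy polynomial in place of the enormous universal one; the following Python sketch is ours, not the paper's.

def positive_values(P, bound):
    # x * (1 - P(...)**2) equals x > 0 exactly when P(...) = 0, and is
    # <= 0 otherwise, since P**2 >= 1 for a nonzero integer P
    vals = set()
    for x in range(1, bound):
        for y in range(1, bound):
            v = x * (1 - P(x, y) ** 2)
            if v > 0:
                vals.add(v)
    return vals

# Toy example: P(x, y) = x - 2y has a solution exactly when x is even,
# so the positive values are the even numbers below the search bound.
print(sorted(positive_values(lambda x, y: x - 2 * y, 20)))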
The proof that there is such a universal P may be regarded as the culmination of Gödel's original proof of his famous incompleteness theorem. In thinking about P, it is helpful to regard the parameter n as the Gödel number of a computer program, and to regard the set of solutions x as the output of this computer program, and to think of the auxiliary variables y_1 to y_m as a kind of multidimensional time variable. In other words,

P(x, n, y_1, ..., y_m) = 0

if and only if the nth computer program outputs the positive integer x at time (y_1, ..., y_m).
Let us prove Gödel's incompleteness theorem by making use of this universal polynomial P and Cantor's famous diagonal method, which Cantor originally used to prove that the real numbers are more numerous than the integers. Recall that D_n denotes the set of positive integers x for which there exist positive integers y_1 to y_m such that P = 0. I.e.,

D_n = { x : (∃ y_1, ..., y_m) [P(x, n, y_1, ..., y_m) = 0] }.

Consider the "diagonal" set

V = { n : n ∉ D_n }

of all those positive integers n that are not contained in the corresponding set D_n. It is easy to see that V cannot be generated by a computer program, because V differs from the set generated by the nth computer program regarding the membership of n. It follows that there can be no algorithm for deciding, given n, whether or not the equation

P(n, n, y_1, ..., y_m) = 0

has a solution. And if there cannot be an algorithm for deciding if this equation has a solution, no fixed system of axioms and rules of inference can permit one to prove whether or not it has a solution. For if there were a formal axiomatic theory for proving whether or not there is a solution, given any particular value of n one could in principle use this formal theory to decide if there is a solution, by searching through all possible proofs within the formal theory in size order, until a proof is found one way or another. It follows that no single set of axioms and rules of inference suffices to enable one to prove whether or not a polynomial Diophantine equation has a solution. This is a version of Gödel's incompleteness theorem.
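The diagonal disagreement at the heart of this proof fits in a few lines of Python (a sketch of ours; the enumeration D below is a hypothetical stand-in for "the set generated by the nth computer program").

def in_V(n, D):
    # V = { n : n not in D_n } classifies n oppositely to D_n
    return n not in D(n)

# Stand-in enumeration: D_n = the positive multiples of n + 1.
D = lambda n: {k * (n + 1) for k in range(1, 1000)}
for n in range(10):
    assert in_V(n, D) != (n in D(n))   # V and D_n always differ at n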
What does this have to do with randomness, uncertainty and unpredictability? The point is that the solvability or unsolvability of the equation

P(n, n, y_1, ..., y_m) = 0

in positive integers is in a sense mathematically uncertain and jumps around unpredictably as the parameter n varies. In fact, it is possible to construct another polynomial P' with integer coefficients for which the situation is much more dramatic.
Instead of asking whether P' = 0 can be solved, consider the question of whether or not there are infinitely many solutions. Let D'_n be the set of positive integers x such that

P'(x, n, y_1, ..., y_m) = 0

has a solution. P' has the remarkable property that the truth or falsity of the assertion that the set D'_n is infinite is completely random. Indeed, this infinite sequence of true/false values is indistinguishable from the result of successive independent tosses of an unbiased coin. In other words, the truth or falsity of each of these assertions is an independent mathematical fact with probability one-half! These independent facts cannot be compressed into a smaller amount of information, i.e., they are irreducible mathematical information. In order to be able to prove whether or not D'_n is infinite for the first k values of the parameter n, one needs at least k bits of axioms and rules of inference, i.e., the formal theory must be based on at least k independent choices between equally likely alternative assumptions. In other words, a system of axioms and rules of inference, considered as a computer program for generating theorems, must be at least k bits in size if it enables one to prove whether or not D'_n is infinite for n = 1, 2, 3, ..., k.
This is a dramatic extension of Gödel's theorem. Number theory, the queen of mathematics, is infected with uncertainty and randomness! Simple properties of Diophantine equations escape the power of any particular formal axiomatic theory! To mathematicians, accustomed as they often are to believe that mathematics offers absolute certainty, this may appear to be a serious blow. Mathematicians often deride the non-rigorous reasoning used by physicists, but perhaps they have something to learn from them. Physicists know that new experiments, new domains of experience, often require fundamentally new physical principles. They have a more pragmatic attitude to truth than mathematicians do. Perhaps mathematicians should acquire some of this flexibility from their colleagues in the physical sciences!
Appendix
Let me say a few words about where P' comes from. P' is closely related to the fascinating random real number which I like to call Ω. Ω is defined to be the halting probability of a universal Turing machine when its program is chosen by coin tossing, more precisely, when a program n bits in size has probability 2^-n [see Gardner (1979)]. One could in principle try running larger and larger programs for longer and longer amounts of time on the universal Turing machine. Thus if a program ever halts, one would eventually discover this; if the program is n bits in size, this would contribute 2^-n more to the total halting probability Ω. Hence Ω can be obtained as the limit from below of a computable sequence r_1, r_2, r_3, ... of rational numbers:

Ω = lim_{k→∞} r_k;
this sequence converges very slowly, in fact, in a certain sense, as slowly
as possible. The polynomial P 0 is constructed from the sequence rk by
using the theorem that \a set of tuples of positive integers is Diophan-
tine if and only if it is recursively enumerable" see Davis (1982)]: the
equation
P 0(k n y1 : : : ym) = 0
has a solution if and only if the nth bit of the base-two expansion of rk
is a \1". Thus Dn0 , the set of x such that
P 0(x n y1 : : : ym) = 0
has a solution, is innite if and only if the nth bit of the base-two
expansion of $ is a \1". Knowing whether or not Dn0 is innite for
n = 1 2 3 : : : k is therefore equivalent to knowing the rst k bits of
$.
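To make the dovetailing just described concrete, here is a small sketch (ours, not the construction in the text) of computing the lower bounds $r_1 \leq r_2 \leq \ldots$ for the halting probability of a deliberately tiny toy machine. A program is a string of 2-bit opcodes acting on a single counter, and the first occurrence of the halt opcode ends the program, so the set of valid programs is prefix-free; everything that makes the real $\Omega$ interesting depends on the machine being genuinely universal, which this toy is not.

    # Toy machine: opcodes 00 = increment, 01 = decrement, 10 = jump to
    # the start if the counter is nonzero, 11 = halt. A valid program's
    # first 11 is its last opcode, so the set of programs is prefix-free.
    from itertools import product

    def runs_to_halt(ops, max_steps):
        counter, pc = 0, 0
        for _ in range(max_steps):
            op = ops[pc]
            if op == '00':
                counter += 1; pc += 1
            elif op == '01':
                counter = max(0, counter - 1); pc += 1
            elif op == '10':
                pc = 0 if counter else pc + 1
            else:
                return True          # reached the final halt opcode
        return False                 # out of time (it may halt later)

    def r(stage):
        # Run every valid program of at most `stage` opcodes for `stage`
        # steps; each halted program of 2k bits contributes 2^(-2k).
        total = 0.0
        for k in range(1, stage + 1):
            for body in product(('00', '01', '10'), repeat=k - 1):
                if runs_to_halt(list(body) + ['11'], stage):
                    total += 2.0 ** (-2 * k)
        return total

    for t in (2, 4, 6, 8, 10):
        print(t, r(t))               # a nondecreasing sequence of bounds

Each $r_t$ only ever grows, and for this toy machine the sequence converges quickly; the text's point is that for a universal machine the convergence is as slow as a computable sequence can possibly be.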
References
G. J. Chaitin (1975), "Randomness and mathematical proof,"
Scientific American 232 (5), pp. 47–52.
M. Davis (1978), "What is a computation?", Mathematics Today:
Twelve Informal Essays, L. A. Steen, Springer-Verlag, New York,
pp. 241–267.
D. R. Hofstadter (1979), Gödel, Escher, Bach: an Eternal Golden
Braid, Basic Books, New York.
M. Gardner (1979), "The random number $\Omega$ bids fair to hold the
mysteries of the universe," Mathematical Games Dept., Scientific
American 241 (5), pp. 20–34.
G. J. Chaitin (1982), "Gödel's theorem and information," Inter-
national Journal of Theoretical Physics 22, pp. 941–954.
M. Davis (1982), "Hilbert's Tenth Problem is Unsolvable," Com-
putability & Unsolvability, Dover, New York, pp. 199–235.
AN ALGEBRAIC
EQUATION FOR THE
HALTING PROBABILITY
In R. Herken, The Universal Turing Ma-
chine, Oxford University Press, 1988, pp.
279–283
Gregory J. Chaitin
Abstract
We outline our construction of a single equation involving only addi-
tion, multiplication, and exponentiation of non-negative integer con-
stants and variables with the following remarkable property. One of
the variables is considered to be a parameter. Take the parameter to
be $0, 1, 2, \ldots$, obtaining an infinite series of equations from the original
one. Consider the question of whether each of the derived equations has
finitely or infinitely many non-negative integer solutions. The original
equation is constructed in such a manner that the answers to these ques-
tions about the derived equations are independent mathematical facts
that cannot be compressed into any finite set of axioms. To produce
this equation, we start with a universal Turing machine in the form of
the Lisp universal function Eval written as a register machine program
about 300 lines long. Then we "compile" this register machine program
into a universal exponential Diophantine equation. The resulting equa-
tion is about 200 pages long and has about 17,000 variables. Finally, we
substitute for the program variable in the universal Diophantine equa-
tion the Gödel number of a Lisp program for $\Omega$, the halting probability
of a universal Turing machine if n-bit programs have measure $2^{-n}$. Full
details appear in a book.1
More than half a century has passed since the famous papers of Gödel
(1931) and Turing (1936) that shed so much light on the foundations
of mathematics, and that simultaneously promulgated mathematical
formalisms for specifying algorithms, in one case via primitive recursive
function definitions, and in the other case via Turing machines. The
development of computer hardware and software technology during this
period has been phenomenal, and as a result we now know much better
how to do the high-level functional programming of Gödel, and how
to do the low-level machine language programming found in Turing's
paper. And we can actually run our programs on machines and debug
them, which Gödel and Turing could not do.
I believe that the best way to actually program a universal Tur-
ing machine is John McCarthy's universal function Eval. In 1960
McCarthy proposed Lisp as a new mathematical foundation for the
theory of computation (McCarthy 1960). But by a quirk of fate Lisp
has largely been ignored by theoreticians and has instead become the
standard programming language for work on artificial intelligence. I
believe that pure Lisp is in precisely the same role in computational
mathematics that set theory is in theoretical mathematics, in that it
1 This article is the introduction of the book G. J. Chaitin, Algorithmic Infor-
mation Theory, copyright (c) 1987 by Cambridge University Press, and is reprinted
by permission.
provides a beautifully elegant and extremely powerful formalism which
enables concepts such as that of numbers and functions to be defined
from a handful of more primitive notions.
Simultaneously there have been profound theoretical advances.
Gödel and Turing's fundamental undecidable proposition, the ques-
tion of whether an algorithm ever halts, is equivalent to the question
of whether it ever produces any output. In another paper (Chaitin
1987a) I have shown that much more devastating undecidable proposi-
tions arise if one asks whether an algorithm produces an infinite amount
of output or not.
Gödel expended much effort to express his undecidable proposition
as an arithmetical fact. Here too there has been considerable progress.
In my opinion the most beautiful proof is the recent one of Jones and
Matijasevič (1984), based on three simple ideas:
1. the observation that $11^0 = 1$, $11^1 = 11$, $11^2 = 121$, $11^3 = 1331$,
$11^4 = 14641$ reproduces Pascal's triangle, makes it possible to
express binomial coefficients as the digits of powers of 11 written
in high enough bases;

2. an appreciation of E. Lucas's hundred-year-old remarkable theo-
rem that the binomial coefficient $\binom{n}{k}$ is odd if and only if each
bit in the base-two numeral for k implies the corresponding bit
in the base-two numeral for n (both observations are illustrated
in the sketch following this list);

3. the idea of using register machines rather than Turing machines,
and of encoding computational histories via variables which are
vectors giving the contents of a register as a function of time.
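Both number-theoretic observations are easy to check by machine; the following sketch (ours, not from the paper) verifies them for small n. The base $10^6$ plays the role of a "high enough base": the digits of $(b+1)^n$ written in base b are exactly $\binom{n}{0}, \ldots, \binom{n}{n}$ as long as b exceeds all of them, and the Lucas test is the bitwise condition stated in point 2.

    from math import comb

    def binomials_from_power(n, base=10**6):
        # Digits of (base+1)^n written in the given base, least
        # significant first: these are C(n,0), C(n,1), ..., C(n,n)
        # whenever base exceeds every coefficient.
        digits, x = [], (base + 1) ** n
        while x:
            digits.append(x % base)
            x //= base
        return digits

    def lucas_odd(n, k):
        # Lucas: C(n,k) is odd iff every 1-bit of k is also a 1-bit of n.
        return (n & k) == k

    for n in range(8):
        assert binomials_from_power(n) == [comb(n, k) for k in range(n + 1)]
        assert all(lucas_odd(n, k) == (comb(n, k) % 2 == 1)
                   for k in range(n + 1))
    print("both observations verified for n < 8")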
Their work gives a simple straightforward proof, using almost no
number theory, that there is an exponential Diophantine equation with
one parameter p which has a solution if and only if the pth computer
program (i.e., the program with Gödel number p) ever halts. Similarly,
one can use their method to arithmetize my undecidable proposition.
The result is an exponential Diophantine equation with the parameter
n and the property that it has infinitely many solutions if and only if the
nth bit of $\Omega$ is a 1. Here $\Omega$ is the halting probability of a universal Turing
machine if an n-bit program has measure $2^{-n}$ (Chaitin 1986a, 1986b).
$\Omega$ is an algorithmically random real number in the sense that the first
N bits of the base-two expansion of $\Omega$ cannot be compressed into a
program shorter than N bits, from which it follows that the successive
bits of $\Omega$ cannot be distinguished from the result of independent tosses
of a fair coin. It can also be shown that an N-bit program cannot
calculate the positions and values of more than N scattered bits of $\Omega$,
not just the first N bits (Chaitin 1987a). This implies that there are
exponential Diophantine equations with one parameter n which have
the property that no formal axiomatic theory can enable one to settle
whether the number of solutions of the equation is finite or infinite for
more than a finite number of values of the parameter n.
What is gained by asking if there are infinitely many solutions rather
than whether or not a solution exists? The question of whether or
not an exponential Diophantine equation has a solution is in general
undecidable, but the answers to such questions are not independent.
Indeed, if one considers such an equation with one parameter k, and
asks whether or not there is a solution for $k = 0, 1, 2, \ldots, N-1$, the
N answers to these N questions really only constitute $\log_2 N$ bits of
information. The reason for this is that we can in principle determine
which equations have a solution if we know how many of them are
solvable, for the set of solutions and of solvable equations is r.e. On
the other hand, if we ask whether the number of solutions is finite
or infinite, then the answers can be independent, if the equation is
constructed properly.
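Here is a toy version (ours, not the paper's) of this counting argument. Take the equations $x^2 = k$ for $k = 0, \ldots, N-1$: searching for a solution halts exactly when k is a perfect square and runs forever otherwise, so solvability is semi-decidable. Yet the single number m of solvable equations, about $\log_2 N$ bits, lets us recover which ones they are by dovetailing all N searches until m of them have succeeded.

    def which_are_solvable(N, m):
        # m = how many of the equations x*x == k (k < N) have a solution;
        # dovetail the N searches until exactly m have succeeded.
        solvable, x = set(), 0
        while len(solvable) < m:
            for k in range(N):
                if k not in solvable and x * x == k:
                    solvable.add(k)
            x += 1                    # deepen every search by one step
        return solvable

    N = 20
    m = 5                             # the squares 0, 1, 4, 9, 16
    print(sorted(which_are_solvable(N, m)))

With genuinely undecidable exponential Diophantine equations the same dovetailing works in principle, which is exactly why the N answers carry only about $\log_2 N$ bits of information.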
In view of the philosophical impact of exhibiting an algebraic equa-
tion with the property that the number of solutions jumps from finite
to infinite at random as a parameter is varied, I have taken the trouble
of explicitly carrying out the construction outlined by Jones and Mati-
jasevič. That is to say, I have encoded the halting probability $\Omega$ into an
exponential Diophantine equation. To be able to actually do this, one
has to start with a program for calculating $\Omega$, and the only language I
can think of in which actually writing such a program would not be an
excruciating task is pure Lisp. It is in fact necessary to go beyond the
ideas of McCarthy in three fundamental ways (a toy illustration follows
this list):

1. First of all, we simplify Lisp by only allowing atoms to be one
character long. (This is similar to McCarthy's "linear Lisp.")
2. Secondly, Eval must not lose control by going into an infinite
loop. In other words, we need a safe Eval that can execute
garbage for a limited amount of time, and always results in an
error message or a valid value of an expression. This is similar
to the notion in modern operating systems that the supervisor
should be able to give a user task a time slice of CPU, and that
the supervisor should not abort if the user task has an abnormal
error termination.

3. Lastly, in order to program such a safe time-limited Eval, it
greatly simplifies matters if we stipulate "permissive" Lisp se-
mantics with the property that the only way a syntactically valid
Lisp expression can fail to have a value is if it loops forever. Thus,
for example, the head (Car) and tail (Cdr) of an atom is defined
to be the atom itself, and the value of an unbound variable is the
variable.
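The following sketch (ours, in Python rather than Lisp or register-machine code, and much cruder than the Eval described here) illustrates the three points: atoms evaluate to themselves, so an unbound variable is its own value; car and cdr of an atom are the atom itself; and a step budget makes evaluation total, returning the error value '?' instead of looping forever.

    def safe_eval(e, budget=200):
        # points 2 and 3: evaluation always returns, '?' meaning
        # "ran out of time"; permissive semantics everywhere else
        if budget <= 0:
            return '?'
        if isinstance(e, str):                  # an atom
            return e                            # unbound variable = itself
        op = e[0]
        if op == 'quote':
            return e[1]
        if op == 'car':
            v = safe_eval(e[1], budget - 1)
            return v[0] if isinstance(v, list) and v else v   # car of atom = atom
        if op == 'cdr':
            v = safe_eval(e[1], budget - 1)
            return v[1:] if isinstance(v, list) and v else v  # cdr of atom = atom
        if op == 'cons':
            head = safe_eval(e[1], budget - 1)
            tail = safe_eval(e[2], budget - 1)
            return [head] + (tail if isinstance(tail, list) else [tail])
        if op == 'loop':                        # deliberately divergent form
            return safe_eval(e, budget - 1)
        return '?'                              # unknown form: an error value

    print(safe_eval(['car', ['quote', ['a', 'b']]]))   # a
    print(safe_eval(['car', ['quote', 'x']]))          # x, permissively
    print(safe_eval(['loop']))                         # ?, caught by the budget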
Proceeding in this spirit, we have defined a class of abstract com-
puters which, as in Jones and Matijasevič's treatment, are register ma-
chines. However, our machine's finite set of registers each contain a
Lisp S-expression in the form of a character string with balanced left
and right parentheses to delimit the list structure. And we use a small
set of machine instructions, instructions for testing, moving, erasing,
and setting one character at a time. In order to be able to use subrou-
tines more effectively, we have also added an instruction for jumping
to a subroutine after putting into a register the return address, and an
indirect branch instruction for returning to the address contained in a
register. The complete register machine program for a safe time-limited
Lisp universal function (interpreter) Eval is about 300 instructions
long. To test this Lisp interpreter written for an abstract machine,
we have written in 370 machine language a register machine simulator.
We have also rewritten this Lisp interpreter directly in 370 machine
language, representing Lisp S-expressions by binary trees of pointers
rather than as character strings, in the standard manner used in prac-
tical Lisp implementations. We have then run a large suite of tests
through the very slow interpreter on the simulated register machine,
and also through the extremely fast 370 machine language interpreter,
in order to make sure that identical results are produced by both im-
plementations of the Lisp interpreter.
Our version of pure Lisp also has the property that in it we can write
a short program to calculate $\Omega$ in the limit from below. The program
for calculating $\Omega$ is only a few pages long, and by running it (on the
370 directly, not on the register machine!), we have obtained a lower
bound of 127/128ths for the particular definition of $\Omega$ we have chosen,
which depends on our choice of a self-delimiting universal computer.
The final step was to write a compiler that compiles a register ma-
chine program into an exponential Diophantine equation. This com-
piler consists of about 700 lines of code in a very nice and easy to
use programming language invented by Mike Cowlishaw called Rexx
(Cowlishaw 1985). Rexx is a pattern-matching string processing lan-
guage which is implemented by means of a very efficient interpreter.
It takes the compiler only a few minutes to convert the 300-line Lisp
interpreter into a 200-page 17,000-variable universal exponential Dio-
phantine equation. The resulting equation is a little large, but the ideas
used to produce it are simple and few, and the equation results from
the straightforward application of these ideas.
I have published the details of this adventure (but not the full equa-
tion!) as a book (Chaitin 1987b). My hope is that this book will con-
vince mathematicians that randomness not only occurs in non-linear
dynamics and quantum mechanics, but that it even happens in rather
elementary branches of number theory.
References
Chaitin, G.J.
1986a Randomness and Gödel's theorem. Mondes en Développe-
ment No. 54–55 (1986) 125–128.
1986b Information-theoretic computational complexity and Gödel's
theorem and information. In: New Directions in the Philos-
ophy of Mathematics, ed. T. Tymoczko. Boston: Birkhäuser
(1986).
1987a Incompleteness theorems for random reals. Adv. Appl. Math.
8 (1987) 119–146.
1987b Algorithmic Information Theory. Cambridge, England:
Cambridge University Press (1987).
Cowlishaw, M.F.
1985 The REXX Language. Englewood Cliffs, NJ: Prentice-Hall
(1985).
Gödel, K.
1931 On formally undecidable propositions of Principia mathe-
matica and related systems I. In: Kurt Gödel: Collected
Works, Volume I: Publications 1929–1936, ed. S. Feferman.
New York: Oxford University Press (1986).
Jones, J.P., and Y.V. Matijasevič
1984 Register machine proof of the theorem on exponential Dio-
phantine representation of enumerable sets. J. Symb. Log.
49 (1984) 818–829.
McCarthy, J.
1960 Recursive functions of symbolic expressions and their com-
putation by machine, Part I. ACM Comm. 3 (1960) 184–
195.
Turing, A.M.
1936 On computable numbers, with an application to the Ent-
scheidungsproblem. P. Lond. Math. Soc. (2) 42 (1936)
230–265; with a correction, Ibid. (2) 43 (1936–7) 544–546;
reprinted in: The Undecidable, ed. M. Davis. Hewlett, NY:
Raven Press (1965).
COMPUTING THE BUSY
BEAVER FUNCTION
In T. M. Cover and B. Gopinath, Open
Problems in Communication and Compu-
tation, Springer, 1987, pp. 108–112
Gregory J. Chaitin
IBM Research Division, P.O. Box 218
Yorktown Heights, NY 10598, U.S.A.
Abstract
Efforts to calculate values of the noncomputable Busy Beaver function
are discussed in the light of algorithmic information theory.
I would like to talk about some impossible problems that arise when one
combines information theory with recursive function or computability
theory. That is to say, I'd like to look at some unsolvable problems
which arise when one examines computation unlimited by any practical
bound on running time, from the point of view of information theory.
The result is what I like to call "algorithmic information theory" [5].
In the Computer Recreations department of a recent issue of Sci-
entific American [7], A. K. Dewdney discusses efforts to calculate the
Busy Beaver function $\Sigma$. This is a very interesting endeavor for a num-
ber of reasons.
First of all, the Busy Beaver function is of interest to information
theorists, because it measures the capability of computer programs as a
function of their size, as a function of the amount of information which
they contain. $\Sigma(n)$ is defined to be the largest number which can be
computed by an n-state Turing machine; to information theorists it
is clear that the correct measure is bits, not states. Thus it is more
correct to define $\Sigma(n)$ as the largest natural number whose program-
size complexity or algorithmic information content is less than or equal
to n. Of course, the use of states has made it easier and a definite and
fun problem to calculate values of $\Sigma$(number of states); to deal with
$\Sigma$(number of bits) one would need a model of a binary computer as
simple and compelling as the Turing machine model, and no obvious
natural choice is at hand.
Perhaps the most fascinating aspect of Dewdney's discussion is that
it describes successful attempts to calculate the initial values $\Sigma(1)$,
$\Sigma(2)$, $\Sigma(3), \ldots$ of an uncomputable function $\Sigma$. Not only is $\Sigma$ uncom-
putable, but it grows faster than any computable function can. In fact,
it is not difficult to see that $\Sigma(n)$ is greater than the computable func-
tion $f(n)$ as soon as n is greater than (the program-size complexity
or algorithmic information content of f) + O(1). Indeed, to compute
$f(n) + 1$ it is sufficient to know (a minimum-size program for f), and
the value of the integer (n $-$ the program-size complexity of f). Thus
the program-size complexity of $f(n) + 1$ is $\leq$ (the program-size com-
plexity of f) + O(log |n $-$ the program-size complexity of f|), which is
$< n$ if n is greater than O(1) + the program-size complexity of f. Hence
$f(n) + 1$ is included in $\Sigma(n)$, that is, $\Sigma(n) \geq f(n) + 1$, if n is greater
than O(1) + the program-size complexity of f.
Yet another reason for interest in the Busy Beaver function is that,
when properly defined in terms of bits, it immediately provides an
information-theoretic proof of an extremely fundamental fact of recur-
sive function theory, namely Turing's theorem that the halting problem
is unsolvable [2]. Turing's original proof involves the notion of a com-
putable real number, and the observation that it cannot be decided
whether or not the nth computer program ever outputs an nth digit,
because otherwise one could carry out Cantor's diagonal construction
and calculate a paradoxical real number whose nth digit is chosen to
differ from the nth digit output by the nth program, and which there-
fore cannot actually be a computable real number after all. To use the
noncomputability of $\Sigma$ to demonstrate the unsolvability of the halting
problem, it suffices to note that in principle, if one were very patient,
one could calculate $\Sigma(n)$ by checking each program of size less than or
equal to n to determine whether or not it halts, and then running each
of the programs which halt to determine what their output is, and then
taking the largest output. Contrariwise, if $\Sigma$ were computable, then it
would provide a solution to the halting problem, for an n-bit program
either halts in time less than $\Sigma(n + O(1))$, or else it never halts.
The Busy Beaver function is also of considerable metamathematical
interest; in principle it would be extremely useful to know larger values
of $\Sigma(n)$. For example, this would enable one to settle the Goldbach
conjecture and the Riemann hypothesis, and in fact any conjecture such
as Fermat's which can be refuted by a numerical counterexample. Let P
be a computable predicate of a natural number, so that for any specific
natural number n it is possible to compute in a mechanical fashion
whether or not P(n), P of n, is true or false, that is, to determine
whether or not the natural number n has property P. How could one
use the Busy Beaver function to decide if the conjecture that P is true
for all natural numbers is correct? An experimental approach is to
use a fast computer to check whether or not P is true, say for the
first billion natural numbers. To convert this empirical approach into a
proof, it would suffice to have a bound on how far it is necessary to test
P before settling the conjecture in the affirmative if no counterexample
has been found, and of course rejecting it if one was discovered. $\Sigma$
provides this bound, for if P has program-size complexity or algorithmic
information content k, then it suffices to examine the first $\Sigma(k + O(1))$
natural numbers to decide whether or not P is always true. Note that
the program-size complexity or algorithmic information content of a
famous conjecture P is usually quite small; it is hard to get excited
about a conjecture that takes a hundred pages to state.
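As a toy illustration (ours, not the paper's) of this scheme, here is the empirical-to-proof step for Goldbach's conjecture. If one actually possessed a bound B of the form $\Sigma(k + O(1))$, where k is the program-size complexity of the predicate being tested, checking up to B would settle the conjecture; since no such bound is computable, B below is just a small stand-in value.

    def is_prime(n):
        if n < 2:
            return False
        d = 2
        while d * d <= n:
            if n % d == 0:
                return False
            d += 1
        return True

    def goldbach_holds(n):          # the computable predicate P(n)
        return any(is_prime(p) and is_prime(n - p) for p in range(2, n - 1))

    B = 10_000                      # stand-in for the uncomputable Sigma(k + O(1))
    bad = [n for n in range(4, B, 2) if not goldbach_holds(n)]
    if bad:
        print("refuted at", bad[0])
    else:
        print("no counterexample below", B)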
For all these reasons, it is really quite fascinating to contemplate
the successful efforts which have been made to calculate some of the
initial values of $\Sigma(n)$. In a sense these efforts simultaneously penetrate
to "mathematical bedrock" and are "storming the heavens," to use
images of E. T. Bell. They amount to a systematic effort to settle all
finitely refutable mathematical conjectures, that is, to determine all
constructive mathematical truth. And these efforts fly in the face of
fundamental information-theoretic limitations on the axiomatic method
[1,2,6], which amount to an information-theoretic version of Gödel's
famous incompleteness theorem [3].
Here is the Busy Beaver version of Gödel's incompleteness theorem:
n bits of axioms and rules of inference cannot enable one to prove what
is the value of $\Sigma(k)$ for any k greater than $n + O(1)$. The proof of
this fact is along the lines of the Berry paradox. Contrariwise, there
is an n-bit axiom which does enable one to demonstrate what is the
value of $\Sigma(k)$ for any k less than $n - O(1)$. To get such an axiom,
one either asks God for the number of programs less than n bits in
size which halt, or one asks God for a specific n-bit program which
halts and has the maximum possible running time or the maximum
possible output before halting. Equivalently, the divine revelation is a
conjecture $\forall k\, P(k)$ (with P of program-size complexity or algorithmic
information content $\leq n$) which is false and for which (the smallest
counterexample i with $\neg P(i)$) is as large as possible. Such an axiom
would pack quite a wallop, but only in principle, because it would take
about $\Sigma(n)$ steps to deduce from it whether or not a specific program
halts and whether or not a specific mathematical conjecture is true for
all natural numbers.
These considerations involving the Busy Beaver function are closely
related to another fascinating noncomputable object, the halting prob-
ability of a universal Turing machine on random input, which I like to
call $\Omega$, and which is the subject of an essay by my colleague Charles
Bennett that was published in the Mathematical Games department of
Scientific American some years ago [4].
References
[1] G. J. Chaitin, "Randomness and mathematical proof," Scientific
American 232, No. 5 (May 1975), 47–52.
[2] M. Davis, "What is a computation?" in Mathematics Today:
Twelve Informal Essays, L. A. Steen (ed.), Springer-Verlag, New
York, 1978, 241–267.
[3] D. R. Hofstadter, Gödel, Escher, Bach: an Eternal Golden Braid,
Basic Books, New York, 1979.
[4] M. Gardner, "The random number $\Omega$ bids fair to hold the myster-
ies of the universe," Mathematical Games Dept., Scientific Amer-
ican 241, No. 5 (Nov. 1979), 20–34.
[5] G. J. Chaitin, "Algorithmic information theory," in Encyclopedia
of Statistical Sciences, Volume 1, Wiley, New York, 1982, 38–41.
[6] G. J. Chaitin, "Gödel's theorem and information," International
Journal of Theoretical Physics 22 (1982), 941–954.
[7] A. K. Dewdney, "A computer trap for the busy beaver, the
hardest-working Turing machine," Computer Recreations Dept.,
Scientific American 251, No. 2 (Aug. 1984), 19–23.
Part III
Applications to Biology
TO A MATHEMATICAL
DEFINITION OF "LIFE"
ACM SICACT News, No. 4
(January 1970), pp. 12–18
G. J. Chaitin
Abstract
"Life" and its "evolution" are fundamental concepts that have not yet
been formulated in precise mathematical terms, although some efforts
in this direction have been made. We suggest a possible point of de-
parture for a mathematical definition of "life." This definition is based
on the computer and is closely related to recent analyses of "inductive
inference" and "randomness." A living being is a unity; it is simpler
to view a living organism as a whole than as the sum of its parts. If
we want to compute a complete description of the region of space-time
that is a living being, the program will be smaller in size if the calcula-
tion is done all together, than if it is done by independently calculating
descriptions of parts of the region and then putting them together.
1. The Problem
"Life" and its "evolution" from the lifeless are fundamental concepts of
science. According to Darwin and his followers, we can expect living
organisms to evolve under very general conditions. Yet this theory
has never been formulated in precise mathematical terms. Supposing
Darwin is right, it should be possible to formulate a general definition
of "life" and to prove that under certain conditions we can expect it
to "evolve." If mathematics can be made out of Darwin, then we will
have added something basic to mathematics; while if it cannot, then
Darwin must be wrong, and life remains a miracle which has not been
explained by science.
The point is that the view that life has spontaneously evolved, and
the very concept of life itself, are very general concepts, which it should
be possible to study without getting involved in, for example, the de-
tails of quantum chemistry. We can idealize the laws of physics and
simplify them and make them complete, and then study the resulting
universe. It is necessary to do two things in order to study the evolution
of life within our model universe. First of all, we must define "life," we
must characterize a living organism in a precise fashion. At the same
time it should become clear what the complexity of an organism is, and
how to distinguish primitive forms of life from advanced forms. Then
we must study our universe in the light of the definition. Will an evo-
lutionary process occur? What is the expected time for a certain level
of complexity to be reached? Or can we show that life will probably
not evolve?
2. Previous Work
Von Neumann devoted much attention to the analysis of fundamental
biological questions from a mathematical point of view.1 He considered
1 See in particular his fifth lecture delivered at the University of Illinois in De-
cember of 1949, "Re-evaluation of the problem of complicated automata – Problems
of hierarchy and evolution," and his unfinished The Theory of Automata: Con-
struction, Reproduction, Homogeneity. Both are posthumously published in von
Neumann (1966).
a universe consisting of an infinite plane divided into squares. Time
is quantized, and at any moment each square is in one of 29 states.
The state of a square at any time depends only on its previous state
and the previous states of its four neighboring squares. The universe
is homogeneous; the state transitions of all squares are governed by
the same law. It is a deterministic universe. Von Neumann showed
that a self-reproducing general-purpose computer can exist in his model
universe.
A large amount of work on these questions has been done since von
Neumann's initial investigations, and a complete bibliography would
be quite lengthy. We may mention Moore (1962), Arbib (1966,1967),
and Codd (1968).
The point of departure of all this work has been the identification of
"life" with "self-reproduction," and this identification has both helped
and hindered. It has helped, because it has not allowed fundamental
conceptual difficulties to tie up work, but has instead permitted much
that is very interesting to be accomplished. But it has hindered be-
cause, in the end, these fundamental difficulties must be faced. At
present the problem has evidenced itself as a question of "good taste."
As von Neumann remarks,2 good taste is required in building one's
universe. If its elementary parts are assumed to be very powerful, self-
reproduction is immediate. Arbib (1966) is an intermediate case.
What is the relation between self-reproduction and life? A man may
be sterile, but no one would doubt he is alive. Children are not identical
to their parents. Self-reproduction is not exact; if it were, evolution
would be impossible. What's more, a crystal reproduces itself, yet we
would not consider it to have much life. As von Neumann comments,3
the matter is the other way around. We can deduce self-reproduction
as a property which must be possessed by many living beings, if we ask
ourselves what kinds of living beings are likely to be around. Obviously,
a species that did not reproduce would die out. Thus, if we ask what
kinds of living organisms are likely to evolve, we can draw conclusions
concerning self-reproduction.
2 See pages 76–77 of von Neumann (1966).
3 See page 78 of von Neumann (1966).
3. Simplicity and Complexity
"Complexity" is a concept whose importance and vagueness von Neu-
mann emphasized many times in his lectures.4 Due to the work of
Solomonoff, Kolmogorov, Chaitin, Martin-Löf, Willis, and Loveland,
we now understand this concept a great deal better than it was un-
derstood while von Neumann worked. Obviously, to understand the
evolution of the complexity of living beings from primitive, simple life
to today's very complex organisms, we need to make precise a mea-
sure of complexity. But it also seems that perhaps a precise concept
of complexity will enable us to define "living organism" in an exact
and general fashion. Before suggesting the manner in which this may
perhaps be done, we shall review the recent developments which have
converted "simplicity" and "complexity" into precise concepts.
We start by summarizing Solomonoff's work.5 Solomonoff proposes
the following model of the predicament of the scientist. A scientist is
continually observing increasingly larger initial segments of an infinite
sequence of 0's and 1's. This is his experimental data. He tries to
find computer programs which compute infinite binary sequences which
begin with the observed sequence. These are his theories. In order
to predict his future observations, he could use any of the theories.
But there will always be one theory that predicts that all succeeding
observations will be 1's, as well as others that take more account of the
previous observations. Which of the infinitely many theories should he
use to make the prediction? According to Solomonoff, the principle
that the simplest theory is the best should guide him.6 What is the
simplicity of a theory in the present context? It is the size of the
computer program. Larger computer programs embody more complex
theories, and smaller programs embody simpler theories.
Willis has further studied the above proposal, and also has intro-
duced the idea of a hierarchy of finite approximations to it. To my
4 See especially pages 78–80 of von Neumann (1966).
5 The earliest generally available appearance in print of Solomonoff's ideas of
which we are aware is Minsky's summary of them on pages 41–43 of Minsky (1962).
A more recent reference is Solomonoff (1964).
6 Solomonoff actually proposes weighing together all the theories into the predic-
tion, giving the simplest theories the largest weight.
knowledge, however, the success which predictions made on this basis
will have has not been made completely clear.
We must discuss a more technical aspect of Solomonoff's work. He
realized that the simplicity of theories, and thus also the predictions,
will depend on the computer which one is using. Let us consider only
computers whose programs are finite binary sequences, and measure
the size of a binary sequence by its length. Let us denote by $C(T)$ the
complexity of a theory T. By definition, $C(T)$ is the size of the smallest
program which makes our computer C compute T. Solomonoff showed
that there are "optimal" binary computers C that have the property
that for any other binary computer $C'$, $C(T) \leq C'(T) + d$, for all T.
Here d is a constant that depends on C and $C'$, not on T. Thus,
these are the most efficient binary computers, for their programs are
shortest. Any two of these optimal binary computers $C_1$ and $C_2$ result
in almost the same complexity measure, for from $C_1(T) \leq C_2(T) + d_{12}$
and $C_2(T) \leq C_1(T) + d_{21}$, it follows that the difference between $C_1(T)$
and $C_2(T)$ is bounded. The optimal binary computers are transparent
theoretically; they are enormously convenient from the technical point
of view. What's more, their optimality makes them a very natural
choice.7 Kolmogorov and Chaitin later independently hit upon the
same kind of computer in their search for a suitable computer upon
which to base a definition of "randomness."
However, the naturalness and technical convenience of the Solo-
monoff approach should not blind us to the fact that it is by no means
the only possible one. Chaitin first based his definition of randomness
on Turing machines, taking as the complexity measure the number
of states in the machine, and he later used bounded-transfer Turing
machines. Although these computers are quite different, they lead to
similar definitions of randomness. Later it became clear that using
the usual 3-tape-symbol Turing machine and taking its size to be the
number of states leads to a complexity measure $C_3(T)$ which is asymp-
totically just a Solomonoff measure $C(T)$ with its scale changed: $C(T)$
is asymptotic to $2 C_3(T) \log_2 C_3(T)$. It appears that people interested
in computers may still study other complexity measures, but to apply
7 Solomonoff's approach to the size of programs has been extended in Chaitin
(1969a) to the speed of programs.
these concepts of simplicity/complexity it is at present most convenient
to use Solomonoff measures.
We now turn to Kolmogorov's and Chaitin's proposed definition of
randomness or patternlessness. Let us consider once more the scientist
confronted by experimental data, a long binary sequence. This time
he is not interested in predicting future observations, but only in de-
termining if there is a pattern in his observations, if there is a simple
theory that explains them. If he found a way of compressing his ob-
servations into a short computer program which makes the computer
calculate them, he would say that the sequence follows a law, that it
has pattern. But if there is no short program, then the sequence has no
pattern; it is random. That is to say, the complexity $C(S)$ of a finite
binary sequence S is the size of the smallest program which makes the
computer calculate it. Those binary sequences S of a given length n
for which $C(S)$ is greatest are the most complex binary sequences of
length n, the random or patternless ones. This is a general formulation
of the definition. If we use one of Solomonoff's optimal binary com-
puters, this definition becomes even clearer. Most binary sequences
of any given length n require programs of about length n. These are
the patternless or random sequences. Those binary sequences which
can be compressed into programs appreciably shorter than themselves
are the sequences which have pattern. Chaitin and Martin-Löf have
studied the statistical properties of these sequences, and Loveland has
compared several variants of the definition.
This completes our summary of the new rigorous meaning which
has been given to simplicity/complexity: the complexity of something
is the size of the smallest program which computes it or a complete
description of it. Simpler things require smaller programs. We have
emphasized here the relation between these concepts and the philos-
ophy of the scientific method. In the theory of computing the word
"complexity" is usually applied to the speed of programs or the amount
of auxiliary storage they need for scratch-work. These are completely
different meanings of complexity. When one speaks of a simple scien-
tific theory, one refers to the fact that few arbitrary choices have been
made in specifying the theoretical structure, not to the rapidity with
which predictions can be made.
4. What is Life?
Let us once again consider a scientist in a hypothetical situation. He
wishes to understand a universe very different from his own which he
has been observing. As he observes it, he comes eventually to distin-
guish certain objects. These are highly interdependent regions of the
universe he is observing, so much so, that he comes to view them as
wholes. Unlike a gas, which consists of independent particles that do
not interact, these regions of the universe are unities, and for this reason
he has distinguished them as single entities.
We believe that the most fundamental property of living organisms
is the enormous interdependence between their components. A living
being is a unity; it is much simpler to view it as a whole than as
the sum of parts. That is to say, if we want to compute a complete
description of a region of space-time that is a living being, the program
will be smaller in size if the calculation is done all together, than if it
is done by independently calculating descriptions of parts of the region
and then putting them together. What is the complexity of a living
being, how can we distinguish primitive life from complex forms? The
interdependence in a primitive unicellular organism is far less than that
in a human being.
A living being is indeed a unity. All the atoms in it cooperate and
work together. If Mr. Smith is afraid of missing the train to his office,
all his incredibly many molecules, all his organs, all his cells, will be
cooperating so that he finishes breakfast quickly and runs to the train
station. If you cut the leg of an animal, all of it will cooperate to escape
from you, or to attack you and scare you away, in order to protect its
leg. Later the wound will heal. How different from what happens if you
cut the leg of a table. The whole table will neither come to the defense
of its leg, nor will it help it to heal. In the more intelligent living
creatures, there is also a very great deal of interdependence between
an animal's past experience and its present behavior; that is to say,
it learns, its behavior changes with time depending on its experiences.
Such enormous interdependence must be a monstrously rare occurrence
in a universe, unless it has evolved gradually.
In summary, the case is the whole versus the sum of its parts. If
both are equally complex, the parts are independent (do not interact).
If the whole is very much simpler than the sum of its parts, we have
the interdependence that characterizes a living being.8 Note finally
that we have introduced something new into the study of the size of
programs (= complexity). Before we compared the sizes of programs
that calculate different things. Now we are interested in comparing
the sizes of programs that calculate the same things in different ways.
That is to say, the method by which a calculation is done is now of
importance to us; in the previous section it was not.
5. Numerical Examples
In this paper, unfortunately, we can only suggest a possible point of
departure for a mathematical definition of life. A great amount of
work must be done; it is not even clear what is the formal mathemat-
ical counterpart to the informal definition of the previous section. A
possibility is sketched here.
Consider a computer $C_1$ which accepts programs P which are binary
sequences consisting of a number of subsequences $B, C, P_1, \ldots, P_k, A$.
B, the leftmost subsequence, is a program for breaking the remain-
der of P into $C, P_1, \ldots, P_k$, and A. B is self-describing; it starts with a
binary sequence which results from writing the length of B in base-two
notation, doubling each of its bits, and then placing a pair of unequal
bits at the right end. Also, B is not allowed to see whether any of the
remaining bits of P are 0's or 1's, only to separate them into groups.9
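A sketch (ours) of the doubling trick just described, which lets a block announce its own length: the length is written in base two, each bit is doubled, and a pair of unequal bits ends the header, so a reader can peel the block off the front of P without examining the remaining bits.

    def make_header(B):
        n = format(len(B), 'b')                 # length of B in base two
        return ''.join(bit + bit for bit in n) + '01'

    def split_off(P):
        # recover the block from the front of P, returning (block, rest)
        i, length_bits = 0, ''
        while P[i] == P[i + 1]:                 # doubled bits of the length
            length_bits += P[i]
            i += 2
        i += 2                                  # skip the unequal pair
        n = int(length_bits, 2)
        return P[i:i + n], P[i + n:]

    B = '10110'
    P = make_header(B) + B + '0011'             # header, then B, then the rest
    assert split_off(P) == ('10110', '0011')

(In the text the header is part of B itself, and B then goes on to split the rest of P; the sketch isolates only the self-delimiting encoding.)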
C is the description of a computer $C_2$. For example, $C_2$ could be
one of Solomonoff's optimal binary computers, or a computer which
emits the program without processing it.
$P_1, \ldots, P_k$ are programs which are processed by k different copies of
the computer $C_2$. $R_1, \ldots, R_k$ are the resulting outputs. These outputs
would be regions of space-time, a space-time which, like von Neuman-
n's, has been cut up into little cubes with a finite number of states.
A is a program for adding together $R_1, \ldots, R_k$ to produce R, a single
region of space-time. A merely juxtaposes the intermediate results
8 The whole cannot be more complex than the sum of its parts, because one of
the ways of looking at it is as the sum of its parts, and this bounds its complexity.
9 The awkwardness of this part of the definition is apparently its chief defect.
$R_1, \ldots, R_k$ (perhaps with some overlapping); it is not allowed to change
any of the intermediate results. In the examples below, we shall only
compute regions R which are one-dimensional strings of 0's and 1's, so
that A need only indicate that R is the concatenation of $R_1, \ldots, R_k$, in
that order.
R is the output of the computer $C_1$ produced by processing the
program P.
We now define a family of complexity measures $C(d, R)$, the com-
plexity of a region R of space-time when it is viewed as the sum of
independent regions of diameter not greater than d. $C(d, R)$ is the
length of the smallest program P which makes the computer $C_1$ output
R, among all those P such that the intermediate results $R_1$ to $R_k$ are
all less than or equal to d in diameter. $C(d, R)$ where d equals the di-
ameter of R is to within a bounded difference just the usual Solomonoff
complexity measure. But as d decreases, we may be forced to forget
any patterns in R that are more than d in diameter, and the complexity
$C(d, R)$ increases.
We present below a table with four examples. In each of the four
cases, R is a 1-dimensional region, a binary sequence of length n. $R_1$
is a random binary sequence of length n ("gas"). $R_2$ consists of n
repetitions of 1 ("crystal"). The left half of $R_3$ is a random binary
sequence of length n/2. The right half of $R_3$ is produced by rotating the
left half about $R_3$'s midpoint ("bilateral symmetry"). $R_4$ consists of two
identical copies of a random binary sequence of length n/2 ("twins").
  C(d,R) approx.     R = R_1    R = R_2       R = R_3       R = R_4
                     "gas"      "crystal"     "bilateral    "twins"
                                              symmetry"
  d = n              n          log_2 n       n/2           n/2
                                (Note 1)
  d = n/k            n          k log_2 n     n - (n/2k)    n
  (k > 1 fixed,                 (Notes 1,2)   (Note 2)      (Note 2)
  n large)
  d = 1              n          n             n             n
Note 1. This supposes that n is represented in base-two notation
by a random binary sequence. These values are too high in those rare
cases where this is not true.
Note 2. These are conjectured values. We can only show that
$C(d, R)$ is approximately less than or equal to these values.
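A crude, computable stand-in (ours, not the paper's measure) for $C(d, R)$ can be had by chopping R into blocks of at most d characters, compressing each block separately with zlib, and summing the compressed sizes. zlib only gives rough upper bounds on program size, but the qualitative shape of the table is visible: the "crystal" stays cheap at every d, while the "twins" lose their advantage as soon as d drops below the span relating the two halves.

    import random, zlib

    def approx_C(d, R):
        blocks = [R[i:i + d] for i in range(0, len(R), d)]
        return sum(len(zlib.compress(b.encode())) for b in blocks)

    n = 2 ** 12
    half = ''.join(random.choice('01') for _ in range(n // 2))
    R1 = ''.join(random.choice('01') for _ in range(n))   # "gas"
    R2 = '1' * n                                          # "crystal"
    R3 = half + half[::-1]                                # "bilateral symmetry"
    R4 = half + half                                      # "twins"
    for name, R in [('gas', R1), ('crystal', R2),
                    ('symmetry', R3), ('twins', R4)]:
        print(name, [approx_C(d, R) for d in (n, n // 8, 16)])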
Bibliography
Arbib, M. A. (1966). "Simple self-reproducing automata," Infor-
mation and Control.
Arbib, M. A. (1967). "Automata theory and development: Part
1," Journal of Theoretical Biology.
Arbib, M. A. "Self-reproducing automata: some implications for
theoretical biology."
Biological Science Curriculum Study. (1968). Biological Science:
Molecules to Man, Houghton Mifflin Co.
Chaitin, G. J. (1966). "On the length of programs for computing
finite binary sequences," Journal of the Association for Comput-
ing Machinery.
Chaitin, G. J. (1969a). "On the length of programs for computing
finite binary sequences: Statistical considerations," ibid.
Chaitin, G. J. (1969b). "On the simplicity and speed of programs
for computing infinite sets of natural numbers," ibid.
Chaitin, G. J. (1970). "On the difficulty of computations," IEEE
Transactions on Information Theory.
Codd, E. F. (1968). Cellular Automata. Academic Press.
Kolmogorov, A. N. (1965). "Three approaches to the definition
of the concept 'amount of information'," Problemy Peredachi In-
formatsii.
Kolmogorov, A. N. (1968). "Logical basis for information the-
ory and probability theory," IEEE Transactions on Information
Theory.
Loveland, D. W. "A variant of the Kolmogorov concept of com-
plexity," report 69-4, Math. Dept., Carnegie-Mellon University.
Loveland, D. W. (1969). "On minimal program complexity mea-
sures," Conference Record of the ACM Symposium on Theory of
Computing, May 1969.
Martin-Löf, P. (1966). "The definition of random sequences,"
Information and Control.
Minsky, M. L. (1962). "Problems of formulation for artificial
intelligence," Mathematical Problems in the Biological Sciences,
American Math. Society.
Moore, E. F. (1962). "Machine models of self-reproduction," ibid.
von Neumann, J. (1966). Theory of Self-Reproducing Automata.
(Edited by A. W. Burks.) University of Illinois Press.
Solomonoff, R. J. (1964). "A formal theory of inductive infer-
ence," Information and Control.
Willis, D. G. (1969). "Computational complexity and probability
constructions," Stanford University.
TOWARD A
MATHEMATICAL
DEFINITION OF "LIFE"
In R. D. Levine and M. Tribus, The
Maximum Entropy Formalism, MIT Press,
1979, pp. 477–498
Gregory J. Chaitin
Abstract
In discussions of the nature of life, the terms "complexity," "organ-
ism," and "information content," are sometimes used in ways remark-
ably analogous to the approach of algorithmic information theory, a
mathematical discipline which studies the amount of information nec-
essary for computations. We submit that this is not a coincidence and
that it is useful in discussions of the nature of life to be able to refer to
analogous precisely defined concepts whose properties can be rigorously
studied. We propose and discuss a measure of degree of organization
and structure of geometrical patterns which is based on the algorith-
mic version of Shannon's concept of mutual information. This paper
is intended as a contribution to von Neumann's program of formulating
mathematically the fundamental concepts of biology in a very general
setting, i.e. in highly simplified model universes.
1. Introduction
Here are two quotations from works dealing with the origins of life and
exobiology:
These vague remarks can be made more precise by in-
troducing the idea of information. Roughly speaking, the
information content of a structure is the minimum number
of instructions needed to specify the structure. One can
see intuitively that many instructions are needed to specify
a complex structure. On the other hand, a simple repeating
structure can be specified in rather few instructions. [1]

The traditional concept of life, therefore, may be too
narrow for our purpose... We should try to break away
from the four properties of growth, feeding, reaction, and
reproduction... Perhaps there is a clue in the way we speak
of living organisms. They are highly organized, and per-
haps this is indeed their essence... What, then, is orga-
nization? What sets it apart from other similarly vague
concepts? Organization is perhaps viewed best as "complex
interrelatedness"... A book is complex; it only resembles an
organism in that passages in one paragraph or chapter refer
to others elsewhere. A dictionary or thesaurus shows more
organization, for every entry refers to others. A telephone
directory shows less, for although it is equally elaborate,
there is little cross-reference between its entries... [2]
If one compares the first quotation with any introductory article on
algorithmic information theory (e.g. [3,4]), and compares the second
quotation with a preliminary version of this paper [5], one is struck
by the similarities. As these quotations show, there has been a great
deal of thought about how to define "life," "complexity," "organism,"
and "information content of organism." The attempted contribution
of this paper is that we propose a rigorous quantitative definition of
these concepts and are able to prove theorems about them. We do not
claim that our proposals are in any sense definitive, but, following von
Neumann [6,7], we submit that a precise mathematical definition must
be given.
Some preliminary considerations: We shall find it useful to distin-
guish between the notion of degree of interrelatedness, interdependence,
structure, or organization, and that of information content. Two ex-
treme examples are an ideal gas and a perfect crystal. The complete
microstate at a given time of the first one is very difficult to describe
fully, and for the second one this is trivial to do, but neither is or-
ganized. In other words, white noise is the most informative message
possible, and a constant pitch tone is least informative, but neither is
organized. Neither a gas nor a crystal should count as organized (see
Theorems 1 and 2 in Section 5), nor should a whale or elephant be con-
sidered more organized than a person simply because it requires more
information to specify the precise details of the current position of each
molecule in its much larger bulk. Also note that following von Neu-
mann [7] we deal with a discrete model universe, a cellular automata
space, each of whose cells has only a finite number of states. Thus we
impose a certain level of granularity in our idealized description of the
real world.
We shall now propose a rigorous theoretical measure of degree of or-
ganization or structure. We use ideas from the new algorithmic formu-
lation of information theory, in which one considers individual objects
and the amount of information in bits needed to compute, construct,
describe, generate or produce them, as opposed to the classical for-
mulation of information theory in which one considers an ensemble of
possibilities and the uncertainty as to which of them is actually the
case. In that theory the uncertainty or "entropy" of a distribution is
defined to be

$$-\sum_{i<k} p_i \log p_i$$

and is a measure of one's ignorance of which of the k possibilities ac-
tually holds given that the a priori probability of the ith alternative is
$p_i$. (Throughout this paper "log" denotes the base-two logarithm.) In
contrast, in the newer formulation of information theory one can speak
of the information content of an individual book, organism, or picture,
without having to imbed it in an ensemble of all possible such objects
and postulate a probability distribution on them.
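A quick numerical illustration (ours) of the ensemble notion just defined: the entropy of a distribution over k alternatives, maximal when all are equally likely.

    from math import log2

    def entropy(p):
        # -sum p_i log2 p_i, skipping zero-probability alternatives
        return -sum(pi * log2(pi) for pi in p if pi > 0)

    print(entropy([0.5, 0.5]))      # 1.0 bit: a fair coin
    print(entropy([0.9, 0.1]))      # about 0.47 bits: a biased coin
    print(entropy([0.25] * 4))      # 2.0 bits: four equal alternatives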
We believe that the concepts of algorithmic information theory are
extremely basic and fundamental. Witness the light they have shed on
the scientific method [8], the meaning of randomness and the Monte
Carlo method [9], the limitations of the deductive method [3,4], and
now, hopefully, on theoretical biology. An information-theoretic proof
of Euclid's theorem that there are infinitely many prime numbers should
also be mentioned (see Appendix 2).
The fundamental notion of algorithmic information theory is $H(X)$,
the algorithmic information content (or, more briefly, "complexity") of
the object X. $H(X)$ is defined to be the smallest possible number of
bits in a program for a general-purpose computer to print out X. In
other words, $H(X)$ is the amount of information necessary to describe
X sufficiently precisely for it to be constructed. Two objects X and Y
are said to be (algorithmically) independent if the best way to describe
them both is simply to describe each of them separately. That is to say,
X and Y are independent if $H(X, Y)$ is approximately equal to
$H(X) + H(Y)$, i.e. if the joint information content of X and Y is just the sum
of the individual information contents of X and Y. If, however, X and
Y are related and have something in common, one can take advantage
of this to describe X and Y together using much fewer bits than the
total number that would be needed to describe them separately, and
so $H(X, Y)$ is much less than $H(X) + H(Y)$. The quantity $H(X{:}Y)$,
which is defined as follows:

$$H(X{:}Y) = H(X) + H(Y) - H(X, Y),$$

is called the mutual information of X and Y and measures the degree
of interdependence between X and Y. This concept was defined, in
an ensemble rather than an algorithmic setting, in Shannon's original
paper [10] on information theory, noisy channels, and coding.
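A computable caricature (ours, not the paper's) of these definitions, with zlib's compressed size standing in for H: strings that share structure compress better together than apart, so the estimated mutual information is large, while unrelated random strings give a value near zero.

    import random, zlib

    def H(s):                        # crude upper bound on information content
        return len(zlib.compress(s.encode()))

    def mutual_information(x, y):    # H(X) + H(Y) - H(X,Y), approximately
        return H(x) + H(y) - H(x + y)

    a = ''.join(random.choice('01') for _ in range(4000))
    b = ''.join(random.choice('01') for _ in range(4000))
    print(mutual_information(a, b))  # near zero: a and b are unrelated
    print(mutual_information(a, a))  # large: a shares everything with itself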
We now explain our definition of the degree of organization or struc-
ture in a geometrical pattern. The d-diameter complexity $H_d(X)$ of an
object X is defined to be the minimum number of bits needed to de-
scribe X as the "sum" of separate parts each of diameter not greater
than d. Let us be more precise. Given d and X, consider all possi-
ble ways of partitioning X into nonoverlapping pieces each of diameter
$\leq d$. Then $H_d(X)$ is the sum of the number of bits needed to describe
each of the pieces separately, plus the number of bits needed to spec-
ify how to reassemble them into X. Each piece must have a separate
description which makes no cross-references to any of the others. And
one is interested in those partitions of X and reassembly techniques
which minimize this sum. That is to say,

$$H_d(X) = \min \Big[ H(\pi) + \sum_{i<k} H(X_i) \Big],$$

where $\pi$ denotes the instructions for reassembling the pieces, the min-
imization being taken over all partitions of X into nonoverlapping pieces

$$X_0, X_1, X_2, \ldots, X_{k-1},$$

all of diameter $\leq d$.
Thus $H_d(X)$ is the minimum number of bits needed to describe X
as if it were the sum of independent pieces of size $\leq d$. For d larger
than the diameter of X, $H_d(X)$ will be the same as $H(X)$. If X is
unstructured and unorganized, then as d decreases $H_d(X)$ will stay
close to $H(X)$. However if X has structure, then $H_d(X)$ will rapidly
increase as d decreases and one can no longer take advantage of patterns
of size $> d$ in describing X. Hence $H_d(X)$ as a function of d is a kind
of "spectrum" or "Fourier transform" of X. $H_d(X)$ will increase as d
decreases past the diameter of significant patterns in X, and if X is
organized hierarchically this will happen at each level in the hierarchy.
Thus the faster the difference increases between $H_d(X)$ and $H(X)$
as d decreases, the more interrelated, structured, and organized X is.
Note however that X may be a "scene" containing many independent
structures or organisms. In that case their degrees of organization are
summed together in the measure

$$H_d(X) - H(X).$$

Thus the organisms can be defined as the minimal parts of the scene for
which the amount of organization of the whole can be expressed as the
sum of the organization of the parts, i.e. pieces for which the organiza-
tion decomposes additively. Alternatively, one can use the notion of the
mutual information of two pieces to obtain a theoretical prescription
of how to separate a scene into independent patterns and distinguish a
pattern from an unstructured background in which it is imbedded (see
Section 6).
Let us enumerate what we view as the main points in favor of this
definition of organization: It is general, i.e. following von Neumann the
details of the physics and chemistry of this universe are not involved;
it measures organized structure rather than unstructured details; and
it passes the spontaneous generation or "Pasteur" test, i.e. there is
a very low probability of creating organization by chance without a
long evolutionary process (this may be viewed as a way of restating
Theorem 1 in Section 5). The second point is worth elaborating: The
information content of an organism includes much irrelevant detail, and
a bigger animal is necessarily more complex in this sense. But if it were
possible to calculate the mutual information of two arbitrary cells in a
body at a given moment, we surmise that this would give a measure of
the genetic information in a cell. This is because the irrelevant details
in each of them, such as the exact position and velocity of each molecule,
are uncorrelated and would cancel each other out.
In addition to providing a definition of information content and
of degree of organization, this approach also provides a definition of
"organism" in the sense that a theoretical prescription is given for dis-
secting a scene into organisms and determining their boundaries, so
that the measure of degree of organization can then be applied sepa-
rately to each organism. However a strong note of caution is in order:
We agree with [1] that a definition of "life" is valid as long as anything
that satisfies the definition and is likely to appear in the universe under
consideration, either is alive or is a by-product of living beings or their
activities. There certainly are structures satisfying our definition that
are not alive (see Theorems 3 to 6 in Section 5); however, we believe
that they would only be likely to arise as by-products of the activities
of living beings.
In the succeeding sections we shall do the following: give a more
formal presentation of the basic concepts of algorithmic information
theory; discuss the notions of the independence and mutual information
of groups of more than two objects; formally define $H_d$; evaluate $H_d(R)$
for some typical one-dimensional geometrical patterns R which we dub
"gas," "crystal," "twins," "bilateral symmetry," and "hierarchy"; con-
sider briefly the problem of decomposing scenes containing several inde-
pendent patterns, and of determining the boundary of a pattern which
is imbedded in an unstructured background; discuss briefly the two
and higher dimension cases; and mention some alternative definitions
of mutual information which have been proposed.
The next step in this program of research would be to proceed from static snapshots to time-varying situations, in other words, to set up a discrete universe with probabilistic state transitions and to show that there is a certain probability that a certain level of organization will be reached by a certain time. More generally, one would like to determine the probability distribution of the maximum degree of organization of any organism at time t + Δ, as a function of it at time t. Let us propose an initial proof strategy for setting up a nontrivial example of the evolution of organisms: construct a series of intermediate evolutionary forms [11], argue that increased complexity gives organisms a selective advantage, and show that no primitive organism is so successful or lethal that it diverts or blocks this gradual evolutionary pathway. What would be the intellectual flavor of the theory we desire? It would be a quantitative formulation of Darwin's theory of evolution in a very general model universe setting. It would be the opposite of ergodic theory. Instead of showing that things mix and become uniform, it would show that variety and organization will probably increase.
Some final comments: Software is fast approaching biological levels of complexity, and hardware, thanks to very large scale integration, is not far behind. Because of this, we believe that the computer is now becoming a valid metaphor for the entire organism, not just for the brain [12]. Perhaps the most interesting example of this is the evolutionary phenomenon suffered by extremely large programs such as operating systems. It becomes very difficult to make changes in such programs, and the only alternative is to add new features rather than modify existing ones. The genetic program has been "patched up" much more and over a much longer period of time than even the largest operating systems, and Nature has accomplished this in much the same manner as systems programmers have, by carrying along all the previous code as new code is added [11]. The experimental proof of this is that ontogeny recapitulates phylogeny, i.e., each embryo to a certain extent recapitulates in the course of its development the evolutionary sequence that led to it. In this connection we should also mention the thesis developed in [13] that the information contained in the human brain is now comparable with the amount of information in the genes, and that intelligence plus education may be characterized as a way of getting around the limited modifiability and channel capacity of heredity. In other words, Nature, like computer designers, has decided that it is much more flexible to build general-purpose computers than to use heredity to "hardwire" each behavior pattern instinctively into a special-purpose computer.
4. Formal Definition of H_d
We can now present the definition of the d-diameter complexity H_d(R). We assume a geometry: graph paper of some finite number of dimensions that is divided into unit cubes. Each cube is black or white, opaque or transparent; in other words, contains a 1 or a 0. Instead of requiring an output tape which is multidimensional, our universal Turing machine U outputs tuples giving the coordinates and the contents (0 or 1) of each unit cube in a geometrical object that it wishes to print. Of course geometrical objects are considered to be the same if they are translation equivalent. We choose for this geometry the city-block metric

D(X, Y) = max |x_i − y_i|,

which is more convenient for our purposes than the usual metric. By a region we mean a set of unit cubes with the property that from any cube in it to any other one there is a path that only goes through other cubes in the region. To this we add a constraint, which in the 3-dimensional case is that the connecting path must only pass through the interior and faces of cubes in the region, not through their edges or vertices. The diameter of an arbitrary region R is denoted by |R|, and is defined to be the minimum diameter 2r of a "sphere"

{X : D(X, X_0) ≤ r}

which contains R. H_d(R), the size in bits of the smallest programs which calculate R as the "sum" of independent regions of diameter ≤ d, is defined as follows:

H_d(R) = min [Γ + Σ_{i<k} H(R_i)],

where

Γ = H(R | ⊔_{i<k} R_i) + H(k),

the minimization being taken over all k and partitions of R into k-tuples R_i of nonoverlapping regions with the property that |R_i| ≤ d for all i < k.
The discussion in Section 3 of independence and mutual information shows that H_d(R) is a natural measure to consider. Excepting the Γ term, H_d(R) − H(R) is simply the minimum attainable mutual information over any partition of R into nonoverlapping pieces all of size not greater than d. We shall see in Section 5 that in practice the min is attained with a small number of pieces and the Γ term is not very significant.
A few words about Γ, the number of bits of information needed to know how to assemble the pieces: The H(k) term is included in Γ, as illustrated in Lemma 1 below, because it is the number of bits needed to tell U how many descriptions of pieces are to be read. The H(R | ⊔_{i<k} R_i) term is included in Γ because it is the number of bits needed to tell U how to compute R given the k-tuple of its pieces. This is perhaps the most straightforward formulation, and the one that is closest in spirit to Section 5 of [5]. However, less information may suffice, e.g.,

H(R | ⟨k, ⊔_{i<k} R_i⟩) + H(k)

bits. In fact, one could define Γ to be the minimum number of bits in a string which yields a program to compute the entire region when it is concatenated with minimum-size programs for all the pieces of the region; i.e., one could take

Γ = min{|p| : U(p R_0* R_1* R_2* … R_{k−1}*) = R},

where R_i* denotes a minimal-size program for R_i.
Here are two basic properties of H_d: If d ≥ |R|, then H_d(R) = H(R) + O(1); H_d(R) increases monotonically as d decreases. H_d(R) = H(R) + O(1) if d ≥ |R| because we have included the Γ term in the definition of H_d(R). H_d(R) increases as d decreases because one can no longer take advantage of patterns of diameter greater than d to describe R. The curve showing H_d(R) as a function of d may be considered a kind of "Fourier spectrum" of R. Interesting things will happen to the curve at the d which are the sizes of significant patterns in R.
Lemma 1. ("Subadditivity for n-tuples")

H(⊔_{k<n} A_k) ≤ c_n + Σ_{k<n} H(A_k).

Proof.

H(⊔_{k<n} A_k) = H(⟨n, ⊔_{k<n} A_k⟩) + O(1)
  = H(n) + H(⊔_{k<n} A_k | n) + O(1)
  ≤ c′ + H(n) + Σ_{k<n} H(A_k).

Hence one can take c_n = c′ + H(n).
…

H(A″) ≤ O(1) + H(#(A′)) + Σ_{i∈A′} H(R_i),
H(B″) ≤ O(1) + H(#(B′)) + Σ_{i∈B′} H(R_i),
H(C″) ≤ O(1) + H(#(C′)) + Σ_{i∈C′} H(R_i).

Here # denotes the cardinality of a set. Now A″, B″, and C″ are each a substring of an O(log n)-random n/2-bit string. This assertion holds for B″ for the following two reasons: the n/2-bit string is considered to be a loop, and |B″| ≤ d = n/k ≤ n/2 since k is assumed to be greater than 1. Hence, applying Lemma 2, one obtains the following inequalities:

|A″| + O(log n) ≤ H(A″),
|B″| + O(log n) ≤ H(B″),
|C″| + O(log n) ≤ H(C″).

Adding both of the above sets of three inequalities and using the facts that

|A″| + |B″| + |C″| = |R| = n, #(A′) ≤ n/2, #(B′) ≤ 1, #(C′) ≤ n/2,

and that H(m) = O(log m), one sees that

n + O(log n) ≤ H(A″) + H(B″) + H(C″)
  ≤ O(1) + H(#(A′)) + H(#(B′)) + H(#(C′)) + Σ{H(R_i) : i ∈ A′ ∪ B′ ∪ C′}
  ≤ O(log n) + Σ H(R_i).

Hence

H_d(R) ≥ Σ H(R_i) ≥ n + O(log n) = 2H(R) + O(log H(R)).
Theorem 4. ("Bilateral Symmetry")
For convenience assume n is even. Suppose that the region R consists of an O(log n)-random n/2-bit string u concatenated with its reversal. Consider d = n/k, where n is large, and k is fixed and greater than zero. Then

H(R) = n/2 + O(log n) and H_d(R) = (2 − k^{−1}) H(R) + O(log H(R)).

Proof. The proof is along the lines of that of Theorem 3, with one new idea. In the previous proof we considered B″, which is the region R_i in the partition of R that straddles R's midpoint. Before, B″ was O(log |R|)-random, but now it can be compressed into a program about half its size, i.e., about |B″|/2 bits long. Hence the maximum departure from randomness for B″ is for it to be only (O(log |R|) + |R|/2k)-random, and this is attained by making B″ as large as possible and having its midpoint coincide with that of R.
Theorem 5. ("Hierarchy")
For convenience assume n is a power of two. Suppose that the region R is constructed in the following fashion. Consider an O(1)-random log n-bit string s. Start with the one-bit string 1, and successively concatenate the string with itself or with its bit-by-bit complement, so that its size doubles at each stage. At the ith stage, the string or its complement is chosen depending on whether the ith bit of s is a 0 or a 1, respectively. Consider the resulting n-bit string R and d = n/k, where n is large, and k is fixed and greater than zero. Then

H(R) = log n + O(log log n) and H_d(R) = kH(R) + O(log H(R)).
Proof that H_d(R) ≤ kH(R) + O(log H(R)).
The reasoning is similar to the case of the upper bounds on H_d(R) in Theorems 1 and 3. Partition R into k successive strings of size floor(|R|/k), with one (possibly null) string of size less than k left over at the end.

Proof that H_d(R) ≥ kH(R) + O(log H(R)).
Proceeding as in the proof of Theorem 3, one considers a partition ⊔ R_i of R that realizes H_d(R). Using Lemma 3, one can easily see that the following lower bound holds for any substring R_i of R:

H(R_i) ≥ max{1, log |R_i| − c log log |R_i|}.

The max{1, …} is because H is always greater than or equal to unity; otherwise U would have only a single output. Hence the following expression is a lower bound on H_d(R):

Σ Δ(|R_i|)   (7)

where

Δ(x) = max{1, log x − c log log x}, Σ |R_i| = |R| = n, |R_i| ≤ d.

It follows that one obtains a lower bound on (7) and thus on H_d(R) by solving the following minimization problem: Minimize

Σ Δ(n_i)   (8)

subject to the following constraints:

Σ n_i = n, n_i ≤ n/k, n large, k fixed.

Now to do the minimization. Note that as x goes to infinity, Δ(x)/x goes to the limit zero. Furthermore, the limit is never attained, i.e., Δ(x)/x is never equal to zero. Moreover, for x and y sufficiently large and x less than y, Δ(x)/x is greater than Δ(y)/y. It follows that a sum of the form (8) with the n_i constrained as indicated is minimized by making the n_i as large as possible. Clearly this is achieved by taking all but one of the n_i equal to floor(n/k), with the last n_i equal to remainder(n/k). For this choice of n_i the value of (8) is

k[log n + O(log log n)] + Δ(remainder(n/k))
  = k log n + O(log log n)
  = kH(R) + O(log H(R)).
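To make the construction in Theorem 5 concrete, here is a sketch of ours that builds the n-bit hierarchical string exactly as described: starting from the one-bit string 1, at each stage the current string is concatenated with itself or with its bit-by-bit complement, according to the next bit of s.

```python
def hierarchy(s: str) -> str:
    # Build the 2**len(s)-bit hierarchical string of Theorem 5
    # from the bits of the short random seed s.
    r = "1"
    for bit in s:
        # Append r itself on a 0 bit, its bitwise complement on a 1 bit.
        tail = r if bit == "0" else r.translate(str.maketrans("01", "10"))
        r += tail
    return r

# A 4-bit seed yields a 16-bit string organized at every scale:
print(hierarchy("0110"))  # prints 1100001111000011
```

Since R is recovered from the log n bits of s by this short fixed procedure, H(R) = log n + O(log log n), in agreement with the theorem.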
Theorem 6. For convenience assume n is a perfect square. Suppose that the region R is an n-bit string consisting of √n repetitions of an O(log n)-random √n-bit string u. Consider d = n/k, where n is large, and k is fixed and greater than zero. Then

H(R) = √n + O(log n) and H_d(R) = kH(R) + O(log H(R)).

Proof that H_d(R) ≤ kH(R) + O(log H(R)).
The reasoning is identical to the case of the upper bound on H_d(R) in Theorem 5.

Proof that H_d(R) ≥ kH(R) + O(log H(R)).
Proceeding as in the proof of Theorem 5, one considers a partition ⊔ R_i of R that realizes H_d(R). Using Lemma 2, one can easily see that the following lower bound holds for any substring R_i of R:

H(R_i) ≥ max{1, −c log n + min{√n, |R_i|}}.

Hence the following expression is a lower bound on H_d(R):

Σ Δ_n(|R_i|)   (9)

where

Δ_n(x) = max{1, −c log n + min{√n, x}}, Σ |R_i| = |R| = n, |R_i| ≤ d.

It follows that one obtains a lower bound on (9) and thus on H_d(R) by solving the following minimization problem: Minimize

Σ Δ_n(n_i)   (10)

subject to the following constraints:

Σ n_i = n, n_i ≤ n/k, n large, k fixed.

Now to do the minimization. Consider Δ_n(x)/x as x goes from 1 to n. It is easy to see that this ratio is much smaller, on the order of 1/√n, for x near to n than it is for x anywhere else in the interval from 1 to n. Also, for x and y both greater than √n and x less than y, Δ_n(x)/x is greater than Δ_n(y)/y. It follows that a sum of the form (10) with the n_i constrained as indicated is minimized by making the n_i as large as possible. Clearly this is achieved by taking all but one of the n_i equal to floor(n/k), with the last n_i equal to remainder(n/k). For this choice of n_i the value of (10) is

k[√n + O(log n)] + Δ_n(remainder(n/k))
  = k√n + O(log n)
  = kH(R) + O(log H(R)).
8. Common Information

We should mention some new concepts that are closely related to the notion of mutual information. They are called measures of common information. Here are three different expressions defining the common information content of two strings X and Y. In them the parameter ε denotes a small tolerance, and as before H(X : Y | Z) denotes H(X|Z) + H(Y|Z) − H(⟨X, Y⟩|Z):

max{H(Z) : H(Z|X) < ε & H(Z|Y) < ε},
min{H(⟨X, Y⟩ : Z) : H(X : Y | Z) < ε},
min{H(Z) : H(X : Y | Z) < ε}.

Thus the first expression for the common information of two strings defines it to be the maximum information content of a string that can be extracted easily from both; the second defines it to be the minimum of the mutual information of the given strings and any string in the light of which the given strings look nearly independent; and the third defines it to be the minimum information content of a string in the light of which the given strings appear nearly independent. Essentially these definitions of common information are given in [17–19]. [17] considers an algorithmic formulation of its common information measure, while [18] and [19] deal exclusively with the classical ensemble setting.
Appendix 1: Errors in [5]

… The definition of the d-diameter complexity given in [5] has a basic flaw which invalidates the entries for R = R_2, R_3, and R_4 and d = n/k in the table in [5]: It is insensitive to changes in the diameter d. …

There is also another error in the table in [5], even if we forget the flaw in the definition of the d-diameter complexity. The entry for the crystal is wrong, and should read log n rather than k log n (see Theorem 2 in Section 5 of this paper).
Appendix 2: An Information-Theoretic Proof That There Are Infinitely Many Primes
It is of methodological interest to use widely differing techniques in elementary proofs of Euclid's theorem that there are infinitely many primes. For example, see Chapter II of Hardy and Wright [20], and also [21–23]. Recently Billingsley [24] has given an information-theoretic proof of Euclid's theorem. The purpose of this appendix is to point out that there is an information-theoretic proof of Euclid's theorem that utilizes ideas from algorithmic information theory instead of the classical measure-theoretic setting employed by Billingsley. We consider the algorithmic entropy H(n), which applies to individual natural numbers n instead of to ensembles.

The proof is by reductio ad absurdum. Suppose on the contrary that there are only finitely many primes p_1, …, p_k. Then one way to specify algorithmically an arbitrary natural number

n = p_1^{e_1} p_2^{e_2} ⋯ p_k^{e_k}

is by giving its k exponents, and each e_i is ≤ log n. Thus H(e_i) ≤ log log n + O(log log log n). So for random n we have

log n + O(log log n) ≤ k[log log n + O(log log log n)],

where k is the assumed finite number of primes. This last inequality is false for large n, as it assuredly is not the case that log n = O(log log n). Thus our initial assumption that there are only k primes is refuted, and there must in fact be infinitely many primes.

This proof is merely a formalization of the observation that if there were only finitely many primes, the prime factorization of a number would usually be a much more compact representation for it than its base-two numeral, which is absurd. This proof appears, formulated as a counting argument, in Section 2.6 of the 1938 edition of Hardy and Wright [20]; we believe that it is also quite natural to present it in an information-theoretic setting.
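The counting observation in the last paragraph is easy to check by machine. This sketch of ours pretends that some finite list of primes is exhaustive and counts how many integers up to N they can express; the count is bounded by (log_2 N + 1)^k, which for large N is vastly smaller than N, so most numbers would have no representation at all.

```python
from math import log2

def count_products(N: int, primes: list[int]) -> int:
    # Count integers in [1, N] of the form p1**e1 * p2**e2 * ... * pk**ek.
    reachable = {1}
    for p in primes:
        grown = set()
        for m in reachable:
            while m <= N:
                grown.add(m)
                m *= p
        reachable = grown
    return len(reachable)

primes = [2, 3, 5, 7, 11]           # pretend these were all the primes
N = 2 ** 60
count = count_products(N, primes)
bound = (int(log2(N)) + 1) ** len(primes)
print(count, "<=", bound, "<<", N)  # roughly 2e5 <= 8.4e8 << 1.2e18
```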
References

[1] L. E. Orgel, The Origins of Life: Molecules and Natural Selection, Wiley, New York, 1973, pp. 187–197.

[2] P. H. A. Sneath, Planets and Life, Funk and Wagnalls, New York, 1970, pp. 54–71.

[3] G. J. Chaitin, "Information-Theoretic Computational Complexity," IEEE Trans. Info. Theor. IT-20 (1974), pp. 10–15.

[4] G. J. Chaitin, "Randomness and Mathematical Proof," Sci. Amer. 232, No. 5 (May 1975), pp. 47–52.

[5] G. J. Chaitin, "To a Mathematical Definition of 'Life'," ACM SICACT News 4 (Jan. 1970), pp. 12–18.

[6] J. von Neumann, "The General and Logical Theory of Automata," John von Neumann: Collected Works, Volume V, A. H. Taub (ed.), Macmillan, New York, 1963, pp. 288–328.

[7] J. von Neumann, Theory of Self-Reproducing Automata, Univ. Illinois Press, Urbana, 1966, pp. 74–87; edited and completed by A. W. Burks.

[8] R. J. Solomonoff, "A Formal Theory of Inductive Inference," Info. & Contr. 7 (1964), pp. 1–22, 224–254.

[9] G. J. Chaitin and J. T. Schwartz, "A Note on Monte Carlo Primality Tests and Algorithmic Information Theory," Comm. Pure & Appl. Math., to appear.

[10] C. E. Shannon and W. Weaver, The Mathematical Theory of Communication, Univ. Illinois Press, Urbana, 1949.

[11] H. A. Simon, The Sciences of the Artificial, MIT Press, Cambridge, MA, 1969, pp. 90–97, 114–117.

[12] J. von Neumann, The Computer and the Brain, Silliman Lectures Series, Yale Univ. Press, New Haven, CT, 1958.

[13] C. Sagan, The Dragons of Eden: Speculations on the Evolution of Human Intelligence, Random House, New York, 1977, pp. 19–47.

[14] G. J. Chaitin, "A Theory of Program Size Formally Identical to Information Theory," J. ACM 22 (1975), pp. 329–340.

[15] G. J. Chaitin, "Algorithmic Information Theory," IBM J. Res. Develop. 21 (1977), pp. 350–359, 496.

[16] R. M. Solovay, "On Random R.E. Sets," Non-Classical Logics, Model Theory, and Computability, A. I. Arruda, N. C. A. da Costa, and R. Chuaqui (eds.), North-Holland, Amsterdam, 1977, pp. 283–307.

[17] P. Gács and J. Körner, "Common Information Is Far Less Than Mutual Information," Prob. Contr. & Info. Theor. 2, No. 2 (1973), pp. 149–162.

[18] A. D. Wyner, "The Common Information of Two Dependent Random Variables," IEEE Trans. Info. Theor. IT-21 (1975), pp. 163–179.

[19] H. S. Witsenhausen, "Values and Bounds for the Common Information of Two Discrete Random Variables," SIAM J. Appl. Math. 31 (1976), pp. 313–333.

[20] G. H. Hardy and E. M. Wright, An Introduction to the Theory of Numbers, Clarendon Press, Oxford, 1962.

[21] G. H. Hardy, A Mathematician's Apology, Cambridge University Press, 1967.

[22] G. H. Hardy, Ramanujan: Twelve Lectures on Subjects Suggested by His Life and Work, Chelsea, New York, 1959.

[23] H. Rademacher and O. Toeplitz, The Enjoyment of Mathematics, Princeton University Press, 1957.

[24] P. Billingsley, "The Probability Theory of Additive Arithmetic Functions," Ann. of Prob. 2 (1974), pp. 749–791.

[25] A. W. Burks (ed.), Essays on Cellular Automata, Univ. Illinois Press, Urbana, 1970.

[26] M. Eigen, "The Origin of Biological Information," The Physicist's Conception of Nature, J. Mehra (ed.), D. Reidel Publishing Co., Dordrecht-Holland, 1973, pp. 594–632.

[27] R. Landauer, "Fundamental Limitations in the Computational Process," Ber. Bunsenges. Physik. Chem. 80 (1976), pp. 1048–1059.

[28] H. P. Yockey, "A Calculation of the Probability of Spontaneous Biogenesis by Information Theory," J. Theor. Biol. 67 (1977), pp. 377–398.
Part IV
Technical Papers on
Self-Delimiting Programs
A THEORY OF PROGRAM
SIZE FORMALLY
IDENTICAL TO
INFORMATION THEORY
Journal of the ACM 22 (1975), pp. 329–340
Gregory J. Chaitin1
IBM Thomas J. Watson Research Center
Yorktown Heights, New York
Abstract
A new definition of program-size complexity is made. H(A, B/C, D) is defined to be the size in bits of the shortest self-delimiting program for calculating strings A and B if one is given a minimal-size self-delimiting program for calculating strings C and D. This differs from previous definitions: (1) programs are required to be self-delimiting, i.e., no program is a prefix of another, and (2) instead of being given C and D directly, one is given a program for calculating them that is minimal in size. Unlike previous definitions, this one has precisely the formal properties of the entropy concept of information theory. For example, H(A, B) = H(A) + H(B/A) + O(1). Also, if a program of length k is assigned measure 2^{−k}, then H(A) = −log_2 (the probability that the standard universal computer will calculate A) + O(1).
CR Categories:
5.25, 5.26, 5.27, 5.5, 5.6
1. Introduction
There is a persuasive analogy between the entropy concept of information theory and the size of programs. This was realized by the first workers in the field of program-size complexity, Solomonoff [1], Kolmogorov [2], and Chaitin [3,4], and it accounts for the large measure of success of subsequent work in this area. However, it is often the case that results are cumbersome and have unpleasant error terms. These ideas cannot be a tool for general use until they are clothed in a powerful formalism like that of information theory.

This opinion is apparently not shared by all workers in this field (see Kolmogorov [5]), but it has led others to formulate alternative definitions of program-size complexity, for example, Loveland's uniform complexity [6] and Schnorr's process complexity [7]. In this paper we present a new concept of program-size complexity. What train of thought led us to it?

¹ Copyright © 1975, Association for Computing Machinery, Inc. General permission to republish, but not for profit, all or part of this material is granted provided that ACM's copyright notice is given and that reference is made to the publication, to its date of issue, and to the fact that reprinting privileges were granted by permission of the Association for Computing Machinery.
This paper was written while the author was a visitor at the IBM Thomas J. Watson Research Center, Yorktown Heights, New York, and was presented at the IEEE International Symposium on Information Theory, Notre Dame, Indiana, October 1974.
Author's present address: Rivadavia 3580, Dpto. 10A, Buenos Aires, Argentina.

Following [8, Sec. VI, p. 7], think of a computer as decoding equipment at the receiving end of a noiseless binary communications channel. Think of its programs as code words, and of the result of the computation as the decoded message. Then it is natural to require that the programs/code words form what is called an "instantaneous code," so that successive messages sent across the channel (e.g., subroutines) can be separated. Instantaneous codes are well understood by information theorists [9–12]; they are governed by the Kraft inequality, which therefore plays a fundamental role in this paper.
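As a small illustration of why the Kraft inequality enters, this sketch of ours checks the two defining properties of an instantaneous code: no code word is a prefix of another, so concatenated messages (e.g., subroutines) can be separated, and the word lengths then necessarily satisfy Σ 2^{−|p|} ≤ 1.

```python
def is_instantaneous(words: list[str]) -> bool:
    # No code word may be a proper prefix of another, so a decoder can
    # tell where each code word ends without a terminator symbol.
    return not any(a != b and b.startswith(a) for a in words for b in words)

def kraft_sum(words: list[str]) -> float:
    # The Kraft inequality: an instantaneous code must have sum <= 1.
    return sum(2 ** -len(w) for w in words)

code = ["0", "10", "110", "111"]
print(is_instantaneous(code), kraft_sum(code))  # True 1.0
print(is_instantaneous(["0", "01", "11"]))      # False: "0" prefixes "01"
```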
One is thus led to define the relative complexity H(A, B/C, D) of A and B with respect to C and D to be the size of the shortest self-delimiting program for producing A and B from C and D. However, this is still not quite right. Guided by the analogy with information theory, one would like

H(A, B) = H(A) + H(B/A) + Δ

to hold with an error term Δ bounded in absolute value. But, as is shown in the Appendix, |Δ| is unbounded. So we stipulate instead that H(A, B/C, D) is the size of the smallest self-delimiting program that produces A and B when it is given a minimal-size self-delimiting program for C and D. Then it can be shown that |Δ| is bounded.

In Sections 2–4 we define this new concept formally, establish the basic identities, and briefly consider the resulting concept of randomness or maximal entropy.
We recommend reading Willis [13]. In retrospect it is clear that he was aware of some of the basic ideas of this paper, though he developed them in a different direction. Chaitin's study [3,4] of the state complexity of Turing machines may be of interest, because in his formalism programs can also be concatenated. To compare the properties of our entropy function H with those it has in information theory, see [9–12]; to contrast its properties with those of previous definitions of program-size complexity, see [14]. Cover [15] and Gewirtz [16] use our new definition. See [17–32] for other applications of information/entropy concepts.
2. Definitions
X = {Λ, 0, 1, 00, 01, 10, 11, 000, …} is the set of finite binary strings, and X^∞ is the set of infinite binary strings. Henceforth we shall merely say "string" instead of "binary string," and a string will be understood to be finite unless the contrary is explicitly stated. X is ordered as indicated, and |s| is the length of the string s. The variables p, q, s, and t denote strings. The variables α and ω denote infinite strings. α_n is the prefix of α of length n. N = {0, 1, 2, …} is the set of natural numbers. The variables c, i, j, k, m, and n denote natural numbers. R is the set of positive rationals. The variable r denotes an element of R. We write "r.e." instead of "recursively enumerable," "lg" instead of "log_2," and sometimes "2↑(x)" instead of "2^x." #(S) is the cardinality of the set S.
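Since the ordering of X matters later (programs will be assigned "first in the ordering which X was defined to have"), here is a short sketch of ours that enumerates X in exactly that order: by length, and lexicographically within each length.

```python
from itertools import count, islice, product

def binary_strings():
    # Enumerate X = {Λ, 0, 1, 00, 01, 10, 11, 000, ...} in order.
    yield ""                                   # Λ, the empty string
    for n in count(1):
        for bits in product("01", repeat=n):
            yield "".join(bits)

print(list(islice(binary_strings(), 7)))
# ['', '0', '1', '00', '01', '10', '11']
```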
Concrete Definition of a Computer. A computer C is a Turing machine with two tapes, a program tape and a work tape. The program tape is finite in length. Its leftmost square is called the dummy square and always contains a blank. Each of its remaining squares contains either a 0 or a 1. It is a read-only tape, and has one read head on it which can move only to the right. The work tape is two-way infinite and each of its squares contains either a 0, a 1, or a blank. It has one read-write head on it.

At the start of a computation the machine is in its initial state, the program p occupies the whole program tape except for the dummy square, and the read head is scanning the dummy square. The work tape is blank except for a single string q whose leftmost symbol is being scanned by the read-write head. Note that q can be equal to Λ. In that case the read-write head initially scans a blank square. p can also be equal to Λ. In that case the program tape consists solely of the dummy square. See Figure 1.
During each cycle of operation the machine may halt, move the read head of the program tape one square to the right, move the read-write head of the work tape one square to the left or to the right, erase…
[Figure 1. The program tape and work tape, shown in the machine's initial state and after it has halted.]
…

H(s_1, …, s_n) = H_U(s_1, …, s_n),
P(s_1, …, s_n) = P_U(s_1, …, s_n).
3. Basic Identities
This section has two objectives. The first is to show that H and I satisfy the fundamental inequalities and identities of information theory to within error terms of the order of unity. For example, the information in s about t is nearly symmetrical. The second objective is to show that P is approximately a conditional probability measure: P(t/s) and P(s, t)/P(s) are within a constant multiplicative factor of each other.

The following notation is convenient for expressing these approximate relationships. O(1) denotes a function whose absolute value is less than or equal to c for all values of its arguments. And f ≍ g means that the functions f and g satisfy the inequalities cf ≥ g and f ≤ cg for all values of their arguments. In both cases c ∈ N is an unspecified constant.
Theorem 3.1.

(a) H(s, t) = H(t, s) + O(1),
(b) H(s/s) = O(1),
(c) H(H(s)/s) = O(1),
(d) H(s) ≤ H(s, t) + O(1),
(e) H(s/t) ≤ H(s) + O(1),
(f) H(s, t) ≤ H(s) + H(t/s) + O(1),
(g) H(s, t) ≤ H(s) + H(t) + O(1),
(h) I(s : t) ≥ O(1),
(i) I(s : t) ≤ H(s) + H(t) − H(s, t) + O(1),
(j) I(s : s) = H(s) + O(1),
(k) I(Λ : s) = O(1),
(l) I(s : Λ) = O(1).
Proof. These are easy consequences of the definitions. The proof of Theorem 3.1(f) is especially interesting, and is given in full below. Also, note that Theorem 3.1(g) follows immediately from Theorem 3.1(f,e), and Theorem 3.1(i) follows immediately from Theorem 3.1(f) and the definition of I.

Now for the proof of Theorem 3.1(f). We claim that there is a computer C with the following property. If U(p, s*) = t and |p| = H(t/s) (i.e., if p is a minimal-size program for calculating t from s), then C(s*p, Λ) = ⟨s, t⟩. By using Theorem 2.3(e,a) we see that H_C(s, t) ≤ |s*p| = |s*| + |p| = H(s) + H(t/s), and H(s, t) ≤ H_C(s, t) + sim(C) ≤ H(s) + H(t/s) + O(1).

It remains to verify the claim that there is such a computer. C does the following when it is given the program s*p on its program tape and the string Λ on its work tape. First it simulates the computation that U performs when given the same program and work tapes. In this manner C reads the program s* and calculates s. Then it simulates the computation that U performs when given s* on its work tape and the remaining portion of C's program tape. In this manner C reads the program p and calculates t from s. The entire program tape has now been read, and both s and t have been calculated. C finally forms the pair ⟨s, t⟩ and halts, indicating this to be the result of the computation. Q.E.D.

Remark. The rest of this section is devoted to showing that the "≤" in Theorems 3.1(f) and 3.1(i) can be replaced by "=". The arguments used to do this are more probabilistic than information-theoretic in nature.
Theorem 3.2. (Extension of the Kraft inequality condition for the existence of an instantaneous code.)

Hypothesis. Consider an effectively given list of finitely or infinitely many "requirements" ⟨s_k, n_k⟩ (k = 0, 1, 2, …) for the construction of a computer. The requirements are said to be "consistent" if 1 ≥ Σ_k 2↑(−n_k), and we assume that they are consistent. Each requirement ⟨s_k, n_k⟩ requests that a program of length n_k be "assigned" to the result s_k. A computer C is said to "satisfy" the requirements if there are precisely as many programs p of length n such that C(p, Λ) = s as there are pairs ⟨s, n⟩ in the list of requirements. Such a C must have the property that P_C(s) = Σ 2↑(−n_k) (s_k = s) and H_C(s) = min n_k (s_k = s).

Conclusion. There are computers that satisfy these requirements. Moreover, if we are given the requirements one by one, then we can simulate a computer that satisfies them. Hereafter we refer to the particular computer that the proof of this theorem shows how to simulate as the one that is "determined" by the requirements.
Proof.

(a) First we give what we claim is the (abstract) definition of a particular computer C that satisfies the requirements. In the second part of the proof we justify this claim.

As we are given the requirements, we assign programs to results. Initially all programs for C are available. When we are given the requirement ⟨s_k, n_k⟩ we assign the first available program of length n_k to the result s_k (first in the ordering which X was defined to have in Section 2). As each program is assigned, it and all its prefixes and extensions become unavailable for future assignments. Note that a result can have many programs assigned to it (of the same or different lengths) if there are many requirements involving it.

How can we simulate C? As we are given the requirements, we make the above assignments, and we simulate C by using the technique that was given in the proof of Theorem 2.1 for a concrete computer to simulate an abstract one.

(b) Now to justify the claim. We must show that the above rule for making assignments never fails, i.e., we must show that it is never the case that all programs of the requested length are unavailable. The proof we sketch is due to N. J. Pippenger.

A geometrical interpretation is necessary. Consider the unit interval [0, 1). The kth program of length n (0 ≤ k < 2^n) corresponds to the interval [k2^{−n}, (k + 1)2^{−n}). Assigning a program corresponds to assigning all the points in its interval. The condition that the set of assigned programs must be an instantaneous code corresponds to the rule that an interval is available for assignment iff no point in it has already been assigned. The rule we gave above for making assignments is to assign that interval [k2^{−n}, (k + 1)2^{−n}) of the requested length 2^{−n} that is available that has the smallest possible k. Using this rule for making assignments gives rise to the following fact.

Fact. The set of those points in [0, 1) that are unassigned can always be expressed as the union of a finite number of intervals [k_i 2↑(−n_i), (k_i + 1)2↑(−n_i)) with the following properties: n_i > n_{i+1}, and

(k_i + 1)2↑(−n_i) ≤ k_{i+1} 2↑(−n_{i+1}).

I.e., these intervals are disjoint, their lengths are distinct powers of 2, and they appear in [0, 1) in order of increasing length.

We leave to the reader the verification that this fact is always the case and that it implies that an assignment is impossible only if the interval requested is longer than the total length of the unassigned part of [0, 1), i.e., only if the requirements are inconsistent. Q.E.D.
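The assignment rule in part (a) is easy to simulate. In this sketch of ours, the unassigned part of [0, 1) is kept as a sorted list of exact dyadic intervals; each requirement ⟨s_k, n_k⟩ takes the leftmost aligned free interval of length 2^{−n_k}, and the assigned programs come out prefix-free automatically, exactly as the geometrical interpretation predicts.

```python
from fractions import Fraction

def assign_programs(requirements):
    # First-fit rule of Theorem 3.2: for each (result, n) requirement,
    # grab the leftmost free dyadic interval of length 2**-n in [0, 1).
    free = [(Fraction(0), Fraction(1))]      # disjoint free intervals
    table = []
    for result, n in requirements:
        size = Fraction(1, 2 ** n)
        for i, (lo, hi) in enumerate(free):
            k = -((-lo) // size)              # least k with k*size >= lo
            start = k * size
            if start + size <= hi:
                pieces = []                   # carve the interval out
                if lo < start:
                    pieces.append((lo, start))
                if start + size < hi:
                    pieces.append((start + size, hi))
                free[i:i + 1] = pieces
                table.append((format(int(start / size), "b").zfill(n), result))
                break
        else:
            raise ValueError("inconsistent requirements: Kraft sum exceeds 1")
    return table

print(assign_programs([("a", 1), ("b", 2), ("a", 3), ("c", 3)]))
# [('0', 'a'), ('10', 'b'), ('110', 'a'), ('111', 'c')]
```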
Theorem 3.3. (Recursive "estimates" for H_C and P_C.) Consider a computer C.

(a) The set of all true propositions of the form "H_C(s) ≤ n" is r.e. Given t one can recursively enumerate the set of all true propositions of the form "H_C(s/t) ≤ n."

(b) The set of all true propositions of the form "P_C(s) > r" is r.e. Given t one can recursively enumerate the set of all true propositions of the form "P_C(s/t) > r."

Proof. This is an easy consequence of the fact that the domain of C is an r.e. set. Q.E.D.
Remark. The set of all true propositions of the form "H(s/t) ≤ n" is not r.e.; for if it were r.e., it would easily follow from Theorems 3.1(c) and 2.3(q) that Theorem 5.1(f) is false, which is a contradiction.
Theorem 3.4. For each computer C there is a constant c such that

(a) H(s) ≤ −lg P_C(s) + c,
(b) H(s/t) ≤ −lg P_C(s/t) + c.

Proof. It follows from Theorem 3.3(b) that the set T of all true propositions of the form "P_C(s) > 2^{−n}" is r.e., and that given t one can recursively enumerate the set T_t of all true propositions of the form "P_C(s/t) > 2^{−n}." This will enable us to use Theorem 3.2 to show that there is a computer C′ with these properties:

H_C′(s) = ⌈−lg P_C(s)⌉ + 1 and P_C′(s) = 2↑(−⌈−lg P_C(s)⌉),   (1)
H_C′(s/t) = ⌈−lg P_C(s/t)⌉ + 1 and P_C′(s/t) = 2↑(−⌈−lg P_C(s/t)⌉).   (2)

Here ⌈x⌉ denotes the least integer greater than x. By applying Theorem 2.3(a,b) to (1) and (2), we see that Theorem 3.4 holds with c = sim(C′) + 2.

How does the computer C′ work? First of all, it checks whether it has been given Λ or t on its work tape. These two cases can be distinguished, for by Theorem 2.3(c) it is impossible for t to be equal to Λ.
(a) If C′ has been given Λ on its work tape, it enumerates T and simulates the computer determined by all requirements of the form

⟨s, n + 1⟩ ("P_C(s) > 2^{−n}" ∈ T).   (3)

Thus ⟨s, n⟩ is taken as a requirement iff n ≥ ⌈−lg P_C(s)⌉ + 1. Hence the number of programs p of length n such that C′(p, Λ) = s is 1 if n ≥ ⌈−lg P_C(s)⌉ + 1 and is 0 otherwise, which immediately yields (1).

However, we must check that the requirements (3) are consistent. Σ 2^{−|p|} (over all programs p we wish to assign to the result s) = 2↑(−⌈−lg P_C(s)⌉) < P_C(s). Hence Σ 2^{−|p|} (over all p we wish to assign) < Σ_s P_C(s) ≤ 1 by Theorem 2.3(j). Thus the hypothesis of Theorem 3.2 is satisfied, the requirements (3) indeed determine a computer, and the proof of (1) and Theorem 3.4(a) is complete.

(b) If C′ has been given t on its work tape, it enumerates T_t and simulates the computer determined by all requirements of the form

⟨s, n + 1⟩ ("P_C(s/t) > 2^{−n}" ∈ T_t).   (4)

Thus ⟨s, n⟩ is taken as a requirement iff n ≥ ⌈−lg P_C(s/t)⌉ + 1. Hence the number of programs p of length n such that C′(p, t) = s is 1 if n ≥ ⌈−lg P_C(s/t)⌉ + 1 and is 0 otherwise, which immediately yields (2).

However, we must check that the requirements (4) are consistent. Σ 2^{−|p|} (over all programs p we wish to assign to the result s) = 2↑(−⌈−lg P_C(s/t)⌉) < P_C(s/t). Hence Σ 2^{−|p|} (over all p we wish to assign) < Σ_s P_C(s/t) ≤ 1 by Theorem 2.3(k). Thus the hypothesis of Theorem 3.2 is satisfied, the requirements (4) indeed determine a computer, and the proof of (2) and Theorem 3.4(b) is complete. Q.E.D.
Theorem 3.5.

(a) For each computer C there is a constant c such that P(s) ≥ 2^{−c} P_C(s) and P(s/t) ≥ 2^{−c} P_C(s/t).
(b) H(s) = −lg P(s) + O(1), H(s/t) = −lg P(s/t) + O(1).

Proof. Theorem 3.5(a) follows immediately from Theorem 3.4 using the fact that P(s) ≥ 2↑(−H(s)) and P(s/t) ≥ 2↑(−H(s/t)) (Theorem 2.3(l,m)). Theorem 3.5(b) is obtained by taking C = U in Theorem 3.4 and also using these two inequalities. Q.E.D.

Remark. Theorem 3.4(a) extends Theorem 2.3(a,b) to probabilities. Note that Theorem 3.5(a) is not an immediate consequence of our weak definition of an optimal universal computer.

Theorem 3.5(b) enables one to reformulate results about H as results concerning P, and vice versa; it is the first member of a trio of formulas that will be completed with Theorem 3.9(e,f). These formulas are closely analogous to expressions in information theory for the information content of individual events or symbols [10, Secs. 2.3, 2.6, pp. 27–28, 34–37].
Theorem 3.6.

(a) #({p : U(p, Λ) = s & |p| ≤ H(s) + n}) ≤ 2↑(n + O(1)).
(b) #({p : U(p, t) = s & |p| ≤ H(s/t) + n}) ≤ 2↑(n + O(1)).

Proof. This follows immediately from Theorem 3.5(b). Q.E.D.

Theorem 3.7. P(s) ≍ Σ_t P(s, t).

Proof. On the one hand, there is a computer C such that C(p, Λ) = s if U(p, Λ) = ⟨s, t⟩. Thus P_C(s) ≥ Σ_t P(s, t). Using Theorem 3.5(a), we see that P(s) ≥ 2^{−c} Σ_t P(s, t).

On the other hand, there is a computer C such that C(p, Λ) = ⟨s, s⟩ if U(p, Λ) = s. Thus Σ_t P_C(s, t) ≥ P_C(s, s) ≥ P(s). Using Theorem 3.5(a), we see that Σ_t P(s, t) ≥ 2^{−c} P(s). Q.E.D.
Theorem 3.8. There is a computer C and a constant c such that H_C(t/s) ≤ H(s, t) − H(s) + c.

Proof. The set of all programs p such that U(p, Λ) is defined is r.e. Let p_k be the kth program in a particular recursive enumeration of this set, and define s_k and t_k by ⟨s_k, t_k⟩ = U(p_k, Λ). By Theorems 3.7 and 3.5(b) there is a c such that 2↑(H(s) − c) Σ_t P(s, t) ≤ 1 for all s. Given s* on its work tape, C simulates the computer C_s determined by the requirements ⟨t_k, |p_k| − |s*| + c⟩ for k = 0, 1, 2, … such that s_k = U(s*, Λ). Recall Theorem 2.3(d,e). Thus for each p such that U(p, Λ) = ⟨s, t⟩ there is a corresponding p′ such that C(p′, s*) = C_s(p′, Λ) = t and |p′| = |p| − H(s) + c. Hence

H_C(t/s) ≤ H(s, t) − H(s) + c.

However, we must check that the requirements for C_s are consistent. Σ 2↑(−|p′|) (over all programs p′ we wish to assign to any result t) = Σ 2↑(−|p| + H(s) − c) (over all p such that U(p, Λ) = ⟨s, t⟩) = 2↑(H(s) − c) Σ_t P(s, t) ≤ 1 because of the way c was chosen. Thus the hypothesis of Theorem 3.2 is satisfied, and these requirements indeed determine C_s. Q.E.D.
Theorem 3.9.

(a) H(s, t) = H(s) + H(t/s) + O(1),
(b) I(s : t) = H(s) + H(t) − H(s, t) + O(1),
(c) I(s : t) = I(t : s) + O(1),
(d) P(t/s) ≍ P(s, t)/P(s),
(e) H(t/s) = lg P(s)/P(s, t) + O(1),
(f) I(s : t) = lg P(s, t)/P(s)P(t) + O(1).

Proof.

(a) Theorem 3.9(a) follows immediately from Theorems 3.8, 2.3(b), and 3.1(f).
(b) Theorem 3.9(b) follows immediately from Theorem 3.9(a) and the definition of I(s : t).
(c) Theorem 3.9(c) follows immediately from Theorems 3.9(b) and 3.1(a).
(d,e) Theorem 3.9(d,e) follows immediately from Theorems 3.9(a) and 3.5(b).
(f) Theorem 3.9(f) follows immediately from Theorems 3.9(b) and 3.5(b). Q.E.D.
Remark. We thus have at our disposal essentially the entire formalism of information theory. Results such as these can now be obtained effortlessly:

H(s_1) ≤ H(s_1/s_2) + H(s_2/s_3) + H(s_3/s_4) + H(s_4) + O(1),
H(s_1, s_2, s_3, s_4) = H(s_1/s_2, s_3, s_4) + H(s_2/s_3, s_4) + H(s_3/s_4) + H(s_4) + O(1).

However, there is an interesting class of identities satisfied by our H function that has no parallel in information theory. The simplest of these is H(H(s)/s) = O(1) (Theorem 3.1(c)), which with Theorem 3.9(a) immediately yields H(s, H(s)) = H(s) + O(1). This is just one pair of a large family of identities, as we now proceed to show.

Keeping Theorem 3.9(a) in mind, consider modifying the computer C used in the proof of Theorem 3.1(f) so that it also measures the lengths H(s) and H(t/s) of its subroutines s* and p, and halts indicating ⟨s, t, H(s), H(t/s)⟩ to be the result of the computation instead of ⟨s, t⟩. It follows that H(s, t) = H(s, t, H(s), H(t/s)) + O(1) and H(H(s), H(t/s)/s, t) = O(1). In fact, it is easy to see that

H(H(s), H(t), H(t/s), H(s/t), H(s, t)/s, t) = O(1),

which implies H(I(s : t)/s, t) = O(1). And of course these identities generalize to tuples of three or more strings.
4. A Random Infinite String

The undecidability of the halting problem is a fundamental theorem of recursive function theory. In algorithmic information theory the corresponding theorem is as follows: The base-two representation of the probability that U halts is a random (i.e., maximally complex) infinite string. In this section we formulate this statement precisely and prove it.
Theorem 4.1. (Bounds on the complexity of natural numbers.)

(a) Σ_n 2^{−H(n)} ≤ 1.

Consider a recursive function f : N → N.

(b) If Σ_n 2^{−f(n)} diverges, then H(n) > f(n) infinitely often.
(c) If Σ_n 2^{−f(n)} converges, then H(n) ≤ f(n) + O(1).

Proof.

(a) By Theorem 2.3(l,j), Σ_n 2^{−H(n)} ≤ Σ_n P(n) ≤ 1.

(b) If Σ_n 2^{−f(n)} diverges, and H(n) ≤ f(n) held for all but finitely many values of n, then Σ_n 2^{−H(n)} would also diverge. But this would contradict Theorem 4.1(a), and thus H(n) > f(n) infinitely often.

(c) If Σ_n 2^{−f(n)} converges, there is an n_0 such that Σ_{n≥n_0} 2^{−f(n)} ≤ 1. By Theorem 3.2 there is a computer C determined by the requirements ⟨n, f(n)⟩ (n ≥ n_0). Thus H(n) ≤ f(n) + sim(C) for all n ≥ n_0. Q.E.D.
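The dichotomy in parts (b) and (c) is easy to see numerically. In this sketch of ours, f(n) = lg n makes Σ 2^{−f(n)} the divergent harmonic series, so H(n) > lg n infinitely often, while f(n) = 2 lg n makes it the convergent series Σ 1/n², so H(n) ≤ 2 lg n + O(1).

```python
from math import log2

def partial_sum(f, N: int) -> float:
    # Partial sum of 2**-f(n) over n = 1 .. N-1.
    return sum(2 ** -f(n) for n in range(1, N))

print(partial_sum(lambda n: log2(n), 10 ** 6))      # ~14.4, growing like ln N
print(partial_sum(lambda n: 2 * log2(n), 10 ** 6))  # ~1.645, near pi**2 / 6
```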
Theorem 4.2. (Maximal complexity of finite and infinite strings.)

(a) max H(s) (|s| = n) = n + H(n) + O(1).
(b) #({s : |s| = n & H(s) ≤ n + H(n) − k}) ≤ 2↑(n − k + O(1)).
(c) Imagine that the infinite string α is generated by tossing a fair coin once for each of its bits. Then, with probability one, H(α_n) > n for all but finitely many n.

Proof.

(a,b) Consider a string s of length n. By Theorem 3.9(a), H(s) = H(n, s) + O(1) = H(n) + H(s/n) + O(1). We now obtain Theorem 4.2(a,b) from this estimate for H(s).

There is a computer C such that C(p, |p|) = p for all p. Thus H(s/n) ≤ n + sim(C), and H(s) ≤ n + H(n) + O(1). On the other hand, by Theorem 2.3(q), fewer than 2^{n−k} of the s satisfy H(s/n) < n − k. Hence fewer than 2^{n−k} of the s satisfy H(s) < n − k + H(n) + O(1). Thus we have obtained Theorem 4.2(a,b).

(c) Now for the proof of Theorem 4.2(c). By Theorem 4.2(b), at most a fraction of 2↑(−H(n) + c) of the strings s of length n satisfy H(s) ≤ n. Thus the probability that α satisfies H(α_n) ≤ n is ≤ 2↑(−H(n) + c). By Theorem 4.1(a), Σ_n 2↑(−H(n) + c) converges. Invoking the Borel-Cantelli lemma, we obtain Theorem 4.2(c). Q.E.D.
Definition of Randomness. A string s is random iff H(s) is approximately equal to |s| + H(|s|). An infinite string α is random iff ∃c ∀n H(α_n) > n − c.

Remark. In the case of infinite strings there is a sharp distinction between randomness and nonrandomness. In the case of finite strings it is a matter of degree. To the question "How random is s?" one must reply indicating how close H(s) is to |s| + H(|s|).
C. P. Schnorr (private communication) has shown that this complexity-based definition of a random infinite string and P. Martin-Löf's statistical definition of this concept [7, pp. 379–380] are equivalent.
Definition of Base-Two Representations. The base-two representation of a real number x ∈ (0, 1] is that unique string b_1 b_2 b_3 … with infinitely many 1's such that x = Σ_k b_k 2^{−k}.

Definition of the Probability ω that U Halts. ω = Σ_s P(s) = Σ 2^{−|p|} (U(p, Λ) is defined).

…

H_C(s) = H_C(s/Λ),
H(s/t) = H_U(s/t),
H(s) = H_U(s),
P_C(s/t) = Σ 2^{−|p|} (C(p, t) = s),
P_C(s) = P_C(s/Λ),
P(s/t) = P_U(s/t),
P(s) = P_U(s).
Theorem 5.1.

(a) H(s, H(s)) = H(s) + O(1),
(b) H(s, t) = H(s) + H(t/s, H(s)) + O(1),
(c) −H(H(s)/s) − O(1) ≤ Δ(s, t) ≤ O(1),
(d) Δ(s, s) = O(1),
(e) Δ(s, H(s)) = −H(H(s)/s) + O(1),
(f) H(H(s)/s) ≠ O(1).

(Here Δ(s, t) denotes the error term H(s, t) − H(s) − H(t/s).)
Proof.

(a) On the one hand, H(s, H(s)) ≤ H(s) + c because a minimal-size program for s also tells one its length H(s), i.e., because there is a computer C such that C(p, Λ) = ⟨U(p, Λ), |p|⟩ if U(p, Λ) is defined. On the other hand, obviously H(s) ≤ H(s, H(s)) + c.

(b) On the one hand, H(s, t) ≤ H(s) + H(t/s, H(s)) + c follows from Theorem 5.1(a) and the obvious inequality H(s, t) ≤ H(s, H(s)) + H(t/s, H(s)) + c. On the other hand, H(s, t) ≥ H(s) + H(t/s, H(s)) − c follows from the inequality H(t/s, H(s)) ≤ H(s, t) − H(s) + c analogous to Theorem 3.8 and obtained by adapting the methods of Section 3 to the present setting.

(c) This follows from Theorem 5.1(b) and the obvious inequality H(t/s, H(s)) − c ≤ H(t/s) ≤ H(H(s)/s) + H(t/s, H(s)) + c.

(d) If t = s, H(s, t) − H(s) − H(t/s) = H(s, s) − H(s) − H(s/s) = H(s) − H(s) + O(1) = O(1), for obviously H(s, s) = H(s) + O(1) and H(s/s) = O(1).

(e) If t = H(s), H(s, t) − H(s) − H(t/s) = H(s, H(s)) − H(s) − H(H(s)/s) = −H(H(s)/s) + O(1) by Theorem 5.1(a).

(f) The proof is by reductio ad absurdum. Suppose on the contrary that H(H(s)/s) < c for all s. First we adapt an idea of A. R. Meyer and D. W. Loveland [6, pp. 525–526] to show that there is a partial recursive function f : X → N with the property that if f(s) is defined it is equal to H(s), and that this occurs for infinitely many values of s. Then we obtain the desired contradiction by showing that such a function f cannot exist.

Consider the set K_s of all natural numbers k such that H(k/s) < c and H(s) ≤ k. Note that min K_s = H(s), #(K_s) < 2^c, and given s one can recursively enumerate K_s. Also, given s and #(K_s) one can recursively enumerate K_s until one finds all its elements, and, in particular, its smallest element, which is H(s). Let m = lim sup #(K_s), and let n be such that |s| ≥ n implies #(K_s) ≤ m.

Knowing m and n one calculates f(s) as follows. First one checks if |s| < n. If so, f(s) is undefined. If not, one recursively enumerates K_s until m of its elements are found. Because of the way n was chosen, K_s cannot have more than m elements. If it has less than m, one never finishes searching for m of them, and so f(s) is undefined. However, if #(K_s) = m, which occurs for infinitely many values of s, then one eventually realizes that all of them have been found, including f(s) = min K_s = H(s). Thus f(s) is defined and equal to H(s) for infinitely many values of s.

It remains to show that such an f is impossible. As the length of s increases, H(s) tends to infinity, and so f is unbounded. Thus given n and H(n) one can calculate a string s_n such that H(n) + n < f(s_n) = H(s_n), and so H(s_n/n, H(n)) is bounded. Using Theorem 5.1(b) we obtain H(n) + n < H(s_n) ≤ H(n, s_n) + c′ ≤ H(n) + H(s_n/n, H(n)) + c′′ ≤ H(n) + c′′′, which is impossible for n ≥ c′′′′. Thus f cannot exist, and our initial assumption that H(H(s)/s) < c for all s must be false. Q.E.D.
Remark. Theorem 5.1 makes it clear that the fact that H(H(s)/s) is unbounded implies that H(t/s) is less convenient to use than H(t/s, H(s)). In fact, R. Solovay (private communication) has announced that max H(H(s)/s) taken over all strings s of length n is asymptotic to lg n. The definition of the relative complexity of s with respect to t given in Section 2 is equivalent to H(s/t, H(t)).
Acknowledgments

The author is grateful to the following for conversations that helped to crystallize these ideas: C. H. Bennett, T. M. Cover, R. P. Daley, M. Davis, P. Elias, T. L. Fine, W. L. Gewirtz, D. W. Loveland, A. R. Meyer, M. Minsky, N. J. Pippenger, R. J. Solomonoff, and S. Winograd. The author also wishes to thank the referees for their comments.
References

[1] Solomonoff, R. J. A formal theory of inductive inference. Inform. and Contr. 7 (1964), 1–22, 224–254.

[2] Kolmogorov, A. N. Three approaches to the quantitative definition of information. Problems of Inform. Transmission 1, 1 (Jan.–March 1965), 1–7.

[3] Chaitin, G. J. On the length of programs for computing finite binary sequences. J. ACM 13, 4 (Oct. 1966), 547–569.

[4] Chaitin, G. J. On the length of programs for computing finite binary sequences: Statistical considerations. J. ACM 16, 1 (Jan. 1969), 145–159.

[5] Kolmogorov, A. N. On the logical foundations of information theory and probability theory. Problems of Inform. Transmission 5, 3 (July–Sept. 1969), 1–4.

[6] Loveland, D. W. A variant of the Kolmogorov concept of complexity. Inform. and Contr. 15 (1969), 510–526.

[7] Schnorr, C. P. Process complexity and effective random tests. J. Comput. and Syst. Scis. 7 (1973), 376–388.

[8] Chaitin, G. J. On the difficulty of computations. IEEE Trans. IT-16 (1970), 5–9.

[9] Feinstein, A. Foundations of Information Theory. McGraw-Hill, New York, 1958.

[10] Fano, R. M. Transmission of Information. Wiley, New York, 1961.

[11] Abramson, N. Information Theory and Coding. McGraw-Hill, New York, 1963.

[12] Ash, R. Information Theory. Wiley-Interscience, New York, 1965.

[13] Willis, D. G. Computational complexity and probability constructions. J. ACM 17, 2 (April 1970), 241–259.

[14] Zvonkin, A. K., and Levin, L. A. The complexity of finite objects and the development of the concepts of information and randomness by means of the theory of algorithms. Russ. Math. Survs. 25, 6 (Nov.–Dec. 1970), 83–124.

[15] Cover, T. M. Universal gambling schemes and the complexity measures of Kolmogorov and Chaitin. Rep. No. 12, Statistics Dep., Stanford U., Stanford, Calif., 1974. Submitted to Ann. Statist.

[16] Gewirtz, W. L. Investigations in the theory of descriptive complexity. Ph.D. Thesis, New York University, 1974 (to be published as a Courant Institute rep.).

[17] Weiss, B. The isomorphism problem in ergodic theory. Bull. Amer. Math. Soc. 78 (1972), 668–684.

[18] Renyi, A. Foundations of Probability. Holden-Day, San Francisco, 1970.

[19] Fine, T. L. Theories of Probability: An Examination of Foundations. Academic Press, New York, 1973.

[20] Cover, T. M. On determining the irrationality of the mean of a random variable. Ann. Statist. 1 (1973), 862–871.

[21] Chaitin, G. J. Information-theoretic computational complexity. IEEE Trans. IT-20 (1974), 10–15.

[22] Levin, M. Mathematical Logic for Computer Scientists. Rep. TR-131, M.I.T. Project MAC, 1974, pp. 145–147, 153.

[23] Chaitin, G. J. Information-theoretic limitations of formal systems. J. ACM 21, 3 (July 1974), 403–424.

[24] Minsky, M. L. Computation: Finite and Infinite Machines. Prentice-Hall, Englewood Cliffs, N.J., 1967, pp. 54, 55, 66.

[25] Minsky, M., and Papert, S. Perceptrons: An Introduction to Computational Geometry. M.I.T. Press, Cambridge, Mass., 1969, pp. 150–153.

[26] Schwartz, J. T. On Programming: An Interim Report on the SETL Project. Installment I: Generalities. Lecture Notes, Courant Institute, New York University, New York, 1973, pp. 1–20.

[27] Bennett, C. H. Logical reversibility of computation. IBM J. Res. Develop. 17 (1973), 525–532.

[28] Daley, R. P. The extent and density of sequences within the minimal-program complexity hierarchies. J. Comput. and Syst. Scis. (to appear).

[29] Chaitin, G. J. Information-theoretic characterizations of recursive infinite strings. Submitted to Theoretical Comput. Sci.

[30] Elias, P. Minimum times and memories needed to compute the values of a function. J. Comput. and Syst. Scis. (to appear).

[31] Elias, P. Universal codeword sets and representations of the integers. IEEE Trans. IT (to appear).

[32] Hellman, M. E. The information theoretic approach to cryptography. Center for Systems Research, Stanford U., Stanford, Calif., 1974.

[33] Chaitin, G. J. Randomness and mathematical proof. Sci. Amer. 232, 5 (May 1975), in press. (Note. Reference [33] is not cited in the text.)
INCOMPLETENESS THEOREMS
FOR RANDOM REALS

Advances in Applied Mathematics 8
(1987), pp. 119–146
G. J. Chaitin
IBM Thomas J. Watson Research Center, P.O. Box 218,
Yorktown Heights, New York 10598
Abstract
We obtain some dramatic results using statistical mechanics-thermodynamics kinds of arguments concerning randomness, chaos, unpredictability, and uncertainty in mathematics. We construct an equation involving only whole numbers and addition, multiplication, and exponentiation, with the property that if one varies a parameter and asks whether the number of solutions is finite or infinite, the answer to this question is indistinguishable from the result of independent tosses of a fair coin. This yields a number of powerful Gödel incompleteness-type results concerning the limitations of the axiomatic method, in which entropy-information measures are used. © 1987 Academic Press, Inc.
1. Introduction
It is now half a century since Turing published his remarkable paper On Computable Numbers, with an Application to the Entscheidungsproblem (Turing [15]). In that paper Turing constructs a universal Turing machine that can simulate any other Turing machine. He also uses Cantor's method to diagonalize over the countable set of computable real numbers and construct an uncomputable real, from which he deduces the unsolvability of the halting problem and as a corollary a form of Gödel's incompleteness theorem. This paper has penetrated into our thinking to such a point that it is now regarded as obvious, a fate which is suffered by only the most basic conceptual contributions. Speaking as a mathematician, I cannot help noting with pride that the idea of a general purpose electronic digital computer was invented in order to cast light on a fundamental question regarding the foundations of mathematics, years before such objects were actually constructed. Of course, this is an enormous simplification of the complex genesis of the computer, to which many contributed, but there is as much truth in this remark as there is in many other historical "facts."

In another paper [5], I used ideas from algorithmic information theory to construct a diophantine equation whose solutions are in a sense random. In the present paper I shall try to give a relatively self-contained exposition of this result via another route, starting from Turing's original construction of an uncomputable real number.
Following Turing, consider an enumeration r_1, r_2, r_3, … of all computable real numbers between zero and one. We may suppose that r_k is the real number, if any, computed by the kth computer program. Let .d_{k1} d_{k2} d_{k3} … be the successive digits in the decimal expansion of r_k. Following Cantor, consider the diagonal of the array of r_k:

r_1 = .d_{11} d_{12} d_{13} …
r_2 = .d_{21} d_{22} d_{23} …
r_3 = .d_{31} d_{32} d_{33} …

This gives us a new real number with decimal expansion .d_{11} d_{22} d_{33} … Now change each of these digits, avoiding the digits zero and nine. The result is an uncomputable real number, because its first digit is different from the first digit of the first computable real, its second digit is different from the second digit of the second computable real, etc. It is necessary to avoid zero and nine because real numbers with different digit sequences can be equal to each other if one of them ends with an infinite sequence of zeros and the other ends with an infinite sequence of nines; for example, .3999999… = .4000000…
Having constructed an uncomputable real number by diagonalizing
over the computable reals, Turing points out that it follows that the
halting problem is unsolvable. In particular, there can be no way of
deciding if the kth computer program ever outputs a kth digit. Because
if there were, one could actually calculate the successive digits of the
uncomputable real number defined above, which is impossible. Turing
also notes that a version of Gödel's incompleteness theorem is an im-
mediate corollary, because if there cannot be an algorithm for deciding
if the kth computer program ever outputs a kth digit, there also cannot
be a formal axiomatic system which would always enable one to prove
which of these possibilities is the case, for in principle one could run
through all possible proofs to decide. Using the powerful techniques
which were developed in order to solve Hilbert's tenth problem (see
Davis et al. [7] and Jones and Matijasevič [11]), it is possible to encode
the unsolvability of the halting problem as a statement about an expo-
nential diophantine equation. An exponential diophantine equation is
one of the form

    P(x_1, ..., x_m) = P′(x_1, ..., x_m),

where the variables x_1, ..., x_m range over natural numbers and P and
P′ are functions built up from these variables and natural number con-
stants by the operations of addition, multiplication, and exponentia-
tion. The result of this encoding is an exponential diophantine equation
P = P′ in m + 1 variables n, x_1, ..., x_m with the property that

    P(n, x_1, ..., x_m) = P′(n, x_1, ..., x_m)

has a solution in natural numbers x_1, ..., x_m if and only if the nth
computer program ever outputs an nth digit. It follows that there can
be no algorithm for deciding as a function of n whether or not P = P′
has a solution, and thus there cannot be any complete proof system for
settling such questions either.
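The set of parameters n for which such an equation has a solution is
recursively enumerable: one can dovetail through all candidate tuples.
A minimal sketch under that reading, with hypothetical callables P and
P_prime standing in for the two sides of the equation:

    from itertools import count, product

    def enumerate_solvable(P, P_prime, m, max_bound=None):
        """Dovetail over n and tuples (x1,...,xm), yielding each n for
        which a solution of P = P' is found. Without max_bound this
        search never terminates; it can only confirm solvability."""
        bounds = count(1) if max_bound is None else range(1, max_bound + 1)
        seen = set()
        for b in bounds:                    # ever-larger search boxes
            for n in range(b):
                if n in seen:
                    continue
                for xs in product(range(b), repeat=m):
                    if P(n, *xs) == P_prime(n, *xs):
                        seen.add(n)
                        yield n
                        break

    # Toy equation: x1 + n = 2 x1 has the solution x1 = n for every n.
    gen = enumerate_solvable(lambda n, x: x + n, lambda n, x: 2 * x, 1, 5)
    print(sorted(gen))   # -> [0, 1, 2, 3, 4]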
Up to now we have followed Turing's original approach, but now we
will set off into new territory. Our point of departure is a remark of
Courant and Robbins [6] that another way of obtaining a real number
that is not on the list r_1, r_2, r_3, ... is by tossing a coin. Here is their
measure-theoretic argument that the real numbers are uncountable.
Recall that r_1, r_2, r_3, ... are the computable reals between zero and
one. Cover r_1 with an interval of length ε/2, cover r_2 with an interval
of length ε/4, cover r_3 with an interval of length ε/8, and in general
cover r_k with an interval of length ε/2^k. Thus all computable reals in
the unit interval are covered by this infinite set of intervals, and the
total length of the covering intervals is

    Σ_{k=1}^{∞} ε/2^k = ε.

Hence if we take ε sufficiently small, the total length of the covering
is arbitrarily small. In summary, the reals between zero and one con-
stitute an interval of length one, and the subset that are computable
can be covered by intervals whose total length is arbitrarily small. In
other words, the computable reals are a set of measure zero, and if we
choose a real in the unit interval at random, the probability that it is
computable is zero. Thus one way to get an uncomputable real with
probability one is to flip a fair coin, using independent tosses to obtain
each bit of its base-two expansion.
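Numerically the covering argument is transparent (this check is my
addition): the interval lengths ε/2^k form a geometric series summing
to exactly ε, however small ε is taken.

    eps = 1e-6
    total = sum(eps / 2**k for k in range(1, 60))
    print(total)   # ~1e-6: the whole countable cover has measure eps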
If this train of thought is pursued, it leads one to the notion of a
random real number, which can never be a computable real. Following
Martin-Löf [12], we give a definition of a random real using constructive
measure theory. We say that a set of real numbers X is a constructive
measure zero set if there is an algorithm A which, given n, generates
a (possibly infinite) set of intervals whose total length is less than or
equal to 2^{-n} and which covers the set X. More precisely, the covering
is in the form of a set C of finite binary strings s such that

    Σ_{s∈C} 2^{-|s|} ≤ 2^{-n}

(here |s| denotes the length of the string s), and each real in the covered
set X has a member of C as the initial part of its base-two expansion.
In other words, we consider sets of real numbers with the property that
there is an algorithm A for producing arbitrarily small coverings of the
set. Such sets of reals are constructively of measure zero. Since there are
only countably many algorithms A for constructively covering measure
zero sets, it follows that almost all real numbers are not contained in
any set of constructive measure zero. Such reals are called (Martin-Löf)
random reals. In fact, if the successive bits of a real number are chosen
by coin flipping, with probability one it will not be contained in any set
of constructive measure zero, and hence will be a random real number.
Note that no computable real number r is random. Here is how we
get a constructive covering of arbitrarily small measure. The covering
algorithm, given n, yields the n-bit initial sequence of the binary digits
of r. This covers r and has total length, or measure, equal to 2^{-n}. Thus
there is an algorithm for obtaining arbitrarily small coverings of the
set consisting of the computable real r, and r is not a random real
number. We leave to the reader the adaptation of the argument in
Feller [9] proving the strong law of large numbers to show that the reals in
which all digits do not have equal limiting frequency form a set of constructive
measure zero. It follows that random reals are normal in Borel's sense,
that is, in any base all digits have equal limiting frequency.
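A sketch of that covering algorithm (mine, under the stated convention),
assuming a hypothetical digit routine bit(i) returning the ith binary
digit of the computable real r:

    def cover_of_computable_real(bit, n):
        """One-interval cover of measure 2^-n for a computable real
        r in [0,1]: the single n-bit prefix of its binary expansion."""
        prefix = "".join(str(bit(i)) for i in range(1, n + 1))
        return {prefix}        # sum of 2^-|s| over the cover is 2^-n

    # Toy example: r = 1/3 = 0.010101... in base two.
    print(cover_of_computable_real(lambda i: 1 if i % 2 == 0 else 0, 6))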
Let us consider the real number p whose nth bit in base-two nota-
tion is a zero or a one depending on whether or not the exponential
diophantine equation

    P(n, x_1, ..., x_m) = P′(n, x_1, ..., x_m)

has a solution in natural numbers x_1, ..., x_m. We will show that p is
not a random real. In fact, we will give an algorithm for producing
coverings of measure (n + 1)2^{-n}, which can obviously be changed to
one for producing coverings of measure not greater than 2^{-n}. Consider
the first N values of the parameter n. If one knows for how many of
these values of n, P = P′ has a solution, then one can find for which
values of n < N there are solutions. This is because the set of solutions
of P = P′ is recursively enumerable; that is, one can try more and
more solutions and eventually find each value of the parameter n for
which there is a solution. The only problem is to decide when to give
up further searches because all values of n < N for which there are
solutions have been found. But if one is told how many such n there
are, then one knows when to stop searching for solutions. So one can
assume each of the N + 1 possibilities, ranging from "p has all of its initial
N bits off" to "p has all of them on," and each one of these assumptions
determines the actual values of the first N bits of p. Thus we have
determined N + 1 different possibilities for the first N bits of p; that
is, the real number p is covered by a set of intervals of total length
(N + 1)2^{-N}. Hence p is contained in a set of constructive measure zero, and p
cannot be a random real number.
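The key subroutine of this covering is effective and easy to sketch:
given how many parameters n < N make the equation solvable, a
dovetailed search recovers exactly which ones. A minimal sketch (mine),
with a hypothetical predicate is_solution(n, xs) testing one candidate
tuple:

    from itertools import product

    def first_bits_of_p(is_solution, m, N, count):
        """Given the number `count` of n < N for which the equation is
        solvable, recover the first N bits of p by searching until that
        many distinct values of n have been confirmed."""
        found, bound = set(), 1
        while len(found) < count:
            for n in range(N):
                for xs in product(range(bound), repeat=m):
                    if is_solution(n, xs):
                        found.add(n)
            bound += 1                 # enlarge the search box and retry
        return [1 if n in found else 0 for n in range(N)]

    # Toy equation, solvable exactly for even n:
    toy = lambda n, xs: n % 2 == 0 and xs[0] == n
    print(first_bits_of_p(toy, 1, 6, 3))   # -> [1, 0, 1, 0, 1, 0]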
Thus asking whether an exponential diophantine equation has a
solution as a function of a parameter cannot give us a random real
number. However, asking whether or not the number of solutions is
infinite can give us a random real. In particular, there is an exponential
diophantine equation Q = Q′ such that the real number q is random
whose nth bit is a zero or a one depending on whether or not there are
infinitely many natural numbers x_1, ..., x_m such that

    Q(n, x_1, ..., x_m) = Q′(n, x_1, ..., x_m).

The equation P = P′ that we considered before encoded the halting
problem; that is, the nth bit of the real number p was zero or one
depending on whether the nth computer program ever outputs an nth
digit. To construct an equation Q = Q′ such that q is random is
somewhat more difficult; we shall limit ourselves to giving an outline
of the proof:¹

1. First show that if one had an oracle for solving the halting prob-
lem, then one could compute the successive bits of the base-two
representation of a particular random real number q.

2. Then show that if a real number q can be computed using an
oracle for the halting problem, it can be obtained without using
an oracle as the limit of a computable sequence of dyadic rational
numbers (rationals of the form K/2^L).

3. Finally show that any real number q that is the limit of a com-
putable sequence of dyadic rational numbers can be encoded into
an exponential diophantine equation Q = Q′ in such a manner
that

    Q(n, x_1, ..., x_m) = Q′(n, x_1, ..., x_m)

has infinitely many solutions x_1, ..., x_m if and only if the nth bit
of the real number q is a one. This is done using the fact "that
every r.e. set has a singlefold exponential diophantine represen-
tation" (Jones and Matijasevič [11]).

¹The full proof is given later in this paper (Theorems R6 and R7), but is slightly
different; it uses a particular random real number, Ω, that arises naturally in algo-
rithmic information theory.
Q = Q′ is quite a remarkable equation, as it shows that there is a
kind of uncertainty principle even in pure mathematics, in fact, even
in the theory of whole numbers. Whether or not Q = Q′ has infinitely
many solutions jumps around in a completely unpredictable manner as
the parameter n varies. It may be said that the truth or falsity of the
assertion that there are infinitely many solutions is indistinguishable
from the result of independent tosses of a fair coin. In other words,
these are independent mathematical facts with probability one-half!
This is where our search for a probabilistic proof of Turing's theorem
that there are uncomputable real numbers has led us: to a dramatic
version of Gödel's incompleteness theorem.

In Section 2 we define the real number Ω, and we develop as much
of algorithmic information theory as we shall need in the rest of the
paper. In Section 3 we compare a number of definitions of randomness,
we show that Ω is random, and we show that Ω can be encoded into
an exponential diophantine equation. In Section 4 we develop incom-
pleteness theorems for Ω and for its exponential diophantine equation.
    Σ 1/(n log n)

behaves the same as Σ 2^n · 1/(2^n n) = Σ 1/n, which diverges, and

    Σ 1/(n log n log log n)

behaves the same as Σ 2^n · 1/(2^n n log n) = Σ 1/(n log n), which diverges, etc.
On the other hand,

    Σ 1/(n (log n)²)

behaves the same as Σ 2^n · 1/(2^n n²) = Σ 1/n², which converges, and

    Σ 1/(n log n (log log n)²)

behaves the same as Σ 2^n · 1/(2^n n (log n)²) = Σ 1/(n (log n)²), which
converges, etc.

For the purposes of this paper, it is best to think of the algorithmic
information content H, which we shall now define, as the borderline
between Σ 2^{-f(n)} converging and diverging!
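A quick numerical illustration of the condensation examples (my
addition): partial sums of 1/(n log n) keep creeping upward, while
those of 1/(n (log n)²) level off.

    import math

    def partial_sum(f, N):
        return sum(f(n) for n in range(2, N))

    for N in (10**2, 10**4, 10**6):
        slow = partial_sum(lambda n: 1 / (n * math.log(n)), N)
        fast = partial_sum(lambda n: 1 / (n * math.log(n) ** 2), N)
        print(N, round(slow, 3), round(fast, 3))
    # The first column grows without bound (its divergence is doubly
    # logarithmic in speed); the second approaches a finite limit.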
Definition. Define an information content measure H(n) to be a
function of the natural number n having the property that

    Ω ≡ Σ_n 2^{-H(n)} ≤ 1     (1)

and that H(n) is computable as a limit from above, so that the set

    {"H(n) ≤ k"}     (2)

of all upper bounds is r.e. We also allow H(n) = +∞, which contributes
zero to the sum (1) since 2^{-∞} = 0. It contributes no elements to the
set of upper bounds (2).

Note. If H is an information content measure, then it follows
immediately from Σ 2^{-H(n)} = Ω ≤ 1 that

    #{k : H(k) ≤ n} ≤ 2^n.

That is, there are at most 2^n natural numbers with information content
less than or equal to n.
Theorem I. There is a minimal information content measure H,
i.e., an information content measure with the property that for any
other information content measure H′, there exists a constant c, de-
pending only on H and H′ but not on n, such that

    H(n) ≤ H′(n) + c.

That is, H is smaller, to within O(1), than any other information content
measure.

Proof. Define H as

    H(n) = min_{k≥1} [H_k(n) + k],     (3)

where H_k denotes the information content measure resulting from tak-
ing the kth (k ≥ 1) computer algorithm and patching it, if necessary, so
that it gives limits from above and does not violate the Ω ≤ 1 condition
(1). Then (3) gives H as a computable limit from above, and

    Ω = Σ_n 2^{-H(n)} ≤ Σ_{k≥1} Σ_n 2^{-[k+H_k(n)]} ≤ Σ_{k≥1} 2^{-k} = 1.

Q.E.D.
Definition. Henceforth we use this minimal information content
measure H, and we refer to H(n) as the information content of n. We
also consider each natural number n to correspond to a bit string s and
vice versa, so that H is defined for strings as well as numbers.² In ad-
dition, let ⟨n, m⟩ denote a fixed computable one-to-one correspondence
between natural numbers and ordered pairs of natural numbers. We
define the joint information content of n and m to be H(⟨n, m⟩). Thus
H is defined for ordered pairs of natural numbers as well as individual
natural numbers. We define the relative information content H(m|n)
of m relative to n by the equation

    H(⟨n, m⟩) ≡ H(n) + H(m|n).

That is,

    H(m|n) ≡ H(⟨n, m⟩) − H(n).

And we define the mutual information content I(n : m) of n and m by
the equation

    I(n : m) ≡ H(m) − H(m|n) ≡ H(n) + H(m) − H(⟨n, m⟩).
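The paper leaves the choice of the pairing ⟨n, m⟩ open; any computable
bijection between pairs of natural numbers and natural numbers will
do. A standard option (my illustration) is the Cantor pairing function:

    import math

    def pair(n, m):
        """Cantor pairing: a computable bijection N x N -> N."""
        return (n + m) * (n + m + 1) // 2 + m

    def unpair(z):
        """Inverse of pair(): recover (n, m) from z."""
        w = (math.isqrt(8 * z + 1) - 1) // 2    # diagonal index n + m
        m = z - w * (w + 1) // 2
        return w - m, m

    assert all(unpair(pair(n, m)) == (n, m)
               for n in range(50) for m in range(50))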
Note. Ω = Σ 2^{-H(n)} is just on the borderline between convergence
and divergence:

• Σ 2^{-H(n)} converges.

• If f(n) is computable and unbounded, then Σ 2^{-H(n)+f(n)} diverges.

• If f(n) is computable and Σ 2^{-f(n)} converges, then H(n) ≤ f(n) + O(1).

• If f(n) is computable and Σ 2^{-f(n)} diverges, then H(n) ≥ f(n)
infinitely often.

Let us look at a real-valued function φ(n) that is computable as a
limit of rationals from below, and suppose that Σ φ(n) ≤ 1. Then
H(n) ≤ −log₂ φ(n) + O(1). So 2^{-H(n)} can be thought of as a maxi-
mal function φ(n) that is computable in the limit from below and has
Σ φ(n) ≤ 1, instead of thinking of H(n) as a minimal function f(n)
that is computable in the limit from above and has Σ 2^{-f(n)} ≤ 1.

²It is important to distinguish between the length of a string and its information
content! However, a possible source of confusion is the fact that the "natural unit"
for both length and information content is the "bit." Thus one often speaks of an
n-bit string, and also of a string whose information content is n bits.
Lemma I. For all n,

    H(n) ≤ 2 log n + c,
    H(n) ≤ log n + 2 log log n + c′,
    H(n) ≤ log n + log log n + 2 log log log n + c″, ...

For infinitely many values of n,

    H(n) ≥ log n,
    H(n) ≥ log n + log log n,
    H(n) ≥ log n + log log n + log log log n, ...
Lemma I2. H(s) ≤ |s| + H(|s|) + O(1), where |s| denotes the length
in bits of the string s.

Proof.

    1 ≥ Ω = Σ_n 2^{-H(n)} = Σ_n [2^{-H(n)} Σ_{|s|=n} 2^{-n}]
          = Σ_n Σ_{|s|=n} 2^{-[n+H(n)]}
          = Σ_s 2^{-[|s|+H(|s|)]}.

The lemma follows by the minimality of H. Q.E.D.
Lemma I3. There are < 2^{n−k+c} n-bit strings s such that H(s) <
n + H(n) − k. Thus there are < 2^{n−H(n)−k+c} n-bit strings s such that
H(s) < n − k.

Proof.

    Σ_n Σ_{|s|=n} 2^{-H(s)} = Σ_s 2^{-H(s)} = Ω ≤ 1.

Hence, by the minimality of H,

    2^{-H(n)+c} ≥ Σ_{|s|=n} 2^{-H(s)},

which yields the lemma. Q.E.D.
Lemma I4. If φ(n) is a computable partial function, then

    H(φ(n)) ≤ H(n) + c_φ.

Proof.

    1 ≥ Ω = Σ_n 2^{-H(n)} ≥ Σ_y Σ_{φ(x)=y} 2^{-H(x)}.

Note that

    Σ_i 2^{-b_i} ≤ 2^{-a}  implies  a ≤ min_i b_i.     (4)

The lemma follows by the minimality of H. Q.E.D.
Lemma I5. H(⟨n, m⟩) = H(⟨m, n⟩) + O(1).

Proof.

    Σ_{⟨n,m⟩} 2^{-H(⟨n,m⟩)} = Σ_{⟨m,n⟩} 2^{-H(⟨n,m⟩)} = Ω ≤ 1.

The lemma follows by using the minimality of H in both directions.
Q.E.D.

Lemma I6. H(⟨n, m⟩) ≤ H(n) + H(m) + O(1).

Proof.

    Σ_{⟨n,m⟩} 2^{-[H(n)+H(m)]} = Ω² ≤ 1.

The lemma follows by the minimality of H. Q.E.D.

Lemma I7. H(n) ≤ H(⟨n, m⟩) + O(1).

Proof.

    Σ_n Σ_m 2^{-H(⟨n,m⟩)} = Σ_{⟨n,m⟩} 2^{-H(⟨n,m⟩)} = Ω ≤ 1.

The lemma follows from (4) and the minimality of H. Q.E.D.

Lemma I8. H(⟨n, H(n)⟩) = H(n) + O(1).

Proof. By Lemma I7,

    H(n) ≤ H(⟨n, H(n)⟩) + O(1).

On the other hand, consider

    Σ_{⟨n,i⟩ : i ≥ H(n)} 2^{-i-1} = Σ_{j≥0} Σ_n 2^{-H(n)-j-1}
        = Σ_n Σ_{k≥1} 2^{-H(n)-k} = Σ_n 2^{-H(n)} = Ω ≤ 1.

By the minimality of H,

    H(⟨n, H(n) + j⟩) ≤ H(n) + j + O(1).

Take j = 0. Q.E.D.

Lemma I9. H(⟨n, n⟩) = H(n) + O(1).

Proof. By Lemma I7,

    H(n) ≤ H(⟨n, n⟩) + O(1).

On the other hand, consider φ(n) = ⟨n, n⟩. By Lemma I4,

    H(φ(n)) ≤ H(n) + c_φ.

That is,

    H(⟨n, n⟩) ≤ H(n) + O(1).

Q.E.D.

Lemma I10. H(⟨n, 0⟩) = H(n) + O(1).

Proof. By Lemma I7,

    H(n) ≤ H(⟨n, 0⟩) + O(1).

On the other hand, consider φ(n) = ⟨n, 0⟩. By Lemma I4,

    H(φ(n)) ≤ H(n) + c_φ.

That is,

    H(⟨n, 0⟩) ≤ H(n) + O(1).

Q.E.D.

Lemma I11. H(m|n) ≡ H(⟨n, m⟩) − H(n) ≥ −c. (Proof: use Lemma I7.)

Lemma I12. I(n : m) ≡ H(n) + H(m) − H(⟨n, m⟩) ≥ −c. (Proof: use Lemma I6.)

Lemma I13. I(n : m) = I(m : n) + O(1). (Proof: use Lemma I5.)

Lemma I14. I(n : n) = H(n) + O(1). (Proof: use Lemma I9.)

Lemma I15. I(n : 0) = O(1). (Proof: use Lemma I10.)
Note. The further development of this algorithmic version of infor-
mation theory³ requires the notion of the size in bits of a self-delimiting
computer program (Chaitin [3]), which, however, we can do without in
this paper.
3. Random Reals

Definition (Martin-Löf [12]). Speaking geometrically, a real r is
Martin-Löf random if it is never the case that it is contained in each
set of an r.e. infinite sequence A_i of sets of intervals with the property
that the measure⁴ of the ith set is always less than or equal to 2^{-i}:

    μ(A_i) ≤ 2^{-i}.     (5)

Here is the definition of a Martin-Löf random real r in a more compact
notation:

    ∀i [μ(A_i) ≤ 2^{-i}]  ⇒  ¬∀i [r ∈ A_i].

An equivalent definition, if we restrict ourselves to reals in the unit
interval 0 ≤ r ≤ 1, may be formulated in terms of bit strings rather
than geometrical notions, as follows. Define a covering to be an r.e. set
of ordered pairs consisting of a natural number i and a bit string s,

    Covering = {⟨i, s⟩},

with the property that if ⟨i, s⟩ ∈ Covering and ⟨i, s′⟩ ∈ Covering, then
it is not the case that s is an extension of s′ or that s′ is an extension
of s.⁵ We simultaneously consider A_i to be a set of (finite) bit strings

    {s : ⟨i, s⟩ ∈ Covering}

³Compare the original ensemble version of information theory given in Shannon
and Weaver [13].

⁴I.e., the sum of the lengths of the intervals, being careful to avoid counting
overlapping intervals twice.
    ∃c ∀n [H(r_n) ≥ n − c].

A real r is strongly Chaitin random if (the information content of the
initial segment r_n of length n of the base-two expansion of r) eventually
becomes and remains arbitrarily greater than n: lim inf H(r_n) − n = ∞.⁹
In other words,

    ∀k ∃N_k ∀n ≥ N_k [H(r_n) ≥ n + k].

Note. All these definitions hold with probability one (see Theorem
R4).

⁹Thus n − c ≤ H(r_n) ≤ n + H(n) + c′ ≤ n + log n + 2 log log n + c″,
by Lemmas I2 and I.
Theorem R1. Martin-Löf random ⇔ Chaitin random.

Proof. ¬Martin-Löf ⇒ ¬Chaitin. Suppose that a real number r has
the property that

    ∀i [μ(A_i) ≤ 2^{-i} & r ∈ A_i].

The series

    Σ_n 2^{n−n²} = Σ_n 2^{-n²+n} = 2^{-0} + 2^{-0} + 2^{-2} + 2^{-6} + 2^{-12} + 2^{-20} + ⋯

obviously converges; define N so that

    Σ_{n≥N} 2^{-n²+n} ≤ 1.

(In fact, we can take N = 2.) Let the variable s range over bit strings,
and consider

    Σ_{n≥N} Σ_{s∈A_{n²}} 2^{-[|s|−n]} = Σ_{n≥N} 2^n μ(A_{n²}) ≤ Σ_{n≥N} 2^{-n²+n} ≤ 1.
is < 2^{-n}. (See the next paragraph for the proof of this claim.) Thus
we have an r.e. infinite sequence A_n of sets of intervals with measure
μ(A_n) ≤ 2^{-n} which all contain r. Hence r is not Martin-Löf random.

Proof of Claim. Since Σ 2^{-H(k)} = Ω ≤ 1, there is a k between 1 and
2^{n+c+c′} such that H(k) ≥ n + c + c′. For this value of k, since the
number of k-bit strings s with H(s) < k + H(k) − i is < 2^{k−i+c′}

Of course, it will follow from this theorem that Ω must be an irrational number, so
this situation cannot actually occur, but we don't know that yet!
But by Lemma I4,

    H(φ(Ω_k)) ≤ H(Ω_k) + c_φ.

Hence

    k < H(φ(Ω_k)) ≤ H(Ω_k) + c_φ,

and

    H(Ω_k) > k − c_φ.

Thus Ω is Chaitin random, and by Theorems R1 and R3 it is also
Martin-Löf random and weakly Solovay random. Q.E.D.
Theorem R7. There is an exponential diophantine equation

    L(n, x_1, ..., x_m) = R(n, x_1, ..., x_m)

which has only finitely many solutions x_1, ..., x_m if the nth bit of Ω is
a 0, and which has infinitely many solutions x_1, ..., x_m if the nth bit
of Ω is a 1.

Proof. Since H(n) can be computed as a limit from above, 2^{-H(n)}
can be computed as a limit from below. It follows that

    Ω = Σ 2^{-H(n)}

is the limit from below of a computable sequence ω_1 ≤ ω_2 ≤ ω_3 ≤ ⋯
of rational numbers:

    Ω = lim_{k→∞} ω_k.

This sequence converges extremely slowly! The exponential diophan-
tine equation L = R is constructed from the sequence ω_k by using the
theorem that "every r.e. relation has a singlefold exponential diophan-
tine representation" (Jones and Matijasevič [11]). Since the assertion
that

    "the nth bit of ω_k is a 1"

is an r.e. relation between n and k (in fact, it is a recursive relation),
the theorem of Jones and Matijasevič yields an equation

    L(n, k, x_2, ..., x_m) = R(n, k, x_2, ..., x_m)

involving only additions, multiplications, and exponentiations of nat-
ural number constants and variables, and this equation has exactly one
solution x_2, ..., x_m in natural numbers if the nth bit of the base-two
expansion of ω_k is a 1, and it has no solution x_2, ..., x_m in natural
numbers if the nth bit of the base-two expansion of ω_k is a 0. The
number of different m-tuples x_1, ..., x_m of natural numbers which are
solutions of the equation

    L(n, x_1, ..., x_m) = R(n, x_1, ..., x_m)

is therefore infinite if the nth bit of the base-two expansion of Ω is a
1, and it is finite if the nth bit of the base-two expansion of Ω is a 0.
Q.E.D.
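The sequence ω_k is effective because verified upper bounds on H arrive
over time (the set (2) of upper bounds is r.e.). A toy sketch of how the
lower bounds would be accumulated (my construction, with a
hypothetical stream of pairs (n, b) asserting H(n) ≤ b):

    from fractions import Fraction

    def omega_lower_bounds(bound_stream):
        """Yield a nondecreasing sequence of rational lower bounds on
        Omega = sum over n of 2^-H(n), from an enumeration of verified
        upper bounds H(n) <= b."""
        best = {}                         # best bound seen for each n
        for n, b in bound_stream:
            if b < best.get(n, float("inf")):
                best[n] = b
            yield sum(Fraction(1, 2**v) for v in best.values())

    # Hypothetical stream: bounds improve as computation proceeds.
    for w in omega_lower_bounds([(0, 5), (1, 4), (0, 3), (2, 6)]):
        print(w)                          # 1/32, 3/32, 3/16, 13/64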
4. Incompleteness Theorems

Having developed the necessary information-theoretic formalism in Sec-
tion 2, and having studied the notion of a random real in Section 3, we
can now begin to derive incompleteness theorems.

The setup is as follows. The axioms of a formal theory are consid-
ered to be encoded as a single finite bit string, the rules of inference
are considered to be an algorithm for enumerating the theorems given
the axioms, and in general we shall fix the rules of inference and vary
the axioms. More formally, the rules of inference F may be considered
to be an r.e. set of propositions of the form

    "Axioms ⊢_F Theorem."

The r.e. set of theorems deduced from the axiom A is determined by
selecting from the set F the theorems in those propositions which have
the axiom A as an antecedent. In general we will consider the rules of
inference F to be fixed and study what happens as we vary the axioms
A. By an n-bit theory we shall mean the set of theorems deduced from
an n-bit axiom.
N-bit theory ever yields more than N + f(log log N) + c_{fg} bits of Ω.

Note. Thus for n of special form, i.e., which have concise descrip-
tions, we get better upper bounds on the number of bits of Ω which
are yielded by n-bit theories. This is a foretaste of the way algorithmic
information theory will be used in Theorem C and Corollary C2 (Sect.
4.4).
Lemma for Second Borel–Cantelli Lemma. For any finite set
{x_k} of non-negative real numbers,

    Π_k (1 − x_k) ≤ 1 / Σ_k x_k.

Proof. If x is a non-negative real number, then

    1 − x ≤ 1/(1 + x).

Thus

    Π_k (1 − x_k) ≤ 1 / Π_k (1 + x_k) ≤ 1 / Σ_k x_k,

since, if all the x_k are non-negative,

    Π_k (1 + x_k) ≥ Σ_k x_k.

Q.E.D.
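A quick numeric sanity check of the inequality (my addition):

    import random

    for _ in range(5):
        xs = [random.uniform(0, 1) for _ in range(10)]
        prod = 1.0
        for x in xs:
            prod *= 1 - x
        assert prod <= 1 / sum(xs) + 1e-12    # product bound holds
    print("inequality verified on random samples")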
Second Borel–Cantelli Lemma (Feller [9]). Suppose that the
events A_n have the property that it is possible to determine whether or
not the event A_n occurs by examining the first f(n) bits of Ω, where
f is a computable function. If the events A_n are mutually independent
and Σ Pr{A_n} diverges, then Ω has the property that infinitely many
of the A_n must occur.

Proof. Suppose on the contrary that Ω has the property that only
finitely many of the events A_n occur. Then there is an N such that
the event A_n does not occur if n ≥ N. The probability that none
of the events A_N, A_{N+1}, ..., A_{N+k} occur is, since the A_n are mutually
independent, precisely

    Π_{i=0}^{k} (1 − Pr{A_{N+i}}) ≤ 1 / [Σ_{i=0}^{k} Pr{A_{N+i}}],

which goes to zero as k goes to infinity. This would give us arbitrarily
small covers for Ω, which contradicts the fact that Ω is Martin-Löf
random. Q.E.D.
Theorem B. If Σ 2^{n−f(n)} diverges and f is computable, then in-
finitely often there is a run of f(n) zeros between bits 2^n and 2^{n+1} of Ω
(2^n ≤ bit < 2^{n+1}). Hence there are rules of inference which have the
property that there are infinitely many N-bit theories that yield (the
first) N + f(log N) bits of Ω.

Proof. We wish to prove that infinitely often Ω must have a run of
k = f(n) consecutive zeros between its 2^n th and its 2^{n+1} th bit position.
There are 2^n bits in the range in question. Divide this into nonoverlap-
ping blocks of 2k bits each, giving a total of 2^n/2k blocks. The chance
of having a run of k consecutive zeros in each block of 2k bits is at least

    k 2^{k−2} / 2^{2k}.     (9)

Reason:

• There are 2k − k + 1 ≥ k different possible choices for where to
put the run of k zeros in the block of 2k bits.

• Then there must be a 1 at each end of the run of 0's, but the
remaining 2k − k − 2 = k − 2 bits can be anything.

• This may be an underestimate if the run of 0's is at the beginning
or end of the 2k bits, and there is no room for endmarker 1's.

• There is no room for another 10^k1 to fit in the block of 2k bits, so
we are not overestimating the probability by counting anything
twice.

Summing (9) over all 2^n/2k blocks and over all n, we get

    Σ_n [k 2^{k−2} / 2^{2k}] [2^n / 2k] = (1/8) Σ_n 2^{n−k} = (1/8) Σ_n 2^{n−f(n)} = ∞.

Invoking the second Borel–Cantelli lemma (if the events A_i are inde-
pendent and Σ Pr{A_i} diverges, then infinitely many of the A_i must
occur), we are finished. Q.E.D.
Corollary B. If Σ 2^{−f(n)} diverges and f is computable and nonde-
creasing, then infinitely often there is a run of f(2^{n+1}) zeros between
bits 2^n and 2^{n+1} of Ω (2^n ≤ bit < 2^{n+1}). Hence there are infinitely many
N-bit theories that yield (the first) N + f(N) bits of Ω.

Proof. If Σ 2^{−f(n)} diverges and f is computable and nondecreasing,
then by the Cauchy condensation test

    Σ_n 2^{n − f(2^{n+1})}

also diverges.
5. Conclusion

In conclusion, we have seen that proving whether particular exponen-
tial diophantine equations have finitely or infinitely many solutions is
absolutely intractable. Such questions escape the power of mathemat-
ical reasoning. This is a region in which mathematical truth has no
discernible structure or pattern and appears to be completely random.
These questions are completely beyond the power of human reasoning.
Mathematics cannot deal with them.

Quantum physics has shown that there is randomness in nature. I
believe that we have demonstrated in this paper that randomness is
already present in pure mathematics. This does not mean that the
universe and mathematics are lawless; it means that laws of a different
kind apply: statistical laws.
References

[1] G. J. Chaitin, Information-theoretic computational complexity,
IEEE Trans. Inform. Theory 20 (1974), 10–15.

[2] G. J. Chaitin, Randomness and mathematical proof, Sci. Amer.
232, No. 5 (1975), 47–52.

[3] G. J. Chaitin, A theory of program size formally identical to
information theory, J. Assoc. Comput. Mach. 22 (1975), 329–340.

[4] G. J. Chaitin, Gödel's theorem and information, Internat. J.
Theoret. Phys. 22 (1982), 941–954.

[5] G. J. Chaitin, Randomness and Gödel's theorem, "Mondes en
Développement," Vol. 14, No. 53, in press.

[6] R. Courant and H. Robbins, "What is Mathematics?," Oxford
Univ. Press, London, 1941.

[7] M. Davis, H. Putnam, and J. Robinson, The decision problem
for exponential diophantine equations, Ann. Math. 74 (1961),
425–436.

[8] M. Davis, "The Undecidable: Basic Papers on Undecidable
Propositions, Unsolvable Problems and Computable Functions,"
Raven, New York, 1965.

[9] W. Feller, "An Introduction to Probability Theory and Its
Applications, I," Wiley, New York, 1970.

[10] G. H. Hardy, "A Course of Pure Mathematics," 10th ed., Cam-
bridge Univ. Press, London, 1952.

[11] J. P. Jones and Y. V. Matijasevič, Register machine proof
of the theorem on exponential diophantine representation of enu-
merable sets, J. Symbolic Logic 49 (1984), 818–829.

[12] P. Martin-Löf, The definition of random sequences, Inform.
Control 9 (1966), 602–619.

[13] C. E. Shannon and W. Weaver, "The Mathematical Theory
of Communication," Univ. of Illinois Press, Urbana, 1949.

[14] R. M. Solovay, Private communication, 1975.

[15] A. M. Turing, On computable numbers, with an application to
the Entscheidungsproblem, Proc. London Math. Soc. 42 (1937),
230–265; also in [8].
ALGORITHMIC ENTROPY
OF SETS
Computers & Mathematics with
Applications 2 (1976), pp. 233–245
Gregory J. Chaitin
IBM Thomas J. Watson Research Center
Yorktown Heights, NY 10598, U.S.A.
Abstract

In a previous paper a theory of program size formally identical to infor-
mation theory was developed. The entropy of an individual finite object
was defined to be the size in bits of the smallest program for calculating
it. It was shown that this is −log₂ of the probability that the object
is obtained by means of a program whose successive bits are chosen by
flipping an unbiased coin. Here a theory of the entropy of recursively
enumerable sets of objects is proposed which includes the previous the-
ory as the special case of sets having a single element. The primary
concept in the generalized theory is the probability that a computing
machine enumerates a given set when its program is manufactured by
coin flipping. The entropy of a set is defined to be −log₂ of this prob-
ability.
1. Introduction

In a classical paper on computability by probabilistic machines [1], de
Leeuw et al. showed that if a machine with a random element can
enumerate a specific set of natural numbers with positive probability,
then there is a deterministic machine that also enumerates this set. We
propose to throw further light on this matter by bringing into play the
concepts of algorithmic information theory [2,3].

As in [3], we require a computing machine to read the successive
bits of its program from a semi-infinite tape that has been filled with
0's and 1's by flipping an unbiased coin, and to decide by itself where
to stop reading the program, for there is no endmarker. In [3] this
convention has the important consequence that a program can be built
up from subroutines by concatenating them.

In this paper we turn from finite computations to unending com-
putations. The computer is used to enumerate a set of objects instead
of a single one. An important difference between this paper and [3] is
that here it is possible for the machine to read the entire program tape,
so that in a sense infinite programs are permitted. However, following
[1] it is better to think of these as cases in which a nondeterministic
machine uses coin-flipping infinitely often.

Here, as in [3], we pick a universal computer that makes the prob-
ability of obtaining any given machine output as high as possible.
We are thus led to define three concepts: P(A), the probability that
the standard machine enumerates the set A, which may be called the
algorithmic probability of the set A; H(A), the entropy of the set A,
which is −log₂ of P(A); and the amount of information that must be
specified to enumerate A, denoted I(A), which is the size in bits of the
smallest program for A. In other words, I(A) is the least number n such
that for some program tape contents the standard machine enumerates
the set A and in the process of doing so reads precisely n bits of the
program tape.

One may also wish to use the standard machine to simultaneously
enumerate two sets A and B, and this leads to the joint concepts
P(A, B), H(A, B), and I(A, B). In [3] programs could be concatenated,
and this fact carries over here to programs that enumerate singleton sets
(i.e. sets with a single element). What about arbitrary sets? Programs
that enumerate arbitrary sets can be merged by interweaving their bits
in the order that they are read when running at the same time, that is,
in parallel. This implies that the joint probability P(A, B) is not less
than the product of the individual probabilities P(A) and P(B), from
which it is easy to show that H has all the formal properties of the
entropy concept of classical information theory [4]. This also implies
that I(A, B) is not greater than the sum of I(A) and I(B).

The purpose of this paper is to propose this new approach and to
determine the number of sets A that have probability P(A) greater
than 2^{-n}, in other words, that have entropy H(A) less than n. It
must be emphasized that we do not present a complete theory. For
example, the relationship between H(A) and I(A) requires further
study. In [3] we proved that the difference between H(A) and I(A) is
bounded for singleton sets A, but we shall show that even for finite A
this is no longer the case.
2. Definitions and Their Elementary Properties

The formal definition of computing machine that we use is the Tur-
ing machine. However, we have made a few changes in the standard
definition [5, pp. 13–16].

Our Turing machines have three tapes: a program tape, a work tape
and an output tape. The program tape is only infinite to the right. It
can be read by the machine and it can be shifted to the left. Each
square of the program tape contains a 0 or a 1. The program tape is
initially positioned at its leftmost square. The work tape is infinite in
both directions, can be read, written and erased, and can be shifted
in either direction. Each of its squares may contain a blank, a 0, or a
1. Initially all squares are blank. The output tape is infinite in both
directions and it can be written on and shifted to the left. Each square
may contain a blank or a $. Initially all squares are blank.

A Turing machine with n states, the first of which is its initial state,
is defined in a table with 6n entries which is consulted each machine
cycle. Each entry corresponds to one of the 6 possible contents of the
2 squares being read, and to one of the n states. All entries must be
present, and each specifies an action to be performed and the next state.
There are 8 possible actions: program tape left, output tape left, work
tape left/right, write blank/0/1 on work tape, and write $ on output
tape.

Each way of filling this 6 × n table produces a different n-state
Turing machine M. We imagine M to be equipped with a clock that
starts with time 1 and advances one unit each machine cycle. We call
a unit of time a quantum. Starting at its initial state M carries out
an unending computation, in the course of which it may read all or
part of the program tape. The output from this computation is a set
of natural numbers A: n is in A iff a $ is written by M on the output
tape that is separated by exactly n blank squares from the previous $
on the tape. The time at which M outputs n is defined to be the clock
reading when two $'s separated by n blanks appear on the output tape
for the first time.

Let p be a finite binary sequence (henceforth string) or an infinite
binary sequence (henceforth sequence). M(p) denotes the set of nat-
ural numbers output (enumerated) by M with p as the contents of the
program tape if p is a sequence, and with p written at the beginning
of the program tape if p is a string. M(p) is always defined if p is a
sequence, but if p is a string and M reads beyond the end of p, then
M(p) is undefined. However, instead of saying that M(p) is undefined,
we shall say that M(p) halts. Thus for any string p, M(p) is either
defined or halts. If M(p) halts, the clock reading when M reads past
the end of p is said to be the time at which M(p) halts.

Definition.

• P_M(A) is the probability that M(p) = A if each bit of the se-
quence p is obtained by a separate toss of an unbiased coin. In
other words, P_M(A) is the probability that a program tape pro-
duced by coin flipping makes M enumerate A.

• H_M(A) = −log₂ P_M(A) (= ∞ if P_M(A) = 0).

• I_M(A) is the number of bits in the smallest string p such that
M(p) = A (= ∞ if no such p exists).
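To make these definitions concrete, here is a toy illustration (mine,
not from the paper): a miniature machine that reads program bits
until it hits a 1 and then enumerates the singleton {k}, where k is the
number of 0's it read. For it, P_M({k}) = 2^{-(k+1)} exactly, so
H_M({k}) = I_M({k}) = k + 1, and a Monte Carlo estimate agrees.

    import random

    def toy_machine(tape):
        """Read bits until the first 1; enumerate {number of 0's read}.
        `tape` is an iterator of coin-flip bits (an infinite program)."""
        k = 0
        for bit in tape:
            if bit == 1:
                return frozenset([k])
            k += 1

    def coin_tape():
        while True:
            yield random.getrandbits(1)

    trials, counts = 100_000, {}
    for _ in range(trials):
        A = toy_machine(coin_tape())
        counts[A] = counts.get(A, 0) + 1

    for k in range(4):   # P({k}) should be near 2^-(k+1)
        print(k, counts.get(frozenset([k]), 0) / trials, 2.0 ** -(k + 1))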
We now pick a particular universal Turing machine U, having the
ability to simulate any other machine, as the standard one for use
throughout this paper. U has the property that for each M there is a
string Λ_M such that for all sequences p, U(Λ_M p) = M(p), and U reads
exactly as much of p as M does. To be more precise, Λ_M = 0^g 1, where
g is the Gödel number for M. That is to say, g is the position of M in
a standard list of all possible Turing machine defining tables.

Definition.

• P(A) = P_U(A) is the algorithmic probability of the set A.

• H(A) = H_U(A) is the algorithmic entropy of the set A.
set B = all natural numbers not of the form l_k (k ∈ A). By the lemma,
each k in A can be recovered from the corresponding l_k.

Proof. Λ instructs U to act as follows on p_x in order to produce
B. U works in stages. At stage t (t = 1, 2, 3, ...), U simulates t time
quanta of the computation U′(p_x). U fakes the halting-problem oracle
used by U′ by answering that a program halts iff it takes ≤ t time
quanta to do so. While simulating U′(p_x), U notes the time t_k^1 at which
each output k occurs. U also keeps track of t_k^2, the latest stage at which
a change occurred in the chronological list of yes/no answers given by
the fake oracle during the simulation before k is output. Thus at stage
t there are current estimates for t_k^1, for t_k^2, and for t_k = max(t_k^1, t_k^2), for
each k that currently seems to be in U′(p_x). As t goes to infinity these
estimates will attain the true values for k ∈ A, and will not exist or
will go to infinity for k ∉ A.

Meanwhile U enumerates B. That part of B output by stage t
consists precisely of all natural numbers less than 2^{t+1} that are not of
the form 2^{t_k} + k, for any k in the current approximation to U′(p_x). Here
Addendum

An important advance in the line of research proposed in this paper
has been achieved by Solovay [8]; with the aid of a crucial lemma of D.
A. Martin, he shows that

    I(A) ≤ 3H(A) + O(log H(A)).

In [9] and [10] certain aspects of the questions treated in this paper are
examined from a somewhat different point of view.
References

[1] K. de Leeuw, E. F. Moore, C. E. Shannon and N. Shapiro, Com-
putability by probabilistic machines, in Automata Studies, C. E.
Shannon and J. McCarthy (Eds.), pp. 183–212. Princeton Uni-
versity Press, N.J. (1956).

[2] G. J. Chaitin, Randomness and mathematical proof, Scient. Am.
232 (5), 47–52 (May 1975).

[3] G. J. Chaitin, A theory of program size formally identical to in-
formation theory, J. Ass. Comput. Mach. 22 (3), 329–340 (July
1975).

[4] C. E. Shannon and W. Weaver, The Mathematical Theory of
Communication. University of Illinois, Urbana (1949).

[5] H. Rogers, Jr., Theory of Recursive Functions and Effective Com-
putability. McGraw-Hill, N.Y. (1967).

[6] R. M. Solovay, unpublished manuscript on [3] dated May 1975.

[7] S. K. Leung-Yan-Cheong and T. M. Cover, Some inequalities be-
tween Shannon entropy and Kolmogorov, Chaitin, and extension
complexities, Technical Report 16, Dept. of Statistics, Stanford
University, CA (October 1975).

[8] R. M. Solovay, On random r.e. sets, Proceedings of the Third Latin
American Symposium on Mathematical Logic. Campinas, Brazil
(July 1976), (to appear).

[9] G. J. Chaitin, Information-theoretic characterizations of recursive
infinite strings, Theor. Comput. Sci. 2, 45–48 (1976).

[10] G. J. Chaitin, Program size, oracles, and the jump operation,
Osaka J. Math. (to appear).
Communicated by J. T. Schwartz
Received July 1976
Part V
Technical Papers on
Blank-Endmarker Programs
INFORMATION-
THEORETIC
LIMITATIONS OF
FORMAL SYSTEMS

Journal of the ACM 21 (1974),
pp. 403–424

Gregory J. Chaitin¹
Buenos Aires, Argentina
Abstract

An attempt is made to apply information-theoretic computational com-
plexity to metamathematics. The paper studies the number of bits of
instructions that must be given to a computer for it to perform finite
and infinite tasks, and also the amount of time that it takes the com-
puter to perform these tasks. This is applied to measuring the difficulty
of proving a given set of theorems, in terms of the number of bits of
axioms that are assumed, and the size of the proofs needed to deduce
the theorems from the axioms.
Key Words and Phrases:

complexity of sets, computational complexity, difficulty of theorem-
proving, entropy of sets, formal systems, Gödel's incompleteness theo-
rem, halting problem, information content of sets, information content
of axioms, information theory, information/time trade-offs, metamath-
ematics, random strings, recursive functions, recursively enumerable
sets, size of proofs, universal computers

CR Categories:

5.21, 5.25, 5.27, 5.6
1. Introduction

This paper attempts to study information-theoretic aspects of compu-
tation in a very general setting. It is concerned with the information
that must be supplied to a computer for it to carry out finite or infinite
computational tasks, and also with the time it takes the computer to
do this. These questions, which have come to be grouped under the
heading of abstract computational complexity, are considered to be of
interest in themselves. However, the motivation for this investigation
is primarily its metamathematical applications.

Computational complexity differs from recursive function theory in
that, instead of just asking whether it is possible to compute something,
one asks exactly how much effort is needed to do this. Similarly, instead
of the usual metamathematical approach, we propose to measure the
difficulty of proving something. How many bits of axioms are needed
to be able to obtain a set of theorems? How long are the proofs needed
to demonstrate them? What is the trade-off between how much is
assumed and the size of the proofs?

¹Copyright © 1974, Association for Computing Machinery, Inc. General permis-
sion to republish, but not for profit, all or part of this material is granted provided
that ACM's copyright notice is given and that reference is made to the publica-
tion, to its date of issue, and to the fact that reprinting privileges were granted by
permission of the Association for Computing Machinery.
An early version of this paper was presented at the Courant Institute Computa-
tional Complexity Symposium, New York, October 1971. [28] includes a nontech-
nical exposition of some results of this paper. [1] and [2] announce related results.
Author's address: Rivadavia 3580, Dpto. 10A, Buenos Aires, Argentina.
We consider the axioms of a formal system to be a program for
listing the set of theorems, and the time at which a theorem is written
out to be the length of its proof.
We believe that this approach to metamathematics may yield valu-
able dividends. Mathematicians were at first greatly shocked at, and
then ignored almost completely, Gödel's announcement that no set of
axioms for number theory is complete. It wasn't clear what, in practice,
was the significance of Gödel's theorem, how it should affect the every-
day activities of mathematicians. Perhaps this was because the unprov-
able propositions appeared to be very pathological singular points.²,³

The approach of this paper, in contrast, is to measure the power of
a set of axioms, to measure the information that it contains. We shall
see that there are circumstances in which one only gets out of a set of
axioms what one puts in, and in which it is possible to reason in the
following manner. If a set of theorems constitutes t bits of information,
and a set of axioms contains less than t bits of information, then it is
impossible to deduce these theorems from these axioms.

We consider that this paper is only a first step in the direction of
such an approach to metamathematics;⁴ a great deal of work remains to
be done to clarify these matters. Nevertheless, we would like to sketch
here the conclusions which we have tentatively drawn.⁵

²In [3] and [4] von Neumann analyzes the effect of Gödel's theorem upon math-
ematicians. Weyl's reaction to Gödel's theorem is quoted by Bell [5]. The original
source is [6]. See also Weyl's discussion [7] of Gödel's views regarding his incom-
pleteness theorem.

³For nontechnical expositions of Gödel's incompleteness theorem, see [8, 9, 10,
Sec. 1, pp. xv–xviii, 11, and 12]. [28] contains a nontechnical exposition of an
incompleteness theorem analogous to Berry's paradox that is Theorem 4.1 of this
paper.

⁴[13–16] are related in approach to this paper. [13, 15, and 16] are concerned
with measuring the size of proofs and the effect of varying the axioms upon their
size. In [14] Cohen "measures the strength of [a formal] system by the ordinals
which can be handled in the system."

⁵The analysis that follows of the possible significance of the results of this paper
has been influenced by [17 and 18], in addition to the references cited in Footnote
2. Incidentally, it is interesting to examine [19, p. 112] in the light of this analysis.
After empirically exploring, in the tradition of Euler and Gauss, the
properties of the natural numbers, one may discover interesting regular-
ities. One then has two options. The first is to accept the conjectures
one has formulated on the basis of their empirical corroboration, as
an experimental scientist might do. In this way one may have a great
many laws to remember, but will not have to bother to deduce them
from other principles. The other option is to try to find a theory for
one's observations, or to see if they follow from existing theory. In this
case it may be possible to reduce a great many observations into a few
general principles from which they can be deduced. But there is a cost:
one can now only arrive at the regularities one observed by means of
long demonstrations.

Why use formal systems, instead of proceeding empirically? First
of all, if the empirically derived conjectures aren't independent facts,
reducing them to a few common principles allows one to remember
fewer assumptions, and this is easier to do, and is much safer,
as one is assuming less. The cost is, of course, the size of the proofs.

What attitude, then, does this suggest toward Gödel's theorem that
any formalization of number theory is incomplete? It tends to provide
theoretical justification for the attitude that number theorists have in
fact adopted when they extensively utilize in their work hypotheses such
as that of Riemann concerning the zeta function. Gödel's theorem does
not mean that mathematicians must give up hope of understanding the
properties of the natural numbers; it merely means that one may have
to adopt new axioms as one seeks to order and interrelate, to organize
and comprehend, ever more extensive mathematical observations. I.e.
the mathematician shouldn't be more upset than the physicist when he
needs to assume a new axiom, nor should he be too horrified when an
axiom must be abandoned because it is found that it contradicts pre-
viously existing theory, or because it predicts properties of the natural
numbers that are not corroborated empirically. In a word, we propose
that there may be theoretical justification for regarding number theory
somewhat more like a dynamic empirical science than as a closed static
body of theory.
This paper grew out of work on the concept of an individual random,
patternless, chaotic, unpredictable string of bits. This concept has been
rigorously defined in several ways, and the properties of these random
strings have been studied by several authors (see, for example, [20–28]).
Most strings are random: they have no special distinguishing features;
they are typical and hard to tell apart. But can it be proved that a
particular string is random? The answer is that about n bits of axioms
are needed to be able to prove that a particular n-bit string is random.
More precisely, the train of thought was as follows. The entropy,
or information content, or complexity, of a string is defined to be the
number of bits needed to specify it so effectively that it can be con-
structed. A random n-bit string is about n bits of information, i.e. has
complexity/entropy/information content n; there is essentially noth-
ing better to do if one wishes to specify such a string than just show it
directly. But the string consisting of 1,000,000 repetitions of the 6-bit
pattern 000101 has far less than 6,000,000 bits of complexity. We have
just specified it using far fewer bits.

What if one wishes to be able to determine each string of complexity
≤ n and its complexity? It turns out that this requires n + O(1) bits of
axioms: at least n − c bits are necessary (Theorem 4.1), and n + c bits
are sufficient (Theorem 4.3). But the proofs will be enormously long
unless one essentially directly takes as axioms all the theorems that
one wishes to prove, and in that case there will be an enormously great
number of bits of axioms (Theorem 7.6(c)).
Another theme of this paper arises from the following metamathe-
matical considerations, which are well known (see, for example, [29]).
In a formal system without a decision method, it is impossible to bound
the size of a proof of a theorem by a recursive function of the number
of characters in the statement of the theorem. For if there were such
a function f, one could decide whether or not an arbitrary proposition
p is a theorem by merely checking if a proof for it appears among the
finitely many possible proofs of size bounded by f of the number of
characters in p.

Thus, in a formal system having no decision method, there are very
profound theorems: theorems that have short statements, but need im-
mensely long proofs. In Section 10 we study the function e(n), neces-
sarily nonrecursive, defined to be the least s such that all theorems of
the formal system with ≤ n characters have proofs of size ≤ s.
To close this introduction, we would like to mention without proof
an example that shows particularly clearly the relationship between the
number of bits of axioms that are assumed and what can be deduced.
This example is based on the work of M. Davis, Ju. V. Matijasevič,
H. Putnam, and J. Robinson that settled Hilbert's tenth problem (cf.
[30]). There is a polynomial P in k + 2 variables with integer coefficients
that has the following property. Consider the infinite string whose ith
bit is 1 or 0 depending on whether or not the set

    S_i = {n ∈ N : ∃x_1, ..., x_k ∈ N, P(i, n, x_1, ..., x_k) = 0}

is infinite. Here N denotes the natural numbers. This infinite binary
sequence is random, i.e. the complexity of an initial segment is asymp-
totic to its length. What is the number of bits of axioms that is needed
to be able to prove for each natural number i < n whether or not the
set S_i is infinite? By using the methods of Section 4, it is easy to see
that the number of bits of axioms that is needed is asymptotic to n.
2. Definitions Related to Computers and Complexity

This paper is concerned with measuring the difficulty of computing
finite and infinite sets of binary strings. The binary strings are con-
sidered to be ordered in the following fashion: Λ, 0, 1, 00, 01, 10, 11,
000, 001, 010, 011, 100, 101, 110, 111, 0000, ... In order to be able to
also study the difficulty of computing finite or infinite sets of natural
numbers, we consider each binary string to simultaneously be a natural
number: the nth binary string corresponds to the natural number n.
Ordinal numbers are considered to start with 0, not 1. For example,
we speak of the 0th string of length n.

In order to be able to study the difficulty of computing finite and
infinite sets of mathematical propositions, we also consider that each
binary string is simultaneously a proposition. Propositions use a finite
alphabet of characters which we suppose includes all the usual math-
ematical symbols. We consider the nth binary string to correspond to
the nth proposition, where the propositions are in lexicographical order
defined by an arbitrary ordering of the symbols of their alphabet.

Henceforth, we say "string" instead of "binary string," it being un-
derstood that this refers to a binary string. It should be clear from the
context whether we are considering something to be a string, a natural
number, or a proposition.

Operations with strings include exponentiation: 0^k and 1^k denote
the string of k 0's and k 1's, respectively. lg(s) denotes the length of a
string s. Note that the length lg(n) of a natural number n is therefore
⌊log₂(n + 1)⌋. The maximum element of a finite set of strings S is
denoted by max S, and we stipulate that max ∅ = 0. #(S) denotes the
number of elements in a finite set S.

We use these notational conventions in a somewhat tricky way to
indicate how to compactly code several pieces of information into a
single string. Two coding techniques are used.

(a) Consider two natural numbers n and k such that 0 ≤ k < 2^n.
We code n and k into the string s = 0^n + k, i.e. the kth string
of length n. Given the string s, one recovers n and k as follows:
n = lg(s), k = s − 0^{lg(s)}. This technique is used in the proofs
of Theorems 4.3, 6.1, 7.4, and 10.1. In three of these proofs k
is #(S), where S is a subset of the strings having length < n; n
and #(S) are coded into the string s = 0^n + #(S). In the case
of Theorem 6.1, k is the number that corresponds to a string s
of length < n (thus 0 ≤ k < 2^n − 1); n and s are coded into the
string s′ = 0^n + s.

(b) Consider a string p and a natural number k. We code p and k
into the string s = 0^{lg(k)} 1 k p, i.e. the string consisting of lg(k) 0's
followed by a 1, followed by the kth string, followed by the string
p. The length of the initial run of 0's is the same as the length of
the kth string and is used to separate kp in two and recover k and
p from s. Note that lg(s) = lg(p) + 2 lg(k) + 1. This technique is
used in the proof of Theorem 10.4. The proof of Theorem 4.1 uses
a simpler technique: p and k are coded into the string s = 0^k 1 p.
But this coding is less economical, for lg(s) = lg(p) + k + 1.
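Both coding techniques are straightforward to implement once the
string/number correspondence is fixed; the nth binary string is just
n + 1 written in binary with its leading 1 removed. A sketch (mine;
the paper gives no code):

    def string_of(n):           # nth binary string (n = 0 -> empty)
        return bin(n + 1)[3:]

    def number_of(s):           # inverse correspondence
        return int("1" + s, 2) - 1

    # Technique (a): code n and k (0 <= k < 2^n) as the kth string
    # of length n, i.e. the string s = 0^n + k.
    def code_a(n, k):
        return format(k, "b").zfill(n)

    def decode_a(s):
        return len(s), int(s or "0", 2)     # n = lg(s), k = s - 0^lg(s)

    # Technique (b): code string p and number k as 0^lg(k) 1 <kth string> p.
    def code_b(p, k):
        ks = string_of(k)
        return "0" * len(ks) + "1" + ks + p

    def decode_b(s):
        run = len(s) - len(s.lstrip("0"))   # length of initial 0-run
        return s[2 * run + 1:], number_of(s[run + 1 : 2 * run + 1])

    assert decode_a(code_a(5, 19)) == (5, 19)
    assert decode_b(code_b("1101", 6)) == ("1101", 6)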
We use an extremely general definition of computer; this has the
advantage that if one can show that something is difficult to compute
using any such computer, this will be a very strong result. A computer
is defined by indicating whether it has halted and what it has output,
as a function of its program and the time. The formal definition of a
computer C is an ordered pair ⟨C, H_C⟩ consisting of two total recursive
functions

    C : X × N → {S ∈ 2^X : S is finite}
3. De
nitions Related to Formal Systems
This paper deals with the information and time needed to carry out
computations. However, we wish to apply these results to formal sys-
tems. This section explains how this is done.
The abstract denition used by Post that a formal system is an r.e.
set of propositions is close to the viewpoint of this paper (see 31]).6
6For standard denitions of formal systems, see, for example, 32{34] and 10, p.
117].
302 Part V|Technical Papers on Blank-Endmarker Programs
However, we are not quite this unconcerned with the internal details of
formal systems.
The historical motivation for formal systems was of course to con-
struct deductive theories with completely objective, formal criteria for
the validity of a demonstration. Thus, a fundamental characteristic of a
formal system is an algorithm for checking the validity of proofs. From
the existence of this proof verication algorithm, it follows that the set
of all theorems that can be deduced from the axioms p by means of
the rules of inference by proofs t characters in length is given by a
total recursive function C of p and t. To calculate C (p t) one applies
the proof verication algorithm to each of the nitely many possible
demonstrations having t characters.
These considerations motivate the following denition. The rules
of inference of a class of formal systems is a total recursive function
C : X
N ! fS 2 2X jS is niteg with the property that C (p t)
C (p t + 1). The value of C (p t) is the nite (possibly empty) set of the
theorems that can be proven from the axioms p by means of proofs S t
in size. Here p is a string and t is a natural number. C (p) = t C (p t)
is the set of theorems that are consequences of the axioms p. The
ordered pair hC pi, which implies both the choice of rules of inference
and axioms, is a particular formal system.
Note that this denition is the same as the denition of a computer
with the notion of \halting" omitted. Thus given any rules of inference,
there is a computer that never halts whose output up to time t consists
precisely of those propositions that can be deduced by proofs of size
t from the axioms the computer is given as its program. And given
any computer, there are rules of inference such that the set of theorems
that can be deduced by proofs of size t from the program, is precisely
the set of strings output by the computer up to time t. For this reason
we consider the following notions to be synonymous: \computer" and
\rules of inference," \program" and \axioms," and \output up to time
t" and \theorems with proofs of size t."
The rules of inference that correspond to the universal computer
U are especially interesting, because they permit axioms to be very
economical. When using the rules of inference U, the number of bits
of axioms needed to deduce a given set of propositions is precisely the
e-complexity of the set of propositions. If n bits of axioms are needed to
obtain a set T of theorems using the rules of inference U, then at least
n − sim(C) bits of axioms are needed to obtain them using the rules
of inference C; i.e. if C(a) = T, then lg(a) ≥ Ie(T) − sim(C). Thus it
could be said that U is among the rules of inference that permit axioms
to be most economical. In Section 4 we are interested exclusively in
the number of bits needed to deduce certain sets of propositions, not
in the size of the proofs. We shall therefore only consider the rules of
inference U in Section 4, i.e. formal systems of the form ⟨U, p⟩.
As a final comment regarding the rules of inference U, we would
like to point out the interesting fact that if these rules of inference are
used, then a minimal set of axioms for obtaining a given set of theorems
must necessarily be random. This is just another way of saying that a
minimal e-description is a highly random string, which was mentioned
at the end of Section 2.
The following theorem also plays a role in the interpretation of our
results in terms of formal systems.

Theorem 3.1.
Let f be a recursive function, and g be a recursive predicate.

(a) Let C be a computer. There is a computer C′ that never halts
such that C′(p, t) = {f(s) | s ∈ C(p, t) & g(s)} for all p and t.

(b) There is a c such that Ie({f(s) | s ∈ S & g(s)}) ≤ Ie(S) + c for all
r.e. sets S.

Proof. (a) is immediate; (b) follows by taking C = U in part (a).
Q.E.D.
The following is an example of the use of Theorem 3.1. Suppose we
wish to study the size of the proofs that "n ∈ H" in a formal system
⟨C, p⟩, where n is a numeral for a natural number. If we have a result
concerning the speed with which any computer can enumerate the set
H, we apply this result to the computer C′ that has the property that
n ∈ C′(p, t) iff "n ∈ H" ∈ C(p, t) for all n, p, and t. In this case
the predicate g selects those strings that are propositions of the form
"n ∈ H," and the function f transforms "n ∈ H" to n.

Here is another kind of example. Suppose there is a computer C
that enumerates a set H very quickly. Then there is a computer C′
that enumerates propositions of the form "n ∈ H" just as quickly. In
this case the predicate g is taken to be always true, and the function f
transforms n to "n ∈ H."
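The construction behind Theorem 3.1(a) is just a filter-and-map pass over the enumeration; here is a minimal sketch with the computer's output stream represented as a Python generator (the names are mine):

    def transform(enumerate_C, f, g):
        # yields the output stream of C' with
        # C'(p, t) = { f(s) : s in C(p, t) and g(s) }
        def enumerate_C_prime(p):
            for s in enumerate_C(p):
                if g(s):
                    yield f(s)
        return enumerate_C_prime

    # second example in the text: from an enumeration of H, enumerate
    # the propositions "n is in H"
    def H(p):                 # toy enumerator of the even numbers
        n = 0
        while True:
            yield n
            n += 2

    props = transform(H, lambda n: "%d is in H" % n, lambda s: True)(None)
    assert next(props) == "0 is in H" and next(props) == "2 is in H"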
(e) Let Tn be the set of all true propositions of the form "s ∉ P"
with lg(s) ≤ n. Ie(Tn) = n + O(1). In other words, a formal system
⟨U, p⟩ whose theorems consist precisely of all true propositions of the
form "s ∉ P" with lg(s) ≤ n requires n + O(1) bits of axioms; i.e.
n − c bits are necessary and n + c bits are sufficient to obtain this set
of theorems.
Proof. (a) This is an immediate consequence of the fact that the
set of all true propositions of the form "I(s) ≤ n" is r.e.

(b) We must show that for each n there is a string of length n whose
complexity is greater than or equal to its length. There are 2^n strings
of length n. As there are exactly 2^n − 1 programs of length < n, there
are < 2^n strings of complexity < n. Thus at least one string of length
n must be of complexity ≥ n.
(c) Consider the computer C that does the following when it is
given the program p. It simulates running p on U. As C generates
U(p), it examines each string in it to see if it is a proposition of the
form "s ∉ P," where s is a string of length ≥ 1. If it is, C outputs the
proposition "I(s) > n" where n = lg(s) − 1.

If p satisfies the hypothesis, i.e. "s ∉ P" is in U(p) only if it is
true, then C(p) enumerates true propositions of the form "I(s) > n"
with n = lg(s) − 1. It follows by Theorem 4.1 that n must be <
Ie(C(p)) + c′ ≤ lg(p) + sim(C) + c′. Thus lg(s) − 1 < lg(p) + sim(C) + c′,
and part (c) of the theorem is proved with c = sim(C) + c′ + 1.
(d) Consider the computer C that does the following when it is given
the program p. It simulates running p on U. As C generates U(p), it
takes each string s in U(p), and outputs the proposition "s ∉ P."

Suppose S contains no string in P. Let p be a minimal e-description
of S, i.e. U(p) = S and lg(p) = Ie(S). Then C(p) enumerates true
propositions of the form "s ∉ P" with s ∈ S. By part (c) of this
theorem,

lg(s) < Ie(C(p)) + c′ ≤ lg(p) + sim(C) + c′ = Ie(S) + sim(C) + c′.

Part (d) of the theorem is proved with c = sim(C) + c′.
(e) That Ie(Tn) ≥ n − c follows from part (c) of this theorem. The
proof that Ie(Tn) ≤ n + c is obtained by changing the definition of the
computer C in the proof of Theorem 4.3 in the following manner. After
C has determined each string of complexity ≤ n and its complexity,
C determines each string s of complexity ≤ n whose complexity is
greater than or equal to its length, and then C outputs each such s in
a proposition of the form "s ∉ P." Q.E.D.
Theorem 4.6. (a) There is a c such that for all programs p, if a
proposition of the form "Ie(U(s)) > n" (s a string, n a natural number)
is in U(p) only if Ie(U(s)) > n, then "Ie(U(s)) > n" is in U(p) only if
n < lg(p) + c.

In other words: (b) There is a c such that for all formal systems
⟨U, p⟩, if "Ie(U(s)) > n" is a theorem of ⟨U, p⟩ only if it is true, then
"Ie(U(s)) > n" is a theorem of ⟨U, p⟩ only if n < lg(p) + c.

For any r.e. set of propositions T, one obtains the following from
(a) by taking p to be a minimal e-description of T: (c) If T has the
property that "Ie(U(s)) > n" is in T only if Ie(U(s)) > n, then T has
the property that "Ie(U(s)) > n" is in T only if n < Ie(T) + c.
Proof. By Theorem 2.1(c), there is a c′ such that Ie(U(s)) > n
implies I(s) > n − c′.

Consider the computer C that does the following when it is given the
program p. It simulates running p on U. As C generates U(p), it checks
each string in it to see if it is a proposition of the form "Ie(U(s)) > n"
with s a string and n a natural number. Each time it finds such a
proposition in which n ≥ c′, C outputs the proposition "I(s) > m"
where m = n − c′ ≥ 0.

If p satisfies the hypothesis of the theorem, then C(p) enumerates
true propositions of the form "I(s) > m." "I(s) > m" (m = n − c′ ≥ 0)
is in C(p) iff "Ie(U(s)) > n" (n ≥ c′) is in U(p). By Theorem 4.1,
"I(s) > m" is in C(p) only if

m < Ie(C(p)) + c″ ≤ lg(p) + sim(C) + c″.

Thus "Ie(U(s)) > n" (n ≥ c′) is in U(p) only if n − c′ < lg(p) + sim(C) +
c″. The theorem is proved with c = sim(C) + c″ + c′. Q.E.D.
7.2

Now we begin the formal exposition, which is couched exclusively in
terms of computers.

In this section we study the set K(n) consisting of all strings of com-
plexity ≤ n. This set turns out to be extremely difficult to calculate,
or even to enumerate a superset of: either the program or the time
needed must be extremely large. In order to measure this difficulty, we
will first measure the resources needed to output a(n).
Definition 7.1. K(n) = {s | I(s) ≤ n}. Note that this set may be
empty, and #(K(n)) isn't greater than 2^(n+1) − 1, inasmuch as there are
exactly 2^(n+1) − 1 programs of length ≤ n.
We shall show that a(n) and the resources required to calcu-
late/enumerate K(n) grow equally quickly. What do we mean by the
resources required to calculate a finite set, or to enumerate a superset
of it? It is assumed that the computer C is being used to do this.

Definition 7.2. Let S be a finite set of strings. r(S), the resources
required to calculate S, is the least r such that there is a program p
of length ≤ r having the property that C(p, r) = S and is halted. If
there is no such r, r(S) is undefined. re(S), the resources required to
enumerate a superset of S, is the least r such that there is a program p
of length ≤ r with the property that S ⊆ C(p, r). If there is no such r,
re(S) is undefined. We abbreviate r({s}) and re({s}) as r(s) and re(s).
We shall find very useful the notion of the set of all output produced
by the computer C with information and time resources limited to r.
We denote this by C_r.

Definition 7.3. C_r = ∪ C(p, r) (lg(p) ≤ r).
We now list for future reference basic properties of these concepts.

Theorem 7.0.

(a) a(n) = max K(n) if K(n) ≠ ∅, and is undefined if K(n) = ∅.

(b) K(n) ≠ ∅, and a(n) is defined, iff n ≥ n*. Here n* = min I(s),
where the minimum is taken over all strings s.

(c) For all r, C_r ⊆ C_(r+1).

In (d) to (k), S and S′ are arbitrary finite sets of strings.

(d) S ⊆ C_(re(S)) if re(S) is defined.
10.2

Now we begin the formal exposition, which is couched exclusively in
terms of computers.

Consider an r.e. set of strings R and a particular computer C and
program p* such that C(p*) = R. How quickly is R enumerated? That is,
what is the time e(n) that it takes to output all elements of R of length ≤ n?

Definition 10.1. R_n = {s ∈ R | lg(s) ≤ n}. e(n) = the least t such
that R_n ⊆ C(p*, t).

We shall see that the rate of growth of the total function e(n) can
be related to the growth of the complexity of R_n. In this way we shall
show that some r.e. sets R are the most difficult to enumerate, i.e. take
the most time.
Theorem 10.1.^8 There is a c such that for all n, I(R_n) ≤ n + c.

^8 This theorem, with a different proof, is due to Loveland [37, p. 64].

Proof. 0 ≤ #(R_n) ≤ 2^(n+1) − 1, for there are precisely 2^(n+1) − 1
strings of length ≤ n. Consider p, the #(R_n)-th string of length n + 1;
i.e. p = 0^(n+1) + #(R_n). This string has both n (= lg(p) − 1) and
#(R_n) (= p − 0^(lg(p))) coded into it. When this string p is its program,
the computer C′ generates the r.e. set R by simulating C(p*), until it
has found #(R_n) strings of length ≤ n in R. C′ then outputs this set
of strings, which is R_n, and halts. Thus I(R_n) ≤ lg(p) + sim(C′) =
n + 1 + sim(C′). Q.E.D.
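Under the identification of strings with numbers used here (the strings in the order Λ, 0, 1, 00, 01, ..., so that 0^(n+1) is the first string of length n + 1), the program p packs n and #(R_n) into a single string. A small Python sketch of the packing (helper names are mine):

    def number_to_string(x: int) -> str:
        # the xth binary string: write x + 1 in binary, drop the leading 1
        return bin(x + 1)[3:]

    def string_to_number(s: str) -> int:
        return int("1" + s, 2) - 1

    def pack(n: int, count: int) -> str:
        # p = 0^(n+1) + count : the count-th string of length n + 1
        assert 0 <= count < 2 ** (n + 1)
        return number_to_string(string_to_number("0" * (n + 1)) + count)

    def unpack(p: str):
        n = len(p) - 1                      # n = lg(p) - 1
        return n, string_to_number(p) - string_to_number("0" * (n + 1))

    assert unpack(pack(4, 13)) == (4, 13)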
Theorem 10.2.

(a) There is a c such that for all n, e(n) ≤ a(I(R_n) + c).

(b) e ≾ a.

Proof. (a) Consider the computer C′ that does the following. Given
a description p of R_n as its program, the computer C′ first simulates
running p on U in order to determine R_n. Then it simulates C(p*, t)
for t = 0, 1, 2, ... until R_n ⊆ C(p*, t). C′ then outputs the final value
of t, which is e(n), and halts.

This shows that sim(C′) bits need be added to the length of a
description of R_n to bound the length of a description of e(n); i.e. if
U(p) = R_n and halts, then C′(p) = {e(n)} and halts, and thus I(e(n)) ≤
lg(p) + sim(C′). Taking p to be a minimal description of R_n, we have
lg(p) = I(R_n), and thus I(e(n)) ≤ I(R_n) + sim(C′). By Theorem
5.1(c), this gives us e(n) ≤ a(I(R_n) + sim(C′)). Part (a) of the theorem
is proved with c = sim(C′).

(b) By part (a) of this theorem, e(n) ≤ a(I(R_n) + c). And by
Theorem 10.1, I(R_n) ≤ n + c′ for all n. Applying Theorem 5.1(b), we
obtain e(n) ≤ a(I(R_n) + c) ≤ a(n + c′ + c) for all n. Thus e ≾ a. Q.E.D.
Theorem 10.3. If a ≾ e, then there is a c such that I(R_n) ≥ n − c
for all n.

Proof. By Theorem 7.0(b) and the definition of ≾, if a ≾ e, then
there is a c₀ such that for all n ≥ n*, a(n) ≤ e(n + c₀). And by Theorem
10.2(a), there is a c₁ such that e(n + c₀) ≤ a(I(R_(n+c₀)) + c₁) for all n.
We conclude that for all n ≥ n*, a(n) ≤ a(I(R_(n+c₀)) + c₁).

By Theorems 6.5 and 5.1(b), there is a c₂ such that if a(m) is defined
and m ≤ n − c₂, then a(m) < a(n). As we have shown in the first
paragraph of this proof that for all n ≥ n*, a(n) ≤ a(I(R_(n+c₀)) + c₁), it
follows that I(R_(n+c₀)) > n − c₁ − c₂.
In other words, for all n ≥ n*, I(R_(n+c₀)) > (n + c₀) − c₀ − c₁ − c₂. And
thus for all n, I(R_n) ≥ n − c₀ − c₁ − c₂ − M, where M = max_(n<n*+c₀) (n − I(R_n)).
and t₀ is the greatest natural number such that if t′ < t₀ then P_i
applied to ⟨p, t′⟩ yields an output in ≤ t steps.
The universal computer U that we have just defined is, in fact,
effectively universal: to simulate the computation that C_i performs
when it is given the program p, one gives U the program p′ = 0^i 1 p,
and thus p′ can be obtained from p in an effective manner. Our second
example of a universal computer, U′, is not effectively universal, i.e.
there is no effective procedure for obtaining p′ from p.^10
U′ is defined as follows:

U′(Λ, t) = ∅ and is halted,
U′(0p, t) = U(p, t) − {1} and is halted iff U(p, t) is, and
U′(1p, t) = U(p, t) ∪ {1} and is halted iff U(p, t) is.

I.e. U′ is almost identical to U, except that it eliminates the string 1
from the output, or forces the string 1 to be included in the output,
depending on whether the first bit of its program is 0 or not. It is
easy to see that U′ cannot be effectively universal. If it were, given
any program p for U, by examining the first bit of the program p′ for
U′ that simulates it, one could decide whether or not the string 1 is in
U(p). But there cannot be an effective procedure for deciding, given
any p, whether or not the string 1 is in U(p).
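The wrapper U′ is easy to sketch in code; the point of the construction is that the first program bit settles, once and for all, whether the string 1 appears in the output (everything below is my own illustration, with an arbitrary toy U):

    def U_prime(U):
        # U'(q, t) from U(p, t): 0p removes the string "1" from the
        # output, 1p forces it in, and the empty program outputs nothing
        def run(q, t):
            if q == "":
                return set()
            out = set(U(q[1:], t))
            return out - {"1"} if q[0] == "0" else out | {"1"}
        return run

    # toy U: program p outputs its own prefixes up to time t
    U = lambda p, t: {p[:i] for i in range(min(t, len(p)) + 1)}
    V = U_prime(U)
    assert "1" not in V("0" + "11", 5) and "1" in V("1" + "00", 5)

Deciding which first bit a simulating U′-program must carry would amount to deciding whether the string 1 is in U(p), which is undecidable; hence U′ is universal but not effectively so.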
Added in Proof

The following additional references have come to our attention.

Part of Gödel's analysis of Cantor's continuum problem [39] is highly
relevant to the philosophical considerations of Section 1. Cf. especially
[39, pp. 265, 272].

Schwartz [40, pp. 26-28] first reformulates our Theorem 4.1 using
the hypothesis that the formal system in question is a consistent ex-
tension of arithmetic. He then considerably extends Theorem 4.1 [40,
pp. 32-34]. The following is a paraphrase of these pages.

Consider a recursive function f : N → N that grows very quickly,
say f(n) = n!!!!!!!!!!. A string s is said to have property f if the fact
^10 The definition of U′ is an adaptation of [38, p. 42, Exercise 2-11].
that p is a description of {s} either implies that lg(p) ≥ lg(s) or that
U(p) halts at time > f(lg(s)). Clearly a 1000-bit string with property f
is very difficult to calculate. Nevertheless, a counting argument shows
that there are strings of all lengths with property f, and they can be
found in an effective manner [40, Lemma 7, p. 32]. In fact, the first
string of length n with property f is given by a recursive function of
n, and is therefore of complexity ≤ log₂ n + c. This is thus an example
of an extreme trade-off between program size and the length of com-
putation. Furthermore, an argument analogous to the demonstration
of Theorem 4.1 shows that proofs that specific strings have property f
must necessarily be extremely tedious (if some natural hypotheses con-
cerning U and the formal system in question are satisfied) [40, Theorem
8, pp. 33-34].
[41, Item 2, pp. 12-20] sheds light on the significance of these results.
Cf. especially the first unitalicized paragraphs of answers numbers 4 and
8 to the question "What is programming?" [41, pp. 13, 15-16]. Cf. also
[40, Appendix, pp. 63-69].
Index of Symbols

Section 2: lg(s), max S, #(S), X, N, C(p, t), C(p), U, sim(C), I(S), Ie(S)
Section 3: ⟨C, p⟩, ⟨U, p⟩
Section 4: H, P
Section 5: a(n)
Section 6: b(n)
Section 7: K(n), r(S), re(S), C_r, n*
Section 8: dC(n)
Section 10: R_n, e(n)
References

[1] Chaitin, G. J. Information-theoretic aspects of Post's construction of a simple set. On the difficulty of generating all binary strings of complexity less than n. (Abstracts.) AMS Notices 19 (1972), pp. A-712, A-764.

[2] Chaitin, G. J. On the greatest natural number of definitional or information complexity ≤ n. There are few minimal descriptions. (Abstracts.) Recursive Function Theory: Newsletter, no. 4 (1973), pp. 11-14, Dep. of Math., U. of California, Berkeley.

[3] von Neumann, J. Method in the physical sciences. In J. von Neumann: Collected Works, Vol. VI, A. H. Taub, Ed., MacMillan, New York, 1963, No. 36, pp. 491-498.

[4] von Neumann, J. The mathematician. In The World of Mathematics, Vol. 4, J. R. Newman, Ed., Simon and Schuster, New York, 1956, pp. 2053-2063.

[5] Bell, E. T. Mathematics: Queen and Servant of Science. McGraw-Hill, New York, 1951, pp. 414-415.

[6] Weyl, H. Mathematics and logic. Amer. Math. Mon. 53 (1946), 1-13.

[7] Weyl, H. Philosophy of Mathematics and Natural Science. Princeton U. Press, Princeton, N.J., 1949, pp. 234-235.

[8] Turing, A. M. Solvable and unsolvable problems. In Science News, no. 31 (1954), A. W. Heaslett, Ed., Penguin Books, Harmondsworth, Middlesex, England, pp. 7-23.

[9] Nagel, E., and Newman, J. R. Gödel's Proof. Routledge & Kegan Paul, London, 1959.

[10] Davis, M. Computability and Unsolvability. McGraw-Hill, New York, 1958.

[11] Quine, W. V. Paradox. Scientific American 206, 4 (April 1962), 84-96.

[12] Kleene, S. C. Mathematical Logic. Wiley, New York, 1968, Ch. V, pp. 223-282.

[13] Gödel, K. On the length of proofs. In The Undecidable, M. Davis, Ed., Raven Press, Hewlett, N.Y., 1965, pp. 82-83.

[14] Cohen, P. J. Set Theory and the Continuum Hypothesis. Benjamin, New York, 1966, p. 45.

[15] Arbib, M. A. Speed-up theorems and incompleteness theorems. In Automata Theory, E. R. Caianiello, Ed., Academic Press, New York, 1966, pp. 6-24.
[16] Ehrenfeucht, A., and Mycielski, J. Abbreviating proofs by adding new axioms. AMS Bull. 77 (1971), 366-367.

[17] Polya, G. Heuristic reasoning in the theory of numbers. Amer. Math. Mon. 66 (1959), 375-384.

[18] Einstein, A. Remarks on Bertrand Russell's theory of knowledge. In The Philosophy of Bertrand Russell, P. A. Schilpp, Ed., Northwestern U., Evanston, Ill., 1944, pp. 277-291.

[19] Hawkins, D. Mathematical sieves. Scientific American 199, 6 (Dec. 1958), 105-112.

[20] Kolmogorov, A. N. Logical basis for information theory and probability theory. IEEE Trans. IT-14 (1968), 662-664.

[21] Martin-Löf, P. Algorithms and randomness. Rev. of Internat. Statist. Inst. 37 (1969), 265-272.

[22] Loveland, D. W. A variant of the Kolmogorov concept of complexity. Inform. and Contr. 15 (1969), 510-526.

[23] Chaitin, G. J. On the difficulty of computations. IEEE Trans. IT-16 (1970), 5-9.

[24] Willis, D. G. Computational complexity and probability constructions. J. ACM 17, 2 (April 1970), 241-259.

[25] Zvonkin, A. K., and Levin, L. A. The complexity of finite objects and the development of the concepts of information and randomness by means of the theory of algorithms. Russian Math. Surveys 25, 6 (Nov.-Dec. 1970), 83-124.

[26] Schnorr, C. P. Zufälligkeit und Wahrscheinlichkeit: Eine algorithmische Begründung der Wahrscheinlichkeitstheorie. Springer, Berlin, 1971.

[27] Fine, T. L. Theories of Probability: An Examination of Foundations. Academic Press, New York, 1973.
[28] Chaitin, G. J. Information-theoretic computational complexity. IEEE Trans. IT-20 (1974), 10-15.

[29] DeLong, H. A Profile of Mathematical Logic. Addison-Wesley, Reading, Mass., 1970, Sec. 28.2, pp. 208-209.

[30] Davis, M. Hilbert's tenth problem is unsolvable. Amer. Math. Mon. 80 (1973), 233-269.

[31] Post, E. Recursively enumerable sets of positive integers and their decision problems. In The Undecidable, M. Davis, Ed., Raven Press, Hewlett, N.Y., 1965, pp. 305-307.

[32] Minsky, M. L. Computation: Finite and Infinite Machines. Prentice-Hall, Englewood Cliffs, N.J., 1967, Sec. 12.2-12.5, pp. 222-232.

[33] Shoenfield, J. R. Mathematical Logic. Addison-Wesley, Reading, Mass., 1967, Sec. 1.2, pp. 2-6.

[34] Mendelson, E. Introduction to Mathematical Logic. Van Nostrand Reinhold, New York, 1964, pp. 29-30.

[35] Russell, B. Mathematical logic as based on the theory of types. In From Frege to Gödel, J. van Heijenoort, Ed., Harvard U. Press, Cambridge, Mass., 1967, pp. 150-182.

[36] Lin, S., and Rado, T. Computer studies of Turing machine problems. J. ACM 12, 2 (April 1965), 196-212.

[37] Loveland, D. W. On minimal-program complexity measures. Conf. Rec. of the ACM Symposium on Theory of Computing, Marina del Rey, California, May 1969, pp. 61-65.

[38] Rogers, H. Theory of Recursive Functions and Effective Computability. McGraw-Hill, New York, 1967.

[39] Gödel, K. What is Cantor's continuum problem? In Philosophy of Mathematics, Benacerraf, P., and Putnam, H., Eds., Prentice-Hall, Englewood Cliffs, N.J., 1964, pp. 258-273.

[40] Schwartz, J. T. A short survey of computational complexity theory. Notes, Courant Institute of Mathematical Sciences, NYU, New York, 1972.

[41] Schwartz, J. T. On Programming: An Interim Report on the SETL Project. Installment I: Generalities. Lecture Notes, Courant Institute of Mathematical Sciences, NYU, New York, 1973.

[42] Chaitin, G. J. A theory of program size formally identical to information theory. Res. Rep. RC4805, IBM Res. Center, Yorktown Heights, N.Y., 1974.
Gregory J. Chaitin
IBM Thomas J. Watson Research Center
Jacob T. Schwartz1
Courant Institute of Mathematical Sciences
Abstract

Solovay and Strassen, and Miller and Rabin have discovered fast al-
gorithms for testing primality which use coin-flipping and whose con-
clusions are only probably correct. On the other hand, algorithmic in-
formation theory provides a precise mathematical definition of the no-
tion of random or patternless sequence. In this paper we shall describe
conditions under which, if the sequence of coin tosses in the Solovay-
Strassen and Miller-Rabin algorithms is replaced by a sequence of heads
and tails that is of maximal algorithmic information content, i.e., has
maximal algorithmic randomness, then one obtains an error-free test
for primality. These results are only of theoretical interest, since it
is a manifestation of the Gödel incompleteness phenomenon that it is
impossible to "certify" a sequence to be random by means of a proof,
even though most sequences have this property. Thus by using certi-
fied random sequences one can in principle, but not in practice, convert
probabilistic tests for primality into deterministic ones.
Hence

|{s of length j(j + 2c) : there is a composite J of length j with Z(s, J)}| ≤ 2^(j(j+2c)+1−2c).   (1)

Since any member s of the set S appearing in (1) can be calculated
uniquely if we are given c and the ordinal number of the position of s
in S expressed as a j(j + 2c) + 1 − 2c bit string, it follows that

I(s) ≤ j(j + 2c) + 1 − 2c + 2I(c) + O(1) ≤ j(j + 2c) − 2c + O(log c).

(The coefficient 2 in the term 2I(c) is present because when two strings
are encoded into a single one by concatenating them, it is necessary to
add information indicating where to separate them. The most straight-
forward technique for providing punctuation doubles the length of the
shorter string.) Hence if c is sufficiently large, no c-random j(j + 2c)
bit string can belong to S.
Proof of Theorem 2. Arguing as in the proof of Theorem 1, let J
be a non-prime integer j bits long such that I(J) ≤ i. By Lemma 1,

|{s of length 2j(i + c) : Z(s, J)}| ≤ 2^(2j(i+c)+1−2(i+c)).   (1′)

Since any member s of the set S′ appearing in (1′) can be calculated
uniquely if we are given J and the ordinal number of the position of s
in S′ expressed as a 2j(i + c) + 1 − 2(i + c) bit string, it follows that

I(s) ≤ 2j(i + c) + 1 − 2(i + c) + 2I(J) + O(1) ≤ 2j(i + c) − 2c + O(1).

(The coefficient 2 in the term 2I(J) is present for the same reason as in
the proof of Theorem 1.) Hence if c is sufficiently large, no c-random
2j(i + c) bit sequence can belong to S′.
3. Applications of the Foregoing Results

Let s be a probabilistically determined sequence in which 0's and 1's
appear independently with probabilities p and 1 − p, where 0 < p < 1.
Group s into successive pairs of bits, and then drop all 00 and 11 pairs
and convert each 01 (respectively 10) pair into a 0 (respectively, a 1).
This gives a sequence s′ in which 0's and 1's appear independently with
exactly equal probabilities. If s′ is n bits long, then the probability that
I(s′) < n − c is less than 2^(−c); thus c-random sequences can be derived
easily from probabilistic experiments. Theorem 2 gives the number of
potential witnesses of compositeness which must be checked to ensure
that primality for numbers of special form is determined correctly with
high probability (or with certainty, if some oracle gave us a long bit
string known to satisfy the randomness criterion of algorithmic infor-
mation theory). Mersenne numbers N = 2^n − 1 only require checking
O(log n) = O(log log N) potential witnesses. Fermat numbers
N = 2^(2^n) + 1
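The pair-grouping trick described at the start of this section (often credited to von Neumann) is easy to state in code; a minimal Python sketch:

    import random

    def unbias(bits):
        # group into pairs, drop 00 and 11, map 01 -> 0 and 10 -> 1;
        # for an i.i.d. biased source the output bits are unbiased
        out = []
        for a, b in zip(bits[::2], bits[1::2]):
            if a != b:
                out.append(0 if (a, b) == (0, 1) else 1)
        return out

    # a source with p = 3/4 still yields output with frequency near 1/2:
    src = [1 if random.random() < 0.75 else 0 for _ in range(10000)]
    out = unbias(src)
    print(sum(out) / len(out))   # close to 0.5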
4. Additional Remarks

The central idea of the Solovay-Strassen and Miller-Rabin algorithms
and of the preceding discussion can be expressed as follows: Consider
a specific propositional formula F in n variables for which we somehow
know that the percentage of satisfiability is greater than 75% or less
than 25%. We wish to decide which of these two possibilities is in fact
the case. The obvious way of deciding is to evaluate F at all 2^n possible
n-tuples of the variables. But only O(I(F)) data points are necessary
to decide which case holds by sampling, if one possesses an algorithmically
random sequence O(nI(F)) bits long. Thus one need only evaluate F
for O(I(F)) n-tuples of its variables, if the random sample is "certified."
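The sampling step itself is elementary; what the paragraph above emphasizes is that a certified random sample of O(nI(F)) bits suffices. The following toy Python sketch (my names, using pseudo-random rather than certified bits) just illustrates deciding between the two cases by sampling:

    import random

    def majority_case(F, n, samples):
        # F is a predicate on n-tuples of bits, known to be satisfied by
        # more than 75% or fewer than 25% of the 2^n assignments
        hits = sum(F(tuple(random.randint(0, 1) for _ in range(n)))
                   for _ in range(samples))
        return "more than 75%" if 2 * hits > samples else "less than 25%"

    # toy formula on 8 variables, true on 7/8 of all assignments:
    F = lambda xs: any(xs[:3])
    print(majority_case(F, 8, 99))   # "more than 75%"

Each sample errs with probability below 1/4, so the majority vote is wrong with probability exponentially small in the number of samples.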
These algorithms would be even more interesting if it were possible
to show that they are faster than any deterministic algorithms which
accomplish the same task. Gill [12], [13] in fact attacked the problem
of showing that there are tasks which can be accomplished faster by a
Monte Carlo algorithm than deterministically, before the current surge
of interest in these matters caused by the discovery of several proba-
bilistic algorithms which are much better than any known deterministic
ones for the same task.
The discussion of extensible formal systems given in [14] raises the
question of how to find systematic sources of new axioms, likely to be
consistent with the existing axioms of logic and set theory, which can
shorten the proofs of interesting theorems. From the metamathematical
results of [1]-[3], we know that no statement of the form "s is c-random"
can be proved if s has a length significantly greater than c. This raises
the question of whether statements of the form "s is c-random" are
generally useful new axioms. (Note that Ehrenfeucht and Mycielski [15]
show that by adding any previously unprovable statement X to a formal
system, one always shortens very greatly the lengths of infinitely many
proofs. Their argument is roughly as follows: Consider a proposition
of the form "either X or algorithm A halts," where A in fact halts but
takes a very long time to do so. Previously the proof of this assertion
was very long: one had to simulate A's computation until it halted.
Now the proof is immediate, for X is an axiom. See also Gödel [16].)
Hence it is reasonable to ask whether the addition of axioms "s
is c-random" is likely either to allow interesting new theorems to be
proved, or to shorten the proofs of interesting theorems which could
have been proved anyhow (but perhaps by unreachably long proofs).
The following discussion of this issue is very informal and is intended to
be merely suggestive. On the one hand, it is easy to see that interesting
new theorems are probably not obtained in this manner. The argument
is as follows. If it were highly probable that a particular theorem T can
be deduced from axioms of the form "s is c-random," then T could in
fact be proved without extending the axiom system. For even without
extending the axiom system one could show that "if s is random, then
T" holds for many s, and thus T would follow from the fact that most
s are indeed random. In other words, we would have before us a proof
by cases in which we do not know which case holds, but can show that
most do. Hence it seems that interesting new theorems will probably
not be obtained by extending a formal system in this way.

As to the possibility of interesting proof-shortenings, we can note
that the Ehrenfeucht-Mycielski theorems are not very interesting ones.
Quick Monte Carlo algorithms for primality suggest another possibility.
Perhaps adding axioms of the form "s is random" makes it possible to
obtain shorter proofs of primality? Pratt's work [17] suggests caution,
but the following more general conjecture seems reasonable. If it is
in fact the case that for some tasks Monte Carlo algorithms are much
better than deterministic ones, then it may also be the case that some
interesting theorems have much shorter proofs when a formal system is
extended by adding axioms of the form "s is random."
References

[1] Chaitin, G. J., Information-theoretic computational complexity, IEEE Trans. Info. Theor. IT-20, 1974, pp. 10-15.

[2] Chaitin, G. J., Information-theoretic limitations of formal systems, J. ACM 21, 1974, pp. 403-424.

[3] Chaitin, G. J., Randomness and mathematical proof, Sci. Amer. 232, 5, May 1975, pp. 47-52.

[4] Schwartz, J. T., Complexity of statement, computation and proof, AMS Audio Recordings of Mathematical Lectures 67, 1972.

[5] Levin, M., Mathematical logic for computer scientists, MIT Project MAC TR-131, June 1974, pp. 145-147, 153.

[6] Davis, M., What is a computation?, in Mathematics Today: Twelve Informal Essays, Springer-Verlag, New York, to appear in 1978.

[7] Chaitin, G. J., Algorithmic information theory, IBM J. Res. Develop. 21, 1977, pp. 350-359, 496.

[8] Solovay, R., and Strassen, V., A fast Monte-Carlo test for primality, SIAM J. Comput. 6, 1977, pp. 84-85.

[9] Miller, G. L., Riemann's hypothesis and tests for primality, J. Comput. Syst. Sci. 13, 1976, pp. 300-317.

[10] Rabin, M. O., Probabilistic algorithms, in Algorithms and Complexity: New Directions and Recent Results, J. F. Traub (ed.), Academic Press, New York, 1976, pp. 21-39.

[11] Bell, E. T., Mathematics: Queen and Servant of Science, McGraw-Hill, New York, 1951, pp. 225-226.

[12] Gill, J. T. III, Computational complexity of probabilistic Turing machines, Proc. 6th Annual ACM Symp. Theory of Computing, Seattle, Washington, April 1974, pp. 91-95.

[13] Gill, J. T. III, Computational complexity of probabilistic Turing machines, SIAM J. Comput. 6, 1977, pp. 675-695.

[14] Davis, M., and Schwartz, J. T., Correct-Program Technology/Extensibility of Verifiers: Two Papers on Program Verification, Courant Computer Science Report #12, Courant Institute of Mathematical Sciences, New York University, September 1977.

[15] Ehrenfeucht, A., and Mycielski, J., Abbreviating proofs by adding new axioms, AMS Bull. 77, 1971, pp. 366-367.

[16] Gödel, K., On the length of proofs, in The Undecidable: Basic Papers on Undecidable Propositions, Unsolvable Problems and Computable Functions, M. Davis (ed.), Raven Press, Hewlett, New York, 1965, pp. 82-83.

[17] Pratt, V. R., Every prime has a succinct certificate, SIAM J. Comput. 4, 1975, pp. 214-220.
Gregory J. Chaitin
IBM Thomas J. Watson Research Center
Yorktown Heights, N.Y. 10598, USA
Abstract
Loveland and Meyer have studied necessary and sufficient conditions
for an infinite binary string x to be recursive in terms of the program-
size complexity relative to n of its n-bit prefixes x_n. Meyer has shown
that x is recursive iff ∃c ∀n K(x_n/n) ≤ c, and Loveland has shown
that this is false if one merely stipulates that K(x_n/n) ≤ c for infinitely
many n. We strengthen Meyer's theorem. From the fact that there
are few minimal-size programs for calculating a given result, we obtain
a necessary and sufficient condition for x to be recursive in terms of
the absolute program-size complexity of its prefixes: x is recursive iff
∃c ∀n K(x_n) ≤ K(n) + c. Again Loveland's method shows that this
is no longer a sufficient condition for x to be recursive if one merely
stipulates that K(x_n) ≤ K(n) + c for infinitely many n.
Communicated by A. Meyer
Received November 1974
Revised March 1975
PROGRAM SIZE,
ORACLES, AND THE
JUMP OPERATION

Osaka Journal of Mathematics 14 (1977),
pp. 139-149

Gregory J. Chaitin

Abstract

There are a number of questions regarding the size of programs for cal-
culating natural numbers, sequences, sets, and functions, which are best
answered by considering computations in which one is allowed to con-
sult an oracle for the halting problem. Questions of this kind suggested
by work of T. Kamae and D. W. Loveland are treated.
1. Computer Programs, Oracles, Information Measures, and Codings

In this paper we use as much as possible Rogers' terminology and no-
tation [1, pp. xv-xix]. Thus N = {0, 1, 2, ...} is the set of (natural)
numbers; i, j, k, n, v, w, x, y, z are elements of N; A, B, X are
subsets of N; f, g, h are functions from N into N; φ, ψ are partial
functions from N into N; ⟨x1, ..., xk⟩ denotes the ordered k-tuple con-
sisting of the numbers x1, ..., xk; the lambda notation λx[...x...] is
used to denote the partial function of x whose value is ...x...; and the
mu notation μx[...x...] is used to denote the least x such that ...x...
is true.

The size of the number x, denoted lg(x), is defined to be the number
of bits in the xth binary string. The binary strings are Λ, 0, 1, 00, 01,
10, 11, 000, ... Thus lg(x) is the integer part of log₂(x + 1). Note that
there are 2^n numbers x of size n, and 2^n − 1 numbers x of size less than
n.
We are interested in the size of programs for a certain class of com-
puters. The zth computer in this class is defined in terms of φ_z^(2),X
[1, pp. 128-134], which is the two-variable partial X-recursive func-
tion with Gödel number z. These computers use an oracle for deciding
membership in the set X, and the zth computer produces the output
φ_z^(2),X(x, y) when given the program x and the data y. Thus the output
depends on the set X as well as the numbers x and y.

We now choose the standard universal computer U that can simulate
any other computer. U is defined as follows:

U^X((2x + 1)·2^z − 1, y) = φ_z^(2),X(x, y).

Thus for each computer C there is a constant c such that any program
of size n for C can be simulated by a program of size n + c for U.
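The program (2x + 1)·2^z − 1 is simply a pairing of the data x and the Gödel number z into a single number, and every number decodes uniquely; the cost of the simulation is the constant number of bits contributed by z. A quick Python check (helper names are mine):

    def encode(x: int, z: int) -> int:
        return (2 * x + 1) * 2 ** z - 1

    def decode(p: int):
        p += 1                            # p + 1 = (2x + 1) * 2**z
        z = (p & -p).bit_length() - 1     # largest power of 2 dividing p + 1
        return ((p >> z) - 1) // 2, z

    assert all(decode(encode(x, z)) == (x, z)
               for x in range(50) for z in range(10))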
Having picked the standard computer U, we can now define the
program size measures that will be used throughout this paper.

The fundamental concept we shall deal with is I(ψ/X), which is the
number of bits of information needed to specify an algorithm relative
to X for the partial function ψ, or, more briefly, the information in ψ
relative to X. This is defined to be the size of the smallest program for
ψ:

I(ψ/X) = min lg(x) (ψ = λy[U^X(x, y)]).

Here it is understood that I(ψ/X) = ∞ if ψ is not partial X-recursive.

I(x → y/X), which is the information relative to X to go from the
number x to the number y, is defined as follows:

I(x → y/X) = min I(ψ/X) (ψ(x) = y).

And I(x/X), which is the information in the number x relative to the
set X, is defined as follows:

I(x/X) = I(0 → x/X).
Finally I(ψ/X) is used to define three versions I(A/X), Ir(A/X),
and If(A/X) of the information relative to X of a set A. These corre-
spond to the three ways of naming a set [1, pp. 69-71]: by r.e. indices,
by characteristic indices, and by canonical indices. The first definition
is as follows:

I(A/X) = I(λx[if x ∈ A then 1 else undefined]/X).

Thus I(A/X) < ∞ iff A is r.e. in X. The second definition is as follows:

Ir(A/X) = I(λx[if x ∈ A then 1 else 0]/X).

Thus Ir(A/X) < ∞ iff A is recursive in X. And the third definition,
which applies only to finite sets, is as follows:

If(A/X) = I(Σ_(x∈A) 2^x / X).

The following notational convention is used: I(ψ), I(x → y), I(x),
I(A), Ir(A), and If(A) are abbreviations for I(ψ/∅), I(x → y/∅),
I(x/∅), I(A/∅), Ir(A/∅), and If(A/∅), respectively.

We use the coding of finite sequences of numbers into individual
numbers [1, p. 71]: τ is an effective one-one mapping from ∪_(k=0)^∞ N^k
onto N. And we also use the notation f̄(x) for τ of the sequence
⟨f(0), f(1), ..., f(x − 1)⟩ [1, p. 377]; for any function f, f̄(x) is the
code number for the finite initial segment of f of length x.

The following theorems, whose straightforward proofs are omitted,
give some basic properties of these concepts.
Theorem 1.

(a) I(x/X) ≤ lg(x) + c.

(b) There are less than 2^n numbers x with I(x/X) < n.

(c) |I(x/X) − I(y/X)| ≤ 2 lg(|x − y|) + c.

(d) The set of all true propositions of the form "I(x → y/X) ≤ z" is
r.e. in X.

(e) I(x → y/X) ≤ I(y/X) + c.

Recall that there are 2^n numbers x of size n, that is, there are 2^n
numbers x with lg(x) = n. In view of (a) and (b) most x of size n have
I(x/X) ≈ n. Such x are said to be X-random. In other words, x is
said to be X-random if I(x/X) is approximately equal to lg(x); most
x have this property.

Theorem 2.

(a) I(τ(⟨x, y⟩)) ≤ I(τ(⟨y, x⟩)) + c.

(b) I(τ(⟨x, y⟩) → τ(⟨y, x⟩)) ≤ c.

(c) I(x) ≤ I(τ(⟨x, y⟩)) + c.

(d) I(τ(⟨x, y⟩) → x) ≤ c.

Theorem 3.

(a) I(x → ψ(x)/X) ≤ I(ψ/X).

(b) For each ψ that is partial X-recursive there is a c such that
I(ψ(x)/X) ≤ I(x/X) + c.

(c) I(x → f̄(x)/X) ≤ I(f/X) + c.

(d) I(f̄(x) → x/X) ≤ c and I(x/X) ≤ I(f̄(x)/X) + c.

Theorem 4.

(a) I(x/X) ≤ I(λy[x]/X) and I(λy[x]/X) ≤ I(x/X) + c.

(b) I(x/X) ≤ If({x}/X) + c and If({x}/X) ≤ I(x/X) + c.

(c) I(x/X) ≤ Ir({x}/X) + c and Ir({x}/X) ≤ I(x/X) + c.

(d) I(x/X) ≤ I({x}/X) + c and I({x}/X) ≤ I(x/X) + c.

(e) Ir(A/X) ≤ If(A/X) + c and I(A/X) ≤ Ir(A/X) + c.

See [2] for a different approach to defining program size measures
for functions, numbers, and sets.
is 0. It is easy to see that lim_x f(x) = U^X′(w, 0) = z and I(f/X) ≤
lg(w) + c < n + c.

(b) By hypothesis there is a program w of size less than n such that
lim_x U^X(w, x) = z. Given w one can use the oracle for X′ to
calculate z. At stage i one asks the oracle whether there is a
j > i such that U^X(w, j) ≠ U^X(w, i). If so, one goes to stage
i + 1 and tries again. If not, one is finished because U^X(w, i) = z.
This shows that I(z/X′) ≤ lg(w) + c < n + c. Q.E.D.
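The stage-by-stage use of the jump oracle in part (b) can be mimicked in code. In this sketch (entirely my own illustration) the oracle's answer is faked by searching ahead up to a finite horizon; a genuine oracle for X′ is exactly what would make that unbounded search unnecessary:

    def limit_via_jump(u, w, horizon=10000):
        # find lim_x u(w, x): at stage i, ask whether some j > i has
        # u(w, j) != u(w, i); if not, the value has stabilized
        i = 0
        while True:
            if all(u(w, j) == u(w, i) for j in range(i + 1, horizon)):
                return u(w, i)            # this is z
            i += 1

    # toy computation that stabilizes at 42 after 100 steps:
    u = lambda w, x: 42 if x >= 100 else x
    assert limit_via_jump(u, None) == 42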
See [3] for applications of oracles and the jump operation in the
context of self-delimiting programs for sets and probability constructs;
in this paper we are only interested in programs with endmarkers.
5. Other Applications

In this section some other applications of oracles and the jump opera-
tion are presented without proof.

First of all, we would like to examine a question raised by C. P.
Schnorr [10, p. 189] concerning the relationship between I(x) and the
limiting relative frequency of programs for x. However, it is more ap-
propriate to ask what is the relationship between the self-delimiting
program size measure H(x) [9] and the limiting relative frequency
of programs for x (with endmarkers). Define F(x, n) to be −log₂
of (the number of programs w less than or equal to n such that
U(w, 0) = x)/(n + 1). Then Theorem 12 of [10] is analogous to the
following:

Theorem 10. There is a c such that every x satisfies F(x, n) ≤
H(x) + c for almost all n.

This shows that if H(x) is small, then x has many programs.
Schnorr asks whether the converse is true. In fact it is not:

Theorem 11. There is a c such that every x satisfies F(x, n) ≤
H(x/∅′) + c for almost all n.

Thus even though H(x) is large, x will have many programs if
H(x/∅′) is small.
We would like to end by examining the maximum finite cardinality
#A and co-cardinality #Ā attainable by a set A of bounded program
size. First we define the partial function G:

G(x/X) = max z (I(z/X) ≤ x).

The following easily established results show how gigantic G is:

(a) If ψ is partial X-recursive and x > I(ψ/X) + c, then ψ(x), if
defined, is less than G(x/X).

(b) If ψ is partial X-recursive, then there is a c such that ψ(G(x/X)),
if defined, is less than G(x + c/X).
Theorem 12.

(a) G(x − c) < max #A (If(A) ≤ x) < G(x + c)

(b) G(x − c/∅′) < max #A (Ir(A) ≤ x) < G(x + c/∅′)

(c) G(x − c/∅′) < max #Ā (Ir(A) ≤ x) < G(x + c/∅′)

(d) G(x − c/∅′) < max #A (I(A) ≤ x) < G(x + c/∅′)

(e) G(x − c/∅″) < max #Ā (I(A) ≤ x) < G(x + c/∅″)

Here it is understood that the maximizations are only taken over
those cardinalities which are finite.

The proof of (e) is beyond the scope of the method used in this
paper; (e) is closely related to the fact that {x | Wx is co-finite} is Σ₃-
complete [1, p. 328] and to Theorem 16 of [3].
Appendix

Theorem 3b can be strengthened to the following:

I(ψ(x)/X) ≤ I(x/X) + I(ψ/X) +
lg(I(ψ/X)) + lg(lg(I(ψ/X))) +
2 lg(lg(lg(I(ψ/X)))) + c.

There are many other similar inequalities.

To formulate sharp results of this kind it is necessary to abandon the
formalism of this paper, in which programs have endmarkers. Instead
one must use the self-delimiting program formalism of [9] and [3], in
which programs can be concatenated and merged. In that setting the
following inequalities are immediate:

H(ψ(x)/X) ≤ H(x/X) + H(ψ/X) + c,
H(λx[ψ(φ(x))]/X) ≤ H(φ/X) + H(ψ/X) + c.
References

[1] H. Rogers, Jr.: Theory of Recursive Functions and Effective Computability, McGraw-Hill, New York, 1967.

[2] G. J. Chaitin: Information-theoretic limitations of formal systems, J. ACM 21 (1974), 403-424.

[3] G. J. Chaitin: Algorithmic entropy of sets, Comput. Math. Appl. 2 (1976), 233-245.

[4] T. Kamae: On Kolmogorov's complexity and information, Osaka J. Math. 10 (1973), 305-307.

[5] R. P. Daley: A note on a result of Kamae, Osaka J. Math. 12 (1975), 283-284.

[6] D. W. Loveland: A variant of the Kolmogorov concept of complexity, Information and Control 15 (1969), 510-526.

[7] G. J. Chaitin: Information-theoretic characterizations of recursive infinite strings, Theoretical Comput. Sci. 2 (1976), 45-48.

[8] R. M. Solovay: unpublished manuscript on [9] dated May 1975.

[9] G. J. Chaitin: A theory of program size formally identical to information theory, J. ACM 22 (1975), 329-340.

[10] C. P. Schnorr: Optimal enumerations and optimal Gödel numberings, Math. Systems Theory 8 (1975), 182-191.

[11] G. J. Chaitin: Algorithmic information theory, IBM J. Res. Develop. 21 (1977), in press.
ON THE LENGTH OF
PROGRAMS FOR
COMPUTING FINITE
BINARY SEQUENCES

Journal of the ACM 13 (1966),
pp. 547-569

Gregory J. Chaitin^1
The City College of the City University of New York
New York, N.Y.

Abstract

The use of Turing machines for calculating finite binary sequences is
studied from the point of view of information theory and the theory of
recursive functions. Various results are obtained concerning the number
of instructions in programs. A modified form of Turing machine is
studied from the same point of view. An application to the problem of
defining a patternless sequence is proposed in terms of the concepts here
developed.
Introduction

In this paper the Turing machine is regarded as a general purpose
computer and some practical questions are asked about programming
it. Given an arbitrary finite binary sequence, what is the length of the
shortest program for calculating it? What are the properties of those
binary sequences of a given length which require the longest programs?
Do most of the binary sequences of a given length require programs of
about the same length?

The questions posed above are answered in Part 1. In the course of
answering them, the logical design of the Turing machine is examined
as to redundancies, and it is found that it is possible to increase the
efficiency of the Turing machine as a computing instrument without
a major alteration in the philosophy of its logical design. Also, the
following question raised by C. E. Shannon [1] is partially answered:
What effect does the number of different symbols that a Turing machine
can write on its tape have on the length of the program required for a
given calculation?

In Part 2 a major alteration in the logical design of the Turing
machine is introduced, and then all the questions about the lengths of
programs which had previously been asked about the first computer
are asked again. The change in the logical design may be described in
the following terms: Programs for Turing machines may have transfers
from any part of the program to any other part, but in the programs
for the Turing machines which are considered in Part 2 there is a fixed
upper bound on the length of transfers.

Part 3 deals with the somewhat philosophical problem of defining
a random or patternless binary sequence. The following definition is
proposed: Patternless finite binary sequences of a given length are se-
quences which in order to be computed require programs of approxi-
mately the same length as the longest programs required to compute
any binary sequences of that given length. Previous work along these
lines and its relationship to the present proposal are discussed briefly.

^1 This paper was written in part with the help of NSF Undergraduate Research Participation Grant GY-161.
Part 1

1.1. We define an N-state M-tape-symbol Turing machine by an N-
row by M-column table. Each of the NM places in this table must
have an entry consisting of an ordered pair (i, j) of natural numbers,
where i goes from 0 to N and j goes from 1 to M + 2. These entries
constitute, when specified, the program of the N-state M-tape-symbol
Turing machine. They are to be interpreted as follows: An entry (i, j)
in the kth row and the pth column of the table means that when the
machine is in its kth state and the square of its one-way infinite tape
which is being scanned is marked with the pth symbol, then the machine
is to go to its ith state if i ≠ 0 (the machine is to halt if i = 0) after
performing the operation of

1. moving the tape one square to the right if j = M + 2,

2. moving the tape one square to the left if j = M + 1, and

3. marking (overprinting) the square of the tape being scanned with
the jth symbol if 1 ≤ j ≤ M.

Special names are given to the first, second and third symbols. They
are, respectively, the blank (for unmarked square), 0 and 1.

A Turing machine may be represented schematically as follows:

[Schematic figure: a black box (the machine) whose scanner reads a one-way infinite tape, shown as 1 0 0 1 0 0 ... starting from the end of the tape.]

It is stipulated that

(1.1A) Initially the machine is in its first state and scanning the first
square of the tape.

(1.1B) No Turing machine may in the course of a calculation scan the
end square of the tape and then move the tape one square to the
right.

(1.1C) Initially all squares of the tape are blank.

Since throughout this paper we shall be concerned with computing
finite binary sequences, when we say that a Turing machine calculates
a particular finite binary sequence (say, 01111000), we shall mean that
the machine stops with the sequence written at the end of the tape,
with all other squares of the tape blank and with its scanner on the first
blank square of the tape. For example, the following Turing machine
has just calculated the sequence mentioned above:

[Schematic figure: a halted machine whose tape reads 0 1 1 1 1 0 0 0 ..., with the scanner on the first blank square.]
Now control has been passed to Section III. First of all, Section III
accumulates in base-two on the tape a count of the number of blank
squares between the scanner and P when f assumes control. (This
number is m − 1.) This base-two count, which is written on the tape,
is simply a binary sequence with a 1 at its left end. Section III then
removes this 1 from the left end of the binary sequence. The resulting
sequence is called S_n.

Note that if the row numbers entered in

row i + 2, column 1 if n = 2i + 1,
row i + 2, column 2 if n = 2i + 2,

of Section I are suitably specified, this binary sequence S_n can be made
any one of the 2^v binary sequences of length v = (the greatest integer
not greater than log₂(f − d) − 1). Finally, Section III writes S_n in
a region of the tape far to the right where all the previous S_j (j =
1, 2, ..., n − 1) have been written during previous phases, cleans up the
tape so that only the sequences P and S_j (j = 1, 2, 3, ..., n) remain on
it, positions the scanner back on the square at the end of the tape and,
as the last act of phase n, passes control back to row 1 again.

The foregoing description of the workings of the program omits some
important details for the sake of clarity. These follow.

It must be indicated how Section III knows when the last phase
(phase 2(d − 2)) has occurred. During the nth phase, P is copied just
to the right of S_1 S_2 ... S_n (of course a blank square is left between S_n
and the copy of P). And during the (n + 1)-th phase, Section III checks
whether or not P is currently different from what it was during the nth
phase when the copy of it was made. If it isn't different, then Section III
knows that phasing has in fact stopped and that a termination routine
must be executed.

The termination routine first forms the finite binary sequence S
consisting of

S_1 S_2 ... S_2(d−2)

each immediately following the other. As each of the S_j can be any one
of the 2^v binary sequences of length v if the row numbers in the entries
in Section I are appropriately specified, it follows that S can be any
one of the 2^w binary sequences of length w = 2(d − 2)v. Note that

2(d − 2)(log₂(f − d) − 1) ≥ w > 2(d − 2)(log₂(f − d) − 2),

so that

w ∼ 2(1 − 1/log₂ N) N log₂(N/log₂ N) ∼ 2N log₂ N.

As we want the program to be able to compute any sequence of length
not greater than (2 + ε_N)N log₂ N, we have S consist of the desired
sequence followed on the right by a single 1 and then a string of 0's,
and the termination routine removes the rightmost 0's and first 1 from
S. Q.E.D.
The result just obtained shows that it is impossible to make further
improvement in the logical design of the Turing machine of the kind
described in Section 1.2 and actually effected in Section 1.3: if we let
the number of tape symbols be fixed and speak asymptotically as the
number of states goes to infinity, in our present Turing machines 100
percent of the bits required to specify a program also serve to specify
the behavior of the machine.

Note too that the argument presented in the first paragraph of this
section in fact establishes that, say, for any fixed s greater than zero,
at most n^(−s) 2^n binary sequences S of length n satisfy

L_M(S) ≤ (1 + ε_n) n/((M − 1) log₂ n).

Thus we have: For any fixed s greater than zero, at most n^(−s) 2^n binary
sequences of length n fail to satisfy the double inequality

(1 + ε_n) n/((M − 1) log₂ n) ≤ L_M(S) ≤ (1 + ε′_n) n/((M − 1) log₂ n).
1.7. It may be desirable to have some idea of the "local" as well
as the "global" behavior of L_M(C_n). The following program of 8 rows
causes an 8-state 3-tape-symbol Turing machine to compute the binary
sequence 01100101 of length 8 (this program is in the format of the
machines of Section 1.1):

1,2   2,4   2,4
2,3   3,4   3,4
3,3   4,4   4,4
4,2   5,4   5,4
5,2   6,4   6,4
6,3   7,4   7,4
7,2   8,4   8,4
8,3   0,4   0,4
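As a check, the table above can be executed directly. Below is a small Python simulator for the machines of Section 1.1 (the conventions and the program are the paper's; the simulator itself is only an illustrative sketch). Symbols are numbered 1 = blank, 2 = "0", 3 = "1"; with M = 3, operation j = 4 moves the tape left (so the scanner advances) and j = 5 moves it right, and next-state 0 means halt:

    PROGRAM = {                       # PROGRAM[state][symbol - 1] = (i, j)
        1: [(1, 2), (2, 4), (2, 4)],
        2: [(2, 3), (3, 4), (3, 4)],
        3: [(3, 3), (4, 4), (4, 4)],
        4: [(4, 2), (5, 4), (5, 4)],
        5: [(5, 2), (6, 4), (6, 4)],
        6: [(6, 3), (7, 4), (7, 4)],
        7: [(7, 2), (8, 4), (8, 4)],
        8: [(8, 3), (0, 4), (0, 4)],
    }

    def run(program, steps=10000):
        tape, pos, state = {}, 0, 1            # one-way tape, all blank (1)
        for _ in range(steps):
            i, j = program[state][tape.get(pos, 1) - 1]
            if j <= 3:
                tape[pos] = j                  # overprint the scanned square
            else:
                pos += 1 if j == 4 else -1     # move the tape
                assert pos >= 0                # convention (1.1B)
            if i == 0:
                break                          # halt
            state = i
        return "".join("_01"[tape.get(k, 1) - 1] for k in range(max(tape) + 1))

    print(run(PROGRAM))                        # prints 01100101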
And in general:

(1.7.1) L_M(C_n) ≤ n.

From this it is easy to see that for m greater than n:

(1.7.2) L_M(C_m) ≤ L_M(C_n) + (m − n).

Also, for m greater than n:

(1.7.3) L_M(C_m) + 1 ≥ L_M(C_n).

For if one can calculate any binary sequence of length m greater
than n with an M-tape-symbol Turing machine having L_M(C_m) states,
one can certainly program any M-tape-symbol Turing machine having
L_M(C_m) + 1 states to calculate [the binary sequence consisting of (any
particular sequence of length n) followed by a 1 followed by a sequence
of (m − n − 1) 0's], and then, instead of immediately halting, to first
erase all the 0's and the first 1 on the right end of the sequence. This
last part of the program takes up only a single row of the table in the
format of the machines of Section 1.1; this row r is:

row r:   r,5   r,1   0,1

Together (1.7.2) and (1.7.3) yield:

(1.7.4) |L_M(C_(n+1)) − L_M(C_n)| ≤ 1.

From (1.7.1) it is obvious that L_M(C_1) = 1, and with (1.7.4) and the
fact that L_M(C_n) goes to infinity with n it finally is concluded that:

(1.7.5) For any positive integer p there is at least one solution n of
L_M(C_n) = p.
1.8. In this section a certain amount of insight is obtained into the
properties of finite binary sequences S of length n for which L_M(S)
is close to L_M(C_n). M is considered to be fixed throughout this sec-
tion. There is some connection between the present subject and that
of Shannon in [2, Pt. I, especially Th. 9].

The main result is as follows:

(1.8.1) For any e > 0 and d > 1 one has for all sufficiently large n:
If S is any binary sequence of length n satisfying the statement
that

(1.8.2) the ratio of the number of 0's in S to n differs from 1/2 by
more than e,

then L_M(S) < L_M(C_[ndH(1/2+e, 1/2−e)]).

Here H(p, q) (p ≥ 0, q ≥ 0, p + q = 1) is a special case of the entropy
function of Boltzmann statistical mechanics and information theory and
equals 0 if p = 0 or 1, and −p log₂ p − q log₂ q otherwise. Also, a real
number enclosed in brackets denotes the least integer greater than the
enclosed real. The H function comes up because the logarithm to the
base-two of the number

Σ_(|k/n − 1/2| > e) (n choose k)

of binary sequences of length n satisfying (1.8.2) is asymptotic to
nH(1/2 + e, 1/2 − e). This may be shown easily by considering the ratio of
successive binomial coefficients and using the fact that log(n!) ∼ n log n.
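That asymptotic is easy to check numerically; a short sketch (the values of n and e are arbitrary choices of mine):

    from math import comb, log2

    def H(p, q):
        return 0.0 if p == 0 or q == 0 else -p * log2(p) - q * log2(q)

    def log2_count(n, e):
        # log2 of the number of n-bit sequences whose fraction of 0's
        # differs from 1/2 by more than e
        return log2(sum(comb(n, k) for k in range(n + 1)
                        if abs(k / n - 0.5) > e))

    n, e = 2000, 0.1
    print(log2_count(n, e) / (n * H(0.5 + e, 0.5 - e)))   # approaches 1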
To prove (1.8.1), first construct a class of effectively computable
functions M_n(·) with the natural numbers from 1 to 2^n as range and
all binary sequences of length n as domain. M_n(S) is defined to be
the ordinal number of the position of S in an ordering of the binary
sequences of length n defined as follows:

1. If two binary sequences S and S′ have, respectively, m and m′
0's, then S comes before (after) S′ according as |m/n − 1/2| is greater
(less) than |m′/n − 1/2|.
2. If 1 does not settle which comes first, take S to come before
(after) S′ according as S represents (ignoring 0's to the left) a
larger (smaller) number in base-two notation than S′ represents.

The only essential feature of this ordering is that it gives small
ordinal numbers to sequences for which |m/n − 1/2| has large values. In
fact, as there are only

2^((1+ε_n) n H(1/2+e, 1/2−e))

rows. Thus for n sufficiently large this many rows plus r is all that is
needed to compute any binary sequence S of length n satisfying (1.8.2).
And by the asymptotic formula for L_M(C_n) of Section 1.6, it is seen
that the total number of rows of program required is, for n sufficiently
large, less than

L_M(C_[ndH(1/2+e, 1/2−e)]).

Q.E.D.

From (1.8.1) and the fact that H(p, q) ≤ 1 with equality if and only
if p = q = 1/2, it follows from L_M(C_n) ∼ n/((M − 1) log₂ n) that, for
example,
(1.8.3) For any e > 0, all binary sequences S in C_n^M, n sufficiently
large, violate (1.8.2);

and more generally,

(1.8.4) Let S_(n1), S_(n2), S_(n3), ... be any infinite sequence of distinct
finite binary sequences of lengths, respectively, n1, n2, n3, ...,
which satisfies

L_M(S_(nk)) ∼ L_M(C_(nk)).
It follows that

L_M(C(T_(L_M(S_k)), M)) = (1 + ε_k) L_M(S_k),

which is, since the length of C(T_(L_M(S_k)), M) is

(1 + ε_k) L_M(S_k) (M − 1) log₂ L_M(S_k)

and

L_M(C_((1+ε_k) L_M(S_k) (M−1) log₂ L_M(S_k))) = (1 + ε′_k) L_M(S_k),
Part 2
2.1. In this section we return to the Turing machines of Section 1.1
and add to the conventions (1.1A), (1.1B) and (1.1C),
(2.1D) An entry (i j ) in the pth row of the table of a Turing machine
must satisfy ji ; pj b. In addition, while a ctitious state is
used (as before) for the purpose of halting, the row of the table
for this ctitious state is now considered to come directly after
the actual last row of the program.
Here b is a constant whose value is to be regarded as fixed throughout
Part 2. In Section 2.2 it will be shown that b can be chosen sufficiently
large that the Turing machines thus defined (which we take the
liberty of naming "bounded-transfer Turing machines") have all the
calculating capabilities that are basically required of Turing machines
for theoretical purposes (e.g., such purposes as defining what one means
by an "effective process for determining...."), and hence have calculating
abilities sufficient for the proofs of Part 2 to be carried out.
(2.1D) may be regarded as a mere convention, but it is more properly
considered as a change in the basic philosophy of the logical design
of the Turing machine (i.e., the philosophy expressed by A. M. Turing
[3, Sec. 9]).
Here in Part 2 there will be little point in considering the general
M -tape-symbol machine. It will be understood that we are always
speaking of 3-tape-symbol machines.
There is a simple and convenient notational change which can be
made at this point; it makes all programs for bounded-transfer Turing
machines instantly relocatable (which is convenient if one puts together
a program from subroutines) and it saves a great deal of superfluous
writing. Entries in the tables of machines will from now on consist of
ordered pairs $(i', j')$, where $i'$ goes from $-b$ to $b$ and $j'$ goes from 1 to
5. A "new" entry $(i', j')$ is to be interpreted in terms of the functioning
of the machine in a manner depending on the number p of the row of
the table it is in; this entry has the same meaning as the "old" entry
$(p + i', j')$ used to have.
Thus, halting is now accomplished by entries of the form $(k, j)$ ($1 \le k \le b$)
in the kth row (from the end) of the table. Such an entry causes
the machine to halt after performing the operation indicated by j.
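The following Python fragment (a sketch under our own naming conventions; nothing here is from the paper) illustrates how such a relative entry is decoded, including the halting convention just described.

```python
def decode_entry(p, entry, N, b):
    """p: current row (1-based); entry: (i_rel, j) with -b <= i_rel <= b.
    Returns ('goto', row, j) or ('halt', j)."""
    i_rel, j = entry
    assert -b <= i_rel <= b
    target = p + i_rel
    if target > N:          # transfer past the last row reaches the
        return ('halt', j)  # fictitious state: do operation j, then stop
    return ('goto', target, j)

# Row N with entry (1, 4): transfers one row past the end, so it halts.
print(decode_entry(10, (1, 4), N=10, b=3))   # ('halt', 4)
print(decode_entry(5, (-2, 1), N=10, b=3))   # ('goto', 3, 1)
```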
2.2. In this section we attempt to give an idea of the versatility
of the bounded-transfer Turing machine. It is here shown in two ways
that b can be chosen sufficiently large so that any calculation which one
of the Turing machines of Section 1.1 can be programmed to perform
can be imitated by a suitably programmed bounded-transfer Turing
machine.
As the first proof, b is taken to be the number of rows in a 3-tape-symbol
universal Turing machine program for the machines of Section
1.1. This universal program (with its format changed to that of the
bounded-transfer Turing machines) occupies the last rows of a program
for a bounded-transfer Turing machine, a program which is mainly
devoted to writing out on the tape the information which will enable
the universal program to imitate any calculation which any one of the
Turing machines of Section 1.1 can be programmed to perform. One
row of the program is used to write out each symbol of this information
(as in the program in Section 1.7), and control passes straight through
the program row after row until it reaches the universal program.
Now for the second proof. To program a bounded-transfer Turing
machine in such a manner that it imitates the calculations performed
by a Turing machine of Section 1.1, consider alternate squares on the
tape of the bounded-transfer Turing machine to be the squares of the
tape of the machine being imitated. Thus
1 0 1 0 ......
↑
is imitated by
1 0 1 0 ......
↑
1 0 1 1 1 0 1 ......
↑
(the arrow marks the scanned square)
The rows of the table which cause the bounded-transfer Turing machine
to do the foregoing (type I rows) are interwoven or braided with two
other types of rows. The first of these (type II rows) is used for the
sole purpose of putting the bounded-transfer Turing machine back in its
initial state (row 1 of the table; this row is a type III row). They appear
(as do the other two types of rows) periodically throughout the table,
and each of them does nothing but transfer control to the preceding
one. The second of these (type III rows) serves to pass control back in
the other direction; each time control is about to pass a block of type I
rows that imitates a particular state of the other machine while traveling
through type III rows, the type III rows erase the rightmost of the 1's
used to write out the number of the next state to be imitated. When
finally none of these place-marking 1's is left, control is passed to the
group of type I rows that was about to be passed, which then proceeds
to imitate the appropriate state of the Turing machine of Section 1.1.
Thus the obstacle of the upper bound on the length of transfers
in bounded-transfer Turing machines is overcome by passing up and
down the table by small jumps; progress toward the desired destination
is tracked by subtracting a unit from a count written on the tape just
prior to departure.
Although bounded-transfer Turing machines have been shown to
be versatile, it is not true that as the number of states goes to infinity,
asymptotically 100 percent of the bits required to specify a program also
serve to specify the behavior of the bounded-transfer Turing machine.
2.3. In this section the following fundamental result is proved.
(2.3.1) $L(C_n) \sim an$, where a is, of course, a positive constant.
First it is shown that there exists an a greater than zero such that:
(2.3.2) $L(C_n) \ge an$.
It is clear that there are exactly
$$(5(2b+1))^{3N}$$
different ways of making entries in the table of an N-state bounded-transfer
Turing machine; that is, there are
$$2^{(3\log_2(10b+5))N}$$
different programs for an N-state bounded-transfer Turing machine.
Since a different program is required to have the machine calculate
each of the $2^n$ different binary sequences of length n, it can be seen
that an N-state bounded-transfer Turing machine can be programmed
to calculate any binary sequence of length n only if
$$(3\log_2(10b+5))N \ge n, \quad\text{or}\quad N \ge \frac{n}{3\log_2(10b+5)}.$$
Thus one can take $a = 1/(3\log_2(10b+5))$.
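The counting argument lends itself to a direct check. The sketch below (our code; the value of b and the helper name are arbitrary choices) verifies that $N = \lceil n/(3\log_2(10b+5))\rceil$ states suffice to give at least $2^n$ distinct programs.

```python
from math import log2, ceil

def min_states(n, b):
    """Least N for which the number of N-state programs reaches 2**n,
    i.e. N >= n / (3 * log2(10b + 5))."""
    return ceil(n / (3 * log2(10 * b + 5)))

b = 100                      # any fixed transfer bound
for n in (100, 1000, 10000):
    N = min_states(n, b)
    assert (5 * (2 * b + 1)) ** (3 * N) >= 2 ** n
    print(n, N, n / N)       # n/N approaches 3 * log2(10b + 5) = 1/a
```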
Next it is shown that:
(2.3.3) $L(C_n) + L(C_m) \ge L(C_{n+m})$.
To do this we present a way of making entries in a table with at most
$L(C_n) + L(C_m)$ rows which causes the bounded-transfer Turing machine
thus programmed to calculate any particular binary sequence S
of length n + m. S can be expressed as a binary sequence S′ of length n
followed by a binary sequence S″ of length m. The table is then formed
from two sections which are numbered in the order in which they are
encountered in reading from row 1 to the last row of the table. Section
I consists of at most $L(C_n)$ rows. It is a program which calculates S′.
Section II consists of at most $L(C_m)$ rows. It is a program which calculates
S″. It follows from this construction and the definitions that
(2.3.3) holds.
(2.3.2) and (2.3.3) together imply (2.3.1).2 This will be shown by a
demonstration of the following general proposition:
(2.3.4) Let $A_1, A_2, A_3, \ldots$ be an infinite sequence of natural numbers
satisfying
(2.3.5) $A_n + A_m \ge A_{n+m}$.
Then as n goes to infinity, $A_n/n$ tends to a limit from above.
2 As stated in the preface of this book, it is straightforward to apply to LISP
the techniques used here to study bounded-transfer Turing machines. Let us define
$H_{\rm LISP}(x)$, where x is a bit string, to be the size in characters of the smallest LISP
S-expression whose value is the list x of 0's and 1's. Consider the LISP S-expression
(APPEND P Q), where P is a minimal LISP S-expression for the bit string x and
Q is a minimal S-expression for the bit string y. I.e., the value of P is the list of
bits x and P is $H_{\rm LISP}(x)$ characters long, and the value of Q is the list of bits y and
Q is $H_{\rm LISP}(y)$ characters long. (APPEND P Q) evaluates to the concatenation of
the bit strings x and y and is $H_{\rm LISP}(x) + H_{\rm LISP}(y) + 10$ characters long. Therefore,
let us define $H'_{\rm LISP}(x)$ to be $H_{\rm LISP}(x) + 10$. Now $H'_{\rm LISP}$ is subadditive like L(S). The
discussion of bounded-transfer Turing machines in this paper and the next therefore
applies practically word for word to $H'_{\rm LISP} = H_{\rm LISP} + 10$. In particular, let B(n) be
the maximum of $H'_{\rm LISP}(x)$ taken over all n-bit strings x. Then B(n)/n is bounded
away from zero, $B(n+m) \le B(n) + B(m)$, and B(n) is asymptotic to a nonzero
constant times n.]
For all n, $A_n \ge 0$, so that $A_n/n \ge 0$; that is, $\{A_n/n\}$ is a set of
reals bounded from below. It is concluded that this set has a greatest
lower bound a. We now show that
$$\lim_{n\to\infty} \frac{A_n}{n} = a.$$
Since a is the greatest lower bound of the set $\{A_n/n\}$, for any e
greater than zero there is a d for which
(2.3.6) $A_d/d < a + e$.
Every natural number n can be expressed in the form n = qd + r, where
$0 \le r < d$. From (2.3.5) it can be seen that for any $n_1, n_2, n_3, \ldots, n_{q+1}$,
$$\sum_{k=1}^{q+1} A_{n_k} \ge A_{\sum_{k=1}^{q+1} n_k}.$$
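Proposition (2.3.4) can be watched in action numerically. The Python sketch below (illustrative only; the toy sequence A is our stand-in for $L(C_n)$) uses a subadditive sequence and shows $A_n/n$ settling onto its greatest lower bound from above.

```python
import math

def A(n):
    """A toy sequence satisfying A(n) + A(m) >= A(n + m)."""
    return math.ceil(n / 3) + 5    # fixed overhead of 5 per "program"

glb = min(A(n) / n for n in range(1, 2001))   # greatest lower bound a
for n in (10, 100, 1000, 2000):
    print(n, A(n) / n, glb)        # A(n)/n decreases toward a = 1/3
```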
This is proved from (2.5.5), the proof being identical with that of
(1.7.5).
(2.5.5) For any positive integer p there is at least one solution n of
$L(C_n) = p$.
Let the $n_k$ satisfy $L(C_{n_k}) = f^{-1}(k)$, where $f^{-1}(k)$ is defined to be
the smallest value of j for which $f(j) = k$. Then since $L(C_{n_k}) \le n_k$,
Q.E.D.
(2.5.4) and (2.4.1) yield:
(2.5.6) Let f(n) be any effectively computable function that goes to
infinity with n and satisfies $f(n+1) - f(n) = 0$ or 1. Then there
are an infinity of distinct $n_k$ for which less than $2^{n_k - f(n_k)}$ binary
Part 3
3.1. Consider a scientist who has been observing a closed system that
once every second either emits a ray of light or does not. He summarizes
his observations in a sequence of 0's and 1's in which a zero represents
"ray not emitted" and a one represents "ray emitted." The sequence
may start
0110101110 ...
and continue for a few thousand more bits. The scientist then examines
the sequence in the hope of observing some kind of pattern or law.
What does he mean by this? It seems plausible that a sequence of 0's
and 1's is patternless if there is no better way to calculate it than just
by writing it all out at once from a table giving the whole sequence:
My Scientific Theory
0
1
1
0
1
0
1
1
1
0
...
This would not be considered acceptable. On the other hand, if the
scientist should hit upon a method by which the whole sequence could
be calculated by a computer whose program is short compared with the
sequence, he would certainly not consider the sequence to be entirely
patternless or random. And the shorter the program, the greater the
pattern he might ascribe to the sequence.
There are many genuine parallels between the foregoing and the way
scientists actually think. For example, a simple theory that accounts
for a set of facts is generally considered better or more likely to be true
than one that needs a large number of assumptions. By "simplicity" is
not meant "ease of use in making predictions." For although General
or Extended Relativity is considered to be the simple theory par ex-
cellence, very extended calculations are necessary to make predictions
from it. Instead, one refers to the number of arbitrary choices which
have been made in specifying the theoretical structure. One naturally
is suspicious of a theory the number of whose arbitrary elements is of
an order of magnitude comparable to the amount of information about
reality that it accounts for.
On the basis of these considerations it may perhaps not appear entirely
arbitrary to define a patternless or random finite binary sequence
as a sequence which in order to be calculated requires, roughly speaking,
at least as long a program as any other binary sequence of the same
length. A patternless or random infinite binary sequence is then defined
to be one whose initial segments are all random. In making these definitions
mathematically approachable it is necessary to specify the kind
of computer referred to in them. This would seem to involve a rather
arbitrary choice, and thus to make our definitions less plausible, but in
fact both of the kinds of Turing machines which have been studied by
such different methods in Parts 1 and 2 lead to precise mathematical
definitions of patternless sequences (namely, the patternless or random
finite binary sequences are those sequences S of length n for which L(S)
is approximately equal to $L(C_n)$, or, fixing M, those for which $L_M(S)$ is
approximately equal to $L_M(C_n)$) whose provable statistical properties
start with forms of the law of large numbers. Some of these properties
will be established in a paper of the author to appear.4
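To make the definition concrete, here is a deliberately tiny model in Python (entirely ours; the paper's computers are Turing machines, not this two-instruction language): programs either print a literal table or repeat a short prefix, and the "random" sequences of length n are exactly those whose shortest program is almost as long as the worst case $L(C_n)$.

```python
from itertools import product

def L(s):
    """Length of the shortest toy program printing the bit-tuple s.
    Literal output costs len(s) + 1; repeating a k-bit prefix costs k + 2."""
    best = len(s) + 1                        # the table-lookup "theory"
    for k in range(1, len(s)):
        if len(s) % k == 0 and s[:k] * (len(s) // k) == s:
            best = min(best, k + 2)          # "repeat this prefix" program
    return best

n = 12
L_Cn = max(L(s) for s in product((0, 1), repeat=n))
print(L(tuple([0, 1] * 6)))   # 01 repeated six times: a short "theory" (4)
print(L_Cn)                   # most sequences need nearly n + 1: "random"
```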
A final word. In scientific research it is generally considered better
for a proposed new theory to account for a phenomenon which had
not previously been contained in a theoretical structure, before the
discovery of that phenomenon rather than after. It may therefore be
of some interest to mention that the intuitive considerations of this
section antedated the investigations of Parts 1 and 2.
3.2. The definition which has just been proposed5 is one of many
attempts which have been made to define what one means by a patternless
or random sequence of numbers. One of these was begun by R. von
Mises [5] with contributions by A. Wald [6], and was brought to its culmination
by A. Church [7]. K. R. Popper [8] criticized this definition.
The definition given here deals with the concept of a patternless binary
sequence, a concept which corresponds roughly in intuitive intent
with the random sequences associated with probability half of Church.
However, the author does not follow the basic philosophy of the von
4 The author has subsequently learned of work of P. Martin-Löf ("The Definition
of Random Sequences," research report of the Institutionen för Försäkringsmatematik
och Matematisk Statistik, Stockholm, Jan. 1966, 21 pp.) establishing statistical
properties of sequences defined to be patternless on the basis of a type of
machine suggested by A. N. Kolmogorov. Cf. footnote 5.
5 The author has subsequently learned of the paper of A. N. Kolmogorov, Three
approaches to the definition of the concept "amount of information," Problemy
Peredachi Informatsii [Problems of Information Transmission], 1, 1 (1965), 3–11
[in Russian], in which essentially the definition offered here is put forth.
Mises-Wald-Church definition; instead, the author is in accord with
the opinion of Popper [8, Sec. 57, footnote 1]:
I come here to the point where I failed to carry out fully
my intuitive program: that of analyzing randomness as far
as it is possible within the region of finite sequences, and of
proceeding to infinite reference sequences (in which we need
limits of relative frequencies) only afterwards, with the aim
of obtaining a theory in which the existence of frequency
limits follows from the random character of the sequence.
Nonetheless the methods given here are similar to those of Church; the
concept of effective computability is here made the central one.
A discussion can be given of just how patternless or random the
sequences given in this paper appear to be for practical purposes. How
do they perform when subjected to statistical tests of randomness?
Can they be used in the Monte Carlo method? Here the somewhat
tantalizing remark of J. von Neumann [9] should perhaps be mentioned:
Any one who considers arithmetical methods of producing
random digits is, of course, in a state of sin. For, as has
been pointed out several times, there is no such thing as a
random number; there are only methods to produce random
numbers, and a strict arithmetical procedure of course
is not such a method. (It is true that a problem that we suspect
of being solvable by random methods may be solvable
by some rigorously defined sequence, but this is a deeper
mathematical question than we can now go into.)
Acknowledgment
The author is indebted to Professor Donald Loveland of New York
University, whose constructive criticism enabled this paper to be much
clearer than it would have been otherwise.
References
[1] Shannon, C. E. A universal Turing machine with two internal states. In Automata Studies, Shannon and McCarthy, Eds., Princeton U. Press, Princeton, N. J., 1956.
[2] —. A mathematical theory of communication. Bell Syst. Tech. J. 27 (1948), 379–423.
[3] Turing, A. M. On computable numbers, with an application to the Entscheidungsproblem. Proc. London Math. Soc. {2} 42 (1936–37), 230–265; Correction, ibid., 43 (1937), 544–546.
[4] Davis, M. Computability and Unsolvability. McGraw-Hill, New York, 1958.
[5] von Mises, R. Probability, Statistics and Truth. MacMillan, New York, 1939.
[6] Wald, A. Die Widerspruchsfreiheit des Kollektivbegriffes der Wahrscheinlichkeitsrechnung. Ergebnisse eines mathematischen Kolloquiums 8 (1937), 38–72.
[7] Church, A. On the concept of a random sequence. Bull. Amer. Math. Soc. 46 (1940), 130–135.
[8] Popper, K. R. The Logic of Scientific Discovery. U. of Toronto Press, Toronto, 1959.
[9] von Neumann, J. Various techniques used in connection with random digits. In John von Neumann, Collected Works, Vol. V, A. H. Taub, Ed., MacMillan, New York, 1963.
[10] Chaitin, G. J. On the length of programs for computing finite binary sequences by bounded-transfer Turing machines. Abstract 66T-26, Notic. Amer. Math. Soc. 13 (1966), 133.
[11] —. On the length of programs for computing finite binary sequences by bounded-transfer Turing machines II. Abstract 631-6, Notic. Amer. Math. Soc. 13 (1966), 228–229. (Erratum, p. 229, line 5: replace "P" by "L".)
ON THE LENGTH OF
PROGRAMS FOR
COMPUTING FINITE
BINARY SEQUENCES:
STATISTICAL
CONSIDERATIONS
J. ACM 16, 1 (Jan. 1969),
pp. 145–159
Gregory J. Chaitin1
Buenos Aires, Argentina
Abstract
An attempt is made to carry out a program (outlined in a previous paper)
for defining the concept of a random or patternless, finite binary
sequence, and for subsequently defining a random or patternless, infinite
binary sequence to be a sequence whose initial segments are all
random or patternless finite binary sequences. A definition based on
the bounded-transfer Turing machine is given detailed study, but insufficient
understanding of this computing machine precludes a complete
treatment. A computing machine is introduced which avoids these difficulties.
CR Categories:
5.22, 5.5, 5.6
1. Introduction
In this section a definition is presented of the concept of a random or
patternless binary sequence based on 3-tape-symbol bounded-transfer
Turing machines.2 These computing machines have been introduced
and studied in [1], where a proposal to apply them in this manner is
made. The results from [1] which are used in studying the definition
are listed for reference at the end of this section.
An N-state, 3-tape-symbol bounded-transfer Turing machine is defined
by an N-row, 3-column table. Each of the 3N places in this table
must contain an ordered pair $(i, j)$ of natural numbers where i takes on
values from $-b$ to b, and j from 1 to 5.3 These entries constitute, when
specified, the program of the N-state, 3-tape-symbol bounded-transfer
Turing machine and are to be interpreted as follows. An entry $(i, j)$
in the kth row and the pth column of the table means that when the
machine is in its kth state, and the square of its one-way infinite tape
1 Address: Mario Bravo 249, Buenos Aires, Argentina.
2 The choice of 3-tape-symbol machines is made merely for the purpose of fixing
ideas.
3 Here b is a constant whose value is to be regarded as fixed throughout this
paper. Its exact value is not important as long as it is not "too small." For an
explanation of the meaning of "too small," and proofs that b can be chosen so that
it is not too small, see [1, Secs. 2.1 and 2.2]. (b will not be mentioned again.)
[Figure: a bounded-transfer Turing machine, drawn as a black box with a scanner on a one-way infinite tape whose squares read 1 0 0 1 0 0 ...; a second diagram shows the machine halted with the tape reading 0 1 1 1 1 0 0 0 ...]
Section I: (1,4) (1,4) (1,4)
Section II consists of $L(B(q(S)))$ rows. It is a
program for calculating $B(q(S))$ consisting of the
smallest possible number of rows.
Section III: (1,4) (1,4) (1,4)
Section IV consists of $L(B(n))$ rows. It is a
program for calculating B(n) consisting of the
smallest possible number of rows.
Section V consists by definition of $c - 2$ rows.
It calculates the effectively computable function
$q^{-1}(q(S), n) = S$;
it finds the two arguments on the tape.
Figure 3. Proof of Theorem 1
Given a finite binary sequence, we may place a binary point to its
left and consider it to be the base-two notation for a nonnegative real
number r less than 1. Having done so it is natural to consider, say, the
ternary sequence used to represent r to the same degree of precision
in base-three notation. Let us define this formally for an arbitrary
base b. Suppose that the binary sequence S of length n represents a
real number r when a binary point is affixed to its left. Let n′ be the
smallest positive integer for which $2^n \le b^{n'}$. Now consider the set of all
reals represented by b-ary sequences of length n′ when a b-ary point is
affixed to the left end. Let r′ be the greatest of these reals which is less than or
equal to r, and let the b-ary sequence S′ be the one used to represent
r′ in base-b notation. S′ is the b-ary sequence which we will associate
with the binary sequence S. Note that no two binary sequences of the
same length are associated with the same b-ary sequence.
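This association is easy to compute. The sketch below (our code; the helper names are arbitrary, and the floating-point computation of n′ is adequate only for modest n) maps a binary sequence to its associated b-ary sequence exactly as described.

```python
from math import ceil, log

def to_base(x, b, width):
    """The width-digit base-b numeral for x, most significant digit first."""
    out = []
    for _ in range(width):
        x, d = divmod(x, b)
        out.append(str(d))
    return "".join(reversed(out))

def associated(S, b):
    """The b-ary sequence S' associated with the binary sequence S."""
    n = len(S)
    n_prime = ceil(n * log(2) / log(b))     # least n' with 2**n <= b**n'
    # Greatest r' = digits / b**n' not exceeding r = int(S, 2) / 2**n,
    # computed exactly in integer arithmetic:
    digits = (int(S, 2) * b ** n_prime) // 2 ** n
    return to_base(digits, b, n_prime)

print(associated("0110", 3))   # '101': r' = 10/27 is the greatest ternary
                               # value of length 3 not exceeding r = 6/16
```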
It is now possible to state the principal result of this section.
Theorem 3. Let $S_1, S_2, S_3, \ldots$ be a sequence of distinct, finite
binary sequences of lengths, respectively, $n_1, n_2, n_3, \ldots$ which satisfies
$L(S_k) \sim L(C_{n_k})$. Then the sequence $S'_1, S'_2, S'_3, \ldots$ of associated b-ary
sequences is simply normal.
$$\left[ nd\,H\!\left(\tfrac1b - \tfrac{e}{b-1},\ \ldots,\ \tfrac1b - \tfrac{e}{b-1},\ \tfrac1b + e\right) \Big/ \log_2 b \right]$$
Here
$$H(p_1, p_2, \ldots, p_b) \qquad \left(p_1 \ge 0,\ p_2 \ge 0,\ \ldots,\ p_b \ge 0,\ \sum_{i=1}^{b} p_i = 1\right)$$
is defined to be equal to
$$-\sum_{i=1}^{b} p_i \log_2 p_i,$$
where in this sum any terms $0 \log_2 0$ are to be replaced by 0.
The H function occurs because the logarithm to the base two of
$$\sum_{|\frac{k}{n'} - \frac1b| > e} \binom{n'}{k} (b-1)^{n'-k},$$
the number of b-ary sequences S′ of length n′ which satisfy (5), is asymptotic,
as n approaches infinity, to
$$n' H\!\left(\tfrac1b - \tfrac{e}{b-1},\ \ldots,\ \tfrac1b - \tfrac{e}{b-1},\ \tfrac1b + e\right),$$
which is in turn asymptotic to $nH/\log_2 b$, for $n' \sim n/\log_2 b$. This may
be shown by considering the ratio of successive terms of the sum and
using Stirling's approximation, $\log(n!) \sim n \log n$ [4, Ch. 6, Sec. 3].
To prove Lemma 1 we first define an ordering q by the following two
conditions:
(a) Consider two binary sequences (of length n) S and T whose associated
b-ary sequences (of length n′) S′ and T′ contain, respectively,
s and t occurrences of j. S comes before (after) T if
$$\left|\frac{s}{n'} - \frac1b\right| \text{ is greater (less) than } \left|\frac{t}{n'} - \frac1b\right|.$$
(b) If condition (a) doesn't settle which of the two sequences of length
n comes first, take S to come before (after) T if S′ represents (ignoring
0's to the left) a larger (smaller) number in base-b notation
than T′ represents.6
Proof. We now apply Theorem 1 to any binary sequence S of length
n such that its associated b-ary sequence S′ of length n′ satisfies (5).
Theorem 1 gives us
$$L(S) \le L(C_{[\log_2 q(S)]}) + L(C_{[\log_2 n]}) + c \qquad (6)$$
where, as we know from the paragraph before the last, for all sufficiently
large values of n,
$$\log_2 q(S) < \left(1 + \frac{d-1}{4}\right) \frac{nH}{\log_2 b}. \qquad (7)$$
From (3b) and (7) we obtain for large values of n,
$$L(C_{[\log_2 q(S)]}) < a\left(1 + \frac{d-1}{2}\right) \frac{nH}{\log_2 b}. \qquad (8)$$
And eq. (3b) implies that for large values of n,
$$L(C_{[\log_2 n]}) + c < a\,\frac{d-1}{4}\,\frac{nH}{\log_2 b}. \qquad (9)$$
Adding ineqs. (8) and (9), we see that ineq. (6) yields, for large values
of n,
$$L(S) < a\left(1 + \frac{3(d-1)}{4}\right) \frac{nH}{\log_2 b}.$$
Applying eq. (3b) to this last inequality, we see that for all sufficiently
large values of n,
$$L(S) < L(C_{[ndH/\log_2 b]}),$$
6 This condition was chosen arbitrarily for the sole purpose of "breaking ties."
$$\inf \frac{\text{length of subsequence of } S \text{ selected by } V}{\text{length of } S} > 0 \qquad (10)$$
where the infimum is taken over all finite binary sequences S. Then as
k approaches infinity, the ratio of the number of 0's in the subsequence
of $S_k$ which is selected by V to the number of 1's in this subsequence
tends to the limit 1.
Before proceeding to the proof it should be mentioned that a similar
result can be obtained for the generalized place selections due to
Loveland [9–11].
The proof of Theorem 4 runs parallel to the proof of Theorem 3. The
subsidiary result which is proved by taking in Theorem 1 the ordering
q defined below is
Corollary 1. Let e be a real number greater than 0, d be a real
number greater than 1, S be a binary sequence of length n, and let V
be a place selection which selects from S a subsequence S′ of length n′.
Suppose that
$$\frac{\text{the number of 0's in } S'}{n'} - \frac12 > e. \qquad \text{(a)}$$
Then for n′ greater than N we have
$$L(S) \le L(C_{[\log_2 q(S)]}) + L(C_{[\log_2 n]}) + c$$
where
$$\log_2 q(S) < n' d H(\tfrac12 + e,\ \tfrac12 - e) + (n - n').$$
Here N depends only on e and d, and c depends only on V.
Definition.8 Let S be a binary sequence of length n, let S′ of length
n′ be the subsequence of S selected by the place selection V, and let
S″ be the subsequence of S which is not selected by V. Let9
$$Q = F(S')\ S''\ 01\ B_1^2\ B_2^2\ B_3^2 \cdots$$
where each $B_i$ is a single bit and
$$1 B_1 B_2 B_3 \cdots = B(\text{the length of } F(S')).$$
We then define q(S) to be the unique solution of $B(q(S)) = Q$.
Definition. (Let us emphasize that F(S′) is never more than about
$$n' H(\tfrac12 + e,\ \tfrac12 - e)$$
bits long for S′ which satisfy supposition (a) of Cor. 1; this is the crux
of the proof.) Consider the "padded" numerals for the integers from
0 to $2^{n'} - 1$, padded to a length of n′ bits by adding 0's on the left.
8 ... and thus q(S) = q(T) for S and T of the same length only if S = T. But q(S)
is greater than $2^n$ for some binary sequences S of length n. To correct this it is
necessary to obtain the "real" ordering q from the ordering q′ that we define here;
however, as the result of this redefinition is to decrease the value of q(S) for some S,
this is a quibble.
9 Our superscript notation for concatenation is invoked here for the first time.
6. Fundamental Properties of the L-Function
In Sections 3–5 the random or patternless finite binary sequences have
been studied. Before turning our attention to the random or patternless
infinite binary sequences, we would like to show that many fundamental
properties of the L-function are simple consequences of the inequality
$L(SS') \le L(S) + L(S')$ taken in conjunction with the simple normality
of sequences of random finite binary sequences.
In Theorem 3 take $b = 2^k$ and let the infinite sequence $S_1, S_2, S_3, \ldots$
consist of all the elements of the various $C_n$'s. We obtain
Corollary 2. For any e > 0, k, and for all sufficiently large values
of n, consider any element S of $C_n$ to be divided into between $(n/k) - 1$
and $n/k$ nonoverlapping binary subsequences of length k with not
more than $k - 1$ bits left over at the right end of S. Then the ratio
of the number of occurrences of any particular one of the $2^k$ possible
binary subsequences of length k to $n/k$ differs from $2^{-k}$ by less than
e.
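Corollary 2 suggests an empirical test. The Python sketch below (ours; the random stand-in merely plays the role of an element of $C_n$) tallies non-overlapping k-bit blocks and compares their frequencies with $2^{-k}$.

```python
from collections import Counter
import random

def block_frequencies(S, k):
    """Relative frequency of each k-bit block among the floor(n/k)
    non-overlapping blocks of S."""
    blocks = [tuple(S[i:i + k]) for i in range(0, len(S) - k + 1, k)]
    counts = Counter(blocks)
    return {blk: c / len(blocks) for blk, c in counts.items()}

S = [random.randrange(2) for _ in range(100000)]  # stand-in for S in C_n
freqs = block_frequencies(S, k=3)
print(max(abs(f - 2 ** -3) for f in freqs.values()))  # small deviation
```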
Keeping in mind the hypothesis of Corollary 2, let S be some element
of $C_n$. Then we have $L(C_n) = L(S)$, and from Corollary 2 with
$$L(S) = L(S' S'' S''' \cdots) \le L(S') + L(S'') + L(S''') + \cdots$$
8. Statistical Properties of Infinite, Random or Patternless Binary Sequences
Results concerning the statistical properties of infinite, random or patternless
binary sequences follow from the corresponding results for finite
sequences. Thus Theorem 8 is an immediate consequence of Theorem
3, and Corollary 1 and eq. (3b) yield Theorem 9.
Theorem 8. Real numbers whose binary expansions are sequences
in $C_\infty$ are simply normal in every base.11
11 It is known from probability theory that a real r which is simply normal in
every base has the following property. Let b be a base, and denote by $a_n$ the nth
"digit" in the base-b expansion of r. Consider a b-ary sequence $c_1 c_2 \ldots c_m$. As n
approaches infinity, the ratio of the number of occurrences of this sequence among
$a_1 a_2 \ldots a_n$ to n tends to the limit $b^{-m}$.
Theorem 9. Any infinite binary sequence in $C_\infty$ is a collective with
respect to the set of place selections12 which are effectively computable
and satisfy the following condition: For any infinite binary sequence S,
$$\liminf_{k} \frac{\text{the number of bits in } S_k \text{ which are selected by } V}{k} > 0.$$
12 Wald [12] introduced the notion of a collective with respect to a set of place
selections; von Mises had originally permitted "all place selections which depend
only on elements of the sequence previous to the one being considered for selection."
13 The author has subsequently learned of Kolmogorov [13], in which a similar
kind of computing machine is used in essentially the same manner for the purpose
of defining a finite random sequence. Martin-Löf [14–15] studies the statistical
properties of these random sequences and puts forth a definition of an infinite random
sequence.
computing machine is understood, the subscript will be omitted) are
defined as follows:
$$L_M(S) = \begin{cases} \min_{M(P)=S} (\text{length of } P) \\ \infty & \text{if there are no such } P \end{cases}$$
$$L_M(C_n) = \max_{S \text{ of length } n} L_M(S).$$
In this general setting the program for the definition of a random or
patternless binary sequence assumes the following form: The patternless
or random finite binary sequences of length n are those sequences
S for which L(S) is approximately equal to $L(C_n)$. The patternless or
random infinite binary sequences S are those whose truncations $S_n$ are
all patternless or random finite sequences. That is, it is necessary that
for large values of n, $L(S_n) > L(C_n) - f(n)$ where f approaches infinity
slowly.
We define below a binary computing machine M which has, as is
easily seen, the following very convenient properties.
(a) $L(C_n) = n + 1$.
(b) Those binary sequences S of length n for which $L(S) < L(C_n) - m$
are less than $2^{n-m}$ in number.
(c) For any binary computer M′ there exists a constant c such that
for all finite binary sequences S, $L_M(S) \le L_{M'}(S) + c$.
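Property (b) is at bottom a counting fact. The sketch below (ours, and weaker than property (b) itself, which depends on the particular machine M defined in the paper) bounds the number of compressible sequences simply by counting the available short programs.

```python
def max_compressible(n, m):
    """Generic upper bound on the number of length-n sequences with
    L(S) < L(C_n) - m = n + 1 - m: each such S needs its own program,
    and there are only 2**0 + ... + 2**(n - m) = 2**(n + 1 - m) - 1
    binary programs shorter than n + 1 - m bits."""
    return 2 ** (n + 1 - m) - 1

n, m = 20, 5
print(max_compressible(n, m), 2 ** n)  # a vanishing fraction of all 2**n
```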
References
[1] Chaitin, G. J. On the length of programs for computing finite binary sequences. J. ACM 13, 4 (Oct. 1966), 547–569.
[2] von Neumann, J., and Morgenstern, O. Theory of Games and Economic Behavior. Princeton U. Press, Princeton, N. J., 1953.
[3] Hardy, G. H., and Wright, E. M. An Introduction to the Theory of Numbers. Oxford U. Press, Oxford, 1962.
[4] Feller, W. An Introduction to Probability Theory and Its Applications, Vol. I. Wiley, New York, 1964.
[5] Feinstein, A. Foundations of Information Theory. McGraw-Hill, New York, 1958.
[6] von Mises, R. Probability, Statistics, and Truth. Macmillan, New York, 1939.
16 Compare the last paragraph of Section 10.
17 In view of Section 10, it apparently is not possible by the methods of this paper
to replace the "$\log_2 k$" here by a significantly smaller function.
[7] Kolmogorov, A. N. On tables of random numbers. Sankhyā [A], 25 (1963), 369–376.
[8] Church, A. On the concept of a random sequence. Bull. Amer. Math. Soc. 46 (1940), 130–135.
[9] Loveland, D. W. Recursively Random Sequences. Ph.D. Diss., N.Y.U., June 1964.
[10] —. The Kleene hierarchy classification of recursively random sequences. Trans. Amer. Math. Soc. 125 (1966), 487–510.
[11] —. A new interpretation of the von Mises concept of random sequence. Z. Math. Logik Grundlagen Math. 12 (1966), 279–294.
[12] Wald, A. Die Widerspruchsfreiheit des Kollektivbegriffes der Wahrscheinlichkeitsrechnung. Ergebnisse eines mathematischen Kolloquiums 8 (1937), 38–72.
[13] Kolmogorov, A. N. Three approaches to the definition of the concept "quantity of information." Problemy Peredachi Informatsii 1 (1965), 3–11. (in Russian)
[14] Martin-Löf, P. The definition of random sequences. Res. Rep., Inst. Math. Statist., U. of Stockholm, Stockholm, 1966, 21 pp.
[15] —. The definition of random sequences. Inform. Contr. 9 (1966), 602–619.
[16] Löfgren, L. Recognition of order and evolutionary systems. In Computer and Information Sciences II, Academic Press, New York, 1967, pp. 165–175.
[17] Levin, M., Minsky, M., and Silver, R. On the problem of the effective definition of "random sequence". Memo 36 (revised), RLE and MIT Comput. Center, 1962, 10 pp.
ON THE SIMPLICITY AND
SPEED OF PROGRAMS FOR
COMPUTING INFINITE
SETS OF NATURAL
NUMBERS
J. ACM 16 (1969), pp. 407–422
Gregory J. Chaitin1
Buenos Aires, Argentina
Abstract
It is suggested that there are infinite computable sets of natural numbers
with the property that no infinite subset can be computed more simply
or more quickly than the whole set. Attempts to establish this without
restricting in any way the computer involved in the calculations are not
entirely successful. A hypothesis concerning the computer makes it possible
to exhibit sets without simpler subsets. A second and analogous
hypothesis then makes it possible to prove that these sets are also without
subsets which can be computed more rapidly than the whole set. It
is then demonstrated that there are computers which satisfy both hypotheses.
The general theory is momentarily set aside and a particular
Turing machine is studied. Lastly, it is shown that the second hypothesis
is more restrictive than requiring the computer to be capable of
calculating all infinite computable sets of natural numbers.
CR Categories:
5.22
Introduction
Call a set of natural numbers perfect if there is no way to compute infinitely
many of its members essentially better (i.e. simpler or quicker)
than computing the whole set. The thesis of this paper is that perfect
sets exist. This thesis was suggested by the following vague and
imprecise considerations.
One of the most profound problems of the theory of numbers is that
of calculating large primes. While the sieve of Eratosthenes appears to
be as simple and as quick an algorithm for calculating all the primes as
is possible, in recent times hope has centered on calculating large primes
by calculating a subset of the primes, those that are Mersenne numbers.
Lucas's test is simple and can test whether or not a Mersenne number is
a prime with rapidity far greater than is furnished by the sieve method.
If there are an infinity of Mersenne primes, then it appears that Lucas
has achieved a decisive advance in this classical problem of the theory
of numbers.2
An opposing point of view is that there is no way to calculate large
primes essentially better than to calculate them all. If this is the case
it apparently follows that there must be only finitely many primes.
1 Address: Mario Bravo 249, Buenos Aires, Argentina.
1. General Considerations
The notation and terminology of this paper are largely taken from Davis [3].
Definition 1. A computing machine Ψ is defined by a 2-ary nonvanishing
computable function ψ in the following manner. The natural
number n is part of the output Ψ(p, t) of the computer Ψ at time t resulting
from the program p if and only if the nth prime3 divides ψ(p, t).
The infinite set Ψ(p) of natural numbers which the program p causes
the computing machine Ψ to calculate is defined to be
$$\bigcup_t \Psi(p, t)$$
if infinitely many numbers are put out by the computer in numerical
order and without any repetition. Otherwise, Ψ(p) is undefined.
Definition 2. A program complexity measure λ is a computable
1-ary function with the property that only finitely many programs p
have the same complexity λ(p).
Definition 3. The complexity $\lambda_\Psi(S)$ of an infinite computable
set S of natural numbers as computed by the computer Ψ under the
complexity measure λ is defined to be equal to
$$\begin{cases} \min_{\Psi(p)=S} \lambda(p) & \text{if there are such } p \\ \infty & \text{otherwise.} \end{cases}$$
2 For Lucas's test, cf. Hardy and Wright [1, Sec. 15.5]. For a history of number
theory, cf. Dantzig [2], especially Sections 3.12 and B.8.
3 The 0th prime is 2, the 1st prime is 3, etc. The primes are, of course, used here
only for the sake of convenience.
I.e. $\lambda_\Psi(S)$ is the complexity of the simplest program which causes the
computer to calculate S, and if there is no such program,4 the complexity
is infinite.5
In this section we do not see any compelling reason for regarding
any particular computing machine and program complexity measure as
most closely representing the state of affairs with which number theorists
are confronted in their attempts to compute large primes as simply
and as quickly as possible.6 The four theorems of this section and their
extensions hold for any computer Ψ and any program complexity measure
λ. Thus, although we don't know which computer and complexity
measure to select, as this section holds true for all of them, we are
covered.
Theorem 1. For any natural number n, there exists an infinite
computable set S of natural numbers which has the following properties:
(a) $\lambda(S) > n$.
(b) For any infinite computable set R of natural numbers, $R \subseteq S$
implies $\lambda(R) \ge \lambda(S)$.
Proof. We first prove the existence of an infinite computable set A
of natural numbers having no infinite computable subset B such that
$\lambda(B) \le n$. The infinite computable sets C of natural numbers for
which $\lambda(C) \le n$ are finite in number. Each such C has a smallest
element c. Let the (finite) set of all these c be denoted by D. We take
$A = \bar{D}$.
Now let $A_0, A_1, A_2, \ldots$ be the infinite computable subsets of A. Consider
the following set:
$$E = \{\lambda(A_0), \lambda(A_1), \lambda(A_2), \ldots\}.$$
From the manner in which A was constructed, we know that each member
of E is greater than n. And as the natural numbers are well-ordered,
we also know that E has a smallest element r. There exists a natural
number s such that $\lambda(A_s) = r$. We take $S = A_s$, and we are finished.
Q.E.D.
4 This possibility can never arise for the simple-program computers or the quick-program
computers introduced later; such computers can be programmed to compute
any infinite computable set of natural numbers.
5 A more formal definition would perhaps use ω, the first transfinite ordinal,
instead of ∞.
6 In Sections 2 and 3 the point of view is different; some computing machines are
dismissed as degenerate cases and an explicit choice of program complexity function
is suggested.
Theorem 2. For any natural number n and any infinite computable
set T of natural numbers with infinite complement, there exists a computable
set S of natural numbers which has the following property:
$T \subseteq S$ and $\lambda(S) > n$.
Proof. There are infinitely many computable sets of natural numbers
which have T as a subset, but the infinite computable sets F of
natural numbers for which $\lambda(F) \le n$ are finite in number. Q.E.D.
Theorem 3. For any 1-ary computable function f, there exists an
infinite computable set S of natural numbers which has the following
property: $\Psi(p) \subseteq S$ implies the existence of a $t_0$ such that for $t > t_0$,
$n \in \Psi(p, t)$ only if $t > f(n)$.
Proof. We describe a procedure for computing S in successive stages
(each stage being divided into two successive steps); during the kth
stage it is determined in the following manner whether or not $k \in S$.
Two subsets of the computing machine programs p such that $p < [k/4]$
are considered: set A, consisting of those programs which have
been "eliminated" during some stage previous to the kth; and set B,
consisting of those programs not in A which cause Ψ to output the
natural number k during the first f(k) time units of calculation.
Step 1. Put k in S if and only if B is empty.
Step 2. Eliminate all programs in B (i.e. during all future stages
they will be in A).
The above constructs S. That S contains infinitely many natural
numbers follows from the fact that up to the kth stage at most k/4
programs have been eliminated, and thus at most k/4 natural numbers
less than or equal to k can fail to be in S.7
7 I.e. the Schnirelman density d(S) of S is greater than or equal to 3/4. It follows
from $d(S) \ge 3/4$ that S is a basis of the second order; i.e. every natural number can
be expressed as the sum of two elements of S. Cf. Gelfond and Linnik [4, Sec. 1.1].
We conclude that the mere fact that a set is a basis of the second order for the
natural numbers does not provide a quick means for computing infinitely many of
its members.
It remains to show that $\Psi(p) \subseteq S$ implies the existence of a $t_0$ such
that for $t > t_0$, $n \in \Psi(p, t)$ only if $t > f(n)$. Note that for $n \ge 4p + 4$,
$n \in \Psi(p, t)$ only if $t > f(n)$. For a value of n for which this failed to be
the case would assure p's being in A, which is impossible. Thus given
a program p such that $\Psi(p) \subseteq S$, we can calculate a point at which the
program has become slow and will remain so; i.e. we can calculate a
permissible value for $t_0$. In fact, $t_0(p) = \max_{j < 4p+4} f(j)$. Q.E.D.
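The stage construction can be simulated directly. In the Python sketch below (ours; the oracle outputs(p, k, T) is a hypothetical stand-in for running program p for T time units), each stage either admits k into S or eliminates every fast program that produced k.

```python
def build_S(num_stages, f, outputs):
    """outputs(p, k, T) -> bool is an assumed oracle: does program p
    output k within T time units? Returns the finite portion of S."""
    S, eliminated = set(), set()
    for k in range(num_stages):
        B = {p for p in range(k // 4)          # programs p < [k/4] ...
             if p not in eliminated and outputs(p, k, f(k))}
        if not B:                              # Step 1
            S.add(k)
        eliminated |= B                        # Step 2
    return S

# Toy oracle: program p "quickly" outputs the multiples of p + 2.
toy = lambda p, k, T: k % (p + 2) == 0
S = build_S(200, lambda k: k ** 2, toy)
print(len(S), sorted(S)[:10])   # at least 3/4 of the numbers survive,
                                # as footnote 7 observes
```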
The following theorem and the type of diagonal process used in its
proof are similar in some ways to Blum's exposition of a theorem of
Rabin in [5, pp. 241–242].
Theorem 4. For any 1-ary computable function f and any infinite
computable set T of natural numbers with infinite complement, there
exists an infinite computable set S of natural numbers which is a superset
of T and which has the following property: $\Psi(p) = S$ implies the
existence of a $t_0$ such that for $t > t_0$, $n \in \Psi(p, t)$ only if $t > f(n)$.
Proof. First we define three functions: a(n) is equal to the nth
natural number in $\bar{T}$; b(n) is equal to the smallest natural number j
greater than or equal to n such that $j \in \bar{T}$ and $j + 1 \notin \bar{T}$; and c(n) is
equal to $\max_{n \le k \le b(n)} f(k)$. As proof, we give a process for computing
$S \cap \bar{T}$ in successive stages; during the kth stage it is determined in the
following manner whether or not $a(k) \in S$. Consider the computing
machine programs $0, 1, 2, \ldots, k$ to fall into two mutually exclusive sets:
set A, consisting of those programs which have been eliminated during
some stage previous to the kth; and set B, consisting of all others.
Step 1. Determine the set C consisting of the programs in B which
cause the computing machine Ψ to output during the first c(a(k)) time
units of calculation any natural numbers greater than or equal to a(k)
and less than or equal to b(a(k)).
Step 2. Check whether C is empty. Should $C = \emptyset$, we neither
eliminate programs nor put a(k) in S; we merely proceed to the next
(the (k + 1)-th) stage. Should $C \ne \emptyset$, however, we proceed to step 3.
Step 3. We determine $p_0$, the smallest natural number in C.
Step 4. We ask, "Does the program $p_0$ cause Ψ to output the number
a(k) during the first c(a(k)) time units of calculation?" According
as the answer is "no" or "yes" we do or don't put a(k) in S.
Step 5. Eliminate $p_0$ (i.e. during future stages $p_0$ will be in A).
The above constructs S. We leave to the reader the verification that
the constructed S has the desired properties. Q.E.D.
We now make a number of remarks.
Remark 1. We have actually proved somewhat more. Let U be
any infinite computable set of natural numbers. Theorems 1 and 3
hold even if it is required that the set S whose existence is asserted
be a subset of U. And if in Theorems 2 and 4 we make the additional
assumption that T is a subset of U, and $U \setminus T$ is infinite, then we can
also require that S be a subset of U.
The above proofs can practically be taken word for word (with obvious
changes which may loosely be summarized by the command "ignore
natural numbers not in U") as proofs for these extended theorems. It
is only necessary to keep in mind the essential point, which in the case
of Theorem 3 assumes the following form. If during the kth stage of
the diagonal process used to construct S we decide whether to put in
S the kth element of U, we are still sure that $\Psi(p) \subseteq S$ is impossible
for all the p which were eliminated before. For if $\Psi(p) \subseteq U$, then p
is eliminated as before; while if $\Psi(p)$ has elements not in U, then it is
clear that $\Psi(p) \subseteq S$ is impossible, for S is a subset of U.
Remark 2. In Theorems 1 and 2 we see two possible extremes
for S. In Theorem 1 we contemplate an arbitrarily complex infinite
computable set of natural numbers that has the property that there
is no way to compute infinitely many of its members which is simpler
than computing the whole set. On the other hand, in Theorem 2 we
contemplate an infinite computable set of natural numbers that has the
property that there is a way to compute infinitely many of its members
which is very much simpler than computing the whole set. Theorems
3 and 4 are analogous to Theorems 1 and 2, but Theorem 3 does not
go as far as Theorem 1. Although Theorem 3 asserts the existence
of infinite computable sets of natural numbers which have no infinite
subsets which can be computed quickly, it does not establish that no
infinite subset can be computed more quickly than the whole set. In this
generality we are unable to demonstrate a Theorem 3 truly analogous
to Theorem 1, although an attempt to do so is made in Remark 5.
Remark 3. The restriction in the conclusions of Theorems 3 and
4 that t be greater than $t_0$ is necessary. For as Arbib remarks in [6,
p. 8], in some computers Ψ any finite part of S can be computed very
quickly by a table look-up procedure.
Remark 4. The 1-ary computable function f of Theorems 3 and
4 can go to infinity very quickly indeed with increasing values of its
argument. For example, let $f_0(n) = 2^n$, $f_{k+1}(n) = f_k(f_k(n))$. For each
k, $f_{k+1}(n)$ is greater than $f_k(n)$ for all but a finite number of values
of n. We may now proceed from finite ordinal subscripts to the first
transfinite ordinal by a diagonal process: $f_\omega(n) = \max_{k \le n} f_k(n)$. We
choose to continue the process up to $\omega^2$ in the following manner, which
is a natural way to proceed (i.e. the fundamental sequences can be
computed by simple programs) but which is by no means the only way
to get to $\omega^2$. i and j denote finite ordinals.
$$f_{\omega i + j + 1}(n) = f_{\omega i + j}(f_{\omega i + j}(n))$$
$$f_{\omega(i+1)}(n) = \max_{k \le n} f_{\omega i + k}(n)$$
$$f_{\omega^2}(n) = \max_{k \le n} f_{\omega k}(n).$$
Taking $f = f_{\omega^2}$ in Theorem 3 yields an S such that any attempt
to compute infinitely many of its elements requires an amount of time
which increases almost incomprehensibly quickly with the size of the
elements computed.
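The finite and first transfinite levels of this hierarchy transcribe directly into Python (our code; only tiny arguments are feasible, since the values explode):

```python
def f(k, n):
    """f_k(n) for finite subscripts k: f_0(n) = 2**n, f_{k+1} = f_k o f_k."""
    if k == 0:
        return 2 ** n
    return f(k - 1, f(k - 1, n))

def f_omega(n):
    """The first diagonalization: f_omega(n) = max of f_k(n), k <= n."""
    return max(f(k, n) for k in range(n + 1))

print(f(0, 4))                # 16
print(f(1, 3))                # f_0(f_0(3)) = 2**8 = 256
print(len(str(f(2, 2))))      # f_2(2) = 2**65536 has 19729 decimal digits
print(f_omega(2) == f(2, 2))  # True: the diagonal at n = 2 is f_2(2)
```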
More generally, the above process may be continued through to
any constructive ordinal.8 For example, there are more or less natural
manners to reach $\epsilon_0$, the first epsilon-number; the territory up to it is
very well charted.9
The above is essentially a constructive version of remarks by Borel
[8] in an appendix on a theorem of P. du Bois-Reymond. These remarks
are partly reproduced in Hardy [9].
Remark 5. Remark 4 suggests the following approach to the speed
of programs. For any constructive ordinal α there is a computable 2-ary
function f (by no means unique) with the property that the set of 1-ary
functions $f_k$ defined by $f_k(n) = f(k, n)$ is a representative of α when
ordered in such a manner that a function g comes before a function h
if and only if $g(n) < h(n)$ holds for all but a finite number of values of
n. We now associate an ordinal Ord(S) with each infinite computable
set S of natural numbers in accordance with the following rules:
8 Cf. Davis [3, Sec. 11.4] for a definition of the concept of a constructive ordinal
number.
9 Cf. Fraenkel [7, pp. 207–208].
(a) Ord(S) equals the smallest ordinal $\beta < \alpha$ such that $f_{k_0}$, the
βth element of the set of functions $f_k$, has the following property:
There exists a program p and a time $t_0$ such that $\Psi(p) = S$ and
for $t > t_0$, $n \in \Psi(p, t)$ only if $t \le f_{k_0}(n)$.
(b) If (a) fails to define Ord(S) (i.e. if the set of ordinals is empty),
then $Ord(S) = \alpha$.
Then for any constructive ordinal α we have the following analogue
to Theorem 1.
Theorem 1′. Any infinite computable set T of natural numbers
has an infinite computable subset S with the following properties:
(a) $Ord(S) \ge Ord(T)$.
(b) For any infinite computable set R of natural numbers, $R \subseteq S$
implies $Ord(S) \le Ord(R)$.
Proof. Let $T_0, T_1, T_2, \ldots$ be the infinite computable subsets of T.
Consider the following set of ordinal numbers less than or equal to α:
$$\{Ord(T_0), Ord(T_1), Ord(T_2), \ldots\}.$$
2.A. Simplicity
In this subsection we make one explicit choice for the program complexity
measure λ. We consider programs to be finite binary sequences
as well as natural numbers:
Binary Sequence: Λ, 0, 1, 00, 01, 10, 11, 000, 001, 010, 011, ...
Natural Number: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
Henceforth, when we denote a program by a lowercase (uppercase)
Latin letter, we are referring to the program considered as a natural
number (binary sequence). Next we define the complexity of a program
P to be the number of bits in P (i.e. its length). I.e. the complexity
λ(p) of a program p is equal to $[\log_2(p + 1)]$, the greatest integer not
greater than the base-2 logarithm of $p + 1$.
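This correspondence and complexity measure transcribe directly (our code; the empty sequence Λ corresponds to 0):

```python
from math import floor, log2

def as_sequence(p):
    """Binary sequence (possibly empty) corresponding to the natural
    number p: write p + 1 in base two and delete the leading 1."""
    return bin(p + 1)[3:]          # strip '0b' and the leading 1

def complexity(p):
    """lambda(p) = [log2(p + 1)], which equals len(as_sequence(p))."""
    return floor(log2(p + 1))

for p in range(8):
    print(p, repr(as_sequence(p)), complexity(p))
# 0 '' 0, 1 '0' 1, 2 '1' 1, 3 '00' 2, ..., 7 '000' 3
```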
We now introduce the simple-program computers. Computers similar
to them have been used in Solomonoff [11], Kolmogorov [12], and
in [13].
Definition 5. A simple-program computer Ψ has the following
property: For any computer Φ, there exists a natural number c such
that $\lambda_\Psi(S) \le \lambda_\Phi(S) + c$ for all infinite computable sets S of natural
numbers.
To the extent that it is plausible to consider all computer programs
to be binary sequences, it seems plausible to consider all computers
which are not simple-program computers as unnecessarily awkward degenerate
cases which are unworthy of attention.
Remark 7. Note that if Ψ and Φ are two simple-program computers,
then there exists a natural number c which has the following
property: $|\lambda_\Psi(S) - \lambda_\Phi(S)| \le c$ for all infinite computable sets S of
natural numbers. In fact we can take
$$c = \max(c_{\Psi\Phi}, c_{\Phi\Psi}).$$
Theorem 5. For any connecting function Γ, there exists a simple-program
computer $\Psi_\Gamma$ which has the following property: For any Γ-connected
set S and any infinite computable subset R of S,
$$\lambda_{\Psi_\Gamma}(S) \le \lambda_{\Psi_\Gamma}(R).$$
$$\begin{cases} \Psi_\Gamma(\Lambda, t) = \emptyset \\ \Psi_\Gamma(P0, t) = \Psi(P, t) \\ \Psi_\Gamma(P1, t) = \bigcap_{t' < t} \Psi_\Gamma(P1, t') \cap \Gamma\big(\bigcup_{n \in \Psi(P, t)} \Gamma^{-1}(n)\big). \end{cases} \qquad (1)$$
As Ψ is a simple-program computer, so is $\Psi_\Gamma$, for $\Psi_\Gamma(P0, t) = \Psi(P, t)$.
$\Psi_\Gamma$ also has the following very important property: For all programs P1
for which $\Psi_\Gamma(P1)$ is a subset of some Γ-connected set S, $\Psi_\Gamma(P1) = S$.
Moreover, $\Psi_\Gamma(P1)$ cannot be a proper subset of any Γ-connected set.
In summary, given a P such that $\Psi_\Gamma(P)$ is a proper subset of a Γ-connected
set S, then by changing the rightmost bit of P to a 1 we get
a program P′ with the property that $\Psi_\Gamma(P') = S$. This implies that for
any infinite computable subset R of a Γ-connected set S,
$$\lambda_{\Psi_\Gamma}(S) \le \lambda_{\Psi_\Gamma}(R).$$
Q.E.D.
In view of Remark 7, the following theorem is merely a corollary to
Theorem 5.
Theorem 6. Consider a simple-program computer Ψ. For any
connecting function Γ, there exists a natural number $c_\Gamma$ which has the
following property: For any Γ-connected set S and any infinite computable
subset R of S, $\lambda_\Psi(S) \le \lambda_\Psi(R) + c_\Gamma$. In fact, we can take11
$$c_\Gamma = 2 \max(c_{\Psi\Psi_\Gamma}, c_{\Psi_\Gamma\Psi}).$$
11 That $c_\Gamma = c_{\Psi\Psi_\Gamma} + c_{\Psi_\Gamma\Psi}$ will do follows upon taking a slightly closer look at
the matter.
2.B. Speed
This treatment runs parallel to that of subsection 2.A.
Definition 6. A quick-program computer Ψ has the following property:
For any computer Φ, there exists a 1-ary computable function s
such that for all programs p for which Φ(p) is defined, there exists a
program p′ such that $\Psi(p') = \Phi(p)$ and
$$\bigcup_{t' \le t} \Phi(p, t') \subseteq \bigcup_{t' \le s(t)} \Psi(p', t')$$
for all t.
sible, we now cast it into the framework of lattice theory, cf. Birkhoff [15].
Definition L1. Let $\Psi_1$ and $\Psi_2$ be computing machines. $\Psi_1$ im $\Psi_2$
($\Psi_1$ can be imitated by $\Psi_2$) if and only if there exists a 1-ary computable
function f which has the following property: For any program p for
which $\Psi_1(p)$ is defined, there exists a program p′ such that $\Psi_2(p') = \Psi_1(p)$ and
$$\bigcup_{t' \le t} \Psi_1(p, t') \subseteq \bigcup_{t' \le f(t)} \Psi_2(p', t')$$
for all t.
$$S_2 = \bigcup_{t' \le t} \Psi_2(L(p), t');$$
explicitly:
$$s_\Phi(n) = \max_{k \le n} \#\phi(k, n).$$
Here φ is, of course, the 2-ary computable function which defines the
computer Φ as in Definition 1. Q.E.D.
where n stands for the largest element of the left-hand side of the
relation, if this set is not empty (otherwise, n stands for 0).
Proof. p′ is obtained from p in the following manner. $c_\Gamma$ rows are
added to the table defining the program p. All transfers to the next
to the last row in the program p are replaced by transfers to the first
row of the added section. The new rows of the table use the program
p as a subroutine. They make the program p think that it is working
as usual, but actually p is using neither the quadrant's three edge rows
nor the three edge columns; p has been fooled into thinking that these
squares do not exist because the new rows moved the scanner to the
fourth square on the diagonal of the quadrant before turning control
over to p for the first time by transferring to the last row of p. This
protected region is used by the new rows to do their scratch-work, and
also to keep permanent records of all natural numbers which they cause
Ω to output.
Every time the subroutine thinks it is making Ω output a natural
number n, it actually only passes n and control to the new rows. These
proceed to find out which natural numbers are in $\Gamma(\Gamma^{-1}(n))$. Then
the new rows eliminate those elements of $\Gamma(\Gamma^{-1}(n))$ which Ω put out
previously. Finally, they make Ω output those elements which remain,
move the scanner back to what the subroutine last thought was its
position, and return control to the subroutine. Q.E.D.
12 This implies
$$\lambda(S) \le \lambda(\Omega(p)) + c_\Gamma;$$
i.e.
$$\lambda(S) \le \lambda(R) + c_\Gamma$$
for any infinite computable subset R of the Γ-connected set S.
Remark A. Assuming that only the computer Ω and program
complexity measure μ are of interest, it appears that we have before
us some connected sets which are in a very strong sense perfect sets.
For, as was mentioned in Remark 6, there are Γ-connected sets which
Ω must compute very slowly. For such sets, the term $s_\Gamma(n)$ in (b) above
is negligible compared with t.
Theorem A2. Consider a simple-program computer Ψ and the
program complexity measure $\lambda(p) = [\log_2(p + 1)]$. Let $S_0, S_1, S_2, \ldots$
be a sequence of distinct infinite computable sets of natural numbers.
Then we may conclude that
$$\lim_{k \to \infty} \frac{\lambda(S_k)}{2\mu(S_k)\log_2 \mu(S_k)}$$
$$n_\# = n_{m_0},$$
$$n_k \in \Psi(P, t_k) \quad (0 \le k \le m).$$
Define A(i, j) (the predicate "the program j is eliminated during the
ith stage"),14 A (the set of programs eliminated before the $m_0$th stage),
and A′ (the set of programs eliminated before or during the $m_0$th stage)
as follows:
$$A(i, j) \iff j < [i/4] \text{ and } n_i \in \bigcup_{t' \le t_i} \Psi_1(j, t'),$$
where
$$\bigcup_{t_3 \le t} \Psi_1(p, t_3) = \{s_0, s_1, s_2, \ldots\} \quad (s_0 < s_1 < s_2 < \cdots)$$
and for no $n_1 \ge n_2$ and $t_1 < t_2 \le t$ is it simultaneously the case that
$n_1 \in \Psi_1(p, t_1)$ and $n_2 \in \Psi_1(p, t_2)$. Note that for all p, $\Psi_1(p) = \Psi_0(p)$,
both sides of the equation being undefined if one of them is.
14 During the ith stage of this diagonal process it is decided whether or not the
ith element of Ψ(P) is in $\Psi_1(P_1)$.
Sets S whose existence is asserted by Theorem B1 which must be
computed very slowly by $\Psi_1$ must be computed very much more slowly
indeed by $\Psi_0$, and thus $\Psi_1$ im $\Psi_0$ cannot be the case. Moreover, within
any infinite computable set U of natural numbers, there are such sets
S.
We now show in greater detail that $(\Psi_0) < (\Psi_1) = 1$ by a reductio
ad absurdum of $\Psi_1$ im $\Psi_0$. Suppose $\Psi_1$ im $\Psi_0$. Then by definition
there exists a 1-ary computable function h such that for any program p
for which $\Psi_1(p)$ is defined, there exists a program p′ such that $\Psi_0(p') = \Psi_1(p)$ and
$$\bigcup_{t' \le t} \Psi_1(p, t') \subseteq \bigcup_{t' \le h(t)} \Psi_0(p', t')$$
for all t.
Here, as usual, $S = \{s_0, s_1, s_2, \ldots\}$ ($s_0 < s_1 < s_2 < \cdots$). Note
that $s_k$, the kth element of S, must be greater than or equal to k:
3. $s_k \ge k$.
By the definition of $\Psi_1$ im $\Psi_0$ we must have
$$h(g(n) + 1) \ge g(s_{g(n)+1}) + 1$$
for all but finitely many $n \in S$. By (2) this implies
$$h(g(n) + 1) > \max_{k \le s_{g(n)+1}} h(k) + 1;$$
hence $g(n) + 1 > s_{g(n)+1}$ for all but finitely many $n \in S$. Invoking (3)
we obtain $g(n) + 1 > s_{g(n)+1} \ge g(n) + 1$, which is impossible. Q.E.D.
A slightly different way of obtaining the following theorem was announced
in [20].
Theorem B3. Any countable partially ordered set is order-isomorphic
with a subset of L. That is, L is a "universal" countable
partially ordered set.
Proof. We show that an example of a universal partially ordered set
is C, the computable sets of natural numbers ordered by set inclusion.
Thus the theorem is established if we can find in L an isomorphic image
of C. This isomorphic image is obtained in the following manner. Let
S be a computable set of natural numbers. Let S′ be the set of all odd
multiples of $2^n$, where n ranges over all elements of S. The isomorphic
image of the element S of C is the element l.u.b.$((\Psi_0), (\Psi_1^{S'}))$ of L. Here
$\Psi_1^{S'}$ is as in Definition B2.
It only remains to prove that C is a universal partially ordered set.
Sacks [17, p. 53] attributes to Mostowski [21] the following result: There
is a universal countable partially ordered set $A = \{a_0, a_1, a_2, \ldots\}$ with
the property that the predicate $a_n \le a_m$ is computable. We finish the
proof by constructing in C an isomorphic image $A' = \{A_0, A_1, A_2, \ldots\}$
of A as follows:
$$A_i = \{k \mid a_k \le a_i\}.$$
[It is easy to see that $A_i \subseteq A_j$ if and only if $a_i \le a_j$.] Q.E.D.
Corollary B2. L has exactly $\aleph_0$ elements.
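The embedding used in the proof can be illustrated on a finite example (our code and sample order; the predicate leq plays the role of the computable relation $a_n \le a_m$):

```python
def embed(leq, m):
    """Map element a_i of a partial order to A_i = {k : a_k <= a_i};
    then A_i is a subset of A_j if and only if a_i <= a_j."""
    return [frozenset(k for k in range(m) if leq(k, i)) for i in range(m)]

# Divisibility on {1,...,6} as a sample partial order.
vals = [1, 2, 3, 4, 5, 6]
leq = lambda i, j: vals[j] % vals[i] == 0
A = embed(leq, len(vals))
assert all((A[i] <= A[j]) == leq(i, j)
           for i in range(6) for j in range(6))   # order-isomorphism
print([sorted(vals[k] for k in s) for s in A])
```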
References
[1] Hardy, G. H., and Wright, E. M. An Introduction to the Theory
of Numbers. Clarendon Press, Oxford, 1962.
[2] Dantzig, T. Number, the Language of Science. Macmillan, New
York, 1954.
[3] Davis, M. Computability and Unsolvability. McGraw-Hill, New
York, 1958.
[4] Gelfond, A. O., and Linnik, Yu. V. Elementary Methods in
Analytic Number Theory. Rand McNally, Chicago, 1965.
[5] Blum, M. Measures on the computation speed of partial recursive
functions. Quart. Prog. Rep. 72, Res. Lab. Electronics, MIT,
Cambridge, Mass., Jan. 1964, pp. 237–253.
[6] Arbib, M. A. Speed-up theorems and incompleteness theorems.
In Automata Theory, E. R. Caianiello (Ed.), Academic Press, New
York, 1966, pp. 6–24.
[7] Fraenkel, A. A. Abstract Set Theory. North-Holland, Amsterdam,
The Netherlands, 1961.
[8] Borel, É. Leçons sur la Théorie des Fonctions. Gauthier-Villars,
Paris, 1914.
[9] Hardy, G. H. Orders of Infinity. Cambridge Math. Tracts, No.
12, U. of Cambridge, Cambridge, Eng., 1924.
[10] Dekker, J. C. E., and Myhill, J. Retraceable sets. Canadian
J. Math. 10 (1958), 357–373.
[11] Solomonoff, R. J. A formal theory of inductive inference, Pt.
I. Inform. Contr. 7 (1964), 1–22.
[12] Kolmogorov, A. N. Three approaches to the definition of the
concept "amount of information." Problemy Peredachi Informatsii
1 (1965), 3–11. (Russian)
[13] Chaitin, G. J. On the length of programs for computing finite
binary sequences: statistical considerations. J. ACM 16, 1 (Jan.
1969), 145–159.
[14] Arbib, M. A., and Blum, M. Machine dependence of degrees of
difficulty. Proc. Amer. Math. Soc. 16 (1965), 442–447.
[15] Birkhoff, G. Lattice Theory. Amer. Math. Soc. Colloq. Publ.
Vol. 25, Amer. Math. Soc., Providence, R. I., 1967.
[16] Chaitin, G. J. On the length of programs for computing finite
binary sequences. J. ACM 13, 4 (Oct. 1966), 547–569.
[17] Sacks, G. E. Degrees of Unsolvability. No. 55, Annals of Math.
Studies, Princeton U. Press, Princeton, N. J., 1963.
[18] Hartmanis, J., and Stearns, R. E. On the computational
complexity of algorithms. Trans. Amer. Math. Soc. 117 (1965),
285–306.
[19] Blum, M. A machine-independent theory of the complexity of
recursive functions. J. ACM 14, 2 (Apr. 1967), 322–336.
[20] Chaitin, G. J. A lattice of computer speeds. Abstract 67T-397,
Notices Amer. Math. Soc. 14 (1967), 538.
[21] Mostowski, A. Über gewisse universelle Relationen. Ann. Soc.
Polon. Math. 17 (1938), 117–118.
[22] Blum, M. On the size of machines. Inform. Contr. 11 (1967),
257–265.
[23] Chaitin, G. J. On the difficulty of computations. Panamerican
Symp. of Appl. Math., Buenos Aires, Argentina, Aug. 10, 1968.
(to be published)
ON THE LENGTH OF
PROGRAMS FOR
COMPUTING FINITE
BINARY SEQUENCES BY
BOUNDED-TRANSFER
TURING MACHINES
AMS Notices 13 (1966), p. 133
with the rest of the tape blank, and scanning the first blank square of
the tape. Define L(S) for any finite binary sequence S by: A Turing
machine with n internal states can be programmed to compute S if and
only if n ≥ L(S). Define L(Cn) by L(Cn) = max L(S), where S is any
binary sequence of length n. Let Cn be the set of all binary sequences
of length n satisfying L(S) = L(Cn).
Then
(1) L(Cn) ∼ an.
(2) There exists a constant c such that for all m and n, those binary
sequences S of length n satisfying

   L(S) < L(Cn) − [log2 n] − m − c

are fewer than 2^{n−m} in number.
(3) For any e > 0 and d > 1, for all n sufficiently large, if S is a
binary sequence of length n such that the ratio of the number of 0's in
S to n differs from 1/2 by more than e, then

   L(S) < L(C_{[ndH(1/2+e, 1/2−e)]}).

Here

   H(p, q) = −p log2 p − q log2 q.
We propose also that elements of Cn be considered the most patternless
or random binary sequences of length n. This leads to a definition
and theory of randomness related to the R. von Mises–A. Wald–A.
Church theory, but in accord with some criticisms of K. R. Popper.
(Received October 19, 1965.)
ON THE LENGTH OF
PROGRAMS FOR
COMPUTING FINITE
BINARY SEQUENCES BY
BOUNDED-TRANSFER
TURING MACHINES II
AMS Notices 13 (1966), pp. 228–229
operation. Hence

   L(C_{n+m}) ≤ L(Cn) + L(Cm).

With (1) this subadditivity property yields
(5) an ≤ L(Cn)
(actually, subadditivity is used in the proof of (1)). Also,
(6) for any natural number k, if an element of Cn is partitioned
into successive subsequences of length k, then each of the 2^k possible
subsequences will occur ∼ 2^{−k}(n/k) times.
(6) follows from (1) and a generalization of (3). (4), (5) and (6) give
immediately
(7) an ≤ 2^{−n} Σ L(S),
where the summation is over binary sequences S of length n. Denote
the binary sequence of length n consisting entirely of zeros by 0^n. As
L(0^n) = O(log n), for n sufficiently large

   L(Cn) > 2^{−n} Σ L(S) ≥ an,

or
(8) an < L(Cn).
For each k it follows from (4) and (6) that for s sufficiently large

   L(Cs) = L(S) = L(S′ 0^k S″),

where

   S′ 0^k S″ = S ∈ Cs,

so that

   L(Cs) ≤ L(Cn) + L(Cm) + L(0^k)   (n + m = s − k).

This last inequality yields
(9) (L(Cn) − an) is unbounded.
(Received January 6, 1966.)
COMPUTATIONAL
COMPLEXITY AND
GÖDEL'S
INCOMPLETENESS
THEOREM
AMS Notices 17 (1970), p. 672
COMPUTATIONAL
COMPLEXITY AND
GÖDEL'S
INCOMPLETENESS
THEOREM
ACM SIGACT News, No. 9
(April 1971), pp. 11–12
G. J. Chaitin
IBM World Trade, Buenos Aires
Abstract
Given any simply consistent formal theory F of the state complexity
L(S) of finite binary sequences S as computed by 3-tape-symbol Turing
machines, there exists a natural number L(F) such that L(S) > n is
provable in F only if n < L(F). On the other hand, almost all finite
binary sequences S satisfy L(S) > L(F). The proof resembles Berry's
paradox, not the Epimenides nor Richard paradoxes.
INFORMATION-
THEORETIC ASPECTS OF
THE TURING DEGREES
AMS Notices 19 (1972), pp. A-601, A-602
false: ∃x ∀y P(x, y, a) (a < n).
(Received June 19, 1972.)
INFORMATION-
THEORETIC ASPECTS OF
POST'S CONSTRUCTION
OF A SIMPLE SET
AMS Notices 19 (1972), p. A-712
(Received June 19, 1972.)
ON THE DIFFICULTY OF
GENERATING ALL
BINARY STRINGS OF
COMPLEXITY LESS
THAN N
AMS Notices 19 (1972), p. A-764
where the maximum is taken over all number-theoretic functions f of
complexity less than n.
Let

   σ(n) = Σ (the length of S),

where the sum is taken over all binary strings S of complexity less than
n.
Take f ≈ g to mean that there are c and c′ such that for all n,
f(n) ≤ g(n + c) and g(n) ≤ f(n + c′).
Theorem. The maximum defined above, regarded as a function of n,
is ≈ σ.
(Received June 19, 1972.)
ON THE GREATEST
NATURAL NUMBER OF
DEFINITIONAL OR
INFORMATION
COMPLEXITY N
Recursive Function Theory: Newsletter,
No. 4 (Jan. 1973), pp. 11–13
by c + log2 n.
[G. J. Chaitin, Mario Bravo 249, Buenos Aires, Argentina]
THERE ARE FEW
MINIMAL DESCRIPTIONS
Recursive Function Theory: Newsletter,
No. 4 (Jan. 1973), p. 14
INFORMATION-
THEORETIC
COMPUTATIONAL
COMPLEXITY
Abstracts of Papers, 1973 IEEE International
Symposium on Information Theory,
June 25–29, 1973, King Saul Hotel,
Ashkelon, Israel, IEEE Catalog No. 73
CHO 753-4 IT, p. F1-1
biology.
A THEORY OF PROGRAM
SIZE FORMALLY
IDENTICAL TO
INFORMATION THEORY
Abstracts of Papers, 1974 IEEE International
Symposium on Information Theory,
October 28–31, 1974, University of Notre
Dame, Notre Dame, Indiana, USA, IEEE
Catalog No. 74 CHO 883-9 IT, p. 2
Part VIII
Bibliography
PUBLICATIONS OF
G J CHAITIN
1. "On the length of programs for computing finite binary sequences
by bounded-transfer Turing machines," AMS Notices 13 (1966),
p. 133.
2. "On the length of programs for computing finite binary sequences
by bounded-transfer Turing machines II," AMS Notices 13 (1966),
pp. 228–229.
3. "On the length of programs for computing finite binary
sequences," Journal of the ACM 13 (1966), pp. 547–569.
4. "On the length of programs for computing finite binary sequences:
statistical considerations," Journal of the ACM 16 (1969), pp.
145–159.
5. "On the simplicity and speed of programs for computing infinite
sets of natural numbers," Journal of the ACM 16 (1969), pp.
407–422.
6. "On the difficulty of computations," IEEE Transactions on
Information Theory IT-16 (1970), pp. 5–9.
7. "To a mathematical definition of 'life'," ACM SICACT News, No.
4 (Jan. 1970), pp. 12–18.
8. "Computational complexity and Gödel's incompleteness
theorem," AMS Notices 17 (1970), p. 672.
9. "Computational complexity and Gödel's incompleteness
theorem," ACM SIGACT News, No. 9 (April 1971), pp. 11–12.
10. "Information-theoretic aspects of the Turing degrees," AMS
Notices 19 (1972), pp. A-601, A-602.
11. "Information-theoretic aspects of Post's construction of a simple
set," AMS Notices 19 (1972), p. A-712.
12. "On the difficulty of generating all binary strings of complexity
less than n," AMS Notices 19 (1972), p. A-764.
13. "On the greatest natural number of definitional or information
complexity n," Recursive Function Theory: Newsletter, No. 4
(Jan. 1973), pp. 11–13.
14. "A necessary and sufficient condition for an infinite binary string
to be recursive," Recursive Function Theory: Newsletter, No. 4
(Jan. 1973), p. 13.
15. "There are few minimal descriptions," Recursive Function
Theory: Newsletter, No. 4 (Jan. 1973), p. 14.
16. "Information-theoretic computational complexity," Abstracts of
Papers, 1973 IEEE International Symposium on Information
Theory, p. F1-1.
17. "Information-theoretic computational complexity," IEEE
Transactions on Information Theory IT-20 (1974), pp. 10–15.
Reprinted in T. Tymoczko, New Directions in the Philosophy of
Mathematics, Birkhäuser, 1986.
18. "Information-theoretic limitations of formal systems," Journal of
the ACM 21 (1974), pp. 403–424.
19. "A theory of program size formally identical to information
theory," Abstracts of Papers, 1974 IEEE International Symposium
on Information Theory, p. 2.
20. "Randomness and mathematical proof," Scientific American 232,
No. 5 (May 1975), pp. 47–52.
21. "A theory of program size formally identical to information
theory," Journal of the ACM 22 (1975), pp. 329–340.
22. "Information-theoretic characterizations of recursive infinite
strings," Theoretical Computer Science 2 (1976), pp. 45–48.
23. "Algorithmic entropy of sets," Computers & Mathematics with
Applications 2 (1976), pp. 233–245.
24. "Program size, oracles, and the jump operation," Osaka Journal
of Mathematics 14 (1977), pp. 139–149.
25. "Algorithmic information theory," IBM Journal of Research and
Development 21 (1977), pp. 350–359, 496.
26. "Recent work on algorithmic information theory," Abstracts of
Papers, 1977 IEEE International Symposium on Information
Theory, p. 129.
27. "A note on Monte Carlo primality tests and algorithmic
information theory," with J. T. Schwartz, Communications on Pure and
Applied Mathematics 31 (1978), pp. 521–527.
28. "Toward a mathematical definition of 'life'," in R. D. Levine and
M. Tribus, The Maximum Entropy Formalism, MIT Press, 1979,
pp. 477–498.
29. "Algorithmic information theory," in Encyclopedia of Statistical
Sciences, Volume 1, Wiley, 1982, pp. 38–41.
30. "Gödel's theorem and information," International Journal of
Theoretical Physics 22 (1982), pp. 941–954. Reprinted in T.
Tymoczko, New Directions in the Philosophy of Mathematics,
Birkhäuser, 1986.
31. "Randomness and Gödel's theorem," Mondes en Développement,
No. 54–55 (1986), pp. 125–128.
32. "Incompleteness theorems for random reals," Advances in Applied
Mathematics 8 (1987), pp. 119–146.
33. Algorithmic Information Theory, Cambridge University Press,
1987.
34. Information, Randomness & Incompleteness, World Scientific,
1987.
35. "Randomness in arithmetic," Scientific American 259, No. 1
(July 1988), pp. 80–85.
36. Algorithmic Information Theory, 2nd printing (with revisions),
Cambridge University Press, 1988.
37. Information, Randomness & Incompleteness, 2nd edition, World
Scientific, 1990.
38. Algorithmic Information Theory, 3rd printing (with revisions),
Cambridge University Press, 1990.
39. "A random walk in arithmetic," New Scientist 125, No. 1709 (24
March 1990), pp. 44–46. Reprinted in N. Hall, The New Scientist
Guide to Chaos, Penguin, 1992, and in N. Hall, Exploring Chaos,
Norton, 1993.
40. "Algorithmic information & evolution," in O. T. Solbrig and G.
Nicolis, Perspectives on Biological Complexity, IUBS Press, 1991,
pp. 51–60.
41. "Le hasard des nombres," La Recherche 22, No. 232 (mai 1991),
pp. 610–615.
42. "Complexity and biology," New Scientist 132, No. 1789 (5 October
1991), p. 52.
43. "LISP program-size complexity," Applied Mathematics and
Computation 49 (1992), pp. 79–93.
44. "Information-theoretic incompleteness," Applied Mathematics
and Computation 52 (1992), pp. 83–101.
45. "LISP program-size complexity II," Applied Mathematics and
Computation 52 (1992), pp. 103–126.
46. "LISP program-size complexity III," Applied Mathematics and
Computation 52 (1992), pp. 127–139.
47. "LISP program-size complexity IV," Applied Mathematics and
Computation 52 (1992), pp. 141–147.
48. "A Diary on Information Theory," The Mathematical
Intelligencer 14, No. 4 (Fall 1992), pp. 69–71.
49. Information-Theoretic Incompleteness, World Scientific, 1992.
50. Algorithmic Information Theory, 4th printing, Cambridge
University Press, 1992. (Identical to 3rd printing.)
51. "Randomness in arithmetic and the decline and fall of
reductionism in pure mathematics," Bulletin of the European Association
for Theoretical Computer Science, No. 50 (June 1993), pp. 314–
328. Reprinted in J. L. Casti and A. Karlqvist, Cooperation and
Conflict in General Evolutionary Processes, Wiley, 1995. Also
reprinted in Chaos, Solitons & Fractals, Vol. 5, No. 2, pp. 143–
159, 1995.
52. "On the number of n-bit strings with maximum complexity,"
Applied Mathematics and Computation 59 (1993), pp. 97–100.
53. "The limits of mathematics – course outline & software," 127 pp.,
December 1993. To obtain, send e-mail to "chao-dyn@xyz.lanl.gov"
with "Subject: get 9312006".
54. "Randomness and complexity in pure mathematics," International
Journal of Bifurcation and Chaos 4 (1994), pp. 3–15.
55. "Responses to 'Theoretical mathematics...'," Bulletin of the
American Mathematical Society 30 (1994), pp. 181–182.
56. Foreword in C. Calude, Information and Randomness, Springer-
Verlag, 1994, pp. ix–x.
57. "The limits of mathematics," 270 pp., July 1994. To obtain, send
e-mail to "chao-dyn@xyz.lanl.gov" with "Subject: get 9407003".
58. "The limits of mathematics IV," 231 pp., July 1994. To obtain,
send e-mail to "chao-dyn@xyz.lanl.gov" with "Subject:
get 9407009".
59. "The limits of mathematics – extended abstract," 7 pp., July
1994. To obtain, send e-mail to "chao-dyn@xyz.lanl.gov" with
"Subject: get 9407010".
60. "Randomness in arithmetic and the decline and fall of
reductionism in pure mathematics," in J. Cornwell, Nature's Imagination,
Oxford University Press, 1995, pp. 27–44.
61. "The Berry paradox," Complexity 1, No. 1 (1995), pp. 26–30.
62. "A new version of algorithmic information theory," 12 pp., June
1995. To obtain, send e-mail to "chao-dyn@xyz.lanl.gov" with
"Subject: get 9506003".
63. "The limits of mathematics – tutorial version," 143 pp.,
September 1995. To obtain, send e-mail to "chao-dyn@xyz.lanl.gov"
with "Subject: get 9509010".
64. "How to run algorithmic information theory on a computer," 21
pp., September 1995. To obtain, send e-mail to
"chao-dyn@xyz.lanl.gov" with "Subject: get 9509014".
65. "The limits of mathematics," 45 pp., September 1995. To obtain,
send e-mail to "chao-dyn@xyz.lanl.gov" with "Subject:
get 9509021".
DISCUSSIONS OF
CHAITIN'S WORK
1. M. Davis, "What is a computation?," in L. A. Steen, Mathematics
Today, Springer-Verlag, 1978.
2. R. Rucker, Mind Tools, Houghton Mifflin, 1987.
3. J. L. Casti, Searching for Certainty, Morrow, 1990.
4. J. A. Paulos, Beyond Numeracy, Knopf, 1991.
5. J. D. Barrow, Theories of Everything, Oxford University Press,
1991.
6. D. Ruelle, Chance and Chaos, Princeton University Press, 1991.
7. P. Davies, The Mind of God, Simon & Schuster, 1992.
8. J. D. Barrow, Pi in the Sky, Oxford University Press, 1992.
9. L. Brisson and F. W. Meyerstein, Puissance et Limites de la
Raison, Les Belles Lettres, 1995.
10. G. Johnson, Fire in the Mind, Knopf, 1995.
11. P. Coveney and R. Highfield, Frontiers of Complexity, Fawcett
Columbine, 1995.
Epilogue
UNDECIDABILITY &
RANDOMNESS IN PURE
MATHEMATICS
This is a lecture that was given 28 September 1989 at the
Europalia 89 Conference on Self-Organization in Brussels. The
lecture was filmed by EuroPACE; this is an edited transcript.
Published in G. J. Chaitin, Information, Randomness &
Incompleteness, 2nd Edition, World Scientific, 1990, pp. 307–313.
G. J. Chaitin
Abstract
I have shown that God not only plays dice in physics, but even in pure
mathematics, in elementary number theory, in arithmetic! My work is a
fundamental extension of the work of Gödel and Turing on undecidability
in pure mathematics. I show that not only does undecidability occur,
but in fact sometimes there is complete randomness, and mathematical
truth becomes a perfect coin toss.
Randomness in Physics
What I'd like to talk about today is taking an important and
fundamental idea from physics and applying it to mathematics. The
fundamental idea that I'm referring to is the notion of randomness, which I
think it is fair to say obsesses physicists. That is to say: To what
extent is the future predictable? Is our inability to predict the
future merely our own limitation, or is it in principle impossible to
predict the future?
This idea has of course a long history in physics. In Newtonian
physics there was Laplacian determinism. Then came quantum
mechanics. One of the controversial features of quantum mechanics was
that probability and randomness were introduced into physics at a
fundamental level. This greatly upset Einstein. And then surprisingly
enough, with the modern study of nonlinear dynamics, we realize that
classical physics really did have randomness and unpredictability
at its very core after all. So the notion of randomness and
unpredictability begins to look like a unifying principle, and I would
like to suggest that this even extends to mathematics.
I would like to suggest that the situation in mathematics is related to
the one in physics. If we can't prove something, if we don't see a
pattern or a law, or we cannot prove a theorem, the question is: is this
our fault? Is it just a human limitation, because we're not bright
enough or haven't worked long enough on the question to be able to
settle it? Or is it possible that sometimes there simply is no
mathematical structure to be discovered, no mathematical law, no
mathematical theorem, and in fact no answer to a mathematical question?
This is the question about randomness and unpredictability in physics,
transferred to the domain of mathematics.
Arithmetization
Now you will of course immediately say, "This is not the kind of
mathematical assertion that I normally encounter in pure mathematics."
What one would like, of course, is to translate it into number theory,
the bedrock of mathematics.
And you know, Gödel had the same problem. When he originally
constructed his unprovable true assertion, it was bizarre. It said, "I'm
unprovable!" Now that is not the kind of mathematical assertion that
one normally considers as a working mathematician. Gödel devoted a
lot of ingenuity, some very clever, brilliant and dense mathematics, to
making "I'm unprovable" into an assertion about whole numbers. This
includes the trick of Gödel numbering and a lot of number theory.
There has been a lot of work deriving from that original work of
Gödel's. In fact that work was ultimately used to show that Hilbert's
tenth problem is unsolvable. A number of people worked on that. I can
take advantage of all that work that's been done over the past sixty
years. There is a particularly dramatic development, the work of Jones
and Matijasevič, which was published about five years ago.
They discovered that the whole subject is really easy, which is
surprising because it had been very intricate and messy. They discovered
in fact that there was a theorem proved by Édouard Lucas a hundred
years ago, a very simple theorem that does the whole job, if one knows
how to use it properly, as Jones and Matijasevič showed how to do.
So one needs very little number theory to convert the assertion
about Ω that I talked about into an assertion about whole numbers, an
arithmetical assertion. Let me just state this result of Lucas because
it's delightful, and it's surprisingly powerful. That was of course the
achievement of Jones and Matijasevič, to realize this.
The hundred-year-old theorem of Lucas has to do with when a binomial
coefficient is even and when it is odd. If one asks what is the
coefficient of X^K in the expansion of (1 + X)^N, in other words, what
is the Kth binomial coefficient of order N, well the answer is that it's
odd if and only if K implies N, on a bit-by-bit basis, considered as
bit strings. In other words, to know if a binomial coefficient "N
choose K" is odd, what you have to do is look at each bit in the lower
number K that's on, and check if the corresponding bit in the upper
number N is also on. If that's always the case, on a bit-by-bit basis,
then, and only then, will the binomial coefficient be odd. Otherwise
it'll be even.
This is a remarkable fact, and it turns out to be all the number
theory one really needs to know, amazingly enough.
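To make the criterion completely concrete, here is a minimal sketch
in Python (my illustration, not part of the lecture); the brute-force
comparison with math.comb merely confirms the bitwise test on small
cases.

   from math import comb

   def binom_is_odd(n, k):
       # Lucas' theorem mod 2: "N choose K" is odd exactly when every
       # bit that is on in k is also on in n, i.e. k implies n
       # considered bit by bit as bit strings.
       return (k & n) == k

   # Confirm against direct computation for small cases.
   for n in range(32):
       for k in range(n + 1):
           assert binom_is_odd(n, k) == (comb(n, k) % 2 == 1)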
Randomness in Arithmetic
So what is the result of using this technique of Jones and Matijasevič
based on this remarkable theorem of Lucas?
Well, the result of this is a diophantine equation. I thought it
would be fun to actually write it down, since my assertion that there
is randomness in pure mathematics would have more force if I could
exhibit it as concretely as possible. So I spent some time and effort on
a large computer, and with the help of the computer I wrote down a
two-hundred-page equation with seventeen thousand variables.
This is what is called an exponential diophantine equation. That
is to say, it involves only whole numbers, in fact, non-negative whole
numbers, 0, 1, 2, 3, 4, 5, ..., the natural numbers. All the variables
and constants are non-negative integers. It's called "exponential
diophantine" because in addition to addition and multiplication one also
allows exponentiation, an integer raised to an integer power.
Exponentiation occurs in ordinary polynomial diophantine equations too,
but there the power has to be a constant; here the power can be a
variable. So in addition to seeing X^3 one can also see X^Y in this
equation.
So it's a single equation with 17,000 variables and everything is
considered to be non-negative integers, unsigned whole numbers. And
this equation of mine has a single parameter, the variable N. For
any particular value of this parameter, I ask the question, "Does this
equation have a finite number of whole-number solutions or does this
equation have an infinite number of solutions?"
The answer to this question is my random arithmetical fact; it
turns out to correspond to tossing a coin. It "encodes" arithmetically
whether the Nth bit of Ω is a 0 or a 1. If the Nth bit of Ω is a 0,
then this equation, for that particular value of N, has finitely many
solutions. If the Nth bit of the halting probability Ω is a 1, then this
equation for that value of the parameter N has an infinite number of
solutions.
The change from Hilbert is twofold: Hilbert looked at polynomial
diophantine equations, where one is never allowed to raise X to the Yth
power, only X to the 5th power, say. Second, Hilbert asked "Is there a
solution? Does a solution exist or not?" This is undecidable, but it is
not completely random; it only gives a certain amount of randomness.
To get complete randomness, like an independent fair coin toss, one
needs to ask, "Is there an infinite number of solutions or a finite
number of solutions?"
Let me point out, by the way, that if there are no solutions, that's
a finite number of solutions, right? So it's one way or the other. It
either has to be an infinite number or a finite number of solutions. The
problem is to know which. And my assertion is that we can never know!
In other words, to decide whether the number of solutions is finite
or infinite (the number of solutions in whole numbers, in non-negative
integers) in each particular case is in fact an irreducible arithmetical
mathematical fact.
So let me emphasize what I mean when I say "irreducible
mathematical facts." What I mean is that it's just like independent tosses
of a fair coin. What I mean is that essentially the only way to obtain
as theorems whether the number of solutions is finite or infinite in
particular cases is to assume these facts as axioms.
In other words, if we want to be able to settle K cases of this
question, whether the number of solutions is finite or not for K
particular values of the parameter N, that would require that K bits of
information be put into the axioms that we use in our formal axiom
system. That's a very strong sense in which these are irreducible
mathematical facts.
I think it's fair to say that whether the number of solutions is finite
or infinite can therefore be considered to be a random mathematical or
arithmetical fact.
To recapitulate, Hilbert's tenth problem asks "Is there a solution?"
and doesn't allow exponentiation. I ask "Is the number of solutions
finite?" and I do allow exponentiation.
In his sixth problem, Hilbert proposed to axiomatize probability theory
as part of physics, as part of his program to axiomatize physics.
But I have found an extreme form of randomness, of irreducibility, in
pure mathematics, in a part of elementary number theory associated
with the name of Diophantos and which goes back two thousand years
to classical Greek mathematics.
Moreover, my work is an extension of the work of Gödel and Turing,
which refuted Hilbert's basic assumption in his 1900 lecture that every
mathematical question has an answer, that if you ask a clear question
there is a clear answer. Hilbert believed that mathematical truth is
black or white, that something is either true or false. It now looks
like it's gray, even when you're just thinking about the unsigned whole
numbers, the bedrock of mathematics.
Further Reading
I. Stewart, "The ultimate in undecidability," Nature, 10 March
1988, pp. 115–116.
J. P. Delahaye, "Une extension spectaculaire du théorème de
Gödel: l'équation de Chaitin," La Recherche, juin 1988, pp. 860–
862. English translation, AMS Notices, October 1989, pp. 984–
987.
G. J. Chaitin, "Randomness in arithmetic," Scientific American,
July 1988, pp. 80–85.
G. J. Chaitin, Information, Randomness & Incompleteness –
Papers on Algorithmic Information Theory, World Scientific,
Singapore, 1987.
G. J. Chaitin, Algorithmic Information Theory, Cambridge Uni-
versity Press, Cambridge, 1987.
ALGORITHMIC
INFORMATION &
EVOLUTION
This is a revised version of a lecture presented April 1988 in
Paris at the International Union of Biological Sciences Workshop
on Biological Complexity. Published in O. T. Solbrig
and G. Nicolis, Perspectives on Biological Complexity, IUBS
Press, 1991, pp. 51–60.
G. J. Chaitin
IBM Research Division, P.O. Box 218
Yorktown Heights, NY 10598, U.S.A.
Abstract
A theory of information and computation has been developed:
"algorithmic information theory." Two books [11–12] have recently been
published on this subject, as well as a number of nontechnical
discussions [13–16]. The main thrust of algorithmic information theory
is twofold: (1) an information-theoretic mathematical definition
of random sequence via algorithmic incompressibility, and (2) strong
information-theoretic versions of Gödel's incompleteness theorem. The
halting probability Ω of a universal Turing machine plays a fundamental
role. Ω is an abstract example of evolution: it is of infinite complexity
and the limit of a computable sequence of rational numbers.
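To illustrate the last sentence, here is a minimal sketch in Python
of the usual approximation scheme: run ever more programs for ever
more steps and add up the weights of those seen to halt. The predicate
halts_within is an assumed stand-in for a toy prefix-free universal
machine, not an actual construction; the point is only that each
approximation is a rational number computable from t, and that Ω is
their limit.

   from fractions import Fraction

   def omega_approximations(halts_within, max_len, max_steps):
       # Yield a nondecreasing computable sequence of rational lower
       # bounds on the halting probability.  halts_within(p, t) is
       # assumed to decide whether program p (a bit string for a toy
       # prefix-free universal machine) halts within t steps; programs
       # longer than max_len are ignored in this toy version.
       for t in range(1, max_steps + 1):
           total = Fraction(0)
           for n in range(1, max_len + 1):
               for i in range(2 ** n):
                   p = format(i, "0%db" % n)
                   if halts_within(p, t):
                       total += Fraction(1, 2 ** n)  # n-bit program weighs 2^-n
           yield total

Prefix-freeness is what keeps the total below 1, so that it can be
read as a probability.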
2. Evolution
The origin of life and its evolution from simpler to more complex forms,
the origin of biological complexity and diversity, and more generally
the reason for the essential difference in character between biology and
physics, are of course extremely fundamental scientific questions.
While Darwinian evolution, Mendelian genetics, and modern molecular
biology have immensely enriched our understanding of these
questions, it is surprising to me that such fundamental scientific ideas
are not reflected in any substantive way in the world of mathematical
ideas. In spite of the persuasiveness of the informal considerations that
adorn biological discussions, it has not yet been possible to extract any
nuggets of rigorous mathematical reasoning, to distill any fundamental
new rigorous mathematical concepts.
In particular, by historical coincidence the extraordinary recent
progress in molecular biology has coincided with parallel progress in
the emergent field of computational complexity, a branch of theoretical
computer science. But in spite of the fact that the word "complexity"
springs naturally to mind in both fields, there is at present little
contact between these two worlds of ideas!
The ultimate goal, in fact, would be to set up a toy world, to define
mathematically what an organism is and how to measure its complexity,
and to prove that life will spontaneously arise and increase in
complexity with time.
4. Previous work
I have been concerned with these extremely difficult questions for the
past twenty years, and have a series of publications [1–2, 7–13] devoted
in whole or in part to searching for ties between the concepts of
algorithmic information theory and the notion of biological information
and complexity.
In spite of the fact that a satisfactory definition of randomness or
lack of structure has been achieved in algorithmic information theory,
the first thing that one notices is that it is not ipso facto useful in
biology. Applying this notion to physical structures, one sees that
a gas is the most random and a crystal the least random, but neither
has any significant biological organization.
My first thought was therefore that the notion of mutual or common
information, which measures the degree of correlation between two
structures, might be more appropriate in a biological context. I
developed these ideas in a 1970 paper [1], and again in a 1979 paper [8]
using the more-correct self-delimiting program-size complexity measures.
In the concluding chapter of my Cambridge University Press book
[11] I turned to these questions again, with a number of new thoughts,
among them to determine where biological questions fall in what
logicians call the "arithmetical hierarchy."
The concluding remarks of my 1988 Scientific American article [13]
emphasize what I think is probably the main contribution of the chapter
at the end of my book [11]. This is the fact that in a sense there is a
kind of evolution of complexity taking place in algorithmic information
theory, and indeed in a very natural context.
The remaining publications [2, 7, 9–10, 12] emphasize the
importance of the problem, but do not make new suggestions.
6. Technical note: A finite version of this model
There is also a "finite" version of this abstract model of evolution. In
it one fixes N and constructs a computable infinite sequence st = s(t)
of N-bit strings, with the property that for all sufficiently large times t,
st = st+1 is a fixed random N-bit string, i.e., one whose program-size
complexity H(st) is not less than its size in bits N. In fact, we can
take st to be the first N-bit string that cannot be produced by any
program less than N bits in size in less than t seconds.
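In code the construction just described looks roughly as follows; this
is a sketch only, with run(p, t), a toy universal machine returning
the output of program p within t steps (or None), as an assumption,
and with time measured in steps rather than seconds.

   def s(N, t, run):
       # First N-bit string not produced by any program of fewer than
       # N bits within t steps, as in the construction above.
       produced = set()
       for n in range(1, N):  # all programs of fewer than N bits
           for i in range(2 ** n):
               out = run(format(i, "0%db" % n), t)
               if out is not None:
                   produced.add(out)
       # There are 2^N strings of length N but only 2^N - 2 shorter
       # programs, so some N-bit string is always left over.
       for i in range(2 ** N):
           candidate = format(i, "0%db" % N)
           if candidate not in produced:
               return candidate

Once t is so large that every sub-N-bit program that will ever halt
has already done so, s(N, t, run) stops changing, and the string it
settles on cannot be produced by any program of fewer than N bits at
all.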
In a sense, the N bits of information in st for t large are coming
from t itself. So one way to state this is that knowing a sufficiently
large natural number t is "equivalent to having an oracle for the halting
problem" (as a logician would put it). That is to say, it provides as
much information as one wants.
By the way, computations in the limit are extensively discussed in
my two papers [5–6], but in connection with questions of interest in
algorithmic information theory rather than in biology.
7. Conclusion
To conclude, I must emphasize a number of disclaimers.
First of all, Ω is a metaphor for evolution only in an extremely
abstract mathematical sense. The measures of complexity that I use,
while very pretty mathematically, pay for this prettiness by having
limited contact with the real world.
In particular, I postulate no limit on the amount of time that may
be taken to compute an object from its minimal-size description, as
long as the amount of time is finite. Nine months is already a long
time to ask a woman to devote to producing a working human infant
from its DNA description. A pregnancy of a billion years, while okay
in algorithmic information theory, is ridiculous in a biological context.
Yet I think it would also be a mistake to underestimate the
significance of these steps in the direction of a fundamental mathematical
theory of evolution. For it is important to start bringing rigorous
concepts and mathematical proofs into the discussion of these absolutely
fundamental biological questions, and this, although to a very limited
extent, has been achieved.
References
Items 1 to 10 are reprinted in item 12.
[1] G. J. Chaitin, "To a mathematical definition of 'life'," ACM
SICACT News, January 1970, pp. 12–18.
[2] G. J. Chaitin, "Information-theoretic computational complexity,"
IEEE Transactions on Information Theory IT-20 (1974), pp. 10–
15.
[3] G. J. Chaitin, "Randomness and mathematical proof," Scientific
American, May 1975, pp. 47–52.
[4] G. J. Chaitin, "A theory of program size formally identical to
information theory," Journal of the ACM 22 (1975), pp. 329–340.
[5] G. J. Chaitin, "Algorithmic entropy of sets," Computers &
Mathematics with Applications 2 (1976), pp. 233–245.
[6] G. J. Chaitin, "Program size, oracles, and the jump operation,"
Osaka Journal of Mathematics 14 (1977), pp. 139–149.
[7] G. J. Chaitin, "Algorithmic information theory," IBM Journal of
Research and Development 21 (1977), pp. 350–359, 496.
[8] G. J. Chaitin, "Toward a mathematical definition of 'life'," in
R. D. Levine and M. Tribus, The Maximum Entropy Formalism,
MIT Press, 1979, pp. 477–498.
[9] G. J. Chaitin, "Algorithmic information theory," in Encyclopedia
of Statistical Sciences, Volume 1, Wiley, 1982, pp. 38–41.
[10] G. J. Chaitin, "Gödel's theorem and information," International
Journal of Theoretical Physics 22 (1982), pp. 941–954.
[11] G. J. Chaitin, Algorithmic Information Theory, Cambridge
University Press, 1987.
[12] G. J. Chaitin, Information, Randomness & Incompleteness –
Papers on Algorithmic Information Theory, World Scientific,
1987.
[13] G. J. Chaitin, "Randomness in arithmetic," Scientific American,
July 1988, pp. 80–85.
[14] P. Davies, "A new science of complexity," New Scientist, 26
November 1988, pp. 48–50.
[15] J. P. Delahaye, "Une extension spectaculaire du théorème de
Gödel: l'équation de Chaitin," La Recherche, juin 1988, pp. 860–
862. English translation, AMS Notices, October 1989, pp. 984–
987.
[16] I. Stewart, "The ultimate in undecidability," Nature, 10 March
1988, pp. 115–116.
About the author
Gregory J Chaitin is a member of the theoretical physics group at the
IBM Thomas J Watson Research Center in Yorktown Heights, New
York. He created algorithmic information theory in the mid-1960s
when he was a teenager. In the two decades since, he has been the
principal architect of the theory. His contributions include: the definition
of a random sequence via algorithmic incompressibility, the
reformulation of program-size complexity in terms of self-delimiting programs,
the definition of the relative complexity of one object given a minimal-
size program for another, the discovery of the halting probability Omega
and its significance, the information-theoretic approach to Gödel's
incompleteness theorem, the discovery that the question of whether an
exponential diophantine equation has finitely or infinitely many
solutions is in some cases absolutely random, and the theory of program size
for Turing machines and for LISP. He is the author of the monograph
"Algorithmic Information Theory" published by Cambridge University
Press in 1987.
INFORMATION,
RANDOMNESS &
INCOMPLETENESS
Papers on Algorithmic Information Theory
– Second Edition
World Scientific Series in Computer Science – Vol. 8
by Gregory J Chaitin (IBM)