Computer Runtimes and The Length of Proofs
H. Zenil
Abstract. This paper is an experimental exploration of the relationship between the runtimes of Turing machines and the length of proofs in formal axiomatic systems. We compare the number of halting Turing machines of a given size to the number of provable theorems of first-order logic of a given size, and the runtime of the longest-running Turing machine of a given size to the proof length of the most-difficult-to-prove theorem of a given size. It is suggested that theorem provers are subject to the same non-linear tradeoff between time and size as computer programs are, affording the possibility of determining optimal timeouts and waiting times in automatic theorem proving. I provide the statistics for some small choices of parameters for both of these systems.

Keywords: halting problem, halting probability, proof length, automatic theorem proving, Busy Beaver problem, program-size complexity, small Turing machines.
1 Introduction
While profound connections between computer programs and mathematical proofs have been studied and are known (e.g. the Curry-Howard correspondence), little has been done to connect the two fields at the level of empirical practice. We present an experimental approach to the question of optimal proving times for automatic theorem provers, which bears out Calude and Stay's theoretical findings that programs either stop quickly or never halt [4]. Working with self-delimiting programs, that is, programs that are not the beginning of any other valid program, Chaitin defined the complexity of the runtime of a program which eventually halts, a quantity that we cannot effectively compute [5], and Calude and Stay have recently proven [4] that even though short programs can run for a very long time, long-running programs are the scarcest, because most programs will stop rather quickly, if they ever do, depending on their length. Thus, the probability of a machine halting decreases the longer it takes to halt, if it ever does. Just as Calude and Stay suggest that most Turing machines are fully determined qua termination by a small number of computational steps, and that the
error margin drops drastically, in [8] we have also shown that Turing machines are fully determined qua extensionality by a small number of initial input values (a theoretical value for the error margin has yet to be determined, but the very few data points that we could generate suggest that it follows at least a polynomial distribution). We undertake an experimental approach to the runtimes of deterministic Turing machines of up to three states and two symbols, in connection with, and as empirical evidence bearing on, Calude and Stay's theoretical results. Then we undertake the same experimental approach to formulas of predicate calculus, in order to find some evidence (if any) in favour of a possible similar non-linear phenomenon in the distribution of proof lengths of (dis)proven theorems in random axiom systems. Traditional intuition might make one think this an ill-fated approach, on the one hand because undecidability would interfere in any such experimental attempt, and on the other because small systems may say more about design choices than about important results. Even though such limiting effects may appear right away, one can partially circumvent them (as the Busy Beaver problem does) in an effort tantamount to other interesting experiments, including some of Calude's own interest [3] and of my own [7], the latter providing useful applications for the evaluation of the algorithmic complexity of short strings, which is difficult to calculate with the main alternative (lossless compression algorithms). With the intuition one gets from studying small systems (see [13]), it seems worthwhile and insightful to undertake this kind of experiment.

1.1 The Halting Problem
The Halting Problem for Turing machines involves deciding whether an arbitrary Turing machine M eventually halts on an arbitrary input x. One can ask whether there is a Turing machine halt_M which, given code(M) and the input x, eventually stops and produces 1 if M(x) stops, and 0 if M(x) does not stop. Turing's seminal result states that this problem cannot be solved by any Turing machine, i.e. there is no such halt_M. Halting can be recognized by simply running the machine in question; the main difficulty is to detect non-halting machines. Since many real-world problems arising in the fields of compiler optimization, automated software engineering, formal proof systems, and so forth are deeply connected to the halting problem, there is an interest in understanding the problem in order to translate theoretical results into practical applications. In [4], it was observed that for any computable probability distribution, most long runtimes are effectively rare, so that in the limit they all have the same behavior regardless of the choice of distribution. They proved that the exact time at which a program stops is not too complicated algorithmically. It is (algorithmically) non-random because most programs either stop quickly or never halt. Since non-random times are (effectively) rare, according to Calude and Stay, the density of times at which an N-bit program can stop decreases quickly.
There are (4n + 2)^(2n) possible (n, 2) deterministic Turing machines with n states and 2 symbols. We denote by (n, m) the class (or space) of all n-state m-symbol Turing machines having a bidirectional tape and remaining on the same cell when entering the (additional to the n) halting state. Among the machines that halt, there are some that print more 1s on their output tapes than any other Turing machines of the same size, and some that reach a maximum number of steps upon halting. If σ_T is the number of 1s on the tape of a Turing machine T upon halting, then

Σ(n) = max{σ_T : T ∈ (n, 2) and T halts},

with n the number of states of the Turing machine. If t_T is the number of steps that a machine T takes upon halting, then

S(n) = max{t_T : T ∈ (n, 2) and T halts},

with n the number of states of the Turing machine. Σ(n) and S(n) are non-computable functions [9], by reduction to the halting problem. Yet values are known for (n, 2) with n ≤ 4. The solution for (n, 2) with n < 3 is trivial; the process leading to the solution in (3, 2) is discussed by Lin and Rado [11]; and the process leading to the solution in (4, 2) is discussed in [1].

Solving the halting problem for small machines. It is easy to see that Σ(1) = 1 and Σ(2) = 4. Lin and Rado [11] proved Σ(3) = 6 and Brady [1] that Σ(4) = 13. The exact known values for S are S(1) = 1, S(2) = 6, S(3) = 21, S(4) = 107. These Busy Beaver values are for 2-symbol Turing machines. These numerical values of the Busy Beaver functions have been calculated by a combination of techniques, notably the exhaustive simulation of a reduced number of non-equivalent Turing machines: many machines can be decided by inspection (e.g. evident loops), and because the number of remaining cases is small enough one can either analyse them case by case or actually run the machines and analyse their behaviour until deciding whether they halt or not. This is evidently possible only because of the relatively small number of Turing machines with up to the number of states for which the values of the Busy Beaver functions are known. A program showing the evolution of all known Busy Beaver machines, developed by this paper's author, is available online [15]. The formalism followed in this paper is the same as the one originally described and followed for the Busy Beaver problem as introduced by Rado [9]. It is worth noting that the Busy Beaver problem is defined for Turing machines with initially empty tapes, and the Turing machines studied in this paper are all provided with an initially empty tape too. Turing universality tells us, however, that for every Turing machine with an arbitrary input there is a Turing machine with empty input computing the same function, hence Turing machines with empty tapes cover all possible cases (the translation may only result in some extra states).
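To make the exhaustive-simulation approach concrete, here is a minimal Mathematica sketch that enumerates and runs all (2, 2) machines with one extra halting state (state 0). The rule encoding is an illustrative assumption rather than Rado's exact enumeration: each of the four (state, symbol) pairs is mapped to one of the 4n + 2 = 10 possible actions {new state, symbol written, head move}, giving 10^4 = 10 000 machines, so the resulting counts can be compared with, but need not coincide exactly with, the table in Fig. 1.

  actions = Join[
     Tuples[{{1, 2}, {0, 1}, {-1, 1}}],    (* transitions into a non-halting state: 2*2*2 = 8 *)
     {{0, 0, 0}, {0, 1, 0}}];              (* halting transitions, writing 0 or 1, no move: 2 more *)

  runtime[rule_, maxSteps_] := Module[{tape = <||>, pos = 0, state = 1, t = 0, act},
    While[state != 0 && t < maxSteps,
     act = rule[{state, Lookup[tape, pos, 0]}];
     tape[pos] = act[[2]]; state = act[[1]]; pos += act[[3]]; t++];
    If[state == 0, t, Infinity]];          (* Infinity marks machines that did not halt in time *)

  machines = Association[Thread[Tuples[{{1, 2}, {0, 1}}] -> #]] & /@ Tuples[actions, 4];
  Tally[runtime[#, 7] & /@ machines]       (* runtime distribution; S(2) = 6, so 7 steps suffice *)

The same sketch generalizes to (3, 2) by replacing the state list {1, 2} with {1, 2, 3} and the step bound with S(3) = 21, giving the 14^6 = 7 529 536 machines discussed below.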
Calude and Stay showed that long-running Turing machines can only halt at non-random times; the density of non-random times near n is about 1/n. Long-running means that if we have a universal Turing machine U and machine M is implemented by a program m for U of length n, then U(m) runs for more than c·2^n steps, where c is some uncomputable constant depending on U.

3.1 Halting History of (2, 2) Turing Machines
We know that a machine halts if it enters the halting state before reaching the known Busy Beaver value S(n). If it does not, then it never halts. The halting problem and the halting probability problem are closely related to the Busy Beaver problem in that a solution to any one of them would yield a solution to each of the others. Consider the halting space of all (2, 2) Turing machines (with an extra halting state) provided with an empty tape. The table in Fig. 1 shows the runtime distribution at which all machines in (2, 2) halt (or do not).
t    k_t     p(k_t)
∞    6544    0.65
1    2000    0.20
2    800     0.080
3    160     0.016
4    56      0.0056
5    362     0.036
6    78      0.0078
Fig. 1. Runtime distribution at which all machines halt (those that do not halt are indicated by ∞). Where t is the number of steps, k_t the number of machines that halted at t (out of a total of 3456 that halt), and p(k_t) is the probability (over all 10 000 machines) of a machine halting, or never halting, at time t.
There are 10 000 2-state, 2-symbol Turing machines (the 10 000 figure comes simply from the formula (4n + 2)^(2n) giving the number of Turing machines with n = 2 states). No other Turing machine halts after 6 steps (see Fig. 1) in (2, 2). Machines that never halt are 6544 in number, representing around .65 of the total. What we term a runtime space is the product of a class of (n, m) Turing machines for fixed n and m, where programs are uniformly distributed, and the time space, which is discrete, with each halting time mapped to a greyscale color (the lighter the color, the sooner it halted; white means the program never halted and red means it reached the Busy Beaver value S(n)). Each point in Fig. 3 represents a Turing machine and, as defined by the corresponding spectrum in Fig. 2, the lighter the square the sooner it halted.
Fig. 2. Halting color mapping spectrum for Turing machines in (2, 2) (the last color is red, visible in the online and printed versions only)
Fig. 3. Runtime distribution plot showing all the 10 000 Turing machines in (2, 2) compressed in a Peano curve packing array (preserving the enumeration distance between machines). Some clusters may emerge due to the enumeration (e.g. terms involving transition rule parameters grouping Turing machines). The plot may look as if it had fewer than the necessary rows and columns to represent all the 10 000 Turing machines, but that is a consequence of the Peano packing: each apparent pixel is in fact a small cluster of several machines.
White cells represent machines that don't halt. Red cells (only visible in the online and color printed versions) show the Busy Beaver machines (for this space, with runtime S(2) = 6 steps). Among all the 10 000 Turing machines in (2, 2), .65 never halt, .2 halt at the first step, .08 at the second, .016 at the third, and so on. In other words, .57 of the 3456 (2, 2) Turing machines that halted did so at the first step, .81 halted before or by the second step at the latest, .84 before or by the third step at the latest, and so on (see Fig. 6).
t     k_t        100·2^(14-t)    p(k_t)
∞     5382624    –               0.71
1     1075648    819200          0.14
2     614656     409600          0.082
3     263424     204800          0.035
4     97216      102400          0.013
5     53760      51200           0.0071
6     20800      25600           0.0028
7     12512      12800           0.0017
8     4264       6400            0.00057
9     2424       3200            0.00032
10    1064       1600            0.00014
11    536        800             0.000071
12    304        400             0.000040
13    176        200             0.000023
14    128        100             0.000017
Fig. 4. Where t is the number of steps, k_t the number of machines that halted at t, and p(k_t) is the halting probability calculated from t and k_t. The function 100·2^(14-t) is a good fit to the limit behavior, relating runtimes to the number of Turing machines halting at a given runtime, for the 14 runtimes at which (3, 2) Turing machines halt.
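For reference, the fitted values quoted in the caption can be computed directly; they are the ones listed in the third column of the table above.

  Table[100*2^(14 - t), {t, 1, 14}]
  (* {819200, 409600, 204800, 102400, 51200, 25600, 12800, 6400, 3200, 1600, 800, 400, 200, 100} *)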
Fig. 5. Number of machines in (3, 2) that halt step by step versus 100·2^(14-t) (dark line; blue in the color version)
3.2 Halting History of (3, 2) Turing Machines
Interesting output distribution facts: Out of 7 529 536 machines only 2 146 912 halt; there are 5 382 624 machines that do not halt. Those machines that halt produce only 126 different output strings, with the largest being 6 digits in length (the Busy Beavers). Exactly .2 of the Turing machines produce a 0 or a 1 as output.
Fig. 6. Cumulative number of machines in (2, 2) (left) and (3, 2) (right) that halt step by step (logarithmic scale)
The fact that the figures are mostly white and lightly colored is an indicator of the sparsity of long-running machines: most machines either never halt or halt quickly.
Fig. 7. Halting spectrum for (3, 2). Last color in the spectrum is red (only visible in the online and color printed versions).
Inspired by [13], where Wolfram undertakes an exhaustive investigation of the space of propositional logic formulas, I extended his ideas to investigate the space of first-order logic. The extension wasn't trivial, among other reasons because, unlike propositional calculus, predicate calculus is undecidable, meaning that one may come across cases where formulas (or their negations) are not proven or disproven in an axiom system of first-order logic. Proof lengths are, of course, not bounded, or one would be able to decide whether a formula in an axiom system can be proven or not once it has reached a limit. Frequency of proof lengths for randomly generated formulas, however, can be studied and analyzed. Frequency distributions of (dis)proven formulas turn out to follow a similar distribution to those of randomly generated computer programs, in which most programs, just as we found for formulas, halt (or are (dis)proven) quickly, with their number diminishing fast over time. When I met Cris Calude and became acquainted with his fascinating work, including a recent collaboration with Michael Stay on the distribution of halting times of random computer programs [4], it prompted me to seek connections with these other findings (persuaded as I was of the strong connections known to exist between computation and proof theory) and to undertake an empirical
Fig. 8. Runtime deep field of a segment of runtimes from the 7 529 536 Turing machines in (3, 2). The (3, 2) Busy Beavers are barely visible as isolated red points (online and color printed versions only).
Fig. 9. This is what a typical random part of the runtime deep field looks like after a 10× zoom of a tenth-size square area of the original image (Fig. 8).
investigation of both the halting runtimes of Turing machines that Calude and Stay had calculated theoretically, and the lengths of proofs found by automatic theorem provers. It follows from Chaitin [5] and Calude and Stay [4] that to (dis)prove a formula in an axiom system one only needs to check up to the runtime beyond which the Turing machine encoding the proof search can no longer halt. Busy Beavers, as used in the previous section, are therefore relevant to automatic theorem proving because they provide an upper bound on the length of proofs. One only needs to run the computer to (dis)prove the formula up to the Busy Beaver value for the size of the Turing machine, and if it cannot be proven by then, it is undecidable for that axiom system. Moreover, Calude and Stay's work suggests that the chances of proving a formula should decrease over time, or that if a formula can be (dis)proven at all, it will likely be (dis)proven early rather than late, meaning that one can set an optimal time for a given provability certainty goal.

4.1 Computer Runtimes and Lengths of Proofs
Optimal proving times are relevant because, on the one hand, they allow one to set a maximum waiting time, given that proofs may never arrive if a theorem is undecidable in an axiom system, and, on the other, because one would know how long to wait before giving up with a certain degree of certainty of provability. If one had a goal (say, to prove a fraction of .90 of a set of formulas) one could calculate an optimal timeout and a maximum waiting time, taking advantage of the fact that in the case of theorem provers running on digital computers, there is a correspondence between runtime and proof length. The numbers involved are so large and grow so fast because of the combinatorial explosion (in the number of formulas as well as in the number of Turing machines) that we were only able to explore the tip of the iceberg of the space of all possible first-order formulas, but with interesting and encouraging results nonetheless.

4.2 Enumerating and Generating Predicate Calculus Axiom Systems with Equality
A number of sound and complete calculi have been developed enabling fully automated theorem provers for first-order logic. Equational logic is quite simple, and yet powerful [2]. Its atomic formulas are equations, making it very easy to encode and deal with. In our formalism, terms are built from variables and constants using function symbols. Equalities of the form lhs = rhs are the atomic formulas in our language, where lhs and rhs are terms. One can represent most mathematical axiom systems and theorems in equational form, so it is expressively very rich. A logical system which possesses an explicitly stated set of axioms from which theorems can be derived is an axiomatic system. In predicate calculus, a formula is in prenex normal form if it can be written as a string of quantifiers followed by a quantifier-free part. All first-order well-formed formulas (hereafter simply formulas) are logically equivalent to some formula in prenex normal form. Skolemization is a way of removing existential
quantifiers from a formula. Variables bound by existential quantifiers which are not within the scope of universal quantifiers can simply be replaced by appropriate constants. Both will be used in order to enumerate all possible quantified axioms and formulas of first-order logic. All equational formulas can be represented with two binary operators f and p, where p is a pairing function and f is an indexing operator (any possible binary function). The first parameter of f will be a constant determining its index, while the second is any other term (variable, constant, f itself or p). When the existential quantifier is inside a universal quantifier, the bound variable must be replaced by a Skolem function of the variables bound by the universal quantifiers. We can then specify any constant using a formula of the form ∀a ∀b f(a, a) = f(b, b). And the ith constant can be defined in terms of f and p recursively as follows:

c(0) = p(f(a, a), f(a, a))
c(n+1) = p(f(a, a), c(n))

Or in a single Mathematica expression:

Nest[p[f[a,a],#]&,p[a,a],i]

To represent all possible functions one can combine both f and p. For instance, f(c(i), p(c(i), x)) is the expression representing the i-th function (the function with index i) applied to x. This assumes that there are an infinite number of individuals in the most general case. Notice that x may be a list built from pairs. Formulas were enumerated and generated by the number of variables and constants on both sides of the equality. There are no formulas of length 1, simply because an equality requires at least 2 terms, one on each side. Finally, all single axioms were arranged by length. The length of an equational formula is the sum of the bound variables on both sides of the equality. Axiom systems are simply all the possible subsets over the formulas of a fixed length. Applying this operation makes the number of axiom systems grow exponentially, so we were able to proceed exhaustively only up to formulas with 3 bound variables, and to generate a sample of only 1000 axiom systems (an initial segment) for formulas with 4 bound variables. An automatic theorem prover was fed with all single formulas of 4 bound variables as its proving goal, for each of the generated axiom systems, producing almost 10^5 proofs. Among the initial 1000 axiom systems, only 607 were used, as they were proven to be consistent (no axiom was the negation of any other) and independent (no axiom could be derived from the others). An example of a formula with 3 bound variables is ∀x1 ∀x2 ∀x3, x1 = f(f(x2, x3), x1), and with four, ∀x1 ∀x2 ∀x3 ∀x4, x1 = p(f(x2, x3), x4). An example of an axiom system consisting of 2 axioms, each with 2 bound variables, is: ∀x1 ∀x2, x1 = f(x2, x1); ∀x1 ∀x2, x1 = p(x1, x2). Notice that one does not need to further compose f with p or p with f in order to produce other possible formulas, because f is a general function with an index as first parameter and any term as second parameter, which can be p or f itself, without the need of infinitely nesting each into the other in order to reach other possible constructions.
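As an illustration of these constructions, the following short Mathematica sketch builds the i-th constant with the Nest expression above and applies the i-th general function to a variable; the names c and func are introduced here for illustration only, and p, f, a and x1 remain purely symbolic.

  c[i_] := Nest[p[f[a, a], #] &, p[a, a], i];    (* the i-th constant, as in the expression above *)
  func[i_, x_] := f[c[i], p[c[i], x]];           (* the i-th general function applied to a term x *)

  c[2]           (* p[f[a, a], p[f[a, a], p[a, a]]] *)
  func[0, x1]    (* f[p[a, a], p[p[a, a], x1]]      *)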
4.3 Experimental Setting
The project was undertaken using Mathematica's built-in implementation of the well-known and award-winning theorem prover Waldmeister [12]. Waldmeister returns True after evaluating an expression in Mathematica if it can prove the conclusions from the given axioms, and False if it can prove that the conclusions do not follow from the axioms. If it can prove neither, it returns the expression unevaluated. The axiom systems generated as described in Section 4.2 were first checked for logical consistency and internal axiom independence, these being two of the most important qualities of conventional mathematical axiom systems. An axiom system A is said to be consistent if no theorem and its negation can both be derived from A. On the other hand, if A is an axiom system and a ∈ A, then a is considered independent in A, or an independent axiom of A, if a cannot be derived from A \ {a}. As with any axiomatic system, we want an axiom system to be minimal, i.e. to contain no superfluous axiom. From this point on, only consistent axiom systems were taken into account. A sketch of both checks is given at the end of this subsection.

Miscellaneous interesting first results: It was found that only .01 out of a total of 490 axiomatic systems with 1 or 2 axioms of length up to 3 bound variables were non-independent, i.e. one of their members could be derived from a combination of the others. All the 29 axiomatic systems of length 3 with 2 or more axioms were independent. This could be explained by the way in which the axiomatic systems were enumerated, because axioms closer to each other in the enumeration seem to have a better chance of being derivable from each other. The condition of being a theorem or an axiom is evidently an arbitrary convention. The number of consistent axiom systems of length 3 was only .0342 of a total of 1024 initial axiomatic systems. In the case of axiom systems of length 4 (composed of formulas of that size), .607 of them were found to be consistent. This may be interpreted in two different ways: that even when the complexity of the axiom systems grows, the overall inconsistency does not increase, or else that the process only unveils the tip of the iceberg, where systems are consistent chiefly due to their simplicity (both in terms of the number of axioms per axiom system and the length of the axioms themselves, thereby reducing the possible number of clashes).
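The two checks can be sketched as follows in Mathematica. The wrapper prove[goal, axioms], as well as the names independentQ, consistentQ and usableSystems, are introduced here purely for illustration; prove stands in for a call to the prover that returns True, False, or no result, as described above, and is not the actual interface used.

  (* an axiom system is independent if no axiom can be derived from the remaining ones *)
  independentQ[axioms_List] :=
    And @@ Table[prove[axioms[[i]], Delete[axioms, i]] =!= True, {i, Length[axioms]}];

  (* consistent, relative to a set of test formulas, if no formula and its negation are both provable *)
  consistentQ[axioms_List, formulas_List] :=
    ! Or @@ Map[(prove[#, axioms] === True && prove[Not[#], axioms] === True) &, formulas];

  (* keep only the axiom systems that pass both filters, as was done for the 607 systems used here *)
  usableSystems[systems_List, formulas_List] :=
    Select[systems, consistentQ[#, formulas] && independentQ[#] &];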
4.4 Distribution of Proof Lengths

The relation between the length of the formulas and the optimal runtime limit is of particular utility when no upper bound is known (or possible), when, for example, there are non-provable formulas for which longer runtimes will not make any difference, which, as verified herein, would cover a negligible number of cases.
A total of 89 145 formulas out of the 97 727 with at most 4 variables were proven to be theorems (or had their negations proven) after a single step. One can call such a theorem trivial simply because its proof, requiring only 1 step, can be accomplished with an axiom, and is therefore itself an axiom. The proof length (t) distribution (in percentages) of formulas with up to 4 variables is shown in Fig. 10.
t     k_t      p(k_t) (%)
1     89145    91.2184
2     2311     2.36475
3     473      0.484001
4     931      0.952654
5     928      0.949584
6     426      0.435908
7     577      0.59042
8     834      0.853398
9     1344     1.37526
10    294      0.300838
11    186      0.190326
12    206      0.210791
13    44       0.0450234
14    15       0.0153489
15    7        0.00716281
16    2        0.00204652
17    4        0.00409303
Fig. 10. Proof length (t) distribution (in percentage) of formulas with up to 4 variables
Proof length distribution of (dis)proven theorems: t is the number of steps the theorem prover has taken to produce the proof, k_t the number of formulas (dis)proven at t, and p(k_t) the percentage of theorems (dis)proven at time t, from which one can build a probability distribution. It is worth noting that the behavior of the graph in Fig. 10 resembles the first case of (2, 2) Turing machines, where the number of machines that halted was not strictly decreasing (unlike (3, 2), which was monotonically decreasing). Already .912 of the total number of theorems are proven by the very first step, with that number dropping as the total is approached. From the distribution it follows that going beyond the 7th step to the 17 steps required by the longest proofs only adds .012 of new (dis)proven formulas to the total.

Summary of proving times: A total of 89 145 formulas out of 97 727 were immediately proved (or disproved) after the first step (i.e. 91.21%). 95.96% were proven after 5 steps, and 96 969 formulas were proven after 9 steps (which is almost half of the maximum of 17 steps reached by the formulas with 4 bound variables), that is, 99.22% of the total. Letting the theorem prover run up to 17 steps only generates 758 new proofs, that is, only 0.77% of the total.
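The cumulative figures above can be recovered directly from the counts k_t in Fig. 10, for example with the short Mathematica check below (the list of counts is copied from the table):

  kt = {89145, 2311, 473, 931, 928, 426, 577, 834, 1344, 294, 186, 206, 44, 15, 7, 2, 4};
  N[Accumulate[kt]/Total[kt]]
  (* cumulative fraction (dis)proven by step t: about .912 after step 1, .96 by step 5, .992 by step 9 *)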
Fig. 12. Truth space of 97 727 proofs from the 607 consistent and independent axiom systems (x axis) against 161 formulas (y axis) from formulas with 4 bound variables. Every dot is a proof, a black square indicates that a particular theorem holds in a particular axiom system (which explains the diagonal, among other patterns) and white means the formula was proven to be false in the corresponding axiom system (i.e. the negation is a theorem). No undecidable candidate was found.
As for Turing machines (see Fig. 8), the space of proof lengths (Fig. 14) is mostly white and lightly colored as an indicator of the sparsity of long proof lengths given that most formulas are (dis)proven very quickly, suggesting that the distribution of proof lengths follows the distribution of program runtimes.
Fig. 14. Proof length deep field plot from the 97 727 formulas of up to 4 variables. Formula Busy Beavers are barely visible as isolated red points (online and color printed versions only). Points are arranged as in Fig. 12.
As with Busy Beaver Turing machines, whose values depend on the size of the Turing machines (states and symbols), proof lengths depend on the length of the formulas. One can define Busy Beaver formulas (whose values will be denoted by fBB(n)) as the formulas for which an automatic theorem prover takes the most time to decide whether they are theorems, or for which it produces the longest proof, among all the formulas of a fixed length. Unlike Turing machines, however, the size of a formula can take many forms, and may depend on the number of bound variables (as was the case in the experiments undertaken here), the number of logical operators or the number of symbols in general. It also depends on the formalism, just as Busy Beavers depend on the formalism used by Rado [9]. Following the analogy, the values of fBB(n) would work in a similar way and may be used just as Busy Beaver Turing machine values are currently used, for defining maximum runtimes and maximum output lengths for (small) Turing machines, saving time once an upper limit is known. The exact relation would also save considerable computational resources in automatic theorem proving. As explained before, the theoretical algorithmic analysis in [4] indicates that a program that has not stopped after running for a long time has smaller and smaller chances of eventually stopping, so the longer the time t, the more unlikely the program is to halt. Calude and Stay's results can be interpreted as follows: most Turing machines are fully determined qua termination by a small number of computational steps, and the error margin upon betting that a Turing machine will halt drops exponentially. Because proofs are, in effect, programs for an automatic theorem prover, one can connect this interpretation to the probability of a formula being (dis)proven in an axiom system, with the confidence error margin dropping fast. Let the optimal timeout be the number of steps by which a given fraction of the formulas of a fixed length is (dis)proven. Evidently, proving time is asymptotically optimal, in the sense that the closer to the maximum runtime (the Busy
t (runtime)    (dis)proven fraction of theorems p(t)    f(t) = 1/2^t
1              0.9                                      0.5
2              0.02                                     0.2
3              0.005                                    0.1
4              0.01                                     0.06
5              0.009                                    0.03
6              0.004                                    0.02
7              0.006                                    0.008
8              0.009                                    0.004
9              0.01                                     0.002
10             0.003                                    0.001
11             0.002                                    0.0005
12             0.002                                    0.0002
13             0.0005                                   0.0001
14             0.0002                                   0.00006
15             0.00007                                  0.00003
16             0.00002                                  0.00002
17             0.00001                                  0.00001
Fig. 15. Proof length distribution of (dis)proven theorems compared with the function f(t) = 1/2^t. Where t is the number of steps taken by the prover, p(t) the fraction of formulas (dis)proven at time t, and f(t) = 1/2^t; values are given to the first significant digit.
Beaver formula values), the greater the fraction of (dis)proven formulas. An optimal time OPTime for a given goal implies that by time t one has reached a fraction α of (dis)proved formulas. Thus

OPTime(n, α) = min{t(n) : θ_{t(n)} = α},

where n is the length of the formulas in the set, α the desired fraction of (dis)proved formulas, and θ_{t(n)} the fraction of formulas (dis)proven by time t(n) ≥ 0. Obviously 0 < OPTime(n, α) ≤ fBB(n) for each time t > 0, and OPTime(n, α) = fBB(n) if α = 1, that is, if the fraction of formulas to be (dis)proved is 1 (i.e. if the goal is to (dis)prove all the formulas of a fixed length). Just as with Busy Beavers, the exact value of OPTime(n, α) is uncomputable and unpredictable in general, but one can approach it. For example, in our formalism, for 4 bound variables it can be calculated from the probability distribution in Fig. 15 (a small sketch of such an estimate is given at the end of this section). One can ascertain, for example, that from a uniform distribution of randomly generated formulas, nearly .90 of the formulas will be proven after the first step, and that the number of new proofs from then on will rapidly drop as a function of the number of steps. The value of OPTime(n, α) can also determine a timeout for single formulas, given a confidence expectation. Which is to say that a single formula has, for example, a .90 chance of being (dis)proven in the first step, and that it has diminishing possibilities, if any, of being (dis)proven thereafter. We think that the results are robust enough to model specifications of theorem provers, despite not being completely independent. We were able to verify the results using another, very different theorem prover, the Automatic
Proof Search or AProS [10], for propositional logic and predicate calculus (the theorem prover deals, however, with all sorts of other classical and non-classical calculi). AProS uses the intercalation method to search for normal natural deduction proofs and, unlike Waldmeister, does not require a language in which the atomic formulas are identities. Notice that for this new case the definition of the length of formulas was adjusted to the new framework, given that the prover's calculus does not require equality, so no sense can be given to left- or right-hand sides. The set of randomly chosen operators used to generate formulas were the classic and, or, implies and double implies. AProS found proofs for .12 of the assertions (and for .353 of a set of assertions with no double conditionals), out of a random choice of 1000 automatically generated predicate calculus assertions with up to 4 quantifiers, 3 general functions, 3 logical operators and 3 variables. The longest proof length (runtime) was 42, with an average proof length of 13, and a distribution very close to the one obtained with Waldmeister using Mathematica.
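To connect this back to the definition of OPTime(n, α), the following Mathematica sketch estimates the optimal timeout from the empirical counts of Fig. 10 for formulas with 4 bound variables. It is only an approximation under stated assumptions: it takes the first step at which the cumulative fraction reaches at least α (rather than exactly α), and opTime is a name introduced here for illustration only.

  kt = {89145, 2311, 473, 931, 928, 426, 577, 834, 1344, 294, 186, 206, 44, 15, 7, 2, 4};
  cumulative = Accumulate[kt]/Total[kt];                       (* fraction (dis)proven by step t *)
  opTime[alpha_] := First[FirstPosition[cumulative, _?(# >= alpha &)]];

  opTime[0.90]   (* 1: a .90 goal is already met at the first step                          *)
  opTime[0.99]   (* 9: a .99 goal requires waiting up to nine steps                         *)
  opTime[1]      (* 17: (dis)proving every formula requires the maximum proof length observed *)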
A logically significant question concerns the structure of the theorems established. If significant structural features are uncovered, then one could randomly generate formulas of that structure and repeat the proof length and runtime distribution experiments. It would be quite interesting if one could find, for example, systematic biases for different theorem provers and theorem-proving techniques whose distributions deviate from each other. One can continue the process of generalizing theoretical results from computer programs to proof lengths and seek the equivalent of Busy Beavers in sets of well-defined proofs and theorem provers. Just as for larger Busy Beaver Turing machine values, the computer time and resources needed to explore much larger sets of proofs are out of reach. The experiments suggest that the statistics for theorem proving times from randomly generated formulas may follow a similar trend to the distribution of runtimes of random computer programs, and that when searching for proofs, appropriate timeouts can be set and optimal waiting times defined depending on the size of the formulas, just as it has been determined that runtimes depend on the size of machines. It is too soon, however, to declare any true resemblance, and there are always dangers in extrapolating from the behavior of small systems.

Acknowledgments. I am grateful to Cris Calude, who encouraged me to publish these results in connection with his own work [4]. I am also indebted to Stephen Wolfram, Todd Rowland and Matthew Szudzik for their support and guidance during and after the 2005 NKS Summer School at Brown University, when I started this project as part of a 3-week Summer project, inspired by Stephen Wolfram's own work in [13] and intending to extend his results from propositional logic to predicate calculus. I am also grateful to J.-P. Delahaye
with whom I've undertaken related research [7], studying the output distribution of abstract computing machines. To Wilfried Sieg for his guidance and for introducing me to AProS, which I used to strengthen the experimental results in this paper while a visiting scholar at Carnegie Mellon, and to Jeremy Avigad who brought me to Carnegie Mellon. And to the anonymous referee. Any error or omission remains, of course, the sole responsibility of this author.
References
1. Brady, A.H.: The Determination of the Value of Rado's Noncomputable Function for Four-State Turing Machines. Math. Comput. 40, 647–665 (1983)
2. Baumgartner, P., Zhang, H.: On Using Ground Joinable Equations in Equational Theorem Proving. In: Proceedings of the 3rd International Workshop on First Order Theorem Proving (St Andrews, Scotland), Fachberichte Informatik 5/2000, pp. 33–43. Universität Koblenz-Landau (2000)
3. Calude, C.S., Dinneen, M.J., Shu, C.-K.: Computing a glimpse of randomness. Experimental Mathematics 11(2), 369–378 (2002)
4. Calude, C.S., Stay, M.A.: Most programs stop quickly or never halt. Advances in Applied Mathematics 40, 295–308 (2005)
5. Chaitin, G.J.: Computing the Busy Beaver function. In: Information, Randomness & Incompleteness, pp. 74–76 (1984)
6. Chaitin, G.J.: A theory of program size formally identical to information theory. J. ACM 22, 329–340 (1975)
7. Delahaye, J.-P., Zenil, H.: Numerical Evaluation of Algorithmic Complexity for Short Strings: A Glance Into the Innermost Structure of Randomness. Appl. Math. Comput. (in press, 2011)
8. Joosten, J., Soler-Toscano, F., Zenil, H.: Program-size Versus Time Complexity, Speed-up and Slowdown Phenomena in Small Turing Machines. International Journal of Unconventional Computing (2011)
9. Rado, T.: On Non-Computable Functions. Bell System Technical J. 41, 877–884 (1962)
10. Sieg, W.: The AProS Project: Strategic Thinking & Computational Logic. Logic Journal of the IGPL 15(4), 359–368 (2007)
11. Lin, S., Rado, T.: Computer Studies of Turing Machine Problems. J. ACM 12, 196–212 (1965)
12. Hillenbrand, T., Löchner, B.: The Next WALDMEISTER Loop. In: Voronkov, A. (ed.) CADE 2002. LNCS (LNAI), vol. 2392, pp. 486–500. Springer, Heidelberg (2002)
13. Wolfram, S.: A New Kind of Science. Wolfram Media (2002)
14. Zvonkin, A.K., Levin, L.A.: The complexity of finite objects and the development of the concepts of information and randomness by means of the theory of algorithms. Russian Math. Surveys 25(6), 83–124 (1970)
15. Zenil, H.: Busy Beaver, from the Wolfram Demonstrations Project (2009), http://demonstrations.wolfram.com/BusyBeaver/