Notes on Metamathematics
Warren Goldfarb
Department of Philosophy
Harvard University
Contents

1 Axiomatics
1.1 Formal languages
1.2 Axioms and rules of inference
1.3 Natural numbers: the successor function
1.4 General notions
1.5 Peano Arithmetic
1.6 Basic laws of arithmetic
2 Gödel’s Proof
2.1 Gödel numbering
2.2 Primitive recursive functions and relations
2.3 Arithmetization of syntax
2.4 Numeralwise representability
2.5 Proof of incompleteness
2.6 ‘I am not derivable’
3 Formalized Metamathematics
3.1 The Fixed Point Lemma
3.2 Gödel’s Second Incompleteness Theorem
3.3 The First Incompleteness Theorem Sharpened
3.4 Löb’s Theorem
5 Formalized Semantics
5.1 Tarski’s Theorem
5.2 Defining truth for LPA
5.3 Uses of the truth-definition
5.4 Second-order Arithmetic
5.5 Partial truth predicates
5.6 Truth for other languages
6 Computability
6.1 Computability
6.2 Recursive and partial recursive functions
6.3 The Normal Form Theorem and the Halting Problem
6.4 Turing Machines
6.5 Undecidability
6.6 Recursive and recursively enumerable sets
6.7 Recursive Function Theory
Chapter 1
Axiomatics
Although this simple-minded formal language lacks the complexity that makes formal
languages of interest, it does illustrate some general points.
First, the formation rules are inductive: to specify the class of formulas we
stipulate first that certain particular strings are formulas, and then that certain
operations, when applied to formulas, yield new formulas. Finally, we specify that
only strings obtained in this way are formulas. Thus, the class of formulas is the
smallest class containing the string mentioned in clause (1) and closed under the
operation given in clause (2). Henceforth, in our inductive definitions we shall tacitly
assume the final “nothing else” clause.
Moreover, these inductive rules have a special feature. The operations for con-
structing new formulas out of old ones increase the length of the string. This feature
makes it easy to check whether a string is a formula. Hence it yields the effective
decidability of the class of formulas.
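Because each clause lengthens the string, a candidate can be checked by working backwards. As a sketch in Python (the text contains no programs; we also assume, from the later discussion of odd and even numbers of stars, that clause (1) designates the string ‘∆∗’ and clause (2) the operation of appending ‘∗∗’, since the clauses themselves precede this excerpt):

```python
def is_formula(s: str) -> bool:
    """Decide membership in the simple-minded formal language.

    Assumed formation rules: (1) the string '∆∗' is a formula;
    (2) appending '∗∗' to a formula yields a formula.
    Both clauses are guesses from context, not quoted from the text."""
    # Each application of clause (2) lengthens the string by two stars,
    # so we may undo the applications one at a time:
    while len(s) > 2 and s.endswith("∗∗"):
        s = s[:-2]          # undo one application of clause (2)
    return s == "∆∗"        # what remains must be the base string
```

Under these assumptions a formula is exactly ‘∆’ followed by an odd number of stars, and the procedure always terminates: this is the effective decidability just noted.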
In specifying the alphabet above, we wrote down not the signs themselves but
rather the names of those signs, just as in specifying a bunch of people we would write
down names of those people. Much of what we shall be doing in this book is talking
about formal languages. The language in which we talk about formal languages
is the metalanguage, which is English amplified by some technical apparatus. The
language being talked about is often called the object language.
In clause (2) of the formation rules we used the syntactic variable ‘F ’. Upper-
case eff is not part of the formal language; it is part of the metalanguage. In the
metalanguage it functions as a variable, ranging over strings of signs.
A short discussion of use and mention is now in order. We use words to talk
about things. In the sentence

(a) Frege was a German logician,

the first word is used to refer to a German logician. The sentence mentions this
logician, and uses a name to do so. Similarly, in the sentence

(b) The author of Begriffsschrift was a German logician,

the same logician is mentioned, and the expression consisting of the first four words
is used to mention him. We shall speak of complex expressions like those four words
also as names, so in (b) we have a complex name of Frege. In general, to speak of
an object we use an expression that is a name of that object or, in other words, an
expression that refers to that object. Clearly the object mentioned is not part of
the sentence; its name is.
Confusion may arise when we speak about linguistic entities. If I wish to men-
tion (talk about) an expression, I do not use that expression—for if I did I would
be mentioning the object that the expression refers to, if any. Instead, I should use
a name of that expression. Thus I might say
The first word of sentence (a) refers to a German logician.
Here, the first six words comprise a name of a linguistic expression. One standard
manner of forming names of expressions is to surround the expression with single
quotation marks. Thus we may say
‘Frege’ refers to a German logician.
Similarly,
‘The author of Begriffsschrift’ refers to a German logician.
Thus, to obtain a true sentence from ‘____ is a primitive sign of the simple-minded
formal language’ we must fill the blank with a name of a primitive sign, not with
the primitive sign, for example:

‘∗’ is a primitive sign of the simple-minded formal language.

We cannot say: ∗ is a primitive sign. That is nonsense, since a star is not a sign of
English.
Now let us consider the use of syntactic variables. Let F be a formula of the
simple-minded formal language. Here ‘F ’ is used as a syntactic variable. Our forma-
tion rules tell us that F followed by ‘∗∗’ is also a formula. We also say that F ⌢ ‘∗∗’
is a formula, where ‘⌢’ is a metalinguistic sign for the concatenation operator. We
may even say that F ⌢ ‘∗’ ⌢ ‘∗’ is a formula (and read this as follows: F concatenated
with star concatenated with star), for, after all, ‘∗’ ⌢ ‘∗’ is identical with ‘∗∗’. We
may not speak this way: F ∗∗ is a formula; nor this way: ‘F ∗∗’ is a formula. The
former doesn’t work, since ‘F ∗∗’ is not a referring expression, and hence cannot
be used to mention anything. The latter doesn’t work, since ‘F ∗∗’ is the string
consisting of an upper-case eff and two stars, and this string is not even a string of
signs of the formal language, much less a formula.
Similarly, we can say: let G be a string consisting of an odd number of stars;
then ‘∆’ ⌢ G is a formula. Also, let F be a formula and let G be a string consisting
of an even number of stars; then F ⌢ G is a formula.
The distinctions between name and thing named, and between syntactic vari-
able and formal sign, should be thoroughly understood, particularly because—for
the sake of brevity—we shall soon abandon the pedantic mode of speech in which
the distinctions are strictly observed.
Now let us give an example of a formal language more typical of those consid-
ered in logical studies, which we shall call L= , the language of identity. The alphabet
consists of the six signs ‘=’, ‘∼’, ‘⊃’, ‘∀’, ‘(’ and ‘)’ along with the formal variables
‘x’, ‘y’, ‘z’, ‘x′’, ‘y′’, ‘z′’, ‘x′′’, . . . . The first four signs are called the identity sign,
the negation sign, the conditional sign, and the universal quantifier. The formation
rules are as follows:
(1) if u and v are formal variables, then u ⌢ ‘=’ ⌢ v is a formula;

(2) if F and G are formulas, then ‘∼’ ⌢ F and ‘(’ ⌢ F ⌢ ‘⊃’ ⌢ G ⌢ ‘)’ are
formulas;

(3) if F is a formula and u is a formal variable, then ‘∀’ ⌢ u ⌢ ‘(’ ⌢ F ⌢ ‘)’ is
a formula.
∼∀x′(x′ = y′)
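These formation rules likewise translate directly into a recursive decision procedure. Here is a sketch in Python (ours, not the text’s); formulas are written as strings without spaces, and primes on variables as the character ‘′’:

```python
PRIME = "′"

def is_variable(s: str) -> bool:
    """Formal variables: 'x', 'y' or 'z' followed by zero or more primes."""
    return len(s) >= 1 and s[0] in "xyz" and all(c == PRIME for c in s[1:])

def is_L_formula(s: str) -> bool:
    """Decide whether s is a formula of L= by undoing the formation rules."""
    # Clause (1): an equation between formal variables.
    if "=" in s:
        left, _, right = s.partition("=")
        if is_variable(left) and is_variable(right):
            return True
    # Clause (2), first half: '∼' prefixed to a formula.
    if s.startswith("∼"):
        return is_L_formula(s[1:])
    # Clause (3): '∀', a formal variable, '(', a formula, ')'.
    if s.startswith("∀") and s.endswith(")") and "(" in s:
        u, _, rest = s[1:].partition("(")
        if is_variable(u):
            return is_L_formula(rest[:-1])
    # Clause (2), second half: '(', a formula, '⊃', a formula, ')'.
    if s.startswith("(") and s.endswith(")"):
        depth = 0
        for i, c in enumerate(s):
            depth += c == "("
            depth -= c == ")"
            if c == "⊃" and depth == 1:
                return is_L_formula(s[1:i]) and is_L_formula(s[i + 1:-1])
    return False
```

Again the rules only lengthen strings, so the recursion always works on shorter strings and terminates: the class of formulas of L= is effectively decidable.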
Second, now that we have set up the precise formation rules, we shall often omit
parentheses when we talk of formulas. For example, we will usually drop the
1.2 Axioms and rules of inference
Truth-functional axioms. If F , G, and H are formulas then the following are axioms:
(T1) F ⊃ (G ⊃ F )
(T2) (F ⊃ (G ⊃ H)) ⊃ ((F ⊃ G) ⊃ (F ⊃ H))
(T3) (F ⊃ G) ⊃ (∼G ⊃ ∼F )
(T4) F ⊃ ∼ ∼ F
(T5) ∼ ∼ F ⊃ F
Quantificational axioms.
(Q1) ∀u(F ) ⊃ F ′, where F ′ is an instance of ∀u(F ).
Axioms of identity.
(I1) x = x
(I2) x = y ⊃ (F ⊃ G), where F and G are any formulas that differ only in that
G has free y at some or all places where F has free x.
These axioms express the traditional logical understandings of the signs of our
formal language: ‘∼’ and ‘⊃’ are to be read as negation and material conditional
(not and if-then), ‘∀’ as the universal quantifier (for all) and ‘=’ as identity.
Note that in specifying the truth-functional axioms we gave axiom schemata, each
of which gives rise to an infinite number of axioms by replacement of the syntactic
variables with formulas. Similarly for the quantificational axioms and the second
axiom of identity. In contrast, axiom (I1) is a particular formula. The axioms
generated by (Q1) are axioms of universal instantiation; by dint of them and modus
ponens, if a formula is derivable then so are its instances.
Let us give several examples of derivations in this formal system. The first is
fairly straightforward.
x = x
∀x(x = x)
∀x(x = x) ⊃ y = y
y = y
Here, the first formula is axiom (I1), the second results from it by application
of the rule of universal generalization, the third is an axiom (Q1) of universal in-
stantiation, and the fourth results from the second and third by modus ponens.
This illustrates a general feature of the system. If a formula F containing a free
variable u is derivable, then so will be the formula obtained from F by relettering
the variable, as long as it remains free.
Our next derivation is a little more complex.
x = y ⊃ (x = x ⊃ y = x)
(x = y ⊃ (x = x ⊃ y = x)) ⊃ ((x = y ⊃ x = x) ⊃ (x = y ⊃ y = x))
(x = y ⊃ x = x) ⊃ (x = y ⊃ y = x)
x = x ⊃ (x = y ⊃ x = x)
x = x
x=y⊃x=x
x=y⊃y=x
The first, second, fourth, and fifth formulas are axioms ((I2), (T2), (T1), (I1),
respectively). The third results from the first two by modus ponens; the sixth
results from the fourth and fifth by modus ponens; and the last results from the
third and sixth by modus ponens. Thus the symmetry of ‘=’ is derivable, using just
the truth-functional and identity axioms.
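The bookkeeping in such derivations can itself be checked mechanically. As an illustrative sketch (Python; the nested-tuple encoding is ours, not the text’s), the seven formulas of the derivation above can be built so that every modus ponens step is verified as it is taken:

```python
def eq(u, v):
    """The equation u = v, encoded as a nested tuple."""
    return ("=", u, v)

def imp(f, g):
    """The conditional (f ⊃ g), encoded as a nested tuple."""
    return ("⊃", f, g)

def mp(f, conditional):
    """From F and (F ⊃ G), return G; raise if modus ponens does not apply."""
    op, antecedent, consequent = conditional
    if op != "⊃" or antecedent != f:
        raise ValueError("modus ponens does not apply")
    return consequent

xy, xx, yx = eq("x", "y"), eq("x", "x"), eq("y", "x")
f1 = imp(xy, imp(xx, yx))                        # axiom (I2)
f2 = imp(f1, imp(imp(xy, xx), imp(xy, yx)))      # axiom (T2)
f3 = mp(f1, f2)                                  # modus ponens
f4 = imp(xx, imp(xy, xx))                        # axiom (T1)
f5 = xx                                          # axiom (I1)
f6 = mp(f5, f4)                                  # modus ponens
f7 = mp(f6, f3)                                  # modus ponens: x = y ⊃ y = x
```

The checker confirms only the rule applications; that f1, f2, f4, f5 really are axiom instances would need a further (equally mechanical) check.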
Finally, we want to show that the transitivity of ‘=’ is derivable. Here we shall
use the observation above about relettering of free variables, and not go through
the steps of using universal generalization and instantiation. First we note that
` x = y ⊃ (x′ = x ⊃ x′ = y), since it is an axiom (I2). (Recall that ‘`’ means
“is derivable”.) Relettering y as z, we obtain ` x = z ⊃ (x′ = x ⊃ x′ = z), then,
relettering x as y, ` y = z ⊃ (x′ = y ⊃ x′ = z), and finally relettering x′ as x,
` y = z ⊃ (x = y ⊃ x = z). This last formula is a way of expressing transitivity.
We have just shown by a metamathematical argument that y = z ⊃ (x = y ⊃
x = z) is derivable, that is, that there is a sequence of formulas obeying certain
syntactic restrictions and ending with y = z ⊃ (x = y ⊃ x = z). In this
argument we did not actually exhibit the derivation. Of course, we can always show
a formula derivable in PA by giving a derivation of it, by writing down the sequence
of formulas. But since formal derivations quickly become very long and tedious, we
eschew these direct verifications of derivability. Instead, we show general principles
about derivability and use them to show that a derivation exists. It is essential to
bear in mind that the metamathematical arguments are not the derivations: they
establish the existence of derivations without actually exhibiting them.
An especially useful general principle for establishing derivabilities is this: ax-
ioms (T1)–(T5) together with modus ponens yield the derivability of all truth-
functionally valid formulas. (A formula is truth-functionally valid if it is built up
from some parts by use of ‘∼’ and ‘⊃’, and every assignment of truth-values to those
parts makes the whole formula come out true.) In a phrase, the system is truth-
functionally complete. (T1)–(T5) were axioms of the first fully laid-out axiomatic
system for truth-functional logic, namely that of Frege in Begriffsschrift (1879). It
is impressive that he formulated a system that turned out to be truth-functionally
complete, even though the concept of truth-functional validity was not articulated
until nearly forty years later. (Frege’s system had an additional axiom, which turned
out to be redundant.) The first published proof of the truth-functional complete-
ness of an axiomatic system was due to the American logician Emil Post (1921).
We won’t pause now to prove the property for our system; a proof is outlined in
Appendix §1.
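Truth-functional validity is itself mechanically testable: enumerate all assignments of truth-values to the parts and evaluate. A sketch in Python (ours; formulas are nested tuples whose string leaves stand for the parts):

```python
from itertools import product

def parts(f):
    """The sentence parts (leaves) out of which f is built by '∼' and '⊃'."""
    if isinstance(f, str):
        return {f}
    if f[0] == "∼":
        return parts(f[1])
    return parts(f[1]) | parts(f[2])

def value(f, assignment):
    """Truth-value of f under an assignment of truth-values to its parts."""
    if isinstance(f, str):
        return assignment[f]
    if f[0] == "∼":
        return not value(f[1], assignment)
    # material conditional: false only when antecedent true, consequent false
    return (not value(f[1], assignment)) or value(f[2], assignment)

def truth_functionally_valid(f):
    """True iff every assignment of truth-values makes f come out true."""
    ps = sorted(parts(f))
    return all(value(f, dict(zip(ps, vs)))
               for vs in product([True, False], repeat=len(ps)))
```

For instance, (T1) and the permutation schema used in the next paragraph test as valid, while F ⊃ G alone does not.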
Here is a typical application of truth-functional completeness. A more natural
expression of transitivity than the formula used above is x = y ⊃ (y = z ⊃ x = z).
To show it derivable, note that the formula
(y = z ⊃ (x = y ⊃ x = z)) ⊃ (x = y ⊃ (y = z ⊃ x = z))
is truth-functionally valid (it has the form (F ⊃ (G ⊃ H)) ⊃ (G ⊃ (F ⊃ H))). Hence
Note that these are not signs of the formal language, nor are they “defined signs
of the formal language” (whatever that would mean), nor are we proposing a new
formal language that incorporates them (although that, of course, could be done).
They are signs of the metalanguage, used to provide short names of long formulas.
For example, we can write x = y ∧ y = z ⊃ x = z, and mean thereby the formula
∼(x = y ⊃ ∼y = z) ⊃ x = z. Since x = y ⊃ (y = z ⊃ x = z) truth-functionally
implies x = y ∧ y = z ⊃ x = z, it follows that the latter is also derivable.
We also introduce ‘∃’ as a metalinguistic abbreviation, writing ∃uF for ∼∀u(∼F ).
Now if ∀u(∼F ) ⊃ ∼F ′ is an axiom (Q1), we may note that since it truth-functionally
implies F ′ ⊃ ∼∀u(∼F ), we obtain the derivability of F ′ ⊃ ∃uF , which is a form of
existential generalization.
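Expanding such metalinguistic abbreviations back into primitive notation is a mechanical rewriting. A sketch (Python; the tuple encoding, and the use of ‘∧’ as the sign for the conjunction abbreviation, are ours):

```python
def expand(f):
    """Eliminate the abbreviations '∧' and '∃' from a formula:
        F ∧ G   abbreviates   ∼(F ⊃ ∼G)
        ∃u F    abbreviates   ∼∀u(∼F)
    Formulas are nested tuples; equations are left untouched."""
    if isinstance(f, str) or f[0] == "=":
        return f
    if f[0] == "∧":
        return ("∼", ("⊃", expand(f[1]), ("∼", expand(f[2]))))
    if f[0] == "∃":
        return ("∼", ("∀", f[1], ("∼", expand(f[2]))))
    if f[0] == "∼":
        return ("∼", expand(f[1]))
    if f[0] == "⊃":
        return ("⊃", expand(f[1]), expand(f[2]))
    return ("∀", f[1], expand(f[2]))   # the only remaining case is '∀'
```

For example, expanding the encoding of ∃y(x = y) yields the encoding of the official formula ∼∀y(∼ x = y).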
of language L= by adding two primitive signs, ‘0’ and ‘S’. In order to specify the
formulas, we first specify a class of strings called terms.
Clearly (2) and (3) are the same rules for constructing complex formulas from
simple ones as in L= . The only difference in the formation rules lies in the
specification of the atomic formulas, those licensed by clause (1), which do not
contain ‘∼’, ‘⊃’, or ‘∀’: the atomic formulas now are equations between terms of LS ,
rather than just equations between formal variables.
We now specify axioms for a system ΣS . The logical truth-functional, quantificational,
and identity axioms are all framed as in §1.2. Of course, the formulas
referred to are formulas of LS , so in saying, for example, that for all formulas F and
G, F ⊃ (G ⊃ F ) is an axiom, we mean for all formulas F and G of LS . Moreover,
the notion of instance is expanded: instances of ∀uF are formulas that are obtained
from F by replacing u with a term, provided no variable of the term becomes bound
at the places of replacement.
The axioms of successor are as follows:
(S1) ∼Sx = 0
(S2) Sx = Sy ⊃ x = y
(S3) ∼x = 0 ⊃ ∃y(x = Sy)
(S4) ∼S · · · Sx = x, with any positive number of occurrences of ‘S’
Axioms (S1)–(S3) are each individual formulas of LS , while (S4) is an infinite
list of formulas. As before, the rules of inference are modus ponens and universal
generalization.
Because the formal system contains universal instantiation as axioms and uni-
versal generalization as a rule of inference, it doesn’t matter whether we formulate
axioms like (S1)-(S4) as open formulas, as above, or as universally quantified closed
formulas, for example, ∀x∀y(Sx = Sy ⊃ x = y). Each of these forms is derivable
from the other using (Q1), modus ponens, and universal generalization. For some
purposes, not at issue in this volume, it is important that all nonlogical axioms—
those aside from the truth-functional axioms, quantificational axioms, and axioms
of identity—be closed; hence some authors use only the universally quantified forms.
(See the Exercises for §2.5 for an example of where this is needed.) The interderivability
of the closed and open forms of the axioms motivates an extension of our
terminology: we shall call any formula obtainable from an axiom by generalization
and instantiation an instance of that axiom.
As an example, let us show that x = S0 ⊃ ∼x = SSy is derivable in ΣS . We
have ` ∼0 = Sy, from an instance of (S1) and the symmetry of identity. We also
have ` S0 = SSy ⊃ 0 = Sy, an instance of (S2). By truth-functional implication,
` ∼S0 = SSy. Now, since by axiom (I2) ` x = S0 ⊃ (x = SSy ⊃ S0 = SSy), by
another truth-functional implication we can infer ` x = S0 ⊃ ∼x = SSy.
In fact ΣS can derive every formula that is true about the natural numbers and succes-
sor. To make this claim precise, we must talk about interpretations of the language
LS . Intuitively, an interpretation of a formal language is specified by providing
meanings to the signs. The interpretation of the logical signs is fixed: ‘=’ is inter-
preted as identity, ‘∼’ as negation, ‘⊃’ as material conditional, and ‘∀’ as universal
quantification. Thus, all an interpretation of L= would need to fix is the universe
over which the quantifiers range. LS , on the other hand, also contains the signs ‘0’
and ‘S’. Syntactically, ‘0’ functions as a constant, that is, a name of an object, and
‘S’ as a one-place function sign. Hence an interpretation of LS consists of first a
(nonempty) universe, second an interpretation of ‘0’ as an element of the universe,
and third an interpretation of ‘S’ as a function on the universe whose values lie in
the universe. Here is an interpretation: the universe is the natural numbers; ‘0’ is
interpreted as zero and ‘S’ as the successor function. Thus, under this interpreta-
tion, ‘SSS0’ refers to three; and ∀x(∼Sx = 0) asserts that zero is the successor of
no natural number.
The interpretation we have just given is the intended interpretation, the one we
had in mind when we formulated LS . Other interpretations may easily be devised;
for example, we could take the universe to be all integers, negative, zero, and posi-
tive, with ‘S’ as the successor function on them, and ‘0’ as zero; or take the universe
to be the natural numbers together with two other objects, which we’ll call a and
b, and interpret ‘S’ as the function that is the successor function on the natural
numbers and takes a to b and b to a.
Suppose we have an interpretation of LS . Let F be a sentence of LS , that is, a
formula without free variables. Then either F is true under the interpretation or else
F is not true, that is, it is false under the interpretation. For example, ∀x(∼Sx = 0)
and ∀x(∼SSx = x) are true under the intended interpretation, because zero is
not the successor of any number, and no number is the double successor of itself,
whereas ∀x(∃y(x = Sy) ⊃ ∃z(x = SSz)) is not, because not all numbers that
are successors of some number are double successors of some number (the number
one is the sole counterexample). In the first variant interpretation, ∀x(∼Sx = 0) is
false, since zero is the successor of minus one, while ∀x(∼SSx = x) is true, as is also
∀x(∃y(x = Sy) ⊃ ∃z(x = SSz)), since in fact every member of the universe is the
double successor of something. In the second variant interpretation, ∀x(∼SSx = x)
is false, because the function that interprets ‘S’ when applied twice takes a to itself,
and also takes b to itself.
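When the universe is finite, truth under an interpretation becomes a straightforward computation. The interpretations above all have infinite universes, so the following Python sketch uses a toy interpretation that is not in the text (universe {0, 1, 2, 3}, ‘0’ as zero, ‘S’ as successor modulo 4), solely to illustrate how an interpretation fixes truth-values:

```python
UNIVERSE = range(4)                  # toy universe: 0, 1, 2, 3

def succ(n):                         # interpretation of 'S': successor mod 4
    return (n + 1) % 4

def term_value(t, env):
    """Value of a term under an assignment env of values to variables."""
    if t == "0":
        return 0                     # interpretation of '0'
    if isinstance(t, str):
        return env[t]                # a formal variable
    return succ(term_value(t[1], env))   # t is ('S', subterm)

def true_under(f, env=None):
    """Truth of a formula under the toy interpretation, relative to env."""
    env = env or {}
    op = f[0]
    if op == "=":
        return term_value(f[1], env) == term_value(f[2], env)
    if op == "∼":
        return not true_under(f[1], env)
    if op == "⊃":
        return (not true_under(f[1], env)) or true_under(f[2], env)
    # '∀': the quantifier ranges over the whole universe
    return all(true_under(f[2], {**env, f[1]: n}) for n in UNIVERSE)

# ∀x(∼Sx = 0) is false here (the successor of 3 is 0), while
# ∀x(∼SSx = x) is true (n + 2 is never congruent to n modulo 4).
no_pred_of_zero = ("∀", "x", ("∼", ("=", ("S", "x"), "0")))
no_two_cycle    = ("∀", "x", ("∼", ("=", ("S", ("S", "x")), "x")))
```

This also previews the next paragraph: an open formula is evaluated only relative to an assignment env of values to its free variables.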
What about open formulas, that is, formulas with free variables? Consider, for
example, ∀x(∼y = SSx), in which y is free. Under interpretation, this formula is
true or false once the free variable y is assigned a value in the universe of discourse.
Indeed, under the intended interpretation this formula is true for values zero, one,
and two of y, and false for all other values. (In the first variant interpretation, it is
true for no values of y.) In general, under an interpretation a formula is true or false
for given assignments of values to the free variables. We also call an open formula
true under an interpretation without qualification (that is, without reference to an
assignment of values to the free variables) iff it is true for all assignments of values
in the universe of discourse to the free variables. Thus we would say that ∼Sx = 0
is true under the intended interpretation.
Semantics is the study of interpretations of formal languages, and of properties
of signs that are defined with reference to interpretations. (In purely mathematical
contexts this is also called model theory, since interpretations are also called models,
and the intended interpretation of a system like ΣS is called the standard model.)
Syntax is the study of purely formal properties of signs and formal systems, with
no mention of interpretation. The central notion of semantics is truth under an
interpretation; the central notion of syntax is derivability. Of course there can be
connections between these notions, as we shall shortly see.
1.4 General notions
Recall that a sentence is a formula without free variables. Call a sentence refutable
when its negation is derivable. Then the definition may be rephrased thus: Σ
is syntactically complete iff every sentence is either derivable or refutable. The
restriction to sentences, rather than arbitrary formulas, is important, since ordinarily
formulas with free variables will be neither derivable nor refutable. For example, in
system ΣS the formula x = 0 is neither derivable nor refutable. Nor would we
want it to be, for if it were then by universal generalization either ∀x(x = 0) or
∀x(∼x = 0) would be derivable, and neither is a happy result, since either would
make the system inconsistent.
The property of syntactic completeness is sometimes called ‘formal complete-
ness’ or ‘negation completeness’. Note that syntactic completeness, like consistency,
is a purely syntactic notion. It is important to distinguish syntactic completeness
from other completeness notions, one of which we’ve seen already (truth-functional
completeness), one of which is defined below, and one of which we’ll encounter in
the next section. The use of the word ‘complete’ for many different notions is a
pun, historically engendered by the vague intuition that a formal system should be
called complete when it does everything we want it to do. At different times and
for different systems, what ‘we want it to do’ led to different notions.
• Σ is decidable iff there is a purely mechanical procedure for determining
whether any given formula is derivable in Σ.
We require of all formal systems only that the notion of derivation be effective, not
the notion of derivability. In particular, since the rules of inference may allow a
shorter formula to be inferred from longer formulas, there may be no obvious way
of telling from a formula how long a derivation of it might have to be. Whether or
not a system is decidable is thus a real question, and often takes considerable work
to settle.
Let us now leave syntax and define two semantic notions.
The power of semantic talk is illustrated by these cheerful facts: if there is at least
one interpretation with respect to which Σ is sound, then Σ is consistent; if there is at
least one interpretation with respect to which Σ is complete, then Σ is syntactically
complete. This follows from the fact that — since ‘∼’ is always interpreted as
negation — for any sentence F , either F is true or ∼F is true, but not both.
The definitions given in this section are completely general. Let us now restrict
attention to formal systems that use the same logical axioms and rules of inference
as those given in §1.2. Those axioms are true under all interpretations, and the rules
of inference preserve truth: if the premises of an application of either of these rules
are true under an interpretation, so is the conclusion. Consequently, if the nonlogical
axioms of a system are true under an interpretation, then in any derivation F1 , . . . , Fn ,
every formula, either being an axiom or resulting by a rule of inference from previous
formulas, must be true in the interpretation, and so all derivable formulas will be
true under the interpretation and the system will be sound for the interpretation.
A corollary of this is: if the nonlogical axioms are true under a given interpretation
and a formula F is not true under that interpretation, then F cannot be derivable.
How do the systems we have formulated in this chapter fare under these defini-
tions? The axioms of Σ= , being purely logical, are true under every interpretation.
Obviously, then, they are consistent. The system is not syntactically complete, nor
would we want it to be, since it is intended to express the general logical laws of
identity, not the facts about a particular mathematical domain. In particular, for
example, ∀x∀y(x = y) is neither derivable nor refutable: it is true in the interpre-
tation with a one-element domain, and false in all other interpretations. Σ= is also
decidable, which is not difficult, but also not trivial, to prove. As it turns out, a
1.5 Peano Arithmetic
(N1) ∼Sx = 0
(N2) Sx = Sy ⊃ x = y
(N3) x + 0 = x
(N4) x + Sy = S(x + y)
(N5) x × 0 = 0
(N6) x × Sy = (x × y) + x
(N7) F (0) ∧ ∀x(F (x) ⊃ F (Sx)) ⊃ F (x)
Note that (N1)–(N6) are particular formulas, but (N7) is an axiom schema: re-
placing F (x) by any formula of LPA yields an axiom. This schema provides the
mathematical induction axioms. Intuitively, mathematical induction is the princi-
ple that if 0 possesses a property and if whenever a number possesses the property
then so does its successor, then all numbers possess the property. The power of PA
to derive formalizations of interesting arithmetical claims — including all the clas-
sical theorems of number theory — stems from the inclusion of the mathematical
induction axioms.
Now the intended interpretation of LPA is, not surprisingly, that the universe is
the natural numbers, ‘0’ denotes zero, and ‘S’, ‘+’ and ‘×’ denote the successor
function, addition, and multiplication. It can look fairly obvious that PA is sound for this
interpretation, but this obscures a difficulty. As we shall see in Chapter 5, framing
the notion of truth in this interpretation requires a metalanguage that is expressively
richer than what can be formalized in LPA , and, despite its superficial obviousness,
demonstrating that PA is sound for this interpretation requires a metalanguage
that is mathematically stronger than what is formalized by PA. (The problem, as it
turns out, lies in the unbounded logical complexity of the axioms of mathematical
induction. In particular the formula put in for F (x) in (N7) can have arbitrarily
many quantifiers.) For this reason we avoid semantical reasoning in proving the
results of the next three chapters. Semantic considerations will appear only as
heuristic or suggestive. In my view, in the study of foundations of mathematics, we
should avoid strong assumptions in the metalanguage, assumptions which are in as
much need of a foundation as is the mathematics that we are trying to ground by
formulating formal systems and investigating them.
Let us, then, return to the syntactic investigation of PA. Although the axioms
of PA include those we called (S1) and (S2) for the system ΣS , here renamed (N1)
and (N2), they do not include (S3) and (S4), because the latter are derivable in PA using
mathematical induction. We shall show this at the beginning of the next section.
Hence every axiom of system ΣS is derivable in PA, so that every formula derivable
in ΣS is derivable in PA. From Herbrand’s result cited at the end of §1.4, it follows
that every sentence of LPA that does not contain ‘+’ or ‘×’ is either derivable or
refutable.
Of course, the difference between LPA and LS is that LPA has terms and for-
mulas that do contain ‘+’ and ‘×’. Let us note first that the intersubstitutivity of
identicals (“equals for equals yields equals”) is derivable. That is,
` x = y ⊃ t(x) = t(y),
where t(x) is any term containing x and t(y) comes from t(x) by replacing x with y.
An instance of this yields the following: suppose s, s′, t and t′ are terms such that
t′ can be obtained from t by replacing a subterm s by s′; then ` s = s′ ⊃ t = t′.
Now let us see how (N3)–(N6) yield the derivability of equations involving
addition and multiplication. (As with formulas, in speaking of terms we shall often
drop the outermost pair of parentheses.) As an example, let us show the derivability of
S0 + SS0 = SSS0 (one plus two equals three). First, ` S0 + 0 = S0, since it is an
instance of (N3). Then, ` S0 + S0 = S(S0 + 0), since it is an instance of (N4). By
as desired. To show that the axioms (S4) are derivable, we’ll consider the example
∼SSx = x; the argument is generalizable to any positive number of occurrences of
‘S’. Let F (x) be the formula ∼SSx = x. Then ` F (0), since F (0) is an instance
of axiom (N1). Now F (x) ⊃ F (Sx) is ∼SSx = x ⊃ ∼SSSx = Sx, which is truth-
functionally implied by SSSx = Sx ⊃ SSx = x, which is an instance of axiom (N2).
Hence ` F (x) ⊃ F (Sx), so using an axiom of mathematical induction we obtain
` F (x), and so ` ∼SSx = x.
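A pattern worth noting: read left to right, axioms (N3) and (N4) act as computation rules that rewrite any sum of numerals to a numeral. A sketch (Python, ours; the numeral S · · · S0 is represented as a string):

```python
def numeral(n):
    """The numeral for n: 'S' written n times, followed by '0'."""
    return "S" * n + "0"

def add_numerals(s, t):
    """Rewrite s + t to a numeral, recording each intermediate term.

    Each step corresponds to a derivable equation of PA: (N4) peels one
    'S' off the right summand, and (N3) discharges the final '+ 0'."""
    steps = [s + " + " + t]
    outer = ""                                   # 'S's produced by (N4)
    while t != "0":
        t = t[1:]                                # (N4): s + St' = S(s + t')
        outer += "S"
        steps.append(outer + "(" + s + " + " + t + ")")
    steps.append(outer + s)                      # (N3): s + 0 = s
    return steps
```

Run on the numerals for one and two, this passes through ‘S0 + SS0’, ‘S(S0 + S0)’, ‘SS(S0 + 0)’ and ends at ‘SSS0’, mirroring the derivation of S0 + SS0 = SSS0 above.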
Our next aim is to show that the commutative law of addition is derivable. We
do this in three stages. First we show ` 0 + x = x. Let F (x) be 0 + x = x. Then
` F (0), since it is an instance of axiom (N3). By intersubstitutivity, ` F (x) ⊃
S(0 + x) = Sx. By axiom (N4), ` 0 + Sx = S(0 + x). By transitivity of identity,
` F (x) ⊃ 0 + Sx = Sx, that is, ` F (x) ⊃ F (Sx). The result follows by mathematical
induction.
Second, we show ` Sz + x = S(z + x). Let F (x) be Sz + x = S(z + x). By
(N3), ` z + 0 = z, so that ` S(z + 0) = Sz, so ` Sz = S(z + 0) by symmetry.
Another instance of (N3) yields ` Sz + 0 = Sz. By transitivity, ` Sz + 0 = S(z + 0),
that is, ` F (0). By (N4), ` Sz + Sx = S(Sz + x). Hence by intersubstitutivity,
` Sz + x = S(z + x) ⊃ Sz + Sx = SS(z + x). By (N4) again, ` z + Sx = S(z + x),
so that ` S(z + Sx) = SS(z + x). By symmetry and transitivity, ` Sz + x =
S(z + x) ⊃ Sz + Sx = S(z + Sx). This is just F (x) ⊃ F (Sx). Thus we obtain the
desired conclusion.
Finally, we show ` x + y = y + x. Let F (x) be x + y = y + x. From what
was shown two paragraphs above, ` 0 + y = y, and by axiom (N3) and symmetry
y = y + 0, so by transitivity ` F (0). By axiom (N4), ` y + Sx = S(y + x). An
instance of what was shown in the previous paragraph yields ` Sx + y = S(x + y).
By intersubstitutivity, ` x + y = y + x ⊃ S(x + y) = S(y + x). By symmetry and
transitivity of identity, ` x+y = y+x ⊃ (Sx+y = y+Sx), that is, ` F (x) ⊃ F (Sx).
We leave to the reader arguments for the derivability of other basic laws of
arithmetic, for example the associativity of addition, the law of cancellation y + x =
z + x ⊃ y = z, the commutativity and associativity of multiplication, and the
distributive law (y + z) × x = (y × x) + (z × x). (See the Exercises.)
We now wish to show that the basic properties of the usual ordering relation of
the natural numbers can be derived in PA. PA does not have primitive vocabulary
to express this relation, but, since a number m is no greater than a number n iff
some natural number added to m yields n, it makes sense to introduce the following
metamathematical shorthand: by x ≤ y we mean the formula ∃z(y = z + x). More
generally, if s and t are any terms, by s ≤ t we mean the formula ∃u(t = u + s),
where u is the earliest variable among z, z′, z′′, . . . that is distinct from all variables
in s and t.
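The stipulated choice of u, the earliest of z, z′, z′′, . . . distinct from all variables in s and t, is itself mechanical. A sketch (Python, ours; s and t are terms written as strings, and the ‘∃’ in the result is left as the abbreviation):

```python
def leq(s: str, t: str) -> str:
    """The formula abbreviated by s ≤ t, namely ∃u(t = u + s), with u
    the earliest of z, z′, z′′, … not occurring in s or t."""
    u = "z"
    while u in s or u in t:      # substring test: conservative but safe
        u += "′"
    return "∃" + u + "(" + t + " = " + u + " + " + s + ")"
```

The substring test may skip a candidate that occurs only as part of a longer variable, but the variable it settles on is always distinct from every variable in s and t, which is all the stipulation requires.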
Since ` x = 0 + x and ` x = x + 0, by existential generalization we have ` x ≤ x
and ` 0 ≤ x. To show the derivability of transitivity, that is, of
x ≤ y ∧ y ≤ z ⊃ x ≤ z,
we might argue as follows. Suppose that x ≤ y and y ≤ z. Then there are z′
and z 00 such that z 0 + x = y and z 00 + y = z. By intersubstitutivity, z 0 + (z 00 + x) = z.
By the associativity of addition, (z 0 + z 00 ) + x = z. Hence there exists an x0 , namely
z 0 + z 00 , such that x0 + x = z. That is, x 6 z.
The argument of the foregoing paragraph is more informal than those we have
used previously. That it establishes the derivability of transitivity can be seen,
roughly speaking, by noting that all the moves are purely logical inferences from
formulas known to be derivable; and that by dint of the logical axioms, PA can
capture all such inferences. More precisely, the argument shows that the formula
(z′ + x = y ∧ z″ + y = z) ⊃ (z′ + z″) + x = z is truth-functionally implied by appro-
priate instances of associativity, (N4), and intersubstitutivity. Hence this formula is
derivable. Together with existential generalization, this formula truth-functionally
implies (z′ + x = y ∧ z″ + y = z) ⊃ ∃z′(z′ + x = z); and the latter formula logically
implies
∃z(z + x = y) ∧ ∃z′(z′ + y = z) ⊃ ∃z′(z′ + x = z)
which is just the formula x ≤ y ∧ y ≤ z ⊃ x ≤ z. Another way of seeing that the
informal argument establishes derivability is by noting that the informal argument
can be directly transcribed into a natural deduction system for logical inference,
like that of Goldfarb’s Deductive Logic, yielding a deduction in that system whose
premises are formulas known to be derivable and whose conclusion is the desired
transitivity formula; and any implication that can be shown by such a deduction is,
as we show in the Appendix §2, derivable using the logical axioms of PA.
More generally, the fact that all logically correct steps can be captured in PA
can be framed as the quantificational completeness or logical completeness of PA,
namely, that all quantificationally valid formulas are derivable. (A formula is quan-
tificationally valid iff it is true under all interpretations.) As a consequence, if a
formula logically implies another (in the sense of quantification theory), and the
first formula is derivable, then the second formula will be, too. The quantificational
completeness of a formal system of logical axioms was first shown by Kurt Gödel, in
1930.
We next show the derivability of the antisymmetry law
` x ≤ y ∧ y ≤ x ⊃ x = y
Suppose x ≤ y and y ≤ x. Then there are numbers z and z′ such that z + x =
y and z′ + y = x. By intersubstitutivity, z′ + (z + x) = x, so by associativity
(z′ + z) + x = x. Since ` 0 + x = x, (z′ + z) + x = 0 + x. By the law of cancellation
z′ + z = 0. By the law ` x + y = 0 ⊃ y = 0 (Exercise 1.?), z = 0. Hence 0 + x = y,
so that x = y.
The derivability of four further laws of ordering is left to the reader (see the
Exercises):
x ≤ 0 ⊃ x = 0
x ≤ Sx
x ≤ y ∧ ∼x = y ⊃ Sx ≤ y
x ≤ y ∨ y ≤ x
These laws express that the ordering is a linear ordering, that is, any two
elements are comparable; that it has a least element, namely zero; and that the
ordering is discrete, that is, for every number there is a next one in the ordering,
namely its successor (there is nothing in between a number and its successor).
Chapter 2
Gödel’s Proof
More precisely put, we define a function Γ from signs of the alphabet to integers:
We then set, for any string s1s2 · · · sk of signs of the alphabet,
γ(s1s2 · · · sk) = 2^Γ(s1) · 3^Γ(s2) · · · pk^Γ(sk),
where pn is the nth prime number. The number to which γ carries a string is called
the gödel number of the string. For example, γ(‘+’) = 2^3 = 8, γ(‘∀S)’) = 2^8 3^2 5^10 =
22,500,000,000, and γ(‘0 = 0’) = 2^1 3^5 5^1 = 2430. The function γ is one-to-one:
distinct strings are carried to distinct numbers. (This follows from the Unique
Factorization Theorem, first proved by Gauss in 1798, which asserts that every
number greater than 1 has a unique factorization into prime powers.) Moreover, γ
is effective.
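The examples above already pin down several values of Γ, so the numbering can be sketched executably. In the following Python sketch the partial GAMMA table is read off the examples just given (the rest of the alphabet's table is not fixed by this excerpt), and all function names are ours, not the text's:

```python
# A sketch of the goedel numbering gamma, assuming the sign-values that
# the text's examples pin down; the full table of Gamma is not shown in
# this excerpt, so the dictionary below is only partial.
GAMMA = {'0': 1, 'S': 2, '+': 3, '=': 5, '∀': 8, ')': 10}

def nth_prime(n):
    """Return the nth prime (1-indexed): 2, 3, 5, ..."""
    count, candidate = 0, 1
    while count < n:
        candidate += 1
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            count += 1
    return candidate

def goedel_number(string):
    """gamma(s1 s2 ... sk) = 2^Gamma(s1) * 3^Gamma(s2) * ... * p_k^Gamma(sk)."""
    result = 1
    for i, sign in enumerate(string, start=1):
        result *= nth_prime(i) ** GAMMA[sign]
    return result

print(goedel_number('+'))    # 2^3 = 8
print(goedel_number('∀S)'))  # 2^8 * 3^2 * 5^10 = 22500000000
print(goedel_number('0=0'))  # 2^1 * 3^5 * 5^1 = 2430
```

Distinct strings receive distinct numbers because the prime factorization of the result is unique, which is exactly the one-to-one claim in the text.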
Having given the gödel numbering, we may now deal with numbers alone. More
precisely, we know that there is a property of numbers that holds of just the gödel
numbers of formulas of PA, for example. We want to define this property, and
others like it, entirely within number theory. Our definitions will be purely number-
theoretic without any mention of syntax. Our interest in defining properties like
this is given by the gödel numbering; but the properties are intrinsically purely
number-theoretic, and their number-theoretic structure does not in any way depend
on the syntax of PA or on the correlation γ. The number theory we will be using
is informal number theory, as might be seen in a typical mathematics class, not the
formalized version enshrined by PA. Eventually we shall want to formalize some of
our proceedings; in §2.4 we shall see how much.
Exponentiation. n^0 = 1
n^(k+1) = (n^k) · n
ϕ(n, 0) = ψ(n)
ϕ(n, k + 1) = ξ(ϕ(n, k), n, k)
Thus the value of ϕ(n, k + 1) is defined as some known function that has as inputs the
previous value ϕ(n, k), n, and k. Not all variables on the left need actually appear on
the right. For example, in the definition of addition, only the previous value n + k
appears on the right-hand side of the second equation, but neither k nor n by itself
does. Even the following counts as a definition by recursion, although no variables
appear on the right-hand side:
α(0) = 1
α(k + 1) = 0.
The function α takes every positive integer to 0, and takes 0 to 1. We call α the
switcheroo function.
The general form of definition by recursion for functions with more than two
arguments is like that given above, but with a sequence n1 , n2 , . . . , nm of arguments
taking the place of n.
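As a sketch of the schema, here is a small Python rendering in which by_recursion builds ϕ from ψ and ξ; addition and the switcheroo function α are the two instances discussed above (the helper names are ours):

```python
def by_recursion(psi, xi):
    """Build phi from the schema phi(n, 0) = psi(n),
    phi(n, k+1) = xi(phi(n, k), n, k)."""
    def phi(n, k):
        value = psi(n)
        for j in range(k):
            value = xi(value, n, j)
        return value
    return phi

# Addition: n + 0 = n and n + (k+1) = S(n + k); only the previous
# value appears on the right-hand side.
add = by_recursion(lambda n: n, lambda prev, n, k: prev + 1)

# The switcheroo function alpha(0) = 1, alpha(k+1) = 0, written as a
# one-argument recursion by ignoring the dummy parameter n entirely.
def alpha(k):
    return by_recursion(lambda n: 1, lambda prev, n, j: 0)(0, k)

print(add(3, 4))           # 7
print(alpha(0), alpha(5))  # 1 0
```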
Composition. This is simply the compounding of given functions and rela-
tions. For example, we may define
|k − n| = (k −· n) + (n −· k),
in which truncated difference and addition are compounded. This function gives
the absolute value of the difference between k and n. Composition also gives us the
means to capture definition by cases. For example, suppose we wanted to define the
function of k and n that yields k^2 if k ≤ n and n^2 if n < k. We can do this by using
addition, multiplication, truncated difference, and switcheroo thus, thereby showing
that this function is primitive recursive:
k · k · α(k −· n) + n · n · α(n + 1 −· k)
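The cases-trick can be checked numerically. In this hypothetical Python sketch, trunc_sub stands in for the truncated difference −· , and cases is the compound k·k·α(k −· n) + n·n·α(n + 1 −· k):

```python
def trunc_sub(k, n):
    """Truncated difference: k - n if k >= n, and 0 otherwise."""
    return k - n if k >= n else 0

def alpha(k):
    """The switcheroo function: alpha(0) = 1, alpha(k) = 0 for k > 0."""
    return 1 if k == 0 else 0

def cases(k, n):
    """k*k if k <= n, and n*n if n < k, obtained by composition alone:
    exactly one of the two switcheroo factors is 1."""
    return k * k * alpha(trunc_sub(k, n)) + n * n * alpha(trunc_sub(n + 1, k))

print(cases(3, 5))  # 9, since 3 <= 5
print(cases(5, 3))  # 9, since 3 < 5
print(cases(4, 4))  # 16
```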
usage, since we have also used these three signs as signs of the formal language LPA .
The context should make clear when the signs are used in informal mathematics, and
when as signs (or names for signs) of the formal language, but the reader should be
alert to this. In order to minimize the overlap in language,
in informal mathematics we use ‘·’ for multiplication, as opposed to ‘×’ in the formal
language, and numerical variables from the middle of the English alphabet (‘i’ to
‘r’), not the end. Also, in informal mathematics, we will use somewhat different
logical notation: ‘ & ’ for and, ‘→’ for if-then, ‘↔’ for iff, and an overstrike bar
(over a sign for a relation) to mean not, for example, n > m iff n ≤̄ m. The quantifiers
we use in informal mathematics will also look different from those of LPA . For want
of a good alternative, ‘∨’ will remain as ambiguous between its informal usage and
its formal usage as (inclusive) or. End of Note.
Another definition method we will want to use for relations is bounded quantifi-
cation. For example,
k divides n iff (∃p ≤ n)(n = p · k).
n is prime iff n > 1 & (∀k ≤ n)(k|n → k = 1 ∨ k = n),
where ‘k|n’ abbreviates ‘k divides n’. The bound on the quantifier insures com-
putability: one need make only a finite search in order to determine whether the
new relation holds or not. Often an equivalent definition of the relation can be made
without the bound — for example, it is true that k divides n iff (∃p)(n = p · k) and
that n is prime iff n > 1 & (∀k)(k|n → k = 1 ∨ k = n)— but a definition without a
bound does not guarantee computability.
It is straightforward to show that if a relation R(k, n) is primitive recursive
then so is the relation (∀k ≤ p)R(k, n), which has arguments p and n. Let χ be
the characteristic function of R, and define χ′ by recursion thus: χ′(0, n) = χ(0, n),
χ′(p + 1, n) = χ′(p, n) · χ(p + 1, n). Thus χ′ is primitive recursive, and is the
characteristic function of (∀k ≤ p)R(k, n), since χ′(p, n) is 1 just in case each
of χ(0, n), . . . , χ(p, n) is 1, that is, just in case each of R(0, n), . . . , R(p, n) holds.
Bounded existential quantification can be obtained from bounded universal quan-
tification by truth-functional operations, since (∃k ≤ p)R(k, n) is the complement
of (∀k ≤ p)∼R(k, n).
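The χ′ recursion can be transcribed directly. In this Python sketch (function names ours), chi_prime computes the characteristic function of (∀k ≤ p)R(k, n) by the product recursion just given, and is then used to mirror the primality definition from earlier in the section:

```python
def chi_prime(chi, p, n):
    """Characteristic function of (forall k <= p) R(k, n), built by the
    recursion chi'(0, n) = chi(0, n), chi'(p+1, n) = chi'(p, n) * chi(p+1, n)."""
    value = chi(0, n)
    for k in range(1, p + 1):
        value = value * chi(k, n)
    return value

def chi_R(k, n):
    """Characteristic function of the clause k|n -> k = 1 or k = n
    (with k = 0 counted as vacuously satisfying it)."""
    return 1 if (k == 0 or n % k != 0 or k == 1 or k == n) else 0

def is_prime(n):
    """n is prime iff n > 1 & (forall k <= n)(k|n -> k = 1 or k = n)."""
    return n > 1 and chi_prime(chi_R, n, n) == 1

print([m for m in range(2, 20) if is_prime(m)])  # [2, 3, 5, 7, 11, 13, 17, 19]
```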
A final definition-method we shall use frequently is bounded leastness. This is
used to define a new function from a given relation. The notation we use is this: an
expression
(µk ≤ p)R(k)
denotes the least number k ≤ p such that R holds of k, if there is such a number, and
denotes 0 otherwise. Thus if we define ϕ(n) = (µk ≤ n)(n = k + k) then ϕ(n) = n/2
if n is even and ϕ(n) = 0 if n is odd. Again, the point of having a bound on the
leastness operator is for the sake of computability; and again it is straightforward
to show that a function defined by bounded leastness from a primitive recursive
relation is itself primitive recursive. (See the Exercises.)
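A minimal Python sketch of the bounded leastness operator, using the half-of-n example just given (names ours):

```python
def bounded_mu(p, R):
    """(mu k <= p) R(k): the least k <= p such that R(k) holds,
    and 0 if there is no such k."""
    for k in range(p + 1):
        if R(k):
            return k
    return 0

def phi(n):
    """phi(n) = (mu k <= n)(n = k + k): n/2 if n is even, 0 if n is odd."""
    return bounded_mu(n, lambda k: n == k + k)

print(phi(10))  # 5
print(phi(7))   # 0
```

The search is guaranteed to terminate because of the bound p, which is the computability point the text is making.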
In short, since the primitive recursive functions and relations are closed under
definition by recursion, composition, truth-functional combination, bounded quan-
tification, and bounded leastness, when we use any of these definition methods
to define new functions and relations from functions and relations known to be
primitive recursive, the newly defined functions and relations will also be primitive
recursive.
We conclude this section by defining five primitive recursive functions and re-
lations concerning prime numbers and prime factorizations.
pr(0) = 1;
pr(k + 1) = (µn ≤ pr(k)! + 1)(n > pr(k) & n is prime) (2.1)
For each k > 0, pr(k) is the kth prime number, so that pr(1) = 2, pr(2) = 3,
pr(3) = 5, and so on. The bound comes from Euclid’s observation that if p is a
prime number then there is a prime number greater than p and no greater than
p! + 1. For if 2 ≤ n ≤ p then n divides p!, and so leaves a remainder of 1 when
divided into p! + 1. Hence either p! + 1 itself is prime, or it has a prime factor which
must be greater than p. (This is Euclid’s proof that there are infinitely many prime
numbers.)
The definition of pr(k) compresses several steps into one: those several steps are
definitions by truth-functional combination, bounded-leastness, composition, and
recursion. We shall often be giving definitions in this compressed form. This one
time, let us lay out the individual definitions step-by-step. First we note that the
relation that holds of m and n iff (n > m & n is prime) is primitive recursive,
since it is a truth-functional combination of primitive recursive relations. Next
we note that the function ϕ(j, m) = µn ≤ j(n > m & n is prime) is primitive
recursive, since it is defined from a primitive recursive relation by bounded-leastness.
Now let ψ(m) = ϕ(m! + 1, m); ψ is primitive recursive since it is obtained from
primitive recursive functions by composition. Finally, we define pr(k) by: pr(0) = 1,
pr(k + 1) = ψ(pr(k)). This is a definition by recursion; we may conclude that pr(k)
is primitive recursive.
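The step-by-step definition can be transcribed as a sketch in Python, with the factorial bound from Euclid's observation (helper names ours; is_prime here is ordinary informal primality, not the arithmetized relation):

```python
from math import factorial

def is_prime(n):
    """Informal primality test."""
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def bounded_mu(bound, R):
    """(mu n <= bound) R(n): least such n, and 0 if there is none."""
    for n in range(bound + 1):
        if R(n):
            return n
    return 0

def pr(k):
    """pr(0) = 1; pr(k+1) = (mu n <= pr(k)! + 1)(n > pr(k) and n is prime)."""
    value = 1
    for _ in range(k):
        prev = value
        value = bounded_mu(factorial(prev) + 1,
                           lambda n: n > prev and is_prime(n))
    return value

print([pr(k) for k in range(5)])  # [1, 2, 3, 5, 7]
```

The bound pr(k)! + 1 is wildly generous but, as the text stresses, what matters is only that some primitive recursive bound exists.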
succ(n) = 2^2 ∗ n (2.9)
Tmseq(n) ↔ Seq(n) & (∀k ≤ ℓ(n))(k > 0 → Attm([n]k) ∨
(∃i, j < k)Tmop([n]i, [n]j, [n]k)) (2.14)
For example, Tmseq holds of 2^(2^17) · 3^(2^1) · 5^(2^9 3^17 5^3 7^1 11^10). Note that if Tmseq(n) holds then
n must be a “second-order sequence number”, that is, a sequence number in which
all the exponents are themselves sequence numbers. Mirroring Lemma: Tmseq(n)
holds iff n is a sequence number, and the exponents [n]1, [n]2, . . . , [n]ℓ(n) in its prime
factorization are the gödel numbers of a sequence t1, t2, . . . , tℓ(n) of strings with the
following property: each tk either is an atomic term, or, for some strings ti and tj
earlier in the sequence, is Sti or (ti +tj ) or (ti ×tj ). Such a sequence of strings shows
how a term is built up from its constituent parts in accord with the formation rule
for terms. Thus a string t is a term if and only if there is such a sequence whose
last member is t. Hence, we define:
Tm(n) ↔ (∃m ≤ bg(n))(Tmseq(m) & n = [m]ℓ(m)) (2.15)
The bound on m is large enough. For suppose t is a term, and let i be the number of
occurrences of the signs ‘S’, ‘+’, and ‘×’ in t. Let t1, . . . , tj be a sequence of strings
as in the preceding paragraph such that tj = t and j is as small as possible. Then
j ≤ 2i + 1 (this may be proved by induction on j), and of course 2i + 1 ≤ γ(t). Also,
each ti is a subterm of t, so that γ(ti ) ≤ γ(t). Hence the sequence number m whose
prime factorization has exponents γ(t1 ), γ(t2 ), . . . , γ(tj ) is at most bg(γ(t)). We may
conclude that the following Mirroring Lemma clause holds: Tm(n) iff n = γ(t) for
some term t of LPA .
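The sequence-number apparatus used here (Seq, ℓ(n), [n]k, bg) was defined earlier in the chapter and is not reproduced in this excerpt; the following Python sketch shows only the exponent-extraction part, under the assumption that [n]k is the exponent of the kth prime in n and ℓ(n) is the position of the last nonzero exponent:

```python
def nth_prime(k):
    """Return the kth prime (1-indexed)."""
    count, candidate = 0, 1
    while count < k:
        candidate += 1
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            count += 1
    return candidate

def exponent(n, k):
    """[n]_k: the exponent of the kth prime in the factorization of n."""
    p, e = nth_prime(k), 0
    while n % p == 0:
        n //= p
        e += 1
    return e

def length(n):
    """l(n): the position of the last nonzero exponent in n's factorization."""
    k, j = 0, 1
    while n > 1:
        e = exponent(n, j)
        if e > 0:
            k = j
            n //= nth_prime(j) ** e
        j += 1
    return k

# 2430 = 2^1 * 3^5 * 5^1 is the goedel number of '0=0' from section 2.1.
print([exponent(2430, k) for k in range(1, 4)], length(2430))  # [1, 5, 1] 3
```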
A similar series of definitions will yield a primitive recursive relation that mirrors
the property of being a formula.
neg(n) = 2^6 ∗ n (2.17)
cond(m, n) = paren(m ∗ 2^7 ∗ n) (2.18)
gen(k, n) = 2^8 ∗ 2^k ∗ paren(n) (2.19)
Formseq(n) ↔ Seq(n) & (∀k ≤ ℓ(n))(k > 0 → Atform([n]k) ∨
(∃i, j < k)Formop([n]i, [n]j, [n]k)) (2.21)
Form(n) ↔ (∃m ≤ bg(n))(Formseq(m) & n = [m]ℓ(m)) (2.22)
The Mirroring Lemma clauses for (2.17) – (2.21) are left to the reader. By reasoning
parallel to that for Tm(n), we can then infer that Form(n) holds iff n = γ(F ) for
some formula F of LPA .
Our next aim is to define a primitive recursive function that mirrors substitution
of terms for free variables. To do this, we must first mirror the notions of bound
and free occurrences of variables. Let the ith place in a string be the address for
the ith sign in the string, counting from the left. Thus in ‘∀x((x + 0) = x)’ the fifth
place is the location of the occurrence of x that follows a left parenthesis, while the
tenth place is the location of the rightmost occurrence of x.
Bound(i, k, n) ↔ Form(n) & Var(k) & (∃p, q, r ≤ n)(n = p ∗ gen(k, q) ∗ r &
ℓ(p) + 1 ≤ i ≤ ℓ(p) + ℓ(gen(k, q))) (2.23)
Mirroring Lemma: Bound(i, k, n) holds iff n = γ(F ) for some formula F , k = Γ(u)
for some formal variable u, and the ith place in F lies within the scope of a quantifier
binding u. (Note that u need not occur at the ith place. This feature of the definition
will make it easier to mirror ‘t is free for u in F ’ (Exercise 2.?).)
Free(i, k, n) ↔ Form(n) & Var(k) & [n]i = k & ∼Bound(i, k, n) (2.24)
Mirroring Lemma: Free(i, k, n) holds iff n = γ(F ) for some formula F , k = Γ(u) for
some formal variable u, and u has a free occurrence at the ith place in F .
In syntax, substitution of a term for a free variable u is the simultaneous re-
placement of all free occurrences of u by occurrences of the term. In order to mirror
this by a primitive recursive function, we need to break it down into a step-by-step
procedure of substituting for the free occurrences of the variable one by one. We will
make the substitutions starting with the last free occurrence (the rightmost one) and
continuing right-to-left. The reason for this is that if the term being substituted has
length > 1, a substitution perturbs all the addresses to the right of the occurrence
being replaced. By making the substitutions from right to left, at each step the
addresses of the occurrences that are yet to be replaced remain unperturbed.
Define (max k ≤ m)R(k) as
(µk ≤ m)(R(k) & (∀j ≤ m)(j > k → ∼R(j))).
Then (max k ≤ m)R(k) is the largest k ≤ m such that R(k) holds, and 0 if there is
no such k.
If n = γ(F ) for some formula F and k = Γ(u) for some formal variable u, then
occ(0, k, n), occ(1, k, n), occ(2, k, n), . . . give the addresses, from largest address
down, of the places where u is free in F . If u has m free occurrences in F , then
occ(i, k, n) is nonzero for 0 ≤ i < m, while occ(m, k, n) = 0. Thus our next function
gives the number of free occurrences of u in F .
If n and p are sequence numbers and 0 < i ≤ ℓ(n), then subat(n, p, i) is the sequence
number whose first i − 1 exponents match the first i − 1 exponents in n, whose next
ℓ(p) exponents match those in p, and whose final ℓ(n) − i exponents match the final
ℓ(n) − i exponents in n. Thus, for example, subat(30, 72, 2) = 2^1 ∗ 2^3 3^2 ∗ 2^1 = 9450.
Mirroring Lemma: if n = γ(s) for some string s and p = γ(s0 ) for some string s0 ,
and i is at most the length of s, then subat(n, p, i) is the gödel number of the string
obtained from s by substituting s0 for whatever appears in s at the ith place.
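A Python sketch of subat via exponent lists, checked against the worked example (the helper functions exponents and from_exponents are ours, standing in for the sequence-number machinery defined earlier in the chapter):

```python
def primes():
    """Generate the primes 2, 3, 5, ... in order."""
    candidate = 1
    while True:
        candidate += 1
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            yield candidate

def exponents(n):
    """The exponent list [n]_1, ..., [n]_l(n) of a sequence number n."""
    exps = []
    for p in primes():
        if n == 1:
            break
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        exps.append(e)
    while exps and exps[-1] == 0:  # trim trailing zeros, if any
        exps.pop()
    return exps

def from_exponents(exps):
    """Rebuild the sequence number with the given exponents."""
    n = 1
    for p, e in zip(primes(), exps):
        n *= p ** e
    return n

def subat(n, p, i):
    """Splice the exponents of p into those of n in place of position i."""
    en, ep = exponents(n), exponents(p)
    return from_exponents(en[:i - 1] + ep + en[i:])

print(subat(30, 72, 2))  # 9450, as in the worked example
```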
subst(n, k, p, 0) = n
subst(n, k, p, i + 1) = subat(subst(n, k, p, i), p, occ(i, k, n)) (2.28)
Mirroring Lemma: if n = γ(F ) for some formula F , k = Γ(u) for some formal vari-
able u, and p = γ(s) for some string s, then subst(n, k, p, 1) is the gödel number of the
result of substituting s for the rightmost free occurrence of u in F ; subst(n, k, p, 2)
is the gödel number of the result of substituting s for the two rightmost free
occurrences of u in F ; and so on.
sub(n, k, p) = subst(n, k, p, nocc(k, n)) (2.29)
Mirroring Lemma: if n = γ(F (u)) for some formula F (u), k = Γ(u) for some formal
variable u, and p = γ(t) for some term t, then sub(n, k, p) = γ(F (t)).
The next function will be of great importance in the proofs of this Chapter and
the next, although it might look a little mysterious now.
diag(n) = sub(n, 19, nmrl(n)) (2.30)
Mirroring Lemma: if n = γ(F (y)) for some formula F (y), then diag(n) =
γ(F (n)), that is, diag(n) is the gödel number of the formula obtained from F (y) by
substituting n for all free occurrences of y. (So if there are no free occurrences of
y in F (y), then diag(n) = n. Also, if n is not the gödel number of a formula, then
diag(n) = n.) For reasons that will become clear only later, diag is called the gödel
diagonal function.
Our next task is to define a primitive recursive relation that mirrors the property
of being an axiom.
T1Ax(n) ↔ Form(n) & (∃k, m ≤ n)(n = cond(k, cond(m, k))) (2.31)
Mirroring Lemma: T1Ax(n) iff n = γ(F ) for a formula F of LPA that is an axiom
generated from schema (T1).
We leave to the reader the task of providing primitive recursive definitions that
mirror the other axioms. (See the Exercises.) These will culminate in a primitive
recursive definition of a 1-place relation Ax(n) that yields the Mirroring Lemma
clause: Ax(n) iff n = γ(F ) for some formula F that is an axiom of PA.
Infop(i, j, k) ↔ j = cond(i, k) ∨ (∃p ≤ k)(Var(p) & k = gen(p, i)) (2.32)
Drvtn(n) ↔ Seq(n) & (∀k ≤ ℓ(n))(k > 0 → Ax([n]k) ∨
(∃i, j < k)Infop([n]i, [n]j, [n]k)) (2.33)
Mirroring Lemma: Drvtn(n) holds iff n is a sequence number, and the exponents
[n]1, [n]2, . . . , [n]ℓ(n) in its prime factorization are the gödel numbers of a sequence
of formulas that is a derivation in PA. If this holds, we say that n encodes the
derivation.
Der(m, n) ↔ Drvtn(m) & n = [m]ℓ(m) (2.34)
Mirroring Lemma: Der(m, n) holds iff m encodes a derivation in PA of a formula
with gödel number n.
the condition; in each case the formula that numeralwise represents the function has
one more free variable than the function has arguments.
The formula x + y = z numeralwise represents addition. In §1.5 we’ve shown
that, for any k, n, and q, if q is the sum of k and n then ` k + n = q. We must also
show that if q is the sum of k and n then ` k + n = z ⊃ z = q. By the transitivity
of identity, ` z = k + n ∧ k + n = q ⊃ z = q. Since ` k + n = q, by truth-functional
logic and the symmetry of identity we have ` k + n = z ⊃ z = q. A similar
argument shows that the formula x × y = z numeralwise represents multiplication.
Numeralwise representation is, in one sense, a weak constraint on a formaliza-
tion of a relation or function. It requires only that the formalization of “pointwise
facts” about the relation or function be derivable in PA: for all particular argu-
ments, whether the relation holds or not, and for all particular arguments, what the
value of the function is and that it is the only value. For other purposes we might
well want to require more, for example, that formalizations of general laws that the
relation or the function obeys be derivable in PA. We might not take the formula
x + y = z to be a good formalization of addition unless, say, the commutative law
were derivable using it. However, numeralwise representation is all that is needed
for the First Incompleteness Theorem.
Representability Theorem: Every primitive recursive relation and
function is numeralwise representable in PA.
We put off the proof until Chapter 4. As we shall see, it is straightforward, amount-
ing primarily to verifying that manipulations of finite sequences of numbers can be
formalized in PA. For now we note only that the proof is entirely syntactic and
is constructive: it provides a recipe for constructing, given any primitive recursive
definition of a function or a relation, a formula that numeralwise represents that
function or relation.
obtain another way of phrasing the condition: if ∃xH(x) is derivable, then not
all of ∼H(0), ∼H(1), ∼H(2), . . . are derivable. That is, if it can be derived that
there is a number with a certain property, it cannot be derived that each particular
number fails to have the property. Note that ω-consistency is a syntactic property,
although somewhat more complex than consistency. Note too that ω-consistency
implies consistency: for if the system is inconsistent every formula is derivable, so the
system is ω-inconsistent. As we shall see, consistency does not imply ω-consistency:
it is possible for a system to be consistent but ω-inconsistent. But clearly we would
want to require PA to be ω-consistent.
Let us list the results of previous sections that will be used in the proof. From
the previous section we need the Representability Theorem. From the arithmeti-
zation of syntax we need just two results: the existence of a primitive recursive
relation Der(k, n) such that Der(k, n) holds iff n is the gödel number of a formula
and k encodes a derivation of that formula; and the existence of a primitive recursive
function diag(n) such that if n is the gödel number of a formula F (y) then diag(n)
is the gödel number of F (n). From the logical material in Chapter 1, we need only
the fact that if a universal quantification ∀xF (x) is derivable, then so are all its
numerical instances F (k). This of course follows from the instantiation axiom (Q1)
and modus ponens.
Now let Q be the 2-place relation defined thus: for all integers k and n,
Q(k, n) ↔ ∼Der(k, diag(n))
Then Q is primitive recursive. By the Representability Theorem, there is a formula
A(x, y) that numeralwise represents Q. That is, for all k and n,
(a) if Q(k, n) then ` A(k, n);
(b) if not Q(k, n) then ` ∼A(k, n).
Let p be the gödel number of ∀xA(x, y). Then, from the property of diag(n) just
noted, diag(p) is the gödel number of ∀xA(x, p). Note that ∀xA(x, p) is a sentence,
that is, contains no free variables. We can now complete the proof in five steps.
(1) If ` ∀xA(x, p) then, for each k, ` A(k, p).
This is clear, since each A(k, p) is an instance of ∀xA(x, p).
(2) If ` ∀xA(x, p) then there exists a number k such that ` ∼A(k, p).
For suppose ` ∀xA(x, p). Then there is a number k that encodes a derivation
of ∀xA(x, p), so that Der(k, q), where q is the gödel number of ∀xA(x, p). As noted
above, q = diag(p). Hence Der(k, diag(p)), that is, not Q(k, p). But then, by (b) above,
` ∼A(k, p).
This follows from (3) and (5), taking G to be the sentence ∀xA(x, p). This sentence
is often called the Gödel sentence.
The core of the foregoing proof is step (2). Traditionally, to show a fact of
the form “if F is derivable then so is H” we show that H can be derived from
F , or from a formula obtained from F by universal generalization. Indeed, step
(1) has precisely this character (other examples can be found in the Exercises for
§1.2). But this is not at all what is going on in step (2). Rather, the supposition
that ` ∀xA(x, p) is exploited as a metalinguistic fact; this fact is then mirrored as a
number theoretic fact (namely, that there is a number k such that Der(k, q), where q
is the gödel number of ∀xA(x, p)); and that number-theoretic fact is then formalized
in the system, by means of numeralwise representation. In a somewhat loose way
of speaking, we might say that we are not drawing an inference from the content
∀xA(x, p) might be taken to express, but rather from the fact of its derivability.
The formula A(x, y) was so chosen that for any formula F (y) with gödel number
n, if ` F (n) then, for some k, ` ∼A(k, n). That is, since the gödel number of F (n)
is diag(n), ` F (n) tells us that there exists a number k such that Der(k, diag(n)),
so that ` ∼A(k, n) by numeralwise representation. Obtaining step (2) is a matter
only of choosing the right formula F (y). Namely, we choose F (y) to be ∀xA(x, y).
Call its gödel number p. Thus we obtain the result that if ` ∀xA(x, p) then there
exists a k such that ` ∼A(k, p).
(The recourse to soundness at this point can actually be avoided by inspection of the
proof of the Representability Theorem. That A(k, n) is true iff ∼Der(k, diag(n)) will
follow directly from the construction of the formula A(x, y). See §4.3.) Since the
universe of the intended interpretation is the natural numbers, for each n, ∀xA(x, n)
is true iff ∼Der(k, diag(n)) for every natural number k. By mirroring, this condition
obtains iff the formula with gödel number diag(n) is not derivable. Now let p be the
gödel number of ∀xA(x, y); then ∀xA(x, p) is true iff the formula with gödel number
diag(p) is not derivable. Since the formula with gödel number diag(p) is ∀xA(x, p),
this shows
(†) ∀xA(x, p) is true iff ∀xA(x, p) is not derivable.
That is, the condition for the truth of the Gödel sentence is a number-theoretic
fact that mirrors the underivability of the Gödel sentence. Note that mirroring is
essential here. In the most direct sense, the Gödel sentence asserts a number-theoretic
statement; it is true iff a certain number-theoretic condition holds. It is only using
mirroring that we obtain the biconditional between the truth of the Gödel sentence
and a syntactic condition.
From (†) and the soundness of PA a quick semantical argument for incomplete-
ness can be formulated. G cannot be derivable, since if it is then by (†) it is false,
which would violate soundness. So G is not derivable, and hence it is true. Hence
∼G is false, and so by soundness it cannot be derivable. Gödel’s First Incomplete-
ness Theorem is often stated semantically thus: there is a true sentence of LPA that
is not derivable in PA. (As we shall see in §3.2, though, the syntactic proof of the
previous section yields an important further result unobtainable from the semantic
proof.)
There are two ingredients to obtaining (†). First is gödelization, that is, the
arithmetization of syntax, which shows that syntactic notions can be captured by
formulas of LPA , by formalizing the number-theoretic notions that mirror the syn-
tactic ones. Second is the use of the function diag(n), which allows the construction
of a formula that can appear on both sides of the biconditional (†).
Gödel’s strategy can be viewed this way. Define “formula m at n” as: the result
of substituting n for any free occurrences of y in the formula with gödel number
m (if m is not the gödel number of a formula, let formula m at n be an arbitrary
object that is not a formula). By gödelization, the relation “formula m at n is not
derivable” can be captured by a formula of LPA : all we need do is formalize the
number-theoretic relation (∀k)∼Der(k, sub(m, 19, nmrl(n))). Now identify the two
variables, that is, consider only the case when m = n. (In the plane, these pairs
42 CHAPTER 2. GÖDEL’S PROOF
form the diagonal; hence the use of “diag”.) So we have a formula F (y) with one
free variable that captures the 1-place relation “formula n at n is not derivable”,
that is, for each n, F (n) is true iff formula n at n is not derivable. Let p be the
gödel number of F (y). We then have F (p) is true iff formula p at p is not derivable;
and formula p at p is just F (p). Thus we obtain (†).
So far in this section we have been using the notion of truth in the intended
interpretation for formulas of LPA . However, there is a way of capturing (†) syntac-
tically, namely, by formalizing it within LPA . The right hand side of (†) expresses
a syntactic property, which is formalizable via gödelization in a direct way. Let
D(x, y) be a formula that numeralwise represents the relation Der(k, n). Then if H
is a formula with gödel number n, the formula ∃xD(x, n) is a formalization of the
assertion that (∃k)Der(k, n), which mirrors the assertion that H is derivable. Thus
the underivability of the Gödel sentence can be formalized by ∼∃xD(x, q), where q
is the gödel number of the Gödel sentence. The left hand side of (†) is the ascrip-
tion of truth to the Gödel sentence, but inside LPA this can be formalized by the
Gödel sentence itself, since the metalinguistic ascription of truth to a sentence can
be captured in the object language by the assertion of that sentence. We claim that
the resulting formalization is derivable in PA, that is,
Chapter 3
Formalized Metamathematics
` ∆(p, z) ≡ z = q
Hence the formula A(x, p) is provably equivalent to
and the latter formula is of course equivalent to ∼∃xD(x, q). Thus we have shown
Proof. Let ∆(x, y) numeralwise represent the function diag, let F′(y) be ∀z(∆(y, z) ⊃
F (z)), let k be the gödel number of F′(y), and let H be F′(k). Let m be the gödel
number of H; then m = sub(k, 19, nmrl(k)) = diag(k), so that
` ∆(k, z) ≡ z = m
From this it follows that F′(k) is provably equivalent to ∀z(z = m ⊃ F (z)),
and hence to F (m).
The formula F (y) may contain free variables aside from y. If it does not, that
is, if y is the only free variable in F (y) then H will be a sentence; and if it does
contain other free variables then the free variables of H will be precisely those other
variables.
A graphic way of stating the Lemma is possible with some new notation: if F
is a formula, then let pF q be the formal numeral for the gödel number of F ; that
is, pF q is the numeral for k, where k = γ(F ). Thus p·q is a function that carries formulas to formal
numerals. Its interaction with the number-theoretic function nmrl(n) and the gödel
numbering γ is given by the following diagram:
              γ
     F  ─────────→  γ(F )
     │               │
    p·q            nmrl
     ↓               ↓
    pF q ─────────→  nmrl(γ(F ))
              γ
That is, for each formula F , γ(pF q) = nmrl(γ(F )). The Fixed Point Lemma then
says: for every formula F (y) there is a formula H such that
` H ≡ F (pHq).
constructed in the proof of the Fixed Point Theorem will not contain 0 = 0 as a
subformula, so that ` F (pHq), and hence ` H. On the other hand, if we let J be
0 = 0 and carry out the construction of two paragraphs back, then H′ does contain
0 = 0 as a subformula, so that ` ∼F (pH′q), and hence ` ∼H′. Thus H and H′ are
not equivalent.
The quick proof of Gödel’s Theorem given just before the Fixed Point Lemma
can be viewed as an application of the Lemma to the formula ∼∃xD(x, y), where
D(x, y) numeralwise represents the relation Der(k, n). Let us define a 1-place
number-theoretic relation Dvbl(n) as (∃k)Der(k, n) (note this is not a primitive
recursive definition, since there is no bound on the quantifier). This relation mirrors
derivability, that is, Dvbl(n) holds iff n is the gödel number of a formula deriv-
able in PA. Since D(x, y) numeralwise represents Der, we can think of the formula
∃xD(x, y) as formalizing Dvbl(n), and so, by mirroring, as being a formal expression
of derivability. Thus Gödel’s Theorem can be obtained by applying the Fixed Point
Lemma to a formal expression of underivability. (This analysis of Gödel’s proof,
along with the formulation of the Fixed Point Lemma, was first given by Rudolf
Carnap in 1934.) The quick proof can be further streamlined so as to highlight the
needed properties of the expression of derivability. Let Prov(y) be a formula meant
as such a formal expression, and suppose it obeys the following two conditions:
We call a formula Prov(y) a standard provability predicate iff it obeys the fol-
lowing conditions:
Adequacy. For each formula F , if ` F then ` Prov(pF q).
Formal Modus Ponens. For all formulas F and G,
` Prov(pF ⊃ Gq) ⊃ (Prov(pF q) ⊃ Prov(pGq)).
Formal Adequacy. For each formula F ,
` Prov(pF q) ⊃ Prov(pProv(pF q)q).
` F ⊃ (∼F ⊃ S0 = 0)
From (7) it follows that if PA is consistent then (Con) is not derivable. For if (Con)
were derivable then, by modus ponens, G would be derivable, and so PA would be
inconsistent.
At the end of the previous section we showed the following: let Prov(y) fulfill
Adequacy, and let G be a fixed point of ∼Prov(y); then if G is derivable, PA is in-
consistent. The foregoing proof of Gödel’s Second Theorem is a direct formalization
of the proof we gave.
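The argument recapped here can be set out schematically (with \ulcorner G\urcorner rendering the corner quotes printed pGq above):

```latex
\begin{align*}
&\vdash G \equiv {\sim}\mathrm{Prov}(\ulcorner G\urcorner)
   &&\text{$G$ a fixed point of ${\sim}\mathrm{Prov}(y)$}\\
&\text{if } \vdash G \text{, then } \vdash \mathrm{Prov}(\ulcorner G\urcorner)
   &&\text{Adequacy}\\
&\text{if } \vdash G \text{, then } \vdash {\sim}\mathrm{Prov}(\ulcorner G\urcorner)
   &&\text{from the biconditional}
\end{align*}
```

Together the last two lines show that if G is derivable, both Prov(pGq) and its negation are derivable, so PA is inconsistent.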
Gödel’s Second Theorem differs from the First Theorem in the subtlety of what
it says. For much hinges on what we take to be a formal statement of the consistency
of the system. We have required (and used in the proof) that the formal consistency
statement have property (]), and that Prov(y) be a standard provability predicate.
A simple example shows that some such subtlety is needed. Let Prov′(y) be the
formula Prov(y) & y ≠ pS0 = 0q, where Prov(y) is a standard provability predicate.
Then Prov′(y) fulfills the Adequacy Condition. For if ` F , then ` Prov(pF q), and
moreover either F is not the formula S0 = 0, in which case ` ∼pF q = pS0 = 0q,
whence ` Prov′(pF q), or else F is the formula S0 = 0, in which case PA is incon-
sistent, whence again ` Prov′(pF q). But if we take (Con) to be ∼Prov′(pS0 = 0q),
then clearly (Con) is derivable in PA. The hitch here, of course, is that this formula
(Con) does not have property (]). This shows that Prov′(y) is not standard: in fact,
it fails to fulfill Formal Modus Ponens (although it does fulfill Formal Adequacy).
There are more complicated examples: we can construct formulas (Con) that
do possess property (]), but such that the predicate Prov(y) mentioned in (]) is not
standard (lacks, say, Formal Adequacy); and such that (Con) is derivable in PA. Of
course, we want to say that any such formula (Con) is not, in any intuitive sense, a
formal statement of the consistency of PA, because the predicate Prov(y) is not a
good formalization of derivability. As has been pointed out, Formal Modus Ponens
and Formal Adequacy are simply the formalizations of assertions about derivability
which can be shown by elementary metamathematical arguments. And for a predi-
cate to qualify as a good formalization of derivability, it must be usable in derivations
that formalize elementary metamathematical arguments about derivability.
refutation of F (where ‘smaller’ means simply: with smaller gödel number). For-
malizations of the following heuristic arguments will then yield the two conditions,
as we shall see. First, suppose F is derivable; thus there is a derivation of F . Check
all the (finitely many) smaller derivations that there are, and you’ll see (provided
PA is consistent) that none is a refutation of F . Hence F is rovable. Second, sup-
pose ∼F is derivable; thus there is a refutation of F . No smaller derivation is one
of F (provided PA is consistent), and this can be checked. Hence if there is some
derivation of F , then there is a smaller refutation of F (namely, the given one).
Hence F is not rovable.
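The two heuristic arguments turn on one condition: some derivation of F is not weakly preceded by any refutation of F. A minimal Python sketch of that condition follows; the table DERIVATIONS and the functions Der, Ref, and neg are hypothetical stand-ins for the text's primitive recursive relations, chosen only to exercise the logic.

```python
# Sketch of 'rovability' on a toy derivability table. DERIVATIONS, Der,
# Ref and neg below are hypothetical stand-ins for the text's primitive
# recursive relations, not the real arithmetized syntax.

def neg(n):
    return n + 1000                 # toy stand-in for the p.r. function neg

DERIVATIONS = {                     # derivation number -> number of its conclusion
    3: 17,                          # a derivation of formula 17
    5: neg(20),                     # a refutation of formula 20
    8: 20,                          # a derivation of formula 20, but larger
}

def Der(k, n):
    return DERIVATIONS.get(k) == n

def Ref(k, n):                      # Ref(k, n) <-> Der(k, neg(n))
    return Der(k, neg(n))

def rovable(n, bound=100):
    """n is 'rovable' iff some derivation of n has no refutation of n
    at or below it. The search bound is an artifact of the toy setting."""
    return any(Der(k, n) and not any(Ref(j, n) for j in range(k + 1))
               for k in range(bound))

assert rovable(17)                  # derived, never refuted
assert not rovable(20)              # its refutation precedes its derivation
```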
To formalize this we shall need two properties of the formula x ≤ y defined in
§1.6:
Property (a) follows by instantiation from the last law given in §1.6. For prop-
erty (b), note first that F (j) is logically equivalent to ∀x(x = j ⊃ F (x)). Hence
property (b) will follow if we have, for each number k, ` x ≤ k ⊃ x = 0 ∨ x =
1 ∨ ... ∨ x = k. The proof of this is left to the reader. (See the Exercises.)
We may now define the formula Rov(y). Let Ref be the two-place primitive
recursive relation defined by Ref(k, n) ↔ Der(k, neg(n)), and let D(x, y) and R(x, y)
numeralwise represent the relations Der and Ref, respectively. Let B(x, y) be
from which ` Rov(pF q) follows by existential generalization. If, on the other hand,
PA is inconsistent, then every formula is derivable, so in particular ` Rov(pF q).
(2) Rov(y) fulfills the Rosser Condition.
For suppose ` ∼F . Thus there is a k such that Ref(k, γ(F )), so that `
R(k, pF q). Hence ` k ≤ x ⊃ (∃z)(z ≤ x & R(z, pF q)), or, equivalently,
Now if PA is consistent then not Der(j, γ(F )) for each j ≤ k. Hence ` ∼D(j, pF q) for
j = 0, 1, ..., k. By (b) above,
` x ≤ k ⊃ ∼D(x, pF q)
that PA1 is ω-consistent; and this does not follow from the ω-consistency of PA.
That is, the above argument exploits the fact that consistency is “inherited” in
passing from PAi to PAi+1 . Any analogous argument using Gödel’s Theorem would
have to show that ω-consistency is inherited; and this is not so easy. (We could
show this if we adopt a semantical position and assume PA is sound. For the fact
that the Gödel sentence is true in the intended interpretation allows us to conclude
that the system like PA but including the Gödel sentence as a new axiom is also
sound, and hence ω-consistent.)
proof of Löb’s Theorem is more or less what is obtained by formalizing the heuristic,
after ‘is true’ is replaced by ‘is derivable’. Let Leon be the sentence: if Leon is true
then all students are above average. Suppose that Leon is true; so if Leon is true
then all students are above average; so by modus ponens, all students are above
average. In the preceding sentence we have shown, on the supposition that Leon is
true, that all students are above average. That is, we have shown: If Leon is true,
then all students are above average. Thus we have shown Leon is true. And then,
by modus ponens, all students are above average. Paradox!
The rigorous proof of Löb’s Theorem follows. Let F be a formula such that
` Prov(pF q) ⊃ F . We wish to show that ` F . Let H be a fixed point of the formula
Prov(y) ⊃ F . That is,
` H ≡ (Prov(pHq) ⊃ F ).
The argument that follows contains several truth-functional steps; to highlight the
truth-functional forms involved, we shall abbreviate the formula Prov(pHq) ⊃ F by
J and the formula Prov(pProv(pHq)q) by K.
Löb’s Theorem has many applications, and can be used to show both derivabil-
ities and underivabilities. For example, as was first pointed out by Georg Kreisel,
Gödel’s Second Theorem is an easy corollary of Löb’s. For suppose ` (Con). Recall
that (Con) is ∼Prov(pS0 = 0q). By truth-functional logic, ` Prov(pS0 = 0q) ⊃ H
for any formula H; in particular, ` Prov(pS0 = 0q) ⊃ S0 = 0. And then, by Löb's
Theorem, ` S0 = 0, so that PA is inconsistent. Another example concerns the
Rosser sentence R. In the Exercises, we have seen that ` G ⊃ (Con), where G is the
Gödel sentence; using Löb’s Theorem, we can show the same does not hold of R,
if PA is consistent. For suppose ` R ⊃ (Con). As we also saw in the Exercises,
` (Con) ⊃ ∼Prov(p∼Rq). Hence, by truth-functional logic, ` R ⊃ ∼Prov(p∼Rq).
Since Formal Modus Ponens gives us
` Prov(pProv(pF q) ⊃ F q) ⊃ (Prov(pProv(pF q)q) ⊃ Prov(pF q)),
we can obtain the desired result by applying the result of the preceding paragraph.
Other applications of Löb’s Theorem are contained in the exercises. Here we
offer one more: we use Löb’s Theorem to show one way in which ω-consistency is a
rather more intricate notion than simple consistency. Define a sequence A0 , A1 , ... of
formulas thus: A0 is S0 = 0; A1 is Prov(pS0 = 0q); A2 is Prov(pProv(pS0 = 0q)q);
and, in general, Ai+1 is Prov(pAi q). We assume Prov(y) has the form ∃xD(x, y).
(Note that ∼A1 is just the formula (Con).)
(1) ` Ai ⊃ Ai+1 for each i.
For i = 0 this follows since ` ∼S0 = 0. For i > 0 this follows from Formal Adequacy.
(2) If PA is consistent then no Ai for i > 0 is refutable in PA.
Were ∼Ai derivable in PA for i > 0, then by (1), ∼A1 would be derivable in PA.
That is, ` (Con). But then, by Gödel’s Second Theorem, PA would be inconsistent.
(3) If PA is ω-consistent then no Ai is derivable in PA.
The proof is by induction on i. For i = 0 this follows just from the consistency of PA.
Suppose Ai is not derivable. Then, by numeralwise expressibility, ` ∼D(n, pAi q)
for each number n. But Ai+1 is just (∃x)D(x, pAi q). Hence, were Ai+1 derivable,
PA would be ω-inconsistent.
(4) The system obtained by adding any Ai as a new axiom to PA is ω-inconsistent.
are all true, where k is the numerical value of t. (Since here ∀v(v ≤ t ⊃ F (v)) is a
sentence, the term t must be a closed term.) The property of truth for ∆0 -sentences
is primitive recursive; and for any ∆0 -sentence F , if F is true then ` F , and if F is
not true, then ` ∼F (see the Exercises). Thus PA is ∆0 -complete.
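Why ∆0 truth is effectively decidable can be illustrated with a small evaluator: every quantifier carries a numerical bound, so truth reduces to finitely many checks. The tuple representation of terms and formulas below is an assumption of this sketch, not the text's notation.

```python
# Minimal evaluator for Δ0-sentences. The tuple representation of
# formulas is an assumption of this sketch; the point is only that a
# bounded quantifier expands into a finite check.

def val(t, env):
    """Value of a term: '0', a variable name, ('S', t), ('+', t, u),
    or ('*', t, u)."""
    if t == '0':
        return 0
    if isinstance(t, str):          # a variable
        return env[t]
    op = t[0]
    if op == 'S':
        return val(t[1], env) + 1
    if op == '+':
        return val(t[1], env) + val(t[2], env)
    if op == '*':
        return val(t[1], env) * val(t[2], env)
    raise ValueError(op)

def true(f, env=None):
    """Truth of a Δ0-formula under an assignment env."""
    env = env or {}
    op = f[0]
    if op == '=':
        return val(f[1], env) == val(f[2], env)
    if op == '~':
        return not true(f[1], env)
    if op == 'v':                   # disjunction
        return true(f[1], env) or true(f[2], env)
    if op == 'all<=':               # ('all<=', var, bound_term, body)
        bound = val(f[2], env)
        return all(true(f[3], {**env, f[1]: k}) for k in range(bound + 1))
    raise ValueError(op)

# ∀v ≤ SS0 ∼(Sv = 0): every v up to 2 has a nonzero successor.
F = ('all<=', 'v', ('S', ('S', '0')), ('~', ('=', ('S', 'v'), '0')))
assert true(F)
assert not true(('=', ('S', '0'), '0'))
```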
A Σ1 -formula is a formula ∃v1 . . . ∃vn F , where F is ∆0 . That is, a Σ1 -formula
may contain unbounded quantifiers, but they must all be existential quantifiers and
must govern the rest of the formula. Symmetrically, a Π1 -formula is a formula
∀v1 . . . ∀vn F , where F is ∆0 ; here all the unbounded quantifiers are universal. (In
these, we also let n = 0, so that every ∆0 -formula counts as both a Σ1 -formula and
a Π1 -formula.)
Some logical equivalences should be remarked on at once. The negation of a
Σ1 -formula is equivalent to a Π1 -formula, and vice-versa. The conjunction of two
Σ1 -formulas is logically equivalent to a Σ1 -formula, as is the disjunction of two
Σ1 -formulas. And the conjunction or disjunction of two Π1 -formulas is logically
equivalent to a Π1 -formula. Hence we shall often speak somewhat loosely, for ex-
ample, of a conjunction of Σ1 -formulas as though it were itself a Σ1 -formula, and a
negation of a Σ1 -formula as though it were itself a Π1 -formula.
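The equivalences licensing this loose speech are the usual prenex laws; with bound variables renamed apart, and remembering that ∆0 -formulas are closed under conjunction and disjunction:

```latex
\begin{align*}
{\sim}\exists v_1\cdots\exists v_n\,F &\;\equiv\; \forall v_1\cdots\forall v_n\,{\sim}F\\
\exists v\,F \;\wedge\; \exists w\,G &\;\equiv\; \exists v\,\exists w\,(F \wedge G)\\
\exists v\,F \;\vee\; \exists w\,G &\;\equiv\; \exists v\,\exists w\,(F \vee G)
\end{align*}
```

The dual laws, with ∀ in place of ∃, give the Π1 cases.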
In §4.3 we shall prove the
It follows from the Theorem that every primitive recursive relation is numeralwise
representable in PA both by a Σ1 -formula and by a Π1 -formula. For suppose R is a
primitive recursive relation, say, of two arguments. By definition, the characteristic
function χ of R is primitive recursive, so there is a Σ1 -formula F (x, y, z) that numer-
alwise represents χ. The formula F (x, y, S0) is Σ1 and the formula ∼F (x, y, 0) is Π1 .
Both formulas, we claim, numeralwise represent R. For if R(k, n) then χ(k, n) = 1,
so that ` F (k, n, z) ≡ z = S0, and consequently ` F (k, n, S0) and ` ∼F (k, n, 0).
And if not R(k, n), then χ(k, n) = 0 so that ` F (k, n, z) ≡ z = 0, and consequently
` ∼F (k, n, S0) and ` F (k, n, 0).
Let us now look at the formulas we considered in Chapters 2 and 3. Let us
assume that we take those formulas to be of the least complexity possible.
The Gödel sentence is Π1 . As we originally defined it in §2.4 the Gödel sentence
is obtained from a numeralwise representation of the primitive recursive relation
Der(k, diag(n)) by universally quantifying one free variable and replacing the other
free variable with a numeral. Since the numeralwise representation may be taken to
be Π1 , the Gödel sentence will also be Π1 .
in the streamlined proof of Gödel’s First Theorem given at the end of §3.1.
From Σ1 -soundness we may also infer that if we add an irrefutable Π1 -sentence
as a new axiom, no Σ1 -sentences that were previously underivable become derivable,
so that the extended system remains Σ1 -sound. For let J be the Π1 -sentence, and let
K be a Σ1 -sentence derivable in the expanded system. By the Deduction Theorem
(see Exercise 2.?), J ⊃ K is derivable in PA. J ⊃ K is equivalent to a Σ1 -sentence,
so by the Σ1 -soundness of PA it is true. Since J is irrefutable, ∼J is not true; hence
K is true. Since PA is Σ1 -complete, ` K.
Finally, we note that Σ1 -soundness is equivalent to the restriction of ω-consistency
to Σ1 -sentences: there is no Σ1 -sentence ∃xF (x) such that ` ∃xF (x) and also
` ∼F (n) for every n. This property is called 1-consistency. Suppose PA is Σ1 -
sound and ` ∃xF (x), where ∃xF (x) is Σ1 . Then ` F (k) for some k, as we noted
above, and ∼F (k) is not derivable, by consistency. (For the implication in the other
direction, from 1-consistency to Σ1 -soundness, see the Exercises.)
If PA is Σ1 -sound, a Σ1 -sentence is derivable only if some numerical instance of
it is derivable. This does not hold of more complex existential sentences, assuming
PA is consistent. For example, let F (x) be the formula (x = 0 & R) ∨ (x = S0 & ∼R),
where R is the Rosser sentence. Since ` R ∨ ∼R, we also have ` ∃xF (x). But F (k)
is not derivable for any k. For if k is greater than 1 then ` ∼F (k), while ` F (0) ⊃ R
and ` F (S0) ⊃ ∼R. Rosser’s Theorem tells us that neither ` R nor ` ∼R. Hence
F (k) is not derivable for any k.
As we’ve noted, Σ1 -soundness implies consistency. The converse is not true. In
general there are systems that are consistent but not Σ1 -sound: for example, take
the system obtained by adding the negation of the Gödel sentence as an additional
axiom to PA. But even just restricted to PA there is no implication: it can be
shown that if F formalizes the Σ1 -soundness of PA, then not ` (Con) ⊃ F . (See
the Exercises.)
Σ1 -completeness, in contrast, is an entirely elementary property. It follows
from provabilities that can be established by elementary means in PA. This has the
consequence that formalized Σ1 -completeness is derivable in PA: for any Σ1 -sentence
F,
` F ⊃ Prov(pF q)
Formal Adequacy is a particular case of formalized Σ1 -completeness, since Prov(pF q)
is a Σ1 -sentence. (More details on formalized Σ1 -completeness are contained in the
Appendix, §6.)
From ∆0 - and Σ1 -completeness, we see that Gödel proved a best possible result.
The simplest formula that could possibly be true but not provable in PA would be
(a) for any k and any sequence ⟨j0 , . . . , jk ⟩ of integers, there exists an
integer s such that β(s, i) = ji for each i, 0 ≤ i ≤ k;
(b) β is numeralwise representable in PA by a ∆0 -formula B(x, y, z) such
that ` B(x, y, z) & B(x, y, z′) ⊃ z = z′.
0 + p · j0 + p^2 · 1 + p^3 · j1 + p^4 · 2 + p^5 · j2 + · · · + p^{2k} · k + p^{2k+1} · jk .
(Thus, if q is written as a numeral in p-ary notation, it will have 2k + 2 digits:
counting from the right the odd places will be occupied by 0, 1, 2, ..., k and the even
places by j0 , j1 , j2 , . . . , jk .) Let Q(p, m) be the relation “p is a prime and m is a
power of p”; clearly Q(p, m) is numeralwise representable by a ∆0 formula. Let
R(s, i, j) be the relation
(∃m, n, p, q, r ≤ s)(s = π(p, q) & Q(p, m) & n < m^2 & q = n + m^2 · i +
m^2 · p · j + m^2 · p^2 · r)
We may then let β(s, i) = µjR(s, i, j). Condition (a) of the Lemma then follows,
for s = π(p, q), with p and q as above.
Now R(s, i, j) is also numeralwise representable by a ∆0 -formula; call it F (x, y, z).
Then let B(x, y, z) be F (x, y, z) & ∀x′(x′ ≤ z & F (x, y, x′) ⊃ z = x′). B(x, y, z) is clearly
∆0 and numeralwise represents β(s, i).
For condition (b) of the Lemma, note that B(x, y, z) & B(x, y, z′) implies both
z′ ≤ z ⊃ z = z′ and z ≤ z′ ⊃ z′ = z. Since ` z ≤ z′ ∨ z′ ≤ z, (b) follows.
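The coding just described can be sketched in a few lines of Python. The pairing π is taken here to be the Cantor pairing function, an assumption (the text's π may differ), but any pairing with computable inverses serves; β then extracts the base-p digit in place 2i + 1, exactly as in the layout of q above.

```python
# A sketch of the β-function. The pairing pi is Cantor's (an assumption;
# the text's π may differ). beta extracts the base-p digit of q in
# place 2i+1, matching the layout of q described in the text.
import math

def pi(p, q):                       # Cantor pairing (assumed for π)
    return (p + q) * (p + q + 1) // 2 + q

def unpair(s):                      # its inverse
    w = (math.isqrt(8 * s + 1) - 1) // 2
    q = s - w * (w + 1) // 2
    return w - q, q

def is_prime(n):
    return n > 1 and all(n % d for d in range(2, math.isqrt(n) + 1))

def encode(seq):
    """Produce s with beta(s, i) = seq[i]: choose a prime p exceeding
    every entry and the length, lay the digits out in base p as above."""
    k = len(seq) - 1
    p = max(seq + [k]) + 1
    while not is_prime(p):
        p += 1
    q = sum(i * p ** (2 * i) + seq[i] * p ** (2 * i + 1)
            for i in range(k + 1))
    return pi(p, q)

def beta(s, i):
    """β(s, i): with s = π(p, q), the base-p digit of q in place 2i+1."""
    p, q = unpair(s)
    return (q // p ** (2 * i + 1)) % p

js = [4, 0, 25, 7, 7, 1]
s = encode(js)
assert all(beta(s, i) == js[i] for i in range(len(js)))
```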
The Sequence Encoding Lemma allows us to formalize the explicit definition of
ϕ given above: by using numeralwise representations of θ and ψ as well as B(x, y, z),
we could easily write a formula that expresses the following: there exists an s such
that β(s, 0) = θ(n), β(s, i + 1) = ψ(n, i, β(s, i)) for each i < k, and β(s, k) = p.
We would thereby obtain a numeralwise representation of ϕ. However, since we
want the representation to be Σ1 , a difficulty emerges: we may assume that the
representations of θ and ψ are Σ1 , but the formalization of the clause “β(s, i + 1) =
ψ(n, i, β(s, i)) for each i < k” will contain a bounded universal quantifier governing
the Σ1 -formula representing ψ, and this would no longer be Σ1 .
To overcome this difficulty, we shall require a stronger notion of representation
by a Σ1 -formula. (For readability, in what follows, we are going to use u, v, and w
as if they were formal variables of LPA , and we will use v, w, z ≤ u as shorthand for
v ≤ u & w ≤ u & z ≤ u.)
A formula ∃uF (u, x, y, z) is an excellent representation of a 2-place function ϕ
iff F (u, x, y, z) is ∆0 and for all n, k and p such that p = ϕ(n, k)
Since ` ∃uF (u, n, k, p) follows from (i), an excellent representation ∃uF (u, x, y, z)
does indeed numeralwise represent ϕ. The additional requirements on excellent
representations are that the formula contain only one unbounded existential quantifier,
and that if p = ϕ(n, k) not just ∃uF (u, n, k, p) but also some numerical instance
F (q, n, k, p) is derivable. (As shown in the previous section, we could infer the
derivability of a numerical instance by invoking Σ1 -soundness. But we do not want
to make the Representability Theorem dependent on such a metamathematical sup-
position. Instead, we will build the derivability of a numerical instance into the
construction of the representation.)
The definition of excellent representation for functions of one argument and of
more than two arguments is similar. We also will call a ∆0 formula that represents a
primitive recursive function an excellent representation. (To conform literally to the
definition above, we would need to add an existential quantifier binding a variable
that does not actually appear in the formula.)
We now show that every primitive recursive function has an excellent represen-
tation. The basic functions are all representable by atomic formulas of LPA , which
are, of course, ∆0 and so are excellent representations. For composition, let us take
a simple example, which is easily generalized. Suppose ζ is a 1-place function de-
fined by ζ(k) = ν(ξ(k)), and suppose ∃uN (u, x, y) and ∃uX(u, x, y) are excellent
representations of ν and ξ. Then let Z(u, x, y) be
` ∃uZ(u, k, y) ⊃ y = p
because ∃uZ(u, k, y) provably implies ∃u∃v∃z(X(u, k, z) & N (v, z, y)), which by sup-
position provably implies ∃v∃z(z = j & N (v, z, y)); by dint of the laws of identity this
provably implies ∃uN (u, j, y), which again by supposition provably implies y = p.
For recursion, let us consider the example of the 2-place function ϕ
defined from θ and ψ as at the beginning of this section. Let ∃uT (u, x, y) be an
excellent representation of θ, and ∃uR(u, v, x, y, z) be one of ψ. For readability, we
introduce more shorthand: we exploit (b) of the Sequence Encoding Lemma and use
(v)x as shorthand meant to express “the number z such that B(v, x, z)”. That is, a
formula F ((v)x ) is shorthand for ∃z(z ≤ v & B(v, x, z) & F (z)). Condition (b) assures
us that we can treat the shorthand exactly as if (v)x were a term of LPA : that is,
` F ((v)x ) & (v)x = y ⊃ F (y). (See the Exercises.)
Let P (u, x, y, z) be
where H(u, v, x, y) is
Formalized Semantics
and so on. After all, one might argue, it is biconditionals like this which give
us the sense of the notion of truth. So whatever else our account tells us about
truth, it had better yield those biconditionals. The biconditionals are sometimes
called Tarski paradigms, Tarski biconditionals, or T-sentences. We may phrase the
condition generally: the account must yield everything of the form
S is true iff p
where for p we substitute a sentence and for S a name of that sentence. So let us
consider the language LPA , with its intended interpretation. Tarski’s Theorem can
be obtained immediately from the Fixed Point Theorem. For suppose T (y) is any
formula of LPA with one free variable. By the Fixed Point Theorem applied to the
formula ∼T (y), there is a sentence H of LPA such that ` ∼T (pHq) ≡ H. Hence
by the soundness of PA, ∼T (pHq) ≡ H is true in the intended interpretation of
PA. But then T (y) cannot be a truth predicate for LPA ; it fails to act like a truth
predicate on the formula H.
Clearly the same argument can work more generally. Given a language L,
suppose Σ is a formal system that is sound for the intended interpretation of L
and in which the primitive recursive function diag can be numeralwise represented.
Then the Fixed Point Theorem may be proved, and as above we obtain, given any
purported truth predicate T (y), a sentence H on which T (y) fails to act like a truth
predicate. Note that, first, the sentence H is obtained constructively. That is, the
proof of the Fixed Point Theorem provides a method for constructing H, given T (y).
Note second that the failure of T (y) to act like a truth predicate on H is exhibited
proof-theoretically, by the derivability of a biconditional that contradicts Tarski’s
paradigm T (pHq) ≡ H. Of course, to obtain such a proof-theoretic counterexample,
we have to bring in a formal system Σ. To obtain Tarski’s Theorem directly, that is,
without any use of formal systems, we would have to proceed entirely semantically.
We carry this out immediately below.
The proof of Tarski’s Theorem is just a formalization of the Liar Paradox, whose
starkest form is this: let (*) be the sentence “Sentence (*) is not true”. If (*) is true
then it is not true, and if it is not true then it is true; contradiction. The paradox
arises, so it seems, from the assumption that the words “is true” function as a truth
predicate for all English sentences, including those containing the words “is true”.
The only thing needed to carry the argument out in a formal setting is a way of
getting the effect of the self-reference, of having (*) be a sentence that talks about
itself. That is the role of Gödel’s function diag.
In the proof given above, the formal system Σ entered only through the use
of the Fixed Point Theorem. A proof of Tarski’s Theorem that does not mention
formal systems at all can be obtained if we replace this step by a use of a semantic
form of the Fixed Point Theorem, to wit: for every formula F (y) of L there is a
formula H such that F (pHq) ≡ H is true. Given this, for every formula T (y) of L
there will be a sentence H such that ∼T (pHq) ≡ H is true, and so T (y) fails to be
a truth predicate. To prove the semantic Fixed Point Theorem, we introduce the
notion of definability in L.
not preclude the existence of a truth predicate for L in some language extending L.
Tarski went on to show how truth predicates for various formal languages can indeed
be constructed in more extensive formal languages. We now turn to an examination
of his definition of truth, taking LPA as the formal language to be treated.
TrAt (n) ≡ Sent(n) & Atform(n) & (∃j)(∃k)(j, k ≤ n & n = j∗25 ∗k & nv(j) = nv(k)).
(Here Sent(n) is the p.r. relation true just of the gödel numbers of sentences, i.e.,
Sent(n) iff Form(n) & (∀i, j)(i, j ≤ n → ∼Free(i, j, n)).) Clearly TrAt (n) holds iff n is
the gödel number of a true atomic sentence. Note, by the way, that TrAt is primitive
recursive.
The inductive definition of truth is obtained by mirroring the inductive char-
acterization displayed above:
(∗) Tr(n) ≡ Sent(n) & [(Atform(n) & TrAt (n))
∨ (∃i < n)(n = neg(i) & Sent(i) & ∼Tr(i))
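The display (∗) is cut off here by a page break. The remaining clauses presumably treat disjunction and generalization along the following lines, with disj and sub the p.r. counterparts of forming a disjunction and of substituting a numeral for a variable (these two function names are assumptions, not the text's own):

```latex
\begin{gather*}
\vee\;(\exists i, j < n)\bigl(n = \mathrm{disj}(i, j)\;\&\;(\mathrm{Tr}(i) \vee \mathrm{Tr}(j))\bigr)\\
\vee\;(\exists i, k < n)\bigl(n = \mathrm{gen}(k, i)\;\&\;(\forall m)\,\mathrm{Tr}(\mathrm{sub}(i, k, \mathrm{nmrl}(m)))\bigr)\bigr]
\end{gather*}
```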
Definition (∗) uniquely determines the numbers of which Tr holds. This may be
shown by an induction on logical complexity. Let lc(n) = k if n is the gödel number
of a sentence of LPA containing k occurrences of the logical signs “∼”, “∨”, “∀”; also
let lc(n) = 0 if n is not the gödel number of a sentence. Clearly, Tr(n) is uniquely
determined for all n with lc(n) = 0, since for such n we have either ∼Sent(n) or
Atform(n); and if Tr(k) is determined for all k with lc(k) < lc(n), then the clauses
of (∗) suffice to fix whether or not Tr(n). It should also be clear that the property
Tr so determined is the one we want: Tr(n) holds iff n is the gödel number of a true
sentence of LPA .
Although definition (∗) is inductive, it is not a primitive recursive definition,
because there is an unbounded quantifier on the right-hand side. Whether or not
Tr(n) holds when n = gen(k, i) depends upon whether Tr holds of an infinite number
of other numbers. This makes it impossible to obtain an explicit definition by the
method of Chapter 4. Indeed, Tarski’s Theorem tells us that an explicit definition—
that is, a biconditional of the form Tr(n) ≡ X where X does not contain Tr—cannot
be obtained if X is limited to number-theoretic notions, that is, to primitive recursive
functions and relations, truth-functions, and quantifiers that range over the integers.
To obtain an explicit definition of truth for LPA a more powerful metalanguage
must be used. In fact, we can obtain such an explicit definition by allowing the
metalanguage to include variables and quantifiers ranging over sets of integers. Using
“X” as such a variable, let (∗X ) be obtained from (*) by replacing each occurrence
of “Tr( )” with an occurrence of “ ∈ X” (“n ∈ X” means “n belongs to the
set X”). As we noted above, there is one and only one set X such that (∗X ) holds
for all n. Hence we may define
And since ∀n(∗X ) holds for one and only one X, we could equally well define
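The two elided definitions presumably take the following second-order forms (a reconstruction, using the set variable X as above):

```latex
\mathrm{Tr}(m) \;\equiv\; \exists X\bigl[(\forall n)(\ast_X)\;\&\; m \in X\bigr]
\qquad\text{and}\qquad
\mathrm{Tr}(m) \;\equiv\; \forall X\bigl[(\forall n)(\ast_X)\;\supset\; m \in X\bigr].
```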
on p, that for each p there exists a p-good set, and that all p-good sets agree on
those n such that lc(n) ≤ p. A set X such that (∗X ) holds for all n can then be
obtained by stitching together p-good sets for each p. Indeed, one can even avoid
the necessity of showing the existence of a set X such that (∀n)(∗X ) by framing a
definition of Tr directly in terms of p-good sets. We can define
that is, Tr(m) holds iff m belongs to every lc(m)-good set. It will still follow
that Tr itself obeys (∗) for all n. End of Note.
`∗ Tr(pF q) ≡ F.
(The proof of this is not entirely straightforward, but basically it proceeds by in-
duction on the logical complexity of F .)
To say that PA is sound is to say that all derivable formulas are true. Here,
however, we are using “true” to apply not just to sentences but to formulas with free
variables as well. A formula with free variables is said to be true iff it is true for all
values of its variables. This holds iff the universal closure of the formula is true, and
also iff all numerical instances of the formula are true, where a numerical instance
of a formula is the result of replacing all free variables with formal numerals. Let
NI(m, n) be the p.r. relation that holds iff m is the gödel number of a formula and
n is the gödel number of a numerical instance of that formula. Let Prov(y) be a
standard formalization of derivability in PA. The following Claim is, then, that the
soundness of PA is derivable in the formal system.
Sketch of Proof. One shows the derivability of the formalizations of “Every axiom of
PA is true” and of “The rules of inference preserve truth”. Indeed, the derivability of
the latter, and of the former for the logical axioms of PA, follows in a straightforward
way from the derivability of (∗). For example, consider an axiom of the form F ⊃
(F ∨ G). All numerical instances of this axiom have the form F′ ⊃ (F′ ∨ G′), where
F′ and G′ are sentences. From (∀n)(∗) we can infer (gödelizations) of the following
claims: if F′ and G′ are sentences, then F′ ⊃ (F′ ∨ G′) is true iff either F′ is not
true or (F′ ∨ G′) is true, and (F′ ∨ G′) is true iff either F′ is true or G′ is true. Hence
F′ ⊃ (F′ ∨ G′) is true whenever F′ and G′ are sentences; and hence the numerical
instances of any formula F ⊃ (F ∨ G) are true.
The same sort of argument works for the other logical axioms and for the rules
of inference. These arguments can easily be formalized to yield derivations in the
formal system. The individual nonlogical axioms of PA can most easily be derived
true by using the Tarski paradigms applied to their universal closures, since they are
also axioms of the extended system. That is, for example, `∗ Tr(p∀x(∼Sx = 0)q)
because `∗ Tr(p∀x(∼Sx = 0)q) ≡ ∀x(∼Sx = 0) and `∗ ∀x(∼Sx = 0). It remains
to show that the truth of all induction axioms F (0) & ∀x(F (x) ⊃ F (Sx)) ⊃
∀xF (x), where F (x) is any formula of LPA , can be derived in the formal system.
We must show that, for any numerical instance F′(x) of F (x), if F′(0) and
∀x(F′(x) ⊃ F′(Sx)) are true then so is ∀xF′(x). It should not be surprising that
to derive this we must use induction. Consider the property that holds of a number
n iff F′(n) is true. If F′(0) and ∀x(F′(x) ⊃ F′(Sx)) are true then 0 has the
property and the successor of any number with the property also has the property.
Hence by induction every number has the property; hence ∀xF′(x) is true.
Note that, because we are showing this for any numerical instance F′(x) of any
formula F (x), the property must invoke the notion of truth; hence the formalization
of this argument will use an induction axiom that contains the formula Tr(x).
Now let Con(PA) be the formalization of the consistency of PA, i.e., the formula
∼Prov(pS0 = 0q).
Corollary. `∗ Con(PA)
Proof. Since (∗) is derivable, so are Tarski paradigms; hence `∗ Tr(pS0 = 0q) ≡
S0 = 0. Since the formal system extends PA, `∗ ∼S0 = 0. Hence `∗ ∼Tr(pS0 = 0q).
By the Claim, `∗ ∼Prov(pS0 = 0q).
Note that a similar argument shows that in the formal system we can derive
the formal statement of the ω-consistency of PA. The Corollary shows that our
envisaged extension of PA can derive formulas of LPA that are not derivable in PA.
Thus, not only is it expressively richer—it can formalize notions that PA cannot,
like truth for LPA —but also it is deductively stronger with respect to that part of
the language that is common to it and LPA .
have to be observed. That is, there are separate universal instantiation axiom
schemata for each sort: ∀vF ⊃ F (v/t) for v a numerical variable and t a term,
and ∀V F ⊃ F (V /U ), for V and U set variables. The formal system SA, or full
second-order arithmetic, has the following non-logical axioms: the number-theoretic
axioms (N1)–(N6) of PA, together with the induction axiom
X(0) & (∀x)(X(x) ⊃ X(Sx)) ⊃ ∀xX(x)
in which the set variable X is a free variable; and the comprehension axioms
∃X∀x(X(x) ≡ F (x))
whenever F (x) is a formula of LSA in which X does not occur free. As a result
of including the comprehension axioms, mathematical induction can be framed as
a single axiom rather than as a schema. That is, if F (x) is any formula, F (0) &
∀x(F (x) ⊃ F (Sx)) ⊃ ∀xF (x) can be derived from the single axiom of mathematical
induction and the comprehension axiom ∃X∀x(X(x) ≡ F (x)).
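A sketch of that derivation: given any formula F (x), instantiate the set quantifier of the comprehension axiom and then apply the induction axiom to the resulting X:

```latex
\begin{align*}
&\exists X\,\forall x\,(X(x) \equiv F(x)) && \text{comprehension}\\
&X(0)\;\&\;\forall x\,(X(x) \supset X(Sx)) \;\supset\; \forall x\,X(x) && \text{induction axiom}\\
&F(0)\;\&\;\forall x\,(F(x) \supset F(Sx)) \;\supset\; \forall x\,F(x) && \text{by }\forall x\,(X(x) \equiv F(x))
\end{align*}
```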
SA is a very powerful formal system. It goes far beyond number theory: since
real numbers can be encoded as sets of integers, theorems of the theory of real
numbers and of the calculus can be derived in it. For that reason, SA is sometimes
called “classical analysis”. (Of course, despite its strength, SA is still incomplete,
assuming it is consistent. That is, since clearly SA can be gödelized, all p.r. functions
can be represented, and a standard provability predicate can be specified, all the
results of Chapter 3 apply to SA.)
Now the language LSA can be used to formalize the definition of truth for LPA .
And SA is certainly powerful enough to derive the formalization of (∀n)(∗). SA
also contains the mathematical induction axioms needed to carry through the proof
of the soundness of PA. Hence we see that there are sentences of LPA that are
derivable in SA but not in PA. (Assuming PA consistent.) Mathematicians used
to ask whether any theorem about the numbers proved using “analytic methods”
(methods that invoked the real numbers and their laws) could in principle be proved
using only “elementary methods”, that is, based on the usual first-order properties
of the integers. What we have seen shows the answer to this question is negative.
Now SA is far more powerful a system than is needed to show the soundness
of PA. The system ACA (arithmetical comprehension axioms) suffices. This theory
has, instead of all comprehension axioms, just those obtained from the comprehen-
sion schema by replacing F (x) with a formula containing no bound set variables.
Thus the only formulas that determine sets of integers are those restricted to quan-
tifying over integers. However, because of this restriction on comprehension axioms,
Since this conditional is derivable in PA, it is derivable in SA. The consequent of this
conditional is just Con(SA), and so is not derivable in SA (assuming SA consistent).
Hence the antecedent is not derivable in SA, that is, in SA we cannot derive the
statement that the inconsistency of SA is not derivable in PA.
⟨p1, ..., pm⟩ and q. Let’s say this function is ϕ(⟨p1, ..., pm⟩, q). We have: F is true
iff there exists a sequence ⟨p1, ..., pm⟩ such that ϕ(⟨p1, ..., pm⟩, q) is the gödel number of a true Πn sentence. That is, F is true iff there exists a sequence ⟨p1, ..., pm⟩
and there exists an r such that
1. r = ϕ(⟨p1, ..., pm⟩, q);
The method used in the preceding sections for LPA can be applied to a formal lan-
guage provided that the language contains names for each member of the universe of
discourse of the intended interpretation. If this condition does not hold, a difficulty
arises in trying to define the truth of universal quantifications ∀xF (x), since the
truth of such a sentence is no longer specifiable in terms of the truth of its instances
F (c), where c is a term of the language. All we can say is that ∀xF (x) is true iff
F (x) is true when x is assigned any value from the universe of discourse. But then
we must provide a definition not just of truth but of the broader notion “truth under
an assignment of values to the free variables”. In this, we must treat all formulas of
the language, not just sentences.
Let U be the universe of discourse of the intended interpretation of a formal
language L. Assignments of values from U to the variables of language L may be
identified with finite sequences of elements of U; such a sequence ⟨s1, ..., sk⟩ can be
taken to assign s1 to the alphabetically first variable of L, s2 to the alphabetically
second variable of L, ..., sk to the alphabetically kth variable of L. For variables
later than the alphabetically kth, let us take sk, the last member of the sequence,
as the assigned value. In other words, if σ is a finite sequence of members of U of
length m, let (σ)i be the ith member of σ if i ≤ m and let (σ)i = (σ)m if i > m. We
shall take σ to assign (σ)i to the alphabetically ith variable of language L for every
i.
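In programming terms, this convention for reading values off a sequence can be sketched as follows (the function name is ours, for illustration):

```python
def value_at(sigma, i):
    """Return (sigma)_i: the i-th member of the finite sequence sigma
    (1-indexed), with the last member repeated for all later positions."""
    if not sigma:
        raise ValueError("sigma must be a nonempty sequence")
    return sigma[i - 1] if i <= len(sigma) else sigma[-1]

# The sequence <7, 4, 9> assigns 7 to the first variable, 4 to the
# second, 9 to the third, and 9 to every variable after the third.
sigma = (7, 4, 9)
print([value_at(sigma, i) for i in range(1, 6)])  # [7, 4, 9, 9, 9]
```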
We say that a finite sequence σ satisfies a formula F iff F is true under the
assignment of values that σ encodes. Note that σ assigns values to all variables of
the language; but if the alphabetically ith variable of L does not occur free in F ,
then whether or not a sequence σ satisfies F will not depend on (σ)i (provided that
i is less than the length of σ).
We seek an inductive definition of satisfaction. The only trick lies in the treat-
ment of quantification. Now ∀vF (v) is true under an assignment of values to vari-
ables just in case the formula F (v) is true under every assignment that differs from
the given one at most in what is assigned to the variable v. For in that case, F (v) is
satisfied no matter what value is assigned to v, while the values of the other variables
remain fixed.
Let SatAt (σ, n) be the satisfaction relation for atomic formulas; i.e., it holds if n
is the gödel number of an atomic formula of L that is satisfied by σ. Let Atform, neg,
dis, gen, and Form be functions and relations that mirror the obvious syntactical
operations and properties for language L; and for each i let var(i) be the number
correlated with the alphabetically ith variable of L. The inductive definition is then:
Sat(σ, n) ≡ Form(n) & [(Atform(n) & SatAt(σ, n))
∨ (∃i < n)(n = neg(i) & Form(i) & ∼Sat(σ, i))
∨ (∃i, j < n)(n = dis(i, j) & (Sat(σ, i) ∨ Sat(σ, j)))
∨ (∃i, k < n)(n = gen(var(k), i)
& (∀σ′)((∀j)(j ≠ k → (σ′)j = (σ)j) → Sat(σ′, i)))].
It remains to see how SatAt may be defined. Details here will depend on the
vocabulary of the language L. Suppose, for example, that L contains, as nonlogical
vocabulary, just finitely many predicate-signs P1 , . . . , Pm , each of which is two-
place. (Thus L contains no constants or function-signs.) Suppose the intended
interpretation of L is given by a universe of discourse U and the m two-place relations
Φ1 , . . . , Φm on U . Let v1 , v2 , . . . be the variables of L in alphabetic order. Then we
would define
SatAt(σ, γ(F)) ≡ [(F has the form P1 vi vj & Φ1((σ)i, (σ)j))
∨ (F has the form P2 vi vj & Φ2((σ)i, (σ)j))
∨ ... ∨ (F has the form Pm vi vj & Φm((σ)i, (σ)j))]
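To see the inductive definition in action, here is a toy satisfaction checker (ours, for illustration only, including the formula encoding) for a language with a single two-place predicate over a finite universe, so that the quantifier clause can be checked exhaustively:

```python
# A toy instance of the inductive definition of Sat, for a language whose
# only nonlogical sign is one two-place predicate P, interpreted over a
# FINITE universe. The encoding of formulas and all names are ours.

U = (0, 1, 2)                                   # universe of discourse
PHI = {(a, b) for a in U for b in U if a < b}   # interpretation of P: "less than"

# Formulas: ('P', i, j) is P v_i v_j; ('neg', F); ('dis', F, G);
# ('gen', k, F) is the universal quantification of F with respect to v_k.

def value_at(sigma, i):
    # (sigma)_i: the i-th member, with the last member repeated beyond the end
    return sigma[i - 1] if i <= len(sigma) else sigma[-1]

def sat(sigma, f):
    kind = f[0]
    if kind == 'P':                 # the clause played by SatAt
        return (value_at(sigma, f[1]), value_at(sigma, f[2])) in PHI
    if kind == 'neg':               # sigma satisfies ~F iff it does not satisfy F
        return not sat(sigma, f[1])
    if kind == 'dis':
        return sat(sigma, f[1]) or sat(sigma, f[2])
    if kind == 'gen':               # vary what sigma assigns to v_k over all of U
        k, body = f[1], f[2]
        n = max(k + 1, len(sigma))  # long enough that changing place k
        base = [value_at(sigma, i) for i in range(1, n + 1)]  # alters no other place
        def varied(u):
            s = list(base)
            s[k - 1] = u
            return tuple(s)
        return all(sat(varied(u), body) for u in U)
    raise ValueError(f"bad formula: {f!r}")

# "forall v1 exists v2 P(v1, v2)" fails in U, since 2 is below nothing:
f = ('gen', 1, ('neg', ('gen', 2, ('neg', ('P', 1, 2)))))
print(sat((0,), f))   # False
```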
If L contains function signs, then we must specify the value of the terms of L under
assignments σ. For example, for LPA , we would define a function val(σ, n) so that
if n is the gödel number of a term, then val(σ, n) is the value of that term when the
variables in it take the values assigned by σ. We would then define
(2) syntactic objects of L (by gödelization this can amount to nothing more than
talking about numbers);
(3) the relations and functions that are the interpretations of the predicates and
function signs of L.
Moreover, by a device similar to that of §??, the inductive definition can be
converted into an explicit definition of Sat in a language that contains, in addition,
quantification over relations between sequences of members of U and (the Gödel
numbers of) formulas of L. That is, let (†R ) be the result of replacing, in the
inductive definition of Sat above, all occurrences of “Sat” with the variable R. We
can then define Sat explicitly by
Sat(σ, n) ≡ (∃R)((†R) & R(σ, n)),
or by
Sat(σ, n) ≡ (∀R)((†R) → R(σ, n)).
Finally, given a definition of satisfaction, we can then define truth as follows: a sentence of L is true iff it is satisfied by some sequence (equivalently, by every sequence).
Computability
6.1 Computability
The notion of algorithm, or effective procedure, is an intuitive one that has been
used in mathematics for a long time. An algorithm is simply a clerical procedure
that can be applied to any of a range of inputs and will, on any input, yield an
output. The basic idea is that an algorithm is a bunch of rules that can be applied
mechanically; obtaining an output from any given input is just a matter of applying
those rules (mindlessly, so to speak).
Before the 1930s, the notion was used in this intuitive sense. For example,
as the tenth of the mathematical problems he formulated in 1900, Hilbert asked
whether there is an algorithm which, if applied to any polynomial (containing any
number of variables) with integer coefficients, determines whether or not there are
integral values for the variables of the polynomial that give the polynomial the value
0. We ourselves have made intuitive use of the notion of algorithm. For example,
in defining formal language and formal system, we said there must be an effective
procedure for telling whether or not any given string of signs is a formula, and there
must be an effective procedure for telling whether or not any given finite sequence
of formulas is a derivation.
A number-theoretic function is said to be computable in the intuitive sense (or
algorithmic) iff there is an algorithm for computing the function, i.e., an algorithm
that yields, for any integers given as inputs, the value of the function at those
arguments. Our first aim in this unit is to give a precise mathematical definition
of the notion of computable function.
Now we have seen a class of functions each of which is clearly computable, in
the intuitive sense, namely, the primitive recursive functions. Perhaps one might
think that this class exhausts the computable functions—that any function which
we would intuitively call algorithmic is in fact primitive recursive. But this is not
the case, as the following heuristic argument indicates.
First, the specification of the p.r. functions allows us to consider all p.r. defini-
tions as being written in some standard symbolic form. We may then effectively list
all p.r. definitions of one-place functions. Say this list is D1 , D2 , D3 , . . .; and for each
i let ψi be the p.r. function that Di defines. (We can, for example, gödelize the p.r.
definitions and then list them in increasing order of gödel number.) Now consider
the following procedure: given input n, find the p.r. definition Dn . Then compute
the value of ψn at argument n; since Dn is a p.r. definition of ψn , it tells us how to
do this. Add one to this value of ψn; the result is the output. What was just described
is clearly an algorithm. This algorithm computes a function; call that function ϕ.
Then ϕ cannot be primitive recursive. For suppose it were; then it would have a
p.r. definition, and that definition would occur on our list, say as Dk . That is, we
would have ϕ = ψk . But the value of ϕ at argument k is ψk (k) + 1; so ϕ cannot be
identical with ψk . Thus ϕ is not p.r. But ϕ is computable.
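The heuristic argument can be seen in miniature. In the sketch below (ours), a short hard-coded list of total functions stands in for the effective enumeration of p.r. definitions (with psi[i] playing the role of the function defined by D i+1); the diagonal function ϕ then differs from every function on the list:

```python
# The diagonal argument in miniature. A tiny hard-coded enumeration
# stands in for the effective list D1, D2, D3, ...; the point is only
# that phi, defined by diagonalizing, escapes any such list.

psi = [                      # psi[i]: the function defined by D_{i+1}
    lambda n: 0,             # constant zero
    lambda n: n + 1,         # successor
    lambda n: n * n,         # squaring
    lambda n: 2 ** n,        # exponentiation
]

def phi(n):
    """Diagonal function: compute psi_n at argument n and add one."""
    return psi[n](n) + 1

# phi differs from every listed function at the diagonal argument:
for k in range(len(psi)):
    assert phi(k) != psi[k](k)
print([phi(n) for n in range(4)])  # [1, 3, 5, 9]
```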
This shows that the p.r. functions do not exhaust the computable functions.
Indeed, our argument yields more: it shows that however we eventually define the
notion of computable, it cannot be possible to list effectively algorithms that com-
pute all and only the computable functions.
Our eventual definition of computable function will handle this pitfall as fol-
lows. We shall define a notion of “computing instructions”, that is, a standard
form of algorithm. A computable function is any function that can be computed
by a computing-instruction. There will be an effective procedure for listing all the
computing-instructions. But not all computing-instructions succeed in computing
functions, and it will be impossible to “separate out”, in an effective manner, just the
computing instructions that do compute functions. This will become clearer when
we see the details. We shall in fact give two explications of computability:
the first uses formal systems of a particular sort, called “Herbrand-Gödel-Kleene
systems” (also called the “equation calculus”); the second uses an abstract model
of a computing machine, called a “Turing machine”. It will turn out that these
explications are equivalent.
6.2. RECURSIVE AND PARTIAL RECURSIVE FUNCTIONS 87
If t and u are terms then t = u is a formula. All formulas are called equations.
An HGK-system is simply a finite set of equations. We treat each HGK-system
as a formal system: the notions of derivation and derivability in any such system E
are determined by taking the members of E to be the axioms, and allowing just the
following two rules of inference.
⊢E f(p1, p2, ..., pn) = r ↔ ϕ(p1, p2, ..., pn) = r.
The function ϕ is said to be general recursive (or, more briefly, recursive) iff it is
defined by some HGK-system E.
The notion of general recursive function is what we shall adopt as our precise
mathematical explication of the intuitive notion of computable function. Every
general recursive function is computable in the intuitive sense. For suppose ϕ is
defined by the HGK-system E. Then to compute ϕ(p1 , . . . , pn ) one simply makes
an exhaustive search through the derivations in system E until one finds a derivation
of an equation f (p1 , p2 , . . . , pn ) = r for some r; r is then the value of ϕ(p1 , . . . , pn ).
Since E defines ϕ, we are assured that there is such a derivation in system E.
Not every HGK-system defines a function. For example, suppose E contains
the one equation f(Sx) = SS0. Then ⊢E f(p) = 2 for every p > 0, but no equation
of the form f (0) = r is derivable from E. We shall consider this phenomenon more
closely two pages hence. For now, we are interested solely in HGK-systems that do
define functions.
We start by investigating the extent of the general recursive functions.
Fact 6.2. The class of general recursive functions is closed under composition.
Proof. Obvious.
Our third fact about the extent of the general recursive functions gives us some-
thing new: the possibility of specifying functions by use of the unbounded leastness
operator µ.
g1(x, 0) = x
g2(x, 0) = 0
g(x1, ..., xn, 0) = S0
h(Sx, 0, y) = y
We claim that the system E∗ defines the function µk[ϕ(p1, ..., pn, k) = 0], and
hence that function is general recursive. To prove the claim it suffices to note the
following:
1. ⊢E∗ g2(i, j) = k iff k = i · j.
2. ⊢E∗ g(p1, p2, ..., pn, k + 1) = q iff q = ∏_{i=0}^{k} ϕ(p1, ..., pn, i).
4. ⊢E∗ f(p1, p2, ..., pn) = k iff
∏_{i=0}^{k−1} ϕ(p1, ..., pn, i) ≠ 0 but ∏_{i=0}^{k} ϕ(p1, ..., pn, i) = 0.
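The role of the products here can be checked numerically. The following sketch (our code, with illustrative names) finds µk[ϕ(p1, ..., pn, k) = 0] exactly by the characterization in fact 4: the first k at which the running product vanishes:

```python
from math import prod

def mu_by_products(phi, *p):
    """Least k with phi(*p, k) == 0, located via the running products of
    the E* construction: prod over i < k is nonzero, but prod over
    i <= k is zero. Loops forever if the application of mu is not licensed."""
    k = 0
    while True:
        if prod(phi(*p, i) for i in range(k)) != 0 and \
           prod(phi(*p, i) for i in range(k + 1)) == 0:
            return k
        k += 1

# phi(p, k) = 0 first when k*k >= p, so mu yields the least such k:
phi = lambda p, k: 0 if k * k >= p else 1
print([mu_by_products(phi, p) for p in [0, 1, 2, 5, 10]])  # [0, 1, 2, 3, 4]
```

Note that the empty product (for k = 0) is 1, which is why the first clause holds vacuously at the start of the search.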
If the function ϕ has the property that (∀p1 )(∀p2 ) . . . (∀pn )(∃k)[ϕ(p1 , . . . , pn , k) =
0] then we say that the application of the unbounded leastness operator µ to ϕ is
licensed. Thus Facts 6.1–6.3 tell us that the class of general recursive functions con-
tains the primitive recursive functions and is closed under composition and licensed
application of µ. In §?? we shall show the converse: every general recursive func-
tion can be obtained from primitive recursive functions by composition and licensed
application of µ.
As we pointed out above, not every HGK-system defines a function. For a system E to define an n-place function, there must exist for all p1, ..., pn a unique r such
that ⊢E f(p1, ..., pn) = r. This condition can fail in two ways: for some p1, ..., pn
there may be distinct q and r with ⊢E f(p1, ..., pn) = q and ⊢E f(p1, ..., pn) = r;
or for some p1, ..., pn there may be no r with ⊢E f(p1, ..., pn) = r.
The former problem may easily be avoided. We simply redefine the notion of
derivability in an HGK-system as follows: we now say that an equation
f (p1 , . . . , pn ) = r is derivable in a system E iff, first, there is a derivation of it
in E and, second, no smaller derivation is a derivation of f(p1, ..., pn) = q for
q ≠ r (“smaller” in the sense of a gödel numbering, which we assume has been
fixed). This device—inspired, as should be obvious, by Rosser’s proof of §??—
makes it the case that for any p1, ..., pn there is at most one integer r such that
⊢E f(p1, ..., pn) = r.
The latter problem, however, admits no such solution. Examples of HGK-
systems in which for some p1 , . . . , pn no equation f (p1 , . . . , pn ) = r can be derived
are easy to formulate. We have already seen a simple one. Here is another: let E
contain just the equations g(x, 0) = x, g(x, Sy) = Sg(x, y), f(g(x, x)) = x. Then
⊢E f(p) = r iff p is even and r = p/2. Thus E does not define a 1-place function.
E does define something, though; namely, a function whose domain is just the even
integers, and which takes each integer in its domain to one-half of that integer. Such
a function is called a partial function.
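In modern notation the partial function defined by this E might be sketched as follows, with None standing in for “takes no value” (a convention of ours, not of the text):

```python
def f(p):
    """The partial function defined by the system E above: defined
    exactly on the even integers, where it halves its argument.
    None stands in for "takes no value"."""
    if p % 2 == 0:
        return p // 2
    return None  # f is undefined on odd arguments

print([f(p) for p in range(6)])  # [0, None, 1, None, 2, None]
```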
Definition. An n-place partial function is an integer-valued function whose
domain is some set of n-tuples of integers. If the domain of an n-place partial
function ϕ is the set of all n-tuples, then ϕ is said to be total. (The domain of
an n-place partial function may be all n-tuples, or it may be empty, or it may be
something in between.)
We may now take it that every HGK-system E defines an n-place partial
function for each n > 0, namely, the unique partial function ϕ such that for all
p1 , . . . , pn , r,
⊢E f(p1, ..., pn) = r iff ϕ(p1, ..., pn) = r.
Thus the domain of ϕ is the set of n-tuples ⟨p1, ..., pn⟩ such that for some r,
⊢E f(p1, ..., pn) = r. An n-place partial function ϕ is said to be partial recursive iff it is defined by some HGK-system. Thus, a general recursive function is
simply a partial recursive function that is total.
6.3. THE NORMAL FORM THEOREM AND THE HALTING PROBLEM 91
We saw in the previous section that licensed application of the leastness operator
µ to a general recursive function yields a general recursive function (Fact 6.3). The
same proof shows that any application of µ—licensed or not—to a general recursive
function yields a partial recursive function. That is, if ϕ is general recursive, then
the function µk[ϕ(p1, ..., pn, k) = 0], which takes ⟨p1, ..., pn⟩ to the least k such
that ϕ(p1, ..., pn, k) = 0 if there is such a k and takes no value on ⟨p1, ..., pn⟩ if
there is no such k, is partial recursive. That function will be total, and hence general
recursive, just in case the application of µ is licensed.
Note: The leastness operator µ may also be applied to partial recursive func-
tions that are not total; but here the definition must be phrased with care. Reflection
on the proof of Fact 6.3 shows that, if that proof is to show µk[ϕ(p1 , . . . , pn , k) = 0]
to be partial recursive when ϕ is partial recursive, we should define
µk[ϕ(p1, ..., pn, k) = 0] = the least k such that ϕ(p1, ..., pn, k) = 0, if such a k
exists and ϕ(p1, ..., pn, j) has a value for each j < k; and µk[ϕ(p1, ..., pn, k) = 0]
has no value otherwise.
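The care in this clause can be made vivid in code. In the sketch below (ours; None again means “no value”), the search for the least k must pass through every earlier argument, so an undefined earlier value makes the whole computation diverge:

```python
def mu(phi, *p):
    """mu k [phi(*p, k) == 0] for a partial phi, with None meaning
    "no value". We return k only if phi(*p, j) has a value for every
    j <= k; if the search hits an undefined value first, the
    computation diverges (here: loops forever)."""
    k = 0
    while True:
        v = phi(*p, k)
        if v is None:
            while True:   # phi undefined at k: the search never gets past it
                pass
        if v == 0:
            return k
        k += 1

# This phi has a value for every k, vanishing first at k == p:
phi = lambda p, k: p - k if k <= p else 0
print(mu(phi, 3))  # 3
```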
(I) Each HGK system is correlated with a finite set of integers, the gödel numbers
of its axioms. These finite sets, in turn, can be correlated with integers, so
that every integer corresponds to an HGK system and vice versa. We call the
integer so correlated with an HGK-system E the index number of E.
(II) For each n > 0 there is an (n + 2)-place primitive recursive function Dern such
that Dern (e, p1 , . . . , pn , q) = 0 if and only if q is the gödel number of a deriva-
tion in the HGK-system with index number e, and the last line of this deriva-
tion has the form f (p1 , . . . , pn ) = r for some r; and Dern (e, p1 , . . . , pn , q) = 1
otherwise.
(III) There is a 1-place primitive recursive function Res such that if q is the gödel
number of a sequence of equations the last of which has a formal numeral r
on the right-hand side, then Res(q) = r; and Res(q) = 0 otherwise.
Note that ϕ is total (and hence general recursive) iff the application of µ is licensed.
The Normal Form Theorem shows that every partial recursive function can be
obtained by starting with a primitive recursive function, applying the µ-operator,
and composing with a primitive recursive function. The partial function will be
total, and hence general recursive, iff the application of µ is licensed.
Note: In the statement of the Normal Form Theorem and below, when we
use “=” between two expressions for partial functions we mean: when both sides
have values then those values are identical; and when one side takes no value, then
neither does the other. End of Note.
As we’ve said, we take the notion of general recursive function to be the precise
explication of the intuitive notion of computable function.
Church’s Thesis—HGK Form. A function is computable (in the intuitive
sense) if and only if it is general recursive.
Church’s Thesis is not a mathematical claim. It asserts the equivalence of
a mathematical notion and an intuitive one, and hence there can be no question
of proof. Of course, there can be plausibility arguments (of a more or less philo-
sophical nature). I have already argued for the direction “If general recursive then
computable”. For the converse, one might note the following. First, every particular
function that people have ever encountered and judged on intuitive grounds to be
computable has turned out to be general recursive. Second, there are in the lit-
erature other mathematical explications of the notion of computability (we’ll treat
one, namely, Turing-computability, in §??), and in each case it can be shown that
the explications are equivalent, that is, each yields exactly the same class of func-
tions. Third, if our intuitive notion of algorithm is something like that of a finite
list of instructions applied to various inputs, then any precise notions of instruction
and application of instructions should be gödelizable, and this will yield a result
analogous to the Normal Form Theorem.
A partial recursive function is “semi-computable” in the following sense: given
input hp1 , . . . , pn i, we can systematically seek a derivation of f (p1 , . . . , pn ) = r for
some r; if there is such a derivation, we shall find it eventually; but if there is no
such, we will go on forever. Of course, if the function is total then, no matter what
input we are given, our computing will eventually stop.
It would be nicer not to have to deal with partial functions. One might hope to
eliminate partial recursive functions that are not total; perhaps one could effectively
weed out those HGK-systems that define nontotal functions. To do this for 1-place
functions, one would need an effective procedure that would yield, given any HGK-
system E, a “yes” if E defines a 1-place total function and a “no” if not. However,
as we shall now see, no such effective procedure exists.
We shall speak of effective procedures whose inputs are index numbers, rather
than HGK-systems. Such procedures can then be identified with general recursive
functions (where we take output 1 to be “yes” and output 0 to be “no”).
For any number e, we use ϕe for the 1-place partial recursive function defined
by the HGK-system with index number e. For n > 1, we use ϕne for the n-place
partial recursive function defined by the HGK-system with index number e.
Unsolvability of the Totality Problem. There is no general recursive
function ψ such that, for every integer e,
ψ(e) = 1 if ϕe is total, and ψ(e) = 0 if ϕe is not total.
Proof. Suppose such a ψ exists. By Facts 6.1 and 6.2, the function η(e, q) = ψ(e) ·
Der1 (e, e, q) is general recursive. From the supposition about ψ, we have η(e, q) = 0
if either ϕe is not total or else ϕe is total and q is the gödel number of a derivation
in the HGK-system with index number e of an equation f (e) = r for some r. Hence
∀e∃q[η(e, q) = 0]. By Fact 6.3, the function µq[η(e, q) = 0] is general recursive. By
Facts 6.1 and 6.2, the function δ(e) = Res(µq[η(e, q) = 0]) + 1 is general recursive.
By the definition of Res we have, for every e,
δ(e) = ϕe(e) + 1 if ϕe is total, and δ(e) = 1 if ϕe is not total.
Now δ, being general recursive, is identical to ϕe₀ for some e₀, and ϕe₀ is thus
total. But then we have ϕe₀(e₀) = ϕe₀(e₀) + 1, a contradiction. (The reader should
compare this proof to the heuristic proof about p.r. functions given in §??.)
Thus there is no effective way to weed out HGK-systems that fail to define total
functions. Perhaps then, one could hope at least to “patch up” those HGK-systems
that so fail. That is, if ϕe takes no value on p, why not just set the value equal to
0? But to do this, one would need an effective procedure for telling, for any e and
any p, whether p is in the domain of ϕe or not. This too turns out to be impossible.
Unsolvability of the Halting Problem. There is no 2-place general recur-
sive function ψ such that
ψ(e, p) = 1 if ϕe takes a value on p, and ψ(e, p) = 0 if ϕe takes no value on p.
Proof. Suppose there were such a ψ. Then there exists a partial recursive function
δ such that, for all e,
δ(e) = 0 if ϕe(e) has no value, and δ(e) has no value if ϕe(e) has a value.
Namely, let δ(e) = µk(ψ(e, e) + k = 0). By the supposition, if e is not in the domain
of ϕe then ψ(e, e) = 0, so that δ(e) = 0; and if e is in the domain of ϕe then
ψ(e, e) = 1 so that δ takes no value on e.
Now, since δ is partial recursive, it is ϕe₀ for some e₀. By the specification of δ
we then have: if e₀ is in the domain of ϕe₀ then e₀ is not in the domain of δ, i.e., e₀
is not in the domain of ϕe₀; and if e₀ is not in the domain of ϕe₀ then δ(e₀) = 0, so
that e₀ is in the domain of ϕe₀. This is a contradiction.
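The construction of δ from a supposed decider can be written out directly. In the sketch below (ours), halts is a parameter standing for the claimed decider, since the proof shows no correct one can exist; the stub used at the end is, of course, not a correct decider:

```python
def make_delta(halts):
    """Given any claimed halting decider halts(e, p) -> bool, build the
    diagonal function delta of the proof: delta(e) = 0 if phi_e(e) has
    no value, and delta diverges (loops) if phi_e(e) has a value.
    No correct halts can exist: delta would be phi_{e0} for some e0,
    and then delta(e0) would halt iff it does not halt."""
    def delta(e):
        if halts(e, e):
            while True:   # mimic a mu-search that never succeeds
                pass
        return 0
    return delta

# With a stub decider that always answers "does not halt", delta
# returns 0 everywhere (the stub is not correct; this only shows
# the shape of the construction):
delta = make_delta(lambda e, p: False)
print(delta(7))  # 0
```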
6.4. TURING MACHINES 95
The unsolvability of the totality and halting problems shows that we cannot
eliminate nontotal partial recursive functions by any effective procedure. This is as
we should have expected, given the heuristic argument of §??. The cost of capturing
all computable functions by means of HGK-systems is that we cannot effectively
avoid those HGK-systems that do not define total functions.
• • • • •
where each dot represents either a blank or else a symbol of some sort, and the tape
extends without limit in both directions from the depicted segment.
We may conceive of the machine as a mechanism that, at any moment, sits over
a cell of the tape. The machine can scan the cell it sits over; it then takes the symbol
(or blank) inscribed in that cell into account, and does something. The something
it may do includes replacing the symbol with another and includes moving on—that
is, shifting to the next cell to the right or to the next cell to the left. What the
machine does at any point is determined by a finite list of instructions that we have
given the machine. (Since we do not care about anything but the behavior of the
machine, we may say that a machine is just its instructions.)
To be more precise, at any particular moment the machine is in one of a finite
number of states. Each machine-instruction has the form: if in state i and the
cell being scanned contains symbol t, then do so-and-so and go into state j. The
so-and-so has two parts: the first is either erase the symbol t or replace t with t0
or leave t as it is; the second is either stay put or move left one cell or move right
one cell. In short, a Turing machine is specified by: first, a list of states, that is,
a specification of how many states there are; second, a finite alphabet of symbols
(including blank); and third, a finite list of instructions. Each instruction has the
form of a quintuple
⟨i, t, t′, X, j⟩,
where i and j are numbers no greater than the number of states, t and t′ are
members of the alphabet, and X is either “D” (don’t move), “L” (move left one), or
“R” (move right one). The instruction may be read: if in state i and scanning a cell
containing t, then replace t with t′, move as X directs, and go into state j. To make
the machine deterministic—that is, to ensure that at each juncture there is at most
one applicable instruction—we insist that no two instructions have the same first two members.
Given a Turing machine M , we may investigate the behavior of M when it is
started in a particular state at a particular place on a given tape.
Example. Let M be the Turing machine that has 4 states, whose alphabet is
just B (blank) and | (stroke), and whose instructions are:
⟨1, |, |, R, 1⟩   ⟨1, B, B, L, 2⟩   ⟨2, |, B, L, 3⟩
⟨3, B, |, D, 4⟩   ⟨3, |, |, D, 4⟩
How does this machine work? Let us first consider what happens if the machine
starts in state 1, situated at a cell that contains a stroke and whose neighbors to
the left and right contain blanks. We may symbolize this initial situation thus:
B B | B B
Since the machine is in state 1 and is scanning a stroke, the first instruction is
applicable. Thus the machine leaves the stroke as is, moves right one cell, and
remains in state 1. So here’s how things look after this first move.
B B | B B
The second instruction is now applicable. So the machine moves to the left and goes
into state 2. After this second move, we have:
B B | B B
The third instruction is now applicable. Hence the machine erases the stroke, moves
left one, and goes into state 3.
B B B B B
At this point the fourth instruction applies, so the machine writes a stroke in the cell it now scans and goes into state 4:
B | B B B
Since no instruction begins with state 4, the machine now halts; the final tape, with its single stroke, again represents 0.
Now suppose instead that the machine is started in state 1 at the leftmost stroke of a tape representing 2:
B | | | B
The machine moves right along the strokes until it reaches the blank, backs up onto the last stroke, erases it, and steps left onto a stroke, going into state 4:
B | | B B
The machine halts after six steps, since it is in state 4 and scanning a cell containing
a stroke.
In general, suppose M is started in state 1 scanning a cell containing a stroke to
the right of which there are n > 0 cells containing strokes and then a cell containing
a blank. The machine then moves to the right through all the cells that contain
strokes until it encounters the cell that contains a blank; then it backs up, erases the
last stroke, backs up once more, and halts. We may summarize its behavior more
easily after we introduce some terminology.
A tape represents a number n ≥ 0 iff the tape is blank but for n + 1 consecutive
cells each of which contains a stroke. To start a machine on input n is to start
the machine in state 1 situated at the leftmost stroke in a tape representing n. A
Turing machine yields m on input n iff when the machine is started on input n it
eventually halts, and at the moment when it halts, the tape represents m.
Thus for each n ≥ 0, the Turing machine M above yields Pred(n) on input n.
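The machine M is small enough to simulate. The sketch below is our implementation (the dictionary encoding of quintuples and tapes is ours); it runs quintuple machines under the stroke convention and confirms that M yields Pred(n) on input n:

```python
def run(machine, tape, pos, state=1, limit=10_000):
    """Simulate a quintuple Turing machine. `machine` maps (state, symbol)
    to (new_symbol, move, new_state), with move one of 'L', 'R', 'D'.
    `tape` is a dict from cell index to symbol; absent cells are blank 'B'.
    Halts when no instruction applies; returns the final tape."""
    for _ in range(limit):
        sym = tape.get(pos, 'B')
        if (state, sym) not in machine:
            return tape
        new_sym, move, state = machine[(state, sym)]
        tape[pos] = new_sym
        pos += {'L': -1, 'R': 1, 'D': 0}[move]
    raise RuntimeError("step limit exceeded")

# The machine M of the example, as quintuples <i, t, t', X, j>:
M = {
    (1, '|'): ('|', 'R', 1), (1, 'B'): ('B', 'L', 2),
    (2, '|'): ('B', 'L', 3),
    (3, 'B'): ('|', 'D', 4), (3, '|'): ('|', 'D', 4),
}

def yields(machine, n):
    """Start the machine on input n (n+1 strokes, head at the leftmost
    stroke); return the number of strokes on the final tape, minus one
    (the number the tape represents, when the final tape is well-formed)."""
    tape = {i: '|' for i in range(n + 1)}
    final = run(machine, tape, 0)
    return sum(1 for s in final.values() if s == '|') - 1

print([yields(M, n) for n in range(5)])  # [0, 0, 1, 2, 3]
```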
There is no Turing machine M such that, for all e and n, if the Turing
machine numbered e halts on input n, then M yields 1 on input ⟨e, n⟩,
and if the Turing machine numbered e does not halt on input n, then
M yields 0 on input ⟨e, n⟩.
For suppose M were such a Turing machine. Let N be the machine provided by the
fact granted above, and let d be the gödel number of N. From the specification of
M we have that M yields 1 on input ⟨d, d⟩ iff N halts on input d; but from the
definition of N we have that N halts on input d iff M yields 0 on input ⟨d, d⟩. Thus
we obtain a contradiction, and we may conclude that no such M exists.
Appendix. Formal treatment. Lest the reader be carried away by the rather
pictorial nature of the preceding section, we indicate here how Turing machines and
their behavior may be defined more formally. Let S be a finite set of symbols,
including “B” and “|”, and let q1 , q2 , . . . be symbols not in S. Then a Turing
machine M (on S) is simply a finite set of quintuples
⟨qi, t, t′, X, qj⟩,
where t and t′ are in S and X is one of the symbols “D”, “L”, or “R”, such that no
two distinct quintuples have the same first two members. (The symbol qi represents
state i.)
We now seek to formalize the notion of “the situation of a machine at a given
time”. Note that in all our work we have been dealing with tapes that are blank
in all but a finite number of cells. Thus all we need to encode is: what that finite
stretch of tape that contains all the nonblank cells looks like; where the machine is
(what cell it is scanning); and what state the machine is in. We can capture this
by a notion of instantaneous description (id): an instantaneous description is any
string of the form P qi tQ, where t is a member of S and P and Q are (possibly
empty) strings of symbols from S.
We now define the notion that encodes the operation of a machine M according
to its instructions. Let I and J be id’s. We say that I M -produces J iff either
4. There are (possibly empty) strings P and Q such that I is P s qk t Q, J is
P qm s t′ Q, and ⟨qk, t, t′, L, qm⟩ is a quintuple in M;
or
The notion of yielding on inputs that are n-tuples can be defined in similar fashion.
Thus we see that Turing machines can be treated as nothing more than (pecu-
liar) types of formal systems. The point of this is, in part, simply to make clear that
we may gödelize Turing machines and their behavior. That is, we may assign index
numbers to Turing machines, and gödel numbers to instantaneous descriptions and
to finite sequences of instantaneous descriptions, in such a way that the following
holds:
1. There is, for each n > 0, an (n + 2)-place primitive recursive relation Tn such
that Tn(e, p1, ..., pn, q) iff e is the index number of a Turing machine M and
q is the gödel number of a finished M-computation on input ⟨p1, ..., pn⟩.
2. There is a 1-place primitive recursive function Ans such that Ans(q) = m iff
q is the gödel number of a sequence of ids, the last id of which has the form
P qj Q such that P Q contains m + 1 strokes.
From this it then follows that for each n-place partial function ϕ that is computed by a Turing machine, there is an e such that ϕ(p1, ..., pn) = Ans(µq Tn(e, p1, ..., pn, q)).
6.5 Undecidability
In this section we are concerned with applying recursive functions to the study of
formal systems. One major issue is that of decidability. In §?? we said that a formal
system Σ is decidable iff there is a computational procedure for telling, given any
Now either ϕ(γ(G)) = 1 or not. If ϕ(γ(G)) = 1 then ⊢PA Φ(⌜G⌝, S0) so that
⊢PA ∼G; hence if PA is consistent then G is not derivable in PA. If ϕ(γ(G)) ≠ 1,
then ⊢PA ∼Φ(⌜G⌝, S0), so that ⊢PA G. Hence, in either case, ϕ gives us the wrong
answer on G.
Note 1. The above proof should feel familiar. One could rephrase it thus: if
PA were decidable, then Bew would be numeralwise representable in PA, by dint of
the Extended Representability Theorem. But, by the Fixed Point Theorem, if PA is
consistent, then Bew is not numeralwise expressible in PA. In other words, Gödel’s
work immediately tells us that there is no primitive recursive decision procedure
for PA; and that work is extendible to any notion that will be representable in PA.
All that was necessary after 1931, then, was to formulate the appropriate general
notion of computability, and show that all computable functions were numeralwise
representable.
Note 2. We have shown that for every general recursive function ϕ there is a
formula G such that: either G is derivable and ϕ(γ(G)) ≠ 1 or else ∼G is derivable
and ϕ(γ(G)) = 1. This shows that there is no way of extending PA to a system that
is both consistent and decidable. PA is therefore said to be essentially undecidable.
The proof of Church’s Theorem relies only on the fact that every recursive func-
tion is numeralwise representable in PA, and on the Fixed Point Theorem (and, of
course, the Fixed Point Theorem holds provided that the function diag is numeral-
wise representable). Every consistent formal system in which all recursive functions
are numeralwise representable is thus essentially undecidable. We now show that
even systems considerably weaker than PA are essentially undecidable.
Definition. Let Q be the formal system whose language is LPA , and whose
axioms are like those of PA except that the axiom-schema of induction is eliminated
and, in its stead, the axiom ∼x = 0 ⊃ ∃y(x = Sy) is added.
System Q is often called Robinson arithmetic, after Raphael Robinson, who first
formulated the system in 1950. Q is a very weak system, because of the absence of
induction axioms. Even quite elementary truths like ∀x(0 + x = x) are not derivable
in it. Nonetheless,
Proof. The inclusion of the axiom ∼x = 0 ⊃ ∃y(x = Sy) yields the derivability in
Q of the formulas x ≤ m ⊃ x = 0 ∨ x = 1 ∨ . . . ∨ x = m and x ≤ m ∨ m ≤ x for
every m. A close analysis of the proof of the Representability Theorem given in §??
shows that these properties of ≤, together with the facts that x + y numeralwise
represents addition in Q, and x × y numeralwise represents multiplication in Q,
yield the representability of all primitive recursive functions in Q. From that, the
representability of all general recursive functions in Q follows by the same argument
as was used above for PA.
Proof. Let A be the conjunction of the universal closures of the seven non-logical
axioms. By the Deduction Theorem we have:
(←) If S is empty then S is the domain of the partial recursive function that is
nowhere defined. If S = range(g), where g is general recursive, let ψ(n) = µp[g(p) =
n]. Then ψ is partial recursive, and ψ(n) is defined iff n ∈ range(g). That is,
domain(ψ) = range(g) = S.
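In programming terms, the µ-operator used in this proof is an unbounded search. A minimal sketch (the helper names mu, g, and psi are mine, not the text's):

```python
def mu(pred):
    # Least p satisfying pred; searches forever if there is none,
    # mirroring the partiality of psi.
    p = 0
    while not pred(p):
        p += 1
    return p

g = lambda p: p * p                      # stands in for a general recursive g
psi = lambda n: mu(lambda p: g(p) == n)  # psi(n) = µp[g(p) = n]

print(psi(9))   # 3: the least p with g(p) = 9
```

As in the proof, psi(n) terminates exactly when n is in the range of g; on other inputs the search never halts, which is the computational face of a partial recursive function being undefined.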
R.e. Fact 2. If a set and its complement are both r.e., then the set is recursive.
Proof. The intuitive idea is this: suppose there are search procedures for S and for
S̄. Given n, start both search procedures on n; since either n ∈ S or n ∈ S̄, at
some point one of the search procedures will terminate. If the search procedure for
S terminates, we know n ∈ S; if that for S̄ terminates, we know n ∉ S. Thus we
have a decision procedure for membership in S.
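The interleaved search can be sketched programmatically. Here halts_within is a hypothetical predicate playing the role of Der1: it tells whether machine d halts on input n within p steps; the toy instantiation below is mine.

```python
def decide(d, e, n, halts_within):
    # d searches for members of S, e for members of the complement.
    # Run both searches in parallel, one step bound at a time;
    # exactly one of them eventually succeeds.
    p = 0
    while True:
        if halts_within(d, n, p):
            return True      # the search for S succeeded: n is in S
        if halts_within(e, n, p):
            return False     # the search for the complement succeeded
        p += 1

# Toy example: S = even numbers; "machine" d halts on even inputs after
# n steps, and e halts on odd inputs after n steps.
def halts_within(machine, n, p):
    if machine == "d":
        return n % 2 == 0 and p >= n
    return n % 2 == 1 and p >= n

print(decide("d", "e", 4, halts_within))   # True
print(decide("d", "e", 7, halts_within))   # False
```

The loop terminates on every n precisely because every n lies in S or in its complement, which is what makes the resulting g total, i.e. a genuine decision procedure.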
More rigorously, suppose S = domain(ϕd) and S̄ = domain(ϕe). Let ψ(n) =
µp[Der1(d, n, p) · Der1(e, n, p) = 0]. Since the application of µ is licensed, ψ is
general recursive. Let g(n) = α(Der1(d, n, ψ(n))). Then g is general recursive.
Moreover, if n ∈ S then Der1(d, n, ψ(n)) = 0, so that g(n) = 1. If n ∉ S then
Der1(d, n, ψ(n)) ≠ 0, so that g(n) = 0. Thus g is the characteristic function of
S.
R.e. Fact 2 can be extended to k-place relations for k > 1, if we extend our
definitions in the obvious way: a k-place relation is recursive iff its characteristic
function is general recursive, and is recursively enumerable iff it is the domain of
some k-place partial recursive function. The following result relates r.e. sets to 2-
place recursive relations. It can easily be extended to relate k-place r.e. relations
to (k + 1)-place recursive relations.
R.e. Fact 3. A set S is r.e. iff there exists a 2-place recursive relation R such
that, for each n, n ∈ S iff (∃p)R(n, p).
Proof. (→) Suppose S = domain(ϕe ). Then n ∈ S iff (∃p)(Der1 (e, n, p) = 0), and,
for any e, Der1 (e, n, p) = 0 is a recursive relation of n and p. (Indeed, it is primitive
recursive.)
(←) Let R be a 2-place recursive relation, and let S be the set of integers n
such that (∃p)R(n, p). Let χ be the characteristic function of R; thus χ is general
recursive, so that the function ψ(n) = µp[χ(n, p) = 1] is partial recursive. Clearly
n ∈ S iff n ∈ domain(ψ); hence S is r.e.
Thus, a set is r.e. iff it can be obtained from a recursive relation by existential
quantification.
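Both directions of R.e. Fact 3 have a computational reading, which the following sketch (toy relation and names mine) illustrates: a decidable R yields a semi-decision procedure for S by searching for a witness p, and an enumeration of S by dovetailing over all pairs (n, p).

```python
def semidecide(R, n):
    # Halts (returning True) iff n is in S = { n : (∃p) R(n, p) };
    # otherwise searches forever.
    p = 0
    while True:
        if R(n, p):
            return True
        p += 1

def enumerate_S(R, bound):
    # Dovetail over all pairs (n, p) with n + p < bound, listing each
    # member of S the first time a witness for it appears.
    seen = []
    for total in range(bound):
        for n in range(total + 1):
            p = total - n
            if R(n, p) and n not in seen:
                seen.append(n)
    return seen

# Toy example: R(n, p) holds iff p witnesses that n is a perfect square.
R = lambda n, p: p * p == n
print(enumerate_S(R, 13))   # [0, 1, 4, 9]
```

Letting bound grow without limit turns enumerate_S into a genuine enumeration of S, which is the intuition behind calling such sets recursively enumerable.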
6.6. RECURSIVE AND RECURSIVELY ENUMERABLE SETS 109
(a) If two sets are r.e., then so are their union and their intersection.
(b) If ψ is any partial recursive function and k is any integer, then {n | ψ(n) = k}
is r.e.
(d) A set is r.e. iff it is the range of some partial recursive function.
Proof. n is in this set iff (∃m)DerΣ (m, n). The result thus follows by R.e. Fact 3.
Proof. For each n, let h(n) be the gödel number of F (n); h is primitive recursive,
and hence general recursive. By Mirroring, `Σ F (n) iff (∃m)DerΣ (m, h(n)). The
result then follows by R.e. Fact 3.
Claim 2 shows that every set weakly representable in Σ is r.e. Claim 3 shows
that each of the formal systems PA, SA, and Q numeralwise represents the same sets
(assuming they are all consistent), to wit, the recursive sets. Moreover, assuming
these systems are ω-consistent, they all weakly represent the same sets, to wit, the
recursively enumerable sets.
functions.
One can put this into HGK-system language too: there is one HGK-system
that “incorporates” all HGK-systems.
Enumeration Theorem. For each n > 0 there is a universal partial recursive
function Un of n + 1 arguments; that is, for all integers e, p1 , . . . , pn ,
Un(e, p1, . . . , pn) = ϕ(n)e(p1, . . . , pn).
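The Enumeration Theorem has a familiar programming analogue: an interpreter is a universal program. The sketch below (mine, not the text's gödelization-based construction) takes a "program" to be Python source text defining a one-place function f, so that U1 applies program e to input p.

```python
def U1(e, p):
    # "Decode" the program from its text, then apply it to the input.
    env = {}
    exec(e, env)          # run the source, which defines f in env
    return env["f"](p)

succ = "def f(x): return x + 1"
print(U1(succ, 41))   # 42
```

Just as in the theorem, U1 is itself a single (partial) function, yet by varying its first argument it computes every function expressible in the programming language.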
Proof. Define ψ thus: for each m, ψ(m) = ϕm (m) + 1 (as usual, with the convention
that if the right side is undefined then so is the left). Then ψ is partial recursive,
since ψ(m) = U1 (m, m) + 1, and thus ψ comes from the universal function U1 by
composition with the successor function.
Now let ϕe be any 1-place general recursive function. By definition of ψ, ψ(e) =
ϕe (e) + 1. Since ϕe is total, ϕe (e) is defined; thus ψ(e) is defined and ψ(e) 6= ϕe (e).
Hence no general recursive function agrees with ψ at all places at which ψ is defined.
(The sharp-eyed reader will have noted the similarity between this proof and that
of the Unsolvability of the Totality Problem, page ?? above.)
Proof. Let ψ(p) = U1 (p, p). Then ψ is partial recursive, and its domain is precisely
K. Hence K is recursively enumerable. To show that K is not recursive it suffices
to show that K̄ is not recursively enumerable. Suppose K̄ were the domain of a
partial recursive function ϕm. Then m ∈ K̄ iff ϕm(m) is defined. Thus m ∈ K̄ iff
m ∈ K, by the specification of K. This is a contradiction.
Proof. Consider the following syntactic operation on the HGK-system that defines
ψ: given an integer e, first reletter the function letter f as fk , for k large enough
to avoid conflicts; then add the equation f (x) = fk (e, x). Clearly, the resulting
HGK-system defines the partial function that takes each n to ψ(e, n). Moreover, by
gödelization, the function that takes e to the index number of the resulting HGK-
system is general recursive (indeed, primitive recursive). That function is the desired
h.
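In programming terms, the Uniformization Theorem says that an argument can be "baked into" a program's text, uniformly and effectively. A sketch (the function psi and all names are my own illustration):

```python
from functools import partial

def psi(e, n):
    # Stands in for an arbitrary 2-place partial recursive function
    # (total here, for simplicity).
    return e * n + 1

def h(e):
    # In the theorem h(e) is an *index number* of a new HGK-system;
    # here, analogously, it is a 1-place program computing psi(e, ·),
    # obtained by fixing psi's first argument.
    return partial(psi, e)

print(h(3)(4))   # 13, i.e. psi(3, 4)
```

The essential point carried over from the proof is that h operates on program texts, not on the functions they compute, which is why h itself is (primitive) recursive.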
Note. There are forms of the Uniformization Theorem for more arguments.
E.g., let ψ be a 3-place partial recursive function. Then there exists a 2-place general
recursive g such that ϕg(d,e) (n) = ψ(d, e, n) for all d, e, and n. In the literature, the
general form of the Uniformization Theorem is called the “s-m-n Theorem”.
The Uniformization Theorem is extremely useful in establishing the nonrecur-
siveness of various sets of index numbers. We use it to establish reducibilities.
Definition. Let A and B be sets of integers. A is many-one reducible to B (in
symbols A ≤m B) iff there exists a general recursive function h such that, for each
n, n ∈ A iff h(n) ∈ B.
If A ≤m B, then the question of membership in A is reducible to the question
of membership in B: if one knew how to decide membership in B, one would then
know how to decide membership in A.
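This transfer of decidability is literally composition, as the following sketch (toy sets and reducing function mine) shows: a decision procedure for B, precomposed with h, decides A.

```python
def decider_for_A(h, in_B):
    # The characteristic function of A is that of B composed with h.
    return lambda n: in_B(h(n))

# Toy example: A = even numbers, B = multiples of 10, h(n) = 5n.
# Then n ∈ A iff h(n) ∈ B, so A ≤m B via h.
in_A = decider_for_A(lambda n: 5 * n, lambda m: m % 10 == 0)
print(in_A(4), in_A(7))   # True False
```

The Lemma below records exactly this observation, together with its analogue for recursive enumerability.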
Lemma. If A ≤m B and B is recursive, then A is recursive. If A ≤m B and
B is recursively enumerable, then A is recursively enumerable.
6.7. RECURSIVE FUNCTION THEORY 113
Proof. Let h be a general recursive function such that, for all n, n ∈ A iff h(n) ∈
B. If B is recursive, then the characteristic function χ of B is recursive; and the
characteristic function of A is the composition of the characteristic function of B
and h, and so is itself recursive. Hence A is a recursive set. If B is recursively
enumerable then it is the domain of some ϕe . Now the partial function that takes
each n to ϕe (h(n)) is partial recursive; and A is its domain. Hence A is recursively
enumerable.
{e | 0 is in the domain of ϕe };
{e | the domain of ϕe is not empty};
{e | the domain of ϕe is infinite};
{e | ϕe is a total constant function}.
Proof. For any e and n, let ψ(e, n) = ϕe (e). By the Enumeration Theorem, ψ is
partial recursive. By the Uniformization Theorem, there exists a general recursive
function h such that, for all e and n, ϕh(e)(n) = ψ(e, n). Thus we have: if e ∈ K,
then ϕh(e) is the total constant function with value ϕe(e); if e ∉ K, then ϕh(e) is
nowhere defined.
Thus, if S is any of the sets listed in the statement of the result, e ∈ K iff h(e) ∈ S.
The result follows by the Reduction Lemma.
Hence e ∉ K iff ϕg(e) has infinite range, and we are done.
Proof. The recursive function h obtained in the proof of Result 3 has the property
that e ∈ K iff h(e) ∈ Tot. Thus e ∈ K̄ iff h(e) is in the complement of Tot, so that
K̄ is many-one reducible to the complement of Tot. If the complement of Tot were
r.e., then, by the Reduction Lemma, K̄ would be r.e. But in the proof of Result 2
above, we showed that K̄ is not r.e. Hence the complement of Tot is not r.e.
To show Tot is not r.e., we proceed in a manner similar to the proof of the
Unsolvability of the Totality Problem. Suppose Tot is r.e. Thus there exists a
recursive function g such that Tot = range(g). Let ψ(n) = ϕg(n) (n) + 1. ψ is partial
recursive, by the Enumeration Theorem. ψ is total, since for each n, ϕg(n) is total.
Let d be an index number for ψ, i.e., let ψ = ϕd . Since ψ is total, d ∈ range(g). Let
e be such that d = g(e). Then ϕd (e) = ϕg(e) (e), but also ϕd (e) = ψ(e) = ϕg(e) (e)+1.
This is a contradiction.
The above proof shows how to obtain, given any r.e. subset of Tot, a recur-
sive function no index for which lies in the subset. This can be used to prove the
following striking result: for any ω-consistent system there are total recursive func-
tions that cannot be proved to be total in the system. For let Σ be an ω-consistent
formal system, and let S be the set of integers e such that the formalization of
(∀p)(∃q)(Der1 (e, p, q) = 0) is derivable in Σ. By Claim 2 of §??, S is r.e. By the
ω-consistency of Σ, if that formalization is derivable in Σ then ϕe is total. Hence S
is an r.e. subset of Tot, and there exists a recursive function ψ no index for which
is in S. That is, for all d, if ψ = ϕd then d ∉ S. Thus ψ cannot be proved in Σ to
be total.