
Notes on Metamathematics

Warren Goldfarb

W.B. Pearson Professor of Modern Mathematics and Mathematical Logic

Department of Philosophy
Harvard University

DRAFT: January 1, 2018


In Memory of Burton Dreben (1927–1999), whose spirited
teaching on Gödelian topics provided the original inspiration
for these Notes.
Contents

1 Axiomatics 1
1.1 Formal languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Axioms and rules of inference . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Natural numbers: the successor function . . . . . . . . . . . . . . . . 9
1.4 General notions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.5 Peano Arithmetic. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.6 Basic laws of arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . 18

2 Gödel’s Proof 23
2.1 Gödel numbering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.2 Primitive recursive functions and relations . . . . . . . . . . . . . . . 25
2.3 Arithmetization of syntax . . . . . . . . . . . . . . . . . . . . . . . . 30
2.4 Numeralwise representability . . . . . . . . . . . . . . . . . . . . . . 35
2.5 Proof of incompleteness . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.6 ‘I am not derivable’ . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

3 Formalized Metamathematics 43
3.1 The Fixed Point Lemma . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.2 Gödel’s Second Incompleteness Theorem . . . . . . . . . . . . . . . . 47
3.3 The First Incompleteness Theorem Sharpened . . . . . . . . . . . . . 52
3.4 Löb’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

4 Formalizing Primitive Recursion 59


4.1 ∆₀, Σ₁, and Π₁ formulas . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.2 Σ₁-completeness and Σ₁-soundness . . . . . . . . . . . . . . . . . . . 61
4.3 Proof of Representability . . . . . . . . . . . . . . . . . . . . . . . . 63

5 Formalized Semantics 69
5.1 Tarski’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.2 Defining truth for LPA . . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.3 Uses of the truth-definition . . . . . . . . . . . . . . . . . . . . . . . 74
5.4 Second-order Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . 76
5.5 Partial truth predicates . . . . . . . . . . . . . . . . . . . . . . . . . 79
5.6 Truth for other languages . . . . . . . . . . . . . . . . . . . . . . . . 81

6 Computability 85
6.1 Computability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
6.2 Recursive and partial recursive functions . . . . . . . . . . . . . . . . 87
6.3 The Normal Form Theorem and the Halting Problem . . . . . . . . 91
6.4 Turing Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
6.5 Undecidability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
6.6 Recursive and recursively enumerable sets . . . . . . . . . . . . . . . 107
6.7 Recursive Function Theory . . . . . . . . . . . . . . . . . . . . . . . 110
Chapter 1

Axiomatics

1.1 Formal languages


The subject of our study is formal systems, that is, precisely and rigorously specified
axiom systems. A formal system has two parts: a formal language, which provides a
precisely demarcated class of expressions called formulas; and derivation rules (ax-
ioms and rules of inference), which determine a privileged class of formulas that are
said to be derivable in the formal system. We start by looking at formal languages.
A formal language is specified by giving an alphabet and formation rules. The
alphabet is the stock of primitive signs; it may be finite or infinite. The formation
rules serve to specify those strings of primitive signs that are the formulas of the
formal language. (A string is a finite sequence of signs, written as a concatenation
of the signs without separation.) In some books, formulas are called “well-formed
formulas”, or “wffs”, but this is redundant: to call a string a formula is to say it is
well-formed. A formal language must be effectively decidable; that is, there must
be a purely mechanical procedure, an algorithm, for determining whether or not
any given sign is in the alphabet, and whether or not any given string is a formula.
(In Chapter 5 we give a rigorous account of what is meant by “purely mechanical
procedure”. For now, we rely on a loose intuitive understanding of the notion.)
Here is a simpleminded example. The alphabet consists of the signs ‘∆’ and
‘⋆’, and the formation rules are:
(1) ‘∆⋆’ is a formula;
(2) if F is a formula then so is F followed by ‘⋆⋆’;
(3) no string is a formula unless its being so results from clauses (1) and (2).
In this formal language, for example, ‘∆⋆⋆⋆’ is a formula, but ‘∆⋆⋆’ is not. Although
this simpleminded formal language lacks the complexity that makes formal languages
of interest, it does illustrate some general points.
First, the formation rules are inductive: to specify the class of formulas we
stipulate first that certain particular strings are formulas, and then that certain
operations, when applied to formulas, yield new formulas. Finally, we specify that
only strings obtained in this way are formulas. Thus, the class of formulas is the
smallest class containing the string mentioned in clause (1) and closed under the
operation given in clause (2). Henceforth, in our inductive definitions we shall tacitly
assume the final “nothing else” clause.
Moreover, these inductive rules have a special feature. The operations for con-
structing new formulas out of old ones increase the length of the string. This feature
makes it easy to check whether a string is a formula. Hence it yields the effective
decidability of the class of formulas.
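
To make the talk of a purely mechanical procedure vivid, here is a small illustrative sketch in Python (ours, not part of the original notes) of the decision procedure for this simpleminded language; the function name and the use of the characters ‘∆’ and ‘⋆’ to stand for the two primitive signs are our own choices.

```python
def is_formula(s: str) -> bool:
    # By clauses (1) and (2), the formulas are exactly the strings consisting
    # of a single '∆' followed by an odd number of '⋆'s.
    if not s.startswith('∆'):
        return False
    rest = s[1:]
    return len(rest) % 2 == 1 and all(c == '⋆' for c in rest)

# '∆⋆⋆⋆' is a formula; '∆⋆⋆' is not.
assert is_formula('∆⋆⋆⋆')
assert not is_formula('∆⋆⋆')
```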
In specifying the alphabet above, we wrote down not the signs themselves but
rather the names of those signs, just as in specifying a bunch of people we would write
down names of those people. Much of what we shall be doing in this book is talking
about formal languages. The language in which we talk about formal languages
is the metalanguage, which is English amplified by some technical apparatus. The
language being talked about is often called the object language.
In clause (2) of the formation rules we used the syntactic variable ‘F ’. Upper-
case eff is not part of the formal language; it is part of the metalanguage. In the
metalanguage it functions as a variable, ranging over strings of signs.
A short discussion of use and mention is now in order. We use words to talk
about things. In the sentence

(a) Frege devised the notion of formal system.

the first word is used to refer to a German logician. The sentence mentions this
logician, and uses a name to do so. Similarly, in the sentence

(b) The author of Begriffsschrift devised the notion of formal system.

the same logician is mentioned, and the expression consisting of the first four words
is used to mention him. We shall speak of complex expressions like those four words
also as names, so in (b) we have a complex name of Frege. In general, to speak of
an object we use an expression that is a name of that object or, in other words, an
expression that refers to that object. Clearly the object mentioned is not part of
the sentence; its name is.

Confusion may arise when we speak about linguistic entities. If I wish to men-
tion (talk about) an expression, I do not use that expression—for if I did I would
be mentioning the object that the expression refers to, if any. Instead, I should use
a name of that expression. Thus I might say
The first word of sentence (a) refers to a German logician.
Here, the first six words comprise a name of a linguistic expression. One standard
manner of forming names of expressions is to surround the expression with single
quotation marks. Thus we may say
‘Frege’ refers to a German logician.
Similarly,
‘The author of Begriffsschrift’ refers to a German logician.
Thus, to obtain a true sentence from ‘___ is a primitive sign of the simpleminded
formal language’ we must fill the blank with a name of a primitive sign, not with
the primitive sign, for example:
‘⋆’ is a primitive sign of the simple-minded formal language.
We cannot say: ⋆ is a primitive sign. That is nonsense, since a star is not a sign of
English.
Now let us consider the use of syntactic variables. Let F be a formula of the
simple-minded formal language. Here ‘F ’ is used as a syntactic variable. Our forma-
tion rules tell us that F followed by ‘⋆⋆’ is also a formula. We also say that F ⌢ ‘⋆⋆’
is a formula, where ‘⌢’ is a metalinguistic sign for the concatenation operator. We
may even say that F ⌢ ‘⋆’ ⌢ ‘⋆’ is a formula (and read this as follows: F concatenated
with star concatenated with star), for, after all, ‘⋆’ ⌢ ‘⋆’ is identical with ‘⋆⋆’. We
may not speak this way: F ⋆⋆ is a formula; nor this way: ‘F ⋆⋆’ is a formula. The
former doesn’t work, since ‘F ⋆⋆’ is not a referring expression, and hence cannot
be used to mention anything. The latter doesn’t work, since ‘F ⋆⋆’ is the string
consisting of an upper-case eff and two stars, and this string is not even a string of
signs of the formal language, much less a formula.
Similarly, we can say: let G be a string consisting of an odd number of stars;
then ‘∆’ ⌢ G is a formula. Also, let F be a formula and let G be a string consisting
of an even number of stars; then F ⌢ G is a formula.
The distinctions between name and thing named, and between syntactic vari-
able and formal sign, should be thoroughly understood, particularly because—for
the sake of brevity—we shall soon abandon the pedantic mode of speech in which
the distinctions are strictly observed.
Now let us given an example of a formal language more typical of those consid-
ered in logical studies, which we shall call L= , the language of identity. The alphabet
consists of the six signs ‘=’, ‘∼’, ‘⊃’, ‘∀’, ‘(’ and ‘)’ along with the formal variables
‘x’, ‘y’, ‘z’, ‘x′’, ‘y′’, ‘z′’, ‘x″’, . . . . The first four signs are called the identity sign,
the negation sign, the conditional sign, and the universal quantifier. The formation
rules are as follows:
(1) if u and v are formal variables, then u ⌢ ‘=’ ⌢ v is a formula;
(2) if F and G are formulas, then ‘∼’ ⌢ F and ‘(’ ⌢ F ⌢ ‘⊃’ ⌢ G ⌢ ‘)’ are
formulas;
(3) if F is a formula and u is a formal variable, then ‘∀’ ⌢ u ⌢ ‘(’ ⌢ F ⌢ ‘)’ is
a formula.

Thus the following are examples of formulas of L= :

∼∀x′(x′ = y′)

∀x(∼x = 0 ⊃ ∼∀y(∼(x = y)))


The formation rules just set out are hard to read, due to the care with which
we have abided by the use-mention distinction, and in particular have differentiated
syntactic variables from signs of the formal language. For brevity, we now introduce
some metalinguistic conventions that enable us to gloss over these niceties. More
precisely, we indulge in a bit of ambiguous notation. We let each sign of the formal
language serve in the metalanguage as a name of itself, thereby eliminating the need
for quote-names of those signs (although we shall ordinarily continue to use quote-
names when mentioning a sign individually). Moreover we use concatenation in the
metalanguage to represent concatenation of strings of signs. With these conventions
we may rewrite the rules as follows:

(1) if u and v are formal variables, then u = v is a formula;


(2) if F and G are formulas, then ∼F and (F ⊃ G) are formulas;
(3) if F is a formula and u is a formal variable then ∀u(F ) is a formula.

Second, now that we have set up the precise formation rules, we shall often omit
parentheses when we talk of formulas. For example, we will usually drop the
outermost parentheses, and use x = y ⊃ y = x for the formula (x = y ⊃ y = x). We


will also drop them from consecutive quantifiers, using, say ∀x∀y(x = y) to mean
the formula ∀x(∀y(x = y)); similarly we will use ∀x(x = x ⊃ x = x) for the formula
∀x((x = x ⊃ x = x)). Finally, we will drop them from around a syntactic variable in
a quantification, writing ∀xF instead of ∀x(F ). Of course, we omit parentheses only
when it is clear and unambiguous how to reinstate them so as to yield a formula.
We end this section, first, by recalling an important syntactic definition from
elementary logic. If u is a formal variable, then the quantifier ∀u is said to bind
u. An occurrence of u in a formula is bound if it is within the scope of a quantifier
binding u; otherwise the occurrence is free. A formula with free variables is said
to be open and without them closed. A closed formula is also called a sentence.
Second we define the instances of a universal quantification ∀uF as those formulas
obtained from F by substituting a variable v for the free occurrences of the variable
u, provided those newly introduced occurrences of v all remain free. Another way of
putting the definition is this: call a variable v free for u in F iff no free occurrence
of u in F lies in the scope of a quantifier binding v. Then an instance of ∀uF is any
formula obtained from F by replacing free occurrences of u with occurrences of a
variable that is free for u in F . (Note that u is free for u in F , so F itself counts as
an instance of ∀uF.)
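
To illustrate how such syntactic notions are effectively computable, here is a small sketch (ours, with an assumed tuple representation of L= formulas) that computes the set of variables occurring free in a formula:

```python
def free_vars(f):
    # Formulas of L= are represented as nested tuples:
    #   ('=', u, v)      for  u = v
    #   ('~', F)         for  the negation of F
    #   ('->', F, G)     for  the conditional (F ⊃ G)
    #   ('all', u, F)    for  the universal quantification of F with respect to u
    # where u, v are variable names such as 'x', 'y', "x'".
    op = f[0]
    if op == '=':
        return {f[1], f[2]}
    if op == '~':
        return free_vars(f[1])
    if op == '->':
        return free_vars(f[1]) | free_vars(f[2])
    if op == 'all':
        # occurrences of the quantified variable are bound inside its scope
        return free_vars(f[2]) - {f[1]}
    raise ValueError('not a formula')

# The first example formula above, ∼∀x′(x′ = y′), has only y′ free:
example = ('~', ('all', "x'", ('=', "x'", "y'")))
assert free_vars(example) == {"y'"}
```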

1.2 Axioms and rules of inference


A formal system consists of a formal language together with derivation rules, which
split into two parts. The first part stipulates that certain formulas are axioms; the
second part provides rules of inference for obtaining formulas from other formulas.
Given axioms and inference rules, we define the notions of derivation and derivability
as follows: A derivation is a finite sequence F1 , . . . , Fn of formulas such that each
formula in the sequence either is an axiom or else results by a rule of inference from
formulas that precede it in the sequence. A derivation F1 , . . . , Fn is said to be a
derivation of its last formula Fn . A formula is derivable iff there is a derivation of it.
If Σ is a formal system and F is a formula of the formal language, we write ‘`Σ F ’
for ‘F is derivable in Σ’, omitting the subscript ‘Σ’ when the context fixes which
formal system is meant.
The derivation rules of a formal system must be effective: there must be a purely
mechanical procedure for determining, given any formula F , whether or not F is an
axiom; and a purely mechanical procedure for determining, given a formula F and
some other formulas, whether or not F results from the other formulas by a rule
of inference. It follows that there is a purely mechanical procedure for determining


whether or not any given sequence of formulas is a derivation. The underlying idea
is that whether or not a sequence of formulas is a derivation is to be determined by
looking at the syntactic structure of the formulas.
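
The requirement can be pictured concretely: given decision procedures for axiomhood and for the rules of inference, checking whether a sequence of formulas is a derivation is a simple finite loop. The following sketch is ours; the two helper predicates is_axiom and follows_by_rule stand in for whatever procedures a particular formal system supplies.

```python
def is_derivation(seq, is_axiom, follows_by_rule):
    # seq is a finite sequence (list) of formulas.
    # is_axiom(F) decides whether the formula F is an axiom;
    # follows_by_rule(F, earlier) decides whether F results by a rule of
    # inference from formulas in the list `earlier`.  Both are assumed to be
    # purely mechanical, always-terminating procedures.
    if not seq:
        return False
    for i, f in enumerate(seq):
        if not (is_axiom(f) or follows_by_rule(f, seq[:i])):
            return False
    return True
```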
We now give the derivation rules for a formal system, Σ= , whose formal language
is L= . To help in discussing the rules, we split the axioms into three groups and
give them labels.

Truth-functional axioms. If F , G, and H are formulas then the following are axioms:

(T1) F ⊃ (G ⊃ F )

(T2) (F ⊃ (G ⊃ H)) ⊃ ((F ⊃ G) ⊃ (F ⊃ H))

(T3) (F ⊃ G) ⊃ (∼G ⊃ ∼F )

(T4) F ⊃ ∼ ∼ F

(T5) ∼ ∼ F ⊃ F

Quantificational axioms. If F is a formula, F′ an instance of F, u a formal variable,


and G a formula in which u is not free, then the following are axioms:

(Q1) ∀u(F ) ⊃ F′

(Q2) ∀u(G ⊃ F ) ⊃ (G ⊃ ∀uF )

Axioms of identity.

(I1) x = x

(I2) x = y ⊃ (F ⊃ G), where F and G are any formulas that differ only in that
G has free y at some or all places where F has free x.

These axioms express the traditional logical understandings of the signs of our
formal language: ‘∼’ and ‘⊃’ are to be read as negation and material conditional
(not and if-then), ‘∀’ as the universal quantifier (for all) and ‘=’ as identity.

Rules of inference. Let F and G be formulas and u be a variable,



Modus ponens: G may be inferred from F and F ⊃ G


Universal generalization: ∀uF may be inferred from F .

Note that in specifying the truth-functional axioms we gave axiom schemata, each
of which gives rise to an infinite number of axioms by replacement of the syntactic
variables with formulas. Similarly for the quantificational axioms and the second
axiom of identity. In contrast, axiom (I1) is a particular formula. The axioms
generated by (Q1) are axioms of universal instantiation; by dint of them and modus
ponens, if a formula is derivable then so are its instances.
Let us give several examples of derivations in this formal system. The first is
fairly straightforward.

x=x
∀x(x = x)
∀x(x = x) ⊃ y = y
y=y

Here, the first formula is axiom (I1), the second results from it by application
of the rule of universal generalization, the third is an axiom (Q1) of universal in-
stantiation, and the fourth results from the second and third by modus ponens.
This illustrates a general feature of the system. If a formula F containing a free
variable u is derivable, then so will be the formula obtained from F by relettering
the variable, as long as it remains free.
Our next derivation is a little more complex.

x = y ⊃ (x = x ⊃ y = x)
(x = y ⊃ (x = x ⊃ y = x)) ⊃ ((x = y ⊃ x = x) ⊃ (x = y ⊃ y = x))
(x = y ⊃ x = x) ⊃ (x = y ⊃ y = x)
x = x ⊃ (x = y ⊃ x = x)
x=x
x=y⊃x=x
x=y⊃y=x

The first, second, fourth, and fifth formulas are axioms ((I2), (T2), (T1), (I1),
respectively). The third results from the first two by modus ponens; the sixth
results from the fourth and fifth by modus ponens; and the last results from the
third and sixth by modus ponens. Thus the symmetry of ‘=’ is derivable, using just
the truth-functional and identity axioms.

Finally, we want to show that the transitivity of ‘=’ is derivable. Here we shall
use the observation above about relettering of free variables, and not go through
the steps of using universal generalization and instantiation. First we note that
` x = y ⊃ (x′ = x ⊃ x′ = y), since it is an axiom (I2). (Recall that ‘`’ means
“is derivable”.) Relettering y as z, we obtain ` x = z ⊃ (x′ = x ⊃ x′ = z), then,
relettering x as y, ` y = z ⊃ (x′ = y ⊃ x′ = z), and finally relettering x′ as x,
` y = z ⊃ (x = y ⊃ x = z). This last formula is a way of expressing transitivity.
We have just shown by a metamathematical argument that y = z ⊃ (x = y ⊃
x = z) is derivable, that is, that there is a sequence of formulas obeying certain
syntactic restrictions and ending with y = z ⊃ (x = y ⊃ x = z). In this
argument we did not actually exhibit the derivation. Of course, we can always show
a formula derivable by giving a derivation of it, by writing down the sequence
of formulas. But since formal derivations quickly become very long and tedious, we
eschew these direct verifications of derivability. Instead, we show general principles
about derivability and use them to show that a derivation exists. It is essential to
bear in mind that the metamathematical arguments are not the derivations: they
establish the existence of derivations without actually exhibiting them.
An especially useful general principle for establishing derivabilities is this: ax-
ioms (T1)–(T5) together with modus ponens yield the derivability of all truth-
functionally valid formulas. (A formula is truth-functionally valid if it is built up
from some parts by use of ‘∼’ and ‘⊃’, and every assignment of truth-values to those
parts makes the whole formula come out true.) In a phrase, the system is truth-
functionally complete. (T1)–(T5) were axioms of the first fully laid-out axiomatic
system for truth-functional logic, namely that of Frege in Begriffsschrift (1879). It
is impressive that he formulated a system that turned out to be truth-functionally
complete, even though the concept of truth-functional validity was not articulated
until nearly forty years later. (Frege’s system had an additional axiom, which turned
out to be redundant.) The first published proof of the truth-functional complete-
ness of an axiomatic system was due to the American logician Emil Post (1921).
We won’t pause now to prove the property for our system; a proof is outlined in
Appendix §1.
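
Truth-functional validity itself is decidable by the familiar truth-table method. The sketch below (ours, with an assumed tuple representation of formulas built up by ‘∼’ and ‘⊃’ from atomic parts) simply tries every assignment of truth-values to the parts:

```python
from itertools import product

def atoms(f):
    # Formulas: ('atom', name), ('~', F), or ('->', F, G).
    if f[0] == 'atom':
        return {f[1]}
    if f[0] == '~':
        return atoms(f[1])
    return atoms(f[1]) | atoms(f[2])

def value(f, assignment):
    # Truth-value of f under an assignment of True/False to its atomic parts.
    if f[0] == 'atom':
        return assignment[f[1]]
    if f[0] == '~':
        return not value(f[1], assignment)
    return (not value(f[1], assignment)) or value(f[2], assignment)

def tf_valid(f):
    # Valid iff every assignment to the atomic parts makes f come out true.
    names = sorted(atoms(f))
    return all(value(f, dict(zip(names, vals)))
               for vals in product([True, False], repeat=len(names)))

# The schema (F ⊃ (G ⊃ H)) ⊃ (G ⊃ (F ⊃ H)) used just below is valid:
F, G, H = ('atom', 'F'), ('atom', 'G'), ('atom', 'H')
assert tf_valid(('->', ('->', F, ('->', G, H)), ('->', G, ('->', F, H))))
```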
Here is a typical application of truth-functional completeness. A more natural
expression of transitivity than the formula used above is x = y ⊃ (y = z ⊃ x = z).
To show it derivable, note that the formula

(y = z ⊃ (x = y ⊃ x = z)) ⊃ (x = y ⊃ (y = z ⊃ x = z))
is truth-functionally valid (it has the form (F ⊃ (G ⊃ H)) ⊃ (G ⊃ (F ⊃ H))). Hence
it is derivable. Since ` y = z ⊃ (x = y ⊃ x = z), we obtain ` x = y ⊃ (y = z ⊃


x = z) by modus ponens. In fact, we can make the argument more concise yet. The
fact that the displayed formula above is truth-functionally valid is just the fact that
(y = z ⊃ (x = y ⊃ x = z)) truth-functionally implies (x = y ⊃ (y = z ⊃ x = z)).
Truth-functional completeness (and modus ponens) tell us that if one derivable
formula truth-functionally implies another, then the other is also derivable.
An even more natural formulation of transitivity would be (x = y ∧ y = z) ⊃ x =
z, where ‘∧’ is a sign for conjunction (and). However, L= has no sign for conjunction.
Yet, of course, conjunction and all the other truth-functions are definable in terms
of negation and conditional. Hence we proceed as follows: we introduce several
metalinguistic signs as abbreviations, namely the signs ∧, ∨, and ≡. If F and G are
formulas we write
F ∧ G for ∼(F ⊃ ∼G)
F ∨ G for ∼F ⊃ G
F ≡ G for (F ⊃ G) ∧ (G ⊃ F )

Note that these are not signs of the formal language, nor are they “defined signs
of the formal language” (whatever that would mean), nor are we proposing a new
formal language that incorporates them (although that, of course, could be done).
They are signs of the metalanguage, used to provide short names of long formulas.
For example, we can write x = y ∧ y = z ⊃ x = z, and mean thereby the formula
∼(x = y ⊃ ∼y = z) ⊃ x = z. Since x = y ⊃ (y = z ⊃ x = z) truth-functionally
implies x = y ∧ y = z ⊃ x = z, it follows that the latter is also derivable.
We also introduce ‘∃’ as a metalinguistic abbreviation, writing ∃uF for ∼∀u(∼F ).
Now if ∀u(∼F ) ⊃ ∼F′ is an axiom (Q1), we may note that since it truth-functionally
implies F′ ⊃ ∼∀u(∼F ), we obtain the derivability of F′ ⊃ ∃uF , which is a form of
existential generalization.

1.3 Natural numbers: the successor function


The focus of our metamathematical attention in this book is formal systems in-
tended to capture laws of arithmetic, that is, properties of the natural numbers
(nonnegative integers) 0, 1, 2, 3, . . . . The successor function is the function that
takes each natural number to the next one; the natural numbers are generated from
0 by repeated application of this function. Let us start by giving a system that for-
malizes properties of this function. The formal language LS amplifies the alphabet
of language L= by adding two primitive signs, ‘0’ and ‘S’. In order to specify the
formulas, we first specify a class of strings called terms.

1. ‘0’ is a term; any formal variable is a term;

2. if t is a term then so is St.

Thus a term is ‘0’ or a formal variable by itself, or a string of successive occur-


rences of ‘S’ followed by either ‘0’ or a formal variable. Now, as for formulas:

1. if s and t are terms, then s = t is a formula;

2. if F and G are formulas, then ∼F and (F ⊃ G) are formulas;

3. if F is a formula and v is a formal variable then ∀v(F ) is a formula.

Clearly (2) and (3) are the same rules for constructing complex formulas from
simple ones as in L= . The only difference in the formation rules lies in the
specification of the atomic formulas, those licensed by clause (1), which do not
contain ‘∼’, ‘⊃’, or ‘∀’: the atomic formulas now are equations between terms of LS ,
rather than just equations between formal variables.
We now specify axioms for a system ΣS . The logical truth-functional, quan-
tificational, and identity axioms are all framed as in §1.2. Of course, the formulas
referred to are formulas of LS , so in saying, for example, that for all formulas F and
G, F ⊃ (G ⊃ F ) is an axiom, we mean for all formulas F and G of LS . Moreover,
the notion of instance is expanded: instances of ∀uF are formulas that are obtained
from F by replacing u with a term, provided any variable in the term is free for u.
The axioms of successor are as follows:

(S1) ∼Sx = 0

(S2) Sx = Sy ⊃ x = y

(S3) ∼x = 0 ⊃ ∃y(x = Sy)

(S4) ∼S . . . Sx = x, where ‘S . . . S’ represents any nonempty string of successive


occurrences of ‘S’.

Axioms (S1) - (S3) are each individual formulas of LS , while (S4) is an infinite
list of formulas. As before, the rules of inference are modus ponens and universal
generalization.

Because the formal system contains universal instantiation as axioms and uni-
versal generalization as a rule of inference, it doesn’t matter whether we formulate
axioms like (S1)-(S4) as open formulas, as above, or as universally quantified closed
formulas, for example, ∀x∀y(Sx = Sy ⊃ x = y). Each of these forms is derivable
from the other using (Q1), modus ponens, and universal generalization. For some
purposes, not at issue in this volume, it is important that all nonlogical axioms—
those aside from the truth-functional axioms, quantificational axioms, and axioms
of identity—be closed; hence some authors use only the universally quantified forms.
(See the Exercises for §2.5 for an example of where this is needed.) The interderiv-
ability of the closed and open forms of the axioms motivate an extension of our
terminology: we shall call any formula obtainable from an axiom by generalization
and instantiation an instance of that axiom.
As an example, let us show that x = S0 ⊃ ∼x = SSy is derivable in ΣS . We
have ` ∼0 = Sy, from an instance of (S1) and the symmetry of identity. We also
have ` S0 = SSy ⊃ 0 = Sy, an instance of (S2). By truth-functional implication,
` ∼S0 = SSy. Now, since by axiom (I2) ` x = S0 ⊃ (x = SSy ⊃ S0 = SSy), by
another truth-functional implication we can infer ` x = S0 ⊃ ∼x = SSy.
In fact ΣS can derive every formula that is true about the integers and succes-
sor. To make this claim precise, we must talk about interpretations of the language
LS . Intuitively, an interpretation of a formal language is specified by providing
meanings to the signs. The interpretation of the logical signs is fixed: ‘=’ is inter-
preted as identity, ‘∼’ as negation, ‘⊃’ as material conditional, and ‘∀’ as universal
quantification. Thus, all an interpretation of L= would need to fix is the universe
over which the quantifiers range. LS , on the other hand, also contains the signs ‘0’
and ‘S’. Syntactically, ‘0’ functions as a constant, that is, a name of an object, and
‘S’ as a one-place function sign. Hence an interpretation of LS consists of first a
(nonempty) universe, second an interpretation of ‘0’ as an element of the universe,
and third an interpretation of ‘S’ as a function on the universe whose values lie in
the universe. Here is an interpretation: the universe is the natural numbers; ‘0’ is
interpreted as zero and ‘S’ as the successor function. Thus, under this interpreta-
tion, ‘SSS0’ refers to three; and ∀x(∼Sx = 0) asserts that zero is the successor of
no natural number.
The interpretation we have just given is the intended interpretation, the one we
had in mind when we formulated LS . Other interpretations may easily be devised;
for example, we could take the universe to be all integers, negative, zero, and posi-
tive, with ‘S’ as the successor function on them, and ‘0’ as zero; or take the universe
to be the natural numbers together with two other objects, which we’ll call a and
b, and interpret ‘S’ as the function that acts as the successor function on the natural
numbers, takes a to b, and takes b to a.
Suppose we have an interpretation of LS . Let F be a sentence of LS , that is, a
formula without free variables. Then either F is true under the interpretation or else
F is not true, that is, it is false under the interpretation. For example, ∀x(∼Sx = 0)
and ∀x(∼SSx = x) are true under the intended interpretation, because zero is
not the successor of any number, and no number is the double successor of itself,
whereas ∀x(∃y(x = Sy) ⊃ ∃z(x = SSz)) is not, because not all numbers that
are successors of some number are double successors of some number (the number
one is the sole counterexample). In the first variant interpretation, ∀x(∼Sx = 0) is
false, since zero is the successor of minus one, while ∀x(∼SSx = x) is true, as is also
∀x(∃y(x = Sy) ⊃ ∃z(x = SSz)), since in fact every member of the universe is the
double successor of something. In the second variant interpretation, ∀x(∼SSx = x)
is false, because the function that interprets ‘S’ when applied twice takes a to itself,
and also takes b to itself.
What about open formulas, that is, formulas with free variables? Consider, for
example, ∀x(∼y = SSx), in which y is free. Under interpretation, this formula is
true or false once the free variable y is assigned a value in the universe of discourse.
Indeed, under the intended interpretation this formula is true for values zero, one,
and two of y, and false for all other values. (In the first variant interpretation, it is
true for no values of y.) In general, under an interpretation a formula is true or false
for given assignments of values to the free variables. We also call an open formula
true under an interpretation without qualification (that is, without reference to an
assignment of values to the free variables) iff it is true for all assignments of values
in the universe of discourse to the free variables. Thus we would say that ∼Sx = 0
is true under the intended interpretation.
Semantics is the study of interpretations of formal languages, and of properties
of signs that are defined with reference to interpretations. (In purely mathematical
contexts this is also called model theory, since interpretations are also called models,
and the intended interpretation of a system like ΣS is called the standard model.)
Syntax is the study of purely formal properties of signs and formal systems, with
no mention of interpretation. The central notion of semantics is truth under an
interpretation; the central notion of syntax is derivability. Of course there can be
connections between these notions, as we shall shortly see.

1.4 General notions


Let Σ be a formal system whose language contains the usual logical signs. We
define several important notions about derivability in Σ. In these definitions, we
use ‘formula’ to mean formula of the formal language of Σ.
• Σ is consistent iff for no formula F are both F derivable and ∼F derivable.
Let us call Σ consistent* iff there is a formula F that is not derivable. Clearly we
want our formal systems to be consistent*; else all formulas are derivable, and so
the system loses all interest. Our interest in having our formal systems be consistent
comes from the fact that the sign ‘∼’ is the sign for negation. As a result, for most formal
systems of interest, consistency and consistency* are equivalent. (In fact they are
equivalent as long as the system is truth-functionally complete and contains modus
ponens.)

• Σ is syntactically complete iff for every sentence F either F is derivable or else


∼F is derivable.

Recall that a sentence is a formula without free variables. Call a sentence refutable
when its negation is derivable. Then the definition may be rephrased thus: Σ
is syntactically complete iff every sentence is either derivable or refutable. The
restriction to sentences, rather than arbitrary formulas, is important, since ordinarily
formulas with free variables will be neither provable nor refutable. For example, in
system ΣS the formula x = 0 is neither derivable nor refutable. Nor would we
want it to be, for if it were then by universal generalization either ∀x(x = 0) or
∀x(∼x = 0) would be derivable, and neither is a happy result, since either would
make the system inconsistent.
The property of syntactic completeness is sometimes called ‘formal complete-
ness’ or ‘negation completeness’. Note that syntactic completeness, like consistency,
is a purely syntactic notion. It is important to distinguish syntactic completeness
from other completeness notions, one of which we’ve seen already (truth-functional
completeness), one of which is defined below, and one of which we’ll encounter in
the next section. The use of the word ‘complete’ for many different notions is a
pun, historically engendered by the vague intuition that a formal system should be
called complete when it does everything we want it to do. At different times and
for different systems, what ‘we want it to do’ led to different notions.
• Σ is decidable iff there is a purely mechanical procedure for determining
whether any given formula is derivable in Σ.

We require of all formal systems only that the notion of derivation be effective, not
the notion of derivability. In particular, since the rules of inference may allow a
shorter formula to be inferred from longer formulas, there may be no obvious way
of telling from a formula how long a derivation of it might have to be. Whether or
not a system is decidable is thus a real question, and often takes considerable work
to settle.
Let us now leave syntax and define two semantic notions.

• Σ is sound with respect to an interpretation iff every formula derivable in Σ is


true under that interpretation.

• Σ is complete with respect to an interpretation iff every formula true under


that interpretation is derivable in Σ.

The power of semantic talk is illustrated by these cheerful facts: if there is at least
one interpretation with respect to which Σ is sound, then Σ is consistent; if there is at
least one interpretation with respect to which Σ is complete, then Σ is syntactically
complete. This follows from the fact that — since ‘∼ ’ is always interpreted as
negation — for any sentence F , either F is true or ∼F is true, but not both.
The definitions given in this section are completely general. Let us now restrict
attention to formal systems that use the same logical axioms and rules of inferences
as those given in §1.2. Those axioms are true under all interpretations, and the rules
of inference preserve truth: if the premises of an application of either of these rules
are true under an interpretation, so is the conclusion. Consequently, if the nonlogical
axioms of a system are true under an interpretation, then in any derivation F1 , . . . , Fn ,
every formula, either being an axiom or resulting by a rule of inference from previous
formulas, must be true in the interpretation, and so all derivable formulas will be
true under the interpretation and the system will be sound for the interpretation.
A corollary of this is: if the nonlogical axioms are true under a given interpretation
and a formula F is not true under that interpretation, then F cannot be derivable.
How do the systems we have formulated in this chapter fare under these defini-
tions? The axioms of Σ= , being purely logical, are true under every interpretation.
Obviously, then, the system is consistent. It is not syntactically complete, nor
would we want it to be, since it is intended to express the general logical laws of
identity, not the facts about a particular mathematical domain. In particular, for
example, ∀x∀y(x = y) is neither derivable nor refutable: it is true in the interpre-
tation with a one-element domain, and false in all other interpretations. Σ= is also
decidable, which is not difficult, but also not trivial, to prove. As it turns out, a
sentence F of L= is derivable in Σ= iff it holds in all universes of size at most m,


where m is the number of quantifiers in F ; and it is easy to compute when this
property holds. (See Appendix §3.)
For system ΣS , we have the following. All of (S1) - (S4) are true in the intended
interpretation, so ΣS is sound for that interpretation, and hence consistent. More-
over, ΣS is complete for that interpretation, and hence syntactically complete. This
was first shown by the French logician Jacques Herbrand, in his doctoral dissertation
(1930). His proof also yields the decidability of the system. In fact, it yields more.
For Herbrand showed how to construct, for any formula F , a formula G with no
quantifiers and the same free variables as F such that ` F ≡ G. It follows that ev-
ery formula F with the one free variable x will hold (in the intended interpretation)
either for finitely many values of x or for all but finitely many values of x. (See the
Exercises). Thus even very simple arithmetical notions like “x is odd” cannot be
expressed. To formalize more serious arithmetical facts, a larger vocabulary than
just ‘0’ and ‘S’ is needed.

1.5 Peano Arithmetic.


The standard formal system for the formalization of arithmetic is usually called
Peano Arithmetic, although this name is something of a historical misnomer (see
§5.4). It is more accurately called first-order arithmetic. We expand the vocabulary
of LS to include function signs for addition and multiplication. Thus we take LPA
to have, as its alphabet: the logical signs ‘∼’, ‘⊃’, ‘∀’, ‘=’, ‘(’, ‘)’, the arithmetical
signs ‘0’, ‘S’, ‘+’, ‘×’, and the formal variables ‘x’, ‘y’, ‘z’, ‘x′’, ‘y′’, . . . . The
formation rules first specify the terms:

1. 0 is a term; each formal variable is a term.

2. If s and t are terms then so are St, (s + t), and (s × t).

So some examples of terms are SSS0, ((SS0+SSS0)×SS0), and ((Sx×Sy)+SSS0).


Now, as for formulas:

1. If s and t are terms then s = t is a formula (an atomic formula).

2. If F and G are formulas then so are ∼F and (F ⊃ G).

3. If F is a formula and u is a formal variable then ∀u(F ) is a formula.



If t is a term, u a formal variable, and F a formula, we say t is free for u in F


iff every variable occurring in t is free for u in F . The instances of a formula ∀uF
are all the formulas that can be obtained from F by replacing the free occurrences of
u by occurrences of a term t that is free for u in F . We also introduce a notation
for substitution in formulas. We will use a syntactic variable F (u) for a formula
ordinarily containing free occurrences of the variable u, and then, if t is a term of
the language that is free for u in F (u), F (t) will stand for the result of substituting
t for all free occurrences of u in F (u). The same convention will govern the use of
syntactic variables F (u, v), and so on, with more variables indicated. We also allow
F (u) to stand for a formula without free occurrences of u, but in that case F (t) will
be the same formula as F (u).
Now let us specify the axioms of the formal system PA. The logical axioms are
just as in §1.2, with the understanding, of course, that the syntactic variables F , G, H
range over formulas of LPA . The rules of inference are, also as in previous sections,
modus ponens and universal generalization. What is new are the number-theoretical
axioms.

(N1) ∼Sx = 0

(N2) Sx = Sy ⊃ x = y

(N3) x + 0 = x

(N4) x + Sy = S(x + y)

(N5) x × 0 = 0

(N6) x × Sy = (x × y) + x

(N7) F (0) ∧ ∀x(F (x) ⊃ F (Sx)) ⊃ ∀xF (x)

Note that (N1)–(N6) are particular formulas, but (N7) is an axiom schema: re-
placing F (x) by any formula of LPA yields an axiom. This schema provides the
mathematical induction axioms. Intuitively, mathematical induction is the princi-
ple that if 0 possesses a property and if whenever a number possesses the property
then so does its successor, then all numbers possess the property. The power of PA
to derive formalizations of interesting arithmetical claims — including all the clas-
sical theorems of number theory — stems from the inclusion of the mathematical
induction axioms.

Now the intended interpretation of LPA is, not surprisingly, that the universe is
the natural numbers, ‘0’ denotes zero, ‘S’, ‘+’ and ‘×’ denote the successor function,
addition, and multiplication. It may seem fairly obvious that PA is sound for this
interpretation, but this obscures a difficulty. As we shall see in Chapter 5, framing
the notion of truth in this interpretation requires a metalanguage that is expressively
richer than what can be formalized in LPA , and, despite its superficial obviousness,
demonstrating that PA is sound for this interpretation requires a metalanguage
that is mathematically stronger than what is formalized by PA. (The problem, as it
turns out, lies in the unbounded logical complexity of the axioms of mathematical
induction. In particular the formula put in for F (x) in (N7) can have arbitrarily
many quantifiers.) For this reason we avoid semantical reasoning in proving the
results of the next three chapters. Semantic considerations will appear only as
heuristic or suggestive. In my view, in the study of foundations of mathematics, we
should avoid strong assumptions in the metalanguage, assumptions which are in as
much need of a foundation as is the mathematics that we are trying to ground by
formulating formal systems and investigating them.
Let us, then, return to the syntactic investigation of PA. Although the axioms
of PA include those we called (S1) and (S2) for the system ΣS , here renamed (N1)
and (N2), it does not include (S3) and (S4), because they are derivable in PA using
mathematical induction. We shall show this at the beginning of the next section.
Hence every axiom of system ΣS is derivable in PA, so that every formula derivable
in ΣS is derivable in PA. From Herbrand’s result cited at the end of §1.4, it follows
that every sentence of LPA that does not contain ‘+’ or ‘×’ is either derivable or
refutable.
Of course, the difference between LPA and LS is that LPA has terms and for-
mulas that do contain ‘+’ and ‘×’. Let us note first that the intersubstitutivity of
identicals (“equals for equals yields equals”) is derivable. That is,

` x = y ⊃ t(x) = t(y),
where t(x) is any term containing x and t(y) comes from t(x) by replacing x with y.
An instance of this yields the following: suppose s, s′, t and t′ are terms such that
t′ can be obtained from t by replacing a subterm s by s′; then ` s = s′ ⊃ t = t′.
Now let us see how (N3)–(N6) yield the derivability of equations involving
addition and multiplication. (As with formulas, in speaking of terms we shall often
drop the outermost pair of parentheses.) As an example, let us show the derivability of
S0 + SS0 = SSS0 (one plus two equals three). First, ` S0 + 0 = S0, since it is an
instance of (N3). Then, ` S0 + S0 = S(S0 + 0), since it is an instance of (N4). By
intersubstitutivity ` S(S0 + 0) = SS0. By the transitivity of ‘=’, ` S0 + S0 = SS0.


By another instance of (N4), ` S0 + SS0 = S(S0 + S0). By intersubstitutivity,
` S(S0 + S0) = SSS0. By transitivity, ` S0 + SS0 = SSS0.
The terms S0, SS0, SSS0, . . . are called formal numerals. As a convention in
the metalanguage, if n is a number we shall use boldface n for the formal numeral
that contains n occurrences of ‘S’. Thus 0 is 0 and 3 is SSS0. The argument we
have just given, iterated as necessary, tells us that if m, n, and p are numbers such
that p is the sum of m and n, then ` m + n = p.
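
The template is mechanical enough to be spelled out. The sketch below (ours, purely for illustration; it is not a formal derivation in PA) prints, for given m and n, the chain of equations between formal numerals whose derivability the argument establishes:

```python
def numeral(n: int) -> str:
    # The formal numeral for n: n occurrences of 'S' followed by '0'.
    return 'S' * n + '0'

def addition_equations(m: int, n: int):
    # Start from m + 0 = m, an instance of (N3); each further step is licensed
    # by (N4) together with intersubstitutivity and transitivity, as in the text.
    eqs = [f'{numeral(m)} + 0 = {numeral(m)}']
    for k in range(1, n + 1):
        eqs.append(f'{numeral(m)} + {numeral(k)} = {numeral(m + k)}')
    return eqs

# One plus two equals three, as in the worked example above:
for eq in addition_equations(1, 2):
    print(eq)
# S0 + 0 = S0
# S0 + S0 = SS0
# S0 + SS0 = SSS0
```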
Now let’s show that SS0 × SS0 = SSSS0 (two times two equals four) is
derivable. First, ` SS0 × 0 = 0, by (N5). Then (N6) gives us ` SS0 × S0 =
(SS0 × 0) + SS0. By intersubstitutivity, ` (SS0 × 0) + SS0 = 0 + SS0. By what we
just noted about the derivability of addition statements, ` 0 + SS0 = SS0. By tran-
sitivity, ` SS0 × S0 = SS0. Another instance of (N6) yields ` SS0 × SS0 = (SS0 ×
S0) + SS0; intersubstitutivity and transitivity yield ` SS0 × SS0 = SS0 + SS0;
since ` SS0 + SS0 = SSSS0, as shown above, a final appeal to transitivity yields
` SS0 × SS0 = SSSS0.
Again, this argument is a general template. It should not be hard to see how to
show, for any numbers m, n, and p, that if p is the product of m and n then ` m × n = p.
Note that in the derivations we have just been showing to exist, no use of (N7),
mathematical induction, was made. This is not surprising, since the formulas being
shown derivable are all particular equations, whereas the point of mathematical
induction is to provide derivations of general numerical laws, a task to which we
now turn.

1.6 Basic laws of arithmetic


Our invocations of axioms of mathematical induction will follow a general pattern.
We shall specify a formula F (x), and show that ` F (0) and ` F (x) ⊃ F (Sx). Since
the latter yields ` ∀x(F (x) ⊃ F (Sx)) by generalization, an axiom of mathematical
induction will then give us ` ∀xF (x) and hence also ` F (x).
Our first task is to show that the axioms (S3) and (S4) of ΣS are derivable in PA.
For (S3), let F (x) be the formula ∼x = 0 ⊃ ∃y(x = Sy). By axiom (I1), ` 0 = 0; and
this formula truth-functionally implies ∼0 = 0 ⊃ ∃y(0 = Sy), that is, F (0). Hence
` F (0). Also by (I1), ` Sx = Sx. By existential generalization, ` ∃y(Sx = Sy).
This formula truth-functionally implies ∼Sx = 0 ⊃ ∃y(Sx = Sy), that is, F (Sx).
Hence ` F (Sx), so ` F (x) ⊃ F (Sx) by truth-functional implication. We may
conclude by mathematical induction that ` F (x), that is, ` ∼x = 0 ⊃ ∃y(x = Sy),
as desired. To show that the axioms (S4) are derivable, we’ll consider the example
∼SSx = x; the argument is generalizable to any positive number of occurrences of
‘S’. Let F (x) be the formula ∼SSx = x. Then ` F (0), since F (0) is an instance
of axiom (N1). Now F (x) ⊃ F (Sx) is ∼SSx = x ⊃ ∼SSSx = Sx, which is truth-
functionally implied by SSSx = Sx ⊃ SSx = x, which is an instance of axiom (N2).
Hence ` F (x) ⊃ F (Sx), so using an axiom of mathematical induction we obtain
` F (x), and so ` ∼SSx = x.
Our next aim is to show that the commutative law of addition is derivable. We
do this in three stages. First we show ` 0 + x = x. Let F (x) be 0 + x = x. Then
` F (0), since it is an instance of axiom (N3). By intersubstitutivity, ` F (x) ⊃
S(0 + x) = Sx. By axiom (N4), ` 0 + Sx = S(0 + x). By transitivity of identity,
` F (x) ⊃ 0 + Sx = Sx, that is, ` F (x) ⊃ F (Sx). The result follows by mathematical
induction.
Second, we show ` Sz + x = S(z + x). Let F (x) be Sz + x = S(z + x). By
(N3), ` z + 0 = z, so that ` S(z + 0) = Sz, so ` Sz = S(z + 0) by symmetry.
Another instance of (N3) yields ` Sz + 0 = Sz. By transitivity, ` Sz + 0 = S(z + 0),
that is ` F (0). By (N4), ` Sz + Sx = S(Sz + x). Hence by intersubstitutivity,
` Sz + x = S(z + x) ⊃ Sz + Sx = SS(z + x). By (N4) again, ` z + Sx = S(z + x),
so that ` S(z + Sx) = SS(z + x). By symmetry and transitivity, ` Sz + x =
S(z + x) ⊃ Sz + Sx = S(z + Sx). This is just F (x) ⊃ F (Sx). Thus we obtain the
desired conclusion.
Finally, we show ` x + y = y + x. Let F (x) be x + y = y + x. From what
was shown two paragraphs above, ` 0 + y = y, and by axiom (N3) and symmetry
y = y + 0, so by transitivity ` F (0). By axiom (N4), ` y + Sx = S(y + x). An
instance of what was shown in the previous paragraph yields ` Sx + y = S(x + y).
By intersubstitutivity, ` x + y = y + x ⊃ S(x + y) = S(y + x). By symmetry and
transitivity of identity, ` x + y = y + x ⊃ (Sx + y = y + Sx), that is, ` F (x) ⊃ F (Sx).
We leave to the reader arguments for the derivability of other basic laws of
arithmetic, for example the associativity of addition, the law of cancellation y + x =
z + x ⊃ y = z, the commutativity and associativity of multiplication, and the
distributive law (y + z) × x = (y × x) + (z × x). (See the Exercises.)
We now wish to show that the basic properties of the usual ordering relation of
the natural numbers can be derived in PA. PA does not have primitive vocabulary
to express this relation, but, since a number m is no greater than a number n iff
some natural number added to m yields n, it makes sense to introduce the following
metamathematical shorthand: by x ≤ y we mean the formula ∃z(y = z + x). More
generally, if s and t are any terms, by s ≤ t we mean the formula ∃u(t = u + s),
where u is the earliest variable among z, z′, z″, . . . that is distinct from all variables
in s and t.
Since ` x = 0 + x and ` x = x + 0, by existential generalization we have ` x ≤ x
and ` 0 ≤ x. To show the derivability of transitivity, that is,

x ≤ y ∧ y ≤ z ⊃ x ≤ z
We might argue as follows. Suppose that x ≤ y and y ≤ z. Then there are z′
and z″ such that z′ + x = y and z″ + y = z. By intersubstitutivity, z′ + (z″ + x) = z.
By the associativity of addition, (z′ + z″) + x = z. Hence there exists an x′, namely
z′ + z″, such that x′ + x = z. That is, x ≤ z.
The argument of the foregoing paragraph is more informal than those we have
used previously. That it establishes the derivability of transitivity can be seen,
roughly speaking, by noting that all the moves are purely logical inferences from
formulas known to be derivable; and that by dint of the logical axioms, PA can
capture all such inferences. More precisely, the argument shows that the formula
(z′ + x = y ∧ z″ + y = z) ⊃ (z′ + z″) + x = z is truth-functionally implied by appro-
priate instances of associativity, (N4), and intersubstitutivity. Hence this formula is
derivable. Together with existential generalization, this formula truth-functionally
implies (z′ + x = y ∧ z″ + y = z) ⊃ ∃z′(z′ + x = z); and the latter formula logically
implies

∃z(z + x = y) ∧ ∃z′(z′ + y = z) ⊃ ∃z′(z′ + x = z)
which is just the formula x ≤ y ∧ y ≤ z ⊃ x ≤ z. Another way of seeing that the
informal argument establishes derivability is by noting that the informal argument
can be directly transcribed into a natural deduction system for logical inference,
like that of Goldfarb’s Deductive Logic, yielding a deduction in that system whose
premises are formulas known to be derivable and whose conclusion is the desired
transitivity formula; and any implication that can be shown by such a deduction is,
as we show in the Appendix §2, derivable using the logical axioms of PA.
More generally, the fact that all logically correct steps can be captured in PA
can be framed as the quantificational completeness or logical completeness of PA,
namely, that all quantificationally valid formulas are derivable. (A formula is quan-
tificationally valid iff it is true under all interpretations.) As a consequence, if a
formula logically implies another (in the sense of quantification theory), and the
first formula is derivable, then the second formula will be, too. The quantificational
completeness of a formal system of logical axioms was first shown by Kurt Gödel, in
his doctoral dissertation (1930). We will not be applying quantificational complete-


ness in our arguments in any formal way, however, but rather only the informal and
more humdrum fact that our logical axioms are sufficient to formalize all customary
logical inferences. (This was already verified by Frege in 1879.)
Let us show

` x ≤ y ∧ y ≤ x ⊃ x = y
Suppose x ≤ y and y ≤ x. Then there are numbers z and z′ such that z + x =
y and z′ + y = x. By intersubstitutivity, z′ + (z + x) = x, so by associativity
(z′ + z) + x = x. Since ` 0 + x = x, (z′ + z) + x = 0 + x. By the law of cancellation
z′ + z = 0. By the law ` x + y = 0 ⊃ y = 0 (Exercise 1.?), z = 0. Hence 0 + x = y,
so that x = y.
The derivability of four further laws of ordering are left to the reader (see the
Exercises):

x ≤ 0 ⊃ x = 0
x ≤ Sx
x ≤ y ∧ ∼x = y ⊃ Sx ≤ y
x ≤ y ∨ y ≤ x
These laws express that the ordering is a linear ordering, that is, any two
elements are comparable; that it has a least element, namely zero; and that the
ordering is discrete, that is, for every number there is a next one in the ordering,
namely its successor (there is nothing in between a number and its successor).
Chapter 2

Gödel’s Proof

2.1 Gödel numbering


In order to investigate the syntax of system PA, treating it as a mere system of signs,
Gödel saw that we may proceed as follows. First we specify a mapping from signs
to numbers: that is, we correlate numbers with the primitive signs of the formal
language and with strings of signs of the formal language. As a result syntactic
properties and relations become correlated with number-theoretic properties and
relations, which may be defined and investigated using purely number-theoretic
means.
Here is the correlation of numbers with the primitive signs of LPA that
we shall use:
‘0’ with 1 ‘x’ with 17
‘S’ with 2 ‘y’ with 19
‘+’ with 3 ‘z’ with 21
‘×’ with 4 ‘x′’ with 23
‘=’ with 5 ‘y′’ with 25
‘∼’ with 6 ‘z′’ with 27
‘⊃’ with 7 and so on
‘∀’ with 8
‘(’ with 9
‘)’ with 10

More precisely put, we define a function Γ from signs of the alphabet to integers:


Γ(σ) = m iff either σ is ‘0’ and m = 1, or σ is ‘S’ and m = 2, or σ is ‘+’ and


m = 3, and so on. Γ is one-to-one, that is, it carries distinct signs to distinct
integers. Moreover, Γ is effective: given a sign of the alphabet we can find the
number correlated, and given a number we can decide whether there is a sign to
which the number is correlated and if so we can find which sign this is. Γ is obviously
not onto, that is, there are integers that are not correlated with any sign of LPA . In
fact, we purposely left gaps so that, if we want to treat a language that augments
LPA (as we will in §5.4), we can add new correlations without disturbing the ones
already made.
We now wish to correlate numbers with strings of signs. Clearly Γ yields a
correlation of a finite sequence of numbers with each string of signs. The further
step that is needed is to get from finite sequences of numbers to numbers. To do
this, following Gödel we shall use products of prime powers. A prime number is an
integer greater than 1 whose only integral divisors are itself and 1. In order, the
first ten primes are 2, 3, 5, 7, 11, 13, 17, 19, 23, 29. We define a function γ from
strings to numbers as follows: if σ1 . . . σn is a string, then

γ(σ1 . . . σn ) = 2^Γ(σ1) · 3^Γ(σ2) · . . . · p^Γ(σn)

where p is the nth prime number. The number to which γ carries a string is called
the gödel number of the string. For example, γ(‘+’) = 2^3 = 8, γ(‘∀S)’) = 2^8 · 3^2 · 5^10 =
22,500,000,000, and γ(‘0 = 0’) = 2^1 · 3^5 · 5^1 = 2430. The function γ is one-to-one:
distinct strings are carried to distinct numbers. (This follows from the Unique
Factorization Theorem, first proved by Gauss in 1798, which asserts that every
number greater than 1 has a unique factorization into prime powers.) Moreover, γ
is effective.
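
Both Γ and γ are easy to compute. Here is a sketch (ours) that reproduces the worked examples above; the table covers only the first few signs, and the naive prime generator is our own choice, made for brevity:

```python
GAMMA = {'0': 1, 'S': 2, '+': 3, '×': 4, '=': 5,
         '∼': 6, '⊃': 7, '∀': 8, '(': 9, ')': 10,
         'x': 17, 'y': 19, 'z': 21}   # extended to further variables as in the text

def primes():
    # Generate 2, 3, 5, 7, ... by trial division (fine for short strings).
    n = 2
    while True:
        if all(n % d != 0 for d in range(2, int(n ** 0.5) + 1)):
            yield n
        n += 1

def goedel_number(string):
    # The gödel number of σ1...σn is 2^Γ(σ1) · 3^Γ(σ2) · ... · p^Γ(σn),
    # where p is the nth prime.
    g = 1
    for sign, p in zip(string, primes()):
        g *= p ** GAMMA[sign]
    return g

assert goedel_number('+') == 2 ** 3 == 8
assert goedel_number('∀S)') == 22_500_000_000
assert goedel_number('0=0') == 2430
```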
Having given the gödel numbering, we may now deal with numbers alone. More
precisely, we know that there is a property of numbers that holds of just the gödel
numbers of formulas of PA, for example. We want to define this property, and
others like it, entirely within number theory. Our definitions will be purely number-
theoretic without any mention of syntax. Our interest in defining properties like
this is given by the gödel numbering; but the properties are intrinsically purely
number-theoretic, and their number-theoretic structure does not in any way depend
on the syntax of PA or on the correlation γ. The number theory we will be using
is informal number theory, as might be seen in a typical mathematics class, not the
formalized version enshrined by PA. Eventually we shall want to formalize some of
our proceedings; in §2.4 we shall see how much.

2.2 Primitive recursive functions and relations


We wish to define functions and relations of numbers in a purely number-theoretic
way that insures that the functions and relations are computable. In order to do
this, we use a delimited set of methods of definition. The definitions will provide
algorithms—purely mechanical procedures—for calculating the values of the func-
tions and for ascertaining whether the relations hold or not of any specific arguments.
Functions and relations that can be defined by these methods are called primitive
recursive.
We start by stipulating that certain basic functions are primitive recursive: all
constant functions (those which take all arguments to the same value), the identity
function (which takes each number to itself), and the successor function. Clearly
these are all computable. To define new primitive recursive functions, we may use
the methods of recursion and composition.
Recursion. The idea here is to define the value of a function for argument
k + 1 in terms of its value for argument k. That is, we first specify a value in terms
of previously defined functions when one argument is 0; and then we show how,
using other previously defined functions, to obtain the value when that argument is
k + 1, in terms of the value when that argument is k. Here are several examples,
with the first two also rephrased in words.
Addition n+0=n
n + (k + 1) = (n + k) + 1
The value of the function on n and 0 is given by the identity function on n; the
value on n and k + 1 is the successor of the value on n and k.
Multiplication n·0=0
n · (k + 1) = (n · k) + n
The value of the function on n and 0 is given by the constant function 0; the value
of n and k + 1 is the sum of the value on n and k and n.
Factorial 0! = 1
(k + 1)! = k! · (k + 1)

Exponentiation n⁰ = 1
nᵏ⁺¹ = (nᵏ) · n

Truncated predecessor pred(0) = 0


pred(k + 1) = k

Truncated difference n ∸ 0 = n
n ∸ (k + 1) = pred(n ∸ k)

Thus n ∸ k is the difference between n and k if n ≥ k and is 0 if n ≤ k.


For functions of two arguments, a definition by recursion looks like this: if ψ(n)
and ξ(j, n, k) are functions already known to be primitive recursive, then we can
define a new function ϕ thus:

ϕ(n, 0) = ψ(n)
ϕ(n, k + 1) = ξ(ϕ(n, k), n, k)
Thus the value of ϕ(n, k+1) is defined as some known function that has as inputs the
previous value ϕ(n, k), n, and k. Not all variables on the left need actually appear on
the right. For example, in the definition of addition, only the previous value n + k
appears on the right hand side of the second equation, but neither k nor n by itself
does. Even the following counts as a definition by recursion, although no variables
appear on the right-hand side:

α(0) = 1
α(k + 1) = 0.

The function α takes every positive integer to 0, and takes 0 to 1. We call α the
switcheroo function.
The general form of definition by recursion for functions with more than two
arguments is like that given above, but with a sequence n1 , n2 , . . . , nm of arguments
taking the place of n.
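To see the scheme in action, here is a small Python sketch (the helper names are ours): by_recursion takes previously defined functions ψ and ξ and returns the function ϕ fixed by the two recursion equations; addition and truncated difference are then recovered as instances.

    # A sketch of the general scheme of definition by recursion (one extra
    # argument n; several extra arguments work the same way).
    def by_recursion(psi, xi):
        """Return phi with phi(n, 0) = psi(n) and phi(n, k+1) = xi(phi(n, k), n, k)."""
        def phi(n, k):
            value = psi(n)
            for i in range(k):
                value = xi(value, n, i)
            return value
        return phi

    def pred(m):                      # truncated predecessor
        return m - 1 if m > 0 else 0

    # Addition:  n + 0 = n,  n + (k + 1) = (n + k) + 1
    add = by_recursion(lambda n: n, lambda prev, n, k: prev + 1)

    # Truncated difference:  n - 0 = n,  n - (k + 1) = pred(n - k)
    tdiff = by_recursion(lambda n: n, lambda prev, n, k: pred(prev))

    print(add(3, 4), tdiff(5, 2), tdiff(2, 5))   # 7 3 0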
Composition. This is simply the compounding of given functions and rela-
tions. For example, we may define
|k − n| = (k ∸ n) + (n ∸ k),

in which truncated difference and addition are compounded. This function gives
the absolute value of difference between k and n. Composition also gives us the
means to capture definition by cases. For example, suppose we wanted to define the
function of k and n that yields k 2 if k ≤ n and n2 if n < k. We can do this by using
addition, multiplication, truncated difference, and switcheroo thus, thereby showing
that this function is primitive recursive:
k · k · α(k ∸ n) + n · n · α(n + 1 ∸ k)
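This combination can be checked numerically; the following sketch (names ours) computes the displayed expression from α and truncated difference and confirms that it equals k² when k ≤ n and n² when n < k, that is, min(k, n)².

    # Checking the definition-by-cases combination displayed above.
    def tdiff(n, k):                 # truncated difference
        return n - k if n >= k else 0

    def alpha(k):                    # the switcheroo function
        return 1 if k == 0 else 0

    def by_cases(k, n):
        return k * k * alpha(tdiff(k, n)) + n * n * alpha(tdiff(n + 1, k))

    assert all(by_cases(k, n) == min(k, n) ** 2
               for k in range(20) for n in range(20))
    print(by_cases(3, 7), by_cases(7, 3))   # 9 9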

In sum, a function is primitive recursive iff it can be defined by starting with


the basic functions and iteratively applying the definition methods of recursion and
composition.
We also want a notion of a relation’s being primitive recursive. A suitable
notion can be defined in terms of primitive recursive functions as follows. The
characteristic function of an m-place relation R is the m-place function χ such
that if R(n1 , . . . , nm ) holds then χ(n1 , . . . , nm ) = 1, and if R(n1 , . . . , nm ) does not
hold then χ(n1 , . . . , nm ) = 0. We define: a relation is primitive recursive iff its
characteristic function is primitive recursive.
Since primitive recursive functions are computable, so are primitive recursive
relations: to determine whether a primitive recursive relation holds at an m-tuple, simply
compute its characteristic function and see whether it is 1 or not. Now α(|k − n|) is
the characteristic function of the relation k = n, and α(k ∸ n) is the characteristic
function of the relation k ≤ n; hence these relations are primitive recursive relations.
The set of odd numbers is primitive recursive (we identify sets and properties with
1-place relations), since its characteristic function can be defined thus: χ(0) =
0, χ(k + 1) = α(χ(k)).
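These characteristic functions are easy to compute directly. The sketch below (names ours) confirms that α(|k − n|) and α(k ∸ n) behave as the characteristic functions of k = n and of k ≤ n, and computes the characteristic function of the odd numbers by the recursion χ(0) = 0, χ(k + 1) = α(χ(k)).

    # Characteristic functions built from alpha and truncated difference.
    def tdiff(n, k):
        return n - k if n >= k else 0

    def alpha(k):
        return 1 if k == 0 else 0

    def chi_eq(k, n):                # characteristic function of k = n
        return alpha(tdiff(k, n) + tdiff(n, k))       # alpha(|k - n|)

    def chi_le(k, n):                # characteristic function of k <= n
        return alpha(tdiff(k, n))

    def chi_odd(k):                  # chi(0) = 0, chi(k+1) = alpha(chi(k))
        value = 0
        for _ in range(k):
            value = alpha(value)
        return value

    assert all(chi_eq(k, n) == int(k == n) for k in range(10) for n in range(10))
    assert all(chi_le(k, n) == int(k <= n) for k in range(10) for n in range(10))
    assert [chi_odd(k) for k in range(6)] == [0, 1, 0, 1, 0, 1]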
Additional methods of definition are allowable in defining primitive recursive
functions and relations, provided they can be reduced to applications of recursion
and composition. Three methods in particular will be of great use to us.
New relations can be defined from given ones by truth-functional (Boolean)
combination. For example, k > n iff not k ≤ n; and k ≥ n iff k > n or k = n.
We claim that any truth-functional combination of primitive recursive relations is
primitive recursive. It then follows that k > n is primitive recursive, and that
k ≥ n is as well. To prove the claim, note first that if an m-place relation R
is primitive recursive, then so is its complement, that is, the relation that holds
of an m-tuple of numbers just in case R does not. For if χ is the characteristic
function of R, then the complement of R has characteristic function χ′, where
χ′(n1 , . . . , nm ) = α(χ(n1 , . . . , nm )). Second, if R and S are m-place primitive recursive relations
then so is the intersection of R and S, that is, the relation that holds of n1 , . . . , nm iff
R(n1 , . . . , nm ) and S(n1 , . . . , nm ). For if R has characteristic function χR and S has
characteristic function χS then the intersection has characteristic function χ, where
χ(n1 , . . . , nm ) = χR (n1 , . . . , nm ) · χS (n1 , . . . , nm ). It follows from these observations
that if R and S are primitive recursive relations then so is the union of R and S, and, indeed,
that any truth-functional combination of primitive recursive relations is primitive recursive.
Note. In the language we have been using for informal mathematics, ‘=’ means
identity, ‘+’ means addition, and ‘0’ means zero. This amounts to an ambiguous

usage, since we have also used these three signs as signs of the formal language LPA .
The context should make clear when the signs are used in informal mathematics, and
when as signs (or names for signs) of the formal language, but the reader should be
alert to the need to be sensitive to this. In order to minimize the overlap in language,
in informal mathematics we use ‘·’ for multiplication, as opposed to ‘×’ in the formal
language, and numerical variables from the middle of the English alphabet (‘i’ to
‘r’), not the end. Also, in informal mathematics, we will use somewhat different
logical notation: ‘ & ’ for and, ‘→’ for if-then, ‘↔’ for iff, and an overstrike bar
(over a sign for a relation) to mean not: for example, n > m iff n ≤ m fails, which
we would write by putting the bar over ‘≤’. The quantifiers
we use in informal mathematics will also look different from those of LPA . For want
of a good alternative, ‘∨’ will remain as ambiguous between its informal usage and
its formal usage as (inclusive) or. End of Note.
Another definition method we will want to use for relations is bounded quantifi-
cation. For example,
k divides n iff (∃p ≤ n)(n = p · k).
n is prime iff n > 1 & (∀k ≤ n)(k|n → k = 1 ∨ k = n),
where ‘k|n’ abbreviates ‘k divides n’. The bound on the quantifier insures com-
putability: one need make only a finite search in order to determine whether the
new relation holds or not. Often an equivalent definition of the relation can be made
without the bound — for example, it is true that k divides n iff (∃p)(n = p · k) and
that n is prime iff n > 1 & (∀k)(k|n → k = 1 ∨ k = n) — but a definition without a
bound does not guarantee computability.
It is straightforward to show that if a relation R(k, n) is primitive recursive
then so is the relation (∀k ≤ p)R(k, n), which has arguments p and n. Let χ be
the characteristic function of R, and define χ′ by recursion thus: χ′(0, n) = χ(0, n),
χ′(p + 1, n) = χ′(p, n) · χ(p + 1, n). Thus χ′ is primitive recursive, and is the
characteristic function of (∀k ≤ p)R(k, n), since χ′(p, n) is 1 just in case each
of χ(0, n), . . . , χ(p, n) is 1, that is, just in case each of R(0, n), . . . , R(p, n) holds.
Bounded existential quantification can be obtained from bounded universal quan-
tification by truth-functional operations, since (∃k ≤ p)R(k, n) is the complement
of (∀k ≤ p)S(k, n), where S is the complement of R.
A final definition-method we shall use frequently is bounded leastness. This is
used to define a new function from a given relation. The notation we use is this: an
expression
(µk ≤ p)R(k)
denotes the least number k ≤ p such that R holds of k, if there is such a number, and

denotes 0 otherwise. Thus if we define ϕ(n) = (µk ≤ n)(n = k + k) then ϕ(n) = n/2
if n is even and ϕ(n) = 0 if n is odd. Again, the point of having a bound on the
leastness operator is for the sake of computability; and again it is straightforward
to show that a function defined by bounded leastness from a primitive recursive
relation is itself primitive recursive. (See the Exercises.)
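Computationally, bounded leastness is just another finite search. Here is a short sketch (names ours) with the example ϕ(n) = (µk ≤ n)(n = k + k) just given.

    # Bounded leastness: the least k <= bound such that R(k), and 0 if none.
    def bounded_least(bound, relation):
        for k in range(bound + 1):
            if relation(k):
                return k
        return 0

    def half_or_zero(n):             # (mu k <= n)(n = k + k)
        return bounded_least(n, lambda k: n == k + k)

    print([half_or_zero(n) for n in range(8)])   # [0, 0, 1, 0, 2, 0, 3, 0]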
In short, since the primitive recursive functions and relations are closed under
definition by recursion, composition, truth-functional combination, bounded quan-
tification, and bounded leastness, when we use any of these definition methods
to define new functions and relations from functions and relations known to be
primitive recursive, the newly defined functions and relations will also be primitive
recursive.
We conclude this section by defining five primitive recursive functions and re-
lations concerning prime numbers and prime factorizations.
pr(0) = 1;
pr(k + 1) = (µn ≤ pr(k)! + 1)(n > pr(k) & n is prime) (2.1)
For each k > 0, pr(k) is the kth prime number, so that pr(1) = 2, pr(2) = 3,
pr(3) = 5, and so on. The bound comes from Euclid’s observation that if p is a
prime number then there is a prime number greater than p and no greater than
p! + 1. For if 2 ≤ n ≤ p then n divides p!, and so leaves a remainder of 1 when
divided into p! + 1. Hence either p! + 1 itself is prime, or it has a prime factor which
must be greater than p. (This is Euclid’s proof that there are infinitely many prime
numbers.)
The definition of pr(k) compresses several steps into one: those several steps are
definitions by truth-functional combination, bounded-leastness, composition, and
recursion. We shall often be giving definitions in this compressed form. This one
time, let us lay out the individual definitions step-by-step. First we note that the
relation that holds of m and n iff (n > m & n is prime) is primitive recursive,
since it is a truth-functional combination of primitive recursive relations. Next
we note that the function ϕ(j, m) = (µn ≤ j)(n > m & n is prime) is primitive
recursive, since it is defined from a primitive recursive relation by bounded-leastness.
Now let ψ(m) = ϕ(m! + 1, m); ψ is primitive recursive since it is obtained from
primitive recursive functions by composition. Finally, we define pr(k) by: pr(0) = 1,
pr(k + 1) = ψ(pr(k)). This is a definition by recursion; we may conclude that pr(k)
is primitive recursive.
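The compressed definition (2.1) can be run as it stands. In the sketch below (names ours) the bounded search uses Euclid's bound pr(k)! + 1; an ordinary trial-division test stands in for the primitive recursive definition of 'n is prime' given above.

    # pr(0) = 1;  pr(k+1) = (mu n <= pr(k)! + 1)(n > pr(k) and n is prime)
    from math import factorial

    def is_prime(n):
        return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

    def bounded_least(bound, relation):
        for k in range(bound + 1):
            if relation(k):
                return k
        return 0

    def pr(k):
        value = 1
        for _ in range(k):
            bound = factorial(value) + 1
            value = bounded_least(bound, lambda n, m=value: n > m and is_prime(n))
        return value

    print([pr(k) for k in range(7)])   # [1, 2, 3, 5, 7, 11, 13]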

[n]k = (µi ≤ n)(pr(k)^i | n & pr(k)^(i+1) ∤ n) (2.2)



[n]k is the exponent of pr(k) in the prime factorization of n.


Seq(n) ↔ (∃k ≤ n)(∀i ≤ n)(pr(i)|n ↔ i ≤ k) (2.3)

Seq(n) holds iff n is a sequence number, a number in whose prime factorization


appear 2, 3, . . . , up through pr(k) for some k. For sequence numbers n we shall
speak of the ‘exponents in n’ to mean the exponents in the prime factorization of n.

ℓ(n) = (µk ≤ n)(Seq(n) & pr(k)|n & pr(k + 1) ∤ n) (2.4)

If n is a sequence number, ℓ(n) is the number of distinct prime factors of n. We call


ℓ(n) the length of n. We included Seq(n) as a conjunct in this definition in order to
secure that ℓ(n) = 0 if n is not a sequence number.
We need an easy way of giving large bounds for several of the definitions below.
Here is one way of doing this (there are others, of course): let bg(n) be the sequence
number of length n in which each of the exponents is n. We leave to the reader the
task of giving a primitive recursive definition of bg(n) (see the Exercises).
m ∗ n = (µk ≤ bg(m + n))(Seq(k) & (∀i ≤ ℓ(m))([k]i = [m]i )
& (∀i ≤ ℓ(n))(i > 0 → [k]i+ℓ(m) = [n]i )) (2.5)

If m and n are sequence numbers, then m ∗ n is the sequence number k in which


the exponents are first those in m and then those in n, and whose length is exactly
ℓ(m) + ℓ(n). (The definition puts constraints on [k]i for 1 ≤ i ≤ ℓ(m) + ℓ(n). Since
m ∗ n is the least k fulfilling those constraints, it will not have any further prime
factors.) Thus 30 ∗ 30 = 30,030, since 30 = 2¹3¹5¹ and 30,030 = 2¹3¹5¹7¹11¹13¹,
and 24 ∗ 18 = 5880, since 24 = 2³3¹, 18 = 2¹3², and 5880 = 2³3¹5¹7². If m is a
sequence number but n is not, then m ∗ n = n ∗ m = m, since ℓ(n) = 0, and similarly
if n is a sequence number but m is not. If neither m nor n is a sequence number,
then m ∗ n = 0.
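These operations on sequence numbers are easy to experiment with. The sketch below (names ours) implements [n]k, Seq, ℓ, and ∗; for readability it uses plain searches and integer arithmetic in place of the official bounded definitions, but it computes the same values, and it reproduces the examples 30 ∗ 30 = 30,030 and 24 ∗ 18 = 5880.

    # Sequence numbers: exponent extraction, Seq, length, and concatenation *.
    def is_prime(n):
        return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

    def pr(k):                       # pr(0) = 1; pr(k) = kth prime for k > 0
        value, count = 1, 0
        while count < k:
            value += 1
            if is_prime(value):
                count += 1
        return value

    def exponent(n, k):              # [n]_k: exponent of pr(k) in n
        if n == 0 or k == 0:
            return 0
        p, e = pr(k), 0
        while n % p == 0:
            n //= p
            e += 1
        return e

    def length(n):                   # l(n): number of k > 0 with pr(k) dividing n
        if n == 0:
            return 0
        k = 0
        while n % pr(k + 1) == 0:
            k += 1
        return k

    def seq(n):                      # Seq(n): primes dividing n are exactly pr(1), ..., pr(l(n))
        if n == 0:
            return False
        m = n
        for k in range(1, length(n) + 1):
            m //= pr(k) ** exponent(n, k)
        return m == 1

    def star(m, n):                  # m * n, matching the cases described above
        if not seq(m):
            return n if seq(n) else 0
        if not seq(n):
            return m
        k = 1
        for i in range(1, length(m) + 1):
            k *= pr(i) ** exponent(m, i)
        for i in range(1, length(n) + 1):
            k *= pr(i + length(m)) ** exponent(n, i)
        return k

    print(star(30, 30))   # 30030 = 2·3·5·7·11·13
    print(star(24, 18))   # 5880  = 2^3 · 3 · 5 · 7^2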

2.3 Arithmetization of syntax


In this section, we shall be defining functions and relations which, from a purely
number-theoretical point of view, might seem somewhat unnatural. Although they
are defined solely in terms of numbers, the motivation for these definitions stems
from correspondences to the syntax of PA. We group these correspondences together
as the Mirroring Lemma. A typical clause of this Lemma looks like this: if s and s′
are strings of signs of LPA then γ(s) ∗ γ(s′) = γ(s⌢s′). Thus the number-theoretic
function ∗ mirrors the syntactic operation of concatenation. (This claim should be
obvious, given the specification of the gödel numbering γ and the definition of the
number-theoretic function ∗.) Each clause of the Mirroring Lemma asserts that a
number-theoretic function or relation mirrors some syntactic notion.

paren(n) = 2⁹ ∗ n ∗ 2¹⁰. (2.6)

Thus paren(30) = 2⁹3¹5¹7¹11¹⁰. Mirroring Lemma: Let s be a string of signs; then
paren(γ(s)) = γ(‘(’⌢s⌢‘)’), that is, paren(γ(s)) is the gödel number of the string
resulting from putting s in parentheses.

Var(n) ↔ n ≥ 17 & n is odd (2.7)

Attm(n) ↔ (∃k ≤ n)(n = 2^k & (k = 1 ∨ Var(k))) (2.8)
Mirroring Lemma: Var(n) iff n = Γ(u) for some formal variable u. Attm(n) iff n is
the gödel number of an atomic term, that is, ‘0’ or a formal variable.

succ(n) = 2² ∗ n (2.9)

plus(m, n) = paren(m ∗ 2³ ∗ n) (2.10)

times(m, n) = paren(m ∗ 2⁴ ∗ n) (2.11)
Tmop(i, j, k) ↔ k = succ(i) ∨ k = plus(i, j) ∨ k = times(i, j) (2.12)
Mirroring Lemma: Let s and s′ be strings of signs; then succ(γ(s)) = γ(‘S’⌢s),
plus(γ(s), γ(s′)) = γ(‘(’⌢s⌢‘+’⌢s′⌢‘)’) and times(γ(s), γ(s′)) = γ(‘(’⌢s⌢‘×’⌢s′⌢‘)’).

nmrl(0) = 2; nmrl(n + 1) = succ(nmrl(n)) (2.13)

Thus nmrl(1) = 2²3¹ = 12; nmrl(2) = 2²3²5¹ = 180; nmrl(3) = 2²3²5²7¹ = 6300.


Mirroring Lemma: for every n, nmrl(n) is the gödel number of the formal numeral
n.
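A few of the mirroring clauses can be checked by computation. In the sketch below (names ours), gamma_of and codes_of pass between a gödel number and its list of exponents; star, paren, succ, plus, and nmrl then follow (2.5)–(2.13) for sequence-number arguments, and the values nmrl(1) = 12, nmrl(2) = 180, nmrl(3) = 6300 and paren(30) come out as stated.

    # Checking a few mirroring clauses numerically, by passing through the
    # list of exponents of a sequence number.
    def primes():
        n = 1
        while True:
            n += 1
            if all(n % d for d in range(2, int(n ** 0.5) + 1)):
                yield n

    def gamma_of(codes):             # product of pr(i)^codes[i-1]
        result = 1
        for p, c in zip(primes(), codes):
            result *= p ** c
        return result

    def codes_of(n):                 # inverse of gamma_of on sequence numbers
        if n == 0:
            return []
        codes = []
        for p in primes():
            if n % p:
                break
            e = 0
            while n % p == 0:
                n //= p
                e += 1
            codes.append(e)
        return codes

    def star(m, n):                  # concatenation (the simple case of (2.5))
        return gamma_of(codes_of(m) + codes_of(n))

    def paren(n):                    # (2.6): '(' = 9, ')' = 10
        return star(star(2 ** 9, n), 2 ** 10)

    def succ(n):                     # (2.9): 'S' = 2
        return star(2 ** 2, n)

    def plus(m, n):                  # (2.10): '+' = 3
        return paren(star(star(m, 2 ** 3), n))

    def nmrl(n):                     # (2.13): '0' = 1, so nmrl(0) = 2
        value = 2 ** 1
        for _ in range(n):
            value = succ(value)
        return value

    print([nmrl(k) for k in range(4)])   # [2, 12, 180, 6300]
    print(paren(30))                     # 2^9 · 3 · 5 · 7 · 11^10, cf. (2.6)
    print(plus(nmrl(0), nmrl(0)))        # gödel number of '(0+0)'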

Tmseq(n) ↔ Seq(n) & (∀k ≤ ℓ(n))(k > 0 → Attm([n]k ) ∨
(∃i, j < k)Tmop([n]i , [n]j , [n]k )) (2.14)
For example, Tmseq holds of 2^(2¹⁷) · 3^(2¹) · 5^(2⁹·3¹⁷·5³·7¹·11¹⁰). Note that if Tmseq(n) holds then
n must be a “second-order sequence number”, that is, a sequence number in which

all the exponents are themselves sequence numbers. Mirroring Lemma: Tmseq(n)
holds iff n is a sequence number, and the exponents [n]1 , [n]2 , . . . [n]`(n) in its prime
factorization are the gödel numbers of a sequence t1 , t2 , . . . , t`(n) of strings with the
following property: each tk either is an atomic term, or, for some strings ti and tj
earlier in the sequence, is Sti or (ti +tj ) or (ti ×tj ). Such a sequence of strings shows
how a term is built up from its constituent parts in accord with the formation rule
for terms. Thus a string t is a term if and only if there is such a sequence whose
last member is t. Hence, we define:
Tm(n) ↔ (∃m ≤ bg(n))(Tmseq(m) & n = [m]ℓ(m) ) (2.15)

The bound on m is large enough. For suppose t is a term, and let i be the number of
occurrences of the signs ‘S’, ‘+’, and ‘×’ in t. Let t1 . . . , tj be a sequence of strings
as in the preceding paragraph such that tj = t and j is as small as possible. Then
j ≤ 2i + 1 (this may be proved by induction on j), and of course 2i + 1 ≤ γ(t). Also,
each ti is a subterm of t, so that γ(ti ) ≤ γ(t). Hence the sequence number m whose
prime factorization has exponents γ(t1 ), γ(t2 ), . . . , γ(tj ) is at most bg(γ(t)). We may
conclude that the following Mirroring Lemma clause holds: Tm(n) iff n = γ(t) for
some term t of LPA .
A similar series of definitions will yield a primitive recursive relation that mirrors
the property of being a formula.

Atform(n) ↔ (∃j, k ≤ n)(Tm(j) & Tm(k) & n = j ∗ 2⁵ ∗ k) (2.16)

neg(n) = 2⁶ ∗ n (2.17)
cond(m, n) = paren(m ∗ 2⁷ ∗ n) (2.18)
gen(k, n) = 2⁸ ∗ 2^k ∗ paren(n) (2.19)

Formop(i, j, k) ↔ k = neg(i) ∨ k = cond(i, j) ∨
(∃m ≤ k)(Var(m) & k = gen(m, i)) (2.20)

Formseq(n) ↔ Seq(n) & (∀k ≤ ℓ(n))(k > 0 → Atform([n]k ) ∨
(∃i, j < k)Formop([n]i , [n]j , [n]k )) (2.21)

Form(n) ↔ (∃m ≤ bg(n))(Formseq(m) & n = [m]ℓ(m) ) (2.22)

The Mirroring Lemma clauses for (2.17) – (2.21) are left to the reader. By reasoning
parallel to that for Tm(n), we can then infer that Form(n) holds iff n = γ(F ) for
some formula F of LPA .
Our next aim is to define a primitive recursive function that mirrors substitution
of terms for free variables. To do this, we must first mirror the notions of bound
and free occurrences of variables. Let the ith place in a string be the address for
the ith sign in the string, counting from the left. Thus in ‘∀x((x + 0) = x)’ the fifth
place is the location of the occurrence of x that follows a left parenthesis, while the
tenth place is the location of the rightmost occurrence of x.
Bound(i, k, n) ↔ Form(n) & Var(k) & (∃p, q, r ≤ n)(n = p ∗ gen(k, q) ∗ r &
ℓ(p) + 1 ≤ i ≤ ℓ(p) + ℓ(gen(k, q))) (2.23)
Mirroring Lemma: Bound(i, k, n) holds if n = γ(F ) for some formula F , k = Γ(u)
for some formal variable u, and the ith place in F lies within the scope of a quantifier
binding u. (Note that u need not occur at the ith place. This feature of the definition
will make it easier to mirror ‘t is free for u in F ’ (Exercise 2.?).)
Free(i, k, n) ↔ Form(n) & Var(k) & [n]i = k & not Bound(i, k, n) (2.24)
Mirroring Lemma: Free(i, k, n) holds if n = γ(F ) for some formula F , k = Γ(u) for
some formal variable u, and u has a free occurrence at the ith place in F .
In syntax, substitution of a term for a free variable u is the simultaneous re-
placement of all free occurrences of u by occurrences of the term. In order to mirror
this by a primitive recursive function, we need to break it down into a step-by-step
procedure of substituting for the free occurrences of the variable one by one. We will
make the substitutions starting with the last free occurrence (the rightmost one) and
continuing right-to-left. The reason for this is that if the term being substituted has
length > 1, a substitution perturbs all the addresses to the right of the occurrence
being replaced. By making the substitutions from right to left, at each step the
addresses of the occurrences that are yet to be replaced remain unperturbed.
Define (max k ≤ m)R(k) as
(µk ≤ m)(R(k) & (∀j ≤ m)(j > k → not R(j))).
Then (max k ≤ m)R(k) is the largest k ≤ m such that R(k) holds, and 0 if there is
no such k.

occ(0, k, n) = (max i ≤ ℓ(n))Free(i, k, n)
occ(m + 1, k, n) = (max i < occ(m, k, n))Free(i, k, n) (2.25)

If n = γ(F ) for some formula F and k = Γ(u) for some formal variable u, then
occ(0, k, n), occ(1, k, n), occ(2, k, n), . . . give the addresses, from largest address
down, of the places where u is free in F . If u has m free occurrences in F , then
occ(i, k, n) is nonzero for 0 ≤ i < m, while occ(m, k, n) = 0. Thus our next function
gives the number of free occurrences of u in F .

nocc(k, n) = (µm ≤ ℓ(n))(occ(m, k, n) = 0) (2.26)

subat(n, p, i) = (µm ≤ bg(p + n))(∃q, r ≤ n)(n = q ∗ 2^[n]i ∗ r
& i = ℓ(q) + 1 & m = q ∗ p ∗ r) (2.27)

If n and p are sequence numbers and 0 < i ≤ ℓ(n), then subat(n, p, i) is the sequence
number whose first i − 1 exponents match the first i − 1 exponents in n, whose next
ℓ(p) exponents match those in p, and whose final ℓ(n) − i exponents match the final
ℓ(n) − i exponents in n. Thus, for example, subat(30, 72, 2) = 2¹ ∗ 2³3² ∗ 2¹ = 9450.
Mirroring Lemma: if n = γ(s) for some string s and p = γ(s′) for some string s′,
and i is at most the length of s, then subat(n, p, i) is the gödel number of the string
obtained from s by substituting s0 for whatever appears in s at the ith place.

subst(n, k, p, 0) = n
subst(n, k, p, i + 1) = subat(subst(n, k, p, i), p, occ(i, k, n)) (2.28)

Mirroring Lemma: if n = γ(F ) for some formula F , k = Γ(u) for some formal vari-
able u, and p = γ(s) for some string s, then subst(n, k, p, 1) is the gödel number of the
result of substituting s for the rightmost free occurrence of u in F ; subst(n, k, p, 2)
is the gödel number of the result of substituting s for the two rightmost free occurrences of u in F ; and
so on.
sub(n, k, p) = subst(n, k, p, nocc(k, n)) (2.29)
Mirroring Lemma: if n = γ(F (u)) for some formula F (u), k = Γ(u) for some formal
variable u, and p = γ(t) for some term t, then sub(n, k, p) = γ(F (t)).
The next function will be of great importance in the proofs of this Chapter and
the next, although it might look a little mysterious now.

diag(n) = sub(n, 19, nmrl(n)). (2.30)

So if n = 2¹⁹3⁵5¹, then diag(n) = 2²·3²·5²·7² · · · pr(n)²·pr(n + 1)¹·pr(n + 2)⁵·pr(n + 3)¹.
Mirroring Lemma: if n = γ(F (y)) for some formula F (y), then diag(n) =

γ(F (n)), that is, diag(n) is the gödel number of the formula obtained from F (y) by
substituting n for all free occurrences of y. (So if there are no free occurrences of
y in F (y), then diag(n) = n. Also, if n is not the gödel number of a formula, then
diag(n) = n.) For reasons that will become clear only later, diag is called the gödel
diagonal function.
Our next task is to define a primitive recursive relation that mirrors the property
of being an axiom.
T1Ax(n) ↔ Form(n) & (∃k, m ≤ n)(n = cond(k, cond(m, k))) (2.31)
Mirroring Lemma: T1Ax(n) iff n = γ(F ) for a formula F of LPA that is an axiom
generated from schema (T1).
We leave to the reader the task of providing primitive recursive definitions that
mirror the other axioms. (See the Exercises.) These will culminate in a primitive
recursive definition of a 1-place relation Ax(n) that yields the Mirroring Lemma
clause: Ax(n) iff n = γ(F ) for some formula F that is an axiom of PA.
Infop(i, j, k) ↔ j = cond(i, k) ∨ (∃p ≤ k)(Var(p) & k = gen(p, i)) (2.32)

Drvtn(n) ↔ Seq(n) & (∀k ≤ ℓ(n))(k > 0 → Ax([n]k ) ∨
(∃i, j < k)Infop([n]i , [n]j , [n]k )) (2.33)
Mirroring Lemma: Drvtn(n) holds iff n is a sequence number, and the exponents
[n]1 , [n]2 , . . . [n]`(n) in its prime factorization are the gödel numbers of a sequence
of formulas that is a derivation in PA. If this holds, we say that n encodes the
derivation.
Der(m, n) ↔ Drvtn(m) & n = [m]ℓ(m) (2.34)
Mirroring Lemma: Der(m, n) holds iff m encodes a derivation in PA of a formula
with gödel number n.

2.4 Numeralwise representability


We now define a syntactic notion which gives a sense in which number-theoretic
functions and relations may be formalized within PA, and so establishes a link
between the informal number theory of the previous two sections and the formal
system. For vividness, we’ll frame the definition for 2-place relations. Let R be
such a relation. A formula F (x, y) of LPA , whose only free variables are x and y,
numeralwise represents the relation R iff for all integers k and n,

if R(k, n) then ` F (k, n)

if not R(k, n) then ` ∼F (k, n).

Similarly, a 1-place relation (that is, a set) is numeralwise represented by a


formula F (x), whose only free variable is x, iff ` F (n) whenever the relation holds
of n (whenever n is in the set) and ` ∼F (n) whenever it doesn’t. A 3-place relation
would be numeralwise represented by a formula F (x, y, z) with just those three free
variables; if the relation holds of a triple then the corresponding numerical instance
of F (x, y, z) is derivable and if it doesn’t then the corresponding numerical instance
is refutable. A relation is numeralwise representable iff there is a formula that
numeralwise represents it.
The formula x = y numeralwise represents the identity relation. To show this
we must show, for any integers k and n, if k and n are identical then ` k = n, and
if k and n are distinct then ` ∼k = n. If k and n are identical, then k and n are
the same formal numeral, so that k = n is the same formula as k = k. And since
x = x is an axiom of PA, it follows that ` k = k.
Now suppose k and n are distinct. We illustrate the argument by an example.
Suppose k = 4 and n = 2. Axiom (N2) is ` Sx = Sy ⊃ x = y. Hence, ` SSSS0 =
SS0 ⊃ SSS0 = S0 and ` SSS0 = S0 ⊃ SS0 = 0, so that by truth-functional
logic ` SSSS0 = SS0 ⊃ SS0 = 0. But by axiom (N1) ` ∼SS0 = 0. Hence, by
truth-functional logic, ` ∼SSSS0 = SS0, that is, ` ∼4 = 2. It should be clear how
to generalize this argument to show ` ∼k = n whenever k > n. For the case k < n,
we may then invoke the symmetry of ‘=’.
We also want to formulate a notion of numeralwise representation for functions.
Here, we want PA to capture the fact that a function has a unique value for each
argument.
Let ϕ be a 2-place function from numbers to numbers. A formula F (x, y, z)
numeralwise represents the function ϕ (in PA) iff for all numbers m, n, and q such
that ϕ(m, n) = q
` F (m, n, z) ≡ z = q
Note that since ` q = q, the condition yields ` F (m, n, q); indeed the condition
is equivalent to ` F (m, n, q) ∧ (F (m, n, z) ⊃ z = q). The condition formalizes the
claims that q is the value of ϕ for arguments m and n, and that q is the only value. For
a 1-place function ϕ, the condition would read: for all n and q such that ϕ(n) = q,
` F (n, y) ≡ y = q. We leave to the reader the task of giving the general form of

the condition; in each case the formula that numeralwise represents the function has
one more free variable than the function has arguments.
The formula x + y = z numeralwise represents addition. In §1.5 we’ve shown
that, for any k, n, and q, if q is the sum of k and n then ` k + n = q. We must also
show that if q is the sum of k and n then ` k + n = z ⊃ z = q. By the transitivity
of identity, ` (z = k + n ∧ k + n = q) ⊃ z = q. Since ` k + n = q, by truth-functional
logic and the symmetry of identity we have ` k + n = z ⊃ z = q. A similar
argument shows that the formula x × y = z numeralwise represents multiplication.
Numeralwise representation is, in one sense, a weak constraint on a formaliza-
tion of a relation or function. It requires only that the formalization of “pointwise
facts” about the relation or function be derivable in PA: for all particular argu-
ments, whether the relation holds or not, and for all particular arguments, what the
value of the function is and that it is the only value. For other purposes we might
well want to require more, for example, that formalizations of general laws that the
relation or the function obeys be derivable in PA. We might not take the formula
x + y = z to be a good formalization of addition unless, say, the commutative law
were derivable using it. However, numeralwise representation is all that is needed
for the First Incompleteness Theorem.
Representability Theorem: Every primitive recursive relation and
function is numeralwise representable in PA.
We put off the proof until Chapter 4. As we shall see, it is straightforward, amount-
ing primarily to verifying that manipulations of finite sequences of numbers can be
formalized in PA. For now we note only that the proof is entirely syntactic and
is constructive: it provides a recipe for constructing, given any primitive recursive
definition of a function or a relation, a formula that numeralwise represents that
function or relation.

2.5 Proof of incompleteness


As a last preliminary, we define the notion of ω-consistency. PA is ω-consistent iff for
no formula F (x) are all the following derivable: ∼∀xF (x) as well as the numerical
instances F (n) for every n. Since ∼∀xF (x) is logically equivalent to ∃x ∼ F (x),
we can rephrase the definition thus: if ` F (n) for every n, then ∃x ∼ F (x) is not
derivable. Thus, ω-consistency requires that if all numerical instances of a formula
are derivable then the formula expressing the existence of a counterexample cannot
be derivable. If we take F (x) to be ∼H(x) and cancel the double negation, then we

obtain another way of phrasing the condition: if ∃xH(x) is derivable, then not
all of ∼H(0), ∼H(1), ∼H(2), . . . are derivable. That is, if it can be derived that
there is a number with a certain property, it cannot be derived that each particular
number fails to have the property. Note that ω-consistency is a syntactic property,
although somewhat more complex than consistency. Note too that ω-consistency
implies consistency: for if the system is inconsistent every formula is derivable, so the
system is ω-inconsistent. As we shall see, consistency does not imply ω-consistency:
it is possible for a system to be consistent but ω-inconsistent. But clearly we would
want to require PA to be ω-consistent.
Let us list the results of previous sections that will be used in the proof. From
the previous section we need the Representability Theorem. From the arithmeti-
zation of syntax we need just two results: the existence of a primitive recursive
relation Der(k, n) such that Der(k, n) holds iff n is the gödel number of a formula
and k encodes a derivation of that formula; and the existence of a primitive recursive
function diag(n) such that if n is the gödel number of a formula F (y) then diag(n)
is the gödel number of F (n). From the logical material in Chapter 1, we need only
the fact that if a universal quantification ∀xF (x) is derivable, then so are all its
numerical instances F (k). This of course follows from the instantiation axiom (Q1)
and modus ponens.
Now let Q be the 2-place relation defined thus: for all integers k and n,
Q(k, n) ↔ not Der(k, diag(n)),
that is, Q(k, n) holds just in case k does not encode a derivation of the formula with
gödel number diag(n). Then Q is primitive recursive. By the Representability Theorem, there is a formula
A(x, y) that numeralwise represents Q. That is, for all k and n,
(a) if Q(k, n) then ` A(k, n);
(b) if not Q(k, n) then ` ∼A(k, n).
Let p be the gödel number of ∀xA(x, y). Then, from the property of diag(n) just
noted, diag(p) is the gödel number of ∀xA(x, p). Note that ∀xA(x, p) is a sentence,
that is, contains no free variables. We can now complete the proof in five steps.
(1) If ` ∀xA(x, p) then, for each k, ` A(k, p) .
This is clear, since each A(k, p) is an instance of ∀xA(x, p).
(2) If ` ∀xA(x, p) then there exists a number k such that ` ∼A(k, p).
For suppose ` ∀xA(x, p). Then there is a number k that encodes a derivation
of ∀xA(x, p), so that Der(k, q), where q is the gödel number of ∀xA(x, p). As noted
above, q = diag(p). Hence Der(k, diag(p)), that is, not Q(k, p). But then, by (b) above,
` ∼A(k, p).

(3) If PA is consistent then ∀xA(x, p) is not derivable.


This follows from (1) and (2), since together they imply that if ` ∀xA(x, p)
then PA is inconsistent.
(4) If PA is consistent then, for each k, ` A(k, p).
For suppose PA consistent. By (3), ∀xA(x, p) is not derivable. Hence there is
no number that encodes a derivation of it. By Mirroring, for each k, not Der(k, diag(p)).
That is, for each k, Q(k, p). By (a) above, for each k, ` A(k, p).
(Of course, if PA is inconsistent then also ` A(k, p) for each k, since everything
is derivable, but we do not need this fact.)
(5) If PA is ω-consistent, then ∼∀xA(x, p) is not derivable.
Suppose PA is ω-consistent. Then PA is consistent. By (4), A(k, p) is derivable
for each integer k. So if ∼∀xA(x, p) were also derivable, PA would be ω-inconsistent.

Gödel’s First Incompleteness Theorem If PA is ω-consistent then


there is a sentence G of LPA such that neither G nor ∼G is derivable in
PA, and hence PA is syntactically incomplete.

This follows from (3) and (5), taking G to be the sentence ∀xA(x, p). This sentence
is often called the Gödel sentence.
The core of the foregoing proof is step (2). Traditionally, to show a fact of
the form “if F is derivable then so is H” we show that H can be derived from
F , or from a formula obtained from F by universal generalization. Indeed, step
(1) has precisely this character (other examples can be found in the Exercises for
§1.2). But this is not at all what is going on in step (2). Rather, the supposition
that ` ∀xA(x, p) is exploited as a metalinguistic fact; this fact is then mirrored as a
number theoretic fact (namely, that there is a number k such that Der(k, q), where q
is the gödel number of ∀xA(x, p)); and that number-theoretic fact is then formalized
in the system, by means of numeralwise representation. In a somewhat loose way
of speaking, we might say that we are not drawing an inference from the content
∀xA(x, p) might be taken to express, but rather from the fact of its derivability.
The formula A(x, y) was so chosen that for any formula F (y) with gödel number
n, if ` F (n) then, for some k, ` ∼A(k, n). That is, since the gödel number of F (n)
is diag(n), ` F (n) tells us that there exists a number k such that Der(k, diag(n)),
so that ` ∼A(k, n) by numeralwise representation. Obtaining step (2) is a matter
only of choosing the right formula F (y). Namely, we choose F (y) to be ∀xA(x, y).
Call its gödel number p. Thus we obtain the result that if ` ∀xA(x, p) then there
exists a k such that ` ∼A(k, p).

The formula ∀xA(x, y) formalizes a number-theoretic property. Since A(x, y)


numeralwise represents the complement of the relation Der(k, diag(n)), we could say that ∀xA(x, y)
formalizes the one-place relation that holds of n iff for no k do we have Der(k, diag(n));
and by mirroring this holds iff the formula with gödel number diag(n) is not deriv-
able. Now ∀xA(x, y) itself has a gödel number, namely, p. So the formula ∀xA(x, p)
formalizes the statement that this one-place relation holds of p, which mirrors the
statement that the formula with gödel number diag(p) is not derivable. But diag(p)
is the gödel number of that very formula ∀xA(x, p)! That is, the Gödel sentence
formalizes a number-theoretic condition that mirrors this syntactic claim: the Gödel
sentence is not derivable. It is no wonder that the Gödel sentence is not derivable:
for if it were, the claim it formalizes would not be correct, and there would be a
derivation of it, which would be encoded by a number k such that ` ∼A(k, n),
producing an inconsistency in PA.
That the Gödel sentence formalizes a number-theoretic fact that mirrors the
claim that the Gödel sentence is underivable is often expressed in picturesque terms
thus: the Gödel sentence asserts that it is not derivable. Or even more vividly,
“the Gödel sentence says ‘I am not derivable’.” Formulas don’t talk, of course,
so this is figurative language. Unfortunately, it can also cause confusion, so it is
important to see exactly how to put the point precisely and unfiguratively. We do
this in the next section.
Step (4) also merits comment, since it expresses an important phenomenon
that Gödel’s proof brings to light. What is shown here (assuming PA consistent) is
that a formula F (x) can be derived whenever a formal numeral is substituted for
x; but the universal quantification ∀xF (x) cannot be derived. Loosely speaking:
the property that F (x) expresses can be derived to hold of each particular number,
but “every number possesses the property” cannot be formally derived. For what
is shown above is that for each k, the formalization of not Der(k, diag(p)) is derivable,
but the formalization of (∀k) not Der(k, diag(p)) is not derivable.

2.6 ‘I am not derivable’


The most direct way of making precise sense of the claim that the Gödel sentence
asserts its own underivability is to adopt a semantical stance and use the notion
of the truth of formulas in the intended interpretation of LPA . All uses of “truth”
below are to be understood in this sense. First note that the formula A(k, n) is
true iff not Der(k, diag(n)): this follows from the soundness of PA for the intended
interpretation and the choice of A(x, y) as numeralwise representing that relation.

(The recourse to soundness at this point can actually be avoided by inspection of the
proof of the Representability Theorem. That A(k, n) is true iff not Der(k, diag(n)) will
follow directly from the construction of the formula A(x, y). See §4.3.) Since the
universe of the intended interpretation is the natural numbers, for each n, ∀xA(x, n)
is true iff not Der(k, diag(n)) for every natural number k. By mirroring, this condition
obtains iff the formula with gödel number diag(n) is not derivable. Now let p be the
gödel number of ∀xA(x, y); then ∀xA(x, p) is true iff the formula with gödel number
diag(p) is not derivable. Since the formula with gödel number diag(p) is ∀xA(x, p),
this shows

(†) ∀xA(x, p) is true iff ∀xA(x, p) is not derivable.

That is, the condition for the truth of the Gödel sentence is a number-theoretic
fact that mirrors the underivability of the Gödel sentence. Note that mirroring is
essential here. In the most direct sense, the Gödel sentence asserts a number-theoretic
statement; it is true iff a certain number-theoretic condition holds. It is only using
mirroring that we obtain the biconditional between the truth of the Gödel sentence
and a syntactic condition.
From (†) and the soundness of PA a quick semantical argument for incomplete-
ness can be formulated. G cannot be derivable, since if it is then by (†) it is false,
which would violate soundness. So G is not derivable, and hence it is true. Hence
∼G is false, and so by soundness it cannot be derivable. Gödel’s First Incomplete-
ness Theorem is often stated semantically thus: there is a true sentence of LPA that
is not derivable in PA. (As we shall see in §3.2, though, the syntactic proof of the
previous section yields an important further result unobtainable from the semantic
proof.)
There are two ingredients to obtaining (†). First is gödelization, that is, the
arithmetization of syntax, which shows that syntactic notions can be captured by
formulas of LPA , by formalizing the number-theoretic notions that mirror the syn-
tactic ones. Second is the use of the function diag(n), which allows the construction
of a formula that can appear on both sides of the biconditional (†).
Gödel’s strategy can be viewed this way. Define “formula m at n” as: the result
of substituting n for any free occurrences of y in the formula with gödel number
m (if m is not the gödel number of a formula, let formula m at n be an arbitrary
object that is not a formula). By gödelization, the relation “formula m at n is not
derivable” can be captured by a formula of LPA : all we need do is formalize the
number-theoretic relation (∀k)(not Der(k, sub(m, 19, nmrl(n)))). Now identify the two
variables, that is, consider only the case when m = n. (In the plane, these pairs

form the diagonal; hence the use of “diag”.) So we have a formula F (y) with one
free variable that captures the 1-place relation “formula n at n is not derivable”,
that is, for each n, F (n) is true iff formula n at n is not derivable. Let p be the
gödel number of F (y). We then have: F (p) is true iff formula p at p is not derivable;
and formula p at p is just F (p). Thus we obtain (†).
So far in this section we have been using the notion of truth in the intended
interpretation for formulas of LPA . However, there is a way of capturing (†) syntac-
tically, namely, by formalizing it within LPA . The right hand side of (†) expresses
a syntactic property, which is formalizable via gödelization in a direct way. Let
D(x, y) be a formula that numeralwise represents the relation Der(k, n). Then if H
is a formula with gödel number n, the formula ∃xD(x, n) is a formalization of the
assertion that (∃k)Der(k, n), which mirrors the assertion that H is derivable. Thus
the underivability of the Gödel sentence can be formalized by ∼∃xD(x, q), where q
is the gödel number of the Gödel sentence. The left hand side of (†) is the ascrip-
tion of truth to the Gödel sentence, but inside LPA this can be formalized by the
Gödel sentence itself, since the metalinguistic ascription of truth to a sentence can
be captured in the object language by the assertion of that sentence. We claim that
the resulting formalization is derivable in PA, that is,

(‡) ` ∀xA(x, p) ≡ ∼∃xD(x, q)


To show this, we need to go into more detail about the construction of A(x, y).
That task will occupy us at the beginning of Chapter 3. Moreover we shall see
that, just as (†) yields a concise semantic proof of the First Incompleteness Theorem,
(‡) provides a quick way of reformulating the syntactic proof of the preceding section.
Chapter 3

Formalized Metamathematics

3.1 The Fixed Point Lemma


As we pointed out in the preceding section, Gödel’s proof can be analyzed into
two central components, gödelization and the use of the diagonal function. The
contributions each component makes to the proof can be highlighted if we look
more closely at how the Gödel sentence may be constructed. In the proof, we
used ∀xA(x, p) for the Gödel sentence, where A(x, y) numeralwise represents the
complement of the relation Der(k, diag(n)). It is natural to think of such a formula as built up from
numeralwise representations of Der and diag. So let D(x, y) be a formula of LPA
that numeralwise represents the relation Der(k, n) and let ∆(y, z) be a formula that
numeralwise represents the function diag(n). Then let A(x, y) be the formula

∀z(∆(y, z) ⊃ ∼D(x, z))


Now let p be the gödel number of ∀xA(x, y) and let q = diag(p). ∀xA(x, p) is
the Gödel sentence; its gödel number is diag(p), that is, q. Since ∆(y, z) numeralwise
represents diag, we have

` ∆(p, z) ≡ z = q
Hence the formula A(x, p) is provably equivalent to

∀z(z = q ⊃ ∼D(x, z))


which by the laws of identity is equivalent to ∼D(x, q). From the equivalence of
A(x, p) and ∼D(x, q) we may infer the equivalence of ∀xA(x, p) and ∀x ∼ D(x, q),


and the latter formula is of course equivalent to ∼∃xD(x, q). Thus we have shown

(‡) ` ∀xA(x, p) ≡ ∼∃xD(x, q)


Once we have (‡), we need pay no further attention to the internal structure of
the Gödel sentence or to the use of diag. The rest of the work is done by gödelization,
in particular, that the number-theoretic relation Der(k, n) mirrors the syntactic rela-
tion ‘derivation of’ and that the formula D(x, y) numeralwise represents Der(k, n).
We can then formulate a quick proof of Gödel’s First Theorem. First, suppose
` ∀xA(x, p). By mirroring, Der(k, q) for some k, so that by numeralwise repre-
sentation, ` D(k, q) for some k, whence ` ∃xD(x, q). But from ` ∀xA(x, p), (‡)
and truth-functional logic we also have ` ∼∃xD(x, q). Hence PA is inconsistent.
Second, suppose ` ∼∀xA(x, p). If PA is consistent, ∀xA(x, p) is not derivable, so by
mirroring, not Der(k, q) for each k; so by numeralwise representation ` ∼D(k, q) for each
k. But, from ` ∼∀xA(x, p), (‡), and truth-functional logic we have ` ∃xD(x, q).
Hence PA is ω-inconsistent.
The method just used to construct the Gödel sentence can be applied more
generally, so as to yield the following theorem, sometimes also called the “Diagonal
Lemma”.

Fixed Point Lemma. Let F (y) be any formula. Then there is a


formula H such that ` H ≡ F (m), where m is the Gödel number of H.

Proof. Let ∆(x, y) numeralwise represent the function diag, let F′(y) be ∀z(∆(y, z) ⊃
F (z)), let k be the Gödel number of F′(y), and let H be F′(k). Let m be the gödel
number of H; then m = sub(k, 19, nmrl(k)) = diag(k), so that

` ∆(k, z) ≡ z = m

From this it follows that F′(k) is provably equivalent to ∀z(z = m ⊃ F (z)),
and hence to F (m). □
The formula F (y) may contain free variables aside from y. If it does not, that
is, if y is the only free variable in F (y) then H will be a sentence; and if it does
contain other free variables then the free variables of H will be precisely those other
variables.
A graphic way of stating the Lemma is possible with some new notation: if F
is a formula, then let pF q be the formal numeral for the Gödel number of F ; that
is, pF q is k, where k = γ(F ). Thus pq is a function that carries formulas to formal

numerals. Its interaction with the number-theoretic function nmrl(n) and the gödel
numbering γ is given by the following diagram:
              γ
      F ----------> γ(F)
      |                |
     p q             nmrl
      |                |
      v                v
    pF q ----------> nmrl(γ(F ))
              γ

That is, for each formula F , γ(pF q) = nmrl(γ(F )). The Fixed Point Lemma then
says: for every formula F (y) there is a formula H such that

` H ≡ F (pHq).

Any such formula H is called a fixed point of F (y).


It should be noted that there are other fixed points of a formula F (y) aside
from that constructed in the proof of the Fixed Point Lemma. In fact, there are
infinitely many distinct ones. This can be seen as follows. Let J be any sentence
derivable in PA, and let H′ be the fixed point constructed as in the proof but for
the formula F (y) ∧ J. Then ` H′ ≡ F (pH′q) ∧ J. Since J is derivable, it follows
by truth-functional logic that ` H′ ≡ F (pH′q), so that H′ is a fixed point of
F (y). Moreover H′ is distinct from H, since H is ∀z(∆(k, z) ⊃ F (z)) and H′ is
∀z(∆(k′, z) ⊃ F (z)) ∧ J, where k ≠ k′. Clearly, using a different J in this construction
will yield yet a different H′; hence we can generate infinitely many different ones.
Nor should it be thought that these different fixed points will necessarily be
equivalent. The original fixed point H is derivably equivalent to F (pHq); the new
fixed point H′ is derivably equivalent to F (pH′q). As these formulas ascribe F (y) to
different objects (different Gödel numbers), they say different things. (Your saying
‘I’m hungry’ is not equivalent to my saying ‘I’m hungry’.)
One can even insure that certain different fixed points are inequivalent. Let
F (y) numeralwise represent the primitive recursive relation that holds of a number
n iff it is the Gödel number of a formula of which 0 = 0 is not a subformula. There
will be such an F (y) that itself does not contain 0 = 0 as a subformula (if it did,
we could always replace the subformula 0 = 0 by any other derivable formula), and,
similarly, there will be a formula ∆(y, z) that numeralwise represents the function
diag(n) that does not contain 0 = 0 as a subformula. Hence the fixed point H

constructed in the proof of the Fixed Point Theorem will not contain 0 = 0 as a
subformula, so that ` F (pHq), and hence ` H. On the other hand, if we let J be
0 = 0 and carry out the construction of two paragraphs back, then H 0 does contain
0 = 0 as a subformula, so that ` ∼F (pH 0 q), and hence ` ∼H 0 . Thus H and H 0 are
not equivalent.
The quick proof of Gödel’s Theorem given just before the Fixed Point Lemma
can be viewed as an application of the Lemma to the formula ∼∃xD(x, y), where
D(x, y) numeralwise represents the relation Der(k, n). Let us define a 1-place
number-theoretic relation Dvbl(n) as (∃k)Der(k, n) (note this is not a primitive
recursive definition, since there is no bound on the quantifier). This relation mirrors
derivability, that is, Dvbl(n) holds iff n is the gödel number of a formula deriv-
able in PA. Since D(x, y) numeralwise represents Der, we can think of the formula
∃xD(x, y) as formalizing Dvbl(n), and so, by mirroring, as being a formal expression
of derivability. Thus Gödel’s Theorem can be obtained by applying the Fixed Point
Lemma to a formal expression of underivability. (This analysis of Gödel’s proof,
along with the formulation of the Fixed Point Lemma, was first given by Rudolf
Carnap in 1934.) The quick proof can be further streamlined so as to highlight the
needed properties of the expression of derivability. Let Prov(y) be a formula meant
as such a formal expression, and suppose it obeys the following two conditions:

• Adequacy: For any formula F , if ` F then ` Prov(pF q).

• Faithfulness: For any formula F , if ` Prov(pF q) then ` F .

Now let G be a fixed point of ∼Prov(y). To show G not derivable in PA, if PA is


consistent, we need just the Adequacy condition: since ` G ≡ ∼Prov(pGq), if ` G,
then ` ∼Prov(pGq) by truth-functional logic, but also ` Prov(pGq) by Adequacy,
so that PA would be inconsistent. Moreover, we can show ∼G not derivable in
PA, if PA is consistent, using just Faithfulness: if ` ∼G, then ` Prov(pGq) by
truth-functional logic, whence ` G by Faithfulness, so that, again, PA would be
inconsistent.
At this point the acute reader should be wondering what happened to the
hypothesis of ω-consistency. The answer is that it figures in the proof that there
exists a formula Prov(y) fulfilling the two conditions. That is, we can show that
if D(x, y) numeralwise represents the relation Der, then the formula (∃x)D(x, y)
obeys the Adequacy Condition and—provided that PA is ω-consistent—also the
Faithfulness Condition. For suppose ` F . By Mirroring, there exists a k such
that Der(k, γ(F )). By numeralwise representation, for such a k ` D(k, pF q). By

existential generalization, ` ∃xD(x, pF q). Thus Adequacy is fulfilled. Now suppose


` ∃xD(x, pF q). We show that if not ` F then PA is ω-inconsistent. If not ` F , then
Der(k, γ(F )) for no k, in which case, by numeralwise representation, ` ∼D(k, pF q)
for every k. This together with ` ∃xD(x, pF q) is an ω-inconsistency. Hence, if PA
is ω-consistent then ∃xD(x, y) fulfills the Faithfulness condition.
Now, if Prov(y) numeralwise represented the relation Dvbl(n), then consistency
alone would be enough to insure Faithfulness: for if not ` F , then ` ∼Prov(pF q)
by numeralwise representation, whence not ` Prov(pF q) by consistency. But in
fact, we cannot require that Prov(y) numeralwise represent Dvbl(n), since if PA is
consistent then Dvbl(n) is not numeralwise representable! (This claim follows from
a straightforward application of the Fixed Point Theorem. See the Exercises.) Note
that this shows that the relation Dvbl(n) is not primitive recursive.
Caution. The Adequacy Condition does not imply that F ⊃ Prov(pF q) is
derivable. Indeed, assuming that PA is ω-consistent, there are formulas F such that
F ⊃ Prov(pF q) is not derivable. Similarly, the Faithfulness Condition does not
imply that Prov(pF q) ⊃ F is derivable. If PA is consistent, there are formulas F
such that Prov(pF q) ⊃ F is not derivable. (See the Exercises.)

3.2 Gödel’s Second Incompleteness Theorem


Gödel’s Second Theorem states that if PA is consistent then the consistency of
PA is not derivable in PA. Now the consistency of PA is a metamathematical
statement. What does it mean to say that this metamathematical statement is or
is not derivable? Of course we mean that a formal expression of this statement is
or is not derivable. And by a formal expression we mean a formula that formalizes
whatever number-theoretic fact mirrors the consistency of PA. That is, by mirroring,
PA is consistent iff the class of Dvbl-numbers has such-and-such a property, which
is number-theoretic in nature. The formal expression of consistency is obtained by
formalizing the such-and-such property.
Recall that to prove Gödel’s First Theorem one proves that if PA is consistent
then G is not derivable, where G is the Gödel sentence. Number-theoretically speak-
ing, this is a proof that if the class of Dvbl-numbers has such-and-such a property
then q is not a Dvbl-number, where q is the Gödel number of G. If we formalize
this number-theoretic assertion we get the formula
(Con) ⊃ ∼Prov(q),
where (Con) is the formal expression of consistency and Prov(y) is the formal ex-

pression of derivability. Indeed, by formalizing the proof of the number-theoretic


assertion—a proof which, as Gödel emphasized, is purely number-theoretic and uses
only simple arithmetical principles—we obtain the derivability in PA of this formula.
But we saw above that
` G ≡ ∼Prov(q).
Hence
` (Con) ⊃ G.
We may conclude that if (Con) were derivable in PA, then G would be derivable in
PA. But then, by the First Theorem, PA would be inconsistent.
Thus the heart of the proof of Gödel’s Second Incompleteness Theorem is to
show that a formalization of Gödel’s First Incompleteness Theorem is derivable in
PA. To fill in the details of this outline, we first have to fix on a formal expression
of consistency. There are many candidates. PA is consistent iff for no formula F
are F and ∼F both derivable. Hence one could formalize consistency by formalizing
the assertion: for no number n do we have both Dvbl(n) and Dvbl(neg(n)). Alter-
natively, PA is consistent iff there exists an underivable formula; hence one could
formalize the assertion: there exists an n such that Form(n) and not Dvbl(n). The es-
sential thing about the formal expression of consistency is that it function formally
(that is, within PA) in an appropriate way in relation to the formal representa-
tion of derivability. What this comes to is that if Prov(y) is the representation of
derivability, then the representation (Con) of consistency should fulfill the following
condition:

(♯) For each formula F , ` (Con) ⊃ ∼(Prov(pF q) ∧ Prov(p∼F q))

As we shall see, this condition can be secured in a pleasingly simple way.


The formalization of the argument for Gödel’s First Theorem also requires
the derivability in PA of certain formulas involving Prov(y). For example, since PA
includes modus ponens we have: if F ⊃ G is derivable, then if F is derivable so is G.
(Number-theoretically, if Dvbl(cond(m, n)), then if Dvbl(m) then Dvbl(n).) We need
to be able to derive the formalization of this fact in PA. Thus we should like to have
the derivability in PA of the formula Prov(pF ⊃ Gq) ⊃ (Prov(pF q) ⊃ Prov(pGq))
for all formulas F and G. The derivability of such formulas is not assured by the
Adequacy Condition. We are thus led to consider what formal properties a formula
ought to have in order that the formula be a suitable formal expression of the
notion of derivability. Once we make this explicit, we shall be in a position to give
a perspicuous proof of Gödel’s Second Theorem.

We call a formula Prov(y) a standard provability predicate iff it obeys the fol-
lowing conditions:
Adequacy. For each formula F , if ` F then ` Prov(pF q).
Formal Modus Ponens. For all formulas F and G,
` Prov(pF ⊃ Gq) ⊃ (Prov(pF q) ⊃ Prov(pGq)).
Formal Adequacy. For each formula F ,
` Prov(pF q) ⊃ Prov(pProv(pF q)q).

The condition of Formal Adequacy amounts to the derivability in PA of a formal


expression of the fact that Prov(y) obeys Adequacy. For Adequacy states that if F is
derivable then so is the formula Prov(pF q). If the notion ‘is derivable’ is formalized
by Prov(y), the formalization of Adequacy is Prov(pF q) ⊃ Prov(pProv(pF q)q).
Note. Not every standard provability predicate can be thought of as expressing
provability. E.g., any formula that numeralwise represents the relation Form will
fulfill the three conditions. The natural formalization of derivability, however, is
what we have in mind in talking of a standard provability predicate. This natural
formalization is ∃xD(x, y), where D(x, y) numeralwise represents the relation Der.
As we saw at the end of §3.1, the Adequacy Condition follows at once. Formal
Modus Ponens requires that D(x, y) be a “good” formalization of the relation Der
in a sense that goes beyond numeralwise representation. To be specific, it requires
that D(x, y) can be used in a formal derivation of the elementary number-theoretic
fact that, for any i and j and any formulas F and G, if Der(i, γ(F ⊃ G)) and
Der(j, γ(F )), then there exists an integer k such that Der(k, γ(G)) (in fact one can
take k = i ∗ j ∗ 2^γ(G)). Finally, to obtain Formal Adequacy we would have to be able
to use D(x, y) in a formalization of the proof of Adequacy, and so in a formalization
of the fact that D(x, y) numeralwise represents the relation Der. Although there
are many details here, the arithmetical reasoning needed is elementary (as can be
seen from the proof of the Representability Theorem in Chapter 4), and poses no
difficulty for formalization.
We do not include Faithfulness among the conditions for a standard provability
predicate because then the existence of a standard provability predicate would re-
quire a hypothesis like ω-consistency; whereas a formula Prov(y) fulfilling the three
conditions above can be shown to exist without any consistency-like suppositions.
One consequence of Adequacy and Formal Modus Ponens will be used often
enough below to merit special mention:

If ` F ⊃ G then ` Prov(pF q) ⊃ Prov(pGq).

For if ` F ⊃ G then ` Prov(pF ⊃ Gq) by Adequacy. By Formal Modus Ponens and


modus ponens we obtain ` Prov(pF q) ⊃ Prov(pGq). In what follows, we shall call
this principle ‘Quick FMP’.
Having considered what conditions a formal representation of derivability should
satisfy, let us now return to the question of a formal representation of the assertion
that PA is consistent. If Prov(y) is a standard provability predicate, we may use
it to formulate a simple formal expression of consistency. Namely, let (Con) be the
formula ∼Prov(pS0 = 0q). It is straightforward to show that (Con) obeys condition
(]). For let F be any formula. We have

` F ⊃ (∼F ⊃ S0 = 0)

by truth-functional logic. Applying Quick FMP we obtain

` Prov(pF q) ⊃ Prov(p∼F ⊃ S0 = 0q)

From Formal Modus Ponens and truth-functional logic we have

` Prov(pF q) ⊃ (Prov(p∼F q) ⊃ Prov(pS0 = 0q))


so by truth-functional equivalence we then have

` Prov(pF q) ∧ Prov(p∼F q) ⊃ Prov(pS0 = 0q)


Condition (]) is just the contrapositive of this formula.
The formula (Con) also has the following amusing property: Let F be any
formula; then ` ∼Prov(pF q) ⊃ (Con). (See the Exercises.) The amusing property
gives further support to the notion that (Con) is an appropriate formalization of
the statement that PA is consistent, since it shows that the formalization of the
elementary fact that if there is an underivable formula then PA is consistent, is
itself derivable.
Fully elaborated, then, Gödel’s Second Incompleteness Theorem reads as fol-
lows:

Gödel’s Second Incompleteness Theorem Let (Con) be a sentence


of LPA that fulfills condition (]), where Prov(y) is a standard provability
predicate. Then if PA is consistent, (Con) is not derivable in PA.

Proof. Let G be any fixed point of ∼Prov(y). Thus ` G ≡ ∼Prov(pGq). As we


saw in the previous section, it follows from the Adequacy of Prov(y) that if PA is
consistent then G is not derivable. We also have:

(1) ` Prov(pGq) ⊃ ∼G by the specification of G


(2) ` Prov(pProv(pGq)q) ⊃ Prov(p∼Gq) by Quick FMP
(3) ` Prov(pGq) ⊃ Prov(pProv(pGq)q) by Formal Adequacy
(4) ` Prov(pGq) ⊃ Prov(p∼Gq) by t-f logic, from (2) and (3)
(5) ` Prov(pGq) ⊃ ∼(Con) by t-f logic, from (4) and (])
(6) ` (Con) ⊃ ∼Prov(pGq) by t-f logic, from (5)
(7) ` (Con) ⊃ G by t-f logic, from (6) and
the specification of G.

From (7) it follows that if PA is consistent then (Con) is not derivable. For if (Con)
were derivable then, by modus ponens, G would be derivable, and so PA would be
inconsistent. 
At the end of the previous section we showed the following: let Prov(y) fulfill
Adequacy, and let G be a fixed point of ∼Prov(y); then if G is derivable, PA is in-
consistent. The foregoing proof of Gödel’s Second Theorem is a direct formalization
of the proof we gave.
Gödel’s Second Theorem differs from the First Theorem in the subtlety of what
it says. For much hinges on what we take to be a formal statement of the consistency
of the system. We have required (and used in the proof) that the formal consistency
statement have property (]), and that Prov(y) be a standard provability predicate.
A simple example shows that some such subtlety is needed. Let Prov′(y) be the
formula Prov(y) ∧ y ≠ pS0 = 0q, where Prov(y) is a standard provability predicate.
Then Prov′(y) fulfills the Adequacy Condition. For if ` F , then ` Prov(pF q), and
moreover either F is not the formula S0 = 0, in which case ` ∼pF q = pS0 = 0q,
whence ` Prov′(pF q), or else F is the formula S0 = 0, in which case PA is incon-
sistent, whence again ` Prov′(pF q). But if we take (Con) to be ∼Prov′(pS0 = 0q),
then clearly (Con) is derivable in PA. The hitch here, of course, is that this formula
(Con) does not have property (]). This shows that Prov′(y) is not standard: in fact,
it fails to fulfill Formal Modus Ponens (although it does fulfill Formal Adequacy).
There are more complicated examples: we can construct formulas (Con) that
do possess property (]), but such that the predicate Prov(y) mentioned in (]) is not
standard (lacks, say, Formal Adequacy); and such that (Con) is derivable in PA. Of
course, we want to say that any such formula (Con) is not, in any intuitive sense, a
formal statement of the consistency of PA, because the predicate Prov(y) is not a
good formalization of derivability. As has been pointed out, Formal Modus Ponens
and Formal Adequacy are simply the formalizations of assertions about derivability
which can be shown by elementary metamathematical arguments. And for a predi-
cate to qualify as a good formalization of derivability, it must be usable in derivations
that formalize elementary metamathematical arguments about derivability.

3.3 The First Incompleteness Theorem Sharpened


Gödel’s proof of incompleteness requires the hypothesis of ω-consistency. In the most
streamlined rendition of this proof, as we saw in Section 3.1 above, ω-consistency is
used to show the faithfulness of a provability predicate; and faithfulness is used to
show the underivability of the negation of the Gödel sentence. In 1936 J. Barkley
Rosser sharpened Gödel’s result by showing how to prove incompleteness using only
consistency, not ω-consistency, as a hypothesis. Rosser’s basic idea was to replace
the Faithfulness Condition with another one, which we shall call the Rosser Condi-
tion. There are two desiderata: a provability predicate that fulfills the Adequacy
and Rosser Conditions should be usable in a fixed-point argument to establish the
incompleteness of PA (relying only on consistency); and the existence of such a
provability predicate should be demonstrable without recourse to ω-consistency.

Rosser’s Lemma. Suppose a formula Rov(y) fulfills the following two


conditions for all formulas F:
Adequacy. If ` F then ` Rov(pF q)
Rosser. If ` ∼F then ` ∼Rov(pF q).
Let R be a fixed point of ∼Rov(y). If PA is consistent, then neither R
nor ∼R is derivable.

Proof. Suppose ` R; by Adequacy, ` Rov(pRq); since ` R ≡ ∼Rov(pRq), ` ∼R;


thus PA is inconsistent. Suppose ` ∼R; by Rosser, ` ∼Rov(pRq); since ` R ≡
∼Rov(pRq), ` R; thus PA is inconsistent. 
Note that no formula Rov(y) that fulfills the Rosser Condition can be a standard
provability predicate. For, since ` ∼S0 = 0, that condition yields ` ∼Rov(pS0 = 0q); were Rov(y) standard, the consistency statement formed from it would thus be derivable, contrary to the Second Theorem (assuming PA is consistent).
To prove the Theorem it now suffices to show the existence of a suitable formula
Rov(y). Here Rosser came up with a most ingenious and fertile idea. A refutation
of a formula F is a derivation of ∼F . We formalize the following notion, which we
dub ‘rovability’: a formula F is rovable iff there is a derivation of F but no smaller
refutation of F (where ‘smaller’ means simply: with smaller gödel number). For-
malizations of the following heuristic arguments will then yield the two conditions,
as we shall see. First, suppose F is derivable; thus there is a derivation of F . Check
all the (finitely many) smaller derivations that there are, and you’ll see (provided
PA is consistent) that none is a refutation of F . Hence F is rovable. Second, sup-
pose ∼F is derivable; thus there is a refutation of F . No smaller derivation is one
of F (provided PA is consistent), and this can be checked. Hence if there is some
derivation of F , then there is a smaller refutation of F (namely, the given one).
Hence F is not rovable.
To formalize this we shall need two properties of the formula x ≤ y defined in
§1.6:

(a) for every integer k, ` x ≤ k ∨ k ≤ x;


(b) for each formula F (x) and each integer k, if ` F (j) for j = 0, 1, ..., k,
then ` x ≤ k ⊃ F (x).

Property (a) follows by instantiation from the last law given in §1.6. For prop-
erty (b), note first that from ` F (j) we obtain ` x = j ⊃ F (x), by the laws of identity. Hence
property (b) will follow if we have, for each number k, ` x ≤ k ⊃ x = 0 ∨ x =
1 ∨ ... ∨ x = k. The proof of this is left to the reader. (See the Exercises.)
We may now define the formula Rov(y). Let Ref be the two-place primitive
recursive relation defined by Ref(k, n) ↔ Der(k, neg(n)), and let D(x, y) and R(x, y)
numeralwise represent the relations Der and Ref, respectively. Let B(x, y) be

D(x, y) ∧ ∀z(z ≤ x ⊃ ∼R(z, y))

and, finally, let Rov(y) be (∃x)B(x, y).


(1) Rov(y) fulfills the Adequacy Condition.
For suppose ` F . Then for some k, Der(k, γ(F )), so that ` D(k, pF q). Now, if
PA is consistent, then not Ref(j, γ(F )) for each j ≤ k. Hence ` ∼R(j, pF q) for j =
0, 1, ..., k. By (b) above and universal generalization, we then have ` ∀z(z ≤ k ⊃
∼R(z, pF q)). Thus

` D(k, pF q) ∧ ∀z(z ≤ k ⊃ ∼R(z, pF q))

from which ` Rov(pF q) follows by existential generalization. If, on the other hand,
PA is inconsistent, then every formula is derivable, so in particular ` Rov(pF q).
(2) Rov(y) fulfills the Rosser Condition.

For suppose ` ∼F . Thus there is a k such that Ref(k, γ(F )), so that `
R(k, pF q). Hence ` k ≤ x ⊃ (∃z)(z ≤ x ∧ R(z, pF q)), or, equivalently,

` k ≤ x ⊃ ∼∀z(z ≤ x ⊃ ∼R(z, pF q))

Now if PA is consistent then not Der(j, γ(F )) for each j ≤ k. Hence ` ∼D(j, pF q) for
j = 0, 1, ..., k. By (b) above,

` x ≤ k ⊃ ∼D(x, pF q)

By (a) above and modus ponens, we obtain

` ∼D(x, pF q) ∨ ∼∀z(z ≤ x ⊃ ∼R(z, pF q))

that is, ` ∼[D(x, pF q) ∧ ∀z(z ≤ x ⊃ ∼R(z, pF q))]. Hence, by universal generaliza-
tion, ` ∀x∼[D(x, pF q) ∧ ∀z(z ≤ x ⊃ ∼R(z, pF q))]. That is, ` ∼Rov(pF q).
Rosser’s Lemma, (1) and (2) then yield

Rosser’s Sharpened Incompleteness Theorem. If PA is consistent


then PA is not syntactically complete.

Rosser’s Theorem has several interesting consequences. It can be used to show,


on the assumption that PA is consistent, the existence of infinitely many logically
inequivalent formally undecidable sentences in PA. (A sentence F is formally un-
decidable in PA iff F is neither derivable nor refutable in PA.) The argument goes
thus. Assume PA consistent. Let R1 be the Rosser sentence for PA; that is, the
fixed point of ∼Rov(y). Thus R1 is formally undecidable in PA. Now let PA1 be
the formal system obtained from PA by adding the sentence R1 as a new axiom.
Then PA1 is consistent (else ∼R1 would be derivable in PA, so PA itself would be
inconsistent). We may repeat Rosser’s argument for PA1 : that is, let R2 be a fixed
point of ∼Rov1 (y), where Rov1 (y) is a rovability predicate for the formal system
PA1 . Then R2 is formally undecidable in PA1 . A fortiori, R2 is formally undecidable
in PA. Now let PA2 be the formal system obtained from PA1 by adding R2 as a
new axiom. Again, PA2 is consistent. And the Rosser sentence R3 for the formal
system PA2 is formally undecidable in PA2 ; a fortiori it is formally undecidable in
PA. And so on.
This argument would not work if all we had was Gödel’s original Theorem,
and used Gödel sentences instead of Rosser sentences—even if we assume the ω-
consistency of PA. For to carry the argument through we would need to require
that PA1 is ω-consistent; and this does not follow from the ω-consistency of PA.
That is, the above argument exploits the fact that consistency is “inherited” in
passing from PAi to PAi+1 . Any analogous argument using Gödel’s Theorem would
have to show that ω-consistency is inherited; and this is not so easy. (We could
show this if we adopt a semantical position and assume PA is sound. For the fact
that the Gödel sentence is true in the intended interpretation allows us to conclude
that the system like PA but including the Gödel sentence as a new axiom is also
sound, and hence ω-consistent.)

3.4 Löb’s Theorem


In 1952 Leon Henkin asked the following question: What about the formula that
“says” “I am provable”? That is, let Prov(y) be a standard provability predicate,
and suppose H is a fixed point of Prov(y), so that ` H ≡ Prov(pHq). Is this formula
H derivable? No heuristic indicates an answer: for if H is derivable then what H
“says” is true, so there is no conflict with soundness; and if H is not derivable, then
what H “says” is false, so again there is no conflict. But in 1955 M.H. Löb settled
the matter by an ingenious, and not straightforwardly intuitive, proof.

Löb’s Theorem. Let F be a formula such that ` Prov(pF q) ⊃ F .


Then ` F .

Henkin’s question is thereby answered: the formula that “says” “I am provable”


is indeed provable. For if ` H ≡ Prov(pHq), then by truth-functional logic `
Prov(pHq) ⊃ H, so by Löb’s Theorem, ` H. The interest of Löb’s Theorem goes
further. As we shall see, it can be exploited to yield much new information about
derivability.
Now the formula Prov(pF q) ⊃ F can be construed as a formalization (or as
close to it as we can get) of the assertion that if F is derivable then F is true.
Thus it is a formalization of something we would like to believe, namely soundness.
Löb thus shows that soundness for a formula F is derivable only if F is derivable.
Conversely, if ` F then, by truth-functional logic, also ` Prov(pF q) ⊃ F . Thus we
get the derivability of Prov(pF q) ⊃ F , trivially, only in the case F is itself derivable.
As Rohit Parikh quipped, “PA couldn’t be more modest about its own veracity.”
Before presenting Löb’s proof, we give an informal heuristic, formulated by
Henkin after he read Löb’s proof. The heuristic argument uses the notion of truth
freely—that is what leads to the paradoxical nature of the conclusion. The actual
proof of Löb’s Theorem is more or less what is obtained by formalizing the heuristic,
after ‘is true’ is replaced by ‘is derivable’. Let Leon be the sentence: if Leon is true
then all students are above average. Suppose that Leon is true; so if Leon is true
then all students are above average; so by modus ponens, all students are above
average. In the preceding sentence we have shown, on the supposition that Leon is
true, that all students are above average. That is, we have shown: If Leon is true,
then all students are above average. Thus we have shown Leon is true. And then,
by modus ponens, all students are above average. Paradox!
The rigorous proof of Löb’s Theorem follows. Let F be a formula such that
` Prov(pF q) ⊃ F . We wish to show that ` F . Let H be a fixed point of the formula
Prov(y) ⊃ F . That is,
` H ≡ (Prov(pHq) ⊃ F ).

The argument that follows contains several truth-functional steps; to highlight the
truth-functional forms involved, we shall abbreviate the formula Prov(pHq) ⊃ F by
J and the formula Prov(pProv(pHq)q) by K.

(1) `H⊃J by the specification of H


(2) ` Prov(pHq) ⊃ Prov(pJq) by Quick FMP
(3) ` Prov(pJq) ⊃ (K ⊃ Prov(pF q)) by Formal Modus Ponens
(4) ` Prov(pHq) ⊃ K by Formal Adequacy
(5) ` Prov(pHq) ⊃ Prov(pF q) by t-f logic, from (2), (3) and (4)
(6) ` Prov(pF q) ⊃ F by supposition
(7) ` Prov(pHq) ⊃ F by t-f logic, from (5) and (6)
(8) `H by (7) and the specification of H
(9) ` Prov(pHq) by Adequacy
(10) `F t-f logic, from (7) and (9). 

Löb’s Theorem has many applications, and can be used to show both derivabil-
ities and underivabilities. For example, as was first pointed out by Georg Kreisel,
Gödel’s Second Theorem is an easy corollary of Löb’s. For suppose ` (Con). Recall
that (Con) is ∼Prov(pS0 = 0q). By truth-functional logic, ` Prov(pS0 = 0q) ⊃ H
for any formula H; in particular, ` Prov(pS0 = 0q) ⊃ S0 = 0. And then, by Löb’s
Theorem, ` S0 = 0, so that PA is inconsistent. Another example concerns the
Rosser sentence R. In the Exercises, we have seen that ` G ⊃ (Con), where G is the
Gödel sentence; using Löb’s Theorem, we can show the same does not hold of R,
if PA is consistent. For suppose ` R ⊃ (Con). As we also saw in the Exercises,
` (Con) ⊃ ∼Prov(p∼Rq). Hence, by truth-functional logic, ` R ⊃ ∼Prov(p∼Rq).
By contraposition, ` Prov(p∼Rq) ⊃ ∼R. By Löb’s Theorem ` ∼R. Hence, by


Rosser’s Theorem, PA is inconsistent.
A simple example of using Löb’s Theorem to show a derivability is the following:
suppose A is a formula such that ` A ⊃ (Prov(pF q) ⊃ F ) for every F ; then A is
refutable. For the hypothesis implies that ` A ⊃ (Prov(p∼Aq) ⊃ ∼A). This truth-
functionally implies ` Prov(p∼Aq) ⊃ ∼A, whence ` ∼A.
A more elaborate example is this: suppose H and F are formulas such that `
Prov(pHq) ⊃ (Prov(pF q) ⊃ F ); then ` Prov(pHq) ⊃ F . For let K be Prov(pHq) ⊃
F . By Formal Modus Ponens, ` Prov(pKq) ∧ Prov(pProv(pHq)q) ⊃ Prov(pF q).
Since Prov(pHq) ⊃ Prov(pProv(pHq)q) and Prov(pHq) ⊃ (Prov(pF q) ⊃ F ) are
both derivable, by truth-functional logic ` Prov(pKq) ⊃ K, so that K is derivable.
This result may be applied to show that the formalized version of Löb’s Theorem is
derivable, that is, for each formula F

` Prov(pProv(pF q) ⊃ F q) ⊃ Prov(pF q)
Since Formal Modus Ponens gives us

` Prov(pProv(pF q) ⊃ F q) ⊃ (Prov(pProv(pF q)q) ⊃ Prov(pF q))

we can obtain the desired result by applying the result of the preceding paragraph.
Other applications of Löb’s Theorem are contained in the exercises. Here we
offer one more: we use Löb’s Theorem to show one way in which ω-consistency is a
rather more intricate notion than simple consistency. Define a sequence A0 , A1 , ... of
formulas thus: A0 is S0 = 0; A1 is Prov(pS0 = 0q); A2 is Prov(pProv(pS0 = 0q)q);
and, in general, Ai+1 is Prov(pAi q). We assume Prov(y) has the form ∃xD(x, y).
(Note that ∼A1 is just the formula (Con).)
(1) ` Ai ⊃ Ai+1 for each i.
For i = 0 this follows since ` ∼S0 = 0. For i > 0 this follows from Formal Adequacy.
(2) If PA is consistent then no Ai for i > 0 is refutable in PA.
Were ∼Ai derivable in PA for i > 0, then by (1), ∼A1 would be derivable in PA.
That is, ` (Con). But then, by Gödel’s Second Theorem, PA would be inconsistent.
(3) If PA is ω-consistent then no Ai is derivable in PA.
The proof is by induction on i. For i = 0 this follows just from the consistency of PA.
Suppose Ai is not derivable. Then, by numeralwise expressibility, ` ∼D(n, pAi q)
for each number n. But Ai+1 is just (∃x)D(x, pAi q). Hence, were Ai+1 derivable,
PA would be ω-inconsistent.
(4) The system obtained by adding any Ai as a new axiom to PA is ω-inconsistent.

This is immediate from the proof of (3).


(5) If PA is ω-consistent then Ai+1 ⊃ Ai is not derivable for any i.
This follows at once from (3) and Löb’s Theorem.
We have shown the existence of a sequence of formulas, each of which is strictly
weaker than the one that precedes it (in the sense that it is provably implied by
the one that precedes it, but does not provably imply it) but each of which is ω-
inconsistent with PA. Contrast the case of simple consistency: if F and G are each
inconsistent with PA, then F and G are both refutable in PA, and hence ` F ≡ G.
Chapter 4

Formalizing Primitive Recursion

4.1 ∆0 , Σ1 , and Π1 formulas


In this chapter we show that all primitive recursive functions and relations are
numeralwise representable in PA. In fact we show that the numeralwise represen-
tations may be taken to be formulas of a restricted syntactic form. This will have
the consequence that the sentences that have been of interest to us in the preceding
two chapters, like the Gödel sentence, the Rosser sentence, and (Con), may also be
taken to be of restricted syntactic forms.
The ∆0 -formulas of LPA are those in which all quantifiers are bounded. More
precisely: every atomic formula is a ∆0 -formula; if F and G are ∆0 -formulas, then
so are ∼F and F ⊃ G; if F is a ∆0 -formula, v a variable, and t a term not containing
v, then ∀v(v ≤ t ⊃ F ) is a ∆0 -formula.
(Recall that LPA does not contain the existential quantifier; we have used ∃v
as metamathematical shorthand for ∼∀v∼. So ∆0 -formulas can express bounded
existential quantifiers, since ∃v(v ≤ t ∧ F ) is shorthand for ∼∀v∼(v ≤ t ∧ F ), which
is logically equivalent to ∼∀v(v ≤ t ⊃ ∼F ).)
The numerical value of a closed term, that is, a term without variables, is
defined in the obvious way: the numerical value of 0 is 0, the numerical value of St
is one more than the numerical value of t, and the numerical values of terms s + t
and s × t are the sum and the product, respectively, of the numerical values of s and
of t. Now the property of truth for ∆0 -sentences may be defined by induction on the
construction of the sentence: an atomic sentence s = t is true iff the numerical value
of s is equal to the numerical value of t; F ⊃ G is true iff F is false or G is true;
∼F is true iff F is not true; and ∀v(v ≤ t ⊃ F (v)) is true iff F (0), F (1), . . . , F (k)

are all true, where k is the numerical value of t. (Since here ∀v(v ≤ t ⊃ F (v)) is a
sentence, the term t must be a closed term.) The property of truth for ∆0 -sentences
is primitive recursive; and for any ∆0 -sentence F , if F is true then ` F , and if F is
not true, then ` ∼F (see the Exercises). Thus PA is ∆0 -complete.
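The following is a minimal computational sketch (not from the text; the tuple encoding of terms and formulas, and the function names nv, numeral, subst, and true_delta0, are hypothetical stand-ins for the book's gödel-numbered syntax) of the recursions just described: the numerical value of a closed term, and truth for ∆0 -sentences with the bounded quantifier ∀v(v ≤ t ⊃ F ) treated as a single construct.

```python
# Terms: '0', ('S', t), ('+', s, t), ('*', s, t); variables are strings like 'x'.
# Formulas: ('=', s, t), ('not', F), ('imp', F, G), and ('all', v, t, F),
# the last standing for the bounded quantifier "for all v <= t, F".

def nv(t):
    """Numerical value of a closed term."""
    if t == '0':
        return 0
    if t[0] == 'S':
        return nv(t[1]) + 1
    if t[0] == '+':
        return nv(t[1]) + nv(t[2])
    if t[0] == '*':
        return nv(t[1]) * nv(t[2])
    raise ValueError(t)

def numeral(n):
    """The formal numeral S...S0 denoting n."""
    t = '0'
    for _ in range(n):
        t = ('S', t)
    return t

def subst(e, v, t):
    """Replace the variable v by the term t throughout the expression e."""
    if e == v:
        return t
    if isinstance(e, tuple):
        return tuple(subst(part, v, t) for part in e)
    return e

def true_delta0(f):
    """Truth of a Delta_0 sentence, by recursion on its construction."""
    if f[0] == '=':
        return nv(f[1]) == nv(f[2])
    if f[0] == 'not':
        return not true_delta0(f[1])
    if f[0] == 'imp':
        return (not true_delta0(f[1])) or true_delta0(f[2])
    if f[0] == 'all':                          # for all v <= bound, body
        _, v, bound, body = f
        return all(true_delta0(subst(body, v, numeral(j)))
                   for j in range(nv(bound) + 1))
    raise ValueError(f)

# Example: the true Delta_0 sentence "forall x (x <= SS0 -> x + x = x * SS0)".
two = numeral(2)
sentence = ('all', 'x', two, ('=', ('+', 'x', 'x'), ('*', 'x', two)))
assert true_delta0(sentence)
```

Because every quantifier is bounded, the procedure always terminates; that is the informal content of the claim, made above, that truth for ∆0 -sentences is primitive recursive.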
A Σ1 -formula is a formula ∃v1 . . . ∃vn F , where F is ∆0 . That is, a Σ1 -formula
may contain unbounded quantifiers, but they must all be existential quantifiers and
must govern the rest of the formula. Symmetrically, a Π1 -formula is a formula
∀v1 . . . ∀vn F , where F is ∆0 ; here all the unbounded quantifiers are universal. (In
these, we also let n = 0, so that every ∆0 -formula counts as both a Σ1 -formula and
a Π1 -formula.)
Some logical equivalences should be remarked on at once. The negation of a
Σ1 -formula is equivalent to a Π1 -formula, and vice-versa. The conjunction of two
Σ1 -formulas is logically equivalent to a Σ1 -formula, as is the disjunction of two
Σ1 -formulas. And the conjunction or disjunction of two Π1 -formulas is logically
equivalent to a Π1 -formula. Hence we shall often speak somewhat loosely, for ex-
ample, of a conjunction of Σ1 -formulas as though it were itself a Σ1 -formula, and a
negation of a Σ1 -formula as though it were itself a Π1 -formula.
In §4.3 we shall prove the

Strengthened Representability Theorem Every primitive recursive


function is numeralwise representable in PA by a Σ1 -formula.

It follows from the Theorem that every primitive recursive relation is numeralwise
representable in PA both by a Σ1 -formula and by a Π1 -formula. For suppose R is a
primitive recursive relation, say, of two arguments. By definition, the characteristic
function χ of R is primitive recursive, so there is a Σ1 -formula F (x, y, z) that numer-
alwise represents χ. The formula F (x, y, S0) is Σ1 and the formula ∼F (x, y, 0) is Π1 .
Both formulas, we claim, numeralwise represent R. For if R(k, n) then χ(k, n) = 1,
so that ` F (k, n, z) ≡ z = S0, and consequently ` F (k, n, S0) and ` ∼F (k, n, 0).
And if not R(k, n), then χ(k, n) = 0, so that ` F (k, n, z) ≡ z = 0, and consequently
` ∼F (k, n, S0) and ` F (k, n, 0).
Let us now look at the formulas we considered in Chapters 2 and 3. Let us
assume that we take those formulas to be of the least complexity possible.
The Gödel sentence is Π1 . As we originally defined it in §2.4 the Gödel sentence
is obtained from a numeralwise representation of the primitive recursive relation
Der(k, diag(n)) by universally quantifying one free variable and replacing the other
free variable with a numeral. Since the numeralwise repesentation may be taken to
be Π1 , the Gödel sentence will also be Π1 .

The natural provability predicate Prov(y) is Σ1 , since it is the existential quan-


tification of a formula that numeralwise represents the primitive recursive relation
Der(k, n).
The proof of the Fixed Point Lemma given in §3.1 shows that there exists a
fixed point of a formula F (y) of the form ∀z(∆(p, z) ⊃ F (z)) where ∆(y, z) is a
representation of the diag function. If we take ∆(y, z) to be Σ1 and if F (y) is Π1 ,
then the fixed point will also be Π1 . Thus if we obtain the Gödel sentence in the
manner of Chapter 3, by applying the Fixed Point Lemma to ∼Prov(y), once again
it will be Π1 . The Rosser sentence is also Π1 , since it is a fixed point of ∼Rov(y),
which is Π1 .
The formula (Con) is Π1 , since it is ∼Prov(pS0 = 0q) and Prov(y) is Σ1 .

4.2 Σ1 -completeness and Σ1 -soundness


A Σ1 -sentence ∃v1 . . . ∃vm F (v1 , . . . , vm ), where F (v1 , . . . , vm ) is ∆0 , is true iff there
exist numbers k1 , . . . , km such that F (k1 , . . . , km ) is true. If this condition holds
then ` F (k1 , . . . , km ) by ∆0 -completeness, so ` ∃v1 . . . ∃vm F (v1 , . . . , vm ) by existen-
tial generalization. Thus PA is Σ1 -complete, in the sense that all true Σ1 sentences
are derivable.
(We do not have Σ1 -completeness in any stronger sense: a false Σ1 -sentence
may fail to be refutable. Indeed, we have just seen that the Gödel sentence G is Π1 ;
hence ∼G is a false Σ1 -sentence that is not refutable, assuming PA is consistent.)
The converse property to Σ1 -completeness is called Σ1 -soundness: every deriv-
able Σ1 -sentence is true. Σ1 -soundness implies that if ∃xF (x) is a derivable Σ1 -
sentence then some numerical instance F (k) is derivable. For if ∃xF (x) is true,
then F (k) is true for some k, so that ` F (k) by Σ1 -completeness. Note also that
Σ1 -soundness implies consistency.
Since the Gödel sentence can be taken to be Π1 , Σ1 -soundness can be used
instead of ω-consistency to show that it is not refutable. (Recall that consistency is
sufficient to show that it is not derivable.) For if ∀xA(x, p) is the Gödel sentence and
is Π1 , then ∼∀xA(x, p) is equivalent to a Σ1 -sentence ∃x∼A(x, p). Were it derivable
then by Σ1 -soundness there would be a k such that ∼A(k, p) is derivable, but as we
showed ` A(k, p) for each k, so that we would have a violation of consistency.
Σ1 -soundness also implies the faithfulness of the natural provability predi-
cate ∃xD(x, y) where D(x, y) numeralwise represents Der(k, n) and is Σ1 . For if
` Prov(pF q) then by Σ1 -soundness there is a k such that ` D(k, pF q); for this k
we have Der(k, γ(F )), so that by mirroring ` F . Thus Σ1 -soundness may be used
in the streamlined proof of Gödel’s First Theorem given at the end of §3.1.
From Σ1 -soundness we may also infer that if we add an irrefutable Π1 -sentence
as a new axiom, no Σ1 -sentences that were previously underivable become derivable,
so that the extended system remains Σ1 -sound. For let J be the Π1 -sentence, and let
K be a Σ1 -sentence derivable in the expanded system. By the Deduction Theorem
(see Exercise 2.?), J ⊃ K is derivable in PA. J ⊃ K is equivalent to a Σ1 -sentence,
so by the Σ1 -soundness of PA it is true. Since J is irrefutable, ∼J is not true; hence
K is true. Since PA is Σ1 -complete, ` K.
Finally, we note that Σ1 -soundness is equivalent to the restriction of ω-consistency
to Σ1 -sentences: there is no Σ1 -sentence ∃xF (x) such that ` ∃xF (x) and also
` ∼F (n) for every n. This property is called 1-consistency. Suppose PA is Σ1 -
sound and ` ∃xF (x), where ∃xF (x) is Σ1 . Then ` F (k) for some k, as we noted
above, and ∼F (k) is not derivable, by consistency. (For the implication in the other
direction, from 1-consistency to Σ1 -soundness, see the Exercises.)
If PA is Σ1 -sound, a Σ1 -sentence is derivable only if some numerical instance of
it is derivable. This does not hold of more complex existential sentences, assuming
PA is consistent. For example, let F (x) be the formula (x = 0 ∧ R ∨ x = S0 ∧ ∼R),
where R is the Rosser sentence. Since ` R ∨ ∼R, we also have ` ∃xF (x). But F (k)
is not derivable for any k. For if k is greater than 1 then ` ∼F (k), while ` F (0) ⊃ R
and ` F (S0) ⊃ ∼R. Rosser’s Theorem tells us that neither ` R nor ` ∼R. Hence
F (k) is not derivable for any k.
As we’ve noted, Σ1 -soundness implies consistency. The converse is not true. In
general there are systems that are consistent but not Σ1 -sound: for example, take
the system obtained by adding the negation of the Gödel sentence as an additional
axiom to PA. But even just restricted to PA there is no implication: it can be
shown that if F formalizes the Σ1 -soundness of PA, then not ` (Con) ⊃ F . (See
the Exercises.)
Σ1 -completeness, in contrast, is an entirely elementary property. It follows
from provabilities that can be established by elementary means in PA. This has the
consequence that formalized Σ1 -completeness is derivable in PA: for any Σ1 -sentence
F,
` F ⊃ Prov(pF q)
Formal Adequacy is a particular case of formalized Σ1 -completeness, since Prov(pF q)
is a Σ1 -sentence. (More details on formalized Σ1 -completeness are contained in the
Appendix, §6.)
From ∆0 - and Σ1 -completeness, we see that Gödel proved a best possible result.
The simplest formula that could possibly be true but not provable in PA would be
Π1 ; and that’s exactly what the Gödel sentence is.

4.3 Proof of Representability


To show that every primitive recursive function is numeralwise representable we
need to show three things: that the basic functions (the constant functions, the
identity function, and the successor function) are numeralwise representable; that
any function defined by composition from representable functions is itself repre-
sentable; and that any function defined by recursion from representable functions
is itself representable. As we shall see, the first two are easy, even when we add
the requirement that the representations be Σ1 . The third, however, requires some
work. To prove it, we must convert recursive definitions into explicit definitions that
can be formalized in LPA .
Consider a representative case: a 2-place function ϕ defined by
ϕ(n, 0) = θ(n)
ϕ(n, k + 1) = ψ(n, k, ϕ(n, k))
where θ and ψ are functions known to be representable. The definition tells us
how to compute ϕ(n, k) for any n and k, in a stepwise fashion. First we obtain
ϕ(n, 0) by computing θ(n); call this number j0 . Next we obtain ϕ(n, 1) by computing
ψ(n, 0, j0 ); call the result j1 . To find ϕ(n, 2) we compute ψ(n, 1, j1 ); call the result
j2 . To find ϕ(n, 3) we compute ψ(n, 2, j2 ). And so on. The integer jk obtained by
this process will be the value of ϕ(n, k). Thus
ϕ(n, k) = p iff there exists a sequence ⟨j0 , . . . , jk ⟩ of integers such that
j0 = θ(n), ji+1 = ψ(n, i, ji ) for each i < k, and jk = p.
This is an explicit definition: ϕ does not occur on the right side of the biconditional.
To formalize it in LPA , we must be able to deal formally with finite sequences. More
precisely, we need to formalize the operation of extracting the members of a sequence
from an integer that encodes the sequence. (In Chapter 2 we saw how to encode
sequences using products of prime powers. That is of no help here, since we first
need to deal with sequences in order to show that exponentiation is representable.)
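Before turning to the Lemma, here is a small illustration (a hypothetical sketch, not part of the text; θ and ψ are arbitrary given functions) of the stepwise computation behind the explicit definition.

```python
# phi(n, k) is the last entry j_k of the sequence <j_0, ..., j_k> obtained by
# iterating psi, starting from j_0 = theta(n).

def phi_by_sequence(theta, psi, n, k):
    seq = [theta(n)]                      # j_0 = theta(n)
    for i in range(k):
        seq.append(psi(n, i, seq[i]))     # j_{i+1} = psi(n, i, j_i)
    return seq[k]

# With theta(n) = 1 and psi(n, i, j) = j * n these are the recursion equations
# for exponentiation, so phi(3, 4) should equal 3**4:
assert phi_by_sequence(lambda n: 1, lambda n, i, j: j * n, 3, 4) == 81
```

The point of the Lemma below is precisely that the finite sequence ⟨j0 , . . . , jk ⟩ appearing in such a computation can be coded by a single integer, in a manner that can be expressed in LPA .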
The following Lemma, sometimes called “Gödel’s β-function Lemma”, enables us to
do this.

Sequence Encoding Lemma. There exists a two-place primitive re-


cursive function β such that
(a) for any k and any sequence ⟨j0 , . . . , jk ⟩ of integers, there exists an
integer s such that β(s, i) = ji for each i, 0 ≤ i ≤ k;
(b) β is numeralwise representable in PA by a ∆0 -formula B(x, y, z) such
that ` B(x, y, z) ∧ B(x, y, z′) ⊃ z = z′.

To prove the Lemma, we start by defining the pairing function π:


π(p, q) = (1/2) · (p + q) · (p + q + 1) + q
π maps pairs of integers one-one and onto the integers. To see this, given any r, let
n be the largest number such that (1/2) · n · (n + 1) ≤ r. Then r − (1/2) · n · (n + 1) ≤ n,
and r = π(p, q) for q = r − (1/2) · n · (n + 1) and p = n − q, and for this pair p, q only.
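A small computational sketch (hypothetical code, not part of the text) of π and of the inverse just described:

```python
# The pairing function pi and its inverse, following the recipe above:
# given r, find the largest n with n(n+1)/2 <= r; then q = r - n(n+1)/2, p = n - q.

def pi(p, q):
    return (p + q) * (p + q + 1) // 2 + q

def unpair(r):
    n = 0
    while (n + 1) * (n + 2) // 2 <= r:    # largest n with n(n+1)/2 <= r
        n += 1
    q = r - n * (n + 1) // 2
    return n - q, q

# pi is one-one and onto: every code decodes to the unique pair that produced it.
assert all(unpair(pi(p, q)) == (p, q) for p in range(30) for q in range(30))
```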
We shall encode sequences by a pair of numbers p, q, and then take π(p, q) as
the number s in condition (a). Given a sequence ⟨j0 , . . . , jk ⟩ let p be any prime that
is both larger than k and larger than all the ji , and let q be

0 + p · j0 + p^2 · 1 + p^3 · j1 + p^4 · 2 + p^5 · j2 + . . . + p^{2k} · k + p^{2k+1} · jk .
(Thus, if q is written as a numeral in p-ary notation, it will have 2k + 2 digits:
counting from the right the odd places will be occupied by 0, 1, 2, ..., k and the even
places by j0 , j1 , j2 , . . . , jk .) Let Q(p, m) be the relation “p is a prime and m is a
power of p”; clearly Q(p, m) is numeralwise representable by a ∆0 formula. Let
R(s, i, j) be the relation
(∃m, n, p, q, r ≤ s)(s = π(p, q) & Q(p, m) & n < m^2 & q = n + m^2 · i +
m^2 · p · j + m^2 · p^2 · r)

We may then let β(s, i) = µjR(s, i, j). Condition (a) of the Lemma then follows,
for s = π(p, q), with p and q as above.
Now R(s, i, j) is also numeralwise representable by a ∆0 -formula; call it F (x, y, z).
Then let B(x, y, z) be F (x, y, z) ∧ ∀x′(x′ ≤ z ∧ F (x, y, x′) ⊃ z = x′). B(x, y, z) is clearly
∆0 and numeralwise represents β(s, i).
For condition (b) of the Lemma, note that B(x, y, z) ∧ B(x, y, z′) implies both
z′ ≤ z ⊃ z = z′ and z ≤ z′ ⊃ z′ = z. Since ` z ≤ z′ ∨ z′ ≤ z, (b) follows.
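To see condition (a) in action, here is a hedged computational sketch (not the text's formal development; the names encode and beta below are hypothetical, and the decoding simply reads off base-p digits, which agrees with β = µjR on properly encoded values).

```python
# Encode <j_0, ..., j_k> by s = pi(p, q): p is a prime larger than k and all j_i,
# and q has, in base p, the digits 0, j_0, 1, j_1, ..., k, j_k from the low-order end.

def pi(p, q):                                   # the pairing function above
    return (p + q) * (p + q + 1) // 2 + q

def unpair(r):
    n = 0
    while (n + 1) * (n + 2) // 2 <= r:
        n += 1
    q = r - n * (n + 1) // 2
    return n - q, q

def is_prime(p):
    return p > 1 and all(p % d for d in range(2, int(p ** 0.5) + 1))

def encode(js):
    k = len(js) - 1
    p = max([k] + js) + 1
    while not is_prime(p):                      # any prime beyond k and the j_i will do
        p += 1
    q = sum(i * p ** (2 * i) + j * p ** (2 * i + 1) for i, j in enumerate(js))
    return pi(p, q)

def beta(s, i):
    p, q = unpair(s)
    return (q // p ** (2 * i + 1)) % p          # the base-p digit of q in place 2i + 1

js = [7, 0, 5, 5, 2]
s = encode(js)
assert [beta(s, i) for i in range(len(js))] == js   # condition (a)
```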
The Sequence Encoding Lemma allows us to formalize the explicit definition of
ϕ given above: by using numeralwise representations of θ and ψ as well as B(x, y, z),
we could easily write a formula that expresses the following: there exists an s such
that β(s, 0) = θ(n), β(s, i + 1) = ψ(n, i, β(s, i)) for each i < k, and β(s, k) = p.
We would thereby obtain a numeralwise representation of ϕ. However, since we
want the representation to be Σ1 , a difficulty emerges: we may assume that the
representations θ and ψ are Σ1 , but the formalization of the clause “β(s, i + 1) =
ψ(n, i, β(s, i)) for each i < k” will contain a bounded universal quantifier governing
the Σ1 -formula representing ψ, and this would no longer be Σ1 .
To overcome this difficulty, we shall require a stronger notion of representation
by a Σ1 -formula. (For readability, in what follows, we are going to use u, v, and w
as if they were formal variables of LPA , and we will use v, w, z ≤ u as shorthand for
v ≤ u ∧ w ≤ u ∧ z ≤ u.)
A formula ∃uF (u, x, y, z) is an excellent representation of a 2-place function ϕ
iff F (u, x, y, z) is ∆0 and, for all n, k and p such that p = ϕ(n, k),

(i) there exists an integer q such that ` F (q, n, k, p);


(ii) ` ∃uF (u, n, k, z) ⊃ z = p.

Since ` ∃uF (u, n, k, p) follows from (i), an excellent representation ∃uF (u, x, y, z)
does indeed numeralwise represent ϕ. The additional requirements on excellent rep-
resentations are that the formula contain only one unbounded existential quantifier,
and that if p = ϕ(n, k) not just ∃uF (u, n, k, p) but also some numerical instance
F (q, n, k, p) is derivable. (As shown in the previous section, we could infer the
derivability of a numerical instance by invoking Σ1 soundness. But we do not want
to make the Representability Theorem dependent on such a metamathematical sup-
position. Instead, we will build the derivability of a numerical instance into the
construction of the representation.)
The definition of excellent representation for functions of one argument and of
more than two arguments is similar. We also will call a ∆0 formula that represents a
primitive recursive function an excellent representation. (To conform literally to the
definition above, we would need to add an existential quantifier binding a variable
that does not actually appear in the formula.)
We now show that every primitive recursive function has an excellent represen-
tation. The basic functions are all representable by atomic formulas of LPA , which
are, of course, ∆0 and so are excellent representations. For composition, let us take
a simple example, which is easily generalized. Suppose ζ is a 1-place function de-
fined by ζ(k) = ν(ξ(k)), and suppose ∃uN (u, x, y) and ∃uX(u, x, y) are excellent
representations of ν and ξ. Then let Z(u, x, y) be

∃v∃w∃z(v, w, z ≤ u ∧ X(v, x, z) ∧ N (w, z, y))



We claim ∃uZ(u, x, y) is an excellent representation of ζ. Clearly Z(u, x, y) is


∆0 . Let any k be given and let j = ξ(k), and p = ν(j), so that p = ζ(k). By
supposition, there are numbers q and r such that ` X(q, k, j) ∧ N (r, j, p). Let s be
the maximum of q, r, and j. Since ` q, r, j ≤ s, we have

` ∃v∃w∃z(v, w, z ≤ s ∧ X(v, k, z) ∧ N (w, z, p))

that is, ` Z(s, k, p). And we also have

` ∃uZ(u, k, y) ⊃ y = p
because ∃uZ(u, k, y) provably implies ∃u∃v∃z(X(u, k, z) ∧ N (v, z, y)), which by sup-
position provably implies ∃v∃z(z = j ∧ N (v, z, y)); by dint of the laws of identity this
provably implies ∃uN (u, j, y), which again by supposition provably implies y = p.
For recursion, let us consider the example of the 2-place function ϕ
defined from θ and ψ as at the beginning of this section. Let ∃uT (u, x, y) be an
excellent representation of θ, and ∃uR(u, v, x, y, z) be one of ψ. For readability, we
introduce more shorthand: we exploit (b) of the Sequence Encoding Lemma and use
(v)x as shorthand meant to express “the number z such that B(v, x, z)”. That is, a
formula F ((v)x ) is shorthand for ∃z(z ≤ v ∧ B(v, x, z) ∧ F (z)). Condition (b) assures
us that we can treat the shorthand exactly as if (v)x were a term of LPA : that is,
` F ((v)x ) ∧ (v)x = y ⊃ F (y). (See the Exercises.)
Let P (u, x, y, z) be

∃v(v ≤ u ∧ ∃u′(u′ ≤ u ∧ T (u′, x, (v)0 )) ∧ H(u, v, x, y) ∧ (v)y = z)

where H(u, v, x, y) is

∀w(Sw ≤ y ⊃ ∃u′′(u′′ ≤ u ∧ R(u′′, x, w, (v)w , (v)Sw )))


We claim that ∃uP (u, x, y, z) is an excellent representation of ϕ. Clearly P (u, x, y, z)
is ∆0 . Suppose any n and k are given, and let p = ϕ(n, k). Let j0 , . . . , jk be defined as
before, that is, j0 = θ(n) and ji+1 = ψ(n, i, ji ), and then let s be a number such
that β(s, i) = ji for each i ≤ k. Let q be such that ` T (q, n, j0 ), and for each
i < k, let qi be such that ` R(qi , n, i, ji , ji+1 ). Since ∃uT (u, x, y) is an excellent
representation of θ, and ∃uR(u, v, x, y, z) is an excellent representation of ψ there
will be such numbers q and q0 , . . . , qk−1 . Finally, let r be the maximum of s, q, and
the qi .
We show first that ` P (r, n, k, p). Now ` T (q, n, j0 ) and also ` (s)0 = j0 ;
hence ` ∃u′(u′ ≤ r ∧ T (u′, n, (s)0 )). For each i < k, we have ` (s)i = ji ∧ (s)i+1 =
ji+1 , so that ` qi ≤ r ∧ R(qi , n, i, (s)i , (s)i+1 ). Consequently ` ∀w(Sw ≤ k ⊃
∃u′′(u′′ ≤ r ∧ R(u′′, n, w, (s)w , (s)Sw ))), that is, ` H(r, s, n, k). And finally we have
` (s)k = p.
To complete the proof, we need to show ` ∃uP (u, n, k, z) ⊃ z = p. The
following argument may easily be formalized in PA. Suppose P (u, n, k, z). Let v be
such that ∃u′(u′ ≤ u ∧ T (u′, n, (v)0 )) ∧ H(u, v, n, k) ∧ (v)k = z. Since ∃uT (u, x, y) is an excellent
representation of θ and j0 = θ(n), we have (v)0 = j0 . Now, letting the variable w in
H(u, v, n, k) be 0, we have ∃u′′R(u′′, n, 0, (v)0 , (v)S0 ). Since ∃uR(u, v, x, y, z) is an
excellent representation of ψ and j1 = ψ(n, 0, j0 ), we have (v)S0 = j1 . Continuing
in this way, taking the variable w in H(u, v, n, k) to take successively the values
1, 2, . . . , k − 1, we obtain (v)i = ji for each i ≤ k. Since (v)k = z, and jk = p, we
obtain z = p.
Chapter 5

Formalized Semantics

5.1 Tarski’s Theorem


Tarski’s Theorem says that no sufficiently rich formal language contains its own
truth predicate. To see what this means we must first define the notion of truth
predicate. Let L be a formal language, and suppose we have fixed an intended
interpretation of L. This interpretation provides a notion of the truth of a sentence
of L; that is, we use “true” to mean true under this intended interpretation. To give
formal representation to this notion of truth, we suppose that L is gödelized, and
that L contains numerals, i.e., syntactic objects which under interpretation denote
the integers. A truth predicate for L is a formula Tr(y) such that, for each sentence
F of L, the biconditional
Tr(pF q) ≡ F
is true. (In this definition, Tr(y) may be a formula in L or a formula in some
language that extends L. In the latter case, by the truth of the biconditional we
mean its truth under the intended interpretation of this more inclusive language.)
This definition is motivated by the following philosophical observation of Tarski’s.
Suppose we wish to give an account of the notion is true, as applied to the sentences
of a language like English. As a minimal condition on any such account, we should
be able to derive from it all sentences

“Snow is white” is true iff snow is white

“Tucson is in Arizona” is true iff Tucson is in Arizona


“2 + 2 = 5” is true iff 2 + 2 = 5


and so on. After all, one might argue, it is biconditionals like this which give
us the sense of the notion of truth. So whatever else our account tells us about
truth, it had better yield those biconditionals. The biconditionals are sometimes
called Tarski paradigms, Tarski biconditionals, or T-sentences. We may phrase the
condition generally: the account must yield everything of the form

S is true iff p

where for p we substitute a sentence and for S a name of that sentence. So let us
consider the language LPA , with its intended interpretation. Tarski’s Theorem can
be obtained immediately from the Fixed Point Theorem. For suppose T (y) is any
formula of LPA with one free variable. By the Fixed Point Theorem applied to the
formula ∼T (y), there is a sentence H of LPA such that ` ∼T (pHq) ≡ H. Hence
by the soundness of PA, ∼T (pHq) ≡ H is true in the intended interpretation of
PA. But then T (y) cannot be a truth predicate for LPA ; it fails to act like a truth
predicate on the formula H.
Clearly the same argument can work more generally. Given a language L,
suppose Σ is a formal system that is sound for the intended interpretation of L
and in which the primitive recursive function diag can be numeralwise represented.
Then the Fixed Point Theorem may be proved, and as above we obtain, given any
purported truth predicate T (y), a sentence H on which T (y) fails to act like a truth
predicate. Note that, first, the sentence H is obtained constructively. That is, the
proof of the Fixed Point Theorem provides a method for constructing H, given T (y).
Note second that the failure of T (y) to act like a truth predicate on H is exhibited
proof-theoretically, by the derivability of a biconditional that contradicts Tarski’s
paradigm T (pHq) ≡ H. Of course, to obtain such a proof-theoretic counterexample,
we have to bring in a formal system Σ. To obtain Tarski’s Theorem directly, that is,
without any use of formal systems, we would have to proceed entirely semantically.
We carry this out immediately below.
The proof of Tarski’s Theorem is just a formalization of the Liar Paradox, whose
starkest form is this: let (*) be the sentence “Sentence (*) is not true”. If (*) is true
then it is not true, and if it is not true then it is true; contradiction. The paradox
arises, so it seems, from the assumption that the words “is true” function as a truth
predicate for all English sentences, including those containing the words “is true”.
The only thing needed to carry the argument out in a formal setting is a way of
getting the effect of the self-reference, of having (*) be a sentence that talks about
itself. That is the role of Gödel’s function diag.
In the proof given above, the formal system Σ entered only through the use
of the Fixed Point Theorem. A proof of Tarski’s Theorem that does not mention
formal systems at all can be obtained if we replace this step by a use of a semantic
form of the Fixed Point Theorem, to wit: for every formula F (y) of L there is a
formula H such that F (pHq) ≡ H is true. Given this, for every formula T (y) of L
there will be a sentence H such that ∼T (pHq) ≡ H is true, and so T (y) fails to be
a truth predicate. To prove the semantic Fixed Point Theorem, we introduce the
notion of definability in L.

A formula F (v1 , . . . , vm ) of L defines an m-place relation R just in case,


for all n1 , . . . , nm , F (n1 , . . . , nm ) is true iff R(n1 , . . . , nm ). A relation is
definable in L iff there exists a formula of L that defines it.

Definability is a semantic correlate of numeralwise representability. Indeed, if


F (v1 , . . . , vm ) numeralwise represents R in a formal system that is sound for the
intended interpretation of L, then F (v1 , . . . , vm ) defines R in L. The converse does
not hold. In general, there are many relations that are definable but not numer-
alwise representable. For example, the 1-place relation Bew is, as we have seen,
not numeralwise representable in PA (assuming PA consistent). Yet it is definable
in LPA , since the formula ∃xD(x, y) defines it, where D(x, y) is any formula that
numeralwise represents the relation Der(m, n).
Note that Tarski’s Theorem may be stated thus: the class of gödel numbers of
true sentences of L is not definable in L.
A function is said to be definable in L iff its graph is definable in L. Now suppose
that the function diag is definable in L. Thus there exists a formula ∆(x, y) of L
such that ∆(k, n) is true iff n = diag(k). Given a formula F (y) of L, let p be the
gödel number of ∃z(∆(y, z) ∧ F (z)) and let H be ∃z(∆(p, z) ∧ F (z)). Note that the
gödel number of H is diag(p). Since diag(p) is the only value of z for which ∆(p, z)
is true, H will be true iff F (q) is true, where q = diag(p). That is, H ≡ F (q) is
true. But F (q) is just F (pHq); thus the semantic Fixed Point Theorem is proved.
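The following is an informal string-level analogue (a Python sketch, not part of the text, and certainly not a construction inside LPA ): here diag substitutes a quotation of a string for its free variable Y, and the sentence H built from a template mentioning diag ends up talking about itself, just as in the construction above.

```python
# A string-level analogue of the diagonal construction: H is "F" applied to the
# diagonalization of the template, and that diagonalization is H itself.

def diag(s):
    # substitute a quotation of s for its free variable Y
    return s.replace('Y', repr(s))

template = "F(diag(Y))"       # 'F' stands in for an arbitrary one-place predicate
H = diag(template)            # H == "F(diag('F(diag(Y))'))"

# The string that H mentions -- diag('F(diag(Y))') -- is H itself:
assert eval(H, {'F': lambda x: x, 'diag': diag}) == H
```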
This proof differs little from the proof of the syntactic Fixed Point Theorem
of §??. The only difference is this: we take ∆(x, y) to define diag, rather than to
numeralwise represent it in some formal system. Therefore, instead of the deriv-
ability of H ≡ F (pHq) in the formal system, our conclusion is the truth of this
biconditional. Note that the only properties of L needed are these: the function
diag is definable in L; and L contains the usual logical signs (quantifiers and truth-
functions). This, in the end, is the cash value of the condition “sufficiently rich”
that we used in our original statement of Tarski’s Theorem.
Now Tarski’s Theorem says there is no truth predicate for L in L. It does
not preclude the existence of a truth predicate for L in some language extending L.
Tarski went on to show how truth predicates for various formal languages can indeed
be constructed in more extensive formal languages. We now turn to an examination
of his definition of truth, taking LPA as the formal language to be treated.

5.2 Defining truth for LPA


We wish to define truth in the intended interpretation of LPA . Truth is a property
of sentences, that is, formulas without free variables; and the truth of a complex
sentence depends in a straightforward way on the truth of sentences from which it
is constructed.
A sentence ∼F is true iff F is not true; a sentence F ∨ G is true iff F
is true or G is true; and a sentence ∀vF (v) is true iff F (n) is true for
every n.
Note that this last clause is correct because every member of the universe of discourse
is, in the intended interpretation, denoted by some formal numeral n. Consequently,
we can easily formulate an inductive definition of truth, that defines the truth of a
sentence in terms of the truth of sentences of lesser logical complexity.
As a base clause for this inductive definition, we need a definition of truth for
atomic sentences, that is, for atomic formulas that contain no variables. Atomic
sentences have the form s = t, where s and t are terms of LPA without variables. A
term without variables has a fixed numerical value in the intended interpretation of
LPA . For example, the term ((SS0 × SSS0) + S0) has the value 7. It is not hard to
see that there is a primitive recursive function nv that takes each such term to its
numerical value. (See the Exercises.) We use it to define truth for atomic sentences:

TrAt (n) ≡ Sent(n) & Atform(n) & (∃j)(∃k)(j, k ≤ n & n = j ∗ 2^5 ∗ k & nv(j) = nv(k)).

(Here Sent(n) is the p.r. relation true just of the gödel numbers of sentences, i.e.,
Sent(n) iff Form(n) & (∀i, j)(i, j ≤ n → not Free(i, j, n)).) Clearly TrAt (n) holds iff n is
the gödel number of a true atomic sentence. Note, by the way, that TrAt is primitive
recursive.
The inductive definition of truth is obtained by mirroring the inductive char-
acterization displayed above:
(∗) Tr(n) ≡ Sent(n) & [(Atform(n) & TrAt (n))
∨ (∃i < n)(n = neg(i) & Sent(i) & not Tr(i))
∨ (∃i, j < n)(n = dis(i, j) & (Tr(i) ∨ Tr(j)))


∨ (∃i, k < n)(n = gen(k, i) & (∀p)(Tr(sub(i, k, nmrl(p)))))].

Definition (∗) uniquely determines the numbers of which Tr holds. This may be
shown by an induction on logical complexity. Let lc(n) = k if n is the gödel number
of a sentence of LPA containing k occurrences of the logical signs “∼”, “∨”, “∀”; also
let lc(n) = 0 if n is not the gödel number of a sentence. Clearly, Tr(n) is uniquely
determined for all n with lc(n) = 0, since for such n we have either not Sent(n) or
Atform(n); and if Tr(k) is determined for all k with lc(k) < lc(n), then the clauses
of (∗) suffice to fix whether or not Tr(n). It should also be clear that the property
Tr so determined is the one we want: Tr(n) holds iff n is the gödel number of a true
sentence of LPA .
Although definition (∗) is inductive, it is not a primitive recursive definition,
because there is an unbounded quantifier on the right-hand side. Whether or not
Tr(n) holds when n = gen(k, i) depends upon whether Tr holds of an infinite number
of other numbers. This makes it impossible to obtain an explicit definition by the
method of Chapter 4. Indeed, Tarski’s Theorem tells us that an explicit definition—
that is, a biconditional of the form Tr(n) ≡ X where X does not contain Tr—cannot
be obtained if X is limited to number-theoretic notions, that is, to primitive recursive
functions and relations, truth-functions, and quantifiers that range over the integers.
To obtain an explicit definition of truth for LPA a more powerful metalanguage
must be used. In fact, we can obtain such an explicit definition by allowing the
metalanguage to include variables and quantifiers ranging over sets of integers. Using
“X” as such a variable, let (∗X ) be obtained from (*) by replacing each occurrence
of “Tr( )” with an occurrence of “ ∈ X” (“n ∈ X” means “n belongs to the
set X”). As we noted above, there is one and only one set X such that (∗X ) holds
for all n. Hence we may define

(∗∗) Tr(m) ≡ (∃X)[(∀n)(∗X ) & m ∈ X].

And since ∀n(∗X ) holds for one and only one X, we could equally well define

(∗ ∗ ∗) Tr(m) ≡ (∀X)[(∀n)(∗X ) → m ∈ X].

Note 5.1: To show that the predicate Tr as defined by (∗∗) or by (∗ ∗ ∗) itself


obeys (∗) for all n, one would have to show that (∀n)(∗X ) holds for one and only
one set X. As indicated above, uniqueness can be shown by induction on lc(n). The
existence of such an X also can be shown by induction on lc(n). Call a set X p-good
iff (∗X ) holds for all n such that lc(n) ≤ p. It is fairly simple to show, by induction
on p, that for each p there exists a p-good set, and that all p-good sets agree on
those n such that lc(n) ≤ p. A set X such that (∗X ) holds for all n can then be
obtained by stitching together p-good sets for each p. Indeed, one can even avoid
the necessity of showing the existence of a set X such that (∀n)(∗X ) by framing a
definition of Tr directly in terms of p-good sets. We can define

Tr(m) ≡ (∃X)[(∀n)(lc(n) ≤ lc(m) → (∗X )) & m ∈ X],
that is, Tr(m) holds iff m belongs to some lc(m)-good set. Equivalently, we
could define

Tr(m) ≡ (∀X)[(∀n)(lc(n) ≤ lc(m) → (∗X )) → m ∈ X],
that is, Tr(m) holds iff m belongs to every lc(m)-good set. It will still follow
that Tr itself obeys (∗) for all n. End of Note.

5.3 Uses of the truth-definition


In this section we sketch how the definition of truth, formalized in a formal language
that extends LPA , can be used to show the soundness of formal system PA. We
consider a formal language that extends LPA by the inclusion of variables ranging
over sets of numbers. Let Tr(x) be the formalization in such a language of the
definition of Tr(m), using any of the alternatives from the end of the previous
section. We assume a formal system has been specified which extends PA and
which allows us to derive the formalized version of (∀n)(∗). In the remainder of this
section we use “`∗ ” to mean derivability in such a system. (Details about a suitable
formal language and formal system are in §?? below). From the derivability of a
formalized version of (∀n)(∗) follows the derivability of the Tarski paradigms: that
is, for each sentence F of LPA ,

`∗ Tr(pF q) ≡ F.

(The proof of this is not entirely straightforward, but basically it proceeds by in-
duction on the logical complexity of F .)
To say that PA is sound is to say that all derivable formulas are true. Here,
however, we are using “true” to apply not just to sentences but to formulas with free
variables as well. A formula with free variables is said to be true iff it is true for all
values of its variables. This holds iff the universal closure of the formula is true, and
also iff all numerical instances of the formula are true, where a numerical instance
of a formula is the result of replacing all free variables with formal numerals. Let
NI(m, n) be the p.r. relation that holds iff m is the gödel number of a formula and
n is the gödel number of a numerical instance of that formula. Let Prov(y) be a
standard formalization of derivability in PA. The following Claim is, then, that the
soundness of PA is derivable in the formal system.

Claim. `∗ ∀y(Prov(y) ⊃ ∀z(NI(y, z) ⊃ Tr(z)))

Sketch of Proof. One shows the derivability of the formalizations of “Every axiom of
PA is true” and of “The rules of inference preserve truth”. Indeed, the derivability of
the latter, and of the former for the logical axioms of PA, follows in a straightforward
way from the derivability of (∗). For example, consider an axiom of the form F ⊃
(F ∨ G). All numerical instances of this axiom have the form F′ ⊃ (F′ ∨ G′), where
F′ and G′ are sentences. From (∀n)(∗) we can infer (gödelizations of) the following
claims: if F′ and G′ are sentences, then F′ ⊃ (F′ ∨ G′) is true iff either F′ is not
true or (F′ ∨ G′) is true, and (F′ ∨ G′) is true iff either F′ is true or G′ is true. Hence
F′ ⊃ (F′ ∨ G′) is true whenever F′ and G′ are sentences; and hence the numerical
instances of any formula F ⊃ (F ∨ G) are true.
The same sort of argument works for the other logical axioms and for the rules
of inference. These arguments can easily be formalized to yield derivations in the
formal system. The truth of the individual nonlogical axioms of PA can most easily be derived
by using the Tarski paradigms applied to their universal closures, since they are
also axioms of the extended system. That is, for example, `∗ Tr(p∀x(∼Sx = 0)q)
because `∗ Tr(p∀x(∼Sx = 0)q) ≡ ∀x(∼Sx = 0) and `∗ ∀x(∼Sx = 0). It re-
mains to show that the truth of all induction axioms F (0) ∧ ∀x(F (x) ⊃ F (Sx)) ⊃
∀xF (x), where F (x) is any formula of LPA , can be derived in the formal sys-
tem. We must show that, for any numerical instance F′(x) of F (x), if F′(0) and
∀x(F′(x) ⊃ F′(Sx)) are true then so is ∀xF′(x). It should not be surprising that
to derive this we must use induction. Consider the property that holds of a num-
ber n iff F′(n) is true. If F′(0) and ∀x(F′(x) ⊃ F′(Sx)) are true then 0 has the
property and the successor of any number with the property also has the prop-
erty. Hence by induction every number has the property; hence ∀xF′(x) is true.
Note that, because we are showing this for any numerical instance F′(x) of any
formula F (x), the property must invoke the notion of truth; hence the formaliza-
tion of this argument will use an induction axiom that contains the formula Tr(x).

Now let Con(PA) be the formalization of the consistency of PA, i.e., the formula
∼Prov(pS0 = 0q).

Corollary. `∗ Con(PA)

Proof. Since (∗) is derivable, so are Tarski paradigms; hence `∗ Tr(pS0 = 0q) ≡
S0 = 0. Since the formal system extends PA, `∗ ∼S0 = 0. Hence `∗ ∼Tr(pS0 = 0q).
By the Claim, `∗ ∼Prov(pS0 = 0q).

Note that a similar argument shows that in the formal system we can derive
the formal statement of the ω-consistency of PA. The Corollary shows that our
envisaged extension of PA can derive formulas of LPA that are not derivable in PA.
Thus, not only is it expressively richer—it can formalize notions that PA cannot,
like truth for LPA —but also it is deductively stronger with respect to that part of
the language that is common to it and LPA .

5.4 Second-order Arithmetic


In this section we specify some extensions of LPA and of PA that could be used to
obtain the result of §??. We introduce LSA , the language of second-order arithmetic.
To obtain this language, we add to the alphabet of LPA variables X, Y, Z, X 0 , Y 0 ,
Z 0 , . . .. These are called set variables; they are intended to range over sets of integers.
We now refer to variables x, y, z, x0 , . . . as numerical variables. To the formation rules
we add the following clauses:

If t is a term and V is a set variable, then V (t) is a formula (an atomic


formula); if F is a formula and V is a set variable, then ∀V (F ) is a
formula.

That completes the specification of the language. This is a two-sorted language,


since there are two sorts of variables, which play different syntactic roles. A variable
of one sort may not be substituted for another, that is, such a substitution will lead
from a formula to a non-formula. The intended interpretation of an atomic formula
V (t) is that the number denoted by t belongs to the set V .
We now consider axioms for a formal system in this language. The schemata for
logical axioms remain as before, although of course restrictions on sort of variable
have to be observed. That is, there are separate universal instantiation axiom
schemata for each sort: ∀vF ⊃ F (v/t) for v a numerical variable and t a term,
and ∀V F ⊃ F (V /U ), for V and U set variables. The formal system SA, or full
second-order arithmetic, has the following non-logical axioms: the number-theoretic
axioms (N1)–(N6) of PA, together with the induction axiom
X(0) & (∀x)(X(x) ⊃ X(Sx)) ⊃ ∀xX(x)
in which the set variable X is a free variable; and the comprehension axioms
∃X∀x(X(x) ≡ F (x))
whenever F (x) is a formula of LSA in which X does not occur free. As a result
of including the comprehension axioms, mathematical induction can be framed as
a single axiom rather than as a schema. That is, if F (x) is any formula, F (0) &
∀x(F (x) ⊃ F (Sx)) ⊃ ∀xF (x) can be derived from the single axiom of mathematical
induction and the comprehension axiom ∃X∀x(X(x) ≡ F (x)).
SA is a very powerful formal system. It goes far beyond number theory: since
real numbers can be encoded as sets of integers, theorems of the theory of real
numbers and of the calculus can be derived in it. For that reason, SA is sometimes
called “classical analysis”. (Of course, despite its strength, SA is still incomplete,
assuming it is consistent. That is, since clearly SA can be gödelized, all p.r. functions
can be represented, and a standard provability predicate can be specified, all the
results of Chapter 3 apply to SA.)
Now the language LSA can be used to formalize the definition of truth for LPA .
And SA is certainly powerful enough to derive the formalization of (∀n)(∗). SA
also contains the mathematical induction axioms needed to carry through the proof
of the soundness of PA. Hence we see that there are sentences of LPA that are
derivable in SA but not in PA. (Assuming PA consistent.) Mathematicians used
to ask whether any theorem about the numbers proved using “analytic methods”
(methods that invoked the real numbers and their laws) could in principle be proved
using only “elementary methods”, that is, based on the usual first-order properties
of the integers. What we have seen shows the answer to this question is negative.
Now SA is far more powerful a system than is needed to show the soundness
of PA. The system ACA (arithmetical comprehension axioms) suffices. This theory
has, instead of all comprehension axioms, just those obtained from the comprehen-
sion schema by replacing F (x) with a formula containing no bound set variables.
Thus the only formulas that determine sets of integers are those restricted to quan-
tifying over integers. However, because of this restriction on comprehension axioms,
we must add a schema of mathematical induction, that is,


F (0) & ∀x(F (x) ⊃ F (Sx)) ⊃ ∀xF (x)
for every formula F (x) of LSA .
That ACA suffices to derive the formalized version of (∀n)(∗) relies on the
fact that, for any p, a p-good set can be defined by a formula with no bound set
variables. Then one of the definitions of Tr(m) suggested at the end of §?? can
be used. The schema of induction is needed, rather than the formulation using a
free set variable, because the proof of soundness of PA needs induction for formulas
involving the formalization Tr(x) of the definition of Tr(m), and that formalization
contains bound set variables.
The formal system with language LSA whose comprehension axioms are those
of ACA and whose induction axiom is the single axiom using a free set variable, a
formal system called ACA0 , is too weak to prove the soundness of PA. In fact, any
formula of LPA derivable in ACA0 is derivable in PA. (In logicians’ jargon, ACA0
is a conservative extension of PA.) Thus, ACA0 can express things that cannot
be expressed in LPA , and derive interesting things about them—it can formulate
a truth-definition and prove the Tarski paradigms— but cannot derive any purely
number-theoretic facts beyond what is derivable in PA.
In SA or ACA the consistency of PA is derivable; hence in those systems, the
Gödel sentence for PA is also derivable. Now we know that, if PA is consistent,
then in PA one cannot derive any formula that asserts an underivability in PA.
What we have just seen is that in SA or ACA one can derive such statements about
underivability in PA.
Nonetheless, even in SA one cannot derive every correct statement about un-
derivability in PA (assuming SA consistent). For let ProvSA (y) be a provability
predicate for SA formulated in language LPA . We claim first that, for any formula
F of LSA ,
` ProvSA (pF q) ⊃ Prov(pProvSA (pF q)q),
that is, in PA we can derive the statement “If a formula is derivable in SA, then in PA
it can be derived that the formula is derivable in SA”. The following reasoning, once
formalized, establishes this: if F is derivable in SA, then for some m DerSA (m, γ(F )),
where DerSA mirrors the notion of “derivation in SA”. Since DerSA can be numer-
alwise represented in PA, `PA DerSA (m, pF q), so that `PA ∃xDerSA (x, pF q), that
is, `PA ProvSA (pF q).
Now let F be “S0 = 0”, and take the contrapositive:
∼Prov(pProvSA (pS0 = 0q)q) ⊃ ∼ProvSA (pS0 = 0q).

Since this conditional is derivable in PA, it is derivable in SA. The consequent of this
conditional is just Con(SA), and so is not derivable in SA (assuming SA consistent).
Hence the antecedent is not derivable in SA, that is, in SA we cannot derive the
statement that the inconsistency of SA is not derivable in PA.

5.5 Partial truth predicates


Although there is no truth predicate for LPA in LPA , there are truth predicates in
LPA for fragments of LPA . In §4.1, we defined the ∆0 , Σ1 , and Π1 formulas. Also
we saw that truth for ∆0 sentences is primitive recursive (see the Exercises).
A Σ1 sentence has the form ∃v1 . . . ∃vm H(v1 , . . . , vm ), where H(v1 , . . . , vm ) is a
∆0 formula having only the free variables indicated. Such a sentence is true iff there
exists a sequence hp1 , . . . , pm i of numbers such that H(p1 , . . . , pm ) is true. Now the
gödel number of H(p1 , . . . , pm ) is a primitive recursive function of the gödel number
of ∃v1 . . . ∃vm H(v1 , . . . , vm ) and (the encoding of) the sequence hp1 , . . . , pm i, while
the truth of H(p1 , . . . , pm ) is a primitive recursive matter, as H(p1 , . . . , pm ) is a
∆0 sentence. Hence the truth of H(p1 , . . . , pm ) is a primitive recursive relation of
the gödel number of ∃v1 . . . ∃vm H(v1 , . . . , vm ) and (the encoding of) the sequence
hp1 , . . . , pm i. This relation can therefore be formalized with a Σ1 formula. The
statement “there exists a sequence hp1 , . . . , pm i of numbers such that H(p1 , . . . , pm )
is true” can then be formalized by adding more existential quantifiers at the front.
It follows that truth for Σ1 sentences can be formalized by a Σ1 formula. And then
we also obtain that truth for Π1 sentences can be formalized by a Π1 formula, since
the truth of a Π1 sentence F is just the falsity of the Σ1 sentence ∼F .
Now let us define, inductively: a Σn+1 -formula is a formula ∃v1 . . . ∃vm H, where
H is a Πn -formula, and a Πn+1 -formula is a formula ∀v1 . . . ∀vm H, where H is a
Σn formula. (As before, we allow m to be 0. This has the effect of making this
a cumulative hierarchy of formulas: if F is Σn or Πn , it is both Σp and Πp for all
p > n.)
We claim that, for each n, truth for Σn sentences can be expressed by a Σn -
formula, and (consequently) truth for Πn sentences can be expressed by a Πn -
formula. The argument goes by induction on n, and essentially repeats what we
just noticed for Σ1 sentences. Assume that truth for Πn sentences can be ex-
pressed by a Πn -formula. Let q be the gödel number of a Σn+1 sentence F =
∃v1 . . . ∃vm H(v1 , . . . , vm ), where H(v1 , . . . , vm ) is Πn . F is true iff there exists a
sequence hp1 , . . . , pm i such that H(p1 , . . . , pm ) is true. The gödel number of the
latter formula is a primitive recursive function of (the encoding of) the sequence
hp1 , . . . , pm i and q. Let's say this function is ϕ(hp1 , . . . , pm i, q). We have: F is true
iff there exists a sequence hp1 , . . . , pm i such that ϕ(hp1 , . . . , pm i, q) is the gödel num-
ber of a true Πn sentence. That is, F is true iff there exists a sequence hp1 , . . . , pm i
and there exists an r such that

1. r = ϕ(hp1 , . . . , pm i, q);

2. r is the gödel number of a true Πn sentence.

Clause (2) can be expressed by a Πn formula, by assumption; clause (1) can be


expressed by either a Π1 or a Σ1 formula. It follows that this entire condition can
be expressed by a Σn+1 formula.
Call a formula of LPA essentially Σn or Πn if it can be transformed into a Σn or
a Πn formula, respectively, by the usual prenexing rules. Thus every formula of LPA
is essentially Σn or essentially Πn for some n, and it is straightforward to extend
the formalizations of Σn truth and Πn truth to essentially Σn and essentially Πn
sentences (see the Exercises).
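The bookkeeping involved in placing a prenex formula in this hierarchy is purely
mechanical, and it may help to see it spelled out. Here is a small Python sketch (no
part of the formal development): the unbounded quantifier prefix is given simply as a
string of 'E's and 'A's, the matrix is assumed to be ∆0 , and the sketch reports the least
classification by counting alternating blocks.

    def classify(prefix):
        # prefix: the unbounded quantifiers in order, e.g. "EEA" for two existentials
        # followed by a universal; bounded quantifiers in the Delta_0 matrix are ignored
        blocks = []
        for q in prefix:
            if not blocks or blocks[-1] != q:   # a new block starts at each alternation
                blocks.append(q)
        if not blocks:
            return "Delta_0"
        kind = "Sigma" if blocks[0] == 'E' else "Pi"
        return f"{kind}_{len(blocks)}"

    assert classify("EE") == "Sigma_1"
    assert classify("AEE") == "Pi_2"
    assert classify("EEAE") == "Sigma_3"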
The existence of these partial truth predicates in LPA has an interesting con-
sequence. Pick any n. Consider the system obtained from PA by restricting all
axioms and all formulas in derivations to essentially Σn formulas. Call this system
PAn . Let Prov∗ (w, y) be a formalization of the notion of derivability in PAw , and let
Tn (y) be the Σn formalization of truth for Σn sentences. It shouldn't be surprising
that PA can prove the soundness of PAn , i.e., for any formula F derivable in PAn ,
F is true (or, more properly speaking, every numerical instance of F is true). That
is, we have
` ∀y[Prov∗ (n, y) ⊃ ∀z(NI(y, z) ⊃ Tn (z))].
But since ` ∼Tn (pS0 = 0q), it follows that

` ∼Prov∗ (n, pS0 = 0q).

That is, for each n, PA proves “PAn is consistent”.


But, assuming PA is consistent, PA cannot prove “for each n, PAn is consistent”,
that is, not ` ∀x ∼ Prov∗ (x, pS0 = 0q). For, it can be shown that ` ∀y[Prov(y) ⊃
∃x(Prov∗ (x, y))], that is, we can derive in PA the formalization of the obvious fact
that if a formula is derivable in PA it is derivable in some PAn . But then ` ∀x ∼
Prov∗ (x, pS0 = 0q) ⊃ ∼Prov(pS0 = 0q), so if PA could derive “for each n, PAn
is consistent” then it could derive “PA is consistent”, violating Gödel’s Second
Theorem.

5.6 Truth for other languages

The method used in the preceding sections for LPA can be applied to a formal lan-
guage provided that the language contains names for each member of the universe of
discourse of the intended interpretation. If this condition does not hold, a difficulty
arises in trying to define the truth of universal quantifications ∀xF (x), since the
truth of such a sentence is no longer specifiable in terms of the truth of its instances
F (c), where c is a term of the language. All we can say is that ∀xF (x) is true iff
F (x) is true when x is assigned any value from the universe of discourse. But then
we must provide a definition not just of truth but of the broader notion “truth under
an assignment of values to the free variables”. In this, we must treat all formulas of
the language, not just sentences.
Let U be the universe of discourse of the intended interpretation of a formal
language L. Assignments of values from U to the variables of language L may be
identified with finite sequences of elements of U ; such a sequence hs1 , . . . , sk i can be
taken to assign s1 to the alphabetically first variable of L, s2 to the alphabetically
second variable of L, . . . , sk to the alphabetically k th variable of L. For variables
later than the alphabetically k th , let us take sk , the last member of the sequence,
as the assigned value. In other words, if σ is a finite sequence of members of U of
length m, let (σ)i be the ith member of σ if i ≤ m and let (σ)i = (σ)m if i > m. We
shall take σ to assign (σ)i to the alphabetically ith variable of language L for every
i.
We say that a finite sequence σ satisfies a formula F iff F is true under the
assignment of values that σ encodes. Note that σ assigns values to all variables of
the language; but if the alphabetically ith variable of L does not occur free in F ,
then whether or not a sequence σ satisfies F will not depend on (σ)i (provided that
i is less than the length of σ).
We seek an inductive definition of satisfaction. The only trick lies in the treat-
ment of quantification. Now ∀vF (v) is true under an assignment of values to vari-
ables just in case the formula F (v) is true under every assignment that differs from
the given one at most in what is assigned to the variable v. For in that case, F (v) is
satisfied no matter what value is assigned to v, while the values of the other variables
remain fixed.
Let SatAt (σ, n) be the satisfaction relation for atomic formulas; i.e., it holds if n
is the gödel number of an atomic formula of L that is satisfied by σ. Let Atform, neg,
dis, gen, and Form be functions and relations that mirror the obvious syntactical
operations and properties for language L; and for each i let var(i) be the number
correlated with the alphabetically ith variable of L. The inductive definition is then:
Sat(σ, n) ≡ Form(n) & [(Atform(n) & SatAt (σ, n))
∨ (∃i < n)(n = neg(i) & Form(i) & ∼Sat(σ, i))
∨ (∃i, j < n)(n = dis(i, j) & (Sat(σ, i) ∨ Sat(σ, j)))
∨ (∃i, k < n)(n = gen(var(k), i)
& (∀σ 0 )((∀j)(j ≠ k → (σ 0 )j = (σ)j ) → Sat(σ 0 , i)))].
It remains to see how SatAt may be defined. Details here will depend on the
vocabulary of the language L. Suppose, for example, that L contains, as nonlogical
vocabulary, just finitely many predicate-signs P1 , . . . , Pm , each of which is two-
place. (Thus L contains no constants or function-signs.) Suppose the intended
interpretation of L is given by a universe of discourse U and the m two-place relations
Φ1 , . . . , Φm on U . Let v1 , v2 , . . . be the variables of L in alphabetic order. Then we
would define
SatAt (σ, γ(F )) ≡ [(F has the form P1 vi vj & Φ1 ((σ)i , (σ)j ))
∨ (F has the form P2 vi vj & Φ2 ((σ)i , (σ)j ))
∨ . . . ∨ (F has the form Pm vi vj & Φm ((σ)i , (σ)j ))].
If L contains function signs, then we must specify the value of the terms of L under
assignments σ. For example, for LPA , we would define a function val(σ, n) so that
if n is the gödel number of a term, then val(σ, n) is the value of that term when the
variables in it take the values assigned by σ. We would then define

SatAt (σ, n) ≡ [Atform(n) & (∃a)(∃b)(n = a ∗ 25 ∗ b & Val(σ, a) = Val(σ, b))].

In sum, an inductive definition of satisfaction for a formal language L can be


formulated in a language that is adequate for talking about
(1) arbitrary finite sequences σ of members of U and their members;

(2) syntactic objects of L (by gödelization this can amount to nothing more than
talking about numbers);

(3) the relations and functions that are the interpretations of the predicates and
function signs of L.
Moreover, by a device similar to that of §??, the inductive definition can be
converted into an explicit definition of Sat in a language that contains, in addition,
quantification over relations between sequences of members of U and (the Gödel
numbers of) formulas of L. That is, let (†R ) be the result of replacing, in the
inductive definition of Sat above, all occurrences of “Sat” with the variable R. We
can then define Sat explicitly by

Sat(σ, n) ≡ (∃R)((†R ) & R(σ, n))

or by
Sat(σ, n) ≡ (∀R)((†R ) → R(σ, n)).
Finally, given a definition of satisfaction, we can then define truth as follows

Tr(n) ≡ (∀σ)Sat(σ, n).
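To fix ideas, here is a small Python sketch (no part of the formal development) that
follows the clause structure of the inductive definition of Sat. Formulas are represented
by nested tuples rather than by gödel numbers, assignments by dictionaries rather than
by finite sequences, and the universe of discourse is taken to be finite, so that the clause
for the universal quantifier can be checked by brute force; the names sat, tr, and the
tuple tags are of course ad hoc.

    from itertools import product

    def sat(sigma, formula, U, interp):
        # sigma assigns elements of U to variable indices 0, 1, 2, ...
        op = formula[0]
        if op == 'atom':                      # ('atom', P, i, j): 2-place predicate P on variables i, j
            _, P, i, j = formula
            return (sigma[i], sigma[j]) in interp[P]
        if op == 'neg':                       # ('neg', F)
            return not sat(sigma, formula[1], U, interp)
        if op == 'dis':                       # ('dis', F, G)
            return sat(sigma, formula[1], U, interp) or sat(sigma, formula[2], U, interp)
        if op == 'gen':                       # ('gen', k, F): universal quantification on variable k
            _, k, F = formula
            return all(sat({**sigma, k: a}, F, U, interp) for a in U)
        raise ValueError(op)

    def tr(formula, U, interp, num_vars):
        # truth = satisfaction by every assignment (cf. Tr(n) above)
        return all(sat(dict(enumerate(vals)), formula, U, interp)
                   for vals in product(U, repeat=num_vars))

    # Example: over U = {0, 1, 2} with P interpreted as <, the sentence "for all v0
    # there is a v1 with P(v0, v1)" is false, since 2 has no <-successor in U.
    # (The existential quantifier is expressed as "not-forall-not".)
    U = [0, 1, 2]
    interp = {'P': {(a, b) for a in U for b in U if a < b}}
    F = ('gen', 0, ('neg', ('gen', 1, ('neg', ('atom', 'P', 0, 1)))))
    assert tr(F, U, interp, num_vars=2) is False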


Chapter 6

Computability

6.1 Computability
The notion of algorithm, or effective procedure, is an intuitive one that has been
used in mathematics for a long time. An algorithm is simply a clerical procedure
that can be applied to any of a range of inputs and will, on any input, yield an
output. The basic idea is that an algorithm is a bunch of rules that can be applied
mechanically; obtaining an output from any given input is just a matter of applying
those rules (mindlessly, so to speak).
Before the 1930s, the notion was used in this intuitive sense. For example,
as the tenth of the mathematical problems he formulated in 1900, Hilbert asked
whether there is an algorithm which, if applied to any polynomial (containing any
number of variables) with integer coefficients, determines whether or not there are
integral values for the variables of the polynomial that give the polynomial the value
0. We ourselves have made intuitive use of the notion of algorithm. For example,
in defining formal language and formal system, we said there must be an effective
procedure for telling whether or not any given string of signs is a formula, and there
must be an effective procedure for telling whether or not any given finite sequence
of formulas is a derivation.
A number-theoretic function is said to be computable, in the intuitive sense, (or
algorithmic) iff there is an algorithm for computing the function, i.e., an algorithm
that yields, given any integers as inputs, the value of the function for these inputs
as arguments. Our first aim in this unit is to give a precise mathematical definition
of the notion of computable function.
Now we have seen a class of functions each of which is clearly computable, in
the intuitive sense, namely, the primitive recursive functions. Perhaps one might
think that this class exhausts the computable functions—that any function which
we would intuitively call algorithmic is in fact primitive recursive. But this is not
the case, as the following heuristic argument indicates.
First, the specification of the p.r. functions allows us to consider all p.r. defini-
tions as being written in some standard symbolic form. We may then effectively list
all p.r. definitions of one-place functions. Say this list is D1 , D2 , D3 , . . .; and for each
i let ψi be the p.r. function that Di defines. (We can, for example, gödelize the p.r.
definitions and then list them in increasing order of gödel number.) Now consider
the following procedure: given input n, find the p.r. definition Dn . Then compute
the value of ψn at argument n; since Dn is a p.r. definition of ψn , it tells us how to
do this. Add one to this value of ψn ; the result is the output. What was just described
is clearly an algorithm. This algorithm computes a function; call that function ϕ.
Then ϕ cannot be primitive recursive. For suppose it were; then it would have a
p.r. definition, and that definition would occur on our list, say as Dk . That is, we
would have ϕ = ψk . But the value of ϕ at argument k is ψk (k) + 1; so ϕ cannot be
identical with ψk . Thus ϕ is not p.r. But ϕ is computable.
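The structure of the argument can be displayed in a few lines of Python. This is only
a toy illustration: a finite list of Python functions stands in for the effective enumeration
D1 , D2 , D3 , . . . (with indices starting at 0), and any effectively listed total functions
would do.

    psi = [lambda n: 0,          # psi_0
           lambda n: n + 7,      # psi_1
           lambda n: n * n]      # psi_2

    def phi(n):
        return psi[n](n) + 1     # the diagonal function: compute psi_n at n, then add one

    # phi differs from each listed function at the diagonal argument:
    assert all(phi(k) != psi[k](k) for k in range(len(psi)))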
This shows that the p.r. functions do not exhaust the computable functions.
Indeed, our argument yields more: it shows that however we eventually define the
notion of computable, it cannot be possible to list effectively algorithms that com-
pute all and only the computable functions.
Our eventual definition of computable function will handle this pitfall as fol-
lows. We shall define a notion of “computing instructions”, that is, a standard
form of algorithm. A computable function is any function that can be computed
by a computing-instruction. There will be an effective procedure for listing all the
computing-instructions. But not all computing-instructions succeed in computing
functions, and it will be impossible to “separate out”, in an effective manner, just the
computing instructions that do compute functions. This will become clearer when
we see the details. In fact, we shall give two explications of computability: the
first uses formal systems of a particular sort, called “Herbrand–Gödel–Kleene
systems” (also called the “equation calculus”); the second uses an abstract model
of a computing machine, called a “Turing machine”. It will turn out that these
explications are equivalent.

6.2 Recursive and partial recursive functions


We shall first specify a formal language LHGK . An HGK-system will be a formal
system of a particular form in this language, and the notion of derivation in an
HGK-system will be taken to correspond to the notion of computation.
Language LHGK
Alphabet: 0 S = ( ) ,
x y z x1 y1 z1 x2 . . . [formal variables]
f g h f1 g1 h1 f2 . . . [function letters]
The function letter f is called the principal function letter.
Formation rules:

1. 0 is a term; any formal variable is a term;

2. If t is a term then so is St;

3. If t1 , t2 , . . . , tn are terms, n > 0, and δ is a function letter, then δ(t1 , t2 , . . . , tn )


is a term;

4. Nothing else is a term.

If t and u are terms then t = u is a formula. All formulas are called equations.
An HGK-system is simply a finite set of equations. We treat each HGK-system
as a formal system: the notions of derivation and derivability in any such system E
are determined by taking the members of E to be the axioms, and allowing just the
following two rules of inference.

1. From an equation t = u that contains a formal variable v may be inferred any


equation that results from t = u by replacing v with a formal numeral.

2. From an equation t = u that contains no variables and an equation


δ(p1 , p2 , . . . , pn ) = r, where δ is a function letter, may be inferred any
equation that results from t = u by replacing one or more occurrences of
δ(p1 , p2 , . . . , pn ) with occurrences of r.

If E is an HGK-system, we write `E t = u for “the equation t = u is derivable


in the system E.”

Definition. Let ϕ be an n-place function. ϕ is said to be defined by an


HGK-system E iff, for all p1 , p2 , . . . , pn , and r,

`E f (p1 , p2 , . . . , pn ) = r ↔ ϕ(p1 , p2 , . . . , pn ) = r.

The function ϕ is said to be general recursive (or, more briefly, recursive) iff it is
defined by some HGK-system E.
The notion of general recursive function is what we shall adopt as our precise
mathematical explication of the intuitive notion of computable function. Every
general recursive function is computable in the intuitive sense. For suppose ϕ is
defined by the HGK-system E. Then to compute ϕ(p1 , . . . , pn ) one simply makes
an exhaustive search through the derivations in system E until one finds a derivation
of an equation f (p1 , p2 , . . . , pn ) = r for some r; r is then the value of ϕ(p1 , . . . , pn ).
Since E defines ϕ , we are assured that there is such a derivation in system E.
Not every HGK-system defines a function. For example, suppose E contains
the one equation f (Sx) = SS0. Then `E f (p) = 2 for every p > 0, but no equation
of the form f (0) = r is derivable from E. We shall consider this phenomenon more
closely two pages hence. For now, we are interested solely in HGK-systems that do
define functions.
We start by investigating the extent of the general recursive functions.

Fact 6.1. Every primitive recursive function is general recursive.

Proof. It is a simple matter to formalize p.r. definitions by HGK-systems. For


example, the system consisting of the following four equations defines multiplication:
g(x, 0) = x, g(x, Sy) = Sg(x, y), f (x, 0) = 0, f (x, Sy) = g(f (x, y), x).
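Read as instructions for computing, these four equations are just a double recursion.
Transcribed into Python (an informal gloss, not part of the HGK formalism), g is
addition and f, the principal function letter, is multiplication:

    def g(x, y):                 # g(x, 0) = x ;  g(x, Sy) = S g(x, y)
        return x if y == 0 else g(x, y - 1) + 1

    def f(x, y):                 # f(x, 0) = 0 ;  f(x, Sy) = g(f(x, y), x)
        return 0 if y == 0 else g(f(x, y - 1), x)

    assert f(4, 3) == 12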

Fact 6.2. The class of general recursive functions is closed under composition.

Proof. Obvious.

Our third fact about the extent of the general recursive functions gives us some-
thing new: the possibility of specifying functions by use of the unbounded leastness
operator µ.

Fact 6.3. Let ϕ be an (n + 1)-place general recursive function such that


(∀p1 )(∀p2 ) . . . (∀pn )(∃k)[ϕ(p1 , . . . , pn , k) = 0]. Then µk[ϕ(p1 , . . . , pn , k) = 0], that is,
the n-place function that carries hp1 , . . . , pn i to the least k such that ϕ(p1 , . . . , pn , k) =
0, is also general recursive.
Proof. Let E be an HGK-system defining ϕ. Let E 0 be the result of relettering all


function letters in E as letters fi ; in particular, let the principal function letter f
be relettered as f1 . Let E ∗ be the HGK-system obtained by adding the following
eight equations to E 0 :

g1 (x, 0) = x

g1 (x, Sy) = Sg1 (x, y)

g2 (x, 0) = 0

g2 (x, Sy) = g1 (g2 (x, y), x)

g(x1 , . . . , xn , 0) = S0

g(x1 , . . . , xn , Sy) = g2 (f1 (x1 , . . . , xn , y), g(x1 , . . . , xn , y))

h(Sx, 0, y) = y

f (x1 , . . . , xn ) = h(g(x1 , . . . , xn , y), g(x1 , . . . , xn , Sy), y)

We claim that the system E ∗ defines the function µk[ϕ(p1 , . . . , pn , k) = 0], and
hence that function is general recursive. To prove the claim it suffices to note the
following:

1. `E ∗ g2 (i, j) = k iff k = i · j.

2. `E ∗ g(p1 , p2 , . . . , pn , k + 1) = q iff q = ∏_{i=0}^{k} ϕ(p1 , . . . , pn , i).

3. `E ∗ h(i, j, k) = m iff i ≠ 0, j = 0, and k = m.

4. `E ∗ f (p1 , p2 , . . . , pn ) = k iff ∏_{i=0}^{k−1} ϕ(p1 , . . . , pn , i) ≠ 0 but ∏_{i=0}^{k} ϕ(p1 , . . . , pn , i) = 0.

If the function ϕ has the property that (∀p1 )(∀p2 ) . . . (∀pn )(∃k)[ϕ(p1 , . . . , pn , k) =
0] then we say that the application of the unbounded leastness operator µ to ϕ is
licensed. Thus Facts 6.1–6.3 tell us that the class of general recursive functions con-
tains the primitive recursive functions and is closed under composition and licensed
application of µ. In §?? we shall show the converse: every general recursive func-
tion can be obtained from primitive recursive functions by composition and licensed
application of µ.
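Informally, Fact 6.3 says that unbounded search is available. In Python (again only a
gloss on the official definition via HGK-systems) the operator µ is the obvious while-loop,
and it terminates on every input exactly when the application is licensed:

    def mu(phi, *args):
        # search for the least k with phi(..., k) = 0; runs forever if there is none
        k = 0
        while phi(*args, k) != 0:
            k += 1
        return k

    # A licensed application: the least k whose square is at least p always exists.
    assert mu(lambda p, k: 0 if k * k >= p else 1, 10) == 4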
As we pointed out above, not every HGK-system defines a function. For a sys-
tem E to define an n-place function, there must exist for all p1 , . . . , pn a unique r such
that `E f (p1 , . . . , pn ) = r. This condition can fail in two ways: for some p1 , . . . , pn
there may be distinct q and r with `E f (p1 , . . . , pn ) = q and `E f (p1 , . . . , pn ) = r;
or for some p1 , . . . , pn there may be no r with `E f (p1 , . . . , pn ) = r.
The former problem may easily be avoided. We simply redefine the notion of
derivability in an HGK-system as follows: we now say that an equation
f (p1 , . . . , pn ) = r is derivable in a system E iff, first, there is a derivation of it
in E and, second, no smaller derivation is a derivation of f (p1 , . . . , pn ) = q for
q 6= r (“smaller” in the sense of a gödel numbering, which we assume has been
fixed). This device—inspired, as should be obvious, by Rosser’s proof of §??—
makes it the case that for any p1 , . . . , pn there is at most one integer r such that
`E f (p1 , . . . , pn ) = r.
The latter problem, however, admits no such solution. Examples of HGK-
systems in which for some p1 , . . . , pn no equation f (p1 , . . . , pn ) = r can be derived
are easy to formulate. We have already seen a simple one. Here is another: let E
contain just the equations g(x, 0) = x, g(x, Sy) = Sg(x, y), f (g(x, x)) = x. Then
`E f (p) = r iff p is even and r = p/2. Thus E does not define a 1-place function.
E does define something, though; namely, a function whose domain is just the even
integers, and which takes each integer in its domain to one-half of that integer. Such
a function is called a partial function.
Definition. An n-place partial function is an integer-valued function whose
domain is some set of n-tuples of integers. If the domain of an n-place partial
function ϕ is the set of all n-tuples, then ϕ is said to be total. (The domain of
an n-place partial function may be all n-tuples, or it may be empty, or it may be
something in between.)
We may now take it that every HGK-system E defines an n-place partial
function for each n > 0, namely, the unique partial function ϕ such that for all
p1 , . . . , pn , r,
`E f (p1 , . . . , pn ) = r iff ϕ(p1 , . . . , pn ) = r.
Thus the domain of ϕ is the set of n-tuples hp1 , . . . , pn i such that for some r
`E f (p1 , . . . , pn ) = r. An n-place partial function ϕ is said to be partial recur-
sive iff it is defined by some HGK-system. Thus, a general recursive function is
simply a partial recursive function that is total.

We saw above that licensed application of the leastness operator
µ to a general recursive function yields a general recursive function (Fact 6.3). The
same proof shows that any application of µ—licensed or not—to a general recursive
function yields a partial recursive function. That is, if ϕ is general recursive, then
the function µk[ϕ(p1 , . . . , pn , k) = 0], which takes hp1 , . . . , pn i to the least k such
that ϕ(p1 , . . . , pn , k) = 0 if there is such a k and takes no value on hp1 , . . . , pn i if
there is no such k, is partial recursive. That function will be total, and hence general
recursive, just in case the application of µ is licensed.
Note: The leastness operator µ may also be applied to partial recursive func-
tions that are not total; but here the definition must be phrased with care. Reflection
on the proof of Fact 6.3 shows that, if that proof is to show µk[ϕ(p1 , . . . , pn , k) = 0]
to be partial recursive when ϕ is partial recursive, we should define


µk[ϕ(p1 , . . . , pn , k) = 0] =
    the least k such that ϕ(p1 , . . . , pn , k) = 0, if such a k exists and
        ϕ(p1 , . . . , pn , j) has a value for each j < k;
    no value, otherwise.

If µ is so defined, we have: the class of partial recursive functions is closed


under application of µ. End of Note.

6.3 The Normal Form Theorem and the Halting Problem
First we show that all partial recursive functions can be obtained from primitive
recursive functions by composition and application of µ. To do this, we gödelize the
language LHGK . Clearly this can be done so as to yield the following.

(I) Each HGK system is correlated with a finite set of integers, the gödel numbers
of its axioms. These finite sets, in turn, can be correlated with integers, so
that every integer corresponds to an HGK system and vice versa. We call the
integer so correlated with an HGK-system E the index number of E.

(II) For each n > 0 there is an (n + 2)-place primitive recursive function Dern such
that Dern (e, p1 , . . . , pn , q) = 0 if and only if q is the gödel number of a deriva-
tion in the HGK-system with index number e, and the last line of this derivation
has the form f (p1 , . . . , pn ) = r for some r; and Dern (e, p1 , . . . , pn , q) = 1
otherwise.

(III) There is a 1-place primitive recursive function Res such that if q is the gödel
number of a sequence of equations the last of which has a formal numeral r
on the right-hand side, then Res(q) = r; and Res(q) = 0 otherwise.

Normal Form Theorem. Let ϕ be an n-place partial recursive function. Then


there is a number e such that, for all p1 , . . . , pn ,

ϕ(p1 , . . . , pn ) = Res(µq[Dern (e, p1 , . . . , pn , q) = 0]).

Note that ϕ is total (and hence general recursive) iff the application of µ is licensed.

Proof. Since ϕ is partial recursive, it is defined by some HGK system E. Let e be


the index number of E. Given any p1 , . . . , pn , let k = µq[Dern (e, p1 , . . . , pn , q) = 0],
if there is such a q. Then k is the smallest number that is the gödel number of a
derivation from E whose last line has the form f (p1 , . . . , pn ) = r; and Res(k) = r.
But then `E f (p1 , . . . , pn ) = r; since E defines ϕ, r is the value of ϕ(p1 , . . . , pn ).
If, on the other hand, there is no q such that Dern (e, p1 , . . . , pn , q) = 0, then no
equation f (p1 , . . . , pn ) = r is derivable in E; since E defines ϕ, ϕ takes no value on
hp1 , . . . , pn i.

The Normal Form Theorem shows that every partial recursive function can be
obtained by starting with a primitive recursive function, applying the µ-operator,
and composing with a primitive recursive function. The partial function will be
total, and hence general recursive, iff the application of µ is licensed.
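The shape ϕ(p1 , . . . , pn ) = Res(µq[Dern (e, p1 , . . . , pn , q) = 0]), an unbounded search
for a derivation code followed by a primitive recursive decoding step, can be imitated in
Python with toy stand-ins for Dern and Res. These are not an actual gödelization of
HGK-systems; the toy "index" below simply pretends that q codes a derivation of
f (p) = r exactly when q = 2^p · 3^r with r = p + 3.

    def toy_der(e, p, q):                  # 0 ("yes") iff q codes a derivation for input p
        r = p + 3                          # the function this toy index e is imagined to define
        return 0 if q == 2 ** p * 3 ** r else 1

    def toy_res(q):                        # decoding step: extract the exponent of 3
        r = 0
        while q > 0 and q % 3 == 0:
            q //= 3
            r += 1
        return r

    def phi(p, e=0):                       # phi(p) = Res(mu q [Der(e, p, q) = 0])
        q = 0
        while toy_der(e, p, q) != 0:
            q += 1
        return toy_res(q)

    assert phi(2) == 5                     # this toy "system" computes p + 3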
Note: In the statement of the Normal Form Theorem and below, when we
use “=” between two expressions for partial functions we mean: when both sides
have values then those values are identical; and when one side takes no value, then
neither does the other. End of Note.
As we’ve said, we take the notion of general recursive function to be the precise
explication of the intuitive notion of computable function.
Church’s Thesis—HGK Form. A function is computable (in the intuitive
sense) if and only if it is general recursive.
Church’s Thesis is not a mathematical claim. It asserts the equivalence of
a mathematical notion and an intuitive one, and hence there can be no question
of proof. Of course, there can be plausibility arguments (of a more or less philo-
sophical nature). I have already argued for the direction “If general recursive then
computable”. For the converse, one might note the following. First, every particular
function that people have ever encountered and judged on intuitive grounds to be
computable has turned out to be general recursive. Second, there are in the lit-
erature other mathematical explications of the notion of computability (we’ll treat
one, namely, Turing-computability, in §??), and in each case it can be shown that
the explications are equivalent, that is, each yields exactly the same class of func-
tions. Third, if our intuitive notion of algorithm is something like that of a finite
list of instructions applied to various inputs, then any precise notions of instruction
and application of instructions should be gödelizable, and this will yield a result
analogous to the Normal Form Theorem.
A partial recursive function is “semi-computable” in the following sense: given
input hp1 , . . . , pn i, we can systematically seek a derivation of f (p1 , . . . , pn ) = r for
some r; if there is such a derivation, we shall find it eventually; but if there is no
such, we will go on forever. Of course, if the function is total then, no matter what
input we are given, our computing will eventually stop.
It would be nicer not to have to deal with partial functions. One might hope to
eliminate partial recursive functions that are not total; perhaps one could effectively
weed out those HGK-systems that define nontotal functions. To do this for 1-place
functions, one would need an effective procedure that would yield, given any HGK-
system E, a “yes” if E defines a 1-place total function and a “no” if not. However,
as we shall now see, no such effective procedure exists.
We shall speak of effective procedures whose inputs are index numbers, rather
than HGK-systems. Such procedures can then be identified with general recursive
functions (where we take output 1 to be “yes” and output 0 to be “no”).
For any number e, we use ϕe for the 1-place partial recursive function defined
by the HGK-system with index number e. For n > 1, we use ϕne for the n-place
partial recursive function defined by the HGK-system with index number e.
Unsolvability of the Totality Problem. There is no general recursive
function ψ such that, for every integer e,

ψ(e) = 1 if ϕe is total,
ψ(e) = 0 if ϕe is not total.

Proof. Suppose such a ψ exists. By Facts 6.1 and 6.2, the function η(e, q) = ψ(e) ·
Der1 (e, e, q) is general recursive. From the supposition about ψ, we have η(e, q) = 0
if either ϕe is not total or else ϕe is total and q is the gödel number of a derivation
in the HGK-system with index number e of an equation f (e) = r for some r. Hence
∀e∃q[η(e, q) = 0]. By Fact 6.3, the function µq[η(e, q) = 0] is general recursive. By
Facts 6.1 and 6.2, the function δ(e) = Res(µq[η(e, q) = 0]) + 1 is general recursive.
By the definition of Res we have, for every e,

δ(e) = ϕe (e) + 1 if ϕe is total,
δ(e) = 1 if ϕe is not total.

Now δ, being general recursive, is identical to ϕe0 for some e0 , and ϕe0 is thus
total. But then we have ϕe0 (e0 ) = ϕe0 (e0 ) + 1, a contradiction. (The reader should
compare this proof to the heuristic proof about p.r. functions given in §??).

Thus there is no effective way to weed out HGK-systems that fail to define total
functions. Perhaps then, one could hope at least to “patch up” those HGK-systems
that so fail. That is, if ϕe takes no value on p, why not just set the value equal to
0? But to do this, one would need an effective procedure for telling, for any e and
any p, whether p is in the domain of ϕe or not. This too turns out to be impossible.
Unsolvability of the Halting Problem. There is no 2-place general recur-
sive function ψ such that

ψ(e, p) = 1 if ϕe takes a value on p,
ψ(e, p) = 0 if ϕe takes no value on p.

Proof. Suppose there were such a ψ. Then there exists a partial recursive function
δ such that, for all e,

δ(e) = 0 if ϕe (e) has no value,
δ(e) has no value if ϕe (e) has a value.

Namely, let δ(e) = µk(ψ(e, e) + k = 0). By the supposition, if e is not in the domain
of ϕe then ψ(e, e) = 0, so that δ(e) = 0; and if e is in the domain of ϕe then
ψ(e, e) = 1 so that δ takes no value on e.
Now, since δ is partial recursive, it is ϕe0 for some e0 . By the specification of δ
we then have: if e0 is in the domain of ϕe0 then e0 is not in the domain of δ, i.e., e0
is not in the domain of ϕe0 ; and if e0 is not in the domain of ϕe0 then δ(e0 ) = 0 so
that e0 is in the domain of ϕe0 . This is a contradiction.
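The diagonal construction in this proof can be mimicked directly with Python functions
standing in for index numbers. This is only a sketch: halts below is a hypothetical stub,
since the theorem just proved says that no such total test exists.

    def halts(func, arg):
        # hypothetical total test for "func, applied to arg, eventually halts"
        raise NotImplementedError("no such total computable test exists")

    def delta(func):
        if halts(func, func):      # if func would halt on itself ...
            while True:            # ... then loop forever
                pass
        return 0                   # otherwise halt (with value 0)

    # Applying delta to itself would now be contradictory: delta(delta) halts
    # exactly when halts(delta, delta) reports that it does not.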

The unsolvability of the totality and halting problems shows that we cannot
eliminate nontotal partial recursive functions by any effective procedure. This is as
we should have expected, given the heuristic argument of §??. The cost of capturing
all computable functions by means of HGK-systems is that we cannot effectively
avoid those HGK-systems that do not define total functions.

6.4 Turing Machines


In this section we give a seemingly rather different explication of the notion of
computability, although as has been mentioned it will turn out to yield the same
class of number-theoretic functions as the explication by the use of HGK-systems.
One of the chief virtues of the Turing machine explication lies in its picturesque
quality.
We think of a computing machine that operates on a two-way infinite tape.
This tape is to serve as the means for getting input into the machine, the place
where the machine prints its output, and also the place where the machine does its
scratch-work. (The machine has no memory.) The tape is divided into cells, each
of which can either be blank or else contain a symbol. Thus we may picture a tape as follows:

• • • • •

where each dot represents either a blank or else a symbol of some sort, and the tape
extends without limit in both directions from the depicted segment.
We may conceive of the machine as a mechanism that, at any moment, sits over
a cell of the tape. The machine can scan the cell it sits over; it then takes the symbol
(or blank) inscribed in that cell into account, and does something. The something
it may do includes replacing the symbol with another and includes moving on—that
is, shifting to the next cell to the right or to the next cell to the left. What the
machine does at any point is determined by a finite list of instructions that we have
given the machine. (Since we do not care about anything but the behavior of the
machine, we may say that a machine is just its instructions.)
To be more precise, at any particular moment the machine is in one of a finite
number of states. Each machine-instruction has the form: if in state i and the
cell being scanned contains symbol t, then do so-and-so and go into state j. The
so-and-so has two parts: the first is either erase the symbol t or replace t with t0
or leave t as it is; the second is either stay put or move left one cell or move right
one cell. In short, a Turing machine is specified by: first, a list of states, that is,
a specification of how many states there are; second, a finite alphabet of symbols
(including blank); and third, a finite list of instructions. Each instruction has the
form of a quintuple
hi, t, t0 , X, ji.

where i and j are numbers no greater than the number of states, t and t0 are
members of the alphabet, and X is either “D” (don't move), “L” (move left one), or
“R” (move right one). The instruction may be read: if in state i and scanning a cell
containing t, then replace t with t0 , move as X directs, and go into state j. To make
the machine deterministic—that is, at each juncture there is at most one applicable
instruction—we insist that no two instructions have the same first two members.
Given a Turing machine M , we may investigate the behavior of M when it is
started in a particular state at a particular place on a given tape.
Example. Let M be the Turing machine that has 4 states, whose alphabet is
just B (blank) and | (stroke), and whose instructions are:
h1, |, |, R, 1i h1, B, B, L, 2i h2, |, B, L, 3i
h3, B, |, D, 4i h3, |, |, D, 4i
How does this machine work? Let us first consider what happens if the machine
starts in state 1 situated at a cell containing a stroke and to the left and right of
which are cells containing blanks. We may symbolize this initial situation thus, marking
the scanned cell with brackets and noting the machine's state at the right:

B B [|] B B      (state 1)

Since the machine is in state 1 and is scanning a stroke, the first instruction is
applicable. Thus the machine leaves the stroke as is, moves right one cell, and
remains in state 1. So here's how things look after this first move.

B B | [B] B      (state 1)

The second instruction is now applicable. So the machine moves to the left and goes
into state 2. After this second move, we have:

B B [|] B B      (state 2)

The third instruction is now applicable. Hence the machine erases the stroke, moves
left one, and goes into state 3.

B [B] B B B      (state 3)

At this point the fourth instruction applies, so the machine writes a stroke in the cell
it is scanning, stays put, and goes into state 4.

B [|] B B B      (state 4)

And then the machine halts; no instruction is applicable. In general, a machine
halts iff it is in state i, is scanning a cell containing t, and no machine-instruction
has first two members i, t.
We now consider how the machine behaves if started in state 1 scanning a cell
containing a stroke, to the right of which are two more cells containing strokes and
then a cell containing a blank. Here are the initial situation and the ensuing ones:

B [|] | | B      (state 1)

B | [|] | B      (state 1)

B | | [|] B      (state 1)

B | | | [B]      (state 1)

B | | [|] B      (state 2)

B | [|] B B      (state 3)

B | [|] B B      (state 4)

The machine halts after six steps, since it is in state 4 and scanning a cell containing
a stroke.
In general, suppose M is started in state 1 scanning a cell containing a stroke to
the right of which there are n > 0 cells containing strokes and then a cell containing
a blank. The machine then moves to the right through all the cells that contain
strokes until it encounters the cell that contains a blank; then it backs up, erases the
last stroke, backs up once more, and halts. We may summarize its behavior more
easily after we introduce some terminology.
A tape represents a number n ≥ 0 iff the tape is blank but for n + 1 consecutive
cells each of which contains a stroke. To start a machine on input n is to start
the machine in state 1 situated at the leftmost stroke in a tape representing n. A
Turing machine yields m on input n iff when the machine is started on input n it
eventually halts, and at the moment when it halts, the tape represents m.
Thus for each n ≥ 0, the Turing machine M above yields Pred(n) on input n
(recall that Pred is the truncated predecessor function).


Here is a machine that, for each n ≥ 0, yields n + 1 on input n. It contains
just the two instructions h1, |, |, R, 1i and h1, B, |, D, 2i.
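For readers who want to experiment, here is a small Python simulator for machines
given in the quintuple format above (it is no part of the official treatment, which comes
in the Appendix to this section); the step bound is only a safeguard, since a machine
need not halt.

    def run(quintuples, n, max_steps=10_000):
        # the tape is a dict from cell index to symbol; absent cells count as blank 'B'
        prog = {(i, t): (t2, move, j) for (i, t, t2, move, j) in quintuples}
        tape = {k: '|' for k in range(n + 1)}       # input n: n + 1 consecutive strokes
        pos, state = 0, 1                           # start at the leftmost stroke, in state 1
        for _ in range(max_steps):
            scanned = tape.get(pos, 'B')
            if (state, scanned) not in prog:        # no applicable instruction: halt
                # for these machines the remaining strokes stay consecutive, so the
                # tape represents the number of strokes minus one
                return sum(1 for s in tape.values() if s == '|') - 1
            t2, move, state = prog[(state, scanned)]
            tape[pos] = t2
            pos += {'R': 1, 'L': -1, 'D': 0}[move]
        raise RuntimeError("did not halt within the step bound")

    pred_machine = [(1, '|', '|', 'R', 1), (1, 'B', 'B', 'L', 2),
                    (2, '|', 'B', 'L', 3), (3, 'B', '|', 'D', 4), (3, '|', '|', 'D', 4)]
    succ_machine = [(1, '|', '|', 'R', 1), (1, 'B', '|', 'D', 2)]

    assert run(pred_machine, 3) == 2    # Pred(3) = 2
    assert run(succ_machine, 3) == 4    # 3 + 1 = 4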
Definition. Let ϕ be a 1-place number-theoretic function. A Turing machine
M computes ϕ iff, for each n ≥ 0, M yields ϕ(n) on input n. The function ϕ is
Turing-computable iff some Turing machine computes ϕ.
It should be clear that every Turing-computable function is computable in the
intuitive sense.
Now a Turing machine M may yield nothing on input n. For example, let M be
the Turing machine with instructions h1, |, |, R, 2i and h2, |, |, D, 2i. Then M yields
0 on input 0 and yields nothing (does not halt) on input n for n > 0.
Thus not every Turing machine computes a 1-place function. But every Turing
machine does compute a partial function, if we define the notion of computing here
as follows: Let ϕ be a 1-place partial function. Then M computes ϕ iff M yields
ϕ(n) on input n for each n in the domain of ϕ; and M does not halt on input n for
each n not in the domain of ϕ. (We say M does not halt on input n iff M yields no
value on input n. This slight misuse of terminology is picturesque, and promotes
good intuitions.)
We may easily extend these notions to n-place functions and partial functions.
Say that a tape represents an n-tuple hp1 , . . . , pn i of integers iff the tape is blank
but for a sequence of consecutive cells, which contain the following configuration:
first, p1 + 1 consecutive strokes, then a blank, then p2 + 1 consecutive strokes, then
a blank, . . . , then pn + 1 consecutive strokes. The notion of yielding can then be
defined as before. From this we obtain the notion of a Turing machine's computing
an n-place function or an n-place partial function. An n-place (partial) function is
Turing-computable iff some Turing machine computes it.
By skillful programming, it can be shown that all primitive recursive functions
are Turing-computable, and that the Turing-computable partial functions are closed
under composition and application of µ. It then follows from the Normal Form
Theorem that all partial recursive functions are Turing-computable. Moreover, as
we shall see in an Appendix to this section, Turing machines are specific sorts of
formal systems; as such, they can be gödelized and a normal form theorem proved:
every Turing-computable partial function can be obtained from primitive recursive
functions by one application of µ and composition. It follows from this that all
Turing-computable partial functions are partial recursive. Thus HGK-systems and
Turing machines pick out the same classes of functions and partial functions.
Church's Thesis—Turing Form. A function is computable in the intuitive
sense iff it is Turing-computable.


Since the Turing-computable functions are exactly the general recursive func-
tions, this form of Church's Thesis is equivalent to the one on page ??. However, some
logicians—including Gödel—take the formulation in terms of Turing computability
to be more directly supportable. They argue that Turing machines explicate not
just the notion of computability, but also that of computation. Thus, on this view,
Turing machines provide the correct analysis of what we intuitively mean by “mechanical procedure”.
Some evidence for this, one can argue, comes from the stability of the notion of
Turing-computability. Although the kinds of instructions we allow Turing machines
to have are rather limited, one can add other primitive forms of instructions, but
in each case the new instructions can be simulated by instructions of the original
sort. Or one can give the machines “memory”, say by having any number of other
tapes to work on, and instructions of the form: go to the nth memory tape, and
work on it. Anything such expanded machines can do, however, can be simulated
by a machine of the restricted sort we have defined.
The Halting Problem. The Halting Problem for Turing machines is the
problem of determining, given any Turing machine M and any integer p, whether
or not M eventually halts if started on input p. Since Turing machines and HGK-
systems are equivalent, it should occasion no surprise that the Halting Problem is
not effectively solvable. A direct proof, completely in terms of Turing machines, is
not difficult to frame.
Suppose we assign gödel numbers to Turing machines (this can be done, since
machines are finite sets of quintuples). Now let us grant the following fact: for every
Turing machine M there is a Turing machine N such that, for all p, if M yields 0
on input hp, pi then N yields 1 on input p, and otherwise N does not halt on input
p. This fact can be shown by some reasonably simple computer-programming. It
then follows quickly that:

There is no Turing machine M such that, for all e and n, if the Turing
machine numbered e halts on input n, then M yields 1 on input he, ni,
and if the Turing machine numbered e does not halt on input n, then
M yields 0 on input he, ni.

For suppose M were such a Turing machine. Let N be the machine provided by the
fact granted above, and let d be the gödel number of N . From the specification of
M we have that M yields 1 on input hd, di iff N halts on input d; but from the
definition of N we have that N halts on input d iff M yields 0 on input hd, di. Thus
we obtain a contradiction, and we may conclude that no such M exists.

Appendix. Formal treatment. Lest the reader be carried away by the rather
pictorial nature of the preceding section, we indicate here how Turing machines and
their behavior may be defined more formally. Let S be a finite set of symbols,
including “B” and “|”, and let q1 , q2 , . . . be symbols not in S. Then a Turing
machine M (on S) is simply a finite set of quintuples

hqi , t, t0 , X, qj i,

where t and t0 are in S and X is one of the symbols “D”, “L”, or “R”, such that no
two distinct quintuples have the same first two members. (The symbol qi represents
state i.)
We now seek to formalize the notion of “the situation of a machine at a given
time”. Note that in all our work we have been dealing with tapes that are blank
in all but a finite number of cells. Thus all we need to encode is: what that finite
stretch of tape that contains all the nonblank cells looks like; where the machine is
(what cell it is scanning); and what state the machine is in. We can capture this
by a notion of instantaneous description (id): an instantaneous description is any
string of the form P qi tQ, where t is a member of S and P and Q are (possibly
empty) strings of symbols from S.
We now define the notion that encodes the operation of a machine M according
to its instructions. Let I and J be id’s. We say that I M -produces J iff either

1. There are (possibly empty) strings P and Q such that I is P qk tQ, J is P qm t0 Q,


and hqk , t, t0 , D, qm i is a quintuple in M ;
or

2. There are (possibly empty) strings P and Q such that I is P qk tsQ, J is


P t0 qm sQ, and hqk , t, t0 , R, qm i is a quintuple in M ;
or

3. There is a (possibly empty) string P such that I is P qk t, J is P t0 qm B, and


hqk , t, t0 , R, qm i is a quintuple in M ;
or

4. There are (possibly empty) strings P and Q such that I is P sqk tQ, J is
P qm st0 Q, and hqk , t, t0 , L, qm i is a quintuple in M ;
or

5. There is a (possibly empty) string Q such that I is qk tQ, J is qm Bt0 Q, and


hqk , t, t0 , L, qm i is a quintuple in M .

An M -computation is a finite sequence I1 , . . . , Ik of ids such that, for each i < k,


Ii M -produces Ii+1 . An M -computation is finished iff its last member is P qm tQ for
some P , Q such that qm , t are the first two members of no quintuple in M . Let p, n ≥ 0.
The Turing machine M yields p on input n iff there is a finished M -computation
I1 , . . . , Ik such that

(a) I1 is q1 | . . . |, with n + 1 occurrences of |;

(b) Ik is P qm Q for strings P , Q such that P Q contains p + 1 occurrences of |.

The notion of yielding on inputs that are n-tuples can be defined in similar fashion.
Thus we see that Turing machines can be treated as nothing more than (pecu-
liar) types of formal systems. The point of this is, in part, simply to make clear that
we may gödelize Turing machines and their behavior. That is, we may assign index
numbers to Turing machines, and gödel numbers to instantaneous descriptions and
to finite sequences of instantaneous descriptions, in such a way that the following
holds:

1. There is, for each n > 0, an (n + 2)-place primitive recursive relation Tn such
that Tn (e, p1 , . . . , pn , q) iff e is the index number of a Turing machine M and
q is the gödel number of a finished M -computation on input hp1 , . . . , pn i.

2. There is a 1-place primitive recursive function Ans such that Ans(q) = m iff
q is the gödel number of a sequence of ids, the last id of which has the form
P qj Q such that P Q contains m + 1 strokes.

From this it then follows that for each n-place partial function ϕ that is com-
puted by a Turing machine, there is an e such that

ϕ(p1 , ..., pn ) = Ans(µq[Tn (e, p1 , ..., pn , q)]).

6.5 Undecidability
In this section we are concerned with applying recursive functions to the study of
formal systems. One major issue is that of decidability. In §?? we said that a formal
system Σ is decidable iff there is a computational procedure for telling, given any
formula of Σ, whether or not that formula is derivable in Σ. By use of Church's


Thesis, we may make this a precise definition.
Definition. Let Σ be a formal system, which we assume gödelized via γ. A
decision procedure for Σ is a recursive function ϕ such that, for each formula F ,

ϕ(γ(F )) = 1 if `Σ F ,
ϕ(γ(F )) = 0 if not `Σ F .

System Σ is (recursively) decidable iff there is a decision procedure for Σ. (Warning:


Do not confuse the notion of decidability with that of formal decidability. The former
applies to formal systems. The latter applies to formulas within formal systems. The
use of the same word is merest coincidence.)
We shall be proving the undecidability of various systems. To obtain such
results, we need to see how recursive functions may be numeralwise represented.
Here the Normal Form Theorem is of great help.
Extended Representability Theorem. Every recursive function is numer-
alwise representable in PA.

Proof. As we showed in Chapter 4, the class of functions that are numeralwise


representable in PA contains all primitive recursive functions and is closed under
composition. By the Normal Form Theorem it suffices to show that this class of
functions is closed under licensed application of µ, that is, if ϕ is an (n + 1)-place
(total) function that is numeralwise representable in PA and to which the application
of µ is licensed, then µk[ϕ(p1 , . . . , pn , k) = 0] is also numeralwise representable in
PA.
For notational convenience we take the case n = 2. Let Φ(x, y, z, w) numeralwise
represent ϕ; and let u < v be the formula u ≤ v & ∼u = v. Then let F (x, y, z) be the
formula
Φ(x, y, z, 0) & ∀z 0 (z 0 < z ⊃ ∼Φ(x, y, z 0 , 0)).
We claim that F (x, y, z) numeralwise represents the function µk[ϕ(p1 , p2 , k) = 0].
Indeed, it is a routine matter to show that if µk[ϕ(p1 , p2 , k) = 0] = q, then

` F (p1 , p2 , q) & (F (p1 , p2 , z) ⊃ z = q).

Church’s Theorem. If PA is consistent then PA is undecidable.



Proof. Let ϕ be any recursive function; we construct a formula G of LPA that


attests to the fact that ϕ is not a decision procedure for PA. By the Extended
Representability Theorem, there is a formula Φ(x, y) that numeralwise represents ϕ
in PA. By the Fixed Point Theorem there is a formula G of PA such that

`PA G ≡ ∼Φ(pGq, S0).

Now either ϕ(γ(G)) = 1 or not. If ϕ(γ(G)) = 1 then `PA Φ(pGq, S0) so that
`PA ∼G; hence if PA is consistent then G is not derivable in PA. If ϕ(γ(G)) ≠ 1,
then `PA ∼Φ(pGq, S0), so that `PA G. Hence, in either case, ϕ gives us the wrong
answer on G.

Note 1. The above proof should feel familiar. One could rephrase it thus: if
PA were decidable, then Bew would be numeralwise representable in PA, by dint of
the Extended Representability Theorem. But, by the Fixed Point Theorem, if PA is
consistent, then Bew is not numeralwise expressible in PA. In other words, Gödel’s
work immediately tells us that there is no primitive recursive decision procedure
for PA; and the argument extends to any notion of computability whose functions are numeralwise representable in PA.
All that was necessary after 1931, then, was to formulate the appropriate general
notion of computability, and show that all computable functions were numeralwise
representable.
Note 2. We have shown that for every general recursive function ϕ there is a
formula G such that: either G is derivable and ϕ(γ(G)) ≠ 1 or else ∼G is derivable
and ϕ(γ(G)) = 1. This shows that there is no way of extending PA to a system that
is both consistent and decidable. PA is therefore said to be essentially undecidable.
The proof of Church’s Theorem relies only on the fact that every recursive func-
tion is numeralwise representable in PA, and on the Fixed Point Theorem (and, of
course, the Fixed Point Theorem holds provided that the function diag is numeral-
wise representable). Every consistent formal system in which all recursive functions
are numeralwise representable is thus essentially undecidable. We now show that
even systems considerably weaker than PA are essentially undecidable.
Definition. Let Q be the formal system whose language is LPA , and whose
axioms are like those of PA except that the axiom-schema of induction is eliminated
and, in its stead, the axiom ∼x = 0 ⊃ ∃y(x = Sy) is added.
System Q is often called Robinson arithmetic, after Raphael Robinson, who first
formulated the system in 1950. Q is a very weak system, because of the absence of
induction axioms. Even quite elementary truths like ∀x(0 + x = x) are not derivable
in it. Nonetheless,

Robinson’s Lemma. Every recursive function is numeralwise representable


in Q.

Proof. The inclusion of the axiom ∼x = 0 ⊃ ∃y(x = Sy) yields the derivability in
Q of the formulas x ≤ m ⊃ x = 0 ∨ x = 1 ∨ . . . ∨ x = m and x ≤ m ∨ m ≤ x for
every m. A close analysis of the proof of the Representability Theorem given in §??
shows that these properties of ≤, together with the facts that x + y numeralwise
represents addition in Q, and x × y numeralwise represents multiplication in Q,
yield the representability of all primitive recursive functions in Q. From that, the
representability of all general recursive functions in Q follows by the same argument
as was used above for PA.

From Robinson’s Lemma it follows that system Q is essentially undecidable.


Note also that Q has only finitely many nonlogical axioms (seven, in fact). From these
two facts, we may prove
Church-Turing Theorem. There is no decision procedure for quantificational
validity.

Proof. Let A be the conjunction of the universal closures of the seven non-logical
axioms. By the Deduction Theorem we have:

A formula F is derivable in Q iff the formula (A ⊃ F ) is derivable using


just the logical axioms (i.e., the truth-functional, quantificational, and
identity axioms).

From the undecidability of Q, we then have: there is no effective procedure that


determines, given any formula F in LPA , whether or not (A ⊃ F ) is derivable using
just the logical axioms.
Now formulas (A ⊃ F ) are not in the language of pure quantification theory:
they contain function signs (namely, “S”, “+”, and “×”) and constants (“0”). How-
ever, by adjoining some new predicate letters and using definite descriptions, one
can effectively find a quantificational schema (A ⊃ F )∗ that is in this language and
that is derivable from logical axioms iff (A ⊃ F ) is so derivable.
But then there can be no effective procedure that decides quantificational va-
lidity. For such a procedure would yield a procedure that decides, given any formula
F of LPA , whether or not (A ⊃ F )∗ is derivable using logical axioms only; from this,
one could obtain a decision procedure for Q.

6.6 Recursive and recursively enumerable sets


A set of integers is said to be recursive iff its characteristic function is general
recursive. (Recall that the characteristic function of a set S is the function that
takes n to 1 if n ∈ S and takes n to 0 if n ∉ S.) Thus, a set is recursive iff there is
an effective procedure that determines, for any integer, whether or not that integer
is in the set.
It should be clear that the complement of a recursive set is recursive, and that
the union and the intersection of recursive sets are recursive. Since every general
recursive function is numeralwise representable in formal system PA, every recursive
set is numeralwise representable in PA; the same holds for formal system Q.
A formal system is decidable iff the set of gödel numbers of formulas derivable
in the system is recursive. Thus the set of gödel numbers of formulas derivable in
PA is not recursive. There is, however, a search procedure for this set, that is, an
effective procedure that, applied to any integer n, eventually terminates if n is in
the set but does not terminate if n is not in the set. We need merely look through
the integers, consecutively, and terminate when and if we find an integer that is the
gödel number of a derivation in PA of the formula with gödel number n. Sets for
which there exists a search procedure are called recursively enumerable.
Definition. A set of integers is recursively enumerable (r.e.) iff it is the domain
of some partial recursive function.
The rubric “recursively enumerable” reflects the fact that such a set can be
exhaustively listed in an effective manner, that is, there exists a general recursive
function g such that g(0), g(1), g(2), . . . lists all and only the members of the set
(possibly with repetitions). Let us prove this.
R.e. Fact 1. A set S is recursively enumerable iff either S is empty or else S
is the range of some general recursive function.

Proof. (→) Suppose S is r.e. Thus S = domain(ϕe ) for some e. If S is empty, we


are done. Otherwise, let k0 be a member of S. We construct a general recursive
function that takes an integer p to n if p is the gödel number of a computation (i.e.
derivation) of a value for ϕe (n), and takes p to k0 otherwise. In fact, let

g(p) = µn[n ≤ p & Der1 (e, n, p) = 0] if there exists such an n ≤ p,
g(p) = k0 otherwise.

The function g is clearly recursive (indeed, p.r.), and range(g) = domain(ϕe ) = S.



(←) If S is empty then S is the domain of the partial recursive function that is
nowhere defined. If S = range(g), where g is general recursive, let ψ(n) = µp[g(p) =
n]. Then ψ is partial recursive, and ψ(n) is defined iff n ∈ range(g). That is,
domain(ψ) = range(g) = S.
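For illustration only, the enumerating function g of the (→) direction may be sketched as follows; runs_to_completion(e, n, p) is a hypothetical stand-in for the primitive recursive test Der1 (e, n, p) = 0, that is, "p is the gödel number of a computation of a value of ϕe at n".

    def make_enumerator(e, k0, runs_to_completion):
        # returns the function g of the proof: range(g) = domain(phi_e),
        # where k0 is some fixed member of the (nonempty) domain
        def g(p):
            for n in range(p + 1):          # bounded mu-search for an n <= p
                if runs_to_completion(e, n, p):
                    return n
            return k0
        return g

    # g(0), g(1), g(2), ... then lists all and only the members of domain(phi_e),
    # possibly with repetitions.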

R.e. Fact 2. If a set and its complement are both r.e., then the set is recursive.

Proof. The intuitive idea is this: suppose there are search procedures for S and for
its complement S̄. Given n, start both search procedures on n; since either n ∈ S or n ∈ S̄, at
some point one of the search procedures will terminate. If the search procedure for
S terminates, we know n ∈ S; if that for S̄ terminates, we know n ∉ S. Thus we
have a decision procedure for membership in S.
More rigorously, suppose S = domain(ϕd ) and S̄ = domain(ϕe ). Let ψ(n) =
µp[Der1 (d, n, p) · Der1 (e, n, p) = 0]. Since the application of µ is licensed, ψ is
general recursive. Let g(n) = α(Der1 (d, n, ψ(n))). Then g is general recursive.
Moreover, if n ∈ S then Der1 (d, n, ψ(n)) = 0, so that g(n) = 1. If n ∉ S then
Der1 (d, n, ψ(n)) ≠ 0, so that g(n) = 0. Thus g is the characteristic function of
S.
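The intuitive parallel search may likewise be sketched (illustration only; runs_to_completion(i, n, p) again stands in for the test Der1 (i, n, p) = 0).

    import itertools

    def decide(n, d, e, runs_to_completion):
        # decision procedure for S, assuming S = domain(phi_d) and the complement
        # of S = domain(phi_e): run both searches "in parallel" by increasing p
        for p in itertools.count():
            # some such p exists, since n lies either in S or in its complement,
            # so this mu-search is licensed
            if runs_to_completion(d, n, p):
                return 1                    # n is in S
            if runs_to_completion(e, n, p):
                return 0                    # n is in the complement of S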

R.e. Fact 2 can be extended to k-place relations for k > 1, if we extend our
definitions in the obvious way: a k-place relation is recursive iff its characteristic
function is general recursive, and is recursively enumerable iff it is the domain of
some k-place partial recursive function. The following result relates r.e. sets to 2-
place recursive relations. It can easily be extended to relate k-place r.e. relations
to (k + 1)-place recursive relations.
R.e. Fact 3. A set S is r.e. iff there exists a 2-place recursive relation R such
that, for each n, n ∈ S iff (∃p)R(n, p).

Proof. (→) Suppose S = domain(ϕe ). Then n ∈ S iff (∃p)(Der1 (e, n, p) = 0), and,
for any e, Der1 (e, n, p) = 0 is a recursive relation of n and p. (Indeed, it is primitive
recursive.)
(←) Let R be a 2-place recursive relation, and let S be the set of integers n
such that (∃p)R(n, p). Let χ be the characteristic function of R; thus χ is general
recursive, so that the function ψ(n) = µp[χ(n, p) = 1] is partial recursive. Clearly
n ∈ S iff n ∈ domain(ψ); hence S is r.e.

Thus, a set is r.e. iff it can be obtained from a recursive relation by existential
quantification.
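A sketch of the (←) direction of R.e. Fact 3 in the same illustrative style, with a computable total test R standing in for the recursive relation:

    import itertools

    def witness_search(n, R):
        # terminates, returning a witness p with R(n, p), iff n is in
        # S = {n | (there exists p) R(n, p)}; so S is the domain of this search
        for p in itertools.count():
            if R(n, p):
                return p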

Other r.e. Facts.

(a) If two sets are r.e., then so are their union and their intersection.

(b) If ψ is any partial recursive function and k is any integer, then {n | ψ(n) = k}
is r.e.

(c) Every recursive set is r.e.

(d) A set is r.e. iff it is the range of some partial recursive function.

(e) If R is a 2-place r.e. relation, then {n | (∃p)R(n, p)} is an r.e. set.

The proofs are left to the reader.


We now investigate relations between these recursion-theoretic notions and for-
mal systems. Let Σ be a formal system. We presume that Σ can be gödelized and
that, in particular, DerΣ is a recursive 2-place relation that mirrors “derivation of”
for system Σ.
Claim 1. The set of gödel numbers of formulas derivable in Σ is r.e.

Proof. n is in this set iff (∃m)DerΣ (m, n). The result thus follows by R.e. Fact 3.

Claim 2. For any formula F (x), {n | `Σ F (n)} is r.e.

Proof. For each n, let h(n) be the gödel number of F (n); h is primitive recursive,
and hence general recursive. By Mirroring, `Σ F (n) iff (∃m)DerΣ (m, h(n)). The
result then follows by R.e. Fact 3.

Claim 3. Suppose Σ is consistent. Then every set that is numeralwise repre-


sentable in Σ is recursive.

Proof. Let F (x) numeralwise represent a set S in Σ. Since Σ is consistent, S =


{n | `Σ F (n)} and S̄ = {n | `Σ ∼F (n)}. By Claim 2, S and S̄ are both r.e. By
R.e. Fact 2, S is recursive.

Claim 2 shows that every set weakly representable in Σ is r.e. Claim 3 shows
that each of the formal systems PA, SA, and Q numeralwise represents the same sets
(assuming they are all consistent), to wit, the recursive sets. Moreover, assuming
these systems are ω-consistent, they all weakly represent the same sets, to wit, the
recursively enumerable sets.

Claim 4. Suppose Σ contains the usual quantifier rules. If Σ is syntactically


complete, then Σ is decidable.

Proof. Let Σ be syntactically complete. We may assume Σ consistent, for if it is


not then it is trivially decidable. Let A be the set of gödel numbers of sentences
derivable in Σ, let B be the set of gödel numbers of sentences refutable in Σ, and
let C be the set of integers that are not gödel numbers of sentences. A and B are
r.e., and C is recursive. By completeness, if n ∉ A then n ∈ B ∪ C. By consistency,
if n ∈ A then n ∉ B ∪ C. Hence Ā = B ∪ C; since B is r.e. and C, being recursive, is
also r.e., Ā is r.e. Thus both A and Ā are r.e., so A is
recursive.
Since Σ contains the quantifier rules, it includes universal instantiation and
generalization. Hence a formula is derivable in Σ iff its universal closure is derivable.
Let u be the recursive function that takes the gödel number of any formula to the gödel
number of its universal closure, and takes other numbers to themselves. Then n is the gödel number of a
formula derivable in Σ iff u(n) ∈ A. Consequently, the set of gödel numbers of
formulas derivable in Σ is recursive; i.e., Σ is decidable.

In §?? we showed that if PA is consistent then it is undecidable. This, to-


gether with Claim 4, yields Rosser’s strengthening of Gödel’s Theorem, i.e., if PA
is consistent then it is syntactically incomplete. Such a proof of incompleteness by
way of undecidability, however, does not actually provide a sentence that is neither
derivable nor refutable.

6.7 Recursive Function Theory


In this section we investigate the recursive and partial recursive functions more
intrinsically, with an eye to showing basic undecidabilities that arise within the
theory of recursive functions. We have already shown two such results, namely, the
unsolvability of the Totality and Halting Problems. We shall see many more.
A basic tool in this is the Enumeration Theorem, which gives us the existence
of universal Turing Machines (or universal HGK-systems), in the following sense:
There is a Turing Machine M such that, on any input ⟨e, n⟩, M yields the same thing
that the Turing Machine numbered e yields on input n. That is, we can conceive
of M as having the capacity to follow these instructions: given e and n, pretend
you’re the Turing Machine with index number e, and calculate at input n. So M is
universal, in the sense that it can do everything that any Turing Machine can do.
Which is to say: a finite list of instructions suffices to capture all partial recursive
functions.
One can put this into HGK-system language too: there is one HGK-system
that “incorporates” all HGK-systems.
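The idea of a universal machine can be conveyed by a Python sketch (illustration only, with program texts playing the role of index numbers): the universal function is then nothing but an interpreter written in the very language it interprets.

    def U1(e, p):
        # e: the source text of a one-argument Python function named "phi",
        # playing the role of an index number; U1 runs that program on input p
        env = {}
        exec(e, env)          # "pretend you're the program with index e ..."
        return env["phi"](p)  # "... and calculate at input p"

    # Example: the "index" of the doubling function.
    double = "def phi(n):\n    return 2 * n\n"
    assert U1(double, 21) == 42

In the theorem below the indices are of course integers (gödel numbers of HGK-systems or of Turing Machines), and the partial character of ϕe shows up as possible nontermination of the interpreter.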
Enumeration Theorem. For each n > 0 there is a universal partial recursive
function Un of n + 1 arguments; that is, for all integers e, p1 , . . . , pn ,
Un (e, p1 , . . . , pn ) = ϕ(n)e (p1 , . . . , pn ).

Special case n = 1: There is a 2-place partial recursive function U1 such that,


for all e and p, U1 (e, p) = ϕe (p).

Proof. Simply let Un (e, p1 , . . . , pn ) = Res(µk[Dern (e, p1 , . . . , pn , k) = 0]).

The Enumeration Theorem allows us to obtain negative results. We exploit the


Theorem to define partial recursive functions that take index numbers as arguments
and, for each value of this argument, simulate the indexed function. This yields
quick diagonal arguments, of which we give two.
Result 1. There exists a one-place partial recursive function ψ such that no
general recursive function agrees with ψ on all arguments at which ψ is defined.

Proof. Define ψ thus: for each m, ψ(m) = ϕm (m) + 1 (as usual, with the convention
that if the right side is undefined then so is the left). Then ψ is partial recursive,
since ψ(m) = U1 (m, m) + 1, and thus ψ comes from the universal function U1 by
composition with the successor function.
Now let ϕe be any 1-place general recursive function. By definition of ψ, ψ(e) =
ϕe (e) + 1. Since ϕe is total, ϕe (e) is defined; thus ψ(e) is defined and ψ(e) ≠ ϕe (e).
Hence no general recursive function agrees with ψ at all places at which ψ is defined.
(The sharp-eyed reader will have noted the similarity between this proof and that
of the Unsolvability of the Totality Problem, page ?? above.)
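In the same illustrative style (program texts as stand-in indices; the interpreter U1 of the earlier sketch is repeated so that the sketch is self-contained), the diagonal function of Result 1 looks like this:

    def U1(e, p):
        # universal-function sketch, as before
        env = {}
        exec(e, env)
        return env["phi"](p)

    def psi(m):
        # the diagonal function of Result 1: psi(m) = phi_m(m) + 1
        return U1(m, m) + 1

    # If some index e named a total function agreeing with psi wherever psi is
    # defined, then at e itself we would have psi(e) = phi_e(e) + 1, which differs
    # from phi_e(e); so no general recursive function agrees with psi at every
    # argument at which psi is defined.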

Result 2. Let K = {e | ϕe (e) is defined}. Then K is recursively enumerable,


but not recursive.

Proof. Let ψ(p) = U1 (p, p). Then ψ is partial recursive, and its domain is precisely
K. Hence K is recursively enumerable. To show that K is not recursive it suffices
to show that its complement K̄ is not recursively enumerable (if K were recursive, then
K̄ would be recursive, and hence r.e.). Suppose K̄ were the domain of a partial recursive
function ϕm . Then m ∈ K̄ iff ϕm (m) is defined. Thus m ∈ K̄ iff m ∈ K, by the
specification of K. This is a contradiction.

The nonrecursiveness of K yields another proof of the Unsolvability of the


Halting Problem. For let K = domain(ϕd ); then the Halting Problem for ϕd is
undecidable. A fortiori, the Halting Problem for all partial recursive functions is
undecidable.
The theorem we now present, in a sense, goes in the opposite direction from the
Enumeration Theorem. The latter allows one to treat index numbers as arguments,
whereas the theorem below allows one to take operations on arguments to amount
to operations on index numbers.
Uniformization Theorem. Let ψ be any 2-place partial recursive function.
Then there is a general recursive function h such that, for all e and n,

ϕh(e) (n) = ψ(e, n).

Proof. Consider the following syntactic operation on the HGK-system that defines
ψ: given an integer e, first reletter the function letter f as fk , for k large enough
to avoid conflicts; then add the equation f (x) = fk (e, x). Clearly, the resulting
HGK-system defines the partial function that takes each n to ψ(e, n). Moreover, by
gödelization, the function that takes e to the index number of the resulting HGK-
system is general recursive (indeed, primitive recursive). That function is the desired
h.
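The proof's syntactic operation can be mimicked with program texts standing in for index numbers (a sketch only; the names psi and phi are illustrative, not part of the formal development).

    def smn(psi_source, e):
        # psi_source: the text of a two-argument Python function named "psi";
        # the result is the text of a one-argument function phi with
        # phi(n) = psi(e, n). The transformation is a purely textual, and
        # hence computable, operation on program texts.
        return psi_source + "\ndef phi(n):\n    return psi(" + repr(e) + ", n)\n"

    # Example: fixing the first argument of psi(e, n) = e + n at e = 5 yields
    # (the text of) the one-place function taking n to 5 + n.
    psi_source = "def psi(e, n):\n    return e + n\n"
    env = {}
    exec(smn(psi_source, 5), env)
    assert env["phi"](37) == 42

For a fixed psi_source, the map taking e to smn(psi_source, e) plays the role of the general recursive function h of the theorem.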

Note. There are forms of the Uniformization Theorem for more arguments.
E.g., let ψ be a 3-place partial recursive function. Then there exists a 2-place general
recursive g such that ϕg(d,e) (n) = ψ(d, e, n) for all d, e, and n. In the literature, the
general form of the Uniformization Theorem is called the "s–m–n Theorem".
The Uniformization Theorem is extremely useful in establishing the nonrecur-
siveness of various sets of index numbers. We use it to establish reducibilities.
Definition. Let A and B be sets of integers. A is many-one reducible to B (in
symbols A ≤m B) iff there exists a general recursive function h such that, for each
n, n ∈ A iff h(n) ∈ B.
If A ≤m B, then the question of membership in A is reducible to the question
of membership in B: if one knew how to decide membership in B, one would then
know how to decide membership in A.
Reduction Lemma. If A ≤m B and B is recursive, then A is recursive. If A ≤m B and
B is recursively enumerable, then A is recursively enumerable.

Proof. Let h be a general recursive function such that, for all n, n ∈ A iff h(n) ∈
B. If B is recursive, then the characteristic function χ of B is recursive; and the
characteristic function of A is the composition of the characteristic function of B
and h, and so is itself recursive. Hence A is a recursive set. If B is recursively
enumerable then it is the domain of some ϕe . Now the partial function that takes
each n to ϕe (h(n)) is partial recursive; and A is its domain. Hence A is recursively
enumerable.
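Under Church's Thesis the Lemma amounts to the observation that decision procedures and search procedures compose with the reducing function h; a minimal sketch:

    def decide_A(n, h, decide_B):
        # if h many-one reduces A to B and decide_B decides membership in B,
        # then this decides membership in A
        return decide_B(h(n))

    def search_A(n, h, search_B):
        # likewise, a search procedure for B yields one for A:
        # the call terminates iff h(n) is in B, i.e. iff n is in A
        return search_B(h(n))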

In the applications below of the Lemma, we use its contrapositive form: if


A ≤m B and A is not recursive, then B is not recursive, and similarly for recur-
sive enumerability. We often use the set K, that is, {e | ϕe (e) is defined}, which
was shown not to be recursive above. Note that, by R.e. Fact 2, K̄ is not recur-
sively enumerable. The needed recursive function h is obtained by means of the
Uniformization Theorem.
Result 3. None of the following sets is recursive:

{e | 0 is in the domain of ϕe };
{e | the domain of ϕe is not empty};
{e | the domain of ϕe is infinite};
{e | ϕe is a total constant function}.

Proof. For any e and n, let ψ(e, n) = ϕe (e). By the Enumeration Theorem, ψ is
partial recursive. By the Uniformization Theorem, there exists a general recursive
function h such that, for all e and n, ϕh(e) (n) = ψ(e, n). Thus we have:

if e ∈ K then ϕh(e) is defined everywhere, and is constant;


if e ∉ K then ϕh(e) is defined nowhere.

Thus, if S is any of the sets listed in the statement of the result, e ∈ K iff h(e) ∈ S.
The result follows by the Reduction Lemma.
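Concretely, and again with program texts as stand-in indices (illustration only), the reducing function h of Result 3 may be pictured as the following textual operation.

    def h(e):
        # from (the text) e, build (the text of) a program that ignores its
        # argument and computes phi_e(e)
        return (
            "def phi(n):\n"
            "    env = {}\n"
            "    exec(" + repr(e) + ", env)\n"
            "    return env['phi'](" + repr(e) + ")\n"
        )

    # If e is in K, i.e. phi_e(e) is defined, the program h(e) computes a total
    # constant function (so its domain contains 0, is nonempty, and is infinite);
    # if e is not in K, the program h(e) is defined nowhere.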

Result 4. {e | ϕe has infinite range} is not recursive.

Proof. Let ψ(e, n) = ϕe (e)+n. By the Enumeration Theorem, ψ is partial recursive.


By the Uniformization Theorem, there exists a general recursive function g such that
ϕg(e) (n) = ψ(e, n). Thus

if e ∈ K then ϕg(e) has infinite range;


if e ∉ K then ϕg(e) is defined nowhere, and hence does not have
infinite range.

Hence e ∈ K iff ϕg(e) has infinite range, and we are done.

Result 5. Let Tot = {e | ϕe is total}. Neither Tot nor its complement is recursively


enumerable.

Proof. The recursive function h obtained in the proof of Result 3 has the property
that e ∈ K iff h(e) ∈ Tot. Thus e ∈ K̄ iff h(e) lies in the complement of Tot, so that
K̄ ≤m the complement of Tot. If the complement of Tot were r.e., then, by the Reduction
Lemma, K̄ would be r.e. But in the proof of Result 2 above, we showed that K̄ is not
r.e. Hence the complement of Tot is not r.e.
To show that Tot itself is not r.e., we proceed in a manner similar to the proof of the
Unsolvability of the Totality Problem. Suppose Tot is r.e. Thus there exists a
recursive function g such that Tot = range(g). Let ψ(n) = ϕg(n) (n) + 1. ψ is partial
recursive, by the Enumeration Theorem. ψ is total, since for each n ϕg(n) is total.
Let d be an index number for ψ, i.e., let ψ = ϕd . Since ψ is total, d ∈ range(g). Let
e be such that d = g(e). Then ϕd (e) = ϕg(e) (e), but also ϕd (e) = ψ(e) = ϕg(e) (e)+1.
This is a contradiction.

The above proof shows how to obtain, given any r.e. subset of Tot, a recur-
sive function no index for which lies in the subset. This can be used to prove the
following striking result: for any ω-consistent system there are total recursive func-
tions that cannot be proved to be total in the system. For let Σ be an ω-consistent
formal system, and let S be the set of integers e such that the formalization of
(∀p)(∃q)(Der1 (e, p, q) = 0) is derivable in Σ. By Claim 2 of §??, S is r.e. By the
ω-consistency of Σ, if that formalization is derivable in Σ then ϕe is total. Hence S
is an r.e. subset of Tot, and there exists a recursive function ψ no index for which
is in S. That is, for all d, if ψ = ϕd then d ∉ S. Thus ψ cannot be proved in Σ to
be total.
be total.
