
Lecture notes for Mathematical Logic I

Phil 513 Kevin C. Klement


Fall 2011

CONTENTS

Introduction
   A. The Topic
   B. Metalanguage and Object Language
   C. Set Theory
   D. Mathematical Induction

Unit 1: Metatheory for Propositional Logic
   A. The Syntax of Propositional Logic
   B. The Semantics of Propositional Logic
   C. Reducing the Number of Connectives
   D. Axiomatic Systems and Natural Deduction
   E. Axiomatic System L
   F. The Deduction Theorem
   G. Soundness and Consistency
   H. Completeness
   I. Independence of the Axioms

Unit 2: Metatheory for Predicate Logic
   A. The Syntax of Predicate Logic
   B. The Semantics of Predicate Logic
   C. Countermodels and Semantic Trees
   D. An Axiom System
   E. The Deduction Theorem in Predicate Logic
   F. Doing without Existential Instantiation
   G. Metatheoretic Results for System PF
   H. Identity Logic

Unit 3: Peano Arithmetic and Recursive Functions
   A. The System S
   B. The Quasi-Fregean System F
   C. Numerals
   D. Ordering, Complete Induction and Divisibility
   E. Expressibility and Representability
   F. Primitive Recursive and Recursive Functions
   G. Number Sequence Encoding
   H. Representing Recursive Functions in System S

Unit 4: Gödel's Results and their Corollaries
   A. The System
   B. System S as its Own Metalanguage
   C. Arithmetization of Syntax
   D. Robinson Arithmetic
   E. Diagonalization
   F. ω-Consistency, True Theories and Completeness
   G. Gödel's First Incompleteness Theorem
   H. Church's Thesis
   I. Löb's Theorem / Gödel's Second Theorem
   J. Recursive Undecidability
   K. Church's Theorem
INTRODUCTION

A. The Topic

This is a course in logical metatheory, i.e., the study of logical systems. It is probably very different from (and considerably harder than) any logic courses you may have taken before. Those courses (such as our Phil 110 and 310) may have involved learning logical systems and the symbols they employ: what they are, how they are used, and how they relate to English. You learned how to construct formal deductions or proofs within certain logical systems. What you were proving was not anything about a logical system; it was instead either not about anything at all (because the problem never told you what the symbols meant in that context), or about some made-up people or things. (E.g., perhaps you had to prove something about some wacky folks named Jay and Kay.)

If you have taken Intermediate Logic you have mastered the boring part of logic: classical propositional logic and first-order predicate logic. You may have been exposed to some relatively more advanced and difficult topics: free logic, basic set theory, and if you're lucky, modal logic. However, the more advanced logical systems become, the more controversial they get. For example, I think free logic is a philosophical disaster and should be taught only as something to avoid. Obviously, my colleagues don't always agree with me. Philosophers (and others) widely disagree about what the right form of modal logic is. So if you plan on continuing your logical education, it's probably about high time you started thinking about what makes a logical system a good one. Does it need to conform to natural language? Does it need to conform to the metaphysical structure of the world? Does it need to conform to the ordinary reasoning habits of philosophers and mathematicians when they're not self-consciously thinking about logic? These are difficult questions.

Minimally, we can pretty much all agree on the following. For a logical system to be a good one, it has to have the features it was designed to have. For example, the derivation rules for propositional logic you learned in your first logic course were designed so that any argument for which it is possible to construct a derivation of the conclusion from the premises is a valid one according to truth tables. If a system of derivation were set up with this aim but included, along with modus ponens and modus tollens, additionally the inference rule of affirming the consequent, i.e.,

   From A → B and B infer A

clearly, the system would be inadequate, because there would be invalid arguments for which one could construct a derivation.

Logical metatheory is the branch of logic that studies logical systems themselves. In this course, rather than using a logical system to prove things about Jay and Kay, we'll be proving things about logical systems. However, it's best not to start with the controversial ones. People disagree about how to do relevance logic, or deontic logic, or paraconsistent logic, and even whether or not these branches of logic are worth doing at all. These are not the ideal places to begin to learn how to prove things about logical systems; it's best to start at the beginning. We'll be starting with propositional logic. In our first unit, we'll be proving about our logical system for propositional logic that every deduction possible within it is valid according to truth tables, and conversely, that every argument valid according to truth tables has a corresponding deduction; in other words, that it is sound and complete. We'll then move on to proving things about first-order predicate logic.

Lastly, we'll move on to the logic of mathematics: the basic reasoning patterns involved in mathematics and the basic principles of arithmetic. We'll show how, as the logical system under study gets more complex, so does the apparatus one needs in order to prove things about it. We'll also discover some interesting results about logical systems of a certain sort, specifically that they don't always quite live up to their original intent. For example, we will be studying the attempt made in the late 1890s and early 1900s to fully capture all truths of elementary number theory within a single deductive system, and show that the attempt failed, and even that what they had hoped for is impossible! This is one of the results of Gödel's incompleteness theorems. But let's start at the beginning: metatheory for simple propositional logic.
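Validity according to truth tables is a mechanical matter: an argument is valid just in case no truth-value assignment makes all the premises true and the conclusion false. For a two-letter case this can be spot-checked by brute force. Here is a small Python sketch (the helper names implies and counterexamples are just illustrative, not anything official):

```python
from itertools import product

def implies(a, b):
    # material conditional: false only when a is true and b is false
    return (not a) or b

def counterexamples(premises, conclusion):
    """Truth-value assignments (p, q) making every premise true
    and the conclusion false."""
    return [(p, q) for p, q in product([True, False], repeat=2)
            if all(f(p, q) for f in premises) and not conclusion(p, q)]

# Modus ponens: from P -> Q and P, infer Q.  No counterexamples:
print(counterexamples([lambda p, q: implies(p, q), lambda p, q: p],
                      lambda p, q: q))   # []

# Affirming the consequent: from P -> Q and Q, infer P.  Invalid:
print(counterexamples([lambda p, q: implies(p, q), lambda p, q: q],
                      lambda p, q: p))   # [(False, True)]
```

The empty list for modus ponens reflects its truth-table validity; the assignment making P false and Q true is exactly the row that shows affirming the consequent to be invalid.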

B. Metalanguage and Object Language

Modern logical systems tend to make use of their own symbolic languages; hence one of the things that get studied in logical metatheory is the languages of logical systems.

Definition: The object language is the language being studied, or the language under discussion.

Definition: The metalanguage is the language used when studying the object language.

In this course, the object languages will be the symbolic languages of propositional and predicate logic. The metalanguage is English. To be more precise, it is a slightly more technical variant of English than ordinary English. This is because, in addition to the symbols of our object language, we'll be adding some technical terms and even some symbols to ordinary English to make our lives easier.

The use/mention distinction

English already has some handy devices that make it a good metalanguage. Specifically, it has things like quotation marks that we can use for mentioning an expression as opposed to using it. Kevin is not a name, but "Kevin" is. Many words are verbs, but "verbs" itself is only one word, and it is not a verb. This sentence mentions the word "however". This sentence, however, both uses and mentions the word "however". You get the idea.

The logic of the metalanguage

We'll be using the metalanguage to prove things about the object language, and proving anything requires logical vocabulary. Luckily, English has handy words like "all", "or", "and", "not" and "if", and it allows us to add new words if we want, like "iff" for "if and only if". Of course, our object languages also have logical vocabularies, with signs like ¬, ∧, ∨ and →. But we'd better restrict those signs to the object language unless we want to get ourselves confused.

But we do want our metalanguage to be very clear and precise. For that reason, when we use the word "or", unless noted otherwise, we mean by this the inclusive meaning of "or". Similarly, if we use the phrase "if . . . then . . ." in this class, we always mean the material conditional unless stated otherwise. (This makes our metalanguage slightly more precise than ordinary English.) The same sorts of logical inferences that apply in the object language also apply in the metalanguage. So

   If blah blah blah then yadda yadda.
   Blah blah blah.
   Therefore, yadda yadda.

. . . is a valid inference form. You have to use logic to study logic. There's no getting away from it. However, I'm not going to bother stating all the logical rules that are valid in the metalanguage, since I'd need to do that in the metametalanguage, and that would just get me started on an infinite regress. The rule of thumb is: if it's OK in the object language, it's OK in the metalanguage too.

Metalinguistic variables

Ordinary English doesn't really use variables, but they make our lives a lot easier. Since the metalanguage is usually used in this course to discuss the object language, the variables we use most often in the metalanguage are variables that are used to talk about all or some expressions of the object language. Especially when we get to predicate logic, where the object language itself contains variables, again, we don't want to get the variables of the object language confused with those of the metalanguage. Since predicate logic uses letters like x and y as variables in the object language, it is important to be clear when a variable is part of the metalanguage. This can be done by making the metalanguage's variables distinctive. For example, I use fancy script letters like A and B in the metalanguage to mean any object-language expression of a certain specified type. For example, I might write something like:

   If A is a sentence of predicate logic, then A contains no variables not bound by a quantifier.

Notice that, in that statement, the variable A is used, not mentioned. The letter A is not itself used in predicate logic, and contains no variables bound or free. It's something I use in the metalanguage in place of mentioning a particular object language expression. So A might be Fa or it might be (∀x)(Fx → Gx), etc.

A typical use of these is to represent any object language expression or set of expressions matching certain patterns. This happens for example in stating the inference rules of the object language. Just look at the lists of rules you used when learning logic. Whichever book you used, modus ponens didn't look (or shouldn't have looked) like:

   P → Q
   P
   Q

Instead, it looked something like:

   A → B
   A
   B

Why? Well, if you used the object-language version, modus ponens would only apply when the antecedent is P and the consequent is Q, and so the following wouldn't have counted as an instance of the rule:

   (S ∧ T) → R
   S ∧ T
   R

The only way to get the rule to cover an infinite number of possible cases is to state it schematically, i.e., using variables of the metalanguage to describe any object language expressions of certain forms. Hence, variables in the metalanguage used in this way are called schematic letters.

In your homework and exams, you may prefer to use Greek letters instead of script letters, which may be easier to draw in a more distinctive way. You may do whatever you wish provided I can tell the difference between object language and metalanguage variables.

Schematic letters will be used every single day in this class. Better make friends with them quick.

C. Set Theory

Generally, in order to do logical metatheory for a given logical system, the logical apparatus of the metalanguage has to be at least as complex as, and usually more complex than, that of the object language. So in order to do metatheory for propositional and predicate logic, we'll need something stronger, and in particular, we'll need some set theory. Note that this course is not a course on set theory; we're not going to be studying logical systems for set theory. Instead, we're going to presuppose or use some set-theoretical notation in our metalanguage, i.e., English. Therefore, you should think of all the signs and variables in this section as an expansion of English. This semester at least, set theory will be something we use when we study propositional and predicate logic, not something we are studying.
This means that we can be relatively informal about it. This is good because the exact rules and principles of set theory are still controversial. There are different systems, e.g., ZF set theory, NBG set theory, the theory of types, and so on. Luckily we don't need to get into those details, because all we'll need for this course is the rudiments they all share.

Sets

Definition: A set is a collection of entities for which it is determined, for every entity of a given type, that the entity either is or is not included in the set.

Definition: An urelement is a thing that is not a set.

Definition: An entity A is a member of set Γ iff it is included in that set.

We write this as: A ∈ Γ. We write A ∉ Γ to mean that A is not a member of Γ.

Sets are determined entirely by their members: for sets Γ and Δ, Γ = Δ iff for all A, A ∈ Γ iff A ∈ Δ.

Definition: A singleton or unit set is a set containing exactly one member.

{A} means the set containing A alone. Generally, {A1, . . . , An} means the set containing all of A1, . . . , An, but nothing else.

The members of sets are not ordered, so from {A, B} = {C, D} one cannot infer that A = C, only that either A = C or A = D.

Definition: If Γ and Δ are sets, Γ is said to be a subset of Δ, written Γ ⊆ Δ, iff all members of Γ are members of Δ; and Γ is said to be a proper subset of Δ, written Γ ⊂ Δ, iff all members of Γ are members of Δ, but not all members of Δ are members of Γ.

Definition: If Γ and Δ are sets, the union of Γ and Δ, written Γ ∪ Δ, is the set that contains everything that is a member of either Γ or Δ.

Definition: The intersection of Γ and Δ, written Γ ∩ Δ, is the set that contains everything that is a member of both Γ and Δ.

Definition: The relative complement of Γ and Δ, written Γ − Δ, is the set containing all members of Γ that are not members of Δ.

Definition: The empty set or null set, written ∅, or { }, is the set with no members.

Definition: If Γ and Δ are sets, then they are disjoint iff they have no members in common, i.e., iff Γ ∩ Δ = ∅.

Ordered n-tuples and relations

Definition: An ordered n-tuple, written ⟨A1, . . . , An⟩, is something somewhat like a set, except that the elements are given a fixed order, so that ⟨A1, . . . , An⟩ = ⟨B1, . . . , Bn⟩ iff Ai = Bi for all i such that 1 ≤ i ≤ n.

An ordered 2-tuple, e.g., ⟨A, B⟩, is also called an ordered pair. An entity is identified with its 1-tuple.

Definition: If Γ and Δ are sets, then the Cartesian product of Γ and Δ, written Γ × Δ, is the set of all ordered pairs ⟨A, B⟩ such that A ∈ Γ and B ∈ Δ.

Generally, Γⁿ is used to represent all ordered n-tuples consisting entirely of members of Γ. Notice that Γ² = Γ × Γ.

The following definition is philosophically problematic, but a common way of speaking in mathematics.

Definition: An n-place relation (in extension) on set Γ is any subset of Γⁿ.

A 2-place relation is also called a binary relation. Binary relations are taken to be sets of ordered pairs. A 1-place relation is also called (the extension of) a property.

Definition: If R is a binary relation, then the domain of R is the set of all A for which there is a B such that ⟨A, B⟩ ∈ R.
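These notions have direct finite analogues in Python's built-in sets, which can help fix the definitions in mind (a rough illustration only; the sets of the metatheory are not limited to finite collections of numbers):

```python
# Basic set-theoretic notions modeled with Python's built-in sets.
G = {1, 2, 3}   # playing the role of a set Gamma
D = {2, 3, 4}   # playing the role of a set Delta

union        = G | D             # {1, 2, 3, 4}
intersection = G & D             # {2, 3}
complement   = G - D             # {1}   (relative complement)
disjoint     = not (G & {5, 6})  # True: no members in common

# The Cartesian product of G and D, and a binary relation on G
# (a binary relation on G is just a subset of G x G):
product_GD = {(a, b) for a in G for b in D}
less_than  = {(a, b) for a in G for b in G if a < b}
domain     = {a for (a, b) in less_than}   # {1, 2}

print(union, intersection, complement, disjoint, sorted(less_than), domain)
```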

Definition: If R is a binary relation, the range of R is the set of all B for which there is an A such that ⟨A, B⟩ ∈ R.

Definition: The field of R is the union of the domain and range of R.

Definition: If R is a binary relation, R is reflexive iff ⟨A, A⟩ ∈ R for all A in the field of R.

Definition: If R is a binary relation, R is symmetric iff for all A and B in the field of R, ⟨A, B⟩ ∈ R only if ⟨B, A⟩ ∈ R.

Definition: If R is a binary relation, R is transitive iff for all A, B and C in the field of R, if ⟨A, B⟩ ∈ R and ⟨B, C⟩ ∈ R then ⟨A, C⟩ ∈ R.

Definition: A binary relation R is an equivalence relation iff R is symmetric, transitive and reflexive.

Definition: If R is an equivalence relation, then the R-equivalence class on A, written [A]R, is the set of all B such that ⟨A, B⟩ ∈ R.

Functions

Definition: A function (in extension) is a binary relation which, for all A, B and C, if it includes ⟨A, B⟩ then it does not also contain ⟨A, C⟩ unless B = C.

So if F is a function and A is in its domain, then there is a unique B such that ⟨A, B⟩ ∈ F; this unique B is denoted by F(A).

Definition: An n-place function is a function whose domain consists of n-tuples. For such a function, we write F(A1, . . . , An) to abbreviate F(⟨A1, . . . , An⟩).

Definition: An n-place operation on Γ is a function whose domain is Γⁿ and whose range is a subset of Γ.

Definition: If F is a function, then F is one-one iff for all A and B in the domain of F, F(A) = F(B) only if A = B.

Cardinal numbers

Definition: If Γ and Δ are sets, then they are equinumerous, written Γ ≅ Δ, iff there is a one-one function whose domain is Γ and whose range is Δ.

Definition: Sets Γ and Δ have the same cardinality or cardinal number if and only if they are equinumerous.

Definition: If Γ and Δ are sets, then the cardinal number of Γ is said to be smaller than the cardinal number of Δ iff there is a set Z such that Z ⊆ Δ and Γ ≅ Z but there is no set W such that W ⊆ Γ and W ≅ Δ.

Definition: If Γ is a set, then Γ is denumerable iff Γ is equinumerous with the set of natural numbers {0, 1, 2, 3, 4, . . . , (and so on ad inf.)}.

Definition: Aleph null, also known as aleph naught, written ℵ0, is the cardinal number of any denumerable set.

Definition: If Γ is a set, then Γ is finite iff either Γ = ∅ or there is some positive integer n such that Γ is equinumerous with the set {1, . . . , n}.

Definition: A set is infinite iff it is not finite.

Definition: A set is countable iff it is either finite or denumerable.

Homework

Assuming that Γ, Δ and Z are sets, R is a relation, F is a function, and A and B are any entities, informally verify the following:

(1) A ∈ {B} iff A = B
(2) if Γ ⊆ Δ and Δ ⊆ Z then Γ ⊆ Z
(3) if Γ ⊆ Δ and Δ ⊆ Γ then Γ = Δ
(4) (Γ ∩ Δ) ∩ Z = Γ ∩ (Δ ∩ Z)
(5) (Γ ∪ Δ) ∪ Z = Γ ∪ (Δ ∪ Z)
(6) Γ ∪ ∅ = Γ and Γ ∩ ∅ = ∅
(7) Γ − ∅ = Γ
(8) (Γ ∩ Δ) ∪ (Γ − Δ) = Γ
(9) Γ¹ = Γ
(10) If R is an equivalence relation, then ([A]R = [B]R iff ⟨A, B⟩ ∈ R) and (if [A]R ≠ [B]R then [A]R and [B]R are disjoint).
(11) Addition can be thought of as a 2-place operation on the set of natural numbers.
(12) Γ ≅ Γ
(13) The set of even non-negative integers is denumerable.
(14) The set of all integers, positive and negative, is denumerable.
D. Mathematical Induction

We'll also be expanding the logic of the metalanguage by allowing ourselves the use of mathematical induction, a powerful tool of mathematics.

Definition: The principle of mathematical induction states the following: If φ is true of 0, then if (for all natural numbers n, if φ is true of n, then φ is true of n + 1), then φ is true of all natural numbers.

To use the principle of mathematical induction to arrive at the conclusion that something is true of all natural numbers, one needs to prove the two antecedents, i.e.:

Base step. φ is true of 0.

Induction step. For all natural numbers n, if φ is true of n, then φ is true of n + 1.

Typically, the induction step is proven by means of a conditional proof in which it is assumed that φ is true of n, and from this assumption it is shown that φ must be true of n + 1. In the context of this conditional proof, the assumption that φ is true of n is called the inductive hypothesis.

From the principle of mathematical induction, one can derive a related principle:

Definition: The principle of complete (or strong) induction states that: If (for all natural numbers n, whenever φ is true of all numbers less than n, φ is also true of n), then φ is true of all natural numbers.

In this class, we rarely use these principles in the metalanguage. Instead, we use some corollaries that come in handy in the study of logical systems. Mendelson does not give these principles special names, but I will.

Definition: The principle of wff induction states that: For a given logical language, if φ holds of the simplest well-formed formulas (wffs) of that language, and φ holds of any complex wff provided that φ holds of those simpler wffs out of which it is constructed, then φ holds of all wffs.

This principle is often used in logical metatheory. It is a corollary of mathematical induction. Actually, it is a version of it. Let φ′ be the property a number has if and only if all wffs of the logical language having that number of logical operators have φ. If φ is true of the simplest well-formed formulas, i.e., those that contain zero operators, then 0 has φ′. Similarly, if φ holds of any wffs that are constructed out of simpler wffs provided that those simpler wffs have φ, then whenever a given natural number n has φ′ then n + 1 also has φ′. Hence, by mathematical induction, all natural numbers have φ′, i.e., no matter how many operators a wff contains, it has φ. In this way wff induction simply reduces to mathematical induction.

Similarly, this principle is usually utilized by proving the antecedents, i.e.:

Base step. φ is true of the simplest well-formed formulas (wffs) of that language; and

Induction step. φ holds of any wffs that are constructed out of simpler wffs provided that those simpler wffs have φ.

Again, the assumption made when establishing the induction step that φ holds of the simpler wffs is called the inductive hypothesis.

We'll also be using:

Definition: The principle of proof induction: In a logical system that contains derivations or proofs, if φ is true of a given step of the proof whenever φ is true of all previous steps of the proof, then φ is true of all steps of the proof.

The principle of proof induction is an obvious corollary of the principle of complete induction. The steps in a proof can be numbered; we're just applying complete induction to those numbers.

Homework

Answer any of these we don't get to in class:

(1) Let φ be the property a number x has just in case the sum of all numbers leading up to and including x is x(x + 1)/2. Use the principle of mathematical induction to show that φ is true of all natural numbers.
(2) Let φ be the property a number x has just in case it is either 0 or 1 or it is evenly divisible by a prime number greater than 1. Use the principle of complete induction to show that φ is true of all natural numbers.
(3) Let φ be the property a wff A of propositional logic has if and only if A has an even number of parentheses. Use the principle of wff induction to show that φ holds of all wffs of propositional logic. (If needed, consult the next page for a definition of a wff in propositional logic.)
(4) Consider a logical system for propositional logic that has only one inference rule: modus ponens. Use the principle of proof induction to show that every line of a proof in this system is true if the premises are true.
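The properties in homework problems (1) and (2) can at least be spot-checked by computer for small numbers (a check, not a proof: establishing them for all natural numbers requires the induction principles):

```python
# Computer spot-checks of the two homework properties.

def phi(x):
    # the sum 0 + 1 + ... + x equals x(x+1)/2
    return sum(range(x + 1)) == x * (x + 1) // 2

def is_prime(p):
    return p > 1 and all(p % d for d in range(2, p))

def psi(x):
    # x is 0 or 1, or is evenly divisible by some prime greater than 1
    return x in (0, 1) or any(is_prime(p) and x % p == 0
                              for p in range(2, x + 1))

print(all(phi(n) for n in range(200)))   # True
print(all(psi(n) for n in range(200)))   # True
```

Note how the induction step for (1) would go: assuming φ holds of n (the inductive hypothesis), φ holds of n + 1, since the sum up to n + 1 is n(n + 1)/2 + (n + 1) = (n + 1)(n + 2)/2.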

UNIT 1
METATHEORY FOR PROPOSITIONAL LOGIC

A. The Syntax of Propositional Logic

We finally turn to our discussion of the logical metatheory for propositional logic (also known as sentential logic). In particular, we shall limit our study to classical (bivalent) truth-functional propositional logic. We first sketch the make-up of the object language under study: its syntax, i.e., the rules governing how its expressions can and cannot be combined.

The basic building blocks are statement letters, connectives and parentheses.

Definition: A statement letter is any uppercase letter of the Roman alphabet written with or without a numerical subscript.

Examples: A, B, P, Q1, P13, and N123 are all statement letters. The numerical subscripts are used in case we would ever need to deal with more than 26 simple statements at once. Hence P1 and P2 are counted as different statement letters.

Definition: A propositional connective is any of the signs ¬, ∧, ∨, →, and ↔.

Definition: A well-formed formula* (abbreviated wff) is defined recursively as follows:

(i) any statement letter is a wff;
(ii) if A is a wff then so is ¬A;¹
(iii) if A and B are wffs then so is (A ∧ B);
(iv) if A and B are wffs then so is (A ∨ B);
(v) if A and B are wffs, then so is (A → B);
(vi) if A and B are wffs, then so is (A ↔ B);
(vii) nothing that cannot be constructed by repeated applications of the above steps is a wff.

* The above definition is provisional; we shall later amend it. This tells us everything we need to know about the syntax or grammar of propositional logic.

You may be familiar with a slightly different notation. I am sticking with the book.

                           Mendelson's sign   Alternatives
   Negation                       ¬              ~, −
   Conjunction                    ∧              &, •
   Disjunction                    ∨              +
   Material conditional           →              ⊃, ⇒
   Material biconditional         ↔              ≡, ⇔

Feel free to use whatever signs you prefer. I might not even notice.

¹ Here we are not really using the phrase ¬A, since this definition is in the metalanguage and ¬ is not part of English. Nor, however, are we mentioning it, since ¬A is not a part of the object language. Really we should be using special quasi-quotation marks, also known as Quine corners, where ⌜¬A⌝ is the object language expression formed by concatenating ¬ to whatever expression A is. Although imprecise, I forgo Quine corners and rely just on context, to avoid a morass of these marks, and to allow for another use of the same notation Mendelson uses in chap. 3.
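The recursive definition lends itself to a mechanical test for wff-hood. Here is a rough Python recognizer; it writes the connectives in ASCII (~ for negation; &, v, ->, <-> for the binary connectives) rather than the official symbols, and is only a sketch:

```python
import re

# A mechanical check of the recursive wff definition above.
LETTER = re.compile(r'^[A-Z][0-9]*$')
BINARY = ['<->', '->', '&', 'v']

def is_wff(s):
    s = s.replace(' ', '')
    if LETTER.match(s):                        # clause (i): statement letters
        return True
    if s.startswith('~'):                      # clause (ii): negations
        return is_wff(s[1:])
    if s.startswith('(') and s.endswith(')'):  # clauses (iii)-(vi)
        inner, depth = s[1:-1], 0
        for i, ch in enumerate(inner):
            if ch == '(':
                depth += 1
            elif ch == ')':
                depth -= 1
            elif depth == 0:
                # try splitting at a top-level binary connective
                for c in BINARY:
                    if inner.startswith(c, i) and \
                       is_wff(inner[:i]) and is_wff(inner[i + len(c):]):
                        return True
    return False                               # clause (vii): nothing else

print(is_wff('(P13 -> ~(Q & R))'))   # True
print(is_wff('P -> Q'))              # False: outer parentheses required
```

Note that the recognizer, like the official definition, demands the outer parentheses introduced by clauses (iii) through (vi); conventions for omitting them come next.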

Parentheses Conventions

The chart above also gives the ranking of the connectives used when omitting parentheses. Sometimes when a wff gets really complicated, it's easier to leave off some of the parentheses. Because this leads to ambiguities, we need conventions regarding how to read them. When parentheses are omitted, and it is unclear which connective has greater scope, the operator nearer the top of the list above should be taken as having narrower scope, and the operator nearer the bottom of the list should be taken as having wider scope. For example:

   A ∨ B ∧ C

is an abbreviation of:

   (A ∨ (B ∧ C))

whereas:

   A ∧ B ∨ C

is an abbreviation of:

   ((A ∧ B) ∨ C)

When the operators are the same, the convention is association to the left, i.e., the leftmost occurrence is taken to have narrowest scope. So

   A → B → C

is an abbreviation of:

   ((A → B) → C)

Obviously, for ∧ and ∨, this last convention is less important, since (A ∧ B) ∧ C is logically equivalent with A ∧ (B ∧ C), and similarly, (A ∨ B) ∨ C is equivalent with A ∨ (B ∨ C). Sometimes parentheses cannot be left off without making the wff mean something else:

   A ∧ (B ∨ C)

cannot be written

   A ∧ B ∨ C.

B. The Semantics of Propositional Logic

To give a semantic theory for a language is to specify the rules governing the meanings of the expressions of that language. In truth-functional propositional logic, however, nothing regarding the meaning of the statement letters over and above their truth or falsity is relevant for determining the truth conditions of the complex wffs in which they appear. Moreover, the meanings of the connectives are thought to be exhausted by the rules governing how the truth-value of the wffs they are used to construct depends on the truth values of the statement letters out of which they are constructed.

In short, everything relevant to the logical semantics of a wff of propositional logic is given by its truth table. I assume you already know how to construct truth tables. E.g.:

   P  Q | P ∨ Q | Q → (P ↔ Q) | ¬(Q → (P ↔ Q)) | (P ∨ Q) ∨ ¬(Q → (P ↔ Q))
   T  T |   T   |      T      |        F       |            T
   T  F |   T   |      T      |        F       |            T
   F  T |   T   |      F      |        T       |            T
   F  F |   F   |      T      |        F       |            F

Roughly, this shows us everything we need to know about the meaning of the wff (P ∨ Q) ∨ ¬(Q → (P ↔ Q)).

To get serious with our study, we need a number of definitions.

Definition: A truth-value assignment is any function whose domain is the set of statement letters of propositional logic, and whose range is a nonempty subset of the set of truth-values {TRUTH, FALSITY} (T and F for short).

Informally, each row of a truth table represents a different truth-value assignment. Each row represents a different possible assignment of truth values to the statement letters making up the wff or wffs in question.

In virtue of the way it is constructed out of truth-functional connectives, every wff is determined to be either true or false (and not both) for any given truth-value assignment to the statement letters making it up.
for a given truth-value assignment is represented Abbreviation: The notation


in the final column of a truth table, underneath its
A  B
main connective.
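If you like, you can check such a table mechanically. Here is a small Python sketch (my own illustration, not part of the notes) that computes the rows of the example table:

```python
from itertools import product

# Compute the truth table for (P -> Q) -> (~Q -> ~(P & Q)) row by row.
rows = []
for p, q in product([True, False], repeat=2):
    antecedent = (not p) or q               # P -> Q
    consequent = q or not (p and q)         # ~Q -> ~(P & Q), i.e. ~~Q v ~(P & Q)
    whole = (not antecedent) or consequent  # the main connective
    rows.append((p, q, whole))
    print(' '.join('T' if v else 'F' for v in (p, q, whole)))
```

Running this prints a T in the final position of every row, matching the table above.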
Definition: A wff is a tautology iff it is true for every possible truth-value assignment to its statement letters.

The wff (P → Q) → (¬Q → ¬(P → Q)) is not a tautology, because it is false for the truth-value assignment that makes both P and Q false, despite the fact that all the other truth-value assignments make it true. However, the wff (P → Q) → (¬Q → ¬(P ∧ Q)) is a tautology, because it is true for every truth-value assignment, i.e., on every row of a truth table.

Abbreviation: The notation:

⊨ A

means that A is a tautology.

Definition: A wff is a self-contradiction iff it is false for every possible truth-value assignment.

Definition: A wff is contingent iff it is true for some possible truth-value assignments and false for others.

Definition: A wff A is said to logically imply a wff B iff there is no possible truth-value assignment to the statement letters making them up that makes A true and B false.

Abbreviation: The notation:

A ⊨ B

means that A logically implies B. Note that this sign is part of the metalanguage; it is an abbreviation of the English words "... logically implies ...". The sign ⊨ is not used in the object language. So A ⊨ (B ⊨ D) and A → (B ⊨ C) are nonsense.

Definition: Two wffs A and B are logically equivalent if and only if every possible truth-value assignment to the statement letters making them up gives them the same truth value.

Abbreviation: The notation:

A ⫤⊨ B

is used to mean that A and B are logically equivalent.

Definition: If Γ is a set of wffs and A is a wff, then A is a logical consequence of Γ if and only if there is no truth-value assignment to the statement letters making up the wffs in Γ and A that makes every member of Γ true but makes A false.

To say that A is a logical consequence of Γ is the same as saying that an argument with the members of Γ as its premises and A as its conclusion is valid by truth tables.

Abbreviation: The notation:

Γ ⊨ A

is used to mean that A is a logical consequence of Γ.

These four uses of the sign ⊨ are related in intuitive ways. A tautology can in effect be thought of as something that is true in virtue of logic alone, or the conclusion of a logically valid argument that begins without any premises at all! I.e., ⊨ A means the same as ∅ ⊨ A.

Definition: Two wffs A and B are said to be consistent or mutually satisfiable if and only if there is at least one truth-value assignment to the statement letters making them up that makes both A and B true.
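All of these definitions are decidable by brute force, since a wff involves only finitely many statement letters. The following Python sketch (my own illustration; the nested-tuple encoding of wffs is an ad hoc choice) implements the definitions directly:

```python
from itertools import product

# Wffs encoded as nested tuples: ('P',) is a letter; ('not', A), ('imp', A, B),
# ('and', A, B), ('or', A, B), ('iff', A, B) build complex wffs.
def letters(wff):
    if len(wff) == 1:
        return {wff[0]}
    return set().union(*(letters(part) for part in wff[1:]))

def value(wff, assignment):
    """Truth value of a wff under a truth-value assignment (dict letter -> bool)."""
    op = wff[0]
    if len(wff) == 1:
        return assignment[op]
    if op == 'not':
        return not value(wff[1], assignment)
    a, b = value(wff[1], assignment), value(wff[2], assignment)
    return {'and': a and b, 'or': a or b,
            'imp': (not a) or b, 'iff': a == b}[op]

def assignments(wffs):
    """Yield every truth-value assignment to the letters of the given wffs."""
    ls = sorted(set().union(*(letters(w) for w in wffs)))
    for vals in product([True, False], repeat=len(ls)):
        yield dict(zip(ls, vals))

def tautology(wff):
    return all(value(wff, v) for v in assignments([wff]))

def consequence(premises, conclusion):
    """Gamma |= A: no assignment makes every premise true and the conclusion false."""
    return all(value(conclusion, v)
               for v in assignments(list(premises) + [conclusion])
               if all(value(p, v) for p in premises))

P, Q = ('P',), ('Q',)
assert tautology(('imp', ('imp', P, Q),
                  ('imp', ('not', Q), ('not', ('and', P, Q)))))
assert not tautology(('imp', ('imp', P, Q),
                      ('imp', ('not', Q), ('not', ('imp', P, Q)))))
```

The two assertions check the tautology and non-tautology examples discussed above.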

It is time to get our first practice proving things


in the metalanguage. Again, were going to use English to prove something about the logical language
of propositional logic. We can be somewhat informal about the logical structure of our proof, since
we havent fully laid out a deductive system for
the metalanguage. But its usually best to number
the steps of the proof just like an object language
deduction and be as clear as possible about how
the proof works.
Heres what were going to prove:

10

Result: For any wffs A and B, A  B iff


 (A B). (Logical implication is equivalent
with tautologyhood of material implication.)

Lines (6)(9) represent a conditional proof that


if  (A B) then A  B. Putting the two
together we get that A  B iff  (A B). This
is what we were aiming to prove.
e

What were proving is a biconditional; in particular,


were proving that one statement logically implies
another iff the corresponding object language conditional statement is a tautology. Well prove this
biconditional using the same strategy wed use if
we were going to prove a biconditional in some
object language deduction system. In particular,
we prove the conditional going one way, and then
the other. So the proof goes like this:

(Here on out, I use e to demarcate the end of a


proof in the metalanguage.)
Be careful about not mixing up the object language and the metalanguage. Assuming that not
 A is not the same as assuming that  A .
After all, if A is contingent, neither it nor its negation is a tautology. The sign should never be
used for negation in the metalanguage, nor
used instead of iff, etc. If you wish, you can write,
2 A to mean not- A , but never  A .
Thats not even meaningful!

Proof:
(1) Assume that A  B. We need to show that
(A B) is a tautology.
(2) Suppose for reductio ad absurdum (indirect
proof) that (A B) is not a tautology. This
means that there is some truth-value assignment that does not make (A B) true. It
must make (A B) false.
(3) According to the truth table rules for the sign
, this means it must make A true and B
false.
(4) However, this contradicts the assumption that
A  B, since that rules out any truth-value
assignment making A true and B false.
(5) Our supposition at line (2) must be mistaken,
and so  (A B) after all.
Lines (1)(5) represent a conditional proof in the
metalanguage that if A  B then  (A B).
We need to go the other way as well.
(6) Assume that  (A B).
(7) Assume for reductio ad absurdum that it is not
true that A  B. This means that there is at
least one truth-value assignment that makes
A true but B false.
(8) Since there is at least one truth-value assignment that makes A true and B false, there is
at least one truth-value assignment that makes
(A B) false, given the rules for constructing truth tables for the sign .
(9) However, this contradicts our assumption at
line (6). Hence, A  B after all.
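Since the result concerns only the truth functions expressed by A and B, it can also be checked exhaustively for, say, all truth functions of two statement letters. A Python sketch of my own (representing each wff simply by its truth function):

```python
from itertools import product

ROWS = list(product([True, False], repeat=2))   # assignments to two letters

# Represent a wff by its truth function: a map from assignments to truth values.
# There are exactly 16 such functions of two letters; enumerate them all.
functions = [dict(zip(ROWS, col)) for col in product([True, False], repeat=4)]

def implies(f, g):
    """f |= g: no assignment makes f true and g false."""
    return all(g[row] for row in ROWS if f[row])

def conditional_is_tautology(f, g):
    """|= (f -> g): the conditional is true on every assignment."""
    return all((not f[row]) or g[row] for row in ROWS)

# The result, verified for all 16 x 16 pairs of truth functions:
assert all(implies(f, g) == conditional_is_tautology(f, g)
           for f in functions for g in functions)
```

The exhaustive check mirrors the proof: both conditions rule out exactly the assignments making the antecedent true and the consequent false.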

Result: A ⫤⊨ B iff ⊨ (A ↔ B).

Proof:
Similar to previous example. ∎

Result: For any wffs A and B, if ⊨ A and ⊨ (A → B) then ⊨ B.

First, to be clear about what we're doing, we're not proving that modus ponens is a valid reasoning form. That would be to prove that:

{(A → B), A} ⊨ B

The above is true, and easily proven, but it's not what we're after. Instead, we're proving something a bit stronger, namely that modus ponens preserves tautologyhood, i.e., that if both A and A → B are tautologies, then B is a tautology as well.

Proof:
What we're proving is a conditional. We assume the antecedent and attempt to prove the consequent.
(1) Assume that ⊨ A and ⊨ (A → B).
(2) This means that both A and (A → B) are tautologies, i.e., that every possible truth-value assignment to the statement letters making them up makes them true.
(3) Suppose for reductio ad absurdum that there were some truth-value assignment (row of a truth table) making B false.
(4) Notice that because every truth-value assignment makes (A → B) true, if it makes B false it must make A false as well.
(5) From lines (3) and (4) we get the result that there is a truth-value assignment making A false.
(6) However, it follows from line (2) that no truth-value assignment makes A false.
(7) Lines (5) and (6) are a contradiction, and so our assumption at line (3) is false, and so ⊨ B.
(8) Therefore, by conditional proof, if ⊨ A and ⊨ (A → B) then ⊨ B.
∎
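This result, too, can be confirmed by brute force over all truth functions of two letters. A small Python sketch (my own illustration, in the same truth-function representation):

```python
from itertools import product

ROWS = list(product([True, False], repeat=2))   # assignments to two letters
functions = [dict(zip(ROWS, col)) for col in product([True, False], repeat=4)]

def tautology(f):
    return all(f[row] for row in ROWS)

def conditional(f, g):
    """The truth function of (f -> g)."""
    return {row: (not f[row]) or g[row] for row in ROWS}

# Modus ponens preserves tautologyhood: whenever f and (f -> g) are both
# tautologies, g is a tautology too -- checked for every pair.
assert all(tautology(g)
           for f in functions for g in functions
           if tautology(f) and tautology(conditional(f, g)))
```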

easier if it is simpler, because the more complex


the language is, the more there is to say about it.
When doing logical metatheory, its usually to our
advantage to whittle down our object language
(and the logical calculi we develop in it) to as small
as possible. To that end, we ask, do we really need
all five connectives (, , , and )?
After all, our object language is not inadequate
in any way by not including a sign for the exclusive sense of disjunction, since we can represent it
using other signs, e.g., as (A B) (A B)
or (A B), etc. And no, we dont need all five
of the ones we have. First well show that we could
get by with just three, and later two, and finally
one.

Result (Adequate Connectives): Every possible truth function can be represented by means
of the connectives , and alone.

In your book, there are also proofs of the following:

Result: If  A , and B is the result of (uniformly) replacing certain statement letters in A


by complex wffs, then  B.

Result: If A is a wff containing wff C in one


or more places, and B is just like A except containing wff D in those places where A contains
C , then if C  D then A  B.

C.

Reducing the Number of


Connectives

When working within a given language, usually


the more complex it is, the easier it is to say what
you want, because you have more vocabulary in
which to say it. However, when youre trying to
prove something about the language, its usually

Proof:
We'll prove this somewhat informally.
(1) Assume that A is some wff built using any set of truth-functional connectives, including, if you like, connectives other than our five. (A might make use of some three or four-place truth-functional connectives, or connectives such as the exclusive or, or any others you might imagine for bivalent logic.)
(2) What we're going to show is that there is a wff B formed only with the connectives ¬, ∧ and ∨ that is logically equivalent with A.
(3) In order for it to be logically equivalent to A, the wff B that we construct must have the same final truth value for every possible truth-value assignment to the statement letters making up A, or in other words, it must have the same final column in a truth table.
(4) Let P₁, P₂, …, Pₙ be the distinct statement letters making up A. For some possible truth-value assignments to these letters, A may be true, and for others A may be false. The only hard case would be the one in which A is contingent. Clearly tautologies and self-contradictions can be constructed with the signs ¬, ∧ and ∨, and all tautologies are logically equivalent to one another, and all self-contradictions are equivalent to one another, so in those cases, our job is easy. Let us suppose instead that A is contingent.
(5) Let us construct a wff B in the following way.
a) Consider in turn each possible truth-value assignment to the letters P₁, P₂, …, Pₙ. For each truth-value assignment, construct a conjunction made up of those letters the truth-value assignment makes true, along with the negations of letters the truth-value assignment makes false.

Example: Suppose the letters involved are A, B and C. This means that there are eight possible truth-value assignments, corresponding to the eight rows of a truth table. We construct an appropriate conjunction for each.

A B C | Conjunction
T T T | A ∧ B ∧ C
T T F | A ∧ B ∧ ¬C
T F T | A ∧ ¬B ∧ C
T F F | A ∧ ¬B ∧ ¬C
F T T | ¬A ∧ B ∧ C
F T F | ¬A ∧ B ∧ ¬C
F F T | ¬A ∧ ¬B ∧ C
F F F | ¬A ∧ ¬B ∧ ¬C

b) From the resulting conjunctions, form a complex disjunction formed from those conjunctions formed in step a) for which the corresponding truth-value assignment makes A true.

Example: Suppose for the example above that the final column of the truth table for A is as follows (just at random):

A B C | A
T T T | T
T T F | T
T F T | F
T F F | F
F T T | T
F T F | F
F F T | F
F F F | F

This means that we form a disjunction using as the disjuncts those conjunctions formed in step a) for those rows that make A true. The others are left out. In this case:

(A ∧ B ∧ C) ∨ (A ∧ B ∧ ¬C) ∨ (¬A ∧ B ∧ C)

The three conjunctions in the disjunction conform to the three truth-value assignments that make A true.

(6) The wff B constructed in step (5) is logically equivalent to A. Consider that for those truth-value assignments making A true, one of the conjunctions making up the disjunction B is true, and hence the whole disjunction is true as well. For those truth-value assignments making A false, none of the conjunctions making up B is true, because each conjunction will contain at least one conjunct that is false on that truth-value assignment.

Example: Let us construct a truth table for the formula we constructed during our last step, showing the column for each disjunct and for the whole disjunction:

A B C | A ∧ B ∧ C | A ∧ B ∧ ¬C | ¬A ∧ B ∧ C | whole disjunction
T T T |     T     |     F      |     F      |        T
T T F |     F     |     T      |     F      |        T
T F T |     F     |     F      |     F      |        F
T F F |     F     |     F      |     F      |        F
F T T |     F     |     F      |     T      |        T
F T F |     F     |     F      |     F      |        F
F F T |     F     |     F      |     F      |        F
F F F |     F     |     F      |     F      |        F

By examining the final column for this truth table, we see that it has the same final column as that given for A.

(7) This establishes our result. The example was arbitrary; the same process would work regardless of the number of statement letters or final column for the statement involved. ∎
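The construction in step (5) is mechanical enough to code up. Here is a Python sketch of my own (string output, with '&', 'v' and '~' as ad hoc ASCII stand-ins for ∧, ∨ and ¬); like the proof, it assumes the contingent case, i.e., that the final column contains at least one T:

```python
from itertools import product

def dnf(letters, final_column):
    """Build a wff (as a string) using only ~, & and v with the given final
    truth-table column: one conjunction per row that makes the wff true,
    disjoined together, as in the adequacy proof."""
    rows = list(product([True, False], repeat=len(letters)))
    disjuncts = []
    for row, val in zip(rows, final_column):
        if val:  # keep only the rows on which the target wff is true
            conj = ' & '.join(l if v else '~' + l
                              for l, v in zip(letters, row))
            disjuncts.append('(' + conj + ')')
    return ' v '.join(disjuncts)

# The example from the proof: true on rows TTT, TTF and FTT of A, B, C.
print(dnf(['A', 'B', 'C'],
          [True, True, False, False, True, False, False, False]))
# (A & B & C) v (A & B & ~C) v (~A & B & C)
```

The printed formula matches the disjunction constructed in the example.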

Reducing Further

The above result means that any set of connectives in which we can always find equivalent forms for (A ∨ B), (A ∧ B) and ¬A is an adequate set of connectives. This means we can reduce still further. We don't need all three. We can get by with two in any of three ways.

Corollary: All truth-functions can be defined using only ¬ and ∨.

Proof:
The form ¬(¬A ∨ ¬B) is equivalent to (A ∧ B) and could be used instead of the latter in the proof above. ∎

Corollary: All truth-functions can be defined using only ¬ and ∧.

Proof:
The form ¬(¬A ∧ ¬B) is equivalent with (A ∨ B) and could be used instead of the latter in the proof above. ∎

Corollary: All truth-functions can be defined using only ¬ and →.

Proof:
Note that:
(¬A → B) ⫤⊨ (A ∨ B) and
¬(A → ¬B) ⫤⊨ (A ∧ B)
and so the former forms can be used in place of the latter forms in the proof above. ∎

Reducing Still Further

Actually, if we started from a different basis, we could get by with just one connective. The most common way to do this is with the Sheffer stroke, written |. It has the following truth table:

A  B | (A | B)
T  T |    F
T  F |    T
F  T |    T
F  F |    T

A | B could be read "not both A and B", and indeed is equivalent to ¬(A ∧ B). However, as our aim is to reduce all operators to |, it is best not to think of the meanings of ¬ or ∧ as playing a role.

Corollary: All truth-functions can be defined using only the Sheffer stroke.

Proof:
Note that:
(A | A) ⫤⊨ ¬A
((A | A) | (B | B)) ⫤⊨ (A ∨ B)
((A | B) | (A | B)) ⫤⊨ (A ∧ B)
and just for kicks, we can add:
(A | (B | B)) ⫤⊨ (A → B)
(((A | A) | (B | B)) | (A | B)) ⫤⊨ (A ↔ B)
Hence, forms using the Sheffer stroke can be substituted in the proof above. ∎

Another way is with the Sheffer/Peirce dagger, written ↓ ("neither ... nor ..."), which has the truth table:

A  B | (A ↓ B)
T  T |    F
T  F |    F
F  T |    F
F  F |    T

Corollary: All truth-functions can be defined using only the Sheffer/Peirce dagger.

Proof:
It suffices to note that:
(A ↓ A) ⫤⊨ ¬A
((A ↓ A) ↓ (B ↓ B)) ⫤⊨ (A ∧ B)
((A ↓ B) ↓ (A ↓ B)) ⫤⊨ (A ∨ B)
∎

But that's it. | and ↓ are the only binary connectives from which all truth functions can be derived. In fact, we can prove this.

Result: No binary operator besides | and ↓ is by itself sufficient to capture all truth functions.

Proof:
(1) Suppose there were some other binary connective # that was adequate by itself.
(2) We know immediately that (A # B) must be false when A and B are both true. If not, then it would be impossible to form something equivalent to a contradiction, since the top row of the truth table (the truth-value assignment making all statement letters true) would always make a wff true.
(3) For similar reasons, (A # B) must be true when A and B are both false, or else it would be impossible to form something equivalent to a tautology.
(4) Lines (2) and (3) give us this much of the table for #:

A  B | A # B
T  T |   F
T  F |   ?
F  T |   ?
F  F |   T

The question is how to fill in the remaining ?s.
(5) If we fill both in with Ts, we get the Sheffer stroke. If we fill both in with Fs, we get the Sheffer/Peirce dagger. That rules out two of the four remaining possibilities.
(6) If we fill them in T and F respectively, the result is equivalent with ¬B, and if we fill them in with F and T, the result is equivalent with ¬A.
(7) Negation is clearly insufficient for defining all other truth functions (by itself, it can define only two truth functions). So the remaining options are inadequate. There are no possibilities left. Our # is impossible. ∎

There are, however, triadic connectives and 4+ place connectives that work.

Austere Syntax

We noted earlier that having a reduced vocabulary in the object language makes proving things about it in the metalanguage easier, because there is less to say. So we might decide to revise our definition of a well formed formula, and make it just this simple:
(i) Any statement letter is a wff;
(ii) if A and B are wffs, then so is (A | B);
(iii) nothing that cannot be constructed by repeated applications of the above is a wff.
However, there are trade-offs. The Sheffer stroke is less psychologically natural, and the rules of inference governing the Sheffer stroke are far less intuitive than anything as simple as modus ponens and modus tollens.

In this course, we take an intermediate route, and take ¬ and → as our only primitive connectives. Therefore, we now officially revise our definition of a wff as follows:

Definition: A(n official) well-formed formula (wff) is defined recursively as follows:
(i) Any statement letter is a wff;
(ii) if A is a wff, then so is ¬A;
(iii) if A and B are wffs, then so is (A → B);
(iv) nothing that cannot be constructed by repeated applications of the above is a wff.

We can continue to use the signs ∧, ∨ and ↔, but treat them as mere abbreviations. They are
definitional shorthands, just like the conventions we adopted regarding parentheses:

Abbreviations:

(A ∧ B) abbreviates ¬(A → ¬B)
(A ∨ B) abbreviates (¬A → B)
(A ↔ B) abbreviates ¬((A → B) → ¬(B → A))

Whenever one of these signs appears, what is really meant is the wff obtained by replacing the definiendum with the definiens. So, e.g.,

(P ∧ Q) → (R ∨ S)

is just a shorthand abbreviation for

¬(P → ¬Q) → (¬R → S)

Similarly, (P ∨ ¬P) means (¬P → ¬P), and (P ∧ ¬P) means ¬(P → ¬¬P).

D. Axiomatic Systems and Natural Deduction

Our next topic is proofs or deductions in the object language. You learned a deduction system for propositional logic in your first logic course. Most likely, it was what is called a natural deduction system, and contained 15 or more rules of inference. There are many competing natural deduction systems out there. The following are based loosely on the systems of Kalish and Montague, Gentzen, and Fitch, respectively.

Examples:

(1) Hardegree's System

Inference rules
→O: From A → B and A infer B. From A → B and ¬B infer ¬A.
∨O: From A ∨ B and ¬A infer B. From A ∨ B and ¬B infer A.
∧O: From A ∧ B infer A. From A ∧ B infer B.
↔O: From A ↔ B infer A → B. From A ↔ B infer B → A.
DN: From A infer ¬¬A. From ¬¬A infer A.
∨I: From A infer A ∨ B. From A infer B ∨ A.
∧I: From A and B infer A ∧ B.
↔I: From A → B and B → A infer A ↔ B.
⨯I: From A and ¬A infer ⨯.
⨯O: From ⨯ infer A.
¬→O: From ¬(A → B) infer A ∧ ¬B.
¬∨O: From ¬(A ∨ B) infer ¬A. From ¬(A ∨ B) infer ¬B.
¬∧O: From ¬(A ∧ B) infer ¬A ∨ ¬B.
¬↔O: From ¬(A ↔ B) infer ¬A ↔ B.

Additional proof techniques
CD: Start a subderivation assuming A. If you derive B, you may end the subderivation and infer A → B.
ID: Start a subderivation assuming ¬A. If you derive ⨯, you may end the subderivation and infer A. OR Start a subderivation assuming A. If you derive ⨯, you may end the subderivation and infer ¬A.

Here there are 21 inference rules and 3 additional proof techniques.

(2) Copi's System

Inference rules
MP: From A → B and A infer B.
MT: From A → B and ¬B infer ¬A.
DS: From A ∨ B and ¬A infer B.
HS: From A → B and B → C infer A → C.
Simp: From A ∧ B infer A.
Conj: From A and B infer A ∧ B.
Add: From A infer A ∨ B.
CD: From A ∨ B and (A → D) ∧ (B → C) infer D ∨ C.
Abs: From A → B infer A → (A ∧ B).

Replacement rules
DN: Replace A with ¬¬A or vice versa.
Com: Replace A ∧ B with B ∧ A or vice versa. Replace A ∨ B with B ∨ A or vice versa.
Assoc: Replace A ∧ (B ∧ C) with (A ∧ B) ∧ C or vice versa. Replace A ∨ (B ∨ C) with (A ∨ B) ∨ C or vice versa.
Dist: Replace A ∧ (B ∨ C) with (A ∧ B) ∨ (A ∧ C) or vice versa. Replace A ∨ (B ∧ C) with (A ∨ B) ∧ (A ∨ C) or vice versa.
Trans: Replace A → B with ¬B → ¬A or vice versa.
Impl: Replace A → B with ¬A ∨ B or vice versa.
Equiv: Replace A ↔ B with (A → B) ∧ (B → A) or vice versa. Replace A ↔ B with (A ∧ B) ∨ (¬A ∧ ¬B) or vice versa.
Exp: Replace (A ∧ B) → C with A → (B → C) or vice versa.
Taut: Replace A with A ∧ A or vice versa. Replace A with A ∨ A or vice versa.

Additional proof techniques
CP: Start a subderivation assuming A. If you derive B, you may end the subderivation and infer A → B.
IP: Start a subderivation assuming A. If you derive B ∧ ¬B, you may end the subderivation and infer ¬A.

Here we have 23 rules and two additional proof techniques.

A natural deduction system is a system designed to include as its inference rules those steps of reasoning that are most psychologically simple and easy. Usually, this means that some of the rules are redundant. Consider, e.g., modus tollens (MT) in the Copi/Cohen system. It is redundant given the rules of transposition and modus ponens. Instead of using MT, one could always use them:

1. P → Q       Premise
2. ¬Q          Premise
3. ¬Q → ¬P     1 Trans
4. ¬P          2, 3 MP

Natural deduction systems contrast with axiomatic systems. Axiomatic systems aim to be as minimal as possible. They employ as few basic principles and rules as possible. For them, sticking to what is psychologically most natural or convenient is not the prime goal.

Generally, when working within a deduction system, proofs are easier when the system is more complex, because you have more rules to work with. However, when proving things about a deduction system, it's much easier when the system is as simple and minimal as possible.

Therefore, in what follows we attempt to construct a relatively minimalistic deduction system for propositional logic; a system, moreover, that is custom made for our new revised definition of a well-formed formula. In that system, officially, all wffs are built up only using the signs → and ¬. The other signs can be utilized as abbreviations or shorthand notations, but they are not parts of the official symbolism.

E. Axiomatic System L

This system uses the restricted definition of a wff in which ¬ and → are the only primitive connectives. First we need some definitions:

Definition: An axiom of L is any wff of one of the following three forms:
(A1) A → (B → A)
(A2) (A → (B → C)) → ((A → B) → (A → C))
(A3) (¬A → ¬B) → ((¬A → B) → A)

Note: strictly speaking, there are an infinite number of axioms, because every instance of these forms is an axiom. Instances of (A1) include not only P → (Q → P) but also complicated instances such as (A ∨ B) → ((D → M) → (A ∨ B)).

Hence (A1) is not itself an axiom; it is an axiom schema. System L has an infinity of axioms, but three axiom schemata.

System L has only one inference rule, viz., modus ponens: from A → B and A infer B.

Definition: A proof in L of a conclusion A from a set of premises Γ is a finite ordered sequence of wffs B₁, B₂, …, Bₙ, such that the last member of the sequence, Bₙ, is the conclusion A, and for each member of the sequence Bᵢ, where 1 ≤ i ≤ n, either (1) Bᵢ is a member of the premise set Γ, or (2) Bᵢ is an axiom of L, or (3) Bᵢ follows from previous members of the sequence by modus ponens.
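This definition of a proof is simple enough to mechanize. The following Python sketch (my own illustration; the tuple encoding and helper names are ad hoc) checks whether a sequence of wffs counts as a proof in L:

```python
def match(schema, wff, env):
    """Try to match an axiom schema (metavariables are the strings 'A','B','C')
    against a wff, extending the substitution env; return env or None."""
    if isinstance(schema, str):                 # a metavariable
        if schema in env:
            return env if env[schema] == wff else None
        return {**env, schema: wff}
    if wff[0] != schema[0] or len(wff) != len(schema):
        return None
    for s, w in zip(schema[1:], wff[1:]):
        env = match(s, w, env)
        if env is None:
            return None
    return env

A, B, C = 'A', 'B', 'C'
AXIOMS = [
    ('imp', A, ('imp', B, A)),                                   # A1
    ('imp', ('imp', A, ('imp', B, C)),
            ('imp', ('imp', A, B), ('imp', A, C))),              # A2
    ('imp', ('imp', ('not', A), ('not', B)),
            ('imp', ('imp', ('not', A), B), A)),                 # A3
]

def is_axiom(wff):
    return any(match(ax, wff, {}) is not None for ax in AXIOMS)

def checks(proof, premises=()):
    """Each line must be a premise, an axiom, or follow by MP from earlier lines."""
    for i, step in enumerate(proof):
        ok = (step in premises or is_axiom(step)
              or any(p == ('imp', q, step)
                     for p in proof[:i] for q in proof[:i]))
        if not ok:
            return False
    return True

# The five-line proof of P -> P from the notes:
P = ('P',)
PP = ('imp', P, P)
proof = [
    ('imp', P, ('imp', PP, P)),                  # instance of A1
    ('imp', P, PP),                              # instance of A1
    ('imp', ('imp', P, ('imp', PP, P)),
            ('imp', ('imp', P, PP), PP)),        # instance of A2
    ('imp', ('imp', P, PP), PP),                 # 1, 3 MP
    PP,                                          # 2, 4 MP
]
assert checks(proof)
```

Note that the checker verifies the definition exactly: no conditional or indirect proof steps exist for it to accept.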
To put this less formally, L is a deduction system in which each step must be either a premise, an axiom, or a modus ponens inference. There are no other rules. All proofs are direct. There are no indirect or conditional proofs.

Contrast the simplicity of this system with the natural deduction systems on the previous page. Yet this system is no less powerful. Indeed, in a week or two we will prove that it is complete, i.e., that everything that should be provable in it is provable in it.

Abbreviation: We use the notation

Γ ⊢ A (or Γ ⊢L A)

to mean that there is at least one proof (in L) of A from Γ, or that A is provable from the set of premises Γ. If Γ has one or just a few members, we write simply: B ⊢ A or B, C ⊢ A, etc. (The sign ⊢ is called the turnstile.)

Definition: A theorem of L is any wff A such that ∅ ⊢ A.

In other words, a theorem is a wff that can be proven without using any premises.

Abbreviation: We use the notation

⊢ A

to mean that A is a theorem.

Here is a proof showing that P → P is a theorem of L:

1. P → ((P → P) → P)   instance of A1
2. P → (P → P)   instance of A1
3. (P → ((P → P) → P)) → ((P → (P → P)) → (P → P))   instance of A2
4. (P → (P → P)) → (P → P)   1, 3 MP
5. P → P   2, 4 MP

Here we see with lines 1 and 2 that different instances of the same axiom schema are quite often used within the same proof. Line 3 is a typical instance of (A2), making A and C into P and B into (P → P).

In general, proofs in an axiomatic system are longer and less natural than in natural deduction. We make it up to ourselves by never proving the same thing twice. Notice that the above proof suffices for the particular theorem P → P. However, the exact same line of reasoning would work for any statement of the form A → A. Whatever A is, there is a proof of the form:

1. A → ((A → A) → A)   A1
2. A → (A → A)   A1
3. (A → ((A → A) → A)) → ((A → (A → A)) → (A → A))   A2
4. (A → (A → A)) → (A → A)   1, 3 MP
5. A → A   2, 4 MP

Not only that, but we could introduce the appropriate five steps into any derivation whenever we wanted something of the form A → A. Just like every instance of A → (B → A) is an axiom, every instance of A → A is a theorem. Hence we call it a theorem schema. Let us call this schema Self-Implication (Self-Imp).

Once we have given a proof for a theorem schema, from then on we treat its instances as though they were axioms, and allow ourselves to make use of it in any later proof just by citing the previous proof. This is allowable since, if need be, we could always just repeat the steps of the original proof in the middle of the new proof. Here's a proof showing that for any wff A, it holds that:

¬¬A ⊢ A

1. ¬¬A   Premise
2. ¬¬A → (¬A → ¬¬A)   A1
3. ¬A → ¬¬A   1, 2 MP
4. (¬A → ¬¬A) → ((¬A → ¬A) → A)   A3
5. (¬A → ¬A) → A   3, 4 MP
6. ¬A → ¬A   (Self-Imp)
7. A   5, 6 MP

Strictly speaking, steps such as 6 are not allowed; however, we could remedy this by simply inserting the appropriate steps from our previous proof schema. This gets tedious. Our motto in this class is to never prove something again once you've already proven it once.

Not only that, but the result that for any wff A, we have ¬¬A ⊢ A is also the sort of thing that might come in handy down the road. It is not
a theorem schema, since it involved a premise, and does not show anything to be a theorem. However, what it does show is that whenever we have arrived at something of the form ¬¬A within a proof, we could do the same steps above to arrive at A. We're allowed to skip these steps, and cite the above result. In effect, we've added a new inference rule to our system. We haven't really added to the system, since we could always fill in the missing steps.

Hence a result of this form is called a derived rule. Let us call this derived rule double negation (DN) for obvious reasons. (Actually, it is only half of double negation. We'd also need to show that A ⊢ ¬¬A, which is different.)

Your book just gives theorem schemata and derived rules generic names like "Prop. 1.11a". Personally, I find them easier to remember if I make up my own descriptive names and abbreviations like (Self-Imp) and (DN). You can more or less do as you like. When I do my grading I won't really be looking at how you annotate your proofs. I'll be looking more at the content of the proofs themselves.

Another thing I find helpful that the book doesn't do is recognize that each step in a proof is itself a result, since we could have stopped the proof there. Hence I like to use the sign ⊢ before any step of a proof I arrive at without using any premises, and similarly, for those steps that did require a premise, I like to make note of this by writing the premise before the sign ⊢. So for the first example, I prefer:

1. ⊢ A → ((A → A) → A)   A1
2. ⊢ A → (A → A)   A1
3. ⊢ (A → ((A → A) → A)) → ((A → (A → A)) → (A → A))   A2
4. ⊢ (A → (A → A)) → (A → A)   1, 3 MP
5. ⊢ A → A   2, 4 MP

And for the second, I prefer to write:

1. ¬¬A ⊢ ¬¬A   Premise
2. ⊢ ¬¬A → (¬A → ¬¬A)   A1
3. ¬¬A ⊢ ¬A → ¬¬A   1, 2 MP
4. ⊢ (¬A → ¬¬A) → ((¬A → ¬A) → A)   A3
5. ¬¬A ⊢ (¬A → ¬A) → A   3, 4 MP
6. ⊢ ¬A → ¬A   (Self-Imp)
7. ¬¬A ⊢ A   5, 6 MP

Written this way, every single line becomes a metatheoretic result. Moreover, it shows which lines in a proof are justified by which premises, and which lines were justified without using any premises. (When a premise is introduced, it is its own justification.) Here we see that in the first proof every line was a theorem, but in the second proof, some lines were theorems, but others required the assumption at line 1. The disadvantage of this notation is that it is more to write, which gets tedious especially when more than one premise is involved. You can do much the same thing by abbreviating using line numbers, e.g., by writing line 3 instead as:

3. [1] ⊢ ¬A → ¬¬A   1, 2 MP

with the [1] representing line 1, and so on.

F. The Deduction Theorem

If you do your homework, you'll be chugging away at a number of interesting and worthwhile new theorem schemata and derived rules. Today we show something more radical: we show that the natural deduction method of conditional proof, while not strictly speaking allowed in System L, is unnecessary in L, because there is a rote procedure for transforming a would-be conditional proof into a direct proof. To be more precise, we're going to prove the following metatheoretic result:

Result (The Deduction Theorem): If Γ ∪ {C} ⊢ A, then Γ ⊢ C → A. Or, in other words, if we can construct a proof for a certain result A using a set of premises Γ along with an additional premise or assumption C, then it is always possible, using the original set Γ alone, to construct a proof for the conditional statement C → A.

Proof:
(1) Assume that Γ ∪ {C} ⊢ A. This means that there is a proof, i.e., an ordered sequence of
wffs B1 , B2 , . . . , Bn that satisfies the definition of being a proof of A from {C }.


(2) Were going to use the technique of proof induction (see page 6) to show that for every step
in this proof, Bi , where 1 i n, it holds
that ` C Bi .
(3) An argument by proof induction works by first
making an inductive hypothesis. Let Bi be an
arbitrary step in the proof. Were allowed to
assume as an inductive hypothesis that for all
earlier steps in the proof Bj such that j < i it
holds that ` C Bj . We need to show that
the same holds for Bi given this assumption.
(4) Because Bi is a step in a proof of A from
{C }, Bi is any one of these three things:
a) Bi is a premise, i.e., it is a member of
{C }.
b) Bi is an axiom of L.
c) Bi followed from previous steps in the
proof by modus ponens.
We will show for any of these cases, it holds
that ` C Bi .
Case a) : Bi is a premise. This means that either Bi is C or it is a member of . If Bi
is C , then C Bi is the same as C C ,
and hence an instance of (Self-Imp), which can
be introduced into any proof. In that case,
` C Bi . If Bi is a member of then
clearly ` Bi . We can introduce the axiom
Bi (C Bi ) as an instance of (A1), and
so by MP we can conclude ` C Bi .
Case b) : Bi is an axiom. Hence we can introduce Bi into any proof at any time. By (A1),
Bi (C Bi ) is also an axiom. Hence
by MP we get ` C Bi , and a fortiori
` C Bi .
Case c) : Bi followed from previous steps in the
proof by modus ponens. This is the hard case.
By the definition of modus ponens, there must
be two previous members of the sequence, Bj
and Bk from which it followed, with Bj taking
the form Bk Bi . By the inductive hypothesis, it holds that ` C Bj and ` C
Bk . Because Bj takes the form Bk Bi , this
means ` C (Bk Bi ). We can then
introduce the axiom (C (Bk Bi ))
((C Bk ) (C Bi )) as an instance

of (A2). By two applications of MP, we get


` C Bi .
(5) Hence, for every step Bi in the original proof,
we can push the assumption C through to
make it an antecedent. This is true of the
last step in the proof, Bn , which must be A ,
since A was the conclusion of the original
proof. Hence, ` C Bn means that
`C A.
e
The above proof of the deduction theorem is
fairly hard to follow in the abstract, but the idea
behind it is actually very simple. What it means is
that for every proof making use of some number
of assumptions or premises, we can eliminate one
of the premises and make it an antecedent on each
line of the original proof. There is a rote procedure for transforming each line into a line with the
eliminated assumption or premise as an antecedent.
We follow this procedure for the example given on
the next page. The deduction theorem works almost as a substitute for conditional proof; more
precisely, however, it shows that conditional proof
in the object language is not needed.
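The rote procedure is mechanical enough to code up. Here is a sketch of mine (the tuple encoding, function names, and proof format are not from the notes), restricted to the case where a single premise is discharged. The demo at the bottom is the seven-step proof of ¬¬A ⊢ A worked through in the next section; the transformation returns the corresponding 19-step premise-free proof.

```python
# A sketch (mine, not the notes' own code) of the deduction theorem's
# transformation. Formulas are tuples: ('p', name), ('not', F), ('imp', F, G).

def imp(a, b): return ('imp', a, b)
def neg(a): return ('not', a)

def a1(b, c):
    # axiom schema (A1): B -> (C -> B)
    return imp(b, imp(c, b))

def transform(proof, c):
    """proof: list of (formula, kind); kind is 'premise' (the discharged
    premise c), 'axiom', 'theorem', or ('mp', j, k) with B_j = B_k -> B_i.
    Returns a premise-free proof whose steps include c -> B_i for each i."""
    new = []
    for b, kind in proof:
        target = imp(c, b)
        if kind == 'premise':                 # case a): B_i is c itself
            assert b == c
            new.append((target, 'Self-Imp'))
        elif kind in ('axiom', 'theorem'):    # case b): B_i, A1 instance, MP
            new.append((b, kind))
            new.append((a1(b, c), 'A1'))
            new.append((target, 'MP'))
        else:                                 # case c): A2 instance, MP twice
            _, j, k = kind
            bk = proof[k][0]
            new.append((imp(imp(c, imp(bk, b)),
                            imp(imp(c, bk), target)), 'A2'))
            new.append((imp(imp(c, bk), target), 'MP'))
            new.append((target, 'MP'))
    return new

# The 7-step proof of  ¬¬A ⊢ A :
A = ('p', 'A')
nnA = neg(neg(A))
orig = [
    (nnA, 'premise'),                                               # 1
    (a1(nnA, neg(A)), 'axiom'),                                     # 2 (A1)
    (imp(neg(A), nnA), ('mp', 1, 0)),                               # 3
    (imp(imp(neg(A), nnA), imp(imp(neg(A), neg(A)), A)), 'axiom'),  # 4 (A3)
    (imp(imp(neg(A), neg(A)), A), ('mp', 3, 2)),                    # 5
    (imp(neg(A), neg(A)), 'theorem'),                               # 6 (Self-Imp)
    (A, ('mp', 4, 5)),                                              # 7
]
new = transform(orig, nnA)
print(len(new), new[-1][0] == imp(nnA, A))  # 19 True
```

One premise step contributes one new line, and each axiom, theorem, or MP step contributes three, which is where the count of 19 comes from.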

Applying the Deduction Theorem

Last class we covered this proof schema:
1. ¬¬A ⊢ ¬¬A (Premise)
2. ⊢ ¬¬A → (¬A → ¬¬A) (A1)
3. ¬¬A ⊢ ¬A → ¬¬A (1, 2 MP)
4. ⊢ (¬A → ¬¬A) → ((¬A → ¬A) → A) (A3)
5. ¬¬A ⊢ (¬A → ¬A) → A (3, 4 MP)
6. ⊢ ¬A → ¬A (Self-Imp)
7. ¬¬A ⊢ A (5, 6 MP)
We used a premise to arrive at our conclusion; the deduction theorem tells us that there is a proof not making use of the premise, in which the premise of the original argument becomes an antecedent on the result, i.e.:
⊢ ¬¬A → A
The proof of the deduction theorem provides us with a way of transforming the above proof schema into one for the result that ⊢ ¬¬A → A.


We take the steps of the original proof one by one, and depending on what kind of case it is, we treat it appropriately. In transforming each step, the goal is to push the discharged premise to the other side of the turnstile, and arrive at ⊢ ¬¬A → . . . .

Line 1 is the discharged premise. It falls in case a) from the previous page. It becomes:
1. ⊢ ¬¬A → ¬¬A (Self-Imp)
Line 2 appeals to an axiom. It falls in case b). So, it becomes:
2. ⊢ ¬¬A → (¬A → ¬¬A) (A1)
3. ⊢ (¬¬A → (¬A → ¬¬A)) → (¬¬A → (¬¬A → (¬A → ¬¬A))) (A1)
4. ⊢ ¬¬A → (¬¬A → (¬A → ¬¬A)) (2, 3 MP)
Line 3 is gotten by MP. It falls in case c):
5. ⊢ (¬¬A → (¬¬A → (¬A → ¬¬A))) → ((¬¬A → ¬¬A) → (¬¬A → (¬A → ¬¬A))) (A2)
6. ⊢ (¬¬A → ¬¬A) → (¬¬A → (¬A → ¬¬A)) (4, 5 MP)
7. ⊢ ¬¬A → (¬A → ¬¬A) (1, 6 MP)
Line 4 also appeals to an axiom. We treat it just like we treated line 2:
8. ⊢ (¬A → ¬¬A) → ((¬A → ¬A) → A) (A3)
9. ⊢ ((¬A → ¬¬A) → ((¬A → ¬A) → A)) → (¬¬A → ((¬A → ¬¬A) → ((¬A → ¬A) → A))) (A1)
10. ⊢ ¬¬A → ((¬A → ¬¬A) → ((¬A → ¬A) → A)) (8, 9 MP)
Line 5 is gotten at by MP, so case c) again:
11. ⊢ (¬¬A → ((¬A → ¬¬A) → ((¬A → ¬A) → A))) → ((¬¬A → (¬A → ¬¬A)) → (¬¬A → ((¬A → ¬A) → A))) (A2)
12. ⊢ (¬¬A → (¬A → ¬¬A)) → (¬¬A → ((¬A → ¬A) → A)) (10, 11 MP)
13. ⊢ ¬¬A → ((¬A → ¬A) → A) (7, 12 MP)
Line 6 appeals to a theorem schema. Strictly speaking we should write out the intermediate steps, but to save time we can treat it like an axiom, and use the method for case b):
14. ⊢ ¬A → ¬A (Self-Imp)
15. ⊢ (¬A → ¬A) → (¬¬A → (¬A → ¬A)) (A1)
16. ⊢ ¬¬A → (¬A → ¬A) (14, 15 MP)
Line 7 is another MP step:
17. ⊢ (¬¬A → ((¬A → ¬A) → A)) → ((¬¬A → (¬A → ¬A)) → (¬¬A → A)) (A2)
18. ⊢ (¬¬A → (¬A → ¬A)) → (¬¬A → A) (13, 17 MP)
19. ⊢ ¬¬A → A (16, 18 MP)

We've transformed our original 7 step proof into a 19 step proof for the result we were after. Notice that in the new proof, every single step is a theorem; the hypothesis is removed entirely. The final line shows that all wffs of the form ¬¬A → A are theorems.

This procedure can be lengthy, but it sure is effective! The proofs that result from the transformation procedure are not usually the most elegant ones possible. Notice, e.g., that lines 2 and 7 are identical, so we could have skipped lines 3 through 7! However, we followed the recipe provided on the previous page blindly, since we know that that procedure will work in every case.

Since we know we can always transform the one kind of proof into the other, from here on out (well, except in tonight's homework), whenever you have a result of the form Γ ∪ {A} ⊢ B, just go ahead and conclude Γ ⊢ A → B, annotating with "DT". (In effect, this allows you to do conditional proofs in our System L.)

Important Derived Rules for System L

The following are either proven in your book, assigned for homework, or not worth our time to bother proving now.
Remember that (A ∨ B) is defined as (¬A → B) and (A ∧ B) is defined as ¬(A → ¬B), etc.

Derived rules:                              My name/abbreviation:
A → B, B → C ⊢ A → C                        Syllogism (Syll)
A → (B → C) ⊢ B → (A → C)                   Interchange (Int)
¬A → ¬B ⊢ B → A                             Transposition (Trans)
A → B ⊢ ¬B → ¬A                             Transposition (Trans)
A → B, ¬B ⊢ ¬A                              Modus Tollens (MT)
¬¬A ⊢ A                                     Double Negation (DN)
A ⊢ ¬¬A                                     Double Negation (DN)
¬A ⊢ A → B                                  False Antecedent (FA)
A ⊢ B → A                                   True Consequent (TC)
A, ¬B ⊢ ¬(A → B)                            True Ant/False C. (TAFC)
¬(A → B) ⊢ A                                True Antecedent (TA)
¬(A → B) ⊢ ¬B                               False Consequent (FC)
A → B, ¬A → B ⊢ B                           Inevitability (Inev)
A ∧ A ⊢ A                                   Redundancy (Red)
A ⊢ A ∧ A                                   Redundancy (Red)
A ⊢ A ∨ B                                   Addition (Add)
A ⊢ B ∨ A                                   Addition (Add)
A ∨ B ⊢ B ∨ A                               Commutativity (Com)
A ∧ B ⊢ B ∧ A                               Commutativity (Com)
A ↔ B ⊢ B ↔ A                               Commutativity (Com)
(A ∨ B) ∨ C ⊢ A ∨ (B ∨ C)                   Associativity (Assoc)
A ∨ (B ∨ C) ⊢ (A ∨ B) ∨ C                   (Assoc)
(A ∧ B) ∧ C ⊢ A ∧ (B ∧ C)                   (Assoc)
A ∧ (B ∧ C) ⊢ (A ∧ B) ∧ C                   (Assoc)
A ∧ B ⊢ A                                   Simplification (Simp)
A ∧ B ⊢ B                                   Simplification (Simp)
A, B ⊢ A ∧ B                                Conjunction Intro (Conj)
A → B, B → A ⊢ A ↔ B                        Biconditional Intro (BI)
A, B ⊢ A ↔ B                                Biconditional Intro (BI)
¬A, ¬B ⊢ A ↔ B                              Biconditional Intro (BI)
A ↔ B ⊢ A → B                               Biconditional Elim (BE)
A ↔ B ⊢ B → A                               Biconditional Elim (BE)
A ↔ B, A ⊢ B                                Bic. Modus Ponens (BMP)
A ↔ B, B ⊢ A                                Bic. Modus Ponens (BMP)
A ↔ B, ¬A ⊢ ¬B                              Bic. Modus Tollens (BMT)
A ↔ B, ¬B ⊢ ¬A                              Bic. Modus Tollens (BMT)

As you see, I prefer Copi's abbreviations. You can use whatever abbreviations you prefer, provided that you don't use a derived rule until you've given a proof schema for it! Once you do it, you can always refer back to it.

G. Soundness and Consistency

Time to get to the really good stuff: the important results of this chapter.
Generally, we say that a logical system is sound if and only if everything provable in it ought to be provable in it given the intended semantics for the signs utilized in the language. Generally, we say that a logical system is consistent if and only if there is no wff A such that both A and ¬A are provable in the system.
We now show that L has these features.

Result (Soundness): System L is sound, i.e., for any wff A, if ⊢ A then ⊨ A. In other words, every theorem of L is a tautology.

Proof:
(1) Assume ⊢ A. This means that there is a sequence of wffs B1, B2, ..., Bn constituting a proof of A in which every step is either an axiom or derived from previous steps by MP.
(2) We shall show by proof induction that every step of such a proof is a tautology. We assume as inductive hypothesis that all the steps prior to a given step Bi are tautologies. We now need to show that Bi is a tautology.
(3) Bi is either an axiom or derived from previous steps by MP. If it is an axiom, then it is a tautology. (A simple truth table for the three axiom schemata shows that all instances are tautologies.) By an earlier result (see p. 3), anything derived by MP from tautologies is also a tautology. Hence, Bi is a tautology.
(4) By proof induction, all steps of the proof are tautologies, including the last step, which is A. Hence ⊨ A. ∎

Corollary (Consistency): System L is consistent, i.e., there is no wff A such that both ⊢ A and ⊢ ¬A.

Proof:
Suppose for reductio that there is some A such that ⊢ A and ⊢ ¬A.
Since L is sound, ⊨ A and ⊨ ¬A.
By the definition of a tautology, every truth-value assignment makes both A true and ¬A true. However, no truth-value assignment can make both A and ¬A true, and so our assumption is impossible. ∎

Here we see that consistency is a corollary of soundness. Here's another.

Corollary: If {B1, B2, ..., Bn} ⊢ A then {B1, B2, ..., Bn} ⊨ A.

Proof:
The reason is that if
{B1, B2, ..., Bn} ⊢ A
then by multiple applications of the deduction theorem,
⊢ (B1 → (B2 → ... → (Bn → A))).
Then, by soundness, we can conclude:
⊨ (B1 → (B2 → ... → (Bn → A)))
Then by simple reflections on the rules governing truth tables, it is obvious that:
{B1, B2, ..., Bn} ⊨ A
In other words, only logically valid arguments have proofs in L. ∎

H. Completeness

Our next task is to prove the converse of soundness, i.e., that if ⊨ A then ⊢ A.
Unfortunately, the word "complete" is used with two different meanings in mathematical logic. On one meaning (used by Emil Post), a system is said to be complete if and only if for every wff A, either A or ¬A is a theorem of the system. System L is obviously not complete in this sense, since for a contingent statement, neither it nor its negation is a tautology, and hence neither it nor its negation is a theorem. The other sense of completeness is the converse of soundness, i.e., that everything that should be provable in the system given the semantics of the signs it employs is in fact provable. This notion of completeness was first used by Kurt Gödel, and is sometimes called semantic completeness. System L is complete in this sense. Before we prove this, we first need to prove something else.

Composition Lemma

(Something that is proven only as a means towards proving something else is called a lemma.)
Most likely, one of the first things you learned about propositional logic is how to compute the truth value of a given statement if you are given the truth values of all its statement-letters. This is what you do when you fill in a row of a truth table.
P Q R | P → ¬ (Q → R)
T T F | T T T (T F F)
In system L, this corresponds to the result that, if, for every statement letter in A, you are given either it or its negation as a premise, you should be able to derive either the truth or the falsity of A. For the example just given, we should have: {P, Q, ¬R} ⊢ P → ¬(Q → R). We do!
1. {P, Q, ¬R} ⊢ P (Premise)
2. {P, Q, ¬R} ⊢ Q (Premise)
3. {P, Q, ¬R} ⊢ ¬R (Premise)
4. {P, Q, ¬R} ⊢ ¬(Q → R) (2, 3 TAFC)
5. ⊢ ¬(Q → R) → [P → ¬(Q → R)] (A1)
6. {P, Q, ¬R} ⊢ P → ¬(Q → R) (4, 5 MP)
Let us prove this result in a general form.

Result (Composition Lemma): If A is a wff whose statement letters are P1, ..., Pn, and there is a truth-value assignment f such that set Γ contains Pi iff f assigns the truth value T to Pi, and Γ contains ¬Pi iff f assigns the truth value F to Pi, then if f makes A true, then Γ ⊢ A, and if f makes A false, then Γ ⊢ ¬A.

Proof:
We show this by wff induction.
Base step: Let A be a statement letter. Then the only statement letter making up A is A itself. If f assigns T to A, then A ∈ Γ and hence Γ ⊢ A. Similarly, if f assigns F to A, then ¬A ∈ Γ, and hence Γ ⊢ ¬A.
Induction step: Because all complex wffs are built up using the signs ¬ and →, we need to show two things: (a) if A takes the form ¬B then the above holds for A assuming it holds of B, and (b) if A takes the form B → C, then the above holds of A assuming it holds of B and C.
First let's show part (a).
Suppose A takes the form ¬B. If f makes A true, then it must make B false. By our assumption, Γ ⊢ ¬B, which is the same as Γ ⊢ A, which is what we want. If f makes A false, it must make B true. By our assumption Γ ⊢ B, and by (DN) Γ ⊢ ¬¬B, which is the same as Γ ⊢ ¬A.
Now let's show part (b).
Suppose A takes the form B → C. If f makes A true, it must make either B false or C true. If f makes B false, then by our assumption Γ ⊢ ¬B, and by the derived rule (FA), it follows that Γ ⊢ B → C, i.e., Γ ⊢ A. If f makes C true, by our assumption Γ ⊢ C and so by the derived rule (TC), we get Γ ⊢ B → C, i.e., Γ ⊢ A. On the other hand, if f makes A false, it must make B true and C false. By the assumption, Γ ⊢ B and Γ ⊢ ¬C. Then by the derived rule (TAFC), we get Γ ⊢ ¬(B → C), or in other words, Γ ⊢ ¬A.
This completes the induction step, and hence the Composition Lemma follows by wff induction. ∎

We are now ready to tackle completeness.

Result (Completeness): System L is semantically complete, i.e., for any wff A, if ⊨ A then ⊢ A.

Proof:
(1) Assume that ⊨ A, and let the statement letters making it up be P1, ..., Pn.
Example: For illustration purposes only, we'll assume A contains only three statement letters: P, Q and R.
(2) As a tautology, every truth-value assignment to those statement letters makes A true.
(3) By the Composition Lemma, it follows that for every set Γ that contains either Pi or ¬Pi, but not both, for each i such that 1 ≤ i ≤ n, we have Γ ⊢ A.
Example: Consider the truth table for A; it is a tautology, true on every row. Each row gives us a different result from the Composition Lemma, but always a different way of proving A.
P Q R | A | Result of lemma
T T T | T | {P, Q, R} ⊢ A
T T F | T | {P, Q, ¬R} ⊢ A
T F T | T | {P, ¬Q, R} ⊢ A
T F F | T | {P, ¬Q, ¬R} ⊢ A
F T T | T | {¬P, Q, R} ⊢ A
F T F | T | {¬P, Q, ¬R} ⊢ A
F F T | T | {¬P, ¬Q, R} ⊢ A
F F F | T | {¬P, ¬Q, ¬R} ⊢ A
(4) By the Deduction Theorem, we can conclude that if Γ is a set containing either Pi or ¬Pi for each i such that 1 ≤ i ≤ n − 1, we have both Γ ⊢ Pn → A and Γ ⊢ ¬Pn → A. By the derived rule (Inev), we can conclude Γ ⊢ A.
Example: What we're doing here is taking the last statement letter or negation in each premise set and removing it by the deduction theorem, thereby making it an antecedent. However, since we have both the case with the affirmative antecedent and the case with the negative antecedent, they drop off by (Inev).
{P, Q} ⊢ R → A and {P, Q} ⊢ ¬R → A, so {P, Q} ⊢ A
{P, ¬Q} ⊢ R → A and {P, ¬Q} ⊢ ¬R → A, so {P, ¬Q} ⊢ A
{¬P, Q} ⊢ R → A and {¬P, Q} ⊢ ¬R → A, so {¬P, Q} ⊢ A
{¬P, ¬Q} ⊢ R → A and {¬P, ¬Q} ⊢ ¬R → A, so {¬P, ¬Q} ⊢ A
(5) By continued application of the same process described in step (4), we can successively eliminate the members of the premise sets, arriving ultimately at the results that ⊢ P1 → A and ⊢ ¬P1 → A. Again, by (Inev), it follows that ⊢ A.
Example: We just continue the same process:
{P, Q} ⊢ A and {P, ¬Q} ⊢ A, so P ⊢ Q → A and P ⊢ ¬Q → A, whence P ⊢ A
{¬P, Q} ⊢ A and {¬P, ¬Q} ⊢ A, so ¬P ⊢ Q → A and ¬P ⊢ ¬Q → A, whence ¬P ⊢ A
Finally we get both ⊢ P → A and ⊢ ¬P → A, and can conclude ⊢ A by (Inev). ∎
Actually, from the above proof, the proof of the composition lemma and the proof of the deduction theorem, we could write an algorithm for teaching a computer how to construct a derivation for any given tautology of our language. (Most such proofs, however, would be several thousand steps long.)
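The semantic bookkeeping in such an algorithm is simple to sketch. The Python below (my own illustration, not part of System L) checks that a wff is a tautology row by row and reports the premise set that the Composition Lemma would attach to each row:

```python
# For a tautology, enumerate the premise set the Composition Lemma assigns
# to each row of its truth table. (Illustrative sketch; names are mine.)
from itertools import product

def imp(p, q):            # material conditional
    return (not p) or q

def lemma_rows(evaluate, letters):
    out = []
    for vals in product([True, False], repeat=len(letters)):
        env = dict(zip(letters, vals))
        assert evaluate(env), "not a tautology: some row comes out false"
        out.append({p if env[p] else '¬' + p for p in letters})
    return out

# A = (P → Q) → (¬Q → ¬P), a tautology in P and Q:
A = lambda e: imp(imp(e['P'], e['Q']), imp(not e['Q'], not e['P']))
for gamma in lemma_rows(A, ['P', 'Q']):
    print(sorted(gamma), '⊢ A')
```

The hard part the code leaves out is exactly what the three proofs above supply: turning each of these semantic facts into an actual derivation in L.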

Corollary: If B1, ..., Bn ⊨ A, then B1, ..., Bn ⊢ A. For every valid argument, there is a deduction for it in our very minimal System L.

Proof:
See the proof of the converse of the above, given as a corollary to Soundness on p. 23, and run it in the other direction. ∎

Corollary: For any wff A, ⊢ A iff ⊨ A. (All and only tautologies are theorems of L.)

Proof: Combine Soundness and Completeness. ∎

I. Independence of the Axioms

You might think that in establishing soundness and completeness, we have shown that System L is exactly what it was intended to be: all and only logical truths are provable in it, and all and only valid arguments have proofs in it. The only way in which it might be criticized is if it were redundant, i.e., if it contained more axioms than necessary. Our task today is to show that we needed all three axiom schemata. We'll show that it is impossible to derive any one of the axiom schemata as a theorem schema using only the other two axiom schemata and MP.
This may seem like a difficult task: how does one prove that something is not provable from something else? Consider: if we expanded L by adding axioms that are not tautologies, then obviously, we could prove that the new axioms were independent, because everything provable from the axioms of L alone is a tautology. We can't use this method to establish the independence of A1 from A2 and A3, however, because all are tautologies. Instead, we focus on a different, made-up property a wff can have, called selectness.

Definition: A schmuth-value assignment is any function mapping statement letters of the language of propositional logic to the set {0, 1, 2}.

In effect, such an assignment is something that assigns either 0, 1 or 2 to each statement letter. This is rather like a truth-value assignment, which maps statement letters to T and F, except that here there are more possibilities.
Indirectly, a schmuth-value assignment determines a schmuth-value for complex wffs according to the following charts:

A | ¬A
0 | 1
1 | 1
2 | 0

A B | A → B
0 0 | 0
0 1 | 2
0 2 | 2
1 0 | 2
1 1 | 2
1 2 | 0
2 0 | 0
2 1 | 0
2 2 | 0

These charts allow us to construct schmuth tables. Let us see what one looks like for an instance of (A1).

A B | B → A | A → (B → A)
0 0 |   0   |      0
0 1 |   2   |      2
0 2 |   0   |      0
1 0 |   2   |      0
1 1 |   2   |      0
1 2 |   0   |      2
2 0 |   2   |      0
2 1 |   0   |      0
2 2 |   0   |      0

The final schmuth-value of this formula for each schmuth-value assignment is given in the last column (the value of the main connective, the first →). Here we see that this wff is schmtingent, i.e., it has different schmuth-values for different schmuth-value assignments.
Contrast this with the possible instances of (A3):

A B | ¬A → ¬B | (¬A → B) → A | (¬A → ¬B) → ((¬A → B) → A)
0 0 |    2    |      0       |            0
0 1 |    2    |      0       |            0
0 2 |    2    |      0       |            0
1 0 |    2    |      0       |            0
1 1 |    2    |      0       |            0
1 2 |    2    |      2       |            0
2 0 |    2    |      2       |            0
2 1 |    2    |      0       |            0
2 2 |    0    |      0       |            0

Instances of (A3) are schmtologies, i.e., have the schmuth-value 0 for any possible schmuth-value assignment.

Definition: We say that a wff is select if and only if it is a schmtology, i.e., it has schmuth-value 0 for any possible schmuth-value assignment.

Similarly, all the instances of (A2) are select. I'll spare you the 27 row table. You'll just have to take my word for it.

Result: Modus ponens preserves selectness.

Proof:
Suppose A is select, i.e., has schmuth value 0 for every possible schmuth-value assignment. Similarly, suppose that A → B is select, i.e., has 0 for every possible schmuth-value assignment. Then B must be select as well. We can see this by the schmuth table rules for →. If B were not select, then it would have 1 or 2 as value for some assignment. If so, then A → B and A could not both be select, because A → B has value 2 when A has 0 and B gets 1 or 2 as value. ∎

Result: Axiom schema (A1) is independent of (A2) and (A3).

Proof:
Suppose we had an axiom system in which our only axiom schemata were (A2) and (A3) and our only inference rule were modus ponens. If so, then every theorem of the system would be select, since the axioms are select and everything derived from select wffs by MP is also select. Because some instances of (A1) are not select, this means some instances of (A1) would not be theorems of this system. Hence, not all instances of (A1) are derivable from (A2), (A3) and MP alone. ∎

A similar procedure can show that (A2) is independent of (A1) and (A3). We again consider functions assigning one of {0, 1, 2} to each statement letter, but instead use the different rules below for complex wffs:

A | ¬A
0 | 1
1 | 0
2 | 1

A B | A → B
0 0 | 0
0 1 | 2
0 2 | 1
1 0 | 0
1 1 | 2
1 2 | 0
2 0 | 0
2 1 | 0
2 2 | 0

We then define a notion of grotesqueness. A complex wff is grotesque if and only if it comes out with value 0 using these revised rules for any possible assignment of 0, 1 or 2 to all its statement letters.
It turns out that all instances of (A1) and (A3) are grotesque, but some instances of (A2) are not. Modus ponens preserves grotesqueness. So (A2) is independent of (A1) and (A3).
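Since both sets of charts are finite, every one of these claims can be verified mechanically. Here is a brute-force check of mine (the dictionaries transcribe the charts above; the helper names are not from the notes):

```python
# Verify the selectness and grotesqueness claims by brute force over {0, 1, 2}.
from itertools import product

# Tables for selectness (used against A1) and grotesqueness (used against A2):
NEG_S = {0: 1, 1: 1, 2: 0}
IMP_S = {(0,0):0,(0,1):2,(0,2):2,(1,0):2,(1,1):2,(1,2):0,(2,0):0,(2,1):0,(2,2):0}
NEG_G = {0: 1, 1: 0, 2: 1}
IMP_G = {(0,0):0,(0,1):2,(0,2):1,(1,0):0,(1,1):2,(1,2):0,(2,0):0,(2,1):0,(2,2):0}

def a1(a, b, imp, neg):           # A -> (B -> A)
    return imp[a, imp[b, a]]

def a2(a, b, c, imp, neg):        # (A -> (B -> C)) -> ((A -> B) -> (A -> C))
    return imp[imp[a, imp[b, c]], imp[imp[a, b], imp[a, c]]]

def a3(a, b, imp, neg):           # (¬A -> ¬B) -> ((¬A -> B) -> A)
    return imp[imp[neg[a], neg[b]], imp[imp[neg[a], b], a]]

def always_zero(schema, arity, imp, neg):
    return all(schema(*v, imp, neg) == 0
               for v in product((0, 1, 2), repeat=arity))

def mp_preserves(imp):            # if A and A -> B both have value 0, so does B
    return all(b == 0 for (a, b), v in imp.items() if a == 0 and v == 0)

print(always_zero(a1, 2, IMP_S, NEG_S), always_zero(a2, 3, IMP_S, NEG_S),
      always_zero(a3, 2, IMP_S, NEG_S), mp_preserves(IMP_S))
# False True True True : so (A1) is independent of (A2) and (A3)
print(always_zero(a1, 2, IMP_G, NEG_G), always_zero(a2, 3, IMP_G, NEG_G),
      always_zero(a3, 2, IMP_G, NEG_G), mp_preserves(IMP_G))
# True False True True : so (A2) is independent of (A1) and (A3)
```

This also produces the promised 27-row evidence that (A2) is select without anyone having to write the table out.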
For homework, you'll be proving the independence of (A3) from (A1) and (A2). Relatively speaking, that's the easiest, since it doesn't require three values, and can be done with assignments into {0, 1}, provided that one changes the rule governing how the value for ¬A is determined by the value of A.
These independence results establish that there is no redundancy in our axiom schemata; we couldn't simply remove one of them and be left with a complete system.
In one sense, our system is as minimal as possible, but in another sense it isn't. We can't simply remove any of the ones we have, but we could start with completely different axiom schemata. Several rival axiomatizations are possible; you can find a list of some of them in your book, pp. 45-46. Axiomatizations have been found in which there is only one axiom schema. Just like the decision to use both ¬ and → instead of | alone, however, there are diminishing returns to minimalism. The proofs in such systems for even the most mundane results often require an insane number of steps and insanely complicated axioms.
However, in case you're curious, the first complete system for propositional logic using a single axiom schema was discovered by Jean Nicod in 1917, and it uses the Sheffer stroke | instead of ¬ and →. An axiom is any instance of the single schema:
(A | (B | C)) | ((D | (D | D)) | ((E | B) | ((A | E) | (A | E))))
The only inference rule is: From A | (C | B) and A, infer B.
Are you glad I didn't make you use that system?
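You can at least confirm mechanically that every instance of Nicod's schema is a two-valued tautology, reading "|" as NAND. A quick check of mine:

```python
# Check that every instance of Nicod's single axiom schema is a tautology,
# with the Sheffer stroke "|" read as NAND. (Illustration; not from the notes.)
from itertools import product

def nand(x, y):
    return not (x and y)

def nicod(a, b, c, d, e):
    # (A | (B | C)) | ((D | (D | D)) | ((E | B) | ((A | E) | (A | E))))
    return nand(nand(a, nand(b, c)),
                nand(nand(d, nand(d, d)),
                     nand(nand(e, b),
                          nand(nand(a, e), nand(a, e)))))

print(all(nicod(*v) for v in product([True, False], repeat=5)))  # True
```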


UNIT 2
METATHEORY FOR PREDICATE LOGIC

A. The Syntax of Predicate Logic

Onwards and upwards. Our first task is to describe our new language.

Definition: An individual variable is one of the lowercase letters x, y, or z, written with or without a numerical subscript.
Examples: x, x1, x12, y, y2, z, z13, etc.

I use the unitalicized letters x, y and z as object language variables, and italicized letters in the same range (x, y, z) as metalinguistic schematic letters for any object-language variables. Thus, e.g.,
(x)(Fx → Gx)
schematically represents all of (x)(Fx → Gx) and (y)(Fy → Gy) and (x3)(Fx3 → Gx3), and so on. The difference is subtle, and usually not so important to keep straight. After all, object language variables tend to be interchangeable; these do not mean anything different. This is why I'm using notation that does not emphasize the difference. Still, we do need a technical means for differentiating between the two when it is necessary.
Officially, Mendelson only uses xn, and not yn or zn, although he doesn't stick to this. His variables are always italicized; the only difference between object language and metalanguage is whether a particular numerical subscript occurs, or only a variable one: the difference between x2 and xi. It simplifies some things when we get to the semantics to use only one letter "x", but I find statements with multiple variables much easier to read with "x", "y" and "z" instead of x1, x2 and x3.

Definition: An individual constant is one of the lowercase letters from a to e, written with or without a numerical subscript.
Examples: a, a2, b, c124, etc.

Again, I use them unitalicized for object-language constants, and italicized when (very rarely) I need to make a schematic statement about any constant. Again, Mendelson only uses an.

Definition: A predicate letter is one of the uppercase letters from A to T, written with a numerical superscript ≥ 1, and with or without a numerical subscript.
Examples: A¹, R², H⁴, F¹₂, G³₄, etc.

Even when italicized, take these to be object language constants; script letters such as P are used in their place schematically if need be.
The superscript indicates how many terms the predicate letter takes to form a statement. A predicate letter with a superscript 1 is called a monadic predicate letter. A predicate letter with a superscript 2 is called a binary or dyadic predicate letter.

It is customary to leave these superscripts off when it is obvious from context what they must be. E.g., R²(a, b) can be written simply R(a, b). Officially, Mendelson only uses Aⁿₘ.

Definition: A function letter is one of the lowercase letters from f to l, written with a numerical superscript ≥ 1, and with or without a numerical subscript.
Examples: f¹, g², h³₃, etc.

The numerical superscript indicates how many argument places the function letter has. A function letter with a superscript 1 is called a monadic function letter; a function letter with a superscript 2 is called a binary/dyadic function letter, etc. Here too, it is customary to leave these superscripts off when it is obvious from context what they must be. E.g., f¹(x) can be written simply f(x).

Definition: A term of the language is defined recursively as follows:
(i) all individual variables are terms;
(ii) all individual constants are terms;
(iii) if F is a function letter with superscript n, and t1, ..., tn are terms, then F(t1, ..., tn) is a term;
(iv) nothing that cannot be constructed by repeated applications of the above is a term.
Examples: a, x, f(a), g(x, f(y)), etc.

As evinced above, I use italicized lowercase letters from later on in (but not the end of) the alphabet, such as t, r, etc., schematically for any terms.

Definition: An atomic formula is any expression of the form P(t1, ..., tn) where P is a predicate letter with superscript n, and t1, ..., tn are all terms.
Examples: F(a), F(f(x)), R₄³(a, b, c), H⁴(x, b, y, g(a, x)), etc.

If you used Hardegree's Intermediate textbook, you may be used to using hard brackets "[" and "]" instead of soft brackets for atomic formulas. Mendelson uses soft brackets for both, as does almost everyone else. Here, I follow Mendelson, though I'll put hard brackets to another use in a minute.
However, I adopt the convention that if the terms in an atomic formula contain no function letters, the parentheses and commas may be removed.
Examples: Fx is shorthand for F¹(x), and Rab is shorthand for R²(a, b).

Definition: A well-formed formula (wff) is recursively defined as follows:
(i) any atomic formula is a wff;
(ii) if A is a wff, then ¬A is a wff;
(iii) if A and B are wffs, then (A → B) is a wff;
(iv) if A is a wff and x is an individual variable, then ((x) A) is a wff;
(v) nothing that cannot be constructed by repeated applications of the above is a wff.

Mendelson puts parentheses around quantifiers. Other notations for (x) include (∀x), ∀x, and ⋀x. Again, you can use whatever notation you want, and I might not even notice.
We continue to use the same conventions as last unit for dropping parentheses. In Mendelson's practice, the quantifier is taken to fall between ∧, ∨ and → in the ranking. In other words:
(x) Fx → Ga
abbreviates:
(((x) Fx) → Ga)
Whereas:
(x) Fx ∧ Ga
abbreviates:
((x)(Fx ∧ Ga))
This is unusual on Mendelson's part and I shall avoid making use of this convention in cases similar to this last one.
The existential quantifier is introduced by definition. The definitions for the other connectives remain unchanged.
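The recursive definition of a term can be mirrored directly in code. A small recognizer of mine (the nested-tuple encoding, with the arity written into the function-letter name, is an assumption for illustration, not the notes' notation):

```python
# A recognizer for the recursive definition of a term. Variables and
# constants are strings; a function-letter application is a tuple like
# ('g2', t1, t2), whose number of arguments must match the superscript.

def is_term(t):
    if isinstance(t, str):                          # clauses (i)-(ii)
        return t[0] in 'xyzabcde'
    if isinstance(t, tuple) and t and t[0][0] in 'fghijkl':
        arity = int(t[0][1:])                       # superscript, e.g. 'g2'
        return len(t) - 1 == arity and all(is_term(s) for s in t[1:])
    return False                                    # clause (iv)

print(is_term(('g2', 'x', ('f1', 'y'))))   # True:  g(x, f(y))
print(is_term(('f1', 'x', 'y')))           # False: wrong number of arguments
```

Clause (iv) of the definition corresponds to the fact that the function returns False on anything the first three clauses do not build.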

Abbreviations:
((∃x) A) abbreviates ¬((x) ¬A)
(A ∧ B) abbreviates ¬(A → ¬B)
(A ∨ B) abbreviates (¬A → B)
(A ↔ B) abbreviates ((A → B) ∧ (B → A))

Definition: A first-order language is any logical language that makes use of the above definition of a wff, or modifies it at most by restricting which constants, function letters and predicate letters are utilized (provided that it uses at least one predicate letter). E.g., a language that does not use function letters still counts as a first-order language.

Free and Bound Variables

Definition: When a quantifier (x) occurs as part of a wff A, the scope of the quantifier is defined as the smallest part ((x) B) of A such that ((x) B) is itself a wff.

Definition: If x is a variable that occurs within a wff A, then an occurrence of x in A is said to be a bound occurrence iff it occurs in the scope of a quantifier of the form (x) within A; otherwise, the occurrence of x is said to be a free occurrence.

Examples:
1. All three occurrences of x in (x)(Fx → Fx) are bound.
2. The (solitary) occurrence of x in Fx → (y) Gy is free.

Definition: A variable x that occurs within a wff A is said to be a bound variable, or simply bound, iff there is at least one bound occurrence of x in A.

Definition: A variable x that occurs within a wff A is said to be a free variable, or simply free, iff there is at least one free occurrence of x in A.

Notice that x is both bound and free in Fx → (x) Gx, because some occurrences are bound and one is free.

Definition: A wff A is said to be closed iff A contains no free variables; otherwise A is said to be open.

Open formulas may be very unfamiliar to some of you. In Hardegree's books, you never see something like Fx by itself as a line of a proof: there, variables are always bound by quantifiers. Only constants appear on their own.
To not be thrown by wffs including free variables, try not to equate the notion of a true/false sentence with the notion of a wff. In fact:

Definition: A sentence is a closed wff.

Normally, we'll only call sentences true or false. For wffs containing free variables, we say they are satisfied by some values of the variables, and not satisfied by others.
In a derivative sense, however, we'll say that an open wff is true iff any values we choose for the free variable(s) would satisfy them. However, these are semantic issues and we're still doing syntax.

Examples:
1. If R² means "... is taller than ...", and b is an individual constant standing for Barack Obama, then the open wff, R²(x, b), is satisfied by all values of the variable that are things taller than Obama, and in our derivative sense, we say that this wff is not true because it is not satisfied by every value of the variable.
2. The open wff R²(x, b) ∨ ¬R²(x, b) (whatever our interpretation) is satisfied by all values of the variable, and, derivatively, is regarded as true.

Why do we need both (x) ¬R²(x, b) and ¬R²(x, b)? This will become clearer when we get to the system of deduction. (Actually this is in part historical accident; axiom systems that do without free variables have been devised, but they are more complicated.)
The difference is roughly the same as between "any" and "all".

Definition: If A is a wff, t is a term and x is a variable, then t is free for x in A iff no free occurrence of x in A lies within the scope of some quantifier (y) where y is a variable occurring in t.
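The definitions of free and bound variables are also naturally recursive. Here is a sketch of mine (the tuple encoding of wffs is an illustrative assumption, reusing the shapes from the term recognizer above):

```python
# Free and bound variables of a wff, following the definitions above.
# Terms: a variable ('x', 'y1', ...), a constant ('a', 'b', ...), or a
# function application ('f1', t1, ..., tn). Wffs: ('P', terms...),
# ('not', A), ('imp', A, B), and ('all', x, A) for the quantifier (x).

def term_vars(t):
    if isinstance(t, tuple):                       # function term
        return set().union(*map(term_vars, t[1:]))
    return {t} if t[0] in 'xyz' else set()         # variable vs. constant

def free_vars(wff):
    op = wff[0]
    if op == 'not':
        return free_vars(wff[1])
    if op == 'imp':
        return free_vars(wff[1]) | free_vars(wff[2])
    if op == 'all':
        return free_vars(wff[2]) - {wff[1]}        # (x) binds x in its scope
    return set().union(*map(term_vars, wff[1:]))   # atomic formula

def bound_vars(wff):
    op = wff[0]
    if op == 'not':
        return bound_vars(wff[1])
    if op == 'imp':
        return bound_vars(wff[1]) | bound_vars(wff[2])
    if op == 'all':
        return bound_vars(wff[2]) | {wff[1]}
    return set()

# x is both bound and free in  Fx → (x)Gx :
A = ('imp', ('F', 'x'), ('all', 'x', ('G', 'x')))
print(free_vars(A), bound_vars(A))   # {'x'} {'x'}
```

A wff is closed (a sentence) exactly when `free_vars` returns the empty set.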

Basically, this means that if you substitute t for


all the free occurrences of x within A , you wont
end up inadvertently binding any of the variables
that happen to occur in t.
Examples:
1. a is free for x in (y) Gy Gx.
2. f 2 (x, z) is free for x in (y) Gy Gx.
3. z is not free for x in (y) Gy (z) Rxz.
4. f 2 (a, z) is not free for x in
(y) Gy (z) Rxz.
5. f 2 (a, z) is free for x in
(y) Gy (z) (x) Rxz.
6. All terms are free for x in (y) Gy.
7. All terms are free for y in (y) Gy.
I write A [x] for an arbitrary wff that may or may
not contain x free. If the notation A [t] is used in
the same context, it means the result of substituting
the term t for all free occurrences of x (assuming
there are any) in A [x].

B.

The Semantics of Predicate


Logic

Over the next couple weeks, well introduce an


axiomatic system of deduction for predicate logic,
and prove it complete, consistent and sound, just
like we did for propositional logic. In other words,
well prove that every theorem is logically true,
and that every logically true wff is a theorem. But,
what is a logically true wff in predicate logic?
In propositional logic, a logically true wff is
just a tautology, and we had a decision procedure
for determining which wffs are logically true and
which are logically false, and which arguments are
valid and which are invalid, viz., truth tables.
However, you may never have learned anything similar for determining whether a given predicate logic statement is valid or invalid according
to its semantics (i.e., its form and the meaning of its
logical constants). We dont teach it here at UMass
in Intro or Intermediate Logic. We must make up
for this glaring omission immediately!
The notion of a truth-value assignment from
propositional logic is replaced with the notion of
an interpretation or model.

Examples:
1. If A [x] is F x, then A [y] is F y.
2. If A [x] is (y) R(y, x) then A [f (b)] is
(y) R(y, f (b)).
3. If A [z] is F z Gz, then A [d] is F d Gd.
4. If A [x] is F x (x) Gx, then A [d] is Definition: An interpretation M consists of the
F d (x) Gx.
following four things:
5. If A [x] is F a then A [y] is F a.
1. The specification of some non-empty set D to
serve as the domain of quantification for the
Mendelson writes A (x) and A (t) instead of A [x]
language.
and A [t]. I think my notation makes it clearer
This set is the sum total of entities the quantithat these signs are parts of the metalanguage, and
fiers are interpreted to range over. The domain
that the parentheses that appear here are not the
might include numbers only, or people only, or
parentheses used in atomic formula or in function
anything else you might imagine. The domain
terms.
of quantification is sometimes also known as
Similarly, I write A [x, y] for an arbitrary wff
the universe of discourse.
that may or may not contain x and y free, and in 2. An assignment, for each individual constant in
the same context I use A [t, s] for the result of subthe language, some fixed member of D for which
stituting t for all free occurrences of x, and s for
it is taken to stand.
all free occurrences of y, in A [x, y].
For a given constant c, this member is denoted
in the metalanguage by (c)M .
Examples:
3. An assignment, for each predicate letter with su1. If A [x, y] is Rxy, then A [a, b] is Rab.
perscript n in the language, some subset of Dn .
2. If A [x, y] is (z)(Rzx Ryz), then A [a, b]
That is, the interpretation assigns to each prediis (z)(Rza Rbz).
cate letter a set of n-tuples from D.
31

For a given predicate letter P n , this set is denoted in the metalanguage by (P n )M . This
set can be thought of as the extension of the
predicate letter under the interpretation.
4. An assignment, for each function letter with superscript n in the language, some n-place operation on D.
In other words, each function letter is assigned
a set of ordered pairs, the first member of which
is itself some n-tuple of members of D, and the
second member of which is some member of
D. This set of ordered pairs is a function, so
for each n-tuple in its domain, there is a unique
element of D in its range. So if D is the set
of natural numbers, a two-place function letter F 2 might be assigned the addition operation, i.e., the set of all ordered pairs of the form
hhn, mi, pi such that n + m = p. This operation
can be thought of as the mapping, or functionin-extension, represented by the function letter
under the interpretation.
In a sense, the four parts of a model fix the meanings of the quantifiers, constants, predicate letters
and function letters, respectively. (Or at the very
least, they fix as much of their meanings as is relevant in an extensional logical system such as firstorder predicate logic.) This leaves only something
to be said about variables.

Sequences
Each model is associated with a certain domain or
universe of discourse. Variables are allowed to take
different values within that domain. A variable
is given a value by what is called a (denumerable)
sequence.
Definition: A denumerable sequence or variable assignment for domain D is a function whose
domain is the set of positive natural numbers, and
whose range is a subset of D.
What does this have to do with assigning values to
the variables?
We first note that while there are an infinite number of variables (since we can always use different subscripts), we can arrange them in a fixed order and assign them a numbered position in that ordering. I will utilize the ordering:

x, y, z, x1, y1, z1, x2, y2, z2, . . .

For any given variable x, its position in this order can be determined with the formula:

p = 3n + k

where p is the number of the position, n is the number of the subscript on the variable (or 0 if it has none), and k is either 1, 2 or 3 depending on whether the letter used is x, y or z.
(Because your book does not use y or z officially, it can simply order the variables according
to their subscripts.)
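The formula can be checked mechanically. Here is a small sketch (the function name is my own, not the text's):

```python
def position(letter, subscript=0):
    """Position of a variable in the fixed ordering x, y, z, x1, y1, z1, ...,
    computed by the formula p = 3n + k from the text."""
    k = {"x": 1, "y": 2, "z": 3}[letter]
    return 3 * subscript + k

print(position("x"))       # 1: x is first
print(position("z", 1))    # 6: z1 is sixth
print(position("y", 2))    # 8: y2 is eighth
```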
For each interpretation M, there will be some
(usually infinite) number of sequences of the elements of its domain D. A denumerable sequence
can be thought of as an ordered list of elements of
the domain D that has a beginning but has no end,
in which the members of D are arranged in any
order, with or without repetition or patterns.
So if D is the set containing the members of
the Beatles, each of the columns below represents
a different denumerable sequence.
       s1      s2        s3        s4
1      John    John      Ringo     Ringo
2      John    Paul      Paul      Ringo
3      John    George    John      Ringo
4      John    Ringo     John      Paul
5      John    John      Ringo     Ringo
6      John    Paul      George    Ringo
7      John    George    John      Ringo
8      John    Ringo     George    Paul
9      John    John      Ringo     George
...    ...     ...       ...       ...
There are infinitely many more such sequences in
addition to those listed.
Think of each member of a sequence as the value of the variable that occupies the corresponding position. The variable x is correlated with the first position in such sequences; y is correlated with the second position, and so on. So s1 makes John the value of every variable. s2 makes different Beatles the values of different variables in a patterned way. And s3 does so in an unpatterned way.


Therefore, each sequence correlates every variable of the language with a member of the domain of that interpretation. For a given variable x and sequence s, in the metalanguage, we use s(x) to denote the member of D which s correlates with x. Hence, s3(y) is Paul. Given the assignments made for the constants and function letters in M, derivatively, each sequence correlates every term of the language with a member of D. If c is an individual constant, then let s(c) be (c)^M. Then for function terms, let s(F(t1, . . . , tn)) be the entity δ in D such that ⟨⟨s(t1), . . . , s(tn)⟩, δ⟩ ∈ (F)^M.

A sequence acts just as an assignment of values to the variables. Within a given interpretation M, an open wff might be satisfied by some sequences and not others.
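A sequence can be modeled directly as a function from positive integers (variable positions) to members of D. Here is a sketch of s3 from the table; only the first nine values are fixed by the table, so the continuation below is an invented arbitrary choice:

```python
# s3 as a function from positions to Beatles; positions beyond 9 are not
# given in the table, so we continue it arbitrarily with "Ringo".
def s3(n):
    listed = ["Ringo", "Paul", "John", "John", "Ringo",
              "George", "John", "George", "Ringo"]
    return listed[n - 1] if n <= len(listed) else "Ringo"

# x occupies position 1, y position 2, z position 3, so s3(y) is Paul:
print(s3(2))   # Paul
```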

Satisfaction

Definition: The notion of satisfaction is defined recursively. For a given interpretation M with domain D:
(i) If A is an atomic wff P(t1, . . . , tn), then sequence s satisfies A iff the ordered n-tuple formed by those entities in the domain D that s correlates with t1, . . . , tn is in the extension of P for interpretation M, or more precisely, ⟨s(t1), . . . , s(tn)⟩ ∈ (P)^M.
(ii) Sequence s satisfies a wff of the form ¬A iff s does not satisfy A.
(iii) Sequence s satisfies a wff of the form (A → B) iff either s does not satisfy A or s does satisfy B.
(iv) Sequence s satisfies a wff of the form (x) A iff every sequence s′ that differs from s at most with regard to what entity of D it correlates with the variable x satisfies A.

Roughly speaking, each sequence assigns a member of D to each free variable, and from there, one can determine whether or not the wff is satisfied by that variable assignment as one would expect.

The notion of satisfaction is important because it is used to define the notion of truth. (This is the heart of Tarski's formal semantics.)

Truth and other Semantic Notions

The truth of a wff of predicate logic is relative to an interpretation, just as whether or not a wff of propositional logic is true is relative to a given truth-value assignment.

Definition: A wff A is true for interpretation M iff every denumerable sequence one can form from the domain D of M satisfies A.

Abbreviation: Used in the metalanguage:

⊨_M A

means that A is true for M. (The subscript on ⊨ is necessary here.)

Definition: A wff A is false for interpretation M iff no denumerable sequence one can form from the domain D of M satisfies A.

Notice that while closed wffs are always either true or false (but not both) for an interpretation M, open wffs can be neither. However, for all wffs A, A is true for M iff ¬A is false for M, and A is false for M iff ¬A is true for M.

Here are some other important consequences of these definitions (elaborated upon in your book, pp. 61–64):

• Modus ponens preserves truth in an interpretation, i.e., for all interpretations M, if ⊨_M A and ⊨_M (A → B), then ⊨_M B.
• If A is an open wff, and B is obtained from A by binding all free variables of A with initial quantifiers ranging over the whole wff, then for any interpretation M, ⊨_M A iff ⊨_M B.
• Every instance of a truth-table tautology is true for every interpretation.
• If A is a sentence, then for every interpretation M, either ⊨_M A or ⊨_M ¬A.
• If t is free for x in A[x], then any wff of the form (x) A[x] → A[t] is true for all interpretations.
• If A does not contain x free, then any wff of the form (x)(A → B) → (A → (x) B) is true for all interpretations.
Definition: If M is an interpretation, and Γ is a set of wffs all of which are true for M, then M is called a model for Γ.

Notice that every interpretation will be a model for some sets of wffs. So it is appropriate to equate the notion of a model with the notion of an interpretation. In fact, the study of formal semantics for artificial languages is sometimes called "model theory". I tend to use the words "model" and "interpretation" interchangeably.

Definition: A wff A is said to be logically true or logically valid iff A is true for every possible interpretation.

Abbreviation: The notation:

⊨ A

(leaving off any subscript) means that A is logically valid. Because interpretations are analogous to truth-value assignments in propositional logic, this definition is analogous to the definition of a tautology given in our last unit; this is why the notation ⊨ is appropriate.

What interpretations are possible? Do we know how many? (In a footnote, Mendelson, rather dubiously, equates interpretations with "possible worlds". This is misleading in many ways, but it can sometimes be helpful to think of it in this way.)

It is impossible that both ⊨ A and ⊨ ¬A.

Definition: A wff A is said to be satisfiable iff there is at least one interpretation for which there is at least one sequence that satisfies A.

Hence, ⊨ A iff ¬A is not satisfiable, and ⊨ ¬A iff A is not satisfiable.

Derivatively, a set Γ of wffs is said to be (mutually) satisfiable iff there is at least one interpretation for which there is at least one sequence that satisfies every member of Γ.

Definition: A wff A is said to be contradictory iff it is not satisfiable. Hence, A is contradictory iff ⊨ ¬A.

Definition: A wff A is said to logically imply a wff B iff in every interpretation, every sequence that satisfies A also satisfies B.

Abbreviation: The notation:

A ⊨ B

means that wff A logically implies B.

Definition: A wff A is a logical consequence of a set of wffs Γ iff in every interpretation, every sequence that satisfies every member of Γ also satisfies A.

Abbreviation: Similarly:

Γ ⊨ A

means that A is a logical consequence of Γ.

Definition: A wff A is said to be logically equivalent to a wff B iff in every interpretation M, A and B are satisfied by the same sequences.

Abbreviation: The notation:

A ⫤⊨ B

means that A and B are logically equivalent.

It follows that A ⫤⊨ B iff A ⊨ B and B ⊨ A.

This rounds out our presentation of the important semantic concepts for predicate logic.

C. Countermodels and Semantic Trees

If you are like many students, in your introductory logic courses, you were taught the truth-table method for showing the validity or invalidity of an argument in propositional logic, but were never taught an analogous method for showing the invalidity of an argument in predicate logic. To be sure, you were probably taught a deduction system for predicate logic; but such deductions can only be used to show that an argument is valid, not that an argument is invalid.

The definitions above tell us more or less what the process should be like. Just like showing a propositional logic argument to be invalid involves finding a truth-value assignment making the premises true and the conclusion false, the appropriate method for predicate logic involves finding an interpretation containing a sequence that satisfies all the premises but not the conclusion.

Consider the following argument:

Fa
Ga → Fa
∴ Ga

The conclusion of this argument is not a logical consequence of its premises. The reason is that there are sequences in some interpretations that satisfy the premises but not the conclusion. All we have to do is describe one.

Consider the model B in which (1) the domain of quantification is the set {Britney Spears}, (2) the assignment to all constants, including a, is Britney Spears, (3) all predicate letters are assigned an empty extension except the predicate letter F^1, which is assigned the extension {Britney Spears}, and (4) all function letters are assigned operations mapping ordered n-tuples of the members of the set {Britney Spears} onto Britney Spears.

This model has only one sequence: that which assigns every variable to Britney Spears. Call this sequence s. Since s(a) ∈ (F^1)^B, s satisfies Fa. Hence, s also satisfies Ga → Fa. However, s does not satisfy Ga, since s(a) ∉ (G^1)^B. (Recall that (G^1)^B is ∅.)

Because there is at least one sequence in at least one interpretation that satisfies all the premises of the argument but does not satisfy the conclusion, the conclusion is not a logical consequence of the premises.

The same reasoning shows that the wff:

Fa → ((Ga → Fa) → Ga)

is not logically valid.

Definition: A model in which there is a sequence that does not satisfy a given wff A (or set of wffs Γ) is called a countermodel to A (or to Γ).

In propositional logic, there is an effective procedure that always identifies a counter truth-value assignment if one exists, or shows that there are none. (Truth tables.) When a wff is logically valid, is there always a method for proving that there are no countermodels? If a wff is not valid, is there always an effective method for finding its countermodels?

As it turns out, no, there isn't. There is a method that works a lot of the time, but it isn't always effective. This procedure involves constructing what are called semantic trees.

Semantic trees work very similarly to abbreviated truth tables, i.e., those you do by simply assuming that the complex wff is F and attempting either to find a truth-value assignment in line with this assumption, or to show that no truth-value assignment ever could be in line with this assumption.

To test whether a certain wff is satisfiable, we write T next to it. To test whether it is a logical truth, we write F next to it to determine whether it is possible for it not to be satisfied. To test whether a certain group of wffs could be satisfiable while another is not, we write T next to those which are to be satisfied and F next to those which are not. (This could be useful in testing the validity of an argument.) The book does not write Ts and Fs, but just wffs and their negations, but I think it better to stress the semantic nature of this exercise. (This is not meant as a replacement for a system of deduction.)

We then apply the rules below to the statements, depending on their main connectives. These break down how the satisfaction of a given wff depends on its parts, to see if the proposal is possible. When a certain possibility might be true in more than one way, the tree branches to explore both possibilities.

Semantic Tree Rules

Here are the rules for the primitive connectives.

Negations:
  From T ¬A, write F A below on the branch.
  From F ¬A, write T A below on the branch.

Conditionals:
  From T (A → B), branch into the two possibilities: F A | T B.
  From F (A → B), write both T A and F B below.

Universal Quantifier:
  From T (x) A[x], write T A[t1], . . . , T A[tn], for all closed terms ti occurring on this branch of the tree.
  From F (x) A[x], write F A[c], where c is some new constant unused in the tree. Afterwards, reapply the rule for any T (y) B (or F (∃y) B) lines that were applied earlier on the current branch.

Atomic Formulas:
  For T P(t1, . . . , tn), check to see whether F P(t1, . . . , tn) appears previously on the branch. If so, close the branch with ✗. If not, do nothing.
  For F P(t1, . . . , tn), check to see whether T P(t1, . . . , tn) appears previously on the branch. If so, close the branch with ✗. If not, do nothing.

Strictly speaking, the above rules suffice, since we could rewrite any wff containing other connectives, or the existential quantifier, in unabbreviated form. However, it can be seen that the rules will be equivalent to the following additional tree rules. You may choose either to use or not use these.

Conjunctions:
  From T (A ∧ B), write both T A and T B below.
  From F (A ∧ B), branch into: F A | F B.

Disjunctions:
  From T (A ∨ B), branch into: T A | T B.
  From F (A ∨ B), write both F A and F B below.

Biconditionals:
  From T (A ↔ B), branch into: (T A and T B) | (F A and F B).
  From F (A ↔ B), branch into: (T A and F B) | (F A and T B).

Existential Quantifiers:
  From T (∃x) A[x], write T A[c], where c is some new constant unused in the tree. Afterwards, reapply the rule for any T (y) B (or F (∃y) B) lines on the current branch.
  From F (∃x) A[x], write F A[t1], . . . , F A[tn], for all closed terms ti occurring on this branch of the tree.

When a tree branches, you're considering different ways of making good on your original assumption. The current branch is considered everything you can reach by tracing upwards but not downwards from the current location. Rules can be applied in any order, but generally, it's more helpful to apply other rules before the T (A → B) and T (x) A[x] rules.

If you continue this procedure, you will achieve one of three results:

(1) Every branch of the tree will close. In this case, the initial assumption turned out to be impossible. Therefore, ⊨ B. The tree itself can be transformed into a proof in the metalanguage that B has no countermodels.

(2) You will have applied the rules to every wff in some branch without it closing. In this case, the branch remaining open can be used to construct a model and sequence for the original hypothesis. (This will be a countermodel to B if you assumed B was unsatisfied, etc.) Choose a domain with as many entities as there are closed terms on the branch, and assign each term ti to one of the entities of the domain, (ti)^M, and, for each n-place predicate letter P on the branch, include ⟨(t1)^M, . . . , (tn)^M⟩ in (P)^M iff the assumption T P(t1, . . . , tn) occurs on that branch. The described model will have a sequence that satisfies all and only those wffs that have a T next to them in the branch.

(3) You will be stuck in an infinite loop of steps, and never finish the tree. In this case, there is likely a model that will have a sequence that satisfies all the initial assumptions, but it may be one with an infinitely large domain. With creative insight, you may be able to determine what this model will be like, but there is no algorithm for doing this.

By changing the initial assumption, we can use trees also to test whether or not a sentence is contradictory (by, e.g., assuming T B at the start), or whether two sentences are equivalent (by determining whether their biconditional can be unsatisfiable), and so on.

Examples:

1. Let us first use a tree to show that

(x)(Fx → Gx), (x)(Gx → Hx) ⊨ (x)(Fx → Hx)

We do this by exploring the possibility of a sequence satisfying the premises but not the conclusion, and showing that this is impossible.

T (x)(Fx → Gx)
T (x)(Gx → Hx)
F (x)(Fx → Hx)
F (Fa → Ha)
T Fa
F Ha
T (Fa → Ga)
  left branch: F Fa   ✗
  right branch: T Ga
    T (Ga → Ha)
      left branch: F Ga   ✗
      right branch: T Ha  ✗

The above can easily be transformed into a metalanguage proof. Such a proof would begin: Suppose for reductio ad absurdum there is some sequence s in some model M such that s satisfies (x)(Fx → Gx) and (x)(Gx → Hx) but not (x)(Fx → Hx). By the definition of satisfaction, there must be some other sequence s′, differing from s by at most what it assigns to the variable x, that does not satisfy (Fx → Hx); let us call the entity s′ assigns to x, α. [α plays the role of a in the tree, though we should not assume anything about the constant a.] Any sequence that assigns α to x will satisfy Fx but not Hx . . . , and so on, matching the lines of the tree. Branching will result in a proof by cases in the metalanguage, where each case leads to a different contradiction.

2. We now show that

⊭ (((∃x) Fx) ∧ ((∃y) Gy)) → (∃x)(Fx ∧ Gx)

We do this by constructing a countermodel via a tree. We do this by assigning F to the above.

F ((((∃x) Fx) ∧ ((∃y) Gy)) → (∃x)(Fx ∧ Gx))
T ((∃x) Fx) ∧ ((∃y) Gy)
F (∃x)(Fx ∧ Gx)
T (∃x) Fx
T (∃y) Gy
T Fa
T Gb
F Fa ∧ Ga
  left branch: F Fa   ✗
  right branch: F Ga
    F Fb ∧ Gb
      left branch: F Fb
      right branch: F Gb  ✗

Although two branches closed, one remains open. We can use this to construct a countermodel, M. Let D = {α, β}, (a)^M = α, (b)^M = β, (F^1)^M = {α}, (G^1)^M = {β}. In any such model, no sequence satisfies the above. Hence this wff is not logically valid.

3. For an example of an infinite tree, consider one attempting to show that

⊭ (x)(∃y) Rxy → Ga

which looks like this:

F ((x)(∃y) Rxy → Ga)
T (x)(∃y) Rxy
F Ga
T (∃y) Ray
T Rab
T (∃y) Rby
T Rbc
T (∃y) Rcy
T Rcd
T (∃y) Rdy
T Rde
...

Clearly, there is no end to this tree, but it's also pretty clear that it does describe a model. Let D = the set of natural numbers, (R^2)^M be the less-than relation, (G^1)^M be the property of being odd, and the constants stand for the natural numbers in order, beginning with (a)^M = 0.
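The countermodel in example 2 can be double-checked by brute force, writing 0 for α and 1 for β. This is an illustrative sketch of my own, not part of the text's tree method:

```python
from itertools import product

# D = {0, 1}, with F = {0} and G = {1} as in the tree's open branch. The wff
# ((Ex)Fx & (Ey)Gy) -> (Ex)(Fx & Gx) should come out false in this model.
D = {0, 1}

def falsifies(Fext, Gext):
    antecedent = any(x in Fext for x in D) and any(y in Gext for y in D)
    consequent = any(x in Fext and x in Gext for x in D)
    return antecedent and not consequent

print(falsifies({0}, {1}))   # True: the tree's countermodel works

# Searching every choice of extensions on this two-element domain shows that
# exactly the two symmetric countermodels exist:
subsets = [set(), {0}, {1}, {0, 1}]
found = [(F, G) for F, G in product(subsets, repeat=2) if falsifies(F, G)]
print(found)                 # [({0}, {1}), ({1}, {0})]
```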

Mendelson goes so far as to give metatheoretic proofs that whenever a tree closes, the wff(s) in question is (are) unsatisfiable (or logically true, if F was assumed), that the appropriate kind of model exists if the tree doesn't close, and so on.
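The closure behavior these metatheoretic results describe can be prototyped for the propositional part of the tree rules. The sketch below is my own illustration, not the book's algorithm: it handles only ¬ and → (as nested tuples), and reports whether every branch closes.

```python
# Signed wffs are (sign, formula) pairs; formulas are atoms (strings),
# ("not", A), or ("imp", A, B). closes() returns True iff every branch
# of the tree starting from the given signed wffs closes.

def closes(branch):
    todo = [sw for sw in branch if not isinstance(sw[1], str)]
    atoms = {sw for sw in branch if isinstance(sw[1], str)}
    # close the branch if some atom is marked both T and F
    if any((not sign, a) in atoms for (sign, a) in atoms):
        return True
    if not todo:
        return False                  # fully developed, still open
    sign, wff = todo[0]
    rest = [sw for sw in branch if sw != (sign, wff)]
    if wff[0] == "not":               # T ~A -> F A ; F ~A -> T A
        return closes(rest + [(not sign, wff[1])])
    _, a, b = wff
    if sign:                          # T (A -> B): branch into F A | T B
        return closes(rest + [(False, a)]) and closes(rest + [(True, b)])
    return closes(rest + [(True, a), (False, b)])   # F (A -> B): T A, F B

def tautology(wff):
    """Assume F next to the wff; it is a tautology iff every branch closes."""
    return closes([(False, wff)])

print(tautology(("imp", "A", ("imp", "B", "A"))))  # True: axiom (A1) pattern
print(tautology(("imp", "A", "B")))                # False: open branch remains
```

An open branch here corresponds to outcome (2) above: the T/F atoms left on it describe a counter truth-value assignment.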

D. An Axiom System

As our system of deduction for predicate logic, we introduce the following. (Below, A, B, and C are used as schematic letters representing wffs, x as a schematic letter for individual variables, and t for terms. Γ is used as a metalinguistic variable ranging over sets of object-language wffs.)

The First-Order Predicate Calculus (System PF)¹

Definition: An axiom of PF or logical axiom is any wff of one of the following five forms:
(A1) A → (B → A)
(A2) (A → (B → C)) → ((A → B) → (A → C))
(A3) (¬A → ¬B) → ((¬A → B) → A)
(A4) (x) A[x] → A[t], for all instances such that t is free for x in A[x]
(A5) (x)(A → B) → (A → (x) B), for all instances such that A contains no free occurrences of x.

Definition: The inference rules of PF are:
Modus ponens (MP): From (A → B) and A, infer B.
Generalization (Gen): From A infer (x) A.

Abbreviation: Γ ⊢ A and simply ⊢ A are defined as you might expect. In this unit, unless otherwise specified, ⊢ means ⊢PF.

Definition: A first-order theory K is an axiomatic system in the language of predicate logic that can be obtained from the above by adding zero or more proper or non-logical axioms.

Proper axioms are added to represent the basic principles of a certain area of thought. E.g., we might form a first-order theory for the study of the solar system by using the constants a1, . . . , a9 as the nine planets (and Pluto), b1 for the sun, etc., using O^2 for the orbiting relation, etc., and adding as axioms certain laws of physics stated in the language of predicate logic, and so on.

Note that every theorem schema of L corresponds to a theorem schema of PF (or any other first-order theory). Since L is complete, if A is a truth-table tautology, then ⊢PF A. You may cite this in your proofs by writing "Taut" as justification. Similarly, every derived rule of L corresponds to a derived rule of PF. You may make use of this in your derivations by using the abbreviations on p. 21, or simply writing "SL" [System L] as justification. Alternatively, you may utilize the notation used within your favorite natural deduction system for propositional logic, or abbreviate the names given with the derived rules listed in sec. 2.5 of your textbook.

All first-order theories, including the bare-bones PF, have the following additional derived rules.

Result (UI, ∀O or rule A4): (x) A[x] ⊢ A[t], where t is free for x in A[x]. (Universal instantiation.)

Proof:
Follows directly from (A4) by MP. ∎

¹PF stands for "Full Predicate calculus", i.e., the calculus within a syntax including all possible constants, predicate letters and function letters. The "Pure Predicate calculus", PP, is the same, but excluding all constants or function letters from the syntax. Mendelson gives these abbreviations in Chapter 3. There are predicate calculi that are neither pure nor full.

Result (EG, ∃I or rule E4): A[t, t] ⊢ (∃x) A[t, x], where t is free for x in A[t, x]. (Existential generalization; the repetition of t here indicates that not all the occurrences of t need to change.)

Proof:
The following schema shows the object-language steps necessary.
1. A[t, t] ⊢ A[t, t]                  (Premise)
2. A[t, t] ⊢ ¬¬A[t, t]                1 SL (DN)
3. ⊢ (x) ¬A[t, x] → ¬A[t, t]          A4
4. A[t, t] ⊢ ¬(x) ¬A[t, x]            2, 3 SL (MT)
5. A[t, t] ⊢ (∃x) A[t, x]             4 definition of ∃ ∎

Result (Sub or Repl): A[x] ⊢ A[t], where t is free for x in A[x]. (The rule of substitution or replacement of free variables.)

Proof:
Schematically:
1. A[x] ⊢ A[x]          (Premise)
2. A[x] ⊢ (x) A[x]      1 Gen
3. A[x] ⊢ A[t]          2 UI ∎

E. The Deduction Theorem in Predicate Logic

The deduction theorem does not hold generally in the first-order predicate calculus PF, nor would we want it to. After all, in the semantics of predicate logic, it is not the case that ⊨ Fx → (x) Fx, and similarly in the system of deduction, while we have Fx ⊢ (x) Fx by Gen, we should not have ⊢ Fx → (x) Fx. We therefore state and prove the deduction theorem in the following restricted form:

Result (DT): If Γ ∪ {C} ⊢ A and in the proof B1, . . . , Bn of A from Γ ∪ {C}, no step is obtained by an application of Gen that both (i) is applied to a previous step that depends upon having C in the premise set, and (ii) uses a variable occurring free in C, then Γ ⊢ C → A.

Proof:
(1) Assume the complex antecedent of DT. We will show, using proof induction, that for every step Bi in the proof B1, . . . , Bn of A from Γ ∪ {C}, it holds that Γ ⊢ C → Bi. We are entitled to assume that we have already gotten Γ ⊢ C → Bj for all steps Bj prior to Bi.
(2) Because Bi is a step in the proof of A from Γ ∪ {C}, the cases we have to consider are that: (a) Bi is a member of Γ, (b) Bi is C, (c) Bi is an axiom, (d) Bi follows from previous steps in the proof by MP, and (e) Bi follows from a previous step by an application of Gen obeying the restriction mentioned above. We consider each case.

Case (a). Bi is a member of Γ. Hence Γ ⊢ Bi, and by SL, Γ ⊢ C → Bi.

Case (b). Bi is C. Then C → Bi is simply C → C, a simple tautology, whence Γ ⊢ C → Bi.

Case (c). Bi is an axiom. Hence ⊢ Bi and by SL, ⊢ C → Bi. A fortiori, Γ ⊢ C → Bi.

Case (d). Bi follows from previous members of the series by MP. Therefore there are previous members of the series Bj and Bk such that Bj takes the form Bk → Bi. By the inductive hypothesis, we already have both Γ ⊢ C → Bk and Γ ⊢ C → (Bk → Bi). By SL, Γ ⊢ C → Bi.

Case (e). Bi follows from a previous member of the series by an application of Gen obeying the restriction mentioned above. Therefore, there is a previous step Bj such that Bi takes the form (x) Bj for some variable x. Because of the restriction, either obtaining Bj did not depend on having C in the premise set, or C does not contain x free. In the first subcase, Γ ⊢ Bj and hence by Gen, we have Γ ⊢ (x) Bj, i.e., Γ ⊢ Bi. By SL, then, Γ ⊢ C → Bi, as usual. In the second subcase, we first note that we have Γ ⊢ C → Bj by the inductive hypothesis. By Gen, we obtain Γ ⊢ (x)(C → Bj). Because C does not contain x free, as an instance of (A5) we have ⊢ (x)(C → Bj) → (C → (x) Bj). By MP, Γ ⊢ C → (x) Bj, i.e., Γ ⊢ C → Bi. ∎

F.

Doing without Existential


Instantiation

The natural deduction rule of Existential Instantiation or Existential Elimination (EI, O) recommends that from a given existentially quantified statement, one should infer the corresponding
statement with the quantifier removed, and some
new or unused constant in place of the variable.
(Mendelson calls this Rule C for choice.) HowObviously, in the proof establishing that F x ` ever, note that for most wffs A [x] it is not the case
(x) F x, Gen is applied to a step that both depends that
(x) A [x]  A [c]
on F x and makes use of a variable occurring free
in F x. (Note that invoking the Sub or Repl for any constant c. Thus, e.g., we ought not have
derived rule requires the same.) So we cannot con- (x) F x ` F (c) for any constant c, even an unused
one. Within an interpretation M, every constant c
clude ` F x (x) F x.
However, such is not the case with the proof: is assigned a fixed entity of the domain, viz., (c)M .
There simply is no inferring that (c)M is in the
extension of the predicate letter F , viz., (F )M ,
1. (x) F x ` (x) F x
Premise simply on the assumption that something is. This
2. ` (x) F x F y
(A4) so-called rule of natural deduction is logically
3. (x) F x ` F y
1, 2 MP invalid, and should be done away with. Luckily,
4. (x) F x ` (y) F y
3 Gen we dont need it. Bearing in mind Exercise 2.32d
from your homework, we have:
This we transform as follows:
(New-DR) If A does not contain x free, then
(x)(B A ) ` (x) B A .
` (x) F x (x) F x
` (x) F x F y
` (x) F x ((x) F x F y)
` (x) F x F y
` (y)((x) F x F y)
` (y)((x) F x F y)
((x) F x (y) F y)
7. ` (x) F x (y) F y

1.
2.
3.
4.
5.
6.

(Taut) With this, we have the following conversion from


A4 a pseudo-proof that uses EI to a proof that doesnt.
2 SL
1, 3 SL PSEUDO-PROOF:
4 Gen
(x) F x ` (x)(F x Gx)
A5
5, 6 MP

1.
2.
3.
4.

(x) F x ` (x) F x
(x) F x ` F a
(x) F x ` F a Ga
(x) F x ` (x)(F x Gx)

Again, this is not the most eloquent proof. (Getting


line 4 by SL is silly, since its an axiom, and indeed, the same one introduced at line 2.) However, CONVERSION:
thats what case (d) called for and following the
(x) F x ` (x)(F x Gx)
rote procedure always works.
From here on out, you can use this to shorten 1. (x) F x ` (x) F x
your proofs, but bear in mind the restrictions. You 2. F x ` F x
need to make sure that you dont apply it when 3. F x ` F x Gx
youve used Gen on a variable appearing free in an 4. F x ` (x)(F x Gx)
assumption!
5. ` F x (x)(F x Gx)
41

Premise
1 EI/O
2 SL
4 EG

Premise
Premise
2 SL
3 EG
4 DT

6. ⊢ (x)(Fx → (∃x)(Fx ∨ Gx))       5 Gen
7. ⊢ (∃x) Fx → (∃x)(Fx ∨ Gx)       6 New-DR
8. (∃x) Fx ⊢ (∃x)(Fx ∨ Gx)         1, 7 MP

We will show that whenever a pseudo-proof is possible with EI/∃O, a conversion to a real proof similar to the above is always possible. Stated very simply: replace every step arrived at by EI with a premise similar to it, except containing a variable not occurring free in any lines of the pseudo-proof instead of the new constant. Continue the proof as normal, then push the new premise through with the deduction theorem. Generalize, and apply New-DR. Then, along with the existential statement to which you applied EI, and MP, you get the result. Let us state this result more formally.

Definition: Pseudo-derivability or ⊢*: Γ ⊢* B iff there is an ordered series of wffs A1, . . . , An where An is B, and for each step Ai where 1 ≤ i ≤ n, either:
(a) Ai is an axiom;
(b) Ai is a member of Γ;
(c) there is some previous step in the series, Aj, such that Aj takes the form (∃x) C[x], and Ai takes the form C[c], where c is a constant that does not occur in any previous step of the pseudo-proof, nor in B, nor in any premise in Γ (i.e., Ai was derived by the pseudo-rule, EI);
(d) Ai follows from previous steps by MP;
(e) Ai follows from a previous step by Gen, but not using a variable that is free in some previous step of the series C[c] arrived at by EI.

Result: If Γ ⊢* B then Γ ⊢ B. (The non-necessity of EI.)

Proof:
(1) Assume Γ ⊢* B, and let A1, . . . , An be the steps of the pseudo-proof.
(2) Let (∃x1) C1[x1], . . . , (∃xm) Cm[xm] be the members of the pseudo-proof to which EI is applied (in order), and let C1[c1], . . . , Cm[cm] be the results of these EI steps (in order).
(3) It is obvious that if we expand Γ by adding {C1[c1], . . . , Cm[cm]}, we can prove B without EI, or, in other words, Γ ∪ {C1[c1], . . . , Cm[cm]} ⊢ B, since we are simply adding the results of our EI steps to our premise set.
(4) Because no application of Gen is made to a free variable of Cm[cm] after it is introduced, by the deduction theorem we have:
Γ ∪ {C1[c1], . . . , Cm−1[cm−1]} ⊢ Cm[cm] → B
(5) Pick some variable y that does not occur free anywhere in the series A1, . . . , An (preferably xm). Replace cm with y everywhere in the proof for Γ ∪ {C1[c1], . . . , Cm−1[cm−1]} ⊢ Cm[cm] → B. The result will also be a proof. Notice that cm does not occur anywhere in the set Γ ∪ {C1[c1], . . . , Cm−1[cm−1]}, since it was new when we introduced it. So there is no reason it must be used rather than the variable.
(6) Hence, Γ ∪ {C1[c1], . . . , Cm−1[cm−1]} ⊢ Cm[y] → B.
(7) By Gen, Γ ∪ {C1[c1], . . . , Cm−1[cm−1]} ⊢ (y)(Cm[y] → B).
(8) Because y does not occur free in the proof, and B is An, B does not contain y free. Hence, by (New-DR), Γ ∪ {C1[c1], . . . , Cm−1[cm−1]} ⊢ (∃y) Cm[y] → B.
(9) Because Cm[cm] was arrived at in the original pseudo-derivation by EI on some wff of the form (∃xm) Cm[xm], which either is (∃y) Cm[y], or can be used to get it, it must be that Γ ∪ {C1[c1], . . . , Cm−1[cm−1]} ⊢ (∃y) Cm[y]. Thus, by MP, Γ ∪ {C1[c1], . . . , Cm−1[cm−1]} ⊢ B.
(10) By the same procedure described in steps (5)–(9), we can eliminate Cm−1[cm−1] from the premise set, and so on, until we have eliminated everything except the members of Γ. Hence, Γ ⊢ B. ∎

This proof shows us that we don't need Existential Instantiation, very much like we don't need Conditional Proof. However, because we have this metatheoretic result, we now know that whenever we are able to carry out a pseudo-proof making use of the rule, we could transform it into a proof that does not make use of the rule. Hence, it is innocuous to pretend as if we do have such a rule. From here on out we allow ourselves to make use of "Rule C" or EI in our proofs, even though strictly speaking there is no such rule. We must be careful, however, to obey the restrictions for what counts as a pseudo-derivation, as defined in the previous page. I like to mark the pseudo-steps with ⊢* rather than ⊢; you can stop using the * when the dummy constant no longer appears.

G. Metatheoretic Results for System PF

Result (Soundness): For all wffs A, if ⊢ A then ⊨ A.

Proof:
Every instance of the axiom schemata is logically valid. (This can be verified using semantic trees.) MP and Gen preserve logical validity. (In the case of Gen, note that a wff is logically valid iff it is satisfied by all sequences in all models. If an open wff is satisfied by all sequences in a model, then the corresponding wff with one of the variables bound with an initial quantifier will also be satisfied by all sequences in that model.) If ⊢ A, then A is derivable from the axioms by some finite number of steps of MP and Gen, each preserving validity, and hence ⊨ A. ∎

Result (Consistency): There is no wff A such that both ⊢ A and ⊢ ¬A.

Proof:
Suppose for reductio ad absurdum that there is a wff A such that ⊢ A and ⊢ ¬A. By soundness, ⊨ A and ⊨ ¬A. In other words, every sequence in every model satisfies both A and ¬A. But a sequence satisfies ¬A iff it does not satisfy A, so any arbitrary sequence will both satisfy and not satisfy A, which is absurd. ∎

Ultimately, we also want to prove the completeness of PF. We'll get there, but we first need to prove a number of lemmas.

Result (Denumerability of wffs): The set of wffs of the language of predicate logic is denumerable, i.e., we can generate a one–one correspondence between the set of natural numbers and the set of wffs.

Proof:
All wffs are built up of the simple signs: (, ), the comma, ¬, →, and ∀, as well as the individual constants, variables, predicate letters and function letters.
A. We define a function g that associates each simple sign with a different natural number.
(1) Let g(() = 3, g()) = 5, g(,) = 7, g(¬) = 9, g(→) = 11, and g(∀) = 13.
(2) If c is a constant, and n is the number of its subscript (if c has no subscript, then n = 0), then depending on which letter of the alphabet is used, let k be either 1, 2, 3, 4 or 5 (1 for a, 2 for b, etc.), and let g(c) = 7 + 8(5n + k).
(3) If x is a variable, and n is the number of its subscript (if x has no subscript, then n = 0), then depending on which letter of the alphabet is used, let k be either 1, 2, or 3 (1 for x, 2 for y and 3 for z), and let g(x) = 13 + 8(3n + k).
(4) If F is a function letter, and n is the number of its subscript (if F has no subscript, then n = 0) and m is the number of its superscript, then depending on which letter of the alphabet is used (f through l), let k be one of 1 through 7, and let g(F) = 1 + 8(2^m · 3^(7n+k)).
(5) If P is a predicate letter, and n is the number of its subscript (if P has no subscript, then n = 0) and m is the number of its superscript, then depending on which letter of the alphabet is used (A through T), let k be one of 1 through 20, and let g(P) = 3 + 8(2^m · 3^(20n+k)).
We can now define the value of g for formulas in virtue of its value for simple signs.
(6) Let (p0, p1, p2, p3, . . . ) be the sequence of prime integers in order starting with 2. (There is no greatest prime.) Hence p0 = 2, p1 = 3, p2 = 5, and so on.
(7) Let u0 u1 u2 . . . ur be some string of signs from the syntax of predicate logic. It might be something ill-formed like ")¬xa12(", or it might be a well-formed formula like (∀x1)(F(x1) → F(x1)). Here, u0 is the first sign in the string, u1 is the second sign, and so on. For all such strings, let g(u0 u1 u2 . . . ur) = p0^g(u0) · p1^g(u1) · p2^g(u2) · . . . · pr^g(ur).
(8) For a given expression A, the number g(A) is called the Gödel number of A. Notice that because the Gödel numbers of the different simple signs are all different, so are the Gödel numbers of strings of signs, since for different strings, these numbers will have different prime factorizations.
(9) Let N − {0} be the set of natural numbers greater than zero, C the subset of natural numbers that are Gödel numbers of wffs, and W the set of all wffs. Consider now the function w(x) from N − {0} onto C, whose value for x as argument is the xth smallest natural number that is the Gödel number of a wff of predicate logic. Consider also the function s(x) from C onto W, whose value for any Gödel number of a wff is that wff. Then the function s(w(n + 1)) is a 1–1 correspondence between the set of natural numbers and the set of wffs. ∎

Corollary: The set of closed wffs is also denumerable.

Proof:
As above, with the set of closed wffs (and their Gödel numbers) substituted for W (and C). ∎

We're inching closer to completeness. Before moving on, I want to make note of some differences between my proof of completeness and Mendelson's. Mendelson prefers to speak of different first-order theories. Remember that a first-order theory is an axiomatic system gotten by adding additional axioms to the axioms of PF. Really, talking about what theorems are provable in a given system K, where the additional axioms of K are the members of a set Γ, is equivalent to speaking about what is provable in the barebones system PF beginning with Γ as a set of premises, since clearly:

⊢K A iff Γ ⊢PF A

It's really a matter of taste whether we view the proof as being about different theories or as being about different premise sets. I prefer to speak about premise sets, since then we don't have to deal with any sense of "⊢" other than "⊢PF". But the differences are trivial.

Before moving on, let us introduce some new metalinguistic definitions.

Definition: A set Γ of wffs is said to be consistent iff there is no wff B such that both Γ ⊢ B and Γ ⊢ ¬B. (Otherwise, Γ is inconsistent.)

Definition: A set Γ of wffs is said to be maximal iff for every closed wff B, either B ∈ Γ or ¬B ∈ Γ.

Definition: A set Γ of wffs is said to be universal iff for every wff B[x] that contains at most x free, if it is the case for all closed terms t that B[t] ∈ Γ, then (∀x) B[x] ∈ Γ.

We now move on to our next important Lemma on the way to completeness.

Result (LEL): If Γ is a consistent set of closed wffs, then there is a set of closed wffs Δ such that: (a) Γ ⊆ Δ, (b) Δ is consistent, (c) Δ is maximal, and (d) Δ is universal. (The Lindenbaum Extension Lemma.)
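The prime-exponent coding in clauses (6)–(8) can be sketched directly. The sketch below is a simplified toy, not the notes' official definition: it encodes only the six logical signs and the unsubscripted constants a–e, and the names `g_sign` and `g_string` are ours. Uniqueness of Gödel numbers nevertheless comes out exactly as in clause (8), via unique prime factorization.

```python
def primes(n):
    """First n primes, so p0 = 2, p1 = 3, p2 = 5, ... as in clause (6)."""
    ps = []
    cand = 2
    while len(ps) < n:
        if all(cand % p for p in ps):
            ps.append(cand)
        cand += 1
    return ps

# Clause (1): fixed numbers for the six simple signs.
SIMPLE = {'(': 3, ')': 5, ',': 7, '¬': 9, '→': 11, '∀': 13}

def g_sign(s):
    """g for a single sign; unsubscripted constants a-e get 7 + 8(5*0 + k),
    a toy fragment of clause (2)."""
    if s in SIMPLE:
        return SIMPLE[s]
    k = 'abcde'.index(s) + 1
    return 7 + 8 * (5 * 0 + k)

def g_string(signs):
    """Clause (7): g(u0 u1 ... ur) = p0^g(u0) * p1^g(u1) * ... * pr^g(ur)."""
    n = 1
    for p, s in zip(primes(len(signs)), signs):
        n *= p ** g_sign(s)
    return n
```

Different strings always get different numbers, since their Gödel numbers have different prime factorizations.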
Proof:
(1) Assume that Γ is a consistent set of closed wffs.
(2) For convenience, we assume that none of the constants e, e1, e2, e3, . . . , etc., occur anywhere in the wffs in Γ. (If this assumption is not warranted, we could use another denumerable sequence of constants, e.g., the b's or the c's, or even add a new sequence of constants o, o1, o2, o3, . . . , to the language if need be.)
(3) By the denumerability of the set of closed wffs of the language, we can arrange them in an infinite sequence: A1, A2, A3, . . . , etc. Making use of this sequence, let us recursively define an infinite sequence of sets of wffs: Γ0, Γ1, Γ2, . . . , etc., as follows:
a) Let Γ0 = Γ.
b) We define Γn+1 in terms of Γn in one of the following three ways:
(i) if Γn ∪ {An+1} is consistent, then let Γn+1 = Γn ∪ {An+1};
(ii) if Γn ∪ {An+1} is inconsistent and An+1 does not take the form (∀x) B[x], then let Γn+1 = Γn ∪ {¬An+1};
(iii) if Γn ∪ {An+1} is inconsistent and An+1 does take the form (∀x) B[x], then let Γn+1 = Γn ∪ {¬An+1} ∪ {¬B[ex]}, where ex is the first member of the sequence e, e1, e2, e3, . . . , that does not occur in Γn.
(4) Let Δ be the union of all of the members of the Γ-sequence (i.e., Γ0 ∪ Γ1 ∪ Γ2 ∪ . . . etc.)
(5) Obviously, Γ ⊆ Δ. This establishes part (a) of the consequent of the Lemma.
(6) Every member of the Γ-sequence is consistent. We prove this by mathematical induction.
Base step: Γ0 is Γ, and it is consistent ex hypothesi.
Induction step: Suppose Γn is consistent. It follows that Γn+1 is consistent by a proof by cases:
Case (i): Γn+1 = Γn ∪ {An+1} and Γn ∪ {An+1} is consistent, so Γn+1 is consistent.
Case (ii): Γn+1 = Γn ∪ {¬An+1}, and Γn ∪ {An+1} is inconsistent.
- Hence there is some B such that both Γn ∪ {An+1} ⊢ B and Γn ∪ {An+1} ⊢ ¬B.
- By SL, Γn ∪ {An+1} ⊢ ¬(B → B).
- Because An+1 is closed, it follows by DT that Γn ⊢ An+1 → ¬(B → B).
- But ⊢ (B → B) by SL.
- By MT, Γn ⊢ ¬An+1.
- Suppose for reductio that Γn+1 is inconsistent.
- So there is some C such that Γn ∪ {¬An+1} ⊢ C and Γn ∪ {¬An+1} ⊢ ¬C.
- By reasoning parallel to the above, by SL and the deduction theorem, we also have Γn ⊢ ¬¬An+1.
- So Γn is inconsistent.
- This contradicts the inductive hypothesis. Hence Γn+1 is consistent.
Case (iii): Γn+1 = Γn ∪ {¬An+1} ∪ {¬B[ex]}, Γn ∪ {An+1} is inconsistent and An+1 takes the form (∀x) B[x].
- By the same reasoning as in the previous case, Γn ⊢ ¬An+1.
- Suppose for reductio that Γn+1 is inconsistent.
- So there is some C such that Γn ∪ {¬An+1} ∪ {¬B[ex]} ⊢ C and Γn ∪ {¬An+1} ∪ {¬B[ex]} ⊢ ¬C.
- By SL, Γn ∪ {¬An+1} ∪ {¬B[ex]} ⊢ ¬(C → C).
- By DT, Γn ∪ {¬B[ex]} ⊢ ¬An+1 → ¬(C → C).
- By MP, Γn ∪ {¬B[ex]} ⊢ ¬(C → C).
- Because An+1 is closed and it takes the form (∀x) B[x], the wff ¬B[ex] is also closed.
- By DT, Γn ⊢ ¬B[ex] → ¬(C → C).
- ⊢ (C → C), and so by SL, Γn ⊢ B[ex].
- ex is not included in Γn. Hence, we can replace ex with the variable x throughout the proof for Γn ⊢ B[ex] and the result will also be a proof. Hence Γn ⊢ B[x].
- By Gen, Γn ⊢ (∀x) B[x], which is the same as Γn ⊢ An+1.
- So Γn is inconsistent, which contradicts the inductive hypothesis.
- Hence Γn+1 is consistent.
(7) It follows from (6) that Δ is consistent.
a) Note that the Γ-sequence is constantly expanding: For all j and k such that j < k, Γj ⊆ Γk. Crudely, Δ can be thought of as the upper limit of the expansion.
b) So every finite subset of Δ is a subset of some Γi for some suitably large i.
c) However, every proof from Δ has only a finite number of steps, and hence only makes use of a finite subset of Δ.
d) If there were some B such that both Δ ⊢ B and Δ ⊢ ¬B, for some suitably large i, it would have to be that both Γi ⊢ B and Γi ⊢ ¬B.
e) This is impossible because all the members of the Γ-sequence are consistent by (6).
f) Hence, Δ is consistent.
g) This establishes part (b) of the consequent of the Lemma.
(8) Δ is obviously maximal as well.
a) All closed wffs are members of the sequence A1, A2, . . . , etc.
b) For each Ai, either it or its negation is a member of Γi, and Γi ⊆ Δ.
c) So for all closed wffs A1, A2, . . . , etc., either it or its negation is included in Δ.
d) This establishes part (c) of the consequent of the Lemma.
(9) Finally, Δ is also universal.
a) We show this by reductio. Suppose otherwise, i.e., suppose that there is a wff B[x] that contains at most x free, such that for all closed terms t, B[t] ∈ Δ, but (∀x) B[x] ∉ Δ.
b) (∀x) B[x] is closed, so because Δ is maximal, it must be that ¬(∀x) B[x] ∈ Δ.
c) Because (∀x) B[x] is closed, it also follows that (∀x) B[x] is a member of the A-sequence, i.e., (∀x) B[x] is An+1 for some number n.
d) Obviously, however, since (∀x) B[x] ∉ Δ, it follows that Γn+1 is not obtained from Γn using case (i).
e) Nor was it obtained using case (ii), since An+1 is of the form (∀x) B[x].
f) This leaves case (iii), so Γn+1 is Γn ∪ {¬An+1} ∪ {¬B[ex]}.
g) Hence for some x, ¬B[ex] ∈ Γn+1, and so ¬B[ex] ∈ Δ.
h) But by our assumption, it holds for all closed terms t that B[t] ∈ Δ.
i) All constants, ex included, are closed terms, so B[ex] ∈ Δ.
j) Hence, both Δ ⊢ B[ex] and Δ ⊢ ¬B[ex].
k) However, this is impossible, because we have already shown Δ to be consistent.
l) Our supposition has been shown to be impossible, hence Δ is universal. This establishes part (d) of the consequent of the Lemma.
(10) By suitably defining Δ, we have shown each of parts (a)–(d) of the consequent of the Lemma on the basis of the assumption of its antecedent. Hence, the Lemma is established. ∎

We've just shown that beginning with any consistent set of sentences, we can keep adding to it ad infinitum to get a maximally consistent set of sentences of the language.
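The Γ-sequence construction in step (3) can be sketched as a toy procedure. Everything here is our own illustrative machinery, not the notes' official apparatus: formulas are plain Python values, `neg` plays the role of ¬, and the (in general undecidable) test "Γ ⊢ B and Γ ⊢ ¬B for some B" is faked by checking for an explicit pair φ, ¬φ. The quantifier case (iii) is omitted, so only cases (i) and (ii) are illustrated.

```python
def neg(phi):
    """¬φ, with double negations cancelled for the toy."""
    return phi[1] if phi[0] == '¬' else ('¬', phi)

def consistent(gamma):
    # Stand-in for real proof-theoretic consistency: no explicit φ, ¬φ pair.
    return all(neg(phi) not in gamma for phi in gamma)

def lindenbaum(gamma0, enumeration):
    """For each A_{n+1} in the enumeration: add it if the result stays
    consistent (case (i)), otherwise add its negation (case (ii))."""
    gamma = set(gamma0)
    for a in enumeration:
        gamma.add(a if consistent(gamma | {a}) else neg(a))
    return gamma
```

After the run, every enumerated sentence or its negation is in the resulting set, and the set remains consistent, mirroring parts (b) and (c) of the Lemma on the toy fragment.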
We pause again for a new definition:

Definition: A model or interpretation M is a denumerable model iff its domain of quantification D is denumerable (as defined on p. 5).

Result (MCL): If Δ is a consistent, maximal, and universal set of closed wffs, then there is at least one denumerable model for Δ. (The Maximal Consistency Lemma.)

Proof:
(1) Assume that Δ is a consistent, maximal, and universal set of closed wffs. We can then describe a denumerable model M for Δ using the following procedure.
(2) Essentially, we'll let all the closed terms of the language stand for themselves. (Another possible way of constructing a model would be to let each closed term stand for its Gödel number. However, let us proceed using the former method.)
(3) Let the domain of quantification D of M be the set of closed terms of the language of first-order predicate logic. Note that there are denumerably many closed terms, so M is a denumerable model.
(4) For each constant c, let (c)M be c itself. So, for example, (a)M is a, (b12)M is b12, etc.
(5) For each function letter F with superscript n, let (F)M be that n-place operation on D which includes all ordered pairs of the form ⟨⟨t1, . . . , tn⟩, F(t1, . . . , tn)⟩, i.e., the operation that has the closed term F(t1, . . . , tn) as value for ⟨t1, . . . , tn⟩ as argument.
Example: The operation (f¹)M, which M assigns to the monadic function letter f¹, will contain such ordered pairs as ⟨a, f¹(a)⟩, ⟨b12, f¹(b12)⟩, and ⟨f¹(a), f¹(f¹(a))⟩, and so on.
(6) For each predicate letter P with superscript n, let (P)M be that subset of Dⁿ that includes the n-tuple ⟨t1, . . . , tn⟩ iff the atomic wff P(t1, . . . , tn) is included in Δ.
Example: The extension of F¹ under M, viz., (F¹)M, will include the term a just in case F¹(a) ∈ Δ, and will exclude a just in case ¬F¹(a) ∈ Δ, and so on.
(7) We must now prove that this interpretation M is a model for Δ, i.e., that for all wffs A, if A ∈ Δ, then ⊨M A. We will actually prove something stronger, i.e., that for all closed wffs A, A ∈ Δ iff ⊨M A. (Δ only contains closed wffs, so we need not worry about open wffs.) We prove this by wff induction.
Base step: A is a closed atomic formula. Hence, A takes the form P(t1, . . . , tn), where P is a predicate letter with superscript n and t1, . . . , tn are closed terms.
- Because closed terms contain no variables, all sequences in M will associate each ti with itself.
- So by the definition of satisfaction, all sequences in M will satisfy A iff ⟨t1, . . . , tn⟩ ∈ (P)M.
- By the definition of truth in an interpretation, ⊨M A iff ⟨t1, . . . , tn⟩ ∈ (P)M.
- By our characterization of M under (6) above, ⟨t1, . . . , tn⟩ ∈ (P)M iff P(t1, . . . , tn) ∈ Δ.
- So P(t1, . . . , tn) ∈ Δ iff ⊨M A, i.e., A ∈ Δ iff ⊨M A.
Induction step: Assume as inductive hypothesis that it holds for all closed wffs B with fewer connectives than A, that B ∈ Δ iff ⊨M B. We will then show that it holds for the complex closed wff A that A ∈ Δ iff ⊨M A. This proceeds by a proof by cases on the make-up of A.
Case (a): A takes the form ¬B, where B is also closed and has one fewer connective than A.
- By the inductive hypothesis, B ∈ Δ iff ⊨M B.
- Because Δ is consistent, if A ∈ Δ, then B ∉ Δ.
- Because Δ is maximal, if B ∉ Δ, then A ∈ Δ.
- So B ∉ Δ iff A ∈ Δ.
- Hence A ∈ Δ iff not-⊨M B.
- Since B is closed, ⊨M ¬B iff not-⊨M B.
- Hence, A ∈ Δ iff ⊨M ¬B, i.e., A ∈ Δ iff ⊨M A.
Case (b): A takes the form B → C, where B and C are closed wffs with fewer connectives.
First we prove that if A ∈ Δ then ⊨M A.
- Suppose A ∈ Δ.
- Since Δ is maximal and consistent, B ∈ Δ or ¬B ∈ Δ, but not both, and likewise with C.
- However, because B → C ∈ Δ and Δ is consistent, it cannot be that both B ∈ Δ and ¬C ∈ Δ, so either ¬B ∈ Δ or C ∈ Δ.
- By the inductive hypothesis, B ∈ Δ iff ⊨M B, and C ∈ Δ iff ⊨M C.
- By the same reasoning given for the previous case, ¬B ∈ Δ iff ⊨M ¬B.
- So either ⊨M ¬B or ⊨M C.
- By the definition of satisfaction for conditionals, it follows that ⊨M B → C, i.e., ⊨M A.
Now we prove that if ⊨M A then A ∈ Δ.
- Suppose ⊨M A, i.e., ⊨M B → C.
- Because B and C are closed, by the definition of satisfaction for conditionals, we have either ⊨M ¬B or ⊨M C.
- By the inductive hypothesis, B ∈ Δ iff ⊨M B and C ∈ Δ iff ⊨M C.
- Again, by the reasoning given for the previous case, ¬B ∈ Δ iff ⊨M ¬B.
- So either ¬B ∈ Δ or C ∈ Δ.
- Because Δ is maximal, either B → C ∈ Δ or ¬(B → C) ∈ Δ.
- If ¬(B → C) ∈ Δ, then Δ would be inconsistent, because ⊢ ¬(B → C) → B and ⊢ ¬(B → C) → ¬C.
- So B → C ∈ Δ, i.e., A ∈ Δ.
Putting these two results together, we get that A ∈ Δ iff ⊨M A.
Case (c): A takes the form (∀x) B[x], where B[x] contains fewer connectives, and B[x] contains at most x free.
First we prove that if A ∈ Δ then ⊨M A.
- Suppose A ∈ Δ, i.e., (∀x) B[x] ∈ Δ.
- Because B[x] contains at most x free, for all closed terms t, B[t] is a closed wff.
- Because Δ is maximal, for all closed terms t, either B[t] ∈ Δ or ¬B[t] ∈ Δ.
- However, since Δ is consistent (and Δ ⊢ B[t] by universal instantiation from (∀x) B[x]), it must be that for all closed terms t, B[t] ∈ Δ.
- By the inductive hypothesis, for all closed terms t, ⊨M B[t].
- Because the domain of quantification for M is D, and D consists of the set of closed terms, and every closed term is interpreted as standing for itself, a sequence of M will satisfy B[x] iff it satisfies B[t] for that closed term t that gets assigned to x in that sequence.
- Because all sequences of M satisfy B[t] for all closed terms t, all sequences of M will satisfy B[x], and hence all sequences of M will satisfy (∀x) B[x].
- Hence, ⊨M (∀x) B[x], i.e., ⊨M A.
We now prove that if ⊨M A then A ∈ Δ.
- Suppose ⊨M A, i.e., all sequences of M satisfy (∀x) B[x].
- Hence, all sequences of M satisfy B[x], regardless of what entity in the domain gets assigned to x.
- Because the domain of quantification for M is D, and D consists of the set of closed terms, and every closed term is interpreted as standing for itself, a sequence of M will satisfy B[x] iff it satisfies B[t] for that closed term t that gets assigned to x in that sequence.
- So, for all closed terms t, ⊨M B[t].
- By the inductive hypothesis, it follows that, for all closed terms t, B[t] ∈ Δ.
- Because Δ is universal, it follows that (∀x) B[x] ∈ Δ, i.e., A ∈ Δ.
Putting these together, we get that A ∈ Δ iff ⊨M A.
(8) By induction, regardless of A's length, A ∈ Δ iff ⊨M A. So M is a model for Δ. This establishes the Lemma. ∎

If we follow Mendelson and think of a model as a sort of possible world, a maximally consistent set of sentences can be thought of as a maximally descriptive yet consistent description of a possible world. This lemma says roughly that for every maximally descriptive consistent description of a possible world, one exists for which that description is true.
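The term model of the MCL proof can be miniaturized. In the sketch below (all names and encodings are our own, and the domain is a fixed finite set of closed terms rather than the denumerable set of all of them), closed terms denote themselves and an atomic sentence is true exactly when it belongs to Δ; the ∀ clause then becomes a finite check over the domain, mirroring the quantifier case of step (7).

```python
# Formulas as tuples: ('P', name, terms), ('¬', φ), ('→', φ, ψ), ('∀', var, φ).

def subst(phi, var, term):
    """B[t]: replace the free variable var by the closed term throughout."""
    op = phi[0]
    if op == 'P':
        return ('P', phi[1], tuple(term if t == var else t for t in phi[2]))
    if op == '¬':
        return ('¬', subst(phi[1], var, term))
    if op == '→':
        return ('→', subst(phi[1], var, term), subst(phi[2], var, term))
    return phi if phi[1] == var else ('∀', phi[1], subst(phi[2], var, term))

def true_in_M(phi, delta, domain):
    """Truth in the term model: atoms by membership in Δ, ∀ by all instances."""
    op = phi[0]
    if op == 'P':
        return phi in delta
    if op == '¬':
        return not true_in_M(phi[1], delta, domain)
    if op == '→':
        return (not true_in_M(phi[1], delta, domain)
                or true_in_M(phi[2], delta, domain))
    return all(true_in_M(subst(phi[2], phi[1], t), delta, domain)
               for t in domain)
```

With Δ = {F(a), F(b)} and domain {a, b}, the sentence (∀x)F(x) comes out true precisely because every closed instance is in Δ, which is what universality of Δ guarantees in the real proof.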
Result (The Modeling Lemma): A set of closed wffs Γ is consistent iff it has a denumerable model (i.e., there is at least one denumerable model for Γ).

Proof:
This biconditional breaks down into:
(MLa) If a set of closed wffs Γ has a denumerable model, then Γ is consistent.
(MLb) If a set of closed wffs Γ is consistent, then Γ has a denumerable model.
Instead of proving (MLa) directly, we shall prove the following stronger thesis:
(MLa)* If a set of closed wffs Γ has any model, then Γ is consistent.

Proof of (MLa)* and (MLa):
(1) Assume the opposite for reductio ad absurdum. I.e., assume that Γ is a set of closed wffs, and there is at least one model M for Γ, but that Γ is inconsistent.
(2) Hence, there is some A such that Γ ⊢ A and Γ ⊢ ¬A.
(3) This means that A and ¬A are each derivable from the members of Γ along with the axioms of PF by zero or more applications of MP and Gen.
(4) All the axioms of PF are logically valid, and hence true in M.
(5) Similarly, all the members of Γ are true in M by hypothesis.
(6) However, both MP and Gen preserve truth in an interpretation, so it must be that both ⊨M A and ⊨M ¬A.
(7) By the definition of truth in an interpretation, every sequence in M satisfies both A and ¬A.
(8) However, a sequence satisfies ¬A iff it does not satisfy A, so any arbitrary sequence of M will both satisfy and not satisfy A, which is absurd.
(9) Hence (MLa)* must be true. Regardless of the size of the domain, any set of closed wffs that can be modeled is consistent. This includes those with denumerable models, so (MLa)* entails (MLa).

Proof of (MLb):
(1) Assume that Γ is a consistent set of closed wffs.
(2) By LEL, there is a set of closed wffs Δ such that: (a) Γ ⊆ Δ, (b) Δ is consistent, (c) Δ is maximal, and (d) Δ is universal.
(3) By MCL, there is an interpretation M that is a denumerable model for Δ.
(4) So for all closed wffs A, if A ∈ Δ, then ⊨M A.
(5) Because Γ ⊆ Δ, for all closed wffs A, if A ∈ Γ then A ∈ Δ.
(6) So for all closed wffs A, if A ∈ Γ then ⊨M A.
(7) Therefore, M is also a denumerable model for Γ. ∎

The following is not needed for completeness, but is an interesting and surprising result of (MLa)* and (MLb).

Corollary (The Skolem–Löwenheim Theorem): If a set Γ of closed wffs of first-order predicate logic has any sort of model, then it has a denumerable model.

Proof:
By the stronger (MLa)*, if Γ has any sort of model, then it is consistent. By (MLb), if it is consistent, it has a denumerable model. ∎

Finally we turn to completeness:

Result (Completeness): For all wffs A, if ⊨ A then ⊢ A.

Proof:
(1) Suppose ⊨ A, but suppose for reductio ad absurdum that it is not the case that ⊢ A.
(2) Let B be the universal closure of A, i.e., if the free variables of A are x1, . . . , xn, then B is (∀x1) . . . (∀xn) A.
(3) Universal closure preserves truth in an interpretation, so ⊨ B.
(4) B has no free variables left, so B is closed.
(5) The singleton set containing ¬B alone, {¬B}, must be consistent. Here's a proof of this by reductio:
a) Suppose there were some C such that {¬B} ⊢ C and {¬B} ⊢ ¬C.
b) By SL, {¬B} ⊢ ¬(C → C).
c) Since B is closed, so is ¬B, and so by DT, we have ⊢ ¬B → ¬(C → C).
d) But ⊢ (C → C), so by SL, ⊢ B.
e) But A is derivable from B by universal instantiation, so it would follow that ⊢ A, which contradicts our earlier assumption.
f) Hence, {¬B} is consistent.
(6) By the Modeling Lemma, {¬B} has a denumerable model. Hence there is an interpretation M such that ⊨M ¬B.
(7) But we also have ⊨ B, and hence ⊨M B.
(8) By the definition of truth in an interpretation, every sequence in M satisfies both B and ¬B.
(9) However, a sequence satisfies ¬B iff it does not satisfy B, so any arbitrary sequence of M will both satisfy and not satisfy B, which is absurd.
(10) We've shown our supposition to be impossible, thereby establishing completeness indirectly. ∎

Corollary: If Γ ⊨ A then Γ ⊢ A.

Proof:
Follows from minor modifications on the above proof. ∎

Unfortunately, this proof does not, as in the Propositional Calculus (System L), provide a recipe for constructing a proof of any given logical truth in PF. We have simply proven that any given logical truth must be derivable, because if it were not, there would exist a countermodel to its logical validity.

The completeness of the first-order predicate calculus was first proven by Kurt Gödel in 1930, and so this is sometimes called Gödel's Completeness Theorem, although his way of proving it was actually very different from ours. (It was first proven our way by Leon Henkin in 1949.) However, Gödel is much more famous for his incompleteness theorems than his completeness theorem.
H. Identity Logic

To add identity to first-order predicate logic, we simply pick a 2-place predicate letter (say, I²) to use to stand for the identity relation, and make the appropriate additions to our logical system.

Syntax

Officially, the syntax is unchanged. We already had I² as a predicate letter. We are simply fixing its intended meaning. However, it is useful to introduce abbreviations such as the following.

Abbreviations:
t = u abbreviates I²(t, u)
t ≠ u abbreviates ¬I²(t, u)
(∃1 x) A[x] abbreviates (∃x) A[x] ∧ [(∀x)(∀y)(A[x] ∧ A[y] → x = y)], where y is the first variable not occurring in A[x].
(∃n+1 x) A[x] abbreviates (∃y)(A[y] ∧ (∃n x)(x ≠ y ∧ A[x])), where y is the first variable not occurring in A[x].

The above definition defines (∃2 x) A[x] in terms of (∃1 x) A[x] and (∃3 x) A[x] in terms of (∃2 x) A[x], and so on. (We could do even better by beginning with (∃0 x) A[x] for ¬(∃x) A[x].)

Because the syntax is unchanged, the set of wffs and the set of closed wffs remain denumerable.

System of Deduction

Definition: The first-order predicate calculus with identity, or System PF=, is the system obtained from PF by adding the following axiom and axiom schema:
(A6) (∀x) x = x
(A7) x = y → (A[x, x] → A[x, y]), for all instances in which y is free for x in A[x, x], and A[x, y] is obtained from A[x, x] by replacing some, but not necessarily all, free occurrences of x with y.

Definition: A first-order theory with identity [equality] is any first-order theory that has all theorems of PF= formulable in its syntax as theorems (i.e., it is a theory built on PF in which (A6) is either an axiom or theorem, and all instances of (A7) are either axioms or theorems). This includes PF= itself.

The deduction theorem and replacement for existential instantiation are unchanged by the addition.

Some easy theorems and derived rules:

Result (Ref=): ⊢PF= t = t, for any term t. (Reflexivity of identity.)

Proof:
Direct from (A6) by universal instantiation. ∎

Result (LL/Sub=): t = u, A[t, t] ⊢PF= A[t, u], for all terms t and u that are free for all variables in A[x, y], and where A[t, u] arises from A[t, t] by replacing some or all occurrences of t with u. (Leibniz's law)

Proof:
Derived from (A7) by Gen on both x and y, then universal instantiation to t and u, and MP twice. It may be necessary to do some bound variable juggling, but this is no problem. ∎

Result (Sym=): t = u ⊢PF= u = t, for any terms t and u. (Symmetry of identity.)

Proof:
t = u, t = t ⊢PF= u = t is an instance of LL, and we have ⊢PF= t = t by reflexivity. ∎

Result (Trans=): t = u, u = v ⊢PF= t = v, for any terms t, u and v. (Transitivity of identity.)

Proof:
u = v, t = u ⊢PF= t = v is an instance of LL. ∎

Semantics for Identity Logic

We intend I² to stand for the identity relation. So:

Definition: An interpretation M is a normal model iff, for M, (I²)M is the set of all and only ordered pairs of the form ⟨o, o⟩ of objects o included in the domain of quantification D of M.

Definition: A wff A is identity-valid iff it is true for all normal models. I abbreviate this as: ⊨= A.

Note that all wffs that are logically valid simpliciter (⊨ A) are identity-valid (⊨= A), but not vice-versa. Note that (A6) and all allowed instances of (A7) are identity-valid. (Proving this is homework.)
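The recursive unfolding of the numerical quantifiers can be sketched by generating the abbreviated wff as a string. This is only a toy (the function name is ours, fresh variables are taken as x1, x2, ... rather than "the first variable not occurring in A[x]", and parentheses are left looser than the official syntax requires), but the recursion exactly follows the two abbreviation clauses above.

```python
def exists_n(n, x, a):
    """(∃n x) A[x] spelled out as a string; a maps a variable name to A[·]."""
    if n == 1:
        # (∃1 x)A[x] := (∃x)A[x] ∧ (∀x)(∀y)(A[x] ∧ A[y] → x = y)
        y = x + '1'
        return (f"(∃{x}){a(x)} ∧ "
                f"(∀{x})(∀{y})({a(x)} ∧ {a(y)} → {x} = {y})")
    # (∃n+1 x)A[x] := (∃y)(A[y] ∧ (∃n x)(x ≠ y ∧ A[x]))
    y = x + str(n)
    return (f"(∃{y})({a(y)} ∧ "
            + exists_n(n - 1, x, lambda v: f"{v} ≠ {y} ∧ {a(v)}") + ")")
```

Note how quickly the unabbreviated wffs grow: each step wraps the previous expansion in a fresh existential and inserts a new inequality conjunct, which is why the abbreviation earns its keep.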
Some Important Metatheoretic Results for PF= and other Theories with Identity

Result (Soundness): For all wffs A, if ⊢PF= A then ⊨= A.

(Proof is left as part of an exam question.)

Result (Consistency): There is no wff A such that ⊢PF= A and ⊢PF= ¬A.

(Proof is left as part of an exam question.)

Result: Any first-order theory K in which (A6) is an axiom or theorem, and all instances of (A7) in which A[x, x] is an atomic formula with no individual constants are either axioms or theorems, is a first-order theory with identity. (The possibility of reducing (A7).)

Proof:
(1) Assume that K is a first-order theory in which (A6) is an axiom or theorem, and all instances of (A7) in which A[x, x] is an atomic formula with no individual constants are either axioms or theorems. We shall prove that all instances of (A7) can be derived regardless of the complexity of A[x, x], by wff induction.
(2) Base step: A[x, x] is atomic. By hypothesis, (A7) is a theorem of K for all cases in which A[x, x] is an atomic formula with no individual constants. All others can be obtained by Gen and universal instantiation.
(3) Induction step: We assume that all instances of (A7) hold for instances of A[x, x] that are simpler than a given instance, and need to show that for the given instance of A[x, x], (A7) holds as well. This proceeds by a proof by cases of the possible make-up of the instance of A[x, x] in question.
Case (a): A[x, x] takes the form ¬B[x, x].
i) Let C[x] be B[z, x]. Clearly, C[x] is of the same complexity as B[x, x].
ii) By the inductive hypothesis, we have this instance of (A7): ⊢K x = y → (C[x] → C[y]).
iii) By manipulating variables with Gen and UI, we get: ⊢K y = x → (C[y] → C[x]).
iv) Because we have atomic instances, we have: ⊢K x = y → (x = x → y = x), and so with (A6) and SL we get: ⊢K x = y → y = x.
v) So by SL: ⊢K x = y → (C[y] → C[x]).
vi) That is: ⊢K x = y → (B[z, y] → B[z, x]).
vii) By Gen on z and UI to x we get: ⊢K x = y → (B[x, y] → B[x, x]).
viii) By SL: ⊢K x = y → (¬B[x, x] → ¬B[x, y]), i.e., ⊢K x = y → (A[x, x] → A[x, y]).
Case (b): A[x, x] is of the form (B[x, x] → C[x, x]).
i) By the inductive hypothesis, we have: ⊢K x = y → (C[x, x] → C[x, y]).
ii) By the same procedure described in the previous case: ⊢K x = y → (B[x, y] → B[x, x]).
iii) By MP: x = y ⊢K B[x, y] → B[x, x].
iv) Similarly: x = y ⊢K C[x, x] → C[x, y].
v) Clearly, if we add B[x, x] → C[x, x] as a further premise, we could complete a syllogism, i.e.: x = y, B[x, x] → C[x, x] ⊢K B[x, y] → C[x, y].
vi) By DT twice, we have: ⊢K x = y → ((B[x, x] → C[x, x]) → (B[x, y] → C[x, y])), which is: ⊢K x = y → (A[x, x] → A[x, y]).
Case (c): A[x, x] takes the form (∀z) B[x, x, z].
i) By the inductive hypothesis: ⊢K x = y → (B[x, x, z] → B[x, y, z]).
ii) By MP: x = y ⊢K B[x, x, z] → B[x, y, z].
iii) Hence, by UI and MP: x = y, (∀z) B[x, x, z] ⊢K B[x, y, z].
iv) By Gen: x = y, (∀z) B[x, x, z] ⊢K (∀z) B[x, y, z].
v) By DT twice, we have: ⊢K x = y → ((∀z) B[x, x, z] → (∀z) B[x, y, z]), which is: ⊢K x = y → (A[x, x] → A[x, y]).
(4) Hence, regardless of the complexity of A[x, x], we have the appropriate instance of (A7). Therefore, all instances of (A7) are theorems of K.
(5) K is a first-order theory (one built by expanding PF by adding proper axioms). Hence K has all instances of (A1)–(A5) as axioms. (A6) is either an axiom or a theorem of K, and all instances of (A7) are theorems. Hence all axioms of PF= are theorems of K. Moreover, K has all the inference rules of PF, and hence all the inference rules of PF=.
(6) All theorems of PF= are derived from (A1)–(A7) by the inference rules. Therefore, all theorems of PF= are theorems of K.
(7) Therefore, K is a first-order theory with identity. ∎

Result: If M is a model for the set of axioms of PF=, then there is a normal model M* such that for all wffs A, ⊨M A iff ⊨M* A. (Contracting Models to Normal Models.)
Proof:
(1) Assume that M is a model for the set of axioms
of PF= .
(2) It does not follow from this that M is a normal model, i.e., it does not follow that (I 2 )M
only consists of ordered pairs of the form ho, oi
of objects o included in the domain of quantification D of M. However, we do know the
following things about (I 2 )M :
a) Because M makes (A6) true, (I 2 )M must
be a reflexive relation in the set-theoretic
sense.
b) Because M makes the instance of (A7),
x = y (x = x y = x), true, and because it is reflexive (so all sequences satisfy
x = x), (I 2 )M must also be a symmetric
relation in the set-theoretic sense.
c) Because M makes the instance of (A7),
x = y (x = z y = z), true, and
because it is symmetric, (I 2 )M must also
be a transitive relation in the set-theoretic
sense.
d) So, (I 2 )M must be an equivalence relation.
e) Let us call this equivalence relation E. For
any object o in the domain D of M , [o]E is
the E-equivalence class on o; i.e., the set of
p such that ho, pi E.

(3) We can then construct a normal model M* in the following way.
a) Let the domain of quantification for M*, viz., D*, be the set of all E-equivalence classes formed from members of D. I.e., if D is {o₁, o₂, o₃, . . . } then let D* be {[o₁]E, [o₂]E, [o₃]E, . . . }.
b) For all constants c, let (c)^M* be [(c)^M]E.
c) For all function letters F with superscript n, let (F)^M* be the n-place operation on D* that includes the ordered pair ⟨⟨[o₁]E, . . . , [oₙ]E⟩, [o_q]E⟩ iff (F)^M includes ⟨⟨o₁, . . . , oₙ⟩, o_q⟩.
d) For all predicate letters P with superscript n, let (P)^M* be the subset of (D*)ⁿ that includes the ordered n-tuple ⟨[o₁]E, . . . , [oₙ]E⟩ iff (P)^M includes ⟨o₁, . . . , oₙ⟩.
(4) It follows that M* is normal. Because (I²)^M is E, (I²)^M* is the set of ordered pairs that contains ⟨[o]E, [p]E⟩ iff ⟨o, p⟩ ∈ E. Because E is an equivalence relation, ⟨o, p⟩ ∈ E iff [o]E = [p]E.
(5) We now prove that ⊨_M A iff ⊨_M* A for all wffs A. Note that this will be the case when A is satisfied by all sequences in one interpretation when it is satisfied by all sequences in the other. Each sequence s of M of the form o₁, o₂, o₃, . . . corresponds to a sequence s′ of M* of the form [o₁]E, [o₂]E, [o₃]E, . . . . For each such sequence pair, s and s′, it is apparent that for any term t, s′(t) is [s(t)]E. We now prove that for all wffs A, for all such sequence pairs s and s′, sequence s (of M) will satisfy A iff the corresponding sequence s′ (of M*) satisfies A, by wff induction.
Base step: A is atomic, i.e., it takes the form P(t₁, . . . , tₙ). Then s satisfies A iff ⟨s(t₁), . . . , s(tₙ)⟩ ∈ (P)^M, and s′ satisfies A iff ⟨[s(t₁)]E, . . . , [s(tₙ)]E⟩ ∈ (P)^M*. By our description of M* above, ⟨[s(t₁)]E, . . . , [s(tₙ)]E⟩ ∈ (P)^M* iff ⟨s(t₁), . . . , s(tₙ)⟩ ∈ (P)^M, so s satisfies A iff s′ satisfies A.
Induction step: Assume as inductive hypothesis that it holds for all wffs B simpler than A that, for all such sequence pairs s and s′, s satisfies B iff s′ satisfies B. We must show that it holds for A as well. Proof by cases.
Case (a): A takes the form ¬B. By the inductive hypothesis, s satisfies B iff s′ satisfies B. By the definition of satisfaction, s satisfies A iff s does not satisfy B, and the same holds for s′. So, s does not satisfy A iff s′ does not satisfy A, and hence s satisfies A iff s′ satisfies A.
Case (b): A takes the form B → C. By the inductive hypothesis, s satisfies B iff s′ satisfies B, and s satisfies C iff s′ satisfies C. s satisfies A iff it either does not satisfy B or it does satisfy C, and similarly for s′, so s satisfies A iff s′ satisfies A.
Case (c): A takes the form (∀x) B[x]. Now, s will satisfy A iff all sequences σ in M differing from s at most with regard to what entity gets assigned to x satisfy B[x], and s′ will satisfy A iff all sequences σ′ in M* differing from s′ at most with regard to what entity gets assigned to x satisfy B[x]. Each such sequence σ in M corresponds to such a sequence σ′ of M*, and vice-versa. By the inductive hypothesis, it will hold that σ satisfies B[x] iff σ′ satisfies B[x], so s satisfies A iff s′ satisfies A.
So regardless of the length of A, for such a sequence pair, s and s′, s satisfies A iff s′ satisfies A. Such sequence pairs will exhaust the sequences of M and M*, so it follows that for all wffs A, ⊨_M A iff ⊨_M* A.
(6) Obviously, it follows from this that M* is also a model for the set of axioms of PF=. This establishes the result. ∎

Result (Completeness): For all wffs A, if ⊨ A then ⊢PF= A.

This proof is left as an exam question, but it requires the above possibility of contracting models to normal models.
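The contraction of M to the normal model M* can be made concrete for a finite domain. The following Python sketch is illustrative (not from the notes): it forms the E-equivalence classes and checks that in the quotient the interpretation of I² collapses to genuine identity.

```python
# Sketch: contracting a finite model to a normal one by taking the
# quotient of the domain under the equivalence relation E that
# interprets the identity predicate I^2.

domain = {1, 2, 3, 4}
# E interprets I^2: an equivalence relation identifying 1 with 2.
E = {(1, 1), (2, 2), (3, 3), (4, 4), (1, 2), (2, 1)}

def eq_class(o):
    """The E-equivalence class [o]_E, as a frozenset."""
    return frozenset(p for p in domain if (o, p) in E)

# Quotient domain D*: the set of all E-equivalence classes.
quotient = {eq_class(o) for o in domain}

# In the contracted model, I^2 holds of ([o]_E, [p]_E) iff (o, p) in E;
# since E is an equivalence relation, that is exactly class identity.
id_star = {(eq_class(o), eq_class(p)) for (o, p) in E}
assert id_star == {(c, c) for c in quotient}   # identity is now genuine
assert len(quotient) == 3                      # {1,2}, {3}, {4}
```

The design point: normality is not assumed but produced; whatever equivalence relation interprets I² in M, the quotient construction forces it to become the diagonal relation on D*.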

UNIT 3
PEANO ARITHMETIC AND RECURSIVE FUNCTIONS

A. The System S

Definition: Stated in English, the Peano postulates (also called the Peano axioms or the Peano-Dedekind axioms) are the following five principles:
(P1) Zero is a natural number.
(P2) Every natural number has a successor which is also a natural number.
(P3) Zero is not the successor of any natural number.
(P4) No two natural numbers have the same successor.
(P5) If something is both true of zero, and true of the successor of a number whenever it is true of that number, then it is true of all natural numbers (i.e., the principle of mathematical induction).

Your book discusses the history of these principles in more detail, but in the late 19th and early 20th century it was widely believed that all the truths of number theory (pure arithmetic) could be derived as theorems from these principles. But so long as they are simply stated in English, and not introduced within a precisely formulated logical calculus, this is a difficult supposition to test.
If these are the only truths we take as axiomatic, to get truths regarding addition, multiplication, etc., we'd also need certain principles of set theory, and the proper axiomatization of set theory is still very controversial. However, without set theory, we can obtain more or less the same results by taking the notions of addition and multiplication as primitive functions within a more or less standard first-order predicate theory, and adding a few additional axioms. Any system that has the same mathematical theorems is called a Peano arithmetic.

Mendelson's System S: Syntax

The new system does not require the addition of anything new to the syntax of standard first-order predicate logic. In fact, we give the system S a less complicated syntax than PF by making the following restrictions:
A. There is only one predicate-letter: I²; and again, instead of I²(t, u) we write (t = u).
B. There is only one constant: a₁; but as an alternative, we use the numeral 0.
C. There are three function-letters: f₁¹, f₁², f₂²; but instead of writing f₁¹(t), we write t′, instead of writing f₁²(t, u), we write (t + u), and instead of writing f₂²(t, u), we write (t · u).

D. There are still denumerably many variables, as before.
Hence, all atomic wffs are identity statements. Other wffs are built from atomic ones as before.

Mendelson's System S: Semantics

The system S has a single intended interpretation, its so-called standard model. (Although it does have other models.)

Definition: The standard model for S can be characterized as follows:
1. The domain of quantification is the set of natural numbers {0, 1, 2, 3, . . .}.
2. The interpretation of the constant a₁ is the number zero.
3. The interpretation of the predicate-letter I² is the identity relation on the set of natural numbers.
4. The interpretation of the function-letter f₁¹ is the set of ordered pairs in which the second element is the number one greater than the first element, e.g., ⟨0, 1⟩, ⟨1, 2⟩ and ⟨2, 3⟩, etc.
The interpretation of f₁² is the set of ordered pairs in which the first element is itself an ordered pair of natural numbers, and the second element is the sum of those two numbers, e.g., ⟨⟨2, 3⟩, 5⟩, etc.
The interpretation of f₂² is the set of ordered pairs in which the first element is itself an ordered pair of natural numbers, and the second element is the product of those two numbers, e.g., ⟨⟨2, 3⟩, 6⟩, etc.

The axioms of system S (listed below) are true in the standard model. Because S has a model, by the Modeling Lemma, it is consistent. However, because the proof of the Modeling Lemma requires mathematical methods such as the principle of mathematical induction in the metalanguage, and system S contains object-language translations of these very principles, this proof of consistency appears somewhat circular. It is customary therefore to state this result in a somewhat weaker way:

Result: Assuming that ordinary mathematical reasoning (as reflected in the metalanguage) is consistent, so is system S.

Axiomatization of S

The system is built upon the predicate calculus; bearing in mind the restrictions on the syntax mentioned above, its axioms include instances of axiom schemata (A1) through (A5) of the predicate calculus. Its only two primitive inference rules are Gen and MP. (The deduction theorem, etc., holds in the same form.) We add the following:

Definition: A Proper Axiom of S is any one of (S1)–(S8) listed below, or any instance of (S9).
(S1) x = y → (x = z → y = z)
(S2) x = y → x′ = y′
(S3) 0 ≠ x′
(S4) x′ = y′ → x = y
(S5) x + 0 = x
(S6) x + y′ = (x + y)′
(S7) x · 0 = 0
(S8) x · y′ = (x · y) + x
(S9) A[0] → ((∀x)(A[x] → A[x′]) → (∀x) A[x])

Result: S is a first-order theory with identity.

Although (A6) and (A7) of the predicate calculus with identity are not taken as axioms, they are derivable as theorems in this system from the above. It will be recalled from our last unit that we proved that if (A6) is a theorem, and those instances of (A7) involving atomic formulas are theorems, then other instances of (A7) follow. Some of the principles necessary for getting instances of (A7) involving atomic wffs are proved below, some are proved in the book, and some are left as homework.

Result: The theorems and derived rules governing reflexivity, symmetry and transitivity of identity hold in S, i.e.:
(A6) ⊢S (∀x) x = x
(Ref=) ⊢S t = t, for any term t.
(Sym=T) ⊢S (∀x)(∀y)(x = y → y = x).
(Sym=) t = u ⊢S u = t, for any terms t, u.
(Trans=T) ⊢S (∀x)(∀y)(∀z)(x = y → (y = z → x = z))
(Trans=) t = u, u = s ⊢S t = s, for all terms t, u, s.

Result (MI): A[0], (∀x)(A[x] → A[x′]) ⊢S (∀x) A[x], for any variable x and wff A[x]. (Derived rule for mathematical induction.)
Proof:
This follows from (S9) and MP2. ∎

Demonstration of (A6):
1. ⊢S x + 0 = x  (S5)
2. ⊢S x = y → (x = z → y = z)  (S1)
3. ⊢S (∀x)(∀y)(∀z)(x = y → (x = z → y = z))  2 Gen3
4. ⊢S x + 0 = x → (x + 0 = x → x = x)  3 UI3
5. ⊢S x = x  1, 4 MP2
6. ⊢S (∀x) x = x  5 Gen

(Ref=) follows directly from (A6) by UI.

For (Sym=T):
1. ⊢S x = y → (x = z → y = z)  (S1)
2. ⊢S (∀x)(∀y)(∀z)(x = y → (x = z → y = z))  1 Gen3
3. ⊢S x = y → (x = x → y = x)  2 UI3
4. ⊢S x = x → (x = y → y = x)  3 SL
5. ⊢S x = x  Ref=
6. ⊢S x = y → y = x  4, 5 MP
7. ⊢S (∀x)(∀y)(x = y → y = x)  6 Gen2
(Sym=) follows by UI2 and MP.

(Trans=T):
1. ⊢S (∀x)(∀y)(∀z)(x = y → (x = z → y = z))  (S1) Gen3
2. ⊢S x₁ = y₁ → (x₁ = z₁ → y₁ = z₁)  1 UI3
3. ⊢S (∀x₁)(∀y₁)(∀z₁)(x₁ = y₁ → (x₁ = z₁ → y₁ = z₁))  2 Gen3
4. ⊢S y = x → (y = z → x = z)  3 UI3
5. ⊢S x = y → y = x  (Sym=T) UI2
6. ⊢S x = y → (y = z → x = z)  4, 5 SL
7. ⊢S (∀x)(∀y)(∀z)(x = y → (y = z → x = z))  6 Gen3
(Trans=) follows by UI3, and MP2. ∎

(From here on out I shall often ignore the fact that (S1)–(S9) are stated with particular variables, rather than schematically for all variables or all terms, and treat, e.g., anything of the form x = x + 0 as if it counted as (S5); obviously it takes only Gen and UI to move from x to any other variable.)

Result (Sub+): ⊢S (∀x)(∀y)(∀z)(x = y → x + z = y + z). (Substitution of identicals for addition.)

Proof:
1. x = y ⊢S x = y  (Premise)
2. ⊢S x = x + 0  (S5)
3. ⊢S y = y + 0  (S5)
4. x = y ⊢S x = y + 0  1, 3 Trans=
5. ⊢S x + 0 = x  2 Sym=
6. x = y ⊢S x + 0 = y + 0  4, 5 Trans=
7. ⊢S x = y → x + 0 = y + 0  6 DT
8. x = y → x + z = y + z ⊢S x = y → x + z = y + z  (Premise)
9. x = y → x + z = y + z, x = y ⊢S x + z = y + z  1, 8 MP
10. ⊢S (∀x)(∀y)(x = y → x′ = y′)  (S2) Gen2
11. ⊢S x + z = y + z → (x + z)′ = (y + z)′  10 UI2
12. x = y → x + z = y + z, x = y ⊢S (x + z)′ = (y + z)′  9, 11 MP
13. ⊢S x + z′ = (x + z)′  (S6)
14. ⊢S y + z′ = (y + z)′  13 Gen, UI
15. x = y → x + z = y + z, x = y ⊢S x + z′ = y + z′  12, 13, 14 Trans=, Sym=
16. ⊢S (x = y → x + z = y + z) → (x = y → x + z′ = y + z′)  15 DT2
17. ⊢S (∀z)[(x = y → x + z = y + z) → (x = y → x + z′ = y + z′)]  16 Gen
18. ⊢S (∀z)(x = y → x + z = y + z)  7, 17 MI
19. ⊢S (∀x)(∀y)(∀z)(x = y → x + z = y + z)  18 Gen2
∎
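Axioms (S5)–(S8) can be read as recursion equations that determine + and · on the standard model by recursion on the second argument. A Python transcription (illustrative only; the function names are mine):

```python
def suc(n):
    """Successor: interprets the function-letter written as the prime sign."""
    return n + 1

def add(x, y):
    # (S5): x + 0 = x        (S6): x + y' = (x + y)'
    return x if y == 0 else suc(add(x, y - 1))

def mul(x, y):
    # (S7): x * 0 = 0        (S8): x * y' = (x * y) + x
    return 0 if y == 0 else add(mul(x, y - 1), x)

assert add(2, 3) == 5
assert mul(2, 3) == 6
# The recursion equations pin down the ordinary operations:
assert all(mul(x, y) == x * y for x in range(6) for y in range(6))
```

Each recursive call peels one successor off the second argument, exactly mirroring how (S6) and (S8) reduce x + y′ and x · y′ to the case for y.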
The proofs of the following are in the book:

Result: Analogues of (S5), (S6) and (Sub+), flipped.
⊢S (∀x) x = 0 + x
⊢S (∀x)(∀y) x′ + y = (x + y)′
⊢S (∀x)(∀y)(∀z)(x = y → z + x = z + y)

Result (Com+/Assoc+): Commutativity and associativity of addition.
⊢S (∀x)(∀y) x + y = y + x
⊢S (∀x)(∀y)(∀z) ((x + y) + z) = (x + (y + z))

Tonight's homework includes proving analogous results for multiplication. (I.e., you'll prove substitution within multiplication, (Sub·), flipped versions of (S7), (S8) and (Sub·), as well as (Com·).) We get from these that S is a theory with identity, as substitution is allowed in all contexts.

B. The Quasi-Fregean System F

Suppose you wanted to construct an axiomatic system for mathematics, but did not want to take ·, +, ′, and 0 as primitive, and instead wanted to define them. One initially attractive way would be to do this within an axiomatic set theory, in a way such as the following, which I'm calling system F. This system is not Frege's system, but a crude oversimplification thereof.

Syntax

1. We add to the syntax of predicate logic the following subnective, which yields a term for any variable x and wff A[x]:
{x|A[x]}
This is read, "the set of all x such that A[x]".
2. All occurrences of x in a term of the form {x|A[x]} are considered to be bound.
3. We also choose a two-place predicate letter E² to use for the membership relation. An expression of the form (t ∈ u) is shorthand for E²(t, u), and (t ∉ u) is shorthand for ¬E²(t, u).

Axiomatization

The system F contains analogues of axiom schemata (A1) through (A7) of the predicate calculus with identity (PF=), the inference rules MP and Gen, and the following two additional axiom schemata:
(A8) (∀x)(A[x] ↔ x ∈ {y|A[y]}), for all cases in which the variable y is free for x in A[x].
(A9) (∀x)(x ∈ {y|A[y]} ↔ x ∈ {z|B[z]}) → {y|A[y]} = {z|B[z]}, where {y|A[y]} and {z|B[z]} do not contain x free.

Some Intuitive Definitions / Abbreviations

(In the following, x, y and z are the first three variables that do not occur in the terms t and u; note also that some of these are abbreviations of terms, others are abbreviations of wffs.)
Set theoretic definitions
(t ∪ u) for {x|x ∈ t ∨ x ∈ u}
(t ∩ u) for {x|x ∈ t ∧ x ∈ u}
−t for {x|x ∉ t}
(t ⊆ u) for (∀x)(x ∈ t → x ∈ u)
V for {x|x = x}
∅ for {x|x ≠ x}
{t} for {x|x = t}
{t, u} for {x|x = t ∨ x = u}
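Relative to a fixed finite universe, these set-theoretic abbreviations have direct finite analogues. A Python sketch using frozensets (V and the complement −t make sense here only because the universe is finite; all names are illustrative):

```python
V = frozenset(range(6))                    # stand-in universe {x | x = x}

union     = lambda t, u: t | u             # (t ∪ u)
intersect = lambda t, u: t & u             # (t ∩ u)
comp      = lambda t: V - t                # −t, complement relative to V
subset    = lambda t, u: t <= u            # (t ⊆ u)
empty     = frozenset()                    # ∅ = {x | x ≠ x}
sing      = lambda t: frozenset({t})       # {t}
pair      = lambda t, u: frozenset({t, u}) # {t, u}

a, b = frozenset({1, 2}), frozenset({2, 3})
assert union(a, b) == frozenset({1, 2, 3})
assert intersect(a, b) == frozenset({2})
assert comp(a) == frozenset({0, 3, 4, 5})
assert subset(intersect(a, b), a) and subset(empty, b)
assert pair(1, 1) == sing(1)               # {t, t} collapses to {t}
```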

⟨t, u⟩ for {{t}, {t, u}}
(t × u) for {x|(∃y)(∃z)(x = ⟨y, z⟩ ∧ y ∈ t ∧ z ∈ u)}
Dom(t) for {x|(∃y)(⟨x, y⟩ ∈ t)}
Rng(t) for {x|(∃y)(⟨y, x⟩ ∈ t)}
Fld(t) for (Dom(t) ∪ Rng(t))
Inv(t) for {x|(∃y)(∃z)(x = ⟨y, z⟩ ∧ ⟨z, y⟩ ∈ t)}
Fnct(t) for (∀x)(∀y)(∀z)(⟨x, y⟩ ∈ t ∧ ⟨x, z⟩ ∈ t → y = z)
Biject(t) for (Fnct(t) ∧ Fnct(Inv(t)))
Mathematical definitions
(t ≅ u) for (∃x)(Biject(x) ∧ Dom(x) = t ∧ Rng(x) = u)
Card(t) for {x|x ≅ t}
0 for Card(∅)
t′ for {x|(∃y)(y ∈ x ∧ (x ∩ −{y}) ∈ t)}
1 for 0′
2 for 1′
3 for 2′ [. . . and so on for other numerals]
N for {x|(∀y)((0 ∈ y ∧ (∀z)(z ∈ y → z′ ∈ y)) → x ∈ y)}
Fin(t) for (∃x)(x ∈ N ∧ t ∈ x)
Infin(t) for ¬Fin(t)
Denum(t) for (t ≅ N)
Ctbl(t) for (Fin(t) ∨ Denum(t))
(t ≤ u) for (∃x)(∃y)(∃z)(x ∈ t ∧ y ∈ u ∧ z ⊆ y ∧ x ≅ z)
(t < u) for ((t ≤ u) ∧ ¬(u ≤ t))
σ(t) for {x|x ∈ N ∧ x < t}
(t + u) for Card((σ(t) × {0}) ∪ (σ(u) × {1}))
(t · u) for Card(σ(t) × σ(u))

Results
With these definitions in place, one can derive Peano's postulates as theorems in the following forms:
(P1) ⊢F 0 ∈ N
(P2) ⊢F x ∈ N → x′ ∈ N
(P3) ⊢F x ∈ N → 0 ≠ x′
(P4) ⊢F x ∈ N → (y ∈ N → (x′ = y′ → x = y))
(P5) ⊢F A[0] → ((∀x)(A[x] → A[x′]) → (∀x)(x ∈ N → A[x]))
As well as analogues of Mendelson's other axioms:
(S5F) ⊢F x ∈ N → x + 0 = x
(S6F) ⊢F x ∈ N → (y ∈ N → (x + y′) = (x + y)′)
(S7F) ⊢F x · 0 = 0
(S8F) ⊢F x ∈ N → (y ∈ N → (x · y′) = ((x · y) + x))
Also, we have, e.g.:
⊢F ((∃₁x) A[x]) ↔ ({x|A[x]} ∈ 1)
⊢F ((∃₂x) A[x]) ↔ ({x|A[x]} ∈ 2)
⊢F ((∃₃x) A[x]) ↔ ({x|A[x]} ∈ 3)
And so on.

Disaster
The system F, unfortunately, is inconsistent due to Russell's paradox:
⊢F {x|x ∉ x} ∉ {x|x ∉ x} ↔ {x|x ∉ x} ∈ {x|x ∉ x}
Proof: Direct from (A8) and universal instantiation.
Whence both ⊢F {x|x ∉ x} ∈ {x|x ∉ x}, and ⊢F {x|x ∉ x} ∉ {x|x ∉ x}.
Hence ⊢F A for all wffs A, making the system entirely unsuitable for mathematics. In this system we have both ⊢F 1 + 1 = 2 and ⊢F 1 + 1 = 3! Poor Frege.

Homework
Without using Russell's paradox or other contradiction, prove ⊢F {x} = {y} → x = y.
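The inconsistency can be mimicked in any setting with unrestricted comprehension. A small Python analogue (illustrative, not part of the notes): represent "sets" as predicates and ask whether the Russell predicate applies to itself; no truth value is consistent, and direct evaluation diverges.

```python
def member(p, q):        # membership: p ∈ q means the predicate q holds of p
    return q(p)

def russell(p):          # R = {x | x ∉ x}
    return not member(p, p)

# (A8)-style comprehension demands: R ∈ R iff R ∉ R.  No truth value v
# satisfies v == (not v), so no consistent assignment exists:
assert not any(v == (not v) for v in (True, False))

# And actually evaluating "R ∈ R" just recurses forever:
import sys
sys.setrecursionlimit(100)
try:
    member(russell, russell)
    crashed = False
except RecursionError:
    crashed = True
assert crashed
```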

Some History

In the late 19th century, Euclid's axiomatization of geometry came under new scrutiny. Many mathematicians began to investigate the axiomatization of arithmetic as well. In 1879 German mathematician Richard Dedekind surmised that five principles formed the basis of all pure arithmetic.
In the 1880s and 1890s, the adequacy of those five principles (and others) was studied in depth, most importantly by a group of Italian mathematicians led by Giuseppe Peano. In order to consider them more systematically, Peano urged that the principles be written using a rigorously defined symbolic notation for logic and set theory, which he was still developing at that time. Given Peano's role in systematizing and popularizing the above principles, they have since come to be called the Peano axioms or Peano postulates.
In the year 1900, Peano presented some of his findings at the International Congress of Philosophy in Paris. In the audience was a young English polyglot whose main contribution to academia was a fellowship thesis on the compatibility of non-Euclidean geometry with Hegelian idealism. He was so impressed with Peano's work that over the next few months he had not only mastered Peanist logic, but had suggested several improvements. This was a 28-year-old Bertrand Russell.
Russell suggested that it wasn't enough to state the axioms of arithmetic in logical notation. One needed also to be explicit about the rules and principles governing that logical notation, because only then could one really test what is and what isn't provable. The axioms of arithmetic needed to be supplemented with the axioms of logic. However, in attempting to axiomatize logic and set theory, bearing in mind Georg Cantor's definition of cardinal numbers in terms of one-one correspondences, Russell became convinced that given suitable definitions of the notions of "zero", "successor" and "natural number", the Peano (so-called) axioms could actually be derived as theorems from the axioms of logic alone. Russell, having just taught a course on Leibniz, saw this as vindication of Leibniz's theory that mathematical truths are simply more complicated truths of logic: a theory now known as logicism.
Russell began work on writing what he imagined to be a two-volume work called The Principles of Mathematics. In volume one, he would explain the reduction of mathematics to logic informally (in English), and in volume two, he would set out to derive all of pure mathematics within an axiomatic system whose only axioms were logical axioms. However, in mid-1901, after he had finished the bulk of vol. I, he discovered the paradox of sets that now bears his name. Realizing that this made the most natural axiomatization of set theory inconsistent, Russell started to look for a philosophically adequate solution that would nevertheless salvage most of the work he (and others such as Dedekind and Peano) had done. Not finding an easy solution, Russell decided to publish vol. I with only a preliminary discussion of the contradiction and possible ways of avoiding it, leaving a complete solution of the inconsistency within the formal system for further development in vol. II.
While finishing vol. I in 1901–1902, Russell did a search of recent literature on the foundations of mathematics, and in so doing rediscovered the works of Gottlob Frege. Frege, working in almost complete isolation, had already in his 1884 Grundlagen der Arithmetik (trans. Foundations of Arithmetic) given a list of basic principles of arithmetic very similar to Dedekind's, but also suggested, like Russell, that given suitable definitions in terms of notions of pure logic, these principles could be derived from logical principles alone. In fact, Frege had already developed the core of an axiomatic system for logic in his 1879 work Begriffsschrift, and in his later 1893 magnum opus, Grundgesetze der Arithmetik, vol. I (trans. Basic Laws of Arithmetic), Frege expanded that system by adding axioms for value-ranges (in effect, class theory), and had begun to derive the elementary truths of number theory. While Russell was delighted to find such common ground between his work and Frege's, he also discovered that Frege's system fell prey to his paradox, and was therefore inconsistent. He broke the news gently to Frege in a letter. Here is a translation of that letter, as well as Frege's response (both originally written in German):

Dear Colleague: [16 June 1902]
I have known your Grundgesetze der Arithmetik for a year and a half, but only now have I been able to find the time for the thorough study I intend to devote to your writings. I find myself in full accord with you on all main points, especially in your rejection of any psychological element in logic and in the value you attach to a conceptual notation for the foundations of mathematics and of formal logic, which, incidentally, can hardly be distinguished. On many questions of detail, I find discussions, distinctions and definitions in your writings for which one looks in vain in other logicians . . .
I have encountered a difficulty only on one point. You assert (p. 17) that a function could also constitute the indefinite element. This is what I used to believe, but this view now seems to me dubious because of the following contradiction: Let w be the predicate of being a predicate which cannot be predicated of itself. Can w be predicated of itself? From either answer, the contradictory follows. We must therefore conclude that w is not a predicate. Likewise, there is no class (as a whole) of those classes which, as wholes, are not members of themselves . . .
On the fundamental questions where symbols fail, the exact treatment of logic has remained very backward; I find yours to be the best treatment I know in our time; and this is why I have allowed myself to express my deep respect for you. It is very regrettable that you did not get around to publishing the second volume of your Grundgesetze; but I hope that this will still be done.
Yours sincerely,
Bertrand Russell
Dear Colleague: [22 June 1902]
Many thanks for your interesting letter of 16 June. I am glad that you agree with me in many things and that you intend to discuss my work in detail . . .
Your discovery of the contradiction has surprised me beyond words and, I would almost say, left me thunderstruck, because it has rocked the ground on which I intended to build arithmetic. It seems accordingly that . . . my Basic Law [axiom] V is false . . . I must give some further thought to the matter. It is all the more serious as the collapse of my Law V seems to undermine not only the foundations of my arithmetic but the only possible foundations for arithmetic as such . . . Your discovery is at any rate a very remarkable one, and it may perhaps lead to a great advance in logic, undesirable as it may seem at first sight . . .
The second volume of my Grundgesetze is to appear shortly. I shall have to give it an appendix where I will do justice to your discovery. If only I could find the right way of looking at it!
Yours sincerely,
Gottlob Frege
Unfortunately, the hastily prepared solution Frege included in an appendix to vol. II of Grundgesetze was unsuccessful, and leads to a similar, but more complicated, contradiction. For Russell's part, it took him seven more years to find a solution he was happy with. By then, volume II of the Principles had grown so big, and had deviated so far from the plan laid out in vol. I, that Russell, along with his new collaborator, Alfred North Whitehead, decided to rename it Principia Mathematica, which was itself split into three volumes, published in 1910, 1911 and 1913. (Principia dropped set theory as such, and instead re-interpreted talk of classes in mathematics using notions not involving sets, but instead higher-order quantification over propositional functions divided into ramified types.)
Meanwhile, other mathematicians had developed consistent set theories whose axioms, however, did not seem to have the character of self-evidence usually thought to be required of logical truths. Such mathematicians still thought much of mathematics could be reduced to set theory, but denied that set theory was a branch of logic. The first system was developed by Ernst Zermelo in 1908, added to, and made more rigorous by, Adolf Fraenkel in 1922. Their system is now called ZF or Zermelo-Fraenkel set theory. Another was suggested by John von Neumann in 1925, and expanded by Paul Bernays and Kurt Gödel in the 1930s, and is now called NBG set theory. Two more set theories, NF and ML, were developed by W. V. Quine in 1937 and 1940. New versions continue

to be discovered, such as George Boolos's "New V", which stays close to Frege's original system with only a slight modification to Frege's Basic Law V. However, let us return to Mendelson's System S for the time being.

C. Numerals

Definition: Numerals are the primary or canonical terms used in a given language to stand for specific natural numbers.
We have numerals in both the object language and the metalanguage. In standard English (our metalanguage) the numerals are the signs:
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, . . . , etc.
In the language of system S, the numerals constitute the following series of closed terms:
0, 0′, 0′′, 0′′′, 0′′′′, 0′′′′′, 0′′′′′′, 0′′′′′′′, . . . , etc.
Let us now introduce a metalanguage function, n̄, that yields, for a given number, the numeral of S for that number. We define this function recursively in the metalanguage as follows:

Abbreviation: 0̄ is the constant 0; the numeral for n + 1 is n̄′. (This is the metalanguage +.)

So 2̄ is 0′′, 5̄ is 0′′′′′, and the numeral for 25 is 0 followed by twenty-five ′-signs.

The following were either proven in your homework, or follow from those results:
(LL) t = u, A[t, t] ⊢S A[t, u], and u = t, A[t, t] ⊢S A[t, u], for all terms t and u that are free for x in A[x, x].
(Canc+T) ⊢S (∀x)(∀y)(x + z = y + z → x = y)
(Canc+) t + s = u + s ⊢S t = u

Result (+1): ⊢S x + 1 = x′
Proof:
1. ⊢S x + 0′ = (x + 0)′  (S6)
2. ⊢S x + 0 = x  (S5)
3. ⊢S x + 0′ = x′  1, 2 LL
4. ⊢S x + 1 = x′  3 def. 1
∎

Result (·1): ⊢S x · 1 = x
Proof:
1. ⊢S x · 0′ = (x · 0) + x  (S8) Gen, UI
2. ⊢S x · 0 = 0  (S7)
3. ⊢S x · 0′ = 0 + x  1, 2 LL
4. ⊢S 0 + x = x  (S5), (Com+)
5. ⊢S x · 0′ = x  3, 4 Trans=
6. ⊢S x · 1 = x  5 def. 1
∎

Result (·2): ⊢S x · 2 = x + x.
Proof:
1. ⊢S x · 0′′ = (x · 0′) + x  (S8) Gen, UI
2. ⊢S x · 0′ = x  Above, def. 1
3. ⊢S x · 0′′ = x + x  1, 2 LL
4. ⊢S x · 2 = x + x  3 def. 2
∎

Result (0+): ⊢S x + y = 0 → (x = 0 ∧ y = 0).
Proof:
1. x + 0 = 0 ⊢S x + 0 = 0  (Premise)
2. ⊢S 0 + 0 = 0  (S5) Gen, UI
3. x + 0 = 0 ⊢S x + 0 = 0 + 0  1, 2 LL
4. x + 0 = 0 ⊢S x = 0  3 Canc+
5. ⊢S 0 = 0  Ref=
6. x + 0 = 0 ⊢S x = 0 ∧ 0 = 0  4, 5 SL
7. ⊢S x + 0 = 0 → (x = 0 ∧ 0 = 0)  6 DT
8. ⊢S x + y′ = (x + y)′  (S6)
9. ⊢S 0 ≠ (x + y)′  (S3) Gen, UI
10. ⊢S (x + y)′ = 0 → 0 = (x + y)′  (Sym=T) UI2
11. ⊢S (x + y)′ ≠ 0  9, 10 MT
12. ⊢S x + y′ ≠ 0  8, 11 LL
13. ⊢S x + y′ = 0 → (x = 0 ∧ y′ = 0)  12 SL
14. ⊢S [x + y = 0 → (x = 0 ∧ y = 0)] → [x + y′ = 0 → (x = 0 ∧ y′ = 0)]  13 SL
15. ⊢S (∀y){[x + y = 0 → (x = 0 ∧ y = 0)] → [x + y′ = 0 → (x = 0 ∧ y′ = 0)]}  14 Gen
16. ⊢S (∀y)[x + y = 0 → (x = 0 ∧ y = 0)]  7, 15 MI
17. ⊢S x + y = 0 → (x = 0 ∧ y = 0)  16 UI
∎

Proofs of the following are left as homework or are given in the book:
(·0) ⊢S x ≠ 0 → (x · y = 0 → y = 0)
(1+) ⊢S x + y = 1 → [(x = 0 ∧ y = 1) ∨ (x = 1 ∧ y = 0)]
(1·) ⊢S x · y = 1 → (x = 1 ∧ y = 1)
(Succ) ⊢S x ≠ 0 → (∃y)(x = y′)
(Canc·) ⊢S x ≠ 0 → (y · x = z · x → y = z)
(Succs) ⊢S x ≠ 0 → [x ≠ 1 → (∃y)(x = y′′)]

We also get the following very important results, stated in the metalanguage.

Result (Num≠): For all natural numbers n and m, if n ≠ m, then ⊢S n̄ ≠ m̄.
Proof:
Assume that n ≠ m. Hence what we need to prove is a statement of the form:
⊢S 0′′′…(n times)…′ ≠ 0′′′…(m times)…′
where one side has more ′-signs than the other. Perform a reductio in the object language, taking as a premise the wff 0′′′…(n times)…′ = 0′′′…(m times)…′. By successive applications of (S4) and MP, depending on whether n < m or m < n, you'll get either that:
0′′′…(n times)…′ = 0′′′…(m times)…′ ⊢S 0 = 0′′′…(m−n times)…′
or that:
0′′′…(n times)…′ = 0′′′…(m times)…′ ⊢S 0′′′…(n−m times)…′ = 0
However, the negations of these follow from (S3), or (S3) and Sym=, and UI. So, by DT and MT, we get that ⊢S 0′′′…(n times)…′ ≠ 0′′′…(m times)…′, i.e., ⊢S n̄ ≠ m̄.
∎

Result (Num+): For all natural numbers n and m, ⊢S n̄ + m̄ = s̄, where s is the number n + m.
Proof:
We use induction on m in the metalanguage. First, for the base, the numeral for n + 0 is simply n̄, and we have ⊢S n̄ = n̄ + 0̄ by (S5) and Sym=. For the induction step, assume ⊢S n̄ + m̄ = s̄, where s is n + m. We need the corresponding result for m + 1, i.e., ⊢S n̄ + m̄′ = s̄′. This follows by (S2), (S6) and Sym=.
∎

Result (Num·): For all natural numbers n and m, ⊢S n̄ · m̄ = p̄, where p is the number n · m.
You will prove the above as part of your homework.

D. Ordering, Complete Induction and Divisibility

Abbreviations:
(t < u) for (∃x)(x ≠ 0 ∧ t + x = u), where x is the first variable not in t or u
(t ≤ u) for (t < u) ∨ (t = u)
(t > u) for (u < t)
(t ≥ u) for (u ≤ t)
(t ≮ u) for ¬(t < u)
and we define (t ≰ u), (t ≯ u) and (t ≱ u) similarly.
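In the standard model the defined ordering agrees with the usual one; this can be spot-checked by a bounded search for the witness x in the definition. A sketch (the bound and names are mine):

```python
def lt(t, u, bound=50):
    # (t < u) := (∃x)(x ≠ 0 ∧ t + x = u), witness search up to bound
    return any(x != 0 and t + x == u for x in range(bound))

def le(t, u):
    # (t ≤ u) := (t < u) ∨ (t = u)
    return lt(t, u) or t == u

assert all(lt(t, u) == (t < u) for t in range(10) for u in range(10))
assert all(le(t, u) == (t <= u) for t in range(10) for u in range(10))
```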

Result (Irref<): ⊢S x ≮ x
Proof:
1. ⊢S x + 0 = x  (S5)
2. ⊢S x + y = x + 0 → y = 0  (Canc+T) Gen, UI, Com+
3. ⊢S x + y = x → y = 0  1, 2 LL
4. ⊢S y ≠ 0 → x + y ≠ x  3 SL
5. ⊢S ¬¬(y ≠ 0 → x + y ≠ x)  4 DN
6. ⊢S ¬(y ≠ 0 ∧ x + y = x)  5 def. ∧
7. ⊢S (∀y) ¬(y ≠ 0 ∧ x + y = x)  6 Gen
8. ⊢S ¬(∃y)(y ≠ 0 ∧ x + y = x)  7 DN, def. ∃
9. ⊢S x ≮ x  8 defs. <, ≮
∎

Result (Trans<): ⊢S x < y → (y < z → x < z)
Proof:
1. x < y ⊢S (∃z)(z ≠ 0 ∧ x + z = y)  Pr, def. <
2. y < z ⊢S (∃x)(x ≠ 0 ∧ y + x = z)  Pr, def. <
3. x < y ⊢S b ≠ 0 ∧ x + b = y  1 Rule C
4. y < z ⊢S c ≠ 0 ∧ y + c = z  2 Rule C
5. x < y ⊢S x + b = y  3 SL
6. y < z ⊢S y + c = z  4 SL
7. x < y, y < z ⊢S (x + b) + c = z  5, 6 LL
8. x < y, y < z ⊢S x + (b + c) = z  7 Assoc+
9. ⊢S b + c = 0 → (b = 0 ∧ c = 0)  (0+), Gen, UI
10. x < y ⊢S b + c ≠ 0  3, 9 SL
11. x < y, y < z ⊢S b + c ≠ 0 ∧ x + (b + c) = z  8, 10 SL
12. x < y, y < z ⊢S (∃y)(y ≠ 0 ∧ x + y = z)  11 EG
13. x < y, y < z ⊢S x < z  12 def. <
14. ⊢S x < y → (y < z → x < z)  13 DT2
∎

Others (many assigned as homework):
(Ref≤) ⊢S x ≤ x
(Anti-Sym<) ⊢S x < y → y ≮ x
(Trans≤) ⊢S x ≤ y → (y ≤ z → x ≤ z)
(Trans≤<) ⊢S x ≤ y → (y < z → x < z)
(Order) ⊢S x = y ∨ x < y ∨ y < x
⊢S x < y ∨ x ≥ y
⊢S x ≤ y ∨ x > y
(≤to=) ⊢S x ≤ y → (y ≤ x → x = y)
(0≤) ⊢S 0 ≤ x
(0<) ⊢S 0 < x′
(≥0) ⊢S x ≥ 0
(0∨) ⊢S x = 0 ∨ x > 0
(<Succ) ⊢S x < x′
(<Succ) ⊢S x < y → x′ ≤ y
⊢S x < y′ → x ≤ y
(+Pres≤) ⊢S x ≤ x + y
(+Pres<) ⊢S y ≠ 0 → x < x + y
(·Pres≤) ⊢S y ≠ 0 → x ≤ x · y
(·Pres<) ⊢S x ≠ 0 → (y > 1 → x < x · y)
(Canc+<) ⊢S x < y ↔ x + z < y + z
(Canc+≤) ⊢S x ≤ y ↔ x + z ≤ y + z
(Canc·<) ⊢S z ≠ 0 → (x < y ↔ x · z < y · z)
(Canc·≤) ⊢S z ≠ 0 → (x ≤ y ↔ x · z ≤ y · z)
(OrdC) ⊢S (∀x)[((∀y)(y < x → A[y]) ∧ (∀y)(y ≥ x → B[y])) → (∀y)(A[y] ∨ B[y])]
⊢S (∀x)[((∀y)(y ≤ x → A[y]) ∧ (∀y)(y > x → B[y])) → (∀y)(A[y] ∨ B[y])]

Result: For any natural number n, ⊢S (x = 0̄ ∨ . . . ∨ x = n̄) ↔ x ≤ n̄.

Proof:
We use induction on n in the metalanguage.
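Each of the listed order laws is a universally quantified schema; over an initial segment of the standard model a few of them can be spot-checked mechanically. An illustrative sketch:

```python
R = range(8)
# (Order): x = y ∨ x < y ∨ y < x   (trichotomy)
assert all(x == y or x < y or y < x for x in R for y in R)
# (Canc+<): x < y ↔ x + z < y + z
assert all((x < y) == (x + z < y + z) for x in R for y in R for z in R)
# (·Pres<): x ≠ 0 → (y > 1 → x < x·y)
assert all(not (x != 0 and y > 1) or x < x * y for x in R for y in R)
```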

(1) The base case, ⊢S x = 0̄ ↔ x ≤ 0̄, follows from (Zero).
(2) For the induction step, as inductive hypothesis, assume:
⊢S (x = 0̄ ∨ . . . ∨ x = n̄) ↔ x ≤ n̄.
(3) We need the corresponding result for n + 1. By the definition of the overbar, this is:
⊢S (x = 0̄ ∨ . . . ∨ x = n̄ ∨ x = n̄′) ↔ x ≤ n̄′
(4) The left-to-right conditional follows by a proof by cases: by (2), (Trans≤<) and (<Succ) for the first n cases, and, for the final case, by the obvious tautology:
⊢S x = n̄′ → x ≤ n̄′
(5) For the right-to-left conditional, first note that by the definition of ≤, we have:
x ≤ n̄′ ⊢S x < n̄′ ∨ x = n̄′
(6) By (<Succ), we have:
⊢S x < n̄′ → x ≤ n̄
(7) By this and the inductive hypothesis, (2), then:
⊢S x < n̄′ → (x = 0̄ ∨ . . . ∨ x = n̄)
(8) By obvious propositional logic rules:
⊢S x < n̄′ → (x = 0̄ ∨ . . . ∨ x = n̄ ∨ x = n̄′)
(9) The following is an obvious tautology:
⊢S x = n̄′ → (x = 0̄ ∨ . . . ∨ x = n̄ ∨ x = n̄′)
(10) So by a proof by cases starting with (5), we get:
x ≤ n̄′ ⊢S (x = 0̄ ∨ . . . ∨ x = n̄ ∨ x = n̄′)
(11) Therefore, the right-to-left conditional follows by the deduction theorem. This establishes the biconditional, and completes the induction. ∎

Corollary: For any natural number n and wff A[x], ⊢S (A[0̄] ∧ . . . ∧ A[n̄]) ↔ (∀x)(x ≤ n̄ → A[x]).

Corollary: For any natural number n, ⊢S (x = 0̄ ∨ . . . ∨ x = n̄) ↔ x < n̄′.

Corollary: For any natural number n and wff A[x], ⊢S (A[0̄] ∧ . . . ∧ A[n̄]) ↔ (∀x)(x < n̄′ → A[x]).

New Forms of Induction

Result (CI): ⊢S (∀x)((∀y)(y < x → A[y]) → A[x]) → (∀x) A[x]
(The principle of strong or complete mathematical induction.)

Proof:
For the proof, like any conditional, we begin by assuming the antecedent, abbreviated as ($).
1. ($) ⊢S (∀x)((∀y)(y < x → A[y]) → A[x])  (Pr)
Rather than directly proceeding to derive (∀x) A[x], we instead attempt to show (∀x)(∀z)(z ≤ x → A[z]) by normal (weak) induction on x.
2. z ≤ 0 ⊢S z = 0  (Pr), (Zero) SL
3. ($) ⊢S (∀y)(y < 0 → A[y]) → A[0]  1 UI
4. ⊢S y ≮ 0  (Zero) Gen, UI
5. ⊢S y < 0 → A[y]  4 SL
6. ⊢S (∀y)(y < 0 → A[y])  5 Gen
7. ($) ⊢S A[0]  3, 6 MP
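The corollaries say that, provably in S, a quantifier bounded by a numeral collapses into a finite conjunction (or disjunction). In the standard model that is just a finite check, sketched below (names are mine):

```python
def bounded_forall(A, n):
    """(∀x)(x ≤ n → A[x]) evaluated over the standard model."""
    return all(A(x) for x in range(n + 1))

A = lambda x: x * x <= 30
# (A[0] ∧ ... ∧ A[5]) ↔ (∀x)(x ≤ 5 → A[x])
finite_conj = A(0) and A(1) and A(2) and A(3) and A(4) and A(5)
assert finite_conj == bounded_forall(A, 5)
assert bounded_forall(A, 5) is True    # 5·5 = 25 ≤ 30
assert bounded_forall(A, 6) is False   # 6·6 = 36 > 30
```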

8. ($), z 0 `S A [z]
2, 7 LL
9. ($) `S z 0 A [z]
8 DT
10. ($) `S (z)(z 0 A [z])
9 Gen
This establishes the base step. Next:
11. (z)(z x A [z]) `S z x A [z]
(Pr), UI
12. `S z x0 z < x0 z = x0
Taut, def.
13. `S z < x0 z x
(<Succ), Gen, UI
14. (z)(z x A [z]) `S z < x0 A [z]
11, 13 SL
0
15. (z)(z x A [z]) `S y < x A [y]
14 Gen, UI
16. (z)(z x A [z]) `S (y)(y < x0
A [y]) 15 Gen
0
17. ($) `S (y)(y < x A [y]) A [x0 ] 1 UI
18. ($), (z)(z x A [z]) `S A [x0 ] 16, 17 MP
19. ($), (z)(z x A [z]) `S z = x0 A [z]
18 PF=
0
20. ($), (z)(z x A [z]) `S z x A [z]
12, 14, 19 SL
21. ($), (z)(z x A [z]) `S (z)(z x0
A [z])
20 Gen
22. ($) `S (z)(z x A [z]) (z)(z x0
A [z])
21 DT
23. ($) `S (x)[(z)(z x A [z])
(z)(z x0 A [z])]
22 Gen
This establishes the induction step, whence:
24. ($) `S (x) (z)(z x A [z])
10, 23 MI
25. ($) `S x x A [x]
24 UI2
26. `S x x
(Ref)
27. ($) `S A [x]
25, 26 MP
28. ($) `S (x) A [x]
27 Gen
29. `S (x)((y)(y < x A [y]) A [x])
(x) A [x]
28 DT
e

Corollary (LNP): ⊢S (∃x)A[x] → (∃x)(A[x] ∧ ¬(∃y)(y < x ∧ A[y]))
(The Least Number Principle.)

Proof:
This is roughly the transposition of (CI), with ¬A[x] substituted for A[x]. See book for details. ∎

Corollary (MID): ⊢S (x)(A[x] → (∃y)(y < x ∧ A[y])) → ¬(∃x)A[x]
(The Method of Infinite Descent.)

Proving this is homework, but it follows from LNP.

Divisibility

While the division function cannot be defined for the natural numbers alone, the relation of divisibility can be so defined.

Abbreviation: t|u for (∃x)(u = t · x), where x is the first variable that does not occur in t and u. This can be read as "u is evenly divisible by t", "t evenly divides u", or as "u is a multiple of t".

Result (Ref|): ⊢S x|x

Proof:
1. ⊢S x = x · 1   (·1), Sym=
2. ⊢S (∃y)(x = x · y)   1 EG
3. ⊢S x|x   2 def. | ∎

Result (1|): ⊢S 1|x

Proof:
1. ⊢S x = x · 1   (·1), Sym=
2. ⊢S x = 1 · x   1 (Com·), Trans=
3. ⊢S (∃y)(x = 1 · y)   2 EG
4. ⊢S 1|x   3 def. | ∎

Result (|0): ⊢S x|0

Proof:
1. ⊢S 0 = x · 0   (S7), Sym=
2. ⊢S (∃y)(0 = x · y)   1 EG
3. ⊢S x|0   2 def. | ∎

Result (Trans|): ⊢S x|y ∧ y|z → x|z

Proof:
1. x|y ∧ y|z ⊢S x|y   (Premise) SL
2. x|y ∧ y|z ⊢S y|z   (Premise) SL
3. x|y ∧ y|z ⊢S (∃z)(y = x · z)   1 def. |
4. x|y ∧ y|z ⊢S (∃x)(z = y · x)   2 def. |
5. x|y ∧ y|z ⊢S y = x · b   3 Rule C
6. x|y ∧ y|z ⊢S z = y · c   4 Rule C
7. x|y ∧ y|z ⊢S z = (x · b) · c   5, 6 LL
8. x|y ∧ y|z ⊢S z = x · (b · c)   7 Assoc·, Trans=
9. x|y ∧ y|z ⊢S (∃y)(z = x · y)   8 EG
10. x|y ∧ y|z ⊢S x|z   9 def. |
11. ⊢S x|y ∧ y|z → x|z   10 DT ∎

Further results (either proven in the book, or assigned as homework):
⊢S y ≠ 0 ∧ x|y → x ≤ y
⊢S x|y ∧ y|x → x = y
⊢S x|y → x|(y · z)
⊢S x|y ∧ x|z → x|(y + z)
⊢S x|1 → x = 1
⊢S x|y ∧ x|y′ → x = 1

Result (UQR): ⊢S (x)(y)(y ≠ 0 → (∃₁z)(∃₁z₁)(x = (y · z) + z₁ ∧ z₁ < y))
(Uniqueness of quotient and remainder.)

(The proof of this is somewhat complicated, but is sketched in the book.)

At this point, we can do virtually all elementary arithmetic for natural numbers in system S.

E. Expressibility and Representability

We've now seen that number-theoretic relations such as <, ≤, |, etc., can be defined in S, even though they were not taken as primitive predicate letters. It is also easy to see that certain functions on the natural numbers, such as the squaring function, n², could be defined in S. Our topic over the next few days will involve general results about what sort of mathematical functions and relations can be expressed or represented in system S (and similar systems), and what sort cannot be.

In the metatheory, functions and relations are considered set-theoretically. An n-place relation, for example, is considered to be a set of n-tuples. An n-place function is considered as a set of ordered pairs, the first elements of which are themselves n-tuples. For most purposes, however, we can think of them more informally as argument/value mappings.

Let N be the set of natural numbers {0, 1, 2, …}. We then define the following:

Definition: A number-theoretic relation is any subset of Nⁿ for some n (i.e., any set of n-tuples of natural numbers).

Examples: Being even, being odd, and being prime are one-place number-theoretic relations (properties). Being greater than, being divisible by, etc., are two-place number-theoretic relations. We are here identifying being even with the set of even numbers, and being greater than with a set of ordered pairs of numbers.

Definition: A number-theoretic function is a function whose domain is Nⁿ for some n, and whose range is a subset of N.

Examples: Addition and multiplication are both two-place number-theoretic functions. The function that yields, for a given natural number n as argument, the nth prime, is a one-place number-theoretic function.

Within a given mathematical system such as S, some functions and relations may be definable and some may not be definable. Let us make this more precise. Below, we assume that K is an axiom system with numerals for natural numbers (e.g., System S).

Definition: A given n-place number-theoretic relation R is said to be expressible in K iff there is a wff A[x1, …, xn] with x1, …, xn as its free variables such that, for any natural numbers k1, …, kn:
(i) If R holds for ⟨k1, …, kn⟩, then ⊢K A[k1, …, kn];
(ii) If R does not hold for ⟨k1, …, kn⟩, then ⊢K ¬A[k1, …, kn].

Definition: A given n-place number-theoretic function F is said to be representable in K iff there is a wff A[x1, …, xn, y] with x1, …, xn and y as its free variables such that, for any natural numbers k1, …, kn and m:
(i) If the value of F for ⟨k1, …, kn⟩ as argument is m, then ⊢K A[k1, …, kn, m];
(ii) ⊢K (∃₁y)A[k1, …, kn, y].

(Note that A[x1, …, xn, y] might be an identity statement of the form y = F(x1, …, xn), where F is a function letter, but it need not be; it could instead be any wff containing y and x1, …, xn free satisfying the above conditions.)

Definition: A given n-place number-theoretic function F is said to be strongly representable in K iff there is a wff A[x1, …, xn, y] with x1, …, xn and y as its free variables such that, for any natural numbers k1, …, kn and m:
(i) If the value of F for ⟨k1, …, kn⟩ as argument is m, then ⊢K A[k1, …, kn, m];
(ii) ⊢K (∃₁y)A[x1, …, xn, y].

There are only denumerably many wffs within our language, but there is a non-denumerably infinite number of n-place number-theoretic relations and functions, so not all of them can be represented in a theory such as S.

Examples:
1. The identity relation on the set of natural numbers is expressible in S by the wff x1 = x2, since:
(a) If k1 = k2, then k1 is the same as k2, so ⊢S k1 = k2 is an instance of (Ref=).
(b) Result (Num≠), on p. 64, established that for any natural numbers k1 and k2, if k1 ≠ k2, then ⊢S k1 ≠ k2.
2. The less than relation is expressible in S by the wff x1 < x2, i.e., (∃x)(x ≠ 0 ∧ x1 + x = x2).
3. The zero function, whose value is 0 for any natural number as argument, is strongly representable in S (or any other theory with identity) by the wff (x1 = x1 ∧ y = 0), since:
(a) For any natural number k, ⊢S (k = k ∧ 0 = 0)
(b) ⊢S (∃₁y)(x1 = x1 ∧ y = 0)
4. The successor function is strongly representable in S by y = x1′.
5. The projection functions Uiⁿ are functions which, for any n arguments, simply return their ith argument as value. E.g., U3⁴(5, 8, 2, 13) = 2, and U3⁴(7, 1, 0, 16) = 0. They are strongly representable in S (or any other theory with identity) by wffs of the form
(x1 = x1 ∧ … ∧ xn = xn ∧ y = xi)
since:
(a) For any ⟨k1, …, kn⟩, the value of Uiⁿ is ki, and ⊢S (k1 = k1 ∧ … ∧ kn = kn ∧ ki = ki).
(b) ⊢S (∃₁y)(x1 = x1 ∧ … ∧ xn = xn ∧ y = xi).

Result: If K is a first-order theory with identity, then the number-theoretic function F is representable in K iff it is strongly representable in K.

Sketch of proof:
Note that part (ii) of the definition of strong representability entails (by Gen and UI) part (ii) of the definition of representability, so the right-to-left conditional holds. For the left-to-right conditional, suppose that F is represented in K by A[x1, …, xn, y]. Then we can construct a wff B[x1, …, xn, y] with the following form:

((∃₁y)A[x1, …, xn, y] ∧ A[x1, …, xn, y]) ∨ (¬(∃₁y)A[x1, …, xn, y] ∧ y = 0)

Then F will be strongly represented by this complex wff, because (i) when m is the value of F for ⟨k1, …, kn⟩ as argument, if the appropriate numerals replace x1, …, xn and y in the above, the first disjunct is derivable, and so the whole is, and (ii) ⊢PF= (∃₁y)B[x1, …, xn, y].

The details of the proof of this theorem of PF= are sketched in the book, but, informally, either the first conjunct of the first disjunct must hold, or the first conjunct of the second disjunct must hold (but not both). For the former case, there is exactly one y such that A[x1, …, xn, y], and for the latter, there is, of course, always exactly one y such that y = 0.

Result: Number-theoretic functions defined by substitution of strongly representable functions within strongly representable functions are also strongly representable. More precisely, if F is an n-place function whose value for ⟨k1, …, kn⟩ is g(h1(k1, …, kn), …, hm(k1, …, kn)), where g and h1, …, hm are all strongly representable in K, then F is also strongly representable in K.

Sketch of proof:
Suppose that g is (strongly) represented in K by the wff B[x1, …, xm, y] and h1 through hm are (strongly) represented by the wffs A1[x1, …, xn, y] through Am[x1, …, xn, y]. It is then possible to represent F with the wff:

(∃z1) … (∃zm)(A1[x1, …, xn, z1] ∧ … ∧ Am[x1, …, xn, zm] ∧ B[z1, …, zm, y])

A proof that the above wff satisfies parts (i) and (ii) of the definition of strong representability for F is given in the book, but the result is somewhat intuitively obvious.

For example, if multiplication can be strongly represented in K by A[x1, x2, y], and addition can be represented in K by B[x1, x2, y], then the function whose value for two natural numbers n and m is the product of n and m added to itself (i.e., nm + nm) can be represented in K by the wff:

(∃z1)(∃z2)(A[x1, x2, z1] ∧ A[x1, x2, z2] ∧ B[z1, z2, y])

Roughly, this says there is a z1 and z2 where both z1 and z2 are the product of x1 and x2, and y is the sum of z1 and z2.

Characteristic Functions and Graphs

Definition: If R is an n-place number-theoretic relation, its characteristic function, written C_R, is the n-place number-theoretic function defined as follows:

C_R(k1, …, kn) = 0 if R holds for ⟨k1, …, kn⟩,
                 1 if not.

Examples:
(a) C<(3, 7) = 0 but C<(7, 3) = 1, etc.
(b) C=(2, 2) = 0 but C=(2, 3) = 1, etc.
(c) C|(3, 27) = 0 but C|(3, 26) = 1, etc.

Note that this is the reverse of many programming languages, etc., in which the "Boolean" number 1 is used for truth, and 0 is used for falsity.
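The 0-for-truth convention can be stated as executable definitions. A minimal Python sketch (my own illustration, not part of the text; the function names are mine), matching the examples (a)–(c) above:

```python
# Characteristic functions as used in these notes: 0 codes "holds",
# 1 codes "does not hold" -- the reverse of the usual Boolean convention.

def C_less_than(x, y):
    """Characteristic function of the less-than relation."""
    return 0 if x < y else 1

def C_equal(x, y):
    """Characteristic function of the identity relation."""
    return 0 if x == y else 1

def C_divides(x, y):
    """0 iff x evenly divides y."""
    return 0 if y % x == 0 else 1
```

For instance, C_less_than(3, 7) is 0 and C_less_than(7, 3) is 1, mirroring example (a).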

Result: For any theory K (e.g., system S) in which ⊢K 0 ≠ 1, the relation R is expressible in K iff its characteristic function C_R is representable in K.

Proof:
(1) To see the truth of the left-to-right conditional, note that if R is expressed in K by the wff A[x1, …, xn], then C_R can be represented by the wff:
(A[x1, …, xn] ∧ y = 0) ∨ (¬A[x1, …, xn] ∧ y = 1).
(2) For the right-to-left conditional, note that if C_R is represented in K by some wff A[x1, …, xn, y], then given that ⊢K 0 ≠ 1, R can be expressed by the wff A[x1, …, xn, 0]. ∎

Example: C< can be represented in S by the wff
((∃x)(x ≠ 0 ∧ x1 + x = x2) ∧ y = 0) ∨ (¬(∃x)(x ≠ 0 ∧ x1 + x = x2) ∧ y = 1).

Definition: If F is an n-place number-theoretic function, then the graph of F, written G_F, is the (n + 1)-place number-theoretic relation that holds for ⟨k1, …, kn, kn+1⟩ iff kn+1 is the value of F for ⟨k1, …, kn⟩ as argument.

Example: The graph of the addition function is the relation that holds between three numbers just in case the sum of the first two numbers is the last.

Result: For any theory K, the function F is representable in K iff the graph of F is expressible in K.

(Proving this is homework.)

We will be making a good deal of use of characteristic functions in what follows, but relatively little use of graphs.

F. Primitive Recursive and Recursive Functions

We have been discussing number-theoretic functions and relations and what it is for them to be expressible or representable within an axiomatic system. There are two very important categories of functions that any axiomatic theory for number theory should be able to represent, viz., primitive recursive functions, and recursive functions.

These two categories of functions have broad importance not only within logic, but also in mathematics and computer science generally. We shall later prove that the functions in these categories are representable within System S (i.e., Peano arithmetic). Our current task, however, is simply to get a better understanding of what it is for a function to fall into one or both of these two groups. For the moment, therefore, we're putting system S on the shelf and will be discussing these functions entirely in the metalanguage. Therefore, all the mathematical notation that appears over the next several pages is the notation of ordinary mathematics, not the notation used within system S or any other formal theory.

Definition: The initial functions are the following functions:
(1) The (one-place) zero function Z, the value of which is 0 for any argument (i.e., for all x, Z(x) = 0).
(2) The (one-place) successor function N, the value of which is always the number one greater than its argument. (Note, we write this as N(x), not x′, to avoid confusing the metalanguage function sign and its counterpart in the object language of system S.)
(3) The (n-place) projection functions Uiⁿ, which, for any n arguments, simply return their ith argument as value. (There is a different one for each n and i.)

The following are not functions, but rules used for obtaining one function from others already defined.

Definition: An n-place function f is said to be obtained by substitution from the m-place function g and the n-place functions h1, …, hm whenever the value of f can be determined as follows:

f(x1, …, xn) = g(h1(x1, …, xn), …, hm(x1, …, xn))

Definition: An (n + 1)-place function f is said to be obtained by recursion from the n-place function g and the (n + 2)-place function h iff both (i) the value of f can be determined as follows when 0 is its last argument:

f(x1, …, xn, 0) = g(x1, …, xn),

and (ii) whenever its last argument is other than 0, its value can be determined from its value for the argument's predecessor as follows:

f(x1, …, xn, y + 1) = h(x1, …, xn, y, f(x1, …, xn, y))

Or, in the case of a one-place function, we say that f can be obtained by recursion from the constant k (where k is a particular natural number) and the 2-place function h whenever its values can be determined as follows:

f(0) = k
f(y + 1) = h(y, f(y))

Definition: An n-place function f is said to be obtained by the choice of least rule from the (n + 1)-place function g whenever the value of f can be characterized as follows:

f(x1, …, xn) = the least natural number y such that g(x1, …, xn, y) = 0.

(If there is not always such a y, f cannot be defined in this way.)

Note that if g is the characteristic function of some relation R, then the value of f(x1, …, xn) will be the least y such that R holds for ⟨x1, …, xn, y⟩.

Abbreviation: μy R(x1, …, xn, y) means the least y such that R holds of ⟨x1, …, xn, y⟩.

μ operates as a subnective, much like the use of the ι sign for descriptions. This restricted μ-operator is used in the metalanguage only. Your book also calls this — the choice of least rule — the "restricted μ-operator rule", for reasons that should be apparent.

Examples: μy(y > 6) = 7 and μy(y is prime and even) = 2.

Definition: A number-theoretic function f is said to be primitive recursive iff it can be obtained from the initial functions by some finite number of applications of the rules of substitution and/or recursion.

Definition: A number-theoretic function f is said to be recursive iff it can be obtained from the initial functions by some finite number of applications of substitution, recursion, and/or the choice of least rules. (These are also called general recursive functions.)

Obviously, all primitive recursive functions are recursive (though we shall later prove that the converse does not hold).

Derivatively, a number-theoretic relation is said to be primitive recursive iff its characteristic function is primitive recursive.

A given subset of the natural numbers can be thought of as a one-place relation (property) on the natural numbers. So a given set of natural numbers can also (derivatively) be called primitive recursive (or recursive) iff all and only its members share some number-theoretic property the characteristic function of which is primitive recursive (or recursive).

The following sorts of manipulations always preserve (primitive) recursiveness:

Result: If the n-place function f is (primitive) recursive, then so is the (n + 1)-place function g, whose value, g(x1, …, xn, xn+1), is always simply f(x1, …, xn), so that the last argument to g is always simply ignored.
(Adding dummy variables.)

Proof:
Function g can be defined by substitution using f and the projection functions:
g(x1, …, xn, xn+1) = f(U1ⁿ⁺¹(x1, …, xn+1), …, Unⁿ⁺¹(x1, …, xn+1)) ∎
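As a sketch (my own illustration, not from the text), the initial functions and the substitution rule can be mirrored as Python combinators; the dummy-variable trick above then falls out as a one-liner:

```python
def Z(x):
    """The zero function."""
    return 0

def N(x):
    """The successor function."""
    return x + 1

def U(i, n):
    """The projection function U_i^n (i is 1-indexed, as in the text)."""
    def proj(*args):
        assert len(args) == n
        return args[i - 1]
    return proj

def substitution(g, *hs):
    """f(x1,...,xn) = g(h1(x1,...,xn), ..., hm(x1,...,xn))."""
    def f(*args):
        return g(*(h(*args) for h in hs))
    return f

# Adding a dummy variable, concretely: a 2-place version of the
# 1-place successor that ignores its second argument.
N_dummy = substitution(N, U(1, 2))
```

Here N_dummy(4, 99) yields 5: the second argument is projected away, exactly as in the proof.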

Result: If n-place function f is (primitive) recursive, then so is the n-place function g whose value, g(…, xi, …, xj, …), is always f(…, xj, …, xi, …).
(Permuting variables.)

Proof:
Again, using substitution and projection:
g(…, xi, …, xj, …) = f(…, Ujⁿ(x1, …, xn), …, Uiⁿ(x1, …, xn), …) ∎

Result: If the (n + 1)-place function f is (primitive) recursive, then so is the n-place function g whose value, g(x1, …, xn), is always f(x1, x1, …, xn).
(Identifying variables.)

Proof:
Our method is similar to the above:
g(x1, …, xn) = f(U1ⁿ(x1, …, xn), U1ⁿ(x1, …, xn), …, Unⁿ(x1, …, xn)) ∎

The practical effect of these three results, especially when combined together, is that they strengthen the substitution rule so that not all the h's need to be n-place functions, nor do they have to put the variables in the same order as f, nor do they have to make use of all the x's, etc. Similar results follow for the g and h used in the recursion rule. (We shall simply put this into practice from now on.)

Result: For any n, the n-place zero function Zⁿ is primitive recursive.

Proof:
This follows by substitution, since:
Zⁿ(x1, …, xn) = Z(U1ⁿ(x1, …, xn)) ∎

Result: For any n and k, the n-place constant functions Ckⁿ, the value of which for any n arguments is always k regardless of what the arguments are, are primitive recursive.

Proof:
This can be proven by induction on k. For k = 0, the n-place constant function is the same as the n-place zero function. For the rest, the n-place constant function whose value is always k + 1 can be defined by substitution, since:
Ck+1ⁿ(x1, …, xn) = N(Ckⁿ(x1, …, xn)) ∎

One of the consequences of this is that by using such functions in place of one of the h's in the definition of substitution, we can in effect simply place a given natural number into the appropriate argument spot of g. Similarly, if we use such a function in place of the g in recursion, we can simply identify the value of f when its last argument is 0 with a fixed natural number even when n > 0. (We do this, e.g., in the definition of x^y below.)

The class of recursive functions has been proven equivalent to the class of Turing machine-computable functions, or roughly, those whose value a calculator or computer can in principle determine using a mechanical procedure given enough time. This may provide some intuitive insights as we continue our discussion of them.

Result: The functions below are primitive recursive.

(a) Addition: x + y. Definable by recursion:
x + 0 = U1¹(x) = x
x + (y + 1) = N(x + y)
(b) Multiplication: x · y. Recursion again:
x · 0 = Z(x) = 0
x · (y + 1) = (x · y) + x

(c) x to the power of y: x^y. Recursion:
x⁰ = C1¹(x) = 1
x^(y+1) = (x^y) · x
(d) Predecessor: δ(x). Recursion:
δ(0) = 0
δ(y + 1) = U1²(y, δ(y)) = y
(e) Subtract-as-much-as-you-can: x ∸ y. Recursion:
x ∸ 0 = x
x ∸ (y + 1) = δ(x ∸ y)
(f) Absolute difference: |x − y|. Substitution:
|x − y| = (x ∸ y) + (y ∸ x)
(g) Signum: sg(x). Substitution:
sg(x) = x ∸ δ(x)
(Yields 1 for everything except 0, for which it yields 0. This function and the next are very helpful in defining characteristic functions.)
(h) Reverse signum: s̄g(x). Substitution:
s̄g(x) = 1 ∸ sg(x)
(Yields 0 for everything except 0, for which it yields 1.)
(i) Factorial: x!. Recursion:
0! = 1
(y + 1)! = y! · (y + 1)
(j) Minimum of 2 arguments: min(x, y). Substitution:
min(x, y) = x ∸ (x ∸ y)
(k) For any n > 2, the minimum of n arguments, because each such function can be defined by substitution using the previous one:
min(x1, …, xn, xn+1) = min(min(x1, …, xn), xn+1)
(l) Maximum of 2 (or more) arguments:
max(x, y) = y + (x ∸ y)
max(x1, …, xn, xn+1) = max(max(x1, …, xn), xn+1)
(m) Remainder upon division of y by x: rm(x, y). Recursion:
rm(x, 0) = 0
rm(x, y + 1) = N(rm(x, y)) · sg(|x − N(rm(x, y))|)
(n) Quotient upon division of y by x: qt(x, y). (Rounded down.) Recursion:
qt(x, 0) = 0
qt(x, y + 1) = qt(x, y) + s̄g(|x − N(rm(x, y))|)

Remember that all the mathematical notation in the list above is the notation of ordinary mathematics. We have not shown how these functions could be represented in System S or any other axiomatic system built upon the predicate calculus (at least not yet, anyway).
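The recursion equations in this list are directly executable. A Python sketch (my own illustration, not part of the text), with a generic `recursion` combinator mirroring the recursion rule and items (a), (d)–(h), (m), and (n) defined from their equations:

```python
def recursion(g, h):
    """Build f with f(xs, 0) = g(xs) and f(xs, y+1) = h(xs, y, f(xs, y))."""
    def f(*args):
        *xs, y = args
        value = g(*xs)
        for i in range(y):
            value = h(*xs, i, value)
        return value
    return f

add   = recursion(lambda x: x, lambda x, y, prev: prev + 1)     # (a): N(x+y)
delta = recursion(lambda: 0,   lambda y, prev: y)               # (d): predecessor
monus = recursion(lambda x: x, lambda x, y, prev: delta(prev))  # (e): x "monus" y

def sg(x):     return monus(x, delta(x))                        # (g)
def sg_bar(x): return monus(1, sg(x))                           # (h)

def absdiff(x, y): return monus(x, y) + monus(y, x)             # (f)

# (m): rm(x, y+1) = N(rm(x, y)) * sg(|x - N(rm(x, y))|)
rm = recursion(lambda x: 0,
               lambda x, y, prev: (prev + 1) * sg(absdiff(x, prev + 1)))

def qt(x, y):                                                   # (n)
    q = 0
    for i in range(y):
        q += sg_bar(absdiff(x, rm(x, i) + 1))
    return q
```

For example, rm(3, 7) computes 1 and qt(3, 7) computes 2, using nothing beyond successor, predecessor, and the two signum functions.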

Bounded Sums and Products

The following notation:

Σ_{z<y} f(x1, …, xn, z)

stands for the (n + 1)-place bounded sum function g, whose value for ⟨x1, …, xn, y⟩ as argument is the sum of all the values of f for ⟨x1, …, xn, 0⟩ through ⟨x1, …, xn, y − 1⟩.

Result: If f is (primitive) recursive, then so is the bounded sum g, as explained above.

Proof:
The function g can be defined by recursion as follows:
g(x1, …, xn, 0) = 0
g(x1, …, xn, y + 1) = g(x1, …, xn, y) + f(x1, …, xn, y) ∎

Similar results follow for the form:

Σ_{z≤y} f(x1, …, xn, z)

This is definable by substitution, since:

Σ_{z≤y} f(x1, …, xn, z) = Σ_{z<y+1} f(x1, …, xn, z)

Similarly for doubly bounded sums:

Σ_{y<z<v} f(x1, …, xn, z) = Σ_{z<(v∸y)∸1} f(x1, …, xn, z + y + 1)

The following notation:

Π_{z<y} f(x1, …, xn, z)

stands for the (n + 1)-place bounded product function g, whose value for ⟨x1, …, xn, y⟩ as argument is the product of all the values of f for ⟨x1, …, xn, 0⟩ through ⟨x1, …, xn, y − 1⟩.

Result: If f is (primitive) recursive, so is the bounded product g, as characterized above.

Proof:
Again g can be defined by recursion, since:
g(x1, …, xn, 0) = 1
g(x1, …, xn, y + 1) = g(x1, …, xn, y) · f(x1, …, xn, y) ∎

Similar results follow for bounded products for all z ≤ y, as well as doubly bounded products (y < z < v).

Bounded sums can be used in clever ways to "scan" ranges of numbers and count those numbers with certain characteristics. For example, consider the tally function τ, whose value for x as argument is the number of factors of x less than or equal to x itself. This function can be defined as follows:

τ(x) = Σ_{z≤x} s̄g(rm(z, x))

This function scans the numbers up to and including x, and each time it encounters one the remainder of which is 0 when divided into x, the value of the reverse signum function is 1, and so one more is added to the bounded sum.

Relations and Recursion

Recall that a relation is said to be (primitive) recursive iff its characteristic function is a (primitive) recursive function.

Definition: The negation of number-theoretic relation R, viz., not-R, is the relation that holds of a given n-tuple of natural numbers ⟨k1, …, kn⟩ iff R does not hold of ⟨k1, …, kn⟩.

Definition: The conjunction of number-theoretic relations R and S, written R-and-S, is the relation that holds of ⟨k1, …, kn⟩ iff R and S both hold of ⟨k1, …, kn⟩.

Definition: The disjunction of two relations, written R-or-S, is the relation that holds for ⟨k1, …, kn⟩ iff either R or S holds for it.

(Similar terminology is used for other propositional connectives.)

If we think of relations set-theoretically, conjunctions are really intersections, disjunctions are really unions, and negations are really complements, etc.

Mendelson sometimes uses notation such as R ∨ S and ¬R for negations and disjunctions of relations. This notation can be misleading, because it is still part of the metalanguage, not the object language. Therefore, I use the English words.

Result: If R and S are (primitive) recursive relations, then so are their negations, conjunctions, disjunctions, and so on.

Proof:
By definition, if R and S are (primitive) recursive, their characteristic functions C_R and C_S are (primitive) recursive, in which case the characteristic functions of their negations and disjunctions can be defined as follows:
C_not-R(x1, …, xn) = s̄g(C_R(x1, …, xn))
C_R-or-S(x1, …, xn) = C_R(x1, …, xn) · C_S(x1, …, xn)
Other propositional operations on relations can be defined in terms of disjunction and negation. ∎

I use the notation:

∃z_{z<y} R(x1, …, xn, z)

in the metalanguage(!) to stand for the relation Q that holds for ⟨x1, …, xn, y⟩ iff the relation R holds for at least one ordered (n + 1)-tuple of the form ⟨x1, …, xn, z⟩ where z < y. (Mendelson writes instead (∃z)_{z<y} R(x1, …, xn, z) — but I find this too close to the notation used in the object language, and potentially confusing.)
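The characteristic-function constructions for negation and disjunction can be checked numerically. A Python sketch (my own illustration; C_even and C_small are hypothetical sample relations, not from the text):

```python
def sg(x):     return 1 if x > 0 else 0
def sg_bar(x): return 1 - sg(x)

def C_not(C_R):
    """Characteristic function of not-R: flip 0 and 1 with reverse signum."""
    return lambda *args: sg_bar(C_R(*args))

def C_or(C_R, C_S):
    """Characteristic function of R-or-S: the product is 0 iff either factor is 0."""
    return lambda *args: C_R(*args) * C_S(*args)

# Two illustrative one-place relations (0 = holds, per the convention here):
C_even  = lambda x: x % 2
C_small = lambda x: 0 if x < 10 else 1
```

So C_or(C_even, C_small)(12) is 0 (12 is even), while for 13 it is 1 (13 is neither even nor less than 10).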

Result: If R is a (primitive) recursive number-theoretic relation, so is the existentially quantified relation Q, as annotated above.

Proof:
The characteristic function for Q can be defined in terms of the characteristic function for R by substitution as follows:

C_Q(x1, …, xn, y) = Π_{z<y} C_R(x1, …, xn, z)

Note that if R holds for at least one ⟨x1, …, xn, z⟩ where z < y, then, for that ⟨x1, …, xn, z⟩, the value of the characteristic function will be 0, in which case the value of the bounded product will also be 0. If there is no such z, then the value of the characteristic function will always be 1, and so the bounded product will also yield 1 as value. ∎

Similar results follow for bounded existential quantifiers using ≤, doubly bounded existential quantifiers, etc.

The notation:

∀z_{z<y} R(x1, …, xn, z)

is used in the metalanguage to stand for the relation Q that holds for ⟨x1, …, xn, y⟩ just in case the relation R holds for all ordered (n + 1)-tuples of the form ⟨x1, …, xn, z⟩ where z < y.

Result: If number-theoretic relation R is (primitive) recursive, so then is the bounded universally quantified relation Q, as annotated above.

Proof:
C_Q(x1, …, xn, y) = sg(Σ_{z<y} C_R(x1, …, xn, z))

Here the bounded sum scans the values of z less than y and adds one whenever it finds one for which R does not hold for ⟨x1, …, xn, z⟩. Therefore, the value of the signum function is 0 iff this scan finds no such z.

(We could also have defined this using the bounded existential quantifier, since ∀z_{z<y} … is the same as not-∃z_{z<y} not-….) ∎

Similar results follow for bounded universal quantifiers using ≤, and doubly bounded ones, etc.

The notation:

μz_{z<y} R(x1, …, xn, z)

is used to stand for the function g whose value for ⟨x1, …, xn, y⟩ as argument is the least number z less than y for which the relation R holds for ⟨x1, …, xn, z⟩ if there is such a z, and whose value is y if there is no such z.

Result: If the relation R is (primitive) recursive, then so is the function g, defined by the bounded μ-operator above.

Proof:
Function g can be defined using the characteristic function of R by substitution as follows:

g(x1, …, xn, y) = Σ_{z<y} Π_{w≤z} C_R(x1, …, xn, w)

(As z increases, the bounded product will keep adding 1 to the bounded sum so long as R does not hold for any ⟨x1, …, xn, w⟩ where w ≤ z. As soon as a z is reached for which R does hold for some ⟨x1, …, xn, w⟩ where w ≤ z, the bounded product will stop adding to the bounded sum, and so the result will be identical to the least such z.) ∎

Note that because functions defined using the bounded μ-operator do not make use of the unbounded μ-operator, it is possible for such functions to be primitive recursive, not simply recursive, provided that C_R is primitive recursive.
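The sum-of-products formula for the bounded μ-operator can be executed directly. A Python sketch (my own illustration; the sample relation R(x, z), "z² ≥ x", is hypothetical):

```python
def bounded_mu(C_R):
    """Build g(x1,...,xn,y): the least z < y with R(x1,...,xn,z), else y,
    computed as: sum over z < y of (product over w <= z of C_R)."""
    def g(*args):
        *xs, y = args
        total = 0
        for z in range(y):
            prod = 1
            for w in range(z + 1):
                prod *= C_R(*xs, w)   # becomes 0 once R holds for some w <= z
            total += prod
        return total
    return g

# Sample relation: R(x, z) iff z*z >= x (characteristic value 0 = "holds").
C_R_sample = lambda x, z: 0 if z * z >= x else 1
least_root = bounded_mu(C_R_sample)
```

Here least_root(10, 100) returns 4, the least z with z² ≥ 10; and with the bound too small, least_root(10, 3) returns the bound 3 itself, exactly as the definition requires.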

Similar results follow for bounded μ-operators using ≤ instead of <, and doubly bounded μ-operators.

Result: The relations and functions listed below are primitive recursive.

1. The identity relation, since:
C=(x, y) = sg(|x − y|)
2. The relation of being less than:
C<(x, y) = s̄g(y ∸ x)
3. The relation of being evenly divisible by:
C|(x, y) = sg(rm(x, y))
4. The property of being prime:
CPr(x) = C=(τ(x), 2)
(Recall that τ(x) is the number of numbers less than or equal to x that evenly divide x. Remember that a number is prime iff it has exactly two such divisors, 1 and itself.)
5. The function px, whose value for x as argument is the xth prime number:
p0 = 2
py+1 = μz_{z≤(py)!+1}(py < z and Pr(z))
(The bound placed on z comes from Euclid's proof that there is no greatest prime number.)
6. The function (x)y, whose value is the exponent on the yth prime in the prime factorization of x, is primitive recursive:
(x)y = μz_{z<x}((py)^z | x and not-((py)^(z+1) | x))
This will return the least z such that (py)^z goes evenly into x but (py)^(z+1) does not.
7. The notation lh(x), read "the length of x", is used for the function that has as value the number of prime factors of x (i.e., the number of y's such that (x)y is not zero). This function is also primitive recursive. See the book for details.

It is a well-known mathematical result that every positive integer x has a unique prime factorization,

x = (p0)^a0 · (p1)^a1 · … · (pk)^ak

where a0, …, ak are the series of exponents on the first k + 1 primes, and pk is the largest prime that evenly divides x.

Result: If the functions g1, …, gm are all (primitive) recursive, and if the relations R1, …, Rm are all (primitive) recursive, then the function f whose value can be informally characterized as follows:

f(x1, …, xn) =
  g1(x1, …, xn)  if R1(x1, …, xn)
  ⋮
  gm(x1, …, xn)  if Rm(x1, …, xn)

is also (primitive) recursive.

Proof:
The above definition is equivalent to the following:
f(x1, …, xn) = (g1(x1, …, xn) · s̄g(CR1(x1, …, xn))) + … + (gm(x1, …, xn) · s̄g(CRm(x1, …, xn))) ∎

G. Number Sequence Encoding

Although this is somewhat counterintuitive, there are only denumerably many n-tuples of natural numbers for any positive integer n. The following chart shows one way of enumerating all ordered pairs of natural numbers:


⟨0,0⟩ 0    ⟨1,0⟩ 2    ⟨2,0⟩ 5    ⟨3,0⟩ 9    ⟨4,0⟩ 14   ⟨5,0⟩ 20
⟨0,1⟩ 1    ⟨1,1⟩ 4    ⟨2,1⟩ 8    ⟨3,1⟩ 13   ⟨4,1⟩ 19
⟨0,2⟩ 3    ⟨1,2⟩ 7    ⟨2,2⟩ 12   ⟨3,2⟩ 18
⟨0,3⟩ 6    ⟨1,3⟩ 11   ⟨2,3⟩ 17
⟨0,4⟩ 10   ⟨1,4⟩ 16
⟨0,5⟩ 15
⋮

We begin by enumerating all pairs whose elements add up to zero, then those whose elements add up to 1, then those that add up to 2, etc., in a systematic way by moving upwards along the diagonals. If this is continued ad infinitum, no ordered pair will be left out, and all natural numbers will be used.

Moreover, the 2-place function whose value, for a given x and y, is the natural number corresponding to ⟨x, y⟩ in this enumeration can be defined thus:

σ(x, y) = qt(2, x² + y² + 2xy + x + y) + x

This function is primitive recursive. So are the inverse functions σ1(z) and σ2(z), whose values for a given z are the first and second elements respectively of the corresponding ordered pair.

Similar sorts of mappings can be devised for enumerating all 3-tuples, 4-tuples, etc. Mendelson actually does these sorts of mappings in a slightly different way (less easy to put on a chart), and calls his function σ² instead of σ, and calls the inverse functions σ1² and σ2². He then proves that it's possible to define a function σᵏ similar to σ² for any k > 0, as well as corresponding inverse functions, and that all such functions can be shown to be primitive recursive. We will not have much call for this, as it is superseded by the uniform method below.

pa00 pa11 pa22 . . . pakk
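To make the encoding concrete, here is a small Python sketch of it (the helper names `primes`, `encode` and `element` are mine, not notation from the notes):

```python
def primes(n):
    """Return the first n primes by trial division (fine at this scale)."""
    ps, cand = [], 2
    while len(ps) < n:
        if all(cand % p != 0 for p in ps):
            ps.append(cand)
        cand += 1
    return ps

def encode(seq):
    """Encode a0, ..., ak as p0^a0 * p1^a1 * ... * pk^ak, with p0 = 2."""
    code = 1
    for p, a in zip(primes(len(seq)), seq):
        code *= p ** a
    return code

def element(x, i):
    """(x)_i: the exponent of the i-th prime in x, recovering a_i."""
    p = primes(i + 1)[-1]
    e = 0
    while x % p == 0:
        x, e = x // p, e + 1
    return e

print(encode([1, 2, 1]))                   # 2^1 * 3^2 * 5^1 = 90
print([element(90, i) for i in range(3)])  # [1, 2, 1]
```

Since codes of distinct sequences of positive integers have distinct prime factorizations, decoding by exponents is unambiguous.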


The result is a single positive integer: most likely, a very large one, but a single integer nonetheless. If we used the same method for encoding any different finite sequence of positive integers, the result would always be a different integer, because it would have a different prime factorization.

There are certain primitive recursive functions that are very helpful in working with and manipulating the numbers used for such encoding. Three of them, p_x, (x)_y and ℓh(x), have already been discussed.

The function (x)_y can be used to retrieve a given element of the sequence from the number used to encode it. For example, if x encodes the sequence a0, a1, a2, ..., ak, then we see that (x)_i = a_i for any 0 ≤ i ≤ k.

Suppose that x encodes the sequence a0, a1, a2, ..., ak. Suppose also that y encodes the sequence b0, b1, b2, ..., bj. Suppose that now we want to encode the sequence that puts the b-sequence after the a-sequence:

    a0, a1, a2, ..., ak, b0, b1, b2, ..., bj

Note that we cannot simply multiply x and y: this would lead to the number that encodes (a0 + b0), (a1 + b1), (a2 + b2), ..., etc. This is not what we want. Instead, we define the following function of x and y:

    x * y = x · ∏_{z<ℓh(y)} (p_{ℓh(x)+z})^{(y)_z}

The function x * y, as you can see, is primitive recursive, and is called the juxtaposition function, because it is used in juxtaposing sequences of positive integers. (They must be positive integers: if either sequence were to contain 0, the ℓh function won't return the correct sequence length.)

Do not be misled by the fact that a number of programming languages, software, etc., use the * sign for multiplication. That is not what the * sign used here means.

Besides its use in Gödel numbering, number sequence encoding can be used in the recursive definitions of certain functions that might not otherwise seem recursive.

Course-of-Values Recursion

Because a single number can be used to encode a finite sequence of numbers, it is possible to define a function whose value for y as argument encodes the sequence (course) of values of another function for all arguments leading up to and including y.

If f is an (n + 1)-place number-theoretic function, then the notation f# is used for the (n + 1)-place number-theoretic function whose value for ⟨x1, ..., xn, y⟩ is the number that encodes the series of values of f for all (n + 1)-tuples starting with ⟨x1, ..., xn, 0⟩ and ending with ⟨x1, ..., xn, y − 1⟩.

Result: A function f is (primitive) recursive iff f# is (primitive) recursive.

Proof:
We prove this in both directions.
(a) Suppose that f has already been shown to be (primitive) recursive. One can then obtain f# by substitution as follows:

    f#(x1, ..., xn, y) = ∏_{z<y} (p_z)^{f(x1, ..., xn, z)}

(b) On the other hand, suppose that f# has already been shown to be (primitive) recursive. One can then obtain f by substitution as follows:

    f(x1, ..., xn, y) = (f#(x1, ..., xn, y + 1))_y  ∎

Sometimes it is easier to define f# recursively than it is to define f, especially for a function whose value for a given number depends not only upon its value for the previous number, but upon more than one or even all of its prior values. Such functions are said to be obtained by course-of-values recursion, rather than simple recursion.

Example: Consider fib(x), whose value for any x is the xth item in the Fibonacci sequence, which adds the previous two members to get the next (starting with 1, 1):

    1, 1, 2, 3, 5, 8, 13, 21, 34, 55, ..., and so on.

This function cannot simply be defined by the recursion rule, because its value for y + 1 depends not only on its value for y but also on its value for y − 1.

However, fib# can be obtained by the simple recursion rule as follows:

    fib#(0) = 0
    fib#(y + 1) = sḡ(C<(y, 2)) · (4y + 2) + C<(y, 2) · fib#(y) · (p_y)^{(fib#(y))_{y−1} + (fib#(y))_{y−2}}

Since fib# is primitive recursive, and fib can be obtained from it by substitution:

    fib(x) = (fib#(x + 1))_x

The function fib is also primitive recursive.

Similar results follow for those relations one would want to define in a similar sort of way. Generally, we can say that if a function f is obtained from (primitive) recursive functions by course-of-values recursion, then f is (primitive) recursive itself. (A fuller proof is given in the book.) Course-of-values recursion is to simple recursion what strong induction is to weak induction.
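The course-of-values idea can be made concrete in a few lines: below, each new Fibonacci value is computed by reading the two previous values back out of the single code number, mirroring the definition of fib# above (a Python sketch; I start from code 1 for the empty sequence rather than the 0 used in the notes, since nothing downstream depends on that value, and the helper names are mine):

```python
def primes(n):
    """First n primes by trial division."""
    ps, cand = [], 2
    while len(ps) < n:
        if all(cand % p != 0 for p in ps):
            ps.append(cand)
        cand += 1
    return ps

def element(x, i):
    """(x)_i: exponent of the i-th prime in x's factorization."""
    p = primes(i + 1)[-1]
    e = 0
    while x % p == 0:
        x, e = x // p, e + 1
    return e

def fib_sharp(y):
    """Code number for the sequence fib(0), ..., fib(y - 1).
    Earlier values are consulted only through the code number itself."""
    code = 1
    for z, p in enumerate(primes(y)):
        if z < 2:
            v = 1
        else:
            v = element(code, z - 1) + element(code, z - 2)
        code *= p ** v
    return code

def fib(x):
    """Recover fib(x) from the course-of-values code, as in the notes."""
    return element(fib_sharp(x + 1), x)

print([fib(x) for x in range(10)])  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
print(fib_sharp(3))                 # 150 = 2^1 * 3^1 * 5^2, encoding 1, 1, 2
```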

Gödel's β-Function

Consider the following primitive recursive function, defined as follows:

    β(x1, x2, x3) = rm(1 + ((x3 + 1) · x2), x1)

Surprisingly, for any series of natural numbers with n + 1 members

    k0, k1, k2, ..., kn

one can find two fixed natural numbers b and c such that, for any i such that i ≤ n, β(b, c, i) = ki.

To see this, first, let c be (max(n, k0, k1, k2, ..., kn))!

Next, consider the following sequence:

    u0, u1, u2, ..., un

where each ui = 1 + ((i + 1) · c), for all i ≤ n. No two members of the u-series have a factor in common other than 1. (It is a matter of tedious arithmetic to show this.)

It follows from this and a known principle of modular arithmetic, the Chinese remainder theorem, that there is at least one number b such that the remainder upon division of b by ui is always ki for every i from 0 to n. (The proof of this is more tedious arithmetic.)

Therefore, because each ui is 1 + ((i + 1) · c), it follows that rm(1 + ((i + 1) · c), b) = ki, which is to say that β(b, c, i) = ki.

Example: Suppose that our k-sequence is simply: 1, 2, 1. Then n = 2 and max(2, 1, 2, 1) is also 2, and finally, c = 2!, which is also 2. Then, the u-series is: 3, 5, 7. (These numbers share no common factor.) It follows by the Chinese remainder theorem that there is at least one number b such that rm(3, b) = 1, and rm(5, b) = 2, and rm(7, b) = 1. (In this case, b could be 22, or 127, etc.) For this sequence, for all 0 ≤ i ≤ 2, β(22, 2, i) = ki.

The upshot of all this is that it provides yet another method of talking indirectly about sequences of numbers. Each sequence corresponds to a b and c, which, when combined together with the β-function, give us a way of retrieving elements in the sequence. Claims made about sequences of numbers can be transformed into claims made about the b and c that it would be appropriate to use as the first two arguments to the β-function for that sequence.

This will also help us prove that all recursive functions are representable in S.

For further discussion of the Chinese Remainder Theorem, see the book, pp. 184, 188, 419, and: http://www.cut-the-knot.org/blue/chinese.shtml

H. Representing Recursive Functions in System S

Our next task is to show that every recursive function is representable in System S. Recall that recursive functions are those obtained from the initial functions (the zero function, the successor function and the projection functions) by some finite number of applications of the substitution, recursion and choice of least rules.

Therefore, in order to prove our result, we need only show the following: (a) the initial functions are representable in S, (b) the rule of substitution preserves representability in S, (c) the rule of recursion preserves representability in S, and (d) the choice of least rule preserves representability in S.

On pp. 69–70, we showed how the initial functions could be (strongly) represented in S, and we also discussed why it is that the substitution rule preserves (strong) representability. Therefore, what's left is to show that the recursion and choice of least rules preserve representability. We begin with the easier of the two.
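As an aside before doing so, the β-function recipe above is easy to verify computationally (a sketch; `beta` computes rm(1 + (i + 1)·c, b), and the brute-force search for b simply stands in for the Chinese-remainder construction):

```python
from math import factorial

def beta(b, c, i):
    """Gödel's β: the remainder of b upon division by 1 + (i + 1)*c."""
    return b % (1 + (i + 1) * c)

def beta_params(ks):
    """Find (b, c) with beta(b, c, i) == ks[i] for all i, taking
    c = (max(n, k0, ..., kn))! and searching for the least such b."""
    n = len(ks) - 1
    c = factorial(max([n] + ks))
    b = 0
    while any(beta(b, c, i) != k for i, k in enumerate(ks)):
        b += 1
    return b, c

print(beta_params([1, 2, 1]))              # (22, 2), as in the example
print([beta(22, 2, i) for i in range(3)])  # [1, 2, 1]
```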

Result (Choice-of-least Lemma): The choice of least rule preserves representability in S: More precisely, if a given (n + 1)-place number-theoretic function g is representable in S, and f is an n-place number-theoretic function whose value for a given ⟨x1, ..., xn⟩ can be characterized as follows:

    f(x1, ..., xn) = the least natural number y such that g(x1, ..., xn, y) = 0

then f is also representable in S.

Proof:
1. Suppose that g is represented in S by the wff E[x1, ..., xn, xn+1, y]. By definition, then:
   (1a) if the value of g for ⟨k1, ..., kn, kn+1⟩ as argument is m, then ⊢S E[k1, ..., kn, kn+1, m]; and
   (1b) ⊢S (∃₁y) E[k1, ..., kn, kn+1, y].
2. We can represent f in S using the wff:
       E[x1, ..., xn, y, 0] ∧ (∀z)(z < y → ¬E[x1, ..., xn, z, 0])
3. We need to show that:
   (3a) If the value of f for ⟨k1, ..., kn⟩ is j, then ⊢S E[k1, ..., kn, j, 0] ∧ (∀z)(z < j → ¬E[k1, ..., kn, z, 0])
   (3b) ⊢S (∃₁y)(E[k1, ..., kn, y, 0] ∧ (∀z)(z < y → ¬E[k1, ..., kn, z, 0]))
4. To show (3a), first assume that the value of f for ⟨k1, ..., kn⟩ is j. Then j must be the least y such that g(k1, ..., kn, y) = 0. Hence, by (1a):
   (4a) ⊢S E[k1, ..., kn, j, 0]
   For any 0 ≤ i < j, g(k1, ..., kn, i) is something other than 0, so, by (1a) and (1b), we can get:
   (4b) ⊢S ¬E[k1, ..., kn, 0, 0] ∧ ¬E[k1, ..., kn, 1, 0] ∧ ... ∧ ¬E[k1, ..., kn, j − 1, 0]
   By a corollary proven on p. 66, it follows from (4b) that:
   (4c) ⊢S (∀z)(z < j → ¬E[k1, ..., kn, z, 0])
   Conjoining (4a) with (4c), we get (3a).
5. To get (3b), since we have (3a), we need only prove uniqueness. We first make an assumption:
   (5a) E[k1, ..., kn, y, 0] ∧ (∀z)(z < y → ¬E[k1, ..., kn, z, 0])
   By the theorem (Order), we have that:
   (5b) ⊢S y = j ∨ y < j ∨ y > j.
   However, the second and third disjuncts of (5b) lead to contradictions with (5a), (4a) and (4c), which leaves only the first, and so, by DT:
   (5c) ⊢S E[k1, ..., kn, y, 0] ∧ (∀z)(z < y → ¬E[k1, ..., kn, z, 0]) → y = j
   By Gen, (3a), EG, and exercise 2.70e, we get (3b).  ∎

Result (Recursion Lemma): The recursion rule preserves representability in S: More precisely, if a given n-place number-theoretic function g is representable in S, and a given (n + 2)-place number-theoretic function h is also representable in S, and f is an (n + 1)-place number-theoretic function for which it is true for all x1, ..., xn and y that:
   (i) f(x1, ..., xn, 0) = g(x1, ..., xn)
   (ii) f(x1, ..., xn, y + 1) = h(x1, ..., xn, y, f(x1, ..., xn, y))
then f is also representable in S.

Proof:
(This proof is very complex, perhaps the most complex single proof of the semester. It is given in detail in the book. I'm not going to try to recreate all the details here, but will merely give a rough outline.)
1. Suppose that f is obtained from g and h recursively as suggested above. Then, if m is the value of f for some ⟨x1, ..., xn, y⟩, there must be some finite sequence,

    v0, v1, ..., vy

(the course-of-values of f for all arguments leading up to and including y) where vy is m, and also: v0 = g(x1, ..., xn), and for all 0 ≤ i < y,

    v_{i+1} = h(x1, ..., xn, i, v_i)

E.g., if f(x, y) is x^y, the v-sequence would be:

    1, x, x², x³, ..., x^y

I.e., C₁¹(x), C₁¹(x) · x, (C₁¹(x) · x) · x, etc.

2. However, talk about any finite sequence can be proxied using Gödel's β-function, i.e.:

       β(x1, x2, x3) = rm(1 + ((x3 + 1) · x2), x1)

   This can be strongly represented in S by the wff:

       (∃z)(x1 = ((1 + ((x3 + 1) · x2)) · z) + y ∧ y < 1 + ((x3 + 1) · x2))

   Hereafter we'll use Bt[x1, x2, x3, y] as shorthand for the above. The proof that this wff strongly represents the β function comes easily from (UQR) and the expressibility of the less-than relation by x1 < x2.
3. We can then use Bt[x1, x2, x3, y] to construct statements in S that make assertions about finite sequences, and in particular, those finite sequences that correspond to partial courses-of-values of recursive functions for all arguments up to a given point. This is all we need to represent such functions.
4. We supposed that g and h are representable in S. Suppose that the wff that represents g is:

       A[x1, ..., xn, y]

   And suppose that the wff that represents h is:

       E[x1, ..., xn, xn+1, xn+2, y]

   By definition, then, for any natural numbers k1, ..., kn, kn+1, kn+2 and m:
   (4a) if the value of g for ⟨k1, ..., kn⟩ is m, then ⊢S A[k1, ..., kn, m];
   (4b) ⊢S (∃₁y) A[k1, ..., kn, y];
   (4c) if the value of h for ⟨k1, ..., kn, kn+1, kn+2⟩ is m, then ⊢S E[k1, ..., kn, kn+1, kn+2, m];
   (4d) ⊢S (∃₁y) E[k1, ..., kn, kn+1, kn+2, y].
5. Given the above, it is possible to represent f with the following wff:

       (∃z1)(∃z2)((∃y2)(Bt[z1, z2, 0, y2] ∧ A[x1, ..., xn, y2]) ∧ Bt[z1, z2, xn+1, y] ∧ (∀z3)(z3 < xn+1 → (∃y3)(∃y4)(Bt[z1, z2, z3, y3] ∧ Bt[z1, z2, z3′, y4] ∧ E[x1, ..., xn, z3, y3, y4])))

   Ugly! What on earth does this say?! What we want it to say is that y is the value of the recursive function f for ⟨x1, ..., xn, xn+1⟩ as argument. Does it?

   Remember that the β function is used to talk indirectly about finite sequences. Because each finite sequence corresponds to a fixed b and c such that β(b, c, i) is always the ith member of the sequence, quantification over sequences can in effect be done by quantifying over two numbers. The existential quantification over z1 and z2 at the start of this wff in effect says "there is a finite sequence such that . . .".

   Given that Bt[. . .] represents the β function, and A[. . .] represents g, the first conjunct on the inside says that there is a y2 at the start (0-spot) of the sequence, and it's the value of g for ⟨x1, ..., xn⟩. This basically says how the sequence of values of the recursive function begins.

   Next, it says that y is at the xn+1-spot of the sequence of values, which is to be expected if y is the value of f when f's last argument is xn+1.

   Lastly, given that E[. . .] represents h, it says that for each previous spot in the sequence (the z3-spot, where z3 < xn+1), the member of the sequence at the next spot (y4) is obtained from the member at the z3-spot (y3) in the appropriate way from the h function.
6. This will (hopefully) be much clearer with an example. With x1^x2, the functions used in its recursive definition are the constant function whose value is always 1 (this plays the role of g) and

multiplication (this plays the role of h). Making some minor simplifications, these are represented in S by the wffs y = 1 and y = x1 · x2. According to the above recipe, the function x1^x2 is represented by the following wff:

       (∃z1)(∃z2)((∃y2)(Bt[z1, z2, 0, y2] ∧ y2 = 1) ∧ Bt[z1, z2, x2, y] ∧ (∀z3)(z3 < x2 → (∃y3)(∃y4)(Bt[z1, z2, z3, y3] ∧ Bt[z1, z2, z3′, y4] ∧ y4 = y3 · x1)))

   This says that there is a sequence of natural numbers with x2 + 1 members, the first of which is 1, the last of which is y, and each one relates to the previous one by being its product when multiplied by x1. With some thought, it is clear that this is the case if and only if y = x1^x2.
7. Of course, we still need to prove that the resulting wff satisfies the conditions for representing f, i.e., we need to show that:
   (7a) If the value of f for ⟨k1, ..., kn, kn+1⟩ as argument is m, then:
        ⊢S (∃z1)(∃z2)((∃y2)(Bt[z1, z2, 0, y2] ∧ A[k1, ..., kn, y2]) ∧ Bt[z1, z2, kn+1, m] ∧ (∀z3)(z3 < kn+1 → (∃y3)(∃y4)(Bt[z1, z2, z3, y3] ∧ Bt[z1, z2, z3′, y4] ∧ E[k1, ..., kn, z3, y3, y4]))), and
   (7b) ⊢S (∃₁y)(∃z1)(∃z2)((∃y2)(Bt[z1, z2, 0, y2] ∧ A[k1, ..., kn, y2]) ∧ Bt[z1, z2, kn+1, y] ∧ (∀z3)(z3 < kn+1 → (∃y3)(∃y4)(Bt[z1, z2, z3, y3] ∧ Bt[z1, z2, z3′, y4] ∧ E[k1, ..., kn, z3, y3, y4])))
8. Sigh. We don't have time. The result follows from the nature of the recursive definition of f in terms of g and h, as well as the representability of g, h and the β function by A[. . .], E[. . .], and Bt[. . .], respectively. The full proof is given in the book.
9. Let us content ourselves with an example. 1³ is 1. Hence we should have:
   (9a) ⊢S (∃z1)(∃z2)((∃y2)(Bt[z1, z2, 0, y2] ∧ y2 = 1) ∧ Bt[z1, z2, 3, 1] ∧ (∀z3)(z3 < 3 → (∃y3)(∃y4)(Bt[z1, z2, z3, y3] ∧ Bt[z1, z2, z3′, y4] ∧ y4 = y3 · 1)))
   The sequence 1, 1, 1, 1 is encoded using the β function with b = 1 and c = 6, and we have:
   (9b) ⊢S Bt[1, 6, 0, 1]
   (9c) ⊢S Bt[1, 6, 1, 1]
   (9d) ⊢S Bt[1, 6, 2, 1]
   (9e) ⊢S Bt[1, 6, 3, 1]
   (9f) ⊢S 1 = 1
   (9g) ⊢S 1 = 1 · 1
   The theorem (9a) follows from these theorems along with theorems proven on the ordering handout, and existential generalization.
10. Similar results will follow for any other arguments to x1^x2. Certain other results are needed for (7b) but they can be proved in similar fashion.  ∎

Result: Every recursive function is representable in System S.

Proof:
The initial functions are all representable in System S, and whatever can be obtained from functions representable in S by the rules of substitution, recursion and choice of least is also representable in S. It follows by the definition of a recursive function that all recursive functions are representable in S.  ∎

Corollary: All primitive recursive number-theoretic functions are representable in S.

Proof:
It follows from the definitions of primitive recursive and recursive functions that all primitive recursive functions are recursive.  ∎
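The arithmetic behind step 9 is easy to check: with b = 1 and c = 6, the β-function returns 1 at every position, so ⟨1, 1, 1, 1⟩ really is retrievable as the course-of-values of 1³ (a quick Python check; the function name is mine):

```python
def beta(b, c, i):
    """Gödel's β: remainder of b upon division by 1 + (i + 1)*c."""
    return b % (1 + (i + 1) * c)

# The moduli 1 + (i + 1)*6 are 7, 13, 19, 25, and 1 leaves remainder 1
# on division by each of them, so every spot of the sequence holds 1.
print([beta(1, 6, i) for i in range(4)])  # [1, 1, 1, 1]
```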

Corollary: All recursive number-theoretic relations are expressible in System S, including all primitive recursive ones.

Proof:
By definition, their characteristic functions are recursive, and hence their characteristic functions are representable in S. We have already established that whenever a relation's characteristic function is representable in a given theory with identity, the relation is expressible in that theory.  ∎

This gives us an intuitive sense of the strength of system S; more or less, it has as theorems the appropriate arithmetical results regarding all recursive functions and relations, i.e., those that can in principle be calculated by a mechanical procedure by computer, calculator or similar device.

UNIT 4
GÖDEL'S RESULTS AND THEIR COROLLARIES

A. The System ⊙

We normally think of the wffs in a logical system as having meaning, or at least as having a meaning given an interpretation, such as the standard interpretation for System S. However, it is possible to think of an axiomatic system as just a system of rules for manipulating syntactic strings. Consider the following simple system for manipulating strings of symbols:

Syntax

The basic syntactic units are the signs □ and ○. A formula is any string of one or more of these two signs, such as: ○, □○, ○□○○□ or □○○□○□. A well-formed formula (wff) is any formula that begins with □. So □○□○○□, □□□□, □○○, and □○□ are all wffs. However, ○○□○□ and ○□□○□ are not wffs.

Semantics

The wffs of system ⊙ do not have any intended interpretation or meaning. (This is not to say that they cannot be interpreted as having a meaning, however.) The system is only intended to be a game of string manipulation for the very easily amused.

Axiomatization

The system has one axiom: □
The system has one inference rule:
    add circle: if A is a wff, from A, infer A○.
A theorem is any wff that can be derived from the axiom by some finite number of applications of the inference rule. Hence, the following are theorems:
    □○
    □○○
    □○○○
    □○○○○
    and so on . . .

Metatheory

Schmödel Numbering

Since the system has no intended meaning, notions such as completeness and soundness do not apply. However, this does not mean that we cannot prove anything about it. We can prove, e.g., that not every wff is a theorem, etc.

Metalogical results for System ⊙ can be made simpler by coordinating every wff with its Schmödel number. Schmödel numbering is much easier than Gödel numbering. To get the Schmödel number of a string of signs for ⊙, simply replace every □ with the digit 1 and every ○ with the digit 0, and think of the result as a numeral written in binary notation. Let the Schmödel number of the wff be the number that this binary numeral signifies.
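To make Schmödel numbering concrete, here is a short sketch (the □/○ glyphs follow the rendering used in these notes, and the function names are mine):

```python
def schmodel(s):
    """Schmödel number: read the sign □ as the binary digit 1 and ○ as 0."""
    return int(s.replace("□", "1").replace("○", "0"), 2)

def is_theorem(wff):
    """Theorems are exactly the wffs whose Schmödel number is a power of 2:
    the axiom □ gets 1, and 'add circle' appends a binary 0, doubling it."""
    n = schmodel(wff)
    return n > 0 and n & (n - 1) == 0

print(schmodel("□○○□○"))     # 18  (10010 in binary)
print(is_theorem("□○○○○○"))  # True:  32 is a power of 2
print(is_theorem("□○○□○"))   # False: 18 is not
```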

Examples: Hence the Schmödel number of □○○□○ is 18 (10010 in binary), and the Schmödel number of □○○○○○ is 32 (100000 in binary).

Result: The following results hold of ⊙.
(a) No two wffs of ⊙ have the same Schmödel number.
(b) The number 0 is the only natural number that is not the Schmödel number of a wff of ⊙.
(c) The number 1 is the only number that is a Schmödel number of an axiom of ⊙.
(d) If n and m are Schmödel numbers of wffs of ⊙, then the wff corresponding to m follows from the wff corresponding to n by add circle iff m = 2n.
(e) A natural number n is the Schmödel number of a theorem of ⊙ iff it is a power of 2.

(The proofs of these results are fairly obvious.)

We might even say this: although they had no intended meaning, it is possible to think of the wffs of ⊙ as simply standing for numbers, and it is possible to think of all the metatheoretic properties and relations of wffs of ⊙ as being number-theoretic properties/relations. The 1-place relation (property) of being a theorem of ⊙ corresponds fully to the number-theoretic property of being a power of 2.

System S as Metalanguage

Because all the number-theoretic properties and relations one would need to do metatheory for ⊙ are recursive, it turns out that System S could be used as a metalanguage for System ⊙. For example, "x is a power of 2" can be expressed in S using the wff:

    x = 1 ∨ (2|x ∧ (∀y)(y > 2 ∧ ¬2|y → ¬y|x))

This says that x = 1, or 2 divides x and nothing odd above 2 divides evenly into x. In effect, we can prove that □○○○○○ is a theorem of ⊙ in S, since:

    ⊢S 32 = 1 ∨ (2|32 ∧ (∀y)(y > 2 ∧ ¬2|y → ¬y|32))

Similarly, we can prove in S that □○○○○□ is not a theorem of ⊙, since:

    ⊢S ¬(33 = 1 ∨ (2|33 ∧ (∀y)(y > 2 ∧ ¬2|y → ¬y|33)))

Because all recursive relations are expressible in S, in effect, all metatheory for ⊙ could be done in S rather than English. The numerals of S in effect act as its names for wffs of ⊙.

B. System S as its Own Metalanguage

You've probably guessed what's coming. System S can partially act as its own metalanguage as well, because every wff of System S corresponds to a Gödel number, and many (though not all) metatheoretic properties and relations of wffs of S correspond to recursive number-theoretic properties and relations of their Gödel numbers.

Since all recursive number-theoretic relations are expressible in S, System S can in effect be used to say, and even be used to prove, many things about itself. The result is a strange collapse of the metalanguage into the object language.

Although much more difficult to characterize than their counterparts for system ⊙, the following number-theoretic relations are primitive recursive, and therefore can be captured in S:
    being the Gödel number of a wff of S;
    being the Gödel number of an axiom of S;
    being the Gödel number of a wff that follows by MP from wffs whose Gödel numbers are n and m;
    being the Gödel number of a wff following by Gen from a wff with Gödel number n;

    being a number that encodes a finite sequence of Gödel numbers whose corresponding wffs, in order, constitute a proof of the wff with Gödel number n; etc.

These properties and relations are entirely arithmetical in nature, just like being a power of 2 is entirely arithmetical in nature.

Gödelian Results

Gödel found a trick to make it possible, for any system that can do enough mathematics to express recursive properties and relations, to construct a closed wff written entirely in the language of that system that in effect says that its own Gödel number is not the Gödel number of a theorem of that system. It then follows that if the system is consistent, it cannot be complete.

1. Suppose that for System S the wff in question is abbreviated as G. Note that since G is built up entirely in the syntax of S, it is really a mathematical statement, involving only 0, ′, +, ·, =, variables, and the logical constants. Note that:
   (1a) G is true in the standard interpretation for S iff not-⊢S G.
   (1b) ¬G is true in the standard interpretation iff ⊢S G.
2. Suppose for reductio that both:
   (2a) System S is consistent, i.e., there is no wff A such that ⊢S A and ⊢S ¬A.
   (2b) System S is complete, i.e., for all wffs A, if A is true in the standard interpretation, then ⊢S A.
3. It follows that G is not true in the standard interpretation. If it were true, by (2b) it would be a theorem, but by (1a) it would also not be a theorem, which is impossible.
4. Since G is closed, and it is not true in the standard interpretation, ¬G is true in the standard interpretation. It then follows by (1b) that ⊢S G, but it follows from (2b) that ⊢S ¬G. Hence, S is inconsistent, which contradicts (2a).
5. Because S appears to be consistent, we must conclude that it is incomplete.

Note that this means that there are purely arithmetical truths written in the syntax of System S that cannot be derived within System S. These involve only 0, ′, +, ·, =. So they are truths of the natural numbers, i.e., of number theory. Peano arithmetic, therefore, fails as a complete axiomatization of number theory.

The defect is not localized to System S. Any axiomatic system for mathematics of which it is true that: (i) all recursive relations are expressible in it, (ii) it has an arithmetizable syntax (its wffs can be Gödel-numbered), and (iii) the relation that holds between m and n just in case m encodes a sequence of wffs of the system that constitutes a proof of the wff of which n is the Gödel number is a recursive relation, either is inconsistent, or fails to capture all truths of number theory, for similar reasons. Adding more axioms and/or inference rules will not help; this will simply change the recursive properties and relations involved, but there will still exist unprovable truths.

In fact, Gödel himself first proved his results not for a first-order system like S, but for higher-order logics similar to Whitehead and Russell's Principia Mathematica, in his classic 1931 paper, "Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme" ("On Formally Undecidable Propositions of Principia Mathematica and Related Systems").

Could we just give up one of (i) through (iii)? None are promising. Obviously, no axiomatization for mathematics that couldn't express recursive relations could be adequate. It is not known how to construct a syntax that is not arithmetizable but is still learnable, and similarly it is not known how to construct an axiom system that is learnable and useable in practice but in which the relation mentioned above would not be recursive. (These are widely believed to be impossible.)

The upshot of this: It is impossible to capture all arithmetical truths within a learnable axiomatic system.

Over the next few weeks, we'll be looking at these and similar results, more precisely, and in more detail. This handout is only meant as a rough sketch, and is somewhat crude and oversimplified.

The underlying idea of arithmetizing metalogic has also made possible bringing to bear the full array of mathematical knowledge to issues in metalogic, which has led to many other interesting results besides Gödel's.

C. Arithmetization of Syntax

We start with the process of Gödel numbering; note that because of differences in the way I originally laid out the syntax and the way Mendelson did, there are some very subtle but unimportant differences in our way of doing Gödel numbering below.

Gödel Numbers for Simple Signs

Every formula constructed in the syntax of (first-order) predicate logic is built up from the following simple signs: (, ,, ), ¬, →, ∀, as well as the individual constants, variables, predicate letters and function letters. The process of Gödel numbering begins by defining a function g that assigns to each simple sign a different odd positive integer:

1. Firstly, we let
       g(() = 3,
       g()) = 5,
       g(,) = 7,
       g(¬) = 9,
       g(→) = 11, and
       g(∀) = 13.
2. If c is a constant, and n is the number of its subscript (if c has no subscript, then n = 0), then depending on which letter of the alphabet is used, let k be either 1, 2, 3, 4 or 5 (1 for a, 2 for b, etc.), and let g(c) = 7 + 8(5n + k).
3. If x is a variable, and n is the number of its subscript, then depending on which letter of the alphabet is used, let k be either 1, 2, or 3 (1 for x, 2 for y and 3 for z), and let g(x) = 13 + 8(3n + k).
4. If F is a function letter, and n is the number of its subscript, and m is the number of its superscript, then depending on which letter of the alphabet is used (f through l), let k be one of 1 through 7, and let g(F) = 1 + 8(2^m · 3^(7n+k)).
5. If P is a predicate letter, and n is the number of its subscript and m is the number of its superscript, then depending on which letter of the alphabet is used (A through T), let k be one of 1 through 20, and let g(P) = 3 + 8(2^m · 3^(20n+k)).

Gödel Numbers for Strings

Each string of simple symbols built from these simple signs can then be correlated with a finite sequence of the above numbers. This includes both well-formed and ill-formed formulas, and function terms. Hence, the wff I²(a, a) (i.e., a = a) is correlated with:

    629859, 3, 15, 7, 15, 5

1. We can then extend the notion of Gödel numbering to cover strings by coordinating each formula with the number that encodes the sequence of numbers of its simple symbols in order, so:

    g(I²(a, a)) = 2^629859 · 3^3 · 5^15 · 7^7 · 11^15 · 13^5

2. The Gödel numbers of strings of symbols do not overlap with Gödel numbers of simple symbols, since the latter are always odd, and the former are always even (all have 2 in their prime factorization).
3. Note that we must distinguish between the Gödel number of the simple symbol ( and the Gödel number of the one-character-long string (. The former is 3; the latter is 2³, i.e., 8.

Gödel Numbers for Sequences of Formulas

1. Each finite sequence of formulas or other strings (e.g., a proof) can be correlated with a finite sequence of Gödel numbers. E.g., if our sequence of wffs is:

    A0, A1, ..., Ak

This can be correlated with the sequence:

    g(A0), g(A1), ..., g(Ak)

We can then extend the notion of Gödel numbering to cover such sequences by using the

numbers that encode the sequences of their Recursive Syntax Arithmetization


Gdel numbers. Hence for the above, we have:
Different first-order languages use different cong(Ak )
g(A0 ) g(A1 )
g(A0 , A1 , . . . , Ak ) = 2
3
. . . pk
stants, function-letters and predicate-letters. E.g.,
System S only has the constant a, the predicate
2. Similarly, Gdel numbers of sequences of for- letter I 2 and function letters f 1 , f 2 , f 2 (i.e.,
1
2
mulas also do not overlap with Gdel numbers 0, =, 0 , + and ). System PF, however, allows any
of singular formulas. While both are always constant, function-letter or predicate letter. The
even, in their prime factorizations, in the latter, pure predicate-calculus (PP) is just like PF except
2 is always raised to an odd power, and in the that it has no constants or function-letters.
former, 2 is always raised to an even power.
3. Also we must distinguish between the Definition: A theory K is said to have a (primGdel number of a formula itself, and the itive) recursive vocabulary iff the following
Gdel number of a one-membered wff- number-theoretic properties are (primitive) recursive:
sequence. The Gdel number of I 2 (a, a) is
(a) IC(x): x is the Gdel number of a constant used
2629859 33 515 77 1115 135 , but the Gdel number of
(allowed) in K.
sequence consisting of this formula alone is 2
(b)
FL(x): x is the Gdel number of a functionraised to the power of 2629859 33 515 77 1115 135 .
letter used (allowed) in K.
(c) PL(x): x is the Gdel number of a predicateWorking Backwards
letter used (allowed) in K.
Working Backwards

Not only do we know the algorithm for determining the Gödel number of some expression of predicate logic, there is also a fairly simple algorithm for working in the reverse direction: i.e., given a natural number, determining what, if anything, that number Gödelizes.

Odd numbers below 15 are obvious; all other odd numbers represent either function-letters, predicate-letters, variables or constants, depending on whether the remainder is 1, 3, 5 or 7, respectively, when divided by 8.

For an even number, you must determine its prime factorization and then work from there.

Examples:

77 is odd. Its remainder when divided by 8 is 5. Hence it is a variable. 77 = 13 + 64 = 13 + (8 · 8) = 13 + (8 · ((3 · 2) + 2)). Decoding, we see that this is the number of the variable y₂.

The prime factorization of 4,060,435,238,092,800,000,000,000,000,000,000 is 2^51 · 3^3 · 5^23 · 7^5, which corresponds to the wff A₁¹(b).

A logically unimportant point of trivia: all Gödel numbers of wffs are evenly divisible by 1000 or a higher power of 10. Do you see why?
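The reverse algorithm can be sketched the same way. The category table below encodes the mod-8 convention just described; classify_odd and exponents are illustrative names of mine, and the handling of odd codes below 15 is a rough stand-in for the punctuation and connective signs.

```python
# Sketch of "working backwards": classify an odd code by its remainder
# mod 8, and factor an even code back into the Gödel numbers of the
# successive symbols it encodes.

def classify_odd(n):
    """What kind of symbol an odd Gödel number stands for."""
    assert n % 2 == 1
    if n < 15:
        return "punctuation or connective"
    return {1: "function-letter", 3: "predicate-letter",
            5: "variable", 7: "constant"}[n % 8]

def exponents(n):
    """Exponents in the prime factorization of an even code, i.e. the
    symbol numbers of the string it encodes."""
    exps = []
    p = 2
    while n > 1:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        exps.append(e)
        p += 1                                   # advance to next prime
        while any(p % q == 0 for q in range(2, p)):
            p += 1
    return exps

assert classify_odd(77) == "variable"         # y2, as in the example
assert classify_odd(15) == "constant"         # the constant a
assert classify_odd(49) == "function-letter"  # the successor sign
assert exponents(2**51 * 3**3 * 5**23 * 7**5) == [51, 3, 23, 5]
```

The last assertion recovers the symbol string of the second example (the prime-power exponents are computed from the factorization directly, sidestepping the long decimal numeral).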

Result: Systems S, PF and PP all have primitive recursive vocabularies. For S, e.g., IC(x) is the property x has iff x = 15; for PF, IC(x) is the property x has iff (∃y)y<x(x = 7 + 8y); for PP, IC(x) is the property x has iff x ≠ x (i.e., the empty set).

Indeed, it is difficult to imagine a theory without a recursive vocabulary. For such a theory, there would be no effective method to determine whether a given symbol (e.g., b₃₁₂) was allowable or not!

Result: The following property is primitive recursive:
Vbl(x): x is the Gödel number of a variable.

Proof:
All standard first-order theories use all variables, and so Vbl(x) iff (∃y)y<x(x = 21 + 8y), and the latter is primitive recursive. ∎

Result: For any theory with a (primitive) recursive vocabulary, the number-theoretic properties, relations and functions listed below are (primitive) recursive.

(For some, I give the arithmetical formulas characterizing them in the metalanguage; for the rest, consult the book. For the most part, they can be recursively characterized fairly easily with the functions used to do the encoding, especially the concatenation function ⋆. A rare few involve course-of-values recursion or the like.)

(a) (property) EVbl(x): x is the Gödel number of a single-symbol string consisting of a variable alone, i.e.: (∃y)y<x(Vbl(y) and x = 2^y).
(b) (property) EIC(x): x is the Gödel number of a single-symbol string consisting of a constant alone, i.e.: (∃y)y<x(IC(y) and x = 2^y).
(c) (property) EFL(x): x is the Gödel number of a single-symbol string consisting of a function-letter alone, i.e.: (∃y)y<x(FL(y) and x = 2^y).
(d) (property) EPL(x): x is the Gödel number of a single-symbol string consisting of a predicate-letter alone, i.e.: (∃y)y<x(PL(y) and x = 2^y).
(e) (function) ArgT(x): the superscript on the function-letter with Gödel number x.
(f) (function) ArgP(x): the superscript on the predicate-letter with Gödel number x.
(g) (property) Gd(x): x is the Gödel number of any string of signs allowed in the theory.
(h) (property) Trm(x): x is the Gödel number of a term of the theory.
(i) (property) Atfml(x): x is the Gödel number of an atomic formula of the theory.
(j) (property) Fml(x): x is the Gödel number of a wff of the theory. Actually, Mendelson's definition is wrong, even for his own notation, but we can put it as:

Fml(x) iff Atfml(x) or
(∃y)y<x(Fml(y) and x = 2^9 ⋆ y) or
(∃y)y<x(∃z)z<x((Fml(y) and Fml(z) and x = 2^3 ⋆ y ⋆ 2^11 ⋆ z ⋆ 2^5) or
(Fml(y) and EVbl(z) and x = 2^3 ⋆ 2^3 ⋆ 2^13 ⋆ z ⋆ 2^5 ⋆ y ⋆ 2^5))

I.e., x is either the Gödel number of an atomic formula, or there is a lower number y that is the Gödel number of a wff A and x is the Gödel number of ¬A, or there are two lower numbers y and z such that either y and z are the Gödel numbers of A and B and x is the Gödel number of (A → B), or y is the Gödel number of a wff A and z is the Gödel number of a variable v and x is the Gödel number of ((∀v)A).

(The error was pointed out to me by a student in Korea who did not give his or her name. The correction is mine.)

(k) (relation) MP(x, y, z): z is the Gödel number of a wff that follows by MP from the wffs whose Gödel numbers are x and y, i.e.: (Fml(x) and Fml(y) and Fml(z)) and (either (y = 2^3 ⋆ x ⋆ 2^11 ⋆ z ⋆ 2^5) or (x = 2^3 ⋆ y ⋆ 2^11 ⋆ z ⋆ 2^5)). (I.e., x, y and z correspond respectively to wffs A, B and C, and A is (B → C) or B is (A → C).)
(l) (relation) Gen(x, y): y is the Gödel number of a wff that follows by Gen from the wff whose Gödel number is x.
(m) (function) Sub(x, y, z): the Gödel number of what results from substituting the term with Gödel number y for all free occurrences of the variable with Gödel number z in the wff with Gödel number x.
(n) (relation) Fr(x, y): x is the Gödel number of a wff that contains free occurrences of the variable with Gödel number y.
(o) (relation) Ff(x, y, z): x is the Gödel number of a term that is free for the variable with Gödel number y in the wff with Gödel number z.
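The concatenation function ⋆ does most of the work in these clauses. A sketch, assuming the connective codes used throughout (¬ = 9, ( = 3, → = 11, ) = 5); encode, decode, conc, neg and cond are my own illustrative names:

```python
# Sketch of the concatenation function ⋆ on Gödel numbers of strings,
# and of Neg and Cond from items (s) and (t) below. The helpers
# encode/decode convert between symbol-code lists and prime-power codes.

def next_prime(p):
    p += 1
    while any(p % q == 0 for q in range(2, p)):
        p += 1
    return p

def encode(codes):
    out, p = 1, 2
    for s in codes:
        out *= p ** s
        p = next_prime(p)
    return out

def decode(n):
    codes, p = [], 2
    while n > 1:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        codes.append(e)
        p = next_prime(p)
    return codes

def conc(x, y):
    """x ⋆ y: the code of the string encoded by x followed by the one
    encoded by y (the primes in y's code get shifted past x's)."""
    return encode(decode(x) + decode(y))

def neg(x):
    """Neg(x) = 2^9 ⋆ x: prefix the negation sign (code 9)."""
    return conc(2**9, x)

def cond(x, y):
    """Cond(x, y): the code of (A -> B), i.e. 2^3 ⋆ x ⋆ 2^11 ⋆ y ⋆ 2^5."""
    return conc(2**3, conc(x, conc(2**11, conc(y, 2**5))))

atom = encode([3, 15, 5])             # a toy three-symbol string ( a )
assert decode(neg(atom)) == [9, 3, 15, 5]
assert decode(cond(atom, atom)) == [3, 3, 15, 5, 11, 3, 15, 5, 5]
```

Defined this way, ⋆ is plainly (primitive) recursive, which is why the clauses above can freely paste strings together with it.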

(p) (property) AxA1(x): x is the Gödel number of an instance of axiom schema (A1), i.e.: (∃y)y<x(∃z)z<x(Fml(y) and Fml(z) and x = 2^3 ⋆ y ⋆ 2^11 ⋆ 2^3 ⋆ z ⋆ 2^11 ⋆ y ⋆ 2^5 ⋆ 2^5).
(q) (properties) AxA2(x), AxA3(x), AxA4(x), AxA5(x), etc., are characterized similarly.
(r) (property) LAX(x): x is the Gödel number of an instance of one of (A1)–(A5) (a logical axiom).
(s) (function) Neg(x): the Gödel number of the negation of the wff with Gödel number x, i.e.: Neg(x) = 2^9 ⋆ x.
(t) (function) Cond(x, y): the Gödel number of the conditional with the wffs with Gödel numbers x and y as antecedent and consequent, respectively.
(u) (property) Sent(x): x is the Gödel number of a closed wff (a sentence).
(v) (function) Clos(x): the Gödel number of the universal closure of the wff with Gödel number x.

Theory-Specific Functions and Relations

In any theory such as S that uses the specific constant a as its numeral for 0, and constructs the remaining numerals using the function-letter f₁¹ for "successor of", the following are primitive recursive:

(a) (function) Num(x): the Gödel number of the numeral standing for the number x. This is defined by recursion as follows:
Num(0) = 2^15 (i.e., the Gödel number of a)
Num(y + 1) = 2^49 ⋆ 2^3 ⋆ Num(y) ⋆ 2^5
(b) (property) Nu(x): x is the Gödel number of a numeral, i.e., (∃y)y<x(x = Num(y)).
(c) the diagonalization function D(y): this is an evil function that, if its argument is the Gödel number of a wff A[x] containing the variable x free, returns as value the Gödel number of the wff obtained by substituting the numeral for the Gödel number of A[x] for all free occurrences of x in A[x], i.e.:
D(y) = Sub(y, Num(y), 21)
(21 is the Gödel number of the variable x.)

Definition: A theory K is said to have a (primitive) recursive axiom set iff, for that theory, the following property is (primitive) recursive:
PrAx(x): x is the Gödel number of a proper (non-logical) axiom of the theory K.

Result: System S has a primitive recursive axiom set.

Proof:
1. (property) AxS1(x): x is the Gödel number of (S1), i.e.:
x = 2^3 · 3^629859 · 5^3 · 7^21 · 11^7 · 13^29 · 17^5 · 19^11 · 23^3 · 29^629859 · 31^3 · 37^21 · 41^7 · 43^37 · 47^5 · 53^11 · 59^629859 · 61^3 · 67^29 · 71^7 · 73^37 · 79^5 · 83^5 · 89^5.
This is the Gödel number of the wff:
(I²(x, y) → (I²(x, z) → I²(y, z)))
2. (properties) AxS2(x) through AxS8(x) can be characterized similarly.
3. (property) AxS9(x): x is the Gödel number of an instance of schema (S9), i.e.:
(∃y)y<x(∃z)z<x(EVbl(2^z) and Fml(y) and
x = 2^3 ⋆ Sub(y, 2^15, z) ⋆ 2^11 ⋆ 2^3 ⋆ 2^3 ⋆ 2^3 ⋆ 2^13 ⋆ 2^z ⋆ 2^5 ⋆ 2^3 ⋆ y ⋆ 2^11 ⋆ Sub(y, 2^49 ⋆ 2^3 ⋆ 2^z ⋆ 2^5, z) ⋆ 2^5 ⋆ 2^5 ⋆ 2^11 ⋆ 2^3 ⋆ 2^3 ⋆ 2^13 ⋆ 2^z ⋆ 2^5 ⋆ y ⋆ 2^5 ⋆ 2^5 ⋆ 2^5)
4. PrAx for System S is then just the disjunction of AxS1 through AxS9; hence S has a primitive recursive axiom set. ∎
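Num and the substitution behind D can be sketched with the same helpers (repeated here so the snippet stands alone). One loud caveat: this sub works at the bare symbol level and ignores the free-versus-bound distinction that the genuine Sub must respect, so it is only a toy.

```python
# Sketch of Num(x) and a naive, symbol-level Sub. Codes used: a = 15,
# the successor function-letter = 49, ( = 3, ) = 5, variable x = 21.

def next_prime(p):
    p += 1
    while any(p % q == 0 for q in range(2, p)):
        p += 1
    return p

def encode(codes):
    out, p = 1, 2
    for s in codes:
        out *= p ** s
        p = next_prime(p)
    return out

def decode(n):
    codes, p = [], 2
    while n > 1:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        codes.append(e)
        p = next_prime(p)
    return codes

def num(x):
    """Num(x): code of the numeral for x, built by the recursion
    Num(0) = 2^15 and Num(y+1) = 2^49 ⋆ 2^3 ⋆ Num(y) ⋆ 2^5."""
    code = [15]                     # the constant a, numeral for 0
    for _ in range(x):
        code = [49, 3] + code + [5]
    return encode(code)

def sub(x, y, z):
    """Naive Sub(x, y, z): splice the string coded by y in for every
    occurrence of the single symbol z in the string coded by x."""
    out = []
    for s in decode(x):
        out.extend(decode(y) if s == z else [s])
    return encode(out)

assert decode(num(2)) == [49, 3, 49, 3, 15, 5, 5]  # f1( f1( a ) )
# Substituting the numeral 0 (i.e. a) for x (code 21) in I2(x, x):
assert sub(encode([629859, 3, 21, 7, 21, 5]), 2**15, 21) \
       == encode([629859, 3, 15, 7, 15, 5])
```

In these terms the diagonalization function is D(y) = sub(y, num(y), 21). It is perfectly recursive, but since num(y) for a genuine wff number y has astronomically many symbols, nobody ever computes D; it only needs to exist.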

Result: For any first-order axiomatic system with both a (primitive) recursive vocabulary and a (primitive) recursive axiom set, the number-theoretic properties and relations listed below are also (primitive) recursive.

(a) (property) Ax(x): x is the Gödel number of an axiom (logical or proper) of the system. (This is simply the disjunction of LAX and PrAx.)
(b) (property) Prf(x): x is the Gödel number of a proof of the theory (i.e., it encodes a sequence of wffs such that each member of the sequence is either an axiom, or follows from previous members of the sequence by Gen or MP).

This property is defined by course-of-values recursion for properties, i.e., one-place relations. To really prove that it is recursive, we would first have to show that the course-of-values function, CPrf#, for the characteristic function of Prf, viz., CPrf, is recursive, then obtain CPrf from CPrf#.

Intuitively, however, we know that it is a recursive relation if we know how to determine whether or not it applies to a given number when we already know how to determine whether or not it applies to any smaller number. Note that:

Prf(x) iff [either
(∃y)y<x(Ax(y) and x = 2^y) or
(∃y)y<x(∃z)z<ℓ(y)(∃w)w<x(Prf(y) and Gen((y)z, w) and x = y ⋆ 2^w) or
(∃y)y<x(∃z)z<ℓ(y)(∃v)v<ℓ(y)(∃w)w<x(Prf(y) and MP((y)z, (y)v, w) and x = y ⋆ 2^w) or
(∃y)y<x(∃z)z<x(Prf(y) and Ax(z) and x = y ⋆ 2^z)]

Roughly, this says, x is the Gödel number of a proof iff either (i) it encodes a sequence consisting of an axiom by itself, (ii) it is obtained from the Gödel number of a shorter proof by appending the Gödel number of a new Gen step to the encoding, (iii) it is obtained from the Gödel number of a shorter proof by appending the Gödel number of a new MP step to the encoding, or (iv) it is obtained from the Gödel number of a shorter proof by appending the Gödel number of a new axiom to the encoding.

(c) (relation) Pf(x, y): x is the Gödel number of a proof in the theory of the wff with Gödel number y, which is characterized this way:
Prf(x) and y = (x)ℓ(x)
In other words, x is a proof, and y is the exponent on the greatest prime number in the prime factorization of x (i.e., y is the last number encoded in x).

(Strictly speaking, since PrAx is defined differently in different theories depending on what axioms they have, the above are also defined differently for different theories.)

We can now prove that all functions representable in S are recursive. We noted earlier that, roughly speaking, recursive functions represent those a computer can in principle, given enough time, determine algorithmically. The following argument suggests roughly that if a function is representable in a system like S, then here is one way for a computer to compute its value: go through the Gödel numbers of proofs in S. For each, check whether it proves anything using the wff used to represent the function for the values in question. If it does, then the value of the function is whatever the proof in S proves it to be. Keep looking until you find such a proof. Since the function has a value, and there is a proof in S that the function has that value, you'll eventually find it this way.
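The course-of-values clause for Prf is just a checkable condition on encoded sequences. Here is a toy Python version in the same shape, with stand-in predicates in place of the real Ax and MP (axioms are modeled as even numbers, and "follows" as c = a·b + 1, which is nothing like real MP but exercises the same control flow):

```python
# Toy checker in the shape of Prf: decode x as a sequence and verify
# that each member is an "axiom" or "follows" from two earlier members.

def decode(n):
    """Exponents in the prime factorization of n, in prime order."""
    codes, p = [], 2
    while n > 1:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        codes.append(e)
        p += 1
        while any(p % q == 0 for q in range(2, p)):
            p += 1
    return codes

def is_axiom(w):            # stand-in for Ax
    return w % 2 == 0 and w > 0

def follows(a, b, c):       # stand-in for MP/Gen
    return c == a * b + 1

def prf(x):
    seq = decode(x)
    if not seq:
        return False
    for i, w in enumerate(seq):
        if is_axiom(w):
            continue
        if any(follows(seq[j], seq[k], w)
               for j in range(i) for k in range(i)):
            continue
        return False
    return True

assert prf(2**2 * 3**4 * 5**9)   # ⟨2, 4, 9⟩: axiom, axiom, then 2·4+1
assert not prf(2**2 * 3**7)      # 7 is no axiom and follows from nothing
```

Each membership test only looks at earlier members of the sequence, mirroring how the course-of-values recursion consults Prf only at smaller arguments.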

Result: For any theory K (such as S) with a system of numerals, a recursive vocabulary and axiom set, for which the following principle holds:
(%) For any natural numbers r and s, if ⊢K r̄ = s̄ then r = s,
and for any n-place number-theoretic function f: if f is representable in K, then f is recursive.
(Here and below, n̄ is the numeral of the theory standing for the number n.)

Proof:
(1) Assume K is such a theory and assume that f is representable in K.
(2) By the definition of representability, there is some wff A[x1, . . . , xn, y] with x1, . . . , xn and y as its free variables such that, for all natural numbers k1, . . . , kn, and m:
(2a) If the value of f for ⟨k1, . . . , kn⟩ as argument is m, then ⊢K A[k̄1, . . . , k̄n, m̄];
(2b) ⊢K (∃1y)A[k̄1, . . . , k̄n, y].
(3) Let c be the Gödel number of the wff A[x1, . . . , xn, y] that represents f.
(4) Consider the (n + 2)-place number-theoretic relation PA such that PA(z1, . . . , zn, u, v) iff v is the Gödel number of a proof in K of the wff:
A[z̄1, . . . , z̄n, ū]
Or in other words, PA holds for z1, . . . , zn, u and v iff v is the Gödel number of an object-language proof essentially to the effect that f(z1, . . . , zn) = u.
(5) We can prove that PA is a recursive relation.
(5a) Because K has a recursive vocabulary and axiom set, the following functions and relation, discussed above, are recursive: Pf(x, y), Sub(x, y, z), and Num(x).
(5b) Note that c is the Gödel number of A[x1, . . . , xn, y], and 29 is the Gödel number of y, and the Gödel numbers of x1, . . . , xn are 45, 69, 93, . . . (increasing by 24) . . . , (21 + 24n). Then we can see that:
Sub(c, Num(u), 29) is the Gödel number of A[x1, . . . , xn, ū], and so
Sub(Sub(c, Num(u), 29), Num(z1), 45) is the Gödel number of A[z̄1, x2, . . . , xn, ū].
Repeating this process, we can see that:
Sub(. . . Sub(Sub(c, Num(u), 29), Num(z1), 45) . . . , Num(zn), 21 + 24n) is the Gödel number of A[z̄1, . . . , z̄n, ū].
(5c) So we can obtain PA by substitution:
PA(z1, . . . , zn, u, v) iff Pf(v, Sub(. . . Sub(Sub(c, Num(u), 29), Num(z1), 45) . . . , Num(zn), 21 + 24n))
Since Pf, Sub and Num are recursive, so is PA.
(6) For any natural numbers k1, . . . , kn, r and j, if PA(k1, . . . , kn, r, j), then the value of f for ⟨k1, . . . , kn⟩ as argument is r.
(6a) Assume PA(k1, . . . , kn, r, j).
(6b) By the definition of PA, ⊢K A[k̄1, . . . , k̄n, r̄].
(6c) f is a number-theoretic function; so it must have some value s for ⟨k1, . . . , kn⟩ as argument. By (2a), ⊢K A[k̄1, . . . , k̄n, s̄].
(6d) By (2b), (6b) and (6c), ⊢K r̄ = s̄.
(6e) By principle (%), it must be that r = s.
(7) For any ⟨z1, . . . , zn⟩, f will have some value, u, and by (2a) there will be a proof of A[z̄1, . . . , z̄n, ū] in K. Hence for any ⟨z1, . . . , zn⟩ there will be a sequence:
u, v
where v is the Gödel number of a proof of A[z̄1, . . . , z̄n, ū] in K, which is to say:
PA(z1, . . . , zn, u, v)
Let w be the number that encodes the above sequence, i.e., 2^u · 3^v. Hence:
PA(z1, . . . , zn, (w)0, (w)1)
(8) Then f can be obtained from PA using the choice of least rule and the function (x)y. Consider:
f(z1, . . . , zn) = (μw(PA(z1, . . . , zn, (w)0, (w)1)))0
This function will return the u such that w is the least number that encodes a sequence u, v such that:
PA(z1, . . . , zn, u, v)
By (6) above, this u will be the value of f for ⟨z1, . . . , zn⟩. Because PA is a recursive relation, and we obtained f using the choice of least rule from PA and the primitive recursive function (x)y, f is recursive. ∎
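The closing step, pairing u and v as 2^u · 3^v and μ-searching over the pair codes, can be mimicked directly. Below, P is a stand-in decidable relation in place of PA (here P(z, u, v) holds iff u = z², with v playing no role), so the sketch shows only the search mechanism, not real provability:

```python
# Sketch of step (8): recover a function from a decidable relation by
# minimization over pair codes w = 2^u * 3^v.

def comp(w, i):
    """(w)_i: the exponent of the i-th prime (here just 2 or 3) in w."""
    p = [2, 3][i]
    e = 0
    while w % p == 0:
        w //= p
        e += 1
    return e

def P(z, u, v):          # stand-in for the relation PA
    return u == z * z

def f(z):
    """f(z) = (μw P(z, (w)0, (w)1))0: the 'choice of least' search."""
    w = 1
    while not P(z, comp(w, 0), comp(w, 1)):
        w += 1
    return comp(w, 0)

assert f(3) == 9    # least suitable w is 2^9; its 2-exponent is 9
assert f(0) == 0    # w = 1 already works: (1)0 = 0 = 0*0
```

The search is guaranteed to halt for the same reason as in the proof: for each z some suitable pair ⟨u, v⟩ exists, so some w encodes one.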
Corollary: In any theory meeting the conditions above, all expressible number-theoretic relations are recursive.

Corollary: A number-theoretic function f is representable in S if and only if it is recursive, and a number-theoretic relation R is expressible in S if and only if it is recursive.

D. Robinson Arithmetic

After Gödel discovered his famous results for Peano arithmetic, the mathematician Raphael Robinson decided to see how weak he could make an axiomatic system in which it would still be the case that all (and only) recursive number-theoretic functions are representable. Here is the result, slightly modified by Mendelson. (Robinson Arithmetic is usually called system Q; with Mendelson's change, we call it RR.)

The System RR

The syntax and intended semantics for RR are the same as for system S. For its deductive theory, it consists of the logical axioms (A1)–(A5), the inference rules MP and Gen, and the following proper axioms.

(RR1) x = x
(RR2) x = y → y = x
(RR3) x = y → (y = z → x = z)
(RR4) x = y → x′ = y′
(RR5) x = y → (x + z = y + z ∧ z + x = z + y)
(RR6) x = y → (x · z = y · z ∧ z · x = z · y)
(RR7) x′ = y′ → x = y
(RR8) 0 ≠ x′
(RR9) x ≠ 0 → (∃y)(x = y′)
(RR10) x + 0 = x
(RR11) x + y′ = (x + y)′
(RR12) x · 0 = 0
(RR13) x · y′ = (x · y) + x
(RR14) ((x = (x1 · x2) + y ∧ y < x1) ∧ (x = (x1 · x3) + z ∧ z < x1)) → y = z

Notice that all of the above are particular axioms, not axiom schemata. RR has exactly 14 proper axioms, while, strictly speaking, S has infinitely many. (RR14) was added by Mendelson to make it easier to prove that Gödel's β-function is represented by RR, but the really interesting thing about this system is that its proper axioms are finite in number, not whether they are 13 or 14.

The primary difference between RR and S is that RR does not contain anything equivalent to (S9): Peano arithmetic's principle of mathematical induction. However, it does add axioms that are equivalent to many of the important theorems one would use (S9) to get in System S. This includes (Ref/Trans/Sub=), (Sub+), (Sub·), etc., which are needed to get Leibniz's law, or (A7) of PF=.

Definition: First-order theory K is a subtheory of theory K′ if and only if every theorem of K is a theorem of K′; K is a proper subtheory of K′ if K is a subtheory of K′, but K′ is not a subtheory of K.

Example: RR is a proper subtheory of S.

Just how weak is RR?

RR does very well when it comes to dealing with numerals, and in general in proving things about closed terms. For example, for any natural numbers n and m (writing n̄ for the numeral standing for n):
(a) ⊢RR n̄ + m̄ = k̄, where k is the number n + m;
(b) ⊢RR n̄ · m̄ = k̄, where k is the number n · m;
(c) if n ≠ m, then ⊢RR n̄ ≠ m̄;
(d) if n < m, then ⊢RR n̄ < m̄;
(e) ⊢RR x ≥ 0;
(f) ⊢RR x ≤ n̄ → (x = 0̄ ∨ . . . ∨ x = n̄), etc.
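The proper axioms of RR are all truths of elementary arithmetic, which a quick numeric spot-check makes vivid (this checks instances in the standard model, of course, not provability in RR). Reading x′ as x + 1:

```python
# Numeric spot-check of the recursion axioms (RR10)-(RR13) and of the
# remainder-uniqueness axiom (RR14) on sample values.

def succ(n):
    return n + 1

for x in range(6):
    assert x + 0 == x                       # (RR10)
    assert x * 0 == 0                       # (RR12)
    for y in range(6):
        assert x + succ(y) == succ(x + y)   # (RR11)
        assert x * succ(y) == (x * y) + x   # (RR13)

# (RR14): if x = x1*q + r with r < x1, the remainder r is unique,
# whatever quotient q witnesses it.
x1 = 7
for x in range(60):
    remainders = {r for q in range(x + 1) for r in range(x1)
                  if x == x1 * q + r}
    assert len(remainders) == 1
```

(RR11) and (RR13) are exactly the recursion equations one would use to compute + and · on numerals, which is why RR handles closed terms so well despite lacking induction.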

Without an induction principle, however, there are many similar results making use of variables and quantifiers of the system that one cannot prove in RR. For example, the following are not theorems of system RR:

(∀x)(∀y) x + y = y + x
(∀x)(∀y) x · y = y · x
(∀x)(∀y)(∀z) (x + y) + z = x + (y + z)
(∀x)(∀y)(∀z) x · (y + z) = (x · y) + (x · z)

However, the lack of such principles does not interfere with results about which number-theoretic functions are representable, and which number-theoretic relations are expressible. Recall that the definitions of expressibility and representability primarily have to do with getting the appropriate theorems for the right numerals, not with getting general results stated with quantifiers. In fact . . .

Result: A number-theoretic function is representable in RR iff it is recursive, just as in system S. Similarly, a number-theoretic relation is expressible in RR iff it is recursive.

The proofs of these results for RR are almost exactly the same as the corresponding proofs for S. It would be a matter of tedious backtracking to see this. This is not surprising, since RR was custom tailored to allow these results to go through. In the proofs of these results for S, we rarely appealed to theorems that require (S9), and on those few occasions in which we did, we have been given a new axiom of RR that works just as well.

Obviously, RR is incomplete, and too weak for what we wanted. However, it will turn out to be useful later in the unit to have a weaker system with only a finite number of proper axioms, to make certain other things easier to prove, especially Church's theorem.

E. Diagonalization

Preliminaries

Abbreviation: ⌜A⌝ is shorthand for the object-language numeral for the Gödel number of A.

Example: The Gödel number of I²(a, a) is 2^629859 · 3^3 · 5^15 · 7^7 · 11^15 · 13^5, so ⌜I²(a, a)⌝ is the numeral for 2^629859 · 3^3 · 5^15 · 7^7 · 11^15 · 13^5, which is actually 0 followed by 2^629859 · 3^3 · 5^15 · 7^7 · 11^15 · 13^5 successor function signs (′).

Insofar as system S (or a similar theory) can partly act as its own metalanguage, such numerals as ⌜I²(a, a)⌝ act as its names for its own wffs, such as I²(a, a).

Result (The Fixed-Point Theorem): For any theory K (such as S or RR) such that (i) K is a theory with identity, (ii) K has a system of numerals and a recursive vocabulary, (iii) all recursive number-theoretic functions are representable in K, it holds that for any wff E[x] containing x as its only free variable, there is a closed wff B such that:

⊢K B ↔ E[⌜B⌝]

This theorem states that for any wff of the form E[x], there is a closed wff B such that, within K, B is equivalent to the claim that E[x] holds for B's own Gödel number.

Example: Consider the wff of S, x = 0. By the theorem, there is some wff C such that:
⊢S C ↔ ⌜C⌝ = 0
This can be thought of this way: C says of itself that its own Gödel number is 0. (In this case, ⊢S ¬C.)

The proof of the theorem relies on the (evil) diagonalization function D, introduced on p. 91. Recall that when its argument is the Gödel number of a wff of the form A[x], the value of D is the Gödel number of the formula obtained by substituting the numeral for the Gödel number of A[x] for all free occurrences of x in A[x].

Proof:
(1) Assume that K is a theory meeting conditions (i)–(iii) above, and then consider any wff E[x] having x as its only free variable.
(2) Because K has a recursive vocabulary and a system of numerals, the function D for K is a recursive number-theoretic function.
(3) Because all recursive functions are representable in K, and D is a recursive function, there is some wff D[x, y], such that, for all natural numbers k and m:
(3a) If D(k) = m, then ⊢K D[k̄, m̄].
(3b) ⊢K (∃1y)D[k̄, y].
(4) Now consider the following wff:
(4a) (∀y)(D[x, y] → E[y])
This wff more or less says that E holds of the Gödel number obtained from x by the diagonalization function.
(5) Let p be the Gödel number of the wff (4a).
(5a) p̄ is ⌜(∀y)(D[x, y] → E[y])⌝
Consider now the following closed wff, which hereafter we'll call B:
(5b) B is (∀y)(D[p̄, y] → E[y])
This wff, B, says that E holds of the Gödel number obtained from p by the diagonalization function. Let q be the Gödel number of B. Hence:
(5c) q̄ is ⌜B⌝
Notice that (4a) is itself of the form A[x]. Hence, the value of the diagonalization function for its Gödel number p will be the Gödel number of B, i.e., q:
(5d) D(p) = q
Notice that because B says that E holds of the Gödel number obtained from p by the diagonalization function, and q is the Gödel number of B, B in effect says that E holds of its own Gödel number.
(6) By (5d) and (3a) and (3b), we can conclude:
(6a) ⊢K D[p̄, q̄]
(6b) ⊢K (∃1y)D[p̄, y]
(7) We can now prove the biconditional ⊢K B ↔ E[⌜B⌝]:
1. B ⊢K B (Premise)
2. B ⊢K (∀y)(D[p̄, y] → E[y])  1 (5b)
3. B ⊢K D[p̄, q̄] → E[q̄]  2 UI
4. B ⊢K E[q̄]  3, (6a), MP
5. B ⊢K E[⌜B⌝]  4 (5c)
6. ⊢K B → E[⌜B⌝]  5 DT
7. E[⌜B⌝] ⊢K E[⌜B⌝] (Premise)
8. D[p̄, y] ⊢K D[p̄, y] (Premise)
9. D[p̄, y] ⊢K y = q̄  8, (6a), (6b) PF=
10. D[p̄, y] ⊢K y = ⌜B⌝  9 (5c)
11. E[⌜B⌝], D[p̄, y] ⊢K E[y]  7, 10 LL
12. E[⌜B⌝] ⊢K D[p̄, y] → E[y]  11 DT
13. E[⌜B⌝] ⊢K (∀y)(D[p̄, y] → E[y])  12 Gen
14. E[⌜B⌝] ⊢K B  13 (5b)
15. ⊢K E[⌜B⌝] → B  14 DT
16. ⊢K B ↔ E[⌜B⌝]  6, 15 SL
This establishes the theorem. ∎

The Fixed-Point Theorem makes a certain kind of self-reference possible, which leads to all sorts of fun results.
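The mechanism of the construction is the same one behind quines. A toy string-level analogue (purely illustrative, not the notes' machinery): let a function d substitute a template's own quotation into itself, in the role of the diagonalization function, and take the fixed point of a predicate-template E.

```python
# Toy string analogue of the fixed-point construction. d plays the
# diagonalization function; the template "E(d({0}))" plays the wff
# (A y)(D[x,y] -> E[y]) with x free. Everything here is illustrative.

def d(s):
    """Diagonalization: substitute s's own quotation into s."""
    return s.format(repr(s))

template = "E(d({0}))"
B = d(template)
# B is the sentence  E(d('E(d({0}))'))  : it predicates E of the
# result of diagonalizing the quoted template. And that result is B:
inner = B.split("'")[1]        # the quoted template inside B
assert d(inner) == B           # so B "says" that E holds of B itself
```

Just as in step (5d) of the proof, the diagonalization of the template's own code is exactly the fixed point, so the sentence ends up talking about itself without ever containing a literal copy of itself.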
F. ω-Consistency, True Theories and Completeness

Definition: A theory K with a system of numerals is said to be ω-consistent iff for every wff A[y] containing y as its only free variable, if it is true for every natural number n that
⊢K ¬A[n̄],
then it is not the case that ⊢K (∃y)A[y].

Basically, a system is ω-consistent if, whenever you can prove that A[y] fails for each particular number, you cannot then also prove the quantified statement "there is some number y such that A[y]".

Definition: A theory K with the same syntax as S is said to be a true arithmetical theory iff all its proper axioms are true in the standard interpretation.

Remember that the standard interpretation is the interpretation M such that (i) the domain of quantification D of M is the set of natural numbers, (ii) (0)M is zero, (iii) (=)M is the identity relation on the set of natural numbers, and (iv) (+)M is the addition function, (·)M is the multiplication function, and (′)M is the successor function.

Result: For any theory K, if K is ω-consistent, then it is consistent.

Proof:
Suppose for reductio that K is ω-consistent but inconsistent. Because every wff is provable in an inconsistent system, for any wff A[y] containing y as its only free variable, it will be true for every natural number n that ⊢K ¬A[n̄], but it will also hold that ⊢K (∃y)A[y]. So K is not ω-consistent, contradicting our supposition. ∎

Result: All theorems of a true arithmetical theory are true in the standard interpretation.

Proof:
By supposition, all the proper axioms of K are true in the standard interpretation, and so are the logical axioms, and MP and Gen preserve truth in an interpretation. ∎

Result: If a theory K is a true arithmetical theory, then it must be ω-consistent.

Proof:
In such a theory, suppose that for every natural number n, ⊢K ¬A[n̄]. Then, for every n, ¬A[n̄] is true in the standard interpretation M. Hence, (∀y)¬A[y] must be true in the standard interpretation, because for every natural number n, (n̄)M = n, and the natural numbers exhaust the domain of quantification of M. Hence (∃y)A[y] cannot be a theorem of K, because if it were, it would be true in the standard interpretation, and it cannot be, since it is equivalent to the negation of (∀y)¬A[y]. ∎

Corollary: Systems S and RR are ω-consistent.

Proof:
Both are true arithmetical theories. ∎

Completeness and Decidability

Recall that there are two widespread definitions of the word "complete" in mathematical logic, as discussed on p. 23.

1. On one definition (the definition I prefer), a system is said to be complete iff every wff that should be a theorem in virtue of the intended semantics for the system is a theorem. (This definition was first used by Gödel.)

Examples:
(a) System PF was designed to have, as theorems, all wffs that are logically valid. Hence to prove it complete, we needed to prove that if ⊨ A then ⊢PF A.
(b) System PF= was designed to have, as theorems, all wffs that are identity-valid. Hence to prove it complete, we needed to prove that if ⊨= A then ⊢PF= A.
(c) System S was designed to have, as theorems, all wffs that are true in the standard interpretation. Hence to be complete, it would have to be the case that if A is true in the standard interpretation, then ⊢S A.

The first definition makes completeness about the relationship between the semantics of the system and its system of deduction.

2. On the other definition, a system K is said to be complete (or, as I like to say, maximal) iff for every closed wff A, either ⊢K A or ⊢K ¬A. (This definition was first used by the Polish-American mathematician Emil Post.)

This definition has nothing directly to do with semantics, only with the system of deduction.

Systems PF and PF= are complete in Gödel's sense, but not in Post's. Indeed, it would be a bad thing if PF were complete in Post's sense, because a wff should be a theorem of PF iff it is a logical truth, and so, for any contingent wff A, neither it nor its negation should be a theorem of PF.

However, given S's limited syntax and single intended interpretation, the two definitions of completeness coincide. Why?

Within a given interpretation, every closed wff is either true, or its negation is true. Because S aims to capture everything that is true in the standard interpretation, it could be complete in Gödel's sense only if it is complete in Post's sense, because to capture all truths, for every closed wff A, it must capture either A or ¬A, depending on which is true in the standard interpretation.

Unfortunately, S is complete in neither sense, because Gödel showed that any theory similar to S has undecidable sentences.

Definition: For a given closed wff A in a system K, A is called an undecidable sentence iff neither ⊢K A, nor ⊢K ¬A.

Obviously, any system with undecidable sentences is incomplete in Post's sense.

Note that the word "undecidable" is also used with a different meaning in mathematical logic, although applied to systems rather than individual sentences. We'll actually discuss this meaning on p. 104. But first, (drumroll please) . . .

G. Gödel's First Incompleteness Theorem

Result: For any theory with identity K (e.g., S or RR) such that (i) K has a recursive axiom set and vocabulary, (ii) every recursive function is representable in K and every recursive relation is expressible in K, and (iii) K is ω-consistent, there is at least one undecidable sentence in K, G (called the Gödel sentence for K).
(Gödel's First Incompleteness Theorem)

Proof:
1. Assume K is a theory with the characteristics above.
2. Because K is ω-consistent, it is consistent.
3. Because K has a recursive axiom set, the number-theoretic relation Pf(x, y), that holds between x and y iff x is the Gödel number of a proof in K of the wff with Gödel number y, is a recursive relation.
4. Because every recursive relation is expressible in K, Pf is expressible in K. Hence there is some wff Pf[x1, x2] such that, for all natural numbers k1 and k2:
(4a) If Pf holds for ⟨k1, k2⟩, then ⊢K Pf[k̄1, k̄2].
(4b) If Pf does not hold for ⟨k1, k2⟩, then ⊢K ¬Pf[k̄1, k̄2].
5. Consider the following wff:
(∀y)¬Pf[y, x]
Because every recursive function is representable in K, the Fixed-Point Theorem is applicable to the above wff, and hence, there is a closed wff, which we'll call G, such that:
(5a) ⊢K G ↔ (∀y)¬Pf[y, ⌜G⌝]
In effect, G is equivalent to the assertion that no natural number is the Gödel number of a proof of G in K, i.e., G asserts that it is not provable.
6. We must now prove that G is an undecidable sentence of K. Let q be the Gödel number of G.
7. We will first show that it is not the case that ⊢K G, by reductio.
(7a) Assume that ⊢K G.
(7b) Then there must be some proof of G in K. This proof must have a Gödel number, r.

(7c) Hence Pf(r, q).
(7d) By (4a), ⊢K Pf[r̄, q̄].
(7e) The Gödel number of G is q, so q̄ is ⌜G⌝.
(7f) Hence, ⊢K Pf[r̄, ⌜G⌝].
(7g) But, by (7a) and (5a), ⊢K (∀y)¬Pf[y, ⌜G⌝].
(7h) Hence ⊢K ¬Pf[r̄, ⌜G⌝].
(7i) By (7h) and (7f), K is inconsistent, contradicting (2) above.
8. Hence it is not the case that ⊢K G. This means that no natural number is the Gödel number of a proof of G in K. Hence, for all natural numbers n, the relation Pf does not hold for ⟨n, q⟩.
9. From (8) and (4b), we can conclude that:
(9a) For all natural numbers n, ⊢K ¬Pf[n̄, q̄].
(9b) The Gödel number of G is q, so q̄ is ⌜G⌝.
(9c) Hence, (9a) means that for all natural numbers n, ⊢K ¬Pf[n̄, ⌜G⌝].
(9d) Because K is ω-consistent, from (9c) we can infer that ⊬K (∃y)Pf[y, ⌜G⌝].
10. We now show that it is not the case that ⊢K ¬G, again by reductio.
(10a) Assume that ⊢K ¬G.
(10b) By (5a) and (10a), ⊢K ¬(∀y)¬Pf[y, ⌜G⌝].
(10c) This abbreviates to ⊢K (∃y)Pf[y, ⌜G⌝].
(10d) But (10c) contradicts (9d).
11. Hence neither ⊢K G nor ⊢K ¬G. Since G is closed, G is an undecidable sentence of K. QED. ∎

Corollary: All theories to which Gödel's first theorem applies are incomplete in Post's sense.

Proof:
All have at least one undecidable sentence, and hence do not fall under this definition of completeness. ∎

Corollary: Systems S and RR have undecidable sentences, and hence, are incomplete in Post's sense.

Proof:
They have the features necessary for the applicability of Gödel's theorems. ∎

Corollary: Any theory K to which the above theorem applies, with the same syntax as S and RR and the same intended semantics as S and RR (including S and RR themselves), is also incomplete in Gödel's sense.

Proof:
For every undecidable sentence, either it or its negation is true in the standard interpretation, and hence there are sentences that are true in the standard interpretation, but are not theorems of K. ∎

In particular, the Gödel sentence G of K is true in the standard interpretation but is not a theorem of K. As we have just seen, for K, neither ⊢K G nor ⊢K ¬G. Since G is closed, either G or ¬G must be true in the standard interpretation. However, since G asserts its own unprovability, and, in fact, G is not provable in K, we can conclude that G is true.

Notice that the Gödel sentence G of some applicable theory K is a wff written entirely in the syntax of S. Interpreted with the standard interpretation, it is a sentence about natural numbers, built entirely out of the symbols 0, ′, +, ·, =, bound variables and logical signs. Moreover, it is true. Hence, it seems that not all truths of arithmetic can be captured in any recursively axiomatizable, ω-consistent theory.

We can consistently add the Gödel sentence G of some theory K to that theory as a new axiom, to obtain the theory KG. Since KG has a different axiom set from K, the number-theoretic property PrAx will be different for KG from what it was for K, but it will still be recursive, and hence the relation Pf will also be different but still recursive. Hence, there will be a different wff Pf′[x, y] that expresses the new Pf-relation, and a different Gödel sentence G′, different from G, which is an undecidable sentence of KG. We can continue adding such sentences all we like, one by one; we'll never achieve completeness.

Gödel's first incompleteness theorem involves ω-consistency, not simple consistency. As J. B. Rosser showed five years later, a similar result can be proved involving consistency proper.

Result (The Gödel–Rosser Theorem): For any theory with identity K (e.g., S or RR) such that (i) K has a recursive axiom set and vocabulary, (ii) every recursive function is representable in K and every recursive relation is expressible in K, (iii) for every natural number n, it holds that:
($) ⊢K x ≤ n → x = 0 ∨ x = 1 ∨ … ∨ x = n
(@) ⊢K x ≤ n ∨ n ≤ x
and (iv) K is consistent, there is at least one undecidable sentence in K, R (called the Rosser sentence for K).
Proof:
1. Assume that K is such a theory. So all recursive functions and relations are representable/expressible in K. Hence the relation Pf is expressible in K and the function Neg (whose value, for any Gödel number of a wff, is the Gödel number of the negation of that wff) is representable in K. Let the wff that expresses Pf be Pf[x1, x2], and let the wff that represents Neg be Neg[x, y]. Hence, for all natural numbers k1 and k2:
(1a) If Pf holds for ⟨k1, k2⟩, then ⊢K Pf[k1, k2].
(1b) If Pf does not hold for ⟨k1, k2⟩, then ⊢K ¬Pf[k1, k2].
(1c) If Neg(k1) = k2 then ⊢K Neg[k1, k2].
(1d) ⊢K (∃1y) Neg[k1, y].
2. Consider the following open wff, hereafter abbreviated as E[x]:
(∀z)(Pf[z, x] → (∀y)(Neg[x, y] → (∃z1)(z1 ≤ z ∧ Pf[z1, y])))
This wff says that, for all z, z is the Gödel number of a proof in K of the wff with Gödel number x only if, for any number y that is the Gödel number of the negation of the wff whose Gödel number is x, there is a number z1 less than or equal to z which is the Gödel number of a proof in K of the wff with Gödel number y. Notice that if E[x] holds for a given x, then either there is no Gödel number of a proof of the wff with Gödel number x (and hence that wff is not a theorem of K), or there is also a proof of the negation of the wff with Gödel number x, and K is inconsistent.
3. The Fixed-Point Theorem applies to E[x]:
(3a) ⊢K R ↔ E[⌜R⌝].
R in effect asserts of its own Gödel number that either it is not the Gödel number of a theorem, or its negation is also a theorem (and K is inconsistent).
4. Let q be the Gödel number of R, and p be the Gödel number of ¬R.
5. It then cannot be the case that ⊢K R.
(5a) Assume for reductio that ⊢K R.
(5b) Then there is some n such that Pf(n, q), and by (1a) it follows that ⊢K Pf[n, q].
(5c) By (5a) and (3a), ⊢K E[⌜R⌝], i.e., ⊢K E[q].
(5d) Expanding (5c), by UI and (5b), we get:
⊢K (∀y)(Neg[q, y] → (∃z1)(z1 ≤ n ∧ Pf[z1, y])).
(5e) Note that Neg(q) = p and so by (1c), ⊢K Neg[q, p].
(5f) So by (5d) and (5e), we get:
⊢K (∃z1)(z1 ≤ n ∧ Pf[z1, p]).
(5g) By (5a) and K's consistency, ⊬K ¬R. Hence, for all natural numbers s, Pf does not hold for ⟨s, p⟩. A fortiori, for all natural numbers s less than or equal to n, we have, by (1b), ⊢K ¬Pf[s, p].
(5h) By PF= rules, from (5g) it follows that, for all natural numbers s less than or equal to n, we have ⊢K x = s → ¬Pf[x, p].
(5i) By ($), (5h) and a big proof by cases:
⊢K x ≤ n → ¬Pf[x, p].
(5j) By SL, Gen and variable juggling, (5i) becomes: ⊢K ¬(∃z1)(z1 ≤ n ∧ Pf[z1, p]).
(5k) But from (5j) and (5f), we get that K is inconsistent, which is impossible.
6. By a similar process of reasoning, we can show that it is not the case that ⊢K ¬R.
(6a) Assume ⊢K ¬R for reductio.
(6b) Then there is some n such that Pf(n, p), and by (1a) it follows that ⊢K Pf[n, p].
(6c) By (6b) and PF=, we have
⊢K n ≤ x → (∃z1)(z1 ≤ x ∧ Pf[z1, p]).
(6d) By (5), for all natural numbers s, Pf does not hold for ⟨s, q⟩, and so by (1b), ⊢K ¬Pf[s, q].
(6e) By a proof by cases similar to that in (5i), we get: ⊢K x ≤ n → ¬Pf[x, q].
(6f) By (@), (6c) and (6e), we can derive that:
⊢K Pf[x, q] → (∃z1)(z1 ≤ x ∧ Pf[z1, p]).
(6g) By the same reasoning as (5e), ⊢K Neg[q, p].
(6h) By (6g) and (1d), we get:
⊢K (∀y)(Neg[q, y] → y = p).
(6i) From (6f), SL, and variable juggling:
⊢K Pf[z, q] → (∃z1)(z1 ≤ z ∧ Pf[z1, p]).
(6j) Using (6h) and (6i) we get the following proof:
1. Pf[z, q] ⊢K Pf[z, q] (Premise)
2. Pf[z, q] ⊢K (∃z1)(z1 ≤ z ∧ Pf[z1, p]) (1, (6i), MP)
3. Neg[q, y] ⊢K Neg[q, y] (Premise)
4. Neg[q, y] ⊢K y = p (3, (6h), UI, MP)
5. Pf[z, q], Neg[q, y] ⊢K (∃z1)(z1 ≤ z ∧ Pf[z1, y]) (2, 4 LL)
6. ⊢K (∀z)(Pf[z, q] → (∀y)(Neg[q, y] → (∃z1)(z1 ≤ z ∧ Pf[z1, y]))) (5, DT, Gen, DT, Gen)
(6k) Note that the conclusion of (6j) is ⊢K E[q].
(6l) But q is the Gödel number of R, so ⊢K E[⌜R⌝].
(6m) By (6l) and (3a), ⊢K R.
(6n) By (6m) and (6a), K is inconsistent, which is impossible.
7. By (5) and (6), neither ⊢K R nor ⊢K ¬R. So R is an undecidable sentence of K. QED. □
The results of Gödel and Rosser we have just seen can more or less be summarized this way: no system for number theory with a recursive axiom set can be complete.

Definition: A theory K is said to be recursively axiomatizable iff there is a theory K* with exactly the same theorems as K such that K* has a recursive axiom set.

Notice that a theory does not itself have to have a recursive axiom set to be recursively axiomatizable. However, it is easy to prove that if a given theory is incomplete, then any theory with exactly the same theorems will also be incomplete. Hence, no recursively axiomatizable system for number theory can be complete.

What about a system that is not recursively axiomatizable? The results of Gödel and Rosser would not apply to it, and so it might very well be able to capture all arithmetical truths. But what would such a system be like? If we accept Church's thesis (see below), such a theory must be very strange indeed.
H. Church's Thesis

Definition: A number-theoretic function is said to be effectively computable iff there exists a purely mechanical procedure or algorithm (one that does not require original insight or ingenuity) whereby one could determine the value of the function for any given argument or arguments.

Definition: A number-theoretic relation is said to be effectively decidable iff there exists a purely mechanical procedure or algorithm whereby one could determine whether or not it applies to any given number or numbers.

Definition: Church's thesis is the supposition that a number-theoretic function is effectively computable iff it is recursive. (Or equivalently, that a number-theoretic relation is effectively decidable iff it is recursive.)
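To make the flavor of these definitions concrete, here is a small illustration (ours, not part of the formal development; the function names are invented for the example): the characteristic function of a number-theoretic relation such as divisibility or primality can be computed by a purely mechanical, exhaustive procedure, so these relations are effectively decidable.

```python
# A sketch of effective decidability: the characteristic function of a
# number-theoretic relation returns 1 when the relation holds and 0 when
# it does not, and is computed by a purely mechanical procedure.

def c_divides(t: int, u: int) -> int:
    """Characteristic function of the relation t | u (t divides u)."""
    return 1 if t != 0 and u % t == 0 else 0

def c_prime(n: int) -> int:
    """Characteristic function of 'n is prime', by bounded exhaustive search."""
    if n < 2:
        return 0
    # Try every candidate divisor below n; no ingenuity is required.
    for d in range(2, n):
        if c_divides(d, n) == 1:
            return 0
    return 1
```

Church's thesis asserts that whatever can be decided by such a mechanical procedure is recursive, and conversely.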
Church's thesis has never been proven. The reason is that the notion of "purely mechanical procedure, not requiring ingenuity" cannot be made more precise without begging the question. (I.e., if we simply define it in recursive mathematical terms, Church's thesis becomes uninteresting.) However, more than a half-century of research in computability and computer science has failed to produce a clear counterexample to Church's Thesis.

There is clearly a mechanical procedure for working forwards and backwards between wffs and their Gödel numbers. So if we accept Church's thesis, a system that does not have a recursive axiom set would be one in which there is no effective procedure for determining whether or not a given wff is an axiom.
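As an illustration of that two-way procedure (a toy version; the symbol codes below are invented for the example and are not the numbering used in these notes), a string of symbols can be encoded as a product of prime powers and recovered again by factoring:

```python
# Toy Gödel numbering: assign each symbol a (made-up) odd code, then encode
# a string s1 s2 s3 ... as 2^code(s1) * 3^code(s2) * 5^code(s3) * ...
SYMBOL_CODE = {'(': 3, ')': 5, '0': 7, "'": 9, '=': 11, '+': 13}
CODE_SYMBOL = {v: k for k, v in SYMBOL_CODE.items()}

PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]  # handles strings up to 10 symbols

def godel_number(expr: str) -> int:
    """Forwards: mechanically compute the number of an expression."""
    n = 1
    for prime, sym in zip(PRIMES, expr):
        n *= prime ** SYMBOL_CODE[sym]
    return n

def decode(n: int) -> str:
    """Backwards: recover the expression by repeated division (factoring)."""
    expr = ''
    for prime in PRIMES:
        exp = 0
        while n % prime == 0:
            n //= prime
            exp += 1
        if exp == 0:
            break  # symbol codes are all positive, so a zero exponent ends the string
        expr += CODE_SYMBOL[exp]
    return expr
```

Both directions are plainly mechanical, which is why (given Church's thesis) the relevant number-theoretic properties of Gödel numbers track effective properties of the wffs themselves.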
While in the abstract, one can speak of systems or theories that are not recursively axiomatizable, it is impossible actually to describe one fully. In such a system, there would be no effective way to determine whether or not a given wff was an axiom, and hence no effective way to determine whether or not a given alleged proof was allowed. It is difficult to believe that such a system would be fully learnable or usable in practice.

For example, in order to cheat, we could create a system in which every wff that is true in the standard interpretation is an axiom. Obviously, such a theory would be complete. However, there would be no way of determining whether a given wff counts as an axiom or not. (E.g., is Goldbach's conjecture true in the standard interpretation?)

The notion of a system in which there is no effective procedure for determining whether or not a given wff is a theorem is somewhat less troubling. In fact, we shall later prove that S, RR and even simple PF are like this. In these systems, it takes ingenuity to determine whether a given wff is a theorem, because it takes ingenuity to find the appropriate proof. However, there is at least an effective procedure, once given an alleged proof, of determining whether or not it is an acceptable proof in that system. (In other words, while the property of being the Gödel number of a theorem is not recursive for these systems, the relation Pf(x, y) is recursive.)
I. Löb's Theorem / Gödel's Second Theorem

The Hilbert-Bernays Derivability Conditions

Gödel's first incompleteness theorem, applied to S, involves using the Fixed-Point Theorem to yield:
⊢S G ↔ ¬(∃y) Pf[y, ⌜G⌝]
Recall that the wff Pf[x1, x2] expresses the relation Pf that holds between x1 and x2 just in case x1 is the Gödel number of a proof of the wff with Gödel number x2.

Abbreviation: We shall now introduce the following new abbreviation:
Bew[x] is shorthand for (∃y) Pf[y, x]
While this wff does not express in S the property of being the Gödel number of a theorem of S, this is its meaning in the standard interpretation. (This abbreviation is derived from the German word beweisbar, meaning provable.)

Definition: The Hilbert-Bernays derivability conditions are the following three results, for any wffs A and B:
(HB1) If ⊢S A, then ⊢S Bew[⌜A⌝].
(HB2) ⊢S Bew[⌜A → B⌝] → (Bew[⌜A⌝] → Bew[⌜B⌝])
(HB3) ⊢S Bew[⌜A⌝] → Bew[⌜Bew[⌜A⌝]⌝].
Similar results hold not only for S, but for any recursively axiomatizable extension of S. For homework, you will prove (HB1). It follows fairly easily from the fact that Pf[x1, x2] expresses Pf in S. (HB2) and (HB3) are more difficult to prove, but follow in a similar way.

Curry's Paradox (also known as Löb's Paradox)

Consider the following proof for the existence of Santa Claus. Consider the sentence
(C) If this sentence is true, then Santa Claus exists.
I.e., let C be defined as C → E, where E means "Santa Claus exists". Then:
1. C ⊢L C (Premise)
2. C ⊢L C → E (1, def. C)
3. C ⊢L E (1, 2 MP)
4. ⊢L C → E (3, DT)
5. ⊢L C (4, def. C)
6. ⊢L E (4, 5 MP)
Is it a theorem of propositional logic that Santa Claus exists? Well, no, because it is not legitimate in system L to define something in terms of itself. But in System S we do have the following odd result:

Result (Löb's Theorem): For any closed wff A, if ⊢S Bew[⌜A⌝] → A, then ⊢S A.

Proof:
(1) Assume that ⊢S Bew[⌜A⌝] → A.
(2) If the wff A is closed, then the wff Bew[x] → A has exactly one free variable. Hence, by the Fixed-Point theorem, there is some wff L such that:
(2a) ⊢S L ↔ (Bew[⌜L⌝] → A)
Notice that L asserts of itself that if it is provable, then A is true. Assume for a conditional proof that L is provable. Because of what L says, it follows that if it is provable, then A holds. We've assumed that it is provable. Hence, A holds. Discharging the assumption, if L is provable, then A holds. But this is what L says. Our conditional proof is a proof of L. Hence, L is provable, and so is A.
(3) Making this more formal:
1. ⊢S L → (Bew[⌜L⌝] → A) ((2a), SL)
2. ⊢S Bew[⌜L → (Bew[⌜L⌝] → A)⌝] (1, (HB1))
3. ⊢S Bew[⌜L⌝] → Bew[⌜Bew[⌜L⌝] → A⌝] (2, (HB2), MP)
4. ⊢S Bew[⌜Bew[⌜L⌝] → A⌝] → (Bew[⌜Bew[⌜L⌝]⌝] → Bew[⌜A⌝]) ((HB2))
5. ⊢S Bew[⌜L⌝] → (Bew[⌜Bew[⌜L⌝]⌝] → Bew[⌜A⌝]) (3, 4 SL)
6. ⊢S Bew[⌜L⌝] → Bew[⌜Bew[⌜L⌝]⌝] ((HB3))
7. ⊢S (Bew[⌜L⌝] → (Bew[⌜Bew[⌜L⌝]⌝] → Bew[⌜A⌝])) → ((Bew[⌜L⌝] → Bew[⌜Bew[⌜L⌝]⌝]) → (Bew[⌜L⌝] → Bew[⌜A⌝])) ((A2))
8. ⊢S Bew[⌜L⌝] → Bew[⌜A⌝] (5, 6, 7 MP ×2)
9. ⊢S Bew[⌜A⌝] → A (Assumed at (1))
10. ⊢S Bew[⌜L⌝] → A (8, 9 SL)
11. ⊢S L ((2a), 10 SL)
12. ⊢S Bew[⌜L⌝] (11, (HB1))
13. ⊢S A (10, 12 MP)
(4) We have shown that ⊢S A by assuming that ⊢S Bew[⌜A⌝] → A. This establishes Löb's theorem. □
Corollary: Consider the Henkin sentence, i.e., the wff H, very much like Gödel's G, except that instead of asserting its own unprovability, H asserts its own provability:
⊢S H ↔ Bew[⌜H⌝]
(The above is obtained from the Fixed-Point Theorem as you might expect.) It holds that ⊢S H.

Proof:
Immediate by the right-to-left half of the biconditional, and Löb's theorem. □

Since S is a true arithmetical theory, H is true in the standard interpretation, despite the intuition that H could just as easily have been disprovable.
Löb's theorem also leads to the result that the consistency of S cannot be proven in S itself, even though there is a wff of S whose meaning in the standard interpretation is that S is consistent. The result that Peano Arithmetic, or any extension thereof, cannot be used to prove its own consistency was one of the original incompleteness results first proved by Gödel in 1931. Although Gödel proved this result in a different way, Löb's theorem provides us with a fairly easy proof of it.

Abbreviation: Let ConS be an abbreviation for the following closed wff of system S:
¬(∃x)(∃y)(Neg[x, y] ∧ Bew[x] ∧ Bew[y])
Bearing in mind that Neg[x, y] represents the function Neg(x), whose value for a given Gödel number of a wff as argument is the Gödel number of the negation of that wff, the above wff in effect says that it is not the case of any wff that both it and its negation are provable. Assuming that S is consistent, ConS is true in the standard interpretation. However, it is not a theorem of S.

Result (Gödel's Second Incompleteness Theorem): If S is consistent, then ⊬S ConS.

Proof:
1. Assume S is consistent, and assume for reductio that ⊢S ConS.
2. Since ⊢S 0 ≠ 1, by (HB1), we have ⊢S Bew[⌜0 ≠ 1⌝].
3. By UI on ConS, we get
⊢S ¬(Neg[⌜0 = 1⌝, ⌜0 ≠ 1⌝] ∧ Bew[⌜0 = 1⌝] ∧ Bew[⌜0 ≠ 1⌝]).
4. Because Neg[x, y] represents the Neg function, we have in S that ⊢S Neg[⌜0 = 1⌝, ⌜0 ≠ 1⌝].
5. By (2), (3), (4) and SL we get that ⊢S ¬Bew[⌜0 = 1⌝].
6. By (A1), ⊢S ¬Bew[⌜0 = 1⌝] → (0 ≠ 1 → ¬Bew[⌜0 = 1⌝]).
7. By (5) and (6), ⊢S 0 ≠ 1 → ¬Bew[⌜0 = 1⌝].
8. By transposition on (7), we get:
⊢S Bew[⌜0 = 1⌝] → 0 = 1.
9. By (8) and Löb's theorem, it follows that ⊢S 0 = 1!
10. Since ⊢S 0 ≠ 1, this means that S is inconsistent, contrary to our hypothesis. The assumption that ⊢S ConS must be mistaken. □

A similar result will hold for any extension of S, or generally, for any system with a recursive axiom set in which all recursive relations/functions are expressible/representable, and for which the Hilbert-Bernays derivability conditions hold. We might put it this way: if a given axiomatic system for number theory is sufficiently strong, then if it is consistent, it cannot be used to prove its own consistency.

Precisely because S is (we hope!) consistent, ConS is true in the standard interpretation. However, it is not a theorem of S. While ConS seems to make a metatheoretic assertion about the system S, taken with the standard interpretation it is simply an assertion about numbers and their arithmetical properties. It is yet another example of a truth of arithmetic that Peano arithmetic fails to capture. Hence, this too shows that system S is incomplete.

As with Gödel's first incompleteness result, adding additional axioms, even ConS itself, will not yield a complete system. Let us consider the system S* obtained from S by adding ConS as an axiom. While it is easily shown that S* is consistent (at least if S is consistent), there will then be a different wff ConS* that, for similar reasons, will not be a theorem of S*.

This also shows that there are limitations to the extent to which S (or any other consistent system) can properly be used for the metalanguage in which to conduct its own metatheory. In fact, there are no closed wffs A for which it is provable in S that A is not a theorem of S. (This can be seen by careful reflection on steps of the proof of Gödel's second theorem.) While S can be used to prove of itself that certain sentences are theorems, it cannot be used to prove that any sentences aren't theorems. The last point is actually the same as the point that Bew[x] does not express in S the property of being a theorem of S, which we'll discuss further below.

J. Recursive Undecidability

Definition: An axiomatic system K is said to be recursively decidable iff the following number-theoretic property is recursive:
TK(x): x is the Gödel number of a theorem of K.
(If a system is not recursively decidable, then it is said to be recursively undecidable.)

The notion of a recursively decidable system should not be confused with the notion of a decidable sentence. A system can be recursively decidable while nevertheless having undecidable sentences. Notice also that a theory can be recursively axiomatizable without being recursively decidable.
If we accept Church's thesis, a recursively undecidable system is one in which there is no effective or mechanical procedure for determining whether or not any given wff is a theorem of the system.

If we extend Gödel numbering to include wffs of propositional logic (which is simple enough to do), we could show that System L (propositional logic) is recursively decidable. A wff is a theorem of L iff it is a tautology. There is a mechanical procedure (truth tables) to determine, for any given wff, whether or not it is a tautology. However, as we will prove shortly, no similar mechanical procedure exists for systems S, RR or even PF. Semantic trees will not work in every case, and constructing derivations requires insight and ingenuity; it is not a mechanical procedure.

Result (The Recursive Undecidability Principle): If K is a theory with identity such that (i) K has a recursive vocabulary and system of numerals, (ii) all recursive number-theoretic functions are representable in K and all recursive number-theoretic relations are expressible in K, and (iii) K is consistent (e.g., S or RR), then K is recursively undecidable.

Proof:
1. Assume that K is a theory with the characteristics above, and assume for reductio that K is recursively decidable.
2. Then, the number-theoretic property TK is recursive.
3. Because all recursive number-theoretic relations are expressible in K, TK is expressible in K by some wff T[x]. By the definition of expressibility, for all natural numbers n:
(3a) If TK holds for n, then ⊢K T[n].
(3b) If TK does not hold for n, then ⊢K ¬T[n].
4. The above leads to a Gödel-like sentence, asserting its own unprovability. However, when constructed using T rather than Bew, with (3b), this will lead to an inconsistency.
5. By the Fixed-Point Theorem, there is some closed wff W such that:
(5a) ⊢K W ↔ ¬T[⌜W⌝].
6. Let q be the Gödel number of W. Hence q is the same as ⌜W⌝.
7. Let us first prove by reductio that ⊬K W.
(7a) Assume that ⊢K W.
(7b) Hence q is the Gödel number of a theorem of K. In other words, TK holds for q.
(7c) By (3a) and (7b), ⊢K T[q], i.e., ⊢K T[⌜W⌝].
(7d) By (5a) and (7c), ⊢K ¬W.
(7e) By (7d) and (7a), K is inconsistent, which is impossible.
8. We have just proven that ⊬K W. However, this also leads to contradiction.
(8a) W is not a theorem of K. Hence q is not the Gödel number of a theorem of K. I.e., TK does not hold for q.
(8b) By (3b), it follows that ⊢K ¬T[q], i.e., ⊢K ¬T[⌜W⌝].
(8c) By (5a) and (8b), we get ⊢K W, which contradicts (7).
9. Therefore, our assumption that TK is a recursive property must be mistaken. Hence, K is recursively undecidable. This establishes the principle. □

Corollary: There is no wff of S that expresses the property TS of being the Gödel number of a theorem of S.

Proof:
By the above, TS is not recursive, and a number-theoretic property is expressible in S if and only if it is recursive. □

The wff Bew[x] means that x is the Gödel number of a theorem of S, but it does not express that property. A principle similar to (3b) does not hold for Bew. Otherwise, since G is not a theorem of S, we would be able to prove that
⊢S ¬Bew[⌜G⌝]
and hence G itself, and S would be inconsistent.
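The truth-table decision procedure for System L mentioned above can be sketched in a few lines (our own illustration; a wff is represented here as a Python boolean function of its sentence letters, which is merely an encoding of convenience, not anything from the notes):

```python
from itertools import product

def is_tautology(wff, n_letters):
    """Mechanically check every row of the truth table: a wff (given as a
    boolean function of its sentence letters) is a theorem of L iff it
    comes out true on every truth-value assignment."""
    return all(wff(*row) for row in product([True, False], repeat=n_letters))

imp = lambda a, b: (not a) or b  # the material conditional

# (A -> B) v (B -> A) is a tautology; A -> B alone is not.
assert is_tautology(lambda a, b: imp(a, b) or imp(b, a), 2)
assert not is_tautology(imp, 2)
```

The procedure is exhaustive but finite for each wff, which is exactly what makes theoremhood in L recursively decidable, in contrast to S, RR and PF.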

Taken with Church's Thesis, the Undecidability Principle means that there is no effective procedure for determining whether or not a given wff is a theorem of S or RR. (Perhaps that will make you feel better about those object-language proofs in S you found difficult: after all, if a computer can't be programmed to find a proof of any given theorem of S, why should you be expected to?)

The Recursive Undecidability Principle also leads to results such as Church's Theorem and Tarski's theorem. Tarski's theorem has a proof-structure very similar to the above. (Indeed, the book presents Tarski's theorem as a corollary of the Recursive Undecidability Principle. However, I'll give a separate, more intuitive proof closer to the proof Tarski himself gave.)
Definition: A number-theoretic property P is said to be arithmetical iff there is some wff A[x] with x as its only free variable, containing no constants other than 0, no predicate-letters other than =, and no function signs other than ′, + and · (i.e., in the syntax of S/RR) such that, for all natural numbers n, P holds of n iff A[n] is true in the standard interpretation.

The System N

The proof of Tarski's theorem makes reference to the "cheater" system N, which has the same recursive syntax as S, but contains every wff that is true in the standard interpretation as an axiom. Obviously, by the Gödel–Rosser theorem, N is not recursively axiomatizable. Therefore, we cannot fully describe it nor list its axioms. However, we can still consider it as an abstract possibility.

Result: N is a true arithmetical theory.

Proof:
All its axioms are true in the standard interpretation, and the inference rules preserve truth in an interpretation, so all its theorems are also true. In fact, the set of its theorems is identical with the set of its axioms. Specifically:
(!) For any wff A, A is true in the standard interpretation iff ⊢N A. □

Corollary: N is a theory with identity, since (A6) and all instances of (A7) are true in the standard interpretation.

Result: Every number-theoretic function and relation that is representable/expressible in S is also representable/expressible in N.

Proof:
Because S is a true arithmetical theory, every theorem of S is an axiom of N. □

Corollary: All recursive number-theoretic functions and relations are representable/expressible in N.

Result (Tarski's Theorem): The number-theoretic property Tr, which holds of a given natural number x iff x is the Gödel number of a wff that is true in the standard interpretation, is not arithmetical.

Proof:
1. Assume for reductio that Tr is arithmetical. Then there is some wff Tr[x] such that for all natural numbers n, Tr holds of n iff Tr[n] is true in the standard interpretation.
2. Hence, for all natural numbers n, Tr holds of n iff ⊢N Tr[n].
3. Because N has the standard interpretation as a model, by the Modeling Lemma, it is consistent.
4. N is complete in Post's sense, because for every closed wff, either it or its negation is true in the standard interpretation. Hence, for every closed wff B, either ⊢N B or ⊢N ¬B.
5. Because all recursive functions are representable in N, the Fixed-Point Theorem applies. There is a closed wff A such that:
(5a) ⊢N A ↔ ¬Tr[⌜A⌝]
Notice that A asserts of itself that it is not true in the standard interpretation. Let q be the Gödel number of A. So q is the same as ⌜A⌝.
6. This leads to the liar paradox. Because A asserts of itself that it is not true in the standard interpretation, it is true iff it is not true.
7. By (4), it holds that either ⊢N A or ⊢N ¬A. Both are impossible. First, we will show that assuming ⊢N A leads to a contradiction.
(7a) Assume that ⊢N A.
(7b) Hence, by (!), A is true in the standard interpretation.
(7c) By (7b), the Gödel number of A, q, is the Gödel number of a wff that is true in the standard interpretation. In other words, Tr holds of q.
(7d) By (2) and (7c), ⊢N Tr[q], i.e., ⊢N Tr[⌜A⌝].
(7e) By (7d) and (5a), ⊢N ¬A.
(7f) By (7a) and (7e) we get that N is inconsistent, which is impossible.
8. However, it is also impossible that ⊢N ¬A.
(8a) Assume that ⊢N ¬A.
(8b) By (8a) and (5a), ⊢N Tr[⌜A⌝], i.e., ⊢N Tr[q].
(8c) By (8b) and (2), Tr holds of q.
(8d) But q is the Gödel number of A, and so A is true in the standard interpretation.
(8e) By (!) and (8d), ⊢N A, and once again, we get that N is inconsistent, which is impossible.
9. Because N is consistent, our assumption that Tr is arithmetical must be mistaken. In other words, there cannot be any such truth predicate as Tr formulable in the syntax of N/S/RR. □
This result can be paraphrased as the claim that the truth or falsity of a sentence of arithmetic is not equivalent to an arithmetical property of its Gödel number. Because the arithmetical properties of the Gödel number of a wff encode the syntactic features of a wff, this means whether an arithmetical claim is true or false does not boil down to its syntactic features. Arguably, this deals a significant blow to formalism in the philosophy of mathematics: the theory that mathematics is the study of rules for manipulating meaningless syntactic strings.

Because all recursive properties are arithmetical (you proved this as homework), a corollary of this result is that the property of being an arithmetical truth is not recursive. Accepting Church's Thesis, then, there is no effective or mechanical procedure by which to determine the truth or falsity of any arbitrary arithmetical claim. Shucks. I guess we can't program our computers to determine the truth or falsity of Goldbach's conjecture! They'll just have to keep working at it.
K. Church's Theorem

We begin by proving the following as a lemma.

Result (The Finite Extension Principle): For any first-order theories K and K* with the same syntax, if K* is obtained from K by adding a finite number of axioms to the axioms of K, then if K* is recursively undecidable, K is also recursively undecidable.

Proof:
1. Assume that K* is obtained from K by adding the particular wffs A1, …, An as axioms. Assume that K* is recursively undecidable, but assume for reductio that K is recursively decidable.
2. Because K and K* have the same syntax, the wffs A1, …, An may be used as hypotheses in K.
3. It is obvious that for every wff E, {A1, …, An} ⊢K E iff ⊢K* E.
4. Let B1, …, Bn be the universal closures of A1, …, An. In any first-order theory, a wff is interderivable with its closure. Hence, {A1, …, An} ⊢K E iff {B1, …, Bn} ⊢K E.
5. By the deduction theorem, and SL rules:
{B1, …, Bn} ⊢K E iff ⊢K (B1 ∧ … ∧ Bn) → E.
6. By (3), (4) and (5), we get that ⊢K (B1 ∧ … ∧ Bn) → E iff ⊢K* E.
7. Let c be the Gödel number of (B1 ∧ … ∧ Bn). By (6) it follows that, for any natural number n, n is the Gödel number of a theorem of K* iff 2³ ∗ c ∗ 2¹¹ ∗ n ∗ 2⁵ is the Gödel number of a theorem of K.
8. We've assumed that K is recursively decidable. Hence TK is a recursive property. However, given (7), the characteristic function of the property of being the Gödel number of a theorem of K*, TK*, could easily be obtained by substitution using the characteristic function of TK.
9. Hence, TK* is also a recursive property, which contradicts the assumption at (1) that K* is recursively undecidable. Therefore, the assumption that K is recursively decidable must be mistaken. □
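The substitution in steps (7)–(9) can be pictured computationally (a sketch under our own toy prime-exponent Gödel numbering; `decide_K` is a hypothetical decision procedure, and the symbol codes 3, 11 and 5 simply follow the formula in step (7)): a decider for K would mechanically yield a decider for K*.

```python
def primes():
    """Generate 2, 3, 5, 7, ... by trial division."""
    found = []
    n = 2
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1

def exponents(g):
    """Read the sequence of symbol codes back off a Gödel number."""
    codes = []
    for p in primes():
        e = 0
        while g % p == 0:
            g //= p
            e += 1
        if e == 0:
            break  # symbol codes are positive, so a zero exponent ends the string
        codes.append(e)
    return codes

def encode(codes):
    """Gödel number of a sequence of symbol codes: 2^c1 * 3^c2 * ..."""
    g = 1
    for p, c in zip(primes(), codes):
        g *= p ** c
    return g

def juxt(m, n):
    """Juxtaposition: the Gödel number of expression m followed by expression n."""
    return encode(exponents(m) + exponents(n))

def decide_K_star(n, decide_K, c):
    """Step (7) as a computation: n numbers a theorem of K* iff the
    conditional built from c (the conjoined closures) and n numbers a
    theorem of K, so K's decider mechanically decides K*."""
    conditional = juxt(2**3, juxt(c, juxt(2**11, juxt(n, 2**5))))
    return decide_K(conditional)
```

Since juxtaposition and the fixed numbers involved are recursive, decidability of K would carry over to K*, which is exactly the contradiction the lemma exploits.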

Result (Church's Theorem): The first-order predicate calculus, system PF, is recursively undecidable.

Proof:
1. Assume for reductio that PF is recursively decidable, i.e., that the number-theoretic property TPF is recursive.
2. Consider now the system PS, the predicate calculus in the language of arithmetic. This is the system that has the same syntax as S and RR but does not contain any proper axioms. Hence its only axioms are the instances of axiom schemata (A1)–(A5) that contain no constants other than 0 (a1), no predicate-letters other than = (I²), and no function signs other than ′ (f1¹), + (f1²), and · (f2²).
3. System RR is a theory with identity with a recursive vocabulary and system of numerals. All recursive number-theoretic functions are representable in RR and all recursive number-theoretic relations are expressible in RR. Since RR is consistent, it follows by the Recursive Undecidability Principle that RR is recursively undecidable.
4. RR has only a finite number of proper axioms, i.e., it is a finite extension of PS. By (3) and the Finite Extension Principle, it follows that PS is also recursively undecidable.
5. Because PS has the same syntax as S, the number-theoretic property FmlS that applies to a number x iff x is the Gödel number of a wff of the syntax of S is the same as the number-theoretic property FmlPS of being the Gödel number of a wff of PS. Because FmlS is primitive recursive, so is FmlPS.
6. The system PS is just like system PF except that its theorems are those theorems of PF that are wffs in the more limited syntax of PS.
7. It follows from (6) that the number-theoretic property of being a theorem of PS, namely TPS, is the conjunction of the properties TPF and FmlPS.
8. However, the conjunction of two recursive number-theoretic properties is itself recursive. Hence, by (1) and (5), TPS is recursive, which contradicts (4).
9. Hence, our assumption that TPF is recursive must be mistaken. This establishes the theorem. □

Because PF is both complete and sound, a wff A is a theorem of PF iff it is logically valid. Therefore, if we accept Church's Thesis, Church's Theorem amounts to the claim that there is no effective or mechanical procedure for determining whether or not a given wff of the language of predicate logic is logically valid.

Doesn't this seem like a good place to end the semester? I thought so.
INDEX OF SYMBOLS AND DEFINITIONS

A, B, C, etc., 3
∈, ∉, 4, 59
{…}, 4
⊆, ⊂, 4
∪, ∩, −, 4
∅, 4
⟨…⟩, 4
×, ⁿ, 4
[A]R, 5
=, 5, 60
ℵ₀, 5
¬, →, ∧, ∨, ↔, 8, 16
⊨, ≡, 10, 33–34
2, 11
|, ↓, 14
⊢, 18, 39
∀, 29
∃, 30
A[x], 31
(X)M, 31
s(t), 33
⊭, 36
⊢∗, 42
g( ), 43, 88–89
=, ≠, 51
(∃n x), 51
⊨=, 52
′, +, ·, 0, 56
{x|A[x]}, 59
n̄, 63
<, >, ≤, ≥, 64
t|u, 67
CR, 70
GF, 71
Uin, 69, 71
Z(x), 71
μy(…), 72
∑x<y, ∏x<y, 75
μz z<y(…), 76
∗, 79
f#, 79
β, Bt, 82
⌜A⌝, 95
Pf, 98
Neg, 100
Bew, 102
aleph null/naught, 5
argument places, 29
arithmetical property, 106
arithmetization (of syntax), 87
atomic formula, 29
axiom, 17, 39
axiom schema, 17
axiomatic systems, 17
base step, 6
beta function, 82
binary, 4, 28, 29
bound occurrence, 30
bound variable, 30
bounded μ-operator, 76
bounded product, 74
bounded sum, 74
cardinality/cardinal number, 5
Cartesian product, 4
characteristic function, 70
choice of least rule, 72
Church's theorem, 107
Church's thesis, 101
closed formula, 30
complete induction, 6
completeness, 23, 97
conjunction of relations, 75
connective, 8
consistent, 10, 44
constant, 28
constant function, 73
contingent, 10
contradictory, 34
countable, 5
countermodel, 35
course-of-values recursion, 79
Curry's paradox, 102
dagger, 14
decidable relation, 101
decidable sentence, 98
decidable theory, 104
deduction theorem, 19, 40
denumerable, 5
denumerable model, 47
denumerable sequence, 32
derived rule, 19
diagonalization, 95
diagonalization function, 91
disjoint, 4
disjunction of relations, 75
divisibility, 67
domain (of a function/relation), 4
domain of quantification, 31
dyadic, 28, 29
effectively computable, 101
effectively decidable, 101
empty set, 4
encoding, 77
equality, 50
equinumerous, 5
equivalence class, 5
equivalence relation, 5
expressible, 69
false, 33
Fibonacci sequence, 79
field, 5
finite, 5
finite extension, 107
finite extension principle, 107
first-order language, 30
first-order theory, 39
first-order theory with identity, 51
formula, 8, 15, 29, 85
free for, 30
free occurrence, 30
free variable, 30
function, 5, 68
function letter, 29
function, characteristic, 70
general recursive, 72
generalization, 39
Gödel number, 44, 88
Gödel sentence, 98
Gödel's β-function, 80
Gödel's first incompleteness theorem, 98
Gödel's second incompleteness theorem, 104
Gödel–Rosser theorem, 100
graph, 71
grotesque, 27
Henkin sentence, 103
Hilbert-Bernays derivability conditions, 102
identity, 50
identity-valid, 52
inconsistent, 44
independence, 25
individual constant, 28
individual variable, 28
induction, 6, 58
induction step, 6
inductive hypothesis, 6
inference rule, 7, 39
infinite, 5
infinite descent, 67
initial function, 71
interpretation, 31
intersection, 4
juxtaposition, 79
least number principle, 67
Leibniz's law, 51
lemma, 23
length, 77
liar paradox, 107
Lindenbaum extension lemma, 44
Löb's paradox, 102
Löb's theorem, 103
logical axiom, 17, 39
logical consequence, 10, 34
logically equivalent, 10, 34
logically imply, 10, 34
logically true, 34
logically valid, 34
logicism, 61
mathematical induction, 6, 58
maximal, 44
member, 4
metalanguage, 2
metalinguistic variables, 3
metatheory, 1
method of infinite descent, 67
model, 31
model for, 34
modeling lemma, 49
modus ponens, 7, 39
monadic, 28, 29
n-place function, 5
n-place operation, 5
n-place relation, 4
n-tuple, 4
natural deduction, 16
negation of a relation, 75
Nicod system, 27
normal model, 52
null set, 4
number-theoretic function, 68
number-theoretic relation, 68
numeral, 63
object language, 2
ω-consistent, 96
one-one function, 5
open formula, 30
operation, 5
operator, 8
order, 64
ordered n-tuple, 4
ordered pair, 4
overbar, 63
parentheses conventions, 9, 29
Peano arithmetic, 56
Peano axioms/postulates, 56
Peirce dagger, 14
predicate calculus, 39
predicate calculus with identity, 51
predicate letter, 28
primitive recursive, 72
projection function, 69, 71
proof, 17
proof induction, 6
proper axiom (RR), 94
proper axiom (S), 57
proper subset, 4
proper subtheory, 94
property, 4
propositional connective, 8
pseudo-derivability, 42
pure predicate calculus, 39
range, 5
recursion, 72
recursive axiom set, 91
recursive function, 72
recursive property/relation/set, 72
recursive vocabulary, 89
recursively axiomatizable, 101
recursively decidable, 104
reflexive, 5
relation, 4, 68
relative complement, 4
representable, 69
restricted μ-operator, 72
Robinson arithmetic, 94
Russell's paradox, 60
satisfaction/satisfy, 33
satisfiable, 10, 34
schematic letter, 3
Schmödel number, 85
schmtingent, 26
schmuth tables, 25
schmuth-value assignment, 25
scope, 30
select, 26
self-contradiction, 10
semantic tree, 36
sentence, 30
sequence, 32
set, 4
Sheffer stroke, 14
Sheffer/Peirce dagger, 14
signum, 74
singleton, 4
Skolem-Löwenheim theorem, 50
smaller, 5
sound/soundness, 22
standard model for S, 57
statement letter, 8
strong induction, 6
strongly representable, 69
subset, 4
substitution, 71
subtheory, 94
successor, 57, 71
symmetric, 5
syntax, 8
system ,, 85
system F, 59
system L, 17
system N, 106
system PF, 39
system PF=, 51
system PP, 39
system PS , 108
system RR, 94
system S, 56

Tarski's theorem, 106
tautology, 10
term, 29
theorem, 18, 85
theorem schema, 18
transitive, 5
true, 33
true arithmetical theory, 96
truth-value assignment, 9
turnstile, 18
undecidable sentence, 98
undecidable theory, 104
union, 4
unit set, 4
universal, 44
universe of discourse, 31
urelement, 4
use and mention, 2
valid, 34
variable, 28
variable assignment, 32
well-formed formula (wff), 8, 15, 29, 85
wff induction, 6
zero function, 69