DRAFT - 31 DEC 2021: Beginning Mathematical Logic
Peter Smith
LOGIC MATTERS
DRAFT– 31 DEC 2021
© Peter Smith 2021
Contents
Preface
1 The Guide, and how to use it
1.1 Who is the Guide for?
1.2 The Guide’s structure
1.3 Strategies for self-teaching from logic books
1.4 Choices, choices
1.5 So what do you need to bring to the party?
1.6 Two notational conventions
2 A very little informal set theory
2.1 Sets: a checklist of some basics
2.2 Recommendations on informal basic set theory
2.3 Virtual classes, real sets
3 First-order logic
3.1 FOL: a general overview
3.2 A little more about types of proof-system
3.3 Basic recommendations for reading on FOL
3.4 Some parallel and slightly more advanced reading
3.5 A little history (and some philosophy too)
3.6 Postscript: Other treatments?
4 Second-order logic, quite briefly
4.1 A preliminary note on many-sorted logic
4.2 Second-order logic: a brief overview
4.3 Recommendations on many-sorted and second-order logic
4.4 Conceptual issues
5 Model theory
5.1 Elementary model theory: an overview
5.2 Recommendations for beginning first-order model theory
5.3 Some parallel and slightly more advanced reading
5.4 A little history
6 Arithmetic, computability, and incompleteness
6.1 Logic and computability
6.2 Computable functions: an overview
6.3 Formal arithmetic: an overview
Preface
entries have been recommended, these can almost always be freely downloaded,
and again I give links.
Before I retired from the University of Cambridge, it was my greatest good
fortune to have secure, decently paid, university posts for forty years in leisurely
times, with almost total freedom to follow my interests wherever they meandered.
Like most of my contemporaries, for much of that time I didn’t really appreciate
how extraordinarily lucky I was. In writing this Study Guide and making it
readily available, I am trying to give a little back by way of heartfelt thanks. I
hope you find it useful.¹
¹ Many, many thanks are due to all those who commented on versions of Teach Yourself
Logic over more than a decade. Further comments and suggestions for future editions of
this revised Guide will always be gratefully received.
Who is this Study Guide for? What does it cover? At what level? How should
the Guide be used? And what background knowledge do you need, in order to
make use of it? This preliminary chapter explains.
1 The Guide, and how to use it
points to a cluster of issues about the structure of proofs and the consistency
of theories, etc.
(b) Now, a quick glance at e.g. the entry headings in The Stanford Encyclopedia
of Philosophy reveals that philosophers have been interested in a wide spectrum
of other logics, ranging far beyond classical and intuitionistic versions of FOL
and their second-order extensions. And although this Guide – as its title suggests
– is mainly focussed on core topics in mathematical logic, it is worth pausing to
consider just a few of those variant types of logic.
First, in looking at intuitionist logic, you will already have met a new way of
thinking about the meanings of the logical operators, using so-called ‘possible-
world semantics’. We can now usefully explore this idea further, since it has
many other applications. So:
Chapter 10 discusses modal logics, which deploy possible-world semantics, initially
to deal with various notions of necessity and possibility. In general,
these modal logics are perhaps of most interest to philosophers. However,
there is one particular variety which it is good for any logician to know
about, namely provability logic, which (roughly speaking) explores the logic
of operators like ‘it is provable in formal arithmetic that . . . ’.
Second, standard FOL (classical or intuitionistic) can be criticized in various
ways. For example, (1) it allows certain arguments to count as valid even when
the premisses are irrelevant to the conclusion; (2) it is not as neutral about
existence assumptions as we might suppose a logic ought to be; and (3) it can’t cope
naturally with terms denoting more than one thing like ‘Russell and Whitehead’
and ‘the roots of the quintic equation E’. It is worth saying something about
these supposed shortcomings. So:
Chapter 11 discusses so-called relevant logics (where we impose stronger requirements
on the relevance of premisses to conclusions for valid arguments),
free logics (i.e. logics free of existence assumptions, where we no longer
presuppose that e.g. names in an interpreted formal language always actually
name something), and plural logics (where we can e.g. cope with
plural terms).
For reasons I’ll explain, these variant logics are indeed mostly of concern to
philosophers. Though any logician interested in the foundations of mathematics
should want to know more about the pros and cons of dealing with talk about
pluralities by using set theory vs second-order logic vs plural logic.
(c) How are these chapters from Chapter 3 onwards structured?
Each starts with one or more overviews of its topic area(s). These overviews
are not full-blown tutorials or mini encyclopedia-style essays – they are simply
intended to give helpful introductions, with some rough indications of what
the chapters are about. And I don’t pretend that the level of coverage in the
overviews is uniform. If you already know something of the topic, or if these
necessarily brisk arm-waving descriptions sometimes mystify, feel very free to
skim or skip as much as you like.
Overviews are then followed by the key sections, giving a list of main recommended
texts for the chapter’s topic(s), put into what strikes me as a sensible
reading order.
I next offer some suggestions for alternative/additional reading at about the
same level or only another half a step up in difficulty/sophistication.
And because it can be quite illuminating to know just a little of the background
history of a topic, most chapters end with a few brisk suggestions for reading on
that.
(d) This is primarily a Guide to beginning mathematical logic. So the recommended
introductory readings in Chapters 1 to 11 won’t take you very far. But
they should be more than enough to put you in a position from which you can
venture into rather more advanced work under your own steam. Still, I have
added a final chapter which looks ahead:
Chapter 12 offers suggestions for those who want to delve further into the topics
of some earlier core chapters, in particular looking again at model theory,
computability and arithmetic, set theory, and proof theory. Then I add a
final section on a new topic, type theories and the lambda calculus, a focus
of much recent interest.
Very roughly, if the earlier chapters are at advanced undergraduate level (or a
little more), this last one is definitely at graduate level.
To repeat: you will certainly miss a lot if you concentrate on just one text
in a given area, especially at the outset. Yes, do very carefully read one or two
central texts, choosing books that work for you. But do also cultivate the crucial
habit of judiciously skipping and skimming through a number of other works so
that you can build up a good overall picture of an area seen from various angles
and levels of approach.
While we are talking about strategies for self-teaching, I suppose I should add a
quick remark on the question of doing exercises.
Note that some authors have the irritating(?) habit of burying quite important
results among the exercises, mixed in with routine homework. It is therefore
always a good policy to skim through the exercises in a book even if you don’t
plan to work on answers to very many of them.
widely agreed to have significant virtues (even if other logicians would have
different favourites).
For example, we might talk in logician’s English about a logical formula being
of the shape (A ∨ B), using the italic letters as place-holders for sentences. And
then (P ∨ Q), a formula from a particular logical language, could be an instance,
with these sans-serif letters being sentences of the relevant language. Similarly,
x + 0 = x might be an equation of ordinary informal arithmetic, while x + 0 = x
will be an expression belonging to a formal theory of arithmetic.
Our second convention, just put into practice, is that we will not in general
be using quotation marks when mentioning symbolic expressions. Logicians can
get very pernickety, and insist on the use of quotation marks in order to make
it extra clear when we are mentioning an expression of, say, formal arithmetic
in order to say something about that expression itself as opposed to using it to
make an arithmetical claim. But in the present context it is unlikely you will be
led astray if we just leave it to context to fix whether a symbolic expression is
being mentioned rather than put to use.
2 A very little informal set theory
Notation, concepts and constructions from entry-level set theory are very often
presupposed in elementary mathematical texts – including some of the introductory
logic texts mentioned in the following chapters, even before we get round to
officially studying set theory itself. If the absolute basics aren’t already familiar
to you, it is worth pausing to get acquainted at an early stage.
In §2.1, then, I note what you should ideally know about sets here at the
outset. It isn’t a lot! And for now, we proceed ‘naively’ – i.e. we proceed quite
informally, and will just assume that the various constructions we talk about
are permitted, etc. §2.2 gives recommended readings on basic informal set theory
for those who need them. In §2.3 I point out that, while the use of set-talk in
elementary contexts is conventional, in many cases it can in fact be eliminated
without serious loss.
(iii) If A, B are sets, then so too are their union, intersection and their
powersets.
If the intersection A ∩ B is always to exist, then we have to allow a
set which contains no members (since A and B might not overlap). By
extensionality, the empty set ∅ is unique.
The powerset of A, P(A), is the set whose members are all and only the
subsets of A. Note this assumes that sets are indeed things which can be
members of other sets.
(iv) Sets are in themselves unordered. But we often need to work with ordered
pairs, ordered triples, ordered quadruples, . . . , tuples more generally. We
use ‘⟨a, b⟩’ – or often simply ‘(a, b)’ – for the ordered pair, first a, then b.
So, while {a, b} = {b, a}, by contrast ⟨a, b⟩ ≠ ⟨b, a⟩.
We can implement ordered pairs using unordered sets in various ways:
all we need is some definition which ensures that ⟨a, b⟩ = ⟨a′, b′⟩ if and only
if a = a′ and b = b′. The following is standard: ⟨a, b⟩ =def {{a}, {a, b}}.
Once we have ordered pairs available, we can use them to define ordered
triples: ⟨a, b, c⟩ can be defined as first the pair ⟨a, b⟩, then c, i.e. as ⟨⟨a, b⟩, c⟩.
Then the quadruple ⟨a, b, c, d⟩ can be defined as ⟨⟨a, b, c⟩, d⟩. And so it goes.
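The defining property of this pair encoding can even be checked mechanically. Here is a small Python sketch (the names `kpair` and `ktriple` are my own illustrative choices; frozensets are used because only immutable sets can be members of other sets in Python):

```python
def kpair(a, b):
    # Kuratowski encoding: <a, b> = {{a}, {a, b}}
    return frozenset({frozenset({a}), frozenset({a, b})})

def ktriple(a, b, c):
    # <a, b, c> defined as <<a, b>, c>
    return kpair(kpair(a, b), c)

# The defining property: <a, b> = <a', b'> iff a = a' and b = b',
# checked exhaustively over a small domain.
dom = range(3)
assert all((kpair(a, b) == kpair(c, d)) == (a == c and b == d)
           for a in dom for b in dom for c in dom for d in dom)

# Unlike unordered sets, order now matters:
assert frozenset({0, 1}) == frozenset({1, 0})
assert kpair(0, 1) != kpair(1, 0)
```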
(v) The Cartesian product A×B of the sets A and B is the set whose members
are all the ordered pairs whose first member is in A and whose second
member is in B. So A × B is {⟨x, y⟩ | x ∈ A & y ∈ B}. Cartesian products
of n sets are defined as sets of n-tuples, again in the obvious way.
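The product operation can be sketched in Python, with built-in tuples standing in for the ordered pairs:

```python
def cartesian_product(A, B):
    # A x B: the set of all pairs (x, y) with x in A and y in B
    return {(x, y) for x in A for y in B}

A, B = {1, 2}, {"a", "b"}
assert cartesian_product(A, B) == {(1, "a"), (1, "b"), (2, "a"), (2, "b")}
# With finite sets, the product has |A| * |B| members:
assert len(cartesian_product(A, B)) == len(A) * len(B)
```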
(vi) If R is a binary relation between members of the set A and members of the
set B, then its extension is the set of ordered pairs ⟨x, y⟩ (with x ∈ A and
y ∈ B) such that x is R to y. So the extension of R is a subset of A × B.
Similarly, the extension of an n-place relation is the set of n-tuples of
things which stand in that relation. In the unary case, where P is a property
defined over some set A, then we can simply say that the extension of P
is the set of members of A which are P .
For many mathematical purposes, we treat properties and relations extensionally;
i.e. we regard properties with the same extension as being the
same property, and likewise for relations. Indeed, we can often simply treat
a property (relation) as if it simply is its extension.
(vii) The extension (or graph) of a unary function f which sends members of A
to members of B is the set of ordered pairs ⟨x, y⟩ (with x ∈ A and y ∈ B)
such that f (x) = y. Similarly for n-place functions. For many purposes, we
treat functions extensionally, regarding functions with the same extension
as the same. Again we often treat a function as if it is its extension, i.e.
we identify a function with its graph.
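The extensional treatment of functions can be illustrated directly: represent a function by its graph over a domain, and two different rules with the same graph then count as the same function. (A sketch; `graph` is my own illustrative helper.)

```python
def graph(f, A):
    # the extension/graph of f over domain A: the set of pairs (x, f(x))
    return {(x, f(x)) for x in A}

A = {0, 1, 2}
# 'x + 0' and 'x' are different rules, but they have the same extension
# over A -- so treated extensionally they are the same function:
assert graph(lambda x: x + 0, A) == graph(lambda x: x, A)
```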
(viii) Relations can, for example, be reflexive, symmetric, transitive; equivalence
relations are all three. Note that if ≡ is an equivalence relation defined
over some set, it partitions that set into equivalence classes (we never
say ‘equivalence sets’!) of objects standing in that relation. If [x] is the
Now, some people use ‘naive set theory’ to mean, quite specifically, a theory
which makes that simple but hopeless assumption that any property at all
has a set as its extension. As we’ve just seen, naive set theory in this sense is
inconsistent.
But here we need to avoid getting entangled in one of those rather annoying
terminological divergences. Because, for many others, ‘naive set theory’ just
means set theory developed informally, without rigorous axiomatization, but
guided by unambitious low-level principles. In this different second sense, we
have been proceeding naively in this chapter – yet, fingers crossed, we remain on
track for developing a consistent story! Thus, we were careful in (vi) to assign
extensions just to those properties and relations that are defined over domains
we are already given as sets.
True, our story so far is silent about exactly which putative sets are the kosher
ones – i.e. are not ‘too big’ to be problematic. However, important though
it is, we can leave this topic until Chapter 7 when we turn to set theory proper.
Low-level practical uses of sets in ‘ordinary’ mathematics seem remote from
such problematic cases; hopefully, we can continue to proceed naively for now in
elementary contexts.
3. David Makinson, Sets, Logic and Maths for Computing (Springer, 3rd
edn 2020), Chapters 1 to 3.
This is exceptionally clear and very carefully written for students
without much mathematical background. Chapter 1 reviews basic facts
about sets. Chapter 2 is on relations. Chapter 3 is on functions. This too
can be warmly recommended (though you might want to supplement it
by following up his reference to Cantor’s Theorem).
Now, Makinson doesn’t mention the Axiom of Choice at all, while Button
does eventually get round to Choice in his Chapter 16; but the treatment
there depends on the set theory developed in the intervening chapters, so
isn’t appropriate for us just now. Instead, the following two pages should
be enough for the present:
You will eventually find that this same usage plays an important role in set theory
in some treatments of so-called ‘proper classes’ as distinguished from sets. For
example, in his standard book Set Theory (1980), Kenneth Kunen writes
The distinction being made here is an old one. Here is Paul Finsler, writing in
1926 (as quoted by Luca Incurvati, in his Conceptions of Set):
Finsler writes ‘almost always’, I take it, because a class term may in fact denote
just one thing, or even – perhaps by misadventure – none.
Nothing hangs on the particular terminology, ‘classes’ vs ‘sets’. What matters
(or will eventually matter) is the distinction between non-committal, eliminable,
talk – talk of merely virtual sets/classes/pluralities (whichever idiom we use) –
and uneliminable talk of sets as entities in their own right.
3 First-order logic
¹ A note to philosophers. If you have carefully read a substantial introductory logic text for
philosophers such as Nicholas Smith’s, or even my own, you will already be familiar with
(versions of) a fair amount of the material covered in this chapter. However, you will now
begin to see topics being re-presented in the sort of mathematical style and with the sort of
rigorous detail that you will necessarily encounter more and more as you progress in logic.
You do need to start feeling entirely comfortable with this mode of presentation at an early
stage. So it is well worth working through even rather familiar topics again, this time with
more mathematical precision.
a division of labour. First, we clarify the intended structure of the original
argument by rendering it into an unambiguous simplified/formalized language.
Second, there’s the separate business of assessing the validity of the resulting
regimented argument.
In exploring FOL, then, we will use appropriate formal languages which con-
tain, in particular, tidily-disciplined surrogates for the propositional connectives
and, or, if, not (standardly symbolized ∧, ∨, → and ¬), plus replacements for
the ordinary language quantifiers (roughly, using ∀x for every x is such that . . . ,
and ∃y for some y is such that. . . ).
Although the fun really starts once we have the quantifiers in play, it is very
helpful to develop FOL in two main stages:
(a) We first look at the syntax and semantics of simple formal languages
whose only logical operators are the propositional connectives, and at
the logic of arguments framed in such languages.
(b) We then move on to develop the syntax and semantics of richer formal
languages which add the apparatus of so-called first-order quantification,
and explore the logic of arguments rendered into such languages.
So let’s have just a little more detail about stages (a) and (b).
(a.i) We first look at the syntax of propositional languages, defining what count
as the well-formed formulas (wffs) of such languages.
If you have already encountered languages of this kind, you will now get to
know how to actually prove various things about them that seem obvious and
that you perhaps previously took for granted – for example, that ‘bracketing
works’ to block ambiguities like P ∨ Q ∧ R, so every well-formed formula has a
unique unambiguous parsing.
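The point about bracketing can be made concrete with a minimal recursive-descent parser for a toy fully bracketed language. (This is my own illustrative sketch, with atoms P, Q, R and with &, v, >, ~ as ASCII stand-ins for the usual connective symbols; it is not from any particular textbook.)

```python
def parse(s):
    tree, rest = wff(s)
    if rest:
        raise ValueError("not a wff: trailing material")
    return tree

def wff(s):
    # grammar: wff ::= atom | ~wff | (wff op wff), op one of &, v, >
    if s[:1] in ("P", "Q", "R"):                  # an atom
        return s[0], s[1:]
    if s[:1] == "~":                              # a negation ~A
        sub, rest = wff(s[1:])
        return ("~", sub), rest
    if s[:1] == "(":                              # a bracketed (A op B)
        left, rest = wff(s[1:])
        if rest[:1] not in ("&", "v", ">"):
            raise ValueError("bad connective")
        op, rest = rest[0], rest[1:]
        right, rest = wff(rest)
        if rest[:1] != ")":
            raise ValueError("missing bracket")
        return (op, left, right), rest[1:]
    raise ValueError("not a wff")

# Each fully bracketed formula has exactly one parse tree:
assert parse("(Pv(Q&R))") == ("v", "P", ("&", "Q", "R"))
assert parse("((PvQ)&R)") == ("&", ("v", "P", "Q"), "R")

# Dropping the brackets leaves an ambiguous string, which is simply rejected:
try:
    parse("PvQ&R")
    assert False
except ValueError:
    pass
```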
(a.ii) On the semantic side, we need the idea of a valuation for a propositional
language.
We start with an assignment of truth-values, true vs false, to the atomic
formulas, the basic building blocks of our languages. This assignment of values to
atoms then fixes the truth-values of complex sentences involving the connectives.
Here we rely crucially on the ‘truth-functional’ interpretation of the connectives
– so the truth value of a formula like ¬(P ∧ (Q ∨ ¬R)) is entirely fixed as a
function of the truth-values of the atomic formulas P, Q, R.
More generally, then, any wff – whether atomic or complex – is determined to
be either definitely true or definitely false (one or the other, but not both) on any
particular valuation. This core assumption is distinctive of classical two-valued
semantics.
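The way a valuation of the atoms fixes the value of every wff can be sketched directly. (Formulas are represented here as nested tuples, an assumed mini-notation of my own, with &, v, >, ~ for the connectives.)

```python
def value(wff, v):
    # truth value of a formula under valuation v (a dict: atom -> bool)
    if isinstance(wff, str):                 # an atom: read off its value
        return v[wff]
    if wff[0] == "~":                        # negation flips the value
        return not value(wff[1], v)
    op, a, b = wff[0], value(wff[1], v), value(wff[2], v)
    return {"&": a and b, "v": a or b, ">": (not a) or b}[op]

# The value of ~(P & (Q v ~R)) is entirely fixed by the values of P, Q, R:
f = ("~", ("&", "P", ("v", "Q", ("~", "R"))))
assert value(f, {"P": False, "Q": True, "R": False}) is True
assert value(f, {"P": True, "Q": True, "R": False}) is False
```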
(a.iii) Even at this early point, questions arise. For example, how satisfactory
is the representation of an informal conditional if P then Q by a formula P → Q
Of course, we want these two approaches to fit together. We want our favoured
proof-system S to be sound – it shouldn’t give false positives. In other words,
if there is an S-derivation of A from Γ, then A really is tautologically entailed
by Γ. We also would like our favoured proof-system S to be complete – we want
it to capture all the correct semantic entailment claims. In other words, if A
is tautologically entailed by the set of premisses Γ, then there is indeed some
S-derivation of A from premisses in Γ.
So, in short, we will want to establish both the soundness and the completeness
of our favoured proof-system S for propositional logic (axiomatic, natural
deduction, whatever). Now, these two results will hold no terrors! However, in
establishing soundness and completeness for propositional logics you will
encounter some useful strategies which can later be beefed up to give us soundness
and completeness results for stronger logics.
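On the semantic side of this picture, tautological entailment for a toy language can be checked by brute force over all valuations. (A sketch under the same assumed tuple representation of formulas, with &, v, >, ~ for the connectives; this is the semantic relation itself, not a proof system.)

```python
from itertools import product

def value(wff, v):
    if isinstance(wff, str):
        return v[wff]
    if wff[0] == "~":
        return not value(wff[1], v)
    op, a, b = wff[0], value(wff[1], v), value(wff[2], v)
    return {"&": a and b, "v": a or b, ">": (not a) or b}[op]

def entails(premisses, conclusion, atoms="PQR"):
    # Gamma tautologically entails A iff every valuation of the atoms
    # making all of Gamma true makes A true too.
    for vals in product([True, False], repeat=len(atoms)):
        v = dict(zip(atoms, vals))
        if all(value(p, v) for p in premisses) and not value(conclusion, v):
            return False
    return True

# Disjunctive syllogism is semantically in order...
assert entails([("v", "P", "Q"), ("~", "P")], "Q")
# ...but affirming the consequent is not:
assert not entails([(">", "P", "Q"), "Q"], "P")
```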
(b.i) Having warmed up with propositional logic, we then turn to full FOL so
we can also deal with arguments whose validity depends on their quantificational
structure (starting with the likes of our old friend ‘Socrates is a man; all men
are mortal; hence Socrates is a mortal’!).
We need to introduce appropriate formal languages with quantifiers (more
precisely, with first-order quantifiers, running over a fixed domain of objects:
the next chapter explains the contrast with second-order quantifiers). So syntax
first.
Consider the simple ordinary-language sentence (i) ‘Socrates is wise’. And
now note that we can replace the name in (i) with the quantifier expression
‘everyone’ to give us another sentence (ii) ‘Everyone is wise’. Similarly, we can
directly replace the name ‘Juliet’ in (iii) ‘Romeo loves Juliet’ with the quantifier
expression ‘someone’ to get the equally grammatical (iv) ‘Romeo loves someone’.
In FOL, however, while we might render (i) as simply Ws, (ii) will get rendered
by something like ∀xWx (roughly, everyone x is such that x is wise). Similarly
if (iii) is rendered Lrj, then (iv) gets rendered by something like ∃xLrx (roughly,
someone x is such that Romeo loves x). But why?
It is crucial to understand the rationale for this departure from the syntactic
patterns of ordinary language and the use of the apparently more complex
‘quantifier/variable’ syntax in expressing generalizations. The headline point is
that in our formal languages we need to avoid the kind of structural ambiguities
that we can get in ordinary language when there is more than one logical
operator involved. Consider for example the ambiguous ‘Everyone has not arrived’.
Does that mean ‘Everyone is such that they have not arrived’ or ‘It is not the
case that everyone has arrived’? Our logical notation will distinguish ∀x¬Ax and
¬∀xAx, with the relative ‘scopes’ of the generalization and the negation now
made transparent by the structure of the formulas.
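The two readings come apart even over a tiny domain. In this sketch (the domain and who arrived are invented purely for illustration), the set `arrived` plays the role of the extension of A:

```python
domain = {"ann", "bob", "celia"}
arrived = {"ann"}                    # the extension of 'has arrived'

reading_1 = all(x not in arrived for x in domain)   # everyone is such that they have not arrived
reading_2 = not all(x in arrived for x in domain)   # it is not the case that everyone has arrived

assert reading_1 is False    # false: Ann did arrive
assert reading_2 is True     # true: not everyone arrived
```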
(b.ii) Turning to semantics: the first key idea we need is that of a model
structure, a (non-empty) domain of objects equipped with some properties, relations
and/or functions. And here we treat properties etc. extensionally. In other words,
we can think of a property as a set of objects from the domain, a binary relation
as a set of pairs from the domain, and so on. (Compare our remarks on naive
set theory in §2.1; though, heeding the point of §2.3, we can arguably take the
talk of sets here in a non-committal way.)
Then, crucially, you need to grasp the idea of an interpretation of an FOL
language in such a structure. Names are interpreted as denoting objects in the
domain; a one-place predicate gets assigned a property, i.e. a set of objects from
the domain (its extension – intuitively, the objects it is true of); a two-place
predicate gets assigned a binary relation; and so on.
Such an interpretation of the elements of a first-order language then generates
a valuation (a unique assignment of truth-values) for every sentence of the
interpreted language. How does it do that? Well, a simple predicate-name sentence
like Ws will be true just if the object denoted by s is in the extension of W; a
sentence like Lrj is true if the ordered pair of the objects denoted by r and j is
in the extension of L; and so on. That’s easy. And the propositional connectives
continue to behave basically as in propositional logic.
But extending the formal semantic story to explain how the interpretation of
a language fixes the valuations of more complex, quantified, sentences requires
a new idea, some variant of the thought that ∀xWx is true just when Wa is true,
whatever ‘a’ picks out when treated as temporary name (compare: ‘everything
is W ’ is true just when ‘that is W ’ is true whatever the demonstrative ‘that’
might pick out in the relevant domain). There are a number of slightly different
ways of developing this story more carefully. (For a start, do we take our FOL
languages to have a supply of special symbols available to act as temporary
names? Or do we re-use a variable like ‘x’ without a preceding quantifier to then
act as a temporary name?) You need to get your head round the details of one
fully spelt-out story.
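Here is a sketch of how an interpretation in a tiny finite structure fixes valuations, reusing the examples from (b.i). (The domain, denotations and extensions are of course invented for illustration.)

```python
domain = {"socrates", "romeo", "juliet"}
names = {"s": "socrates", "r": "romeo", "j": "juliet"}   # denotations of names
ext_W = {"socrates"}                   # extension of W ('... is wise')
ext_L = {("romeo", "juliet")}          # extension of L ('... loves ...')

# Ws is true just if the object denoted by s is in the extension of W:
assert names["s"] in ext_W
# Lrj is true just if the pair of denoted objects is in L's extension:
assert (names["r"], names["j"]) in ext_L

# Quantified sentences: run a 'temporary name' through the whole domain.
assert not all(x in ext_W for x in domain)               # 'everything is W': false here
assert any((names["r"], x) in ext_L for x in domain)     # 'Romeo loves someone': true
```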
(b.iii) We can now introduce the idea of a model for a set of sentences, i.e. an
interpretation in a structure which makes all the sentences true together. And
we can then again define a semantic relation of entailment, this time for FOL
sentences:
You’ll again need to know some of the basic properties of this entailment relation.
For one important example, note that if Γ has no model, then – on our
definition – Γ semantically entails A for any A at all, including any contradiction.
(b.iv) Unlike the case of tautological entailment, this time there is no general
procedure for mechanically testing whether Γ semantically entails A. So the use
of proof systems to warrant entailments now really comes into its own.
You will again encounter five main types of proof system for FOL, with their
varying attractions and drawbacks. And to repeat, you’ll want at some future
point to find out at least something about all these styles of proof. But, as
before, we will principally be looking here at axiomatic systems and at one kind
of natural deduction.
As you will see, whichever form of proof system you take, some care is required
in handling inferences using the quantifiers in order to avoid fallacies. And we
will need extra care if we don’t use special symbols as temporary names but
allow the same variables to occur both ‘bound’ by quantifiers and ‘free’. This
isn’t the place to go into details; but you do need to tread carefully hereabouts!
(b.v) As with propositional logic, we will want to show that our chosen proof
system for FOL is sound and doesn’t overshoot (so giving us false positives) and is
complete and doesn’t undershoot (leaving us unable to derive some semantically
valid entailments).
In other words, if S is our FOL proof system, Γ a set of sentences, and A a
particular sentence, we need to show:
Now, for future uses, it is important that the completeness theorem actually
comes in two versions. There is a weaker version where Γ is restricted to having
only finitely many members (or indeed is empty). And there is a crucial stronger
version which allows Γ to be infinite.
And it is at this point, proving strong completeness, that the study of FOL
becomes mathematically really interesting.
(b.vi) Later chapters will continue the story along various paths; here though
I should quickly mention just one immediate corollary of completeness.
Proofs in formal systems are always only finitely long; so a proof of A from Γ
can only call on a finite number of premisses in Γ. But the strong completeness
theorem for FOL allows Γ to have an infinite number of members. This com-
bination of facts immediately implies the compactness theorem for sentences of
FOL languages:
² That’s equivalent to the claim that if (i) Γ doesn’t have a model, then there is a finite subset
∆ ⊆ Γ such that (ii) ∆ has no model. Suppose (i). This implies that Γ semantically entails
This compactness theorem, you will discover, has numerous applications in model
theory.
A little more about types of proof-system
Ax1. (A → (B → A))
Ax2. ((A → (B → C)) → ((A → B) → (A → C)))
1. (P → Q) premiss
2. (Q → R) premiss
3. ((Q → R) → (P → (Q → R))) instance of Ax1
4. (P → (Q → R)) from 2, 3 by MP
5. ((P → (Q → R)) → ((P → Q) → (P → R))) instance of Ax2
6. ((P → Q) → (P → R)) from 4, 5 by MP
7. (P → R) from 1, 6 by MP
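A derivation like this can even be checked mechanically. In the following sketch (my own illustrative representation, not any textbook's), a conditional (A → B) is the nested tuple ('>', A, B), and each modus ponens step verifies that the conditional's antecedent really matches the other premiss:

```python
def mp(conditional, antecedent):
    # modus ponens: from (A > B) and A, infer B
    op, a, b = conditional
    assert op == ">" and a == antecedent, "modus ponens misapplied"
    return b

PQ = (">", "P", "Q")                         # line 1, premiss
QR = (">", "Q", "R")                         # line 2, premiss
line3 = (">", QR, (">", "P", QR))            # line 3, instance of Ax1
line4 = mp(line3, QR)                        # line 4, from 2, 3 by MP
line5 = (">", (">", "P", QR),
         (">", PQ, (">", "P", "R")))         # line 5, instance of Ax2
line6 = mp(line5, line4)                     # line 6, from 4, 5 by MP
line7 = mp(line6, PQ)                        # line 7, from 1, 6 by MP
assert line7 == (">", "P", "R")              # the target conclusion
```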
1. (P → Q) premiss
2. (Q → R) premiss
3. P supposition for the sake of argument
4. Q by MP from 3, 1
5. R by MP from 4, 2
6. (P → R) by CP, given the ‘subproof’ 3–5
So the key idea is that the line of proof snakes from column to column,
moving a column to the right (as at line 3) when a new temporary
assumption is made, and moving back a column to the left (as at line 6) when the
assumption is dropped or ‘discharged’. This mode of presentation really
comes into its own when multiple temporary assumptions are in play, and
makes such proofs very easy to read and follow. And, compared with the
axiomatic derivation, this regimented line of argument does indeed seem
to warrant being called a ‘natural deduction’ !
(ii) However, the layout for natural deduction proofs favoured for serious work
was first introduced by Gerhard Gentzen in his doctoral thesis of 1933. He
sets out the proofs as trees, with premisses or temporary assumptions at
the top of branches and the conclusion at the root of the tree – and he uses
a system for explicitly tagging temporary assumptions and the inference
moves where they get discharged.
Let’s again argue from the same premisses to the same conclusion as
before. We will build up our Gentzen-style proof in two stages. First, then,
take the premisses (P → Q) and (Q → R) and the additional supposition
P, and construct the following proof of R using modus ponens twice:
P   (P → Q)
-----------
     Q        (Q → R)
     ----------------
            R

Then, discharging the temporary supposition P by conditional proof, we
extend this to a proof of (P → R):

[P](1)   (P → Q)
----------------
     Q        (Q → R)
     ----------------
            R
          -------- (1)
          (P → R)
For clarity, we tag both the assumption which is discharged and the
corresponding inference line where the discharging takes place with matching
labels, in this case ‘(1)’. (We’ll need multiple labels when multiple
temporary assumptions are put into play and then dropped.)
In the second proof, then, just the unbracketed sentences at the tips of
branches are left as ‘live’ assumptions. So this is our Gentzen-style proof
from those remaining premisses (P → Q) and (Q → R) to the conclusion
(P → R).
(d) There is much more to be said of course, but that’s enough by way of some
very introductory remarks about the first half of the following list of commonly
used types of proof system:
1. Old-school axiomatic systems.
2. (i) Natural deduction done Gentzen-style.
(ii) Natural deduction done Fitch-style.
3. ‘Semantic tableaux’ or ‘truth trees’.
4. Sequent calculi.
5. Resolution calculi.
So next, a very brief word about semantic tableaux, which are akin to Gentzen-
style proof trees turned upside down.
The key idea is this. Instead of starting from some premisses Γ and aiming
directly for the desired conclusion A, we begin instead by assuming the premisses
are all true while the conclusion is false. And then we ‘work backwards’ from the
assumed values of these typically complex wffs, aiming to uncover a valuation of
the atoms for the relevant language which indeed makes Γ all true and A false.
If our search gets entangled in contradiction, that tells us that there is no such
valuation: so if Γ are all true, then A indeed has to be true too.
Note however that assuming e.g. that (A ∨ B) is true doesn’t tell us which
of A and B is true too: so as we ‘work backwards’ from the values of more
complex wffs to the values of their components we will typically have to explore
branching options, which are most naturally displayed on a downward-branching
tree. Hence ‘truth trees’.
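What a tableau search delivers semantically can be mimicked crudely by brute force: look for a valuation making the premisses all true and the conclusion false. (A sketch for a toy propositional language with formulas as nested tuples and &, v, >, ~ for the connectives; this is not a tree implementation, just the counter-valuation a closed tree shows cannot exist.)

```python
from itertools import product

def value(wff, v):
    if isinstance(wff, str):
        return v[wff]
    if wff[0] == "~":
        return not value(wff[1], v)
    op, a, b = wff[0], value(wff[1], v), value(wff[2], v)
    return {"&": a and b, "v": a or b, ">": (not a) or b}[op]

def counter_valuation(premisses, conclusion, atoms="PQR"):
    # the valuation an open tableau branch would deliver, if there is one
    for vals in product([True, False], repeat=len(atoms)):
        v = dict(zip(atoms, vals))
        if all(value(p, v) for p in premisses) and not value(conclusion, v):
            return v
    return None        # no counter-valuation: the entailment holds

# P, (P > Q) entail Q: the search finds nothing...
assert counter_valuation(["P", (">", "P", "Q")], "Q") is None
# ...but (P v Q) alone does not entail P:
assert counter_valuation([("v", "P", "Q")], "P") is not None
```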
The details of a truth-tree system for FOL are elegantly simple – which is why
the majority of elementary logic books for philosophers introduce either (2.ii)
Fitch-style natural deduction or (3) truth trees, or both. And indeed, it is well
worth getting to know about tree systems at a fairly early stage because they
can be adapted rather nicely to dealing with logics other than FOL. However,
introductory mathematical logic textbooks do usually focus on either (1)
axiomatic systems or (2.i) Gentzen-style proof systems, and those will be our initial
main focus here too.
As for (4), the sequent calculus in its most interesting form really comes
into its own in more advanced work in proof theory, while (5) resolution calculi
are perhaps of particular concern to computer scientists interested in automating
theorem proving.
(e) I should add, though, that even once you’ve picked your favoured general
type of proof-system to work with from (1) to (5), there are many more choices
to be made before landing on a specific system of that type. For example, F.
J. Pelletier and Allen Hazen published a survey of logic texts aimed at philosophers
which use natural deduction systems (tinyurl.com/pellhazen): they note
that no fewer than thirty texts use some variety of Fitch-style system (2.ii) – and,
rather remarkably, no two of these have exactly the same system of rules for
FOL!
Moral? Don’t get too hung up on the finest details of a particular textbook’s
proof-system; it is the overall guiding ideas that matter, together with the Big
Ideas underlying proofs about the chosen proof-system (such as the soundness
and completeness theorems).
Unsurprisingly, there is a very long list of texts which cover FOL. But the
whole point of this Guide is to choose. So here are my top recommendations,
starting with one-and-a-third books which, taken together, make an excellent
introduction:
Next, you should complement C&H by reading the first third of the following
excellent book:
These three main recommended books, by the way, have all had very positive
reports over the years from student users.
Some parallel and slightly more advanced reading
To repeat, unlike our main recommendations, Bostock does give a brisk but
very clear presentation of tableaux (‘truth trees’), and he proves completeness
for tableaux in particular, which I always think makes the needed construction
seem particularly natural. If you are a philosopher, you may well have already
encountered these truth trees in your introductory logic course. If not, at some
point you will want to find out about them (see §3.2d). As an alternative to
Bostock,
Next, back to the level we want: and even though it is giving a second bite
to an author we’ve already met, I must mention a rather different discussion of
FOL:
Next, here’s a much-used text which has gone through multiple editions and
should be in any library; it is a very useful natural deduction based alternative
to C&H. Later chapters of this book are also mentioned later in this Guide as
possible reading on further topics, so it could be worth making early acquaintance
with
8. Dirk van Dalen, Logic and Structure (Springer, 1980; 5th edition 2012).
The early chapters up to and including §3.2 provide an introduction
to FOL via Gentzen-style natural deduction. The treatment is often ap-
proachable and written with a relatively light touch. However, it has to
be said that the book isn’t without its quirks and flaws and inconsisten-
cies of presentation (though perhaps you have to be an alert and rather
pernickety reader to notice and be bothered by them). Still, having said
that, the coverage and general approach are good.
Mathematicians should be able to cope readily. I suspect, however,
that the book would occasionally be tougher going for philosophers if
taken from a standing start – which is another reason why I have re-
commended beginning with C&H instead. For more on this book, see
tinyurl.com/dalenlogic.
As a follow up to C&H, I just recommended L&K’s Friendly Introduction
which uses an axiomatic system. As an alternative to that, here is an older (and,
in its day, much-used) text:
9. Herbert Enderton, A Mathematical Introduction to Logic (Academic Press
1972, 2002).
This also focuses on an axiomatic system, and is often regarded as a
classic of exposition. However, it does strike me as somewhat less ap-
proachable than L&K, so I’m not surprised that students do quite often
report finding this book a bit challenging if used by itself as a first text.
However, this is an admirable and very reliable piece of work which
most readers should be able to cope with well if used as a supplementary
second text, e.g. after you have tackled C&H. And stronger mathemati-
cians might well dive into this as their first preference.
Read up to and including §2.5 or §2.6 at this stage. Later, you can
finish the rest of that chapter to take you a bit further into model theory.
For more about this classic, see tinyurl.com/enderlogicnote.
I should also certainly mention the outputs from the Open Logic Project. This
is an entirely admirable, collaborative, open-source enterprise inaugurated by
Richard Zach, and it continues to be a work in progress. You can freely download the
latest full version and various sampled ‘remixes’ from tinyurl.com/openlogic. In
an earlier version of this Guide, I said that “although this is referred to as a text-
book, it is perhaps better regarded as a set of souped-up lecture notes, written
at various degrees of sophistication and with various degrees of more book-like
elaboration.” But things have moved on: the mix of chapters on propositional
and quantificational logic in the following selection has been expanded and de-
veloped considerably, and the result is much more book-like:
10. Richard Zach and others, Sets, Logic, Computation* (Open Logic: down-
loadable at tinyurl.com/slcopen).
There’s a lot to like here (Chapters 5 to 13 are the immediately relevant
ones for the moment). In particular, Chapter 9 could make for very useful
supplementary reading on natural deduction. Chapter 8 tells you about
a sequent calculus (a slightly odd ordering!). And Chapter 10 on the
completeness theorem for FOL should also prove a very useful revision
guide.
So much, then, for reading on FOL running on more or less parallel tracks
to the main recommendations in the preceding section. I’ll finish this section
by recommending two books that push the story on a little. First, an absolute
classic, short but packed with good things:
And second, taking things in a new direction, don’t be put off by the title of
12. Melvin Fitting, First-Order Logic and Automated Theorem Proving (Springer,
1990, 2nd edn. 1996).
A wonderfully lucid book by a renowned expositor. Yes, at a number of
places in the book there are illustrations of how to implement algorithms
in Prolog. But either you can easily pick up the very small amount of
background knowledge that’s needed to follow everything that is going
on (and that’s quite fun) or you can in fact just skip lightly over those
implementation episodes while still getting the principal logical content
of the book.
As anyone who has tried to work inside an axiomatic system knows,
proof-discovery for such systems is often hard. Which axiom schema
should we instantiate with which wffs at any given stage of a proof?
Natural deduction systems are nicer. But since we can, in effect, make
any new temporary assumption we like at any stage in a proof,
we still need to keep our wits about us if we are to avoid going off on
useless diversions. By contrast, tableau proofs (a.k.a. tree proofs) can
pretty much write themselves even for quite complex FOL arguments,
which is why I used to introduce formal proofs to students that way
(in teaching tableaux, we can largely separate the business of getting
across the idea of formality from the task of teaching heuristics of proof-
discovery). And because tableau proofs very often write themselves, they
are also good for automated theorem proving. Fitting explores both the
tableau and resolution approaches.
A little history (and some philosophy too)
Suppose for the sake of argument that P and Q; then we can derive
P – by the rule (i) which partly fixes the meaning of ‘and’.
And given that little suppositional inference, the rule (ii) which
partly gives the meaning of ‘if’ entitles us to drop the supposition
and conclude if P and Q, then P.
Or in a Gentzen-style proof
[P ∧ Q](1)
P (1)
((P ∧ Q) → P)
In short, the inference rules (i) and (ii) enable us to derive that logical truth ‘for
free’ (from no remaining assumptions): it’s a theorem of a formal system with
those rules.
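For those who like to see such things machine-checked, the little derivation transcribes directly into a proof assistant – here in Lean, purely as an illustrative aside:

```lean
-- The suppositional derivation above, replayed in Lean: assume P ∧ Q,
-- extract the left conjunct by ∧-elimination; the fun-abstraction is
-- →-introduction, discharging the supposition. No assumptions remain.
example (P Q : Prop) : P ∧ Q → P :=
  fun h => h.left
```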
If this is right, and if the point generalizes, then we don’t have to see such
logical truths as reflecting deep facts about the logical structure of the world
(whatever that could mean): logical truths fall out just as byproducts of the
inference rules whose applicability is, in some sense, built into the very meaning
of e.g. the connectives and the quantifiers.
It is a nice question how far we should buy that sort of de-mystifying story
about the nature of logical truth. But whatever your eventual judgement on
that, there surely is something odd about thinking with Frege and Russell that
a systematized logic is primarily aiming to regiment a special class of ultra-
general truths. Isn’t logic at bottom about good and bad reasoning practices,
about what makes for a good proof? Shouldn’t its prime concern be the correct
styles of valid inference? And hence, shouldn’t a formalized logic highlight rules
of valid proof-building (perhaps as in a natural deduction system) rather than
stressing logical truths (as logical axioms)?
(c) Back to the history of the technical development of logic. An obvious start-
ing place is with the clear and judicious
13. William Ewald, ‘The emergence of first-order logic’, The Stanford Ency-
clopaedia, tinyurl.com/emergenceFOL.
If you want rather more, the following is also readable and very helpful:
And for a longer, though rather bumpier, read – you’ll probably need to skim
and skip! – you could also try dipping into this more wide-ranging piece:
15. Paolo Mancosu, Richard Zach and Calixto Badesa, ‘The development
of mathematical logic from Russell to Tarski: 1900–1935’ in Leila Haa-
paranta, ed., The History of Modern Logic (OUP, 2009, pp. 318–471):
tinyurl.com/developlogic.
(a) I quickly mention a handful of books aimed at philosophers (but only one
will be of interest to us at this point).
(b) Next, I consider four deservedly classic books, now more than fifty years
old.
(c) Then I look at eight more recent mathematical logic texts (I again highlight
one in particular).
(d) Finally, for light relief, I look at some fun extras from an author whom we
have already met.
(a) The following five books are very varied in style, level and content, but are
all designed with philosophers particularly in mind.
(a1) Richard Jeffrey, Formal Logic: Its Scope and Limits (McGraw Hill 1967,
2nd edn. 1981).
(a2) Merrie Bergmann, James Moor and Jack Nelson, The Logic Book (McGraw
Hill 1980; 6th edn. 2013).
(a3) John L. Bell, David DeVidi and Graham Solomon, Logical Options: An
Introduction to Classical and Alternative Logics (Broadview Press 2001).
(a4) Theodore Sider, Logic for Philosophy* (OUP, 2010).
(a5) Jan von Plato, Elements of Logical Reasoning* (CUP, 2014).
Quick comments: Sider’s book (a4) falls into two halves, and the second half
is quite good on modal logic; but the first half of the book, the part which is
relevant to us now, is very poor. Only the first two chapters of Logical Options
(a3) are on FOL, and not at the level we really want. Von Plato’s Elements (a5)
Kleene’s (b3) – not to be confused with his earlier and hugely influential Intro-
duction to Metamathematics – goes much more gently than Mendelson: it takes
almost twice as long to cover propositional and predicate logic, so Kleene has
much more room for helpful discursive explanations. This was in its time a rightly
much admired text, and still makes excellent and illuminating supplementary
reading.
But if you do want an old-school introduction from the same era, you might
most enjoy the somewhat less renowned book by Hunter, (b4). This is not as
comprehensive as Mendelson: but it was an exceptionally good textbook from
a time when there were few to choose from. Read Parts One to Three at this
stage. And if you are finding it rewarding reading, then do eventually finish the
book: it goes on to consider formal arithmetic and proves the undecidability of
first-order logic, topics we consider in Chapter 6. Unfortunately, the typography
– from pre-LaTeX days – isn’t very pretty to look at. But in fact the treatment
of an axiomatic system of logic is extremely clear and accessible.
(c) We now turn to a number of more recent texts in mathematical logic that
have been suggested as candidates for this Guide. As you will see, the most
interesting of them – which almost made the cut to be included in §3.4’s list of
additional readings – is the idiosyncratic book by Kaye.
I have added the last two to the list in response to queries. But while the relevant
Chapters 2 and 4 of (c7) are quite attractively written, and have some interest,
there are also a number of presentation choices I’d quibble with. You can do
better. And (c8) just isn’t designed to be a conventional mathematical logic
text. It does have a fast-track introduction to FOL, but this is done far too fast
to be of much use to anyone. We can ignore it.
So going back to earlier texts, Ebbinghaus, Flum and Thomas’s (c1) is the
English translation of a book first published in German in 1978, and appears in a
series ‘Undergraduate Texts in Mathematics’, which indicates the intended level.
The book is often warmly praised and is (I believe) quite widely used in Germany.
There is a lot of material here, often covered well. But I can’t find myself wanting
to recommend it as a good place to start. The core material on the syntax
ematical idea of ‘a completeness theorem’ is, with some illustrations that have
real mathematical content.” So the reader is taken on a mathematical journey
starting with König’s Lemma (I’m not going to explain that here!), and progress-
ing via order relations, Zorn’s Lemma (an equivalent to the Axiom of Choice),
Boolean algebras, and propositional logic, to completeness and compactness of
first-order logic. Does this very unusual route work as an introduction? I am
not at all convinced. It seems to me that the journey is made too bumpy and
the road taken is far too uneven in level for this to be appealing as an early
trip through first-order logic. However, if you already know a fair amount of this
material from more conventional presentations, the different angle of approach
in this book linking topics together in new ways could well be very interesting
and illuminating.
(d) I have already strongly recommended Raymond Smullyan’s 1968 classic
First-Order Logic. Smullyan went on to write some absolutely classic texts on
Gödel’s theorem and on recursive functions, which we’ll be mentioning later.
But as well as these, he also wrote many ‘puzzle’-based books aimed at a wider
audience, including e.g. the rightly renowned What is the Name of This Book? *
(Dover Publications reprint of 1981 original, 2011) and The Gödelian Puzzle
Book * (Dover Publications, 2013).
Smullyan has also written Logical Labyrinths (A. K. Peters, 2009). From the
blurb: “This book features a unique approach to the teaching of mathematical
logic by putting it in the context of the puzzles and paradoxes of common lan-
guage and rational thought. It serves as a bridge from the author’s puzzle books
to his technical writing in the fascinating field of mathematical logic. Using the
logic of lying and truth-telling, the author introduces the readers to informal
reasoning preparing them for the formal study of symbolic logic, from propo-
sitional logic to first-order logic, . . . The book includes a journey through the
amazing labyrinths of infinity, which have stirred the imagination of mankind as
much, if not more, than any other subject.”
Smullyan starts, then, with puzzles of this kind: you are visiting an island
where there are Knights (truth-tellers) and Knaves (persistent liars), and then in
various scenarios you have to work out what’s true given what the inhabitants
say about each other and the world. And, without too many big leaps, he ends
with first-order logic (using tableaux), completeness, compactness and more. To
be sure, this is no substitute for standard texts: but – for those with a taste for
being led up to the serious stuff via sequences of puzzles – a very entertaining
and illuminating supplement.
(Smullyan’s later A Beginner’s Guide to Mathematical Logic*, Dover Publi-
cations, 2014, is rather more conventional. The first 170 pages are relevant to
FOL. A rather uneven read, it seems to me; but again an engaging supplement
to the main texts recommended above.)
4 Second-order logic, quite briefly
Classical first-order logic contrasts along one dimension with various non-classical
logics, and along another dimension with second-order and higher-order logics.
We can leave the exploration of non-classical logics to later chapters, starting
with Chapter 8. I will, however, say a little about second-order logic straight
away, in this chapter. Why?
Theories expressed in first-order languages with a first-order logic turn out to
have their limitations – that’s a theme that will recur when we look at model
theory (Chapter 5), theories of arithmetic (Chapter 6), and set theory (Chap-
ter 7). And you will occasionally find explicit contrasts being drawn with richer
theories expressed in second-order languages with a second-order logic. So, al-
though it’s a judgement call, I think it is worth getting to know just a bit about
second-order logic quite early on, in order to understand the contrasts being
drawn.
But first, . . .
If we want to make the generality here explicit, we could very naturally write
(2) ∀x∀y∀z x(y + z) = xy + xz,
with the first quantifier understood as running just over scalars, and with the
other two quantifiers running just over vectors. Or we could explicitly declare
which domain a quantified variable is running over by using a notation like
(∀a : S) to assign a to scalars: mathematicians often do this informally. (And in
some formal ‘type theories’, this kind of notation becomes the official policy: see
§12.7.)
It might seem strange, then, to insist that, if we want to formalize our theory
of vector spaces, we should follow FOL practice and use only one sort of variable
and therefore have to render the rule for scalar multiplication along the lines of
(3) ∀x∀y∀z((Sx ∧ Vy ∧ Vz) → x(y + z) = xy + xz),
i.e. ‘Take any three things in our [inclusive] domain, if the first is a scalar, the
second is a vector, and the third is a vector, then . . . ’.
(b) In sum, the theory of vector spaces is naturally regimented using a two-
sorted logic, with two sorts of variables running over two different domains. So,
generalizing, why not allow a many-sorted logic – allowing multiple independent
domains of objects, with different sorts of variables restricted to running over
the different domains?
In fact, it isn’t hard to set up such a revised version of FOL (it is first-order,
as the quantifiers are still of the now familiar basic type, running over objects
in the relevant domains). The syntax and semantics of a many-sorted language
can be defined quite easily. Syntactically, we will need to keep a tally of the
sorts assigned to the various names and variables. And we will also need rules
about which sorts of terms can go into which slots in predicates and in function-
expressions (for example, only terms for vectors should be used as inputs to
the vector-addition function). Semantically, we assign a domain for each sort
of variable, and then proceed pretty much as in the one-sorted case. Assuming
that each domain is non-empty (as in standard FOL) the inference rules for a
deductive system will then look entirely familiar. And the resulting logic will
have the same nice technical properties as standard one-sorted FOL; crucially,
you can prove soundness and completeness and compactness theorems in just
the same ways.
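The syntactic bookkeeping just mentioned is easily illustrated. The following sketch – the signature and names are mine, modelled on the vector-space example, not taken from any text – shows how a two-sorted language can track which sorts of terms fit which argument slots:

```python
# Toy illustration of two-sorted syntax-checking. Each function symbol
# declares the sorts of its argument slots and of its result, and terms
# are checked against those declarations.

SIG = {
    'plus':  (('vector', 'vector'), 'vector'),   # vector addition
    'smult': (('scalar', 'vector'), 'vector'),   # scalar multiplication
}

def sort_of(term, var_sorts):
    """Return the sort of a term, or raise TypeError if it is ill-sorted.
    A term is a variable name, or a tuple (function-symbol, arg, ...)."""
    if isinstance(term, str):
        return var_sorts[term]            # variables carry a declared sort
    f, *args = term
    expected, result = SIG[f]
    actual = tuple(sort_of(a, var_sorts) for a in args)
    if actual != expected:
        raise TypeError(f'{f} applied to sorts {actual}, expects {expected}')
    return result

# x(y + z), with x a scalar and y, z vectors, as in (3): well-sorted
vars_ = {'x': 'scalar', 'y': 'vector', 'z': 'vector'}
assert sort_of(('smult', 'x', ('plus', 'y', 'z')), vars_) == 'vector'
# whereas ('plus', 'y', 'x') - adding a scalar to a vector - raises TypeError
```

A real implementation would of course extend the same bookkeeping to predicates and quantified variables.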
(c) As so often in the formalization game, we are now faced with a cost/benefit
trade-off. We can get the benefit of somewhat more natural regimentations of
mathematical practice, at the cost of having to use a slightly more complex many-
sorted logic. Or we can pay the price of having to use less natural regimentations
– we need to translate propositions like (2) by using restricted quantifications
like (3) – but get the benefit of a slightly-simpler-in-practice logic.1
So you pays your money and you takes your choice. For many (most?) pur-
poses, logicians prefer the second option, sticking to standard single-sorted FOL.
That’s because, at the end of the day, we care rather less about elegance when
regimenting this or that theory than about having a simple-but-powerful logical
system.
1 Note though that we do also get some added flexibility on the second option. The use of
a sorted quantifier ∀aFa with the usual logic presupposes that there is at least one thing
in the relevant domain for the variable a. But a corresponding restricted quantification
∀x(Ax → Fx), where the variable x quantifies over some wider domain while A picks out the
relevant sort which a was supposed to run over, leaves open the possibility that there is
nothing of that sort.
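The footnote’s contrast can be made vivid with a toy computational sketch (the names here are illustrative, not from the text):

```python
# A restricted quantification 'for all x, if Ax then Fx' comes out
# vacuously true when nothing in the domain satisfies A - which is just
# the footnote's point about the flexibility of the restricted rendering.

domain = [1, 2, 3]          # the wider, inclusive domain
is_A = lambda x: False      # the sort A is empty: nothing here is an A
is_F = lambda x: x > 100    # F can be anything at all

# 'All As are F': true, since the collection of A-instances is empty
assert all(is_F(x) for x in domain if is_A(x))
```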
Second-order logic: a brief overview
(Ind 1) Take any numerical property X; if (i) zero has X and (ii) any number
which has X passes it on to its successor, then (iii) all numbers must
share property X.
This holds, of course, because every natural number is either zero or is an eventual
successor of zero (i.e. is either 0 or 0′ or 0′′ or 0′′′ or . . . , where the prime ‘′’
is a standard sign for the function that maps a number to its successor). There
are no stray numbers outside that sequence, so a property that percolates down
the sequence eventually applies to any number at all.
There is no problem about expressing some particular instances of the in-
duction principle in a first-order language. For example, suppose P is a formal
one-place predicate expressing some particular arithmetical property: then we
can express the induction principle for this property by writing
(Ind 2) ((P0 ∧ ∀x(Px → Px′)) → ∀xPx),
where the small-‘x’ quantifier runs over the natural numbers and again the prime
expresses the successor function. But how can we state the general principle
of induction in a formal language, the principle that applies to any numerical
property? The natural candidate is something like this:
Here the big-‘X’ quantifier is a new type of quantifier which, unlike the small-
‘x’ quantifier, quantifies ‘into predicate position’. In other words, it quantifies
into the position occupied in (Ind 2) by the predicate ‘P’, and the expressed
generalization is intended to run over all properties of numbers, so that (Ind 3)
indeed formally renders (Ind 1). But this kind of quantification – second-order
quantification – is not available in standard first-order languages of the kind that
you now know and love.
If we do want to stick with a theory framed in a first-order arithmetical lan-
guage L which just quantifies over numbers, the best we can do to render the
induction principle is to use a template or schema and say something like
(Ind 4) For any one-place predicate A( ) of L: ((A(0) ∧ ∀x(A(x) → A(x′))) → ∀xA(x)).
However, (Ind 4) is much weaker than the informal (Ind 1) or the equivalent
formal version (Ind 3) on its intended interpretation. For (Ind 1/3) tells us that
induction holds for any property at all; while, in effect, (Ind 4) only tells us that
induction holds for those properties that can be expressed by some L-predicate
A( ).
(i) First note that given a relational predicate R expressing the relation R, we
can of course define complex expressions, which we can abbreviate Rⁿ, to
express the corresponding relations Rⁿ. For example, we just put
R³ab =def ∃x₁∃x₂∃x₃(Rax₁ ∧ Rx₁x₂ ∧ Rx₂x₃ ∧ Rx₃b).
Now suppose we can construct an expression R∗ for the ancestral of the
relation expressed by R. Then consider the infinite set of wffs
{¬Rab, ¬R¹ab, ¬R²ab, ¬R³ab, . . . , ¬Rⁿab, . . . , R∗ab}
Then (X) every finite collection of these wffs has a model (let n be the
largest index appearing, and consider the case where a is the ancestor of b
more than n generations removed). But obviously (Y) the whole infinite set
of sentences doesn’t have a model (a can’t be an R-ancestor of b without
there being some n such that Rn ab).
(ii) Now, if we stay first-order, then we know the compactness theorem holds.
That means for first-order wffs we can’t have both (X) and (Y). Hence,
we can’t after all construct an expression R∗ from R and first-order logical
apparatus. In short, we can’t define the ancestral of a relation in first-order
logic.
(iii) On the other hand, a little reflection shows that a stands in the ancestral
of the R-relation to b just in case b inherits every property that is had by
any R-child of a and is then always preserved by the R relation (why?). And
that’s why Frege could define the ancestral using second-order apparatus
like this:
R∗ ab =def ∀X[(∀x(Rax → Xx) ∧ ∀x∀y(Xx ∧ Rxy → Xy)) → Xb]
And note that, since we can construct a second-order expression R∗ for the
ancestral of the relation expressed by R, then – because (X) and (Y) are
true together – compactness must fail for second-order logic.
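If you want to see Frege’s trick in modern dress, the definition transcribes directly into a proof assistant’s higher-order language – here Lean, purely as an illustration:

```lean
-- Frege's second-order definition of the ancestral, transcribed:
-- R*ab holds iff b has every property X that all R-children of a have
-- and that the relation R always passes on.
def ancestral {α : Type} (R : α → α → Prop) (a b : α) : Prop :=
  ∀ X : α → Prop, ((∀ x, R a x → X x) ∧ (∀ x y, X x ∧ R x y → X y)) → X b
```

The quantifier `∀ X : α → Prop` is exactly the big-‘X’ quantification into predicate position that no first-order language supplies.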
Again, there’s fine print; but you get the general idea.
We’ll now also want to expand the syntactic and semantic stories further
to allow second-order quantification over binary and other relations and over
functions too; but these expansions raise no extra issues.
We can then define the relation of semantic consequence for formulas in our
extended languages including second-order quantifiers in the now familiar way:
(d) So, in bald summary, the situation is this. There are quite a few famil-
iar mathematical claims like the arithmetical induction principle, and familiar
mathematical constructions like forming transitive closures which are naturally
What little you need for present purposes is covered in four clear pages by
1. Herbert Enderton, A Mathematical Introduction to Logic (Academic
Press 1972, 2002), §4.3.
There is, however, a bit more that can be fussed over here, and some might be
interested in looking at e.g. Hans Halvorson’s The Logic in Philosophy of Science
(CUP, 2019), §§5.1–5.3.
Turning now to second-order logic:
For a brief review, saying only a little more than my overview remarks, see
2. Richard Zach and others, Sets, Logic, Computation* (Open Logic) §11.3,
excerpted at tinyurl.com/openlogicSOL.
You could then look e.g. at the rest of Chapter 4 of the Enderton (1). Or,
rather more usefully at this stage, read
3. Stewart Shapiro, ‘Higher-order logic’, in S. Shapiro, ed., The Oxford
Handbook of the Philosophy of Mathematics and Logic (OUP, 2005).
You can skip §3.3; but §3.4 touches on Boolos’s ideas and is relevant
to the question of how far second-order logic presupposes set theory.
Shapiro’s §5, ‘Logical choice’, is an interesting discussion of what’s at
stake in adopting a second-order logic. (Don’t worry if some points will
only become really clear once you’ve done some model theory and some
formal arithmetic.)
To nail down some of the technical basics you can then very usefully sup-
plement the explanations in Shapiro with the admirably clear
4. Tim Button and Sean Walsh, Philosophy and Model Theory* (OUP,
2018), Chapter 1.
This chapter reviews, in a particularly helpful way, various ways of
developing the semantics of first-order logical languages; and then it
compares the first-order case with the second-order options, both ‘full’
semantics and ‘Henkin’ semantics.
For alternative very introductory reading you could look at the very clear
5. Theodore Sider, ‘Crash course on higher-order logic’, §§1–3, 5. Available
at tinyurl.com/siderHOL.
While if the initial readings leave you still wanting to fill out the technical story
about second-order logic a little further, you will then want to dive into the
self-recommending
Conceptual issues
So the idea is that we don’t need to invoke sets to interpret (3), just a non-
committal use of plurals. For more on this, just because he is so very readable,
let me highlight the thought-provoking
You can then follow up some of the critical discussions of Boolos mentioned by
Shapiro.
Note, however, that the usual semantics for second-order logic and Boolos’s proposed
alternative do share an assumption – in effect, neither treats properties
very seriously! Recall, we started off stating the informal induction principle
(Ind 1) in terms of a generalization over properties of numbers. But in interpret-
ing its second-order regimentation (Ind 3), we’ve only spoken of sets of numbers
(to serve as extensions of properties, the standard story) or spoken even more
economically, just about numbers, plural (Ind 3′, Boolos). Where have the properties
gone? Philosophers, at any rate, might want to resist reducing higher-order
entities (properties, properties of properties) to first-order entities (objects, or
sets of objects). Now, this is most certainly not the place to enter into those
debates. But for a nice survey with pointers to relevant discussions, see
5 Model theory
The high point of a first serious encounter with FOL is the proof of the complete-
ness theorem. Introductory texts then usually discuss at least a couple of quick
corollaries of the proof – the compactness theorem (which we’ve already met)
and the downward Löwenheim-Skolem theorem. And so we take initial steps into
what we can call Level 1 model theory. Further along the track we will encounter
Level 3 model theory (I am thinking of the sort of topics covered in e.g. the later
chapters of the now classic texts by Wilfrid Hodges and David Marker which
are recommended as advanced reading in §12.2). In between, there is a stretch
of what we can think of as Level 2 theory – still relatively elementary, relatively
accessible without too many hard scrambles, but going somewhat beyond the
very basics.
Putting it like this in terms of ‘levels’ is of course only for the purposes of
rough-and-ready organization: there are no sharp boundaries to be drawn. In a
first foray into mathematical logic, though, you should certainly get your head
around Level 1 model theory. Then tackle as much Level 2 theory as grabs your
interest.
But what topics can we assign to these first two levels?
(b) Now, you have already met a pair of fundamental results linking semantic
structures and sets of first-order sentences – the soundness and completeness
theorems. And these lead to a pair of fundamental model-theoretic results. The
first of these we’ve met before, at the end of §3.1:
(5) The compactness theorem (a.k.a. the finiteness theorem). If every finite
subset of a set of sentences Γ from a first-order language has a model, so
does Γ.
For our second result, revisit a standard completeness proof for FOL, which
shows that any syntactically consistent set of sentences from a first-order language
(a set of sentences from which you can’t derive a contradiction) has a model.
Look at the details of the proof: it gives an abstract recipe for building the
required model. And assuming that we are dealing with normal first-order lan-
guages (with a countable vocabulary), you’ll find that the recipe delivers a count-
able model – so in effect, our proof shows that a syntactically consistent set of
sentences has a model whose domain is just (some or all) the natural numbers.
From this observation we get
(6) The downward Löwenheim-Skolem theorem, in a basic version: if a set of sentences from a countable first-order language has a model, then it has a countable model.
(7) There is no first-order sentence ∃∞ which is true in all and only structures
with infinite domains.
That’s a nice mini-result about the limitations of first-order languages. But now
let’s note a second, much more dramatic, such result.
Suppose LA is a formal first-order language for the arithmetic of the natural numbers. The precise details don’t matter; but to fix ideas, suppose LA’s built-in non-logical vocabulary comprises the binary function expressions + and × (with their obvious interpretations), the unary function expression ′ (expressing the successor function), and the constant 0 (denoting zero). So note that LA then has a sequence of expressions 0, 0′, 0′′, 0′′′, . . . which can serve as numerals, denoting 0, 1, 2, 3, . . . .
Now let the theory Ttrue , i.e. true arithmetic, be the set of all true LA sen-
tences. Then we can show the following:
(8) As well as being true of its ‘intended model’ – i.e. the natural numbers
with their distinguished element zero and the successor, addition, and mul-
tiplication functions defined over them – Ttrue is also true of differently-
structured, non-isomorphic, models.
And this is really rather remarkable! Formal first-order theories are our stan-
dard way of regimenting informal mathematical theories: but now we find that
even Ttrue – the set of all first-order LA truths taken together – still fails to pin
down a unique structure for the natural numbers.
(d) And, turning now to the L-S theorem, we find that things only get worse.
Again let’s take a dramatic example.
Suppose we aim to capture the set-theoretic principles we use as mathemati-
cians, arriving at the gold-standard Zermelo-Fraenkel set theory with the Axiom
of Choice, which we regiment as the first-order theory ZFC. Then:
(9) ZFC, on its intended interpretation, makes lots of infinitary claims about
the existence of sets much bigger than the set of natural numbers. But the
downward Löwenheim-Skolem theorem tells us that, all the same, assum-
ing ZFC is consistent and has a model at all, it has an unintended countable
model (despite the fact that ZFC has a theorem which on the intended in-
terpretation says that there are uncountable sets). In other words, ZFC has
an interpretation in the natural numbers. Hence our standard first-order
formalized set theory certainly fails to uniquely pin down the wildly infini-
tary universe of sets – it doesn’t even manage to pin down an uncountable
universe.
What is emerging then, in these first steps into model theory, are some very
considerable and perhaps unexpected(?) expressive limitations of first-order for-
malized theories. These limitations can be thought of as one of the main themes
of Level 1 model theory.
(e) At Level 2, we can pursue this theme further, starting with the upward
Löwenheim-Skolem theorem which tells us that if a theory has an infinite model
it will also have models of all larger infinite sizes (as you see, then, you’ll need
some basic grip on the idea of the hierarchy of different cardinal sizes to make
full sense of this sort of result). Hence
(10) The upward and downward Löwenheim-Skolem theorems tell us that first-order theories which have infinite models won’t be categorical – i.e. their models won’t all look the same, because they can have domains of different infinite sizes. For example, try as we might, a first-order theory of arithmetic will always have non-standard models which ‘look too big’ to be the natural numbers with their usual structure, and a first-order theory of sets will always have non-standard models which ‘look too small’ to be the universe of sets as we intuitively conceive it.

Start with the theory T⁺true, which adds to the language of Ttrue a new constant c, together with the new axioms n ≠ c for each numeral n. Now observe that any finite collection of sentences ∆ ⊂ T⁺true has a model. Because ∆ is finite, there will be some largest number n such that the axiom n ≠ c is in ∆; so just interpret c as denoting n + 1 and give all the other vocabulary its intended interpretation, and every sentence in the finite set ∆ will by hypothesis be true on this interpretation. Since any finite ∆ ⊂ T⁺true has a model, T⁺true itself has a model, by compactness. That model, as well as having a zero and its successors, must also have in its domain a non-standard ‘number’ to be the denotation of the new name c (an element distinct from the denotations of 0, 1, 2, 3, . . .). And note, since the new model must still make true e.g. the old Ttrue sentence which says that everything in the domain has a successor, there will in addition be more non-standard numbers to be the successor of c, the successor of that, etc. Now take a structure which is a model for T⁺true, with its domain including non-standard numbers. Then in particular it makes true all the sentences of T⁺true which don’t feature the constant c. But these are just the sentences of the original Ttrue. So this structure will still make all of Ttrue true – even though its domain contains more than a zero and its successors, and so does not ‘look like’ the original intended model.
But if we can’t achieve full categoricity (all models looking the same),
perhaps we can get restricted categoricity results for some theories (telling
us that all models of a certain size look the same) – when is this possible?
An example you’ll find discussed: the theory of dense linear orders is
countably categorical (i.e. all its models of the size of the natural numbers
are isomorphic – a lovely result due to Cantor); but it isn’t categorical at
the next infinite size up. On the other hand, theories of first-order arith-
metic are not even countably categorical (even if we restrict ourselves to
models in the natural numbers, there can be models which give deviant
interpretations of successor, addition and multiplication).
How does that last claim square with the proof you often meet early in a maths
course that a theory usually called ‘Peano Arithmetic’ is categorical? The answer
is straightforward. As already indicated in (3) above, the version of Peano Arith-
metic which is categorical is a second-order theory – i.e. a theory which quantifies
not just over numbers but over numerical properties, and has a second-order in-
duction principle. Going second-order makes all the difference in arithmetic, and
in other theories too like the theory of the real numbers. But why? To understand
what is going on here, you need to understand something about the contrast be-
tween first-order theories and second-order ones. (So see our previous chapter,
and follow up the readings if you didn’t do so before.)
(f) Still at Level 2, there are results about which theories are complete in the
sense of entailing either A or ¬A for each relevant sentence A, and how this
relates to being categorical at a particular size. And there is another related
notion of so-called model-completeness: but let’s not pause over that.
Instead, let’s mention just one more fascinating topic that you will encounter
early in your model theory explorations:
(11) As explained in the last footnote, we can take a standard first-order theory
of the natural numbers and use a compactness argument to show that it
has a non-standard model which has an element c in the domain distinct
from (and indeed greater than) zero or any of its successors. Similarly,
we can take a standard first-order theory of the real numbers and use another compactness argument to show that it has a non-standard model with an element r in the domain such that 0 < |r| < 1/n for every positive natural number n. So in this model, the non-standard real r is non-zero but smaller in absolute value than any positive rational, and hence is infinitesimally small. And indeed our model
will have non-standard reals infinitesimally close to any standard real.
In this way, we can build up a model of non-standard analysis with
infinitesimals (where e.g. a differential really can be treated as a ratio of
infinitesimally small numbers – in just the sort of way that we all supposed
wasn’t respectable at all). Fascinating!
Let’s begin with a more expansive and very helpful overview (though you
may not understand everything at this preliminary stage). For a bit more
detail about the initial agenda of model theory, it is hard to beat
Now, a number of the introductions to FOL that I noted in §3.4 have treatments of the Level 1 basics; I’ll be recommending one in a moment, and will
return to some of the others in the next section on parallel reading. Going just
a little beyond, the very first volume in the prestigious and immensely useful
Oxford Logic Guides series is Jane Bridge’s short Beginning Model Theory: The
Completeness Theorem and Some Consequences (Clarendon Press, 1977). This
neatly takes us through some Level 1 and a few Level 2 topics. But the writing,
though very clear, is also rather terse in an old-school way; and the book – not
unusually for that publication date – looks like photo-reproduced typescript,
which is nowadays really off-putting to read. What, then, are the more recent
options?
would struggle with parts of this short book if it were treated as a sole
introduction to model theory. However, again if you have read Goldrei,
it should be very helpful as an alternative or complement to Manzano’s
book. For a little more about it, see tinyurl.com/kirbybooknote.
5. Dirk van Dalen Logic and Structure (Springer, 1980; 5th edition 2012),
Chapter 3.
This covers rather more model-theoretic material than Enderton and
in greater detail. You could read §3.1 for revision on the completeness
theorem, then tackle §3.2 on compactness, the Löwenheim-Skolem theo-
rems and their implications, before moving on to the action-packed §3.3
which covers more model theory including non-standard analysis again,
and indeed touches on some slightly more advanced topics.
For rather more detail, here is a recent book with an enticing title:
Thanks to the efforts of the respective authors to write very accessibly, the
suggested main path into the foothills of model theory (from Chiswell & Hodges
→ Leary & Kristiansen → Goldrei → Manzano/Kirby/Kossak) is not at all a
hard road to follow.
Now, we can climb up to the same foothills by routes involving rather tougher
scrambles, taking in some additional side-paths and new views along the way.
Here, then, is a suggestion for the more mathematical reader:
Last but certainly not least, philosophers (but not just philosophers) will cer-
tainly want to tackle (parts of) the following book, which strikes me as a very
impressive achievement:
9. Tim Button and Sean Walsh, Philosophy and Model Theory* (OUP,
2018).
This book both explains technical results in model theory, and also
explores the appeals to model theory in various branches of philosophy,
particularly philosophy of mathematics, but in metaphysics more gener-
ally, the philosophy of science, philosophical logic and more. So that’s a
very scattered literature that is being expounded, brought together, ex-
amined, inter-related, criticized and discussed. Button and Walsh don’t
pretend to be giving the last word on the many and varied topics they
discuss; but they are offering us a very generous helping of first words
and second thoughts. It’s a large book because it is to a significant extent
self-contained: model-theoretic notions get defined as needed, and many
of the more significant results are proved.
The philosophical discussion is done with vigour and a very engaging
style. And the expositions of the needed technical results are usually
exemplary (the authors have a good policy of shuffling some extended
proofs into chapter appendices). They also say more about second-order
logic and second-order theories than is usual.
But I do rather suspect that, despite their best efforts, an amount of
the material is more difficult than the authors fully realize: we soon get
to tangle with some Level 3 model theory, and quite a lot of other tech-
nical background is presupposed. The breadth and depth of knowledge
brought to the enterprise is remarkable: but it does make for a bumpy
ride even for those who already know quite a lot. Philosophical readers
of this Guide will probably find the book challenging, then, but should
find at least the earlier parts fascinating. And indeed, with judicious
skimming/skipping – the signposting in the book is excellent – mathe-
maticians with an interest in some foundational questions should find a
great deal of interest here too.
And that might already be about as far as many philosophers may want or need
to go in this area. Many mathematicians, however, will want to go further into
model theory; so we pick up the story again in §12.2.
10. Wilfrid Hodges, ‘A short history of model theory’, in Button and Walsh,
pp. 439–476.
A little history
Read the first six or so sections. Later sections refer to model theoretic topics a
level up from our current more elementary concerns, so won’t be very accessible
at this stage.
For another piece that focuses on topics from the beginning of model theory,
you could perhaps try R. L. Vaught’s ‘Model theory before 1945’ in L. Henkin et
al, eds, Proceedings of the Tarski Symposium (American Mathematical Society,
1974), pp. 153–172. You’ll probably have to skim parts, but it will also give you
some idea of the early developments. But here’s something which is much more
fun to read. Alfred Tarski was one of the key figures in that early history. And
there is a very enjoyable and well-written biography, which vividly portrays the
man, and gives a wonderful sense of his intellectual world, but also contains
accessible interludes on his logical work:
11. Anita Burdman Feferman and Solomon Feferman, Alfred Tarski, Life
and Logic (CUP, 2004).
1. As a preliminary step, we can narrow our focus and just consider the
decidability of arithmetical properties.
Why? Because we can always represent facts about finite whatnots like
formulas and proofs by using numerical codings. We can then trade in
questions about formulas or proofs for questions about their code numbers.
2. And as a second step, we can also trade in questions about the effective
decidability of arithmetical properties for questions about the algorithmic
computability of numerical functions.
Why? Because for any numerical property P we can define a corresponding numerical function (its so-called ‘characteristic function’) cP such that if n has the property P, cP(n) = 1, and if n doesn’t have the property P, cP(n) = 0. Think of ‘1’ as coding for truth, and ‘0’ for falsehood. Then
the question (i) ‘can we effectively decide whether a number has the prop-
erty P ?’ becomes the question (ii) ‘is the numerical function cP effectively
computable by an algorithm?’.
So, by those two steps, we do quickly move from e.g. the question whether it
is effectively decidable whether a string of symbols is a wff to a corresponding
question about whether a certain numerical function is computable.
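These two steps are easily made concrete. Here is a minimal Python sketch, where the property ‘is prime’ just stands in for a decidable property like ‘codes a wff’, and the function names are mine, purely for illustration:

```python
# Toy illustration of the two steps. The property "is prime" stands in for a
# decidable syntactic property like "codes a wff".

def is_prime(n: int) -> bool:
    """Decide the sample property P by brute-force trial division."""
    return n >= 2 and all(n % d for d in range(2, n))

def c_P(n: int) -> int:
    """The characteristic function of P: 1 codes truth, 0 codes falsehood."""
    return 1 if is_prime(n) else 0

# The decision question "does 7 have property P?" is traded in for the
# computation question "what is c_P(7)?":
print(c_P(7), c_P(8))  # 1 0
```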
x + 0 = x
x + Sy = S(x + y)
x × 0 = 0
x × Sy = (x × y) + x
x^0 = S0
x^Sy = (x^y × x)
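For illustration, these recursion equations can be transcribed almost verbatim into code. A Python sketch, taking the successor function as given (Python’s general recursion is used here just to implement the y-fold loop):

```python
# The displayed recursion equations, transcribed with the successor
# function S as the sole arithmetical primitive.

def S(x):
    return x + 1

def add(x, y):                                    # x + 0 = x;  x + Sy = S(x + y)
    return x if y == 0 else S(add(x, y - 1))

def mul(x, y):                                    # x × 0 = 0;  x × Sy = (x × y) + x
    return 0 if y == 0 else add(mul(x, y - 1), x)

def exp(x, y):                                    # x^0 = S0;   x^Sy = (x^y × x)
    return S(0) if y == 0 else mul(exp(x, y - 1), x)

print(add(2, 3), mul(2, 3), exp(2, 3))  # 5 6 8
```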
A neat abstract argument proves the point.1 But this raises an obvious question:
what further ways of defining functions – in addition to primitive recursion –
also give us effectively computable functions?
Here’s a pointer. The definition of (say) xy by primitive recursion in effect
tells us to start from x0 , then loop round applying the recursion equation to
compute x1 , then x2 , then x3 , . . . , keeping going until we reach xy . In all, we
have to loop around y times. In some standard computer languages, implement-
ing this procedure involves using a ‘for’ loop (which tells us to iterate some
procedure, counting as we go, and to do this for cycles numbered 1 to y). In
this case, the number of iterations is given in advance as we enter the loop.
But of course, standard computer languages also have programming structures
which implement unbounded searches – they allow open-ended ‘do until’ loops
(or equivalently, ‘do while’ loops). In other words, they allow some process to
be iterated until a given condition is satisfied, where no prior limit is put on the
number of iterations to be executed.
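The contrast can be made vivid in a few lines of Python (the function names here are illustrative, not standard):

```python
# A 'for' loop: the number of iterations (y) is fixed on entering the loop,
# as with a definition by primitive recursion.
def power(x, y):
    result = 1
    for _ in range(y):          # loop exactly y times
        result *= x
    return result

# A 'do until' loop: iterate with no prior bound on the number of cycles --
# an unbounded, open-ended search.
def least_n_such_that(condition):
    n = 0
    while not condition(n):     # no advance guarantee that this halts
        n += 1
    return n

# e.g. search for the least n with 2^n > 1000:
print(least_n_such_that(lambda n: power(2, n) > 1000))  # 10
```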
This suggests that one way of expanding the class of computable functions
beyond the primitive recursive functions will be to allow computations employing
open-ended searches. So let’s suppose we do this (there’s a standard device for this, but let’s not worry about the details now). Functions – more precisely, total functions that deliver an output for any numerical input – which can be computed by a chain of applications of primitive recursion and/or open-ended searches are called (simply) recursive.

1 Roughly, we can effectively list off the primitive recursive functions by listing their recipes; so we have an algorithm which gives us fn, the n-th such function. Then define the function d by putting d(n) = fn(n) + 1. Evidently, d differs from each fn at the argument n, so isn’t one of the primitive recursive functions. But it is computable.
(d) Predictably enough, the next question is: have we now got all the effectively
computable functions?
The claim that the recursive functions are indeed just the intuitively com-
putable total functions is Church’s Thesis, and is very widely believed to be
true (or at least, it is taken to be an entirely satisfactory working hypothesis).
Why? For a start, there are quasi-empirical reasons: no one has found a function
which is incontrovertibly computable by a finite-step deterministic algorithmic
procedure but which isn’t recursive. But there are also much more principled
reasons for accepting the Thesis.
Consider, for example, Alan Turing’s approach to the notion of effective com-
putation. He famously aimed to analyse the idea of a step-by-step computation
procedure down to its very basics, which led him to the concept of computation
by a Turing machine (a minimalist computer). And what we can call Turing’s
Thesis is the claim that the effectively computable (total) functions are just the
functions which are computable by some suitably programmed Turing machine.
So do we now have two rival claims, Church’s and Turing’s, about the class of
computable functions? Not at all! For it turns out to be quite easy to prove the
technical result that a function is recursive if and only if it is Turing computable.
And so it goes: every other attempt to give an exact characterization of the class
of effectively computable functions turns out to locate just the same class of
functions. That’s remarkable, and this is a key theme you will want to explore
in a first encounter with the theory of computable functions.
(e) It is fun to find out more about Turing machines, and even to learn to write
a few elementary programs (in effect, it is learning to write in a ‘machine code’).
And there is a beautiful early result that you will soon encounter:
How do we show that? Why does it matter? I leave it to you to read up on the
‘undecidability of the halting problem’, and its many weighty implications.
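That said, the shape of the diagonal argument behind that result can be sketched without spoiling the reading. In the Python sketch below, halts is a hypothetical total function which is supposed to decide halting; the moral is that no such function can exist:

```python
# An illustrative sketch of the diagonal argument behind the halting problem.
# `halts(f, x)` is a *hypothetical* total function supposed to return True
# exactly when f(x) would halt.

def troublemaker_factory(halts):
    def troublemaker(f):
        if halts(f, f):   # if the decider says f(f) halts...
            while True:   # ...then loop forever;
                pass
        return 0          # ...otherwise halt immediately.
    return troublemaker

# Feed troublemaker to itself: it halts exactly when `halts` says it doesn't.
# So whatever verdict halts(troublemaker, troublemaker) delivers is wrong,
# and no correct total halting-decider can exist.
```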
constant 0 to denote zero, has symbols for the successor, addition and multipli-
cation functions (to keep things looking nice, we still use a prefix S, and infix +
and ×), and its quantifiers run over the natural numbers. Note, we can form the
sequence of numerals 0, S0, SS0, SSS0, . . . (we will use n̄ to abbreviate the result of writing n occurrences of S before 0, so n̄ denotes n).
PA has the following three pairs of axioms governing the three built-in func-
tions:
∀x 0 ≠ Sx
∀x∀y(Sx = Sy → x = y)
∀x x + 0 = x
∀x∀y x + Sy = S(x + y)
∀x x × 0 = 0
∀x∀y x × Sy = (x × y) + x
The first pair of axioms specifies that distinct numbers have distinct successors,
and that the sequence of successors never circles round and ends up with zero
again: so the numerals, as we want, must denote a sequence of distinct numbers,
zero and all its eventual successors. The other two pairs of axioms formalize the
equations defining addition and multiplication which we have met before.
And then, crucially, there is also an arithmetical induction principle. As noted
in §4.2, in a first-order framework we can stipulate that
If A(x) is a formula with x free, then from A(0) and ∀x(A(x) → A(Sx))
we can infer ∀xA(x).
You need to get at least some elementary familiarity with the workings of the
resulting theory.
(b) But why concentrate on first-order PA? We’ve emphasized in §4.2 that our
informal induction principle is most naturally construed as involving a second-
order generalization – for any arithmetical property P, if zero has P , and if a
number which has P always passes it on to its successor, then every number has
P . And when Richard Dedekind (1888) and Giuseppe Peano (1889) gave their
axioms for what we can call Dedekind-Peano arithmetic, they correspondingly
gave a second-order formulation for their versions of the induction principle.
Put it this way: Dedekind and Peano’s principle quantifies over all properties of
numbers, while in first-order PA our induction principle rather strikingly only
deals with those properties of numbers which can be expressed by open formulas
of its restricted language. Why go for the weaker first-order principle?
Well, we have already addressed this in Chapter 4: first-order logic is much
better behaved than second-order logic. And some would say that second-order
logic is really just a bit of set theory in disguise. So, the argument goes, if we
want a theory of pure arithmetic, one whose logic can be formalized, we should
stick to a first-order formulation just quantifying over numbers. Then something
like PA’s induction rule (or the suite of axioms of the form we described) is the
best we can do.
But still, even if we have decided to stick to a first-order theory, why re-
strict ourselves to the impoverished resources of PA, with only three function-
expressions built into its language? Why not have an expression for e.g. the exponential function as well, and add to the theory the two defining axioms
for that function? Indeed, why not add expressions for other recursive functions
too, and then also include appropriate axioms for them in our formal theory?
Good question. The answer is to be found in a neat technical observation first
made by Gödel. Once we have successor, addition and multiplication available,
plus the usual first-order logical apparatus, we can in fact already express any
other computable (i.e. recursive) function. To take the simplest sort of case, sup-
pose f is a one-place recursive function: then there will be a two-place expression
of PA’s language which we can abbreviate F(x, y) such that F(m̄, n̄) is true if and only if f (m) = n. Moreover, when f (m) = n, PA can prove F(m̄, n̄), and when f (m) ≠ n, PA can prove ¬F(m̄, n̄). In this way, PA as it were already ‘knows’
about all the recursive functions and can compute their values. Similarly, PA
can already express any algorithmically decidable relation.
So PA is expressively a lot richer than you might initially suppose. And indeed,
it turns out that even an induction-free subsystem of PA known as Robinson
Arithmetic (often called simply Q) can express the recursive functions.
And this key fact puts you in a position to link up your investigations of PA
with what you know about computability. For example, we quickly get a fairly
straightforward proof that there is no mechanical procedure that a computer
could implement which can decide whether a given arithmetic sentence is a
theorem of PA (or even a theorem of Q).
(c) On the other hand, despite its richness, PA is a first-order theory with
infinite models, so – applying results from elementary model theory (see the
previous chapter) – this first-order arithmetic will have non-standard models,
i.e. will have models whose domains contain more than a zero and its successors.
It is worth knowing at an early stage just something about what some of these
non-standard models can look like. And you will also want to further investigate
the contrast with second-order versions of arithmetic which are categorical (i.e.
don’t have non-standard models).
Towards Gödelian incompleteness
(i) Gödel’s great 1931 result is that PA is negation-incomplete: we can form a sentence G in its language such that PA proves neither G nor ¬G. How does he do the trick?
(ii) It’s fun to give an outline sketch, which I hope will intrigue you enough to
leave you wanting to find out more! So:
G1. Gödel introduces a Gödel-numbering scheme for a formal theory like PA,
which is a simple way of coding expressions of PA – and also sequences
of expressions of PA – using natural numbers. The code number for an
expression (or a sequence of expressions) is its unique Gödel number.
G2. We can then define relations like Prf , where Prf (m, n) holds if and only if
m is the Gödel number of a PA-proof of the sentence with code number n.
So Prf is a numerical relation which, so to speak, ‘arithmetizes’ the syn-
tactic relation between a sequence of expressions (proof) and a particular
sentence (conclusion).
G3. There’s a procedure for computing, given numbers m and n, whether
Prf (m, n) holds. Informally, we just decode m (that’s an algorithmic pro-
cedure). Now check whether the resulting sequence of expressions – if there
is one – is a well-constructed PA-proof according to the rules of the game
(proof-checking is another algorithmic procedure). If that sequence is a
proof, check whether it ends with a sentence with the code number n
(that’s another algorithmic procedure).
G4. Since PA can express any algorithmically decidable relation, there will in
particular be a formal expression in the language of PA which we can
abbreviate Prf(x, y) which expresses the effectively decidable relation Prf .
This means that Prf(m̄, n̄) is true if and only if m codes for a PA proof of the sentence with Gödel number n.
G5. Now define Prov(y) to be the expression ∃xPrf(x, y). Then Prov(n̄), i.e. ∃xPrf(x, n̄), is true if and only if some number Gödel-numbers a PA-proof
of the wff with Gödel-number n, i.e. is true just if the wff with code num-
ber n is a theorem of PA. Therefore Prov is naturally called a provability
predicate.
G6. Next, with only a little bit of cunning, we construct a Gödel sentence G
in the language of PA with the following property: G is true if and only if
¬Prov(ḡ) is true, where ḡ is the numeral for g, the code number of G.
Don’t worry for the moment about how we do this construction (it in-
volves a so-called ‘diagonalization’ trick which is surprisingly easy). Just
note that G is true on interpretation if and only if the sentence with Gödel
number g is not a PA-theorem, i.e. if and only if G is not a PA-theorem.
In short, G is true if and only if it isn’t a PA-theorem. So, rather stretch-
ing a point, it is rather as if G ‘says’ I am unprovable in PA.
G7. Now, suppose G were provable in PA. Then, since G is true if and only if it
isn’t a PA-theorem, G would be false. So PA would have a false theorem.
Hence, assuming PA is sound and only has true theorems, it can’t
prove G. Hence, since it is not provable, G is indeed true. Which means
that ¬G is false. Hence, still assuming PA is sound, it can’t prove ¬G either.
So, in sum, assuming PA is sound, it can’t prove either of G or ¬G. As
announced, PA is negation incomplete.
Wonderful!
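If you want to see the flavour of step G1 in miniature, here is a toy Gödel numbering in Python. The coding scheme (prime powers of symbol codes) is my own illustrative choice, not Gödel’s official one; the point is that both coding and decoding are algorithmic, as steps G1 and G3 require:

```python
# A toy Gödel numbering in the spirit of G1-G3 (an illustrative scheme, not
# Gödel's official one): code the string s1 s2 ... sk as
# 2^c(s1) * 3^c(s2) * 5^c(s3) * ..., where c assigns each symbol a code >= 1.

SYMBOLS = ['0', 'S', '+', '*', '=', '(', ')', 'x']
CODE = {s: i + 1 for i, s in enumerate(SYMBOLS)}

def primes():
    """Generate 2, 3, 5, 7, ... by trial division."""
    n, found = 2, []
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1

def godel_number(expr):
    g = 1
    for p, sym in zip(primes(), expr):
        g *= p ** CODE[sym]
    return g

def decode(g):
    """Decoding is algorithmic: read off the exponents by trial division."""
    expr = []
    for p in primes():
        if g == 1:
            return ''.join(expr)
        e = 0
        while g % p == 0:
            g, e = g // p, e + 1
        expr.append(SYMBOLS[e - 1])

print(decode(godel_number('S0=S0')))  # round-trips to 'S0=S0'
```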
(iii) Now the argument generalizes to other nicely axiomatized sound theories
T which can express enough arithmetical truths. We can use the same sort of
cunning construction to find a true GT such that T can prove neither GT nor
¬GT . Let’s be really clear: this doesn’t, repeat doesn’t, say that GT is ‘absolutely
unprovable’, whatever that could mean. It just says that GT and its negation
are unprovable-in-T.
Ok, you might well ask, why don’t we simply ‘repair the gap’ in T by adding
the true sentence GT as a new axiom? Well, consider the theory U = T + GT (to
use an obvious notation). Then (i) U is still sound, since the old T -axioms are
true and the added new axiom is true. (ii) U is still a nicely axiomatized formal
theory given that T is. (iii) U can still express enough arithmetic. So we can find
a sentence GU such that U can prove neither GU nor ¬GU .
And so it goes. Keep throwing more and more additional true axioms at T and
our theory will remain negation-incomplete (unless it stops counting as nicely
axiomatized). So here’s the key take-away message: any sound nicely axiomatized
theory T which can express enough arithmetic will not just be incomplete but
in a good sense T will be incompletable.
(iv) Now, we haven’t quite arrived at what’s usually called the First Incom-
pleteness Theorem. For that, we need an extra step Gödel took, which enables us to trade the semantic assumption that we are dealing with a sound theory T for a weaker consistency requirement. But I’ll leave you to explore the (not very
difficult) details, and also to find out about the Second Theorem.
It really is time to start reading!
But now turning to textbooks, how to approach the area? Gödel’s 1931 proof
of his incompleteness theorem actually uses only facts about the primitive recur-
sive functions. As we noted, these functions are only a subclass of the effectively
computable numerical functions. A more general treatment of computable func-
tions was developed a few years later (by Gödel, Turing and others), and this in
turn throws more light on the incompleteness phenomenon. So there’s a choice
to be made. Do you look at things in roughly the historical order, first introduc-
ing just the primitive recursive functions, explaining how they get represented
in theories of formal arithmetic, and then learning how to prove initial versions
of Gödel’s incompleteness theorem – and only then move on to deal with the
general theory of computable functions? Or do you explore the general theory
of computation first, only turning to the incompleteness theorems later?
My own Gödel books take the first route. But I also recommend alternatives
taking the second route. First, then, there is
3. Peter Smith, Gödel Without (Too Many) Tears* (Logic Matters, 2020):
freely downloadable from logicmatters.net/igt.
This is a very short book – just 130 pages – which, after some general
introductory chapters, and a little about formal arithmetic, explains the
idea of primitive recursive functions, explains the arithmetization of syn-
tax, and then proves Gödel’s First Theorem pretty much as Gödel did,
with a minimum of fuss. There follow a few chapters on closely related
matters and on the Second Theorem.
GWT is, I hope, very clear and accessible, and it perhaps gives all you need
for a first foray into this area if you don’t want (yet) to tangle with the general
theory of computation. However, you might well prefer to jump straight into one
of the following:
One comment: none of these books – including my longer one – gives a full proof
of Gödel’s Second Incompleteness Theorem. The guiding idea is easy enough,
but there is tedious work to be done in implementing it. If you really want more
details, see e.g. the book by Boolos or by Rautenberg mentioned in §10.4.
Some parallel/additional reading
This book, then, makes an excellent alternative to Epstein & Carnielli in partic-
ular: it is, however, a little more abstract and sophisticated, which is why I have
on balance recommended E&C for many readers. The more mathematical might
well prefer Enderton. (By the way, staying with Enderton, I should mention that
Chapter 3 of his earlier A Mathematical Introduction to Logic (Academic Press
1972, 2002) gives a good brisk treatment of different strengths of formal theo-
ries of arithmetic, and then proves the incompleteness theorem first for a formal
arithmetic with exponentiation and then – after touching on other issues – shows
how to use the β-function trick to extend the theorem to apply to arithmetic
without exponentiation. Not the best place to start, but this chapter too could
be very useful revision material.)
Thirdly, I have already warmly recommended the following book for its cov-
erage of first-order logic:
Next we come to a stand-out book that you should certainly tackle at some
point (and though this starts from scratch, I rather suspect that many readers
will appreciate it more if they come to it after reading one or more of the main
recommendations in the previous section):
To introduce the third book, the first thing to say is that it presupposes very
little knowledge about sets, despite the title. If you are familiar with the idea
that the natural numbers can be identified with (implemented as) finite sets in a
standard way, and with a few other low-level ideas, then you can dive in without
further ado to
And, finally, if only because I’ve been asked about it such a large number of
times, I suppose I should end by also mentioning the (in)famous
Which is a great deal more than can be said about many popularizing treatments
of Gödel’s theorems!
15. Richard Epstein’s brisk and very helpful 28-page ‘Computability and
undecidability – a timeline’, which is printed at the very end of Epstein
& Carnielli, listed in §6.5.
This will really give you the headline news you initially need. It is then well
worth reading
16. Robin Gandy, ‘The confluence of ideas in 1936’ in R. Herken, ed., The
Universal Turing Machine: A Half-century Survey (OUP 1988). This
seeks to explain why so many of the absolutely key notions all got formed
in the mid-thirties.
17. John Dawson, Logical Dilemmas: The Life and Work of Kurt Gödel
(A. K. Peters, 1997).
eventually get arbitrarily close – i.e. when we take any ε > 0 however small,
then for some k, |sn − s′n| < ε for all n > k. Cauchy identifies real numbers
with equivalence classes of Cauchy sequences. So, for Cauchy, √2 would be the
equivalence class containing any sequence of rationals like 1.4, 1.41, 1.414, 1.4142,
1.41421, . . . , i.e. rationals whose squares approach 2.
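Such a sequence is easy to generate mechanically. Here is a minimal Python sketch (not from the text), using exact rationals and Heron's iteration – just one convenient way of producing rationals whose squares approach 2:

```python
from fractions import Fraction

def sqrt2_approximations(n):
    """Return n exact rationals whose squares approach 2,
    generated by Heron's iteration x -> (x + 2/x) / 2 from x = 1."""
    x = Fraction(1)
    terms = []
    for _ in range(n):
        x = (x + 2 / x) / 2
        terms.append(x)
    return terms

for t in sqrt2_approximations(4):
    print(t, float(t * t))  # the squares close in on 2
```

Any two sufficiently late terms differ by less than any given ε, which is just the Cauchy condition above; the sequence 1.4, 1.41, 1.414, . . . would serve equally well.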
Alternatively, dropping the picture of sequential approach, we can identify a
real number with a Dedekind cut, defined as a (proper, non-empty) subset C of
the rationals which (i) is downward closed – i.e. if q ∈ C and q′ < q then q′ ∈ C –
and (ii) has no largest member. For example, take the negative rationals together
with the positive ones whose square is less than two: these form a cut. Dedekind
(more or less) identifies the positive irrational √2 with the cut we just defined.
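A cut like this is fully determined by a membership test on the rationals. The following Python fragment is an illustrative sketch, not from the text: it writes down that test, plus a hypothetical helper making clause (ii) vivid – from any member of the cut we can always find a strictly bigger member.

```python
from fractions import Fraction

def in_sqrt2_cut(q):
    """True iff the rational q belongs to the cut for √2:
    q is negative, or q's square is less than two."""
    q = Fraction(q)
    return q < 0 or q * q < 2

def bigger_member(q):
    """Given a member q of the cut, return a strictly bigger member,
    by bisecting between q and the non-member 2 until we land in the cut."""
    q, bound = Fraction(q), Fraction(2)
    while True:
        mid = (q + bound) / 2
        if in_sqrt2_cut(mid):
            return mid
        bound = mid

assert in_sqrt2_cut(Fraction(7, 5)) and not in_sqrt2_cut(Fraction(3, 2))
assert bigger_member(1) > 1
```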
Assuming some set theory, we can now show that – whether defined as cuts on
the rationals or defined as equivalence classes of Cauchy sequences of rationals
– these real numbers do indeed have the properties assumed in our informal
working theory of real analysis. And given that our set theory is consistent, the
resulting theory of the reals can be shown to be consistent too. Excellent!
We can now go on to define functions between real numbers in terms of sets of
ordered tuples of reals, so we can develop a theory of analysis. I won’t spell this
out further here. However, you will want to get to know something of how the
overall story goes, and also get some sense of what assumptions about sets are
needed for the story to work to give us a basis for reconstructing classical real
analysis.
(You will need a number of levels of sets: sets of rationals, and sets of sets of
rationals, and sets of sets of sets, and up a few more levels depending on the
details.)
(c) Now, as far as construction of the reals and the foundations of analysis
are concerned, we could take the requisite set theory – the apparatus of infi-
nite sets, infinite sequences, equivalence classes and the rest – as describing a
superstructure sitting on top of a given prior basic universe of rational numbers
governed by a prior suite of numerical laws. However, we don’t need to do this.
For we can in fact already construct the rationals and simpler number systems
within set theory itself.
For the naturals, pick any set you like and call it ‘0’. And then consider e.g.
the sequence of sets 0; {0}; {{0}}; {{{0}}}; . . .. Or alternatively, consider the se-
quence 0; {0}; {0, {0}}; {0, {0}, {0, {0}}}; {0, {0}, {0, {0}}, {0, {0}, {0, {0}}}}; . . .
where at each step after the first we extend the sequence by taking the set of all
the sets we have so far. Either sequence then has the structure of the natural-
number series. There is a first member; every member has a unique successor
(which is distinct from it); different members have different successors; the se-
quence never circles around and starts repeating. So such a sequence of sets will
do as a representation, implementation, or model of the natural numbers (call
it what you will).
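Both constructions can be mimicked directly with Python frozensets, implementing ‘0’ as the empty set (one arbitrary choice among many, as the text notes). A minimal sketch, not from the text:

```python
def zermelo(n):
    """Zermelo-style numerals: 0 = ∅, then each successor is {previous}."""
    s = frozenset()
    for _ in range(n):
        s = frozenset([s])
    return s

def von_neumann(n):
    """Von Neumann-style numerals: 0 = ∅, then each successor is the set
    of all the sets built so far, i.e. n + 1 = n ∪ {n}."""
    s = frozenset()
    for _ in range(n):
        s = s | frozenset([s])
    return s

# Different numbers get different sets, so the sequences never repeat:
assert len({zermelo(n) for n in range(6)}) == 6
assert len({von_neumann(n) for n in range(6)}) == 6
```

One structural difference worth noticing: each Zermelo numeral after 0 is a singleton, while the nth von Neumann numeral has exactly n members.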
Let’s not get hung up about the best way to describe the situation; we will
simply say we have constructed a natural number sequence. And elementary
reasoning about sets will show that the familiar arithmetic laws about natural
Elements of set theory: an overview
And now a famous question arises – easy to ask, but (it turns out) extraordi-
narily difficult to answer. Take an infinite collection of real numbers. It could
be equinumerous with the set of natural numbers (like, for example, the set of
real numbers 0, 1, 2, . . . ). It could be equinumerous with the set of all the real
numbers (like, for example, the set of irrational numbers). But are there any
infinite sets of reals of intermediate size (so to speak)? – can there be an infinite
subset of real numbers that can’t be put into one-to-one correspondence with
just the natural numbers and can’t be put into one-to-one correspondence with
all the real numbers either? Cantor conjectured that the answer is ‘no’; and this
negative answer is known as the Continuum Hypothesis.
Efforts to confirm or refute the Continuum Hypothesis were a major driver in
early developments of set theory. We now know the problem is a profound one
– the standard axioms of set theory don’t settle the hypothesis one way or the
other. Is there some attractive and natural additional axiom which will settle
the matter? I’ll not give a spoiler here! – but exploration of this question takes
us way beyond the initial basics of set theory.
(e) The argument that the power set of the naturals isn’t equinumerous with
the set of naturals can be generalized. Cantor’s Theorem tells us that a set is
never equinumerous with its powerset.
Note, there is a bijection between the set A and the set of singletons of
elements of A; in other words, there is a bijection between A and part of its
powerset P(A). But we’ve just seen that there is no bijection between A and the
whole of P(A). Intuitively then, A is smaller in size than P(A), which will in
turn be smaller than P(P(A)), etc. We now want to develop this intuitive idea
of one set’s having a smaller cardinal size than another into a competent general
theory about relative cardinal size.
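The proof behind Cantor’s Theorem is short and constructive: given any map f from A into P(A), the ‘diagonal’ set D = {a ∈ A : a ∉ f(a)} cannot be in f’s range, since anything f mapped to D would have to be both in and not in D. For finite sets we can watch this happen in code – a sketch, with an arbitrarily chosen sample map:

```python
from itertools import combinations

def powerset(A):
    """All subsets of A, as frozensets."""
    elems = list(A)
    return [frozenset(c) for r in range(len(elems) + 1)
            for c in combinations(elems, r)]

def diagonal(A, f):
    """Cantor's diagonal set D = {a in A : a not in f(a)},
    which no map f : A -> P(A) can hit."""
    return frozenset(a for a in A if a not in f(a))

A = frozenset({1, 2, 3})
f = {1: frozenset(), 2: frozenset({1, 2}), 3: frozenset({2})}  # a sample map
D = diagonal(A, lambda a: f[a])
assert D == frozenset({1, 3}) and D not in f.values()
```

Exhaustively checking every one of the 8³ maps from a three-element set into its powerset confirms that the diagonal set is always missed.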
(f) Let’s pause to consider the emerging picture.
Starting perhaps from some given urelements – elements which don’t them-
selves have members – we can form sets of them, and then sets of sets, sets of
sets of sets, and so on and on: and at each new level, we accumulate more and
more sets formed from the urelements and/or the sets formed at earlier levels.
At each level, more and more sets are formed. In particular, once we have an
infinite number of entities at one level, we get an even greater infinity of entities
at the next as we form powersets, and so on up.
Now, for purely mathematical purposes such as reconstructing analysis, it
seems that we only need a single non-membered base-level entity, and it is
tidy to think of this as the empty set. So for internal mathematical purposes,
we can take the whole universe of sets to contain only ‘pure’ sets (when we look
at the members of members of . . . members of sets, we find nothing other than
more sets). But what if we want to be able to apply set-theoretic apparatus in
talking about e.g. widgets or wombats or (more seriously!) space-time points?
Then it might seem that we will want the base level of non-membered elements
to be populated with those widgets, wombats or space-time points as the case
might be. However, it seems that we can always code for widgets, wombats or
space-time points using some kind of numbers, and we can treat those numbers
as sets. So our set-theory-for-applications can still involve only pure sets. That’s
why typical introductions to set theory either explicitly restrict themselves to
talking about pure sets, or – after officially allowing the possibility of urelements
– promptly ignore them.
(g) Lots of questions arise. Here are two:
1. First, how far can we iterate the ‘set of’ operation – how high do these levels
upon levels of sets-of-sets-of-sets-of-. . . stack up? Once we have the natural
numbers in play, we only need another dozen or so more levels of sets in
which to reconstruct ‘ordinary’ mathematics: but once we are embarked on
set theory for its own sake, how far can we go up the hierarchy of levels?
2. Second, at a particular level, how many sets do we get at that level? And
indeed, how do we ‘count’ the members of infinite sets?
With finite sets, we not only talk about their relative sizes (larger or
smaller), but actually count them and give their absolute sizes by using
finite cardinal numbers. These finite cardinals are the natural numbers,
which we have learnt can be identified with particular sets. We now want
similarly to have a story about the infinite case; we not only want an
account of relative infinite sizes but also a theory about infinite cardinal
numbers apt for giving the size of infinite collections. Again these infinite
cardinals will be identified with particular sets. But how can this story go?
It turns out that to answer both these questions, we need a new notion, the idea
of infinite ordinal numbers. We can’t say very much about this here, but some
more arm-waving pointers might still be useful.
(h) Let’s start rather naively. Here are the familiar natural numbers, but re-
sequenced with the evens in their usual order before the odds in their usual
order:
0, 2, 4, 6, . . . , 1, 3, 5, 7, . . . .
If we use ‘<’ to symbolize the order-relation here, then m < n just in case either
(i) m is even and n is odd or else (ii) m and n have the same parity and m is
less than n in the natural order. Note that < is a well-ordering in the standard
sense that it is a linear order and, whichever numbers we take, one of them will
be the <-least.
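This reordering is mechanically decidable, as a two-line comparison function makes vivid. A Python sketch (illustrative only):

```python
from functools import cmp_to_key

def before(m, n):
    """True iff m comes before n in the evens-first ordering:
    m precedes n if m is even and n is odd, or if they share
    parity and m is less than n in the natural order."""
    if m % 2 != n % 2:
        return m % 2 == 0
    return m < n

# Sorting an initial segment by this ordering puts the evens first:
resequenced = sorted(range(10),
                     key=cmp_to_key(lambda a, b: -1 if before(a, b) else 1))
print(resequenced)  # [0, 2, 4, 6, 8, 1, 3, 5, 7, 9]
```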
Now, if we march through the naturals in their new <-ordering, checking
off the first one, the second one, the third one, etc., where does the number 7
come in the order? Plainly, we cannot reach it in any finite number of steps:
it comes, in a word, transfinitely far along the <-sequence. So if we want a
position-counting number (officially, an ordinal number) to tally how far along
our well-ordered sequence the number 7 is located, we will need a transfinite
ordinal. We will have to say something like this: We need to march through
all the even numbers, which here occupy positions arranged exactly like all the
natural numbers in their natural order. And then we have to go on another 4
steps. Let’s use ‘ω’ to indicate the length of the sequence of natural numbers
7 Set theory, less naively
in their natural order, and we’ll call a sequence structured like the naturals in
their natural order an ω-sequence. The evens in their natural order can be lined
up one-to-one with the naturals in order, so form another ω-sequence. Hence,
to indicate how far along the re-sequenced numbers we find the number 7, it is
then tempting to say that it occurs at ω + 4-th place.
And what about the whole sequence, evens followed by odds? How long is it?
How might we count off the steps along it, starting ‘first, second, third, . . . ’ ?
After marching along as many steps as there are natural numbers in order to
trek through the evens, then – pausing only to draw breath – we have to march
on through the odds, again going through positions arranged like all the natural
numbers in their natural ordering. So, we have two ω-sequences, put end to end.
It is very natural to say that the positions in the whole sequence are tallied by
a transfinite ordinal we can denote ω + ω.
Here’s another example. There are familiar maps for coding ordered pairs of
natural numbers by a single natural: take, for example, the function which maps
m, n to [m, n] = 2^m(2n + 1) − 1. And consider the following ordering on these
‘pair-numbers’ [m, n]:
[0, 0], [0, 1], [0, 2], . . . , [1, 0], [1, 1], [1, 2], . . . , [2, 0], [2, 1], [2, 2], . . . , . . .
If we now use ‘≺’ to indicate this order, then [m, n] ≺ [m′, n′] just in case either
(i) m < m′ or else (ii) m = m′ and n < n′. (This type of ordering is standardly
called lexicographic: in the present case, compare the dictionary ordering of two-
letter words drawn from an infinite alphabet.) Again, ≺ is a well-ordering on the
natural numbers.
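Both the coding and the ≺-ordering are straightforwardly computable – a computer can decide, for any two code numbers, which comes first. A Python sketch (function names illustrative):

```python
def pair(m, n):
    """Code the pair (m, n) as a single natural: [m, n] = 2^m * (2n + 1) - 1."""
    return 2**m * (2 * n + 1) - 1

def unpair(k):
    """Decode: k + 1 factors uniquely as 2^m * (2n + 1) with 2n + 1 odd."""
    k += 1
    m = 0
    while k % 2 == 0:
        k //= 2
        m += 1
    return m, (k - 1) // 2

def precedes(j, k):
    """The ≺-ordering on code numbers: [m, n] ≺ [m', n']
    iff m < m', or m = m' and n < n'."""
    return unpair(j) < unpair(k)  # Python tuple comparison is lexicographic

assert unpair(pair(5, 3)) == (5, 3)
assert precedes(pair(0, 100), pair(1, 0))  # every [0, n] precedes [1, 0]
```

Notice that pair(0, 100) = 200 ≺-precedes pair(1, 0) = 1, even though 200 > 1 in the usual order: the two orderings on the very same numbers come apart completely.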
Where does [5, 3] come in this sequence? Before we get to this ‘pair’ there
are already five blocks of the form [m, 0], [m, 1], [m, 2], . . . for fixed m, each as
long as the naturals in their usual order, first the block with m = 0, then the
block with m = 1, and three more blocks, each ω long; so the five blocks are in
total ω · 5 long. And then we have to count another four steps along, tallying
off [5, 0], [5, 1], [5, 2], [5, 3]. So it is inviting to say we have to count along to the
ω · 5 + 4-th step in the sequence to get to the ‘pair’ [5, 3].
And what about the whole sequence of ‘pairs’ ? We have blocks ω long, with
the blocks themselves arranged in a sequence ω long. So this time it is tempting to
say that the positions in the whole sequence of ‘pairs’ are tallied by a transfinite
ordinal we can indicate by ω · ω.
We can continue. Suppose we re-arrange the natural numbers into a new
well-ordering like this: take all the numbers of the form 2^l · 3^m · 5^n, ordered by
ordering the triples ⟨l, m, n⟩ lexicographically, followed by the remaining naturals
in their normal order. We tally positions in this sequence by the transfinite
ordinal ω · ω · ω + ω. And so it goes.
Note by the way that we have so far been considering just (re)orderings of
the familiar set of natural numbers – the sequences are equinumerous, and have
the same infinite cardinal size; but the well-orders are tallied by different infinite
ordinal numbers. Or so we want to say.
But is this sort of naive talk of transfinite ordinals really legitimate? Well, it
was one of Cantor’s great and lasting achievements to show that we can indeed
start to make perfectly good sense of all this.
Now, in Cantor’s work the theory of transfinite ordinals is already entangled
with his nascent set theory. Von Neumann later cemented the marriage by giving
the canonical treatment of ordinals in set theory. And it is via this treatment that
students now typically first encounter the arithmetic of transfinite ordinals, some
way into a full-blown course about set theory. This approach can, unsurprisingly,
give the impression that you have to buy into quite a lot of set theory in order to
understand even the basics about ordinals and their arithmetic. However, not so.
Our little examples so far are of recursive (re)orderings of the natural numbers
– i.e. a computer can decide, given two numbers, which way round they come in
the ordering. There is a whole theory of recursive ordinals which talks about how
to tally the lengths of such (re)orderings of the naturals, which has important
applications e.g. in proof theory. And these tame beginnings of the theory of
transfinite ordinals needn’t entangle us with the kind of rather wildly infinitary
and non-constructive ideas characteristic of modern set theory.
(i) However, here we are concerned with set theory, and so our next topic
will naturally be von Neumann’s very elegant implementation of ordinals in set
theory as the ‘hereditarily transitive sets’. The basic idea is to define a particular
well-ordered sequence of sets – call them the ordinalsvN – and show that any
well-ordered collection of objects, however long the ordering, will have the same
type of ordering as an initial segment of these ordinalsvN . So we can use the
ordinalsvN as a universal measuring scale against which to tally the length of
any well-ordering.
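In the finite case we can even check the defining property directly: an ordinalvN is a transitive set all of whose members are transitive (full set theory adds that membership well-orders it, which comes for free for finite well-founded sets). A sketch with frozensets, not from the text:

```python
def is_transitive(s):
    """A set is transitive iff every member is also a subset of it."""
    return all(x <= s for x in s)

def is_ordinal(s):
    """Finite von Neumann ordinals: transitive sets of transitive sets."""
    return is_transitive(s) and all(is_transitive(x) for x in s)

def von_neumann(n):
    """The nth von Neumann ordinal: 0 = ∅, and n + 1 = n ∪ {n}."""
    s = frozenset()
    for _ in range(n):
        s = s | frozenset([s])
    return s

assert all(is_ordinal(von_neumann(n)) for n in range(6))
assert not is_ordinal(frozenset([frozenset([frozenset()])]))  # {{∅}} fails
```

The second assertion shows why transitivity matters: {{∅}} has the member {∅} which is not a subset of it, so it is no ordinalvN.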
And at this point, I’ll have to leave it to you to explore the details of the
construction of the ordinalsvN in the recommended readings. But once we have
them available, we can say more about the way that the universe of sets is
structured; we can take the levels to be indexed by ordinalsvN (and then assume
that for every ordinal there is a corresponding level of the universe).
We can also now define a scale of cardinal size. We noted that well-orderings
of different ordinal length can be equinumerous; different ordinalsvN can have
the same cardinality. So von Neumann’s next trick is to define a cardinal number
to be the first ordinal (in the well-ordered sequence of ordinals) in a family of
equinumerous ordinals. Again this neat idea we’ll have to leave for the moment
for later exploration. However – and this is an important point – to get this
to all work out as we want, in particular to ensure that we can assign any two
non-equinumerous sets respective cardinalities κ and λ such that either κ < λ
or λ < κ, we will need the Axiom of Choice. (This is something to keep looking
out for when beginning set theory: where do we start to need to appeal to some
Choice principle?)
(j) We are perhaps already rather past the point where scene-setting remarks
at this level of unspecific generality can be very helpful. Time to dive into the
details! But one final important observation before you start.
The themes we have been touching on can and perhaps should initially be
presented in a relatively informal style. But something else that also belongs
here near the beginning of your first forays into set theory is an account of the
development of axiomatic ZFC (Zermelo-Fraenkel set theory with Choice) as the
now standard way of formally regimenting set theory. As you will see, different
books take different approaches to the question of just when it is best to start
getting more rigorously axiomatic, formalizing our set-theoretic ideas.
Now, there’s a historical point worth noting, which explains something about
the shape of the standard axiomatization. You’ll recall from the remarks in
§2.1(b) that a set theory which makes the assumption that every property has
an extension will be inconsistent. So Zermelo set out in an epoch-making 1908
paper to lay down what he thought were the basic assumptions about sets that
mathematicians actually needed, while not overshooting and falling into such
contradictions. His axiomatization was not, it seems, initially guided by a positive
conception of the universe of sets so much as by the desire to keep safe and not
assume too much. But in the 1930s, Zermelo himself and especially Gödel came
to develop the conception of sets as a hierarchy of levels (with new sets always
formed from objects at lower levels, so never containing themselves, and with no
end to the levels where we form more sets from what we have accumulated so
far, so we never get to a paradoxical set of all sets). This cumulative hierarchy
is described and explored in the standard texts. Once this conception is in play,
it does invite a more direct and explicit axiomatization as a story about levels
and sets formed at levels: however, it was only much later that this positively
motivated axiomatization got spelt out, particularly in what has come to be
called Scott-Potter set theory. Most textbooks stick for their official axioms
to the Zermelo approach, hence giving what looks to be a rather unmotivated
selection of axioms whose attraction is that they all look reasonably modest and
separately in keeping with the hierarchical picture, so unlikely to get us into
trouble. In particular the initial recommendations below take this conventional
line.
Main recommendations on set theory
Since Button can’t really get into enough detail in his brisk notes, most readers
will want to look instead at one or other of the first two of the following admirable
‘entry level’ treatments which cover rather more material in rather more depth
but still very accessibly:
Also starting from scratch, we find two further excellent books which are rather
less conventional in style:
5. Winfried Just and Martin Weese, Discovering Modern Set Theory I: The
Basics (American Mathematical Society, 1996).
Covers similar ground to Goldrei and Enderton, but perhaps more
zestfully and with a little more discussion of conceptually interesting
issues. At some places, it is more challenging – the pace can be a bit
uneven.
I like the style a lot, though, and think it works very well. I don’t mean
the occasional (slightly laboured?) jokes: I mean the in-the-classroom
feel of the way that proofs are explored and motivated, and also the way
that teach-yourself exercises are integrated into the text. The book is ev-
idently written by enthusiastic teachers, and the result is very engaging.
(The story continues in a second volume.)
6. Yiannis Moschovakis, Notes on Set Theory (Springer, 2nd edition 2006).
This also takes a slightly more individual path through the material
than Goldrei and Enderton, with occasional bumpier passages, and with
glimpses ahead. But to my mind, this is very attractively written, and
again nicely complements and reinforces what you’ll learn from the more
conventional books.
Of these two pairs of books, I’d rather strongly advise reading one of the first
pair and then one of the second pair.
I will add two more firm recommendations at this level. The first might come
as a bit of surprise, as it is something of a ‘blast from the past’. But we shouldn’t
ignore old classics – they can have a lot to teach us even when we have read the
more recent books, and this is very illuminating:
Now, as I noted in the initial overview section, one thing that every set-theory
novice now acquires is the picture of the universe of sets as built up in a hierarchy
of stages or levels, each level containing all the sets at previous levels plus new
ones (so the levels are cumulative). It is significant that, as Fraenkel et al. makes
clear, the picture wasn’t firmly in place from the beginning. But the hierarchical
conception of the universe of sets is brought to the foreground in
Some parallel/additional reading on standard ZFC
Next, here are four introductory books at the right sort of level, listed in order of
publication; each has many things to recommend it to beginners. Browse through
to see which might suit your interests:
10. D. van Dalen, H.C. Doets and H. de Swart, Sets: Naive, Axiomatic and
Applied (Pergamon, 1978).
The first chapter covers the sort of elementary (semi)-naive set theory
that any mathematician needs to know, up to an account of cardinal
numbers, and then takes a first look at the paradox-avoiding ZF axiom-
atization. This is very attractively and illuminatingly done. (Or at least,
the conceptual presentation is attractive – sadly, and a sign of its time
of publication, the book seems to have been photo-typeset from original pages
produced on an electric typewriter, and the result is visually not attractive at
all.)
The second chapter carries on the presentation of axiomatic set theory,
with a lot about ordinals, and getting as far as talking about higher
infinities, measurable cardinals and the like. The final chapter considers
some applications of various set theoretic notions and principles. Well
worth seeking out, if you don’t find the typography off-putting.
11. Karel Hrbacek and Thomas Jech, Introduction to Set Theory (Marcel
Dekker, 3rd edition 1999).
Eventually this book goes a bit further than Enderton or Goldrei (more
so in the 3rd edition than earlier ones), and you could – on a first reading
– skip some of the later material. Though do look at the final chapter
which gives a remarkably accessible glimpse ahead towards large cardinal
axioms and independence proofs. Recommended if you want to consoli-
date your understanding by reading a second presentation of the basics
and want then to push on just a bit.
Jech is a major author on set theory whom we’ll encounter again,
and Hrbacek once won an AMA prize for maths writing. So, unsurprisingly,
this is a very nicely put together book, which could very well have
featured as a main recommendation.
12. Keith Devlin, The Joy of Sets (Springer, 1979: 2nd edn. 1993).
The opening chapters of this book are remarkably lucid and attrac-
tively written. The opening chapter explores ‘naive’ ideas about sets and
some set-theoretic constructions, and the next chapter introduces axioms
for ZFC pretty gently (indeed, non-mathematicians could particularly
like Chs 1 and 2, omitting §2.6). Things then speed up a bit, and by
the end of Ch. 3 – some 100 pages into the book – we are pretty much
up to the coverage of Goldrei’s much longer first six chapters, though
Goldrei says more about (re)constructing classical maths in set theory.
Some will prefer Devlin’s fast-track version. (The rest of the book then
covers non-introductory topics in set theory, of the kind we take up again
in §12.4.)
13. Judith Roitman, Introduction to Modern Set Theory* (Wiley, 1990: a
2011 version is available at tinyurl.com/roitmanset).
Relatively short, and very engagingly written, this book covers quite
a bit of ground – we’ve reached the constructible universe by p. 90 of the
downloadable pdf version, and there’s even room for a concluding chapter
on ‘Semi-advanced set theory’ which says something about large cardi-
nals and infinite combinatorics. A few quibbles aside, this could make
excellent revision material as Roitman is particularly good at highlight-
ing key ideas without getting bogged down in too many details.
Those four books all aim to cover the basics in some detail. The next two books
are much shorter, and are differently focused.
Further conceptual reflection on set theories
Rather differently, if you haven’t tackled their book in working on model theory,
you will want to look at
19. Tim Button and Sean Walsh’s Philosophy and Model Theory* (OUP,
2018).
Now see especially §1.B (on first-order vs second-order ZFC), Ch. 8
(on models of set theory), and perhaps Ch. 11 (more on Scott-Potter set
theory).
20. José Ferreirós, ‘The early development of set theory’, The Stanford En-
cyclopaedia of Philosophy, available at tinyurl.com/sep-devset.
This article has references to many more articles, like Kanamori’s fine piece on
‘The mathematical development of set theory from Cantor to Cohen’. But you
might need to be on top of rather more set theory before getting to grips with
that.
Postscript: Other treatments?
But I think Enderton or van Dalen et al. do this better. The second part of this
book is on more advanced topics in combinatorial set theory.
George Tourlakis’s Lectures in Logic and Set Theory, Volume 2: Set Theory
(CUP, 2003) has been recommended to me a number of times. Although this is
the second of two volumes, it is a stand-alone text. You can probably already
skip over the initial chapter on FOL, consulting it if/when needed. That still leaves
over 400 pages on basic set theory, with long chapters on the usual axioms, on
the Axiom of Choice, on the natural numbers, on order and ordinals, and on
cardinality. (The final chapter on forcing should be omitted at this stage, and
strikes me as considerably less clear than what precedes it.)
As the title suggests, Tourlakis aims to retain something of the relaxed style
of the lecture room, complete with occasional asides and digressions. And as the
page length suggests, the pace is quite gentle and expansive, with room to pause
over questions of conceptual motivation etc. However, simple constructions and
results take a very long time to arrive. For example, we don’t get to Cantor’s
theorem on the uncountability of P(ω) until p. 455! So while this book might
be worth dipping into for some of the motivational explanations, I can’t myself
recommend it overall.
Finally, I’ll mention another more recent text from the same publisher, Daniel
W. Cunningham’s Set Theory: A First Course (CUP, 2016). But this doesn’t
strike me as a particularly friendly introduction. As the book progresses, it turns
into pages of old-school Definition/Lemma/Theorem/Proof with rather too little
commentary; key ideas seem often to be introduced in a phrase, without much
discursive explanation. (Readers who care about the logical niceties will also
raise their eyebrows at the author’s over-casual way with use and mention, or
e.g. the too-typically hopeless passage about replacing variables with values on
p. 14. And this isn’t just being pernickety: what exactly are we to make of the
claim on p. 31 that a class is “any collection of the form {x : A(x)}”? So not
recommended to logicians of a sensitive disposition!)
8 Intuitionistic logic
In the briefest headline terms, intuitionistic logic is what you get if you drop the
classical principle that ¬¬A implies A (or equivalently drop the law of excluded
middle which says that A ∨ ¬A always holds). But why would we want to do
that? And what further consequences for our logic does that have?
Overview: why intuitionistic logic?
(¬I) If, from the temporary assumption A, we can derive the absurdity ⊥, then
we may discharge that assumption and infer ¬A.
(¬E) From A together with ¬A we may infer ⊥.
Alternatively, we can take these to be the introduction and elimination rules
governing a primitive built-in negation connective. Nothing hangs on this choice.
We then define IPL, intuitionistic propositional logic (in its natural deduction
version), to be the logic governed by these rules.
The described rules are of course all rules of classical logic too. However, the
intuitionistic system is strictly weaker in the sense that the following classically
acceptable principles are not derived rules of our intuitionistic logic:
(DN) From ¬¬A we may infer A.
(LEM) We may infer A ∨ ¬A from no assumptions at all.
(CR) If, from the temporary assumption ¬A, we can derive the absurdity ⊥,
then we may discharge that assumption and infer A.
DN allows us to drop double negations. LEM is the Law of Excluded Middle,
which permits us to infer A ∨ ¬A whenever we want, from no assumptions. CR
is the classical reductio rule. And these three rules are equivalent in the sense
that adding any one of them to intuitionistic propositional logic enables us to
prove all the same conclusions; each way, we get back full classical propositional
logic.
(b) If only for brevity’s sake, we will largely be concentrating on propositional
logic in the two introductory overviews which follow. But we should briefly note
what it takes to get intuitionistic predicate logic in natural deduction form.
Technically, it’s very straightforward. Just as the rules for ∧ and ∨ are the
same in classical and intuitionist logic, the rules for generalized conjunctions and
generalized disjunctions remain the same too. In other words, to get intuition-
istic predicate logic we simply add to IPL the same pair of introduction and
elimination rules for ∀ and ∃ as for classical logic.
But note, because of the different background propositional logic – in particu-
lar, because of the different rules concerning negation – these familiar quantifier
rules no longer have all the same implications in the intuitionistic setting. For
example ∃xA(x) is no longer equivalent to ¬∀x¬A(x). More about this below.
(i) (A ∧ B) is warranted iff (if and only if) A and B are both warranted.
(ii) While there may be other ways of arriving at a disjunction, the direct
and ideally informative way of certifying a disjunction’s correctness is by
establishing one or other disjunct. So we will count (A ∨ B) as warranted
iff at least one disjunct is certified to be correct, i.e. iff there is a warrant
for A or a warrant for B.
(iii) A warranted conditional (A → B) must be one that, together with the
warranted assertion A, will enable us to derive another warranted assertion
B by using modus ponens. Hence (A → B) is directly warranted iff there
is a way of converting any warrant for A into a warrant for B.
(iv) ¬A is warranted iff we have a warrant for ruling out A because it leads to
something absurd (given what else is warranted).
(v) ⊥ is never warranted.
Then, in keeping with this approach, we will think of a reliable inference as one
that takes us from warranted premisses to a warranted conclusion.
Now, in this framework, the familiar introduction rules for the connectives
will still be acceptable, for they will evidently be warrant-preserving (given our
interpretation of the connectives). But as we said, the various elimination rules
in effect just ‘undo’ the effects of the introduction rules: so they should come for
free along with the introduction rules. Finally, we can still endorse EFQ – the
plausible thought is that if, per impossibile, the absurd is warrantedly assertible,
then all hell breaks loose, and anything goes.
Hence, regarded now as warrant-preserving rules, all our IPL rules can remain
in place. However:
Again, for similar reasons, CR is not acceptable either in this framework: but I
won’t keep mentioning this third rule.
In sum, then, if we want a propositional logic suitable as a framework for
regimenting arguments which preserve warranted assertability, we should stick
with the core rules of IPL – and shouldn’t endorse those further distinctively
classical laws.
But be very careful here! It is one thing to stand back from endorsing the law
of excluded middle. It would be something else entirely actually to deny some
instance of the law. In fact, it is an easy exercise to show that, even in IPL, any
outright negation of an instance – i.e. any sentence of the form ¬(A ∨ ¬A) –
entails absurdity!
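That easy exercise can in fact be machine-checked. Here is a sketch of mine (not from the Guide) in Lean 4, whose core logic is constructive and so can stand in for IPL: the first theorem shows that ¬(A ∨ ¬A) entails absurdity – equivalently, that ¬¬(A ∨ ¬A) is intuitionistically provable – and the second uses it to recover LEM once the DN rule is assumed as a hypothesis.

```lean
-- Intuitionistically provable: any ¬(A ∨ ¬A) leads to absurdity.
theorem nn_lem (A : Prop) : ¬¬(A ∨ ¬A) :=
  fun h => h (Or.inr (fun a => h (Or.inl a)))

-- Hence assuming DN (¬¬B → B, for every B) gives back excluded middle.
theorem lem_of_dn (dn : ∀ B : Prop, ¬¬B → B) (A : Prop) : A ∨ ¬A :=
  dn (A ∨ ¬A) (nn_lem A)
```

Since `nn_lem` uses nothing beyond the constructive core, it is a faithful rendering of the 'easy exercise'; `lem_of_dn` is one direction of the equivalence of DN and LEM noted earlier.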
(b) The double negation rule DN of classical logic is an outlier, not belonging
to one of the matched pairs of introduction/elimination rules. Now we see the
significance of this. Its special status leaves room for an interpretation on which
the remaining rules – the rules of IPL – hold good, but DN doesn’t. Hence, as
we wanted to show, DN is not derivable as a rule of intuitionistic propositional
logic. Nor is LEM.
True, our version of the semantic argument as presented so far might seem
a bit too arm-waving for comfort; after all, the notion of warrant as we
characterized it can hardly be said to be ideally clear! But let's not fuss
about details now. We’ll soon meet a rigorous story partially inspired by this
notion which gives us an entirely uncontroversial, technically kosher, proof that
DN and its equivalents are, as claimed, independent of the rules of IPL.
Things do get controversial, though, when it is claimed that DN and LEM
really don’t apply in some particular domain of inquiry, because in this do-
main there can indeed be no more to correctness than having a warrant in the
form of a direct informal proof. Now, so-called intuitionists do indeed hold that
mathematics is a case in point. Mathematical truth, they say, doesn’t consist
in correspondence with facts about abstract objects laid out in some Platonic
heaven (after all, there are familiar worries: what kind of objects could these ideal
mathematical entities be? how could we possibly know about them?). Rather,
the story goes, the mathematical world is in some sense our construction, and
being mathematically correct can be no more than a matter of being assertible on
the basis of a proof elaborating our constructions – meaning not a proof in this
or that formal system but a chain of reasoning satisfying informal mathematical
standards for being a direct proof.
Consider, for example, the following argument, intended to show that (C),
there is a pair of irrational numbers a and b such that a^b is rational:
Either (i) √2^√2 is rational, or (ii) it isn't. In case (i) we are done:
we can simply put a = b = √2, and hence (C) then holds. In case (ii),
put a = √2^√2, b = √2. Then a is irrational by assumption, b is
irrational, while a^b = (√2^√2)^√2 = √2^(√2·√2) = √2^2 = 2 and hence is
rational, so (C) again holds. Either way, (C).
It will be agreed on all sides that this argument isn’t ideally satisfying. But the
intuitionist goes further, and claims that this argument actually fails to estab-
lish (C), because we haven’t yet constructed a specific a and b to warrant (C).
The cited argument assumes that either (i) or (ii) holds, and – the intuitionist
complains – we are not entitled to assume this when we are given no reason to
suppose that one or other disjunct can be warranted by a construction.
(c) For an intuitionist, then, the appropriate logic is not full classical two-
valued logic but rather our cut-down intuitionistic logic (hence the name!),
because this is the right logic for correctness-as-informal-direct-provability.
Or so, roughly, goes the story. Plainly, we can’t even begin to discuss here the
highly contentious issues about the nature of truth and provability in mathemat-
ics which first led to the advocacy of intuitionistic logic (if you want to know a
bit more, there are some initial references in the recommended reading). But no
matter: there are plenty of other reasons too for being interested in intuitionistic
logic, which keeps recurring in various contexts (e.g. in computer science and
in category theory). And as we will see in the next chapter, the fact that its
rules come in matched introduction/elimination pairs makes intuitionistic logic
proof-theoretically particularly neat.
For now, though, let’s just say a bit more about what can and can’t be proved
in IPL and its extension by the quantifier rules, and also introduce one of the
more formal ways of semantically modelling it.
(i) The familiar classical laws governing just conjunctions and disjunctions
stay the same: so, for example, we still have A ∧ (B ∧ C) ⊢i (A ∧ B) ∧ C
and A ∨ (B ∧ C) ⊢i (A ∨ B) ∧ (A ∨ C). However, although the conditional
rules of inference are the same in classical and intuitionist logic, the laws
governing the conditional are not the same. Classically, we have Peirce's
Law, (A → B) → A ⊢c A; but we do not have (A → B) → A ⊢i A.
(ii) Classically, the binary connectives are interdefinable using negation. Not
so in IPL. We do have for example (A ∨ B) ⊢i ¬(¬A ∧ ¬B). But the converse
doesn't hold – a good rule of thumb is that IPL makes disjunctions harder
to prove. However, ¬(¬A ∧ ¬B) ⊢i ¬¬(A ∨ B).
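Both halves of point (ii) can again be checked in Lean 4's constructive core (a sketch of mine, not from the text, with provability in Lean's core logic standing in for ⊢i):

```lean
-- (A ∨ B) intuitionistically entails ¬(¬A ∧ ¬B).
theorem or_entails (A B : Prop) (h : A ∨ B) : ¬(¬A ∧ ¬B) :=
  fun ⟨na, nb⟩ => h.elim na nb

-- The converse is not intuitionistically available, but its double
-- negation is: ¬(¬A ∧ ¬B) entails ¬¬(A ∨ B).
theorem converse_dn (A B : Prop) (h : ¬(¬A ∧ ¬B)) : ¬¬(A ∨ B) :=
  fun hn => h ⟨fun a => hn (Or.inl a), fun b => hn (Or.inr b)⟩
```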
Overview: more proof theory, more semantics
(b) Two comments on the Gödel/Gentzen theorem. First, it shows that for
every classical result, there is already a corresponding intuitionistic one which
has additional double negation signs in the right places. So we can think of
classical logic not so much as what you get by adding to intuitionist logic but
rather as what you get by ignoring a distinction that the intuitionist thinks is
of central importance, namely the distinction between A and ¬¬A.
Second, note this particular consequence of the theorem: Γ ⊢c ⊥ if and only if
Γᵀ ⊢i ⊥. So if the classical theory Γ is inconsistent by classical standards, then its
intuitionistic translation Γᵀ is already inconsistent by intuitionistic standards.
Roughly speaking, then, if we have worries about the consistency of a classical
theory, retreating to an intuitionistic version isn’t going to help. As you’ll see
from the readings, this observation had significant historical impact in debates
in the foundations of mathematics.
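The clauses of the translation aren't displayed here, but one standard version of the Gödel–Gentzen translation for propositional formulas is easy to state and to implement. The following Python sketch is my illustration (the tuple encoding of formulas is just an expository convention, and presentations of the translation differ in detail): atoms get doubly negated, ∧ and → are translated clause by clause, and ∨ is rewritten using De Morgan.

```python
# A sketch of one standard version of the Gödel-Gentzen double-negation
# translation for propositional formulas (my illustration; presentations
# differ in detail). Formulas are encoded as tuples: ('atom','P'),
# ('bot',), ('and',A,B), ('or',A,B), ('imp',A,B), ('not',A).

def neg(f):
    return ('not', f)

def translate(f):
    """Map a formula A to its negative translation."""
    op = f[0]
    if op == 'atom':
        return neg(neg(f))                   # P becomes ¬¬P
    if op == 'bot':
        return f                             # ⊥ is unchanged
    if op == 'and':
        return ('and', translate(f[1]), translate(f[2]))
    if op == 'imp':
        return ('imp', translate(f[1]), translate(f[2]))
    if op == 'not':
        return neg(translate(f[1]))
    if op == 'or':                           # A ∨ B becomes ¬(¬A' ∧ ¬B')
        return neg(('and', neg(translate(f[1])), neg(translate(f[2]))))
    raise ValueError(op)

P = ('atom', 'P')
print(translate(('or', P, ('not', P))))
# the translation of excluded middle: ¬(¬¬¬P ∧ ¬¬¬¬P), provable in IPL
```

So, for instance, the translation of A ∨ ¬A is ¬(¬¬¬A ∧ ¬¬¬¬A), which is indeed derivable in IPL, in accordance with the theorem.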
(c) Let’s now return to those earlier arm-waving semantic remarks in §8.2(a).
They can be sharpened up in various ways, but here I’ll just briefly consider (a
version of) Saul Kripke’s semantics for IPL. I’ll leave it to you to find out how
the story can be extended to cover quantified intuitionistic logic.
Take things in stages. First, imagine an enquirer, starting from a ground state
of knowledge g; she then proceeds to expand her knowledge, through a sequence
of possible further states K. Different routes forward can be possible, so we can
think of these states as situated on a branching array of possibilities rooted at g
(not strictly a ‘tree’ though, as we can allow branches to later rejoin, reflecting
the fact that our enquirer can arrive at the same knowledge state by different
routes). If she can get from state k ∈ K to the state k′ ∈ K by zero or more
steps, then we'll write k ≤ k′. So, to model the situation a bit more abstractly,
let’s say that
An intuitionistic model structure is a triple (g, K, ≤), where K is a
set, ≤ is a partial order defined over K, and g is its minimum (so
g ≤ k for all k ∈ K).
As our enquirer investigates the truth of the various sentences of her proposi-
tional language, at any stage k a sentence A is either established to be true or not
[yet] established. We can symbolize those alternatives by k ⊩ A and k ⊮ A; it is
quite common, for reasons that needn't now detain us, to read '⊩' as forces. And,
as far as atomic sentences are concerned, the only constraint on a forcing relation
is this: once P is established in the knowledge state k, it stays established in any
expansion of that state of knowledge, i.e. at any k′ such that k ≤ k′. Knowledge
persists. Hence, again to put the point more abstractly, we require a forcing
relation ⊩ to satisfy this persistence condition:

For any atomic sentence P and k ∈ K, if k ⊩ P, then k′ ⊩ P for all k′ ∈ K
such that k ≤ k′.
And now, next stage, let's expand a forcing relation defined for a suite of atoms
so that it now covers all wffs built up from those atoms by the connectives. So,
for all k, k′ ∈ K, and all relevant sentences A, B, we will require
(i) k ⊮ ⊥.
(ii) k ⊩ A ∧ B iff k ⊩ A and k ⊩ B.
(iii) k ⊩ A ∨ B iff k ⊩ A or k ⊩ B.
(iv) k ⊩ A → B iff, for any k′ such that k ≤ k′, if k′ ⊩ A then k′ ⊩ B.
(v) k ⊩ ¬A iff, for any k′ such that k ≤ k′, k′ ⊮ A.
It’s a simple consequence of these conditions on a forcing relation that for any
A, whether atomic or molecular,
This formally reflects the idea that once A is established it stays established,
whether or not it is an atom.
But what motivates those clauses (i) to (v) in our characterization of ⊩? (i)
The absurd is never established as true, in any state of knowledge. And (ii)
establishing a conjunction is equivalent to establishing each conjunct, on any
sensible story. So we needn’t pause over these first two.
But (iii) reveals our enquirer’s intuitionist/constructivist commitments! – as
per the BHK interpretation, she is taking establishing a disjunction in an accept-
ably direct way to require establishing one of the disjuncts. For (iv) the thought
is that establishing A → B is tantamount to giving you an inference-ticket: with
the conditional established, if you (eventually) get to also establish A, then you
will then be entitled to B too. Finally, (v) falls out from the definition of ¬A as
A → ⊥ and the evaluation rules for → and ⊥. Or more directly, the idea is that
to establish ¬A is to rule out, once and for all, A turning out to be correct as
we later expand our knowledge.
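Clauses (i) to (v) can be turned directly into a little model-checker for finite Kripke models. The following Python sketch is my illustration (function and variable names are not from the text): it implements the clauses, and then uses the simplest two-state model – g below k, with P established only at k – to confirm that excluded middle can fail at the ground state.

```python
# A sketch (my illustration, not from the text) of clauses (i)-(v) for
# finite Kripke models. Worlds are strings; `le` is the reflexive,
# transitive order as a set of pairs; `val` maps each world to the set
# of atoms established there (and must be monotone along `le`).
# Formulas: ('atom','P'), ('bot',), ('and',A,B), ('or',A,B),
# ('imp',A,B), ('not',A).

def forces(k, formula, worlds, le, val):
    """Does world k force the formula?"""
    op = formula[0]
    if op == 'atom':
        return formula[1] in val[k]
    if op == 'bot':                       # clause (i): ⊥ is never forced
        return False
    if op == 'and':                       # clause (ii)
        return (forces(k, formula[1], worlds, le, val) and
                forces(k, formula[2], worlds, le, val))
    if op == 'or':                        # clause (iii)
        return (forces(k, formula[1], worlds, le, val) or
                forces(k, formula[2], worlds, le, val))
    if op == 'imp':                       # clause (iv): look at all later worlds
        return all(not forces(w, formula[1], worlds, le, val)
                   or forces(w, formula[2], worlds, le, val)
                   for w in worlds if (k, w) in le)
    if op == 'not':                       # clause (v)
        return all(not forces(w, formula[1], worlds, le, val)
                   for w in worlds if (k, w) in le)
    raise ValueError(op)

# The simplest countermodel: g strictly below k, P established only at k.
worlds = {'g', 'k'}
le = {('g', 'g'), ('k', 'k'), ('g', 'k')}
val = {'g': set(), 'k': {'P'}}

P = ('atom', 'P')
lem = ('or', P, ('not', P))
print(forces('g', lem, worlds, le, val))   # False: g forces neither P nor ¬P
print(forces('k', lem, worlds, le, val))   # True, since k forces P
```

The same little model also shows the failure of DN: g forces ¬¬P (no later state can force ¬P) without forcing P itself.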
With these pieces in place, we can – next stage! – define a formula of a
propositional language to be intuitionistically valid in a natural way. Classically,
a propositional formula is valid (is a tautology) if it is true however things
turn out with respect to the values of the relevant atoms. Now we say that a
propositional formula A is intuitionistically valid if it can be established in the
ground state of knowledge, however things later turn out with respect to the
truth of relevant atoms as our knowledge expands. Putting that more formally:
A is intuitionistically valid just if, for every intuitionistic model structure
(g, K, ≤) and every forcing relation ⊩ defined over it, g ⊩ A.
And now for the big reveal! Kripke proved in 1965 the following soundness and
completeness result:
1 Fine print, just to link up with other presentations you will meet. First, given (∗), g ⊩ A
holds iff k ⊩ A for all k. So we can redefine validity by saying A is valid just when k ⊩ A
for all k. But then, second, we can in fact let g drop right out of the picture. For it is quite
easy to see that it will make no difference whether we require the partial order ≤ to have a
minimum or not: the same sentences will come out valid either way. Indeed, third, we don't
even require the relation we symbolized ≤ to be a true partial order: again, if we allow any
reflexive, transitive relation over K in its place, it will make no difference to what comes out
as valid.
You could read these in the order given, initially skimming/skipping over
passages that aren’t immediately clear.
Or perhaps better, start with (1)’s §1, ‘Rejection of Tertium Non Datur’,
and then (2)’s §5.1, ‘Constructive reasoning’ which introduces the BHK
interpretation of the logical operators.
Then look at a presentation of a natural deduction system for intuition-
istic logic (as sketched in our overview): this is briskly covered in (2) in the
Some parallel/additional reading
first half of §5.2. But in fact the discussion in (3) – though this is not an
introductory textbook – is notably more relaxed and clearer: see §1 of the
chapter.
Next, read up on the double-negation translation between classical and
intuitionistic logic. This is described in (1) §4.1, and explored a bit more in
the second half of (2) §5.2. But again, a more relaxed presentation can be
found in (3), §3 (up to Prop. 3.8).
Now you want to find out more about Kripke semantics, which is also
covered in all three resources. (1) §5.1 gives the brisk headline news. (2)
gives a compressed account in the first half of §5.3. But again (3) is best:
Troelstra and van Dalen give a much more expansive and helpful account
in their Ch. 2 §5 – which sensibly treats propositional logic first before
expanding the story to cover full quantified intuitionistic logic.
I would suggest, though, leaving detailed soundness and completeness
proofs for Kripke semantics – covered in (2) §5.3 or (3) §6 – for later (if
indeed they are tackled at all at this stage).
For a few more facts about intuitionistic logic, such as the disjunction
property, see also the first couple of pages of (2) §5.4 (the rest of that section
is interesting but not really needed at this stage).
Return to (1) to look at §2.1 (an axiomatic version of intuitionistic logic),
and the first half of §3 (on Heyting’s intuitionistic arithmetic). Then finally,
for more on Heyting Arithmetic and a spelt-out proof that it is consistent
if and only if classical Peano Arithmetic is consistent, you could dip into
For a bit more on natural deduction, the sequent calculus and semantics for
intuitionistic logic, you should look at two chapters from a modern classic:
In fact, you could well want to read the opening two chapters and the final
one as well! There are then many more pointers to technical discussions in
Moschovakis’s section of ‘Recommended reading’.
The same author has a Stanford Encyclopedia article on 'The Development
of Intuitionistic Logic’ at tinyurl.com/dev-intuit; but that’s much more detailed
than you are likely to want.
Turning to more philosophical discussions – and it is a bit difficult to separate
thinking about intuitionism as a philosophy of mathematics from thinking about
intuitionistic logic more specifically – one key article that you will want to read
(which was hugely influential in reviving interest in a ‘tamer’ intuitionism among
philosophers) is
9 Elementary proof theory
The story of proof theory starts with David Hilbert and what has come to be
known as ‘Hilbert’s Programme’, which inspired the profoundly original work of
Gerhard Gentzen in the 1930s.
Two themes from Gentzen are within easy reach for beginners in mathemat-
ical logic: (A) the idea of normalization for natural deduction proofs, (B) the
move from natural deduction to sequent calculi, and cut-elimination results for
these calculi. But the most interesting later developments in proof theory – in
particular, in so-called ordinal proof theory – quickly become mathematically
rather sophisticated. Still, at this stage it is at least worth making a first pass
at (C) Gentzen’s proof of the consistency of arithmetic using a cut-elimination
proof which invokes induction over some small countable ordinals. So these three
themes from elementary proof theory will be the focus of this chapter.
Deductive systems, normal forms, and cuts: a short overview
Compare two proofs from the premiss P ∧ Q to the conclusion P ∨ Q:
(i) the direct proof: from P ∧ Q infer P, then conclude P ∨ Q by ∨-introduction;
(ii) a needlessly roundabout proof: from P ∧ Q infer P, then P ∨ (R ∧ Q) by
∨-introduction; then argue by ∨-elimination – from the temporary assumption
[P](1) we get P ∨ Q, and from the temporary assumption [R ∧ Q](1) we get Q
and hence P ∨ Q – discharging both assumptions to conclude P ∨ Q.
Consider a proof which derives B from the temporary assumption [A](1),
concludes A → B by →-introduction (discharging that assumption), and then
immediately applies →-elimination to the conditional together with a proof of
A, arriving back at B. The detour through A → B can be removed: simply graft
the given proof of A on top of the subproof, in place of the assumption A, so
that B is derived directly.
For another example, going back to the case of introducing and then eliminating
a disjunction: a proof which derives A, infers A ∨ B by ∨-introduction, and
then applies ∨-elimination – using subproofs deriving C from the temporary
assumption [A](1) and C from the temporary assumption [B](1) – can be reduced
to a proof which simply grafts the derivation of A on top of the first subproof,
deriving C directly.
And similarly for other simple detours involving other connectives and the quan-
tifiers. However, what about the case where a detour gets entangled with the
application of other rules in more complicated ways? Can detours always be
removed?
Gentzen was able to show that – at least for his system of intuitionistic logic
– if a conclusion can be derived from premisses at all, then there will indeed be
a normal, i.e. detour-free, proof of the conclusion from the premisses. And he
did this by giving a normalization procedure – i.e. instructions for systematically
removing detours until we are left with a normal proof. The resulting detour-free
proofs will then have particularly nice features such as the so-called subformula
property: every formula that occurs in a proof will either be a subformula of one
of the premisses or a subformula of the conclusion (as usual, counting instances
of quantified wffs as subformulas of them). There won’t be irrelevancies as in
our silly proof (ii) above.
And now note that, as a corollary, we can immediately conclude that intu-
itionistic logic is consistent: we can’t have a proof with the subformula property
from no premisses to ⊥. Which raises a hopeful prospect: can other normal-
ization proofs be used to establish the sort of consistency results that Hilbert
wanted?
(b) But now the story gets complicated. For a start, Gentzen himself couldn’t
find a normalization proof for his natural deduction system of classical logic
(you can see why there might be a problem – a classical proof might, at least
on the face of it, need to rely on an instance of excluded middle which isn’t a
Consider a natural deduction proof of P → (Q → R) from the premiss
(P ∧ Q) → R: from the temporary assumptions [P](2) and [Q](1) we infer P ∧ Q;
from that and the premiss (P ∧ Q) → R we infer R; discharging [Q](1) gives
Q → R; and discharging [P](2) gives the conclusion P → (Q → R).
Then, reading upwards from R, we see that this wff depends on all three of
(P ∧ Q) → R, P, and Q as assumptions (for neither of the last two has yet been
discharged); while Q → R on the next line depends only on (P ∧ Q) → R and P.
That’s clear enough. But we could alternatively record dependencies quite ex-
plicitly, line by line. To do this, we will make use of so-called sequents. We’ll write
a sequent in the form Γ ⇒ A, and read this as saying that A is deducible from
the finitely many (perhaps zero) wffs Γ.2 Since an (undischarged) assumption
depends just on itself, we can then explicitly record the deducibilities revealed
in our last natural deduction proof like this (check that claim!):
P ⇒ P Q ⇒ Q
(P ∧ Q) → R ⇒ (P ∧ Q) → R P, Q ⇒ P ∧ Q
(P ∧ Q) → R, P, Q ⇒ R
(P ∧ Q) → R, P ⇒ Q → R
(P ∧ Q) → R ⇒ P → (Q → R)
2 For present purposes, we can officially think of Γ as given as a set – though in the end we
might prefer to treat Γ as a multi-set where repetitions matter: Gentzen himself treated Γ
as an ordered sequence.
Γ ⇒ A ∆, A ⇒ B
Γ, ∆ ⇒ B
This intuitively sound rule allows us to cut out the middle man A.
So far, then, so good – though of course, we’ve left lots of detail to be filled out.
And there is as yet nothing really novel involved in reworking natural deduction
into sequent style like this. But now, however, Gentzen introduces two very
striking new ideas.
(d) To introduce the first idea, let’s think again about the elimination rules for
conjunction. As a first shot, we might expect to transform the pair of natural-
deduction rules into a corresponding pair of sequent-calculus rules like this:
A ∧ B        A ∧ B              Γ ⇒ A ∧ B        Γ ⇒ A ∧ B
-----        -----              ---------        ---------
  A            B                  Γ ⇒ A            Γ ⇒ B
What could be more obvious? But we could alternatively adopt the following
sequent-calculus rule:
Γ, A, B ⇒ C
------------
Γ, A ∧ B ⇒ C
This is obviously valid – if C can be derived from some assumptions Γ plus A
and B, it can obviously be derived from Γ plus the conjunction of A and B. And
we can use this rule introducing ∧ on the left of the sequent sign instead of the
expected pair of rules eliminating ∧ to the right of the sequent sign. For note,
given the new rule, we can restore the first of the elimination rules as a derived
rule, because we can always give a derivation of this shape:
A ⇒ A              (axiom)
A, B ⇒ A           (Weakening)
A ∧ B ⇒ A          (the new rule for ∧)
Γ ⇒ A ∧ B          (given)
Γ ⇒ A              (Cut, from the last two sequents)
Similarly, of course, for the companion elimination rule.
And now the point generalizes. As Gentzen saw, in a sequent calculus for intu-
itionistic logic, we can get all the rules for handling connectives and quantifiers
to introduce a logical operator – either on the right of the sequent sign (corre-
sponding to a natural-deduction introduction rule) or on the left of the sequent
sign (corresponding to a natural-deduction elimination rule).
(e) We can go further. Still working with a sequent calculus for ⇒ read as in-
tuitionistic deducibility, we can in fact eliminate the cut rule. Anything provable
using cut can be proved without it.
This might initially seem pretty surprising. After all, didn’t we just have to
appeal to the cut rule to show that – using our new introduction-on-the-left rule
for ∧ – we can still argue from (1) Γ ⇒ A ∧ B to (2) Γ ⇒ A? How can we
possibly do without cut in this case?
Well, consider how we might actually have arrived at (1). Perhaps it was by
the rule for introducing occurrences of ∧ on the right of a sequent. So perhaps,
to expose more of the proof from (1) to (2), it has the shape of the first proof
below (supposing Γ to result from putting together Γ′ and Γ′′):
First proof:
Γ′ ⇒ A    Γ′′ ⇒ B               (given)
Γ ⇒ A ∧ B                       (∧ on the right)
A ⇒ A; A, B ⇒ A; A ∧ B ⇒ A      (axiom, Weakening, the new ∧ rule)
Γ ⇒ A                           (Cut on A ∧ B)

Second proof:
Γ′ ⇒ A                          (given)
Γ ⇒ A                           (Weakenings)
But if we already have Γ′ ⇒ A, as in the first proof, then we don't need to
go round the houses on that detour, introducing an occurrence of ∧ to get the
formula A ∧ B, and then cutting out that same formula: we can just get from
Γ′ ⇒ A to Γ ⇒ A by some weakenings (by adding in the wffs from Γ′′). Here,
then, eliminating the cut is just like normalizing (part of) a natural deduction
proof.
OK: that only shows that in just one rather special sort of case, we can
eliminate a cut. Still, it’s a hopeful start! And in fact, we can always eventually
eliminate cuts from an intuitionistic sequent calculus proof.
But the process can be intricate. For example, take a slight variant of our
previous example and suppose we want to eliminate the following cut (remember,
combining Γ and Γ gives us Γ!):
The proof-segment to be replaced runs: from Γ ⇒ A and Γ ⇒ B infer Γ ⇒ A ∧ B
(introducing ∧ on the right); from ∆, A, B ⇒ C infer ∆, A ∧ B ⇒ C (introducing
∧ on the left); then a Cut on A ∧ B yields Γ, ∆ ⇒ C.
Then we can replace this proof-segment with the following: first Cut Γ ⇒ B
against ∆, A, B ⇒ C to get Γ, ∆, A ⇒ C; then Cut Γ ⇒ A against that to get
Γ, Γ, ∆ ⇒ C, which is just Γ, ∆ ⇒ C.
Again, as in normalizing a natural deduction proof, we have removed a detour
– this time a detour through introducing-∧-on-the-right and introducing-∧-on-
the-left. So we have now lost the cut on the more complex formula A ∧ B, albeit
replacing it with two new cuts. But still, the new cuts are on the simpler formulas
A and B, and we have also pushed one of the cuts higher up the proof. And that’s
typical: looking at the range of possible situations where we can apply the cut
rule – a decidedly tedious hack through all the cases – we find we can indeed keep
reducing the complexity of formulas in cuts and/or pushing cuts up the proof
until all the cuts are completely eliminated.
(f) So we arrive at this result. In a sequent-calculus setting, we can use a cut-free
deductive system for intuitionistic logic where all the rules for the connectives
and quantifiers introduce logical operators, either to the left or to the right of the
sequent sign. Analogously to a normalized natural-deduction proof, there are no
detours. As we go down a branch of the proof, the sequents at each stage are
steadily more complex (we can make the relevant notion of complexity precise
in pretty obvious ways).
This proof-analysis immediately delivers some very nice results.
(i) The subformula property: every formula occurring in the derivation of a se-
quent Γ ⇒ C is a subformula either of one of the formulas in Γ or of C. (By
inspection of the rules!)
Note too that, at least for propositional logic, we can take any sequent and
systematically try to work upwards from it to construct a cut-free proof with
ever-simpler-sequents: the resulting success or failure then mechanically decides
whether the sequent is intuitionistically valid.
(g) I said that Gentzen had two very striking new ideas in developing his se-
quent calculi beyond a mere re-write of a natural deduction system in which
dependencies are made explicit. The first idea is to recast all the rules for logical
operators as rules for introducing logical operators, now allowing introduction
to the left as well as introduction to the right of the sequent sign, and to then
show that we can get a cut-free proof (hence, a proof that always goes from less
complex to more complex sequents) for any intuitionistically correct sequent.
But this first idea doesn’t by itself resolve the problem which Gentzen initially
faced. Recall, he ran into trouble trying to find a normalization proof for classical
natural deduction. And plainly, if we stick with a cut-free all-introduction-rules
sequent calculus of the current style, we can’t get a classical logical system at
all. The point is trivial: one key additional classical principle we need to add to
intuitionistic logic is the double negation rule. We need to be able to show, in
other words, that from Γ ⇒ ¬¬A we can derive Γ ⇒ A. But obviously we can’t
do that in a system where we can only move from logically simpler to logically
more complex sequents!
What to do? Well, at this point Gentzen’s second (and quite original) idea
comes into play. We now liberalize the notion of a sequent. Previously, we took
a sequent Γ ⇒ A to relate zero or more wffs on the left to a single wff on the
right. Now we pluralize on both sides of the sequent sign, writing Γ ⇒ ∆; and
we read that as saying that at least one of ∆ is deducible from the wffs Γ. If you
like, you can regard ∆ as delimiting the field within which the truth must lie if
the premisses Γ are granted. (We’ll continue, for our purposes, to treat Γ and ∆
officially as sets, rather than multisets or lists: note that we will allow either or
both to be empty.)
Keeping the idea that we want all our rules for the logical operators to be
rules for introducing operators to the left or right of the sequent sign, how might
these rules now go? There are various options, but the following can work nicely
for conjunction and disjunction:
(∧L) from Γ, A, B ⇒ ∆, infer Γ, A ∧ B ⇒ ∆.
(∧R) from Γ ⇒ ∆, A and Γ ⇒ ∆, B, infer Γ ⇒ ∆, A ∧ B.
(∨L) from Γ, A ⇒ ∆ and Γ, B ⇒ ∆, infer Γ, A ∨ B ⇒ ∆.
(∨R) from Γ ⇒ ∆, A, B, infer Γ ⇒ ∆, A ∨ B.
I won’t give the rules for all the other logical operators here, but let’s note the
left and right rules for negation (these can either be built-in rules, if negation is
treated as a primitive built-in connective, or derived rules, if negation is defined
in terms of the conditional and absurdity):
(¬L) from Γ ⇒ ∆, A, infer Γ, ¬A ⇒ ∆.
(¬R) from Γ, A ⇒ ∆, infer Γ ⇒ ∆, ¬A.
These rules are evidently correct on the classical understanding of the connec-
tives. For the first rule, suppose that given the assumptions Γ, then (at least)
one of ∆ and A follows: then given the same assumptions Γ but now also ruling
out A, we can conclude that (at least) one of ∆ is true. We can argue similarly
for the second rule. But with these negation and disjunction rules in place we
immediately have the following derivation:
A ⇒ A          (axiom)
⇒ A, ¬A        (by ¬R)
⇒ A ∨ ¬A       (by ∨R)
Out pops the law of excluded middle! – so we know we are dealing with a
classical calculus.
(h) What about the structural rules for our classical sequent calculus which
allows multiple alternative conclusions as well as multiple premisses? We can
now allow weakening on both sides of a sequent. And we can generalize the cut
rule to take this form:
From Γ ⇒ ∆, A and Γ′, A ⇒ ∆′, infer Γ, Γ′ ⇒ ∆, ∆′.
(Think why this is a sound rule, given our interpretation of the sequents!) But
then, just as with our sequent calculus for intuitionistic logic, we can proceed to
prove that we can eliminate cuts. If a sequent is derivable in our classical sequent
calculus, it is derivable without using the cut rule.
And as with intuitionist logic, this immediately gives us some nice results. Of
course, we won’t have the disjunction property (think excluded middle!). But we
still have the subformula property in the form that if Γ ⇒ ∆ is derivable, then
every formula in the sequent proof is a subformula of one of Γ, ∆. And again,
simply but crucially, ⇒ ⊥ won’t be derivable in the cut-free classical system,
so it is consistent.
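The earlier remark about mechanical decidability applies here too: every rule of the cut-free classical propositional calculus is invertible and strictly reduces complexity as we work upwards, so backward proof search decides any sequent. The following Python sketch is mine, not Gentzen's presentation (Γ and ∆ are treated as sets; the ∧, ∨, ¬ rules are as displayed above, plus the standard left and right rules for →):

```python
# A minimal sketch (my illustration) of backward proof search for the
# cut-free classical propositional sequent calculus. Sequents are pairs
# of frozensets (gamma, delta), read as Γ ⇒ ∆. Formulas: ('atom','P'),
# ('bot',), ('and',A,B), ('or',A,B), ('imp',A,B), ('not',A).

def provable(gamma, delta):
    """Is the sequent Γ ⇒ ∆ derivable? Since every rule is invertible,
    we may apply the first applicable rule and never need to backtrack."""
    # Initial sequents: a shared atom, or ⊥ on the left.
    if any(f[0] == 'atom' and f in delta for f in gamma):
        return True
    if ('bot',) in gamma:
        return True
    for f in gamma:                       # left rules
        if f[0] in ('atom', 'bot'):
            continue
        g = gamma - {f}
        if f[0] == 'and':                 # ∧L
            return provable(g | {f[1], f[2]}, delta)
        if f[0] == 'or':                  # ∨L: two premises
            return (provable(g | {f[1]}, delta) and
                    provable(g | {f[2]}, delta))
        if f[0] == 'imp':                 # →L: two premises
            return (provable(g, delta | {f[1]}) and
                    provable(g | {f[2]}, delta))
        if f[0] == 'not':                 # ¬L
            return provable(g, delta | {f[1]})
    for f in delta:                       # right rules
        if f[0] in ('atom', 'bot'):
            continue
        d = delta - {f}
        if f[0] == 'and':                 # ∧R: two premises
            return (provable(gamma, d | {f[1]}) and
                    provable(gamma, d | {f[2]}))
        if f[0] == 'or':                  # ∨R
            return provable(gamma, d | {f[1], f[2]})
        if f[0] == 'imp':                 # →R
            return provable(gamma | {f[1]}, d | {f[2]})
        if f[0] == 'not':                 # ¬R
            return provable(gamma | {f[1]}, d)
    return False                          # only atoms and ⊥ left, no axiom

A, B = ('atom', 'A'), ('atom', 'B')
peirce = ('imp', ('imp', ('imp', A, B), A), A)
print(provable(frozenset(), frozenset({peirce})))                 # True
print(provable(frozenset(), frozenset({('or', A, ('not', A))})))  # True
print(provable(frozenset(), frozenset({('bot',)})))               # False
```

Note that the search confirms both classical marks of the calculus – Peirce's Law and excluded middle are derivable – while the underivability of ⇒ ⊥ witnesses the consistency claim.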
And that’s perhaps enough by way of introduction to our theme (B), in which
we begin to explore various elegant sequent calculi, prove cut-elimination theo-
rems, and draw out their implications.
(a) You might very well wonder whether there can be any illuminating and
informative ways of proving PA to be consistent. After all, proving consistency
by appealing to a stronger theory like ZF set theory which in effect contains PA
won't be very helpful (for doubts about the consistency of PA will presumably
just carry over to become doubts about the stronger theory). And you already
know that Gödel’s Second Incompleteness Theorem shows that it is impossible
to prove PA’s consistency by appealing to a weaker theory tame enough to be
modelled inside PA (not even full PA can prove PA’s consistency).
However, another possibility does remain open. It isn’t ruled out that we can
prove PA’s consistency by appeal to an attractive theory which is weaker than
PA in some respects but stronger in others. And this is what Gentzen aims to
give us in his consistency proof for arithmetic.3
(b) Here then is an outline sketch of the key proof idea, in Gentzen’s own
words.
We start with a formulation of PA using for its logic a classical sequent calculus
including the cut rule. (We will initially want the cut rule in making use of PA's
axioms, and we can't assume straight off the bat that we can still eliminate cuts
once we have more complex proofs appealing to non-logical axioms.) Then,
The idea, then, is that the various sequent proof-trees in this version of PA can
be put into an ordering by a kind of dependency relation, with more complex
proof trees (on a suitable measure of complexity) coming after simpler proofs.
And this can be a well-ordering, so that the position along the ordering can
indeed be tallied by an ordinal number.
But why is the relevant linear ordering of proofs said to be transfinite (in other
words, why must it allow an item in the ordering to have an infinite number of
predecessors)? Because
[it] may happen that the correctness of a proof depends on the cor-
rectness of infinitely many simpler proofs. An example: Suppose that
in the proof a proposition is proved for all natural numbers by com-
plete induction. In that case the correctness of the proof obviously
depends on the correctness of every single one of the infinitely many
individual proofs obtained by specializing to a particular natural number.
3 Gentzen in fact gives four different proofs, developed along somewhat different lines. But
the master idea underlying the best known of the proofs is given in a wonderfully clear
way in his wide-ranging lecture on ‘The concept of infinity in mathematics’ reprinted in his
Collected Papers, from which the following quotations come.
9 Elementary proof theory
Think of it this way: a proof by induction of the quantified ∀xA(x) leaps beyond
all the proofs of A(0), A(1), A(2), . . . . And the result ∀xA(x) depends for its
correctness on the correctness of the simpler results. So, in the sort of ordering
of proofs which Gentzen has in mind, the proof by induction of ∀xA(x) must
come infinitely far down the list, after all the proofs of the various A(n).
And now Gentzen’s key step is to argue by an induction along this transfinite
ordering of proofs. The very simplest proofs right at the beginning of the ordering
transparently can’t lead to contradiction.
Transfinite induction here is just the principle that, if we can show that a proof
has a property P if all its predecessors in the relevant ordering have P , then all
proofs in the ordering have property P .
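In symbols, writing ≺ for the relevant ordering of proofs, the principle runs:

```latex
\forall x\,\bigl[\,\forall y\,(y \prec x \rightarrow P(y)) \rightarrow P(x)\,\bigr]
\;\rightarrow\; \forall x\, P(x)
```

Ordinary mathematical induction is just the special case where ≺ is the usual ordering of the natural numbers.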
(c) We can implement this same proof idea the other way around. We show
that if any proof does lead to contradiction, then there must be an earlier proof
in the linear ordering of proofs which also leads to contradiction – so we get
an infinite sequence of proofs of contradiction, ever earlier in the ordering. But
then the ordinals which tally these proofs of contradiction would have to form
an infinite descending sequence. And there can’t be such a sequence of ordinals.
Hence no proof leads to contradiction and PA is consistent.
(d) Two questions arising. First, how do we show that if a proof leads to a
contradiction, then there must be an earlier proof in the linear ordering of proofs
which leads to contradiction? By eliminating cuts using reduction procedures like
those involved in the proof of cut-elimination for a pure logical sequent calculus
– so here’s the key point of contact with ideas we meet in tackling theme (B).
And second, what kind of transfinite ordering is involved here? Gentzen’s
ordering of possible proof-trees in his sequent calculus for PA turns out to have
the order type of the ordinals less than ε0 (what does that mean? – the references
will explain, but these are all the ordinals which are sums of powers of ω). So,
what Gentzen’s proof needs is the assumption that a relatively modest amount
of transfinite induction – induction up to ε0 – is legitimate.
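To spell that gloss out a little (these are standard facts about ordinals, stated here for reference):

```latex
\varepsilon_0 \;=\; \sup\{\,\omega,\ \omega^{\omega},\ \omega^{\omega^{\omega}},\ \ldots\,\}
\;=\; \text{the least ordinal } \alpha \text{ such that } \omega^{\alpha} = \alpha;
\qquad
\beta \;=\; \omega^{\beta_1} + \omega^{\beta_2} + \cdots + \omega^{\beta_k}
\quad (\beta > \beta_1 \ge \beta_2 \ge \cdots \ge \beta_k)
```

The second display gives the unique Cantor normal form of any ordinal 0 < β < ε0: that is the precise sense in which the ordinals below ε0 are the finite sums of powers of ω.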
Proof theory and the consistency of arithmetic: a short overview
Now, the PA proof-trees which we are ordering are themselves all finite ob-
jects; we can code them up using Gödel numbers in the familiar sort of way.
So in ordering the proofs, we are in effect thinking about a whacky ordering of
(ordinary, finite) code numbers. And whether one number precedes another in
the whacky ordering is nothing mysterious; a computation without open-ended
searches can settle the matter.
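To make that concrete, here is a minimal sketch (my own illustration, not Gentzen’s actual coding) of how ordinals below ε0 can be handled by finite notations, with their ordering decided by a simple recursive computation. A notation is a list of exponents in non-increasing order (Cantor normal form), each exponent itself such a notation:

```python
# Ordinals below epsilon_0 in Cantor normal form, coded as nested lists:
# [] is 0, and [e1, e2, ...] (exponents non-increasing) stands for
# omega^e1 + omega^e2 + ... . So [[]] is omega^0 = 1, [[[]]] is omega,
# and [[[[]]]] is omega^omega.

def less(a, b):
    """Decide a < b for two normal-form notations: compare the exponent
    lists lexicographically, largest exponents first; if one list is a
    proper initial segment of the other, the shorter one is smaller."""
    for x, y in zip(a, b):
        if less(x, y):
            return True
        if less(y, x):
            return False
    return len(a) < len(b)

zero, one, omega = [], [[]], [[[]]]
omega_plus_1 = [[[]], []]      # omega + 1
omega_omega = [[[[]]]]         # omega^omega
```

No open-ended search is involved: `less` recurses only into the finite nesting of the two notations, which is just the point being made about the whacky ordering of code numbers.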
So what resources does a Gentzen-style argument use, if we want to code it up
and formalize it? The assignment of a place in the ordering to a proof can be han-
dled by primitive recursive functions, and facts about the dependency relations
between proofs at different points in the ordering can be handled by primitive
recursive functions too. A theory in which we can run a formalized version of
Gentzen’s proof will therefore be one in which we can (a) handle primitive recur-
sive functions and (b) handle transfinite induction up to ε0 , maybe via coding
tricks. It turns out to be enough to have all p.r. functions available, together with
a formal version of transfinite induction just for simple quantifier-free wffs con-
taining expressions for these p.r. functions. Such a theory is neither contained in
PA (since it can prove PA’s consistency by formalizing Gentzen’s method, which
PA can’t), nor does it contain PA (since it needn’t be able to prove instances of
the ordinary Induction Schema for arbitrarily complex wffs).
So, in this sense, we can indeed prove the consistency of PA by using a theory
which is weaker than PA in some respects while stronger in others.
(e) Of course, it is a very moot point whether – if you were really worried about
the consistency of PA – a Gentzen-style proof when fully spelt out would help
resolve your doubts. Are the resources it requires ‘tame’ enough to satisfy you?
Well, if you are globally worried about the use of induction in general, then
appealing to an argument which deploys an induction principle won’t help! But
global worries about induction are difficult to motivate, and perhaps your worry
is more specifically that induction over arbitrarily complex wffs might engen-
der trouble. You note that PA’s induction principle applies, inter alia, to wffs
that themselves quantify over all numbers. And you might worry that if (like
Frege) you understand the natural numbers to be what induction applies to, then
there’s a looming circularity here – numbers are understood as what induction
applies to, but understanding some cases of induction involves understanding
quantifying over numbers. If that is your worry, the fact that we can show that
PA is consistent using an induction principle which is only applied to quantifier-
free wffs (even though the induction runs over a novel ordering on the numbers)
could soothe your worries.
Be that as it may: we can’t pursue that kind of philosophical discussion any
further here. The point remains that the Gentzen proof is a fascinating achieve-
ment, containing the seeds of wonderful modern work in proof theory. Perhaps
we haven’t quite executed Hilbert’s Programme of proving consistency by appeal
to entirely tame proof-theoretic reasoning. But in the attempt, we have found
how far along the ordinals we need to run our transfinite induction in order to
prove the consistency of PA.4 And we can now set out to discover how much
transfinite induction is required to prove the consistency of other theories. But
the achievements of that kind of ordinal proof theory will have to be left for you
(eventually) to explore . . .
1. Jan von Plato, ‘The development of proof theory’, The Stanford Ency-
clopedia of Philosophy. Available at tinyurl.com/sep-devproof.
And then look at the first half of the main entry on proof theory:
2. Michael Rathjen and Wilfrid Sieg, ‘Proof theory’, §§1–3, The Stanford
Encyclopedia of Philosophy. Available at tinyurl.com/sep-prooftheory.
Skip over any passages that are initially unclear, and return to them when
you’ve worked through some of the readings below.
In keeping with our overviews in the previous two sections, I suggest that – in
a first encounter with proof theory – you focus on (A) normalization for natural
deduction and its implications; (B) the sequent calculus, cut-elimination and
its implications; and (C) a Gentzen-style proof of the consistency of arithmetic.
Now, there is a book which aims to cover just these topics at the level we want:
However, as the authors say in their Preface, “in order to make the content acces-
sible to readers without much mathematical background, we carry out the details
of proofs in much more detail than is usually done.” And this isn’t anywhere
near as reader-friendly as they intend: expositions too often become wearyingly
laborious. Also the authors stick very closely to Gentzen’s own original papers,
which isn’t always the wisest choice. So, at least on topic areas (A) and (B), I
will be highlighting some alternatives.
(A) You could find that the following Handbook of the History of Logic article
gives some more helpful orientation:
4 Technical remark. There are no worries about using transfinite induction up to any ordinal
less than ε0 ; for this can be handled inside PA. So Gentzen’s proof calls on the least possible
extension to the amount of induction that can be handled inside PA!
Main recommendations on elementary proof theory
You could next tackle Chs 3 and 4 of IPL. But there’s a lot to be said for just
diving into the brisk opening chapters of a modern classic:
Von Plato’s book is, in fact, intended as a first introductory logic text, based
on natural deduction: but it, very unusually, has a strongly proof-theoretic em-
phasis. And non-mathematicians, in particular, could find the whole book very
helpful.
(B) Next, moving on to sequent calculi, you could start with Chs 5 and 6 of
IPL. But the following is very accessibly written, ranges more widely, and is
likely to prove quite a bit more enjoyable:
7. Sara Negri and Jan von Plato, Structural Proof Theory (CUP, 2001).
The first four chapters give us the basics. Ch. 1 helpfully bridges our
topics, ‘From natural deduction to sequent calculus’. Ch. 2 gives a sequent
calculus for intuitionistic propositional logic and proves the admissibility
of cut. Ch. 3 does the same for classical propositional logic. Ch. 4 adds
the quantifiers.
You might well want to then read on to Ch. 5 which illuminatingly
discusses some variant sequent calculi. Then you can jump to Ch. 8 which
takes us ‘Back to natural deduction’. This relates the sequent calculus to
natural deduction with general elimination rules, shows how to translate
between the two styles of logic, and then derives a normalization theorem
from the cut-elimination theorem: again this is very instructive.
Negri and von Plato note that, as we ‘permute cuts upward’ in a derivation
– in order to eventually arrive at a cut-free proof – the number of cuts
remaining in a proof can increase exponentially as we go along (though
the process eventually terminates). So a cut-free proof can be much bigger
than its original version. Pelletier and Hazen (4) in their §3.8 make some
interesting related comments about sizes of proofs. And you will certainly
want to read this famous short paper:
8. George Boolos, ‘Don’t eliminate cut’, reprinted in his Logic, Logic, and
Logic (Harvard UP, 1998).
And now, if you really want to know more (in particular about how Gentzen
originally arrived at his cut-elimination proof) you can make use of the relevant
IPL chapters, skipping over a lot of the tedious proof-details.
(C) Next, on Gentzen’s proof of the consistency of arithmetic. Von Plato (1)
and Rathjen and Sieg (2) both provide some context for Gentzen’s work. And
here’s a contemporary mathematician’s perspective on why we might be inter-
ested in the proofs of the consistency of PA:
9. Timothy Y. Chow, ‘The consistency of arithmetic’, The Mathematical
Intelligencer 41 (2019), 22–30. Available at tinyurl.com/chow-cons.
Now we have two options, as Rathjen and Sieg (2) makes clear. We can tackle
something like one of Gentzen’s own consistency proofs for PA; but we then have
to tangle with a lot of messy detail as we negotiate the complications caused
by having to deal with the induction axioms. Or alternatively we can box more
cleverly, and prove consistency for a theory PAω which swaps the induction
axioms for an infinitary rule. The proof uses the same overall strategy, but this
time its implementation is a lot less tangled (yet the proof still does the needed
job, since PAω ’s consistency implies PA’s consistency).
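To fix ideas, the infinitary rule standardly used here is the ω-rule, which licenses the inference to a universal quantification from all its infinitely many numeral instances at once:

```latex
\frac{\;A(\overline{0}) \qquad A(\overline{1}) \qquad A(\overline{2}) \qquad \cdots\;}
     {\forall x\, A(x)}\;(\omega)
```

A derivation in such a system is thus an infinitely-branching but still well-founded tree, and such trees are naturally measured by ordinals.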
There are a number of versions of the second line of proof in the literature.
There is quite a neat but rather terse version here, from which you should be
able to get the general idea (it assumes you know a bit about ordinals):
Some parallel/additional reading
Read Chapter 8 on ordinal notations first. Then the main line of proof is in
Chapters 7 and 9. Now, after an initial dozen pages saying something about
PA, these two chapters together span another sixty-five pages(!), and it is
consequently easy to get lost/bogged down in the details. And it is not as
if the discussion is padded out by e.g. a philosophical discussion about the
warrant for accepting the required amount of ordinal induction; the length
comes from hacking through more details than any sensible reader will want
or need.
However, if you have already tackled a modest amount of other mathe-
matical logic, you should by now have enough nous to be able to read these
chapters pausing over the key ideas and explanations while initially skip-
ping/skimming over much of the detail. You could then quite quickly and
painlessly end up with a very good understanding of at least the general
structure of Gentzen’s proof and of what it is going to take to elaborate it.
So I suggest first skimming through to get the headline ideas, and then doing
a second pass to get more feel for the shape of some of the details. You can
then drill down further to work through as much of the remaining nitty-gritty
as you feel you really want/need (which probably won’t be much!).
it looks if you compare how you prove the completeness of a tree system
of logic). Then tackle Ch. 2, §9 on Peano Arithmetic. You can skip the
next section on the incompleteness theorem, and skim §11 on ordinals
(which makes rather heavy weather of what’s really needed, which is the
claim that a decreasing series of ordinals less than ε0 can only be finitely
long: see p. 98 on). The core consistency proof is then given in §12; read
up to at least p. 114. This isn’t exactly plain sailing – but if you skip
and skim over some of the more tedious proof-details you should pick up
a good basic sense of what happens in the consistency proof.
12. Jean-Yves Girard, Proof Theory and Logical Complexity. Vol. I (Bib-
liopolis, 1987). With judicious skipping, which I’ll signpost, this is read-
able and insightful, though some proofs are a bit arm-waving.
So: skip the ‘Foreword’, but do pause to glance over ‘Background and
Notations’ as Girard’s symbolic choices need a little explanation. Then
the long Ch. 1 is by way of an introduction, proving Gödel’s two in-
completeness theorems and explaining ‘The Fall of Hilbert’s Program’:
if you’ve read some of the recommendations on arithmetic, you can prob-
ably skim this fairly quickly, though noting Girard’s highlighting of the
notion of 1-consistency.
Ch. 2 is on the sequent calculus, proving Gentzen’s Hauptsatz, i.e.
the crucial cut-elimination theorem, and then deriving some first conse-
quences (you can probably initially omit the forty pages of annexes to
this chapter). Then also omit Ch. 3 whose content isn’t relied on later.
But Ch. 4 on ‘Applications of the Hauptsatz ’ is crucial (again, however,
at a first pass you can skip almost 60 pages of annexes to the chap-
ter). Take the story up again with the first two sections of Ch. 6, and
then tackle the opening sections of Ch. 7. A rather bumpy ride but very
illuminating.
13. A. S. Troelstra and H. Schwichtenberg, Basic Proof Theory (CUP 2nd
ed. 2000).
This is a volume in the series ‘Cambridge Tracts in Computer Science’.
Now, one theme that runs through the book concerns the computer-
science idea of formulas-as-types and invokes the lambda calculus: how-
ever, it is in fact quite possible to skip over those episodes if (as is
probably the case) you aren’t yet familiar with the idea. The book, as
the title indicates, is intended as a first foray into proof theory, and it
is reasonably approachable. However it does spend quite a bit of time
looking at slightly different ways of doing natural deduction and slightly
different ways of doing the sequent calculus, and the differences may
matter more for computer scientists with implementation concerns than
for others.
You can, however, with a bit of skipping, at this stage very usefully
read just Chs. 1–3, the first halves of Chs. 4 and 6, and then Ch. 10 on
arithmetic again.
We will return to consider more advanced texts on proof theory in the final
chapter, §12.6.
10 Modal logics
Some basic modal logics
operator ◇ (so we read ◇A as it is possibly true that A). But, to keep things simple,
we won’t do that, since ◇A can equally well be treated as just a definitional
abbreviation for ¬□¬A. Reflect: it is possibly true that A iff A is true at some
possible world, iff it isn’t the case that A is false at all possible worlds, iff it isn’t
the case that ¬A is necessary. So the parallel between the equivalences ◇/¬□¬
and ∃w/¬∀w¬ is not an accident!
A third modal symbol you will come across is ⥽, for what is standardly called
‘strict implication’. But again, we can treat A ⥽ B as a definitional abbreviation,
this time for □(A → B).
Hence, following quite common practice, we will here take □ to be the sole
built-in modal operator in our languages.
(b) The story of modern modal logic begins with C. I. Lewis’s 1918 book A
Survey of Symbolic Logic. Lewis presents postulates for J, motivated by claims
about the proper understanding of the idea of implication, though unfortunately
his claims do seem pretty muddled.1 Later, in C. I. Lewis and C. H. Langford’s
1932 Symbolic Logic, there are further developments: the authors distinguish five
modal logics of increasing strength, which they label S1 to S5. But why multiple
logics?
Let’s take four schemas, and ask whether we should accept all their instances
when the □ is interpreted in terms of necessary truth:
1 The modern reader might well suspect confusion between ideas that we now demarcate by
using the distinguishing notations →, ⊢ and ⊨.
1. The basic ingredients we need are some objects W and a relation R defined
over them. For the moment, think of W as a collection of ‘possible worlds’
and then wRw′ will say that the world w′ is possible relative to w (or if
you like, w′ is an accessible possible world, as seen from w).
2. And we will pick out an object w0 from W to serve as the ‘actual world’.
Let’s say, for short, that a relational structure where the relation R satisfies the
condition S is an S-structure.
Next we define the idea of a valuation of L-sentences on an S-structure. The
story starts unexcitingly!
The only real novelty, as trailed at the outset, is in the treatment of the modal
operator □. We stipulate that □A is to be true at a world w if and only if A is
true at every world w′ such that wRw′.
Evidently, given (2′) and (3′), every valuation ends up assigning a value to each
L-wff A at each world.
Let’s say that an S-structure together with such a valuation for L-sentences
is an S-model for L. Then, continuing our list of definitions, when A is an L-
sentence,
4′. A is (simply) true in a given S-model for L if and only if A takes the value
true at the actual world w0 in the model.
So that sets up the general framework for a relational semantics for a propo-
sitional modal language. But we are now going to be interested in four different
particular versions got by filling out the specification S in different ways, and so
giving us four different notions of validity for propositional modal wffs:
(K) K-validity is defined in terms of K-models which allow any relation R (the
specification condition S is null).
(T) T-validity is defined in terms of T-models which require the relation R to
be reflexive.
(S4) S4-validity is defined in terms of S4-models which require the relation R
to be reflexive and transitive.
(S5) S5-validity is defined in terms of S5-models which require the relation R
to be reflexive, transitive and symmetric (i.e. R has to be an equivalence
relation).
As we will soon discover, the labels we have chosen are indeed significant!
(d) Let’s look at a couple of very instructive mini-examples. Take first the
following two-world model, with an arrow w −→ w′ depicting that wRw′, and
with the values of P at each world as indicated:
w0 −→ w1
P := F P := T
w0 −→ w1 −→ w2
P := T P := T P := F
(where each world also has an arrow back to itself, so R is reflexive)
Note, this is not only a K-model but also a T-model, because the diagrammed
accessibility relation R is reflexive; but it is not an S4-model since R is not
transitive (we have w0 Rw1 and w1 Rw2 but not w0 Rw2 ).
Now, in this model, □P is true at w0 (because P is true at both the accessible-
from-w0 worlds, i.e. at w0 and w1 ). But □P is false at w1 (because P is false at
the accessible-from-w1 world w2 ). And then since □P is false at w1 and w1 is
accessible from w0 , it follows that □□P is false at w0 . And hence in this model
□P → □□P is false (i.e. false at w0 ). Moral: the S4 principle can fail in models
where the accessibility relation is not transitive.
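For readers who like to check such things mechanically, here is a small sketch (my own illustration, directly transcribing the evaluation clauses) of the three-world model just discussed:

```python
# The three-world countermodel: w0 -> w1 -> w2, with R reflexive
# (each world sees itself) but not transitive, and P true at w0 and
# w1 but false at w2.

R = {("w0", "w0"), ("w0", "w1"),
     ("w1", "w1"), ("w1", "w2"),
     ("w2", "w2")}
P = {"w0": True, "w1": True, "w2": False}

def box_P(w):
    # []P holds at w iff P holds at every world accessible from w
    return all(P[v] for (u, v) in R if u == w)

def box_box_P(w):
    # [][]P holds at w iff []P holds at every world accessible from w
    return all(box_P(v) for (u, v) in R if u == w)
```

Evaluating at the actual world confirms the moral: `box_P("w0")` comes out true while `box_box_P("w0")` comes out false, so □P → □□P fails at w0.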
But we can also show the reverse – in other words, in models where the
accessibility relation is transitive, the S4 principle holds. For suppose R is
transitive and □A is true at w. If wRw′ and w′Rw″, then by transitivity wRw″,
so A is true at w″; hence □A is true at any w′ accessible from w, and so □□A
is true at w. In short, □A → □□A can never fail at a world when R is transitive.
So our two mini-examples very nicely make the connection between a structural
condition on models and the obtaining of a general modal principle such as T or
S4. More about this very shortly.
(e) Since our main concern here is with the formalities, we won’t delve into the
arguments about which specification conditions S appropriately reflect which
intuitive notions of necessity (though note that even the condition T can fail if
e.g. we want to model deontic necessities – i.e. necessities of duty: since what
ought to be the case may not in fact be the case!). We can leave it to the
philosophers to fight things out. For now, it might be more useful to pause to
summarize our semantic story in the style of our earlier account of intuitionistic
semantics in §8.3(c).
So, an S-structure is a triple (w0 , W, R) where W is a set, w0 ∈ W , and R is
a relation defined over W which satisfies the conditions S. Then an S-model for
A → □A: obviously, A can be true without being necessarily true. However, the
idea justifying (Nec) is that if A is actually a logical theorem – i.e. is deducible
from logical principles alone – then it will indeed be necessary (on almost any
sensible understanding of ‘necessary’). Here’s an example of the rule (Nec) in
use in a K-proof:
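For instance (this particular derivation is my own illustration), here is (Nec) at work, together with K instances and MP, in proving the K-theorem (□A ∧ □B) → □(A ∧ B):

```latex
\begin{array}{lll}
1. & A \rightarrow (B \rightarrow (A \wedge B)) & \text{tautology}\\
2. & \Box(A \rightarrow (B \rightarrow (A \wedge B))) & \text{from 1 by (Nec)}\\
3. & \Box(A \rightarrow (B \rightarrow (A \wedge B))) \rightarrow
     (\Box A \rightarrow \Box(B \rightarrow (A \wedge B))) & \text{K instance}\\
4. & \Box A \rightarrow \Box(B \rightarrow (A \wedge B)) & \text{2, 3, MP}\\
5. & \Box(B \rightarrow (A \wedge B)) \rightarrow (\Box B \rightarrow \Box(A \wedge B)) & \text{K instance}\\
6. & (\Box A \wedge \Box B) \rightarrow \Box(A \wedge B) & \text{4, 5, truth-functional logic}
\end{array}
```

The final step just uses MP with a suitable tautology-axiom chaining 4 and 5 together.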
In sum, then, all the theorems of the weak system K – i.e. all the wffs deducible
from axioms alone – should be logical truths on (almost all) readings of □ as
a kind of necessity.
And now here are three nested ways of strengthening the system K:
(T) T is the axiomatic system K augmented with all instances of the schema
T as axioms.
(S4) S4 is T augmented with all instances of the schema S4 as axioms.
(S5) S5 is S4 augmented with all instances of the schema S5 as axioms.
The readings will give lots of examples of these (or equivalent) proof systems in
action.
(g) So now at last for the big reveal – except of course I’ve entirely spoilt any
element of surprise by the parallel labelling of the flavours of modal semantics
and the flavours of axiomatic proof system!
What Kripke famously showed is the following lovely result: for each of our four
systems S (i.e. K, T, S4, S5), a wff is a theorem of S if and only if it is S-valid.
In short, we have soundness and completeness theorems for our proof systems.
And there are some nice immediate implications. Searching for an appropriate
countermodel which shows that a wff is not S-valid is a finite business, so it is
decidable what’s S-valid – and hence it is decidable what’s an S-theorem.2
These soundness and completeness results are not mathematically very dif-
ficult. Perhaps Kripke’s real achievement was the prior one in developing the
general semantic framework and in finding the required simple proof systems –
some of them different from any of the systems proposed by Lewis and Langford
– thereby making his very elegant result possible.
2 Suppose we define in the now obvious ways (i) the idea of a conclusion being an S-valid
consequence of some finite number of premisses, and (ii) the idea of that conclusion being
deducible in system S from those premisses. Then again we have soundness and weak
completeness proofs linking valid consequences with deductions, and we have corresponding
decidability results too. We won’t worry however about strong completeness, which does
indeed fail for some modal logics, e.g. for GL which we meet in the next section.
Provability logic
(h) And now, with the apparatus of relational semantics available, the flood-
gates really open! After all, the objects in an S-model don’t have to represent
‘possible worlds’ (whatever they are conceived to be); they can stand in for
any points in a relational structure. So perhaps they could represent states of
knowledge, points of a time series, positions in a game, states in the execution of
a program, levels in a hierarchy . . . with different classes of accessibility relations
appropriate for different cases and so with different deductive systems to match.
The resulting applications of propositional modal logics are indeed very many
and various, as you will see.
(i) And what about quantified modal logics, where we add the modal operator
□ to a first-order language? Why might we be interested in them?
Well, philosophers make play with questions like this: Does it make sense to
suppose the very same objects can appear in the domains of different possible
worlds? If it does, do all possible worlds contain the same objects (perhaps some
of them actualized, some not)? Does a proper name (formally a constant term)
denote the same thing at any possible world at which it denotes at all? Are
atomic identity statements, if true at all, necessarily true? Questions of this
stripe pile up, and they motivate different ways of tweaking quantified modal
logic in formally modelling and so clarifying the philosophical ideas.
However, the resulting logics don’t seem to be of particular interest to non-
philosophers; the wider logical community has (as yet) been much more inter-
ested in propositional modal logics.
Still, the beginnings of the technical story about first-order modal logics are
pretty accessible. And the suggested readings will enable you to get some head-
line news about different proof systems and their formal semantics, without
getting too entangled in unwanted philosophical debates!
If A is a wff of arithmetic, let ⌜A⌝ be shorthand for the formal numeral for
A’s Gödel number. Then, given our definitions, Prov(⌜A⌝) says that A is
provable in PA.
Now we introduce yet another bit of shorthand: let’s use ⊡A as a simple
abbreviation for Prov(⌜A⌝).4 With some effort, we can then show that PA proves
(unpacked versions of) all instances of the following familiar-looking schemas
K· ⊡(A → B) → (⊡A → ⊡B)
S4· ⊡A → ⊡⊡A
and that PA obeys the corresponding rule (Nec·): if PA proves A, it proves ⊡A.
That package of facts about PA is standardly reported by saying that the theory
satisfies the so-called HBL derivability conditions. And appealing to these facts
together with the First Incompleteness Theorem, it is then easy to derive the
Second Theorem that PA cannot prove ¬⊡⊥ (i.e. can’t prove that ⊥ isn’t
provable, i.e. can’t prove that PA is consistent).5
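To indicate the shape of one standard route to the Second Theorem (a compressed sketch; footnote 5’s reference gives full details): take a Gödel sentence G with PA ⊢ G ↔ ¬⊡G. Then, reasoning inside PA using the derivability conditions,

```latex
\begin{array}{ll}
\text{(i)} & \mathsf{PA} \vdash \boxdot G \rightarrow \boxdot\boxdot G
  \quad \text{(S4}\cdot\text{)}\\
\text{(ii)} & \mathsf{PA} \vdash \boxdot G \rightarrow \boxdot\neg\boxdot G
  \quad \text{(from } G \rightarrow \neg\boxdot G \text{ by Nec}\cdot\text{ and K}\cdot\text{)}\\
\text{(iii)} & \mathsf{PA} \vdash \boxdot\boxdot G \rightarrow
  (\boxdot\neg\boxdot G \rightarrow \boxdot\bot)
  \quad \text{(by Nec}\cdot\text{ and K}\cdot\text{ on a tautology)}\\
\text{(iv)} & \mathsf{PA} \vdash \boxdot G \rightarrow \boxdot\bot,
  \ \text{hence}\ \mathsf{PA} \vdash \neg\boxdot\bot \rightarrow G.
\end{array}
```

So if PA proved ¬⊡⊥ it would prove G, which (assuming PA’s consistency) the First Theorem rules out.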
(b) The obvious next question might well seem to be: what other modal princi-
ples/rules should our dotted-box-as-a-provability-predicate obey, in addition to
the dotted principles K· and S4·, and the rule (Nec·)? What is its appropriate
modal logic?
But hold on! We are getting ahead of ourselves, because we so far only have
the illusion of modal formulas here. The box as just defined simply doesn’t have
the right grammar to be a modal operator. Look at it this way. In a proper
modal language, the operator □ is applied to a wff A to give a complex wff □A
in which A appears as a subformula. But in our newly defined usage where ⊡A
is short for Prov(⌜A⌝), the formula A doesn’t appear as a subformula at all –
what fills the appropriate slot(s) in the predicate Prov is a numeral (the numeral
for the number which happens to code the formula A).
In short, the surface form of our notation ⊡A is entirely misleading as to its
logical form. Which is why the logically pernickety might indeed not be very
happy with the notation.
However, it remains the case that our abbreviatory notation is highly sugges-
tive. And what it suggests is starting with a kosher modal propositional language
of the kind now familiar from §10.1, where the box is genuinely a unary opera-
tor applied to wffs. And then we consider arithmetical interpretations which
map sentences A of our modal language to corresponding sentences A∗ of PA,
interpretations which have the following shape:
4 I’ve dotted the box here – not the usual notation – for clarity’s sake!
5 For more details, if this is new to you, see for example Chapter 33 of my An Introduction
to Gödel’s Theorems (downloadable from logicmatters.net/igt).
ii. The map then respects the propositional connectives: for example, it sends
conjunctions in the modal language to conjunctions in the arithmetic lan-
guage, so (A ∧ B)∗ is (A∗ ∧ B ∗ ); it sends the absurdity constant to the
absurdity constant, i.e. ⊥∗ is ⊥; and so on.
iii. The map sends the modal sentence □A to ⊡A∗ , i.e. to Prov(⌜A∗⌝).
We will presumably want to reflect this theorem – Löb’s theorem, which tells
us that if PA proves ⊡A → A then PA proves A – in a logic for the genuinely
modal operator □ interpreted as arithmetical provability: a natural move, then,
is to build into our modal logic the rule that, if □A → A is deducible as a
theorem, then we can infer A.
So, putting this thought together with our previous remarks, let’s consider
the following modal logic – the ‘G’ in its name is for Gödel who made some
prescient remarks, and the ‘L’ is for Löb:
(GL) The modal axiomatic system GL is the theory whose axioms are
(Ax i) All instances of tautologies
(Ax ii) All instances of the schema K: □(A → B) → (□A → □B)
(Ax iii) All instances of the schema S4: □A → □□A
And whose rules of inference are
(MP) From A and A → B, infer B
(Nec) If A is deducible as a theorem, infer □A
(Löb) If □A → A is deducible as a theorem, infer A.
You can immediately see, by the way, that we don’t also want to add all instances
of the T-schema □A → A to this modal logic. For a start, doing that would
make □⊥ → ⊥ a theorem and hence ¬□⊥ would be a theorem. But that can’t
correspond on arithmetic interpretation to a theorem of PA, since we know that
PA can’t prove ¬⊡⊥.
And there’s worse: leaving aside the desired interpretation of this logic, if we
add all instances of □A → A as axioms, then in the presence of the rule (Löb),
we can derive any A, and the logic is inconsistent.
Now, given our motivational remarks in defining GL, it won’t be a surprise
to learn that it is indeed sound on the provability interpretation. Once we have
done the (non-trivial!) background work required for showing that the HBL
derivability conditions and hence Löb’s theorem hold in PA, it is quite easy to
go on to establish that, on every interpretation of the modal language into the
language of arithmetic, every theorem of GL is a theorem of PA.
And (with more decidedly non-trivial work due to Robert Solovay) it can also
be shown that GL is complete on the provability interpretation. In other words,
if a modal sentence is such that every arithmetic interpretation of it is a PA
theorem, then that sentence is a theorem of the modal logic GL.
Which is all very pleasingly neat!
(d) We should pause to note that there is another way of presenting this prov-
ability logic.
Suppose we drop the Löb inference rule from GL, and replace the instances
of the S4 schema as axioms with instances of the Löb-like schema
(L) □(□A → A) → □A
It is then quite easy to see that this results in a modal logic with exactly the
same theorems (because GL in our original formulation implies all instances of
L; and conversely we can show that all instances of S4 can be derived in the new
formulation, for which the Löb rule is also a derived rule of inference). Hence
either formulation gives us the provability logic for PA.
(e) Now, we’ve so far been working with arithmetic interpretations of our modal
wffs. But we can also give a more abstract Kripke-style relational semantics for
GL (it is a nice question, though, whether this ‘semantics’ has much to do
with meaning!). We start by defining a GL-model in the usual sort of way as
comprising a valuation with respect to some worlds W with a relation R defined
over them, where R satisfies . . .
Well, what conditions do we in fact need to place on R so that GL-theorems
match with the GL-validities (the truths that hold at every world, for every GL-
model)? Clearly, we mustn’t require R to be reflexive – or else all instances of
the T-schema would come out GL-valid, and we don’t want that. Equally clearly,
we must require R to be transitive – or else instances of the S4-schema could
fail to be GL-valid. But we need more: what further condition on R is required
to make all the instances of the L-schema come out valid?
It turns out that what is needed is that there is no infinite chain of R-related
worlds w0, w1, w2, w3, . . . such that w0Rw1Rw2Rw3 . . . (and that condition en-
sures that R is irreflexive, for otherwise we would have some infinite chain
wRwRwRw . . .). Call that the finite chain condition. Then define a GL-model as
one where the accessibility relation R is transitive and satisfies the finite chain
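The claimed behaviour can be checked by brute force on a small frame. Here is a quick sketch (in Python; the four-world frame and the single atom p are my own illustrative choices, not from the text): on a finite transitive frame with no infinite R-chains, every instance of the L-schema □(□p → p) → □p holds at every world, while the T-schema □p → p can fail.

```python
from itertools import product

# A tiny transitive frame with no infinite R-chains: worlds 0..3, wRv iff w < v
W = range(4)
R = {(u, v) for u in W for v in W if u < v}

def box(a):
    # a maps each world to a truth value for some wff; box(a) does so for the boxed wff
    return {w: all(a[v] for (u, v) in R if u == w) for w in W}

def implies(a, b):
    return {w: (not a[w]) or b[w] for w in W}

# L-schema: box(box p -> p) -> box p holds at every world,
# whatever truth values the atom p takes at the four worlds
for vals in product([False, True], repeat=len(W)):
    p = dict(zip(W, vals))
    l_schema = implies(box(implies(box(p), p)), box(p))
    assert all(l_schema.values())

# T-schema: box p -> p fails at the R-maximal world 3,
# where box p is vacuously true but p can be false
p = {0: True, 1: True, 2: True, 3: False}
assert not implies(box(p), p)[3]
```

The same brute-force style of check extends to other schemas and other small frames.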
First readings on modal logic
5. Johan van Benthem, Modal Logic for Open Minds (CSLI Publications,
2010). This ranges widely and is good at highlighting main ideas and
making cross-connections with other areas of logic. Particularly interest-
ing and enjoyable to read in parallel with the main recommendations.
6. Rineke Verbrugge, ‘Provability logic’ §§1–4 and perhaps §6, The Stanford
Encyclopedia of Philosophy. Available at tinyurl.com/prov-logic.
Alternative and further readings on modal logics
Or you could dive straight into the very first published book on our topic, which
I think still makes for the most attractive entry-point:
However, this seems to be one of the very few distinguished mathematical logic
books which is not readily available online. So I need to also mention
Then, for more pointers towards recent work on related topics you could look at
§5 of Verbrugge’s article and/or at the following interesting overview:
philosophers could indeed find it useful. For example, the chapters on quanti-
fied modal logic (and some of the conceptual issues they raise) are brief and
approachable.
Sider is, however, closely following a particularly clear old classic by G. E.
Hughes and M. J. Cresswell A New Introduction to Modal Logic (Routledge,
1996, updating their much earlier book). This can still be recommended and
may suit some readers, though it does take a rather old-school approach.
If your starting point has been Priest's book or Fitting/Mendelsohn, then you
might want at some point to supplement these by looking at a treatment of
natural deduction proof systems for modal logics. One option is to dip into Tony
Roy’s long article ‘Natural derivations for Priest’, in which he provides ND logics
corresponding to the propositional modal logics presented in tree form in Priest’s
book, though this gets much more detailed than you really need: available at
tinyurl.com/roy-modal. But a smoother introduction to ND modal systems is
provided by Chapter 5 of Girle, or by my main alternative recommendation for
philosophers, namely
11. James W. Garson, Modal Logic for Philosophers* (CUP, 2006; 2nd edn.
2014). This again is intended as a gentle introductory book: it deals with
both ND and semantic tableaux (trees), and covers quantified modal
logic. It is quite a long book (one reason for preferring the snappier Fit-
ting/Mendelsohn as a first recommendation), with quite a lot on quan-
tified modal logics: and it is indeed pretty accessible.
(b) Modal logics for philosophical applications If you are interested in appli-
cations of propositional modal logics to tense logic, epistemic logic, deontic logic,
etc. then the relevant chapters of Girle’s book give helpful pointers to more read-
ings on these topics. If your interests instead lean to modal metaphysics, then
– once upon a time – a discussion of quantified modal logic at the level of Fit-
ting/Mendelsohn or Garson would have probably sufficed. And for a bit more
on first-order quantified modal logics, see
Finally, a very little history
presentation will probably put the discussion out of reach of most philosophers
who might be interested. You have been warned.
(c) Four more technical books In order of publication, here are four more ad-
vanced and rather more challenging texts I can suggest to sufficiently interested
readers:
13. Sally Popkorn, First Steps in Modal Logic (CUP, 1994). The author is,
at least in this possible world, identical with the late mathematician
Harold Simmons. This book, which is entirely on propositional modal logics,
is written for computer scientists. The Introduction rather boldly
says ‘There are few books on this subject and even fewer books worth
looking at. None of these give an acceptable mathematically correct ac-
count of the subject. This book is a first attempt to fill that gap.’ This
considerably oversells the case: but the result is illuminating and read-
able.
14. Alexander Chagrov and Michael Zakharyaschev Modal Logic (OUP, 1997).
This is a volume in the Oxford Logic Guides series and again concentrates
on propositional modal logics. Definitely written for the more mathemat-
ically minded reader, it tackles things in an unusual order, starting with
an extended discussion of intuitionistic logic, and is good but rather
demanding.
15. Patrick Blackburn, Maarten de Rijke and Yde Venema's Modal Logic
(CUP, 2001). This is one of the Cambridge Tracts in Theoretical Com-
puter Science: but don’t let that provenance put you off! This is an
accessibly and agreeably written text on propositional modal logic – cer-
tainly compared with the previous two books in this group – with a lot
of signposting to the reader of possible routes through the book, and
with interesting historical notes. I think it works pretty well, and will
also give philosophers an idea about how non-philosophers can make use
of propositional modal logic.
16. Lloyd Humberstone, Philosophical Applications of Modal Logic* (Col-
lege Publications, 2015). This very large volume starts with a
book-within-a-book, an advanced 176 page introduction to propositional
modal logics. And then there are extended discussions at a high level of
a wide range of applications of these logics that have been made by
philosophers. A masterly compendium to consult as/when needed.
18. Sten Lindström and Krister Segerberg, ‘Modal logic and philosophy’ §1,
in The Handbook of Modal Logic, edited by P. Blackburn et al. (Elsevier,
2005).
11 Other logics?
i. One limitation of FOL is that we can only quantify over objects, as op-
posed to properties, relations and functions. Yet seemingly, we quantify
over properties etc. in informal mathematical reasoning. In Chapter 4, we
therefore considered adding second-order quantifiers. (This is just a first
step: there is a rich mathematical theory of higher-order logic, a.k.a. type
theory, which you will eventually want to explore – but I deem that to be
a more advanced topic, so we will return to it in the final chapter, §12.7.)
iii. Then in Chapter 10 we explored the use of the kind of relational semantics
we first met in the context of intuitionistic logic, but now in extending
FOL with modal operators. Again, the development on the formal side
is mathematically quite elegant: and some modal logics – in particular,
provability logic – have worthwhile mathematical applications.
And now, what other exhibits from the wild jungle of variants and/or extensions
of standard FOL are equally worth knowing about at this stage, as you begin
studying mathematical logic? What other logics are intrinsically mathematically
interesting, have significant applications to mathematical reasoning, but can be
reasonably regarded as entry-level topics?
A good question. In this chapter, I’ll be looking at three relatively accessible
variant logics that philosophers in particular have discussed, namely relevant
logic, free logic and plural logic. And – spoiler alert! – I’m going to be suggesting
that mathematical logicians can cheerfully pass by the first, should have a fleeting
acquaintance with the second, and might like to pause a bit longer over the third.
A and for any quite unconnected C – and correspondingly, in proof systems for
FOL, we can argue from the premisses A and ¬A to the arbitrary conclusion C.
But should we really count arguments as valid even when, as in this sort of case,
the premisses are totally irrelevant to the conclusion? Shouldn’t our formal logic
respect the intuitive idea – arguably already in Aristotle – that a conclusion in
a valid deduction must have something to do with the premisses?
Debates about this issue go back at least to medieval times. So let’s ask: what
might a suitable relevance-respecting logic look like? Is it worth the effort to use
such a logic?
(b) When we very first encounter it in Logic 101, the claim that A and ¬A
together entail any arbitrary conclusion C indeed initially seems odd. But we
soon learn that this result follows immediately from seemingly uncontentious
assumptions. Consider, in particular, these two principles:
Disjunctive syllogism is valid. From A ∨ C and ¬A we can infer C.
Entailment is transitive. In the simplest case, if A entails B and B
entails C, then A entails C. More generally, if Γ and ∆ stand in for
zero or more premisses, then if Γ entail B and ∆, B entail C, then
Γ, ∆ entail C.
These indeed seem irresistible. Disjunctive syllogism is a principle we use all the
time in informal arguments (everyday ones and mathematical ones too). If we’ve
established that one of two options must hold, and can then rule out the first, this
surely establishes the second. And the transitivity of entailment is what allows
us to chain together shorter valid proofs to make longer valid proofs: reject it,
and it seems that the whole practice of proof in mathematics would collapse.
But now take the following three arguments:

    P            P ∨ Q    ¬P            P
  ─────          ───────────          ─────
  P ∨ Q               Q               P ∨ Q    ¬P
                                      ───────────
                                           Q
The first just reflects our understanding of inclusive disjunction. The second is
the simplest of instances of disjunctive syllogism. The third argument chains
together the first two and, since they are valid entailments, this too is valid
according to the transitivity principle. So we have shown that P and ¬P entail
Q. And of course, we can generalize. In the same way, we can get from any pair
of premisses A and ¬A to an arbitrary conclusion C.
We have just three options, then:
1. Reject disjunctive syllogism as a universally valid principle (or at least, reject
disjunctive syllogism for the kind of disjunction for which the inference
A so A ∨ C is uncontentiously valid).
2. Reject the transitivity of entailment as a universally valid principle.
3. Bite the bullet, and accept what is often called ‘explosion’, the principle
that from contradictory premisses we can infer anything at all.
Relevant logics
The large majority of logicians take the first two options to be entirely unpalat-
able. So they conclude that we should indeed, as in standard FOL, learn to live
with explosion. And where’s the harm in that? After all, the explosive inference
can’t actually be used to take us from jointly true premisses to a false conclusion!
Still, before resting content with the explosive nature of FOL, perhaps we
should pause to see if there is any mileage in either option (1) or option (2).
What might a paraconsistent logic – one with a non-explosive entailment relation
– look like?
(c) Logicians are an ingenious bunch. And it isn't difficult to cook up a formal
system for e.g. a propositional language equipped with connectives written ∧, ∨
and ¬, for which analogues of disjunctive syllogism and explosion don’t generally
hold.
For example, suppose we adopt a natural deduction system with the usual
introduction and elimination rules for ∧ and ∨ (as in §8.1). But the additional
rules governing negation are now just De Morgan's Laws and a double negation
rule (the double inference lines indicate that you can apply the rules both top
to bottom and also the other way up).

         ¬(A ∧ B)                 ¬(A ∨ B)                 ¬¬A
(¬∧)    ══════════       (¬∨)    ══════════       (¬¬)    ═════
         ¬A ∨ ¬B                  ¬A ∧ ¬B                   A
The resulting logic is standardly called FDE for reasons that needn’t delay us.
And a little experimentation should convince you that, with only the FDE rules
in place, we can’t warrant either disjunctive syllogism or explosion.
But so what? By itself, the observation that dropping some classical rules stops
you proving some classical results has little interest. Compare the intuitionist
case, for example. There we are given a semantic story (the BHK account of the
meaning of the connectives) which aims to justify dropping the classical double
negation law. Can we similarly give a semantic story here which would again
justify dropping some classical rules and this time only underpin FDE ?
(d) Suppose – just suppose! – we think that there are four truth-related values
a proposition can take. Label these values T, B, N, F. And suppose that, given
an assignment of such values to atomic wffs, we compute the values of complex
wffs using the following tables:
A∧B │ T  B  N  F        A∨B │ T  B  N  F        A │ ¬A
────┼───────────        ────┼───────────        ──┼───
 T  │ T  B  N  F         T  │ T  T  T  T        T │  F
 B  │ B  B  F  F         B  │ T  B  T  B        B │  B
 N  │ N  F  N  F         N  │ T  T  N  N        N │  N
 F  │ F  F  F  F         F  │ T  B  N  F        F │  T
These tables are to be read in the obvious way. So, for example, if P takes the
value B, and Q takes the value N, then P ∧ Q takes the value F, P ∨ Q takes the
value T, and ¬P takes the value B.
Suppose in addition that we define a quasi-entailment relation as follows: some
premisses Γ entail∗ a given conclusion C – in symbols Γ ⊨∗ C – just if, on any
valuation which makes each premiss either T or B, the conclusion is also either
T or B.
Then, lo and behold, we can show that FDE is sound and complete for this
semantics – we can derive C in FDE from premisses Γ if and only if Γ ⊨∗ C. And
note, as we wanted, the analogue of disjunctive syllogism is not always a correct
entailment∗: on the same suggested valuation, both P ∨ Q and ¬P are either T
or B, while Q is N, so P ∨ Q, ¬P ⊭∗ Q. And we don't always get explosion either:
since both P and ¬P are B while Q is N, it follows that P, ¬P ⊭∗ Q.
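If you want to experiment, the tables and the definition of entailment∗ are easily mechanized. Here is a quick sketch (in Python; the encoding and function names are mine, but the tables are exactly those displayed above, with disjunction obtained from conjunction by De Morgan duality):

```python
from itertools import product

T, B, N, F = "T", "B", "N", "F"
VALS = [T, B, N, F]

NEG = {T: F, B: B, N: N, F: T}
# Conjunction table, row value first: e.g. CONJ[(B, N)] == F
CONJ = {(T, T): T, (T, B): B, (T, N): N, (T, F): F,
        (B, T): B, (B, B): B, (B, N): F, (B, F): F,
        (N, T): N, (N, B): F, (N, N): N, (N, F): F,
        (F, T): F, (F, B): F, (F, N): F, (F, F): F}
# Disjunction via De Morgan: A ∨ B = ¬(¬A ∧ ¬B); this reproduces the table above
DISJ = {(a, b): NEG[CONJ[(NEG[a], NEG[b])]] for a in VALS for b in VALS}

DESIGNATED = {T, B}  # 'at least true'

def entails(premisses, conclusion):
    # Γ ⊨* C: every valuation of the atoms P, Q which makes all the
    # premisses designated also makes the conclusion designated
    for pv, qv in product(VALS, repeat=2):
        v = {"P": pv, "Q": qv}
        if all(f(v) in DESIGNATED for f in premisses) and conclusion(v) not in DESIGNATED:
            return False
    return True

P     = lambda v: v["P"]
Q     = lambda v: v["Q"]
notP  = lambda v: NEG[v["P"]]
PorQ  = lambda v: DISJ[(v["P"], v["Q"])]
PandQ = lambda v: CONJ[(v["P"], v["Q"])]

assert DISJ[(B, N)] == T              # the worked example in the text
assert not entails([PorQ, notP], Q)   # disjunctive syllogism fails
assert not entails([P, notP], Q)      # explosion fails
assert entails([PandQ], P)            # but e.g. ∧-elimination is fine
```

The valuation in the text – P taking the value B, Q taking the value N – is among the counterexamples the search turns up.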
Which is all fine and good in the abstract. But what are these imagined
four truth-related values? Can we actually give some interpretation so that our
tables really do have something to do with truth and falsity, with negation,
conjunction and disjunction, and so that entailment∗ does arguably become a
genuine consequence relation?
Well, suppose – just suppose! – that propositions can not only be plain true
or plain false but can also be both true and false at the same time, or neither
true nor false. Then there will indeed be four truth-related values a proposition
can take – T (true), B (both true and false), N (neither), F (false).
And, interpreting the values like that, the tables we have given arguably re-
spect the intuitive meaning of the connectives. For example, if A is both true
and false, the same should go for ¬A. While if A is both true and false, and B is
neither, then A ∨ B is true because its first disjunct is, but it isn’t also false as
that would require both disjuncts to be false (or so we might argue). Similarly
for the other table entries. Moreover, the intuitive idea of entailment as truth-
preservation is still reflected in the definition of entailment∗, which says that if
the premisses are all true (though maybe some are false as well), the conclusion
is true (though maybe false as well).
(e) What on earth can we make of this supposition that some propositions are
both true and false at the same time? At first sight, this seems simply absurd.
However, a vocal minority of philosophers do famously argue that while, to
be sure, regular sentences are either true or false but not both, there are certain
special cases – e.g. the likes of the paradoxical liar sentence ‘This sentence is
false’ – which are both true and false.
It is fair to say that rather few are persuaded by this extravagant suggestion.
But let’s go along with it just for a moment. And now note that it isn’t immedi-
ately clear that this really helps. For suppose we do countenance the possibility
that certain special sentences have the deviant status of being both true and
false (or being neither). Then we might reasonably propose to add to our formal
logical apparatus an operator ‘!’ to signal that a sentence is not deviant in that
way, an operator governed by the following table:
A │ !A
T │  T
B │  F
N │  F
F │  T
Why not? But then it is immediate that !P, P, ¬P ⊨∗ Q. And similarly, if (say)
P and Q are the atoms present in A, then !P, !Q, A, ¬A ⊨∗ C always holds.
So, if built out of regular atoms (expressing ordinary non-paradoxical claims),
a contradictory pair entails∗ anything. Yet surely, if we were seriously worried
by the original version of explosion, then this modified form will be no more
acceptable.
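A four-line check confirms the point (a sketch in Python; the encoding is mine): no four-valued assignment makes !P, P and ¬P all designated at once, so the entailment∗ holds vacuously.

```python
T, B, N, F = "T", "B", "N", "F"
NEG  = {T: F, B: B, N: N, F: T}
BANG = {T: T, B: F, N: F, F: T}   # !A: A is 'regular', neither both nor neither
DESIGNATED = {T, B}

# !P designated forces P to be T or F; P designated then forces P = T;
# but then ¬P = F is not designated. So the premisses !P, P, ¬P are never
# jointly designated, and !P, P, ¬P ⊨* Q holds whatever value Q takes.
for p in [T, B, N, F]:
    assert not (BANG[p] in DESIGNATED and p in DESIGNATED and NEG[p] in DESIGNATED)
```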
(f) We said that most logicians bite the bullet, and accept explosion because
they deem it harmless. But are they right?
It seems fundamental to a conditional connective → that it obeys the principle
of conditional proof. In other words, if the set of premisses Γ plus the temporary
assumption A together entail C, that shows that Γ entails A → C. But then
suppose we do accept the explosive inference from ¬A and A to C. Applying
conditional proof, we will have to agree that given ¬A, it follows that A → C,
for any unrelated consequent C, however irrelevant. And this, some will say, is
just the unacceptable face of the classical (or indeed intuitionistic) conditional:
so we should indeed reject explosion, not just for its prima facie oddity, but also
to get a nice conditional.
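The argument can be set out as a short derivation (a sketch of the reasoning just given, with the indented lines under the temporary assumption):

```latex
\begin{align*}
1.\quad & \neg A      && \text{premiss}\\
2.\quad & \quad A     && \text{temporary assumption}\\
3.\quad & \quad C     && \text{from 1, 2 by explosion}\\
4.\quad & A \to C     && \text{from 2--3 by conditional proof, discharging the assumption}
\end{align*}
```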
Now, if you have learnt to live happily with the standard conditional of classi-
cal or intuitionistic logic as an acceptable regimentation for serious mathematical
purposes, then you won’t be much moved by this argument. But what if you do
want to add a conditional connective where the inference from ¬A to A → C
generally fails?
Within an FDE -like framework, we can play with four-valued tables again,
now for the connective →. But on the more plausible ways of doing this, we
will still have !P, ¬P ⊨∗ P → Q; and more generally, for wffs built out of regular
atoms, the conditional is just the material conditional again. So again, if we were
worried about the material conditional before, we should surely stay worried
about this sort of four-valued replacement.
(g) Let’s very briefly take stock.
We can run up proof systems like FDE which lack disjunctive syllogism and
explosion and where ¬A doesn’t imply A → C. Further, we can give these sys-
tems what looks like a semantics e.g. using four values (or alternatively we could
use Kripke-style valuations over some relational structure). But if this exercise
isn’t just to be an abstract game, then we do need to tell a story about how to
interpret the formal ‘semantics’ in order to link everything up with considera-
tions about truth and falsity and inference. And as we see in the initial case of
FDE, the supposed linkage can embroil us with highly implausible claims (e.g.
some propositions can be true and false – really?). Moreover, while our resulting
logic may not be classical overall, if we are allowed to distinguish regular true-or-
false propositions from those that behave deviantly according to the enhanced
semantic story, then in its application to the regular propositions, the new logic
can simply collapse back into classical logic again (with an entailment relation
and a conditional that don’t respect worries about relevance).
So already the price of avoiding explosion by rejecting disjunctive syllogism
Readings on relevant logic
Tennant advertises his proof system as core logic (actually there are two ver-
sions, one classical and one intuitionistic). His claim is that these systems indeed
capture the core of what we need in mathematical and scientific reasoning (classical
or constructive), without some of the unwanted extras. However, to avoid explo-
sion re-appearing, the operations of Tennant’s natural deduction system for his
core logic are inevitably subject to additional constraints on things like vacuous
discharge, as compared with the more free-wheeling proof-structures allowed in
standard systems for classical or intuitionistic systems. See the reading for more
details.
So here’s the obvious next question: is the occasional potential epistemic gain
from requiring proofs to obey the strictures of ‘core logic’ actually worth the
additional effort of strictly following its rules? A judgement call, of course. But
most mathematical logicians are going to return a negative verdict and, despite
Tennant's energetic advocacy, feel quite comfortable on cost-benefit grounds
about sticking with their familiar ways.
If you just want to know what it takes to get a relevance-respecting logic by the
route of semantic revisionism, these two pieces should suffice. You may well then
quickly decide that you don’t want to pay the price, being happy to accept the
verdict of e.g.
3. John Burgess, ‘No requirement of relevance’, in S. Shapiro, ed., The
Oxford Handbook of the Philosophy of Mathematics and Logic (OUP,
2005). (Initially, you can skip the later pages of §3, on Tennant.)
If, however, you are tempted to explore further, this is a terrific resource,
already familiar from the recommended readings on modal logic:
4. Graham Priest, An Introduction to Non-Classical Logic* (CUP, 2nd edi-
tion 2008). As we said before, this treats a whole range of logics system-
And, taking a step up in level, here is the same author again vigorously making
the case for taking paraconsistent logics seriously:
You could also follow up Mares’s SEP article by taking a look at his book:
However, I for one am unpersuaded and remain on Burgess’s side of the debate,
at least as far as relevance-via-semantic-revisionism is concerned.
Going now in a very different direction, I mentioned in the previous section
Tennant’s idea of instead buying a certain amount of relevance by restricting
the transitivity of entailment. For a very lucid introductory account, see
8. Neil Tennant, Core Logic (OUP, 2017). This tour-de-force is a rich book,
very well worth reading for its many more general proof-theoretic in-
sights, even if at the end of the day you don’t want to buy the relevantist
aspects.
In the final chapter, by the way, Tennant responds to the technical challenges
laid down by Burgess in §3 of his paper.
Free Logic
And how might the defender of our standard FOL logic reply?
Readings on free logic
The debate, all too predictably, will continue. But we have perhaps said enough
to explain why the usual view is that, particularly for the purposes of regimenting
mathematical reasoning, it is quite defensible to stick with a standard logic
(classical or intuitionist) which relies on the presumption that we aren’t talking
about nothing at all. See the reading, though, for how to give an inclusive version
of FOL which allows empty domains, if you do want one.
(b) Introductory subsections on free logic proper to be added!
But for a more detailed overview, you want the very helpful
Or even better:
1. Salvatore Florio and Øystein Linnebo, The Many and the One (OUP 2021),
Chapter 2, ‘Taking plurals at face value’.
This is particularly lucid and helpful (though it would have been good to have,
perhaps as an appendix, a full-on, all-the-bells-and-whistles statement of the
rules for the natural deduction systems PFO and PLO+ , rather than a slightly
hands-off description, together with an axiomatic version too).
From the many papers which Linnebo mentions, if I have to choose two as
worth reading here at the outset, I’d perhaps pick these classics:
Readings on plural logic
Boolos’s paper is an influential early defence of the idea that taking plurals
seriously is logically important. Oliver and Smiley reinforce the point that there
is indeed a real topic here: you can’t readily eliminate all plural talk and plural
reasoning in favour e.g. of singular talk and reasoning about sets.
But where now? The book on Plural Predication by Thomas McKay (OUP
2006) is worth reading by philosophers for its discussion of non-distributive pred-
icates, plural descriptions etc. Then for logicians, there is the philosophically
argumentative, occasionally tendentious, and formally very rich tour de force
5. Alex Oliver and Timothy Smiley, Plural Logic (OUP 2013: revised and ex-
panded second edition, 2016).
However, Oliver and Smiley’s eventual logical system in their Chapter 13, ‘Full
plural logic’, will strike many as having (so to speak) unnecessarily many mov-
ing parts, as they aim – all at once – to accommodate empty domains, empty
names, a plural description operator, partial functions, multivalued functions,
even ‘copartial functions’ (which supposedly map nothing to something).
Oliver and Smiley, among others, make quite bold claims for plural logic. For
a critical look at such claims of defenders of plural logic, this is readable and
interesting:
6. Salvatore Florio and Øystein Linnebo, The Many and the One (OUP 2021),
Chapter 3 onwards. According to the blurb, this “provides a systematic anal-
ysis of the relation between this logic and other theoretical frameworks such
as set theory, mereology, higher-order logic, and modal logic. The applications
of plural logic rely on two assumptions, namely that this logic is ontologically
innocent and has great expressive power. These assumptions are shown to be
problematic.”
In particular, the argument – which applies already to simple systems like PLO
– is that the sort of comprehension principle which is built into plural logics is
problematic. Florio and Linnebo propose circumscribing comprehension.
Their book is approachable and argumentative. I in fact think some of Florio
and Linnebo’s arguments are resistible: see my comments on the first two parts
of the book, tinyurl.com/many-one. But well worth reading.
12 Going further
This has been a Guide to beginning mathematical logic. So far, then, the sug-
gested readings on different areas have been at entry level, or only a step or so
up from that. In this final chapter, by contrast, we take a look at some of the
more advanced literature on a selection of topics, taking us another step or two
further.
If you have been tackling enough of the introductory readings, you should
in fact be able to now follow your interests wherever they lead, without really
needing help from this chapter. For a start, you can explore the many mathemat-
ical logic entries in The Stanford Encyclopedia of Philosophy, which are mostly
excellent and have large bibliographies. The substantial essays in the eighteen(!)
volumes of The Handbook of Philosophical Logic are of varying quality, but there
are some good ones on straight mathematical logic topics, again with large bibli-
ographies. Internet sites like math.stackexchange.com and the upper-level math-
overflow.net can be searched for useful lists of recommended books. And then
there is always Google!
However, those resources do cumulatively point to a rather overwhelming
range of literature to pursue. So perhaps some readers will still appreciate a few
more limited menus of suggestions (even if they are less systematic and more
shaped by my personal interests than in the core Guide).
Of course, the ‘vertical’ divisions between entry-level coverage and the further
explorations in this chapter are pretty arbitrary; and the ‘horizontal’ divisions
into different subfields can in places also be quite blurred. But we do need to
impose some organization! So this chapter is divided up as follows. First, we
make a very brief foray into logic-relevant algebra:
There follows a series of sections taking up the core topics of Chapters 5–7 and 9
in the same order as before:
A very little light algebra for logic?
Then there is a final section which introduces a further topic area which is the
focus of considerable recent interest:
1. Barbara Hall Partee, Alice G. B. ter Meulen, and Robert Eugene Wall,
Mathematical Methods in Linguistics (1990, Springer). The (short!) Chs.
9 and 10 introduce some basic concepts of algebra (you can omit §10.3);
Ch. 11 is on lattices; Ch. 12 is then on Boolean and Heyting algebras,
and briefly connects Kripke’s relational semantics for intuitionistic logic
to Heyting algebras.
Then, for rather more about Boolean algebras, you need very little background
to start tackling the opening chapters of
If you already know a smidgin of algebra and topology, however, then there
is a faster-track introduction to Boolean algebras in
4. René Cori and Daniel Lascar, Mathematical Logic, A Course with Exer-
cises: Part I (OUP, 2000), Chapter 2.
And for a higher-level treatment of intuitionistic logic and Heyting algebras, you
could read Chapter 5 of the book by Dummett mentioned in §8.5, or work up
to Chapter 7 on algebraic semantics in the book on modal logic by Chagrov and
Zakharyaschev mentioned in §10.5.
Then, if you want to pursue more generally e.g. questions about when propo-
sitional logics do have nice algebraic counterparts (in the sort of way that classi-
cal and intuitionistic logic relate respectively to Boolean and Heyting Algebras),
then you might get something out of Ramon Jansana’s ‘Algebraic propositional
logic’ in The Stanford Enclyclopedia of Philosophy, tinyurl.com/alg-logic. But this
does strike me as too rushed to be particularly useful. So instead, you could make
a start reading
Now, we noted before in §§3.6(c) and 5.3 that the wide-ranging mathematical
logic texts by Hedman and Hinman cover a substantial amount of model theory.
But why not look at two classic stand-alone treatments of the area which really
choose themselves? In order of both first publication and eventual difficulty:
My suggestion would be to read the first three long chapters of Chang and
Keisler, and then perhaps pause to make a start on
At this point read the first five chapters for a particularly clear intro-
duction.
You could then return to Ch. 4 of C&K to look at (some of) their treatment of
the ultra-product construction, before perhaps putting the rest of their book on
hold and turning to Hodges.
(b) A level up again, here are two further books that should definitely be
mentioned. The first has been around long enough to have become regarded as
a modern standard text. The second is a bit more recent but also comes widely
recommended. Their coverage is significantly different – so I suppose that those
wanting to get really seriously into model theory should take a look at both:
of the book. But others have recommended this text more warmly, so I
mention it as a possibility worth checking out.
8. Bruno Poizat’s A Course in Model Theory (English edition, Springer
2000) starts from scratch and the early chapters give an interesting and
helpful account of the model-theoretic basics, and the later chapters
form a rather comprehensive introduction to stability theory. This often-recommended book is written in a distinctive style, with rather more expansive classroom commentary than usual: an unusually engaging read at this sort of level.
Another book which is often mentioned in the same breath as Poizat, Marker,
and now Tent and Ziegler is A Guide to Classical and Modern Model Theory, by
Annalisa Marcja and Carlo Toffalori (Kluwer, 2003) which also covers a lot: but
I prefer the previously listed books.
The next two suggestions are of books which are helpful on particular aspects
of model theory:
9. Kees Doets’s short Basic Model Theory* (CSLI 1996) highlights so-called
Ehrenfeucht games. This is enjoyable and very instructive.
10. Chs. 2 and 3 of Alexander Prestel and Charles N. Delzell’s Mathematical
Logic and Model Theory: A Brief Introduction (Springer 1986, 2011)
are brisk but clear, and can be recommended if you want a speedy review of model-theoretic basics. The key feature of the book, however, is
the sophisticated final chapter on serious applications to algebra, which
might appeal to mathematicians with interests in that area.
(d) As an aside, let me also mention the sub-area of Finite Model Theory which
arises particularly from consideration of problems in the theory of computation
(where, of course, we are interested in finite structures – e.g. finite databases
and finite computations over them). What happens, then, to model theory if we
restrict our attention to finite models? Trakhtenbrot's theorem, for example, tells us that the set of sentences true in every finite model is not recursively enumerable.
So there is no deductive theory for capturing such finitely valid sentences (that’s
a surprise, given that there’s a complete deductive system for the sentences which
are valid in the usual broader sense!). It turns out, then, that the study of finite
models is surprisingly rich and interesting. So why not dip into one or other of
13. Heinz-Dieter Ebbinghaus and Jörg Flum, Finite Model Theory (Springer
2nd edn. 1999).
As a sort of sequel, there is also another volume in the Oxford Logic Guides series
for enthusiasts with more background in model theory, namely Roman Kossak
and James Schmerl, The Structure of Models of Peano Arithmetic, OUP, 2006.
But this is much tougher going. For a more accessible set of excellent lecture
notes, see
Next, going in a rather different direction, and explaining a lot about arith-
metics weaker than full PA, here’s another modern classic:
And what about going beyond first-order PA? We know that full second-order PA (where the second-order quantifiers are constrained to run over all possible sets of numbers) is unaxiomatizable, because the underlying second-order logic is unaxiomatizable. But there are axiomatizable subsystems of second-order arithmetic. These are wonderfully investigated in another encyclopaedic modern classic:
(b) Next, Gödelian incompleteness again. You could start with a short old
Handbook article which is still well worth reading:
More on formal arithmetic and computability
sets which have a finite number of members which in turn have a finite
number of members which in turn . . . where all downward membership
chains bottom out with the empty set). Relying on this fact gives us
another route in to proofs of Gödelian incompleteness, and other results
of Church, Rosser and Tarski. Beautifully done.
After these, where should you go if you want to know more about matters
more or less directly to do with the incompleteness theorems?
And if you want the bumpier ride of a lecture course with problems assigned as
you go along, this is notable:
(c) Now let’s turn to books on computability. Among the Big Books on math-
ematical logic, the one with the most useful treatment is probably
However, good though these chapters are, I'd still recommend starting your more
advanced work on computability with
And of more recent books covering computability at this level, I also particularly
like
classic, written at the end of the glory days of the initial development of
the logical theory of computation. It quite speedily gets advanced. But
the action-packed opening chapters are excellent. At least take it out of
the (e)library, read a few chapters, and admire!
18. Piergiorgio Odifreddi, Classical Recursion Theory, Vol. 1 (North Holland,
1989) is well-written and discursive, with numerous interesting asides.
It’s over 650 pages long, so it goes further and deeper than other books
on the main list above (and then there is Vol. 2). But it certainly starts off
quite gently paced and very accessible and can be warmly recommended
for consolidating and then extending your knowledge.
(d) Classical computability theory abstracts away from considerations of prac-
ticality, efficiency, etc. Computer scientists are – surprise, surprise! – interested
in the theory of feasible computation, and any logician should be interested in
finding out at least a little about the topic of computational complexity. Here
are three introductions to the topic, in order of increasing detail:
19. Herbert E. Enderton, Computability Theory: An Introduction to Recursion Theory (Academic Press, 2011), Chapter 7.
20. Shawn Hedman, A First Course in Logic (OUP 2004): Ch. 7 on 'Computability and complexity' has a nice review of basic computability theory before some lucid sections discussing computational complexity.
21. Michael Sipser, Introduction to the Theory of Computation (Thomson,
2nd edn. 2006) is a standard and very well regarded text on computation
aimed at computer scientists. It aims to be very accessible and to take its
time giving clear explanations of key concepts and proof ideas. I think
this is very successful as a general introduction and I could well have
mentioned the book before. But I’m highlighting the book now because
its last third is on computational complexity.
And for more expansive, stand-alone treatments, here are three more suggestions:
22. I don't mention many sets of lecture notes in this Guide, as they tend to be rather too terse for self-study. But Ashley Montanaro has an excellent and extensive set of lecture notes on Computational Complexity, lucid and detailed. Available at tinyurl.com/cocomp.
23. Oded Goldreich, P, NP, and NP-Completeness (CUP, 2010). Short,
clear, and introductory stand-alone treatment.
24. You could also look at the opening chapters of the pretty encyclopaedic
Sanjeev Arora and Boaz Barak, Computational Complexity: A Modern
Approach (CUP, 2009). The authors say that ‘[r]equiring essentially no
background apart from mathematical maturity, the book can be used as
a reference for self-study for anyone interested in complexity, including
physicists, mathematicians, and other scientists, as well as a textbook for
a variety of courses and seminars.’ And at least it starts very readably! A
late draft of the book can be freely downloaded from tinyurl.com/arora.
Levy's book ends with a discussion of some 'large cardinals'. However, another much-admired older book remains the recommended first treatment of this topic:
More on mainstream set theory
For some other topics you could also look at the second volume of a book whose
first instalment was a main recommendation in §7.2:
4. Winfried Just and Martin Weese, Discovering Modern Set Theory II:
Set-Theoretic Tools for Every Mathematician (American Mathematical
Society, 1997).
This contains, as the authors put it, “short but rigorous introductions
to various set-theoretic techniques that have found applications outside
of set theory”. Some interesting topics, and can be read independently
of Vol. I.
(c) But now the crucial next step – that perhaps marks the point where set
theory gets really challenging – is to get your head around Cohen’s idea of forcing
used in independence proofs. However, there is no getting away from it: this is
tough. In the admirable
Chow writes:
Kunen has since published another, totally rewritten, version of this book as
Set Theory* (College Publications, 2011). This later book is quite significantly longer, covering a good deal of more difficult material that has come to prominence since 1980. Not least because of the additional material, my current sense
is that the earlier book may remain the somewhat gentler read.
Now, Kunen’s classic text takes a ‘straight down the middle’ approach, start-
ing with what is basically Cohen’s original treatment of forcing, though he does
relate this to some other approaches. Here are two of them:
7. Raymond Smullyan and Melvin Fitting, Set Theory and the Continuum
Problem (OUP 1996, Dover Publications 2010). This medium-sized book
is divided into three parts. Part I is a nice introduction to axiomatic set
theory (in fact, officially in its NBG version – see §12.5). The shorter
Part II concerns matters round and about Gödel’s consistency proofs via
the idea of constructible sets. Part III gives a different take on forcing.
This is beautifully done, as you might expect from two writers with
a quite enviable knack for wonderfully clear explanations and an eye for
elegance.
8. Keith Devlin, The Joy of Sets (Springer 1979, 2nd edn. 1993), Ch. 6 introduces the idea of Boolean-valued models and their use in independence proofs. The basic idea is fairly easily grasped, but the details are perhaps trickier.
For more on this theme, see John L. Bell’s classic Set Theory: Boolean-
Valued Models and Independence Proofs (Oxford Logic Guides, OUP, 3rd
edn. 2005). The relation between this approach and other approaches to
forcing is discussed e.g. in Chow’s paper and the last chapter of Smullyan
and Fitting.
(d) Here is a selection of another four books with various virtues, in order of
publication:
Choice, and the choice of set theory
the axioms of set theory. In the last part, some topics of classical set
theory are revisited and further developed in the light of forcing.”
True, this book gets quite hairy towards the end: but the earlier parts
of the book should be much more accessible. This book has been strongly
recommended for its expositional merits by more reliable judges than me;
but I confess I didn’t find it notably more successful than other accounts
of forcing. A late draft of the book is available: tinyurl.com/halb-set.
11. Nik Weaver, Forcing for Mathematicians (World Scientific, 2014) is less
than 150 pages (and the first applications of the forcing idea appear
after just 40 pages: you don’t have to read the whole book to get the
basics). From the blurb: “Ever since Paul Cohen’s spectacular use of the
forcing concept to prove the independence of the continuum hypothesis
from the standard axioms of set theory, forcing has been seen by the
general mathematical community as a subject of great intrinsic interest
but one that is technically so forbidding that it is only accessible to spe-
cialists ... This is the first book aimed at explaining forcing to general
mathematicians. It simultaneously makes the subject broadly accessible
by explaining it in a clear, simple manner, and surveys advanced appli-
cations of set theory to mainstream topics.” This does strike me as a
helpful attempt to solve Chow’s basic exposition problem, to explain the
Big Ideas very directly.
12. Ralf Schindler, Set Theory: Exploring Independence and Truth (Springer,
2014). The book’s theme is “the interplay of large cardinals, inner mod-
els, forcing, and descriptive set theory”. It doesn’t presume you already
know any set theory, though it does proceed at a cracking pace in a
brisk style. But, if you already have some knowledge of set theory, this
seems a clear and interesting exploration of some themes highly relevant
to current research.
And for a short book also explaining some of the consequences of AC (and some
of the results that you need AC to prove), see
Herrlich perhaps already tells you more than enough about the impact of AC:
but there’s also a famous book by H. Rubin and J.E. Rubin, Equivalents of the
Axiom of Choice (North-Holland 1963; 2nd edn. 1985) worth browsing through:
it gives over two hundred equivalents of AC!
Next, there is the nice short classic
3. Thomas Jech, The Axiom of Choice* (North-Holland 1973, Dover Publi-
cations 2008). This proves the Gödel and Cohen consistency and indepen-
dence results about AC (without bringing into play everything needed
to prove the parallel results about the Continuum Hypothesis). In par-
ticular, there is a nice presentation of the so-called Fraenkel-Mostowski
method of using ‘permutation models’. Then later parts of the book tell
us something about mathematics without choice, and about alternative
axioms that are inconsistent with choice.
And for a more recent short book, taking you into new territories (e.g. making
links with category theory), enthusiasts might enjoy
(b) From earlier reading you should certainly have picked up the idea that,
although ZFC is the canonical modern set theory, there are other theories on
the market. I mention just a selection here (I'm not suggesting you need to follow up all these pointers – but it is worth stressing again that set theory is not quite the monolithic edifice that some presentations might suggest).
For a brisk overview, putting many of the various set theories we’ll consider
below into some sort of order, and mentioning yet further alternatives, see
At this stage, you might well find this a bit too brisk and allusive, but it is useful
to give you a preliminary sense of the range of possibilities here. And I should
mention that there is a longer version of this essay which you can return to later:
6. M. Randall Holmes, Thomas Forster and Thierry Libert. ‘Alternative
set theories’. In Dov Gabbay, Akihiro Kanamori, and John Woods, eds.
Handbook of the History of Logic, vol. 6, Sets and Extensions in the
Twentieth Century, pp. 559-632. (Elsevier/North-Holland 2012).
(c) It quickly becomes clear that some alternative set theories are more alter-
native than others! So let’s start with the one which is the closest sibling to
standard ZFC, namely NBG. You will have very probably come across mention
of this already (e.g. even in the early pages of Enderton’s set theory book).
We know that the universe of sets in ZFC is not itself a set. But we might
think that this universe is a sort of big collection. Should we explicitly recognize,
then, two sorts of collection, sets and (as they are called in the trade) proper
classes which are too big to be sets? Some standard presentations of ZFC, such
as Kunen’s, do indeed introduce symbolism for classes, but then make it clear
that class-talk is just a useful short-hand that can be translated away. NBG
(named for von Neumann, Bernays, Gödel: some say VBG) takes classes a bit
more seriously. But things are a little delicate: it is a nice question just what
NBG commits us to. An important technical feature is that its principle of class
comprehension is ‘predicative’; i.e. quantified variables in the defining formula
for a class can’t range over proper classes but range only over sets. Because of
this we get a conservative extension of ZFC (nothing in the language of sets can
be proved in NBG which can’t already be proved in ZFC). For more, see:
8. Michael Potter, Set Theory and Its Philosophy (OUP 2004) Appendix C
is a brisker account of NBG and of other theories with classes as well as
sets.
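To make the predicativity point concrete (this is the standard schema, stated in my own notation rather than quoted from any of these books): NBG's class comprehension says that, for any formula φ(x) all of whose quantified variables range over sets only, there is a class of exactly the sets satisfying φ:

```latex
\exists C\,\forall x\,\bigl(x \in C \leftrightarrow \operatorname{Set}(x) \wedge \varphi(x)\bigr),
\quad \text{the bound variables of } \varphi \text{ ranging over sets only.}
```

Lifting that restriction, so that φ may quantify over arbitrary classes, gives the impredicative and strictly stronger theory MK (Morse–Kelley), which is not conservative over ZFC.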
Then, if you want detailed presentations of set-theory via NBG, you can see
either or both of
(d) Recall, earlier in the Guide, we very warmly recommended Michael Potter’s
book which we just mentioned again. This presents a version of an axiomatiza-
tion of set theory due to Dana Scott (hence ‘Scott-Potter set theory’, SP). This
axiomatization is consciously guided by the conception of the set theoretic uni-
verse as built up in levels (the conception that, supposedly, also warrants the
axioms of ZF). What Potter’s book aims to reveal is that we can get a rich hier-
archy of sets, more than enough for mathematical purposes, without committing
ourselves to all of ZFC (whose extreme richness comes from the full Axiom of
Replacement). If you haven’t read Potter’s book before, now is the time to look
at it. Also, for a slightly simplified presentation of SP, see
11. Tim Button, ‘Level Theory, Part I’, Bulletin of Symbolic Logic, preprint
available at tinyurl.com/level-th.
(e) We now turn to a somewhat more radical departure from standard ZF(C),
namely ZFA (i.e. ZF − AF + AFA).
Here again is the now-familiar hierarchical conception of the set universe: We
start with some non-sets (maybe zero of them in the case of pure set theory). We collect them into sets (in as many different ways as we can). Now we collect what
we’ve already formed into sets (as many as we can). Keep on going, as far as we
can. On this ‘bottom-up’ picture AF, the Axiom of Foundation, is compelling
(any downward chain linked by set-membership will bottom out, and won’t go
round in a circle).
But here’s another alternative conception of the set universe. Think of a set as
a gadget that points you at some things, its members. And those members,
if sets, point to their members. And so on and so forth. On this ‘top-down’
picture, the Axiom of Foundation is not so compelling. As we follow the pointers,
can’t we for example come back to where we started? It is well known that in
much of the usual development of ZFC the Axiom of Foundation AF does little
work. So what about considering a theory of sets ZFA which drops AF and
instead has an Anti-Foundation Axiom, AFA, which allows self-membered sets?
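To get a first feel for the difference (this is the standard opening example, which you will meet in all the readings below): AFA says that every graph has a unique 'decoration' by sets. Applied to the one-node graph whose node points to itself, this yields a unique set Ω with

```latex
\Omega = \{\Omega\},
```

the so-called Quine atom; whereas in ZF the Axiom of Foundation rules out any solution of x = {x}.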
To explore this idea, see
12. Start with Lawrence S. Moss, ‘Non-wellfounded set theory’, The Stanford
Encyclopedia of Philosophy, tinyurl.com/sep-zfa.
13. Keith Devlin, The Joy of Sets (Springer, 2nd edn. 1993), Ch. 7. The last
chapter of Devlin’s book, added in the second edition of his book, starts
with a very lucid introduction, and develops some of the theory.
14. Peter Aczel, Non-well-founded Sets (CSLI Lecture Notes 1988). This is
a very readable short classic book, available at tinyurl.com/aczel.
15. Luca Incurvati, ‘The graph conception of set’ Journal of Philosophical
Logic (2014) pp. 181-208, or his Conceptions of Set and the Foundations
of Mathematics (CUP, 2020), Ch. 7, very illuminatingly explores the
motivation for such set theories.
(f) Now for a much more radical departure from ZF.
Standard set theory lacks a universal set because, together with other stan-
dard assumptions, the idea that there is a set of all sets leads to contradiction.
But by tinkering with those other assumptions, there are coherent theories with
universal sets, of which Quine's 'New Foundations' is probably the best
known. For the headline news, see
16. T. F. Forster, ‘Quine’s New Foundations’, The Stanford Encyclopedia of
Philosophy, tinyurl.com/quine-nf.
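The headline idea, in a sentence (my gloss, not a quotation from Forster): NF avoids paradox not by banning big sets but by restricting comprehension to stratified formulas, ones whose variables can be assigned numerical types making every membership clause well-typed. So, for instance,

```latex
V = \{x : x = x\} \ \text{exists (the formula } x = x \text{ is stratified), while}\quad
\{x : x \notin x\} \ \text{does not (no type assignment makes } x \notin x \text{ well-typed).}
```

Russell's paradox is thereby blocked even though a universal set survives.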
For a full-blown but very readable presentation concentrating on NFU (‘New
Foundations’ with urelements), and explaining motivations as well as technical
details, see
17. M. Randall Holmes, Elementary Set Theory with a Universal Set (Cahiers
du Centre de Logique No. 10, Louvain, 1998). Now freely available at
tinyurl.com/holmesnf.
The following is rather tougher going, though with many interesting ideas:
18. T. F. Forster, Set Theory with a Universal Set, Oxford Logic Guides 31 (Clarendon Press, 2nd edn. 1995).
(g) Famously, Zermelo constructed his theory of sets by gathering together
some principles of set-theoretic reasoning that seemed actually to be used by
working mathematicians (engaged in e.g. the rigorization of analysis or the de-
velopment of point set topology), hoping to get a theory strong enough for
mathematical use while weak enough to avoid paradox. The later Axiom of Re-
placement was added in much the same spirit. But does the result overshoot?
We’ve already noted that SP is a weaker theory which may suffice. For a more
radical approach, see this very engaging short piece:
19. Tom Leinster, ‘Rethinking set theory’. Gives an advertising pitch for the
merits of Lawvere’s Elementary Theory of the Category of Sets (ETCS).
tinyurl.com/leinst.
21. Laura Crosilla, ‘Set Theory: Constructive and Intuitionistic ZF’, The
Stanford Encyclopedia of Philosophy, tinyurl.com/crosilla.
Second, you’ll recall from elementary model theory that Abraham Robinson
developed a rigorous formal treatment that takes infinitesimals seriously. Later,
a simpler and arguably more natural approach, based on so-called Internal Set
Theory, was invented by Edward Nelson. He advertises it here:
22. Edward Nelson, ‘Internal Set Theory: a new approach to nonstandard
analysis’, Bulletin of The American Mathematical Society 83 (1977), pp.
1165–1198. tinyurl.com/nelson-ist.
You can follow that up by looking at the approachable early chapters of Nader Vakil's Real Analysis through Modern Infinitesimals (CUP, 2011), a monograph developing Nelson's ideas.
(b) And now the paths through proof theory fork. One path investigates what
happens when we tinker with the structural rules shared by classical and intu-
itionistic logic.
Note for example the inference which takes us from the trivial P ⊢ P by weakening to P, Q ⊢ P and on, via conditional proof, to P ⊢ Q → P. If we want a conditional that conforms better to intuitive constraints of relevance, then we need to block that proof: is 'weakening' the culprit? The investigation
of what happens if we vary rules such as weakening belongs to ‘substructural
logic’, whose concerns are outlined in
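Set out as a mini sequent derivation, the proof just discussed runs:

```latex
\frac{P \vdash P}{P,\, Q \vdash P}\ \text{(weakening)}
\qquad\qquad
\frac{P,\, Q \vdash P}{P \vdash Q \to P}\ \text{(conditional proof)}
```

Substructural logics ask which such structural steps should be given up.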
And the place to continue exploring these themes at length is the same author’s
(c) Another path forward picks up from Gentzen’s proof of the consistency of
arithmetic. Recall, that depends on transfinite induction along ordinals up to
ε0; and the fact that it requires just this much transfinite induction to prove the
consistency of first-order PA is an important characterization of the strength of
the theory.
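For orientation (a standard definition, presupposed by all the readings below): ε0 is the first fixed point of base-ω ordinal exponentiation,

```latex
\varepsilon_0 \;=\; \sup\{\omega,\ \omega^{\omega},\ \omega^{\omega^{\omega}},\ \dots\}
\;=\; \text{the least ordinal } \alpha \text{ such that } \omega^{\alpha} = \alpha .
```

Gentzen's result is sharp: PA proves transfinite induction up to each ordinal below ε0, while induction up to ε0 itself suffices to prove Con(PA) and so outruns PA.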
The project of ‘ordinal analysis’ in proof theory aims to provide comparable
characterizations of other theories in terms of the amount of transfinite induction
that is needed to prove their consistency. Things do get quite hairy quite quickly, however. (Warning: there are, I am told, some confusing misprints in the cut-elimination proof.) But you can start from two very useful sets of notes for mini courses:
4. Michael Rathjen, ‘The realm of ordinal analysis’ and ‘Proof theory: from
arithmetic to set theory’, downloadable from tinyurl.com/rath-art and
tinyurl.com/rath-ast.
5. Wolfram Pohlers, Proof Theory: The First Step into Impredicativity (Spr-
inger 2009). This book officially has introductory ambitions, focusing on
ordinal analysis. However, I would judge that it requires quite an amount
of mathematical sophistication from its reader. From the blurb: “As a
‘warm up’ Gentzen’s classical analysis of pure number theory is presented
in a more modern terminology, followed by an explanation and proof of
the famous result of Feferman and Schütte on the limits of predicativity.”
The first half of the book is probably manageable if (but only if) you
already have done some of the other reading. But then the going indeed
gets pretty tough.
6. H. Schwichtenberg and S. Wainer, Proofs and Computations (Association for Symbolic Logic/CUP 2012) "studies fundamental interactions
between proof-theory and computability”. The first four chapters, at any
rate, will be of wide interest, giving another take on some basic mate-
rial and should be manageable given enough background. However, to
my surprise, I found the book to be not particularly well written and I
wonder if it sometimes makes heavier weather of its material than seems
really necessary. Still, worth getting to grips with.
But the first of these mostly revisits second-order logic at a level which is probably quite unnecessarily sophisticated for now, so don't get bogged down. The second gives us pointers forward, but is perhaps also rather too rushed.
Still, as you’ll see from Coquand, basic topics to pursue include Simple Type
Theory and the lambda calculus. For a clear and gentle introduction to the latter,
see the first seven chapters of the following welcome short book which doesn’t
assume much mathematical background:
Next, as a spur to keep going, you might find this advocacy interesting:
And then for a bit more on Simple Type Theory/Church’s Type Theory, though
once more this is less than ideal, you could look at
But then where to go next will depend on your interests and on how much more
you want to know. The book we want, Type Theories for Logicians, A Gentle
Introduction, has yet to be written!
And a complicating factor is that a lot of current work on type theory is bound
up with constructivist ideas developing the BHK conception that ties the content
of a proposition to its proofs (for example, an implication A → C corresponds to a type of function taking a proof of A to a proof of C). This correspondence
between propositions and types of functions gets developed into the so-called
Curry-Howard correspondence or isomorphism. See
Higher-order logic, the lambda calculus, and type theory
6. Peter Dybjer and Erik Palmgren, ‘Intuitionistic type theory’, The Stan-
ford Encyclopedia of Philosophy, tinyurl.com/sep-ITT.
But again, this isn’t easy going.
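You can nonetheless see the propositions-as-types idea in miniature in any language with parametric polymorphism. Here is a small sketch (TypeScript is just a convenient vehicle; none of the books listed here use this notation): type variables play the role of propositional letters, a total function inhabiting a type counts as a proof of the corresponding implication, and function application is modus ponens.

```typescript
// Propositions as types: a (total) function of a given type is a
// proof of the corresponding proposition.

// Proof of P -> (Q -> P): given a proof of P, ignore the Q-proof.
// (This is the 'weakening' principle that relevance logics reject.)
const weakening = <P, Q>(x: P) => (_q: Q): P => x;

// Proof of (P -> Q) -> ((Q -> R) -> (P -> R)):
// transitivity of implication, i.e. function composition.
const compose = <P, Q, R>(f: (p: P) => Q) => (g: (q: Q) => R) =>
  (p: P): R => g(f(p));

// Modus ponens: from a proof of P -> Q and a proof of P, obtain a proof of Q.
const modusPonens = <P, Q>(f: (p: P) => Q, x: P): Q => f(x);
```

On the BHK reading, `weakening` is precisely a proof of P → (Q → P), the formula met earlier in the discussion of structural rules.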
Without a Gentle Introduction to hand, you will have to make do with exploring the following initial suggestions, seeing where they take you. In order of publication date:
7. Henk P. Barendregt, The Lambda Calculus: Its Syntax and Semantics*
(Originally 1980, reprinted by College Publications 2012). This is the
weighty standard text: but the opening chapters are fairly accessible.
8. Peter Andrews, An Introduction to Mathematical Logic and Type The-
ory: To Truth Through Proof (Academic Press, 1986). Chapter 5, under
50 pages, is a classic introduction to a version of Church’s type the-
ory developed by Andrews. It is often recommended, and worth battling
through; but it is a rather terse bit of old-school exposition.
9. J. Roger Hindley, Basic Simple Type Theory (CUP, 1997). This short
book is another classic, but again it is pretty terse. Worth making a
start, but perhaps, in the end, mostly for those whose main interest is in
computer science applications of type theory in the design of higher-level
programming languages like ML.
10. Benjamin C. Pierce, Types and Programming Languages (MIT Press,
2002). A frequently-recommended text for computer scientists, and read-
able by others if you skip over some parts about implementation in ML.
The first dozen or so shortish chapters are indeed relatively discursive
and accessible.
11. Morten Heine Sørensen and Pawel Urzyczyn, Lectures on the Curry-
Howard Isomorphism (Elsevier, 2006). This engaging book ranges much
more widely than the title might suggest!
12. J. Roger Hindley and Jonathan P. Seldin, Lambda-Calculus and Combi-
nators: An Introduction (CUP 2008). Attractively and clearly written,
aiming to avoid excess technicalities. More of the feel of a modern maths
book. Recommended.
13. Rob Nederpelt and Herman Geuvers, Type Theory and Formal Proof: An Introduction (CUP 2014). Focuses, the authors say, "on the use of
types and lambda terms for the complete formalisation of mathematics”,
so promises to be of particular interest to mathematical logicians. Also
attractively and clearly written (as these things go!).
And finally, I suppose I should finish by mentioning again one particular new
incarnation of type theory: