Logic For Mathematicians
Joseph R. Mileti
January 9, 2014
Contents
1 Introduction
1.1 The Nature of Mathematical Logic
1.2 The Language of Mathematics
1.3 Syntax and Semantics
1.4 The Point of It All
1.5 Some Basic Terminology and Notation
3 Propositional Logic
3.1 The Syntax of Propositional Logic
3.1.1 Standard Syntax
3.1.2 Polish Notation
3.1.3 Official Syntax and Our Abuses of It
3.1.4 Recursive Definitions
3.2 Truth Assignments and Semantic Implication
3.3 Boolean Functions and Connectives
3.4 Syntactic Implication
3.4.1 Motivation
3.4.2 Official Definitions
3.4.3 Examples of Deductions
3.4.4 Theorems about ⊢
3.5 Soundness and Completeness
3.5.1 The Soundness Theorem
3.5.2 The Completeness Theorem
3.6 Compactness and Applications
7 Quantifier Elimination
7.1 Motivation and Definition
7.2 What Quantifier Elimination Provides
7.3 Quantifier Manipulation Rules
7.4 Examples of Theories With QE
7.5 Algebraically Closed Fields
Axiom Of Choice
Use of the Axiom of Choice in Mathematics
Equivalents of the Axiom of Choice
The Axiom of Choice and Cardinal Arithmetic
Model Theory
14.1.4 Borel Sets
14.1.5 Measurable Sets
14.2 The Size of Models
14.2.1 Controlling the Size of Models
14.2.2 Counting Models
14.3 Ultraproducts and Compactness
14.3.1 Ultrafilters
14.3.2 Ultraproducts
Chapter 1
Introduction
1.1 The Nature of Mathematical Logic
We gave a mathematical definition of a deduction, so what's wrong with using mathematics to prove things about deductions? There's obviously a real world of true mathematics, and we're just working in that world to build a certain model of mathematical reasoning which is susceptible to mathematical analysis. It's quite cool, really, that we can subject mathematical proofs to a mathematical study by building this internal model. All of this philosophical speculation and worry about secure foundations is tiresome and in the end probably meaningless. Let's get on with the subject!
Should we be so dismissive of the first philosophically inclined student? The answer, of course, depends on your own philosophical views, but I'll try to give my own views as a mathematician specializing in logic with a definite interest in foundational questions. It is my firm belief that you should put all philosophical questions out of your mind during a first reading of the material (and perhaps forever, if you're so inclined), and come to the subject with a point of view which accepts an independent mathematical reality susceptible to the mathematical analysis you've grown accustomed to. In your mind, you should keep a careful distinction between normal "real" mathematical reasoning and the formal precise model of mathematical reasoning we are developing. Some people like to give this distinction a name by calling the normal mathematical realm we're working in the metatheory.
To those who are interested, we'll eventually be able to give reasonable answers to the first student and provide other respectable philosophical accounts of the nature of mathematics, but this should wait until we've developed the necessary framework. Once we've done so, we can give examples of formal theories, such as first-order set theory, which are able to support the entire enterprise of mathematics, including mathematical logic. This is of great philosophical interest, because this makes it possible to carry out (nearly) all of mathematics inside this formal theory.
The ideas and techniques that were developed with philosophical goals in mind have now found application
in other branches of mathematics and in computer science. The subject, like all mature areas of mathematics,
has also developed its own very interesting internal questions which are often (for better or worse) divorced
from its roots. Most of the subject developed after the 1930s has been concerned with these internal and
tangential questions, along with applications to other areas, and now foundational work is just one small
(but still important) part of mathematical logic. Thus, if you have no interest in the more philosophical
aspects of the subject, there remains an impressive, beautiful, and mathematically applicable theory which
is worth your attention.
1.2 The Language of Mathematics
Our first and probably most important task in providing a mathematical model of mathematics is to deal with the language of mathematics. In this section, we sketch the basic ideas and motivation for the development of a language, but we will leave precise detailed definitions until later.
The first important point is that we should not use English (or any other natural language) because it's constantly changing, often ambiguous, and allows the construction of statements that are certainly not mathematical and/or arguably express very subjective sentiments. Once we've thrown out natural language, our only choice is to invent our own formal language. This seems quite daunting. How could we possibly write down one formal language which encapsulates geometry, algebra, analysis, and every other field of mathematics, not to mention those we haven't developed yet, without using natural language? Our approach to this problem will be to avoid (consciously) doing it all at once.
Instead of starting from the bottom and trying to define primitive mathematical statements which can't be broken down further, let's first think about how to build new mathematical statements from old ones. The simplest way to do this is to take already established mathematical statements and put them together using and, or, not, and implies. To keep a careful distinction between English and our language, we'll introduce symbols for each of these, and we'll call these symbols connectives.
Although it is customary and certainly easier on the eyes to put · between two elements of the group, let's instead use the standard function notation in order to make the mathematical notation uniform across different areas. In this setting, a group is a set G equipped with a function f : G × G → G and an element e satisfying:
1. For all x, y, z ∈ G, we have f(f(x, y), z) = f(x, f(y, z)).
2. For all x ∈ G, we have f(x, e) = x = f(e, x).
3. For all x ∈ G, there exists y ∈ G such that f(x, y) = e = f(y, x).
In order to allow our language to make statements about groups, we introduce a function symbol which we denote by f to represent the group operation, and a constant symbol which we denote by e to represent the group identity. Now the group operation is supposed to take in two elements of the group, so if x and y are variables, then we should allow the formation of f(x, y), which should denote an element of the group (once we've assigned elements of the group to x and y). Also, we should allow the constant symbol to be used in this way, allowing us to form things like f(x, e). Once we've formed these, we should be allowed to use them like variables in more complicated expressions such as f(f(x, e), y). Each of these expressions formed by putting together, perhaps repeatedly, variables and the constant symbol e using the function symbol f is called a term. Intuitively, a term will name a certain element of the group once we've assigned elements to the variables.
With a way to name group elements in hand, we're now in position to say what our primitive statements are. The most basic thing that we can say about two group elements is whether or not they are equal, so we introduce a new equality symbol, which we will denote by the customary =. Given two terms t1 and t2, we call the expression (t1 = t2) an atomic formula. These are our primitive statements.
With atomic formulas in hand, we can use the old connectives and the new quantifiers to make new statements. This puts us in a position to define formulas. First off, all atomic formulas are formulas. Given formulas we already know, we can put them together using the connectives above. Also, if φ is a formula and x is a variable, then each of the following is a formula:
1. ∀xφ
2. ∃xφ
Perhaps without realizing it, we've described quite a powerful language which can make many nontrivial statements. For instance, we can write formulas in this language which express the axioms for a group:
1. ∀x∀y∀z(f(f(x, y), z) = f(x, f(y, z)))
2. ∀x((f(x, e) = x) ∧ (f(e, x) = x))
3. ∀x∃y((f(x, y) = e) ∧ (f(y, x) = e))
We can also write a statement saying that the group is abelian:
∀x∀y(f(x, y) = f(y, x))
or that the center is nontrivial:
∃x(¬(x = e) ∧ ∀y(f(x, y) = f(y, x)))
Perhaps unfortunately, we can write syntactically correct formulas which express things nobody would ever utter, such as:
∀x∃y∀x(¬(e = e))
What if you want to consider an area other than group theory? Commutative ring theory doesn't pose much of a problem, so long as we're allowed to alter the number of function symbols and constant symbols. We can simply have two function symbols a and m which take two arguments (a to represent addition and m to represent multiplication) and two constant symbols 0 and 1 (0 to represent the additive identity and 1 to represent the multiplicative identity). Writing the axioms for commutative rings in this language is fairly straightforward.
To take something fairly different, what about the theory of partially ordered sets? Recall that a partially ordered set is a set P equipped with a subset ≤ of P × P, where we write x ≤ y to mean that (x, y) is an element of this subset, satisfying:
1. Reflexive: For all x ∈ P, we have x ≤ x.
2. Antisymmetric: If x, y ∈ P are such that x ≤ y and y ≤ x, then x = y.
3. Transitive: If x, y, z ∈ P are such that x ≤ y and y ≤ z, then x ≤ z.
Analogous to the syntax we used when handling the group operation, we will use notation which puts the ordering in front of the two arguments. Doing so may seem odd at this point, given that we're putting equality in the middle, but we'll see that this provides a unifying notation for other similar objects. We thus introduce a relation symbol R (intuitively representing ≤), and we keep the equality symbol =, but we no longer have a need for constant symbols or function symbols.
In this setting without constant or function symbols, the only terms that we have (i.e. the only names for elements of the partially ordered set) are the variables. However, our atomic formulas are more interesting, because now there are two basic things we can say about elements of the partial ordering: whether they are equal and whether they are related by the ordering. Thus, our atomic formulas are things of the form t1 = t2 and R(t1, t2) where t1 and t2 are terms. From these atomic formulas, we build up all our formulas as above. We can now write formulas expressing the axioms of partial orderings:
1. ∀xR(x, x)
2. ∀x∀y((R(x, y) ∧ R(y, x)) → (x = y))
3. ∀x∀y∀z((R(x, y) ∧ R(y, z)) → R(x, z))
We can also write a statement saying that the partial ordering is a linear ordering:
∀x∀y(R(x, y) ∨ R(y, x))
or that there exists a maximal element:
∃x∀y(R(x, y) → (x = y))
The general idea is that by leaving flexibility in the types and number of constant symbols, relation symbols, and function symbols, we'll be able to handle many areas of mathematics. We call this setup first-order logic. An analysis of first-order logic will consume the vast majority of our time.
Now we don't claim that first-order logic allows us to do and express everything in mathematics, nor do we claim that each of the setups above allows us to do and express everything of importance in that particular field. For example, take the group theory setting above. We can express that every nonidentity element has order two with:
∀x(f(x, x) = e)
but it's unclear how to say that every element of the group has finite order. The natural guess is:
∀x∃n(x^n = e)
but this poses a problem for two reasons. The first is that our variables are supposed to quantify over elements of the group in question, not the natural numbers. The second is that we put no construction in
our language to allow us to write something like x^n. For each fixed n, we can express it (for example, for n = 3, we can write f(f(x, x), x), and for n = 4, we can write f(f(f(x, x), x), x)), but it's not clear how to write it in a general way that would even allow quantification over the natural numbers.
Another example is trying to express that a group is simple (i.e. has no nontrivial normal subgroups). The natural instinct is to quantify over all subsets H of the group G, and say that if it so happens that H is a normal subgroup, then H is either trivial or everything. However, we have no way to quantify over subsets. It's certainly possible to allow such constructions, and this gives second-order logic. If you allow quantification over sets of subsets (for example, one way of expressing that a ring is Noetherian is to say that every nonempty set of ideals has a maximal element), you get third-order logic, etc.
Newcomers to the field often find it strange that we focus primarily on first-order logic. There are many
reasons to give special attention to first-order logic that will be developed throughout our study, but for
now you should think of it as providing a simple example of a language which is capable of expressing many
important aspects of various branches of mathematics.
1.3 Syntax and Semantics
In the above discussion, we introduced symbols to denote certain concepts (such as using ∧ in place of "and", ∀ in place of "for all", and a function symbol f in place of the group operation f). Building and maintaining a careful distinction between formal symbols and how to interpret them is a fundamental aspect of mathematical logic.
The basic structure of the formal statements that we write down using the symbols, connectives, and quantifiers is known as the syntax of the logic that we're developing. This corresponds to the grammar of the language in question, with no thought given to meaning. Imagine an English instructor who cared nothing for the content of your writing, but only that it was grammatically correct. That is exactly what the syntax of a logic is all about. Syntax is combinatorial in nature and is based on rules which provide admissible ways to manipulate symbols devoid of meaning.
The manner in which we are permitted (or forced) to interpret the symbols, connectives, and quantifiers is known as the semantics of the logic that we're developing. In a logic, some symbols are to be interpreted in only one way. For instance, in the above examples, we interpret the symbol ∧ to mean "and". In the propositional logic setting, this doesn't settle how to interpret a formula, because we haven't said how to interpret the elements of P. We have some flexibility here, but once we assert that we should interpret certain elements of P as true and the others as false, our formulas express statements that are either true or false.
The first-order logic setting is more complicated. Since we have quantifiers, the first thing that must be done in order to interpret a formula is to fix a set X which will act as the set of objects over which the quantifiers will range. Once this is done, we can interpret each function symbol f taking k arguments as an actual function f : X^k → X, each relation symbol R taking k arguments as a subset of X^k, and each constant symbol c as an element of X. Once we've fixed what we're talking about by providing such interpretations, we can view our formulas as expressing something meaningful. For example, if we've fixed a group G and interpreted f as the group operation and e as the identity, the formula
∀x∀y(f(x, y) = f(y, x))
is either true or false, according to whether G is abelian or not.
Always keep the distinction between syntax and semantics clear in your mind. Many basic theorems of the subject involve the interplay between syntax and semantics. For example, in the logics we discuss, we will have two types of implication between formulas. Let Γ be a set of formulas and let φ be a formula. One way of saying that the formulas in Γ imply φ is semantic: whenever we provide an interpretation which makes all of the formulas of Γ true, it happens that φ is also true. For instance, if we're working in propositional logic and we have Γ = {((A ∨ B) → C)} and φ = (A → C), then Γ implies φ in this sense, because no matter how we assign true/false values to A, B, and C that make the formulas in Γ true, it happens that φ will also be true.
Another approach that we'll develop is syntactic. We'll define deductions, which are formal proofs built from certain permissible syntactic manipulations, and Γ will imply φ in this sense if there is a witnessing deduction. The Soundness Theorem and the Completeness Theorem for first-order logic (and propositional logic) say that the semantic version and the syntactic version are the same. This result amazingly allows one to mimic mathematical reasoning with purely syntactic manipulations.
1.4 The Point of It All
One important aspect, often mistaken as the only aspect, of mathematical logic is that it allows us to study mathematical reasoning. A prime example of this is given by the last sentence of the previous section. The Completeness Theorem says that we can capture the idea of one mathematical statement following from other mathematical statements with nothing more than syntactic rules on symbols. This is certainly computationally, philosophically, and foundationally interesting, but it's much more than that. A simple consequence of this result is the Compactness Theorem, which says something very deep about mathematical reasoning and has many interesting applications in mathematics.
Although we've developed the above logics with modest goals of handling certain fields of mathematics, it's a wonderful and surprising fact that we can embed (nearly) all of mathematics in an elegant and natural first-order system: first-order set theory. This opens the door to the possibility of proving that certain mathematical statements are independent of our usual axioms. That is, that there are formulas φ such that there is no deduction from the usual axioms of either φ or (¬φ). Furthermore, the field of set theory has blossomed into an intricate field with its own deep and interesting questions.
Other very interesting and fundamental subjects arise when we ignore the foundational aspects and deductions altogether, and simply look at what we've accomplished by establishing a precise language to describe an area of mathematics. With a language in hand, we now have a way to say that certain objects are definable in that language. For instance, take the language of commutative rings mentioned above. If we fix a particular commutative ring, then the formula
∃y(m(x, y) = 1)
has a free variable x and defines the set of units in the ring. With this point of view, we've opened up the possibility of proving lower bounds on the complexity of any definition of a certain object, or even of proving that no such definition exists in the language.
Another, closely related, way to take our definitions of precise languages and run with it is the subject
of model theory. In group theory, you state some axioms and work from there in order to study all possible
realizations of the axioms, i.e. groups. However, as we saw above, the group axioms arise in one possible
language with one possible set of axioms. Instead, we can study all possible languages and all possible sets
of axioms and see what we can prove in general and how the realizations compare to each other. In this
sense, model theory is a kind of abstract abstract algebra.
Finally, although it's probably far from clear how it fits in at this point, computability theory is intimately related to the above subjects. To see the first glimmer of a connection, notice that computer programming languages are also formal languages with a precise grammar and a clear distinction between syntax and semantics. As we'll see in time, however, the connection is much deeper.
1.5 Some Basic Terminology and Notation
Chapter 2
2.1 Induction and Recursion on N
We begin by compiling the basic facts about induction and recursion on the natural numbers. We don't seek to prove that proofs by induction or definitions by recursion on N are valid methods, because these are obvious from the normal mathematical perspective which we are adopting. Besides, in order to do so, we would first have to fix a context in which we are defining N. Eventually, we will indeed carry out such a construction in the context of axiomatic set theory, but that is not our current goal. Although you're no doubt familiar with the intuitive content of the results here, our goal here is simply to carefully codify these facts in more precise ways to ease the transition to more complicated types of induction and recursion.
Definition 2.1.1. We define S : N → N by letting S(n) = n + 1 for all n ∈ N.
Induction is often stated in the form "If we know something holds of 0, and we know that it holds of S(n) whenever it holds of n, then we know that it holds for all n ∈ N". We state it in the following more precise set-theoretic fashion (avoiding explicit mention of "somethings" or "properties"), because we can always form the set X = {n ∈ N : something holds of n}.
Theorem 2.1.2 (Induction on N - Step Form). Suppose that X ⊆ N is such that 0 ∈ X and S(n) ∈ X whenever n ∈ X. We then have X = N.
Definition by recursion is usually referred to by saying that "When defining f(S(n)), you are allowed to refer to the value of f(n)". For instance, let f : N → N be the factorial function f(n) = n!. One usually sees this defined in the following manner:
f(0) = 1
f(S(n)) = S(n) · f(n)
We aim to codify this idea a little more abstractly and rigorously, in order to avoid the self-reference to f in the definition and to make the allowable rules explicit, so that we can generalize it to other situations.
Suppose that X is a set and we're trying to define f : N → X recursively. What do we need? Well, we need to know f(0), and we need to have a method telling us how to define f(S(n)) from knowledge of n and the value of f(n). If we want to avoid the self-referential appeal to f when invoking the value of f(n), what we need is a method which tells us what to do next regardless of the actual particular value of f(n). That is, it needs to tell us what to do on any possible value, not just the one that ends up happening to be f(n). Formally, this method can be given by a function g : N × X → X which tells us what to do at the next step. Intuitively, this function acts as an iterator. That is, it says that if the last thing you were working on was input n, and it so happened that you set f(n) to equal x ∈ X, then you should define f(S(n)) to be the value g(n, x).
With all this setup, we now state the theorem which says that no matter what value you want to assign to f(0), and no matter what iterating function g : N × X → X you give, there exists a unique function f : N → X obeying the rules.
Theorem 2.1.3 (Recursion on N - Step Form). Let X be a set, let y ∈ X, and let g : N × X → X. There exists a unique function f : N → X such that
1. f(0) = y.
2. f(S(n)) = g(n, f(n)) for all n ∈ N.
In the case of the factorial function, we have X = N, y = 1, and g : N × N → N defined by g(n, x) = S(n) · x. The above theorem implies that there is a unique function f : N → N such that
1. f(0) = y = 1.
2. f(S(n)) = g(n, f(n)) = S(n) · f(n) for all n ∈ N.
Notice how we moved any mention of self-reference out of the definition of g, and pushed all of the weight onto the theorem, which states the existence and uniqueness of a function which behaves properly, i.e. which satisfies the initial condition and appropriate recursive equation.
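Viewed computationally, the Step Form is just iteration. The following Python sketch (our own illustration; the names are invented) builds f(n) from y and g exactly as in Theorem 2.1.3, and recovers the factorial example:

```python
# A sketch of the Step Form of Recursion on N: given a starting value y and an
# iterator g : N x X -> X, compute the unique f satisfying f(0) = y and
# f(S(n)) = g(n, f(n)).
def recurse_step(y, g, n):
    value = y           # f(0) = y
    for k in range(n):  # repeatedly apply f(S(k)) = g(k, f(k))
        value = g(k, value)
    return value

# The factorial example: X = N, y = 1, g(n, x) = S(n) * x.
factorial = lambda n: recurse_step(1, lambda k, x: (k + 1) * x, n)
print([factorial(n) for n in range(6)])  # [1, 1, 2, 6, 24, 120]
```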
There is another version of induction on N, sometimes called strong induction, which appeals to the ordering of the natural numbers rather than the stepping of the successor function.
Theorem 2.1.4 (Induction on N - Order Form). Suppose that X ⊆ N is such that n ∈ X whenever m ∈ X for all m < n. We then have X = N.
Notice that there is no need to deal with the base case of n = 0, because this is handled vacuously, due to the fact that there is no m < 0.
Theorem 2.1.5 (Recursion on N - Order Form). Let X be a set and let g : X* → X, where X* denotes the set of all finite sequences of elements of X. There exists a unique function f : N → X such that
f(n) = g(f ↾ [n])
for all n ∈ N, where f ↾ [n] is the restriction of f to [n] = {0, 1, ..., n − 1}, viewed as the finite sequence (f(0), f(1), ..., f(n − 1)).
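The Order Form can be sketched the same way, except that g now consumes the entire finite sequence of earlier values. In the following Python sketch (ours, with invented names), the rule g produces the Fibonacci numbers, a definition that refers to two earlier values and so fits the Order Form naturally:

```python
# A sketch of the Order Form of Recursion on N: g consumes the whole list of
# earlier values f(0), ..., f(n-1) and produces f(n).
def recurse_order(g, n):
    values = []                   # will hold f(0), f(1), ..., f(n-1)
    for _ in range(n + 1):
        values.append(g(values))  # f(k) = g(f restricted to [k])
    return values[n]

def fib_rule(prev):
    if len(prev) < 2:             # f(0) = f(1) = 1, handled inside g
        return 1
    return prev[-1] + prev[-2]

print([recurse_order(fib_rule, n) for n in range(8)])  # [1, 1, 2, 3, 5, 8, 13, 21]
```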
2.2 Generation
There are many situations throughout mathematics when we want to look at what a certain subset generates. For instance, you may have a subset of a group (vector space, ring), and you want to consider the subgroup (subspace, ideal) that it generates. Another example is when you have a subset of a graph, and you want to consider the set of vertices in the graph reachable from the subset. In the introduction, we talked about generating all formulas from primitive formulas using certain connectives. This situation will arise so frequently in what follows that it's a good idea to unify all of these examples in a common framework.
Definition 2.2.1. Let A be a set and let k ∈ N⁺. A function h : A^k → A is called a k-ary function on A. We call k the arity of h. A 1-ary function is sometimes called unary, and a 2-ary function is sometimes called binary.
Definition 2.2.2. Suppose that A is a set, B ⊆ A, and H is a collection of functions such that each h ∈ H is a k-ary function on A for some k ∈ N⁺. We call (A, B, H) a simple generating system. In such a situation, for each k ∈ N⁺, we denote the set of k-ary functions in H by H_k.
Examples.
1. Let G be a group and let B ⊆ G. We want the subgroup of G that B generates. The operations in question here are the group operation and inversion, so we let H = {h1, h2} where
(a) h1 : G² → G is given by h1(x, y) = x · y for all x, y ∈ G.
(b) h2 : G → G is given by h2(x) = x⁻¹ for all x ∈ G.
Then (G, B, H) is a simple generating system.
2. Let V be a vector space over R and let B ⊆ V. We want the subspace of V that B generates. The operations in question consist of vector addition and scalar multiplication, so we let H = {g} ∪ {h_λ : λ ∈ R} where
(a) g : V² → V is given by g(v, w) = v + w for all v, w ∈ V.
(b) For each λ ∈ R, h_λ : V → V is given by h_λ(v) = λ · v for all v ∈ V.
Then (V, B, H) is a simple generating system.
There are certain cases when the natural functions to put into H are not total, or are multi-valued. For instance, in the first example below, we'll talk about the subfield generated by a certain subset of a field, and we'll want to include multiplicative inverses for all nonzero elements. When putting a corresponding function in H, there is no obvious way to define it on 0. Also, when generating the vertices reachable from a subset of a graph, we may want to throw in many vertices at once, because a vertex can be linked to many others.
Definition 2.2.3. Let A be a set and let k ∈ N⁺. A function h : A^k → P(A) is called a set-valued k-ary function on A. We call k the arity of h. A 1-ary set-valued function is sometimes called unary, and a 2-ary set-valued function is sometimes called binary.
Definition 2.2.4. Suppose that A is a set, B ⊆ A, and H is a collection of functions such that each h ∈ H is a set-valued k-ary function on A for some k ∈ N⁺. We call (A, B, H) a generating system. In such a situation, for each k ∈ N⁺, we denote the set of set-valued k-ary functions in H by H_k.
Examples.
1. Let K be a field and let B ⊆ K. We want the subfield of K that B generates. The operations in question here are addition, multiplication, and both additive and multiplicative inverses. We thus let H = {h1, h2, h3, h4} where
(a) h1 : K² → P(K) is given by h1(a, b) = {a + b} for all a, b ∈ K.
(b) h2 : K² → P(K) is given by h2(a, b) = {a · b} for all a, b ∈ K.
(c) h3 : K → P(K) is given by h3(a) = {−a} for all a ∈ K.
(d) h4 : K → P(K) is given by h4(a) = {a⁻¹} if a ≠ 0, and h4(a) = ∅ if a = 0.
Then (K, B, H) is a generating system.
Notice that if we have a simple generating system (A, B, H), then we can associate to it the generating system (A, B, H′) where H′ = {h′ : h ∈ H}, and where if h : A^k → A is an element of H_k, then h′ : A^k → P(A) is defined by letting h′(a1, a2, ..., ak) = {h(a1, a2, ..., ak)}.
Given a generating system (A, B, H), we want to define the set of elements of A generated from B using the functions in H. There are many natural ways of doing this. We discuss three different ways, which divide into approaches "from above" and approaches "from below". Each of these descriptions can be slightly simplified for simple generating systems, but it's not much harder to handle the more general case.
2.2.1 From Above
2.2.2 From Below
The second idea is to make a system of levels, at each new level adding those elements of A which are reachable from elements already accumulated by applying an element of H.
Definition 2.2.8. Let (A, B, H) be a generating system. We define a sequence V_n(A, B, H), or simply V_n, recursively as follows.
V_0 = B
V_{n+1} = V_n ∪ {c ∈ A : there exist k ∈ N⁺, h ∈ H_k, and a1, a2, ..., ak ∈ V_n such that c ∈ h(a1, a2, ..., ak)}
Let V(A, B, H) = V = ⋃_{n∈N} V_n.
2.2.3
The third method is to consider those elements of A which you are forced to put in because you see a witnessing construction.
Definition 2.2.10. Let (A, B, H) be a generating system. A witnessing sequence is an element σ ∈ A*\{λ} (where λ is the empty sequence) such that for all j < |σ|, either
1. σ(j) ∈ B, or
2. there exist k ∈ N⁺, h ∈ H_k, and i1, i2, ..., ik < j such that σ(j) ∈ h(σ(i1), σ(i2), ..., σ(ik)).
If σ is a witnessing sequence, we call it a witnessing sequence for σ(|σ| − 1) (i.e. a witnessing sequence for the last element of that sequence).
Definition 2.2.11. Let (A, B, H) be a generating system. Set
W(A, B, H) = W = {a ∈ A : there exists a witnessing sequence for a}.
It is sometimes useful to look only at those elements which are witnessed by sequences of a bounded length, so for each n ∈ N⁺, set
W_n = {a ∈ A : there exists a witnessing sequence for a of length n}.
The first simple observation is that if we truncate a witnessing sequence, what remains is a witnessing sequence.
Remark 2.2.12. If σ is a witnessing sequence and |σ| = n, then for all m ∈ N⁺ with m < n, we have that σ ↾ [m] is a witnessing sequence.
Another straightforward observation is that if we concatenate two witnessing sequences, the result is a witnessing sequence.
Proposition 2.2.13. If σ and τ are witnessing sequences, then so is στ.
Finally, since we can always insert dummy elements from B (assuming B is nonempty, because otherwise the result is trivial), we have the following observation.
Proposition 2.2.14. Let (A, B, H) be a generating system. If m ≤ n, then W_m ⊆ W_n.
2.2.4
2.3 Step Induction
Here's a simple example of using the I definition (the "from above" description of the generated set G as the intersection of all inductive subsets of A) to prove that we can argue by induction.
Proposition 2.3.1 (Step Induction). Let (A, B, H) be a generating system. Suppose that X ⊆ A satisfies:
1. B ⊆ X.
2. h(a1, a2, ..., ak) ⊆ X whenever k ∈ N⁺, h ∈ H_k, and a1, a2, ..., ak ∈ X.
We then have that G ⊆ X. Thus, if X ⊆ G, we have X = G.
Proof. Our assumption simply asserts that X is inductive, hence G = I ⊆ X.
The next example illustrates how we can sometimes identify G explicitly. Notice that we use two different types of induction in the argument: one direction uses induction on N, and the other uses induction on G as just described.
Example 2.3.2. Consider the following simple generating system. Let A = R, B = {7}, and H = {h} where h : R → R is the function h(x) = 2x. Determine G explicitly.
Proof. Intuitively, we want the set {7, 14, 28, 56, ...}, which we can write more formally as {7 · 2^n : n ∈ N}. Let X = {7 · 2^n : n ∈ N}.
We first show that X ⊆ G by showing that 7 · 2^n ∈ G for all n ∈ N by induction (on N). We have 7 · 2^0 = 7 · 1 = 7 ∈ G because B ⊆ G, as G is inductive. Suppose that n ∈ N is such that 7 · 2^n ∈ G. Since G is inductive, it follows that h(7 · 2^n) = 2 · 7 · 2^n = 7 · 2^(n+1) ∈ G. Therefore, 7 · 2^n ∈ G for all n ∈ N by induction, hence X ⊆ G.
We now show that G ⊆ X by induction (on G). Notice that B ⊆ X because 7 = 7 · 1 = 7 · 2^0 ∈ X. Suppose now that x ∈ X, and fix n ∈ N with x = 7 · 2^n. We then have h(x) = 2 · x = 7 · 2^(n+1) ∈ X. Therefore, G ⊆ X by induction.
It follows that X = G.
In many cases, it's very hard to give a simple explicit description of the set G. This is where induction really shines, because it allows us to prove something about all elements of G despite the fact that we have a hard time getting a handle on what exactly the elements of G look like. Here's an example.
Example 2.3.3. Consider the following simple generating system. Let A = Z, B = {6, 183}, and H = {h} where h : A³ → A is given by h(k, m, n) = k · m + n. Every element of G is divisible by 3.
Proof. Let X = {n ∈ Z : n is divisible by 3}. We prove by induction that G ⊆ X. We first handle the base case. Notice that 6 = 3 · 2 and 183 = 3 · 61, so B ⊆ X.
We now do the inductive step. Suppose that k, m, n ∈ X, and fix ℓ1, ℓ2, ℓ3 ∈ Z with k = 3ℓ1, m = 3ℓ2, and n = 3ℓ3. We then have
h(k, m, n) = k · m + n
= (3ℓ1) · (3ℓ2) + 3ℓ3
= 9ℓ1ℓ2 + 3ℓ3
= 3(3ℓ1ℓ2 + ℓ3),
hence h(k, m, n) ∈ X.
It follows by induction that G ⊆ X, i.e. that every element of G is divisible by 3.
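Although only the induction proof establishes the claim for all of G, it is easy to spot-check finitely many elements. A small Python sketch (ours, purely illustrative) builds a couple of rounds of the generation process and confirms that everything produced so far is divisible by 3:

```python
from itertools import product

# Sanity check for Example 2.3.3: A = Z, B = {6, 183}, h(k, m, n) = k*m + n.
V = {6, 183}
for _ in range(2):  # two rounds of generation already produce many elements
    V = V | {k * m + n for k, m, n in product(V, repeat=3)}

assert all(x % 3 == 0 for x in V)
print(f"checked {len(V)} generated elements; all divisible by 3")
```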
2.4 Step Recursion
In this section, we restrict attention to simple generating systems for simplicity (and also because all examples that we'll need which support definition by recursion will be simple). Naively, one might expect that a straightforward analogue of the Step Form of Recursion on N will carry over to recursion on generated sets. The hope would be the following.
Hope 2.4.1. Suppose that (A, B, H) is a simple generating system and X is a set. Suppose also that α : B → X and that for every h ∈ H_k, we have a function g_h : (A × X)^k → X. There exists a unique function f : G → X such that
1. f(b) = α(b) for all b ∈ B.
2. f(h(a1, a2, ..., ak)) = g_h(a1, f(a1), a2, f(a2), ..., ak, f(ak)) for all a1, a2, ..., ak ∈ G.
Unfortunately, this hope is too good to be true. Intuitively, we may generate an element a of A in many very different ways, and our different iterating functions may conflict on what value we should assign to a. Here's a simple example to see what can go wrong.
Example 2.4.2. Consider the following simple generating system. Let A = {1, 2}, B = {1}, and H = {h} where h : A → A is given by h(1) = 2 and h(2) = 1. Let X = N. Define α : B → N by letting α(1) = 1, and define g_h : A × N → N by letting g_h(a, n) = n + 1. There is no function f : G → N such that
1. f(b) = α(b) for all b ∈ B.
2. f(h(a)) = g_h(a, f(a)) for all a ∈ G.
Proof. Notice first that G = {1, 2}. Suppose that f : G → N satisfies (1) and (2) above. Since f satisfies (1), we must have f(1) = α(1) = 1. By (2), we then have that
f(2) = f(h(1)) = g_h(1, f(1)) = f(1) + 1 = 1 + 1 = 2.
By (2) again, it follows that
f(1) = f(h(2)) = g_h(2, f(2)) = f(2) + 1 = 2 + 1 = 3,
contradicting the fact that f(1) = 1.
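One can watch this conflict emerge mechanically. The following Python sketch (our illustration; the bookkeeping is invented) chases the two requirements around the cycle 1 → 2 → 1 and reports the clash on f(1):

```python
# Example 2.4.2 mechanically: rule 1 says f(1) = 1, and rule 2 says
# f(h(a)) = f(a) + 1. Chasing rule 2 around the cycle forces a contradiction.
h = {1: 2, 2: 1}
f = {1: 1}                       # rule 1: f(1) = alpha(1) = 1
for _ in range(2):               # apply rule 2 twice around the cycle
    for a, fa in list(f.items()):
        required = fa + 1        # rule 2 demands f(h(a)) = f(a) + 1
        if h[a] in f and f[h[a]] != required:
            print(f"conflict: f({h[a]}) must be both {f[h[a]]} and {required}")
        f.setdefault(h[a], required)
# prints: conflict: f(1) must be both 1 and 3
```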
To get around this problem, we want a definition of a "nice" simple generating system. Intuitively, we want to say something like "every element of G is generated in a unique way". The following definition is a relatively straightforward way to formulate this.
Definition 2.4.3. A simple generating system (A, B, H) is free if
1. ran(h ↾ G^k) ∩ B = ∅ whenever h ∈ H_k.
2. h ↾ G^k is injective for every h ∈ H_k.
3. ran(h1 ↾ G^k) ∩ ran(h2 ↾ G^ℓ) = ∅ whenever h1 ∈ H_k and h2 ∈ H_ℓ with h1 ≠ h2.
Here's a simple example which will play a role for us in Section 2.5. We'll see more subtle and important examples when we come to Propositional Logic and First-Order Logic.
Example 2.4.4. Let X be a set. Consider the following simple generating system. Let A = X*, let B = X, and let H = {h_x : x ∈ X} where h_x : X* → X* is the function h_x(σ) = xσ. We then have that G = X*\{λ}, and that (A, B, H) is free.
Proof. First notice that X*\{λ} is inductive, because λ ∉ B and h_x(σ) ≠ λ for all σ ∈ X*. Next, a simple induction on n shows that X^n ⊆ G for all n ∈ N⁺. It follows that G = X*\{λ}.
We now show that (A, B, H) is free. First notice that for any x ∈ X, we have that ran(h_x ↾ G) ∩ X = ∅, because every element of ran(h_x ↾ G) has length at least 2 (because λ ∉ G).
Now for any x ∈ X, we have that h_x ↾ G is injective, because if h_x(σ) = h_x(τ), then xσ = xτ, and hence σ = τ.
Finally, notice that if x, y ∈ X with x ≠ y, we have that ran(h_x ↾ G) ∩ ran(h_y ↾ G) = ∅, because every element of ran(h_x ↾ G) begins with x, while every element of ran(h_y ↾ G) begins with y.
On to the theorem which says that if a simple generating system is free, then we can perform recursive definitions on it.
Theorem 2.4.5. Suppose that the simple generating system (A, B, H) is free and X is a set. Suppose also that α : B → X and that for every h ∈ H_k, we have a function g_h : (A × X)^k → X. There exists a unique function f : G → X such that
1. f(b) = α(b) for all b ∈ B.
2. f(h(a1, a2, ..., ak)) = g_h(a1, f(a1), a2, f(a2), ..., ak, f(ak)) for all a1, a2, ..., ak ∈ G.
Proof (uniqueness). Suppose that f1, f2 : G → X both satisfy conditions 1 and 2 above, and let Y = {a ∈ G : f1(a) = f2(a)}. For any b ∈ B, we have f1(b) = α(b) = f2(b), hence b ∈ Y. It follows that B ⊆ Y. Suppose now that h ∈ H_k and a1, a2, ..., ak ∈ Y. Since ai ∈ Y for each i, we have f1(ai) = f2(ai) for each i, and hence
f1(h(a1, a2, ..., ak)) = g_h(a1, f1(a1), a2, f1(a2), ..., ak, f1(ak))
= g_h(a1, f2(a1), a2, f2(a2), ..., ak, f2(ak))
= f2(h(a1, a2, ..., ak)).
Thus, h(a1, a2, ..., ak) ∈ Y. It follows by induction that Y = G, i.e. f1(a) = f2(a) for all a ∈ G.
2.5 An Illustrative Example
We now embark on a careful formulation and proof of the statement: if f : A² → A is associative, i.e. f(a, f(b, c)) = f(f(a, b), c) for all a, b, c ∈ A, then any grouping of terms which preserves the ordering of the elements inside the grouping gives the same value. In particular, if we are working in a group A, then we can write things like acabba without parentheses, because any allowable insertion of parentheses gives the same value.
Throughout this section, let A be a set not containing the symbols [, ], or ⋆. Let Sym_A = A ∪ {[, ], ⋆}.
Definition 2.5.1. Define a binary function h : (Sym_A*)² → Sym_A* by letting h(σ, τ) be the sequence [σ ⋆ τ]. Let ValidExp_A = G(Sym_A*, A, {h}) (viewed as a simple generating system).
For example, suppose that A = {a, b, c}. Typical elements of G(Sym_A*, A, {h}) are c, [b ⋆ [a ⋆ c]], and [c ⋆ [[c ⋆ b] ⋆ a]]. The idea now is that if we have a particular function f : A² → A, we can interpret ⋆ as application of the function, and then this should give us a way to make sense of, that is, evaluate, any element of ValidExp_A.
2.5.1 Proving Freeness
Define K : Sym_A* → Z as follows. We first define w : Sym_A → Z by letting w(a) = 0 for all a ∈ A, letting w(⋆) = 0, letting w([) = −1, and letting w(]) = 1. We then define K : Sym_A* → Z by letting K(λ) = 0 and letting K(σ) = Σ_{i<|σ|} w(σ(i)) for all σ ∈ Sym_A*\{λ}.
Proposition 2.5.5. If σ ∈ ValidExp_A, then K(σ) = 0.
Proof. The proof is by induction on σ. In other words, we let X = {σ ∈ ValidExp_A : K(σ) = 0}, and we prove by induction that X = ValidExp_A. Notice that for every a ∈ A, we have that K(a) = 0. Suppose that σ, τ ∈ ValidExp_A are such that K(σ) = 0 = K(τ). We then have that
K([σ ⋆ τ]) = K([) + K(σ) + K(⋆) + K(τ) + K(])
= −1 + 0 + 0 + 0 + 1
= 0.
The result follows by induction.
Proposition 2.5.6. If σ ∈ ValidExp_A and τ ⊂ σ (i.e. τ is a proper initial segment of σ) with τ ≠ λ, then K(τ) ≤ −1.
Proof. Again, the proof is by induction on σ. That is, we let
X = {σ ∈ ValidExp_A : for all τ ⊂ σ with τ ≠ λ, we have K(τ) ≤ −1}
and we prove by induction that X = ValidExp_A.
For every a ∈ A, this is trivial, because there is no τ ≠ λ with τ ⊂ a.
Suppose that σ, τ ∈ ValidExp_A and the result holds for σ and τ. We prove the result for [σ ⋆ τ]. Suppose that θ ⊂ [σ ⋆ τ] and θ ≠ λ. If θ is [, then K(θ) = −1. If θ is [π where π ≠ λ and π ⊂ σ, then
K(θ) = −1 + K(π)
≤ −1 − 1 (by induction)
≤ −1.
If θ is [σ or [σ⋆, then
K(θ) = −1 + K(σ)
= −1 + 0
= −1.
If θ is [σ⋆π, where π ≠ λ and π ⊂ τ, then
K(θ) = −1 + K(σ) + K(π)
= −1 + 0 + K(π)
≤ −1 + 0 − 1 (by induction)
≤ −1.
Otherwise, θ is [σ⋆τ, and
K(θ) = −1 + K(σ) + K(τ)
= −1 + 0 + 0
= −1.
Thus, the result holds for [σ ⋆ τ].
Corollary 2.5.7. If σ, τ ∈ ValidExp_A, then σ is not a proper initial segment of τ.
Proof. This follows by combining Proposition 2.5.5 and Proposition 2.5.6, along with noting that λ ∉ ValidExp_A (which follows by a trivial induction).
2.5.2 The Result
Since we have established freeness, we can define functions recursively. The first such function we define is the evaluation function.
Definition 2.5.9. Let f : A² → A. We define a function Ev_f : ValidExp_A → A recursively by letting
Ev_f(a) = a for all a ∈ A,
Ev_f([σ ⋆ τ]) = f(Ev_f(σ), Ev_f(τ)) for all σ, τ ∈ ValidExp_A.
Formally, we use freeness to justify this definition as follows. Let α : A → A be the identity map, and let g_h : (Sym_A* × A)² → A be the function defined by letting g_h((σ, a), (τ, b)) = f(a, b). By freeness, there is a unique function Ev_f : ValidExp_A → A such that
1. Ev_f(a) = α(a) for all a ∈ A.
2. Ev_f(h(σ, τ)) = g_h((σ, Ev_f(σ)), (τ, Ev_f(τ))) for all σ, τ ∈ ValidExp_A,
which, unravelling definitions, is exactly what we wrote above.
We now define the function which eliminates all mention of parentheses and ⋆. Thus, it produces the sequence of elements of A within the given sequence, in order of their occurrence.
Definition 2.5.10. Define a function D : ValidExp_A → A* recursively by letting
D(a) = a for all a ∈ A,
D([σ ⋆ τ]) = D(σ)D(τ) for all σ, τ ∈ ValidExp_A.
With these definitions in hand, we can now precisely state our theorem.
Theorem 2.5.11. Suppose that f : A² → A is associative, i.e. f(a, f(b, c)) = f(f(a, b), c) for all a, b, c ∈ A. For all σ, τ ∈ ValidExp_A with D(σ) = D(τ), we have Ev_f(σ) = Ev_f(τ).
In order to prove our theorem, we'll make use of the following function. Intuitively, it takes a sequence such as cabc and associates to the right to produce [c ⋆ [a ⋆ [b ⋆ c]]]. Thus, it provides a canonical way to put together the elements of the sequence into something we can evaluate.
To make the recursive definition precise, consider the simple generating system (A*, A, {h_a : a ∈ A}) where h_a : A* → A* is defined by h_a(σ) = aσ. As shown in Example 2.4.4, we know that (A*, A, {h_a : a ∈ A}) is free, and we have that G = A*\{λ}.
Definition 2.5.12. We define R : A*\{λ} → Sym_A* recursively by letting R(a) = a for all a ∈ A, and letting R(aσ) = [a ⋆ R(σ)] for all a ∈ A and all σ ∈ A*\{λ}.
In order to prove our theorem, we will show that Ev_f(σ) = Ev_f(R(D(σ))) for all σ ∈ ValidExp_A, i.e. that we can take any σ ∈ ValidExp_A, rip it apart so that we see the elements of A in order, and then associate to the right, without affecting the result of the evaluation. We first need the following lemma.
Lemma 2.5.13. Ev_f([R(σ) ⋆ R(τ)]) = Ev_f(R(στ)) for all σ, τ ∈ A*\{λ}.
Proof. Fix τ ∈ A*\{λ}. We prove the result for this fixed τ by induction on σ ∈ A*\{λ}. That is, we let
X = {σ ∈ A*\{λ} : Ev_f([R(σ) ⋆ R(τ)]) = Ev_f(R(στ))}
and prove by induction on (A*, A, {h_a : a ∈ A}) that X = A*\{λ}. Suppose first that a ∈ A. We then have
Ev_f([R(a) ⋆ R(τ)]) = Ev_f([a ⋆ R(τ)]) (by definition of R)
= Ev_f(R(aτ)) (by definition of R).
Suppose now that σ ∈ X, and let a ∈ A. We then have
Ev_f([R(aσ) ⋆ R(τ)]) = Ev_f([[a ⋆ R(σ)] ⋆ R(τ)]) (by definition of R)
= f(f(a, Ev_f(R(σ))), Ev_f(R(τ))) (by definition of Ev_f)
= f(a, f(Ev_f(R(σ)), Ev_f(R(τ)))) (since f is associative)
= f(a, Ev_f([R(σ) ⋆ R(τ)])) (by definition of Ev_f)
= f(a, Ev_f(R(στ))) (since σ ∈ X)
= Ev_f([a ⋆ R(στ)]) (by definition of Ev_f)
= Ev_f(R(aστ)) (by definition of R).
The result follows by induction.
Proof of Theorem 2.5.11. Suppose that σ, τ ∈ ValidExp_A are such that D(σ) = D(τ). We then have that
Ev_f(σ) = Ev_f(R(D(σ)))
= Ev_f(R(D(τ)))
= Ev_f(τ).
2.5.3
It is standard mathematical practice to place binary operations like ⋆ between two elements (so-called infix notation) to signify the application of a binary function, and throughout this section we have followed that tradition in building up permissible expressions. However, the price we pay is that we need to use parentheses to avoid ambiguity. For example, it is not clear how to parse a ⋆ b ⋆ c into one of [[a ⋆ b] ⋆ c] or [a ⋆ [b ⋆ c]], and if the underlying function f is not associative, then the distinction really matters.
We can of course move the operation to the front and write ⋆[a, b] instead of [a ⋆ b], similar to how we sometimes write f(x, y) for a function of two variables. At first sight this looks even worse, because we introduced a comma. However, it turns out that you can avoid all of this extra symbolism entirely. That is, we simply write ⋆ab without additional punctuation, and build further expressions from here without introducing ambiguity. This syntactic approach is called Polish notation. For example, we have the following translations into Polish notation:
[[a ⋆ b] ⋆ c] = ⋆⋆abc
[a ⋆ [b ⋆ c]] = ⋆a⋆bc
[[a ⋆ b] ⋆ [c ⋆ d]] = ⋆⋆ab⋆cd
We now go about proving that every expression in Polish notation is built up in a unique way. That is, we prove that the corresponding generating system is free. For this section, let A be a set not containing the symbol ⋆, and let Sym_A = A ∪ {⋆} (we no longer need the parentheses).
Definition 2.5.15. Define a binary function h : (Sym_A*)² → Sym_A* by letting h(σ, τ) be the sequence ⋆στ. Let PolishExp_A = G(Sym_A*, A, {h}) (viewed as a simple generating system).
Proposition 2.5.16. The simple generating system (Sym_A*, A, {h}) is free.
Definition 2.5.17. Define K : Sym_A* → Z as follows. We first define w : Sym_A → Z by letting
w(a) = 1 for all a ∈ A,
w(⋆) = −1.
We then define K : Sym_A* → Z by letting K(λ) = 0 and letting K(σ) = Σ_{i<|σ|} w(σ(i)) for all σ ∈ Sym_A*\{λ}.
Proposition 2.5.19. If σ ∈ PolishExp_A, then K(σ) = 1.
Proposition 2.5.20. If σ ∈ PolishExp_A and τ ⊂ σ, then K(τ) ≤ 0.
Proof. The proof is by induction on σ. For every a ∈ A, this is trivial, because the only τ ⊂ a is τ = λ, and we have K(λ) = 0.
Suppose that σ, τ ∈ PolishExp_A and the result holds for σ and τ. We prove the result for ⋆στ. Suppose that θ ⊂ ⋆στ. If θ = λ, then K(θ) = 0. If θ is ⋆π for some π ⊂ σ, then
K(θ) = K(⋆) + K(π)
≤ −1 + 0 (by induction)
= −1
≤ 0.
Otherwise, θ is ⋆σπ for some π ⊂ τ, in which case
K(θ) = K(⋆) + K(σ) + K(π)
= −1 + 1 + K(π)
≤ −1 + 1 + 0 (by induction)
= 0.
Thus, the result holds for ⋆στ.
Corollary 2.5.21. If σ, τ ∈ PolishExp_A, then σ is not a proper initial segment of τ.
Proof. This follows by combining Proposition 2.5.19 and Proposition 2.5.20.
Theorem 2.5.22. The generating system (Sym_A*, A, {h}) is free.
Proof. First notice that ran(h ↾ (PolishExp_A)²) ∩ A = ∅, because all elements of ran(h) begin with ⋆.
Suppose that σ1, σ2, τ1, τ2 ∈ PolishExp_A and that h(σ1, τ1) = h(σ2, τ2). We then have ⋆σ1τ1 = ⋆σ2τ2, hence σ1τ1 = σ2τ2. Since σ1 ⊂ σ2 and σ2 ⊂ σ1 are both impossible by Corollary 2.5.21, it follows that σ1 = σ2. Therefore, τ1 = τ2. It follows that h ↾ (PolishExp_A)² is injective.
Chapter 3
Propositional Logic
3.1 The Syntax of Propositional Logic
3.1.1 Standard Syntax
Definition 3.1.1. Let P be a nonempty set not containing the symbols (, ), ¬, ∧, ∨, and →. Let Sym_P = P ∪ {(, ), ¬, ∧, ∨, →}. Define a unary function h_¬ and binary functions h_∧, h_∨, and h_→ on Sym_P* as follows:
h_¬(σ) = (¬σ)
h_∧(σ, τ) = (σ ∧ τ)
h_∨(σ, τ) = (σ ∨ τ)
h_→(σ, τ) = (σ → τ)
Definition 3.1.2. Fix P. Let Form_P = G(Sym_P*, P, H) where H = {h_¬, h_∧, h_∨, h_→}.
Definition 3.1.3. Define K : Sym_P* → Z as follows. We first define w : Sym_P → Z by letting w(A) = 0 for all A ∈ P, letting w(□) = 0 for all □ ∈ {¬, ∧, ∨, →}, letting w(() = −1, and letting w()) = 1. We then define K : Sym_P* → Z by letting K(λ) = 0 and letting K(σ) = Σ_{i<|σ|} w(σ(i)) for all σ ∈ Sym_P*\{λ}.
Remark 3.1.4. If σ, τ ∈ Sym_P*, then K(στ) = K(σ) + K(τ).
Proposition 3.1.5. If φ ∈ Form_P, then K(φ) = 0.
Proof. A simple induction as above.
Proposition 3.1.6. If φ ∈ Form_P and σ ⊂ φ with σ ≠ λ, then K(σ) ≤ −1.
Proof. A simple induction as above.
Corollary 3.1.7. If φ, ψ ∈ Form_P, then φ is not a proper initial segment of ψ.
Proof. This follows by combining Proposition 3.1.5 and Proposition 3.1.6, along with noting that λ ∉ Form_P (which follows by a simple induction).
Theorem 3.1.8. The generating system (Sym_P*, P, H) is free.
Proof. First notice that ran(h_¬ ↾ Form_P) ∩ P = ∅, because all elements of ran(h_¬) begin with (. Similarly, for any □ ∈ {∧, ∨, →}, we have ran(h_□ ↾ Form_P²) ∩ P = ∅, since all elements of ran(h_□) begin with (.
Suppose that φ, ψ ∈ Form_P and h_¬(φ) = h_¬(ψ). We then have (¬φ) = (¬ψ), hence φ = ψ. Therefore, h_¬ ↾ Form_P is injective. Fix □ ∈ {∧, ∨, →}. Suppose that φ1, φ2, ψ1, ψ2 ∈ Form_P and that h_□(φ1, ψ1) = h_□(φ2, ψ2). We then have (φ1 □ ψ1) = (φ2 □ ψ2), hence φ1 □ ψ1 = φ2 □ ψ2. Since φ1 ⊂ φ2 and φ2 ⊂ φ1 are both impossible by Corollary 3.1.7, it follows that φ1 = φ2. Therefore, □ψ1 = □ψ2, and so ψ1 = ψ2. It follows that h_□ ↾ Form_P² is injective.
Let □ ∈ {∧, ∨, →}. Suppose that φ, ψ1, ψ2 ∈ Form_P and h_¬(φ) = h_□(ψ1, ψ2). We then have (¬φ) = (ψ1 □ ψ2), hence ¬φ = ψ1 □ ψ2, contradicting the fact that no element of Form_P begins with ¬ (by a simple induction). Therefore, ran(h_¬ ↾ Form_P) ∩ ran(h_□ ↾ Form_P²) = ∅.
Suppose now that □1, □2 ∈ {∧, ∨, →} with □1 ≠ □2. Suppose that φ1, φ2, ψ1, ψ2 ∈ Form_P and h_□1(φ1, ψ1) = h_□2(φ2, ψ2). We then have (φ1 □1 ψ1) = (φ2 □2 ψ2), hence φ1 □1 ψ1 = φ2 □2 ψ2. Since φ1 ⊂ φ2 and φ2 ⊂ φ1 are both impossible by Corollary 3.1.7, it follows that φ1 = φ2. Therefore, □1 = □2, a contradiction. It follows that ran(h_□1 ↾ Form_P²) ∩ ran(h_□2 ↾ Form_P²) = ∅.
3.1.2 Polish Notation
Definition 3.1.9. Let P be a set not containing the symbols ¬, ∧, ∨, and →. Let Sym_P = P ∪ {¬, ∧, ∨, →}. Define a unary function h_¬ and binary functions h_∧, h_∨, and h_→ on Sym_P* as follows:
h_¬(σ) = ¬σ
h_∧(σ, τ) = ∧στ
h_∨(σ, τ) = ∨στ
h_→(σ, τ) = →στ
Definition 3.1.10. Fix P. Let Form_P = G(Sym_P*, P, H) where H = {h_¬, h_∧, h_∨, h_→}.
Definition 3.1.11. Define K : Sym_P* → Z as follows. We first define w : Sym_P → Z by letting w(A) = 1 for all A ∈ P, letting w(¬) = 0, and letting w(□) = −1 for all □ ∈ {∧, ∨, →}. We then define K : Sym_P* → Z by letting K(λ) = 0 and letting K(σ) = Σ_{i<|σ|} w(σ(i)) for all σ ∈ Sym_P*\{λ}.
Remark 3.1.12. If σ, τ ∈ Sym_P*, then K(στ) = K(σ) + K(τ).
Proposition 3.1.13. If φ ∈ Form_P, then K(φ) = 1.
Proof. The proof is by induction on φ. Notice that for every A ∈ P, we have that K(A) = 1. Suppose that φ ∈ Form_P is such that K(φ) = 1. We then have that
K(¬φ) = 0 + K(φ)
= K(φ)
= 1.
Suppose now that φ, ψ ∈ Form_P are such that K(φ) = 1 = K(ψ), and let □ ∈ {∧, ∨, →}. We then have that
K(□φψ) = −1 + K(φ) + K(ψ)
= −1 + 1 + 1
= 1.
The result follows by induction.
Proposition 3.1.14. If φ ∈ Form_P and σ ⊂ φ, then K(σ) ≤ 0.
Proof. The proof is by induction on φ. For every A ∈ P, this is trivial, because the only σ ⊂ A is σ = λ, and we have K(λ) = 0.
Suppose that φ ∈ Form_P and the result holds for φ. We prove the result for ¬φ. Suppose that σ ⊂ ¬φ. If σ = λ, then K(σ) = 0. Otherwise, σ is ¬τ for some τ ⊂ φ, in which case
K(σ) = 0 + K(τ)
≤ 0 + 0 (by induction)
= 0.
Thus, the result holds for ¬φ.
Suppose that φ, ψ ∈ Form_P and the result holds for φ and ψ. Let □ ∈ {∧, ∨, →}. We prove the result for □φψ. Suppose that σ ⊂ □φψ. If σ = λ, then K(σ) = 0. If σ is □τ for some τ ⊂ φ, then
K(σ) = −1 + K(τ)
≤ −1 + 0 (by induction)
= −1
≤ 0.
Otherwise, σ is □φτ for some τ ⊂ ψ, in which case
K(σ) = −1 + K(φ) + K(τ)
= −1 + 1 + K(τ)
≤ −1 + 1 + 0 (by induction)
= 0.
Thus, the result holds for □φψ.
Corollary 3.1.15. If φ, ψ ∈ Form_P, then φ is not a proper initial segment of ψ.
Proof. This follows by combining Proposition 3.1.13 and Proposition 3.1.14.
Theorem 3.1.16. The generating system (SymP , P, H) is free.
Proof. First notice that ran(h F ormP ) P = because all elements of ran(h ) begin with . Similarly,
for any 3 {, , }, we have ran(h3 F orm2P ) P = since all elements of ran(h3 ) begin with 3.
Suppose that , F ormP and h () = h (). We then have = , hence = . Therefore,
h F ormP is injective. Fix 3 {, , }. Suppose that 1 , 2 , 1 , 2 F ormP and that h3 (1 , 1 ) =
h3 (2 , 2 ). We then have 31 1 = 32 2 , hence 1 1 = 2 2 . Since 1 2 and 2 1 are both
impossible by Corollary 3.1.15, it follows that 1 = 2 . Therefore, 1 = 2 . It follows that h3 F orm2P is
injective.
For any 3 {, , }, we have ran(h F ormP )ran(h3 F orm2P ) = because all elements of ran(h )
begin with and all elements of ran(h3 ) begin with 3. Similarly, if 31 , 32 {, , } with 31 6= 32 , we
have ran(h31 F orm2P ) ran(h32 F orm2P ) = because all elements of ran(h31 ) begin with 31 and all
elements of ran(h32 ) begin with 32 .
3.1.3
Since we should probably fix an official syntax, lets agree to use Polish notation because its simpler in
many aspects and it will be natural to generalize when we talk about the possibility of other connectives
and when we discuss first-order logic. However, as with many official definitions in mathematics, well ignore
34
and abuse this convention constantly in the interest of readability. For example, well often write things in
standard syntax or in more abbreviated forms. For example, well write A B instead of AB (or (A B)
in the original syntax) . Well also write something like
A1 A2 An1 An
or
n
^
Ai
i=1
instead of (A1 (A2 ( (An1 An ) ))) in standard syntax or A1 A2 An1 An in Polish notation
(which can be precisely defined in a similar manner as R in Section 2.5). In general, when we string together
multiple applications of an operation (such as ) occur in order, we always associate to the right.
When it comes to mixing symbols, lets agree to the following conventions about binding in a similar
fashion to how we think of as more binding than + (so that 35+2 is read as (35)+2). We think of as the
most binding, so we read A B as ((A) B). After that, we consider and as the next most binding,
and has the least binding. Well insert parentheses when we wish to override this binding. For example,
A B C D is really ((A (B)) (C D)) while A (B C D) is really (A ((B) (C D))).
3.1.4
Recursive Definitions
Since weve shown that our generating system is free, we can define functions recursively. It is possible
to avoid using recursion on F ormP to define some of functions. In such cases, you may wonder why we
bother. Since our only powerful way to prove things about the set F ormP is by induction, and definitions
of functions by recursion are well-suited to induction, its simply the easiest way to procede.
Definition 3.1.17. If X is a set, we denote by P(X) the set of all subsets of X. Thus P(X) = {Z : Z X}.
We call P(X) the power set of X.
Definition 3.1.18. We define a function OccurP rop : F ormP P(P ) recursively as follows.
OccurP rop(A) = {A} for all A P .
OccurP rop() = OccurP rop().
OccurP rop(3) = OccurP rop() OccurP rop() for each 3 {, , }.
If you want to be precise in the previous definition, were defining functions : P P(P ), gh : SymP
P(P ) P(P ) and gh3 : (SymP P(P ))2 P(P ) for each 3 {, , } as follows.
(A) = {A} for all A P .
gh (, Z) = Z.
gh3 (1 , Z1 , 2 , Z2 ) = Z1 Z2 for each 3 {, , }.
and were using our result on freeness to assure that there is a unique function OccurP rop : F ormP P(P )
which satisfy the associated requirements. Of course, this method is more precise, but its hardly more
intuitive to use. Its a good exercise to make sure that you can translate a few more informal recursive
definitions in this way, but once you understand how it works you can safely keep the formalism in the back
of your mind.
Heres a somewhat trivial example of using induction to prove a result based on a recursive definition.
Proposition 3.1.19. Suppose that Q P . We then have that F ormQ F ormP .
35
if = A
Subst (A) =
A otherwise
(
=
Subst ()
Subst ()
Subst (3)
(
=
if =
otherwise
3Subst ()Subst ()
if = 3
otherwise
for each 3 {, , }.
Example 3.1.26. SubstAB
C ( CAC) = ABA AB.
36
3.2
0
v() =
0
1
v() =
1
v( ) =
0
if
if
if
if
v() = 0
v() = 0
v() = 1
v() = 1
and
and
and
and
v() = 0
v() = 1
v() = 0
v() = 1
if
if
if
if
v() = 0
v() = 0
v() = 1
v() = 1
and
and
and
and
v() = 0
v() = 1
v() = 0
v() = 1
if
if
if
if
v() = 0
v() = 0
v() = 1
v() = 1
and
and
and
and
v() = 0
v() = 1
v() = 0
v() = 1
Before moving on, we should a couple of things about what happens when we shrink/enlarge the set P .
Intuitively, if F ormQ and Q P , then we can extend the truth assigment from Q to P arbitrarily
without affecting the value of v(). Here is the precise statement.
Proposition 3.2.3. Suppose that Q P and that v : P {0, 1} is a truth assignment on P . We then have
that v() = (v Q)() for all F ormQ .
Proof. A trivial induction on F ormQ .
Proposition 3.2.4. Suppose F ormP . Whenever v1 and v2 are truth assignments on P such that
v1 (A) = v2 (A) for all A OccurP rop(), we have v 1 () = v 2 ().
Proof. Let Q = OccurP rop(). We then have that F ormQ by Proposition 3.1.20. Since v1 Q = v2 Q,
we have
v 1 () = (v1 Q)() = (v2 Q)() = v 2 ()
With a method of assigning true/false values to formulas in hand (once weve assigned them to P ), were
now in position to use our semantic definitions to given a precise meaning to The set of formulas implies
the formula .
Definition 3.2.5. Let P be given Let F ormP and let F ormP . We write P , or simply
if P is clear, to mean that whenever v is a truth assignment on P such that v() = 1 for all , we have
v() = 1. We pronounce as semantically implies .
37
We also have a semantic way to say that a set of formulas is not contradictory.
Definition 3.2.6. is satisfiable if there exists a truth assignment v : P {0, 1} such that v() = 1 for all
. Otherwise, we say that is unsatisfiable.
Example 3.2.7. Let P = {A, B, C}. We have {A B, (A (C))} B C.
Proof. Let v : P {0, 1} be a truth assignment such that v(A B) = 1 and v((A (C))) = 1. We need to
show that v(B C) = 1. Suppose not. We would then have that v(B) = 0 and v(C) = 0. Since v(A B) = 1,
this implies that v(A) = 1. Therefore, v(A (C)) = 1, so v((A (C))) = 0, a contradiction.
Example 3.2.8. Let P be given. For any , F ormP , we have { , }
Proof. Let v : P {0, 1} be a truth assignment and suppose that v( ) = 1 and v() = 1. If v() = 0,
it would follows that v( ) = 0, a contradiction. Thus, v() = 1.
Notation 3.2.9.
1. If = , we write instead of .
2. If = {}, we write instead of {} .
Definition 3.2.10.
1. Let F ormP . We say that is a tautology if .
2. If and , we say that and are semantically equivalent.
Remark 3.2.11. Notice that and are semantically equivalent if and only if for all truth assigments
v : P {0, 1}, we have v() = v().
Example 3.2.12. is a tautology for any F ormP .
Proof. Fix F ormP . Let v : P {0, 1} be a truth assignment. If v() = 1, then v( ) = 1.
Otherwise, we have v() = 0, in which case v() = 1, and hence v( ) = 1. Therefore, v( ) = 1
for all truth assignments v : P {0, 1}, hence is a tautology.
Example 3.2.13. is semantically equivalent to for any F ormP .
Proof. Fix F ormP . We need to show that for any truth assignment v : P {0, 1}, we have v() = 1 if
and only if v() = 1. We have
v() = 1 v() = 0
v() = 1
38
B
0
0
1
1
0
0
1
1
AB
0
0
1
1
1
1
1
1
C
0
1
0
1
0
1
0
1
(A B) C
0
0
0
1
0
1
0
1
C
1
0
1
0
1
0
1
0
A (C)
1
1
1
1
1
0
1
0
(C) B
0
1
1
1
0
1
1
1
Notice that every row in which both of the (A B) C column and the A (C) column have a 1, namely
just the row beginning with 011, we have that the entry under the (C) B column is a 1. Therefore,
{(A B) C, A (C)} (C) B.
Example 3.2.18. Show that (A B) is semantically equivalent to A B.
Proof.
A
0
0
1
1
B
0
1
0
1
AB
0
0
0
1
(A B)
1
1
1
0
A
1
1
0
0
B
1
0
1
0
A B
1
1
1
0
39
Notice that the rows in which the (A B) column has a 1 are exactly the same as the rows in which the
A B column has a 1. Therefore, (A B) is semantically equivalent to A B.
3.3
Its natural to wonder if our choice of connectives is the right one. For example, why didnt we introduce a
new connective , allowing ourselves to form the formulas (or in Polish notation) and extend
our definition of v so that
0
0
1
1
0
0
1
1
0
1
0
1
0
1
0
1
1
0
1
0
0
0
1
1
40
Suppose we wanted to come up with a formula such that f = B . One option is to use a lot of thought
to come up with an elegant solution. Another is simply to think as follows. Since f (000) = 1, perhaps we
should put
A0 A1 A2
into the formula somewhere. Similarly, since f (010) = 1, perhaps we should put
A0 A1 A2
into the formula somewhere. If we do the same to the other lines which have value 1, we can put all of these
pieces together in a manner which makes them all play nice by connecting them with . Thus, our formula
is
(A0 A1 A2 ) (A0 A1 A2 ) (A0 A1 A2 ) (A0 A1 A2 )
We now give the general proof.
Definition 3.3.4. A literal is a element of P {A : A P }. We denote the set of literals by LitP .
Definition 3.3.5.
Let ConjP = G(SymP , LitP , {h }). We call the elements of ConjP conjunctive formulas.
Let DisjP = G(SymP , LitP , {h }). We call the elements of DisjP disjunctive formulas.
Definition 3.3.6.
Let DN FP = G(SymP , ConjP , {h }). We say that an element of DN FP is in disjunctive normal
form.
Let CN FP = G(SymP , DisjP , {h }). We say that an element of CN FP is in conjunctive normal
form.
Theorem 3.3.7. Fix k N+ , and let P = {A0 , A1 , . . . , Ak1 }. For any boolean function f : {0, 1}k {0, 1}
of arity k, there exists DN FP such that f = B .
Proof. Let T = { {0, 1}k : f () = 1}. If T = , we may let be A0 (A0 ). Suppose then that T 6= .
For each T , let
k1
^
=
i
i=0
where
(
i =
Ai
Ai
if (i) = 1
if (i) = 0
For each T , notice that ConjP because i LitP for all i. Finally, let
_
=
3.4
3.4.1
41
Syntactic Implication
Motivation
We now seek to define a different notion of implication which is based on syntactic manipulations instead
of a detour through truth assignments and other semantic notions. We will do this by setting up a proof
system which gives rules on how to transform certain implications to other implications. There are many
many ways to do this. Some approaches pride themselves on being minimalistic by using a minimal number
of axioms and rules, often at the expense of making the system extremely unnatural to work with. Well
take a different approach and set down our rules and axioms based on the types of steps in a proof that are
used naturally throughout mathematics.
We begin with a somewhat informal description of what we plan to do. The objects that will manipulate
are pairs, where the first component is a set of formulas and the second is a formula. Given F ormP
and F ormP , we write ` to intuitively mean that there is a proof of from the assumptions . We
begin with the most basic proofs. If , i.e. if is one of your assumptions, then youre permitted to
assert that ` .
Basic Proofs: ` if .
Rules for : We have two rules for -elimination and one for -introduction.
`
(ER)
`
`
(EL)
`
` `
(I)
`
`
(IR)
`
{} `
( I)
`
`
( E)
{} `
{} ` {} `
(P C)
`
3.4.2
Official Definitions
otherwise
42
if 1 = 2
otherwise
3.4.3
Examples Of Deductions
Proposition 3.4.7. A B ` A B.
Proof.
{A B} ` A B
(AssumeP )
(1)
{A B} ` A
(EL on 1)
(2)
(I on 2)
(3)
{A B} ` A B
(AssumeP )
(1)
{, } `
(AssumeP )
(2)
(Contr on 1 and 2)
(3)
{} `
43
Proof.
{} `
(AssumeP )
(1)
{} `
(IL on 1)
(2)
{} `
(AssumeP )
(3)
{} `
(IR on 3)
(4)
(P C on 2 and 4)
(5)
(AssumeP )
(1)
{, , } `
(AssumeP )
(2)
(Contr on 1 and 2)
(3)
{, } `
(AssumeP )
(4)
{, } `
(P C on 3 and 4)
(5)
{, } `
3.4.4
Theorems about `
44
45
3.5
3.5.1
46
3.5.2
The Completeness Theorem is the converse of the Soundness Theorem (both parts). In order words, it says
that (1) If , then ` and (2) every consistent set of formulas is satisfiable. Part (1) looks quite
difficult to tackle directly (think about the amount of cleverness that went into finding the simple deductions
we have used so far), so instead we go after (2) first and use it to prove (1).
Suppose then that F ormP is consistent. We need to build a truth assignment v : P {0, 1} such
that v() = 1 for all . Suppose that we are trying to define v(A) for a given A P . If A , then we
should certainly set v(A) = 1. Similarly, if A , then we should set v(A) = 0. But what should we do
if both A
/ and A
/ ? What if every formula in is very long and complex so that you have no idea
how to start defining the truth assignment? The idea is to expand to a larger consistent set which has
come simpler formulas that aid us in deciphering how to define v. Ideally, we would like to extend to
consistent set 0 such that for all A P , either A 0 or A 0 . because that would give us a clear way
to define v. However, in order to check that our v satisfies v() = 1 for all , we want even more. That
is the content of our next definition.
Definition 3.5.2. Let F ormP . We say that is complete if for all F ormP , either or
.
Our first task is to show that if is consistent, then it can be expanded to a consistent and complete set
. We first prove this in the special case when P is countable because the construction is more transparent
and avoids more powerful set-theoertic tools.
Proposition 3.5.3. Suppose that P is countable. If is consistent, then there exists a set which is
consistent and complete.
Proof. Since P is countable, it follows that F ormP is countable. List F ormP as 1 , 2 , 3 , . . . . We define a
sequence of sets 0 , 1 , 2 , . . . recursively as follows. Let 0 = . Suppose that n N and we have defined
n . Let
(
n {n }
if n {n } is consistent
n+1 =
n {n } otherwise
S
Using induction and Corollary 3.4.14, it follows that n is consistent for all n N. Let = nN n .
We first argue that is consistent. For any finite subset 0 of , there exists an n N such that
0 n , and so 0 is consistent because every n is consistent. Therefore, is consistent by Proposition
3.4.17. We end by arguing that is complete. Fix F ormP , and fix n N+ such that = n . By
construction, we either have n or n . Therefore, is complete.
We now show how to handle the uncountable case. The idea is that a complete consistent set is a maximal
consistent set, so we can obtain one using Zorns Lemma (a standard set-theoretic tool to obtain maximal
objects). If you are unfamiliar with Zorns Lemma, feel free to focus only on the countable case until we
cover set theory.
Definition 3.5.4. is maximal consistent if is consistent and there is no 0 which is consistent.
Proposition 3.5.5. is maximal consistent if and only if is consistent and complete.
Proof. Suppose that is maximal consistent. We certainly have that is consistent. Fix F ormP . By
Corollary 3.4.14, either {} is consistent or {} is consistent. If {} is consistent, then
because is maximal consistent. Similarly, If {} is consistent, then because is maximal
consistent. Therefore, either or .
Suppose that is consistent and complete. Suppose that 0 and fix 0 . Since is
complete and
/ , we have . Therefore, 0 ` and 0 ` , so 0 is inconsistent. It follows that
is maximal consistent.
47
Proposition 3.5.6. If is consistent, then there exists a set which is consistent and complete.
Proof. Let S = { F ormP : and is consistent}, and
S order S by . Notice that S is nonempty
because S. Suppose that C S is a chain in S. Let = C = { F ormP : for some C}.
We need to argue that is consistent. Suppose that 0 is a finite subset of , say 0 = {1 , 2 , . . . , n }.
For each i , fix i C with i i . Since C is a chain, there exists j such that j i for all i. Now
j C S, so j is consistent, and hence 0 is consistent. Therefore, is consistent by Proposition 3.4.17.
It follows that S and using the fact that for all C, we may conclude that C has an upper
bound.
Therefore, by Zorns Lemma, S has a maximal element . Notice that is maximal consistent, hence
is complete and consistent by Proposition 3.5.5.
Lemma 3.5.7. Suppose that is consistent and complete. If ` , then .
Proof. Suppose that ` . Since is complete, we have that either or . Now if ,
then ` , hence is inconsistent contradicting our assumption. It follows that .
Lemma 3.5.8. Suppose that is consistent and complete. We have
1. if and only if
/ .
2. if and only if and .
3. if and only if either or .
4. if and only if either
/ or .
Proof.
1. If , then
/ because otherwise ` and so would be inconsistent.
Conversely, if
/ , then because is complete.
2. Suppose first that . We then have that ` , hence ` by the EL rule and `
by the ER rule. Therefore, and by Lemma 3.5.7.
Conversely, suppose that and . We then have ` and ` , hence ` by the
I rule. Therefore, by Lemma 3.5.7.
3. Suppose first that . Suppose that
/ . Since is complete, we have that . From
Proposition 3.4.10, we know that {, } ` , hence ` by Proposition 3.4.11. Therefore,
by Lemma 3.5.7. It follows that either or .
Conversely, suppose that either or .
Case 1: Suppose that . We have ` , hence ` by the IL rule. Therefore,
by Lemma 3.5.7.
Case 2: Suppose that . We have ` , hence ` by the IR rule. Therefore,
by Lemma 3.5.7.
4. Suppose first that . Suppose that . We then have that ` and ` ,
hence ` by Proposition 3.4.15. Therefore, by Lemma 3.5.7. It follows that either
/ or
.
Conversely, suppose that either
/ or .
Case 1: Suppose that
/ . We have because is complete, hence {} is inconsistent
(as {} ` and {} ` ). It follows that {} ` by Proposition 3.4.12, hence
` by the I rule. Therefore, by Lemma 3.5.7.
48
v() = 1
Suppose that the result holds for and . We have
and
v() = 1 and v() = 1
v( ) = 1
and
or
v() = 1 or v() = 1
v( ) = 1
and finally
/ or
v() = 0 or v() = 1
v( ) = 1
Therefore, by induction, we have if and only if v() = 1. In particular, we have v() = 1 for all
, hence is satisfiable.
Theorem 3.5.10 (Completeness Theorem). (Suppose that P is countable.)
1. Every consistent set of formulas is satisfiable.
2. If , then ` .
Proof.
1. Suppose that is consistent. By Proposition 3.5.6, we may fix which is consistent and complete.
Now is satisfiable by Proposition 3.5.9, so we may fix v : P {0, 1} such that v() = 1 for all .
Since , it follows that v() = 1 for all , hence is satisfiable.
2. Suppose that . We then have that {} is unsatisfiable, hence {} is inconsistent by
part 1. It follows from Proposition 3.4.13 that ` .
3.6
3.6.1
49
3.6.2
Combinatorial Applications
Definition 3.6.2. Let G = (V, E) be a graph, and let k N+ . A k-coloring of G is a function f : V [k]
such that for all u, v V which are linked by an edge in E, we have f (u) 6= f (v).
Proposition 3.6.3. Let G = (V, E) be a (possibly infinite) graph and let k N+ . If every finite subgraph
of G is k-colorable, then G is k-colorable.
Proof. Let P = {Au,i : u V and i [k]}. Let
k1
_
={
i=0
50
{(A A ) : , T, }
We use the Compactness Theorem to show that is satisfiable. Suppose that 0 is finite. Let
= {1 , 2 , . . . , k } be all of the elements {0, 1} such that A occurs in some element of 0 . Let
n = max{|1 |, |2 |, . . . , |k |}. Since Tn 6= , we may fix Tn . If we define a truth assignment v : P {0, 1}
by
(
1 if
v(A ) =
0 otherwise
we see that v() = 1 for all 0 . Thus, 0 is satisfiable. Therefore, is satisfiable by the Compactness
Theorem.
Fix a truth assignment v : P {0, 1} such that v() = 1 for all . Notice that for each n N+ ,
there exists a unique Tn such that v(A ) = 1 because of the first two sets in the definition of . For
each n, denote the unique such by n and notice that m n whenver m n. Define f : N {0, 1} by
letting f (n) = n+1 (n). We then have that f [n] = n T for all n N.
3.6.3
An Algebraic Application
Definition 3.6.7. An ordered abelian group is an abelian group (A, +, 0) together with a relation on A2
such that
1. is a linear ordering on A, i.e. we have
For all a A, we have a a.
For all a, b A, either a b or b a.
If a b and b a, then a = b.
If a b and b c, then a c.
2. If a b and c d, then a + c b + d.
Example 3.6.8. (Z, +, 0) with its usual order is an ordered abelian group.
Example 3.6.9. Define on Zn using the lexicographic order. In other words, given distinct elements
~a = (a1 , a2 , . . . , an ) and ~b = (b1 , b2 , . . . , bn ) in Zn , let i be least such that ai 6= bi , and set ~a < ~b if ai <Z bi ,
and ~b < ~a if bi <Z ai . With this order, (Zn , +, 0) is an ordered abelian group.
Proposition 3.6.10. Suppose that (A, +, 0) is an ordered abelian group, and define < be letting a < b if
a b and a 6= b. We the have
1. For all a, b A, exactly one of a < b, a = b, or b < a holds.
2. If a < b and b c, then a < c.
3. If a b and b < c, then a < c.
51
Proof.
1. Let a, b A. We first show that at least one happens. Suppose then that a 6= b. We either have a b
or b a. If a b, we then have a < b, while if b a, we then have b < a.
We now show that at most one occurs. Clearly, we cant have both a < b and a = b, nor can we have
both a = b and b < a. Suppose then that we have both a < b and b < a. We would then have both
a b and b a, hence a = b, a contradiction.
2. Since a b and b c, we have a c. If a = c, it would follow that a b and b a, hence a = b, a
contradiction.
3. Since a b and b c, we have a c. If a = c, it would follow that c b and b c, hence b = c, a
contradiction.
Definition 3.6.12. An abelian group (A, +, 0) is torsion-free if every nonzero element of A has infinite
order.
Proposition 3.6.13. Every ordered abelian group is torsion-free.
Proof. Let (A, +, 0) be an ordered abelian group. Let a A. If a > 0, then we have n a > 0 for every
n N+ by induction. If a < 0, then we have n a < 0 for every n N+ by induction.
Theorem 3.6.14. Every torsion-free abelian group can be ordered.
Proof. First notice that every finitely generated torsion-free abelian group is isomorphic to Zn for some n,
which we can order lexicographically from above. We can transer this ordering across the isomorphism to
order our finitely generated abelian group.
Suppose now that A is an arbitrary torsion-free abelian group. Let P be the set {La,b : a, b A} and let
be the union of the sets
{La,a : a A}.
52
We show that is satisfiable. By Compactness, it suffices to show that any finite subset of is satisfiable.
Suppose that 0 is finite, and let S be the finite subset of A consisting of all elements of A which appear
as a subscript of a symbol occuring in 0 . Let B be the subgroup of A generated by S. We then have that
B is a finitely generated torsion-free abelian group, so from above we may fix an order on it. If we define
a truth assignment v : P {0, 1} by
(
1 if a b
v(La,b ) =
0 otherwise
we see that v() = 1 for all 0 . Thus, 0 is satisfiable. Therefore, is satisfiable by the Compactness
Theorem.
Fix a truth assignment v : P {0, 1} such that v() = 1 for all . Define on A2 by letting a b
if and only if v(La,b ) = 1. We then have that orders A. Therefore, A can be ordered.
Chapter 4
4.1
Since our logic will have quantifiers, the first thing that we need is a collection of variables.
Definition 4.1.1. Fix a countably infinite set V ar called variables.
Definition 4.1.2. A first-order language, or simply a language, consists of the following:
1. A set C of constant symbols.
2. A set F of function symbols together with a function ArityF : F N+ .
3. A set R of relation symbols together with a function ArityR : R N+ .
We also assume that C, R, F, V ar, and {, , =, , , , } are pairwise disjoint. For each k N+ , we let
Fk = {f F : ArityF (f) = k}
and we let
Rk = {R R : ArityR (R) = k}
Definition 4.1.3. Let L be a language. We let SymL = C R F V ar {, , =, , , , }.
Now that weve described all of the symbols that are available once weve fixed a language, we need to
talk about how to build up formulas. Before doing this, however, we need a way to name objects. Intuitively,
our constant symbols and variables name objects once weve fixed an interpretation. From here, we can get
new objects by applying, perhaps repeatedly, interpretations of function symbols. This is starting to sound
like a recursive definition.
53
54
Definition 4.1.4. Let L be a language. For each f Fk , define hf : (SymL )k SymL by letting
hf (1 , 2 , . . . , k ) = f1 2 k
Let
T ermL = G(SymL , C V ar, {hf : f F})
Now that we have terms which intuitively name elements once weve fixed an interpretation, we need to
say what our atomic formulas are. The idea is that the most basic things we can say are whether or not two
objects are equal or whether or not a k-tuple is in the interpretation of some relation symbol R Rk .
Definition 4.1.5. Let L be a language. We let
AtomicF ormL = {Rt1 t2 tk : k N+ , R Rk , and t1 , t2 , . . . , tk T ermL } {= t1 t2 : t1 , t2 T ermL }
From here, we can build up all formulas.
Definition 4.1.6. Let L be a language. Define a unary function h and binary functions h , h , and h
on SymL as follows.
h () =
h (, ) =
h (, ) =
h (, ) =
Also, for each x V ar, define two unary functions h,x and h,x on SymL as follows
h,x () = x
h,x () = x
Let
F ormL = G(SymL , AtomicF ormL , {h , h , h , h } {h,x , h,x : x V ar})
As with propositional logic, wed like to be able to define things recursively, so we need to check that our
generating systems are free. Notice that in the construction of formulas, we have two generating systems
around. We first generate all terms. With terms taken care of, we next describe the atomic formulas, and
from them we generate all formulas. Thus, well need to prove that two generating systems are free. The
general idea is to make use of the insights gained by proving the corresponding result for Polish notation in
propositional logic.
Definition 4.1.7. Let L be a language. Define K : SymL Z as follows. We first define w : SymL Z
as follows.
w(c) = 1
for all c C
w(f) = 1 k
for all f Fk
w(R) = 1 k
for all R Rk
w(x) = 1
w(=) = 1
for all x V ar
w(Q) = 1
for all Q {, }
w() = 0
w(3) = 1
We then define K on all of
SymL \{}.
SymL
for all 3 {, , }
P
by letting K() = 0 and letting K() =
i<|| w((i)) for all
55
(by induction)
= 1.
The result follows by induction.
Proposition 4.1.10. If t T ermL and t, then K() 0.
Proof. The proof is by induction on t. For every c C, this is trivial because the only c is = and
we have K() = 0. Similarly, for every x V ar, the only x is = and we have K() = 0.
Suppose that k N+ , f Fk , and t1 , t2 , . . . , tk T ermL are such that the result holds for each ti . We
prove the result for ft1 t2 tk . Suppose that ft1 t2 tk . If = , then K() = 0. Otherwise, there
exists i < k and ti such that = ft1 t2 ti1 , in which case
K() = K(f) + K(t1 ) + K(t2 ) + + K(ti1 ) + K( )
= (1 k) + 1 + 1 + + 1 + K( )
= (1 k) + i + K( )
(1 k) + i + 0
(by induction)
= 1 + (i k)
0.
(since i < k)
56
Proof. The proof is by induction on . We first show that K() = 1 for all AtomicF ormL . Suppose
that is Rt1 t2 tk where R Rk and t1 , t2 , . . . , tk T ermL . We then have
K(Rt1 t2 tk ) = K(R) + K(t1 ) + K(t2 ) + + K(tk )
= (1 k) + 1 + 1 + + 1
= 1.
Suppose that is = t1 t2 where t1 , t2 T ermL . We then have
K(= t1 t2 ) = K(=) + K(t1 ) + K(t2 )
= 1 + 1 + 1
= 1.
Thus, K() = 1 for all AtomicF ormL .
Suppose that F ormL is such that K() = 1. We then have that
K() = K() + K()
=0+1
= 1.
For any Q {, } and any x V ar we also have
K(Qx) = K(Q) + K(x) + K()
= 1 + 1 + 1
= 1.
Suppose now that , F ormL are such that K() = 1 = K(), and 3 {, , }. We then have that
K(3) = 1 + K() + K()
= 1 + 1 + 1
= 1.
The result follows by induction.
Proposition 4.1.14. If F ormL and , then K() 0.
Proof. The proof is by induction on . We first show that the results holds for all AtomicF ormL .
Suppose that is Rt1 t2 tk where R Rk and t1 , t2 , . . . , tk T ermL . Suppose that Rt1 t2 tk . If
= , then K() = 0. Otherwise, there exists i < k and ti such that is Rt1 t2 ti1 , in which case
K() = K(R) + K(t1 ) + K(t2 ) + + K(ti1 ) + K( )
= (1 k) + 1 + 1 + + 1 + K( )
= (1 k) + i + K( )
(1 k) + i + 0
(by induction)
= 1 + (i k)
0.
(since i < k)
Thus, the result holds for Rt1 t2 tk . The same argument works for = t1 t2 where t1 , t2 T ermL , so the
result holds for all AtomicF ormL .
57
Suppose that the result holds for F ormL . Suppose that . If = , then K() = 0.
Otherwise, = for some , in which case
K() = K() + K( )
= 0 + K( )
0.
(by induction)
Suppose now that Q {, }, that x V ar, and that Qx. If = , then K() = 0, and if = Q,
then K() = 1. Otherwise, = Qx for some , in which case
K() = K(Q) + K(x) + K( )
= 1 + 1 + K( )
=0
(by induction)
Suppose now that the result holds for , F ormL , and 3 {, , }. Suppose that 3. If = ,
then K() = 0. If is 3 for some , then
K() = K(3) + K( )
= 1 + K( )
1.
(by induction)
1.
(by induction)
58
4.2
4.2.1
Up until this point, all that weve dealt with are sequences of symbols without meaning. Sure, our motivation
was to capture meaningful situations with our languages and the way weve described formulas, but all weve
done so far is describe the grammar. If we want our formulas to actually express something, we need to set
up a context in which to interpret them. Since we have quantifiers, the first thing well need is a nonempty
set M to serve as the domain of objects that the quantifiers range over. Once weve fixed that, we need to
interpret the symbols of our language as actual elements of our set (in the case of constant symbols), actual
k-ary relations on M (in the case of R Rk ), and actual k-ary functions on M (in the case of f Fk ).
Definition 4.2.1. Let L be a language. An L-structure, or simply a structure, is a set M = (M, gC , gF , gR )
where
1. M is a nonempty set called the universe of M.
2. gC : C M .
59
60
Notice that there is nothing deep going on here. Given an L-structure M and a variable assignment s,
to apply s to a term, we simply unravel the term attaching meaning to each symbol using M and s as we
bottom-out through the recursion. For example, assume that L = {c, f} where c is a constant symbol and f
is a binary function symbol. Given an L-structure M and a variable assignment s : V ar M , then working
through the definitions, we have
s(ffczfxffczy) = f M (s(fcz), s(fxffczy))
= f M (f M (s(c), s(z)), f M (s(x), s(ffczy)))
= f M (f M (s(c), s(z)), f M (s(x), f M (s(fcz), s(y))))
= f M (f M (s(c), s(z)), f M (s(x), f M (f M (s(c), s(z)), s(y))))
= f M (f M (cM , s(z)), f M (s(x), f M (f M (cM , s(z)), s(y))))
In other words, were taking the syntactic formula ffczfxffczy and assigned a semantic meaning to it by
returning the element of M described in the last line. For a specific example of how this gets interpreted
in one case, let M be the integers Z with cM = 5 and with f M being addition. Let s : V ar M be an
arbitrary variable assignment with s(x) = 3, s(y) = 11, and s(z) = 2. We then have
s(ffczfxffczy) = 6
because
s(ffczfxffczy) = f M (f M (cM , s(z)), f M (s(x), f M (f M (cM , s(z)), s(y))))
= f M (f M (0, 2), f M (3, f M (f M (0, 2), 11)))
= ((5 + 2) + (3 + (5 + 2) + (11)))
=6
Were now in position to define the intuitive statement holds in the L-structure M with variable
assignment s recursively. We need the following definition in order to handle quantifiers.
Definition 4.2.6. Let M be an L-structure, and let s : V ar M be a variable assignment. Given x V ar
and a M , we let s[x a] denote the variable assignment
(
a
if y = x
s[x a](y) =
s(y) otherwise
Definition 4.2.7. Let M be an L-structure. We define a relation (M, s) (pronounced holds in
(M, s), or is true in (M, s), or (M, s) models ) for all F ormL and all variable assignments
s by induction on .
Suppose first that is an atomic formula.
If is Rt1 t2 tk ,, we have (M, s) if and only if (s(t1 ), s(t2 ), . . . , s(tk )) RM .
If is = t1 t2 , we have (M, s) if and only if s(t1 ) = s(t2 ).
For any s, we have (M, s) if and only if (M, s) 6 .
For any s, we have (M, s) if and only if (M, s) and (M, s) .
For any s, we have (M, s) if and only if either (M, s) or (M, s) .
61
62
In the above examples, its clear that only the value of s on the free variables in affect whether or not
(M, s) . The following precise statement of this fact follows by a straightforward induction.
Proposition 4.2.8. Let M be an L-structure. Suppose that t T ermL and s1 , s2 : V ar M are two
variable assignments such that s1 (x) = s2 (x) for all x OccurV ar(t). We then have s1 (t) = s2 (t).
Proposition 4.2.9. Let M be an L-structure. Suppose that F ormL and s1 , s2 : V ar M are two
variable assignments such that s1 (x) = s2 (x) for all x F reeV ar(). We then have
(M, s1 ) if and only if (M, s2 )
Notation 4.2.10. Let L be a language.
1. If x1 , x2 , . . . , xn V ar are distinct, and we refer to a formula (x1 , x2 , . . . , xn ) F ormL we mean that
F ormL and F reeV ar() {x1 , x2 , . . . , xn }.
2. Suppose that M is an L-structure, (x1 , x2 , . . . , xn ) F ormL , and a1 , a2 , . . . , an M . We write
(M, a1 , a2 , . . . , an ) to mean that (M, s) for some (any) s : V ar M with s(xi ) = ai for all i.
3. As a special case of 2, we have the following. Suppose that M is an L-structure and SentL . We
write M to mean that (M, s) for some (any) s : V ar M .
4.2.2
As weve seen, once we fix a language L, an L-structure can fix any set M at all, interpret the elements of C
as fixed elements of M , interpret the elements of Rk as arbitrary subsets of M k , and interpret the elements
Fk as arbitrary k-ary functions on M . However, since we have a precise language in hand, we now carve out
classes of structures which satisfy certain sentences of our language.
Definition 4.2.11. Let L be a language, and let SentL . We let M od() be the class of all L-structures
M such that M for all . If SentL , we write M od() instead of M od({}).
Definition 4.2.12. Let L be a language and let K be a class of L-structures.
1. K is an elementary class if there exists SentL such that K = M od().
2. K is a weak elementary class if there exists SentL such that K = M od().
By taking conjunctions, we have the following simple proposition.
Proposition 4.2.13. Let L be a language and let K be a class of L-structures. K is an elementary class if
and only if there exists a finite SentL such that K = M od().
Examples. Let L = {R} where R is a binary relation symbol.
1. The class of partially ordered sets is an elementary class as we saw in the introduction. We may let
be the following collection of sentences:
(a) xRxx
(b) xy((Rxy Ryx) (x = y))
(c) xyz((Rxy Ryz) Rxz)
2. The class of equivalence relations is an elementary class. We may let be the following collection of
sentences:
63
(a) xRxx
(b) xy(Rxy Ryx)
(c) xyz((Rxy Ryz) Rxz)
3. The class of simple undirected graphs (i.e. edges have no direction, and there are no loops and no
multiple edges) is an elementary class. We may let be the following collection of sentences:
(a) x(Rxx)
(b) xy(Rxy Ryx)
Examples. Let L be any language whatsoever, and let n N+ . The class of L-structures of cardinality at
least n is an elementary class as witnessed by the formula:
^
(xi 6= xj ))
x1 x2 xn (
1i<jn
Furthermore, the class of L-structures of cardinality equal to n is an elementary class. Letting n be the
above formula for n, we can see this by considering n (n+1 ).
Examples. Let L = {0, 1, +, } where 0, 1 are constant symbols and +, are binary function symbols.
1. The class of fields is an elementary class. We may let be the following collection of sentences:
(a) xyz(x + (y + z) = (x + y) + z)
(b) x((x + 0 = x) (0 + x = x))
(c) xy((x + y = 0) (y + x = 0))
(d) xy(x + y = y + x)
(e) xyz(x (y z) = (x y) z)
(f) x((x 1 = x) (1 x = x))
(g) x(x 6= 0 y((x y = 1) (y x = 1)))
(h) xy(x y = y x)
(i) xyz(x (y + z) = (x y) + (x z))
2. For each prime p > 0, the class of fields of characteristic p is an elementary class. Fix a prime p > 0,
and let p be the above sentences togheter with the sentence 1 + 1 + + 1 = 0 (where there are p
1s in the sum).
3. The class of fields of characteristic 0 is a weak elementary class. Let be the above sentences together
with {n : n N+ } where for each n N+ , we have n = (1 + 1 + + 1 = 0) (where there are n 1s
in the sum).
Example. Let F be a field, and let LF = {0, +} {h : F } where 0 is a constant symbol, + is binary
function symbol, and each h is a unary function symbol. The class of vector spaces over F is a weak
elementary class. We may let be the following collection of sentences:
1. xyz(x + (y + z) = (x + y) + z)
64
At this point, its often clear how to show that a certain class of structures is a (weak) elementary class:
simply exhibit the correct sentences. However, it may seem very difficult to show that a class is not a (weak)
elementary class. For example, is the class of fields of characteristic 0 an elementary class? Is the class of
finite groups a weak elementary class? There are no obvious ways to answer these questions affirmatively.
Well develop some tools later which will allow us to resolve these questions negatively.
Another interesting case is that of Dedekind-complete ordered fields. Now the ordered field axioms are
easily written down in the first-order language L = {0, 1, <, +, }. In contrast, the Dedekind-completeness
axiom, which says that every nonempty subset which is bounded above has a least upper bound, can not
be directly translated in the language L because it involves quantifying over subsets instead of elements.
However, we are unable to immediately conclude that this isnt due to a lack of cleverness on our part.
Perhaps there is an alternative approach which captures Dedekind-complete ordered fields in a first-order
way (by finding a clever equivalent first-order expression of Dedekind-completeness). More formally, the
precise question is whether the complete ordered fields are a (weak) elementary class in the language L.
Well be able to answer this question in the negative later as well.
4.2.3
Definability in Structures
Another wonderful side-effect of developing a formal language is the ability to talk about what objects we
can define using that language.
Definition 4.2.14. Let M be an L-structure. Suppose that k N+ and X M k . We say that X is
definable in M if there exists (x1 , x2 , . . . , xk ) F ormL such that
X = {(a1 , a2 , . . . , ak ) M k : (M, a1 , a2 , . . . , ak ) }
Examples. Let L = {0, 1, +, } where 0 and 1 are constant symbols and + and are binary function symbols.
1. The set X = {(m, n) N2 : m < n} is definable in (N, 0, 1, +, ) as witnessed by the formula
z(z 6= 0 (x + z = y))
2. The set X = {n N : n is prime} is definable in (N, 0, 1, +, ) as witnessed by the formula
(x = 1) yz(x = y z (y = 1 z = 1))
3. The set X = {r R : r 0} is definable in (R, 0, 1, +, ) as witnessed by the formula
y(y y = x)
65
Example.
1. Let L = {<} where < is a binary relation symbol. For every n N, the set {n} is definable in (N, <).
To see this, first define n (x) to be the formula
y1 y2 yn (
(yi 6= yj )
1i<jn
n
^
i=1
and for each n N , the set {n} is definable as witnessed by the formula
n (x) n+1 (x)
2. Let L = {e, f} where e is a constant symbol and f is a binary function symbol. Let (G, e, ) be a group
interpreted as an L-stucture. The center of G is definable in (G, e, ) as witnessed by the formula
y(f(x, y) = f(y, x))
Sometimes, there isnt an obvious way to show that a set is definable, but some cleverness and/or
nontrivial mathematics really pays off.
Examples. Let L = {0, 1, +, } where 0 and 1 are constant symbols and + and are binary function symbols.
1. The set N is definable in (Z, 0, 1, +, ) as witnessed by the formula
y1 y2 y3 y4 (x = y1 y1 + y2 y2 + y3 y3 + y4 y4 )
Certainly every element of Z that is a sum of squares must be an element of N. The fact that every
element of N is a sum of four squares is Lagranges Theorem, an important result in number theory.
2. Let (R, 0, 1, +, ) be a commutative ring. The Jacobson radical of R, denoted Jac(R) is the intersection
of all maximal ideal of R. As stated, it is not clear that this is definable in (R, 0, 1, +, ) because it
appears to quantify over subsets. However, a basic result in commutative algebra says that
a Jac(R) ab 1 is a unit for all b R
For all b R, there exists c R with (ab 1)c = 1
Using this, it follows that Jac(R) is definable in (R, 0, 1, +, ) as witnessed by the formula
yz((x y) z = z + 1)
3. The set X = {(k, m, n) N3 : k m = n} is definable in (N, 0, 1, +, ), as is the set {(m, n) N2 : m is
the nth digit in the decimal expansion of }. In fact, every set C Nk which is computable (i.e. for
which you can write a computer program which outputs yes on elements of C and no on elements
of C) is definable in (N, 0, 1, +, ). These are nontrivial yet fundamental result we will prove later.
4. The set Z is definable in (Q, 0, 1, +, ). This is a deep result of Julia Robinson using some nontrivial
number theory.
As for elementary classes, its clear how to attempt to show that something is definable (although as
weve seen this may require a great deal of cleverness). However, its not at all obvious how one could show
that a set is not definable. Well develop a few tools to do this in time.
66
4.2.4
Substitution
Eventually, we will see the need to substitute terms for variables. Roughly, one might naturally think that
if x is true (M, s), then upon taking a term t and substituting it in for x in the formula , the resulting
formula would also be true in (M, s). We need a way to relate truth before substituting with truth after
substituting. The hope would be the following, where we use the notation tx to intuitively mean that you
substitute t for x:
Hope 4.2.15. Let M be an L-structure, let s : V ar M , let t T ermL , and let x V ar. For all
F ormL , we have
(M, s) tx if and only if (M, s[x s(t)])
In order to make this precise, we first need to define substitition. Even with the correct definition of
substitution, however, the above statement is not true. Lets first define substitution for terms and show
that it behaves well.
Definition 4.2.16. Let x V ar and let t T ermL . We define a function Substtx : T ermL T ermL
denoted by utx as follows.
1. ctx = c for all c C.
(
t if y = x
2. yxt =
y otherwise
for all y V ar.
3. (fu1 u2 . . . uk )tx = f(u1 )tx (u2 )tx (uk )tx for all f Fk and all u1 , u2 , . . . , uk T ermL .
Heres the key lemma that relates how to interpret a term before and after substitition.
Lemma 4.2.17. Let M be an L-structure, let s : V ar M , let t T ermL , and let x V ar. For all
u T ermL , we have
s(utx ) = s[x s(t)](u)
Although the statement of the lemma is symbol heavy, it expresses something quite natural. In order
to determine the value of the term utx according to the variable assignment imposed by s, we need only
change s so that x now gets sent to s(t) (the value of t assigned by s), and evaluate u using this new
variable assignment.
Proof. The proof is by induction on T ermL . For any c C, we have
s(ctx ) = s(c)
= cM
= s[x s(t)](c)
= s[x s(t)](c)
Suppose that u = x. We then have
s(xtx ) = s(t)
= s[x s(t)](x)
= s[x s(t)](x)
67
(by induction)
= s[x s(t)](fu1 u2 uk )
With subsitution in terms defined, we now move to define substitution in formals. The key fact about
this definition is that we only replace x by the term t for the free occurances of x because we certainly dont
want to change x into t, nor do we want to mess with an x inside the scope of such a quantifier. We
thus make the following recursive definition.
Definition 4.2.18. We now define F reeSubstt,x : F ormL F ormL , again denoted tx , as follows.
1. (Ru1 u2 uk )tx = R(u1 )tx (u2 )tx (uk )tx for all R Rk and all u1 , u2 , . . . , uk T ermL .
2. (= u1 u2 )tx = = (u1 )tx (u2 )tx for all u1 , u2 T ermL .
3. ()tx = (tx ) for all F ormL .
4. (3)tx = 3tx xt for all , F ormL and all 3 {, , }.
(
Qy
if x = y
t
5. (Qy)x =
t
Qy(x ) otherwise
for all F ormL , y V ar, and Q {, }.
With the definition in hand, lets analyze the above hope. Suppose that L = , and consider the formula
(x) F ormL given by
y(y = x)
For any L-structure M and any s : V ar M , we have (M, s) if and only if |M | 2. Now notice that
the formula yx is
y(y = y)
so for any L-structure M and any s : V ar M , we have (M, s) 6 yx . Therefore, the above hope fails
whenever M is an L-structure with |M | 2. The problem is that the term we substituted (in this case y)
had a variable which became captured by a quantifier, and thus the meaning of the formula became
transformed. In order to define ourselves out of this obstacle, we define the following function.
Definition 4.2.19. Let t T ermL and let x V ar. We define a function V alidSubsttx : F ormL {0, 1}
as follows.
1. V alidSubsttx () = 1 for all AtomicF ormL .
2. V alidSubsttx () = V alidSubsttx () for all F ormL .
68
1
0
1
t
4. V alidSubstx (Qy) = 1
if x
/ F reeV ar(Qy)
if y
/ OccurV ar(t) and V alidSubsttx () = 1
otherwise
(by induction)
69
(by induction)
4.3
4.3.1
70
(since h is a homomorphism)
= h s(c).
Now for x V ar, we have
h(s(x)) = h(s(x))
= (h s)(x)
= h s(x)
Suppose now that f Fk , that t1 , t2 , . . . , tk T ermL , and the result holds for each ti . We then have
h(s(ft1 t2 tk )) = h(f M (s(t1 ), s(t2 ), . . . , s(tk )))
= f M (h(s(t1 )), h(s(t2 )), . . . , h(s(tk )))
(since h is a homomorphism)
(by induction)
= h s(ft1 t2 tk )
The result follows by induction
2. Suppose that h is an embedding. We prove the result by induction on . Suppose first that R Rk
and that t1 , t2 , . . . , tk T ermL . We then have
(M, s) Rt1 t2 tk (s(t1 ), s(t2 ), . . . , s(tk )) RM
(h(s(t1 )), h(s(t2 )), . . . , h(s(tk ))) RN
(h s(t1 ), h s(t2 ), . . . , h s(tk )) R
(N , h s) Rt1 t2 tk
(since h is a homomorphism)
(by part 1)
71
(by induction)
(N , h s)
Suppose that the result holds for and . We have
(M, s) (M, s) and (M, s)
(N , h s) and (N , h s)
(by induction)
(N , h s)
and similarly for and . The result follows by induction.
3. In light of the proof of 2, we need only show that if is = t1 t2 where t1 , t2 T ermL , then (M, s)
if and only if (N , h s) . For any t1 , t2 T ermL , we have
(M, s) = t1 t2 s(t1 ) = s(t2 )
h(s(t1 )) = h(s(t2 ))
(since h is injective)
h s(t1 ) = h s(t2 )
(by part 1)
(N , h s) = t1 t2
4. Suppose that the result holds for and x V ar. We have
(M, s) x There exists a M such that (M, s[x a])
There exists a M such that (N , h (s[x a]))
There exists a M such that (N , (h s)[x h(a)]
There exists b N such that (N , (h s)[x b])
(since h is bijective)
(N , h s) x
and also
(M, s) x For all a M , we have (M, s[x a])
For all a M , we have (N , h (s[x a]))
For all a M , we have (N , (h s)[x h(a)]
For all b N , we have (N , (h s)[x b])
(since h is bijective)
(N , h s) x
Definition 4.3.5. Let L be a language, and let M and N be L-structures. We write M N , and say that
M and N are elementarily equivalent, if for all SentL , we have M if and only if N .
Corollary 4.3.6. Let L be a language, and let M and N be L-structures. If M
= N , then M N .
72
4.3.2
An Application To Definability
Proposition 4.3.7. Suppose that M is an L-structure and k N+ . Suppose also that X M k is definable
in M and that h : M M is an automorphism. For every a1 , a2 , . . . , ak M , we have
(a1 , a2 , . . . , ak ) X if and only if (h(a1 ), h(a2 ), . . . , h(ak )) X
Proof. Fix (x1 , x2 , . . . , xk ) F ormL such that
X = {(a1 , a2 , . . . , ak ) M k : (M, a1 , a2 , . . . , ak ) }
By part 4 of Theorem 4.3.4, we know that for every a1 , a2 , . . . , ak M , we have
(M, a1 , a2 , . . . , ak ) if and only if (M, h(a1 ), h(a2 ), . . . , h(ak ))
Therefore, for every a1 , a2 , . . . , ak M , we have
(a1 , a2 , . . . , ak ) X if and only if (h(a1 ), h(a2 ), . . . , h(ak )) X
Corollary 4.3.8. Suppose that M is an L-structure and k N+ . Suppose also that X M k and that
h : M M is an automorphism. If there exists a1 , a2 , . . . , ak M such that exactly one of the following
holds:
(a1 , a2 , . . . , ak ) X
(h(a1 ), h(a2 ), . . . , h(ak )) X
then X is not definable in M.
Example. Let L = {R} where R is a binary relation symbol, and let M be the L-structure where M = Z
and RM = {(a, b) Z2 : a < b}. We show that a set X M is definable in M if and only if either X = or
X = Z. First notice that is definable as witnessed by (x = x) and Z as witnessed by x = x. Suppose now
that X Z is such that X 6= and X 6= Z. Fix a, b Z such that a X and b
/ X. Define h : M M
by letting h(c) = c + (b a) for all c M . Notice that h is automorphism of M because it is bijective (the
map g(c) = c (b a) is clearly an inverse) and a homomorphism because if c1 , c2 Z, then have have
(c1 , c2 ) RM c1 < c2
c1 + (b a) < c2 + (b a)
h(c1 ) < h(c2 )
(h(c1 ), h(c2 )) RM
Notice also that h(a) = a + (b a) = b, so a X but h(a)
/ X. It follows from the proposition that X is
not definable in M.
4.3.3
Substructures
Definition 4.3.9. Let L be a language and let M and A be L-structures. We say that A is a substructure
of M, and we write A M if
1. A M .
2. cA = cM for all c C.
73
3. RA = RM Ak for all R Rk .
4. f A = f M Ak for all f Fk .
Remark 4.3.10. Let L be a language and let M and A be L-structures with A M . We then have that
A M if and only if the identity map : A M is a homomorphism.
Remark 4.3.11. Suppose that M is an L-structure and that A M . A is the universe of a substructure
of M if and only if {cM : c C} A and f M (a1 , a2 , . . . , ak ) A for all f Fk and all a1 , a2 , . . . , ak A.
Proposition 4.3.12. Let M be an L-structure and let B M . Suppose either that B 6= or C 6= . If we
let A = G(M, B {cM : c C}, {f M : f F }), then A is the universe of a substructure of M. Moreover, if
N M with B N , then A N .
Proposition 4.3.13. Let L be a language.
1. A 1 -formula is an element of G(SymL , QuantF reeF ormL , {h,x : x V ar}).
2. A 1 -formula is an element of G(SymL , QuantF reeF ormL , {h,x : x V ar}).
Proposition 4.3.14. Suppose that A M.
1. For any QuantF reeF ormL and any s : V ar A, we have
(A, s) if and only if (M, s)
2. For any 1 -formula F ormL and any s : V ar A, we have
If (A, s) , then (M, s)
3. For any 1 -formula F ormL and any s : V ar A, we have
If (M, s) , then (A, s)
Proof.
1. This follows from Remark 4.3.10 and Theorem 4.3.4.
2. We prove this by induction. If is quantifier-free, this follows from part 1. Suppose that we know the
result for , and suppose that (A, s) x. Fix a A such that (A, s[x a]) . By induction, we
know that (M, s[x a]) , hence (M, s) x.
3. We prove this by induction. If is quantifier-free, this follows from part 1. Suppose that we know the
result for , and suppose that (M, s) x. For every a A, we then have (M, s[x a]) , and
hence (A, s[x a]) by induction. It follows that (A, s) x.
4.3.4
Elementary Substructures
Definition 4.3.15. Let L be a language and let M and A be L-structures. We say that A is an elementary
substructure of M if A M and for all F ormL and all s : V ar A, we have
(A, s) if and only if (M, s)
We write A M to mean that A is an elementary substructure of M.
74
Example. Let L = {f} where f is a unary function symbol. Let M be the L-structure with M = N and
f M (n) = n + 1. Let A be L-structure with A = N+ and f A (n) = n + 1. We then have that A M.
Furthermore, we have M
= A, hence for all SentL we have
A if and only if M
However, notice that A 6 M because if (x) is the formula y(fy = x), we then have that (A, 1) but
(M, 1) 6 .
Theorem 4.3.16 (Tarski-Vaught Test). Suppose that A M. The following are equivalent.
1. A M.
2. Whenever F ormL , x V ar, and s : V ar A satisfy (M, s) x, there exists a A such that
(M, s[x a])
Proof. We first prove that 1 implies 2. Suppose then that A M. Let F ormL and s : V ar A be
such that (M, s) x. Using the fact that A M, it follows that (A, s) x. Fix a A such that
(A, s[x a]) . Using again the fact that A M, we have (M, s[x a]) .
We now prove that 2 implies 1. We prove by induction on F ormL that for all s : V ar A, we have
(A, s) if and only if (M, s) . That is, we let
X = { F ormL : For all s : V ar A we have (A, s) if and only if (M, s) }
and prove that X = F ormL by induction. First notice that X for all quantifier-free because A M.
Suppose now that X. For any s : V ar A, we have
(A, s) (A, s) 6
(M, s) 6
(since X)
(M, s)
Therefore, X.
Suppose now that , X. For any s : V ar A, we have
(A, s) (A, s) and (A, s)
(M, s) and (M, s)
(since , X)
(M, s)
Therefore, X. Similarly, we have X and X.
Suppose now that X and x V ar. For any s : V ar A, we have
(A, s) x There exists a A such that (A, s[x a])
There exists a A such that (M, s[x a])
(since X)
(M, s) x
Therefore, x X.
Suppose now that X and x V ar. We then have that X from above, hence x X from
above, hence x X again from above. Thus, for any s : V ar A, we have
(A, s) x (A, s) x
(M, s) x
(M, s) x
Therefore, x X.
(since x X)
75
76
4.4
4.4.1
cM = cM for all c C.
0
RM = RM for all R R.
0
f M = f M for all f F.
Proposition 4.4.2. Let L L0 be languages, let M0 be an L0 -structure, and let M be the restriction of M0
to L. For all F ormL and all s : V ar M , we have (M, s) if and only if (M0 , s) .
Proof. By induction.
4.4.2
Definition 4.4.3. Let L be a language and let M be an L-structure. For each a M , introduce a new
constant ca (not appearing in the original language L and all distinct). Let LM = L {ca : a M } and let
M
Mexp be the LM -structure which is the expansion of M in which ca exp = a for all a M . We call Mexp
the expansion of M obtained by adding names for elements of M .
Definition 4.4.4. Let M be an L-structure, and let s : V ar M be a variable assignment. Define a
function N ames : T ermL T ermLM by plugging in names for free variables according to s. Define a
function N ames : F ormL SentLM again by plugging in names for free variables according to s.
Proposition 4.4.5. Let M be an L-structure, and let s : V ar M be a variable assignment. For every
F ormL , we have
(M, s) if and only if Mexp N ames ()
Definition 4.4.6. Let M be an L-structure.
We let AtomicDiag(M) = { SentLM AtomicF ormLM : Mexp }.
We let Diag(M) = { SentLM : Mexp }.
Proposition 4.4.7. Let L be a language and let M and N be the L-structures. The following are equivalent:
There exists an embedding h from M to N .
There exists an expansion of N to an LM -structure which is a model of AtomicDiag(M).
Proposition 4.4.8. Let L be a language and let M and N be the L-structures. The following are equivalent:
There exists an elementary embedding h from M to N .
There exists an expansion of N to an LM -structure which is a model of Diag(M).
Chapter 5
Definition 5.1.1. Let L be a language and let F ormL . A model of is a pair (M, s) where
M is an L-structure.
s : V ar M is a variable assignment.
(M, s) for all .
Definition 5.1.2. Let L be a language. Let F ormL and let F ormL . We write to mean that
whenever (M, s) is a model of , we have that (M, s) . We pronounce as semantically implies
.
Definition 5.1.3. Let L be a language and let F ormL . We say that is satisfiable if there exists a
model of .
Definition 5.1.4. Let L be a language. A set SentL is an L-theory if
is a satisfiable.
For every SentL with , we have .
There are two standard ways to get theories. One is to take a stucture, and consider all of the sentences
that are true in that structure.
Definition 5.1.5. Let M be an L-structure. We let T h(M) = { SentL : M }. We call T h(M) the
theory of M.
Proposition 5.1.6. Let L be a language and let M be an L-structure. T h(M) is an L-theory.
Proof. First notice that T h(M) is satisfiable because M is a model of T h(M) (since M for all
T h(M) by definition). Suppose now that SentL is such that T h(M) . Since M is a model of T h(M),
it follows that M , and hence T h(M).
Another standard way to get a theory is to take an arbitrary satisfiable set of sentences, and close it off
under semantic implication.
77
78
Definition 5.1.7. Let L be a language and let SentL . We let Cn() = { SentL : }. We call
Cn() the set of consequences of .
Proposition 5.1.8. Let L be a language and let SentL be satisfiable. We then have that Cn() is an
L-theory.
Proof. We first show that Cn() is satisfiable. Since is satsfiable, we may fix a model M of . Let
Cn(). We then have that , so using the fact that M is a model of we conclude that M .
Therefore, M is a model of Cn(), hence Cn() is satisfiable.
Suppose now that SentL and that Cn() . We need to show that Cn(), i.e. that .
Let M be a model of . Since for all Cn(), it follows that M for all Cn(). Thus, M
is a model of Cn(). Since Cn() , it follows that M . Thus, , and so Cn().
Definition 5.1.9. An L-theory is complete if for all SentL , either or .
Proposition 5.1.10. Let L be a language and let M be an L-structure. T h(M) is a complete L-theory.
Proof. Weve already seen that T h(M) is a theory. Suppose now that SentL . If M , we then have
that T h(M). Otherwise, we have M
6 , so by definition M , and hence T h(M).
Example. Let L = {f, e} where f is a binary function symbol and e is a constant symbol. Consider the
following sentences.
1 = xyz(f(f(x, y), z) = f(x, f(x, y)))
2 = x(f(x, e) = x f(e, x) = x)
3 = xy(f(x, y) = e f(y, x) = e)
The theory T = Cn({1 , 2 , 3 }) is the theory of groups. T is not complete because it neither contains
xy(f(x, y) = f(y, x)) nor its negation, since there are both abelian groups and nonabelian groups.
Definition 5.1.11. Let L = {R} where R is a binary relation symbol. Consider the following sentences
1 = xRxx
2 = xyz((Rxy Ryz) Rxz)
3 = xy(Rxy Ryx)
4 = xy(x = y Rxy Ryx)
and let LO = Cn({1 , 2 , 3 , 4 }). LO is called the theory of (strict) linear orderings. LO is not complete
because it neither contains yx(x = y x < y) nor its negation because there are linear ordering with greatest
elements and linear orderings without greatest elements.
Definition 5.1.12. Let L = {R} where R is a binary relation symbol. Consider the following sentences
1 = xRxx
2 = xyz((Rxy Ryz) Rxz)
3 = xy(Rxy Ryx)
4 = xy(x = y Rxy Ryx)
5 = xy(Rxy z(Rxz Rzy))
6 = xyRxy
7 = xyRyx
and let DLO = Cn({1 , 2 , 3 , 4 , 5 , 6 , 7 }). DLO is called the theory of dense (strict) linear orderings
without endpoints. DLO is complete as well see below.
79
Theorem 5.1.13 (Countable Lowenheim-Skolem Theorem). Suppose that L is countable and that
F ormL is satisfiable. There exists a countable model (M, s) of .
Proof. Since is satisfiable, we may fix a model (N , s) of . Let X = ran(s) N and notice that X
is countable. By the Countable Lowenheim-Skolem-Tarski Theorem, there exists a countable elementary
substructure M N such that X M . Notice that s is also a variable assigment on M . Now for any
, we have that (N , s) because (N , s) is a model of , hence (M, s) because M N . It follows
that (M, s) is a model of .
5.1.2
80
Example 5.1.20. Let L = {f} where f is a unary function symbol, and let T = Cn({x(ffx = x)}). We have
I(T, n) = b n2 c + 1 for all n N+ .
Proof. Lets first analyze the finite models of T . Suppose that M is a model of T of cardinality n. For every
a M , we then have f M (f M (a)) = a. There are now two cases. Either f M (a) = a, or f M (a) = b 6= a in
which case f M (b) = a. Let
F ixM = {a M : f M (a) = a}.
M oveM = {a M : f M (a) 6= a}.
From above, we then have that |M oveM | is even and that |F ixM | + |M oveM | = n. Now the idea is that
two models M and N of T of cardinality n are isomorphic if and only if they have the same number of fixed
points, because then we can match up the fixed points and then match up the pairings left over to get an
isomorphism. Heres a more formal argument.
We know show that if M and N are models of T of cardinality n, then M
= N if and only if |F ixM | =
|F ixN |. Clearly, if M
= N , then |F ixM | = |F ixN |. Suppose conversely that |F ixM | = |F ixN |. We then
M|
such that f M (x) 6= y
must have |M oveM | = |M oveN |. Let XM M oveM be a set of cardinality |M ove
2
M
for all x, y X (that is, we pick out one member from each pairing given by f ), and let XN be such
a set for N . Define a function h : M N . Fix a bijection from : F ixM F ixN and a bijection
: XM XN . Define h by letting h(a) = (a) for all a F ixM , letting h(x) = (x) for all x XM , and
letting h(y) = f N ((f M (y))) for all y M oveM \X. We then have that h is an isomophism from M to N .
Now we need only count how many possible values there are for |F ixM |. Let n N+ . Suppose first that n
is even. Since |M oveM | must be even, it follows that |F ixM | must be even. Thus, |F ixM | {0, 2, 4, . . . , n},
so there are n2 + 1 many possibilities, and its easy to construct models in which each of these possibilities
occurs. Suppose now that n is odd. Since |M oveM | must be even, it follows that |F ixM | must be odd.
Thus, |F ixM | {1, 3, 5, . . . , n}, so there are n1
2 + 1 many possibilities, and its easy to construct models in
which each of these possibilities occurs. Thus, in either case, we have I(T, n) = b n2 c + 1.
Example 5.1.21. I(DLO, n) = 0 for all n N+ .
Proof. As mentioned in the LO example, every finite linear ordering has a least element.
Definition 5.1.22. Suppose that L is a finite language and SentL . Let
Spec() = {n N+ : I(Cn(), n) > 0}
Proposition 5.1.23. There exists a finite language L and a SentL such that Spec() = {2n : n N+ }.
Proof. We give two separate arguments. First, let L = {e, f} be the language of group theory. Let be the
conjunction of the group axioms with the sentence x((x = e) fxx = e) expressing that there is an element
of order 2. Now for every n N+ , the group Z/(2n)Z is a model of of cardinality 2n because n is an
element of order 2. Thus, {2n : n N+ } Spec(). Suppose now that k Spec(), and fix a model M
of of order k. We then have that M is a group with an element of order 2, so by Lagranges Theorem it
follows that 2 | k, so k {2n : n N+ }. It follows that Spec() = {2n : n N+ }.
For a second example, let L = {R} where R is a binary relation symbol. Let be the conjunction of the
following sentences:
xRxx.
xy(Rxy Ryx).
xyz((Rxy Ryz) Rxz).
81
5.1.3
Theorem 5.1.25. Suppose that M and N are two countably infinite models of DLO. We then have that
M
= N.
Proof. Back-and-forth construction. See Damirs carefully written proof.
Corollary 5.1.26 (Countable Los-Vaught Test). Let L be a countable language. Suppose that T is an Ltheory such that all models of T are infinite, and suppose also that every two countably infinite models of T
are isomorphic. We then have that T is complete.
Proof. Suppose that T is not complete and fix SentL such that
/ T and
/ T . We then have that
T {} and T {} are both satisfiable by infinite models (because all models of T are infinite), so by
the Countable Lowenheim-Skolem Theorem we may fix countably infinite models M1 of T {} and M2
of T {}. We then have that M1 and M2 are countably infinite models of T which are not isomorphic
(because they are not elementarily equivalent), a contradiction.
Corollary 5.1.27. DLO is complete.
Proposition 5.1.28. Suppose that T is a complete L-theory. If M and N are models of T , then M N .
Proof. Let SentL . If T , we then have that both M and N . Suppose that
/ T . Since T is
complete, we then have that T , hence M and N . It follows that both M 6 and N 6 .
Therefore, for all SentL , we have that M if and only if N , so M N .
Corollary 5.1.29. In the language L = {R} where R is a binary relation symbol, we have (Q, <) (R, <).
5.2
5.2.1
Syntactic Implication
Definitions
Basic Proofs:
` if
(AssumeL )
82
(EqRef l)
Proof Rules:
`
(EL)
`
`
(ER)
`
` `
(I)
`
`
(IR)
`
`
(IL)
`
`
( E)
{} `
{} `
( I)
`
{} ` {} `
(P C)
`
{} ` {} `
(P C)
{ } `
{} ` {} `
(Contr)
`
Equality Rules:
` tx ` t = u
` ux
Existential Rules:
` tx
` x
{yx } `
{x} `
Universal Rules:
if y
/ F reeV ar( {x, }) and V alidSubstyx () = 1 (P )
` x
` tx
` yx
` x
if V alidSubsttx () = 1 (I)
if V alidSubsttx () = 1 (E)
if y
/ F reeV ar( {x}) and V alidSubstyx () = 1 (I)
Superset Rule:
`
0 `
if 0 (Super)
Definition 5.2.1. A deduction is a witnessing sequence in (P(F ormL ) F ormL , AssumeL EqRef l, H).
Definition 5.2.2. Let F ormP and let F ormP . We write ` to mean that
(, ) (P(F ormL ) F ormL , AssumeL EqRef l, H)
We pronounce ` as syntactically implies .
Notation 5.2.3.
1. If = , we write ` instead of ` .
2. If = {}, we write ` instead of {} ` .
Definition 5.2.4. is inconsistent if there exists F ormP such that ` and ` . Otherwise, we
say that is consistent.
5.2.2
83
(EqRef l)
(1)
{t = u} ` t = u
(AssumeL )
(2)
{t = u} ` u = t
(3)
(AssumeL )
(1)
{t = u, u = w} ` u = w
(AssumeL )
(2)
{t = u, u = w} ` t = w
(3)
Sk
i=1 (OccurV
` Rt1 t2 tk
(AssumeL )
(1)
` t1 = u1
(AssumeL )
(2)
(3)
(AssumeL )
(4)
(5)
` Ru1 t2 t3 tk
` t2 = u2
` Ru1 u2 t3 tk
..
.
` tk = uk
(AssumeL )
` Ru1 u2 uk
(2k)
(2k + 1)
Sk
i=1 (OccurV
84
u1 , t2 = u2 , . . . , tk = uk }. We have
` ft1 t2 tk = f t1 t2 tk
` t1 = u1
(1)
(AssumeL )
(2)
(3)
(AssumeL )
(4)
(3)
` ft1 t2 tk = fu1 t2 tk
` t2 = u2
` ft1 t2 tk = fu1 u2 tk
..
.
(EqRef l)
` tk = uk
(AssumeL )
` ft1 t2 tk = fu1 u2 uk
(2k)
(2k + 1)
Similar to the previous proposition, but start with the line ` ft1 t2 tk = ft1 t2 tk using the EqRef l
rule.
Proposition 5.2.9. x ` x.
Proof. Fix y 6= x with y
/ OccurV ar().
{yx , x, x} ` x
{yx , x, x}
{yx , x}
{yx , x}
{yx , x}
{yx }
` x
(AssumeL )
(1)
(AssumeL )
(2)
(Contr on 1 and 2)
(E on 3)
(3)
(4)
(AssumeL )
(5)
` x
(Contr on 4 and 5)
(6)
{x} ` x
(P on 6)
(7)
` x
` (yx )
` yx
Proposition 5.2.10. x ` x.
Proof. Fix y 6= x with y
/ OccurV ar().
{x, yx } ` x
(AssumeL )
(1)
{x, yx }
{x, yx }
(AssumeL )
(2)
()yx
` x
{x} ` yx
{x} ` x
5.2.3
(I on 2)
(3)
(Contr on 1 and 3)
(4)
(I on 4)
(5)
Theorems About `
85
1. If {} is inconsistent, then ` .
2. If {} is inconsistent, then ` .
Proof.
1. Since {} is inconsistent, we know that {} ` by Proposition 5.2.11. Since we also have
that {} ` by Assume, it follows that ` by the P C rule.
2. Since {} is inconsistent, we know that {} ` by Proposition 5.2.11. Since we also have
that {} ` by Assume, it follows that ` by the P C rule.
Proposition 5.2.15. Let Gf in = G(Pf in (F ormL ) F ormL , AssumeL EqRef l, H), i.e. we insist that the
set is finite but otherwise have exactly the same proof rules. Let `f in denote that (, ) Gf in
1. If `f in , then ` .
2. If ` , then there exists a finite 0 such that 0 `f in
In particular, if ` , then there exists a finite 0 such that 0 ` .
Proof. 1 is a completely straightforward induction because the starting points are the same and we have the
exact same rules. The proof of 2 goes in much the same way as the corresponding result for propositional
logic.
Corollary 5.2.16. If every finite subset of is consistent, then is consistent.
Proof. Suppose that is inconsistent, and fix F ormL such that ` and ` . By Proposition
5.2.15, there exists finite sets 0 and 1 such that 0 ` and 1 ` . Using the Super rule, it
follows that 0 1 ` and 0 1 ` , so 0 1 is a finite inconsistent subset of .
86
Chapter 6
Soundness
88
(since y
/ F reeV ar() and y 6= x)
Thus, (M, s[y a]) yx in either case. Now since (M, s) for all and y
/ F reeV ar(),
we have (M, s[y a]) for all . Thus, (M, s[y a]) because {yx } . Finally,
since y
/ F reeV ar(), it follows that (M, s) .
We next do the E rule. Suppose that x and that t T ermL is such that V alidSubsttx () =
1. We need to show that tx . Fix a model (M, s) of . Since x, it follows that that
(M, s) x. Since V alidSubsttx () = 1, we have
(M, s) x For all a M, we have (M, s[x a])
(M, s[x s(t)])
(M, s) tx
(since y
/ F reeV ar() and y 6= x)
Now a M was arbitrary, so (M, s[x a]) for every a M , hence (M, s) x.
The result follows by induction.
2. Let be a satisfiable set of formulas. Fix a model (M, s) of . Suppose that is inconsistent, and fix
F ormL such that ` and ` . We then have and by part 1, hence (M, s)
and (M, s) , a contradiction. It follows that is consistent.
6.2
89
Prime Formulas
Corollary 6.2.5. Let L be a language, let F ormL , and let F ormL . If # P (L) # (in the
propositional language P (L)), then `L (in the first-order language L).
90
Proof. Suppose that # P (L) # . By Proposition 6.2.4, it follows that (# )? P (L) (# )? , hence `L
by Proposition 6.2.3
Example 6.2.6. Let L be a language and let , F ormL We have ( ) `L ()
Proof. We show that (( ))# P (L) ( ())# . Notice that
1. (( ))# = (# # )
2. ( ())# = # ( # ).
Suppose that v : P (L) {0, 1} is a truth assignment such that v((# # )) = 1. We then have v(#
# ) = 0, hence v(# ) = 1 and v( # ) = 0. We therefore have v(( # )) = 1 and hence v(# ( # )) = 1.
It follows that (( ))# P (L) ( ())# .
Corollary 6.2.7. Let L be a language, let F ormL , and let , F ormL . If `L and # P (L) # ,
then `L .
Proof. Since # P (L) # , we have that `L by Corollary 6.2.5. It follows from the Super rule that
{} `L . Using Proposition 5.2.14 (since `L and {} `L ), we may conclude that `L .
6.3
6.3.1
Completeness
Motivating the Proof
We first give an overview of the key ideas in our proof of completeness. Let L be a language, and suppose
that F ormL is consistent.
Definition 6.3.1. Suppose that L is a language and that F ormL . We say that is complete if for all
F ormL , either or .
As we saw in propositional logic, it will aid use greatly to extend to a set which is both consistent
and complete, so lets assume that we can do that (we will prove it exactly the same way below). We need
to construct an L-structure M and a variable assignment s : V ar M such that (M, s) for all .
Now all that we have is the syntactic information that provides, so it seems that the only way to proceed
is to define our M from these syntactic objects. Since terms intuitively name elements, it is natural to try
to define the universe M to simply be T ermL . We would then define the structure as follows
1. cM = c for all c C.
2. RM = {(t1 , t2 , . . . , tk ) M k : Rt1 t2 . . . tk } for all R Rk .
3. f M (t1 , t2 , . . . , tk ) = ft1 t2 tk for all f Fk and all t1 , t2 , . . . , tk M .
and let s : V ar M be the variable assignment defined by s(x) = x for all x V ar.
However, there are two problems with this approach, one of which is minor and the other is quite serious.
First, lets think about the minor problem. Suppose that L = {f, e} where f is a binary function symbol
and e is a constant symbol, and that is the set of group axioms. Suppose that is consistent and
complete. We then have fee = e because ` fee = e. However, the two terms fee and e are syntactically
different objects, so if we were to let M be T ermL this would cause a problem because fee and e are distinct
despite the fact that says they must be equal. Of course, when you have distinct objects which you want
to consider equivalent, you should define an equivalence relation. Thus, we should define on T ermL by
letting t u if t = u . We would then need to check that is an equivalence relation and that the
6.3. COMPLETENESS
91
definition of the structure above is independent of our choice of representatives for the classes. This is all
fairly straightfoward, and will be carried out below.
On to the more serious obstacle. Suppose that L = {P} where P is a unary relation symbol. Let
= {Px : x V ar} {(x = y) : x, y V ar with x 6= y} {xPx} and notice that is consistent because
it is satisfiable (Let M = N, let s : V ar N be s(xk ) = k + 1 and let PM = {0}). Suppose that is
consistent and complete. In the structure M described above, we have M = T ermL = V ar (notice that the
equivalence relation defined above will be trivial in this case). Thus, since (M, s) Px for all x V ar, it
follows that (M, s) 6 xPx. Hence, M is not a model of .
The problem in the above example is that there was an existential statement in , but whenever you
plugged a term in for the quantified variable, the resulting formula was not in . Since we are building
our structure from the terms, this is a serious problem. However, if had the following, then this problem
would not arise.
Definition 6.3.2. Let L be a language and let F ormL . We say that contains witnesses if for all
F ormL and all x V ar, there exists c C such that (x) cx .
Our goal then is to show that if is consistent, then there exists a which is consistent, complete,
and contains witnesses. On the face of it, this is not true, as the above example shows (because there are no
constant symbols). However, if we allow ourselves to expand our language with new constant symbols, we
can repeatedly add witnessing statements by using these fresh constant symbols as our witnesses. The key
question we need to consider is the following. Suppose that L is a language and F ormL is consistent.
If you expand the language L to a language L0 obtained by adding a new constant symbol, is the set still
consistent when viewed as a set of L0 formulas? It might seem absolutely harmless to add a new constant
symbol about which we say nothing (and its not hard very hard to see that it is semantically harmless),
but we are introducing new deductions in L. We need a way to convert a possibly bad L0 -deduction into a
similarly bad L-deduction to argue that is still consistent as a set of L0 -formulas.
6.3.2
The Proof
We can also define substitution of variables for constants in the obvious recursive fashion. Ignore the following
lemma until you see why we need it later.
Lemma 6.3.3. Let F ormL , let t T ermL , let c C, and let x, z V ar. Suppose that z
/ OccurV ar().
tz
92
6.3. COMPLETENESS
93
n
[
(i {i }))
i=0
n
[
(i ) {i })
i=0
94
0 `L0 0
1 `L0 1
2 `L0 2
..
.
n `L0 n
is an L0 -deduction, so n `L0 . Using the Super rule, we conclude that `L0 . Therefore, `L
by part 2.
Corollary 6.3.7. Let L be a language and let L0 be L together with (perhaps infinitely many) new constant
symbols. Let F ormL . is L-consistent if and only if is L0 -consistent.
Proof. Since any L-deduction is also a L0 -deduction, if is L-inconsistent then it is L0 -inconsistent . Suppose
that is L0 -inconsistent. We then have that `L0 for all F ormL by Proposition 5.2.11, hence `L
for all F ormL by Corollary 6.3.6. Therefore, is L-inconsistent.
Lemma 6.3.8. Let L be a language, and let L0 be L together with a new constant symbol c. Suppose that
F ormL is L-consistent and that F ormL . We then have that {(x) cx } is L0 -consistent.
Proof. Suppose that {(x) cx } is L0 -inconsistent. We then have that `L0 ((x) cx ), hence
#
#
`L0 (x) (cx ) by Corollary 6.2.7 (because (((x) cx )) P (L0 ) ((x) (cx )) ). Thus, `L0
x by the EL rule, so `L0 x (by Proposition 5.2.9 and Proposition 5.2.14), and hence `L x
by Corollary 6.3.6. We also have `L0 ()cx by the ER rule, so `L x by Generalization on
Constants. This contradicts the fact that is L-consistent.
Lemma 6.3.9. Let L be a language and let F ormL be L-consistent. There exists a language L0 L
and 0 F ormL0 such that
1. 0 .
2. 0 is L0 -consistent.
3. For all F ormL and all x V ar, there exists c C such that (x) cx 0 .
6.3. COMPLETENESS
95
Proof. For each F ormL and each x V ar, let c,x be a new constant symbol (distinct from all symbols
in L). Let L0 = L {c,x : F ormL and x V ar}. Let
c
,x
,x
,x
,x
,x
,x
,x
,x
,x
96
Proof.
Lemma 6.3.13. Suppose that is consistent and complete. If ` , then .
Proof. Suppose that ` . Since is complete, we have that either or . Now if ,
then ` , hence is inconsistent contradicting our assumption. It follows that .
Lemma 6.3.14. Suppose that is consistent, complete, and contains witnesses. For every t T ermL ,
there exists c C such that t = c .
Proof. Let t T ermL . Fix x V ar such that x
/ OccurV ar(t). Since contains witnesses, we may fix
c C such that (x(t = x)) (t = c) (using the formula t = x). Now ` (t = x)tx , so we may use the
I rule (because V alidSubsttx (t = x) = 1) to conclude that ` x(t = x). From here we can use the Super
rule to conclude that ` x(t = x). We therefore have ` x(t = x) and ` (x(t = x)) (t = c), hence
` t = c by Proposition 5.2.14. Using Lemma 6.3.13, we conclude that t = c .
Lemma 6.3.15. Suppose that is consistent, complete, and contains witnesses. We have
1. if and only if
/ .
2. if and only if and .
3. if and only if or .
4. if and only if
/ or .
5. x if and only if there exists c C such that cx .
6. x if and only if cx for all c C.
Proof.
1. If , then
/ because otherwise ` and so would be inconsistent.
Conversely, if
/ , then because is complete.
2. Suppose first that . We then have that ` , hence ` by the EL rule and `
by the ER rule. Therefore, and by Lemma 6.3.13.
Conversely, suppose that and . We then have ` and ` , hence ` by the
I rule. Therefore, by Lemma 6.3.13.
3. Suppose first that . Suppose that
/ . Since is complete, we have that . From
Proposition 3.4.10, we know that {, } ` , hence ` by the Super rule. Therefore,
by Lemma 6.3.13. It follows that either or .
Conversely, suppose that either or .
Case 1: Suppose that . We have ` , hence ` by the IL rule. Therefore,
by Lemma 6.3.13.
Case 2: Suppose that . We have ` , hence ` by the IR rule. Therefore,
by Lemma 6.3.13.
6.3. COMPLETENESS
97
4. Suppose first that . Suppose that . We then have that ` and ` , hence
` by Proposition 5.2.14. Therefore, by Lemma 6.3.13. It follows that either
/ or
.
Conversely, suppose that either
/ or .
Case 1: Suppose that
/ . We have because is complete, hence {} is inconsistent
(as {} ` and {} ` ). It follows that {} ` by Proposition 5.2.11, hence
` by the I rule. Therefore, by Lemma 6.3.13.
Case 2: Suppose that . We have {}, hence {} ` , and so ` by the
I rule. Therefore, by Lemma 6.3.13.
5. Suppose first that x . Since contains witnesses, we may fix c C such that (x) cx .
We therefore have ` x and ` (x) cx , hence ` cx by Proposition 5.2.14. Using Lemma
6.3.13, we conclude that cx .
Conversely, suppose that there exists c C such that cx . We then have ` cx , hence ` x
using the I rule (notice that V alidSubstcx () = 1). Using Lemma 6.3.13, we conclude that x .
6. Suppose first that x . We then have ` x, hence ` cx for all c C using the E rule
(notice that V alidSubstcx () = 1 for all c C). Using Lemma 6.3.13, we conclude that cx for all
c C.
Conversely, suppose that cx for all c C. Since is consistent, this implies that there does not
/ by part 5, so x by part 1. It
exist c C with (cx ) = ()cx . Therefore, x
follows from Proposition 5.2.10 that ` x. Using Lemma 6.3.13, we conclude that x .
(by induction)
98
(M, s) 6
(by induction)
(M, s)
Suppose that the result holds for and . We have
and
(by induction)
(M, s)
and
or
(M, s) or (M, s)
(by induction)
(M, s)
and finally
/ or
(M, s) 6 or (M, s)
(by induction)
(M, s)
Suppose that the result holds for and that x V ar. We have
x There exists c C such that cx
There exists c C such that (M, s)
(by induction)
(by the Substitution Theorem)
6.4. COMPACTNESS
99
and also
x For all c C, we have cx
For all c C, we have (M, s)
(by induction)
(by the Substitution Theorem)
We now give another proof of the Countable Lowenheim-Skolem Theorem which does not go through the
concept of elementary substructures.
Corollary 6.3.18 (Countable Lowenheim-Skolem Theorem). Suppose that L is countable and F ormL
is consistent. There exists a countable model of .
Proof. Notice that if L is consistent, then the L0 formed in Lemma 6.3.9 is countable because F ormL V ar
is countable. Thus, each Ln in the proof of Proposition 6.3.10 is countable, so the L0 formed in Proposition
6.3.10 is countable. It follows that T ermL0 is countable, and since the L0 -structure M we construct in the
proof of Proposition 6.3.16 is formed by taking the quotient from an equivalence relation on the countable
T ermL0 , we can conclude that M is countable. Therefore, the L-structure which is the restriction of M to
L from the proof of the Completeness Theorem is countable.
6.4
Compactness
100
1. Suppose that . By the Completeness Theorem, we have ` . Using Proposition 5.2.15, we may
fix a finite 0 such that 0 ` . By the Soundness Theorem, we have 0 .
2. If every finite subset of is satisfiable, then every finite subset of is consistent by the Soundness
Theorem, hence is consistent by Corollary 5.2.16, and so is satisfiable by the Soundness Theorem.
6.5
Applications of Compactness
The next proposition is another result which expresses that first-order logic is not powerful enough to
distinguish certain aspects of cardinality. Here the distinction is between large finite numbers and the
infinite.
Proposition 6.5.1. Let L be a language. Suppose that F ormL is such that for all n N, there exists
a model (M, s) of such that |M | > n. We then have that there exists a model (M, s) of such that M is
infinite.
Proof. Let L0 = L {ck : k N} where the ck are new distinct constant symbols. Let
0 = {ck 6= c` : k, ` N and k 6= `}
We claim that every finite subset of 0 is satisfiable. Fix a finite 00 0 . Fix N N such that
00 {ck 6= c` : k, ` N and k 6= `}
By assumption, we may fix a model (M, s) of such that |M | > N . Let M0 be the L0 structure M together
with interpreting the constants c0 , c1 , . . . , cN as distinct elements of M and interpreting each ci for i > N
arbitrarily. We then have (M0 , s) is a model of 0 . Hence, every finite subset of 0 is satisfiable.
By the Compactness Theorem we may conclude that 0 is satisfiable. Fix a model (M0 , s) of 0 . If we
let M be the restriction of M0 to L, then (M, s) is a model of which is infinite.
Corollary 6.5.2. The class of all finite groups is not a weak elementary class in the language L = {f, e}.
Proof. If SentL is such that M od() includes all finite groups, then we may use the trivial fact that there
are arbitrarily large finite groups and Proposition 6.5.1 to conclude that it contains an infinite structure.
Proposition 6.5.3. Let L be a language. Suppose that F ormL is such there exists a model (M, s) of
with M infinite. We then have that there exists a model (M, s) of such that M is uncountable.
Proof. Let L0 = L {cr : r R} where the cr are new distinct constant symbols. Let
0 = {cr 6= ct : r, t R and r 6= t}
We claim that every finite subset of 0 is satisfiable. Fix a finite 00 0 . Fix a finite Z R such that
00 {cr 6= ct : r, t Z}
By assumption, we may fix a model (M, s) of such that M is infinite. Let M0 be the L0 structure M
together with interpreting the constants cr for r Z as distinct elements of M and interpreting each ct for
t
/ Z arbitrarily. We then have (M0 , s) is a model of 0 . Hence, every finite subset of 0 is satisfiable.
By the Compactness Theorem we conclude that 0 is satisfiable. Fix a model (M0 , s) of 0 . If we let M
be the restriction of M0 to L, then (M, s) is a model of which is uncountable.
101
Proposition 6.5.4. The class K of all torsion groups is not a weak elementary class in the language
L = {f, e}.
Proof. Suppose that SentL is such that K M od(). Let L0 = L {c} where c is new constant symbol.
For each n N+ , let n SentL0 be cn 6= e (more formally, fcfc fcc where there are n 1 fs). Let
0 = {n : n N}
We claim that every finite subset of 0 has a model. Suppose that 0 0 is finite. Fix N N such that
0 {n : n < N }
0
Notice that if we let M0 be the group Z/N Z and let cM = 1, then M0 is a model of 0 . Thus, every finite
subset of 0 has a model, so 0 has a model by Compactness. If we restrict this model to L, we get an
element of M od() which is not int K because it has an element of infinite order.
Proposition 6.5.5. The class K of all equivalence relations in which all equivalence classes are finite is not
a weak elementary class in the language L = {R}.
Proof. Suppose that SentL is such that K M od(). Let L0 = L {c} where c is new constant symbol.
For each n N+ , let n SentL0 be
^
x1 x2 xn (
(xi 6= xj )
1i<jn
n
^
Rcxi )
i=1
and let
0 = {n : n N}
We claim that every finite subset of 0 has a model. Suppose that 0 0 is finite. Fix N N such that
0 {n : n N }
0
102
6.6
Throughout this section, we work in the language L = {R} where R is binary relation symbol. We think of
graphs as L-structures which are models of {xRxx, xy(Rxy Ryx)}.
Definition 6.6.1. For each n N+ , let Gn be the set of of all models of {xRxx, xy(Rxy Ryx)} with
universe [n].
Definition 6.6.2. For each A Gn , we let
P rn (A) =
|A|
|Gn |
Definition 6.6.4. For each r, s N with max{r, s} > 0, let r,s be the sentence
x1 x2 xr y1 y2 ys (
(xi 6= xj )
1i<jr
(yi 6= yj )
r ^
s
^
(xi 6= yj )
i=1 j=1
1i<js
z(
r
^
(z 6= xi )
i=1
s
^
(z 6= yj )
j=1
r
^
i=1
Rxi z
s
^
Ryj z))
j=1
Proposition 6.6.5. For all r, s N with max{r, s} > 0, we have lim P rn (r,s ) = 1.
n
Proof. Fix r, s N. Suppose that n N with n > r, s. Fix distinct a1 , a2 , . . . , ar , b1 , b2 , . . . , bs {1, 2, . . . , n}.
For each c distinct from the ai and bj , let
Ac = {M Gn : c is linked to each ai and to no bj }
For each such c, we have P rn (Ac ) =
is
1
2r+s ,
so P rn (Ac ) = 1
(1
1
2r+s
1
2r+s
)nrs
Therefore,
n nr
1
P rn (r,s )
(1 r+s )nrs
r
s
2
1
nr+s (1 r+s )nrs
2
1 rs r+s
1
= (1 r+s )
n (1 r+s )n
2
2
1
2r+s 1
= (1 r+s )rs nr+s ( r+s )n
2
2
1 rs
nr+s
= (1 r+s )
2r+s
2
( 2r+s 1 )n
103
Proposition 6.6.6. Let = {xRxx, xy(Rxy Ryx)} {r,s : r, s N+ and max{r, s} > 0} and let
RG = Cn().
Proposition 6.6.7. RG is satisfiable.
Proof. We build a countable model M of RG with M = N. Notice first that since Pf in (N) (the set of all
finite subsets of N) is countable, so is the set Pf in (N)2 , Hence the set
{(A, B) Pf in (N)2 : A B = and A B 6= }
is countable. Therefore, we may list it as
(A1 , B1 ), (A2 , B2 ), (A3 , B3 ), . . .
and furthermore we may assume that max(An Bn ) < n for all n N. Let M be the L-structure where
M = N and RM = {(k, n) : k An } {(n, k) : k An }. Suppose now that A, B N are finite with
A B = and A B 6= . Fix n N with A = An and B = Bn . We then have that (k, n) RM for all
k A (because k An ) and (`, n)
/ RM for all ` B (because `
/ An and n
/ A` since ` < n). Therefore,
M r,s for all r, s N with max r, s > 0. Thus, M is a model of RG.
Theorem 6.6.8. All models of RG are infinite, and any two countable models of RG are isomorphic.
Proof. Suppose that M is model of RG which is finite. Let n = |M |. Since M n,0 , there exists b M
such that (b, a) RM for all a M . However, this is a contradiction because (a, a)
/ RM for all a M . It
follows that all models of RG are infinite.
Suppose now that M and N are two countable models of RG. From above, we know that M and N are
both countably infinite. List M as m0 , m1 , m2 , . . . and list N as n0 , n1 , n2 , . . . . We build an isomorphism
via a back-and-forth construction as in the proof of the corresponding result for DLO. That is, we define
k Pf in (M N ) for k N recursively such that
1. k k+1 .
2. If (m, n) k and (m0 , n) k , then m = m0 .
3. If (m, n) k and (m, n0 ) k , then n = n0 .
4. mi dom(2i ).
5. nj ran(2j+1 ).
6. If (m, n) and (m0 , n0 ) , then (m, m0 ) RM if and only if (n, n0 ) RN
Suppose
that we are successful. Define h : M N be letting h(m) be the unique n such that (m, n)
S
,
and notice that h is isomorphism.
k
kN
We now define the k . Let 0 = (m0 , n0 ). Suppose that k N and weve defined k . Suppose first
that k is odd, say k = 2i + 1. If mi dom(k ), let k+1 = k . Suppose then that mi
/ dom(k ). Let
A = {m dom(k ) : (m, mi ) RM } and let B = {m dom(k ) : (m, mi )
/ RM }. Since N is a model of RG
and A B = , we may fix n N \ran(k ) such that (k (m), n) RM for all m A and (k (m), n)
/ RM
for all m B. Let k+1 = k {(mi , n)}.
Suppose now that k is even, say k = 2j. If nj ran(k ), let k+1 = k . Suppose then that nj
/ ran(k ).
Let A = {n ran(k ) : (n, nj ) RN } and let B = {n ran(k ) : (n, nj )
/ RN }. Since M is a model of RG
and AB = , we may fix m M \dom(k ) such that (k1 (n), m) RM for all n A and (k1 (n), m)
/ RM
for all n B. Let k+1 = k {(m, nj )}.
104
2. If
/ RG, then lim P rn ( ) = 0.
n
Proof.
1. Suppose that RG. We then have , so by Compactness we may fix N N such that
{xRxx, xy(Rxy Ryx)} {r,s : r, s N }
We then have that if M Gn is such that M , then
_
M
r,s
0r,sN,max{r,s}>0
P rn ( )
P rn (r,s )
0r,sN,max{r,s}>0
2. Suppose that
/ RG. Since RG is complete, it follows that RG. Thus, lim P rn ( ) = 1 by
n
Chapter 7
Quantifier Elimination
7.1
Quantifiers make life hard, so its always nice when we can find a way to express a statement involving
quantifiers using an equivalent statement without quantifiers.
Examples.
1. Let L = {0, 1, +, } and let (a, b, c) (where a, b, c V ar) be the formula
x(ax2 + bx + c = 0)
or more formally
x(a x x + b x + c = 0)
Let M be the L-structure (C, 0, 1, +, ). Since C is algebraically closed, we have that
(M, s) (a 6= 0 b 6= 0 c = 0)
for all s : V ar C.
2. Let L = {0, 1, +, , <} and let (a, b, c) (where a, b, c V ar) be the formula
x(ax2 + bx + c = 0)
Let M be the L-structure (R, 0, 1, +, , <). Using the quadratic formula, we have
(M, s) ((a 6= 0 b2 4ac 0) (a = 0 b 6= 0) (a = 0 b = 0 c = 0))
for all s : V ar R.
The above examples focused on one structure rather than a theory (which could have many models).
The next example uses a theory.
Example. Let L = {0, 1, +, } and let T be the theory of fields, i.e. T = Cn() where is the set of field
axioms. Let (a, b, c, d) (where a, b, c, d V ar) be the formula
wxyz(wa + xc = 1 wb + xd = 0 ya + zc = 0 yb + zd = 1)
105
106
The part inside the parentheses is really just saying that the matrix equation is true:
w x a b
1 0
=
y z c d
0 1
Therefore, using simple facts about 2 2 determinants over an arbitrary field, we have
T ad 6= bc
Definition 7.1.1. Let T be a theory. We say that T has quantifier elimination, or has QE, if for every
k 1 and every (x1 , x2 , . . . , xk ) F ormL , there exists a quantifier-free (x1 , x2 , . . . , xk ) such that
T
This seems like an awful lot to ask of a theory. However, it is a pleasant surprise that several natural
and important theories have QE, and in several more cases we can obtain a theory with QE by only adding
a few things to the language. Before proving this, we first explain what we get from it.
7.2
The first application of using QE is to show that certain theories are complete. QE itself is not sufficient,
but a very mild additional assumption gives us what we want.
Proposition 7.2.1. Let T be a theory that has QE. If there exists an L-structure N such that for every
model M of T there is an embedding h : N M from N to M, then T is complete. (Notice, there is no
assumption that N is a model of T .)
Proof. Fix an L-structure N such that for every model M of T there is an embedding h : N M from N to
M, and fix n N . Let M1 and M2 be two models of T . For each i {1, 2}, fix an embedding hi : N Mi
from N to Mi . For each i, let Ai = ran(hi ), and notice that Ai is the universe of a substructure Ai of Mi .
Furthermore, notice that hi is an isomorphism from N to Ai .
Let SentL and let (x) F ormL be the formula (x = x). Since T has QE, we may fix a
quantifier-free (x) F ormL such that T . We then have
M1 (M1 , h1 (n))
(M1 , h1 (n))
(A1 , h1 (n))
(since is quantifier-free)
(N , n)
(A2 , h2 (n))
(M2 , h2 (n))
(since is quantifier-free)
(M2 , h2 (n))
M2
Proposition 7.2.2. Let T be a theory that has QE. Suppose that A and M are models of T and that
A M. We then have that A M.
107
Proof. Let F ormL and let s : V ar A be a variable assignment. Suppose first that
/ SentL . Since
T has QE, we may fix a quantifier-free (x) F ormL such that T . We then have
(M, s) (M, s)
(A, s)
(since is quantifier-free)
(A, s)
If is a sentence, we may tack on a dummy x = x as in the previous proof.
Proposition 7.2.3. Let T be a theory that has QE, let M be a model of T , and let k N+ . Let Z be the
set of all subsets of M k which are definable by atomic formuals. The set of definable subsets of M k equals
G(P(M k ), Z, {h1 , h2 }) where h1 : P(M k ) P(M k ) is the complement function and h2 : P(M k )2 P(M k )
is the union function.
Proof.
7.3
Definition 7.3.1. Let L be a language, and let , F ormL . We say that and are semantically
equivalent if and .
We now list a bunch of simple rules for manipulating formulas while maintaing.
1. (x) and x() are s.e.
2. (x) and x() are s.e.
3. (x) and x( ) are s.e. if x
/ F reeV ar().
4. (x) and x( ) are s.e. if x
/ F reeV ar().
5. (x) and x( ) are s.e. if x
/ F reeV ar().
6. (x) and x( ) are s.e. if x
/ F reeV ar().
7. (x) and x( ) are s.e. if x
/ F reeV ar().
8. (x) and x( ) are s.e. if x
/ F reeV ar().
Well need the following to change annoying variables.
1. x and y(yx ) are s.e. if y
/ OccurV ar().
2. x and y(yx ) are s.e. if y
/ OccurV ar().
Well also need to know that if and are s.e., then
1. and are s.e.
2. x and x are s.e.
3. x and x are s.e.
and also that if 1 are 2 s.e., and 1 and 2 are s.e., then
1. 1 1 and 2 2 are s.e.
108
m
^
i=1
n
^
j )
j=1
Proof.
7.4
109
T y(
n
^
i=1
j )
j=1
Now each i and j is s.e. with, and hence we may assume is, one of the following:
1. x` = y
2. y = y
If some i is x` = y, then
T y(
m
^
i=1
n
^
j ) (
j=1
m
^
i=1
n
^
j )xy`
j=1
If some j is y = y, then
T y(
m
^
n
^
i=1
j ) (x1 = x1 )
j=1
m
^
i=1
n
^
j ) x1 = x1
j=1
m
^
i=1
n
^
j )
j=1
Now each i and j is RG-equivalent with, and hence we may assume is, one of the following:
1. x` = y
2. Rx` y
3. y = y
110
4. Ryy
If some i is x` = y, then
RG y(
m
^
i=1
n
^
j ) (
j=1
m
^
i=1
n
^
j )xy`
j=1
RG y(
i=1
n
^
j ) (x1 = x1 )
j=1
m
^
i=1
n
^
^ ^
j )
j=1
(xa = xb )
aA bB
because in models of RG, given disjoint finite sets A and B of vertices, there are infinitely many vertices
linked to everything in A and not linked to everything in B. Therefore, RG has QE.
Notice that RG is complete because the structure M given by M = {0} and RM = trivially embeds
into all models of RG.
7.5
Definition 7.5.1. Let L = {0, 1, +, }. Let SentL be the field axioms together with the sentences
a0 a1 an (an 6= 0 x(an xn + + a1 x + a0 = 0))
for each n N+ . Let ACF = Cn().
Theorem 7.5.2. ACF has QE.
Proof Sketch. The fundamental observation is that we can think of atomic formulas with free variables in
{y, x1 , x2 , . . . , xk } as equations p(~x, y) = 0 where p(~x, y) Z[~x, y] is a polynomial.
Thus, we have to find quantifier-free equivalents to formulas of the form
y[
m
^
(pi (~x, y) = 0)
i=1
Qn
x, y),
j=1 qj (~
n
^
j=1
is equivalent in ACF to
y[
m
^
i=1
Suppose now that R is a ring and p1 , p2 , . . . , pm , q R[y] listed in decreasing order of degrees. Let the
leading term of p1 be ay n and let the leading term of pm be by k . We then have that there is a simultaneous
root of polynomials p1 , p2 , . . . , pm which is not a root of q if and only if one of the following happens:
111
112
Proof. Let p be prime. For every n, let Kn be the set of roots of xp x in Fp . By standard results
in algebra, we have that Kn is a field of order pn , and furthermore is the unique subfield of Fp of order
d
d
d
d
2d
pn . If d | n, we then have that Kd Kn because if ap = a, then ap
= (ap )p = ap = a, so
S
2d
d
d
3d
ap = (ap )p = ap = a, etc. Let K = nN Kn . Notice that K is a subfield of Fp because if a Kn
and b Km , then a + b, a b Kmn . Furthermore, notice that K is algebraically closed because a finite
extension of a finite field is finite. Therefore, K = Fp .
Now if we have finitely many a1 , a2 , . . . , am Fp = K, then we may fix an n such that a1 , a2 , . . . , am Kn .
We then have that subfield of Fp generated by a1 , a2 , . . . , am is a subfield of Kn , and hence is finite.
Theorem 7.5.9. Every injective polynomial map from Cn to Cn is surjective.
Proof. Let n,d SentL be the sentence expressing that every injective polynomial map from F n to F n ,
where each polynomial has degree at most d, is surjective. We want to show that C n,d for all n, d. To
do this, it suffices to show that Fp n,d for all primes p and all n, d N. Thus, it suffices to show that for
n
n
all primes p, every injective polynomial map from Fp to Fp is surjective.
n
n
Fix a prime p and an n N. Suppose that f : Fp Fp is an injective polynomial map. Let
n
n
(b1 , b2 , . . . , bn ) Fp . We need to show that there exists (a1 , a2 , . . . , an ) Fp with f (a1 , a2 , . . . , an ) =
(b1 , b2 , . . . , bn ). Let f1 , f2 , . . . , fn Fp [x1 , x2 , . . . , xn ] be such that f = (f1 , f2 , . . . , fn ), and let C be the finite set of coefficients appearing in f1 , f2 , . . . , fn . Let K be the subfield of Fp generated by C {b1 , b2 , . . . , bn }
and notice that K is a finite field. Now f K n maps K n into K n and is injective, so its surjective because
n
K n is finite. Thus, there exists (a1 , a2 , . . . , an ) K n Fp such that f (a1 , a2 , . . . , an ) = (b1 , b2 , . . . , bn ).
Chapter 8
Throughout this section, we work in the language L = {0, 1, <, +, } where 0, 1 are constant symbols, < is
a binary relation symbol, and +, are binary function symbols. We also let N = (N, 0, 1, <, +, ) where the
symbol 0 is interpreted as the real 0, the symbol + is interpreted as real addition, etc. Make sure that
you understand when + means the symbol in the language L and when it mean the addition function on N.
A basic question is whether T h(N) compeletely determines the model N. More precisely, we have the
following question.
Question 8.1.1. Are all models of T h(N) isomorphic to N?
Using Proposition 6.5.3, we can immediately give a negative answer to this question because there is an
uncountable model of T h(N), and an uncountable model cant be isomorphic to N. What would such a
model look like? In order to answer this, lets think a little about the kinds of sentences that are in T h(N).
Definition 8.1.2. For each n N, we define a term n T ermL as follows. Let 0 = 0 and let 1 = 1. Now
define the n recursively by letting n + 1 = n + 1 for each n 1. Notice here that the 1 and the + in n + 1
mean the actual number 1 and the actual addition function, whereas the 1 and + in n + 1 mean the symbols
1 and + in our language L. Thus, for example, 2 is the term 1 + 1 and 3 is the term (1 + 1) + 1.
Definition 8.1.3. Let M be an L-structure. We know that given any t T ermL containing no variables, t
corresponds to an element of M given by s(t) for some (any) variable assignment s : V ar M . We denote
this value by tM .
Notice that nN = n for all n N be a simple induction. Here are some important examples of the kinds
of things in T h(N).
Examples of Sentences in T h(N).
1. 2 + 2 = 4 and in general m + n = m + n and m n = m n.
2. xy(x + y = y + x)
3. x(x 6= 0 y(y + 1 = x))
113
114
Now any model M of T h(N) must satisfy all of these sentences. The basic sentences in 1 above roughly
tell us that M has a piece which looks just like N. We make this precise as follows.
Proposition 8.1.4. For any model M of T h(N), the function h : N M given by h(n) = nM is an
embedding of N into M.
Proof. Notice that
h(0N ) = h(0) = 0M = 0M
and
h(1N ) = h(1) = 1M = 1M
Now let m, n N. We have
m<nNm<n
m < n T h(N)
Mm<n
mM <M nM
h(m) <M h(n)
Also, since m + n = m + n T h(N) we have
h(m + n) = (m + n)M
= m M +M n M
= h(m) +M h(n)
and since m n = m n T h(N) we have
h(m n) = (m n)M
= mM M nM
= h(m) M h(n)
Finally, for any m, n N with m 6= n, we have m 6= n T h(N), so M m 6= n, and hence h(m) 6= h(n).
Proposition 8.1.5. Let M be a model of T h(N). The following are equivalent.
1. M
= N.
2. M = {nM : n N}.
Proof. If 2 holds, then the h of the Proposition 8.1.4 is surjective and hence an isomorphism. Suppose then
that 1 holds and fix an isomorphism h : N M from N to M. We show that h(n) = nM for all n N by
induction. We have
h(0) = h(0N ) = 0M
and
h(1) = h(1N ) = 1M
115
8.2
Throughout this section, let M be a nonstandard model of arithmetic. Anything we can express in the
first-order language of L which is true of N is in T h(N), and hence is true in M. For example, we have the
following.
Proposition 8.2.1.
+M is associative on M .
+M is commutative on M .
<M is a linear ordering on M .
For all a M with a 6= 0M , there exists b M with a + 1 = b.
Proof. The sentences
xyz(x + (y + z) = (x + y) + x)
xy(x + y = y + x)
xy(x < y y < x x = y)
x(x 6= 0 y(y + 1 = x))
are in T h(N).
116
Since we already know that N is naturally embedded in M, and it gets tiresome to write +M , M , and
<M , well abuse notation by using just +, , and < to denote these. Thus, these symbols now have three
different meanings. They are used as formal symbols in our language, as the normal functions and relations
in N, and as their interpretations in M. Make sure you know how each appearance of these symbols is being
used.
Definition 8.2.2. We let Mf in = {nM : n N} and we call Mf in the set of finite elements of M. We also
let Minf = M \Mf in and we call Minf the set of infinite elements of M.
The following definition justifies our choice of name.
Proposition 8.2.3. Let a Minf . For any n N, we have nM < a.
Proof. For each n N, the sentence
x(x < n
n1
_
(x = i))
i=0
is in T h(N). Since a 6= nM for all n N, it follows that its not the case that a < nM for all n N. Since
< is a linear ordering on M , we may conclude that nM < a for all n N.
Definition 8.2.4. Define a relation on M by letting a b if either
a = b.
a < b and there exists n N such that a + nM = b.
b < a and there exists n N such that b + nM = a.
In other words, a b if a and b are finitely far apart.
Proposition 8.2.5. is an equivalence relation on M .
Proof. is clearly relexive and symmetric. Suppose that a, b, c M , that a b, and that b c. We handle
one case. Suppose that a < b and b < c. Fix m, n N with a + mM = b and b + nM = c. We then have
a + (m + n)M = a + (mM + nM )
= (a + mM ) + nM
= b + nM
=c
so a c. The other cases are similar.
Definition 8.2.6. Let a, b M . We write a b to mean that a < b and a 6 b.
Wed like to know that that relation is well-defined on the equivalence classes of . The following
lemma is useful.
Lemma 8.2.7. Let a, b, c M be such that a b c and suppose that a c. We then have a b and
b c.
Proof. If either a = b or b = c, this is trivial, so assume that a < b < c. Since a < c and a c, there exists
n N+ with a + nM = c. Now the sentence
xzw(x + w = z y((x < y y < z) u(u < w x + u = y)))
is in T h(N), so there exists d M such that d < nM and a + d = b. Since d < nM , there exists i N with
d = iM . We then have a + iM = b, hence a b. The proof that b c is similar.
117
Proposition 8.2.8. Suppose that a0 , b0 M are such that a0 b0 . For any a, b M with a a0 and
b b0 , we have a b.
Proof. We first show that a 6 b. If a b, then using a0 a and b0 b, together with the fact that is an
equivalence relation, we can conclude that a0 b0 , a contradiction. Therefore, a 6 b.
Thus, we need only show that a < b. Notice that a0 < b because otherwise a0 b0 by Lemma 8.2.7.
Similarly, a < b0 because otherwise a0 b0 by Lemma 8.2.7. Thus, if b a, we have
a0 < b a < b0 .
so b a0 by Lemma 8.2.7, hence a0 b0 , a contradiction. It follows that a < b.
This allows us to define an ordering on the equivalence classes.
Definition 8.2.9. Given a, b M , we write [a] [b] to mean that a b.
The next proposition implies that there is no largest equivalence class under the ordering .
Proposition 8.2.10. For any a Minf , we have a a + a.
Proof. Let a Minf . For each n N, the sentence
x(n < x x + n < x + x)
is in T h(N). Using this when n = 0, we see that a = a + 0M < a + a. Since a Minf , we have nM < a and
hence a + nM < a + a for all n N. Therefore, a + nM 6= a + a for all n N, and so a 6 a + a.
Lemma 8.2.11. For all a M , one of the following holds
1. There exists b M such that a = 2M b.
2. There exists b M such that a = 2M b + 1M .
Proof. The sentence
xy(x = 2 y x = 2 y + 1)
is in T h(N).
Proposition 8.2.12. For any a Minf , there exists b Minf with b a.
Proof. Suppose first that we have a b M such that a = 2M b. We then have a = b + b (because
x(2 x = x + x) is in T h(N)). Notice that b
/ Mf in because otherwise we would have a Mf in . Therefore,
b b + b = a using Proposition 8.2.10. Suppose instead that we have a b M such that a = 2M b + 1M .
/ Mf in because
We then have a = (b + b) + 1M because x(2 x + 1 = (x + x) + 1) is in T h(N). Notice that b
otherwise we would have a Mf in . Therefore, b b + b using Proposition 8.2.10, so b (b + b) + 1 = a
since b + b (b + b) + 1.
Proposition 8.2.13. For any a, b Minf with a b, there exists c Minf with a c b.
Proof. Suppose first that we have a c M such that a + b = 2M c. We then have a + b = c + c. Since
xyz((x < y x + y = z + z) (x < z z < y))
is in T h(N) it follows that a < c < b.
Suppose that a c and fix n N with a + nM = c. We then have that a + b = c + c = a + a + (2n)M ,
so b = a + (2n)M contradicting the fact that a b. Therefore a 6 c.
118
8.3
With a basic understanding of nonstandard models of arithmetic, lets think about nonstandard models of
other theories. One of the more amazing and useful such theories is the theory of the real numbers. The
idea is that we will have nonstandard models of the theory of the reals which contain both infinite and
infinitesimal elements. We can then transfer first-order statements back-and-forth, and do calculus in
this expanded stucture where the basic definitions (of say continuity) are simpler and more intuitive.
The first thing we need to decide on is what our language will be. Since we want to do calculus, we want
to have analogs of all of our favorite functions (such as sin) in the nonstandard models. Once we throw these
in, its hard to know where to draw the line. In fact, there is no reason to draw a line at all. Simply throw in
relation symbols for every possible subset of Rk , and throw in function symbols for every possible function
f : Rk R. Thus, throughout this section, we work in the language L = {r : r R} {P : P Rk } {f :
f : Rk R} where the P and f have the corresponding arities. We also let R be the structure with universe
R and where we interpret all symbols in the natural way.
Proposition 8.3.1. For any model M of T h(R), the function h : R M given by h(r) = rM is an
embedding of R into M.
119
= f M (r1 M , r2 M , . . . , rk M )
= f M (h(r1 ), h(r2 ), . . . , h(rk ))
Finally, for any r1 , r2 R with r1 6= r2 , we have r1 6= r2 T h(R), so M r1 6= r2 , and hence h(r1 ) 6=
h(r2 ).
Proposition 8.3.2. Let M be a model of T h(R). The following are equivalent.
1. M
= R.
2. M = {rM : r R}.
Proof. If 2 holds, then the h of the Proposition 8.3.1 is surjective and hence an isomorphism. Suppose
then that 1 holds and fix an isomorphism h : R M from R to M. For any r R, we must have
h(r) = h(rR ) = rM . Therefore, M = {rM : r R} because h is surjective.
Definition 8.3.3. A nonstandard model of analysis is a model M of T h(R) such that M
6 R.
=
Theorem 8.3.4. There exists a nonstandard model of analysis.
Proof. Let L0 = L {c} where c is a new constant symbol. Consider the following set of L0 -sentences.
0 = T h(R) {c 6= r : r R}
Notice that every finite subset of 0 has a model (by taking R and interpreting c distinct from each r
such that r appears in 0 ), so 0 has a model M by the Compactness Theorem. Restricting this model to
the original language L, we may use the Proposition 8.3.2 to conclude that M is a nonstandard model of
analysis.
Definition 8.3.5. For the rest of this section, fix a nonstandard model of analysis and denote it by R.
Instead of wrting f R for each f : Rk R, we simply write f . We use similar notation for each P Rk .
Also, since there is a natural embedding (the h above) from R into R, we will identify R with its image
and hence think of R as a subset of R. Finally, for operations like + and , we will abuse notation and omit
the s.
Proposition 8.3.6. There exists z R such that z > 0 and z < for all R with > 0.
120
1
b
Definition 8.3.7.
1. Z = {a R : |a| < for all R with > 0}. We call Z the set of infinitesimals.
2. F = {a R : |a| < r for some r R with r > 0}. We call F the set of finite or limited elements.
3. I = R\F. We call I the set of infinite or unlimited elements.
Proposition 8.3.8.
1. Z is a subring of R.
2. F is a subring of R.
3. Z is a prime ideal of F.
Proof.
1. First notice that Z =
6 because 0 Z (or we can use Proposition 8.3.6). Suppose that a, b Z. Let
R with > 0.
R and
121
and |b| < 2 . It follows that
|a b| |a + (b)|
|a| + | b|
|a| + |b|
< +
2 2
=
Therefore, a b Z. We also have that |a| < 1 and |b| < , hence
|a b| = |a| |b|
<1
=
Therefore, a b Z.
2. Clearly, F 6= . Suppose that a, b F, and fix r1 , r2 R with r1 , r2 > 0 such that |a| < r1 and |b| < r2 .
We have
|a b| |a + (b)|
|a| + | b|
|a| + |b|
< r1 + r2
so a b F. We also have
|a b| = |a| |b|
< r1 r2
so a b F.
3. We first show that Z is an ideal of F. Suppose that a F and b Z. Fix r R with r > 0 and
|a| < r. Let R with > 0. We then have that r R and r > 0, hence |a| < r . It follows that
|a b| = |a| |b|
< r
r
=
Therefore, a b Z.
We now show that Z is a prime ideal of F. Suppose that a, b F\Z. We have a b F by part 2.
Fix , R with , > 0 such that |a| > and |b| > . We then have |a b| = |a| |b| > , hence
ab
/ Z.
122
a1
a2
b1
b2 .
Proof.
1. We have a1 b1 Z and a2 b2 Z, hence
(a1 + a2 ) (b1 + b2 ) = (a1 b1 ) + (a2 b2 )
is in Z by Proposition 8.3.8.
2. We have a1 b1 Z and a2 b2 Z, hence
(a1 a2 ) (b1 b2 ) = (a1 b1 ) (a2 b2 )
is in Z by Proposition 8.3.8.
3. We have a1 b1 Z and a2 b2 Z. Now
a1 a2 b1 b2 = a1 a2 a1 b2 + a1 b2 b1 b2 = a1 (a2 b2 ) + b2 (a1 b1 )
so a1 a2 b1 b2 Z by Proposition 8.3.8.
4. We have a1 b1 Z and a2 b2 Z. Now
a1
b1
a1 b2 a2 b1
1
=
=
(a1 b2 a2 b1 )
a2
b2
a2 b2
a2 b2
and we know by part 3 that a1 b2 a2 b1 Z. Since a2 , b2 F\Z, it follows that a2 b2 F\Z
by Proposition 8.3.8. Therefore, a21b2 F (if > 0 is such that |a2 b2 | > , then | a21b2 | < 1 ), so
a1
b1
a2 b2 Z by Proposition 8.3.8.
123
Definition 8.3.14. We define a map st : F R by letting st(a) be the unique r R such that a r. We
call st(a) the standard part or shadow of a.
Corollary 8.3.15. The function st : F R is a ring homomorphism and ker(st) = Z.
Proposition 8.3.16. Suppose that A R, that f : A R, and that r, ` R. Suppose also that there exists
> 0 such that (r , r + )\{r} A. The following are equivalent.
1. lim f (x) = `.
xr
lim f (x) = `, we may fix R with > 0 such that |f (x) `| < whenever x A and 0 < |x r| < .
xr
Now the sentence
x((x A 0 < |x r| < ) |f (x) `| < )
is in T h(R) = T h( R). Now we have a A and 0 < |a r| < , hence | f (a) `| < . Since was arbitrary,
it follows that f (a) `.
Suppose now that for all a r with a 6= r, we have f (a) `. Fix z Z with z > 0. Let R with
> 0. By assumption, whenever a A and 0 < |a r| < z, we have that f (a) `. Thus, the sentence
( > 0 x((x A 0 < |x r| < ) |f (x) `| < ))
is in T h( R) = T h(R). By fixing a witnessing , we see that the limit condition holds for .
Proposition 8.3.17. Suppose that A R, that f, g : A R, and that r, `, m R. Suppose also that there
exists > 0 such that (r , r + )\{r} A, that lim f (x) = ` and lim g(x) = m. We then have
xr
xr
1. lim (f + g)(x) = ` + m.
xr
2. lim (f g)(x) = ` + m.
xr
3. lim (f g)(x) = ` m.
xr
`
m.
f (a)
g(a)
`
m
(notice g(a)
/ Z because m 6= 0).
Corollary 8.3.18. Suppose that A R, that f : A R, and that r R. Suppose also that there exists
> 0 such that (r , r + ) A. The following are equivalent.
1. f is continuous at r.
2. For all a r, we have f (a) f (r).
124
Corollary 8.3.19. Suppose that A R, that f : A R, and that r, ` R. Suppose also that there exists
> 0 such that (r , r + ) A. The following are equivalent.
1. f is differentiable at r with f 0 (r) = `.
2. For all a r with a 6= r, we have
f (a)f (r)
ar
`.
Now f 0 (r) F, so
f (a) f (r).
f (a)f (r)
ar
f (a) f (r)
f 0 (r)
ar
Proposition 8.3.21. Suppose that f, g : R R and r R. Suppose also that g is differentiable at r and f
is differentiable at g(r). We then have that f g is differentiable at r and (f g)0 (r) = f 0 (g(r)) g 0 (r).
Proof. We know that for all a r with a 6= r, we have
g(a) g(r)
g 0 (r)
ar
f (b) f (g(r))
f 0 (g(r))
b g(r)
Now fix a r with a 6= r. Since g is continuous at r, we have g(a) g(r). If g(a) 6= g(r), then
f ( g(a)) f (g(r))
(f g)(a) (f g)(r)
=
ar
ar
f ( g(a)) f (g(r)) g(a) g(r)
=
g(a) g(r)
ar
0
0
f (g(r)) g (r)
Suppose then that g(a) = g(r). Since the first line above holds for every a r with a 6= r, we must have
g 0 (r) 0 and hence g 0 (r) = 0 because g 0 (r) R. Therefore,
f ( g(a)) f (g(r))
(f g)(a) (f g)(r)
=
ar
ar
=0
= f 0 (g(r)) g 0 (r)
Chapter 9
9.1
Set theory originated in an attempt to understand and somehow classify small or negligible sets of
real numbers. Cantors early explorations in the realm of the transfinite were motivated by a desire to
understand the points of convergence of trigonometric series. The basic ideas quickly became a fundamental
part of analysis.
Since then, set theory has become a way to unify mathematical practice and the way in which mathematicians deal with the infinite in all areas of mathematics. Youve all seen the proof that the set of real
numbers is uncountable, but what more can be said? Exactly how uncountable is the set of real numbers?
Does this taming of the infinite give us any new tools to prove interesting mathematical theorems? Is there
anything more that the set-theoretic perspective provides to the mathematical toolkit other than a crude
notion of size and cute diagonal arguments?
We begin by listing a few basic questions from various areas of mathematics that can only be tackled
with a well-defined theory of the infinite which set theory provides.
Algebra: A fundamental result in linear algebra is that every finitely generated vector space has a basis,
and any two bases have the same size. We call the unique size of any basis of a vector space the dimension
of that space. What can be said about vector spaces that arent finitely generated? Does every vector space
have a basis? Is there a meaningful way to assign a dimension to every vector space in such a way that
two vector spaces over the same field are isomorphic if and only if they have the same dimension? We
need a well-defined and robust notion of infinite sets and infinite cardinality to deal with these questions.
Analysis: Lebesgues theory of measure and integration require an important distinction between countable and uncountable sets. Aside from this use, the study of the basic structure of the Borel sets or the
projective sets (an extension of the Borel sets) require some sophisticated use of set theory, in a way that
can be made precise.
Foundations: A remarkable side effect of our undertaking to systematically formalize the infinite is that we can devise a formal axiomatic and finitistic system in which virtually all of mathematical practice can be embedded in an extremely faithful manner. Whether this fact is interesting or useful depends on your philosophical stance about the nature of mathematics, but it does have an important consequence. It puts us in a position to prove that certain statements do not follow from the axioms (which have now been formally defined and are thus susceptible to mathematical analysis), and hence cannot be proven from the currently accepted axioms. For better or worse, this feature has become the hallmark of set theory. For example, we can ask questions like:
1. Do we really need the Axiom of Choice to produce a nonmeasurable set of real numbers?

2. Is there an uncountable set of real numbers which cannot be put in one-to-one correspondence with the set of all real numbers?
Aside from these ideas, which are applicable to other areas of mathematics, set theory is a very active area of mathematics with its own rich and beautiful structure, and it deserves study for this reason alone.
9.2
In every modern mathematical theory (say group theory, topology, or the theory of Banach spaces), we start with a list of axioms and derive results from these. In most of the fields that we axiomatize in this way, we have several models of the axioms in mind (many different groups, many different topological spaces, etc.), and we're using the axiomatization to prove abstract results which will be applicable to each of these models. In set theory, you may think that it is our goal to study one unique universe of sets, so our original motivation in writing down axioms is simply to state precisely what we are assuming in an area that can often be very counterintuitive. Since we will build our system in first-order logic, it turns out that there are many models of set theory as well (assuming that there is at least one...), and this is the basis for proving independence results, but this isn't our initial motivation. This section will be a little informal. We'll give the formal axioms (in a formal first-order language) and derive consequences starting in the next section.
Whether the axioms that we are writing down now are "obviously true", "correct", "justified", or even "worthy of study" are very interesting philosophical questions, but I will not spend much time on them here. Regardless of their epistemological status, they are now nearly universally accepted as the right axioms to use in the development of set theory. The objects of our theory are sets, and we have one binary relation ∈ which represents set membership. That is, we write x ∈ y to mean that x is an element of y. We begin with an axiom which ensures that our theory is not vacuous.
Axiom of Existence: There exists a set.
We need to have an axiom which says how equality of sets is determined in terms of the membership relation. In mathematical practice using naive set theory, the most common way to show that two sets A and B are equal is to show that each is a subset of the other. We therefore define A ⊆ B to mean that for all x ∈ A, we have x ∈ B, and we want to be able to conclude that A = B from the facts that A ⊆ B and B ⊆ A. That is, we want to think of a set as being completely determined by its members, thus linking = and ∈, but we need to codify this as an axiom.

Axiom of Extensionality: For any two sets A and B, if A ⊆ B and B ⊆ A, then A = B.
The Axiom of Extensionality carries a few perhaps unexpected consequences about the nature of sets. First, if a is a set, then we should consider the two sets {a} and {a, a} (if we are allowed to assert their existence) to be equal, because they have the same elements. Similarly, if a and b are sets, then we should consider {a, b} and {b, a} to be equal. Hence, whatever a set is, it should be inherently unordered and have no notion of multiplicity. Also, since the only objects we are considering are sets, we are ruling out the existence of atoms other than the empty set, i.e. objects a which are not the empty set but which have no elements.
We next need some rules about how we are allowed to build sets. The naive idea is that any property we write down determines a set. That is, for any property P of sets, we may form the set {x : P(x)}. For example, if you have a group G, you may form the center of G, given by Z(G) = {x : x ∈ G and xy = yx for all y ∈ G}. Of course, this naive approach leads to the famous contradiction known as Russell's paradox.
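As an informal illustration of the contrast (a sketch, not part of the formal development): in everyday mathematics we use Separation, carving a subset out of an already-existing set such as a group G, rather than unrestricted comprehension. Here the center Z(G) of the symmetric group S3 is computed in this restricted style; the representation of permutations as tuples is an arbitrary choice.

    from itertools import permutations

    G = list(permutations(range(3)))  # the six elements of the symmetric group S3

    def compose(p, q):
        # Composition of permutations: (p * q)(i) = p(q(i)).
        return tuple(p[q[i]] for i in range(len(q)))

    # Separation style: Z(G) = {x in G : xy = yx for all y in G}, a subset of G.
    Z = [x for x in G if all(compose(x, y) == compose(y, x) for y in G)]
    print(Z)  # [(0, 1, 2)]: only the identity, since S3 has trivial center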
We can build a very rich collection of finite sets using the above axioms. For example, we can form {∅} using the Axiom of Pairing. We can also form {∅} by applying the Axiom of Power Set to ∅. We can then go on to form {∅, {∅}} and many other finite sets. However, our axioms provide no means to build an infinite set.
Before getting to the Axiom of Infinity, we will lay some groundwork about ordinals. If set theory is going to serve as a basis for mathematics, we certainly need to be able to embed within it the natural numbers. It seems natural to represent the number n as some set which we think of as having n elements. Which set should we choose? Let's start from the bottom up. The natural choice to play the role of 0 is ∅, because it is the only set without any elements. Now that we have 0, and we want 1 to be a set with one element, perhaps we should let 1 be the set {0} = {∅}. Next, a canonical choice for a set with two elements is {0, 1}, so we let 2 = {0, 1} = {∅, {∅}}. In general, if we have defined 0, 1, 2, . . . , n, we can let n + 1 = {0, 1, . . . , n}. This way of defining the natural numbers has many advantages which we'll come to appreciate. For instance, we'll have n < m if and only if n ∈ m, so we may use the membership relation to define the standard ordering of the natural numbers.
However, the . . . in the above definition of n + 1 may make you a little nervous. Fortunately, we can give another description of n + 1 which avoids this unpleasantness. If we've defined n, we let n + 1 = n ∪ {n}, which we can justify the existence of using the Axiom of Pairing and the Axiom of Union. The elements of n + 1 will then be n, together with the elements of n, which should inductively be the natural numbers up to, but not including, n.
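Informally, one can model this coding with Python frozensets standing in for sets (a sketch for intuition only, not the theory itself): 0 is the empty set, n + 1 = n ∪ {n}, and n < m becomes membership.

    def succ(x):
        # S(x) = x U {x}
        return x | frozenset({x})

    zero = frozenset()   # 0 = the empty set
    one = succ(zero)     # 1 = {0}
    two = succ(one)      # 2 = {0, 1}
    three = succ(two)    # 3 = {0, 1, 2}

    print(len(three))    # 3: the set chosen for n has exactly n elements
    print(two in three)  # True: n < m is exactly the membership n in m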
Using the above outline, we can use our axioms to justify the existence of any particular natural number n (or, more precisely, the set that we've chosen to represent our idea of the natural number n). However, we can't justify the existence of the set of natural numbers {0, 1, 2, 3, . . . }. To enable us to do this, we make the following definition. For any set x, let S(x) = x ∪ {x}. We call S(x) the successor of x. We want an axiom which says that there is a set containing 0 = ∅ which is closed under successors.
Axiom of Infinity: There exists a set A such that ∅ ∈ A, and for all x, if x ∈ A, then S(x) ∈ A.
With the Axiom of Infinity asserting existence, it's not too difficult to use the above axioms to show that there is a smallest (with respect to ⊆) set A such that ∅ ∈ A and for all x, if x ∈ A, then S(x) ∈ A. Intuitively, this set is the collection of all natural numbers. Following standard set-theoretic practice, we denote this set by ω (this strange choice, as opposed to the typical ℕ, conforms with the standard practice of using lowercase Greek letters to represent infinite ordinals).
With the set ω of natural numbers in hand, there's no reason to be timid and stop counting. We started with 0, 1, 2, . . . , where each new number consisted of collecting the previous numbers into a set, and we've now collected all natural numbers into a set ω. Why not continue the counting process by considering S(ω) = ω ∪ {ω} = {0, 1, 2, . . . , ω}? We call this set ω + 1 for obvious reasons. This conceptual leap of counting into the so-called transfinite gives rise to the ordinals, the numbers which form the backbone of set theory.
Once we have ω + 1, we can then form the set ω + 2 = S(ω + 1) = {0, 1, 2, . . . , ω, ω + 1}, and continue on to ω + 3, ω + 4, and so on. Why stop there? If we were able to collect all of the natural numbers into a set, what's preventing us from collecting these into the set {0, 1, 2, . . . , ω, ω + 1, ω + 2, . . . }, and continuing? Well, our current axioms are preventing us, but we shouldn't let that stand in our way. If we can form ω, surely we should have an axiom allowing us to make this new collection a set. After all, if ω isn't too large, this set shouldn't be too large either, since it's just another sequence of ω many sets after ω.
The same difficulty arises when you want to take the union of an infinite family of sets. In fact, the previous problem is a special case of this one, but in this generality it may feel closer to home. Suppose we have sets A0, A1, A2, . . . ; that is, we have a set An for every n ∈ ω. Of course, we should be able to justify making the union ⋃_{n∈ω} An into a set. If we want to apply the Axiom of Union, we should first form the set F = {A0, A1, A2, . . . } and apply the axiom to F. However, in general, our current axioms don't justify forming this set, despite its similarity to asserting the existence of ω.
To remedy these defects, we need a new axiom. In light of the above examples, we want to say something along the lines of "if we can index a family of sets with ω, then we can form this family into a set". Using this principle, we should be able to form the set {ω, ω + 1, ω + 2, . . . }, and hence {0, 1, 2, . . . , ω, ω + 1, ω + 2, . . . } is a set by the Axiom of Union. Similarly, in the second example, we should be able to form the set {A0, A1, A2, . . . }. In terms of our restriction of not allowing sets to be too large, this seems justified, because if we consider ω to not be too large, then any family of sets it indexes shouldn't be too large either.
There is no reason to limit our focus to ω. If we have any set A, and we can index a family of sets using A, then we should be able to assert the existence of a set containing the elements of the family. We also want to make the notion of indexing more precise, and we will do it using the currently vague notion of a property of sets, as used in the Axiom of Separation.
Axiom of Collection: Suppose that A is a set and P(x, y) is a property of sets such that for every x ∈ A, there is a unique set y such that P(x, y) holds. Then there is a set B such that for every x ∈ A, we have y ∈ B for the unique y such that P(x, y) holds.
Our next axiom is often viewed as the most controversial, due to its nonconstructive nature and the sometimes counterintuitive results it allows us to prove. I will list it here as a fundamental axiom, but we will avoid using it in the basic development of set theory below until we are in a position to see its usefulness in mathematical practice.
The Axiom of Separation and the Axiom of Collection involved the somewhat vague notion of property, but whenever we think of a property (and the way we will make the notion of property precise using a formal language), we have a precise unambiguous definition which describes the property in mind. Our next axiom, the Axiom of Choice, asserts the existence of certain sets without the need for such a nice description. Intuitively, it says that if we have a set consisting only of nonempty sets, there is a function which picks an element out of each of these nonempty sets, without requiring that there be a "definable" description of such a function. We haven't defined the notion of a function in set theory, and it takes a little work to do, so we will state the axiom in the following form: For every set F of nonempty pairwise disjoint sets, there is a set C consisting of exactly one element from each element of F. We think of C as a set which "chooses" an element from each of the elements of F. Slightly more precisely, we state the axiom as follows.
Axiom of Choice: Suppose that F is a set such that every A ∈ F is nonempty, and for every A, B ∈ F, if there exists a set x with x ∈ A and x ∈ B, then A = B. Then there exists a set C such that for every A ∈ F, there is a unique x ∈ C with x ∈ A.
Our final axiom is in no way justified by mathematical practice, because it never appears in arguments outside set theory. It is also somewhat unique among our axioms in that it asserts that certain types of sets do not exist. However, adopting it gives a much clearer picture of the set-theoretic universe, and it will come to play an important role in the study of set theory itself. As with the Axiom of Choice, we will avoid using it in the basic development of set theory below until we are able to see its usefulness to us.
The goal is to eliminate sets which appear circular in terms of the membership relation. For example, we want to forbid sets x such that x ∈ x (so there is no set x such that x = {x}). Similarly, we want to forbid the existence of sets x and y such that x ∈ y and y ∈ x. In more general terms, we don't want to have an infinite descending chain of sets, each a member of the next, such as having sets xn for each n ∈ ω with · · · ∈ x2 ∈ x1 ∈ x0. We codify this by saying that every nonempty set A has an element which is "minimal" with respect to the membership relation.
Axiom of Foundation: If A is a nonempty set, then there exists x ∈ A such that there is no set z with both z ∈ A and z ∈ x.
9.3
We now give the formal version of our axioms. We work in a first-order language L with a single binary relation symbol ∈. By working in this first-order language, we are able to make precise the vague notion of property discussed above by using first-order formulas instead. However, this comes at the cost of replacing the Axiom of Separation and the Axiom of Collection by infinitely many axioms (also called an axiom scheme), since we can't quantify over formulas within the theory itself. There are other more subtle consequences of formalizing the above intuitive axioms in first-order logic, which we will discuss below.
Notice also that we allow parameters (denoted by p⃗) in the Axioms of Separation and Collection, so that we will be able to derive statements which are universally quantified over a parameter, such as "For all groups G, the set Z(G) = {x ∈ G : xy = yx for all y ∈ G} exists", rather than having to reprove that Z(G) is a set for each group G that we know exists. Finally, notice how we can avoid using defined notions (like ∅, ⊆, and S(x) in the Axiom of Infinity) by expanding them out into our fixed language. For example, we replace x ⊆ y by ∀w(w ∈ x → w ∈ y), and we replace ∅ ∈ z by ∃w(∀y(y ∉ w) ∧ w ∈ z) (we could also replace it with ∀w(∀y(y ∉ w) → w ∈ z)).
In each of the following axioms, when we write a formula φ(x1, x2, . . . , xk), we implicitly mean that the xi's are distinct variables and that every free variable of φ is one of the xi. We also use p⃗ to denote a finite sequence of variables p1, p2, . . . , pk. Notice that we don't need the Axiom of Existence, because it is true in all L-structures (recall that all L-structures are nonempty).
Axiom of Extensionality:

    ∀x∀y(∀w(w ∈ x ↔ w ∈ y) → x = y)

Axiom (Scheme) of Separation: For each formula φ(x, y, p⃗) we have the axiom

    ∀p⃗∀y∃z∀x(x ∈ z ↔ (x ∈ y ∧ φ(x, y, p⃗)))

Axiom of Pairing:

    ∀x∀y∃z(x ∈ z ∧ y ∈ z)

Axiom of Union:

    ∀x∃u∀z(∃y(z ∈ y ∧ y ∈ x) → z ∈ u)

Axiom of Power Set:

    ∀x∃z∀y(∀w(w ∈ y → w ∈ x) → y ∈ z)

Axiom of Infinity:

    ∃z(∃w(∀y(y ∉ w) ∧ w ∈ z) ∧ ∀x(x ∈ z → ∃y(∀w(w ∈ y ↔ (w ∈ x ∨ w = x)) ∧ y ∈ z)))

Axiom (Scheme) of Collection: For each formula φ(x, y, p⃗) we have the axiom

    ∀p⃗∀w((∀x(x ∈ w → ∃y φ(x, y, p⃗)) ∧ ∀x(x ∈ w → ∀u∀v((φ(x, u, p⃗) ∧ φ(x, v, p⃗)) → u = v)))
        → ∃z∀x(x ∈ w → ∃y(y ∈ z ∧ φ(x, y, p⃗))))

Axiom of Choice:

    ∀z((∀x(x ∈ z → ∃w(w ∈ x)) ∧ ∀x∀y((x ∈ z ∧ y ∈ z ∧ ∃w(w ∈ x ∧ w ∈ y)) → x = y))
        → ∃c∀x(x ∈ z → (∃w(w ∈ x ∧ w ∈ c) ∧ ∀u∀v((u ∈ x ∧ v ∈ x ∧ u ∈ c ∧ v ∈ c) → u = v))))

Axiom of Foundation:

    ∀z(∃x(x ∈ z) → ∃x(x ∈ z ∧ ¬∃y(y ∈ z ∧ y ∈ x)))
Let AxZFC be the above set of sentences, and let ZFC = Cn(AxZFC) (ZFC stands for Zermelo-Fraenkel set theory with Choice). Other presentations state the axioms of ZFC a little differently, but they all give the same theory. Some people refer to the Axiom of Separation as the Axiom of Comprehension, but Comprehension is sometimes also used to mean the contradictory statement (via Russell's Paradox) that we can always form the set {x : P(x)}, so I prefer to call it Separation. Also, some presentations refer to the Axiom of Collection as the Axiom of Replacement, but this name is more applicable to the statement that replaces the last → in the statement of Collection with a ↔, and this formulation implies the Axiom of Separation.
9.4
We have set up ZFC as a first-order theory, similar to the group axioms, ring axioms, or axioms of partial orderings. Since we have two notions of implication (semantic and syntactic), in order to show that σ ∈ ZFC, we can show either that AxZFC ⊨ σ or that AxZFC ⊢ σ. Given your experience with syntactic deductions, I'm guessing that you will jump on the first one.
When attempting to show that AxZFC ⊨ σ, we must take an arbitrary model of AxZFC and show that it is a model of σ. Thus, we must be mindful of strange L-structures and perhaps unexpected models. For example, let L be the language of set theory (so we have one binary relation symbol ∈) and let N be the L-structure (ℕ, <). Let's see which elements of AxZFC hold in N.
Axiom of Extensionality: In the structure N, this interprets as saying that whenever two elements of ℕ have the same elements of ℕ less than them, they are equal. This holds in N.
Axiom (Scheme) of Separation: This does not hold in N. Let φ(x, y) be the formula ∃w(w ∈ x). The corresponding instance of Separation is:

    ∀y∃z∀x(x ∈ z ↔ (x ∈ y ∧ ∃w(w ∈ x)))

In the structure N, this interprets as saying that for all n ∈ ℕ, there is an m ∈ ℕ such that for all k ∈ ℕ, we have k < m if and only if k < n and k ≠ 0. This does not hold in N because if we consider n = 2, there is no m ∈ ℕ such that 0 ≮ m and yet 1 < m.
Axiom of Pairing: In the structure N, this interprets as saying that whenever m, n ∈ ℕ, there exists k ∈ ℕ such that m < k and n < k. This holds in N because given m, n ∈ ℕ, we may take k = max{m, n} + 1.
Axiom of Union: In the structure N, this interprets as saying that whenever n ∈ ℕ, there exists ℓ ∈ ℕ such that whenever k ∈ ℕ has the property that there exists m ∈ ℕ with k < m and m < n, then k < ℓ. This holds in N because given n ∈ ℕ, we may take ℓ = n, since if k < m and m < n, then k < n by transitivity of < in ℕ (in fact, we may take ℓ = n − 1 if n ≠ 0).
Axiom of Power Set: In the structure N, this interprets as saying that whenever n ∈ ℕ, there exists ℓ ∈ ℕ such that whenever m ∈ ℕ has the property that every k < m also satisfies k < n, then m < ℓ. This holds in N because given n ∈ ℕ, we may take ℓ = n + 1, since if m ∈ ℕ has the property that every k < m also satisfies k < n, then m ≤ n and hence m < n + 1.
Axiom of Infinity: In the structure N, this interprets as saying that there exists n ∈ ℕ such that 0 < n and whenever m < n, we have m + 1 < n. This does not hold in N.
Axiom (Scheme) of Collection: This holds in N, as we now check. Fix a formula φ(x, y, p⃗). Interpreting in N, we need to check that if we fix natural numbers q⃗ and an n ∈ ℕ such that for all k < n there exists a unique ℓ ∈ ℕ with (N, k, ℓ, q⃗) ⊨ φ, then there exists m ∈ ℕ such that for all k < n, the unique such ℓ satisfies ℓ < m. This holds in N: writing ℓk for the unique witness corresponding to each k < n, we may take m = max{ℓk : k < n} + 1 (and m = 0 if n = 0).
Is AxZFC satisfiable? Can we somehow construct a model of AxZFC? These are interesting questions with subtle answers. For now, you'll have to live with a set of axioms with no obvious models.
Thus, when we develop set theory below, we will be arguing semantically via models. Rather than constantly saying "Fix a model M of AxZFC" at the beginning of each proof, and proceeding by showing that (M, s) ⊨ φ for various φ, we will keep the models in the background and assume that we are living inside one for each proof. When we are doing this, a "set" is simply an element of the universe M of our model M, and given two sets a and b, we write a ∈ b to mean that (a, b) is an element of ∈^M, the interpretation of ∈ in M.
Also, although there is no hierarchy of sets in our axioms, we will often follow the practice of using
lowercase letters a, b, c, etc. to represent sets that we like to think of as having no internal structure (such
as numbers, elements of a group, points of a topological space), use capital letters A, B, C, etc. to represent
sets whose elements we like to think of as having no internal structure, and use script letters A, F, etc. to
represent sets of such sets.
9.5
In the next two chapters, we'll show how to develop mathematics quite faithfully within the framework of ZFC. This raises the possibility of using set theory as a foundation for mathematical practice. However, this seems circular, because our development of logic presupposed normal mathematical practice and naive set theory (after all, we have the set of axioms of ZFC). It seems that logic depends on set theory and set theory depends on logic, so how have we gained anything from a foundational perspective?
It is indeed possible, at least in principle, to get out of this vicious circle and have a completely finitistic basis for mathematics. The escape is to buckle down and use syntactic arguments. Now there are infinitely many axioms of ZFC (because of the two axiom schemes), but instead of showing that AxZFC ⊢ φ, we can instead show that Γ ⊢ φ for a finite Γ ⊆ AxZFC (in which every line of the deduction has a finite collection of formulas on the left-hand side). In this way, it would be possible in principle to make every proof completely formal and finitistic, where each line follows from previous lines by one of our proof rules. If we held ourselves to this style, then we could reduce mathematical practice to a game with finitely many symbols (if you insisted, we could replace our infinite stock of variables Var with one variable symbol x and a new symbol ′, and refer to x3 as x′′′, etc.), where each line could be mechanically checked according to our finitely many rules. Thus, it would even be possible to program a computer to check every proof.
In practice (for human beings at least), the idea of giving deductions for everything is outlandish. Leaving aside the fact that actually giving short deductions is often a painful endeavor in itself, it turns out that even the most basic statements of mathematics, when translated into ZFC, are many thousands of symbols long, and elementary mathematical proofs (such as, say, the Fundamental Theorem of Arithmetic) are many thousands of lines long. We'll discuss how to develop the real numbers below, but any actual formulas talking about real numbers would be ridiculously long and incomprehensible to the human reader. For these reasons, and since the prospect of giving syntactic deductions for everything gives me nightmares, I choose to argue everything semantically, in the style of any other axiomatic subject in mathematics. It is an interesting and worthwhile exercise, however, to imagine how everything could be done syntactically.
Chapter 10
First Steps
We first establish some basic set theoretic facts carefully from the axioms.
Definition 10.1.1. If A and B are sets, we write A ⊆ B to mean that for all c ∈ A, we have c ∈ B.
Although the symbol ⊆ is not part of our language, we will often use ⊆ in our formulas and arguments. This use is justified because it can always be transcribed into our language by replacing it with the corresponding formula, as we did in the axioms.
Proposition 10.1.2. There is a unique set with no elements.
Proof. Fix a set b. By Separation applied to the formula x ≠ x, there is a set c such that for all a, we have a ∈ c if and only if a ∈ b and a ≠ a. For all a, we have a = a, hence a ∉ c. Therefore, there is a set with no elements. If c1 and c2 are two sets with no elements, then by the Axiom of Extensionality, we may conclude that c1 = c2.
Definition 10.1.3. We use ∅ to denote the unique set with no elements.
As above, we will often use ∅ in our formulas and arguments, despite the fact that there is no constant in our language representing it. Again, this use can always be eliminated by replacing it with a formula, as we did in the axioms. We will continue to follow this practice without comment in the future when we introduce new definitions to stand for sets for which ZFC proves existence and uniqueness. In each case, be sure to understand how these definitions could be eliminated.
We now show how to turn the idea of Russell's Paradox into a proof that there is no universal set.
Proposition 10.1.4. There is no set u such that a ∈ u for every set a.
Proof. Suppose that u is a set and a ∈ u for every set a. By Separation applied to the formula x ∉ x, there is a set c such that for all sets a, we have a ∈ c if and only if a ∈ u and a ∉ a. Since a ∈ u for every set a, we have a ∈ c if and only if a ∉ a for every set a. Therefore, c ∈ c if and only if c ∉ c, a contradiction.
Proposition 10.1.5. For all sets a and b, there is a unique set c such that, for all sets d, we have d ∈ c if and only if either d = a or d = b.
Proof. Let a and b be sets. By Pairing, there is a set e such that a ∈ e and b ∈ e. By Separation applied to the formula x = a ∨ x = b (notice that we are using parameters a and b in this use of Separation), there is a set c such that for all d, we have d ∈ c if and only if both d ∈ e and either d = a or d = b. It follows that a ∈ c, b ∈ c, and for any d ∈ c, we have either d = a or d = b. Uniqueness again follows from Extensionality.
Corollary 10.1.6. For every set a, there is a unique set c such that, for all sets d, we have d ∈ c if and only if d = a.
Proof. Apply the previous proposition with b = a.
Definition 10.1.7. Given two sets a and b, we use the notation {a, b} to denote the unique set guaranteed to exist by Proposition 10.1.5. Given a set a, we use the notation {a} to denote the unique set guaranteed to exist by Corollary 10.1.6.
Using the same style of argument, we can use Union and Separation to show that for every set F, there is a unique set U consisting precisely of the elements of elements of F. The proof is an exercise.

Proposition 10.1.8. Let F be a set. There is a unique set U such that for all a, we have a ∈ U if and only if there exists B ∈ F with a ∈ B.
Definition 10.1.9. Let F be a set. We use the notation ⋃F to denote the unique set guaranteed to exist by the previous proposition. If A and B are sets, we use the notation A ∪ B to denote ⋃{A, B}.
We now introduce some notation which conforms with the normal mathematical practice of writing sets.
Definition 10.1.10. Suppose that φ(x, y, p⃗) is a formula. Suppose that B and q⃗ are sets. By Separation and Extensionality, there is a unique set C such that for all sets a, we have a ∈ C if and only if a ∈ B and φ(a, B, q⃗). We denote this unique set by {a ∈ B : φ(a, B, q⃗)}.
With unions in hand, what about intersections? As with unions, the general case to consider is when we have a family of sets F. We then want to collect those a such that a ∈ B for all B ∈ F into a set. We do need to be a little careful, however. What happens if F = ∅? It seems that our definition would want to make the intersection of the sets in F consist of all sets, contrary to Proposition 10.1.4. However, this is the only case which gives difficulty, because if F ≠ ∅, we can take the intersection to be a subset of one (any) of the elements of F.
Proposition 10.1.11. Let F be a set with F ≠ ∅. There is a unique set I such that for all a, we have a ∈ I if and only if a ∈ B for all B ∈ F.
Proof. Since F ≠ ∅, we may fix C ∈ F. Let I = {a ∈ C : ∀B ∈ F(a ∈ B)}. For all a, we have a ∈ I if and only if a ∈ B for all B ∈ F. Uniqueness again follows from Extensionality.
Definition 10.1.12. Let F be a set with F ≠ ∅. We use the notation ⋂F to denote the unique set guaranteed to exist by the previous proposition. If A and B are sets, we use the notation A ∩ B to denote ⋂{A, B}.
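Informally, both family operations can be sketched with Python frozensets standing in for sets; note how the intersection insists on a nonempty family, mirroring Proposition 10.1.11 (the element choices below are arbitrary illustrations).

    from functools import reduce

    def family_union(F):
        # U(F): the set of all elements of elements of F.
        return reduce(lambda x, y: x | y, F, frozenset())

    def family_intersection(F):
        # Intersection of F: only defined for nonempty F (Proposition 10.1.11).
        if not F:
            raise ValueError("the intersection of the empty family is not a set")
        return reduce(lambda x, y: x & y, F)

    A = frozenset({1, 2, 3})
    B = frozenset({2, 3, 4})
    print(sorted(family_union({A, B})))         # [1, 2, 3, 4]
    print(sorted(family_intersection({A, B})))  # [2, 3]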
If A is a set, then we cannot expect the complement of A to be a set, because the union of such a purported set with A would be a set which has every set as an element, contrary to Proposition 10.1.4. However, if A and B are sets, and A ⊆ B, we can take the relative complement of A in B.
Proposition 10.1.13. Let A and B be sets with A ⊆ B. There is a unique set C such that for all a, we have a ∈ C if and only if a ∈ B and a ∉ A.
Definition 10.1.14. Let A and B be sets with A ⊆ B. We use the notation B\A or B − A to denote the unique set guaranteed to exist by the previous proposition.
10.2
Since sets have no internal order to them, we need a way to represent ordered pairs. Fortunately (since it means we don't have to extend our notion of set), there is a hack which allows us to build sets which capture the notion of an ordered pair.
Definition 10.2.1. Given two sets a and b, we let (a, b) = {{a}, {a, b}}.
Proposition 10.2.2. Let a, b, c, d be sets. If (a, b) = (c, d), then a = c and b = d.
Proof. Suppose that a, b, c, d are sets and {{a}, {a, b}} = {{c}, {c, d}}. We first show that a = c. Since {c} ∈ {{a}, {a, b}}, either {c} = {a} or {c} = {a, b}. In either case, we have a ∈ {c}, hence a = c. We now need only show that b = d. Suppose instead that b ≠ d. Since {a, b} ∈ {{c}, {c, d}}, we have either {a, b} = {c} or {a, b} = {c, d}. In either case, we conclude that b = c (because either b ∈ {c} or b ∈ {c, d}, and b ≠ d). Similarly, since {c, d} ∈ {{a}, {a, b}}, we have either {c, d} = {a} or {c, d} = {a, b}. In either case, we conclude that d = a. Therefore, using the fact that a = c, it follows that b = d.
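Informally, the Kuratowski coding and Proposition 10.2.2 can be spot-checked with frozensets standing in for sets (a sketch for intuition only).

    def kpair(a, b):
        # The Kuratowski pair (a, b) = {{a}, {a, b}}.
        return frozenset({frozenset({a}), frozenset({a, b})})

    print(kpair(1, 2) == kpair(1, 2))  # True
    print(kpair(1, 2) == kpair(2, 1))  # False: the coding captures order
    print(kpair(1, 1) == frozenset({frozenset({1})}))  # True: {{a}, {a, a}} = {{a}}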
We next turn to Cartesian products. Given two sets A and B, we would like to form the set {(a, b) : a ∈ A and b ∈ B}. Justifying that we can collect these elements into a set takes a little work. The idea is as follows. For each fixed a ∈ A, we can assert the existence of {a} × B = {(a, b) : b ∈ B} using Collection (and Separation), because B is a set. Then using Collection (and Separation) again, we can assert the existence of {{a} × B : a ∈ A}, since A is a set. The Cartesian product is then the union of this set. At later points, we will consider this argument sufficient, but we give a slightly more formal version here to really see how the axioms of Collection and Separation are applied and where the formulas come into play.
Proposition 10.2.3. For any two sets A and B, there exists a unique set, denoted by A × B, such that for all x, we have x ∈ A × B if and only if there exists a ∈ A and b ∈ B with x = (a, b).
Proof. Let φ(b, x, a) be a formula expressing that x = (a, b) (think about how to write this down). We have the statement

    ∀a∀B∀b(b ∈ B → ∃!x φ(b, x, a))

where ∃! is shorthand for "there is a unique". Therefore, by Collection, we may conclude that

    ∀a∀B∃C∀b(b ∈ B → ∃x(x ∈ C ∧ φ(b, x, a)))

Next, using Separation and Extensionality, we have

    ∀a∀B∃!C∀b(b ∈ B → ∃x(x ∈ C ∧ φ(b, x, a)))

From this it follows that

    ∀A∀B∀a(a ∈ A → ∃!C∀b(b ∈ B → ∃x(x ∈ C ∧ φ(b, x, a))))

Using Collection again, we may conclude that

    ∀A∀B∃F∀a(a ∈ A → ∃C(C ∈ F ∧ ∀b(b ∈ B → ∃x(x ∈ C ∧ φ(b, x, a)))))

This implies

    ∀A∀B∃F∀a∀b((a ∈ A ∧ b ∈ B) → ∃C(C ∈ F ∧ ∃x(x ∈ C ∧ φ(b, x, a))))

Now let A and B be sets. From the last line above, we may conclude that there exists F such that for all a ∈ A and all b ∈ B, there exists C ∈ F with (a, b) ∈ C. Let D = ⋃F. Given any a ∈ A and b ∈ B, we then have (a, b) ∈ D. Now applying Separation to the set D and the formula ∃a∃b(a ∈ A ∧ b ∈ B ∧ φ(b, x, a)), there is a set E such that for all x, we have x ∈ E if and only if there exists a ∈ A and b ∈ B with x = (a, b). As usual, Extensionality gives uniqueness.
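Informally, the staged construction in this proof (form {a} × B for each a ∈ A, collect these into a family, and take the union) can be sketched with frozensets; the sets A and B below are arbitrary illustrations.

    def kpair(a, b):
        # The Kuratowski pair (a, b) = {{a}, {a, b}}.
        return frozenset({frozenset({a}), frozenset({a, b})})

    def product(A, B):
        slices = [frozenset(kpair(a, b) for b in B) for a in A]  # the sets {a} x B
        return frozenset().union(*slices)                        # union of the family

    A, B = frozenset({1, 2}), frozenset({3, 4})
    print(len(product(A, B)))  # 4 = |A| * |B|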
10.3
Now that we have ordered pairs and Cartesian products, we can really make some progress.
Definition 10.3.1. A relation is a set R such that every x ∈ R is an ordered pair. In other words, R is a relation if ∀x(x ∈ R → ∃a∃b(x = (a, b))).
Given a relation R, we want to define its domain to be the set of first elements of ordered pairs which are elements of R, and we want to define its range to be the set of second elements of ordered pairs which are elements of R. These are good descriptions which can easily (though not shortly) be turned into formulas, but we need to know that there is some set which contains all of these elements in order to apply Separation. Since the elements of an ordered pair (a, b) = {{a}, {a, b}} are two deep, a good exercise is to convince yourself that ⋃⋃R will work. This justifies the following definitions.
Definition 10.3.2. Let R be a relation.

1. dom(R) is the set of all a such that there exists b with (a, b) ∈ R.

2. ran(R) is the set of all b such that there exists a with (a, b) ∈ R.
Definition 10.3.3. Let R be a relation. We write aRb if (a, b) ∈ R.
Definition 10.3.4. Let A be a set. We say that R is a relation on A if dom(R) ⊆ A and ran(R) ⊆ A.
We define functions in the obvious way.
Definition 10.3.5. A function f is a relation such that for all a ∈ dom(f), there exists a unique b ∈ ran(f) such that (a, b) ∈ f.
Definition 10.3.6. Let f be a function. We write f(a) = b if (a, b) ∈ f.
Definition 10.3.7. Let f be a function. f is injective (or an injection) if whenever f(a1) = b and f(a2) = b, we have a1 = a2.
Definition 10.3.8. Let A and B be sets. We write f : A → B to mean that f is a function, dom(f) = A, and ran(f) ⊆ B.
We are now in a position to define when a function f is surjective and bijective. Notice that surjectivity and bijectivity are not properties of a function itself, because these notions depend on a set which you consider to contain ran(f). Once we have a fixed such set in mind, however, we can make the definitions.
Definition 10.3.9. Let A and B be sets, and let f : A → B.

1. f is surjective (or a surjection) if ran(f) = B.

2. f is bijective (or a bijection) if f is injective and surjective.
Definition 10.3.10. Let A and B be sets.

1. We write A ⪯ B to mean that there is an injection f : A → B.

2. We write A ≈ B to mean that there is a bijection f : A → B.
Proposition 10.3.11. Let A, B, and C be sets.

1. If A ⪯ B and B ⪯ C, then A ⪯ C.

2. A ≈ A.

3. If A ≈ B, then B ≈ A.

4. If A ≈ B and B ≈ C, then A ≈ C.
10.4 Orderings
10.5
We specifically added the Axiom of Infinity with the hope that it captured the idea of the set of natural
numbers. We now show how this axiom, in league with the others, allows us to embed the theory of the
natural numbers into set theory. We start by defining the initial natural number and successors of sets.
Definition 10.5.1. 0 = ∅.
Definition 10.5.2. Given a set x, we let S(x) = x ∪ {x}, and we call S(x) the successor of x.
With 0 and the notion of successor, we can then go on to define 1 = S(0), 2 = S(1) = S(S(0)), and
continue in this way to define any particular natural number. However, we are seeking to form the set of all
natural numbers.
Definition 10.5.3. A set I is inductive if 0 ∈ I and for all x ∈ I, we have S(x) ∈ I.
The Axiom of Infinity simply asserts the existence of some inductive set J. Intuitively, we have 0 ∈ J, S(0) ∈ J, S(S(0)) ∈ J, and so on. However, J may very well contain more than just repeated applications of S to 0. We now use the top-down approach to generation to define the natural numbers (the other two approaches will not work yet, because their definitions rely on the natural numbers).
Proposition 10.5.4. There is a smallest inductive set. That is, there is an inductive set K such that K ⊆ I for every inductive set I.
Proof. By the Axiom of Infinity, we may fix an inductive set J. Let K = {x ∈ J : x ∈ I for every inductive set I}. Notice that 0 ∈ K because 0 ∈ I for every inductive set I (and so, in particular, 0 ∈ J). Suppose that x ∈ K. If I is inductive, then x ∈ I, hence S(x) ∈ I. It follows that S(x) ∈ I for every inductive set I (and so, in particular, S(x) ∈ J), hence S(x) ∈ K. Therefore, K is inductive. By definition of K, we have K ⊆ I whenever I is inductive.
By Extensionality, there is a unique smallest inductive set, so this justifies the following definition.
With the Step Induction Principle in hand, we can begin to prove the basic facts about the natural numbers. Our goal is to prove that < is a well-ordering on ω, but it will take some time to get there. We first give a very simple inductive proof. For this proof only, we will give careful arguments using both versions of Step Induction, to show how a usual induction proof can be formalized in either way.
Lemma 10.5.10. For all n ∈ ω, we have 0 ≤ n.
Proof. The following two proofs correspond to the above two versions of the Induction Principle.

1. Let X = {n ∈ ω : 0 ≤ n}, and notice that 0 ∈ X. Suppose now that n ∈ X. We then have n ∈ ω and 0 ≤ n, hence 0 < S(n) by Lemma 10.5.8, so S(n) ∈ X. Thus, by Step Induction, we have X = ω. Therefore, for all n ∈ ω, we have 0 ≤ n.

2. Let φ(n) be the formula 0 ≤ n. We clearly have φ(0) because 0 = 0. Suppose now that n ∈ ω and φ(n). We then have 0 ≤ n, hence 0 < S(n) by Lemma 10.5.8. It follows that φ(S(n)). Therefore, by Step Induction, we have 0 ≤ n for all n ∈ ω.
We give a few more careful inductive proofs using the second version of the Induction Principle, to illustrate how parameters can be used. Afterwards, our later inductive proofs will be given in a more natural relaxed style.
Our relation < is given by ∈, but it is only defined on elements of ω. We thus need the following proposition, which says that every element of a natural number is a natural number.
Proposition 10.5.11. Suppose that n ∈ ω and m ∈ n. We then have m ∈ ω.
Proof. The proof is by induction on n; that is, we hold m fixed by treating it as a parameter. Thus, fix a set m. Let X = {n ∈ ω : m ∈ n → m ∈ ω}. Notice that 0 ∈ X because m ∉ 0 = ∅. Suppose now that n ∈ X. We show that S(n) ∈ X. Suppose that m ∈ S(n) = n ∪ {n}. We then know that either m ∈ n, in which case m ∈ ω by induction (i.e. because n ∈ X), or m = n, in which case we clearly have m ∈ ω. It follows that S(n) ∈ X. Therefore, by Step Induction, we may conclude that X = ω. Since m was arbitrary, the result follows.
Proposition 10.5.12. < is transitive on ω.
Proof. We prove the result by induction on n. Fix k, m ∈ ω. Let X = {n ∈ ω : (k < m ∧ m < n) → k < n}. We then have that 0 ∈ X vacuously, because we do not have m < 0 by Lemma 10.5.7. Suppose now that n ∈ X. We show that S(n) ∈ X. Suppose that k < m and m < S(n) (if not, then S(n) ∈ X vacuously). By Lemma 10.5.8, we have m ≤ n, hence either m < n or m = n. If m < n, then k < n because n ∈ X. If m = n, then k < n because k < m. Therefore, in either case, we have k < n, and hence k < S(n) by Lemma 10.5.8. It follows that S(n) ∈ X. Thus, by Step Induction, we may conclude that X = ω. Since k, m ∈ ω were arbitrary, the result follows.
Lemma 10.5.13. Let m, n ∈ ω. We have S(m) ≤ n if and only if m < n.
Proof.

1. Let Y = {n ∈ ω : (∀m < n)(m ∈ X)}. Notice that Y ⊆ ω, and 0 ∈ Y because there is no m ∈ ω with m < 0 by Lemma 10.5.7. Suppose that n ∈ Y. We show that S(n) ∈ Y. Suppose that m < S(n). By Lemma 10.5.8, we have m ≤ n, hence either m < n or m = n. If m < n, then m ∈ X because n ∈ Y. For the case m = n, notice that n ∈ X by assumption (because m ∈ X for all m < n). Therefore, S(n) ∈ Y. By Step Induction, it follows that Y = ω.

Now let n ∈ ω. We have n ∈ ω, hence S(n) ∈ ω because ω is inductive, so S(n) ∈ Y. Since n < S(n) by Lemma 10.5.8, it follows that n ∈ X. Therefore, X = ω.

2. This follows from part 1 using Separation. Fix sets q⃗, and suppose that

    (∀n ∈ ω)((∀m < n)φ(m, q⃗) → φ(n, q⃗))

Let X = {n ∈ ω : φ(n, q⃗)}. Suppose that n ∈ ω and m ∈ X for all m < n. We then have (∀m < n)φ(m, q⃗), hence φ(n, q⃗) by assumption, so n ∈ X. It follows from part 1 that X = ω. Therefore, we have (∀n ∈ ω)φ(n, q⃗).
It is possible to give a proof of part 2 which makes use of part 2 of the Step Induction Principle, thus avoiding the detour through sets and using only formulas. This proof simply mimics how we obtained part 1 above, but uses formulas everywhere instead of working with sets. Although it is not nearly as clean, when we treat ordinals, there will be times when we need to argue at the level of formulas.
Theorem 10.5.18. < is a well-ordering on ω.
Proof. By Proposition 10.5.12, Proposition 10.5.15, and Proposition 10.5.16, it follows that < is a linear ordering on ω. Suppose then that Z ⊆ ω and there is no n ∈ Z such that for all m ∈ Z, either n = m or n < m. We show that Z = ∅. Notice that for every n ∈ Z, there exists m ∈ Z with m < n by Proposition 10.5.12.

Let Y = ω\Z. We show that Y = ω using the Induction Principle. Notice first that 0 ∈ Y, because if 0 ∈ Z, then there exists m ∈ Z with m < 0 by the last sentence of the previous paragraph, contrary to Lemma 10.5.7. Suppose then that n ∈ ω is such that m ∈ Y, i.e. m ∉ Z, for all m < n. If n ∉ Y, we would then have that n ∈ Z, so by the last sentence of the previous paragraph, there exists m ∈ Z with m < n, a contradiction. Therefore, n ∉ Z, i.e. n ∈ Y. Hence, by the Induction Principle, we have that Y = ω, and so Z = ∅.

Therefore, if Z ⊆ ω and Z ≠ ∅, there exists n ∈ Z such that for all m ∈ Z, either n = m or n < m. It follows that < is a well-ordering on ω.
10.6
We know from Proposition 10.1.4 that there is no set u such that a ∈ u for all sets a. Thus, our theory forbids us from placing every set into one universal set which we can then play with and manipulate. However, this formal impossibility within our theory does not prevent us from thinking about or referring to the collection of all sets, or other collections which are "too large" to form into a set. After all, our universal quantifiers do indeed range over the collection of all sets. Also, if we are arguing semantically, then given a model M of ZFC, we may externally work with the power set of M.
We want to be able to reason about such collections of sets in a natural manner within our theory without violating our theory. We will call such collections classes to distinguish them from sets. The idea is to recall that any first-order theory can say things about certain subsets of every model: the definable subsets. In our case, a formula φ(x) is implicitly defining a certain collection of sets. Perhaps this collection is too large to put together into a set inside the model, but we may nevertheless use the formula in various ways within our theory. For example, for any formulas φ(x) and ψ(x), the sentence ∀x(φ(x) → ψ(x)) says that every set which satisfies φ also satisfies ψ. If there exist sets C and D such that ∀x(φ(x) → x ∈ C) and ∀x(ψ(x) → x ∈ D), then we can use Separation to form the sets A = {x ∈ C : φ(x)} and B = {x ∈ D : ψ(x)}, in which case the sentence ∀x(φ(x) → ψ(x)) simply asserts that A ⊆ B. However, even if we can't form these sets (intuitively because {x : φ(x)} and {x : ψ(x)} are too large to be sets), the sentence is expressing the same underlying idea. Allowing the possibility of parameters, this motivates the following internal definition.
Definition 10.6.1. A class C is a formula φ(x, p⃗).
Of course, this isn't a very good way to think about classes. Externally, a class is simply a definable set (with the possibility of parameters). The idea is that once we fix sets q⃗ to fill in for the position of the parameters, the formula describes the collection of those sets a such that φ(a, q⃗). The first class to consider is the class of all sets, which we denote by V. Formally, we define V to be the formula x = x, but we will content ourselves with defining classes in the following more informal external style.
Definition 10.6.2. V is the class of all sets.
Here's a more interesting illustration of how classes can be used and why we want to consider them. Let CR be the class of all relations and let CF be the class of all functions. More formally, CR is the formula φR(x) given by

    ∀y(y ∈ x → ∃a∃b(y = (a, b)))

while CF is the formula φF(x) given by

    ∀y(y ∈ x → ∃a∃b(y = (a, b))) ∧ ∀a∀b1∀b2(((a, b1) ∈ x ∧ (a, b2) ∈ x) → b1 = b2)
With this shorthand in place, we can write things like CF ⊆ CR to stand for the provable sentence ∀x(φF(x) → φR(x)). Thus, by using the language of classes, we can express complicated formulas in a simplified, more suggestive, fashion. Of course, there's no real need to introduce classes, because we could always just refer to the formulas, but it is psychologically easier to think of a class as some kind of "ultra-set" which our theory is able to handle, even if we are limited in what we can do with classes.
With the ability to refer to classes, why deal with sets at all? The answer is that classes are much less versatile than sets. For example, if C and D are classes, it makes no sense to write C ∈ D, because this doesn't correspond to a formula built from the implicit formulas giving C and D. This inability corresponds to the intuition that classes are too large to collect together into a set and then put into other collections. Hence, asking whether V ∈ V is meaningless. Also, since classes are given by formulas, we are restricted to referring only to definable collections. Thus, there is no way to talk about or quantify over all collections of sets (something that is meaningless internally). However, there are many operations which do make sense on classes.
For instance, suppose that R is a class of ordered pairs (with parameters p⃗). That is, R is a formula φ(x, p⃗) such that the formula ∀x(φ(x, p⃗) → ∃a∃b(x = (a, b))) is provable. We think of R as a class relation. Using suggestive notation, we can then go on to define dom(R) to be the class consisting of those sets a such that there exists a set b with (a, b) ∈ R. To be precise, dom(R) is the class which is the formula ψ(a, p⃗) given by ∃x∃b(x = (a, b) ∧ φ(x, p⃗)). Thus, we can think of dom(·) as an operation on classes (given any formula φ(x, p⃗) which is a class relation, applying dom(·) results in the class given by the formula ∃x∃b(x = (a, b) ∧ φ(x, p⃗))).
Similarly, we can talk about class functions. We can even use notation like F : V → V to mean that F is a class function with dom(F) = V. Again, each of these expressions could have been written out as formulas in our language, but the notation is so suggestive that it's clear how to do this without actually having to do it. An example of a general class function is U : V × V → V given by U(a, b) = a ∪ b. Convince yourself how to write U as a formula.
We cannot quantify over classes within our theory in the same way that we can quantify over sets, because there is no way to quantify over the formulas of set theory within set theory. However, we can, at the price of considering one theorem as infinitely many (one for each formula), make sense of a theorem which does universally quantify over classes. For example, consider the following.
Proposition 10.6.3. Suppose that C is a class, 0 ∈ C, and for all n ∈ ω, if n ∈ C then S(n) ∈ C. We then have ω ⊆ C.
This proposition is what is obtained from the first version of Step Induction on ω by replacing the set X with the class C. Although the set version can be written as one sentence which is provable in ZFC, this version cannot, because we can't quantify over classes in the theory. Unwrapping this proposition into formulas, it says that for every formula φ(x, p⃗), if we can prove φ(0, p⃗) and (∀n ∈ ω)(φ(n, p⃗) → φ(S(n), p⃗)), then we can prove (∀n ∈ ω)φ(n, p⃗). That is, for each formula φ(x, p⃗), we can prove the sentence

    ∀p⃗((φ(0, p⃗) ∧ (∀n ∈ ω)(φ(n, p⃗) → φ(S(n), p⃗))) → (∀n ∈ ω)φ(n, p⃗))

Thus, the class version is simply a neater way of writing the second version of Step Induction on ω, which masks the fact that the quantification over classes requires us to write it as infinitely many different propositions (one for each formula φ(x, p⃗)) in our theory.
Every set can be viewed as a class by making use of the class M given by the formula x ∈ p. That is, once we fix a set p, the class x ∈ p describes exactly the elements of p. For example, using M in the class version of Step Induction on ω, we see that the following sentence is provable:

    ∀p((0 ∈ p ∧ (∀n ∈ ω)(n ∈ p → S(n) ∈ p)) → (∀n ∈ ω)(n ∈ p))

Notice that this is exactly the set version of Step Induction on ω.
On the other hand, not every class can be viewed as a set (look at V, for example). Let C be a class. We say that C is a set if there exists a set A such that for all x, we have x ∈ C if and only if x ∈ A. At the level of formulas, this means that if C is given by the formula φ(x, p⃗), then we can prove the formula ∃A∀x(φ(x, p⃗) ↔ x ∈ A). By Separation, this is equivalent to saying that there is a set B such that for all x, if x ∈ C then x ∈ B (i.e. we can prove the formula ∃B∀x(φ(x, p⃗) → x ∈ B)). A class which is not a set (that is, we can prove ¬∃A∀x(φ(x, p⃗) ↔ x ∈ A)) is called a proper class. For example, V is a proper class.
The following proposition will be helpful to us when we discuss transfinite constructions. Intuitively, it says that proper classes are too large to be embedded into any set.
Proposition 10.6.4. Let C be a proper class and let A be a set. There is no injective class function F : C → A.
Proof. Suppose that F : C → A is an injective class function. Let B = {a ∈ A : ∃c(c ∈ C ∧ F(c) = a)}, and notice that B is a set by Separation (recall that C and F are given by formulas). Since for each b ∈ B there is a unique c ∈ C with F(c) = b (using the fact that F is injective), we may use Collection and Separation to conclude that C is a set, contradicting the fact that C is a proper class.
We end this section by seeing how to simply restate the Axiom of Separation and the Axiom of Collection
in the language of classes.
Axiom of Separation: Every subclass of a set is a set.
Axiom of Collection: If F is a class function and A is a set, then there is a set containing the image of A
under F.
10.7

10.7.1 Finite Sets
Definition 10.7.1. Let A be a set. A is finite if there exists n ∈ ω such that A ≈ n. If A is not finite, we say that A is infinite.
Proposition 10.7.2. Suppose that n ∈ ω. Every injective f : n → n is bijective.
Proof. The proof is by induction on n ∈ ω. Suppose first that n = 0 and f : 0 → 0 is injective. We then have f = ∅, so f is trivially bijective. Suppose now that the result holds for n, so that every injective f : n → n is bijective. Suppose that f : S(n) → S(n) is injective. We then have f(n) ≤ n, and we consider two cases.

Case 1: Suppose that f(n) = n. Since f is injective, we have f(m) ≠ n for every m < n, hence f(m) < n for every m < n (because f(m) < S(n) for every m < n). It follows that f↾n : n → n. Notice that f↾n : n → n is injective because f is injective, hence f↾n is bijective by induction. Therefore, ran(f↾n) = n, and hence ran(f) = S(n) (because f(n) = n). It follows that f is surjective, so f is bijective.

Case 2: Suppose that f(n) < n. We first claim that n ∈ ran(f). Suppose instead that n ∉ ran(f). Notice that f↾n : n → n is injective because f is injective, hence f↾n is bijective by induction. Therefore, f(n) ∈ ran(f↾n) (because f(n) < n), so there exists ℓ < n with f(ℓ) = f(n), contrary to the fact that f is injective. It follows that n ∈ ran(f). Fix k < n with f(k) = n. Define a function g : n → n by letting

    g(m) = f(m) if m ≠ k, and g(m) = f(n) if m = k.

Notice that if m1, m2 < n with m1 ≠ m2 and m1, m2 ≠ k, then g(m1) ≠ g(m2), since f(m1) ≠ f(m2) (because f is injective). Also, if m < n with m ≠ k, then g(m) ≠ g(k), since f(m) ≠ f(n) (again because f is injective). It follows that g : n → n is injective, hence bijective by induction. From this we can conclude that ran(f) = S(n) as follows. Notice that f(n) ∈ ran(f), and n ∈ ran(f) because f(k) = n. Suppose that ℓ < n with ℓ ≠ f(n). Since g : n → n is bijective, there exists a unique m < n with g(m) = ℓ. Since ℓ ≠ f(n), we have m ≠ k, hence f(m) = g(m) = ℓ, so ℓ ∈ ran(f). Therefore, ran(f) = S(n), and hence f is bijective.
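Informally, Proposition 10.7.2 can be spot-checked by brute force for small n, with n modeled as range(n) and functions f : n → n as tuples of values (a sketch for intuition only).

    from itertools import product

    for n in range(6):
        for f in product(range(n), repeat=n):   # all functions f : n -> n
            if len(set(f)) == n:                # f is injective ...
                assert set(f) == set(range(n))  # ... hence surjective
    print("checked all f : n -> n for n = 0, ..., 5")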
Corollary 10.7.3 (Pigeonhole Principle). If n, m ∈ ω and m > n, then m ⋠ n.
Proof. Suppose that f : m → n is injective. It then follows that f↾n : n → n is injective, hence f↾n is bijective by Proposition 10.7.2. Therefore, since f(n) ∈ n, it follows that there exists k < n with f(k) = f(n), contradicting the fact that f is injective. Hence, m ⋠ n.
Corollary 10.7.4. If m, n ∈ ω and m ≈ n, then m = n.
Proof. Suppose that m ≠ n, so that either m > n or m < n. If m > n, then m ⋠ n by the Pigeonhole Principle, so m ≉ n. If m < n, then n ⋠ m by the Pigeonhole Principle, so n ≉ m and hence m ≉ n.
Corollary 10.7.5. If A is finite, there exists a unique n ∈ ω such that A ≈ n.
Definition 10.7.6. If A is finite, the unique n ∈ ω such that A ≈ n is called the cardinality of A and is denoted by |A|.
Proposition 10.7.7. Let A be a nonempty set and let n ∈ ω. The following are equivalent:

1. A ⪯ n.

2. There exists a surjection g : n → A.

3. A is finite and |A| ≤ n.
Proof. 1 implies 2: Suppose that A ⪯ n, and fix an injection f : A → n. Fix an element b ∈ A (which exists since A ≠ ∅). Define g : n → A by letting

    g = {(m, a) ∈ n × A : f(a) = m} ∪ {(m, a) ∈ n × A : m ∉ ran(f) and a = b}.

Notice that g : n → A and that g is a surjection.

2 implies 1: Suppose that g : n → A is a surjection. Define a set f by letting

    f = {(a, m) ∈ A × n : g(m) = a and g(k) ≠ a for all k < m}

Using the fact that < well-orders ω and that g is a surjection, it follows that f : A → n. Also, f is injective because g is a function.

1 implies 3: Suppose that A ⪯ n. Let m be the least element of ω such that A ⪯ m, and fix an injection g : A → m. We claim that g is a bijection. Notice that m ≠ 0 because A is nonempty, so we may fix k ∈ ω with m = S(k). If g were not a bijection, we could construct an injective h : A → k, a contradiction. Hence A ≈ m, so A is finite and |A| = m ≤ n.

3 implies 1: Suppose that A is finite and |A| ≤ n. Let m = |A| ≤ n and fix a bijection f : A → m. We then have that f : A → n is an injection, so A ⪯ n.
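Informally, the "2 implies 1" direction can be sketched as a computation: from a surjection g : n → A, send each a ∈ A to the least m with g(m) = a, which is possible because < well-orders ω (the sets below are arbitrary illustrations).

    def injection_from_surjection(g, A):
        # g is a tuple of length n whose values cover A; return f : A -> n
        # sending each a to the least m with g(m) = a.
        return {a: min(m for m in range(len(g)) if g[m] == a) for a in A}

    g = ("x", "y", "x", "z")  # a surjection from 4 onto {x, y, z}
    f = injection_from_surjection(g, {"x", "y", "z"})
    print(f)  # {'x': 0, 'y': 1, 'z': 3}: an injection, since g is a function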
Corollary 10.7.8. Suppose that n ∈ ω. Every surjective g : n → n is bijective.
Proof. Suppose that g : n → n is surjective. As above, define an injective f : n → n such that g ∘ f = id_n. We then have that f is bijective by Proposition 10.7.2, hence g is bijective.
10.7.2 Finite Powers

It is possible to use ordered pairs to define ordered triples, ordered quadruples, and so on. For example, we could define the ordered triple (a, b, c) to be ((a, b), c). However, with the basic properties of ω in hand, we can give a much more elegant definition.
Proposition 10.7.9. Let A be a set and let n ∈ ω. There is a unique set, denoted by A^n, such that for all f, we have f ∈ A^n if and only if f : n → A.
Proof. As usual, uniqueness follows from Extensionality, so we need only prove existence. The proof is by induction on n. Suppose that n = 0. Since for all f, we have f : 0 → A if and only if f = ∅, we may take A^0 = {∅}. Suppose that the result holds for n, i.e. there exists a set A^n such that for all f, we have f ∈ A^n if and only if f : n → A.

Fix a ∈ A. Notice that for each f ∈ A^n, there is a unique function f_a : S(n) → A such that f_a(m) = f(m) for all m < n and f_a(n) = a (let f_a = f ∪ {(n, a)} and use Lemma 10.5.8). Therefore, by Collection (since A^n is a set), Separation, and Extensionality, there is a unique set C_a such that for all g, we have g ∈ C_a if and only if g = f_a for some f ∈ A^n. Notice that for every g : S(n) → A with g(n) = a, there is an f : n → A such that g = f_a (let f = g\{(n, a)}). Therefore, for every g, we have g ∈ C_a if and only if g : S(n) → A and g(n) = a.

By Collection (since A is a set), Separation, and Extensionality again, there is a set F such that for all D, we have D ∈ F if and only if there exists a ∈ A with D = C_a. Notice that for all g, we have g ∈ ⋃F if and only if there exists a ∈ A with g ∈ C_a. Let A^{S(n)} = ⋃F. For all g, we then have g ∈ A^{S(n)} if and only if g : S(n) → A. Therefore, by induction, for every n ∈ ω, there is a set B such that for all f, we have f ∈ B if and only if f : n → A.
Proposition 10.7.10. Let A be a set. There is a unique set, denoted by A^{<ω}, such that for all f, we have f ∈ A^{<ω} if and only if f ∈ A^n for some n ∈ ω.

Proof. By Collection (since ω is a set), Separation, and Extensionality, there is a unique set F such that for all D, we have D ∈ F if and only if there exists n ∈ ω with D = A^n. Let A^{<ω} = ⋃F. For every f, we then have f ∈ A^{<ω} if and only if f ∈ A^n for some n ∈ ω.
10.7.3 Finite Products

Suppose that f is a function with dom(f) = n ∈ ω. We want to consider the Cartesian product of the sets indexed by f:

    ∏f = {g ∈ (⋃ran(f))^n : g(i) ∈ f(i) for all i < n}
10.8 Definitions by Recursion
Theorem 10.8.1 (Step Recursive Definitions on ω - Set Form). Let A be a set, let b ∈ A, and let g : ω × A → A. There exists a unique function f : ω → A such that f(0) = b and f(S(n)) = g(n, f(n)) for all n ∈ ω.
Proof. We first prove existence. Call a set Z ⊆ ω × A sufficient if (0, b) ∈ Z and for all (n, a) ∈ Z, we have (S(n), g(n, a)) ∈ Z. Notice that sufficient sets exist (since ω × A is sufficient). Let

    Y = {(n, a) ∈ ω × A : (n, a) ∈ Z for every sufficient set Z}.

We first show that Y is sufficient. Notice that (0, b) ∈ Y because (0, b) ∈ Z for every sufficient set Z. Suppose now that (n, a) ∈ Y. For any sufficient set Z, we have (n, a) ∈ Z, hence (S(n), g(n, a)) ∈ Z. Therefore, (S(n), g(n, a)) ∈ Z for every sufficient set Z, so (S(n), g(n, a)) ∈ Y. It follows that Y is sufficient.

We next show that for all n ∈ ω, there exists a unique a ∈ A such that (n, a) ∈ Y. Let

    X = {n ∈ ω : there exists a unique a ∈ A such that (n, a) ∈ Y}.

Since Y is sufficient, we know that (0, b) ∈ Y. Suppose that d ∈ A and d ≠ b. Since the set (ω × A)\{(0, d)} is sufficient (because S(n) ≠ 0 for all n ∈ ω), it follows that (0, d) ∉ Y. Therefore, there exists a unique a ∈ A such that (0, a) ∈ Y (namely, a = b), so 0 ∈ X. Suppose now that n ∈ X, and let c be the unique element of A such that (n, c) ∈ Y. Since Y is sufficient, we have (S(n), g(n, c)) ∈ Y. Fix d ∈ A with d ≠ g(n, c). We then have that Y\{(S(n), d)} is sufficient (otherwise, there exists a ∈ A such that (n, a) ∈ Y and g(n, a) = d, contrary to the fact that in this case we have a = c by induction), so by definition of Y it follows that Y ⊆ Y\{(S(n), d)}. Hence, (S(n), d) ∉ Y. Therefore, there exists a unique a ∈ A such that (S(n), a) ∈ Y (namely, a = g(n, c)), so S(n) ∈ X. By induction, we conclude that X = ω, so for all n ∈ ω, there exists a unique a ∈ A such that (n, a) ∈ Y.

Let f = Y, and notice that f : ω → A from above. Since Y is sufficient, we have (0, b) ∈ Y, so f(0) = b. Let n ∈ ω. Since (n, f(n)) ∈ Y and Y is sufficient, it follows that (S(n), g(n, f(n))) ∈ Y, so f(S(n)) = g(n, f(n)).
We now prove uniqueness. Suppose that f1, f2 : ω → A are such that:

1. f1(0) = b.

2. f2(0) = b.

3. f1(S(n)) = g(n, f1(n)) for all n ∈ ω.

4. f2(S(n)) = g(n, f2(n)) for all n ∈ ω.

Let X = {n ∈ ω : f1(n) = f2(n)}. Notice that 0 ∈ X because f1(0) = b = f2(0). Suppose that n ∈ X, so that f1(n) = f2(n). We then have

    f1(S(n)) = g(n, f1(n)) = g(n, f2(n)) = f2(S(n))

hence S(n) ∈ X. It follows by induction that X = ω, so f1(n) = f2(n) for all n ∈ ω.
As an example of how to use this result (assuming we already know how to multiply - see below), consider how to define the factorial function. We want to justify the existence of a unique function f : ω → ω such that f(0) = 1 and f(S(n)) = f(n) · S(n) for all n ∈ ω. We can make this work as follows. Let A = ω, b = 1, and define g : ω × ω → ω by letting g(n, a) = a · S(n) (here we are thinking that the second argument of g will carry the accumulated value f(n)). The theorem now gives the existence and uniqueness of a function f : ω → ω such that f(0) = 1 and f(S(n)) = g(n, f(n)) = f(n) · S(n) for all n ∈ ω.
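To see the shape of this recursion concretely, here is a minimal Python sketch of the theorem (our own illustration, not part of the formal development; it uses Python's built-in multiplication merely as scaffolding):

def step_rec(b, g):
    # Given b in A and g : omega x A -> A, return the unique f with
    # f(0) = b and f(S(n)) = g(n, f(n)).
    def f(n):
        value = b
        for k in range(n):       # unfold f(0), f(1), ..., f(n)
            value = g(k, value)  # f(S(k)) = g(k, f(k))
        return value
    return f

factorial = step_rec(1, lambda n, a: a * (n + 1))   # b = 1, g(n, a) = a * S(n)
assert [factorial(n) for n in range(6)] == [1, 1, 2, 6, 24, 120]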
However, this raises the question of how to define multiplication. Let's start by thinking about how to define addition. The basic idea is to define it recursively. For any m ∈ ω, we let m + 0 = m. If m ∈ ω, and we know how to find m + n for some fixed n ∈ ω, then we should define m + S(n) = S(m + n). It looks like an appeal to the above theorem is in order, but how do we treat the m that is fixed in the recursion? We need a slightly stronger version of the above theorem which allows a parameter to come along for the ride. The proof is basically the same, so we just give a short sketch.
Theorem 10.8.2 (Step Recursive Definitions with Parameters on ω). Let A and P be sets, let h : P → A, and let g : P × ω × A → A. There exists a unique function f : P × ω → A such that f(p, 0) = h(p) for all p ∈ P, and f(p, S(n)) = g(p, n, f(p, n)) for all p ∈ P and all n ∈ ω.

Proof. One could reprove this from scratch following the above outline, but we give a simpler argument using Collection. For each p ∈ P, define g_p : ω × A → A by letting g_p(n, a) = g(p, n, a) for all (n, a) ∈ ω × A. Using the above result without parameters, for each fixed p ∈ P, there exists a unique function f_p : ω → A such that f_p(0) = h(p) and f_p(S(n)) = g_p(n, f_p(n)) for all n ∈ ω. By Collection and Separation, we may form the set {f_p : p ∈ P}, and from it the function f : P × ω → A given by f(p, n) = f_p(n). It is then straightforward to check that f is the unique function satisfying the necessary properties.
Definition 10.8.3. Let h : ω → ω be defined by h(m) = m and let g : ω × ω × ω → ω be defined by g(m, n, a) = S(a). We denote the unique f from the previous theorem by +. Notice that + : ω × ω → ω, that m + 0 = m for all m ∈ ω, and that m + S(n) = S(m + n) for all m, n ∈ ω.

Now that we have the definition of +, we can prove all of the basic axiomatic facts about the natural numbers with + by induction. Here's a simple example.

Proposition 10.8.4. 0 + n = n for all n ∈ ω.

Proof. The proof is by induction on n. For n = 0, simply notice that 0 + 0 = 0. Suppose that n ∈ ω and 0 + n = n. We then have 0 + S(n) = S(0 + n) = S(n). The result follows by induction.
A slightly less trivial example is a proof that + is associative.

Proposition 10.8.5. For all k, m, n ∈ ω, we have (k + m) + n = k + (m + n).

Proof. We fix k, m ∈ ω, and prove the result by induction on n. Notice that (k + m) + 0 = k + m = k + (m + 0). Suppose that we know the result for n, so that (k + m) + n = k + (m + n). We then have

(k + m) + S(n) = S((k + m) + n)
             = S(k + (m + n))    (by induction)
             = k + S(m + n)
             = k + (m + S(n))

The result follows by induction.
Definition 10.8.6. Let h : ω → ω be defined by h(m) = 0 and let g : ω × ω × ω → ω be defined by g(m, n, a) = a + m. We denote the unique f from the previous theorem by ·. Notice that · : ω × ω → ω, that m · 0 = 0 for all m ∈ ω, and that m · S(n) = m · n + m for all m, n ∈ ω.
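The parameterized theorem transcribes just as directly. The following Python sketch (again our own, using the built-in + only as scaffolding) realizes Definitions 10.8.3 and 10.8.6:

def step_rec_param(h, g):
    # Given h : P -> A and g : P x omega x A -> A, return the unique
    # f : P x omega -> A with f(p, 0) = h(p) and f(p, S(n)) = g(p, n, f(p, n)).
    def f(p, n):
        value = h(p)
        for k in range(n):
            value = g(p, k, value)
        return value
    return f

add = step_rec_param(lambda m: m, lambda m, n, a: a + 1)       # h(m) = m, g(m, n, a) = S(a)
mul = step_rec_param(lambda m: 0, lambda m, n, a: add(a, m))   # h(m) = 0, g(m, n, a) = a + m
assert add(3, 4) == 7 and mul(3, 4) == 12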
From now on, we will present our recursive definitions in the usual mathematical style. For example, we define iterates of a function as follows.

Definition 10.8.7. Let B be a set, and let h : B → B be a function. We define, for each n ∈ ω, a function h^n by letting h^0 = id_B and letting h^{S(n)} = h ∘ h^n for all n ∈ ω.

For each fixed h : B → B, this definition can be justified by appealing to the theorem with A = B^B, b = id_B, and g : ω × A → A given by g(n, a) = h ∘ a. However, we will content ourselves with the above more informal style when the details are straightforward and uninteresting.
The above notions of recursive definitions can only handle types of recursion in which the value of f(S(n)) depends just on the previous value f(n) (and also on n). Thus, they are unable to deal with recursive definitions such as the one used in defining the Fibonacci sequence, where the value of f(n) depends on the two previous values of f whenever n ≥ 2. We can justify these more general types of recursion by carrying along all previous values of f in the inductive construction. Thus, instead of having our iterating function g : ω × A → A, where we think of the second argument of g as carrying the current value f(n), we will have an iterating function g : A^{<ω} → A, where we think of the argument of g as carrying the finite sequence consisting of all values f(m) for m < n. Thus, given such a g, we are seeking the existence and uniqueness of a function f : ω → A such that f(n) = g(f↾n) for all n ∈ ω. Notice that in this framework, we no longer need to put forward a b ∈ A as a starting place for f, because we will have f(0) = g(∅). Also, we do not need to include a number argument in the domain of g, because the current n in the iteration can be recovered as the domain of the single argument of g.
Theorem 10.8.8 (Recursive Definitions on ω). Let A be a set and let g : A^{<ω} → A. There exists a unique function f : ω → A such that f(n) = g(f↾n) for all n ∈ ω.
Proof. We first prove existence. Call a set Z ⊆ ω × A sufficient if for all n ∈ ω and all q ∈ A^n such that (k, q(k)) ∈ Z for all k < n, we have (n, g(q)) ∈ Z. Notice that sufficient sets exist (since ω × A is sufficient). Let

Y = {(n, a) ∈ ω × A : (n, a) ∈ Z for every sufficient set Z}.

We first show that Y is sufficient. Suppose that n ∈ ω, that q ∈ A^n, and that (k, q(k)) ∈ Y for all k < n. For any sufficient set Z, we have (k, q(k)) ∈ Z for all k < n, so (n, g(q)) ∈ Z. Therefore, (n, g(q)) ∈ Z for every sufficient set Z, so (n, g(q)) ∈ Y. It follows that Y is sufficient.

We next show that for all n ∈ ω, there exists a unique a ∈ A such that (n, a) ∈ Y. Let

X = {n ∈ ω : there exists a unique a ∈ A such that (n, a) ∈ Y}.

Suppose that n ∈ ω is such that k ∈ X for all k < n. Let q = Y ∩ (n × A) and notice that q ∈ A^n. Since (k, q(k)) ∈ Y for all k < n and Y is sufficient, it follows that (n, g(q)) ∈ Y. Fix b ∈ A with b ≠ g(q). We then have that Y\{(n, b)} is sufficient (otherwise, there exists p ∈ A^n such that (k, p(k)) ∈ Y for all k < n and g(p) = b, but this implies that p = q and hence b = g(q), a contradiction), so by definition of Y it follows that Y ⊆ Y\{(n, b)}. Hence, (n, b) ∉ Y. Therefore, there exists a unique a ∈ A such that (n, a) ∈ Y, so n ∈ X. By induction, we conclude that X = ω, so for all n ∈ ω, there exists a unique a ∈ A such that (n, a) ∈ Y.

Let f = Y and notice that f : ω → A from above. Suppose that n ∈ ω. Let q = Y ∩ (n × A) and notice that q ∈ A^n and q = f↾n. Since (k, q(k)) ∈ Y for all k < n and Y is sufficient, it follows that (n, g(q)) ∈ Y, so f(n) = g(q) = g(f↾n).

We now prove uniqueness. Suppose that f1, f2 : ω → A are such that:

1. f1(n) = g(f1↾n) for all n ∈ ω.
2. f2(n) = g(f2↾n) for all n ∈ ω.
Let X = {n ∈ ω : f1(n) = f2(n)}. We prove by induction that X = ω. Let n ∈ ω and suppose that k ∈ X for all k < n. We then have that f1↾n = f2↾n, hence

f1(n) = g(f1↾n) = g(f2↾n) = f2(n)

hence n ∈ X. It follows by induction that X = ω, so f1(n) = f2(n) for all n ∈ ω.
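This style of recursion, where g sees the entire finite sequence of earlier values, is also easy to model. Here is a minimal Python sketch (ours; a list prefix stands in for the restriction f↾n), using the Fibonacci sequence mentioned above:

def cov_rec(g):
    # Given g : A^{<omega} -> A, return the unique f with f(n) = g(f|n),
    # where the finite sequence f|n is modeled as the list [f(0), ..., f(n-1)].
    def f(n):
        prefix = []
        for _ in range(n + 1):
            prefix.append(g(prefix[:]))   # f(k) = g(f|k)
        return prefix[n]
    return f

def g_fib(q):
    # g(empty) = 0, g(one-term sequence) = 1, else g(q) = q(n-2) + q(n-1).
    return len(q) if len(q) <= 1 else q[-2] + q[-1]

fib = cov_rec(g_fib)
assert [fib(n) for n in range(8)] == [0, 1, 1, 2, 3, 5, 8, 13]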
As above, there is a similar version when we allow parameters. If f : P × ω → A and p ∈ P, we use the notation f_p to denote the function f_p : ω → A given by f_p(n) = f(p, n) for all n ∈ ω.

Theorem 10.8.9 (Recursive Definitions with Parameters on ω). Let A and P be sets and let g : P × A^{<ω} → A. There exists a unique function f : P × ω → A such that f(p, n) = g(p, f_p↾n) for all p ∈ P and n ∈ ω.
10.9

10.9.1 Countable Sets

10.9.2 General Powers
There is no reason to restrict to n ∈ ω in the above examples. In general, we want to define A^B to be the set of all functions from B to A. We can certainly make this definition, but it is the first instance where we really need to use Power Set.

Proposition 10.9.6. Let A and B be sets. There is a unique set, denoted by A^B, such that for all f, we have f ∈ A^B if and only if f : B → A.

Proof. Notice that if f : B → A, then f ⊆ B × A, hence f ∈ P(B × A). Therefore, A^B = {f ∈ P(B × A) : f is a function, dom(f) = B, and ran(f) ⊆ A}. As usual, uniqueness follows from Extensionality.
10.9.3 General Products

Chapter 11

11.2
Chapter 12

Well-Orderings

The ability to do induction and make definitions by recursion on ω was essential to developing the basic properties of the natural numbers. With such success, we may wonder on which other kinds of structures we can do induction and recursion. Looking at the Step Induction Principle and Step Recursive Definitions on ω, it seems hard to generalize these ideas to anything more complicated than ω, because by starting with zero and taking successors we can't get any further. However, the more general versions of induction and recursion, which refer to the order on ω rather than just 0 and successors, can be very fruitfully generalized to any well-ordering.
Proposition 12.1.1 (Induction on Well-Orderings). Let (W, <) be a well-ordering.

1. Suppose that X is a set and for all z ∈ W, if y ∈ X for all y < z, then z ∈ X. We then have W ⊆ X.

2. For any formula φ(z, ~p), we have the sentence

∀~p [(∀z ∈ W)((∀y < z)φ(y, ~p) → φ(z, ~p)) → (∀z ∈ W)φ(z, ~p)]

3. Suppose that C is a class and for all z ∈ W, if y ∈ C for all y < z, then z ∈ C. We then have W ⊆ C.

Proof.

1. Suppose that W ⊈ X, so that W\X ≠ ∅. Since (W, <) is a well-ordering, there exists z ∈ W\X such that for all y ∈ W\X, either z = y or z < y. Therefore, for all y ∈ W with y < z, we have y ∈ X (because y ∉ W\X). It follows from our assumption that z ∈ X, contradicting the fact that z ∈ W\X. Thus, it must be the case that W ⊆ X.

2. This follows from part 1 using Separation. Fix sets ~q, and suppose that

(∀z ∈ W)((∀y < z)φ(y, ~q) → φ(z, ~q))

Let X = {z ∈ W : φ(z, ~q)}. Suppose that z ∈ W and y ∈ X for all y < z. We then have (∀y < z)φ(y, ~q), hence φ(z, ~q) by assumption, so z ∈ X. It follows from part 1 that W ⊆ X. Therefore, we have (∀z ∈ W)φ(z, ~q).
This is all well and good, but are there other interesting well-orderings besides ω (and each n ∈ ω)? Well, any well-ordering has a smallest element. If there are any elements remaining, there must be a next smallest element. Again, if there are any elements remaining, there must be a next smallest element, and so on. Thus, any well-ordering begins with a piece that looks like ω.

However, we can build a longer well-ordering by taking ω and adding a new element which is greater than every element of ω. This can be visualized by thinking of the set

A = {1 − 1/n ∈ R : n ∈ ω\{0}} ∪ {1}.

It's a simple exercise to check that A, ordered by inheritance from the usual order on R, is a well-ordering. We can then add another new element which is greater than every element, and another and another and so on, to get a well-ordering that is a copy of ω with another copy of ω on top of the first. We can add a new element greater than all of these, and continue. These well-orderings beyond ω differ from ω (and all n ∈ ω) in that they have points that are neither initial points nor immediate successors of other points.
Definition 12.1.2. Let (W, <) be a well-ordering, and let z ∈ W.

1. If z ≤ y for all y ∈ W, we call z the initial point (such a z is easily seen to be unique).

2. If there exists y < z such that there is no x ∈ W with y < x < z, we call z a successor point.

3. If z is neither an initial point nor a successor point, we call z a limit point.

A little thought will suggest that all well-orderings should be built up by starting at an initial point, taking successors (perhaps infinitely often), and then jumping to a limit point above everything previous. After all, if we already have an initial part that looks like ω, and we haven't exhausted the well-ordering, then there must be a least element not accounted for, and this is the first limit point. If we still haven't exhausted it, there is another least element, which is a successor, and perhaps another successor, and so on. If this doesn't finish off the well-ordering, there is another least element not accounted for, which will be the second limit point.
This idea makes it seem plausible that we can take any two well-orderings and compare them by running through this procedure until one of them runs out of elements. That is, if (W1, <1) and (W2, <2) are well-orderings, then either they are isomorphic, or one is isomorphic to an initial segment of the other. We now develop the tools to prove this result. We first show that we can make recursive definitions along well-orderings. The proof is basically the same as the corresponding proof on ω, because the only important fact that allowed that argument to work was that < well-orders ω (not the fact that every element of ω is either an initial point or a successor point).

Definition 12.1.3. Let (W, <) be a well-ordering, and let z ∈ W. We let W(z) = {y ∈ W : y < z}.

Definition 12.1.4. Let (W, <) be a well-ordering. A set I ⊆ W is called an initial segment of W if I ≠ W and whenever x ∈ I and y < x, we have y ∈ I.
Proposition 12.1.5. Suppose that (W, <) is a well-ordering and I is an initial segment of W. There exists z ∈ W with I = W(z).

Proof. Since I is an initial segment of W, we have I ⊆ W and I ≠ W. Hence, W\I ≠ ∅. Since (W, <) is a well-ordering, there exists z ∈ W\I such that z ≤ y for all y ∈ W\I. We claim that I = W(z). If y ∈ W(z), we then have y ∉ W\I (because y < z), hence y ∈ I. Therefore, W(z) ⊆ I. Suppose that y ∈ I and y ∉ W(z). We then have y ≥ z, hence z ∈ I because I is an initial segment, contradicting the fact that z ∈ W\I. It follows that I ⊆ W(z). Therefore, I = W(z) by Extensionality.
Definition 12.1.6. Let (W, <) be a well-ordering and let A be a set. We let

A^{<W} = {f ∈ P(W × A) : f is a function and f : W(z) → A for some z ∈ W}

Theorem 12.1.7 (Recursive Definitions on Well-Orderings). Let (W, <) be a well-ordering, let A be a set, and let g : A^{<W} → A. There exists a unique function f : W → A such that f(z) = g(f↾W(z)) for all z ∈ W.
Proof. We first prove existence. Call a set Z ⊆ W × A sufficient if for all z ∈ W and all q ∈ A^{W(z)} such that (y, q(y)) ∈ Z for all y < z, we have (z, g(q)) ∈ Z. Notice that sufficient sets exist (since W × A is sufficient). Let

Y = {(z, a) ∈ W × A : (z, a) ∈ Z for every sufficient set Z}.

We first show that Y is sufficient. Suppose that z ∈ W, that q ∈ A^{W(z)}, and that (y, q(y)) ∈ Y for all y < z. For any sufficient set Z, we have (y, q(y)) ∈ Z for all y < z, so (z, g(q)) ∈ Z. Therefore, (z, g(q)) ∈ Z for every sufficient set Z, so (z, g(q)) ∈ Y. It follows that Y is sufficient.

We next show that for all z ∈ W, there exists a unique a ∈ A such that (z, a) ∈ Y. Let

X = {z ∈ W : there exists a unique a ∈ A such that (z, a) ∈ Y}.

Suppose that z ∈ W is such that y ∈ X for all y < z. Let q = Y ∩ (W(z) × A) and notice that q ∈ A^{W(z)}. Since (y, q(y)) ∈ Y for all y < z and Y is sufficient, it follows that (z, g(q)) ∈ Y. Fix b ∈ A with b ≠ g(q). We then have that Y\{(z, b)} is sufficient (otherwise, there exists p ∈ A^{W(z)} such that (y, p(y)) ∈ Y for all y < z and g(p) = b, but this implies that p = q and hence b = g(q), a contradiction), so by definition of Y it follows that Y ⊆ Y\{(z, b)}. Hence, (z, b) ∉ Y. Therefore, there exists a unique a ∈ A such that (z, a) ∈ Y, so z ∈ X. By induction, we conclude that X = W, so for all z ∈ W, there exists a unique a ∈ A such that (z, a) ∈ Y.

Let f = Y and notice that f : W → A from above. Suppose that z ∈ W. Define q ∈ A^{W(z)} by letting q = Y ∩ (W(z) × A) and notice that q = f↾W(z). Since (y, q(y)) ∈ Y for all y < z and Y is sufficient, it follows that (z, g(q)) ∈ Y, so f(z) = g(q) = g(f↾W(z)).

We now prove uniqueness. Suppose that f1, f2 : W → A are such that:

1. f1(z) = g(f1↾W(z)) for all z ∈ W.
2. f2(z) = g(f2↾W(z)) for all z ∈ W.

Let X = {z ∈ W : f1(z) = f2(z)}. We prove by induction that X = W. Let z ∈ W and suppose that y ∈ X for all y < z. We then have that f1↾W(z) = f2↾W(z), hence

f1(z) = g(f1↾W(z)) = g(f2↾W(z)) = f2(z)

hence z ∈ X. It follows by induction that X = W, so f1(z) = f2(z) for all z ∈ W.
Definition 12.1.8. Let (W1, <1) and (W2, <2) be well-orderings.

1. A function f : W1 → W2 is order-preserving if whenever x, y ∈ W1 and x <1 y, we have f(x) <2 f(y).

2. A function f : W1 → W2 is an isomorphism if it is bijective and order-preserving.

3. If W1 and W2 are isomorphic, we write W1 ≅ W2.

Proposition 12.1.9. Suppose that (W, <) is a well-ordering and f : W → W is order-preserving. We then have f(z) ≥ z for all z ∈ W.

Proof. We prove the result by induction on W. Suppose that z ∈ W and f(y) ≥ y for all y < z. Suppose instead that f(z) < z, and let x = f(z). Since f is order-preserving and x < z, it follows that f(x) < f(z) = x, contradicting the fact that f(y) ≥ y for all y < z. Therefore, f(z) ≥ z. The result follows by induction.
Corollary 12.1.10.

1. If (W, <) is a well-ordering and z ∈ W, then W ≇ W(z).

2. If (W, <) is a well-ordering, then its only automorphism is the identity.

3. If (W1, <1) and (W2, <2) are well-orderings and W1 ≅ W2, then the isomorphism from W1 to W2 is unique.

Proof.

1. Suppose that W ≅ W(z) for some z ∈ W, and let f : W → W(z) be a witnessing isomorphism. Then f : W → W is order-preserving and f(z) < z (because f(z) ∈ W(z)), contrary to Proposition 12.1.9.

2. Suppose that f : W → W is an automorphism of W. By Proposition 12.1.9, we have f(z) ≥ z for all z ∈ W. Suppose that z ∈ W and let y = f(z). Since f⁻¹ : W → W is also an automorphism of W, Proposition 12.1.9 implies that f⁻¹(y) ≥ y, hence z ≥ f(z). Combining this with the above-mentioned fact that f(z) ≥ z, it follows that z = f(z). Therefore, f is the identity.

3. Suppose that f : W1 → W2 and g : W1 → W2 are both isomorphisms. We then have that g⁻¹ : W2 → W1 is an isomorphism, hence g⁻¹ ∘ f : W1 → W1 is an automorphism. Hence, by part 2, we may conclude that g⁻¹ ∘ f is the identity on W1. It follows that f = g.
Theorem 12.1.11. Let (W1, <1) and (W2, <2) be well-orderings. Exactly one of the following holds.

1. W1 ≅ W2.

2. There exists z ∈ W2 such that W1 ≅ W2(z).

3. There exists z ∈ W1 such that W1(z) ≅ W2.

In each of the above cases, the isomorphism and the z (if appropriate) are unique.
Proof. We first prove that one of the three options holds. Fix a set a such that a ∉ W1 ∪ W2 (such an a exists by Proposition 10.1.4). Our goal is to define a function f : W1 → W2 ∪ {a} recursively. Define g : (W2 ∪ {a})^{<W1} → W2 ∪ {a} as follows. Let q ∈ (W2 ∪ {a})^{<W1} and fix z ∈ W1 such that q : W1(z) → W2 ∪ {a}. If a ∈ ran(q) or ran(q) = W2, let g(q) = a. Otherwise, ran(q) is a proper subset of W2, and we let g(q) be the <2-least element of W2\ran(q). By Theorem 12.1.7, there is a unique f : W1 → W2 ∪ {a} such that f(z) = g(f↾W1(z)) for all z ∈ W1.

Suppose first that a ∉ ran(f), so that f : W1 → W2. We begin by showing, by induction, that ran(f↾W1(z)) is an initial segment of W2 for all z ∈ W1. Suppose that z ∈ W1 and ran(f↾W1(y)) is an initial segment of W2 for all y < z. If z is the initial point of W1, then ran(f↾W1(z)) = ∅ is certainly an initial segment of W2. Suppose that z is a successor point of W1, and let y < z be such that there is no x ∈ W1 with y < x < z. By induction, we know that ran(f↾W1(y)) is an initial segment of W2. Since f(y) = g(f↾W1(y)) is the <2-least element of W2\ran(f↾W1(y)), it follows that ran(f↾W1(z)) = ran(f↾W1(y)) ∪ {f(y)} is an initial segment of W2. Suppose finally that z is a limit point of W1. It then follows that ran(f↾W1(z)) = ⋃_{y<z} ran(f↾W1(y)). Since every element of the union is an initial segment of W2, it follows that ran(f↾W1(z)) is an initial segment of W2 (note that it can't equal W2 because f(z) ≠ a).

Therefore, ran(f↾W1(z)) is an initial segment of W2 for all z ∈ W1 by induction. It follows that for all y, z ∈ W1 with y <1 z, we have f(y) <2 f(z) (because ran(f↾W1(z)) is an initial segment of W2 containing f(y) but not f(z)), so f is order-preserving. This implies that f is an injection, so if ran(f) = W2, we have W1 ≅ W2. Otherwise, ran(f) is an initial segment of W2, so by Proposition 12.1.5 there is a z ∈ W2 such that W1 ≅ W2(z).
Suppose now that a ∈ ran(f). Let z ∈ W1 be the <1-least element of W1 such that f(z) = a. It then follows that f↾W1(z) : W1(z) → W2 is order-preserving by induction as above. Also, we must have ran(f↾W1(z)) = W2 because f(z) = a. Therefore, f↾W1(z) : W1(z) → W2 is an isomorphism. This completes the proof that one of the above three cases must hold.

The uniqueness of the case, the isomorphism, and the z (if appropriate) all follow from Corollary 12.1.10.
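For finite well-orderings, the construction in this proof is easy to run by hand or by machine. The following Python sketch (our own toy model; lists sorted increasingly stand in for the well-orderings) extends f one point at a time, always sending the next point of W1 to the <2-least unused point of W2, exactly as g dictates:

def compare(W1, W2):
    f = {}
    for i, z in enumerate(W1):
        if i == len(W2):            # W2 ran out first: W1(z) is isomorphic to W2
            return f"W1({z}) isomorphic to W2", f
        f[z] = W2[i]                # the <2-least element of W2 not yet used
    if len(W1) == len(W2):
        return "W1 isomorphic to W2", f
    return f"W1 isomorphic to W2({W2[len(W1)]})", f

print(compare([0, 1, 2], ['a', 'b', 'c', 'd']))
# -> ('W1 isomorphic to W2(d)', {0: 'a', 1: 'b', 2: 'c'})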
With this result in hand, we now know that any well-ordering is uniquely determined by its length.
The next goal is to find a nice system of representatives for the isomorphism classes of well-orderings. For
that, we need to generalize the ideas that went into the construction of the natural numbers.
12.2 Ordinals
Our definition of the natural numbers had the advantage that the ordering was given by the membership relation ∈. This feature allowed us to define successors easily and to think of a natural number n as the set of all natural numbers less than n. We now seek to continue this progression to measure well-orderings longer than ω. The idea is to define successors as in the case of the natural numbers, but now to take unions to achieve limit points.

The key property of ω (and each n ∈ ω) that we want to use in our definition of ordinals is the fact that ∈ well-orders ω (and each n ∈ ω). We need one more condition to ensure that there are no holes or gaps in the set. For example, ∈ well-orders the set {0, 2, 3, 5}, but we don't want to consider it an ordinal because it skipped over 1 and 4. We therefore make the following definition.

Definition 12.2.1. A set z is transitive if whenever x and y are sets such that x ∈ y and y ∈ z, we have x ∈ z.

Definition 12.2.2. Let z be a set. We define a relation ∈_z on z by setting ∈_z = {(x, y) ∈ z × z : x ∈ y}.

Definition 12.2.3. An ordinal is a set α which is transitive and well-ordered by ∈_α.
Our hard work developing the natural numbers gives us one interesting example of an ordinal.

Proposition 12.2.4. ω is an ordinal.

Proof. Proposition 10.5.11 says that ω is transitive, and Theorem 10.5.18 says that ω is well-ordered by < = ∈_ω.

Proposition 12.2.5. If α is an ordinal and β ∈ α, then β is an ordinal.

Proof. We first show that β is transitive. Let x and y be sets with x ∈ y and y ∈ β. Since y ∈ β, β ∈ α, and α is transitive, it follows that y ∈ α. Since x ∈ y and y ∈ α, it follows that x ∈ α. Now since x, y, β ∈ α, x ∈ y, y ∈ β, and ∈_α is transitive on α, we may conclude that x ∈ β. Therefore, β is transitive.

Notice that β ⊆ α because β ∈ α and α is transitive. Therefore, ∈_β is the restriction of ∈_α to the subset β. Since ∈_α is a well-ordering on α, it follows that ∈_β is a well-ordering on β. Hence, β is an ordinal.

Corollary 12.2.6. Every n ∈ ω is an ordinal.

Lemma 12.2.7. If α is an ordinal, then α ∉ α.

Proof. Suppose that α is an ordinal and α ∈ α. Since α ∈ α, it follows that ∈_α is not asymmetric on α, contradicting the fact that ∈_α is a well-ordering on α.

Proposition 12.2.8. If α is an ordinal, then S(α) is an ordinal.
Proof. We first show that S(α) is transitive. Suppose that x ∈ y ∈ S(α). Since y ∈ S(α) = α ∪ {α}, either y ∈ α or y = α. Suppose first that y ∈ α. We then have x ∈ y ∈ α, so x ∈ α because α is transitive. Hence, x ∈ S(α). Suppose now that y = α. We then have x ∈ α because x ∈ y, so x ∈ S(α).

We next show that ∈_{S(α)} is transitive on S(α). Let x, y, z ∈ S(α) with x ∈ y ∈ z. Since z ∈ S(α), either z ∈ α or z = α. Suppose first that z ∈ α. We then have y ∈ α (since y ∈ z and α is transitive), and hence x ∈ α (since x ∈ y and α is transitive). Thus, x, y, z ∈ α, so we may conclude that x ∈ z using the fact that ∈_α is transitive on α. Suppose now that z = α. We then have x ∈ α = z because x ∈ y ∈ α and α is transitive.

We next show that ∈_{S(α)} is asymmetric on S(α). Let x ∈ S(α). If x ∈ α, then x ∉ x because ∈_α is asymmetric on α. If x = α, then x ∉ x by Lemma 12.2.7.

We now show that ∈_{S(α)} is connected on S(α). Let x, y ∈ S(α). If x ∈ α and y ∈ α, then either x ∈ y, x = y, or y ∈ x because ∈_α is connected on α. If x = α and y = α, we clearly have x = y. Otherwise, one of x, y equals α and the other is an element of α, in which case we're done.

Finally, suppose that X ⊆ S(α) and X ≠ ∅. If X ∩ α = ∅, then we must have X = {α}, in which case X clearly has an ∈_{S(α)}-least element. Suppose that X ∩ α ≠ ∅. Since X ∩ α is nonempty and ∈_α is a well-ordering on α, there exists an ∈_α-least element β of X ∩ α. For any γ ∈ X, either γ ∈ α, in which case we have either γ = β or β ∈ γ by choice of β, or γ = α, in which case β ∈ γ (because β ∈ α). Therefore, X has an ∈_{S(α)}-least element.
Proposition 12.2.9. Suppose that α and β are ordinals. We then have α ⊆ β if and only if either α = β or α ∈ β.

Proof. (⇐) If α = β, then clearly α ⊆ β, and if α ∈ β we can use the fact that β is transitive to conclude that α ⊆ β.

(⇒) Suppose that α ⊆ β and α ≠ β. Notice that β\α is a nonempty subset of β, so there exists an ∈_β-least element of β\α; call it z. We show that α = z, hence α ∈ β. We first show that z ⊆ α. Let x ∈ z. Since z ∈ β and β is transitive, we have x ∈ β. Since x ∈ z, we cannot have x ∈ β\α by choice of z, so x ∈ α. Thus, z ⊆ α. We next show that α ⊆ z. Let x ∈ α. Since α ⊆ β, we have x ∈ β. Using the fact that x, z ∈ β and ∈_β is connected on β, we know that either x ∈ z, x = z, or z ∈ x. We cannot have x = z, because x ∈ α and z ∈ β\α. Also, we cannot have z ∈ x, because if z ∈ x we could also conclude that z ∈ α (because z ∈ x ∈ α and α is transitive), contradicting the fact that z ∈ β\α. Thus, x ∈ z, so α ⊆ z. It follows that z = α (by Extensionality), so α ∈ β.
Proposition 12.2.10. Suppose that α and β are ordinals. Exactly one of α ∈ β, α = β, or β ∈ α holds.

Proof. We first show that at least one of α ∈ β, α = β, β ∈ α holds. We first claim that α ∩ β is an ordinal. If x ∈ y ∈ α ∩ β, then x ∈ y ∈ α and x ∈ y ∈ β, so x ∈ α and x ∈ β (because α and β are transitive), and hence x ∈ α ∩ β. Thus, α ∩ β is transitive. Notice that ∈_{α∩β} is the restriction of ∈_α to the subset α ∩ β. Since ∈_α is a well-ordering on α, it follows that ∈_{α∩β} is a well-ordering on α ∩ β. Hence, α ∩ β is an ordinal.

Now we have α ∩ β ⊆ α and α ∩ β ⊆ β. If α ∩ β ≠ α and α ∩ β ≠ β, then α ∩ β ∈ α and α ∩ β ∈ β by Proposition 12.2.9, hence α ∩ β ∈ α ∩ β, contrary to Lemma 12.2.7. Therefore, either α ∩ β = α or α ∩ β = β. If α ∩ β = α, we then have α ⊆ β, hence either α = β or α ∈ β by Proposition 12.2.9. Similarly, if α ∩ β = β, we then have β ⊆ α, hence either β = α or β ∈ α by Proposition 12.2.9. Thus, in any case, at least one of α ∈ β, α = β, or β ∈ α holds.

We finish by showing that exactly one of α ∈ β, α = β, or β ∈ α holds. If α ∈ β and α = β, then α ∈ α, contrary to Lemma 12.2.7. Similarly, if α = β and β ∈ α, then β ∈ β, contrary to Lemma 12.2.7. Finally, if α ∈ β and β ∈ α, then α ∈ α (because α is transitive), contrary to Lemma 12.2.7.
Definition 12.2.11. If α and β are ordinals, we write α < β to mean that α ∈ β.

Proposition 12.2.12. Suppose that α and β are ordinals. If α ≅ β as well-orderings, then α = β.
Proof. If α ≠ β, then either α < β or β < α by Proposition 12.2.10. Suppose without loss of generality that α < β. We then have that the well-ordering α is an initial segment of the well-ordering β (in the notation for well-orderings, we have α = β(α)), hence α ≇ β by Corollary 12.1.10.
By the above results, it seems that we are in a position to say that < is a linear ordering on the collection
of all ordinals. However, there is a small problem here. We do not know that the class of all ordinals is a
set. In fact, we will see below that the collection of all ordinals is a proper class.
Definition 12.2.13. ORD is the class of all ordinals.
We first establish that nonempty sets of ordinals have least elements.
Proposition 12.2.14. If A is a nonempty subset of ORD, then A has a least element. Furthermore, the least element is given by ⋂A.

Proof. Since A ≠ ∅, we may fix an ordinal α ∈ A. If A ∩ α = ∅, then for any γ ∈ A, we cannot have γ ∈ α, hence either γ = α or α ∈ γ by Proposition 12.2.10, so α is the least element of A. Suppose then that A ∩ α ≠ ∅. Since A ∩ α is nonempty, it has an ∈_α-least element; call it β. Let γ ∈ A and notice that γ is an ordinal. By Proposition 12.2.10, either γ ∈ α, γ = α, or α ∈ γ. If γ ∈ α, then γ ∈ A ∩ α, so either γ = β or β ∈ γ by choice of β. If γ = α, then β ∈ γ because β ∈ α. If α ∈ γ, we then have β ∈ α ∈ γ, so β ∈ γ because γ is transitive. It follows that β is the least element of A.

Therefore, we know that A has a least element; call it β. Since β ∈ A, we certainly have ⋂A ⊆ β. For all γ ∈ A, we then have either γ = β or β ∈ γ, hence β ⊆ γ by Proposition 12.2.9. Therefore, β ⊆ ⋂A. It follows that β = ⋂A.
Proposition 12.2.15. If A is a subset of ORD, then ⋃A is an ordinal. Furthermore, we have ⋃A = sup A, i.e. α ≤ ⋃A for all α ∈ A, and ⋃A ≤ β whenever β is an ordinal with α ≤ β for all α ∈ A.

Proof. We first show that ⋃A is transitive. Suppose that x ∈ y ∈ ⋃A. Since y ∈ ⋃A, there exists α ∈ A, necessarily an ordinal, such that y ∈ α. Since α is transitive and x ∈ y ∈ α, we can conclude that x ∈ α. It follows that x ∈ ⋃A. Hence, ⋃A is transitive.

We next show that ∈_{⋃A} is transitive on ⋃A. Let x, y, z ∈ ⋃A with x ∈ y ∈ z. Since z ∈ ⋃A, there exists α ∈ A, necessarily an ordinal, such that z ∈ α. Since z ∈ α and α is an ordinal, we may use Proposition 12.2.5 to conclude that z is an ordinal. Thus, z is transitive, so we may use the fact that x ∈ y ∈ z to conclude that x ∈ z.

We next show that ∈_{⋃A} is asymmetric on ⋃A. Let x ∈ ⋃A and fix α ∈ A, necessarily an ordinal, such that x ∈ α. Using Proposition 12.2.5 again, it follows that x is an ordinal, hence x ∉ x by Lemma 12.2.7.

We now show that ∈_{⋃A} is connected on ⋃A. Let x, y ∈ ⋃A. Fix α, β ∈ A, necessarily ordinals, such that x ∈ α and y ∈ β. Again using Proposition 12.2.5, we may conclude that x and y are ordinals, hence either x ∈ y, x = y, or y ∈ x by Proposition 12.2.10.

Finally, suppose that X ⊆ ⋃A and X ≠ ∅. Notice that for any y ∈ X, there exists α ∈ A, necessarily an ordinal, such that y ∈ α, and hence y is an ordinal by Proposition 12.2.5. Therefore, X is a nonempty set of ordinals, so by Proposition 12.2.14 we may conclude that X has a least element (with respect to ∈_{⋃A}).

We now show that ⋃A = sup A. Suppose that α ∈ A. For any γ ∈ α, we have γ ∈ ⋃A, hence α ⊆ ⋃A, and so α ≤ ⋃A by Proposition 12.2.9. Thus, ⋃A is an upper bound for A. Suppose now that β is an upper bound for A, i.e. β is an ordinal and α ≤ β for all α ∈ A. For any x ∈ ⋃A, we may fix α ∈ A such that x ∈ α, and notice that α ⊆ β, so x ∈ β. It follows that ⋃A ⊆ β, hence ⋃A ≤ β by Proposition 12.2.9. Therefore, ⋃A = sup A.
Proposition 12.2.16. ORD is a proper class.
Proof. Suppose that ORD is a set, so that there is a set O such that α is an ordinal if and only if α ∈ O. In this case, O is a transitive set (by Proposition 12.2.5) which is well-ordered by ∈_O (transitivity follows from the fact that ordinals are transitive sets, asymmetry follows from Lemma 12.2.7, connectedness follows from Proposition 12.2.10, and the fact that every nonempty subset has a least element is given by Proposition 12.2.14). Therefore, O is an ordinal, and so it follows that O ∈ O, contrary to Lemma 12.2.7. Hence, ORD is not a set.
Since ORD is a proper class, there are subclasses of ORD which are not sets. We therefore extend Proposition 12.2.14 to the case of nonempty subclasses of ORD. The idea is that if we fix an α ∈ C, then C ∩ α becomes a set of ordinals, so we can apply the above result.

Proposition 12.2.17. If C is a nonempty subclass of ORD, then C has a least element.

Proof. Since C ≠ ∅, we may fix an ordinal α ∈ C. If C ∩ α = ∅, then for any γ ∈ C, we cannot have γ ∈ α, hence either γ = α or α ∈ γ by Proposition 12.2.10, so α is the least element of C. Suppose then that C ∩ α ≠ ∅. In this case, C ∩ α is a nonempty set of ordinals by Separation, hence C ∩ α has a least element β by Proposition 12.2.14. It now follows easily that β is the least element of C.
Proposition 12.2.18 (Induction on ORD). Suppose that C ⊆ ORD and that for all ordinals α, if β ∈ C for all β < α, then α ∈ C. We then have C = ORD.

Proof. Suppose that C ⊊ ORD. Let B = ORD\C and notice that B is a nonempty class of ordinals. By Proposition 12.2.17, it follows that B has a least element; call it α. For all β < α, we then have β ∉ B, hence β ∈ C. By assumption, this implies that α ∈ C, a contradiction. It follows that C = ORD.

This gives a way to do strong induction on the ordinals, but there is a slightly more basic version. We can't avoid looking at many previous values at limit ordinals, but we can get by with looking at just the previous ordinal in the case of successors.
Proposition 12.2.19 (Step/Limit Induction on ORD). Suppose that C ⊆ ORD and that

1. 0 ∈ C.

2. Whenever α ∈ C, we have S(α) ∈ C.

3. Whenever λ is a limit ordinal and α ∈ C for all α < λ, we have λ ∈ C.

We then have C = ORD.

Proof. Suppose that C ⊊ ORD. Let B = ORD\C and notice that B is a nonempty class of ordinals. By Proposition 12.2.17, it follows that B has a least element; call it γ. We can't have γ = 0 because 0 ∈ C. Also, it is not possible that γ is a successor, say γ = S(β), because if so, then β ∉ B (because β < γ), so β ∈ C, hence γ = S(β) ∈ C, a contradiction. Finally, suppose that γ is a limit. Then for all α < γ, we have α ∉ B, hence α ∈ C. By assumption, this implies that γ ∈ C, a contradiction. It follows that C = ORD.
Theorem 12.2.20 (Recursive Definitions on ORD). Let G : V → V be a class function. There exists a unique class function F : ORD → V such that F(α) = G(F↾α) for all α ∈ ORD.

Theorem 12.2.21 (Recursive Definitions with Parameters on ORD). Let P be a class and let G : P × V → V be a class function. There exists a unique class function F : P × ORD → V such that F(p, α) = G(p, F_p↾α) for all p ∈ P and all α ∈ ORD.

Theorem 12.2.22. Let (W, <) be a well-ordering. There exists a unique ordinal α such that W ≅ α.
12.3 Arithmetic on Ordinals

Definition 12.3.1. We define ordinal addition (that is, a class function + : ORD × ORD → ORD) recursively as follows.

1. α + 0 = α.
2. α + S(β) = S(α + β).
3. α + λ = ⋃{α + β : β < λ} if λ is a limit ordinal.

Similarly, we define ordinal multiplication recursively as follows.

1. α · 0 = 0.
2. α · S(β) = α · β + α.
3. α · λ = ⋃{α · β : β < λ} if λ is a limit ordinal.

Finally, we define ordinal exponentiation recursively as follows.

1. α^0 = 1.
2. α^{S(β)} = α^β · α.
3. α^λ = ⋃{α^β : β < λ} if λ is a limit ordinal.
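For example, these definitions give

1 + ω = ⋃{1 + n : n < ω} = ⋃{S(n) : n < ω} = ω, while ω + 1 = S(ω) ≠ ω,

and similarly 2 · ω = ⋃{2 · n : n < ω} = ω, while ω · 2 = ω · 1 + ω = ω + ω ≠ ω. In particular, ordinal addition and multiplication are not commutative.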
Proposition 12.3.2. Let α, β, and γ be ordinals. If β ≤ γ, then α + β ≤ α + γ.

Proof. Fix ordinals α and β. We prove by induction on γ that if β ≤ γ, then α + β ≤ α + γ. If γ = β, this is trivial. Suppose that γ ≥ β and we know the result for γ. We then have

α + β ≤ α + γ < S(α + γ) = α + S(γ)

Suppose now that γ > β is a limit ordinal. We then have

α + β ⊆ ⋃{α + δ : δ < γ} = α + γ    (since β < γ)

so α + β ≤ α + γ.
Proposition 12.3.3. Let α, β, and γ be ordinals. We have β < γ if and only if α + β < α + γ.

Proof. Notice first that

α + β < S(α + β) = α + S(β)

Now for any ordinal γ > β, we have S(β) ≤ γ, hence

α + β < α + S(β) ≤ α + γ

using Proposition 12.3.2. Conversely, if β ≥ γ, then α + β ≥ α + γ by Proposition 12.3.2, so α + β < α + γ implies β < γ.
Proposition 12.3.4. Let α and λ be ordinals. If λ is a limit ordinal, then α + λ is a limit ordinal.

Proof. Since λ is a limit ordinal, we have

α + λ = ⋃{α + β : β < λ}

Suppose now that γ < α + λ, and fix an ordinal β < λ such that γ < α + β. We then have that S(β) < λ because λ is a limit ordinal, hence

S(γ) ≤ α + β < α + S(β) ≤ α + λ

It follows that α + λ is a limit ordinal.
Proposition 12.3.5. Let α, β, and γ be ordinals. We have (α + β) + γ = α + (β + γ).

Proof. Fix ordinals α and β. We prove that (α + β) + γ = α + (β + γ) for all ordinals γ by induction. Suppose first that γ = 0. We then have

(α + β) + 0 = α + β
           = α + (β + 0)

Suppose now that γ is an ordinal and we know that (α + β) + γ = α + (β + γ). We then have

(α + β) + S(γ) = S((α + β) + γ)
             = S(α + (β + γ))    (by induction)
             = α + S(β + γ)
             = α + (β + S(γ))

Suppose now that λ is a limit ordinal and we know that (α + β) + γ = α + (β + γ) for all γ < λ. We then have

(α + β) + λ = ⋃{(α + β) + γ : γ < λ}
           = ⋃{α + (β + γ) : γ < λ}
           = ⋃{α + δ : δ < β + λ}
           = α + (β + λ)

where the last line follows because β + λ is a limit ordinal.
12.4 Cardinals
Proposition 12.4.11. Let A be a set. There exists an ordinal α such that A ≈ α if and only if A can be well-ordered.

Proof. Suppose first that there exists an ordinal α such that A ≈ α. We use a bijection between A and α to transfer the ordering on the ordinals to an ordering on A. Let f : A → α be a bijection. Define a relation < on A by letting a < b if and only if f(a) < f(b). It is then straightforward to check that (A, <) is a well-ordering (using the fact that (α, ∈_α) is a well-ordering).

For the converse direction, suppose that A can be well-ordered. Fix a relation < on A so that (A, <) is a well-ordering. By Theorem 12.2.22, there is an ordinal α such that A ≅ α. In particular, we have A ≈ α.

Of course, this leaves open the question of which sets can be well-ordered. Below, we will use the Axiom of Choice to show that every set can be well-ordered.

Definition 12.4.12. Let A be a set which can be well-ordered. We define |A| to be the least ordinal α such that A ≈ α.

Lemma 12.4.13. If A can be well-ordered, then |A| is a cardinal.
12.5
Chapter 13
Proposition 13.1.4. If f : R → R and y ∈ R, then f is continuous at y if and only if for every sequence {x_n}_{n∈ω} with lim_{n→∞} x_n = y, we have lim_{n→∞} f(x_n) = f(y).

Proof. The left-to-right direction is unproblematic. For the right-to-left direction, the argument is as follows. Suppose that f is not continuous at y, and fix ε > 0 such that there is no δ > 0 such that whenever |x − y| < δ, we have |f(x) − f(y)| < ε. We define a sequence as follows. Given n ∈ ω, let x_n be an arbitrary real number with |x_n − y| < 1/(n+1) such that |f(x_n) − f(y)| ≥ ε. Again, we're making infinitely many arbitrary choices in the construction.

To make this precise using a choice function, argue as follows. Suppose that f is not continuous at y, and fix ε > 0 such that there is no δ > 0 such that whenever |x − y| < δ, we have |f(x) − f(y)| < ε. Define a function H : R⁺ → P(R) by letting H(δ) = {x ∈ R : |x − y| < δ and |f(x) − f(y)| ≥ ε}. Notice that H(δ) ≠ ∅ for every δ ∈ R⁺ by assumption. Let h : P(R)\{∅} → R be a choice function. For each n ∈ ω, let x_n = h(H(1/(n+1))). One then easily checks that lim_{n→∞} x_n = y but it is not the case that lim_{n→∞} f(x_n) = f(y).
Another example is the proof that the countable union of countable sets is countable. Let {A_n}_{n∈ω} be countable sets. The first step is to fix injections f_n : A_n → ω for each n ∈ ω and then build an injection f : ⋃_{n∈ω} A_n → ω from these. However, we are again making infinitely many arbitrary choices when we fix the injections. We'll prove a generalization of this fact using the Axiom of Choice below.

Example. Let F = P(ω)\{∅}. Notice that ⋃F = ω. We can prove the existence of a choice function for F without the Axiom of Choice as follows. Define g : F → ω by letting g(A) be the <-least element of A for every A ∈ P(ω)\{∅}. More formally, we define g = {(A, a) ∈ F × ω : a ∈ A and a ≤ b for all b ∈ A} and prove that g is a choice function on F.
Proposition 13.1.5. Without the Axiom of Choice, one can prove that if F is a family of nonempty sets
and F is finite, then F has a choice function.
13.2
The following are equivalent:

1. The Axiom of Choice.

2. Zorn's Lemma.
Proof. 1 implies 2: Let (P, <) be a nonempty partially ordered set with the property that each chain in P has an upper bound in P. Let g : P(P)\{∅} → P be a choice function. Fix x ∉ P. We define a class function F : ORD → P ∪ {x} recursively as follows. If x ∈ ran(F↾α), let F(α) = x. Also, if ran(F↾α) ⊆ P and there is no q ∈ P such that q > p for every p ∈ ran(F↾α), let F(α) = x. Otherwise, ran(F↾α) ⊆ P and {q ∈ P : q > p for every p ∈ ran(F↾α)} ≠ ∅, and we let F(α) = g({q ∈ P : q > p for every p ∈ ran(F↾α)}). We know that F cannot be injective, so as above we must have x ∈ ran(F). Fix the least ordinal γ such that F(γ) = x. A straightforward induction shows that F↾γ is injective and that ran(F↾γ) is a chain in P.

Notice that γ ≠ 0 because P ≠ ∅. Suppose that γ is a limit ordinal. Since ran(F↾γ) is a chain in P, we know by assumption that there exists q ∈ P with q ≥ p for all p ∈ ran(F↾γ). Notice that we cannot have q = F(α) for any α < γ, because we would then have α + 1 < γ (because γ is a limit ordinal) and q < F(α + 1) by definition of F, contrary to the fact that q ≥ p for all p ∈ ran(F↾γ). It follows that q > p for all p ∈ ran(F↾γ), hence F(γ) ≠ x, a contradiction. It follows that γ is a successor ordinal, say γ = S(β). Since F(β) ≠ x and F(S(β)) = x, it follows that F(β) is a maximal element of P.

2 implies 1: Let F be a family of nonempty sets. We use Zorn's Lemma to show that F has a choice function. Let P = {q : q is a function, dom(q) ⊆ F, and q(A) ∈ A for every A ∈ dom(q)}. Given p, q ∈ P, we let p < q if and only if p ⊊ q. It is easy to check that (P, <) is a partial ordering. Notice that P ≠ ∅ because ∅ ∈ P. Also, if H is a chain in P, then ⋃H ∈ P, and p ⊆ ⋃H for all p ∈ H. It follows that every chain in P has an upper bound in P. By Zorn's Lemma, P has a maximal element, which we call g. We need only show that dom(g) = F. Suppose instead that dom(g) ⊊ F, and fix A ∈ F\dom(g). Fix a ∈ A. We then have g ∪ {(A, a)} ∈ P and g < g ∪ {(A, a)}, a contradiction. It follows that dom(g) = F, so g is a choice function on F.
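The "1 implies 2" argument is easy to visualize on a finite poset, where no transfinite stages are needed. The following Python sketch (a toy model of our own, using divisibility on {1, ..., 12}; Zorn's Lemma itself concerns arbitrary, typically infinite, partial orders) repeatedly picks an element strictly above everything chosen so far, and the last element chosen is maximal:

P = set(range(1, 13))
def less(p, q):
    return p != q and q % p == 0    # p < q iff p properly divides q

chain = []
while True:
    above = [q for q in P if all(less(p, q) for p in chain)]
    if not above:                   # no strict upper bound exists: stop
        break
    chain.append(min(above))        # min plays the role of the choice function g
print(chain)  # [1, 2, 4, 8]; the last entry 8 is a maximal element of (P, <)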
13.3
Once we adopt the Axiom of Choice, it follows that every set can be well-ordered. Therefore, |A| is defined
for every set A.
Proposition 13.3.1. Let A and B be sets.

1. A ⪯ B if and only if |A| ≤ |B|.

2. A ≈ B if and only if |A| = |B|.

Proof.

1. Suppose first that |A| ≤ |B|. Let α = |A| and let β = |B|, and fix bijections f : A → α and g : β → B. Since α ≤ β, we have α ⊆ β, and so we may consider g ∘ f : A → B. One easily checks that this is an injective function.

Suppose now that A ⪯ B, and fix an injection h : A → B. Let α = |A| and let β = |B|, and fix bijections f : α → A and g : B → β. We then have that g ∘ h ∘ f : α → β is an injection, hence α ≤ β.

2. Suppose first that A ≈ B. We then have A ⪯ B and B ⪯ A, so |A| ≤ |B| and |B| ≤ |A| by part 1, hence |A| = |B|. Suppose now that |A| = |B|. By part 1, we then have that A ⪯ B and B ⪯ A, hence A ≈ B by the Cantor-Bernstein Theorem.
Proposition 13.3.2. If A is an infinite set, then |A × A| = |A|.

Proof. Since A is infinite, there exists an infinite cardinal κ such that |A| = κ. We then have A × A ≈ κ × κ ≈ κ, hence |A × A| ≤ κ. We clearly have κ ≤ |A × A|, hence |A × A| = κ = |A|.
Proposition 13.3.3. Let κ be an infinite cardinal and let F be a family of sets. Suppose that |F| ≤ κ and that |A| ≤ κ for every A ∈ F. We then have |⋃F| ≤ κ.

Proof. Let λ = |F| (notice that λ ≤ κ), and fix a bijection f : λ → F. Also, for each A ∈ F, fix an injection g_A : A → κ (using the Axiom of Choice). We define an injection h : ⋃F → κ × κ as follows. Given b ∈ ⋃F, let α be the least ordinal such that b ∈ f(α), and set h(b) = (α, g_{f(α)}(b)). Suppose that b1, b2 ∈ ⋃F and h(b1) = h(b2). Let α1 be the least ordinal such that b1 ∈ f(α1) and let α2 be the least ordinal such that b2 ∈ f(α2). Since h(b1) = h(b2), it follows that α1 = α2; call their common value α. Therefore, using the fact that h(b1) = h(b2) again, we conclude that g_{f(α)}(b1) = g_{f(α)}(b2). Since g_{f(α)} is an injection, it follows that b1 = b2. Hence, h : ⋃F → κ × κ is an injection, so we may conclude that |⋃F| ≤ |κ × κ| = κ.
Proposition 13.3.4. |A^{<ω}| = |A| for every infinite set A.

Proof. Using Proposition 13.3.2 and induction (on n), it follows that |A^n| = |A| for every n ∈ ω with n ≥ 1. Since A^{<ω} = ⋃{A^n : n ∈ ω}, we may use Proposition 13.3.3 to conclude that |A^{<ω}| ≤ ℵ₀ · |A| = |A|. We clearly have |A| ≤ |A^{<ω}|, hence |A^{<ω}| = |A|.
Definition 13.3.5. Let A and B be sets. We let A^B be the set of all functions from B to A.

Proposition 13.3.6. Let A1, A2, B1, B2 be sets with A1 ≈ A2 and B1 ≈ B2. We then have A1^{B1} ≈ A2^{B2}.

Now that we've adopted the Axiom of Choice, we know that A^B can be well-ordered for any sets A and B, so it makes sense to talk about |A^B|. This gives us a way to define cardinal exponentiation.

Definition 13.3.7. Let κ and λ be cardinals. We use κ^λ to also denote the cardinality of the set κ^λ. (So we're using the same notation to denote both the set of functions from λ to κ and also its cardinality.)
Proposition 13.3.8. Let κ, λ, and μ be cardinals.

1. κ^{λ+μ} = κ^λ · κ^μ.

2. κ^{λ·μ} = (κ^λ)^μ.

3. (κ · λ)^μ = κ^μ · λ^μ.

Proof. Fix sets A, B, C such that |A| = κ, |B| = λ, and |C| = μ (we could use κ, λ, and μ themselves, but it's easier to distinguish sets from cardinals).

1. It suffices to find a bijection F : A^{(B×{0})∪(C×{1})} → A^B × A^C. We define F as follows. Given f : (B × {0}) ∪ (C × {1}) → A, let F(f) = (g, h), where g : B → A is given by g(b) = f((b, 0)) and h : C → A is given by h(c) = f((c, 1)).

2. It suffices to find a bijection F : (A^B)^C → A^{B×C}. We define F as follows. Given f : C → A^B, let F(f) : B × C → A be the function defined by F(f)((b, c)) = f(c)(b) for all b ∈ B and c ∈ C.

3. It suffices to find a bijection F : A^C × B^C → (A × B)^C. We define F as follows. Given g : C → A and h : C → B, let F((g, h)) : C → A × B be the function defined by F((g, h))(c) = (g(c), h(c)) for all c ∈ C.
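For example, granting the usual facts about cardinal addition and multiplication (ℵ₀ + ℵ₀ = ℵ₀ = ℵ₀ · ℵ₀, in the spirit of Proposition 13.3.2), part 1 yields 2^{ℵ₀} · 2^{ℵ₀} = 2^{ℵ₀+ℵ₀} = 2^{ℵ₀}, and part 2 yields (2^{ℵ₀})^{ℵ₀} = 2^{ℵ₀·ℵ₀} = 2^{ℵ₀}.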
Chapter 14

Subsets of R

14.1.1 The Reals

The map sending each q ∈ {0, 1}^ω to

Σ_{n=0}^∞ q(n)/10^n

is injective, so 2^{ℵ₀} ≤ |R|.
Proposition 14.1.2. If a, b ∈ R and a < b, then |(a, b)| = 2^{ℵ₀}.

Proof. The above injection shows that |(0, 1)| = 2^{ℵ₀}, and for all a, b ∈ R with a < b, we have (0, 1) ≈ (a, b).

Proposition 14.1.3. If O is a nonempty open subset of R, then |O| = 2^{ℵ₀}.

Proof. Every nonempty open subset of R contains an open interval.
14.1.2 Perfect Sets

Definition 14.1.4. Let P ⊆ R. We say that P is perfect if it is closed and has no isolated points.

Example. [a, b] is perfect for all a, b ∈ R with a < b.

Proposition 14.1.5. The Cantor set C defined by

C = { Σ_{n=1}^∞ q(n − 1)/3^n : q ∈ {0, 2}^ω }

is perfect.
14.1.3 Closed Sets
Proof. Recall that a set is closed if and only if its complement is open, so we show that R\C′ is open. Fix x ∈ R\C′. If x ∉ C, then since C is closed, we may fix ε > 0 such that (x − ε, x + ε) ∩ C = ∅, and hence (x − ε, x + ε) ⊆ R\C′ (because C′ ⊆ C). Suppose then that x ∈ C. Since x ∉ C′, we know that x is an isolated point of C. Fix ε > 0 such that C ∩ (x − ε, x + ε) = {x}. We then have that (x − ε, x + ε) ⊆ R\C′. Therefore, R\C′ is open, hence C′ is closed.
Proposition 14.1.11. If C ⊆ R is a closed set, then C\C′ = {x ∈ R : x is an isolated point of C} is countable.

Proof. Define a function f : C\C′ → Q × Q by letting f(x) = (q, r), where (q, r) is least (under some fixed well-ordering of Q × Q) such that C ∩ (q, r) = {x}. We then have that f is injective, hence C\C′ is countable because Q × Q is countable.
Definition 14.1.12. Let C ⊆ R be a closed set. We define a sequence of sets C^{(α)} for α < ω₁ recursively as follows.

1. C^{(0)} = C.
2. C^{(α+1)} = (C^{(α)})′.
3. C^{(λ)} = ⋂{C^{(α)} : α < λ} if λ is a limit.

Notice that each C^{(α)} is closed, and that C^{(β)} ⊆ C^{(α)} whenever α < β < ω₁, by a trivial induction.

Proposition 14.1.13. Let C ⊆ R be a closed set. There exists an α < ω₁ such that C^{(α+1)} = C^{(α)}.

Proof. Suppose that C^{(α+1)} ≠ C^{(α)} for all α < ω₁. Define a function f : ω₁ → Q × Q by letting f(α) = (q, r), where (q, r) is least (under some fixed well-ordering of Q × Q) such that there is a unique element of C^{(α)} ∩ (q, r). We then have that f is injective, contrary to the fact that |Q × Q| = ℵ₀.
Theorem 14.1.14. Let C ⊆ R be a closed set. There exists a perfect set P ⊆ R and a countable set A ⊆ R such that C = A ∪ P and A ∩ P = ∅.

Proof. Let α < ω₁ be least such that C^{(α+1)} = C^{(α)}. Let P = C^{(α)} and let A = ⋃_{β<α}(C^{(β)}\C^{(β+1)}). Notice that C = A ∪ P and A ∩ P = ∅, that P is perfect because P = P′, and that A is countable because it is the countable union of countable sets.

Corollary 14.1.15. If C ⊆ R is an uncountable closed set, then |C| = 2^{ℵ₀}.

Proof. Let C ⊆ R be an uncountable closed set. We have |C| ≤ 2^{ℵ₀} because C ⊆ R. Let P be perfect and A countable such that C = A ∪ P and A ∩ P = ∅. Since C is uncountable, we have P ≠ ∅, hence |P| = 2^{ℵ₀}, and so |C| ≥ 2^{ℵ₀}.
14.1.4 Borel Sets

Definition 14.1.16. Let O be the set of open subsets of R. We define the set B of Borel sets to be the smallest subset of P(R) such that

1. O ⊆ B.
2. If A ∈ B, then R\A ∈ B.
3. If A_n ∈ B for all n ∈ ω, then ⋃_{n∈ω} A_n ∈ B.
3. If λ is a limit, then B_λ = {⋃_{n∈ω} A_n : each A_n ∈ B_{α_n} for some α_n < λ} ∪ {R\A : A ∈ B_α for some α < λ}.

Proposition 14.1.18. B = ⋃_{α<ω₁} B_α.

Corollary 14.1.19. |B| = 2^{ℵ₀}.

Corollary 14.1.20. B ≠ P(R).

Proof. We have |P(R)| = 2^{|R|} = 2^{2^{ℵ₀}} > 2^{ℵ₀} = |B|.

14.1.5 Measurable Sets
14.2

14.2.1

Proposition 14.2.1. Let L be a language and suppose that Σ ⊆ Form_L is satisfiable. There exists a model (M, s) of Σ such that |M| ≤ |L| + ℵ₀.

Proof. We already proved this last quarter when L was countable (and in particular when L was finite). Suppose that L is infinite and let κ = |L|.

Recall the proof of the Completeness Theorem. Notice that if Σ is consistent, then the language L₀ formed in the first step of adding witnesses satisfies |L₀| = κ, because |Form_L × Var| = κ. Thus, each L_n achieved by iteratively adding witnesses satisfies |L_n| = κ, so the final language L′ = ⋃_{n∈ω} L_n satisfies |L′| = κ. It follows that |Term_{L′}| = κ, and since the L′-structure M′ we constructed in the proof of the Completeness Theorem is formed by taking the quotient of Term_{L′} by an equivalence relation, we can conclude that |M′| ≤ κ. Therefore, the L-structure M′↾L from the proof of the Completeness Theorem has cardinality at most κ.
Theorem 14.2.2 (Löwenheim-Skolem Theorem). Let L be a language and suppose that Σ ⊆ Form_L has an infinite model. Let κ ≥ |L| + ℵ₀. There exists a model (M, s) of Σ such that |M| = κ.

Proof. Notice that κ ≥ |L|. Let L′ be L together with new constant symbols c_α for all α < κ. Notice that |L′| = |L| + κ = κ. Let

Σ′ = Σ ∪ {c_α ≠ c_β : α, β < κ and α ≠ β}

Notice that every finite subset of Σ′ has a model, by using an infinite model of Σ and interpreting the constants which appear in the finite subset as distinct elements. Therefore, by Compactness, we know that Σ′ has a model. By Proposition 14.2.1, there exists a model (M′, s) of Σ′ such that |M′| ≤ |L′| + ℵ₀ = κ. Notice that we must also have |M′| ≥ κ, hence |M′| = κ. Letting M be the restriction of the structure M′ to the language L, we see that (M, s) is a model of Σ and that |M| = κ.
14.2.2 Counting Models
Definition 14.2.3. Given a theory T in a language L and a cardinal κ, let I(T, κ) be the number of models of T of cardinality κ up to isomorphism.

Proposition 14.2.4. Let T be a theory in a language L with |L| = κ. For any infinite cardinal λ, we have I(T, λ) ≤ 2^{κ·λ}. In particular, if λ ≥ κ, then I(T, λ) ≤ 2^λ.

Proof. Let λ be an infinite cardinal, and let C, R, and F be the sets of constant, relation, and function symbols of L. We have

I(T, λ) ≤ λ^{|C|} · |P(λ^{<ω})|^{|R|} · |P(λ^{<ω})|^{|F|}
       = λ^{|C|} · |P(λ)|^{|R|} · |P(λ)|^{|F|}
       ≤ (2^λ)^{|C|} · (2^λ)^{|R|} · (2^λ)^{|F|}
       ≤ (2^λ)^κ · (2^λ)^κ · (2^λ)^κ
       = 2^{λ·κ}
Proof. Let T be a theory such that all models of T are infinite, and suppose that T is κ-categorical for some κ ≥ |L| + ℵ₀. Suppose that T is not complete, and fix σ ∈ Sent_L such that σ ∉ T and ¬σ ∉ T. We then have that T ∪ {σ} and T ∪ {¬σ} are both satisfiable with infinite models (because all models of T are infinite), so by the Löwenheim-Skolem Theorem we may fix a model M1 of T ∪ {σ} and a model M2 of T ∪ {¬σ} such that |M1| = κ = |M2|. We then have that M1 and M2 are models of T which are not isomorphic, hence I(T, κ) ≥ 2, contradicting the fact that T is κ-categorical.

Corollary 14.2.10. DLO and each ACF_p are complete.

Theorem 14.2.11 (Morley's Theorem). Let L be a countable language and let T be a theory. If T is κ-categorical for some κ ≥ ℵ₁, then T is κ-categorical for all κ ≥ ℵ₁.
14.3
Let L be a language. Let I be a set, and suppose that for each i ∈ I we have an L-structure M_i. For initial clarity, think of I = ω, so that we have L-structures M_0, M_1, M_2, .... We want a way to put together all of the M_i which somehow blends the properties of the M_i together into one structure. An initial thought is to form a product of the structures M_i with underlying set M = ∏_{i∈I} M_i. That is, M consists of all functions g : I → ⋃_{i∈I} M_i such that g(i) ∈ M_i for all i ∈ I. Interpreting the constants and functions would then be straightforward. For example, suppose that L = {e, f} where e is a constant symbol and f is a binary function symbol. Suppose that I = ω and each M_i is a group. Elements of M would then be sequences ⟨a_i⟩_{i∈ω}, we would interpret e as the sequence of the identities of the groups, and we would interpret f as the componentwise group operation (i.e. f^M(⟨a_i⟩_{i∈ω}, ⟨b_i⟩_{i∈ω}) = ⟨f^{M_i}(a_i, b_i)⟩_{i∈ω}). In general, we would let c^M be the function i ↦ c^{M_i} for each constant symbol c, and given f ∈ F_k we would let f^M(g_1, g_2, ..., g_k) be the function i ↦ f^{M_i}(g_1(i), g_2(i), ..., g_k(i)).

This certainly works, but it doesn't really blend the properties of the structures together particularly well. For example, if each M_i is a group and all but one is abelian, the product is still nonabelian. Also, if we have relation symbols, it's not clear what the right way to interpret the relation on M is. For example, if L = {R} where R is a binary relation symbol and I = ω, do we say that the pair (⟨a_i⟩_{i∈ω}, ⟨b_i⟩_{i∈ω}) is an element of R^M if some (a_i, b_i) ∈ R^{M_i}, or if all (a_i, b_i) ∈ R^{M_i}, or something else? Which is the right definition? In other words, if each M_i is a graph, do we put an edge between the sequences if some edge exists between the components, or only if every pair has an edge?

We thus want a more democratic approach to forming M which also gives a way to nicely interpret the relation symbols. If I were finite, perhaps we could let the majority rule (putting a pair into the relation if most of the component pairs were in the corresponding relations), but what if I is infinite?
14.3.1 Ultrafilters
Proposition 14.3.3. Let F be a filter on X. If T ⊆ F is finite and nonempty, then ⋂T ∈ F; in particular, ⋂T ≠ ∅.

Proof. By induction on |T|.
Definition 14.3.4. Let X be a set and suppose that S ⊆ P(X). We say that S has the finite intersection property if ⋂T ≠ ∅ for all finite nonempty T ⊆ S.
Proposition 14.3.5. Let X be a set and suppose that S ⊆ P(X). The following are equivalent:

1. S has the finite intersection property.

2. There exists a filter F on X such that S ⊆ F.

Proof. 1 implies 2: Let

F = {A ∈ P(X) : ⋂T ⊆ A for some finite nonempty T ⊆ S}.

Notice that X ∈ F, that ∅ ∉ F (because S has the finite intersection property), that A ∈ F for every A ∈ S, and that F is closed under supersets. Suppose that A, B ∈ F, and fix finite nonempty T1, T2 ⊆ S such that ⋂T1 ⊆ A and ⋂T2 ⊆ B. We then have that ⋂(T1 ∪ T2) ⊆ A ∩ B, hence A ∩ B ∈ F. Therefore, F is a filter on X with S ⊆ F.

2 implies 1: Fix a filter F on X with S ⊆ F. Let T be a finite nonempty subset of S. We then have that T is a finite subset of F, hence ⋂T ∈ F because F is a filter. Since ∅ ∉ F, it follows that ⋂T ≠ ∅.
Definition 14.3.6. Let X be a set. An ultrafilter on X is a filter U on X such that for all A ⊆ X, either A ∈ U or X\A ∈ U.

Example. Every principal filter is an ultrafilter.
Proposition 14.3.7. Let F be a filter on X. F is an ultrafilter on X if and only if F is a maximal filter on X (i.e. there is no filter G on X with F ⊊ G).

Proof. Suppose that F is not a maximal filter on X. Fix a filter G on X such that F ⊊ G, and fix A ∈ G\F. Notice that X\A ∉ F, because otherwise we would have X\A ∈ G, and hence ∅ = A ∩ (X\A) ∈ G, a contradiction. Therefore, A ∉ F and X\A ∉ F, so F is not an ultrafilter on X.

Conversely, suppose that F is not an ultrafilter on X. Fix A ∈ P(X) such that A ∉ F and X\A ∉ F. We claim that F ∪ {A} has the finite intersection property (if B ∈ F and B ∩ A = ∅, then B ⊆ X\A, so X\A ∈ F, a contradiction). By Proposition 14.3.5, we may then fix a filter G on X such that F ∪ {A} ⊆ G. Since F ⊊ G, it follows that F is not a maximal filter on X.
Proposition 14.3.8. Let F be a filter on X. There exists an ultrafilter U on X such that F ⊆ U.

Proof. Zorn's Lemma.

Corollary 14.3.9. Let X be an infinite set. There exists a nonprincipal ultrafilter on X.

Proof. Let F be the filter on X consisting of all cofinite subsets of X. Fix an ultrafilter U on X such that F ⊆ U. For all x ∈ X, we have X\{x} ∈ F ⊆ U, hence {x} ∉ U.
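Although the interesting ultrafilters live on infinite sets, the defining conditions are easy to experiment with on a small finite set. The following Python sketch (our own illustration; the helper names are made up) enumerates every family of subsets of a 3-element set, keeps those satisfying the filter and ultrafilter conditions, and confirms that each one is principal, i.e. of the form {A : x ∈ A}; this is exactly why Corollary 14.3.9 needs X to be infinite:

from itertools import chain, combinations

X = frozenset({0, 1, 2})
subsets = [frozenset(s) for s in chain.from_iterable(
    combinations(sorted(X), r) for r in range(len(X) + 1))]

def is_filter(F):
    return (X in F and frozenset() not in F
            and all(A & B in F for A in F for B in F)               # closed under intersection
            and all(B in F for A in F for B in subsets if A <= B))  # closed under superset

def is_ultrafilter(F):
    return is_filter(F) and all(A in F or (X - A) in F for A in subsets)

families = chain.from_iterable(combinations(subsets, r) for r in range(len(subsets) + 1))
ultras = [set(F) for F in families if is_ultrafilter(set(F))]
principal = [{A for A in subsets if x in A} for x in X]
assert len(ultras) == 3 and all(U in principal for U in ultras)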
14.3.2 Ultraproducts
Ultrafilters (or even just filters) solve our democratic blending problem for relation symbols beautifully. Suppose that L = {R} where R is a binary relation symbol and I = ω. Suppose also that U is an ultrafilter on ω. Given elements ⟨a_i⟩_{i∈ω} and ⟨b_i⟩_{i∈ω} of M, we could then say that the pair (⟨a_i⟩_{i∈ω}, ⟨b_i⟩_{i∈ω}) is an element of R^M if the set of indices i ∈ I such that (a_i, b_i) ∈ R^{M_i} is large, i.e. if {i ∈ I : (a_i, b_i) ∈ R^{M_i}} ∈ U. Of course, our notion of large depends on the ultrafilter, but that flexibility is the beauty of the construction!

However, we have yet to solve the dictatorial problem of function symbols (such as a product of groups, each abelian save one, ending up nonabelian regardless of what we consider large). Wonderfully, and perhaps surprisingly, the ultrafilter can be used in another way to save the day. For concreteness, consider the situation where L = {e, f}, where e is a constant symbol and f is a binary function symbol, I = ω, and each M_i is a group. The idea is to flat out ignore variations on small sets by considering two sequences ⟨a_i⟩_{i∈ω} and ⟨b_i⟩_{i∈ω} to be the same if the set of indices on which they agree is large, i.e. if {i ∈ I : a_i = b_i} ∈ U. In other words, we should define an equivalence relation in this way and take a quotient! This is completely analogous to considering two functions f, g : R → R to be the same if the set {x ∈ R : f(x) ≠ g(x)} has measure 0. What does this solve? Suppose that M_0 was our rogue nonabelian group, and each M_i for i ≠ 0 was an abelian group. Suppose also that ω\{0} ∈ U (i.e. our ultrafilter is not the principal ultrafilter generated by 0, and thus we are considering {0} to be a small set). Given a sequence ⟨a_i⟩_{i∈ω}, let [⟨a_i⟩_{i∈ω}] be the equivalence class of ⟨a_i⟩_{i∈ω} under the relation. Assuming that everything is well-defined (see below), we then have that ⟨f^{M_i}(a_i, b_i)⟩_{i∈ω} ∼ ⟨f^{M_i}(b_i, a_i)⟩_{i∈ω}, and so

f^M([⟨a_i⟩_{i∈ω}], [⟨b_i⟩_{i∈ω}]) = [⟨f^{M_i}(a_i, b_i)⟩_{i∈ω}]
                                = [⟨f^{M_i}(b_i, a_i)⟩_{i∈ω}]
                                = f^M([⟨b_i⟩_{i∈ω}], [⟨a_i⟩_{i∈ω}])

and so we have saved abelianness by ignoring problems on small sets!

To summarize before launching into the details, here's the construction. Start with a language L, a set I, and L-structures M_i for each i ∈ I. Form the product ∏_{i∈I} M_i, but take a quotient by considering two elements of this product to be equivalent if the set of indices on which they agree is large. Elements of our structure are now equivalence classes, so we need to worry about things being well-defined, but the fundamental idea is to interpret constant symbols and function symbols componentwise, and to interpret relation symbols by saying that a k-tuple is in the interpretation of some R ∈ R_k if the set of indices on which the corresponding k-tuple is in R^{M_i} is large. Amazingly, this process behaves absolutely beautifully with regard to first-order logic. For example, if we denote this blended structure by M, we will prove below that for any σ ∈ Sent_L we have

M ⊨ σ if and only if {i ∈ I : M_i ⊨ σ} ∈ U

That is, an arbitrary sentence is true in the blended structure if and only if the set of indices i ∈ I for which σ is true in M_i is large!

Onward to the details. The notation is painful and easy to get lost in, but keep the fundamental ideas in mind and revert to thinking of I = ω whenever the situation looks hopelessly complicated. First we have the proposition saying that the relation ∼ defined in this way is an equivalence relation and that our definitions are well-defined.
Proposition 14.3.10. Let I be a set, and suppose that for each i ∈ I we have an L-structure M_i. Let U be an ultrafilter on I. Define a relation ∼ on ∏_{i∈I} M_i by saying that g ∼ h if {i ∈ I : g(i) = h(i)} ∈ U.

1. ∼ is an equivalence relation on ∏_{i∈I} M_i.

2. Suppose that g_1, g_2, ..., g_k, h_1, h_2, ..., h_k ∈ ∏_{i∈I} M_i are such that g_j ∼ h_j for all j.

(a) {i ∈ I : (g_1(i), g_2(i), ..., g_k(i)) = (h_1(i), h_2(i), ..., h_k(i))} ∈ U.
Theorem 14.3.15. If every finite subset of Σ has a model, then Σ has a model.

Proof. Let I be the set of all finite subsets of Σ. For each Δ ∈ I, fix a model M_Δ of Δ. For each σ ∈ Σ, let A_σ = {Δ ∈ I : σ ∈ Δ}. Let S = {A_σ : σ ∈ Σ} ⊆ P(I) and notice that S has the finite intersection property because

{σ1, σ2, ..., σn} ∈ A_{σ1} ∩ A_{σ2} ∩ ··· ∩ A_{σn}

Fix an ultrafilter U on I such that S ⊆ U and let M = ∏_{Δ∈I} M_Δ / U. For any σ ∈ Σ, we then have that A_σ ⊆ {Δ ∈ I : M_Δ ⊨ σ}, hence {Δ ∈ I : M_Δ ⊨ σ} ∈ U, and so M ⊨ σ. Therefore, M is a model of Σ.
Chapter 15
where h′ : N³ → N is given by h′(x, y, a) = h(y, a). This is something we can handle, after which we can strip off the extraneous first variable to get that f(x) = f′(0, x) is primitive recursive.

Let g′ = C_a^1 : N → N and let h′ : N³ → N be defined by h′(x, y, a) = h(y, a). Notice that g′ is primitive recursive and h′ = Compose(h, I_2^3, I_3^3) is also primitive recursive. It follows that f′ = PrimRec(g′, h′) is primitive recursive. Therefore, f = Compose(f′, C_0^1, I_1^1) is primitive recursive.
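Since the arguments in this section lean entirely on the combinators Compose and PrimRec, it may help to see them executed. The following is a minimal Python sketch, not part of the formal development: O, S, I, and C stand for the initial functions, and the two combinators follow the defining equations above.

```python
# Sketch of the primitive recursion combinators used throughout this chapter.
def O(*args):
    return 0                      # the zero function

def S(x):
    return x + 1                  # successor

def I(n, i):
    # Projection I_i^n: returns the i-th of n arguments (1-indexed).
    return lambda *args: args[i - 1]

def C(n, a):
    # Constant function C_a^n of arity n with value a.
    return lambda *args: a

def Compose(h, *gs):
    # Compose(h, g_1, ..., g_m)(x) = h(g_1(x), ..., g_m(x)).
    return lambda *args: h(*(g(*args) for g in gs))

def PrimRec(g, h):
    # f(x, 0) = g(x) and f(x, y + 1) = h(x, y, f(x, y)).
    def f(*args):
        *x, y = args
        val = g(*x)
        for i in range(y):
            val = h(*x, i, val)
        return val
    return f
```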
Proposition 15.1.11. The factorial function f : N → N is primitive recursive.

Proof. Notice that

f(0) = 1
f(y + 1) = (y + 1) · f(y)

Thus, we want to consider the function h : N² → N defined by h(y, a) = (y + 1) · a. Notice that h = Compose(·, Compose(S, I_1^2), I_2^2) is primitive recursive, so f is primitive recursive by Proposition 15.1.10.
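Continuing the sketch above, the factorial function can be assembled exactly as in this proof. Here mult stands in for multiplication, which is assumed primitive recursive (officially it would itself be built with PrimRec); we shortcut with Python's * to keep the example small.

```python
# Factorial via the scheme of Proposition 15.1.10, using the sketch above.
mult = lambda x, y: x * y                          # assumed primitive recursive
h = Compose(mult, Compose(S, I(2, 1)), I(2, 2))    # h(y, a) = (y + 1) * a

def fact_via_h(n):
    # Informal shortcut for the f of Proposition 15.1.10:
    # start at f(0) = 1 and apply h repeatedly.
    val = 1
    for y in range(n):
        val = h(y, val)
    return val

assert [fact_via_h(n) for n in range(6)] == [1, 1, 2, 6, 24, 120]
```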
Proposition 15.1.12. The predecessor function Pred : N → N defined by Pred(0) = 0 and Pred(n) = n − 1 for all n > 0 is primitive recursive.

Proof. We have

Pred(0) = 0
Pred(y + 1) = y

Thus, if we let h : N² → N be the function h(y, a) = y, we notice that h = I_1^2 is primitive recursive, hence Pred is primitive recursive by Proposition 15.1.10.
Proposition 15.1.13. The function f : N² → N defined by

f(x, y) = x − y  if x ≥ y
          0      otherwise

is primitive recursive. We denote f(x, y) by x ∸ y (truncated subtraction).

Proof. We have

f(x, 0) = x
f(x, y + 1) = Pred(f(x, y))

More formally, notice that f = PrimRec(I_1^1, Compose(Pred, I_3^3)).
Proposition 15.1.14. The following functions are primitive recursive.

1. sg : N → N defined by

sg(n) = 1  if n ≠ 0
        0  otherwise

2. s̄g : N → N defined by

s̄g(n) = 1  if n = 0
        0  otherwise

Proof. We first handle s̄g. Notice that s̄g(n) = 1 ∸ n for every n ∈ N, so s̄g = Compose(∸, C_1^1, I_1^1). Next notice that sg(n) = 1 ∸ s̄g(n) for every n ∈ N, so sg = Compose(∸, C_1^1, s̄g).
Proposition 15.1.15. The following functions are primitive recursive:
1. Gt : N² → N defined by

Gt(x, y) = 1  if x > y
           0  otherwise

2. Lt : N² → N defined by

Lt(x, y) = 1  if x < y
           0  otherwise

3. Equal : N² → N defined by

Equal(x, y) = 1  if x = y
              0  otherwise

Proof. Notice that x > y if and only if x ∸ y ≠ 0, which is if and only if sg(x ∸ y) = 1. Therefore, Gt = Compose(sg, ∸) is primitive recursive. Now Lt = Compose(Gt, I_2^2, I_1^2), so Lt is primitive recursive. Finally, notice that Equal = Compose(s̄g, Compose(+, Gt, Lt)), so Equal is primitive recursive.
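A quick numerical check of the identities used in Propositions 15.1.13 through 15.1.15, again as an informal Python sketch:

```python
# Truncated subtraction (monus), sg, and the comparison predicates.
def monus(x, y):              # x - y if x >= y, else 0
    return x - y if x >= y else 0

def sgbar(n):                 # 1 if n == 0, else 0; sgbar(n) = 1 monus n
    return monus(1, n)

def sg(n):                    # 1 if n != 0, else 0; sg(n) = 1 monus sgbar(n)
    return monus(1, sgbar(n))

def Gt(x, y):                 # Gt = sg(x monus y)
    return sg(monus(x, y))

def Lt(x, y):                 # Lt(x, y) = Gt(y, x)
    return Gt(y, x)

def Equal(x, y):              # Equal = sgbar(Gt(x, y) + Lt(x, y))
    return sgbar(Gt(x, y) + Lt(x, y))

assert (Gt(3, 2), Lt(3, 2), Equal(2, 2), Equal(2, 5)) == (1, 0, 1, 0)
```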
Proposition 15.1.16. For each k ∈ N, the function i_k defined by

i_k(n) = 1  if n = k
         0  otherwise

is primitive recursive.

Proof. Notice that for each k, we have that i_k = Compose(Equal, I_1^1, C_k^1), so i_k is primitive recursive.
Proposition 15.1.17. Suppose that f : N^{n+1} → N is primitive recursive.

1. The function g : N^{n+1} → N given by g(x⃗, y) = ∑_{z<y} f(x⃗, z) is primitive recursive.

2. The function g : N^{n+1} → N given by g(x⃗, y) = ∏_{z<y} f(x⃗, z) is primitive recursive.

Proof.

1. We have

g(x⃗, 0) = 0
g(x⃗, y + 1) = g(x⃗, y) + f(x⃗, y)

2. We have

g(x⃗, 0) = 1
g(x⃗, y + 1) = g(x⃗, y) · f(x⃗, y)
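The two recursions in this proof run directly; here is a sketch (bounded_sum and bounded_prod are hypothetical names, not notation from the text):

```python
# Bounded sums and products of a given f, via the displayed recursions.
def bounded_sum(f):
    # g(x, y) = sum of f(x, z) for z < y
    def g(*args):
        *x, y = args
        total = 0
        for z in range(y):
            total = total + f(*x, z)    # g(x, y + 1) = g(x, y) + f(x, y)
        return total
    return g

def bounded_prod(f):
    # g(x, y) = product of f(x, z) for z < y (the empty product is 1)
    def g(*args):
        *x, y = args
        total = 1
        for z in range(y):
            total = total * f(*x, z)    # g(x, y + 1) = g(x, y) * f(x, y)
        return total
    return g

square = lambda x, z: (x + z) ** 2
assert bounded_sum(square)(1, 3) == 1 + 4 + 9
assert bounded_prod(square)(1, 3) == 1 * 4 * 9
```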
so g_1 is primitive recursive.

2. We have

g_2(x⃗, y) = s̄g(g_1(x⃗, y)) · y + ∑_{z<y} ⋯
15.2
Definition 15.2.1. Let R ⊆ N^n. We say that R is primitive recursive if its characteristic function, i.e. the function K_R : N^n → N given by

K_R(x_1, x_2, ..., x_n) = 1  if (x_1, x_2, ..., x_n) ∈ R
                          0  otherwise

is primitive recursive.
Proposition 15.2.2. If R ⊆ N^n is primitive recursive, then so is N^n\R.

Proof. Notice that K_{N^n\R} = Compose(s̄g, K_R).

Proposition 15.2.3. If R, S ⊆ N^n are primitive recursive, then so are R ∩ S and R ∪ S.

Proof. Notice that K_{R∩S} = Compose(·, K_R, K_S) and that K_{R∪S} = Compose(sg, Compose(+, K_R, K_S)).
Proposition 15.2.4. Suppose that f : N^n → N is primitive recursive (as a function). We then have that graph(f) ⊆ N^{n+1} is primitive recursive (as a relation).

Proof. We have

K_{graph(f)}(x_1, ..., x_n, y) = 1  if f(x_1, x_2, ..., x_n) = y
                                 0  otherwise

Thus, K_{graph(f)} = Compose(Equal, Compose(f, I_1^{n+1}, I_2^{n+1}, ..., I_n^{n+1}), I_{n+1}^{n+1}).
Proposition 15.2.8. The function f : N → N which sends n to the (n+1)st prime (i.e. f(0) = 2, f(1) = 3, f(2) = 5, etc.) is primitive recursive.

Proof. Notice that by Euclid's proof that there are infinitely many primes, we have that f(n + 1) ≤ f(n)! + 1 for all n ∈ N. Thus,

f(0) = 2
f(n + 1) = μz < (f(n)! + 2) (z > f(n) ∧ z ∈ Prime)
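The bounded search in this proof is easy to execute; the following sketch mirrors the recursion, with the factorial bound guaranteeing that the search always succeeds.

```python
# Sketch of Proposition 15.2.8: the n-th prime via bounded search.
from math import factorial

def is_prime(z):
    return z >= 2 and all(z % d != 0 for d in range(2, z))

def bounded_mu(bound, pred):
    # Least z < bound with pred(z), and 0 if there is none;
    # the explicit bound is what keeps this primitive recursive.
    for z in range(bound):
        if pred(z):
            return z
    return 0

def nth_prime(n):
    # f(0) = 2 and f(n + 1) = mu z < f(n)! + 2 (z > f(n) and z prime)
    p = 2
    for _ in range(n):
        p = bounded_mu(factorial(p) + 2, lambda z, p=p: z > p and is_prime(z))
    return p

assert [nth_prime(n) for n in range(5)] == [2, 3, 5, 7, 11]
```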
15.3
Coding Sequences
In what follows, we let p_n denote the (n + 1)st prime, so p_0 = 2, p_1 = 3, p_2 = 5, etc. Notice that the function n ↦ p_n is primitive recursive by Proposition 15.2.8.
Definition 15.3.1. For each n ∈ N^+, let α_n : N^n → N be the function

α_n(x_1, x_2, ..., x_n) = p_0^{x_1+1} · p_1^{x_2+1} ⋯ p_{n−1}^{x_n+1}

The function ln : N → N defined by

ln(y) = 0  if y = 1
        n  if y ∈ ran(α_n)
        0  otherwise

(i.e. ln(y) is the length of the sequence coded by y) is primitive recursive.
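A sketch of this prime-power coding and its decoding, reusing nth_prime from the sketch above; element plays the role of the decoding function (y)_i used below, and it assumes its input is a valid code.

```python
# Sequence coding as in Definition 15.3.1, with decoding helpers.
def encode(seq):
    # <x1, ..., xn> = p_0^(x1+1) * p_1^(x2+1) * ... * p_{n-1}^(xn+1)
    y = 1
    for i, x in enumerate(seq):
        y *= nth_prime(i) ** (x + 1)
    return y

def exponent(y, p):
    # Exponent of the prime p in y.
    e = 0
    while y % p == 0:
        y //= p
        e += 1
    return e

def ln(y):
    # Length of the sequence coded by y: how many initial primes divide y.
    n = 0
    while y % nth_prime(n) == 0:
        n += 1
    return n

def element(y, i):
    # (y)_i: the i-th element (0-indexed) of the sequence coded by y.
    return exponent(y, nth_prime(i)) - 1

y = encode([4, 0, 7])
assert ln(y) == 3 and [element(y, i) for i in range(3)] == [4, 0, 7]
```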
Let h : N → N be given by

h(y) = (y)_{ln(y) ∸ 2} + (y)_{ln(y) ∸ 1}

and notice that h is primitive recursive and that f(y) = h(f̄(y)) for all y.
Proposition 15.3.14. Suppose that f : N^n → N, and let g : N → N be the function

g(y) = f(x_1, x_2, ..., x_n)  if y = ⟨x_1, x_2, ..., x_n⟩ ∈ Seq
       0                      otherwise

We then have that f is primitive recursive if and only if g is primitive recursive.
15.4
We want to be able to code primitive recursive functions using numbers. The idea is as follows:

1. We use the number ⟨0⟩ = 2^{0+1} = 2 as a code for the function O.

2. We use the number ⟨1⟩ = 2^{1+1} = 4 as a code for the function S.

3. We use the number ⟨2, n, i⟩ = 2^{2+1} · 3^{n+1} · 5^{i+1} = 8 · 3^{n+1} · 5^{i+1} as a code for the function I_i^n.

4. If a, b_1, b_2, ..., b_m are codes for functions such that each of the functions coded by the b_i have the same arity, and the arity of the function coded by a is m, then we use the number ⟨3, a, b_1, b_2, ..., b_m⟩ as a code for the function which is the composition of the function coded by a with the functions coded by the b_i.

5. If a and b are codes for functions such that the function coded by b has arity two more than the function coded by a, then we use the number ⟨4, a, b⟩ as a code for the function which is obtained via primitive recursion using the function coded by a as the base case and the function coded by b as our iterator.

For example, the number ⟨3, ⟨1⟩, ⟨0⟩⟩ = 2^{3+1} · 3^{4+1} · 5^{2+1} = 2^4 · 3^5 · 5^3 is a code for the function C_1^1.

We want to show that the set of codes described above is primitive recursive. To do this, we define a function f : N → N recursively in which f(e) = 0 if e is not a valid code, and f(e) = n > 0 if e is a valid code of a function of arity n. Here's the precise definition.
Definition 15.4.1. We define a function f : N → N recursively as follows.

f(e) = 1             if e = ⟨0⟩
       1             if e = ⟨1⟩
       (e)_1         if e ∈ Seq, (e)_0 = 2, ln(e) = 3, (e)_1 ≥ 1, and 1 ≤ (e)_2 ≤ (e)_1
       f((e)_2)      if e ∈ Seq, (e)_0 = 3, ln(e) ≥ 3, (∀i < ln(e))(i > 0 → f((e)_i) ≠ 0),
                        (∀i < ln(e))(∀j < ln(e))((i > 1 ∧ j > 1) → f((e)_i) = f((e)_j)),
                        and f((e)_1) = ln(e) − 2
       f((e)_1) + 1  if e ∈ Seq, (e)_0 = 4, ln(e) = 3, f((e)_1) ≠ 0, and f((e)_2) = f((e)_1) + 2
       0             otherwise

We denote f by PRArity.
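To see the case analysis of PRArity in action, here is a rough Python sketch that works on nested lists rather than numeric sequence codes (so Seq-membership checks become list checks); arity 0 marks an invalid code.

```python
# Sketch of PRArity on list-shaped codes.
def arity(e):
    if e == [0] or e == [1]:                      # codes of O and S
        return 1
    if isinstance(e, list) and len(e) == 3 and e[0] == 2:
        n, i = e[1], e[2]                         # code <2, n, i> of I_i^n
        return n if n >= 1 and 1 <= i <= n else 0
    if isinstance(e, list) and len(e) >= 3 and e[0] == 3:
        inner = [arity(b) for b in e[2:]]         # code <3, a, b1, ..., bm>
        if 0 not in inner and len(set(inner)) == 1 and arity(e[1]) == len(e) - 2:
            return inner[0]
        return 0
    if isinstance(e, list) and len(e) == 3 and e[0] == 4:
        g, h = arity(e[1]), arity(e[2])           # code <4, a, b> for PrimRec
        return g + 1 if g != 0 and h == g + 2 else 0
    return 0

assert arity([3, [1], [0]]) == 1    # the code of C_1^1 from the text
```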
Proposition 15.4.2. PRArity is primitive recursive.

Proof. Roughly, this is because PRArity is defined recursively using primitive recursive conditions. More formally, we define h : N → N so that h(y) produces the next value of PRArity from the code y = ⟨PRArity(0), PRArity(1), ..., PRArity(e − 1)⟩ of all previous values, where e = ln(y) is the current argument: the case distinction of Definition 15.4.1 is repeated verbatim, with each recursive call PRArity((e)_i) replaced by the lookup (y)_{(e)_i} (which makes sense because (e)_i < e whenever e ∈ Seq), and each of the conditions involved is primitive recursive. PRArity is then obtained from h by course-of-values recursion, so it is primitive recursive by our results on course-of-values recursion.
For each e ∈ N with PRArity(e) ≠ 0, we now define the function Φ(e) coded by e recursively as follows.

Φ(e) = O                                                   if e = ⟨0⟩
       S                                                   if e = ⟨1⟩
       I_{(e)_2}^{(e)_1}                                   if (e)_0 = 2
       Compose(Φ((e)_1), Φ((e)_2), ..., Φ((e)_{ln(e)−1}))  if (e)_0 = 3
       PrimRec(Φ((e)_1), Φ((e)_2))                         if (e)_0 = 4
Proposition 15.4.6. A function f : N^n → N is primitive recursive if and only if there exists e ∈ N such that f = Φ(e).

Definition 15.4.7. We define a function F : N² → N by letting

F(e, x) = Φ(e)((x)_0, (x)_1, ..., (x)_{ln(x)−1})  if x ∈ Seq, ln(x) > 0, and PRArity(e) = ln(x)
          0                                       otherwise

For each n ∈ N, we then define a function F_n : N^{n+1} → N by letting

F_n(e, x_1, x_2, ..., x_n) = F(e, ⟨x_1, x_2, ..., x_n⟩)  if PRArity(e) = n
                             0                           otherwise
Notice that F and each F_n are intuitively computable.

Proposition 15.4.8. F_1 is not primitive recursive.

Proof. Suppose that F_1 is primitive recursive. Define g : N → N by letting g(x) = F_1(x, x) + 1 for all x ∈ N, and notice that g is primitive recursive. Fix an e ∈ N such that g = Φ(e). We then have that

g(x) = Φ(e)(x) = F_1(e, x)

for all x ∈ N, hence

F_1(e, e) = g(e) = F_1(e, e) + 1

a contradiction.
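The diagonalization here is completely generic, as the following sketch illustrates against a toy (and obviously non-exhaustive) enumeration of total functions:

```python
# Given ANY F1(e, x) whose rows are total functions, g(x) = F1(x, x) + 1
# differs from every row, exactly as in the proof of Proposition 15.4.8.
def diagonalize(F1):
    return lambda x: F1(x, x) + 1

rows = [lambda x: 0, lambda x: x, lambda x: 2 * x]   # toy enumeration
F1 = lambda e, x: rows[e % len(rows)](x)
g = diagonalize(F1)
for e in range(3):
    assert g(e) != F1(e, e)    # g disagrees with row e at input e
```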
Chapter 16
Definition 16.1.1. Fix some element not in N and denote it by ↑. Let N_↑ = N ∪ {↑}.

Definition 16.1.2. Let F be the set of all functions f : N^n → N_↑ for some n ∈ N^+. We call elements of F partial functions.

Definition 16.1.3. Suppose that m, n ∈ N^+, that h : N^m → N_↑, and that g_1, g_2, ..., g_m : N^n → N_↑. We let Compose(h, g_1, g_2, ..., g_m) be the function f : N^n → N_↑ defined by

f(x⃗) = ↑                                 if some g_i(x⃗) = ↑
        h(g_1(x⃗), g_2(x⃗), ..., g_m(x⃗))   otherwise

Definition 16.1.4. Suppose that g : N^n → N_↑ and that h : N^{n+2} → N_↑. We let PrimRec(g, h) be the function f : N^{n+1} → N_↑ defined by f(x⃗, 0) = g(x⃗) and

f(x⃗, y + 1) = ↑                  if f(x⃗, y) = ↑
               h(x⃗, y, f(x⃗, y))  otherwise

Definition 16.1.5. Let h : N^{n+1} → N_↑. We let Minimize(h) be the function f : N^n → N_↑ defined by letting f(x⃗) be the least y such that h(x⃗, y) = 0, whenever h(x⃗, z) ≠ ↑ for all z ∈ N and such a y exists, and letting f(x⃗) = ↑ otherwise. We denote f by writing f(x⃗) = μy(h(x⃗, y) = 0).

Definition 16.1.6. The collection of partial recursive functions is the collection of partial functions generated by starting with the initial functions, and generating using Compose, PrimRec, and Minimize. A (total) recursive function is a partial recursive function f : N^n → N_↑ such that ↑ ∉ ran(f).
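A sketch of these three operations on partial functions, modeling the undefined value ↑ by Python's None (and reusing monus from an earlier sketch); this is an informal model, since a true unbounded search diverges rather than returning a value.

```python
# Partial-function combinators with None standing in for the undefined value.
def compose(h, *gs):
    def f(*x):
        vals = [g(*x) for g in gs]
        return None if None in vals else h(*vals)
    return f

def prim_rec(g, h):
    def f(*args):
        *x, y = args
        val = g(*x)
        for i in range(y):
            if val is None:            # once undefined, stays undefined
                return None
            val = h(*x, i, val)
        return val
    return f

def minimize(h):
    # f(x) = least y with h(x, y) = 0; loops forever if no such y exists.
    def f(*x):
        y = 0
        while True:
            v = h(*x, y)
            if v is None:
                return None            # stand-in for divergence
            if v == 0:
                return y
            y += 1
    return f

assert minimize(lambda x, y: monus(x, y * y))(9) == 3   # least y with y*y >= 9
```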
Definition 16.1.7. Let R ⊆ N^n. We say that R is recursive if its characteristic function, i.e. the function K_R : N^n → N given by

K_R(x_1, x_2, ..., x_n) = 1  if (x_1, x_2, ..., x_n) ∈ R
                          0  otherwise

is recursive.
Proposition 16.1.8. Suppose that f : N^n → N (notice that f is total). If graph(f) ⊆ N^{n+1} is recursive (as a relation), then f is recursive (as a function).

Proof. (The converse direction, that f recursive implies graph(f) recursive, is as before.) Suppose that graph(f) is recursive (as a relation). Notice that for all x⃗ ∈ N^n, there exists y ∈ N such that s̄g(K_{graph(f)}(x⃗, y)) = 0 (because (x⃗, f(x⃗)) ∈ graph(f) and so s̄g(K_{graph(f)}(x⃗, f(x⃗))) = 0). Since f(x⃗) = μy(s̄g(K_{graph(f)}(x⃗, y)) = 0), it follows that f is recursive.
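For example, the unbounded search of this proof recovers a function from the characteristic function of its graph; a toy instance with f(x) = x², using the minimize and monus sketches above:

```python
# f(x) = mu y (sgbar(K_graph(x, y)) = 0), as in Proposition 16.1.8.
K_graph = lambda x, y: 1 if y == x * x else 0        # graph of f(x) = x^2
f = minimize(lambda x, y: monus(1, K_graph(x, y)))   # sgbar(K_graph(x, y))
assert f(5) == 25
```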
Definition 16.1.9. We define a function f : N → N recursively as follows.

f(e) = 1             if e = ⟨0⟩
       1             if e = ⟨1⟩
       (e)_1         if e ∈ Seq, (e)_0 = 2, ln(e) = 3, (e)_1 ≥ 1, and 1 ≤ (e)_2 ≤ (e)_1
       f((e)_2)      if e ∈ Seq, (e)_0 = 3, ln(e) ≥ 3, (∀i < ln(e))(i > 0 → f((e)_i) ≠ 0),
                        (∀i < ln(e))(∀j < ln(e))((i > 1 ∧ j > 1) → f((e)_i) = f((e)_j)),
                        and f((e)_1) = ln(e) − 2
       f((e)_1) + 1  if e ∈ Seq, (e)_0 = 4, ln(e) = 3, f((e)_1) ≠ 0, and f((e)_2) = f((e)_1) + 2
       f((e)_1) − 1  if e ∈ Seq, (e)_0 = 5, ln(e) = 2, and f((e)_1) ≥ 2
       0             otherwise

We denote f by RArity.
Proposition 16.1.10. RArity is primitive recursive.
Definition 16.1.11. Let RCode be the set of all e ∈ N such that RArity(e) ≠ 0.
Proposition 16.1.12. RCode is primitive recursive.
Definition 16.1.13. We define a function Φ : RCode → F ∪ {↑} recursively as follows.

Φ(e) = O                                                   if e = ⟨0⟩
       S                                                   if e = ⟨1⟩
       I_{(e)_2}^{(e)_1}                                   if (e)_0 = 2
       Compose(Φ((e)_1), Φ((e)_2), ..., Φ((e)_{ln(e)−1}))  if (e)_0 = 3 and Φ((e)_i) ≠ ↑ whenever 0 < i < ln(e)
       PrimRec(Φ((e)_1), Φ((e)_2))                         if (e)_0 = 4 and Φ((e)_1), Φ((e)_2) ≠ ↑
       Minimize(Φ((e)_1))                                  if (e)_0 = 5, Φ((e)_1) ≠ ↑, and ∀x⃗∃y(Φ((e)_1)(x⃗, y) = 0)
       ↑                                                   otherwise
Definition 16.1.14. We define a set T ⊆ N³ such that (e, x, y) ∈ T is meant to capture that y codes a computation of Φ(e) on input x ∈ Seq.

Proposition 16.1.15. T is primitive recursive.

Definition 16.1.16. For each n ∈ N^+, let T_n ⊆ N^{n+2} be defined by letting

T_n = {(e, x_1, x_2, ..., x_n, y) ∈ N^{n+2} : (e, ⟨x_1, x_2, ..., x_n⟩, y) ∈ T}

Proposition 16.1.17. T_n is primitive recursive for each n ∈ N^+.

Theorem 16.1.18. Suppose that f : N^n → N_↑ is a partial recursive function. There exists e ∈ N such that

1. For all x⃗ ∈ N^n, we have f(x⃗) ≠ ↑ if and only if there exists y such that T_n(e, x⃗, y).

2. For all x⃗ ∈ N^n and all y ∈ N such that T_n(e, x⃗, y), we have f(x⃗) = U(y). In particular, we have f(x⃗) = U(μy T_n(e, x⃗, y)) for all x⃗ ∈ N^n.
16.2
16.3
16.4
Theorem 16.4.4. There exists a c.e. set which is not computable. In particular, the set

K = {x ∈ N : (∃y) T_1(x, x, y)}

is a c.e. set which is not computable.

Proof. Since T_1 is primitive recursive, it follows that {(x, y) ∈ N² : T_1(x, x, y)} is primitive recursive, hence recursive. Therefore, K is c.e. Suppose that K was computable. Define a function f : N → N by letting

f(x) = U(μy T_1(x, x, y)) + 1  if x ∈ K
       0                       otherwise

and notice that f is a total computable function. Thus, we may fix an e such that f(x) = U(μy T_1(e, x, y)) for all x ∈ N. We then have that e ∈ K (because f is total, so there exists y with T_1(e, e, y)), hence

f(e) = U(μy T_1(e, e, y)) + 1 = f(e) + 1

a contradiction.
Chapter 17
We define a function f : N → N recursively by

f(n) = 1  if n ∈ ♯Con
       1  if n ∈ ♯Var
       1  if (∃m)[m ∈ Seq ∧ ln(m) = Arity_F((n)_0) ∧ (∀i < ln(m))[(m)_i < n ∧ f((m)_i) = 1]
             ∧ n = Concat(⟨(n)_0⟩, SeqConcat(m))]
       0  otherwise

Notice that f is computable and that f = K_{♯Term_L}.
Proposition 17.0.11. ♯AtomicForm_L is computable.

Proof. We define a function f by a similar recursion, using the analogous case analysis for atomic formulas.
Notice that since we can code finite sequences of numbers with numbers, we can also code finite sets of numbers with numbers. One natural such coding is as follows. Suppose that F ⊆ N is finite, list the elements of F in ascending order as a_1, a_2, ..., a_n, and let the code of F be ⟨a_1, a_2, ..., a_n⟩. Using this definition, one can now check that the function which takes two such codes and outputs the code of their union is computable, along with many other such basic functions.
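A sketch of this set coding, together with the union operation on codes just mentioned (reusing encode, element, and ln from the sequence-coding sketch):

```python
# Finite sets coded as sequence codes of their sorted elements.
def set_code(F):
    return encode(sorted(F))

def set_decode(c):
    return {element(c, i) for i in range(ln(c))}

def union_code(c1, c2):
    # The union function on codes, computed by decoding and re-encoding.
    return set_code(set_decode(c1) | set_decode(c2))

assert set_decode(union_code(set_code({1, 3}), set_code({2, 3}))) == {1, 2, 3}
```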
Definition 17.0.13. Let FreeVar : N → N be the function defined by letting FreeVar(n) be the code of the finite set of variables occurring free in φ if n = ♯φ for some φ ∈ Form_L, and 0 otherwise.
Proposition 17.0.14. FreeVar is computable.

Proposition 17.0.15. ♯Sent_L is computable.

Proposition 17.0.16. Subst : N² → N is computable.

Proposition 17.0.17. ValidSubst : N² → N is computable.
Since we can code finite sets and formulas using numbers, we can now code pairs (Γ, φ) as numbers. From here, we can code sequences of such pairs as numbers.

Proposition 17.0.18. Let Ded ⊆ N be the set of codes of such sequences which are deductions. We then have that Ded is computable. Furthermore, if Γ ⊆ Form_L is such that ♯Γ is computable, then the subset Ded_Γ of Ded consisting of elements of Ded whose last line is of the form (Γ′, φ) where Γ′ ⊆ Γ is computable.
Proposition 17.0.19. Suppose that Γ ⊆ Form_L is such that ♯Γ is computable. The set ♯{φ ∈ Form_L : Γ ⊢ φ} is c.e.

Proof. Notice that n ∈ ♯{φ ∈ Form_L : Γ ⊨ φ} if and only if n ∈ ♯{φ ∈ Form_L : Γ ⊢ φ} by the Soundness and Completeness Theorems, which is if and only if ∃y(Ded_Γ(y) ∧ Last(y) = n).

Corollary 17.0.20. Suppose that Σ ⊆ Sent_L is such that ♯Σ is computable. The set ♯Cn(Σ) is c.e.
Definition 17.0.21. Let T be a theory in a computable language.
1. We say that T is finitely axiomatizable if there exists a finite Σ ⊆ Sent_L such that T = Cn(Σ).

2. We say that T is axiomatizable if there exists Σ ⊆ Sent_L such that ♯Σ is computable and T = Cn(Σ).
Definition 17.0.22. We say that a theory T is decidable if ♯T is computable.
Proposition 17.0.23. If T is an axiomatizable complete theory, then T is decidable.
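The idea behind Proposition 17.0.23 is a dovetailed search: enumerate the theorems of T until either σ or ¬σ appears; completeness guarantees this terminates. A toy sketch, where theorems and negate are assumed stand-ins for the real enumeration of (codes of) theorems and negation on codes:

```python
# Deciding a complete axiomatizable theory by searching its theorem list.
def decide(sigma, negate, theorems):
    for tau in theorems():           # c.e. enumeration of the theory
        if tau == sigma:
            return True              # sigma is a theorem
        if tau == negate(sigma):
            return False             # not-sigma is a theorem, so sigma is not

def theorems():
    # Toy stand-in: a complete "theory" over the sentences 'p' and 'q'.
    yield from ['p', '~q']

negate = lambda s: s[1:] if s.startswith('~') else '~' + s
assert decide('p', negate, theorems) and not decide('q', negate, theorems)
```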
Corollary 17.0.24. DLO is decidable.
Corollary 17.0.25. ACF_p is decidable for each p.
Chapter 18
18.1
Definition 18.1.1. Let L = {0, S} where 0 is a constant symbol and S is a unary function symbol. Let N_S = (N, 0, S).

Definition 18.1.2. Let Ax_S be the following set of L-sentences.

1. ∀x∀y(Sx = Sy → x = y)
2. ∀x(x ≠ 0 → ∃y(Sy = x))
3. ∀x(Sx ≠ 0)
4. ∀x(S^n x ≠ x) for each n ∈ N^+.
A model of Cn(Ax_S) consists of one N-chain together with some number (possibly 0) of Z-chains, as we discuss now.
Proposition 18.1.3. Suppose that M ⊨ Ax_S. Define a relation ∼ on M by saying that a ∼ b if either

1. a = b,
2. a = (S^M)^{(n)}(b) for some n ∈ N^+, or
3. b = (S^M)^{(n)}(a) for some n ∈ N^+.

We then have that ∼ is an equivalence relation on M.
Definition 18.1.4. Let κ be a cardinal. We define an L-structure M_κ by letting M_κ = N ⊔ (κ × Z), letting 0^{M_κ} = 0, letting S^{M_κ}(n) = n + 1 for n ∈ N, and letting S^{M_κ}((α, n)) = (α, n + 1).

Proposition 18.1.5. M_κ ⊨ Ax_S for every cardinal κ.
Consider a formula of the form

∃y(⋀_{i=1}^{m} φ_i ∧ ⋀_{j=1}^{n} ¬ψ_j)

where each φ_i and ψ_j is atomic; it suffices to show that T proves each such formula equivalent to a quantifier-free one. Notice that terms in our language are S^ℓ x for some ℓ ∈ N and some x ∈ Var, and also S^ℓ 0 for some ℓ ∈ N. Let

X = {S^ℓ 0 : ℓ ∈ N} ∪ {S^ℓ y : ℓ ∈ N} ∪ {S^ℓ x_i : ℓ ∈ N, 1 ≤ i ≤ k}

Now each φ_i and ψ_j is s.e. with, and hence we may assume is, an atomic formula of the form S^ℓ y = t for some t ∈ X.

First, we may suppose that none of the t's is of the form S^ℓ y. This is because we may ignore the formulas φ_i of the form S^ℓ y = S^p y with ℓ = p and the formulas ψ_j of the form S^ℓ y = S^p y with ℓ ≠ p. Also, if some φ_i is of the form S^ℓ y = S^p y with ℓ ≠ p, or some ψ_j is of the form S^ℓ y = S^p y with ℓ = p, then using the injectivity axioms and the noncircularity axioms, we see that

T ⊢ ∃y(⋀_{i=1}^{m} φ_i ∧ ⋀_{j=1}^{n} ¬ψ_j) ↔ ¬(x_1 = x_1)

Next, we may assume that the ℓ's in each S^ℓ y are the same. This is because we have t = u if and only if St = Su (due to the injectivity axiom for S).

Now for each i, we denote the term on the right in φ_i by t_i, and for each j, we denote the term on the right of ψ_j by u_j. If m ≥ 1 (i.e. if there is some φ_i), we then have

T ⊢ ∃y(⋀_{i=1}^{m} φ_i ∧ ⋀_{j=1}^{n} ¬ψ_j) ↔ ((⋀_{p=0}^{ℓ−1} t_1 ≠ S^p 0) ∧ ⋀_{i=2}^{m} (t_1 = t_i) ∧ ⋀_{j=1}^{n} ¬(t_1 = u_j))

If instead m = 0, then

T ⊢ ∃y(⋀_{j=1}^{n} ¬ψ_j) ↔ x_1 = x_1
18.2

Definition 18.2.1. Let L = {0, S, <} where 0 is a constant symbol, S is a unary function symbol, and < is a binary relation symbol. Let N_L = (N, 0, S, <).

Definition 18.2.2. Let Ax_L be the following set of L-sentences.

1. ∀x(x ≠ 0 → ∃y(Sy = x))
2. ∀x ¬(x < x)
3. ∀x∀y∀z((x < y ∧ y < z) → x < z)
4. ∀x∀y((x < y) ∨ (y < x) ∨ (x = y))
5. ∀x(x ≠ 0 → 0 < x)
6. ∀x(x < Sx)
7. ∀x∀y ¬(x < y ∧ y < Sx)

Lemma 18.2.3. Ax_L ⊢ ∀x∀y(x < y ↔ Sx < Sy).

Proof. Fix a model M ⊨ Ax_L, and suppose that a, b ∈ M.

Suppose that a <^M b. Notice that b <^M S^M(a) is impossible by 7, so we must have either S^M(a) = b or S^M(a) <^M b by 4. In the former case, we have that S^M(a) <^M S^M(b) by 6. In the latter case, we may use 6 and 3 to conclude that S^M(a) <^M S^M(b).

Suppose conversely that S^M(a) <^M S^M(b). We need to show that a <^M b. Suppose for a contradiction that this is not the case. By 4, either a = b or b <^M a. In the former case, we could conclude that S^M(a) = S^M(b), contradicting 2. In the latter case, we could use the previous paragraph to conclude that S^M(b) <^M S^M(a), then use 3 to get that S^M(a) <^M S^M(a), contradicting 2. It follows that a <^M b.
Consider a formula of the form

∃y(⋀_{i=1}^{m} φ_i ∧ ⋀_{j=1}^{n} ¬ψ_j)

where each φ_i and ψ_j is atomic. Notice that terms in our language are S^ℓ x for some ℓ ∈ N and some x ∈ Var, and also S^ℓ 0 for some ℓ ∈ N. Let

X = {S^ℓ 0 : ℓ ∈ N} ∪ {S^ℓ y : ℓ ∈ N} ∪ {S^ℓ x_i : ℓ ∈ N, 1 ≤ i ≤ k}

Now each φ_i and ψ_j is s.e. with, and hence we may assume is, one of the following:

1. S^ℓ y = t for some t ∈ X.
2. S^ℓ y < t for some t ∈ X.
3. t < S^ℓ y for some t ∈ X.
First, we may suppose that none of the t's is of the form S^ℓ y, because we can either ignore such formulas, or because they make the formula trivial, as above. Next, we can suppose that there are no ψ_j's. This is because we could replace ¬(S^ℓ y = t) by (S^ℓ y < t) ∨ (t < S^ℓ y), and similarly for the inequalities, then distribute the ∧ over the ∨, and then distribute the ∃ over the ∨, handling each case separately. Next, we may assume that the ℓ's in each S^ℓ y are the same. This is because we have t = u if and only if St = Su (due to the injectivity axiom for S) and t < u if and only if St < Su (by what we showed above).

For each i, we denote the t in φ_i by t_i. Let E = {i : φ_i is S^ℓ y = t_i}, let L = {i : φ_i is t_i < S^ℓ y}, and let U = {i : φ_i is S^ℓ y < t_i}.
Suppose first that E ≠ ∅. Fix j ∈ E. We then have

T ⊢ ∃y(⋀_{i=1}^{m} φ_i) ↔ ((⋀_{p=0}^{ℓ−1} (t_j ≠ S^p 0)) ∧ ⋀_{i∈E} (t_j = t_i) ∧ ⋀_{i∈L} (t_i < t_j) ∧ ⋀_{i∈U} (t_j < t_i))

Suppose now that E = ∅. If U = ∅, then

T ⊢ ∃y(⋀_{i=1}^{m} φ_i) ↔ x_1 = x_1

If L = ∅, then

T ⊢ ∃y(⋀_{i=1}^{m} φ_i) ↔ ⋀_{i∈U} (S^ℓ 0 < t_i)

Finally, if L ≠ ∅ and U ≠ ∅, then

T ⊢ ∃y(⋀_{i=1}^{m} φ_i) ↔ (⋀_{i∈U} (S^ℓ 0 < t_i) ∧ ⋀_{i∈L, j∈U} (S t_i < t_j))
18.3
Definition 18.3.1. Let L = {0, 1, <, +} where 0 and 1 are constant symbols, < is a binary relation symbol, and + is a binary function symbol. Let N_A = (N, 0, 1, <, +).

Given n ∈ N^+ and y ∈ Var, we use n·y as shorthand for y + y + ⋯ + y (n times), and given n ∈ N, we use n as shorthand for n·1.
Proposition 18.3.2. Th(N_A) does not have QE.

Proof. Notice that if φ(x) ∈ Form_L is quantifier-free, then {n ∈ N : (N_A, n) ⊨ φ} is either finite or cofinite, because atomic formulas are linear equations and inequalities. Since ∃y(y + y = x) defines the set of even numbers, which is neither finite nor cofinite, it follows that ∃y(y + y = x) is not equivalent to any quantifier-free formula.
Definition 18.3.3. Let Ax_A be the following set of L-sentences.

1. ∀x(x ≠ 0 → ∃y(y + 1 = x))
2. ∀x ¬(x < x)
3. ∀x∀y∀z((x < y ∧ y < z) → x < z)
4. ∀x∀y((x < y) ∨ (y < x) ∨ (x = y))
5. ∀x(x ≠ 0 → 0 < x)
6. ∀x(x < x + 1)
7. ∀x∀y ¬(x < y ∧ y < x + 1)
8. ∀x(x + 0 = x)
9. ∀x∀y∀z((x + y) + z = x + (y + z))
10. ∀x∀y(x + y = y + x)
11. ∀x∀y(x < y → ∃z(x + z = y))
12. ∀w∀x∀y∀z((w < y ∧ (x < z ∨ x = z)) → w + x < y + z)
13. ∀x∃y ⋁_{k<n} (x = n·y + k) for each n ≥ 2.

Lemma 18.3.4.

1. Ax_A ⊢ ∀x(x < ℓ ↔ ⋁_{k=0}^{ℓ−1} (x = k)) whenever ℓ ∈ N^+.
Chapter 19
Number Theory
19.1
Definability in N
Definition 19.1.1. Let L = {0, 1, +, ·} where 0 and 1 are constant symbols, and + and · are binary function symbols. Let N = (N, 0, 1, +, ·).
Theorem 19.1.2 (Essentially Gödel). The collection of partial computable functions equals the collection of partial functions obtained by starting with O, S, I_i^n, +, ·, and Equal, and closing off under Compose and Minimize.
Proof. Let F be the collection of all such functions. Since +, ·, and Equal are all partial computable, it follows by a simple induction that every element of F is partial recursive (i.e. partial computable). For the converse, we need to show that if g : N^n → N and h : N^{n+2} → N are in F, then PrimRec(g, h) ∈ F. Our method to accomplish this will be to define a function γ : N² → N with γ ∈ F and with the property that for all a_0, a_1, ..., a_n ∈ N, there exists c ∈ N such that γ(c, i) = a_i for all i ≤ n. That is, we want a function of two variables in F which is able to code finite sequences. You may think that we've already done this through our sequence decoding function. However, our coding of sequences uses powers of primes, and exponentiation is defined using primitive recursion, which at this point is not obviously in F (we will know that it is once we prove the theorem, but we don't know that yet).

Suppose that we have such a function γ. Suppose that g : N^n → N and h : N^{n+2} → N are in F. Let f = PrimRec(g, h). Consider the function t : N^{n+1} → N defined by

t(x⃗, y) = μz[γ(z, 0) = g(x⃗) ∧ (∀i < y)(γ(z, i + 1) = h(x⃗, i, γ(z, i)))]

Assuming that F is closed under some basic operations (like ∧ and bounded quantification), it follows that t ∈ F. Since f(x⃗, y) = γ(t(x⃗, y), y), it follows that f ∈ F. I'll leave it to you to think about why F is closed under these basic operations, and move on to the subtle part of the argument.
Rather than defining our function of two variables all at once, we'll first make a function of three variables work. Assuming some pairing function exists in F (it does, see below), this will be sufficient. Thus, we define a function β : N³ → N in F such that for all a_0, a_1, ..., a_n ∈ N, there exist b, k ∈ N such that β(b, k, i) = a_i for all i ≤ n. The idea for this function is to get the a_i's as remainders upon division of the number b by n + 1 numbers given in terms of k and i. To this end, recall the Chinese Remainder Theorem, which says that if d_0, d_1, ..., d_n ∈ N are pairwise relatively prime, and a_0, a_1, ..., a_n ∈ N satisfy a_i < d_i for all i, then there exists m ∈ N such that m ≡ a_i (mod d_i) for all i.

Let β(b, k, i) be the remainder upon division of b by 1 + (i + 1)k. To see that β ∈ F, simply notice that β(b, k, i) = μr[(∃q < b + 1)(b = q·(1 + (i + 1)k) + r)]. We now need to show that β works. Suppose that a_0, a_1, ..., a_n ∈ N. Let s = max{n, a_0, a_1, ..., a_n} + 1 and let k = s!. We first argue that the numbers 1 + k, 1 + 2k, ..., 1 + (n + 1)k are pairwise relatively prime. Suppose that 1 ≤ i, j ≤ n + 1, that p is prime, and that p | (1 + ik) and p | (1 + jk). Since p | (1 + ik), we must have p ∤ k, hence it must be the case that p > s. We also have p | (i − j)k, hence p | (i − j) (because p is prime and p ∤ k). Therefore, since p > s > n ≥ |i − j|, it must be the case that i = j. Thus, the numbers 1 + k, 1 + 2k, ..., 1 + (n + 1)k are pairwise relatively prime. Since a_i < s ≤ k < 1 + (i + 1)k for all i, it follows by the Chinese Remainder Theorem that there exists b ∈ N such that β(b, k, i) = a_i for all i.
Next, we need a pairing function which is in F. Rather than using powers of primes, and thus exponentiation, we will make do with the standard "diagonalizing through the plane" encoding. That is, we define J : N² → N by letting

J(x, y) = (∑_{i=1}^{x+y} i) + x = (x + y)(x + y + 1)/2 + x

which simplifies to

J(x, y) = ((x + y)² + 3x + y)/2

Notice that J(x, y) = μz(2z = (x + y)² + 3x + y), so J ∈ F. Now define L : N → N by letting L(z) = μx((∃y < z + 1)(J(x, y) = z)) and R : N → N by letting R(z) = μy((∃x < z + 1)(J(x, y) = z)), and notice that L, R ∈ F.

Finally, define γ : N² → N by letting γ(c, i) = β(L(c), R(c), i), and notice that γ ∈ F. Given a_0, a_1, ..., a_n ∈ N, we then have that there exists c ∈ N such that γ(c, i) = a_i for all i ≤ n.
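Here is a sketch of J, L, R, and β; the code function simply searches for the b promised by the Chinese Remainder Theorem, which is fine for a small demonstration.

```python
# Sketch of the pairing function and the beta function from this proof.
from math import factorial

def J(x, y):
    return ((x + y) * (x + y + 1)) // 2 + x

def L(z):
    return next(x for x in range(z + 1) for y in range(z + 1) if J(x, y) == z)

def R(z):
    return next(y for x in range(z + 1) for y in range(z + 1) if J(x, y) == z)

def beta(b, k, i):
    # Remainder of b upon division by 1 + (i + 1) * k.
    return b % (1 + (i + 1) * k)

def code(seq):
    # Find (b, k) with beta(b, k, i) = seq[i] for all i: k = s! works, and
    # b exists by the Chinese Remainder Theorem (we just search for it).
    s = max([len(seq) - 1] + list(seq)) + 1
    k = factorial(s)
    b = 0
    while any(beta(b, k, i) != x for i, x in enumerate(seq)):
        b += 1
    return b, k

b, k = code([2, 0, 1])
assert [beta(b, k, i) for i in range(3)] == [2, 0, 1]
assert (L(J(4, 9)), R(J(4, 9))) == (4, 9)
```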
Theorem 19.1.3. Every computable relation and (the graph of) every partial computable function is definable in N.

Proof. We first show that (the graph of) every partial computable function is definable in N.

Suppose that h : N^m → N is definable in N and that g_1, g_2, ..., g_m : N^n → N are definable in N. Let f = Compose(h, g_1, g_2, ..., g_m). Fix θ(y_1, y_2, ..., y_m, z) ∈ Form_L defining h in N, and fix ψ_i(x_1, x_2, ..., x_n, y) ∈ Form_L defining g_i in N. Let φ(x_1, x_2, ..., x_n, z) ∈ Form_L be the formula

∃y_1∃y_2 ⋯ ∃y_m(⋀_{i=1}^{m} ψ_i(x_1, x_2, ..., x_n, y_i) ∧ θ(y_1, y_2, ..., y_m, z))
19.2
Our goal in this section is to prove the following theorem which is a weak form of the First Incompleteness
Theorem. We will strengthen it later, but all of the real insight and hard work is in this version anyway.
Theorem 19.2.1 (First Incompleteness Theorem - Gödel). Th(N) is undecidable and not axiomatizable.

Notice that since Th(N) is a complete theory, we know that it is undecidable if and only if it is not axiomatizable. Thus, it suffices to prove only one. We will give three proofs below.
19.2.1
Proof of Incompleteness Theorem via Computability. Suppose that ♯Σ is computable and that Cn(Σ) = Th(N). Let K be a c.e. set which is not computable. We then know that N\K is not c.e. Fix a formula φ(x) defining K in N. Notice that the set {n ∈ N : Σ ⊢ ¬φ(n)} is c.e. However, we have that

n ∉ K ⟺ N ⊨ ¬φ(n) ⟺ Σ ⊢ ¬φ(n)

so N\K = {n ∈ N : Σ ⊢ ¬φ(n)} is c.e., a contradiction.
19.2.2
Theorem 19.2.2 (Undefinability of Truth - Tarski). The set ♯Th(N) is not definable in N.

Proof. Suppose that {♯σ : N ⊨ σ} is definable in N, and fix τ(x) ∈ Form_L defining it, so that we have

N ⊨ σ ⟺ N ⊨ τ(♯σ)

for all σ ∈ Sent_L. The idea is to show that there is a definable subset of N² such that every definable subset of N appears as a row. We can then definably diagonalize out by taking the negation of the diagonal to get a contradiction.

Notice that the function f : N² → N given by letting

f(m, n) = ♯φ(n)  if m = ♯φ for a formula φ with one free variable
          0      otherwise

is computable. Fix ψ(x, y, z) defining (the graph of) f in N. Let θ(x, y) be the formula ∃z(ψ(x, y, z) ∧ τ(z)). Notice that for all φ(x) ∈ Form_L and all n ∈ N, we have

N ⊨ θ(♯φ, n) ⟺ N ⊨ τ(f(♯φ, n))
             ⟺ N ⊨ τ(♯φ(n))
             ⟺ N ⊨ φ(n)

Now let ρ(x) be the formula ¬θ(x, x) (so ρ defines the complement of the diagonal). The point is that we diagonalized out, so this can't be one of the rows. But it must be one of the rows, since in fact it must be in row number ♯ρ. Formally, we have

N ⊨ ρ(♯ρ) ⟺ N ⊨ ¬θ(♯ρ, ♯ρ)
           ⟺ N ⊭ θ(♯ρ, ♯ρ)
           ⟺ N ⊭ ρ(♯ρ)

which is a contradiction.
Proof of Incompleteness Theorem via Definability. If Th(N) is decidable, then ♯Th(N) is computable, hence definable in N, contradicting Undefinability of Truth. Therefore, Th(N) is undecidable.
19.2.3

Our next proof of the Incompleteness Theorem uses the following fundamental lemma, which allows us to make sentences that indirectly refer to themselves.
Lemma 19.2.3 (Fixed-Point Lemma - Gödel). Let τ(x) ∈ Form_L. There exists σ ∈ Sent_L such that

N ⊨ σ ↔ τ(♯σ)

Proof. As above, notice that the function f : N² → N defined by letting

f(m, n) = ♯φ(n)  if m = ♯φ for a formula φ with one free variable
          0      otherwise

is computable. Fix ψ(x, y, z) defining (the graph of) f in N. Let θ(x, y) be the formula ∃z(ψ(x, y, z) ∧ τ(z)). Notice that for all φ(x) ∈ Form_L and all n ∈ N, we have

N ⊨ θ(♯φ, n) ⟺ N ⊨ τ(f(♯φ, n))
             ⟺ N ⊨ τ(♯φ(n))

Now let δ(x) be the formula θ(x, x) (so δ defines the diagonal). The point here is to look at what happens when the row ♯δ which is defining the diagonal actually meets the diagonal. That is, we should look at the (♯δ, ♯δ) entry of the table. We have

N ⊨ δ(♯δ) ⟺ N ⊨ θ(♯δ, ♯δ)
           ⟺ N ⊨ τ(♯δ(♯δ))

Thus, if we let σ = δ(♯δ), we then have that N ⊨ σ if and only if N ⊨ τ(♯σ). That is, N ⊨ σ ↔ τ(♯σ).
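The Fixed-Point Lemma is the logical cousin of a quine, a program that reproduces its own source code; the self-application σ = δ(♯δ) is exactly the trick behind the standard two-line Python quine:

```python
# A quine: the "template" s is applied to (a quotation of) itself,
# just as delta is applied to its own Goedel number in the proof above.
s = 's = {!r}\nprint(s.format(s))'
print(s.format(s))
```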
We first show how to get Undefinability of Truth using the Fixed-Point Lemma. The idea is to take a purported definition of truth, and use it to get a sentence which indirectly says that it is false.

Using the Fixed-Point Lemma to Prove Undefinability of Truth. Suppose that the set {♯σ : N ⊨ σ} is definable in N, and fix τ(x) ∈ Form_L defining it, so that

N ⊨ σ ⟺ N ⊨ τ(♯σ)

for all σ ∈ Sent_L. By the Fixed-Point Lemma, there exists σ ∈ Sent_L such that N ⊨ σ ↔ ¬τ(♯σ). We then have that

N ⊨ τ(♯σ) ⟺ N ⊨ σ ⟺ N ⊨ ¬τ(♯σ)

a contradiction.
We now give another proof of incompleteness using a sentence which indirectly asserts that it is not provable.

Proof of Incompleteness Theorem via Self-Reference. Suppose that Σ ⊆ Th(N) and that ♯Σ is computable. We then have that the set {♯σ : Σ ⊢ σ} is c.e., so is definable in N. Fix Prv_Σ(x) ∈ Form_L defining {♯σ : Σ ⊢ σ} in N, so that

Σ ⊢ σ ⟺ N ⊨ Prv_Σ(♯σ)

for all σ ∈ Sent_L. By the Fixed-Point Lemma, there exists σ ∈ Sent_L such that

N ⊨ σ ↔ ¬Prv_Σ(♯σ)

Now if Σ ⊢ σ, we would then have that N ⊨ σ (because Σ ⊆ Th(N)), but we also have

Σ ⊢ σ ⟹ N ⊨ Prv_Σ(♯σ)
      ⟹ N ⊨ ¬σ

which is a contradiction. Therefore, we must have Σ ⊬ σ. It follows that N ⊭ Prv_Σ(♯σ), so N ⊨ ¬Prv_Σ(♯σ), and hence N ⊨ σ. Therefore, σ ∈ Th(N)\Cn(Σ), so Cn(Σ) ≠ Th(N).

It follows that Th(N) is not axiomatizable.
19.3
19.3.1
Robinson's Q
Definition 19.3.1. Let L = {0, S, +, ·} and let Ax_Q be the following set of L-sentences.

1. ∀x∀y(Sx = Sy → x = y)
2. ∀x(Sx ≠ 0)
3. ∀x(x ≠ 0 → ∃y(Sy = x))
4. ∀x(x + 0 = x)
5. ∀x∀y(x + Sy = S(x + y))
6. ∀x(x · 0 = 0)
7. ∀x∀y(x · Sy = x · y + x)

Let Q = Cn(Ax_Q).
Definition 19.3.2. Let ≤(x, y) be the formula ∃z(z + x = y), and let <(x, y) be the formula ≤(x, y) ∧ x ≠ y. For each k ∈ N, we write k̄ for the numeral S^k 0.

Proposition 19.3.3.

1. Q ⊢ ∀x(x + k̄ = S^k x) for all k ∈ N.
2. Q ⊢ ∀x(x + \overline{k+1} ≠ k̄) for all k ∈ N.
3. Q ⊢ ≤(k̄, ℓ̄) ↔ ⋁_{i=0}^{ℓ} (k̄ = ī) for all k, ℓ ∈ N.
4. Q ⊢ ∀x(<(x, k̄) ∨ x = k̄ ∨ <(k̄, x)) for all k ∈ N.
5. Q ⊢ ¬(k̄ = ℓ̄) for all k ≠ ℓ.
6. Q ⊢ k̄ + ℓ̄ = \overline{k + ℓ} for all k, ℓ ∈ N.
7. Q ⊢ k̄ · ℓ̄ = \overline{k · ℓ} for all k, ℓ ∈ N.
Proposition 19.3.4. For every variable-free t ∈ Term_L, there exists k ∈ N such that Q ⊢ t = k̄.

Proof. By induction on t, using parts 6 and 7 of Proposition 19.3.3.
19.3.2 Peano Arithmetic

Definition 19.3.7. Let PA be the L-theory axiomatized by Ax_Q together with the sentences

∀p⃗((φ(0, p⃗) ∧ ∀x(φ(x, p⃗) → φ(Sx, p⃗))) → ∀x φ(x, p⃗))

for all φ(x, p⃗) ∈ Form_L.
19.4

Definition 19.4.1. Let L be a computable language containing a constant symbol 0 and a unary function symbol S. Let T be an L-theory.

1. A relation R ⊆ N^n is representable in T if there exists φ(x_1, x_2, ..., x_n) ∈ Form_L such that for all k_1, k_2, ..., k_n ∈ N, we have

(a) If (k_1, k_2, ..., k_n) ∈ R, then T ⊢ φ(k̄_1, k̄_2, ..., k̄_n).
(b) If (k_1, k_2, ..., k_n) ∉ R, then T ⊢ ¬φ(k̄_1, k̄_2, ..., k̄_n).

2. A function f : N^n → N is representable in T if graph(f) ⊆ N^{n+1} is representable in T (as a relation).

Proposition 19.4.2. Let T be an axiomatizable theory.

1. If R ⊆ N^n is representable in T, then R is computable.
2. If f : N^n → N is representable in T, then f is computable.

Proof. Since T is axiomatizable, we may fix a set Σ such that ♯Σ is computable and Cn(Σ) = T.

1. Suppose that R ⊆ N^n is representable in T, and fix φ(x_1, x_2, ..., x_n) ∈ Form_L representing it. Notice that the function g : N^n → N defined by g(k_1, k_2, ..., k_n) = ♯φ(k̄_1, k̄_2, ..., k̄_n) is computable. Now we know that the set {♯σ : Σ ⊢ σ} is c.e., hence both R and N^n\R are c.e. It follows that R is computable.

2. By part 1, we know that graph(f) is computable, so f is computable.
19.5 Working From Q

Lemma 19.5.2. Let Σ ⊆ Sent_L and let σ ∈ Sent_L. If Cn(Σ) is decidable, then Cn(Σ ∪ {σ}) is also decidable.

Proof. Notice that τ ∈ Cn(Σ ∪ {σ}) if and only if σ → τ ∈ Cn(Σ).
Theorem 19.5.3 (Strong Undecidability of Q). If ♯Σ is computable and Σ ∪ Q is consistent, then Cn(Σ) is undecidable.

Proof. Suppose that Cn(Σ) is decidable. Let T = Cn(Σ ∪ Q), and notice that T is decidable by the lemma. Fix φ(x) ∈ Form_L representing ♯T in Q. By the Fixed-Point Lemma, there exists σ ∈ Sent_L such that Q ⊢ σ ↔ ¬φ(♯σ). We then have

σ ∈ T ⟹ ♯σ ∈ ♯T
      ⟹ Q ⊢ φ(♯σ)    (since φ represents ♯T in Q)
      ⟹ Q ⊢ ¬σ       (by choice of σ)
      ⟹ ¬σ ∈ T       (since Q ⊆ T)

and

σ ∉ T ⟹ ♯σ ∉ ♯T
      ⟹ Q ⊢ ¬φ(♯σ)   (since φ represents ♯T in Q)
      ⟹ Q ⊢ σ        (by choice of σ)
      ⟹ σ ∈ T        (since Q ⊆ T)

The first case contradicts the consistency of T, and the second is an immediate contradiction. Therefore, Cn(Σ) is undecidable.
19.6

Definition 19.6.1. Let Σ ⊆ Sent_L be such that ♯Σ is computable. We then have that the set {(m, n) ∈ N² : n codes a deduction witnessing that Σ proves the sentence coded by m} is computable. Denote by ψ_Σ(x, y) a formula representing the above set in Q. Let Prv_Σ(x) be the formula ∃y ψ_Σ(x, y).

Lemma 19.6.2. Suppose that Σ ⊆ Sent_L and that ♯Σ is computable. If Σ ⊢ σ, then Q ⊢ Prv_Σ(♯σ).

Proof. Suppose that Σ ⊢ σ, and let n be the Gödel number of a deduction witnessing that Σ ⊢ σ. We then have that Q ⊢ ψ_Σ(♯σ, n̄), hence Q ⊢ ∃y ψ_Σ(♯σ, y), which is to say that Q ⊢ Prv_Σ(♯σ).
Definition 19.6.3. Suppose that Σ ⊆ Sent_L and that ♯Σ is computable. We say that Σ has the reflection property if whenever Σ ⊢ σ, we have Σ ⊢ Prv_Σ(♯σ).

Corollary 19.6.4. Suppose that Σ ⊆ Sent_L and that ♯Σ is computable. If Q ⊆ Cn(Σ), then Σ has the reflection property.

Definition 19.6.5. Suppose that Σ ⊆ Sent_L and that ♯Σ is computable. A Gödel sentence of Σ is a σ ∈ Sent_L such that

Σ ⊢ σ ↔ ¬Prv_Σ(♯σ)

Proposition 19.6.6. If Σ ⊆ Sent_L, ♯Σ is computable, and Q ⊆ Cn(Σ), then Σ has a Gödel sentence.

Proposition 19.6.7. Suppose that Σ ⊆ Sent_L, ♯Σ is computable, and Q ⊆ Cn(Σ). Let σ be a Gödel sentence of Σ. If Σ is consistent, then Σ ⊬ σ.
Proof. Suppose that Σ is consistent and, for a contradiction, that Σ ⊢ σ. Notice that

Σ ⊢ σ ⟹ Σ ⊢ Prv_Σ(♯σ)       (by reflection)
      ⟹ Σ ⊢ ¬σ               (since σ is a Gödel sentence of Σ)
      ⟹ Σ is inconsistent

a contradiction, so Σ ⊬ σ.
Definition 19.6.8. Given a set Σ ⊆ Sent_L such that ♯Σ is computable, let Con_Σ be the sentence ¬Prv_Σ(♯(0 = 1)).

Definition 19.6.9. Suppose that Σ ⊆ Sent_L and that ♯Σ is computable. We say that Σ is sufficiently strong if

1. Q ⊆ Cn(Σ).
2. For any σ ∈ Sent_L, we have Σ ⊢ Prv_Σ(♯σ) → Prv_Σ(♯Prv_Σ(♯σ)). We call this formalized reflection.
3. For any σ, τ ∈ Sent_L, we have Σ ⊢ (Prv_Σ(♯σ) ∧ Prv_Σ(♯(σ → τ))) → Prv_Σ(♯τ). We call this formalized Modus Ponens.
Theorem 19.6.10 (Second Incompleteness Theorem - Gödel). Suppose that Σ ⊆ Sent_L, that ♯Σ is computable, and that Cn(Σ) is sufficiently strong. We then have that Σ ⊢ Con_Σ if and only if Σ is inconsistent.

Proof. Let σ be a Gödel sentence of Σ. We formalize the proof of Proposition 19.6.7 (which says that if Σ is consistent then Σ ⊬ σ) inside Σ to show that Σ ⊢ Con_Σ → ¬Prv_Σ(♯σ). From this it will follow that Σ ⊢ Con_Σ → σ, which will give the result, as we'll see below.

1. Notice that Σ ⊢ Prv_Σ(♯σ) → Prv_Σ(♯Prv_Σ(♯σ)) by formalized reflection (so the first implication in the proof of Proposition 19.6.7 holds inside Σ).

2. Now since Σ ⊢ Prv_Σ(♯σ) → ¬σ by choice of σ, we have Σ ⊢ Prv_Σ(♯(Prv_Σ(♯σ) → ¬σ)) by reflection. Using formalized Modus Ponens, it follows that Σ ⊢ Prv_Σ(♯Prv_Σ(♯σ)) → Prv_Σ(♯¬σ) (so the second implication in the proof of Proposition 19.6.7 holds inside Σ).

3. By combining 1 and 2, we therefore have that Σ ⊢ Prv_Σ(♯σ) → Prv_Σ(♯¬σ).

4. Notice that ⊢ σ → (¬σ → (0 = 1)), so Σ ⊢ Prv_Σ(♯(σ → (¬σ → (0 = 1)))) by reflection. By formalized Modus Ponens, it follows that Σ ⊢ Prv_Σ(♯σ) → Prv_Σ(♯(¬σ → (0 = 1))).

5. Combining 3 and 4, we see that Σ ⊢ Prv_Σ(♯σ) → (Prv_Σ(♯¬σ) ∧ Prv_Σ(♯(¬σ → (0 = 1)))).

6. Therefore, Σ ⊢ Prv_Σ(♯σ) → Prv_Σ(♯(0 = 1)) by formalized Modus Ponens, which is to say that Σ ⊢ Prv_Σ(♯σ) → ¬Con_Σ.

7. Hence, Σ ⊢ Con_Σ → ¬Prv_Σ(♯σ).

8. Since σ is a Gödel sentence of Σ, it follows that Σ ⊢ Con_Σ → σ.

Therefore, if Σ ⊢ Con_Σ, it would follow that Σ ⊢ σ, which would imply that Σ was inconsistent by Proposition 19.6.7. Conversely, an inconsistent Σ proves everything, including Con_Σ.
Corollary 19.6.11. There exists a consistent, decidable Σ with PA ⊆ Cn(Σ) such that Σ ⊢ ¬Con_Σ.

Proof. Let Σ be the axioms of PA together with ¬Con_PA.
19.7
Diophantine Sets
Proof. Notice that n ∈ D if and only if there exist ℓ, m ∈ N^+ with n = (ℓ + 1)(m + 1).

Proposition 19.7.3. The set D = {n ∈ N^+ : n is not a power of 2} is Diophantine.

Proof. Notice that n ∈ D if and only if there exist ℓ, m ∈ N^+ with n = ℓ(2m + 1).
Proposition 19.7.4. Every Diophantine set is c.e.
Theorem 19.7.5 (Davis, Putnam, Robinson, Matiyasevich). Every c.e. set is Diophantine.
Corollary 19.7.6. Suppose that Σ ⊆ Th(N) is such that ♯Σ is computable. There exists f ∈ Z[x_1, x_2, ..., x_n] such that if σ is the sentence expressing that f has no root in N^+, then N ⊨ σ but Σ ⊬ σ.
19.8
A Speed-Up Theorem
Theorem 19.8.1 (Gödel). Suppose that ♯Σ is computable, that Q ⊆ Cn(Σ), and that σ is such that Σ ⊬ σ and Σ ⊬ ¬σ. For any computable h : N → N, there exist τ and n such that

1. Σ ⊢ τ.
2. Σ ∪ {σ} ⊢ τ via a proof of length at most n.
3. Every proof of τ from Σ has length at least h(n).

Proof. We first argue that Cn(Σ ∪ {σ})\Cn(Σ) is not c.e. Suppose instead that Cn(Σ ∪ {σ})\Cn(Σ) was c.e. We show that the complement of Cn(Σ ∪ {¬σ}) is c.e., implying that Cn(Σ ∪ {¬σ}) is computable, which contradicts the Strong Undecidability of Q (notice that Σ ∪ {¬σ} ⊇ Q is consistent because Σ ⊬ σ). We have

τ ∉ Cn(Σ ∪ {¬σ}) ⟺ Σ ∪ {¬σ} ⊬ τ
                  ⟺ Σ ⊬ ¬σ → τ
                  ⟺ ¬σ → τ ∉ Cn(Σ)
                  ⟺ ¬σ → τ ∈ Cn(Σ ∪ {σ})\Cn(Σ)

where the last equivalence holds because Σ ∪ {σ} ⊢ ¬σ → τ for every τ.

Suppose then that h : N → N is computable but that there is no τ and n satisfying the above three conditions. We show that Cn(Σ ∪ {σ})\Cn(Σ) is c.e. Given τ, wait until (if ever) we see it enter Cn(Σ ∪ {σ}) via a proof of length n. If we ever see this happen, check all proofs from Σ up to length h(n) to see if τ appears. If not, enumerate τ.