Axiomatic Set Theory
Peter Philip∗
Lecture Notes
Created for the Class of Summer Semester 2024 at LMU Munich†
May 15, 2024
∗ E-Mail: [email protected]
† Resources used in the preparation of this text include [Kun12, Kun13].

Contents
1 Motivation and Preliminaries
  1.1 Cantor’s Definition, Russell’s Antinomy
  1.2 Mathematical Logic
  1.3 Set-Theoretic Formulas
2 Zermelo-Fraenkel Set Theory (ZF)
  2.1 Existence, Extensionality
  2.2 Models, Independence Results
  2.3 Comprehension
  2.4 Classes
  2.5 Pairing
  2.6 Union
  2.7 Replacement
    2.7.1 Replacement Scheme, Cartesian Products
    2.7.2 Relations and Functions
    2.7.3 Ordinals
  2.8 Infinity
    2.8.1 Natural Numbers
    2.8.2 Transfinite Induction on Well-Founded Relations
    2.8.3 Transfinite Recursion on Well-Founded Relations
  2.9 Power Set
  2.10 Foundation
3 The Axiom of Choice
References
1 Motivation and Preliminaries
1.1 Cantor’s Definition, Russell’s Antinomy

In 1895 in [Can95], Georg Cantor defined a set as “any collection into a whole M of
definite and separate objects m of our intuition or our thought”. The objects m are
called the elements of the set M and one writes m ∈ M if, and only if, m is an element
of M.
As it turns out, naive set theory, founded on Cantor’s definition, is not suitable to be
used in the foundation of mathematics. The problem lies in the possibility of obtaining
contradictions such as Russell’s antinomy, after Bertrand Russell, who described it in
1901, see [Rus80, Rus96].
Russell’s antinomy is obtained when considering the set X of all sets that do not contain
themselves as an element: When asking the question if X ∈ X, one obtains the
contradiction that X ∈ X ⇔ X ∉ X:
Suppose X ∈ X. Then X is a set that contains itself. But X was defined to contain
only sets that do not contain themselves, i.e. X ∉ X.
So suppose X ∉ X. Then X is a set that does not contain itself. Thus, by the definition
of X, X ∈ X.
Perhaps you think Russell’s construction is rather academic, but it is easily translated
into a practical situation. Consider a library. The catalog C of the library should contain
all the library’s books. Since the catalog itself is a book of the library, it should occur
as an entry in the catalog. So there can be catalogs such as C that have themselves as
an entry and there can be other catalogs that do not have themselves as an entry. Now
one might want to have a catalog X of all catalogs that do not have themselves as an
entry. As in Russell’s antinomy, one is led to the contradiction that the catalog X must
have itself as an entry if, and only if, it does not have itself as an entry.
One can construct arbitrarily many versions, which we will not do. Just one more:
Consider a small town with a barber who, each day, shaves precisely those inhabitants who
do not shave themselves. The poor barber now faces a terrible dilemma: He will have to shave
himself if, and only if, he does not shave himself.
To avoid contradictions such as Russell’s antinomy, axiomatic set theory restricts the
construction of sets via so-called axioms, as we will see below.
1.2 Mathematical Logic

The development and presentation of axiomatic set theory is based on mathematical
logic. Indeed, mathematical logic is a large field in its own right and a thorough intro-
duction is beyond the scope of this class – the interested reader may refer to [EFT07],
[Kun12], and references therein. Still, it will be necessary to at least introduce some
basic concepts. Occasionally, we will touch on some deeper logical issues and subtleties,
usually referring to the literature for further information.
One can view the central goal of mathematics as the rigorous proof of the truth or
falsehood of statements. By a statement or proposition, we mean any sentence (any
sequence of symbols) that can reasonably be assigned a truth value, i.e. a value of either
true, abbreviated T, or false, abbreviated F. For example, “2 + 3 = 5” is a true statement,
“√2 < 0” is a false statement, whereas “3 · 5 + 7” and “x + 1 > 0” are not statements
at all.
Statements can be manipulated or combined into new statements using logical operators,
where the truth value of the combined statements depends on the truth values of the
original statements and on the type of logical operator facilitating the combination.
The simplest logical operator is negation, denoted ¬. It is a so-called unary operator, i.e.
it does not combine statements, but is merely applied to one statement. For example, if
A stands for the (true) statement “2 + 3 = 5”, then ¬A stands for the (false) statement
“2 + 3 ≠ 5”; if B stands for the (false) statement “√2 < 0”, then ¬B stands for the
(true) statement “√2 is not less than zero”, which can also be expressed as “√2 ≥ 0”.
To completely understand the action of a logical operator, one usually writes what is
known as a truth table. For negation, the truth table is
A ¬A
T F (1.1)
F T
that means if the input statement A is true, then the output statement ¬A is false; if
the input statement A is false, then the output statement ¬A is true.
We now proceed to discuss binary logical operators, i.e. logical operators combining
precisely two statements. The following four operators are essential for mathematical
reasoning:
Conjunction: A and B, usually denoted A ∧ B.
Disjunction: A or B, usually denoted A ∨ B.
Implication: A implies B, usually denoted A ⇒ B.
Equivalence: A is equivalent to B, usually denoted A ⇔ B.
The corresponding truth table reads:

A B A∧B A∨B A⇒B A⇔B
T T T T T T
T F F T F F (1.2)
F T F T T F
F F F F T T
Note that the disjunction A ∨ B is true if, and only if, at least one of the statements
A, B is true. Here one already has to be a bit careful – A ∨ B defines the inclusive or,
whereas “or” in common English is often understood to mean the exclusive or (which is
false if both input statements are true). Instead of A implies B, one also says if A then
B, B is a consequence of A, B is concluded or inferred from A, A is sufficient for B, or
B is necessary for A.
The implication A ⇒ B is always true, except if A is true and B is false. At first
glance, it might be surprising that A ⇒ B is defined to be true for A false and B true,
however, this is precisely what distinguishes the implication from the equivalence. After
a moment’s contemplation, one will most likely notice that one is quite familiar with
examples of incorrect statements implying correct statements: For instance, squaring
the (false) equality of integers −1 = 1, implies the (true) equality of integers 1 = 1.
Of course, the implication A ⇒ B is not really useful in situations, where the truth
values of both A and B are already known. Rather, in a typical application, one tries
to establish the truth of A to prove the truth of B (a strategy that will fail if A happens
to be false).
The equivalence A ⇔ B means A is true if, and only if, B is true. Analogous to the
situation of implications, A ⇔ B is not really useful if the truth values of both A and
B are known a priori, but can be a powerful tool to prove B to be true or false by
establishing the truth value of A.
Note that the expressions in the first row of the truth table (1.2) (e.g. A ∧ B) are not,
actually, statements, as they contain the statement variables (also known as proposi-
tional variables) A or B. However, the expressions become statements if all statement
variables are substituted with actual statements. We will call expressions of this form
propositional formulas. Moreover, if a truth value is assigned to each statement variable
of a propositional formula, then this uniquely determines the truth value of the formula.
In other words, the truth value of the propositional formula can be calculated from
the respective truth values of its statement variables – the presently discussed topic is,
therefore, known as propositional calculus.
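The calculation of truth values described above is entirely mechanical. The following Python sketch is not part of the original notes; the names `truth_table` and `show` are chosen here for illustration, and Python's `True`/`False` stand in for T/F. It tabulates a propositional formula, given as a Python function of its statement variables, over all assignments of truth values.

```python
from itertools import product

def truth_table(formula, n):
    """Return one row per assignment: the n variable values, then the formula's value."""
    return [values + (formula(*values),)
            for values in product([True, False], repeat=n)]

def show(header, formula, n):
    """Print the table in the T/F layout used for (1.1) and (1.2)."""
    print(header)
    for row in truth_table(formula, n):
        print(" ".join("T" if v else "F" for v in row))

# Three propositional formulas in the statement variables A and B.
conj = lambda A, B: A and B                   # A ∧ B
disj = lambda A, B: A or B                    # A ∨ B
not_a_and_not_b = lambda A, B: not (A and not B)  # ¬(A ∧ ¬B)

show("A B A∧B", conj, 2)
show("A B A∨B", disj, 2)
show("A B ¬(A∧¬B)", not_a_and_not_b, 2)
```

The rows are produced in the order T T, T F, F T, F F, as in (1.2). The last formula, ¬(A ∧ ¬B), yields the same column as A ⇒ B in (1.2).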
Example 1.1. (a) Consider the propositional formula (A ∧ B) ∨ (¬B). Suppose A is
true and B is false. The truth value of the formula is obtained according to the
following truth table:

A B A ∧ B ¬B (A ∧ B) ∨ (¬B)
T F F T T (1.3)
(b) The propositional formula A ∨ (¬A), also known as the law of the excluded middle,
has the remarkable property that its truth value is T for every possible choice of
truth values for A:
A ¬A A ∨ (¬A)
T F T (1.4)
F T T
Formulas with this property are of particular importance.
Definition 1.2. A propositional formula φ is called a tautology or universally true if,
and only if, its truth value is T for all possible assignments of truth values to all the
statement variables it contains. One writes ⊢ φ if, and only if, φ is a tautology.
Notation 1.3. We write φ(A1 , . . . , An ) if, and only if, the propositional formula φ
contains precisely the n statement variables A1 , . . . , An .
Definition 1.4. The propositional formulas φ(A1 , . . . , An ) and ψ(A1 , . . . , An ) are called
equivalent if, and only if, φ(A1 , . . . , An ) ⇔ ψ(A1 , . . . , An ) is a tautology.
—
For all logical purposes, two equivalent formulas are exactly the same – it does not
matter if one uses one or the other. The following Th. 1.6 provides some important
equivalences of propositional formulas. As too many parentheses tend to make formulas
less readable, we first introduce some precedence conventions for logical operators:
Convention 1.5. ¬ takes precedence over ∧, ∨, which take precedence over ⇒, ⇔. So,
for example,
(A ∨ ¬B ⇒ ¬B ∧ ¬A) ⇔ ¬C ∧ (A ∨ ¬D)
is the same as
((A ∨ (¬B)) ⇒ ((¬B) ∧ (¬A))) ⇔ ((¬C) ∧ (A ∨ (¬D))).
Theorem 1.6. (a) ⊢ (A ⇒ B) ⇔ ¬A ∨ B. This means one can actually define impli-
cation via negation and disjunction.

(b) ⊢ (A ⇔ B) ⇔ ((A ⇒ B) ∧ (B ⇒ A)), i.e. A and B are equivalent if, and only if,
A is both necessary and sufficient for B. One also calls the implication B ⇒ A the
converse of the implication A ⇒ B. Thus, A and B are equivalent if, and only if,
both A ⇒ B and its converse hold true.
(c) Commutativity of Conjunction: ⊢ A ∧ B ⇔ B ∧ A.
(d) Commutativity of Disjunction: ⊢ A ∨ B ⇔ B ∨ A.
(e) Associativity of Conjunction: ⊢ (A ∧ B) ∧ C ⇔ A ∧ (B ∧ C).
(f ) Associativity of Disjunction: ⊢ (A ∨ B) ∨ C ⇔ A ∨ (B ∨ C).
(g) Distributivity I: ⊢ A ∧ (B ∨ C) ⇔ (A ∧ B) ∨ (A ∧ C).
(h) Distributivity II: ⊢ A ∨ (B ∧ C) ⇔ (A ∨ B) ∧ (A ∨ C).
(i) De Morgan’s Law I: ⊢ ¬(A ∧ B) ⇔ ¬A ∨ ¬B.
(j) De Morgan’s Law II: ⊢ ¬(A ∨ B) ⇔ ¬A ∧ ¬B.
(k) Double Negative: ⊢ ¬¬A ⇔ A.
(l) Contraposition: ⊢ (A ⇒ B) ⇔ (¬B ⇒ ¬A).
Proof. Each equivalence is proved by providing a suitable truth table, showing that the
respective equivalence τ is a tautology: In each case, the final column of the truth table
shows that, for all possible assignments of truth values to A, B, C (where applicable),
τ has truth value T:
(a):
A B ¬A A ⇒ B ¬A ∨ B (A ⇒ B) ⇔ ¬A ∨ B
T T F T T T
T F F F F T
F T T T T T
F F T T T T
(b) – (h): Exercise.

(i):
A B ¬A ¬B A ∧ B ¬(A ∧ B) ¬A ∨ ¬B ¬(A ∧ B) ⇔ ¬A ∨ ¬B
T T F F T F F T
T F F T F T T T
F T T F F T T T
F F T T F T T T
(j): Exercise.
(k):
A ¬A ¬¬A ¬¬A ⇔ A
T F T T
F T F T
(l):
A B ¬A ¬B A ⇒ B ¬B ⇒ ¬A (A ⇒ B) ⇔ (¬B ⇒ ¬A)
T T F F T T T
T F F T F F T
F T T F T T T
F F T T T T T
Having checked all the equivalences completes the proof of the theorem.
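The truth-table argument used in the proof can also be carried out by brute force, including for the parts left as exercises. The following Python sketch is an illustration added here, not the author's method of proof; the names `imp`, `iff`, `is_tautology` and `THEOREM_1_6` are hypothetical. Implication and equivalence are read off directly from truth table (1.2), so that part (a) is not built into the definition of `imp`.

```python
from itertools import product

T, F = True, False
# Implication and equivalence, copied column by column from truth table (1.2).
IMPLIES = {(T, T): T, (T, F): F, (F, T): T, (F, F): T}
IFF = {(T, T): T, (T, F): F, (F, T): F, (F, F): T}

def imp(a, b):
    return IMPLIES[(a, b)]

def iff(a, b):
    return IFF[(a, b)]

def is_tautology(formula, n):
    """True if the formula has truth value T under all 2**n assignments."""
    return all(formula(*values) for values in product([T, F], repeat=n))

# (number of statement variables, formula) for each part of Th. 1.6.
THEOREM_1_6 = {
    "a": (2, lambda A, B: iff(imp(A, B), (not A) or B)),
    "b": (2, lambda A, B: iff(iff(A, B), imp(A, B) and imp(B, A))),
    "c": (2, lambda A, B: iff(A and B, B and A)),
    "d": (2, lambda A, B: iff(A or B, B or A)),
    "e": (3, lambda A, B, C: iff((A and B) and C, A and (B and C))),
    "f": (3, lambda A, B, C: iff((A or B) or C, A or (B or C))),
    "g": (3, lambda A, B, C: iff(A and (B or C), (A and B) or (A and C))),
    "h": (3, lambda A, B, C: iff(A or (B and C), (A or B) and (A or C))),
    "i": (2, lambda A, B: iff(not (A and B), (not A) or (not B))),
    "j": (2, lambda A, B: iff(not (A or B), (not A) and (not B))),
    "k": (1, lambda A: iff(not (not A), A)),
    "l": (2, lambda A, B: iff(imp(A, B), imp(not B, not A))),
}

results = {part: is_tautology(f, n) for part, (n, f) in THEOREM_1_6.items()}
print(results)

# For contrast: an implication is not equivalent to its converse.
converse_claim = lambda A, B: iff(imp(A, B), imp(B, A))
print(is_tautology(converse_claim, 2))
```

Every entry of `results` is `True`, whereas the final check fails, since the assignment A true, B false makes A ⇒ B false and B ⇒ A true.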
The importance of the rules provided by Th. 1.6 lies in their providing proof techniques,
i.e. methods for establishing the truth of statements from statements known or assumed
to be true. The rules of Th. 1.6 will be used frequently in proofs throughout this class.
Remark 1.7. Another important proof technique is the so-called proof by contradic-
tion, also called indirect proof. It is based on the observation, called the principle of
contradiction, that A ∧ ¬A is always false:
A ¬A A ∧ ¬A
T F F (1.5)
F T F
Thus, one possibility of proving a statement B to be true is to show ¬B ⇒ A ∧ ¬A for

some arbitrary statement A. Since the right-hand side of the implication is false, the
left-hand side must also be false, proving B is true.
—
Two more rules we will use regularly in subsequent proofs are the so-called transitivity
of implication and the transitivity of equivalence. In preparation for the transitivity
rules, we generalize implication to propositional formulas:
Definition 1.8. In generalization of the implication operator defined in (1.2), we say
the propositional formula φ(A1 , . . . , An ) implies the propositional formula ψ(A1 , . . . , An )
(denoted φ(A1 , . . . , An ) ⇒ ψ(A1 , . . . , An )) if, and only if, each assignment of truth values
to the A1 , . . . , An that makes φ(A1 , . . . , An ) true, makes ψ(A1 , . . . , An ) true as well, i.e.
if, and only if, ⊢ φ(A1 , . . . , An ) ⇒ ψ(A1 , . . . , An ).
Theorem 1.9. (a) Transitivity of Implication: ⊢ (A ⇒ B) ∧ (B ⇒ C) ⇒ (A ⇒ C).
(b) Transitivity of Equivalence: ⊢ (A ⇔ B) ∧ (B ⇔ C) ⇒ (A ⇔ C).
Proof. Both implications are proved by providing a suitable truth table, showing that
the respective implication τ (A, B, C) is a tautology: In each case, the final column of
the truth table shows that, for all possible assignments of truth values to A, B, and C,
τ (A, B, C) has truth value T. We carry out (a) and leave (b) as an exercise.
(a):
A B C A⇒B B⇒C (A ⇒ B) ∧ (B ⇒ C) A⇒C (A ⇒ B) ∧ (B ⇒ C) ⇒ (A ⇒ C)
T T T T T T T T
T F T F T F T T
F T T T T T T T
F F T T T T T T
T T F T F F F T
T F F F T F F T
F T F T F F T T
F F F T T T T T
(b): Exercise.
Definition and Remark 1.10. A proof of the statement B is a finite sequence of
statements A1, A2, ..., An such that A1 is true; for 1 ≤ i < n, Ai implies Ai+1; and An
implies B. If there exists a proof for B, then Th. 1.9(a) guarantees that B is true¹.

¹ Actually, more generally, a proof of the statement B is given by a finite sequence of statements
A1, A2, ..., An such that A1 is true; the logical conjunction A1 ∧ · · · ∧ Ai implies Ai+1 for 1 ≤ i < n;
and A1 ∧ · · · ∧ An implies B. It is then still correct that the existence of a proof of B guarantees B to
be true.

1.3 Set-Theoretic Formulas

The contradiction of Russell’s antinomy, as described in Sec. 1.1, is related to Cantor’s
sets not being hierarchical. Another source of contradictions in naive set theory is the
imprecise nature of informal languages such as English. Suppose B is a set and P(x) is
a statement about an element x of B (a so-called predicate of x). Then one might define
A := {x ∈ B : P(x)}
to be the subset of B, consisting of all elements of B such that P(x) is true. Now take
B := N := {1, 2, ...} to be the set of the natural numbers and let
P(x) := “The number x can be defined by fifty English words or less”. (1.6)
Then A is a finite subset of N, since there are only finitely many English words (if you
think there might be infinitely many English words, just restrict yourself to the words
contained in some concrete dictionary). Then there is a smallest natural number n that
is not in A. But then n is the smallest natural number that cannot be defined by
fifty English words or less, which, actually, defines n by fewer than fifty English words, in
contradiction to n ∉ A.
To avoid contradictions of this type², we require P(x) to be a so-called set-theoretic
formula.
Definition 1.11. (a) The language of set theory consists precisely of the following
symbols: ∧, ¬, ∃, (, ), ∈, =, vj , where j = 1, 2, . . . .
(b) A set-theoretic formula is a finite string of symbols from the above language of set
theory that can be built using the following recursive rules:
(i) vi ∈ vj is a set-theoretic formula for all i, j = 1, 2, . . . .

(ii) vi = vj is a set-theoretic formula for all i, j = 1, 2, . . . .
(iii) If φ and ψ are set-theoretic formulas, then (φ) ∧ (ψ) is a set-theoretic formula.
(iv) If φ is a set-theoretic formula, then ¬(φ) is a set-theoretic formula.
(v) If φ is a set-theoretic formula, then ∃vj (φ) is a set-theoretic formula for all
j = 1, 2, . . . .
Example 1.12. Examples of set-theoretic formulas are (v3 ∈ v5 ) ∧ (¬(v2 = v3 )),
∃v1 (¬(v1 = v1 )); examples of symbol strings that are not set-theoretic formulas are
v1 ∈ v2 ∈ v3 , ∃∃¬, and ∈ v3 ∃.
Remark 1.13. It is noted that, for a given finite string of symbols, a computer can, in
principle, check in finitely many steps, if the string constitutes a set-theoretic formula
or not. The symbols that can occur in a set-theoretic formula are to be interpreted as
follows³: The variables v1, v2, ... are variables for sets. The symbols ∧ and ¬ are to be
interpreted as the logical operators of conjunction and negation as described in Sec. 1.2.
Moreover, ∃ stands for a so-called existential quantifier: The statement ∃vj (φ) means
“there exists a set vj that has the property φ” (we will see many examples throughout
this class). Parentheses ( and ) are used to make clear the scope of the logical symbols
∃, ∧, ¬. Where the symbol ∈ occurs, it is interpreted to mean that the set to the left of
∈ is contained as an element in the set to the right of ∈. Similarly, = is interpreted to
mean that the sets occurring to the left and to the right of = are equal.

² The described contradiction is a variant of the so-called Berry paradox (see, e.g., [Wik22a] for
further information and references). While it is, clearly, not as easy to provide a variant of the Berry
paradox when using set-theoretic formulas, it does not seem at all obvious if it is, actually, impossible.
³ In the terminology of mathematical logic, Def. 1.11 provides the syntax of set-theoretic formulas,
whereas the interpretations given by the present Rem. 1.13 provide the semantics of set-theoretic
formulas.
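The claim of Rem. 1.13 that a computer can check whether a string is a set-theoretic formula can be made concrete. The following Python sketch is an illustration added here under stated assumptions: the function names `tokenize`, `parse` and `is_formula` are hypothetical, variables are written as `v1`, `v2`, ..., and whitespace between symbols is ignored. It implements the five rules of Def. 1.11(b) as a recursive-descent check.

```python
import re

# One token is a variable v1, v2, ... or one of the symbols of Def. 1.11(a).
TOKEN = re.compile(r"\s*(v[1-9][0-9]*|[∧¬∃()∈=])")

def tokenize(s):
    """Split s into tokens; return None if s contains a foreign symbol."""
    s = s.rstrip()
    tokens, pos = [], 0
    while pos < len(s):
        m = TOKEN.match(s, pos)
        if m is None:
            return None
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def is_var(tokens, i):
    return i < len(tokens) and tokens[i].startswith("v")

def parse_parenthesized(tokens, i):
    """Match '(' formula ')' at index i; return the index after it, or None."""
    if i >= len(tokens) or tokens[i] != "(":
        return None
    j = parse(tokens, i + 1)
    if j is None or j >= len(tokens) or tokens[j] != ")":
        return None
    return j + 1

def parse(tokens, i):
    """Match one formula at index i; return the index after it, or None."""
    if is_var(tokens, i):                      # rules (i) and (ii)
        if (i + 1 < len(tokens) and tokens[i + 1] in ("∈", "=")
                and is_var(tokens, i + 2)):
            return i + 3
        return None
    if i < len(tokens) and tokens[i] == "¬":   # rule (iv)
        return parse_parenthesized(tokens, i + 1)
    if i < len(tokens) and tokens[i] == "∃":   # rule (v)
        if not is_var(tokens, i + 1):
            return None
        return parse_parenthesized(tokens, i + 2)
    if i < len(tokens) and tokens[i] == "(":   # rule (iii)
        j = parse_parenthesized(tokens, i)
        if j is None or j >= len(tokens) or tokens[j] != "∧":
            return None
        return parse_parenthesized(tokens, j + 1)
    return None

def is_formula(s):
    tokens = tokenize(s)
    return tokens is not None and parse(tokens, 0) == len(tokens)

# The strings of Ex. 1.12.
for s in ["(v3 ∈ v5) ∧ (¬(v2 = v3))", "∃v1 (¬(v1 = v1))",
          "v1 ∈ v2 ∈ v3", "∃∃¬", "∈ v3 ∃"]:
    print(s, is_formula(s))
```

The check accepts the two formulas of Ex. 1.12 and rejects the three other strings. Each call consumes at least one token, so the procedure terminates after finitely many steps. Abbreviations such as ∨ or ∀ are deliberately not accepted, since they are not part of the language of Def. 1.11(a).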
Remark 1.14. A disadvantage of set-theoretic formulas as defined in Def. 1.11 is that

they quickly become lengthy and unreadable (at least to the human eye). To make
formulas more readable and concise, one introduces additional symbols and notation.
Formally, additional symbols and notation are always to be interpreted as abbreviations
or transcriptions of actual set-theoretic formulas. For example, we use the rules of Th.
1.6 to define the additional logical symbols ∨, ⇒, ⇔ as abbreviations:
(φ) ∨ (ψ) is short for ¬((¬(φ)) ∧ (¬(ψ))) (cf. Th. 1.6(j)), (1.7a)
(φ) ⇒ (ψ) is short for (¬(φ)) ∨ (ψ) (cf. Th. 1.6(a)), (1.7b)
(φ) ⇔ (ψ) is short for ((φ) ⇒ (ψ)) ∧ ((ψ) ⇒ (φ)) (cf. Th. 1.6(b)). (1.7c)
We also define the universal quantifier ∀:
∀vj (φ) is short for ¬(∃vj (¬(φ))), (1.7d)
such that ∀vj (φ) means “each set vj has the property φ”, which is equivalent to the
statement “there does not exist a set vj that does not have the property φ”. Further
abbreviations and transcriptions are obtained from omitting parentheses if it is clear
from the context and/or from Convention 1.5 where to put them in, by writing variables
bound by quantifiers under the respective quantifiers (to improve readability), and by
using other symbols than vj for set variables. For example,
∀_x (φ ⇒ ψ) transcribes ¬(∃v1 (¬((¬(φ)) ∨ (ψ)))), (1.7e)
∃_x φ ⇔ ∃_y ψ transcribes ((∃v1 (φ)) ⇒ (∃v2 (ψ))) ∧ ((∃v2 (ψ)) ⇒ (∃v1 (φ))). (1.7f)
Moreover,
vi ≠ vj is short for ¬(vi = vj); (1.7g)
vi ∉ vj is short for ¬(vi ∈ vj); (1.7h)
vi ⊆ vj is short for ∀_x (x ∈ vi ⇒ x ∈ vj). (1.7i)
Definition and Remark 1.15. We say that a variable vj , occurring in a set-theoretic

formula is bound by a quantifier or in the scope of a quantifier if, and only if, it occurs
directly behind an existential quantifier (i.e. in the form ∃vj (φ), cf. Def. 1.11(b)(v)) or
directly behind a universal quantifier (i.e. in the form ∀vj (φ), cf. (1.7d)); otherwise, we
call the variable vj free. Bound variables are sometimes also called dummy variables,
since, if the bound variable vj in, say, ∃vj (φ) is replaced by vk (and vk is not free
in φ, cf. Ex. 1.16(e) below), then ∃vj (φ) and the formula with vj replaced by vk are
equivalent. Thus, if one uses the transcriptions introduced in (1.7e) and (1.7f), then the
bound variables are precisely those, occurring under a quantifier. In principle, it is not
forbidden for the same variable (more precisely, the same variable symbol) to occur as
both a free variable and a bound variable in the same formula, and it could also occur in
the scope of several different quantifiers⁴. However, using the same variable symbol both
free and bound and/or within several scopes tends to make formulas less readable and
it can, actually, always be avoided, using additional variable symbols, see Ex. 1.16(c)-
(e) below. One might already have encountered the analogous situation when writing
integrals: For example, consider f : R −→ R, defined by the formula
f(x) := x + ∫_0^1 x dx + ∫_x^1 ( x + ∫_0^x exp(x) dx ) dx. (1.8a)
In (1.8a), the variable x occurs as a bound variable with three different scopes (within
the scope of each of the three integrals, x is used as the respective integrand’s dummy
variable) and also as a “free” variable (not bound by any integral), namely as the
function argument of f . Successively replacing each bound version of x, starting with
the innermost integral, one can write (1.8a) in the equivalent (and more readable) form
f(x) := x + ∫_0^1 u du + ∫_x^1 ( z + ∫_0^z exp(y) dy ) dz. (1.8b)
⁴ Using the same variable symbol in such a way is similar to using the same variable name for different
local variables when coding computer programs.

Example 1.16. (a) x ∈ y has x and y as free variables and no bound variables. It
states that the set x is an element of the set y.
(b) ∃_x (x ∈ y) has x bound and y free. It states that there exists a set x that is an
element of the set y (i.e. that y is not the empty set).
(c) In the formula
∀_y ( y ∈ x ⇒ ∃_x (x ∈ y) ),
y is bound, whereas x occurs both free and bound. If one replaces the bound version
of x by z, then one obtains the equivalent formula
∀_y ( y ∈ x ⇒ ∃_z (z ∈ y) ).
The formulas state that, if the set y is an element of the set x, then y contains an
element z – in other words, the set x does not contain the empty set.
(d) The formula
∃_x ∀_x (x ∈ x)
contains x as bound variables within two different scopes. Replacing the version of
x in the scope of the universal quantifier by y yields the equivalent formula
∃_x ∀_y (y ∈ y).
It is somewhat peculiar, as it has the form ∃_x φ, where x does not occur as a free
variable in φ. According to the interpretation given by Rem. 1.13, the formula is
true if, and only if, there exists a set such that φ is true, i.e. if, and only if, the
considered universe of sets is nonempty and every set in the universe contains itself.
(e) As stated in Def. and Rem. 1.15, dummy (i.e. bound) variables may be replaced by
other symbols without changing the meaning of the formula; however, if replacing
x in, say, ∃_x (φ), then one has to make sure that the replacement does not occur as
a free variable in φ: For instance, in ∃_x (x ∈ y) of (b), one can replace x with every
variable symbol, except y: While ∃_x (x ∈ y) states that the set y is not empty, the
formula ∃_y (y ∈ y) states the existence of a set that contains itself.
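The bookkeeping of free and bound variables in the examples above can be automated. The following Python sketch is an illustration added here, not part of the original notes. It assumes the usual recursive description of the set of free variables of a formula (an occurrence is free if it is not within the scope of a quantifier for that variable), and it represents formulas as nested tuples. The constructor names `In`, `Eq`, `And`, `Not`, `Exists`, `Or`, `Implies`, `Forall` and the function `free_vars` are hypothetical.

```python
# One constructor per rule of Def. 1.11(b); formulas are nested tuples.
def In(x, y):      return ("in", x, y)        # x ∈ y
def Eq(x, y):      return ("eq", x, y)        # x = y
def And(p, q):     return ("and", p, q)       # (p) ∧ (q)
def Not(p):        return ("not", p)          # ¬(p)
def Exists(x, p):  return ("exists", x, p)    # ∃x (p)

# Abbreviations following (1.7a), (1.7b), (1.7d).
def Or(p, q):      return Not(And(Not(p), Not(q)))
def Implies(p, q): return Or(Not(p), q)
def Forall(x, p):  return Not(Exists(x, Not(p)))

def free_vars(f):
    """Return the set of variable symbols with at least one free occurrence in f."""
    tag = f[0]
    if tag in ("in", "eq"):
        return {f[1], f[2]}
    if tag == "not":
        return free_vars(f[1])
    if tag == "and":
        return free_vars(f[1]) | free_vars(f[2])
    if tag == "exists":
        return free_vars(f[2]) - {f[1]}   # the quantifier binds f[1] inside f[2]
    raise ValueError(f"unknown formula tag: {tag!r}")

ex_a = In("x", "y")
ex_b = Exists("x", In("x", "y"))
ex_c = Forall("y", Implies(In("y", "x"), Exists("x", In("x", "y"))))
ex_c_renamed = Forall("y", Implies(In("y", "x"), Exists("z", In("z", "y"))))
ex_d = Exists("x", Forall("x", In("x", "x")))
ex_e_bad = Exists("y", In("y", "y"))   # x replaced by y in ex_b, cf. (e)

for name, f in [("a", ex_a), ("b", ex_b), ("c", ex_c),
                ("c renamed", ex_c_renamed), ("d", ex_d), ("e", ex_e_bad)]:
    print(name, sorted(free_vars(f)))
```

Renaming the bound x to z in (c) leaves the set of free variables unchanged, whereas the forbidden replacement in (e) does not: `ex_b` has the free variable y, while `ex_e_bad` has no free variables at all, which already shows that the two formulas cannot have the same meaning.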
Remark 1.17. In Def. and Rem. 1.10, we defined a proof of statement B from statement
A1 as a finite sequence of statements A1 , A2 , . . . , An such that, for 1 ≤ i < n, Ai implies
Ai+1 , and An implies B. In the field of proof theory, which, similar to mathematical
logic, is a large field in its own right and a detailed treatment is beyond the scope
of this class, proofs are formalized via a finite set of rules that can be applied to (set-
theoretic) formulas (see, e.g., [EFT07, Sec. IV], [Kun12, Sec. II]). Once proofs have been
formalized in this way, one can, in principle, mechanically check if a given sequence of
symbols does, indeed, constitute a valid proof (without even having to understand the
actual meaning of the statements). Indeed, several different computer programs have
been devised that can be used for automatic proof checking, for example Coq [Wik22b],
HOL Light [Wik21], Isabelle [Wik22c] and Lean [Wik22d] to name just a few.
2 Zermelo-Fraenkel Set Theory (ZF)

Axiomatic set theory seems to provide a solid and consistent foundation for conducting
mathematics, and most mathematicians have accepted it as the basis of their everyday
work. However, there do remain some deep, difficult, and subtle philosophical issues
regarding the foundation of logic and mathematics (see, e.g., [Kun12, Sec. 0, Sec. III]).
Definition and Remark 2.1. An axiom is a statement that is assumed to be true

without any formal logical justification. The most basic axioms (for example, the stan-
dard axioms of set theory) are taken to be justified by common sense or some underlying
philosophy. However, on a less fundamental (and less philosophical) level, it is a common
mathematical strategy to state a number of axioms (for example, the axioms defining
the mathematical structure called a group), and then to study the logical consequences
of these axioms (for example, group theory studies the statements that are true for all
groups as a consequence of the group axioms). For a given system of axioms, the ques-
tion if there exists an object satisfying all the axioms in the system (i.e. if the system
of axioms is consistent, i.e. free of contradictions) can be extremely difficult (or even
impossible) to answer.
—
We are now in a position to formulate and discuss the axioms of axiomatic set the-
ory. More precisely, we will present the axioms of Zermelo-Fraenkel set theory, usually
abbreviated as ZF, which are Axiom 0 – Axiom 8 below. While there exist various
set theories in the literature, each set theory defined by some collection of axioms, the
axioms of ZFC, consisting of the axioms of ZF plus the axiom of choice (Axiom 9, see
Sec. 3 below), are used as the foundation of mathematics currently accepted by most
mathematicians.
2.1 Existence, Extensionality

Axiom 0 Existence:
∃_X (X = X).
Recall that this is just meant to be a more readable transcription of the
set-theoretic formula ∃v1 (v1 = v1). The axiom of existence states that there
exists (at least one) set X.
—
In naive set theory, based on Cantor’s definition as described in Sec. 1.1, sets X and
Y are defined to be equal if, and only if, they contain precisely the same elements. In
axiomatic set theory, this is guaranteed by the axiom of extensionality:
Axiom 1 Extensionality:
∀_X ∀_Y ( ∀_z (z ∈ X ⇔ z ∈ Y) ⇒ X = Y ).
—
Following [Kun12], we assume that the substitution property of equality is part of the
underlying logic, i.e. if X = Y , then X can be substituted for Y and vice versa without
changing the truth value of a (set-theoretic) formula. In particular, this yields the
converse to extensionality:
∀_X ∀_Y ( X = Y ⇒ ∀_z (z ∈ X ⇔ z ∈ Y) ).
Before we discuss further consequences of extensionality, we would like to have the
existence of the empty set. However, Axioms 0 and 1 do not suffice to prove the existence
of an empty set as we will see in Ex. 2.3 below. We will take the opportunity to discuss,
at an early stage, the idea of proving independence results via suitable models in the
following section.
2.2 Models, Independence Results

One is often interested in proving the independence of an axiom A from a collection
C of other axioms, which one does by providing one model of set theory in which all
axioms in C hold as well as A, and a second model of set theory in which all axioms in
C hold, but A fails. In Def. 2.2 below, we provide ten simple “toy models” (the first
seven are the ones introduced in [Kun12, Sec. I.2]). Subsequently, we will check which
of our axioms are satisfied by which model, providing a number of simple independence
results.
Definition 2.2 (Toy Models). Let a, b, c, d, e be distinct elements. For each index i in
{1, 2, . . . , 10}, we define the model Mi := (Di , Ei ), where Mi is the pair consisting of
the “domain” Di and a relation Ei on Di (i.e. Ei ⊆ Di × Di ), where one thinks of Di as
modeling the universe of sets and of Ei as modeling the element relation ∈ (one might
be concerned that the construction of these models is not justified by the axioms that
have, thus far, been introduced – this is a fair concern and we will address it further in
Rem. 2.4 below):
M1 := (D1 , E1 ), D1 := {a}, E1 := ∅,
M2 := (D2 , E2 ), D2 := {a}, E2 := {(a, a)},
M3 := (D3 , E3 ), D3 := {a, b}, E3 := {(a, b), (b, a)},
M4 := (D4 , E4 ), D4 := {a, b, c}, E4 := {(a, b), (b, a), (a, c), (b, c)},
M5 := (D5 , E5 ), D5 := {a, b, c}, E5 := {(a, b), (a, c)},
M6 := (D6 , E6 ), D6 := {a, b, c, d}, E6 := {(a, b), (a, c), (a, d), (b, c), (b, d), (c, d)},
M7 := (D7 , E7 ), D7 := {a, b, c}, E7 := {(a, b), (b, c)},
M8 := (D8 , E8 ), D8 := {a, b, c}, E8 := {(b, c)},
M9 := (D9 , E9 ), D9 := {a, b, c, d, e}, E9 := {(a, b), (b, c), (c, d), (b, e), (c, e)},
M10 := (D10 , E10 ), D10 := {a, b}, E10 := {(a, b), (b, b)}.
Example 2.3. For each toy model Mi of Def. 2.2, we will check if it satisfies Axiom
0 (i.e. existence of a set), Axiom 1 (i.e. extensionality), and the (non-)existence of an
empty set (cf. (2.1) below). We will see that Axioms 0 and 1 are independent from
each other and that Axioms 0 and 1 together neither imply nor refute the existence of
an empty set.
(a) Axiom 0 holds in each of the above models Mi, i ∈ {1, ..., 10}, since Di ≠ ∅ in
each case.
(b) Axiom 1 holds in each Mi , i ∈ {1, 2, 3, 4, 6, 7, 9, 10}, but is violated in M5 and M8 :
Axiom 1 holds in Mi , i ∈ {1, 2}, since Di contains only 1 element.
Axiom 1 holds in M3 , since E3 provides precisely the relations aE3 b and bE3 a, i.e.,
in this universe of sets, b has only a as an element and a has only b as an element
– in particular, there are no distinct sets that contain precisely the same elements.
Axiom 1 is violated in M5 , since, according to E5 , both b and c contain precisely a
as an element.
We leave M4 and M6 – M10 as an exercise.
(c) We check which of our toy models do not contain an “empty set”, i.e. satisfy the
“axiom”
¬ ∃_X ∀_x (x ∉ X): (2.1)
(2.1) holds in M2, M3, M4, whereas M1, M5, ..., M10 do have an “empty set”: In
M2, a contains a; in M3, a contains b and b contains a; M4 is an exercise; in each
of the models M5, ..., M10, a does not contain any elements (M8 even has a second
empty set, namely b).
From (a) – (c), we see, in particular, that M2 , M3 , M4 satisfy Axioms 0, 1, plus (2.1);
whereas M1 , M6 , M7 , M9 , M10 satisfy Axioms 0, 1, plus the existence of an “empty
set”.
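Since the toy models are finite, the checks of Ex. 2.3, including the cases left as exercises, can be carried out by exhaustive search. The following Python sketch is an illustration added here; the helper names `pairs`, `elements`, `existence`, `extensionality` and `has_empty_set` are hypothetical, and each relation Ei is encoded as a set of pairs (x, y), read as "x is an element of y".

```python
def pairs(s):
    """Parse e.g. 'ab ba' into {('a', 'b'), ('b', 'a')}; (x, y) models x ∈ y."""
    return {(p[0], p[1]) for p in s.split()}

# The toy models M1, ..., M10 of Def. 2.2 as (domain, relation).
MODELS = {
    1: (set("a"), pairs("")),
    2: (set("a"), pairs("aa")),
    3: (set("ab"), pairs("ab ba")),
    4: (set("abc"), pairs("ab ba ac bc")),
    5: (set("abc"), pairs("ab ac")),
    6: (set("abcd"), pairs("ab ac ad bc bd cd")),
    7: (set("abc"), pairs("ab bc")),
    8: (set("abc"), pairs("bc")),
    9: (set("abcde"), pairs("ab bc cd be ce")),
    10: (set("ab"), pairs("ab bb")),
}

def elements(y, E):
    """All x of the model with x E y, i.e. the 'elements' of y."""
    return frozenset(x for (x, z) in E if z == y)

def existence(D, E):
    """Axiom 0: the domain contains at least one set."""
    return len(D) > 0

def extensionality(D, E):
    """Axiom 1: sets with precisely the same elements are equal."""
    return all(X == Y or elements(X, E) != elements(Y, E)
               for X in D for Y in D)

def has_empty_set(D, E):
    """Negation of (2.1): some set of the model has no elements."""
    return any(not elements(X, E) for X in D)

holds_existence = {i for i, (D, E) in MODELS.items() if existence(D, E)}
holds_ext = {i for i, (D, E) in MODELS.items() if extensionality(D, E)}
has_empty = {i for i, (D, E) in MODELS.items() if has_empty_set(D, E)}
print(sorted(holds_existence), sorted(holds_ext), sorted(has_empty))
```

The computed index sets agree with Ex. 2.3(a)–(c): extensionality holds for i ∈ {1, 2, 3, 4, 6, 7, 9, 10}, and an "empty set" exists for i ∈ {1, 5, 6, 7, 8, 9, 10}.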
Remark 2.4. Using models of set theory to prove independence results, as we have
just done in Def. 2.2 and Ex. 2.3 is subject to some logical subtleties: The validity of
such arguments relies on the admissibility of constructing the respective models: For
example, one can obtain all the models of Def. 2.2, if one is allowed to form sets with
up to 5 distinct elements, one is allowed to form ordered pairs from these elements, and
one is also allowed to form sets, containing the obtained ordered pairs as elements (of
course, each individual model can be obtained with weaker construction rules).
2.3 Comprehension
To obtain, among many other things, the existence of the empty set, we introduce the
additional axiom of comprehension. More precisely, in the case of comprehension, we
do not have a single axiom, but a scheme of infinitely many axioms, one for each set-
theoretic formula that satisfies a certain condition. Its formulation makes use of the
following definition:
Definition 2.5. One obtains the universal closure of a set-theoretic formula φ, by
writing \(\forall_{v_j}\) in front of φ for each variable vj that occurs as a free variable in φ (recall from
Def. and Rem. 1.15 that vj is free in φ if, and only if, it is not bound by a quantifier
in φ). While, if φ contains more than one free variable, the universal closure of φ is
nonunique (as one can choose an arbitrary order of the \(\forall_{v_j}\) in front of φ), this does not
cause a problem, since all universal closures of φ are equivalent.
Axiom 2 Comprehension Scheme: For each set-theoretic formula φ, not containing Y
as a free variable, the universal closure of
\[ \exists_{Y}\, \forall_{x}\; \bigl( x \in Y \Leftrightarrow (x \in X \wedge \varphi) \bigr) \tag{2.2} \]
is an axiom. Thus, the comprehension scheme states that, given the set
X, there exists (at least one) set Y , containing precisely the elements of X
that have the property φ (the importance of allowing φ in (2.2) to have free
variables will be illustrated in Ex. 2.11 below, where Ex. 2.11(e) will also
show why Y must not be free in φ).
Lemma 2.6. Axioms 0 and 2 (i.e. the existence of a set together with the comprehension
scheme) imply the existence of (at least one) empty set, i.e. the validity of
\[ \exists_{Y}\, \forall_{x}\; x \notin Y. \tag{2.3} \]
Proof. According to Axiom 0, there exists a set X. Letting φ denote the set-theoretic
formula x ≠ x, Axiom 2 yields
\[ \exists_{Y}\, \forall_{x}\; \bigl( x \in Y \Leftrightarrow (x \in X \wedge x \neq x) \bigr). \]
Since, for each x, the statement x ∈ X ∧ x ≠ x is false, x ∈ Y must be false for each x
as well, thereby proving (2.3).
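The argument of this proof can be mimicked with finite Python sets. The following is only a sketch: the function name comprehension and the sample set X are illustrative assumptions, and Python's frozenset stands in for a set of the theory.

```python
def comprehension(X, phi):
    """Finite analogue of (2.2): return {x in X : phi(x)} as a frozenset."""
    return frozenset(x for x in X if phi(x))

X = frozenset({1, 2, 3})                  # some set, as provided by Axiom 0
Y = comprehension(X, lambda x: x != x)    # phi is the formula "x != x"
print(Y == frozenset())                   # no integer differs from itself, so Y has no elements
```

The result does not depend on the choice of X, which matches the role of X in the proof.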
Example 2.7. We check which of our toy models M1 , . . . , M10 of Def. 2.2 satisfy Axiom
2 (i.e. the comprehension scheme):
We begin with some general considerations that will be useful for several of the models:
Claim 1: If X in (2.2) is an empty set, then (2.2) holds with Y := X: Indeed, both
x ∈ Y and x ∈ X ∧ φ are then false for each x and φ.
Claim 2: If the domain Di contains elements A, B, C (not necessarily distinct), where A
is empty and C contains precisely one element, namely B, then (2.2) holds for X := C:
Indeed, there are four possible cases to check: (i) φ does not contain x as a free variable
and φ is true (independently of x) – then (2.2) holds with Y := C (since x ∈ C ⇔
(x ∈ C ∧ φ)); (ii) φ does not contain x as a free variable and φ is false (independently
of x) – then (2.2) holds with Y := A; (iii) φ does contain x as a free variable and φ is
true for x = B – then (2.2) holds with Y := C (since x ∈ C ⇔ (x ∈ C ∧ φ), both sides
being true for x = B, both sides being false for x ≠ B); (iv) φ does contain x as a free
variable and φ is false for x = B – then (2.2) holds with Y := A (since both sides of
x ∈ A ⇔ (x ∈ C ∧ φ) are false for each x).
Axiom 2 holds in M1: Since D1 contains only one set, namely a, which is empty (according
to E1), (2.2) is true for Y := a by Claim 1. In combination with Ex. 2.3, we see
that M1 satisfies all Axioms 0 – 2. As it also satisfies
\[ \forall_{x}\, \neg\, \exists_{y}\; (y \in x), \]
M1 shows that Axioms 0 – 2 do not suffice to prove the existence of nonempty sets.
Axiom 2 does not hold in M2 , M3 , M4 : We know from Ex. 2.3(a),(c) that these models
satisfy Axiom 0, but violate (2.3). Thus, Lem. 2.6 yields that Axiom 2 does not hold.
Axiom 2 holds in M5 : If X := a, then, since a is an empty set, (2.2) holds with Y := a
by Claim 1. If X := b, then, since b contains precisely a, (2.2) holds by Claim 2 (using
A := B := a, C := b). If X := c, then, since c contains precisely a, (2.2) holds again by
Claim 2 (using A := B := a, C := c).
It is an exercise to show that Axiom 2 holds in M7 , M8 , M9 , but fails in M6 and M10 .
We summarize the toy models’ properties we found so far in the following table:
                            M1  M2  M3  M4  M5  M6  M7  M8  M9  M10
Axiom 0 (Existence)         T   T   T   T   T   T   T   T   T   T
Axiom 1 (Extensionality)    T   T   T   T   F   T   T   F   T   T
¬(2.1) (has empty set)      T   F   F   F   T   T   T   T   T   T
Axiom 2 (Comprehension)     T   F   F   F   T   F   T   T   T   F
In particular, comparing the corresponding rows in the table above, we find Axioms 1
and 2 to be mutually independent.
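For the finite toy models, the comprehension scheme can be tested by brute force. The sketch below rests on the following observation, which we state as an assumption of the sketch: since φ in (2.2) may contain parameters, every subset {s1, ..., sk} of the elements of X is cut out by the formula x = s1 ∨ ... ∨ x = sk (and the empty subset by x ≠ x), so Axiom 2 holds in a finite model if, and only if, every subset of the elements of each X is the extension of some Y. Only models whose membership relation can be read off from the text are encoded; the names MODELS, ext, subsets, comprehension_holds are illustrative.

```python
from itertools import combinations

# (x, X) in E means "x is an element of X"; relations as described in the text.
MODELS = {
    "M1": ({"a"}, set()),
    "M2": ({"a"}, {("a", "a")}),
    "M3": ({"a", "b"}, {("a", "b"), ("b", "a")}),
    "M5": ({"a", "b", "c"}, {("a", "b"), ("a", "c")}),
    "M10": ({"a", "b"}, {("a", "b"), ("b", "b")}),
}

def ext(D, E, X):
    """Return the set of elements of X inside the model (D, E)."""
    return frozenset(x for x in D if (x, X) in E)

def subsets(S):
    """Yield every subset of the finite set S as a frozenset."""
    items = sorted(S)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def comprehension_holds(D, E):
    """Every subset of every ext(X) must be the extension of some Y in D."""
    extensions = {ext(D, E, Y) for Y in D}
    return all(S in extensions for X in D for S in subsets(ext(D, E, X)))

results = {name: comprehension_holds(D, E) for name, (D, E) in MODELS.items()}
print(results)
```

For the encoded models, the result agrees with the last row of the table above.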
Remark 2.8. Comprehension alone does not provide uniqueness (for instance, we found
in Ex. 2.7 that model M8 satisfies comprehension, even though it has two distinct empty
sets). However, if one also assumes Axiom 1 (extensionality) and if both Y and Y′ are
sets containing precisely the elements of X that have the property φ, then
\[ \forall_{x}\; \bigl( x \in Y \Leftrightarrow (x \in X \wedge \varphi) \Leftrightarrow x \in Y' \bigr), \]
and extensionality implies Y = Y′. Thus, due to extensionality, the set Y given by
comprehension is unique, justifying the common notation
\[ \{x : x \in X \wedge \varphi\} := \{x \in X : \varphi\} := Y. \tag{2.4} \]
Theorem 2.9. Assuming Axioms 0 – 2, there exists a unique empty set (which we
denote by ∅ or by 0 – it is common to identify the empty set with the number zero in
axiomatic set theory).
Proof. Axiom 0 provides the existence of a set X. Then comprehension allows us to
define the empty set by
\[ 0 := \emptyset := \{x \in X : x \neq x\}, \]
where, as explained in Rem. 2.8, extensionality guarantees uniqueness.
Remark 2.10. In Rem. 1.14 we said that every formula with additional symbols and
notation is to be regarded as an abbreviation or transcription of a set-theoretic formula
as defined in Def. 1.11(b). Thus, formulas containing symbols for defined sets (e.g. 0
or ∅ for the empty set) are to be regarded as abbreviations for formulas without such
symbols. Some logical subtleties arise from the fact that there is some ambiguity in the
way such abbreviations can be resolved: For example, 0 ∈ X might abbreviate
\[ \psi:\ \exists_{y}\, \bigl( \varphi(y) \wedge y \in X \bigr) \quad \text{or} \quad \chi:\ \forall_{y}\, \bigl( \varphi(y) \Rightarrow y \in X \bigr), \quad \text{where } \varphi(y) \text{ stands for } \forall_{v}\, (v \notin y). \]
Then ψ and χ are equivalent if
\[ \exists_{y}\, \bigl( \varphi(y) \wedge \forall_{z}\, ( \varphi(z) \Rightarrow y = z ) \bigr) \]
(e.g., if Axioms 0 – 2 hold), but they can be nonequivalent, otherwise: For example, in
model M8 of Def. 2.2, consider ψ and χ with X := c. In M8, φ(y) is true for y := a and
y := b. Thus, ψ is true in M8 (since (b, c) ∈ E8), but χ is false in M8 (since (a, c) ∉ E8).
To avoid introducing logical ambiguities, we will only use formulas with symbols for
defined sets under the assumption of extensionality.
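The two readings of 0 ∈ X can be evaluated directly in M8. The sketch below uses a reconstruction of M8 that is an assumption, chosen to be consistent with the facts quoted in this remark (a and b are empty, (b, c) ∈ E8, (a, c) ∉ E8); the precise definition is the one in Def. 2.2. The names D8, E8, is_empty, psi, chi are illustrative.

```python
# Assumed reconstruction of M8: a and b are empty, c contains exactly b.
D8 = {"a", "b", "c"}
E8 = {("b", "c")}           # (x, X) in E8 means "x is an element of X"

def is_empty(y):
    """phi(y): no v in D8 is an element of y."""
    return all((v, y) not in E8 for v in D8)

X = "c"
# psi: there is an empty set that is an element of X.
psi = any(is_empty(y) and (y, X) in E8 for y in D8)
# chi: every empty set is an element of X.
chi = all((not is_empty(y)) or (y, X) in E8 for y in D8)
print(psi, chi)
```

Under this reconstruction, psi is witnessed by b, whereas chi fails because of a.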
—
At first glance, the role played by the free variables in φ, which are allowed to occur
in Axiom 2, might seem a bit obscure. So let us consider examples to illustrate that
allowing free variables (i.e. set parameters) in comprehension is quite natural:
Example 2.11. In view of Rem. 2.10, assume Axiom 1 (extensionality).
(a) If φ in (2.2) is the formula x ∈ Z (having x, Z as free variables), then the set given
by the resulting axiom yields precisely the intersection of X and Z:
X ∩ Z := {x ∈ X : φ} = {x ∈ X : x ∈ Z}.
(b) While (a) shows how Axiom 2 provides the intersection of two sets, with a modification,
Axiom 2 also yields the existence of intersections of more than two sets (of
both finitely and even infinitely many): If ℳ is a nonempty set, X ∈ ℳ, and φ in
(2.2) is the formula \(\forall_{M \in \mathcal{M}}\; x \in M\) (having x, ℳ as free variables), then the set given
by the resulting axiom yields precisely the intersection of all sets that are elements
of ℳ:
\[ \bigcap \mathcal{M} := \bigcap_{M \in \mathcal{M}} M := \Bigl\{ x : \forall_{M \in \mathcal{M}}\; x \in M \Bigr\} := \Bigl\{ x \in X : \forall_{M \in \mathcal{M}}\; x \in M \Bigr\}. \tag{2.5} \]
It is also customary (and useful) to define intersections
\[ \bigcap_{i \in I} M_i := \Bigl\{ x : \forall_{i \in I}\; x \in M_i \Bigr\} := \Bigl\{ x \in M_{i_0} : \forall_{i \in I}\; x \in M_i \Bigr\}, \tag{2.6} \]
where I ≠ ∅ is a nonempty index set, i0 ∈ I is an arbitrary fixed element of I,
and (Mi)i∈I is a so-called family of sets. However, conceptually (2.6) is significantly
more involved than (2.5) and not justifiable from the axioms considered so far:
Formally, the family (Mi)i∈I is a function f : I −→ N, Mi := f(i) for each i ∈ I,
where the existence of functions will be justified, once we have Axiom 3 (pairing,
Sec. 2.5), Axiom 4 (union, Sec. 2.6), and Axiom 5 (replacement, Sec. 2.7.1). The
definitions in (2.5) and (2.6) will be equivalent (in the sense that \(\bigcap \mathcal{M} = \bigcap_{i \in I} M_i\)),
if we are allowed to form the set
\[ \mathcal{M} := \{ M_i : i \in I \} \]
(if I is a set and Mi is a set for each i ∈ I, then ℳ as above will be a set by Axiom
5). It is emphasized that the sets ℳ and I in (2.5) and (2.6), respectively, were
required to be nonempty. If one tries to form
\[ \bigcap \emptyset = \Bigl\{ x : \forall_{X \in \emptyset}\; x \in X \Bigr\} = \Bigl\{ x : \forall_{i \in \emptyset}\; x \in M_i \Bigr\} = \bigcap_{i \in \emptyset} M_i, \]
then one obtains the so-called universal class of all sets V, which is not a set (cf.
Sec. 2.4 below, in particular Ex. 2.14(b)).
(c) Suppose φ in (2.2) is the formula x ∉ Z (again having x, Z as free variables), then
the set given by the resulting axiom yields precisely the difference X minus Z:
\[ X \setminus Z := \{x \in X : \varphi\} = \{x \in X : x \notin Z\}. \]
(d) Note that it is even allowed for φ in (2.2) to have X as a free variable, so one can
let φ be the formula \(\exists_{u}\, (x \in u \wedge u \in X)\) to define the set
\[ X^{*} := \bigl\{ x \in X : \exists_{u}\, (x \in u \wedge u \in X) \bigr\}. \]
Then, if 0 := ∅, 1 := {0}, 2 := {0, 1}, we obtain
\[ 2^{*} = \{0\} = 1. \]
(e) It is essential that φ in (2.2) must not contain Y as a free variable. Otherwise, one
would have a contradiction as soon as there exists any nonempty set: Suppose φ
in (2.2) were allowed to be the formula x ∉ Y. Then, if X is nonempty, i.e. there
exists x ∈ X, (2.2) would require the existence of a set Y such that x ∈ Y ⇔ x ∉ Y.
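The constructions of (a) – (d) can be tried out on hereditarily finite sets, modelled here by nested Python frozensets. This is merely a sketch: the function names are illustrative assumptions, and each function is an instance of comprehension with the formula φ indicated in its comment, the parameters Z and M playing the role of the free variables of φ.

```python
def comprehension(X, phi):
    """Finite analogue of (2.2): return {x in X : phi(x)}."""
    return frozenset(x for x in X if phi(x))

zero = frozenset()
one = frozenset({zero})
two = frozenset({zero, one})

def intersection(X, Z):
    """Ex. 2.11(a): phi is 'x in Z'."""
    return comprehension(X, lambda x: x in Z)

def big_intersection(M):
    """Ex. 2.11(b): M must be nonempty; X is an arbitrary element of M."""
    X = next(iter(M))
    return comprehension(X, lambda x: all(x in N for N in M))

def difference(X, Z):
    """Ex. 2.11(c): phi is 'x not in Z'."""
    return comprehension(X, lambda x: x not in Z)

def star(X):
    """Ex. 2.11(d): phi is 'there is u with x in u and u in X'."""
    return comprehension(X, lambda x: any(x in u for u in X))

print(star(two) == one)
```

Calling big_intersection on an empty M raises StopIteration, which reflects that no element X is available to bound the comprehension in that case.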
Example 2.12. Another example of extensionality consequences is the important result

that the mathematical universe consists of sets and only of sets: Suppose there were
other objects in the mathematical universe, for example a cow C and a monkey M (or
any other object without elements, other than the empty set) – this would be equivalent
to allowing a cow or a monkey (or any other object without elements, other than the
empty set) to be considered a set, which would mean that our set-theoretic variables vj
were allowed to be a cow or a monkey as well. However, extensionality then implies the
false statement C = M = ∅, thereby excluding cows and monkeys from the mathematical
universe. Similarly, {C} and {M } (or any other object that contains a non-set) cannot
be inside the mathematical universe. Indeed, otherwise we would have
\[ \forall_{x}\; \bigl( x \in \{C\} \Leftrightarrow x \in \{M\} \bigr) \]
(as C and M are non-sets) and, by extensionality, {C} = {M } would be true, in contradiction
to a set with a cow inside not being the same as a set with a monkey inside. Thus,
we see that all objects of the mathematical universe must be so-called hereditary sets,
i.e. sets all of whose elements (thinking of the elements as being the children of the sets)
are also sets.
2.4 Classes
As we need to avoid contradictions such as Russell’s antinomy, we must not require the
existence of a set {x : φ} for each set-theoretic formula φ. However, it can still be
useful to think of a “collection” of all sets having the property φ. Such collections are
commonly called classes:
Definition 2.13. (a) If φ is a set-theoretic formula, then we call {x : φ} a class,

namely the class of all sets that have the property φ (typically, φ will have x as a
free variable).
(b) If φ is a set-theoretic formula, then we say the class {x : φ} exists (as a set) if, and
only if
\[ \exists_{X}\, \forall_{x}\; \bigl( x \in X \Leftrightarrow \varphi \bigr) \tag{2.7} \]
is true. Assuming Axiom 1 (extensionality), X is then actually unique and we

identify X with the class {x : φ}. If (2.7) is false, then {x : φ} is called a proper
class (and the usual interpretation is that the class is in some sense “too large” to
be a set).
Example 2.14. (a) Due to Russell’s antinomy of Sec. 1.1, we know that
\[ R := \{x : x \notin x\} \]
forms a proper class.
(b) The universal class of all sets, V := {x : x = x}, is a proper class. Once again,
this is related to Russell’s antinomy: If V were a set, then
\[ R = \{x : x \notin x\} = \{x : x = x \wedge x \notin x\} = \{x : x \in V \wedge x \notin x\} \]
would also be a set by comprehension. However, this is in contradiction to R being
a proper class by (a).
Remark 2.15. From the perspective of formal logic, statements involving proper classes
are to be regarded as abbreviations for statements without proper classes. For example,
it turns out that the class G of all sets forming a group is a proper class. But we might
write G ∈ G as an abbreviation for the statement “The set G is a group.”
2.5 Pairing
As we saw from our investigation of model M1 in Ex. 2.7, Axioms 0 – 2 are still consistent
with the empty set being the only set in existence. The next axiom will provide the
existence of nonempty sets:
Axiom 3 Pairing:
\[ \forall_{x}\, \forall_{y}\, \exists_{Z}\; (x \in Z \wedge y \in Z). \tag{2.8} \]
Thus, the pairing axiom states that, for all sets x and y, there exists a set Z
that contains x and y as elements.
—
In consequence of the pairing axiom, the sets
0 := ∅, (2.9a)
1 := {0}, (2.9b)
2 := {0, 1} (2.9c)
all exist. More generally, we may define:
Definition 2.16. Assume Axioms 0 – 3. If x, y are sets and Z is given by the pairing
axiom, then we call
(a) {x, y} := {u ∈ Z : u = x ∨ u = y} the unordered pair given by x and y,
(b) {x} := {x, x} the singleton set given by x,

(c) (x, y) := {{x}, {x, y}} the ordered pair given by x and y.
We can now show that ordered pairs behave as expected:
Lemma 2.17. Assuming Axioms 0 – 3, the following holds true:
\[ \forall_{x, y, x', y'}\; \Bigl( (x, y) = (x', y') \Leftrightarrow \bigl( (x = x') \wedge (y = y') \bigr) \Bigr). \tag{2.10} \]
Proof. “⇐” is merely
\[ (x, y) = \{\{x\}, \{x, y\}\} \overset{x = x',\ y = y'}{=} \{\{x'\}, \{x', y'\}\} = (x', y'). \]
“⇒” is done by distinguishing two cases: If x = y, then
\[ \{\{x\}\} = (x, y) = (x', y') = \{\{x'\}, \{x', y'\}\}. \]
Next, by extensionality, we first get {x} = {x′} = {x′, y′}, followed by x = x′ = y′,
establishing the case. If x ≠ y, then
\[ \{\{x\}, \{x, y\}\} = (x, y) = (x', y') = \{\{x'\}, \{x', y'\}\}, \]
where, by extensionality, {x} ≠ {x, y} ≠ {x′}. Thus, using extensionality again, {x} =
{x′} and x = x′. Next, we conclude
\[ \{x, y\} = \{x', y'\} = \{x, y'\} \]
and a last application of extensionality yields y = y′.
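As a sanity check (not a substitute for the proof), property (2.10) can be verified exhaustively on a small universe of hereditarily finite sets. In the following sketch, Python frozensets stand in for sets, which is an assumption of the illustration; their equality is extensional, just as required by Axiom 1. The names pair, universe, ok are illustrative.

```python
from itertools import product

def pair(x, y):
    """Ordered pair (x, y) := {{x}, {x, y}} as in Def. 2.16(c)."""
    return frozenset({frozenset({x}), frozenset({x, y})})

zero = frozenset()
one = frozenset({zero})
two = frozenset({zero, one})
universe = [zero, one, two]

# (2.10): two ordered pairs agree exactly when both coordinates agree.
ok = all(
    (pair(x, y) == pair(u, v)) == (x == u and y == v)
    for x, y, u, v in product(universe, repeat=4)
)
print(ok)
```

Note that pair(x, x) collapses to {{x}}, which is precisely the case x = y treated first in the proof.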
Remark 2.18. Assume Axioms 0 – 3.
(a) We now have the existence of the infinitely many different sets 0, {0}, {{0}}, . . . .
In particular, none of our finite toy models M1 , . . . , M10 from Def. 2.2 can satisfy
Axioms 0 – 3. While we will need the axiom of infinity of Sec. 2.8.1 below to
formally define the notions finite and infinite, in Ex. 2.19 below, we will see that
only M2 and M10 satisfy pairing (and we know from Ex. 2.7 that M2 and M10 do
not satisfy comprehension). However, Axioms 0 – 3 do not, yet, suffice to prove the
existence of sets with more than two elements.
(b) At this stage, it would already be possible to introduce the notion of a relation by
calling a set a relation if, and only if, all its elements are ordered pairs. However,
without further axioms, this becomes cumbersome, one can not, actually, construct
many interesting relations anyway, and certain definitions (such as domain, image,
function) would depend on the particular definition of (x, y) := {{x}, {x, y}} in Def.
2.16(c), rather than merely on the key property (2.10) of ordered pairs (cf. [Kun12,
Sec. I.7.1]). Thus, we postpone the definition and consideration of relations and
functions to Sec. 2.7.2, where we can use the axioms of union and replacement
to justify the existence of Cartesian products, then giving rise to relations and
functions in the usual way.
(c) Once one has ordered pairs, one can proceed to define more general ordered tuples
by letting
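\[ \begin{aligned}
(v_1) &:= v_1, \\
(v_1, v_2) &:= \bigl( v_1, (v_2) \bigr) := \bigl\{ \{v_1\}, \{v_1, v_2\} \bigr\} && \text{(ordered pair, same as Def. 2.16(c))}, \\
(v_1, v_2, v_3) &:= \bigl( v_1, (v_2, v_3) \bigr) && \text{(ordered triple)}, \\
(v_1, v_2, v_3, v_4) &:= \bigl( v_1, (v_2, v_3, v_4) \bigr) && \text{(ordered quadruple)}, \\
&\ \,\vdots
\end{aligned} \]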
where v1 , v2 , . . . are arbitrary sets. While this is less elegant than the usual definition
of ordered n-tuples (v1 , . . . , vn ) as the function v : {1, . . . , n} −→ {v1 , . . . , vn },
vi := v(i), it has the advantage of not needing any further axioms. Once we have
sufficiently many axioms to justify definition via recursion and proof via induction,
we can show both definitions of ordered n-tuples to be equivalent.
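The nested definition of ordered tuples in (c) translates into a short recursion. The sketch below again models sets by Python frozensets; the names pair and tup are illustrative assumptions, and the recursion of the host language is used only as a convenience, not as a justification within the axioms.

```python
def pair(x, y):
    """Ordered pair (x, y) := {{x}, {x, y}} as in Def. 2.16(c)."""
    return frozenset({frozenset({x}), frozenset({x, y})})

def tup(*vs):
    """(v1) := v1 and (v1, ..., vn) := (v1, (v2, ..., vn)) for n >= 2."""
    if not vs:
        raise ValueError("at least one component is required")
    if len(vs) == 1:
        return vs[0]
    return pair(vs[0], tup(*vs[1:]))

zero = frozenset()
one = frozenset({zero})
two = frozenset({zero, one})
print(tup(zero, one, two) == pair(zero, pair(one, two)))
```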
Example 2.19. We check which of our toy models M1, . . . , M10 of Def. 2.2 satisfy Axiom
3 (pairing): Axiom 3 holds only in M2 and M10, and is violated in all the remaining
models: Axiom 3 holds in M2, since a is the only set in the model and a is an element of
a. Axiom 3 does not hold in M1, M3, . . . , M9: In M1, there is no set containing a; in M3,
there is no set containing both a and b; in M4, M5, M7, M8, there is no set containing
c; and in M6 and M9, there is no set containing d. Axiom 3 holds in M10, since a and
b are the only sets in the model and b contains both a and b. We summarize the toy
models’ properties we found so far in the following table:
                            M1  M2  M3  M4  M5  M6  M7  M8  M9  M10
Axiom 3 (Pairing)           F   T   F   F   F   F   F   F   F   T
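The pairing axiom (2.8) contains no formula parameter, so in a finite model it can be checked by direct enumeration. The sketch below encodes only those models whose membership relation can be read off from the text; the names MODELS and pairing_holds are illustrative assumptions, and (x, Z) in E means "x is an element of Z".

```python
# (x, Z) in E means "x is an element of Z"; relations as described in the text.
MODELS = {
    "M1": ({"a"}, set()),
    "M2": ({"a"}, {("a", "a")}),
    "M3": ({"a", "b"}, {("a", "b"), ("b", "a")}),
    "M5": ({"a", "b", "c"}, {("a", "b"), ("a", "c")}),
    "M10": ({"a", "b"}, {("a", "b"), ("b", "b")}),
}

def pairing_holds(D, E):
    """Axiom 3: for all x, y in D there is Z in D with x in Z and y in Z."""
    return all(
        any((x, Z) in E and (y, Z) in E for Z in D)
        for x in D
        for y in D
    )

results = {name: pairing_holds(D, E) for name, (D, E) in MODELS.items()}
print(results)
```

For the encoded models, the result agrees with the table above.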
2.6 Union
To be able to construct sets with more than two elements, we introduce the following
axiom:
Axiom 4 Union:
\[ \forall_{M}\, \exists_{Y}\, \forall_{x}\, \forall_{X}\; \bigl( (x \in X \wedge X \in M) \Rightarrow x \in Y \bigr). \tag{2.11} \]
Thus, the union axiom states that, for each set of sets M, there exists a set
Y containing all elements of elements of M.
Definition 2.20. (a) If M is a set and Y is given by the union axiom, then define
\[ \bigcup M := \bigcup_{X \in M} X := \bigl\{ x \in Y : \exists_{X \in M}\; x \in X \bigr\}. \tag{2.12} \]
(b) If X and Y are sets, then define
\[ X \cup Y := \bigcup \{X, Y\}. \]
(c) If x, y, z are sets, then define
\[ \{x, y, z\} := \{x, y\} \cup \{z\}. \]
Remark 2.21. (a) Analogous to (2.6) for intersections, once one has a family of sets
(Mi)i∈I, it is also useful to define set-theoretic unions as
\[ \bigcup_{i \in I} M_i := \bigl\{ x : \exists_{i \in I}\; x \in M_i \bigr\}. \tag{2.13} \]
Analogous to the remark in Ex. 2.11(b), the definitions in (2.12) and (2.13) will be
equivalent (in the sense that \(\bigcup M = \bigcup_{i \in I} M_i\)), if we are allowed to form the set
\[ M := \{ M_i : i \in I \}. \]
(b) The union
\[ \bigcup \emptyset = \bigcup_{X \in \emptyset} X = \bigcup_{i \in \emptyset} M_i = \emptyset \]
is the empty set – in particular, a set (this is in contrast to the situation for intersections,
where \(\bigcap \emptyset = V\), which is a proper class and not a set, cf. Ex. 2.11(b)).
Definition 2.22. For each set x, we define its successor to be the set x ∪ {x}. While
we will define functions between sets in the usual way in Sec. 2.7.2 below, it can already
be useful to think of the successor function as a class function
S : V −→ V, S(x) := x ∪ {x}
(clearly, S will not be a function between sets, since it is defined for each set x, that
means it is defined on the proper class V – however, each restriction of S to a set will be
a set function in the usual sense). Recalling (2.9), we have 1 = S(0), 2 = S(1); and we
can define 3 := S(2), . . .
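On hereditarily finite sets, the successor is a one-line operation. The following sketch models sets by Python frozensets (an assumption of the illustration); the name successor is illustrative.

```python
def successor(x):
    """S(x) := x ∪ {x}, with sets modelled as frozensets."""
    return x | frozenset({x})

zero = frozenset()
one = successor(zero)     # {0}
two = successor(one)      # {0, 1}
three = successor(two)    # {0, 1, 2}
print(len(one), len(two), len(three))
```

Each number obtained in this way has as its elements precisely the previously constructed numbers.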
Example 2.23. We check which of our toy models M1, . . . , M10 of Def. 2.2 satisfy Axiom
4 (union): As it turns out, Axiom 4 holds in each Mi, except for i = 9:
Axiom 4 holds in M1, since a is the only set in D1 and a is empty. Axiom 4 holds in M3:
If M := a in (2.11), then the only possibility (due to E3) is X = b and, thus, x = a,
implying (2.11) to hold with Y = b (since (a, b) ∈ E3). Switching the roles of a and b
shows (2.11) to hold with Y = a for M := b. Axiom 4 holds in M5: For M := a, (2.11)
is trivially true (with arbitrary Y ∈ D5), since a is empty; for M := b or M := c, (2.11)
still holds with arbitrary Y, since, in both cases, X = a and a is empty.
We leave it as an exercise to verify Axiom 4 also holds in M2, M4, M6, M7, M8, M10.
Axiom 4 does not hold in M9: Consider (2.11) with M := e. Since e contains b and c,
b contains a, and c contains b, we would need an element Y of D9 that contains both a
and b. However, D9 does not contain such an element.
                            M1  M2  M3  M4  M5  M6  M7  M8  M9  M10
Axiom 4 (Union)             T   T   T   T   T   T   T   T   F   T
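Like pairing, the union axiom (2.11) is a single formula and can be checked by enumeration in a finite model. The sketch below encodes only the models whose membership relation can be read off from the text. Since M9 is not fully described here, the failing case is illustrated by a hypothetical four-element model (named HYPOTHETICAL, not one of the models of Def. 2.2) that imitates the pattern of the M9 argument; all names are illustrative assumptions.

```python
# (x, X) in E means "x is an element of X"; relations as described in the text.
MODELS = {
    "M1": ({"a"}, set()),
    "M2": ({"a"}, {("a", "a")}),
    "M3": ({"a", "b"}, {("a", "b"), ("b", "a")}),
    "M5": ({"a", "b", "c"}, {("a", "b"), ("a", "c")}),
    "M10": ({"a", "b"}, {("a", "b"), ("b", "b")}),
}

# Hypothetical model (NOT from Def. 2.2): b = {a}, c = {b}, d = {b, c}.
HYPOTHETICAL = ({"a", "b", "c", "d"}, {("a", "b"), ("b", "c"), ("b", "d"), ("c", "d")})

def union_holds(D, E):
    """Axiom 4: for each M there is Y containing all elements of elements of M."""
    def needed(M):
        return {x for x in D for X in D if (x, X) in E and (X, M) in E}
    return all(
        any(all((x, Y) in E for x in needed(M)) for Y in D)
        for M in D
    )

results = {name: union_holds(D, E) for name, (D, E) in MODELS.items()}
print(results, union_holds(*HYPOTHETICAL))
```

In the hypothetical model, the choice M := d requires a set containing both a and b, which does not exist.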
2.7 Replacement
2.7.1 Replacement Scheme, Cartesian Products
As mentioned before, we desire to define relations and functions in the usual manner,
making use of the Cartesian product A × B of two sets A and B, where A × B consists
of all ordered pairs (x, y), where x ∈ A and y ∈ B. However, Axioms 0 – 4 are not suf-
ficient to justify the existence of Cartesian products. To obtain Cartesian products, we
employ the following axiom of replacement. Analogous to the axiom of comprehension,
the axiom of replacement actually consists of a scheme of infinitely many axioms, one
for each set-theoretic formula. For the formulation of replacement, it is convenient to
introduce another abbreviation:
Notation 2.24. If φ is a set-theoretic formula, then
\[ \exists!_{y}\, \varphi \quad \text{is short for} \quad \exists_{y}\, \bigl( \varphi(y) \wedge \forall_{z}\, ( \varphi(z) \Rightarrow y = z ) \bigr), \tag{2.14} \]
where the notation φ(y) and φ(z) is supposed to mean that, if y is free in φ, then this
free y is replaced by z to obtain φ(z) from φ(y). Thus, \(\exists!_{y}\, \varphi\) holds if, and only if, there
exists a unique set y with the property φ.
Axiom 5 Replacement Scheme: For each set-theoretic formula φ, not containing Y as
a free variable, the universal closure of
\[ \forall_{x \in X}\, \exists!_{y}\, \varphi \;\Rightarrow\; \exists_{Y}\, \forall_{x \in X}\, \exists_{y \in Y}\, \varphi \tag{2.15} \]
is an axiom. Thus, the replacement scheme states that if, for each x ∈ X,
there exists a unique y having the property φ (where, in general, φ will depend
on x), then there exists a set Y that, for each x ∈ X, contains this y with
property φ. One can view this as obtaining Y by replacing each x ∈ X by
the corresponding y = y(x).
Theorem 2.25. Assuming Axioms 0 – 5, the following holds true: If A and B are sets,
then the Cartesian product of A and B, i.e. the class
\[ A \times B := \bigl\{ x : \exists_{a \in A}\, \exists_{b \in B}\; x = (a, b) \bigr\} \tag{2.16} \]
exists as a set.
Proof. For each a ∈ A, we can use replacement with X := B and φ := φa being the
formula y = (a, x) to obtain the existence of the set
{a} × B := {(a, x) : x ∈ B} (2.17a)
(in the usual way, comprehension and extensionality were used as well). Analogously,
using replacement again with X := A and φ being the formula y = {x} × B, we obtain
the existence of the set
M := {{x} × B : x ∈ A}. (2.17b)
In a final step, the union axiom now shows
\[ \bigcup M = \bigcup_{a \in A} \bigl( \{a\} \times B \bigr) = A \times B \tag{2.17c} \]
to be a set as well.
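The three steps (2.17a) – (2.17c) of this proof can be traced on finite sets. In the following sketch, Python frozensets stand in for sets and Python tuples stand in for the ordered pairs of Def. 2.16(c); both are assumptions of the illustration, as are the names replacement, big_union, cartesian.

```python
from itertools import product

def replacement(X, f):
    """Finite analogue of Axiom 5: return {f(x) : x in X}."""
    return frozenset(f(x) for x in X)

def big_union(M):
    """Union of all sets that are elements of M, cf. (2.12)."""
    return frozenset(x for X in M for x in X)

def cartesian(A, B):
    """A x B as in the proof: slices {a} x B (2.17a), their collection M (2.17b), union (2.17c)."""
    M = replacement(A, lambda a: replacement(B, lambda b: (a, b)))
    return big_union(M)

A = frozenset({1, 2})
B = frozenset({"u", "v", "w"})
print(cartesian(A, B) == frozenset(product(A, B)))
```

The comparison with itertools.product only serves as an independent check of the construction.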
Example 2.26. We check which of our toy models M1 , . . . , M10 of Def. 2.2 satisfy
Axiom 5 (replacement): We will see that Axiom 5 holds in M1 , M2 , M3 , M10 , but fails
in M4 , . . . , M9 :
Axiom 5 holds in M1 : Since the only set in D1 is the empty set a, x ∈ X = a in (2.15)
is false for each x, implying (2.15) to hold with Y = a.
Axiom 5 holds in M2: Once again, X = a is the only possibility in (2.15). Since a is the
only element of D2, each admissible φ must hold precisely for y := a, implying (2.15) to
hold with Y = a.
Axiom 5 holds in M3 : The only possibilities in (2.15) are X := a or X := b. In both
cases, since a and b each have precisely one element, each admissible φ must either hold
precisely for y := a (in which case (2.15) holds with Y = b) or precisely for y := b (in
which case (2.15) holds with Y = a).
Axiom 5 does not hold in M4: Consider (2.15) with X := b and
\[ \varphi := \exists_{u, v}\; \bigl( u \neq v \wedge u \in y \wedge v \in y \bigr). \]
Then φ is admissible in (2.15) (since y = c is the unique set in D4 with precisely two
elements). However, there does not exist a set Y ∈ D4 such that (c, Y ) ∈ E4 .
Models M5 – M10 are left as an exercise.
We summarize the toy models’ properties we found so far in the following table⁵:
                            M1  M2  M3  M4  M5  M6  M7  M8  M9  M10
Axiom 5 (Replacement)       T   T   T   F   F   F   F   F   F   T
⁵ In the literature, one sometimes finds the statement that the axiom of replacement plus the existence
of an empty set implies the axiom of comprehension. Models M3 and M10 show that with the axiom
of replacement in the form (2.15), which is the version found, e.g., in [Kun12, Sec. I.2] and [Hal17, Ch.
3.7], this is not the case! The situation is different if the axiom of replacement requires that the set Y
in (2.15) contains precisely those y with φ = φ(x, y) true for some x ∈ X.
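For finite models, the replacement scheme can also be tested by brute force. The sketch rests on the following observation, stated as an assumption of the sketch: as φ may contain parameters, every map f from the elements of X into the domain is described by an admissible formula (a disjunction of clauses x = e ∧ y = f(e)), and each admissible φ determines such a map; hence (2.15) holds for all φ if, and only if, the image of each such map is contained in some Y. Only models whose membership relation can be read off from the text are encoded; all names are illustrative.

```python
from itertools import product

# (x, X) in E means "x is an element of X"; relations as described in the text.
MODELS = {
    "M1": ({"a"}, set()),
    "M2": ({"a"}, {("a", "a")}),
    "M3": ({"a", "b"}, {("a", "b"), ("b", "a")}),
    "M5": ({"a", "b", "c"}, {("a", "b"), ("a", "c")}),
    "M10": ({"a", "b"}, {("a", "b"), ("b", "b")}),
}

def replacement_holds(D, E):
    """(2.15) for all admissible phi: every image of ext(X) lies inside some Y."""
    dom = sorted(D)
    for X in dom:
        elems = [x for x in dom if (x, X) in E]
        # Enumerate all maps from the elements of X into the domain.
        for values in product(dom, repeat=len(elems)):
            image = set(values)
            if not any(all((y, Y) in E for y in image) for Y in dom):
                return False
    return True

results = {name: replacement_holds(D, E) for name, (D, E) in MODELS.items()}
print(results)
```

For the encoded models, the result agrees with the table above; in M5, the map sending a to c has an image that no set of the model contains.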
2.7.2 Relations and Functions
Now that we have the existence of Cartesian products according to Th. 2.25, we proceed
to define relations in the usual way:
Definition 2.27. Assume Axioms 0 – 5. Given sets A and B, each subset R of A × B is
called a relation over A and B (if A = B, then we call R a relation on A). If one wants
to be completely precise, a relation is an ordered triple (A, B, R), where R ⊆ A × B (see
Rem. 2.18(c) above for the definition of ordered triples). The set A is called the domain
of R, denoted dom(R)⁶, B is called the codomain of R, denoted codom(R), and R is
the relation’s graph (here we commit the usual abuse of notation, referring to both the
relation triple and relation’s graph as R). One says that a ∈ A and b ∈ B are related
according to the relation R if, and only if, (a, b) ∈ R. In this context, one often writes
a R b instead of (a, b) ∈ R.
Definition and Remark 2.28. Assume Axioms 0 – 5. Let A, B be sets and let
R ⊆ A × B be a relation over A and B. If T is a subset of A, then call
\[ R(T) := \bigl\{ b \in B : \exists_{a \in T}\; (a, b) \in R \bigr\} \]
the image of T under R; if U is a subset of B, then call
\[ R^{-1}(U) := \bigl\{ a \in A : \exists_{b \in U}\; (a, b) \in R \bigr\} \]
the preimage or inverse image of U under R. Moreover, we call R(A) the image of R
and we call R⁻¹(B) the preimage, inverse image, or active domain of R (cf. footnote
to the definition of domain in Def. 2.27 above). To prove the existence of R(T) and
R⁻¹(U) as sets, apply (2.15) with X := R and
\[ \varphi := \exists_{a}\; x = (a, y) \]
to obtain Y to be a superset of R(T) (and, then, R(T) via comprehension), and with
X := R and
\[ \varphi := \exists_{b}\; x = (y, b) \]
to obtain Y to be a superset of R⁻¹(U) (and, then, R⁻¹(U) via comprehension).
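For a finite relation, images and preimages can be computed directly from the graph. In the following sketch, a relation is modelled by a frozenset of Python tuples (an assumption of the illustration); the names image, preimage and the sample relation are illustrative.

```python
def image(R, T):
    """R(T): all b such that (a, b) is in R for some a in T."""
    return frozenset(b for (a, b) in R if a in T)

def preimage(R, U):
    """R^{-1}(U): all a such that (a, b) is in R for some b in U."""
    return frozenset(a for (a, b) in R if b in U)

A = frozenset({1, 2, 3})
B = frozenset({"x", "y", "z"})
R = frozenset({(1, "x"), (1, "y"), (3, "y")})

print(sorted(image(R, A)))      # image of R
print(sorted(preimage(R, B)))   # active domain of R
```

In this example, the active domain is strictly smaller than A, since 2 is not related to any element of B.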
⁶ As a caveat we note that the notion of domain varies in the literature – for example, [Kun12, Def.
I.7.3] defines a relation’s domain to be what we call its preimage or active domain according to Def.
and Rem. 2.28.
Definition 2.29. Assume Axioms 0 – 5. Let A, B be sets and let R ⊆ A × B be a
relation over A and B.
(a) R is called univalent or right-unique or a partial function if, and only if,
\[ \forall_{x \in A}\, \forall_{y_1, y_2 \in B}\; \bigl( (x\,R\,y_1 \wedge x\,R\,y_2) \Rightarrow y_1 = y_2 \bigr), \]
i.e. if, and only if, every element of A is related to at most one element of B.
(b) R is called total or left-total if, and only if,
\[ \forall_{x \in A}\, \exists_{y \in B}\; (x\,R\,y), \]
i.e. if, and only if, in terms of Def. and Rem. 2.28, the active domain of R is all of
A (i.e. R⁻¹(B) = A).
(c) R is called injective or left-unique if, and only if,
\[ \forall_{x_1, x_2 \in A}\, \forall_{y \in B}\; \bigl( (x_1\,R\,y \wedge x_2\,R\,y) \Rightarrow x_1 = x_2 \bigr), \]
i.e. if, and only if, for every element y of B there exists at most one element of A
that is related to y.
(d) R is called one-to-one if, and only if, it is an injective partial function.
(e) R is called surjective or right-total or onto if, and only if,
\[ \forall_{y \in B}\, \exists_{x \in A}\; (x\,R\,y), \]
i.e. if, and only if, in terms of Def. and Rem. 2.28, the image of R is all of B (i.e.
R(A) = B).
(f ) R is called a function if, and only if, it is a total partial function. In this case, one
usually writes R : A −→ B and one introduces the usual notation
\[ \forall_{x \in A}\, \forall_{y \in B}\; \bigl( R(x) = y \;:\Leftrightarrow\; x \mapsto y \;:\Leftrightarrow\; x\,R\,y \bigr) \]
(where the notation x ↦ y is only useful, if the function R is understood). One
calls x ↦ R(x) the assignment rule of the function. Also note that, for functions,
injective and surjective have their usual meanings, where, for functions, the notions
injective and one-to-one coincide. Moreover, a function is called bijective if, and
only if, it is both injective and surjective.
(g) If A = B, then R is called the identity on A if, and only if, R : A −→ A, R(x) = x.
For the identity on A, one writes IdA (or simply Id, if A is understood). Actually,
the identity on A is the same as the equality relation “=” on A, sometimes also
called the diagonal on A, denoted ∆(A). Thus, one has
\[ \mathrm{Id}_A = \Delta(A) := \{ (x, x) \in A \times A : x \in A \} \]
and
\[ \forall_{x, y \in A}\; \bigl( x = y \Leftrightarrow \mathrm{Id}_A(x) = y \Leftrightarrow (x, y) \in \Delta(A) \bigr). \]
The preferred terms and notation depend on the emphasis being either on the
function perspective or the relation perspective.
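For finite relations, the properties (a) – (g) become simple predicates on the graph. In the sketch below, a relation is a frozenset of Python tuples and the sets A, B are passed explicitly where the property depends on them; these conventions and all function names are illustrative assumptions.

```python
def univalent(R):
    """(a): every x is related to at most one y."""
    return all(y1 == y2 for (x1, y1) in R for (x2, y2) in R if x1 == x2)

def total(R, A):
    """(b): every element of A is related to some y."""
    return all(any(a == x for (x, _) in R) for a in A)

def injective(R):
    """(c): for every y there is at most one x related to y."""
    return all(x1 == x2 for (x1, y1) in R for (x2, y2) in R if y1 == y2)

def surjective(R, B):
    """(e): every element of B has some x related to it."""
    return all(any(b == y for (_, y) in R) for b in B)

def is_function(R, A):
    """(f): a total partial function."""
    return univalent(R) and total(R, A)

def identity(A):
    """(g): the diagonal of A."""
    return frozenset((x, x) for x in A)

A = frozenset({1, 2, 3})
B = frozenset({"x", "y"})
R = frozenset({(1, "x"), (2, "x"), (3, "y")})
print(is_function(R, A), injective(R), surjective(R, B))
```

The sample relation R is a surjective function that is not injective, whereas identity(A) satisfies all of the properties with B := A.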
Definition 2.30. Assume Axioms 0 – 5 and let R ⊆ A × B be a relation over sets A
and B.
(a) The relation
\[ R^{-1} := \bigl\{ (b, a) \in B \times A : a\,R\,b \bigr\} \subseteq B \times A \]
is called the inverse or converse relation of R (note that the notation R⁻¹ is consistent
with the notation introduced in Def. and Rem. 2.28).
(b) Given U ⊆ A, the relation S ⊆ U × B over U and B, defined by
\[ S := \bigl\{ (a, b) \in U \times B : a\,R\,b \bigr\} \]
is called the restriction of R to U ; R is called an extension of S to A. In this
situation, one also uses the notation R↾U for S (some authors prefer the notation
R|U or R|U and often one is less precise and still writes R for the restriction). If R
is a relation on A (i.e. R ⊆ A × A), then we also define its strong restriction to U ,
denoted R↾↾U ⊆ U × U , to be the relation on U defined by
\[ R{\upharpoonright\upharpoonright}U := \bigl\{ (a, b) \in U \times U : a\,R\,b \bigr\} \]
(in general, one then has R↾↾U ⊊ R↾U ).
(c) Given a relation T ⊆ C × D over sets C and D, the composition of R and T is the
relation over A and D defined by
\[ T \circ R := \bigl\{ (a, d) \in A \times D : \exists_{b \in B \cap C}\; ( a\,R\,b \wedge b\,T\,d ) \bigr\} \subseteq A \times D. \]
The expression T ◦ R is read as “T after R” or “T composed with R”. Of course,
if R and T are functions with R(A) ⊆ C, then T ◦ R is the function
\[ T \circ R : A \longrightarrow D, \quad (T \circ R)(a) = T\bigl( R(a) \bigr). \]
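The operations of this definition are easily computed for finite relations. In the following sketch, a relation is modelled by a frozenset of Python tuples; this convention, the function names, and the sample relations are illustrative assumptions.

```python
def inverse(R):
    """(a): the converse relation of R."""
    return frozenset((b, a) for (a, b) in R)

def restrict(R, U):
    """(b): the restriction of R to U (only first components are constrained)."""
    return frozenset((a, b) for (a, b) in R if a in U)

def strong_restrict(R, U):
    """(b): the strong restriction of R to U (both components must lie in U)."""
    return frozenset((a, b) for (a, b) in R if a in U and b in U)

def compose(T, R):
    """(c): 'T after R', i.e. all (a, d) with a R b and b T d for some b."""
    return frozenset((a, d) for (a, b) in R for (c, d) in T if b == c)

R = frozenset({(1, 2), (2, 3), (3, 1)})   # a relation on {1, 2, 3}
S = frozenset({(2, "p"), (3, "q")})
U = frozenset({1, 2})

print(sorted(restrict(R, U)), sorted(strong_restrict(R, U)))
print(sorted(compose(S, R)))
```

In this example, the strong restriction is a proper subset of the restriction, since the pair (2, 3) leaves U in its second component.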
Proposition 2.31. Assume Axioms 0 – 5. Consider sets A, B, C, D, E, F and relations

R ⊆ A × B, S ⊆ C × D, T ⊆ E × F .
(a) Associativity of Compositions: It holds that
T ◦ (S ◦ R) = (T ◦ S) ◦ R. (2.18)
(b) Properties of the Inverse Relation: One has
(R−1 )−1 = R. (2.19a)
Moreover,
R is a partial function ⇔ R−1 is injective, (2.19b)

R is injective ⇔ R−1 is a partial function, (2.19c)
R is one-to-one ⇔ R−1 is one-to-one, (2.19d)
R is surjective ⇔ R−1 is total, (2.19e)
R is total ⇔ R−1 is surjective, (2.19f)
R is a function ⇔ R−1 is injective and surjective, (2.19g)
R is injective and surjective ⇔ R−1 is a function, (2.19h)
R is a bijective function ⇔ R−1 is a bijective function. (2.19i)
(c) The law for forming inverse relations reads:
(S ◦ R)−1 = R−1 ◦ S −1 . (2.20)
(d) One has the following law for forming images and preimages:
\[ \forall_{U \subseteq A}\; (S \circ R)(U) = S(R(U)), \tag{2.21a} \]
\[ \forall_{W \subseteq D}\; (S \circ R)^{-1}(W) = R^{-1}(S^{-1}(W)). \tag{2.21b} \]
(e) If R and S are both partial functions (resp. both injective or both one-to-one), then
so is S ◦ R.
(f ) Assuming R(A) ⊆ C, the following holds true: If R and S are both total (resp. both
a function), then so is S ◦ R (but see Ex. 2.32(a)).
(g) Assuming S −1 (D) ⊆ B, the following holds true: If R and S are both surjective,
then so is S ◦ R (but see Ex. 2.32(a)).
(h) Assuming B = C, the following holds true⁷: If R and S are both bijective functions,
then so is S ◦ R (but see Ex. 2.32(a)).
(i) If R is a bijective function, then R−1 ◦ R = IdA (but see Ex. 2.32(b)).
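Proof. (a): According to Def. 2.30(c), both T ◦ (S ◦ R) and (T ◦ S) ◦ R are relations over
A and F . So it just remains to prove
\[ \forall_{(a, f) \in A \times F}\; \bigl( (a, f) \in T \circ (S \circ R) \Leftrightarrow (a, f) \in (T \circ S) \circ R \bigr). \]
Indeed, we obtain, for each (a, f ) ∈ A × F ,
\[ \begin{aligned}
(a, f) \in T \circ (S \circ R) &\Leftrightarrow \exists_{d \in D \cap E}\; \bigl( a\,(S \circ R)\,d \wedge d\,T\,f \bigr) \\
&\Leftrightarrow \exists_{b \in B \cap C}\, \exists_{d \in D \cap E}\; \bigl( a\,R\,b \wedge b\,S\,d \wedge d\,T\,f \bigr) \\
&\Leftrightarrow \exists_{b \in B \cap C}\; \bigl( a\,R\,b \wedge b\,(T \circ S)\,f \bigr) \Leftrightarrow (a, f) \in (T \circ S) \circ R,
\end{aligned} \]
thereby establishing the case.
(b) – (e) are left as exercises.
(f): Assume R(A) ⊆ C. If R and S are total, then, given a ∈ A,
\[ \exists_{b \in B}\; (a\,R\,b) \;\overset{b \in C}{\Rightarrow}\; \exists_{b \in B \cap C}\, \exists_{d \in D}\; \bigl( a\,R\,b \wedge b\,S\,d \bigr) \;\Rightarrow\; \exists_{d \in D}\; a\,(S \circ R)\,d, \]
proving S ◦ R to be total. If R and S are both functions, then they are both total partial
functions, implying S ◦ R to be a total partial function (i.e. a function) by combining
what we have just proved with (e).
(g) – (i) are left as exercises.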
⁷ As one wants to apply (f) and (g), instead of B = C, one might be inclined to use the hypotheses
R(A) ⊆ C and S⁻¹(D) ⊆ B, since, at first glance, this might appear weaker. However, the also assumed
surjectivity of R then yields R(A) = B ⊆ C and the also assumed totality of S (i.e. surjectivity of S⁻¹)
then yields S⁻¹(D) = C ⊆ B, and we are back to B = C.
Example 2.32. (a) To see that Prop. 2.31(f),(g),(h) are not correct without their respective
assumptions R(A) ⊆ C, S⁻¹(D) ⊆ B, B = C, consider A := B := {1, 2},
C := D := {2, 3}, R := IdA = {(1, 1), (2, 2)}, S := IdC = {(2, 2), (3, 3)}. Then R
and S are bijective functions, but S ◦ R = {(2, 2)} is neither total nor surjective.
An even simpler example is given by A := B := {1}, C := D := {2}, R := IdA,
S := IdC, S ◦ R = ∅.
(b) To see that the converse of Prop. 2.31(i) does not hold, consider A := {1}, B :=
{1, 2, 3}, R := {(1, 1), (1, 2)}. Then R is not a function and not surjective, but still
R⁻¹ = {(1, 1), (2, 1)} and R⁻¹ ◦ R = IdA.
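Both counterexamples can be recomputed mechanically. The following sketch models relations by frozensets of Python tuples; the helper names compose, inverse, identity and the variable names are illustrative assumptions.

```python
def compose(T, R):
    """'T after R': all (a, d) with a R b and b T d for some b."""
    return frozenset((a, d) for (a, b) in R for (c, d) in T if b == c)

def inverse(R):
    """The converse relation of R."""
    return frozenset((b, a) for (a, b) in R)

def identity(A):
    """The diagonal of A."""
    return frozenset((x, x) for x in A)

# Ex. 2.32(a): A = B = {1, 2}, C = D = {2, 3}.
A, D = {1, 2}, {2, 3}
SR = compose(identity({2, 3}), identity({1, 2}))
sr_total = all(any(x == a for (x, _) in SR) for a in A)
sr_surjective = all(any(y == d for (_, y) in SR) for d in D)
print(sorted(SR), sr_total, sr_surjective)

# Ex. 2.32(b): A = {1}, R = {(1, 1), (1, 2)}.
R2 = frozenset({(1, 1), (1, 2)})
print(compose(inverse(R2), R2) == identity({1}))
```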
Definition 2.33. Assume Axioms 0 – 5 and let R be a relation on a set A, i.e. R ⊆ A×A.
(a) R is called reflexive if, and only if,
\[ \forall_{x \in A}\; x\,R\,x, \]
i.e. if, and only if, every element is related to itself.
(b) R is called symmetric if, and only if,
\[ \forall_{x, y \in A}\; \bigl( x\,R\,y \Rightarrow y\,R\,x \bigr), \]
i.e. if, and only if, each x is related to y if, and only if, y is related to x.
(c) R is called antisymmetric if, and only if,

\[ \underset{x,y \in A}{\forall}\; \bigl( (x\,R\,y \,\wedge\, y\,R\,x) \;\Rightarrow\; x = y \bigr), \]
i.e. if, and only if, the only possibility for x to be related to y at the same time that
y is related to x is in the case x = y.
(d) R is called asymmetric if, and only if,

\[ \underset{x,y \in A}{\forall}\; \bigl( x\,R\,y \;\Rightarrow\; \neg(y\,R\,x) \bigr), \]
i.e. if, and only if, x is related to y only if y is not related to x.
(e) R is called transitive if, and only if,

\[ \underset{x,y,z \in A}{\forall}\; \bigl( (x\,R\,y \,\wedge\, y\,R\,z) \;\Rightarrow\; x\,R\,z \bigr), \]
i.e. if, and only if, the relatedness of x and y together with the relatedness of y and
z implies the relatedness of x and z.
(f ) R is called an equivalence relation if, and only if, R is reflexive, symmetric, and
transitive.
(g) R satisfies trichotomy if, and only if,

\[ \underset{x,y \in A}{\forall}\; \bigl( x\,R\,y \,\vee\, y\,R\,x \,\vee\, x = y \bigr), \]
i.e. if, and only if, if x and y are distinct, then x is related to y or y is related to x.
(h) R is called a partial order if, and only if, R is reflexive, antisymmetric, and transitive.
If R is a partial order, then one usually writes x ≤ y instead of x R y. A partial
order is called a total or linear order if, and only if, it also satisfies trichotomy.
(i) R is called a strict partial order if, and only if, R is asymmetric and transitive. If
R is a strict partial order, then one usually writes x < y instead of x R y. A strict partial
order is called a strict total or strict linear order if, and only if, it also satisfies
trichotomy.
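For a relation on a finite set, each property of Def. 2.33 can be tested by brute force. The following is a minimal sketch, assuming a relation is represented as a Python set of pairs over a finite set `A`; all function names are hypothetical helpers chosen for illustration. The divisibility relation on {1, 2, 3, 4, 6, 12} serves as an example of a partial order that is not total.

```python
from itertools import product


def is_reflexive(R, A):
    return all((x, x) in R for x in A)


def is_symmetric(R):
    return all((y, x) in R for (x, y) in R)


def is_antisymmetric(R):
    return all(x == y for (x, y) in R if (y, x) in R)


def is_asymmetric(R):
    return all((y, x) not in R for (x, y) in R)


def is_transitive(R):
    return all((x, z) in R for (x, y) in R for (w, z) in R if y == w)


def satisfies_trichotomy(R, A):
    return all((x, y) in R or (y, x) in R or x == y
               for x, y in product(A, repeat=2))


def is_partial_order(R, A):
    return is_reflexive(R, A) and is_antisymmetric(R) and is_transitive(R)


def is_strict_partial_order(R):
    return is_asymmetric(R) and is_transitive(R)


A = {1, 2, 3, 4, 6, 12}
# x divides y: a partial order on A that is not total (4 and 6 are incomparable).
divides = {(x, y) for x, y in product(A, repeat=2) if y % x == 0}
# Removing the diagonal yields the corresponding strict relation.
strictly_divides = {(x, y) for (x, y) in divides if x != y}
# The usual order on A is a total order.
usual_leq = {(x, y) for x, y in product(A, repeat=2) if x <= y}

print(is_partial_order(divides, A), satisfies_trichotomy(divides, A))
```

Since R ⊆ A × A, the checks for symmetry, antisymmetry, asymmetry, and transitivity only need to iterate over the pairs of R itself.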
Lemma 2.34. If ≤ is a partial order on a set A, then, using the notation of Def. 2.29(g),
< := ≤ \ ∆(A) is a strict partial order (called the strict partial order corresponding to
≤). Conversely, if < is a strict partial order on A, then ≤ := < ∪ ∆(A) is a partial
order (called the partial order corresponding to <).
Proof. Let ≤ be a partial order on A, < := ≤ \ ∆(A). If x, y ∈ A with x < y, then
x ≠ y (since ¬(x < x)). Then, also ¬(y < x): Otherwise, we would have x ≤ y and y ≤ x,
implying the contradiction x = y (by the antisymmetry of ≤). Thus, < is asymmetric.
Now suppose we have x < y and y < z with x, y, z ∈ A. Then x ≤ y and y ≤ z,
implying x ≤ z by the transitivity of ≤. If x = z, then z = x ≤ y ≤ z, implying y = z
by the antisymmetry of ≤, in contradiction to y < z. Thus, x ≠ z and x < z, showing < to
be transitive and a strict partial order. The proof that ≤ := < ∪ ∆(A) is a partial order, if < is a given strict
partial order, is left as an exercise.
Proposition 2.35. Let R be a relation on a set A and R−1 its inverse relation as defined
in Def. 2.30(a). Then
R is reflexive ⇔ R−1 is reflexive, (2.22a)

R is symmetric ⇔ R−1 is symmetric, (2.22b)
R is antisymmetric ⇔ R−1 is antisymmetric, (2.22c)
R is asymmetric ⇔ R−1 is asymmetric, (2.22d)
R is transitive ⇔ R−1 is transitive, (2.22e)
R is an equivalence relation ⇔ R−1 is an equivalence relation, (2.22f)
R satisfies trichotomy ⇔ R−1 satisfies trichotomy, (2.22g)
R is a partial (resp. total) order ⇔ R−1 is a partial (resp. total) order, (2.22h)
R is a str. par. (resp. total) order ⇔ R−1 is a str. par. (resp. total) order, (2.22i)
(R \ ∆(A))−1 = R−1 \ ∆(A). (2.22j)
Proof. Since R = (R−1 )−1 , for each equivalence, it suffices to prove just one implication
(the converse then follows by applying the first implication with R replaced by R−1 ).
Let x, y, z ∈ A. Then
x R x ⇒ x R−1 x,
proving (2.22a). If R is transitive, then
x R−1 y ∧ y R−1 z ⇒ zRy ∧ yRx ⇒ zRx ⇒ x R−1 z,
showing R−1 to be transitive and (2.22e). Also,
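\[
\begin{aligned}
(x,y) \in R^{-1} \setminus \Delta(A)
&\;\Leftrightarrow\; (y,x) \in R \,\wedge\, x \neq y
\;\Leftrightarrow\; (y,x) \in R \setminus \Delta(A) \\
&\;\Leftrightarrow\; (x,y) \in \bigl( R \setminus \Delta(A) \bigr)^{-1},
\end{aligned}
\]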
thereby proving (2.22j). We leave the remaining cases (all straightforward) as exercises.

Definition 2.36. Let ≤ be a partial order on A ≠ ∅, ∅ ≠ B ⊆ A.
(a) x ∈ A is called lower (resp. upper) bound for B if, and only if, x ≤ b (resp. b ≤ x)
for each b ∈ B. Moreover, B is called bounded from below (resp. from above) if, and
only if, there exists a lower (resp. upper) bound for B; B is called bounded if, and
only if, it is bounded from above and from below.
(b) x ∈ B is called minimum or just min (resp. maximum or max) of B if, and only if,
x is a lower (resp. upper) bound for B. One writes x = min B if x is minimum and
x = max B if x is maximum.
(c) A maximum of the set of lower bounds of B (i.e. a largest lower bound) is called
infimum of B, denoted inf B; a minimum of the set of upper bounds of B (i.e. a
smallest upper bound) is called supremum of B, denoted sup B.
We extend all the notions defined above to strict partial orders < by applying them
to the partial order corresponding to <, i.e. to ≤ := < ∪ ∆(A): For example, we call
x ∈ A a lower bound of B ⊆ A with respect to < if, and only if, x is a lower bound of
B with respect to ≤, and analogous for the other notions.
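On a finite partially ordered set, the notions of Def. 2.36 can be computed directly from the definitions. The following is a minimal sketch, assuming the partial order is given as a set of pairs `leq`; the function names are hypothetical, and `None` is returned (an assumption of this sketch) when the respective element does not exist. Divisibility on {1, 2, 3, 4, 6, 12} is used as the example order.

```python
from itertools import product


def lower_bounds(leq, A, B):
    """All x in A with x <= b for each b in B."""
    return {x for x in A if all((x, b) in leq for b in B)}


def upper_bounds(leq, A, B):
    """All x in A with b <= x for each b in B."""
    return {x for x in A if all((b, x) in leq for b in B)}


def minimum(leq, B):
    """An element of B that is a lower bound for B, or None if there is none."""
    candidates = [x for x in B if all((x, b) in leq for b in B)]
    return candidates[0] if candidates else None


def maximum(leq, B):
    """An element of B that is an upper bound for B, or None if there is none."""
    candidates = [x for x in B if all((b, x) in leq for b in B)]
    return candidates[0] if candidates else None


def infimum(leq, A, B):
    """Maximum of the set of lower bounds of B (largest lower bound)."""
    return maximum(leq, lower_bounds(leq, A, B))


def supremum(leq, A, B):
    """Minimum of the set of upper bounds of B (smallest upper bound)."""
    return minimum(leq, upper_bounds(leq, A, B))


A = {1, 2, 3, 4, 6, 12}
divides = {(x, y) for x, y in product(A, repeat=2) if y % x == 0}

B = {4, 6}
print(lower_bounds(divides, A, B), upper_bounds(divides, A, B))
print(minimum(divides, B), maximum(divides, B))
print(infimum(divides, A, B), supremum(divides, A, B))
```

For B = {4, 6}, neither min B nor max B exists (4 and 6 are incomparable), whereas inf B and sup B do exist in A.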
Lemma 2.37. Let ≤ and < be relations on a set A, where ≤ is a partial order and < is
a strict partial order. Let ≥ := (≤)−1 and > := (<)−1 be the respective inverse relations
according to Def. 2.30(a), i.e.

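\[ \underset{x,y \in A}{\forall}\; \bigl( (x \geq y \;\Leftrightarrow\; y \leq x) \,\wedge\, (x > y \;\Leftrightarrow\; y < x) \bigr). \tag{2.23} \]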
According to (2.22h) and (2.22i), ≥ is also a partial order on A and > is also a strict
partial order on A, where ≤ (resp. <) being total on A, implies ≥ (resp. >) to be total
on A as well. If < is the strict order corresponding to ≤, then > is the strict order
corresponding to ≥. Moreover, for A ≠ ∅ and ∅ ≠ B ⊆ A, using obvious notation, we
have, for each x ∈ A,
x ≤-lower bound for B ⇔ x ≥-upper bound for B, (2.24a)
x ≤-upper bound for B ⇔ x ≥-lower bound for B, (2.24b)
x = min≤ B ⇔ x = max≥ B, (2.24c)
x = max≤ B ⇔ x = min≥ B, (2.24d)
x = inf ≤ B ⇔ x = sup≥ B, (2.24e)
x = sup≤ B ⇔ x = inf ≥ B. (2.24f)
All the equivalences in (2.24) also hold if ≤ is replaced by < and ≥ is replaced by >.
Proof. If < is the strict order corresponding to ≤, then

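\[
{<} = {\leq} \setminus \Delta(A)
\;\Rightarrow\;
{>} = (<)^{-1} = \bigl( {\leq} \setminus \Delta(A) \bigr)^{-1}
\overset{(2.22j)}{=} (\leq)^{-1} \setminus \Delta(A) = {\geq} \setminus \Delta(A),
\]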
i.e. > is the strict order corresponding to ≥. Moreover,
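\[
x \text{ $\leq$-lower bound for } B
\;\Leftrightarrow\; \underset{b \in B}{\forall}\; x \leq b
\;\Leftrightarrow\; \underset{b \in B}{\forall}\; b \geq x
\;\Leftrightarrow\; x \text{ $\geq$-upper bound for } B,
\]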
proving (2.24a). Analogously, we obtain (2.24b). Next, (2.24c) and (2.24d) are implied
by (2.24a) and (2.24b), respectively. Finally, (2.24e) is proved by
x = inf ≤ B ⇔ x = max≤ {y ∈ A : y ≤-lower bound for B}
⇔ x = min≥ {y ∈ A : y ≥-upper bound for B} ⇔ x = sup≥ B,
and (2.24f) follows analogously. That all the equivalences in (2.24) also hold if ≤ is
replaced by < and ≥ is replaced by > is now immediate from the last paragraph of Def.
2.36.
Proposition 2.38. Let ≤ be a partial order on A ≠ ∅, ∅ ≠ B ⊆ A. The elements
max B, min B, sup B, inf B are all unique, provided they exist.
Proof. Exercise.
Definition 2.39. Let A, B be nonempty sets with partial orders, both denoted by ≤
(even though they might be different). A function f : A −→ B, is called (strictly)
isotone, order-preserving, or increasing if, and only if,

\[ \underset{x,y \in A}{\forall}\; \bigl( x < y \;\Rightarrow\; f(x) \leq f(y) \ \ (\text{resp. } f(x) < f(y)) \bigr); \tag{2.25a} \]
f is called (strictly) antitone, order-reversing, or decreasing if, and only if,

\[ \underset{x,y \in A}{\forall}\; \bigl( x < y \;\Rightarrow\; f(x) \geq f(y) \ \ (\text{resp. } f(x) > f(y)) \bigr). \tag{2.25b} \]
Functions that are (strictly) isotone or antitone are called (strictly) monotone.
Proposition 2.40. Let A, B be nonempty sets with partial orders, both denoted by ≤.
(a) A (strictly) isotone function f : A −→ B becomes a (strictly) antitone function

and vice versa if precisely one of the relations ≤ is replaced by ≥.
(b) If the order ≤ on A is total and f : A −→ B is strictly isotone or strictly antitone,

then f is injective.
(c) If the order ≤ on A is total and f : A −→ B is bijective and strictly isotone (resp.
antitone), then f −1 is also strictly isotone (resp. antitone).
Proof. (a) is immediate from (2.25).

(b): Due to (a), it suffices to consider the case that f is strictly isotone. If f is strictly
isotone and x ≠ y, then x < y or y < x since the order on A is total. Thus, f(x) < f(y)
or f(y) < f(x), i.e. f(x) ≠ f(y) in every case, showing f is injective.
(c): Again, due to (a), it suffices to consider the isotone case. If u, v ∈ B such that u < v,
then u = f (f −1 (u)), v = f (f −1 (v)), and the isotonicity of f imply f −1 (u) < f −1 (v) (we
are using that the order on A is total – otherwise, f −1 (u) and f −1 (v) need not be
comparable).
Example 2.41. The following examples show that the assertions of Prop. 2.40(b),(c)
are no longer correct if one does not assume the order on A to be total. Let

A := {(1, 1), (2, 1), (1, 2)}.
Then
(m1 , m2 ) ≤ (n1 , n2 ) ⇔ m1 ≤ n1 ∧ m2 ≤ n2 , (2.26)
defines a partial order on A that is not a total order (for example, neither (1, 2) ≤ (2, 1)
nor (2, 1) ≤ (1, 2)).
(a) The function
\[ f : A \longrightarrow \{1, 2\}, \qquad f(1,1) := 1, \quad f(1,2) := 2, \quad f(2,1) := 2, \]
is strictly isotone, but not one-to-one.

(b) The function
\[ f : A \longrightarrow \{1, 2, 3\}, \qquad f(1,1) := 1, \quad f(1,2) := 2, \quad f(2,1) := 3, \]

is strictly isotone and bijective, however f −1 is not isotone (since 2 < 3, but
f −1 (2) = (1, 2) and f −1 (3) = (2, 1) are not comparable, i.e. f −1 (2) ≤ f −1 (3) is
not true).
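Both parts of Example 2.41 can be verified by enumerating all pairs of A. The following is a minimal sketch, assuming the functions are stored as Python dictionaries and the orders as two-argument predicates; the names `leq_A`, `lt_A`, `is_strictly_isotone`, and `is_isotone` are hypothetical helpers.

```python
A = {(1, 1), (2, 1), (1, 2)}


def leq_A(p, q):
    """Componentwise order (2.26) on A."""
    return p[0] <= q[0] and p[1] <= q[1]


def lt_A(p, q):
    """Strict order corresponding to leq_A."""
    return leq_A(p, q) and p != q


def is_strictly_isotone(f, dom, lt_dom, lt_cod):
    """x < y implies f(x) < f(y) for all x, y in dom."""
    return all(lt_cod(f[x], f[y]) for x in dom for y in dom if lt_dom(x, y))


def is_isotone(f, dom, lt_dom, leq_cod):
    """x < y implies f(x) <= f(y) for all x, y in dom."""
    return all(leq_cod(f[x], f[y]) for x in dom for y in dom if lt_dom(x, y))


def int_lt(u, v):
    return u < v


# Example 2.41(a): strictly isotone, but not injective.
f_a = {(1, 1): 1, (1, 2): 2, (2, 1): 2}
a_strictly_isotone = is_strictly_isotone(f_a, A, lt_A, int_lt)
a_injective = len(set(f_a.values())) == len(f_a)

# Example 2.41(b): strictly isotone and bijective, but the inverse is not isotone.
f_b = {(1, 1): 1, (1, 2): 2, (2, 1): 3}
f_b_inv = {v: k for k, v in f_b.items()}
b_strictly_isotone = is_strictly_isotone(f_b, A, lt_A, int_lt)
b_inverse_isotone = is_isotone(f_b_inv, {1, 2, 3}, int_lt, leq_A)

print(a_strictly_isotone, a_injective, b_strictly_isotone, b_inverse_isotone)
```

The only pairs with x < y in A are (1, 1) < (2, 1) and (1, 1) < (1, 2), which is why both functions pass the isotonicity check even though (1, 2) and (2, 1) are treated very differently.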
Definition 2.42. A relation R on a set A is called a (strict) well-order if, and only if,
R is a (strict) total order and every nonempty subset of A has a min with respect to R
(for example, we will see that the usual ≤ constitutes a well-order on N; however, the
usual ≤ does not constitute a well-order on Z (e.g., Z does not have a min) or on R⁺₀
(e.g., R⁺ does not have a min)).
Definition 2.43. (a) Let R be a relation on a set A. We define R≠ := R \ ∆(A), i.e.
R≠ is the relation on A defined by
x R≠ y :⇔ x R y ∧ x ≠ y
(for example, if ≤ is a partial order, then < := (≤)≠ is the corresponding strict
partial order, cf. Lem. 2.34).
(b) Let R be a relation on a set A and let S be a relation on a set B. We define a
relation P := R ⊙ S on A × B, called the lexicographic product of R and S, where
\[
(a_1, b_1)\,P\,(a_2, b_2) \;:\Leftrightarrow\; (a_1, b_1)\,(R \odot S)\,(a_2, b_2)
\;:\Leftrightarrow\; a_1\,R_{\neq}\,a_2 \;\vee\; (a_1 = a_2 \,\wedge\, b_1\,S\,b_2).
\]
Proposition 2.44. Let R be a relation on a set A, let S be a relation on a set B, and

let P := R ⊙ S be the lexicographic product on A × B, as defined in Def. 2.43.
(a) If S is reflexive, then P is reflexive.
(b) If R and S are symmetric, then P is symmetric.
(c) If R and S are antisymmetric, then P is antisymmetric.
(d) If R and S are asymmetric, then P is asymmetric.
(e) If R≠ and S are transitive, then P is transitive (but see Ex. 2.45).
(f ) If R and S satisfy trichotomy, then P satisfies trichotomy.
(g) If R and S are (strict) partial orders, then P is a (strict) partial order. In this
situation, one calls P the lexicographic order given by R and S. It is also common
to denote all three orders R, S, P by the same symbol ≤ (or all by < in the strict
case).
(h) If R and S are (strict) total orders, then P is a (strict) total order.
(i) If R and S are (strict) well-orders, then P is a (strict) well-order.
Proof. Let a, a1 , a2 , a3 ∈ A and b, b1 , b2 , b3 ∈ B.

(a): If S is reflexive, then (a, b) P (a, b), since a = a and b S b.
(b): Exercise.
(c): If R and S are antisymmetric, then (a1, b1) P (a2, b2) ∧ (a2, b2) P (a1, b1) implies
a1 = a2 (otherwise, a1 R≠ a2 and a2 R≠ a1 would both need to hold, in contradiction to the
antisymmetry of R). Thus,
\[
(a_1, b_1)\,P\,(a_2, b_2) \,\wedge\, (a_2, b_2)\,P\,(a_1, b_1)
\;\Rightarrow\; a_1 = a_2 \,\wedge\, b_1\,S\,b_2 \,\wedge\, b_2\,S\,b_1
\;\Rightarrow\; (a_1, b_1) = (a_2, b_2),
\]
showing P to be antisymmetric.
(d),(e): Exercise.
(f): Assume R and S to satisfy trichotomy. Then

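\[
\begin{aligned}
\neg\bigl( (a_1, b_1)\,P\,(a_2, b_2) \bigr) \,\wedge\, \neg\bigl( (a_2, b_2)\,P\,(a_1, b_1) \bigr)
&\;\Rightarrow\; \neg(a_1\,R_{\neq}\,a_2) \,\wedge\, \neg(a_2\,R_{\neq}\,a_1)
\;\Rightarrow\; a_1 = a_2 \\
&\;\Rightarrow\; \neg(b_1\,S\,b_2) \,\wedge\, \neg(b_2\,S\,b_1)
\;\Rightarrow\; b_1 = b_2
\;\Rightarrow\; (a_1, b_1) = (a_2, b_2),
\end{aligned}
\]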
proving P to satisfy trichotomy.

(g),(h): Exercise.
(i): Assume R and S are (strict) well-orders. Then P is a (strict) total order by (h). If
∅ ≠ C ⊆ A × B, then, letting
\[ A_1 := \Bigl\{ a \in A : \underset{b \in B}{\exists}\; (a, b) \in C \Bigr\}, \]
we have ∅ ≠ A1 ⊆ A. Since R is a (strict) well-order, there exists α := min A1. Now,
letting
\[ B_1 := \bigl\{ b \in B : (\alpha, b) \in C \bigr\}, \]
we have ∅ ≠ B1 ⊆ B. Since S is a (strict) well-order, there exists β := min B1. We show
(α, β) = min C: Let (a, b) ∈ C. If a ≠ α, then α R≠ a, since a, α ∈ A1 and α = min A1.
If a = α, then b = β or β S≠ b, since b, β ∈ B1 and β = min B1. Thus, (a, b) = (α, β)
or (α, β) P (a, b) (both hold if P is a total order, but not if P is a strict total order),
showing (α, β) to be a lower bound for C. Since, also, (α, β) ∈ C, we have shown
(α, β) = min C and P is a (strict) well-order.
Example 2.45. To see that the lexicographic product of transitive relations need not
be transitive and that the lexicographic product of equivalence relations need not be
an equivalence relation, consider A := {1, 2} with R := {(1, 1), (2, 2), (1, 2), (2, 1)}, and
S := {(1, 1), (2, 2)}. It is an exercise to show R and S are both equivalence relations,
but R ⊙ S is not transitive (in particular, not an equivalence relation).
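The failure of transitivity in Example 2.45 can be exhibited by computing R ⊙ S explicitly. The following is a minimal sketch, assuming relations are sets of pairs and assuming (as the example leaves implicit) that S is taken on B := {1, 2}; the names `strict_part`, `lex_product`, and `is_transitive` are hypothetical helpers.

```python
from itertools import product


def strict_part(R):
    """R with the diagonal removed, i.e. the relation R≠ of Def. 2.43(a)."""
    return {(x, y) for (x, y) in R if x != y}


def lex_product(R, A, S, B):
    """Lexicographic product R ⊙ S on A × B as in Def. 2.43(b)."""
    R_ne = strict_part(R)
    pairs = list(product(A, B))
    return {(p, q) for p in pairs for q in pairs
            if (p[0], q[0]) in R_ne or (p[0] == q[0] and (p[1], q[1]) in S)}


def is_transitive(R):
    return all((x, z) in R for (x, y) in R for (w, z) in R if y == w)


A = B = {1, 2}

# Example 2.45: two equivalence relations whose lexicographic product is not transitive.
R = {(1, 1), (2, 2), (1, 2), (2, 1)}
S = {(1, 1), (2, 2)}
P = lex_product(R, A, S, B)

# For comparison: the usual order on both factors yields the lexicographic order.
leq = {(x, y) for x, y in product(A, repeat=2) if x <= y}
P_leq = lex_product(leq, A, leq, B)

print(is_transitive(P), is_transitive(P_leq))
```

A concrete witness in this sketch is (1, 1) P (2, 1) and (2, 1) P (1, 2), while (1, 1) P (1, 2) fails, since 1 S 2 does not hold. Note that R≠ = {(1, 2), (2, 1)} is itself not transitive here, which is consistent with Prop. 2.44(e).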
Lemma 2.46. Let R be a relation on a set A, U ⊆ A, and let R↾↾U denote its strong
restriction to U as defined in Def. 2.30(b).
(a) If R is reflexive, then R↾↾U is reflexive.
(b) If R is symmetric, then R↾↾U is symmetric.
(c) If R is antisymmetric, then R↾↾U is antisymmetric.
(d) If R is asymmetric, then R↾↾U is asymmetric.
(e) If R is transitive, then R↾↾U is transitive.
(f ) If R is an equivalence relation, then R↾↾U is an equivalence relation.
(g) If R satisfies trichotomy, then R↾↾U satisfies trichotomy.
(h) If R is a (strict) partial order, then R↾↾U is a (strict) partial order.

(i) If R is a (strict) total order, then R↾↾U is a (strict) total order.
(j) If R is a (strict) well-order, then R↾↾U is a (strict) well-order.
(k) R≠↾↾U = (R↾↾U)≠.
Proof. Let x, y, z ∈ U . Since U ⊆ A, we then have x, y, z ∈ A, which is the key

ingredient to the proofs below.
(a): If R is reflexive, then x R x, showing R↾↾U to be reflexive.
(b): If R is symmetric, then x R y implies y R x, showing R↾↾U to be symmetric.
(c): If R is antisymmetric, then x R y and y R x implies x = y, showing R ↾↾U to be
antisymmetric.
(d): If R is asymmetric, then x R y implies ¬(y R x), showing R↾↾U to be asymmetric.
(e): If R is transitive, then x R y and y R z implies x R z, showing R↾↾U to be transitive.
(f) follows by combining (a), (b), and (e).
(g): If R satisfies trichotomy, then
xRy ∨ yRx ∨ x = y
holds, showing R↾↾U to satisfy trichotomy.

(h) follows by combining (a), (c), and (e) (resp. (d) and (e) in the strict case).
(i) follows by combining (h) with (g).
(j): Due to (i), it merely remains to show that every nonempty subset V ⊆ U has a
min. However, since V ⊆ A and R is a well-order on A, there is m ∈ V such that m is
a min of V with respect to R, implying m to be a min of V with respect to R↾↾U as well.
(k): Since
(x, y) ∈ R≠↾↾U ⇔ x R y ∧ x ≠ y ⇔ (x, y) ∈ (R↾↾U)≠,
the proof is complete.
2.7.3 Ordinals
In preparation for our official definition of N in Def. 2.72 below, we will study so-called
ordinals, which are special sets also of further interest to the field of set theory (the
natural numbers will turn out to be precisely the finite ordinals).
Definition 2.47. A set X is called transitive if, and only if, every element of X is also
a subset of X:
\[ \underset{x \in X}{\forall}\; x \subseteq X. \tag{2.27a} \]
Clearly, (2.27a) is equivalent to

\[ \underset{x,y}{\forall}\; \bigl( (x \in y \,\wedge\, y \in X) \;\Rightarrow\; x \in X \bigr). \tag{2.27b} \]
Lemma 2.48. (a) Intersections of transitive sets are transitive: If M is a nonempty
set, then
\[ \Bigl( \underset{X \in M}{\forall}\; X \text{ is transitive} \Bigr) \;\Rightarrow\; \bigcap M \text{ is transitive} \]
(in particular, if X, Y are transitive sets, then X ∩ Y is a transitive set).
(b) Unions of transitive sets are transitive: If M is a set, then
\[ \Bigl( \underset{X \in M}{\forall}\; X \text{ is transitive} \Bigr) \;\Rightarrow\; \bigcup M \text{ is transitive} \]
(in particular, if X, Y are transitive sets, then X ∪ Y is a transitive set).

Proof. (a): If x ∈ ⋂M and X ∈ M, then x ∈ X. Thus, if y ∈ x, then y ∈ X, since X
is transitive, showing y ∈ ⋂M. In consequence, ⋂M is transitive.
(b): If x ∈ ⋃M, then there exists X ∈ M such that x ∈ X. Thus, if y ∈ x, then
y ∈ X, since X is transitive. This, in turn, implies y ∈ ⋃M, since X ⊆ ⋃M, showing
⋃M to be transitive.
Definition 2.49. (a) A set α is called an ordinal number or just an ordinal if, and only
if, α is transitive and ∈ constitutes a strict well-order on α. An ordinal α is called a
successor ordinal if, and only if, there exists an ordinal β such that α = S(β), where
S is the successor function of Def. 2.22. An ordinal α ≠ 0 is called a limit ordinal
if, and only if, it is not a successor ordinal. We denote the class of all ordinals by
ON (it is a proper class by Cor. 2.57 below).
(b) We define
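\[ \underset{\alpha,\beta \in \mathrm{ON}}{\forall}\; (\alpha < \beta \;:\Leftrightarrow\; \alpha \in \beta), \tag{2.28a} \]
\[ \underset{\alpha,\beta \in \mathrm{ON}}{\forall}\; (\alpha \leq \beta \;:\Leftrightarrow\; \alpha < \beta \,\vee\, \alpha = \beta). \tag{2.28b} \]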
Notation 2.50. Given a set A, we define the element relation R∈ on A by
R∈ := {(x, y) ∈ A × A : x ∈ y}, (2.29a)
i.e.
\[ \underset{x,y \in A}{\forall}\; \bigl( (x, y) \in R_{\in} \;\Leftrightarrow\; x \in y \bigr). \tag{2.29b} \]
Example 2.51. (a) Using (2.9), 0 = ∅ is an ordinal, and 1 = S(0), 2 = S(1) are both
successor ordinals (in Prop. 2.74, we will identify N0 as the smallest limit ordinal).
Even though X := {1} and Y := {0, 2} are well-ordered by ∈, they are not ordinals,
since they are not transitive sets: 1 ∈ X, but 1 ⊈ X (since 0 ∈ 1, but 0 ∉ X);
similarly, 1 ∈ 2 ∈ Y, but 1 ∉ Y.
(b) As a caveat, we point out that, in general, saying that a set A is transitive is not
equivalent to saying that R∈ is transitive on A. Actually, in general, neither implication
is true: In (a), we saw that R∈ was a transitive relation on the nontransitive
sets X and Y. To see that the converse implication can fail, consider
A := {0, 1, 2, {1}}.
Recalling 1 = {0} and 2 = {0, 1}, we observe A to be a transitive set. However, R∈
is not transitive on A, since 0 ∈ 1 and 1 ∈ {1}, but 0 ∉ {1}.
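The distinction drawn in Example 2.51 between a transitive set and a set on which R∈ is transitive can be checked on hereditarily finite sets. The following is a minimal sketch, assuming such sets are modelled as nested Python `frozenset` objects; the names `successor`, `is_transitive_set`, and `membership_is_transitive_on` are hypothetical helpers.

```python
def successor(x):
    """S(x) = x ∪ {x}."""
    return x | frozenset({x})


ZERO = frozenset()
ONE = successor(ZERO)   # {0}
TWO = successor(ONE)    # {0, 1}


def is_transitive_set(X):
    """Every element of X is also a subset of X, cf. (2.27a)."""
    return all(x <= X for x in X)


def membership_is_transitive_on(A):
    """The element relation on A is transitive: x ∈ y and y ∈ z imply x ∈ z."""
    return all(x in z for x in A for y in A for z in A if x in y and y in z)


X = frozenset({ONE})
Y = frozenset({ZERO, TWO})
A = frozenset({ZERO, ONE, TWO, frozenset({ONE})})

for name, s in (("X", X), ("Y", Y), ("A", A)):
    print(name, is_transitive_set(s), membership_is_transitive_on(s))
```

In this sketch, X and Y are not transitive sets although the element relation is (vacuously) transitive on them, while A is a transitive set on which the element relation fails to be transitive.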
Lemma 2.52. No ordinal contains itself, i.e.
\[ \underset{\alpha \in \mathrm{ON}}{\forall}\; \alpha \notin \alpha. \]
Proof. If α is an ordinal, then ∈ is a strict order on α. Due to the asymmetry of strict
orders, x ∈ x cannot be true for any element x of α. Since α ∈ α would make x := α such
an element, α ∈ α cannot be true.
Proposition 2.53. Every element of an ordinal is an ordinal, i.e.

\[ \underset{\alpha \in \mathrm{ON}}{\forall}\; \bigl( X \in \alpha \;\Rightarrow\; X \in \mathrm{ON} \bigr) \]
(in other words, ON is a transitive class).
Proof. Let α ∈ ON and X ∈ α. Since α is transitive, we have X ⊆ α. As ∈ is a strict

well-order on α, it must also be a strict well-order on X by Lem. 2.46(j). In consequence,
it only remains to prove that X is transitive as well. To this end, let x ∈ X. Then
x ∈ α, as α is transitive. If y ∈ x, then, using transitivity of α again, y ∈ α. Now
y ∈ X, as ∈ is transitive on α, proving x ⊆ X, i.e. X is transitive.
Proposition 2.54. If α, β ∈ ON, then X := α ∩ β ∈ ON (we will see in Th. 2.59(a)

below that, actually, α ∩ β = min{α, β} and, moreover, the result extends to arbitrary
intersections and, analogously, to arbitrary unions).
Proof. The set X is transitive by Lem. 2.48(a), and, since X ⊆ α, ∈ is a strict well-order
on X by Lem. 2.46(j).
Proposition 2.55. On the class ON, the relation ≤ (as defined in (2.28)) is the same
as the relation ⊆, i.e.

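\[ \underset{\alpha,\beta \in \mathrm{ON}}{\forall}\; \Bigl( \alpha \leq \beta \;\Leftrightarrow\; \alpha \subseteq \beta \;\Leftrightarrow\; (\alpha \in \beta \,\vee\, \alpha = \beta) \Bigr). \tag{2.30} \]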
Proof. Exercise.
Theorem 2.56. The class ON is strictly well-ordered by ∈, i.e.
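(i) ∈ is transitive on ON:
\[ \underset{\alpha,\beta,\gamma \in \mathrm{ON}}{\forall}\; \bigl( (\alpha < \beta \,\wedge\, \beta < \gamma) \;\Rightarrow\; \alpha < \gamma \bigr). \]
(ii) ∈ is asymmetric on ON:
\[ \underset{\alpha,\beta \in \mathrm{ON}}{\forall}\; \bigl( \alpha < \beta \;\Rightarrow\; \neg(\beta < \alpha) \bigr). \]
(iii) Ordinals are always comparable:
\[ \underset{\alpha,\beta \in \mathrm{ON}}{\forall}\; \bigl( \alpha < \beta \,\vee\, \beta < \alpha \,\vee\, \alpha = \beta \bigr). \]
(iv) Every nonempty set of ordinals has a min.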
Proof. (i) is clear, as γ is a transitive set.

(ii): If α, β ∈ ON, then α ∈ β ∈ α implies α ∈ α by (i), which is a contradiction to
Lem. 2.52.
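(iii): Let γ := α ∩ β. Then γ ∈ ON by Prop. 2.54. Thus
\[
\gamma \subseteq \alpha \,\wedge\, \gamma \subseteq \beta
\;\overset{\text{Prop. 2.55}}{\Rightarrow}\;
(\gamma \in \alpha \,\vee\, \gamma = \alpha) \,\wedge\, (\gamma \in \beta \,\vee\, \gamma = \beta). \tag{2.31}
\]
If γ ∈ α and γ ∈ β, then γ ∈ α ∩ β = γ, in contradiction to Lem. 2.52. Thus, by (2.31),
γ = α or γ = β. If γ = α, then α ⊆ β, i.e. α ≤ β by Prop. 2.55. If γ = β, then β ⊆ α,
i.e. β ≤ α, completing the proof of (iii).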
(iv): Let X be a nonempty set of ordinals and consider α ∈ X. If α = min X, then
we are already done. Otherwise, Y := α ∩ X = {β ∈ X : β ∈ α} ≠ ∅. Since α is
well-ordered by ∈, there is m := min Y. If β ∈ X, then either β < α or α ≤ β by (iii).
If β < α, then β ∈ Y and m ≤ β. If α ≤ β, then m < α ≤ β. Thus, m = min X,
proving (iv).
Corollary 2.57. ON is a proper class (i.e. there is no set containing all the ordinals).
Proof. If there is a set X containing all ordinals, then, by comprehension, β := ON =

{α ∈ X : α is an ordinal} must be a set as well. But then Prop. 2.53 says that the set
β is transitive and Th. 2.56 yields that the set β is well-ordered by ∈, implying β to be
an ordinal, i.e. β ∈ β in contradiction to Lem. 2.52.
Corollary 2.58. For each set X of ordinals, we have:
(a) X is well-ordered by ∈.
(b) X is an ordinal if, and only if, X is transitive. Note: A transitive set of ordinals
X is sometimes called an initial segment of ON, since, here, transitivity can be
restated in the form

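\[ \underset{\alpha \in \mathrm{ON}}{\forall}\;\underset{\beta \in X}{\forall}\; \bigl( \alpha < \beta \;\Rightarrow\; \alpha \in X \bigr). \tag{2.32} \]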
Proof. (a) is a simple consequence of Th. 2.56(i)-(iv).

(b) is immediate from (a).
Theorem 2.59. Let X be a nonempty set of ordinals.
(a) Then γ := ⋂X is an ordinal, namely γ = min X. In particular, if α, β ∈ ON,
then min{α, β} = α ∩ β.
(b) Then δ := ⋃X is an ordinal, namely δ = sup X. In particular, if α, β ∈ ON, then
max{α, β} = α ∪ β.
Proof. (a): Let m := min X. Then γ ⊆ m, since m ∈ X. Conversely, if α ∈ X, then

m ≤ α implies m ⊆ α by Prop. 2.55, i.e. m ⊆ γ. Thus, m = γ, proving (a).
(b): Exercise.
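For finite ordinals, the statements of Th. 2.59 can be confirmed by direct computation. The following is a minimal sketch, assuming finite von Neumann ordinals are modelled as nested `frozenset` objects; the names `successor`, `von_neumann`, and `is_ordinal` are hypothetical helpers. Since only finite sets occur, a strict total order is automatically a strict well-order, so `is_ordinal` does not check the min property separately.

```python
from functools import reduce


def successor(x):
    """S(x) = x ∪ {x}."""
    return x | frozenset({x})


def von_neumann(n):
    """The finite ordinal n = {0, 1, ..., n-1}."""
    alpha = frozenset()
    for _ in range(n):
        alpha = successor(alpha)
    return alpha


def is_ordinal(X):
    """X is a transitive set that is strictly totally ordered by ∈ (finite case).

    A Python frozenset can never be a member of itself, so ∈ is irreflexive
    here; together with transitivity, this yields asymmetry.
    """
    transitive_set = all(x <= X for x in X)
    trichotomy = all(x in y or y in x or x == y for x in X for y in X)
    transitive_rel = all(x in z for x in X for y in X for z in X
                         if x in y and y in z)
    return transitive_set and trichotomy and transitive_rel


X = {von_neumann(2), von_neumann(3), von_neumann(5)}
big_cap = reduce(frozenset.intersection, X)   # ⋂X
big_cup = reduce(frozenset.union, X)          # ⋃X

print(big_cap == von_neumann(2), big_cup == von_neumann(5))
```

For finite ordinals, the number of elements of n equals n, so min X and sup X can be read off via `len`.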
Next, we obtain some results regarding the successor function S of Def. 2.22 in the
context of ordinals.
Lemma 2.60. We have

\[ \underset{\alpha \in \mathrm{ON}}{\forall}\; \bigl( (x, y \in S(\alpha) \,\wedge\, x \in y) \;\Rightarrow\; x \neq \alpha \bigr). \]
Proof. Seeking a contradiction, we reason as follows:

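\[
x = \alpha
\;\overset{\alpha \notin \alpha}{\Rightarrow}\; y \neq \alpha
\;\overset{y \in S(\alpha)}{\Rightarrow}\; y \in \alpha
\;\overset{\alpha \text{ transitive}}{\Rightarrow}\; y \subseteq \alpha
\;\overset{x \in y}{\Rightarrow}\; \alpha \in \alpha.
\]
This contradiction to α ∉ α yields x ≠ α, concluding the proof.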
Proposition 2.61. For each α ∈ ON, the following holds:
(a) S(α) ∈ ON.

(b) α < S(α).
(c) For each ordinal β, β < S(α) holds if, and only if, β ≤ α.
(d) For each ordinal β, if β < α, then S(β) ≤ α < S(α).
(e) For each ordinal β, if S(β) < S(α), then β < α.
(f ) If α is a limit ordinal, then α = sup α. If α is a successor ordinal, then α =
S(sup α).
Proof. (a): Due to Prop. 2.53, S(α) is a set of ordinals. Thus, by Cor. 2.58(b), it merely
remains to prove that S(α) is transitive. Let x ∈ S(α). If x = α, then x = α ⊆ α ∪ {α} =
S(α). If x ≠ α, then x ∈ α and, since α is transitive, this implies x ⊆ α ⊆ S(α), showing
S(α) to be transitive, thereby completing the proof of (a).
(b) holds, as α ∈ S(α) holds by the definition of S(α).
(c) is clear, since, for each ordinal β,
β < S(α) ⇔ β ∈ S(α) ⇔ β ∈ α ∨ β = α ⇔ β ≤ α.
(d): If β < α, then S(β) = β ∪ {β} ⊆ α, i.e. S(β) ≤ α < S(α).

(e) follows from (d) using contraposition: If ¬(β < α), then β = α or α < β, implying
S(β) = S(α) or S(α) < S(β), i.e. ¬(S(β) < S(α)).
(f): Suppose α > 0. Since β < α means β ∈ α, α is an upper bound for α, showing
sup α ≤ α. Since α is a set of ordinals by Prop. 2.53, Th. 2.59(b) yields sup α = ⋃α. If
β ∈ α, then (b) and (d) imply β < S(β) ≤ α. If α is a limit ordinal, then S(β) < α, i.e.
β ∈ S(β) ∈ α and β ∈ ⋃α, proving α ⊆ ⋃α = sup α and, thus, α = sup α. If α is a
successor ordinal, then there exists β ∈ α with S(β) = α. It remains to show β = max α:
If β < γ ∈ ON, then, by (d), α = S(β) ≤ γ, showing γ ∉ α. Since β ∈ α, this shows
β = max α = sup α.
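Parts (b), (c), and the successor case of (f) of Prop. 2.61 can be spot-checked on small finite ordinals. The following is a minimal sketch, assuming finite von Neumann ordinals are modelled as nested `frozenset` objects; the names `successor`, `von_neumann`, `less`, `leq`, and `big_union` are hypothetical helpers. The limit-ordinal case of (f) cannot be illustrated this way, since every nonzero finite ordinal is a successor ordinal.

```python
def successor(x):
    """S(x) = x ∪ {x}."""
    return x | frozenset({x})


def von_neumann(n):
    """The finite ordinal n = {0, 1, ..., n-1}."""
    alpha = frozenset()
    for _ in range(n):
        alpha = successor(alpha)
    return alpha


def less(a, b):
    """a < b for ordinals, i.e. a ∈ b, cf. (2.28a)."""
    return a in b


def leq(a, b):
    """a ≤ b for ordinals, cf. (2.28b)."""
    return a in b or a == b


def big_union(X):
    """⋃X for a set X of sets."""
    result = frozenset()
    for x in X:
        result = result | x
    return result


ordinals = [von_neumann(n) for n in range(6)]

# (b): α < S(α).
check_b = all(less(a, successor(a)) for a in ordinals)

# (c): β < S(α) holds if, and only if, β ≤ α.
check_c = all(less(b, successor(a)) == leq(b, a)
              for a in ordinals for b in ordinals)

# (f), successor case: α = S(sup α), where sup α = ⋃α by Th. 2.59(b).
check_f = all(successor(big_union(a)) == a for a in ordinals[1:])

print(check_b, check_c, check_f)
```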
In Th. 2.67 below, we will show that, up to isomorphism, ordinals are the only strictly
well-ordered sets. While we are mostly interested in order isomorphisms, it seems to
make sense to introduce homomorphism for relations in general:
Definition 2.62. Let A, B be sets, let R be a relation on A, and let S be a relation
on B. A function f : A −→ B is called a homomorphism between (A, R) and (B, S) if,
and only if,
\[ \underset{x,y \in A}{\forall}\; \bigl( x\,R\,y \;\Rightarrow\; f(x)\,S\,f(y) \bigr). \tag{2.33} \]
If f is a homomorphism, then it is called monomorphism if, and only if, it is injective;
epimorphism if, and only if, it is surjective; isomorphism if, and only if, it is bijective
and f−1 : B −→ A is a homomorphism as well; endomorphism if, and only if,
(A, R) = (B, S); automorphism if, and only if, it is both endomorphism and isomorphism.
Moreover, (A, R) and (B, S) are called isomorphic (denoted (A, R) ≅ (B, S))
if, and only if, there exists an isomorphism f : A −→ B. In this case, we also write
f : (A, R) ≅ (B, S).
Lemma 2.63. Let A, B be sets with total orders, both denoted by ≤ and the respective
corresponding strict total orders both denoted by <. Given a function f : A −→ B, the
following statements are equivalent:
(i) f is an isomorphism with respect to the total orders, i.e. f : (A, ≤) ≅ (B, ≤).
(ii) f is an isomorphism with respect to the strict total orders, i.e. f : (A, <) ≅ (B, <).
(iii) f is strictly isotone and surjective.
Proof. Exercise.
Definition 2.64. Let R be a relation on a set A. Using the notation of Def. and Rem.
2.28, we define
\[ \underset{a \in A}{\forall}\; a{\downarrow} := \operatorname{pred}(A, a) := \operatorname{pred}(A, a, R) := R^{-1}(\{a\}) = \{x \in A : x\,R\,a\}, \]
where we use the notation pred(A, a) and a↓ if R or both R and A are understood. One
can think of pred(A, a, R) as the set of predecessors of a in A with respect to the relation
R (which is especially useful, if R constitutes an order relation on A). If R well-orders
A, then one can also think of pred(A, a, R) as an initial segment of A with respect to
the well-order.
Lemma 2.65. Isomorphisms between well-ordered sets map initial segments to initial
segments: If A, B are sets with strict well-orders, both denoted by <, and
f : (A, <) ≅ (B, <) is an isomorphism, then
\[ f : (A, <) \cong (B, <) \;\Rightarrow\; \underset{a \in A}{\forall}\; f(a{\downarrow}) = (f(a)){\downarrow}. \]
Proof. If y ∈ f(a↓), then there exists x ∈ A with x < a such that y = f(x). Then, as f is
strictly isotone by Lem. 2.63, y = f(x) < f(a), i.e. y ∈ (f(a))↓, proving f(a↓) ⊆ (f(a))↓.
We can now apply what we just proved with a replaced by f(a) and f replaced by f−1
to obtain f−1((f(a))↓) ⊆ (f−1(f(a)))↓ = a↓. Applying f to both sides of this inclusion
yields (f(a))↓ ⊆ f(a↓), thereby completing the proof of the lemma.
In Th. 2.71 and Th. 2.77 below, we will justify the proof method of induction on the
set of natural numbers and, subsequently, we will generalize induction proofs such that
they can be applied on general well-ordered sets and even on well-ordered classes (like
ON) and still more general objects. The basic idea of induction proofs is as follows: To
prove that an assertion P(x) holds for all x ∈ C, C being a suitable class, one first establishes
that P (x) holds for all “small” x ∈ C, then assumes the existence of a smallest x ∈ C
with ¬P (x), showing this to provide a contradiction. We will see first examples of this
strategy in the proofs of Prop. 2.66 and Th. 2.67 below.
Proposition 2.66. If α, β ∈ ON and f : (α, <) ≅ (β, <), then α = β and f = Idα (in
particular, the identity is the unique automorphism on an ordinal).
Proof. If α = 0 or β = 0, then there is nothing to prove. Thus, let ξ ∈ α. Then

f (ξ) ∈ β. Since ξ ∈ ON, we have ξ = ξ↓ and Lem. 2.65 implies
f (ξ) = f (ξ↓ ) = (f (ξ))↓ = {f (µ) : µ < ξ} = {f (µ) : µ ∈ ξ}. (2.34)
Now let X := {ξ ∈ α : f(ξ) ≠ ξ} and, seeking a contradiction, assume X ≠ ∅. Since
< is a strict well-order on α, there exists m := min X ∈ X. Thus, for each µ ∈ m, we
have µ < m and f(µ) = µ, implying
\[ f(m) \overset{(2.34)}{=} \{f(\mu) : \mu \in m\} = \{\mu \in \alpha : \mu \in m\} = m, \]
in contradiction to m ∈ X. In consequence, we have shown X = ∅ and f = Idα; as f is
surjective onto β, this also yields α = β.
Theorem 2.67. If A is a set and < is a strict well-order on A, then there exists a
unique α ∈ ON such that (A, <) ≅ (α, ∈) (we then define type(A) := type(A, <) := α
and call α the order type of the strict well-order (A, <); we write type(A), if the strict
well-order < on A is understood). Moreover, the isomorphism f : (A, <) ≅ (α, ∈) is
unique.
Proof. Uniqueness of α is clear due to Prop. 2.66. If f, g : (A, <) ≅ (α, ∈) are both
isomorphisms, then Prop. 2.66 yields
Idα = f ◦ g−1 ⇒ g = f,
proving uniqueness of the isomorphism f . It remains to prove existence. The idea

is to show that the theorem’s claim holds for each initial segment of A and then, in
consequence, for A. If A = ∅, then the empty function f := ∅ provides the isomorphism
f : (A, <) ≅ (0, ∈) (recall 0 = ∅ as well). Thus, we now assume A ≠ ∅ and we call
a ∈ A good if, and only if,
\[ \underset{f(a) := \xi \in \mathrm{ON}}{\exists}\; (a{\downarrow}, <) \cong (\xi, \in). \]
Letting G := {a ∈ A : a good}, we know G ≠ ∅, since, for m := min A, we have m↓ = ∅
and f(m) = 0. Due to the uniqueness of f(a) for each a ∈ A, we can use Axiom 5
(replacement) to obtain
\[ \underset{B}{\exists}\;\underset{a \in G}{\forall}\;\underset{\xi \in B}{\exists}\; \bigl( \xi \in \mathrm{ON} \,\wedge\, (a{\downarrow}, <) \cong (\xi, \in) \bigr), \]
justifying the function definition f : G −→ B, a ↦ f(a). Let fa denote the corresponding
unique isomorphism fa : a↓ −→ f(a). If c ∈ a↓, then gc := fa↾c↓ : c↓ −→ gc(c↓) is
surjective and strictly isotone (as fa is strictly isotone), implying gc to be an isomorphism
by Lem. 2.63(iii). Moreover, Lem. 2.65 yields gc(c↓) = fa(c↓) = (fa(c))↓ = fa(c),
as fa(c) is an ordinal. The uniqueness of isomorphisms now implies gc = fc. Thus, we
have shown
\[ \underset{a \in G}{\forall}\;\underset{c \in a{\downarrow}}{\forall}\; \bigl( c \in G \,\wedge\, f_c = f_a{\restriction}_{c{\downarrow}} \,\wedge\, f(c) = f_a(c) \bigr). \tag{2.35} \]
Thus, c, a ∈ G with c < a implies f(c) = fa(c) ∈ f(a), showing f : G −→ f(G) ⊆ ON
to be strictly isotone. As it is also surjective, it is an isomorphism by Lem. 2.63(iii). We
finish the proof by showing f(G) ∈ ON and G = A. To verify f(G) being transitive,
consider a ∈ G and ν ∈ f(a). Since fa : a↓ ≅ f(a), there exists c ∈ a↓ with ν = f(c).
Then c ∈ G (cf. (2.35)) and ν = f(c) ∈ f(G), showing f(G) to be transitive and,
thus, f(G) ∈ ON by Cor. 2.58(b). Finally, seeking a contradiction, assume G ≠ A
and let m := min(A \ G). Then m↓ = G: For a ∈ m↓, one has a < m and, thus,
a ∈ G (since m = min(A \ G)), showing m↓ ⊆ G; conversely, if a ∈ G, then a < m
(otherwise m < a, implying m ∈ G by (2.35)) and, thus, a ∈ m↓, showing G ⊆ m↓. Now
(m↓, <) = (G, <) ≅ (f(G), ∈), i.e. m ∈ G in contradiction to m ∈ A \ G. Summarizing,
we have proved f : (A, <) ≅ (α, ∈) with α := f(G).
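For a finite strictly well-ordered set, the isomorphism of Th. 2.67 can be computed recursively via f(a) = {f(c) : c < a}, which mirrors the identity f(a) = f(a↓) used in the proof. The following is a minimal sketch, assuming A is a finite set of hashable elements and the strict well-order is given as a predicate `lt`; the names `order_type` and `von_neumann` are hypothetical, and the recursion is an illustration for the finite case only, not the replacement-based argument of the proof.

```python
def von_neumann(n):
    """The finite ordinal n = {0, 1, ..., n-1} as nested frozensets."""
    alpha = frozenset()
    for _ in range(n):
        alpha = alpha | frozenset({alpha})
    return alpha


def order_type(A, lt):
    """Return (alpha, f), where f maps each a in A to the ordinal of its
    initial segment and alpha = f(A) is the order type of (A, lt).

    Assumes lt is a strict well-order on the finite set A, so the
    recursion f(a) = {f(c) : c < a} terminates.
    """
    cache = {}

    def f(a):
        if a not in cache:
            cache[a] = frozenset(f(c) for c in A if lt(c, a))
        return cache[a]

    alpha = frozenset(f(a) for a in A)
    return alpha, dict(cache)


# Three integers with the usual strict order have order type 3.
A = {10, 20, 30}
alpha, iso = order_type(A, lambda x, y: x < y)

# {0, 1} x {0, 1} with the strict lexicographic order has order type 4
# (Python compares tuples lexicographically).
pairs = {(0, 0), (0, 1), (1, 0), (1, 1)}
alpha_lex, iso_lex = order_type(pairs, lambda p, q: p < q)

print(alpha == von_neumann(3), alpha_lex == von_neumann(4))
```

In this sketch, c < a holds if, and only if, `iso[c] in iso[a]`, i.e. the computed map is an isomorphism onto (α, ∈).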
2.8 Infinity
2.8.1 Natural Numbers
The following axiom of infinity guarantees the existence of infinite sets (e.g., it will allow
us to define the set of natural numbers N, which is infinite by Th. 2.80 below).
Axiom 6 Infinity:
\[ \underset{X}{\exists}\; \Bigl( 0 \in X \,\wedge\, \underset{x \in X}{\forall}\; (x \cup \{x\} \in X) \Bigr). \tag{2.36} \]
Thus, the infinity axiom states the existence of a set X containing ∅ (identified
with the number 0), and, for each of its elements x, its successor
S(x) = x ∪ {x}.
Example 2.68. We would like to check which of our toy models M1 , . . . , M10 of Def. 2.2
satisfies Axiom 6 (one might expect that the answer is “none”, since the models all are
finite, however M10 will show that Axiom 6 does not guarantee the existence of infinite
sets in the absence of comprehension). There is a slight complication arising from the
fact that the formulation of (2.36) already makes use of Axiom 1 (extensionality) and
Axiom 4 (union). Therefore, for the purpose of this example only, we replace (2.36) by

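\[
\underset{X}{\exists}\; \Bigl( \underset{Y \in X}{\exists}\;\underset{y}{\forall}\; y \notin Y
\;\wedge\; \underset{x \in X}{\forall}\;\underset{Z \in X}{\exists}\; \bigl( x \in Z \,\wedge\, (u \in x \Rightarrow u \in Z) \bigr) \Bigr). \tag{2.37}
\]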
Indeed, each M1 , . . . , M9 violates (2.37): In D1 , there exists no set containing a; in D2 ,

D3 , and D4 there exists no empty set. To abbreviate the following arguments, we say
that X is x-bad, if x ∈ X and there does not exist Z ∈ X with x ∈ Z. Note that X
violates (2.37), if there exists x such that X is x-bad. In D5 , a is empty and b, c are
both a-bad; in D6 , a is empty, b is a-bad, c is b-bad, and d is c-bad; in D7 , a is empty, b
is a-bad, c is b-bad; in D8 , b is empty, c is b-bad; in D9 , a is empty, b is a-bad, c is b-bad,
d is c-bad, e is c-bad. In M10 , (2.37) does hold: Indeed, (2.37) is satisfied with X := b: b
contains the empty set a and the second part of (2.37) is also satisfied: For both x := a
and x := b, one can choose Z := b, since both a and b are in b, and each element of
either a (there is none) or b (namely a and b) are also in b. Once again, this example
shows that Axiom 6 does not guarantee the existence of infinite sets in the absence of
comprehension.
                         M1   M2   M3   M4   M5   M6   M7   M8   M9   M10
Axiom 5 (Replacement)    T    T    T    F    F    F    F    F    F    T
Axiom 6 (Infinity)       F    F    F    F    F    F    F    F    F    T
We now proceed to define the natural numbers:
Definition 2.69. An ordinal n is called a natural number if, and only if,

\[ n \neq 0 \;\wedge\; \underset{m \in \mathrm{ON}}{\forall}\; \bigl( m \leq n \;\Rightarrow\; (m = 0 \,\vee\, m \text{ is successor ordinal}) \bigr). \]
Proposition 2.70. If n = 0 or n is a natural number, then S(n) is a natural number
and every element of n is a natural number or 0.
Proof. Suppose n is 0 or a natural number. If m ∈ n, then m is an ordinal by Prop.
2.53. Suppose m ≠ 0 and k ≤ m. Then k ∈ n, since m ∈ n and n is transitive. Since n is
a natural number, k = 0 or k is a successor ordinal. Thus, m is a natural number. It
remains to show that S(n) is a natural number. By definition, S(n) = n ∪ {n} ≠ 0.
Moreover, S(n) ∈ ON by Prop. 2.61(a), and, thus, S(n) is a successor ordinal. If
m ∈ S(n), then m ≤ n, implying m = 0 or m is a successor ordinal, completing the proof
that S(n) is a natural number.
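Def. 2.69 and Prop. 2.70 can be checked on small cases. The following Python sketch is a sanity check under stated assumptions, not a proof: ordinals are modelled as nested frozensets built from 0 = ∅ by S(n) = n ∪ {n}, the relation m ≤ n is read as "m ∈ n or m = n", and the functions are only meant to be applied to ordinals obtained in this way. All names (ZERO, S, is_successor, is_natural) are chosen here for illustration.

```python
ZERO = frozenset()  # the ordinal 0, i.e. the empty set


def S(n):
    """Successor S(n) = n ∪ {n}."""
    return n | frozenset([n])


def is_successor(m):
    """True if m = S(k) for some k; such a k is necessarily an element of m."""
    return any(S(k) == m for k in m)


def is_natural(n):
    """Def. 2.69 for finite ordinals: n ≠ 0 and every m ≤ n is 0 or a successor."""
    below_or_equal = list(n) + [n]  # assumption: m ≤ n means m ∈ n or m = n
    return n != ZERO and all(m == ZERO or is_successor(m) for m in below_or_equal)


# Build the ordinals 0, 1, ..., 5 by iterating S.
ordinals = [ZERO]
for _ in range(5):
    ordinals.append(S(ordinals[-1]))

# Check both claims of Prop. 2.70 for each of these n.
for n in ordinals:
    assert is_natural(S(n))                            # S(n) is a natural number
    assert all(m == ZERO or is_natural(m) for m in n)  # elements of n are natural or 0
```

The loop completes without an assertion error for n = 0, ..., 5. Note that 0 itself is excluded from the natural numbers by the condition n ≠ 0 in Def. 2.69, which is why is_natural(ZERO) is false.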
References
[Bla84] A. Blass. Existence of Bases Implies the Axiom of Choice. Contemporary
Mathematics 31 (1984), 31–33.
[Can95] G. Cantor. Beiträge zur Begründung der transfiniten Mengenlehre (1).
Math. Ann. 46 (1895), No. 4, 481–512 (German).
[EFT07] H.-D. Ebbinghaus, J. Flum, and W. Thomas. Einführung in die mathematische
Logik, 5th ed. Spektrum Akademischer Verlag, Heidelberg, 2007 (German).
[Hal17] Lorenz J. Halbeisen. Combinatorial Set Theory, 2nd ed. Springer Monographs
in Mathematics, Springer, Cham, Switzerland, 2017.
[Jec73] T. Jech. The Axiom of Choice. North-Holland, Amsterdam, 1973.
[Kun80] Kenneth Kunen. Set Theory. Studies in Logic and the Foundations of
Mathematics, Vol. 102, North-Holland, Amsterdam, 1980.
[Kun12] Kenneth Kunen. The Foundations of Mathematics. Studies in Logic,
Vol. 19, College Publications, London, 2012.
[Kun13] Kenneth Kunen. Set Theory. Studies in Logic, Vol. 34, College Publications,
London, 2013, revised edition.
[Phi19] P. Philip. Linear Algebra I. Lecture Notes, Ludwig-Maximilians-Universität,
Germany, 2018/2019, AMS Open Math Notes Ref. # OMN:202109.111304,
available in PDF format at
https://www.ams.org/open-math-notes/omn-view-listing?listingId=111304.
[Rus80] Bertrand Russell. Correspondence with Frege, Gottlob Frege: Philosophical
and Mathematical Correspondence. Translated by Hans Kaal. W.W. Norton & Company,
1980.
[Rus96] Bertrand Russell. The Principles of Mathematics, reprint of 2nd ed.
W.W. Norton & Company, New York, 1996, first published in 1903.
[Wik21] Wikipedia Contributors. HOL Light — Wikipedia, The Free Encyclopedia.
https://en.wikipedia.org/w/index.php?title=HOL_Light, 2021,
[Online; accessed 5-February-2023].
[Wik22a] Wikipedia Contributors. Berry paradox — Wikipedia, The Free Encyclopedia.
https://en.wikipedia.org/w/index.php?title=Berry_paradox, 2022,
[Online; accessed 22-December-2023].
[Wik22b] Wikipedia Contributors. Coq — Wikipedia, The Free Encyclopedia.
https://en.wikipedia.org/w/index.php?title=Coq, 2022,
[Online; accessed 5-February-2023].
[Wik22c] Wikipedia Contributors. Isabelle (proof assistant) — Wikipedia,
The Free Encyclopedia.
https://en.wikipedia.org/w/index.php?title=Isabelle_(proof_assistant), 2022,
[Online; accessed 5-February-2023].
[Wik22d] Wikipedia Contributors. Lean (proof assistant) — Wikipedia, The
Free Encyclopedia.
https://en.wikipedia.org/w/index.php?title=Lean_(proof_assistant), 2022,
[Online; accessed 5-February-2023].
