
Research Paper 5

The paper analyzes the Cantor–Bernstein theorem, asserting that there is essentially one proof with two variants by Dedekind and Zermelo. It employs proof theory to explore the argument structures of various proofs, emphasizing the foundational lemma related to the theorem. The study aims to contribute to a broader understanding of proof methods in mathematics, aligning with Hilbert's vision for a theory of proof methodology.


The Cantor–Bernstein theorem: how many proofs?


royalsocietypublishing.org/journal/rsta

Wilfried Sieg
Department of Philosophy, Carnegie Mellon University, Pittsburgh, PA, USA
WS, 0000-0002-7130-0524

Research

Citation: Sieg W. 2019 The Cantor–Bernstein theorem: how many proofs? Phil. Trans. R. Soc. A 377: 20180031.
http://dx.doi.org/10.1098/rsta.2018.0031

Accepted: 11 September 2018

One contribution of 11 to a theme issue ‘The notion of ‘simple proof’ - Hilbert’s 24th problem’.

Subject Areas: mathematical logic

Keywords: Cantor–Bernstein theorem, automated proof search, natural formalization, interactive
theorem proving, formal verification, partial proofs

Dedekind’s proof of the Cantor–Bernstein theorem is based on his chain theory, not on Cantor’s
well-ordering principle. A careful analysis of the proof extracts an argument structure that can be
seen in the many other proofs that have been given since. I contend there is essentially one proof
that comes in two variants due to Dedekind and Zermelo, respectively. This paper is a case study in
analysing proofs of a single theorem within a given methodological framework, here
Zermelo–Fraenkel set theory (ZF). It uses tools from proof theory, but focuses on heuristic ideas
that shape proofs and on logical strategies that help to construct them. It is rooted in a
perspective on Beweistheorie that predates its close connection and almost exclusive attention to
the goals of Hilbert’s finitist consistency programme. This earlier perspective can be brought to
life (only) with the support of powerful computational tools.
This article is part of the theme issue ‘The notion of ‘simple proof’ - Hilbert’s 24th problem’.

Author for correspondence:
Wilfried Sieg
e-mail: [email protected]

© 2019 The Author(s) Published by the Royal Society. All rights reserved.

1. Context

The Cantor–Bernstein theorem (CBT) or Schröder–Bernstein theorem or, simply, the Equivalence
theorem asserts the existence of a bijection between two sets a and b, assuming there are
injections f and g from a to b and from b to a, respectively. Dedekind [1] was the first to prove
the theorem without appealing to Cantor’s well-ordering principle in a manuscript from 1887. The
proof was published with a Note of Emmy Noether in the third volume of his Gesammelte
mathematische Werke [2]. In a letter of 29 August 1899, Dedekind communicated a slightly different
proof to Cantor; the letter was included in Cantor’s Gesammelte Abhandlungen with Zermelo as
editor [3]. Zermelo mentions in his Note to the correspondence that he was not aware of Dedekind’s
proof when, in [4], he published his own proof and explicitly appealed to Dedekind’s chain theory.
Noether claims in her Note that Dedekind’s proof is ‘exactly the same’ as Zermelo’s, whereas
Zermelo more cautiously considers the two proofs as only ‘inessentially different’.
My analysis of Dedekind’s proof isolates an argument structure that can be recognized in the
many other, presumably different, proofs.¹ I contend there is essentially one proof that comes in
two variants due to Dedekind and Zermelo, respectively. The variations are seen most clearly in
the proofs of Dedekind’s fundamental lemma that is equivalent to the CBT. The fundamental lemma
asserts: if h is a bijection from a to e and d is a set with e ⊆ d ⊆ a, then there is a bijection from a to
d. Having analysed the proofs of Dedekind and Zermelo, I will discuss a third proof due to König
[6], which joins in a certain sense the first two. The Knaster–Tarski fixed-point theorem explains
why the analysed proofs yield exactly one of two canonical bijections: Dedekind’s is associated
with the smallest and Zermelo’s with the largest fixed-point of a particular monotone operation;
the latter is defined in terms of the function h from the fundamental lemma.
The proof analysis of §2 is in partial accord with the expectation Hilbert expressed in his 24th
Problem² for ‘a theory of the method of proof in mathematics in general’. That is discussed in
§3, after a framework has been described for formalizing the proofs of the CBT in a natural
and faithful way. Indeed, the proofs have been carried out in Zermelo–Fraenkel set theory (ZF),
using the theorem prover AProS as a ‘proof assistant’ [8]. This work is part of my project that is
taking a step towards a theory Hilbert envisioned. It is mathematical, as it examines ‘the method
of proof in mathematics’ through its refined formal garb; it is also philosophical, as it explores
the capacities of the human mathematical mind with its distinctive strategic approaches and
heuristic insights. It has its roots in proof-analytic studies like Mahlo’s [9], Hilbert’s programmatic
formulations in his [10,11], as well as the considerations in Gentzen [12,13] and MacLane [14,15].
These roots are uncovered in [16]. For this paper, I restrict myself to indicating programmatic
directions in §4.³

2. Dedekind’s fundamental lemma


The core of Dedekind’s considerations is reflected in the proof of his fundamental lemma.
Fundamental lemma: Let h be a bijection from a to e and let d be a set with e ⊆ d ⊆ a; then there is
a bijection from a to d.
The equivalence of the fundamental lemma to the CBT is easily established. Assume CBT as
formulated in §1. The bijection h between a and e yields, by simply changing its co-domain to d, an
injection from a to d; analogously, the identity on d yields, again by changing its co-domain to a,
an injection from d to a. CBT guarantees the existence of a bijection between a and d. Conversely,
assume the fundamental lemma. The composition g ◦ f of f and g is a bijection between a and
g ◦ f [a], but g ◦ f [a] ⊆ g[b] and g[b] ⊆ a. Thus, the fundamental lemma gives a bijection between a
and g[b]; composing this bijection with the inverse of g (that is a bijection between g[b] and b)
yields a bijection between a and b.
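
The converse direction of this equivalence can be traced on a concrete instance. The following Python sketch is only an illustration under assumptions that are not in the paper: a and b are both taken to be the natural numbers, f(n) = 2n and g(n) = 2n + 1 are hypothetical injections, and the bijection from a to g[b] promised by the fundamental lemma is built by applying h = g ∘ f on the set c obtained from a\d by finitely iterating h and the identity elsewhere.

```python
from typing import Optional

# Toy instance (an assumption for illustration, not from the paper):
# a = b = the natural numbers, with injections f(n) = 2n and g(n) = 2n + 1.

def f(n: int) -> int:
    return 2 * n              # injection from a to b

def g(n: int) -> int:
    return 2 * n + 1          # injection from b to a

def h(n: int) -> int:
    return g(f(n))            # g ∘ f = 4n + 1, a bijection from a onto e = g ∘ f[a]

def in_d(x: int) -> bool:
    return x % 2 == 1         # d = g[b], the odd numbers, so e ⊆ d ⊆ a

def h_inverse(x: int) -> Optional[int]:
    """Preimage of x under h, or None if x is not in e."""
    return (x - 1) // 4 if x % 4 == 1 else None

def in_c(x: Optional[int]) -> bool:
    """True iff x is obtained from a minus d by finitely iterating h."""
    while x is not None:
        if not in_d(x):       # reached the initial set a\d
            return True
        x = h_inverse(x)      # step back along h; the value strictly decreases
    return False

def lemma_bijection(x: int) -> int:
    """Bijection from a to d = g[b]: h on c, identity on a\\c."""
    return h(x) if in_c(x) else x

def g_inverse(y: int) -> int:
    return (y - 1) // 2       # inverse of g on d = g[b]

def cbt_bijection(x: int) -> int:
    """Bijection from a to b: the lemma's bijection followed by the inverse of g."""
    return g_inverse(lemma_bijection(x))
```

On any finite window the resulting map can be checked to be injective and to cover an initial segment of b; such a check is of course no substitute for the proof.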
¹ An account of proofs and their history is given in [5], a 429-page book that is very informative and quite comprehensive.
However, Hinkis does not provide a unifying mathematical examination.
² The 24th Paris problem was formulated for but not included in the publication of Hilbert’s Paris lecture. For the discovery
of Hilbert’s manuscript and its significance, see [7].
³ I have profited from a stay in Lisbon that was partially supported by the FCT project ‘Hilbert’s 24th problem’
(PTDC/MHC-FIL/2583/2014). The challenge of giving four seminar talks during this stay in October 2017 helped me to better
organize my considerations; I am grateful, in particular, to Mirko Engeler, Reinhard Kahle and Isabel Oitavem. Some aspects of
the considerations presented here are reported with many more details in [8,16]. I gave a version of this paper at the Joint
Mathematics Meeting in San Diego on 13 January 2018 as part of the Special Session on Alternative Proofs that John Dawson
had organized.

Figure 1. Dedekind’s way. (Diagram: ā = a\d inside a, with the iterates h^1[ā] → h^2[ā] → … inside d.)

Figure 2. Zermelo’s way. (Diagram: d̄ = d\e inside d ⊆ a, with the iterates h^1[d̄] → h^2[d̄] → … inside e.)

A diagrammatic presentation of the consideration underlying Dedekind’s proof of the
fundamental lemma is given in figure 1. Taking for granted the possibility of making explicit the
set c that is obtained from a\d by finitely iterating h, one defines

h1 : c → h[c] with h1(x) = h(x) and

h2 : a\c → a\c with h2(x) = id(x).

We have d\h[c] = (d ∪ (a\d))\(h[c] ∪ (a\d)). With d ∪ (a\d) = a and the structural identity c = (a\d) ∪
h[c] = h[c] ∪ (a\d) we have d\h[c] = a\c. Thus, h2 is a function from a\c to d\h[c]. Both h1 and h2
are bijections; c and a\c partition a, whereas h[c] and d\h[c] partition d. So, the union h* of h1 and
h2 is a bijection from a to d. The above structural identity articulates a crucial insight concerning
inductively defined sets: their elements are either in the initial set (above, in a\d) or have been
obtained by the iterating function (above, h).
A modification of this proof establishes that there is a bijection h* from d to e. Let c* be the
set obtained from d\e by finitely iterating h; define h* from d to e by h*(x) = h(x) if x is in c* and
h*(x) = id(x) if x is in d\c* (figure 2). The composition of h with the inverse of h* is a bijection h**
from a to d. This bijection can be directly defined by exploiting c* as follows: h**(x) = h(x) if x is in
a\c* and h**(x) = id(x) otherwise. This is Zermelo’s argument for the Equivalence theorem. Note
that for the two arguments above (and also for König’s below) the important case arises when
a\d and d\e are non-empty; otherwise, the identity on a, respectively the given bijection h, can be
taken as the sought-after bijection.⁴
König’s proof was published in [6]; his informal argument is presented rigorously in [19, p. 55].
Adapted to my set-up, it is seen to join the earlier considerations using both c and c* (figure 3).

⁴ An informative analysis of Zermelo’s proof is found in [17, pp. 508–509]. Sieg & Walsh [8] recast a proof of CBT given in [18].
The bijection obtained from Banach’s proof is Dedekind’s h*.

Figure 3. König’s way. (Diagram: ā = a\d with the iterates h^1[ā] → h^2[ā] → … inside d, and d̄ = d\e with the iterates h^1[d̄] → h^2[d̄] → … inside e.)

Let r be e\(h[c] ∪ h[c*]) and define h1*(x) = h(x), if x is in c, and h1*(x) = id(x), if x is in c* ∪ r, and
h2*(x) = h(x), if x is in c ∪ r, and h2*(x) = id(x), if x is in c*.
It is not difficult to verify that h1* is the h* from Dedekind’s proof and that h2* is the h** from
Zermelo’s proof. Here are the definitions side by side:

h*(x) = h(x) if x is in c and h*(x) = id(x) if x is in a\c;
h1*(x) = h(x) if x is in c and h1*(x) = id(x) if x is in c* ∪ r

and

h**(x) = h(x) if x is in a\c* and h**(x) = id(x) if x is in c*;
h2*(x) = h(x) if x is in c ∪ r and h2*(x) = id(x) if x is in c*.

We have only to observe that, in the first case, a\c = c* ∪ r and, in the second case, a\c* = c ∪ r. The
mappings h* and h** are the canonical ones that are obtained also in all the other proofs I have analysed.⁵
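
The partition of a into c, c* and r, and the way in which h* and h** differ only on r, can be made tangible in a small model. The Python sketch below rests on assumptions that are not in the paper: the elements of a are hypothetical pairs (tag, k), h shifts k by one, a\d and d\e each consist of a single element, and membership in c, c* and r is read off the tag rather than computed from a general definition.

```python
from itertools import product

# Toy model (an assumption for illustration): elements of a are pairs (tag, k).
#   ("N", k), k >= 0    : the orbit starting at the single element of a\d
#   ("M", k), k >= 0    : the orbit starting at the single element of d\e
#   ("Z", k), k integer : an orbit without a starting point
def in_a(x):
    tag, k = x
    return tag in ("N", "M", "Z") and (tag == "Z" or k >= 0)

def h(x):
    tag, k = x
    return (tag, k + 1)                  # bijection from a onto e

def in_d(x):
    return in_a(x) and x != ("N", 0)     # d = a without ("N", 0)

def in_e(x):
    return in_d(x) and x != ("M", 0)     # e = h[a] = d without ("M", 0)

def in_c(x):
    return in_a(x) and x[0] == "N"       # iterates of a\d under h

def in_c_star(x):
    return in_a(x) and x[0] == "M"       # iterates of d\e under h

def in_r(x):
    return in_a(x) and x[0] == "Z"       # r = e \ (h[c] ∪ h[c*])

def h_star(x):
    """Dedekind's bijection from a to d: h on c, identity on a\\c = c* ∪ r."""
    return h(x) if in_c(x) else x

def h_star_star(x):
    """Zermelo's bijection from a to d: h on a\\c* = c ∪ r, identity on c*."""
    return x if in_c_star(x) else h(x)

def window(n):
    """Finite sample of a: all elements (tag, k) of a with |k| <= n."""
    return [x for x in product("NMZ", range(-n, n + 1)) if in_a(x)]
```

In this model both maps are injective on any window and map into d, and they disagree exactly on the elements of r, which is the sense in which König's partition joins the two earlier constructions.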
There are two important and problematic issues in the above arguments; first, we have to
find for the informally described sets c and c* an explicit set-theoretic definition and, second,
we have to prove the structural identities. If one defines c ‘from below’ as ∪ [h^n[a\d] | n ∈ N]
with h^0[a\d] = a\d and h^(n+1)[a\d] = h[h^n[a\d]], then it is immediate that c = (a\d) ∪ h[c]. This
approximation of c from below is the central construction in Bernstein’s proof [21]. Its standard
diagrammatic presentation, as given for example in [22, pp. 11–12], can be adapted as in figure 4
for the proof of Dedekind’s fundamental lemma in the following way, because d is a subset of a.
The bijection h*: a → d that is obtained ‘from the diagram’ is indeed Dedekind’s, as it is defined
by h*(x) = h(x) if x is in ∪ [h^n[a\d] | n ∈ N] and h*(x) = id(x) if x is in a\∪ [h^n[a\d] | n ∈ N]. However,
Dedekind wanted to avoid any appeal to natural numbers in the development of his general
theory of chains; after all, the natural numbers were to be founded on it. Once the natural numbers
had been given a chain-theoretic characterization, Dedekind established that the approximation
from below and above (as the intersection of all chains containing a\d and closed under h)
yield the same set.⁶ The latter characterization is going to be discussed next to address the two
problematic and deeply related issues I just pointed to: an explicit set-theoretic definition of c and
c* as well as the proof of the structural identities.
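
For the definition from below, the structural identity that was called immediate can be spelled out in a few steps. The derivation is a sketch in LaTeX notation; it uses only the recursion for the iterates given above and the fact that forming images under h commutes with unions.

```latex
\begin{align*}
c &= \bigcup_{n \in \mathbb{N}} h^{n}[a \setminus d] \\
  &= h^{0}[a \setminus d] \;\cup\; \bigcup_{n \in \mathbb{N}} h^{n+1}[a \setminus d] \\
  &= (a \setminus d) \;\cup\; \bigcup_{n \in \mathbb{N}} h\bigl[h^{n}[a \setminus d]\bigr] \\
  &= (a \setminus d) \;\cup\; h\Bigl[\,\bigcup_{n \in \mathbb{N}} h^{n}[a \setminus d]\Bigr] \\
  &= (a \setminus d) \cup h[c].
\end{align*}
```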

⁵ Many proofs, including König’s original one and more contemporary proofs like that of Doyle & Conway [20], define a
partition of a into sets c, c* and r based on the basic insight underlying figure 3; cf. also footnote 4. Scott Weinstein pointed
me to Doyle and Conway’s paper and provided a proof of the graph-theoretic fact that is crucially used there to prove CBT.
Weinstein’s proof emphasized for me the parallelism of the Doyle and Conway argument to that of König.
⁶ In the last part of #131 of [23], one finds an unnumbered theorem that expresses this identity. The theorem guarantees the
existence of the approximation from above, on the basis of the existence of N. In that restricted sense, the general theory
is dependent on N: the infinity axiom guarantees the existence of N in ZF; N together with the Replacement Principle (and
the union axiom) ensures the existence of a set that contains a\d and is closed under h; thus, the intersection is applied to a
non-empty set.

Figure 4. Bernstein’s way. (Diagram: a is divided into ā = a\d, h[ā], h^2[ā], … and a remaining part; arrows labelled h and id lead to the corresponding parts h[ā], h^2[ā], h^3[ā], … of d.)

3. Chains and formal proofs


Dedekind and Zermelo used chains of a given system s ⊆ a to capture the set that is obtained from
s by finitely iterating the bijective function h. This way of proceeding does not take for granted
the natural numbers. (However, see footnote 6.) In Dedekind’s case, the chain of a\d given h is
defined as ∩ {x ⊆ a | a\d ⊆ x and h[x] ⊆ x}, i.e. as the smallest subset of a that contains a\d and is
closed under h. It is a non-trivial task to establish the structural identity when c is defined in this
way; Dedekind of course addressed and solved the problem (in theorems #57 and #58 of Was sind und was sollen die Zahlen?, WZ).
A directly related way of making the inductively defined set c explicit is Knaster and Tarski’s
fixed-point construction. Consider the mapping m from ℘(a) to ℘(a) defined by m(x) = a\d ∪
h[x]; m is a monotone operation and has a smallest as well as a largest fixed-point [24].⁷ The
smallest fixed-point of m is defined by ∩ {x ⊆ a | m(x) ⊆ x}, the largest by ∪ {x ⊆ a| x ⊆ m(x)}. By
exploiting the definition of m it is quite direct to see that c is the smallest fixed-point of m, whereas
c ∪ r is its largest. Dedekind’s function h* is thus associated with the smallest and Zermelo’s
function h** with the largest fixed-point of m. The structural identity in Dedekind’s proof of the
fundamental lemma is easily obtained; after all, for the smallest fixed-point we have m(c) = c and by
the definition of m we obtain c = a\d ∪ h[c]. (That holds, in complete analogy, also for Zermelo’s
proof.)
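
The two fixed-point definitions can be tried out on a small finite example. The Python sketch below is only an illustration under assumptions that are not the paper's: the set a, the seed set (standing in for a\d) and the function h are hypothetical, and h is deliberately not injective, since a bijection from a finite a onto e ⊆ d ⊆ a would force a\d to be empty. It therefore illustrates the construction of the smallest and largest fixed-point of m, not the partition into c, c* and r.

```python
from itertools import combinations

def powerset(s):
    """All subsets of the finite set s, as frozensets."""
    items = sorted(s)
    return [frozenset(combo) for size in range(len(items) + 1)
            for combo in combinations(items, size)]

def image(h, x):
    """The image h[x] of a subset x under h (h is given as a dict)."""
    return frozenset(h[y] for y in x)

def make_m(seed, h):
    """The monotone operation m(x) = seed ∪ h[x]; seed stands in for a\\d."""
    return lambda x: frozenset(seed) | image(h, x)

def smallest_fixed_point(a, m):
    """Intersection of all x ⊆ a with m(x) ⊆ x."""
    result = frozenset(a)
    for x in powerset(a):
        if m(x) <= x:
            result &= x
    return result

def largest_fixed_point(a, m):
    """Union of all x ⊆ a with x ⊆ m(x)."""
    result = frozenset()
    for x in powerset(a):
        if x <= m(x):
            result |= x
    return result

# Hypothetical data: 0 -> 1 -> 2 -> 2, a 2-cycle 3 <-> 4, and 5 -> 3.
a = frozenset(range(6))
h = {0: 1, 1: 2, 2: 2, 3: 4, 4: 3, 5: 3}
m = make_m({0}, h)
c = smallest_fixed_point(a, m)      # the chain of the seed {0}
top = largest_fixed_point(a, m)     # additionally contains the cycle {3, 4}
```

With these data the smallest fixed-point is the chain of the seed, the largest one adds the cycle that cannot be reached from the seed, and every fixed-point of m lies between the two.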
I have discussed a number of proofs of Dedekind’s fundamental lemma and isolated the central
ideas. How ‘different’ are these proofs? How can we analyse the differences or similarities
between the proofs in greater detail? To answer these questions, we built a formal framework
that is, on the one hand, perfectly precise and, on the other hand, sufficiently flexible to reflect
the structure of argumentation in mathematical practice. The basic axiomatic framework is the
Zermelo–Fraenkel system ZF for set theory; clearly, other methodological frameworks could be
chosen for this kind of work. The logical inference mechanism is a version of Gentzen’s natural
deduction calculus that allows the bi-directional construction of partial proofs in pure logic and,
thus, reasoning with gaps. These intercalation proofs are represented as partial Fitch diagrams. The
completed or full proofs, when strategically constructed, are easily seen to be normal proofs in
natural deduction.⁸ The meaning of the logical connectives is expressed through introduction
and elimination rules; these considerations are extended from connectives to defined notions and
operations in order to develop a hierarchy of definitional extensions for ZF.

⁷ In footnote 4, I mentioned Banach’s proof of the CBT. His focus on partitioning leads directly to the consideration of
fixed-points. That is discussed in [8, in section 2, pp. 8–10].
⁸ The underlying logical work was presented in two early publications, [25,26]; it has been implemented in more and more
refined, efficient versions since 2002 or so for full predicate logic, classical and intuitionist. Over the last 5 or 6 years, it has been
extended to allow the full formal proof of the CBT in ZF. That work is described in [8] and can be inspected at
http://www.phil.cmu.edu/legacy/Proof_Site/. See also §4, where some important structural features of normal proofs are discussed.

Those are the bases for natural formalization. In addition to exploiting logical strategies, it builds
connections between the conceptual organization and the construction of proofs via the use of
lemmas as rules. The most relevant lemmas are of a simple logical form, just universally quantified
conditionals whose antecedents are conjunctions. These connections can also be formed in a
bi-directional way. If the hypotheses of a lemma have been proved individually, then the lemma
justifies immediately the step to the conclusion in the ‘top-down’ or ‘forward’ direction. If a
particular statement is to be proved by a selected lemma, then the hypotheses of the lemma are
new proof obligations in the ‘bottom-up’ or ‘backward’ direction. The interaction of building a
conceptual framework and carrying out formal proofs is quite dynamic, even after the central
line of the informal argument has been articulated: it faces the very same issues that have to be
addressed by any attempt to organize a mathematical argument in the most perspicuous way.
The final proof of the CBT is the pinnacle of the natural formalization. The lemmas, to which
the proof appeals and which go beyond the fundamental lemma, are listed in appendix A and are
rather direct observations. Here is the formal proof of CBT, where 1(g ◦ f) denotes the composition
of f and g, when its co-domain is restricted to the image of its domain:

1. f ∈ inj(a,b)                     Prem
2. g ∈ inj(b,a)                     Prem
3. 1(g ◦ f) ∈ bij(a, g ◦ f[a])      Theorem (Core12) 1, 2
4. g[b] ⊆ a                         Theorem (Func17) 2
5. g ◦ f[a] ⊆ g[b]                  Theorem (Comp11) 1, 2
6. a ≈ g[b]                         Theorem (Fundamental Lemma) 3, 4, 5
7. b ≈ g[b]                         Theorem (Equi4) 2
8. a ≈ b                            Theorem (Equi8) 6, 7
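
The forward and backward use of lemmas as rules can be mimicked on this eight-line proof. The Python sketch below is a rough illustration and not AProS: the lemma names are those cited in the proof, but treating each instantiated lemma as a propositional rule over fixed strings, as well as the class `Lemma` and the functions `forward` and `backward`, are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Lemma:
    name: str
    hypotheses: tuple      # statements that must be established first
    conclusion: str

# Instantiated lemmas of the top-level proof, read as simple rules (an assumption).
LEMMAS = [
    Lemma("Core12", ("f ∈ inj(a,b)", "g ∈ inj(b,a)"), "1(g ◦ f) ∈ bij(a, g ◦ f[a])"),
    Lemma("Func17", ("g ∈ inj(b,a)",), "g[b] ⊆ a"),
    Lemma("Comp11", ("f ∈ inj(a,b)", "g ∈ inj(b,a)"), "g ◦ f[a] ⊆ g[b]"),
    Lemma("Fundamental Lemma",
          ("1(g ◦ f) ∈ bij(a, g ◦ f[a])", "g[b] ⊆ a", "g ◦ f[a] ⊆ g[b]"),
          "a ≈ g[b]"),
    Lemma("Equi4", ("g ∈ inj(b,a)",), "b ≈ g[b]"),
    Lemma("Equi8", ("a ≈ g[b]", "b ≈ g[b]"), "a ≈ b"),
]
PREMISES = {"f ∈ inj(a,b)", "g ∈ inj(b,a)"}

def forward(known, lemmas):
    """Forward use: fire every lemma whose hypotheses are all established."""
    known = set(known)
    changed = True
    while changed:
        changed = False
        for lemma in lemmas:
            ready = all(p in known for p in lemma.hypotheses)
            if ready and lemma.conclusion not in known:
                known.add(lemma.conclusion)
                changed = True
    return known

def backward(goal, premises, lemmas):
    """Backward use: the hypotheses of a lemma concluding the goal become new
    proof obligations. Returns the lemma names used, or None if a gap stays open."""
    if goal in premises:
        return []
    for lemma in lemmas:
        if lemma.conclusion != goal:
            continue
        used = []
        for hypothesis in lemma.hypotheses:
            sub = backward(hypothesis, premises, lemmas)
            if sub is None:
                break              # this obligation cannot be met
            used += sub
        else:
            return used + [lemma.name]
    return None
```

Calling `backward("a ≈ b", PREMISES, LEMMAS)` walks from the goal to the premises and returns the lemma names in the order of lines 3 to 8, whereas `forward(PREMISES, LEMMAS)` reaches the same conclusion from the premises.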

The additional lemmas, appealed to on lines 3, 4, 5, 7 and 8, do not at all touch the central
considerations that lead to the fundamental lemma. That part begins with proving the structural
identity for chains c = b ∪ h[c] of a subset b of a, where a is any system and h any function from a
to a. This general fact can be instantiated for the two chains c and c*; that fact, in turn, allows the
partitions used in the proofs of the fundamental lemma to define the bijections h* and h**. Here is a
diagrammatic summary:

                    Cantor–Bernstein theorem
                               |
                       fundamental lemma
                      /                 \
        definition of h*               definition of h**
               |                              |
      partition based on c           partition based on c*
               |                              |
   structural identity for c      structural identity for c*
                      \                 /
                  general structural identity

The proofs of the structural identity for the two chains are instances of the same proof presented in
its general form below. This almost linear proof is not presented for its completed structure, but
for the possibility of indicating how it was constructed through a sequence of partial proofs with
gaps that are filled successively by forward and backward moves. (I use the line numbering of
the completed proof; the numbering in the sequence of partial proofs is dynamic. The principles
(for chains) that are appealed to in the following proof are listed in appendix A.)

1. h ∈ func(a,a)                        Prem
2. b ⊆ a                                Prem
3. (∃x) x = ¢(a,b,h)                    Theorem (Chain28) 2, 1
4. z1 = ¢(a,b,h)                        Assum
5. b ⊆ b ∪ h[z1]                        Theorem (Bool1)
6. z1 ⊆ a                               Theorem (Chain 8 - Named) 4, 2, 1
7. h[z1] ⊆ z1                           Theorem (Chain11 - Named) 4, 2, 1
8. b ⊆ z1                               Theorem (Chain 10 - Named) 4, 2, 1
9. b ∪ h[z1] ⊆ z1                       Theorem (Bool6) 8, 7
10. h[h[z1]] ⊆ h[z1]                    Theorem (Func12) 1, 7, 6
11. h[b] ⊆ h[z1]                        Theorem (Chain 14 - Named) 4, 2, 1
12. h[b] ∪ h[h[z1]] ⊆ h[z1]             Theorem (Bool6) 11, 10
13. h[z1] ⊆ a                           Theorem (Func10) 1, 6
14. h[b] ∪ h[h[z1]] = h[b ∪ h[z1]]      Theorem (Func13) 1, 2, 13
15. h[b ∪ h[z1]] ⊆ h[z1]                =E 14, 12
16. h[b ∪ h[z1]] ⊆ b ∪ h[z1]            Theorem (Bool4) 15
17. chain(b ∪ h[z1], h)                 DefI (Chain) 16
18. b ∪ h[z1] ⊆ a                       Theorem (Mem6) 9, 6
19. z1 ⊆ b ∪ h[z1]                      Theorem (Chain 12 - Named) 4, 17, 5, 18, 1
20. z1 = b ∪ h[z1]                      Theorem (Mem4) 19, 9
21. ¢(a,b,h) = b ∪ h[¢(a,b,h)]          =E 4, 20
22. ¢(a,b,h) = b ∪ h[¢(a,b,h)]          ∃E 3, 21

The construction begins with the partial proof consisting of the premises 1–2 and the goal 22. To
allow the introduction of a (temporary) name for the complex term ¢(a,b,h) denoting the chain
of the system b given a and h, we employ the theorem in 3 and apply the elimination rules for
the existential quantifier and for identity to obtain the partial proof with lines 1–4 and 20–22.
Here is where the core of the proof begins, namely, to establish the identity in 20. That leads to
two new goals with gaps, thus to the partial proof: 1–4 . . . gap1 . . . 9 . . . gap2 . . . 19–22. The
reader, I hope, is now in a position to see how these new gaps are closed and how the proof of the
structural identity is completed.
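
Because the structural identity is claimed for any system a, any function h from a to a and any subset b, it can be checked exhaustively on small finite systems. The Python sketch below is such a check and in no way a replacement for the formal proof; the function names are hypothetical, the chain is computed by brute force from its definition as an intersection, and the comments indicate which lines of the proof the individual tests correspond to.

```python
from itertools import combinations, product

def subsets(s):
    """All subsets of the finite set s, as frozensets."""
    items = sorted(s)
    return [frozenset(combo) for size in range(len(items) + 1)
            for combo in combinations(items, size)]

def image(h, x):
    """The image h[x] of a subset x under h (h is given as a dict)."""
    return frozenset(h[y] for y in x)

def chain(a, b, h):
    """Intersection of all x ⊆ a with b ⊆ x and h[x] ⊆ x."""
    result = frozenset(a)
    for x in subsets(a):
        if b <= x and image(h, x) <= x:
            result &= x
    return result

def check_structural_identity(a):
    """Test z = b ∪ h[z] for the chain z of every b ⊆ a under every h: a -> a."""
    a = frozenset(a)
    elems = sorted(a)
    for values in product(elems, repeat=len(elems)):
        h = dict(zip(elems, values))                   # premise 1
        for b in subsets(a):                           # premise 2
            z = chain(a, b, h)
            candidate = b | image(h, z)
            if not candidate <= z:                     # line 9
                return False
            if not image(h, candidate) <= candidate:   # lines 16 and 17
                return False
            if not z <= candidate:                     # line 19
                return False
            if z != candidate:                         # line 20
                return False
    return True
```

For a system with four elements this runs through all 256 functions and 16 subsets, so `check_structural_identity(range(4))` is a quick sanity check that the identity holds in every one of these cases.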
Given the earlier diagrammatic summary of the two parallel ways of obtaining the CBT from
the structural identity, it seems that the multitude of proofs of the theorem has been reduced to
essentially one proof by analysing crucial concepts and related techniques. This case study presents
proof-theoretic investigations that are quite different from the standard ones (in pursuit of modified
Hilbert programmes). Nevertheless, it uses crucial insights from the traditional work and not only
opens new directions rooted in the earlier work, but actually takes up deep programmatic themes.

4. Programmatic directions
For his Paris list of mathematical problems, Hilbert had prepared a 24th problem that was not
included in their final publication [10]. As mentioned already in §1, this hastily formulated
problem called for the development of ‘a theory of the method of proof in mathematics in general’.
Hilbert made the bold claim that ‘under a given set of conditions there can be but one simplest
proof’, without indicating a notion of simplicity. If there should be two proofs for a theorem, then,
Hilbert demanded,

. . . you must keep going until you have derived each from the other, or until it becomes
quite evident what variant conditions (and aids) have been used in the two proofs. [My
emphasis.]

The strategic conceptual necessity underlying the proofs of the fundamental lemma has to be
distinguished from the mathematical set-theoretic necessity of defining the set that is obtained by
finitely iterating an operation. The strategic conceptual necessity is realized through the two
variant conditions that separate Dedekind’s from Zermelo’s proof. As far as the other necessity
is concerned, three aids have been exploited to find explicit set-theoretic definitions: Bernstein’s
approximation from below, Dedekind’s approximation from above, and the Knaster–Tarski fixed-
point construction. The first way requires the availability of the natural numbers, whereas the
second and third approaches yield exactly the same two sets, c and c*.
In his Zürich talk of September 1917, Hilbert implicitly resumed the project outlined in the 24th
problem and called for the investigation of ‘the concept of the specifically mathematical proof’. It
was clear that the logical calculi Frege, Peano, Whitehead and Russell had developed would play
a crucial role in such an investigation. In lectures of the winter term 1917/18, Hilbert & Bernays
[27] used the Principia Mathematica calculus when sketching the formal development of number
theory and analysis. As the sustained formal work was far too unwieldy, they introduced in early
1922 a novel logical calculus with two explicit goals. The first goal was methodological, whereas
the second was entirely pragmatic:

(1) formulate a group of characteristic axioms for each logical connective and fix in this way
the logically relevant meaning of connectives,⁹ and
(2) make it easier to formalize mathematical arguments as well as to guarantee the
intelligibility of the formal object representing the informal proof.

Gentzen’s natural deduction systems are rule-based versions of the Hilbert–Bernays calculi,
but introduce one completely new feature: making and discharging assumptions. Gentzen viewed
that feature as an essential reflection of mathematical practice. A subclass of proofs in natural
deduction calculi, so-called normal ones that do not make detours, have most striking structural
properties, among them the subformula property.¹⁰ As there was no direct way of generating
normal proofs similar to that of generating cut-free proofs in sequent calculi, the question was:
How can those properties be exploited for shaping a search for proofs? Intercalation calculi
address exactly this problem. The systematic bi-directional use of elimination and introduction
rules underlies the completeness proof for these calculi and produces either normal proofs or
allows the formulation of a counterexample. The structural features of normal proofs motivate
particular strategic moves to make proof search efficient and always goal-directed.
Let us return to Hilbert’s 24th problem and the question, when two proofs in mathematics
should be considered to be the same. Recall that Noether viewed Dedekind’s and Zermelo’s proofs
as ‘exactly the same’. If Beweistheorie is to be a theory of mathematical arguments then, ultimately,
one has to find a criterion that relates the identity of proofs to the literal identity of syntactic
configurations. The latter are, or have been obtained from, formal representatives of the two proofs.
That raises, of course, the question when a formal derivation can be viewed as representing an
informal proof. Neither the question of proof representation nor the topic of proof identity can be
addressed, I think, without reference to a conceptually organized framework and proof search
within it. In sum, (informal) proof representation and (formal) proof identity are always relative to
a framework and proof search procedure. Given such a framework, natural formalization crucially
complements strategic search by building connections to the framework. The connections are of
roughly two kinds, namely observations concerning important notions and heuristic ideas. Both
kinds are expressed through lemmas.

⁹ This was done explicitly to mimic for logic what Hilbert had done in Grundlagen der Geometrie [28], namely, fix the
mathematically relevant meaning of each geometric notion through a group of axioms.
¹⁰ For these structural properties, I refer to [29], in particular to the sections on The form of normal deductions in chapters III
and IV. Prawitz shows there that every branch (path) can be divided uniquely into E- and I-parts. This structural property
is underlying the strategic search for intercalation proofs. Gentzen discovered a procedure for the normalization of proofs in
intuitionist first-order logic, but could not extend it to classical logic; see my essay [30, section 6]. That was achieved, at least
partially, in [29]. The completeness proofs for intercalation calculi were established in the two papers mentioned in footnote
8 for classical and intuitionist first-order logic.

We distinguish, consequently, at least three components of natural formalization: (i) strategic
(sequences of) steps using the introduction and elimination rules associated with connectives and
definitions, (ii) syntactically motivated connections to the background conceptual organization,
and (iii) heuristics or leading ideas for addressing classes of problems. The strategic approach
(i) is based on the syntactic structure of assumptions and goals and has, of course, its limits. The
components of kind (ii) are exemplified by the lemmas used in the top-level proof of the CBT (and
listed in appendix A). They are easy observations for anyone who understands the mathematical
concepts involved. By contrast, the components of kind (iii) articulate broader heuristic ideas. In
the context of this paper, the technique for showing that two sets a and b are equinumerous falls
into this category: partition both a and b into two subsets a1, a2 and b1, b2 and show that the ai and
bi, i = 1 or 2, are pairwise equinumerous; thus, a and b are equinumerous.¹¹
Natural formalization is a dynamic tool to locate significant differences between proofs, to
explore criteria for their identity and to raise questions on their simplicity (or complexity).
What are the limits of the strategic approach? How do components of kind (ii) and (iii) help
to overcome the limits? Those of kind (ii), do they do more than keep proofs surveyable and free
from excessive low-level formality, or do they have greater cognitive significance? Those of kind
(iii), do they reflect different proof ideas? As to simplicity, are the length of proofs and proof
search meaningful measures? Should we consider a proof as complex, if it uses a great number of
particular logical rules (e.g. indirect steps, existential introductions) or if the containment relation
for subderivations is deep? In any event, natural formalization presses us to articulate heuristics
and leading ideas when the general strategic approach falters—and that will be extremely useful
for the automated search for humanly intelligible proofs.
Here is a wide and open field for fascinating investigations. It can be explored by proof search
experiments with essential support of computers, and it can also be connected to traditional
reflections on the identity of formal proofs as presented in [32,33]. In my study [31], I suggested
calling such an expanded proof theory structural for two reasons: on the one hand, one exploits
the internal syntactic structure of normal proofs and, on the other hand, one appeals to the
framing conceptual structure, including the axiomatic definition of mathematical structures.
Formal proofs should be viewed as representatives of ordinary mathematical proofs only when they
preserve leading ideas and have been constructed in a strategic way relative to a methodological
framework. Of course, the character of this intricate connection is itself a central topic of
investigation.
Competing interests. I declare I have no competing interests.
Funding. I received no funding for this study.

Appendix A
Here is the list of lemmas that are actually used in the top-level proof of the Cantor–Bernstein
Theorem. For a full list of lemmas used in the proof of the CBT, see ([8], appendix A). One
should note the directness of the first five observations; the remaining ones are facts concerning
Dedekind’s chains and are used in the proof of the structural identity.

¹¹ This technique is reminiscent of the Greek way of partitioning geometric figures and showing them to be congruent by
arguing for the congruence of the parts; such Zerlegungsbeweise are the topic of Mahlo’s thesis [9]. Its most famous application
is found in Euclid’s proof of Pythagoras’ theorem; see my [31].

Lemma      Premises                                      Conclusion

Equi4      f ∈ inj(a, b)                                 a ≈ f[a]
Equi8      a ≈ b; a ≈ c                                  b ≈ c
Func17     f ∈ inj(a, b)                                 f[a] ⊆ b
Comp11     f ∈ inj(a, b); g ∈ inj(b, c)                  (g ◦ f)[a] ⊆ g[b]
Core12     f ∈ inj(a, b); g ∈ inj(b, c)                  1#(g ◦ f)[a] ∈ bij(a, (g ◦ f)[a])
Chain8     f ∈ func(c, c); a ⊆ c                         c(c, a, f) ⊆ c
Chain10    f ∈ func(c, c); a ⊆ c                         a ⊆ c(c, a, f)
Chain11    f ∈ func(c, c); a ⊆ c                         f[c(c, a, f)] ⊆ c(c, a, f)
Chain12    f ∈ func(c, c); a ⊆ c; b ⊆ a; Chain(a, f)     c(c, b, f) ⊆ a
Chain14    f ∈ func(c, c); a ⊆ c                         f[a] ⊆ f[c(c, a, f)]
Chain28    f ∈ func(a, a); b ⊆ a                         (∃x) x = c(c, b, f)
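The chain lemmas can be checked on a small finite example. The sketch below rests on two assumptions, since the definitions are not restated in this appendix: c(c, a, f) is read, following Dedekind's chain theory, as the smallest subset of c that contains a and is closed under f, and Chain(a, f) is read as f[a] ⊆ a. The function names `image`, `chain` and `is_chain` and the sample map are hypothetical.

```python
def image(f, s):
    """f[s]: the image of the set s under the map f (a dict)."""
    return {f[x] for x in s}


def chain(c, a, f):
    """Smallest subset of c that contains a and is closed under f (assumed reading)."""
    # Premises of the lemmas: f ∈ func(c, c) and a ⊆ c.
    assert set(f) == c and image(f, c) <= c and a <= c
    result, frontier = set(a), set(a)
    while frontier:
        # Add the elements reached by one more application of f.
        frontier = image(f, frontier) - result
        result |= frontier
    return result


def is_chain(s, f):
    """Chain(s, f), read as: s is closed under f, i.e. f[s] ⊆ s."""
    return image(f, s) <= s


# Hypothetical sample data.
c = {0, 1, 2, 3, 4, 5}
f = {0: 1, 1: 2, 2: 0, 3: 4, 4: 4, 5: 3}
a = {0}
k = chain(c, a, f)

assert k <= c                        # Chain8
assert a <= k                        # Chain10
assert is_chain(k, f)                # Chain11
assert image(f, a) <= image(f, k)    # Chain14

closed, b = {0, 1, 2, 3, 4}, {3}
assert is_chain(closed, f) and b <= closed
assert chain(c, b, f) <= closed      # Chain12
```

A finite check of this kind illustrates the lemmas on one instance; it does not replace their derivation.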

References
1. Dedekind R. 1887 Ähnliche (deutliche) Abbildung und ähnliche Systeme. In Gesammelte
mathematische Werke, vol. 3 (eds R Fricke, E Noether, Ö Ore), pp. 447–449. Braunschweig:
Vieweg.
2. Dedekind R. 1932 Gesammelte mathematische Werke, vol. 3 (eds R Fricke, E Noether, Ö Ore).
Braunschweig: Vieweg.
3. Cantor G. 1932 Gesammelte Abhandlungen mathematischen und philosophischen Inhalts
(ed. E Zermelo). Berlin, Germany: Springer.
4. Zermelo E. 1908 Untersuchungen über die Grundlagen der Mengenlehre. I. Math. Ann. 65,
261–281. (Translated in van Heijenoort 1967.) (doi:10.1007/BF01449999)
5. Hinkis A. 2013 Proofs of the Cantor-Bernstein Theorem: A mathematical excursion. Basel:
Birkhäuser Verlag.
6. König J. 1906 Sur la théorie des ensembles. C. R. Hebd. Séances Acad. Sci. 143, 110–112.
7. Thiele R. 2003 Hilbert’s twenty-fourth problem. Am. Math. Mon. 110, 1–24. (doi:10.1080/
00029890.2003.11919933)
8. Sieg W, Walsh P. Submitted. Natural formalization: deriving the Cantor-Bernstein Theorem
in ZF.
9. Mahlo P. 1908 Topologische Untersuchungen über Zerlegung in ebene und sphärische
Polygone. Dissertation, Halle.
10. Hilbert D. 1900 Mathematische Probleme. Nachrichten der Königlichen Gesellschaft der
Wissenschaften zu Göttingen 253–297.
11. Hilbert D. 1918 Axiomatisches Denken. Math. Ann. 78, 405–415. (doi:10.1007/BF01457115)
12. Gentzen G. 1934 Untersuchungen über das logische Schließen I, II. Math. Z. 39, 176–210,
405–431.
13. Gentzen G. 1936 Die Widerspruchsfreiheit der reinen Zahlentheorie. Math. Ann. 112, 493–565.
14. MacLane S. 1934 Abgekürzte Beweise im Logikkalkul. Dissertation, Göttingen.
15. MacLane S. 1935 A logical analysis of mathematical structure. Monist 45, 118–130.
(doi:10.5840/monist19354515)
16. Sieg W. 2018 In preparation. Proofs as objects.
17. Kanamori A. 2004 Zermelo and set theory. Bull. Symbol. Logic 10, 487–553. (doi:10.2178/bsl/
1102083759)
18. Banach S. 1924 Un théorème sur les transformations biunivoques. Fundam. Math. 6, 236–239.
(doi:10.4064/fm-6-1-236-239)
19. Deiser O. 2010 Introductory Note to 1901. In Collected Works/Gesammelte Werke, vol. 1 (eds
HD Ebbinghaus, C Fraser, A Kanamori), pp. 52–70. Berlin, Germany: Springer.
20. Doyle PG, Conway JH. 1994 Division by three. arXiv:math/0605779v1.
21. Bernstein F. 1905 Untersuchungen aus der Mengenlehre. Math. Ann. 61, 117–155. [This is
the publication of Bernstein’s dissertation from 1901. His proof of the Cantor-Bernstein
Theorem had been found earlier and published, with the appropriate acknowledgment,
by Borel in the 1898 edition of his Leçons sur la théorie des fonctions; Gauthier-Villars.]
(doi:10.1007/BF01457734)
22. Kleene SC. 1952 Introduction to metamathematics. Amsterdam, The Netherlands: North-Holland
Publishing Company.
23. Dedekind R. 1888 Was sind und was sollen die Zahlen? Braunschweig: Vieweg.
24. Tarski A. 1955 A lattice-theoretical fixpoint theorem and its applications. Pac. J. Math. 5,
285–309. (doi:10.2140/pjm.1955.5.285)
25. Sieg W, Byrnes J. 1998 Normal natural deduction proofs (in classical logic). Studia Logica 60,
67–106. (doi:10.1023/A:1005091418752)
26. Sieg W, Cittadini S. 2005 Normal natural deduction proofs (in non-classical logics). In
Mechanizing mathematical reasoning (eds D Hutter, W Stephan), pp. 169–191. Lecture Notes
in Computer Science 2605. Berlin, Germany: Springer.
27. Hilbert D, Bernays P. 1917-18 Prinzipien der Mathematik. In David Hilbert’s Lectures on the
Foundations of Mathematics and Physics, 1917–1933 (eds WB Ewald, W Sieg), pp. 64–214. Berlin,
Germany: Springer.
28. Hilbert D. 1899 Grundlagen der Geometrie. Stuttgart, Germany: Teubner.
29. Prawitz D. 1965 Natural deduction – A proof-theoretical study. Stockholm: Almqvist & Wiksell.
30. Sieg W. 2012 In the shadow of incompleteness: Hilbert and Gentzen. In Hilbert’s programs and
beyond, pp. 155–192. Oxford, UK: Oxford University Press.
31. Sieg W. 2010 Searching for proofs (and uncovering capacities of the mathematical mind). In
Hilbert’s programs and beyond, pp. 377–401. Oxford, UK: Oxford University Press (Reprinted).
32. Prawitz D. 1971 Ideas and results in proof theory. In Proc. of the Second Scandinavian Logic
Symp. (ed. JE Fenstad), pp. 235–307. Amsterdam, The Netherlands: North-Holland.
33. Došen K. 2003 Identity of proofs based on normalization and generality. Bull. Symbol. Logic 9,
477–503. (doi:10.2178/bsl/1067620091)
