Practical Foundations of Mathematics - Taylor
INTRODUCTION
Introduction
1.1 Substitution
1.2 Denotation and Description
1.3 Functions and Relations
1.4 Direct Reasoning
1.5 Proof Boxes
1.6 Formal and Idiomatic Proof
1.7 Automated Deduction
1.8 Classical and Intuitionistic Logic
Exercises I
Introduction
2.1
Introduction
3.1 Posets and Monotone Functions
3.2 Meets, Joins and Lattices
3.3 Fixed Points and Partial Functions
3.4 Domains
3.5 Products and Function-Spaces
3.6 Adjunctions
3.7 Closure Conditions and Induction
3.8 Modalities and Galois Connections
3.9 Constructions with Closure Conditions
Exercises III
Introduction
4.1 Categories
4.2 Actions and Sketches
4.3 Categories for Formal Languages
4.4 Functors
4.5 A Universal Property: Products
4.6 Algebraic Theories
4.7 Interpretation of the Lambda Calculus
4.8 Natural Transformations
Exercises IV
Introduction
5.1 Pullbacks and Equalisers
5.2 Subobjects
5.3 Partial and Conditional Programs
5.4 Coproducts and Pushouts
5.5 Extensive Categories
5.6 Kernels, Quotients and Coequalisers
5.7 Factorisation Systems
5.8 Regular Categories
Exercises V
VI STRUCTURAL RECURSION
Introduction
6.1 Free Algebras for Free Theories
6.2 Well Formed Formulae
6.3 The General Recursion Theorem
6.4 Tail Recursion and Loop Programs
6.5 Unification
6.6 Finiteness
6.7 The Ordinals
Exercises VI
VII ADJUNCTIONS
Introduction
7.1 Examples of Universal Constructions
7.2 Adjunctions
7.3 General Limits and Colimits
7.4 Finding Limits and Free Algebras
7.5 Monads
7.6 From Semantics to Syntax
7.7 Gluing and Completeness
Exercises VII
Introduction
8.1 The Language
8.2 The Category of Contexts
8.3 Display Categories and Equality Types
8.4 Interpretation
Exercises VIII
IX THE QUANTIFIERS
Introduction
9.1 The Predicate Convention
9.2 Indexed and Fibred Categories
9.3 Sums and Existential Quantification
9.4 Dependent Products
9.5 Comprehension and Powerset
9.6 Universes
Exercises IX
BIBLIOGRAPHY
INDEX
Practical Foundations of Mathematics relates category theory and type theory to the idioms of mathematics (Cambridge University Press, 1999, ISBN 0-521-63107-6).
Classical (Scott) and stable domain theory, also called analytic or polynomial functors, shapes,
containers, or multiadjoints.
Proofs and Types by Jean-Yves Girard, which I translated, is out of print but downloadable.
Gauss's second proof of the fundamental theorem of algebra, which I translated from Latin.
Undergraduate algebra and other course notes that I wrote when I was a graduate student in
Cambridge.
These HTML pages (with the exception of the book) have recently been regenerated. The problem
with missing mathematical symbols seems to have been solved, but this is browser-dependent, so
please tell me if you find that some symbols are still missing.
Practical Foundations of Mathematics
Buy it online
From the publisher, CUP: in the UK or the USA.
From bookshops: Amazon (UK) (USA), Blackwell's, Barnes & Noble, Powell's, Science Daily or
W H Smith.
Via Price Comparison Sites: Abebooks, Best Book Buys, FetchBook, Froogle (=Google) (UK)
(USA) or Pricegrabber.
Read it online
The 18-page Synopsis in DVI or HTML has links into the full text, on-line in HTML.
This crude HTML translation only includes the narrative and simpler mathematical formulae, not the diagrams. It was generated by TTH, which gives some advice for a quick way to make the Greek letters display.
Corrections
Two mathematically significant but localised errors have been found:
● p. 342, Lemma 6.5.7: The unification algorithm need not terminate if the "occurs check" fails, for
example x = a b x, y = b a y, x = a y. The whole section needs to be reorganised, as the "occurs
check" is currently introduced after this Lemma,
● p. 523, Exercise 9.4: Thomas Streicher sent me a simple counterexample to the claim that
fibrations preserve pullbacks. The exercise was replaced by another similar one in the 2000
reprint.
Some other corrections were made in the 2000 reprint, and there are a few more now.
Reviews
● Roy Dyckhoff for the Bulletin of the London Mathematical Society,
● Peter Johnstone for Zentralblatt, and
● Thomas Streicher for Science of Computer Programming, Volume 38, August 2000.
Web searches
Google Book Search, CiteSeer, Google and Google Scholar.
INTRODUCTION
1.1 SUBSTITUTION
● Variables
● Structural recursion
● Terms and substitution
● Quantification
● Bound variables
● Substitution and variable-binding
● Direct reasoning
● Proof boxes delimited by keywords
● Importing and exporting formulae
● Open-ended boxes
● Declaration and assignment
● Alternative methods
● Model theory
● Theoretical basis
● Resolution
● Unification
● Box-proof heuristics
● Excluded middle
● The Sheffer stroke
● Intuitionism
● The axiom of choice
● Logic in a topos
EXERCISES I
Chapter 1
First Order Reasoning
How do we begin to lay the foundations of a palace which is already more than 3600 years old? Alan
Turing [Tur35] identified what is perhaps the one point of agreement between the Rhind papyrus and
our own time, that a ``computer'' ( ie a mathematician performing a calculation) puts marks on a page
and transforms them in some way. Even in its most naive form, Mathematics is not passive: we recite the multiplication table, transforming 7×8 into 56, and later find out that x×(y+z) may be replaced by (x×y)+(x×z). We say that these pairs are respectively equal, meaning that they denote the same objects in ``reality,'' even though they are written in different ways.
During the process of calculation (from Latin calx, a pebble) there are intermediate forms with no
directly explicable meaning: accountants refer to ``net'' and ``gross'' amounts, and to ``pre-tax'' profits, in
an attempt to give them one. The remarkable feature of mathematics is that we may suspend belief like
this, and yet rely on the results of a lengthy calculation, even when it has been delegated to a computer.
The notation of elementary school arithmetic, which nowadays everyone takes for granted, took
centuries to develop. There was an intermediate stage called syncopation, using abbreviations for the
words for addition, square, root, etc. For example Rafael Bombelli (c. 1560) would write

R. c. L. 2 p. di m. 11 L for our ∛{2+11i}.
Many professional mathematicians to this day use the quantifiers ( ∀,∃) in a similar fashion,
in spite of the efforts of Gottlob Frege, Giuseppe Peano and Bertrand Russell to reduce mathematics to
logic.
The logical calculus is easier to execute than any of the techniques of mathematics itself, yet only in
1934 did Gerhard Gentzen set it out in a natural way. Even now, mathematics students are expected to
learn complicated (ε-δ)-proofs in analysis with no help in understanding the logical structure of the
arguments. Examiners fully deserve the garbage that they get in return.
In Sections 1.4 and 1.5 natural deduction is introduced in a way which has been taught successfully to
first year informatics undergraduates, for whom reasoning about the elementary details of their programs
is a more pressing concern. Section 1.6 shows how formal methods correspond to carefully written
proofs in the vernacular of mathematics and Section 1.7 formalises the way in which routine proofs are
found, maybe by machine.
The manipulation of sets and relations more familiar to mathematics students is treated in Section 1.3
and Chapter II. If you are doubtful of the need for formal logic then I suggest reading the later chapters
first.
In the spirit of starting out ``from nothing,'' the first two sections discuss the behaviour of purely
symbolic manipulation. Nevertheless, the ideas which they introduce will play a substantial role in the
rest of the book. We have, for example, to learn the difference between object-language and meta-
language. Logic is primarily a meta-theory, and historically it has sought its object-language in some
strange places: for medieval logicians, the motivation was theology. Today it is mathematics and
programming, but in the first instance we shall apply logic to formal manipulation itself.
The final section discusses classical and intuitionistic logic, and why we intend to use the latter. This
chapter and the next raise some of the questions of logic. The rest of the book uses modern tools to
illuminate a few of those issues.
1.1 Substitution
Logic as the foundations of mathematics turns mathematics on itself, but there is content even in the
study of form.
We begin by considering how expressions must transform if they are to have and retain meaning,
without being too specific about that meaning. Think of this as learning how to use a balance, ruler and
stop-watch in preparation for studying physics. To those whose background is in pure mathematics, the
explicit manipulation of syntax and the distinction between object-language and meta-language may be
unfamiliar, but they are the stuff of logic and informatics. We shall not be too formal: formal rigour can
only be tested dynamically, and those whose business it is to implement formal languages on computers
are already well aware of many of the actual difficulties from a practical point of view.
The mathematical expressions which we write are graphically of many forms, not necessarily one-
dimensional, and are often subject to complex rules of well-formedness. It suffices at first to think of
strings composed of constants (0, 1, ..., π, etc ) and operation-symbols (+, x, [√], sin).
Given an expression of this kind, we can evaluate it, but that's the end of the story. The main
preoccupation of algebra before the nineteenth century was the solution of equations.
Variables Algebraic techniques typically involve the manipulation of expressions in which the
unknown is represented by a place-holder. This may be a letter, or (as often in category theory) an
anonymous symbol such as (−) or (=). The transformations must be such that they would remain valid if
the place were filled by any particular quantity. They must not depend on its being a variable-name
rather than a quantity, so we cannot add the variables b and e to obtain g (``2+5 = 7'') because this is
unlikely to continue to make sense when other values are put for the variables. Variables were used in
this way by Aristotle.
The symbols are called parameters, constants, etc depending on ``how'' variable they are taken to be in
the context. Usage might describe x as a variable, p and q as parameters and 3 as a constant, but they
may change their status during the course of an argument. This is reflected in the history of the cubic
equation (Example 4.3.4): Gerolamo Cardano, like the author of the Rhind papyrus, demonstrated his techniques by solving specific but typical numerical examples such as x³+6x = 20, but François Viète was the first to use letters for the coefficients. (The use of x, y, z for variables and a, b, c for coefficients is due to René Descartes.)
After we have solved x³ = 3px+2q for x in terms of p and q, we may consider how the form of the solution depends on them, in particular on whether p³ ≥ q². In a more general discussion, what were
constants before are subject to investigation and become variables. This changing demarcation between
what is considered constant or variable will be reflected in the binding of variables by quantifiers, and in
the boxes used in Section 1.5.
The places p and q also differ in that they stand for arbitrary (∀) quantities, whilst x is intended to be filled by individual (∃), though as yet unknown, ones, namely the solutions of the equation. This difference is represented formally by the two quantifiers: ∀p, q. ∃x. x³ = 3px+2q.
Structural recursion
REMARK 1.1.1 Although we write algebraic expressions and programs as linear strings of symbols, it is
better to think of them as trees, in which the connectives are nodes and the constants and variables are
leaves. Each operation-symbol corresponds to a certain species of node, with a sub-tree for each argument; we speak of nullary, unary, binary, ternary and in general k-ary or multiary operations if they have 0, 1, 2, 3 or k arguments. For example Cardano's first cubic equation might be written
Transformation of expressions becomes surgery on trees. The analysis of a tree begins with the top node,
the ``outermost'' operation-symbol, which is (confusingly) called the root. It is important to be able to
recognise the root of an expression written in the familiar linear fashion; if it is a unary symbol such as a
quantifier (∀, ∃) it is usually at the front (a prefix), but binary ones (+, x, ∨, ∧, ⇒ ) are traditionally
written between their arguments ( infix). Like unwrapping a parcel, you must remove the outermost
operation-symbol first.
Expressions are typically stored in a computer by representing each node as a record, stating which
operation-symbol, constant or variable it is, and with pointers to the records which represent the
arguments or branches. Concretely, the pointer is the machine address of the record. The whole expression is denoted by the address of its root record, so the outermost operation-symbol is accessible im-mediately, ie without inter-mediate processing.
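The record-and-pointer representation can be sketched in a few lines of Python (a hypothetical rendering for illustration; the names Node and root, and the encoding of symbols as strings, are our own choices):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One record of an expression-tree: an operation-symbol, constant or
    variable, with references (pointers) to its argument sub-trees."""
    symbol: str
    args: list = field(default_factory=list)

    def root(self) -> str:
        # The whole expression is denoted by its root record, so the
        # outermost operation-symbol is accessible immediately.
        return self.symbol

# Cardano's x^3 + 6x, built from binary operation-symbols ^, * and +:
expr = Node("+", [Node("^", [Node("x"), Node("3")]),
                  Node("*", [Node("6"), Node("x")])])

assert expr.root() == "+"      # unwrap the outermost symbol first
```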
The leaves of an expression-tree are its variables. The simplest operation on trees is substitution of a
term for a variable. A copy of the expression-tree of the term is made for every occurrence of the
variable, and attached to the tree in its place. If there were many occurrences, the term and its own
variables would be proliferated, but if there were none the latter would disappear. We call this a direct
transformation.
The application of a single operation-symbol such as + to its family of arguments is a special case of
substitution: 3+5 is obtained by substituting 3 for x and 5 for y in x+y. Conversely, it can be useful to
think of any expression-tree (with variables scattered amongst its leaves) as a generalised operation-
symbol; since it replicates the algebraic language by adding in definable operations, the collection of all
trees, substitutable for each other's leaves, is called the clone.
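Substitution as tree surgery, and the application of an operation-symbol as its special case, can be sketched in the same style (hypothetical code; the terms here have no binding operators, so no complications with bound variables arise yet):

```python
from copy import deepcopy
from dataclasses import dataclass, field

@dataclass
class Term:
    symbol: str
    args: list = field(default_factory=list)

def subst(t, x, a):
    """Attach a copy of the tree a at every leaf occurrence of the
    variable x in t; other leaves and nodes are rebuilt unchanged."""
    if not t.args:
        return deepcopy(a) if t.symbol == x else Term(t.symbol)
    return Term(t.symbol, [subst(u, x, a) for u in t.args])

# Applying + to 3 and 5 is the special case of substituting into x+y:
plus_xy = Term("+", [Term("x"), Term("y")])
three_plus_five = subst(subst(plus_xy, "x", Term("3")), "y", Term("5"))
assert three_plus_five == Term("+", [Term("3"), Term("5")])
```

Substituting a tree with its own variables for a repeated leaf proliferates both, as the text describes.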
REMARK 1.1.2 An expression may also be regarded as a function of certain variables, written f(x,y,…).
Beware that ``(x,y,…)'' signifies neither that x, y, etc actually occur in the expression (if they don't then f
is a constant function), nor that these exhaust the variables which do occur (others may be parameters).
We shall therefore describe this and f(a,b,…), the result of substituting certain terms for the variables, as
the informal notation for functions.
There are also indirect transformations which re-structure the whole tree from top to bottom. For each
kind of node, we have to specify the resulting node and how its sub-trees are formed from the original
ones and their transformed versions. If you have a mathematical background, but haven't previously
thought about syntactic processes, then symbolic differentiation will perhaps be the most familiar
example. The most complex case is multiplication.
The tree has grown both horizontally and vertically, but notice that all occurrences of [(d)/( dx)] are now
closer to the leaves, and this observation is just what we need to see that the recursion terminates: the
motto is divide and conquer. We shall study recursion in Chapters II and VI.
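Symbolic differentiation, the indirect transformation just mentioned, can be sketched as structural recursion over such trees (a hypothetical encoding, covering only + and ×; the case analysis on each kind of node is the point):

```python
from dataclasses import dataclass, field

@dataclass
class T:
    sym: str
    args: list = field(default_factory=list)

def d(t, x):
    """d/dx by structural recursion: each case pushes the derivative
    towards the leaves, which is why the recursion terminates."""
    if not t.args:                       # leaf: the variable x, or a constant
        return T("1") if t.sym == x else T("0")
    if t.sym == "+":                     # d(u+v) = du + dv
        return T("+", [d(t.args[0], x), d(t.args[1], x)])
    if t.sym == "*":                     # d(u*v) = du*v + u*dv: the tree grows
        u, v = t.args
        return T("+", [T("*", [d(u, x), v]), T("*", [u, d(v, x)])])
    raise ValueError(f"no rule for {t.sym}")

# d/dx of x*x is 1*x + x*1:
result = d(T("*", [T("x"), T("x")]), "x")
```

Multiplication is the most complex case here too: it duplicates the sub-trees u and v, yet every occurrence of d is applied to a smaller tree.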
These symbols u and v are meant to stand for sub-trees such as x³+6x or y, rather than values: we have to
use unknowns representing trees in order to apply mathematics to the discussion. It is important to
understand that u is a different kind of variable from x. In the differential calculus an approximately
analogous distinction is made between dependent and independent variables; this is needed to explain
what we mean by [(d)/( dx)]t, where the variable x may ``occur'' in t. In terms of formal languages, t is (a
variable in the) meta-notation, ie a meta- variable. We must be prepared for the possibility that t does
actually stand for a variable, in which case the manipulation usually behaves in one way for the special
variable x and in another for other variables.
Equipped with some meta-notation, it is more convenient to change back from trees to the linear
notation; then the rules of differentiation are
(a) a variable, such as x,
(b) a constant symbol, such as c, or
(c) r(u,v,…), ie an operation-symbol r that is applied to a sequence of sub-terms u, v, ...
Beware that FV is not an operation-symbol, since it does depend on the identity of the variables. It's a
meta-operation like [(d)/( dx)].
Substitution is, for us, the most important use of structural recursion. We shall study the term a by
exploiting its effect by substitution for variables in other terms, which must follow through all of the
transformations of expressions. This obligation on each principle of logic gives rise to many curious
(and easily overlooked) phenomena.
DEFINITION 1.1.4 By structural induction on t we define substitution, t[x: = a], of a term a for a
variable x possibly occurring in t, as follows:
Our notation for substitution, [a/x]*t, has a star, the real meaning of which will become clear in Chapter VIII. For the moment, it is a reminder that [a/x]*t is not an expression starting with a square bracket but a modified form of t, having the same first symbol as it has, unless this is x, in which case [a/x]*t starts with the first symbol of a. Other forms of this notation are to be found in the literature, including t[a/x] from typography and t[x: = a] from programming.
The next result, the Substitution Lemma, is the key to our semantic understanding of syntax in Chapters
IV and VIII.
LEMMA 1.1.5 Let a, b and t be expressions, and x, y distinct variables such that x,y ∉ FV(a) and y ∉ FV
(b). Then
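The displayed equations of the Lemma are lost in this translation. A standard reconstruction, consistent with the stated side-conditions and with the remark below that x may occur in b (offered as a sketch, not a quotation), is:

```latex
% Side-conditions: x \not\equiv y, \quad x, y \notin FV(a), \quad y \notin FV(b).
t[y:=b][x:=a] \;=\; t[x:=a][y:=b[x:=a]]
\qquad\text{and, when also } x \notin FV(b),\qquad
t[x:=a][y:=b] \;=\; t[y:=b][x:=a].
```

The second equation says that non-interfering substitutions commute; the first is the general form.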
PROOF: We use structural induction on the term (a or t), considering each of the cases c, x, y, z ( ie
another distinct variable) and r(u,v,…), using the induction hypothesis for the last. The first result, with b
The general form in the lemma, where we allowed x to occur in b, applies when variables do interfere: as
in quantum mechanics, the failure of commutation shows that something is happening.
Quantification There are expressions which have logical values as well as those with numerical
meaning. We shall say formula for a logical expression and term for other kinds. Equations (a = b) and
inequalities (a ≤ b) are (atomic) formulae; we might think of = as a binary operation-symbol whose
result is a truth-value. There is an algebra of formulae à la Boole, whose operation-symbols include ∧, ∨
and ⇒.
Using the informal notation, a formula such as φ[x] containing variables is known as a predicate, x being
the subject. If it has no variables it is called a proposition, sentence or closed formula. Like terms with
variables, predicates may be understood as equations to be ``solved,'' defining the class of things which
satisfy them. The formation of such a class from a predicate is known as comprehension
(Definition 2.2.3).
A predicate φ[x] may also be intended to make a general assertion, ie of every instance φ[a] obtained by
substituting a term a for x. In this case it is called a scheme. It may be turned into a single assertion by
universal quantification: ∀x.φ[x]. Similarly, we may assert that the predicate qua equation has some
solution or witness by writing ∃x.φ[x], with the existential quantifier.
Bound variables In expressions such as

∫_0^{2π} sin x dx,    ∑_{x=1}^{10} x²,    ∃x.φ[x],    ∀x.φ[x],    λx.p,

the occurrences of the variable x have a special status. They are no longer available for substitution, and can be replaced by any other variable not already in use: ∃x.φ[x] is the same as ∃y.φ[y]. In this new role, x is called a bound variable,
where before it was free. This distinction was first clarified by Giuseppe Peano (1897). He invented the
symbols ∈ , ∃ and many others in logic, and a language called Latino sine flexione or Interlingua in
which to write his papers. The twin ∀ was added by Gerhard Gentzen in 1935.
Variables link together occurrences which are intended to be the same thing, and substitution specifies
what that thing is. Variable-binding operations isolate this link from the outside world. The linking may
be represented on paper by drawing lines instead (such as in Exercise 1.23) or in a machine by
assigning to the variable an address that can only be accessed via the binding node.
DEFINITION 1.1.6 The expressions ∃x.φ[x] and ∃y.φ[y] are said to be α-equivalent. In the latter, y has
been substituted for each occurrence of x inside the quantifier, which now ranges over y. Although we
often emphasise the dynamic nature of calculation, we shall treat these expressions as the same: there is
no preferential choice of going from one to the other. Technically this presents no problems since it is
decidable whether two given strings of symbols are α-equivalent. Nicolaas de Bruijn devised a method of eliminating (the choice of names of) bound variables in favour of something more canonical (Exercise 2.24), but we shall not use it, as it is less readable by humans.
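De Bruijn's device, and the decidability of α-equivalence, can be illustrated with a short sketch (hypothetical code, treating only ∃ and encoding terms as nested tuples): each bound variable is replaced by the number of binders between its occurrence and its binding node, so the choice of names disappears and α-equivalence becomes literal identity of the nameless forms.

```python
def debruijn(t, env=()):
    """Translate a term into nameless (de Bruijn) form.
    Terms: a variable name "x", ("exists", "x", body), or (op, args...)."""
    if isinstance(t, str):
        # bound variables become their binder's depth; free ones keep names
        return env.index(t) if t in env else t
    if t[0] == "exists":
        _, x, body = t
        return ("exists", debruijn(body, (x,) + env))
    return (t[0],) + tuple(debruijn(u, env) for u in t[1:])

def alpha_eq(s, t):
    # α-equivalence is decidable: compare the canonical nameless forms
    return debruijn(s) == debruijn(t)

# ∃x. y = x²  and  ∃w. y = w²  are the same predicate:
s = ("exists", "x", ("=", "y", ("sq", "x")))
t = ("exists", "w", ("=", "y", ("sq", "w")))
assert alpha_eq(s, t)
```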
EXAMPLES 1.1.7 The predicate ∃x:N.y = x² says whether y is a square number. We may substitute a value (3) or another expression (w+z) for y, to obtain the (false) proposition ∃x:N.3 = x² or binary predicate ∃x:N.(w+z) = x². But we may not substitute a term involving x for y: ∃x:N.x = x² is a true proposition, which is not a substitution instance of the original formula. Likewise, α-equivalence lets us substitute a new variable for the bound x: ∃w:N.y = w² is the same predicate, but ∃3:N.y = 3² and ∃(w+z):N.y = (w+z)² are nonsense, whilst ∃y:N.y = y² is again a true proposition.
CONVENTION 1.1.8 We never use the same variable name twice in an expression. Where circumstances
bring together two expressions with common variable names, we use α-equivalent forms instead.
Surely only an undisciplined programmer who couldn't be bothered to make a good choice of names
would do otherwise? But no, although ∃ and ∀ respect it, β-reduction in the λ-calculus may cause
duplication of sub-terms, and hence bound variables, so it may later substitute one abstraction within the
scope of the other (Exercise 2.25). The convention is therefore a very naive one, unsuitable for
computation. (A weaker convention will be needed in Section 4.3, and variables must be repeated for
different reasons in Sections 9.3 and 9.4.)
DEFINITION 1.1.9 When we use variable-binding operators we must therefore have available an
inexhaustible supply of variables. This means that, whatever (finite) set of variables we're already
using, we can always find a new one, ie one which is different from the rest. Exercise 6.50 examines
what is needed to mechanise this. Some accounts of formal methods specify that there are countably
many variables - even number them as x1,x2,x3,… or x,x′,x′′,… - but one thing the formalism must not
In fact the x1 written here is a meta-variable which stands for whatever actual variable may be desired in
the application; it may stand for the same variable as x2 unless we say otherwise. The difference between
object-language and meta-language is important, but making it explicit can be taken too far. In this book we make no systematic distinction between object-variables and the meta-variables used as place-holders for them, or, rather, we shall use x, y, z, x1, x2, etc as examples of variable names ( cf Cardano's
examples of equations).
In the Substitution Lemma and elsewhere we do need to know whether or not two symbols are meant to
be the same variable ( intensional or by mention equality); we write x ≢ y if they have to be different
symbols. Even in this case they may be assigned the same value ( extensional or by use equality). On the
other hand, we sometimes also need to constrain values, for example to say that x ≠ 0 as a precondition
for division.
Substitution and variable-binding Bound variables complicate the formal definition of substitution,
and it must be given simultaneously with the definitions of free variables and of α-equivalence. As
before, we do this by structural recursion on the main term. If this is new to you then you should work
through these definitions and prove the Substitution Lemma for yourself, maybe using the tree notation.
DEFINITION 1.1.10 By simultaneous structural recursion on t we define (i) the set, FV(t), of free
variables, (ii) substitution, t[x: = a], of a term a for a variable x possibly occurring in it and (iii) α-
equivalence with another term. To Definitions 1.1.3 and 1.1.4 we add that
(a) the constant c and variable x are each α-equivalent to themselves, but not to anything else;
(b) operation-symbols respect α-equivalence: if u is α-equivalent to u′ and v to v′, etc, then r(u,v,…) is α-equivalent to r(u′,v′,…);
(c) if t is ∃x.p then it is α-equivalent (to itself and) to any ∃y.(p[x: = y]) as long as y ∉ FV(p), and similarly for ∀.

Hence

(d) if t is a quantified formula then (by the previous part, and using the inexhaustible supply of variables) we may assume that it is ∃y.p where y is not the same variable as x, and y does not occur in a; then FV(t) = FV(p)∖{y} and t[x: = a] = ∃y.(p[x: = a]). (Similarly with ∀.) In this sense, quantification respects α-equivalence.
Notice that α-substitution for y ∈ FV(a) may be needed within t in order to define t[x: = a]. (Some
authors say that ``a is free for x in t'' if there are no clashes between bound variables of t and free
variables of a.) The Substitution Lemma remains valid, and, as a corollary, substitution respects α-
equivalence.
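Definition 1.1.10 can be mechanised in the same nested-tuple style (a hypothetical sketch; the renaming scheme y_0, y_1, … is an illustrative choice of the ``inexhaustible supply'' of variables):

```python
import itertools

def fv(t):
    """Free variables.  Terms: a name "x", ("exists", "x", body), or (op, args...)."""
    if isinstance(t, str):
        return {t}
    if t[0] == "exists":
        return fv(t[2]) - {t[1]}
    return set().union(*map(fv, t[1:]))

def subst(t, x, a):
    """Capture-avoiding t[x := a]: a bound variable that clashes with
    FV(a) is first renamed to something fresh (an α-step)."""
    if isinstance(t, str):
        return a if t == x else t
    if t[0] == "exists":
        _, y, body = t
        if y == x:                       # x is bound here, so t is unchanged
            return t
        if y in fv(a):                   # clash: pick a fresh variable first
            y2 = next(f"{y}_{n}" for n in itertools.count()
                      if f"{y}_{n}" not in fv(a) | fv(body) | {x})
            body, y = subst(body, y, y2), y2
        return ("exists", y, subst(body, x, a))
    return (t[0],) + tuple(subst(u, x, a) for u in t[1:])

# Substituting a term containing x for y in ∃x. y = x² forces the
# bound x to be renamed, so the incoming free x is not captured:
t = ("exists", "x", ("=", "y", ("sq", "x")))
renamed = subst(t, "y", ("+", "w", "x"))
```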
Recall that the Lemma mentioned expressions in which the variable x must not occur freely, and we
shall put the same condition on certain formulae in the rules for proving ∀x.φ[x] and using ∃x.φ[x] in
Section 1.5. When this condition arises, it often means that at some point the same expression is also
used as if the variable x really did occur in it.
NOTATION 1.1.11 We write

[^(x)]* t

for the term t in which the variable x is considered to occur (``invisibly''). Like [x: = a], [^(x)]* is meta-notation, though in this case it does not actually alter the term. The use of [^(x)]* presupposes x ∉ FV(t), ie that the term t came from a world in which x did not occur. In particular, [^(x)]* x and [^(x)]*[^(x)]* t are not well formed terms.
The Notation conveys more than a negative listing of the free variables: it says that t is being imported
into a world in which x is defined. Although this distinction seems trivial, making it and the passage
back and forth between these two worlds explicit will help us considerably to understand the quantifiers
in Chapter IX. What [^(x)]* means will become clearer when we use it in Sections 1.5, 2.3 and 4.3. The
hat has been adopted from the abbreviation 1,…,[^(i)],…,n for the sequence with i omitted in the theory of matrices and their determinants.
PROPOSITION 1.1.12 [ Extended Substitution Lemma] Let a, b, t, u and v be expressions, and x and y be
distinct variables such that x ∉ FV(a,u) and y ∉ FV(a,b,u,v). Then
Moreover if t′ is α-equivalent to t, u′ to u and a′ to a, then t′[x: = a′] and [^(x)]*u′ are α-equivalent to t[x:
= a] and [^(x)]*u respectively.
PROOF: To the proof of Lemma 1.1.5 we add the case ∃z.s, in which, without loss of generality (by α-
equivalence), the bound variable is not x or y and does not occur in a or b. Then [x: = a], [y: = b], [^(x)]*
and [^(y)]* commute with ∃z, and the induction hypothesis applies. Since they also commute with [^(w)]*
[z: = w], the bound variable z can be renamed. []
The Extended Substitution Lemma will be our point of departure for the semantic treatment of syntax in
Sections 4.3 and 8.2. Algebras are the subject of Section 4.6, and term algebras of Chapter VI, which
considers structural recursion in a novel and sophisticated way.
1.2 Denotation and Description

EXAMPLE 1.2.1 A rational number may be represented as a fraction n/d with d ≠ 0 in many ways. The synonyms are related, written

n, d ∼ n′, d′   if n d′ = n′ d.

There is a normal form, obtained by Euclid's algorithm (Example 6.4.3), in which n and d have no common factor. But there is no need to reduce everything to normal form at each step of a calculation, since the arithmetic operations respect the equivalence relation.
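In code (a hypothetical sketch, in which Python's math.gcd plays the role of Euclid's algorithm):

```python
from math import gcd

def synonym(nd, nd2):
    (n, d), (n2, d2) = nd, nd2
    return n * d2 == n2 * d          # n,d ~ n',d'  iff  n·d' = n'·d

def normal_form(n, d):
    g = gcd(n, d)                    # Euclid's algorithm: strip the common factor
    return (n // g, d // g)

def add(nd, nd2):                    # no normalisation needed mid-calculation
    (n, d), (n2, d2) = nd, nd2
    return (n * d2 + n2 * d, d * d2)

assert synonym((2, 4), (1, 2))
assert normal_form(2, 4) == (1, 2)
# addition respects the equivalence:  2/4 + 1/3  ~  1/2 + 1/3
assert synonym(add((2, 4), (1, 3)), add((1, 2), (1, 3)))
```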
Similarly, a (positive, zero or negative) integer z may be coded as the set of pairs p, n of (zero or positive) natural numbers subject to p, n ∼ p′, n′ if p+n′ = p′+n.
Platonism and Formalism As there are syntactically different terms which are synonyms for the same
value, the expressions themselves can never capture exactly what mathematical values are.
Various mathematical philosophies offer different views about what these eternal values are meant to be.
One view is that the constants are tokens for real things like sheep and pebbles (Exercise 1.1) and the
operations combine them. Our investigations are merely passive observations of an eternal and
unchanging world, in which there is a true answer to every question, even though we may be (provably)
unable to find it. This is known as Platonism.
Although most mathematicians habitually think and speak in Platonist language, this philosophy is
ridiculously naive. It brings mathematics down to an experimental science, in which we can only infer
laws such as the associativity of addition by scientific induction from the cases which have been
observed. How can we know when all of the laws have been codified? Have they been asserted in
excessive generality? We don't know. Bertrand Russell noticed that Gottlob Frege's use of
comprehension was too general, and Kurt Gödel showed that Russell's own system (or anything like it)
could express certain true facts about itself which it could not prove, unless it was inconsistent.
Formalism denies absolute being: only the symbols themselves exist, and nothing else. The notion of
value must then be a derived one: it is defined as that which is common to all of the other expressions
which are directly or indirectly related to the given one. What is common is simply their totality: the
equivalence class.
Formalism is regarded by some as nihilism: if mathematics is just a game with symbols and arbitrary
rules, what's the point of playing it? Chess is also a game with arbitrary rules played by many
mathematicians, but to a chess master it has latent structure, a semantics. We hope to show that the rules
of logic are not so arbitrary as those of chess, and exhibit many of the symmetries which mathematicians
find beautiful in the world.
So long as we fix our sights on a reasonably small fragment of logic, there is a synthesis between the
two points of view: out of the formalism itself we construct a world which has exactly the required
properties. The free algebra for an algebraic theory is an example of such a world; it satisfies just those
equations which are provable. Joachim Lambek has argued [ LS86, p. 123] that it is possible to reconcile
the Platonist and Formalist points of view.
Laws as reduction rules Using the tree notation, the distributive law, seen as a way of transforming or
computing with expressions, is
We distinguish between equations, which may or may not hold, and laws, which hold because (as in the
legal sense) we choose to enforce them. In the λ-calculus the term δ-rule is used, and occasionally we
shall call an ad hoc imposed equation a relation (``regulation''), though this term is normally reserved
for another sense. Beware that the word law is also used elsewhere for what we call an operation, such
as multiplication; this sense is archaic in English but current in French ( loi de multiplication).
DEFINITION 1.2.2 In a law the variables on the left hand side name sub-trees in a pattern to be matched.
The variables on the right say where to copy these sub-trees (maybe several times, maybe not at all), so
every variable occurring on the right must also occur on the left.
Any (sub-)expression which matches the pattern on the left is called a redex ( reducible expression), and
it reduces to the substituted expression on the right. A term may have many redexes in it.
The result of a ``reduction'' may be a longer expression than the first, and there may indeed be infinite
computations. A sequence of terms in which each is obtained by replacing a redex in the previous one by
the term to which it reduces is called a reduction path. We write u\leadsto v if (u and v are identical
expressions or) there is a reduction path whose first and last terms are u and v respectively.
A term with no redexes is said to be irreducible, but note that this only means that we have reached a
dead end: by going backwards and following another reduction path some different result might be
obtained.
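The distributive law, read as a reduction rule on trees, can be run directly. The sketch below is not from the book: terms are encoded as nested tuples, and the helper names `reduce_once` and `normalise` are my own. The left-hand pattern a×(b+c) names sub-trees which the right-hand side copies, so a reduction can make the term longer, as noted above.

```python
# Terms as nested tuples ("+", l, r) and ("*", l, r), or a variable name.
# The distributive law a*(b+c) ~> a*b + a*c, read left to right: the
# left-hand side is a pattern whose variables name sub-trees, and the
# right-hand side says where to copy them (a is copied twice).

def reduce_once(t):
    """Contract the leftmost-outermost distributive redex, if any."""
    if isinstance(t, tuple):
        op, l, r = t
        # Pattern match: a * (b + c) is a redex ...
        if op == "*" and isinstance(r, tuple) and r[0] == "+":
            a, (_, b, c) = l, r
            # ... and reduces to a*b + a*c (sub-tree a is duplicated).
            return ("+", ("*", a, b), ("*", a, c))
        # Otherwise search for a redex in the sub-trees.
        for i, sub in ((1, l), (2, r)):
            s = reduce_once(sub)
            if s is not None:
                return t[:i] + (s,) + t[i + 1:]
    return None  # no redex: t is irreducible

def normalise(t):
    """Follow a reduction path until no redex remains."""
    while (s := reduce_once(t)) is not None:
        t = s
    return t

term = ("*", "a", ("+", "b", "c"))
print(normalise(term))   # ('+', ('*', 'a', 'b'), ('*', 'a', 'c'))
```

Applying `normalise` to x×((y+z)+w) takes two reduction steps and produces a strictly larger term, (x×y + x×z) + x×w, illustrating that ``reduction'' need not shorten an expression.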
Equivalence relations For an arbitrary set of laws we have no guide to say whether a reduction gets
nearer to a ``result'' or makes matters worse. In this case the equivalence class is the only notion of static
value which we can give to an expression, and there is no systematic way of determining whether two
expressions are equal.
DEFINITION 1.2.3 Reduction, \leadsto , is the smallest binary relation which contains reduction of
redexes, respects substitution and is reflexive and transitive:
Some authors, particularly when studying term-rewriting, use \leadsto * for the reflexive-transitive
closure of \leadsto , after Russell's \relR*. Another notation for this is \twoheadrightarrow , but we shall
not use either \leadsto * or \twoheadrightarrow in this book.
[prooftree omitted]
Any relation which is reflexive, symmetric and transitive is called an equivalence relation. These are
essentially Euclid's Common Notions.
(We have just started using a way of expressing conditional assertions, ie rules of deduction, which will
be very useful throughout the book. The ruled line means that whenever what is written above holds
then what is written below follows, for the reason on the right.)
The notion of reduction path is an example of the transitive closure of a relation; it is reflexive if we
include the trivial path (the equality or identity relation). Although any relation can be made symmetric
by adding its converse, ie making the arrows bi-directional, this may destroy transitivity. To form the
equivalence closure we need
LEMMA 1.2.4 Let → be any reflexive-transitive relation, viewed as an oriented graph. Then two nodes u
and v are related by the smallest equivalence relation containing → iff there is a zig-zag, ie a sequence u
= u0, u1, u2, …, u2n = v in which each adjacent pair of terms is related by → in one direction or the other.
We say that u and v are in the same connected component, whatever the orientations of the arrows
involved. []
In a formal system of expressions and laws with no underlying Platonist meaning, it may be that there
are so many indirect laws that the terms which we have chosen to name the numbers and truth values all
turn out to be provably equal. This is known as algebraic inconsistency.
Confluence The following definition captures the property established by the Church-Rosser Theorem,
which showed that the pure λ-calculus is consistent (Fact 2.3.3).
DEFINITION 1.2.5 A system of reduction rules is said to be confluent if, whenever there are reduction
paths t\leadsto u and t\leadsto v, there is some term w with reduction paths u\leadsto w and v\leadsto w.
Other names for this property are diamond and amalgamation.
What we prove in practice is local confluence, that any two one- step reductions may be brought back
together. If it only takes one step on each side to do this then the following result still holds. Usually
more than one step is needed, so the paths to be reconciled may just get longer.
Confluence is a property of the presentation of a system of rules. Donald Knuth and Peter Bendix
[KB70] showed how a system of algebraic laws may sometimes be turned into a confluent system of
reduction rules.
LEMMA 1.2.6 Suppose u and v are equivalent terms with respect to a transitive confluent
relation \leadsto . Then u\leadsto w and v\leadsto w for some term w. In particular for each term there is
at most one irreducible form to which it is equivalent, and distinct irreducible forms are inequivalent, so
we call them normal.
PROOF: Confluence changes each ``zag-zig'' to a ``zig-zag,'' as shown with dotted arrows in the diagram
accompanying Lemma 1.2.4. []
EXAMPLE 1.2.7 Soldiers in Lineland face towards the right if they have even rank (0, 2, 4, ...), whilst
those of odd rank face left. A sequence of soldiers, which we write with semi-colons between them, is
called a parade if each can see only those facing in the same direction and of the same or lower rank.
Adjacent soldiers annihilate if they are facing each other and their rank differs by exactly 1 ( eg 2;3 or
6;5). Otherwise, if a junior is facing an adjacent senior of either parity then they change places, but the
senior loses two grades ( eg 4;7\leadsto 5;4, 6;8\leadsto 6;6, 3;1\leadsto 1;1).
Since the total rank decreases, any conflict of forces terminates, and the result is a parade. An algebraist
might take the parades alone as the elements of the structure, and seek to show that (;) is an associative
operation. The term-rewriting approach would show that the reduction rules are confluent. In fact
exactly the same calculations are needed either way, and we deduce that any conflict has a foregone
conclusion. Hence parades form a monoid whose unit is the empty sequence; we shall use it in Example
7.1.9. []
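Example 1.2.7 can be run as a term-rewriting system. The sketch below is not from the book: the encoding of the rules, and the helper names `step` and `normal_forms`, are my reading of the example, with a parade as a list of ranks. An exhaustive search over all reduction paths confirms that each test sequence has exactly one normal form, so the conflict is indeed a foregone conclusion.

```python
def step(parade, i):
    """Reduce the adjacent pair at position i, or return None if it is
    not a redex.  Even ranks face right, odd ranks face left."""
    a, b = parade[i], parade[i + 1]
    if a % 2 == 0 and b % 2 == 1 and abs(a - b) == 1:
        return parade[:i] + parade[i + 2:]                 # eg 2;3: annihilate
    if a < b and a % 2 == 0:                               # junior a faces senior b
        return parade[:i] + [b - 2, a] + parade[i + 2:]    # eg 4;7 ~> 5;4
    if b < a and b % 2 == 1:                               # junior b faces senior a
        return parade[:i] + [b, a - 2] + parade[i + 2:]    # eg 3;1 ~> 1;1
    return None

def normal_forms(parade):
    """All irreducible forms reachable by any reduction path."""
    seen, frontier, normals = set(), [tuple(parade)], set()
    while frontier:
        t = frontier.pop()
        if t in seen:
            continue
        seen.add(t)
        reducts = [step(list(t), i) for i in range(len(t) - 1)]
        reducts = [tuple(r) for r in reducts if r is not None]
        if reducts:
            frontier.extend(reducts)
        else:
            normals.add(t)
    return normals

print(normal_forms([4, 7, 2, 3]))   # {(5, 4)}: a unique result
```

Since the total rank strictly decreases at every step, the search terminates, mirroring the termination argument in the text.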
DEFINITION 1.2.8 A system of reduction rules is weakly normalising if every term has some reduction
path which leads to an irreducible form. It is strongly normalising if there is no infinite reduction path.
THEOREM 1.2.9 In a normalising confluent system of reduction rules, each term is equivalent to exactly
one normal form, and can be reduced to it. If the system is strongly normalising then local confluence
suffices, and every reduction path leads to the normal form. []
Normal forms, where they exist, provide a versatile tool. Suppose that some formula is provable in a
logic whose proofs have this property; then (without being given a proof) we know that any such proof
must be of a particular form, involving certain other formulae which are also provable. This observation
can be used to prove a powerful result about ∨ and ∃ in intuitionistic logic (Remark 2.4.9 and Section
7.7).
There are idioms in the English language which exploit uniqueness, such as that provided by confluence.
Theory of Descriptions In the vernacular we speak of the Moon, using the definite article, because
Earth has only one. By contrast, Ganymede is a satellite of Jupiter: the indefinite article is used since
there are several others. Similarly, when there is exactly one term v which is normal and equivalent to u,
we call it the normal form of u.
DEFINITION 1.2.10
(a)
φ is a description if it is satisfied by at most one thing. The best way to say this is that any (∀)
two solutions are equal or, in symbols, ∀x, y.φ[x]∧φ[y]⇒ x = y.
(b)
A description φ denotes if something (∃) satisfies it: ∃x.φ[x].
In this case we may speak of the solution, ie what the description describes or denotes. The notation
ix.φ[x] is sometimes used for this solution. We write ∃!x.φ[x] for unique existence, the conjunction of ∃x.φ[x]
and ∀x, y.φ[x]∧φ[y]⇒ x = y. Some equivalent forms will be given in Exercise 1.8.
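Over a finite domain the description operator ix.φ[x] is directly computable. The sketch below is not from the book; the function name `the` and the table of satellites are hypothetical illustrations of the Moon/Ganymede contrast above.

```python
def the(phi, domain):
    """The description operator ix.phi[x] over a finite domain: defined
    only when phi describes (at most one witness) and denotes (at
    least one witness), ie when ∃!x.phi[x] holds."""
    witnesses = [x for x in domain if phi(x)]
    if len(witnesses) != 1:
        raise ValueError("the description does not denote uniquely")
    return witnesses[0]

# Hypothetical data for the Moon/Ganymede example:
satellites = {"Moon": "Earth", "Ganymede": "Jupiter", "Io": "Jupiter"}

print(the(lambda s: satellites[s] == "Earth", satellites))   # Moon
# the(lambda s: satellites[s] == "Jupiter", satellites) raises:
#   Jupiter has several satellites, so the predicate is not a description.
# the(lambda s: satellites[s] == "Vulcan", satellites) raises:
#   nothing satisfies it, so the description does not denote.
```

The two failure modes correspond exactly to clauses (a) and (b) of Definition 1.2.10.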
It is convenient to extend usage to descriptions which are not known to denote, that is, to a general
situation in which we would recognise a widget if we met one and know that there cannot be two which
are different, but where there need not actually be any. Then we may say ``the widget is grue'' to mean
that, should we ever find ourselves in possession of a widget, it will necessarily be found to be grue,
where grueness (ψ[x]) is any predicate, not necessarily another description. In symbols, ψ(ix.φ[x]).
``The'' and ``unique'' must be used with predicates: it is meaningless to say that a thing is unique,
whatever abuses advertisers may make of the English language. It is similarly wrong to introduce
putative things such as unicorns and then treat existence as a property of them.
LEMMA 1.2.11 If we have a grue widget, then we know that all widgets are necessarily grue.
PROOF:
[prooftree omitted]
[]
The i-calculus notation had been in use in the late nineteenth century, but it was Bertrand Russell who
sorted out its meaning. A description such as ``the author of Waverley'' is not a name - it is incomplete,
having no meaning until it is embedded in a sentence such as ``Scott is the author of Waverley'' - and
may be contingent (``the President''). Unfortunately some logic texts to this day assign an arbitrary value
such as zero to the two meaningless cases, leading to nonsense like saying that
Synonyms The theory of descriptions puts a premium on uniqueness, but how can we reconcile this
with the stress we put before on the many concrete forms which a mathematical object such as 5 may
take? One way is to appoint either the equivalence class ( all quintuples) or a normal form ( {0,1,2,3,4})
to be what the object is. The former is objectionable because it introduces extraneous material (the
protons in a Boron atom, as well as the fingers on one hand), and indeed a proper class of it, whilst
Example 1.2.1 shows that the latter is unnecessary. As Pál Halmos [Hal60] points out, we look to
physics for the way standards are defined: when we want to measure a kilogram, we compare it with a
particular platinum-iridium bar in France. See [Ben64] for discussion.
The deeper we dig into foundations, to find out what things do, the less sure we are about what they are:
we must be content with knowing how to exchange equals for equals.
[prooftree omitted]
If ∼ is the equivalence relation generated by reduction (\leadsto ), it suffices to verify this rule for a
\leadsto b and b\leadsto a (with w = φ[x], this is post-subs in Definition 1.2.3), and this property is
known as subject reduction.
The doctrine of interchangeability does not allow us to test for equality in substance, and so the `` = ''
sign in Definition 1.2.10 and the proof of Lemma 1.2.11 must be replaced by congruence.
So any two things satisfying the description have to be congruent rather than equal. Conversely, the
congruence law says that anything which is equivalent to something satisfying the description also
satisfies it.
The idea is that two things are the same if they have all properties in common, cf Leibniz' principle,
Proposition 2.8.7. Finally, testing a property for any one representative suffices, by the congruence law
again, to prove it for all of them.
For terms denoting individual values these remarks are maybe academic, but significant technical issues
arise when we turn to structures. Then the congruence is not just a property but a method of passing
from one representation to another and back ( isomorphism). Now the admissible properties are those
which are invariant with respect to isomorphism. We often say things like ``the product is unique'' or
``the quotient is compact,'' to be understood in the above senses.
Since a structure may have non-trivial isomorphisms with itself, few interesting properties survive
indiscriminate isomorphisms; for example all of the points on a sphere become identified. There are two
isomorphisms between my shoes and my feet, but choosing the wrong one is rather painful! That there are
two isomorphisms with my socks is also geometrically significant, as a non-trivial group is
involved (Example 6.6.7). In order to express what we have to say about a structure we must follow the
parts of the structure through the transformations.
Therefore mathematical objects are defined up to interchangeability , but we must pay attention to the
means of exchange: isomorphisms of objects and equivalences of categories (Definition 4.8.9).
Opinions differ on how to handle this issue, for example Michael Makkai [Mak96] has taken an extreme
semantic point of view in which equality of objects is unthinkable . The syntactic position is that we
only deal with names of types, which can therefore be compared. In Section 7.6 we shall show that this
conflict can be resolved, by using interchangeability of categories to restore equality of objects.
We shall explore the relationship between the formal and vernacular ways of expressing mathematics
further in Section 1.6. The idea of interchangeability and the means of exchange will be used in Section
4.4. Algebras with laws are the subject of Section 4.6, and Section 5.6 treats equivalence relations. Now
we turn to parametric descriptions.
However, during the twentieth century mathematics students have been taught that a function is a set of
input-output pairs. The only condition is that for each input value there exists, somehow, an output
value, which is unique. This is the graph of the function: plotting output values in the y-direction against
arguments along the x-axis, forgetting the algorithm. Now two functions are equal if they have the same
output value for each input. (This definition was proposed by Peter Lejeune Dirichlet in 1829, but until
1870 it was thought to be far too general to be useful.)
These definitions capture the intension and the effect ( extension) of a function. Evaluation takes us from
the first to the second, but it doesn't say what non-terminating programs do during their execution, and
can't distinguish between algorithms for the same function. But each view is both pragmatic and
entrenched, so how can this basic philosophical clash ever be resolved? Chapter IV begins the
construction of semantic models which recapture the intension extensionally, as part of our
reconciliation of Formalism and Platonism.
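Dirichlet's definition can be illustrated concretely. The sketch below is not from the book: two different algorithms (intensions) determine the same set of input-output pairs over a finite domain, and so are equal as functions in the extensional sense, even though evaluation forgets how each one computes.

```python
# Two different algorithms (intensions) for the same function:
def square_expand(x):        # computes (x+1)^2 by expanding the square
    return x * x + 2 * x + 1

def square_direct(x):        # computes (x+1)^2 directly
    return (x + 1) ** 2

def graph(f, domain):
    """The extension of f: its set of input-output pairs."""
    return {(x, f(x)) for x in domain}

domain = range(-10, 11)
# Equal graphs, so equal *as functions*, though the algorithms differ:
print(graph(square_expand, domain) == graph(square_direct, domain))   # True
```

The graph cannot distinguish the two algorithms, which is exactly the loss of intensional information that Chapter IV's semantic models set out to recapture.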
DEFINITION 1.3.1
(a)
A binary relation is a predicate in two variables x and y; we shall write it variously as R[x,y],
xR→ y, R:x→ y or xRy.
(b)
R is a functional or single-valued relation, or a partial function, if for each x it is a description of y
(Definition 1.2.10(a)), ie for all y1 and y2, if x R→ y1 and x R→ y2 then y1 = y2;
(c)
R is a total functional relation, or just a function, if also for each x it denotes, ie there is in fact
some y with x R→ y (Definition 1.2.10(b)).
Functional relations are more familiarly called f instead of R, and in this case we write ``f(a) = b'' for f:
a→ b or b = iy.a R→ y. The notation of function-application, like the definite article, implicitly means
that the result is uniquely determined and (usually) that it exists.
A relation which satisfies the existence but not necessarily uniqueness axiom for a function is said to be
entire. Nothing really remains of the functional idea, but the axiom of choice (Definition 1.8.8) says
that such a relation, considered as a set of pairs, contains a function.
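For finite relations the properties of Definition 1.3.1, and the remark that an entire relation contains a function, can be checked by direct search, with no appeal to the axiom of choice. The sketch below is not from the book and the helper names are hypothetical; a relation is a set of pairs.

```python
def is_functional(R):
    """Single-valued: each x is related to at most one y (partial function)."""
    outputs = {}
    for x, y in R:
        if outputs.setdefault(x, y) != y:
            return False
    return True

def is_entire(R, X):
    """Existence: every x in the source X is related to some y."""
    return {x for x, _ in R} >= set(X)

def choose(R, X):
    """An entire relation contains a function: keep one pair per x.
    For a finite relation this is mere bookkeeping, not choice."""
    f = {}
    for x, y in R:
        f.setdefault(x, y)                   # keep the first y seen for x
    return {(x, f[x]) for x in X}

X = [0, 1, 2]
R = {(0, "a"), (0, "b"), (1, "a"), (2, "c")}   # entire but not functional
print(is_entire(R, X), is_functional(R))        # True False
```

The function produced by `choose` is a subset of R that is both functional and entire, which is precisely what the axiom of choice asserts for infinite relations.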
On the other hand, single-valuedness alone is important. It is neither possible nor desirable to require all
programs to terminate, but those of mathematical interest can typically be calculated in some manifestly
deterministic way. For term-rewriting systems (including λ-calculi) , confluence is a commoner property
than normalisation. So partial functions are the norm, and will be considered in Sections 3.3, 5.3, 6.3
and 6.4.
REMARK 1.3.2 When equality has to be weakened to interchangeability, the functional property becomes
that x ∼ x′⇒ f(x) ∼ f(x′) or
Arity, source and target In this book we shall take the view that
Although this seems like common sense, it surprises me how readily this principle is dropped when
people try to reason about language.
In Chapter II we shall provide ways of forming new types, such as P(X), X×Y, Y^X and List(X), but for the
time being they are fixed in advance; in this case we often say ``sort'' instead of (base) type.
NOTATION 1.3.3 We write x:X and c:X to express the syntactic information that the variable x or constant
c is declared to have type X. For each operation-symbol r we must specify not only the type of its result
but also those of each argument. We sum up this information as
X1, X2, … \vdash r:Y
The list \vec{X} of input types is called the arity of r. Types, like predicates (Definition 1.2.12), must
be invariant under subject reduction:
The symbol ∈ is often used instead of the colon, but this can lead to confusion with the axiom of
comprehension (Definition 2.2.3), ie that the value x satisfies the predicate defining a subset X
(Exercise 1.12).
NOTATION 1.3.5 The types of the variables in Definition 1.3.1 are called the source x:X and target y:Y.
We regard them as an inseparable part of the definition, and indicate them by arrows:
(a)
R:X\leftharpoondown \rightharpoonup Y for a binary relation (this symbol is new),
(b)
R:X\rightharpoonup Y for a functional relation (partial function), and
(c)
R:X→ Y for a total functional relation (function).
The words domain and codomain are more usual in category theory, but we shall avoid them because of
confusion with Section 3.4. We also avoid the word range because usage is ambiguous as to whether it
means Y or the set of outputs which actually arise from some input,
{y | ∃x. x R→ y},
which we call the image. Again, the word image is sometimes used for the value of a function at a
particular element, but we shall always use it in the above way as the collection of values taken over a
set, ignoring repetition. We shall use the word range in another sense, for the type of the bound variable
of a quantifier (∀x, ∃x). An endofunction is one whose source and target are the same, ie a ``loop''
\circlearrowright .
Semantics
REMARK 1.3.6 Besides notation and discipline, types also internalise values, which need not have names.
For example there are (in a classical understanding) far more irrational numbers than we can name in
finitely many symbols, but a function ``on R'' is meant to be defined for all numbers, not just those with
names. Even for the natural numbers, where each value does have a name, the symbol N brings the
completed infinity of numbers into the discussion.
LEMMA 1.3.7 Let t be a term of type Y with a free variable x of type X. Then
x R→ y ⇔ y = t
is a total functional relation from X to Y.
PROOF: The notation is deceptively simple, so we must first clarify its meaning. The term ``t'' belongs to
the intensional syntax and as such may involve the variable x, which is also understood syntactically.
The other graphical symbols belong to the extensional semantics. Therefore to interpret the formula ``y
= t'' we must convert t from the syntax to the semantics, by substituting a term representing the value x
for the variable x wherever it occurs in t, and then evaluating the result.
We may regard the types as the sets of values, where the values may be equivalence classes of terms, or
normal forms. If y1 and y2 are two values which are both equal to the value of t then they must be equal
to each other (it is the confluence property that allows us to use normal forms here instead of
equivalence classes), so R is functional. It is total because the value of t itself witnesses ∃y.R:x→ y,
although we may choose to say instead that only those equivalence classes which have normal forms are
to be treated as ``defined'' values. []
We use these two notations synonymously. The semicolon was used in this sense, for left-to-right
composition, by Ernst Schröder in 1895. Today it is used for sequential composition in imperative
programming languages (Definition 4.3.1). The identity relation \id_X is the same as equality on X. It is
also called the diagonal relation (∆) because when its values are written out in a square table the entries
on the diagonal are true and the others false (Exercise 1.18).
LEMMA 1.3.9 If R and S are the (total functional) relations which correspond to terms v:Y and w:Z, each
having one free variable x:X and y:Y respectively, then So R corresponds to w[y := v]. Also, the diagonal
relation corresponds to the variable x:X considered as a term.
Composition preserves functionality and totality, but we postpone the proof to Lemma 1.6.6 for reasons
of exposition.
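Lemma 1.3.9 can be checked concretely over a finite domain. The sketch below is not from the book; the helper names are hypothetical. It computes the graphs of two terms, composes them left to right, and confirms that the composite is the graph of the substituted term w[y := v].

```python
# Lemma 1.3.9 in miniature: the relation of the substituted term
# w[y := v] is the composite of the relations of v and w.

def graph_of_term(term, domain):
    """The total functional relation x R-> y given by y = term(x)."""
    return {(x, term(x)) for x in domain}

def compose(R, S):
    """Left-to-right relational composition R ; S."""
    return {(x, z) for (x, y1) in R for (y2, z) in S if y1 == y2}

X = range(5)
v = lambda x: x + 1            # term v : Y with free variable x : X
w = lambda y: 2 * y            # term w : Z with free variable y : Y
Y = {v(x) for x in X}

R = graph_of_term(v, X)
S = graph_of_term(w, Y)
substituted = lambda x: w(v(x))          # w[y := v]
print(compose(R, S) == graph_of_term(substituted, X))   # True
```

Syntactic substitution on the left and relational composition on the right land in the same place, which is the point of the lemma.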
Relational calculus The definition of a (total) functional relation is not symmetrical in X and Y, so we
can ask what happens if we interchange the roles of the variables in the conditions. Of course what we
are then considering is
DEFINITION 1.3.10 The converse relation R^op has y R^op→ x ⇔ x R→ y. Its source is now Y and its target X.
The relation R is called
(a)
injective or 1-1 if x1 R→ y∧x2 R→ y ⇒ x1 = x2, ie R^op is also functional; we write R:X
\hookrightarrow Y or R:X\rightarrowtail Y;
(b)
surjective or onto if ∀y.∃x. x R→ y, ie R^op is entire; we write R:X \twoheadrightarrow Y;
(c)
bijective if both R and R^op are total functional relations.
For a function f:X→ Y, the following are equivalent:
(a)
f, considered as a functional relation, is bijective;
(b)
f is (total,) injective and surjective;
(c)
there is a function g:Y→ X such that go f = \id_X and fo g = \id_Y.
Moreover in the last case g, which we call the inverse, f^{-1}, is unique and is given by f^op. When f has an
inverse we call it an isomorphism and write f:X ≡ Y. (An isomorphism whose source and target are the
same type is called an automorphism of that type.) Beware that, when there is other structure, a
bijection is not necessarily an isomorphism (Example 3.1.6(e)). []
DEFINITION 1.3.13 An endofunction e:X→ X is called idempotent if eoe = e. In this case, x is in the
image of e iff it is fixed by e:
A = {x | e(x) = x}, with inclusion i:A\hookrightarrow X and surjection q:X\twoheadrightarrow A.
The inclusion i into and the surjection q onto the set of such points are said to split the idempotent; they
satisfy i;q = \id_A and q;i = e. The functions i and q are called respectively split mono and split epi.
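Splitting an idempotent is easy to exhibit on a finite set. The sketch below is not from the book; the function name `split_idempotent` and the clamping example are hypothetical. It checks eoe = e, extracts the fixed points A, and verifies both equations i;q = \id_A and q;i = e.

```python
def split_idempotent(e, X):
    """Split an idempotent e : X -> X through A, its set of fixed points,
    as an inclusion i : A -> X and a surjection q : X -> A."""
    assert all(e(e(x)) == e(x) for x in X)       # e o e = e
    A = [x for x in X if e(x) == x]              # fixed points of e
    q = lambda x: e(x)                           # surjection q : X ->> A
    i = lambda a: a                              # inclusion  i : A >-> X
    return A, i, q

X = range(10)
e = lambda x: min(x, 5)                          # idempotent: clamp at 5
A, i, q = split_idempotent(e, X)
print(A)                                         # [0, 1, 2, 3, 4, 5]
print(set(A) == {e(x) for x in X})               # True: image = fixed points
```

The two equations of the splitting then hold pointwise: q(i(a)) = a for every a in A, and i(q(x)) = e(x) for every x in X.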
Chapter II will begin the study of types, concentrating on functions in Section 2.3. Composition is the
basis of category theory, beginning in Chapter IV; in particular Remark 4.4.7 considers isomorphisms.
The relational calculus will be discussed further in Sections 3.8 , 5.8 and 6.4. We shall now turn to the
symbols ⇒ , ∧, ∀ and ∃ which we have just started using.
The style has been traditional in editions of Euclid's Elements since the invention of printing:
First we write down the hypotheses, each on a separate line and annotated as such. Then we
write any formulae we wish to derive from them (and from formulae which have been
derived already), noting the names of the rules and hypotheses used.
Each step is introduced by \therefore ( therefore) and its reason by \because ( because). The reason cites
one of a small number of rules of inference and some previously asserted formulae (the premises of the
rule). In presenting the rules we shall employ the ruled line notation used in Definition 1.2.3.
Each one of the rules makes an appearance somewhere in ancient or medieval logic, long before Boole. So
they all have Latin names, which shed no light at all on their structure (even if you can read Latin). As
Russell commented, ``Mathematics and Logic, historically speaking, have been entirely different studies.
Mathematics has been connected with Science, Logic with Greek.'' The only Latin name in common use in
mathematics is modus ponens, which we call (⇒ E ). The symbolic names reflect the symmetries of logic.
(a)
Atomic predicates ρ[u1,…,uk], where ui is a term of type Xi. The number k is
called the arity. Predicates for which k = 0 are usually called propositions; in particular there are the
constants T and ⊥.
If k ≥ 2 we speak of relations, as in the last section. For example we have the binary relations of equality, u
= v, order, u ≤ v, and membership, u ∈ w (Definition 2.2.5), where u,v:X and w:P(X).
(b)
If φ and ψ are formulae then so are φ∧ψ (and), φ∨ψ (or) and φ⇒ ψ.
\lnot φ (not) and φ⇔ ψ (equivalent) are abbreviations for φ⇒ ⊥ and (φ⇒ ψ)∧(ψ⇒ φ) respectively. The
arrow should be read ``in so far as φ (the antecedent) holds, then so does ψ (the consequent).'' These
symbols are collectively known as connectives.
(c)
If φ is a formula and x a variable of type X then ∀x:X.φ and ∃x:X.φ
are formulae. The symbols ∀ and ∃, which are called quantifiers, bind variables in the way that we
described in Definition 1.1.6ff.
The phrase ``there exists'' will be discussed in Remarks 1.6.2(f) and 1.6.5.
REMARK 1.4.3 The names of the rules specify the connective involved - that is, the outermost connective
(Remark 1.1.1), as the formulae φ and ψ may themselves involve connectives. The I stands for an
introduction rule, ie where the formula in the conclusion is obtained from that in the premise by adding the
connective in question. Similarly E indicates an elimination rule, where the connective has been deleted.
We employ the introduction rule to give a reason for the formula, and the elimination rule to derive
consequences from it.
Notice how the elimination rules for ∧ and ∀ mirror the introduction rules for ∨ and ∃. (TI) and (⊥E) and,
to some extent, the other rules also match up, although the duality is seen more strictly in classical logic,
Theorem 1.8.3. The other three direct logical rules each have two premises, so cannot have mirror images.
REMARK 1.4.4 The (⊥E )-rule comes as a surprise to novices. It is like playing the joker in a game of cards:
⊥ stands for any formula you like. The strategy usually disregards this card ( minimal logic leaves it out),
but it can sometimes save the day, eg in Remark 1.6.9(a).
REMARK 1.4.5 There are two (∧E)- and (∨I)-rules, whilst (∧I) has two premises and (∨E) has two sub-
boxes (Definition 1.5.1), since these connectives are binary. The nullary (T, ⊥) and infinitary (∀, ∃)
versions follow the same pattern: in particular, there is no (TE)- or (⊥I)-rule.
Since the (∧I)-, (⇒ E)- and (\lnot E)-rules, together with transitivity (1.2.3), description (1.2.10(a)) and
congruence (1.2.12), have two premises, the ancestry of a deduction is not linear but tree-like, eg Lemma
1.2.11. Proofs, like expressions, involve binary operation-symbols, which we shall come to recognise as
pairing for (∧I) and evaluation for (⇒ E).
Whilst the tree style shows more clearly the roles played by individual hypotheses in a deduction, it can
repeat large parts of the text when a derived formula ( lemma) is used more than once as a subsequent
premise, as in induction (Remark 2.5.12). Big sub-expressions also get repeated in algebraic manipulation,
but this can be avoided by use of declarative programming (Definition 1.6.8 and Section 4.3).
The provability relation The presentation of a proof as a chain or tree of assertions is very convenient
when the aim is to show that some result in mathematics is true. But from the point of view of logic per se,
we need a notation which says that ``there is a proof of θ from hypotheses φ1,…,φn.'' This list is to be
understood conjunctively (φ1∧···∧φn); it is called the context and will be denoted by Γ. (Such contexts
must not be confused with ``context-free languages,'' Example 4.6.3(d).) The provability assertion is written
Γ\vdash θ and is called a sequent.
REMARK 1.4.6 Provability is an inequality φ\vdash θ on formulae, whereas reduction rules (Section 1.2)
defined when two expressions were equal. Ernst Schröder observed that the inequality is more natural in
logic. Formulae are then equal if both φ\vdashθ and θ\vdash φ, abbreviated to φ\dashv \vdashθ.
Beware that φ1,φ2\vdash θ1,θ2 means φ1∧φ2\vdash θ1∨θ2 in Gentzen's classical sequent calculus, but for us
the comma means conjunction on both sides.
The structure consisting of inter-provability classes (equivalence classes under \dashv \vdash ) of formulae
is called the Lindenbaum algebra. In a much more general form (the classifying category or category of
contexts and substitutions, Section 4.3) it will be the major object of study in this book; the weakening rule
will play a crucial role in our construction.
DEFINITION 1.4.7 The provability relation, written using the turnstile \vdash (which comes from Frege,
1879), is generated by three classes of rules:
(a)
the structural rules, which govern the way the formulae move around the proof,
(b)
the logical rules, which determine the way in which the connectives and quantifiers behave, and
(c)
the non-logical rules, which relate to symbols in the object- language.
For the logical rules we may make other distinctions: between the direct rules we have given and the
indirect rules of the next section (using temporary hypotheses, which must be delimited somehow), and
between introduction and elimination rules.
DEFINITION 1.4.8 The structural rules are the identity axiom, cut , exchange, weakening and contraction:
It is precisely the exchange and contraction rules which enable us to treat Γ as a set instead of a list (indeed
in Section 8.2 we shall discuss a similar set of rules for type theory in which exchange is not generally
valid, and there Γ must be considered as a list). The cut rule allows us to delete those hypotheses which are
derivable from others; the exchange and contraction rules mean that the context of its conclusion is just
Γ∪Ψ.
The weakening rule says that hypotheses may be added to the context. If we allow weakening by an
arbitrary set of hypotheses we can give a meaning to an infinite context: something may be proved in it iff
it may be proved in some finite subset (though we shall not take this up).
REMARK 1.4.9 There are several ways of presenting the direct logical rules in sequent form, cf operation-
symbols applied to either variables or terms (Remark 1.1.2). For example (\land E0) may be written as any
of
Gerhard Gentzen [Gen35] used the third form in his sequent rules for intuitionistic logic. His Hauptsatz
(German: main theorem) was that anything that can be proved in the sequent calculus is provable without
the Cut rule: cut-free proofs are normal forms. All of his rules apart from cut have the sub-formula
property: the formulae used in the premises are sub-formulae of those in the conclusion, so controlling the
search for a proof. This proves consistency, but by using an induction principle which is stronger than the
calculi under study.
Cut is redundant in Gentzen's calculus because he saturated the other rules with respect to it. This makes it
a very cumbersome notation for justifying theorems, since the context must be copied from one line to the
next, usually without change.
The (∀E )-rule says that all instances φ[a] obtainable by substituting terms hold, and it says no more than this. But by ∀x:X.φ[x] we intend φ[x] to
be true throughout the type X ( cf Remark 1.3.5), not just for those values a with names which can be used for substitution.
Universal quantification over numbers (and not just a scheme ranging over numbers) is needed to formulate the principle of induction,
θ[0] ∧ (∀n:N. θ[n] ⇒ θ[n+1]) ⇒ ∀n:N.θ[n].
Remark 2.5.12 illustrates the role of the part of the proof corresponding to the nested ∀, as a repeating feature of a tree. The type X of the variable
(the range of quantification) is essential to understanding the quantified formula ∀x:X.φ[x].
The indirect rules To prove φ⇒ ψ we must temporarily assume φ, and deduce ψ from that. Then we are (by definition) able to deduce φ⇒ ψ
without assumption. The additional hypothesis φ, and anything we deduced from it, are no longer part of the context for φ⇒ ψ, and so our
notation must provide a way of ``shielding'' them. The demarcation between facts, assumptions and hypotheses (page 1.1) has changed.
The word indirect has been borrowed from the euphemism indirect proof for excluded middle (Definition 1.8.1), because in these cases at certain
stages in the deduction we make assertions which are subsequently withdrawn. We saw similar transformations of expressions as trees on pages
1.1.1- 1.1.2. Dag Prawitz [Pra65] called them proper rules and improper rules.
We shall show how to translate the formal rules into the vernacular (English, French, etc prose) and back. But it is
important to understand that this translation cannot be a precise one like those amongst logical, programming and
categorical languages which we give elsewhere in the book. This is because the meaning of a vernacular sentence (if it has
one) is not given structurally in terms of the component parts, but depends heavily on unstated contextual information.
Sentences which a classical grammarian would parse alike may have very different logical meanings.
The art of translation between human languages lies in idiom: particular phrases in the two languages match, not literally,
but as a whole. The theory of descriptions (Definition 1.2.10) has already illustrated this.
Direct reasoning The most familiar idiom is the chain of equations

a = b = c = d = ···

where the symmetric and transitive laws of equality are being elided.
Little more needs to be said for the other direct rules: the formal style places each assertion below the previous one, spelling
out which rule justifies the step. By contrast, for reasons of space, it is usual to present a long argument as running text,
divided into paragraphs and sentences to indicate the milestones in our progress. Arguments of this form are hardly literary
prose, and we save hence, thence and whence from the grammatical graveyard simply to avoid the monotony of therefore.
Proof boxes delimited by keywords The vernacular has its own ways of accommodating simple departures from direct
logic.
REMARK 1.6.2
(a)
`` Put x'' indicates a substitution, such as an instance of a universal formula (the substitution used in an (∀E )-rule,
Definition 1.4.2 and Remark 1.5.2) or a declaration (Definition 1.6.8). This associates a specific value or expression
with the name x.
(b)
`` Let x'' introduces a fresh variable, opening an (∀ℑ )-box. No value in particular is given to x - it is generic - until a
β-reduction (Remark 1.5.10) annihilates the (∀ℑ ) and corresponding (∀E ).
(c)
`` Suppose φ'' opens an (⇒ ℑ ) -box with a hypothesis.
(d)
`` Thus'' (in this way) closes these boxes.
(e)
`` If φ then... . Otherwise '' delimit (∨E )-boxes. In programming languages `` fi '' or `` endif'' closes these boxes,
drawing attention to the common conclusion, but ``in either case'' is English idiom.
(f)
`` There exists x such that φ[x]'' has two linguistic functions: both asserting ∃x.φ[x] (``for some x'') and opening the
(∃E )-box which makes use of something satisfying φ. The same symbol x is both the bound variable and the
temporary witness (Remark 1.6.5ff).
Boxes can be avoided by dividing the presentation of a topic into lemmas, each of which deals with a single box. The
idiomatic proof of a lemma with ∀x.θ[x] as its conclusion or ∃y.φ[y] as its hypothesis would not bother to state the
quantified formulae or use any kind of box: the variables are simply global to the proof. The same applies to the proof of ∃z.
θ[z]: the witness z has to be introduced at some point and its properties developed. So only when the proof is complex, with
heavily nested (⇒ ℑ )- and (∀ℑ )-rules and no natural way of packaging the parts, must we use a formal style to make the
argument clear.
Although nested boxes can be handled by additional lemmas, they make us deaf to anything logic may have to say about a
problem: for example in (ε-δ)-proofs in analysis, information (the degree δ of approximation) flows backwards from output
to input. Induction, with its nested ∀, involves nested (∀E )- boxes, of which Section 2.6 gives examples.
It is easy to translate formal proofs mechanically into the vernacular, though some creativity is needed to make them
readable, cf word-for-word translations from a foreign language using a dictionary. The other way is much more difficult - it
would be quicker to reconstruct the proof from scratch (as I often find myself doing). This raises questions about the usefulness of
proofs in printed journals in future.
Often the conceptual structure of an argument may already be present in a simpler version of the result, the more substantial
one involving some difficult calculation. If the former, which explains the theorem, had been laid out, the readers could
have supplied or omitted the details of the calculation for themselves.
Importing and exporting formulae One should think of a box as a separate logical world, interacting with our own only
across a membrane, which allows any hypothesis to enter from above (unless, of course, this means taking it out of another
nested box).
A formula within a box which has no proof there is a hypothesis; this is replaced by evidence if the formula is imported
from outside (β-reduction, Remark 1.5.10). Indeed, the weakening rule specifically allows this for external hypotheses, and
nothing stops us from repeating parts of the development; so we may as well import the conclusion instead. The weakening
rule for variables is given in Remark 2.3.8.
In general, the ability to use formulae from earlier in the proof is known as referential transparency; it was discussed by
Willard Quine for natural language [Qui60]. For us, it is a manifestation of invariance under substitution. This phenomenon
will arise
(a)
for the λ-calculus as the naturality equation (Definition 4.7.2(c));
(b)
for conditionals as stability under pullback (Definition 5.5.1);
(c)
for composition of relations (Lemma 5.8.6) and for the existential quantifier (Remark 9.3.7) in the same way;
(d)
and for recursion over N as a product with Γ (Remark 6.1.6).
(e)
The effect on ⇒ and ∀ will only be apparent in Section 9.4, where it is the Beck-Chevalley condition.
REMARK 1.6.4 The box rules allow us to export θ from an (⇒ ℑ )-box in the form φ⇒ θ, and θ[x] from (∀ℑ) as ∀x.θ[x].
Although we presented these rules with just one such formula, and wrote it on the last line of the box, in fact any number of
formulae may be exported from any line of the box, if they are appropriately qualified (by ∀x or φ⇒ ). A formula may be
exported unaltered from an (∨E )-box if it occurs on both sides.
Open-ended boxes The (∨E )-rule provides a proof in each of the two cases, without prejudice as to which holds.
Similarly the (∃E )-rule gives a demonstration in terms of an unspecified witness. Dependence on the alternative or witness
means that a box is needed, but we shall now show that the (∃E)-box is open-ended below (we have just seen that all boxes
are open above). This rule is the least well understood in the practice of mathematics, although it has a bearing on the use of
structure such as products (Remark 4.5.12) and the meaning of finiteness (Remark 6.6.5).
REMARK 1.6.5 Since the conclusion (θ) is arbitrary, the closing of the box may be postponed indefinitely, ie until the end of
the enclosing box or proof. This is because any θ′ which we deduce from θ outside the box (necessarily not mentioning the
witness) may equally be deduced inside and exported as the conclusion instead of θ.
[proof-box diagrams omitted]
So the box need not be closed at all.
PROOF:
The properties of the (∃E )-box make it notationally redundant, and explain the idiomatic phrase ``there exists.'' However,
the conclusions of such an argument cannot be exported from enclosing boxes, unless the witness is unique (φ[x] is a
description, Definition 1.2.10), in which case a function-symbol may be introduced.
Enlarging the box as much as possible is appropriate for the existential quantifier. The conclusion of (∨E ) is also
indeterminate, but this is normally exploited in such a way as to shorten the proof, by closing the box as soon as the
alternatives can be reunited. Remark 2.3.13 sets out the continuation rules which are needed in type theory to handle the
open-endedness, and Example 7.2.6 explains it categorically.
REMARK 1.6.7 Notice that the (∃E )-rule alone - one of the two halves of the meaning of the quantifier - suffices to give a
formal justification of the introduction of an unspecified witness for any existentially quantified statement, and the
continued use of this witness until the end of the enclosing box. It is quite unnecessary to postulate, as Hilbert and later
Bourbaki [Bou57] did, a global process which selects such witnesses, and indeed to do so amounts to the axiom of choice
(Definition 1.8.8).
DEFINITION 1.6.8 A very useful application of the (∃E )-box and its open-endedness is the definition or declaration. For any
well formed term t we may always introduce a fragment of proof of the form
[proof-box diagram omitted: t = t by reflexivity; ∃x. x = t by (∃ℑ); then an (∃E)-box with witness x and hypothesis x = t]
After this, as far as the end of the next enclosing box, x and t may be used interchangeably. As t is a term, ie a long sequence
of symbols, x is an abbreviation for it, but any conclusion we reach concerning x may be translated into one about t and
exported from the box.
Since the box is open-ended, we don't bother to write it, and condense the three steps above to ``put x = t.'' Although after
the declaration the relationship between t and x is symmetrical, during the defining step they play different roles; some
authors indicate this by writing x := t. The symbol x is called the definiendum or head and the term t the definiens or body.
Similarly when t is a proposition we say ``if'' rather than ``iff'' or ``if and only if'' in a declaration.
Notice that any variables or hypotheses in the context are parameters or preconditions, so declarations cannot be exported
from enclosing boxes.
REMARK 1.6.9 The phrase without loss of generality (wlog) is a variant of declaration which is analogous to assignment in
imperative programming languages (Remark 4.3.3). For example,
(a)
To show that there is at most one integer satisfying a given property we take x and y having it and may suppose
without loss of generality that x < y to derive a contradiction, because the argument from y < x is similar. By the
trichotomy law ∀x, y.x < y∨x = y∨x > y we are left with x = y; this does not use excluded middle because, by (⊥E ),
we can deduce x = y in each of the three cases.
(b)
To solve the general cubic equation (Example 4.3.4) we may assume without loss of generality that it is x³ − 3px − 2q =
0, because it is easy to turn the general problem into this form and the result back.
(c)
To solve Newton's equations for the motion of the Earth about the Sun, we may assume without loss of generality
that the origin is at their centre of mass and motion takes place in the x y-plane.
These assumptions are desirable because the problem becomes simpler. They are permissible because the general problem
may be transformed into the special case and its solution back again. One ought to take care to distinguish the general and
special cases by different notation, such as x and x′, but commonly this is not done. Nevertheless, when we say ``without
loss of generality'' we must always state the two-way translation involved (the means of exchange).
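For example (b), the ``means of exchange'' can be made explicit. The following is a hypothetical sketch (the function names and the worked cubic are our own, not the book's): the general cubic is turned into the special form x³ − 3px − 2q = 0 by the shift x = t − b/(3a), and a solution of the special case is translated back.

```python
# Without loss of generality as a two-way translation: reduce the general
# cubic a*x^3 + b*x^2 + c*x + d = 0 to t^3 - 3p*t - 2q = 0 and back.

def to_special_form(a, b, c, d):
    """Return (p, q, shift) such that t^3 - 3p*t - 2q = 0 where x = t - shift."""
    shift = b / (3 * a)
    # Substituting x = t - shift and dividing by a gives
    # t^3 + (c/a - b^2/(3a^2)) t + (2b^3/(27a^3) - bc/(3a^2) + d/a) = 0.
    p = -(c / a - b**2 / (3 * a**2)) / 3
    q = -(2 * b**3 / (27 * a**3) - b * c / (3 * a**2) + d / a) / 2
    return p, q, shift

def translate_back(t, shift):
    """The means of exchange: recover a root of the general problem."""
    return t - shift

# (x - 1)(x - 2)(x - 3) = x^3 - 6x^2 + 11x - 6 has root x = 2.
p, q, shift = to_special_form(1, -6, 11, -6)
t = 2 + shift                       # the corresponding root of the special case
assert abs(t**3 - 3 * p * t - 2 * q) < 1e-9
assert abs(translate_back(t, shift) - 2) < 1e-9
```

Note that the special and general roots are kept distinct (t versus x), as the text recommends.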
Alternative methods Until Frege, Aristotle's syllogisms (Exercise 1.25) were the standard treatment of the quantifiers, and
they are still taught to unfortunate philosophy students, despite significant advances in logic in both ancient times and the
twelfth to fourteenth centuries.
Gottlob Frege was by far the best logician between the Renaissance and the First World War. The distinction in Section 1.2
between an expression and its value is due to him. In 1879, while others such as de Morgan and Schröder were still battling
with the propositional connectives, he developed a modern theory of quantifiers, understood bound variables and used
second order logic with confidence. His verbal explanations are crystal clear, but his space-consuming notation (Begriffsschrift, concept-writing) would have consigned his work to oblivion but for Russell's attention. It must have caused
nightmares for his printer, but he argued (in a letter to Peano, who printed his own books) that this was not the important
consideration: one ought to take advantage of the second dimension to explain mathematics. We use Frege's theory of
sequences in Sections 3.8 and 6.4.
REMARK 1.6.10 Hilbert's logic, like that of Frege and Russell, had steps like

[displayed proof steps omitted]
Implicational logic is an interesting fragment of Hilbert's logic; see Exercise 1.23 and Example 2.4.2.
All of the assertions in a Hilbert-style proof are facts, whereas idiomatic arguments in mathematics use hypotheses. The first
formal account of natural deduction, together with its equivalence with the Hilbert style, was given by Stanisław
Jaśkowski [Jaś34], based on ideas of Jan Łukasiewicz. It treated not only implication and conjunction but also the
universal quantifier (and substitution), noting and correctly handling the problem of empty domains (Remark 1.5.6).
Gerhard Gentzen [Gen35], a student of Hilbert, treated the connectives individually (whereas previous authors had defined
some in terms of others), recognising the symmetry in their rules. He gave translations amongst natural deduction (NK),
sequent calculus (LK) and Hilbert-style (LHK) classical and intuitionistic (NJ, LJ, LHJ) logic.
Frederic Fitch wrote the first textbook [Fit52] to make routine use of natural deduction using our proof boxes, modulo some
syntactic sugar, and Nicolaas de Bruijn developed the notation for AUTOMATH.
REMARK 1.6.11 Gentzen used the tree notation (Remark 1.4.5) for his natural deduction. This style remains the prevalent
one amongst logicians, but it is highly unsatisfactory for the indirect rules, especially for extended proofs. The formulae at
the leaves (with no rule above them) are hypotheses; when φ⇒ θ has been deduced from θ, φ need no longer be a
hypothesis and so may be discharged. The (∨E )-rule is similar. We have done this by closing a box, but it is traditionally
indicated by striking through the formula:

[tree diagram omitted]

The truth of a formula such as

∀n ≥ 2. ∃p,q prime. 2n = p+q

cannot be perceived by any mortal from its sub-formulae or instances. The truth-values interpretation of ∀n in this formula
has been verified up to large values, but a proof, if there is one, must be finitary and introduce this quantifier by considering
a generic n, ie using the (∀ℑ )-rule.
REMARK 1.6.12 A formula is not true or false of itself, but only when interpreted in a model M. This specifies
(a)
the individuals which are denoted by the constants, and over which the variables range,
(b)
for each relation-symbol and tuple of individuals, whether or not this instance of the relation holds, and
(c)
for each operation-symbol and tuple of individuals, what individual is the result of the operation.
The meaning of general terms and formulae is defined by structural recursion; this is straightforward for the connectives ∧
and ∨, and we write M ⊨ φ if φ is valid in the interpretation. However, a quantified formula ∀x.φ[x] is valid if φ[a]
holds for each individual a, which is an infinitary condition, so the naive meta-language is no longer adequate. The arity of
∀x is the semantic object which interprets the type of the syntactic variable x. We shall consider infinitary operations, ie
whose arities are objects of the world under mathematical study, in Section 6.1. The quantification over individuals is then
performed by the (∀ℑ )-rule in the meta-logic, as we can never escape completely from finitary proof.
Having set this up, one can show that each of the logical rules preserves validity, ie whenever the premises are true, so is the
conclusion. This is called soundness. To an algebraist, it says that the interpretation is a homomorphism for ∧, ∀, etc. By
structural induction on a proof Γ ⊢ φ, if its hypotheses are valid (M ⊨ Γ) then so is its conclusion (M ⊨ φ).
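The structural recursion which interprets formulae in a model can be sketched concretely for the case of a finite model, where ∀ and ∃ reduce to finite iteration. This is our own illustrative encoding (nested tuples for formulae, a dictionary for M), not notation from the text.

```python
# Interpreting formulae in a finite model M by structural recursion on the
# formula. Formulae are tuples: ("rel", name, *vars), ("and", p, q),
# ("or", p, q), ("implies", p, q), ("forall", x, body), ("exists", x, body).

def holds(M, env, phi):
    """Decide M |= phi under the variable assignment env."""
    tag = phi[0]
    if tag == "rel":                        # atomic case: look up the relation
        _, name, *args = phi
        return tuple(env[v] for v in args) in M["relations"][name]
    if tag == "and":
        return holds(M, env, phi[1]) and holds(M, env, phi[2])
    if tag == "or":
        return holds(M, env, phi[1]) or holds(M, env, phi[2])
    if tag == "implies":
        return (not holds(M, env, phi[1])) or holds(M, env, phi[2])
    if tag == "forall":                     # infinitary in general; finite here
        _, x, body = phi
        return all(holds(M, {**env, x: a}, body) for a in M["individuals"])
    if tag == "exists":
        _, x, body = phi
        return any(holds(M, {**env, x: a}, body) for a in M["individuals"])
    raise ValueError(f"unknown connective {tag!r}")

# A two-element model in which "lt" is the strict order on {0, 1}.
M = {"individuals": [0, 1], "relations": {"lt": {(0, 1)}}}

assert not holds(M, {}, ("forall", "x", ("exists", "y", ("rel", "lt", "x", "y"))))
assert holds(M, {}, ("exists", "x", ("exists", "y", ("rel", "lt", "x", "y"))))
```

The evaluator is visibly a homomorphism for ∧, ∨, etc: each clause interprets one connective in terms of the interpretations of its sub-formulae, which is exactly what the soundness argument exploits.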
REMARK 1.6.13 On the other hand, suppose that every model M for which M ⊨ Γ also satisfies φ. In this case we say
that Γ (semantically) entails φ and write Γ ⊨ φ. As we have said, when there is a proof Γ ⊢ φ, then Γ ⊨ φ. But
as the notions of proof and validity have been defined independently, Γ ⊨ φ may perhaps happen without a proof, either
if φ is supposed to be true but we have forgotten to state some rule of deduction needed to prove it, or if it ought to be false
but our class of models is too poor to furnish a counterexample. The proof theory is said to be complete for its semantics if
proof and truth do coincide. Kurt Gödel showed that first order logic (what we have considered so far) is complete, but
second or higher order logic (Section 2.8) is not.
Both of these famous theorems raise a number of deep questions, many of which are beyond the scope and viewpoint of this
book. As we have noted, there was a tendency in traditional logic to study propositions, regarding variables, individuals and
terms as secondary. To correct this attitude we must add more detail, and begin with the algebraic fragment alone. The
connectives and quantifiers are added one by one, so we shall not reach the classic model- theoretic results. On the other
hand, models in the old framework had to be discrete sets, whereas for us they may perhaps be topological spaces, or come
from some more exotic world. In fact it is possible to fashion such worlds out of the syntactic theories themselves, and they
contain generic models. Conversely, Section 7.6 shows how to design your syntax to fit your semantics.
Model theory, with the completeness theorem at its heart, is a ``three sides of a square'' approach to logic: given the axioms
for, say, groups, instead of deducing theorems directly from the axioms, it seeks to find out what is true about the proper
class of models. We aim to do things directly, in particular recognising the Lindenbaum algebra or ``classifying category'' of
a theory as a syntactic construction.
REMARK 1.6.16 I have seen logic introduced to first year undergraduates both in the form of truth-assignments and using
proof boxes, and firmly believe that the box method is preferable. As this section has shown, it is a formal version of the
way mathematicians (and informaticians) actually reason, even if they claim to use Boolean algebra when asked.
If you want to teach both interpretations, they have to be shown to be equivalent. The formal soundness of (∨E ) and (⇒ ℑ )
is more difficult to explain than one might suppose, as it depends on hypotheses, ie the notion of scope ( cf proof boxes).
Soundness for a whole proof makes use of structural induction, which, whilst they should learn it during the first year, is
unfamiliar to students just out of school. Although they need to be aware of the truth-values interpretation, together with the
statement and explanation of soundness and completeness, this should be given at the end of a first logic course, after the
students have learned to construct some actual proofs.
Vernacular idioms for induction will be discussed in Sections 2.5-2.8 and 3.7- 3.9, and declarative programming in Section
4.3.
The steps which can be automated are the obvious ones, in a technical sense: this literally means ``in the
way'' in Latin. It is obvious how to go through a foreign airport, not because you know it intimately, but
because there are signs telling you where to turn whenever you need them (you hope). This is also
known as exam technique: write down and exploit what you already know. Whereas the box or sequent
rules of predicate calculus from the previous section are the laws of the game of proof, the heuristics are
hints on the tactics. This section is based on teaching first year informatics students to construct proofs
on paper. Of course this will also give some idea of how to write a program to do it, but the strategy for
making choices when backtracking is needed raises issues far outside the scope of this book [Pau92].
George Polya [Pol45] and Imre Lakatos [Lak63] gave two classic accounts of heuristics in mathematics,
using Euclidean geometry for examples. Polya's advice - make a plan and carry it out, compare your
problem with known theorems, etc - is extremely valuable to help students of mathematics (and
professionals) get past the blank sheet of paper, but treats more strategic aspects of proof than we can.
An early theorem prover was based on his methods of drawing diagrams and formulating conjectures
and counterexamples; that this seems odd now shows both the sophistication of modern proof theory and
perhaps also the danger of isolation from the traditional instincts of mathematicians.
Nicolaas de Bruijn's AUTOMATH project (late 1960s) set out to codify existing mathematical arguments,
rather than to find new theorems, and this remains the research objective of automated reasoning. Johan
van Benthem Jutting (1977) translated Edmund Landau's book Foundations of Analysis into AUTOMATH
and analysed the ratio by which the text is magnified, which was approximately constant from beginning
to end. Similar work has been done for other areas of mathematics.
There are certain dangers inherent in the formalisation of mathematics. Systems of axioms acquire a
certain sanctity with age, and in the rush of churning out theorems we forget why we were studying these
conditions in the first place. Computer languages suffer far more from this problem: nobody would
claim any intrinsic merit for FORTRAN or HTML, but sheer weight of existing code keeps them in use.
Through the need for a standard - any standard - a similar disaster could befall mathematics if set theory
were chosen. As with any programming, and also with the verification of programs, far more detail is
required than is customary in mathematics. G. H. Hardy (1940) claimed that there is no permanent place
in the world for ugly mathematics, but I have never seen a program which is not ugly. Even when the
mathematical context and formal language are clear, we should not perpetuate old proofs but instead
look for new and more perspicuous ones.
Although we must read a finished proof from top to bottom, the search for and creation of the proof are
not so direct. (The commonest misconception about mathematicians amongst the general population is
that we act like robots when trying to solve problems.) By the nature of cut-elimination, the heuristics
are in fact goal-driven: they proceed mainly in the opposite direction from the reading of the completed
proof.
For certain fragments of logic, if there is any proof of Γ\vdash θ then there is one obtainable by means of
the following heuristics. Conversely, if we fail, by completeness (Remark 1.6.13) there is a
counterexample, which can be obtained from the trace of our proof- attempts.
FACT 1.7.1 Hereditary Harrop formulae are the definite formulae γ and goals θ respectively defined by
the grammar

γ ::= α | γ∧γ | θ⇒α | ∀x.γ          θ ::= α | θ∧θ | θ∨θ | γ⇒θ | ∀x.θ | ∃x.θ

where α is atomic. If Γ is a list of definite formulae and θ is a goal formula for which Γ ⊢ θ is
provable, then it has a uniform proof, ie one in which each sequent ∆ ⊢ φ with φ non-atomic is
deduced only by means of the introduction rule for the outermost connective of φ [MNPS91]. []
Resolution When the goal is an atomic formula, logical manipulation has nothing to say, and we have
to make use of the database, ie the axioms Γ given in the problem. These are written at the top of the
page, numbering the lines from 1 and giving the justification for each line as `` data.'' The desired
conclusion(s) or goals θ are written at the bottom, numbering the lines backwards from 99 and giving no
reason (yet). We shall progressively add more lines 2, 3, ... and 98, 97, ... and also fill in reasons; the
lines which have no reason so far are called pending goals.
If an implication

φ1 ∧ φ2 ∧ ··· ∧ φk ⇒ θ

is in the database, then the problem is reduced to proving each of φ1, φ2, ..., φk. The idea of logic
programming is that this is a procedure which defines θ (the head) in terms of the φi (the body).
In order to answer the query θ, we regard the database as a program and call a procedure whose head is
θ, which calls sub-procedures; if the original call returns successfully then θ has been proved, ie the
answer to the query is yes. Notice that while the search for a proof is in progress there may be several
pending goals, to be taken conjunctively, just as the database may consist of several hypotheses.
REMARK 1.7.3 The program is non-deterministic, because there may be several procedures for θ: if using
one of them fails to find a proof, we backtrack and try another. To do this by hand, place a new sheet of
tracing paper over the proof so far each time you have to make a choice; then if the choice is wrong you
can discard the working which depended on it and return to the immediately preceding state. Only the
last choice is discarded: earlier ones may still be viable until all possibilities at this stage have been
exhausted. This means that the choices in the search form a nested system in the heuristics, but this is
independent of the nested contexts (boxes) in the completed proof.
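The procedure-call reading of the database, together with backtracking over the choice of procedure, can be sketched in the propositional case. This is a minimal illustration under our own conventions (clauses as pairs body ⇒ head, with invented facts); recursion supplies the backtracking, so a failed choice is abandoned exactly as on the tracing paper.

```python
# Goal-directed resolution over propositional Horn clauses.
# Each database entry is (body, head): the conjunction of the body
# formulae proves the head.

def prove(database, goal, pending=frozenset()):
    """Try each procedure whose head matches the goal; backtrack on failure."""
    if goal in pending:                 # avoid calling the same goal cyclically
        return False
    for body, head in database:
        if head == goal and all(
            prove(database, subgoal, pending | {goal}) for subgoal in body
        ):
            return True                 # this choice of procedure succeeded
    return False                        # all choices exhausted: report failure

db = [
    ((), "rainy"),                      # a fact: empty body
    (("rainy",), "wet"),
    (("wet", "cold"), "miserable"),
]
assert prove(db, "wet")                 # wet <- rainy <- (fact)
assert not prove(db, "miserable")       # fails: "cold" has no procedure
```

The pending goals of a partial search are the conjunction of unproved sub-calls on the stack, matching the description above.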
∀x⃗, y⃗. ∃z⃗. φ1[x⃗, y⃗, z⃗] ∧ ··· ∧ φk[x⃗, y⃗, z⃗] ⇒ θ[c⃗, x⃗]

By convention, the whole formula is universally quantified over all the free variables, which is the same
as saying that it is a scheme for the closed formulae obtained by substituting terms for variables. A
formula such as this, in which the sub-goals φi are also atomic predicates, is called a (positive or
definite) Horn clause of arity k. (Recall from Definition 1.4.1(a) that the atomic predicates φi[x⃗]
also have arity - the length of the sequence x⃗ - but this is independent of the arity of the clause, ie
the number of atomic formulae it contains.)
Suppose that we want to use this Horn clause to prove (solve the query) θ[a⃗, b⃗]. By (∀E)
we put a⃗ for x⃗, and by (⇒E) we have to prove φ[b⃗, d⃗, e⃗] and match
a⃗ = c⃗, substituting suitable terms d⃗ and e⃗ for y⃗ and z⃗.
Then for the query journey[Nice, Bristol, u] we expect not only a proof that one can go from Nice to
Bristol by rail, but also the route and cost. So when we assert ∃x.θ[x] we give a definite answer as to
what x is - these substitutions are the result of the computation.
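A sketch of this behaviour, with our own invented fare table and route search (the real PROLOG mechanism works by resolution and unification, of which this is only a caricature): the query succeeds not with a bare ``yes'' but with a substitution giving the route and cost as the witness.

```python
# A logic-programming query returns a substitution: here the witness for
# "exists u. journey(Nice, Bristol, u)" is the route itself, with its cost.

FARES = {("Nice", "Paris"): 90, ("Paris", "London"): 80, ("London", "Bristol"): 40}

def journey(origin, destination, visited=()):
    """Return (route, cost) witnessing the query, or None if it fails."""
    if origin == destination:
        return (origin,), 0
    for (a, b), fare in FARES.items():
        if a == origin and b not in visited:
            found = journey(b, destination, visited + (a,))
            if found is not None:       # otherwise backtrack to another edge
                route, cost = found
                return (origin,) + route, fare + cost
    return None                         # the query fails: no witness exists

route, cost = journey("Nice", "Bristol")
assert route == ("Nice", "Paris", "London", "Bristol") and cost == 210
```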
REMARK 1.7.6 John Robinson showed how to do this by resolution (1965). Gentzen's Hauptsatz cannot
eliminate cuts when axioms are used, and resolution deals with those that remain. It involves substitution
of terms for variables, but each resolution step only gives partial information about what has to be
substituted: the constraints which fully determine the value may come from quite different parts of the
proof (execution of the program).
Here φ[b⃗, y⃗0] is a new goal, to be satisfied by further resolution, as we have done with θ.
The partial proof on the right illustrates the similar way in which existential goals are handled.
The equations a⃗ = c⃗ are also new goals. If these terms are simply names for individuals
(London, York, etc ) and there are no axioms to say that individuals with different names can be equal
then we can see immediately whether or not the equations hold. If not, this attempt at resolution fails and
we backtrack to find another one. In practice this is done by database-searching techniques.
The programming language PROLOG does resolution and unification. Despite its name, it does not in fact
deal with the logical connectives and quantifiers, but what we shall come to call the algebraic fragment
(although this will not look like algebra until Section 4.6). The denotational semantics, based on the
work of Jacques Herbrand (1930), will be discussed in Sections 3.7 and 3.9.
Unification Goals involving function-symbols need another technique, called unification. How to do
unification is easy: the difficult part is to see what it means. The functions in question are those whose
values might be enumerated in a database, such as mother_of, not arithmetic.
REMARK 1.7.7 A goal of the form r(u⃗) = r(v⃗), where r is an operation-symbol for which
no laws are known, can only follow by substitution:

[proof tree omitted]

with the new goals u1 = v1, ..., uk = vk.
This does not mean that every function is injective. We want to carry on building the logical structure of
a proof, possibly without knowing what terms serve as the subjects of predicates. We postpone filling in
these terms, and then try to do so as non-specifically as possible, using only the building blocks we
already have in the term calculus of the object-language. The possibility that two terms might denote the
same thing is only considered if the terms themselves were formed in the same way.
(a)
if we have no information about r, then the only hope we have of proving that r(u⃗) = r(v⃗)
is by first showing ui = vi (1 ≤ i ≤ k). This step can be built into the proof-layout
we have given, by treating ui = vi as new goals and giving ``substitution'' as the reason.
(b)
a match between terms r(u1, u2) = r(v1, v2) having the same outermost
operation creates a new equation for each argument, u1 = v1 and u2 = v2;
despite this proliferation the algorithm does terminate because the ui and vi are all shorter;
(c)
a mismatch r(u⃗) = s(v⃗) between terms whose outermost operation-symbols differ is a clash,
and this attempt at unification fails;
(d)
if r does satisfy other axioms or laws, unification may be of no help; at any rate it may involve
backtracking, as for example with concatenation of lists, where a division must somehow be
chosen;
(e)
a goal of the form x0 = u, where x0 is an indeterminate and u a term in which x0 does not occur,
forms part of the solution of the unification problem, and completes the unfinished declaration
(put x0 = ?) in Remark 1.7.6;
(f)
an equation such as x0 = r(x0), in which x0 does occur on the right, cannot be satisfied by
substitution of a term for x0 (try it!); this necessitates the occurs-check;
(g)
the other axioms for a congruence (Definition 1.2.3) are also applicable: if u = v and v = w are
goals then so is u = w, and these may match, clash, form part of the solution or fail the occurs-
check;
(h)
the heuristic applies only to goals, not to hypotheses - it exploits u = v⇒ r(u) = r(v) without
asserting the converse.
Eventually, if neither type of failure (clash or occurrence) happens, the system of equations will be
saturated, ie none of these rules will expand it further. Then some of the indeterminates will be
expressed as terms, possibly involving the others.
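The saturation steps just described - matching same-headed terms argument by argument, recording bindings for indeterminates, the occurs-check, and failure on a clash - can be sketched as follows, under our own representation (tuples for applied operation-symbols, strings for indeterminates); it is a sketch, not the algorithm of Section 6.5.

```python
# Syntactic unification in a free theory, with occurs-check.
# A term is either a string (an indeterminate) or a tuple (op, *arguments).

def walk(t, subst):
    """Chase an indeterminate through the bindings found so far."""
    while isinstance(t, str) and t in subst:
        t = subst[t]
    return t

def occurs(x, t, subst):
    """Does the indeterminate x occur in the term t?"""
    t = walk(t, subst)
    if t == x:
        return True
    return isinstance(t, tuple) and any(occurs(x, a, subst) for a in t[1:])

def unify(u, v, subst=None):
    """Return a most general unifier as a dict, or None on clash/occurs-check."""
    subst = {} if subst is None else subst
    u, v = walk(u, subst), walk(v, subst)
    if u == v:
        return subst
    if isinstance(u, str):                      # bind an indeterminate ...
        return None if occurs(u, v, subst) else {**subst, u: v}
    if isinstance(v, str):                      # ... on whichever side it is
        return unify(v, u, subst)
    if u[0] != v[0] or len(u) != len(v):        # clash of operation-symbols
        return None
    for a, b in zip(u[1:], v[1:]):              # one new goal per argument pair
        subst = unify(a, b, subst)
        if subst is None:
            return None
    return subst

# r(x0, s(y0)) unifies with r(s(z0), x0); z0 remains an independent indeterminate.
mgu = unify(("r", "x0", ("s", "y0")), ("r", ("s", "z0"), "x0"))
assert mgu is not None and walk("x0", mgu) == ("s", "z0")
# x0 = r(x0) fails the occurs-check.
assert unify("x0", ("r", "x0")) is None
```

In the first example the solution leaves z0 free, illustrating Remark 1.7.8: any substitution for it yields another unifier, so the result computed is most general.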
REMARK 1.7.8 Some of the indeterminates may be independent, for example y0 is arbitrary in the
equation x0 = r(y0). The full solution to the unification problem is not unique, since we may put anything
we please for y0. However, the solution in which y0 is left as we have it is the most general unifier in
that
(a)
it is itself a solution,
(b)
any other solution may be obtained from it by substituting terms for the remaining indeterminates, and
any such substitution is a solution.
We can in fact eliminate the confusion of working backwards from goals, and reduce unification to a
kind of algebra. A theory with operation-symbols of various arities (numbers of arguments) but no laws
is called a free theory, and unification is the study of its free or term model. We shall take this up
formally in Chapter VI and return to unification in Section 6.5 , where we shall see that Remark 1.7.7(g)
can be simplified.
Unification in theories with laws is more difficult. It is possible to handle commutativity and
associativity, at the cost of uniqueness: there is now a family of maximally general unifiers. Unification
under the distributive law would give a uniform way to solve Diophantine equations, but Yuri
Matijasevič showed that this is undecidable (1970). Gérard Huet showed that unification in higher
order λ-calculi is also undecidable [Hue73].
Box-proof heuristics Now we turn to the logical symbols themselves. The following methods belong
in a course on the predicate calculus: it is probably better to teach resolution quite separately. Unless we
say otherwise, any boxes are drawn as large as possible, extending from the end of the database to the
first pending goal.
(a)
Any formula φ∧ψ as a goal or hypothesis may be replaced by φ and ψ as two formulae. Similarly
T may be ignored altogether.
(b)
To prove the goal ∀x.φ[x] ⇒ θ[x], we open an (∀I)-box with new variable x, hypothesis φ[x]
and conclusion θ[x]. Having now filled in the immediate proof-step which justifies the first
goal, albeit without any reason for θ[x] so far, we are excused from considering this goal again by
the annotation (⇒I) or (∀I) on line 99. The goals ¬φ and φ ⇔ θ are handled in a similar
way.
(c)
The behaviour of ∃x.φ[x] as a hypothesis (∃E ) mirrors that of ∀ as a goal, since
(∃x.φ [x])⇒ θ ≡ ∀x.(φ[x] ⇒ θ). Recall, however, that the (∃E ) -box is open-ended below
(Remark 1.6.5), so as long as the variable x does not occur elsewhere, we can simply add φ[x] to
the data without a box. It is to our advantage to do this as soon as possible, because there may be
many things satisfying φ, and it could be relevant later that the same one plays two or more
different roles in the argument, although to say that we ``choose'' a witness does not mean that an
actual individual is selected (Remark 1.6.7). The original axiom ∃x.φ[x] will not be needed
again.
Subject to scoping of variables, these boxes may be nested in any order and so may be taken together in
a single step.
(d)
A goal ∃x.θ[x] can only be deduced from θ[x0], using (∃I), where x0 is a term to be found by
resolution (Remark 1.7.6).
(e)
If φ∨ψ is in the database then an (∨E )-box is opened below it. Each half of the box now has its
own copy of the database (with φ or ψ respectively replacing φ∨ψ) and goals. As this step may
lead to duplication of the proof, we prefer to do it as late as possible.
(f)
We use resolution (Remark 1.7.6) to prove an atomic goal θ[\vec{a}] using
∀\vec{x},\vec{y}. φ[\vec{x},\vec{y}] ⇒ θ[\vec{x}] from the database (∀⇒E).
Notice that (∀I) and (∃E) mirror each other, but (∃I) and (∀E) do not. This is because a goal
requires just one proof, whereas a hypothesis may be employed any number of times, or not at all.
(Linear logic analyses the reuse of hypotheses, but we shall not consider it in this book.)
(g)
If ⊥ is in the database, all goals are immediately satisfied (⊥E). More generally, if
∀\vec{x}.¬φ[\vec{x}] is in the database then we may replace it by put \vec{x}_0 = ? and all
outstanding goals by φ[\vec{x}_0], using ⊥ as a ``joker'' (Remark 1.4.4).
(h)
If θ0∨θ1 is a goal then we seek first a proof of θ0, and then (if that fails) a proof of θ1.
REMARK 1.7.10 During resolution, we used declarations (put x0 = ?) to introduce indeterminates. This
was done to allow us to continue building the logical structure of the proof without specifying certain of
its details. When we have obtained a valid proof, complete apart from the occurrences of an
indeterminate x0, we have to find a term which can be substituted everywhere for it. This term must
satisfy any equations in which x0 occurs, irrespective of how they are nested within the proof box, so the
unification problem cuts across the scoping structure of the proof. Nevertheless, the term must still be
well formed at the point of the declaration: the variables belonging to nested (∀I)- and (∃E)-boxes
must not be free in it.
REMARK 1.7.11 We have gone beyond Fact 1.7.1 by discussing axioms of the form ∃x.φ[x], φ∨ψ and
¬φ. These are not definite in the sense of Example 1.7.5, because when ∃x.φ[x] is used the program
cannot provide an answer for x, and x may remain free in any other answers it gives. Similarly, which of
φ or ψ holds in φ∨ψ is indeterminate. If the joker (⊥E) is used to prove ∃y.θ[y], again we have no idea
what y is.
∀α. ¬¬α ⇒ α   ⊣⊢   ∀β. β ∨ ¬β
However, an individual formula may be ¬¬-closed (satisfy the rule on the left) without being decidable or complemented (as on the
right).
In saying that we shall not use excluded middle, beware that we are not affirming its negation, ¬(φ∨¬φ) ≡ ¬φ∧¬¬φ, which is
falsity. If we are able to prove neither φ nor ¬φ then we remain silent about them.
Of course there are instances of case analysis even in intuitionism, in particular the properties of finite sets. A recurrent example will be parsing
of terms in free algebras, for example a list either is empty or has a head (first element) and tail (Section 2.7).
EXAMPLE 1.8.2 Let M be an invertible matrix. To show that Mx = u has only one solution, it is quite unnecessary to assume that a ≠ b are
different solutions, as the proof naturally leads to a = b without the aid of the hypothesis. It is futile to obtain a contradiction to the hypothesis, ie
¬¬(a = b), and then deduce a = b after all.
See Remarks 2.5.7 and 3.7.12 for how this arises in induction.
THEOREM 1.8.3 Negation is a duality in classical logic, interchanging T with ⊥, ∧ with ∨ and ∀ with ∃. That is,
¬T ≡ ⊥   ¬(φ∧ψ) ≡ ¬φ∨¬ψ   ¬∀x.φ[x] ≡ ∃x.¬φ[x]
and dually (the de Morgan laws).
REMARK 1.8.4 In classical propositional calculus with n propositional variables (and no quantifiers), we may enumerate the 2^n cases where each
of them is true or false. Such a listing of the values of a formula is called a truth table. Each line of the table which has the value T may be read
as a conjunction of possibly negated atomic propositions, and the whole table as the disjunction of these lines; this is the disjunctive normal
form and is classically equivalent to the given formula. So if two formulae have the same truth table they are classically inter-provable. In other
words, this calculus is complete: anything which is true in all models (ie all 2^n cases) is provable.
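The enumeration of the 2^n cases and the reading-off of the disjunctive normal form can be sketched mechanically. The following Python fragment is our own illustration (the representation of formulae as boolean functions and all names are assumptions, not the book's notation):

```python
from itertools import product

def truth_table(formula, variables):
    # Enumerate each of the 2^n assignments of T/F to the variables,
    # pairing it with the value the formula takes there.
    return [(dict(zip(variables, values)),
             formula(**dict(zip(variables, values))))
            for values in product([True, False], repeat=len(variables))]

def dnf(formula, variables):
    # One conjunct (of possibly negated atoms) per line of the table with
    # value T; the disjunction of these conjuncts is classically
    # equivalent to the given formula.
    lines = [case for case, value in truth_table(formula, variables) if value]
    return [[(v if case[v] else "not " + v) for v in variables]
            for case in lines]
```

For example, `dnf(lambda p, q: (not p) or q, ["p", "q"])` lists the three lines of the table for φ ⇒ ψ that carry the value T; two formulae with identical tables yield identical normal forms, illustrating why they are classically inter-provable.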
[Venn diagram: two overlapping circles φ and ψ, with φ∧ψ as their intersection]
Venn diagrams, in which overlapping circles represent propositional variables, are a popular way of illustrating classical propositional logic,
although they are misleading when one of the regions turns out to be empty. They were invented in 1764 by Johann Lambert, better known for
his work on light intensity; he also proved the irrationality of π and introduced the hyperbolic functions sinh, cosh, etc .
The truth table approach to logic is pedagogically not so simple as is claimed, as the material implication (⊥⇒ φ) = T is an obstacle right at the
start, which has confused every generation of students since ancient times. It took deeper insight into logic to discover the analogy between (⇒
ℑ ) and defining functions or sub-routines (Section 2.4), but once pointed out it is completely natural, and the (⇒ ℑ )-rule is easily grasped. This
analogy is a pearl of modern logic, and I believe students should be allowed to glimpse it, rather than have the prejudices of classical logic
reinforced. The material implication does feature in proof theory as the (⊥E )-rule, but we have already called this the joker of logic
(Remark 1.4.4).
The Sheffer stroke Whereas in art and in other parts of mathematics symmetry is considered beautiful, many logic texts use the de Morgan
duality to eradicate half of the calculus in a quite arbitrary way. Instead of presenting ∀ and ∧ with their own natural properties, they are treated
as mere abbreviations for ¬∃¬ and ¬(¬α∨¬β), or vice versa. In our intuition ∧ and ∨ are twins, as are ∀ and ∃, so why should
one of them be treated as a second-class citizen in the logical world?
Gottfried Ploucquet (1764) discovered that a single binary connective actually suffices; the operation α nand β ≡ ¬(α∧β) is commonly
known as the Sheffer stroke. Using ∀x.¬(φ[x]∧ψ[x]), Moses Schönfinkel was able to dispose of variables and quantifiers, reducing the whole
logical calculus to just one symbol (plus brackets). His paper survives as an important part of the literature because his combinators have types
which correspond to the structural rules (Example 2.4.2 and Exercise 2.26ff).
These operations ``simplify'' logic in the same way that it would simplify chemistry if (after the discovery of oxygen, carbon, hydrogen, etc ) sea
water had been chosen as the one primitive substance (``element''), on the grounds that the usual 93 can all be obtained from it.
Although this nihilist tendency contributed to the failure of mainstream logic to observe the propositions as types analogy (which is very clear in
intuitionism), or to recognise the quantifiers as adjoints, the Sheffer stroke is the building block of digital electronics, where it is called nand.
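The reduction of the whole propositional calculus to nand can be checked directly. This Python sketch (ours, not the book's) derives the usual connectives from the single primitive:

```python
def nand(a, b):
    # The Sheffer stroke: α nand β ≡ ¬(α∧β).
    return not (a and b)

def NOT(a):                     # ¬α ≡ α nand α
    return nand(a, a)

def AND(a, b):                  # α∧β ≡ ¬¬(α∧β)
    return nand(nand(a, b), nand(a, b))

def OR(a, b):                   # α∨β ≡ ¬(¬α∧¬β), by de Morgan
    return nand(nand(a, a), nand(b, b))

def IMPLIES(a, b):              # α ⇒ β ≡ ¬(α∧¬β)
    return nand(a, nand(b, b))
```

Exhaustively checking all truth-value assignments confirms that each derived connective agrees with the intended one, which is the sense in which a single symbol "suffices."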
REMARK 1.8.5 When a small current is passed between the base and the emitter of a transistor, a larger current flows from the collector to the
emitter. This effect is used in analogue circuits for amplification, but it performs an essentially negating digital operation: a high voltage on the
base (relative to the emitter) causes a low voltage on the collector and vice versa .
The example shows nor, ¬(φ∨ψ), which may be used in a similar way to nand, so positive operators are made by concatenating such circuits.
Intuitionism Excluded middle seems obvious, until you think about the reason for believing it, and see that this begs the question. It requires
mathematicians to be omniscient, a power no other scientist would claim. Those who feel obliged to justify this insist on classifying structures up
to isomorphism before using them to solve real problems.
William of Ockham ( c. 1320), in whose name Henry Sheffer wielded his stroke, considered propositions about the future with an
``indeterminate'' value which even God does not yet know to be either true or false, in connection with the problem of the Free Will of potential
sinners. Even Aristotle had his doubts about excluded middle, which is more properly attributed to the Stoics, according to Jan Lukasiewicz.
REMARK 1.8.6 The modern critique of classical omniscience in analysis was formulated by Jan Brouwer (1907). Given a sequence whose
ultimate behaviour is an unsolved problem (for which he took questions about patterns of digits in the decimal expansion of π), he constructed
counterexamples to several of the major assumptions and theorems of analysis. Suppose for example that all known terms of (a_n) are zero.
Although the real number u = ∑_n a_n 2^{-n} is well defined, it is not known whether u = 0 or u > 0, so we cannot find even the first digit of 1−u in
the usual decimal expansion. He showed that continuous functions need not be Lebesgue-integrable, or have local maxima on [0,1].
With hindsight, it is unfortunate that Brouwer chose analysis to attack. As we shall demonstrate throughout this book, logic and algebra can be
presented intuitionistically, with no noticeable inconvenience, whereas most of the usual properties of the real numbers rely on excluded middle.
For this reason we will not attempt to cover constructive analysis, but Errett Bishop [BB85] gave an account which is very much in our spirit: it
makes such alterations to the definitions as are required, and gets on with proving the traditional theorems as well as can be done, discarding
naive formulations rather than dwelling on their counterexamples.
Every reform provokes reaction. Although Hilbert's influence is to be found behind almost every pre-war revolution in logic, he also made such
comments as ``no one shall drive us from the paradise that Cantor created for us'' and his Platonist battle-cry, ``Wir müssen wissen, wir werden
wissen'' (we must know, we shall know). Later he claimed that ``to prohibit existence statements and the principle of excluded middle is
tantamount to relinquishing the science of mathematics altogether.''
Rather more of mathematics than Hilbert's ``wretched remnants'' has now been developed intuitionistically. Often, however, we shall find it
convenient to assume excluded middle in order to give some simple introductory examples of concepts, particularly when the most familiar form
occurs in the context of R. There are also habits of language (``N has no proper subalgebra'') which one is reluctant to give up: they are to be
understood as idioms. The reader for whom constructivity is essential will be able to recognise the cases where these conventions apply.
Andrei Kolmogorov (1925) devised a translation of classical proofs into intuitionistic ones, which is often attributed to Kurt Gödel.
If Γ \vdash φ is provable classically, then Γ^{¬¬} \vdash φ^{¬¬} has an intuitionistic proof, ie without using excluded middle. In particular,
intuitionistic logic does not save us from any inconsistency (the ability to prove ⊥) which might arise classically. []
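The translation itself is a simple recursion: prefix ¬¬ to every subformula. As a sketch, with an assumed tuple syntax for formulae that is our own and not the book's notation:

```python
# Kolmogorov's double-negation translation: ¬¬ is prefixed to every
# subformula, so a classical proof becomes an intuitionistic one.
# Formulae are strings (atoms) or tuples (connective, subformula, ...).

def kolmogorov(phi):
    if isinstance(phi, str):                 # atomic proposition
        return ("not", ("not", phi))
    op, *args = phi
    return ("not", ("not", (op, *[kolmogorov(a) for a in args])))
```

Applied to excluded middle ("or", "p", ("not", "p")), the result is a doubly negated disjunction, and ¬¬(φ∨¬φ) is an intuitionistic theorem, illustrating how the translated sequent becomes provable without excluded middle.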
The axiom of choice The increasingly abstract form of late nineteenth century mathematics led to the use of infinite families of choices, often
with no conscious understanding that a new logical principle was involved. Giuseppe Peano did formulate, and reject, such an axiom in 1890,
and Charles Sanders Peirce gave the following definition in 1893, but it was Ernst Zermelo who first recognised how widely it had already been
used. His 1904 proof of the well-ordering principle (Proposition 6.7.13 and Exercise 6.53) attracted vehement opposition - at least, judged by the
contemporary standards of courtesy, so much higher than today. It was in order to formulate his response that he found his famous axioms of set
theory, which we shall discuss in Remark 2.2.9.
any entire relation R: X \leftharpoondown\rightharpoonup Y (Definition 1.3.1(c)) contains a total functional relation f: X → Y, ie
∀x. x R f(x).
Exercises 1.38 and 2.15 were how Burali-Forti and Zermelo formulated it; the former is more convenient for category theory, and Radu
Diaconescu showed that it implies excluded middle (1975, Exercise 2.16). Russell and Whitehead used a ``multiplicative axiom,'' but this is only
meaningful in the context of a much stronger principle (Proposition 9.6.13). Well-ordering was later supplanted in algebra by maximality
properties such as Zorn's Lemma (Exercise 3.16), actually due to Kazimierz Kuratowski.
In the first use of Zermelo's axioms, Georg Hamel showed that R has a basis as a vector space over Q. Tychonov's theorem (that a product of
compact spaces is compact) and many other famous results have also been shown to be equivalent to it.
Not all of the consequences of Choice are benign, for instance it allows us to define non-measurable sets, with the bizarre corollary (due to Felix
Hausdorff) that a sphere can be decomposed into two or more spheres, each congruent to the first. The moral of this is that if we allow the Angels
to employ brute force, then the Devil will make use of it too. Even Zermelo and many of the enthusiasts for Choice considered it appropriate to
indicate when results depend on it.
When Hilbert gave his basis theorem using Choice, Paul Gordon (Emmy Noether's thesis adviser) said of it, ``Das ist nicht Mathematik, das ist
Theologie,'' having worked on the subject for twenty years using what we would now call constructive mathematics. Although we, on the cusp of
the millennium, now reject Choice, it was the way forward at the start of the twentieth century: it stimulated research throughout mathematics,
notably in the Polish school, which we have to thank for numerous ideas in logic and general topology mentioned in this book [Moo82, McC67].
Zermelo conjectured that Choice was independent of his other axioms, and Abraham Fraenkel devised models with permutable ur-elements in
order to prove this. Kurt Gödel (1938) showed how to cut down a model of Zermelo's axioms to the ``constructible sets,'' for which Choice is
provable. However, it was 1963 when Paul Cohen found a model of Zermelo's other axioms in which Choice fails.
The axiom of choice is typically not needed in the concrete cases, because their own structure provides some way of making the selections (we
shall indicate real uses of Choice by the capital letter). Often it is used to extend a property of some familiar structures to the generality of an
abstract axiomatisation, but even then the need for Choice may be more a feature of that particular formulation than of the actual mathematical
structures. For example, Peter Johnstone [Joh82] showed that, by a conceptual change from points to open sets, Tychonov's Theorem could be
proved without Choice. It is for infinitary algebra in Sections 5.6 and 6.2 that this axiom will be most missed in this book. We respond to this
difficulty by examining what infinitary operations are of interest in practice.
In the countable case the following assumption, formulated by Paul Bernays, is often more directly applicable.
any entire relation R: X \leftharpoondown\rightharpoonup X with an element x0 ∈ X contains an (ω-)sequence, ie a function x(-): N →
X, such that ∀n. x_n R x_{n+1}.
If R is a function then this is primitive recursion over N (Remark 2.7.7). Similarly, we get Dependent Choice by repeatedly applying the choice
function to the seed. König's Lemma (Corollary 2.5.10) is a widely used form of Dependent Choice throughout informatics and combinatorics.
Dependent Choice does not imply excluded middle or vice versa .
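When the relation is given by a function, no choice principle is needed: the sequence is built by primitive recursion over N, repeatedly applying the step to the seed. A minimal Python sketch (the names are ours):

```python
def sequence(step, seed, n):
    # Build x_0, x_1, ..., x_{n-1} with x_0 = seed and x_{k+1} = step(x_k),
    # so x_k R x_{k+1} holds at every stage when R is the graph of step.
    xs = [seed]
    for _ in range(n - 1):
        xs.append(step(xs[-1]))
    return xs
```

Dependent Choice is needed precisely when R offers several successors at each stage and no function singles one out.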
Logic in a topos For Jan Brouwer and his student Arend Heyting, Intuitionism was a profound philosophy of mathematics [Hey56, Dum77,
Man98], but like increasingly many logicians, we shall use the word intuitionistic simply to mean that we do not use excluded middle. This
abstinence is nowadays very important in category theory, not because of any philosophical conviction on the part of categorists (indeed most of
them still use excluded middle as readily as the Real Mathematician does), but because it is the internal logic of the kind of world (a topos)
which most naturally axiomatises the familiar mathematical universe.
Joachim Lambek and Philip Scott [LS86] show that the so-called ``term model'' of the language of mathematics, also known as the ``free topos,''
may be viewed as a preferred world, but Gödel's incompleteness theorem shows that the term model of classical mathematics won't do.
Category theory provides the technology for creating new worlds. This is quite simple so long as we do not require them to be classical. Why
should we want such worlds, though? One application is to provide the generic objects ( cf those used in proof boxes) in such a way that we may
reason with them in the ordinary way in their own worlds, and then instantiate them. Arguments about ``generic points'' have been used in
geometry since Giuseppe Veronese (1891), but the logic is unsound if, for example, the equality of two such points is required to be decidable.
Worlds have also been created with convenient but exotic properties. In synthetic differential geometry [Koc81] all functions on R are
continuous, as Jan Brouwer said, and it is legitimate to use infinitesimals in the differential calculus. More recently, synthetic domain theory has
similarly postulated that all functions are to be computable.
Excluded middle was traditionally regarded as a true fact about the real world, so in order to investigate intuitionistic logic it was necessary to
build fictional worlds where excluded middle does not hold. The point of view of this book is that these worlds are not exotic but quite normal,
and their logic is perfectly typical, just as algebraic extensions of the rationals have come to be seen as ordinary number domains with
straightforward arithmetic structure.
It is not necessary to know in advance how to construct such worlds from classical ones before learning how to reason in them. Indeed excluded
middle is like the fear of water: it's easier to learn to swim as a small child, before anyone's told you that it's difficult.
Whatever your philosophical standpoint may be, intuitionism forces you to write mathematics much more cleanly, and to understand much more
deeply how it works. Proof by refutation runs backwards, and so the argument gets tangled. The constructive character of intuitionism is really
due to the strong analogy with type theory, which does not extend to excluded middle. We describe it in Section 2.4, making use of another idea
due to Kolmogorov.
1.9 Exercises I
1. When Bo Peep got too many sheep to see where each one was throughout the day, she found a
stick or a pebble for each individual sheep and moved them from a pile outside the pen to another
inside, or vice versa , as the corresponding sheep went in or out. Then one evening there was a
storm, and the sheep came home too quickly for her to find the proper objects, so for each sheep
coming in she just moved any one object. She moved all of the objects, but she was still worried
about the wolf. By the next morning she had satisfied herself that the less careful method of
reckoning was sufficient. Explain her reasoning without the aid of numbers.
2. For each of the connectives and quantifiers, give the phrases in English and any other language
you know which usually express them. Point out any ambiguities, and how they are resolved in
everyday usage.
Now give in logical notation the literal and intended meanings of the following, choosing
appropriate abbreviations for the atomic predicates:
All farmers don't have cows. The library has some books by Russell and
Whitehead. All passes must be shown at the gate. Dogs must be carried on the
escalator. You hit me and I'll hit you.
3. The following equations are familiar in elementary algebra. Which of ∃x, ∀x and {x|·} ·· are
understood?
(x+y)2 = x2+2x y+y2 ax2+bx+c = 0 x2+y2 =
1
4. Show that reductions in the Lineland Army (Example 1.2.7) are locally confluent. Equivalently,
show that (;) is associative.
5. A Turing machine [ Tur35] consists of a head, which may be in any of a finite number of states,
and a tape which extends infinitely in both directions and is divided into cells (indexed by Z),
each of which contains one symbol from a finite alphabet. All but finitely many cells contain the
blank symbol. For each state in which the head may be, and for each symbol which may be
written in the cell currently being read, there is specified a new state, a new symbol and a
direction of motion (left or right). Show how to express a Turing machine as a rewrite system.
[Hint: the root has the state and the current symbol as two of its arguments; the other two are the
left and right parts of the tape, which must be expressed using the blank and one binary
operation.] Since computation can only proceed in one place, the head has been called the von
Neumann bottleneck.
6. Express confluence (Definition 1.2.5) as a formula involving (∀, ∧, ⇒, \leadsto and) ∃. Using
proof boxes - in particular the (∃E)-rule - show that R satisfies the property iff R^{op};R ⊂ R;R^{op}.
∃x.∀y. φ[y]⇔ x = y
∃x.φ[x] ∧ ∀x.∀y. φ[x]∧φ[y]⇒ x = y
are inter-provable. Use them to derive idiomatic proof-box rules for ∃!.
9. Suppose that ``x is a widget'' and ``x is a gadget'' are descriptions. Show that: the widget is the
gadget iff the gadget is the widget.
(a)
∀x, y, z. x = y ∨ y = z ∨ z = x;
(b)
¬¬∀x, y, z. x = y ∨ y = z ∨ z = x;
(c)
∀x, y, z. (¬¬ x = y) ∨ (¬¬ y = z) ∨ (¬¬ z = x);
(d)
∀x, y, z. ¬¬(x = y ∨ y = z ∨ z = x).
Show that (a) is the strongest and (d) the weakest, and that any other formula obtained by
inserting ¬¬ (other than between ∀x, ∀y and ∀z) is equivalent to one of these. Restate the
11. Devise formulae similar to those of the previous exercise to say that φ has exactly three, four, ...,
solutions. By adjoining a condition that φ and ψ have no common solutions (Example 2.1.7),
interpret the equations 1+1 = 2, 1+2 = 3 and 2+2 = 4 and prove them using the box method. (1+1
= 2 is proved in this sense on page 360 of Principia Mathematica.)
12. What, if anything, do the negations of x ∈ FV(t) (Definition 1.1.3), x:X (Notation 1.3.3) and x ∈
U (Definition 2.2.3) mean?
13. Show that f(x) = f(y) defines an equivalence relation on X, where f:X→ Y is any function.
14. Show that a relation R: X \leftharpoondown\rightharpoonup Y is total and functional iff the
composite R \hookrightarrow X×Y → X (the projection π0) is bijective. So functions X → Y correspond to sections i
of π0, ie such that i;π0 = id (Definition 1.3.12).
15. List the sixteen cases where the functional, total (entire), injective and surjective conditions do
and do not hold for a binary relation. For each case give an example and, where possible, a name
and notation; three less familiar ones represent overlap, subquotient and its converse.
16. Show that a relation R: X \leftharpoondown\rightharpoonup Y is functional iff R^{op};R ⊂ id, total iff
also id ⊂ R;R^{op}, injective iff R;R^{op} ⊂ id and surjective iff id ⊂ R^{op};R. Hence prove
Lemma 1.3.11. Show also that a function f is injective iff it is a monomorphism: g1;f = g2;f ⇒ g1 = g2,
and surjective iff it is an epimorphism: f;g1 = f;g2 ⇒ g1 = g2;
cf Proposition 5.2.2(d).
17. Describe the sixteen cases for a binary (endo)relation where reflexivity, symmetry, transitivity
and functionality do and do not hold, noting those for which idempotence necessarily holds.
18. Let R: n \leftharpoondown\rightharpoonup m be a decidable relation between two finite sets.
Write [R] for the (n×m) matrix with 1 in the (i,j)-position if iRj and 0 otherwise. Compare the
matrix product [R]·[S] with the relational composition [R;S].
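To experiment with this comparison, one might sketch it in Python (an illustration with an ad hoc representation of relations as sets of pairs, not a full solution):

```python
def compose(R, S):
    # Relational composition R;S as a set of pairs.
    return {(i, k) for (i, j1) in R for (j2, k) in S if j1 == j2}

def matrix(R, n, m):
    # The 0/1 matrix of a relation between finite sets n and m.
    return [[1 if (i, j) in R else 0 for j in range(m)] for i in range(n)]

def mat_mult(A, B):
    # Ordinary matrix product: the (i,k) entry counts the witnesses j
    # with iRj and jSk, so it is nonzero exactly where i (R;S) k.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]
```

The matrix product records how many middle elements witness each composite pair, whereas the relation only records whether there is at least one.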
19. Show that the sequent Γ\vdash θ is provable in the sequent calculus iff θ is provable from
hypotheses Γ in the box style.
20. Describe the introduction and elimination rules for \lnot and ⇔ by adapting those for ⇒ and ∧.
In each case prove that the derived rules are equivalent, and describe the verbal mathematical
(a)
∨, ∧ and ⇔ are commutative and idempotent;
(b)
∨ and ∧ are associative;
(c)
the boxes for (∀ℑ ), (⇒ ℑ ) and (∃E ) may be interchanged.
We tend to abuse ⇔ transitively, so that φ ⇔ ψ⇔ χ means (φ⇔ ψ)∧(ψ⇔ χ), but why is this an
abuse?
22. Prove the following by the box method. Make it clear where any formulae are imported into
boxes (Lemma 1.6.3). [displayed formulae omitted] where x ∉ FV(ψ).
23. In linear implicational logic the only structural rules are identity, cut and exchange (not
weakening and contraction). Contexts are then bags (unordered lists), not sets. There is only one
connective (⇒, but it is usually written \multimap), obeying the sequent form of (⇒I) and
[proof rules omitted]
Using ideas of double-entry bookkeeping, develop a box style of proof (like that in Section 1.5)
which is sound for this logic. [Hint: the reasons or credit column must cite the two formulae
which are used in each (\multimap E), and there must also be a debit column which records
where the present formula is used.] Replace the cross-references by arrows ( proof nets) and
\multimap by a ternary node with one outgoing and two incoming arrows. Show that the boxes
are then redundant, and investigate the dynamical behaviour when (\multimap E) and (\multimap
ℑ) meet.
24. (Only for those who already know linear logic.) Extend the proof box method to the connectives
⊗, ⊕ and &, giving the translations into and from the sequent, λ-calculus and proof net
approaches.
26. Write the direct and indirect rules of propositional logic in the style of Hilbert (Remark 1.6.10).
Using steps of the form ∀[(x)\vec].∧Γ⇒ φ , extend this to the quantifiers (this wasn't how Hilbert
did it).
27. Formulate and prove the soundness of the rules of natural deduction with respect to a truth values
semantics.
28. An assertion is immediate if it is the conclusion of an applicable rule, ie it may be deduced with
no mediating argument. Give examples of mathematical arguments which are obvious but not
immediate and vice versa (page 1.7), and also trivial topics (page def trivial) .
29. Using Remark 1.6.2, translate the proof of Lemma 1.6.6 (that composition preserves functionality
and totality) into the vernacular.
30. Using the (ε-δ) definition of continuity for functions R → R, show as a proof box that the composite g∘f of two
such functions is continuous.
31. Why may we assume in Remark 1.7.9(c) that a hypothesis of the form ∃x. φ[x] is used exactly
once in a proof, ie with just one witness?
32. Show how to extend Fact 1.7.1 to φ∨ψ and ∃x.φ[x] in the database, though the proof is no longer
uniform. This requires us to use φ∨ψ as soon as possible, contrary to Remark 1.7.9(e); can our
Remark be justified?
33. Devise heuristics which use axioms of the forms ξ ⇒ (φ∨ψ) and ξ⇒ ∃x.φ[x ]. Why must the new
goal ξ be put at line 50 and new data (φ∨ψ or ∃x.φ[x]) at line 51, instead of their usual places at
the bottom and top of the box? Give examples of theorems (in analysis, for example) whose
statements are of this form. What are the idioms for dividing up arguments in which such
theorems are used?
34. Show that φ is decidable iff φ∨¬φ is ¬¬-closed, and then φ itself is also ¬¬-closed.
Show that de Morgan's law gives (¬φ)∨(¬¬φ), so under this assumption every
¬¬-closed formula is decidable.
35. Write out the truth tables for the two sides of de Morgan's laws and the distributive laws. Show
that if two propositional formulae have the same truth table then they are classically inter-
provable.
36. Express each of \lnot , ∨, ∧ and ⇒ in terms of either nand (the Sheffer stroke) or nor.
37. Write down the formulae for the sum and carry bits in the binary addition of two single-bit
numbers (0+0 = 00, 0+1 = 01, 1+0 = 01 and 1+1 = 10). By expressing them in terms of nor, give
a circuit for a half adder. Show how to add two n-bit binary numbers; why is the half adder so
called?
38. Show that every surjective function p: X \twoheadrightarrow Y has a section i: Y \hookrightarrow X
such that i;p = id_Y iff the axiom of choice (Definition 1.8.8) holds. [Hint: cf Exercise 1.14.]
39. Let ω be any proposition whose truth-value you know but which others may dispute, such as
``July is in the winter.'' Suppose that Ω, the type of truth values, consists of true, false, ω
and \lnot ω (with apologies to tropical readers). This supposition is consistent with most of pure
mathematics. Explain how it is still the case that Ω is the two-element set {T,⊥}. What element
of this set is ω? Where would you have to be to observe Ω as a four-element lattice?
INTRODUCTION
● Induction
● Minimal counterexamples
● Descending chains
● Proof trees
● Termination
● Complexity measures
● Unions
● Products
● Impredicativity
EXERCISES II
Chapter 2
Types and Induction
Every mathematician's toolbox contains tuples and subsets for making ideal elements, and proofs by
induction. In this chapter we bring together the traditional techniques which form the received view of
the foundations of twentieth century mathematics. Afterwards they will be dismantled and reconsidered
in the light of later algebraic experience.
At the beginning of his career, Georg Cantor investigated sets of points of discontinuity which functions
could have whilst still admitting Fourier representations. He also gave a construction of the real numbers
from the rationals, and showed that there are a lot more reals than rationals (Hermann Weyl later
reproached analysts for decomposing the continuum into single points). Cantor was led to considering
abstract sets, forming hierarchies under constructions such as the set of all subsets.
There are historical parallels between mathematics and programming in the development of types.
Cantor was concerned with the magnitudes of sets, whereas FORTRAN distinguished between integer and
real data types because they have different storage requirements. (Linear logic shows that resource
analysis continues to be a fruitful idea.) Both started from the integers and real numbers alone. Bertrand
Russell formulated his theory of types as a way of avoiding the vicious circles which he saw as the root
of the paradoxes of set theory. On the other hand, the one lesson which the software industry has learned
from informatics is that the type discipline catches a very large proportion of errors and thereby makes
programs more reliable. Early calculi provided a static universe in advance, but modern type theories
and programming languages create new types dynamically from old ones.
What is an abstract set? Some accounts of set theory claim that it is a voluntary conspiracy of its
elements, coming together arbitrarily from independent sources (the inductive conception). But this
conflicts with mathematical practice, and has little backing even in philosophical tradition. Plato held
that members of a class are images of a Form; in practice, we conceive of the Form and certain of its
instances first. The totality is only a semantic afterthought (and the instances are usually not themselves
Forms). Indeed, from Zeno's time, points in geometry lay on lines but did not constitute them.
Gottlob Frege defined sets by comprehension of predicates, which at first he allowed to take anything as
their subjects. Russell's famous {x | x ∉ x} showed that things couldn't be done quite so naively, so
instead we select the elements from an already given ambient set.
For us, types are not imposed afterwards to constrain the size of the world, but are a precondition of
meaning. In elementary trigonometry sin is thought of as applying to angles only, which are only
reduced to real numbers by choosing a unit of measurement. Physical quantities may only be added or
tested for equality if they measure the same thing (length, mass, energy, electric charge, etc.) in the same
units; sometimes laws of mechanics can be guessed by this dimensional analysis alone, or from a scale
model which preserves the dimensionless part (such as the Reynolds number in fluid mechanics).
More complex types are formed by processes, such as the powerset, like those generating terms and
logical formulae. The establishment of certain standard abstract methods of construction made it
possible to state and prove results of a generality that would not have been considered in the nineteenth
century. As Michael Barr [BW85, p. 88] put it, ``The idea of constructing a quotient space without
having to have the ambient space including it, for example, was made possible by the introduction of set
theory, in particular by the advent of the rather dubious idea that a set can be an element of another set.
There is probably nothing in the introduction of topos theory as foundations more radical than that.''
On the other hand, the importance of the quotient operation is such that it should perhaps be taken as
primitive instead. Modern type theory builds hierarchies as Cantor and Zermelo did, but using simpler
ways of forming types, such as the product, sum and set of functions. These correspond very directly to
conjunction, disjunction and implication of propositions, an analogy which will be an important guiding
principle for the rest of the book. We shall find that the structure of the types is characterised, not by
their set-theoretic incarnations, but by certain operations, such as projection and evaluation maps, which
build terms of that type and take them apart. In particular the λ-calculus handles the terms arising from
the function-type.
The second half of the chapter is devoted to induction and recursion. Sections 2.5-2.6 discuss well
founded relations, a notion of induction which also comes from the set-theoretic tradition, but we shall
motivate it instead from the problem of proving correctness and termination of a wide class of recursive
programs. There are classical idioms of induction based on minimal counterexamples and descending
sequences, but we shall show how they can often be made intuitionistic. More complicated inductive
arguments can be justified by constructing recursion measures using lexicographic products and other
methods.
For programming (and foundations), structural recursion over lists, trees and languages is more
important. In Section 2.7 we treat lists and Peano induction over the natural numbers in a similar fashion
to the function-type. The last section treats second and higher order logic.
The rationals (Q) may also be represented in the familiar way as pairs of integers (Z), although now
there are many pairs representing each rational (Example 1.2.1), and the positive and negative integers
may be obtained from the natural numbers (N) in a similar way. This leaves the construction of the reals
(R) from the rationals.
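The representation of rationals by pairs of integers, with many pairs standing for each rational, can be sketched computationally. The following Python fragment is illustrative only; the helper names `equivalent` and `normalise` are ours.

```python
from math import gcd

# A rational is represented by a pair (p, q) of integers with q != 0.
# Many pairs represent the same rational: (1, 2), (2, 4), (-3, -6), ...

def equivalent(a, b):
    """(p, q) ~ (r, s) iff p*s == r*q, without choosing canonical forms."""
    (p, q), (r, s) = a, b
    return p * s == r * q

def normalise(a):
    """Pick the canonical representative: lowest terms, positive denominator."""
    p, q = a
    g = gcd(p, q)
    if q < 0:
        g = -g
    return (p // g, q // g)

assert equivalent((1, 2), (3, 6))
assert normalise((-4, -8)) == (1, 2)
```

Note that `equivalent` works directly with the equivalence relation, never needing the quotient set itself, whereas `normalise` chooses a representative of each class.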
The real numbers The course of the foundations of mathematics in the twentieth century was set on
24 November 1858, when Richard Dedekind first had to teach the elements of the differential calculus,
and felt more keenly than before the lack of a really scientific foundation for analysis. In discussing the
approach of a variable magnitude to a fixed limiting value, he had to resort to geometric evidences.
Observing how a point divides a line into two parts, he was led to what he saw as the essence of
continuity:
REMARK 2.1.1 If all points of the straight line fall into two classes such that every point of the first class
lies to the left of every point of the second class, then there exists one and only one point which
produces this severing of the straight line into two portions.
In [Ded72] he used these Dedekind cuts of the set of rational numbers to define real numbers, and went
on to develop their arithmetic and analysis. By way of an example, he proved ``for the first time'' that
√2·√3 = √6. There is one slight difficulty, in that each rational number gives rise to two cuts, depending
on whether it is itself assigned to the lower or upper part - we shall say neither.
A real number is then a pair of subsets of Q, and, from the universe of all pairs of subsets (L,U ⊂ Q), the
collection R_D of (Dedekind) reals is the subset consisting of those satisfying a certain property, namely

∀x. ¬(x ∈ L ∧ x ∈ U)
∧ ∀x, y. (y ∈ L ∧ x < y ⇒ x ∈ L)  ∧  ∀x, y. (y ∈ U ∧ x > y ⇒ x ∈ U)
∧ ∀x. (x ∈ U ⇒ ∃y. y ∈ U ∧ y < x)  ∧  ∀x. (x ∈ L ⇒ ∃y. y ∈ L ∧ y > x)
∧ ∀ε > 0. ∃x, y. x ∈ L ∧ y ∈ U ∧ y − x < ε.
To do this we have used the cartesian product (collecting all pairs), the powerset (collecting all subsets),
and comprehension (forming a subset by selecting those elements which satisfy a particular property,
for example the circle S1 ⊂ R2 considered as the set of solutions of x2+y2 = 1).
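A cut is most naturally given as a pair of predicates on Q rather than as enumerated collections — comprehension in miniature. The following Python sketch (the names `lower`, `upper` and `locate` are ours, chosen for illustration) represents the cut for √2 and exhibits the last clause of the defining property by bisection.

```python
from fractions import Fraction

# The cut for sqrt(2): L and U are predicates on the rationals.
def lower(x):            # x in L
    return x < 0 or x * x < 2

def upper(x):            # x in U
    return x > 0 and x * x > 2

# Disjointness on some sample rationals (finite evidence only):
samples = [Fraction(n, 7) for n in range(-20, 21)]
assert all(not (lower(x) and upper(x)) for x in samples)

# "Locatedness": for any eps > 0 there are x in L, y in U with y - x < eps.
# Here we find such a pair by bisection (a hypothetical helper).
def locate(eps):
    x, y = Fraction(0), Fraction(2)
    while y - x >= eps:
        m = (x + y) / 2
        if lower(m):
            x = m
        else:
            y = m
    return x, y

x, y = locate(Fraction(1, 1000))
assert lower(x) and upper(y) and y - x < Fraction(1, 1000)
```

The bisection never gets stuck because no rational has square exactly 2, so every midpoint falls into L or into U.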
Georg Cantor (1872) gave another construction of R based on the idea of the convergence of sequences,
such as the decimal expansion of π. First we must explain how a sequence may be abstractly convergent
without having a limit point which is known in advance.
DEFINITION 2.1.2 A sequence in a set X is a function a(−): N → X. This is called a Cauchy sequence
(in X = Q or R) if

∀ε > 0. ∃N. ∀n, m > N. |a_n − a_m| < ε.

If ∀ε > 0. ∃N. ∀n, m > N. |a_n − b_m| < ε then the sequences (a_n) and (b_n) are equivalent, and the
Cantor reals are the equivalence classes of Cauchy sequences of rationals.
The Dedekind and Cantor constructions, which are equivalent in classical logic, are developed and
related in Exercises 2.2- 2.11.
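The Cauchy condition and the equivalence of sequences can be probed on finite prefixes, though such testing is only evidence, not proof. A Python sketch, with illustrative sequences of rationals and a hypothetical helper `witness_cauchy`:

```python
from fractions import Fraction

# Two Cauchy sequences of rationals, given without mentioning their limit:
def a(n): return 1 - Fraction(1, 2 ** n)
def b(n): return 1 + Fraction(1, 3 ** n)

def witness_cauchy(seq, eps):
    """Return an N such that |seq(n) - seq(m)| < eps for sampled n, m > N.
    (Finite evidence only -- the real proof is by algebra, not testing.)"""
    N = 0
    while not all(abs(seq(n) - seq(m)) < eps
                  for n in range(N + 1, N + 20)
                  for m in range(N + 1, N + 20)):
        N += 1
    return N

eps = Fraction(1, 1000)
N = witness_cauchy(a, eps)

# Equivalence of (a_n) and (b_n): |a_n - b_m| < eps for large n, m.
assert all(abs(a(n) - b(m)) < eps for n in range(15, 25) for m in range(15, 25))
```

Exact `Fraction` arithmetic is used so that the comparisons are genuine statements about rationals, not floating-point approximations.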
EXAMPLES 2.1.3
(a)
A point in projective n-space is a line through the origin in R^(n+1), ie an equivalence class of
R^(n+1) \ {0} with respect to the relation that (x0,…,xn) ∼ (y0,…,yn) if for some k ≠ 0, x0 = k y0,
..., xn = k yn. The (n−1)-plane at infinity consists of those classes for which the co-ordinate x0 is
zero.
(b)
As 6 = (1+√(-5))(1-√(-5)) = 3·2, unique factorisation fails in R = Z[√(-5)]. To remedy this,
Ernst Kummer (1846) introduced ideal numbers, which are subsets I ⊂ R closed under addition
and under multiplication by elements of R. An ordinary number r ∈ R is represented by its set of
multiples, {xr | x ∈ R}. The product IJ is the ideal generated by {ij | i ∈ I, j ∈ J}, and then the
prime factorisation of 6 is (1+√(-5),2)² (1+√(-5),3)(1-√(-5),3).
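The failure of unique factorisation can be checked with the multiplicative norm N(a + b√(-5)) = a² + 5b²: no element has norm 2 or 3, so 2, 3 and 1 ± √(-5) (of norms 4, 9 and 6) are all irreducible. A Python sketch, representing elements as pairs (a, b); the names are ours.

```python
# Elements of Z[sqrt(-5)] as pairs (a, b) meaning a + b*sqrt(-5).
def norm(x):
    a, b = x
    return a * a + 5 * b * b

# A factor of an element of norm 4, 6 or 9 would need norm 2 or 3,
# but a^2 + 5b^2 never equals 2 or 3:
small = [(a, b) for a in range(-10, 11) for b in range(-10, 11)]
assert all(norm(x) not in (2, 3) for x in small)

# Hence 2 (norm 4), 3 (norm 9) and 1 +/- sqrt(-5) (norm 6) are all
# irreducible, yet 2 * 3 = (1 + sqrt(-5)) * (1 - sqrt(-5)) = 6:
def mul(x, y):
    (a, b), (c, d) = x, y
    return (a * c - 5 * b * d, a * d + b * c)

assert mul((1, 1), (1, -1)) == (6, 0)
assert norm((2, 0)) == 4 and norm((3, 0)) == 9 and norm((1, 1)) == 6
```

Only a bounded search is needed, since a² + 5b² = 2 or 3 would already force |a| ≤ 1 and b = 0.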
Functions and equivalence classes The Cantor construction of the reals adds two further operations,
but these may themselves be defined in terms of the product, powerset and comprehension. The idea in
both cases is to internalise the definitions of equivalence relation and function from Sections 1.2 and 1.3
respectively. The connection between the exponential and the set of functions, defined using input-
output pairs, was also first made by Cantor (but for cardinal arithmetic, in 1895).
EXAMPLE 2.1.4 For sets X and Y, the function-type is constructed as Y^X = {f: P(X×Y) | ψ[f]}, where ψ[f] is

∀x:X. ∃!y:Y. (x,y) ∈ f

(cf Definition 1.3.1). Any actual function p:X→ Y is represented by {(x, p(x)) | x ∈ X}. Conversely, given
f ∈ Y^X and a:X, from ψ[f] we have

∃!y:Y. (a,y) ∈ f, ie (a,y) ∈ f ⇔ y = f(a),

so (a,y) ∈ f is a description (Definition 1.2.10) of the result, called f(a). But in order to understand
function-types properly, the evaluation operation ev:(f,a)→ f(a) must be studied in its own right. []
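The encoding of a function by its graph, together with the evaluation operation, can be sketched as follows (a finite Python illustration; the sets X, Y and the names `is_function` and `ev` are ours).

```python
# A function p: X -> Y encoded as its graph, a subset of X x Y.
X = {0, 1, 2}
Y = {"a", "b"}
p = {(0, "a"), (1, "b"), (2, "a")}   # the graph of a total function

def is_function(f, X, Y):
    """psi[f]: for each x in X there is exactly one y with (x, y) in f."""
    return all(len([y for y in Y if (x, y) in f]) == 1 for x in X)

def ev(f, a):
    """Evaluation: extract the unique y with (a, y) in f -- the description."""
    [y] = [y for (x, y) in f if x == a]
    return y

assert is_function(p, X, Y)
assert ev(p, 1) == "b"
```

Here `ev` plays exactly the role of the evaluation operation: it recovers f(a) from the graph by exploiting the uniqueness guaranteed by ψ[f].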
EXAMPLE 2.1.5 Let ∼ be an equivalence relation (Definition 1.2.3) on a set X. Then the quotient is the
set X/∼ = {U ⊂ X | θ[U]} of equivalence classes, where θ[U] is

∃x:X. ∀y:X. (y ∈ U ⇔ x ∼ y).

For x ∈ X, we write [x] = {y | x ∼ y} ∈ X/∼. The union of these subsets is X, they are inhabited, and if
any two of them overlap at all then they coincide (classically, we would say that they are non-empty, and
either disjoint or equal); such a family of subsets is called a partition.
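The quotient construction and the partition property can be illustrated for congruence modulo 3 on a finite fragment of N (a Python sketch; `quotient` is our own name).

```python
# Quotient of X by an equivalence relation, as the set of equivalence
# classes -- here congruence mod 3 on a finite fragment of N.
X = range(12)
def equiv(x, y):
    return (x - y) % 3 == 0

def quotient(X, equiv):
    return {frozenset(y for y in X if equiv(x, y)) for x in X}

classes = quotient(X, equiv)

# The classes form a partition: they cover X, are inhabited, and any
# two that overlap coincide.
assert set().union(*classes) == set(X)
assert all(c for c in classes)
assert all(c == d or not (c & d) for c in classes for d in classes)
assert len(classes) == 3
```

Building the set comprehension `{... for x in X}` over frozensets automatically identifies [x] with [y] whenever x ∼ y, which is precisely the point of the quotient.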
Unions and intersections Having shown that the more powerful operations can be reduced to the
product, powerset and comprehension, we complete the picture by treating the simpler ones in the same
way. (See also Proposition 2.8.6 for the logical connectives.)
EXAMPLE 2.1.6 Let a be an element of a set X, and U,V ⊂ X be the subsets characterised by predicates φ
[x] and ψ[x] respectively. Then
(a)
the singleton, {a}, is characterised by the predicate x = a in x,
(b)
the union, U∪V, is characterised by φ[x] ∨ψ[x],
(c)
in particular {a,b} = {a}∪{b} = {x |x = a∨x = b},
(d)
the intersection, U∩V, is characterised by φ[x]∧ψ[x], and
(e)
the difference, U\V, by φ[x] ∧ ¬ψ[x].
(f)
Given excluded middle, X\V is the complement of V in X:
(X\V) ∩ V = ∅   and   (X\V) ∪ V = X. []
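These correspondences between subsets and predicates are easy to check on a finite ambient set (a classical Python sketch, with illustrative predicates φ and ψ of our own choosing).

```python
# Subsets of an ambient set X characterised by predicates: union is
# disjunction, intersection conjunction, difference uses negation.
X = set(range(10))
phi = lambda x: x % 2 == 0          # U = evens
psi = lambda x: x < 5               # V = {0,...,4}

U = {x for x in X if phi(x)}
V = {x for x in X if psi(x)}

assert U | V == {x for x in X if phi(x) or psi(x)}
assert U & V == {x for x in X if phi(x) and psi(x)}
assert U - V == {x for x in X if phi(x) and not psi(x)}

# With excluded middle, X \ V is the complement of V:
assert (X - V) & V == set() and (X - V) | V == X
```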
The operations we have just described form subsets of an ambient set X. The disjoint union, like the
product, function-type and quotient, forms a new set. It can also be constructed using products,
powersets and comprehension, though the following construction may be unfamiliar as it is not the same
as that used in set theory. (The common set-theoretic construction is not valid in the axiomatisation of
the next section.)
EXAMPLE 2.1.7 If X and Y are sets then their sum or disjoint union is X+Y = {(U,V): P(X)×P(Y) | φ[U,V]},
where φ[U,V] says that U and V have exactly one element altogether, ie

(∃x:X. U = {x} ∧ V = ∅) ∨ (U = ∅ ∧ ∃y:Y. V = {y}).

An element a:X is represented by ({a},∅), and b:Y by (∅,{b}). Conversely, Exercise 2.13 shows that
every element of X+Y is of one or other of these forms, but not both - yet another description. (Exercise
1.11 was based on a similar idea.) Case analysis may be used on such a value: for any two functions
f:X→ Θ and g:Y→ Θ, there is a unique function p:X+Y→ Θ such that p(({a},∅)) = f(a) and p((∅,{b})) = g(b).
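This encoding of the sum, with its injections and case analysis, can be sketched as follows (`inl`, `inr` and `case` are our names for the two forms and the function defined by case analysis).

```python
# The sum X + Y as pairs (U, V) of subsets with exactly one element
# altogether, and case analysis on such values.
def inl(x):  return (frozenset([x]), frozenset())
def inr(y):  return (frozenset(), frozenset([y]))

def case(f, g):
    """The unique p: X + Y -> Theta with p(inl x) = f(x), p(inr y) = g(y)."""
    def p(elem):
        U, V = elem
        if U:
            [x] = U
            return f(x)
        [y] = V
        return g(y)
    return p

p = case(lambda n: n + 1, len)
assert p(inl(41)) == 42
assert p(inr("abc")) == 3
```

Exactly one of the two subsets is inhabited, so the `if U:` test recovers which injection was used — the "description" mentioned in the example.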
Singletons and the empty set As with the union and disjoint union, there is a conceptual difference
between the singleton as a free-standing set (which we call 1) and the singleton subset consisting of a
particular element of a given set. Up to interchangeability (unique isomorphism) there is only one
singleton set, and only one empty set. The two notions of singleton are related by the correspondence
between elements of any set X (and so its singleton subsets) and functions 1→ X.
EXAMPLES 2.1.8
(a)
On any set X we have the constantly true and false predicates, T and ⊥, which characterise X ⊂ X
and the empty set ∅ ⊂ X; any other subset lies between these. In particular, ∅ ≡ {x:1|⊥}.
(b)
The only subset of ∅ is itself, because the true and false predicates coincide (Exercise 2.29).
Hence P(∅) has exactly one element, ∅. We shall write 1 = {∗} for the singleton, since it is
The symbol ∅ appears to be a 1950s variant of zero, having nothing to do with Latin O, Greek φ or
Danish/Norwegian Ø .
(a)
there is a unique relation ∅ ⇌ X, namely the empty or constantly false one, and this is a function
∅→ X;
(b)
any function X→ ∅ is bijective;
(c)
for any sets X and Y, there is a unique bijection {x: X | ⊥} ≡ {y: Y | ⊥}, so we are justified in
using ∅ for the smallest subset of any set;
(d)
total functions 1 ≡ {∗} → X are of the form ∗→ a, so correspond bijectively to elements a ∈ X;
(e)
there is a unique total function X→ 1 ≡ {∗} , namely x→ ∗. []
There is one further major construction of this kind which we shall do, namely that of the free algebra
for a free theory (Proposition 6.1.11), but now we turn to the axiomatisation of these operations.
We shall make a distinction between elements and sets, though in such a formalism it is usual to refer to
terms and types as we did in Section 1.3. We shall also modify what Zermelo did very slightly, taking
the cartesian product XxY as a primitive instead of the unordered pair {X,Y}, and the singleton instead of
the empty set ( cf Examples 2.1.8).
Our system conforms very closely to the way mathematical constructions have actually been formulated
in the twentieth century. The claim that set theory provides the foundations of mathematics is only
justified via an encoding of this system, and not directly. It is, or at least it should be, surprising that it
took 60 years to arrive at an axiomatisation which is, after all, pretty much as Zermelo did it in the first
place.
The study of sheaf theory by the Grothendieck school unintentionally wrested foundations from the set-
theorists, though it was Bill Lawvere who saw that logic could be done in these new worlds (toposes).
The formulation of languages for such reasoning was undertaken by Bénabou, Coste, Fourman, Joyal
and Mitchell; although they called it ``set theory,'' they were in fact developing the type theory below
and in Chapter VIII. For a detailed account of the modern system and its history, see [LS86].
(a)
the singleton, 1, which has just one element, called ∗ ;
(b)
the cartesian product, Xx Y, whose elements are ordered pairs a,b, where a ∈ X and b ∈ Y,
whenever X and Y are sets; Remark 2.2.2 gives the full definition of the product and associated
pairing and projection operations;
(c)
the comprehension, {x:X | φ[x]}, whose elements are those x of a set X which satisfy a predicate
φ[x] (see below);
(d)
the powerset, P(X), whose elements are the subsets of a set X;
(e)
and the set N of natural numbers (Section 2.7).
Singleton and product We shall give introduction and elimination rules for the types announced in
this Definition, as we did in Sections 1.4- 1.5 for the predicate calculus. There we were really only
interested in the fact that various propositions could be proved, but now we want to say that certain
terms do or do not denote the same value, so we must give reduction rules relating them (Section 1.2).
(1ℑ)  ∗ : 1
(xℑ)  from a:X and b:Y, form the pair ⟨a,b⟩ : X×Y
(xE0) from p : X×Y, form π0(p) : X        (xE1) from p : X×Y, form π1(p) : Y
(xβ0) π0⟨a,b⟩ = a                         (xβ1) π1⟨a,b⟩ = b
(xη)  ⟨π0(p), π1(p)⟩ = p                  (1η)  u = ∗ for any u : 1
The rules have been named in the same way as in Remark 1.4.3: the introduction rule (xℑ) creates a
term of product type (the pair), which is used by the elimination rules, and the β- and η-rules cut out
detours. The equality rules say that substitution interacts with π0, π1 and , in the same way that it does
with operation-symbols.
omitted prooftree
environment
There are two elimination rules and two β-rules: in the ternary case there would be three, so in the
nullary case (the singleton) there is no elimination rule at all. (xℑ) has two premises, so (1ℑ) has none.
A binary operation-symbol r may equivalently be treated as a unary one, r̄, on the product, with

r̄⟨a,b⟩ = r(a,b),
so the product type is used in the semantics of algebra (Section 4.6), but we choose not to use it for the
syntax. Pairing is sometimes claimed to make variables redundant, but it does so by replacing named
variables with numbered projection functions. In a product of many factors (as for record types in
programming languages) we want to name, not number, the fields. [Pit95] treats the product carefully in
its own right.
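The point about naming rather than numbering fields can be seen by comparing plain tuples, whose projections are positional, with record-like types. In Python, for instance (an illustration, not the book's syntax):

```python
from collections import namedtuple

# Numbered projections vs named fields: in a many-factor product we
# want to name, not number, the components.
triple = (3, "x", True)
assert triple[0] == 3                 # pi_0, a numbered projection

Binding = namedtuple("Binding", ["value", "name", "bound"])
b = Binding(value=3, name="x", bound=True)
assert b.name == "x"                  # a named field
assert tuple(b) == triple             # same data, better projections
```

Renaming or reordering the fields of `Binding` leaves code using `b.name` intact, whereas every numbered projection `triple[i]` would have to change.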
Comprehension and powerset As we have rejected the ``inductive conception'' of a set as a voluntary
conspiracy of its elements, we are left with the problem of defining what a ``subset'' is. We shall do this
in terms of predicates (Definition 1.4.1): a subset of X is by definition the same thing as a predicate with
a variable of type X. By the same convention, we may treat a k-ary relation as either a predicate in k
variables or a subset of the k-fold cartesian product, cf Definition 1.3.1(a).
The syntax for comprehension, like quantification, binds the variable x, so it is subject to α-equivalence
(Definition 1.1.6). Since it is therefore a context-changing operation, box or sequent methods similar to
those of Section 1.5 are needed to formalise it properly. We shall not in fact do this until Section 9.5,
because it is preferable to introduce these methods for the function-type instead, as we do in the next
section.
The elements of this new set (or, as we prefer to say following Notation 1.3.3, the terms of the new type)
are given by the two-way rule,
      a : X      φ[a]
     ══════════════════
     a : {x:X | φ[x]}
which we may read downwards as an introduction rule ({}ℑ) and upwards as two elimination rules,
rather similar to those for the product type. The β- and η-rules say that the term a stays the same; this is
because φ[a], unlike the type Y in the product, has no associated term.
REMARK 2.2.4 Notice that the term a has both the ambient set X and the subset {x:X|φ[x]} as its type, so
two occurrences of the same term may have different types. In particular, by one of the elimination
rules, any term of the subtype acquires the wider type. This defines an injective function, which is called
the inclusion:
{x:X | φ[x]} ↪ X.
In category theory we define subsets as injective functions (Section 5.2).
DEFINITION 2.2.5 For a predicate φ[x] on X, we write {x:X|φ[x]} not only for the new set defined above
by comprehension, but also for an element of the powerset P(X); this is the introduction rule (Pℑ). The
elimination rule (PE) provides the binary membership relation,

a ∈_X U    for a : X and U : P(X).

Our ∈_X is typed as shown, whereas in set theory there is a single ∈ relation for the whole class of sets.
Beware of the difference between
(a)
a:X, the statement in the meta-language that a term a has type X,
(b)
and a ∈_X U, which says that the term a of type X satisfies the predicate φ[x] (in the object-language).
The use of ∈ in (a) is a rather ingrained habit, to which we shall often revert since the colon is not
altogether a satisfactory alternative: if it could be stripped of its set-theoretic confusions, a symbol
derived from the Italian è (is) would be entirely reasonable, whereas the colon is punctuation. Nor do we
often bother to write the subscript. We shall, however, be careful to write ∀x:X.φ for quantifiers,
reserving ∈ for the guarded quantifiers, so

(∀x ∈ U. φ[x]) ≡ (∀x:X. x ∈ U ⇒ φ[x])    and    (∃x ∈ U. φ[x]) ≡ (∃x:X. x ∈ U ∧ φ[x]).
The ambiguous notation makes the β-rule for powerset look the same as the introduction and elimination
rules for comprehension together:
φ[a] ⇔ a ∈ {x:X | φ[x]}.  (Pβ)
Regarding them as elements rather than types, we have to say when two subsets U,V:P(X) are equal:
U = V if ∀x:X. x ∈ U ⇔ x ∈ V,
ie the predicates defining these subsets are inter-provable. Like ⇔ in logic, but unlike equality in
arithmetic, we have to give two arguments to show that subsets are equal, one in each direction: U ⊂ V
and U ⊃ V ( cf Exercise 2.18). Finally, the (Pη)-rule is U = {x:X|x ∈ U}.
Notation
REMARK 2.2.6 The symbols ≤ and < for the reflexive and irreflexive orders on N were used for inclusion
of subsets in the nineteenth century (and are still used for subgroups), but Ernst Schröder introduced ⊆
and ⊂. Many authors use ⊂ for strict containment, and ⊆ for the non-strict version, but strict inclusion
is neither primitive (constructively) nor particularly useful: if U ⊂ V but V ⊄ U then the latter fact
has to be proved - and should be stated - separately. Indeed Louis Couturat rejected the symbol ⊆ in
1905 ``parce qu'il est complexe, tandis que la relation d'inclusion est simple.'' The analogy with
arithmetic is bogus (Section 3.1): ⊂ is syntactic sugar for ⊢, ⇒ or ↪. In these cases,
rightly, no notation has been invented for the strict versions, or any resolution made into strict and equal.
So we use ⊂ in the non-strict sense.
A conflicting notation survives in philosophy as ⊃ for implication (and is also used by some modern
authors in type theory to avoid overloading the arrow notations). It is actually older: Joseph Gergonne
introduced C for contient and ⊃ for its converse in 1817, and these symbols were used by Peano and by
Russell and Whitehead. (In fact Russell and Whitehead also used ⊂ for containment in our sense.)
REMARK 2.2.7 A common abuse of the subset-forming notation (which we have already committed) is to
put a term in place of the variable:

{p(x,y) | φ[x,y]}.

Which variables x, y, ..., are deemed to be bound in this notation? This is not made clear - informally, we
write ``x ∈ X,y ∈ Y,φ[x,y]'' to indicate what we mean. (In fact this is the same abuse of notation which
we ridiculed in Examples 1.1.7.) As an important special case, we often want to apply a function ``in
parallel'' to the elements of a subset, obtaining a (sub)set of results. In this case we write, as in Notation
1.3.4,

f!(U) = {f(x) | x ∈ U}

for the image (see also Remark 3.8.13(b)); in particular f!({x}) = {f(x)}. Notice that the extended use
of the notation for comprehension, and in particular the image, disguise an existential quantifier, which
we shall discuss in Section 2.4.
Another, perhaps unfamiliar, special case is when there is a constant on the left of the divider, or maybe
no variables in the expression at all:
{∗ | φ}.
(Classically, we would say that this is {∗} if φ is true and ∅ otherwise.) In this way, propositions
correspond to subsets of the singleton, and to elements of P(1).
Parametric sets In algebra and the predicate calculus we used terms and formulae containing
variables, but the types of the variables were fixed in advance. In Zermelo type theory, by contrast, the
comprehension operation {x|φ} is not required to bind all of the free variables of the formula φ. Those
which remain free become the free variables of a type- expression, for example,
Factors[x] = {y | ∃z. x = yz}.
The ``arguments'' of a dependent type Y[x] will be enclosed in square brackets, as we have already done
for predicates. This is an informal notation like f(x) for functions (Remark 1.1.2). Of course each
argument x:X has its own type, but x is not itself a type (although in Section 2.8 we shall briefly discuss
an extension in which there are type variables and quantification over them). These phenomena are
called polymorphism, because the same type-expression may be instantiated in many ways.
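For natural numbers, the dependent type Factors[x] can be computed outright, making concrete how the type depends on the term x (a Python sketch; `factors` is our name).

```python
# The dependent type Factors[x] = {y | exists z. x = y*z}, computed for
# each natural number x -- the "argument" x is a term, not a type.
def factors(x):
    return {y for y in range(1, x + 1) if x % y == 0}

assert factors(12) == {1, 2, 3, 4, 6, 12}

# A variable of type Factors[x] ranges over a set that depends on x:
assert all(12 % y == 0 for y in factors(12))
assert factors(7) == {1, 7}
```

A variable y:Factors[x] only makes sense once x has been given, which is the content of Definition 2.2.8 below: the free variables of a type-expression must occur earlier in the context.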
As Factors[x] may be used as the type of another variable y in terms, formulae and other type-
expressions, we must modify Definition 1.5.4.
DEFINITION 2.2.8 For each typed variable y:Y in a context, the free variables x⃗ of the type-expression
Y[x⃗] must occur earlier in the list than the variable y itself.
This and Remark 2.2.4 make Zermelo type theory very complicated.
Fortunately, Exercise 2.17 shows that it is possible to rewrite any type-expression (using any of the
constructors, including comprehension) as a subset of a type defined using 1, x, P and N alone. So
comprehension may be postponed and used just once. Variables may be taken to range over types from
this simpler class, where each formula or term is guarded (Remark 1.5.2) by the predicate defining the
subset. Types and terms may once more be treated separately, and the exchange rule allows us to
disregard the order of the variables. This system is studied in [LS86].
Comprehension-free types are, however, somewhat artificial and do not allow us to speak directly of
functions, real numbers or equivalence classes. But in practice the difficulty which we mentioned does
not arise unless we make actual use of types containing non-trivial dependency. The notation for
dependent types is not straightforward, but as they are important to the practical foundations of
mathematics and interpreted in many semantic models we devote the final two chapters to them.
Historical comments The foregoing motivation is a fiction, in terms of history. Ernst Zermelo had
been enticed from applied mathematics into foundations by Hilbert. He was interested in cardinal
arithmetic and in particular the well-ordering property (Proposition 6.7.13), which Cantor had assumed
but was unable to prove. The effort of formalising his proof of this led Zermelo to a usable system of
foundations, which brought set theory to what was arguably its perigee. Zermelo's axioms were as follows.
(a)
Bestimmtheit (literally definiteness, but usually known in English as extensionality): if ∀z.z ∈ x
⇔ z ∈ y then x = y;
(b)
Elementarmengen (basic sets): ∅, {x}, {x,y} are sets if x and y are;
(c)
Aussonderung (comprehension) for definit properties (see below);
(d)
Potenzmenge (power set);
(e)
Vereinigung (union): {z|∃y.z ∈ y ∈ x} is a set if x is;
(f)
Auswahl (Choice): see Exercise 2.15; and
(g)
Unendlichkeit (infinity, N is a set): see Exercise 2.47.
Real numbers and ``ideal'' algebraic numbers were both constructed as sets, so it was reasonable at the
time to treat individuals and collections in the same way. Nowadays we are used to mutual recursive
definitions of several distinct syntactic classes (such as commands and expressions in programs), and it
is preferable to do this for terms, types and predicates.
The (now archaic) German word Aussonderung means ``sorting out'' in the sense of discarding what is
not wanted. No single English word seems to fit as well, but it is often translated as separation,
comprehension being reserved for the unbounded way of forming sets which brought Gottlob Frege's
system down. But as the word ``separation'' has wider and more natural uses in, for example, topology, it
seems better to re-employ the term whose meaning would be immediately recognisable to non-logicians.
The definit properties sparked a new controversy. Again people were forced to think, now about the
formulation of the sentential calculus, and the outcome was first order model theory. (Zermelo's
vagueness is now perhaps an advantage, since we may (remove Choice and) substitute intuitionistic,
classical or some other calculus for his missing definition.)
Extensionality defines equality between individuals, but it also imposes an absolute notion of
interchangeability for types, where subsequent experience has taught us that specified isomorphisms
should play this role (Definition 1.2.12ff). This first, seemingly innocuous, axiom has some rather
bizarre results, particularly for unions and intersections. If the grandchildren z belong to some known set
w, then Zermelo's union {z|∃y.z ∈ y ∈ x} is simply the union in the lattice P(w), which is given by the
existential quantification of a family of predicates in our notation. However, if x = {y1,y2}, where y1 and
y2 are sets given independently, they may suddenly be found to overlap. For the more natural disjoint
union (Example 2.1.7), explicit coding must be used to distinguish the elements of the two sets. (The
overlapping union is not definable using Definition 2.2.1, but see [Tay96a] and [Tay96b].)
Even as an equality test, extensionality is highly recursive, although a further axiom (Foundation) is
needed to justify the recursion. Dana Scott (1966) showed that it is essential to giving the axiom of
replacement its power: without extensionality, ZF (see below) is provably consistent in Zermelo type
theory ( cf Example 7.1.6(g) and Exercise 9.62).
More recently, this idea has arisen in process algebra as bisimulation, and can in fact be seen as a
notion of co-induction [Acz88, Hen88], cf Exercise 3.53, Example 6.3.3 and Remark 6.7.14.
It is easy to find a bijection between Q and N (we say that Q and Z are countable), but Cantor showed
that there is none between R and N. He found his now well known diagonalisation argument in 1891,
but had a much prettier proof using intervals rather than decimal expansions in 1873. This began the
theory of cardinality. But his next discovery, that there is a bijection between R2 and R, showed at its
conception in 1877 that the attempt to classify infinite sets up to (not necessarily continuous) bijection is
powerless to define dimension and hence make the distinctions which are important in mathematics.
(See the remarks after Proposition 9.6.4 for how cardinals ought to be interpreted.)
As Emile Borel stressed in 1908, the important observation about Q is that there is an effective coding,
not anything to do with its ``size.''
In the same year as Zermelo, Bertrand Russell gave a theory of ``ramified'' types, later developed in
[RW13]. Leon Chwistek (1923-5) and Frank Ramsey (1927) showed how to eliminate ramification,
giving a theory (historically known as simple type theory) which is essentially equivalent to Zermelo's
without the comprehension scheme. Versions of the systems of Frege, Russell and Whitehead, Zermelo,
Ramsey and Quine are compared in [Hat82]; for a more philosophical survey, see [Bla33].
Thoralf Skolem, like Russell, set out to deal with the impredicativity questions which had been raised by
Poincaré and Weyl (see the end of this chapter). He recognised the difference between the mathematical
statement that P(N) is uncountable, and the metamathematical one that its terms are recursively
enumerable. Skolem is best remembered for formalising first order logic and establishing its model
theory. As this was, for a long time, the only available tool for the logical analysis of mathematical
theories, set theory became the study of axiomatisations of a first order ∈ relation. Since the predicates
belonged to the meta-language, comprehension was turned into an infinite scheme of axioms, and, by the
Löwenheim-Skolem theorem (Remark 2.8.1), set theory has countable models which contain
uncountable cardinalities [Sko22].
This paradox was repugnant to Zermelo, who correctly said that it revealed the limitations of first order
logic, not of set theory. Indeed, turning comprehension into salami doesn't explain the type-constructors
that actually occur in mathematics (but see Definition 5.2.10). After his treatment by the subsequently
dominant tradition in set theory, Zermelo would, I believe, readily forgive my putting ordered pairs into
his system.
A trick was found for coding the ordered pairs which mathematics needs in terms of the unordered ones
Zermelo provided (Exercise 2.19). Like the Sheffer stroke (page 1.8.4), this only obscures matters.
(Those who like to argue from authority should note that [ Bou57], whilst indulging in obfuscatory
reductionism of its own (Remark 1.6.7), treats pairing as primitive.) Set theory is sometimes called the
``machine code'' of mathematics, and this comment is supposed to justify the pair formula.
Why Zermelo made the choice he did between ordered and unordered pairs is perhaps worthy of
historical study, though it seems unlikely that it actually occurred to him that he was making any
significant decision at all. Irredundancy was considered important at the time, probably because Hilbert
had only recently found the final settlement of the axiomatics behind Euclid's parallel postulate. In fact
the cartesian product had yet to gain the importance which it has now; for example [Die88] remarks that
the first mention of the product of two abstract topological spaces was also in 1908.
In 1922 Abraham Fraenkel and Thoralf Skolem added another type-forming operation: the axiom-
scheme of replacement. Owing to its obscure formulation, use of Replacement is widely overlooked, but
it is incredibly powerful: Richard Montague (1966) showed that it can prove the consistency, not only of
Zermelo set theory itself, but of the extension of this by any single theorem of ZF. We shall try to see
what Replacement means in the final section.
The last of the Zermelo-Fraenkel axioms is foundation, which says that we may use induction on the
membership relation (Definition 6.7.5). On the other hand, it also says that everything in mathematics
belongs to the set-theoretic hierarchy, contradicting the intuition that, for example, the group A5 has
elements but not elements of elements. (This group has familiar representations as the even permutations
of five objects and the symmetries of a dodecahedron, as well as of two different projective spaces.) The
axiom of foundation makes the hierarchy rigid, whereas objects of mathematical interest typically have
lots of automorphisms. Stone's representation of Boolean algebras as lattices of clopen subsets also gives
an example where there is no preferred view in which either ``points'' belong to ``neighbourhoods'' or
vice versa .
After Zermelo, more axioms were added to set theory in an endeavour to formulate the strongest system
that would remain consistent, in which everything anyone could possibly want would be provable, and
the model would be unique. We mention some of these in Sections 4.1 and 9.6.
We shall take the opposite strategy: by restricting the hypotheses of an argument to just what is needed,
we are better able to understand how it works, and we are led to generalisations and novel applications.
When the box or sequent rules for the correct management of the free variables of terms and
comprehension types are taken into account, Zermelo type theory becomes far too complicated to study
in one go, and is arguably too powerful for actual mathematics and programming.
Function (λ) abstraction Sections 1.1-1.3 discussed how functions act, but they must also be
considered as entities in themselves. Early in the history of the integral calculus problems arose in which
the unknown was a function as a whole, rather than its value at particular or even all points: the Sun's
light takes that path through the variable density of the atmosphere which minimises the time of travel;
the motion of a stretched string depends on its initial displacement along its whole length.
REMARK 2.3.1 In order to consider a function per se, we must first identify which of the unknowns in an
expression p are inputs. Lambda abstraction, λx⃗, does this, thereby binding these variables
(Definition 1.1.6). Sometimes the function already has a name, such as sin or sine, but the squaring
function could previously only be written as (−)² or, now, as λx.x². Since λ is clearer than the informal
notation p(x) of Remark 1.1.2 as a way of distinguishing the inputs from other variables, we treat all
variables, whether free or bound, as part of the expression p.
Given a function f and an argument a, the one may be applied to the other and evaluated to a result. This
is usually written fa without brackets, in which the juxtaposition denotes a formal operation of
application; as we shall need to study this operation in its own right, we shall sometimes write ev(f,a)
instead. The result of the evaluation, which was written p(a) informally, is p[x: = a]. The passage (using
substitution)
Besides the type-theoretic rules, we also intend that the constants may have their own laws, or δ-rules, as
they are known in the λ- calculus.
For us, all terms are typed: if x and p have types X and Y, the abstraction λx.p has type X→ Y ≡ Y^X.
Notice that the two reduction rules preserve type, ie they obey subject reduction, Definition 1.2.12.
The type X→ (Y→ Z) or (Z^Y)^X ≡ Z^(X×Y) is that of a function of two arguments. This trick - of using λ-
abstraction to supply multiple arguments one by one to a function - is called Currying, though it had
been observed by Moses Schönfinkel and was implicit in Frege's work.
The notation is further abbreviated to X⃗→ Y, f(a⃗) and λx⃗.p respectively. As this
deals with many-argument functions (in fact in a rather useful way), many authors omit pairing from the
calculus. By contrast, the type (X→ Y)→ Z is that of what is sometimes called a functional, ie a function
whose argument is itself a function, and so is more complex.
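Currying can be illustrated concretely. The following Python sketch (not part of the text; `curry` and `uncurry` are our names) shows the two-way passage between (X×Y)→ Z and X→ (Y→ Z):

```python
# A two-argument function presented on pairs, type (X x Y) -> Z.
def add(pair):
    x, y = pair
    return x + y

def curry(f):
    """Turn f : (X x Y) -> Z into curried form X -> (Y -> Z)."""
    return lambda x: lambda y: f((x, y))

def uncurry(g):
    """The inverse: turn g : X -> (Y -> Z) back into (X x Y) -> Z."""
    return lambda pair: g(pair[0])(pair[1])

inc = curry(add)(1)            # arguments supplied one by one
assert inc(41) == 42
assert uncurry(curry(add))((2, 3)) == add((2, 3)) == 5
```

Partially applying `curry(add)` to one argument yields a function awaiting the other, which is exactly why Currying lets many authors omit pairing from the calculus.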
Normalisation A distinction is made between (λx.p)a and p[x: = a], and between application and
evaluation, since the result of a β-reduction is frequently a longer expression than the original: it may
contain more λs, and so more opportunities for further reduction, than the original term. So strategies for
β-reduction are an important topic of study in themselves.
FACT 2.3.3 The Church-Rosser Theorem says that the pure λ-calculus (without δ-rules) is confluent
(Definition 1.2.5). The simply typed pure λ-calculus is also strongly normalising (Definition 1.2.8). []
The Church-Rosser Theorem relies too much on intricacies of syntax to be appropriate for this book:
see, eg , [Bar81], [LS86] and [Bar92] for a detailed treatment. The result is valid in many different
calculi - including the untyped λ-calculus, in which the normalisation theorems fail - but unfortunately
breaks down in some variations which seem semantically benign, such as the untyped calculus with
surjective pairing. Without the type discipline (Notation 1.3.3) there need be no normal form to which a
term reduces. For example the term (λx.xx)(λx.xx) reduces to itself, and there are much worse
phenomena.
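The self-reducing term can be checked mechanically. This Python sketch (an illustration, not the book's apparatus) represents untyped λ-terms as nested tuples and performs one leftmost-outermost β-step; naive substitution suffices here because the substituted argument is closed:

```python
# λ-terms as tuples: ('var',x), ('lam',x,body), ('app',f,a).
def subst(t, x, a):
    """Naive t[x := a]; safe here because the argument a is closed,
    so no variable capture can occur."""
    tag = t[0]
    if tag == 'var':
        return a if t[1] == x else t
    if tag == 'lam':
        return t if t[1] == x else ('lam', t[1], subst(t[2], x, a))
    return ('app', subst(t[1], x, a), subst(t[2], x, a))

def step(t):
    """One leftmost-outermost β-step, or None if t is in normal form."""
    if t[0] == 'app':
        f, a = t[1], t[2]
        if f[0] == 'lam':                      # β-redex: (λx.p)a ⇝ p[x := a]
            return subst(f[2], f[1], a)
        r = step(f)
        if r is not None:
            return ('app', r, a)
        r = step(a)
        return None if r is None else ('app', f, r)
    if t[0] == 'lam':
        r = step(t[2])
        return None if r is None else ('lam', t[1], r)
    return None

delta = ('lam', 'x', ('app', ('var', 'x'), ('var', 'x')))   # λx.xx
omega = ('app', delta, delta)                                # (λx.xx)(λx.xx)
assert step(omega) == omega          # reduces to itself: no normal form
```

Iterating `step` on `omega` loops forever, which is exactly the failure of normalisation without the type discipline.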
The depth of bracket nesting, with the above Convention for omitting them, considered as a notion of
type complexity, can be used to prove weak normalisation : see Example 2.6.4. It follows from Fact
2.3.3 and Theorem 1.2.9 that normal forms exist and are unique. They are characterised in Exercise
2.23, and used in Theorem 7.6.15. Section 7.7 shows in another way that every term is provably equal to
a normal form.
REMARK 2.3.4 When the λ-calculus is used as a programming language [ Plo77], it is usual to forbid β-
reduction under λ, as λx.p is regarded as an as yet passive fragment of code, which is only activated
when it is applied to some argument. Then there is a choice whether
(a)
to reduce (λx.p)a straight away ( call by name), so avoiding the perhaps unnecessary risk of
evaluating an undefined argument, or
(b)
to wait until a has itself been normalised ( call by value), so that this is not done repeatedly if x
occurs several times in p.
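The two strategies can be simulated with thunks (zero-argument functions) standing for unevaluated arguments. A Python sketch with invented helper names, illustrating both trade-offs named above:

```python
# An "undefined" argument, and a thunk that counts its evaluations.
def diverge():
    raise RuntimeError("undefined argument")

def apply_by_value(f, arg):
    v = arg()                  # evaluate the argument once, up front
    return f(lambda: v)

def apply_by_name(f, arg):
    return f(arg)              # pass the unevaluated thunk through

const7 = lambda x: 7           # never consults its argument
assert apply_by_name(const7, diverge) == 7      # safe: never evaluated
try:
    apply_by_value(const7, diverge)             # fails, though x is unused
    assert False
except RuntimeError:
    pass

calls = [0]
def noisy():
    calls[0] += 1
    return 5
double = lambda x: x() + x()   # uses its argument twice
assert apply_by_name(double, noisy) == 10 and calls[0] == 2   # re-evaluated
calls[0] = 0
assert apply_by_value(double, noisy) == 10 and calls[0] == 1  # shared
```

Call by name avoids the unnecessary evaluation of an undefined argument; call by value avoids repeating it when x occurs several times in p.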
Contexts for the λ-calculus The variable governed by λ-abstraction, like that bound by the quantifiers
∃ and ∀, is generic, so we use boxes to delimit it. As in Section 1.5, the variable cannot be used outside
the box, because to do so would restrict its generality and prejudice the question of whether its type is
empty.
DEFINITION 2.3.5 The box rules for λ-abstraction and application are
NOTATION 2.3.6 As for the predicate calculus (Definition 1.5.3), the abstract study of the λ-calculus
needs a sequent form in which
DEFINITION 2.3.7 The sequent forms of the rules for λ-terms are
REMARK 2.3.8 We need structural rules for terms as well as formulae (Definition 1.4.8). In fact these
ought to have been given for the (∀ℑ )- and (∃E )-boxes (Definition 1.5.1), in order to import terms.
REMARK 2.3.9 There is a superficial similarity between the β- and cut rules. To see the β-rule in action in
a symbolic idiom we need to use general expressions for the body p and argument a of the function, cf
the various ways of expressing conjunction as a sequent rule in Remark 1.4.9; Example 7.2.7 gives a
diagrammatic version with variables instead. Cut makes these substitutions, and this explains the
likeness.
The sum type The rules for sum largely mirror those for product. The pair ⟨a,b⟩ for the product is
supplied by the data, and the program uses πᵢ to extract what it needs. For the sum, the program
provides a pair [f,g] of options, from which the input makes a selection using νᵢ. There is an ultimate
result type Θ to mirror the context of parameters Γ. This symmetry is much more (arguably too) obvious
in the categorical presentation (Sections 5.3-5.5); it is really spoilt by the asymmetry of the term
calculus, in which the input but not the output may involve parameters. The defects require new variable-
binding operations, and rules to handle substitution and its dual notion of continuation.
These rules are rather technical and will not be relevant until Section 5.3, so you should skip the rest of
this section unless you are already very familiar with the typed λ-calculus expressed in a contextual
style.
The η-rule says that if the two branches contain the same code p then the switch is redundant:
[λx.p(ν₀(x)), λy.p(ν₁(y))](c) = p(c).
REMARK 2.3.11 Just as pairing handles many-argument operations uniformly, so λ- abstraction avoids
the need for other variable binders. Thus (+E) can be put in a form which doesn't itself alter the context:
omitted prooftree
environment
The [ , ] notation is at least standard usage in category theory, where the rules can be summed up as the
two-way adjoint correspondence
omitted prooftree
environment
so Θ^(X+Y) ≡ Θ^X × Θ^Y.
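The adjoint correspondence between functions out of a sum and pairs of functions can be sketched in Python, coding ν₀, ν₁ as tagging and [f,g] as case analysis (an illustration, not the text's formal calculus):

```python
# ν₀, ν₁ tag the summands; [f,g] makes the case selection.
nu0 = lambda a: (0, a)
nu1 = lambda b: (1, b)

def copair(f, g):
    """[f,g] : X+Y → Θ, built from f : X → Θ and g : Y → Θ."""
    return lambda c: f(c[1]) if c[0] == 0 else g(c[1])

def split(h):
    """The other direction of the correspondence: h ↦ (h∘ν₀, h∘ν₁)."""
    return (lambda x: h(nu0(x)), lambda y: h(nu1(y)))

h = copair(len, abs)
assert h(nu0("abc")) == 3 and h(nu1(-4)) == 4
f2, g2 = split(h)
assert f2("abc") == 3 and g2(-4) == 4     # the two passages are inverse
```

That `copair` and `split` are mutually inverse is the content of the correspondence displayed above.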
REMARK 2.3.12 Terms may be imported into the boxes, with the effect of substituting for free variables
within the binding. The rule for this is exactly analogous to that in Definition 1.1.10(d),
[f,g](c)[w := a] = [f[w := a], g[w := a]](c[w := a]),    ie    ŵ*[f,g] = [ŵ*f, ŵ*g].
REMARK 2.3.13 Remark 1.6.5 showed that the (∃E )- box is essentially open-ended below: any phrase of
proof not involving the bound variable can be moved in or out. The (+E)-rule has the same property, but
now that we are discussing significant terms rather than anonymous proofs, we must state another law,
called a continuation rule or commuting conversion. This says that moving the continuation z into or
out of the box has no effect,
z∘[f,g] = [z∘f, z∘g],
so the η-rule can be expressed as [ν₀,ν₁] = id. Continuation is dual to substitution, and is explained in
category theory by postcomposition.
As for the function-type, these reduction rules interact, and questions of confluence and normalisation
have to be studied. For example we would like to know that every definable closed term of sum type is
provably equal to either ν₀(a) or ν₁(b). This will be considered in Section 7.7.
REMARK 2.3.14 Since there are two binary introduction rules, there is no nullary one (and so no β-rule
either), and as the binary elimination rule has two cases (premises), the nullary one has none.
omitted prooftree
environment
The elimination rule provides a function ∅→ Θ for each type Θ, and the corresponding equality rule
says that this is unique. The (∅η) and continuation rules mean that it is the identity on ∅ itself, and
preserved by any function. The substitution rule gives Proposition 2.1.9: any X→ ∅ is invertible, Θ^∅ ≡ 1
and Γ×∅ ≡ ∅.
Contexts will be developed in Section 4.3 and Chapter VIII. Sections 4.5- 4.7 give a categorical account
of products and function-types , applying the former to universal algebra. We discuss the binary sum
further in Section 5.3ff, along with the if then else fi programming construct. Section 9.3 treats the
infinitary (dependent type) analogue, Σx.Y[x].
which puts propositions and types on a par. This is sometimes called the Curry-Howard isomorphism, as William
Howard (1968) identified it in Haskell Curry's work (1958), although Nicolaas de Bruijn, Joachim Lambek, Hans
Läuchli and Bill Lawvere also deserve credit for it in the late 1960s. The idea was developed by Dana Scott to give
substance to Brouwer's intuition, and rather more extensively by Per Martin-Löf.
Formulae correspond to types and their deductions to terms. Crudely, a type gives rise to the proposition that the
type has an element, and a proposition to the type whose elements are its proofs.
Indeed, as soon as we take some care over it, we have no alternative but to treat the hypothesis for ⇒ alongside the
generic value of ∀ and the bound variable of λ. Similarly Sections 4.3 and 5.3 show that midconditions go with
program-variables. Other analogies with types versus terms are games versus strategies, problems versus solutions
and specifications versus implementations of programs.
Calling the analogy an isomorphism overstates the case. Terms may or may not satisfy equations, but proofs are
anonymous: we do not usually bother either to equate or to distinguish between them. The difference between
propositions and types is that the former are much simpler; we exploit this by treating them first, in Chapters I
and III. Posets and induction concern propositions; categories and recursion are about types. (Some authors say
proof-irrelevance instead of anonymity.)
The propositions as types analogy ought not to be confused with the earlier but superficial one between predicates
and classes, which merely states the axiom of comprehension (Definition 2.2.3) in a fixed domain of discourse, and
has no algorithmic content. Far more striking is Jan Łukasiewicz's re-evaluation of the history of logic in 1934 (in
which he condemned earlier historians for their ignorance of the modern study of the subject): he attributed the
identity on propositions (φ ⊢ φ) to the Stoics and that on terms (x:X ⊢ x:X) to Aristotle.
EXAMPLE 2.4.1 The proofs of the proposition α→ (α→ α)→ α may be characterised up to βη-equivalence by a
number, viz the number of times modus ponens (→ E ) is used.
These proofs correspond to the λ-terms 0 = λx f.x, 1 = λx f.fx and 2 = λx f.f(fx), called the Church numerals
(Exercise 2.44ff).
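These numerals can be written directly as curried Python functions, in the text's variable order λx f (an illustration; `church` and `to_int` are our names):

```python
# Church numerals in the text's order, n = λx f. f(...f(x)),
# with exactly n applications of f.
zero = lambda x: lambda f: x
one  = lambda x: lambda f: f(x)
two  = lambda x: lambda f: f(f(x))

def church(n):
    """The numeral n: apply f to x exactly n times."""
    return lambda x: lambda f: x if n == 0 else church(n - 1)(f(x))(f)

def to_int(c):
    """Count the applications by instantiating x = 0, f = successor."""
    return c(0)(lambda k: k + 1)

assert to_int(zero) == 0 and to_int(two) == 2 and to_int(church(7)) == 7
```

Each numeral inhabits α → (α → α) → α, and `to_int` recovers the number of times modus ponens was used.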
omitted proofbox environments
These types are the axioms of implicational logic (Remark 1.6.10): I is the identity, K corresponds to weakening,
S to contraction, T to exchange and Z to cut. Exercise 2.26 shows how to express any λ-term using S and K alone;
for example I = SKK and Z = S(KS)K. Proposition 2.8.6 gives further examples of the relationship between
proofs and terms.
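The combinators themselves can be written as curried functions, and the cited equations checked on sample arguments (a sketch with the types erased; the letters S, K, I, Z are as in the text, but the encoding is ours):

```python
# The implicational axioms as curried functions (types erased):
S = lambda f: lambda g: lambda x: f(x)(g(x))   # contraction
K = lambda x: lambda y: x                      # weakening

I = S(K)(K)                 # I = SKK: S K K a = K a (K a) = a
assert I(42) == 42

Z = S(K(S))(K)              # Z = S(KS)K: cut, ie composition
assert Z(lambda n: n + 1)(lambda n: n * 2)(5) == 11   # f(g(5))
```

Tracing `Z f g x` through the definitions gives `K S f (K f) g x = S (K f) g x = K f x (g x) = f(g x)`, so Z does indeed compose, which is why it corresponds to cut.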
REMARK 2.4.3 Arend Heyting and Andrei Kolmogorov independently gave this interpretation of intuitionistic logic
in 1934. To prove
(a)
truth: there is a trivial proof ∗ of T;
(b)
an atomic predicate: we must go out into the world and find out whether it's true ( cf Remarks 1.6.12
and 1.7.2);
(c)
conjunction: a proof of φ∧ψ is a pair a,b consisting of a proof a of φ and a proof b of ψ;
(d)
disjunction: a proof of φ∨ψ is a pair i,b, where either i = 0 and b is a proof of φ, or i = 1 and b is a proof of
ψ;
(e)
implication: a proof of φ⇒ ψ is a function f which takes a proof x of φ as argument and returns a proof f(x)
of ψ as result;
(f)
universal quantification: a proof of ∀x.φ[ x] is a function f taking an argument x in the range of
quantification (or maybe a proof that x lies in this range) to a proof f(x) of φ[x];
(g)
existential quantification: a proof of ∃x.φ [x] is a pair a,b, where a is a value in the range, and b is a proof
of φ[a].
These may be read off from the rules of Sections 1.4, 1.5 and 2.3.
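Clauses (c)-(e) and (g) can be read directly as data: pairs, tagged values and functions. A small Python illustration (the proof tokens are placeholder strings, not part of the text):

```python
# (c) a proof of φ∧ψ is a pair; (d) of φ∨ψ a tagged value;
# (e) of φ⇒ψ a function from proofs to proofs.
pair = lambda a, b: (a, b)
tagged = lambda i, b: (i, b)

# φ∧ψ ⇒ ψ∧φ: the proof is the swap on pairs.
swap = lambda p: pair(p[1], p[0])
assert swap(pair("proof of φ", "proof of ψ")) == ("proof of ψ", "proof of φ")

# φ∨ψ ⇒ ψ∨φ: flip the tag, keep the evidence.
flip = lambda d: tagged(1 - d[0], d[1])
assert flip(tagged(0, "proof of φ")) == (1, "proof of φ")

# (g) a proof of ∃x.φ[x] is a pair of witness and evidence:
exists_even = (4, "proof that 4 is even")   # placeholder evidence
```

The programs `swap` and `flip` are themselves the proofs of the implications, as the analogy dictates.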
REMARK 2.4.4 In particular, the direct logical rules correspond to the operation-symbols of simple type theory:
(∧ℑ) ⟨a,b⟩    (∧E₀) π₀(f)    (∧E₁) π₁(f)    (∨ℑ₀) ν₀(a)    (∨ℑ₁) ν₁(b)    (⇒E) ev(f,a)
Direct deductions are, then, sequences of declarations, ie assignments to intermediate variables of expressions
which involve previously declared variables. The utility of the proof box method as a way of composing proofs lies
in its similarity, under this correspondence, to the declarative style of programming, which we shall discuss in
Section 4.3.
REMARK 2.4.5 The indirect rules correspond to λ-abstraction, in which the bound variable is
(a)
a hypothesis in the case of (⇒ ℑ ),
(b)
an object-language variable for first order (∀ℑ ) and (∃E ),
(c)
a proposition- or type-variable for second order (∀ℑ ) and (∃E ), and
(d)
a tag (of type 2 = {0,1}: see Exercise 2.13) in the case of (∨E ).
In particular, the box lying above the (⇒ ℑ )-rule is a program which transforms a given proof x of φ into a proof
p(x) of ψ, the proof of φ⇒ ψ being the abstraction λx.p(x). Conversely, the (⇒ E )-rule applies the proof f of φ⇒ ψ
to the proof a of φ to yield a proof fa of ψ. If f was in fact λx.p then there is a β-reduction of (λx.p)a into p[x: = a],
which is obtained by removing the box around p and replacing its hypothesis x with the given proof a
(Remark 1.5.10).
WARNING 2.4.6 The λ-calculus variables introduced in this translation of indirect rules denote proofs, not
propositions. They do not occur in terms or predicates and can only be bound by λ, not by ∀ or ∃ . They should not
be confused with the variables which denote elements of the object-language, occur free in predicates and are
bound by first order quantifiers.
Programs out of proofs It is sometimes claimed that, as a corollary of the analogy, programs may be extracted
from proofs. Indeed [GLT89 , chapter 15] shows that any function which can be proved to be total in second order
logic is definable in Girard's System F. Unfortunately, to achieve this in practice it currently appears to be
necessary to add so much detail to a proof that it would be easier to write the program in the first place. Maybe in
the future it will be possible to automate the process of filling in this detail.
The constructive existential quantifier The logical symbol for which the type-theoretic version involves the
most extra detail compared to its propositional form is the existential quantifier or dependent sum. Here, then, is
where the isomorphism has its strongest consequences: in particular Choice is a theorem. So conversely this is
where it is perhaps the most overstated: the existential quantifier, as understood by ordinary mathematicians, does
not provide a particular witness, so is weaker than a dependent sum. So judicious consideration leads us to ask
where the principle is to be followed, and where it is essentially inapplicable.
REMARK 2.4.7 We observed that the extended use of the notation for set-comprehension (Remark 2.2.7) hides an
existential quantifier. Some of the footnotes in the next chapter point out other disguised quantifiers resulting from
our blindness to why one thing is ``less than'' another; this is the abstract version of only considering provability.
Inequalities, structures ``generated'' by a set, and any words which end in ``-able'' or ``-ible'' probably all conceal
similar secrets. Chapter IV provides ways of accounting for the unstated reasons.
EXAMPLE 2.4.8 Square roots, on the other hand, can manifestly be exhibited, but the extreme constructivist would
seem, like Buridan's ass, to be unable to make a choice of them.
The diagram illustrates the squaring function on the unit circle in the complex plane. It is well known that there is
no continuous choice of square roots which can be made all the way around the circle, but there are, according to
Brouwer, no real- (or complex-) valued functions apart from continuous ones (Remark 1.8.6). We seem to be
unable to say that squaring is a surjective function, ∀x.∃y. x = y².
One answer which a constructivist type-theorist might give falls back on the Cauchy sequences (Definition 2.1.2)
used to define the real and complex numbers. Using only the first approximation a₀ ∈ Q[i] to x, we make an
arbitrary selection of b₀ such that b₀² ≈ a₀; this is admissible since Q has decidable equality.
Subsequent approximants bₙ are chosen to be nearer to b₀ than to −b₀. The resulting square root
``function'' does not preserve the equality relation on its nominal source S¹. So tokens for mathematical objects (the
actual terms of Cauchy sequences in this case), rather than the objects themselves, must be used to give the
witnesses of existential quantifiers. A similar idea will emerge from constructions of algebras using generators and
relations in Section 7.4.
I have to say that I am not convinced, and feel that the Example shows that the unwitnessed existential quantifier is
important in mathematics. This and the dependent sum should be seen as two cases (in some sense the extremes) of
a single more general concept, and we shall treat them as such in Section 9.3. To rely on coding is certainly not
conceptual mathematics, even if Per Martin-Löf regards this as constructive.
Another reason for wanting to ignore the witnesses is that, in the study of systems, we need to be able to discard as
much information as we can about the lower levels in order to comprehend the higher ones.
The existence and disjunction properties Classical logic asserts (and thereby gives a trivial proof of the fact)
that φ∨¬φ is true, without necessarily giving a proof of either φ or ¬φ.
REMARK 2.4.9 The normalisation theorem for the sum type (which we shall prove in Section 7.7) says that any
proof ⊢ c:φ∨ψ is either of the form ν₀(a) with ⊢ a:φ or of the form ν₁(b) with ⊢ b:ψ. So the
identification of ∨ with + means that the only way to prove φ∨ψ is by first proving one of the disjuncts. This is
known as the disjunction property of intuitionistic logic.
Similarly, if ∃x.φ[x] is provable then there is some a - which may be found by examination of the given proof - for
which φ[a] is also provable (the existence property). This is what justifies the uniform proofs in Fact 1.7.1. The
classical inter-definability of ∀ and ∃ via negation breaks down because ∀ has no analogous property. For
example, let φ[n] be the statement that n is not the code of a valid proof in Peano arithmetic whose conclusion is
that 0 = 1. Then φ[n] has a proof for each n, but Kurt Gödel (1931) showed that ∀x.φ[x] has no proof in the same
system (Theorem 9.6.2).
Classical logic The failure of the disjunction property in classical logic seems to mean that it has no constructive
interpretation. Nevertheless, Peirce's law,
((θ⇒ ψ)⇒ θ) ⇒ θ,
may be expressed in sequent form as
omitted prooftree
environment
and provides a useful proof box idiom for classical logic. If something in the predicate calculus is provable
intuitionistically, the proof can often be found very easily using the methods of Section 1.7. When proof by
contradiction is needed, the negation of the goal often has to be introduced repeatedly and un-intuitively as a
temporary hypothesis.
The sequent above has been called the restart rule because it is invoked when we have failed to prove the local
goal: it is sufficient to prove any pending goal from an enclosing box. Note that this is a valid classical proof of the
main goal θ, but not necessarily of the sub-goal φ⇒ ψ.
This idiom seems a little less bizarre if we imagine, not a proof, but a program such as a compiler with a recursive
structure tied closely to that of the user's input. The easiest way of dealing with potential errors in the input is to
abandon the structure, ie to jump directly out of scope. Recently, methods have been developed for compiling
higher order programs into very efficient continuation-passing object code. Nevertheless, such idioms are
notoriously difficult to understand: the verification must take account of the fact that there are two ways of getting
to the line restart goal.
Returning to logic, suppose we have a classical proof which doesn't use (⊥E ). Harvey Friedman (1978) observed
that the symbol ⊥ (and its implicit use for negation) may validly be replaced throughout such a proof by any
formula - such as the one to be proved (Remark 1.4.4). Peirce's law serves for excluded middle in this translation;
this explains why box proofs that involve negation or classical logic are very difficult to find. Hence any
proposition of the form ∀n.∃m.φ[n,m] with a classical proof also has an intuitionistic one, so long as φ is primitive
recursive, and in particular decidable. This result does not extend to more complex formulae: for example
∃m.∀n. f(m) ≤ f(n) is provable in classical Peano arithmetic, for any f:N→ N, but (using Gödel again)
∀n. f(m₀) ≤ f(n) need not be provable in the same system for any particular m₀.
See [Coq97] for a survey of the constructive interpretations which can be given to classical logic.
Although we do not exploit such interpretations in this book, continuations do feature throughout, in the
background. The open-ended (∃E )- box and the commuting conversion rule for sums (Remarks 1.6.5 and 2.3.13ff)
were our first encounter with them, and we shall always write them as the symbol z. They will arise in more
interesting ways in our treatments of lists and while programs.
Term and type assignment Another difference between propositions and types apparent in the simpler examples
of symbolic reasoning is that algebra, the λ-calculus and programming typically involve many terms but few
distinct types, whereas in logic we mention many propositions, but the identity of their proofs is unimportant and
usually left implicit.
REMARK 2.4.11 The uninteresting information therefore gets elided, and we might expect development tools to
recover it automatically:
(a)
The type discipline is important, but it is a nuisance for the programmer to have to write this information in
longhand. So one of the jobs of a compiler is to interpolate it, ie to verify that (base types can be chosen
such that) each sub-expression obeys the type discipline. This process is known as type assignment or
reconstruction. For algebraic expressions this is very easy, but it becomes more useful (and difficult) in
polymorphic languages such as ML.
(b)
Conversely, filling in the proof of a proposition from given hypotheses is called term assignment (Section
1.7).
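A miniature type-assignment algorithm in the spirit of (a) can be sketched in Python (an illustration only: the occurs check and the let-polymorphism of real inferencers such as ML's are omitted, and all names are ours):

```python
# Terms: ('var',x), ('lam',x,body), ('app',f,a).
# Types: ('tv',n) for a type variable, ('fn',s,t) for s → t.
from itertools import count

def infer(term, env, fresh, eqs):
    """Assign a type expression to term, collecting equations in eqs."""
    tag = term[0]
    if tag == 'var':
        return env[term[1]]
    if tag == 'lam':
        a = ('tv', next(fresh))
        return ('fn', a, infer(term[2], {**env, term[1]: a}, fresh, eqs))
    f = infer(term[1], env, fresh, eqs)
    x = infer(term[2], env, fresh, eqs)
    r = ('tv', next(fresh))
    eqs.append((f, ('fn', x, r)))          # the operator must map x to r
    return r

def unify(eqs):
    """Solve the collected equations by first-order unification."""
    subst = {}
    def find(t):
        while t[0] == 'tv' and t[1] in subst:
            t = subst[t[1]]
        return t
    while eqs:
        s, t = eqs.pop()
        s, t = find(s), find(t)
        if s == t:
            continue
        if s[0] == 'tv':
            subst[s[1]] = t
        elif t[0] == 'tv':
            subst[t[1]] = s
        else:                              # both are function types
            eqs += [(s[1], t[1]), (s[2], t[2])]
    return subst

def resolve(t, subst):
    while t[0] == 'tv' and t[1] in subst:
        t = subst[t[1]]
    return ('fn', resolve(t[1], subst), resolve(t[2], subst)) if t[0] == 'fn' else t

def principal_type(term):
    fresh, eqs = count(), []
    return resolve(infer(term, {}, fresh, eqs), unify(eqs))

I = ('lam', 'x', ('var', 'x'))
K = ('lam', 'x', ('lam', 'y', ('var', 'x')))
assert principal_type(I) == ('fn', ('tv', 0), ('tv', 0))                  # α → α
assert principal_type(K) == ('fn', ('tv', 0), ('fn', ('tv', 1), ('tv', 0)))
```

The base types are left open (as type variables), so the algorithm verifies that base types *can* be chosen making each sub-expression obey the type discipline.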
The Cn notation We shall exploit the propositions as types analogy, deliberately confusing algebra,
logic and type theory. We write
Cn(Γ, X)
for the set of (αβηδ-equivalence classes of) terms of type X in context Γ. In universal algebra this is known as the
clone, whilst in model theory it is useful to consider the set Cn(Γ) of all consequences of a set of hypotheses.
In the last case, Cn(Γ,θ) is the set of proofs of θ, so the consequences are just those θ for which this is non-empty.
We shall develop this in Chapters IV and VIII.
Induction and recursion are often confused in symbolic logic, but the difference between them is another
one between propositions and types: we prove theorems by induction, but construct programs by
recursion. The two are linked by the General Recursion Theorem 6.3.13 (it is not enough ``to prove by
induction that p(x) is defined,'' as this wrongly treats existence as a predicate).
DEFINITION 2.5.1 A self-calling program is said to obey the recursive paradigm if it defines a function
p(x) of exactly one argument and it proceeds as follows:
(a)
from the given argument t, by means of code known to terminate,
it derives zero or more other sub-arguments u⃗ of the same type,
(b)
then it applies p to each sub-argument uⱼ in parallel, ie with no interaction among the
sub-computations,
(c)
finally, from the sub-results returned by the recursive calls together with the original argument,
by means of another piece of code which is known to terminate, it computes its own result to be
output.
We shall call phases (a) and (c) parsing and evaluation respectively, and write u ≺ t whenever u is a
sub-argument of t. This may mean that u is symbolically a sub-expression of t, in which case we're doing
structural recursion, but we shall use the notation and terminology for any recursive problem which fits
the paradigm, whatever the means by which the us are obtained from t.
Recursive definitions of functions with many arguments, and systems of functions which call each other
( mutual recursion), are no more general than the paradigm, because the arguments or results may be
packaged into a ``tuple.'' The complication which we have not allowed is that a sub- result might be fed
back to help generate another sub- argument. The effect of the paradigm is to allow the parsing of
hereditary sub-arguments to be exhausted before any evaluation is done; this will be expressed more
formally in Definition 6.3.7. A special case, tail recursion, is discussed in Section 6.4, where it is shown
to be equivalent to the imperative while construct.
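The paradigm can be phrased as a single higher-order function: given a terminating `parse` (phase (a)) and `evaluate` (phase (c)), the recursive calls of phase (b) are independent of one another. A Python sketch with invented names:

```python
def paradigm(parse, evaluate):
    """The shape of Definition 2.5.1: parse(t) yields the sub-arguments
    u ≺ t; evaluate combines t with the returned sub-results."""
    def p(t):
        subresults = [p(u) for u in parse(t)]    # phase (b): independent calls
        return evaluate(t, subresults)            # phase (c)
    return p

# Factorial: n has the single sub-argument n-1; 0 has none.
fact = paradigm(lambda n: [n - 1] if n > 0 else [],
                lambda n, rs: n * rs[0] if rs else 1)
assert fact(5) == 120

# Fibonacci: the sub-arguments n-1, n-2 ≺ n are not sub-expressions,
# yet the problem still fits the paradigm.
fib = paradigm(lambda n: [n - 1, n - 2] if n >= 2 else [],
               lambda n, rs: sum(rs) if rs else n)
assert fib(10) == 55
```

Because the sub-computations do not interact, all the parsing can be exhausted before any evaluation is done, as the text observes.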
EXAMPLES 2.5.2
(a)
The familiar example of the factorial function
(b)
A compiler for a programming language takes the source text of an entire program and extracts
the immediate sub-expressions, which it feeds back to itself. It puts the digested code-fragments
together as the translation of the whole program.
(c)
Carl Friedrich Gauss (1815) showed how to transform a polynomial of degree 2n into one of
degree n(2n-1), in such a way that each root of the latter may be used to obtain two roots of the
former by means of a quadratic equation. Any polynomial of odd degree has a real root by the
intermediate value theorem. Here the sub-argument is the auxiliary polynomial and the sub-result
one of its roots.
How do we prove that such programs are correct? There is nothing to say in the case of the factorial
function, because it is evaluated directly from its definition. The correctness of the compiler, ie of its
output (object code) relative to the input (source), is given case by case for the connectives of the
language: for example the Floyd rules in Sections 4.3, 5.3 and 6.4 for imperative languages say that if
the sub-expressions satisfy certain conditions then so does the whole program(-fragment). Finally,
Gauss's paper gave a method of deriving the auxiliary polynomial and showed that if it has a root then
this provides a root of the original one.
Induction In all of these arguments, and whenever we want to prove consequences of recursive
definitions, it is necessary to demonstrate
a typical instance, θ[t], of the required property, from ∀u. u ≺ t ⇒ θ[u], the induction
hypothesis, that all of the sub-arguments have the property,
by methods which are peculiar to the application. Common usage often forgets that the θ[t] so proved is
not that of the ultimate conclusion, ∀x.θ[x], but only a stepping stone. In order to justify a single θ[a],
the induction step must be proved for every t (or at least for all descendants of a). Such sloppiness in the
presentation can give the impression that induction is circular.
To obtain the desired result, we have to use an additional rule, of which the proof ``by induction'' is the
premise.
omitted prooftree
environment
where ≺ is a binary relation on a set X. We say that ≺ is well founded if the induction scheme is
valid. It is a scheme because it is asserted for each predicate θ (which may involve parameters) on X;
quantification over predicates introduces second order logic (Section 2.8).
The important point formally is that the variable t in the premise and the induction hypothesis which it
satisfies are bound in an (∀⇒ ℑ) proof box, so nothing is to be assumed about t apart from this.
We have followed tradition in using a symbol ≺ suggesting order for well founded relations, but this
is extremely misleading. The motivating examples of immediate sub-processes and sub-expressions
above are not transitive, and Corollary 2.5.11 shows that ≺ is irreflexive.
Genealogical analogies can be useful, but there is a conflict of intuition about the direction. In set theory
it is traditionally synthetic: ``on the first day, God made ∅'' [Knu74]. Our recursive paradigm is analytic,
and sub-processes are usually called children, so ∅ is the ultimate descendant.
∀u. u ≺ t ⇒ φ[u].
Theorem 3.8.11 uses this to show that the strict induction scheme
omitted prooftree
environment
(with ⇔ ) is equivalent to the one in Definition 2.5.3.
EXAMPLES 2.5.5
(a)
If no instances of the induction hypothesis are actually used in the body of the proof, it is simply
an (∀ℑ ) (Definition 1.5.1).
(b)
Let n ≺ n+1 on N. The rule reduces to the familiar Peano scheme,
omitted prooftree
environment
which is also known as simple, primitive, or just mathematical induction. For example, it proves
∑_{k=1}^n k² = ⅙n(n+1)(2n+1).
(c)
Let ≺ be the strict arithmetical order < on N. Then we have course-of-values induction
(sometimes called complete induction):
omitted prooftree
environment
This relation ≺ is the transitive closure of the previous one. Notice that in the base case (n = 0) the
hypothesis ∀i < n.θ[i] is vacuous.
Whereas simple induction deals with orderly, step-by-step reductions (like nuclear decay by α- or β-
radiation), numerous algorithms split up graphs into parts whose size may be anything between zero and
the original size (like fission). For such problems we need course-of-values induction. But the problem
for a graph with n nodes does not usually reduce to sub-problems involving graphs of every size < n (the
products of fission of ²³⁵U by neutrons typically have masses about 90 and 145), so ≺ is usually
(d)
For a recursive program in a functional language, the invariant or induction hypothesis behaves
just as in the mathematical setting: the induction premise is that the result on an argument t is
correct as long as the recursive calls (uⱼ) have been computed correctly.
(e)
Since the order of any subgroup or quotient divides that of a finite group, many properties of
groups (notably the Sylow theorems, Fact 6.6.8, and their applications) are shown by induction
on the divisibility orders on N. This may be seen as just another example of course-of-values
induction (c). Alternatively, prime factorisation expresses the positive integers as (part of) an
infinite product of copies of N, which may be given either the product order or, as with Gauss's
proof, the lexicographic one.
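The contrast between (b) and (c) can be seen computationally. In the following Python sketch (an illustration of ours, not from the text), the first function follows the simple scheme, each value built from its predecessor, while the second follows course-of-values recursion, the value at n depending on the values at every smaller size:

```python
from functools import lru_cache

# Simple (Peano) induction: the value at n is built from the value at n-1.
def sum_of_squares(n):
    return 0 if n == 0 else sum_of_squares(n - 1) + n * n

def closed_form(n):
    return n * (n + 1) * (2 * n + 1) // 6   # the identity proved by induction

# Course-of-values recursion: counting binary trees with n nodes; the root
# splits the remaining n-1 nodes into parts of *arbitrary* smaller size.
@lru_cache(maxsize=None)
def trees(n):
    if n == 0:
        return 1                             # base case: the empty tree
    return sum(trees(i) * trees(n - 1 - i) for i in range(n))

assert all(sum_of_squares(n) == closed_form(n) for n in range(100))
```

Here trees(n) is the nth Catalan number; the memoisation is what makes the course-of-values recursion affordable.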
Section 2.7 considers induction and recursion for numbers and lists.
Minimal counterexamples If some property is not universally true of the natural numbers, there is a
least number for which it fails. (Georg Cantor generalised this idea to define the ordinals, Section 6.7.)
This least number principle depends on excluded middle, but with a little care such proofs can be made
intuitionistic.
PROPOSITION 2.5.7 [Richard Montague, 1955] A relation \prec is well founded iff every non-empty
subset has a \prec -minimal element.
PROOF: Let ∅ ≠ V ⊂ X, and put θ[x] = (x ∉ V). This means that the conclusion of the induction scheme
( ∀x.θ[x]) fails. The scheme is therefore valid iff the premise also fails, ie (classically) iff there is some t ∈ V all of whose \prec -predecessors lie outside V, in other words a \prec -minimal element of V. []
REMARK 2.5.8 The classical idiom of induction is to use the minimal counterexample to show that ∀x.θ
[x] by reductio ad absurdum. Like Example 1.8.2, this is very often gratuitous. The hypothesis ``let t be a
minimal counterexample'' breaks into two parts:
(a)
t is a counterexample, \lnot θ[t], and
(b)
anything less, u\prec t, is not a counterexample, ie ∀u\prec t.θ[u].
The second part is the induction hypothesis as given in Definition 2.5.3. Commonly, θ[t] can be
deduced without using the first part, so this may be eliminated to give a sound intuitionistic argument.
Imagine a proof by induction as the tide climbing up a beach: in the end the whole beach gets
submerged. The induction scheme requires us to show that any part which is above only water soon gets
wet. The classical induction hypothesis is that there is somewhere which is on the water-line but still
completely dry - a rather implausible supposition!
COROLLARY 2.5.9 Any (properly) recursive definition of a function which terminates on all values
includes an analysis ( parsing) into
(a)
base cases (leaves or \prec -minimal elements) at which the value of the function is immediately
given in the code (in Examples 2.5.2 these are respectively 0! = 1, constants of the language and
odd-degree polynomials), and
(b)
induction steps (branches) at which the value of the function is given by a k-ary operation applied
to the values at recursive arguments; k = 1 in three of the examples, but for a compiler the arities
are exactly those of the connectives of the language. []
This feature of well founded structures and free algebras will recur many times, in the rest of this
chapter, in Sections 3.7- 3.8, and in Chapter VI.
Descending chains Dependent Choice (Definition 1.8.9) gives another condition which is non-
constructively equivalent to well-foundedness.
PROPOSITION 2.5.10 (X,\prec ) is well founded iff there is no sequence of the form ···\prec u_3\prec u_2\prec u_1\prec u_0.
PROOF:
● [[⇒ ]] If there is a sequence (u_n) with ∀n.u_{n+1}\prec u_n then U = {u_n|n ∈ N } is a non-empty set with no \prec -minimal element; alternatively the predicate θ[x] ≡ (∀n.x ≠ u_n) does not satisfy the induction scheme.
● [[⇐ ]] If (X,\prec ) is not well founded, it has a non-empty subset U ⊂ X with no minimal element. In other words, ∀t.t ∈ U⇒ ∃u.u ∈ U∧u\prec t, and some u_0 ∈ U, so the axiom of dependent choice applies. []
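For a finite relation the proposition can be checked mechanically: well-foundedness amounts to the absence of a cycle reachable by descent. A Python sketch (the function name and the encoding of \prec as a set of pairs are our own):

```python
# For a *finite* relation, well-foundedness is the absence of a cycle
# reachable by descent; prec is a set of pairs (u, t) meaning u ≺ t.
def is_well_founded(X, prec):
    below = {t: {u for (u, s) in prec if s == t} for t in X}
    status = {}                       # missing = unvisited, None = in progress

    def descend(t):                   # depth-first search for descending cycles
        if t in status:
            return status[t] is True  # None (still in progress) signals a cycle
        status[t] = None
        ok = all(descend(u) for u in below[t])
        status[t] = ok
        return ok

    return all(descend(t) for t in X)
```

For example, the successor relation on {0,1,2,3} passes, while any cycle fails.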
The geometrical form of Euclid's algorithm (Example 6.4.3) was perhaps the first statement of
induction: an infinite sequence of numbers is found, each less than the one before, which, as
Elements VII 31 says quite clearly, is impossible amongst whole numbers. When this algorithm is
applied to the side and diagonal of a square or pentagon, however, the successive quotients form a
periodic sequence. David Fowler [Fow87] has made an appealing reconstruction of classical Greek
mathematics, in which he claims that Euclid's book was motivated by problems like this. (The story that
the discovery of the irrationality of √2 led to the downfall of the Pythagorean sect seems to have been an
invention of Iamblichus nearly 1000 years later.)
In the investigation of the triangle which bears his name, Blaise Pascal (1654) stated lemmas for one
case and for any ``following'' one, and concluded that the result holds for all numbers. John Wallis
(1655) may also have been aware of the logical principle, but Pierre de Fermat (1658) was the first to
make non-trivial use of the method of infinite descent to obtain positive results in number theory
(Exercise 2.33).
A variant of this result is (Dénes) König's lemma (1928). A tree is an oriented graph with a
distinguished node (its root) from which to each node there is a unique oriented path. Since in most
applications the branches (outgoing arrows) from any node are labelled in some fashion, the Choice of
branch may usually be made canonically. To any programmer, the procedure in König's lemma is depth-first search.
COROLLARY 2.5.11 If, in a tree, every node has finitely many branches and there is no infinite oriented
path, then the tree is finite.
PROOF: As there is no ω^op-sequence, the branch relation is well founded, so induction on it is valid.
Using this, every sub-tree is finite. []
Using this formulation of induction is not so easy as it may appear: somehow we have to find that
forbidden infinite sequence, which usually requires König's lemma.
Notice that we never said that the terms in the sequence had to be distinct; of course if only finitely
many values may occur then any infinite sequence of them must contain repetitions ( loops).
COROLLARY 2.5.12 If the relation has a cycle, u\prec u, or u\prec v\prec u, or u\prec v\prec w\prec u, etc,
then \prec is not well founded. []
Proof trees The graphical presentation, as a tree without cycles, shows the difference between the
induction step which is to be proved and the induction hypothesis which can be assumed.
REMARK 2.5.13 The ingredients for an inductive proof of θ[n] by the Peano primitive induction scheme
are the axiom/zero θ[0] and the rule/successor ∀n.θ[n]⇒ θ[n+1]. If we need to prove θ[3], then these
may be composed, showing the different roles of the occurrences of θ. Compare this with the proofs we
called 0, 1 and 2 in Example 2.4.1, and notice how the structure of the term directly gives that of the
proof.
Termination Infinite descending sequences are very familiar to anyone who has tried to execute an ill
founded recursive program: the sequence is stored on the machine stack, which overflows. (Non-
terminating loop programs are usually silent.)
REMARK 2.5.14 (Robert Floyd, 1967) In what circumstances can a recursive program p which obeys the
paradigm (Definition 2.5.1) fail to terminate on a particular argument \termu0? If, and only if, the
execution of p(\termu0) involves the computation of some p(\termu1), which in turn requires some p
(\termu2) and so on ad infinitum.
For any two values t and u of the domain of definition, write u\prec_p t if u is one of the sub-arguments generated in the first step of the computation of p(t). Then the program p terminates on all arguments iff \prec_p is well founded. Let θ[x] denote that p terminates on the argument x. (This is a higher order predicate; indeed its mathematical content is really that \prec_p restricted to the descendants of x is well founded.) The premise of the induction scheme holds by assumption on the form of p, that the first and third stages of the program terminate. The conclusion is universal termination, so this is the case iff the induction scheme is valid. []
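Floyd's criterion can be simulated by instrumenting a recursive program to record the sub-argument relation \prec_p as it arises. In this Python sketch the program p is a made-up illustration of the paradigm, not one from the text:

```python
# p is an illustrative recursive program obeying the paradigm: parse the
# argument into one sub-argument, recurse, then evaluate. Each pair
# u ≺_p t is logged as it is generated.
calls = []

def p(t):
    if t == 0:
        return 0                      # base case (leaf)
    u = t // 2                        # sub-argument produced by parsing t
    calls.append((u, t))              # record u ≺_p t
    return p(u) + t                   # evaluation phase

p(13)
# on this run the recorded relation descends in N, so it is well founded
assert all(u < t for (u, t) in calls)
```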
A good definition in mathematics is one which is the meeting point of examples and theorems, cf
introduction and elimination rules in logic. Definition 2.5.3 is such a meeting point. Minimal
counterexamples and descending sequences may perhaps help to give you a picture of how induction
works, but they do not prove it; without excluded middle they cannot even prove the results for which
induction is needed.
Complexity measures A recursive program terminates iff its sub-argument relation is well founded,
but usually the only way to calculate the number of iterations needed is to execute the program itself.
This doesn't matter because, as the first result shows, if we can show that something is reduced at each
iteration then the loop terminates. This quantity is called a loop measure.
DEFINITION 2.6.1 A function f:(X, < )→ (Y,\prec ) between sets with binary relations is said to be strictly monotone if
∀u,t: X. u < t ⇒ f(u)\prec f(t).
If such a function exists and (Y,\prec ) is well founded then (X, < ) is also well founded; this is what justifies loop measures.
PROOF: omitted
EXAMPLES 2.6.3 For the majority of applications, including the first two of Examples 2.5.2 (factorial
and compiler), termination is very easy, because we can see directly that the sub-argument is a smaller
number, a shorter string of symbols or a shallower tree. Gauss's proof is shown to terminate with a little
more effort: the sub-arguments are polynomials of degree 2^k o_0, 2^{k-1} o_1, ..., 2 o_{k-1}, o_k, where each o_i is odd.
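Euclid's algorithm furnishes the classic loop measure: the second argument is strictly reduced by every iteration, giving a strictly monotone function into (N, < ). A Python sketch of ours, with the measure asserted at each step:

```python
# Euclid's algorithm: the measure n is strictly reduced by each iteration,
# ie the loop body (m, n) -> (n, m % n) strictly decreases the second
# component in (N, <), so the loop terminates.
def gcd(m, n):
    while n != 0:
        before = n
        m, n = n, m % n
        assert n < before             # the loop measure strictly decreases
    return m
```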
EXAMPLE 2.6.4 Normalisation for the λ-calculus (Fact 2.3.3) is a qualitatively more difficult problem.
For weak normalisation, we must define a reduction strategy, ie a way of choosing a redex of any term t,
such that the relation u\prec t is well founded, where u is the result of the chosen reduction. This is a tail
recursion because there is just one reduced term, which we regard as a sub-argument (in the sense of
Definition 2.5.1(a)), and whose normal form is that of the original term. The entire normalisation
process happens in the parsing phase of the recursive paradigm, with trivial evaluation, ie it is tail-
recursive.
The strictly monotone function used to show termination takes the term t to the set of types of its
subterms, and so the proof depends on subject reduction (Definition 1.2.12), and fails for the untyped λ-
calculus.
A subset U ⊂ X is an initial segment if it is closed downwards:
∀t,u: X. t\prec u ∈ U⇒ t ∈ U.
PROPOSITION 2.6.6 Let (X,\prec ) be a set with a binary relation. Suppose for every x ∈ X there is a well
founded initial segment X′ ⊂ X with x ∈ X′. Then X is itself well founded.
PROOF: (Essentially Remark 2.5.13.) Let θ be a predicate on X for which the premise of the induction
scheme holds, and x ∈ X: we have to show θ[x]. Let X′ ⊂ X be a well founded initial segment with x ∈ X
′. The premise still holds when restricted to X′, and by well-foundedness the conclusion, ∀x′.x′ ∈ X′⇒ θ[x′], follows; in particular θ[x] holds. []
It follows, for example, that the disjoint union of sets with well founded relations is well founded.
Products There are three ways of putting a well founded relation on the product of two sets, depending
on how strict we are about descending on the two sides.
PROPOSITION 2.6.7 The cartesian product of a well founded relation (Y,\prec ) with an arbitrary relation
(X, < ) is well founded:
(x′,y′)\prec (x,y) if (x′ < x) ∧ (y′\prec y).
The next construction was popular in classical mathematics because it preserves trichotomy; in
particular this is how to multiply ordinals. See Exercise 2.40 for a more general result.
The lexicographic product,
(x′,y′)\prec (x,y) if (x′\prec_X x) ∨ (x′ = x ∧ y′\prec_Y y),
is well founded if both \prec_X and \prec_Y are.
PROOF: The induction hypothesis is of the form (α∨β)⇒ θ; this is equivalent to (α⇒ θ)∧(β⇒ θ), which
is more convenient.
[]
omitted displaymath environment
in which descent is necessary on both sides, at least one of them being strict. This is well founded,
assuming that both \prec_X and \prec_Y are.
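Ackermann's function is the standard application of the lexicographic product: every recursive call strictly decreases the pair (m,n) lexicographically, so the function terminates even though its second argument may grow. A Python sketch:

```python
import sys
sys.setrecursionlimit(100_000)

# Each recursive call strictly decreases (m, n) in the lexicographic
# product of (N, <) with (N, <), which is well founded; hence ack
# terminates, although n may grow without bound.
def ack(m, n):
    if m == 0:
        return n + 1
    if n == 0:
        return ack(m - 1, 1)          # first component decreases
    return ack(m - 1, ack(m, n - 1))  # (m, n-1), then (m-1, anything)
```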
Although box proofs such as these are not difficult to find, they are nevertheless quite complicated. To
express them in the vernacular would require an ad hoc lemma for each box (Remark 1.6.2), but this
would only take us backwards conceptually. Intuitionism has forced us to devise auxiliary predicates
(ψ), and it is by investigating their role that we make progress in Chapter VI and [ Tay96b].
The predicate calculus has a better claim to being the ``machine code'' of mathematics than set theory or
the Sheffer stroke does, but machine code is always rather clumsy in handling higher level idioms.
Monotonicity,
which was elided from the end of the first proof, is the first of many concepts which need to be coded on
top of the predicate calculus, but which have foundational status themselves. Monotonicity is the subject
of the next chapter. In fact the modal operator (Definition 3.8.2)
is the key to reducing these proofs to easy calculations; Definition 2.6.1, for example, says that [\succ ][f]
θ⇒ [f][ > ]θ. Although well-foundedness of the transitive closure and the strict induction scheme can be
studied using the methods which we have described, this will be much simpler with modal logic in
Theorem 3.8.11 and Exercise 3.54 . Arithmetic for the ordinals (Section 6.7) packages the techniques of
this section.
Structural recursion over free algebras is crucial for foundations because, from the outside, the
mathematical world is just a string of symbols: to handle it at the most basic level we need concatenation
and parsing operations. On the other hand, van Kampen's Theorem 5.4.8 illustrates that lists pervade
mathematics way beyond foundational considerations. Finite sets arise even more frequently, but to
count a set means to form an exhaustive, non-repeating list of its elements (Section 6.6).
Induction for numbers and lists Richard Dedekind (1888) studied the natural numbers as well as the
reals, and gave the following axiomatisation, but it is usually attributed to Giuseppe Peano (1889).
DEFINITION 2.7.1
(a)
0:N;
(b)
if n:N then succn:N;
(c)
if n:N then 0 ≠ succn;
(d)
if n,m:N and succn = succm then n = m;
(e)
if U ⊂ N is a subalgebra, ie 0 ∈ U and ∀n:N.n ∈ U⇒ succn ∈ U, then U = N.
The last is the induction scheme, which we have already mentioned as an example of well founded
induction in Example 2.5.5(b).
The set of lists of elements of a set X may be defined in a similar way. The set X is sometimes called an
alphabet, and lists are words. Other names for lists are strings, paths and texts.
DEFINITION 2.7.2
(a)
The empty list, written nil, ∗ or [ ], is in List(X);
(b)
if h:X and t:List(X) then cons(h,t): List(X), the list constructed from the head h and tail t; some
authors write h::t for this;
(c)
∗ ≠ cons(h,t);
(d)
if cons(h,u) = cons(k,v) then h = k and u = v;
(e)
if U ⊂ List(X) is a subalgebra, ie ∗ ∈ U and ∀h:X.∀t:List (X). t ∈ U⇒ cons(h,t) ∈ U, then U =
List(X).
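Definition 2.7.2 can be transcribed almost verbatim; in this Python sketch we model nil by None and cons by pairing (our choice of representation), and use the recursion justified by axiom (e):

```python
# nil and cons, modelled (our choice) by None and pairing; the last
# axiom justifies defining functions by recursion on the list structure.
nil = None

def cons(h, t):
    return (h, t)

def to_builtin(l):                    # list recursion: [] at nil,
    return [] if l is nil else [l[0]] + to_builtin(l[1])

l = cons(1, cons(2, cons(3, nil)))    # the list [1, 2, 3]
```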
Stephen Kleene used the notations X* = List(X) and X+ = X*\{[ ]} in the theory of regular grammars, cf
\leadsto * for the transitive reduction relation (Definition 1.2.3), but we shall not use them. It is usual to
write [x_1,…,x_n] for the list with these elements.
The head and tail operations are also known as car and cdr. These are fossils of John McCarthy's
original implementation of LISP in 1956 on an IBM 704. This machine had a 36-bit register with two
readily accessible 15-bit parts called address and decrement, of which car and cdr extracted the
contents. Lists also feature in a dynamic or imperative context as stacks, where cons is push and (head,
tail) together correspond to pop.
DEFINITION 2.7.3 The last axiom says that the relation t\prec cons(h,t) is well founded and gives rise to
list induction:
omitted prooftree
environment
For the one-letter alphabet X = {s}, this is just Peano induction on List({s}) ≡ N, where cons is the
successor.
Concatenation Many of the uses of lists can be seen as simply ``adding them up,'' where the notion of
addition is some associative operation with a unit, ie it defines a monoid. Concatenation of lists is the
generic such operation. In the following examples, notice that functions of two lists may often be
defined by recursion on only one of them.
DEFINITION 2.7.4
(a)
The concatenation of two lists is defined by structural recursion on the first of them (this
operation is also called append):
∗;l = l     cons(h,t);l = cons(h,(t;l)).
Section 4.2 explains why we use a semicolon for this as well as for relational composition
(Definition 1.3.7).
(b)
Let M be a set with an element e and a binary operation m, and f:X→ M any function. Define a
function fold(e,m,f,-):List(X)→ M by
fold(e,m,f,∗) = e     fold(e,m,f,cons(h,t)) = m(f(h), fold(e,m,f,t)),
so f(x) = fold(e,m,f,[x]).
For practical purposes, these recursive definitions are rather inefficient; Exercise 6.28 gives
extensionally equivalent versions of fold and append which are tail -recursive, ie they are essentially
while programs. But we need to know much more about lists and monoids to understand how to
transform programs in this way.
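As an illustration (ours, not Exercise 6.28 itself), here are append and fold as structural recursions on the nested-pair representation, together with an accumulator version of fold that is essentially a while program; the two folds agree when (M,e,m) is a monoid:

```python
# append recurses on its first argument only, exactly as in (a);
# fold is the recursion of (b).
def append(l, l2):
    return l2 if l is None else (l[0], append(l[1], l2))

def fold(e, m, f, l):
    return e if l is None else m(f(l[0]), fold(e, m, f, l[1]))

# An accumulator sketch of the tail-recursive idea: it agrees with fold
# when m is associative with unit e.
def fold_left(e, m, f, l):            # essentially a while program
    acc = e
    while l is not None:
        acc = m(acc, f(l[0]))
        l = l[1]
    return acc
```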
PROPOSITION 2.7.5
(a)
Lists form a monoid with ∗ as unit and (;) as composition:
(b)
If (M,e,m) is a monoid then l→ fold(e,m,f,l) is a homomorphism for the binary operations:
fold(e,m,f,(l;l1)) = m(fold(e,m,f,l), fold(e,m,f,l1)).
PROOF: To use list induction in each case we have to identify the variable l and the predicate θ[l]
(namely the displayed equation), then prove the base case θ[∗] and the induction step ∀h.∀t.θ[t]⇒ θ
[cons(h,t)]. We are given one unit law, ∗;l = l, and so the base case, ∗;∗ = ∗, for the other.
(a)
[[right unit, step]] Since (t;∗) = t by the induction hypothesis,
cons(h,t);∗ = cons(h,(t;∗)) = cons(h,
t).
(b)
[[associativity, base]] ∗;(l1;l2) = l1;l2 = (∗;l1);l2.
(c)
[[step]] omitted eqnarray* environment
(a)
[[base]] m([[∗]] ,[[l1]]) = m(e,[[l1]]) = [[l1]] = [[∗; l1]]
(b)
[[step]] omitted eqnarray* environment
EXAMPLES 2.7.6 The operation fold has countless programming applications. We freely mix (functional) programs and
mathematical notation.
(a)
length(l) = fold(0,+,(λx .1),l) ∈ N.
(b)
reverse(l) = fold(∗,( λl1,l2. l2;l1), (λx.cons (x,∗)),l) ∈ List(X).
(c)
fold(0,+,id,l) is the sum of the elements.
(d)
map(f,l) = fold(∗,(;),f, l) applies a function f:X→ Y to each of the elements of a list ``in parallel,''
so
map(f,cons(h,t)) = cons(f(h),map(f,t))
(it is the effect of the functor List: Set→ Set, Section 4.4).
(e)
fold(⊥,∨,(λx.y = x),l) is a proposition, namely whether the value y occurs as an element of l.
(f)
fold(∅,∪,(λx.{x}), l) ∈ P(X) provides the set of elements of l.
(g)
If equality in X is decidable, fold(0,+,(λ x.x = y),l) ∈ N counts the occurrences of y in l, where ``x
= y'' is the function which returns 1 if they're equal and 0 otherwise.
(h)
l→ fold(∗,(;),id, l):List(List(X))→ List(X) flattens a list of lists.
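Several of these applications can be run directly. A Python sketch, again with lists as nested pairs:

```python
def fold(e, m, f, l):
    return e if l is None else m(f(l[0]), fold(e, m, f, l[1]))

l = (3, (1, (3, None)))               # the list [3, 1, 3]

length = fold(0, lambda a, b: a + b, lambda x: 1, l)                   # (a)
total  = fold(0, lambda a, b: a + b, lambda x: x, l)                   # (c)
member = fold(False, lambda a, b: a or b, lambda x: x == 1, l)         # (e)
count3 = fold(0, lambda a, b: a + b, lambda x: 1 if x == 3 else 0, l)  # (g)
```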
Useful though they are, fold and append are not in fact the fundamental operations of recursion over
lists.
REMARK 2.7.7 Zero and successor are the introduction rules for N:
omitted prooftree environment
The two cases are each matched by the β-rules, which say how to compute with rec,
m+n = rec(m,succ,n) and m×n = rec(0,(m+(-)),n).
For factorial, we need the argument n in the evaluation phase (Definition 2.5.1(c)). Exercise 2.46 uses
pairs to reduce this to the basic case.
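The β-rules for rec execute as they stand. In this Python sketch (ours), rec(z,s,n) applies s to z exactly n times:

```python
# rec(z, s, 0) = z   and   rec(z, s, n+1) = s(rec(z, s, n))
def rec(z, s, n):
    acc = z
    for _ in range(n):
        acc = s(acc)
    return acc

def add(m, n):
    return rec(m, lambda k: k + 1, n)       # m+n = rec(m, succ, n)

def mul(m, n):
    return rec(0, lambda k: add(m, k), n)   # m×n = rec(0, (m+(-)), n)
```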
Predecessor and pattern matching Successor and cons would be bijective functions, but for the
omission of one element (zero or the empty list) from their images. It is tempting to overlook this and
define ``inverses,'' pred(n+1) = n, tail(cons(h,t)) = t and head(cons(h,t)) = h.
REMARK 2.7.9 The new operations are only partially defined, but we may extend them as we like to total
functions, since the support is complemented. For example pred(0) may be taken to be zero, ``error'' or
``exception.'' (The last is a non-judgemental word for error, such as the exit from a loop.)
Although the operations can be forced to be total in this fashion, the rules, such as succ(pred(x)) = x,
are only conditionally valid. A similar situation arises with division by zero, and we shall discuss how
algebraic methods may be extended to handle it in Examples 4.6.4 and 5.5.9.
A more flexible approach is to say that each case offers a pattern r(\vec x ) against which the terms
may be matched. Then we may define functions as we like by case analysis, so long as the patterns are
mutually exclusive (non-overlapping). The function so defined is total iff the patterns are also
exhaustive.
In particular, definitions of ``well formed'' formulae (wffs) in complex type theories (Section 6.2) may
be given by side-conditions involving the outermost operation-symbol(s). Sometimes we have to parse
more than once, the general procedure being unification (Section 6.5).
Type-theoretic rules for lists We shall now give the Gentzen-style presentation for lists, but if you are
not yet familiar with the rules for the sum type you should skip the remainder of this section.
REMARK 2.7.10 Empty list and cons are introduction rules for List(X):
The operator listrec must also be invariant under substitution. By using a function \ops_Θ instead of a
term with free variables of types X and Θ we have avoided introducing yet another variable binder
together with its α-equivalence ( cf Remark 2.3.11).
COROLLARY 2.7.11 List(X) - in particular N ≡ List(1) - is the free algebra for two different theories:
(a)
One constant (variously called 0, ∗, [ ], \EMPTY or nil), together with one unary operation
( succ or λn.n+1), or in general ( λt.cons(h,t)) for each element h ∈ X, and no laws. For any
other such structure (Θ,\opz_Θ,\ops_Θ), listrec defines the unique mediating homomorphism.
(b)
A single associative binary operation (+ or ;) with a unit (0, etc ) and a generator 1, or h for each
element h ∈ X. The mediating homomorphism is given by fold. []
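Computationally, the mediating homomorphisms of (a) and (b) are related as follows; a Python sketch with our own argument conventions:

```python
# listrec: the unique mediating homomorphism for the free theory (a),
#   listrec(z, s, nil) = z,  listrec(z, s, cons(h, t)) = s(h, listrec(z, s, t))
def listrec(z, s, l):
    return z if l is None else s(l[0], listrec(z, s, l[1]))

# fold for the monoid presentation (b) arises as an instance of listrec:
def fold(e, m, f, l):
    return listrec(e, lambda h, r: m(f(h), r), l)
```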
REMARK 2.7.12 The terms \opz_Θ:Θ and \ops_Θ:X×Θ→ Θ in the premises of (List E) are called the seed
and action of X on Θ. The continuation rule says that if z:Θ→ Θ′ is a homomorphism for this structure,
omitted prooftree
environment
(There is no connection between z and \opz_Θ at the moment: this notation will make more sense in
Section 6.4.)
Although we have written Proof after these assertions, it will take us until Section 6.3 to show that List
(X), the type of lists with elements from a given set X, actually exists in Zermelo type theory and has the
required properties. On the other hand, in functional programming it is more appropriate to treat List as a
new type constructor like +, x and → , together with the rules in Remark 2.7.10.
List recursion is a type-theoretic phenomenon for which we haven't yet given the propositional analogue.
This is induction for the reflexive-transitive closure of any binary relation, and is discussed in Sections
3.8 and 6.4. This will be based on another general induction idiom (on closure conditions), given in
Definition 3.7.8. Similar methods are used for finite subsets in Section 6.6, which we shall also discuss
using lists. In Chapter VI we shall introduce a new approach to induction based on a categorical analysis
of free algebras. Theorem 6.2.8(a) constructs the free category on an oriented graph using the list idea.
First order schemes Before ascending to the second order, let us note first that there is a tradition
(with almost a strangle-hold over twentieth century logic [ Sha91]) of reading any quantification over
predicates or types as a scheme to be instantiated by each of the formulae which can be defined in the
first order part. This has a profound qualitative effect.
REMARK 2.8.1 The completeness of first order model theory - the fact that the syntax and semantics
exactly match (Remark 1.6.13) - has strange corollaries for the cardinality of its models. If a theory has
arbitrarily large finite models then it has an infinite one (the compactness theorem), and in this case
there are models of any infinite cardinality (the Löwenheim-Skolem Theorems). Second order logic has
no such property.
The first order theory of R is of algebraic interest in itself. It suffices to say that -1 is not a sum of
squares, and every odd-degree polynomial equation has a root, cf Example 2.5.2(c).
Addition and multiplication are definable by recursion from successor in second order Peano arithmetic
(Example 2.7.8). Mojżesz Presburger (1930) showed that, in stark contrast, first order arithmetic with
addition is decidable, as it can only express (in)congruence modulo fixed numbers and so-called linear
programming. Arithmetic with addition and multiplication allows Diophantine equations, which are
undecidable.
The type of propositions Even though set theory can be presented in a first order meta-language, the
subject itself is plainly intended to handle higher order logic. By the definition of powerset
(Definition 2.2.5ff), the type of propositions is isomorphic to P(1).
NOTATION 2.8.2 We shall use the symbol Ω for the type of propositions. It is a hybrid of the Greek letter
Ω (omega) used for this purpose in topos theory, and the digit 2, since classically Ω = {⊥,T}.
PROPOSITION 2.8.3 P(X) ≡ Ω^X.
PROOF: Write λx:X.φ instead of {x:X|φ[x]} for the term of type P(X) corresponding to the predicate φ[x] .
The membership relation (a ∈ U) is application (Ua) and the β-, η- and equality rules for the powerset
are special cases of those for the λ-calculus. []
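Concretely, a subset and its characteristic function determine one another, with membership as application; a Python sketch in which bool stands in for Ω:

```python
# A subset U ⊂ X and its characteristic function χ : X → bool (standing
# in for Ω) determine one another; membership is function application.
X = {0, 1, 2, 3, 4}
U = {1, 3}

chi = lambda x: x in U               # the term λx.φ[x] for {x : X | φ[x]}
U_again = {x for x in X if chi(x)}   # comprehension recovers the subset

assert U_again == U
```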
REMARK 2.8.4 Equality of terms of type Ω satisfies (φ ⇔ ψ)⇒ (φ = ψ), so bi-implication (⇔ ) is a congruence (Definition 1.2.12). This special case of the extensionality axiom
(Remark 2.2.9(a)) is rather mysterious (Exercise 2.54); Remark 9.5.10 considers what need there is for
it.
Definability of the connectives In Remarks 2.3.11 and 2.7.10 we made use of the λ-calculus to avoid
the need to introduce new variable-binding operations for the (+E)- and (List E)-rules.
This is little more than a shift of notation, but there is a further reduction based on a more profound idea.
In the following formulae (due to Russell) we use θ for the variable bound by ∀, because we shall find
that it always plays the same special role, namely that of the arbitrary conclusion or result type in the
elimination rules for ⊥, ∨, +, ∃ and List:
omitted eqnarray* environment
where x has any type X (which may be Ω), υ:X→ Ω and the other variables are of type Ω. In fact the
formulae on the right satisfy the introduction and elimination rules for the connectives, even when the
latter are not in the language.
PROOF: Consider the third: φ\vdash ∀θ.(φ⇒ θ)⇒ θ amounts to (⇒ E ); for the converse put θ = φ. The
other results are obtained by substituting a first order equivalence into this one. []
We shall give a detailed proof for the conjunction in Example 2.8.12, from which we see that these
equivalences do not depend on the extensionality axiom (Remark 2.8.4). There is a similar result for
equality, cf congruence, Definition 1.2.12.
PROOF: Suppose first that we also have p:X\twoheadrightarrow P(X) splitting m, ie pom = \id_{P(X)}. Put R = {x:X|x ∉ p(x)}; then p(m(R)) = R, so m(R) ∈ R ⇔ m(R) ∉ p(m(R)) = R, which is absurd.
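The diagonal subset R at the heart of the argument is easy to exhibit. A Python sketch on a finite X, with a sample (necessarily non-surjective) p of our own choosing:

```python
# For any p : X -> P(X), the subset R = {x | x not in p(x)} differs from
# every p(x), so p cannot be surjective. X and p here are sample data.
def diagonal(X, p):
    return {x for x in X if x not in p(x)}

X = {0, 1, 2, 3}
p = {0: set(), 1: {1, 2}, 2: {0, 2}, 3: {0, 1, 2, 3}}.get
R = diagonal(X, p)

assert all(R != p(x) for x in X)
```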
According to Cantor (1891), P(X) is ``bigger than'' X, since there is an injective function x→ {x} in one
direction but none the other way. He was ignoring Galileo's 1638 warning about ``how gravely one errs
in trying to reason about infinities by using the same attributes that we apply to finites,'' in response to
the observation that the squares form a proper but equinumerous subset of N, from which he concluded
that ``equal, greater and less have no place in the infinite.'' Cantor's interpretation has prevailed (so far),
even though it is well known that his motivations were religious at least as much as they were
mathematical [Dau79]. Much has also been made of the self- referential nature of this and similar results
such as Gödel's Incompleteness Theorem [Hof79]. We shall return to these matters at the end of the
book.
Second order types Quantification over predicates was needed for induction, so (applying the ideas of
Section 2.4) quantification over types should tell us something about recursion. We leave discussion of
the type of types to the final section of the book.
EXAMPLE 2.8.9 If we delete the dependency on n, Peano induction for N becomes the second order
formula ∀θ:Ω. θ⇒ (θ⇒ θ)⇒ θ.
DEFINITION 2.8.10 The second order polymorphic λ-calculus was introduced independently in proof
theory by Jean-Yves Girard (1972) and in programming by John Reynolds (1974). To the base types of
the λ-calculus (Section 2.3) it adds type variables, and an operation of `` quantification over types'' (α),
omitted prooftree
environment
where we list the free type variables (α, β, ...) together with the ordinary variables (x, y, ...) on the left of
the turnstile, subject to the condition that for each ordinary variable x:V in Γ, any free type variables in V
must occur beforehand (Definition 2.2.8). In particular, when we write `` Γ,α'' as a context, we
presuppose that Γ was already a valid context, so α must not be free in the type of any variable x ∈ Γ.
This is important because, if x ∈ FV(p), then x remains free in λα.p in the (ΠI )-rule.
One might hope to interpret the quantified types as sets, but John Reynolds showed that the type Πθ.((P(P(θ))→ θ)→ θ) violates Cantor's theorem.
2.9 Exercises II
1. Give a construction of the integers (Z) from the natural numbers such that z = {m,n|m-n ≤ z}.
Define addition and subtraction for both this coding and the one in Example 1.2.1.
2. Show how to add and multiply complex numbers as pairs of reals, verifying the commutative,
associative and distributive laws and the restriction of the operations to the reals.
3. The volume-flow (in m³ s⁻¹) down a pipe of radius r of a liquid under pressure p is cη^n r^m p^k for
some dimensionless c, where η is the dynamic viscosity, in units of kg m⁻¹ s⁻¹. Find n, m and k.
4. Show how to add Dedekind cuts and multiply them by rationals, justifying the case analysis of
the latter into positive, zero and negative. What do your definitions say when the cuts represent
rationals? Verify the associative, commutative and distributive laws.
5. Express √2, √3 and √6 as Dedekind cuts, and hence show that √2·√3 = √6.
6. Let x = (L,U) and y = (M,V) be Dedekind cuts of Q and put omitted eqnarray* environment Show
that (the lower closure of) \typeN1∪\typeN2 and (the upper closure of) \typeW1∪\typeW2 define
a Dedekind cut of R. Calling it x y, verify the usual laws for multiplication, without using case
analysis [Con76].
7. For any Cauchy sequence (a_n), show that there is an equivalent sequence (b_m) which
8. Show how to add Cauchy sequences and to multiply them by rational numbers.
9. Show how to reduce a Cauchy sequence of reals ( ie of Cauchy sequences) to a Cauchy sequence
of rationals [Hint: diagonalise].
10. Define multiplication of Cauchy sequences, without using a case analysis according to sign.
11. Let (a_n) be a Cauchy sequence and (L,U) a Dedekind cut. Formulate the predicate φ[(a_n),(L,U)], that they denote the same real value. Show that for every Cauchy sequence there is a cut,
and that
φ[(a_n),(L,U)]∧φ[(b_n),(L,U)] ⇔ (a_n) ∼ (b_n).
Hence φ is an injective functional relation R_C \hookrightarrow R_D. Show that it respects
addition and multiplication. Show, using excluded middle, that φ is bijective (see also
Exercise 2.30).
12. Explain the difference between p(U) in Example 2.1.5 and f!(U) in Remark 2.2.7.
13. Show that each element (U,V) of X+Y, defined in Example 2.1.7, is either of the form ({x},∅) with x
∈ X, or else of the form (∅,{y}) with y ∈ Y, but not both. [Hint: use (∨E ) on the first clause.]
Deduce that for any functions f:X→ Θ and g:Y→ Θ, the subset f!(U)∪g!(V) ⊂ Θ has
exactly one element. So there is a unique function [f,g]:X+Y→ Θ such that [f,g]oν0 = f and [f,g]
oν1 = g.
14. Using the isomorphism P(X+Y) ≡ P(X)×P(Y), construct the product X×Y from powersets and
disjoint unions. [Hint: consider those U ⊂ X+Y with exactly one element in each component.]
15. For any set X, show that there is a function c:P( X)\{∅} → X such that ∀U.c(U) ∈ U, iff the
axiom of choice holds (Definition 1.8.8). (This was Zermelo's formulation of the axiom of
choice.)
16. Any predicate φ[x] gives rise to an equivalence relation ∼ on {0,1}×X so that ∀x.(φ[x] ⇔ (0,x) ∼ (1,x)). Let Y = ({0,1}×X)/∼ and p:Y\twoheadrightarrow X by p[i,x] = x. Prove that ∀y.∃j.y = [j,p(y)],
making your use of (∃E ) explicit. Show that if this has a Choice function then φ is decidable.
Hence the axiom of choice implies excluded middle.
17. Show that (display omitted), giving the functional relations involved. Use these to
show that the axiom of comprehension may be eliminated from Zermelo type theory, in the sense
that every type is of the form {x: U|φ[x]} where U is a type-expression built from 1, N, x and P
alone. Hence develop a formalism for Zermelo type theory with unordered contexts.
18. Let a,b,c,d ∈ X, not necessarily distinct. Explain carefully what the left hand side means (cf
20. Explain how Pⁿ(N) ⊂ Pᵐ(N) for n ≤ m. Using Exercises 2.17 and 2.19, show that the types U with U ⊂ Pⁿ(N) for some n ∈ N form a model of pure Zermelo type theory. It is known as the von Neumann hierarchy V_{ω·2}.
21. Give the λ-terms which define the isomorphism between X and X¹ × 1, and verify that they are mutually inverse.
22. Show how to orient the β- and η-rules for pairs (Remark 2.2.2) to make them confluent and
strongly normalising (Definitions 1.2.5 and 1.2.8). The normal form says how the types are
bracketed.
u_m, where x is either a constant or a variable, possibly one of the y_i, and no sub-sequence x u₁…u_k is a δ-redex. Show that a term is in normal form iff it is hereditarily head normal.
24. The de Bruijn index of a bound variable is the number of λs which separate its use from its declaration in the tree structure of the term. For example λx.(λy.x y)x becomes λ.(λ.2 1)1. (Cf the way in which a compiled procedure accesses local variables relative to the current stack pointer.) Give the formal translation of raw λ-terms from variables to de Bruijn indices and vice versa. Show that, using indices, substitution is performed textually, except that in (λx.a)[y := b] the free indices within b are incremented.
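The translation asked for in Exercise 24 can be tried out concretely. The following sketch is not from the text: the term encoding and function names are illustrative, with 1-based indices matching the example λx.(λy.x y)x ↦ λ.(λ.2 1)1.

```python
# Sketch of Exercise 24's translation (illustrative encoding, not from the text).
# Terms: ("var", name) | ("lam", name, body) | ("app", fun, arg)

def to_debruijn(term, env=()):
    """Translate a named term; env is the stack of binders, innermost first."""
    tag = term[0]
    if tag == "var":
        # 1-based: the index counts the lambdas separating use from declaration
        return ("var", env.index(term[1]) + 1)
    if tag == "lam":
        return ("lam", to_debruijn(term[2], (term[1],) + env))
    return ("app", to_debruijn(term[1], env), to_debruijn(term[2], env))

def show(t):
    if t[0] == "var":
        return str(t[1])
    if t[0] == "lam":
        return "λ." + show(t[1])
    return "(" + show(t[1]) + " " + show(t[2]) + ")"

# λx.(λy.x y)x from the text:
ex = ("lam", "x", ("app",
                   ("lam", "y", ("app", ("var", "x"), ("var", "y"))),
                   ("var", "x")))
```

Running show(to_debruijn(ex)) reproduces the book's λ.(λ.2 1)1, up to bracketing.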
25. Show that α-conversion is unavoidable in the reduction of the untyped term (λx.x x)(λx, y.x y).
26. Combinatory algebra is given by one binary operation called application and two constants S and K with laws S x y z = x z(y z) and K x y = x (Example 2.4.2). Show functional completeness, that any term p in variables x₁,…,x_n is equivalent to some f x₁x₂···x_n, where f is a term with no variables.
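Functional completeness (Exercise 26) is usually proved by "bracket abstraction", eliminating one variable at a time. The following sketch is not from the text; the term encoding and helper names are illustrative, and only the S and K laws are used.

```python
# Sketch of bracket abstraction for Exercise 26 (illustrative, not from the text).
# Terms: "S" | "K" | ("var", name) | ("app", fun, arg)

def occurs(x, t):
    if isinstance(t, tuple):
        return t[1] == x if t[0] == "var" else occurs(x, t[1]) or occurs(x, t[2])
    return False

def abstract(x, t):
    """[x]t: a term without x such that ([x]t) x = t under the S, K laws."""
    if t == ("var", x):
        return ("app", ("app", "S", "K"), "K")   # S K K a = K a (K a) = a
    if not occurs(x, t):
        return ("app", "K", t)                   # K t a = t
    return ("app", ("app", "S", abstract(x, t[1])), abstract(x, t[2]))

def step(t):
    """One leftmost step by K x y -> x and S x y z -> x z (y z), or None."""
    if not (isinstance(t, tuple) and t[0] == "app"):
        return None
    f, a = t[1], t[2]
    if isinstance(f, tuple) and f[0] == "app":
        if f[1] == "K":
            return f[2]
        g = f[1]
        if isinstance(g, tuple) and g[0] == "app" and g[1] == "S":
            return ("app", ("app", g[2], a), ("app", f[2], a))
    s = step(f)
    if s is not None:
        return ("app", s, a)
    s = step(a)
    return ("app", f, s) if s is not None else None

def normalise(t):
    while (s := step(t)) is not None:
        t = s
    return t
```

For p = y x, abstract("x", p) contains no x, and applying it to b and normalising recovers y b, as functional completeness requires.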
27. Recall from Remark 1.6.10 that the two axioms of implicational logic are the types of S and K. Use Exercise 2.26 to prove the deduction theorem, that if φ is provable from hypotheses α₁,…,α_n then α₁ ⇒ ··· ⇒ α_n ⇒ φ is provable under no hypothesis. This says that each instance of the
28. Suppose that X^R ◁ R, ie j: X^R ↪ R, q: R ↠ X^R with j;q = id_{X^R}. Show that any f: X → X has a fixed point.
29. Use the elimination rule for ∅ to show ∃x:∅.φ[x] ⊣⊢ ⊥ and ∀x:∅.φ[x] ⊣⊢ T. Formulate the substitution rule and use it to show that any X → ∅ is invertible. Show also that ∃x:1.φ ⊣⊢ φ[x := ∗] ⊣⊢ ∀x:1.φ.
30. Consider the last clause in the definition of a Dedekind cut (Remark 2.1.1). Show that an assignment (x_ε, y_ε) of witnesses (as in Remark 2.4.3(g)) provides a pair of Cauchy sequences which define the same real number in the sense of Exercise 2.11.
31. Show that Peirce's law, ((θ⇒ φ)⇒ θ)⇒ θ (Remark 2.4.10), excluded middle and the restart rule
are intuitionistically equivalent.
33. (Pierre de Fermat) Show that a prime number can be expressed as the sum of two squares iff it is (either 2 or) of the form 4n+1. [Hint: define q ≺ p on primes of this form if q < p and there are numbers k and 1 ≤ m < p such that k p q = m²+1.]
34. Prove the results of Section 2.6 using minimal counterexamples, and also using descending
sequences. Explain where excluded middle and König's Lemma are needed when using these
forms of induction.
35. Suppose that finitely many types of polygonal tile are given, with the property that a disc of
arbitrary radius may be covered with copies of the tiles, without overlaps. Using König's Lemma
(Corollary 2.5.10), show that the whole plane (R²) may be covered.
36. Let ≺ be a decidable binary relation with no cycles on a finite set. Show that it is well founded (cf Corollary 2.5.11). [Hint: consider the number of descendants of each element and use Proposition 2.6.2.]
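The hint of Exercise 36 can be tried mechanically. A sketch, not from the text: for a finite acyclic relation given as pairs (a, b) meaning a ≺ b, counting strict descendants gives a rank function that increases along ≺, which is one standard witness of well-foundedness.

```python
# Sketch for Exercise 36 (illustrative, not from the text): an acyclic
# decidable relation on a finite set, given as pairs (a, b) meaning a ≺ b.

def below(pairs, y):
    """Elements strictly below y in the transitive closure of ≺."""
    frontier = {a for (a, b) in pairs if b == y}
    seen = set()
    while frontier:
        x = frontier.pop()
        if x not in seen:
            seen.add(x)
            frontier |= {a for (a, b) in pairs if b == x}
    return seen

def rank(pairs, x):
    """Number of descendants: strictly monotone along ≺ when there are no cycles."""
    return len(below(pairs, x))

pairs = {(0, 1), (1, 2), (0, 2), (1, 3)}   # an acyclic relation on {0, 1, 2, 3}
```

Since x ≺ y puts below(x) ∪ {x} inside below(y), and acyclicity keeps x out of below(x), rank strictly increases along ≺; any descending sequence must therefore stop, as in Proposition 2.6.2.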
37. Show directly that the interleaved product relation (Proposition 2.6.9) is well founded.
Investigate whether any of the three product relations have well founded analogues for infinite
families of sets. Is the order of potential words in a dictionary well founded?
38. Show that the lexicographic product of two transitive relations is transitive, and similarly for the
trichotomy law.
39. Find an example of a union of well founded relations which is not well founded, to show that the ``lower'' hypothesis is necessary in Proposition 2.6.6.
40. Let f: (X, <)→ (Y, ≺) be a function between sets with binary relations which is (non-strictly) monotone in the sense that
x′ < x ⇒ f x′ ≺ f x ∨ f x′ = f x,
where (Y, ≺) is well founded. Suppose that every set f⁻¹[y] ⊂ (X, <) is well founded. Show that X is also well founded.
41. Using proof boxes, show that the transitive closure of a well founded relation is also well founded (Theorem 3.8.11).
43. Let f: X ↠ Y be any surjective function between sets. Show that map f: List(X) ↠ List(Y) (Example 2.7.6(d)), and deduce that a finite set of witnesses may be chosen without invoking the axiom of choice (Definition 1.8.8).
44. Show that for the Church numerals (Example 2.4.1), the terms
45. Show that if α is the n-element set then the Church numerals of type α→ (α→ α)→ α up to M+n−1 are distinct, but that afterwards they repeat with period M, where M is the least common multiple of 1,2,…,n. Estimate M.
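The repetition claimed in Exercise 45 can be checked by brute force for n = 3, where M = lcm(1,2,3) = 6, so numerals 0,…,7 should be distinct and numeral 8 should equal numeral 2. A sketch, not from the text; the encoding of α as {0,1,2} and the helper names are illustrative.

```python
# Sketch for Exercise 45 with α = {0, 1, 2} (illustrative, not from the text).
import itertools

def church(n):
    """The numeral n of type α → (α → α) → α, as honest iteration."""
    def numeral(x, f):
        for _ in range(n):
            x = f(x)
        return x
    return numeral

def table(n, size=3):
    """The full graph of numeral n over every f: α → α and every x ∈ α."""
    rows = []
    for f in itertools.product(range(size), repeat=size):
        for x in range(size):
            rows.append(church(n)(x, lambda k: f[k]))
    return tuple(rows)

tables = [table(n) for n in range(12)]
```

Two numerals are equal at this type iff their tables coincide; on a 3-element α every f has "tail" at most 2 and eventual period dividing 6, which is where M+n−1 = 8 and M = 6 come from.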
51. Consider the language of higher order predicate calculus with only the connectives ∀ and ⇒ .
Show that the formulae on the right hand side of Proposition 2.8.6 satisfy the introduction and
elimination rules for the other connectives.
52. Prove Proposition 2.8.7, that x = y ⊣⊢ ∀θ. θ[x] ⇔ θ[y], making clear where the congruence rules are used. Express the proof in such a way as to make equality a derived connective. By substituting the two-way formula for θ, show that the result remains true with one-way implication: ∀θ. θ[x] ⇒ θ[y].
53. Show that Ω satisfies Exercise 1.10(d) but that (c) gives excluded middle. If you know some
sheaf theory, show that (b) fails in Shv( R).
54. Let i: Ω ↪ Ω. Show that ∀x. i(x) ⇒ (x = i(T)). On the assumption that i(i(T)) = T, show that i² = id and i(x) = (x = i(T)). Without this assumption, deduce that i(x) = [i(i(T)) ∧ (x = i(T))]. By computing i(T) and i³(T), show that the assumption is redundant.
Chapter 3
Posets and Lattices
Order structures provide some simple tools for investigating semantics. For us, they serve a double purpose, describing systems of propositions and also serving as the substance of individual types.
The system of propositions is an order structure with respect to the provability relation, in which the
logical connectives are characterised as algebraic operations (meet, join, etc ) satisfying certain laws. In
the later chapters we shall discuss the analogous operations for types. For example implication is
discussed here briefly using Heyting algebras, and the function-type in Chapter IV with cartesian closed
categories.
Similar operations, sometimes with weaker laws or in infinitary form, arise in many mathematical
situations beyond logic. For example, the lattice of subalgebras often throws much light on the structure
of an algebra, topological spaces are described by their open sets and programs in terms of partial
evaluation. This ubiquity has led some authors to try to force other concepts such as well founded
relations (Section 2.5) into the same mould, a tendency which we aim to reverse.
Our other use of order structures is as individual types. As we remarked in Section 2.2, we need
something subtler than sets (as described by Zermelo type theory) to illustrate many of the phenomena
of reasoning, especially about non-terminating computation. We do this using posets that have directed
joins in Sections 3.3- 3.5. Later in the book we shall give examples where types are interpreted as
topological spaces.
Implication, the quantifiers and infinitary meets and joins are just a few examples of adjunctions or
universal properties. We study them in detail here because many of the features of this central concept of
category theory can already be seen in the simpler order-theoretic context. In practice, if a function
preserves all meets then its left adjoint tends to be used formulaically, without appreciating the
important theorem which is involved. Here and in subsequent chapters we shall indicate some of the
huge number of mathematical results which can be obtained from simple observations about
adjunctions.
The last three sections are devoted to adjunctions between powersets. To fulfil a promise that everything
which is later done for categories is treated first for posets, some material has been included that is
disproportionately more difficult than the rest of the chapter, so you should feel free to skip it on first
reading. The intermediate status of posets, ambiguously individual types or systems of them, is
unfortunately also reflected in some schizophrenic notation.
Sections 1.4, 1.5 and 2.3 defined the logical connectives in terms of their introduction and elimination
rules. Algebraic operation- symbols can also be seen as introduction rules; at the propositional level,
these are known as closure conditions. They arise as the conditions for subalgebras and congruences,
including reflexivity, symmetry, transitivity and convexity, and also as logic programs. The
corresponding elimination rules are familiar or novel induction schemes, which Section 3.7 also
describes.
The final section introduces the construction of semantics from syntax which will be developed
throughout the book.
(a)
any set with the discrete order, x ≤ y iff x = y;
(b)
N, Z, Q and R with the usual arithmetical order;
(c)
N with the divisibility order, n|m iff ∃k. nk = m;
(d)
the two-element set {⊥,T} with ⊥ ≤ T but T\not ≤ ⊥;
(e)
P(X) with the inclusion order, ⊂ , for any set X;
(f)
in particular Ω = P(•), the type of propositions or truth values under implication, which is
reflexive and transitive; antisymmetry in this case is the η-rule for the powerset, which says that
inter-provable propositions are equal (Remark 2.8.4);
(g)
the set of open subsets of a topological space under inclusion;
(h)
the set of subgroups of a group, and so on.
(i)
The specialisation order between points in a topological space,
x ≤ y if ∀U ⊂ X open. x ∈ U ⇒ y ∈ U,
is in general a preorder; the space is called T₀ if the specialisation order is antisymmetric, and T₁ if it is a discrete order (cf Leibniz' Principle, Propositions 2.8.7 and 3.8.14).
(j)
The ``bracket nesting'' order on sub-expressions may be regarded as a poset, but its purpose is
structural recursion, for which a well founded (and in particular irreflexive) relation is needed.
(k)
Formulae form a preorder under provability (⊢, Definition 1.4.7), though there may be
many different proofs.
(l)
Expressions form a preorder under reducibility (⇝, Definition 1.2.3), though there may be
many different reduction paths.
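Example (i), the specialisation preorder, can be computed directly from a finite topology. A sketch, not from the text; the two-point (Sierpiński-style) topology below is an illustrative choice.

```python
# Sketch of the specialisation preorder (illustrative topology, not from the text).
opens = [set(), {1}, {0, 1}]          # a T0 topology on the set {0, 1}

def spec_le(x, y):
    """x ≤ y iff every open set containing x also contains y."""
    return all(y in U for U in opens if x in U)

# 0 ≤ 1 but not 1 ≤ 0: the order is antisymmetric, so this space is T0
assert spec_le(0, 1) and not spec_le(1, 0)
```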
Remember that the transitivity of a relation has significant mathematical force: don't assume it without
checking it ( cf Example 9.4.9(d))!
The word poset is a corruption of ``partially ordered set,'' where a total or linear order is one satisfying
DEFINITION 3.1.3 The earliest use of order relations was arithmetical, where the reflexive order ( ≤ ) is
accompanied by an irreflexive relation ( < ) of equal importance. N, Z and Q have the trichotomy
property,
Posets are partial in the sense that there may be pairs of elements which are incomparable, ie fail to
stand in the order relation either way round, cf Galileo's remarks after Proposition 2.8.8. (This is what
people usually mean by ``equality'' in politics.) As there is no connection between total orders and total
functions, or between linear orders and preserving sums, it is best to forget these terms and treat ``poset''
as a bona fide English word for the fundamental notion that it is.
Moving away from arithmetical examples, trichotomy usually fails for posets arising in logic, and is
destroyed by products and function-spaces. Imposing trichotomy can be a nuisance in technical
situations ( cf Corollary 3.5.13, but see the ordinals in Section 6.7). It has also given rise to a great deal
of misleading terminology. For example, without trichotomy, x\not > y (``no more than'') is no longer a
synonym for x ≤ y (``at most''). Some authors, whilst trying to be careful about non-strict inequalities,
fall into a greater error by saying ``non-decreasing'' for monotone.
The symbol ≤ , like subset inclusion (Remark 2.2.6), is irreducible, but unfortunately too well
established to replace; an arrow would be better, both graphically and theoretically.
WARNING 3.1.4 Any irreflexive relation < can in fact be recovered from (<)∪(=), but we need excluded middle to recover the reflexive relation ≤ from (≤)∩(≠) (Exercise 3.3). Sometimes we use < and ≤ together in the same passage: beware that they are not assumed to be related in this classical fashion.
DEFINITION 3.1.5 Let (X, ≤_X) and (Y, ≤_Y) be posets (or preorders) and f: X→ Y a function. Then we say f is monotone, covariant or order-preserving if
∀x₁,x₂. x₁ ≤_X x₂ ⇒ f(x₁) ≤_Y f(x₂).
A function for which the converse implication holds is said to reflect order. It is full if both directions hold; if X is a poset this requires f to be injective, identifying X with a subset of Y, where this subset is equipped with (the restriction of) the same order relation ≤_Y. (The property of functors which most naturally corresponds to fullness is that they be full and faithful, Definition 4.4.8.)
A function which, in the same sense, preserves an irreflexive order is called strictly monotone, cf Definition 2.6.1 for well founded relations. If instead
∀x₁,x₂. x₁ ≤_X x₂ ⇒ f(x₁) ≥_Y f(x₂),
then we call f antitone, contravariant or order-reversing. There's nothing new in this because antitone functions X→ Y are just monotone functions Xᵒᵖ→ Y or X→ Yᵒᵖ. Here Xᵒᵖ is the opposite poset, (X, ≥), whose order relation is the converse of ≤ (Definition 1.3.9).
EXAMPLES 3.1.6
(a)
The formulae α∧φ, φ∨α, φ⇒ α, ∀x.α and ∃x.α are monotone functions of the propositional
variable α.
(b)
The formulae ¬α and α ⇒ φ are antitone functions of α.
(c)
The composite of two monotone functions is monotone, as is the identity function.
(d)
A preorder in which ≤ is symmetric is an equivalence relation. In this case monotone functions
are functional in the sense of Remark 1.3.2.
(e)
A bijective monotone function has a (monotone) inverse iff it is full.
(f)
Recall that any function f: X→ Ω is a predicate and so defines a subset U = {x | f(x) = T} ⊂ X by comprehension, and conversely by f(x) = (x ∈ U) (cf Definition 2.2.5 and Notation 2.8.2). Such a predicate is monotone iff U is an upper set,
∀θ,x: X. θ ≥ x ∈ U ⇒ θ ∈ U.
We write x↓X = {θ | x ≤ θ} for the up-closure of the singleton x, ie the upper set which it generates.
(f) shows that this also accounts for the arithmetical orders.
DEFINITION 3.1.7 An antitone predicate on X, ie a monotone function Xᵒᵖ→ Ω, defines a lower set A ⊂ X,
∀γ,x: X. γ ≤ x ∈ A ⇒ γ ∈ A.
We write shv(X) for the collection of lower subsets, ordered by inclusion. The down-closure of x ∈ X is
written X↓ x = {γ|γ ≤ x}, and any subset of this form we call a representable lower set. We have used γ
and θ as a reminder of the roles of Γ as hypotheses and θ as a conclusion or result type.
Using lower subsets, every partial order may be seen as an inclusion order. This is called the covariant
regular representation. We introduce this terminology here, while the technology remains simple, to
prepare for the Yoneda Lemma for categories in Sections 4.2 and 4.8.
(a)
(X↓x) ⊂ (X↓y) ⇔ x ≤ y, ie the function X↓(-): X→ shv(X) is monotone and full;
(b)
for any lower set A ⊂ X, we have X ↓ x ⊂ A⇔ x ∈ A. []
Upper sets provide a similar contravariant representation. There are two different senses of ``representation'' here: a subset of the form X↓x is represented by the element x, whereas shv(X) represents the poset X.
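The covariant regular representation can be tested on a small example. A sketch, not from the text: the divisibility order on a few natural numbers, checking both that X↓(-) is monotone and full and the poset form of the representability property X↓x ⊂ A ⇔ x ∈ A.

```python
# Sketch of the regular representation (illustrative poset, not from the text).
X = [1, 2, 3, 4, 6, 12]                   # carrier, under the divisibility order
le = lambda a, b: b % a == 0              # a ≤ b iff a divides b

def down(x):
    """X↓x = {γ | γ ≤ x}, the representable lower set of x."""
    return frozenset(g for g in X if le(g, x))

# (a) X↓(-) is monotone and full: (X↓x) ⊂ (X↓y) ⇔ x ≤ y
assert all((down(x) <= down(y)) == le(x, y) for x in X for y in X)

# (b) for a lower set A, (X↓x) ⊂ A ⇔ x ∈ A
A = frozenset({1, 2, 3, 6})               # down-closed within X
assert all((down(x) <= A) == (x in A) for x in X)
```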
EXAMPLE 3.1.9 Consider a collection of propositions (a Lindenbaum algebra, Remark 1.4.6), ordered by
provability. Then
(a)
the covariant regular representation represents a proposition φ by its reasons, ie the set {γ| γ
\vdash φ} (a ``reason'' for something may be some other assumption, from which the thing
follows), and
(b)
the contravariant representation (using upper sets) represents φ by its consequences, Cn(φ) = { θ|
φ\vdash θ} (Notation 2.4.12). []
PROPOSITION 3.1.10 The regular representation of a preorder identifies x and y iff both x ≼ y and y ≼ x, since then X↓x = X↓y. We may cut down the representation from X↓x = {y | y ≼ x} to {y | y ∼ x}. The relation
x ∼ y ⇔ x ≼ y ∧ y ≼ x
is an equivalence relation. The quotient X/∼ carries an antisymmetric order ≤ such that the function η: X→ X/∼ is monotone and full.
Moreover if Θ is any poset and f:X→ Θ is a monotone function, then x ∼ y⇒ f(x) = f(y) and there is a
unique monotone function p:X/ ∼ → Θ such that f = η;p.
PROOF: First, ∼ is reflexive and transitive because \preccurlyeq is, and symmetric by construction. So it
is an equivalence relation, of which Example 2.1.5 gave the quotient X/ ∼ as the set of equivalence
classes, together with the mediating function p. We want [x] ≤ [y] if x\preccurlyeq y; this is well defined
because, if x ∼ x′, y ∼ y′ and x\preccurlyeq y, then x′\preccurlyeq y′ by transitivity. Then ≤ inherits
reflexivity and transitivity from \preccurlyeq , and is antisymmetric on equivalence classes by
construction of ∼ . Finally, p is monotone. []
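The quotient X/∼ of the proposition can be computed for a concrete preorder. A sketch, not from the text: the reachability preorder of a small directed graph, whose equivalence classes are its strongly connected components; the graph and helper names are illustrative.

```python
# Sketch of the quotient X/∼ (illustrative graph, not from the text).
edges = {0: [1], 1: [0, 2], 2: [], 3: [2]}

def reaches(x, y, seen=None):
    """The reachability preorder x ≼ y: reflexive and transitive."""
    seen = set() if seen is None else seen
    if x == y:
        return True
    seen.add(x)
    return any(z not in seen and reaches(z, y, seen) for z in edges[x])

# [x] = {y | x ≼ y and y ≼ x}: the equivalence classes of ∼
cls = {x: frozenset(y for y in edges if reaches(x, y) and reaches(y, x))
       for x in edges}

def le(a, b):
    """The induced order on classes, via (any) representatives."""
    return reaches(min(a), min(b))

# ≤ is antisymmetric on X/∼, as the proposition asserts
classes = set(cls.values())
assert all(a == b or not (le(a, b) and le(b, a)) for a in classes for b in classes)
```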
DEFINITION 3.2.1 An element u of a preorder X is called
(a)
a least element or bottom if ∀θ. u ≤ θ,
(b)
a locally least element if ∀x,y. x ≤ y ≥ u ⇒ u ≤ x (Exercise 3.5),
(c)
a minimal element if ∀x. x ≤ u ⇒ u ≤ x,
(d)
a greatest element or top if ∀γ. u ≥ γ,
(e)
a maximal element if ∀x. x ≥ u ⇒ u ≥ x.
Bottom and top, if they exist, are written ⊥ and T respectively; if ≤ is antisymmetric then they are
unique. (They are in any case unique up to ∼ of Proposition 3.1.10.) The terms maximum and minimum
(which are nouns) are also used to mean greatest and least elements, but should be avoided as a source
of confusion. Local maxima and minima in the sense of elementary calculus are not formally related to
any of these concepts.
EXAMPLES 3.2.2
(a)
False and true are least and greatest formulae under provability ( \vdash ).
(b)
∅ and X are the least and greatest subsets of any set X under ⊂ .
(c)
If ( ≤ ) = ( < )∪( = ) with < irreflexive, then this definition agrees with the notion of minimality
used in Proposition 2.5.6 (Exercise 3.3).
(d)
A lower subset is representable (Definition 3.1.7) iff it has a greatest element.
We do not extend the symbol ≤ to U ≤ V, but see Exercise 3.55 for three such orders on the set of
subsets.
Let ℑ ⊂ X be any subset.
(a)
If γ ≤ ℑ then we call γ a lower bound for ℑ.
(b)
A greatest lower bound, ie a greatest element of {γ|γ ≤ ℑ} ⊂ X, the set of lower bounds, if such
exists, is called a meet or infimum, and is denoted by ∧ℑ; then the set of lower bounds is
representable (Definition 3.1.7), namely by the meet.
(c)
Similarly, if ℑ ≤ θ then we call θ an upper bound for ℑ. A least upper bound, if any, is called a
join or supremum, ∨ ℑ.
If ≤ is antisymmetric then meets and joins, where they exist, are unique. Otherwise they are unique up to
∼ from Proposition 3.1.10. See Exercises 3.5, 3.21 and 3.33ff for minimal and locally least upper
bounds.
EXAMPLES 3.2.5
(a)
∨∅ = ⊥ = ∧X and ∧∅ = T = ∨X.
(b)
If ℑ = {φ,ψ} then we write ∨ ℑ = φ∨ψ and ∧ℑ = φ∧ψ.
(c)
Meets and joins of sets of subsets in the inclusion order are called intersections (∩) and unions
(∪) respectively.
(d)
The union or intersection of a family of lower subsets is again lower.
(e)
The Dedekind reals, ℝ_D (Remark 2.1.1), have meets and joins with respect to the arithmetical order for all bounded inhabited subsets; these are usually written as inf and sup respectively.
(f)
For the divisibility order on N, the meet and join of two numbers are called their greatest
common divisor and least common multiple respectively. The extremal elements are ⊥ = 1 and T
= 0. This conflict with the conventions of logic is resolved by considering ideals (Example 2.1.3
(b)), for which I|J⇔ J ⊂ I; this is the contravariant regular representation (Example 3.1.6(f)ff).
(g)
Arbitrary meets and joins in the type Ω of propositions are found using the guarded quantifiers
(Remark 1.5.2):
DEFINITION 3.2.6 Let f: X→ Y be a monotone function between preorders. If u is a meet of ℑ ⊂ X then fu is a lower bound of f!(ℑ) ⊂ Y by monotonicity of f. (Recall the notation f! from Remark 2.2.7.)
Then f preserves the meet if fu is a greatest lower bound.
(If X and Y are posets then of course these meets are unique; the point of stressing preorders and being a
meet is that no choice of meet is needed to define preservation, by an argument similar to Lemma
1.2.11. This will be important for limits and colimits in categories, Definition 4.5.10.)
We also say that f creates the meet if (a) y = ∧f!(ℑ) exists, (b) there is a unique x ∈ X (up to ∼) such that y = f(x) and x ≤ ℑ, and, having fixed x, (c) in fact x = ∧ℑ.
Meets and joins of lower sets The covariant regular representation (by lower sets, Definition 3.1.7)
has both meets and joins, but it behaves quite differently with respect to them. Both cases are important.
(a)
is full and preserves any meets which exist, ie
X↓(∧ℑ) = ∩_{x∈ℑ} (X↓x),
since an element γ belongs to either side iff γ ≤ ℑ;
(b)
but freely adds joins (diagram omitted). This means that, for any monotone function f: X→ Θ to a poset which has joins of all subsets, there is a unique join-preserving function p: shv(X)→ Θ such that p(X↓x) = f(x). []
Theorem 3.9.7 shows how to add new joins, but keep specified old ones.
Diagrams It is often more convenient to define meets and joins with respect to arbitrary functions ℑ→
X instead of just subsets ℑ ⊂ X.
DEFINITION 3.2.9 A diagram in a poset X is a function to X from a set ℑ, or a monotone function from a poset or preorder. We call ℑ the shape of the diagram, and write x_i for the value of the function at i ∈ ℑ. Then θ ∈ X is an upper bound for the diagram if ∀i. x_i ≤ θ, ie θ bounds the image, {x_i | i:ℑ} ⊂ X. We write ∨_i x_i for the join (least upper bound) of the diagram, and similarly ∧_i x_i for the meet.
The notations x∨y and x∧y are, strictly speaking, examples of a join and a meet of a diagram of shape 2
= {• •} rather than of a subset. We shall study certain non-discrete diagrams in the next section.
Any diagram has the same meets and joins as its image. Indeed it has the same joins as the down-closure
of its image, and more generally we can say exactly when a comparison between diagrams induces the
same joins. This result will be used in Corollary 3.5.13.
PROPOSITION 3.2.10 Let U: J→ ℑ be monotone. Then the following are equivalent:
(a)
U is cofinal: ∀i. ∃j. i ≤ U(j);
(b)
for every preorder X, diagram x(-): ℑ→ X and element θ ∈ X, θ is an upper bound for {x_i | i:ℑ} iff it is one for {x_U(j) | j:J};
(c)
for all such diagrams the minimal upper bounds coincide;
(d)
for all such diagrams the locally least upper bounds coincide;
(e)
for all such diagrams the least upper bounds coincide.
PROOF: The proof is easy except for (e)⇒ (a). For this case we use the representation by lower sets, so let X = shv(ℑ) and x_i = ℑ↓i. Then θ is an upper bound of {x_i | i:ℑ} iff θ = ℑ, and of the subset {x_U(j) | j:J}
Lattices Now we shall give an algebraic characterisation of finite meets and joins, motivated by the
logical connectives ∧ and ∨ (Section 1.4). Gottfried Leibniz defined order in terms of joins, and Johann
Lambert recognised the idempotent law. Ernst Schröder was the first to see that the distributive law was
not automatic. These laws, plus associativity, commutativity and the units for lattices, express the
structural rules of logic (Definition 1.4.8), as do the reflexivity and transitivity axioms for a preorder.
Antisymmetry, on the other hand, is a side-effect of the algebraic formulation of logic in terms of
lattices.
PROPOSITION 3.2.11 Let (X, ≤) be a poset with a greatest element (T) and meets of pairs (x∧y). Then the operation ∧: X × X→ X is associative, commutative and idempotent, with unit T. Moreover x ≤ y iff x∧y = x. Conversely if (X,T,∧) satisfies these laws then this condition defines a partial order for which ∧ is the binary meet and T the greatest element. []
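Proposition 3.2.11 can be illustrated with gcd, which is associative, commutative and idempotent on N (cf Example 3.2.5(f), where ⊥ = 1 and T = 0 in the divisibility order). A sketch, not from the text:

```python
# Sketch of Proposition 3.2.11 with ∧ = gcd (illustrative, not from the text).
from math import gcd

X = range(13)
le = lambda x, y: gcd(x, y) == x          # x ≤ y iff x∧y = x: divisibility

assert all(le(x, x) for x in X)                                   # reflexive
assert all(x == y or not (le(x, y) and le(y, x))
           for x in X for y in X)                                 # antisymmetric
assert all(not (le(x, y) and le(y, z)) or le(x, z)
           for x in X for y in X for z in X)                      # transitive
# gcd(x, y) is the binary meet: a lower bound of x and y ...
assert all(le(gcd(x, y), x) and le(gcd(x, y), y) for x in X for y in X)
# ... above every other lower bound
assert all(not (le(w, x) and le(w, y)) or le(w, gcd(x, y))
           for w in X for x in X for y in X)
```

Note that 0 is the unit (T): le(x, 0) holds for all x, matching the text's remark that T = 0 for divisibility.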
DEFINITION 3.2.12 Such a structure is called a semilattice; a function which preserves ∧ and T is called a
semilattice homomorphism, and is therefore monotone. Notice that the algebraic laws do not force the
direction of the order relation: ∨ and ⊥ satisfy them as well. When we speak of an algebraic structure as
a join- or meet-semilattice we are therefore imposing the direction by convention.
Since finitary meets and joins are characterised by the same laws and each alone can uniquely determine
the order, there must be an additional law forcing the two orders to coincide. It suffices to say that x,y ≤
x∨y, where ≤ is the relation defined by ∧, or vice versa. Eliminating the inequality, either of these
conditions is known as the absorptive law:
The nullary cases, y∧⊥ = ⊥ and y∨T = T, follow from these (with x = ⊥,T) and the unit laws.
PROPOSITION 3.2.13 Let (X, ≤ ) be a poset with finite meets and joins. Then (X,⊥,∨) and (X,T,∧) are both
semilattices, and the absorptive laws hold. Conversely, if (X,⊥,∨,T,∧) obeys both sets of semilattice laws
and either of the absorptive laws then the orders agree and the meets and joins are as given. Then X is
called a lattice; lattice homomorphisms by definition preserve ⊥, ∨, T, and ∧. []
where LHS ≤ RHS hold in an arbitrary lattice. In fact one can be derived from the other in the context
of the lattice laws, and we have already stated the nullary versions ( x∧⊥ = ⊥ and x∨T = T). Whereas
expressions in the theory of lattices may need many nested brackets, the distributive laws give rise to the
disjunctive normal form, eg
There are also major applications of lattices to the structure theory of algebra. The lattices of
congruences of certain familiar theories, notably groups, rings and vector spaces, obey the modular law
(Exercise 3.24), which is weaker than distributivity and was first identified by Richard Dedekind
(1900). This gives a sense in which these algebras are made of ``building blocks,'' of which the
dimension of a vector space and the Jordan-Hölder theorem for groups are examples.
We shall return to lattices with arbitrary meets and joins in Section 3.6. Semilattices will be used in
Theorem 3.9.1 to describe Horn theories (Remark 1.7.2ff). Now we shall turn from finitary meets and
joins to those which are in a sense ``purely infinitary.''
Taking a different attitude, we may accept non-termination as a first class value. By extending the domain of
mathematical values which the two sides may be understood to have, we may then treat recursive definitions of
programs as equations.
REMARK 3.3.2 Any recursive program nevertheless has a well defined, albeit partial, de facto meaning. When the
program is given a particular argument, say 2, what the machine actually executes is
where u never gets called. So it may be anything - for example the program while yes do skip od, which goes straight into a tight unending loop. The latter is interpreted as the empty (totally undefined) partial function ⊥, and the version of the factorial function as executed for the argument 2 is T³(⊥) ≡ T(T(T(⊥))).
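Remark 3.3.2 can be mimicked directly, taking ⊥ to be a function that fails rather than returning. A sketch, not from the text; T is the evident factorial functional.

```python
# Sketch of Remark 3.3.2 (illustrative, not from the text): ⊥ as a program
# that fails, and T the functional of the recursive factorial definition.
def T(u):
    def fact(n):
        return 1 if n == 0 else n * u(n - 1)
    return fact

def bot(n):
    raise RecursionError("undefined")     # the totally undefined ⊥

f3 = T(T(T(bot)))                         # T³(⊥)
assert f3(2) == 2                         # defined at arguments 0, 1, 2 ...
try:
    f3(3)                                 # ... but still undefined at 3
    raised = False
except RecursionError:
    raised = True
assert raised
```

This matches the text: T³(⊥) already computes the factorial of 2, and in general Tⁿ⁺¹(⊥) suffices for the argument n.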
In general, Tⁿ⁺¹(⊥) suffices to give fact(n). Intuitively, the programs
The poset of partial functions Agreement of total functions is all or nothing, but two partial functions
(Definition 1.3.1(b)) may agree on the intersection of their supports, whilst one offers more information than the
other on some other values.
DEFINITION 3.3.3 Let f,g: X ⇀ Y be partial functions between sets. We write f ⊑ g and say that g extends f if
Then f ⊑ g iff supp f ⊂ supp g and f is the restriction of g to supp f. So the inclusion k: supp f ↪ supp g satisfies i = k;j and f = k;g.
LEMMA 3.3.5 Partial functions X ⇀ Y between sets form a poset under the extension relation.
Moreover
(a)
The empty relation is the least partial function, called ⊥.
(b)
Any total function is maximal (Definition 3.2.1(e)), but not greatest (unless X = ∅ or Y = 1). Total
functions at higher types, as in Convention 2.3.2, are not, however, characterised by maximality.
(c)
Any inhabited set of partial functions has a meet; its support is the set of elements on which all the partial
functions are defined and agree, and it takes the agreed values there.
(d)
Any set of partial functions which pairwise agree on their common support has a join in the extension
order; its support is the union of the supports and it takes the same value at an argument as any (and hence
all) of the partial functions which are defined there. See Exercise 3.21 for a notion of domain based on
this fact. []
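Lemma 3.3.5(c,d) can be checked with partial functions represented as finite dictionaries, the support being the set of keys. A sketch, not from the text; the helper names are illustrative.

```python
# Sketch of Lemma 3.3.5(c,d) (illustrative, not from the text): a partial
# function X ⇀ Y as a dict whose keys form its support.
def meet(fs):
    """Meet of an inhabited list: defined where all agree, with the agreed value."""
    f0 = fs[0]
    return {x: y for x, y in f0.items()
            if all(x in g and g[x] == y for g in fs[1:])}

def join(fs):
    """Join, assuming the fs pairwise agree on their common support."""
    out = {}
    for g in fs:
        assert all(out[x] == y for x, y in g.items() if x in out)  # compatible
        out.update(g)
    return out

f = {0: 1, 1: 1}
g = {1: 1, 2: 2}
assert meet([f, g]) == {1: 1}
assert join([f, g]) == {0: 1, 1: 1, 2: 2}
```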
Instead of modifying the source of a function we can change its target. The trick we employ to do this is
applicable to any relation:
LEMMA 3.3.6 There is a bijection between relations R from X to Y and functions R̃: X→ P(Y), defined by R̃(x) = {y | x R y}. If the relation R is functional, then each subset R̃(x) has at most one element. []
DEFINITION 3.3.7 The lift or partial function classifier of a set Y is the set of subsets with at most one element:
Lift Y = {V ⊂ Y | ∀y₁,y₂ ∈ V. y₁ = y₂}, ⊥ = ∅ ∈ Lift Y.
We tend to identify y ∈ Y with lift y ∈ Lift Y. (Classically, Lift Y = Y ∪ {⊥}, and we put f(x) = ⊥ for x ∉ supp f.)
The lemma restricts to a bijection between partial functions X ⇀ Y and total functions X→ Lift Y. Lift Y is the set of partial functions {∗} ⇀ Y, and the extension order for these agrees with the inclusion order on subsingleton subsets of Y; we call it the information order. It is rather sparse, being discrete when restricted to Y itself, with ⊥ ⊑ y. Example 3.9.8(c) extends the definition of the lift to the case where there is already an order on Y, Exercise 3.71 shows how to construct it for topological spaces and locales, and Example 9.4.11(a) explains its significance in type theory. Function-types in Section 3.5 are typically less ``flat'' than Lift Y is.
We expect computable functions to be monotone with respect to the information order: if x₁ ⊑ x₂ then f(x₁) ⊑ f(x₂). Otherwise, providing the extra information that x₂ carries beyond x₁ would result in retracting the guarantees which f(x₁) has already given about the output.
On the other hand, strictness of a program means that it tries to use the input ( cf Remark 2.3.4 and Example
6.1.10). The input of T is the sub-program u in Remark 3.3.2, which, as we saw, need not be called (used), so in
general T(⊥) ≠ ⊥.
The fixed point theorem We may define fact as the join ⊔ₙ Tⁿ(⊥) in the poset of partial functions N ⇀ N. It is important to appreciate that the order implicit here is the extension or information order, not the arithmetical one.
Now the poset N ⇀ N does not have all joins, so how can we be sure that this particular join exists? We have ⊥ ⊑ T(⊥) since ⊥ is the least element, and then since T is monotone it follows that
⊥ = T⁰(⊥) ⊑ T(⊥) ⊑ T²(⊥) ⊑ ··· ⊑ Tⁿ(⊥) ⊑ Tⁿ⁺¹(⊥) ⊑ ···
DEFINITION 3.3.9 A poset X is ω-complete if any ω-sequence, ie any diagram x(-): ω→ X where ω is N with the arithmetical order, has a join. A monotone function f: X→ Y between ω-complete posets is ω-continuous if it preserves all such joins.
EXAMPLE 3.3.10 The poset [N→ LiftN] of partial endofunctions of N is ω-complete, by Lemma 3.3.5(d). Any
functional T:[N→ LiftN]→ [N→ LiftN] which codes a recursive program is, in fact, ω-continuous.
PROPOSITION 3.3.11 Let X be an ω-complete poset which has a least element ⊥ and let T:X→ X be an ω-continuous function. Then the join ∨ₙ Tⁿ(⊥) exists, is a fixed point of T and is indeed the least such.
PROOF:
(a)
We have already observed that n ↦ Tⁿ(⊥) is an ω-sequence, so the join exists.
(b)
The sequence n ↦ T(Tⁿ(⊥)) = Tⁿ⁺¹(⊥) is the same as n ↦ Tⁿ(⊥), apart from the missing T⁰(⊥) = ⊥, but this does not affect the join.
(c)
Since T preserves joins of ω-sequences, T(∨ₙ Tⁿ(⊥)) = ∨ₙ Tⁿ⁺¹(⊥), so this is a fixed point.
(d)
If T(θ) = θ then Tⁿ(⊥) ≤ θ by induction on n (since if Tⁿ(⊥) ≤ θ then Tⁿ⁺¹(⊥) ≤ T(θ) = θ), so ∨ₙ Tⁿ(⊥) ≤ θ.
[]
This is often (inaccurately) called Tarski's theorem: the result that Alfred Tarski actually proved (1955, Exercise
3.39) is that any monotone endofunction of a complete lattice has a least fixed point, and indeed a complete
lattice of fixed points (see also Proposition 3.7.11ff).
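The iteration can be watched concretely. In the following sketch (an illustration of the construction, not from the text), partial functions N ⇀ N are represented as Python dicts, ⊥ is the empty dict, and T is the functional coding the recursive program for fact; iterating T from ⊥ climbs the chain above in the extension order.

```python
# Partial functions N ⇀ N as dicts; T is one unfolding of the
# recursive definition fact(0) = 1, fact(n+1) = (n+1)*fact(n).
def T(u):
    v = {0: 1}                      # the base case needs no call to u
    for n, r in u.items():          # use u only where it is defined
        v[n + 1] = (n + 1) * r
    return v

u = {}                              # bottom: the nowhere-defined function
for _ in range(4):
    u = T(u)
print(u)                            # T^4(⊥), defined on {0, 1, 2, 3}
```

Each Tⁿ(⊥) extends its predecessor, and the join of the chain is the (total) factorial function.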
REMARK 3.3.12 Algebraic topology gives other, quite different, reasons why continuous endofunctions of certain spaces must have fixed points: the closed interval [0,1] ⊂ R and the disc B² = {(x,y) | x²+y² ≤ 1} ⊂ R² are the simplest examples (the former is essentially the intermediate value theorem). The latter is due to Jan Brouwer, which is ironic because such results rely on excluded middle, and the fixed points do not depend continuously on the given endofunction.
Tarski's theorem does assign fixed points continuously - a fact which is crucial to denotational semantics.
A traditional fixed point theorem which is more closely related to what we require is that for contraction
mappings on a complete metric space (X,d), ie functions f:X→ X such that ∀x, y.d(f(x),f(y)) ≤ k d(x,y) for some
constant 0 ≤ k < 1. This analogy is closer if the symmetry law, d(x,y) = d(y,x), for metric spaces is dropped, since
certain concrete domains can be equipped with such a (pseudo)metric.
3.4 Domains
Domains abstract what is needed to prove Tarski's fixed point theorem. For the interpretation of
programming languages, joins of ω-sequences suffice, so many authors just consider ω-complete posets
with ⊥, though there are very many more specialised notions in the literature.
Directed diagrams Aside from the fixed point theorem, carrying the completeness property for
sequences verbatim through our working gets us caught in pedantic and ultimately confusing notation.
For example, if we take such a join over two indices n and m we no longer have a sequence. The
following definition is equivalent (in the sense of Proposition 3.2.10), but it is invariant under operations
such as products.
It may be important that the diagram be computable (and a fortiori countable), but this is an issue which
is best studied separately.
DEFINITION 3.4.1 A poset ℑ is directed if every finite subset F ⊂ ℑ has an upper bound, F ≤ i ∈ ℑ. Specialising this to the cases F = ∅ and F = {i₁,i₂}, directedness is equivalent to the nullary condition (∃i ∈ ℑ, ie ℑ is inhabited) together with the binary condition ∀i₁,i₂. ∃i. i₁ ≤ i ∧ i₂ ≤ i.
(a)
If the binary form (alone) holds then we say that ℑ is semidirected. Classically, a poset is semidirected iff it is either directed or empty.
(b)
We say that ℑ is confluent ( cf Definition 1.2.5) if
∀i₀,i₁,i₂. i₀ ≤ i₁ ∧ i₀ ≤ i₂ ⇒ ∃i. i₁ ≤ i ∧ i₂ ≤ i.
A directed lower subset of a poset X is called an ideal , and we write IdlX for the poset of them, ordered
by inclusion. The analogy between lattices and rings which lies behind the name ideal ( cf Example 2.1.3
(b)) was first noticed by Marshall Stone in 1935 (Exercises 3.10 and 3.11).
EXAMPLES 3.4.2
(a)
Any poset which has a greatest element is directed.
(b)
N with the arithmetical order is directed (ω, Definition 3.3.9).
(c)
Any join-semilattice ( cf Definition 3.2.12) is directed.
(d)
A lower subset of a join-semilattice is directed (an ideal) iff it is a (lower) subsemilattice.
(e)
For any set X, the powerset P(X) and, more usefully, the finite powerset P_f(X) consisting of the finite subsets of X, are directed posets under inclusion; see Lemma 6.6.10(e).
(f)
Raw λ-terms form a confluent preorder under (reverse) βη-reduction (the Church-Rosser
Theorem, Fact 2.3.3).
(g)
A poset is confluent iff every connected component (Lemma 1.2.4) is directed.
Lemma 3.5.12 and its corollary, about functions of two arguments, show how much easier it is to use
directedness than sequences.
NOTATION 3.4.3 If ℑ is directed then we indicate this fact by an arrow when writing its directed join: ∨↑_{i∈ℑ} x_i. Often this arrow is used instead of saying in words that the relevant set is assumed or has been shown to be directed.
DEFINITION 3.4.4 A poset which has all directed joins is called directed complete or a dcpo for short. If it
also has a least element - from which it follows (without using excluded middle) that it has joins of all
semidirected sets - then we call it an inductive poset or ipo.
The term complete partial order or cpo is commonly found instead of ipo in the literature, but we avoid
it on the grounds that it conflicts with complete categories ( cf complete semilattices), which also have
(the analogue of) finite joins. This confusion has been made worse by authors who use ``cpo'' for dcpo,
ie not necessarily with bottom.
A (monotone) function between dcpos which preserves directed joins is said to be Scott-continuous.
These are the most useful morphisms between ipos because we wish to allow programs to ignore inputs
and so terminate even if these are not specified. This is peculiar from the point of view of universal
algebra because not all of the structure is preserved. The fixed point theorem is essentially the only use
of ⊥.
REMARK 3.4.5 Peter Freyd [Fre91] observed that the three notions of
(a)
domain (having, but with functions not necessarily preserving, ⊥),
(b)
predomain (not necessarily having ⊥ at all) and
(c)
lift-algebra (both having and preserving ⊥)
ought to be formulated in tandem (Example 7.5.5(c)). Since we only intend to investigate function-
spaces, where ⊥ gets in the way, we shall concentrate on dcpos (predomains). Notice that when the
morphisms change, in this case whether or not they have to preserve ⊥, we change the names of the
objects too, from domain or ipo to lift-algebra.
LEMMA 3.4.6 The composite of two Scott-continuous functions is Scott-continuous, as is the identity
function. []
The Scott topology Continuity may be expressed in terms of open or closed sets, but the
correspondence partly depends on excluded middle.
DEFINITION 3.4.7 A subset A ⊂ X of a dcpo is said to be Scott-closed if it is a lower subset and also closed (``upwards'') under directed joins. In particular any representable lower set X↓x is Scott-closed.
LEMMA 3.4.8 A function f:X→ Y between dcpos is Scott-continuous iff the inverse image f-1(B) of every
Scott-closed subset B ⊂ Y is again Scott-closed.
PROOF: For monotonicity, let x′ ≤ x ∈ X and consider B = Y↓f(x), so x ∈ f⁻¹(B). Then x′ ∈ f⁻¹(B) ⇔ f(x′) ≤ f(x).

Suppose that the inverse image of every Scott-closed subset is Scott-closed, and let x = ∨↑x_i, y = ∨↑f(x_i) and B = Y↓y. Then each x_i ∈ f⁻¹(B), which is Scott-closed, so x ∈ f⁻¹(B), ie f(x) ≤ y, whence f(∨↑x_i) = ∨↑f(x_i).

Conversely, let B ⊂ Y be any Scott-closed subset and (x_i) ⊂ f⁻¹(B) be a directed set with ∨↑f(x_i) = f(∨↑x_i) ∈ B. Then ∨↑x_i ∈ f⁻¹(B) as required. []
Classically there is a bijective correspondence between closed subsets and their complementary open
subsets, but we shall define them separately.
PROPOSITION 3.4.9 Let (X, ≤) be a dcpo. Then for a subset U ⊂ X, the characteristic function χ_U: X→ Ω is Scott-continuous iff
(a)
U is an upper set and
(b)
it is inaccessible by directed joins, ie for any directed diagram x_(-): ℑ→ X, if ∨↑x_i ∈ U then ∃i. x_i ∈ U.
Such subsets are said to be Scott-open, and they form a topology. For any Scott-continuous function f: X→ Y between dcpos, the inverse image of any open subset V ⊂ Y is open. The converse holds so long as there are enough Scott-open sets to make the specialisation order (Example 3.1.2(i)) coincide with the given order.
PROOF: Directedness is needed to show that the whole set X is Scott-open, and that intersections of open subsets are open. Closure under unions is easy, and we leave the rest as an exercise, adapting the proof of Lemma 3.4.8. []
DEFINITION 3.4.10 The type of truth values is playing a topological role here, in which the point T is open and ⊥ is closed. This is called the Sierpiński space, S. Intuitionistically, it is intermediate between 2 and Ω, though the considerations bearing on what its definition ought to be lie outside the scope of this book, so as with R we shall avoid questions that rely on the distinctions.
Classically, subsets of the form {x | x ≰ y} are Scott-open, but the Scott topology need not be sober [ Joh82, p. 46]. For an intuitionistic result, we restrict attention to a smaller class of domains, drawing upon another source of upper (so possibly open) subsets: those of the form x↓X.
DEFINITION 3.4.11 If x ≤ ∨↑y_i ⇒ ∃i. x ≤ y_i for all directed sets, then x is said to be compact or finitely generable. We write X_fg ⊂ X for the subset of compact elements. If every element of X is the directed join of the compact elements below it, then X is called an algebraic dcpo. The name arose by extension from algebraic lattices (Theorem 3.9.4), but since their algebraic aspect has really been lost, finitary or locally finitely generable would be better names. Algebraic dcpos satisfy the qualification of the previous result, as do the more general continuous dcpos [ GHK+80, Joh82].
PROPOSITION 3.4.12 A dcpo X is algebraic iff it is isomorphic to Idl(Y) for some poset Y. Then for any dcpo Θ there is a bijection between Scott-continuous functions Idl(Y)→ Θ and monotone functions Y→ Θ. In particular the topology is the lattice of monotone functions [Y→ Ω].
PROOF: [⇒] For X algebraic, put Y = X_fg. Then ℑ = {y | x ≥ y ∈ Y} and x = ∨↑ℑ give the isomorphism. [⇐] For any ideal ℑ ∈ Idl(Y), ℑ = ∪↑{Y↓y | y ∈ ℑ}, and ℑ is compact iff ℑ = Y↓y for some y ∈ Y. Finally, any continuous function f: X→ Θ is determined by its values at compact elements. []
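For a small finite poset the ideal completion can be enumerated by brute force. The sketch below (an illustration of my own; the three-element poset 0 ≤ a, 0 ≤ b and all names are made up) lists the directed lower subsets, ie the elements of Idl(Y).

```python
from itertools import combinations

# Hypothetical toy poset Y: 0 ≤ a and 0 ≤ b, with reflexivity.
elems = ['0', 'a', 'b']
le = {('0', '0'), ('a', 'a'), ('b', 'b'), ('0', 'a'), ('0', 'b')}

def is_lower(s):
    # down-closed: anything below a member is a member
    return all(x in s for y in s for x in elems if (x, y) in le)

def is_directed(s):
    # inhabited, and every pair has an upper bound in s
    if not s:
        return False
    return all(any((x, i) in le and (y, i) in le for i in s)
               for x in s for y in s)

ideals = [set(c) for r in range(len(elems) + 1)
          for c in combinations(elems, r)
          if is_lower(set(c)) and is_directed(set(c))]
print(sorted(map(sorted, ideals)))   # [['0'], ['0', 'a'], ['0', 'b']]
```

Note that {0, a, b} is a lower set but not an ideal, since a and b have no common upper bound; Idl(Y) here is the three-element dcpo with bottom {0}.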
PROPOSITION 3.5.1 Let (X, ≤_X) and (Y, ≤_Y) be preorders. We equip the cartesian product X×Y with the componentwise order:
⟨x₁,y₁⟩ ≤_{X×Y} ⟨x₂,y₂⟩ if x₁ ≤_X x₂ and y₁ ≤_Y y₂.
Then
(a)
the projections π₀: ⟨x,y⟩ ↦ x and π₁: ⟨x,y⟩ ↦ y are monotone;
(b)
if a:Γ→ X and b:Γ→ Y are monotone functions then so is the function ⟨a,b⟩:Γ→ X×Y defined by ⟨a(-),b(-)⟩;
(c)
a function p: X×Y→ Z is (jointly) monotone iff it is monotone in each argument, for each constant value of the other;
(d)
if X and Y are posets, ie their order relations ≤_X and ≤_Y are antisymmetric, then X×Y is also a poset. []
The corresponding result for domains applies equally well to semilattices, lattices, Heyting (semi)lattices and complete (semi)lattices, so we state it for individual diagram shapes. We shall consider part (c) later.
PROPOSITION 3.5.2
(a)
Let ⟨x_(-),y_(-)⟩: ℑ→ X×Y be a diagram. Then the joins
∨^{X×Y}_{i∈ℑ} ⟨x_i,y_i⟩ = ⟨∨^X_{i∈ℑ} x_i, ∨^Y_{i∈ℑ} y_i⟩
coincide in the sense that if one exists then so does the other, and then they are equal.
(b)
If X and Y have all joins of shape ℑ then so does X×Y.
(c)
The projections π₀: ⟨x,y⟩ ↦ x and π₁: ⟨x,y⟩ ↦ y preserve joins.
(d)
If a:Γ→ X and b:Γ→ Y preserve joins of shape ℑ then so does the pair ⟨a,b⟩:Γ→ X×Y.
PROOF: In (a), ⟨θ,φ⟩ is an upper bound for the set of pairs iff θ bounds the first components and φ the second. Hence ⟨θ,φ⟩ is least (or locally least or minimal) iff both θ and φ are. The other parts follow. []
COROLLARY 3.5.3 If X and Y are dcpos or ipos then so is XxY, the projections are Scott-continuous and pairing preserves
continuity. []
Products of diagrams are used to handle multiple suffixes. Doubly indexed joins can be rearranged, as may doubly
indexed meets, but Example 3.5.14 shows that meets cannot be interchanged with joins.
LEMMA 3.5.4 Let x_{(-)(=)}: ℑ×J→ X be a diagram. If either the expression on the left or that on the right is defined in
∨_{i∈ℑ} ∨_{j∈J} x_{ij} = ∨_{⟨i,j⟩∈ℑ×J} x_{ij} = ∨_{j∈J} ∨_{i∈ℑ} x_{ij},
then so is the one in the middle, and then they are equal.
PROOF: For any θ,
∀i. θ ≥ ∨_{j∈J} x_{ij} ⇔ ∀i,j. θ ≥ x_{ij} ⇔ ∀j. θ ≥ ∨_{i∈ℑ} x_{ij},
and each of these is equivalent to θ lying above the corresponding join. Substituting each of the three joins for θ, they must be equal. []
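When the joins are maxima of finite sets of numbers, the lemma is the familiar interchange of iterated maxima. A quick numerical check (illustrative only, not a proof of the general case):

```python
import random

# A doubly indexed family x[i][j] in the chain (N, ≤), where join = max.
x = [[random.randint(0, 99) for j in range(5)] for i in range(4)]

left   = max(max(x[i][j] for j in range(5)) for i in range(4))  # rows first
middle = max(x[i][j] for i in range(4) for j in range(5))       # all at once
right  = max(max(x[i][j] for i in range(4)) for j in range(5))  # columns first
assert left == middle == right
```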
PROPOSITION 3.5.5 Let Y be a preorder and f,g: X→ Y be functions (the order, if any, on X is unimportant). Then the pointwise order is given by
f ≤_{[X→Y]} g if ∀x: X. f(x) ≤_Y g(x).
The preorder of monotone functions from X to Y is written [X→ Y] or Y^X and is called the function-space. Then
(a)
ev: [X→ Y]×X→ Y, given by ⟨f,x⟩ ↦ f(x), is monotone;
(b)
if p: Γ×X→ Y is monotone then so is its exponential transpose λx.p(-,x): Γ→ [X→ Y];
(c)
[X→ Y] is a poset, ie the pointwise order is antisymmetric, if Y is.
PROOF:
(a)
If ⟨f,x₁⟩ ≤ ⟨g,x₂⟩ then f ≤ g and x₁ ≤ x₂, so using the pointwise order, monotonicity and transitivity, we have a square omitted diagram environment
(b)
Joint monotonicity implies separate monotonicity.
(c)
f ≤ g ≤ f iff ∀x: X. f(x) ≤_Y g(x) ≤_Y f(x), whilst f = g iff ∀x: X. f(x) = g(x). []
EXAMPLES 3.5.6
(a)
[X → Ω], where X is a set with the discrete order and Ω is the type of propositions, is isomorphic to the powerset
P(X) under inclusion.
(b)
[X^op→ Ω], where X is a poset, is isomorphic to shv(X), the lattice of lower sets (Definition 3.1.7).
(c)
Similarly [X→ Ω] is the lattice of upper sets (Example 3.1.6(f)).
(d)
[X→ LiftY], where X and Y are (discrete) sets and LiftY carries the information order (Definition 3.3.7), is
isomorphic to the poset of partial functions X\rightharpoonup Y with the extension order (Definition 3.3.3).
LEMMA 3.5.7 Let f_(-): ℑ→ [X→ Y]. If the joins on the right of
∨^{[X→Y]}_{i∈ℑ} f_i = λx. ∨^Y_{i∈ℑ} f_i(x)
exist for each x ∈ X then so does the join on the left, and the equation holds.
PROOF:
(a)
If we have {f_i | i:ℑ} ≤ θ (in particular if the function θ is the join) then each set {f_i(x) | i:ℑ} has an upper bound, θ(x).
(b)
The function p: x ↦ ∨^Y_{i∈ℑ} f_i(x), if it exists, is monotone, because if x′ ≤ x then ∀i. f_i(x′) ≤ f_i(x) ≤ p(x), and so since p(x′) is the least upper bound we have p(x′) ≤ p(x). It follows that p is the join of the functions. []
REMARK 3.5.8
(a)
However, knowing that there is some bound {f_i(x) | i:ℑ} ≤ u_x for each x ∈ X is not sufficient to give a bound for the set of functions, because x ↦ u_x need not be monotone. Indeed we need the axiom of choice even to select such a family of bounds. That part (b) above works and this doesn't illustrates the value of uniqueness, cf Lemma 1.2.11; ``locally least'' may be good enough, but inserting ``minimal'' doesn't help. omitted diagram environment
(b)
Let θ be a monotone upper bound of the two functions illustrated, so θ(1) = f(1) = g(1), but θ(0) must also take this value. This constant function is therefore the least upper bound in the function-space, but as θ(0) ≰ u it is not the pointwise least upper bound. []
PROPOSITION 3.5.9 Suppose that Y has all joins of shape ℑ (X being any preorder).
(a)
Then [X→ Y] also has all joins of shape ℑ, computed pointwise;
(b)
ev(-,x) preserves them for each x;
(c)
of course ev(f,-) ≡ f preserves them iff f does;
(d)
for p: Γ×X→ Y, if p(-,x) preserves such joins for each x ∈ X then so does the transpose λx.p(-,x): Γ→ [X→ Y]. []
So far we have been discussing the poset of all monotone functions, which is not what we want for domains. Abusing
notation, we also write [X→ Y] for the poset of Scott-continuous functions from X to Y, with the pointwise order, X and
Y now being dcpos. Recall that pointwise joins gave joins of functions.
PROPOSITION 3.5.10 The same holds with [X→ Y] reinterpreted to consist only of continuous functions. In particular, [X→ Y] is a dcpo, and an ipo with ⊥_{[X→Y]} = λx.⊥_Y if ⊥_Y exists.
PROOF: From Lemma 3.5.4, if each f_j preserves ℑ-indexed joins (for each j ∈ J), and ∨_j f_j exists pointwise, then it too preserves ℑ-indexed joins. Moreover it is the join amongst join-preserving functions. []
Beware that we have not said that ev:[X→ Y]xX→ Y preserves joins.
Joint continuity Proposition 3.5.2 stated the properties of products of domains analogous to the result preceding it,
apart from part 3.5.1(c). This deserves special consideration.
There are several ways of reacting to this; in particular Exercise 3.20 shows what is missing in the case of binary meets
or joins. For directed joins it turns out that the difficulty does not arise, but perhaps this just shows the poverty of order-
theoretic representations of semantics.
LEMMA 3.5.12 The product of two directed, semidirected or confluent (Definition 3.4.1) posets has the same property.
Moreover the diagonal function ℑ→ ℑxℑ is cofinal ( Proposition 3.2.10) iff ℑ is semidirected, whilst the function ℑ→
{∗} is cofinal iff ℑ is inhabited. []
COROLLARY 3.5.13 A function f: X×Y→ Z of two arguments between dcpos is (jointly) continuous iff it is separately continuous. In particular ev: [X→ Y]×X→ Y is jointly continuous.
PROOF: The diagonal is cofinal by Lemma 3.5.12, and separate continuity of f(-,∨↑y_j) and f(x_i,-) gives f(∨↑x_i, ∨↑y_j) = ∨↑_i ∨↑_j f(x_i,y_j). []
We may compute binary meets pointwise if they commute with directed joins; a dcpo with this property is called a
preframe. Infinite meets in [X→ Y] may exist but need not be computed pointwise.
EXAMPLE 3.5.14 Consider X = Y = [0,1] ⊂ R (the unit interval) and (with J = N) let f_n: X→ Y by f_n(x) = xⁿ. The pointwise meet is discontinuous at 1 in both the Cauchy and Scott senses; the constantly 0 function is the meet in the function-space. []
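The failure of the pointwise meet can be seen numerically. The sketch below (an illustration only) approximates inf_n xⁿ by a large finite minimum: it vanishes for x < 1 but jumps to 1 at x = 1.

```python
# Approximate the pointwise infimum of f_n(x) = x**n over n = 1..199.
# For x < 1 the true infimum is 0; at x = 1 it is 1, so the pointwise
# meet is discontinuous there; the meet among continuous functions is 0.
def inf_at(x, N=200):
    return min(x**n for n in range(1, N))

print(inf_at(0.5))   # vanishingly small
print(inf_at(1.0))   # 1.0
```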
It is common for function-spaces to inherit the properties of the target domain, irrespective of the source, because the
function-space is often a subalgebra of a product. We have shown that this is the case for the existence of various classes
of joins (see also Exercise 3.21), but there are simple counterexamples to this behaviour for the trichotomy property.
Scott's thesis The origin of the work in these three sections was Dana Scott's unexpected discovery that non-syntactic
models of the untyped λ-calculus could be constructed from certain topological spaces (1969). These very ungeometrical spaces were algebraic lattices endowed with Scott's topology. Their retracts, known as continuous lattices,
also arose from abstract work in topological lattice theory [ GHK+80].
Scott proposed that topological continuity be used as an approximation to computability. There were precedents for this
idea in recursion theory around 1955. Yuri Ershov observed the analogy between the lattice of recursively enumerable
sets and a topology. The Rice-Shapiro Theorem says that any recursively enumerable set of partial recursive functions
( ie sets of codes such that if one code for a function N\rightharpoonup N belongs to the set then so do all others) is
Scott-open. The Myhill-Shepherdson Theorem says that any recursive f:[N\rightharpoonup N]\rightharpoonup [N
\rightharpoonup N], as we would write it domain-theoretically, is Scott-continuous.
Christopher Strachey and others applied Scott's work to denotational semantics of programming languages, where the
lattice element T was inappropriate. Scott and his followers repeatedly simplified the theory for this new audience, with
the result that order theory replaced topology in the formal development. In particular, the term Scott domain came to be
applied to any boundedly complete algebraic dcpo X for which X_fg is countable (Exercise 3.21).
Domain theory can solve, not only fixed point equations (Example 3.3.1), but also type-equations, such as X ≡ [X→ X]
for the untyped λ-calculus. The right hand side may involve any of the type-constructors on ipos in an arbitrarily
complicated way, giving a domain of mathematical meanings for objects with functions, case analysis and non-
determinism.
Here and in Section 4.7 we interpret the λ-calculus , primitive recursion on N and the fixed point operator Y. Gordon
Plotkin [Plo77] considered these as a programming language ( PCF), with call-by-name evaluation (Remark 2.3.4). He
showed that any program (closed term of type N) whose denotation in IPO is a numeral in LiftN ( ie not ⊥) terminates
with that value, so there is a link back from the semantics to the syntax. This can now be proved by methods like those
in Section 7.7.
However, parallel or ( por(yes,⊥) = yes = por(⊥,yes)) is also interpreted in IPO, but is not definable in PCF, whose
programs execute ``sequentially.'' The tight link is broken for higher order terms, as there exist such terms that can
``recognise'' por as an argument. By adding por and a similar ``existential quantifier'' to PCF, Plotkin was able to extend
the correspondence to higher types. Gérard Berry eliminated por with a different notion of ``domain,'' cf Example 4.51,
but more complicated examples recur. The sequentiality and ``full abstraction'' problems remained open for two
decades; they were solved for PCF in 1994 by Abramsky, Hyland, Jagadeesan and Ong, using games.
Without bounded completeness, function-spaces of algebraic dcpos need not be algebraic. Achim Jung showed that his
own L-domains (Exercise 3.34) and Plotkin's SFP domains are the two maximal cartesian closed categories ( ie closed
under function-spaces) of algebraic dcpos [Jun90].
The search for cartesian closed or ``convenient'' categories in topology is much older, and equally inconclusive. The function-space S^X (where S is the Sierpiński space, Definition 3.4.10) only exists with the properties it should have when X is locally compact, and even when Y^X exists it need not be locally compact. A famous cartesian closed full
subcategory of compact Hausdorff spaces was found by John Kelley [Kel55], and there are other approaches to topology
with different notions of function-space. Example 9.4.11(f) shows how a certain generalised function-space first arose
geometrically.
Because of the generality of the infinitary joins required in topology, a function may be topologically continuous
without being computable. One can add the word ``effective'' throughout the theory, but it seems to me to be very
clumsy to bolt together two subjects like this. There ought to be a common axiomatisation, of which the free model
would be equivalent to recursion theory, but with another model consisting of certain spaces. Synthetic domain theory
abolishes non-computable functions between sets themselves, by refining the underlying logic.
This cannot be done in classical logic, because the extra axioms (such as the Church-Turing thesis) conflict with
excluded middle. However, the use of excluded middle so infests existing accounts of mathematical foundations that it
was necessary to start from the beginning, although synthetic domain theory is beyond the scope of this book.
3.6 Adjunctions
Adjunctions unify the treatment of the logical connectives. Generalised from propositions to types, they
not only handle the operations x, → , +, etc but also account for the ``universal properties'' that we have
been meeting, such as Propositions 3.1.10, 3.2.7(b) and 3.4.12. Adjunctions themselves also arise as a
common method of construction involving powersets. In view of this and the shift in the next chapter to
categories, we now call the posets S and A, and their elements X ∈ S and A ∈ A.
DEFINITION 3.6.1 Let F: S→ A and U: A→ S be functions (not a priori monotone) between preorders. We say that F and U are
(a)
adjoint, written F ⊣ U, if ∀X,A. F(X) ≤_A A ⇔ X ≤_S U(A);
(b)
a Galois connection if ∀X,A. A ≤_A F(X) ⇔ X ≤_S U(A);
(c)
a co-Galois connection if ∀X,A. F(X) ≤_A A ⇔ U(A) ≤_S X.
Galois and co-Galois connections are also known as contravariant or symmetric adjunctions, on the right and left respectively. They are the same as covariant adjunctions ( ie the first kind) between S and A^op, but Section 3.8 shows how (a) and (b) arise between full powerset lattices in two idiomatic but different ways. Exercise 4.27 uses all three as interesting type-connectives between complete semilattices.
LEMMA 3.6.2 These conditions are respectively equivalent to
(a)
F and U being monotone with id_S ≤ U∘F and F∘U ≤ id_A;
(b)
F and U being antitone with id_S ≤ U∘F and id_A ≤ F∘U;
(c)
F and U being antitone with U∘F ≤ id_S and F∘U ≤ id_A;
where ≤ is the pointwise order (Proposition 3.5.5). Moreover if S and A are posets, then in all three cases F = F∘U∘F and U = U∘F∘U.
PROOF: Suppose F ⊣ U. Let X ≤ Y and put A = F(Y); then F(Y) ≤ A, so X ≤ Y ≤ U(A), whence F(X) ≤ A. Next, F(X) ≤ F(X), so X ≤ U(F(X)). Monotonicity of U and the other inequality are the same. Conversely, if F(X) ≤ A then X ≤ U(F(X)) ≤ U(A), and similarly F(X) ≤ F(U(A)) ≤ A. Finally F(X) ≤ F(U(F(X))) since X ≤ U(F(X)), and F(U(F(X))) ≤ F(X) since F(U(A)) ≤ A, so F = F∘U∘F by antisymmetry. The results for Galois connections are obtained by reversing some of the inequalities. []
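A toy covariant adjunction on the chain (N, ≤) makes the unit and counit inequalities tangible. In this sketch (my own example, not from the text) F is multiplication by 3 and its right adjoint U is integer division by 3, since 3x ≤ a ⇔ x ≤ ⌊a/3⌋.

```python
# F ⊣ U on (N, ≤): F(x) = 3x, U(a) = a // 3.
F = lambda x: 3 * x
U = lambda a: a // 3

# the defining adjunction: F(x) ≤ a  ⇔  x ≤ U(a)
assert all((F(x) <= a) == (x <= U(a)) for x in range(50) for a in range(50))
# unit and counit inequalities from the lemma
assert all(x <= U(F(x)) for x in range(50))    # id ≤ U∘F
assert all(F(U(a)) <= a for a in range(50))    # F∘U ≤ id
```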
COROLLARY 3.6.5 If S is a poset and F:S→ A has a right adjoint U then it is unique. Similarly for left
adjoints. []
LEMMA 3.6.6 Adjunctions compose: if F ⊣ U between S and A, and G ⊣ V between A and B, then F;G = G∘F ⊣ U∘V = V;U. This is one of the circumstances in which it is convenient to use both the left-handed (∘) and right-handed (;) notations together. []
REMARK 3.6.7
(a)
Any isomorphism between preorders is an adjunction, in two ways, with ≤ replaced by equality. Likewise any duality (isomorphism S ≡ A^op) is both a Galois connection and a co-Galois connection.
(b)
Replacing ≤ in Definition 3.6.1 by the equivalence relation ∼ (the conjunction of ≤ and ≥, Proposition 3.1.10), we obtain the notion of a strong equivalence of preorders: id_S ∼ U∘F and F∘U ∼ id_A.
(c)
A full monotone function F: S→ A for which ∀A:A. ∃X:S. F(X) ∼ A is called an equivalence function; for example Proposition 3.1.10 showed that any preorder is equivalent to a poset. Using the axiom of choice, every equivalence function is part of a strong equivalence (see Exercise 3.26 for why Choice is necessary).
The last does not, as it stands, give an equivalence relation on the class of preorders: it is reflexive and
transitive but not symmetric. But as it is also confluent (Exercise 3.27) it is easy to find the equivalence
closure:
(d)
Two preorders S and T are said to be weakly equivalent if a third preorder A and equivalence functions S→ A← T are given.
Although Proposition 3.1.10 makes these notions somewhat redundant for preorders, we shall need them
for categories (Definition 4.8.9).
The adjoint function theorem Theorem 3.6.9 is the most important result about infinitary meets and
joins. However, it is rarely stated: since the formula for the adjoint is notationally simple, it is used,
embedded in more complicated calculations, without appreciating the significance of the theorem.
PROPOSITION 3.6.8 Let F ⊣ U. Then F preserves any (locally) least upper bounds which exist, and U preserves any greatest lower bounds which exist. Functions which are symmetrically adjoint on the left preserve least upper bounds, and functions which are symmetrically adjoint on the right preserve greatest lower bounds.
PROOF: Let C be an upper bound of ℑ ⊂ S and θ likewise of the image, F(ℑ) = {F(X) | X ∈ ℑ}. Then
(a)
F(C) is an upper bound of F(ℑ) by monotonicity.
(b)
Since ∀X. F(X) ≤ θ, we have ∀X. X ≤ U(θ) by adjointness, so U(θ) is an upper bound of ℑ.
(c)
If C is a least upper bound of ℑ then C ≤ U(θ), so F(C) ≤ θ.
(d)
If C is locally least and F(C) ≤ A ≥ θ then C ≤ U(A) ≥ U(θ), so C ≤ U(θ) and F(C) ≤ θ.
The other cases are the same, switching arguments and inequalities. []
The adjoint function theorem can be proved in a similar way for meets and joins of
diagrams (Definition 3.2.9) rather than subsets. Exercises 3.33ff generalise to minimal and locally least
upper bounds.
THEOREM 3.6.9 Let S be a poset with all joins, A a preorder and F:S→ A a function. Then F has a right
adjoint U:A→ S iff it preserves all joins. Similarly a function from a poset with all meets has a left
adjoint iff it preserves all meets.
PROOF: As S has arbitrary joins and F preserves them, the lower subset {X: S | F(X) ≤ A} is representable, ie of the form S↓X₀ for some unique X₀ ∈ S (Definition 3.1.7). Indeed U(A) = X₀ = ∨{X: S | F(X) ≤ A}. []
EXAMPLES 3.6.10
(a)
The unique function F:S→ {∗} has a right adjoint iff S has a greatest element. The right adjoint
U is ∗→ T, and T is the join of the whole poset (considered as a diagram id:S→ S).
(b)
The diagonal function F:S→ SxS has a right adjoint iff S has meets of pairs, and then the right
adjoint is U(X,Y) = X∧Y.
(c)
More generally, let ℑ be any diagram shape, and put A = S^ℑ with Δ(X) = λi.X ∈ A. Then γ ∈ S is a lower bound for a diagram d: ℑ→ S iff Δ(γ) ≤_A d. The right adjoint, Δ ⊣ U, provides the meet, U(d) = ∧d(ℑ); it is given by ∨{γ: S | Δ(γ) ≤_A d}. (Notice that this repeats the same calculation at each co-ordinate i ∈ ℑ.)
Similar results hold for ⊥ and joins, which are given by left adjoints to the diagonal or constant
functions. In particular,
(d)
the regular representation X↓ (-):A → shv(A) (Definition 3.1.7), which preserves meets by
Proposition 3.2.7(a), has a left adjoint iff A has all joins, and then the left adjoint is ∨.
REMARK 3.6.11 These observations, that ∨ ⊣ Δ ⊣ ∧ and that meets or joins commute with adjoints, are summed up by the diagram omitted diagram environment
where Δ: X ↦ λi.X : S→ S^ℑ. Using the composition (Lemma 3.6.6) and uniqueness (Corollary 3.6.5) of adjoints, U;Δ_S = Δ_A;U^ℑ implies F∘∨_S = ∨_A∘F^ℑ, and F;Δ_A = Δ_S;F^ℑ implies U∘∧_A = ∧_S∘U^ℑ.
So long as there are enough meets and joins, the converse holds.
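The chain ∨ ⊣ Δ ⊣ ∧ can be checked directly in a small case. In this sketch (an illustration of my own) the index set ℑ has two elements, so Δ(x) = (x,x) on the chain (N, ≤), the join of a pair is max and the meet is min.

```python
# Δ: S → S^ℑ with |ℑ| = 2, on the chain (N, ≤).
Delta = lambda x: (x, x)
le_pair = lambda p, q: p[0] <= q[0] and p[1] <= q[1]   # componentwise order

pairs = [(a, b) for a in range(8) for b in range(8)]
# ∨ ⊣ Δ:  max(p) ≤ x  ⇔  p ≤ Δ(x)
assert all((max(p) <= x) == le_pair(p, Delta(x))
           for p in pairs for x in range(8))
# Δ ⊣ ∧:  Δ(x) ≤ p  ⇔  x ≤ min(p)
assert all(le_pair(Delta(x), p) == (x <= min(p))
           for p in pairs for x in range(8))
```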
DEFINITION 3.6.12 A poset is a complete join-semilattice if every subset has a join, and similarly a
complete meet-semilattice if every subset has a meet; if both of these hold we call it a complete lattice.
Any complete join-semilattice is a complete meet-semilattice and vice versa, but we make the distinction between complete meet- and join-semilattices and complete lattices because, although the existence of one structure forces that of the other, a monotone function preserving all meets need not preserve all joins, cf Remark 3.4.5.
By Corollary 3.6.5, a function can have at most one adjoint on each side, but it can have both of them,
and there are strings of any finite length of successively adjoint monotone functions. Remark 3.8.9
characterises complete sublattices of powersets, ie inclusions with both adjoints.
EXAMPLES 3.6.13
(a)
The inverse image operation F = g* on open sets for a continuous function g between topological spaces preserves finite intersections.
(b)
In domain theory, if id_S = U∘F and F∘U ≤ id_A with U Scott-continuous ( ie preserving directed joins), F and U form an embedding-projection pair.
Frames and Heyting algebras Let us consider when F: Ω→ Ω by α ↦ α∧φ has a right adjoint.
PROPOSITION 3.6.14
(a)
Let U: β ↦ (φ⇒ β), so F ⊣ U; then these operations preserve joins and meets respectively.
(b)
Conversely, by the adjoint function theorem, in a complete lattice, F has a right adjoint iff binary meet distributes over arbitrary joins, φ ∧ ∨_i α_i = ∨_i (φ∧α_i). The adjunction
α ∧ φ ≤ β ⟺ α ≤ (φ⇒ β)
may be compared with the introduction and elimination rules for implication (Remark 1.4.3 and Definition 1.5.1).
Exercise 3.28 gives a purely equational version. Heyting semilattice homomorphisms by definition preserve ⇒, whereas semilattice homomorphisms need not.
A Heyting lattice also has ⊥ and ∨, and these are to be preserved by Heyting lattice homomorphisms. A Boolean algebra is a Heyting lattice in which x = ¬¬x, where ¬x is x⇒ ⊥, for which the truth table (Remark 1.8.4) gives a normal form. In the language of Heyting semilattices, by contrast, expressions may require nested implications, cf Convention 2.3.2 for bracketing function-types.
Powersets in classical and intuitionistic logic provide examples of complete Boolean and Heyting
lattices respectively. See [ Joh82, p. 35] for a picture of the free Heyting lattice on one generator.
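The adjoint function theorem's formula ∨{α | α∧φ ≤ β} computes Heyting implication in any complete Heyting lattice. The sketch below (my own illustration) applies it in the powerset frame P({0,1,2}), where it reduces, classically, to ¬U ∪ V.

```python
from itertools import combinations

# The powerset frame P(X) for X = {0, 1, 2}; meet = ∩, join = ∪.
X = frozenset(range(3))
subsets = [frozenset(c) for r in range(4) for c in combinations(X, r)]

def implies(U, V):
    # (U ⇒ V) = ∪ {A | A ∩ U ⊆ V}, the adjoint function theorem's formula
    result = frozenset()
    for A in subsets:
        if A & U <= V:
            result |= A
    return result

for U in subsets:
    for V in subsets:
        W = implies(U, V)
        # the adjunction: A ∩ U ⊆ V  ⇔  A ⊆ W
        assert all((A & U <= V) == (A <= W) for A in subsets)
        assert W == (X - U) | V     # Boolean case: ¬U ∪ V
```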
DEFINITION 3.6.16 A complete join-semilattice in which ∧ distributes over joins is called a frame. The
open sets of any topological space form a frame, and the inverse image operation of any continuous
function preserves finite meets and arbitrary joins. Implication and infinite meets exist, but need not be
preserved by frame homomorphisms: once again, the name of the objects changes when the morphisms
change. Frames will be discussed in Theorem 3.9.9ff.
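In a finite frame the Heyting implication of Proposition 3.6.14 can be computed directly as a join. The following sketch (the function names are ours, not the book's) represents opens as Python sets and finds (φ ⇒ β) as the largest open α with α∧φ ≤ β:

```python
# A sketch of Heyting implication in a finite frame: (phi => beta) is
# the join of all opens alpha with alpha ∧ phi ≤ beta.  The names
# `heyting_implies`, `opens` and `neg` are illustrative.

def heyting_implies(opens, phi, beta):
    """The largest open alpha with alpha ∩ phi ⊆ beta."""
    result = set()
    for alpha in opens:
        if alpha & phi <= beta:
            result |= alpha        # a union of opens is again open
    return result

# The frame of lower sets of the two-element poset 0 < 1:
opens = [set(), {0}, {0, 1}]
neg = lambda u: heyting_implies(opens, u, set())   # ¬u = (u ⇒ ⊥)
# Here ¬{0} = ∅, so ¬¬{0} = {0,1} ≠ {0}: this frame is not Boolean.
```

The double-negation computation illustrates why a Heyting lattice need not be a Boolean algebra.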
DEFINITION 3.7.1 A monotone endofunction M:S→ S of a poset such that \id_S ≤ M = M∘M is called a closure operation.
LEMMA 3.7.2 For any adjunction, U is injective iff F is surjective iff F∘U = \id_A. Every closure operation arises in this way: split the idempotent M (Definition 1.3.12); F takes an element X to the least A which is a fixed point of M above X (Remark 3.6.3). []
PROPOSITION 3.7.3 Let A ⊂ S be the image of a closure operation, so meets in A agree with those in S by
Proposition 3.6.8. To calculate joins in A, we first find them in S and then apply the closure operation.
A is already closed under any joins the closure operation preserves. []
In particular, topological closures preserve finite joins, and the finitary closure operations which arise in
algebra are Scott-continuous.
Closure conditions Behind this static definition lies a dynamic point of view, which leads us into a
notion of induction. Unfortunately the established terms ``operation'' and ``condition'' convey exactly the
wrong idea. This kind of induction is actually more general than that treated in Section 2.5, as it allows
for the non-deterministic recursive paradigm which lies behind resolution with backtracking in logic
programming.
Given any subset X of some space, let TX be the set of limits t (in the sense of analysis) of sequences (\termu_i) ⊂ X. Now TX need not be closed, so we form the set T²X of limits of sequences (\termu_i) ⊂ TX, and so on, but, unlike Proposition 3.3.11, even ∪_n TⁿX need not be closed. Nevertheless, MX exists
DEFINITION 3.7.4 A system of closure conditions on a set Σ is given by any relation \triangleright :P(Σ)\leftharpoondown\rightharpoonup Σ, without restriction. Then a subset A ⊂ Σ is said to be \triangleright -closed if K ⊂ A ⇒ t ∈ A for every instance K\triangleright t.
Let A ≡ Mod(L) be the set of \triangleright-closed subsets, ordered by inclusion, and U:Mod(L)
\hookrightarrow P(Σ). Since this is closed under intersections , the Adjoint Function Theorem 3.6.9
gives F\dashv U, and we call F(X) the closure of X ⊂ Σ. (We shall write FX ∈ A for the closure
considered as a model or closed subset, and MX ∈ S for its underlying set.)
The left hand side of K\triangleright t is called the arity of this instance of the closure condition: the
choice of letter reflects our use of k for the arity of operation-symbols. If this is always a finite set, we
sometimes write it as a list, and say that \triangleright is finitary.
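On a finite carrier the closure F(X) can be computed by brute force. This is only a sketch, assuming that a finitary \triangleright is given as a list of (K, t) pairs; the names `close` and `is_closed` are ours:

```python
# A sketch of Definition 3.7.4 for a finitary system of closure
# conditions, each given as a pair (K, t) meaning K |> t.

def is_closed(A, conditions):
    """A is |>-closed if K ⊆ A implies t ∈ A for each instance K |> t."""
    return all(t in A for K, t in conditions if K <= A)

def close(X, conditions):
    """F(X): the least |>-closed superset of X, by iterating the rules."""
    A = set(X)
    changed = True
    while changed:
        changed = False
        for K, t in conditions:
            if K <= A and t not in A:
                A.add(t)
                changed = True
    return A

# Closure of {0} under the unary conditions {n} |> n+2 within 0..9:
conds = [({n}, n + 2) for n in range(8)]
evens = close({0}, conds)          # the even numbers 0, 2, ..., 8
```

The while-loop terminates on a finite carrier because each pass either adds an element or stops; it is the naive form of the forward chaining of Remark 1.7.2.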
EXAMPLES 3.7.5
(a)
If \triangleright is a functional relation such as {u,v}\triangleright u+v, ie ∀u,v:Σ. u,v ∈ A⇒ u+v ∈ A, then \triangleright -closed subsets are subalgebras with respect to +.
(b)
Convexity in an affine space: {u,v}\triangleright λ u+(1-λ)v for 0 ≤ λ ≤ 1.
(c)
(d)
Transitivity is a closure condition on pairs (instances of ≤ ), {(x,y),(y,z)}\triangleright (x,z), rather than on elements. Reflexivity and symmetry are nullary and unary conditions on pairs: ∅\triangleright (x,x) and {(x,y)}\triangleright (y,x).
(e)
Ideals and normal subgroups are subgroups satisfying an extra unary closure condition: {u}\triangleright ru or {u}\triangleright g⁻¹ug respectively.
REMARK 3.7.6 A nullary closure condition ∅\triangleright t says that t must be in every closed set. If
\triangleright consists only of such conditions, for t ∈ G, then F(X) = G∪X and A is the upper set G↓ S
(Example 3.1.6(f)).
For any system of closure conditions, in order to find the closure F(G) of a set G (of generators), we
may instead consider the extended system \triangleright ′, with an extra nullary closure condition ∅
\triangleright ′x for each x ∈ G. We write \triangleright +G for \triangleright ′. Without loss of generality
we therefore need only consider F(∅), which is the smallest \triangleright -closed set, so long as we
study closure conditions in general, for example in Corollary 3.9.2.
REMARK 3.7.7 In any situation A ⊂ Σ, we may always regard Σ as a set of propositions: each element x:Σ
names the proposition ``x ∈ A.'' Then we can read any finitary closure condition {φ1,…,φk}\triangleright θ as the implication φ1∧…∧φk ⇒ θ, ie a propositional Horn clause (Remark 1.7.4). A nullary condition is called an axiom. Conversely, any
(single, predicate) Horn clause is a scheme of closure conditions given by instantiating terms for its free
variables. A system of finitary closure conditions is therefore also called a Horn theory (a system of
Horn clauses involving predicates is the dependent type analogue, Chapter VIII).
In logic programming (Remark 1.7.2), the collection of terms generated by the constants and operation-symbols (of the underlying term calculus) is known as the Herbrand universe, and the set Σ of all instances of the predicates is the Herbrand base. A program is then a Horn theory.
DEFINITION 3.7.8 The induction scheme for a system of closure conditions \triangleright is
[proof tree omitted]
where A = F(∅) is the smallest closed subset. This is valid because the premise says that Θ = {x:Σ|θ[x]}
is itself a \triangleright -closed subset.
EXAMPLES 3.7.9
(a)
By Remark 3.7.7, the induction scheme says that if the Horn clauses are sound then all of the
propositions in A = F(∅) are true.
(b)
Structural induction is of this form, eg to prove associativity of concatenation of lists
(Proposition 2.7.5), put
(c)
Course-of-values induction on N is given by {0,1, …,n-1}\triangleright n.
(d)
The correctness of a logic program is established by induction on the system of closure
conditions which it codes. Conversely, any Horn theory has a procedural reading, the goal being
to prove x ∈ A.
(e)
For any binary relation \prec on a set Σ, let {u |u\prec t}\triangleright t for each t ∈ Σ. Then
induction on \triangleright as a closure condition is the same as well founded induction for \prec
(Definition 2.5.3).
(f)
To recover \prec from \triangleright , for each t ∈ Σ there must be a unique set K with K
\triangleright t. (It's called parse(t) in Example 6.3.3 .) This typically fails for subalgebras, for
example {u,v} \triangleright u+v in a vector space does not allow the recovery of u and v. It is
also the reason why backtracking is needed in logic programs (Remark 1.7.3).
Using results from earlier in this chapter, we can derive stronger idioms of induction for the set A = F(∅). In particular, there is a ``systematic'' way of generating it.
LEMMA 3.7.10 For X ⊂ Σ, put
TX = {t:Σ|∃K.K ⊂ X∧K\triangleright t},
ie the application of the conditions once. Then A is \triangleright -closed iff TA ⊂ A, and the smallest \triangleright -closed subset of Σ is the least fixed point of T. Every monotone endofunction T arises in this way, with K\triangleright ′t if t ∈ TK. Then any subset is \triangleright ′-closed iff it is \triangleright -closed, though the original \triangleright itself cannot be recovered from T. []
Exercises 3.40 and 3.42 give the induction scheme in terms of T. We can use induction on closure
conditions to investigate the least fixed point.
PROPOSITION 3.7.11 Let S be a complete (join-semi)lattice and T a monotone endofunction of it. Then T
has a least fixed point A ∈ S, whose properties may be deduced from the following induction scheme:
[proof tree omitted]
PROOF: Define a closure condition on Σ = S by ℑ\triangleright ∨ℑ for all ℑ ⊂ Σ, and also {U}
\triangleright T(U). (Notice that we have not said U ≥ V⇒ U\triangleright V: see Exercise 3.41. Nor is
this the closure condition derived from T in the lemma.) In particular ∅\triangleright ⊥, and any closed
subset A has a greatest element A = ∨A ∈ A. By \triangleright -induction, ∀X.X ∈ A⇒ X ≤ T(X), whilst
∀X.X ∈ A⇒ X ≤ θ for any fixed point θ = T(θ). But T(A) ∈ A so A = T(A) is the least fixed point of T.
Other properties of A may be proved by instantiation of \triangleright -induction at A ∈ A = F(∅). []
REMARK 3.7.12 For finitary closure conditions, the function T defined in Lemma 3.7.10 is Scott-continuous, and F(∅) = \dirunion_{n ∈ N} Tⁿ(∅) is the least fixed point by Proposition 3.3.11. Then in the induction scheme it is enough to consider countable directed subsets ℑ, or just ω-sequences. This idiom of induction is due to David Park (1976). When S = P(Σ), as in Lemma 3.7.10, this construction says that if X ∈ F(∅) ⊂ Σ then X ∈ Tⁿ(∅) for some n. Classically, the least such n is called the rank of X, but in general this is not well defined intuitionistically.
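The iteration Tⁿ(∅) of Remark 3.7.12 can be run directly in the finitary case. This sketch (names ours) records the first stage at which each element appears, ie its classical rank:

```python
# A sketch of Remark 3.7.12: iterate the monotone step T from
# Lemma 3.7.10 starting from the empty set, recording the first
# stage n at which each element enters (its classical "rank").

def step(X, conditions):
    """TX = {t | some instance K |> t has K ⊆ X}."""
    return X | {t for K, t in conditions if K <= X}

def ranks(conditions):
    stage, n, rank = set(), 0, {}
    while True:
        for t in stage:
            rank.setdefault(t, n)          # first stage containing t
        new = step(stage, conditions)
        if new == stage:                   # least fixed point reached
            return rank
        stage, n = new, n + 1

# Course-of-values conditions {0,...,n-1} |> n on 0..4, as in
# Example 3.7.9(c); element n first appears at stage n+1.
conds = [(set(range(n)), n) for n in range(5)]
r = ranks(conds)
```

As the text warns, proofs by induction on this rank are usually clumsier than induction on the closure condition itself; the sketch is only meant to make the stages Tⁿ(∅) concrete.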
It is a common but very clumsy idiom to prove something over a closed set by induction on the first time n when each element X gets in. For example, the rank of a goal in a logic program is the depth of its proof tree, which gives a lower bound on the execution time. But it is both difficult to calculate, and extremely crude, as the corresponding upper bound is kⁿ, where k is the maximum arity. As in Remark 2.5.7, the assumption that n is least is unnecessary: it would be clearer to use induction on the closure condition directly. By Lemma 3.7.10 these two idioms of induction are equally expressive.
We have seen that Tarski's Theorem can be proved without the Scott-continuity assumption, and so can
properties of the least fixed point:
THEOREM 3.7.13 Let \triangleright be a (possibly infinitary) system of closure conditions on a set Σ and θ[X] the least predicate on S = P(Σ) such that
(a)
θ[∅] holds;
(b)
if θ[X] holds and X ⊃ K\triangleright t then θ[X∪{t}] also holds;
(c)
if X = ∪↑ \typeX_i is a directed union such that each θ[\typeX_i] holds, then θ[X] holds.
PROOF: Define
φ[X] ≡ ∀Y.θ[Y]⇒ θ[X∪Y],
and check that it satisfies (a)-(c), so θ[X]⇒ φ[X], since θ is least. Then θ[X]∧θ[Y]⇒θ[X∪Y], so θ
preserves arbitrary unions by (a) and (c), and there is a greatest \typeX0 with θ[\typeX0] by the adjoint
function theorem. By (b), \typeX0 is \triangleright -closed, so A ⊂ \typeX0. The predicate ψ[X] ≡ (X ⊂ A)
also satisfies the conditions, so θ[X]⇒ ψ[X] and A = \typeX0. []
Classically, the least fixed point can be approached by a transfinite union, which we investigate in
Section 6.7, but Exercise 3.40 defines an intrinsic notion of ordinal for a particular problem.
For the obvious reasons of constructivity, we are primarily interested in finitary algebra, and we shall
find in Section 5.6 that there is a significant obstacle to the extension of equational reasoning to
infinitary operations. Why, then, should we be interested in infinitary algebra at all? Since there is a
duality between finiteness and (infinitary) directed joins, the more we concentrate on the finitary, the
more we need to know about infinitary operations and Scott-continuous functions between domains.
Section 3.9 considers the lattice of models of any Horn theory, and the semilattice which ``classifies''
models, making use of induction on closure conditions. We shall study algebraic theories in Section 4.6,
using closure conditions in Sections 5.6 and 7.4 to impose relations. Recursion for free algebraic
theories is the subject of Chapter VI. Closure operations are generalised to reflective subcategories in
Examples 7.1.6 and further to monads in Section 7.5.
Modal logic Consider the special case of a system of unary closure conditions, where we abbreviate
{u}\triangleright t to u < t. A subset is closed under such a condition iff it is an upper subset with respect
to the reflexive-transitive closure ≤ of < . (Recall Warning 3.1.4, in particular < need not be transitive.)
The ``one-way'' closure of a point under a unary closure condition is called its trajectory, and the ``two-
way'' closure under a symmetric relation is known as the orbit.
EXAMPLES 3.8.1 The orbits of
(a)
an equivalence relation are equivalence classes;
(b)
u < uⁿ or u < nu in a monoid are cycles;
(c)
u < g⁻¹;u;g in a group are conjugacy classes.
These examples illustrate that the relation < may arise naturally as a function, and we do not always
replace it in our considerations by its transitive closure. For this reason we shall allow the relation < :X
\leftharpoondown \rightharpoonup Y to have different sets for its source and target.
Recall that, for finitary closure conditions, the functions U, M and T preserve directed joins, ie ``all but''
finitary joins (Exercise 3.14). In the unary case they preserve all joins, so have right adjoints, which we
write as < ≡ T\dashv[ > ] and ≤ ≡ M\dashv [ ≥ ]. It may help to think of x < y as meaning that x is the
present and y is a potential future world. Modal logic has medieval and even ancient roots, but its
modern study was begun by Clarence Lewis (1918) and models based on order relations were first given
by Saul Kripke (1963).
for V ⊂ Y. If the relation is unambiguous, we just write < > and []. With the opposite relation, > U and
[ > ]U ⊂ Y are similarly defined for U ⊂ X. Other adverbs used for [] include hereditarily and stably.
EXAMPLE 3.8.3 In Sections 4.3, 5.3 and 6.4 we shall show how to prove correctness of (imperative)
programs by means of statements of the form U ⊂ < V and U ⊂ [ < ]V, ie if the initial state belongs to U
then
(a)
some execution which terminates does so with a final state in V, or
(b)
every execution which terminates does so with a final state in V.
(a′)
the program does terminate, and the final state is in V,
which is called total correctness. For terminating deterministic programs, the statements are the same,
but partial correctness, that
(b′)
if the program terminates, then the final state is in V,
EXAMPLE 3.8.4 Let H ⊂ G be a subgroup of a group, and u < g⁻¹;u;g the conjugacy relation
(Example 3.8.1(c)). Then the core of H,
Modal logic is the fragment of predicate calculus consisting of formulae with just one free variable.
Since the predicate quantifiers ∃ and ∀ bind a variable, in order to stop the calculus from degenerating
altogether, we allow the quantifiers < > and [] to introduce a new free variable for each bound one, by
means of a binary relation < .
so < > preserves disjunction and [] preserves conjunction, and they obey the intuitionistic (\lnot < > = []
\lnot ) de Morgan law. They also satisfy
(∀y.x < y⇒ y ∈ U) ∧ (∃y.x < y ∈ V) ⇒ (∃y.x < y ∈ U∩V),
and x < y⇔ x ∈ < {y}⇔ y ∈ > {x}. []
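On a finite carrier both modal operators, and the adjunction with the opposite relation, can be checked exhaustively. A sketch (all names ours; the relation is a set of pairs):

```python
from itertools import combinations

# A brute-force sketch of the modal operators for a finite relation <,
# together with a check of the adjunction: < >U ⊆ V iff U ⊆ [ ]V
# taken along the opposite relation.

def diamond(rel, v):
    """< >V = {x | ∃y. x < y and y ∈ V}: preserves joins."""
    return {x for x, y in rel if y in v}

def box(rel, carrier, v):
    """[ ]V = {x | ∀y. x < y ⇒ y ∈ V}: preserves meets."""
    return {x for x in carrier if all(y in v for a, y in rel if a == x)}

def subsets(s):
    s = list(s)
    return [set(c) for r in range(len(s) + 1) for c in combinations(s, r)]

rel = {(0, 1), (0, 2), (1, 2)}         # 0 < 1, 0 < 2, 1 < 2
op = {(y, x) for x, y in rel}          # the opposite relation >
carrier = {0, 1, 2}
ok = all((diamond(rel, u) <= v) == (u <= box(op, carrier, v))
         for u in subsets(carrier) for v in subsets(carrier))
```

The flag `ok` confirms the adjunction on this small example; the same brute force also verifies that the diamond preserves unions and the box intersections.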
The usual features of binary relations translate directly into properties of modal operators (Lemmas 3.8.8
and 3.8.12). Since we're interested in < -closed subsets, we consider preorders first, and functions later.
The transitive closure The following account is due to Gottlob Frege (1879), and is the propositional
analogue of the unary treatment of lists in Remark 2.7.10. It will be used for while programs in
Section 6.4.
DEFINITION 3.8.6 Let ( < ),Θ:X\leftharpoondown \rightharpoonup X be binary relations, which need not
be transitive (Warning 3.1.4). Instead of using the binary closure condition on pairs (x,y) that was used
to axiomatise transitivity in Example 3.7.5(d), consider the nullary and unary ones
[proof tree omitted]
This corresponds to unary Peano induction for cons and listrec (Remark 2.7.10). We now show that ≤
with this definition is the smallest reflexive-transitive relation which contains < . The binary closure
condition which states transitivity and its associated induction scheme correspond to append and fold
for lists in Definition 2.7.4ff.
PROPOSITION 3.8.7
(a)
≤ is transitive,
(b)
it satisfies a base/step parsing rule ( cf empty/head+ tail),
x ≤ z ⇔ x = z ∨ ∃y.x < y ≤ z,
(c)
and a binary induction scheme ( ie it is the transitive closure),
[proof tree omitted]
PROOF: These follow from the unary induction scheme for various Θ.
(a)
Consider xΘy ≡ (∀z.y ≤ z ⇒ x ≤ z).
(b)
Put Θ for the right hand side, so ∆ ⊂ Θ and
(c)
From the premise of the binary rule we have ( < );Θ ⊂ Θ;Θ ⊂ Θ, so the unary rule applies. []
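The base/step characterisation can be executed directly for a finite relation. This sketch (names ours) computes ≤ as the least relation containing the diagonal and closed under x < y ≤ z ⇒ x ≤ z:

```python
# A sketch of Definition 3.8.6/Proposition 3.8.7: the reflexive-
# transitive closure of a finite relation <, built from the nullary
# condition x ≤ x and the unary step x < y ≤ z  |>  x ≤ z.

def refl_trans_closure(rel, carrier):
    leq = {(x, x) for x in carrier}          # base: the diagonal
    changed = True
    while changed:
        changed = False
        for x, y in rel:                     # given x < y ...
            for a, z in list(leq):
                if a == y and (x, z) not in leq:
                    leq.add((x, z))          # ... and y ≤ z, add x ≤ z
                    changed = True
    return leq

rel = {(0, 1), (1, 2)}
leq = refl_trans_closure(rel, {0, 1, 2})
```

This is the propositional analogue of `cons`: each pair x ≤ z is parsed either as the empty case x = z or as a head x < y followed by a tail y ≤ z.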
Modal logic for preorders This insight into the relationship between [ < ] and [ ≤ ] will enable us to
complete some unfinished business from Section 2.6, where we promised that modal logic would
greatly facilitate the study of well-foundedness.
LEMMA 3.8.8 A binary relation < is
(a)
reflexive iff > is reflexive iff id ≤ < iff id ≥ [ < ], and
(b)
transitive iff > is transitive iff < ² ≤ < iff [ < ]² ≥ [ < ].
The propositional calculus extended with these two modalities arising from a preorder is known as the
modal logic S4. Both [] and < > split into adjunctions, which combine with those which we identified
earlier.
REMARK 3.8.9 For any subset U ⊂ Σ, ≤ U is the down-closure of U and [ ≥ ]U is the largest lower subset
contained in U, so these closure and coclosure operations have the same image.
An informal way of putting this is that, whereas any closure operation rounds up, in the unary case we
can also round down to a closed subset.
The lattices shv(Σ, ≤ ) and shv(Σ, ≥ ) are the covariant and contravariant regular representations
(Example 3.1.6(f)ff) of ≤ , or of any relation < of which it is the reflexive-transitive closure.
LEMMA 3.8.10 The parsing rule for the transitive (irreflexive) closure is
THEOREM 3.8.11
(a)
The well founded induction scheme (Definition 2.5.3) is equivalent to its strict (⇔ ) form.
(b)
The transitive closure of a well founded relation is also well founded.
PROOF:
(a)
Suppose that θ satisfies the lax \prec -induction premise, [\succ ]θ⇒ θ, and the strict \prec -induction scheme is valid: if [\succ ]ψ ⇔ ψ then ∀x.ψ[x]. So [\succ ][\succ \succ ]θ = [\succ \succ ][\succ ]θ⇒ [\succ \succ ]θ by the lax \prec -premise, but by parsing [\succ \succ ]θ = [\succ ][\succ \succ ]θ∧[\succ ]θ ⇒ [\succ ][\succ \succ ]θ, so ψ ≡ [\succ \succ ]θ⇔ [\succ ]ψ. Hence ψ holds by strict \prec -induction, and a fortiori so does [\succ ]θ, whence θ follows by the lax \prec -premise again. (The other way is trivial.)
(b)
Suppose that θ satisfies the \prec \prec -induction premise, [\succ \succ ]θ⇒ θ, and the \prec -induction scheme is valid, ie if [\succ ]ψ ⇒ ψ then ∀x.ψ[x]. Parsing says [\succ \succ ]θ = [\succ ][\succ \succ ]θ ∧[\succ ]θ, but this is just [\succ ][\succ \succ ]θ, since the first term implies the second by the \prec \prec -premise. So ψ ≡ [\succ \succ ]
Functions and quantifiers Since modal logic is its unary fragment, in order to recover the predicate calculus we must make use of pairs to encode many-place predicates. Consider (γ,x) < γ, the product projection π0 = [^(x)]:Γ×X→ Γ. For a context Γ consisting of typed variables, recall that Cn(Γ) is the set of formulae in these variables. For an extra variable x, there is an inclusion [^(x)]*:Cn(Γ) ⊂ Cn([Γ,x:X]) called weakening (Remark 2.3.8). Similar results hold for any functional relation in place of π0.
(c)
functional (single valued) iff < ≤ [ < ], iff < preserves binary ∧, iff [ < ] preserves binary ∨, iff < o > ≤ ∆, iff id ≤ [ > ]o[ < ], iff > o < ≤ id, where ∆ is the identity relation on Y, cf Lemma 1.2.11 and Exercise 1.16, and
(d)
total (entire) iff [ < ] ≤ < , iff < preserves T, iff [ < ] preserves ⊥, iff \idrel_X ≤ > o < , iff [ < ]o[ > ]
Injectivity and surjectivity are characterised by similar conditions on the opposite relation. []
REMARK 3.8.13 ∃x ≡ < π0^op > and ∀x ≡ [π0^op].
(a)
The adjoints to weakening are the quantifiers, cf Remark 1.5.5. [diagram omitted]
Consequently ∃x.- and ∀x.- preserve joins and meets respectively.
(b)
For any function, < f > = [f] = f* is the inverse image map, and this has adjoints on both sides, the
guarded quantifiers, cf Remark 1.5.2.
< f^op > = \funf_! = ∃_f \dashv [f] = f* = < f > \dashv [f^op] = \funf_* = ∀_f
Galois connections Evariste Galois's name was given to Definition 3.6.1(b), by Øystein Ore, not
because he spent his short life (1811-32) considering such definitional minutiae, but because the
correspondence between intermediate fields and subgroups of the Galois group of a field extension
(Example 3.8.15(j)) was the first such situation known.
But the basic properties of the correspondence do not at all depend on groups and fields, so they are
repeatedly re-proved in the literature. In fact any binary relation ⊥:S\leftharpoondown \rightharpoonup
A gives rise to a Galois connection. It must not be confused with bottom or falsity, although it often has
a negative connotation: for x⊥a we read ``x is orthogonal to a.''
a ∈ F({x}) ⇔ x⊥a ⇔ x ∈ U({a}).
A Galois connection is often presented as the lower set
{(X,A) | X ≤ \orthl A ≡ A ≤ \orthr X ≡ ∀x ∈ X.∀a ∈ A. x⊥a},
which is closed under unions in each component (separately).
The ⊥-closed subsets on either side are also closed under any algebraic operations or other closure
conditions that ⊥ respects: for example they are automatically subgroups, subfields, etc .
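For a finite relation the two polars and the induced closure can be computed directly. A sketch (the names `lpolar` and `rpolar` are ours), using divisibility as the relation ⊥:

```python
# A sketch of the Galois connection induced by a relation ⊥ ⊆ S × A,
# given as a set of pairs.  Both polars are antitone, and
# X ↦ lpolar(rpolar(X)) is a closure operation on S.

def rpolar(perp, a_side, X):
    """X^⊥ = {a | ∀x ∈ X. x ⊥ a}."""
    return {a for a in a_side if all((x, a) in perp for x in X)}

def lpolar(perp, s_side, A):
    """^⊥A = {x | ∀a ∈ A. x ⊥ a}."""
    return {x for x in s_side if all((x, a) in perp for a in A)}

# Divisibility on {1,...,6}: x ⊥ a iff x divides a.
S = A = set(range(1, 7))
perp = {(x, a) for x in S for a in A if a % x == 0}
common_multiples = rpolar(perp, A, {2, 3})     # multiples of both 2 and 3
divisors = lpolar(perp, S, common_multiples)   # the induced closure of {2,3}
```

Here the closure of {2, 3} is the set of divisors of 6, illustrating how the ⊥-closed subsets automatically inherit whatever structure ⊥ respects.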
EXAMPLES 3.8.15
(a)
(Abraham Lincoln, 1858) ``You can fool all the people some of the time, and some of the people
all the time, but you cannot fool all the people all the time.''
(b)
Let S = A be any set and x⊥a the inequality relation, x ≠ a, on S. Then \orthr X = S\X , and there
are enough As iff \lnot \lnot (x = y)⇒ x = y (a presheaf S with this property is separable in sheaf
theory).
(c)
Let S = A be a preorder and (⊥) ≡ ( ≤ ) its order relation. Then x ∈ \orthl A iff x is a lower bound
for A ⊂ S, and similarly a ∈ \orthr X iff a is an upper bound. Then each \orthr X is an upper set
and each \orthl A is a lower set. There are enough As iff ≤ is antisymmetric, and S is a complete
lattice iff every closed set is of the form \orthl {a} or \orthr {x}. For S = Q∩[ 0,1], the closed
subsets are one-sided Dedekind cuts ( cf Remark 2.1.1).
(d)
Let S be a collection of ``individuals'' and A be some ``properties'' with (⊥) ≡ (\vDash ) the
satisfaction relation. The set \orthr {x} of properties of an individual is closed with respect to the
logical connectives. If there are enough properties then \orthr {x} is a description and Leibniz'
principle holds, cf Proposition 2.8.7. The specialisation order on properties (\vDash ) is semantic
entailment, and it coincides with \vdash iff there are enough models, ie the system is complete
(Remark 1.6.13).
(e)
Let S be the set of points and A the set of lines in the plane, with ⊥ the incidence relation. Then
\orthl {a} is the set of points which lie on a line a and \orthr {x} the pencil of lines passing
through a point x.
(f)
As a special case of (d), let A be a topology on S and x⊥a the relation that a point belongs to an
open set; this even- handed view is the one taken in [Vic88]. Then \orthr X consists of the
neighbourhoods of the subset X and is closed under arbitrary unions and finite intersections.
\orthl {a} is the extent (set of points) of an open set. A spatial locale is one with enough points, a
T0 space has enough open sets ( cf Proposition 3.4.9).
(g)
In topology à la Fréchet, let x⊥a be instead the relation x ∉ a. This respects convergence of
sequences ( accumulation points) in S and arbitrary unions in A.
(h)
Let S be a vector space and A = (S\multimap K ) the dual space, with x⊥a if a(x) = 0. Then \orthr
X and \orthl A are both called annihilators; they are closed under addition and scalar
multiplication.
(i)
Let S = A be a group and x⊥a the property that they commute. Then \orthr X is the centraliser
subgroup for a subset X.
(j)
Let S be a field of numbers, A its group of automorphisms and ⊥ the relation that the
automorphism a ∈ A fixes the number x ∈ S, ie a(x) = x. The pair (F,U) is the original Galois
connection. Each \orthr X is a subgroup which is closed in a certain topology, whilst \orthl A is a subfield such that if x^p ∈ \orthl A then x ∈ \orthl A, where p ≠ 0 is the characteristic of the field.
For us, the most important example of a Galois connection will be that defining the factorisation of
functions into epis and monos in Section 5.7, which we shall use to study the existential quantifier in
Sections 5.8 and 9.3. Unary theories are considered further in Sections 4.2 and 6.4, but the unary
closure condition that most concerns us is invariance under substitution or pullback in Chapter VIII.
As sheaf theory is outside the scope of this book, we just develop it far enough for lattices to illustrate
the connection. The notion of stable saturated (Exercise 3.36) coverage (Definition 3.9.6) is the
analogue for posets of a Grothendieck topology on a category, which is itself an intuitionistic version of
the forcing technique invented by Paul Cohen (1963) to show independence of Cantor's continuum
hypothesis. Saul Kripke used similar ideas to model intuitionistic logic (1965). For a full account, see
[ MLM92]; the propositional analogue of this subject, the theory of locales, is expertly described in
[Joh82] .
THEOREM 3.9.1 For any Horn theory L = (Σ,\triangleright ) there is a semilattice Cn\land_L which classifies its models, in the sense that they correspond to semilattice homomorphisms Cn\land_L → Ω:
[proof tree omitted]
Cn\land_L is the free semilattice on Σ subject to the relations which \triangleright codes.
PROOF: The elements of Cn\land_L are contexts Γ, ie lists or finite subsets of elements of Σ considered as propositions. The entailment Γ\vdash ∆ is generated by (the reflexive-transitive closure of)
(a)
weakening: Γ,φ\vdash Γ, so ∆\vdash Γ whenever Γ is a subset of ∆, and
(b)
the axioms: K\vdash φ if K\triangleright φ.
Recall that, for us, lists on both sides of the turnstile mean conjunction , not disjunction. So Γ\vdash ∆ iff
each proposition θ in ∆ has a proof tree with root θ whose leaves are certain subsets of Γ and whose
nodes are instances of the \triangleright relation. Remark 1.7.2 showed how to find such a proof. The
union of two contexts defines the meet in this preorder: we can obtain a semilattice in the sense of
Definition 3.2.12 as a quotient using Proposition 3.1.10.
In the presence of a subset A ⊂ Σ, Remark 3.7.7 gave a propositional meaning, which we now write as
[[φ]], to each element φ ∈ Σ. We extend this to contexts by conjunction, so that [[-]]:Cn\land_L → Ω is a
semilattice homomorphism. At least it is so long as we ensure that it is monotone. We have to check the
two generating cases of \vdash , but clearly [[Γ,φ]]⇒ [[Γ]] . For the other case, A obeys the closure
condition K\trianglerightφ iff [[K]]⇒ [[φ]] , so A has to be a model. The restriction to single-proposition
contexts recovers (the characteristic function χ_A of) the model from [[-]]. []
PROOF: The smallest \triangleright -closed subset A = F(∅) ⊂ Σ consists of the \triangleright -provable propositions, ie φ ∈ A⇔ (\proves φ). All such φ are identified with T ∈ Cn\land_L when the order on Cn\land_L is made antisymmetric. But A is also a model, satisfying exactly those propositions that are true in all models, since it is the set of propositions which are in all models.
We deduce completeness by varying \triangleright . Suppose that Γ\satisfies θ, ie for every model A of L with Γ ⊂ A
By the completeness theorem, \vdash coincides with the semantic entailment \vDash in Example 3.8.15
(d); then since there are enough models, to prove θ we need only show (classically) that there is no
counterexample, ie a model which distinguishes θ from T.
This way of constructing the relation \vdash in Cn\land_L, as the reflexive-transitive closure of two
PROOF: In the canonical language L = L\land(C) = (Σ,\triangleright ), Σ is the set of elements of the semilattice C, with the closure conditions ∅\triangleright T and {u,v}\triangleright u∧v. The elements of the preorder Cn\land_L are finite subsets of Σ, with Γ\proves_L ∆ iff ∧Γ ≤_C ∧∆, so the quotient poset is the given C. []
Beware that if C = Cn\land_{L0} was the classifying semilattice for some Horn theory (Σ0,\covers0), then the new Σ is bigger than Σ0 and the systems of closure conditions are also different.
Algebraic lattices Now we consider the lattice Mod(L) of models of a finitary Horn theory. The
classifying semilattice is static, and loses the dynamic information in the original theory; as the lattice of
models is also static, there is no harm in using the classifying semilattice C to represent the theory in the
next result.
THEOREM 3.9.4 The models of a finitary Horn theory form an algebraic lattice. Every algebraic lattice
arises uniquely in this way, in the sense that the classifying semilattice is unique up to unique
isomorphism.
PROOF: By Proposition 3.7.3, any directed union of models (\triangleright -closed subsets) is a model,
and a model is finitely generable in the sense of Definition 3.4.11ff iff it is the closure of some finite
subset. By Theorem 3.9.1, models of C correspond to semilattice homomorphisms C→ Ω, and so to upper subsets containing T and closed under ∧. Since such subsets are ideals of C^op (Example 3.4.2(d)),
SLat(C,Ω) ≡ Idl(C^op) ≡ Mod(L) ≡ A ≡ Idl(\A_fg),
using Proposition 3.4.12. Hence we recover C as (\A_fg)^op. []
COROLLARY 3.9.5 There is an order-reversing bijection between finitely generated models A ∈ \A_fg and contexts Γ ∈ Cn\land_L. []
This justifies the name algebraic lattice: recall that subalgebras and congruences were described by
Horn theories. In fact any algebraic lattice arises as the lattice of subalgebras of some algebra for some
theory.
Any system of (possibly infinitary) closure conditions has a complete lattice of closed subsets. This
lattice is algebraic - characterised in terms of directed joins - iff every instance K\triangleright t of the
closure condition contains a finite condition K′ ⊂ K with K′\triangleright t. Given that directedness can
be defined by a nullary and a binary condition, this result sheds some light on the notion of finiteness,
but we shall defer a full discussion (using closure conditions) to Section 6.6.
Adding and preserving joins Now we turn our study of closure conditions back onto the order theory
from which it came, repeating for arbitrary joins the treatment which we have just given to finite meets.
Beware that we have dropped stability under meets from the way in which the following ideas are
usually presented. Recall that P(Σ) has arbitrary joins, which in fact it freely adds to the set Σ. Similarly,
shv(Σ, ≤ ), which consists of the ≤ -lower subsets of a poset (Σ, ≤ ), freely adds joins respecting the
order ≤ (Proposition 3.2.7(b)).
Now we want to force some of the joins to have particular values, and in the extreme case retain all joins
which already exist in Σ. This can be done with the closure condition K\triangleright ∨K which we have
already met in Proposition 3.7.11, Example 3.8.15(g) and (for ∧) Proposition 3.9.3.
The order relation x ≤ y can be coded using joins (x∨y = y), and so by a closure condition, as in the
previous section. However, we prefer to take the lattice shv(Σ, ≤ ) of ≤ -lower subsets as our raw
material.
(a)
(b)
and is subcanonical if whenever K\triangleright t and K ≤ x then t ≤ x; in particular, if x = ∨K
then t = x, so the coverage is only used to nominate actual joins for preservation by the
embedding η below.
(c)
The canonical coverage of a complete join-semilattice Σ has K\triangleright ∨K for all K ⊂ Σ.
(d)
A \triangleright -closed ≤ -lower subset A ⊂ Σ is called a \triangleright -sheaf or \triangleright -ideal:
x ≤ a ∈ A⇒ x ∈ A,    K\triangleright t & K ⊂ A⇒ t ∈ A.
THEOREM 3.9.7 Let \triangleright be a subcanonical coverage for a poset (Σ, ≤ ). Write A = shv(Σ, ≤ ,\triangleright ) for the lattice of sheaves; these are the elements of shv(Σ, ≤ ) which ``think'' that t is the join of K.
For each x ∈ Σ, the set η(x) ≡ Σ↓ x ≡ {a:Σ|a ≤ x} belongs to A, and η:Σ→A is monotone and obeys K\triangleright t ⇒ η(t) = ∨_A {η(u)|u ∈ K}.
Moreover it is universal: let Θ be another complete join-semilattice and f:Σ→ Θ a monotone function such that f(t) = ∨_Θ {f(u)|u ∈ K} whenever K\triangleright t. Then there is a unique function p:A→ Θ
PROOF: For a subcanonical coverage, each Σ↓ x is \triangleright -closed, whence η is full and preserves arbitrary meets by Proposition 3.2.7(a). Similarly p(Σ↓ x) = ∨{f(a)|a ≤ x} = f(x) by monotonicity of f. If the mediator p exists then it must be given by the formula in the diagram, since A = ∨_A {Σ↓ a|a ∈ A}.
We can also show that p preserves joins by \triangleright -induction. Let \typeA_i ∈ A for i ∈ ℑ such that ∀i.p(\typeA_i) ≤ θ ∈ Θ. We have to show that f(a) ≤ θ for all a ∈ A = ∨_A \typeA_i, given that it holds for all a ∈ \typeA_i. To satisfy Definition 3.7.8 we need (∀u.u ∈ K⇒ f(u) ≤ θ) ⇒ f(t) ≤ θ whenever K
EXAMPLES 3.9.10
(a)
Let ∅\triangleright ⊥ and {u,v} \triangleright u∨v for each u,v ∈ Σ (a join-semilattice). Then A is
in A iff it is a directed lower set, so A = Idl(Σ), which is an algebraic lattice (Definitions 3.4.1
and 3.4.11). The map Σ\hookrightarrow Idl(Σ) preserves finite joins by construction, but if Σ also
has meets ( ie it is a lattice) then these are preserved too, cf Exercise 3.17. If Σ is a distributive
lattice then Idl(Σ) is a frame.
(b)
Let K\triangleright t if K is directed with join t. Then A is \triangleright -closed iff it is Scott-
closed (Definition 3.4.7). In fact A = A+∪{∅} , where A+ is defined in the same way but with K
semidirected (so ∅\triangleright ⊥ and ⊥ ∈ A for all A ∈ A+). A+ is called the Smyth
powerdomain.
(c)
Let K\triangleright t if K is inhabited with join t. Then any inhabited subset A is \triangleright -closed iff it has a greatest element, but ∅ is also \triangleright -closed. So A is the lifting LiftΣ for complete semilattices, cf Definition 3.3.7.
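Example (a) can be tested mechanically on a finite lattice; note that in the finite case every ideal is principal, so Idl(Σ) ≅ Σ, and the algebraicity of Idl(Σ) only shows up for infinite Σ. A sketch, with my own choice of lattice (the divisors of 12 under divisibility):

```python
from itertools import combinations
from math import lcm  # Python >= 3.9

# Divisors of 12 under divisibility: a finite distributive lattice.
elems = [1, 2, 3, 4, 6, 12]
def leq(x, y): return y % x == 0           # x divides y
def join2(x, y): return lcm(x, y)          # binary join = lcm

def subsets(s):
    return (frozenset(c) for r in range(len(s) + 1)
            for c in combinations(s, r))

def is_ideal(A):
    """Lower set containing bottom, closed under binary joins: exactly
    the closed sets for the coverage of Example 3.9.10(a)."""
    return (1 in A
            and all(x in A for a in A for x in elems if leq(x, a))
            and all(join2(u, v) in A for u in A for v in A))

ideals = [A for A in subsets(elems) if is_ideal(A)]

def down(x):
    return frozenset(y for y in elems if leq(y, x))

def ideal_join(A, B):
    """Join in Idl: least ideal containing the union."""
    return min((I for I in ideals if A | B <= I), key=len)

# In a finite lattice every ideal is principal ...
assert set(ideals) == {down(x) for x in elems}
# ... and the embedding x |-> down(x) preserves finite joins.
assert all(down(join2(x, y)) == ideal_join(down(x), down(y))
           for x in elems for y in elems)
```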
Joins which are stable under meet In conclusion, we restore the distributivity condition which was
stripped from Theorem 3.9.7.
THEOREM 3.9.11 Let \triangleright be a subcanonical coverage on a semilattice (Σ, ≤ ). Suppose that it is stable, ie if K\triangleright t then {u∧k|k ∈ K}\triangleright u∧t for every u ∈ Σ.
Then A ≡ shv(Σ, ≤ ,\triangleright ) is a frame (Definition 3.6.16) and the left adjoint F of its inclusion in
shv(Σ, ≤ ) preserves finite meets. If Θ is also a frame and f:Σ→ Θ preserves finite meets then so does p:
A→ Θ.
PROOF: We always have F(X∩Y) ⊂ FX∩FY. A typical element of the right hand side is a∧b with a ∈ FX
and b ∈ FY, so consider
XxY ⊂ C = {(a,b):Σ2|(a∧b) ∈ F(X∩Y)}.
We use a double induction to show that C = FXxFY. Suppose J\triangleright a, K\triangleright b with JxK ⊂ C. Then J_k ≡ {j∧k|j ∈ J} ⊂ F(X∩Y), but J_k \triangleright a∧k by stability, so a∧k ∈ F(X∩Y), since this is closed. Therefore K_a ≡ {a∧k|k ∈ K} ⊂ F(X∩Y), but again K_a \triangleright a∧b by stability, so a∧b ∈ F(X∩Y).
Since shv(Σ, ≤ ), whose elements we call presheaves, is a frame, and p preserves joins, algebraic
manipulation easily shows that A is also a frame, and p preserves finite meets if f does and Θ is a frame.
[]
EXAMPLES 3.9.12
(a)
Let A ⊂ X be any subset of a topological space, with the subspace topology, and F:S
\twoheadrightarrow A the inverse image induced by the inclusion. This preserves meets and has a
right adjoint U. Indeed A is the image of a nucleus M on S, ie a meet-preserving closure
operation.
(b)
In particular, if A ⊂ X is open, then M = (A⇒ (-)). In this case, M has a left adjoint A∧( = ), and
A ≡ S↓ A .
(c)
If A is the complementary closed subset to B then M = (B∨(-)).
(d)
Double negation \lnot \lnot is a nucleus, for which A is a Boolean algebra. The open subsets of X
that are fixed by \lnot \lnot are called regular; in R these are unions of non-touching intervals, so
(0,1)∪(2,3) is regular open but (0,1)∪(1,2) isn't.
(e)
Let X and Y be the frames of open subsets of topological spaces X and Y. Put Σ = Xx Y with the
componentwise order. Where a ∈ X and b ∈ Y represent open subsets (``intervals'') in X and Y,
(a,b) will be the open ``rectangle'' in XxY . Whenever a = ∨K ∈ X and b = ∨J ∈ Y , let {(a,j)|j ∈
J}\triangleright (a,b) and {(k, b)|k ∈ K}\triangleright (a,b). This is a stable coverage and shv(Σ,
≤ ,\triangleright ) is the Tychonov product, which is the topology on XxY [ Joh82, pp. 59-62].
Generalising from propositions to types The remainder of the book will develop the analogues for
categories and types of the poset and propositional ideas in this chapter. In the propositional
terminology, Chapter IV studies posets, monotone functions, the transitive closure, universal properties,
Horn theories, Heyting semilattices and the pointwise order. The phenomena in Chapter V are largely
new to the level of types, but it does discuss distributive lattices (and use closure conditions to construct
quotient algebras). Chapter VI is about induction on infinitary closure conditions. Most of the
categorical developments are to be found in Chapter VII: adjunctions, closure operations, adding meets
and joins, the adjoint function theorem and the canonical language; we only touch on Galois
connections, modal logic and sheaf theory. Chapter VIII restores the predicates to Horn theories, and in
the final chapter we see how types and propositions interact in the behaviour of the quantifiers.
(a)
any irreflexive relation can be recovered from its reflexive closure, characterising (in
terms of decidability of equality) those reflexive binary relations that arise in this way
from irreflexive ones;
(b)
Proposition 2.5.6 and Definition 3.2.1(c) agree on minimality;
(c)
the interleaved product agrees with the componentwise order, as defined in Propositions
2.6.9 and 3.5.1;
(d)
any function which is injective and monotone ( ie it preserves the reflexive relation) is
strictly monotone (it preserves the irreflexive one), but not conversely.
Reformulate the induction scheme (Definition 2.5.3) with respect to the reflexive relation.
4. Show that (X, ≤ ) satisfies ∀x,y.x ≤ y∨y ≤ x iff it has binary joins and every monotone function f:
X→ Y preserves them. [Hint: use the representation by lower subsets.]
5. Show that an element of a poset is locally least iff it is least in its connected component (Lemma
1.2.4), and that if the poset has pullbacks (meets of pairs which are bounded above) then it
suffices that the element be minimal.
7. Describe the greatest common divisor and least common multiple of a pair of ideals in a
commutative ring.
8. Show that the various forms of the absorptive law mentioned before Proposition 3.2.13 agree.
9. Show that any lattice homomorphism between Boolean algebras also preserves \lnot and ⇒ .
Find a sublattice of a Boolean algebra which is a Heyting lattice but for which the implication
operations are different.
10. Let R be a ring such that ∀x:R.x^2 = x. Show that ∀x.x+x = 0 and ∀x,y.x*y = y*x, and that x∨y ≡ x+y-x*y, x∧y = x*y make R a Boolean algebra. Conversely, define the ring operations on any
Boolean algebra, and show that a function between such structures is a ring homomorphism iff it
is a homomorphism for the logical operations.
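Exercise 10 can be sanity-checked on a concrete Boolean ring, the powerset of a two-element set with symmetric difference as + and intersection as *; this model is my illustration, not part of the exercise. Since x+x = 0 forces -z = z, the prescription x∨y ≡ x+y-x*y is computed below as x+y+x*y.

```python
from itertools import combinations

U = frozenset({1, 2})
def subsets(s):
    return [frozenset(c) for r in range(len(s) + 1)
            for c in combinations(s, r)]
R = subsets(U)                             # the Boolean ring P({1,2})

def add(x, y): return (x | y) - (x & y)    # symmetric difference
def mul(x, y): return x & y                # intersection
zero, one = frozenset(), U

# The special ring laws: idempotence, characteristic 2, commutativity.
assert all(mul(x, x) == x for x in R)
assert all(add(x, x) == zero for x in R)
assert all(mul(x, y) == mul(y, x) for x in R for y in R)

# Derived lattice operations: since -z = z, x v y = x+y-x*y = x+y+x*y.
def vee(x, y): return add(add(x, y), mul(x, y))
def wedge(x, y): return mul(x, y)
def neg(x): return add(one, x)

# Absorption, distributivity and complementation: a Boolean algebra.
assert all(vee(x, wedge(x, y)) == x for x in R for y in R)
assert all(wedge(x, vee(x, y)) == x for x in R for y in R)
assert all(wedge(x, vee(y, z)) == vee(wedge(x, y), wedge(x, z))
           for x in R for y in R for z in R)
assert all(vee(x, neg(x)) == one and wedge(x, neg(x)) == zero for x in R)
```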
11. Let X be a distributive lattice, but write 0, + and x instead of ⊥, ∨ and ∧. Show that any subset ℑ
⊂ X is an ideal in the sense of ring theory (Example 2.1.3(b)) via this notation iff it is a directed
lower set.
12. Write \YY_X:[X→ X]→ X for the operation which yields the least fixed point of a continuous function on an ipo X. Express \YY_X as \YY_Y(F) for some F:Y→ Y and hence deduce (without
13. (Hans Bekič) Let h:DxE→ DxE in IPO. For any e ∈ E devise some \funf_e:D→ D and hence define p(e) ∈ D as its fixed point. Similarly q:D→ E. Using \YY_D again, obtain a fixed point of h.
14. Show that if a poset X has, or a function X→ Y preserves, both finite (⊥, ∨) and directed (∨↑) joins, then it has or preserves all joins.
15. Let U:J→ ℑ be a cofinal function between posets. Show that if J is directed then so is ℑ, and
conversely if U is also full. Show that any countable directed poset has a cofinal sequence, and
hence that the corresponding notions of countable Scott continuity coincide.
16. Suppose that every ipo has a maximal element (this assertion is known as Zorn's Lemma), and
assume excluded middle. Deduce the axiom of choice (Definition 1.8.8). [Hint: consider the ipo
of partial functions contained in the given entire relation.]
17. Let X be a meet-semilattice. Show that IdlX is a preframe, ie it has meets and they distribute over
directed join in each argument. Also show that X→ IdlX preserves T and meets.
18. Let Y be a preframe and X a dcpo. Show that [X→ Y], the dcpo of Scott-continuous functions, is
also a preframe, binary meets being computed pointwise.
19. Let X and Y be posets or domains. Show that their disjoint union X+Y (with no instances of the
order relation linking the two summands) obeys the rules for sums (Remark 2.3.10).
20. Let X, Y and Z be meet- semilattices and f:Xx Y→ Z a monotone function such that f(x,-) and f(-,
y) preserve binary meets, for each x ∈ X or y ∈ Y. Find a necessary and sufficient condition (in
terms of the order relations but not the meets) which makes f a semilattice homomorphism ( cf
Corollary 3.5.13).
21. A poset X is boundedly complete if any subset ℑ ⊂ X which has some upper bound ℑ ≤ θ
actually has a join. Show that the product and function-space of two such posets are boundedly
complete. Do the same for boundedly complete domains (in which directed subsets also have
joins), with the Scott-continuous function-space.
22. Construct the meet and join of a bounded inhabited set of real numbers, considered as Dedekind
cuts. Assume excluded middle. Under what (simple) condition does a monotone function R→ R
have adjoints? Express limsup and liminf as extrema.
24. Show that for any lattice X there is an adjunction (diagram omitted). If this is an isomorphism then X is called a modular lattice.
25. Suppose F\dashv U and U\dashv F between posets. Show that they are mutually inverse, or form
a strong equivalence in the case of preorders.
26. Let U:A\twoheadrightarrow X be a surjective function between sets. Define a preorder on A such that U is an equivalence function (Remark 3.6.7), where X carries the discrete order. Show that U is part of a strong equivalence iff it is split epi. In other words, every weak equivalence is strong iff the axiom of choice holds.
27. Let F_1:S→ A_1 and F_2:S→ A_2 be equivalence functions between preorders. Find equivalence functions G_i:A_i→ B with G_1 o F_1 = G_2 o F_2. [Hint: let the underlying set of B be the union of those of A_1 and A_2.]
28. Show that a semilattice equipped with an additional binary operation ⇒ is a Heyting semilattice (Definition 3.6.15) iff it satisfies certain equations (display omitted).
29. Describe implication and infinitary meet in the frame of open sets of any topological space.
[Hint: do negation first.] Let f:X→ Y be a continuous function. Describe the adjunctions f*\dashv
\funf* between the frames of open subsets of X and Y.
30. For any poset X, show that shv (X) is a complete Heyting lattice with [A⇒ B] = { x|∀y.x ≥ y ∈
A⇒ y ∈ B}. [Hint: use Proposition 3.1.8(b) and (-)∩A\dashv (A⇒ ( = )).] Show that x→ X↓ x
preserves ⇒ . This interpretation of intuitionistic implication is due to Saul Kripke and was
inspired by Definition 3.8.2 for modal logic.
31. Show that a poset is boundedly complete (Exercise 3.21) iff it has meets of all inhabited subsets.
32. Let X be a poset with T and U:X→ Y a monotone function to any poset. Show that it is cofinal iff
U(T) is the top element of Y.
33. Let F\dashv U with F injective. Show that F preserves minimal upper bounds, but U need not
preserve maximal elements. Conversely, find a criterion involving minimal upper bounds for F:X
⊂ A to have a right adjoint.
34. Show that for a poset X the following are equivalent:
(a)
X↓ θ is a complete lattice for each θ ∈ X;
(b)
X has meets of all subsets which are bounded above ( wide pullbacks, Example 7.3.2(h)),
but not necessarily T;
(c)
for each diagram ℑ ⊂ X with an upper bound ℑ ≤ θ there is a unique minimal upper bound
for ℑ in X below θ.
X is then called an L-poset, and an L-domain if it also has all directed joins. Formulate and prove
an adjoint function theorem for L-posets. Show that if U:X→ Y preserves wide pullbacks and is
cofinal then it has a left adjoint. (Notice how introducing a degree of uniqueness improves the
result by allowing us to drop the injectivity assumption.)
35. Describe the closure and coclosure operations arising from the examples of adjunctions in
Section 3.6, and characterise the (co)closed subsets or elements.
36. Let M:P(Σ)→ P(Σ) be any closure operation. Show that it arises from the system of closure
conditions
K\triangleright t if t ∈ M(K),
and that this is a saturated system of closure conditions (the defining display is omitted).
37. Show that if M is Scott-continuous in the previous exercise then it suffices to use finite K.
38. Let M be a closure operation on a poset or preorder (S, ≤ ). Let A be the set S equipped with the
relation that A\preceq B if M(A) ≤ B. Show that (A, \preceq ) is a preorder which is equivalent to
the set of fixed points of M on S. Exercise 3.56 is an example of this construction.
39. (Tarski) Let S be a complete ( meet-semi) lattice and T:S→ S a monotone function. Put A = ∧{X:
S|T X ≤ X} and show that TA ≤ A. Using T(TA) ≤ TA, deduce that A is the least fixed point.
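Tarski's construction in Exercise 39 runs directly on a finite powerset lattice: A = ∧{X|TX ≤ X} is the intersection of all pre-fixed points. The particular monotone T below is a hypothetical choice for illustration.

```python
from itertools import combinations

universe = frozenset(range(6))
def subsets(s):
    return [frozenset(c) for r in range(len(s) + 1)
            for c in combinations(sorted(s), r)]
S = subsets(universe)                # the complete lattice P({0..5})

def T(X):
    """A monotone endofunction: insert 0 and close under +2 (within range)."""
    return frozenset({0}) | frozenset(x + 2 for x in X if x + 2 <= 5)

# Tarski: A is the meet of all pre-fixed points T(X) <= X.
prefixed = [X for X in S if T(X) <= X]
A = frozenset.intersection(*prefixed)

assert T(A) <= A                       # A is itself a pre-fixed point,
assert A <= T(A)                       # T(TA) <= TA forces TA = A,
assert all(A <= X for X in prefixed)   # and A is least among them.
```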
40. With the same data, call an element X ∈ S well founded if X ≤ T X and ∀U.TU∧X ≤ U ≤ X ⇒ U =
X. Show that if X is well founded then so is T X, and that any join of well founded elements is
again well founded, so there is a greatest well founded element, and it is fixed by T. Show that if
X is well founded and Tθ ≤ θ then X ≤ θ. Compare this with the proof of Proposition 3.7.11.
Show that the simpler condition that X ≤ T X and ∀U.TU∧X ≤ U⇒ U = X is equivalent to well-
foundedness.
41. Let T:S→ S be a ∧-preserving endofunction of a frame, X ∈ S a well founded element and Y ≤
X. Show that if also Y ≤ T Y then Y is well founded. [Hint: given TV∧Y ≤ V ≤ Y consider U = (Y⇒
V)∧ X.] Show that the property Y ≤ T Y is not automatic. [Hint: S = P(2).]
42. Let T:P(Σ)→ P(Σ) be defined as in Lemma 3.7.10 from a closure condition \triangleright . Show
that X is a well founded element of P(Σ) in the sense of Exercise 3.40 iff X has no non-trivial
relatively \triangleright -closed subset ( cf Definition 3.7.8). Show also that, when \triangleright
is itself defined from a binary relation \prec by Example 3.7.9(e), this is equivalent to well-
foundedness of \prec on X (Definition 2.5.3).
43. Proposition 2.5.6ff described classical approaches to induction. Discuss induction on closure
conditions (Definition 3.7.8) in this fashion, giving a condition for an element not to be in the
smallest closed subset.
44. Use Proposition 3.7.11 to show that the set of closure operations on a complete lattice S is itself
the image of a closure operation on [S→ S]. Now let S be a dcpo. Show that the set of Scott-
continuous closure operations on S is the image of a closure operation defined on id↓ [S→ S] ≡
{T :S→ S|\id_S ≤ T}.
45. (Dmitri Pataraia) By Lemma 3.5.7, the set of inflationary monotone functions id ≤ f:A→ A on
any dcpo A forms an ipo. Show that there is a greatest such function, g. [Hint: use composition to
show that the set is directed.] If A has bottom, show that g(⊥) is a fixed point for any inflationary
function f.
46. Let s:X→ X be a monotone endofunction of an ipo. Consider the smallest subset A ⊂ X with ⊥ ∈ A which is closed under s and ∨↑ ( cf Exercise 6.53). Using Definition 3.7.8, show that the
restriction of s is inflationary on A, and so has a least fixed point. Show that this is in fact the
only fixed point in A, and is its greatest element, and also that it is the least fixed point in X.
Applying this to X = {X ⊂ Σ|θ[X]}, prove Theorem 3.7.13. If X has binary meets, show that its
subset of well founded elements (Exercise 3.40) has similar properties to A.
47. Use infinitary conjunction to interpret ∀θ. (Tθ⇒ θ)⇒ θ in a complete Heyting lattice and verify
Proposition 2.8.6.
Chapter 4
Cartesian Closed Categories
Category theory unifies the symbolic (Formalist) and model-based (Platonist) views of mathematics. In
particular it offers an agnostic solution to the question that we raised in Section 1.3 of whether a function
is an algorithm or an input-output relation.
Traditionally, categories were congregations, each object being a set with structure: a topological space,
an algebra or a model of some theory. The morphisms are functions that preserve this structure
(homomorphisms), so the notion of composition is ultimately that for relations (Definition 1.3.7). As an
approach to logic, this went round ``three sides of a square'' ( cf Remark 1.6.13 for model theory) and so
ran into some foundational problems over the category of all categories.
In informatics, the principal examples are constructed from λ-calculi and programming languages; being
syntactic, they are typically recursively enumerable. Composition is by substitution of terms (Definition
1.1.10), or by the cut rule (Definition 1.4.8), which uses old conclusions as new hypotheses. A category
Cn[]_L of this kind encodes a certain theory L itself, instead of collecting its models; we call it the
category of contexts and substitutions by analogy with categories of objects and homomorphisms in
semantics.
We shall give a novel construction of Cn[]_L that embodies well-established techniques for proving correctness of programs and works uniformly for any fragment of logic. At the unary level, the ideas come from geometry and physics (groups), automata and topology; we also carry it out for algebraic theories (Cn×_L) before turning to the λ-calculus (Cn→_L).
The fragment of logic in question, [], corresponds to certain categorical structure defined by universal
properties: products and exponentials in this chapter, coproducts and factorisation systems in the next.
The recursively defined interpretation functor [[-]]:Cn[]_L → S preserves this structure, so the semantic universe S must also have it. Cn[]_L is also called the classifying category for (models of) the theory.
4.1 Categories
DEFINITION 4.1.1 A category C consists of
(a)
a class of objects, obC,
(b)
for each pair of objects X,Y ∈ obC, a set of morphisms, C(X,Y) (for f ∈ C(X ,Y) we write f:X→
Y, calling X the source and Y the target; the words domain and codomain are synonyms for
source and target, and map and arrow are synonyms for morphism ; as a throwback to the time
when categories were only used for algebras and homomorphisms, C(X,Y) is called the hom-set),
(c)
for each object X ∈ obC, an identity, \id_X ∈ C(X,X), and
(d)
for each triplet of objects, X, Y, Z ∈ ob C, a composition,
C(X,Y)xC(Y,Z)→ C(X,Z),
which we write synonymously as f;g or g o f (composition is a function between hom-sets, whereas f:X→ Y is an abstract arrow),
such that f;id = f = id;f and f;(g;h) = (f;g);h, so the composition (;) is associative and id is a unit.
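Definition 4.1.1 can be transcribed as data plus laws and the axioms verified by exhaustion; the two-object category of functions below is my own minimal illustration, with composition written in the diagrammatic order f;g.

```python
from itertools import product

# Objects: small finite sets, given as tuples (0, ..., n-1), so that the
# elements double as indices.
objects = {'X': (0,), 'Y': (0, 1)}

# A morphism f:A -> B is a triple (A, B, graph), graph a tuple indexed by A.
def homset(A, B):
    a, b = objects[A], objects[B]
    return [(A, B, g) for g in product(b, repeat=len(a))]

arrows = [f for A in objects for B in objects for f in homset(A, B)]

def ident(A):
    return (A, A, tuple(range(len(objects[A]))))

def compose(f, g):                   # diagrammatic order: f;g
    (A, B, fg), (B2, C, gg) = f, g
    assert B == B2, "source/target mismatch"
    return (A, C, tuple(gg[fg[i]] for i in range(len(objects[A]))))

# Unit laws: id;f = f = f;id.
assert all(compose(ident(f[0]), f) == f == compose(f, ident(f[1]))
           for f in arrows)

# Associativity: f;(g;h) = (f;g);h over all composable triples.
assert all(compose(f, compose(g, h)) == compose(compose(f, g), h)
           for f in arrows for g in arrows for h in arrows
           if f[1] == g[0] and g[1] == h[0])
```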
CONVENTION 4.1.2 The order of composition is a contentious issue. The left-handed notation (o) is older,
both in category theory and in mathematics as a whole, and comes from the custom of writing function-
application on the left - apart from the factorial function! We shall not challenge this custom itself:
juxtaposition will always mean application on the left; moreover we shall adopt the convention from the
λ-calculus that ΦF X means (ΦF)X. If functional composition arises by abstraction of application, it is
clearer to use the left-handed notation.
For those literate in Arabic or Hebrew, diagram-chasing in the right-to-left notation may present no
problem, but for the rest of us it can be rather confusing. There are even situations, such as the
composition of adjunctions (Lemma 3.6.6), where it helps to use both conventions together. Of course
we distinguish them by using two different symbols (; and o), and we will always use one or other of
them, without relying on juxtaposition. In practice there is no need to decorate these symbols with the
objects to which they apply, though it is useful to annotate \id_X.
Categories as theories In this book we shall be most interested in how categories can describe type-
theoretic phenomena. The objects are contexts (lists of typed variables and predicates) and the
morphisms are assignments (substitutions) of terms or proofs to these variables.
Categories as congregations First, however, we give the usual list of mathematical structures and their
homomorphisms. These categories are traditionally named by their objects, but in some cases such as
matrices and programs the arrows are more prominent: following Peter Freyd and Andre Scedrov
[FS90], we speak of the category composed of named maps, here relations.
LEMMA 4.1.3 Sets and binary relations form a category Rel, where the identity on X is the equality relation ( =_X ), and the composition is the relational one given in Definition 1.3.7.
PROOF: ( =_X );R = R, since x( =_X )x′Ry iff xRy. Showing that R;( =_Y ) = R is similar. For associativity, both (R;S);T and R;(S;T) relate x ∈ X to w iff ∃y.∃z.xRy∧ySz∧zTw. []
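Lemma 4.1.3 can be checked exhaustively for a small set; the script below (an illustration, not from the text) represents relations as sets of pairs and composes them as in Definition 1.3.7.

```python
from itertools import combinations, product

X = (0, 1)                                  # a small set

def relations(A, B):
    pairs = list(product(A, B))
    return [frozenset(c) for r in range(len(pairs) + 1)
            for c in combinations(pairs, r)]

def compose(R, S):
    """Relational composition R;S: x (R;S) z iff there is y with xRy, ySz."""
    return frozenset((x, z) for (x, y) in R for (y2, z) in S if y == y2)

def eq(A):
    return frozenset((a, a) for a in A)

rels = relations(X, X)                      # all 16 relations on {0,1}

# Unit laws: (=);R = R = R;(=).
assert all(compose(eq(X), R) == R == compose(R, eq(X)) for R in rels)

# Associativity: (R;S);T = R;(S;T).
assert all(compose(compose(R, S), T) == compose(R, compose(S, T))
           for R in rels for S in rels for T in rels)
```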
DEFINITION 4.2.1 A covariant action of a group or monoid (M,id,o) on a set A is a binary operation \blank*( = ):MxA→ A such that id*a = a and (g o f)*a = \g*(\f*a) for all a ∈ A. Similarly a contravariant action is a binary operation (-)*( = ):MxA→ A such that id*a = a and (g o f)*a = f*(g*a).
Usually we treat the star as a unary operation on its first argument, so the abstract arrow f is represented by a concrete function \f*:A→ A or f*:A→ A such that \id* and id* are the identity on A, and
(g o f)* = \g* o \f*,    (g o f)* = f* o g* ≡ g*;f*.
Notice that these laws link the composition in M to that in Set.
A contravariant action of M is the same as a covariant action of M^op, the opposite monoid, cf Definitions 1.3.9, 3.1.5 and 4.1.1.
A faithful action is one for which things are semantically equal only when they are syntactically the
same:
(∀a:A.\f*a = \g*a) ⇒ f = g.
When the structure A is considered to be variable, we shall write \f* as whichever of \f_A or \typeA_f better expresses the emphasis intended at the time, and similarly f* as f^A or A^f.
EXAMPLE 4.2.2 Let M = (R,1,x) and A be a vector space. Then multiplication of a vector a ∈ A by a
scalar f ∈ R is an action. Notice that it preserves the (additive) structure of A as well as the
multiplicative (and additive) structure of R.
EXAMPLE 4.2.3 Rubik's cube consists of 27 pieces jointed in such a way that any of the six faces (each
with nine pieces) can be rotated through a quarter turn. From the home position, in which each face is
uniformly coloured, 2^12·12!·3^8·8!/12 ≈ 4.3·10^19 positions can be reached. The quarter turns generate a
group of this order which acts on the set of components of the cube.
To solve this puzzle, ie to restore a jumbled cube to its home position, you need to know the complicated
laws which the generators satisfy. However, if I presented a group with six generators by just giving
these laws, without telling you that it acts on this structure, the problem would be more difficult: there is
no general algorithm to decide whether the group defined by an arbitrary presentation is finite, or on the
other hand whether it is non-trivial. Syntax - the expression of elements of the group as strings of
generators subject to laws (``relations'') - gives us very little help in understanding the group. Indeed the
structure on which it acts is the only thing we can use to give an explicit description. The semantics
gives a kind of tally of the syntax, and then this group may be characterised quite straightforwardly in
the language of group theory. The imperative interpretation (Remark 4.3.3) develops the ``tally'' idea.
It was originally thought that the following result, Cayley's Theorem, would eliminate the need for the
abstract study of groups.
PROOF: Let (M,id,o) be a group or monoid, and put A = M, ie the underlying set. Then \f*a = f o a and f*a = f;a = a o f define the covariant and contravariant regular actions, respectively. They may be seen to be faithful by considering the effect on id ∈ A. []
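The regular actions of this proof can be exercised on a concrete non-commutative monoid, say the four functions {0,1}→{0,1} under composition (my choice of example): \f*a = f o a acts on the left, f*a = a o f on the right, and faithfulness is witnessed by evaluating at id.

```python
from itertools import product

# The monoid M of all functions {0,1} -> {0,1} under composition,
# represented as tuples (f(0), f(1)); the identity is (0, 1).
M = list(product((0, 1), repeat=2))
ident = (0, 1)
def o(g, f):                          # g o f: apply f first
    return (g[f[0]], g[f[1]])

# M is a monoid.
assert all(o(ident, f) == f == o(f, ident) for f in M)
assert all(o(o(h, g), f) == o(h, o(g, f)) for f in M for g in M for h in M)

# Covariant regular action f_* a = f o a; contravariant f^* a = a o f.
def cov(f, a): return o(f, a)
def contra(f, a): return o(a, f)

# The action laws link composition in M to composition of functions.
assert all(cov(o(g, f), a) == cov(g, cov(f, a))
           for f in M for g in M for a in M)
assert all(contra(o(g, f), a) == contra(f, contra(g, a))
           for f in M for g in M for a in M)

# Faithfulness: agreeing on every a, in particular on id, forces f = g.
assert all(f == g for f in M for g in M
           if all(cov(f, a) == cov(g, a) for a in M))
```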
The importance of the result is undisputed, as is shown by the fact that it will shortly turn into a Lemma
named after someone else, but the reductionist motive was mistaken. Both the abstract and concrete
approaches are needed to complement one another. Indeed, the same group, eg A_5 ≡ PSL(2,5)
(page 2.2.9), may have two intuitively unrelated concrete representations. Likewise, the beautifully
simple but powerful theory of matrix representations of finite groups relies precisely upon considering
all such representations. Again this supports the thesis that what mathematical objects do is more
important than what they are.
The Rubik cube group was specified by means of its six generators and their action on the cube, from
which we could see the laws or relations. The converse process - deriving the concrete form from the
laws - is, as we have said, notoriously difficult. As for expressions in Lemma 1.2.4ff, we just have to
accept that the equivalence classes can be constructed somehow. In Sections 5.6 and 7.4 we shall
examine such quotienting (for arbitrary laws) categorically.
Sketches Let us transfer these ideas from groups to categories, starting with generators and laws.
(a)
named base types or sorts, X, Y, ... (there are no constructors);
(b)
one variable x:X for each occurrence of each sort;
(c)
unary operation-symbols or constructors r (in preparation for our treatment of type theory we
write x:X\vdash r(x):Y or just X\vdash r:Y, using the turnstile \vdash instead of an arrow to
emphasise that there is no function-type);
(d)
laws, \r_n(\r_{n-1}(···\r_2(\r_1(x))···)) = \s_m(\s_{m-1}(···\s_2(\s_1(x))···)).
We write Σ for the set of sorts. A free unary theory has no laws.
EXAMPLES 4.2.6 This basic tool has many different names, and it would be instructive to read through
the remainder of this section several times, substituting each of the following points of view in turn.
(a)
If there is only one sort, the operation-symbols generate a monoid (whose elements are the terms)
subject to the given laws.
(b)
If, further, there are no laws then these terms are simply lists of operation-symbols
(Definition 2.7.2) .
(c)
Instead of specific laws it may be understood that all parallel pairs of maps are equal
(Proposition 4.1.5). The sorts are just individuals (from a set Σ) without internal structure, and
the operation-symbols are the instances of a binary relation < on Σ. Terms are instances of the
reflexive-transitive closure ≤ (Section 3.8 and Exercise 3.60).
(d)
In particular, if the types are propositions, the operation-symbols are deduction steps and the
terms are proofs.
(e)
A free unary theory is just a (labelled) oriented graph. A graph in this sense may have loops and
multiple edges, unlike in combinatorics ( cf Example 5.1.5(e)) , and we use the word oriented to
avoid confusion with directed diagrams (Definition 3.4.1). The types are called vertices or nodes,
and the operation-symbols are (oriented) edges. Terms are paths. This is the many-sorted version
of (b).
(f)
A free unary theory may be seen as a deterministic automaton. The types are called states and
the operation-symbols are actions. It is deterministic because the action labelled r has a unique
target. Now terms are acceptable words or behaviour traces, and may be described by a regular
grammar.
(g)
Unary theories are also called linear or elementary sketches. The types are objects, the operation-
symbols are arrows or generating morphisms, and the laws are called commutative polygons.
(h)
A unary theory also describes a polyhedron in which the types are vertices, the operation-
symbols are oriented edges and the laws are faces, though it need not be embeddable in R3. Each
raw term is given by an oriented path, but the faces generate an equivalence relation, as follows.
A path which follows (all of) one half of a face may be ``dragged across it'' and is equal to
the path taking the alternative route. The terms are homotopy classes, and inequations are
holes (\puncture ).
A law is a (so-called commutative) (n+m)-sided polygon which has exactly one source and exactly one
target (or sink): at every other node there must be just one incoming and one outgoing edge. The target
is the type of the term and the source is (the type of) the free variable. Variables are redundant, as is the
notion of application: only associative graphical composition of arrows remains. Indeed commutative
diagrams are the best way to illustrate unary first order equational reasoning.
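For a free unary theory the generated category is easy to compute: terms are paths in the oriented graph, composed by concatenation, with empty paths as identities (points (b), (e) and (f) above). The graph below is a hypothetical example of mine:

```python
# A free unary theory as a labelled oriented graph:
# an edge r: X -> Y is written (name, source, target).
edges = [('r', 'X', 'Y'), ('s', 'Y', 'Z'), ('t', 'X', 'Z')]

def paths(src, tgt, max_len=3):
    """All composable strings of operation-symbols from src to tgt
    (the terms of the free theory); identities are empty paths."""
    found = [((), src)] if src == tgt else []
    frontier = [((), src)]
    for _ in range(max_len):
        frontier = [(p + (name,), c) for (p, here) in frontier
                    for (name, a, c) in edges if a == here]
        found += [(p, c) for (p, c) in frontier if c == tgt]
    return [p for (p, _) in found]

def compose(p, q):                    # composition is concatenation
    return p + q

# Hom-sets of the generated category:
assert paths('X', 'Z') == [('t',), ('r', 's')]
assert paths('X', 'X') == [()]        # only the identity
# The empty path is a unit and concatenation is associative:
p, q = ('r',), ('s',)
assert compose((), p) == p == compose(p, ())
assert compose(compose(p, q), ()) == compose(p, compose(q, ()))
```

Since there are no laws, no quotienting is needed; with laws, hom-sets would be equivalence classes of such paths.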
When we draw sketches, as for instance in Example 4.6.3(f), we may name several nodes with the same
type X. This is to avoid appearing to state unintended laws, where it is understood informally that the
polygons which we draw are meant to commute. (This convention is made formal in [FS90], using the
puncture symbol \puncture to indicate that an apparent law is not required, though this does not mean
that the equation is forbidden in any interpretation.) All occurrences of the same named type X must
nevertheless be interpreted by the same set \typeA_X.
DEFINITION 4.2.7 A model (also called an algebra, interpretation , representation or covariant action)
of an elementary sketch is
(a)
an assignment of a set \typeA_X to each type name X and
(b)
a function \r*:\typeA_X→ \typeA_Y to each operation-symbol r, such that
(c)
each law \r_n(···\r_2(\r_1(x))···) = \s_m(···\s_2(\s_1(x))···) holds, in the sense that \r_n*(···\r_2*(\r_1*(a))···) = \s_m*(···\s_2*(\s_1*(a))···) ∈ \typeA_Y for all a ∈ \typeA_X, where src\r_1 = X = src\s_1 and tgt\r_n = Y = tgt\s_m.
(a)
List(X) is given by \opz_A:A and \ops_A:XxA→ A in Remark 2.7.10;
(b)
a unary closure condition \triangleright in Ω is a \triangleright -closed subset or trajectory in the
set Σ of sorts (Examples 3.8.1);
(c)
a polyhedron is a geometric realisation;
(d)
an automaton is the regular language which it recognises.
PROPOSITION 4.2.9 Every elementary sketch has a faithful covariant action on its clones H_X = ∪_Γ \Clone_L(Γ,X). Substitution for the variable defines a contravariant action on H^Y = ∪_Θ \Clone_L(Y,Θ), also faithful.
PROOF: Recall from Notation 2.4.12 that the clone, \Clone_L(Γ,X), is the set of terms of type X in the context Γ, subject to the laws. In our case Γ consists of a single typed variable, so we abuse notation by writing Γ for the type too. A term of type X is a composable string of unary operation-symbols applied to a variable σ:Γ (σ for ``state'').
The actions of r:X→ Y on Cn(Γ,X) ⊂ H_X and Cn(Y,Θ) ⊂ H^Y take t(σ) to r(t(σ)) and
\z_m(···\z_2(\z_1(y))···) to \z_m(···\z_2(\z_1(r(x)))···) ∈ \Clone_L(X,Θ),
with σ:Γ, x:X and y:Y; they are faithful by considering the empty strings (n = m = 0). []
Analogously, Definitions 3.1.7 and 3.9.6ff represented each element x of a poset X covariantly as the
lower subset X↓ x and contravariantly as the upper set x↓ X. Lists were used to form the transitive
closure of a binary relation in Exercise 3.60. We shall postpone the analogue of Proposition 3.1.8 (called
the Yoneda Lemma) to Theorem 4.8.12.
EXAMPLE 4.2.10 The Lindenbaum algebra (Example 3.1.9) gives the regular representation of
propositions. For types this is the term model.
(a)
The covariant representation of a term is the effect of substituting values for its free variable.
(b)
The contravariant representation is the result of substituting the term itself for a free variable in another term.
Convention 4.1.2, that juxtaposition means application on the left, is clearly not very appropriate for
unary theories. Many algebraists, in group theory in particular, apply functions on the right, and abstract
this to composition without any sign; this is unambiguous in that subject because the language is first
order and strongly typed. As we shall soon be passing on to many-argument and higher order languages,
we shall put up with the earlier convention.
This interpretation is sound: the semantics obeys the rules specified by the syntax. It is also complete:
any two terms which are semantically equal may be proved to be so using the given rules, because we
only made them equal when the rules said so; this is what faithful means.
Generating a category A faithful action can be used to represent a category concretely: the object X is the set \typeA_X and maps f:X→ Y are those functions (``homomorphisms'') \typeA_X→ \typeA_Y which arise as actions. In the case of the Cayley-Yoneda action, we shall characterise them intrinsically in terms of naturality in Example 4.8.2(f).
The category obtained from an elementary sketch via its action has the same objects as the sketch. The
maps of the category are composites of those of the sketch. In the case of a free unary theory, the maps
are lists (Definition 2.7.2) of generators, though we postpone the proof to Theorem 6.2.8(a). With laws,
they are equivalence classes of lists.
Since the action is faithful, the only equations amongst maps are those provable from those given in the
sketch plus the axioms for a category. Lemma 1.2.4 gave the preorder version of this construction;
recall that it made objects interchangeable (isomorphic), but didn't pretend to make them equal.
LEMMA 4.2.11 The hom-set \Clone_L(X,Y) in the category is the clone of the same name. The identity \id_X is the variable x:X qua term, and composition is substitution, which is associative. []
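For a free unary theory this lemma has a very concrete reading: a term of \Clone_L(X,Y) is a list of operation-symbols applied to the variable x:X, so composition-by-substitution is list concatenation and the identity is the empty list. The following sketch (our encoding, not the book's) checks the category laws on examples:

```python
# Terms of a FREE unary theory as lists of operation-symbols: the term
# t(s(r(x))) is encoded as the list ["r", "s", "t"].  Composition
# substitutes one term for the variable of the other, i.e. concatenates.

def identity():
    return []                      # the bare variable x:X qua term

def compose(f, g):
    """First f, then g: substitute the term f for the variable of g."""
    return f + g

f, g, h = ["r"], ["s", "t"], ["u"]
assert compose(f, g) == ["r", "s", "t"]          # t(s(r(x)))
# associativity and identity: the laws asserted by Lemma 4.2.11
assert compose(compose(f, g), h) == compose(f, compose(g, h))
assert compose(identity(), f) == f == compose(f, identity())
```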
The category saturates the sketch by adding into \Clone_L(X,Y) all derived operations of type Y (these are just composites in the unary case, cf the reflexive-transitive closure of a relation), and making all
provable identifications amongst them. To sum up the precise relationship,
THEOREM 4.2.12 Every elementary sketch presents a category, and conversely, any small category C is presented by some sketch L in the sense that C ≡ \Clone_L. We write \qqdash for this isomorphism and call the sketch L = L(C) the canonical elementary language of C ( cf Proposition 3.9.3). It is defined as
follows:
(a)
the sorts \qq X of L(C) are the objects X of C,
(b)
the operation-symbols \qq f are its morphisms f, and
(c)
the laws are \qq{id}(x) = x and \qq{g}(\qq{f}(x)) = \qq{g o f}(x),
using \qq X and \qq f to distinguish the linguistic sorts and operation- symbols from the objects and
morphisms of the original category. []
We are accustomed to writing languages in alphabets of 26 or maybe 128 enumerated symbols, but for
this construction we need a ``letter'' for each object and morphism of C. Questions such as whether we
can distinguish between letters or form a dictionary of the words now arise, to which one answer might
be a severe restriction on the applicability of this result. This is not the line which we follow, but the
issues deserve separate consideration, which we defer to Section 6.2.
The notion of sketch interpolates between those of oriented graph and category: it mentions some
composites, where the extremes require none or all of them. For a category, the representations of an
object X given by Proposition 4.2.9 reduce simply to ∪_Γ C(Γ,X) and ∪_Θ C(X,Θ), where we no longer have to form explicit composites.
The geometrical interpretation shows that unary theories have an input-output symmetry. This is broken
by the term calculi of the richer type theories, but the closer analogy is retained if we consider programs
or substitutions instead. By examining the covariant and contravariant actions of the latter, we shall next
give a construction of the category of contexts and substitutions which is directly applicable to a very
wide class of formal languages. The use of sketches to do this is new.
This section has shown how categories can present combinatorial data in algebra, topology and logic.
The proofs are not complicated, but nor are the ideas trivial or immediately grasped: you should go back
and use them to express any familiar examples as a category. This is important because our account of
substitutions and hence the semantics of type theory depend on it.
DEFINITION 4.3.1 The direct declarative language has the following syntax for \bnfname programs (sometimes called commands):
[BNF display omitted]
subject to the weak variable convention that no declaration may be made of a variable (with the same
name as one) which is already in scope. A variable is said to be in scope from its put to its discard
command; of course only the variables in scope may occur freely in \bnfname terms. In the construction
of this section, the notion of \bnfname term is meant to be an indeterminate one, which will be taken to
be algebra in Section 4.6, and the λ-calculus in Section 4.7. Conditionals, loops and other constructs will
be added later. The put command is commonly called let, but we wish to maintain the distinction made
in Remark 1.6.2 between definite and indeterminate values.
The direct declarative language alone is rather mundane, so we shall borrow an extension from
Section 5.3 to present a famous example.
Operational interpretation Treating the language as imperative, the state at any point of the
execution of a program like this is determined by the tuple of current values of the variables in scope.
The type of states is the cartesian product of the sets over which the variables range. Notice that the put
and discard commands change not only the value of the state σ but also its type, so we really need a
category and not just a monoid (or semigroup, ie monoid without identity) to interpret the language.
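A minimal sketch of this reading (our names, with states as Python dicts rather than tuples): put extends the state with a fresh binding, discard removes one, and each changes the set of variables in scope, ie the type of the state.

```python
# States map in-scope variables to values.  put and discard change not
# only the values but the *set* of variables in scope (the state's type).

def put(state, x, term):
    assert x not in state              # weak variable convention: x is fresh
    return {**state, x: term(state)}   # the term may read variables in scope

def discard(state, x):
    return {k: v for k, v in state.items() if k != x}

s = {"a": 2, "b": 3}
s = put(s, "p", lambda st: st["a"] + st["b"])
s = discard(s, "a")
assert s == {"b": 3, "p": 5}
```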
This interpretation represents programs in two ways:
(a)
contravariantly by their effect on continuations, and
(b)
covariantly by changing the initialisations of program-variables,
REMARK 4.3.3 Our language does not seem to have an imperative flavour, but in fact the weak variable
convention allows assignment to be defined in it:
EXAMPLE 4.3.4 This program finds the one real (x_0) and two possibly complex roots (x_±) of the cubic equation x³+ax²+bx+c = 0. The index n takes each of the values 0, +1 and -1, and ω = -1/2+(1/2)√(-3) denotes a complex cube root of unity.
[program listing omitted]
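The printed program listing has not survived reproduction here, but the computation it performs is the classical Cardano solution. As an illustration only (the intermediate names p and q and the exact steps are our reconstruction, not the book's program), it can be transcribed as:

```python
import cmath

def cubic_roots(a, b, c):
    """The three roots of x^3 + a*x^2 + b*x + c = 0 by Cardano's method."""
    omega = (-1 + cmath.sqrt(-3)) / 2          # complex cube root of unity
    p = b - a * a / 3                          # depressed cubic t^3 + p*t + q
    q = 2 * a ** 3 / 27 - a * b / 3 + c
    u = (-q / 2 + cmath.sqrt(q * q / 4 + p ** 3 / 27)) ** (1 / 3)
    roots = []
    for n in range(3):                         # the index n of the text
        w = omega ** n
        t = w * u - p / (3 * w * u) if u != 0 else complex(-q) ** (1 / 3)
        roots.append(t - a / 3)
    return roots

for x in cubic_roots(-6, 11, -6):              # (x-1)(x-2)(x-3) = 0
    assert abs(x ** 3 - 6 * x ** 2 + 11 * x - 6) < 1e-6
```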
Logical interpretation What the covariant imperative action means for more complex languages
becomes less clear. It also has the weakness that it acts on terms which may be substituted for the free
variables, which must therefore belong to the same calculus, here a programming language. The
contravariant action, by substitution into expressions, is not restricted in this way: these expressions may
belong to a much richer language, such as higher order logic. Recall that we also used predicates (or
subsets) to represent posets in Definition 3.1.7ff.
This method of proving partial correctness of programs is due to Robert Floyd [Flo67], though he presented it for flow charts. The notation γ{u}φ is due to Tony Hoare. Floyd also gave the criterion for termination (Remark 2.5.13).
We write
{γ} u {θ}
to mean ``if γ was true before executing u then θ will be true afterwards,'' γ and θ being known as pre- and postconditions. The rules for the basic commands are
{θ} skip {θ}        {[^(x)]*θ} discard x {θ},  where x ∉ FV(θ)
[proof trees for the remaining rules omitted]
Or, in the informal notation (without [^(x)]* from Notation 1.1.11),
{θ[a]} put x = a {θ[x]}        {θ} discard x {θ}.
In the case of the whole program, γ and θ constitute the specification: in the Example, for instance, it is that the program produces the solution to the cubic equation. Of course we may always take γ = ⊥ and θ = ⊤, but this vacuous specification says that the program is good for nothing.
It is natural to insert midconditions between the lines of the program, instead of this repetitive sequent-
style notation. A fully proved program consists of phrases of proof interrupted by single commands. The
latter, together with the proof lines either side, must obey the Floyd rules.
The midconditions need not be computable. Even when they are, it would often be more difficult to
compute them than the program itself. Sometimes they involve universal quantification over infinite
sets, so they are of strictly greater logical (quantifier) complexity. The proofs may be arbitrarily complicated: it is true that given integers x,y,z ≥ 1 and n ≥ 3 the program x^n + y^n − z^n will always produce a non-zero answer, but the proof-phrase took 357 years!
REMARK 4.3.6 We may also read put as an abbreviation, local definition or (as we called it in
Definition 1.6.8) a declaration within the proof. We showed there that a declaration may itself be treated
as an ( ∃E )-proof box - the scope of the variable. The box is open-ended: it extends until the end of the
argument, or of the enclosing conditional branch.
As we saw, the weak variable convention allows us to define assignment, but the proof rules then
become much more complicated. The original (strong) convention gives referential transparency, the
free ability to import formulae ( cf Lemma 1.6.3). But if we have shown that x is odd and then do x: = 2
we may no longer use the previous knowledge. For this reason assignment is to be avoided in
programming.
REMARK 4.3.7 For any u and θ there is a weakest precondition, obtained by letting u act on θ by
substitution:
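In a sketch (our encoding), with predicates as Boolean-valued functions of the state, the weakest precondition of put is literally substitution: evaluate the postcondition in the state extended with the value of the declared term.

```python
# wp(put x = a, theta) = theta[x := a]: the weakest gamma for which
# {gamma} put x = a {theta} holds, obtained by substitution.

def wp_put(x, term, theta):
    return lambda state: theta({**state, x: term(state)})

def wp_discard(x, theta):
    # theta may not mention x, so dropping x from the state is harmless
    return lambda state: theta({k: v for k, v in state.items() if k != x})

theta = lambda st: st["y"] > 0                   # postcondition: y > 0
pre = wp_put("y", lambda st: st["x"] - 1, theta)
assert pre({"x": 5})                             # {x-1 > 0} put y = x-1 {y > 0}
assert not pre({"x": 0})
```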
Normal forms The operational interpretation - the execution of the program on numerical values - is not
the only notion of computation. As in the λ -calculus, there are rules for rewriting programs with the aim
of putting them in normal (or, as it is called in algebra, closed) form.
[display of the five laws omitted]
where x and y are distinct variables with x,y ∉ FV(a) and y ∉ FV(b). Of course skip is the identity and ; is composition. The last law, which is not redundant, says that x := x does nothing. Assignment is the simplest case of conflict between the names of input and output variables, which must be resolved by renaming. We shall discuss the orientation of these laws as reduction rules in Remark 8.2.7.
THEOREM 4.3.9 Every program of the direct declarative language is equivalent (with respect to the above
laws) to one in normal form:
put z⃗ = a⃗; discard x⃗; put y⃗ = z⃗; discard z⃗,
where FV(a⃗) ⊂ x⃗ are the inputs and y⃗ the outputs. This means
(a)
first the (renamed) outputs are declared by \bnfname terms in which only the input variables
occur, not the output ones;
(b)
then the input variables are discarded;
(c)
finally the output variables are renamed. []
The proof (by induction on the length of the program) is left as a valuable exercise. The point is to show
that the laws suffice to capture the familiar process of eliminating intermediate variables (p, q, etc in the
Example). Beware that we are only normalising the connectives defined in this section: the theorem says
nothing about any normal forms of the \bnfname terms themselves ( eg from the λ-calculus). The normal
form is unique (up to order, which may be canonised, and the choice of new names), so it may be used
to compare programs and make deductions about commutative diagrams (Remark 7.6.12). But it is not a
good way to define the category, as (the proof of) the theorem is needed every time we compose two
morphisms, cf Example 1.2.1.
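The inductive idea of the proof can be sketched as follows (our data types): walk the program accumulating a simultaneous substitution, so that every surviving variable ends up bound to a term in the input variables only, exactly as in the normal form.

```python
# Terms are variable names (str) or tuples (op, arg1, ...).  Normalising
# a put/discard program eliminates its intermediate variables.

def subst(term, env):
    if isinstance(term, str):
        return env.get(term, term)
    return (term[0],) + tuple(subst(t, env) for t in term[1:])

def normalise(inputs, program):
    env = {x: x for x in inputs}
    for cmd, x, *rest in program:
        if cmd == "put":
            env[x] = subst(rest[0], env)   # inline x's definition
        else:                              # "discard"
            del env[x]
    return env                             # surviving variable -> input term

prog = [("put", "p", ("f", "a")),          # p := f(a)
        ("put", "q", ("g", "p", "b")),     # q := g(p, b)
        ("discard", "p")]
assert normalise(["a", "b"], prog) == \
    {"a": "a", "b": "b", "q": ("g", ("f", "a"), "b")}
```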
REMARKS 4.3.10
(a)
Normalisation uses the second law from left to right, but (in the opposite sense) the derived law
put x = a[z := u] ⇝ put z = u; put x = a; discard z
decomposes terms in algebra into operation-symbols and sub-terms, and hence to operation-symbols
alone. On a machine the latter are calls to library routines, so this process ( compilation) reduces any
program to a sequence of such calls. The second law then becomes redundant, being replaced by those
of the language itself.
(b)
The weak variable convention is necessary to define the category composed of programs.
The object at each semicolon is the set of typed variables in scope there.
In particular the source and target of the program are the lists of types of the input and output variables
respectively. It is convenient to assume that these have no local variable names in common.
(c)
From the example and theorem, we see that the put and discard commands do not form a nested
system like proof boxes; indeed if they did the program would just throw away its results.
Remark 1.6.5 already gives some freedom to choose when to close ( ∃E )-boxes and definitions.
The laws for discard extend this conservatively, ie they give no new equivalences between
properly nested boxes. In fact the proof of Lemma 1.6.6 illustrates that the natural scope of
( ∃E )-boxes is not necessarily nested.
(d)
The discard commands (which Floyd called undefine) have been added for an exact match with
the categorical concepts which are used to interpret the language, but experience shows that a
compiler can do a more accurate and efficient job if it is given better type and resource
information about the intentions of the programmer. If the variables obey the strong Convention
1.1.8 ( ie they cannot be reused, so assignment is not allowed) the laws allow us to move all of
these commands to the end of the program. As compilers do this automatically, discard is
redundant in programming.
The language defines a sketch whose objects are sets of typed variables; the maps are programs whose
source and target objects are the variables which are in scope before and after. We have given familiar
covariant and contravariant actions, and five laws which are sound for them. The normal form theorem shows that the substitution action is faithful, and indeed that we need only consider its effect on variables; in other words the five laws provide a complete axiomatisation.
The category of contexts and substitutions These programs are not exactly the notation which we
introduced in Section 1.1, but the difference is ``syntactic sugar.''
DEFINITION 4.3.11 The category of contexts and substitutions, which is called Cn×_L, is presented by the following sketch.
(a)
The objects of Cn×_L are the contexts of L (Notation 2.3.6), ie finite lists of distinct typed variables, written [x⃗:X⃗] or Γ, with len x⃗ = k. As far as Section 4.6, the \typeX_i will just be base types (sorts), then in Sections 4.7 and 5.3 we shall begin to allow expressions such as X→ Y and X+Y for the types of the terms. Lists of types on both sides of the turnstile mean products.
(b)
The generating morphisms are put (declaration) and discard, ie
❍ single substitutions [x := a]:Γ→ [Γ,x:X] for each term a:X in the context Γ, where x is a new variable of the same type X;
❍ single omissions [^(x)]:[Γ,x:X]→ Γ for each variable x:X; a morphism of this special form is called a display map, for reasons which will emerge in Chapter VIII.
(c)
The Extended Substitution Lemma (1.1.12) gives the laws [equation display omitted], where x \not≡ y, x,y ∉ FV(a) and y ∉ FV(b). They are shown as commutative diagrams in Section 8.2.
For a Horn theory, Theorem 3.9.1 generated the analogous preorder Cn∧_L of contexts under provability, by the two cases of omitting a proposition from a context and using a single instance of the closure condition.
There is nothing in this construction which is peculiar to either algebra, programming or the λ-calculus:
it may be applied to any typed calculus of substitutions. The maps have been written with a substitution
notation, because this is the notion of composition. The λ-calculus defines another composition
operation via abstraction and application, but it is only associative after the β- and η-rules
(Definition 4.7.6) have been imposed, and then the two forms of composition agree. Substitution is a
primitive of symbolic manipulation, the λ-calculus is not.
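A sketch of substitution as the composition of Cn×_L (in our encoding): a morphism is a finite map from target variables to terms in the source variables, and composition substitutes the terms of one map into the other. Associativity of composition is then inherited from associativity of substitution.

```python
# Morphisms as simultaneous substitutions: {target var -> term in source vars}.

def subst(term, f):
    if isinstance(term, str):
        return f.get(term, term)
    return (term[0],) + tuple(subst(t, f) for t in term[1:])

def compose(f, g):
    """First f, then g: substitute f's terms into g's."""
    return {y: subst(t, f) for y, t in g.items()}

f = {"y": ("f", "x")}                  # [y := f(x)]  : [x:X] -> [y:Y]
g = {"z": ("g", "y", "y")}             # [z := g(y,y)]: [y:Y] -> [z:Z]
h = {"w": ("h", "z")}                  # [w := h(z)]  : [z:Z] -> [w:W]

assert compose(f, g) == {"z": ("g", ("f", "x"), ("f", "x"))}
assert compose(compose(f, g), h) == compose(f, compose(g, h))
```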
In Section 4.7 we shall begin to add type constructors such as → to the logic. Then [f:(X→ Y)] will be a valid context, and so will be added to the category as a new object. We shall write Cn→_L for the larger category, in which the morphisms are formed by λ-abstraction and application. But it turns out (and this is an important theorem) that there will be no additional maps between the old objects, nor do maps which were previously distinguished become equal: we say that Cn→_L is a conservative extension of Cn×_L. It may provide more powerful methods of reasoning, without doing anything which we couldn't have done before. In other words, it gives short proofs of facts which were already true in the simpler system, but which would have taken much (maybe hyper-exponentially) longer to prove. Other type constructors extend the category further, and we write Cn□_L for the generic situation; in fact we have already made such an extension from the unary case in Section 4.2. We shall discuss conservativity in due course.
Terms as sections Theorem 4.3.9 substantiates the remarks about simultaneous substitution which we
made after we first introduced the Substitution Lemma 1.1.5.
NOTATION 4.3.12 Any map may be written uniquely as a multiple or simultaneous substitution, ie a sequence of bindings of terms to variables, where y_j and \arga_j have type \typeY_j, and FV(\arga_j) ⊂ Γ.
The sources and targets of the maps are ambiguous in this notation. There may be more variables in the
source context than are mentioned in the substituted terms, and it is not clear whether they should
survive or be forgotten in the target. Indeed we took advantage of the ambiguity by saying that the
substitutions for different variables commute. The (strict) notion of commutativity in monoids does not
extend to categories because the sources must agree with the targets, but it becomes meaningful in
situations like this, where maps with different endpoints have ``essentially'' the same effect, differing
only in their passive contexts. This is what led to the ``commutative'' diagram terminology.
COROLLARY 4.3.13
(a)
A map a:Γ→ [Γ,x:X] is a single substitution iff it is a section of the display [^(x)]:[Γ,x:X]→ Γ, ie a;[^(x)] = \id_Γ. Then a = [x := a], where the term Γ\vdash a:X is determined by a = a*x.
(b)
Sections a:Γ→ [Γ,x:X] ≡ Γ×X correspond bijectively to total functional relations Γ \leftharpoondown\rightharpoonup X, cf Exercise 1.14.
(c)
The clone \Clone_L(Γ,X) is isomorphic to the hom-set Cn×_L(Γ,[x:X]), whence the deliberate use of similar notation.
(d)
More generally, Cn×_L(Γ,[x⃗:X⃗]) ≡ ∏_{i=1}^{len X⃗} \Clone_L(Γ,\typeX_i). []
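Part (a) can be checked in the same substitution encoding as above (ours, not the book's): composing the single substitution [x := a] with the display map that omits x gives the identity on Γ, so [x := a] is indeed a section.

```python
# Morphisms as simultaneous substitutions {target var -> term}; the
# display map simply forgets x, i.e. is the identity on the rest of Gamma.

def subst(term, f):
    if isinstance(term, str):
        return f.get(term, term)
    return (term[0],) + tuple(subst(t, f) for t in term[1:])

def compose(f, g):
    return {y: subst(t, f) for y, t in g.items()}

section = {"u": "u", "v": "v", "x": ("a", "u", "v")}   # [x := a(u,v)]
display = {"u": "u", "v": "v"}       # omit x: [Gamma, x:X] -> Gamma

# a ; x-hat = id on Gamma = [u:U, v:V]
assert compose(section, display) == {"u": "u", "v": "v"}
```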
We shall use display maps and their sections to recover a generalised algebraic from a category in
Chapter VIII.
Use of variables In the last section terms of a unary language were called (oriented) strings of
operations.
REMARK 4.3.14 Let's say wires instead; then a morphism in the many-argument version is to a term in
the unary one as a multi-core cable is to a single wire. We now need a way to distinguish the wires,
where we did not before: this is what variables do. We use German and italic letters for the cables and
single wires respectively.
In the foregoing account we have chosen to colour the wires with variable-names. Many authors prefer
to number the pins in the plugs and sockets instead. That is, they specify that x_1,…,x_n are the actual names rather than meta-variables ( cf Definition 1.1.9), and have to renumber them at every stage. According to this convention, a morphism is a list of equivalence classes of assignments of the form [x_1,…,x_n → a⃗], where
[x_1,…,x_n → a⃗] = [y_1,…,y_n → a⃗[x⃗ := y⃗]].
This ubiquitous renaming needlessly complicates the formal discussion of the notation. It is useful as an
informal way of naming a single map, to avoid choosing names for the target variables. However, in a
diagram, where such maps are to be composed, it is often more convenient to attach distinct variables to
the objects ( ie to adopt our convention) in order to use them in a global symbolic argument about
commutativity.
In this book we shall keep an explicit distinction between free variables, so that [x:X] and [y:X] are
isomorphic but unequal contexts. The difference is that our category has many isomorphic duplicates of
each object X, but the leaner one may be obtained from it by a straightforward construction from abstract
category theory (Exercise 4.7). We have chosen this convention in order to take best advantage of
variables as they are normally used in mathematics, namely to relate quantities defined in one part of
any argument to their use in another.
A change of free variables we call open α-equivalence . When we come to λ-abstraction, terms differing
only by the names of corresponding bound variables will be considered to be the same. The context says
which free variables are allowed, but the bound ones are unlimited.
A map is in fact not just a cable but a device with inputs and outputs. Corollary 4.3.13(d) says that if the
output is a tuple then it may be split into several (multiple-input,) single-output devices. Yet another
common metaphor is to think of each term as a tree (Remark 1.1.1) and the maps are forests (collections
of trees); composition is by substitution of the roots of one forest for the leaves of another!
Theorem 3.9.1 also described Cn∧_L using proof trees.
4.4 Functors
Since an action takes composition in the syntax to composition in the semantics, it is an example of a
homomorphism of categories.
DEFINITION 4.4.1 A functor F:C→ D consists of
(a)
a (class) function \funcF_o: obC→ obD, together with
(b)
a function \funcF_{X,Y}: C(X,Y)→ D(\funcF_o X,\funcF_o Y) for each pair of objects X,Y ∈ obC, such that
\funcF_{X,X}(\id^C_X) = \id^D_{\funcF_o(X)}        \funcF_{X,Z}(f; g) = \funcF_{X,Y}(f); \funcF_{Y,Z}(g).
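For a one-object category, ie a monoid, this definition reduces to a monoid homomorphism, which gives a checkable toy case (our example): length is a functor from the monoid of lists under concatenation to (N, +, 0).

```python
# Both categories have one object, so a functor is a single function on
# morphisms that preserves the identity and composition.

F = len                                    # on morphisms (lists of symbols)

def compose(f, g):                         # composition in List(S): append
    return f + g

f, g = ["a", "b"], ["c"]
assert F([]) == 0                          # F preserves the identity
assert F(compose(f, g)) == F(f) + F(g)     # F preserves composition
```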
A functor F:C^op→ D or F:C→ D^op may be called a contravariant functor from C to D, the usual case being styled covariant if emphasis is needed. To avoid the confusion caused by discussing morphisms of an opposite category explicitly when describing contravariant functors, it is usual simply to define
\funcF_{X,Y}: C(X,Y)→ D(\funcF_o Y,\funcF_o X).
Since the essence of a functor is that it is defined in a ``coherent'' fashion for all objects and morphisms
together, the subscripts and superscripts are omitted: we write F X and F f for the application of the
functor to an object or morphism. If it is defined on objects by built-in notation such as C(X,-) or Y(-) this
can look a bit strange when applied to maps.
Of course given another functor G:D→ E we can apply this too, writing the result as G(F X) or G(F f),
with the brackets. The abstract theory of functors is a good example of a unary language
(Definition 4.2.5), and would be clearer in the left-to-right notation without operators or brackets. For
the sake of conformity with other notations and concepts, we shall, however, always write composition
of functors from right to left as G·F, and not using juxtaposition.
EXAMPLES 4.4.2 A functor
(a)
between preorders considered as categories is exactly a monotone function (Definition 3.1.5), and
a contravariant functor is antitone;
(b)
between monoids, groups or groupoids is exactly a homomorphism;
(c)
between equivalence relations is a function between their quotients (Remark 1.3.2, Examples
2.1.5 and 3.1.6(d));
(d)
from a group to the category of vector spaces is a linear or matrix representation of the group;
(e)
from a poset X to Ω is an upper subset of X (Example 3.1.6(f)), and X^op→ Ω is a lower subset (Definition 3.1.7);
(f)
from C to Set is a covariant action of C (Definition 4.2.7);
(g)
from C^op to Set is a contravariant action of C on sets; it is also called a presheaf on C, cf Definitions 3.1.7 and 3.9.6. []
Constructions as functors
EXAMPLES 4.4.3 The following are often known as forgetful functors or underlying set functors. This
terminology should only ever be used when the meaning, ie just what is being forgotten, is completely
clear from the presentation of the category. Notice that we may forget (a) properties of objects,
(b) properties of morphisms or (c) structure on an object together with the property of morphisms that
they preserve the structure. The last is the commonest situation. In all cases composition is preserved
because it is defined in the same way on both sides.
(a)
Pos→ Preord , DLat→ Lat, IPO → Dcpo, AbGp→ Gp, CMon→ Mon etc . which forget the
significance of laws and the descriptions of special elements such as ⊥;
(b)
Pos\dashv→ Pos, Set→ Pfn and Pfn → Rel which forget that all joins exist, and totality and
functionality of relations;
(c)
Heyt→ Lat→ SLat→ Pos→ Set , Dcpo→ Pos, Mon → Set and Rng→ AbGp which forget
operations and their preservation.
Besides forgetting things, functors also arise from constructions, where now one may need to check
preservation of (identities and) composites.
EXAMPLES 4.4.4
(a)
Pfn→ Pos, which takes a set X to its lift, Lift X, with the information order (Definition 3.3.7), and
a partial function f:X\rightharpoonup Y to the monotone function (U ∈ LiftX)→ {y|∃x ∈ U.xf→
y};
(b)
Rel→ CSLat by X→ P(X) and R→ (U→ {y|∃x ∈ U. x Ry});
(c)
CSLat→ Pos\dashv, which equips a function that preserves all joins with its unique right adjoint
(Theorem 3.6.9);
(d)
Dcpo→ Sp by the Scott topology (Proposition 3.4.9);
(e)
Sp→ Loc ≡ Frm^op by the frame of open sets;
(f)
Sp→ Preord by the specialisation order (Example 3.1.2(i)).
PROOF:
(a)
First check that the result of the functor applied to a map is a map of the right kind, in this case {y|∃x. x ∈ U ∧(x,y) ∈ f} ∈ Lift Y. This and preservation of composition are technically the same as the fact that relational composition preserves functionality (Lemma 1.6.6).
(b)
Powersets have arbitrary joins, given by unions. These are preserved by the formula shown for morphisms, and in fact any join-preserving function P(X)→ P(Y) arises uniquely in this way.
The other examples rely on composition of adjunctions (Lemma 3.6.6) and of continuous functions. []
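Part (b) can be checked on finite sets (our encoding): the powerset functor sends a relation R to the map U ↦ {y | ∃x ∈ U. x R y}, which preserves unions (all joins) and takes relational composition to composition of functions.

```python
# Relations as sets of pairs; P(X) acts by direct image along R.

def apply(R, U):
    return frozenset(y for (x, y) in R if x in U)

R = {(1, "a"), (1, "b"), (2, "b")}         # R : {1,2} -|-> {a,b}
S = {("a", 10), ("b", 20)}                 # S : {a,b} -|-> {10,20}
RS = {(x, z) for (x, y) in R for (w, z) in S if y == w}   # R ; S

U, V = {1}, {2}
assert apply(R, U | V) == apply(R, U) | apply(R, V)   # joins preserved
assert apply(S, apply(R, U)) == apply(RS, U)          # composition preserved
```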
A classifying category We saw that a category is what is required to express a unary algebraic theory.
An interpretation of such a theory is similarly given by a functor. Any category may play the role of Set:
we restrict to the special case simply because we did in Section 4.2.
THEOREM 4.4.5 Let L be a unary language and \Clone_L the category it presents by Theorem 4.2.12. Then interpretations of L correspond to functors \Clone_L→ Set.
PROOF: Let the interpretation be \typeA_X on sorts and \typeA_r:\typeA_X→ \typeA_Y on operations. These are already part of the required data for a functor \typeA_{(-)}:\Clone_L→ Set, but it remains to define its
effect on strings. This is uniquely determined by preservation of (the identity and) composition. Using
list recursion, the identity is the base case and composition ( cons) the recursion step. Proposition 2.7.5,
which showed that append is associative, guarantees that this too is preserved.
Where a law is given to hold in the interpretation of L, this means exactly that the functor takes equal
values on the corresponding strings of operation-symbols. Conversely any functor \Clone → Set
L
restricts to the sorts and generating arrows in a way which satisfies the laws. []
How interpretations and functors correspond is what matters here, not just the fact that they do.
(Category theory is in a real sense constructive logic, since the proofs are usually needed to give an
accurate statement of the theorems.) Theorem 4.6.7 extends the result to Cn× and, since it discusses
L
algebra, has a more type-theoretic flavour; later we shall do the same for larger fragments of logic.
Example 4.8.2(d) shows how the correspondence deals with homomorphisms.
The propositional analogue at the unary level is simply that a function f:(Σ, < )→ (Θ, ≤ ) obeys x < y ⇒ f(x) ≤_Θ f(y) iff the same function p:\Clone_L = (Σ, ≤ )→ Θ is monotone when the source is considered to carry the reflexive-transitive closure (Section 3.8). A model of a unary Horn theory is a < -upper subset. Similarly any function f:Σ→ Θ from a set to a monoid extends uniquely to a monoid homomorphism \Clone_L = List(Σ)→ Θ (Section 2.7).
Theorem 4.2.12 generated a category freely from a sketch in the sense (of universal algebra) that it
satisfies only those laws which are forced. By Theorem 4.4.5, it is the free category in the categorical
sense of satisfying a universal property (next section). Classifying categories for algebra and the λ-
calculus will be given in Theorem 4.6.7 and Remark 4.7.4.
The force of functoriality It is easy to get into the (bad) habit of only defining the effect of a functor
on objects, since we usually write them in this way. The force of functoriality, however, lies in the
definition on morphisms and the preservation of composition.
EXAMPLES 4.4.6 The following are not functors:
(a)
The map C -/→ M from any category which takes isomorphisms to id and everything else to e, where M is the monoid {id,e} with e² = e.
(b)
The centre of a group, Z(G) = {x: G|∀ g.x g = gx}. The result of applying a homomorphism f:
G→ H to a central element of G need not be central in H, so Z(-) is not defined on maps. In my
experience this is the commonest fallacy: not checking that the ``expected'' action on morphisms
is well defined.
(c)
Set -/→ Set by X→ X^X is also not defined on morphisms, because it is the ``restriction to the diagonal'' of a functor of mixed variance Set^op×Set→ Set.
(d)
Operations satisfying the definition of a functor apart from the preservation of identities are
called semifunctors; they were first studied by Susumu Hayashi. For an example Rel → Rel, take
the powerset on sets and the lower order, (-)\flat, on relations (Exercises 3.55 and 3.57). Any
semifunctor gives rise to a functor by splitting idempotents in the categories (Definition 1.3.12,
Exercise 4.16).
Category theory was first used in algebraic topology, which aims to assign an (easily calculable)
algebraic structure to each topological space in order to distinguish between spaces. For example (only) the nth reduced homology group is non-trivial for the sphere S^n which embeds in (n+1)-dimensional
space. The homeomorphisms ( ie topological isomorphisms, and even the continuous functions) between
the spaces also give rise to isomorphisms (respectively homomorphisms) between the corresponding
groups. It is this property which enables algebraic structures to distinguish the spaces.
REMARK 4.4.7 Suppose \typeX1 ≡ \typeX2 in C, ie there is a pair of morphisms u:\typeX1→ \typeX2 and v:\typeX2→ \typeX1 with u;v = \id_{\typeX1} and v;u = \id_{\typeX2}; we say that the two objects are isomorphic ( cf Lemma 1.3.11).
(a)
Any structure carried by \typeX1 may be transferred to \typeX2, because any morphism Γ→ \typeX1 or \typeX1→ Θ may be turned into a morphism Γ→ \typeX2 or \typeX2→ Θ by composition with either u or v, and the process is reversible. Hence ≡ is a congruence (Definition 1.2.12) with respect to categorically definable properties.
(b)
Any functor F:C→ D preserves this property: F \typeX1 ≡ F \typeX2.
Hence if \typeX1,\typeX2 ∈ C are two objects and F:C→ D is a functor (such as homology) for which F
\typeX1 and F \typeX2 are not isomorphic in D, then \typeX1 and \typeX2 are not isomorphic in C. []
Full and faithful Since objects are only defined up to isomorphism, it is harmless (and often useful) to
make isomorphic duplicates of them (for example in our use of variables, Remark 4.3.14). For this
reason injectivity and surjectivity on objects are not particularly important for functors. The force of
functoriality is, as we have said, on morphisms.
DEFINITION 4.4.8 Let F:C→ D be a functor.
(a)
F is faithful if the functions \funcF_{X,Y}:C(X,Y)→ D(FX,FY) are injective, ie given f,g:X \rightrightarrows Y in C, if Ff = Fg then f = g.
(b)
F is full if each function \funcF_{X,Y} is surjective, ie given X,Y ∈ obC and h:FX→ FY in D there is some f:X→ Y with Ff = h. Notice that (unlike surjectivity of functions) C gives as well as takes in this definition; that is, the objects of C must be specified, not just the morphisms of D. In particular ∅→ C is full. Fullness is often accompanied by faithfulness, just as uniqueness is more important than existence (see the remarks after the proof of Lemma 1.2.11).
(c)
F is essentially surjective (on objects) or has representative image if for every object A ∈ ob D
there are some object X ∈ obC and an isomorphism F X ≡ A in D.
(d)
F is replete if for every X ∈ obC and isomorphism v:A ≡ FX in D there is a (not necessarily unique) isomorphism u:Y ≡ X in C such that FY = A and Fu = v. A forgetful functor which is replete reflects the means of exchange in the sense that the underlying object may be exchanged for an isomorphic copy and the structure will follow. This is a feature of the presentation, but in their usual form most of the functors we describe are replete.
(e)
F reflects invertibility if every morphism u:X → Y in C for which Fu:FX ≡ FY in D is already
itself invertible in C.
(f)
F reflects the existence of isomorphisms if every X,Y ∈ obC such that FX ≡ FY in D are already
themselves isomorphic in C.
(g)
F is an equivalence functor if it is full, faithful and also essentially surjective (see also
Definition 4.8.9(c)).
Similarly, a full subcategory is one whose inclusion functor is full, so it shares the same hom-sets and is
determined by its objects. Conversely, a wide or lluf subcategory is one with the same objects, but
perhaps fewer morphisms. A replete subcategory U ⊂ C is one which is full with respect to
isomorphisms and is such that if X ∈ obU and X ≡ Y in C then Y ∈ obU; this happens, for example,
when U is defined by a universal property.
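The distinction between full and faithful can be checked mechanically in small cases. Treating a monoid homomorphism as a functor between one-object categories, faithfulness is injectivity and fullness surjectivity on the single hom-set. A minimal Python sketch of ours (the helper names are not from the text), using the cyclic monoids Z4 and Z2 under addition:

```python
def is_faithful(hom, source):
    """Injective on morphisms: F f = F g implies f = g."""
    return len({hom(f) for f in source}) == len(source)

def is_full(hom, source, target):
    """Surjective on morphisms: every h : FX -> FY is F f for some f."""
    return {hom(f) for f in source} == set(target)

Z4, Z2 = range(4), range(2)
quotient = lambda n: n % 2         # Z4 -> Z2 (a monoid homomorphism)
doubling = lambda n: (2 * n) % 4   # Z2 -> Z4 (also a homomorphism)

assert is_full(quotient, Z4, Z2) and not is_faithful(quotient, Z4)
assert is_faithful(doubling, Z2) and not is_full(doubling, Z2, Z4)
```

The quotient is full but not faithful; the doubling map is faithful but not full, matching the behaviour of monoid homomorphisms described in Examples 4.4.9(b).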
EXAMPLES 4.4.9
(a)
Every monotone function between posets is faithful. It is injective iff it reflects the existence of
isomorphisms, and surjective iff it is essentially surjective. The notions of fullness and reflecting
invertibility are relevant to posets, and repleteness to preorders.
(b)
Monoid and group homomorphisms are faithful and full iff they are respectively injective and
surjective. The monoid inclusions N ↪ Z (under addition) and Z∖{0} ↪ Q∖{0} (under multiplication) do not reflect invertibility. A group
homomorphism is replete iff it is surjective.
(c)
The functor Sp→ Loc (giving the open-set lattice) is not faithful, but becomes so exactly when
restricted to T0-spaces.
(d)
An action of a category is faithful qua functor iff the action is faithful, ie maps which have
identical effect are equal (Definition 4.2.7).
(e)
Examples 4.4.3(a) are full and faithful, as are the Scott topology Dcpo→ Sp, the powerset
functor Rel→ CSLat and the forgetful functor CSLat→ Pos\dashv. The topology functor Sp→
Loc is full and faithful exactly when restricted to sober spaces.
(f)
Forgetful functors from categories of algebras, such as Gp→ Set and Lat→ Set, also reflect
invertibility: any bijective homomorphism is an isomorphism.
(g)
The forgetful functors Pos→ Set and Sp→ Set are faithful in the sense given, but do not reflect
invertibility (Remark 3.1.6(e)).
(h)
For a forgetful functor to reflect the existence of isomorphisms, each carrier set must support at
most one algebraic structure.
(i)
Examples 4.4.3(b) are wide subcategories.
(j)
Let L and L′ be unary languages (elementary sketches) with the same sorts and operation-
symbols, but such that L′ has additional laws. Then \Clone_L → \Clone_{L′} is full but not faithful.
There is a moral to this: the full subcategory of CSLat consisting of powerset lattices has forgetful
functors successively to CSLat, Pos\dashv, Pos, Set and finally Rel, but is itself equivalent to Rel. This
shows that it is misleading to regard forgetful functors as providing a hierarchy of simplicity amongst
categories: the notion is entirely dependent upon presentation, and indeed some of the functors in
Examples 4.4.4 would be regarded as forgetful by certain authors.
The ``true form'' (in Plato's sense) of a mathematical object is the totality of constructions from it - its
presentations are only images (in both the Platonist and functorial senses). This is in the modern spirit of
object-oriented programming, in which data-objects are available only via constructions and not as their
substance (machine representation).
Our strategy will be to keep faith with categories as they are, uncovering their latent structure - what is
there already. We shall show that this matches the type-theoretic phenomena of interest. Even graph
theory does this in a primitive fashion, by studying features such as bridges, ie edges whose removal
would disconnect the graph. Particular nodes and edges may be characterised à la Leibniz (Proposition
2.8.7) by such features - by the way in which the others see them.
But a graph may have many (different) bridges: we need descriptions (Definition 1.2.10) - properties for
which the definite article ( the) can be used. We use superlatives such as greatest. These must be
justified by comparatives ( greater) with all other objects. Indeed French and Italian ( il più bel
prodotto) use the definite article to make superlatives out of comparatives. In a category the
comparatives are the morphisms.
The Leibnizian method of description is based on quantification over the whole category, and we shall
use either Γ or Θ for the bound variable. This quantification means that universal properties are
impredicative definitions; they are (loosely) related to those in Remark 2.8.11.
Variation over the ambient category may instead be expressed as an axiom scheme with an instance for
each comparative object (Γ or Θ). The maps a:Γ→ X used to test a universal property may be seen as
terms or elements of (type) X, so quantifying over the objects Γ in the category says ``for all terms a:X''
without specifying the parameters Γ occurring in them. But the definitions must be preserved by
substitution for these parameters, cf Lemma 1.6.3.
The terminal object We shall begin by looking closely at the very simplest case. Category theory
attracts from mathematicians derisive names such as ``abstract nonsense'' and ``empty-set theory''
because of definitions like this one and the remarks which follow it. Informaticians will be better aware
of the need to get base cases right: although they appear trivial, they are where the actual work of a
construction gets done, and are also where most program bugs lie. In category theory, any universal
construction is the terminal object of some category.
DEFINITION 4.5.1 1 ∈ obC is a terminator, terminal object or final object if for each object Γ ∈ obC there is a unique morphism \terminalproj_Γ:Γ→ 1.
EXAMPLES 4.5.2
(a)
A terminal object of a preorder qua category is a greatest element (Definition 3.2.1(d)); truth is
the terminal formula under provability.
(b)
{∗} is the terminal object of Set, and also of Preord, Pos, CSLat , SLat, Lat, DLat, HSL, Heyt,
Frm, Dcpo, IPO, Sp, Mon, Gp , CMon and AbGp with the unique structure. Lemma 2.1.9(e)
showed that there is exactly one total function, x↦ ∗.
(c)
∅ is the terminal object of Rel and Pfn; ! is now the empty relation.
(d)
The empty sequence of types is the terminal object of the category of contexts and
substitutions; ! is the empty sequence of terms.
(e)
The terminal type in C and JAVA is called void.
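These counting facts can be checked directly in Set: the number of functions A → B is |B|^|A|, so for any Γ there is exactly one map Γ → {∗}, and maps out of the singleton are exactly points. A small sketch of ours:

```python
from itertools import product

def functions(dom, cod):
    """All functions dom -> cod, represented as dicts."""
    return [dict(zip(dom, values)) for values in product(cod, repeat=len(dom))]

terminal = ['*']                      # a chosen singleton, the terminal object 1
for gamma in (['x'], ['x', 'y'], ['x', 'y', 'z']):
    assert len(functions(gamma, terminal)) == 1    # unique ! : Gamma -> 1

X = [0, 1, 2]                         # global elements 1 -> X are the points of X
assert len(functions(terminal, X)) == len(X)
```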
REMARK 4.5.3 In Set, for each element a ∈ X there is a function {∗} → X selecting a (Lemma 2.1.9(d)),
so maps from the terminal object are called global elements. Although this usage came from topology
via sheaf theory (Example 8.3.7(a)) its meaning is exactly what might be expected from programming:
a term in the global (empty) context. By Corollary 4.3.13, a general map a:Γ→ X is a term Γ\vdash a:X
in the arbitrary or local context of the parameters Γ, so it is called a generalised or local element.
The distinction is well known and important in recursion theory: we say that two functions f,g:N→ N are
numeralwise equal if f(0) = g(0), f(1) = g(1), f(2) = g(2), ..., ie the two composites
agree ( cf a predicate φ[n] provable for each n separately), but this is not enough to prove f = g or ∀x.φ[x]
(Remark 2.4.9, Theorem 9.6.2).
Proposition 4.2.9 showed that every category has a faithful covariant action H_X on its generalised elements.
DEFINITION 4.5.4 If G ⊂ obC is such that the restriction of the action to (G_X = ∪_{Γ ∈ G} C(Γ,X) : X ∈ obC) remains faithful then it is a class of generators. Classically this is often stated as follows: if f ≠ g:X ⇉ Y in C then there are some Γ ∈ G and a:Γ→ X with f∘a ≠ g∘a. To say that there is some set G of generators is a widely applicable way of restricting the size of a category (cf locally small, Remark 4.1.10).
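The classical formulation can be tested by brute force in Set: any two distinct functions between finite sets are separated by some global element. A sketch of ours:

```python
from itertools import product

X, Y = (0, 1, 2), ('p', 'q')
maps = [dict(zip(X, vs)) for vs in product(Y, repeat=len(X))]   # all f : X -> Y

for f in maps:
    for g in maps:
        if f != g:
            # some global element a : 1 -> X, ie a point of X, separates f and g
            assert any(f[a] != g[a] for a in X)
```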
EXAMPLES 4.5.5
(a)
In Set the singleton G = {1} suffices; a category for which {1} is a set of generators is sometimes
called well pointed.
(b)
Pos is well pointed in the weak sense that Pos(1,-):Pos→ Set is faithful, but this functor doesn't
reflect invertibility. Theorem 4.7.13 needs this stronger property, which G = {1,{⊥ < ⊤}} does
satisfy.
(c)
A well pointed semantics for the λ-calculus is called a model, where the word algebra is used for
the general case.
Some writers have specialised the term concrete category to this case, where the action on global
elements is faithful. The standard usage of this term is that there is some faithful action. For example
CSLat is manifestly described ``concretely,'' but CSLat(1,X) always has exactly one element, so {1}
cannot generate. There are other categories which we would like to call concrete but in which there are
few maps 1→ X, or which don't have a terminal object at all. What ``concrete'' really means is that ``the
action you first thought of'' is faithful, so neither definition is particularly satisfactory. However, there is
no real need for any of this terminology. See also Exercise 4.12.
Unique up to unique isomorphism The account of the theory of descriptions which we gave in
Section 1.2 was more liberal than Russell's, allowing interchangeable objects to share the description.
This is the nature of universal properties.
THEOREM 4.5.6 The terminal object, if it exists, is unique up to unique isomorphism. Conversely, any
object which is isomorphic to a terminal object is itself terminal.
PROOF: Let \typeT1 and \typeT2 be terminal objects. Then putting \typeT1 for 1 and \typeT2 for Γ, there is a unique map u:\typeT2→ \typeT1, and with the opposite assignment a unique map v:\typeT1→ \typeT2.
To show that these are mutually inverse, consider \id\typeT1 and v;u. These are both maps \typeT1→ \typeT1, but casting \typeT1 in both roles (as 1 and as Γ), there is only one such map and so the two candidates for it must be the same, cf Lemma 1.2.11. So v;u = \id\typeT1 and similarly u;v = \id\typeT2.
Finally, let \typeT1 be terminal and suppose u and v form an isomorphism with \typeT2. Then for any object Γ, the unique map Γ→ \typeT1 extends by v to a map Γ→ \typeT2; and if a,b:Γ ⇉ \typeT2 then a;u = b;u since \typeT1 is terminal, whence a = a;(u;v) = b;(u;v) = b because u;v = \id\typeT2. []
Notice how the uniqueness of maps to 1 enabled us to deduce the equality of \id and u;v and hence
transfer known properties from one to the other. Mere existence of morphisms between objects is
nothing like as useful.
Products The cartesian product can also be described in this way. It is what we use in type theory to
describe functions of two variables.
DEFINITION 4.5.7 Let P, X and Y be objects of a category C. Then (π0:P→ X, π1:P→ Y) is a product if for every object Γ ∈ obC and pair of morphisms (a:Γ→ X, b:Γ→ Y) in C there is a unique mediating map f:Γ→ P such that π0∘f = a and π1∘f = b. The product is written X× Y, and the mediator f, which is called the pair, is written ⟨a,b⟩.
Like the terminal object, the product together with the projections, π0 and π1, is unique up to unique isomorphism. Conversely, any isomorphic object u:Q ≡ X× Y may be given the structure of the product, with projections u;π0 and u;π1 and pairs u⁻¹;⟨a,b⟩.
The π notation signifies selection ( 4.5.8(f)), but we often want to specify the omission of a component
(4.5.8(i)); for this we use the hat notation ([^(x)]) and the triangle arrowheads. Beware that the latter
indicate a type-theoretic interpretation and not an intrinsic property such as surjectivity.
EXAMPLES 4.5.8
(a)
A product in a poset or preorder is a meet (Definition 3.2.4(b), Proposition 3.2.11) and in
particular is conjunction for provability of formulae (Definition 1.4.2).
(b)
The product in Set is the cartesian product (Remark 2.2.2) with its usual projection functions.
Given a:Γ→ X and b:Γ→ Y, the mediator is ⟨a,b⟩: \vec{z} ↦ ⟨a(\vec{z}), b(\vec{z})⟩, where Γ = [\vec{z}:\vec{Z}].
(c)
The componentwise operations on the cartesian product give the categorical product in SLat,
Lat, DLat, HSL, Heyt, BA, Frm and Gp.
(d)
The componentwise order gives the product in Pos, Preord, SLat, CSLat, Dcpo and IPO by
Propositions 3.5.1 and 3.5.2.
(e)
This is also the componentwise order in Lat, DLat, HSL, Heyt and Frm by Proposition 3.2.11.
(f)
A record consists of data assignments t(i) ∈ \typeX_i to each field i ∈ I. The type of records whose fields are of given types is the product of those types. A selector is a component projection π_i.
(g)
Let C be the monoid of primitive recursive functions, qua category with one object called N. Then N×N ≡ N in C.
(h)
Remark 2.2.2 gave the type-theoretic rules for pairing,
omitted prooftree
environment
though this only defines the product for two types, not for objects of Cn×_L, which are contexts.
(i)
For two disjoint contexts [\vec{x}:\vec{X}] and [\vec{y}:\vec{Y}], the product [\vec{x}:\vec{X}]×[\vec{y}:\vec{Y}] is their concatenation [\vec{x}:\vec{X}, \vec{y}:\vec{Y}], and the projections π0 and π1 are the sub-sequences of variables [\vec{x}: = \vec{x}] and [\vec{y}: = \vec{y}]. The pair ⟨[\vec{x}: = \vec{a}], [\vec{y}: = \vec{b}]⟩ is also the concatenation [\vec{x}: = \vec{a}, \vec{y}: = \vec{b}] (Corollary 4.3.13(d)).
REMARK 4.5.9 The product X× Y in an abstract category C and the cartesian product of hom-sets in Set are related by the bijection C(Γ, X× Y) ≅ C(Γ,X) × C(Γ,Y).
This is a bijection ⟨a,b⟩ ↔ f because (cf Corollary 4.3.13(d))
π0∘⟨a,b⟩ = a,  π1∘⟨a,b⟩ = b,  ⟨π0∘f, π1∘f⟩ = f.
Notice that 1 has just one generalised element Γ→ 1, whilst the set of elements of X× Y is the cartesian
product of those of X and of Y.
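The bijection can be exhibited concretely in Set for finite sets: pairing and the projections satisfy the three equations, and the hom-sets have matching cardinalities. A sketch with helper names of our own:

```python
from itertools import product

def pair(a, b):
    """The mediator <a,b> : Gamma -> X x Y for a : Gamma -> X, b : Gamma -> Y."""
    return {g: (a[g], b[g]) for g in a}

pi0 = lambda xy: xy[0]
pi1 = lambda xy: xy[1]

Gamma, X, Y = (0, 1), ('x0', 'x1', 'x2'), ('y0', 'y1')
homs = lambda dom, cod: [dict(zip(dom, vs)) for vs in product(cod, repeat=len(dom))]

# the three equations relating pairing and projections
for a in homs(Gamma, X):
    for b in homs(Gamma, Y):
        f = pair(a, b)
        assert {g: pi0(f[g]) for g in Gamma} == a   # pi0 . <a,b> = a
        assert {g: pi1(f[g]) for g in Gamma} == b   # pi1 . <a,b> = b

# counting: |C(Gamma, X x Y)| = |C(Gamma, X)| * |C(Gamma, Y)|
XY = [(x, y) for x in X for y in Y]
assert len(homs(Gamma, XY)) == len(homs(Gamma, X)) * len(homs(Gamma, Y))
```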
Preservation and creation of products There is no need to make a choice of products to define what it means for a functor to preserve them: if one product P is preserved then so is any other Q ≡ P.
DEFINITION 4.5.10 Say that a functor U:C→ D
(a)
preserves the product of X and Y if, whenever X← P→ Y obeys the universal property defining a product cone in C, then so does UX← UP→ UY, but in D (cf Definition 3.2.6 for posets);
(b)
preserves this product on the nose if choices of product cones have been made in both categories
and U takes one choice to the other;
(c)
creates the product if, whenever UX← Q→ UY is a product cone in D, there is a unique cone X← P→ Y in C over it, and moreover this cone is a product in C.
So if U is a forgetful functor there is a unique structure that can be put on the product of the underlying sets and is consistent with the structure of the given objects.
EXAMPLES 4.5.11
(a)
The forgetful functor Lat→ Set (or Mod(L)→ Set for any single-sorted algebraic theory L)
creates products: once we have found the product of the carriers in a diagram of lattices, there is
only one structure such that the projections are homomorphisms.
(b)
A functor creates unary products ( sic) iff it reflects invertibility, cf Definition 4.4.8(e).
(c)
The forgetful functor Sp→ Set preserves products, but it does not create them: apart from the
Tychonoff topology (which gives the categorical product), the projections are also continuous
when the product of the underlying sets is given the discrete topology. []
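Creation of products for Mon → Set can be illustrated by checking that the componentwise operation on Z2 × Z3 is a monoid structure for which both projections are homomorphisms (it is in fact the only one). A small check of ours:

```python
from itertools import product as cartesian

add2 = lambda a, b: (a + b) % 2   # Z2
add3 = lambda a, b: (a + b) % 3   # Z3

def componentwise(op1, op2):
    """The product structure: the one making both projections homomorphisms."""
    return lambda p, q: (op1(p[0], q[0]), op2(p[1], q[1]))

P = list(cartesian(range(2), range(3)))   # carrier of Z2 x Z3
op = componentwise(add2, add3)
e = (0, 0)

for p in P:
    assert op(e, p) == p == op(p, e)                     # unit laws
    for q in P:
        assert op(p, q)[0] == add2(p[0], q[0])           # pi0 is a homomorphism
        assert op(p, q)[1] == add3(p[1], q[1])           # pi1 is a homomorphism
        for r in P:
            assert op(op(p, q), r) == op(p, op(q, r))    # associativity
```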
Using the existence of products To say that a category C ``has products'' is a statement of the form ∀X,Y ∈ obC. ∃(π0:P→ X, π1:P→ Y). (the universal property of Definition 4.5.7).
REMARK 4.5.12 In particular, whenever we have two particular objects A and B, we may (instantiate the
universal quantifiers and) suppose that some product cone A← P→ B is at our disposal. The way that we
do this is just the same as the idiom for (∃E ) in any other circumstances (Remark 1.6.5): it is a formal
property of the existential quantifier that we may invent a name for such a cone and continue to use it for
as long as the given A and B remain in scope. Nothing in this procedure assumes a global assignment of
products to all pairs of objects in the category.
Suppose, on the other hand, that an object P has already been introduced by some other means, and that
we can show that it possesses the universal property needed to be a product. Suppose also that no other
name has so far been given to any product of X and Y. Then, since being a product is a description up to
isomorphism (Lemma 1.2.11, Theorem 4.5.6), we may write P = X×Y, electing P as the specified product, instead of P ≡ X×Y.
Usually in category theory, as throughout mathematics in action, the quantifiers do their entrances and
exits unannounced. The process of inventing names for witnesses for existential statements may be
repeated any finite number of times (and that number may be unspecified: we may use products of pairs
of objects from a list of indeterminate length, Exercise 2.43). In a sense it may even be done infinitely:
if, by methods such as in Chapter VIII, we have some way of collecting an ``indexed family'' as one
object then the uniqueness feature of universal properties means that one application of a product
construction suffices to provide it uniformly for all members of the family.
The point at which such methods break down is where we want to make some construction (usually a
functor) globally throughout the category. There are three ways to proceed. Classically, the axiom of
choice selects products once and for all. Logically, such constructions may be regarded as schemes : to
be instantiated to each situation as required. The third way is to replace the category itself with an
equivalent (interchangeable) one on which products are indeed defined globally. We shall show how to
do this in Section 7.6, without using the axiom of choice.
Universal properties give functors We see why it is important for mediators to be unique when we
extend universal constructions to maps.
PROPOSITION 4.5.13 Let U be an object of a category C for which all products X×U for X ∈ obC exist and are specified. Then there is a unique functor C→ C whose effect on objects is X↦ X×U and such that (f×U);π0 = π0;f and (f×U);π1 = π1. It is called (−)×U.
PROOF: The equations amount to one side of the commutative diagram. Clearly the universal property of products provides a unique fill-in, but it still remains for us to check that this is a functor. In the case f = \id_X, the identity \id_{X×U} is such a fill-in, and so by uniqueness this must be it: (\id_X×U) = \id_{X×U}.
Similarly, ignoring Y×U, there is a unique fill-in (f;g)×U:X×U→ Z×U, but (f×U);(g×U) also makes the diagram commute, and so by uniqueness they are equal. []
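The functor (−)×U on Set and its two functoriality equations can be checked pointwise. A sketch, with f and g hypothetical maps of our own choosing:

```python
def times_U(f):
    """f x U : X x U -> Y x U, acting as f on the first component."""
    return lambda p: (f(p[0]), p[1])

U, X = ['u0', 'u1'], [0, 1, 2]
f = lambda x: x + 1            # hypothetical map X -> Y
g = lambda y: 2 * y            # hypothetical map Y -> Z

for x in X:
    for u in U:
        assert times_U(lambda z: z)((x, u)) == (x, u)       # (id x U) = id
        assert (times_U(lambda z: g(f(z)))((x, u))
                == times_U(g)(times_U(f)((x, u))))          # (f;g) x U = (f x U);(g x U)
```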
There is similarly a functor ×:C×C→ C of two arguments which yields the product of an arbitrary pair of objects or morphisms.
The terminal object and binary product are the nullary and binary cases of an n-ary connective on
objects, but as usual they suffice.
PROPOSITION 4.5.14 If a category has a terminal object and a product for every pair of objects then it has
a product for every finite list of objects. []
REMARK 4.5.15 The cartesian products X×(Y×Z) and (X×Y)×Z of sets are not equal: their typical elements are ⟨x,⟨y,z⟩⟩ and ⟨⟨x,y⟩,z⟩ respectively. Although what these products are is different, what they do is the same: they both satisfy the universal property of a product of three objects, and so they must be uniquely isomorphic. That is why definition up to isomorphism is the (accepted) norm, and definition up to equality is meaningless.
Set has a canonical way of assigning binary products, but we have just seen that this is not associative. By convention, we shall take the left-associated product (⋯(((1×\typeX1)×\typeX2)×\typeX3)×⋯×\typeX_{n−1})×\typeX_n, but notice that this depends on a notion of atomic type. We use it in Remark 4.6.5 to define [[Γ, x:X]] = [[Γ]]×[[X]] inductively and hence to interpret expressions in algebras and programming languages.
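The inductive clause can be sketched with sets as the semantics of sorts: a context then denotes a left-associated product, here a set of nested pairs. The dictionary of sorts is our own example:

```python
def interpret(context, sorts):
    """Left-associated product (...((1 x X1) x X2) ... x Xn) as nested pairs."""
    carrier = [()]                      # [[ ]] = 1, the one-element set
    for _, sort in context:
        carrier = [(g, x) for g in carrier for x in sorts[sort]]
    return carrier

sorts = {'N': [0, 1, 2], 'B': [True, False]}
ctx = [('n', 'N'), ('b', 'B'), ('m', 'N')]

# |[[Gamma]]| is the product of the cardinalities of the sorts
assert len(interpret(ctx, sorts)) == 3 * 2 * 3
# elements are left-associated: (((1-element, n), b), m)
assert interpret(ctx, sorts)[0] == ((((), 0), True), 0)
```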
LEMMA 4.5.16 Let X,Y ∈ obC. Then there is a category, which we call C↓ X,Y, whose objects are spans, ie pairs of C-maps X a← U b→ Y, and whose morphisms from (X a← U b→ Y) to (X a′← U′ b′→ Y) are C-maps f:U→ U′ making the two triangles below commute.
The terminal object (if it exists) of this category is the product cone for X and Y in C.
PROOF: As with the ``concrete'' categories of Proposition 4.1.4, to show that C↓ X,Y is a category we just have to verify that (the identity \id_U is a morphism and) composition preserves the defining property. This is easy, but note that it uses the associativity law in C:
Mainstream mathematicians tend to view this as a rather abstruse way of saying something quite simple,
but the method is like programming with abstract data types. The complexity of the constructions (in this
case the commutativity of the triangles in the above diagram) is hidden from the user and later
applications work in an apparently effortless way, but at a certain price. The components (modules) must
be fully equipped with their ancillary operations and specifications: in this case the product projections
and the associativity law of the given category, which are needed to make the definitions of respectively
the objects and maps meaningful. What seems to be secondary, even optional, structure in the data turns
out to be primary and necessary structure in the derived constructions; conversely ``trivial'' features of
the derived construction unravel into something non-trivial in the given presentation.
In order to see this ancillary structure in action, you should demonstrate in detail that the product of two
objects (if it exists) in a category is unique up to unique isomorphism, both directly in terms of the
definition and indirectly via the category C↓ X,Y.
The type disciplines of category theory and the abstract data type idiom in programming often ensure
that the ancillary structure is complete in the following sense: When the Real Mathematician - and this
applies equally to the Real Programmer - is up to his elbows in the grime of a difficult procedure, he
frequently needs to prove (or to leave to the reader) lemmas and sub-lemmas (respectively to program
sub-routines or perform in-line hacks) which import the notation of the immediate application but in fact
only reproduce in a multitude of special cases that ancillary structure which he had disregarded as trivial.
By contrast, the categorist begins by applying the parsimonious categorical tools which she has learned
to trust. She then approaches applications with the calm self-confidence of one who knows that she is
prepared for the occasion.
[diagram: formal languages —(interpretation functor [[−]])→ semantics with universal properties]
An algebraic theory L consists of
(a)
base types or sorts, X (there is as yet no need for a product type constructor, as many-variable
contexts can handle products for us); we write Σ for the set of sorts;
(b)
an inexhaustible collection of variables x_i:X of each sort;
(c)
operation-symbols, \typeX1,…,\typeX_k ⊢ r:Y, each of which has an arity, ie a list of input sorts;
(d)
laws between terms (it is a free theory if there are none).
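Such a theory is plain data: sorts, operation-symbols with arities, and laws as pairs of terms. As an illustration (our own encoding, not the book's notation), the theory of monoids:

```python
# An algebraic theory as a dictionary: sorts, operation-symbols with their
# arities (a list of input sorts and an output sort), and laws as pairs of
# terms. The terms here are plain strings, for illustration only.
monoid_theory = {
    'sorts': ['M'],
    'operations': {
        'e': ([], 'M'),          # constant: its arity is the empty list
        'm': (['M', 'M'], 'M'),  # binary multiplication
    },
    'laws': [
        ('m(e, x)', 'x'),                     # left unit
        ('m(x, e)', 'x'),                     # right unit
        ('m(m(x, y), z)', 'm(x, m(y, z))'),   # associativity
    ],
}

# a free theory is one with no laws, eg Bourbaki's magma
magma_theory = {**monoid_theory, 'laws': []}
assert magma_theory['laws'] == []
assert monoid_theory['operations']['m'][0] == ['M', 'M']
```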
An algebra A for L in a category C with finite products consists of
(a)
an object \typeA_X of C for each sort X, and
(b)
a multiplication table for each operation-symbol r, ie a map \opr_A:\typeA_{\typeX1}×···×\typeA_{\typeX_k}→ \typeA_Y,
such that the polygons (such as those in Example 4.6.3(f)) which express the laws commute. A homomorphism u:A→ B is a family of maps u_X:\typeA_X→ \typeB_X between the corresponding objects which preserves each operation r in the sense that the evident squares commute.
Note that we have already used the product functor (Proposition 4.5.13) in this definition, so we would
now be stuck if we hadn't insisted on uniqueness of the mediator in the definition of product.
Algebras and homomorphisms in C form a category, called \Mod_C(L); the subscript is omitted if C = Set is understood.
For each sort there is a forgetful functor \Mod_C(L)→ C, but it is usually not faithful: if it were, this one sort would suffice. There is a faithful functor to C^Σ reflecting invertibility, where Σ is the set of sorts.
Examples It is clearer to fix the meaning of these widely applicable definitions by means of familiar
examples than by formality. But note that, at present, we intend each of the sorts, operation-symbols and
laws to be named concretely. Typical examples of theories in mathematics have one or two sorts, half a
dozen operation-symbols (of arity zero, one, two, or occasionally three) and a dozen laws.
In Chapter VI we shall develop internal theories, whose linguistic classes may themselves be objects of
the model, and there may be arbitrarily many sorts and operation-symbols. The arities will be allowed to
be ``infinite,'' ie again objects of the model, but there will be no laws. As we shall see in Section 5.6, this
is because, for laws to behave as intended, each operation must have a finitely enumerated family of
arguments. Nevertheless, when there are laws, there may be as many as we please.
EXAMPLES 4.6.3
(a)
An internal algebra in Set is an algebra in the ordinary sense.
(b)
The propositional form of an algebraic theory is a Horn theory, ie a system of finitary closure
conditions \triangleright on the set Σ of sorts (Sections 1.7, 3.7 and 3.9). A model in Ω is an
assignment of a truth-value to each element of Σ, ie a subset A ⊂ Σ, which is \triangleright -
closed. A homomorphism is simply a containment A ⊂ B .
(c)
Natural numbers (N), lists and trees are described by free theories (Sections 2.7 and 6.1 ). Zero,
the empty list and the leaves of trees are nullary operations or constants. Successor and adding an
item to a list are unary operations, and the node-types of a tree are operations of various arities.
Bourbaki used the word magma for an algebra with one binary operation and no laws, ie the
theory whose free algebra consists of binary trees. Remark 2.7.12 gave the continuation rule for
lists in terms of homomorphisms.
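The continuation rule of Remark 2.7.12 says that a map out of the free algebra of lists is the unique homomorphism determined by where the empty list and cons are sent, ie a fold. A sketch of ours:

```python
def fold(nil, cons, xs):
    """The unique homomorphism out of the free algebra List(A)."""
    result = nil
    for x in reversed(xs):
        result = cons(x, result)
    return result

# length and sum both arise by choosing interpretations of nil and cons
length = lambda xs: fold(0, lambda x, acc: 1 + acc, xs)
total = lambda xs: fold(0, lambda x, acc: x + acc, xs)

assert length([7, 8, 9]) == 3
assert total([7, 8, 9]) == 24
```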
(d)
The abstract syntax of a context-free language is the same thing as a free theory. In the design
of programming languages the sorts are unfortunately called syntactic categories, eg \bnfname
program, \bnfname term, \bnfname number . The keywords of the language are the operation-
symbols, but these needn't be represented graphically. For example, in the grammar (omitted eqnarray* environment) 0,…,9 are constants and +,− are unary operations, but there are also an invisible binary operation \bnfname number×\bnfname digit→ \bnfname number and two unary ones \bnfname digit↪ \bnfname number↪ \bnfname integer. The variables for each
sort are called meta-variables, since \bnfname variable may itself be one of the sorts. We shall
consider the meta-language of type theory (variables, terms, types, contexts, substitutions) in
Section 6.2.
(e)
The specification for a program module consists of data-sorts, operations and laws; its
implementations are algebras.
(f)
An internal monoid in a category C with finite products is an object M ∈ obC together with unit
and multiplication maps,
e:1→ M and m:MxM→
M
such that the following diagrams commute:
(g)
Internal groups in the categories of topological spaces, manifolds and algebraic varieties are
topological, Lie and algebraic groups.
(h)
A ring together with a module is an example of a two-sorted algebra.
(i)
The morphisms of a category with a fixed set O of objects form an algebra for a theory with O×O sorts, namely the hom-sets for each source-target pair. There are O constants (identity on each object), O³ binary operations (composition) and O²+O²+O⁴ (unit and associative) laws. When O = {∗} this is just a monoid.
(j)
A particular model of a theory may be specified by generators and relations, ie as the free
algebra for the theory augmented by constants and laws. List({a,b,c})/(ab ∼ ba), for example,
specifies the monoid with three generators, two of which commute (Section 7.4).
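This quotient monoid can be presented by normal forms: sort each maximal run of letters from {a,b}, treating c as a separator. A sketch of ours; equality of normal forms decides equality in List({a,b,c})/(ab ∼ ba):

```python
def normalize(word):
    """Normal form: each maximal run over {a,b} is sorted; c breaks runs."""
    out, run = [], []
    for ch in word:
        if ch in 'ab':
            run.append(ch)
        else:
            out += sorted(run) + [ch]
            run = []
    return ''.join(out + sorted(run))

def mult(u, v):
    """Multiplication in the quotient: concatenate, then normalise."""
    return normalize(u + v)

assert mult('a', 'b') == mult('b', 'a')   # the imposed law ab ~ ba
assert mult('a', 'c') != mult('c', 'a')   # but a and c do not commute
assert mult(mult('ba', 'c'), 'ab') == mult('ba', mult('c', 'ab'))
```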
EXAMPLES 4.6.4 The following are not algebraic theories, as there are exceptional values at which the
operations are not defined, but it is often profitable to apply algebraic ideas to them.
(a)
Lists, with head and tail, because of the empty list (Section 2.7).
(b)
Number fields, because of division by zero.
(c)
An abstract projective plane consists of two sets (of ``points'' and ``lines'') and an (``incidence'')
relation between them. Through any two distinct points a unique line passes, and any two distinct
lines intersect in a unique point (Example 3.8.15(e)).
These theories involve conditional properties (``either ... or ...''), which we shall discuss briefly in
Examples 5.5.9.
Semantics of expressions Now we shall extend Theorem 4.4.5 to algebraic theories. We have already
seen several notations for the effect of a functor, and now we shall introduce yet another: Dana Scott's
semantic brackets. These are convenient because the construction is applied to lengthy expressions; the
brackets also draw attention to the difference between syntax and semantics. When more than one model
A is under discussion, we write A[[X]], A[[Γ]] , A[[r]], A[[u]], etc .
REMARK 4.6.5 An algebra A in a category C gives the meaning of each sort X as an object \typeA_X. This extends to each context, using the left-associated product (Remark 4.5.15). A also gives the meaning of each operation-symbol \typeY1,…,\typeY_k ⊢ r:Z and each constant c:Z as a morphism \opr_A:\typeA_{\typeY1}×···×\typeA_{\typeY_k}→ \typeA_Z or \opc_A:\terminalobj_C→ \typeA_Z. These extend uniquely to the \bnfname terms in the language, for example the expression
(1/9)a² − (1/3)b,
in the ring (R,+,*) in the category Set. The operation-symbols have been marked in the diagram, along
with the product projections that show which variables and sub-expressions are the arguments. Using the
universal property of each product, there is a unique way of filling in other maps which define the sub-
expressions as (polynomial) functions of (a,b,c) ∈ R³. The last is the whole expression as a map R³→ R.
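This compositional interpretation can be imitated directly: projections select variables, each operation-symbol becomes a map, and the whole expression is a single function R³ → R. A sketch of ours for an expression such as (1/9)a² − (1/3)b:

```python
pi = lambda i: (lambda args: args[i])      # product projections select variables
const = lambda v: (lambda args: v)         # constants are maps 1 -> R
mul = lambda f, g: (lambda args: f(args) * g(args))
sub = lambda f, g: (lambda args: f(args) - g(args))

a, b = pi(0), pi(1)                        # c = pi(2) is discarded
expr = sub(mul(const(1 / 9), mul(a, a)), mul(const(1 / 3), b))

# the whole expression is one map R^3 -> R, built by composition
assert expr((3, 3, 99)) == (1 / 9) * 9 - (1 / 3) * 3
assert expr((0, 3, 0)) == -1.0
```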
To show that this is well defined we must show that the laws of the sketch (Remark 4.3.8) and of the
theory L are respected. Those involving discard are easy. The Substitution Lemma itself holds because
we used Remark 4.3.10(a) to give the interpretation of terms. Terms a and b which are shown to be
equal by a single application of a law a′ = b′ in L correspond to programs
EXAMPLE 4.6.6 YACC generates a parser in C from a context-free grammar (Example 4.6.3(d)). The rule states the syntax of the token + in the program text, and defines its semantics to be the C subroutine addition.
Notice that the variables disappear in passing from syntax to semantics, so when we go the other way we
must make an arbitrary choice of them ( cf Remark 4.3.14).
The classifying category The notions of functor and product exactly capture algebraic theories and
their models. The single-sorted version is due to Bill Lawvere (1963, Exercise 4.29).
THEOREM 4.6.7
(a)
Cn×_L has a choice of finite products, and a (``generic'') model of L.
(b)
Let C be another category with a choice of finite products and a model of L. Then the functor [[−]]:Cn×_L→ C preserves (the choice of) finite products and the model, and is the unique such functor.
(c)
Any functor Cn×_L→ C which preserves finite products also preserves the L-model.
PROOF: Recall that the objects of Cn×_L are contexts and the morphisms are programs or substitutions.
(a)
Products are given by concatenation of contexts (Example 4.5.8(i) and its footnote). The model is as follows:
❍ The base type X of L is interpreted as a single-variable context [x:X] ∈ ob Cn×_L.
(b)
The interpretation is given by Remark 4.6.5.
(c)
The sorts and operation-symbols of the model are given by the effect of the functor, essentially as in part (a). []
Theorem 3.9.1 found the classifying semilattice Cn^∧_▷ for a Horn theory, using induction on closure conditions to show that [[−]] is monotone, and Exercise 3.36 gave the saturated form of any closure condition. Now Cn×_L is the saturation as an algebraic theory, with all possible derived operations of every finite arity, modulo all provable equalities between them (cf regarding expressions as operation-symbols in Remark 1.1.2). Hence the term clone, which was introduced by Philip Hall in 1963.
The model is generic in the category Cn×_L in the same way that the value x0 was generic in a proof box
in Remark 1.5.6: inside this world it may be treated as an ordinary object, and there are special rules for
importing and exporting it. Indeed, just as in Remark 1.5.10, given any such object in the outside world,
there is a β-reduction of the generic object which reproduces it, namely the functor [[-]].
COROLLARY 4.6.8 Let φ be a formula involving equations between terms of L, conjunction and universal quantification, and let Γ be some possibly infinite set of such formulae. We write Γ ⊨_L φ if every L-algebra satisfying all of the formulae in Γ also satisfies φ. Then Γ ⊨_L φ iff Γ ⊢_L φ.
PROOF: Without loss of generality φ is a single equation and we may strip conjunctions and universal quantification from Γ. Let L′ be the theory L together with the equations in Γ as additional laws. Then an ``L-algebra satisfying all of the formulae in Γ'' is just an L′-algebra. If Γ ⊨_L φ then in particular Cn×_{L′} satisfies φ, but this model satisfies exactly those equations that are provable from L′, so Γ ⊢_L φ. []
We shall give a generalisation of the universal property of products in Section 5.1. In the corresponding
wider notion of algebraic theory, which includes categories themselves as an example, operations may
be defined or laws imposed only when certain conditions hold. Remark 5.2.9 and Chapter VIII set out
two ways of formulating such theories.
In this chapter, we already have a technique which brings both parties together on the same categorical
platform. Then the ways in which they each express the same essential features can be compared
directly.
The syntactic category was constructed in Section 4.3. Its objects are lists of typed variables and its morphisms are lists of terms, where we left the notions of type and term undefined. In Section 4.6, these meant sorts (base types) and algebraic expressions respectively. Now we shall allow the types to be expressions built up from some given sorts using the binary connective →, and the terms to be λ-expressions.
We already have the notion of composition: it is given by substitution, as before, and is not something
new involving λ-abstraction, which we do not yet understand. The Normal Form Theorem 4.3.9 for
substitutions still holds - not to be confused with that for the λ-calculus, Fact 2.3.3. The category has
specified products, given by concatenation.
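This raw syntax can be sketched as datatypes, with composition given by simultaneous substitution. The following is a minimal illustration of ours, not the book's formal definition; the names `Ty`, `Tm`, `subst` and `compose` are our own, and capture-avoidance is handled only by shadowing (adequate for the closed examples below):

```haskell
-- Types: base sorts and the binary connective ->
data Ty = Base String | Arr Ty Ty
  deriving (Eq, Show)

-- Raw lambda-terms over named variables
data Tm = Var String | App Tm Tm | Lam String Ty Tm
  deriving (Eq, Show)

-- A context is a list of typed variables; a morphism into
-- [x1:X1, ..., xn:Xn] is a list of n terms, as in Section 4.3.
type Ctx   = [(String, Ty)]
type Subst = [(String, Tm)]

-- Simultaneous substitution: the composition of the category.
subst :: Subst -> Tm -> Tm
subst s (Var x)     = maybe (Var x) id (lookup x s)
subst s (App t u)   = App (subst s t) (subst s u)
subst s (Lam x a t) = Lam x a (subst (filter ((/= x) . fst) s) t)

-- Composite of two substitutions, read in diagrammatic order.
compose :: Subst -> Subst -> Subst
compose s1 s2 = [ (x, subst s1 t) | (x, t) <- s2 ]
```

For instance, composing the substitution [w := f y] with [y := z] yields [w := f z], which is the Normal Form Theorem's account of composition at work.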
As the raw syntax is now a category, we can already ask about any universal properties it might have.
We begin by defining the \bnfname terms to be αδ-equivalence classes, ie taking account of any
algebraic laws between operation-symbols in the language, but not the βη-rules. These take the form of
equations between morphisms of the category, and we shall argue from them towards cartesian closure.
The technique is a generic one, and will be applied to binary sums, dependent sums and dependent
products in Sections 5.3ff, 9.3 and 9.4 respectively.
It is easy to be fooled by syntactic treatments into thinking that for a type to be called [X→ Y] is necessary and sufficient for it to behave as a function-type. Our development (here and in Chapter IX) is based on how this type is used (application, abstraction and the βη-rules): any type-expression or semantic object is a priori eligible for the role.
\ev_{X,Y} = [f:[X→ Y], x:X → [y := fx]: Y].
The raw calculus Abstraction, which is what the function-type is about, is much more interesting.
In Sections 1.5 and 2.3 sequent rules were needed: λ is not an operation on terms but a meta-operation on terms-in-context. It defines a bijection,
omitted prooftree environment
as in Definition 2.3.7, so it does the same to their interpretations.
So first we shall concentrate on the meaning of the new operation λ in the raw calculus, and in particular
on its invariance under substitution, adding the β- and η-rules later. With or without these rules,
substitution remains the notion of composition, and we shall refer to the categories composed of λ-terms
and of raw λ-terms respectively.
DEFINITION 4.7.2 Let C be a category with a specified terminal object and for which each pair of objects has a specified product together with its projections. Then a raw cartesian closed structure assigns to each pair of objects X and Y
(a)
an object of C, called the function-type or -space, [X→ Y],
(b)
a C-morphism, called application, \ev_{X,Y}: [X→ Y]x X→ Y, and
(c)
for each object Γ, a function λ_{Γ,X,Y}: C(Γx X,Y)→ C(Γ,[X→ Y]) between hom-sets called abstraction, natural in Γ, ie
u;λ_{∆,X,Y}(p) = λ_{Γ,X,Y}((ux\id_X);p)
for each u:Γ→ ∆ and p:∆x X→ Y. omitted diagram environment
In defining products we didn't reserve any special treatment for base (atomic) types - nor do we here. In
the semantic case they are not special anyway, but in the syntactic category an object is a list of types.
We use Currying (Convention 2.3.2) to exponentiate by a list.
(a)
In Set, the specified products are cartesian ones, the function-space Y^X (Example 2.1.4) serves as [X→ Y] and application provides \ev_{X,Y}. The abstraction λ_{Γ,X,Y} takes the function p:Γx X→ Y of two variables to the function (of one variable σ:Γ) whose value is the function x→ p(σ,x). The naturality law is clearly valid.
(b)
Any Heyting semilattice (Definition 3.6.15); the interpretation of the language is that of propositions as types (Remark 2.4.3). The rules (⇒I) and (⇒E) correspond to abstraction and evaluation, and nothing need be said about naturality.
(c)
We write Cn×_L+λ for the category composed of raw λ-terms, given by Definition 4.3.11, in which \vec{f}, \vec{p} and \vec{y} are new variables and omitted eqnarray* environment The expressions in \vec{f}, \vec{p} and \vec{y} must be read as ``for each j, ...'' (which explains the bracket conventions). Naturality with respect to u = \hat{z} and [z := c] is the substitution rule in Definition 2.3.7. By the Normal Form Theorem 4.3.9, any substitution u may be expressed as a composite of these two special cases, from which the naturality law above follows.
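In Set (part (a) above), abstraction is ordinary Currying, and the naturality law says that λ commutes with precomposition in the Γ argument. The following Haskell check is our own illustration; the names `lam`, `cross`, `p` and `u` are ours:

```haskell
-- Evaluation and abstraction for the cartesian closed structure of Set,
-- with Haskell functions standing in for set-theoretic ones.
ev :: (x -> y, x) -> y
ev (f, x) = f x

lam :: ((g, x) -> y) -> g -> (x -> y)
lam p sigma = \x -> p (sigma, x)

-- The product of two maps, u x v.
cross :: (g -> d) -> (x -> x') -> (g, x) -> (d, x')
cross u v (a, b) = (u a, v b)

-- Sample data: p : Delta x X -> Y and a reindexing u : Gamma -> Delta.
p :: (Int, Int) -> Int
p (sigma, x) = sigma * 10 + x

u :: Bool -> Int
u b = if b then 1 else 2

-- Naturality: lam p . u  ==  lam (p . cross u id), checked pointwise.
lhs, rhs :: Bool -> Int -> Int
lhs = lam p . u
rhs = lam (p . cross u id)
```

Both sides send a stage of definition in Γ to the same element of the function-space, which is the content of the substitution-invariance of λ.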
Interpretation The raw λ-calculus extends the language of algebra by some new types [X→ Y] and operation-symbols \ev_{X,Y}. As yet these have no special significance, and they can be handled as if they were algebra: hence the notation Cn×_L+λ (in contrast to Cn→_L below).
REMARK 4.7.4 Let C be a category together with a raw cartesian closed structure, in which the base types, constants and operation-symbols of L have an assigned meaning. Then the language L+λ has a unique interpretation, and this defines a functor [[-]]: Cn×_L+λ → C.
(a)
the base types are given to be certain objects;
(b)
the function-types are those of the raw cartesian closed structure;
(c)
from these the contexts are (specified) products;
(d)
the variables and operation-symbols (including evaluation, ev) are treated as in algebra, and the
laws are satisfied;
(e)
the last clause of the raw cartesian closed structure says how to perform λ-abstraction;
(f)
the morphisms are treated as in the direct declarative language.
By Theorem 4.6.7 this is a product-preserving functor, and by the present construction it also preserves
the new structure. []
The β- and η-rules We have constructed Cn×_L+λ from the syntax, so it has names for its objects and morphisms. But it is also a category, so we may compare these two modes of expression, from which we shall derive a universal property.
\ev_{X,Y} o (λ_{Γ,X,Y}(p) x \id_X) = p   (β) (4.1)
λ_{[X→ Y],X,Y}(\ev_{X,Y}) = \id_{[X→ Y]}   (η) (4.2)
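Read in Set, the β- and η-rules are the familiar computation rules: uncurrying after Currying gives back p, and Currying evaluation gives the identity on the function-space. A pointwise check (our own illustration; the helper names are ours):

```haskell
-- Evaluation, abstraction and the product of maps, as in Set.
ev :: (x -> y, x) -> y
ev (f, x) = f x

lam :: ((g, x) -> y) -> g -> x -> y
lam p sigma x = p (sigma, x)

cross :: (a -> a') -> (b -> b') -> (a, b) -> (a', b')
cross f g (a, b) = (f a, g b)

-- A sample morphism p : Gamma x X -> Y.
p :: (Int, Int) -> Int
p (sigma, x) = sigma - x

-- (beta): ev composed with (lam p x id) agrees with p at each argument.
betaHolds :: (Int, Int) -> Bool
betaHolds gx = ev (cross (lam p) id gx) == p gx

-- (eta): lam ev is the identity on the function-space, checked pointwise.
etaHolds :: (Int -> Int) -> Int -> Bool
etaHolds g x = lam ev g x == g x
```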
DEFINITION 4.7.6 A cartesian closed structure on a category is a raw cartesian closed structure which
satisfies the β- and η-rules.
(a)
Set and any Heyting semilattice.
(b)
The category of contexts and λ-terms, Cn→_L. This is again given by Definition 4.3.11, but now
(c)
Using domain-theoretic techniques developed by Dana Scott, it is possible to construct a space X such that X ≡ Xx X ≡ [X→ X]. Then X is a model of the untyped λ-calculus with surjective pairing and End(X) is called a C-monoid [Koy82]. We only need to add a terminal object to get a two-object cartesian closed category {X,1}. (But splitting idempotents (Definition 1.3.12, Exercise 4.16) gives a richer category.)
Let X1, …, Xk \vdash r:Y be an operation-symbol (of type Y and arity k). Then \expx r ≡ λ\vec{x}.r(\vec{x}).
In the unary case, r(a) is an operation applied to a value. In a cartesian closed category this equals \ev(\expx r, a) = \expx r a in the sense of λ-application. The former uses composition by substitution (which we treat as the standard notion), whilst the λ-calculus provides g o f = λx.g(fx). These coincide iff the β- and η-rules hold.
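The unary coincidence can be seen directly in Set: applying an operation r to a value a agrees with λ-applying the ``name'' of r, a point of the function-space. A small illustration of ours (the names `r`, `nameR` and the two `apply` functions are ours):

```haskell
-- An operation-symbol r, say squaring, as a morphism X -> Y.
r :: Int -> Int
r x = x * x

-- Its exponential transpose: a point 1 -> [X -> Y] of the function-space.
nameR :: () -> (Int -> Int)
nameR () = r

ev :: (x -> y, x) -> y
ev (f, x) = f x

-- r applied to a value by substitution, versus ev applied to its name.
applyBySubst, applyByEv :: Int -> Int
applyBySubst a = r a
applyByEv    a = ev (nameR (), a)
```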
The notation \expx p for the exponential transposition is commonly found in category theory texts, but
it is clearly inadequate to name all of the morphisms of a cartesian closed category. So they frequently
go without a name. All too often, proofs are left to rely on verbal transformations of unlabelled
diagrams, without regard to the categorical precept that morphisms are at least as important as objects.
The λ-calculus gives the general notation we need.
PROPOSITION 4.7.10 A cartesian closed structure on a category is given exactly by the choice of a
product and a function-space for each pair of objects, together with the projections and evaluation.
PROOF: Suppose we have the structure of Definition 4.7.2 and Lemma 4.7.5. The two definitions of function-space share the same data and also the β-rule, so that λ_{Γ,X,Y}(p) serves for \expx p.
so \expx p is unique, ie the universal property holds. Conversely, naturality and the β-rule follow from
the universal property by uniqueness (as in Proposition 4.5.13). The η-rule holds as id serves for λ(ev),
and the naturality law holds because its right hand side serves for the left. []
COROLLARY 4.7.11 The exponential object [X→ Y] is unique up to unique isomorphism. It defines a
functor which is contravariant in the first or raised argument and covariant in the other.
THEOREM 4.7.12 The category Cn→_L has a cartesian closed structure and a model of the λ-calculus with base types and constants from L. Any other such interpretation in a category C is given by a unique functor [[-]]: Cn→_L → C that preserves the cartesian closed structure and the model. Conversely, any
PROOF: As in Theorem 4.4.5 and Remark 4.6.5. Remark 4.7.4 extends the interpretation to the λ-
calculus; in particular the function-types have to be preserved. By Proposition 4.7.10, these must be
exponentials. []
Cartesian closed categories of domains The category of sets and total functions is the fundamental interpretation of the typed λ-calculus, but it does not have the fixed point property (Proposition 3.3.11) needed for denotational semantics. During the 1970s and 1980s a veritable cottage-industry arose, manufacturing all kinds of domains with Scott-continuous maps, each with its own peculiar proof of cartesian closure. In fact these categories (necessarily) share the same function-space as in Dcpo: what is needed in each case is not a repetition of general theory, but the verification that the special semantic property is inherited by the function-space.
PROOF: The universal property tells us what the exponential [X→ Y] must be. Taking Γ = {∗}, it is the set of monotone functions, whilst for \ev to be monotone we must have f ≤ g ⇒ ∀x.f(x) ≤ g(x). Now consider Γ = {⊥ < ⊤}, cf Example 4.5.5(b). If f ≤ g pointwise then there is a monotone function Γx X→ Y given by ⊥,x→ f(x) and ⊤,x→ g(x). The exponential transpose is ⊥→ f and ⊤→ g, so f ≤ g as elements of the function-space.
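The two-point argument can be checked exhaustively by modelling Γ = {⊥ < ⊤} as Bool with False < True, and taking X = Y = Bool as well. A sketch of ours (the function names are ours): f ≤ g in the exponential iff f ≤ g pointwise, iff the interpolating map out of Γ x X is monotone in Γ.

```haskell
-- The poset {⊥ < ⊤} modelled as Bool with False < True.
points :: [Bool]
points = [False, True]

leq :: Bool -> Bool -> Bool
leq a b = not a || b

-- A Bool endofunction is monotone iff f False <= f True.
monotone :: (Bool -> Bool) -> Bool
monotone f = leq (f False) (f True)

-- The pointwise order on the function-space.
leqPointwise :: (Bool -> Bool) -> (Bool -> Bool) -> Bool
leqPointwise f g = and [ leq (f x) (g x) | x <- points ]

-- The interpolating map Gamma x X -> Y: ⊥,x -> f(x) and ⊤,x -> g(x).
h :: (Bool -> Bool) -> (Bool -> Bool) -> (Bool, Bool) -> Bool
h f g (gamma, x) = if gamma then g x else f x

-- It is monotone in Gamma precisely when f <= g pointwise.
monotoneInGamma :: (Bool -> Bool) -> (Bool -> Bool) -> Bool
monotoneInGamma f g =
  and [ leq (h f g (False, x)) (h f g (True, x)) | x <- points ]
```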
(a)
(b)
π0, π1 and ev are well defined morphisms;
(c)
-,- and λ take morphisms to morphisms;
(d)
pairing, naturality and the β- and η-rules are satisfied.
The first three parts were proved in Propositions 3.5.1 and 3.5.5, but it is the notion of cartesian closed
category which makes sense of the collection of facts in Section 3.5. The laws in part (d) are inherited
from the underlying sets and functions. []
The result for Scott-continuous functions (redefining [X→ Y]) is proved in the same way.
PROOF: For similar reasons, [X→ Y] must be the set of Scott-continuous functions with the pointwise order. Propositions 3.5.2 and 3.5.10 gave the details, based on a discussion of pointwise joins, and in particular Corollary 3.5.13 about joint continuity of \ev. []
Algebraic lattices, boundedly complete posets, L-domains and numerous other structures form cartesian
closed categories with Scott-continuous functions as their morphisms. The issue of making ev preserve
structure jointly in its two arguments may be resolved in a different way, as Exercise 4.51 shows.
At the end of the next section we shall show that categories themselves may be considered as domains
and form a cartesian closed category. First we need to introduce the things which will be the morphisms
of the exponential category; this turns out to be the abstract notion which we needed for substitution-
invariance of λ. Section 7.6 returns to the relationship between syntax and semantics, bringing the term
model into the picture. Function-spaces for dependent types are the subject of Section 9.4.
One reason is the problem of ``size'' mentioned in Remark 4.1.8, but there is also an algebraic one. As
we have said, mathematical constructions generally define objects only up to isomorphism, because
frequently there is a different but equally useful representation which can be substituted. For example
there are two versions of the three-fold cartesian product. But once the representation is chosen, (the
elements and more generally) the morphisms have a unique construction.
Such constructions of objects are, with a few rare exceptions, always functors, albeit frequently
contravariant or even of ``mixed'' variance (Example 4.4.6(c)). In particular, an algebra (the
interpretation of a theory L) is a functor Cn×_L → Set (Theorem 4.6.7). Thus functors are often
parametric objects and so, like objects, are intrinsically defined only up to isomorphism. Whereas
morphisms of a category are in some sense isolated from one another, functors (like the objects which
are their values) have a kind of fluidity between them, given by the morphisms of the target category,
which we haven't taken into account.
Often the object X is put as a subscript (φ_X), but here we have written φX as an application to an object
of C yielding a morphism of D. This is the counterpart of using the same notation for the result of a
functor applied to a morphism as to an object (Definition 4.4.1). Indeed the naturality square is the
application of φ:F→ G to f:X→ Y. The square is not symmetrical, and occasionally we shall indicate
whether the vertical is natural with respect to the horizontal (as above) or vice versa by an `` N'' or `` Z''
in the middle.
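A standard concrete instance (our illustration, not one of the book's examples): the head-of-list operation is a natural transformation between the functors List and Maybe on Set, and the naturality square φY o Ff = Gf o φX is the familiar law relating map and fmap.

```haskell
-- A natural transformation phi : List -> Maybe, one component per object.
safeHead :: [a] -> Maybe a
safeHead []      = Nothing
safeHead (x : _) = Just x

-- The naturality square at a morphism f : X -> Y, checked pointwise:
-- applying f after taking the head equals taking the head after mapping f.
naturalAt :: Eq b => (a -> b) -> [a] -> Bool
naturalAt f xs = fmap f (safeHead xs) == safeHead (map f xs)
```

Whatever f and xs we choose, the square commutes, because safeHead is defined uniformly in the type parameter.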
EXAMPLES 4.8.2
(a)
If C and D are preorders then a natural transformation F→ G is (an instance of) the pointwise
order F ≤ G (Proposition 3.5.5).
(b)
Proposition 4.5.13 defined the effect of the product functor (-)x U by making π0:(-x U)→ id
natural between functors C→ C.
(c)
Application \ev_X: (-)^X x X→ (-) is natural for each X ∈ obC, by Definition 4.7.2(c).
(d)
Theorem 4.4.5 showed that functors A: Cn_L → Set correspond to algebras for a unary theory L; natural transformations between them correspond to homomorphisms, naturality being the compatibility condition with respect to the operation-symbols.
(e)
Natural transformations between the (product-preserving) functors A,B: Cn×_L \twoheadrightarrow Set which interpret algebraic theories correspond to homomorphisms φ:A→ B. In this case the definition of φ must be extended from base types to contexts in the same way as in the remarks after Definition 4.6.2.
(f)
Consider \H_X = C(-,X): C^op → Set as a functor (instead of the union ∪_Γ Cn_L(Γ,X) in
(g)
(h)
The abstraction operation λ_{-,X,Y}: C(-x X,Y)→ C(-,[X→ Y]) of a (raw) cartesian closed structure (Definition 4.7.2(c)) is another natural transformation between functors C^op → Set.
Natural transformations show up even when we only set out to consider categories and functors.
(a)
Their product Γx C in the category Cat of categories and functors has object class (obΓ)x(obC) and hom-sets
(Γx C)((E′,X′),(E,X)) = Γ(E′,E)x C(X′,X)
with componentwise composition, cf Proposition 3.5.1. The pairing operation and projection functions are as expected.
(b)
In particular for groups (one-object categories), the factors C and Γ are isomorphic to subgroups of the product Γx C by f→ (id,f) and h→ (h,id), and these commute.
(c)
Let P:ΓxC→ D be a functor (``of two variables'') and h:E′→ E a morphism of Γ. Then P(h,-):P(E
′,-)→ P(E,-) is a natural transformation between functors C→ D. omitted diagram environment
(d)
Suppose that the value P(E,X) of the functor on objects is given, together with P(E,f) and P(h,X), functorially in each argument separately, such that the square above commutes. Then the (``joint'') functor P:Γx C → D can be defined, cf 3.5.1(c). []
It is often the case that ``naturally defined'' constructions are functorial or natural in the formal sense by completely routine calculation. However naturality, like functoriality (Examples 4.4.6ff), does carry mathematical force, since it provides an equation, and may be the point at issue, as it is in Theorem 7.6.9.
Composition We shall use the following scheme to discuss composition of natural transformations; it
also explains the geometrical terminology.
DEFINITION 4.8.4 The vertical composite φ;ψ:F→ H is defined by (φ;ψ)X = (φX);(ψX), as in the following diagram:
The identity for this composition is defined by (\id_F)X = \id_{FX} = F\id_X, but it is often called just F. We never use (;) to compose functors. For posets, vertical composition is the transitivity of the pointwise order.
On the other hand, functors themselves apply to morphisms and hence to natural transformations, giving
K·φ and L·φ. These are natural because the functors K and L preserve commutativity of the naturality
square for φ. Natural transformations also apply to the results of the functors on the objects, to give θ·F
and θ·G, which are natural by instantiation. These are related by the square,
which commutes by naturality of θ. Note that the object X is completely passive; in fact if we replace it
by a morphism f:X→ Y we obtain the commutative cube which shows naturality of θ·φ. For posets, we
are simply applying a monotone function to the pointwise order.
LEMMA 4.8.6 The two composition operations are related by the middle four interchange law,
(θ;χ)·(φ;ψ) = (θ·φ);(χ·ψ)
as suggested by the diagram opposite. []
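Both compositions, and the interchange law itself, can be exercised on concrete functors. In the sketch below (our illustration; all names are ours) the functors are List and Maybe on Set, φ and ψ are stacked vertically between List, Maybe, List, and likewise θ and χ between Maybe, List, Maybe; the horizontal composite of β:K→ L after α:F→ G has component β at GX following K applied to α at X.

```haskell
-- phi : List -> Maybe and psi : Maybe -> List, composed vertically.
phi :: [a] -> Maybe a
phi []      = Nothing
phi (x : _) = Just x

psi :: Maybe a -> [a]
psi Nothing  = []
psi (Just x) = [x]

-- A second vertical pair theta : Maybe -> List, chi : List -> Maybe.
theta :: Maybe a -> [a]
theta = psi

chi :: [a] -> Maybe a
chi = phi

-- Vertical composites (phi;psi) : List -> List, (theta;chi) : Maybe -> Maybe.
vert1 :: [a] -> [a]
vert1 = psi . phi

vert2 :: Maybe a -> Maybe a
vert2 = chi . theta

-- LHS of the interchange law: the horizontal composite of the verticals.
lhs :: Maybe [a] -> Maybe [a]
lhs = vert2 . fmap vert1

-- RHS: the horizontal composites (theta.phi) and (chi.psi), then vertical.
horiz1 :: Maybe [a] -> [Maybe a]     -- component of theta . phi
horiz1 = theta . fmap phi

horiz2 :: [Maybe a] -> Maybe [a]     -- component of chi . psi
horiz2 = chi . fmap psi

rhs :: Maybe [a] -> Maybe [a]
rhs = horiz2 . horiz1
```

Evaluating both sides on any Maybe [a] gives the same answer, as the law demands.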
LEMMA 4.8.7 A natural transformation φ is vertically invertible iff every component φX is invertible, and
is then called a natural isomorphism. It is horizontally invertible iff also the functors which it relates are
themselves invertible. []
(a)
Pairing and λ-abstraction (Examples 4.8.2(g) and (h)) make
C(-,X)x C(-,Y) ≡ \H_{Xx Y} and C(-x X,Y) ≡ \H_{[X→ Y]}.
(b)
The double transpose (-)^{⊥⊥}: V→ V** for a finite-dimensional vector space; this was the original example considered by Saunders Mac Lane and Sammy Eilenberg (1945), in order to distinguish this from the ``unnatural'' (basis-dependent) isomorphism V ≡ V*.
(c)
Any natural transformation between functors F,G:C \twoheadrightarrow D for which either C or
D is a group, groupoid or equivalence relation.
(d)
In particular between permutation or matrix representations of a group.
Equivalences Functors are the means of exchange between categories, but since functors are themselves only defined up to isomorphism, exchange between categories is a notion of isomorphism that is further weakened by putting isomorphism for equality. In Section 7.6 we exploit the difference between strong and weak equivalences to resolve the issue of whether products, exponentials, etc in a category are structure or properties.
(a)
an isomorphism of categories if there is a functor U:A→ S such that \id_S = U·F and F·U = \id_A;
(b)
a strong equivalence of categories if the following are given together with F:
❍ the laws F(ηX);ε(FX) = \id_{FX} and η(UY);U(εY) = \id_{UY} hold;
(c)
an equivalence functor if it is full, faithful and also essentially surjective (Definition 4.4.8(g)).
(d)
As in Remark 3.6.7(d) we also say that two categories S and T are weakly equivalent if a third category A and equivalence functors F:S→ A and G:T→ A are given.
We write S ≅ A to indicate that categories are equivalent, making it clear in each context whether we
mean strongly or weakly. See Definition 3.6.7 for the preorder version and Exercise 3.26 for the need
for Choice.
We shall show in Corollary 7.2.10(c) that, with Choice, strong and weak equivalence coincide, ie any
equivalence functor F has a pseudo-inverse, but this is determined only up to unique isomorphism, not
equality. For given Y ∈ obA there may be many objects X ∈ obS with F X ≡ Y, and any such object may
itself have many automorphisms. (The reason for postponing the proof is simply to avoid repetition,
since it is technically the same as the relationship between universal properties and categorical
adjunctions.) Exercises 4.36ff explore equivalences for monoids.
Functor categories As we observed in Proposition 4.1.5ff, categories may arise as structures as well as
congregations. In particular, some of the more exotic ``domains'' in the literature [HP89, Tay89] are
categories rather than posets.
THEOREM 4.8.10 The category Cat of small categories and functors (Remark 4.1.8) has a cartesian
closed structure.
PROOF: We shall follow the four-point plan set out in Theorem 4.7.13, but Proposition 4.8.3 has already discussed the product. To compare with Proposition 3.5.5, think of a category as a ``preorder with proofs'' (Definition 4.1.6). The generalisation forces us to give notation explicitly for the proofs: φ:F→ G and f:X1→ X2 are ``the reasons why F ≤ G and x1 ≤ x2,'' and monotonicity becomes the idea of a functor.
(a)
We know that functors X→ Y and natural transformations between them form a category with
vertical composition.
(b)
Next we show that \ev:[X→ Y]x X→ Y is a functor. A map in its source category consists of a natural transformation φ:F→ G and a morphism f:X1→ X2. The naturality square (Definition 4.8.1) for φ at f corresponds to that in Proposition 3.5.5. Its diagonal defines evaluation on maps, by \ev(φ,f) = φf. A similar diagram of nine objects in four squares shows that composites are preserved.
(c)
To show that λ preserves functoriality, let P:Γx X→ Y be a functor. Then so are \expx P(U′) = P
(U′,-) and \expx P(U) = P(U,-); moreover h:U′→ U gives rise to a natural transformation between
them, \expx P(h) = P(h,-) (Example 4.8.3(c)).
(d)
Finally, naturality and the β- and η-rules must be tested on maps as well as on objects. []
REMARK 4.8.11 As Set is not a small category, the size problems have to be handled differently in the next result. In practice, the easiest way is to continue to treat functors as schemes of constructions, but the objects of Set^{C^op} also have an alternative representation by the Grothendieck construction (Proposition 9.2.7), so long as at least C remains small. For the (large) category-domains mentioned above, it is still possible to control the size of the functor categories, because the functors in question are Scott-continuous and are therefore determined by their values on ``finite'' objects as in Proposition 3.4.12. We restrict attention to those locally finitely presentable categories (Definition 6.6.14(c)) which have a set of generators in the sense of Definition 4.5.4.
The Yoneda Lemma The following theorem is the abstract result which underlies the regular (Cayley)
representation studied in Section 4.2 (and, for posets, in Sections 3.1 and 3.2). Unfortunately, the
abstract version is often all that is presented, leaving students unenlightened and, more seriously,
depriving them of a powerful technique. Section 4.3 used it to construct the category of contexts and
substitutions of a formal language, which we shall develop in Chapter VIII. Proposition 3.1.8 gave the
(a)
Let φ_Γ: C(Γ,X)→ C(Γ,Y) be any system of functions. Then φ_{(-)} is natural iff it arises by postcomposition with some map f:X→ Y as in Example 4.8.2(f), and then f is unique.
(b)
Hence \H_{(-)}: C\hookrightarrow Set^{C^op} (where \H_X ≡ C(-,X): C^op → Set) is a full and faithful functor.
(c)
For any functor F:C→ Set and object X ∈ obC,
Set^{C^op}(\H_X, F) ≡ FX
naturally. (This part is called the Yoneda Lemma itself, 1954.) omitted diagram environment
PROOF:
(a)
Put f = φ_X(\id_X), so this is uniquely determined by φ. Then by naturality with respect to u:Γ→ X, φ_Γ(u) = u;f.
(b)
Verify that post(f) is functorial in f; it is full and faithful by the previous part.
(c)
Let φ_{(-)}: \H_X → F be natural. Put a = φ_X(\id_X) ∈ FX. Then by naturality with respect to u:Γ→ X, φ_Γ(u) = Fu(a). Conversely, verify that this defines a natural transformation for any a ∈ FX. []
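Part (c) has a well-known rendering in Haskell, which we sketch as our own illustration (with C the category of types and functions, and using the covariant hom C(X,-) = (x -> -) rather than the contravariant \H_X of the theorem): natural transformations from the hom-functor to F correspond exactly to elements of FX, by the same two steps as in the proof.

```haskell
{-# LANGUAGE RankNTypes #-}

-- A natural transformation from the covariant hom (x -> -) to f:
-- one component for each result type r.
newtype Yo x f = Yo { runYo :: forall r. (x -> r) -> f r }

-- phi  |->  a = phi(id), as in part (c) of the proof.
fromYo :: Yo x f -> f x
fromYo (Yo phi) = phi id

-- a  |->  the transformation u |-> F u (a); parametricity supplies
-- the naturality that the proof asks us to verify.
toYo :: Functor f => f x -> Yo x f
toYo a = Yo (\u -> fmap u a)
```

The two maps are mutually inverse, which is the bijection Set^{C^op}(\H_X, F) ≡ FX in this setting.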
By Exercises 4.40 and 4.41, the Yoneda embedding preserves products (indeed all limits) and exponentials. Section 7.7 is a powerful application of the Yoneda lemma to the equivalence between
DEFINITION 4.8.13 A representable functor is one which is naturally isomorphic to some \H_X = C(-,X) (cf Definition 3.1.7).
(a)
Xx Y represents C(-,X)xC(-,Y) ≡ C(-,XxY);
(b)
X^Y represents C(-x Y,X) as C(-,X^Y);
(c)
P(Y) represents Rel(-,Y) in Lemma 3.3.6 as Set(-,P(Y)); for Y = 1, Ω = P(1) represents Sub
(Proposition 5.2.6);
(d)
LiftY represents Pfn(-,Y) in Definition 3.3.7 as Set(-, LiftY).
2-Categories Since it is equipped with natural transformations as well as functors, the class of
categories is an example of a two-dimensional generalisation of categories themselves.
(a)
a class of 0-cells;
(b)
for each pair of 0-cells, a category, whose objects and morphisms we call 1-cells and 2-cells
respectively; the given 0-cells are called the left and right hand ends of these 1- and 2-cells; the
source and target of the 2-cells are called their top and bottom sides or edges and the composition
is styled vertical;
(c)
for each pair of 1-cells or 2-cells of which the right side of the first is the left side of the second, a
horizontal composite 1- or 2-cell;
such that the vertical and horizontal associativity and identity laws and the middle four interchange law
hold. There is a corresponding notion of 2-functor. Beware that 2-cells are not square but lens-shaped,
with two ends and two sides, cf the diagram before Definition 4.8.4.
EXAMPLES 4.8.16 The following each define the 0-, 1- and 2-cells of 2-categories:
(a)
Cat: categories, functors and natural transformations;
(b)
Pos: posets, monotone functions and (instances of) the pointwise order; horizontal composition is
defined by the monotonicity of functional composition ( cf the proof of Proposition 3.5.5);
(c)
Rel: sets, binary relations and inclusion;
(d)
Sp: topological spaces, continuous functions and the specialisation order (Example 3.1.2(i))
pointwise;
(e)
the objects and morphisms of any category, with only identity 2-cells (the discrete 2-category on
a category);
(f)
\Frak Cn→_L: the types, raw terms and standard reduction paths of the λ-calculus (Exercise 4.34).
Sometimes composition is only defined up to isomorphism, satisfying certain coherence equations that
were identified by Saunders Mac Lane and Max Kelly (1963); structure of this kind is called a bi-
category.
(g)
The points, paths and homotopies in a topological space form a bi-category (Exercise 4.43); a 2-
functor π1(X)→ C takes a homotopy, ie a continuous function IxI→ X, to a commutative square
in C, a fact which we use to prove van Kampen's Theorem 5.4.8.
(h)
Conversely, Exercise 4.49 about λ-abstraction of natural transformations may be seen as
homotopy of functors.
Since there are two directions of motion, there are two independent ways of forming opposite 2-
categories, and a third by doing both of them. Hence there are three notions of contravariant 2-functor.
We say that a functor is ``contravariant at the 1- and/or 2-level.''
4.9 Exercises IV
1. Formulate and prove the fact that any polygon commutes iff it can be decomposed into
commuting polygons.
2. Using the Cayley-Yoneda action, show that every monoid arises as a submonoid of H^H for some set H. For any set X, characterise the constant functions λx.a solely in terms of composition in X^X. Describe the Cayley-Yoneda action of any monoid M on its constants.
3. By choosing some combinatorial structure, such as the cosets of subgroups, show that every
group arises up to isomorphism as the group of all automorphisms of some algebraic structure.
4. Restate Proposition 4.2.9 for categories. Show that C has two faithful regular actions: on H^X = ∪_Θ C(X,Θ) (contravariantly by precomposition) and \H_X = ∪_Γ C(Γ,X) (covariantly by postcomposition). Reformulate these as functors H^X = C(X,-) and \H_X = C(-,X).
5. Prove the Normal Form Theorem 4.3.9. Note that skip, put and discard each require special
treatment. Say where each of the five laws is used, explaining why the last is not redundant.
6. Let L be a free theory (with no laws). Show that the isomorphisms of Cn×_L are just the
8. Show that Example 4.3.4 correctly solves the cubic equation; explain where the Floyd rules
9. Formulate the side-conditions on the use of variables which are needed to adapt the Floyd rules to
assignment.
10. Explain Examples 4.4.2, in particular that functors C→ Set and C^op → Set are actions.
12. Why are forgetful functors between concrete categories usually faithful but not full?
13. Show that the functor Rel→ CSLat of Example 4.4.4(b) is full and faithful.
14. For a locale (frame) A, define pts(A) = Frm(A,Ω). Extend this definition to a functor pts:Loc→
Sp. When is it faithful? Full?
15. Let F0, F1 and F2 be the sets of vertices, edges and faces of some polyhedron with triangular faces; for example these sets have respectively 12, 30 and 20 elements in the case of an icosahedron. Using the functions d^0_n, d^1_n, …, d^{n-1}_n: F_n → F_{n-1}, describe the boundary of the n-dimensional faces and explain how the data (F_n, d^i_n) fit together into a functor ∆ → Set, where ∆ is the opposite of the category of finite sets and injective functions. Such a structure is called a simplicial complex.
16. Verify that there is a category K(C), known as the Karoubi completion, whose objects are idempotents (e;e = e, Definition 1.3.12) and whose morphisms e→ d are C-maps f with e;f = f = f;d. Show that every idempotent in K(C) is split, ie expressible as \nearrow;i where i;\nearrow = id. Let F:C→ Θ be any functor, where Θ has given splittings of idempotents. Show that there is a unique functor K(C)→ Θ extending it and formulate this as a universal property. Finally, show that functors P:K(C)→ Θ correspond to semifunctors F:C→ Θ (Example 4.4.6(d)).
17. Show that groups and groupoids, considered as categories, have no non-trivial products ( cf
Exercise 5.1).
18. Prove Proposition 4.5.14, that the terminal object and products of pairs suffice to give all finite
products.
19. Justify Example 4.5.8(g), ie that N ≡ NxN in the category composed of primitive recursive
functions.
20. Propositions 2.6.7- 2.6.9 defined three constructions with well founded relations. Are they
products? If so, identify the categories.
22. Show that the Tychonov product (Example 3.9.10(e)) gives the (categorical) product in Sp and
also in Dcpo and IPO.
23. Let L be the two-sorted algebraic theory of a commutative ring together with a module. Characterise the objects of Cn×_L by a pair of natural numbers, and the morphisms by a list of
24. Let L be a single-sorted algebraic theory. Show that the forgetful functor |-|:Mod(L) → Set
creates products (Definition 4.5.10(c)). What is the analogous result for many-sorted theories?
25. Let L be an algebraic theory all of whose operation-symbols happen to be unary. Consider its classifying categories Cn_L and Cn×_L qua unary and algebraic theories respectively. Show that Cn_L is strongly equivalent to the full subcategory of Cn×_L consisting of one-variable contexts, ie
26. Show that any internal monoid in the category Mon of monoids is commutative. [Hint: consider
m(x,ee,y).]
27. For complete join-semilattices A and B let A⊗B, A\PAR B and A\multimap B be respectively the semilattices of Galois connections, co-Galois connections and left adjoints from A to B (Definition 3.6.1). Show that
(a)
A⊗B satisfies the Mac Lane-Kelly laws with unit Ω;
(b)
CSLat(A, B\multimap C) ≡ CSLat(A⊗B, C);
(c)
A⊗Ω ≡ A ≡ Ω⊗A;
(d)
(A\multimap Ω^op) ≡ A^op;
(e)
A\PAR B ≡ A^op \multimap B ≡ (A^op ⊗ B^op)^op also satisfies the Mac Lane-Kelly laws, with unit Ω^op;
(f)
negation defines CSLat-maps Ω→ Ω^op and A⊗B→ A\PAR B.
28. Let !A be the lattice of Scott-closed sets (Proposition 3.4.9) of a complete lattice A. Show that !1 = Ω, !(Ax B) ≡ (!A)⊗(!B) and [A→ B] ≡ (!A)\multimap B, the latter being the lattice of Scott-continuous functions. Hence show that the category of complete lattices and Scott-continuous functions is cartesian closed.
29. (Bill Lawvere) Let L be a single-sorted algebraic theory. Show that Cn×_L ≅ C for a category with obC = N such that addition is the categorical product. Conversely show that every such category C arises in this way.
30. Show that any group in which ∀g.g2 = id is Abelian. Express this argument in the various
notations we have used.
31. Show that if f(x,y) is a derived operation in the theory of groups such that f(x,f(y,z)) = f(f(x,y),z)
then f(x,y) is one of the forms x y, y x, x, y or id. [Hint: it is enough to consider Z2.]
X
32. Describe the category (analogous to that in Lemma 4.5.16) of which the function-space Y is the
terminal object.
33. Let C be a cartesian closed category. Show that K(C) is too (Exercise 4.16).
34. Consider the category C whose objects are raw λ-terms of type X in a context Γ, so obC = Cn×
L
+λ(Γ,X), where the maps of C are reduction paths. What equivalence relation on paths is needed
in order to make the diamond in the Church-Rosser Theorem (Lemma 1.2.4) commute in C? A
canonical choice amongst equivalent paths is given by reducing these redexes from left to right;
this is called a standard reduction: see [Bar81, Definition 11.4.1].
35. By considering the action of substitution in the λ-calculus, show that the equivalence relation of
→
the previous exercise is needed to make horizontal composition in the 2-category \Frak Cn of
contexts, raw λ-terms and βη δ-reduction well defined. Use it to characterise normal λ- terms,
distinguishing them from terms which reduce to themselves.
36. Show that there is a natural transformation between parallel group homomorphisms iff they are
conjugate. Give an example of a strong equivalence (Definition 4.8.9(b)) between groups for
which η and ε are not determined by the isomorphisms F and U involved.
38. Show how to compose and invert strong equivalences, and how to compose equivalence functors.
Show also that equivalence functors are confluent ( cf Exercise 3.27).
39. Which parts of Definition 4.4.8 are invariant under weak equivalence? What are the invariant
versions of those which are not, and also of injectivity and surjectivity on objects?
40. Let C be a category with finite products. Show that the Yoneda embedding \H(-):C
Cop
\hookrightarrow Set (Theorem 4.8.12(b)) preserves products. [Hint: see Proposition 3.2.7(a)
and Remark 4.5.9.]
Cop
41. For any small category C, show that S ≡ Set is cartesian closed. [Hint: [F→ G](X) = S(\H ,
X
[F→ G]) = S(\H x F,G) by the Yoneda Lemma, Theorem 4.8.12(c).] Deduce that the Yoneda
X
embedding preserves function-spaces. (See Exercise 3.30 for the poset version.)
42. Find a category C (which need only have two objects) in which the Schröder-Bernstein Theorem
Cop
(Exercise 3.63) fails. Using the Yoneda embedding, show that it also fails in Set . Since this is
a topos (model of Zermelo type theory), this theorem relies on excluded middle.
43. Let X be a topological space. Show how to define the path category C, whose objects are the
points of X and whose morphisms x→ y are continuous functions f:[0,n]→ X from real intervals
of length n with f(0) = x and f(n) = y. [Hint: composition of paths adds lengths.]
A homotopy f→ g between paths of the same length n is a function h:[0,n]x[0,1] with h(t,0) = f(t),
h(t,1) = g(t), h(0,s) = x and h(n,s) = y. Modify this definition to allow f and g to have different
lengths, and in particular so that f;f- and f-;f are homotopic to identities in C, where f-(t) = f(n-t).
Define a bi-category of points, paths and homotopies.
Now show that the existence of a homotopy f→ g is an equivalence relation, and that its quotient
is a groupoid, the fundamental groupoid π1(X) of X.
44. Show that evaluation has the dinaturality property on the left.
X
45. Show that the fixed point operator \YY :X → X (Proposition 3.3.11) satisfies the dinaturality
X
property on the right, and conversely that any transformation with this property yields fixed
X
points. [Hint: put Y = X and consider id ∈ X .]
47. Let C be any category. Taking ℑ to be ∅, {•} , {•→ •}, {•\rightrightarrows •} and {•→ •→ •},
describe [ℑ→ C].
48. Taking Γ to be successively {•} , {•→ •} and {•→ •→ •} use the universal property to show
that the exponential [X → Y] in the cartesian closed 1-category Cat must have as objects the
functors X→ Y, as morphisms the natural transformations between such functors and as
composition vertical composition of natural transformations. Finally, use the diagram with four
objects to show that this is associative.
49. The proof of Theorem 4.8.10 defined the λ-abstraction [(P)\tilde] of any functor P:ΓxX→ Y.
Show how to define [(ψ)\tilde] for any natural transformation ψ:P→ Q between such functors.
[Hint: replace Γ by Γx{ •→ •}]. Show that this is natural in the sense of Definition 4.7.2(c),
preserves vertical composition, and defines a bijection between the natural transformations P→ Q
50. Write down the definition of a 3-category, together with its notions of composition and the laws
which hold between them. Give some geometrical and categorical examples. [Hint: a ``middle
four'' law between any two levels suffices: there is no ``middle eight.'']
51. By a pullback in a poset is meant the meet of two elements which have a common upper bound
(Exercises 3.5, 3.20 and 3.34). Let C be the category of posets with pullbacks and pullback-
preserving monotone functions. For X,Y ∈ obC, write [X→ Y] for C(X,Y) with an order relation
which is to be determined; show that for ev:[X → Y]x X→ Y to preserve pullbacks (f ≤ g)⇒ ∀x,x′.
X
x′ ≤ x⇒ fx′ = fx∧gx′ is needed. Show that this does in fact make C cartesian closed.
Let V be the three-point poset {no ≥ ⊥ ≤ yes}. Show that there is no pullback-preserving
function por:VxV→ V with por(no, no) = no and por(yes, ⊥) = yes = por(⊥,yes).
INTRODUCTION
● Applications
● Slices
5.2 SUBOBJECTS
● Partial morphisms
● Conditionals
● Disjoint unions
● Abelian categories
● Stone duality
● Free (co)products and van Kampen's theorem
● Distributivity
● Extensive categories
● Stable disjoint sums
● Interpretation of theories with disjunction
● Kernels
● Congruences
● Quotients
● General coequalisers
● Colimits by duality
● Image factorisation
● Properties of factorisation systems
● Finding factorisations
EXERCISES V
Chapter 5
Limits and Colimits
Products may easily be defined for infinitely many factors, and also considered in the opposite category.
However, unlike meets and joins in posets, this does not exhaust the possible types of limits and colimits
in categories. Since most of the interesting phenomena may be observed more clearly in the concrete
cases of pullback, equaliser, pushout and coequaliser, we postpone the abstract definition to Section 7.3.
This chapter is an account of first order logic, originally motivated by the needs of homological algebra -
the understanding of sets, functions and relations came later. As we said in Chapter II, products,
equalisers, sums and quotients of equivalence relations provide the real foundation for algebra, rather
than the powerset. The diversity of the behaviour of finite limits and colimits is striking: the basic
features of groups, rings, vector spaces and topology may often be discovered just by looking for the
coproducts in these categories.
Besides these traditional applications in mathematics, we also show how stable disjoint coproducts (as in
Set) interpret the conditional declarative language, with if then else fi. The extension to while in
Section 6.4 shows that general coequalisers in Set are much more complicated than equalisers or
quotients.
As in the λ-calculus, the need for the operations of logic to respect substitution must be expressed in
category theory. In Chapters VIII and IX we shall show in terms of syntax that pullbacks effect
substitution, but in this chapter relational algebra exhibits the same behaviour.
Limits and colimits also interact in that we can try to construct one from the other as in Theorem 3.6.9.
However, size issues arise in the case of categories, where they did not for lattices, so we postpone the
general result until Theorem 7.3.12 . By way of preparation we consider factorisation systems, which
abstract the image of a function. This useful technique is frequently relegated to an exercise, and the
lemmas repeatedly re-proved in papers which use it. Here we treat it in full as we shall need it for the
study of the existential quantifier in Section 9.3.
http://www.cs.man.ac.uk/~pt/Practical_Foundations/html/s50.html2007-8-27 11:53:18
Practical Foundations of Mathematics
DEFINITION 5.1.1
(a)
Let f,g:X\rightrightarrows Y be a parallel pair of maps in a category S. Then an object E together
with a morphism m:E→ X is an equaliser of f and g if m;f = m;g, and whenever a:Γ→ X is
another morphism such that a;f = a;g there is a unique map h:Γ→ E with a = h ;m.
(b)
Let f:X→ Z and g:Y→ Z be two maps in a category. Then an object P together with a pair of
maps π0:P→ X and π1:P→ Y is a pullback if π0;f = π1;g and whenever a:Γ→ X and b:Γ→ Y also
satisfy a;f = b;g there is a unique map h:Γ→ P such that h;π0 = a and h;π1 = b.
Pullbacks and equalisers are unique up to unique isomorphism where they exist, cf Theorem 4.5.6.
The pullback and mediator are written X× Y and a,b. However, one should remember that this notation
Z
hides not only the projection maps (which were already absent from the product and exponential
notations we have used) but also the maps f and g which are part of the data. The object X× Y or the map
Z
π1:X× Y→ Y is also called the pullback of f along or against g. In Proposition 5.1.9, where we write π0
Z
= f*g, we shall see that this asymmetrical language is more typical of the way pullbacks arise than the
diagram above suggests.
Pullbacks are often indicated with the right angle symbol, which was suggested by William Butler in
1974 and popularised by Peter Freyd. In Section 9.4 we will no longer be able to afford the space for it,
and will instead adopt the convention that the ubiquitous pullbacks are drawn as parallelograms. This
There is no widely used notation for an equaliser, but Freyd and Scedrov [ FS90] write \rightarrowtail \f
g
•
or just \rightarrowtail for the map m. The hook notation will be discussed in the next section.
•
LEMMA 5.1.2 Suppose that the two squares commute below, and that the one on the right is a pullback.
Then the rectangle is a pullback iff the left hand square is. []
Applications
EXAMPLES 5.1.3
(a)
Pullbacks in a poset are just meets, but of pairs with a common upper bound (Exercises 3.5, 3.20,
3.34 and 4.51). Since all squares commute, what the bound is doesn't matter, only that it exists.
Again, since any two parallel maps are equal, they trivially have an equaliser: the identity.
(b)
A parallel pair f,g:[[(x)\vec]:[(X)\vec]] \rightrightarrows [[(y)\vec]:[(Y)\vec]] in the category of
contexts and substitutions is given by a list of pairs of terms \funf ([(x)\vec]),\fung ( [(x)\vec]):
j j
\typeY . A substitution a = [[(x)\vec]: = [(a)\vec]] has equal composites with them iff it is a
j
unifier, ie \funf [[(x)\vec]: = [(a)\vec]] = \fung [[(x)\vec]: = [(a)\vec]], or, in the informal
j j
notation, \funf ([(a)\vec]) = \fung ([(a)\vec]). The equaliser, if any, is the most general unifier
j j
(Remark 1.7.8), since any other unifier is a substitute. Similarly pullbacks solve \funf ([(x)\vec])
j
= \fung ([(y)\vec]). In Section 6.5 we shall construct the most general unifier in the simplest case,
j
(c)
A pullback rooted at the terminal object Z = 1 is a product. In any category with binary products,
pullbacks may be constructed from equalisers and vice versa ( cf Remark 5.2.3).
(a)
The equaliser of two parallel functions f,g:X\rightrightarrows Y is (the inclusion of) the subset E =
{x|f(x) = g(x)}. If a: Γ→ X with fo a = go a then the results of a at all elements of Γ lie in E ⊂ X,
so a restricts to h:Γ→ E ⊂ X.
(b)
The pullback of any two functions f:X→ Z and g:Y→ Z is the subset {x,y|f(x) = g(y)} with the
usual projections. A pair of maps from Γ gives rise to a map a,b:Γ→ Xx Y to the product, which
restricts to the pullback in the same way as for the equaliser.
(c)
If g:Y ⊂ Z is a subset inclusion then the pullback is the inverse image, f-1[Y] ⊂ X.
(d)
If f and g are both subset inclusions then the pullback, X∩Y ⊂ Z, is their intersection (Example
2.1.6(d)).
(a)
The restriction of the order on the source gives the equaliser and on the product gives the
pullback in Preord, Pos, CSLat, SLat, Lat, DLat, BA, Frm, HSL, Heyt and Dcpo. This fails,
however, in many popular categories of domains.
(b)
Cop
Equalisers and pullbacks in Set are constructed pointwise and the Yoneda embedding
Cop
(Theorem 4.8.12(b)) \H(-):C \hookrightarrow Set preserves whatever limits exist.
(c)
The category Mod(L) of models of any algebraic theory L has pullbacks and equalisers. Indeed
Σ
the forgetful functor Mod(L)→ Set creates limits ( cf Definition 4.5.10(c)), Σ being the set of
sorts.
(d)
The category of trichotomous orders (Definition 3.1.3) and strictly monotone functions does not
have products, but it has got pullbacks and equalisers, constructed in Set. This is because its
(e)
A coherence space is a set X with two symmetric relations \coh (joined or coherent) and \icoh (un-
joined or incoherent ) satisfying a trichotomy law: exactly one of x\coh y, x = y and x\icoh y
holds. An embedding is a function which preserves all three relations, whence it also reflects
them and is injective, so pullbacks are intersections.
(f)
The category of fields also has pullbacks and equalisers but not binary products or a terminal
object. We can say that two elements x,y ∈ K of a field are apart in a positive sense which is
preserved by homomorphisms, if ∃z.(x-y)z = 1, which we write as x#y. This obeys the dichotomy
law that exactly one of x = y and x#y holds, whence all homomorphisms are injective. The full
subcategory consisting of those fields in which a particular polynomial has a root (say x2+1 = 0)
has pullbacks but not equalisers.
(a)
Exercise 4.34 showed that the Church- Rosser Theorem (Fact 2.3.3) may be expressed as a
commutative square in a certain category. This square is in fact a pushout (a pullback in the
opposite category) [ Bar81, Exercise 12.4.4].
(b)
The effect of the product functor on morphisms (Proposition 4.5.13) gives rise to pullback
squares. These also occurred in the diagrams Example 4.6.3(f) and Definition 4.7.9.
(c)
Let f:X′→ X in C and g:E′→ E in Γ be maps in any two categories. Then this square (Proposition
4.8.3(c)) is a pullback: omitted diagram environment This means that the result of applying a
pullback-preserving functor P:ΓxC→ D to this square yields another pullback: cf Exercises
3.20 and 5.9.
(a)
they generalise products (so are also called fibred products),
(b)
they have something to do with equality, and
(c)
in Chapter VIII their primary role is substitution, cf Exercise 5.6.
In the first two cases there is a symmetry between the two legs, but this is not true of substitution, where
one is definitely acting on the other and not vice versa . Let's look at (b) and then (c).
REMARK 5.1.7 The pullbacks below are in a sense trivial, in that the pullback of an identity must be an
isomorphism. But if we think of the pullback as a subset X× Y ⊂ XxY, on the left we obtain the graph of
Z
The even simpler case on the right gives the binary equality relation, ( = ) ⊂ XxX, to which we return
X
in Remark 8.3.5. []
To see pullbacks as products exactly, we need to formalise the idea used in Lemma 4.5.16. For the
terminal object, see Exercise 5.4.
DEFINITION 5.1.8 Let X be any object of any category S. The slice category S↓ X has
(a)
as objects the S-morphisms d:Y→ X,
(b)
as morphisms the commuting triangles in S, omitted diagram environment
(c)
and as identity and composition those for S .
PROPOSITION 5.1.9 If S has chosen pullbacks against u:X′→ X then u*:S↓ X→ S↓ X′ is a functor. This is
a contravariant action (Section 4.2), except that id* need only be isomorphic to the identity and (u;|)* to
u*·|*.
PROOF: The effect of u* on the morphism f is the mediator between the pullback parallelograms.
Identities and composites are preserved by the same argument by uniqueness of mediators as in
Proposition 4.5.13. []
The analogue of the slice for posets is just the lower set generated by an element (Definition 3.1.7). In
the next section we shall use the same construction, but with the ds restricted to be monos. It will appear
in a more general form in Definition 8.3.8, where the ds belong to a specified class but u and f will be
arbitrary. An important common generalisation of pullbacks and slices (comma categories) is given in
Definition 7.3.8.
5.2 Subobjects
This section and the next study the categorical notions of subsets and injective functions, cf Exercise
1.16; Remark 5.8.4 discusses surjective functions for their own sake, rather than by duality. In Set a
function which is both mono and epi is an isomorphism, but this fails in most other categories of
interest: Proposition 5.8.10 shows what is needed.
(a)
a monomorphism or mono if the post cancellation property holds: given any a,b:Γ
\rightrightarrows U, if a;m = b;m then a = b (reading maps from Γ as generalised elements in
the sense of Remark 4.5.3, this says exactly that m is injective Definition 1.3.10(a)),
(b)
a regular mono if it is the equaliser (Definition 5.1.1(a)) of some parallel pair ( cf Lemma 5.6.6
(a) for a canonical such pair), and
(c)
a split mono if there is a postinverse e:X \twoheadrightarrow U with m;e = \id .
U
Dually we have (regular, split) epimorphisms or epis - please, not ``epics.'' Monos are often indicated by
a hook on the arrow (\hookrightarrow or \rightarrowtail ) and similarly epis by \twoheadrightarrow (or,
but not in this book, by → ); notice that the hook is at the end where the cancellation properties hold.
(a)
Then m is mono iff the square is a pullback:
(b)
In particular this happens if m is the structure map of an equaliser.
(c)
if m is split mono then so is Fm;
(d)
if F preserves pullbacks and m is mono then so is Fm ;
(e)
if F preserves equalisers and m is regular mono then so is Fm.
(f)
if m and \ are mono then so is m; \, and similarly for split monos, but Example 5.7.5(d) shows
that this may fail for regular monos;
(g)
conversely if m;\ is (split) mono then so is m;
(h)
if m;\ is regular mono and \ is mono then m is regular mono. []
REMARK 5.2.3 The class of split monos is not closed under pullback, but instead generates the class of
regular monos (so a regular mono is ``possibly split'' in the sense of Definition 3.8.2). For if m is the
equaliser of f,g:X\rightrightarrows Z then it is the pullback along f,g of the diagonal Z→ ZxZ (assuming
that this exists). The pullback u*m:V→ Y of m along any map u:Y→ X is also regular mono, being the
equaliser of (u;f) and (u;g).
The class of plain monos is also stable under pullback. See Example 5.4.6(e) and Lemma 5.5.7 regarding
coproduct inclusions X→ X+Y, and Exercise 5.37 for pullbacks of split epis. []
If there is any map f (necessarily a unique mono, by the cancellation properties) making this triangle
commute, we write m ≤ \ or U ⊂ V.
X
REMARK 5.2.5 We write \Sub (X) for the class of subobjects. It is the full subcategory (S↓ X) of the
S
slice S↓ X (Definition 5.1.8) consisting of those objects d:U→ X for which d is mono. There is no a
priori restriction on the morphisms, but in the case of monos (and not in the similar but more general
situation, S↓ X, of Definition 8.3.8) f is forced to be mono and unique. The latter makes \Sub (X) a
S
preorder, so we take the poset reflection (Proposition 3.1.10). If this is small (a set) for all X then we say
that S is well powered.
We said that pullbacks arise in type theory as equality types, products and substitution. When restricted
to subobjects, the equality types become trivial (Proposition 5.2.2(a)), but the product and substitution
are known as intersection and inverse image (u-1). The pullback u* considered as a functor says that if
U ⊂ V then u-1(U) ⊂ u-1(V).
op
If the category S is well powered and has inverse images then there is a functor Sub:S → SLat, with
Sub(u) = u-1.
PROPOSITION 5.2.6 In Set, the type Ω of propositions is a subobject classifier. That is, for any mono m:U
\hookrightarrow X, there is a unique function χ :X→ Ω which makes the square a pullback:
m
In other words, the natural transformation S(-,Ω)→ Sub(-) defined by the square is an isomorphism, so
the functor Sub is representable (Definition 4.8.13) by Ω (we introduced the symbol Ω in
Notation 2.8.2).
Although the pullback is given only up to isomorphism, the subobject is unique, being defined as an
isomorphism class. To show that the isomorphism S(-,Ω)→ Sub(-) is natural, use Lemma 5.1.2.[]
X
Using cartesian closure, the functor \Sub (-xX) is also representable, by the powerset P(X) = Ω
S
(Definition 2.2.5). These properties, including the existence of pullbacks, are precisely what we need to
model Zermelo type theory, ie to axiomatise the category of sets and functions. Such a category is called
Cop
an elementary topos. These properties extend to the functor category Set for any small category C,
by Exercises 4.41, 5.8 and 5.13, so presheaves form a topos.
COROLLARY 5.2.7 Every mono m in Set is regular, because it is the equaliser of χ and x→ T. []
m
We are running way ahead of ourselves in discussing higher order logic at this point: the purpose of this
chapter is to present the traditional categorical account of the first order logic of sets. Section 9.5 returns
to the comprehension and powerset axioms in type theory.
For example, this could be the set of roots of a polynomial, or some geometrical figure such as the circle
x2+y2 = r2.
Either by replacing A with a context, or by using pullbacks (intersection of subobjects), this treatment of
a single equation extends easily to the simultaneous solution of a family of them, a = b, ie to
conjunction.
Given the interpretations of equations a = b and c = d, we may express the (semantic) entailment ( cf
Corollary 4.6.8),
→ → → → → →
x : X \vDash a( x ) = b( x )⇒ c( x ) = d( x ),
ie that the set of solutions to one is contained in the other, by saying that there is a fill-in as shown,
making this square commute:
Besides equality and conjunction, there is another connective which can be expressed using finite limits.
REMARK 5.2.9 Recall that we wrote [^(y)]:XxY→ X for the product projection in Definition 4.5.7. In this
case, if there is a fill-in
then
→ → → → →
∀ x .(∃y.a( x ,y) = b( x ,y)) ⇒ c( x ) = d( x ).
→ → → → →
,
∀ x .c( x ) = d( x ) ⇒ ∃!y.a( x ,y) = b( x
y),
The inverse map, which we have already supposed to be present in the category, is an operation of arity
[(X)\vec]\vdash Y, defined conditionally on the equations c([(x)\vec]) = d([(x)\vec]). We may give it a
name, ie conservatively extend the language, using the i-calculus (Lemma 1.2.11).
Entailments of this form may be used as axioms for a generalised algebraic theory. For example,
Peter Gabriel and Friedrich Ulmer [GU71] proved the duality linking small lex categories qua theories
to their categories of models, which are locally finitely presentable categories (Definition 6.6.14(c)).
Chapters 2 and 3 of [MR77] provide a textbook account of this approach (there's no need to read
Chapter 1 first), as does [BW85] , which uses sketches.
Proposition 5.6.4 describes equivalence relations categorically in this way. Remark 5.8.5 discusses ∃
without the uniqueness condition. In my view it is more natural to describe the theory of categories
using dependent types (Example 8.1.12).
Special subobjects Algebraic equations typically describe a smaller class of subobjects than general
predicates do. We may similarly wish to restrict attention to recursively enumerable sets or those of
some other low degree of logical complexity. In categories other than Set, perhaps not all monos are
regular or ``well behaved'' in some other sense. For such a class to be useful it must at least be closed
under inverse image, ie be invariant under substitution.
(a)
all isomorphisms are in M,
(b)
if m:X→ Y and \:Y→ Z are in M then so is m;\, and
(c)
if m:X→ Y is in M and u:Y′→ Y is arbitrary then the pullback u*m:X′→ Y′ exists and is in M
is called a class of supports or a dominion. Of particular interest is the case where there is an object Σ
equipped with a global element T:1→ Σ having the same property for M that Ω has for all monos in Set.
This is called a support classifier or dominance. If both Σ and Ω exist then there is a ∧-semilattice
(a)
all monos in Set, classified by Ω ( Proposition 5.2.6);
(b)
upper subsets in Pos, also classified by Ω (Example 3.1.6(f));
(c)
op
algebraic equations in CRng , the category of affine varieties (not classified);
(d)
open subsets in Dcpo, IPO, Sp or Loc, classified by the Sierpi\'nski space S (Definition 3.4.10);
(e)
recursively enumerable subsets in a certain category composed of total recursive functions;
Exercise 5.10 describes the classifier.
Classes of monos will be used as the supports of partial functions in the next section. Pullback-stable
classes of maps which are not necessarily monos will be needed to interpret dependent types in Chapters
VIII and IX. We study support classifiers and the powerset in Section 9.5.
We aim to identify the essential categorical features of the imperative action, where the interpretation of
such a context is the set of states of [(x)\vec] which satisfy φ[[(x)\vec]], ie the comprehension {[(x)\vec]|φ
[[(x)\vec]]}. If φ[[(x)\vec]] is a family of equations, we already know how to express this subobject
directly in category theory, as an equaliser; this will be extended to full first order logic in Section 5.8.
Midconditions were introduced with a view to proving correctness of programs, but they are also used to
control (non-)termination. Although we cannot cause non-termination in the programs we build until we
introduce recursion or while loops in Section 6.4, the building blocks may themselves be non-
terminating if applied indiscriminately. We use conditionals to ensure that they only get called in
circumstances where we know that they are defined and produce correct results.
EXAMPLE 5.3.1 In the program for the cubic equation (Example 4.3.4), some of the operations involved
are only defined on part of R, or have ambiguous results. We must be clear, for example, that √{ } and
cos-1 are the inverses of maps (-)2:P→ P and cos:H→ I, where the subsets P = [0,∞), H = [0,π] and I = [-
1,1] of R must also be objects of the category.
If, as in Section 4.3, the meaning of a program is to be a morphism of a category whose objects are
simply lists of program-variables, then this morphism must be allowed to be a partial one. We can
restore totality by introducing virtual objects, defined by program-variables together with
midconditions. In particular, the troublesome primitive operations in the cubic equation program are
treated as partial functions (Section 3.3) whose supports are given by known predicates, ie the
preconditions which guarantee termination and correctness.
For this approach to be adequate for recursive or while programs, the logic used to define the virtual
objects must be strong enough to express the termination or otherwise of the program-fragments which
are about to be executed, and in particular stronger than the program itself could verify. This is the case
if category S incorporates the predicate calculus, with quantification over N, or is simply assumed to be
(like) Set. This chapter adopts the point of view of traditional categorical logic, which aims to describe
the category of sets and functions. Our account therefore falls short of a purely syntax-driven
construction.
REMARK 5.3.2 Let u:Γ→ ∆ be a terminating (everywhere defined) program, where Γ = [[(x)\vec]:[(X)
\vec]]. The interpretation of its restriction to the subset on which φ[[(x)\vec]] holds is of course the
composite
→ → u
]} \hookrightarrow Γ→
[Γ,φ] ≡ { x |φ[ x
∆,
where different programs u,|:Γ\rightrightarrows ∆ may have the same effect when restricted to [Γ,φ].
The Floyd assertion (Remark 4.3.5)
{φ}
u {θ}
says that the target may also be restricted to the subset [∆,θ]\hookrightarrow ∆, ie that (in the
interpretation as sets of states) there is a function which completes the square
The top map is unique because the right hand one is an inclusion. This square is a pullback iff φ is the
weakest precondition u*θ for which the property is valid (Remark 4.3.7).
In the terminology of Example 3.8.3, we still aim for total correctness; partial correctness is more useful
for while programs (Remark 6.4.16) .
REMARK 5.3.3 Notice that we have returned to the more general notion of context developed for the
predicate calculus (Definition 1.5.4), which involves both typed variables and logical formulae. Given
that the Floyd rules are anyway not fully specified without a statement of the variables involved, with
their types, we now see them as a minor variant of our standard notation u:[Γ,φ]→ [∆,θ].
Computationally, Γ is the physical range of the program-variables. The state is confined to [Γ,φ] only by
a ``gentlemen's agreement,'' namely the proof of the preceding program-fragment: the program could be
started (such as in a debugging environment) at this intermediate point in any state [(a)\vec] ∈ [[Γ]] .
Consider, for example, the type of primes (with the usual binary representation of numbers), or the type
of codes for programs that terminate. These acquire their meaning by intension, not extension.
The discard commands divide Example 4.3.4 into seven phrases. Each one is interpreted as a bijection,
whose inverse is essentially given by the comments. When a set is defined using comprehension in
Zermelo type theory, it is commonly understood to be interchangeable with any isomorphic set, defined
by some other set-theoretic formula. However, in the case of a program such as this, to dismiss these
isomorphisms as mere changes of representation of the same object would trivialise this historic
achievement in algebra. So, in a computational intuition this virtual object is more closely related to the
ambient Γ than to any isomorphic [∆,θ]. There is a rigid division of the context between typed variables
and predicates. The axiom of comprehension makes the division permeable, emancipating the subset
(Sections 9.2 and 9.5).
Partial morphisms Now we drop the assumption that u:Γ→ ∆ is total on the physical range of the
program-variables. We begin by defining partial functions in terms of a given class of total functions in
a category S such as Set. Afterwards, we consider how the total functions and virtual objects can be
recovered when the possibly non-terminating programs are the raw material.
DEFINITION 5.3.4 A partial function with source X, target Y and support U in a category S is an
isomorphism class of diagrams like
Definition 3.3.4 defined equality and the extension order (f\sqsubseteq g) on partial functions, exactly
analogously to the definitions for subobjects in Section 5.2. They also bear the same relation to the
category of spans used in Lemma 4.5.16 as S↓ X ≅ \Sub (X) did to the slice S↓ X.
S
PROPOSITION 5.3.5 Composition of partial functions, U;V, defined by pullback, is associative, cf Lemmas
1.6.6 and 4.1.3. (Note that U;V is defined as a subobject, ie as an isomorphism class of such pullbacks,
so it is legitimate to say that associativity holds up to equality.)
We write \nearrow [thick] = P(S,M) for the category composed of partial functions.
The diagram on the right shows that a second pullback also arises; it is used in Exercise 5.53 and
Section 6.4. []
REMARK 5.3.6 An alternative view of partial functions starts from the category \nearrow [thick] and tries
to recover S and M:
Proposition 5.8.7 considers an approach to relations analogous to this one for partial functions. A similar
construction using virtual objects applies. Section 6.4 shows how while programs can be interpreted
using coequalisers; in this case the virtual objects form a far larger structure than is needed to prove the
soundness result.
Conditionals Even when no proof of correctness has been supplied for a program, it is still natural to
think of the branches of a conditional as defined only on the virtual subobjects described by the
condition or its negation. The word ``if'' is misleading: the test is not a proposition as in implication but
a computable function. Failure is not enough to cause execution of the else branch: that only happens
when the test has succeeded in producing the second value. This all too common confusion with logic
may be avoided by regarding the condition as a question, to which yes and no are possible responses:
there may be no answer at all. The two parts are put together in the same way as those of a disjunction
(∨E )-box or a coproduct.
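The distinction between a test and a proposition can be dramatised in a few lines of Python, abstracting a non-terminating test as None. This encoding is a simplification of ours, not the book's semantics:

```python
def conditional(test, then_branch, else_branch):
    """The test is a question: yes, no, or (None) no answer at all.
    If the test gives no answer, neither branch is ever executed."""
    def run(x):
        answer = test(x)
        if answer is None:
            return None            # no response: the whole conditional diverges
        return then_branch(x) if answer else else_branch(x)
    return run

sign = conditional(lambda x: None if x == 0 else x > 0,
                   lambda x: 'positive',
                   lambda x: 'negative')
assert sign(3) == 'positive'
assert sign(-2) == 'negative'   # the test answered "no"; this is not mere failure
assert sign(0) is None
```

The else branch runs only when the test has succeeded in producing the answer no, exactly as the text describes.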
DEFINITION 5.3.7 The conditional declarative language extends the \bnfname programs of
Definition 4.3.1 by
ASSUMPTION 5.3.8 Without loss of generality, the test c terminates without side-effect. This can be
ensured by inserting put z := c before the conditional. The test is then just that of a variable of type 2
(Boolean), ie of a single bit (binary digit). We shall need this in the following account because it only
uses the test to select one of the branches, and does not incorporate it into the flow of control.
REMARK 5.3.9 The interpretation of a conditional with test c is defined in terms of the restrictions of the
two branches to the virtual objects
The conditional is then the mediator [g,f]:Γ→ Θ, which is a bijective correspondence so long as the
triangles Y→ Γ→Θ and N→ Γ→ Θ commute. This requires the β- and η-rules
The β- and η-rules make Γ the coproduct Y+N of the virtual objects for the two branches. However, this
is not the typical coproduct situation, as ours also has to agree with a pair of pullbacks.
EXAMPLE 5.3.10 In Sections 4.3 and 4.6 we saw how to interpret the individual lines of the cubic
equation program (Example 4.3.4). The whole program is the composite along the top of
where the dotted line is a product mediator, and we still have to define the lower map in terms of the
conditional. (The numbers refer to the seven phrases into which the discard commands break the
program.)
Writing Y,N ⊂ R2 for the complementary subsets of R2 on which the condition succeeds and fails, the
two maps are then as shown.
omitted prooftree
environment
We aim to redress the balance in our survey, which illustrates several other important themes in
mathematics (more particularly in topology, but this may be a historical accident). Coproducts are very
simple in Set - they are called disjoint unions and interpret conditionals - but get more complicated as
algebraic operations are added. In universal algebra coproducts and pushouts were called free products
and free compositions respectively, because of the way they are constructed for groups.
DEFINITION 5.4.1
(a)
An object 0 is an initial object if for each Θ ∈ obS there is exactly one morphism 0→ Θ.
(b)
Let N,Y ∈ obS. An object C together with maps ν0:N→ C and ν1:Y→ C is a coproduct if for
each object Θ and pair of maps f:N→ Θ and g:Y→ Θ there is a unique map p:C→ Θ such that ν0;p = f
and ν1;p = g. We usually write N+Y for C and [f,g] for p. In the case Y = N = Θ = X and f = g =
id, we write ∇_X : X+X→ X for the codiagonal.
In a poset, the initial object is the least element and the coproducts are joins (Definition 3.2.4); in
particular they are falsity and disjunction for formulae under the provability order.
Disjoint unions These are discussed more fully in the next section.
EXAMPLES 5.4.2
(a)
The empty set, ∅, is the initial object of Set, Pos, Preord, Dcpo, Sp, Cat and Gpd, by
Proposition 2.1.9(a).
(b)
The coproduct in Set was constructed in Example 2.1.7 and is called the disjoint union.
Exercise 2.13 showed how to find the mediator p = [f ,g ]:N+Y→ Θ, and that it is unique.
(c)
Coproducts in Pos, Preord, Dcpo, Pfn, Sp, Gpd and Cat are computed as in Set, where in the
first three cases ν_i(x) ≤ ν_j(y) iff i = j and x ≤ y in X_i.
(d)
The Boolean type is the coproduct 2 = 1 +1 in Set. The inclusions are called yes and no and the
coproduct mediator is the conditional.
(e)
More generally, a variant field in a record consists of a tag i ∈ I and a data assignment x ∈
X_i. See Exercise 5.36 for how switch is typically implemented.
(f)
JAVA allows constructors with the same name and result type but different arities; the source of
such a constructor is effectively the coproduct of the arities.
(g)
JAVA and ML provide idioms for exception-handling, which amount to returning a result of
coproduct type, cf Remark 2.7.9.
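For Set, items (b) and (d) can be made concrete: tagging implements the injections, and the mediator [f,g] is case analysis on the tag. A Python sketch with our own names:

```python
def nu0(x): return (0, x)      # left injection into N + Y
def nu1(y): return (1, y)      # right injection

def mediator(f, g):
    """[f, g]: N + Y → Θ, the unique map with nu0;[f,g] = f and nu1;[f,g] = g."""
    def p(t):
        tag, value = t
        return f(value) if tag == 0 else g(value)
    return p

p = mediator(str, len)
assert p(nu0(3)) == '3'        # nu0 ; [f, g]  =  f
assert p(nu1([1, 2])) == 2     # nu1 ; [f, g]  =  g

# The codiagonal ∇ = [id, id] forgets the tag:
codiag = mediator(lambda x: x, lambda x: x)
assert codiag(nu0('a')) == codiag(nu1('a')) == 'a'
```

Uniqueness of the mediator holds because every element of the disjoint union is hit by exactly one injection, so p is forced on all inputs.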
Abelian categories In categories of vector spaces and modules for rings, finite products coincide with
the corresponding coproducts.
DEFINITION 5.4.3 An object of a category which is both terminal (1) and initial ( 0) is known as a zero
object. In Vsp, the zero object is the space consisting only of the zero vector, and in general the
singleton is the zero algebra for any single-sorted theory which has exactly one definable constant ( ie
every expression of the form r(c,c,…) must also be provably equal to c), for example Mon, CMon,
SLat, CSLat, HSL, Gp and AbGp. On the other hand, ∅ is the zero object in Rel and Pfn.
EXAMPLE 5.4.4 The coproduct in the category of commutative monoids agrees with the product, in which
ν0:N→ N+Y ≡ NxY is x→ ⟨x,0⟩, ν1:y→ ⟨0,y⟩ and [f,g]: ⟨x,y⟩→ f(x)+g(y). In particular, maps
X_1+···+X_n → Y_1x···xY_m may be seen as going from the coproduct to the product, so they are
determined by an (nxm) matrix of maps X_i → Y_j. Composition is matrix multiplication, in which we
take composition of maps in place of multiplication of numbers.
Conversely, any category with a zero object and biproducts is CMon-enriched, ie the hom-sets carry a
commutative monoid structure for which composition is linear in each argument separately (Exercise
5.20). The categories AbGp, Vsp, SLat, Rel and CSLat are CMon-enriched (Exercise 5.22); in the last
three cases ``addition'' means join or union. []
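The matrix description can be checked concretely for the commutative monoid (N, +, 0): an additive map N^n → N^m is an m×n matrix of natural numbers, and composition of maps is matrix multiplication. A small Python sketch:

```python
def apply_hom(M, x):
    """Apply the additive map N^n → N^m given by the m×n matrix M."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in M]

def matmul(B, A):
    """Composition of biproduct maps: ordinary matrix multiplication."""
    return [[sum(B[i][k] * A[k][j] for k in range(len(A)))
             for j in range(len(A[0]))] for i in range(len(B))]

A = [[1, 2], [0, 1]]   # a map N^2 → N^2
B = [[1, 1]]           # a map N^2 → N^1
x = [3, 4]

# Composing the maps agrees with multiplying the matrices:
assert apply_hom(B, apply_hom(A, x)) == apply_hom(matmul(B, A), x) == [15]
```

Here + of the hom-set's commutative monoid structure is entrywise addition of matrices, and composition is linear in each argument, as the CMon-enrichment requires.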
REMARK 5.4.5 Homological algebra was the progenitor of category theory. Generalising Leonhard
Euler's formula f+v = e+2 for the faces, vertices and edges of a convex polyhedron, Enrico Betti defined
numerical invariants of spaces by formal addition and subtraction of faces of various dimensions; Henri
Poincaré formalised these and introduced homology. Emmy Noether stressed the fact that these
calculations go on in Abelian groups, and that the operation ∂_n taking a face of dimension n to the
alternating sum of faces of dimension n-1 which form its boundary is a homomorphism, and it also
satisfies ∂_n·∂_{n+1} = 0. There are many ways of approximating a given space by polyhedra, but the
quotient H_n = Ker ∂_n / im ∂_{n+1} is an invariant, the homology group. Since Noether, the groups have been
the object of study instead of their dimensions, which are the Betti numbers [Die88].
The categories used for homology are AbGp-enriched (additive) - but more. It emerged in the 1950s that
one could argue in them by chasing diagrams involving kernels and cokernels instead of elements.
(Kernels and their quotients are the subject of the later parts of this chapter.) David Buchsbaum
axiomatised ``Abelian'' categories, in his thesis (under Sammy Eilenberg's supervision, but without
knowing about Mac Lane's work) and in [CE56, appendix]. Alexander Grothendieck, again
independently, showed that sheaves of vector spaces and modules also form Abelian categories (1957).
We defer to Definition 5.8.1(d) discussion of the extra condition which an AbGp-enriched category
must satisfy in order to be Abelian, since it also applies to sets and other algebraic theories. Abelian
categories are covered thoroughly in [Fre64], [ML71], [FS90] and in any modern homology text.
In domain theory, Dana Scott (1970) discovered an analogous infinitary property, that certain filtered
colimits coincide with cofiltered limits. This may be used to find domains satisfying equations such as
X ≡ X^X. Michael Smyth (1982) showed that it arises in Dcpo-enriched categories. For a treatment of more
general diagrams, see [Tay87]. More recently, Peter Freyd [Fre91] has emphasised the coincidence of
initial algebras and final coalgebras for certain functors.
Stone duality For certain algebraic theories - the ones with which the discipline of universal algebra,
despite its name, is mainly concerned - Mod(L)^op has a spatial flavour: the lattices of congruences of
groups, rings and modules are modular, and for lattices they are distributive (Exercise 3.49). This
suggests viewing their quotients as monos in the opposite category. Marshall Stone (1937) showed how
any Boolean algebra arises as the lattice of clopen ( ie both open and closed) subsets of some compact
Hausdorff totally disconnected topological space. This was the first real theorem linking logic to the
mainstream of mathematics, and dualities like this are explored in [Joh82]. In particular, coproducts of
certain algebras signify topological products.
EXAMPLES 5.4.6
(a)
The two-element lattice is the initial object of DLat, BA and Heyt.
(b)
Z is the initial ring.
(c)
There is no initial field, but Q is initial in the full subcategory of fields of characteristic zero.
(d)
The coproduct of two commutative rings R,S ∈ obCRng is given by their tensor product R ⊗_Z S;
(e)
in particular the coproduct of Z/(2) and Z/(3) is the trivial ring (with 0 = 1), so the ``inclusions''
are not mono.
(f)
More generally, any homomorphism T→ R of commutative rings makes R into a T-module; then
the pushout of T→ R and T→ S is the tensor product R ⊗_T S of T-modules (Example 7.4.7).
(g)
The initial object of Frm is Ω, and its coproducts and pushouts can also be found by a tensor
product construction (Example 3.9.10(e)).
Free (co)products and van Kampen's theorem Coproducts of algebras in general tend to be rather
chaotic, Mon being typical.
EXAMPLE 5.4.7 By Corollary 2.7.11, List(X) is the free monoid on a set X. Then List(N +_Set Y) ≡
List(N) +_Mon List(Y), since free constructions preserve coproducts.
Elements of such monoids have normal forms in which we choose the first representative in
lexicographic (dictionary) order. The situation for algebraic theories with more operations gets
progressively worse.
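A normal form for elements of the coproduct List(N) + List(Y) in Mon can be computed by grouping the letters of a word over the disjoint union into maximal blocks from one summand. A sketch (the tags 0 and 1 for the two summands are our encoding):

```python
def normal_form(word):
    """Group a word over N + Y (letters tagged 0 or 1) into maximal blocks
    drawn alternately from the two summands: the normal form of an element
    of the coproduct List(N) + List(Y) in Mon."""
    blocks = []
    for tag, letter in word:
        if blocks and blocks[-1][0] == tag:
            blocks[-1][1].append(letter)     # extend the current block
        else:
            blocks.append([tag, [letter]])   # start a new, alternating block
    return blocks

word = [(0, 'a'), (0, 'b'), (1, 'x'), (0, 'a')]
assert normal_form(word) == [[0, ['a', 'b']], [1, ['x']], [0, ['a']]]
# Multiplication of normal forms merges adjacent blocks of like tag:
assert normal_form(word + [(0, 'c')])[-1] == [0, ['a', 'c']]
```

The blocks alternate between the two summands, which is exactly what makes the representation unique.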
Coproducts of monoids are clearly relevant to formal languages, but one might think that the only other
value of this construction is in universal algebra. The following famous result shows that, on the
contrary, it is also of interest to the geometric tradition. These topological intuitions were already present
in Section 4.2.
We only intend to give a sketch of the algebraic idea in its simplest topological form. It is not necessary
that both maps be open inclusions, but there are some topological counterexamples which we do not
want to consider. The interested reader should see eg [Bro88, Section 6.7].
Those not familiar with topology may ignore the compactness and open sets, considering finite networks
instead. Indeed the edges of the network may be oriented, in which case there is a category but no
meaningful group(oid) of paths. For example the paths in an oriented figure of 8 beginning and ending at
the cross-over form the monoid List(2).
THEOREM 5.4.8 Let X be a topological space and U,V ⊂ X be open subspaces with U∪V = X. Put W =
U∩V, so the diagram shown is (both a pullback and) a pushout in Sp, and also in Set.
Then the functor π1 which assigns the fundamental groupoid (Exercise 4.43) to any space preserves this
pushout (it trivially preserves ∅).
PROOF: As obπ1(X) is by definition the underlying set of the space X, it is the pushout of obπ1(U) and
obπ1(V) from obπ1(W), so we must consider the morphisms (paths). Let F:π1(U)→ C and G:π1(V)→ C
be functors which agree on π1(W).
In order to define [F,G](s) for any path s:I→ X (with I = [0,1] ⊂ R), s must be expressed as a composite
of paths in U and V. The open sets s-1(U),s-1(V) ⊂ I are each unions of open intervals; altogether they
cover I, but this is compact so finitely many suffice. Hence the path s is a_1;b_1;a_2;b_2;···;a_n;b_n,
where each a_i is a path in U and each b_i is a path in V (so the changeover points lie
in W = U∩V).
By a similar argument using compactness of IxI, any homotopy between composites s and t of this form
may be decomposed into a (rectangular) patchwork whose cells each lie wholly in either U or V (so the
boundaries between cells of different kinds are in W). Since F and G are defined on homotopy classes,
they map each of these cells to a commutative square in C (Example 4.8.16(g)). By composing this array
of commutative squares, [F,G](s) = [F,G](t). Hence [F,G]:π1(X)→ C is well defined, and it preserves
identities and composition. []
The group of endo-paths of a point a ∈ X is known as the fundamental group of X at a and written π1(X,a).
When U and V are path-connected and W is contractible (so in particular every path is homotopic to a
point, and the fundamental groupoid of W is trivial), the theorem reduces to saying that the fundamental
group for X is the coproduct (in Gp) of those for U and V. This special case is commonly attributed to
Edgar van Kampen (1935), though Herbert Seifert proved an earlier result for simplicial complexes. Van
Kampen stated his result using generators and relations (Section 7.4), so the proof is very difficult to
follow. Ronald Brown (1967) proved the groupoid form, and Richard Crowell formulated it in terms of a
universal property with a modern proof.
Van Kampen wanted to find the fundamental groups of the complements of algebraic curves in C2. The
case where W is not connected is needed even in the simplest example of the fundamental group of a
circle (or the complement of a point in R2). His results may be deduced from the groupoid form, but not
solely from the result for groups.
We did not need to construct the pushout of groupoids, because π1(X) already has this property. Yet the
fundamental groups of a wide range of spaces of traditional interest in geometric topology (such as a
many-handled torus) may be deduced from this theorem, starting from the easy case of contractible
spaces. This illustrates the power of categorical methods, both for producing the ``right'' object for
algebraic study, and for manipulating constructions with it.
These examples show that coproducts in Set, AbGp, Mon and CRng behave very differently from one
another. For coproducts in general algebraic theories we must resort to generators and relations
(Lemma 7.4.8). The next section considers coproducts in Set, Frm^op and CRng^op.
The notion of stable (or universal) disjoint coproduct was recognised by the Grothendieck school
( c. 1960), and is part of Jean Giraud's characterisation of sheaf toposes. However, our categorical
account is based on ideas of Steve Schanuel and Bill Lawvere from about 1990. The details are due to
Robin Cockett [Coc93], Aurelio Carboni, Steve Lack and Robert Walters [CLW93].
The weaker of the two notions which we consider corresponds exactly to a type theory with products
and sums (Section 2.3), but I do not know of a syntactic calculus for the stronger one, other than by
restricting full predicate logic. This is unfortunate, as it is this one which we want, but the actual
difference between the two is less than it may seem.
Distributivity Recall the second diagram in Example 5.3.10: this omitted the variable a because it is
not used during the conditional. Clearly we do not want the local behaviour of the program to depend on
how many (unused) global variables are also present.
DEFINITION 5.5.1 A category with finite products and coproducts is said to be distributive if, for all Γ,Y,
N ∈ obS,
is a coproduct diagram ( cf Definition 5.4.1 ). The analogous property for nullary coproducts is Γx0 ≡ 0,
which is equivalent to saying that any map Γ→ 0 is an isomorphism ( cf Proposition 2.1.9(b)); in this
case we call 0 a strict initial object.
omitted prooftree
environment
Categorically, the functor Γx(-) preserves coproduct diagrams:
(Recall from Definition 4.5.7 the use of → for the product projection π1.) In this case the maps
ΓxY + ΓxN → Γx(Y+N),
which are in any case equal by the universal properties, are invertible. The bottom row is given to be a
coproduct, whilst the squares are easily seen to be pullbacks, as they are defined in terms of products;
then we conclude that the top row is also a coproduct.
Thus if c then f else g is interpreted as the composite
Γ --⟨id,c⟩--> Γx2 ≡ Γ+Γ --[f,g]--> Θ.
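In Set the distributivity isomorphism and its inverse are pure re-bracketing of tags, as this sketch shows (the tags 0/1 for the two summands are our encoding):

```python
def dist(p):
    """Γ×(Y+N) → (Γ×Y)+(Γ×N): push the context through the tag."""
    gamma, (tag, value) = p
    return (tag, (gamma, value))

def dist_inv(t):
    """(Γ×Y)+(Γ×N) → Γ×(Y+N): the inverse re-bracketing."""
    tag, (gamma, value) = t
    return (gamma, (tag, value))

p = ('state', (1, 42))
assert dist(p) == (1, ('state', 42))
assert dist_inv(dist(p)) == p        # the two maps are mutually inverse
```

No data is created or destroyed; the unused context gamma is merely carried past the case split, which is the point of Definition 5.5.1.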
In Examples 5.4.4 and 5.4.7 we observed that Mon and CMon are not distributive; indeed very few
categories of algebras are [Joh85, Joh90].
REMARK 5.5.2 Theorem 4.6.7 showed that each algebraic theory L is classified by a category Clone^x_L
with products. The objects of Clone^x_L were contexts, consisting of variables of base type. In Section 4.7
we gave a similar construction (Cn^→_L) for the λ-calculus, where the objects are still contexts, but
consisting of type-expressions in the constructor →. If L was just an algebraic theory then Cn^→_L is the
free cartesian closed category on Clone^x_L.
The same can be done for sum types, by allowing the types listed in contexts to be expressions such as X
+Y rather than X→ Y. The terms may now involve the coproduct injections ν0 and ν1 together with the
conditional [ , ], for which the β-, η-, substitution and continuation rules were given in Section 2.3. The
category Clone^{x+}_L of such contexts and substitutions is distributive (from the substitution rule), and the
models of L in S correspond to (x,+)-preserving functors Clone^{x+}_L → S. As these automatically preserve
Definition 5.5.1, the semantic category S must be distributive too. If L was only an algebraic theory then
Clone^{x+}_L is the free distributive category on Clone^x_L; the inclusion Clone^x_L \hookrightarrow Clone^{x+}_L
can be shown to be full and faithful, so the extension is conservative (Sections 7.6- 7.7).
Extensive categories For categories rather than lattices, it is natural to state the distributive law more
generally: that coproducts are to be stable under pullback. That is, if we replace Γx(Y+N) above with an
arbitrary object C and form the two squares as pullbacks, the top row still has the universal property of a
coproduct. (In fact this stronger property does already hold for distributive lattices, but trivially, as
pullbacks exist but are no more general than meets, by Example 5.1.3(a).)
What distinguishes the sums we require from joins in lattices is that joins are idempotent, but we want
components to be disjoint. This property can be formulated as the ``converse'' of stability under
pullback. The term extensive describes properties such as mass, volume and force which increase with
quantity, as opposed to intensive properties like density and acceleration which remain the same.
DEFINITION 5.5.3 An extensive category is one which has finite coproducts, such that every commutative
diagram of the form
In a distributive lattice, the rows can be coproducts (joins) without the squares being pullbacks (meets),
as N∩B and Y∩A can be non-trivial.
LEMMA 5.5.4 Consider the following commutative diagram, in which the rows are coproducts.
(a)
Then the two lower squares are pullbacks iff
(b)
for all commutative trapezia there are unique N′→ N and Y′→ Y making the whole diagram
commute.
PROPOSITION 5.5.5 In an extensive category the functor
+ : (S↓ A)x(S↓ B) → S↓ (A+B)
is an equivalence for every A,B ∈ obS. In particular, the functor category S² is equivalent to the
slice category S↓ 2.
PROOF: By the lemma, this functor is full and faithful iff coproducts imply pullbacks. Essential
surjectivity says that the source of any map into A+B must be a coproduct N+Y that makes the squares
commute, but these squares are pullbacks by the first part. []
EXAMPLES 5.5.6
(a)
Set enjoys these properties by Exercise 2.13.
(b)
Pos, Dcpo, Sp, Gpd and Cat are extensive, since the forgetful functor to Set creates coproducts
and pullbacks.
(c)
Loc ≡ Frm^op and CRng^op are also extensive (Exercise 5.31). []
Exercise 5.35 explains, using partial maps, why the category of virtual objects in Section 5.3 must be
extensive rather than just distributive to interpret conditionals.
LEMMA 5.5.7 In an extensive category,
(a)
coproducts are stable, ie the pullback of any coproduct diagram is another one (in particular the
distributive law holds);
(b)
the initial object 0 is strict;
(c)
the components of the coproduct are disjoint, ie the square on the left is always a pullback;
omitted diagram environment
(d)
if the middle square commutes then U ≡ 0;
(e)
for disjointness of general binary coproducts, it suffices that 0 be strict and that the square on the
right be a pullback, ie yes ≠ no;
(f)
the maps Y→ Y+N← N are monos (cf Example 5.4.6(e));
(g)
in fact they are regular monos, assuming extensivity.
PROOF: The omitted diagrams are clearly coproducts, therefore pairs of pullbacks; so X← 0 is the
pullback of the isomorphism 1← 1.
[d] The mediator to the pullback is U→ 0 by (c), so U ≡ 0 by (b).
[e] Form a cube from the middle and right hand squares.
[f] Form the pullbacks U and V in the left hand diagram below, then U ≡ 0 by disjointness; but
B = U+V by stability, so B ≡ V is the pullback in Proposition 5.2.2(a). omitted diagram
environment
[g] The inclusions 1\hookrightarrow 2 \hookleftarrow 1 are split mono, so general coproduct
inclusions are regular mono by Remark 5.2.3. []
THEOREM 5.5.8 A category with pullbacks is extensive iff it has stable disjoint sums and a strict initial
object.
PROOF: It remains to show that Y is a pullback in the diagram below, so take a commutative square with
vertex Γ. Form the pullbacks U and V, so Γ = U+V since the coproduct Y+N is stable ( cf the proof of
Lemma 5.5.7(f)). Now we have a commutative trapezium from U to A+B via A and B, so U ≡ 0 by
Lemma 5.5.7(d).
Γ ≡ V→ Y is the pullback mediator: the triangle Γ→ Y+N is actually the top right square, which
commutes; that to B commutes because the composites as far as A+B are equal and B\hookrightarrow A
+B by Lemma 5.5.7(f). The mediator is unique because Y\hookrightarrow Y+N. []
Interpretation of theories with disjunction Examples 4.6.4 listed some mathematical structures that
are almost algebraic theories, but which have exceptions such as (division by) zero and (popping) the
empty stack, cf the predecessor (Remark 2.7.9).
EXAMPLES 5.5.12 The theories of natural numbers, lists (Section 2.7), number fields and projective
planes involve axioms of the form
in which there is only one witness to each existential quantifier, and only one term of each disjunction
can be satisfied. These things can be proved from the theory because they also have axioms of the form
REMARK 5.5.13 Unique existential quantification has already arisen for ``essentially'' algebraic theories
(Remark 5.2.9), but the unique disjunction is a new feature of extensive categories. The property
How can we define a classifying category for such a theory, short of interpreting the whole of the
predicate calculus? The sum calculus of Remark 2.3.10ff can express N ≡ 1+N and L ≡ 1+XxL, but,
without extending the fragment [] = (x,+) of logic to include pullbacks, there is nothing to force
[[-]]: Cn^[]_L → S to preserve the pullback which expresses disjointness. (A priori we do not even know
that this square is a pullback in Cn^{x+}_L, though it follows from Section 7.7 that in fact it is.) However,
the tradition of sketch theory, like that of model theory, has only considered the situation where S = Set,
so the coproduct diagram in the semantics is necessarily stable and disjoint.
There is no avoiding other finite limits in the cases of fields and projective planes, as the axioms
themselves involve equations.
REMARK 5.5.14 The category Mod(L) of models and homomorphisms of such a theory need not have
products, though it still has pullbacks, equalisers, and indeed limits of all connected diagrams
(Examples 5.1.5).
The axioms above say that lists and natural numbers can be parsed, an idea which we shall take up again
in Chapter VI. Infinitary sums will be treated type-theoretically in Section 9.3. The rest of this chapter is
about coequalisers, but we postpone stability for them until Section 5.8, where it is also combined with
extensivity to give the notion of pretopos. Virtual objects reappear there and in Section 6.4, which
interprets an imperative language with while.
DEFINITION 5.6.1 The kernel pair or level of a morphism f:A→ B in a category with pullbacks is the
pullback square
EXAMPLES 5.6.2
(a)
Any map f is a mono iff ker0 f = ker1 f = id_A (Proposition 5.2.2(a)).
(b)
In Set, Ker f = {⟨x,y⟩ | f(x) = f(y)} ⊂ A² is just an equivalence relation.
(c)
In Mod(L), where L is a single- sorted algebraic theory, Ker f is also a subalgebra of Ax A. For
example in Lat,
❍ the constants ⊥ = ⟨⊥,⊥⟩ and T = ⟨T,T⟩ are in Ker f because it's reflexive, or because f
preserves them,
❍ if ⟨x,y⟩ and ⟨u,v⟩ are in Ker f then so are ⟨x∨u, y∨v⟩ and ⟨x∧u, y∧v⟩, because f is a
homomorphism,
❍ for theories with unary operations ( eg logical and arithmetical negation), Ker f is closed
under them as well, and likewise under operations of arbitrary arity.
(d)
In the category of groups, the subset N = {x | f(x) = 1} ⊂ A suffices to define the kernel pair, as K =
{⟨x,y⟩ | x;y⁻¹ ∈ N}. N is the equaliser of f with the trivial (constantly 1) homomorphism. It is
characterised as a normal subgroup, ie a subgroup satisfying ∀x,n:A. n ∈ N ⇒ x⁻¹;n;x ∈ N.
(e)
Similarly in the category of vector spaces (or modules for a ring) the subspace or submodule U =
{x | f(x) = 0} determines the kernel pair by K = {⟨x,y⟩ | x-y ∈ U}. Again U is the equaliser with the
zero map, but this time every submodule arises in this way.
(f)
The kernel pair for rings has yet another representation peculiar to that category, as a (two-sided)
ideal I = {x | f(x) = 0}, Example 2.1.3(b). The kernel pair can be recovered in the same way as for
vector spaces, but I is not an equaliser or subring unless B is the trivial ring; it has the property
that ∀x,i:A. i ∈ I ⇒ xi, ix ∈ I.
(g)
In a many-sorted algebraic theory, kernels are constructed for each sort independently of the
others, as the inclusion Mod(L) ⊂ Set^Σ creates pullbacks (Definition 4.5.10(c)), where Σ is the
set of sorts.
(h)
Let M be a module for a commutative ring R. Then a kernel consists of an ideal I ⊂ R and a
submodule U ⊂ M, with the additional condition that IM ⊂ U. So the sorts do interact, and the
general case is more complicated than this example.
The trick in (d)-(f) does not work for other theories such as lattices, as it needs division or subtraction to
translate everything to the origin. However, the kernel pair of a homomorphism of complete semilattices
can also be represented as a subset, which is order-isomorphic to its quotient: see Lemma 5.6.14 below
and Exercises 5.39 and 5.41.
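Example (b) above and Proposition 5.6.4 below can be checked mechanically: the kernel pair of any function is reflexive, symmetric and transitive. A Python sketch over a finite set:

```python
def kernel_pair(A, f):
    """Ker f = {(x, y) | f(x) = f(y)} ⊆ A×A, computed for a finite set A."""
    return {(x, y) for x in A for y in A if f(x) == f(y)}

A = set(range(6))
K = kernel_pair(A, lambda x: x % 3)

assert all((x, x) in K for x in A)                       # reflexive
assert all((y, x) in K for (x, y) in K)                  # symmetric
assert all((x, z) in K
           for (x, y) in K for (w, z) in K if y == w)    # transitive
assert (0, 3) in K and (0, 1) not in K
```

The three laws hold for any f whatsoever, since they are inherited from equality in the target, which is the content of Proposition 5.6.4.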
Congruences The kernel pair of any map, where it exists in a category, is an equivalence relation
(Definition 1.2.3), which we express in the style of Remark 5.2.8. For an algebraic theory L, we can
regard the following diagrams as being in Set, Set^Σ or Mod(L), since Mod(L)→ Set^Σ creates limits.
DEFINITION 5.6.3 A mono K\hookrightarrow Ax A (or the pair K\rightrightarrows A) for which these
diagrams exist is called a congruence. (This loosely agrees with Definition 1.2.12 in that it transfers
algebraic properties.)
PROPOSITION 5.6.4 In a category with pullbacks, kernel pairs satisfy the reflexive, symmetric and
transitive laws, in the sense defined by the following diagrams.
PROOF: The mediator, x→ ⟨x,x⟩, of the diagram on the left expresses reflexivity, whilst that on the right, ⟨x,
y⟩→ ⟨y,x⟩, expresses symmetry. The bottom and right faces of the cube represent the hypotheses for the
transitive law: f(x) = f(y) and f(y) = f(z). The back face forms their conjunction, and the front the
conclusion; since this is a pullback and the cube commutes, the mediator exists and states the entailment.
[]
The other two faces are also pullbacks (by Lemma 5.1.2 ), and in fact the eighth vertex is the ternary
pullback of the first three edges, and also the limit of the first three faces. See Exercise 5.40 for an
alternative proof.
Quotients
DEFINITION 5.6.5 The coequaliser of a parallel pair u,v:R\rightrightarrows A is the universal map q:A→
Q with u;q = v;q.
So, for any other map f:A→ Θ with u;f = v;f, there is a unique mediator p:Q→ Θ such that f = q;p. As
usual, coequalisers are unique up to unique isomorphism. We shall concentrate on congruences since
they are easier to handle than general pairs, but also because of the following result.
LEMMA 5.6.6
(a)
If a map A→ Q is the coequaliser of some pair W \rightrightarrows A then it is also the
coequaliser of its kernel pair. It is called a regular epi.
(b)
If K\rightrightarrows A is the kernel pair of some map A→ Θ then it is also the kernel pair of its
coequaliser.
PROOF: Consider the relation (W\rightrightarrows A)⊥(A→Θ) that the composites are equal; this induces
a Galois connection (Proposition 3.8.14) between the class of pairs of maps into A and that of single
maps out, or between the double slice S↓ ⟨A,A⟩ (Lemma 4.5.16) and the coslice A↓ S. Then the kernel
pair of A→ Q is the terminal object of ⊥(A→ Q), whilst the coequaliser of W\rightrightarrows A is
initial in (W\rightrightarrows A)⊥. The result follows from the idempotence of Galois connections.
[]
DEFINITION 5.6.7 We want to use equivalence relations to specify that pairs of elements are to be
identified. The coequaliser q:A\twoheadrightarrow Q of a congruence K\rightrightarrows A is known as
its quotient and written Q = A/K. It is said to be effective if K is the kernel pair of q, ie the quotient
identifies the specified pairs and no more.
PROPOSITION 5.6.8 Set has effective quotients of congruences.
PROOF: Let Q ⊂ P(A) be the set of equivalence classes. Example 2.1.5 showed that Q is the coequaliser
(the mediator p is unique since A\twoheadrightarrow Q is surjective). This is effective because [x] = [y]
⇔ x,y ∈ K. []
THEOREM 5.6.9 Let L be a finitary algebraic theory. Then the category Mod(L) also has effective
quotients of congruences.
PROOF: We begin with semilattices, as a typical single-sorted finitary algebraic theory. The constant T in
the quotient is the equivalence class [T]. Similarly [a]∧[b] = [a∧b], which we must show to be well
defined with respect to choices of representatives: the equation [a] = [a′] means that ⟨a,a′⟩ ∈ K; if also
⟨b,b′⟩ ∈ K, then ⟨a∧b, a′∧b′⟩ ∈ K since it's a subalgebra, so [a∧b] = [a′∧b′].
Categorically, the two squares from K^(x⃗) to A_Y exist as K is a subalgebra; indeed they are
pullbacks and K^(x⃗) is a congruence on A^(x⃗). But in order to define the dotted map, the
top row must be a coequaliser. It is, because the product functors preserve coequalisers. omitted
diagram environment
The laws of L are inherited from A, again by choosing representatives.
It is essential here that the functors (-)^k preserve regular epis. In the next chapter we shall extend some of
the methods of universal algebra to infinitary theories: the theory may have infinitely many sorts or laws
without causing difficulty. But this will be at the cost of the ability to treat laws in general; the problem
lies in the above result, though this obstacle can be removed by brute force (the axiom of choice). By
finitary we really do mean that the arity k has to be finitely enumerated (Definition 6.6.2(a)). This is not
finitist dogma: the condition may be formulated abstractly, and is called projectivity, Remark 5.8.4(e).
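The well-definedness argument in the proof of Theorem 5.6.9 can be tested on a concrete semilattice: take subsets of {1,2,3} under intersection, and the congruence which is the kernel of S ↦ S∩{1,2} (an illustrative choice of ours); meets of representatives always land in a single class:

```python
from itertools import chain, combinations

ground = {1, 2, 3}
A = [frozenset(s) for s in chain.from_iterable(
        combinations(sorted(ground), r) for r in range(len(ground) + 1))]

f = lambda S: S & {1, 2}       # a ∧-homomorphism; its kernel is a congruence

classes = {}
for S in A:
    classes.setdefault(f(S), []).append(S)

# [a] ∧ [b] := [a ∧ b] is independent of the chosen representatives:
for C in classes.values():
    for D in classes.values():
        results = {f(S & T) for S in C for T in D}
        assert len(results) == 1

assert len(classes) == 4       # the quotient is P({1,2}), as expected
```

Each inner assertion is exactly the step "⟨a,a′⟩, ⟨b,b′⟩ ∈ K imply ⟨a∧b, a′∧b′⟩ ∈ K" of the proof, checked exhaustively on this finite example.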
COROLLARY 5.6.10 In a category with pullbacks and effective quotients of congruences, the Galois
connection used to prove Lemma 5.6.6 reduces to an order-isomorphism between
(a)
quotients (regular epis out) of A, cf Sub(A) in Remark 5.2.5, with id_A as the least element and
!_A: A ↠ 1 as the greatest, and
(b)
congruences on A, with the diagonal as least element (the discrete congruence) and AxA as
greatest (the indiscriminate one). []
General coequalisers It remains to convert an arbitrary parallel pair into a congruence, using zig-zags
(Lemma 1.2.4).
LEMMA 5.6.11 In the following diagram in a category with kernel pairs, suppose that u = e;m;i;π0 and
v = e;m;i;π1, where e is epi and m is mono as marked, i:K\hookrightarrow AxA, and K is the smallest
congruence containing R. Then for any f:A→ Θ the following are equivalent:
(a)
u;f = v;f : W→ Θ;
(b)
m;i;π0;f = m;i;π1;f : R→ Θ;
(c)
i;π0;f = i;π1;f : K→ Θ;
(d)
f factors through q, ie there is a unique map p such that f = q;p.
PROOF: [a⇒ b]: e is epi. [b⇒ c]: R ⊂ Ker f, so K ⊂ Ker f by hypothesis. [c⇒ d]: by definition of A/K.
The converses hold by composition. []
COROLLARY 5.6.12 Set has coequalisers for all parallel pairs, as does the category Mod(L) of algebras
for any finitary algebraic theory L.
PROOF: Put R = {(x0,x1) | ∃y. u(y) = x0 ∧ v(y) = x1}, and let K be its congruence closure. In Set this is the set of zig-zags. []
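In Set the congruence closure is exactly the zig-zag construction, and a union-find structure computes it. A minimal sketch, with hypothetical names, assuming u and v are given as dicts on a common domain:

```python
def coequalizer(X, u, v):
    """Coequaliser of u, v : W -> X in Set: quotient X by the smallest
    equivalence relation K containing R = {(u(y), v(y)) | y in W}.
    u and v are dicts with a common domain W; returns the quotient
    map q as a dict sending each x to a representative of its class."""
    parent = {x: x for x in X}   # union-find computes the zig-zag closure
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    for y in u:                  # merge u(y) with v(y), one pair of R at a time
        a, b = find(u[y]), find(v[y])
        if a != b:
            parent[a] = b
    return {x: find(x) for x in X}

# W = {0,1}: u sends them to 0,1 and v to 1,2, so 0 ~ 1 ~ 2 by a zig-zag
q = coequalizer({0, 1, 2, 3}, {0: 0, 1: 1}, {0: 1, 1: 2})
assert q[0] == q[1] == q[2] and q[3] != q[0]
```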
Note that to find e and m from u and v in Lemma 5.6.11 uses the image factorisation, which we shall discuss in the next section.
REMARK 5.6.13 In the case of the category of algebras for a many-sorted algebraic theory, the congruence (K_X ⊂ A_X×A_X)_{X∈Σ} has to be generated for all of the sorts and operation-symbols together, because operation-symbols with results of one type may make use of arguments of any of the other types. Treating it as a subset of ∪_{X∈Σ}(A_X×A_X), the reflexive, symmetric and transitive laws, and each operation-symbol (with respect to which the inclusion must be a homomorphism), give rise to a closure condition (Example 3.7.5(a)).
Colimits by duality Although this way of constructing coequalisers is not in general available for
infinitary operations, most of those of interest (meets, joins, limits and colimits) are defined by universal
properties. We can use our knowledge of adjunctions to make the necessary choices canonically. Indeed
in some categories it is much easier to find colimits in this way than it is in Set by the combinatorial
technique.
For example, since the category of complete join-semilattices is self-dual, ie CSLat ≡ CSLat^op, its colimits follow immediately from its limits. The next result was, of course, discovered by unwinding this corollary, but it is instructive as an example of a construction which does not simply work by symbol-pushing.
Given a parallel pair u,v with codomain A in CSLat, put
Q = {a | ∀x. u(x) ≤ a ⇔ v(x) ≤ a} ⊂ A
and observe that it is closed under all meets in A. Using Theorem 3.6.9, Q is a complete join-semilattice, and the inclusion j:Q ↪ A has a (join-preserving) left adjoint, e ⊣ j. In fact
e(a) = ∧{q | a ≤ q} and ∨^Q_i q_i = e(∨^A_i q_i).
Then [...] preserves them. []
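For a finite complete lattice the construction of Q and the reflection e can be computed directly. A sketch (names hypothetical), taking A to be a powerset lattice with a particular join-preserving parallel pair u, v:

```python
from itertools import combinations

def powerset(s):
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1)
            for c in combinations(s, r)]

A = powerset({1, 2, 3})          # a complete lattice under inclusion
B = powerset({1, 2})

# a hypothetical join-preserving parallel pair u, v : B -> A
u = {b: b | {3} for b in B}
v = {b: frozenset(b) for b in B}

# Q = {a | for all x, u(x) <= a iff v(x) <= a}, closed under all meets in A
Q = [a for a in A if all((u[x] <= a) == (v[x] <= a) for x in B)]

def e(a):
    """left adjoint to the inclusion of Q in A: the least q in Q above a"""
    return frozenset.intersection(*[q for q in Q if a <= q])

assert all(a <= e(a) and e(e(a)) == e(a) for a in A)   # a reflection
assert all(e(u[x]) == e(v[x]) for x in B)              # e coequalises u, v
```

That e(u(x)) = e(v(x)) holds automatically: by the very definition of Q, a member of Q lies above u(x) iff it lies above v(x), so both sides are the same meet.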
Although SLat, AbGp, Vsp and categories of modules for rings are not quite self-dual, a similar
technique applies. The particular case of the coequaliser of a linear map u with the zero map v = 0
(Example 5.4.4) is called its cokernel.
Section 6.4 uses coequalisers to interpret loops, and Sections 6.5 and 7.4 consider laws in algebras. The
rest of this chapter shows how limits and monos interact with colimits and epis.
5.7 Factorisation Systems
It was Emmy Noether who shifted the emphasis from subalgebras and congruences to homomorphisms.
Including both in the same structure shows us the universal property that distinguishes and re-unites
them. The result also explains the existential quantifier, so often obscured by lattice-theoretic methods,
as we shall see in Sections 5.8 and 9.3.
DEFINITION 5.7.1 We say that two maps e:X→ A and m:B→ Θ in S are orthogonal and write e⊥m if,
for any two maps f and z such that the square commutes, there is a unique morphism p:A→ B making
the two triangles commute:
For technical reasons it is also useful to say that ``e⊥m with respect to z'' if the fill-in property above
holds for all f but just this particular z.
DEFINITION 5.7.2 A factorisation system [FK72] on a category S is a pair of classes of morphisms (E,M)
of S such that
(a)
the classes E and M each contain all isomorphisms , and are closed under composition on either
side with isomorphisms (we shall find that they are non-full replete subcategories),
(b)
every morphism f:X→ Θ of S can be expressed as f = e;m with e ∈ E and m ∈ M, and
(c)
e⊥m for every e ∈ E and m ∈ M.
If the pullback of any composite e;m against any map u:Γ→ Θ exists, and the parts lie in E and M
respectively, then we call (E,M) a stable factorisation system, cf stable coproducts in Section 5.5.
In Set, image factorisation is stable: this is necessary in Lemma 5.8.6 to make relational composition
associative, and in Theorem 9.3.11 for the existential quantifier to be invariant under substitution. The
only pullback- stability properties that factorisation systems in general have are Lemmas 5.7.6(f)
and 5.7.10. Although the image factorisation is the most familiar and accounts for the notation, there are
other important examples in topology, categorical logic and domain theory. Exercise 9.5 describes one
that is related to virtual objects (Remark 5.3.2) .
Image factorisation First we shall look at the motivating examples, so let S be a category that has
kernel pairs and their coequalisers.
LEMMA 5.7.3 If e is regular epi and m mono then e⊥m. Conversely, if m satisfies e⊥m for every regular
epi e then m is mono.
PROOF: Given a coequaliser and a mono in a commutative square as shown, the composites K ⇉ X→ B ↪ Θ are equal; hence so are the two composites K ⇉ B, and by the universal property of the coequaliser there is a unique fill-in.
Conversely, apply orthogonality to the square with id:X→ B; the diagonal fill-in shows that e is
invertible. []
So with M and E the classes of monos (inclusions) and regular epis (quotients or surjections), we have E^⊥ = M in any category which has kernels and quotients.
LEMMA 5.7.4 If the class of regular epis is closed under composition, then together with the class of
monos it forms a factorisation system.
PROOF: To factorise f:X→ Y, let q:X\twoheadrightarrow Q be the coequaliser of the kernel pair K
\rightrightarrows X of f; by Lemma 5.6.6(b) this is also the kernel pair of q. We must show that Q
\hookrightarrow Y, so form its kernel pair L\rightrightarrows Q and let P be the coequaliser. The kernel
pair of the composite X\twoheadrightarrow Q\twoheadrightarrow P is sandwiched (as a subobject of
XxX) between those of X→ Y and X→ Q, which are both K. By hypothesis X\twoheadrightarrow Q
\twoheadrightarrow P is regular epi, so it is the quotient of its kernel pair K\rightrightarrows X. But X
\twoheadrightarrow Q was already the quotient of this pair, so L ≡ Q ≡ P. By Proposition 5.2.2(a), X
\twoheadrightarrow Q\hookrightarrow Y. []
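In Set the factorisation of this lemma is the usual image factorisation: the quotient by the kernel pair has exactly one class per value of f. A minimal sketch with hypothetical names:

```python
def image_factorisation(X, f):
    """Factor f : X -> Y (a dict) as a regular epi q followed by a mono m.
    Classes of the kernel pair {(x, x') | f(x) = f(x')} correspond
    exactly to the values that f takes, ie to its image."""
    image = {f[x] for x in X}
    q = {x: f[x] for x in X}   # X ->> image, quotient by the kernel pair
    m = {i: i for i in image}  # image >-> Y, an injection
    return q, m

f = {0: 'a', 1: 'a', 2: 'b'}
q, m = image_factorisation({0, 1, 2}, f)
assert all(m[q[x]] == f[x] for x in f)    # f = q;m
assert set(q.values()) == set(m)          # q is surjective onto the image
```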
We would like to say that whenever the relevant finite limits and colimits exist, so does the image
factorisation into regular epis and monos, and also dually the co-image factorisation into epis and
regular monos. Unfortunately this is not so in general, but it is when the class of regular epis is closed
under pullback (Proposition 5.8.3). In any case we call ^⊥M-maps covers.
EXAMPLES 5.7.6
(a)
In a preorder all morphisms are both epi and mono, but only the isomorphisms are regular.
(Example 5.7.9 nevertheless gives a non-trivial prefactorisation system in a poset.)
(b)
If, as in Set by Corollary 5.2.7, all monos are regular, then the dual of this lemma shows that
epis and monos form a factorisation system. From Lemma 5.7.6(a) it follows that all epis are
regular.
(c)
A homomorphism of algebras for a single-sorted finitary algebraic theory L is regular epi in the
category Mod (L) iff it is surjective on its carriers, and mono iff it is injective (Exercise 5.38).
These classes form a factorisation system.
(d)
In CMon regular monos do not compose. Consider the submonoids U = ⟨3,5⟩ and V = ⟨3,5,7⟩ of N;
the inclusions U\hookrightarrow V and V\hookrightarrow N are regular monos but their
composite is not, because if f,g:N\rightrightarrows Θ agree at 3 and 5 then (as 2+5 = 2+3+2 = 5
+2) they do at 7 too.
(e)
Although any field homomorphism is mono (Example 5.1.5(f)), it is regular iff it is a separable extension. There are non-trivial epis, namely totally inseparable extensions, such as K = F_p(x) ↪ K[ᵖ√x] ( cf Example 3.8.15(j), and see [Coh77, Theorem 6.4.4]).
(f)
In Sp, continuous functions are epi or mono according as they are surjective or injective on
points, but are regular iff the topologies are the quotient or subspace ones. Both factorisations
arise.
(g)
Let E be the class of full functors which are bijective on objects, and M the class of faithful functors; these form a factorisation system on Cat.
(h)
Regular epis in Cat do not compose: (•→ •)\twoheadrightarrow N\twoheadrightarrow Z/3.
Instead of allowing all subsets to be in M, we may restrict to those that are closed in some sense
(Section 3.7).
(i)
In Sp, let M be the inclusions of closed sets in the topological sense. Then E is composed of the
continuous functions with dense image.
(j)
In Pos, let M be the inclusions of lower sets and E be the class of cofinal maps (Proposition
3.2.10).
Properties of factorisation systems Proposition 5.2.2 showed that the class of regular monos has
weaker cancellation properties than that of all monos. Although the diagonal map p is determined in
both cases by just one triangle, even this property is not typical. The other closure conditions satisfied by
epis and monos do hold for arbitrary factorisation systems, but to show this we must study orthogonality
more closely.
LEMMA 5.7.7
(a)
M ⊂ E^⊥ ⇔ E ⊂ ^⊥M, ie E ↦ E^⊥ and M ↦ ^⊥M form a Galois connection (Definition 3.6.1(b)) between classes of morphisms.
(b)
If i is invertible then e⊥i for any e ∈ E, so E^⊥ contains all of the isomorphisms, as does ^⊥M (they are replete, Definition 4.4.8(d)).
(c)
If i⊥i then i is invertible, so M∩^⊥M and E∩E^⊥ contain only isomorphisms.
(d)
If e⊥m1 and e⊥m2 then e⊥(m2;m1), so E^⊥ is closed under composition, and likewise ^⊥M.
(e)
If e⊥n and e⊥(m;n) then e⊥m, cf Proposition 5.2.2(h).
(f)
If e⊥m, and m′ is a pullback of m against any map, then e⊥m′.
(g)
The ⊥ relation is preserved and reflected by full and faithful functors.
PROOF: The relation e⊥m defines a Galois connection by Proposition 3.8.14; most of the rest is shown in
the diagrams.
(a)
If e;p = f and p;m2;m1 = z then p;m2 = p1 and p = p2, since the mediators (p, p1, p2) are required to be unique.
(b)
Any fill-in for the left-hand trapezium serves for the rectangle ( p), but z and p;m both serve as
the fill-in for the right-hand trapezium.
(c)
The fill-in for the rectangle gives one for the upper square by pullback and conversely by
composition. []
PROPOSITION 5.7.8 Any factorisation system is also a prefactorisation system, so its two classes are
closed under the above properties.
PROOF: It only remains to show that E^⊥ ⊂ M. Using the factorisation property, suppose that (e;m) ∈ E^⊥ with e ∈ E and m ∈ M. Then in particular e⊥(e;m) and e⊥m, so e⊥e and hence e is invertible, using Lemmas 5.7.6(e) and (c). By repleteness, (e;m) ∈ M. Similarly ^⊥M ⊂ E. []
Finding factorisations Given an arbitrary prefactorisation system (E,M), we can now try to factorise S-morphisms f:X→ Y as f = e;m with e ∈ E = ^⊥M and m ∈ M = E^⊥. Any M- or E-morphism through which f appropriately factors will contribute to this.
LEMMA 5.7.9 M is closed under wide pullbacks (Example 7.3.2(h)), ie arbitrary intersections in the case of monos. That is, for any wide pullback diagram in E^⊥, if the limit exists in S then its limiting cone lies in M, as does the mediator for any cone of M-maps. []
Similar results hold for wide pushouts in E. We have to impose algebraic and size conditions to ensure
that the limit for M and colimit for E exist, but they may still fail to meet in the middle.
The classes E and M consist of the marked arrows, together with all of the identities and a single
composite. The wide pullback of the M-maps into any object exists, as does the wide pushout of the E -
maps out, but the unmarked broken arrow does not factorise. []
The problem is that the map cannot be in E because there are distant M-maps to which it is not
orthogonal, but parallel translation using pullback which might bring it into M is not available.
LEMMA 5.7.11 Suppose that the pullback of any M-map against any S-map exists (and so is in M). Then to show e ∈ ^⊥M it suffices to test orthogonality with respect to z = id, ie
We need a solution-set condition such as that for the General Adjoint Functor Theorem 7.3.12 to show
that any prefactorisation system with sufficient pullbacks is a factorisation system (Exercise 7.34), so
we shall end this section with a special case.
PROPOSITION 5.7.12 Let S be a category such that there is a functor Sub:S^op → CSLat. Explicitly, S is well powered (Remark 5.2.5) and has arbitrary intersections of subobjects and inverse images, ie pullbacks of them along S-maps. For example S may be Set, Sp or any category of algebras. Then any prefactorisation system (E,M) such that all M-maps are mono is a factorisation system.
PROOF: The factorisation of f:X→ Θ is e;m where m:A\hookrightarrow Θ is the intersection of the M-
subobjects B\hookrightarrow Θ through which f factors (using Theorem 3.6.9). Then e ∈ E by
Lemma 5.7.10. []
5.8 Regular Categories
DEFINITION 5.8.1
(a)
If S has finite limits it is called a lex category.
(b)
If further the image factorisation exists (Lemma 5.7.4) and is stable under pullback then S is
called a regular category.
(c)
If further every congruence is the kernel pair of its coequaliser then S is said to be effective
regular or Barr- exact.
(d)
An AbGp-enriched category (Definition 5.4.3ff) which is also effective regular is known as an
Abelian category. Equivalent definitions are discussed in [FS90, Section 1.59].
(e)
A prelogos is a regular category with finite unions of subobjects which are stable under pullback
(inverse image).
(f)
A prelogos in which the inverse image operation between subobject lattices has a right adjoint is
called a logos.
(g)
By Theorem 3.6.9 any prelogos which has arbitrary stable unions of subobjects is a logos; we
call it locally complete.
(h)
A pretopos is an effective regular extensive category (Definition 5.5.3); in particular it is a
prelogos, but not necessarily a logos.
The original example of a regular category was AbGp, and in general Mod(L) for any finitary algebraic
theory L, by Theorem 5.6.9. From Section 5.6, these categories have all coequalisers, so there is some
ambiguity in the literature as to whether all coequalisers are required in the definition: after [FS90,
§1.52], we say they are not. Indeed in regular categories coequalisers of their kernel pairs are stable
under pullback, but even when they exist general coequalisers need not be stable.
EXAMPLES 5.8.2
(a)
Set is a pretopos by Example 5.5.6(a) and Proposition 5.6.8.
(b)
A poset is lex iff it is a semilattice, and then it is trivially effective regular. It is a prelogos iff it is a distributive lattice, and a logos iff it is a Heyting lattice (Definition 3.6.15). Conversely, in any prelogos or logos, every Sub_S(X) is a distributive or Heyting lattice respectively. Posets can [...]
(c)
Mod(L), where L is a finitary algebraic theory, is effective regular by Theorem 5.6.9. It also has
arbitrary colimits; filtered colimits (Example 7.3.2(j), in particular directed unions of subobjects)
are stable, but finite ones are usually not.
(d)
In particular Gp is effective regular. It also has all coequalisers, but the following one is not stable under pullback. omitted diagram environment The parallel pairs consist of the inclusion of the normal subgroup {id, (12)(34), (13)(24), (14)(23)} ⊂ A4.
(e)
Von Neumann regular rings, those satisfying ∀x.∃y. xyx = x, form a non-effective regular category.
(f)
Abelian groups, vector spaces and modules form Abelian categories.
(g)
The category of compact Hausdorff spaces and continuous functions is a pretopos, in which
directed unions and general coequalisers also exist but are not stable.
(h)
Equivalence relations in Pos are not effective: all three points in the quotient of {0 < 1 < 2} by the equivalence relation 0 ∼ 2 (with 1 related only to itself) are identified. The kernel pairs of monotone functions are convex (Example 3.7.5(c)).
Stable image factorisation To make Lemma 5.7.4 work as intended, we have to show that the class of
regular epis is closed under composition.
PROPOSITION 5.8.3 In a regular category the class of regular epis is closed under pullback and so is the E
class of the image factorisation.
PROOF: Suppose e:X→ Y and f:Y→ Z coequalise their kernel pairs, K\rightrightarrows X and L
\rightrightarrows Y respectively. Let M\rightrightarrows X be the kernel pair of X→ Y→ Z; comparing
commutative squares with these kernels ( cf Lemma 5.6.6(a)), there are mediators K→ M→ L.
Let X→ Θ be a map having equal composites with M\rightrightarrows X. Then the composites K→ M
\rightrightarrows X→ Θ are equal and give Y→ Θ. The required mediator Z→ Θ exists iff the
composites L\rightrightarrows Y→ Θ are equal.
We can achieve this by showing that the maps M→ P→ L are epi. Indeed, both parts are pullbacks of the
regular epi e:X\twoheadrightarrow Y (against split epis). []
The last part is sufficient but not necessary. The literature on this subject is confusing, because in certain
categories, one may show by slightly different arguments that regular epis compose, without being
stable under pullback. There is apparently no convenient necessary and sufficient condition for
composability of regular epis.
REMARK 5.8.4 There are four useful notions of surjectivity e:X\twoheadrightarrow Y in lex categories ( cf
Definition 5.2.1 for monos):
(a)
split epi: there is a map m:Y ↪ X with m;e = id_Y;
(b)
regular epi: e is the coequaliser of some pair, equivalently of its kernel pair;
(c)
cover or surjective: e⊥m for all monos m;
(d)
epi: for any f,g:Y\rightrightarrows Z, if e;f = e;g then f = g .
By Proposition 5.8.3, the middle two coincide in a regular category. Furthermore, every epi is regular in
Set, and indeed in any pretopos, but to say that every epi splits would be an internal form of the axiom
of choice (Exercise 1.38).
(e)
An object P with the lifting property on the left for every cover A\twoheadrightarrow B is called
projective . In a regular category, the right hand diagram shows that P is projective iff every
cover A′\twoheadrightarrow P splits.
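In Set every surjection between finite sets splits by simply choosing one preimage per element, which illustrates the internal axiom of choice mentioned above in its harmless finite form. A sketch with hypothetical names:

```python
def section(X, Y, e):
    """For a surjection e : X -> Y (a dict), build m : Y -> X with
    m;e = id_Y by choosing one preimage of each y; for finite sets
    this use of choice is entirely constructive."""
    m = {}
    for x in X:
        m.setdefault(e[x], x)   # keep the first preimage encountered
    assert set(m) == set(Y), "e is not surjective, so it cannot split"
    return m

e = {0: 'a', 1: 'a', 2: 'b'}
m = section({0, 1, 2}, {'a', 'b'}, e)
assert all(e[m[y]] == y for y in ('a', 'b'))   # m;e = id_Y
```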
Relations Logically, stability of image factorisation corresponds to the Frobenius law (Lemma 1.6.3).
It is also vital for relational calculus.
REMARK 5.8.5 The stable image interprets existential quantification, removing the uniqueness
requirement from Remark 5.2.9.
Stable unions of subobjects similarly express disjunction, where once again we have dropped uniqueness
from Remark 5.5.10 . The right adjoint in a logos gives the universal quantifier. In this way, existential,
coherent (∧, ∨, ∃) and first order logic may be interpreted in regular categories, prelogoses and logoses
respectively. We study these type-theoretically in Chapter IX, and just treat relational algebra here.
(a)
As in Proposition 5.3.5, the pullback below gives a subobject, abbreviated as (x R y S z) ⊂ X×Y×Z, of the three-fold product, but not necessarily a mono into X×Z. We use image factorisation to get one, as in the bottom row. omitted diagram environment
(b)
To say that the factorisation is stable means that the mono in the middle of the top row is
invertible; this was one of the two crucial steps in the proof of Lemma 4.1.3, the other being
similar.
(c)
Interchanging existential quantifiers does not cause any problem, because surjections compose
(but cf Lemma 5.7.4).
(d)
Composition of three particular relations may be associative without stability, as the two subobjects omitted array environment may coincide whilst still being proper.
(e)
However, if R is functional then the lower epi is invertible, so one subobject is total without hypothesis. Thus associativity in the case where R^op is functional and Z = 1 implies stability. []
PROPOSITION 5.8.7 In a regular category, relations
(a)
form another category under relational composition (id, R;S);
(b)
with the same source and target admit binary intersection (R∩S), which is associative, commutative and idempotent; we also define R ⊂ S ⇔ R = R∩S;
(c)
have monotone composition: R;(S∩T) ⊂ (R;S)∩(R;T);
(d)
have opposites, where R^op has the source and target interchanged, and satisfies omitted array environment
(e)
and also obey modularity: (R;S)∩T ⊂ (R∩(T;S^op));S. []
Freyd and Scedrov [FS90] call such a structure an allegory. They show how the logical connectives described above may be reformulated in terms of relations rather than functions. For example R is functional iff R^op;R ⊂ id, total iff id ⊂ R;R^op and confluent iff R^op;R ⊂ R;R^op.
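For finite relations represented as sets of pairs, these allegory operations and the functionality and totality tests are a few lines each. A sketch with hypothetical names:

```python
def compose(R, S):
    """relational composition R;S in diagrammatic order:
    x (R;S) z  iff  there is y with x R y and y S z"""
    return {(x, z) for (x, y) in R for (y2, z) in S if y == y2}

def op(R):
    """the opposite relation, with source and target interchanged"""
    return {(y, x) for (x, y) in R}

def sub_id(R):
    """test whether R is contained in the identity relation"""
    return all(x == y for (x, y) in R)

X, Y = {0, 1}, {'a', 'b'}
R = {(0, 'a'), (1, 'a')}                 # the graph of a function X -> Y
assert sub_id(compose(op(R), R))         # functional: R^op;R contained in id
assert {(x, x) for x in X} <= compose(R, op(R))   # total: id contained in R;R^op
assert not sub_id(compose(R, op(R)))     # but R^op is not functional
```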
Using this we can try to recover the category composed of functions from its allegory composed of
relations. Any R ⊂ id is the support of a partial function (Remark 5.3.6) and must be introduced as a
virtual object. Every allegory is thereby embedded in the category of relations of a regular category. The
utility of the technique, as we shall see in Theorem 6.4.19, is that allegories can sometimes be
constructed from one another more simply than the corresponding categories, and results deduced
without the extraneous objects.
Stable unions To make full use of relational algebra we need unions of possibly overlapping subsets.
In a pretopos (effective regular extensive category), unions of subobjects may be defined as quotients of
sums and are stable. First we consider the stable unions alone.
LEMMA 5.8.8
(a)
Image factorisation provides the left adjoint ∃f ⊣ f* to the inverse image map f*:Sub(Y)→ Sub(X) for f:X→ Y, so ∃f preserves unions by Proposition 3.6.8.
(b)
Relational composition distributes over stable unions.
(c)
In a logos there are adjoints (-);R\dashv (-)/R and R;(-)\dashv R\(-), which both reduce to Heyting
implication R⇒ (-) if R ⊂ ∆. These satisfy R;(R\S) ⊂ S and (S/R);R ⊂ S. The analogous structure
for relations is called a division allegory.
The slash notation, suggesting division, is due to Joachim Lambek, who used it first in linguistics
[Lam58] and later for modules for rings [Lam89]. (Don't confuse division with \ for subset difference,
Example 2.1.6(e).) We shall use it to study transitive closures in the proof of Lemma 6.4.9.
and conversely since f*U is a pullback. Composition is given by the image of a pullback, and both
operations preserve unions or have adjoints as appropriate. []
The elementary properties of Set are already beginning to emerge in a prelogos. The next result is
known as the Pasting Lemma as it is needed to combine partial functions, for example in Lemma
6.3.12.
LEMMA 5.8.9 Let U,V ⊂ X and suppose that product distributes over union. Then the diagram on the left,
which is a pullback by construction, is also a pushout.
PROOF: Recall that a relation R ↪ X×Y is the graph of a function iff its composite with π0:X×Y→ X is an isomorphism. Applying this to f, g and h in the diagram on the right, the graph of f is isomorphic to U, etc. Then the union of the graphs of f and of g, formed as a subobject of X×Y, is isomorphic to U∪V ⊂ X by distributivity. Hence it is the graph of a function U∪V→ Y. []
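In Set the Pasting Lemma is the familiar fact that functions agreeing on the intersection of their domains glue to a function on the union. A minimal sketch (names hypothetical):

```python
def paste(f, g):
    """Glue f : U -> Y and g : V -> Y (dicts) to a single function on
    the union of U and V, provided they agree on the intersection
    (the pullback condition of the lemma)."""
    for x in f.keys() & g.keys():
        assert f[x] == g[x], "f and g disagree on the intersection"
    return {**f, **g}

f = {0: 'a', 1: 'b'}
g = {1: 'b', 2: 'c'}
assert paste(f, g) == {0: 'a', 1: 'b', 2: 'c'}
```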
PROPOSITION 5.8.10 Let m:X ↪ Y and X ↪ Z be monos in a pretopos.
(a)
Then [...]
5.9 Exercises V
1. Show that any group or groupoid, considered as a category, has pullbacks, cf Exercise 4.17.
[Hint: division.]
2. (Bob Paré) In the diagram on the left, suppose that s;u = id_Y and u;s;v = v;s;v. Show that (s;v) is idempotent (Definition 1.3.12), so (assuming that idempotents split in the category, cf Exercise 4.16) we may let s;v = q;m with m;q = id_Q. Show that u;q = v;q and that q is the coequaliser of u and v.
It is an absolute coequaliser in the sense that (being equationally defined) it is preserved by any functor.
Barry Jay observed that (u,v):X ⇉ Y describes a binary ``reduction'' relation on Y with a normal form (defined by s). omitted diagram environment
3. Similarly, in the diagram on the right, suppose that the four pairs are split (s;u = id_W, etc) and that the four squares N→ S, S→ N, W→ E and E→ W commute. Show that S is the pushout of u and u′. In fact S is the absolute coequaliser of u and v = u′;s′;u.
Some of these conditions are redundant. Let e and e′ be idempotents on any object N such that e;e′;e = e′;e. Then in the Karoubi completion (Exercise 4.16), the objects e′;e and e′;e;e′ are isomorphic quotients (but not isomorphic subobjects) of N and provide the absolute pushout.
4. Show that if S has a terminal object then there is an isomorphism S↓1 ≡ S, and conversely that id_X is the terminal object of S↓X.
6. Suppose that a;u = id_Z, b;q = id_X, a′;u′ = id_Y and the two lower squares are pullbacks: omitted diagram environment Show that the upper square commutes, and is also a pullback. [Hint: show that a;b;u′;q′ = id and use Lemma 5.1.2.]
7. Investigate pullbacks and equalisers in the categories Pfn, IPO, Pos\dashv and Rel.
8. Let C be a category with pullbacks (equalisers), and ℑ any small category. Show that the functor category C^ℑ also has pullbacks (equalisers), constructed pointwise. [Hint: cf Lemma 3.5.7.]
9. Let Pbk be the category whose objects are small categories with binary pullbacks and whose
morphisms are pullback-preserving functors. Show that Pbk is cartesian closed, the morphisms
of the functor category being cartesian transformations, ie those natural transformations such
that the square in Definition 4.8.1 is a pullback ( cf Exercise 4.51).
10. (Giuseppe Rosolini) A PER (partial equivalence relation) is a symmetric transitive binary
relation on N. A PER R names an object, which we may think of more concretely as the set N/R
of equivalence classes under R; these are disjoint, but their union need not be N, as we have not
required R to be reflexive.
(a)
Considering f,g ∈ N as Gödel numbers for partial recursive programs N\rightharpoonup
N, write f[R→ S]g if f and g induce the same function N/R→ N/S, and this is total.
Formulate this condition directly in terms of the binary relations R and S and show how to
define a cartesian closed category with exponential [R→ S].
(b)
Let f Σ g be the equivalence relation ``f terminates iff g does,'' where these are programs
which run without being given any input (they have type 1\rightharpoonup 1). Define
partial maps by relaxing the totality requirement on PER-maps, and show that Σ is a
support classifier.
11. Show that a mono in Pos or Dcpo is regular iff it is full. Find a subdcpo of the lazy natural
numbers whose Scott topology (Definition 3.4.7ff) is not the subspace topology.
12. Construct the map Σ→ Ω in Definition 5.2.10 and explain why it is mono and is a semilattice
homomorphism.
13. Let C be any small category and X ∈ obC. A class R ⊂ morC of generalised elements of X, ie maps a:Γ→ X, is called a sieve or crible on X if it is closed under precomposition with any u:∆→ Γ. Show that a sieve on X is exactly a subobject of H_X in Set^{C^op}. Let Ω(X) be the set (indeed complete lattice) of sieves on X, and write T_X ∈ Ω(X) for the sieve of all generalised elements. Show how to make Ω:C^op → Set into a functor and T:1→ Ω a natural transformation, and that this is the subobject classifier of Set^{C^op}.
14. Formulate what it is to be a category in Gpd (the category of groupoids and homomorphisms).
Show how any small category (in the ordinary sense) can be regarded as such an internal
category, in such a way that it is skeletal there ( cf Exercise 4.37), in a sense to be defined.
15. (Giuseppe Rosolini, [RR88]) Given partial maps f:Z ⇀ X and g:Z ⇀ Y, show how to define ⟨f,g⟩:Z ⇀ X×Y and hence × as an endofunctor of P(S,M). [Hint: take the intersection of their supports.] Although the symbol × no longer denotes the categorical product, show that the projection π0 ≡ p_{X,Y}:X×Y→ X is natural in X but that the corresponding square for Y need not commute but involves an inequality. The diagonal d_X:X→ X×X [...] q_{X,-}:X×(-)→ id, satisfying these laws and the Mac Lane-Kelly laws for associativity [...] and identify S and M within it. Finally, show that there is a full embedding into P(S,M) such that × restricts to the categorical product in S.
16. Similarly characterise the product structure using the category composed of relations, and, given
an allegory satisfying your conditions, show how to embed it in a category of relations
(Proposition 5.8.7).
17. Show how the Floyd rules (Remark 4.3.5) define the category of virtual objects (contexts with
midconditions) in Remark 5.3.2 .
18. By giving the matrices for the (co)projections, (co)diagonals and (co)pairing, show that
(a)
R^{n+m} is the product of R^n and R^m in Vsp and CRng;
(b)
it is also their coproduct in Vsp;
(c)
R^{n×m} is their coproduct in CRng.
19. Show that if N^n ≡ N^m in CMon then n = m.
20. Let C be a CMon-enriched category, ie each hom-set C(X,Y) is a commutative monoid (written
using 0 and +) and the composition (;):C(X, Y)xC(Y,Z)→ C(X,Z) is a monoid homomorphism.
Show that the terminal object (if any) is also initial and any product also carries a coproduct
structure.
21. Conversely, show that in any category with a zero object (Definition 5.4.3), the zero map is
preserved by composition on either side with any map. Suppose further that binary products and
coproducts exist and [π0;ν0, π1;ν1 ]:Xx Y ≡ X+Y. Show that the category is CMon-enriched .
22. Show that Rel ≡ Rel^op and that the product and coproduct are both given by disjoint union. What happens in CSLat and SLat? Describe the CMon-enriched structures.
23. Use van Kampen's theorem to show that the fundamental group of the circle S¹ is Z. [Hint: U and V are open intervals and W is the disjoint union of two open intervals.]
24. Let \typeM0 and \typeM1 be modules for the commutative rings \typeR0 and \typeR1
respectively. Show that their coproduct in the category of rings-with-modules is the module
(\typeM0⊗\typeR1)⊕(\typeR0⊗\typeM1) for the ring \typeR0⊗\typeR1.
25. (Only if you know some hyperbolic geometry.) Considering the matrices ((0,−1),(1,1)) and ((0,1),(−1,0)) as elements of PSL(2,Z), show that it is the coproduct of the groups Z/(3) and Z/(2).
26. Explain how the maps (Γ×Y)+(Γ×N)→ Γ×(Y+N) given in Definition 5.5.1 arise from the universal properties of + and ×, and why they are equal. Show that they are invertible in a cartesian closed category. Do the same for X^{Y+Z} ≡ X^Y×X^Z and (X^Y)^Z ≡ X^{Y×Z}. Show that these isomorphisms are natural in all three variables, and relate them to Exercise 1.22, Remark 2.3.11.
27. Show that this square is a pushout in any category which has coproducts, and that it is also a
pullback in an extensive category: omitted diagram environment In other words, if two sets Y′ = X
+Y and Z′ = X+Z have a common decidable subset X then, within their union, they intersect only
in X. Deduce Proposition 5.8.10 for decidable subsets. Although all four maps are monos, the
mediator from the pushout to a commuting square of monos need not be mono, since it may
identify some y ∈ Y and z ∈ Z.
28. The following similar property holds in Pos. Suppose X ⊂ Y′ and X ⊂ Z′ are full subposets (and
the underlying sets are decidable subsets). Then Y′ and Z′ are full subposets of the union,
intersecting only in X. Deduce that it holds for embeddings (Example 3.6.13(b)).
29. Let A,B ⊂ X be ideals of a distributive lattice and f :A→ Θ, g :B→ Θ be ∨- semilattice
homomorphisms which agree on A∩B. Show that if a∨b = a′∨b′ then f (a)∨g (b) = f (a′)∨g (b′),
where a,a′ ∈ A and b,b′ ∈ B. Deduce that, with C = {a∨b|a ∈ A,b ∈ B} ⊂ X, there is a unique ∨-
semilattice homomorphism p:C→ Θ which restricts to f and g .
30. Characterise finite products and coproducts in Loc. [NB: The two-element lattice {⊥,T} is not
complete without excluded middle (Example 3.2.5(h)); Example 3.9.10(e) gave the product in
Loc .]
31. [...] α∧β = ⊥ and α∨β = T. Similarly partitions in CRng^op are given by pairs with α² = α, β² = β, αβ = 0, α+β = 1. Hence show that Loc and CRng^op are extensive.
32. Show that binary coproducts commute with wide pullbacks in an extensive category. Does the
analogous property hold for distributive lattices? Let L be a disjunctive theory; show that Mod
(L) has wide pullbacks.
33. Show how to swap the values of two variables using the direct declarative language with
assignment. Using this and conditionals, write a program to sort three objects with respect to a
given trichotomous order relation. How many comparisons are needed in the worst case?
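Exercise 33 can be sketched in Python; this is a hypothetical illustration (the direct declarative language is modelled here by assignment and swapping), with the given trichotomous order passed as a comparison function `lt`. Three comparisons are needed in the worst case, since 3! = 6 outcomes cannot be distinguished by only two binary tests.

```python
def sort3(x, y, z, lt):
    """Sort three values with respect to a trichotomous order lt(a, b),
    using only conditionals and swaps; at most 3 comparisons."""
    if lt(y, x):
        x, y = y, x          # comparison 1: order the first pair
    if lt(z, y):
        y, z = z, y          # comparison 2: z is now the maximum
    if lt(y, x):
        x, y = y, x          # comparison 3: needed only in the worst case
    return (x, y, z)
```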
34. Show that the coproduct in S extends to a functor on P(S,M), where it remains the coproduct.
It is not stable disjoint there: explain why we do not want it to be.
35. Let S be a distributive category with a class M of supports to which yes,no: 1 ⇉ 2
belong. Suppose that, for each pair of objects, there are partial maps σ₀: Y+N ⇀ N and
σ₁: Y+N ⇀ Y such that νᵢ;σⱼ = id if i = j and ⊥ otherwise. Show that S is extensive.
36. Variant records of type Y+N are typically implemented by coding the elements ν0(n) and ν1(y) as
(0,0,n) and (1,y,0) ∈ 2×Y×N respectively, where 0 is a ``dummy value'' of type Y and N. Formulate the
midcondition φ which defines this subobject ( cf Example 2.1.7). Making clear where extensivity
is used, show that the virtual object is indeed the coproduct Y+N, and define the switch,
satisfying the rules of Remark 2.3.10. Modify this construction to remove the assumption that the
unused field is initialised to the dummy value. Generalise it to the case where the components are
themselves (possibly uninhabited) virtual objects; the dummy value now belongs to the
underlying real objects but not necessarily to the virtual ones.
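As a hedged illustration of the coding in Exercise 36 (the names, the dummy values and the encoding as Python tuples are ours, not the book's):

```python
# Elements nu0(n) and nu1(y) of Y+N stored as (tag, y, n) in 2 x Y x N,
# with a dummy value in the unused field.
DUMMY_Y, DUMMY_N = "dummy_y", 0   # hypothetical dummy values of type Y and N

def nu0(n):
    return (0, DUMMY_Y, n)        # left injection: tag 0, real N field

def nu1(y):
    return (1, y, DUMMY_N)        # right injection: tag 1, real Y field

def phi(rec):
    """Midcondition defining the virtual object Y+N inside 2 x Y x N:
    the unused field must hold the dummy value."""
    tag, y, n = rec
    return (tag == 0 and y == DUMMY_Y) or (tag == 1 and n == DUMMY_N)

def switch(rec, if0, if1):
    """Case analysis on the virtual coproduct ( cf Remark 2.3.10)."""
    tag, y, n = rec
    return if0(n) if tag == 0 else if1(y)
```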
37. Show that the pullback of a split epi against an arbitrary map is another split epi. Conversely,
show that every split epi is a pullback of its kernel projection, if the kernel pair exists.
38. Let L be a single-sorted algebraic theory. Show that f:A→ B is a mono in Mod(L) iff it is an
injective function, and a regular epi iff it is surjective. [Hint: Propositions 5.2.2(a) and 5.6.8.]
39. Characterise the inverse images of ⊥ and T under homomorphisms in DLat and of 0 and 1 in
CRng.
41. Prove the analogues of Theorem 5.6.9 for CSLat and Frm, and show how to use a closure
operator on an object (not its square) to code kernel pairs, analogously to Lemma 5.6.14. [Hint:
cf Theorem 3.9.9.]
42. Let X×X → Ω be the characteristic map of an equivalence relation K on X in Set. What is the
image factorisation of its exponential transpose, X ↠ I ↪ Ω^X?
43. How do you recover the kernel pair of e:A \twoheadrightarrow Q from the subset Q ⊂ A in
Lemma 5.6.14?
44. Explain the relationship between factorisation systems and the so-called isomorphism theorems
for groups, rings and vector spaces.
45. By Proposition 3.8.14, the orthogonality relation e⊥m in Definition 5.7.1 gives rise to a
specialisation order. Investigate its relationship to the category, and what it means to have enough
epis or monos.
46. Suppose that the rectangle is a pullback and contains a mono and a stable surjection as shown.
Show that the two squares are pullbacks. omitted diagram environment [Hint: form the pullback
of Y\twoheadrightarrow Q against Γ→ Q ; this result holds for any stable factorisation, not just
the image one.]
47. Let M be closed under pullback, and m, n ∈ M. Prove the following tighter form of Lemma 5.7.6
(e): that if e⊥(m;n) and e⊥n then e⊥m ( cf Exercise 9.27).
48. Let E be the class of functors which are full and essentially surjective, and M those that are
faithful (Definition 4.4.8). Show that this is a factorisation system in Cat.
49. Let U be a relation. Explain what categorical structure is needed to show that U∪∆ and U∪U^op
are respectively its reflexive and symmetric closures, and that U∪∆ is confluent if U is
functional.
50. Show that, in a pretopos, the (stable) union U∪V ⊂ X is computed as the image of U+V→ X.
51. In a pretopos, show that the pushout of a mono against an arbitrary map is again a mono, and that
the square is also a pullback. Deduce that if the opposite category has equalisers then it is regular.
52. Let (• →f •) ↠ N ↠ Z/(3) in Cat, the first functor being called e. In Proposition 5.8.3, find L
and discuss how K, P and M might be defined, noting that e is neither replete nor full. [Hint: ob K
≡ 2+4×N in Definition 7.3.8.]
(a)
Given f,g,b,s,q,c with the right square a pullback, suppose that f*b and g*b exist, a priori
with different vertices V and W. Show that a,h,k may be chosen so that the two left
squares are pullbacks.
(b)
In a regular category, suppose that the q and s are coequalisers, (h,k) and (f,g) are kernel
pairs, and that the two left squares are pullbacks. Show that the right hand square is also a
pullback. [You have to be pretty good at diagram-chasing to do this.]
(c)
In Set, suppose again that the left squares are pullbacks and the rows are arbitrary
coequalisers. Assume also that c = id, and show by induction on zig-zags that b is
surjective.
(d)
For general c, deduce that Y ↠ X×_Q S in the right-hand square.
(e)
With S = Q = 1, U = X = 2, V = Y = 6, find maps in Set satisfying these conditions, so b
need not be an isomorphism.
The reason for the difficulty is that the fibres of b are carried isomorphically to one another by
the transitions. If these have (unoriented) cycles, these isomorphisms may give an automorphism
of a single fibre. This can't happen if b is mono or U\rightrightarrows X is functional and acyclic.
Chapter 6
Structural Recursion
Foundations must be built on a sound layer of ballast. The edifice of mathematics rests on calculi of
expressions composed of symbols written on paper or coded in a computer. To make a mathematical
study of this edifice itself, or of the ways of handling expressions mechanically, we must consider the
completed infinities of formulae, such as N. Since these completed infinities have no existence as
objects in the real world, but only as characters in the drama of mathematics, this kind of study
(metamathematics) must be made within a pre-existing mathematical context. This is why we are doing it in
Chapter VI and not Chapter I.
Mathematically, the algebras of formulae are free for certain free theories, whose operations are the
connectives and proof rules of logic and type theory. In the category-theoretic tradition, a free algebra is
the initial object in the category of all algebras. However, the very general nature of universal properties
- the fact that this mode of description applies to so many other mathematical phenomena, as we have
seen - means that this does not give a ``hands on'' appreciation of term algebras.
In formal languages, terms cannot be equal unless they have been built in the same way. Since they are
put together with connectives such as +, ∧ and ∃, they form an algebra, but, unlike the arithmetical
examples which motivated Section 4.6, they can also be taken apart - parsed by analysis into cases. This
leads to the techniques of structural recursion, which we have been using from the first page of this
book, and unification, the subject of Section 6.5. We shall characterise, and thereby construct, the free
algebras for free theories, by the property that the parsing operation is well defined and well founded, ie
that it terminates (Section 6.3).
Section 6.4 specialises to tail recursion, which can be implemented more efficiently using imperative
while programs. The interpretation in this case is described in terms of coequalisers, from the previous
chapter.
The properties of term algebras also hold for infinitary operations. More careful consideration, indeed,
shows that notions of induction are needed to capture finiteness, and not vice versa (Section 6.6).
The mathematisation of symbolism also gives us the means to create new mathematical worlds inside
existing ones, for example in a model of set theory or in a topos of sheaves. In this book we have been
careful to make clear which logical connectives are needed for each application, rather than rely globally
on higher order logic: cartesian closed categories are used in functional programming, and pretoposes
serve many of the purposes of universal algebra. Certain other structure is needed in the meta-logic to
express or construct these features, so for example we might treat the λ-calculus symbolically using
algebraic tools. Category theory, unlike symbolic logic, is well adapted to the parallel treatment of
different fragments, because it can both represent logical structure very naturally and also be
represented in a clear combinatorial way.
Set theory tried to deal with both the internal and external structure, though it was particularly ill suited
to universal algebra. The inductive hierarchy of sets, on which mathematics was allegedly founded, is
defined using the quantifiers, and Lemma 6.3.12 shows that it also relies on the Pasting Lemma 5.8.9.
Nevertheless, set theory does throw some light on induction: Section 6.3 uses category theory to
generalise some of its ideas and apply them to new inductive situations.
One reason for the difficulty in understanding set-theoretic induction was that the ordinals, as
traditionally presented, depend heavily on excluded middle. My original draft of this chapter contained
material from set theory which was embarrassingly old fashioned. I tried to develop a simple
intuitionistic account of the ordinals; such an account now exists in [JM95], [Tay96a] and [Tay96b],
and is summarised at the end of the chapter, though it is hardly simple. But the motivating problem was
solved in the meantime in a more elementary way (Exercise 3.45).
The algebraic theories for which we are able to study recursion do not have laws. The reason was in
Example 3.7.9(f), which showed that a system \triangleright of closure conditions arises from a (well
founded) relation \prec iff for each element t there is exactly one set K of preconditions (K\triangleright
t). We shall provide free models for infinitary free theories in this section, and also give some
applications of them. As for induction on closure conditions, it is a distortion of the theory to restrict the
arities to be finite in advance.
We considered algebraic theories in general in Section 4.6, allowing laws and many sorts. But we
insisted there that the arities were finite, since Theorem 5.6.9 fails for infinitary operations. The results
of this chapter are still needed and useful, as the means whereby the algebras are constructed in Zermelo
type theory. We shall develop this technique in Section 7.4, also using it to find colimits of algebras.
Infinitary algebraic theories without laws What do we mean by infinite? In classical mathematics,
for any set K, either it can be finitely enumerated, or any attempt to do so is non- terminating and we
have N\hookrightarrow K. Cantor and his followers developed transfinite numbers (ordinals, Section
6.7) to extend the counting idea. Constructively, this argument does not apply, and the enumeration may
fail for many other reasons. For example the set in Exercise 2.16 with two ``overlapping'' elements is not
finite in the strong sense required by Theorem 5.6.9. (We discuss finiteness in Section 6.6.)
Describing the arities and carrier as infinite simply means that we do not restrict them to being concrete
enumerations. (Nor need the arities be ordinals or carry any other special structure.) In this setting we
can treat internal theories, in which the collection of operation- symbols need no longer be {0,1,+,-,x,÷}
or {⊥,T,∧,∨,⇒ ,\lnot } but can be a type Ω in the object-language. For example Ω may be a topological
space or an abstract data type. The arities ar[r] of the operation- symbols may also be internal objects
instead of numbers such as 0 and 2.
For our purpose the arities must nevertheless be specified in advance of the models. The theory of
complete semilattices - with operations of arbitrarily large arity - is therefore excluded, even though it
happens to have free models (Proposition 3.2.7(b)). The apparently similar theory of complete Boolean
algebras has no free algebra on N [Joh82, pp. 33-34].
DEFINITION 6.1.1 A free algebraic theory is given by a set Ω of operation-symbols together with an
assignment to each r ∈ Ω of a set ar[r], called the arity of r. The disjoint union κ = ∐_{r∈Ω} ar[r]
is called the rank of the theory.
The aim is to construct and study the free model, which is also known as an absolutely free algebra. It is
no loss of generality to consider only the free model without generators, ie the initial object of the
category Mod(Ω) of algebras and homomorphisms. If we require the algebra generated by a set G, we
consider instead the theory Ω′ = Ω+G, where ar[x] = ∅ for x ∈ G ( cf Remark 3.7.6 for closure
operations).
As in Definition 4.6.2, a model or algebra of (Ω,ar) is a set A together with a multiplication table
of operations

r_A: A^{ar[r]} → A,  one for each r ∈ Ω.

As there is only one sort, these may be combined into one function

ev_A: TA ≡ ∐_{r∈Ω} A^{ar[r]} → A,

called the structure map, as shown on the right. As there are no laws, any such map defines an algebra ( cf Proposition 7.5.3(b) for
monads).
Similarly a homomorphism f:A→ B is a function making the square on the left commute for each
operation-symbol r ∈ Ω:
These conditions too may be combined, as the single diagram on the right. Compare the role of the
functor T here with that of the product functor in the diagram in Definition 4.6.2. Notice that Ω ≡ T1,
and ar[r] can also be recovered (Exercise 7.37), but we shall often forget that T has a power series
expansion, assuming only that it preserves monos, their inverse images and arbitrary intersections. The
lattice analogue of the functor T was given in Lemma 3.7.10, based on closure conditions.
This is quite a different notation from that in which algebraic theories were presented in Definition 4.6.1,
so before reading any further you should convince yourself that we have merely presented the relevant
data in a more concise form. We shall restore the sorts in Proposition 6.2.6, the generators in Section
6.5 and the laws in Section 7.4.
DEFINITION 6.1.2 Even though the arities are general types rather than numbers, it is useful to retain the
notation a⃗ for a typical element of A^{ar[r]}, and a_j for the co-ordinate a⃗(j), where j ∈ ar[r].
The algebra A is equationally free if

r_A(u⃗) = s_A(v⃗)  ⇒  r ≡ s ∧ ∀j:ar[r]. u_j = v_j,

and parsable if ev_A is an isomorphism, so every u ∈ A is r(v⃗) for some unique r ∈ Ω and
v⃗ ∈ A^{ar[r]}. Compare Example 3.7.9(f), where we needed ∀t.∃!K. K ▷ t. Note that algebraic
theories with laws usually have no equationally free models.
EXAMPLE 6.1.3 Let Ω = {0,s} with ar[0] = ∅ and ar[s] = {∗}. Any Ω-algebra satisfies the first two of
the Peano axioms for the natural numbers (Definition 2.7.1). The third and fourth axioms make it
equationally free ( cf Exercise 2.47). N+Z with the obvious structure is parsable but not initial, since it
fails Peano's fifth axiom. []
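As an illustration of Example 6.1.3, here is a minimal Python sketch (the nested-tuple representation and the names are ours): `ev` is the structure map, `parse` exhibits parsability, and `to_int` is the unique homomorphism to the standard algebra on N, computed by structural recursion.

```python
# Terms of the free theory Omega = {"0", "s"}, ar["0"] = {}, ar["s"] = {*}.
# A term is a nested tuple ("s", t) or ("0",).

def ev(op, args):
    """Structure map: apply an operation-symbol to its arguments."""
    return (op,) + tuple(args)

def parse(term):
    """Partial inverse of ev: recover the unique (op, args) pair."""
    return term[0], term[1:]

def to_int(term):
    """Structural recursion: the unique homomorphism to (N, 0, +1)."""
    op, args = parse(term)
    return 0 if op == "0" else 1 + to_int(args[0])

three = ev("s", [ev("s", [ev("s", [ev("0", [])])])])
```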
The following well known properties of initial T-algebras, due to Lambek and to Lehman and Smyth,
give a taste of the rest of this chapter.
(a)
Let ev: TA → A be any algebra. Then Tev: T²A → TA is also an algebra and ev: TA → A is a homomorphism.
(b)
Let F be the initial algebra. Then it is parsable, ie ev: TF ≅ F;
(c)
and any T-subalgebra U ⊂ F is the whole of F. (Out of classical habit, we say that ``F has no
proper subalgebra.'')
(d)
If T preserves monos, any subalgebra U ⊂ A of an equationally free algebra is also equationally free,
(e)
but if an equationally free algebra A has no proper subalgebra then it is parsable.
PROOF: [a] Obvious. [b] From (a), since F is initial, there is a unique homomorphism p:F→ TF. Then p;
ev:F→ F is an endomorphism of the initial algebra, so by uniqueness p;ev = id. But as p is a
homomorphism, ev;p = Tp;Tev = T(p;ev) = id, so p = ev⁻¹. [c] Similarly, p;m = id_F, but m is mono, so U ≅ F
[LS81, §5.2]. [d] Cancellation of monos, which T preserves. [e] TA is a subalgebra by (a). []
Given any equationally free algebra, we obtain one which is parsable as the intersection of all
subalgebras, by the Adjoint Function Theorem 3.6.9. We shall show in Section 6.3 that this is the initial
algebra.
Natural numbers Being the free algebra for the functor (-)+1 captures the recursive properties of N, as
set out in Remark 2.7.7.
DEFINITION 6.1.5 The diagram on the left below displays the data for any algebra (Θ, z_Θ, s_Θ) and
the homomorphism p:N→ Θ for the Peano operations. The second diagram re-expresses this in terms of
the functor T = (-)+1.
This universal property was identified by Bill Lawvere (1963), and such a structure is called a natural
numbers object or simply an NNO.
Since N has a universal property, it is unique. But beware that this relies on second order logic (the
induction scheme): there are non- standard structures which share all of the first order properties of N
(Remark 2.8.1). Example 6.4.13 gives another characterisation of N.
REMARK 6.1.6 The Lawvere property provides the unique solution for any primitive recursion problem
of the form
In a cartesian closed category the parametric problem may be reduced to the simple one by putting Θ′ =
Θ^Γ, s′: g ↦ λx⃗.s(x⃗, g(x⃗)) and p′ the exponential transpose of p (Remark 4.7.8). Without these
exponentials, Γ is essential to make the definition invariant under substitution. Similarly the target
algebra for recursion over a general free theory must be expressed as

ev: Γ×TΘ ≡ Γ×∐_r Θ^{ar[r]} ≅ ∐_r Γ×Θ^{ar[r]} → Θ,
where the sums have to be stable (Section 5.5) as shown. For a general abstract functor T to admit
parametric recursion, additional structure, known as a strength, is needed; see Exercise 6.23. The
polynomial functors arising from free theories and the other functors over which we shall consider
recursion all admit this structure in an obvious way.
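The primitive recursion scheme of Remark 6.1.6 can be sketched in Python, with the parameter Γ (here `gamma`) carried explicitly; the names are ours, and the loop replaces the universal property by its standard unfolding.

```python
def primrec(z, s):
    """Return the unique p with
        p(gamma, 0)     = z(gamma)
        p(gamma, n + 1) = s(gamma, n, p(gamma, n)),
    the parametric primitive recursion problem of Remark 6.1.6."""
    def p(gamma, n):
        acc = z(gamma)
        for k in range(n):        # iterative unfolding of the scheme
            acc = s(gamma, k, acc)
        return acc
    return p

# Addition as a primitive recursion: add(m, 0) = m, add(m, n+1) = add(m, n) + 1.
add = primrec(lambda g: g, lambda g, k, r: r + 1)
```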
EXAMPLE 6.1.7 The recursive argument itself may also occur as a parameter, for example in the factorial
function, (n+1)! = (n+1)·n! ( cf Example 2.7.8).
Exercises 2.46, 6.24 and 6.25 show how to handle this case using pairs.
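The pairing trick of those exercises can be sketched in Python (an illustration, not the book's own construction): simple recursion computes the pair (n, n!), from which the factorial is projected.

```python
def fact(n):
    """Factorial via simple primitive recursion on pairs: the recursive
    argument n is smuggled in as the first component of the pair."""
    pair = (0, 1)                         # (0, 0!)
    for _ in range(n):
        k, acc = pair
        pair = (k + 1, (k + 1) * acc)     # s(k, k!) = (k+1, (k+1) * k!)
    return pair[1]
```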
Finally, the theory (Ω,ar) may be parametric, but we shall not consider this possibility in this book. In
fact we shall usually omit the parameters Γ and N as well.
Lists For any set (alphabet) G there is a free theory of lists, in which the set of operations is Ω = G
+{ε}, with ar[x] = 1 for x ∈ G and ar[ε] = ∅, so TA = {ε}+G×A (Definition 2.7.2ff). Here G and Ω are
general (internal) types, and do not have to be concrete enumerations of symbols. The case G = {s}
gives the natural numbers.
In a cartesian closed category, using N we can construct equationally free algebras for the theory of lists,
and Exercise 6.10 shows (also using pullbacks) that all finitary free theories have free algebras.
PROOF: Put A ≡ ({∗}+G)^N.
This is the set of streams, in which the symbol ∗ is being used to indicate the end of a (finite) list. More
categorically,

{ε}+G×A ↪ ({∗}+G)×A ≅ ({∗}+G)^{{0}+N} ↪ ({∗}+G)^N,

ie ev: TA ↪ A, for any N such that N ↠ {0}+N. []
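A finite Python sketch of this embedding (truncating streams to a fixed length; names and truncation are ours): a list over G is coded as a stream over {∗}+G, padded with the end-marker ∗.

```python
STOP = "*"          # the end-of-list marker, the element of {*}
DEPTH = 16          # streams truncated to finite length for illustration

def empty():
    """The stream coding the empty list: all end-markers."""
    return (STOP,) * DEPTH

def cons(x, stream):
    """ev on the G x A summand: prepend a head letter to a stream."""
    return ((x,) + stream)[:DEPTH]

def to_list(stream):
    """Read letters back up to the first end-marker."""
    out = []
    for c in stream:
        if c == STOP:
            break
        out.append(c)
    return out
```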
Infinitary conjunction and disjunction Using infinitary algebraic theories we can now define M
\vDash φ, the validity of a formula φ of the predicate calculus in a model M, in the way sketched in
Remark 1.6.12. The meaning of the quantifiers is defined, not by the proof rules, but by infinitary
conjunction or disjunction of its instances in M. Note that this model is chosen before defining the
infinitary theory below.
EXAMPLE 6.1.9 Let L be a collection of sorts U and relation-symbols ρ. The (closed raw) formulae of
first order predicate calculus (Definition 1.4.1) over L form the free algebra for the free theory with
Ω = {T,⊥,∧,∨,⇒} + {ρ_{u⃗}} + {∃_U, ∀_U}.

The last term adds two copies of the set Σ of sorts to Ω. The set of symbols of the form ρ_{u⃗}
contributed to Ω depends on the model M in which we aim to interpret L, specifically the sets [[U]]
denoted by the sorts U: ρ_{u⃗} ranges over the instances of each relation-symbol at each tuple in these
sets. Thus if there are, for example, relation-symbols of arities U² and V×W in the theory then a summand
[[U]]²+[[V]]×[[W]] is included in Ω. (The language and model may also have functions, but these do no
more than add synonyms for the instances of the relations.) The quantifier symbols have arities

ar[∃_U] = ar[∀_U] = [[U]],

which also depend on M. The role played before by variable-binding is now taken by the infinite
arities. We are interested, not in the free Ω-algebra, but in the particular structure ev: TΩ → Ω
(Notation 2.8.2) for which

ev(∃_U, φ⃗) = ⋁_{u ∈ [[U]]} φ_u    and    ev(∀_U, φ⃗) = ⋀_{u ∈ [[U]]} φ_u,

and each nullary operation ρ_{u⃗} is a propositional constant [[ρu⃗]] ∈ Ω (which is again
prescribed by the model M). The constants T and ⊥ and binary operation-symbols ∧, ∨ and ⇒ have the
usual meanings in Ω.
Note that φ⃗ is not a single formula with a free variable, but a U-indexed tuple of elements of Ω (a
function U→ Ω). Then a formula φ is valid in M if its value in this algebra (calculated, as always with
expressions in algebras, by structural recursion) is T. From this we may say what it means for M to obey
certain first order axioms, or to satisfy some other property, as in Remark 1.6.13.
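For a finite model the evaluation just described really is structural recursion over finite conjunctions and disjunctions. The following Python sketch illustrates it (the encoding and names are ours; instances of relation-symbols are assumed already evaluated to booleans by the model).

```python
def valid(phi, universe):
    """Evaluate a closed formula by structural recursion.
    phi is ("const", b) for an instance of a relation-symbol,
    ("and", p, q), ("or", p, q), ("implies", p, q), or
    ("forall", f) / ("exists", f) with f a function on the universe."""
    op = phi[0]
    if op == "const":
        return phi[1]
    if op == "and":
        return valid(phi[1], universe) and valid(phi[2], universe)
    if op == "or":
        return valid(phi[1], universe) or valid(phi[2], universe)
    if op == "implies":
        return (not valid(phi[1], universe)) or valid(phi[2], universe)
    if op == "forall":                      # conjunction of instances
        return all(valid(phi[1](u), universe) for u in universe)
    if op == "exists":                      # disjunction of instances
        return any(valid(phi[1](u), universe) for u in universe)
    raise ValueError(op)
```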
Proofs Γ\vdash φ in the predicate calculus also form a free algebra, whose operation-symbols are named
by the proof rules (but the formulation of this algebra is complicated by pattern matching and side-
conditions which we shall discuss in the next section). By structural induction on the proof, we may
show that if M\vDash Γ then M\vDashφ, ie the interpretation is sound. For this structural induction, it is
only necessary to verify the soundness of each proof rule individually. []
EXAMPLE 6.1.10 The same conjunctive interpretation, in which r(φ⃗) = ⋀_j φ_j in Ω for every
operation-symbol r ∈ Ω, is also the basis of strictness analysis. Instead of treating the data types in the
program as sets or domains and the values as elements, the (base) types are all interpreted as Ω and the
constructors as conjunction. The program may then be simplified to a conjunction of some of its inputs,
namely those that need to be evaluated in order to execute the program. This subset may be found
mechanically by the compiler, which may then detect which arguments actually need to be evaluated. []
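A toy sketch of this conjunctive interpretation (the encoding is ours; real strictness analysers must also handle conditionals and recursion): every operation-symbol is interpreted as the conjunction of its arguments, so a term denotes a conjunction of the inputs that must be evaluated.

```python
def strict_in(expr):
    """Abstract interpretation over Omega = bool: each operation-symbol
    becomes the conjunction of its arguments, so a term denotes the set
    of inputs it is strict in.
    expr is ("var", x) for an input, else (r, arg1, ..., argk)."""
    if expr[0] == "var":
        return {expr[1]}
    sets = [strict_in(a) for a in expr[1:]]
    # empty conjunction is T: a nullary symbol needs no inputs
    return set().union(*sets) if sets else set()
```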
Existence of equationally free models The following construction is applicable to any free theory; see
Example 6.2.7 for the finitary case.
PROOF: Let κ = ∐_r ar[r] be the rank and A = P(List(κ)×Ω) be the set of sets of lists of odd length.
r
Such lists begin and end with an operation-symbol; each such symbol r (except the last) is followed by a
position j ∈ ar[r] in its arity. In particular, a nullary operation-symbol can only occur at the end of the
list.
The structure map ev_A satisfies

u_j = {l | [r,j];l ∈ ev_A(r, u⃗)}.
The idea of this construction is that the terms are (infinitely branching) trees, and are determined by the
set of paths through them from the root. Imagine a term being processed by a program; at any moment it
is at a certain point in the tree, with the path stored on its stack, ie as a list. Corresponding to the root
there is an operation-symbol, \opr0, with a co-ordinate \numj0 ∈ ar[\opr0]; the next stage is a similar pair
(\opr1,\numj1) with \numj1 ∈ ar[\opr1] and so on. At the last stage (which is the top of the stack or the
head of the list) we have only an operation-symbol \opr without any specified co-ordinate. Otherwise
n
we would not be able to handle the nullary operations, without which the free algebra would be empty.
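The path representation can be illustrated in Python for finitary terms (using the nested-tuple encoding of terms; the names are ours): a term is determined by its set of root paths, each alternating operation-symbols with argument positions and ending at a symbol.

```python
def paths(term):
    """The set of paths from the root of a term, as tuples alternating
    operation-symbols with the argument position taken at each step.
    term is a nested tuple (op, arg1, ..., argk)."""
    op, args = term[0], term[1:]
    result = {(op,)}                       # the empty path ends at the root symbol
    for j, sub in enumerate(args):
        for p in paths(sub):
            result.add((op, j) + p)        # descend into argument position j
    return result
```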
It still remains to show that the minimal equationally free T-algebra is initial, but we defer this to
Section 6.3, devoting the next section to further, more complicated, examples.
6.2 Well Formed Formulae
In practice, additional side-conditions are required of the terms which are to be admitted to the language.
Some of these, such as the number of arguments taken by each operation-symbol, can be enforced in
advance, but others must be stated by simultaneous recursion together with the expressions themselves.
The terms which do satisfy the conditions are traditionally known as wffs ( well formed formulae).
DEFINITION 6.2.1 A wff-system is a set X of terms for a free theory (Ω,ar) such that if r(u⃗) ∈ X
then u_j ∈ X for all j:ar[r]. Therefore

parse: X ↪ TX

is a total function on X, and is injective (since ev is a partial inverse).
Nonsense results if instead we admit expressions with ill formed sub-terms, for example the assertion
that `` the unicorn is the author of the Principia'' (page 1.2.11). As with pred(0) (Remark 2.7.9), the cost
of trying to make all operations total is a proliferation of exceptions to the rules of inference. These are
easily overlooked in complex situations, leading to errors in programs which are extremely hard to track
down.
Recursive covers Having enumerated the raw terms (wffs), we impose laws to equate them. When
arguing inductively, the ideal situation is that the laws be oriented, ie presented as reduction rules, and
these be confluent and strongly normalising. The surjection in Definition 6.2.2 then has a (canonical)
splitting, the semantic values being identified with the class of normal forms. This class may itself have
a recursive characterisation: for example Exercise 2.23 showed that normal λ-terms are hereditarily of
the form λx⃗.y u⃗ ( ie each sub-term u_j is also normal, and of this form). We can regard this as a
(finer) notion of well-formedness, with useful inductive consequences of its own ( eg Remark 7.6.12ff).
Failing this, finitary algebraic theories handle laws using quotients by congruences (Sections 5.6
and 7.4), whilst the properties of adjunctions take care of most of the infinitary theories of interest ( eg
Lemma 5.6.14). These methods tend to interact notoriously badly with recursion.
Section 7.6 turns the construction of linguistic structures on its head, in the search for a language which
exactly matches a given semantic structure, for example a λ-calculus which is equivalent to a given
cartesian closed category. New symbols such as \qq X are added to the language for each entity in the
semantics, but then terms in the language have equal meanings because this is given in the semantics,
rather than because there is a symbolic proof of this fact.
Either way round, the reason for considering raw terms is that they admit structural recursion.
DEFINITION 6.2.2 A recursive cover of a ``semantic'' set A is a wff-system X (``syntax,'' with a well
founded ``sub-expression'' relation \prec ), together with a surjective function p:X\twoheadrightarrow A.
(Since X ⊂ F, A is a subquotient of the free algebra F if the latter exists.)
EXAMPLES 6.2.3
(a)
To solve Rubik's cube (Example 4.2.3) we must find a list of moves which restores a
semantically given position to the home one. We need to split the surjection F = List(6)
\twoheadrightarrow A, where A is the group.
(b)
The reflexive-transitive closure ≤ of a relation < is a subquotient of List( < ) (Exercise 3.60).
(c)
The finite powerset P_f(X) is a quotient of List(X) (Definition 6.6.9).
(d)
Let A be the free algebra for a finitary theory L and F the free algebra for the free theory which
consists of the operation- symbols of L alone, forgetting the sorts and laws. We construct A in
Section 7.4 as the quotient by the laws of the subset X ⊂ F of terms obeying the type discipline.
(e)
Let Ω be the rules of logic. The class Θ of formulae is an Ω-algebra, of which the class A ⊂ Θ of
formulae true in some class of models is a subalgebra ( cf Remark 3.7.7 and Example 6.1.9).
The free algebra F consists of proofs and the homomorphism F\rightharpoonup Θ sends each
proof to its conclusion. If F\twoheadrightarrow A then all true formulae are provable.
(f)
Let C be a cartesian closed category (Section 4.7) and Θ = mor C. Let F be the class of raw λ-
terms in the canonical language of C (Section 7.6) and p:F \rightharpoonup Θ the interpretation
function. Its image A ⊂ Θ is the class of λ-definable functions.
(g)
The same, where F is the class of normal λ-terms.
(h)
PERs (Exercise 5.10).
(i)
(Lothar Collatz, 1930s) Let f:N→ N by

f(n) = n/2 if n is even,  f(n) = (3n+1)/2 if n is odd;

it is an open problem whether ∀n.∃m.f^m(n) = 1.
Putting a(m) = 2m and b(m) = (2m-1)/3 (defined only when m ≡ 2 mod 3), there is a partial function p:
List({a,b}) ⇀ N with p([ ]) = 1, p(cons(a,l)) = a(p(l)) and p(cons(b,l)) = b(p(l)). The
Collatz problem asks whether p is surjective.
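The partial function p can be sketched directly in Python (assuming the ``shortcut'' reading of f, under which a and b are the two inverse steps; the encoding of words as Python lists is ours):

```python
def p(word):
    """Partial surjection List({a, b}) -> N from the Collatz problem:
    p([]) = 1, p(a :: l) = 2 * p(l), p(b :: l) = (2 * p(l) - 1) / 3.
    Returns None where b is undefined (p(l) not congruent to 2 mod 3)."""
    n = 1
    for move in reversed(word):   # cons acts at the head, so fold from the right
        if move == "a":
            n = 2 * n
        else:
            m = 2 * n - 1
            if m % 3 != 0:
                return None       # b(n) is only defined when n = 2 mod 3
            n = m // 3
    return n
```

The Collatz conjecture amounts to every natural number appearing as some `p(word)`.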
So both in the Collatz problem and in logic, the image of one recursive structure in another may be an
intractably intricate maze, cf Gödel's Incompleteness Theorem 9.6.2.
The semantics inherits a weak induction principle from the syntax. (For a stronger result see
Exercise 3.54.)
PROPOSITION 6.2.4 Let (X,≺) be well founded and let p: X ↠ A be a surjective partial
function. Then for predicates φ[a],

   ∀t:X. (∀s ≺ t. φ[p(s)]) ⇒ φ[p(t)]
   ─────────────────────────────────
   ∀a:A. φ[a]

PROOF: The rule is simply the induction scheme for φ∘p: from ∀t.φ[p(t)] we deduce ∀a.φ[a] by
surjectivity. []
Variables At the lowest level of linguistic analysis, we use variables, operation-symbols and
punctuation, and it must be decidable whether any two such things are the same. This is obvious for
marks on the page or bits in a computer, but the development of Section 6.1 was intended also to apply
to internal structures, where the alphabets such as Ω are themselves types, and to address the issues
raised in Theorem 4.2.12.
EXAMPLE 6.2.5 The variable names (or generators, as we call them in this chapter) in a context have to
be distinct, so they should come from a population G with decidable equality. Our convention is that
they be explicit, but we need some elementary manipulations:
(a)
Example 2.7.6(e) tests whether the name x is in the list Γ;
(b)
Exercise 6.50 chooses new names from a population G, providing the inexhaustible supply of
variables needed for substitution under a λ (Definition 1.1.9ff);
(c)
Simple type theory (Section 2.3) is the free algebra Σ for the theory with three operation-symbols
(x, + and → ) over the base types.
In the same fashion we can define the set FV(t) of free variables of a term in algebra, the λ-calculus and
logic and prove the Extended Substitution Lemma (Proposition 1.1.12). The unification algorithm
(Section 6.5) also depends on parsing and the ability to distinguish between operation-symbols and
generators.
Many-sorted theories The type discipline is obeyed (globally) iff the (local) side-conditions on the
formation of each operation-symbol are satisfied hereditarily (Definition 2.5.4). As the type of a term is
that of its outermost symbol, the validity of the typing rules may be expressed by a function F→ ΩxΣ,
where Σ is the class of types. If the operation-symbols have finite arity and are distinguishable from one
another, we may instead define F→ Σ+{∗} , the extra element being an error value.
PROOF: Let (Ω,ar) be the corresponding untyped theory and F its free algebra. From the type discipline
of Notation 1.3.3,

r(u⃗) ∈ F is well formed and of sort V iff the operation r has arity ∏_{j ∈ ar[r]} U_j ⊢ V
and, for each j ∈ ar[r], u_j is well formed and of sort U_j.

Then the carrier F_U of the free typed algebra at sort U is the set of well formed terms of sort U.
If A is any algebra for the typed theory then Θ = ∐_U A_U is a partial algebra for the untyped
one, and then Theorem 6.3.13 gives the unique solution p: X ⇀ Θ of the recursion
equation. By induction, if u:F is well formed and of type U then p(u) is defined and p(u) ∈ A_U ⊂ Θ. []
U
The notion of type in this result is just accountancy: it need not be semantic. For example it may simply
be the number of terms in a list.
EXAMPLE 6.2.7 A word l = [r₁,…,r_k] ∈ List(Ω) is a well formed sequence of n terms over a finitary theory ar:Ω→ N iff k − ∑_{i=1}^k ar[r_i] = n and every nonempty suffix s of l satisfies |s| − ∑_{r ∈ s} ar[r] ≥ 1.
Those words which are single terms (n = 1) form the free algebra for the free finitary theory. This way of forming expressions, for example (1+2)×(3+4) is written [×,+,1,2,+,3,4], is called Polish notation; it is due to Jan Łukasiewicz (1920s). It is used by compilers in reversed form (ie with operations after their arguments) for the evaluation of arithmetic expressions using a stack (Exercise 6.8).
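As a concrete illustration, the arity-counting test for Polish notation and the stack evaluation of its reversed form can be sketched in Python. The arity table and the use of integers for constants are assumptions of this sketch, not data from the text.

```python
# Sketch of Polish-notation parsing for a finitary theory ar: Omega -> N.
ar = {'+': 2, 'x': 2}  # operation-symbols with their arities (illustrative)

def terms_in(word):
    """Number n of terms a Polish-notation word denotes, or None if ill formed.

    Reading right to left, a count of completed sub-terms is maintained;
    each symbol r consumes ar[r] of them and contributes one back.
    """
    count = 0
    for sym in reversed(word):
        need = ar.get(sym, 0)        # generators/constants have arity 0
        if count < need:
            return None              # not enough sub-terms: ill formed
        count = count - need + 1
    return count

def eval_rpn(word):
    """Evaluate reverse Polish notation (operations after arguments) with a stack."""
    ops = {'+': lambda a, b: a + b, 'x': lambda a, b: a * b}
    stack = []
    for sym in word:
        if sym in ops:
            b, a = stack.pop(), stack.pop()
            stack.append(ops[sym](a, b))
        else:
            stack.append(sym)        # a numeral: push it
    (result,) = stack                # a single term leaves exactly one value
    return result

# (1+2)x(3+4) in Polish notation, as in the text:
assert terms_in(['x', '+', 1, 2, '+', 3, 4]) == 1
# and in reversed (stack) form:
assert eval_rpn([1, 2, '+', 3, 4, '+', 'x']) == 21
```

Reading right to left makes the prefix form behave exactly like the compiler's postfix stack discipline.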
THEOREM 6.2.8
(a)
Every directed graph src,tgt:E \rightrightarrows O generates a free category.
(b)
Let C be any internal category and |C| its underlying graph. Then the internal graph
homomorphism \qqdash names objects and arrows of C as types and unary operation-symbols of
L(|C|), and hence as objects and arrows of \Clone ( ). Conversely, the internal functor [[-]]
L |C|
p≡
\Clone \twoheadrightarrow
L |C|
( )
[[-]]
However, we saw in the previous section that there are many applications in which the required structure
is not the initial algebra - consisting of all terms - but a subset of ``well formed formulae.'' Wff-systems
also have a direct computational meaning: they consist of the hereditary sub-arguments that are actually
generated in the course of a particular recursive calculation. So they measure the stack space that it
needs, and if an execution goes beyond the largest feasible wff-system then it overflows. This approach
can also be used when T is a functor such as the powerset which has no initial algebra.
We shall characterise wff-systems as extensional well founded parse-coalgebras, using a new, categorical, definition of well-foundedness.
The proof of the main theorem - that induction implies recursion - works by pasting together attempts
(partial solutions); it is similar to the fixed point theorem (Proposition 3.3.11), but without the need for
Scott-continuity. In the infinitary case we want to reapply T after taking the colimit over infinitely many
steps. Transfinite iteration can be done using ordinals (Section 6.7), but the soundness of that technique
relies on the result we're about to prove.
Instead of doing this tedious and repetitive job ourselves, imagine (after John Conway) that we have a
class of servants, who each do what they can before getting tired. Classically, we ask which servant
claims to be the most (or maximally) hard-working and, by finding his shortcomings, show how he
might have done better. Intuitionistically, the result is obtained by co-operation. It uses second order
logic.
Behind this proof is the idea that wff-systems, like attempts, are built up from the empty set by iterating
the functor T. The collection of all such coalgebras generalises the von Neumann hierarchy in set
theory; the general recursion theorem also comes from that tradition, where it was originally stated for
the ordinals. As in set theory, the hierarchy exists even when the initial algebra doesn't.
REMARK 6.3.1 In this section, T:Set→ Set will denote any functor that preserves monos and inverse image diagrams, and therefore partial functions and their composition and order (Section 5.3). Extra structure is needed to handle parameters (Exercise 6.23). The functors P, P_f, List and ∑_{r∈Ω} (−)^{ar[r]} have these properties. []
Using various such functors, the following development is not absolutely restricted to free theories.
Laws of certain forms can be handled, such as commutativity (permutation of maybe infinitely many
sub-terms) and idempotence (in the sense of the law r(x,x) = r(x), again infinitarily). Adding these to the
theory of lists, the list-forming operations such as cons and append make sets instead.
One can also generalise to a class of supports M (Definition 5.2.10) in categories other than Set with
certain completeness conditions [Tay96b].
Well founded coalgebras Definition 6.2.1 captures the feature which was common to the examples of
that section, using parse rather than ev.
DEFINITION 6.3.2 Suppose that whenever the diagram above is a pullback, m is in fact an isomorphism; then we call X a well founded coalgebra. Proposition 6.3.10 and Theorem 6.3.13 are examples of arguments which use this as an idiom of induction.
Exercise 3.42 shows how this new notion of well-foundedness, reduced to its lattice form, relates to induction for well founded relations and for closure conditions (Definitions 2.5.3 and 3.7.8).
EXAMPLE 6.3.3 A coalgebra for the covariant powerset functor defines a binary relation ≺ by u ≺ t if u ∈ parse(t), so parse(t) = {u | u ≺ t}. Well-foundedness agrees with the old sense (Definition 2.5.3) because
H ≡ {(t,V) | parse(t) = V ⊂ U} ≡ {t | ∀u. u ≺ t ⇒ u ∈ U}
is the induction hypothesis, the inclusion H ⊂ U is the induction premise ( cf H = U in Example 3.8.11(a) and Proposition 6.1.4(c)), and U = X is the conclusion.
Extensionality,
(∀w. w ≺ u ⇔ w ≺ v) ⇒ u = v,
holds iff parse is mono. Recall that parse is also mono in a wff-system.
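For a finite coalgebra for the covariant powerset functor, this characterisation can be tested directly by iterating the induction hypothesis from the empty set; the dict encoding of parse is an assumption of this sketch.

```python
def is_well_founded(parse):
    """Test well-foundedness of a finite P-coalgebra parse: X -> PX.

    Iterate U := {t | forall u in parse(t), u in U} starting from U = {}
    until it stabilises; the coalgebra is well founded iff the limit is X.
    """
    X = set(parse)
    U = set()
    while True:
        new = {t for t in X if all(u in U for u in parse[t])}
        if new == U:
            return U == X
        U = new

# u < t iff u in parse(t); a well founded example, then a loop:
assert is_well_founded({'a': set(), 'b': {'a'}, 'c': {'a', 'b'}})
assert not is_well_founded({'a': {'a'}})
```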
REMARK 6.3.4 Let κ:T→ P be a cartesian transformation, ie a natural transformation whose naturality squares are pullbacks. Then parse:X→ TX is a well founded T-coalgebra iff the composite parse;κ_X:X→ PX is a well founded P-coalgebra.
There is a natural transformation κ:T→ P such that the naturality squares with respect to monos are pullbacks iff T preserves arbitrary intersections (as our examples do). Then κ_X:TX→ P(X) by
κ_X(τ) = {v | ∀U ⊂ X. τ ∈ TU ⇒ v ∈ U},
and we call the relation v ≺ t ⇔ v ∈ κ_X(parse(t)) the immediate sub-expression relation. For a free algebraic theory, τ = (r,u⃗) and t = r(u⃗) for some unique r:Ω and u⃗:X^{ar[r]}. Then v is an immediate sub-expression of t iff v = u_j for some j:ar[r] (consider U = {u_j | j:ar[r]}).
j
From the point of view of induction (though not algebra) this allows us to identify each expression with
its set of immediate sub-expressions, and to do so hereditarily. Certain ideas from set theory now
become useful, where for the purposes of universal algebra in the previous chapter they were a nuisance.
REMARK 6.3.5 [Gerhard Osius [Osi74]] Inclusion Y ⊂ X between wff-systems, or between sets in the set-theoretic sense, is characterised by an (injective) coalgebra homomorphism, ie a function f:Y→ X making the square on the left commute.
First notice that the inequality parse_Y;P(f) ⊂ f;parse_X holds iff f is strictly monotone (Definition 2.6.1) with respect to the associated ≺_X and ≺_Y. Then the square commutes iff the lifting property holds. An inclusion Y ⊂ X has this property iff it is closed downwards with respect to ≺; we call it an initial segment, cf Definition 2.6.5. As in Remark 6.3.4, if there is a cartesian transformation κ:T→ P then initial segments for T and P agree. The unit of the extensional reflection (Example 7.1.6(g)), ie the map linking a well founded structure to its Mostowski collapse, is a surjective coalgebra homomorphism.
COROLLARY 6.3.6 If there is a coalgebra homomorphism f:Y→ X (or an initial segment) with X well
founded then Y is well founded too.
PROOF: By Proposition 2.6.2 or Exercise 3.54. There is also a direct categorical proof [Tay96b], in which f* ≡ [f^op] (Remark 3.8.13(b)) is applied to the subset V ⊂ Y testing well-foundedness of Y. []
The recursion scheme Recall from Definition 2.5.1 that the three phases of the recursive paradigm say
that p = parse;Tp;ev.
We use partial functions with the extension order \sqsubseteq (Definition 3.3.3).
DEFINITION 6.3.8 [Osius] Let parse_X:X→ TX be a coalgebra and ev_Θ:TΘ\rightharpoonup Θ a (partial) algebra. Then p:X\rightharpoonup Θ is an attempt if p \sqsubseteq parse_X;Tp;ev_Θ, and satisfies the recursion equation if these are equal as partial functions, ie have the same support.
The coalgebra obeys the recursion scheme if for every such (Θ,ev_Θ) there is a unique p:X\rightharpoonup Θ with p = parse_X;Tp;ev_Θ. In particular, if parse_X = ev_X^{−1} and ev_Θ is total, this is the familiar recursion scheme for an initial algebra.
This is a very convenient diagrammatic form in which to present recursive programs, as we illustrate in
Example 6.4.7 and Exercises 6.27ff.
LEMMA 6.3.9 Any partial attempt p:X\rightharpoonup Θ is given by a total one p′ ≡ (i;p):Y→ Θ, the support i:Y\hookrightarrow X being an initial segment.
If f:Z→ Y is another coalgebra homomorphism then f;p:Z→ Θ (restriction along f) also satisfies the
recursion equation.
PROPOSITION 6.3.10 If parse_Z:Z→ TZ is well founded then there is at most one total function p:Z→ Θ satisfying the recursion equation.
PROOF: Suppose p,q:Z\rightrightarrows Θ both satisfy it. Then the two parallel rectangles on the right commute since p and q are total attempts. Let e:E\hookrightarrow Z be the equaliser of p and q, and form the pullback H.
The composites H\rightrightarrows TΘ are equal by construction, and j is mono by hypothesis. Hence the composites H\hookrightarrow Z\rightrightarrows Θ are equal, and H\hookrightarrow Z factors through the equaliser, so e:E ≡ Z by well-foundedness of Z, whence p = q. []
Notice that once again we have uniqueness before existence (page 1.3.1).
REMARK 6.3.11 Using the conjunctive interpretation (Example 6.1.10), well-foundedness is also necessary for uniqueness. For T = P, Θ = Ω and ev = ∧ = ∀, the recursion equation reduces to
p[t] ⇔ ∀u. u ≺ t ⇒ p[u],
which is the strict induction premise (Definition 2.5.4). But the constant function p:t↦ ⊤ also satisfies this equation, so uniqueness of p is equivalent to the induction scheme. See also Exercise 6.14. []
This result should be treated with circumspection: taking Θ = Ω means that we are using higher order
logic (a point which is obscured classically, where Ω = 2). Induction for the second order predicate φ[x]
≡ (x\not \prec x) shows that well founded relations in this sense are irreflexive, and so are too clumsy to
analyse fixed points of iteration. By closer examination of the carrier and structure of the intended target
of recursion, maybe we can restrict the class of subsets (predicates) to those that need to be considered,
and thereby get a weaker notion of well-foundedness which admits more source structures X but remains
sufficient for recursion.
The general recursion theorem It remains to show that well-foundedness is sufficient for recursion.
There is a zero attempt, with ∅ as support, and we now describe the successor.
LEMMA 6.3.12 Let parse:X→ TX be a coalgebra and p:X\rightharpoonup Θ be an attempt. Then Tparse:TX→ T²X is also a coalgebra, and (Tp;ev):TX\rightharpoonup Θ and p \sqsubseteq q ≡ (parse;Tp;ev):X\rightharpoonup Θ are attempts.
Note that this is a diagram of partial functions: Remark 6.3.1 says that T acts on such diagrams. []
In the case of a free theory the successor attempt takes r(u⃗), with u_i = s_i(v⃗_i), to ev(r, ⟨ev(s_i, ⟨p(v_ij)⟩)⟩), where i ranges over ar[r] and j over ar[s_i]. Sammy Eilenberg, one of the founders of category theory, but whose main work was in subscript-ridden homological algebra ( cf Exercise 4.15), commented at a seminar in 1962 that ``If you define it right, you won't need a subscript.''
PROOF (directedness of attempts): Let Y₀,Y₁ ⊂ X be the supports of p₀ and p₁, so Y₀ and Y₁ are initial segments of X (Lemma 6.3.9). By Lemma 5.8.9, the union Y = Y₀∪Y₁ ⊂ X is the pushout over the intersection Z = Y₀∩Y₁. These are also initial segments: the structure map of Z mediates to TY₀∩TY₁ = T(Y₀∩Y₁), and that of Y from the pushout. Then Y and Z are well founded by Corollary 6.3.6. The restrictions of p₀ and p₁ to Z satisfy the recursion equation (Lemma 6.3.9), so agree by Proposition 6.3.10. Hence we may form the union p:Y→ Θ of the partial functions as a pushout mediator, and p = parse_Y;Tp;ev because the right hand side also mediates. []
THEOREM 6.3.13 Let X be a well founded coalgebra and Θ a partial algebra. Then there is a greatest attempt p:X\rightharpoonup Θ, and this satisfies the recursion equation, p = parse_X;Tp;ev. If Θ is total then so is p.
PROOF: Attempts X\rightharpoonup Θ form an ipo, which we have just shown to be directed as well, so let p:X\rightharpoonup Θ be the greatest one (by the Adjoint Function Theorem 3.6.9), with support Y. As parse_X;Tp;ev is also an attempt (Lemma 6.3.12), by maximality we have p = parse_X;Tp;ev.
Suppose now that Θ is total. Form the pullback H, with parse_H = l;Tk and q = l;Tp;ev. Then k;q = k;l;Tp;ev = p and parse_H;Tq;ev = q. So q:H→ Θ is an attempt, but p:Y→ Θ is the greatest such, so H = Y, but then Y = X by well-foundedness. []
In applications such as Remark 6.2.10 and Proposition 6.2.6, we need the homomorphism p to be total on the hypothesis that ev_Θ is defined ``whenever it needs to be.'' It is still necessary to use methods of proof by induction, but now the issue is that the steps be well defined (partial correctness) rather than that the whole process terminate.
6.4 Tail Recursion and Loop Programs
DEFINITION 6.4.1 A recursive program is a tail recursion if
(a)
there is at most one sub-argument to each recursive call, and
(b)
its sub-result is immediately passed out as that of the whole program (in particular the original argument is not needed).
Operationally, the sub-argument s(x) may be assigned to the variable x, and the continuation z from the
(most deeply) nested call is just that from the main program, so the functional idiom of tail recursion
translates directly into the imperative while program
DEFINITION 6.4.2 The simple imperative language extends the conditional declarative language of
Definition 5.3.7 by the \bnfname program
EXAMPLE 6.4.3 The following program computes the highest common factor (y) of two integers a,b ∈ Z by Euclid's algorithm:
x:= a; y:= b; while x ≠ 0 do (x,y):= (y mod x, x) od
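A minimal executable rendering of Euclid's algorithm as a while program (the variable roles, with the loop exiting at x = 0, are one conventional choice):

```python
def hcf(a, b):
    """Highest common factor by Euclid's algorithm as a while program.

    The loop variable x strictly decreases in absolute value (the loop
    measure), so the loop terminates; on exit x == 0 and y is the answer.
    """
    x, y = a, b
    while x != 0:
        x, y = y % x, x   # tail recursion: re-assign and continue
    return y

assert hcf(12, 18) == 6
assert hcf(0, 5) == 5
```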
REMARK 6.4.4 Notice that both parts of Definition 6.4.1 are needed for the translation into a while
program. The definition of the factorial function, read as a program, is not tail-recursive because the
argument must be saved for the multiplication with the sub-result. Unary recursion (satisfying just the
first condition) can, however, be translated into tail recursion by using an accumulator (Exercise 6.26).
It is unlikely that there is such a uniform way of reducing the arity of a recursion, though it can sometimes be done. For example, the Fibonacci function is defined by the binary recursion fib(n+2) = fib(n+1)+fib(n) with fib(0) = 0 and fib(1) = 1, but may be computed by a unary (indeed tail) recursion on pairs of adjacent values.
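The accumulator translation of unary recursion, and the reduction of the binary Fibonacci recursion to a unary one on pairs, can be sketched as follows (illustrative Python, not from the text):

```python
def fact(n):
    """Factorial: the unary recursion translated, via an accumulator,
    into the while program that the tail-recursive form denotes."""
    acc = 1
    while n > 0:
        n, acc = n - 1, acc * n
    return acc

def fib(n):
    """Binary Fibonacci recursion reduced to a unary (tail) one on pairs."""
    a, b = 0, 1           # invariant: (a, b) == (fib(k), fib(k+1))
    for _ in range(n):
        a, b = b, a + b
    return a

assert fact(5) == 120
assert [fib(k) for k in range(7)] == [0, 1, 1, 2, 3, 5, 8]
```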
Semantics Fixed points in order structures (Sections 3.3- 3.5) give the semantic treatment of recursion
which is perhaps the best known, but they take no notice of the recursive paradigm, let alone the special
case of tail recursion. After setting up notation like that in Section 5.3, we shall use various categorical
techniques to axiomatise the naive understanding of iteration. There are several threads in our treatment,
describing the data and results in terms of partial functions, relational algebra, recursion for a functor as
in the previous section, and the categorical structure of Section 5.8, so you should feel free to skip some
parts.
Let X = Y+N be the partition induced by the loop condition and S ⊂ Y ⊂ X the support of the partial
function which represents the body s of the loop, so S∩N = ∅.
Together with the code Γ\rightharpoonup X and N\rightharpoonupΘ before and after the loop, the whole
program is illustrated by the staircase diagram, in which the S step may be repeated any number of
times. The letters of course stand for successor and zero, but notice that they count backwards:
comparing tail recursion with Remark 2.7.7, s is the predecessor. (This is the reason for using the letter
z for continuations.) Because of the exit condition, N is the target of the partial function W which
represents the whole loop.
defines a partial map parse:X\rightharpoonup X+N, which is a coalgebra for the functor T = (-)+N. The
algebra is ev ≡ [id,z]:Θ+N→ Θ, or just the codiagonal ∇:N+N→ N with no continuation z:N→ Θ after
the loop. The two conditions for tail recursion are respectively that the functor and algebra be of these
forms; Exercise 6.26 deals with arbitrary algebras ev ≡ [a,z]:Θ×X+N→ Θ. As the functor is always the
same, we drop the use of T for it and use this letter for something else.
Recall from Lemma 6.3.9 that a partial attempt X\rightharpoonup Θ is a total attempt on a subcoalgebra, X\hookleftarrow W→ Θ. The rectangle on the right below says that \nearrow = if c then s;\nearrow else z fi.
The semicolons arise from the relational algebra which we shall use.
EXAMPLE 6.4.7 For the Euclidean algorithm N = {0}×Z\hookrightarrow X = Z×Z and we define parse:X→ TX = (Z×Z)+Z by parse(x,y) = (y mod x, x) for x ≠ 0 and parse(0,y) = y.
The loop terminates since |x| ∈ N, the loop measure, strictly decreases ( cf Remark 2.5.13,
Proposition 2.6.2 and Corollary 6.3.6). []
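The staircase evaluation of any tail recursion, following the coalgebra parse:X→ X+N until it lands in N, can be written once and for all. The tagged-pair encoding of the coproduct X+N is an assumption of this sketch.

```python
def run(parse, x):
    """Evaluate a tail recursion: follow the coalgebra parse: X -> X + N
    until it exits into N.  Values are tagged ('cont', x) or ('exit', n)."""
    tag, val = parse(x)
    while tag == 'cont':        # the S step, repeated any number of times
        tag, val = parse(val)
    return val                  # landed in N

# Euclid's algorithm as in Example 6.4.7: X = Z x Z, N = {0} x Z ~ Z.
def euclid_parse(state):
    x, y = state
    return ('exit', y) if x == 0 else ('cont', (y % x, x))

assert run(euclid_parse, (12, 18)) == 6
```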
REMARK 6.4.8 Barry Jay observed that any map f:X→ Θ with m;f = s;f is invariant in the strict sense
that its value is restored after each iteration, so it is the same when the loop terminates (if it does) as at
the beginning. Such a map factors through the coequaliser Q, which Jay called the universal loop
invariant.
The correctness of while loops is always shown by finding an appropriate invariant. Indeed every
competent programmer writes ``the state of the variables at the point of the loop test is ...'' or similar.
This is vital, as the commonest error is to be out by 1 in an array suffix.
However, the established usage of the term loop invariant is for a predicate (so Θ = Ω) such that, if it
holds before execution of the body then it holds afterwards. In other words, f may become valid when it
had not been before, and the converse implication is not relevant. (The heredity operation, Definition
2.5.4, turns such a lax invariant into one in Jay's strict form, cf Theorem 3.8.11(a) and Exercise 6.2.) The
coequaliser must also be modified to account for the exit condition N.
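Jay's strict invariant can be illustrated on Euclid's loop: the value of gcd(x,y) is restored by every iteration, so on exit it is still gcd(a,b). A Python sketch, using the library gcd purely to check the invariant:

```python
from math import gcd

def hcf_with_invariant(a, b):
    """Euclid's loop annotated with its strict invariant: the value of
    gcd(x, y) is restored by every iteration, so on exit it is gcd(a, b)."""
    x, y = a, b
    while x != 0:
        assert gcd(x, y) == gcd(a, b)   # the invariant at the loop test
        x, y = y % x, x
    assert gcd(x, y) == gcd(a, b)
    return y                             # x == 0, so y == gcd(a, b)

assert hcf_with_invariant(21, 14) == 7
```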
both composites are id_N and w is the coequaliser of the parallel pair. It follows that the recursion equation for any z:N→ Θ has a unique solution, since z is the unique mediator from the coequaliser.
We shall use relational algebra to investigate this coequaliser in terms of the associated equivalence
relation. The results do not apply to arbitrary categories, but rely on certain exactness properties of Set,
and our aim, as in the previous chapter, is to find out what these are: we need stable coequalisers not just
of congruences but of functional relations. Recall that general coequalisers are computed in several steps
(Lemma 5.6.11), not all of which are needed in this case, but we still need to consider stable directed
unions in order to form the transitive closure.
The coequaliser is also peculiar to this situation in another respect, namely that it only works for unary
recursion, cf the special properties of unary algebra in Section 3.8. In an arbitrary coalgebra, an element
is well founded iff all of its children (Remark 6.3.4) are well founded. If there is just one child, the
parent is well founded iff the child is, so the well founded elements are exactly those that are related to
childless (base) cases by the equivalence generated by the transition relation. It is like König's Lemma,
except that there is no choice to be made.
Transitive closure The partial function s will now be treated as a binary relation S:X\leftharpoondown
\rightharpoonup X and the subset N ⊂ X as a subrelation of the diagonal ∆ ( cf the virtual objects in
Proposition 5.8.7). These relations should be thought of as transition graphs on the set X of states;
evaluation of the whole program W consists in following the relation S, or rather its transitive closure T,
until we arrive inside the subset N, so W = T;N. (The letter T is no longer a functor.)
We take up the story of the transitive closure from Proposition 3.8.7, where it was defined by a unary
induction scheme. Frege (1879) showed that the transitive closure of any functional relation S is
trichotomous (classically). We shall show instead that it is confluent (Definition 1.2.5), so the
op
equivalence closure is K = T;T . Working backwards, K is the kernel of a coequaliser, and also gives
enough information to allow us to investigate T and W. Exercise 3.60 instead considers the transitive
closure as a list of steps, capturing the imperative idiom directly.
Let R = S∪∆ be the reflexive closure of S, and E = (S+N)∩∆ the equaliser of (S+N)\rightrightarrows X, so N ⊂ E. Note that A+B denotes the union A∪B of two relations, but also signifies that they are disjoint, ie A∩B = ∅ ( cf the use of ∨↑ in Notation 3.4.3).
LEMMA 6.4.9
(a)
For any relation V:X\leftharpoondown\rightharpoonup Θ, if S;V ⊂ V then T;V ⊂ V.
(b)
For any relation A:X\leftharpoondown\rightharpoonup X, if A;S ⊂ S;A then A;T ⊂ T;A.
(c)
E;T = E.
(d)
R = S∪∆ is confluent (Definition 1.2.5), ie R^op;R ⊂ R;R^op;
(e)
T is also confluent ( cf Lemma 1.2.4);
(f)
K = T;T^op.
In Set,
T = ⋃↑_{n=0}^∞ S^n = ⋃↑_{n=0}^∞ R^n and T;N = \coprod_{n=0}^∞ (S^n;N) = ⋃↑_{n=0}^∞ (R^n;N),
from which the results follow by brute force.
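For a finite state set, T and W = T;N can be computed by brute force as unions of powers; encoding relations as sets of pairs is an assumption of this sketch.

```python
def compose(A, B):
    """Relational composition A;B on sets of pairs."""
    return {(x, z) for (x, y1) in A for (y2, z) in B if y1 == y2}

def refl_trans_closure(S, X):
    """T = union of S^n over n >= 0, computed as a stable union of powers."""
    T = {(x, x) for x in X}          # start from the diagonal
    while True:
        T2 = T | compose(T, S)
        if T2 == T:
            return T
        T = T2

# A three-state loop body s: 2 -> 1 -> 0, exiting at N = {0}.
X = {0, 1, 2}
S = {(2, 1), (1, 0)}
N = {(0, 0)}                     # N as a subrelation of the diagonal
T = refl_trans_closure(S, X)
W = compose(T, N)                # W = T;N: the whole loop
assert W == {(2, 0), (1, 0), (0, 0)}
```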
They may also be proved in a finitary way using the universal properties in a logos. We show, in the
notation of Lemma 5.8.8(c), that
(a)
S;V ⊂ V iff S ⊂ V/V; since V/V is reflexive and transitive it contains T, so T;V ⊂ V.
(b)
Transitivity of A\(T;A) is similar; also A;S ⊂ S;A ⊂ T;A, so S ⊂ A\(T;A) and T ⊂ A\(T;A). Hence A;T ⊂ T;A.
(c)
E;S ⊂ E, so by (a) applied to the opposite relations, E;T ⊂ E = E;∆ ⊂ E;T.
(d)
S^op;S ⊂ ∆, so R^op;R = ∆∪S∪S^op∪S^op;S = ∆∪S∪S^op ⊂ R;R^op.
(e)
As in (b), A;R ⊂ R;A ⇒ A;T ⊂ T;A; with R^op for A, by (d), R^op;T ⊂ T;R^op. By the same principle for the opposite relations, now with T for A, we have T^op;T ⊂ T;T^op.
(f)
K;K = T;T^op;T;T^op ⊂ T;T;T^op;T^op ⊂ T;T^op = K. []
It is curious that we need two forms (a) and (b) of the inductive principle.
The recursion and induction schemes Now we are equipped to justify Remarks 6.4.6 and 6.4.8 and
see how the categorical induction and recursion schemes from Section 6.3 restrict to tail recursion.
THEOREM 6.4.10 The relation W = K;N is functional; it is the least solution ( cf Section 3.3) of the recursion equation
w = if c then s;w else skip fi,
and satisfies c = no on exit.
PROOF: Since N = N^op = E;N = N;T by Lemmas 6.4.9(c) and (f), also T^op;N = (N;T)^op = N, so
W = K;N = T;T^op;N = T;N.
Then by Lemma 6.4.9(e),
W^op;W = N;T^op;T;N ⊂ N;T;T^op;N = N ⊂ ∆,
and by Proposition 3.8.7(b) and Lemma 5.8.8(b),
W = T;N = (∆∪S;T);N = N∪S;T;N = N+S;W,
which is the recursion equation. The condition c is no on exit from w because W = K;N = K;N;N = W;N.
The discussion of the transitive closure above completes the investigation of the stability of the steps in
the construction of general coequalisers in Lemma 5.6.11. The fact that K;N is functional can be restated
in terms of coequalisers, and is of interest in itself. It relies on stability of unions: see Example 5.8.2(d)
for a counterexample in Gp. Without loss of generality (by adding a loop counter), the body always
changes the state, ie N = E.
This says that there is at most one fixed point in each component of the transition graph of a functional
relation.
PROOF: Two elements x,y:Γ\rightrightarrows E of the equaliser become identified in the coequaliser Q ≡ X/K iff (x,y) ∈ K, since Q is effective. Then
(x,y) ∈ E;K;E^op = E;T;T^op;E^op = E;E^op = E ⊂ ∆
by Lemmas 6.4.9(c) and (f) with S = U, so x = y as claimed. []
REMARK 6.4.12 The construction of W is shown in the diagram below. As a rule, pullbacks of parallel
pairs need not have the same vertex, but they do here because, as the composites S+N\rightrightarrows
X→ Q are equal, we may form the pullbacks rooted at Q (Lemma 5.1.2, Exercise 5.53).
With the same assumptions as before (and in particular in Set), W\twoheadrightarrow N is the
coequaliser of the pair shown. For suppose that V:X\rightharpoonup Θ with support W has equal
composites; then N;V ⊂ V and S;V ⊂ V, so
EXAMPLE 6.4.13 [Peter Freyd [Fre72]] Consider the following special case, with x:N, which always
terminates:
where N→ N+1 is the usual coalgebra structure, which is well founded by Peano induction. (Beware
that we have re-indexed this diagram: the above discussion actually gives id,pred:{n|n ≥ 1}
\rightrightarrows N.)
PROPOSITION 6.4.14 In the sense of Definition 6.3.2, W is the largest well founded subcoalgebra of X.
PROOF: Treat V ⊂ W as a subrelation, so the pullbacks are relational composites as shown. The
diagrammatic induction scheme says that
omitted prooftree
environment
which we have already deduced from Lemma 6.4.9(a). The way that W was constructed, as a pullback,
makes it the largest well founded subcoalgebra by Exercise 6.17. []
COROLLARY 6.4.15 W is the unique well founded solution to the fixed point equation in Theorem 6.4.10.
[]
Partial correctness Recall the difference between total and partial correctness, expressed in terms of
modal logic in Example 3.8.3.
{φ} u {θ}
is redefined to mean that, if u terminates and φ holds beforehand, then θ holds afterwards.
Then A;U ⊂ A ⊂ X consists of those states for which φ holds and u terminates; the effect of u is the
composite A;U\hookrightarrow U→ Y. The partial Floyd triple says that this factors through B ⊂ Y.
Hence there is a pullback mediator A;U→ U;B as shown, or, in terms of relations, A;U ⊂ U;B:X
\leftharpoondown\rightharpoonup Y.
By Corollary 6.4.15, the behaviour of a while program is captured by proving partial correctness like
this, together with termination, ie well-foundedness of the coalgebra W, which is done by finding a loop
measure.
Discussion Although we have used properties of Set, in particular the transitive closure, to prove
correctness of the interpretation, it can be stated using pullbacks, coproducts and coequalisers alone.
This means that any ( exact) functor which preserves finite limits and finite colimits also preserves the
interpretation.
REMARK 6.4.18 Correctness is reflected if the functor F:C\hookrightarrow S is also full and faithful. For
suppose that both categories have the limits and colimits needed to draw the diagrams in this section (so
F makes these agree) and that in S we have shown that FW\rightrightarrows FX is a functional relation
satisfying
(a)
FW = FN+FS;FW = FW;FN,
(b)
(c)
∀A. A;FS ⊂ FS;A ⇒ A;FW ⊂ FW;A.
Then F preserves composition and intersection of relations, and reflects their containment, so these
properties restrict to C. In particular, Freyd's characterisation of N (Example 6.4.13) says that exact
functors between toposes, such as inverse images of geometric morphisms, preserve N. []
When can a category without infinite unions be embedded in one with them, thereby generalising the
interpretation?
Stability of the coequaliser is clearly necessary, but unfortunately seems not to be sufficient. But if the coequaliser of S+N\rightrightarrows X and its kernel K exist then the cocone (∆∪S∪S^op)^n \hookrightarrow K indexed by n ∈ N is always colimiting, even though we have not asked for a general infinitary union operation. Then a pretopos C can be embedded in a topos of sheaves on C preserving (finite limits, coproducts, quotients of equivalence relations and) the coequaliser iff this union is stable under pullbacks in C.
Using this condition there is a simpler way to extend the proof: we only need unions of relations, not the
virtual objects in this sheaf topos.
THEOREM 6.4.19 The language is correctly interpreted in any pretopos such that each functional relation
has an equivalence closure which is stably the union of powers. Any functor which preserves finite
limits and colimits also preserves the interpretation.
6.5 Unification
Now we shall apply the parsing and well-foundedness properties of term algebras to the unification
algorithm of Remark 1.7.7. Our approach is a new one, which begins from an idea in universal algebra,
and carries this through directly to a very efficient implementation.
The most general unifier has a universal property, which may be viewed in two ways: it is the equaliser in the category Cn×_L of contexts and substitutions (Example 5.1.3(b)), and also the coequaliser of free algebras. The connection between these points of view is the fact that the clone of a context [x₁:X₁,…] is the free algebra on its variables.
The most general unifier is not the coequaliser amongst all algebras.
EXAMPLE 6.5.1 Consider a unary operation s and two variables x and y. Then as a unification problem
the equation s(x) = s(y) implies x = y, but the coequaliser as an algebra is essentially ``N with two
zeros.'' []
Recall that the products in Cn× must be preserved in any interpretation of the theory L in a category.
L
The equalisers are not preserved because terms may have equal value without being provably so in the
theory.
We shall discuss coequalisers amongst arbitrary algebras in Section 7.4, and characterise the
subcategory of free algebras in Proposition 7.5.3(a). Section 8.2 constructs a version of the classifying
category which does have some equalisers (and so whose interpretations preserve them).
For our work in this section we need first to restore the generators which we discarded in
Definition 6.1.1. The recursion scheme for them is the universal property. The generators do not behave
as extra constants as we said in Definition 6.1.1, because the homomorphism which solves the problem
may send them to arbitrary terms in the target algebra Θ.
NOTATION 6.5.2 Write η_G:G→ FG for the inclusion of the generators G into the free algebra which they generate.
Then any function f:G→ Θ to another algebra extends uniquely to a homomorphism p:FG→ Θ with η_G;p = f. []
Unification For any homomorphism f:A→ Θ of algebras, the kernel pair {(a,b)|f(a) = f(b)} is an
equivalence relation on A and is closed under its operation-symbols (Proposition 5.6.4). In a unification
problem the kernel pair is also closed under parse :
LEMMA 6.5.3 Let f:A→ Θ be a homomorphism from any algebra to an equationally free one (so even if A is free, f may send generators to expressions). If f(r_A(u⃗)) = f(s_A(v⃗)) then
ev_Θ(r, f(u⃗)) = f(r_A(u⃗)) = f(s_A(v⃗)) = ev_Θ(s, f(v⃗)),
but ev_Θ is mono, so r = s and ∀j:ar[r]. f(u_j) = f(v_j). []
Θ j j
So two expressions are unifiable iff their outermost operation-symbols agree and the corresponding pairs
of sub-expressions are each unifiable, but all by the same assignment to the generators. For unification
to be possible, we must therefore be able to distinguish the generators and operation-symbols from one
another, cf Example 6.2.5.
REMARK 6.5.4 Besides parse, the congruence is also closed under the operations r ∈ Ω and the axioms
for an equivalence relation.
(a)
Transitivity via generators gives rise to parsable equations:
r(u⃗) = x, x = ··· = y, y = s(v⃗),
where x = ··· = y is a zig-zag of equations amongst generators. (This affects the way in which we must prove termination: consider
x = u, y₁ = v₁, y₂ = v₂, y₃ = v₃, ···,
r(x,r(x,r(x,…))) = r(y₁,r(y₂,r(y₃,…))),
in which u reappears in a new equation arbitrarily deeply.)
(b)
transitivity of equations between terms - but from r(u⃗) = r(v⃗) and r(v⃗) = r(w⃗) we can deduce ∀j. u_j = v_j = w_j anyway;
(c)
symmetry and reflexivity may be taken as read;
(d)
applying operation-symbols: {u_j = v_j | j:ar[r]} \vdash r(u⃗) = r(v⃗);
(e)
substitution of x = u into v = w, where x ∈ FV(v,w). By a similar argument this may also be
postponed, unless v or w is the generator x, when transitivity via generators applies. []
LEMMA 6.5.5 Let p:FG→ Θ be a homomorphism from a free algebra to any algebra. Then ( cf Lemma 1.3.8)
p(x) = p(u) ∧ p(y) = p(v) ⇔ p(x) = p(u) ∧ p(y) = p(v[x:= u]). []
REMARK 6.5.6 The previous Remark provides useful optimisations, since they avoid the need to compare or copy terms, yielding a parallel, in situ algorithm which unifies terms almost as fast as they can be read.
The terms may be represented in Polish notation (Example 6.2.7), but it is more usual to code each
instance of an operation as a record consisting of the operation-symbol together with pointers to records
for each immediate sub-expression (Remark 1.1.1). The pending equations are then held as pairs of
pointers to these records. The program needs to store the equivalence relation E on generators, together
with the equations R between generators and (pointers to) sub-terms.
The given and subsequently generated equations u⃗ = v⃗ do not need to be stored, since each may be dealt with as a sub-process:
(a)
if it is of the form r([(u)\vec]′) = s([(v)\vec]′) with r\not ≡ s then we have a clash;
(b)
the equation r(\vec{u}′) = r(\vec{v}′) forks into sub-processes for each u′_j = v′_j;
(c)
x = u or u = x is added to R, unless there is already a chain of equations x E x′ R v, in which case u = v
is handled as in the previous cases;
(d)
x = y is added to E and their equivalence classes amalgamated; if two R-equations become linked
by u R x′ E x = y E y′ R v, then one of these is deleted and the unification step applied to
u = v. []
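The cases above can be sketched as a toy implementation. This is our own minimal sequential version (the names Var, App, find and unify are ours, not the text's), using union-find for the equivalence E on generators and a dictionary for R, rather than the in-situ pointer records of Remark 6.5.6, and omitting the occurs check of the following Lemmas.

```python
# Sketch of the unification step of Remark 6.5.6 (illustrative names only).

class Var:
    def __init__(self, name): self.name = name

class App:                      # r(u1, ..., uk): operation-symbol plus subterms
    def __init__(self, op, args): self.op, self.args = op, args

parent = {}                     # union-find structure for E

def find(x):
    parent.setdefault(x, x)
    if parent[x] != x:
        parent[x] = find(parent[x])   # path compression
    return parent[x]

R = {}                          # representative generator -> assigned term

def unify(u, v):
    if isinstance(u, Var) and isinstance(v, Var):
        x, y = find(u.name), find(v.name)
        if x == y:
            return
        parent[x] = y           # case (d): amalgamate the classes
        if x in R:              # two R-equations linked: delete one, recurse
            if y in R:
                unify(R[x], R[y])
            else:
                R[y] = R[x]
            del R[x]
    elif isinstance(u, Var) or isinstance(v, Var):
        x, t = (find(u.name), v) if isinstance(u, Var) else (find(v.name), u)
        if x in R:
            unify(R[x], t)      # case (c): already an equation x R v
        else:
            R[x] = t
    else:
        if u.op != v.op or len(u.args) != len(v.args):
            raise ValueError("clash")          # case (a)
        for uj, vj in zip(u.args, v.args):     # case (b): fork sub-processes
            unify(uj, vj)

# x = f(y) and y = a leave R mapping x to f(y) and y to a
unify(Var("x"), App("f", [Var("y")]))
unify(Var("y"), App("a", []))
```

After termination, dependent variables would be eliminated from R by Lemma 6.5.5, and the occurs check of the surrounding discussion run over the result.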
LEMMA 6.5.7 The unification algorithm (applied to any finite set of equations between terms for a
finitary free theory) terminates.
PROOF: Consider the total number of generators and operation-symbols in the outstanding equations,
including the terms assigned by R to the generators. The generators listed in E and R are not counted.
The operation-symbol r in case (b), or the generators x and y in cases (c) and (d), are deducted from this
count at each step, so the program terminates by Proposition 2.6.2. (In fact only the outermost operation-
symbol of u in case (c) is considered more than once, cf Remark 6.5.4(a).) []
PROOF: The unification step continues to be applicable as long as there remains any equation relating a
term to a term, possibly via generators. So when the iteration terminates, the outstanding equations
consist of an equivalence relation E : G ⇌ G together with a system of
equations R : G ⇌ FG such that, for x,y ∈ G and u,v ∈ FG,
u R x E y R v ⇒ u ≡ v.
Then any unifier of E ⇉ G → FG factors through G/E and so F(G/E). By construction, the
equations R link equivalence classes [x] ∈ G/E of generators to expressions, so without loss of generality
E is (=_G).
We want to partition the set G = D+I into dependent and independent generators, the former being the
support of R.
Define x < y on G if x ≢ y ∧ x ∈ FV(v) ∧ y R v. If there is any unifier p : FG → Θ then
x < y ⇒ p(x) ≺≺ p(y), by structural induction on v, where ≺≺ is the transitive closure of the
sub-expression relation in Θ. Since ≺≺ is well founded (Theorems 6.3.13 and 3.8.11(b)), < must be too
(Proposition 2.6.2). In particular the relation < must have no cycles, which, by substitution, would lead
to the situation x = u(x).
Since G is finite and < is decidable, to verify well-foundedness it is enough to make the occurs check
for such loops (Exercise 2.36).
By < -recursion, dependent variables may now be eliminated from R using Lemma 6.5.5. Then we have
one equation for each generator, which expresses it as a term in F I, the independent variables standing
for themselves, so the left-hand triangle commutes:
Each step of the execution has replaced one unification problem by another which is equivalent to it, the
last being an assignment of terms to independent variables, which has a coequaliser. Since the latter is
unique up to unique isomorphism, the algorithm is confluent, despite being slightly non-deterministic. []
6.6 Finiteness
It probably comes as something of a surprise that we have got this far through a book on foundations of
mathematics - especially a book which stresses constructivity - without discussing finiteness. Emphasis
on finite enumeration was a reaction to the excesses of classical set theory, and clearly any set which is
given by picking its elements must be finite in order to be represented in a machine.
The reason why we have played down finiteness is that mathematical and computational objects are
normally handled according to their structure, with no need for explicit enumeration. Besides,
enumerative processes are very slow: one does not have to go to a very high order of functions over a
two-element base type to exhaust the memory of a computer, or indeed of the Universe. We can
nevertheless handle higher order functions very easily with the λ-calculus, and prove certain properties
of all numbers by induction without examining each one.
Arguments in combinatorics usually do make essential use of finite sets. Unfortunately it is for these that
the type theory and logic tend to be the most difficult to mechanise (although the combinatorial aspects
can be handled directly), whereas Section 1.6 showed that variables, the connectives and the quantifiers
can be translated very fluently between the vernacular and formal language. When a proof does need
finite sets, considerable bureaucratic manipulation of lists must be added to the level of detail that a
(human) mathematician would regard as sound and precise. Even this section elides many such details,
since we have already said as much about lists as we intend to say.
When we have spoken of finite sets in this book, notably in the definition of algebraic theories and the
crucial result about quotients of algebras, we have usually had in mind particular numbers, such as 3.
It is said that ``hard cases make bad law'': each instance of finiteness is familiar, but it is very difficult to
identify the abstract notion.
Adam of Balsham (1132) observed that the difference between finite and infinite sets is that the latter
admit proper self-inclusions, such as n→ 2n ( cf Exercises 1.1 and 2.47). But, without Choice, only
special infinite sets have this property, so the rest are inappropriately called ``finite.''
As with the exactness properties of Set (Section 5.8), progress was made only after similar concepts had
been identified in algebra and topology. In these disciplines, many familiar objects, despite having
infinitely many elements, are ``bounded'' in some sense which may be used to investigate their
properties. For example [0,1] ⊂ R is compact (if it is contained in any directed union of open sets then
just one of them suffices), whence any continuous function on it is bounded and attains its bounds.
EXAMPLE 6.6.1 A commutative ring R is said to be Noetherian if every directed set of ideals is
eventually constant. Assuming excluded middle and Dependent Choice, every element a ∈ R of such a
ring can be expressed as a product of irreducibles ( cf Example 2.1.3(b) for prime factorisation).
PROOF: If a is reducible then a = bc, where the ideals they generate satisfy (a)\subsetneqq (b)\subsetneqq
R. Repeating this process, we get either a product as required or, by Dependent Choice (Definition
1.8.9), a properly increasing sequence of principal ideals, which is forbidden by hypothesis. []
Emmy Noether was the pioneer of conceptual mathematics ( begriffliche Mathematik), but this notion is
no mere piece of abstract nonsense. Since it is inherited by quotients, polynomial rings and even formal
power series rings, it replaced the ``jungle of formulae'' (as she described her own doctoral thesis) which
had characterised nineteenth century algebra ( cf Exercise 6.49).
Three definitions by counting Of course a set is finite iff there is a listing of its elements, and we
know from Section 2.7 what a list is. Classically (more precisely, when equality is decidable), any
repetition in such a list may be eliminated, but anyone who has tried to maintain a moderately large
database will be aware that this is not a trivial problem in practice. Depending on the amount of control
we have over repetition, there are three useful definitions.
DEFINITION 6.6.2 A set X is called
(a)
finitely enumerated if n ∈ N and p : n ≡ X are given, where n is the set {m : N | m < n};
(b)
finitely presented if n, k ∈ N are given together with a coequaliser diagram k ⇉ n ↠ X;
the elements of k name laws (and also generate a finite equivalence relation);
(c)
finitely generated if n ∈ N and a surjection p : n ↠ X are given;
and finitely enumerable, presentable, or generable if these structures exist but are not specified. The
first two are equivalent for sets (Exercise 6.42); the last is usually called Kuratowski-finiteness, but
Cesare Burali-Forti was actually the first to formulate it.
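For a concrete classical set with decidable equality, the three counting definitions can be illustrated side by side; the encoding as Python dictionaries is ours, not the text's.

```python
# Illustration (our own encoding) of the three counting definitions for a
# concrete set X = {'a', 'b'} with decidable equality.

X = {'a', 'b'}

# (a) finitely enumerated: n = 2 and a bijection p : {0,1} -> X
p = {0: 'a', 1: 'b'}
assert set(p.values()) == X and len(p) == len(X)

# (c) finitely generated: a surjection q : {0,1,2} -> X, repetition allowed
q = {0: 'a', 1: 'b', 2: 'a'}
assert set(q.values()) == X

# (b) finitely presented: the surjection q together with its "laws",
# the pairs of indices naming the same element; here 0 ~ 2.
laws = [(i, j) for i in q for j in q if i < j and q[i] == q[j]]
assert laws == [(0, 2)]

# With decidable equality, repetitions can be eliminated, turning a
# finite generation into a finite enumeration:
enum = {}
for i in sorted(q):
    if q[i] not in enum.values():
        enum[len(enum)] = q[i]
assert set(enum.values()) == X and len(enum) == len(X)
```

Without decidable equality the de-duplication step at the end is not available, which is why the three notions come apart intuitionistically.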
EXAMPLES 6.6.3
(a)
Theorem 5.6.9 requires that the positions in the arity of a general finitary operation must be
distinguishable , because we want to put semantic values in them separately.
(b)
If an object is repeated in a family of which we are forming the coproduct, then we get multiple
copies of it, which (in the case of Set, Section 5.5) are disjoint.
(c)
However, an element of a semilattice may be repeated arbitrarily in the formation of a join
(Proposition 3.2.11 ).
(d)
Any closure condition may be repeated without changing its force.
(e)
An alias in a database specifies that two entries denote the same thing. If we are sure that all such
coincidences have been recognised then the database is a finite presentation of its subject matter,
otherwise it is a finite generation.
(f)
Any subsingleton {∗|φ} ⊂ 1 is finite in all three (``-able'') senses iff φ is decidable (Exercise
6.41). More generally, a subset of a finite set is finite iff it is complemented.
It does not seem to be an appropriate property to require of a notion of finiteness that any subset of a
finite set be finite (but cf Remark 9.6.5). As with the connectives of logic and type theory, we shall take
natural deduction for finite sets (rather than the prejudices of classical logic) as our guide, and this
matches the definitions we have given.
REMARK 6.6.4 There are similar (and in fact prior) notions in algebra. The first corresponds to Fn, the
free algebra on n generators (such as a finite- dimensional vector space), which replaces the set n in the
others. This analogy is a conceptual one, and is only tenuously related to the size of the carrier sets. The
algebraic definitions are also distinguishable even in classical logic.
(a)
The free monoid on one generator is infinite.
(b)
Addition modulo m is finitely presented as a monoid (generated by one element subject to one
law), but it is not free; of course the carrier set is finite.
(c)
The wreath product Z ≀ Z is the group generated by symbols a and b subject to the relations that
b^{-n} a b^n commutes with b^{-m} a b^m for all m, n ∈ Z. It has no finite presentation.
Any algebra with finitely enumerated carrier for a theory with finitely many operation-symbols is
finitely presented as an algebra: for laws we just list the instances of the operations ( cf the ambiguity in
the usage of this word mentioned in Definition 1.2.2). However, there is no other implication. We shall
compare and contrast these definitions further in Proposition 6.6.13ff.
The ability to count We have usually avoided discussing the external interpretation of logic, but it is
important to understand the meaning of the existential quantifier (``-able'') in these definitions.
REMARK 6.6.5 Recall from Remark 5.8.5 that Γ\vdash∃y: Y.φ[y] means that !:{y: Y|φ[y]}→ Γ is epi. Thus
an object X is finitely enumerable or generable iff the object ∆ of listings (n,p) covers Γ.
The left projection ∆ → N, an element n ∈ N^∆, may be regarded as a ``variable number,'' viz the generic
length of a way of listing the set. For i < n, ie i ∈ N^∆ with ∀δ.i(δ) < n(δ), we have
x_i(δ) = p(i(δ)) ∈ X at each δ = (n,p), and may write
X = {x_0, …, x_i, …, x_{n−1}}
(either as a set or as a list) so long as the suffixes are understood to be these variable numbers.
Now consider the introduction and elimination rules for the quantifier: nothing inside an (∃E)-box may
depend in an essential way on the listing, ie on n, i, or x_i. On the other hand, (∃E)-boxes are always
open-ended below (Remark 1.6.5).
COROLLARY 6.6.6 If a set is finite in either sense then we may assume that it has a listing (with or
without repetition). []
This means that we do not need to mention the set ∆ when working in the internal logic, nor are we
making a choice of listing (Remark 1.6.7). But this box, like all others, must be properly nested. So if
the set X depends on parameters, the (∃E )- box which encloses the choice of listing must be closed
before any surrounding box which quantifies the parameters.
However, there are problems with counting even when we seem to be able to identify a specific number
of distinct things.
(a)
the set of square roots of a number in C is Kuratowski-finite, since zero has just one;
(b)
the set X of square roots of a point on the unit circle S1 ⊂ C satisfies ∃p.p: 2 ≡ X, where the
existential quantifier ( cf Example 2.4.8) must be interpreted carefully as above: there is a cover
∆ ↠ S1 by open subsets each equipped with such an isomorphism, but there is no global one. The
collection {p | 2 ≡ X}, as a sheaf on S1, is called a torsor; as an algebraic structure, it retains
the Mal'cev operation (u·v^{−1}·w) of the automorphism group of X, but not the identity element
(Example 9.2.12(d)).
To sum up, we are not at liberty to regard the listing of a dependent finite set X[y], or even the number of
elements needed to cover a Kuratowski- finite set, as a function of y.
Counting is unique But if a set can be finitely enumerated then the length of such an enumeration is an
invariant. Although this must have been known as a fact before any other human intellectual
achievement, the realisation that there is (a) a powerful theorem, and (b) something to prove, takes a
degree of mathematical sophistication.
Bo Peep's Theorem (Exercise 1.1) amounts to saying that any injection n\hookrightarrow n is a
bijection. The pigeonhole principle is a stronger result: if f:n→ m with m < n then f(i) = f(j) for some i ≠
j ∈ n.
These properties fail, of course, for big sets like N, but finiteness is also about granularity. This is the
reason for only allowing decidable subsets of finite sets to be called finite. For example, if I give you
some wool, and some more wool, and a third quantity, and then take some back, and some more, and
some more again, you may or may not still be holding some of what I gave you.
Besides its usefulness in agriculture, the power of counting is illustrated by the Sylow theorems in group
theory.
FACT 6.6.8 Let G be a group with p^k·m elements, where p is a prime not dividing m. Then there is a
subgroup H ⊂ G of order p^k. Moreover the number n of such subgroups (which are conjugate to each
other in G) divides m, and n ≡ 1 mod p. If n = 1 then H is normal. []
By counting Sylow subgroups and elements of particular orders we may easily identify the simple group
of order 168 and show that there is no simple group of small non-prime odd order (105 is the first tricky
case).
Finite subsets Now let us look more closely at Kuratowski finiteness, which is apparently an exception
to the rule that the poset version of a concept is simpler and older than the categorical analogue. This is
the one which is usually relevant for subsets, since the other requires equality of elements to be
decidable before we may form the subset {a,b}.
Although the induction scheme is the main feature to which we want to draw attention, we shall
continue the discussion based on lists instead of developing the theory from the closure conditions as we
did for the transitive closure in Definition 3.8.6ff ( cf Exercises 3.60 and 6.43).
There are rules for adding elements and subsets, corresponding to cons and append for lists. P_f(X) is
the free semilattice on a set X, just as List(X) is the free monoid (Corollary 2.7.11). It is the image
List(X) ↠ P_f(X) ↪ P(X)
of the ``set of elements'' function, Example 2.7.6(f). This is a recursive cover (Definition 6.2.2).
LEMMA 6.6.10
(a)
A subset U ⊂ X belongs to P_f(X) iff U is finitely generable.
(b)
P_f(X) is the smallest set of subsets containing ∅ and the singletons and closed under binary union,
cf the transitive closure (Definitions 2.7.1, 2.7.3 and 3.8.6). Exercise 6.44 gives the binary
induction scheme, corresponding to addition and the append operation.
(c)
The image of a finitely generable set is also finitely generable.
(d)
The coproduct or union of two finitely generable sets is also finitely generable.
(e)
P_f(X) is a semilattice, and hence directed (Definition 3.4.1); seen as a diagram in P(X), the join
is X.
PROOF:
(a)
``U ∈ P_f(X)'' and ``U is finitely generable'' are both existentially quantified statements.
(b)
The second part is now an example of Proposition 6.2.4.
(c)
The third follows from the definition of finite generability.
(d)
[ ] lists ∅; if t lists U then cons(h,t) lists {h}∪U; if l1 and l2 list U1 and U2 then
l1;l2 lists both U1+U2 and U1∪U2.
(e)
X is the join (within P(X)) of its singletons, X ⊂ P_f(X). []
PROOF: Let f:X→ Θ be a function to another semilattice, in which (by Proposition 3.2.11) the operations
may be taken to be ⊥ and ∨. By Kuratowski induction, (Θ, ≤ ) has joins of all finitely generable sets, but
if U ⊂ X is finitely generable then so is its image f!(U) ⊂ Θ, so it has a join. Hence there is a
semilattice homomorphism P_f(X)→ Θ extending f; it is unique because the equaliser of any two such is a
sub-semilattice containing the singletons. []
The empty set Corresponding to List(X) ≡ {∗} + X×List(X), an element of the free semilattice can be
parsed as empty or inhabited.
COROLLARY 6.6.12 P_f(X) is the disjoint union of {∅} and P_f^{≠∅}(X), the set of inhabited finitely
generable subsets; the latter is the free algebra for binary join alone. In other words, it is decidable
whether a finitely generable subset is empty or has an element, cf Example 6.6.3(f).
PROOF: {⊥,⊤} is a semilattice; the unique homomorphism taking all singletons to ⊤ maps everything else
there except ∅. []
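Computationally, the homomorphism in this proof is just a fold over any list presenting the subset; a minimal sketch (the function name is ours):

```python
# The semilattice {False, True} with join given by "or"; the unique
# homomorphism sending every singleton to True decides emptiness of a
# finitely generable subset presented by a list (repetitions allowed).

def inhabited(listing):
    result = False            # bottom of the semilattice: image of the empty set
    for _ in listing:         # each cons contributes a join with True
        result = result or True
    return result

assert inhabited([]) is False
assert inhabited(['a', 'a', 'b']) is True
```

The point of the Corollary is that this decision is possible without deciding equality of the elements themselves.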
Finiteness and Scott-continuity Theorem 3.9.4 established a link between the (Kuratowski-)finite
arities of Horn theories and preservation of directed joins. We should therefore look for a categorical
analogue involving finite presentability.
PROPOSITION 6.6.13 To recap, the following are equivalent for a set X:
(a)
X is finitely generable in the listing sense of Definition 6.6.2(c);
(b)
it is finitely generable in the sense of Definition 3.4.11, ie if X = ∪_{i∈I} U_i is a directed
union then ∃i. U_i = X;
(c)
the functor (−)^X preserves directed unions.
Moreover every set is the directed union of its finitely generated subsets.
PROOF: (b) is a special case of (c), and the last part is Lemma 6.6.10(e), whence (b⇒a). (a⇒c) Any
function f : {x_1, …, x_n} → ∪_i V_i factors through some V_i. []
It can also be shown that a set is finitely presentable iff (−)^X preserves filtered colimits (Example
7.3.2(j)), and conversely that every set can be expressed as a filtered colimit of finitely presentable
sets. This suggests a generalisation, since, for Set, the exponential (−)^X is the same as the hom-set
Set(X,−).
DEFINITION 6.6.14 Let X be an object of a category S with finite limits and pullback-stable filtered
colimits.
(a)
If the functor H^X ≡ S(X,−) : S → Set preserves filtered colimits then we say that X is externally
finitely presentable.
(b)
If S is cartesian closed and the functor (−)^X : S → S preserves filtered colimits then X is internally
finitely presentable.
(c)
If every object of S is a filtered colimit of finitely presentable objects then S is locally internally
or externally finitely presentable.
The algebraic notions of finiteness (Remark 6.6.4) can be shown to be equivalent to the external ones for
S = Mod(L), where L is a finitary algebraic theory. Mod(L) is locally externally finitely presentable;
Peter Gabriel and Friedrich Ulmer [GU71] showed that every locally externally finitely presentable
category which also has all small colimits is of this form, for some essentially algebraic theory in the
sense of Remark 5.2.9. The term LFP is traditionally applied only to such categories, in contrast to
algebraic dcpos, Definition 3.4.11.
EXAMPLES 6.6.15 To make the distinction clearer, let S = Set^N be the category of presheaves on N. Then
X ∈ ob S is finitely presentable
(a)
internally iff each of the sets X(n) is finite, although they may grow without bound as a function
of n;
(b)
externally iff ∑_n X(n) is finite, so ∃m. ∀n > m. X(n) = ∅.
The terminal object \terminalobj_S = λn.{∗} is internally but not externally finite.
This, and the fact that emptiness or habitation of a Kuratowski-finite set is decidable, illustrate that
singletons are rather big subsets. In this case how is it that every set is the directed union of its finitely
generable subsets? If it only consists of partial singletons, the nodes of the diagram (other than ∅) are
themselves partial.
It was already known before Georg Cantor that a real-valued function could have a Fourier
representation even if it had finitely many points of discontinuity . He saw that this could be extended to
functions which were discontinuous on a set X_0 such as {0}∪{1/n | n ∈ N} with an accumulation
point X_1 = {0}. Repeating the argument, X_1 could have a set X_2 of accumulation points
and so on. Then in 1870 he realised that the set ∩_n X_n could also be non-empty, and that a
``transfinite'' sequence of sets X_∞, X_{∞+1}, ... could be defined. We might picture ∞, now called
ω, and similarly ω2 and ω3, as rows of matchsticks (pictures omitted).
The easiest way to tell these countable infinities apart is to work backwards, ie find out what descending
sub-sequences there are. Every such sub-sequence must eventually stop: that's well-foundedness
(Proposition 2.5.9). In the longer ones, there is more opportunity to dawdle down, but from time to time
we must leap from the heights of a limit ordinal, landing somewhere that is infinitely far below.
The relation \prec on an ordinal is easily seen to be transitive and extensional (Example 6.3.3), so it is
useful to identify each ordinal with its set of predecessors, cf 5 = {0,1,2,3,4} and Remark 6.3.5.
(a)
Zero (the empty string of matchsticks) is an ordinal.
(b)
If α is an ordinal then so is succα, its successor, also written α+. We embed α ⊂ succα as an
initial segment, and use α to name the extra matchstick on the right. Under the identification of
each β ∈ α with {γ ∈ α|γ \prec β}, the elements β and α of succ α are subsets of α, so succα
may alternatively be defined to consist of the initial segments of α. These definitions are
equivalent classically, but not intuitionistically, and we shall be ambiguous for the moment as to
which is intended.
(c)
If α_i for i ∈ I are ordinals then so is ∪_{i∈I} α_i, this union being taken with respect to the
inclusions of initial segments.
Successor and directed union preserve Cantor's definition, but comparing two ordinals (or directed
unions) given separately relies on excluded middle: if α∩β ≠ α then α∩β is the least element of α\(α∩β).
Between any two ordinals there is a reflexive relation ( ⊂ ), whether one is an initial segment of the
other. On the other hand, the identification between systems and individuals gives rise to an (irreflexive)
well founded relation, written as ∈ . They are related by
Classically, any ordinal is either zero, or a successor or a limit (union). These cases may be
distinguished by trying to define the predecessor,
pred(α) = ∪ α = {γ | ∃β. γ ∈ β ∈ α} ⊂ α.
Then pred(succ(α)) = α, and either pred(α) ∈ α if α is a successor ordinal, or pred(α) = α if it is a
limit.
Transfinite recursion Since ordinals are a special case of well founded relations, for any
\ev_Θ : P(Θ) → Θ the equation
p(α) = \ev_Θ({p(β) | β ∈ α})
has a unique solution by the General Recursion Theorem 6.3.13.
REMARK 6.7.3 The classical transfinite recursion theorem is based on a three-way case analysis. Instead
of \ev_Θ, it takes an element x ∈ Θ, and functions \ops : Θ → Θ and \opu_λ : Θ^λ → Θ. (Since the arity of
\opu_λ depends on the ordinal anyway, we are obliged to treat the case where the argument is used in the
evaluation phase of the recursion.) Then there is a unique function p with
p(0) = x,   p(succ α) = \ops(p(α)),   p(λ) = \opu_λ(p(α) | α ∈ λ)  for limit λ.
PROOF: Since u may make use of its arguments in an entirely arbitrary way, the general recursion
theorem for free theories must be used, with operations \opr_α : Θ^{succ α} \rightharpoonup Θ of each
arity. For
p(succ α) = \ops(p(α)) = \opr_{succ α}(x_0, …, x_α),
the operation \opr_{succ α} must be able to select x_α, its αth argument, by its position. Of course
\opr_0(∅) = x.
This arbitrariness is not possible in the intuitionistic version, but recall that infinitary operations tend to
be given by universal properties, join being typical. The three conditions are regarded as equations to be
solved or ``boundary conditions'' rather than as a case analysis.
THEOREM 6.7.4 Any system of ordinals obeys the universal property (recursion scheme) for a free partial
(∨,s)-algebra with s monotone. That is, for any complete join-semilattice Θ equipped with a monotone
unary operation \ops_Θ : Θ → Θ, there is a unique homomorphism p, ie
β ⊂ α ⇒ p(β) ≤ p(α),
and for any limit ordinal λ we still have p(λ) = ∨{p(α) | α ∈ λ}.
PROOF: Define p by
p(α) = ∨ {\ops_Θ(p(β)) | β ∈ α}.
The successor equation p(succ α) = \ops_Θ(p(α)) is satisfied since, by monotonicity of p and \ops_Θ,
\ops_Θ(p(α)) is the biggest term in the join. The property for limit ordinals still holds since any such
ordinal is closed under successor. []
The seed x need not be ⊥, as in place of Θ we may consider the upper set x↓Θ ≡ {y | x ≤ y} (Example
3.1.6(f)). The unary operation s may also take an ordinal parameter, in which case it must be monotone.
Remark 9.6.16 considers iteration of functors, with filtered colimits in place of directed joins, and using
the axiom-scheme of replacement.
Rank It is a striking feature that systems of ordinals share many of the properties of the individuals.
(We have already seen this for wff- systems in Remark 6.2.10.) However, Cesare Burali-Forti (1897)
showed that the whole system is a proper class: it cannot be an individual. Dimitry Mirimanoff (1917)
showed that this and the Russell paradox don't arise in Zermelo set theory (Remark 2.2.9), by measuring
the well-foundedness of each definable set. This was later incorporated into Zermelo- Fraenkel set
theory as the axiom of foundation.
For any well founded relation \prec on a set X, the equation
p(t) = ∪ {s(p(u)) | u \prec t}
defines the ordinal rank function on X, using the axiom of replacement. Remark 3.7.12 gave the finitary
version of this.
PROPOSITION 6.7.6 Let t ∈ F be a term for a free theory T. Then the rank of t with respect to the
immediate sub-expression relation is the first stage at which t occurs, ie t ∈ T^{p(t)}(∅). []
Arithmetic In practice it is not feasible to calculate the rank of a term or proof exactly, cf the number
of iterations needed to complete a loop (Remark 2.5.13ff). Instead of the least upper bound ∪ in the
previous definition, we assign to r(\vec{u}) any value which is larger than those of the sub-expressions
\vec{u}, as in Examples 2.6.3ff. This can be done using ordinal arithmetic. It was to define ordinal
exponentiation that Cantor introduced transfinite recursion.
omitted array
environment
are extended transfinitely by making each operation preserve inhabited (directed and binary) joins in the
second argument β. (Using upper sets, they can instead be considered to preserve all joins.)
It is also important to appreciate that α^β is not an exponential in either a cartesian closed category
or a Heyting lattice. One way of seeing this is that the latter have left adjoints (product and meet).
REMARK 6.7.9 As these operations preserve unions, they have right adjoints (Theorem 3.6.9), called
subtraction, division and logarithm,
where On is the class of ordinals (reflexively) ordered by inclusion. The units and co-units of the
adjunctions satisfy
subject to the conditions on the right. Notice that we subtract matchsticks from the left, so (-)-1 is not
pred (for example at ω+1 ). The second equation, however, depends on excluded middle: 1-φ = \lnot φ
and φ+(1−φ) = φ∨\lnot φ for φ ⊂ 1. Since for 2 ⊂ α and 1 ⊂ γ we also have
γ ÷ α^{log_α γ} \prec α   and   γ − (α^β·(γ ÷ α^β)) \prec α^β,
it is possible (classically) to write any ordinal γ uniquely as
α^{β_n}·δ_n + α^{β_{n−1}}·δ_{n−1} + ··· + α^{β_2}·δ_2 + α^{β_1}·δ_1,
where β_n \succ β_{n−1} \succ ··· \succ β_2 \succ β_1 and δ_i \prec α. The β_i may then also be
decomposed in the same way. In the case α = ω, this hereditary expression is known as the Cantor normal
form. []
REMARK 6.7.10 Arithmetic on matchstick pictures may be represented by the lexicographic product
(Proposition 2.6.8), and in fact any finite tower ω, ω^ω, ω^{ω^ω}, ... can be encoded as an order on N
(or a subset of Q) using primitive recursion. Lemma 6.7.8 may be extended to a system of reduction rules
(there are some messy extra cases for α^β), and such expressions do have a (Cantor) normal form.
Moreover, by comparing successive exponents of ω and their coefficients, the trichotomy law holds,
just as it does in N.
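The effective comparison and (non-commutative) addition of such normal forms can be sketched as follows; the encoding of ordinals below ε0 as nested tuples, and the names cmp_ord and add, are our own.

```python
# Ordinals below epsilon_0 in Cantor normal form (our own encoding): an
# ordinal is a tuple of (exponent, coefficient) pairs with strictly
# decreasing exponents, each exponent itself such a tuple; () is zero.

ZERO = ()
ONE = (((), 1),)                 # omega^0 * 1
OMEGA = ((ONE, 1),)              # omega^1 * 1

def cmp_ord(a, b):
    """Trichotomy: -1, 0 or 1, comparing exponents, then coefficients."""
    for (ea, ca), (eb, cb) in zip(a, b):
        c = cmp_ord(ea, eb)
        if c != 0:
            return c
        if ca != cb:
            return -1 if ca < cb else 1
    return 0 if len(a) == len(b) else (-1 if len(a) < len(b) else 1)

def add(a, b):
    """Ordinal sum: terms of a with exponent below the leading exponent
    of b are absorbed, so addition is not commutative."""
    if not b:
        return a
    e = b[0][0]
    keep = tuple(t for t in a if cmp_ord(t[0], e) > 0)
    same = [t for t in a if cmp_ord(t[0], e) == 0]
    if same:
        return keep + ((e, same[0][1] + b[0][1]),) + b[1:]
    return keep + b

assert add(ONE, OMEGA) == OMEGA          # 1 + omega = omega
assert cmp_ord(add(OMEGA, ONE), OMEGA) == 1   # omega + 1 > omega
assert add(OMEGA, OMEGA) == ((ONE, 2),)  # omega * 2
```

This is only the additive fragment; multiplication and exponentiation (with the messy extra cases mentioned above) extend the same representation.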
The union (equivalently, the collection) of all ordinals which may be expressed using finite towers of ωs
is called ε0. By induction over ε0 one can prove consistency of Peano arithmetic (with primitive
recursion), so by Gödel's Incompleteness Theorem 9.6.2 general recursion must be needed to show that
ε0 is well founded. Proof theory, which is typically concerned with extensions of first order logic by
specific induction schemes, uses the representability of ordinals to measure the strength of formal
systems. (The final section looks at other ways of doing this.)
Classical applications Refining Burali-Forti's argument, Friedrich Hartogs (1917) actually provided
LEMMA 6.7.11 For any set X, the set HX of ordinal structures on subsets (or subquotients) of X forms an
ordinal, and there is no injective function HX\hookrightarrow X. In particular ω1 = HN is the first
uncountable ordinal and others are defined by iteration (Example 9.6.15(b)). []
EXAMPLE 6.7.12 Let s:Θ→ Θ be a monotone endofunction of an ipo (Definition 3.4.4). With α = HΘ,
define p:succα→ Θ by Theorem 6.7.4. Then x = p(α) satisfies \lnot \lnot (x = sx). If there is a fixed
point, y ∈ Θ such that (y = sy), then x ≤ y. []
Assuming excluded middle, x is the least fixed point, and conversely Exercise 6.54 derives ordinal-
indexed joins from least fixed points.
Besides transfinite recursion, ordinals also provide a convenient way of performing constructions which
require the axiom of choice. Classically, the following are equivalent:
(a)
the axiom of choice as Ernst Zermelo gave it: for any set Θ, there is a function c:P(Θ)\{∅} → Θ
with ∀U.c(U) ∈ U;
(b)
the well-ordering principle: any set carries an ordinal structure;
(c)
Zorn's lemma: any ipo (Definition 3.4.4) has a maximal element;
(d)
the axiom of choice as given in Definition 1.8.8.
Recall that the axiom of choice implies excluded middle (Exercise 2.16).
(a⇒b)
Define p(α) = c(Θ\{p(β) | β ∈ α}) by the General Recursion Theorem 6.3.13. Hartogs'
Lemma bounds the ordinal needed.
(b⇒a)
It is the typical use of the well-ordering principle to let c(U) be the first element of U.
(a⇒c)
Apply Example 6.7.12 to s(x) = c({y | x ≤ y ∧ x ≠ y}); as there is no fixed point, the
construction must cease to be possible at a certain stage, which is a maximal element. []
Axiomatisation It is quite usual for intuitionistic logic to cause a bifurcation of definitions and results.
The Cantor Normal Form fails as a theorem, but proof theory has found ordinals in this form (as
synthetic expressions) valuable as a measure of complexity. On the other hand, what should the analytic
second order definition of ordinal be? One objective of such a theory would be to replace the set theory
which still litters infinitary algebra, for example linking the ``cardinal'' rank in Definition 6.1.1 to the
ordinal one in Proposition 6.7.6, and to preservation of κ-filtered colimits.
Cantor's definition is equivalent, classically, to requiring the relation \prec to be transitive, extensional
and well founded. The successor is just α∪{α} and the condition ∀x. x ≤ \ops_Θ(x) suffices for the
Transfinite Recursion Theorem 6.7.4. However, intuitionistically, we only have
omitted array
environment
There are several intuitionistically inequivalent notions of ordinal, (some of) which André Joyal and
Ieke Moerdijk [JM95] characterised as the free structures in the sense of Theorem 6.7.4 for which s
(a) satisfies no condition, (b) obeys x ≤ s(x), (c) is monotone or (d) preserves binary joins. These
correspond to sets, transitive, plump and directed ordinals.
The problem lies in the confusion of ∈ with ⊂ in the classical theory (where
β ⊂ α⇔ β ∈ α∨β = α). In fact there are three relations: the partial order should be treated a priori
separately even from the inclusion defined in terms of the (irreflexive) well founded relation.
REMARK 6.7.14 The results of Section 6.3 may be developed for the functor shv: Pos→ Pos (Definition 3.1.7) in place of the covariant powerset P: Set→ Set. A shv-coalgebra is well founded in the sense of Definition 6.3.2 iff \prec is a well founded relation (Definition 2.5.3). The difference between sets and ordinals lies in the notion of ``mono'' used to define extensionality, viz inclusions of lower sets. The structure map is
α ↦ {x | x \prec α}:  (X, ≤) \hookrightarrow {U ⊂ X | ∀u,x. x ≤ u ∈ U ⇒ x ∈ U}
The successor operation must be adjusted accordingly: now succα does indeed consist of all initial
segments of α, and all four implications above become reversible. But, as their name suggests, plump
ordinals consist of rather more than the matchstick pictures: 2 is Ω, 3 is shv(Ω) and the axiom-scheme
of replacement seems to be needed to construct plump ω.
Unlike the classical ones, neither the transitive nor the plump ordinals need be directed
(intuitionistically): this rather useful property has to be imposed separately (and hereditarily). In fact the
same strategy, now in the category of binary semilattices (with ∨ but not necessarily ⊥), provides
directed plump ordinals. These are better in that only directed joins are needed in the Transfinite
Recursion Theorem 6.7.4.
Written in set-theoretic language, these intuitionistic notions of ordinal are very complicated [Tay96a].
Category theory has come to the rescue, restoring the unity of the theory by abstract analysis of the old
notions of extensionality and well-foundedness. The constructions may then be applied to other functors
which do not have the ontological pretensions of the powerset, but the resulting ordinals have quite
different properties.
For example, we use least fixed points in informatics, and expect to find a system which, unlike
Cantor's, naturally stops at ω, this being its own successor, rather than being curtailed perfunctorily à la
Hartogs [Tay91]. For this, TX in Dcpo is the lattice of Scott-closed subsets of X, where we used lower
subsets in Pos above. Domains of this kind have been generalised by Roy Crole and Andrew Pitts
[ CP92] to ``FIX objects'' in more complicated recursive structures. The induction scheme must be
restricted to a smaller class of predicates, such as the Scott-continuous ones in Theorem 3.7.13.
The proof of Proposition 6.7.13 above was not Zermelo's, which instead constructed the well founded relation on Θ using the substance of Θ itself, rather than the monolithic structure of all ordinals. This monolith has failed quite spectacularly to provide an intuitionistic proof of the fixed point theorem in Example 6.7.12: such a proof was found in the centenary year of Burali-Forti, but depends on no such machinery. We have already given it in Exercise 3.46, and in fact it is very similar to Zermelo's 1908 proof.
The remaining chapters build and study Cn[]_L, often using induction, recursion and parsing, but only
6.8 Exercises VI
1. Construct FV(t), the set of free variables of a term in a free algebra for a free theory, as an
example of the conjunctive (or, rather, disjunctive) interpretation.
2. Any subset U ⊂ A has a characteristic function χ_U: A→ Ω (Notation 2.8.2). Suppose that U is a T-
3. Define the additive interpretation in N of any finitary free theory in an analogous way to the
conjunctive interpretation in Ω. Describe the free algebra F1 on one generator and the
interpretation [[-]]:F1→ N in terms of trees. Generalise to an interpretation of infinitary free
theories with the class of sets as its carrier.
4. Explain how the set of streams in an alphabet G is the final coalgebra for T = (-)xG. Derive the
final coalgebra for an infinitary free theory from Proposition 6.1.11.
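The finality in Exercise 4 has a direct computational reading: any coalgebra X→ X×G unfolds to a unique stream of letters. A minimal Python sketch (the names `unfold` and `step` are ours, purely illustrative):

```python
from itertools import islice

def unfold(step, state):
    """Unfold a coalgebra step: state -> (letter, next_state) into a lazy G-stream."""
    while True:
        g, state = step(state)
        yield g

# A coalgebra on N that emits 'a' from even states and 'b' from odd ones.
step = lambda n: ('a' if n % 2 == 0 else 'b', n + 1)
print(''.join(islice(unfold(step, 0), 6)))
```

The generator is the unique coalgebra homomorphism to the final coalgebra of streams, read off finitely many letters at a time.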
5. Show that the functor List preserves pullbacks, and P_f preserves monos.
6. Show that the functor T of Definition 6.1.1 preserves directed unions iff the arities of all of the
operations are finitely generable.
7. Prove Remark 1.6.12, that each of the rules of natural deduction for the predicate calculus
preserves validity of formulae in any interpretation, in the sense defined by Example 6.1.9.
10. Use the List functor, head, tail, their analogues for reversed lists, and equalisers to construct the
set of morphisms of the free category on a graph (Theorem 6.2.8(a)). Use append and a
pullback to define composition.
11. Show that if the set Ω of operation-symbols of a finitary free theory has decidable equality then
so does the initial algebra.
12. Let e: X\twoheadrightarrow Y. Consider the free theory with Ω = X+{r}, where ar[x] = ∅ and ar[r] = Y. Let F be its free algebra. Consider also the algebra A = Y+(Y+1)^Y with x_A = ν0(e(x)) and \opr_A(g) = ν1(g;k) for g ∈ A^Y, where k = (id+!): A→ Y+1. Show that each ν0(y) is in the smallest Ω-subalgebra of A, and hence so is ν1(\expx ν0). But if this lies in the (Set-)image of the unique homomorphism p: F→ A then e is split.
14. Let ev: TΩ→ Ω be the characteristic map of the subset T{⊤} ⊂ TΩ. Use this to show that the recursion scheme implies the induction scheme for any functor T ( cf Remark 6.3.10).
15. Show that if parse_X: X→ TX is well founded or extensional then so is Tparse_X: TX→ T²X ( cf
16. Show that any colimit of well founded coalgebras and homomorphisms is well founded. Discuss
extensionality for filtered colimits and pushouts.
17. Let W ⊂ X be a subcoalgebra and suppose that W (but not X) is well founded. Show that it is the
largest well founded subcoalgebra iff the square is a pullback: omitted diagram environment
18. Show that for every well founded relation (X,\prec ) there is a wff-system (for a free theory) of
which it is the immediate sub-expression relation. [Hint: take tgt:(\prec )→ X as the arity display
of the theory.]
19. Show how to apply the General Recursion Theorem 6.3.13 to Remark 6.2.10 and Proposition
6.2.6.
20. Give the symbolic proofs in set theory and algebra corresponding to Proposition 6.3.9, Lemma
6.3.12 and the last part of Theorem 6.3.13, cf the comments after Lemma 6.3.11.
21. Follow the particular cases of the predecessor function and the Euclidean algorithm through
Section 6.4.
22. A natural transformation σ_{Γ,A}: ΓxTA→ T(ΓxA) is called a strength for the functor T if it satisfies omitted diagram environment Find σ for the covariant powerset and for the functor which codes a free algebraic theory.
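In Set, the strength of an everyday container functor simply pairs the parameter with each entry; a small Python sketch under that reading (function names are ours):

```python
def list_strength(gamma, xs):
    """Strength for the list functor: Gamma x List(A) -> List(Gamma x A)."""
    return [(gamma, a) for a in xs]

def powerset_strength(gamma, u):
    """Strength for the covariant powerset: Gamma x P(A) -> P(Gamma x A)."""
    return {(gamma, a) for a in u}
```

Both maps are natural in Γ and A and satisfy the unit and associativity coherence conditions that the omitted diagrams express.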
23. The notions of T-coalgebra and algebra with parameters in Γ are given by parse:ΓxX→ TX and
ev:ΓxTΘ→ Θ, in the presence of a strength σ. Explain how the diagram on the left describes the
parametric recursion scheme, in particular in the cases of the powerset and a free algebraic
theory: omitted diagram environment If the category is cartesian closed show how a solution of
the non-parametric recursion on the right solves the given one, cf Remark 6.1.6.
Using Corollary 6.3.6, show that if X→ TX (doesn't depend on Γ and) is well founded then so is
ΓxX→ T( ΓxX). Without using cartesian closure, formulate and prove directly the parametric
version of Proposition 6.3.9, and hence of the General Recursion Theorem 6.3.13.
24. Recursion may be parametric in another sense, namely that the argument is used in the evaluation
phase. This issue itself divides into two parts, depending on whether the argument is used in
parsed or unparsed form. The second case is given by the diagram on the left: omitted diagram
environment By a similar method to the previous exercise, show that this is equivalent to the
form on the right in a cartesian closed category, where T has a strength σ. Again, formulate and
prove the parametric General Recursion Theorem directly without using cartesian closure.
25. If the original argument is used in parsed form, the recursion scheme simplifies to omitted diagram environment Derive this from the previous case by putting Θ = ΦxX. Conversely, reduce that one to this using ev_X = parse_X^{-1}, in the case where X is the initial T-algebra (an extensional
26. A unary recursion problem is one which makes at most one recursive call at each level, but where
the argument may be used again after the return from the nested call. Explain how this is coded
by a functor of the form T = K+Cx(-) and show how to reduce this to the diagram on the left
(allowing the argument as a parameter as in Exercise 6.25): omitted diagram environment The
map a:XxΘ→ Θ gives a homomorphism List(X)→ [Θ→ Θ] of monoids. Suppose that this factors
through another monoid M, which is called an accumulator. Show how to reformulate the
problem as a tail recursion with functor T′ = (-)+(NxM), as on the right.
Explain how the case M = List(X) corresponds to a stack, and how M = N suffices if a does not
depend on X. The last map ΘxM→ Θ is itself defined by recursion over M: to what part of the
execution of a recursive program does it correspond?
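As a concrete illustration of the accumulator transformation (our own example, with the additive monoid (N,+) as M): the unary recursion h(n) = n² + h(n−1) becomes a loop.

```python
def sumsq_rec(n):
    """Direct unary recursion h(n) = a(n, h(n-1)) with a(x, t) = x*x + t."""
    return 0 if n == 0 else n * n + sumsq_rec(n - 1)

def sumsq_acc(n):
    """Tail-recursive form: a factors through the accumulator monoid (N, +)."""
    acc = 0
    while n > 0:
        acc, n = acc + n * n, n - 1
    return acc
```

Here M = N suffices because a(x,t) = x² + t depends on x only through the monoid element x², as the exercise observes for maps a not depending on X.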
27. Express the factorial program in this form and apply the transformation, using the monoid M = (N,x). Do the same for the Fibonacci function, which is calculated using the unary recursion on N²:
p(0) = (0,1)    p(n+1) = (v,u+v) where (u,v) = p(n),
using the monoid of (2×2) matrices under multiplication. Hence show that p(n) = (0,1)(0 1 || 1 1)^n, so by calculating (0 1 || 1 1)^{2^k} the problem can be solved in logarithmic time. What features of a recursive program must a compiler recognise in order to make use of optimisations of this kind?
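The logarithmic-time calculation may be sketched as follows (a standard exponentiation-by-squaring routine, not taken from the text): p(n) = (0,1)·Mⁿ with M = (0 1 || 1 1), and Mⁿ is computed by repeated squaring along the binary expansion of n.

```python
def mat_mul(A, B):
    """Product of 2x2 matrices given as nested pairs of rows."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a*e + b*g, a*f + b*h), (c*e + d*g, c*f + d*h))

def fib(n):
    """F(n) via p(n) = (0,1).M^n, with M^n computed by repeated squaring."""
    M, P = ((0, 1), (1, 1)), ((1, 0), (0, 1))   # P accumulates the answer
    while n > 0:
        if n % 2:
            P = mat_mul(P, M)
        M, n = mat_mul(M, M), n // 2
    return P[0][1]   # M^n = ((F(n-1), F(n)), (F(n), F(n+1)))
```

The optimisation rests on recognising that the accumulator operations are drawn from an associative monoid, so they may be regrouped and the powers squared.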
28. Express fold and append for lists (Definition 2.7.4) using Exercise 6.26, with M = List(X), and translate these back into functional programs. What advantage, if any, do the tail-recursive programs have over the original non-tail recursion using a stack?
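For comparison, here is append in both styles in Python (our rendering; the text's List(X) becomes a Python list): the direct structural recursion uses the call stack, while the tail-recursive version makes the stack explicit as an accumulator M = List(X).

```python
def append_rec(xs, ys):
    """Append by structural (non-tail) recursion on the first list."""
    return ys if not xs else [xs[0]] + append_rec(xs[1:], ys)

def append_tail(xs, ys):
    """Tail-recursive form: the implicit call stack becomes an explicit list."""
    acc = []
    for x in xs:
        acc.append(x)          # descent phase: push onto the accumulator
    out = ys
    for x in reversed(acc):    # evaluation phase: unwind the stack
        out = [x] + out
    return out
```

Any advantage is therefore a matter of where the stack lives, not of how much of it there is.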
29. General recursion extends primitive recursion by use of the search or minimalisation operator
µ. For any partial recursive function f:ΓxN \rightharpoonup 2, µn.f(x,n) is the n (if any) for which
f(x,n) = 1 but f(x,m) = 0 for all m < n (if any of these values is undefined then so is µn.f(x,n)).
Express this as a while program, and hence as recursion over a certain coalgebra structure (ΓxN)
\rightharpoonup (ΓxN)+N.
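The while-program reading of µ can be sketched directly (a hypothetical helper, with f given as a total 0/1-valued function for simplicity; partiality of f would surface as non-termination or an exception inside the loop):

```python
def mu(f, x):
    """Least n with f(x, n) = 1, by unbounded search; diverges if there is none."""
    n = 0
    while f(x, n) == 0:
        n += 1
    return n
```

For example, mu(lambda x, n: 1 if n * n >= x else 0, 10) searches 0, 1, 2, 3 before stopping at 4.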
30. Show that (L,z,s) ≡ (List(X),[ ], cons) iff the diagrams omitted diagram environment are
respectively a coproduct and a coequaliser, cf Example 6.4.13.
31. Modify the Floyd rules of Remarks 4.3.5, 5.3.2 and 5.3.9 for partial correctness, as in
Remark 6.4.16ff.
33. Investigate unification in equationally free algebras that are not necessarily well founded, so that
the equation x = r(x) can be solved.
34. Show that the unifier is the quotient by the parsing congruence, and that the congruence is the
kernel of the unifier, cf Section 5.6. Allow infinitely many equations, and infinite arities.
37. Explain why, in Definition 6.6.2(b), any coequaliser diagram k\rightrightarrows n generates a
finite equivalence relation on n.
38. Prove the pigeonhole principle, which is that, for any function f:n+1→ n, there are some i < j ≤ n
with f(i) = f(j). Deduce Bo Peep's theorem (Exercise 1.1).
39. Develop a Kuratowski-style unary induction scheme for finitely enumerable sets, and use it to
prove the pigeonhole principle in an extensive category (Section 5.5) without assuming the
existence of N, or using numbers at all.
40. By considering the orders of elements of its Sylow subgroups, show that there is no simple group
of order 105.
41. Show that {∗|φ} (Remark 2.2.7) is finitely enumerated iff φ is decidable. [Hint: exactly how
many elements does it have?] Show that this also holds for finite generability.
43. Develop the results about Kuratowski finiteness and the finite powerset starting from the unary
induction scheme in Lemma 6.6.10(b), in the style of Definition 3.8.6.
44. Show that the following binary Kuratowski induction scheme is equivalent to the unary one: omitted prooftree environment
45. Let R: X\leftharpoondown\rightharpoonup Y be any binary relation, and suppose that U ⊂ X, V ⊂ Y are finitely generable sets which match in the sense that
∀u.∃v. u R v  ∧  ∀v.∃u. u R v.
Show that U and V have matching listings, ie U = {u_0, …, u_{n-1}} and V = {v_0, …, v_{n-1}} with ∀i. u_i R v_i. Deduce the following results by taking R to be equality or an order relation.
(a)
The kernel pair of the set of elements function List(X)→ P_f(X) is generated by the laws for a semilattice; use Section 5.6 to deduce that P_f(X) is the free semilattice on X.
(b)
Let (X, ≤) be a preorder. Suppose P_f(X) also carries a preorder \sqsubseteq ( not inclusion) with respect to which ∪: P_f(X)xP_f(X)→ P_f(X) and {-}: X→ P_f(X) are monotone. Then if U, V ⊂ X are finitely generable and match with respect to ≤, ie U\cvx ≤ V in the notation of Exercise 3.55, then U\sqsubseteq V.
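A direct construction of the matching listings for finite U and V (our own sketch; it permits repeats, as the exercise's listings do): pair each u with some related v, each v with some related u, and concatenate the two families of pairs.

```python
def matching_listings(R, U, V):
    """Given the match condition, produce equal-length listings with u_i R v_i.

    Each u is paired with some related v and vice versa; repeats are allowed,
    as in the exercise's notion of listing."""
    pairs = [(u, next(v for v in V if R(u, v))) for u in U]
    pairs += [(next(u for u in U if R(u, v)), v) for v in V]
    us, vs = zip(*pairs)
    return list(us), list(vs)
```

Taking R to be equality or an order relation then yields the two deductions (a) and (b).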
46. Let L be a single-sorted algebraic theory with only finitely many operation-symbols and laws,
and let A be an L-algebra. Show that if the carrier of A is a finite set then A is finitely presentable
as an algebra.
47. Let L be a single-sorted algebraic theory which is finitary in the stronger sense that it has finitely
many ( ie a finitely enumerated set of) operation-symbols, each of finite arity, and finitely many
laws (where, for example, the associative law counts as one law). Let A be an algebra whose
carrier is Kuratowski-finite, so there is a surjection n\twoheadrightarrow A for some n. Show that
there are a finitely enumerated algebra B and a surjective homomorphism B\twoheadrightarrow
A. [Hint: show that n is an algebra for the operation-symbols of L but not necessarily the laws,
and that the laws of L define a finite equivalence relation k\rightrightarrows n.]
49. (Only for those who have studied ring theory.) Using the fact that polynomial rings are
Noetherian, show that every finitely generable commutative ring is finitely presentable.
50. Let G be a set with decidable equality equipped with a map G→ G which has no cycles ( cf Exercise 2.47). Construct functions ν: GxP_f(G)→ G and σ: P_f(G)xP_f(G)→ P_f(G) such that ν(x,U) ∉ U and σ(U,V) ≡ U+V, where the coproduct inclusions are also to be found. The functions must not depend on the order in which the elements of the finite sets are given.
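The construction of ν may be sketched as unbounded search along the cycle-free map (our rendering, with s: G→ G as a Python function and U a finite set):

```python
def nu(s, x, U):
    """Find an element outside the finite set U by iterating the cycle-free map s.

    Since s has no cycles the orbit x, s(x), s(s(x)), ... has no repeats, so it
    cannot remain inside the finite set U; membership tests use U only as a
    set, so the result is independent of the order of its listing."""
    while x in U:
        x = s(x)
    return x
```

With G = N and s the successor, nu(s, 0, U) returns the least number outside U reachable from 0.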
51. Using the results of Section 2.6, show how to code ω^ω, ω^{ω^ω}, etc either as special orders on N or with the arithmetical order on Q.
52. Describe the three product relations on αxα defined in Propositions 2.6.7- 2.6.9, for each of α = ω, ω·2 and ω². If the result is not extensional, describe the extensional quotient as a binary operation αxα→ β. [Hint: one of them gives ordinal multiplication and another intersection.]
53. Given a Choice function c:P(Θ)\ {∅} → Θ (Exercise 2.15), define T:P(Θ)\{∅} → P( Θ) by T
(U) = U∪{c(Θ\U)} for U ≠ Θ and T(Θ) = Θ. Let α ⊂ P(Θ) be the smallest set closed under T and
union ( cf Proposition 3.7.11 and Exercise 3.46). Show that α is in bijection with Θ, and that
every non-empty subset of α has a first ( ⊂ -greatest) element (without using Hartogs' Lemma;
this was Zermelo's proof).
54. Show, using excluded middle, that if a poset X has a least fixed point for every monotone endofunction s: X→ X then X has a least element and all ordinal-indexed joins. [Hint: for ⊥, take s = id; for y ∈ X and a diagram x_{(-)}: α→ X consider whether {β | x_β ≤ y} is α or an element of α, and in the second case use the successor.]
55. Show that the relation x\prec y ≡ (\lnot x∧y ) on Ω is extensional and well founded, but that the
reflexive closure of this relation is sparser than ⇒ unless excluded middle holds. Show that this
structure has no non-trivial automorphism, and that (up to isomorphism) this is the only
extensional well founded relation on Ω.
56. Let X be a set with a bijection X+X ≡ X. Construct a bijection HXxHX ≡ HX. An ordinal κ is said to be a cardinal (or an initial ordinal) if it is least amongst the ordinal structures on the same underlying set. Show that ∅, H(-) and directed union ∪↑ generate the class of cardinals, and hence deduce that κxκ ≡ κ for any cardinal κ. Hκ ≡ κ⁺ is called its successor cardinal.
57. Show that any transitive extensional well founded relation is trichotomous, using proof boxes,
making explicit the use of excluded middle in the form \lnot \lnot φ\vdash φ. [This is not easy: try
Remark 2.4.10.]
VII. Adjunctions
Practical Foundations of Mathematics Paul Taylor Cambridge University Press
INTRODUCTION
● Colimits
● Free algebras
● Completions
● Co-universal properties
● Classifying categories
7.2 ADJUNCTIONS
● Applications
● Proof of the equivalence
● Reflections and representables
7.5 MONADS
● Encoding operations
● Canonical language
● The equivalence
● Conservativity by normalisation
EXERCISES VII
Chapter 7
Adjunctions
Universal properties galore have arisen in the earlier chapters, and it is high time we gave a unified
framework for them. In 1948 Pierre Samuel identified universal properties as a common formulation of
several constructions in topology, and the Bourbaki school used them in their comprehensive account of
mathematics. Independently, Daniel Kan introduced adjoint pairs of functors, with the tensor product and internal hom for vector spaces as his main example (1958). The name was suggested by Sammy Eilenberg, by analogy with ⟨Ta,b⟩ = ⟨a,T*b⟩ for adjoint operators in a Hilbert space, a notation which had itself been proposed by Marshall Stone.
Nowadays, every user of category theory agrees that this is the concept which justifies the fundamental
position of the subject in mathematics.
There are several other formulations (such as ends and Kan extensions), and which of them to use is a
matter of personal taste. The symmetrical presentation of a pair of adjoint functors between two
categories will be given in Section 7.2, but this raises logical questions because of the choice of a
particular product or whatever within its isomorphism class. We prefer diagrammatic reasoning, which
opens the calculations out to view, especially in complicated situations, and avoids the Choice. We shall
also show that the naturality conditions on adjoint functors - all too easily dismissed as bureaucracy - are
directly related to substitution- and continuation-invariance of the rules of type theory.
The most commonly used universal constructions are limits and colimits. We devote Sections 7.3-7.5 to
them and to how they relate to other adjunctions, using the fact that left adjoints preserve colimits to
fashion each from the other. Limits and colimits of topological spaces have an almost completely ``soft''
construction, and we also investigate free algebras, leading into the theory of monads (Section 7.5).
Often the thing which is required is obtained as a composite of two universal constructions: recognising
it as such makes the development more modular. Sometimes the construction can only be done in a few
simple cases, others being too complicated to be judged reliable. When it becomes apparent that it is an
adjoint - frequently to something completely trivial, for example pullback between slices is right adjoint
to composition - the general case quickly falls into line (Exercise 7.42).
Finally we return to the task of showing the equivalence between syntax and semantics. Adjunctions not only describe the logical connectives themselves, but also characterise the category Cn[]_L of contexts and substitutions. Equivalences, in their strong form themselves examples of adjunctions, settle the issue of the choice of the structure (Section 7.6). Lastly, Section 7.7 proves some deep results about syntax using just adjunctions and pullbacks.
DEFINITION 7.1.1 We say that η:X→ UA is a universal map from the object X ∈ obS to the functor U:
A→ S if, whenever f:X→ UΘ is another morphism with Θ ∈ obA, there is a unique A-map p:A→ Θ
such that η;U p = f ( cf Remark 3.6.3 for posets).
If every object X has a universal map to U then the latter is called an adjunctible functor.
As we showed for the terminal object in Theorem 4.5.6, the universal property determines A up to
unique isomorphism. So it is a description, but in the interchangeable sense which we discussed at the
end of Section 1.2. Of course this is why we generalised Russell's theory. Although any description allows us to introduce a function (indeed a functor, the left adjoint), there is a loss of logical clarity here because of the choices; we prefer to shift the balance of the formulation back from algebra to logic.
EXAMPLES 7.1.2
(a)
Let U:A ⊂ S = P(Σ) be the inclusion of the lattice of closed subsets for a system of closure
conditions on a set Σ. Then η:X→ UA is the inclusion of an arbitrary subset X ⊂ Σ in its closure
(Section 3.7).
(b)
The possibility modal operator ◇, existential quantifier ∃ and direct image f_! in Section 3.8. These will be developed in Section 9.3.
(c)
In general, let U:A→ S be a monotone function with F\dashv U. Then η:X→ U(FX) is one of the
inequalities in Lemma 3.6.2.
EXAMPLES 7.1.3
(a)
Let U: A→ 1 be the unique functor to the category with one object (∗) and only its identity morphism. Then η = id_∗: ∗→ U0 is the universal map iff 0 ∈ obA is the initial object. The mediating homomorphism p: 0→ Θ is the unique map, cf Example 3.6.10(a), Definition 4.5.1ff and Definition 5.4.1(a).
(b)
Let U: A→ AxA be the diagonal functor A ↦ (A,A), and N and Y two objects of A. Then the pair η = (ν0,ν1): (N,Y)→ (A,A) is universal iff it is a coproduct diagram. Given maps f: N→ Θ and g: Y→ Θ, the mediator is p = [f,g], cf Example 3.6.10(b), Definitions 4.5.7ff and 5.4.1(b). omitted diagram environment
(c)
Let U: A→ A^⇉ be the diagonal functor to the category whose objects are parallel pairs; η = (q,r) is universal iff r = u;q = v;q and q is the coequaliser (Definition 5.1.1(a) and Proposition 5.6.8ff).
(d)
Let ℑ be any diagram-shape and U: A→ A^ℑ the diagonal or constant functor (X ↦ λi.X) into the functor category A^ℑ (Theorem 4.8.10). Then for X(-): ℑ→ A and A ∈ obA, η: X(-)→ UA is universal iff it is a colimiting cocone. For any cocone f, the mediator p in Definition 7.1.1 is that from the colimit, cf Example 3.6.10(c).
Free algebras Besides coequalisers, the new feature which arises when we move from posets to
categories is the free algebra (Chapter VI).
EXAMPLES 7.1.4
(a)
Let U:A→ S be the forgetful functor CRng→ Set. Then A is Z[X], the ring of polynomials in a
set X of indeterminates, and p evaluates polynomials using the assignment f:X→ Θ.
(b)
With A = Vsp, let A ⊂ R^X consist of those functions a: X→ R with non-zero values at only finitely many elements of X. The unit η(x) is the xth basis vector and p(∑_i a_i x_i) = ∑_i a_i f(x_i).
(c)
Let U:Rng→ Mon, forgetting addition. The free ring A consists of linear combinations of
elements of a monoid. When X = G is a group this is called the group ring Z G. Similarly, with K
any ring, for the coslice U:K ↓ Rng→ Mon (the objects of K↓ Rng are called K- algebras), we
get the group ring KG.
(d)
Let A = Rel, Pos, Sp, Graph or Cat and S = Set, with U the forgetful functor. Then A is X with
the discrete structure (the empty relation, equality, all subsets open, no arrows, and only identity
maps).
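The evaluation map p of Example 7.1.4(a) can be made concrete; a small Python sketch (the representation of polynomials as coefficient-monomial pairs is our own choice):

```python
def eval_poly(poly, f):
    """Evaluate a polynomial, given as a list of (coefficient, exponent-dict)
    terms, under an assignment f of ring values to the indeterminates.
    This is the unique homomorphism Z[X] -> Theta extending f."""
    total = 0
    for coeff, monomial in poly:
        term = coeff
        for var, power in monomial.items():
            term *= f[var] ** power
        total += term
    return total
```

For example, eval_poly([(3, {'x': 2, 'y': 1}), (5, {})], {'x': 2, 'y': 1}) evaluates 3x²y + 5 at x = 2, y = 1.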
Completions An important special case of universal maps arises from ``completing'' an object, so that when it is complete (re-)applying the construction has no (further) effect. This is the most direct analogue of a closure operation (Section 3.7, but see also Section 7.5).
LEMMA 7.1.5 The functor U: A→ S is
(a)
full and faithful iff id: UA→ UA is a universal map from UA to U for each A ∈ obA, and
(b)
an equivalence functor iff for every X ∈ obS there is an invertible universal map η:X ≡ UA.
A (full, and often by convention replete) inclusion A ⊂ S which is adjunctible is called a reflective
subcategory (Corollary 7.2.10(b)).
PROOF:
(a)
Both conditions say that ∀f: UA→ UΘ. ∃!p: A→ Θ. f = id;Up.
(b)
For equivalence we also need, for each X ∈ obS, some isomorphism η: X ≡ UA. The issue is therefore to show that any such η is a universal map. For any f: X→ UΘ there is a unique p: A→ Θ with Up = η^{-1};f: UA→ UΘ, since U is full and faithful. []
EXAMPLES 7.1.6 Reflections are sometimes completions (the universal map is injective) and sometimes
quotients (where it is surjective).
(a)
Pos ⊂ Preord is reflective, the quotient η:X\twoheadrightarrow X/ ∼ described in
Proposition 3.1.10 being the universal map.
(b)
Let S be the category of metric spaces and isometries (functions that preserve distance) and A ⊂
S the full subcategory of spaces in which every Cauchy sequence (Definition 2.1.2) converges.
This is reflective, and the universal map η:X\hookrightarrow A is an inclusion.
(c)
The functor U:Set\hookrightarrow Graph which treats a set as a discrete graph (Example 7.1.4
(d)) is the inclusion of a reflective subcategory, where η:X\twoheadrightarrow X/ ∼ is given by
the set of (zig-zag) components (Lemma 1.2.4 ). Similarly we have the components of a preorder,
groupoid or category.
(d)
Imposing laws such as distributivity or commutativity on algebras, for example the reflections of
DLat\hookrightarrow Lat, CMon\hookrightarrow Mon and AbGp\hookrightarrow Gp, results
in a quotient.
(e)
The reflection of AbGp in CMon adjoins negatives, but unless the monoid has the cancellation
property ∀x, y, z.x+z = y+z⇒ x = y (as with N→ Z ) there is also identification.
(f)
Let S be the category of integral domains and monomorphisms and Fld ⊂ S be the full
subcategory of fields. Then η:X\hookrightarrow A is the inclusion of X in its field of fractions, for
example Z\hookrightarrow Q .
(g)
Let S be the category of well founded relations and simulations, so S ≡ Coalg(P) by
Remark 6.3.5. Let A be the full subcategory which consists of the extensional relations; set-
theorists call its objects transitive sets, and its maps are set-theoretic inclusions [Osi74]. Then A
⊂ S is reflective; Andrzej Mostowski (1955) used Definition 6.7.5 to find the extensional
quotient set- theoretically. See Exercise 9.62 and [ Tay96a] regarding the axiom of replacement.
Co-universal properties
DEFINITION 7.1.7 A map ε:FX→ A is said to be co-universal from the functor F:S→ A to the object A
if for every map p:FΓ→ A in A there is a unique map f:Γ→ X in S such that Ff;ε = p.
EXAMPLES 7.1.8
(a)
Coclosures, universal quantifiers (∀), necessity ([]) and other right adjoint monotone functions,
analogously to Examples 7.1.2.
(b)
Limits, X = lim A_i, the duals of Examples 7.1.3. The co-unit ε is the family of projections (π_i).
(c)
((-)∧φ)\dashv (φ⇒ (-)) in a Heyting semilattice (Proposition 3.6.14).
(d)
Let S be a cartesian closed category and F = (Yx(-)): S→ S. Then ε: FX→ A is universal iff X = A^Y and ε is evaluation.
(e)
For the forgetful functor F:Set→ Rel, X = P(A) and ε(a) = {a}.
(f)
For F:Set→ Pfn, X = Lift A and ε(a) = lift a (Definition 3.3.7).
(g)
The co-reflection of Gp into Mon gives the group of units.
(h)
Set, again with the discrete structure, is co-reflective in Bin, Sp, Graph , Preord, Gpd and Cat.
Co-universality of ε:F X→ A says that X is A with the indiscriminate structure (also called
indiscrete or chaotic), which cannot distinguish between points. So x→ y, x ≤ y, X(x,y) = {∗} , etc
for all x, y. Classically, the indiscriminate topology only has ∅ and X as open sets.
(i)
For any set (``alphabet'') G, the set Γ of G-streams carries a map read: Γ→ ΓxG, so it is a coalgebra for the functor (-)xG (Lemma 6.1.8). Let F be the underlying set functor. Then FX→ 1 is co-universal from F to 1 ( ie X is the terminal coalgebra) iff X = G^N ≡ G^N xG, this structure being induced by N+1 ≡ N.
There are also examples of symmetrically adjoint contravariant functors, generalising Galois
connections (Proposition 3.8.14). Recall in particular that negation, (-)⇒ ⊥, is symmetrically self-
adjoint (Exercise 3.50).
EXAMPLES 7.1.9
(a)
The Lineland Army is a monoid equipped with a symmetric self-adjunction (Example 1.2.7 and
Exercise 7.8).
(b)
Let Σ be any object of a cartesian closed category S, such as Ω in Set. Then X ↦ Σ^X, as a functor S→ S^op, is symmetrically adjoint to itself on the right. omitted diagram environment
(c)
Let Σ be a field, say R. For a vector space V, the dual space V* consists of the linear maps V→ Σ. Then (-)*: S→ S^op defines a self-adjoint functor on the category of vector spaces, whose unit was the original natural transformation (Example 4.8.8(b)).
EXAMPLES 7.1.10
(a)
The inclusion BA ⊂ HSL of Boolean algebras in the category of Heyting semilattices has both a
reflection and a co-reflection, and these functors are the same. Exercise 7.12 shows that this
situation always arises from a natural idempotent, in this case \lnot \lnot .
(b)
The inclusion Gp ⊂ Mon is also both reflective and co-reflective, but now the two adjoints are
different, as they are for the inclusion of any complete sublattice (Remark 3.8.9).
Classifying categories We have constructed the preorder or category Cn[]_L of contexts and substitutions for various fragments [] of logic.
(a)
Extensionally, each of them is the system of well formed formulae built up from some indeterminates (\vec x or L) using certain operations, namely addition and multiplication, or the logical connectives of the fragment []. (Recall also that the hom-set Cn×_L([\vec x:\vec X
(b)
Any sequence \vec a ∈ R of elements of another ring, or any model M of L in a semantic world E, may be substituted for the indeterminates, and the syntactic formulae ``evaluated'' using structural recursion. omitted diagram environment
(c)
Intensionally, this is a universal property, because Z[\vec x] and Cn[]_L have the same structure as R or E and [[-]] is the unique homomorphism of this structure taking the generic object to the concrete one.
(a)
Let [] be unary propositional logic. Then L→ Clone_L is the inclusion of (a set with) a binary relation (unary closure condition, Section 3.8) in its reflexive-transitive closure. This is the universal map from L to the forgetful functor U: Preord→ Bin; in fact Preord is reflective in Bin (an object of which is a set with a binary endorelation).
(b)
Let [] be Horn logic (propositional algebraic); then U is the forgetful functor from SLat. For
propositional geometric and intuitionistic logic, the category A is Frm or HSL (Section 3.9).
(c)
Let [] be classical propositional logic, so A = BA (Boolean algebras), and let X be a finite set. Then Cn_boole = 2^{2^X} is the set of disjunctive or conjunctive normal forms (Remark 1.8.4) in the variables X.
(d)
U: CSLat→ Pos. Then A = shv(X) = [X^op→ Ω], the lattice of lower sets, ordered by inclusion (Definition 3.1.7), with η(x) = X↓x and p(I) = ∨_Θ{f(x) | x ∈ I} by Proposition 3.2.7(b).
(e)
The classifying monoid for a free single-sorted unary theory gives the universal map to U ≡ List:
Mon → Set. In Section 2.7 we wrote [[l]] = fold(e,m,f,l).
(f)
A many-sorted free unary theory is described by an oriented graph; the classifying category is
composed of paths ( U:Cat → Graph).
(g)
Finally, elementary sketches present equational many-sorted unary theories, and the classifying
category is free on the sketch.
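The unit of the reflection in (a), the reflexive-transitive closure of a binary endorelation, is directly computable; a sketch using Warshall's algorithm (names ours):

```python
def refl_trans_closure(nodes, edges):
    """Reflexive-transitive closure of a binary endorelation on a finite set:
    the unit of the reflection of Preord in Bin."""
    reach = {(x, x) for x in nodes} | set(edges)
    for k in nodes:                # allow paths through each node in turn
        for i in nodes:
            for j in nodes:
                if (i, k) in reach and (k, j) in reach:
                    reach.add((i, j))
    return reach
```

The result is the smallest preorder containing the given relation, as the universal property requires.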
REMARK 7.1.13 The intuition that Cn[]_L has a polynomial structure is a valuable one, but there are three technical problems.
(a)
For an ``adjunction'' Cn[](-)\dashv L[] we need a category of languages, but there seems to be no convincing notion of map L1→ L2, even for elementary sketches. In Sections I §10 and II §14 of [LS86], Joachim Lambek and Philip Scott define language morphisms to match their semantic interpretation, Cn[]_{L1}→ Cn[]_{L2}. However, this is not consistent with our use of Bin and Graph above, and so loses the dynamic aspect of the language which was the theme of Sections 3.7-3.9 and also of Chapter VI. See also Remark 7.5.2.
(b)
The semantic structure corresponding to the type theory (products, exponentials, etc ) is itself
characterised categorically by universal properties. In defining the classifying category, should
we require [[-]] to preserve them on the nose (and to be unique up to equality), as we expect from
syntax, or only up to isomorphism, as is the case for semantics? This question itself is the subject
of Section 7.6. If we take the semantic option, then the universal property of the classifying
category is more complicated than Definition 7.1.1: the interpretation functor [[-]] is only unique
up to unique isomorphism - if it is defined at all, as some Choice is to be made.
(c)
The axiom-scheme of replacement is needed to construct [[-]].
The second question also arises when we define limits and colimits of categories; we shall return to this
point at the end of the next section.
REMARK 7.1.15 Historically, these intuitions emerged from algebraic geometry in the form of a
classifying topos. The fragment of logic which is classified by toposes (known as geometric logic)
includes products, equalisers and arbitrary pullback-stable colimits, the coproducts being disjoint and the
quotients effective (Chapter V). In particular it allows existential quantification and infinitary
disjunction.
By analogy with polynomials, the classifying topos for, say, groups was written Set[G]. Given a group G in another topos E, [[-]]:Set[G]→ E evaluates type-expressions by substituting the particular group G for the generic G. Being a homomorphism of the categorical structure, [[-]] preserves finite limits and has a right adjoint, written f* ≡ [[-]] ⊣ f_*. Such an adjoint pair is called a geometric morphism, and is the analogue for toposes of the inverse and direct image operations on open sets that arise from a continuous function f between spaces. Then groups in E correspond to continuous functions f:E→ Set[G]; in particular the ``points'' of the topos Set[G] are ordinary groups, since the topos Set denotes a singleton space. Hence Set[G] is thought of as the ``space of groups,'' not to be confused with the category Gp - homomorphisms are in fact the ``specialisation order'' (Example 3.1.2(i)) on this space.
This analogy between model theory and topology explains why both subjects depend so heavily on
Choice: it is necessary to find points, models or prime ideals with certain properties. However,
propositional geometric logic is only special in so far as its model theory has this familiar points-and-
open-sets form, whereas the categorical model theory of full first order logic has not yet been fully
worked out. Classifying gadgets may be constructed in a uniform way for any fragment of logic,
although we have considered simpler ones; besides, we have been interested in toposes as models of
higher order logic, the relevant homomorphisms being logical functors, ie those that preserve Ω.
7.2 Adjunctions
Now we shall transform universal maps from their logically quantified statement into a purely algebraic form. This involves making a Choice (Exercise 3.26) of universal and co-universal maps, which provide the components of two natural transformations. Although they are unique up to unique isomorphism, these play a crucial role, and are just as much part of the definition as are the functors U and F. In particular we shall show that the laws which they obey express the β- and η-rules in type theory (Sections 2.3 and 2.7), and (as in Definition 4.7.2(c)) the naturality condition handles substitution or continuation.
In this section we drop the Convention 4.1.2 for brackets (which arose from Currying, Convention 2.3.2): FUA means F(UA), etc.
DEFINITION 7.2.1 An adjunction consists of four pieces of data:
(a)
the right or upper adjoint functor U:A→ S,
(b)
the left or lower adjoint functor F:S→ A,
(c)
the unit natural transformation, η:id_S→ U·F, sometimes called the front adjunction, and
(d)
the co-unit natural transformation, ε:F·U→ id_A, also called the back adjunction,
cf Lemma 3.6.2 for posets. The letters L and R will be used to identify the triangle laws.
THEOREM 7.2.2 The following are equivalent:
(a)
(F,U,η,ε) form an adjunction.
(b)
A natural isomorphism λ:A(F(-),(=)) ≡ S((-),U(=)), called the adjoint transposition (cf Definition 3.6.1),
omitted prooftree environment
is given between functors S^op × A ⇉ Set. (It is actually enough for λ to be natural in one argument for each fixed object as the other.)
(c)
There is an assignment η_X:X→ U(A_X) of universal maps from each object X ∈ obS to the functor U.
(d)
There is an assignment ε_A:F(X_A)→ A of co-universal maps from the functor F to each A ∈ obA.
omitted array
environment
We shall break with our usual custom by deferring the proof until after Remark 7.2.9.
REMARK 7.2.3 Naturality of λ means that the following square in Set commutes, which, for p:F X→ B,
says that λ(Fu;p;z) = u;λ(p);Uz.
Hence if the square on the left below commutes in A then so does that on the right in S (and conversely
since λ is bijective).
Applications Recall from Sections 2.3 and 2.7 that the introduction, elimination, equality, β- and η-rules for the product, sum, function-type and List constructions are summed up by adjoint correspondences.
REMARK 7.2.4 Consider the common situation in which a universal property is used to define a new
construction in terms of an old functor. Indeed the definition of universal maps, unlike that of
adjunctions, was phrased in just this way.
(a)
The unit η provides the operations for the introduction rules, if these are direct (ν0, ν1, 0, +,
empty list and append), and the co-unit ε gives the direct operations for elimination (π0, π1, ev).
(b)
The triangle law on the old side is the β-rule.
(c)
The triangle law on the new side is extensionality or the η-rule. (It is a pity that the letter η has established meanings for two different parts of the anatomy of an adjunction.)
(d)
The adjoint transposition λ is the meta-operation or indirect rule (λ-abstraction, pairing, case
analysis and recursion).
(e)
The naturality condition on the new side of the adjunction defines the effect of the new
construction on morphisms.
(f)
Product and exponential are right adjoints, and naturality on the old side (the left) states the substitution-invariance of ⟨-,-⟩ and λ.
(g)
For sum and List, which are left adjoints, old (right) naturality gives the continuation rules; in these cases substitution-invariance must be expressed by an extra condition.
EXAMPLE 7.2.5 For products, F:S→ S×S is the diagonal functor and U = ×. Then λ is ⟨-,-⟩, the co-unit ε is the family of projections (π_i) and the unit η is the diagonal map ∆:X→ X×X. The product functor is defined on maps,
omitted prooftree
environment
by naturality on the right (the new side). The triangle laws are
or, type-theoretically, π0⟨x,x⟩ = x, π1⟨x,x⟩ = x and ⟨π0(z),π1(z)⟩ = z. We must use naturality of ε to put the β-rules in the form π0⟨x,y⟩ = x.
Notice that y:1→ Y is a global element: if it is only partially defined then commutativity of the square on the right becomes an inequality, so strict naturality fails in p-categories (Exercise 5.15).
EXAMPLE 7.2.6 For colimits, λ⁻¹ is case analysis [-,-], and the unit η is the family of inclusions (ν_i).
The naturality condition on the left (the new side) defines + on maps. On the right (old) it states invariance under continuation z (Remark 2.3.13). This says that the (+E)-box is open-ended, cf Remark 1.6.5.
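Example 7.2.6's continuation rule can likewise be checked in Set. In this sketch the tags 'inl'/'inr' are an assumed encoding of the coproduct, not notation from the text:

```python
# Coproduct adjunction in Set: case analysis [f,g] as the inverse
# transposition, with tagged values standing for the inclusions nu0, nu1.

nu0 = lambda x: ('inl', x)   # unit components: the inclusions
nu1 = lambda y: ('inr', y)

def case(f, g):              # [f,g]: X + Y -> Theta
    return lambda v: f(v[1]) if v[0] == 'inl' else g(v[1])

f = lambda x: x + 1
g = lambda y: -y
z = lambda n: n * 10         # a continuation

# beta: [f,g] . nu_i picks out f resp. g
assert case(f, g)(nu0(3)) == f(3)
assert case(f, g)(nu1(3)) == g(3)
# old-side (right) naturality: z . [f,g] = [z.f, z.g] - the continuation rule
for v in [nu0(2), nu1(5)]:
    assert z(case(f, g)(v)) == case(lambda x: z(f(x)),
                                    lambda y: z(g(y)))(v)
```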
EXAMPLE 7.2.7 For function-types, F = (-)×Y and U = (=)^Y. The adjoint transposition (λ) is λ-abstraction and the co-unit ε is evaluation. Naturality on the left (the old side) states substitution-invariance of λ (Definition 4.7.2(c)), and on the right (new) is postcomposition. The naturality of ε is (-)^X on morphisms; and that of η:x→ λy.⟨x,y⟩ is substitution under λ.
The left triangle law is the essence of the β-rule, but in the special case p = ⟨x,y⟩: (λy′.⟨x,y′⟩)y = ⟨x,y⟩. To see it at work we must pre- and postcompose with the argument and the function, and use naturality, cf Remark 2.3.9.
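Example 7.2.7 is the familiar curry/uncurry correspondence; a minimal sketch (names assumed, not the text's):

```python
# Exponential adjunction (-) x Y -| (=)^Y in Set: curry is the adjoint
# transposition lambda, and ev is the co-unit epsilon.

def curry(p):                  # lambda: (X x Y -> Z)  to  (X -> Z^Y)
    return lambda x: (lambda y: p((x, y)))

ev = lambda fz: fz[0](fz[1])   # epsilon: Z^Y x Y -> Z, evaluation

p = lambda xy: xy[0] - xy[1]

# left triangle law (beta): ev . (curry(p) x id) = p, checked pointwise
for x in range(3):
    for y in range(3):
        assert ev((curry(p)(x), y)) == p((x, y))

# eta: transposing ev gives back a function element pointwise
fz = curry(p)(10)
assert all(curry(ev)(fz)(y) == fz(y) for y in range(5))
```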
EXAMPLE 7.2.8 List:Set→ Mon is left adjoint to the forgetful functor. The unit is the singleton list, and the co-unit List(A)→ A multiplies out a list of elements of the monoid A. The adjoint transposition λ⁻¹ is (roughly) what we called fold in Proposition 2.7.5 and [[-]] in Section 4.6 and elsewhere. Naturality on the left (the new side) for u:X→ Y is the action mapu of the functor (Example 2.7.6(d)). On the right it is the action of a monoid homomorphism z (Remark 2.7.12).
Example 7.4.6 shows how ε_A:List(A)→ A captures the structure of the algebra A.
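Example 7.2.8 can be sketched with Python lists standing for List(X); a monoid is assumed to be given by a unit e and a binary operation m:

```python
# The adjunction List -| U between monoids and sets: the unit is the
# singleton list, the co-unit multiplies out, the transposition is fold.

eta = lambda x: [x]                      # unit: X -> List(X)

def eps(e, m, ls):                       # co-unit for the monoid (A, e, m):
    out = e                              # List(A) -> A, multiply out a list
    for a in ls:
        out = m(out, a)
    return out

def fold(e, m, f, l):                    # transposition: f: X -> A extends
    return eps(e, m, [f(x) for x in l])  # to a homomorphism List(X) -> A

add = lambda a, b: a + b
# in the additive monoid (int, 0, +):
assert eps(0, add, [1, 2, 3]) == 6
# triangle law R: eps_{FX} . List(eta_X) = id on the free monoid List(X)
# (here e = [] and m = concatenation, since + concatenates lists)
assert fold([], add, eta, [1, 2, 3]) == [1, 2, 3]
# triangle law L: U(eps_A) . eta_{UA} = id on elements of A
assert eps(0, add, eta(42)) == 42
```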
REMARK 7.2.9 This account is connected with the binary rules for List ( append and fold) in Definition
2.7.4ff, whereas recursion in Section 6.1 was based on the unary form ( cons and listrec). The unary
and binary theories employ universal properties in different ways, which are related to the different roles
of the alphabet X.
(a)
The adjunction Mon ⇄ Set describes all lists whatever, and we make use of the free monoid on any set of generators.
(b)
Definition 6.1.1 built X into the functor T, so we only use the initial T-algebra for recursion.
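The two styles of Remark 7.2.9 can be set side by side in a short sketch (function names are illustrative): the binary form uses a unit and a binary operation, the unary form a single cons-step.

```python
# Binary (monoid-style) fold versus unary (cons-style) listrec.

def fold(e, m, f, l):            # binary: unit e, binary operation m
    out = e
    for x in l:
        out = m(out, f(x))
    return out

def listrec(z, s, l):            # unary: one step s(head, result-of-rest)
    if not l:
        return z
    return s(l[0], listrec(z, s, l[1:]))

l = [1, 2, 3, 4]
assert fold(0, lambda a, b: a + b, lambda x: x, l) == 10
assert listrec(0, lambda h, r: h + r, l) == 10
assert listrec([], lambda h, r: [h] + r, l) == l   # cons rebuilds the list
```

For an associative operation with a unit the two agree; in general listrec is the more primitive, as in Section 6.1.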
Proof of the equivalence It is surprising how little structure need exist in advance: we need naturality
of λ on one side, for which we must know what the morphisms are. The rest of the structure is derivable:
the naturality and triangle laws are the essential part.
PROOF OF THEOREM 7.2.2: We must show that the eight expressions are functorial, natural, mutually
inverse or universal as appropriate.
(a)
[a⇒ b] Bijectivity of λ uses naturality and the two triangle laws. omitted diagram environment Naturality uses naturality (what else?). omitted diagram environment
(b)
[b⇒ c] Putting λ for B = FX, p = id_{FX}, u = id_X in Remark 7.2.3 gives omitted diagram environment so z mediates for λ(z), and is unique because λ is bijective.
(c)
[c⇒ a] First we must define F functorially on maps and show that η is natural. Fu = λ⁻¹(u;η_Y): FX→ FY is the unique map making the square commute, but this square states naturality of η: omitted diagram environment For u = id only Fu = id will do, and similarly F(u;v) = Fu;Fv, both fill in for the composite along the top (cf Proposition 4.5.13). Putting ε_A = λ⁻¹(id_{UA}) immediately satisfies law R, but we have to show that it satisfies L and is natural. omitted diagram environment The naturality square for ε in the left-hand diagram commutes since both routes serve for λ⁻¹(Uz), using universality of η_{UA}. The right-hand diagram (the L law) commutes by naturality of η with respect to η_X and by the definition of ε_{FX}; the top composite (Fη_X;ε_{FX}) and id both serve for λ⁻¹(η_X), so L holds.
The proofs of [b⇒ d] and [d⇒ a] are dual, ie they are obtained from [b⇒ c⇒ a] by interchanging the
parameters and reversing the arrows. []
Reflections and representables The equivalence amongst these four presentations explains how various other terminologies arise. Recall from Theorem 4.8.12(b) that the Yoneda embedding identifies X ∈ obS with H_X = S(-,X) in Set^{S^op}, cf S↓x in Definition 3.1.7.
COROLLARY 7.2.10
(a)
F:S→ A has a right adjoint iff for each A ∈ obA, the presheaf (contravariant functor, Example 4.4.2(g)) A(F(-),A):S^op→ Set is representable, by X = UA (Definition 4.8.13). Cf the representable lower set {X|F(X) ≤ A} in the proof of Theorem 3.6.9.
(b)
U:A→ S is the inclusion of a reflective subcategory iff it is part of some adjunction whose co-
unit is invertible (being mono suffices), cf closure operations (Section 3.7).
(c)
U is an equivalence functor iff it is part of a strong equivalence.
PROOF: The first part is immediate and the other two are similar to Lemma 7.1.5. Recall the characterisations of
● strong equivalence, by η_X and ε_A both being invertible, for each X ∈ obS and A ∈ obA respectively (Definition 4.8.9(b)).
By the R law, η_{UA} is invertible iff Uε_A is; indeed the latter being mono suffices. This holds if ε is mono because U, being a right adjoint, preserves limits (Theorem 7.3.5) and hence monos (Proposition 5.2.2(d)). Conversely if U is the inclusion of a reflective subcategory then it is full and faithful, so ε is invertible. []
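A poset-level instance of Corollary 7.2.10(b), in the spirit of the closure operations of Section 3.7, can be computed directly: lower sets form a reflective "subcategory" of the subsets of a poset, the reflection being down-closure. The carrier and names below are an assumed toy example.

```python
# Down-closure as a reflection: the unit is the inclusion X <= cl(X),
# and on already-closed sets the reflection is the identity.

le = lambda a, b: a <= b          # the poset: integers 0..5, usual order
carrier = range(6)

def down_close(s):                # reflection: subsets -> lower sets
    return frozenset(a for a in carrier if any(le(a, b) for b in s))

x = frozenset({2, 4})
cx = down_close(x)
assert x <= cx                    # unit: X is contained in its closure
assert down_close(cx) == cx       # idempotent: the co-unit is invertible
# universal property: any lower set containing X contains cl(X)
lower = frozenset({0, 1, 2, 3, 4})
assert x <= lower and cx <= lower
```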
The natural bijection is a simple and symmetrical way of presenting an adjunction and remembering (in
a subject with plenty of traps for the dyslexic) which is left and which is right. Although an ``intuitively''
natural construction usually turns out to be natural in the formal sense, Theorem 7.6.9 shows that
naturality may be the point at issue. We saw in Remark 7.2.3 how checking it can be built into the
idiom.
The formulation using representables shows how to generalise the notion of universal property, for example to that of Cn[]_L.
REMARK 7.2.11 Let A and C be 2-categories. The isomorphism A(F(-),(=)) ≡ C((-),U(=)) may be replaced by either a strong or a weak equivalence of hom-categories (Definition 4.8.9). The various forms are illustrated by Cat: this has limits, exponentials and coproducts in the ``isomorphism'' sense (cf Exercise 4.49), but other colimits can only be defined by an adjoint correspondence which is an equivalence functor. These different situations are distinguished by referring to 2-limits, but bi-colimits. Definition 7.3.8 also gives a new kind of 2-limit. An even more general situation is where A and C are enriched, ie A(A,B) and C(X,Y) belong to some other category V such as AbGp [Kel82].
Adjunctions are more common than the newcomer to category theory might expect, and individual cases
are correspondingly less remarkable. Not unusually they are produced like a rabbit out of a hat,
following an argument in which the more canny reader has watched the conjurer stuffing them in, as free
algebras. The point is that, if the categories (S and A) and the functors (F and U) are artificial (A = S×S for the product, to give a mild example), then the significance of the adjunction F ⊣ U is not particularly cogent. The universal property in the style of Definition 7.1.1 expresses the important facts
more directly. It also avoids the question of Choice, which is, frankly, a red herring: this is a side-effect
of imposing one-dimensional algebraic notation on a logical situation which is naturally a little subtler.
7.3 General Limits and Colimits
The original case for which a definition like the one which follows was needed was that of a chain or ω-sequence, where obℑ = N and there is an arrow (n+1)→ n in the limit case (projective or inverse limits) and n→ (n+1) in the colimit case (inductive or direct limits). When it was realised that products, pullbacks, equalisers and their duals fit the same pattern, it became customary to define a diagram-shape as a category. More recently, since the identities and composition play no role in the definition of (co)cones and (co)limits - and often get in the way - authors' habits have reverted to regarding them as graphs. Each convention has its uses, and the notion of elementary sketch covers both.
DEFINITION 7.3.1 A diagram-shape is an elementary sketch by yet another name ( Definition 4.2.5ff),
usually with only a set of nodes. (See Definition 3.2.9 for the poset analogues.)
(a)
A diagram (of shape ℑ) in a category C is an interpretation of the sketch, ie an assignment of an object X_i ∈ obC to each node i ∈ ℑ and a morphism X_u:X_i→ X_j to each arrow u:i→ j, in such a way that the given polygons (laws) commute. So if ℑ is given as a category, a diagram is a functor X_(-):ℑ→ C.
(b)
A cone with vertex Γ over this diagram assigns a map α_i:Γ→ X_i to each node i ∈ ℑ such that for each (``base'') arrow u:i→ j in ℑ, the (``cross-section'') triangle α_i;X_u = α_j commutes.
(c)
A limit is a universal cone (L,(π_i)_{i∈ℑ}), so that for any other cone (Γ,(α_i)_{i∈ℑ}) there is a unique mediating map h:Γ→ L with h;π_i = α_i for each i ∈ ℑ. omitted diagram environment
(d)
A cocone with covertex Θ under the diagram is an assignment of a morphism φ_i:X_i→ Θ to each node i ∈ ℑ such that for each arrow u:i→ j in ℑ, the law X_u;φ_j = φ_i holds.
(e)
A colimit is a universal cocone (C,(ν_i)_{i∈ℑ}), so that for any cocone (Θ,(φ_i)_{i∈ℑ}) there is a unique mediating map h:C→ Θ with ν_i;h = φ_i for each i ∈ ℑ.
(f)
A category which has all set-indexed limits or colimits is called complete or cocomplete
respectively, cf Definition 3.6.12 for lattices.
(g)
See Definition 4.5.10 regarding preservation and creation of limits and colimits. Some authors say that a functor is (co)continuous when it preserves (co)limits, but we reserve this word for Scott-continuity, ie preservation of filtered colimits.
As for all universal properties, limits and colimits, where they exist, are unique up to unique isomorphism. They are written lim_{i∈ℑ} X_i and colim_{i∈ℑ} X_i respectively. The projections π_i and inclusions ν_i are an essential part of the definition.
EXAMPLES 7.3.2
(a)
If C is a preorder then a cone is a lower bound and a cocone is an upper bound; the arrows of ℑ
are redundant. Limits and colimits are meets and joins respectively (Definition 3.2.4).
(b)
For ℑ = ∅, a (co)cone is simply an object, the (co)vertex; then a limit is a terminal object and a
colimit is an initial object.
(c)
If ℑ is a discrete graph (with no arrows), or a discrete category (with only identity maps), then a
diagram is a family of objects, a limit is its product and a colimit its coproduct.
(d)
For the graph on the left below, a limit is an equaliser and a colimit is a coequaliser.
(e)
For the graph on the right, a limit is a pullback, but the colimit is simply the value of the diagram
at the corner. For the opposite of this diagram-shape, a colimit is a pushout.
(f)
The equaliser of a parallel pair consisting of an endomap i:X→ X and the identity is the set of
fixed points. If i is idempotent then the coequaliser is the image, and these two objects are
isomorphic; the mono and epi split the idempotent i (Definition 1.3.12).
(g)
Section 6.4 used similar diagrams (without the requirement that i be idempotent) to study while
loops.
(h)
If ℑ has a terminal object then we call it a wide pullback; such diagrams arose in Exercise 3.34
and Lemma 5.7.8. The term is also applied to a diagram of such a shape or its limit.
(i)
If the graph ℑ is connected (in an unoriented sense, cf Lemma 1.2.4) then we similarly refer to
connected limits. The epithet refers to the shape of the defining diagram, and does not mean that
the object L is connected, for example as a topological space. The empty diagram is not
connected; wide pullbacks are, as are equalisers, which are not wide pullbacks. For simple
connectedness, see Exercise 8.13.
(j)
A category in which every finite diagram has a cocone (which need not be colimiting) is called
filtered; this generalises directedness for posets (Definition 3.4.1). The forgetful functor U:Mod
(L)→ Set for a finitary theory L creates filtered colimits ( ie of diagrams whose shape is filtered),
cf Theorem 3.9.4 and Definition 6.6.14.
(k)
(Freyd) Assuming excluded middle (so Ω = 2), if a small category has arbitrary limits then it is a preorder. Let a, b:Γ ⇉ X be distinct and I = morC be the set of all morphisms; then, by the definition of I-fold product, 2^I ⊂ C(Γ,X)^I ≡ C(Γ,X^I) ⊂ I, contradicting Cantor's Theorem (Proposition 2.8.8). This result remains true in any sheaf topos over the classical category of sets.
(l)
Martin Hyland, Edmund Robinson and Giuseppe Rosolini [ HRR90] showed that the effective
topos has a reflective, and so complete, subcategory (whose objects are known as modest sets)
which is weakly equivalent to the internal (``small'') category of PERs (Exercise 5.10). Because
of the problem of Choosing amongst isomorphic objects, the latter does not have limits of
parametric diagrams.
LEMMA 7.3.3 Let X_(-):ℑ→ C be a diagram in a category with equalisers. If C has products indexed by obℑ and morℑ then lim_i X_i exists. Moreover if F:C→ D preserves these equalisers and products then it preserves the limit.
EXAMPLES 7.3.4
(a)
The lemma was found by abstracting the construction of general limits in Set, namely the subset of the product consisting of those (obℑ)-tuples s which satisfy X_u(s_i) = s_j for each u:i→ j in ℑ.
(b)
Equalisers, binary products and a terminal object suffice to construct all finite limits (see also
Example 5.1.3(c)).
(c)
Similarly, the colimit C of any diagram X_(-):ℑ→ Set is the coequaliser of a pair of functions into the coproduct C_0. By Lemma 5.6.11, this coequaliser is the quotient C = C_0/∼ by an equivalence relation, namely that generated by the relation ⟨i,x⟩ ∼ ⟨j,y⟩ if there is an arrow u:i→ j in ℑ such that y = X_u(x).
Although it is useful for showing that limits and colimits exist in certain semantic categories, this
decomposition into products and equalisers is misleading for type theory, as we shall see in Section 8.3.
(See [ML71, p. 109] for a more detailed proof of the lemma.)
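The constructions of Example 7.3.4(a,c) can be carried out mechanically for a tiny diagram. In this sketch (all data invented for illustration) the diagram is a parallel pair u,v:A ⇉ B, so the limit is their equaliser and the colimit their coequaliser:

```python
# Limits in Set as compatible tuples; colimits as quotients of the
# disjoint union by the generated equivalence relation.
from itertools import product

A, B = [0, 1, 2, 3], [0, 1]
u = lambda a: a % 2
v = lambda a: 0

# limit of the parallel pair: pairs (a, b) with u(a) = b = v(a),
# ie the equaliser of u and v inside the product A x B
lim = [(a, b) for a, b in product(A, B) if u(a) == b and v(a) == b]
assert [a for a, _ in lim] == [0, 2]      # elements where u and v agree

# colimit: B quotiented by the equivalence generated by u(a) ~ v(a)
classes = {b: {b} for b in B}
for a in A:                               # merge the classes of u(a), v(a)
    merged = classes[u(a)] | classes[v(a)]
    for b in merged:
        classes[b] = merged
assert classes[0] == classes[1] == {0, 1} # u, v force 0 ~ 1: one class
```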
THEOREM 7.3.5 Let F ⊣ U. Then F preserves any colimits that exist and U any limits, cf Proposition 3.6.8.
PROOF: Let X_(-):ℑ→ S be any diagram and let ν_i:X_i→ C be a colimiting cocone under it in S. Then
(a)
By functoriality, Fν_i:FX_i→ FC is also a cocone under FX_(-).
(b)
Let χ_i:FX_i→ Θ be a cocone under the diagram FX_(-):ℑ→ A. By the adjoint correspondence there are maps φ_i:X_i→ UΘ, which, by naturality with respect to X_u, form a cocone under the diagram X_(-):ℑ→ S. omitted diagram environment Hence there is a unique mediating map C→ UΘ.
(c)
Taking this back along the adjoint correspondence, there is a map p:FC→ Θ, which is a mediator for the cocone χ_i by naturality. It is unique because the argument is reversible.
The result for limits is the same with the arrows reversed. []
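Theorem 7.3.5 can be witnessed in miniature: the left adjoint (-)×Y of Example 7.2.7 preserves coproducts, ie (A+B)×Y ≅ (A×Y)+(B×Y) in Set. The tags and comparison map below are assumed encodings:

```python
# A left adjoint preserving a colimit: (-) x Y preserves binary coproducts.
from itertools import product

A, B, Y = ['a1', 'a2'], ['b1'], [0, 1, 2]

# A + B as a tagged disjoint union
coprod = [('inl', a) for a in A] + [('inr', b) for b in B]

lhs = set(product(coprod, Y))                         # (A + B) x Y
rhs = set(('inl', (a, y)) for a, y in product(A, Y)) | \
      set(('inr', (b, y)) for b, y in product(B, Y))  # (A x Y) + (B x Y)

iso = lambda py: (py[0][0], (py[0][1], py[1]))        # the comparison map
assert set(map(iso, lhs)) == rhs                      # bijective
assert len(lhs) == len(rhs) == (len(A) + len(B)) * len(Y)
```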
REMARK 7.3.6 Colimits are preserved by left adjoints because they are themselves left adjoints, and a
diagram of left adjoints commutes iff the right adjoints do, cf Remark 3.6.11. However, to state this
precisely we have to formulate the analogue of Lemma 3.6.4ff (see Exercise 7.27ff).
EXAMPLES 7.3.7
(a)
Let U:A ⊂ S be the (full and replete) inclusion of a reflective subcategory, with F ⊣ U, and let A_(-):ℑ→ A be a diagram. Then
F colim^S_I UA_I = colim^A_I FUA_I ≡ colim^A_I A_I
by Corollary 7.2.10(b), cf Proposition 3.7.3 for posets. As an easy special case of Proposition 7.5.6, the inclusion U not only preserves but creates limits (Definition 4.5.10(c)).
(b)
In a cartesian closed category, where X×(-) ⊣ (=)^X,
❍ (-)^X preserves limits;
❍ the contravariant functor Σ^(-) sends colimits to limits, because it is symmetrically self-adjoint on the right (Example 7.1.9(b)); in particular Σ^{X+Y} ≡ Σ^X × Σ^Y;
(c)
Since the forgetful functors Set→ Pfn and Set→ Rel have right adjoints (namely the lifting and
covariant powerset functors), the colimits with respect to total functions work for partial
functions and for relations too.
(d)
The forgetful functor U:Mod(P)→ Set from the category of algebras for the covariant powerset
functor creates (small) limits, but has no left adjoint, since the initial P-algebra would be P(A) ≡
A, contradicting Cantor's theorem (Propositions 2.8.8 and 6.1.4(b)).
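The isomorphism Σ^{X+Y} ≡ Σ^X × Σ^Y of Example 7.3.7(b) can be counted exhaustively in Set, representing functions as dictionaries (an assumed encoding):

```python
# Functions out of a coproduct correspond to pairs of functions:
# Sigma^(X+Y) = Sigma^X x Sigma^Y, here with Sigma a two-element set.
from itertools import product

Sigma, X, Y = [0, 1], ['x1', 'x2'], ['y1']

def functions(dom, cod):          # all functions dom -> cod, as dicts
    return [dict(zip(dom, vals))
            for vals in product(cod, repeat=len(dom))]

lhs = functions(X + Y, Sigma)     # functions out of the coproduct X + Y
rhs = list(product(functions(X, Sigma), functions(Y, Sigma)))

# the bijection: restrict a function on X + Y to its two halves
split = lambda f: ({x: f[x] for x in X}, {y: f[y] for y in Y})
assert len(lhs) == len(rhs) == 2 ** (len(X) + len(Y))
assert sorted(map(str, map(split, lhs))) == sorted(map(str, rhs))
```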
Because of Example 7.3.2(k) we cannot reasonably ask for categories to have and functors to preserve
class-indexed (co)limits. Theorem 7.3.12 shows what alternative condition suffices to obtain the right
adjoint for a colimit-preserving functor. The forgetful functor from the category of complete Boolean
algebras to Set is a more popular counterexample, but it takes rather more work to show that FN doesn't
exist [Joh82 , p. 57].
Comma categories The next construction is a new kind of limit which arises in 2-categories, just as
equalisers appeared when we moved from posets to categories. Section 7.7 makes a powerful
application of this simple idea to the equivalence of syntax and semantics.
DEFINITION 7.3.8 Let F:C→ S← A:U be functors. Then the comma category F↓ U has
(a)
objects (X,A,α) where X ∈ ob C, A ∈ obA and α:FX→ UA in S (we commonly abbreviate this to
the map α:F(X)→ U(A) alone, bracketing the objects for emphasis if necessary; such an object
will be called tight if the map α is an isomorphism),
(b)
morphisms the pairs φ:X→ Y, u:A→ B making the square on the left commute:
PROPOSITION 7.3.9 Let Γ be another category with functors P:Γ→ C and Q:Γ→ A and a natural transformation φ:F·P→ U·Q. Then there is a functor ⟨P,Q,φ⟩:Γ→ F↓ U, unique up to equality, such that P = π0·⟨P,Q,φ⟩, Q = π1·⟨P,Q,φ⟩ and φ = α·⟨P,Q,φ⟩.
In particular, if the square from Γ commutes up to isomorphism then the mediator factors through the
pseudo-pullback, ie the full subcategory of F↓ U consisting of the tight objects. []
(a)
If A = {∗}, U∗ = A, S = C, F = id_C, we have the slice C↓ A (5.1.8).
(b)
Dually, for C = {∗}, F∗ = X, S = A, U = id_A, the coslice X↓ A.
(c)
If C, S and A are discrete categories then F↓ U is the pullback in Set of their sets of objects
(Example 5.1.4(b)).
(d)
If they are preorders then F↓ U = {⟨x,a⟩ | x ∈ C, a ∈ A, Fx ≤ Ua}.
(e)
An initial object of X↓ U (where C = {∗} and F(∗) = X) is a universal map from the object X to
the functor U (Definition 7.1.1).
(f)
The functor U is called final if for every X ∈ obS, X↓ U is a connected category. See
Exercise 7.19 for filtered final functors.
(g)
A terminal object of F↓ A (where A = {∗} and U(∗) = A) is a co-universal map from the functor F
to the object A (Lemma 4.5.16 and Definition 7.1.7ff);
(h)
(Lawvere) F ⊣ U iff there is an isomorphism of categories omitted diagram environment
such that the triangle of functors commutes.
(i)
For S = C, F = id_C and U:A→ S any functor, S↓ U is called the gluing construction (Section 7.7).
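Example 7.3.10(d) can be computed outright for small preorders; the chains and the monotone maps below are invented data:

```python
# The comma category F|U for monotone maps between preorders is just
# the set of pairs (x, a) with F(x) <= U(a).

C, A = [0, 1, 2], [0, 1]
F = lambda x: x            # monotone C -> S (S = integers, usual order)
U = lambda a: 2 * a        # monotone A -> S

comma = [(x, a) for x in C for a in A if F(x) <= U(a)]

assert (0, 0) in comma           # F(0) = 0 <= 0 = U(0)
assert (2, 0) not in comma       # F(2) = 2 >  0 = U(0)
assert (2, 1) in comma           # F(2) = 2 <= 2 = U(1)
# morphisms (x,a) -> (y,b) exist when x <= y and a <= b; the squares
# commute automatically in a preorder, so comma is itself a preorder.
```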
Equivalent colimits We now prove Proposition 3.2.10 for categories. Let X_(-):ℑ→ C be a diagram. Then any functor U:J→ ℑ between diagram-shapes induces a ``restriction'' of cocones ψ_I:X_I→ Θ under the diagram of shape ℑ to φ_J = ψ_{UJ} under the composite J→ ℑ→ C.
PROPOSITION 7.3.11 If U is final then this is a bijection: every cocone φ_J:X_{UJ}→ Θ for J extends uniquely to a cocone ψ_I for ℑ. So whenever either of them exists, colim_{I∈ℑ} X_I ≡ colim_{J∈J} X_{UJ}.
PROOF: Let φ_J:X_{UJ}→ Θ be a cocone in C under J. By finality, for I ∈ obℑ the comma category I↓ U is connected. Let a:I→ UJ be any object of it and put ψ_I = X_a;φ_J, as we must for this to extend φ.
Any other object a′:I→ UJ′ of I↓ U is linked to this one by a zig-zag, of which (for induction on its length) we need only consider one step j:J→ J′. Since this is a morphism of I↓ U and φ_(-) was a cocone, the triangles commute, so ψ_I = X_{a′};φ_{J′}.
We may then take ψ_I as the value at I of a cocone over ℑ, because for i:I′→ I, I↓ U ↪ I′↓ U.
The general adjoint functor theorem We would like to show that if S has and F:S→ A preserves colimits then F has a right adjoint.
By Example 7.3.10(g), it suffices to show that, for each A ∈ obA, the category F↓ A has a terminal object. Since the terminal object of any category X is the colimit of the diagram id_X:ℑ→ X with ℑ ≡ X,
UA = colim{X | (f:FX→ A) ∈ F↓ A},
as in the proof of Theorem 3.6.9. (This is an example of a left Kan extension, the categorical analogue of the modal operator ⟨ ⟩ in Section 3.8.)
Unfortunately, F↓ A is a large category, so by Example 7.3.2(k) we cannot ask for its colimit directly.
There may, however, be a smaller diagram which has the same colimit, in the sense of
Proposition 7.3.11.
THEOREM 7.3.12 [Peter Freyd, 1963] Let S be a category which has, and F:S→ A a functor which preserves, all small colimits. Let A ∈ obA and suppose that the solution-set condition holds: some small subdiagram of F↓ A has the same colimit, in the sense of Proposition 7.3.11.
Then there is a co-universal map ε_A:FX→ A from the functor F to the object A. Making such an assignment for each A ∈ obA yields a right adjoint for F.
If we are already in possession of the co-universal map, the singleton subdiagram consisting of this alone suffices, so the solution-set condition is trivially necessary.
Preservation of limits is normally excellent heuristic evidence justifying the search for an adjunction: it
is well worth making a habit of checking whether functors preserve products, coproducts and a few other
cases. Frequently a few minutes' work will identify the essential features of any adjunction which holds,
and the quickest way of proving adjointness is the universal property. We have also seen in the
preceding sections that the effect of the two functors on morphisms, the unit, η, co-unit, ε, transposition,
λ, and their naturality all shed light on the phenomena under study. So, although this is technically
``redundant'' information, they should always be investigated and described.
The general adjoint functor theorem has been cited [ML88] as the first theorem of category theory itself, but the solution-set condition seriously limits its value. Which is more useful, an ad hoc construction verifying this condition, or a presentation of the adjoint, which is an invariant? Clearly the
latter, because subsequent researchers will want to know as much as possible about what it does, and
maybe study it in its own right. Describing the adjoint is elementary (in the technical sense) and usually
simpler than testing preservation of all (co)limits properly.
Although the theory of general limit diagrams is important to bringing together products, equalisers,
pullbacks and inverse chains, it begs a number of questions, in particular where such (infinite) diagrams
come from. We shall discuss this and the existence of arbitrary limits and colimits in Set in the final
section of the book.
Limits and colimits in topology and order theory We do not ask that the spaces be T0, the preorders
antisymmetric or the groupoids skeletal, as this destroys the adjunction (γ) - a quotient such as that in
Proposition 3.1.10 must be applied to the colimits.
(a)
Any limit of preorders is the limit of the underlying sets, equipped with some order; similarly the
vertex classes of a limit of relations, graphs, categories or groupoids (from β).
(b)
Moreover the hom-sets are also calculated as limits, and the order relation as a conjunction (from
δ).
(c)
Any limit of spaces is the limit of the underlying sets, equipped with some topology (from β).
(d)
The limit of a diagram of discrete relations, etc is discrete (from α). This is not so for spaces, as it
is not possible to define the connected components of an arbitrary topological space.
(e)
A colimit of preorders is the colimit of the underlying sets, equipped with some preorder; also the
vertex classes of graphs, etc (from γ).
(f)
A colimit of spaces is the colimit of the underlying sets, equipped with some topology or
preorder (from γ).
(g)
Ω:Sp→ Frm^op and pts:Frm^op→ Sp (Exercise 4.14) are symmetrically adjoint on the right, so
they send colimits to limits. The frame of open sets of a colimit is the limit of the frames of open
sets of the spaces; in particular the Aleksandrov topology (whose open sets are the upper sets) on
a colimit of preorders and the Scott topology (Proposition 3.4.9) on dcpos are found as limits
(from ε).
(h)
A colimit of discrete spaces is discrete (from β).
(i)
The set of connected components of a colimit of graphs etc is the colimit of those of the graphs in
the diagram (from α).
(j)
The space of points of a limit of locales is the limit of the spaces of points of the locales
themselves (from ε). []
(a)
As we know the underlying set of a limit of spaces, identifying the topology is reduced to a
lattice-theoretic problem: it is the coarsest that makes the projections continuous.
(b)
Similarly, a colimit carries the finest topology making the inclusions continuous. (In fact these
properties may also be expressed in pure category theory, by observing that the points functor Sp
→ Set is a bifibration, Definition 9.2.6(e).)
(c)
Limits of locales (colimits of frames) are not in general the same as the corresponding limits of
spaces; they must be found using generators and relations, as below. Nuclei (Example 3.9.10(a))
give the most efficient technique; see [Joh82] for more detail. []
REMARK 7.4.3 Van Kampen's Theorem 5.4.8 may also be formulated as the preservation of colimits by a
left adjoint. The right adjoint takes a groupoid or category to its nerve, which is a simplicial complex
whose 0- and 1-cells are the objects and morphisms, and whose higher n-cells are composable sequences
of length n (so Theorem 4.2.12 described the two-dimensional skeleton).
Generators and relations We can exploit the relationship between left adjoints and colimits to
construct free algebras using coequalisers and vice versa .
THEOREM 7.4.4 Every finitary equational algebraic theory L has a free algebra FX/R on any set X of
generators for any set R of relations.
(a)
Forget the sorts and laws of L and the relations R;
(b)
add the generators X as nullary operation-symbols to the theory;
(c)
Proposition 6.1.11 and Example 6.2.7 give equationally free algebras;
(d)
the minimal subalgebra of an equationally free algebra, consisting of raw terms, is well founded
and parsable; recall that it may also be constructed either as the colimit ∪T^n(∅), or as the union
of all extensional well founded T-coalgebras (Section 6.3);
(e)
it satisfies the recursion scheme by Theorem 6.3.13;
(f)
the sorts are restored by restricting to the well formed formulae (Proposition 6.2.6); let
\LeftAdjoint0X be the free algebra for the theory L0 (known as the signature) obtained by
forgetting the laws of L;
(g)
the congruence K is generated from the laws and relations R;
(h)
the required free algebra FX/R is the quotient Q = \LeftAdjoint0X/K by this equivalence relation.
Let us spell out the last step; notice that it treats ``general'' laws and ``particular'' relations in the same
way, so without loss of generality R includes the laws of L. Each member of R relates two raw terms, so
we have a parallel pair of functions R\rightrightarrows \LeftAdjoint0X. Let K be the congruence
generated from R in step (g), and Q its quotient in Set^Σ.
For Theorem 5.6.9, the theory must be finitary in order to define the operations on the quotient, making
it a coequaliser in Mod(L). Any algebra Θ for the equational theory L is a fortiori a model of the free
theory L0, and so it has a unique homomorphism \LeftAdjoint0X→ Θ. That it satisfies the laws is exactly
to say that the composites with R\rightrightarrows \LeftAdjoint0 X are equal. The composite
homomorphisms with K\rightrightarrows \LeftAdjoint0X are also equal because K is the congruence-
closure of R. Hence there is a mediating function Q→ Θ, and this is a homomorphism. []
EXAMPLE 7.4.5 Recall from Example 4.6.3(i) that a category with a given set O of objects is an algebra
for a theory with O² sorts, O+O³ operation-symbols and O²+O²+O⁴ laws. Theorem 6.2.8(a) constructed
the free such algebra on an oriented graph (with the same set O of nodes). By Theorem 7.4.4 we now
have the free category on any elementary sketch (Theorem 4.2.12). []
EXAMPLE 7.4.6 Every algebra presents itself by its multiplication table. It is generated by its own
elements a ∈ A, subject to the relations that each depth 1 expression r([(a)\vec]) ∈ TA (Definition 6.1.1)
is identified with its value \opr_A([(a)\vec]) ≡ \ev_A(r([(a)\vec])) ∈ A ( cf the ambiguity in the usage of the
symbol r for both the operation-symbol and the operation \opr_A).
This is the canonical language for the algebra A. It has the same operation-symbols as the single-sorted
theory for which A is an algebra, plus a constant for each element of A, and a law for each \opr_A([(a)\vec]).
We shall generalise the canonical language to larger fragments of logic in Section 7.6. []
Daniel Kan's original example of an adjoint derived the tensor product M⊗(-) à la Remark 7.2.4 from
the internal hom Hom(M, =). As such it is constructed from generators and relations.
EXAMPLE 7.4.7 The tensor product of two modules M and N for a commutative ring R is the free R-
module generated by symbols \qq (m,n) (for m ∈ M, n ∈ N) subject to relations such as r\qq (m,n) = \qq (rm,n) = \qq (m,rn) and \qq (m+m′,n) = \qq (m,n)+\qq (m′,n). []
Computing colimits Conversely, we may exploit the self-presentation of algebras for finitary algebraic
theories. If we had a convincing notion of morphism of languages ( cf Remark 7.1.13(a)), these results
would simply be applications of the fact that Cn×(-), being a left adjoint, preserves colimits. Indeed the
first is plausibly the coproduct of theories.
LEMMA 7.4.8 Let A and B be generated by X and Y respectively, subject to the relations R and S.
PROOF: The relations S for B are irrelevant: Q is also the coequaliser of FY\rightrightarrows A, where FY
is the algebra freely generated by the given generators of B. Then Y may be viewed as an additional
system of relations, but these are on A, not FX as required, so let P→ (FX)2 be the pullback of Y→ A2.
(See Exercise 5.53 for why this is not just the pullback of Y\rightrightarrows A.) []
Some of these manipulations of colimits have been implemented in ML by David Rydeheard [RB88].
Treating relations as another theory These constructions are not very tractable: they mix the
fundamentally incompatible techniques of structural recursion and quotients by equivalence relations.
We cannot expect any better (such as from the General Adjoint Functor Theorem 7.3.12), because
while-programs can be interpreted with coequalisers. From the point of view of machine representation, we
have to accept generators and relations as a legitimate way of naming objects. We must also resort to
these methods to find coproducts of groups, and so the fundamental groupoids of spaces obtained by
surgery à la van Kampen (Theorem 5.4.8) .
Nevertheless some unification is possible, by treating both the creation of raw terms and their
identifications under the laws in the same way, as generative processes, but at different levels. (We shall
have to do this for generalised algebraic theories in Remark 8.4.2.) We should take the principle of
interchangeability more seriously, adding a categorical dimension below that which we're used to
considering in mathematics: individuals can only be represented by tokens, with interchange arrows
between them, cf the comments in Definition 1.2.12 and Example 2.4.8.
The congruence K generated in step (g) may itself be described in several ways:
(a)
It is the closure of (the image of) R ⊂ A² = (\LeftAdjoint0X)² under
{(\arga_1,\argb_1), (\arga_2,\argb_2), …, (\arga_k,\argb_k)} \triangleright (\opr_A([(a)\vec]), \opr_A([(b)\vec]))   (r ∈ Ω)
(b)
It is the free algebra on R for a more complicated theory than L, namely with operation-symbols
for each of the closure conditions above, including reflexivity, symmetry and transitivity.
(c)
This theory may be chosen to have all of the laws induced by being a congruence on this
particular algebra A, or may instead have no laws at all ( cf Lemma 7.4.9), making it a free
theory. Another option is the theory of groupoids in Mod(L0). []
Definition 8.1.11 and Example 9.2.4(h) suggest a framework in which such a theory can be formulated,
using dependent types and indexed categories. The next section uses self-presentation to give a concise
functorial description of algebraic theories. Section 7.6 defines the self- presentation or canonical
language of any semantic category using the type theory corresponding to the structure it has.
7.5 Monads
Monads are the view of adjunctions which we get by looking at them from one end, generalising closure
operations from Section 3.7. They describe (single-sorted) finitary equational theories, but also
characterise many apparently unalgebraic categories as categories of infinitary algebras.
Let F\dashv U be an adjunction and put M = U·F. Then the natural transformation µ = ε·F (the
multiplication) makes the following diagrams commute, by the triangle laws (Definition 7.2.1) and
naturality of ε.
DEFINITION 7.5.1 A triple consisting of a functor M:S→ S and natural transformations η:\id_S→ M and µ:
M·M→ M making these diagrams commute is called a monad on S.
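To make the definition concrete, here is a minimal Python sketch (names hypothetical) of the monad arising from the free \dashv forgetful adjunction for monoids: M = U·F sends a set to the words (lists) over it, η is the insertion of generators and µ = ε·F flattens a word of words. The commuting diagrams become the familiar unit and associativity laws, checked here on sample data.

```python
def unit(x):
    """eta_X: X -> MX, the singleton word."""
    return [x]

def mult(xss):
    """mu_X: M(MX) -> MX, flattening a word of words."""
    return [x for xs in xss for x in xs]

def fmap(f, xs):
    """The action of the functor M on a map f."""
    return [f(x) for x in xs]

# Unit laws: mu after eta_M = id = mu after M(eta)
w = [1, 2, 3]
assert mult(unit(w)) == w
assert mult(fmap(unit, w)) == w

# Associativity: mu after mu_M = mu after M(mu), on a word of words of words
www = [[[1], [2, 3]], [[], [4]]]
assert mult(mult(www)) == mult(fmap(mult, www))
```

Both sides of the associativity square flatten the triply nested word completely, so the diagrams commute for every input, not just these samples.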
REMARK 7.5.2 Algebras for functors have arisen in informatics, and algebras for monads in categorical
algebra, so the uncritical reader of abstract accounts of these separate topics is in danger of making
inappropriate value-judgements. Functors and monads can both be used to code the same (free) theory,
but in different ways:
(a)
The functor T is the analogue of Lemma 3.7.10, which coded a system of closure conditions. It is
dynamic: we can watch the genesis of the free algebra from its well founded coalgebras (Section
6.3), and there is an associated notion of recursion.
(b)
The monad (M,η,µ) generalises the closure operation and is static: the construction of the free
algebra is already finished.
In particular, MX is much bigger than TX, and may be a proper class if, for example, T = P.
We shall only consider single-sorted theories (as before, for many-sorted theories we must work in
Set^Σ), and start with a free algebraic theory in the sense of Section 6.1, presented as a functor T. Let
\LeftAdjoint0X be the free T-algebra on X, with inclusion η_X:X→ \Monad0X, where \Monad0 = U·\LeftAdjoint0.
For example, let T = {1}+(-)² and X = {a,b}. Then \Monad0X contains terms of every finite depth, such as
a, b, ⋆, (a,b) and ((a,⋆),b), where ⋆ is the constant.
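A quick computation contrasts the one-layer functor T with the stages of \Monad0: for T = {1}+(-)² the set TX of depth 1 expressions over X = {a,b} has only five elements, while the terms of depth at most two already number 52 and the stages grow without bound. The Python sketch below (names hypothetical) enumerates terms as nested pairs, with "*" for the constant.

```python
STAR = "*"                    # the unique element of {1}: a nullary operation
X = frozenset({"a", "b"})

def T(s):
    """One application of T = {1} + (-)^2 to a set of terms."""
    return frozenset({STAR}) | frozenset((l, r) for l in s for r in s)

def terms(depth):
    """All raw terms over X of depth at most `depth`: a finite stage of M0 X."""
    s = X
    for _ in range(depth):
        s = X | T(s)
    return s

# TX has 1 + |X|^2 = 5 elements, but the stages of M0 X keep growing:
sizes = [len(terms(d)) for d in range(3)]   # [2, 7, 52]
```

This is the sense in which MX is much bigger than TX: TX records one layer of structure, whereas \Monad0X is the closure under all finite iterations.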
A monad is needed to code laws. For a T-algebra \ev_A:TA→ A, the map ε_A:\Monad0A→ A constructed in
Example 7.4.6 evaluates terms of arbitrary depth, whereas \ev_A only handles those of depth 1. The more
comprehensive structure is needed to test whether A satisfies the laws of an equational theory of which T
is the signature, as these laws may be between terms of any depth. But even for a free theory, ε_A must
be defined by structural recursion.
The Kleisli and Eilenberg-Moore categories All monads arise from adjunctions, in fact in two
different ways, both found in 1965.
(a)
The Kleisli category \Kleisli (M,η,µ) has the same objects as S, but its maps X→ Y are the S-
maps X→ MY; the Kleisli identity on X is η_X and the composite of f:X→ MY with g:Y→ MZ is
f;Mg;µ_Z ( cf Exercise 3.38 for preorders). There is an adjunction, omitted diagram environment
(b)
An algebra is an S-map α:MA→ A such that omitted diagram environment cf fixed points for
closure operations (Section 3.7). Although this is a more complicated notion of algebra than that
in Section 6.1, the definition of homomorphism f:(A,α)→ (B,β) is the same: an S-map f:A→ B
such that α;f = Mf;β. They constitute the Eilenberg-Moore category, Mod(M,η,µ). The forgetful
functor \RightAdjoint EM has a left adjoint, \LeftAdjointEM:X→ (MX,µ_X), and ε_(A,α) = α is the
co-unit.
(c)
Let U:A→ S with F\dashv U be any adjunction (with co-unit ε and transposition λ) giving rise to
this monad. Then there are unique functors making the triangles commute: omitted diagram
environment K takes the Kleisli morphism g:X→ MY to λ⁻¹(g):FX→ FY, and, for A ∈ obA, E(A)
is the algebra Uε_A: UFUA→ UA. []
DEFINITION 7.5.4 An adjunction F\dashv U (or just the functor U) for which the functor E is a weak
equivalence is said to be monadic.
EXAMPLES 7.5.5
(a)
Vsp is the Eilenberg-Moore category for the monad on Set induced by the adjunction in
Example 7.1.4(b). The Kleisli category consists of those vector spaces that have bases (which is
all of them, given the axiom of choice).
(b)
Rel\hookrightarrow CSLat are (equivalent to) the Kleisli and Eilenberg-Moore categories for the
covariant powerset monad (P,{-},∪) on Set.
(c)
The following Kleisli, co-Kleisli and Eilenberg-Moore adjunctions arise from lifting in domain
theory (Remark 3.4.5, Definition 3.3.7 and Example 3.9.8(c)), where Lift(-)- homomorphisms
are continuous functions that also preserve ⊥. omitted diagram environment Classically, every Lift
(-)-algebra (or ipo) A is LiftX, for X = A\{⊥} , so the middle adjunction above is also Kleisli.
(d)
Any single-sorted finitary algebraic theory L gives rise to a monad on Set, for which the
Eilenberg-Moore category is Mod(L). The Kleisli category consists of the free algebras on any
set of generators. Restricting to finite such sets, we obtain (Cn×_L)^op, cf Corollary 3.9.5.
(e)
Exercises 4.27 and 7.41 show how some cartesian closed categories of domains arise as co-
Kleisli categories of monads on symmetric monoidal closed categories.
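Example (b) above can be checked on finite sets: a relation X ⇸ Y is a function X→ PY, and Kleisli composition for (P,{-},∪) is exactly relational composition. A Python sketch (names hypothetical), with relations stored as functions into frozensets:

```python
def unit(x):
    """The unit {-} of the covariant powerset monad P on Set."""
    return frozenset({x})

def kleisli(f, g):
    """Kleisli composite of f: X -> PY with g: Y -> PZ: union of images."""
    return lambda x: frozenset().union(*(g(y) for y in f(x)))

# The relations {(1,'a'), (1,'b')} and {('a', True)}, composed the Kleisli way:
r = {1: frozenset({"a", "b"}), 2: frozenset()}
s = {"a": frozenset({True}), "b": frozenset()}

rs = kleisli(lambda x: r[x], lambda y: s[y])
assert rs(1) == frozenset({True})   # 1 relates to True via 'a'
assert rs(2) == frozenset()         # 2 relates to nothing
```

The Eilenberg-Moore side of the same monad gives CSLat: an algebra ∪:PA→ A is precisely a complete join-semilattice.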
Beck's theorem The importance of monadic adjunctions lies in the fact that universal constructions
with algebras may be computed for the carriers, so long as the two structures commute. Jon Beck's
theorem was presented at a conference in 1966 but he never wrote it up for publication.
PROPOSITION 7.5.6 The forgetful functor Mod(M,η,µ)→ Set creates all small limits (Definition 4.5.10),
and whatever colimits M preserves.
PROOF: The structure map for the (co)limit algebra is the mediator α to lim \typeA_i or from \colim M\typeA_i
shown dotted. Similarly the Eilenberg-Moore equations η;α = id and Mα;α = µ;α hold because
both sides are mediators to lim \typeA_i or from \colim M\typeA_i or \colim M²\typeA_i. []
REMARK 7.5.7 [Bob Paré] Consider contractible coequalisers (Exercise 5.2). Such coequalisers exist in
any category S where idempotents split, and they are preserved by all functors out of S, in particular by
M.
An algebra for the monad (by definition) makes the two squares below commute and the rows identities
(recall that Z indicates a naturality square). It is a contractible coequaliser.
The contraction η_MA is not a homomorphism: the coequaliser diagram for the algebras only becomes
contractible when we apply the forgetful functor U. Such a parallel pair of homomorphisms ℓ,r:
C\rightrightarrows B, for which there is some S-map s:B→ C with s;ℓ = \id_B and ℓ;s;r = r;s;r, is called
U-contractible.
Notice in particular that the structure map of any algebra α:FA→ A is the coequaliser of Fα and µ_A not
only in S but also in the category of algebras and homomorphisms, so it is a self-presentation.
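An Eilenberg-Moore algebra for the list monad is precisely a monoid, its structure map evaluating a word. The Python sketch below (names hypothetical) checks the two algebra equations η;α = id and Mα;α = µ;α for the additive monoid of integers.

```python
def unit(x):
    return [x]

def mult(xss):
    return [x for xs in xss for x in xs]

def fmap(f, xs):
    return [f(x) for x in xs]

def alpha(xs):
    """Structure map MA -> A for A = (int, +, 0): evaluate a word."""
    return sum(xs)

# eta; alpha = id_A: evaluating a singleton word returns its entry.
assert all(alpha(unit(n)) == n for n in range(5))

# M(alpha); alpha = mu; alpha on a word of words: evaluating the
# evaluations agrees with flattening first.
xss = [[1, 2], [], [3, 4, 5]]
assert alpha(fmap(alpha, xss)) == alpha(mult(xss))
```

The second equation is the square whose two composites M²A ⇉ MA → A the text has just exhibited as a (contractible) coequaliser.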
EXAMPLE 7.5.8 For the discrete\dashv points adjunction (Theorem 7.4.1), the maps Fα and µ_A are both
the identity on the underlying set of A equipped with the discrete topology, so the original topology on
the space A is not recovered as a coequaliser.
THEOREM 7.5.9 [Jon Beck] Let F\dashv U be an adjunction between categories in which idempotents split
(Exercise 4.16). Then the following are equivalent:
(a)
the adjunction is monadic,
(b)
U creates whatever colimits M preserves: for any diagram ℑ→ A, if M preserves the colimit of ℑ
→ A→ C, then U creates the colimit of ℑ→ A;
(c)
U creates U-contractible coequalisers.
The comparison functor is full and faithful (and in particular reflects invertibility) iff every ε_A is a self-
presentation.
PROOF: To show that E is full and faithful, let g:UA→ UB in S. There is a unique f with g = Uf iff the top
row is a coequaliser in A:
For essential surjectivity, let α:UFX→ X be an algebra. Then Fα and ε_FX form a U-contractible pair,
whose coequaliser in A provides the required object. []
EXAMPLES 7.5.10
(a)
Reflective subcategories are always monadic, with invertible µ: we call the monad idempotent.
(By Corollary 7.2.10(b), ε is invertible and the forgetful functor is full and faithful, so the
contraction in S is already a morphism of the category A).
(b)
U:Mod(T)→ Set creates whatever colimits T preserves, by the same argument as for Proposition
7.5.6, so it is monadic. Hence the functor and monad have the same algebras, and we also
justified this name in terms of multiplication tables in Definition 6.1.1.
(c)
When does α:\Monad0A→ A satisfy the laws of an equational theory L? As in Theorem 7.4.4,
they can be stated as the equality of composites R\rightrightarrows \Monad0A→ A. However, we
already know the coequaliser of this pair: it is the free L-algebra FA on A. Hence α:MA→ A is an
M-algebra iff \Monad0A \twoheadrightarrow MA→ A is an L0-algebra satisfying the laws.
(d)
(Bob Paré) Set^op ≅ Mod(M, η,µ), where this monad arises from the symmetric self-adjunction of
Example 7.1.9(b) with Σ = Ω, ie the contravariant powerset. Monadicity in this case is a
consequence of the Beck-Chevalley condition for the quantifiers (Remarks 9.3.7 and 9.4.3). From
the definition of an elementary topos as having finite limits and powersets, it follows immediately
that it has finite colimits, though the resulting constructions are more complicated than those in
Section 2.1. Indeed toposes have products and sums of the same shapes, Proposition 9.6.13 and
[BW85, Section 5.1].
Applications We came to monads from finitary algebraic theories. The types analogue of
Theorem 3.9.4 and Exercise 3.37 is
PROPOSITION 7.5.11 Every finitary monad on Set, ie for which M preserves filtered colimits, is
isomorphic to that given by the free algebras for some single-sorted finitary algebraic theory L, and then
Mod(L) is externally locally finitely presentable (Definition 6.6.14(c)).
PROOF: We shall describe the Lawvere theory (Exercise 4.29). The set of k-ary operation-symbols is
Cn×_L(k,1) = Mk. The composition
(Mn)^k × (Mk) = Cn×_L(n,k) × Cn×_L(k,1) → Cn×_L(n,1) = Mn,
which determines how to apply a k-ary operation-symbol to arguments, is the effect of the adjoint
transposition f→ p:
The multiplication µ thereby captures the laws of L ( cf saturation for closure conditions). For a finite set
n, the value at n of the monad derived from the theory L is the carrier of the free algebra on n generators,
which is Cn×_L(n,1) = Mn as required, and similarly for functions between finite sets. Since every set is a
filtered colimit of finite(ly presented) sets (Exercise 7.21) and M preserves filtered colimits, M is
determined up to unique isomorphism by its values Mn, so it agrees with the monad arising from L.
Again since M preserves filtered colimits, U creates them; F preserves finite presentability, and Mod(L)
is LFP. []
As we have repeatedly pointed out, infinitary algebraic theories with arbitrary laws present problems
when we reject the Axiom of Choice. Fred Linton developed monads as a useful alternative, showing
that the symbolic and diagrammatic notions are equivalent in the presence of Choice [Lin69]. Monads
can give an unexpected algebraic perspective on topology: for example the category of compact
Hausdorff spaces is monadic over Set, the left adjoint being the space of ultrafilters. See [Man76] for
this and a monadic treatment of algebra.
REMARK 7.5.12 The infinitary operations of most interest are meets, joins, limits and colimits.
Proposition 3.2.7(b) and Theorem 3.9.7 showed in particular how to add joins to a poset, retaining
certain specified joins. If a poset A is able to carry an algebra for a join-adding monad, then this structure
α is unique, and α\dashv η_A; such monads may be recognised by the fact that µ_X \dashv η_MX. They were
investigated by Anders Kock [Koc95] and Volker Zöberlein. For meet- or limit-adding monads, the
adjunctions are reversed. Alan Day found such a monad over Sp (or Dcpo) whose algebras are
continuous lattices and whose homomorphisms preserve ∧ and ∨↑ [GHK+80]. This was generalised (to
smaller classes of meets) in [Tay90] and [Sch93].
REMARK 7.5.13 The Transfinite Recursion Theorem 6.7.4 relates the algebras for the covariant powerset
monad (complete join-semilattices) equipped with an endofunction, to the well founded coalgebras for
the functor alone, which we discussed in Example 6.3.3 and Remark 6.7.14. In fact the coalgebras for
any functor which happens to be part of a monad carry partial ``successor'' and ``union'' operations
(Exercise 7.45) [CP92, JM95, Tay96b].
REMARK 7.5.14 Eugenio Moggi has argued that certain monads should be regarded as notions of
computation [Mog91]. To each type X we associate the type MX of computations (whose ultimate results
would be) of type X. For example, with the lift monad (Definition 3.3.7), morphisms Γ→ LiftX are
partial (possibly non-terminating) programs. Additional structure, known as a strength, is needed for
this, as for parametric recursion in Remark 6.1.6 and Exercise 6.23. Moggi gave a symbolic form (the ``
let'' calculus) for this piece of category theory.
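Moggi's reading can be sketched with the lift monad modelled crudely as ``a value or None'': a morphism Γ→ LiftX is a possibly non-terminating program, and the ``let'' calculus is Kleisli sequencing, which propagates non-termination. (Python sketch, hypothetical names; None plays the role of ⊥ and is assumed not itself to be a value.)

```python
def let_(c, f):
    """Moggi's `let x <= c in f(x)` for the lift monad: run the computation
    c, and if it terminates feed its value to f; None stands for bottom."""
    return None if c is None else f(c)

def safe_div(n, d):
    """A partial program Gamma -> Lift(int): undefined when d = 0."""
    return None if d == 0 else n // d

# let x <= 10/2 in let y <= x/0 in y  --  the divergence propagates:
assert let_(safe_div(10, 2), lambda x: safe_div(x, 0)) is None
# let x <= 10/2 in let y <= 8/x in y  --  everything terminates:
assert let_(safe_div(10, 2), lambda x: safe_div(8, x)) == 1
```

Replacing the lift monad by, say, a state or exceptions monad changes the notion of computation while the `let` sequencing stays the same; this uniformity is Moggi's point.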
REMARK 7.5.15 Jon Beck himself studied monads to unify homological algebra [BB69]. Applying the
functor M repeatedly to an object X, the natural transformations M^(n-i-1)µ_{M^i X} provide the boundary
operations (S-morphisms) of a simplicial complex (Exercise 4.15). On the other hand the ``ordinals''
M^n∅ just mentioned provide an abstract system of simplices (point, interval, triangle, tetrahedron, ...), so
we may use maps M^n∅→ X to investigate the homotopy of any object X.
Echoing what we said about the utility of adjoints at the end of Section 7.3, whenever you have a
construction which yields an object of the same category as its data, it is well worth looking for natural
transformations making it a monad. The opportunities for mathematical investigation from such a simple
thing are quite striking: the algebras for the monad often turn out to form important categories in their
own right, and the iterates of its functor provide detailed information about the abstract topology and
recursion theory of the category.
7.6 From Semantics to Syntax
The interpretation of type theory requires chosen
products and function-spaces in any category S in which we want to define semantics. On the other
hand, mathematical intuition is that there is no God-given choice of the product of two topological
spaces: these objects may always be interchanged with isomorphic copies.
Consequently there are strong and conflicting opinions about whether, when we say that a category has
products etc , we mean them to be chosen or merely to exist. The second point of view is important even
in formal reasoning, as a systematic way of discarding type-theoretic detail, cf Example 2.4.8. This
conflict, to which we have referred in Sections 1.2, 4.5 and 7.1, is the price we have to pay for the
versatility of category theory, but now we intend to resolve it, showing that it amounts to the difference
between strong and weak equivalences .
Logicians, when they have considered semantics at all, have traditionally called it complete if there are
enough models in a fixed universe of sets to distinguish the syntax (Remark 1.6.13). That is, if L\vDash
φ, ie some property φ holds for all models of a theory L, then it is provable ( L\vdashφ). We have already
achieved this goal (for the fragments of logic we have discussed) by opening out the universe to
encompass models built from the syntax itself. Now we want to go further, and treat syntax and
semantics as equals - literally - requiring completeness of the syntax for the semantics as well as vice
versa .
We aim to construct a language L from any suitable given category C, such that C ≅ Cn[]_L.
Theorem 4.2.12 did this for the unary case, where the structure [] just consisted of identities and
compositions: the canonical elementary language L = L(C) has a type \qq X for each object of C and an
operation-symbol \qq α for each C-morphism. L also names as laws all of the equations which hold in
C, so \qqdash preserves the structure [], ie it is a functor, and \Clone_L ≡ C by Theorem 6.2.8(c).
Recall that we also called L an elementary sketch. Although we shall describe the language for products
etc symbolically, many of the ideas of this section come from sketch theory (and others from sheaves).
Encoding operations We shall consider those connectives [] of type theory which are characterised by
universal properties: products, sums, exponentials, quantifiers, List(-) and P(-), but not tensor products.
REMARK 7.6.1 Suppose that the objects X and Y have a product P and a sum S in a category C. This
means that there are C-maps
π0:P→ X, π1:P→ Y and ν0:X→ S, ν1:Y→ S
satisfying certain universal properties, but we shall ignore the latter for the time being. Adding product
and sum types to the canonical elementary language, together with the pairing and case analysis meta-
operations, in the syntactic category Cn^{×+} there are morphisms
⟨\qq π0,\qq π1⟩:\qq P→ \qq X×\qq Y and [\qq ν0,\qq ν1]:\qq X+\qq Y→ \qq S.
Although they are interpreted as identities in the intended semantics, these maps are not invertible in the
syntax (or in arbitrary models of it) as it stands. So if we want P to name the product we must add
x:\qq X, y:\qq Y\vdash \pair_{X,Y,P}(x,y):\qq P
and an η-rule \pair(\qq π0(z),\qq π1(z)) = z, which force \pair to be the inverse of ⟨\qq π0,\qq π1⟩. The
definable operation for the sum goes in the opposite direction, so the encoding operation is
split_{X,Y,S}:\qq S→ \qq X+\qq Y.
REMARK 7.6.2 P need not be the chosen product. Let α:Q ≡ P be a semantic isomorphism, so we have
another product diagram in C ( cf Theorem 4.5.6), and the composite \qq α∘\pair_{X,Y,P} obeys the laws for
\pair_{X,Y,Q}. Since inverses are unique, if this operation had also been named it would be provably equal to
the derived form. For a semantically given category, therefore, it is harmless to name all of the diagrams
(or as many as we please) which satisfy the universal property.
On the other hand, some diagrams may be products ``accidentally,'' not because we intended them to be.
If we omit the corresponding pair encoding, we get a new product (that is, the syntactic product will not
be isomorphic to the unwanted semantic one). Including its encoding operation in the language is how
we give our approval to a particular semantic product, and this will ensure that the functor \qqdash
preserves it.
EXAMPLE 7.6.3 The theory of monoids was given in Example 4.6.3(f) by a finite product sketch.
A traditional syntactic account would say that multiplication has two (separate) arguments, but in this
section all operations are treated as unary. We introduce a new object M(2) to be the source, and have to
give additional information to force M(2) to be MxM. This is what the upper parts of the diagrams in
Example 4.6.3(f) say. An interpretation of this finite product sketch in any category S with finite
products is an interpretation of the underlying elementary sketch for which M← M(2)→ M becomes a
product cone in S. We have just described the syntax which is needed to do the same thing.
EXAMPLE 7.6.4 The canonical elementary language of a poset (Σ, ≤ ) lists the elements of Σ and all
instances of the order relation ≤ between them; a general language L of this kind is simply a set Σ with
any binary relation <, and \Clone_L is the reflexive-transitive closure (Section 3.8).
Analogously to Cn×_L we have the classifying semilattice Cn∧_L of a Horn theory (Σ,\triangleright )
constructed in Theorem 3.9.1. By Proposition 3.9.3, any meet-semilattice arises in this way, where, if we
want to specify that p = x∧y, we must add this fact to the language L, by writing
p \triangleright x, p \triangleright y and {x,y} \triangleright p.
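The classifying preorder \Clone_L of a language (Σ, <) of this kind is computable: the Python sketch below (names hypothetical) takes the reflexive-transitive closure by the Warshall method.

```python
def classifying_preorder(sigma, rel):
    """Clone_L for a language (Sigma, <): the reflexive-transitive closure
    of the binary relation rel, computed a la Warshall."""
    reach = {x: {x} for x in sigma}          # reflexivity
    for x, y in rel:
        reach[x].add(y)                      # the generating instances of <
    for k in sigma:                          # transitivity, via midpoint k
        for x in sigma:
            if k in reach[x]:
                reach[x] |= reach[k]
    return {(x, y) for x in sigma for y in reach[x]}

order = classifying_preorder({"p", "q", "r"}, {("p", "q"), ("q", "r")})
assert ("p", "r") in order and ("p", "p") in order
assert ("r", "p") not in order
```

This is the poset analogue of building Cn×_L: the generating instances play the role of the operation-symbols, and the closure that of the derived morphisms.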
Canonical language The tradition of sketches is a minimalist one: to include only enough objects to
determine the theory. We also want to show that categories which have all finite products, specified or
not, arise as classifying categories. To deal with each of these applications, we give three versions of the
definition.
The equational laws ought to be treated in the same way as the additional structure such as products and
function-types. However, it is technically simpler to assume in part (a) below that the unary part of the
structure, which includes all of the equational laws amongst expressions, has been dealt with already (we
did this in Theorem 3.9.7 for the lattice case too). The underlying elementary sketch is therefore a
category.
To this we add encoding operations such as pair, split and (for function-types) \abs , together with the β-
and η-rules that force them to be inverse to the corresponding definable operation in Remark 7.6.1.
DEFINITION 7.6.5 We call L
(a)
a subcanonical language for C if it consists of (the whole of) the elementary language L(C),
together with some encoding operations for constructions which obey the appropriate universal
properties in C (there need only be a few of these, or maybe none at all, but it is important that
we only nominate products etc for the language if they really were products in the semantics, cf
Definition 3.9.6(b)) ;
(b)
the weakly canonical []-language L [](C) if it has at least one encoding operation for each tuple
of sub-types;
(c)
a strongly canonical []-language for C if it has exactly one object and one encoding operation
for each tuple of sub-types. That is, it makes an assignment of []-structure, which it registers by
including these particular encoding operations in the language. (The word ``canonical'' is carrying
two senses here: the plain English one that it is making a choice, and the one from sheaf theory,
Definition 3.9.6, that it accounts exactly for all of the semantic structure.)
In the last case the interpretation [[-]] of L in S is defined, by Remarks 4.6.5, 4.7.4 and 5.5.2. Besides the
operations and meta-operations of [], it must define the elementary language ( ie the objects, maps and
composites) of C , and the encoding operations. These must be
The quoting operation \qqdash preserves just that structure of C which L names. In particular it is a
functor because the identities and composites \qq α;\qq β = \qq α;β are named as laws of L.
(a)
The functor \qqdash sends it to a product cone in Cn[]_L iff there is some definable term pair in
the syntax such that pair:\qq X×\qq Y ≡ \qq P. Either pair is an encoding operation obeying the β-
and η-rules above, or there is some other definable term which does this job.
(b)
In this case \qqdash is full and faithful for morphisms Γ→ P iff P was a product in the semantics,
since otherwise \qq P has mediators in Cn[]_L which either did not exist or were not unique in C.
(a)
Let (F,U,η,ε):C ≅ D be a strong equivalence of categories, where D has assigned categorical
structure of some kind, such as products. Then the equivalence transfers it to C.
(b)
Suppose that products etc exist as a property in D and that there is an equivalence functor F:C →
D. Then C also has products.
PROOF: Let P = FX×_D FY.
However, the unit η is also needed - and it must be natural - to show that UP = X×_C Y. []
DEFINITION 7.6.8 The structure [] is said to be conservative if, for any category C and subcanonical []-
language L, the functor (omitted diagram environment)
is full and faithful. That is, every map a:Γ ≡ [x:\qq X]→ [y:\qq Y] is of the form [y: = \qq α(x)] for a
unique α:X→ Y in C. If the interpretation is defined then α = [[a]], so the issue is uniqueness, ie to show
that a = \qq [[a]]; in particular there is nothing more to do in the poset case.
Conservativity is a theorem that we must prove for each fragment [] of structure, along with giving its
interpretation. Traditionally, this term means the relative property of the extension of a theory by some
new connective, for example adding function-types to algebra. According to our definition, [] is
conservative if the extension to the full []-structure is conservative relative to every intermediate
structure L.
The equivalence
THEOREM 7.6.9 Let [] be a conservative structure, C a small category which has this structure and L a
subcanonical []-language for C .
(a)
Suppose C has and L names a choice of all []-structure; that is, it is a strongly canonical []-
language (Definition 7.6.5(c)). Then the corresponding []-type theory has an interpretation [[-]] in
C, which is a []-preserving functor (omitted diagram environment for [[-]]).
7.7 Gluing and Completeness
The origin of the name gluing is that this is how to recover a topological space from an open set and its
complementary closed set (Exercise 3.71). The construction for Grothendieck toposes was first set out
in [AGV64, Exposé IV, §9.5]. Considered as inverse images, functors between toposes with the rich
properties of (π0,π1):S↓ U→ S×A or π1:S↓ U→ A are called surjections and open inclusions
respectively (geometrically, S×A is the disjoint union of S and A).
The gluing construction Recall from Example 7.3.10(i) that, for any functor U:A→ S, the gluing
construction is the category S↓ U whose objects consist of I ∈ obS, Γ ∈ obA and f:I→ UΓ in S, and
whose morphisms are illustrated by the diagram below. We shall say that (I,Γ,f) is tight if f is an
isomorphism.
PROPOSITION 7.7.1 Let U:A→ S be any functor. (We emphasise the case where it preserves finite
products and maybe pullbacks, and S is a topos; the dotted lines signify even stronger assumptions
which we do not wish to make. See Exercise 3.70 for posets.) Then
(a)
π1:S↓ U→ A is an op-fibration, and also a fibration if S has pullbacks (Definition 9.2.6);
(b)
if S has an initial object \initialobj_S then π1 also has a full and faithful left adjoint, E, so π1
preserves limits and E colimits;
(c)
E identifies A with the full (co-reflective) subcategory of S↓ U in which I = \initialobj_S;
(d)
if \initialobj_S is strict (Definition 5.5.1) and A has a terminal object \terminalobj_A then this …;
(e)
in this case, moreover,
E(Γ×π1(f:J→ U∆)) = (\initialobj_S→ U(Γ×∆)) = E(Γ)×(f:J→ U∆)
holds and the co-unit E·π1→ id is a cartesian transformation (Remark 6.3.4, cf the Frobenius law,
Lemma 1.6.3, Corollary 9.3.9);
(f)
π1 has a right adjoint, A, so π1 preserves colimits and A limits;
(g)
A identifies A with the full subcategory of S ↓ U consisting of tight objects, which is therefore
reflective; it is also an exponential ideal (Exercise 7.11) if, as we shall show, S↓ U is cartesian
closed;
(h)
U = π0·A and π1·A = id;
(i)
A preserves whatever colimits U does (but one of our main goals is to show that this happens); if
U has a right adjoint R:S → A and A has pullbacks then A also has a right adjoint; (diagram omitted)
(j)
π0 is a fibration;
(k)
π0 preserves whatever limits U does; indeed if F\dashv U with unit η then π0 has a left adjoint,
identifying S with the full subcategory of S↓ U ≅ F↓ A which consists of universal maps
(Definition 7.1.1);
(l)
if A and S have and U preserves 1 then π0 has a right adjoint T, so π0 preserves colimits and T
limits;
(m)
T identifies S with the full subcategory of S ↓ U in which Γ = \terminalobj_A; (diagram omitted)
(n)
(π0,π1) creates colimits, and has a right adjoint V if S has binary products;
(o)
(Gavin Wraith) S↓ U is the category of coalgebras for the comonad ( cf Definition 7.5.4) on S×A
induced by (π0,π1)\dashv V;
(p)
(π0,π1) creates any limits which U preserves, and has a left adjoint if U does and A has binary
coproducts. []
COROLLARY 7.7.2 Assuming only that U preserves finite limits, if A and S have the following structure,
so does S↓ U, and π1 preserves it:
(a)
finite limits;
(b)
stable disjoint sums (Section 5.5);
(c)
(d)
effective regularity (Barr-exactness);
(e)
the structure of a prelogos;
(f)
that of a (countably) complete prelogos;
(g)
being a pretopos;
(h)
N (Example 6.4.13) and List(X) (Exercise 6.30).
Implication and the function-type are considered in Exercise 3.72 and Proposition 7.7.12; Exercises
7.50- 7.51 deal with factorisation systems (and so the existential quantifier, Section 9.3) and Ω (higher
order logic).
NOTATION 7.7.3
(a)
Define a functor U : Cn[]_L → S ≡ Set^{C^op} by Γ ↦ Cn[]_L(\qqdash, Γ),
(b)
and another functor Q : C → S↓ U by X ↦ (\H_X, \qq X, \q_X),
where \q_X : \H_X → U\qq X is a morphism of S ≡ Set^{C^op}, ie a natural transformation between presheaves. It has
components
\q_{X,Z} : \H_X(Z) ≡ C(Z,X) → U\qq X(Z) ≡ Cn[]_L(\qq Z, \qq X)
given by quoting \qqdash of C-maps Z→ X (which are the operation-symbols of L). In Proposition 7.3.9,
Q is the mediator to the comma category from the lax square consisting of the Yoneda embedding \H(-):C
\hookrightarrow Set^{C^op}, quoting \qqdash :C→ Cn[]_L and \q:\H(-)→ U\qqdash.
EXAMPLE 7.7.4 Let C = {1} and suppose that L says that \qq 1 is indeed the terminal object (so there is a
nullary encoding operation \vdash ∗:\qq 1 with one η-rule x:\qq 1\vdash x = ∗). Then UΓ
gives the set of global elements of a context, ie its closed terms, or proofs under no hypotheses
(Remark 4.5.3). In sheaf theory this functor is traditionally called Γ, which of course conflicts with the
notation of this book. When A is a topos (instead of Cn[]_L), [^(A)] ≡ Set↓ U is called the Freyd cover or
scone (Sierpiński cone) of A, the (closed) vertex being Set qua the one-point topos ( cf lifting a
domain, Definition 3.3.7 and Exercise 3.71). Also, Q(1) = (id:1→ U1). []
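As a concrete, deliberately trivial illustration of global elements, assuming nothing beyond finite sets: maps 1→ X correspond bijectively to elements of X. The helper name below is ours.

```python
# Global elements of a finite set X: functions from a one-point set into X.
# Each such function is a dict with the single key '*'.

def global_elements(X):
    return [{'*': x} for x in sorted(X)]

print(len(global_elements({1, 2, 3})))  # one global element per element: 3
```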
PROOF: An (S↓ U)-map QX→ QY is a pair (ξ,a) making the square below commute, but by the Yoneda
Lemma (Theorem 4.8.12(a)) any S-map (natural transformation) ξ:C(-,X)→ C(-,Y) is of the form of
postcomposition with some C-map α:X→ Y.
The other map, a, belongs to Cn[]_L, ie it is syntactic, a substitution. We must show that a = \qqα without
assuming the theorem we're trying to prove, that \qqdash is full and faithful. In fact this follows from the
fact that the square commutes at \id_X ∈ C(X,X). Hence (ξ,a) = Qα. []
(a)
S↓ U has and π1 preserves []-structure ,
(b)
S satisfies the axiom-scheme of replacement, and
(c)
Q is a model of L, ie Q preserves any []- structure which L specifies.
In practice, π1 preserves []-structure on the nose, and [[-]] is defined by structural recursion so that [[Γ]]
= ([[Γ]]0, Γ, \q_Γ), where [[-]]0 = π0[[-]].
The functor [[-]] reflects the existence of isomorphisms: if [[Γ]] ≡ [[∆]] somehow then Γ ≡ ∆. (This is the
categorical analogue of an injective function between posets.) For higher order logic, [[-]] need not be
full.
PROOF: Since Cn[]_L is the classifying category, and using the axiom-scheme of replacement to justify the
recursion, the model Q extends to a []-preserving functor [[-]]:Cn[]_L → S↓ U making the upper triangle
commute. The lower Cn[]_L is also a model, for which both id and π1[[-]] serve as the mediator from the classifying
category (since [[-]] and π1 preserve []-structure), so they are isomorphic, π1[[Γ]] ≡ Γ. Hence [[-]] is
faithful, whilst Q is full and faithful, so \qqdash is also full and faithful. []
EXAMPLES 7.7.7 The first clause of the theorem is applicable to the structures [] listed in Corollary 7.7.2
(and more). The difficulty lies in condition 7.7.6(c), ie what structure is named by the language L.
(a)
L may be just the canonical elementary language of C, with no extra structure.
(b)
L may also include some tuple maps, or, more generally, encoding operations for some finite
limits (Definition 7.6.5). Then C, Cn[]_L, S and S↓ U have these limits and \H(-), \qqdash, U and
Q preserve them.
(c)
L also includes some stable colimits, encoded by a Grothendieck topology J. The category S =
Set^{C^op} of presheaves must be replaced by the category Shv(C,J) of sheaves, which freely adjoins
colimits, but keeping those in J, cf Theorem 3.9.7 for posets.
(d)
Corollary 7.7.13 shows that Q and A also preserve exponentials.
The construction relies on the fact that S, which is a topos, has all of the extra structure [] (plus the
axiom-scheme of replacement) and the Yoneda embedding is full and faithful and preserves it. However,
S does not freely adjoin this structure (except in the case of arbitrary stable colimits), and the question is
whether the embedding into the free category Cn[] is full and faithful.
L
REMARK 7.7.8 In the case C = {1}, an object I −f→ UΓ ≡ Cn[]_L(1,Γ) of S↓ U is a family of closed terms
or proofs of Γ, indexed by I. More generally, it is a cocone \Fred_i :\qq \typeX_i →Γ of such proofs under a
certain diagram \typeX(-):ℑ→ C of base types or hypotheses. This diagram is the discrete fibration
corresponding to the sheaf I:C^op → Set by Proposition 9.2.7.
Notation 7.7.3(b) provided a specific sheaf of closed terms \q_X of each base type X ∈ obC, so we shall
call it the realisation of X. Theorem 7.7.6 showed that this is an isomorphism (the realisation is tight) for
base types, and extended the notation to general contexts. We already know that the full subcategory of
tight objects in S↓ U is closed under definable limits, so the same is true of the class of tightly realised
contexts. We shall see that this extends to colimits and exponentials, so if [] consists only of this (first
order) structure then A is already the interpretation [[-]], and this is full as well as faithful.
For higher order logic the realisation is no longer tight. Andre Scedrov and Philip Scott [SS82] trace the
method back in the symbolic tradition to Stephen Kleene's realisability methods (1962), and link it to the
categorical construction. Peter Freyd found the latter after hearing the presentation of Scott's work with
Joachim Lambek [LS80] at a conference, not at first believing the theorem below, which their
results implied.
naturally in (- and) Γ. We have to show that \q_Γ(\m_Γ(u)) = u for each u ∈ UΓ ≡ Cn[]_L(\qqdash, Γ),
along the bottom row of the next diagram. Evaluating this equation at each X ∈ obC, we use naturality
with respect to \u_X :\qq X→ Γ.
Then the diagram in Set commutes and the required law follows from its effect on \id_X ∈ C(X,X). []
THEOREM 7.7.10 [Freyd] U:Cn[]_L → S preserves colimits named in [], but not necessarily those named in L:
(a)
the initial object, so there is no closed term \vdash 0 in Cn[];
(b)
coproducts, so there are just two closed terms \vdash 2;
(c)
regular epis, so 1 is projective (Remark 5.8.4(e));
(d)
coproducts and coequalisers (Example 6.4.13), so the closed terms \vdash N are the numerals. []
PROOF: We have just shown that U is a retract of [[-]]0, which preserves whatever colimits are in [], and
hence so does U (Exercise 7.13). []
(a)
is consistent;
(b)
has the disjunction property: if \vdash φ∨ψ then either \vdash φ or \vdash ψ;
(c)
has the existence property: if \vdash ∃x.φ[x] then \vdash φ[a] for some a;
(d)
has standard arithmetic. []
S can prove consistency of [] because it has been strengthened with the axiom-scheme of replacement,
to which we return in Section 9.6.
Exponentials Recall that [I→ J](X) = S(\H_X × I, J) by Exercise 4.41, whilst UΓ(X) = Cn[]_L(\qq X,Γ) by
Notation 7.7.3(a).
PROPOSITION 7.7.12 Suppose A and S are cartesian closed, S has pullbacks and U preserves finite
products. Then S↓ U is cartesian closed and π1 and A (but not π0 or U) preserve exponentials. (See
Exercise 3.72 for the version for Heyting semilattices.)
PROOF: Given (I −f→ UΓ) and (J −g→ U∆) in S↓ U, we form an internal version of the hom-set (S↓ U)(f,
g), namely the pullback of
UΓ × U[Γ→ ∆] ≡ U(Γ × [Γ→ ∆]) −Uev→ U∆.
Then (H −h→ U[Γ→ ∆]) is the required exponential, with λ-abstraction
COROLLARY 7.7.13 Q preserves any exponentials that are named in L. If ∆ has tight realisation then so
does [Γ→ ∆] for any Γ whatever.
PROOF: In computing Q(X→ Y), the edges of the pullback square above are all invertible, the vertices
being isomorphic to \H_{X→ Y}. []
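The exponential-by-pullback construction of Proposition 7.7.12 can be checked on finite sets in the scone of Set, under the simplifying assumption (ours, not the book's generality) that U is the identity: the exponential of f:I→ Γ and g:J→ ∆ then has as underlying set the pairs (h,k) making the evident square commute.

```python
from itertools import product

def functions(dom, cod):
    """All functions dom -> cod, represented as dicts."""
    dom, cod = sorted(dom), sorted(cod)
    return [dict(zip(dom, vals)) for vals in product(cod, repeat=len(dom))]

def scone_exponential(I, Gamma, f, J, Delta, g):
    """Underlying set of the exponential of (f : I -> Gamma) and
    (g : J -> Delta) in Set|Id: pairs (h : I -> J, k : Gamma -> Delta)
    with g.h = k.f, ie the pullback described in the text."""
    return [(h, k)
            for h in functions(I, J)
            for k in functions(Gamma, Delta)
            if all(g[h[i]] == k[f[i]] for i in I)]

# With f and g both identities on a two-element set, commutation forces
# h = k, so the exponential has one point per function {0,1} -> {0,1}.
E = scone_exponential({0, 1}, {0, 1}, {0: 0, 1: 1},
                      {0, 1}, {0, 1}, {0: 0, 1: 1})
print(len(E))  # 4
```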
THEOREM 7.7.14 [Yves Lafont, [Laf87, Annexe C]] The λ-calculus is a conservative extension of
algebra. []
This is as much as is needed for the equivalence between semantics and syntax in Theorem 7.6.9. We
haven't proved the normalisation theorem as such, but by a variation on this technique every term is
provably equal to a normal form [MS93, AHS95,CDS98]. It seems likely that a purely categorical proof
will be found for strong normalisation itself, handling reduction paths in the fashion of Exercise 4.34.
We return to consistency and the axiom of replacement in Section 9.6.
2. Define the Cauchy and Dedekind completions as functors on suitable categories and prove their
universal properties.
3. Show that the functor which assigns the set of components to a graph (Example 7.1.6(c))
preserves finite products but not equalisers or pullbacks. Explain why the axiom of choice is
needed to extend this to infinite products.
4. Let T:Set→ Set be a functor coding an infinitary free theory as in Chapter VI. What is the
categorical structure []-Cat which corresponds to this fragment of logic ([])? [Hint: restrict the
arity to κ and consider κ-ary products.] Describe the classifying category Cn[]_T. Find a category
C of languages (whose objects are functors such as T) so that this classifying category is the
universal map from the object T to the forgetful functor U:[]-Cat→ C.
5. Explain how naturality of λ and ε defines postcomposition and the effect of the functor (-)^X on
maps.
6. Show that any adjunction between groups must be a strong equivalence, the two functors being
isomorphisms whose composition is a conjugacy (inner automorphism) ( cf Exercise 4.36).
7. Let U = S(1,-):S→ Set be the global sections functor of a topos S = Set^{ℑ^op}. Show that U
calculates limits of presheaves considered as diagrams in Set, and that \constfunct \dashv U,
where \constfunct X(I) = X for X ∈ obSet and I ∈ obℑ. The functor \constfunct itself has a left
adjoint: what is it?
8. Let S be the monoid of parades in the Lineland Army (Example 1.2.7). Show that the operation
S \hookrightarrow S^op which reverses a parade and promotes everyone by one grade is a monoid
homomorphism which is symmetrically adjoint to itself on the right. Show that this is the free
monoid-with-a-monad.
9. Given ``F\dashv U'' with ε:F·U ≡ \id_D naturally, and a family of isomorphisms η_X :X ≡ U(FX),
show that U is full and faithful iff η is natural ( cf the proof of Theorem 7.6.9).
10. Let A be a cartesian closed category and F:A→ S a functor which preserves binary products and
has a right adjoint U. Show that [X\typearrow_S UA] ≡ U[FX\typearrow_A A] for X ∈ obS and A ∈ obA.
11. Deduce that any reflective subcategory A ⊂ S of a cartesian closed category is an exponential
ideal ( [X\typearrow_S UA] ∈ obA and is the exponential there) iff the reflection preserves finite
products.
12. Let U:A ⊂ S be a full replete subcategory. Show that there exists a functor F:S→ A such that
both F\dashv U and U\dashv F iff each object X ∈ obS carries a natural split idempotent α_X :X→ X.
13. Let F:C→ D be a (co)limit-preserving functor and α:F→ F a natural idempotent. Suppose that
each α_X :FX→ FX has a splitting GX in D. Show that G:C→ D is also a (co)limit-preserving
functor.
14. Treating diagrams as functors, explain how (co)cones are natural transformations.
16. Construct (finite) connected limits from equalisers and (finite) wide pullbacks.
17. Show that \colim_ℑ \colim_J \typeX(I,J) ≡ \colim_J \colim_ℑ \typeX(I,J). (For joins, see Lemma 3.5.4.)
18. Suppose that S has (co)limits of shape ℑ, and let C be any (small) category. Show that [C→ S]
≡ S^C also has (co)limits of shape ℑ, and that they are constructed pointwise, cf Lemma 3.5.7.
19. Let ℑ be a filtered category and U:ℑ → J a final functor. Show that J is also filtered. [Hint: use
filteredness to simplify the definition of finality first.] Show that if U is also full and faithful then
filteredness of J implies that of ℑ ( cf Exercise 3.15). In this case show that, to test finality, it
suffices that the object class be ``cofinal'' (Proposition 3.2.10): ∀X:obS. ∃A:obA. ∃f:X→ UA.
20. Prove the converse of Proposition 7.3.11. [Hint: consider the Yoneda embedding \typeX(-):ℑ→
C = Set^{ℑ^op} and let Θ = \H_I.]
21. Show that any set X is finitely presentable iff (-)^X preserves filtered colimits, and that every set is
a filtered colimit of finitely presented sets, cf Proposition 6.6.13ff. [Hint: consider ℑ = §_fp ↓ X.]
22. Collect and compare properties (such as Definition 6.6.14) of the form ``(-)^X preserves colimits
of shape ℑ'' for various classes of diagrams ℑ.
23. We say that X is finitely related ( cf stable in [Joh77, p. 233]) if, in any pullback of the form
omitted diagram environment with A and B finitely generated, K is also finitely generated. In
other words, for any two lists of elements, there is (in the internal sense) some list of
coincidences between them. Show that X ∈ ob Set is finitely related iff it has decidable equality.
24. Formulate the definitions of finitely presentable, generable and related for objects of Mod(L),
where L is a finitary algebraic theory. Show that X is finitely generated iff C(X,-) preserves
directed unions, finitely related iff this functor preserves filtered colimits of surjections, and
finitely presented iff all filtered colimits are preserved. Hence show that it is finitely presented iff
it is finitely generated and finitely related.
25. Let L be a disjunctive theory such as trichotomous orders, coherence spaces or fields (Section
5.5). Show that the forgetful functor Mod^Σ(L)→ Set creates connected limits. For the theory of
fields having roots of specified polynomials, show that the forgetful functor creates wide
pullbacks but not equalisers.
26. Let F:C→ A, U:A → C, B:A→ K and Y:C → K be functors. Show that F\dashv U if there is a
natural bijection
(prooftree omitted)
for all B and Y, ie in the ``opposite'' sense to Theorem 7.2.2.
(\LeftAdjoint1·ε2· \RightAdjoint1);ε1 and adjoint transposition λ1∘λ2. Show also that this notion
of composition is associative (up to equality) and has a unit.
29. Deduce that if a diagram of left adjoints commutes up to isomorphism then so does the
corresponding diagram of right adjoints.
30. Explain how the triangle laws (Definition 7.2.1) give a meaning to the notion of adjunction
within any 2-category C (Definition 4.8.15). Show how to define the 2-category C\dashv which
has
(a)
the same objects (0-cells) as C;
(b)
as 1-cells, adjunctions, composition being given by Exercise 7.27;
(c)
as 2-cells, natural transformations as in Exercise 7.28.
31. Show that the forgetful 2-functor C\dashv → C which extracts the left (or right) part of the
adjunction and natural transformations is full and faithful at the 2-level. Deduce that the 2-
category of left adjoints is equivalent to that of right adjoints, in the weakest sense of Definition
4.8.9(d). Explain why they are not directly related.
32. For any 2-category C, (C\dashv)\dashv has the same objects (0-cells), but the 1-cells E→ F consist
of four 1-cells A:E→ F, \typeB1,\typeB2:F\rightrightarrows E and C:E → F and eight 2-cells of
C with A\dashv \typeB1 and \typeB2\dashv C. By looking at various adjoint transpositions, show
that \typeB1 ≡ \typeB2 canonically and hence that (C\dashv)\dashv is strongly 2-equivalent to the 2-
category whose 1-cells are adjoint triples in C. Describe the 2-cells. By applying the result for C
to C\dashv, show that ((C\dashv)\dashv)\dashv is strongly 2-equivalent to the 2-category of adjoint
sequences of length 4 and so on. ( Arbitrarily long chains of adjunctions exist by Exercise 3.61
and its categorical analogue.)
33. What does the solution-set condition in the General Adjoint Functor Theorem 7.3.12 mean in the
case of Proposition 6.1.11, and how do the conjunctive interpretation and equationally free
algebras show that it is satisfied?
34. Formulate a solution-set condition for a pre factorisation system (E,M) in which M-maps need
not be mono, and use it to factorise maps ( cf Proposition 5.7.11). [Hint: consider the category
whose objects are factorisations of the given map into an E and an arbitrary map.]
35. Let F:C^op × C→ C be a mixed variance functor and Γ ∈ obC. A wedge from Γ to F is a dinatural
transformation (Exercises 4.44ff), and the end ∫_X F(X,X) is the final wedge, just as a cone is a
natural transformation from an object to a diagram and its limit is the final cone. Show that A =
∫_X [[A→ X]→ X] for any object A of a cartesian closed category. Deduce the other parts of
Remark 2.8.11.
36. Let U:A→ C be a functor. A C-map e:Γ→ UA is called a candidate if, in any commutative
square of the form on the left, omitted diagram environment there is a unique p:A→ B such that
both triangles commute (without U, ie p;m = z and not just Up;Um = Uz). Suppose that every
map Γ→ TX in C factors uniquely as e;Um, with e a candidate. Show that U preserves wide
pullbacks. ( Cf factorisation systems, Definition 5.7.1.)
37. Show that any functor T:Set→ Set of the form \coprod_{n∈N} \argc_n × X^n satisfies the conditions of
the previous exercise, where the candidates with Γ = 1 and A = n correspond bijectively to
elements of \argc_n. André Joyal has called such a functor analytic [Joy87, Tay89].
38. Describe the left adjoint of mor: Cat→ Set. [Hint: not List.]
39. Show that IPO ⊂ Dcpo is the full subcategory of objects which are able to support an algebra
structure for the lift monad, and that this is what the co-Kleisli category on the category of
algebras is always like.
40. Show that an adjunction F\dashv U is of Kleisli type (Proposition 7.5.3(a)) iff F is essentially
surjective and every ε_A is a self-presentation ( cf Theorem 7.5.9).
41. (Robert Seely) Let A be a symmetric monoidal closed category (so it has a tensor product ⊗ and
a mixed-variance functor \multimap satisfying (-)⊗X\dashv X\multimap ( = ) ) which also has
finite products and is equipped with a comonad ! such that !(A×B) ≡ !A⊗!B and !1 = I (the tensor
unit). Show that the co-Kleisli category is cartesian closed, with [A→ B] = !A\multimap B.
(Exercises 4.27ff give a concrete example.)
42. A function f:A\multimap B between L-domains is called linear if it preserves locally least upper
bounds, ie if a is a locally least upper bound of \arga1 and \arga2 in A then so is f(a) of f(\arga1)
and f(\arga2) in B. In particular f is Scott-continuous, but this is weaker than the condition needed
for a right adjoint unless A is a complete semilattice ( cf Exercise 3.33ff). Let A\multimap B be
the dcpo of linear functions, with the pointwise order, which agrees with Exercise 4.27 for
complete semilattices. There we found (-)⊗A\dashv A\multimap ( = ). Use the fact that A = ∪_a A↓a
for any L-domain A (Exercise 3.34) to deduce the existence of A⊗B for L-domains from the
interaction of colimits and left adjoints. Describe A⊗B for boundedly complete domains
(Exercise 3.21).
43. Show that \Monad0 in Remark 7.5.2 is the free monad on the functor T, formulating a suitable 2-
category in which this is so.
44. By analogy with Exercise 3.36 and Proposition 7.5.11, explain how any (infinitary) monad can be
seen as an infinitary single-sorted algebraic theory with a proper class of operation-symbols and
laws.
45. Let (M,η,µ) be a monad on S and suppose that ev:X→ MX is a final coalgebra for the functor M.
Show that for each Γ ∈ obS there is a unique map f:Γ→ X such that f;η_X = f;ev. [Hint: η_Γ.]
46. Formulate the results of Section 7.6 as a reflection of the 2-category of categories with []-
structure as a property and functors preserving it (and natural transformations) into the 2-
subcategory where this structure is canonical and preserved on the nose.
48. Investigate the effect of a natural transformation φ:U→ U′ on S↓ U. Consider in particular the
case where φ is cartesian, ie its naturality squares are pullbacks [Tay88].
50. Let U:A→ S be a functor between categories, each equipped with a factorisation system, such
that U takes ``monos'' of one kind to ``monos'' of the other. (For example all maps in S could be
called ``monos.'') Define a factorisation system on S↓ U which is preserved by π0 and π1. Show
that if the given factorisation systems are stable, then so is the resulting one, so long as U
preserves pullbacks.
51. Let U:A→ S be a functor between toposes that preserves finite limits. In particular it preserves
monos, so there is a semilattice homomorphism p:U\twom_A → \twom_S. (diagram omitted)
Chapter 8
Algebra with Dependent Types
Mathematical reasoning in the large (as we have been doing it) involves the introduction and
manipulation of symbols for individuals and structures which successively depend on one another. For
example we may introduce a category C, some objects X and Y of C, a morphism f of the hom-set from
X to Y, and (if C is a concrete category) elements of some pullback whose construction involves f.
Similar idioms of presentation may be found in geometry, algebra, analysis and so on. Simple type
theory (reasoning as we have studied it) does not allow for this dependency: we must consider more
complicated calculi.
We studied propositions dependent on terms in Section 1.4, where they were called predicates (but type-
theorists prefer the word proposition). The main story in a proof in the predicate calculus is told by the
logical formulae which are asserted at each step. The proof boxes which we used in Sections 1.4-1.7
were designed to reflect this: variables were consigned to the margin, and we hardly mentioned types as
they and any function-symbols which we used were understood to come from a simply typed algebraic
theory (maybe even a free one).
A dependent-type proof is much busier, since we also have to show that the types and terms we use are
well formed. This goes on in the same current of reasoning, attention passing amongst these different
aspects. Nicolaas de Bruijn devised ways of expressing successive dependencies in AUTOMATH, with
various conventions for saying that the context of the previous line was to be repeated, augmented or
diminished, or that the current line was asserted in a global or some other context. As before, here we
shall use the box or sequent style according to the emphasis we wish to place on the changes of context.
For some authors, the phrase ``dependent type theory'' means the study of the universal quantifier ∀ or
dependent product Π, which generalises the function-type → . That is the subject of the final chapter:
this one sets up the correspondence between type theory and category theory at the algebraic level. The
symbolic rules and universal properties of the quantifiers are then directly related. If you wish to go
straight to Chapter IX, you should just observe the way in which contexts are used as vertices of
commutative diagrams, and that this notation is sometimes abbreviated to a juxtaposition of letters
denoting contexts or types.
Examples of types dependent on terms that we have already met include the hom-set C[x,y] for two
objects of any category, and the arity ar[r] of an operation-symbol in any infinitary free theory. In each
of these cases, an alternative presentation mor C→ (obC)² or κ→ Ω is useful for some purposes
(Remark 4.1.10 and Definition 6.1.1). The targets of these maps are the types of the independent
variables, and the source is the disjoint union \coprod_x Y[x], displaying the type Y[a] over its index a ∈
X. Diagrams (expressing limits or colimits) in categories may also be seen as dependent types, but the
additional arrow information makes the situation more complicated : it is handled by the Grothendieck
construction, Proposition 9.2.7.
The notion of generalised algebraic theory, ie with dependent types, is powerful enough to serve as a
general meta-language. Not only is the theory of categories an example of it, but so is the theory of
generalised algebraic theories itself, as are even stronger notions such as cartesian closed categories and
toposes. ( Cf that a few basic styles of symbolic reasoning such as the ruled lines and sequent notation
for deductions suffice to set out the rules of complex calculi.)
Although any practising mathematician can formulate any particular dependent-type argument quite
fluently, a very complicated recursive construction is needed to describe the generality - which is what
this chapter is about. After describing the calculus, we shall construct the classifying category using the
techniques developed in Sections 4.2-4.6.
Recall that for simply typed algebraic theories this was a category with products; by Section 7.6 every
such category arises in this way. The analogue for dependent types is a category with a class of display
maps, such as κ→ Ω. Syntactically, displays drop one typed variable y:Y[\vec{x}] from a context [Γ,y:
Y], whilst substitution of a term a for the variable x gives rise to a pullback square whose vertical maps
are displays and which has the substitution morphism [x: = a] along the bottom. The class of displays
must therefore be closed under pullback against any map.
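The slogan that substitution is pullback can be sketched with finite sets, where a dependent type Y[x] is a family of sets, its display map is the projection from the disjoint union, and pulling the display back along a substitution reindexes the family. The representation below (families as dicts) is ours, not the book's.

```python
def display(Y):
    """Total space of the display map of a family Y : X -> sets.
    Its elements are pairs (x, y); the display is first projection."""
    return {(x, y) for x in Y for y in Y[x]}

def pullback_display(Y, u):
    """Pull the display of Y back along a substitution u : X' -> X.
    The resulting family is x' |-> Y[u(x')], so the fibre of the
    pulled-back display over x' is the fibre of Y over u(x')."""
    return {xp: Y[u[xp]] for xp in u}

Y = {0: {'a'}, 1: {'a', 'b'}}   # a type Y[x] dependent on x in {0, 1}
u = {'p': 1, 'q': 0}            # a substitution [x := u(x')]
print(pullback_display(Y, u))   # fibre over 'p' is Y[1], over 'q' is Y[0]
```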
According to set theory, the passage from a dependent type Y[x] to its display involves the axiom of
replacement. Semantically, the general notation Y[x] is meaningless in itself: we understand it only via
the interpretation of this syntax in terms of displays as given in this chapter, so the correspondence is the
definition of Y[x], rather than a theorem or axiom. However, the particular case Y[n] = T^n(U) of a
(possibly transfinitely) iterated functor does depend for its existence on a substantive axiom, as does the
interpretation functor [[-]] (Section 9.6).
Earlier investigations of this subject made the perhaps more obvious generalisation from products to
pullbacks (Remark 5.2.9). This is the extreme case of our formulation, in which arbitrary functions
between sets play the role of displays. It has the effect of building equality types into the syntax, but this
is not necessarily appropriate when the objects under study are computational (Proposition 8.3.4ff).
Our approach is based on a philosophical and idiomatic point to which John Cartmell [Car86] first drew
attention. He called the relationship between [Γ,x:X] and Γ analytic in the sense of Immanuel Kant. For
example, to speak of a ``sentence'' one must presuppose some particular language, since this is implicit
in the nature of a sentence. By contrast, general functional relationships like ``the official language of a
country'' are synthetic. The point of view we took in Remark 4.1.10 regarding the morphisms of a
category was similarly based on the idea that when f and g compose it is not merely because tgtf = srcg
accidentally.
Generalised algebraic theories can be interpreted in other categories besides sets. On the semantic side,
the investigation of what morphisms may arise as displays can involve non-trivial issues in topology,
order theory and other disciplines about which we shall just give a few hints in Section 8.3. Similarly,
Chapter IX does some calculations about quantifiers and even the type of types.
This subject is very much in its infancy: it is notationally complicated, most algebraists concern
themselves with simply typed theories such as groups, rings, modules, etc ., and logicians are more
interested in the quantifiers. The account in this chapter is perhaps overly influenced by type theory
(though the practitioners of that subject would, on the contrary, find it inadequate), as it allows for a high
degree of interdependence amongst types. This is needed for comprehension ( cf Exercises 2.17
and 8.3ff) and for the equivalence with display categories.
There are, however, numerous examples from algebra whose dependency is ``stratified,'' with simply
typed theories at each level (as in Section 4.6). We saw such a stratification in Sections 5.6 and 7.4 for
the congruences for simply typed theories. Another is the theory of categories. This means that we must
expect all of the difficulties associated with the category of categories (two-dimensional limits, pseudo-
colimits, failure of regular epis to compose) to beset generalised algebraic theories too.
If the intended laws can be separated from the operation-symbols by stratification, then it is no longer a
severe restriction to concentrate on free theories (with no laws within levels). Then there would be a
stratified theory of structural recursion, unification and resolution. All of these issues demand further
research, but are beyond the scope of this book.
The aims of the present account are to identify how an object-theory contributes types, operation-
symbols and laws to the language, and how the arguments are supplied by (the category composed of)
substitutions. Section 8.2 describes this category, and Remark 8.4.1 the object-theory. Section 8.1
leads informally towards these from the vernacular.
Instead of defining the objects and maps of the category of contexts and substitutions directly, we do
this by means of an elementary sketch as in Section 4.3. I began work on this chapter from the exercise
of showing that the context-morphisms defined recursively in [Pit95] (essentially Remark 8.2.12) do
indeed satisfy the axioms of a category. This proved to be unreasonably laborious, being a highly
convoluted version of Proposition 2.7.5, and led me to the approach via sketches. This profoundly
influenced the rewriting of Chapters IV and I.
It is well known that syntactic substitution is characterised by pullback, but Section 8.2 is probably the
first explicit proof of this in a category fashioned directly from language. From this careful analysis we
may determine what coherence conditions pullbacks along composites must satisfy, namely none so
long as on-the-nose equalities are never asserted between type-expressions involving different outermost
type-symbols.
In Section 8.4 we repeat the discussion in Section 7.6 about whether the notion of a class of displays in
a category is a structure or a property: again this dispute is resolved by means of the distinction between
strong and weak equivalences of categories.
DEFINITION 8.1.1 The steps in an argument are called judgements. Amongst the direct algebraic steps (those within
a fixed context Γ), we distinguish the following forms:
(a)
Term formation (Γ\vdash a:X): the term a is well formed and of type X.
(b)
Truth (Γ\vdash φ holds): that φ is true in the context, ie that there is a well formed proof p of (type) φ,
where p is implicit in the history of the deduction. This is a special case of the previous form, where we
omit proof(-term)s because no distinction is made between them.
(c)
Term equality (Γ\vdash a = b:X): that a and b are equal qua terms of type X. What notion of ``equality'' we
mean needs some discussion.
(d)
Proof equality.
(e)
Type formation ( Γ\vdash X type or Γ\vdash φ prop ): that X is a well formed type (or φ a proposition).
(f)
Type equality (Γ\vdash X = Y or Γ\vdash φ = ψ).
(g)
Context formation:
We discuss context formation in the next section, where it will become clear that we must also say when two
contexts are equal, using equality of their types. The provision of arguments of types and operation-symbols also
needs formation and equality rules for substitutions.
The idioms which close boxes or make the contexts smaller arise from the quantifiers (Remark 9.1.6 and the whole
of the final chapter). For the time being, therefore, boxes never get closed. Despite the boxes and sequents, this
chapter generalises, not the predicate calculus as we saw it in Sections 1.4- 1.5, but just resolution of Horn clauses
involving atomic predicates, as in PROLOG (Remark 1.7.2ff).
At the algebraic level the only distinction between the behaviour of types and propositions is that elements (terms)
of the same set (type) are distinguished, but proofs are anonymous. By giving names to the Horn clauses, proof-terms may be assigned to propositions (Remark 6.2.10); the main point of Section 2.4 was to do the same for
implicational logic, using λ-terms. Alternatively, we may simply assert that any two proofs of the same
proposition in the same context are equal.
The classification of judgements also applies to the rules which justify them. These are the subject of the rest of
this section.
Terms In this chapter, a term is an algebraic expression: just as in Definition 1.1.3, it is either a variable x
belonging to the context, or an operation-symbol r applied to (zero or more) sub-terms.
omitted prooftree
environment
where now the types of the second and subsequent sub-expressions may depend on the preceding terms, and the
type of the result may depend on all of them. If there is no such dependency, as in Section 4.6, we can allow the
sub-expressions to have been born simultaneously - at any rate there is no restriction on their order of formation. If
X2 really does depend on a1 then we must have had a fragment of proof like
and so on (with intermediate steps). The operation-symbol r might for example be composition, the types being the
set of objects and hom-sets of a category. For propositions r names a Horn clause:
α1, α2, …, αk \vdash φ.
The arity of an operation-symbol r is given by listing the types on the left and right of the turnstile. Now, in order
to express the dependency of the later types, the earlier ones must be accompanied by variables. In the dependent-
type situation it is convenient to regard the operation-symbol not as the letter r alone, but as applied to a list of
variables:
x1:X1, x2:X2[x1], …, xk:Xk[x1,…,xk-1] \vdash r(x1,…,xk) : Y[x1,…,xk].
The informal notation r(\vec a) and X[\vec a] quickly becomes inadequate, so we shall develop a formalism in which the arguments are delivered to r by substitution of \vec a for \vec x, writing

r[\vec x := \vec a] for r(\vec a).

This notation allows substitution into expressions (t[\vec x := \vec a]) as well as operation-symbols, repeated substitutions and weakening ([^(x)]*t).
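To fix intuitions, the term calculus just described can be sketched in a few lines of code. This is only an illustrative model of ours, not part of the formal development: terms are variables or operation-symbols applied to (zero or more) sub-terms, and arguments are delivered by simultaneous substitution t[\vec x := \vec a].

```python
# Illustrative sketch (datatype names ours): terms are variables or
# operation-symbols applied to zero or more sub-terms.
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Op:
    symbol: str
    args: Tuple  # the sub-terms, possibly empty

def subst(t, sigma):
    """Simultaneous substitution t[x := sigma[x]] for every x in sigma."""
    if isinstance(t, Var):
        return sigma.get(t.name, t)        # untouched variables stay
    return Op(t.symbol, tuple(subst(a, sigma) for a in t.args))

# r[x := a, y := x] for the term r(x, s(y)):
t = Op("r", (Var("x"), Op("s", (Var("y"),))))
print(subst(t, {"x": Op("a", ()), "y": Var("x")}))
```

Because the substitution is simultaneous, the `x` substituted for `y` is the outer variable, not the argument `a`; this is the point of the bracket notation above.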
A lot of work remains to be done in the next section to define general substitutions u and their action u*t on terms
- the calculus is highly recursive, and we have to break into the circle somewhere - but, in anticipation of this, here
are the first of the formal rules.
(a)
resolution ( cf Remark 1.7.6) of an operation-symbol r,
omitted prooftree
environment
by substitution u = [\vec x := \vec a] of arguments \vec a, and
(b)
variables considered as terms,
omitted prooftree
environment
of which the identity axiom (x:X \vdash x:X) is a special case.
Both of these rules incorporate weakening, cf Definition 1.4.8 and Remark 2.3.8. We study the structural rules in
the next section.
Equality of terms The notion of equality needed in the foundations of dependent type theory is that from
algebra: congruence. The judgement Γ\vdash a = b means that, for any valid judgement Γ,Ψ\vdash J in which a is
a sub-expression, replacing it with b gives another valid judgement.
(a)
the laws of the object-theory, as in Definition 4.6.1(d);
(b)
the rules for an equivalence relation (Definition 1.2.3),
(c)
and pre- and postsubstitution, including in particular congruence for each operation-symbol r,
omitted prooftree
environment
The formal rule, cf Definition 8.1.3(a), is
omitted prooftree
environment
omitted prooftree
environment
REMARK 8.1.5 Intensional equality Γ\vdash a = b:X remains at the level of judgements: it does not (within the
basic calculus) provide us with a term of some propositional type eq[a,b]. We discuss extensional equality briefly
in Section 8.3. If we want to do something conditionally on the equality of a and b, it is extensional equality that
we need.
Dependent types The dependency of types on terms (in predicates, hom-sets, arities, etc ) is expressed in a
similar way to the application of operation-symbols to arguments, but there are no type variables.
omitted prooftree
environment
where Y is a dependent type-symbol in the object-theory. As we did for the operation-symbols (Definition 8.1.3
(a)), it is convenient to regard the primitive form of the dependent type as having variables for its arguments: \vec x : \vec X \vdash Y[\vec x]. The formal type-formation rule is then
omitted prooftree
environment
Equality of types The instantiation of a dependent type Y[\vec x] at equal terms \vec a = \vec b gives rise
to equal types. We have not addressed this phenomenon before - indeed we have gone to some lengths to exclude
it - but how can we deny that Factors[9×4] = Factors[6×6]? There must at least be a canonical isomorphism
between the two, but if we chose to make this explicit we would be obliged to introduce formation and equality
rules for it, which would have to obey further coherence rules with respect to other substitutions. Predicates at
equal subjects also give rise to equal types; again there is a canonical way of translating a proof of one into a proof
of the other, which must also obey coherence.
Rather than enter this labyrinth, we accept that types can be intensionally equal, ie if they have a common history
of formation. Whereas set theory allows independently given types to be tested for equality or inequality, we do
not. Section 9.2, however, does look at some of the categorical consequences of replacing equality by
isomorphism .
(a)
reflexivity, symmetry, transitivity,
(b)
and congruence,
omitted prooftree
environment
for which the formal rule is
omitted prooftree
environment
(c)
But the most important consequence of type equality is that terms may acquire new types:
For the sake of giving a little more thought to the all-important coercion rule, we pause to consider its one-way
version.
REMARK 8.1.8 Subtyping generalises equality between types to a non-symmetric relation(-judgement) Γ\vdash U ⊂ X satisfying
Subtyping also arises in object-oriented programming languages, in which complex types are developed from
simpler ones by the addition of constructors and properties. As a mathematical analogy to this style, we may define
a field as an Abelian group, with multiplication as an extra operation and the field axioms as extra conditions.
Terms of the narrower type inherit the covariant properties (positive, in the sense of Remark 1.5.9) associated with
the wider one to which we coerce them; negative properties pass the other way. The coercion functions, like
forgetful functors, need not be injective, but if they are suppressed from the notation then there must be at most
one between any two types. (This has non-trivial consequences for function-types.)
REMARK 8.1.9 Some authors allow axioms stating equality of types, so that Heyting semilattices can be treated in
the same framework. We forbid them, because interchangeability of objects should be expressed as isomorphisms,
ie by means of two operation-symbols and two laws between their composites ( cf Section 7.6). See Exercise 4.7
regarding the quotient of a category by a system of canonical isomorphisms. As we commented before
Proposition 3.2.11, the antisymmetry law for posets is a side-effect of the imposition of algebraic notation (∧, ∨,
⇒ ), and is not an intrinsic feature of logic.
The restriction drastically simplifies the issue of type equality, as type- expressions can be equal only as a result of
making equal substitutions into the same ``outermost'' type-symbol. This is essential to the validity of the structural
recursion used in the interpretation (Section 8.4).
Definitional equality need not be excluded: it is of course very useful to define R = {(L,U):P(Q)|…} as in
Remark 2.1.1. This is harmless to the interpretation of dependent type theory, as we may simply replace the left
hand side of the definition by the right. Inter-provability as a notion of equality of propositions will be discussed in
Remark 9.5.6.
The object-language The pure theory cannot prove the existence of anything apart from the empty context [ ],
so an object-theory is needed. As in simply typed algebra, it has types, operation-symbols and laws.
(a)
type-symbols ∆\vdash X type and proposition-symbols ∆\vdash φ prop , each defined in a context ∆;
(b)
operation-symbols, ∆\vdash r:X, which are typed and in context;
(c)
laws between terms ∆\vdash a = b:X of the same type in context.
In order to give meaning to the ``contexts'' which occur in these data, we have to generate a small part of the
language - as we did to state laws for an algebraic theory in Section 4.6. However, the ubiquitous dependencies
must not be allowed to become circular, so we require that
the types and operation-symbols which occur in ∆ must have been declared in advance.
Abstractly, the presentation defines a relation between each new symbol being defined and the type-symbols and
operation-symbols that are used in the formation of its defining context ∆ (there are operation- symbols in the
terms that are the arguments of the type-expressions). This relation must be well founded.
DEFINITION 8.1.11 A stratified algebraic theory is one which obeys a stronger well- foundedness condition, that all
of the operation-symbols r:X and laws a = b:X must be declared before variables x:X may be used as arguments of
further type-symbols Y[x]. We shall not impose this condition in this chapter, as it is violated by the canonical
language in Definition 8.4.6. See Exercises 8.1- 8.5 and Examples 9.2.4.
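The well-foundedness requirement on a presentation can be checked mechanically. As an illustrative sketch (the names and representation are ours), let `uses` map each symbol to the symbols occurring in its defining context; a depth-first search either finds a legitimate order of declaration or detects a circular dependency.

```python
# Illustrative sketch (names ours): `uses` maps each symbol to the
# symbols used in forming its defining context; the relation must be
# well founded, i.e. the dependency graph must be acyclic.
def declaration_order(uses):
    order, seen, active = [], set(), set()
    def visit(s):
        if s in seen:
            return
        if s in active:
            raise ValueError("circular dependency at " + s)
        active.add(s)
        for t in uses.get(s, ()):
            visit(t)
        active.discard(s)
        seen.add(s)
        order.append(s)
    for s in uses:
        visit(s)
    return order   # a legitimate order of declaration

# Objects, hom-sets and composition of a category, as in the text:
print(declaration_order({"O": [], "H": ["O"], "c": ["O", "H"]}))
```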
In Remark 4.1.10 we said that the maps of a category C with O = obC could be collected together and presented
by src,tgt: (mor C) → O×O instead of using the dependent type H[x,y]. In fact src,tgt will serve as the display map
[^(f)] that corresponds to this dependent type, which we introduce syntactically and semantically in the next two
sections.
Before composition can be defined, its support - the set of composable pairs (f,g) with tgt(f) = src(g) - must be
constructed. There is a natural syntactic description of this set, namely the context of the rule which introduces c.
It is also the pullback of tgt and src, as marked in the diagram above, cf transitivity of a kernel pair (Proposition
5.6.4).
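In Set this pullback can be computed directly. The following toy sketch (ours, not the book's notation) forms the set of composable pairs of a small category as the pullback of tgt against src:

```python
# Illustrative sketch in Set: the support of composition, the set of
# composable pairs {(f, g) | tgt(f) = src(g)}, as a pullback.
def pullback(f, g, X, Y):
    """{(x, y) in X×Y | f(x) == g(y)} with its two projections."""
    P = [(x, y) for x in X for y in Y if f(x) == g(y)]
    return P, (lambda p: p[0]), (lambda p: p[1])

# A toy category on objects {0, 1, 2} with three generating arrows.
src = {"u": 0, "v": 1, "w": 0}
tgt = {"u": 1, "v": 2, "w": 2}
names = list(src)
composable, _, _ = pullback(tgt.get, src.get, names, names)
print(composable)
```

Only the pair (u, v) is composable here, since tgt(u) = 1 = src(v).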
This chapter shows how to translate between the symbolic and diagrammatic idioms. The next section constructs
the classifying category for L, ie the category in which this diagram is drawn, and shows that squares like the one
marked really are pullbacks. In fact pullback performs all of the substitutions u* in Definitions 8.1.3(a), 8.1.4(c),
8.1.6 and elsewhere. In Section 8.4 we shall give the interpretation of the language, for which we shall need a
sharper form of the induction which generates it.
If you prefer symbols, remember that commutative diagrams simply show the types of the symbols.
Giving a list of successively equal terms does not show the type information (which, after all, is what
dependent type theory is all about), and makes it difficult to follow the reduction rules. In view of the
fact that types contain terms as sub-expressions, the practice of annotating terms with their types is
inadequate:
After defining equality of contexts we discuss the structural rules. These are derived in the sense that if
the premise is deducible using the rules of the previous section then so is the conclusion. It is left to the
reader to demonstrate this. The calculus has a ``term model'' in which
We write x ∈ Γ to mean that x:X is one of the variables which are listed in this context. Note that for a
context Γ, if x ∈ FV(Y) for the type (expression) Y of one of the variables y ∈ Γ then x ∈ Γ itself.
However, for a context extension [Γ,Ψ] we write x ∈ FV(Ψ) if x ∈ FV(Y) for some type in Ψ, where x
may belong to Γ rather than Ψ ( cf Exercise 1.12).
Objects As before, these are contexts, but type dependency means that the class of well formed
contexts must be defined recursively. Indeed the terms, types and contexts must be generated in a single
simultaneous process (as remarked in Section 6.2). The starting point is the empty context [ ], which
will be interpreted as the terminal object 1, but we shall find that a general context Γ is always present in
the background.
The objects and maps of the category will be contexts and multiple substitutions. Morphisms may only
compose if the target of one matches the source of the other, ie they are equal objects of the category.
This means we have to give rules defining equality of contexts which allow substitution of a term of one
type for a variable of an equal type. These simply extend the type equality rules (Definition 8.1.7).
omitted prooftree
environment
With the axiom for the empty context ([ ] = [ ]) we have all the rules for the additional forms of
judgement for formation and equality of contexts, because reflexivity, symmetry and transitivity are
derivable from those of type equality, and hence equality of substitution (into a type-symbol). Briefly,
contexts are well formed and equal iff the corresponding types are, in the preceding context, and the
variable names agree.
This satisfies
omitted prooftree
environment
is derivable, by inserting coercion rules (Definition 8.1.7(c)).
REMARK 8.2.4 Our convention is to retain the variable names - in the analogy of Remark 4.3.14, we use
coloured wires where, eg , [Pit95] numbers the pins. In accounts of the latter kind, [Γ,x:X] and [Γ,y:X]
are equal contexts. Nevertheless for us the rule
omitted prooftree
environment
is derivable: Remark 8.2.8 gives a canonical isomorphism between any two contexts which differ only in
these names (open α-equivalence).
Coloured wires, unlike numbered pins, may be permuted, but now our freedom to do this is restricted by
the type dependency. Of course, it is necessary that the two types be well formed in the same context,
omitted prooftree
environment
but this is also sufficient.
It would be nice to treat permutation as equality. A context would then no longer be a list but a partially
ordered set, subject to the order relation implicit in Definition 1.5.4, namely that the variables occurring
freely in any type must be mentioned before it in the listing of the context.
Unfortunately the explicit listing is required for the semantics, since, even in a category with specified
products (or pullbacks), the choice usually cannot be made commutatively or associatively (Remark
8.3.1). Contexts which differ only by permutation are nevertheless canonically isomorphic, by a
substitution which is the identity on each variable. They also share the same clone, so to make the
isomorphism explicit the order of the variables must be specified.
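Which reorderings are allowed can be illustrated concretely. In this sketch (representation ours) a context lists each variable with the free variables of its type; the well formed reorderings are exactly the linear extensions of the dependency order.

```python
# Illustrative sketch (representation ours): a context lists each
# variable with the free variables of its type; an ordering is well
# formed iff every type's variables are declared earlier.
from itertools import permutations

def ok(ctx):
    declared = set()
    for v, fv in ctx:
        if not set(fv) <= declared:
            return False
        declared.add(v)
    return True

# [x:X, y:Y, p:P[x,y]]: x and y may be exchanged, but p must come last.
ctx = [("x", []), ("y", []), ("p", ["x", "y"])]
valid = [tuple(v for v, _ in perm)
         for perm in permutations(ctx) if ok(perm)]
print(valid)
```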
Display maps As in Definition 4.3.11(b), the arrow class is generated by display maps and single
substitutions. The former arise from
omitted prooftree
environment
ie the addition of an unused variable at any valid position in a context, but by the permutations above it
may be taken to be the last. This rule was given in Definition 1.4.8 and Remark 2.3.8 for propositions and for types, respectively.
Weakening, like coercion, is a derived rule, since in all of the rules of the previous section variables may
be added to the context. Indeed only Definition 8.1.3(b) mentioned the context at all, but we added
passive Γ and Ψ to it to incorporate weakening.
{(Ψ,J) | Γ,Ψ \vdash J}  ⊂  {(Ψ,J) | Γ,x:X,Ψ \vdash J},
and we abstract this as (the contravariant action of) a display map:
omitted prooftree
environment
The hat was introduced in Notation 1.1.11 and used in Sections 1.5, 2.3 and 4.3. Display maps are
marked with open triangle arrowheads. The composite of zero or more displays will be indicated by a
double triangle arrowhead (↠) and [^(Ψ)]. The subscript on [^(x)]_Ψ refers to the extension Ψ of the context.
Cuts The other class of generating maps consists of the substitutions of one term for a variable of the
same type in the same context. As the types may now have free variables, Definition 1.1.10 must be
extended to deal with substitution of terms into types and contexts. This is routine, but once again we
see the need for type-equality rules.
omitted prooftree
environment
Like coercion and weakening, it is a derived rule, because all of the rules are invariant under
substitution.
omitted prooftree
environment
which Definition 8.3.2 characterises semantically .
The context Γ of global parameters is needed in the cut rule in order to compose cuts. Let Γ\vdash X
type and Γ,x:X\vdash Y type; then
omitted prooftree
environment
where, by the first cut, Y[x: = a] is a well formed type in the context Γ, and we abbreviate Ψ′ = Ψ[x: = a]. Similarly Ψ′′ = Ψ[x: = a][y: = b].
Laws In the Extended Substitution Lemma (Proposition 1.1.12) we must now say whether the
variables x and y may or may not occur free in the type-expressions, as well as in the terms. In the
following, y:Y is later in the context than x:X, and Y may in general depend on x.
The type Y may depend on x in ([^(S)]) but not in (\check S). We shall see in Lemma 8.2.10 that these
laws play different roles, with opposite senses as reduction rules. In fact (\check S) may be derived (as an unoriented law) from ([^(S)]) and (R).
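In the simple-type case (where no variables occur in the types) the corresponding law for terms can be checked directly. Here is a sketch of ours, with terms as nested tuples whose leaves are variable names, of the substitution lemma t[x := a][y := b] = t[y := b][x := a[y := b]], valid provided x is not free in b.

```python
# Illustrative only: first-order terms as nested tuples, leaves are
# variable names; subst(t, x, a) is the single substitution t[x := a].
def subst(t, x, a):
    if isinstance(t, str):                 # a variable
        return a if t == x else t
    return (t[0],) + tuple(subst(s, x, a) for s in t[1:])

# t = r(x, s(y)); substitute a = f(y) for x, then b for y.
# Note that x does not occur in b, as the lemma requires.
t = ("r", "x", ("s", "y"))
lhs = subst(subst(t, "x", ("f", "y")), "y", "b")
rhs = subst(subst(t, "y", "b"), "x", subst(("f", "y"), "y", "b"))
print(lhs == rhs)
```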
Of course the object-language also contributes laws, but we shall regard them as part of the term
calculus (so a denotes an equivalence class modulo such laws) rather than of the sketch for the category
composed of substitutions.
Now that the types may be dependent it is essential to include them in the statement of the laws.
We must in particular ensure that the terms and the intermediate types are well formed. As usual,
commutative diagrams provide the clearest mode of expression; the little arrows in the corners of the
squares indicate the sense of the reduction rules.
The general forms of the laws, which we shall mark with a star, allow for an extension Ψ to the context
which may depend on y. For example the general form of (T*) is that the composite
[Γ, ([^(y)]*Ψ)[y: = b]]  --[y: = b]-->  [Γ, y:Y, [^(y)]*Ψ]  --[^(y)]_Ψ-->  [Γ, Ψ]
is the identity, but to show that this is well formed we need the version without Ψ first.
The following diagrams show how the (S*) and (W*) laws including the context extension Ψ reduce to
the simpler forms, where PP and [^(SS)] stand for as many squares as there are variables in Ψ.
REMARK 8.2.8 The final law justifies renaming and permutation ( exchange) of variables. It uses
contraction, ie the special case of cut where the term is a new variable of the same type:
Hence the renaming and exchange rules (Remark 8.2.4) are derivable, since we have shown how to turn
x into y and move it past Φ (which may be empty). []
Normal forms for morphisms We shall now prove Theorem 4.3.9 and Corollary 4.3.13 for dependent
types: each map in the category can be written uniquely as a multiple substitution in a certain normal
form. As usual the most natural way of expressing this involves a ground context Γ. Essentially, this
corresponds to the slice category Cn×_L ↓ Γ (Definition 5.1.8), but the appropriate notion is actually a full subcategory, cf Sub_C(X) ⊂ C↓X (Remark 5.2.5), which is the slice relative to monos.
(a)
objects are contexts [Γ,Φ] extending Γ, so that the structure map [^(Φ)] :[Γ,Φ]→ Γ is always a
composite of displays, and
(b)
morphisms [Γ,Φ]→ [Γ,∆] are generated by displays and cuts which leave the context Γ
untouched, subject to the laws above.
LEMMA 8.2.10 Any map u:[Γ,Φ]→ [Γ,∆] in Cn×_L ↓ Γ may be expressed uniquely (up to the choice of

8.3 Display Categories and Equality Types
The structure requires less than that the semantic category have all finite limits, and we show that this
shortfall corresponds exactly to the addition of equality types to the language. In preparation for the
interpretation of and equivalence with syntax in the next section, we show how to present any semantic
category equipped with a class of display maps as a sketch analogous to that used to construct the
category of contexts and substitutions in Section 8.2.
Display maps In Section 4.6 the interpretation of a (simply typed) algebraic theory L in a category C was a product-preserving functor Cn×_L → C. In the dependent type case the functor preserves (certain)
pullbacks, but there are a lot more of these, so how can the extension to type dependency be
conservative?
REMARK 8.3.1 The P pullback is simply the binary product X×Y in the slice Cn×_L ↓ Γ, as in simple type theory.
The second square was not mentioned before, as it was automatically a pullback by Lemma 5.1.2. This is because when x ∉ FV(Y), as also for (\check S), the X-indexed union becomes X×Y and the map marked ν_a is a section of [^(x)] = π1: X×Y → Y ( cf Lemma 8.2.15(b)).
In the dependent case, ν_a is no longer a (generalised) element but the ath inclusion into a sum indexed by X. Being a pullback expresses stability of this sum, as in Section 5.5, and ν_a is a regular mono, but not necessarily split. We also see the semantic reason why the components of the normal form of a map to [Γ,x:X,y:Y] come in the order [y: = b];[x: = a]: the element b ∈ Y[a] must be selected before it is included in the sum.
Putting these together, to abstract Theorem 8.2.16 we need the pullback of any display map [^(y)]
against arbitrary maps in the category.
DEFINITION 8.3.2 Let D ⊂ mor C be a class of maps of any category, which we write with open triangle arrowheads. Then we call D
(a)
a display structure, if for each d:X→ ∆ in D and u:Γ→ ∆ in C there is a given pullback square, in
which u*d ∈ D ;
(b)
a class of displays if every such square exists, with u*d ∈ D, and the class D is closed under
composition with isomorphisms. omitted diagram environment
We shall say that a C-map is a cut if, like ν_a above, it can be expressed as a pullback of a section of a display.
The class D need not be a subcategory, ie include all identities and be closed under composition. In fact
these properties of D say that there are base types isomorphic to singletons and dependent sums
(Remark 9.3.1). But the semantic classes of displays frequently are subcategories, and in practice it is
convenient and harmless to make this assumption.
Even when D is a subcategory, it is ``closed under pullbacks'' only in the sense that if the right hand side
of a pullback square in C lies in D then so does the left. The category D need not have pullbacks: we
have not required pullback mediators to lie within it (Exercise 8.13). In particular it need not be the M-class for a factorisation system. Indeed if the cancellation property (u;d), d ∈ D ⇒ u ∈ D ( cf Lemma
5.7.6(e)) were required, this would defeat the point of Definition 8.3.8 below.
Next we consider the extreme cases between which display structures interpolate.
EXAMPLE 8.3.3 Let C be a category with specified terminal object 1 and specified binary products. Then the class D of maps that are specified left projections, π0: ∆×X → ∆, forms a display structure. The resulting interpretation of the context [x1:X1,…,xn:Xn] is the left-associated product ((···(((1×X1)×X2)×X3)×···)×Xn) which was defined in Remark 4.5.15.
PROOF: Given any map u:Γ→ ∆, let the specified product projection π0: Γ×X → Γ serve as the choice of
pullback. The ``dependent'' types in this example are in fact constant. []
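Example 8.3.3 can be played out in Set. In this sketch (ours) specified binary products are Python pairs, and a context of constant types is interpreted as the left-associated product starting from the terminal object 1 = {()}.

```python
# Illustrative sketch in Set: specified products are pairs, and the
# context [x1:X1, ..., xn:Xn] becomes ((...((1 × X1) × X2)...) × Xn).
def interpret(types):
    ctx = [()]                                   # the terminal object 1
    for X in types:
        ctx = [(g, x) for g in ctx for x in X]   # specified product ctx × X
    return ctx

# The display for the last variable is the specified left projection.
pi0 = lambda p: p[0]
G = interpret([[0, 1], ["a", "b", "c"]])
print(len(G))
```

The "dependent" types here really are constant, as the proof remarks; each stage just multiplies the interpretation so far by the next type.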
PROPOSITION 8.3.4 Let C be a category with a class of displays. If all product projections and pullback
diagonals are displays, then every map is (isomorphic to) a composite of displays, and C has all finite
limits.
[^( = )] : {y0,y1 | y0 = y1} → [y0,y1:Y],
so Proposition 8.3.4 says that the classifying category of a generalised algebraic theory has all finite
limits (and all of its maps are isomorphic to displays) iff the theory has all equality types.
The presence or absence of equality types influences the quantifiers. In the syntax, the quantifiers are
always associated with a (bound) variable , and we shall find in Sections 9.3 and 9.4 that they are the
two adjoints to the weakening functor for that variable. Bill Lawvere, however, who first had this
insight, described them as adjoint to pullback d* against arbitrary maps d [Law69], and emphasised this
by discussing the diagram above [Law70, p. 8]. Quantification along a general function f in Set gives the
guarded quantifiers (Remarks 1.5.2 and 3.8.13(b)),
Perhaps the equality type itself may be acceptable in a computational setting, but the inequality type (x0
≠ x1) begins to raise doubts. To be able to distinguish positively between x0 and x1 suggests a
``Hausdorff'' condition on the type, ie that there is some computation f:X\rightharpoonup 2 that
terminates for these arguments (but not necessarily everywhere) with f(x0) = 0 and f(x1) = 1. In fact ∀[^( = )] exists for X ∈ ob Sp (classically) iff X is locally Hausdorff (Example 9.4.11(e)). On the other hand, ∃[^( = )] exists iff X is a discrete space (Remark 9.3.13).
Display maps in topology and elsewhere Since Definition 8.3.2 is a unary closure condition
(Section 3.8) there are lots of examples.
(a)
Product projections ( ie all legs of spans which have the universal property of a product) in a
category with finite products.
(b)
All maps in a lex category.
(c)
All instances of the order relation in a meet-semilattice.
(d)
Monos in Set. If we think of subobjects as subsets with canonical inclusions then these provide a
corresponding display structure. The pullback in this case is called inverse image.
(e)
A map d:X→ ∆ in any category is called carrable if the pullback u*d along every map u: Γ→ ∆
exists.
(f)
The carrable maps in IPO (domains with bottom) are exactly the projections (Example 3.6.13
(b)).
(g)
Inclusions of normal subgroups form a class of displays in Gp which is not closed under
composition.
(h)
Replete functors (Definition 4.4.8(d) ) form a class of displays in the 1-category of categories and
functors.
(i)
Subspaces, local homeomorphisms and open surjections (but not general continuous surjections)
of topological spaces. Example 9.3.10 shows how to use open maps to interpret the existential
quantifier. These and other classes may also be defined for toposes.
(j)
We may form the closure under pullbacks of any class whatever of carrable maps, to give a class
of displays.
(k)
More usefully, given a class of ``grue'' maps (for example closed maps between topological
spaces), we say that a morphism is stably grue if all pullbacks of it exist ( ie it is carrable) and are
grue. (In the modal logic of Section 3.8, stably means □, and part (j) said ◇.)
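Item (d) of the list above is easy to compute with. Here is a sketch of ours of pulling a subset inclusion S ⊆ ∆ back along u: Γ → ∆ in Set, giving the inverse image u⁻¹(S):

```python
# Illustrative sketch: the pullback in Set of a subset inclusion
# S ⊆ ∆ along u: Γ → ∆ is the inverse image u⁻¹(S) ⊆ Γ.
def inverse_image(u, S, Gamma):
    return [g for g in Gamma if u(g) in S]

# Inverse image of {0} under n ↦ n mod 2: the even numbers below 6.
print(inverse_image(lambda n: n % 2, {0}, range(6)))
```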
Many of the examples of classes of displays of toposes may be described as an internal set, locale or
other structure in the target topos.
EXAMPLES 8.3.7
(a)
D. Lazard ( c. 1950) defined a sheaf as a local homeomorphism d:X→ ∆ between two topological
spaces. The fibres X[u] ≡ d⁻¹(u) for u ∈ ∆ are discrete and may be regarded as the values of a
``variable'' set as u ranges over the space ∆. The relative slice Sp↓ ∆ ≅ Shv(∆) is a topos, so it
obeys the same (intuitionistic) logic as Set. The (global) sections ∆→ X of d are the global
elements of the set X, but these are inadequate to characterise it: we need to consider arbitrary
maps (generalised elements) Γ→ X (Definition 8.2.13).
(b)
The category Loc of locales has all finite limits, so that arbitrary continuous functions d:X→ ∆
may be used as displays. They correspond to internal locales in the topos Shv(∆).
Intermediate classes of displays give notions of space over ∆ lying between the discrete and the general cases.
(c)
Algebraic lattices (Sections 3.4 and 3.9) provide the simplest notion of ``domain.'' Recall that they are of the form Idl(C^op), where C is a meet-semilattice, and the Scott topology is the frame of monotone functions Ω^(C^op). Martin Hyland and Andrew Pitts showed how to make these domains ``variable'' [HP89], by allowing C to be an internal semilattice in Shv(∆). The topology of the domain is then A^(C^op), the topology of ∆ being A; the topos display is d: Shv(∆)^(C^op) → Shv(∆), where d* takes F ∈ ob Shv(∆) to λ X: ob C. F.
(d)
They also generalised this idea from propositions to types, replacing semilattices by lex
categories, ie the classifying categories of finitary essentially algebraic theories internal to Shv
(∆). A similar construction could be based on our generalised algebraic theories (classified by
display categories) instead.
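The fibres in Example 8.3.7(a) can be computed pointwise in the discrete case. A sketch of ours of the fibres d⁻¹(u) of a map d: X → ∆, viewed as a ∆-indexed family of sets:

```python
# Illustrative sketch: the fibre d⁻¹(u) of d: X → ∆ over u, the
# "value" of the variable set at the point u of the base.
def fibre(d, X, u):
    return [x for x in X if d(x) == u]

X = ["a0", "a1", "b0"]
d = lambda s: s[0]        # a display onto the base {'a', 'b'}
print([fibre(d, X, u) for u in "ab"])
```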
In the next section we show that for each class of display maps there is a certain generalised algebraic
theory. The terms of the corresponding type in context are exactly the points of the variable space, and
the type theory allows us to reason about it as if it were a set. Unlike classical logic, no assumption is
built in that structures are determined by their points: they may have none globally. The ``points''
provided by type theory are terms or generalised elements (Remark 4.5.3). In this way dependent type
theory is applicable in general topology, topos theory and domain theory to justify more ``synthetic''
styles of argument.
Relative slices From the semantics we shall now move gradually back towards syntax, starting with the analogue of Definition 8.2.9.
DEFINITION 8.3.8 The relative slice C⇓Γ has
(a)
as objects ∆ the composable sequences of D-maps
Γ ← X1 ← X2 ← ··· ← Xn,
(b)
and as morphisms the commuting triangles in C of the form omitted diagram environment where
α:∆→ Φ is any C-map.
So the forgetful functor src: C⇓Γ → C is faithful and reflects invertibility. In particular, if C has a terminal object, C⇓1 → C is full and faithful. Don't confuse C⇓1 with C↓1, which is trivially isomorphic to C.
REMARK 8.3.9 Every context may be reduced to [ ] by successively omitting variables, so every object
of Cn×_L has a canonical sequence of displays down to the terminal object. Since it preserves displays,
the interpretation only makes use of a particular semantic object if it too has such a sequence. Cartmell
[Car86] focused on the sequences of displays, defining a contextual category to have a tree-structure on
its object-class (and functorial assignment of pullbacks). However, there may be isomorphic objects with
entirely different paths in the tree structure. Our relative slice category C⇓1 has this tree structure, but
C need not.
No further hypothesis is needed concerning the existence of products in C⇓Γ, because they are given by
pullbacks of displays over Γ and so are guaranteed by the display axioms.
Using the contextual structure of the semantic category C⇓1, its maps can be expressed in ``normal form'' and this category is presented by a sketch in the same way as that used to define the syntactic category Cn×_L. The proof is much easier than for the corresponding results of the previous section, because this time we know in advance that P and \check S are pullbacks, whereas before we had to work up to this from the special case in Lemma 8.2.10.
LEMMA 8.3.10 Every morphism β:∆→ Φ of C⇓Γ may be expressed as a composite of a sequence of cuts
(in the sense of Definition 8.3.2) of the same length as Φ, followed by displays corresponding to ∆. This
is unique up to unique isomorphism of the intermediate objects.
PROOF: The pullback on the left gives β = γ;[^(∆)] with γ;[^(Φ)] = \id_∆:
We decompose γ into a sequence of cuts by induction on (the length of) Φ, the base case [ ] being
trivial. The second diagram is the induction step, adding one display Y→ Φ, where δ = γ;[^(y)] is
already in normal form. The extra cut α is found as shown, and γ = α;δ. Finally, as in Lemma 8.2.10,
the number of displays involved is fixed by the source and target of the given map, and the intermediate
objects are determined (as pullbacks) up to isomorphism. []
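In the most elementary semantic setting, sets and functions, the single-display case of this factorisation is the familiar graph factorisation β = ⟨id, β⟩;π: a section of the pulled-back display (the cut) followed by a projection (the display). A minimal sketch; all sets and names below are illustrative, not from the text:

```python
# Set-level sketch of the cut-then-display factorisation (Lemma 8.3.10)
# for a single display: any map beta factors as a section of a pullback
# (a "cut") followed by a projection (a "display").

Delta = [0, 1, 2]          # illustrative interpretation of the context Delta
Phi_fibre = ['a', 'b']     # illustrative fibre of the display over Gamma

def beta(d):               # an arbitrary morphism Delta -> Phi
    return 'a' if d % 2 == 0 else 'b'

# The cut: the section gamma = <id, beta> into the pullback.
def gamma(d):
    return (d, beta(d))

# The display: projection discarding the Delta-component.
def display(pair):
    return pair[1]

# beta = gamma ; display, and gamma followed by first projection is id.
assert all(display(gamma(d)) == beta(d) for d in Delta)
assert all(gamma(d)[0] == d for d in Delta)
```

The uniqueness clause of the lemma corresponds to the fact that the pairing ⟨id, β⟩ is the only section whose composite with the projection is β.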
PROPOSITION 8.3.11 C↓ Γ is given by a sketch with laws analogous to those called (P), (T), ([^(S)]),
(\check S) and (W) in Remark 8.2.7.
PROOF: These laws are needed to take a composite which is the wrong way round and rearrange it into
normal form. []
We have presented the syntactic and semantic categories by sketches of the same form. Now we shall
turn this into a categorical equivalence.
8.4 Interpretation
Now we shall interpret the language in a category with displays, and show that any such category arises
up to equivalence as Cn×_L for some generalised algebraic theory L, as we did in Sections 4.6 and 7.6.
Derivation histories in normal form Before giving the interpretation itself we must clarify the well
founded structure over which it is defined, because Example 8.2.1 shows that this is delicate. We must
take (equivalence classes of) histories, rather than the strings of operation-symbols and variables, as the
terms etc in \Clone^x_L.
REMARK 8.4.1 The rules of Section 8.1 may be reorganised according to the use they make of the object-
language (Definition 8.1.10). Let u:Γ→ ∆ be any morphism (substitution), [Γ,Ψ] an extended context
and x a variable which is not in Γ, ∆ or Ψ.
(a)
Each type-symbol (∆\vdash X type) ∈ L provides the following features:
❍ the type-expression Γ\vdash u*X, whose arguments are given by the components of u,
In the presentation of the syntax these were listed under different headings (8.1.6, 8.2.2, 8.2.5,
8.2.8 and 8.1.3(b) respectively), but they are obtainable from one another immediately, ie by the
use of a single rule of derivation. Types are equal ( 8.1.7) iff they result from equal substitutions
into the same type-symbol; similarly with context equality and coercion (8.2.3). omitted diagram
environment These are all aspects of the same thing, for which it is convenient to take the display
[^(x)]:[∆,x:X]→ ∆ as the primitive form. The other, substituted, forms are obtained from the
display by pullback.
(b)
The type of an operation-symbol may in general be a substitution instance \nearrow *X of a type-
symbol, given by the previous part. Each operation-symbol (∆\vdash r:\nearrow *X) ∈ L provides
(Definitions 8.1.3(a) and 8.2.6). We shall use the section [x: = r] of \nearrow *[^(x)] as the
primitive form of the operation- symbol. omitted diagram environment
(c)
Each law (∆\vdash f*r = g*s:h*X) ∈ L, where h = f;\nearrow = g;q, relates two terms of the same
type (Definition 8.1.4(a)); omitted diagram environment it extends to Γ,Ψ\vdash (u;f)*r = (u;g )
*s:(u;h)*X (congruence, 8.1.4(c)), but there is insufficient space to show this.
(d)
The (reflexive, symmetric and) transitive laws of term equality are also needed (8.1.4(b)), as are
the substitution laws (8.1.4(c) and 8.2.7). All of them feature in these diagrams, apart from (R),
which disappears along with the variables in the interpretation anyway.
(e)
Type and context equality and coercion are derivable.
omitted prooftree
environment
where J is a type-symbol, operation-symbol or law, cf the generalised cut rule (Definition 8.2.13). (If we
laid out the derivation as a tree in the way suggested by this rule there would be a great deal of
redundancy; the box style is more natural.)
Similarly, for a map u ≡ [[(y)\vec]: = [(a)\vec]] in the form of Remark 8.2.12(a) we need
omitted prooftree
environment
where u′ ≡ [y_1 := \arga_1,…,y_{n-1} := \arga_{n-1}]: Γ→ ∆′ ≡ [y_1:\typeY_1,…,y_{n-1}:\typeY_{n-1}], \arga_n ≡ f*r and
\typeY_n ≡ g*Y. We have to prove commutativity of the square, ie that the maps are composable and their
composites are equal, in order to justify the typing. This uses the laws (which must be part of the
derivation tree) and the normal form theorem (Lemma 8.2.10): the derivations are of raw terms, but the
justification of such formations may rely on certain laws, the two sides of which themselves each require
derivation sub-trees.
Comparing this with the recursive paradigm (Definition 2.5.1 and Section 6.2), the sub-expressions to
be considered in practice are the types of the context Γ and the terms used in the substitution u:Γ→ ∆.
The latter are the arguments of the type- and operation-symbols as presented informally in Section 8.1.
The well-foundedness condition mentioned in Definition 8.1.10 is needed for the existence of
derivations.
PROOF: We have shown throughout Remark 8.4.1 how the rules (for the generalised algebraic fragment)
set out informally in Section 8.1 fit into the normal form. Weakening and cut (Definitions 8.2.5ff)
commute with the last rule in the normal form, increasing the height of the derivation history by at most
that of the substituted term, so histories are strongly normalising. []
The induction in this and later results is over the derivation tree, not just on the number of variables in Γ,
since the target ∆ of u may be a longer context than Γ. This allows for the possibility that nested
operation-symbols may have greater arity than the outermost one. The base case is the empty
context [ ].
In Section 7.4 we found the free model for equational algebraic theories with many but simple types as
the quotient of the associated absolutely free model by the congruence generated by the laws. Because of
the effect the laws have on types, this is not possible in the dependent case. We have to generate the well
formed instances of equality as syntactic entities in themselves, along with the types, terms, contexts and
substitutions, allowing coercions arising from such equalities. At the end these will form a congruence,
and the terms may finally be treated as equivalence classes. The derivation histories form a recursive
cover (Definition 6.2.2), but there is no canonical choice of history for any given term or type.
Interpretation To present the data for a model of a simply typed algebraic theory in Definition 4.6.2,
we needed only each object \typeA_X itself which was to be the denotation of the sort X, together with
some products of these objects to serve as the sources of the denotations of the operation-symbols. The
dependent type situation is much more complicated. Now, for the sort ∆\vdash X, the object \typeA_X is
the source of some semantic display map whose target is the interpretation [[∆]]. We must begin the
proof of the following theorem, ie the construction of [[-]]:Cn×_L → S by structural recursion, before
we can state it.
THEOREM 8.4.4 Let L be a generalised algebraic theory and let C be a category with a display
structure D. Then the interpretations of L correspond bijectively to functors Cn×_L → C↓ 1→ C which
preserve displays and the P and S pullbacks. The choice of pullbacks of displays defines the
interpretation up to equality. Homomorphisms correspond to natural transformations as in Example 4.8.2
(e).
PROOF: Using structural recursion over the derivations in Remark 8.4.2 the interpretation given in
Remark 4.6.5 extends to dependent types:
(a)
The base case is the empty context [ ], which is interpreted as the terminal object 1 of C.
(b)
The (displays corresponding to) type-symbols ∆\vdash X type are given.
(c)
Substitution along u:Γ→ ∆ and [^(Ψ)] :[Γ,Ψ]→ Γ uses the chosen P and S pullbacks of display
maps against arbitrary maps in C; cut and weakening are sound.
(d)
The (sections of displays corresponding to) operations are given.
(e)
The laws of L are given. Any provable equality has a derivation tree, by induction over which the
terms have equal interpretations.
(f)
The laws T, P, S, W and R (the Extended Substitution Lemma) hold by Proposition 8.3.11. []
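For the simple (non-dependent) fragment, clause (a) and the recursion over contexts can be sketched very concretely: interpret the empty context as a one-point set (the terminal object of Set) and context extension as cartesian product. The sort names and their denotations below are hypothetical:

```python
from itertools import product

# Structural recursion over contexts in the simple (non-dependent)
# case: [] is interpreted as the terminal object 1 = {()} and
# extension by a type as a product.  Illustrative denotations:
sorts = {'X': [0, 1], 'Y': ['a', 'b', 'c']}

def interpret(context):
    """context: list of sort names, e.g. ['X', 'Y']."""
    if not context:                  # base case: [] |-> terminal object
        return [()]
    rest, last = context[:-1], context[-1]
    # extension: [Gamma, x:A] |-> [[Gamma]] x [[A]]
    return [t + (v,) for t, v in product(interpret(rest), sorts[last])]

assert interpret([]) == [()]
assert len(interpret(['X', 'Y'])) == 6   # |X| * |Y|
```

In the dependent case the product is replaced by the chosen pullback of the semantic display against the interpretation of the preceding context, which is what clauses (b)-(d) supply.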
COROLLARY 8.4.5 To interpret a particular context, type, term or morphism only finitely many choices
of pullback are used. Hence such fragmentary interpretations exist even when no global choice of
pullbacks is provided, and are unique up to unique isomorphism ( cf the proof of Theorem 7.6.9(b)).
Also, when the interpretation of a type in a context has been chosen, the meaning of its terms is fixed up
to equality. []
Canonical language Conversely, we shall show that every category with a rooted class of displays is
equivalent to the category of contexts and substitutions for some generalised algebraic theory. This may
be read off from the sketch in Proposition 8.3.11, but we need encoding operations (Section 7.6) to say
that the P and S squares are pullbacks. As in Sections 4.2, 7.6 and 8.3, we shall temporarily use Greek
letters for maps of C, the semantic category. German letters still denote syntactic morphisms. We shall
not make a notational distinction between the objects of C↓ 1 and the type-symbols and contexts which
they name; type expressions are flagged by the presence of a substitution action.
DEFINITION 8.4.6 Let C be a small category with a class of displays D. The canonical language L(C,D)
is the following theory:
(a)
Essentially, each display map X→ ∆ gives a type-symbol which we shall call just X (though this
is an abuse of notation). Each sequence of n+1 displays names a context-symbol
\qq Φ ≡ [x_1:\typeX_1, x_2:\typeX_2, …, x_n:\typeX_n, x_{n+1}:\typeX_{n+1}],
in which all of the types are type-symbols with arguments exactly the preceding variables. The variable
names are chosen arbitrarily.
Remark 8.4.1(a) put no restriction on the defining context ∆ of a type-symbol: its types could be any
substitution instances. Here we only introduce type-symbols in contexts which themselves consist of
type- symbols alone. Since no substitution operations are used, no pullbacks are needed to construct the
interpretation of these contexts and type-symbols in C; in fact we recover [[\qq Φ]] = Φ.
(b)
Each section α:∆→ X of a display X→ ∆ names an operation-symbol ∆ \vdash \qq α : X.
3. Why, with only O and H as types, is the theory of categories with pullbacks not a generalised
algebraic theory as defined in Section 8.1? Describe such a theory using an additional type of
commutative squares or triangles. This is not a stratified theory - do you think that any theory of
pullbacks could be?
4. Formulate the notions of plain, distributive, positive, division and power allegory in [FS90] as
stratified algebraic theories. Why is the theory of tabular allegories not stratified?
5. Formulate the generalised algebraic theory of generalised algebraic theories ( cf Remark 8.4.1).
Extend this to the theory of a theory together with a model.
6. Show that coercion (mentioned in Definition 8.2.3), weakening ( 8.2.5) and cut (8.2.6) are
derived rules of the calculus of Section 8.1, in the external sense that if there is a valid deduction
of the premise then there is also one of the conclusion. Explain what one must check in order to
show that extensions of the calculus retain these properties, and verify them for the sum and
product rules.
7. Deduce the starred laws in Remark 8.2.7 from their unstarred forms using Lemma 8.2.15. [Hint:
writing h:Ξ→ ∆ for the unstarred commutative square, and f,g:[Ξ,h*Ψ]\rightrightarrows [∆,Ψ]
for the two sides of the starred one, show that f = \id_{h*Ψ};[^(h)]_Ψ = g.]
8. Let e = [x: = a] and m = [^(y)] where Γ\vdash a:X and ∆\vdash y:Y. Show that the diagonal fill-in
for the orthogonality property (Definition 5.7.1) exists. [Hint: use Lemmas 5.7.10 and 8.2.10. ]
omitted diagram environment Show that if e⊥[^(y)], ie the fill-in is unique, for all displays [^(y)]
y: = a] and g = id and show that both [y: = x] and [y: = a] are fill-ins.]
9. Show that every map f:[Γ;Φ]→ [Γ, ∆] such that f;[^(∆)] = [^(Φ)] ( ie the triangle in
Definition 8.2.9 commutes in the whole category) can be expressed as a composite of displays
and cuts which each leave Γ untouched.
10. Show that (∆,J)→ ([x:X,∆], J) defines a functor [^(x)]!:Cn×_L ↓ [Γ,x:X]→ Cn×_L ↓ Γ with
[^(x)]! \dashv [^(x)]*.
11. Show that pullback along any map u:Γ→ ∆ extends to a functor u*:C↓ ∆ → C↓ Γ.
12. Prove the normal form theorem for (semantic) relative slice categories (Proposition 8.3.11).
13. Let D ⊂ C be a class of display maps. Let \typeX(-):ℑ→ D be a diagram where ℑ is a finite
oriented graph which, qua unoriented graph, is simply connected. Show that this diagram has a
limit in C and that the limiting cone consists of D-maps. Assuming that D is a subcategory closed
under cofiltered limits, extend the result to arbitrary simply connected diagrams.
14. Draw the diagram which introduces the variable qua term of an arbitrary instance of a type.
15. Explain how the definition of a generalised algebraic theory reduces to Definition 4.6.2ff in
simple type theory. Similarly compare the notions of interpretation and homomorphism,
reconciling the presentation of laws as commutative polygons and as pullbacks. How does the
canonical language reduce to Section 7.6 in simple type theory?
16. Show that, once the interpretation of the display map [^(x)] is fixed, changing those of the
auxiliary objects by isomorphisms does not affect the interpretation of a.
Chapter 9
The Quantifiers
Gentzen liberated the quantifiers from their old metaphysical interpretation as infinitary truth-functions
and expressed them as elementary sequent rules. In Chapter I we showed that his natural deduction
really does agree with the mathematical vernacular. Dag Prawitz, Nicolaas de Bruijn and Per Martin-Löf
developed calculi which their followers have used to formalise many mathematical arguments, and even
for programming. These help us to understand very clearly the extent to which equality, the quantifiers,
powersets and the other components of logic are actually employed.
We have claimed that universal properties in category theory deserve to be called ``foundational'' like
the logical connectives. In a tradition which began with the axiomatisation of Abelian categories,
Chapter V showed that sum types and relational algebra also have an intimate connection with the
idioms of mathematics. Employing a technology from algebraic geometry, Bill Lawvere saw that the
rules for the quantifiers say that they too are characterised by universal properties. By similar methods,
Jean Bénabou reduced the infinitary limits and colimits in category theory to elementary form. Robert
Seely concluded this phase of development by extending the logic of Remark 5.2.8 to locally cartesian
closed categories.
The introduction by Jean-Yves Girard and Thierry Coquand of calculi that have quantification over all
types and even universal types forced a rethink of Lawvere's ideas. These calculi do not have direct set-
theoretic interpretations, so many sorts of domain, sometimes themselves categories, were investigated
in the search for their semantics. As a result, we now have very efficient ways of calculating quantifiers
in semantic categories, which we shall illustrate for Sp (such a quantifier was found in 1965). Together
with the previous one, this chapter makes the link between categories and syntax precise, to the point
that we cease to distinguish between them notationally.
The study of these powerful calculi has given us ways to tackle other ``big'' questions in logic. In
category theory, what does it mean to treat categories such as Set and functors between them as single
entities? Are the axiom of replacement and large cardinals in set theory of relevance to mathematical
constructions? How can we understand Gödel's incompleteness theorem on the one hand, and the proofs
of consistency afforded by the gluing construction on the other?
Although this book has kept its promise to remain within or equivalent to Zermelo-Fraenkel set theory,
our study of the roles of propositions and types in the quantifiers enables us to change the rules of
engagement between them. Semantically, we may look for pullback-stable classes of maps in topology
and elsewhere which obey the rules for the quantifiers, and then start to use the notation of logic to
describe geometric and other constructions. Syntactically, the blanket assumption of such a clumsy
framework as ZF for mathematical reasoning can be fine-tuned to capture exactly the arguments which
we want to express, and thereby apply them in novel situations. It seems to me that the quantifiers and
equalities on which Cantor's diagonalisation argument and the Burali-Forti paradox rely are stronger
than are justified by logical intuition. We now have both the syntax and the semantics to weaken them.
Such a revolution in mathematical presentation may be another century away: excluded middle and
Choice were already on the agenda in 1908, but the consensus of that debate has yet to swing around to
the position on which such developments depend. On the other hand, the tide of technology will drive
mathematicians into publishing their arguments in computer-encoded form. Theorems which provide
nirvana will lose out to those that are programs and do calculations for their users on demand. A new
philosophical and semantic basis is needed to save mathematics from being reduced yet again to
programming.
Throughout the book we have stressed the formal analogy between types and propositions. The only
distinction between them in Chapter VIII, which set up the algebraic formalism for dependent types and
its relationship with category theory, was the rather superficial one that proofs of propositions are not
distinguished, so their displays are mono.
REMARK 9.1.1 The theme of this chapter is not the extensional difference between all maps and monos,
but the separate roles which sets and propositions play in the quantifiers, comprehension and powerset.
(a)
Elements (terms) of the same set (type) are distinguished, but proofs are anonymous. (In
Section 2.4 we mentioned the possibility that, by adding proof-terms, one might extract programs
from proofs.)
(b)
Predicates φ[x] depend on set-variables but not vice versa , so we may rearrange any context to
put the set-variables first and the propositions dependent on them afterwards. This is an important
technical simplification, which we shall exploit in Section 9.2.
(c)
Some forms of quantification, Σx:X.Y[x], ∃x:X.φ[x], Πx:X.Y[x] and ∀x:X.φ[x], are allowed
(Sections 9.3 and 9.4), but not others such as ∃y:φ.X[y], because of (b), and we may extract a
witness x:X from a term of type Σx:X.Y[x] but, by (a), not from the provability of ∃x:X.φ[x].
(d)
We may form the comprehension {x:X|φ[x]} of a predicate on a set, giving a (sub)set, and
(e)
there is a set Ω whose elements name propositions (Section 9.5).
(f)
There may even be a type of all types - if for ``type'' we allow a domain with fixed points instead
of a set with equality (Section 9.6) - but there is no proposition of propositions or proposition of
sets.
We say that sets and propositions are types of two kinds (some authors say sort or order). These
differences and the generality of which they are examples are sometimes presented in an extremely
abstract way. Henk Barendregt uses *, [] and like symbols for kinds in describing his pure type systems
[Bar92]. We end up proposing the use of such a formalism for semantic reasons, but we will keep the
familiar terminology of the predicate calculus for the sake of motivation.
CONVENTION 9.1.2 However, we employ the words ``proposition'' and ``type'' as variables. In each
section, these kinds are only required to obey the conditions of that particular section. They may in fact
stand for the same kind - φ may be a type like X - but we use this predicate convention to show where
they are potentially different. What we say in propositional notation applies mutatis mutandis to types -
in fact simply by transliteration of X for φ etc .
Like characters in a Greek tragedy, prop, type, * and [] act out dramas about the essential interactions of
things, in which their own identity is not relevant. In particular, the principle (a) of proof-anonymity is
only used at one point in this chapter, namely in the discussion of the η-rule for the powerset (Remark
9.5.6). Assuming proof-anonymity would in fact make the second half of Section 9.4 pointless.
Classes of display maps Chapter VIII set up the interpretation of types in context ( Γ\vdash X type) as
pullback-stable ``display maps'' X→ Γ. The partition of the class of all types into various kinds is
handled by equipping the category with two or more classes of displays.
NOTATION 9.1.3 We write ([^(x)]:[Γ,x:X]→ Γ) ∈ D and ([Γ,φ]\hookrightarrow Γ) ∈ M. Despite the
hook, we do not require the maps in M to be monos, because this would mean that proofs were
anonymous, which, as we have said, we shall usually not assume. There are, for example, realisability
models of the predicate calculus in which proofs are distinguished and displays for propositions are not
monos.
Note that nothing that we did in Chapter VIII mixes up the two classes, since there we considered
displays one at a time. This unary closure condition gave rise to the []-modality ``stably'' in Example
8.3.6(k). It is the union of the kinds to which Theorem 8.4.10 refers.
The same semantic display map may belong to several classes, and be used as the interpretation of types
of different kinds. In particular, we shall set out the theory of the existential quantifier and the type of
propositions in detail, and deduce that of the dependent sum and type of types from it by substituting M
= D and prop = type. Proof-anonymity is itself the parameter which distinguishes ∃ from Σ as we
understand them semantically, so of course we are careful not to assume it.
Fibrations We shall continue to work in the category of contexts and substitutions developed in the
previous chapter, but now these contexts consist of types and propositions.
REMARK 9.1.4 Do not confuse [x:X,y:φ[x]] with {x:X|φ[x]} , which is the subset formed by
comprehension. Compare the ``virtual objects'' consisting of program-variables and midconditions
introduced in Remark 5.3.3: semantically, a context is not just a set but a subset of a particular
``ambient'' set. The semantics of the category of contexts is not Set but the comma category M↓ Set
(Definition 7.3.8).
Until Section 9.5, where we consider comprehension as a type-forming operation, this means that the
essential logical information is in a sense duplicated in the category. Exercise 9.9 explains how the
results in Section 9.3 about factorisation systems in M↓ S relate to S itself.
REMARK 9.1.5 Cn×_type, which consists of set-only contexts, is called the base category. For each set (or
each context Γ consisting only of sets), there is a subcategory of this big category that consists of the
predicates φ[[(x)\vec]] over Γ; it is called the fibre P(Γ). For each function u:Γ→∆, there is a
substitution or inverse image functor u*:P(∆)→ P(Γ); the assignment u ↦ u* provides a functor
P(-):(Cn×_type)^op → Cat, which is known as an indexed category. The big category Cn×_{type|prop},
whose contexts consist of both types and propositions, collects all of these fibres together. There is a
proposition-erasing functor P:Cn×_{type|prop} → Cn×_type, called a fibration. Beware that P and P are
in different typefaces!
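The indexed-category half of this remark can be miniaturised in sets: each fibre is a powerset, u* is inverse image, and contravariance means that composite substitutions act in reverse order. A sketch with illustrative data, not taken from the text:

```python
# A tiny concrete indexed category P(-) : Set^op -> Cat (restricted to
# posets of subsets): each fibre is a powerset and u* is inverse image.

Gamma = [0, 1]
Delta = ['a', 'b', 'c']
Theta = ['T', 'F']

u = {0: 'a', 1: 'c'}                  # u : Gamma -> Delta
v = {'a': 'T', 'b': 'T', 'c': 'F'}    # v : Delta -> Theta

def inv(f, phi):                      # the substitution functor f*
    return {x for x in f if f[x] in phi}

phi = {'T'}                           # a predicate in the fibre P(Theta)

uv = {x: v[u[x]] for x in u}          # the composite u;v : Gamma -> Theta

# Contravariance: (u;v)* = u* after v* -- composites swap order.
assert inv(uv, phi) == inv(u, inv(v, phi))
```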
Bill Lawvere and Jean Bénabou used fibrations to study the quantifiers and infinitary limits and colimits.
We show in Section 9.2 that fibrations capture the independence of types from propositions (Remark
9.1.1(b)), but this is not relevant to the quantifiers (it is to comprehension).
The quantifier formation rules As for the predicate and λ-calculi (Sections 1.5 and 2.3), we present
the syntactic rules for the quantifiers in both the box and sequent styles.
REMARK 9.1.6 The quantifier formation rule binds a variable, so the box begins with a context-
formation (Definition 8.2.2).
which bind the variables x and y, so they are subject to α- equivalence, as are the type formation rules.
Corresponding to the formation rules, there are also type and term equality rules, such as
omitted prooftree
environment
There is no necessary connection between the use of the name x in the quantifier Qx.φ and in its terms:
they may be renamed separately. That we don't is one way in which we break Convention 1.1.8 about
reusing the name x. It is also broken by the way in which we express the rules for the quantifiers,
especially in the adjoint form; this is inevitable, as they are all about passing from a world with x to one
without, and back.
As in Section 1.6, types and terms may be imported into boxes from above, and exported if wrapped in
the quantifier or indirect operation.
Adjointness in foundations These are symbolic trivia: what we really want to know is how the
indirect, β- and η-rules correspond to the universal properties of ve and ev.
REMARK 9.1.7 Bill Lawvere finally brought symbolic logic into the heart of mathematics by recognising
the bijective correspondences
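The displayed correspondences themselves are omitted above; as a reconstruction in standard notation (writing x̂* for weakening along x:X), they state that the two quantifiers are the two adjoints of substitution:

```latex
% Lawvere's observation: \exists x \dashv \hat{x}^* \dashv \forall x,
% expressed as bijective correspondences of entailments.
\[
  \frac{\phi \;\vdash\; \hat{x}^{*}\psi \quad (\text{over } \Gamma,\,x{:}X)}
       {\exists x{:}X.\,\phi \;\vdash\; \psi \quad (\text{over } \Gamma)}
  \qquad\qquad
  \frac{\hat{x}^{*}\psi \;\vdash\; \phi \quad (\text{over } \Gamma,\,x{:}X)}
       {\psi \;\vdash\; \forall x{:}X.\,\phi \quad (\text{over } \Gamma)}
\]
```

Each double rule is read in both directions, naturally in ψ; this is the form in which the quantifiers will be treated diagrammatically below.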
REMARK 9.1.8 Since universal properties describe objects in category theory only up to isomorphism, we
shall understand notation such as ∃x.φ in the same way: it means any object ψ which satisfies the
introduction, elimination, β- and η-rules for an existential quantifier for the predicate φ[x] over the type
X. This need not be the string ``∃x.φ,'' just as X→ Y in Section 4.7 did not have to be literally an arrow.
In fact the symbolic rules also characterise ψ only up to isomorphism.
Following the preference stated in Chapter VII, we shall discuss the universal properties of the
quantifiers diagrammatically, instead of using Lawvere's adjunctions.
Substitution and the Beck-Chevalley condition We are used to substituting directly under the
quantifiers: Definition 1.1.10(d) said that u*∃x:X.φ is equal to ∃x:u*X.u*φ. We don't state such a rule
here because to assert equalities between types may conflict with their possibly different histories of
formation.
REMARK 9.1.9 Instead of asserting a substitution rule directly for the types, we give stronger indirect
rules for the terms. Then, as for all universal properties, we prove that u*Qx:X.φ ≡ Qx:u*X.u*φ, in a
unique way which commutes with the structure.
and we want the diagram on the right to commute, in the sense that a certain natural transformation
(provided by the universal properties of Qx_Γ and Qx_∆) is invertible. This equation between types is
known as the Beck-Chevalley condition, although it was Lawvere and Bénabou who identified it in
categorical logic, respectively attributing it to Jon Beck and Claude Chevalley because of analogous
conditions elsewhere in their work.
This property is often presented as an additional burden: something extra to be checked after the already
heavy labour of the construction of the concrete objects to be used in the interpretation. Of course, if you
choose to define the quantifiers categorically as the adjoints to substitution, then this condition does
need verification. But we give another characterisation of ∀ with the condition built in, so it can do the
work for us: we choose the values of quantifiers of the form ∀x:X.φ where X and φ are type- and
proposition- symbols, and then the substituted forms ∀x:u*X.\nearrow *φ are derived from the Beck-
Chevalley condition.
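For inverse images of finite sets the Beck-Chevalley condition for ∃ holds on the nose, and can be checked mechanically. A sketch; the sets, the map u and the predicate φ are illustrative choices, not from the text:

```python
from itertools import product

# Finite-set check that substitution commutes with the existential
# quantifier: u*(exists x.phi) = exists x.((u x id)*phi).

Gamma = [0, 1, 2]
Delta = ['p', 'q']
X     = ['x0', 'x1']

def u(g):                          # a substitution u : Gamma -> Delta
    return 'p' if g < 2 else 'q'

phi = {('p', 'x1'), ('q', 'x0')}   # a predicate on [Delta, x:X]

def exists_x(pred, base):          # quantify away the X-component
    return {b for b in base if any((b, x) in pred for x in X)}

# Left side: quantify over Delta, then substitute along u.
lhs = {g for g in Gamma if u(g) in exists_x(phi, Delta)}

# Right side: substitute along u x id first, then quantify over Gamma.
phi_u = {(g, x) for g, x in product(Gamma, X) if (u(g), x) in phi}
rhs = exists_x(phi_u, Gamma)

assert lhs == rhs                  # Beck-Chevalley, up to equality here
```

In general semantic categories the two sides agree only up to the canonical isomorphism discussed next.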
The recursive definition of interpretations Theorem 8.4.4 showed how to use the history of
formation of types and terms to interpret generalised algebraic theories in categories with display maps.
REMARK 9.1.10 The quantifiers and their terms contribute new cases to this structural recursion, just as
the simply typed λ-calculus in Remark 4.7.4 extended the interpretation of algebra in categories with
products. By the recursion hypothesis, we already have
REMARK 9.1.11 For the sake of making [[-]] into a bona fide functor, let us consider briefly how to
choose a particular object from the isomorphism class which the universal property provides. In
Zermelo type theory (Remark 2.2.4) this may be done by fiat for ∃, as comprehension names canonical
subsets for the image factorisation.
The formal rules as we give them say that the Beck-Chevalley condition only holds up to unique
isomorphism, even in the syntax. Without some trick, ∃x:X.φ must be regarded as a new proposition-
symbol, even when X and φ are type- expressions: f*∃x:X.φ is a substitution-instance of it, but ∃x:f*X.f*φ
is not.
We would like to find some way of prescribing interpretations to make the Beck-Chevalley condition
hold up to equality. Although we have introduced it semantically, this is really a symbolic problem: does
cut or substitution commute with the type formation rules? If it does then once again we need only
choose the results of quantification at type- symbols. Recall that Theorem 8.4.4 did this for unquantified
type-expressions: given a choice of semantic displays at type-symbols, it used pullbacks anchored there
to interpret type-expressions.
Since f*\propext (\nearrow *φ) = (f;\nearrow )*\propext φ, there is no difficulty with comprehension (or
the powerset). In the cases of ∃ and ∀, substitutions may be embedded in two places: for the range and
the body. Corollary 9.4.15 shows that
Separating propositions from types Instead of beginning, as is usual, with the formal definition of
fibrations, let us first consider the feature of the predicate calculus which lies behind it (Remark 9.1.1
(b)); Bart Jacobs called this a propositional situation [Jac90].
DEFINITION 9.2.1 Suppose that there are two classes of types in a generalised algebraic
theory L (Definition 8.1.10), the members of the second class being styled ``propositions.'' Then L is said
to admit a division of contexts if
(a)
the types do not depend on propositional variables,
omitted prooftree
environment
(b)
the type terms do not depend on propositional variables,
omitted prooftree
environment
(c)
and likewise the laws for type terms,
omitted prooftree
environment
omitted prooftree
environment
the type variables in a context may be listed first, with the propositions following. This is often indicated
by a vertical bar:
x:X,y:Y,z:Z,…|p:φ,q:ψ,r:θ,…\vdash ···
or, briefly, Γ|Φ\vdash J and [Γ|Φ].
We shall study the full subcategory Cn×_{type|prop} ⊂ Cn×_L of divided contexts. The conditions say that
this inclusion is an equivalence, so our results about divided contexts actually apply to the whole
category.
We shall write → for the display maps corresponding to the types and \hookrightarrow for the
propositions (Notation 9.1.3). According to the predicate convention, the latter are not necessarily
monos.
LEMMA 9.2.2 Every pair of displays [^(x)]_φ:ΓφX→ Γφ and [^(y)]:Γφ\hookrightarrow Γ is part
of a ( P-) pullback,
omitted diagram
environment
and every section of [^(x)]_φ is the (\check S) pullback of a unique section of [^(x)].
PROOF: ΓφX ≡ ΓX′φ by (a) and the R equation (Remark 8.2.8). The second diagram follows from (b) and
uniqueness from (c). []
Parts (b) and (c) of Definition 9.2.1 say that, given a morphism of divided contexts, we may erase all
propositional information: this is well defined as a functor, the fibration. So the tree structure in John
Cartmell's contextual categories (Remark 8.3.9) may be pruned to the types. We write → for the
fibration, as it ``displays'' propositions over types.
PROPOSITION 9.2.3 Let Cn×_type be the full subcategory consisting of those contexts which only involve
types, omitted diagram environment
where P forgets the propositional part of a divided context and T gives an empty propositional part to a
type-only context. Cn×_{type|prop} is called the total category and P the fibration.
For each object Γ of Cn×_type (known as the base category), the fibre is the relative slice Cn× ↓ Γ, whose
objects are contexts of the form [Γ|Φ] and whose morphisms act as the identity on Γ. TΓ ≡ [Γ|] is the
terminal object of the fibre over Γ.
(a)
Syntactic substitution defines a functor u*: Cn× ↓ Γ→ Cn× ↓ ∆ for u:∆→ Γ; moreover
(b)
Semantically, the analogous operation is pullback (inverse image), but this is only defined up to
unique isomorphism.
PROOF: Notice that P cannot be defined for types which depend on the omitted propositional variables. The other two parts of Definition 9.2.1 are needed to define P on morphisms involving type operation-symbols, such that the laws are respected. The substitution functors were defined in Section 8.2, and TΓ is terminal in the relative slice by Lemma 8.2.10. []
EXAMPLES 9.2.4
(a)
Sets and predicates. The fibre over Γ is the Lindenbaum algebra of predicates with free variables in Γ, ordered by provability; TΓ is the constantly true predicate. Semantically, the fibre is the powerset P([[Γ]]), consisting of subsets of [[Γ]]. The substitution functor u* is the inverse image u⁻¹; notice that it preserves ⇒, ∨ and ∧, and has adjoints on both sides (Remark 3.8.13(b)).
(b)
In declarative programs, the types are those of the run-time program-variables, but the propositions or midconditions only appear in the analysis, which may perhaps be carried out by a proof-checking compiler (Remark 5.3.3). The fibration P erases the midconditions (on objects), and the correctness proofs from programs (on morphisms). The substitution functors give the weakest precondition interpretation of Remark 4.3.5ff.
(c)
If the types of both kinds are independent of each other's terms then Cn×_{type|prop} = Cn×_type × Cn×_prop, the two projections being fibrations. The normal form theorem (Lemma 8.2.10) expresses …
(d)
Three or more kinds obeying the analogous exchange rules give rise to three-part contexts [Γ|Φ|Λ], and to a composition of fibrations.
(e)
Let L = (Σ, ▷) be a propositional Horn theory (Section 3.7). Suppose that … Then there is a division of the contexts in the classifying semilattice of L (Theorem 3.9.1).
(f)
The classifying category Cn×_type for the theory of rings has lists of polynomials as morphisms. Contexts for the theory L of rings-with-modules divide into two parts, of which the first refers purely to rings (Exercise 4.23). The dependency of modules on rings is not at the type level, but is due to the term (action) R×M→M.
(g)
The theory of categories (Example 8.1.12) admits a division into objects (O) and morphisms (H[x,y]), if the operation-symbols src and tgt are omitted from the theory (cf Example 8.2.1 and Exercise 8.1). Similarly the theory of 2-categories is fibred over the theory of categories, dividing natural transformations from functors.
(h)
Let L_type consist of the sorts and operation-symbols of an algebraic theory. Then the laws of the given theory, together with reflexivity, transitivity, symmetry and congruence with respect to each of the operation-symbols, may be formulated as L_prop (Remark 7.4.10).
(i)
More generally, let L be a conservative extension of a generalised algebraic theory L_type: the type-symbols of L_type are defined in type-only contexts, as are all operation-symbols and laws of L whose types are in L_type. The levels of a stratified algebraic theory (Definition 8.1.11) are …
(j)
Let C be a category with two classes D and M of displays (Definition 8.3.2) satisfying Lemma 9.2.2. Then there is a fibration C⇂_{D|M}1 → C⇂_D 1 of relative slices (Definition 8.3.8).
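Example (a) above can be checked concretely. The following is a minimal Lean sketch (the name reindex is ours, not the book's notation): substitution u* on predicates is precomposition, and it preserves the connectives definitionally, which is why the syntactic substitution functors act strictly.

```lean
-- Sketch of Example 9.2.4(a): the substitution functor u* on predicates
-- is precomposition with u; `reindex` is our (hypothetical) name for it.
variable {Γ Δ : Type}

def reindex (u : Γ → Δ) (φ : Δ → Prop) : Γ → Prop :=
  fun x => φ (u x)

-- u* preserves ∧ and ∨ on the nose: both sides are definitionally equal.
example (u : Γ → Δ) (φ ψ : Δ → Prop) :
    reindex u (fun x => φ x ∧ ψ x) = fun x => reindex u φ x ∧ reindex u ψ x :=
  rfl

example (u : Γ → Δ) (φ ψ : Δ → Prop) :
    reindex u (fun x => φ x ∨ ψ x) = fun x => reindex u φ x ∨ reindex u ψ x :=
  rfl
```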
We take one example separately from the rest, since for Jean Bénabou it was the paradigm.

EXAMPLE 9.2.5 An indexed family of sets is presented by the divided context

[ i:I | x:X[i] ].

The division is not intrinsic but imposed by us arbitrarily, between the indices and what they index. The fibration forgets everything but the indices, and the substitution functors perform ``re-indexing.''
Jean Bénabou explained how the dependent sum Σi:I.X[i] and product Πi:I.X[i], which we study in the
next two sections, give an elementary axiomatisation of coproducts and products of infinitary indexed
families of sets or other mathematical objects. We regard X[i] and Σi.X[i] as idiomatic notation for the
associated display map. The previous chapter developed this notation and its formal interpretation.
In practice, more than one such suffix (i:I, j:J, ...) is often needed, so it is better to write [Γ | x:X] than to collect I, J, ... into a single type I×J or Σi:I.J[i] (category theory exists to eliminate suffixes from mathematics). There may also be several I-indexed families (X_i), (Y_i), ..., from which we may for example need to select tuples, so these too we keep together in the many-type divided context [Γ|Φ].
I find this example very confusing as the main one used to demonstrate the theory of fibrations, for one thing because the category of sets is too special to illustrate many of the difficulties, such as the Beck-Chevalley condition and quantification over equality (Remark 8.3.5). But, in terms of our definition, lemma and proposition above, how can the principle of independence possibly apply, when the types on both sides of the division are of the same kind (sets)? Making such a division cannot be expressing any semantic fact about the theory of sets: it is a convention:
In the process of inventing a formal language with divided contexts that uses displays of sets on both
sides, we ``taint'' the displays by their use in the two roles: we take the same class twice, or one class and
a subclass M of it, and mark the copies as ``types'' and ``propositions.'' Then the formal language ( ie the
arguments we permit ourselves to write in it), is artificially restricted according to the rules in
Definition 9.2.1. The axiom of comprehension forgets this restriction (Section 9.5).
Fibrations The ``substitution'' functors have a universal property, which (like adjunctions) we may capture without making choices of them.

DEFINITION 9.2.6 Let P:C → S be a functor.
(a)
For any object Γ ∈ obS, a map g:Φ→Ψ with PΦ = PΨ = Γ and Pg = id_Γ in C is called vertical. The subcategory P⁻¹(Γ) is known as the fibre over Γ: its objects are those Φ with PΦ = Γ and its maps are the vertical ones. A morphism that is merely made invertible by P (not necessarily an identity) is called pseudo-vertical.
(b)
A morphism Ψ→Θ in C is said to be horizontal, prone or cartesian if it has the universal property illustrated in the right-hand diagram below, in which v = P(Ψ→Θ) : ∆ = PΨ → Ξ = PΘ: given any map Φ→Θ whose image factors through v, in the sense of forming a commutative triangle in S as illustrated, there is a unique fill-in Φ→Ψ such that the upper triangle commutes and whose image is the given map Γ→∆.
(c)
The functor P is a fibration if for every object Θ ∈ obC and map v:∆→PΘ in S there is a prone lifting of v at Θ.
(d)
Dually, s:Φ→Ψ is called op-horizontal, supine or cocartesian if it has the property shown on the left; P is an op-fibration if each base map u:Γ→∆ has a supine lifting at each object Φ over Γ. So P:C→S is an op-fibration iff P:C^op→S^op is a fibration.
(e)
A functor which is both a fibration and an op-fibration is known as a bifibration. This is the case
iff the substitution functors are adjunctible, the unit and co-unit being the comparisons between
prone and supine maps whose targets and sources agree, respectively.
(f)
A hyperdoctrine is a fibration P:C→S where S and the fibres 𝒫(Γ) for Γ ∈ obS are locally cartesian closed, and the substitution functors u*:𝒫(∆)→𝒫(Γ) for u:Γ→∆ in S have adjoints on both sides obeying the Beck-Chevalley conditions. (As we do not follow Lawvere's approach, we shall not use the word hyperdoctrine.)
By the same argument as for Theorem 4.5.6, prone liftings are unique up to unique isomorphism; indeed
pseudo-vertical and prone maps form a cartesian factorisation system (Definition 5.7.2 and
Exercise 9.5). The fibration is recovered from the fibres and substitution functors by the following
Grothendieck construction.
PROPOSITION 9.2.7 Let 𝒫(−):S^op → Cat be any functor, where we write u*:𝒫(∆)→𝒫(Γ) instead of 𝒫(u) for the action of maps in S. Then the following define a fibration P:C→S:
(a)
the objects of C are pairs, which we write as [Γ|Φ], consisting of Γ ∈ obS and Φ ∈ ob𝒫(Γ); and then P[Γ|Φ] = Γ;
(b)
the morphisms [Γ|Φ]→[∆|Ψ] are also pairs [u|f], where u:Γ→∆ and f:Φ→u*Ψ in 𝒫(Γ), and then P[u|f] = u;
(c)
vertical morphisms are those of the form [id|f], and the fibre over Γ is isomorphic to the category 𝒫(Γ);
(d)
the prone lifting of v:∆→Ξ at [Ξ|Θ] is v̌_Θ ≡ [v | id_{v*Θ}] : [∆|v*Θ] → [Ξ|Θ];
(e)
the composite [u|f];[v|g] is [u;v | f;u*g] (cf Definition 8.4.6(f)).
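For the simplest instance (sets and predicates, Example (a) below), the data of this proposition can be sketched in Lean; the names Obj, Hom, proj and comp are ours, and since each fibre is a preorder of predicates, a fibre map Φ → u*Ψ is just a pointwise implication.

```lean
-- Objects [Γ|Φ] and morphisms [u|f] of the Grothendieck construction,
-- specialised to predicates, so 𝒫(Γ) is the preorder of predicates on Γ.
structure Obj where
  carrier : Type            -- Γ, an object of the base
  pred    : carrier → Prop  -- Φ ∈ 𝒫(Γ)

structure Hom (A B : Obj) where
  u : A.carrier → B.carrier         -- the base map; P[u|f] = u
  f : ∀ x, A.pred x → B.pred (u x)  -- f : Φ → u*Ψ in the fibre over Γ

-- The fibration P forgets the predicate part.
def proj (A : Obj) : Type := A.carrier

-- Composition as in clause (e): [u|f];[v|g] = [u;v | f;u*g].
def comp {A B C : Obj} (h : Hom A B) (k : Hom B C) : Hom A C :=
  ⟨fun x => k.u (h.u x), fun x hx => k.f (h.u x) (h.f x hx)⟩
```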
EXAMPLES 9.2.8 As the data for Proposition 9.2.7 consist merely of an assignment of a category to each
object of the base category (in a functorial way), examples of fibrations are not difficult to find.
(a)
For sets and predicates, the total category is the comma category M↓ Set (Remark 9.1.4), of
which an object is a set with a subset.
(b)
For indexed families (Example 9.2.5), tgt:Set^→ → Set is the fibration. (Section 9.5 takes up these two examples.)
(c)
Any fibration is a replete functor (Definition 4.4.8(d)), so its strict pullback U*P against any
functor U:A→ S is equivalent to the pseudo-pullback (Definition 7.3.9). This is also a fibration,
whose fibres over A ∈ obA are isomorphic to those over UA ∈ obS. For example, π1 = U*tgt in
the gluing construction ( Proposition 7.7.1).
(d)
A presheaf S^op → Set is an indexed discrete category. It gives rise to a discrete fibration, characterised by the fact that all vertical maps are identities, eg Remark 7.7.8. Any indexed category S^op → Cat is a (pre)sheaf of categories, or an internal category in Shv(S).
(e)
In an indexed groupoid, all maps are prone, and all vertical maps are isomorphisms. The restriction of the fibration to each slice (sic, not fibre) C↓Θ → S↓PΘ is an equivalence of categories; any functor with the latter property is called an isotomy, and is weakly equivalent to a fibration with groupoid fibres.
(f)
A groupoid homomorphism is a fibration iff it is an op-fibration iff it is replete. The substitution
functors are equivalences.
(g)
Any continuous function f:X→Y between spaces induces a homomorphism of their fundamental groupoids π₁(f):π₁(X)→π₁(Y). This is a fibration iff the weak path lifting property holds: for every point x ∈ X and path q:I ≡ [0,1]→Y with f(x) = q(0), there is some path p:I→X with p(0) = x whose image f∘p is homotopic (relative to the endpoints) to q.
Models In line with the theme of the book, we have so far considered fibrations of classifying categories Cn×_L, but these underlie much more familiar fibrations of categories of models, for which there is a richer structure. The typical example is the theory of rings-with-modules, in which the fibre over each ring is its category of modules.
PROPOSITION 9.2.9 Let L_type ⊂ L be generalised algebraic theories, one a conservative extension of the other. Then there is a bifibration of their categories of models, which has adjoints on both sides.
(a)
In the final algebra for the larger theory (L) in the fibre over R, the interpretation of each
``proposition'' is a singleton.
(b)
For an algebra N over S, the restriction along R→ S has the same interpretation of ``types'' as R,
but the `` propositions'' and actions are obtained from those of S by substitution.
(c)
The initial algebra over R is that generated by R together with the ``propositional'' operation-symbols, modulo ``propositional'' laws,
(d)
and induced algebras are computed in a similar way. []
EXAMPLES 9.2.10
(a)
Let u:R→S be a homomorphism of commutative rings and M, N be modules for R and S respectively. Then

R×N →^{(u,id)} S×N → N

is the action of the restriction of N to R, and the left adjoint M ↦ S⊗_R M is known as induction (not in the logical sense).
(b)
Let Σ = Σ_type + Σ_prop be a divided Horn theory, M,N ⊂ Σ two closed subsets and u:R = M∩Σ_type ⊂ S = N∩Σ_type. Then the initial algebra in the fibre over R is the closure of R with respect to the larger theory; the final one is R+Σ_prop. The restriction of N is R+N∩Σ_prop, and the induction of …
For finitary theories, the restriction functors preserve filtered colimits. Restriction and induction maps
between algebraic lattices are called projections and embeddings respectively (Exercises 9.10ff). []
Coherence issues For any (``semantically given'') fibration, a choice of prone liftings v̌:Ψ→Θ for each v and Θ is called a cleavage, and extends to ``substitution'' functors v* between fibres. A split fibration is one for which these act functorially, as they do for syntactic categories. In general, the natural isomorphisms between (u;v)* and u*·v* in a cleavage must be specified, and obey a system of coherence laws.
These isomorphisms are defined by universal properties, so are uniquely determined by certain
equations. However, as Jean Bénabou rather forcefully pointed out [B \' en85], there is a casual tendency
just to call them ``canonical'' - as if this were an intrinsic property like continuity. The problem is that if
isomorphisms are not kept under strict control, they conspire to form non-trivial groups.
PROPOSITION 9.2.11 A group homomorphism P:C→S is a fibration iff it is surjective, and the fibre over the unique object of S is the kernel, K ◁ C.
EXAMPLES 9.2.12
(a)
Every product projection in Gp is a split extension, for example the Klein 4-group (Z/(2))² ↠ Z/(2), where Z/(2) is cyclic of order 2.
(b)
Symm₃ ↠ Z/(2) is a split extension with kernel Z/(3), but not a product, where Symm₃ is the non-Abelian group of order 6.
(c)
Z/(4) ↠ Z/(2) is not a split extension, since it cannot be described by conjugation, all three groups being Abelian.
(d)
The squaring map S¹ ↠ S¹ on the unit circle S¹ ⊂ C (Examples 2.4.8 and 6.6.7) is a non-split extension of topological groups.
(e)
There is a double cover SU₂ ↠ SO₃, where SO₃ is the group of rotations of the sphere S² = {(ix, y+iz) | x²+y²+z² = 1} ⊂ C² and SU₂ consists of the complex matrices (a b; c d) with ac* + bd* = 0 and |a|²+|b|² = |c|²+|d|² = 1, which acts by conjugation. This has a manifestation in quantum mechanics called the Pauli exclusion principle: electrons have a non-geometrical property called ``spin'' which changes sign if the particle is rotated through 360°, whereas it stays the same for photons.
REMARK 9.2.13 Let ū ∈ C be a cleavage, ie a choice of pre-images (prone liftings) for each u ∈ S. This is also known as a transversal for the normal subgroup K ◁ C. Then …
Example 9.2.12(d) is a topological group. In this case, the base S1 can be covered by open subsets (in the
sense of Section 3.9) for each of which the two square roots may be distinguished continuously. The
indexation is now over the frame Ω(S1) of open subsets of the circle, but it is also necessary to say how
these (and things defined over them) are pasted together with respect to the coverage. A fibration which
respects this pasting is called a stack, or champs in French. (The French for a field of numbers is corps,
but beware that Bénabou has used the word corpus in yet another sense!) For an account of the
Grothendieck school's approach to fibred categories, see [Gir71], in which Jean Giraud studies non-
Abelian cohomology of topological groups up to dimension 3. []
Stacks arise in type theory too, for example to say that S2 ≅ S↓ 2, respecting the coproduct 2 = 1+1, cf
Exercise 9.20 . They were also used by Hyland, Robinson and Rosolini [ HRR90] to state precisely the
completeness of the category of modest sets (Example 7.3.2(l)).
In type theory, Bénabou's criticism has been (misunderstood and) a source of confusion - for syntactically defined indexed structures, the substitutions really are functorial on the nose, and the group-theoretic issues do not arise. The more difficult technology of fibrations has been employed where the simpler indexed one would have been enough. Even for semantics, we saw in Sections 7.6 and 8.4 that there is an equivalent syntactic category: Exercise 9.17 provides a similar construction for fibred versus indexed categories. One should not try to choose products, substitution functors, etc, but instead work in the syntactically constructed category, taking the results back along the weak equivalence.
In the Lawvere presentation, the existential quantifier is the left adjoint of the substitution functor. So P: Cn×_{type|prop} → Cn×_type is a bifibration (Definition 9.2.6(e)), but the Beck-Chevalley condition, for invariance under substitution, must be stated separately (Exercise 9.19). The dependent sum arises in the same way from arbitrarily divided contexts (Example 9.2.5), the Beck-Chevalley condition now being automatic.
We shall use display maps instead of the fibred technology of the previous section, and derive diagrammatic universal properties directly from the syntactic introduction and elimination rules. The quantifiers are only naturally defined when the ``substitution'' is weakening, so ∃x ⊣ x̂*.
Dependent sums and composition We begin with the case where the range, body and result types are
of the same kind. This quantifier, the dependent sum Σ, exists when the (single) class of display maps is
closed under composition ( cf Definition 8.3.2).
REMARK 9.3.1 We defined a display map to be one which arises in the category of contexts from
forgetting a single variable, so if the class is to be closed under composition, every context [Γ,x:X,y:Y]
with two (extra) variables must be the same as some [Γ,z:Z] with just one.
In this case we have an isomorphism over Γ making Z the dependent sum Σx:X.Y[x]. As the notation
suggests, this situation reduces to a binary product in the case where Y[x] does not in fact depend on x:X
(indeed, ΓXY is then the ( P-)pullback of ΓX and ΓY, Remark 8.3.1).
Similarly, if an identity (or more generally an isomorphism) is a display then it must be possible to omit
the corresponding type, and conversely to introduce its value in a unique way. This type is therefore a
singleton.
Hence dependent sums are freely added to a type theory by closing its class of displays under
composition.
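In a dependent type theory this collapse can be seen directly. A minimal Lean sketch (toProd and ofProd are our names): when the body Y does not depend on x:X, the dependent sum is the binary product, witnessed by mutually inverse maps.

```lean
-- Remark 9.3.1: Σ x:X, Y with constant body Y is just X × Y.
def toProd {X Y : Type} : (Σ _ : X, Y) → X × Y
  | ⟨x, y⟩ => (x, y)

def ofProd {X Y : Type} : X × Y → Σ _ : X, Y
  | (x, y) => ⟨x, y⟩

-- The two maps are mutually inverse.
example {X Y : Type} : ∀ p : Σ _ : X, Y, ofProd (toProd p) = p :=
  fun ⟨_, _⟩ => rfl

example {X Y : Type} : ∀ q : X × Y, toProd (ofProd q) = q :=
  fun ⟨_, _⟩ => rfl
```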
Sum-introduction We cannot recover a:X and b:φ[a] from a proof of ∃x.φ[x], so the projections π_i are lost when we quantify one kind over another. But we still have the pairing map. This operation has not previously been given a name in type theory: we do so to stress that it need not be an isomorphism, and that it is to ∃ as evaluation is to ∀.
Γ, x:X, y:φ ⊢ ve(x,y) : ∃x:X.φ   (∃I)
This operation takes a witness a:X and its evidence b:φ[a], and yields a proof of ∃x.φ[x], so we shall call
it the verdict , ve(a,b). Spelling this out, the right rule in the sequent calculus is
[prooftree omitted]
The verdict operation is a map [Γ,x:X,y:φ] ↠ [Γ,z:∃x.φ]. We use the double arrowhead because, if the propositional displays ↪ are monos, then the verdict maps are stable regular epis, as we shall see.
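In Lean's Prop fragment the verdict and its eliminator look as follows (a sketch; ve is our name, following the text): the verdict is the anonymous pairing into ∃, and the let syntax is its eliminator.

```lean
-- The verdict ve(a,b): a witness and its evidence yield a proof of
-- ∃x.φ[x]. Proof irrelevance means a and b cannot be recovered from it.
def ve {X : Type} {φ : X → Prop} (a : X) (b : φ a) : ∃ x, φ x :=
  ⟨a, b⟩

-- "let (x,y) be c in p": the eliminator, with θ not mentioning the witness.
example {X : Type} {φ : X → Prop} {θ : Prop}
    (c : ∃ x, φ x) (p : ∀ x, φ x → θ) : θ :=
  c.elim p
```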
For the binary sum (Remark 2.3.10) we wrote ν_x(y) for ve(x,y). Note that x,y ∉ FV(∃x.φ); we ought perhaps to write ŷ*x̂*∃x′:X.φ[x := x′], but we shan't. Also, we never meet the quantifier ∃y:φ.X; this is because its verdict operation ve(y,x) would be a type-term depending on a proof y:φ, violating …

[prooftree omitted]
which reduces to (∃E) as in Definition 1.5.1 if we erase the proofs (terms of propositional type). ``Γ ⊢ θ prop'' is the more precise type-theoretic way of saying that θ is a well formed proposition with x ∉ FV(θ). With the variable z, this is actually the ``left rule'' in the sequent calculus, cf Remark 1.4.9: the elimination rule in natural deduction gives a value c to z. This term might involve parameters from another context, but we shall ignore these at first.
The let syntax, though ugly, is needed to match the argument z or c against the pattern ve(x,y), binding the variables x and y (cf ν_i in (+E), Remark 2.3.10), whereas (λx.p)a assigns its argument to just one variable:

[prooftree omitted]
On the face of it, we associate the value a to the variable x and b to y within the proof f. But the verdict need not be the pairing function, and many different pairs a,b may be associated with the same proof c = ve(a,b) of ∃x.φ[x]. So there is an equivalence relation amongst such pairs, and the let expression corresponds to the familiar idiom of ``choosing'' a member of the equivalence class, cf Remark 1.6.7 - ie to using an (∃E)-rule. This indeterminacy is why we say let instead of put.

[prooftree omitted]
The η-rule is

[prooftree omitted]

[prooftree omitted]
PROPOSITION 9.3.4 ∃ and ve satisfy the unsubstituted weak sum rules iff in the square (commutative since the rules happen in the context Γ), …

PROOF: The sum elimination rule (∃E) gives such a map; it is a term of type θ, so the lower triangle commutes. The upper one is the (∃β)-rule in the unsubstituted form (cf Example 7.2.7).
REMARK 9.3.5 Existential quantification is the left adjoint of weakening. Indeed [z := ve(x,y)] is a universal map (Definition 7.1.1) in the fibre category 𝒫([Γ,x:X]) from the object [Γ,x:X,y:φ], ie the proposition φ, to the functor x̂*:𝒫(Γ)→𝒫([Γ,x:X]). Then

η = [z := ve(x,y)] : [Γ, x:X, y:φ] → [Γ, x:X, z: x̂*∃x.φ]

and

ε = [m := let (x,y) be z in y] : [Γ, z: ∃x.x̂*θ] → [Γ, m:θ]

are respectively the unit and co-unit of the adjunction ∃x ⊣ x̂*. []
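Restricted to Lean's Prop fragment, the adjunction ∃x ⊣ x̂* is the familiar logical equivalence between eliminating an ∃ and proving under an extra hypothesis (a sketch, with θ independent of x):

```lean
-- Hom(∃x.φ, θ) ≅ Hom(φ, x̂*θ): maps out of the sum correspond to maps
-- out of the body in the extended context.
example {X : Type} (φ : X → Prop) (θ : Prop) :
    ((∃ x, φ x) → θ) ↔ (∀ x, φ x → θ) :=
  ⟨fun h x hx => h ⟨x, hx⟩, fun h c => c.elim h⟩
```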
Substitution and the Beck-Chevalley condition The adjunction is very neat, but it is not the whole
story, because we have overlooked substitution for the parameters which might occur in the proof c of
∃x.φ.
These parameters are provided by the substitution u:Γ→∆. We make the convention that the types X, φ[x] and ∃x.φ are defined in ∆ (cf type-symbols being defined over ∆ in the previous chapter), although for reasons of space we often omit these from the rules. Then

Γ ⊢ c : u*∃x.φ[x]

corresponds by Lemma 8.2.14 to a morphism

[u, z := c] : Γ → [∆, z:∃x.φ].

As Γ, u and c are arbitrary, this is a generalised element of [∆,z:∃x.φ].
[prooftree omitted]
which turns the diagram in Proposition 9.3.4 into

[diagram omitted]

The comparison map arises from the unsubstituted rules - the introduction rule (unit, ve) of ∃x_∆ ⊣ x̂_∆* and the elimination rule (co-unit, let) of ∃x_Γ ⊣ x̂_Γ*:

∃x_Γ·ū* →^η ∃x_Γ·ū*·x̂_∆*·∃x_∆ = ∃x_Γ·x̂_Γ*·u*·∃x_∆ →^ε u*·∃x_∆.
In the case u = id, this is the identity, by the triangular laws given in Definition 7.2.1. The fully
substituted form of (∃-E) provides the inverse of the comparison for any substitution u, as may be seen
by putting ∃x:u*X.u*φ for θ in the diagram. This is the Beck-Chevalley condition for sums, cf the
context diagrams in Remark 9.1.9.
THEOREM 9.3.8 The verdict obeys the weak sum rules iff ∃x ⊣ x̂* and the Beck-Chevalley condition holds. []
As a corollary, the Frobenius law holds:

∃x.(φ[x] ∧ ψ) ≡ (∃x.φ[x]) ∧ ψ,

because the two factorisations of [∆,X,φ∧ψ] ↪ [∆,X] → ∆ in the diagram overleaf must be isomorphic. []
Our Frobenius law follows as a corollary of Beck-Chevalley because, unlike Lawvere and Bénabou, we include substitution for propositional as well as type variables in the change of base u.
Putting φ = T, the co-unit of ∃x ⊣ x̂* is a cartesian transformation (cf Remark 6.3.4 and Proposition 7.7.1(e)).
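The Frobenius law itself is provable in Lean's Prop fragment; a sketch, with ψ not depending on the bound variable:

```lean
-- ∃x.(φ[x] ∧ ψ) ≡ (∃x.φ[x]) ∧ ψ
example {X : Type} (φ : X → Prop) (ψ : Prop) :
    (∃ x, φ x ∧ ψ) ↔ (∃ x, φ x) ∧ ψ :=
  ⟨fun ⟨x, hφ, hψ⟩ => ⟨⟨x, hφ⟩, hψ⟩,
   fun ⟨⟨x, hφ⟩, hψ⟩ => ⟨x, hφ, hψ⟩⟩
```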
An idiomatic form of this proof would use `` let(x,y)becin'' to open the box, after which we may say c =
ve(x,y). However, when the box is ultimately closed (at the end of the proof or before closing a
surrounding box), the term has to be exported in its let form.
Semantics and open maps The existential quantifier adds a case to the structural recursion in Remark 9.1.10. Given displays [[ŷ]] and [[x̂]], we have to find denotations for ve and ẑ making the square commute. For soundness of the let syntax, these must have the same universal property in the semantics as that described in Proposition 9.3.4 for the syntax. So the functor [[−]] preserves the factorisation system.
EXAMPLE 9.3.10 Let S be Sp or Loc and M the class of open inclusions; by the definition of continuity,
this is closed under pullback.
For any Γ ∈ obS, the relative slice 𝒫(Γ) ≡ S↓Γ is the frame of open subsets of the space Γ, and for any continuous map u:Γ→∆, the substitution functor u*:𝒫(∆)→𝒫(Γ) is the inverse image (frame homomorphism) of the same name. By Remark 9.3.5, if u (deserves to be called a display x̂ and) admits an existential quantifier ∃x, then this must be the left adjoint u_! ⊣ u*, with the Frobenius law. The class of open maps:
(a)
contains all isomorphisms,
(b)
is closed under composition,
(c)
is stable under pullback against arbitrary continuous maps,
(d)
satisfies the Beck-Chevalley condition for such pullbacks, and
(e)
obeys the cancellation law that, if e is a surjection ( ie e* is full) and e;u is open, then u is also
open.
Since the class E of open surjections also has these properties, and in particular the factorisation of open
maps into open surjections and open inclusions is stable under pullback, we have a sound interpretation
of ∃ ( geometric logic), where D provides display maps for the types. []
Strong sums So far we have only defined the adjunction ∃x ⊣ x̂* between the categories whose objects are a single type φ over [Γ,x:X] and θ over Γ. If the kind of propositions admits dependent ``sums'' over itself, ie the class M of displays is closed under composition, then this is enough to deal with contexts (lists of propositions) Φ and Θ (cf Exercises 5.47 and 9.27).

Otherwise we must use a more complicated version of (∃E) for the context Θ = [θ₁,θ₂], and in particular for the case where θ₁ = ∃x.φ:
[prooftree omitted]
The orthogonality condition for a factorisation system (Definition 5.7.1) provides the stronger rule, in which θ may depend on z:∃x.φ.

LEMMA 9.3.11 The verdict map

Γ, x:X, y:φ ⊢ ve(x,y) : ∃x:X.φ

is orthogonal to all propositional displays m̂:[Θ,m:θ] ↪ Θ iff it satisfies the strong sum elimination rule,

[prooftree omitted]
where θ also becomes θ[z := ve(x,y)] in the conclusion of (∃β) and θ[z := z′] in (∃η). Hence strong sums with range D and body M exist iff there is another class E (consisting of all maps which are isomorphic to verdicts) such that E ⊥ M and M;D ⊂ E;M, this factorisation (ŷ;x̂ = ve;ẑ) being stable under pullback.
PROOF: According to the original definition, we should show that ve is orthogonal to [Θ,θ] ↪ Θ with respect to any z:[Γ,∃x.φ] → Θ. However, Lemma 5.7.10 lets us consider ``epi'' and ``mono'' maps with the same target [Γ,z:∃x.φ], ignoring z. Recall that the proof of this lemma used pullback along z (cf Exercise 9.30). As for the weak sum in Proposition 9.3.4, the fill-in p is given by the elimination rule, the β-rule says that it makes the triangle commute, and it is unique by the η- and equality rules. Remark 9.3.7 has already shown that the pullback along u of a verdict map must be another such. []
REMARK 9.3.12 As a special case, a single kind admits strong sums over itself (M = D) iff it is closed under composition (Exercise 9.26). In this case the verdict maps are isomorphisms, so the Beck-Chevalley condition and stability under pullback are automatic. The formulation with ``product'' projections (Remark 9.3.1) relies on the kinds being the same, otherwise we would have witnesses for existential quantification.
REMARK 9.3.13 The strong sum generalises the test proposition θ to a context extension, but there seems
to be no type-theoretic notation for ∃x.Φ, where Φ is a list of propositions involving x. This is because
∃x.(φ[x] ∧ ψ[x]) ⊢ (∃y.φ[y]) ∧ (∃z.ψ[z])
is irreversible without ``co-operation'' between the witnesses of φ and ψ. By Lemma 5.7.6(e), the comparison map

[Γ, ∃x.(φ,ψ)] → [Γ, ∃x.φ] ↪ Γ

is in E^⊥, so maybe it should be regarded as a proposition (in M), but this is not clear. If we do adopt this view, then the factorisation problem reduces to two cases, of which the first is called the support of X:
Even then, type theory does not require all maps to be factorisable. In particular the pullback diagonal (contraction, variable qua term) …

Finally, notice that it is the factorisation e;m rather than the class E which is required to be stable under pullback, since the target of the substitution u:Γ→∆ is that of m, not e. (Before Definition 9.3.6 we did mention a morphism to [∆,∃x.φ], but its extra component c did not contribute to the pullback.)
Whereas the previous section was very much about the quantifier (albeit directly applicable to categories
in which the ``propositional'' displays need not be mono), this one quickly leaves the predicate calculus
behind, and has much more of the flavour of function-types. However, following Convention 9.1.2, we
retain the predicate notation, although its only purpose is to distinguish the positive role of the body φ[x]
from the negative range x:X of the quantifier (Remark 1.5.9). Far from assuming the display of φ to be
mono, in Lemma 9.4.10ff we consider the special case in which φ is a fixed object and the type display
is mono. You may prefer to start with Definition 9.4.8, skipping the type theory, as there are
applications in geometric topology, as well as to free algebraic theories.
∆, x:X, f:∀x:X.φ ⊢ ev(f,x) : φ,   (∀E)

the corresponding left rule in the sequent calculus being

[prooftree omitted]
The term-formation rule introducing the ∀-type is λ-abstraction:

[prooftree omitted]

combining the (∀I)- and (→I)-rules of Sections 1.5 and 2.3, together with the substitution u which we have discussed in detail for ∃. There is also an equality rule (λ=) saying that if p = q then λx.p = λx.q. The β-rule incorporates the substitution u into Definition 2.3.7,

[prooftree omitted]

and finally the η-rule is

[prooftree omitted]
We have already devoted Sections 2.3 and 4.7 to simple type theory, where the dependent product
reduces to the function-type X→ Y. The diagram in Definition 4.7.9 is obtained from that opposite by
replacing ΓΨ, φ and ∀x.φ by Γ, Y and F respectively, and otherwise deleting Γ, ∆ and u. In particular,
the ground context Γ was previously the terminal object and was not drawn. See also Example 7.2.7.
THEOREM 9.4.2 The octagon interprets the rules for the dependent product iff for every dotted map as the lower left oblique edge,

Γ, x:u*X, Ψ ⊢ p : u*φ,

there is a unique map for each of the upper diagonals

Γ, Ψ ⊢ f : u*∀x:X.φ,

making the diagram commute. We write f = λx.p and F = ∀x.φ.

PROOF: As before, (∀I) provides the map, (∀β) makes the diagram opposite commute and (λ=, ∀η) say that the fill-in is unique. The obtuse dotted triangles define bijections λx.p ↔ [u,f] and p ↔ [u,p] by Lemma 8.2.14, and the parallelograms (but not the kites) are pullbacks - there is no room for the right-angle symbol any more! []
The composite Ψ̂;u:[Γ,Ψ]→∆ above in fact corresponds to the u in the syntactic rules; Ψ has only been included to state the next result.
REMARK 9.4.3 Universal quantification is the right adjoint of weakening. With u = id, consider Ψ not as a context extension but as an object of the relative slice S↓Γ. Then there is a natural bijection

[prooftree omitted]

for which the unit and co-unit of x̂* ⊣ ∀x are

η = [f := λx.z] : [Γ, z:ψ] → [Γ, f: ∀x.x̂*ψ]

and

ε = [y := ev(f,x)] : [Γ, x:X, f: x̂*∀x.φ] → [Γ, x:X, y:φ],
cf Theorem 9.3.8 for the weak sum. Currying (Example 4.7.3(c)) and Lemma 9.4.13 essentially show
how to do this for a list Φ instead of a single proposition φ.
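As with the sum, the Prop fragment of Lean exhibits the adjunction x̂* ⊣ ∀x as a logical equivalence (a sketch, with ψ independent of x):

```lean
-- Hom(ψ, ∀x.φ) ≅ Hom(x̂*ψ, φ): maps into the product correspond to maps
-- into the body in the extended context.
example {X : Type} (ψ : Prop) (φ : X → Prop) :
    (ψ → ∀ x, φ x) ↔ (∀ x, ψ → φ x) :=
  ⟨fun h x hψ => h hψ x, fun h hψ x => h x hψ⟩
```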
Now consider the general substitution u:Γ→ ∆, which imports proofs into the box in the language of
Lemma 1.6.3ff. In that section we only saw the consequences of importation for ∃, not for ∀, because
the range of quantification (X) was a simple type.
The separate adjunctions x̂_Γ* ⊣ ∀x_Γ and x̂_∆* ⊣ ∀x_∆ give a comparison

u*∀x.φ → ∀x.ū*φ.

In categorical notation this is

∀x_Γ·ū* ←^ε ∀x_Γ·ū*·x̂_∆*·∀x_∆ = ∀x_Γ·x̂_Γ*·u*·∀x_∆ ←^η u*·∀x_∆,
(a)
the Beck-Chevalley condition has to be stated separately, and
(b)
the bijection p↔ f is quantified over a class of displays such as [^(Ψ)] .
In Section 9.3 the analogous object to Ψ, testing the adjunction, was called θ. There the choice between
types and propositions for θ (rather than for the other participating objects X, φ[x] and ∃x.φ) was the
crucial difference between the dependent sum and existential quantifier.
The next result, due to Thomas Streicher [Str91], shows that, for the product, these two complications
actually cancel each other out. Hence dependent products are absolute: they are defined independently
of the choice of the classes M and D of displays.
THEOREM 9.4.5 Let [^(x)]: ∆X → ∆, [^(y)]: ∆Xφ \hookrightarrow ∆X and [^(f)]: ∆F \rightarrowtail ∆ be displays, and ev:∆XF→ ∆Xφ. Then F and ev satisfy the rules for the dependent product ∀x:X.φ over ∆ iff there is
a natural bijection
omitted prooftree environment
for all u:Γ→ ∆ whatever. Notice too that we have reverted to the ordinary slice C↓ ∆, since Γ is not a
context extension of ∆. There is no further Beck-Chevalley condition.
Local cartesian closure Dependent products may also be described by restricting to slices or fibres, as
the octagonal diagram resides there. Together with Chapter V, this is the usual categorical treatment of
full first order type theory as it was done in the 1970s.
PROPOSITION 9.4.6 The following are equivalent for any category C which has a terminal object.
(a)
C has all dependent products;
(b)
every (ordinary) slice C↓ Γ is cartesian closed;
(c)
C has pullbacks and every pullback functor u* has a right adjoint.
In particular the Beck-Chevalley condition is automatic and all colimits are pullback-stable. Then C is
said to be locally cartesian closed. []
In a locally cartesian closed category all maps are treated as displays, so it has equality types and
dependent sums (Remarks 8.3.5 and 9.3.1). Taking the class of monos for the propositional displays,
Exercise 9.34 shows that this is closed under dependent products along all maps, so we may interpret
universal quantification. Existential quantification may be interpreted as in Section 5.8 on the additional
assumption that equivalence relations have quotients (which are automatically stable).
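Proposition 9.4.6(c) can be checked concretely in the category of finite sets, where the right adjoint to the pullback functor u* sends a display over Γ to its fibrewise sets of sections. The following sketch is illustrative only; the names u, a and Pi are ours, not the book's.

```python
# Hedged sketch: in FinSet every pullback functor u* has a right adjoint
# Pi_u (Proposition 9.4.6(c)).  Maps are represented as dicts.
from itertools import product

def fibre(f, codomain_elt):
    """Inverse image f^{-1}(d) of a map given as a dict."""
    return [x for x, y in f.items() if y == codomain_elt]

def Pi(u, a, Delta):
    """Right adjoint to pullback along u : Gamma -> Delta, applied to a
    display a : A -> Gamma.  The fibre of Pi_u(A) over d is the set of
    sections of A over u^{-1}(d): functions s with a(s(g)) = g."""
    result = {}
    for d in Delta:
        gs = fibre(u, d)                                # u^{-1}(d)
        candidates = product(*(fibre(a, g) for g in gs))
        result[d] = [dict(zip(gs, c)) for c in candidates]
    return result

# Gamma has two elements, both mapped to the single point of Delta;
# the display A -> Gamma has fibres of sizes 2 and 3.
u = {'g0': '*', 'g1': '*'}
a = {'a0': 'g0', 'a1': 'g0', 'b0': 'g1', 'b1': 'g1', 'b2': 'g1'}
sections = Pi(u, a, ['*'])
print(len(sections['*']))   # 2 * 3 = 6 sections
```

With ∆ a single point this is just the ordinary product of the fibres, as the count 2 × 3 = 6 confirms.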
However, the syntax may lack equality types, for the reasons discussed in Section 8.3. In semantic
categories, such as those consisting of domains or (locally compact) topological spaces, not all finite
limits need exist. Hence we need to consider products whose ranges belong to a restricted class, which
we have called type displays (→ ).
DEFINITION 9.4.7 A relatively cartesian closed category C has a single rooted class D of displays, for which all dependent products exist (with the range, body and result in D). Equivalently, every relative slice C↓ Γ is cartesian closed and pullback preserves exponentials, ie u*\ev_∆ has the universal property of \ev_Γ. Such categories were introduced by me, together with an essentially syntactic example based on retracts of a model of the untyped λ-calculus [Tay86a]. Dependent products of domains were constructed in my thesis [Tay86b], and this notion was studied further in [HP89].
Partial products Examples of a particular case of dependent product were identified in geometric
topology by Boris Pasynkov [Pas65], before Lawvere had studied the quantifiers categorically, and long
before the modern type-theoretic account of them had been formulated and applied in informatics.
Partial products are more complicated than ordinary function-spaces, but not much, so it is quite feasible to investigate them in semantic categories such as in topology.
Susan Niefield [Nie82] characterised those continuous functions which admit partial products for the Sierpiński space Φ = S, and hence for all Φ ∈ obSp. She also studied the categories of uniform spaces and affine varieties. Building on this, the relationship between partial products and a notion of exponentiability was formulated by Roy Dyckhoff and Walter Tholen [DT87], although none of these authors discussed the Beck-Chevalley condition. In fact we shall see that all dependent products (with this condition) can be derived from partial products.
DEFINITION 9.4.8 Let [^(x)]:∆X→ ∆ be a carrable map in a category C, ie one for which all pullbacks exist (Example 8.3.6(e)), so it is legitimate to call it a display. Also, let Φ be any object of C. Then the sub-diagram in bold below (where the parallelogram is a pullback) is called a partial product if it is the universal such figure, ie given u:Γ→∆ and p:[Γ,u*X]→ Φ there is a unique map [u,f]:Γ→ ∆Φ^X making the diagram commute.
Treating Φ as a constant type (defined in the empty context), and using the binary product (∆X)×Φ, shown dashed, this is another special case of the octagonal diagram in Theorem 9.4.2. Hence the partial product ∆Φ^X is the dependent product ∀x.[^(x)]*Φ in the context ∆.
When ∆ = 1, this is the ordinary function-type Φ^X (Definition 4.7.9).
The map ∆X→ ∆ is exponentiable if partial products ∆Φ^X exist for all Φ ∈ obC.
EXAMPLES 9.4.9
(a)
In Set, elements of ∆Φ^X, ie maps Γ = {∗} → ∆Φ^X, are pairs (u,p), where u ∈ [[∆]] and p:X[u]→ Φ, so
∆Φ^X = ∑_{u ∈ [[∆]]} Φ^{X[u]}, with display [^(p)] : ∆Φ^X → ∆.
(b)
The functor T that codes a free algebraic theory (Definition 6.1.1) is of the form ∆(-)^X, where X→ ∆ is κ→ Ω and X[r] = ar[r].
(c)
In the general dependent product in Set for displays
\coprod_{u ∈ [[∆]]} \coprod_{x ∈ X[u]} φ[u,x] --[^(y)]--> \coprod_{u ∈ [[∆]]} X[u] --[^(x)]--> ∆,
the elements of F[u] are the sections of \coprod_x φ[u,x] → X[u].
(d)
In Pos, the elements of ∆Φ^X are again pairs (u,p), where now p is monotone. As in Theorem 4.7.13, to determine the order on the function-space we consider f:Γ = {0 < 1}→ ∆Φ^X; then p_0(x) ≤ p_1(y). For this relation to be transitive on ∆Φ^X, it would suffice that
(e)
F. Conduché found the analogous characterisation of exponentiable functors in Cat, which is discussed in [Joh77, p. 57]. In particular, any fibration or op-fibration is exponentiable, but if ∆X→ ∆ belongs to one class then ∆Φ^X \rightarrowtail ∆ is in the other. Also, all replete functors between groupoids are exponentiable.
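Example 9.4.9(a) can be computed directly for finite sets: the partial product is the disjoint union, over the points of ∆, of the function-spaces Φ^{X[u]}. This is an illustrative sketch under our own naming, not part of the formal development.

```python
# Hedged sketch of Example 9.4.9(a): in Set the partial product is the
# disjoint union over u in [[Delta]] of the function-spaces Phi^{X[u]}.
from itertools import product

def partial_product(Delta, X, Phi):
    """Elements are pairs (u, p) with u in Delta and p : X[u] -> Phi,
    p represented as a dict."""
    elements = []
    for u in Delta:
        for values in product(Phi, repeat=len(X[u])):
            p = dict(zip(X[u], values))
            elements.append((u, p))
    return elements

Delta = ['a', 'b']
X = {'a': ['x0', 'x1'], 'b': []}       # the fibre over b is empty
Phi = [0, 1, 2]
pp = partial_product(Delta, X, Phi)
# |Phi^{X[a]}| = 3^2 = 9 and |Phi^{X[b]}| = 3^0 = 1, so 10 elements in all
print(len(pp))   # 10
```

Note that the empty fibre over b contributes exactly one element, the pair (b, {}), matching Φ^∅ = 1.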
We have stressed that \hookrightarrow for the body of the quantifier is merely a notational convention.
For many of the interesting examples of partial products, it is actually the type display ∆X→ ∆ that is
mono.
LEMMA 9.4.10 If ∆X→ ∆ is mono and the partial product ∆Φ^X exists, then ∆XΦ^X ≡ (∆X)×Φ, ie the dashed vertical morphism in the diagram opposite is invertible.
PROOF: Consider u:Γ = Γ′→ ∆X→ ∆, so ΓX = Γ′ is the inverse image, and let p = p′. Then [u,f]:Γ→ ∆Φ^X factors through ∆X by construction, so the pullback mediator is the required product mediator Γ→ ∆XΦ^X. []
EXAMPLES 9.4.11 In the following examples, S denotes the Sierpiński space whilst S^n ⊂ R^{n+1} is a circle or sphere.
(a)
Lifting is a partial product, since the test is a partial map Γ\rightharpoonup Φ; the lift Φ_⊥ classifies such partial maps.
(b)
See Exercise 3.71 for the same construction in Loc; Example 7.7.4, the Freyd cover, gives it for
toposes and geometric morphisms.
(c)
Since every open inclusion U\hookrightarrow Γ is the pullback of the open point of the Sierpiński space along the classifying map Γ→ S of U, and the universal property of the (topological) lift LiftΦ is stable by the Beck-Chevalley condition, U\hookrightarrow Γ is exponentiable in Sp.
(d)
Similarly for a closed subset A ⊂ Γ. By composition, any locally closed inclusion A∩U ⊂ Γ is
exponentiable in Sp. Niefield showed that this characterises exponentiability for subspace
inclusions.
(e)
The diagonal X\hookrightarrow X×X is (locally) closed iff X is (locally) Hausdorff, cf Remark 8.3.5. If Y is locally compact and X locally Hausdorff, then any map Y→ X is exponentiable.
(f)
Pasynkov's original example [Pas65, p. 181] applied (c) to the interior of the ball B^n ⊂ R^n. The sphere S^2 can be seen as the partial product of either the interval B^1 by the circle S^1, or the disc B^2 by two points S^0. Similarly for higher dimensions: omitted diagram environment
(g)
Using the universal property, a path [u,f]:Γ = [0,1]→ ∆Φ^X is determined by the path u in ∆, together with open segments p in Φ defined on the inverse image of the interior ∆X ⊂ ∆. This suggests a way of computing the fundamental groupoid π_1(∆Φ^X).
EXAMPLES 9.4.12 In practice, the result of a quantifier need not lie in the same class of maps as its (range or) body.
(a)
The function-space N^N exists in Sp (it is called Baire space), but it is not locally compact, so S^{N^N} does not exist.
(b)
Equality and the order relation on {⊥ < T} consist respectively of two and three points of the four-
point Boolean algebra. The order can also be thought of as implication, and is definable from
lifting. However, by Example 9.4.9(d), a full subposet ∆X→ ∆ is exponentiable iff it is convex,
which these examples are not.
(c)
In the analogous topological situation for the Sierpiński space S, these subspaces are not locally closed (Example 9.4.11(e)).
Facts such as these about a category could be presented by identifying certain triples of classes of display maps (→, \hookrightarrow, \rightarrowtail) such that we may form quantifiers whose range, body and result belong to the respective classes. In other words, we have a type theory with many kinds (K_1, K_2, K_3), and the usual introduction, elimination, β- and η-rules, but a restriction on the formation-rule for the dependent product:
omitted prooftree environment
Henk Barendregt [Bar92, § 5.4] has developed just such a formalism, but with the quite different
motivation of unifying various syntactic calculi, including Girard's System F (Definition 2.8.10) and
Thierry Coquand's Calculus of Constructions [CH88].
Partial products suffice Partial products are easier to calculate semantically than general dependent
products, but we shall now show that, together with pullbacks, they are enough. The trick, categorically,
is to consider naturality with respect to Φ.
The reason for writing capital Φ and \hookrightarrow with a double head above is that we shall later use Φ as the defining context of the proposition-symbol φ, a substitution-instance of which occurs as the body of some dependent product to be calculated. We shall now write
LEMMA 9.4.14 Suppose that ∆Φ^X (at the top of the diagram below) is the partial product of Φ along [^(x)] with evaluation map \ev_Φ, as in the parallelogram on the right, and let [^(y)]:[Φ,y:φ]
Φ
(a)
[p] ∆(Φφ)^X is the partial product of Φφ along [^(x)] with evaluation \ev_{Φφ}, shown as the big bold rectangle,
(b)
[d] ∆(Φφ)^X \rightarrowtail ∆Φ^X is the dependent product ∆, f:Φ^X \vdash ∀x.φ[fx], whose evaluation map \ev_φ is shown dashed.
The subscripts Φ and Φφ on ev and λ refer to the partial products ∆Φ^X and ∆(Φφ)^X, whilst φ indicates the dependent product ∀x.φ.
(a)
[p⇒ d] We are given [u,f]:Γ → ∆Φ^X and [u,f,x,q]:ΓX→ ∆Φ^X Xφ, so by composition of the equilateral triangle put p = \ev_Φ(f,x) to get [p,q]:ΓX→ Φφ, and [u,f,g] = \Lamb_{Φφ} x.[p,q]: Γ→ ∆(Φφ)^X using the partial product. This is the dependent product by Theorem 9.4.5, whose u, ∆ and φ correspond to [u,f], ∆Φ^X and \ev_Φ*φ ≡ φ[fx] here. The slender triangle at the top left commutes by universality of ∆Φ^X, and the triangle involving \ev_φ commutes using the pullback \ev_Φ*φ.
(b)
[d⇒ p] Given u:Γ→ ∆ and [p,q]:ΓX→ Φφ, use the dependent product to put [u,f] = \Lamb_Φ x.p: Γ→ ∆Φ^X, define [u,f,x,q] using the pullback \ev_Φ*φ and let [u,f,\Lamb_φ x.q]: Γ→ ∆Φ^X ∀x.φ[fx]. []
THEOREM 9.4.15 Let C be a category with a carrable map [^(x)], a rooted (``propositional'') display structure, and (choices of) partial products of all objects along [^(x)]. Then all dependent products along [^(x)] exist.
PROOF: We want the dependent product ∀x:u*X.p*φ in Γ (shown in bold), where u:Γ→ ∆ and p:ΓX→ Φ. The Beck-Chevalley condition says that the universal property of ∀x.\ev_Φ*φ is stable, so the required product is given by pullback along [u,λx.p]:Γ→ ∆Φ^X as shown, since [u,λx.p,x];\ev_Φ = p using the partial product ∆Φ^X. []
COROLLARY 9.4.16 The interpretation of dependent products can be defined to make the Beck-Chevalley condition hold up to equality.
PROOF: The binary dependency of ∀x:u*X.p*φ on u and p is replaced by a unary one on [u,λx.p] and the pullback is anchored at ∆Φ^X, where X and φ are type-symbols ( cf Remark 9.1.11). []
Comprehension The formation and elimination rules of {x: X|φ[x]} resemble those of an existential
quantifier, yielding a set instead of a proposition, and it behaves like Σx on its range. But the
introduction rule for comprehension is different from that for sums. Comprehension turns a proposition
into a type, so the effect is to move it across the division of the context in Section 9.2. It is like the single-
kind Example 9.2.5, where the division was imposed arbitrarily.
Although {x:X|φ[x]} is usually called a subset, it is a common idiom to put two or more variables on the left of the divider, or none at all (Remark 2.2.7). In fact the formal rules also suggest that we should view comprehension as an operation on contexts. Then {∆|φ} is the context ∆ extended by a new type \propext φ, which is what the proposition φ becomes after its move across the division.
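In Set the extent of a proposition is simply the subset it carves out, with ev the inclusion and κ the factorisation of a term that provably satisfies φ. The following minimal sketch (all names ours) illustrates the β-rule ev(κ.x) = x:

```python
# Hedged Set-level sketch of comprehension {x : X | phi[x]} as the extent
# of a proposition: ev is the inclusion display, kappa the introduction.
def comprehension(X, phi):
    """The extent of phi over X: the subset it carves out."""
    return [x for x in X if phi(x)]

def ev(y):
    """Elimination: a term of the extent is in particular a term of X."""
    return y

def kappa(x, phi):
    """Introduction: a term of X satisfying phi factors through the extent."""
    assert phi(x), "kappa is only defined on terms that satisfy phi"
    return x

X = list(range(10))
even = lambda x: x % 2 == 0
E = comprehension(X, even)
print(E)                      # [0, 2, 4, 6, 8]
print(ev(kappa(4, even)))     # beta: ev(kappa x) = x, here 4
```

The η-rule is equally trivial here: every element of the extent already arises as κ of itself.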
There is a very simple account of comprehension in terms of fibrations, which was, of course, found by Bill Lawvere [Law70]. It was rediscovered by Thomas Ehrhard [Ehr89] and considered further by Bart Jacobs [Jac90] and Duško Pavlović [Pav90].
DEFINITION 9.5.1 In a generalised algebraic theory with divided contexts, the extent of a proposition φ is
the type \propext φ with type-formation and indirect term-formation (type-introduction) rules
variables, so no α-equivalence need be stated. The β- and η-rules for comprehension are
omitted prooftree environment
∆, y:\propext φ | \vdash y = κ.ev(y) : \propext φ    (\propext η).
The context ∆ in these rules has no propositional part; this ensures that the condition for the
propositional situation (Definition 9.2.1) is preserved. In fact ∆ may be extended (weakened) by a
propositional part Ψ, but φ and p must not depend on it.
REMARK 9.5.2 The comprehension {∆|φ} is the context [∆,\propext φ| ]. The introduction rule
(combined with cut) is then
omitted prooftree environment
and the β- and η-rules are summed up by the diagram
THEOREM 9.5.3 Let P:C→ S be a fibration and P\dashv T with co-unit P·T = \id_S (Proposition 9.2.3). Then P interprets comprehension iff there is an adjoint T\dashv C, with co-unit ev and transposition κ.
In other words, the way in which comprehension turns propositions into types is by a co-reflection ( cf
the support of a type, which is its reflection into propositions, Remark 9.3.13). Substitution-invariance
is automatic.
C·T. []
EXAMPLES 9.5.4
(a)
In the predicate calculus, where the class M of propositional displays consists of monos in Set, the total category Cn×_{type|prop} is M↓ Set. The maps of this category are commutative squares whose verticals are mono. Such squares are prone iff they are pullbacks (hence the alternative name cartesian). The fibres are P(Γ) = Ω^Γ. omitted diagram environment
(b)
In Example 9.2.5 the division of the context is an arbitrary one between an indexing set I and the sets X[i] which it indexes. This can be described in the same way with M = Set, so any four maps may form the commutative square. The comma category Set↓ Set is the same as the functor category Set^→, the fibration functor P being tgt, so this is known as the codomain (target) fibration. The fibre over Γ is Set↓ Γ and the adjoints in this case (and the previous one) are given by X↦ \id_X and src. The slice Set↓ Γ is equivalent to the functor category Set^Γ, cf extensive categories (Section 5.5).
(c)
Let S = Cat and P(Γ) = Set^Γ; then the comprehension functor gives discrete op-fibrations [Law70, SW73].
(d)
Not all comprehension fibrations are like this. The fibration P of the category of rings-with-
modules over rings (Example 9.2.10(a)) has comprehension, but it is trivial: C = P (since T gives
the zero module, we have T\dashv P as well as P\dashv T). []
Not all subobjects in the base category need arise by comprehension. Those that do form a class of
support maps (Definition 5.2.10); indeed the notion of a class of supports corresponds to that of a
fragment of logic in the propositional kind.
This account of comprehension, pretty though it is, does not explain its role in Zermelo type theory as a
way of creating sets beside those definable with 1, x, N and P. Exercises 9.45ff describe three
approaches.
The type of propositions In higher order logic, proposition- and type-expressions are handled as if
they were terms. In a purely syntactic investigation such as [ Bar92], the colon notation a:U between
terms and types can be extended to types and kinds U:type or φ:prop, and even used to say things like
prop:type or type:type.
With categorical interpretations in mind, we prefer to keep the term:type relationship a two-level one,
and distinguish the substance of a type (to which its terms belong) from its name considered as a term
belonging to the type of types; the new types Prop and Type classify the kinds or classes of display
maps prop and type respectively. (Per Martin-Löf attributes these two approaches to higher order logic
to Bertrand Russell and Alfred Tarski respectively.)
(a)
names into propositions, using a special dependent proposition
z:Prop \vdash ω[z] prop,    (Prop E)
dependent on a new type
\vdash Prop type
of names of propositions;
(b)
and propositions into names:
for each well formed Γ\vdash φ prop, there is some Γ\vdash a:Prop with Γ\vdash φ ≡ ω[a],    (Prop I)
Recall from Theorem 8.2.16 that the substituted type ω[a] is given by a canonical pullback, as in the
diagram on the left:
More concisely, for any propositional display φ\hookrightarrow Γ there is some characteristic map a:
Γ→ Prop which makes the square on the right a (not necessarily canonical) pullback.
How does a depend on φ? Since φ is a proposition (or type) and not a term, there can be no operation-
symbol φ→ a, so the rule (b) must be a scheme: the existence of a is asserted individually for each
proposition φ.
REMARK 9.5.6 Is the characteristic map a:Γ→ Prop at least uniquely determined by the display φ
\hookrightarrow Γ? If it is then any isomorphic display ψ\hookrightarrow Γ must correspond to the same
map a ( up to equality), since we have discarded the isomorphism (i,j) in the translation. This is an
extensionality rule ( cf Definition 2.2.5 and Remark 2.8.4),
omitted prooftree environment
where we write χ.φ for a ( cf κ.p in Definition 9.5.1), so
THEOREM 9.5.7 Prop is a support classifier (Definition 5.2.10) iff it satisfies comprehension, proof-
anonymity and extensionality.
PROOF: We must replace the proposition φ by its extent, the type \propext φ (Definition 9.5.1), as it
remains to show that {z:Prop|ω[z]} ≡ 1. Consider any map Γ→ {z:Prop|ω[z]}, which corresponds to
Observe that any single display map generates a class of displays which satisfies Definition 9.5.5(a), by
Example 8.3.6(j). Conversely, if a kind has a generic display ω\hookrightarrow Prop then we may state
the type-theoretic properties of the kind in terms of this instead of the whole class.
REMARK 9.5.8 Using the uniquely determined characteristic map, the connectives may be transferred
from the propositions to their names, ie they define algebraic operations on Prop as in Remark 2.8.5.
We write Ω for Prop, as in an elementary topos, when it classifies all monos (Definition 5.2.6); this
happens exactly when the internal connectives and quantifiers for all types X exist ([ Tay98],
Exercise 9.52).
The displays on the left of each square are derived from the universal properties or proof rules of the
type-theoretic connectives ∧, ∨ and ⇒ : they are the product, coproduct and exponential in the context [x,
y:Ω]. Then and, or and implies are defined by these diagrams, ie and(x,y) = χ.(ω[x]∧ω[y]).
There is a similar correspondence between predicates Γ,x:X\vdash φ prop and terms of type X→ Prop, so this type is the powerset P(X) or Ω^X. The internalised quantifiers (Remark 2.8.5) some, all:Ω^X → Ω are defined by some_X(f) = χ.(∃x.ω[fx]) and all_X(f) = χ.(∀x.ω[fx]). []
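In Set, with Ω = {false, true}, these operations can be tabulated directly. This hedged sketch represents an element of Ω^X as a predicate; the function names and encoding are ours, not the book's:

```python
# Hedged sketch of Remark 9.5.8 in Set, where Omega = {False, True}
# classifies all subsets: connectives become operations on Omega, and the
# internalised quantifiers act on Omega^X, represented as predicates.
def and_(x, y): return x and y
def or_(x, y): return x or y
def implies(x, y): return (not x) or y

def some(X, f):
    """some : Omega^X -> Omega,  some(f) = chi.(exists x. omega[f x])"""
    return any(f(x) for x in X)

def all_(X, f):
    """all : Omega^X -> Omega,  all(f) = chi.(forall x. omega[f x])"""
    return all(f(x) for x in X)

X = [1, 2, 3]
print(some(X, lambda x: x > 2))      # True
print(all_(X, lambda x: x > 0))      # True
print(implies(True, False))          # False
```

The point of the Remark is that these are maps Ω×Ω → Ω and Ω^X → Ω inside the category, not merely external truth tables; the code only exhibits their effect on global elements.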
REMARK 9.5.9 We may do the same for other support classifiers such as the Sierpiński space (Definition 3.4.10, Example 5.2.11(d)). In this case the kind of propositions consists of open inclusions in Sp, for which we discussed existential quantification along open maps in Example 9.3.10. Although (ω[x]⇒ω[y]) ⊂ S×S exists as a subspace, it is not open, or even locally closed (Example 9.4.12(c)). However, Barendregt's calculus allows us to juggle the kinds so that when ∀ is applied to a geometric proposition it is called something else (not G_δ, but something similar) and so is no longer a candidate for classification by S.
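The failure of openness asserted here can be verified exhaustively on the four-point product S×S. The encoding below, with 1 as the open point of S, is our own finite check, not the book's argument:

```python
# Hedged finite check: in the Sierpinski space S = {0, 1} (1 the open
# point), the "implication" subset {(x, y) | x <= y} of S x S is not open
# in the product topology.
from itertools import product, combinations

S = [0, 1]
opens_S = [frozenset(), frozenset({1}), frozenset({0, 1})]

# Basic opens of S x S are products U x V; opens are unions of basics.
basics = [frozenset(product(U, V)) for U in opens_S for V in opens_S]
opens_SxS = set()
for r in range(len(basics) + 1):
    for combo in combinations(basics, r):
        opens_SxS.add(frozenset().union(*combo))

impl = frozenset((x, y) for x in S for y in S if x <= y)
print(impl in opens_SxS)   # False: any open containing (0,0) is all of S x S
```

The only basic open containing (0,0) is the whole space, so no union of basics can pick out {(0,0), (0,1), (1,1)}.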
REMARK 9.5.10 There is a certain awkwardness in the type-theoretic rules for the powerset. In symbolic
logic (where proofs and formulae alike are expressions) it is natural to repeat the term:type relationship
as type:kind. This is also the practice in that tradition of category theory which is founded in algebraic
topology (though not the one from logic), and it is unnatural to force isomorphisms into equalities.
These considerations do not arise in first order logic, as equality is a matter between two terms relative
to some type. But the extensionality law brings us back to the questions of equality versus
interchangeability of mathematical objects in Section 1.2, so let's review its uses.
(a)
Suppose we have a type of proof-anonymous propositions that does not necessarily satisfy
(Prop η). Then there is an extensional such type Ω iff every equivalence relation on any type has
an effective quotient ( cf Example 2.1.5 and Proposition 5.6.8).
(b)
Proposition 3.1.10 used equivalence relations to reduce any preorder such as (Prop,\vdash ) to a
poset. We went on from there to discuss the Yoneda embedding, but from Chapter IV onwards
we used this quite successfully for non-skeletal categories.
(c)
The extensional lattice of ideals of a ring (Example 3.2.5(f)) must often be replaced by its non-skeletal category of modules, and for algebraic rather than logical reasons. For example, the ring may have an automorphism such as x↦ x^p that does not act faithfully on the lattice.
(d)
Let x = (L,U) and y = (M,V) be Dedekind cuts (Example 2.1.1) bearing the same relation to all rationals, ie ∀q:Q. q ∈ L⇔ q ∈ M and similarly for U and V. If f:R_D → Θ is continuous then
Extensionality was needed in older mathematical constructions because Zermelo type theory has very
few connectives. Nowadays function- and list-types are used in functional programming to do many jobs
previously done by the powerset (Exercise 9.55). However, Exercises 2.54 and 9.57 do depend on
extensionality.
Set theory treats higher order logic metaphysically, model theory pretends that it doesn't exist, type
theory bureaucratises it and category theory has no view which is identifiably its own. I feel that it does
play an important part in mathematics, at least in the weak case of the Sierpiński space, and that this
needs to be explained [Tay98].
9.6 Universes
The real test of foundations is of course how they support the edifice above. We have already
demonstrated this throughout the book, but without going beyond its proper scope, we may ask what our
logic has to say about its own construction: meta-mathematics as an example of mathematics. This does
put the theory in jeopardy as the main question is consistency.
The description of the object language is in two parts: in these last two chapters we have said what it is
to be a structure for a certain fragment of logic, and Chapters VI and VII constructed the free such
structure.
REMARK 9.6.1 Example 8.1.12 gave the generalised algebraic theory of categories, which may be summed up semantically by the display
mor ≡ [x,y:O, f:H[x,y]] --(src,tgt)--> O²,
together with operation-symbols id and compose satisfying the axioms for a category. Then we may add constants unit, empty ∈ O and operations
Gödel's incompleteness theorem Consider first the free structure. Although Cantor's theorem is valid
inside, from the outside the objects (contexts) and maps (substitutions of terms) are defined
syntactically. So, by the techniques of Section 6.2, there is a recursive cover of each hom-set, in
particular N\twoheadrightarrow H[unit,P], where P is the internal version of P(N) (as long as the sorts,
operation-symbols and axioms are recursively enumerable). This is the Skolem paradox, page 2.2.9.
In 1931 Kurt Gödel used powers of primes to describe the enumeration, but recent authors seem to
forget that any modern technology that they may employ to write books about his argument works with
Gödel numbers as a matter of course. (These spectacularly infeasible calculations do, however, illustrate
the need for exponentials, in both the logical and arithmetical senses.) Instead of numbers, it is more
natural to use texts, ie terms of type List(A), where the alphabet A contains everything used in the syntax
of [], including variables and the meta-notation for substitution and proofs. It might be the set of distinct
symbols used in this book, including my TEX macros for proof trees and boxes, which specify the two-
dimensional arrangement of formulae using a linear stream of tokens. We also need a quoting function
\qqdash :List(A)→List(A) such as
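The display defining \qqdash is omitted from this extraction. Purely as a hedged toy stand-in, any injective escaping scheme on texts will serve, for instance:

```python
# Toy, hedged stand-in for the omitted quoting function qq: List(A) -> List(A).
# The escaping scheme is ours; any injective map that marks its output as
# "mentioned" rather than "used" text would do.
def qq(text):
    """Quote a text by wrapping it and escaping the quote marks inside."""
    escaped = text.replace('\\', '\\\\').replace('"', '\\"')
    return '"' + escaped + '"'

s = 'x = "y"'
print(qq(s))        # "x = \"y\""
print(qq(qq(s)))    # quoting is injective and can be iterated
```

Iterability matters because Gödel's argument quotes texts that themselves mention quoted texts.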
THEOREM 9.6.2 Let [] be a consistent fragment of logic that is recursively axiomatisable and adequate for
arithmetic. Then [] cannot prove its own consistency.
PROOF: Using primitive recursion, it is a decidable property of a triplet (p,Γ,φ) of texts whether p is a
well formed proof in [] whose last line is Γ\vdash φ ( cf the proof of Proposition 6.2.6). This property
can itself be expressed as a text ok ∈ List(A) containing (symbols in A for) variables x, y and z. Using the
informal notation ok[\qq p,\qq Γ,\qq φ] to indicate substitution for x, y and z, it satisfies
(a)
an introduction rule for each sequent rule r of [],
omitted prooftree environment
that ok[\qq p_1,\qq Γ_1,\qq φ_1], …, ok[\qq p_k,\qq Γ_k,\qq φ_k] \vdash ok[\qq r(\vec p),\qq ∆,\qq θ],
(b)
and an elimination rule that
ok[\qq p,\qq Γ,\qq φ] \vdash ok[\qq \bar p, \qq [ ], \qq ∃q.ok[\qq q,\qq Γ,\qq φ]],
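The decidability claim behind ok can be illustrated by a toy checker for a modus-ponens-only fragment; this is our hedged miniature, not the book's ok predicate for the full fragment:

```python
# Hedged toy version of the decidable property ok(p, Gamma, phi): a proof
# here is a list of formulas, each either a hypothesis from Gamma or
# obtained by modus ponens from two earlier lines.
def ok(proof, gamma, phi):
    """Primitive-recursive check that `proof` derives phi from gamma."""
    derived = []
    for line in proof:
        if line in gamma:
            derived.append(line)
            continue
        # modus ponens: some earlier a together with ('->', a, line)
        if any(('->', a, line) in derived for a in derived):
            derived.append(line)
            continue
        return False
    return bool(derived) and derived[-1] == phi

gamma = ['p', ('->', 'p', 'q')]
proof = ['p', ('->', 'p', 'q'), 'q']
print(ok(proof, gamma, 'q'))    # True
print(ok(['q'], gamma, 'q'))    # False: 'q' is not justified
```

Each step inspects only earlier lines, so the whole check is a bounded (primitive recursive) computation on the text of the proof, as the theorem requires.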
One might suppose that Gödel's theorem says that the concept of truth is metaphysical, and needs simply to be replaced by provability. On the contrary, André Joyal (1973) considered the free model Cn^[]_L instead of Set for the outer world, and then the internal free model within that. As the objects, maps and equality in Cn^[]_L are given by the syntax of [], their truth is our provability, and the internal notion is different again.
In particular, the property ψ of inconsistency says that there is a map unit→ empty, ie that the global
sections functor U ≡ H[unit,-] does not preserve the initial object ( cf Theorem 7.7.10(a)). The
subobject classified by the Gödel sentence θ is then the non-empty equaliser
∅\subsetneqq {∗|θ}\subsetneqq 1 →
yes
9.7 Exercises IX
1. Investigate the naturality of ∃x\dashv [^(x)]* \dashv ∀x as in Section 7.2.
2. Definition 9.2.1 associates a divided context to any jumbled one. Construct the isomorphism
between them.
3. In the first part of Lemma 9.2.2, show that this is the initial solution to forming a pullback square
with the given top and right sides.
4. Show that any fibration P:C→ S preserves pullbacks. Give a semilattice example in which it does not preserve T, but show that if there is a functor T with P·T = \id_S then P does preserve all finite limits. Conversely, show that if C and S have and P:C→ S preserves pullbacks, and P\dashv T with P·T = \id_S, then P is a fibration.
5. Show that the pseudo-vertical and prone maps for any fibration P:C→ S form a factorisation
system ( Definition 5.7.2) in C. [Hint: the prone part of a C-morphism f is the lifting of Pf ; use
the universal property again to show that the fill-in property holds.] Also show that every square
consisting of parallel vertical and prone maps is a pullback. We call this situation a cartesian
factorisation system.
6. Show that every fibration in which the base category and fibres have all finite limits arises
uniquely from a conservative extension of Horn (or essentially algebraic) theories: the two
classes of displays consist of the prone and vertical maps respectively.
7. Show that π0:S↓ U→ S in Proposition 7.7.1 is a fibration, but is not comprehensive or an op-
fibration. What is the corresponding conservative extension of theories?
8. In any fibration P:C→ S, show how to find limits in C, given limits in S and the fibres. Show
that pts:Sp→ Set and ob:Cat→ Set are bifibrations (Definition 9.2.6(e)), and use this to describe
limits and colimits in Sp and Cat, cf Section 7.4.
9. Let E⊥M be a factorisation system in a category C. Verify that the supine, vertical and prone
morphisms for the codomain fibration tgt = π1: M↓ C→ C are as shown below, and that if E⊥M
is stable then so is the supine-vertical factorisation. omitted diagram environment Conversely,
suppose that M ⊂ C is such that the class M′ of squares like those in the middle, but with
isomorphisms at the bottom, is part of a (stable) factorisation system; show that M is too.
10. Prove Proposition 9.2.9 in detail for Horn theories. Let S = ∪↑ \typeR_i in the base category (algebraic lattice) Mod(L_type). Prove the limit-colimit coincidence, that the projections Mod(S) → Mod(\typeR_i) form a limiting cone, and the embeddings a colimiting cocone in the category of algebraic lattices and Scott-continuous functions. Show that every fibration of algebraic lattices with these properties arises from a conservative extension of Horn theories.
11. Formulate the divided theory whose type part has one sort and whose propositional part is the
theory of groups. Use this and Proposition 9.2.9 to compute limits and colimits in Gp. More
generally, describe the theory which divides the laws of an algebraic theory from its sorts and
operation- symbols ( cf Remark 7.4.10 and Example 9.2.4(h)), and use it to compute free
algebras.
12. Formulate the sense in which the total category C of a fibration P:C→ S is the lax colimit of the corresponding indexed category, considered as a diagram of shape S in the 2-category Cat.
13. Use the Grothendieck construction for a diagram ℑ → Set to compute its (strict) colimit. [Hint:
consider the connected components of the total category.]
14. (For group theorists.) Use the fact that fibrations of groups are surjective homomorphisms and
that this class is closed under pullbacks to prove the Jordan-Hölder theorem.
15. Repeat Exercise 4.41, that Set^{C^op} is cartesian closed, using the Grothendieck construction to replace presheaves by discrete fibrations.
16. Let P:C→ S be a fibration. Define the morphisms of the category D, whose objects are triples (Γ, Φ, u:Γ→ PΦ), such that π_0:D → S is a split fibration and Φ↦ (PΦ,Φ,\id_Φ) is a weak equivalence. [Hint: this is not the comma category C↓ P.] Describe the groupoid that arises in this way from Z/(4)\twoheadrightarrow Z/(2).
17. Let P:C→ S be any fibration. Suppose that there is a choice of re-indexing functors u*:P(∆)→ P(Γ) for each u:Γ→ ∆ in S. Formulate the coherence conditions which relate id* to the identity re-indexing functor and (u;v)* to u*·v*. Conversely, given assignments P(Γ) and u* satisfying these conditions, adapt the Grothendieck construction (Proposition 9.2.7) to recover the fibration.
18. Investigate the formation and coherence rules for explicit isomorphisms instead of equalities of
types in Definition 8.1.7.
19. In terms of bifibrations, show that the Beck-Chevalley condition for a particular pullback in the
base category says that the prone and supine liftings going two ways around the square ``join up.''
20. Let C be a category with products and 2 = 1+1 such that the inclusions 1→ 2 are carrable. Show
that C is extensive (Section 5.5) iff there is a class D of displays with the following type-
theoretic property: for any two types Y, N there is a dependent type i:2 ⊢ X[i] type such that
Y ≡ X[1], N ≡ X[0], and Σi:2.X[i] is their coproduct. In this case show also that Πi:2.X[i] is their
product. (Cf. Exercise 5.35.)
22. Consider a generalised cut of a substitution u: Γ→ ∆ with the (∃-E)-rule. Show that the
substituted form of the rule as we gave it in Definition 9.3.6 is what is needed to commute these
sequent rules.
23. What is the type of λz.(let (x,y) be z in d)? Considering this type as a proposition, what is its
relationship to that of λx, y.d? [Hint: Exercise 1.22.] How is this relationship expressed by the ev
operation?
24. Formulate the canonical ∃-language (Section 7.6) for a stable factorisation system, and use
gluing (Section 7.7) to prove conservativity and equivalence.
25. Show that the Beck-Chevalley comparison maps as given in Definition 9.3.6ff for the existential
quantifier are mutually inverse.
26. Show that the Beck-Chevalley condition is automatic for dependent sums, ie composition of
displays. Using Exercise 7.29, deduce the same condition for locally cartesian closed categories
(Proposition 9.4.6).
27. (Jacobs, Moggi and Streicher) Show that if M is closed under composition, ie it admits strong
sums over itself, then weak sums over D are sufficient to derive the strong version (cf.
Exercise 5.47).
29. Adapt Section 9.3 to sums Σx:X.φ in which the result is of a different kind (M) from the body
(↗). [Hint: Theorem 9.3.11 becomes E⊥M and ↗;D ⊂ E;M stably.]
30. For the continuation rule for the strong sum type, the top right square in Theorem 9.3.11 need not
be a pullback. What is the effect of this in terms of the universal property of a factorisation
system, and why is this automatic? Formulate the corresponding type-theoretic rule for the strong
and weak sums and also discuss it in the idioms of Section 7.2.
31. Let E and M be two classes of maps satisfying the conditions of Theorem 9.3.11. Suppose that
either (a) all maps (in particular verdicts) factorise, or (b) all M-maps are mono. Show that E is
pullback-stable.
32. Formulate the cut-elimination step which expresses Πβ in the sequent calculus. (The right rule is
essentially the same as (∀I), and the left rule was given in Definition 9.4.1.)
33. Show how to extend the universal quantifier to ∀x. Φ, where Φ is a list of propositions dependent
on x:X. [Hint: adapt Lemma 9.4.13.]
34. Deduce from Exercise 7.36 that ∆(-)^X preserves monos, so the class of monos is closed under
universal quantification in the sense needed for the remarks following Proposition 9.4.6.
35. Show that the free algebra FG on a set G for a finitary free theory is the partial product Πx.G of
G along the map X→ F1 that is itself defined as the pullback of tgt:(<)→ N along the
interpretation [[-]]:F1→ N defined in Exercise 6.3.
36. Formulate the canonical language for partial products and prove conservativity for dependent
products as in Sections 7.6 and 7.7.
37. Prove Theorem 9.4.14, that dependent products can be reduced to partial products, type-
theoretically.
38. Prove Lemma 9.4.10, that ∆^XΦ ≡ (∆X)×Φ if ∆X→ ∆ is mono, using type theory.
39. Let Φ be a propositional context and suppose that the partial products of Φ along the displays
∆Xψ ↪ ∆X→ ∆ exist. Prove, both type-theoretically and by diagram-chasing, that
the partial product along ∆(∃x.ψ) ↪ ∆ also exists. [Hint: ∀z:(∃x.ψ).Φ ≡ ∀x.∀y:ψ.
Φ.]
41. Find a Scott-continuous function between dcpos which satisfies the conditions of Example 9.4.9
(d) for being exponentiable in Pos, but which is not exponentiable in Dcpo. [Hint: the bilimit of
the fibres over some ascending sequence fails to be the fibre over its directed join.]
42. How can Pasynkov's expression of spheres as partial products be used to calculate π1(S1), the
fundamental group of the circle?
43. Assuming proof-anonymity, explain in terms of Theorem 9.5.3 why the results of
comprehension are monos.
45. Give the equational formulation in a 2-category or bicategory (cf. Exercise 7.30) of the notion of
a fibration with lex fibres, and of the same with comprehension. In this sense show that tgt:C^v →
C freely adds comprehension to P:C→ S, where the objects of C^v are the vertical morphisms of
C; type-theoretically, this inserts an arbitrary division in the propositional part of contexts. [Hint:
for uniqueness of the mediator between the total categories, you need to know that a certain
square (diagram omitted here) is a pullback in C^v, where
Γ = TPΦ = TPΨ.] Although having comprehension is a property of the fibration, viz that an
adjoint exists, this construction is not idempotent; explain why this is. In Section 5.3 we wanted
to use fibrations and comprehension to add partial maps; does the construction achieve this goal?
46. What does it mean for a fibration P:C→ S to have equality predicates corresponding to the types
given by product diagonals in S (Section 8.3)? Assume that the fibres are all preorders. Construct
the category of partial maps (cf. Proposition 5.3.5) and a fibration with comprehension whose
objects are the virtual objects of the allegory (cf. the footnote on comprehension).
47. Assuming that P:C→ S admits existential quantification, construct the allegory of relations
(Proposition 5.8.7), and recover the comprehensive fibration using Freyd's tabular allegories
[FS90, §2.166].
48. Let S be the category of types in Zermelo type theory generated by 1, N, × and P (cf.
Exercise 2.17), and all functions between them, so S is a full subcategory of Set which is closed
under these operations but not under pullback. Carry out and contrast the constructions of the
preceding three exercises for the predicate fibration P:C→ S.
49. Discuss the commutation of comprehension and powerset with substitution (cf. Remark 9.1.11)
and with the connectives and quantifiers (cf. Exercise 2.17). Show that all proposition-symbols
except ω can be eliminated, and that if types do not depend on proofs (Section 9.2) then the fibres
P(Γ) are simply typed.
50. Describe the types ω[x]×ω[y] etc more explicitly and show that ∧, ∨ and ⇒ make Ω into a
Heyting lattice.
51. Reformulate the type-theoretic rules in Definition 9.5.5 so that, categorically, propositions over
Γ are classified by spans Γ ↞ ∆ → Prop (with two pullbacks) instead of just
maps Γ→ Prop. Relate this to Remark 6.6.5 and Example 9.3.10(e). Rework Sections 9.5 and
9.6 with this definition, which is the one used in [JM95].
52. Let T:1→ Σ in S be a support classifier (Definition 5.2.10). Suppose that Σ admits ⇒, Σ^X exists
for each X ∈ ob E, the Leibniz principle holds (Proposition 2.8.7) and the quantifiers ∃:Σ^X → Σ
and ∀:Σ^(Σ^X) → Σ exist. Show that S is a topos and Σ = Ω [Tay98].
53. Show that the inverse image map !*:S→ S^X, where !:X→ 1 in Sp, has a continuous right adjoint
(all) iff X is compact.
55. Implement Proposition 6.1.11 in a functional programming language with list- and function-
types, showing that any free theory has an equationally free model without assuming
propositional extensionality (Remark 9.5.10). Assume that Prop is a semilattice with structure
and:Prop→ Prop→ Prop and true: Prop, and the type Ω of operation-symbols has an equality
function eq:Ω→ Ω→ Prop.
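One possible shape for the implementation asked for in Exercise 55, sketched in Python rather than a typed functional language: terms of a free theory as nested tuples, the unique homomorphism to any algebra as a fold, and term equality built only from and/true and eq on operation-symbols, so no propositional extensionality is used. The theory (zero/succ/plus) and all names are hypothetical illustrations, not the book's notation:

```python
# Free algebra on generators G for a finitary free theory, with the unique
# homomorphism ("fold") to any other algebra and an equationally defined
# term equality using only a semilattice Prop (and_/true_) and eq on
# operation-symbols.

# arity: each operation-symbol with its arity (a hypothetical example theory).
arity = {"zero": 0, "succ": 1, "plus": 2}

def var(g):
    return ("var", g)

def op(r, *args):
    assert len(args) == arity[r]
    return ("op", r, list(args))

def fold(alg_ops, env, t):
    """The unique homomorphism F(G) -> A extending env: G -> A."""
    if t[0] == "var":
        return env[t[1]]
    _, r, args = t
    return alg_ops[r](*[fold(alg_ops, env, a) for a in args])

# Prop as a semilattice, with equality of operation-symbols.
true_ = True
def and_(p, q):
    return p and q
def eq_op(r, s):
    return r == s

def eq_term(s, t):
    if s[0] == "var" and t[0] == "var":
        return s[1] == t[1]
    if s[0] == "op" and t[0] == "op":
        p = eq_op(s[1], t[1])          # equal symbols have equal arities
        for a, b in zip(s[2], t[2]):
            p = and_(p, eq_term(a, b))
        return p
    return False

# Interpreting the symbols in the natural numbers, fold is evaluation.
nat = {"zero": lambda: 0, "succ": lambda n: n + 1,
       "plus": lambda m, n: m + n}
t = op("plus", op("succ", op("zero")), var("x"))
print(fold(nat, {"x": 2}, t))  # -> 3
print(eq_term(t, t))           # -> True
```

The point of eq_term is that it is defined structurally from eq on Ω, which is what makes the model equationally free without any appeal to extensionality.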
56. Prove the second order representations of the propositional connectives and Leibniz' principle
(Proposition 2.8.6ff), making use of proof-anonymity but not extensionality.
57. For any support classifier Σ (satisfying extensionality), show that a∧f(a) = a∧f(T) for Γ ⊢ a:Σ
and Γ, x:Σ ⊢ f:Σ.
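In the classical special case, where Σ is the two-element lattice, the identity of Exercise 57 is a finite check: there are only four terms f in one variable and two values of a. A small Python verification of just this special case (it is not, of course, a proof for an arbitrary support classifier):

```python
# Exercise 57, classical case: Sigma = {False, True}. Check that
# a AND f(a) == a AND f(True) for every a and every f: Sigma -> Sigma.
from itertools import product

Sigma = [False, True]
for fF, fT in product(Sigma, repeat=2):
    # tabulate f by its values on False and True
    f = lambda x, fF=fF, fT=fT: fT if x else fF
    for a in Sigma:
        assert (a and f(a)) == (a and f(True))
print("verified")
```

The identity holds because when a is false both sides are false, and when a is true both sides reduce to f(T).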
Bibliography
[AJ94]
Samson Abramsky and Achim Jung. Domain theory. In Samson Abramsky et al., editors,
Handbook of Logic in Computer Science, volume 3, pages 1-168. Oxford University Press, 1994.
[Acz88]
Peter Aczel. Non-well-founded Sets. Number 14 in Lecture Notes. Center for the Study of
Language and Information, Stanford University, 1988.
[AR94]
Jiří Adámek and Jiří Rosický. Locally Presentable and Accessible Categories. Number 189 in
London Mathematical Society Lecture Notes. Cambridge University Press, 1994.
[Age92]
Pierre Ageron. The logic of structures. Journal of Pure and Applied Algebra, 79:15-34, 1992.
[AHS95]
Thorsten Altenkirch, Martin Hofmann, and Thomas Streicher. Categorical reconstruction of a
reduction-free normalisation proof. In Peter Johnstone, David Pitt, and David Rydeheard, editors,
Category Theory and Computer Science VI, number 953 in Lecture Notes in Computer Science,
pages 182-199. Springer-Verlag, 1995.
[AC98]
Roberto Amadio and Pierre-Louis Curien. Domains and Lambda-Calculi. Number 46 in
Cambridge Tracts in Theoretical Computer Science, Cambridge University Press, 1998.
[App92]
Andrew Appel. Compiling with Continuations. Cambridge University Press, 1992.
[AGV64]
Michael Artin, Alexander Grothendieck, and Jean-Louis Verdier, editors. Séminaire de
Géométrie Algébrique, IV: Théorie des Topos, numbers 269-270 in Lecture Notes in
Mathematics. Springer-Verlag, 1964. Second edition, 1972.
[Bae97]
John Baez. An introduction to n-categories. In Eugenio Moggi and Giuseppe Rosolini, editors,
Category Theory and Computer Science VII, number 1290 in Lecture Notes in Computer
Science, pages 1-33. Springer-Verlag, 1997.
[Bar81]
Henk Barendregt. The Lambda Calculus: its Syntax and Semantics. Number 103 in Studies in
Logic and the Foundations of Mathematics. North-Holland, 1981. Second edition, 1984.
[Bar92]
Henk Barendregt. Lambda calculi with types. In Samson Abramsky et al., editors, Handbook of
Logic in Computer Science, volume 2, pages 117-309. Oxford University Press, 1992.
[BHPRR66]
Yehoshua Bar-Hillel, E. I. J. Poznanski, M. O. Rabin, and Abraham Robinson, editors. Essays on
the Foundations of Mathematics. Magnes Press, Hebrew University, 1966. Distributed by
Oxford University Press.
[BB69]
Michael Barr and Jon Beck. Homology and standard constructions. In Eckmann [Eck69], pages
245-335.
[BGvO71]
Michael Barr, Pierre Grillet, and Donovan van Osdol, editors. Exact Categories and Categories
of Sheaves. Number 236 in Lecture Notes in Mathematics. Springer-Verlag, 1971.
[Bar79]
Michael Barr. *-Autonomous Categories. Number 752 in Lecture Notes in Mathematics.
Springer-Verlag, 1979.
[BW85]
Michael Barr and Charles Wells. Toposes, Triples, and Theories. Number 278 in Grundlehren der
mathematischen Wissenschaften. Springer-Verlag, 1985.
[BW90]
Michael Barr and Charles Wells. Category Theory for Computing Science. International Series in
Computer Science. Prentice-Hall, 1990. Second edition, 1995.
[Bar91]
Michael Barr. *-Autonomous categories and linear logic. Mathematical Structures in Computer
Science, 1:159-178, 1991.
[Bar77]
Jon Barwise, editor. Handbook of Mathematical Logic. Number 90 in Studies in Logic and the
Foundations of Mathematics. North-Holland, 1977.
[Bee80]
Michael Beeson. Foundations of Constructive Mathematics: Metamathematical Studies.
Number 6 in Ergebnisse der Mathematik und ihrer Grenzgebiete. Springer-Verlag, 1980. Second
edition, 1985.
[Bel88]
John Lane Bell. Toposes and Local Set Theories: an Introduction. Number 14 in Logic Guides.
Oxford University Press, 1988.
[Bén85]
Jean Bénabou. Fibred categories and the foundations of naive category theory. Journal of
Symbolic Logic, 50:10-37, 1985.
[BP64]
Paul Benacerraf and Hilary Putnam, editors. Philosophy of Mathematics: Selected Readings.
Prentice-Hall, 1964. Second edition, Cambridge University Press, 1983.
[Ben64]
Paul Benacerraf. What numbers could not be. In Benacerraf and Putnam [BP64], pages 272-294.
Second edition, Cambridge University Press, 1983.
[Ber35]
Paul Bernays. Sur le platonisme dans les mathématiques. Enseignement Mathématique, 34:52-69,
1935. English translation, ``Platonism in Mathematics'' in [BP64], pages 258-271.
[Bib82]
Wolfgang Bibel. Automated Theorem Proving. Friedrich Vieweg & Sohn, Braunschweig, 1982.
Second edition, 1987.
[BML41]
Garrett Birkhoff and Saunders Mac Lane. A Survey of Modern Algebra. MacMillan, New York,
1941. Fourth edition, 1977.
[Bir76]
Garrett Birkhoff. The rise of modern algebra. In Jan Dalton Tarwater, John White, and John
Miller, editors, Men and Institutions in American Mathematics, pages 41-86. Texas Technical
University, 1976.
[BB85]
Errett Bishop and Douglas Bridges. Constructive Analysis. Number 279 in Grundlehren der
mathematischen Wissenschaften. Springer-Verlag, 1985.
[Bla33]
Max Black. The Nature of Mathematics, a Critical Survey. International Library of Psychology.
Kegan Paul, 1933.
[Boc51]
Josef Bochenski. Ancient Formal Logic. Studies in Logic and the Foundations of Mathematics.
North-Holland, 1951.
[Boe52]
Philotheus Boehner. Medieval Logic: an Outline of its Development from 1250 to c.1400.
Manchester University Press, 1952.
[Boe58]
Philotheus Boehner. Collected articles on Ockham. Franciscan Institute, 1958. Edited by Elio
Marie Buytaert.
[Bol51]
Bernard Bolzano. Paradoxien des Unendlichen. 1851. English translation, ``Paradoxes of the
Infinite'' by Fr. Prihonsky, published by Routledge, 1950.
[BJ74]
George Boolos and Richard Jeffrey. Computability and Logic. Cambridge University Press,
1974. Third edition, 1989.
[Boo93]
George Boolos. The Logic of Provability. Cambridge University Press, 1993.
[Boo98]
George Boolos. Logic, Logic and Logic. Harvard University Press, 1998. Edited by Richard
Jeffrey.
[Bor94]
Francis Borceux. Handbook of Categorical Algebra. Number 50 in Encyclopedia of Mathematics
and its Applications. Cambridge University Press, 1994. Three volumes.
[Bou57]
Nicolas Bourbaki. Eléments de Mathématique XXII: Théorie des Ensembles, Livre I, Structures.
Number 1258 in Actualités scientifiques et industrielles. Hermann, 1957. English translation,
``Theory of Sets,'' 1968.
[Boy68]
Carl Boyer. A History of Mathematics. Wiley, 1968. Revised edition by Uta Merzbach, Wiley,
1989.
[BM88]
Robert Boyer and J. Strother Moore. A Computational Logic Handbook. Number 23 in
Perspectives in Computing. Academic Press, 1988.
[Bro75]
Jan Brouwer. Collected Works: Philosophy and Foundations of Mathematics, volume 1. North-
Holland, 1975. Edited by Arend Heyting.
[Bro81]
Jan Brouwer. Brouwer's Cambridge Lectures on Intuitionism . Cambridge University Press,
1981. Edited by Dirk van Dalen.
[Bro76]
Felix Browder, editor. Mathematical Developments Arising from Hilbert Problems, number 28 in
Proceedings of Symposia in Pure Mathematics. American Mathematical Society, 1976.
[Bro87]
Ronald Brown. From groups to groupoids: a brief survey. Bulletin of the London Mathematical
Society, 19:113-134, 1987.
[Bro88]
Ronald Brown. Topology: a Geometric Account of General Topology, Homotopy Types and the
Fundamental Groupoid. Mathematics and its Applications. Ellis Horwood, 1988. First edition
``Elements of modern topology,'' 1968.
[BS81]
Stanley Burris and H. P. Sankappanavar. A Course in Universal Algebra. Number 78 in Graduate
Texts in Mathematics. Springer-Verlag, 1981.
[Bur81]
Albert Burroni. Algèbres graphiques. Cahiers de Topologie et Géométrie Différentielle, XXII,
1981.
[Caj93]
Florian Cajori. A History of Mathematics. MacMillan, 1893. Fifth edition, Chelsea, N.Y., 1991.
[Caj28]
Florian Cajori. A History of Mathematical Notations. Open Court, 1928. Reprinted by Dover,
1993.
[Cam98]
Peter Cameron. Introduction to Algebra. Oxford University Press, 1998.
[Can15]
Georg Cantor. Contributions to the Founding of the Theory of Transfinite Numbers. Open Court,
1915. Translated and edited by Philip Jourdain; reprinted by Dover, 1955.
[Can32]
Georg Cantor. Gesammelte Abhandlungen mathematischen und philosophischen Inhalts.
Springer-Verlag, 1932. Edited by Ernst Zermelo; reprinted by Olms, Hildeshaim, 1962.
[CPR91]
Aurelio Carboni, Maria Cristina Pedicchio, and Giuseppe Rosolini, editors. Proceedings of the
1990 Como Category Theory Conference, number 1488 in Lecture Notes in Mathematics.
Springer-Verlag, 1991.
[CLW93]
Aurelio Carboni, Steve Lack, and Robert Walters. Introduction to extensive and distributive
categories. Journal of Pure and Applied Algebra, 84:145-158, 1993.
[Car34]
Rudolf Carnap. Logische Syntax der Sprache. Vienna, 1934. English translation by Amethe
Smeaton, ``The Logical Syntax of Language,'' Kegan Paul, 1937.
[CE56]
Henri Cartan and Sammy Eilenberg. Homological Algebra. Princeton University Press, 1956.
[Car86]
John Cartmell. Generalised algebraic theories and contextual categories. Annals of Pure and
Applied Logic, 32:209-243, 1986.
[CK73]
Chen Chung Chang and Jerome Keisler. Model Theory. Number 73 in Studies in Logic and the
Foundations of Mathematics. North-Holland, 1973. Third edition, 1990.
[CR92]
Jon Chapman and Frederick Rowbottom. Relative Category Theory and Geometric Morphisms: a
Logical Approach. Number 16 in Logic Guides. Oxford University Press, 1992.
[Chu56]
Alonzo Church. Introduction to Mathematical Logic. Princeton University Press, 1956.
[Coc93]
Robin Cockett. Introduction to distributive categories. Mathematical Structures in Computer
Science, 3:277-307, 1993.
[CCS98]
A. M. Cohen, H. Cuypers, and H. Sterk, editors. Some Tapas of Computer Algebra, number 4 in
Algorithms and Computation in Mathematics. Springer-Verlag, 1998.
[Coh66]
Paul Cohen. Set Theory and the Continuum Hypothesis. W.A. Benjamin, 1966.
[Coh77]
Paul Cohn. Algebra, volume 2. Wiley, 1977.
[Coh81]
Paul Cohn. Universal Algebra. Number 6 in Mathematics and its Applications. Reidel, 1981.
Originally published by Harper and Row, 1965.
[Con71]
John Horton Conway. Regular Algebra and Finite Machines. Chapman and Hall, 1971.
[Con76]
John Horton Conway. On Numbers and Games. Number 6 in London Mathematical Society
Monographs. Academic Press, 1976.
[CH88]
Thierry Coquand and Gérard Huet. The calculus of constructions. Information and Computation,
76:95-120, 1988.
[Coq90a]
Thierry Coquand. Metamathematical investigations of a calculus of constructions. In Odifreddi
[Odi90], pages 91-122.
[Coq90b]
Thierry Coquand. On the analogy between propositions and types. In Gérard Huet, editor,
Logical Foundations of Functional Programming, pages 399-418. Addison-Wesley, 1990.
[Coq97]
Thierry Coquand. Computational content of classical logic. In Pitts and Dybjer [PD97], pages 33-
78.
[Cos79]
Michel Coste. Localisation, spectra and sheaf representation. In Fourman et al. [FMS79], pages
212-238.
[Cou05]
Louis Couturat. Les Principes des Mathématiques, avec un Appendice sur le Philosophie de
Kant. 1905.
[CP92]
Roy Crole and Andrew Pitts. New foundations for fixpoint computations: FIX-hyperdoctrines
and the FIX-logic. Information and Computation, 98:171-210, 1992.
[Cro93]
Roy Crole. Categories for Types. Cambridge Mathematical Textbooks. Cambridge University
Press, 1993.
[CDS98]
Djordje Čubrić, Peter Dybjer, and Philip Scott. Normalisation and the Yoneda embedding.
Mathematical Structures in Computer Science, 8:153-192, 1998.
[Cur86]
Pierre-Louis Curien. Categorical Combinators, Sequential Algorithms, and Functional
Programming. Pitman, 1986. Second edition, Birkhäuser, Progress in Theoretical Computer
Science, 1993.
[CF58]
Haskell Curry and Robert Feys. Combinatory Logic I. Studies in Logic and the Foundations of
Mathematics. North-Holland, 1958. Volume II, with Jonathan Seldin, 1972.
[Cur63]
Haskell Curry. Foundations of Mathematical Logic. McGraw-Hill, 1963. Republished by Dover,
1977.
[CSH80]
Haskell Curry, Jonathan Seldin, and Roger Hindley, editors. To H.B. Curry: Essays on
Combinatory Logic, Lambda Calculus and Formalism. Academic Press, 1980.
[Dau79]
Joseph Warren Dauben. Georg Cantor: his Mathematics and Philosophy of the Infinite. Harvard
University Press, 1979.
[DST88]
James Davenport, Y. Siret, and E. Tournier. Computer Algebra: Systems and Algorithms for
Algebraic Computation. Academic Press, 1988. Translated from French; third edition 1993.
[DP90]
B. A. Davey and Hilary Priestley. Introduction to Lattices and Order. Cambridge University
Press, 1990.
[Dav65]
Martin Davis. The Undecidable. Basic Papers on Undecidable, Unsolvable Problems and
Computable Functions. Raven Press, Hewlett, N.Y., 1965.
[dB80]
Nicolaas de Bruijn. A survey of the project Automath. In Curry et al. [CSH80], pages 579-606.
[Ded72]
J. W. Richard Dedekind. Stetigkeit und irrationale Zahlen. Braunschweig, 1872. Reprinted in
[Ded32], pages 315-334; English translation, ``Continuity and Irrational Numbers'' in [Ded01].
[Ded88]
J. W. Richard Dedekind. Was sind und was sollen die Zahlen? Braunschweig, 1888. Reprinted in
[Ded32], pages 335-391; English translation, ``The Nature and Meaning of Numbers'' in [Ded01].
[Ded01]
J. W. Richard Dedekind. Essays on the theory of numbers. Open Court, 1901. English
translations by Wooster Woodruff Beman; republished by Dover, 1963.
[Ded32]
J. W. Richard Dedekind. Gesammelte mathematische Werke, volume 3. Vieweg, Braunschweig,
1932. Edited by Robert Fricke, Emmy Noether and Øystein Ore; republished by Chelsea, New
York, 1969.
[Det86]
Michael Detlefsen. Hilbert's Program: an Essay on Mathematical Instrumentalism. Number 182
in Synthese Library. Reidel, 1986.
[Die77]
Jean Alexandre Dieudonné. Panorama des Mathématiques Pures: le Choix Bourbachique.
Gauthier-Villars, 1977. English translation, ``A panorama of pure mathematics, as seen by N.
Bourbaki'' by I. G. Macdonald, Academic Press, Pure and applied mathematics, 97, 1982.
[Die88]
Jean Alexandre Dieudonné. A History of Algebraic and Differential Topology 1900-1960.
Birkhäuser, 1988.
[Dol72]
Albrecht Dold. Lectures on Algebraic Topology. Number 200 in Grundlehren der
mathematischen Wissenschaften. Springer-Verlag, 1972.
[Dum77]
Michael Dummett. Elements of Intuitionism. Logic Guides. Oxford University Press, 1977.
[Dum78]
Michael Dummett. Truth and Other Enigmas. Duckworth, London, 1978.
[DT87]
Roy Dyckhoff and Walter Tholen. Exponentiable maps, partial products and pullback
complements. Journal of Pure and Applied Algebra, 49:103-116, 1987.
[Eck69]
Beno Eckmann, editor. Seminar on Triples and Categorical Homology Theory, number 80 in
Lecture Notes in Mathematics. Springer-Verlag, 1969.
[Ehr84]
Charles Ehresmann. Œuvres complètes et commentées. Amiens, 1980-84. Edited by Andrée Charles-Ehresmann.
[Ehr88]
Thomas Ehrhardt. Categorical semantics of constructions. In Yuri Gurevich, editor, Logic in
Computer Science III, pages 264-273. IEEE Computer Society Press, 1988.
[Ehr89]
Thomas Ehrhardt. Dictoses. In Pitt et al. [ PRD+89], pages 213-223.
[ES52]
Sammy Eilenberg and Norman Steenrod. Foundations of Algebraic Topology. Princeton
University Press, 1952.
[EHMLR66]
Sammy Eilenberg, D. K. Harrison, Saunders Mac Lane, and Helmut Röhrl, editors. Categorical
Algebra (La Jolla, 1965). Springer-Verlag, 1966.
[EK66]
Sammy Eilenberg and Max Kelly. Closed categories. In Eilenberg et al. [EHMLR66].
[EE70]
Sammy Eilenberg and Calvin Elgot. Recursiveness. Academic Press, 1970.
[EML86]
Sammy Eilenberg and Saunders Mac Lane. Eilenberg-Mac Lane, Collected Works. Academic
Press, 1986.
[FG87]
John Fauvel and Jeremy Gray. The History of Mathematics, a Reader. Macmillan and the Open
University, 1987.
[Fen71]
Jens Erik Fenstad, editor. Second Scandinavian Logic Symposium, number 63 in Studies in Logic
and the Foundations of Mathematics. North-Holland, 1971.
[FF69]
Richard Feys and Frederic Fitch. Dictionary of Symbols of Mathematical Logic. Studies in Logic
and the Foundations of Mathematics. North-Holland, 1969.
[FJM+96]
Marcelo Fiore, Achim Jung, Eugenio Moggi, Peter O'Hearn, Jon Riecke, Giuseppe Rosolini, and
Ian Stark. Domains and denotational semantics: History, accomplishments and open problems.
Bulletin of the EATCS, 59:227-256, 1996.
[Fit52]
Frederic Benton Fitch. Symbolic Logic: an Introduction. Ronald Press, New York, 1952.
[Fit69]
Melvin Fitting. Intuitionistic Logic, Model Theory and Forcing. Studies in Logic and the
Foundations of Mathematics. North-Holland, 1969.
[Flo67]
Robert Floyd. Assigning meaning to programs. In J. T. Schwartz, editor, Mathematical Aspects of
Computer Science, number 19 in Proceedings of Symposia in Applied Mathematics, pages 19-32.
American Mathematical Society, 1967.
[FMS79]
Michael Fourman, Chris Mulvey, and Dana Scott, editors. Applications of Sheaves, number 753
in Lecture Notes in Mathematics. Springer-Verlag, 1979.
[FJP92]
Michael Fourman, Peter Johnstone, and Andrew Pitts, editors. Applications of categories in
computer science, number 177 in London Mathematical Society Lecture Notes. Cambridge
University Press, 1992.
[Fow87]
David Fowler. The Mathematics of Plato's Academy: a New Reconstruction. Oxford University
Press, 1987.
[FBH58]
Abraham Fraenkel and Yehoshua Bar-Hillel. Foundations of Set Theory. Studies in Logic and the
Foundations of Mathematics. North-Holland, 1958.
[Fre60]
Gottlob Frege. Translations from the Philosophical Writings of Gottlob Frege. Blackwell, 1960.
Edited by Peter Geach and Max Black; third edition, 1980.
[Fre84]
Gottlob Frege. Collected Papers on Mathematics, Logic and Philosophy. Blackwell, 1984. Edited
by Brian McGinness.
[Fre64]
Peter Freyd. Abelian Categories: an Introduction to the Theory of Functors. Harper and Row,
1964.
[Fre66]
Peter Freyd. The theory of functors and models. In John Addison, Leon Henkin, and Alfred
Tarski, editors, Theory of Models, Studies in Logic and the Foundations of Mathematics, pages
107-120. North-Holland, 1966.
[Fre72]
Peter Freyd. Aspects of topoi. Bulletin of the Australian Mathematical Society, 7:1-76 and 467-
480, 1972.
[FK72]
Peter Freyd and Max Kelly. Categories of continuous functors, I . Journal of Pure and Applied
Algebra, 2:169-191, 1972.
[FS90]
Peter Freyd and Andre Scedrov. Categories, Allegories. Number 39 in Mathematical Library.
North-Holland, 1990.
[Fre91]
Peter Freyd. Algebraically complete categories. In Carboni et al. [CPR91], pages 95-104.
[GU71]
Peter Gabriel and Fritz Ulmer. Lokal präsentierbare Kategorien. Number 221 in Lecture Notes in
Mathematics. Springer-Verlag, 1971.
[Gal38]
Galileo Galilei. Two New Sciences. 1638. Translated by Stillman Drake, University of Wisconsin
Press, 1974; Re-published by Wall & Thompson, 1989.
[Gal86]
Jean Gallier. Logic for Computer Science: Foundations of Automated Theorem Proving.
Computer Science and Technology Series. Harper and Row, 1986. Republished by Wiley, 1987.
[Gan56]
Robin Gandy. On the axiom of extensionality. Journal of Symbolic Logic, 21:36-48 and 24:287-
300, 1956.
[Gen35]
Gerhard Gentzen. Untersuchungen über das Logische Schliessen. Mathematische Zeitschrift,
39:176-210 and 405-431, 1935. English translation in [ Gen69], pages 68-131.
[Gen69]
Gerhard Gentzen. The Collected Papers of Gerhard Gentzen. Studies in Logic and the
Foundations of Mathematics. North-Holland, 1969. Edited by M. E. Szabo.
[GHK+80]
Gerhard Gierz, Karl Heinrich Hoffmann, Klaus Keimel, Jimmie Lawson, Michael Mislove, and
Dana Scott. A Compendium of Continuous Lattices. Springer-Verlag, 1980.
[Gil82]
Donald Gillies. Frege, Dedekind and Peano on the Foundations of Arithmetic. Number 2 in
Methodology and Science Foundation. Van Gorcum, 1982.
[Gir71]
Jean-Yves Girard. Une extension de l'interprétation de Gödel à l'analyse, et son application à
l'élimination des coupures dans l'analyse et la théorie des types. In Fenstad [Fen71], pages 63-92.
[Gir87a]
Jean-Yves Girard. Linear logic. Theoretical Computer Science, 50:1-102, 1987.
[Gir87b]
Jean-Yves Girard. Proof Theory and Logical Complexity, volume 1. Bibliopolis, 1987.
[GLT89]
Jean-Yves Girard, Yves Lafont, and Paul Taylor. Proofs and Types. Number 7 in Cambridge
Tracts in Theoretical Computer Science. Cambridge University Press, 1989.
[Gir71]
Jean Giraud. Cohomologie non-abélienne. Number 179 in Grundlehren der mathematischen
Wissenschaften. Springer-Verlag, 1971.
[Gir72]
Jean Giraud. Classifying topos. In Lawvere [Law72], pages 43-56.
[Göd31]
Kurt Gödel. Über formal unentscheidbare Sätze der Principia Mathematica und verwandter
Systeme I. Monatshefte für Mathematik und Physik, 38:173-198, 1931. English translations, ``On
Formally Undecidable Propositions of Principia Mathematica and Related Systems I,'' in [Dav65] and elsewhere.
[Göd80]
Kurt Gödel. Kurt Gödel: Collected Works. Oxford University Press, 1980. Edited by Solomon
Feferman and others.
[God58]
Roger Godement. Topologie Algébrique et Théorie des Faisceaux. Hermann, 1958.
[Gol79]
Robert Goldblatt. Topoi: The Categorial Analysis of Logic. Number 98 in Studies in Logic and
the Foundations of Mathematics. North-Holland, 1979. Third edition, 1983.
[GG80]
Ivor Grattan-Guinness, editor. From the Calculus to Set Theory, 1630-1910: an Introductory
History. Duckworth, London, 1980.
[GG97]
Ivor Grattan-Guinness. The Fontana History of the Mathematical Sciences: the Rainbow of
Mathematics. Fontana, 1997.
[Grä68]
George Grätzer. Universal Algebra. Van Nostrand, 1968.
[Gra66]
John Gray. Fibred and cofibred categories. In Eilenberg et al. [EHMLR66], pages 21-83.
[Gra79]
John Gray. Fragments of the history of sheaf theory. In Fourman et al. [FMS79], pages 1-79.
[GS89]
John Gray and Andre Scedrov, editors. Categories in Computer Science and Logic, number 92 in
Contemporary Mathematics. American Mathematical Society, 1989.
[Gro64]
Alexander Grothendieck, editor. Séminaire de Géométrie Algébrique, I (1960/1), number 224 in
Lecture Notes in Mathematics. Springer-Verlag, 1964.
[Gun92]
Carl Gunter. Semantics of Programming Languages: Structures and Techniques. Foundations of
Computing. MIT Press, 1992.
[Hal60]
Pál Halmos. Naive Set Theory. Van Nostrand, 1960. Reprinted by Springer-Verlag,
Undergraduate Texts in Mathematics, 1974.
[Har40]
G. H. Hardy. A Mathematician's Apology. Cambridge University Press, 1940. Reprinted 1992.
[Hat82]
William Hatcher. The Logical Foundations of Mathematics. Foundations and Philosophy of
Science and Technology. Pergamon Press, 1982.
[Hau14]
Felix Hausdorff. Mengenlehre. 1914. English translation, ``Set Theory,'' published by Chelsea,
1962.
[Hec93]
André Heck. Introduction to Maple. Springer-Verlag, 1993. Second edition, 1996.
[Hei86]
Gerhard Heinzmann, editor. Poincaré, Russell, Zermelo et Peano: Textes de la Discussion (1906-
1912) sur les Fondements des Mathématiques: des Antinomies à la Prédicativité. Albert
Blanchard, Paris, 1986.
[HM75]
Matthew Hennessy and Robin Milner. Algebraic laws for non-determinism and concurrency.
Journal of the ACM, 32:137-161, 1985.
[Hen88]
Matthew Hennessy. Algebraic Theory of Processes. Foundations of Computing. MIT Press,
1988.
[Hen90]
Matthew Hennessy. The Semantics of Programming Languages: an Elementary Introduction
using Structural Operational Semantics. Wiley, 1990.
[Hey56]
Arend Heyting. Intuitionism, an Introduction. Studies in Logic and the Foundations of
Mathematics. North-Holland, 1956.
[HA28]
David Hilbert and Wilhelm Ackermann. Grundzüge der theoretischen Logik. Springer-Verlag,
1928. Republished 1972; English translation by Lewis Hammond et al., ``Principles of
Mathematical Logic,'' Chelsea, New York, 1950.
[HB34]
David Hilbert and Paul Bernays. Grundlagen der Mathematik. Number 40 in Grundlehren der
mathematischen Wissenschaften. Springer-Verlag, 1934.
[Hil35]
David Hilbert. Gesammelte Abhandlungen, volume 3. Springer-Verlag, 1935. Reprinted, 1970.
[HS71]
Peter Hilton and Urs Stammbach. A Course in Homological Algebra. Number 4 in Graduate
Texts in Mathematics. Springer-Verlag, 1971. Second edition, 1997.
[HS86]
Roger Hindley and Jonathan Seldin. Introduction to Combinators and Lambda Calculus.
Number 1 in London Mathematical Society Student Texts. Cambridge University Press, 1986.
[Hoa69]
Tony Hoare. An axiomatic basis for computer programming. Communications of the ACM,
12:576-580 and 583, 1969.
[Hod93]
Wilfrid Hodges. Model Theory. Number 42 in Encyclopedia of Mathematics and its Applications.
Cambridge University Press, 1993.
[Hof95]
Martin Hofmann. On the interpretation of type theory in locally cartesian closed categories. In
Leszek Pacholski and Jerzy Tiuryn, editors, Computer Science Logic VIII, number 933 in Lecture
Notes in Computer Science, pages 427-441. Springer-Verlag, 1995.
[Hof79]
Douglas Hofstadter. Gödel, Escher, Bach: an Eternal Golden Braid. Harvester, 1979. Reprinted
by Penguin, 1980.
[How80]
William Howard. The formulae-as-types notion of construction. In Curry et al. [CSH80], pages
479-490.
[Hue73]
Gérard Huet. The undecidability of unification in third order logic. Information and Control, 22
(3):257-267, April 1973.
[Hue75]
Gérard Huet. A unification algorithm for typed lambda calculus. Theoretical Computer Science,
1:27-57, 1975.
[HJP80]
Martin Hyland, Peter Johnstone, and Andrew Pitts. Tripos theory. Mathematical Proceedings of
the Cambridge Philosophical Society, 88:205-232, 1980.
[Hyl81]
Martin Hyland. Function spaces in the category of locales. In Bernhard Banaschewski and
Rudolf-Eberhard Hoffmann, editors, Continuous Lattices, number 871 in Lecture Notes in Mathematics,
pages 264-281. Springer-Verlag, 1981.
[Hyl82]
Martin Hyland. The effective topos. In Troelstra and van Dalen [TvD82], pages 165-216.
[Hyl88]
Martin Hyland. A small complete category. Annals of Pure and Applied Logic, 40:135-165, 1988.
[HP89]
Martin Hyland and Andrew Pitts. The theory of constructions: Categorical semantics and topos-
theoretic models. In Gray and Scedrov [GS89], pages 137-199.
[HRR90]
Martin Hyland, Edmund Robinson, and Giuseppe Rosolini. The discrete objects in the effective
topos. Proceedings of the London Mathematical Society, 60:1-36, 1990.
[Jac90]
Bart Jacobs. Categorical Type Theory. PhD thesis, Universiteit Nijmegen, 1990.
[JMS91]
Bart Jacobs, Eugenio Moggi, and Thomas Streicher. Relating models of impredicative type
theories. In Pitt et al. [PCA+91], pages 197-218.
[Jac93]
Bart Jacobs. Comprehension categories and the semantics of type dependency. Theoretical
Computer Science, 107(2):169-207, 1993.
[Jás34]
Stanisław Jaśkowski. On the rules of suppositions in formal logic. Studia Logica, 1, 1934.
Reprinted in [McC67], pages 232-258.
[Jec78]
Thomas Jech. Set Theory. Number 79 in Pure and Applied Mathematics. Academic Press, 1978.
Second edition, 1997.
[Joh77]
Peter Johnstone. Topos Theory. Number 10 in London Mathematical Society Monographs.
Academic Press, 1977.
[JPR+78]
Peter Johnstone, Robert Paré, Robert Rosebrugh, Dietmar Schumacher, Richard Wood, and Gavin
Wraith. Indexed Categories and their Applications. Number 661 in Lecture Notes in
Mathematics. Springer-Verlag, 1978.
[Joh82]
Peter Johnstone. Stone Spaces. Number 3 in Cambridge Studies in Advanced Mathematics.
Cambridge University Press, 1982.
[Joh85]
Peter Johnstone. When is a variety a topos? Algebra Universalis, 21:198-212, 1985.
[Joh90]
Peter Johnstone. Collapsed toposes and cartesian closed varieties. Journal of Algebra, 129:446-480, 1990.
[JT84]
André Joyal and Myles Tierney. An extension of the Galois theory of Grothendieck. Memoirs of
the American Mathematical Society, 51(309), 1984.
[Joy87]
André Joyal. Foncteurs analytiques et espèces de structures. In Gilbert Labelle and Pierre Leroux,
editors, Combinatoire énumerative, number 1234 in Lecture Notes in Mathematics, pages 126-159.
Springer-Verlag, 1987.
[JM95]
André Joyal and Ieke Moerdijk. Algebraic Set Theory. Number 220 in London Mathematical
Society Lecture Notes. Cambridge University Press, 1995.
[Jun90]
Achim Jung. Cartesian closed categories of algebraic CPO's. Theoretical Computer Science,
70:233-250, 1990.
[KR91]
Hans Kamp and Uwe Reyle. From Discourse to Logic: Introduction to Model-theoretic
Semantics of Natural Language, Formal Logic and Discourse Representation Theory. Reidel,
1991. Republished by Kluwer, Studies in Linguistics and Philosophy, 42, 1993.
[Kan58]
Daniel Kan. Adjoint functors. Transactions of the American Mathematical Society, 87:294-329,
1958.
[Kel55]
John Kelley. General Topology. Van Nostrand, 1955. Reprinted by Springer-Verlag, Graduate
Texts in Mathematics, 27, 1975.
[Kel69]
Max Kelly. Monomorphisms, epimorphisms and pull-backs. Journal of the Australian
Mathematical Society, 9:124-142, 1969.
[Kel74]
Max Kelly, editor. Proceedings of the Sydney Category Theory Seminar 1972-3. Number 420 in
Lecture Notes in Mathematics. Springer-Verlag, 1974.
[Kel82]
Max Kelly. Basic Concepts of Enriched Category Theory. Number 64 in London Mathematical
Society Lecture Notes. Cambridge University Press, 1982.
[Kle52]
Stephen Kleene. Introduction to Metamathematics. Number 1 in Bibliotheca mathematica. North-
Holland, 1952. Revised edition, Wolters-Noordhoff, 1971.
[KV65]
Stephen Kleene and Richard Vesley. The Foundations of Intuitionistic Mathematics, Especially
in relation to Recursive Functions. North-Holland, 1965.
[Kle67]
Stephen Kleene. Mathematical Logic. John Wiley and Sons, 1967.
[Knu68]
Donald Knuth. The Art of Computer Programming. Addison-Wesley, 1968. Three volumes
published out of seven planned; second edition, 1973.
[KB70]
Donald Knuth and Peter Bendix. Simple word problems in universal algebra. In John Leech,
editor, Computational Problems in Abstract Algebra, pages 263-297. Pergamon Press, 1970.
[Knu74]
Donald Knuth. Surreal Numbers. Addison-Wesley, 1974.
[Koc81]
Anders Kock. Synthetic Differential Geometry. Number 51 in London Mathematical Society
Lecture Notes. Cambridge University Press, 1981.
[Koc95]
Anders Kock. Monads for which structures are adjoint to units. Journal of Pure and Applied
Algebra, 104:41-59, 1995.
[Kol25]
Andrei Kolmogorov. On the principle of excluded middle. Matematičeskii Sbornik, 32:646-667,
1925. In Russian; English translation in [vH67], pages 414-437.
[Koy82]
C. P. J. Koymans. Models of the lambda calculus. Information and Control, 52:206-332, 1982.
[Kre58]
Georg Kreisel. Mathematical significance of consistency proofs. Journal of Symbolic Logic,
23:155-182, 1958.
[Kre67]
Georg Kreisel. Informal rigour and completeness proofs. In Imre Lakatos, editor, Problems in the
Philosophy of Mathematics. North-Holland, 1967.
[Kre68]
Georg Kreisel. A survey of proof theory. Journal of Symbolic Logic, 33:321-388, 1968.
[Kre71]
Georg Kreisel. A survey of proof theory II. In Fenstad [Fen71], pages 109-170.
[KKP82]
Norman Kretzmann, Anthony Kenny, and Jan Pinborg, editors. The Cambridge history of later
medieval philosophy: from the rediscovery of Aristotle to the disintegration of scholasticism,
1100-1600 . Cambridge University Press, 1982.
[KM66]
Kazimierz Kuratowski and Andrzej Mostowski. Teoria mnogosci. Polish Scientific Publishers,
1966. English translation, ``Set Theory'' by M. Maczynski, North-Holland, Studies in Logic and
the Foundations of Mathematics, number 86, 1968; second edition, 1976.
[Laf87]
Yves Lafont. Logiques, Catégories et Machines. PhD thesis, Université de Paris 7, 1987.
[LS91]
Yves Lafont and Thomas Streicher. Game semantics for linear logic. In Logic in Computer
Science VI, pages 43-50. IEEE Computer Society Press, 1991.
[Lai83]
Christian Lair. Diagrammes localement libres, extensions de corps et théorie de Galois.
Diagrammes, 10, 1983.
[Lak63]
Imre Lakatos. Proofs and refutations: the logic of mathematical discovery. British Journal for the
Philosophy of Science, 14:1-25, 1963. Edited by John Worrall and Elie Zahar, Cambridge
University Press, 1976.
[Lak86]
George Lakoff. Women, Fire, and Dangerous Things: What Categories Reveal about the Mind.
University of Chicago Press, 1986.
[Lam58]
Joachim Lambek. The mathematics of sentence structure. American Mathematical Monthly,
65:154-170, 1958.
[Lam68]
Joachim Lambek. Deductive systems and categories I: Syntactic calculus and residuated
categories. Mathematical Systems Theory, 2:287-318, 1968.
[Lam69]
Joachim Lambek. Deductive systems and categories II: Standard constructions and closed
categories. In Peter Hilton, editor, Category Theory, Homology Theory and their Applications,
number 86 in Lecture Notes in Mathematics, pages 76-122. Springer-Verlag, 1969.
[Lam72]
Joachim Lambek. Deductive systems and categories III: Cartesian closed categories, intuitionist
propositional calculus, and combinatory logic. In Lawvere [Law72], pages 57-82.
[LS80]
Joachim Lambek and Philip J. Scott. Intuitionist type theory and the free topos. Journal of Pure
and Applied Algebra, 19:215-257, 1980.
[LS86]
Joachim Lambek and Philip Scott. Introduction to Higher Order Categorical Logic. Number 7 in
Cambridge Studies in Advanced Mathematics. Cambridge University Press, 1986.
[Lam89]
Joachim Lambek. Multicategories revisited. In Gray and Scedrov [GS89], pages 217-240.
[Lan70]
Serge Lang. Introduction to linear algebra. Addison-Wesley, 1970. Third edition, Springer-
Verlag, Undergraduate Texts in Mathematics, 1987.
[Law63]
Bill Lawvere. Functorial semantics of algebraic theories. Proceedings of the National Academy of
Sciences of the United States of America, 50(1):869-872, 1963.
[Law64]
Bill Lawvere. An elementary theory of the category of sets. Proceedings of the National
Academy of Sciences of the United States of America, 52:1506-1511, 1964.
[Law66]
William Lawvere. The category of categories as a foundation for mathematics. In Eilenberg et al.
[EHMLR66], pages 1-20.
[Law68a]
Bill Lawvere. Diagonal arguments and cartesian closed categories. In Peter Hilton, editor,
Category Theory, Homology Theory and their Applications II, number 92 in Lecture Notes in
Mathematics, pages 134-145. Springer-Verlag, 1968.
[Law68b]
Bill Lawvere. Some algebraic problems in the context of functorial semantics of algebraic
structures. In Saunders Mac Lane, editor, Reports of the Midwest Category Seminar II,
number 61 in Lecture Notes in Mathematics, pages 41-61. Springer-Verlag, 1968.
[Law69]
Bill Lawvere. Adjointness in foundations. Dialectica, 23:281-296, 1969.
[Law70]
Bill Lawvere. Equality in hyperdoctrines and the comprehension schema as an adjoint functor. In
Alex Heller, editor, Applications of Categorical Algebra, number 17 in Proceedings of Symposia
in Pure Mathematics, pages 1-14. American Mathematical Society, 1970.
[Law71]
Bill Lawvere. Quantifiers and sheaves. In Actes du Congrès International des Mathématiciens,
volume 1, pages 329-334. Gauthier-Villars, 1971.
[Law72]
Bill Lawvere, editor. Toposes, Algebraic Geometry, and Logic, number 274 in Lecture Notes in
Mathematics. Springer-Verlag, 1972.
[Law73]
Bill Lawvere. Metric spaces, generalised logic, and closed categories. In Rendiconti del
Seminario Matematico e Fisico di Milano, volume 43. Tipografia Fusi, Pavia, 1973.
[LMW75]
Bill Lawvere, Christian Maurer, and Gavin Wraith, editors. Model Theory and Topoi, number
445 in Lecture Notes in Mathematics. Springer-Verlag, 1975.
[LS97]
Bill Lawvere and Stephen Schanuel. Conceptual Mathematics: a First Introduction to
Categories. Cambridge University Press, 1997.
[LS81]
Daniel Lehmann and Michael Smyth. Algebraic specifications of data types: a synthetic
approach. Mathematical Systems Theory, 14:97-139, 1981.
[Lei90]
Daniel Leivant. Contracting proofs to programs. In Odifreddi [Odi90], pages 279-327.
[Lew18]
Clarence Lewis. A Survey of Symbolic Logic. University of California Press, 1918. Republished
by Dover, 1960.
[Lin71]
Carl Linderholm. Mathematics made Difficult. Wolfe, London, 1971.
[Lin69]
Fred Linton. An outline of functorial semantics. In Eckmann [Eck69], pages 7-52.
[Luk51]
Jan Łukasiewicz. Aristotle's Syllogistic from the Standpoint of Modern Formal Logic. Oxford
University Press, 1951. Second edition, 1963.
[Luk63]
Jan Łukasiewicz. Elements of Mathematical Logic. Pergamon Press, 1963. Translated from
Polish by Olgierd Wojtasiewicz.
[Luk70]
Jan Łukasiewicz. Selected Works. Studies in Logic and the Foundations of Mathematics.
North-Holland, 1970. Edited by Ludwik Borkowski.
[Luo92]
Zhaohui Luo. A unifying theory of dependent types: the schematic approach. In Anil Nerode and
Mikhail Taitslin, editors, Logical Foundations of Computer Science (Logic at Tver '92), number
620 in Lecture Notes in Computer Science, pages 293-304. Springer-Verlag, 1992.
[MW91]
Malcolm MacCallum and Francis Wright. Algebraic Computing with REDUCE: lecture notes
from the First Brazilian School on Computer Algebra. Oxford University Press, 1991.
[MLB67]
Saunders Mac Lane and Garrett Birkhoff. Algebra. MacMillan, New York, 1967. Second edition,
1979.
[ML71]
Saunders Mac Lane. Categories for the Working Mathematician . Number 5 in Graduate Texts in
Mathematics. Springer-Verlag, 1971.
[ML81]
Saunders Mac Lane. Mathematical models: a sketch for the philosophy of mathematics.
American Mathematical Monthly, 88:462-472, 1981.
[ML79]
Saunders Mac Lane. Selected Papers. Springer-Verlag, 1979. Edited by Irving Kaplansky.
[ML86]
Saunders Mac Lane. Mathematics, Form and Function. Springer-Verlag, 1986.
[ML88]
Saunders Mac Lane. Categories and concepts in perspective. In Peter Duren, Richard Askey, and
Uta Merzbach, editors, A Century of Mathematics in America, volume 1, pages 323-365.
American Mathematical Society, 1988. Addendum in volume 3, pages 439-441.
[MLM92]
Saunders Mac Lane and Ieke Moerdijk. Sheaves in Geometry and Logic: a First Introduction to
Topos Theory. Universitext. Springer-Verlag, 1992.
[MR77]
Michael Makkai and Gonzalo Reyes. First Order Categorical Logic: Model-Theoretical Methods
in the Theory of Topoi and Related Categories. Number 611 in Lecture Notes in Mathematics.
Springer-Verlag, 1977.
[Mak87]
Michael Makkai. Stone duality for first order logic. Advances in Mathematics, 65:97-170, 1987.
[MP87]
Michael Makkai and Andrew Pitts. Some results on locally finitely presentable categories.
Transactions of the American Mathematical Society, 299:473-496, 1987.
[MP90]
Michael Makkai and Robert Paré. Accessible Categories: the Foundations of Categorical Model
Theory. Number 104 in Contemporary Mathematics. American Mathematical Society, 1990.
[Mak93]
Michael Makkai. The fibrational formulation of intuitionistic predicate logic. Notre Dame
Journal of Formal Logic, 34:334-7 and 471-498, 1993.
[Mak96]
Michael Makkai. Avoiding the axiom of choice in category theory. Journal of Pure and Applied
Algebra, 108:109-173, 1996.
[Mak97]
Michael Makkai. First order logic with dependent sorts. ftp.math.mcgill.ca, 1997.
[Mal71]
Anatolii Mal'cev. The Metamathematics of Algebraic Systems. Collected Papers 1936-67.
Number 66 in Studies in Logic and the Foundations of Mathematics. North-Holland, 1971.
Edited by Benjamin Wells.
[Man98]
Paolo Mancosu. From Brouwer to Hilbert: the Debate on the Foundations of Mathematics in the
1920s. Oxford University Press, 1998.
[Man76]
Ernest Manes. Algebraic Theories. Number 26 in Graduate Texts in Mathematics. Springer-
Verlag, 1976.
[Mar98]
Francisco Marmolejo. Continuous families of coalgebras. Journal of Pure and Applied Algebra,
130:197-215, 1998.
[ML75]
Per Martin-Löf. An intuitionistic theory of types: Predicative part. In Harvey Rose and John
Shepherdson, editors, Logic Colloquium '73, number 80 in Studies in Logic and the Foundations
of Mathematics, pages 73-118. North-Holland, 1975.
[ML84]
Per Martin-Löf. Intuitionistic Type Theory. Bibliopolis, Naples, 1984.
[MR73]
Adrian Mathias and Hartley Rogers, editors. Cambridge Summer School in Mathematical Logic.
Number 337 in Lecture Notes in Mathematics. Springer-Verlag, 1973.
[McC67]
Storrs McCall. Polish Logic, 1920-1939. Oxford University Press, 1967.
[MF87]
Ralph McKenzie and Ralph Freese. Commutator Theory for Congruence Modular Varieties.
Number 125 in London Mathematical Society Lecture Notes. Cambridge University Press, 1987.
[MMT87]
Ralph McKenzie, George McNulty, and Walter Taylor. Algebras, Lattices, Varieties. Wadsworth
and Brooks, 1987.
[McL92]
Colin McLarty. Elementary Categories, Elementary Toposes. Number 21 in Logic Guides.
Oxford University Press, 1992.
[MNPS91]
Dale Miller, Gopalan Nadathur, Frank Pfenning, and Andre Scedrov. Uniform proofs as a
foundation for logic programming. Annals of Pure and Applied Logic, 51:125-137, 1991.
[Mit65]
Barry Mitchell. Theory of Categories. Number 17 in Pure and applied mathematics. Academic
Press, 1965.
[MS93]
John Mitchell and Andre Scedrov. Notes on sconing and relators. In Egon Börger, Gerhard Jäger,
Hans Büning, and Michael Richter, editors, Computer Science Logic '92, number 702 in Lecture
Notes in Computer Science, pages 352-378. Springer-Verlag, 1993.
[Mit96]
John Mitchell. Foundations for Programming Languages. MIT Press, 1996.
[Mog91]
Eugenio Moggi. Notions of computation and monads. Information and Computation, 93:55-92,
1991.
[Mon66]
Richard Montague. Fraenkel's addition to the axioms of Zermelo. In Bar-Hillel et al.
[BHPRR66], pages 91-114.
[Moo82]
Gregory Moore. Zermelo's Axiom of Choice: its Origins, Development, and Influence. Number 8
in Studies in the History of Mathematics and Physical Science. Springer-Verlag, 1982.
[MS55]
John Myhill and John Shepherdson. Effective operations on partial recursive functions. Zeitschrift
für Mathematische Logik und Grundlagen der Mathematik, pages 310-317, 1955.
[NST93]
Peter Neumann, Gabrielle Stoy, and Edward Thompson. Groups and Geometry. Oxford
University Press, 1993.
[Nie82]
Susan Niefield. Cartesianness: Topological spaces, uniform spaces and affine varieties. Journal
of Pure and Applied Algebra, 23:147-167, 1982.
[Noe83]
Emmy Noether. Gesammelte Abhandlungen. Springer-Verlag, 1983. Edited by Nathan Jacobson.
[NPS90]
Bengt Nordström, Kent Petersson, and Jan Smith. Programming in Martin-Löf's Type Theory: an
Introduction. Number 7 in International Series of Monographs on Computer Science. Oxford
University Press, 1990.
[Obt89]
Adam Obtulowicz. Categorical and algebraic aspects of Martin-Löf type theory. Studia Logica,
48:299-318, 1989.
[Odi89]
Piergiorgio Odifreddi. Classical Recursion Theory: the Theory of Functions and Sets of Natural
Numbers. Number 125 in Studies in Logic and the Foundations of Mathematics. North-Holland,
1989.
[Odi90]
Piergiorgio Odifreddi, editor. Logic and Computer Science. Number 31 in APIC Studies in Data
Processing. Academic Press, 1990.
[Osi74]
Gerhard Osius. Categorical set theory: a characterisation of the category of sets. Journal of Pure
and Applied Algebra, 4:79-119, 1974.
[Par76]
David Park. The Y-combinator in Scott's lambda-calculus models. Research Report CS-RR-013,
Department of Computer Science, University of Warwick, June 1976. Revised, 1978.
[Pas65]
Boris Pasynkov. Partial topological products. Transactions of the Moscow Mathematical Society,
13:153-271, 1965.
[Pau87]
Lawrence Paulson. Logic and Computation: Interactive proof with Cambridge LCF. Number 2 in
Cambridge Tracts in Theoretical Computer Science. Cambridge University Press, 1987.
[Pau91]
Lawrence Paulson. ML for the Working Programmer. Cambridge University Press, 1991. Second
edition, 1996.
[Pau92]
Lawrence Paulson. Designing a theorem prover. In Samson Abramsky et al., editors, Handbook
of Logic in Computer Science, pages 415-475. Oxford University Press, 1992.
[Pau94]
Lawrence Paulson. Isabelle: a Generic Theorem Prover. Number 828 in Lecture Notes in
Computer Science. Springer-Verlag, 1994.
[Pav90]
Duško Pavlović. Predicates and Fibrations. PhD thesis, Rijksuniversiteit Utrecht, 1990.
[Pav91]
Duško Pavlović. Constructions and predicates. In Pitt et al. [PCA+91], pages 173-197.
[Pea89]
Giuseppe Peano. Arithmetices Principia, Nova Methodo Exposita. Fratres Bocca, Turin, 1889.
English translation, ``The Principles of Arithmetic, presented by a new method,'' in [vH67], pages
20-55.
[Pea73]
Giuseppe Peano. Selected Works of Giuseppe Peano. Toronto University Press, 1973. Translated
and edited by Hubert Kennedy.
[Pei33]
Charles Sanders Peirce. Collected Papers. Harvard University Press, 1933. Edited by Charles
Hartshorne and Paul Weiss.
[Pie91]
Benjamin Pierce. Basic Category Theory for Computer Scientists. Foundations of Computing.
MIT Press, 1991.
[PRD+89]
David Pitt, David Rydeheard, Peter Dybjer, Andrew Pitts, and Axel Poigné, editors. Category
Theory in Computer Science III, number 389 in Lecture Notes in Computer Science. Springer-
Verlag, 1989.
[PCA+91]
David Pitt, Pierre-Louis Curien, Samson Abramsky, Andrew Pitts, Axel Poigné, and David
Rydeheard, editors. Category Theory in Computer Science IV, number 530 in Lecture Notes in
Computer Science. Springer-Verlag, 1991.
[Pit89]
Andrew Pitts. Non-trivial power types can't be subtypes of polymorphic types. In Logic in
Computer Science IV, pages 6-13. IEEE Computer Society Press, 1989.
[PT89]
Andrew Pitts and Paul Taylor. A note on Russell's Paradox in locally cartesian closed categories.
Studia Logica, 48:377-387, 1989.
[Pit95]
Andrew Pitts. Categorical logic. Technical Report 367, University of Cambridge Computer
Laboratory, May 1995.
[PD97]
Andrew Pitts and Peter Dybjer, editors. Semantics and Logics of Computation, Publications of
the Newton Institute. Cambridge University Press, 1997.
[Plo77]
Gordon Plotkin. LCF considered as a programming language. Theoretical Computer Science,
5:223-255, 1977.
[Plo81]
Gordon Plotkin. Domain theory. Post-graduate lecture notes, known as the Pisa Notes;
ftp.lfcs.ed.ac.uk, 1981.
[Poh89]
Wolfram Pohlers. Proof Theory: an Introduction. Number 1407 in Lecture Notes in
Mathematics. Springer-Verlag, 1989.
[Pol45]
George Polya. How to Solve It: a New Aspect of Mathematical Method. Princeton University
Press, 1945. Republished by Penguin, 1990.
[Pra65]
Dag Prawitz. Natural Deduction: a Proof-Theoretical Study. Number 3 in Stockholm Studies in
Philosophy. Almquist and Wiskell, 1965.
[Pra77]
Dag Prawitz. Meanings and proofs: on the conflict between classical and intuitionistic logic.
Theoria, 43:2-40, 1977.
[Qui60]
Willard van Orman Quine. Word and Object. Studies in Communication. MIT Press, 1960.
[Ram31]
Frank Ramsey. Foundations of Mathematics. Kegan Paul, 1931.
[RS63]
Helena Rasiowa and Roman Sikorski. The Mathematics of Metamathematics. Number 41 in
Monografie Matematyczne. Polish Scientific Publishers, 1963.
[Ras74]
Helena Rasiowa. An Algebraic Approach to Non-classical Logics. Number 78 in Studies in Logic
and the Foundations of Mathematics. North-Holland, 1974.
[Rey83]
John Reynolds. Types, abstraction and parametric polymorphism. In Richard Mason, editor,
Information Processing, pages 513-524. North-Holland, 1983.
[Rey84]
John Reynolds. Polymorphism is not set-theoretic. In Gilles Kahn, David MacQueen, and Gordon
Plotkin, editors, Semantics of Data Types, number 173 in Lecture Notes in Computer Science,
pages 145-156. Springer-Verlag, 1984.
[RP93]
John Reynolds and Gordon Plotkin. On functors expressible in the polymorphic lambda calculus.
Information and Computation, 105:1-29, 1993.
[Rey98]
John Reynolds. Theories of Programming Languages. Cambridge University Press, 1998.
[Ric56]
Gordon Rice. Recursive and recursively enumerable orders. Transactions of the American Mathematical Society, 1956.
[Rob66]
Abraham Robinson. Non-standard Analysis. Studies in Logic and the Foundations of
Mathematics. North-Holland, Amsterdam, 1966.
[Rob82]
Derek Robinson. A Course in the Theory of Groups. Number 80 in Graduate Texts in
Mathematics. Springer-Verlag, 1982. Second edition, 1996.
[RR88]
Edmund Robinson and Giuseppe Rosolini. Categories of partial maps. Information and
Computation, 79:95-130, 1988.
[Rus03]
Bertrand Russell. The Principles of Mathematics. Cambridge University Press, 1903.
[Rus08]
Bertrand Russell. Mathematical logic based on the theory of types. American Journal of
Mathematics, 30:222-262, 1908. Reprinted in [vH67], pages 150-182.
[RW13]
Bertrand Russell and Alfred North Whitehead. Principia Mathematica. Cambridge University
Press, 1910-13.
[RB88]
David Rydeheard and Rod Burstall. Computational Category Theory. Prentice-Hall, 1988.
[Sam48]
Pierre Samuel. On universal mappings and free topological groups. Bulletin of the American
Mathematical Society, 54:591-598, 1948.
[SS82]
Andre Scedrov and Philip Scott. A note on the Friedman slash and Freyd covers. In Troelstra and
van Dalen [TvD82], pages 443-452.
[Sch93]
Andrea Schalk. Domains arising as algebras for power space constructions. Journal of Pure and
Applied Algebra, 1993.
[Sch67]
Joseph Shoenfield. Mathematical Logic. Addison-Wesley, 1967.
[Sco66]
Dana Scott. More on the axiom of extensionality. In Bar-Hillel et al. [BHPRR66], pages 115-139.
[Sco70a]
Dana Scott. Constructive validity. In Michel Laudet, D. Lacombe, L. Nolin, and
M. Schützenberger, editors, Automatic Demonstration, number 125 in Lecture Notes in
Mathematics, pages 237-275. Springer-Verlag, 1970.
[Sco70b]
Dana Scott. Outline of a mathematical theory of computation. In Information Sciences and
Systems, pages 169-176. Princeton University Press, 1970.
[Sco76]
Dana Scott. Data types as lattices. SIAM Journal on Computing, 5:522-587, 1976.
[Sco79]
Dana Scott. Identity and existence in intuitionistic logic. In Fourman et al. [FMS79], pages 660-
696.
[SS70]
J. Arthur Seebach and Lynn Arthur Steen. Counterexamples in Topology. Holt, Rinehart and
Winston, 1970. Republished by Springer-Verlag, 1978 and by Dover, 1995.
[See84]
Robert Seely. Locally cartesian closed categories and type theory. Mathematical Proceedings of
the Cambridge Philosophical Society, 95:33-48, 1984.
[See87]
Robert Seely. Categorical semantics for higher order polymorphic lambda calclus. Journal of
Symbolic Logic, 52:969-989, 1987.
[See89]
Robert Seely. Linear logic, *-autonomous categories and cofree algebras. In Gray and Scedrov
[GS89], pages 371-382.
[Ser71]
Jean-Pierre Serre. Représentations Linéaires des Groupes Finis. Hermann, 1971. English translation, ``Linear Representations of Finite Groups,'' Springer-Verlag, Graduate Texts in Mathematics, 42, 1977.
[Sha91]
Stewart Shapiro. Foundations without Foundationalism: a Case for Second-order Logic.
Number 17 in Logic Guides. Oxford University Press, 1991.
[Sko22]
Thoralf Skolem. Einige Bemerkungen zur axiomatischen Begründung der Mengenlehre. In
Skandinaviska matematikenkongressen, pages 217-232. Akademiska Bokhandeln, 1922. English
translation, ``Some Remarks on Axiomatized Set Theory'' in [vH67], pages 290-301.
[Sko70]
Thoralf Skolem. Selected Works in Logic. Universitetsforlaget, Oslo, 1970. Edited by Jens Erik
Fenstad.
[Ste67]
Norman Steenrod. A convenient category of topological spaces. Michigan Mathematics Journal,
14:133-152, 1967.
[Sto37]
Marshall Stone. Applications of the theory of Boolean rings to general topology. Transactions of
the American Mathematical Society, 41:375-481, 1937.
[SW73]
Ross Street and Robert Walters. The comprehensive factorization of a functor. Bulletin of the
American Mathematical Society, 79:936-941, 1973.
[Str80]
Ross Street. Cosmoi of internal categories. Transactions of the American Mathematical Society,
258:271-318, 1980.
[Str91]
Thomas Streicher. Semantics of Type Theory: Correctness and Completeness of a Categorical
Semantics of the Calculus of Constructions . Progress in Theoretical Computer Science.
Birkhäuser, 1991. His 1988 Passau Ph.D. thesis.
[Str69]
Dirk Struik. A Source Book in Mathematics, 1200-1800. Harvard University Press, 1969.
[Sty69]
N. I. Styazhkin. History of Mathematical Logic from Leibniz to Peano. MIT Press, 1969.
Translated from Russian, originally published by Nauka, Moscow, 1964.
[Tai75]
William Tait. A realizability interpretation of the theory of species. In Rohit Parikh, editor, Logic
Colloquium, pages 240-251. Springer-Verlag, 1975.
[Tak75]
Gaisi Takeuti. Proof Theory. Number 81 in Studies in Logic and the Foundations of
Mathematics. North-Holland, 1975. Second edition, 1987.
[Tar56]
Alfred Tarski. Logic, Semantics, Metamathematics. Oxford University Press, 1956. Edited by J.
H. Woodger.
[Tay86a]
Paul Taylor. Internal completeness of categories of domains. In David Pitt, editor, Category
Theory and Computer Programming, number 240 in Lecture Notes in Computer Science, pages
449-465. Springer-Verlag, 1986.
[Tay86b]
Paul Taylor. Recursive Domains, Indexed Category Theory and Polymorphism. PhD thesis,
Cambridge University, 1986.
[Tay87]
Paul Taylor. Homomorphisms, bilimits and saturated domains - some very basic domain theory.
ftp.dcs.qmw.ac.uk, 1987.
[Tay88]
Paul Taylor. The trace factorisation of stable functors. 1988.
[Tay89]
Paul Taylor. Quantitative domains, groupoids and linear logic. In Pitt et al. [PRD+89], pages 155-
181.
[Tay90]
Paul Taylor. An algebraic approach to stable domains. Journal of Pure and Applied Algebra,
64:171-203, 1990.
[Tay91]
Paul Taylor. The fixed point property in synthetic domain theory. In Gilles Kahn, editor, Logic in
Computer Science 6, pages 152-160. IEEE Computer Society Press, 1991.
[Tay96a]
Paul Taylor. Intuitionistic sets and ordinals. Journal of Symbolic Logic, 61:705-744, 1996.
[Tay96b]
Paul Taylor. On the general recursion theorem. 1996.
[Tay98]
Paul Taylor. An abstract stone duality, I: Geometric and higher order logic. 1998. In preparation.
[Ten81]
Robert Tennent. Principles of Programming Languages. Prentice-Hall, 1981.
[Ten91]
Robert Tennent. Semantics of Programming Languages. International Series in Computer
Science. Prentice-Hall, 1991.
[Tie72]
Miles Tierney. Sheaf theory and the continuum hypothesis. In Lawvere [Law72], pages 13-42.
[Tro69]
Anne Sjerp Troelstra. Principles of Intuitionism. Number 95 in Lecture Notes in Mathematics.
Springer-Verlag, 1969.
[Tro77]
Anne Sjerp Troelstra. Choice Sequences: a Chapter of Intuitionistic Mathematics. Logic Guides.
Oxford University Press, 1977.
[TvD82]
Anne Sjerp Troelstra and Dirk van Dalen, editors. L. E. J. Brouwer Centenary Symposium,
number 110 in Studies in Logic and the Foundations of Mathematics. North-Holland, 1982.
[TvD88]
Anne Sjerp Troelstra and Dirk van Dalen. Constructivism in Mathematics, an Introduction.
Number 121 and 123 in Studies in Logic and the Foundations of Mathematics. North-Holland,
1988.
[TS96]
Anne Sjerp Troelstra and Helmut Schwichtenberg. Basic Proof Theory. Number 43 in Cambridge
Tracts in Theoretical Computer Science. Cambridge University Press, 1996.
[Tur35]
Alan Turing. On computable numbers with an application to the Entscheidungsproblem.
Proceedings of the London Mathematical Society (2), 42:230-265, 1935.
[vD80]
Dirk van Dalen. Logic and Structure. Universitext. Springer-Verlag, 1980. Second edition, 1983.
[vdW31]
Bartel van der Waerden. Moderne Algebra. Ungar, 1931. Fifth edition, Springer-Verlag, 1960;
English translation by Fred Blum and John Schulenberger, ``Algebra,'' Springer-Verlag, 1971.
[vH67]
Jan van Heijenoort, editor. From Frege to Gödel: a Source Book in Mathematical Logic, 1879-
1931. Harvard University Press, 1967. Reprinted 1971, 1976.
[Vic88]
Steven Vickers. Topology via Logic. Number 5 in Cambridge Tracts in Theoretical Computer
Science. Cambridge University Press, 1988.
[vN61]
John von Neumann. Collected Works. Pergamon Press, 1961. Edited by A. H. Taub.
[Web80]
Judson Chambers Webb. Mechanism, Mentalism and Metamathematics: an Essay on Finitism.
Number 137 in Synthese Library. Reidel, 1980.
[Wel96]
Charles Wells. The Handbook of Mathematical Discourse. http://www-math.cwru.edu/~cfw2/abouthbk.htm, 1996.
[Wey19]
Hermann Weyl. Der Circulus vitiosus in der heutigen Begründung der Analysis. Jahresbericht der
Deutschen Mathematiker-Vereinigung, 28:85-92, 1919. English translation, ``The Continuum: a
Critical Examination of the Foundations of Analysis'' by Stephen Pollard and Thomas Bole,
published by Thomas Jefferson University Press, 1987, reprinted by Dover, 1993.
[Wey68]
Hermann Weyl. Gesammelte Abhandlungen. Springer-Verlag, 1968. Edited by Komaravolu Chandrasekharan.
[ZS58]
Oscar Zariski and Pierre Samuel. Commutative Algebra. Van Nostrand, 1958. Reprinted by
Springer-Verlag, Graduate Texts in Mathematics, numbers 28-9, 1975.
[Zer08a]
Ernst Zermelo. Neuer Beweis für die Möglichkeit einer Wohlordnung. Mathematische Annalen,
65:107-128, 1908. English translation, ``New proof that every set can be well ordered'' in [vH67],
pages 183-198.
[Zer08b]
Ernst Zermelo. Untersuchungen über die Grundlagen der Mengenlehre I. Mathematische
Annalen, 65:261-281, 1908. English translation, ``Investigations in the foundations of set theory''
in [vH67], pages 199-215.
22 August 2000
First, I apologise to Heinrich Kleisli, Dito Pataraia, Maria Cristina Pedicchio, Dietmar Schumacher and
V. Zöberlein for my mistakes in their names on pages 179, 403, 533, 540, 563, 566 and 572.
Introduction
● p. x before Acknowledgements: topics in the mechanics of symbolic logic using the methods of
category theory.
● p. xi: Ruth Horner died the very week that the book went to press, and Doris Wilson died during
the following year.
● p. 4, Remark 1.1.1: The simplest operation on trees is substitution of another term for a variable.
A copy of the expression-tree for the new term is made for each occurrence of the variable, and
attached to the tree in its place. If there are many occurrences, the new term and its own variables
are proliferated, but if there are none the new term disappears.
● p. 7, Lemma 1.1.5: correctly forbids x to be free in a, because the substitution [a/x]*u is meant to
result in a term that doesn't involve x. See Definition 4.3.11(b) for why.
● p. 9, Definition 1.1.9: In fact ... In this book we make (not make make) no systematic distinction.
● p. 17, Definition 1.2.10(a) (broken sentence structure): by at most one thing, so that any two (∀)
solutions are equal (cf. Example 1.8.2):
● p. 34, Remark 1.5.9: so the sign (positive or negative) of the influence of φ depends on whether it
lies behind an odd or even number of implications.
● p. 37, Lemma 1.6.3, proof box, line 6 (significantly wrong symbol): should be ∃y.γ∧φ[y] in
the left-hand box; for clarity, I have put (γ∧φ)∨… and …∨(γ∧ψ) in the right-hand box too.
● p. 60, Exercise 1.1: She found a stick or a pebble for (to ``name'') each individual sheep, and
moved it from a pile outside the pen to another inside. Any one object (not necessarily the
sheep's own ``name''). [Hint: cf Exercise 3.63.] (I now think that the Schröder-Bernstein theorem
is the proof of this, which will of course add fuel to Peter Johnstone's fire.)
● pp 60-1, Exercise 1.5: Show how to express a Turing machine as a system of reduction rules.
● p. 128: Example 3.1.6(c): The composite of two monotone functions (or of two antitone ones) is
again monotone.
● p. 132: Example 3.2.5(h) (misleading remark): Example 6.6.3(f) shows that ℑ ⊂ {⊥,⊤} can only
be regarded as finite if it's a complemented subset.
● p. 143, Definition 3.4.10 (misleading remark): The type 2/Ω of truth values is playing a
topological role here, in which the point ⊤ is open and ⊥ is closed. As such, 2/Ω is called the
Sierpinski space and will be written S. As with R, in this book we shall avoid questions that rely
on the intuitionistic nature of this space (see the footnote on page 502). (The remark about being
``intermediate'' is misleading, as 2/Ω is indeed the set of points of the Sierpinski space
intuitionistically.)
● p. 144, intro to Section 3.5 (significantly wrong word): the sum of posets or dcpos.
● p. 160, second paragraph of Remark 3.7.12: Cf [Con76, p. 66] (This was essentially the point of
John Conway's ``Mathematicians' Liberation Movement''.)
● p. 162: "Modal logic has medieval and even ancient roots" belongs after Definition 3.8.2.
● p. 243, Definition 4.8.15(c) (significantly wrong word): the right end of the first cell is the left
end of the second.
● p. 288, Lemma 5.7.3: use W instead of K, for just this lemma, as it's about coequalisers in general
rather than kernels. Conversely, given equal W ⇉ B → Θ, apply orthogonality.
● p. 290, diagram for Lemma 5.7.6(e) (significantly wrong symbol): (German) f;m and z;n instead
of f;n and z;m.
● p. 295, Remark 5.8.4(e): The lifting property is not unique, but there's no room to insert this.
6 Structural Recursion
● p. 342, Remark 6.5.6: [Knu68, vol. 1, pp. 353-5] explains how to store the equivalence relation.
● p. 346, Example 6.6.3(f): a subset of a finite set is finite iff it is complemented; cf. Example 3.2.5(h).
● p. 346, after Remark 6.6.4: the ambiguity in the usage of the word ``law'' mentioned in Definition
1.2.2.
● p. 350, Corollary 6.6.12: {⊥,⊤} is a join-semilattice; the unique join-homomorphism taking all
singletons to ⊤ maps everything else there except ∅.
● p. 358, Remark 6.7.14: It is probably true in the concrete case of ordinals in Pos and Set that a
sh-coalgebra is well founded in the sense of Definition 6.3.2 iff ≺ is a well founded relation
(Definition 2.5.3). Inability to formulate the abstract result for ordinals in A and C where there is
an adjunction F ⊣ U is the reason why I have not finished [Tay96b].
● p. 362, Exercise 6.23 (significantly wrong symbol): TΘ instead of T(Γ×Θ) at the top right of the
diagram.
7 Adjunctions
● p. 380, proof of Theorem 7.2.2[b⇒c]: Putting B = FX; µ = U·ε·F.
9 The Quantifiers
● p. 472 Notation 9.1.3: mark on plate near ``Note that nothing we did in Chapter VIII''.
● p. 480, Example 9.2.5, last paragraph: I find this example very confusing as the main one used to
demonstrate.
● p. 489, Remark 9.3.3, last line of the dotted proof box (significantly wrong symbol): [a/x, b/y]*f:
θ.
● p. 491, Definition 9.3.6: the slanted pullback symbol has been made parallel with the relevant
map.
● p. 502, Example 9.4.11(d) (misleading remark): Although ⊥ ∈ 2/Ω doesn't classify subsets in
Set intuitionistically, in Sp the closed point of the Sierpinski space does classify closed subsets.
Any point of S can be expressed as the join of a directed diagram taking only ⊥ and ⊤ as values,
whilst (the dual of) the equation in Exercise 9.57 characterises support classifiers [Tay98].
● pp 505, after Corollary 9.4.16 (misleading treatment): Beware that, whilst our approach to the
Beck-Chevalley condition does ensure that pullbacks of >-|>-maps are again >-|>-maps, such
pullbacks need not always exist in the category of locally compact spaces.
● p. 506, Section 9.5: In fact the formal rules also suggest that we should view comprehension as
forming types or contexts from contexts and not types.
● p. 519, Proposition 9.6.13[c⇒]: The infinitary version of Example 2.1.7 (rather than of its
converse Exercise 2.14).
● p. 523, Exercise 9.4 (the one and only falsity): Thomas Streicher sent me a simple
counterexample to the claim that fibrations preserve pullbacks. I have replaced the exercise with
Let C and S be categories with pullbacks and P: C → S a functor that preserves them. Suppose
that P ⊣ T with P·T = id_S. Show that P is a fibration. Find a fibration of posets that
Bibliography
● [BHPRR66] Essays on the foundations of mathematics.
19 August 2006
The following corrections should be made to the 2000 reprint, which itself corrected the mistakes listed above.
6 Structural Recursion
● p. 342, Remark 6.5.6: It should explain how the equivalence class is stored.
7 Adjunctions
● p. 412, Theorem 7.6.15: The word Corollary should have a capital C.
9 The Quantifiers
● p. 478, Proposition 9.2.3: ... and whose morphisms [plural] act as the identity ...
Bibliography
● Check date of Bart Jacobs' thesis.
● p. 543, [Law66] Bill Lawvere, for consistency.
University of St Andrews
Practical Foundations of Mathematics, Paul Taylor, Cambridge Studies in Advanced Mathematics 59,
CUP 1999, ISBN 0-521-63107-6, hardback, pp xi + 572.
This is a fascinating and rewarding book, an ``account of the foundations of mathematics (algebra) and
theoretical computer science, from a modern constructive viewpoint''. It is intended for ``students and
teachers of computing, mathematics and philosophy''.
Mathematicians are now rarely interested in the foundations of their subject, either because (they think)
the foundations impinge little on their own specialisms, or because they appear too restrictive, or (even
worse) because their justification seems to be philosophical rather than mathematical. For many,
Zermelo-Fraenkel set theory (including a choice axiom, usually in the form of Zorn's lemma) seems to
be adequate, with occasional appeals to a set/class distinction, the continuum hypothesis or large
cardinal axioms if apparently required. The underlying logic should of course be classical, following
Hilbert and his enthusiasm for Cantor's paradise rather than Brouwer and his allegedly obscurantist
views. In particular, it has been argued by some and felt by many that constructive mathematics is too
limitative: that the results are too weak and some of the proofs are too difficult.
Several events led to the need for an alternative point of view. The shift began with the 1967 work of
Bishop on constructive mathematics (especially on real analysis, later extended [2] in collaboration with
Bridges), continued with the work of Grothendieck, Lawvere et al on topos theory [5], followed by the
work [6] of Martin-Löf on predicative constructive type theory (both for its own sake, as a firm
foundation for mathematics, and for its role in software development) and the work of logicians and
computer scientists, such as Abramsky, Aczel, Hyland, Jung, Pitts, Plotkin and Scott, on the semantics
[7] of programming languages, in particular the theory of domains and the theory of non-well-founded
sets [1]. One key idea here is that what mathematicians have traditionally done, and one thing that
computer scientists need to do, is to construct algorithms as solutions to problems. So, what then is the
right foundation for this constructional activity? Constable [4], for example, points out that, for
formalisation of automata theory, including decidability results, one requires a formal theory that
includes primitive notions of computability, in order to avoid presupposing the very subject being
introduced.
Taylor's answer, like Constable's, is a version of intuitionistic type theory, based on constructive logic,
i.e. the standard intuitionistic restriction of classical logic by non-assertion of the law of excluded middle.
Bishop and Bridges showed that in such a framework one can develop a lot of useful analysis, including
for example an adequate theory of integration, constructive analogues to fixed point theorems and a
theory of Hilbert spaces. Note in particular the 1997 work (see [3]) of Richman and Bridges giving a
constructive proof of Gleason's Theorem on measures on the closed subspaces of a Hilbert space. (It had
been argued by some philosophers that this theorem was constructively invalid and that this was fatal for
the constructivist programme: but the constructively invalid result was in fact just a classical
reformulation of the theorem.)
The present book's focus is, instead, on the algebra (in a broad sense, including lattices, posets and
category theory) that can be developed on such a foundation. (But then, the category theory can also be
used as a foundation in turn, and that is what the present book is really about.) The book began ``as the
prerequisites for another book (Domains and Duality)'', which it is hoped will appear in due course. The
book's chapters cover first-order reasoning, types and induction, posets and lattices, cartesian closed
categories, limits and colimits, structural recursion, adjunctions, algebra with dependent types and the
quantifiers.
Chapter 1, on ``First Order Reasoning'', is typically non-standard and interesting: many topics are
covered here that are left out of conventional foundational treatments, such as the difference between
equations and reduction rules, the theory of descriptions and heuristics for proof discovery. The
emphasis is on natural deduction, for its closer correspondence with actual reasoning than the use of
Frege-Hilbert systems (with lots of logical axioms but few inference rules): it also happens to
correspond better with constructive logic than with classical logic, for which the unnatural rule of
reductio ad absurdum is required. There is a section on automated deduction, including a welcome
treatment of the important topic of uniform proofs, the logical basis for logic programming in languages
like Prolog (which, incidentally, is based on constructive rather than classical logic). Unification is
covered carefully, without alas any remark to the effect that it was advocated for automated deduction
by Herbrand and Prawitz before popularised by Robinson's theory of resolution (1965). The final
section, on classical and intuitionistic logic, includes a section on the axiom of choice: it is not clear how
this, involving as it does existential quantification over relations, fits in a chapter on first-order logic, at
least not in the absence of some set theory.
Chapter 2, on ``Types and induction'', presents a constructive version of type theory, embellished for
allegedly historical reasons with the name of Zermelo. This leads to the Curry-Howard analogy between
propositions and types, according to which every logical connective is also a type constructor
(e.g. implication corresponds to the function type operator). Induction and recursion, including structural
induction on lists, are studied carefully: the chapter concludes with higher-order logic, including the
second-order polymorphic lambda calculus of Girard. There is a discussion of the interesting
phenomenon that there is no continuous ``square root'' function defined on the unit circle in the complex
plane, despite Brouwer's view that all such functions are continuous. (This view is not shared by all
constructivists: [2] for example argues that it depends on extra-mathematical considerations. One way
out is, as Taylor observes, to distinguish between e.g. the real numbers and Cauchy sequences, their
representatives.)
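The propositions-as-types reading mentioned above can be made concrete in any typed functional language. The following sketch (in Haskell, which is not the notation of the book under review) uses the standard identifications: implication as the function type, conjunction as the pair type, disjunction as Either; each total, well-typed definition is then a proof of the proposition its type expresses.

```haskell
-- Curry-Howard in miniature: types as propositions, programs as proofs.

-- (A ∧ B) → A : first projection
proj1 :: (a, b) -> a
proj1 (x, _) = x

-- A → (B → A) : weakening (the K combinator)
weaken :: a -> b -> a
weaken x _ = x

-- ((A → B) ∧ A) → B : modus ponens is just function application
mp :: (a -> b, a) -> b
mp (f, x) = f x

-- (A ∨ B) → (B ∨ A) : commutativity of disjunction, by case analysis
orComm :: Either a b -> Either b a
orComm (Left x)  = Right x
orComm (Right y) = Left y
```

Note that there is no combinator of type `Either a b -> a`: a "proof" of A ∨ B does not in general yield a proof of A, which is exactly the constructive reading of disjunction.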
Chapter 3, on ``Posets and Lattices'', begins on the metamathematical study of the semantics of the type
theory considered so far, with the simplest case: posets (i.e. categories where any two arrows with the
same source and target are equal). There is a double purpose here: the use of posets such as Lindenbaum
algebras generated by the provability relation between propositions, and the use of ordered sets
(e.g. Scott domains) ``to illustrate many of the phenomena of reasoning, especially about non-terminating
computation'': in other words, as a semantics for programming languages. Chapter 4, on ``Cartesian
Closed Categories'', takes this further: propositions generalise to types, provability to proofs and the
categories of the title thus arise as models of the simply typed lambda calculus, a primitive version of
constructive type theory. The remaining chapters develop this point of view, culminating in dependent
type theory (where types can be parametrised by variables ranging over some simpler type) as required
for both mathematics and software specification. There are splendid examples of (octagonal) commuting
diagrams, that test to their limit the LATEX macros devised by the author (and for which other authors
have reason to be exceedingly grateful). This reviewer is amused that some of his 1970s work on partial
products, abandoned in the 1980s as unfruitful, turns out to be a key concept in the semantics of
dependent type theory. Finally, there is a brief discussion of the much neglected Axiom of Replacement.
Almost every section has historical asides, e.g. that the ``Sheffer stroke'' was discovered at least as early
as 1764 by Ploucquet. Incidentally, the Sheffer stroke is an extreme instance of a minimalist tendency, to
reduce everything to as few primitives as possible: but electronic engineers willingly use devices other
than ``nand gates'', and it is no longer fashionable to try to reduce all of mathematics to logic, despite the
resurgence of logicism in some philosophical circles. This tendency led to an over-emphasis on classical
logic and thus to failure to observe the important connections, extensively studied in this book, between
(intuitionistic) logic and type theory.
Each chapter has several pages of subtle, provocative and imaginative exercises, varying from the trivial
to unsolved research problems. The book finishes with an excellent lengthy bibliography and a decent
index.
The book has several weaknesses. First, by focusing on algebra rather than analysis, it gives heart to
those who, despite the evidence from (e.g.) [2] still insist that constructive mathematics is inadequate for
real mathematics. Second, the word ``Practical'' in the title needs more careful justification: the
foundations are indeed those that are needed in practice for applications of mathematics in,
e.g. computer science: but the connection with practical algebra, in the form of algorithms to solve
algebraic problems, in group theory or ring theory for example, is not made sufficiently clear. Third,
there are (of course) mathematical errors, but this is not the place to consider them all. (A list is
promised for the web site.) As a first example, there is a failure to distinguish carefully between truth
and validity, e.g. in the explanation (p 43) of the preservation of validity by logical rules (``i.e. whenever
the premises are true, so is the conclusion'' [replace ``true'' by ``valid'']) and in the definition, as ``proof
and truth ... coincide'', of completeness of a proof theory (no, completeness is when provability coincides
with validity, i.e. truth in every interpretation). That this matters is shown up by Gödel's incompleteness
theorem, that provability in any formalisation of classical first-order arithmetic (or, equivalently,
arithmetic validity, i.e. truth in all models of the formalisation) is not equivalent to truth in the standard
model. A second example is the suggested proof (p.120) that the Axiom of Choice, AC, implies the Law
of Excluded Middle, LEM. According to the constructive meaning of ``exists'' the presented version of
AC is an easy theorem [6] of predicative constructive type theory, where LEM fails to be valid. So the
``proof'' depends on something additional such as the (non-constructive) use of power sets or
equivalence classes. Finally, this reviewer found it at first odd and then irritating that most cited authors
were given familiar forenames, particularly when they were not used consistently, and were sometimes
quite unfamiliar. The family and friends of Luitzen Egbertus Jan Brouwer would have been surprised to see him
referred to as ``Jan'' rather than as ``Bertus''.
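The second of the reviewer's examples refers to what is usually called Diaconescu's theorem. As an aside not drawn from the book under review, the standard argument shows exactly where the extra strength enters: the sets $A$ and $B$ below are carved out of $\{0,1\}$ by separation, which is the power-set-style step unavailable in predicative type theory.

```latex
% Sketch of Diaconescu's argument: choice plus separation yields LEM.
% Fix a proposition $\varphi$ and form the inhabited subsets
\[
  A = \{\, x \in \{0,1\} \mid x = 0 \lor \varphi \,\}, \qquad
  B = \{\, x \in \{0,1\} \mid x = 1 \lor \varphi \,\}.
\]
% A choice function $f$ gives $f(A) \in A$ and $f(B) \in B$.
% Equality on $\{0,1\}$ is decidable, so
\[
  f(A) = f(B) \;\Longrightarrow\; \varphi, \qquad
  f(A) \neq f(B) \;\Longrightarrow\; \lnot\varphi,
\]
% the first because $f(A) = 0$ and $f(B) = 1$ cannot then both hold, the
% second because $\varphi$ would force $A = B$ and hence $f(A) = f(B)$.
% Either way, $\varphi \lor \lnot\varphi$.
```

In Martin-Löf's type theory the analogues of $A$ and $B$ cannot be formed, which is why the type-theoretic axiom of choice is a theorem there while excluded middle is not.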
Despite these imperfections, the book has many strengths. Its main strengths are its breadth, its use of
examples from an amazing spread of mathematics and its history, its exercises and its coverage of key
ideas in categorical logic. In summary, it is a magnificent compilation of ideas and techniques: it is a
mine of (well-organised) information suitable for the graduate student and experienced researcher alike.
Novice graduate students will however need a lot of help in staying afloat. A copy should be at least in
every university library (if only one could decide whether the mathematicians, the computer scientists,
the logicians, the philosophers or perhaps even the linguists should pay for it): experts in the field will
want their own copies. Further details (e.g. chapter outlines) may be found at the author's website:
http://www.dcs.qmw.ac.uk/~pt. There is even an HTML version of the book itself:
book-lovers will be pleased to hear that it is far from adequate, in the absence so far of adequate HTML
representations of mathematical diagrams.
References
[1]
Aczel, P., Non-well-founded sets, CSLI Lecture Notes 14, Stanford 1988.
[2]
Bishop, E. & Bridges, D., Constructive Analysis, Springer-Verlag, 1985.
[3]
Bridges, D., Can constructive mathematics be applied in physics?, Journal of Philosophical Logic
28, 1999, pp 439-453.
[4]
Constable, R. L., Formalising Decidability Theorems About Automata, in ``Computational
Logic'' (eds. U. Berger & H. Schwichtenberg), Springer-Verlag, 1999, pp 179-213.
[5]
Johnstone, P., Topos Theory, Academic Press, 1977.
[6]
Martin-Löf, P., Intuitionistic Type Theory, Bibliopolis (Naples), 1984.
[7]
Pitts, A. & Dybjer, P., Semantics and Logics of Computation, CUP, 1997.
In 1971, Carl Linderholm published a book entitled Mathematics Made Difficult (Wolfe Publishing Ltd.;
Zbl 217,1), in which he attempted to satirize the way in which (as he saw it) `abstract nonsense' was
taking over the foundations of mathematics and making them incomprehensible. Nearly 30 years later,
Paul Taylor has finally written the book of which Mathematics Made Difficult was a parody. That is not
intended as a criticism of Practical Foundations of Mathematics; the reviewer has little sympathy with
Linderholm's rather heavy-handed `humour', and is of the opinion that Taylor has written a splendid and
highly enjoyable book. But the parallel between the two books is inescapable.
Taylor's book is both didactic and descriptive: it attempts to explain how mathematicians and
informaticians (the latter being the author's appealing term for theoretical computer scientists) should
view the foundations on which their work is based, but at the same time it attempts to tease out the
logical structure underpinning the informal way in which mathematicians actually argue --- both today
and in previous eras of history. It is this latter feature which makes it very different from most books on
the foundations of mathematics, but at the same time brings it closer to Linderholm's satire. (To take two
examples, the detailed description of the history of the algorithm for solving the general cubic equation,
and its presentation as a program in the semi-formal language developed by the author (p. 198), remind
one irresistibly of the tribulations of M. Boulangiaire in chapter 3 of Mathematics Made Difficult. And
the author's concern for Bo Peep's difficulties in counting her sheep, evidenced by Exercise 1.1 on p. 60,
might have been lifted straight from Linderholm's discussion of whether one can use the same number-
system for adding and for counting.)
Mention of exercises is a reminder that this book is, at least partly, intended as a textbook. What sort of
students could benefit from courses based on it? The author's own suggestion is that `the first three
chapters should be accessible to final year undergraduates in mathematics and informatics'; this is
probably true, provided the mathematics undergraduates have rather more familiarity with programming
languages (and the informaticians have more familiarity with non-discrete mathematics) than is usual.
(And both groups would find the exercises pretty hard going.) It is, above all, beginning graduate
students in both disciplines who will find this book most useful; it will prompt them to think seriously
about the foundations of their subjects, and the relations between them, in a way that no other existing
book (known to the reviewer) can achieve. Indeed, if it succeeds in becoming widely used and quoted
(as it deserves to do), then it may bring about an altogether new level of understanding, by each of the
two groups, of the way in which the other group thinks about the subject it studies.
At the heart of the book, not surprisingly, is category theory. It is therefore a little unexpected not to find
a chapter headed `Introduction to Categories' or something of the sort; Chapter III on posets and lattices
is immediately followed by a chapter entitled `Cartesian Closed Categories'. However, readers who are
newcomers to category theory should not despair: Chapter IV does largely consist of an introduction to
category theory (albeit a rather more condensed one than that found in most textbooks on category
theory), and this together with Chapter V on `Limits and Colimits' will serve to introduce such readers to
all the important concepts of the subject (except, rather oddly, for adjunctions, which are held back until
Chapter VII, although adjoint functors between posets have been treated in Chapter III).
The author's other main theme is structural recursion, which forms the title of Chapter VI but which, like
the categorical notion of adjoint functor, actually pervades the whole book. Chapter VI itself is a tour of
various aspects of recursive definitions: free algebras, the general recursion theorem (formulated, in the
style of Gerhard Osius, as a theorem about well-founded coalgebras), tail recursion, and Kuratowski-
finiteness. But the thing which links all of these to each other, and to the categorical ideas which are
omnipresent in the foundations of mathematics, is the notion of the syntactic category of a theory and
the syntax/semantics adjunction. These topics are covered more fully in the last two chapters, which
introduce the notions of dependent types, fibrations and the categorical notion of quantification.
Although, in these later chapters, the going inevitably gets tougher (as befits the subject-matter), the
author's style remains user-friendly without becoming imprecise: a student who works through to the
end of the book will (rightly) feel a real sense of achievement, but there is no reason why he shouldn't
get there if he perseveres.
The book's collection of references is splendidly eclectic; Taylor is extremely good at pointing out the
(sometimes surprising) sources of ideas that most of us take for granted, and at finding apposite
quotations to support his argument. Indeed, the reviewer suspects that many readers will gain more
enjoyment from reading the footnotes than they do from the text of this book. (The reviewer's favourite
footnote, a rare unattributed example, is on page 192.) Taylor's insistence on spelling out the forenames
of cited authors, whenever he knows them, may at first seem irritating to a reader brought up on the
tradition of initials-plus-surname, but one soon gets used to it. Indeed, by the time one reaches the
Bibliography, one is tempted to wonder what crimes Godfrey Harold Hardy can have committed in
Taylor's eyes, to cause him to be reduced to `G.H.'. (In this respect, if in no other, Taylor's book is
inferior to Linderholm's!)
[ P.T.Johnstone (Cambridge) ]
MSC 1991:
*00A05 General mathematics
18-01 Textbooks (category theory)
03-01 Textbooks (mathematical logic)
06-01 Textbooks (ordered structures)
Zentralblatt MATH,
© 2000 European Mathematical Society, FIZ Karlsruhe & Springer-Verlag.