Program = Proof
Contents

0 Introduction
0.1 Proving instead of testing
0.2 Typing as proving
0.3 Checking programs
0.4 Checking proofs
0.5 Searching for proofs
0.6 Foundations
0.7 In this course
0.8 Other references on programs and proofs
0.9 About this document

2 Propositional logic
2.1 Introduction
2.1.1 From provability to proofs
2.1.2 Intuitionism
2.1.3 Formalizing proofs
2.1.4 Properties of logical systems
2.2 Natural deduction
2.2.1 Formulas
2.2.2 Sequents
2.2.3 Inference rules
2.2.4 Intuitionistic natural deduction
2.2.5 Proofs
2.2.6 Fragments
2.2.7 Admissible rules
2.2.8 Definable connectives
2.2.9 Equivalence
2.2.10 Structural rules
2.2.11 Substitution
2.3 Cut elimination
2.3.1 Cuts
2.3.2 Proof substitution
2.3.3 Cut elimination
2.3.4 Consistency
2.3.5 Intuitionism
2.3.6 Commutative cuts
2.4 Proof search
2.4.1 Reversible rules
2.4.2 Proof search
2.5 Classical logic
2.5.1 Axioms for classical logic
2.5.2 The intuition behind classical logic
2.5.3 A variant of natural deduction
2.5.4 Cut-elimination in classical logic
2.5.5 De Morgan laws
2.5.6 Boolean models
2.5.7 DPLL
2.5.8 Resolution
2.5.9 Double-negation translation
2.5.10 Intermediate logics
2.6 Sequent calculus
2.6.1 Sequents
2.6.2 Rules
2.6.3 Intuitionistic rules
2.6.4 Cut elimination
2.6.5 Proof search
2.7 Hilbert calculus
2.7.1 Proofs
2.7.2 Other connectives
2.7.3 Relationship with natural deduction
2.8 Kripke semantics

6 Agda
6.1 What is Agda?
6.1.1 Features of proof assistants
6.1.2 Installation
6.2 Getting started with Agda
6.2.1 Getting help
6.2.2 Shortcuts
6.2.3 The standard library
6.2.4 Hello world
6.2.5 Our first proof
6.2.6 Our first proof, step by step
6.2.7 Our first proof, again
6.3 Basic Agda
6.3.1 The type of types
6.3.2 Arrow types
6.3.3 Functions
6.3.4 Postulates
6.3.5 Records
6.3.6 Modules
6.4 Inductive types: data
6.4.1 Natural numbers
6.4.2 Pattern matching
6.4.3 The induction principle
6.4.4 Booleans
6.4.5 Lists
6.4.6 Options
6.4.7 Vectors
6.4.8 Finite sets
6.5 Inductive types: logic
6.5.1 Implication
6.5.2 Product
6.5.3 Unit type
6.5.4 Empty type
6.5.5 Negation
6.5.6 Coproduct
6.5.7 Π-types
6.5.8 Σ-types
6.5.9 Predicates
6.6 Equality
6.6.1 Equality and pattern matching

A Appendix
A.1 Relations
A.1.1 Definition
A.1.2 Closure
A.1.3 Quotient
A.1.4 Congruence
A.2 Monoids
A.2.1 Definition
A.2.2 Free monoids
A.3 Well-founded orders
A.3.1 Partial orders
A.3.2 Well-founded orders
A.3.3 Lexicographic order
A.3.4 Trees
A.3.5 Multisets
A.4 Cantor's diagonal argument
A.4.1 A general Cantor argument
A.4.2 Agda formalization
Chapter 0
Introduction
These are the extended notes for the INF551 course which I taught at École
Polytechnique starting from 2019. The goal is to give a first introduction to the
Curry-Howard correspondence between programs and proofs, from a theoretical
programmer’s perspective: we want to understand the theory behind logic and
programming languages, but also to write concrete programs (in OCaml) and
proofs (in Agda). Although most of the material is self-contained, the reader is
supposed to be already acquainted with logic and programming.
On day four, he gets quite confident and conjectures that, for every n ∈ N,
∫₀^∞ ∏_{i=0}^{n} sin(t/(100i+1)) / (t/(100i+1)) dt = π/2
He then spends the rest of the year heating his computer and the planet, successfully proving the conjecture for increasing values of n. This approach seems to be justified since the most complicated function involved here is the sine, which is quite regular (it is periodic), and all the constants are small (we get factors such as 100), so that if something bad were to happen, it should happen for a not-so-big value of n, and testing should discover it. In fact, the conjecture breaks starting at
n = 15 341 178 777 673 149 429 167 740 440 969 249 338 310 889
and none of the usual tests would have found this out. There is a nice explanation for this which we will not give here, see [BB01, Bae18], but the moral is: if you want to be sure of something, don't test it, prove it.
On the computer science side, analogous examples abound where errors have been found in programs, even heavily tested ones. They have recently increased with the advent of parallel computing (for instance, in order to exploit all the cores that you have on your laptop or even your smartphone), where bugs might be triggered by some particular and very rare scheduling of processes. Already in the 70s, Dijkstra was claiming that "program testing can be used to show the presence of bugs, but never to show their absence!" [Dij70], and the idea of formally verifying programs can even be traced back 20 years before that to, as usual, Turing [Tur49]. If we want to have software we can really trust (and not trust most of the time), we should move from testing to proving in computer science too.
In this course, you will learn how to perform such proofs, as well as the theory behind them. Actually, the most complicated program we will prove correct here is a sorting algorithm, and I can already hear you thinking "come on, we have been writing sorting algorithms for decades, we should know how to write one by now". While I understand your point, I have two arguments to offer. Firstly, proving a more realistic program is only a matter of time (and experience): the course covers most of the concepts required to perform proofs, and attacking full-fledged code will not require new techniques, only patience. Secondly, in 2015, some researchers found out, using formal methods, that the default sorting algorithm (the one in the standard library, not some obscure library found on GitHub) of both Python and Java (not some obscure programming languages) was flawed, and the bug had been sitting there for more than a decade [dGdBB+19]...
a formula, and the program itself contains exactly the information required to
prove this formula. This is the one thing to remember from this course:
PROGRAM = PROOF
This deep relationship allows the use of techniques from mathematics in order to study programs, but it can also be used to extract computational content from proofs in mathematics.
The goal of this course is to give a precise meaning to this vague description,
but let us give an example in order to understand it better. In a functional
language such as OCaml, we can write a function such as
let comp f g x = g (f x)
and the compiler will automatically infer a type for it. Here, it will be
('a -> 'b) -> ('b -> 'c) -> ('a -> 'c)
meaning that for any types 'a, 'b and 'c,
– if f is a function which takes a value of type 'a as argument and returns a value of type 'b,
– if g is a function which takes a value of type 'b as argument and returns a value of type 'c, and
– if x is a value of type 'a,
then the result is of type 'c. For instance, with the function succ of type
int -> int (it adds one to an integer), and the function string_of_int of type
int -> string (it converts an integer to a string), the expression
comp succ string_of_int 2
will be of type string (it will evaluate to "3"). Now, if we read -> as a logical
implication ⇒, the type can be written as
(A ⇒ B) ⇒ (B ⇒ C) ⇒ (A ⇒ C)
which is a valid formula. This is not by chance: in some sense, the program
comp can be considered as a way of proving that this formula is valid.
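To make this concrete, here is a minimal runnable check of the example above, reusing the definitions from the text; it prints the result "3" announced earlier:

let comp f g x = g (f x)
let () = print_endline (comp succ string_of_int 2)  (* prints "3" *)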
Of course, if we want to prove richer properties of programs (or use programs
to prove more interesting formulas), we should use a logic which is more expres-
sive than propositional logic. In this course, we will present dependent types
which achieve this, while keeping the proof-as-program correspondence. For
instance, euclidean division, which computes the quotient and remainder of two integers, is usually given the type
int -> int -> int * int
stating that it takes two integers as arguments and returns a pair of integers.
This typing is very weak, in the sense that there are many different functions
which also have this type. With dependent types, we will be able to give it the
type
(m : int) → (n : int) → Σ(q : int).Σ(r : int).((m = nq + r) × (r < n))
which can be read as the formula
∀m ∈ int.∀n ∈ int.∃q ∈ int.∃r ∈ int.((m = nq + r) ∧ (r < n))
and entirely specifies its behavior.
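To illustrate how weak the first type is, here is one possible implementation with that type; nothing in the type records that the returned pair satisfies m = nq + r and r < n (a sketch of ours, not the implementation used later in the book):

(* Euclidean division at the weak type int -> int -> int * int:
   the type does not distinguish this from, say, fun _ _ -> (0, 0). *)
let div m n = (m / n, m mod n)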
[Figure: the trade-off between automation and expressiveness in formal methods, ranging from abstract interpretation (AbsInt, Astrée, ...) through Hoare logic (Why3, ...) to proof assistants (Agda, Coq, ...).]
that spaces are the same as strict ∞-categories in which all morphisms are weakly invertible (don't worry if you do not precisely understand all the terms in this sentence). A few years later, this was shown to be wrong because someone provided a counter-example [Sim98], but no one could exactly point out what the mistake in the original proof was. Because of this, Voevodsky thought for more than 20 years that his proof was still correct. Understanding that there was indeed a mistake led him to use proof assistants for all his proofs and, in fact, to propose a new foundation for mathematics based on logic, which is nowadays called homotopy type theory [Uni13]. Quoting him [Voe14]:
I now do my mathematics with a proof assistant and do not have to
worry all the time about mistakes in my arguments or about how to
convince others that my arguments are correct.
But I think that the sense of urgency that pushed me to hurry with
the program remains. Sooner or later computer proof assistants will
become the norm, but the longer this process takes the more misery
associated with mistakes and with unnecessary self-verification the
practitioners of the field will have to endure.
As a much simpler example, suppose that we want to prove that all horses
have the same color (sic). We show by induction on n the property P (n) =
“every set of n horses is monochromatic”. For n = 0 and n = 1, the property is
obvious. Now suppose that P (n) holds and consider a set H of n + 1 horses. We
can figure H as a big set, in which we can pick two distinct elements (horses)
h1 and h2 and consider the sets H1 = H \ {h2} and H2 = H \ {h1}:

[Figure: the set H with the two overlapping subsets H1 and H2, obtained by removing h2 and h1 respectively.]
By induction hypothesis all the horses in H1 have the same color and all the
horses in H2 have the same color. Therefore, by transitivity, all the horses in H
have the same color. Of course this proof is not valid, because we all know that
there are horses of various colors (can you spot the mistake?). Formalizing the
proof in a proof assistant will force you to fill in all the details, thus removing
the possibility for potential errors in vague arguments, and will ensure that the
arguments given are actually valid, so that flaws such as in the above proof
will be uncovered. This is not limited to small reasoning: large and important
proofs have been fully checked in Coq for instance, such as the four color theo-
rem [Gon08] in graph theory or the Feit-Thompson theorem [GAA+13] in group
theory.
simple enough fragments of logic (e.g. propositional logic) this can be done: proof theory allows one to carefully design efficient new proof search procedures. For richer logics, provability quickly becomes undecidable. However, modern proof assistants (e.g. Coq) have so-called tactics which can fill in some specific proofs, even though the logic is rich. Typically, they are able to take care of showing boring identities such as (x + y) − x = y in abelian groups.
Understanding proof theory allows one to formulate problems in a logical fashion and solve them. It thus applies to various fields, even outside theoretical computer science. For instance, McCarthy, a founder of Artificial Intelligence (the name is due to him!), was a strong advocate of using mathematical logic to represent and manipulate knowledge [McC60]. Neural networks are admittedly more fashionable these days, but one never knows what the future will be made of.
Although we will see some proof search techniques in this course, this will
not be a central subject. The reason for this is that the main message is that
we should take proofs seriously: since a proof is the same as a program, we are
not interested in provability, but rather in proofs themselves, and proof search
techniques give us very little control over the proofs they produce.
0.6 Foundations
At the beginning of the 20th century, some annoying paradoxes surfaced in
mathematics, such as Russell’s paradox, motivating Hilbert’s program to provide
an axiomatization on which all mathematics could be founded and show that
this axiomatization is consistent: this is sometimes called the foundational crisis.
Although Gödel’s incompleteness theorems established that there is no definite
answer to this question, various formalisms have been proposed in which one
can develop most of usual mathematics. One of the most widely used is set theory, as axiomatized by Zermelo and Fraenkel, but other formalisms have been proposed, such as Russell's theory of types [WR12], from which current type theory originates: in fact, type theory can be taken as a foundation of mathematics. People usually see set theory as being more fundamental, since we see a type as representing a set (e.g. A ⇒ B is the set of functions from the set interpreting A to the one interpreting B), but we can also view type theory as being more fundamental, since we can formalize set theory in type theory. The one you take as foundations is a matter of taste: are you more into chickens or into eggs?
Type theory also provides a solid framework in which one can study basic
philosophical questions such as: What is reasoning? What is a proof? If I know
that something exists do I know it? What does it mean for two things to be
equal? and so on. We could spend pages discussing those matters (and others
have done so), but we rather like to formalize things, and we will see that very
satisfactory answers to those questions can be given with a few inference rules.
The meaning of life remains an open question, though.
By taking an increasingly important part in our lives and influencing the way we see the (mathematical) world, it has even evolved for some of us into some sort of religion based on computational trinitarianism, which stems from the following correspondence:

[Figure: a triangle relating categories (at the top) with logic and programming (at the bottom).]
The aim of the present text is to explain the bottom line of the above diagram and leave categories to other books [Mac71, LS88, Jac99]. Another closely related religion is constructivism, a doctrine according to which something can be accepted only if it can actually be constructed. It will play a central role here, because programs precisely constitute a means to describe the construction of things.
Reading on the beach. A printed copy of this course can be ordered from Amazon: https://www.amazon.fr/dp/B08C97TD9G/.
Color of the cover. In case you wonder, the color of the cover was chosen because
it seemed obvious to me that
Code snippets. Most of the code shown in this book is excerpted from larger files which are regularly compiled in order to ensure their correctness. The process of extracting snippets for inclusion into LaTeX is automated with a tool whose code is freely available at https://github.com/smimram/snippetor.
Chapter 1
Typed functional programming

1.1 Introduction
As an illustration of typed functional programming, we present here the OCaml programming language, which was developed by Leroy and collaborators, following ideas from Milner. We recall some of the basics of the language, both because it will be used to provide illustrative implementations, and because we will detail the theory behind it and generalize it in later chapters. This is not meant to be a complete introduction to programming in OCaml: advanced courses and documentation can be found on the website http://ocaml.org/, as well as in books [CMP00, MMH13].
After a brief tour of the language, we present the most important constructions in section 1.2 and detail recursive types, which are the main way of constructing types throughout the book, in section 1.3. In section 1.4, we present the ideas behind the typing system and the guarantees it brings. Finally, we illustrate how types can be thought of as formulas in section 1.5.
1.1.1 Hello world. The mandatory “Hello world!” program, which prints Hello
world!, can be written as follows:
(* Our first program. *)
print_endline "Hello, world!"
This illustrates the concise syntax of the language (compared to Java for instance). Comments are written using (* ... *). Application of a function to arguments does not require parentheses. Indentation is not significant in programs (contrary to Python for instance), but you are of course strongly encouraged to indent your programs correctly.
1.1.5 Other features. There are some other important features of the OCaml language that we mention only briefly here, because we will not use them much.
Other traits. In addition to the functional programming style, OCaml has support for many other styles of programming, including imperative style (e.g. the references described above), objects, etc. OCaml also has support for records, arrays, modules, generalized algebraic data types, etc.
1.2.2 Functions. Functions are also defined using let definitions, specifying the arguments after the name of the function. For instance,
let add x y = x + y
which is of type
int -> int -> int
Note that arrows are implicitly bracketed on the right: this type means
int -> (int -> int)
Application of a function to arguments is obtained by juxtaposing the function
and the arguments, e.g.
let x = add 3 4
(no need for parentheses). There is support for partial application, meaning that we do not have to give all the arguments to a function (such functions are sometimes called curried). For instance, the incrementation of an integer can be defined by
let incr = add 1
The value incr thus defined is the function which takes an argument y and
returns add 1 y, so that the above definition is equivalent to
let incr y = add 1 y
This is in accordance with the bracketing of the type above: add is a function which, when given an integer argument, returns a function of type int -> int.
As mentioned above, anonymous functions can be defined by the construc-
tion fun x -> .... The add function could thus have equivalently been defined
by
let add = fun x y -> x + y
or even
let add x = fun y -> x + y
Functions can be recursive, meaning that they can call themselves. In this case,
the rec keyword has to be used. For instance the factorial function is defined
by
let rec fact n =
if n = 0 then 1 else n * fact (n - 1)
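As a quick sanity check (a small sketch of ours, reusing the definition above):

let () = print_int (fact 5)  (* prints 120 *)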
1.2.3 Booleans. The type corresponding to booleans is bool, its two values
being true and false. The usual operators are present: conjunction &&, dis-
junction ||, and negation not. In order to compare whether two values x and
y are equal or different, one should use x = y and x <> y. They can be used in
conditional branchings
if ... then ... else ...
or loops
while ... do ... done
as illustrated above.
Beware that the operators == and != also exist, but they compare values physically, i.e. they check whether the two values have the same memory location, not whether they have the same contents. For instance, using the toplevel, we have:
# let x = ref 0;;
val x : int ref = {contents = 0}
# let y = ref 0;;
val y : int ref = {contents = 0}
# x = x;;
- : bool = true
# x = y;;
- : bool = true
# x == x;;
- : bool = true
# x == y;;
- : bool = false
1.2.4 Products. The pair of x and y is written x,y. For instance, we can consider the pair 3,"hello" which has the product type int * string (it is a pair consisting of an integer and a string). Note that addition could have been defined as
let add' (x,y) = x + y
resulting in a slightly different function than above: it now has the type
(int * int) -> int
meaning that it takes one argument, which is a pair of integers, and returns an integer. This means that partial application is not directly available as before, although we could still write
let incr' = fun y -> add' (1, y)
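The two styles are interconvertible; a small sketch (the names curry and uncurry are ours, they are not in the standard library):

(* Convert between the paired and the curried form of a two-argument function. *)
let curry f x y = f (x, y)
let uncurry f (x, y) = f x y
let incr'' = curry add' 1  (* partial application becomes available again *)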
1.2.5 Lists. We quite often use lists in OCaml. The empty list is written [], and x::l is the list obtained by putting the value x in front of the list l. Most expected functions on lists are available in the module List.
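For instance, a short sketch (these functions all exist in the standard List module):

let l = 1 :: 2 :: 3 :: []   (* the list [1; 2; 3] *)
let n = List.length l       (* 3 *)
let l' = List.map succ l    (* [2; 3; 4] *)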
1.2.6 Strings. Strings are written as "this is a string" and the related func-
tions can be found in the module String. For instance, the function String.length
computes the length of a string and String.sub computes a substring (at given
indices) of a string. Concatenation is obtained by ^.
1.2.7 Unit. In OCaml, the type unit contains only one element, written (). As explained above, this is the value returned by functions which only have an effect and return no meaningful value (e.g. printing a string). It is also quite useful for defining functions having an effect. For instance, if we define
let f = print_string "hello"
the program will write "hello" at the beginning of the execution, because the expression defining f is evaluated. However, if we define
let f () = print_string "hello"
nothing will be printed, because we define a function taking a unit as argument. Later in the program, we can then use f () in order to print "hello".
1.3.1 Trees. As a first example, consider trees (more specifically, planar binary trees with integer labels) such as

      3
     / \
    4   1
   / \
  1   3
     / \
    5   2
Here, a tree consists of finitely many nodes which are labeled by an integer and
can either have two sons, which are themselves trees, or none (in which case
they are called leaves). This description translates immediately to the following
type definition in OCaml:
type tree =
| Node of int * tree * tree
| Leaf of int
This says that a tree is recursively characterized as being Node applied to a
triple consisting of an integer and two trees or Leaf applied to an integer. For
instance, the above tree is represented as
let t = Node (3, Node (4, Leaf 1, Node (3, Leaf 5, Leaf 2)), Leaf 1)
Here, Node and Leaf are not functions (Leaf 1 does not reduce to anything): they are called constructors. By convention, constructors have to begin with a capital letter, in order to distinguish them from functions (which have to begin with a lowercase letter).
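Functions on trees are then defined by pattern matching on these constructors; for instance, a small sketch (the function name height is ours, not from the book):

(* The height of a tree: leaves have height 0. *)
let rec height t =
  match t with
  | Leaf _ -> 0
  | Node (_, t1, t2) -> 1 + max (height t1) (height t2)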
When a function directly matches on its argument, OCaml allows the shortcut
let f = function ...
instead of
let f x = match x with ...
(this shortcut was introduced because it is very common to directly match on the argument of a function).
1.3.2 Usual recursive types. It is interesting to note that many (most) usual
types can be encoded as recursive types.
Booleans. The type bool of booleans could have been defined as the recursive type
type bool =
| True
| False
although OCaml chose not to do that for performance reasons. A case construction
if b then e1 else e2
could then be encoded as
match b with
| True -> e1
| False -> e2
Coproducts. We have seen that the elements of a product type 'a * 'b are pairs x,y consisting of an element x of type 'a and an element y of type 'b. We can define coproducts, consisting of an element of type 'a or an element of type 'b, by
type ('a, 'b) coprod =
| Left of 'a
| Right of 'b
An element of this type is namely of the form Left x with x of type 'a, or Right y with y of type 'b. For instance, we can define a function which provides the string representation of a value which is either an integer or a float by
let to_string = function
| Left n -> string_of_int n
| Right x -> string_of_float x
which is of type (int, float) coprod -> string.
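As a quick illustration of the definitions above (a sketch):

let () =
  print_endline (to_string (Left 3));     (* prints 3 *)
  print_endline (to_string (Right 3.14))  (* prints 3.14 *)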
Unit. The type unit has () as its only value. It could have been defined as
type unit =
| T
with () being a notation for T.
The set associated to tree is intuitively the smallest set X ⊆ U which is closed
under adding nodes and leaves, i.e. such that F (X) = X, provided that such a
set exists. Such a set X satisfying F (X) = X is called a fixpoint of F .
In order to be able to interpret the type of trees as the smallest fixpoint of F ,
we should first show that such a fixpoint indeed exists. A crucial observation in
order to do so is the fact that the function F : P(U) → P(U) is monotone, in the sense that, for X, Y ⊆ U,
X ⊆ Y implies F(X) ⊆ F(Y).
F(fix(F)) = fix(F)
fix(F) ⊆ X

fix(F) = F(fix(F)) ⊆ X
fix(F) = ⋃_{n∈N} Fⁿ(∅)
i.e. the fixpoint can be obtained by iterating F from the empty set. In the case
of trees,
F⁰(∅) = ∅
F¹(∅) = {Leaf(n) | n ∈ N}
F²(∅) = {Leaf(n) | n ∈ N} ∪ {Node(n, t₁, t₂) | n ∈ N and t₁, t₂ ∈ F¹(∅)}
and more generally, Fⁿ(∅) is the set of trees of height strictly below n. The theorem states that any tree is a tree of some (finite) height.
Remark 1.3.3.3. In general, there are multiple fixpoints. For instance, for the function F corresponding to trees, the set of all "trees" in which we allow an infinite number of nodes is also a fixpoint of F.
We have
X = {n ∈ N | P(n)}
The requirement F(X) ⊆ X translates as: P(0) holds, and P(n) implies P(S n). The induction principle is thus the classical recurrence principle:
Example 1.3.3.6. Consider the type empty. We have F (X) = ∅ and thus
fix(F ) = ∅. The induction principle states that any property is necessarily
valid on all the elements of the empty set:
∀x ∈ ∅.P (x)
Exercise 1.3.3.7. Define the function F associated to the type of lists. Show that it also has a greatest fixpoint, distinct from the smallest fixpoint, and provide a concrete description of it.
1.3.4 Option types and exceptions. Another quite useful recursive type
defined in the standard library is the option type
type 'a option =
| Some of 'a
| None
A value of this type is either of the form Some x, for some x of type 'a, or None. It can be thought of as the type 'a extended with the default value None, and can be used for functions which normally return a value except in some cases (in other languages such as C or Java, one would return a NULL pointer in this case). For instance, the function returning the head of a list is almost always defined, except when the argument is the empty list. It thus makes sense to implement it as the function of type 'a list -> 'a option defined by
let hd l =
match l with
| x::l -> Some x
| [] -> None
It is however quite cumbersome to use, because each time we want to use the result of this function, we have to match it in order to decide whether the result is None or not. For instance, in order to double the head of a list l of integers known to be non-empty, we still have to write something like
match hd l with
| Some n -> 2*n
| None -> 0 (* This case cannot happen *)
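An alternative is to raise an exception when the list is empty. A sketch of such a definition, consistent with the Not_found exception mentioned below (note that the standard library's List.hd raises a different exception):

let hd l =
  match l with
  | x :: _ -> x
  | [] -> raise Not_found

We can then simply write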
2 * (hd l)
to double the head of a list l. In the case where we take the head of the empty
list, the exception Not_found is raised. We can catch it with the following
construction if we need to:
try
...
with
| Not_found -> ...
Abstraction. Having a typing system is also good for abstraction: we can use a data structure without knowing the details of its implementation, or even having access to them. Taking the Queue.add function as an example again, we only know that the second argument is of type 'a queue, without any more details on this type. This means that we cannot mess with the internals of the data structure, and that the implementation of queues can be radically modified without us having to change our code.
Efficiency. Static typing can also be used to improve efficiency of compiled pro-
grams. Namely, since we know in advance the type of the values we are going to
handle, our code can be specific to the corresponding data structure, and avoid
performing some security checks. For instance, in OCaml, the concatenation function on strings can simply put the two strings together; in contrast, in a dynamically typed programming language such as Python, the concatenation function on strings first has to ensure that the arguments are strings, try to convert them to strings if they are not, and only then put them together.
Dynamic vs static. The types of programs can either be checked during the execution (the typing is dynamic) or during the compilation (the typing is static); OCaml uses the latter. Static typing has many advantages: potential errors are found very early, without having to perform tests; it can help to optimize programs; and it provides very strong guarantees on the execution of the program. The dynamic approach also has some advantages though: the code is more flexible, the runtime can automatically perform conversions between types if necessary, etc.
Weak vs strong. The typing system of OCaml is strong, which means that it ensures that the values in a type are actually of that type: there is no implicit or dynamic type conversion, no NULL pointer, no explicit manipulation of pointers, and so on. By contrast, when those requirements are not met, the typing system is said to be weak.
Polymorphism. The types in OCaml are polymorphic, which means that they can contain universally quantified variables. For instance, the identity function
let id x = x
has the type 'a -> 'a, which can also be read as the universally quantified type ∀A.A → A. This means that we can substitute any type for 'a and still get a valid type for the identity.
Principal types. A program can admit multiple types. For instance, the identity
function admits the following types
'a -> 'a or int -> int or ('a -> 'b) -> ('a -> 'b)
and infinitely many others. The first one 'a -> 'a is however “more general”
than the others because all the other types can be obtained by substituting 'a
by some type. Such a most general type is called a principal type. The type
inference of OCaml has the property that it always generates a principal type.
1.4.3 Safety. The programs which are well-typed in OCaml are safe in the
sense that types are preserved during execution and programs do not get stuck.
In order to formalize these properties, we first need to introduce a notion of
reduction, which formalizes the way programs are executed. We will first do this
on a very small (but representative) subset of the language. Most of the concepts
used here, such as reduction or typing derivation, will be further detailed in
subsequent chapters.
Reduction. The reduction relation −→ between programs is defined inductively by the following rules:

n1 + n2 −→ n    (where n is the integer sum of n1 and n2)

p1 −→ p1'                        p2 −→ p2'
───────────────────              ───────────────────
p1 + p2 −→ p1' + p2              p1 + p2 −→ p1 + p2'

n1 < n2 −→ true    (if n1 < n2)        n1 < n2 −→ false    (if n1 ⩾ n2)

p1 −→ p1'                        p2 −→ p2'
───────────────────              ───────────────────
p1 < p2 −→ p1' < p2              p1 < p2 −→ p1 < p2'

p −→ p'
──────────────────────────────────────────────
if p then p1 else p2 −→ if p' then p1 else p2
type prog =
| Val of value
| Add of prog * prog
| Lt of prog * prog
| If of prog * prog * prog
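The type value of values is elided from the text above; a minimal sketch consistent with the rules (the constructor names Int and Bool are our assumptions, and the declaration must come before prog):

(* Values are integers and booleans. *)
type value =
  | Int of int
  | Bool of bool

(* With these constructors, the program "if 3 < 2 then 5 else 1" used below
   is encoded as: *)
let p = If (Lt (Val (Int 3), Val (Int 2)), Val (Int 5), Val (Int 1))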
A typical program would thus be the encoding of "if 3 < 2 then 5 else 1" sketched above. Note that a program may reduce in several ways; for instance
5 + (5 + 4) ←− (3 + 2) + (5 + 4) −→ (3 + 2) + 9
⊢ n : int        ⊢ b : bool

⊢ p : bool    ⊢ p1 : A    ⊢ p2 : A
───────────────────────────────────
⊢ if p then p1 else p2 : A
In the second case, the reason why the program 3 + true cannot be further
reduced is that an unexpected value was provided to the sum: we were hoping
for an integer instead of the value true. We will see that the typing system
precisely prevents such situations from arising.
We write ` p : A to indicate that the program p has the type A and call it a
typing judgment. This relation is defined inductively by the rules of figure 1.3.
This means that a program p has type A when ` p : A can be derived using the
above rules. For instance, the program (1.5) has type int:
⊢ 3 : int    ⊢ 2 : int
──────────────────────
⊢ 3 < 2 : bool            ⊢ 5 : int    ⊢ 1 : int
────────────────────────────────────────────────
⊢ if 3 < 2 then 5 else 1 : int
exception Type_error
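The surrounding type checking code is elided in these notes; a sketch of a checker raising this exception, under the same assumed value type as above:

type typ = TInt | TBool

let rec typ_of p =
  match p with
  | Val (Int _) -> TInt
  | Val (Bool _) -> TBool
  | Add (p1, p2) ->
      if typ_of p1 = TInt && typ_of p2 = TInt then TInt else raise Type_error
  | Lt (p1, p2) ->
      if typ_of p1 = TInt && typ_of p2 = TInt then TBool else raise Type_error
  | If (b, p1, p2) ->
      let a = typ_of p1 in
      if typ_of b = TBool && typ_of p2 = a then a else raise Type_error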
⊢ p0 : bool    ⊢ p1 : A    ⊢ p2 : A
────────────────────────────────────
⊢ if p0 then p1 else p2 : A
Since p1 and p2 admit at most one type A by induction hypothesis, p also does.
Other cases are similar.
Safety. We are now ready to formally state the safety properties ensured for
typed programs. The first one, called subject reduction, states that the reduction
preserves typing:
Theorem 1.4.3.2 (Subject reduction). Given programs p and p0 such that p −→ p0 ,
if p has type A then p0 also has type A.
Proof. By hypothesis, we have both a derivation of p −→ p' and a derivation of ⊢ p : A. We reason by induction on the former. For instance, suppose that the last rule is

p1 −→ p1'
───────────────────
p1 + p2 −→ p1' + p2

In this case, p is of the form p1 + p2 and its typing derivation necessarily ends with

⊢ p1 : int    ⊢ p2 : int
─────────────────────────
⊢ p1 + p2 : int

By induction hypothesis, p1' also has type int, and therefore p1' + p2 has type int.
– p1 and p2 are values: in this case, they are necessarily integers and p1 + p2
reduces to their sum.
Other cases are similar.
The safety property finally states that typable programs never encounter errors,
in the sense that their execution is never stuck: typically, we will never try to
evaluate a program such as 3 + true during the reduction.
Theorem 1.4.3.4 (Safety). A program p of type A is safe: either
– p reduces to a value v in finitely many steps
p −→ p1 −→ p2 −→ · · · −→ pn −→ v
– or it admits an infinite sequence of reductions
p −→ p1 −→ p2 −→ · · ·
Limitations of typing. The typing systems (such as the one described above or
the one of OCaml) reject legit programs such as
(if true then 3 else false) + 1
which reduces to a value. Namely, the system imposes that the two branches
of a conditional branching should have the same type, which is not the case
here, even though we know that only the first branch will be taken, because the
condition is the constant boolean true. We thus ensure that typable programs
are safe, but not that all safe programs are typable. In fact, this has to be this
way since an easy reduction to the halting problem shows that the safety of
programs is undecidable as soon as the language is rich enough.
Also, the typing system does not prevent all errors from occurring during
the execution, such as dividing by zero or accessing an array out of its bounds.
This is because the typing system is not expressive enough. For instance, the
function
let f x = 1 / (x - 2)
could be given the type
{n : int | n ≠ 2} → int
which states that this function is correct as long as its input is an integer different from 2, but this is of course not a valid type in OCaml. We will see in chapters 6 and 8 that some languages do allow such rich typing, at the cost of losing type inference (but type checking remains decidable).
1.5.2 Other connectives. For now, the fragment of the logic we have is very poor (we only have implication as a connective), but the other usual connectives also have counterparts in types. For instance, a conjunction corresponds to a product type:
A ∧ B corresponds to 'a * 'b
Truth. The formula ⊤ corresponding to truth is always provable, and we expect that there is exactly one reason for which it should be true. Thus, it corresponds to the type unit, which has exactly one value ().
Falsity. The formula ⊥ corresponds to falsity and we do not expect that it can
be proved (because false is never true). We can make it correspond to the empty
type, which can be defined as a type with no constructor:
type empty = |
The formula ⊥ ⇒ A is then shown by
let empty_elim : empty -> 'a = fun x -> match x with _ -> .
(the "." is a "refutation case", meaning that the compiler should ensure that this case can never happen; it is almost never used in OCaml unless you are doing tricky stuff such as the above).
Negation ¬A then corresponds to the type 'a -> empty. For instance, we can show the contraposition principle
(A ⇒ B) ⇒ (¬B ⇒ ¬A)
by
let contr : ('a -> 'b) -> (('b -> empty) -> ('a -> empty)) =
fun f g a -> g (f a)
or A ⇒ ¬¬A by
let nni : 'a -> (('a -> empty) -> empty) = fun a f -> f a
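In the same vein, here is one more small sketch of ours, witnessing the non-contradiction principle ¬(A ∧ ¬A):

let nc : ('a * ('a -> empty)) -> empty = fun (a, f) -> f a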
Chapter 2
Propositional logic
2.1 Introduction
2.1.1 From provability to proofs. Most of you are acquainted with boolean logic, based on the booleans, which we write here 0 for false and 1 for true. In this setting, every propositional formula can be interpreted as a boolean, provided that we have an interpretation for the variables. The truth tables for the usual connectives are

A∧B | 0  1        A∨B | 0  1        A⇒B | 0  1
 0  | 0  0         0  | 0  1         0  | 1  1
 1  | 0  1         1  | 1  1         1  | 0  1

where the first argument indexes the rows and the second one the columns.
For instance, we know that the formula A ⇒ A is valid because, for whichever
interpretation of A as a boolean, the induced interpretation of the formula is 1.
We have this idea that propositions should correspond to types. Therefore, rather than as booleans, propositions should be interpreted as sets of values, and implications as functions between the corresponding sets. For instance, if we write N for a proposition interpreted as the set of natural numbers, the type N ⇒ N would correspond to the set of functions from natural numbers to themselves. We now see that the boolean interpretation is very weak: it only cares about whether sets are empty or not. For instance, depending on whether A and B are empty (∅) or non-empty (¬∅), the following table indicates whether the set A → B of functions from A to B is empty or not:

A→B | ∅   ¬∅
 ∅  | ¬∅  ¬∅
 ¬∅ | ∅   ¬∅
Reading ∅ as "false" and ¬∅ as "true", we see that we recover the usual truth table for implication. In this sense, the fact that the formula N ⇒ N is true only shows that there exists such a function; but in fact there are many such functions, and we would like to be able to reason about the various functions themselves.
2.1.2 Intuitionism. This shift from provability to proofs was initiated by the philosophical position of Brouwer, starting from the early twentieth century, called intuitionism. According to this point of view, mathematics does not consist in discovering the properties of a preexisting objective reality, but is rather a subjective mental construction, which is independent of reality and has an existence of its own, whose validity follows from the intuition of the mathematician. From this point of view
– the conjunction A ∧ B of two propositions should be seen as having both a proof of A and a proof of B: if we interpret propositions as sets, A ∧ B should not be interpreted as the intersection A ∩ B, but rather as the product A × B,
– a disjunction A∨B should be interpreted as having a proof of A or a proof
of B, i.e. it does not correspond to the union A ∪ B, but rather to the
disjoint union A t B,
– an implication A ⇒ B should be interpreted as having a way to construct
a proof of B from a proof of A,
¬¬A ⇒ A
¬¬Key ⇒ Key
does not hold (explanation borrowed from Ingo Blechschmidt). For similar
reasons, Brouwer also rejected the excluded middle
¬A ∨ A
This is done according to the following steps, resulting in the following trans-
formed formulas to be proved.
– Suppose given ε, we have to show:
Now that we have decomposed the proof in very small steps, it seems possible to give a list of all the generic rules that we are allowed to apply in a reasoning. We will do so, and will introduce a convenient formalism and notations, so that the above proof will be written as:
(when read from bottom to top, you should be able to see the precise corre-
spondence with the previous description of the proof).
2.1.4 Properties of logical systems. Once we have formalized our logical system, we should do some sanity checks. The first requirement is that it should be consistent: there is at least one formula A which is not provable (otherwise, the system would be entirely pointless). The second requirement is that proof checking should be decidable: there should be an algorithm which checks whether a proof is valid or not. In contrast, the question of deciding whether a formula is provable or not will in general not be decidable, and we do not expect to have an algorithm for that.
───────────── (ax)
Γ, A, Γ′ ⊢ A

Γ ⊢ A ⇒ B    Γ ⊢ A                        Γ, A ⊢ B
────────────────── (⇒E)                   ────────── (⇒I)
Γ ⊢ B                                     Γ ⊢ A ⇒ B

Γ ⊢ A ∧ B              Γ ⊢ A ∧ B          Γ ⊢ A    Γ ⊢ B
────────── (∧lE)       ────────── (∧rE)   ────────────── (∧I)
Γ ⊢ A                  Γ ⊢ B              Γ ⊢ A ∧ B

                                          ────── (⊤I)
                                          Γ ⊢ ⊤

Γ ⊢ ⊥
────── (⊥E)
Γ ⊢ A

Γ ⊢ A ∨ B    Γ, A ⊢ C    Γ, B ⊢ C         Γ ⊢ A             Γ ⊢ B
───────────────────────────────── (∨E)    ────────── (∨lI)  ────────── (∨rI)
Γ ⊢ C                                     Γ ⊢ A ∨ B         Γ ⊢ A ∨ B

Γ ⊢ ¬A    Γ ⊢ A                           Γ, A ⊢ ⊥
──────────────── (¬E)                     ────────── (¬I)
Γ ⊢ ⊥                                     Γ ⊢ ¬A

Figure 2.1: the rules of intuitionistic natural deduction.

In general, an inference rule has the form

Γ1 ⊢ A1    . . .    Γn ⊢ An
─────────────────────────── (2.1)
Γ ⊢ A

where the sequents Γi ⊢ Ai are the premises and the sequent Γ ⊢ A is the conclusion. Two families of rules can be distinguished:
– the elimination rules allow the use of a formula with a given connec-
tive (which is in the formula in the leftmost premise, called the principal
premise),
– the introduction rules construct a formula with a given connective.
In figure 2.1, the elimination (resp. introduction) rules are figured on the left
(resp. right) and bear names of the form (. . .E ) (resp. (. . .I )).
The axiom rule allows the use of a formula in the context Γ: supposing that a formula A holds, we can certainly prove it. This rule is the only one to really make use of the context: when read from bottom to top, all the other rules either propagate the context or add hypotheses to it, but never inspect it.
The introduction rules are the easiest to understand: they allow proving
a formula with a given logical connective from the proofs of the immediate
subformulas. For instance, (∧I ) states that from a proof of A and a proof of B,
we can construct a proof of A ∧ B. Similarly, the rule (⇒I ) follows the usual
reasoning principle for implication: if, after supposing that A holds, we can
show B, then A ⇒ B holds.
In contrast, the elimination rules allow the use of a connective. For instance,
the rule (⇒E ), which is traditionally called modus ponens or detachment rule,
says that if A implies B and A holds then certainly B must hold. The rule
(∨E ) is more subtle and corresponds to a case analysis: if we can prove A ∨ B
then, intuitively, we can prove A or we can prove B. If in both cases we can
deduce C then C must hold. The elimination rule (⊥E ) is sometimes called ex
falso quodlibet or the explosion principle: it states that if we can prove false
then the whole logic collapses, and we can prove anything.
We can notice that there is no elimination rule for > (knowing that > is
true does not bring any new information), and no introduction rule for ⊥ (we
do not expect that there is a way to prove falsity). There are two elimination
rules for ∧ which are respectively called left and right rules, and similarly there
are two introduction rules for ∨.
2.2.5 Proofs. The set of proofs (or derivations) is the smallest set such that, given proofs πi of the sequents Γi ⊢ Ai, for 1 ⩽ i ⩽ n, and an inference rule of the form (2.1), there is a proof of Γ ⊢ A, often written in the form of a tree as
  π1              πn
Γ1 ⊢ A1   . . .   Γn ⊢ An
──────────────────────────
Γ ⊢ A
(A ∧ B) ∧ C ⇔ A ∧ (B ∧ C)            A ∧ B ⇔ B ∧ A
⊤ ∧ A ⇔ A ⇔ A ∧ ⊤                    A ∧ A ⇔ A
A ∧ (B ∨ C) ⇔ (A ∧ B) ∨ (A ∧ C)
A ∨ (B ∧ C) ⇔ (A ∨ B) ∧ (A ∨ C)
A⇒A (A ⇒ B) ⇒ (B ⇒ C) ⇒ (A ⇒ C)
– curryfication:
((A ∧ B) ⇒ C) ⇔ (A ⇒ (B ⇒ C))
Reasoning on proofs. In this formalism, the proofs are defined inductively and
therefore we can reason by induction on them, which is often useful. Precisely,
the induction principle on proofs is the following one:
Theorem 2.2.5.6 (Induction on proofs). Suppose given a predicate P(π) on proofs π. Suppose moreover that for every rule of figure 2.1 and every proof π ending with this rule

        π1              πn
      Γ1 ⊢ A1   . . .   Γn ⊢ An
π  =  ──────────────────────────
      Γ ⊢ A

if P(πi) holds for every index i, with 1 ⩽ i ⩽ n, then P(π) also holds. Then P(π) holds for every proof π.
2.2.7 Admissible rules. A rule is admissible when, whenever its premises are provable, its conclusion is also provable. An important point here is that the way the proof of the conclusion is constructed might depend on the proofs of the premises, and not only on the fact that we know that the premises are provable.
Structural rules. We begin by showing that the structural rules are admissible. These rules are named in this way because they concern the structure of logical proofs, as opposed to the particular connectives we are considering for formulas. They express some resource management possibilities for the hypotheses in sequents: we can permute, merge and weaken them, see section 2.2.10.
A first admissible rule is the weakening rule, which states that whenever one can prove a formula with some hypotheses, we can still prove it with more hypotheses. The proof with more hypotheses is "weaker" in the sense that it applies in fewer cases (since more hypotheses have to be satisfied).
Proposition 2.2.7.1 (Weakening). The weakening rule

Γ, Γ′ ⊢ B
───────────── (wk)
Γ, A, Γ′ ⊢ B

is admissible.
Proof. By induction on the proof of the hypothesis Γ, Γ′ ⊢ B.
– If the proof is of the form

─────────── (ax)
Γ, Γ′ ⊢ B

then B occurs in Γ or in Γ′, and we conclude with the axiom rule in the weakened context Γ, A, Γ′.
– If the proof is of the form

   π1                 π2
Γ, Γ′ ⊢ B ⇒ C      Γ, Γ′ ⊢ B
───────────────────────────── (⇒E)
Γ, Γ′ ⊢ C

then we conclude with

   π1′                   π2′
Γ, A, Γ′ ⊢ B ⇒ C      Γ, A, Γ′ ⊢ B
─────────────────────────────────── (⇒E)
Γ, A, Γ′ ⊢ C

where π1′ and π2′ are respectively obtained from π1 and π2 by induction hypothesis:

          π1                                  π2
      Γ, Γ′ ⊢ B ⇒ C                        Γ, Γ′ ⊢ B
π1′ = ───────────────── (wk)        π2′ =  ────────────── (wk)
      Γ, A, Γ′ ⊢ B ⇒ C                     Γ, A, Γ′ ⊢ B

– If the proof is of the form

    π
Γ, Γ′, B ⊢ C
───────────── (⇒I)
Γ, Γ′ ⊢ B ⇒ C

we conclude similarly, and likewise for the remaining rules.
Also admissible is the exchange rule, which states that we can reorder hypotheses in contexts:
Proposition 2.2.7.2 (Exchange). The exchange rule

Γ, A, B, Γ′ ⊢ C
──────────────── (xch)
Γ, B, A, Γ′ ⊢ C

is admissible.
Proof. By induction on the proof of the hypothesis Γ, A, B, Γ′ ⊢ C.
Given a proof π of some sequent, we often write w(π) for a proof obtained by weakening. Another admissible rule is contraction, which states that if we can prove a formula with two occurrences of a hypothesis, we can also prove it with one occurrence.
Proposition 2.2.7.3 (Contraction). The contraction rule

Γ, A, A, Γ′ ⊢ B
──────────────── (contr)
Γ, A, Γ′ ⊢ B

is admissible.
Proof. By induction on the proof of the hypothesis Γ, A, A, Γ′ ⊢ B.
We can also formalize the fact that knowing ⊤ does not bring information, which we call here truth strengthening (we are not aware of a standard terminology for this one):
Proposition 2.2.7.4 (Truth strengthening). The following rule is admissible:

Γ, ⊤, Γ′ ⊢ A
───────────── (tstr)
Γ, Γ′ ⊢ A

Proof. By induction on the proof of the hypothesis Γ, ⊤, Γ′ ⊢ A, the only "subtle" case being that we have to transform

────────────── (ax)                ────────── (⊤I)
Γ, ⊤, Γ′ ⊢ ⊤         into          Γ, Γ′ ⊢ ⊤
The cut rule. A most important admissible rule is the cut rule, which states that if we can prove a formula B using a hypothesis A (thought of as a lemma used in the proof), and we can prove the hypothesis A, then we can directly prove the formula B.
Theorem 2.2.7.5 (Cut). The cut rule

Γ ⊢ A    Γ, A, Γ′ ⊢ B
────────────────────── (cut)
Γ, Γ′ ⊢ B

is admissible.
Proof. For simplicity, we restrict to the case where Γ′ is the empty context, which is not an important limitation because the exchange rule is admissible. The cut rule can be derived from the rules for implication by

Γ, A ⊢ B
───────── (⇒I)
Γ ⊢ A ⇒ B         Γ ⊢ A
──────────────────────── (⇒E)
Γ ⊢ B
We will see in section 2.3.2 that the above proof is not satisfactory and will
provide another one, which brings much more information about the dynamics
of the proofs.
from which it follows that, given a provable formula A, the formula A′ obtained from A by changing all the connectives ¬B into B ⇒ ⊥ is provable, without using (¬E) and (¬I). Conversely, suppose given a formula A such that the transformed formula A′ is provable. We have to show that A is also provable, which is more subtle. In the proof of A′, for each subproof of the form

   π
Γ ⊢ B ⇒ ⊥

where B ⇒ ⊥ is the translation of a negation ¬B occurring in A, we construct the proof

   π
Γ ⊢ B ⇒ ⊥
────────────── (wk)    ───────── (ax)
Γ, B ⊢ B ⇒ ⊥           Γ, B ⊢ B
──────────────────────────────── (⇒E)
Γ, B ⊢ ⊥
───────── (¬I)
Γ ⊢ ¬B
A ⇒ B = ¬A ∨ B
A ⇔ B = (A ⇒ B) ∧ (B ⇒ A)
2.2.10 Structural rules. The structural rules are the following ones:

Γ, A, B, Γ′ ⊢ C                     Γ, A, A, Γ′ ⊢ B
──────────────── (xch)              ──────────────── (contr)
Γ, B, A, Γ′ ⊢ C                     Γ, A, Γ′ ⊢ B

Γ, Γ′ ⊢ B                           Γ, ⊤, Γ′ ⊢ A
───────────── (wk)                  ───────────── (tstr)
Γ, A, Γ′ ⊢ B                        Γ, Γ′ ⊢ A

We have seen in section 2.2.7 that they are admissible in our system.
Contexts as sets. The rules of exchange and contraction allow to think of con-
texts as sets (rather than lists) of formulas, because a set is a list “up to permu-
tation and duplication of its elements”. More precisely, given a set A, we write
P(A) for the set of subsets of A, and A∗ for the set of lists over A. We define
an equivalence relation ∼ on A∗ as the smallest equivalence relation such that
Γ, A, B, ∆ ∼ Γ, B, A, ∆ Γ, A, A, ∆ ∼ Γ, A, ∆
Lemma 2.2.10.1. The function f : A∗ → P(A) which to a list associates its set
of elements is surjective. Moreover, given Γ, ∆ ∈ A∗ , we have f (Γ) = f (∆) if
and only if Γ ∼ ∆.
We could therefore have directly defined contexts to be sets of formulas, as is sometimes done, but this would be really unsatisfactory. Namely, a formula A in a context can be thought of as some kind of hypothesis which is to be proved by an auxiliary lemma, and we might have twice the same formula A, but proved by different means: in this case, we would like to be able to refer to a particular instance of A (which is proved in a particular way), and we cannot do this if we have a set of hypotheses. For instance, there are intuitively two proofs of A ⇒ A ⇒ A: the one which uses the left A to prove A and the one which uses the right one (this will become even more striking with the Curry-Howard correspondence, see remark 4.1.7.2). However, with contexts as sets, both are the same:
───── (ax)
A ⊢ A
────────── (⇒I)
A ⊢ A ⇒ A
────────────── (⇒I)
⊢ A ⇒ A ⇒ A
A less harmful simplification which is sometimes done is to quotient by exchange only (and not by contraction), in which case contexts become multisets, see appendix A.3.5. We will refrain from doing that here as well.
Variants of the proof system. The structural rules are usually taken as "real" (as opposed to admissible) rules of the proof system. Here, we have carefully chosen the formulation of the rules so that they are admissible, but this would not hold anymore if we had used subtle variants instead. For instance, if we replace the axiom rule by

─────────── (ax)                      ────── (ax)
Γ, A ⊢ A              or              A ⊢ A

or the introduction rule of conjunction by

Γ ⊢ A    Δ ⊢ B
──────────────── (∧I)
Γ, Δ ⊢ A ∧ B

the structural rules are not all admissible anymore. The fine study of the structure behind this led Girard to introduce linear logic [Gir87].
We write
A[B/X]
for the substitution of X by B in A, i.e. the type A where all the occurrences of X have been replaced by B. More generally, a substitution for A is a function σ which to every variable X ∈ FV(A) assigns a type σ(X), and we also write
A[σ]
for the type A where every variable X has been replaced by σ(X). Similarly, given a context Γ = x1 : A1, . . . , xn : An, we define
We often write
[A1 /X1 , . . . , An /Xn ]
for the substitution σ such that σ(Xi ) = Ai and σ(X) = X for X different from
each Xi . It satisfies
{X ∈ X | σ(X) ≠ X}
In our reasoning to prove that 6 can be halved, we have used the fact that 6 is
even, which we must have proved in this way:
– 6 is even because 6 = 4 + 2 and 4 is even, where
– 4 is even because 4 = 2 + 2 and 2 is even, where
– 2 is even because 2 = 0 + 2 and 0 is even, where
– 0 is even by definition.
From the proof of the lemma, we know that the half of 6 is the successor of the
half of 4, which is the successor of the half of 2 which is the successor of the half
of 0, which is 0. Writing, as usual, n/2 for the half of n, we have
6/2 = (4/2) + 1 = (2/2) + 1 + 1 = (0/2) + 1 + 1 + 1 = 0 + 1 + 1 + 1 = 3
Therefore the half of 6 is 3: we have managed to extract the actual value of the half of 6 from the proofs that 6 is even and of the above lemma. This example is further formalized in section 6.6.3.
2.3.1 Cuts. In logic, the use of a lemma to show a result is called a “cut”. This
must not be confused with the (cut) rule presented in theorem 2.2.7.5, although
they are closely related. Formally, a cut in a proof is an elimination rule whose
principal premise is proved by an introduction rule of the same connective. For
instance, the following are cuts:
  π        π′                          π
Γ ⊢ A    Γ ⊢ B                      Γ, A ⊢ B
────────────── (∧I)                 ────────── (⇒I)      π′
Γ ⊢ A ∧ B                           Γ ⊢ A ⇒ B           Γ ⊢ A
────────── (∧lE)                    ────────────────────────── (⇒E)
Γ ⊢ A                               Γ ⊢ B
The formula in the principal premise is called the cut formula: above, the cut
formulas are respectively A∧B and A ⇒ B. A proof containing a cut intuitively
does "useless work". Namely, the one on the left starts from a proof π of A in the context Γ, which it uses to prove A ∧ B, from which it deduces A: in order to prove A, the proof π was already enough, and the proof π′ of B was entirely superfluous. Similarly, for the proof on the right, we show in π that, supposing A, we can prove B, and also in π′ that we can prove A: we could certainly directly prove B, replacing in π all the places where the hypothesis A is used (say, by an axiom) by the proof π′. For this reason, cuts are sometimes also called detours.
From a proof-theoretic point of view, it might seem a bit strange that some-
one would use such a kind of proof structure, but this is actually common in
mathematics: when we want to prove a result, we often prove a lemma which
is more general than the result we want to show and then deduce the result we
were aiming at. One of the reason for proceeding in this way is that we can
use the same lemma to cover multiple cases, and thus have shorter proofs (not
to mention that they are generally more conceptual and modular, since we can
reuse the lemmas for other proofs). We will see that, however, we can always
avoid using cuts in order to prove formulas. Before doing so, we first need to
introduce the main technical result which allows this.
(ax) (ax)
Γ, A ` A Γ, A ` A
(∧I ) ..
Γ, A, B ` A ∧ A 0 .
π= (⇒I ) π =
Γ, A ` B ⇒ A ∧ A Γ`A
π0 π0
Γ`A Γ`A
(wk) (wk)
Γ, B ` A Γ, B ` A
(∧I )
Γ, B ` A ∧ A
(⇒I )
Γ`B ⇒A∧A
π π0
and
Γ, A, Γ0 ` B Γ`A
π[π 0 /A]
Γ, Γ0 ` B
Γ`A Γ, A, Γ0 ` B
(cut)
Γ, Γ0 ` B
is admissible.
Proof. By induction on π.
We will see that the admissibility of this rule is the main ingredient to prove
cut elimination, thus its name.
2.3.3 Cut elimination. A logic has the cut elimination property when when-
ever a formula is provable then it is also provable with a proof which does not
involve cuts: we can always avoid doing unnecessary things. This procedure
was introduced by Gentzen under the name Hauptsatz [Gen35]. In general, we
not only want to know that such a proof exists, but also to have an effective
cut elimination procedure which transforms a proof into one without cuts. The
reason for this is that we will see in section 4.1.8 that this corresponds to some-
how “executing” the proof (or the program corresponding to it): this is why
Girard [Gir87] claims that
A logic without cut elimination is like a car without an engine.
Although the proof obtained after eliminating cuts is “simpler” in the sense that
it does not contains unnecessary steps (cuts), it cannot always be considered as
“better”: it is generally much bigger than the original one. The quote above
explains it: think of a program computing the factorial of 1000. We see that a
result can be much bigger than the program computing it [Boo84], and it can
take much time to compute [Ore82].
Theorem 2.3.3.1. Intuitionistic natural deduction has the cut elimination prop-
erty.
CHAPTER 2. PROPOSITIONAL LOGIC 59
π
Γ, A ` B π0
(⇒I )
Γ`A⇒B Γ`A π[π 0 /A]
(⇒E )
Γ`B Γ`B
π π0
Γ`A Γ`B
(∧I )
Γ`A∧B π
(∧lE )
Γ`A Γ`A
π π0
Γ`A Γ`B
(∧I )
Γ`A∧B π0
(∧rE )
Γ`B Γ`B
π
Γ`A π0 π 00
(∨lI )
Γ`A∨B Γ, A ` C Γ, B ` C π 0 [π/A]
(∨E )
Γ`C Γ`C
π
Γ`B π0 π 00
(∨rI )
Γ`A∨B Γ, A ` C Γ, B ` C π 00 [π/B]
(∨E )
Γ`C Γ`C
Proof. Suppose given a proof which contains a cut. This means that at some
point in the proof we encounter one of the following situations (i.e. we have a
subproof of one of the following forms), in which case we transform the proof
as indicated by on figure 2.2 (we do not handle the cut on ¬ since ¬A can
be coded as A ⇒ ⊥). For instance,
(ax) (ax)
Γ, A ` A Γ, A ` A
(∧I )
Γ, A ` A ∧ A π
(⇒I )
Γ`A⇒A∧A Γ`A
(⇒E )
Γ`A∧A
is transformed into
π π
Γ`A Γ`A
(∧I )
Γ`A∧A
We iterate the process on the resulting proof until all the cuts have been re-
CHAPTER 2. PROPOSITIONAL LOGIC 60
moved.
As it can be noticed on the above example, applying the transformation
might duplicate cuts: if the above proof π contained cuts, then the transformed
proof contains twice the cuts of π. It is therefore not clear that the process
actually terminates, whichever order we choose to eliminate cuts. We will see
in section 4.2 that it indeed does, but the proof will be quite involved. It
is sufficient for now to show that a particular strategy for eliminating cuts is
terminating: at each step, we suppose that we eliminate a cut of highest depth,
i.e. there is no cut “closest to the axioms” (for instance, we could apply the above
transformation only if π has not cuts). We define the size |A| of a formula A as
its number of connectives and variables:
The degree of a cut is the size of the cut formula (e.g. of A ⇒ A ∧ A in the above
example, whose size is 2 + 3|A|), and the degree of a proof is then defined as the
multiset (see appendix A.3.5) of the degrees of the cuts it contains. It can then
be checked that whenever we apply , the newly created cuts are of strictly
lower degree than the cut we eliminated and therefore the degree of the proof
decreases according to the multiset order, see appendix A.3.5. For instance, if
we apply a transformation
π
Γ, A ` B π0
(⇒I )
Γ`A⇒B Γ`A π[π 0 /A]
(⇒E )
Γ`B Γ`B
we suppose that π 0 has not cuts (otherwise the eliminated cut would not be of
highest depth). The degree of the cut is |A ⇒ B|. All the cuts present in the
resulting proof where already present in the original proof, excepting new cuts
on A which might be created by the substitution of π 0 in π, which are of degree
|A| < |A ⇒ B|. Since the multiset order is well-founded, see theorem A.3.5.1,
the process will eventually come to an end: we cannot have an infinite sequence
of transformations, chosen according to our strategy.
The previous theorem states that, as long as we are interested in provability, we
can restrict to cut-free proofs. This is of interest because we often have a good
idea of which rules can be used in those. In particular, we have the following
useful result:
Proposition 2.3.3.2. For any formula A, a cut-free proof of ` A necessarily ends
with an introduction rule.
Proof. Consider the a cut-free proof π of ` A. We reason by induction on it.
This proof cannot be an axiom because the context is empty. Suppose that π
ends with an elimination rule:
..
.
π= (?E )
`A
For each of the elimination rules, we observe that the principal premise is nec-
essarily of the form ` A0 , and therefore ends with an introduction rule, by
CHAPTER 2. PROPOSITIONAL LOGIC 61
and thus contains a cut, which is excluded since we have supposed π to be cut-
free. Since π cannot end with an axiom nor an elimination rule, it necessarily
ends with an introduction rule.
In the above proposition, it is crucial that we consider a formula in an empty
context: a cut-free proof of Γ ` A does not necessarily end with an introduction
rule if Γ is arbitrary.
2.3.4 Consistency. The least one can expect from a non-trivial logical system
is that not every formula is provable, otherwise the system is of no use. A logical
system is consistent when there is at least one formula which cannot be proved
in the system. Since, by (⊥E ), one can deduce any formula from ⊥, we have:
Lemma 2.3.4.1. The following are equivalent:
(i) the logical system is consistent,
(ii) the formula ⊥ cannot be proved,
(ax)
¬(¬A ∨ A), A ` A
(ax) (∨rI )
¬(¬A ∨ A), A ` ¬(¬A ∨ A) ¬(¬A ∨ A), A ` ¬A ∨ A
(¬E )
¬(¬A ∨ A), A ` ⊥
(¬I )
¬(¬A ∨ A) ` ¬A
(ax) (∨lI )
¬(¬A ∨ A) ` ¬(¬A ∨ A) ¬(¬A ∨ A) ` ¬A ∨ A
(¬E )
¬(¬A ∨ A) ` ⊥
(¬I )
` ¬¬(¬A ∨ A)
(2.2)
This proof will be analyzed in more details in section 2.5.2.
A variant of the above lemma which is sometimes useful is the following one:
Lemma 2.3.5.3. Given a propositional variable X, the formula ¬X ∨¬¬X cannot
be proved in NJ.
Proof. Let us prove this in a slightly different way than in lemma 2.3.5.2. It
can be proved in NJ that ¬> ⇒ ⊥:
(ax) (>I )
¬> ` ¬> ¬> ` >
(¬E )
¬> ` ⊥
(⇒I )
` ¬> ⇒ ⊥
Γ`⊥
(⊥E ) ...
Γ`A Γ`⊥
(?E ) (⊥E )
Γ`B Γ`B
π π0 π 00
Γ`A∨B Γ, A ` C Γ, B ` C
(∨E ) ...
Γ`C
(?E )
Γ`D
π0 π 00
... ...
π Γ, A ` C Γ, B ` C
(?E ) (?E )
Γ`A∨B Γ, A ` D Γ, B ` D
(∨E )
Γ`D
2.3.6 Commutative cuts. Are the cuts the only situations where one is doing
useless work in proofs? No. It turns out that falsity and disjunction induce some
more situations where we would like to eliminate “useless work”. For instance,
consider the following proof:
(ax)
⊥`⊥
(⊥E ) (ax) (ax)
⊥`A∨A ⊥, A ` A ⊥, A ` A
(∨E )
⊥`A
For the hypothesis ⊥, we deduce the “general” statement that A ∨ A holds, from
which we deduce that A holds. Clearly, we ought to be able to simplify this
proof by into
(ax)
⊥`⊥
(⊥E )
⊥`A
where we directly prove A instead of using the “lemma” A∨A as an intermediate
step. Another example of a situation is the following one:
(ax) (ax) (ax) (ax)
A, B ∨ C, B ` A A, B ∨ C, B ` A A, B ∨ C, C ` A A, B ∨ C, C ` A
(ax) (∧I ) (∧I )
A, B ∨ C ` B ∨ C A, B ∨ C, B ` A ∧ A A, B ∨ C, C ` A ∧ A
(∨E )
A, B ∨ C ` A ∧ A
(∧lE )
A, B ∨ C ` A
2.4.2 Proof search. In order to have an idea of how proof search can be
performed in general in NJ, we now describe an algorithm, due to Statman,
see [Sta79] and [SU06, section 6.6]. For simplicity, we restrict to the implica-
tional fragment, where formulas are built out of variables and implication, and
the rules are (ax), (⇒E ) and (⇒I ).
Suppose fixed a sequent Γ ` A. A naive search for a proof tree of this sequent
might end up in a loop: the search space is not finite. For instance, looking for
a proof of X ⇒ X ` X, our search might try to construct an infinite proof tree
looking like this:
..
.
(ax) (⇒E )
Γ`X⇒X Γ`X
(ax) (⇒E )
Γ`X⇒X Γ`X
(ax) (⇒E )
Γ`X⇒X Γ`X
(⇒E )
Γ`X
(** Formulas. *)
type t =
| Var of string
| Imp of t * t
¬ Halts(M ) ∨ Halts(M )
holds for every Turing machine M , which seems to mean that we should be able
to decide whether a Turing machine is halting or not, but there is no hope of
finding such an algorithm since Turing has shown that the halting problem is
undecidable [Tur37].
2.5.1 Axioms for classical logic. A logical system for classical logic, called
NK (for K lassical N atural deduction), can be obtained from NJ (figure 2.1) by
adding a new rule corresponding to the excluded middle
(lem)
Γ ` ¬A ∨ A
In this sense, the excluded middle is the only thing which is missing in intu-
itionistic logic to be classical. This is shown in theorems 2.5.6.1 and 2.5.6.5.
In fact, excluded middle is not the only possible choice, and other equiva-
lent axioms can be added instead. Most of those axioms correspond to usual
reasoning patterns, which have been known for a long time, and thus bear latin
names.
Theorem 2.5.1.1. The following principles are equivalent in NJ:
(i) excluded middle, also called tertium non datur:
¬A ∨ A
¬¬A ⇒ A
(iii) contraposition:
(¬B ⇒ ¬A) ⇒ (A ⇒ B)
¬(A ⇒ B) ⇒ A ∧ ¬B
(¬A ⇒ A) ⇒ A
¬(¬A ∧ ¬B) ⇒ A ∨ B
¬(¬A ∨ ¬B) ⇒ A ∧ B
(A ⇒ (B ∨ C)) ⇒ ((A ⇒ B) ∨ C)
By “equivalent”, we mean here that if we suppose that one holds for every
formulas A, B and C then the other one also holds for every formulas A, B
and C, and conversely.
Proof. We only show here the equivalence between the first two, the other ones
being left as an exercise. Suppose that the excluded middle holds, we can show
reductio ad absurdum by
(ax) (ax)
¬¬A, ¬A ` ¬¬A ¬¬A, ¬A ` ¬A
(¬E ) (ax)
¬¬A ` ¬A ∨ A ¬¬A, ¬A ` A ¬¬A, A ` A
(∨E )
¬¬A ` A
(⇒I )
` ¬¬A ⇒ A
(2.3)
Suppose that reductio ad absurdum holds, we can show the excluded middle by
π
` ¬¬(¬A ∨ A) ⇒ (¬A ∨ A) ` ¬¬(¬A ∨ A)
(∨E )
` ¬A ∨ A (2.4)
where π is the proof (2.2) on page 63.
Remark 2.5.1.2. One should be careful about the quantifications over formu-
las involved in theorem 2.5.1.1. In order to illustrate this, let us detail the
equivalence between excluded middle and reductio ad absurdum. We say that
a formula A is decidable when ¬A ∨ A holds and regular when ¬¬A ⇒ A holds.
The derivation (2.3) shows that every decidable formula is regular, but the con-
verse does not hold: the derivation (2.4) only shows that A is decidable when
¬A ∨ A (as opposed to A) is regular. In fact a concrete example of a formula
which is regular but not decidable, can be given by taking A = ¬X: the for-
mula ¬¬¬X ⇒ ¬X holds (lemma 2.5.9.4), but ¬¬X ∨ ¬X cannot be proved
(lemma 2.3.5.3). Thus, it is important to remark that theorem 2.5.1.1 does not
say that a formula is regular if and only if it is decidable, but rather that every
formula is regular if and only if every formula is decidable.
CHAPTER 2. PROPOSITIONAL LOGIC 70
Among those axioms, the Pierce law is less natural than others but has the
advantage of requiring only implication, so that it still makes sense in some
small fragments of logic such as minimal implicational logic. Also, note that
the fact that material implication occurs in this list means that A ⇒ B is not
equivalent to ¬A ∨ B in NJ, as it is in LK. For each of these axioms, we could
add more or less natural forms of rules. For instance, the law of the excluded
middle can also be implemented by the nicer looking rule
Γ, ¬A ` B Γ, A ` B
(lem)
Γ`B
2.5.2 The intuition behind classical logic. Let us try to give some proof
theoretic intuition about how classical logic works.
one point. For this reason, doubly negated formulas are sometimes said to be
proof irrelevant: again, the actual proof does not matter, only its existence. For
instance, we now understand why
¬¬(¬A ∨ A)
is provable intuitionistically: it states that it is true that there exists a proof
of ¬A or a proof of A, as opposed to ¬A ∨ A which states that we have a proof
of ¬A or a proof of A. From this point of view, the classical axiom
¬¬A ⇒ A
now seems like deep magic: it means that if we know that there exists a proof
of A we can actually extract a proof of A. This can only be true if we assume
that there can be at most one proof for a formula, i.e. formulas are interpreted
as booleans and not sets (see section 2.5.4 for a logical point of view on this).
This also explains why we can actually embed classical logic into intuitionistic
logic by double-negating formulas, see section 2.5.9: if we are only interested in
their existence, intuitionistic proofs behave classically.
Resetting proofs. Let us give another, more operational, point of view on the
axiom ¬¬A ⇒ A. We have mentioned that it is equivalent to having the rule
Γ ` ¬¬A
(¬¬E )
Γ`A
so that when searching for a proof of A, we can instead prove ¬¬A. What do
we gain in doing so? At first it does not seem much, since we can go back to
proving A:
..
.
(ax)
Γ, ¬A ` ¬A Γ, ¬A ` A
(¬E )
Γ, ¬A ` ⊥
(¬I )
Γ ` ¬¬A
But there is one difference, we now have the additional hypothesis ¬A in our
context, and we can use it at any point in the proof to go back to proving A
instead of the current goal B, while keeping the current context:
..
.
(ax)
Γ0 , ¬A ` ¬A 0
Γ , ¬A ` A
(⊥E )
Γ0 , ¬A ` ⊥
(¬E )
Γ0 , ¬A ` B
In other words, we can “reset proofs” during proof search, i.e. we can implement
the following behavior (up to minor details such as weakening):
..
.
0
Γ `A
(reset)
Γ0 ` B
..
.
Γ`A
CHAPTER 2. PROPOSITIONAL LOGIC 72
Γ`∆
where both Γ and ∆ are contexts. Such a sequent should be read as “supposing
all the formula in Γ, I can prove some formula in ∆”. This is a generalization of
previous sequents, where ∆ was restricted to exactly one formula. The rules for
this sequent calculus are given in figure 2.5. In order to ease the presentation,
we consider here that the formulas of ∆ can be explicitly permuted, duplicated,
and so on, using the structural rules (xchR ), (wkR ), (contrR ) and (⊥R ), which
we generally leave implicit in examples. Those rules are essentially the same as
those for NJ, with contexts added on the right, excepting for the rule (∨lI ) and
(∨rI ), which are now combined into the rule
Γ ` A, B, ∆
(∨I )
Γ ` A ∨ B, ∆
CHAPTER 2. PROPOSITIONAL LOGIC 73
(ax)
Γ, A, Γ0 ` A, ∆
Γ ` A ⇒ B, ∆ Γ ` A, ∆ Γ, A ` B, ∆
(⇒E ) (⇒I )
Γ ` B, ∆ Γ ` A ⇒ B, ∆
Γ ` A ∧ B, ∆ l Γ ` A ∧ B, ∆ r Γ ` A, ∆ Γ ` B, ∆
(∧E ) (∧E ) (∧I )
Γ ` A, ∆ Γ ` B, ∆ Γ ` A ∧ B, ∆
Γ`∆
(>I )
Γ ` >, ∆
Γ ` A ∨ B, ∆ Γ, A ` C, ∆ Γ, B ` C, ∆ Γ ` A, B, ∆
(∨E ) (∨I )
Γ ` C, ∆ Γ ` A ∨ B, ∆
Γ`∆
(⊥I )
Γ ` ⊥, ∆
Γ ` ¬A, ∆ Γ ` A, ∆ Γ, A ` ⊥, ∆
(¬E ) (¬I )
Γ ` ⊥, ∆ Γ ` ¬A, ∆
structural rules:
Γ ` ∆, A, B, ∆0 Γ ` ∆, ∆0
(xchR ) (wkR )
Γ ` ∆, B, A, ∆0 Γ ` ∆, A, ∆0
Γ ` ∆, A, A, ∆0 Γ ` ∆, ⊥, ∆0
(contrR ) (⊥R )
Γ ` ∆, A, ∆0 Γ ` ∆, ∆0
Γ ` ⊥, ∆
(⊥R )
Γ ` ⊥, ∆ Γ`∆
(⊥E ) (wkR )
Γ ` A, ∆ Γ ` A, ∆
In fact the constant ⊥ is now superfluous, since one can convince himself that
proving ⊥ amounts to proving the empty sequent ∆.
2.5.4 Cut-elimination in classical logic. Classical logic also does have the
cut-elimination property, see section 2.3.3, although this is more subtle to show
than in the case of intuitionistic logic due to the presence of structural rules. In
particular, in addition to the usual cut elimination steps, we need to add rules
making elimination rules “commute” with structural rules: namely, an intro-
duction and the corresponding elimination rules can be separated by structural
rules. For instance, suppose that we want to eliminate the following “cut”:
π π0
Γ`A Γ`B
(∧I )
Γ`A∧B
(wkR )
Γ ` A ∧ B, C
(∧lE )
Γ ` A, C
CHAPTER 2. PROPOSITIONAL LOGIC 75
We first need to make the elimination rule for conjunction commute with the
weakening:
π π0
Γ`A Γ`B
(∧I )
Γ`A∧B
(∧lE )
Γ`A
(wkR )
Γ ` A, C
and then we can finally properly eliminate the cut:
π
(∧lE )
Γ`A
(wkR )
Γ ` A, C
both cut-eliminates to
π π0
Γ`A Γ`B
(wkR ) and (wkR )
Γ ` A, B Γ ` A, B
This is sometimes called Lafont’s critical pair. We like to identify proofs up to
cut elimination (much more on this in chapter 4) and therefore those two proofs
should be considered as being “the same”. In particular, when both π and π 0
are proofs of Γ ` A, i.e. A = B, this forces us to identify the two proofs
π π0
Γ`A Γ`A
(wkR ) (wkR )
Γ ` A, A Γ ` A, A
(contrR ) and (contrR )
Γ`A Γ`A
and thus to identify the two proofs π and π 0 . More generally, by a similar
reasoning, any two proofs of a same sequent Γ ` ∆ should be identified. Cuts
can hurt! This gives another, purely logical, explanation about why classical
logic is “proof irrelevant”, as already mentioned in section 2.5.2: up to cut-
elimination, there is at most one proof of a given sequent.
¬(A ∧ B) ⇔ ¬A ∨ ¬B ¬> ⇔ ⊥ A ⇒ B ⇔ ¬A ∨ B
¬(A ∨ B) ⇔ ¬A ∧ ¬B ¬⊥ ⇔ > ¬¬A ⇔ A
CHAPTER 2. PROPOSITIONAL LOGIC 76
and the logical system can be reduced to the four following rules:
Γ`∆
0 (ax) (⊥I )
Γ, A, Γ ` A, ∆ Γ ` ⊥, ∆
Γ ` A ⇒ B, ∆ Γ ` A, ∆ Γ, A ` B, ∆
(⇒E ) (⇒I )
Γ ` B, ∆ Γ ` A ⇒ B, ∆
together with the four structural rules. Several other choices of connectives are
possible.
L ::= X | ¬X
C ::= L | C ∨ C | ⊥
¬(A ∧ B) ¬A ∨ ¬B ¬> ⊥
¬(A ∨ B) ¬A ∧ ¬B ¬⊥ >
(A ∧ B) ∨ C (A ∨ C) ∧ (B ∨ C) >∨C >
A ∨ (B ∧ C) (A ∨ B) ∧ (A ∨ C) A∨> >
A⇒B ¬A ∨ B ¬¬A A
Those rules rewrite formulas into classically equivalent ones, since those are
instances of de Morgan laws. However, it is not clear that the process terminates.
It does, but it is not efficient, and we will see below a better way to put a formula
in clausal form.
CHAPTER 2. PROPOSITIONAL LOGIC 77
Efficient computation of the clausal form. Given a clause C, we write L(C) for
the set of literals occurring in it:
(A ∨ B) ∨ C ⇔ A ∨ (B ∨ C) ⊥∨A⇔A B∨A⇔A∨B
A∨⊥⇔A A∨A⇔A
A ∨ B = {Ci ∪ Dj | 1 6 i 6 m, 1 6 j 6 n}
The notion clausal form can be further improved as follows. We say that a
formula is in canonical clausal form when
1. it is in clausal form,
2. it does not contains twice the same clause or > (this is automatic if it is
represented as a set of clauses),
3. no clause contains twice the same literal or ⊥ (this is automatic if they
are represented as sets of literals),
CHAPTER 2. PROPOSITIONAL LOGIC 78
(** Formulas. *)
type t =
| Var of var
| And of t * t
| Or of t * t
| Imp of t * t
| Not of t
| True | False
De Morgan laws in intuitionistic logic. Let us insist once again on the fact that
the de Morgan laws do not hold in intuitionistic logic. Namely, the following
implications are intuitionistically true, but not their converse:
¬A ∧ ¬B ⇔ ¬(A ∨ B)
2.5.6 Boolean models. Classical natural deduction matches exactly the no-
tion of truth one would get from usual boolean models. Let us detail this. We
write B = {0, 1} for the set of booleans. A valuation ρ is a function X → B,
assigning booleans to variables. Such a valuation can be extended as a function
ρ : Prop → B, from propositions to booleans, by induction over the propositions
by
is derivable in NK.
Proof. For concision, we write
δX X = (X ⇒ >) ∧ (¬X ⇒ ⊥)
and we have
(ax)
δX X, ¬X ` (X ⇒ >) ∧ (¬X ⇒ ⊥)
(∧rE ) (ax)
δX X, ¬X ` ¬X ⇒ ⊥ δX X, ¬X ` ¬X
(⇒E )
δX X, ¬X ` ⊥
(¬I )
δX X ` ¬¬X
(¬¬E )
δX X ` X
(⇒I )
` δX X ⇒ X
If A = Y with Y 6= X, we have
δX Y = (X ⇒ Y ) ∧ (¬X ⇒ Y )
δX Y, X ` (X ⇒ Y ) ∧ (¬X ⇒ Y )
..
δX Y, X ` X ⇒ Y δX Y, X ` X .
δX Y ` X ∨ ¬X δX Y, X ` Y δX Y, ¬X ` Y
δX Y ` Y
In the base case, the formula A has no variable and it thus evaluates to the
same value in any environment, and we can easily compute this value: it is
satisfiable if and only if this value is true. This directly leads to a very simple
implementation of a satisfiability algorithm, see figure 2.7: the function subst
compute the substitution of a formula into another one, the function var finds
a free variable, and finally the function sat tests the satisfiability of a formula.
As is, this algorithm is not very efficient: some subformulas get evaluated
many times during the search. It can however be much improved by using
formulas in canonical clausal form, as described in proposition 2.5.5.1. First,
substitution can be implemented on those as follows:
Lemma 2.5.7.2. Given a canonical clausal formula A and a variable X, a canon-
ical clausal formula for A[>/X] (resp. A[⊥/X]) can be obtained from A by
– removing all clauses containing X (resp. ¬X),
(** Formulas. *)
type t =
| Var of int
| And of t * t
| Or of t * t
| Not of t
| True | False
pure literal (or raises Not_found) and finally the function dpll implements the
above algorithm. The function pure uses an auxiliary list vars of pairs X,b
where X is a variable and b is either Some true or Some false if the variable X
occurs only positively or negatively, or None if it occurs both positively and
negatively.
ΓX = {C | C ∨ X ∈ Γ} Γ¬X = {D | ¬X ∨ D ∈ Γ}
and Γ0 for the set of clauses in Γ which neither contain X nor ¬X. We supposed
that the clauses are in canonical form, so that we have a partition of Γ as
Γ = Γ0 ] {C ∨ X | C ∈ ΓX } ] {¬X ∨ D | D ∈ Γ¬X }
Γ \ X = Γ0 ∪ {C ∨ D | C ∈ ΓX , D ∈ Γ¬X }
Remark 2.5.8.4. As defined above, the resolvent might contain clauses not in
canonical form, even if C and D are. In order to keep this invariant, we should
remove all clauses of the form C ∨ D such that C contains a literal and D its
negation, which we will implicitly do; in clauses, we should also remove duplicate
literals.
As indicated above, computing the resolvent reduces the number of free variables
of Γ:
Lemma 2.5.8.5. Given a clausal form Γ and a variable X, we have
Previous lemma implies that resolution is refutation complete in the sense that
is can always be used to show that a set of clauses cannot be satisfied (by
whichever valuation):
Theorem 2.5.8.7 (Refutation completeness). A set Γ of clauses is unsatisfiable
if and only if Γ ` ⊥ can be proved using the axiom and resolution rules only.
Proof. Writing FV(Γ) = {X1 , . . . , Xn } for the free variables of Γ, define the
sequence of sets of clauses Γ06i6n by Γ0 = Γ and Γi+1 = Γi \ Xi :
– the clauses of Γ0 can be deduced from those of Γ using the axiom rule,
– the clauses of Γi+1 can be deduced from those in Γi using the resolution
rule.
Lemma 2.5.8.6 ensures that Γi is satisfiable if and only if Γi+1 is satisfiable, and
thus, by induction, Γ0 is satisfiable if and only if Γn is satisfiable. Moreover, by
lemma 2.5.8.5, we have FV(Γn ) = ∅, thus Γn = ∅ or Γn = {⊥}, and therefore Γn
is unsatisfiable if and only if Γn = {⊥}. Finally, Γ is unsatisfiable if and only if
Γn = {⊥}, i.e. ⊥ can be deduced from Γ using axiom and resolution rules.
X⇒Y Y ⇒Z X
we can deduce Z. Putting those in normal form and using previous lemma, this
amounts to show that Γ consisting of
¬X ∨ Y ¬Y ∨ Z X ¬Z
Γ \ X = Γ0 ∪ {C ∨ D | C ∈ ΓX , D ∈ Γ¬X }
let () =
let g = [
[false,0;true,1];
[false,1;true,2];
[true,0]
] in
let c = [true,2] in
assert (prove g c)
becomes provable:
(ax)
¬(¬A ∨ A), A ` A
(ax) (∨rI )
¬(¬A ∨ A), A ` ¬(¬A ∨ A) ¬(¬A ∨ A), A ` ¬A ∨ A
(¬E )
¬(¬A ∨ A), A ` ⊥
(¬I )
¬(¬A ∨ A) ` ¬A
(ax) (∨lI )
¬(¬A ∨ A) ` ¬(¬A ∨ A) ¬(¬A ∨ A) ` ¬A ∨ A
(¬E )
¬(¬A ∨ A) ` ⊥
(¬I )
` ¬¬(¬A ∨ A)
One of the main ingredients behind this proof is that having ¬(¬A ∨ A) as
hypothesis in a context Γ allows to discard then current proof goal B and go
back to proving ¬A ∨ A:
..
.
(ax)
Γ ` ¬(¬A ∨ A) Γ ` ¬A ∨ A
(¬E )
Γ`⊥
(⊥E )
Γ`B
What do we get more than directly proving ¬A ∨ A? The fact that, during the
proof, we can reset our proof goal to ¬A ∨ A! We thus start by proving ¬A ∨ A
by proving ¬A, which requires proving ⊥ from A. At this point, we change our
mind and start again the proof of ¬A ∨ A, but this time we prove A, which we
can because we gained this information from the previously “aborted” proof.
A more detailed explanation of this kind of behavior was already developed
in section 2.5.2. This actually generalizes to any formula, by a result due to
Glivenko [Gli29]. Given a context Γ, we write ¬¬Γ for the context obtained
from Γ by double-negating every formula.
Theorem 2.5.9.1 (Glivenko’s theorem). Given a context Γ and propositional
formula A, the sequent Γ ` A is provable in classical logic if and only if the
sequent ¬¬Γ ` ¬¬A is provable in intuitionistic logic.
This result allows to relate the consistency of classical and intuitionistic logic
in the following way.
Theorem 2.5.9.2. Intuitionistic logic is consistent if and only if classical logic is
also consistent.
Proof. Suppose that intuitionistic logic is inconsistent: there is an intuitionistic
proof of ⊥. This proof is also a valid classical proof and thus classical logic is
inconsistent. Conversely, suppose that classical logic is inconsistent. There is
a classical proof of ⊥ and thus, by theorem 2.5.9.1, an intuitionistic proof π of
¬¬⊥. However, the implication ¬¬⊥ ⇒ ⊥ holds intuitionistically:
(ax)
¬¬⊥, ⊥ ` ⊥
(ax) (¬I )
¬¬⊥ ` ¬¬⊥ ¬¬⊥ ` ¬⊥
(¬E )
¬¬⊥ ` ⊥
(⇒I )
` ¬¬⊥ ⇒ ⊥
CHAPTER 2. PROPOSITIONAL LOGIC 91
X∗ = X
(A ∧ B)∗ = A∗ ∧ B ∗ (A ∨ B)∗ = ¬(¬A∗ ∧ ¬B ∗ )
>∗ = > ⊥∗ = ⊥
(A ⇒ B)∗ = A∗ ⇒ B ∗ (¬A)∗ = ¬A∗
We conclude by induction on n.
In particular, ¬¬¬¬A ⇔ ¬¬A, so that we gain nothing by performing twice the
double-negation translation.
(A ⇒ B) ∨ (B ⇒ A)
(¬B ⇒ A) ⇒ (((A ⇒ B) ⇒ A) ⇒ A)
(A ⇒ B) ∨ (B ⇒ A)
(A ⇒ B ∨ C) ⇒ (A ⇒ B) ∨ (A ⇒ C)
The proof cannot begin with an introduction rule because we have no hope of
filling the dots:
.. ..
. .
A∨B `B A∨B `A
(∨l ) (∨r )
A∨B `B∨A I A∨B `B∨A I
Γ`A∨B Γ, A ` C Γ, B ` C
(∨E )
Γ`C
which requires us to come up with a formula A∨B which is not directly indicated
in the conclusion Γ ` C and it is not clear how to automatically generate such
formulas. Starting in this way, the proof can be ended as in example 2.2.5.2.
In order to overcome this problem, Gentzen has invented sequent calculus
which is another presentation of logic. In natural deduction, all rules operate on
the formula on the right of ` and there are introduction and elimination rules.
In sequent calculus, there are only introduction rules but those can operate
either on formulas on the left or on the right of `. This results in a highly
symmetrical calculus.
Γ`∆
where Γ and ∆ are contexts: the intuition is that we have the conjunction
of formulas in Γ as hypothesis, from which we can deduce the disjunction of
formulas in ∆.
2.6.2 Rules. In all the systems we consider, unless otherwise stated, we always
suppose that we can permute, duplicate and erase formulas in context, i.e. that
the structural rules of figure 2.9 are always present. The additional rules for
sequent calculus are shown in figure 2.10 and the resulting system is called LK.
In sequent calculus, as opposed to natural deduction, the symmetry between
disjunction and conjunction has been restored: excepting for axiom and cut, all
rules come in a left and right flavor. Although the presentation is quite different,
the provability power of this system is the same as the one for classical natural
deduction presented in section 2.5:
Theorem 2.6.2.1. A sequent Γ ` ∆ is provable in NK (figure 2.5) if and only if
it is provable in LK (figure 2.10).
Proof. The idea is that, by induction, we can translate a proof in NK into a
proof in LK, by induction, and back. The introduction rules in NK correspond
to right rules in LK, the axiom rules match in both systems, the cut rule is
admissible in NK (the proof is similar to the one in proposition 2.3.2.1 for NJ),
as well as various structural rules (shown as in section 2.2.7), so that we only
have to show the that eliminations rules of NK are admissible in LK and the
left rules of LK are admissible in NK. We only handle the case of conjunction
here:
CHAPTER 2. PROPOSITIONAL LOGIC 94
Γ, B, A, Γ0 ` ∆ Γ ` ∆, B, A, ∆0
(xchL ) (xchR )
Γ, A, B, Γ0 ` ∆ Γ ` ∆, A, B, ∆
Γ, A, A, Γ0 ` ∆ Γ ` ∆, A, A, ∆0
(contrL ) (contrR )
Γ, A, Γ0 ` ∆ Γ ` ∆, A, ∆0
Γ, Γ0 ` ∆ Γ ` ∆, ∆0
(wkL ) (wkR )
Γ, A, Γ0 ` ∆ Γ ` ∆, A, ∆0
Γ, >, Γ0 ` ∆ Γ ` ∆, ⊥, ∆0
(>L ) (⊥R )
Γ, Γ0 ` ∆ Γ ` ∆, ∆0
Γ ` A, ∆ Γ, A ` ∆
(ax) (cut)
Γ, A ` A, ∆ Γ`∆
Γ, A, B ` ∆ Γ ` A, ∆ Γ ` B, ∆
(∧L ) (∧R )
Γ, A ∧ B ` ∆ Γ ` A ∧ B, ∆
(>R )
Γ ` >, ∆
Γ, A ` ∆ Γ, B ` ∆ Γ ` A, B, ∆
(∨L ) (∨R )
Γ, A ∨ B ` ∆ Γ ` A ∨ B, ∆
(⊥L )
Γ, ⊥ ` ∆
Γ ` A, ∆ Γ, B ` ∆ Γ, A ` B, ∆
(⇒L ) (⇒R )
Γ, A ⇒ B ` ∆ Γ ` A ⇒ B, ∆
Γ ` A, ∆ Γ, A ` ∆
(¬L ) (¬R )
Γ, ¬A ` ∆ Γ ` ¬A, ∆
X ∗ = ¬X (¬X)∗ = X
(A ∧ B)∗ = A∗ ∨ B ∗ (A ∨ B)∗ = A∗ ∧ B ∗
∗
> =⊥ ⊥∗ = >
Γ ` A, ∆ Γ 0 , A ` ∆0
(ax) (cut)
A`A Γ, Γ ` ∆, ∆0
0
Γ, A, B ` ∆ Γ ` A, ∆ Γ0 ` B, ∆0
(∧L ) (∧R )
Γ, A ∧ B ` ∆ Γ, Γ0 ` A ∧ B, ∆, ∆0
(>R )
Γ ` >, ∆
Γ, A ` ∆ Γ 0 , B ` ∆0 Γ ` A, B, ∆
(∨L ) (∨R )
Γ, Γ0 , A ∨ B ` ∆, ∆0 Γ ` A ∨ B, ∆
(⊥L )
Γ, ⊥ ` ∆
Γ ` A, ∆ Γ0 , B ` ∆0 Γ, A ` B, ∆
(⇒L ) (⇒R )
Γ, Γ0 , A ⇒ B ` ∆, ∆0 Γ ` A ⇒ B, ∆
Γ ` A, ∆ Γ, A ` ∆
(¬L ) (¬R )
Γ, ¬A ` ∆ Γ ` ¬A, ∆
Because of this, we can restrict to proving sequents of the form ` ∆, which are
called single-sided. All the rules preserve single-sidedness excepting the axiom
rule, which is easily modified in order to satisfy this property. With some extra
care, we can even come up with a presentation which does not require any
structural rule (those are admissible): the resulting presentation of the calculus
is given in figure 2.12. If we do not want to consider only formulas where only
variables can be negated, then the de Morgan laws can be added as the following
explicit rules:
(ax) (ax)
` ∆, A∗ , ∆0 , A, ∆00 ` ∆, A, ∆0 , A∗ , ∆00
` ∆, A, ∆0 ` ∆, A∗ , ∆0
(cut)
` ∆, ∆0
` ∆, A, ∆0 ` ∆, B, ∆0 ` ∆, A, B, ∆0
(∧) (∨)
` ∆, A ∧ B, ∆0 ` ∆, A ∨ B, ∆0
` ∆, ∆0
(>) (⊥)
` ∆, >, ∆0 ` ∆, ⊥, ∆0
otherwise not maintain the invariant of having one formula on the right. With
little more care, one can write rules which do not require adding structural rules
(they are admissible): the resulting calculus is presented in figure 2.13. Remark
that, in order for contraction to be admissible, one has to keep A ⇒ B in the
context of the left premise. Similarly to theorem 2.6.2.1, one shows:
Theorem 2.6.3.1. A sequent Γ ` A is provable in NJ if and only if it is provable
in LJ.
Γ`A Γ, A ` B
0 (ax) (cut)
Γ, A, Γ ` A Γ`B
Γ, A, B, Γ0 ` C Γ`A Γ`B
(∧L ) (∧R )
Γ, A ∧ B, Γ0 ` C Γ`A∧B
(>R )
Γ`>
Γ, A, Γ0 ` C Γ, B, Γ0 ` C Γ`A Γ`B
(∨L ) (∨lL ) (∨rL )
Γ, A ∨ B, Γ0 ` C Γ`A∨B Γ`A∨B
(⊥L )
Γ, ⊥, Γ0 ` A
Γ, A ⇒ B, Γ0 ` A Γ, B, Γ0 ` C Γ, A ` B
0 (⇒L ) (⇒R )
Γ, A ⇒ B, Γ ` C Γ`A⇒B
Γ, ¬A, Γ0 ` A Γ, A ` ⊥
(¬L ) (¬R )
Γ, ¬A, Γ0 ` ⊥ Γ ` ¬A
| Or of t * t
| True | False
Using this representation, the negation of a formula can be computed with the
function
let rec neg = function
| Var (n, x) -> Var (not n, x)
| Imp (a, b) -> And (a, neg b)
| And (a, b) -> Or (neg a, neg b)
| Or (a, b) -> And (neg a, neg b)
| True -> False
| False -> True
Proof search in intuitionistic logic. Proof search can be performed in LJ, but
the situation is more subtle. First remark that, similarly to the situation in LK
(proposition 2.6.5.1), we have
Proposition 2.6.5.3. LJ has the subformula property.
As an immediate consequence, we deduce
Theorem 2.6.5.4. We can decide whether a sequent Γ ` A is provable in LJ or
not.
Proof. There is only a finite number of subformulas of Γ ` A. We can restrict
to sequents where a formula occurs at most 3 times in the context [Gir11,
section 4.2.2] and therefore there is a finite number of possible sequents formed
with those subformulas. By testing all the possible rules, we can determine
which of those are provable, and thus determine whether the initial sequent is
provable.
Previous theorem is constructive, but the resulting algorithm is quite inefficient.
The problem of finding proofs is more delicate than for LK because not all
the rules are reversible: (∨lL ), (∨rL ) and (⇒L ) are not reversible. The rules (∨lL ),
(∨rL ) are easy to handle when performing proof search: when trying to prove a
formula A ∨ B, we either try to prove A or to prove B. The rule (⇒L )
Γ, A ⇒ B, Γ0 ` A Γ, B, Γ0 ` C
(⇒L )
Γ, A ⇒ B, Γ0 ` C
is more difficult to handle. If we apply it naively, it can loop for the same
reasons as in section 2.4.2:
.. ..
. .
(⇒L ) ..
Γ, A ⇒ B ` A Γ, B ` B .
(⇒L )
Γ, A ⇒ B ` A Γ, B ` B
(⇒L )
Γ, A ⇒ B ` A
Γ, A, Γ0 ` B X ∈ Γ, Γ0
(⇒X )
Γ, X ⇒ A, Γ0 ` B
Γ, B ⇒ C, Γ0 ` A ⇒ B Γ, C ` D
0 (⇒⇒ )
Γ, (A ⇒ B) ⇒ C, Γ ` D
Γ, A ⇒ (B ⇒ C), Γ0 ` D Γ, A ⇒ C, B ⇒ C, Γ0 ` D
(⇒∧ ) (⇒∨ )
Γ, (A ∧ B) ⇒ C, Γ0 ` D Γ, (A ∨ B) ⇒ C, Γ0 ` D
Γ, A, Γ0 ` B Γ, Γ0 ` B
(⇒> ) (⇒⊥ )
Γ, > ⇒ A, Γ0 ` B Γ, ⊥ ⇒ A, Γ0 ` B
The logic LJT was introduced by Dyckoff in order to overcome this prob-
lem [Dyc92]. It is obtained from LJ by replacing the (⇒L ) rule with the six
rules of figure 2.14, which allow proving sequents of the form Γ, A ⇒ B, Γ0 ` C,
depending on the form of A.
Proposition 2.6.5.5. A sequent is provable in LJ if and only if it is provable
in LJT.
The main interest of this variant is that proof search is always terminating (thus
the T in LJT). Moreover, the rules (⇒∧ ), (⇒∨ ), (⇒> ) and (⇒⊥ ) are reversible
and can thus always be applied during proof search. Many variants of this idea
have been explored, such as the SLJ calculus [GLW99].
A proof search procedure based on this sequent calculus can be implemented
as follows. We describe terms as usual as
type t =
| Var of string
| Imp of t * t
| And of t * t
| Or of t * t
| True | False
The procedure which determines whether a formula is provable is then shown
in figure 2.15. This procedure takes as argument two contexts Γ0 and Γ (respec-
tively called env’ and env) and a formula A. Initially, the context Γ0 is empty
and will be used to store the formulas of Γ which have already “processed”. The
procedure first applies all the reversible right rules, then all the reversible left
rules; a formula of Γ which does not give rise to a reversible left rule is put
in Γ0 . Once this is done, the procedure tries to apply the axiom rule, handles
disjunctions by trying to apply either (∨lL ) or (∨rL ), and finally successfully tries
all the possible applications of the non-reversible rules (⇒X ) and (⇒⇒ ). Here,
the function context_formulas returns, given a context Γ the list of all the pairs
consisting of a formula A and a context Γ0 , Γ00 such that Γ = Γ0 , A, Γ00 , i.e. the
context Γ where some formula A has been removed.
CHAPTER 2. PROPOSITIONAL LOGIC 101
Γ`A⇒B Γ`A
(ax) (⇒E )
Γ, A, Γ0 ` A Γ`B
respectively called axiom and modus ponens. Of course, there is very little that
we can deduce with these two rules only. The other necessary logical principles
are added in the form of axiom schemes, which can be assumed at any time
during the proofs. In the case of the implicational fragment (implication is the
only connective from which are built the formulas), those are
(K) A ⇒ B ⇒ A,
(S) (A ⇒ B ⇒ C) ⇒ (A ⇒ B) ⇒ A ⇒ C.
By “axiom schemes”, we mean that the above formulas can be assumed, for
whichever given formulas A, B and C. In other words, this amounts to adding
the rules
(K) (S)
Γ`A⇒B⇒A Γ ` (A ⇒ B ⇒ C) ⇒ (A ⇒ B) ⇒ A ⇒ C
A sequent is provable when it is the conclusion of a proof built from the above
rules, and a formula A is provable when the sequent ` A is provable.
Example 2.7.1.1. For any formula A, the formula A ⇒ A is provable:
(S) (K)
` (A ⇒ (B ⇒ A) ⇒ A) ⇒ (A ⇒ B ⇒ A) ⇒ A ⇒ A ` A ⇒ (B ⇒ A) ⇒ A
(⇒E ) (K)
` (A ⇒ B ⇒ A) ⇒ A ⇒ A `A⇒B⇒A
(⇒E )
`A⇒A
– Ai is an instance of an axiom, or
– there are indices j, k < i such that Ak = Aj ⇒ Ai , i.e. Ai can be deduced
by
Γ ` Aj ⇒ Ai Γ ` Aj
(⇒E )
Γ ` Ai
CHAPTER 2. PROPOSITIONAL LOGIC 103
1. (A ⇒ (B ⇒ A) ⇒ A) ⇒ (A ⇒ B ⇒ A) ⇒ A ⇒ A by (S)
2. A ⇒ (B ⇒ A) ⇒ A by (K)
3. (A ⇒ B ⇒ A) ⇒ A ⇒ A by modus ponens on 1. and 2.
4. A ⇒ B ⇒ A by (K)
2.7.2 Other connectives. In the case where other connectives than implica-
tion are considered, appropriate axioms should be added:
truth: A⇒>
disjunction: A ∨ B ⇒ (A ⇒ C) ⇒ (B ⇒ C) ⇒ C A⇒A∨B
B ⇒A∨B
falsity: ⊥⇒A
negation: ¬A ⇒ A ⇒ ⊥ A ⇒ ⊥ ⇒ ¬A
It can be observed that the axioms are somehow in correspondence with elim-
ination and introduction rules in natural deduction (respectively left and right
column above). The classical variants of the system can be obtained by further
adding one of the axioms of theorem 2.5.1.1.
Γ, A, B, Γ0 ` C Γ, A, A ` C Γ, > ` A Γ`C
Γ, B, A, Γ0 ` C Γ, A ` C Γ`A Γ, A ` C
Γ, A, Γ0 ` B
(⇒I )
Γ, Γ0 ` A ⇒ B
allows us to conclude.
We can thus show that provability in this system is the usual one.
Theorem 2.7.3.3. A sequent Γ ` A is provable in Hilbert calculus if and only if
it is provable in natural deduction.
Proof. For simplicity, we restrict to the case of the implicational fragment. In
order to show that a proof in the Hilbert calculus induces a proof in NJ, we
should show that the rules (ax) and (⇒E ) are admissible in NJ (this is the case
by definition) and that the axioms (S) and (K) can be derived in NJ, which is
easy:
(ax) (ax) (ax) (ax)
A ⇒ B ⇒ C, A ⇒ B, A ` A ⇒ B ⇒ C A ⇒ B ⇒ C, A ⇒ B, A ` A A ⇒ B ⇒ C, A ⇒ B, A ` A ⇒ B A ⇒ B ⇒ C, A ⇒ B, A ` A
(⇒E ) (⇒E )
A ⇒ B ⇒ C, A ⇒ B, A ` B ⇒ C A ⇒ B ⇒ C, A ⇒ B, A ` B
(⇒E )
A ⇒ B ⇒ C, A ⇒ B, A ` C
(⇒I )
A ⇒ B ⇒ C, A ⇒ B ` A ⇒ C
(⇒I )
A ⇒ B ⇒ C ` (A ⇒ B) ⇒ A ⇒ C
(⇒I )
` (A ⇒ B ⇒ C) ⇒ (A ⇒ B) ⇒ A ⇒ C
CHAPTER 2. PROPOSITIONAL LOGIC 105
(this is voluntarily too small to read, you should prove this by yourself) and
(ax)
A, B ` A
(⇒I )
A`B⇒A
(⇒I )
`A⇒B⇒A
w X iff ρ(w, X)
w > holds
w ⊥ does not hold
w A∧B iff w A and w B
w A∨B iff w A or w B
w A⇒B iff, for every w0 > w, w0 A implies w0 B
w ¬A iff, for every w0 > w, w0 A does not hold
might have different possible futures. In each world, the valuation indicates
which formulas we know are true, and the monotonicity condition imposes that
our knowledge can only grow: if we know that a formula is true then we will
still know it in the future.
Lemma 2.8.1.1. Satisfaction is monotonic: given a formula A, a Kripke structure
W and a world w, if w A then w0 A for every world w0 > w.
Proof. By induction on the formula A.
Given a context Γ = A1 , . . . , An , a formula A, and a Kripke structure W , we
write Γ W A when, for every world w ∈ W in which all the formulas Ai are
satisfied, the formula A is also satisfied. We write Γ A when Γ W A holds
for every structure W : in this case, we say that A is valid in the context Γ.
Remark 2.8.1.2. It should be observed that the notion of Kripke structure gen-
eralizes the notion of boolean model recalled in section 2.5.6. Namely, a boolean
valuation ρ : X → B can be seen as a Kripke structure W with a single world w,
the valuation being given by ρ. The notion of validity for Kripke structures
defined above then coincides with the one for boolean models.
A following theorem ensures that Kripke semantics are sound: a provable
formula is valid.
Theorem 2.8.1.3 (Soundness). If a sequent Γ ` A is derivable in intuitionistic
logic then Γ A.
Proof. By induction on the proof of Γ ` A.
The contraposite of this theorem says that if we can find a Kripke structure in
which there is a world where a formula A is not satisfied, then A is not intu-
itionistically provable. This thus provides an alternative to methods based on
cut-elimination (see section 2.3.5) in order to discard the provability of formulas.
Example 2.8.1.4. Consider formula expressing double negation elimination ¬¬X ⇒ X
and the Kripke structure with W = {w0 , w1 }, with w0 6 w1 , and ρ(w0 , X) = 0
and ρ(w1 , X) = 1, which can be pictured as
X
· ·
w0 w1
A A A
· · · · · ·
w0 w1 w0 w1 w0 w1
In the two first cases, ¬¬A is satisfied and in the last one ¬A is satisfied.
Therefore, the weak excluded middle ¬¬A ∨ ¬A is always satisfied: this shows
CHAPTER 2. PROPOSITIONAL LOGIC 107
that the weak excluded middle does not imply the excluded middle. By using a
similar reasoning, it can be shown that the linearity axiom (A ⇒ B) ∨ (B ⇒ A)
does not imply the excluded middle. Both thus give rise to intermediate logics,
see section 2.5.10.
Example 2.8.1.5. The Kripke structure
X Y
· · ·
W = {wΦ | Φ is complete}
Remark 2.8.2.7. It can be shown that we can restrict to Kripke models which are
tree-shaped and finite without losing completeness. With further restrictions,
various completeness results have been obtained. As an extreme example, if
we restrict to models with only one world, then we obtain boolean models
(remark 2.8.1.2) which are complete for classical logic (theorem 2.5.6.5). For a
more unexpected example, Kripke models which are total orders are complete
for intuitionistic logic extended with the linearity axiom (section 2.5.10), thus
its name.
Chapter 3
Pure λ-calculus
λf x.f (f x)(λy.y)
This means that we can replace the name of the bound variable by any other
(as above) without changing the meaning of the expression. For instance, the
first one is equivalent to
x
lim
z→∞ z
This process of changing the name of the variable is called α-conversion and
is more subtle than it seems at first: there are actually some restrictions on
the names of the variables we can use. For instance, in the above example, we
cannot rename t to x since the following reasoning is clearly not valid:
x x
0 = lim = lim = lim 1 = 1
t→∞ t x→∞ x x→∞
The problem here is that we tried to change the name of t to a variable name
which was already used somewhere else. These issues are generally glossed over
in mathematics, but in computer science we cannot simply do that: we have to
understand in details these α-conversion mechanisms when implementing func-
tional programming languages, otherwise we will incorrectly evaluate programs.
Believe it or not this simple matter is a major source of bugs and headaches.
CHAPTER 3. PURE λ-CALCULUS 111
It means that the function which to x associates some expression t, when applied
to an argument u reduces to t where all the occurrences of x have been replaced
by u. The properties of this reduction relation is one of our main objects of
interest here.
In this chapter. We introduce the λ-calculus in section 3.1 and the β-reduction
in section 3.2. We then study the computational power of the resulting calculus
in section 3.3 and show that reduction in confluent in section 3.4. We discuss
the various ways in which reduction can be implemented in section 3.5, and the
ways to handle α-conversion in section 3.6.
References. Should you need a more detailed presentation of λ-calculus, its prop-
erties and applications, good introductions include [Bar84, SU06, Sel08].
3.1 λ-terms
3.1.1 Definition. Suppose fixed an infinite countable set X = {x, y, z, . . .}
whose elements are called variables. The set Λ of λ-terms is generated by the
following grammar:
t, u ::= x | t u | λx.t
This means that a λ-term is either
– a variable x,
– an application t u, which is a pair of terms t and u, though of as applying
the function t to an argument u,
– an abstraction λx.t, which is a pair consisting of a variable x and a term t,
thought of as the function which to x associates t.
By convention,
tuv = (tu)v
λx.xy = λx.(xy)
3.1.2 Bound and free variables. In a term of the form λx.t, the variable x
is said to be bound in the term t: in a sense the abstraction “declares” the
variable in the term t, and all occurrences of x in t will make reference to the
variable declared here (unless it is bound again). Thus, in the term (λx.xy)x, the
first occurrence of x refers to the variable declared by the abstraction whereas
the second does not. Intuitively, this term is the same as (λz.zy)x, but not
to (λz.zy)z; this will be made formal below through the notion of α-equivalence,
but we should keep in mind that there is always the possibility of renaming
bound variables.
A free variable in a term is a variable which is not bound in a subterm. We
define the set FV(t) of a term t is formally defined, by induction on t by
FV(x) = {x}
FV(t u) = FV(t) ∪ FV(u)
FV(λx.t) = FV(t) \ {x}
t{y/x}
for the resulting term. There is one subtlety though, we only want to rename
free occurrences of x, since the other ones refer to the abstraction to which they
are bound. Formally, the renaming t{y/x} is defined by
x{y/x} = y
z{y/x} = z if z 6= x
(t u){y/x} = (t{y/x}) (u{y/x})
(λx.t){y/x} = λx.t
(λy.t){y/x} = λx.(t{y/x})
(λz.t){y/x} = λz.(t{y/x}) if z 6= x and z 6= y
CHAPTER 3. PURE λ-CALCULUS 113
The tree last lines handle the possible cases when renaming a variable in an
abstraction: either we are trying to rename the bound variable, or we are trying
to rename a variable into the bound variable, or the bound variable and variables
involved in the renaming are distinct.
The α-equivalence ===α (or α-conversion) is the smallest congruence on
terms which identifies terms differing only by renaming bound variables, i.e.
Formally, the fact that α-equivalence is a congruence means that it is the small-
est relation such that whenever all the relations above the bar hold, the relation
below the bar also holds:
y 6∈ FV(t)
λx.t ===α λy.(t{y/x})
The equation on the first line is the one we have already seen above, those
on the second line ensure that α-equivalence is compatible with application
and abstraction, and those on the third line imposes that α-equivalence is an
equivalence relation (i.e. reflexive, symmetric and transitive).
x[u/x] = u
y[u/x] = y if y 6= x
(t1 t2 )[u/x] = (t1 [u/x]) (t2 [u/x])
(λx.t)[u/x] = λx.t
(λy.t)[u/x] = λy.(t[u/x]) if y 6= x and y 6∈ FV(t)
0 0
(λy.t)[u/x] = λy .(t{y /y}[u/x]) if y 6= x, y ∈ FV(t)
and y 0 6∈ FV(t) ∪ FV(u).
Because of the last line, the result of the substitution is not well-defined, be-
cause it depends on an arbitrary choice of fresh variable y 0 , but one can show
that this is a well-defined operation on λ-terms up to α-equivalence. For this
reason, as soon as we want to perform substitutions, it only makes sense to
consider the set of λ-terms quotiented by the α-equivalence relation: we will
implicitly do so in the following, and implicitly ensure that all the constructions
we perform are compatible with α-equivalence. The only time where we should
take α-conversion seriously is when dealing with implementation matters, see
section 3.6.2 for instance. Adopting this convention, the three last cases can be
replaced by
(λy.t)[u/x] = λy.(t[u/x])
where we suppose that y 6∈ FV(t) ∪ {x}, which we can always do up to α-con-
version.
3.2 β-reduction
Consider a term of the form
(λx.t) u (3.1)
It intuitively consists of a function expecting an argument x and returning a
result t(x), which is given an argument u. We expect therefore the computation
to reach the term t[u/x] consisting of the term t where all the free occurrences
of x have been replaced by u. This is what the notion of β-reduction does and
we write
(λx.t) u −→β t[u/x] (3.2)
to indicate that the term on the left reduces to the term on the right. Actually,
we want to be able to also perform this kind of reduction within a term: we call
a β-redex in a term t, a subterm of the form (3.1) and the β-reduction consists
in preforming the replacement (3.2) in the term.
t −→β t0
(βs ) (βλ )
(λx.t)u −→β t[u/x] λx.t −→β λx.t0
t −→β t0 u −→β u0
(βl ) (βr )
tu −→β t0 u tu −→β tu0
CHAPTER 3. PURE λ-CALCULUS 115
A “proof tree” showing that t −→β u is called a derivation of it. For instance,
a derivation of λx.(λy.y)xz −→β λx.xz is
(βs )
(λy.y)x −→β x
(βl )
(λy.y)xz −→β xz
(βλ )
λx.(λy.y)xz −→β λx.xz
Such derivations are often useful to reason about β-reduction steps, by induction
on the derivation tree.
3.2.3 Reduction and redexes. Let us now make some basic observations
about how reductions interact with redexes. Reduction can create β-redexes:
In the initial term there was only one redex, and after reducing it a new redex
has appeared. Reductions can duplicate β-redexes:
The β-redex (λy.y)(λz.z) occurs once in the initial term and twice in the reduced
one. Reduction can also erase β-redexes:
(λx.y)((λy.y)(λz.z)) −→β y
There were two redexes in the initial term, but there is none left after reducing
one of them.
3.2.4 Confluence. The reduction is not deterministic since some terms can
reduce in multiple ways:
when there exists a reduction path from t to u as above, and say that t reduces
∗
in multiple steps to u. The relation −→β on terms is the reflexive and transitive
closure of the relation −→β .
3.2.6 Normalization. Some terms cannot reduce, they are called normal forms:
x x(λy.λz.y) ...
λx.t or x t1 . . . tn
Ω = (λx.xx)(λx.xx)
(λx.y)((λx.xx)(λx.xx))
can reduce to y, which is a normal form and is thus weakly normalizing. It can
also reduce to itself and is thus not strongly normalizing.
3.2.7 β-equivalence. We write ===β for the β-equivalence, which is the small-
est equivalence relation containing −→β . It is not difficult to show that this
∗
relation can be characterized as the transitive closure of the relation −→β : we
have
t ===β u
whenever there exists terms t0 , . . . , t2n such that
∗ ∗ ∗ ∗ ∗ ∗
t = t0β ←− t1 −→β t2β ←− t3 −→β . . . β ←− t2n−1 −→β t2n = u
3.2.8 η-equivalence. In OCaml, the functions sin and fun x -> sin x are
clearly “the same”: one can be used in place of another without changing any-
thing, both will compute the sine of their input. However, they are not iden-
tical: their syntax differ. In λ-calculus, the η-equivalence relation relates two
such terms: it identifies a term t (which is a function, since everything is func-
tion in λ-calculus) with the function which to x associates t x. Formally, the
η-equivalence relation ===η is the smallest congruence such that
t ===η λx.tx
λx.tx −→η t
t −→η λx.tx
CHAPTER 3. PURE λ-CALCULUS 118
is also useful and called η-expansion. We have that η-equivalence is the reflexive,
symmetric and transitive closure of this relation. The converse relation
t −→ λx.tx
I = λx.x
I t −→β t
3.3.2 Booleans. The booleans true and false can respectively be encoded as
T = λxy.x F = λxy.y
With this encoding, the usual if-then-else conditional construction can be en-
coded as
if = λbxy.bxy
Namely, we have
∗ ∗
if T t u −→β t if F t u −→β u
and suggests the definition we made. There are of course other possible imple-
mentations, e.g.
and = λxy.xyx
In the above implementations, we only guarantee that the expected reductions
will happen when the arguments are booleans, but nothing is specified when
the arguments are arbitrary λ-terms.
3.3.3 Pairs. The encoding of pairs can be deduced from booleans. We can
namely encode the pairing operator as
pair = λxyb.if b x y
which can be thought of as an encoding of the pair ht, ui. In order to recover
the components of the pair, we can simply apply it to either T or F:
∗ ∗
(pair t u) T −→β t (pair t u) F −→β u
f 0x = x f n+1 x = f (f n x)
Otherwise said, the λ-term n is such that, when applied to arguments f and x,
iterates n times the application of f to x. For low values of n, we have
add = λmnf x.m succ n mul = λmnf x.m (add n) 0 exp = λmn.n (mul m) 1
or, alternatively, as
Exercise 3.3.4.1. The Ackermann function [Ack28] from pairs of natural numbers
to natural numbers is the function A defined by
A(0, n) = n + 1
A(m + 1, 0) = A(m, 1)
A(m + 1, n + 1) = A(m, A(m + 1, n))
Predecessor. We are now going to see how we can implement the predecessor
function mentioned above. Before going into that, let us see how we can imple-
ment the Fibonacci sequence fn defined by f0 = 0, f1 = 1 and fn+1 = fn +fn−1 .
A naive implementation would be
let rec fib n =
if n = 0 then 0
else if n = 1 then 1
else fib (n-1) + fib (n-2)
This function is highly inefficient because many computations are performed
multiple times. For instance, to compute fn , we compute both fn−1 and fn−2 ,
but the computation of fn−1 will require computing another time fn−2 , and so
on. The usual strategy to improve that consists in computing two successive
values (fn−1 , fn ) of the Fibonacci sequence at a time. Given such a pair, the
next pair is computed by
t u ===β u
t (Y t) ===β Y t
Fixpoints in OCaml. Before giving a λ-term which is fixpoint operator, let us see
how it can be implemented in OCaml and used to program recursive functions.
In practice, we will look for a function Y such that
Y t −→β t (Y t)
Note that such a function is necessarily non-terminating since there is an infinite
sequence of reductions
Y t −→β t (Y t) −→β t t (Y t) −→β t t t (Y t) −→β . . .
but might still be useful because since there might be other possible reductions
reaching a normal form. Following the conventions, we will write fix instead of
Y. A function which behaves as proposed is easily implemented:
let rec fix f = f (fix f)
Let us see how this can be used in order to implement the factorial function
without explicitly resorting to recursion. The factorial function satisfies 0! = 1
and n! = n × (n − 1)! so that it can be implemented as
let rec fact n =
if n = 0 then 1 else n * fact (n-1)
In order to implement it without using recursion, the trick is to first transform
this function into one which takes, as first argument, a function f which is to
be the factorial itself, and replace recursive calls by calls to this function:
let fact_fun f n =
if n = 0 then 1 else n * f (n - 1)
We then expect the factorial function to be obtained as its fixpoint:
let fact = fix fact_fun
Namely, this function will reduce to fact_fun (fix fact_fun), i.e. the above
function where f was replaced by the function itself, as expected. However, if
we try to define the function fact in this way, OCaml complains:
Stack overflow during evaluation (looping recursion?).
This is because OCaml always evaluates arguments first, so that it will fall into
the infinite sequence of reductions mentioned above (the stack will grow at each
recursive call and will exceed the maximal authorized value):
fix fact_fun −→β fact_fun (fix fact_fun) −→β . . .
The trick in order to avoid that in order to avoid that is to add an argument in
the definition of fix:
let rec fix f x = f (fix f) x
and now the above definition of factorial computes as expected: this time, the
argument fix f does not evaluate further because it is a function which is
still expecting its second argument. It is interesting to remark that the two
definitions of fix (the looping one and the working one) are η-equivalent, see
section 3.2.8, so that two η-equivalent terms can act differently depending on
the properties we consider.
CHAPTER 3. PURE λ-CALCULUS 124
Fixpoints in λ-calculus. The above definition of fix does not easily translate
to λ-calculus, because there is no simple way of defining recursive functions. A
possible implementation of the fixpoint combinator can be obtained by a variant
on the looping term Ω (see section 3.2.6). The Curry fixpoint combinator is
Remark 3.3.5.1. Following Church’s initial intuition when introducing the λ-cal-
culus, we can think of λ-terms as describing sets, in the sense of set theory (see
section 5.3). Namely, a set t can be thought of as a predicate, i.e. a function
CHAPTER 3. PURE λ-CALCULUS 125
let fix f =
(fun x y -> f (arr x x) y) (Arr (fun x y -> f (arr x x) y))
where we use the shorthand
let arr (Arr f) = f
In order to make this result more precise, we should encode Turing machines
into λ-terms. Instead of doing this directly, we can rather encode recursive
functions, which are already known to have the same expressiveness as Turing
machines. The class of recursive functions is the smallest class of partially
defined functions f : Nk → N for some k ∈ N, which contains the zero constant
function z, the successor function s and the projections pki , for k ∈ N and
1 6 i 6 k:
z : N0 → N s : N1 → N pki : Nk → N
() 7→ 0 (n) 7→ n + 1 (n1 , . . . , nk ) 7→ ni
f : Nl → N and g1 , . . . , gl : Nk → N
the function
compfg1 ,...,gl : Nl → N
(n1 , . . . , nk ) 7→ f (g1 (n1 , . . . , nk ), . . . , gl (n1 , . . . , nk ))
is also recursive,
– primitive recursion: given recursive functions
f : Nk → N and g : Nk+2 → N
CHAPTER 3. PURE λ-CALCULUS 127
the function
recf,g : Nk+1 → N
(0, n1 , . . . , nk ) 7→ f (n1 , . . . , nk )
(n0 + 1, n1 , . . . , nk ) 7→ g(recf,g (n0 , n1 , . . . , nk ), n0 , n1 , . . . , nk )
is also recursive,
– minimization: given a recursive function f : Nk+1 → N the function
minf : Nk → N
composition is given by
primitive recursion by
and minimization by
can now use this trick to correct the behavior of our encoding. For instance,
the projection should be encoded as
For the converse property, i.e. that the definable functions are recursive, we
should encode λ-terms and their reduction into natural numbers, sometimes
called Gödel numbers. This can be done, see [Bar84] (or, if you are willing to
accept that recursive functions are Turing-equivalent to usual programming
languages, this amounts to showing that we can write a program which reduces
λ-terms, which we can, see section 3.5).
is encoded as

⌜t⌝ = λi.i (λn f x.i f (i (i n f) x)) (λf x.x)

Even though the original term t could reduce, the term ⌜t⌝ cannot (because of
the manipulation we have performed on applications), and can thus be considered
as a decent encoding of t. We can then define an interpreter int with the
property that, for every λ-term t, int ⌜t⌝ β-reduces to the normal form of t.
More details can be found in [Bar91, Mog92, Lyn17].
3.3.8 Adding constructors. Even though we have seen that all the usual
constructions can be encoded in the λ-calculus, it is often convenient to add
those as new explicit constructions to the calculus. For instance, products can
be added to the λ-calculus by extending the syntax of λ-terms to

t, u ::= x | t u | λx.t | ⟨t, u⟩ | πl(t) | πr(t)
The β-reduction also has to be extended in order to account for those: we add
the two new reduction rules

πl(⟨t, u⟩) −→β t        πr(⟨t, u⟩) −→β u
which express the fact that the left (resp. right) projection extracts the left
(resp. right) component of a pair. Although most important properties (such as
confluence) generalize to such variants of λ-calculus, we stick here to the plain
one for simplicity. Some extensions are used and detailed in section 4.3.
For instance, the term (λxy.y)((λa.a)(λb.b)) admits two distinct reductions:

λy.y ←β (λxy.y)((λa.a)(λb.b)) −→β (λxy.y)(λb.b)
Another hope might be that if we reduce a term long enough, we will end
up on a normal form (a term that cannot be reduced further), which can be
considered as a result of the computation, and that if we perform two such
reductions on a term, we will end up on the same normal form: the intermediate
steps might not be the same, but in the end we always end up with the same
result. For instance, on natural numbers, we can speak of 10 as the result of
(1 + 2) + (3 + 4)

because it does not depend on the intermediate steps used to compute it:

(1 + 2) + (3 + 4) −→ 3 + (3 + 4) −→ 3 + 7 −→ 10
(1 + 2) + (3 + 4) −→ (1 + 2) + 7 −→ 3 + 7 −→ 10
However, in the case of λ-calculus, this hope is vain because we have seen that
some terms might lead to infinite sequences of β-reductions, thus never reaching
a normal form.
3.4.1 Confluence. The property which turns out to be satisfied in the case of
λ-calculus is called confluence: it states that if a term t reduces in many steps
to a term u1 and also to a term u2, then there exists a term v such that both
u1 and u2 reduce in many steps to v:

u1 ∗←− t −→∗ u2        implies        u1 −→∗ v ∗←− u2 for some v
For instance, where I = λx.x is the identity, the two reductions starting
from (λyx.xyy)(I I), namely to λx.x(I I)(I I) and to (λyx.xyy) I, both further
reduce to λx.x I I. The easiest way to prove this confluence result
first requires introducing a variant of β-reduction.
The parallel β-reduction is the relation ⇉ defined by the following inference rules:

(β‖x)  x ⇉ x

(β‖s)  if t ⇉ t′ and u ⇉ u′ then (λx.t) u ⇉ t′[u′/x]

(β‖a)  if t ⇉ t′ and u ⇉ u′ then t u ⇉ t′ u′

(β‖λ)  if t ⇉ t′ then λx.t ⇉ λx.t′

As usual, we write ⇉∗ for the reflexive and transitive closure of the relation ⇉.
Informally, t ⇉ u means that u is obtained from t by reducing in one step
many of the β-redexes present in t at once. For instance, we have

(λx.x z) (λy.y) ⇉ (λy.y) z ⇉ z        but not        (λx.x z) (λy.y) ⇉ z
As for usual β-reduction, the parallel β-reduction might create some β-redexes
which were not present in the original term, and could thus not be reduced at
first. For this reason, even though we can reduce in multiple places at once, we
cannot perform a parallel β-reduction step directly from the term on the left to
the term on the right in the above example.
In parallel β-reduction, we are allowed not to perform all the available β-
reduction steps. In particular, we may perform none:
Lemma 3.4.2.1. For every λ-term t, we have t ⇉ t.
Proof. By induction on the term t.
A sequence of β-reductions t = t0 −→β t1 −→β . . . −→β tn = u induces a sequence
of parallel reductions t = t0 ⇉ t1 ⇉ . . . ⇉ tn = u, and thus t ⇉∗ u. For the
substitution lemma, in the base case of a variable y we have either

y[u/x] = y ⇉ y = y[u′/x]        or        x[u/x] = u ⇉ u′ = x[u′/x]

depending on whether y ≠ x or y = x.
– If the last rule is

(β‖s), deriving (λy.t1) t2 ⇉ t′1[t′2/y] from t1 ⇉ t′1 and t2 ⇉ t′2,

with y ≠ x, then, by induction hypothesis, we have t1[u/x] ⇉ t′1[u′/x] and
t2[u/x] ⇉ t′2[u′/x], and the rule allows us to conclude. Similarly, in the case
of the rule (β‖a) we obtain

(t1[u/x]) (t2[u/x]) ⇉ (t′1[u′/x]) (t′2[u′/x])

otherwise said

(t1 t2)[u/x] ⇉ (t′1 t′2)[u′/x]

and, in the case of the rule (β‖λ),

(λy.t1)[u/x] = λy.t1[u/x] ⇉ λy.t′1[u′/x] = (λy.t′1)[u′/x]
We can use this lemma to show that the parallel β-reduction satisfies a variant
of the confluence property called the diamond property, or local confluence:
Lemma 3.4.3.5 (Diamond property). Suppose that t ⇉ u and t ⇉ u′. Then
there exists v such that u ⇉ v and u′ ⇉ v.
Proof. By induction on the derivations of t ⇉ u and t ⇉ u′. The interesting
case is that of a redex (λx.t1) t2 reduced on both sides: from t1 ⇉ u1, t1 ⇉ u′1,
t2 ⇉ u2 and t2 ⇉ u′2, the induction hypothesis provides v1 and v2 such that
u1 ⇉ v1, u′1 ⇉ v1, u2 ⇉ v2 and u′2 ⇉ v2. We thus have

u1[u2/x] ⇉ v1[v2/x]        and        u′1[u′2/x] ⇉ v1[v2/x]

by lemma 3.4.3.4, so that both reducts of (λx.t1) t2 further reduce to v1[v2/x].
Lemma 3.4.3.6. Suppose that t ⇉ u and t ⇉∗ u′. Then there exists v such
that u ⇉∗ v and u′ ⇉ v.
Proof. By recurrence on the length of the reduction t ⇉∗ u′, using
lemma 3.4.3.5.
Theorem 3.4.3.7 (Confluence). Suppose that t ⇉∗ u and t ⇉∗ u′. Then there
exists v such that u ⇉∗ v and u′ ⇉∗ v.
Proof. By recurrence on the length of the reduction t ⇉∗ u, using
lemma 3.4.3.6.
Theorem 3.4.4.1 (Confluence of β-reduction). Given terms t, u1 and u2 such
that t −→∗β u1 and t −→∗β u2, there exists v such that u1 −→∗β v and u2 −→∗β v.
Proof. Suppose that t −→∗β u1 and t −→∗β u2. By lemma 3.4.3.3, we have
t ⇉∗ u1 and t ⇉∗ u2. From theorem 3.4.3.7, we deduce the existence of v such
that u1 ⇉∗ v and u2 ⇉∗ v and, by lemma 3.4.3.3 again, we have u1 −→∗β v
and u2 −→∗β v.
This implies the following theorem, sometimes called the Church-Rosser property
of λ-calculus:
Theorem 3.4.4.2 (Church-Rosser). Given two terms t and u such that t =β u,
there exists a term v such that t −→∗β v and u −→∗β v.
Proof. By recurrence on the length of the zigzag of reductions between t and u,
closing each peak t1, t3, . . . , t2n−1 using theorem 3.4.4.1.
Proof. The terms λxy.x and λxy.y are normal forms. If they were equivalent,
they would be equal by the previous proposition, which is not the case.
will always print dcab in the toplevel. We shall now look at the options we
have here, in order to choose a strategy. A first question we have to answer is:
should we reduce functions or arguments first? Namely, given a term of the
form (λx.t)u such that u reduces to u′, we have two possible ways of reducing
it:
t[u/x] ←β (λx.t)u −→β (λx.t)u′
which correspond to reducing functions or arguments first, giving rise to strate-
gies which are respectively called call-by-name and call-by-value. Call-by-value
tends to be more efficient: even if the argument is used multiple times in the
function, we reduce it only once beforehand, whereas the call-by-name strategy
reduces it each time it is used. For instance, if u −→∗β û, where û is a normal
form, we have the following sequences of reductions:
– in call-by-value: (λx.f xx)u −→∗β (λx.f xx)û −→β f ûû,
– in call-by-name: (λx.f xx)u −→β f uu −→∗β f ûu −→∗β f ûû.
The function λx.f xx uses its argument twice and therefore we have to reduce u
twice in the second case compared to only once in the first (and this can make a
huge difference if the argument is used much more than twice or if the reduction
of u requires many steps). However, there is a case where the call-by-value
strategy is inefficient: when the argument is not used in the function. Namely,
we always reduce the argument, even if it is not used afterward. For instance,
we have the following sequences of reductions:
– in call-by-value: (λx.y)u −→∗β (λx.y)û −→β y,
– in call-by-name: (λx.y)u −→β y.
Otherwise said, we have already observed in section 3.2.3 that β-reduction can
duplicate and erase β-redexes. The call-by-value strategy is optimized for du-
plication and the call-by-name strategy is optimized for erasure. In practice,
people often write programs where they use a result multiple times and rarely
discard the result of computations, so that call-by-value strategies are generally
implemented (this is for instance the case in OCaml). However, for theoretical
purposes call-by-value strategies can be a problem: it might happen that a term
has a normal form and that this strategy does not find it. Namely, consider the
term
(λx.y)Ω
A call-by-value strategy will first try to compute the normal form for Ω and thus
loop, whereas a call-by-name strategy will directly reduce it to y. A strategy is
called normalizing when it will reach a normal form whenever a term has one:
we have seen that call-by-value does not have this property.
For instance, the above examples illustrate the fact that the call-by-value and
call-by-name strategies are respectively innermost and outermost.
let f n =
  print_endline "Incrementing!";
  n + 1
would always print the message exactly once, even if the function is never called,
whereas we expect that the message is printed each time the function is called
(which is the case with a weak evaluation strategy). In pure λ-calculus, there is
no printing but one thing is easily observed: non-termination. For instance, we
want to be able to define a function which loops or not depending on a boolean
as follows:
λb.if b Ω I
This function takes a boolean as argument: if it is true it will return the term Ω
whose evaluation is going to loop, otherwise it returns the identity. If we evaluate
v ::= λx.v | x v1 . . . vn        (normal forms)
v ::= λx.t | x v1 . . . vn        (weak normal forms)
v ::= λx.v | x t1 . . . tn        (head normal forms)
v ::= λx.t | x t1 . . . tn        (weak head normal forms)
type term =
| Var of var
| App of term * term
| Abs of var * term
where var is an arbitrary type for identifying variables (in practice, we would
choose int or maybe string). We will also need a substitution function, such
that subst x t u computes the term u where all occurrences of the variable x
have been replaced by the term t:
let rec subst x t = function
| Var y -> if x = y then t else Var y
| App (u, v) -> App (subst x t u, subst x t v)
| Abs (y', u) ->
let y = fresh () in
let u = subst y' (Var y) u in
Abs (y, subst x t u)
In order to avoid name captures, we always refresh the names of the abstracted
variables when substituting under an abstraction (this is correct, but quite in-
efficient): in order to do so we use a function fresh which generates a new
variable name each time it is called, e.g. using an internal counter incremented
at each call. For each of the considered strategies below, we will define a func-
tion reduce which performs multiple β-reduction steps, in the order specified
by the strategy.
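The function fresh can be sketched with a counter, and the call-by-value reduction
function which the next paragraph discusses is not reproduced in this copy; a minimal
version consistent with the description (a weak strategy: both subterms of an application
are reduced, but never under an abstraction, and one pass is to be iterated), assuming
type var = string, might be:

let fresh =
  let n = ref (-1) in
  fun () -> incr n; "x@" ^ string_of_int !n

(* Call-by-value, weak: reduce both sides of an application, then contract. *)
let rec reduce = function
  | Var x -> Var x
  | Abs (x, t) -> Abs (x, t)
  | App (t, u) ->
    (match reduce t, reduce u with
     | Abs (x, t'), u -> subst x u t'
     | t, u -> App (t, u))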
In the case App (t, u), it can be observed that both terms t and u are always
reduced, so that taking the rightmost variant of the strategy has little influ-
ence. Since it is a weak strategy, it is not normalizing, and normal forms for
this strategy will be weak normal forms. The above function does not directly
compute the weak normal form: it has to be iterated. For instance, applying it
to (λx.xy)(λx.x) will result in (λx.x)y, which further reduces to y.
Applicative order. The applicative order strategy (AO) is the leftmost innermost
strategy, i.e. the variant of call-by-value where we are allowed to reduce under
abstractions.
let rec reduce = function
  | Var x -> Var x
  | Abs (x, t) -> Abs (x, reduce t)
  | App (t, u) ->
    (match reduce t with
     | Abs (x, t') -> subst x (reduce u) t'
     | t -> App (t, reduce u))
Normal forms are normal forms in the usual sense. As illustrated above by the
term (λx.y)Ω, this strategy might not terminate even though the term has a
normal form; otherwise said, the strategy is not normalizing.
Call-by-name. The call-by-name strategy (CBN) is the weak head leftmost out-
ermost strategy. Here, arguments are computed at each use and not once and
for all as in the call-by-value strategy. An implementation of the corresponding
reduction is
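The implementation is not reproduced in this copy; a minimal sketch, in the same
style as the call-by-value function above (weak: no reduction under abstractions,
arguments substituted unreduced), might be:

let rec reduce = function
  | Var x -> Var x
  | Abs (x, t) -> Abs (x, t)
  | App (t, u) ->
    (match reduce t with
     | Abs (x, t') -> subst x u t'
     | t -> App (t, u))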
Normal order. The normal order strategy (NO) is the leftmost outermost strat-
egy. An implementation is
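Again, the implementation is not reproduced here; a sketch, differing from the
applicative order one only in that the argument is substituted before being reduced
(so that the leftmost outermost redex is contracted first), might be:

let rec reduce = function
  | Var x -> Var x
  | Abs (x, t) -> Abs (x, reduce t)
  | App (t, u) ->
    (match reduce t with
     | Abs (x, t') -> subst x u t'
     | t -> App (t, reduce u))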
Normal forms, v ::= λx.v | x v1 . . . vn, can equivalently be described mutually
with neutral terms n:

v ::= λx.v | n        n ::= x | n v
The idea is that an abstraction λx.t should be interpreted as the function
u ↦ t[u/x]. More precisely:
– if the term is an abstraction λx.t, we return the value which is the function
which to a value v associates the evaluation of t in the environment where x
is bound to v,
this last part being taken care of by the auxiliary function vapp (which
applies a value to another).
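The value type and the evaluator are not reproduced in this copy; a minimal
sketch consistent with the constructors used in the equality test below (VAbs,
VNeu, NVar, NApp) might be:

(* Values: functions, or neutral terms (a variable applied to values). *)
type value =
  | VAbs of (value -> value)
  | VNeu of neutral
and neutral =
  | NVar of var
  | NApp of neutral * value

let rec eval env = function
  | Var x -> (try List.assoc x env with Not_found -> VNeu (NVar x))
  | Abs (x, t) -> VAbs (fun v -> eval ((x, v) :: env) t)
  | App (t, u) -> vapp (eval env t) (eval env u)

(* Apply a value to another value. *)
and vapp v w =
  match v with
  | VAbs f -> f w
  | VNeu n -> VNeu (NApp (n, w))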
Finally, the environment is only really used during the evaluation and we define
let eval t = eval [] t
Readback. We are now pleased because we have a short and efficient imple-
mentation of normalization, except on one point: we cannot easily print or
serialize values because they contain functions. We now explain how we can
convert a value back to a term: this procedure is called readback. We will need
an infinite countable pool of fresh variables, so that we define a function fresh
mapping an integer i to the i-th fresh variable name. The readback function
takes as argument an integer i (the index of the first fresh variable we have
not used yet) and a value v and returns the term corresponding to the value:
– if the value is a function f , we return the term λx.t where t = f (x) for
some fresh variable x,
– otherwise it is of the form x v1 . . . vn and we return x v̄1 . . . v̄n where v̄i is
the term corresponding to the value vi.
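A sketch of fresh and readback following this description (the naming scheme
"x@i" matches the output shown below; unlike the counter-based fresh used for
substitution earlier, this one takes the index directly):

let fresh i = "x@" ^ string_of_int i

let rec readback i = function
  | VAbs f ->
    let x = fresh i in
    Abs (x, readback (i + 1) (f (VNeu (NVar x))))
  | VNeu n -> nreadback i n

(* Read back a neutral value as a term. *)
and nreadback i = function
  | NVar x -> Var x
  | NApp (n, v) -> App (nreadback i n, readback i v)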
We can then define a function which normalizes a term by evaluating it to a
value and reading back the result:
let normalize t = readback 0 (eval t)
For instance, we can compute the normal form of the λ-term (λxy.x)y, which
is λz.y, by
let _ =
let t = App (Abs ("x", Abs ("y", Var "x")), Var "y") in
normalize t
which gives the expected result
Abs ("x@0", Var "y")
Note that this reduction requires α-converting the abstraction on y, and this
was correctly taken care of for us here.
Proof. This follows from the fact that the only possible way to β-reduce a term
λx.t (resp. x t) is of the form λx.t −→β λx.t′ (resp. x t −→β x t′), by the rule (βλ)
(resp. (βr)).
For this reason, we know that two terms of the form λx.t and x u are never
β-convertible, no matter what the terms t and u are. In such a situation,
there is thus no need to fully normalize the two terms to compare them. More
generally, a term x t1 . . . tn is never equivalent to an abstraction. Based on this
observation, we can implement the test of β-equivalence as follows:
let eq t u =
  (* Equality of values. *)
  let rec veq i v w =
    match v, w with
    | VAbs f, VAbs g ->
      let x = VNeu (NVar (fresh i)) in
      veq (i + 1) (f x) (g x)
    | VNeu m, VNeu n -> neq i m n
    | _, _ -> false
  (* Equality of neutral terms. *)
  and neq i m n =
    match m, n with
    | NVar x, NVar y -> x = y
    | NApp (m, v), NApp (n, w) -> neq i m n && veq i v w
    | _, _ -> false
  in
  veq 0 (eval t) (eval u)
Given two terms t and u, we reduce them to their weak normal form, i.e. we
reduce them until we find abstractions:
– if they are of the form λx.t′ and x u1 . . . un (or conversely), we know that
they are not equivalent (even though we have not computed the normal
form for t),
– if they are of the form λx.t′ and λx.u′, then we compare t′ and u′ (which
requires evaluating them further),
– if they are of the form x t1 . . . tm and y u1 . . . un, where the ti and ui are
weak normal forms, then they are equivalent if and only if x = y, m = n
and ti is equivalent to ui for every index i.
For instance, this procedure allows us to check that λx.Ω is not convertible to x:
let () =
let t = Abs ("x", omega) in
assert (not (eq t (Var "x")))
whereas the former equality procedure would loop when comparing the two
terms because it tries to fully evaluate λx.Ω.
x[u/x] = u
y[u/x] = y        when y ≠ x
(t t′)[u/x] = (t[u/x]) (t′[u/x])
(λy.t)[u/x] = λy.t[u/x]

The last case is incorrect, because we do not suppose that y ∉ FV(u): we are
substituting x by u under the abstraction λy without taking into account the
fact that free occurrences of y in u would get bound in this way. For instance,
this implementation would lead to the following incorrect sequence of β-reductions:

(λy.yy)(λf x.f x) −→ (λf x.f x)(λf x.f x) −→ λx.(λf x.f x)x −→ λxx.xx
number is called the de Bruijn index of the variable. For instance, consider the
λ-term
λx.x(λy.yx)
This λ-term can be graphically represented as a tree where a node la-
beled “λx” corresponds to an abstraction and a node “@” corresponds to an
application:

λx
└─ @
   ├─ x
   └─ λy
      └─ @
         ├─ y
         └─ x
Each variable can be linked to the abstraction which binds it. For the first
occurrences of x and y, the abstraction we are referring to is the one immediately
above (we have to skip 0 λ's), whereas for the last occurrence of x, when going
up from x in the syntactic tree, the corresponding abstraction is not the first
one encountered (which is λy) but the second one (we have to skip 1 λ). The
information in the λ-term can thus equivalently be represented by
be represented by
λx.0(λy.01)
where each variable has been replaced by the number of λ's we have to skip when
going up to reach the corresponding abstraction (note that a given variable, such
as x above, can have different indices, depending on its position in the term).
Now, the names of the variables do not really matter since we are working
modulo α-conversion: we might as well drop them and simply write
λ.0(λ.01)
This is a very convenient notation because it does not mention variables any-
more. What is not entirely clear yet is that we can implement β-reduction in
this formalism. We will see that it is indeed possible, but quite subtle and
difficult to get right.
Terms with de Bruijn indices. We thus consider a variant of the λ-calculus where
terms are generated by the grammar
t, u ::= i | t u | λ.t
term t with n free variables FV(t) = {x0 , . . . , xn−1 } as if we were computing the
de Bruijn representation of t in the term λxn−1 . . . . λx0 .t, i.e. the free variables
are implicitly abstracted. β-reduction can be performed directly in this repre-
sentation (the original example, pictured on syntactic trees, is omitted). A first,
naive definition of the substitution t[u/i] of the variable i by a term u in t
would be

i[u/i] = u
j[u/i] = j        for j ≠ i
(t t′)[u/i] = (t[u/i]) (t′[u/i])
(λ.t)[u/i] = λ.(t[u/i + 1])
But it is incorrect because, in the last case, u might contain free variables, which
refer to abstractions above, and these have to be increased by 1 when going under
the abstraction. The last case should thus be

(λ.t)[u/i] = λ.(t[u′/i + 1])        (3.3)

where u′ is the term obtained from u by increasing by 1 all free variables (and
leaving other variables untouched), which we will write u′ = ↑0 u in the following.
The “corrected version” with (3.3) still contains a bug, which comes from the fact
that β-reduction removes an abstraction, and therefore the indices of variables
in t referring to the variables abstracted above the removed abstraction have to
be decreased by 1. For instance, contracting a β-redex makes the variables of t
which pointed above the removed abstraction have their index decreased by 1
(the original example, pictured on syntactic trees, is omitted).
This means that we should also correct the second case of substitution in order
to decrease the index of variables which were free in the original substitution.
And now we have it right.
In order to distinguish between bound and free variables in a term, it will be
convenient to maintain an index l, called the cutoff level, such that the indices
strictly below l correspond to bound variables and those above are free variables.
We thus first define a function ↑l such that ↑l t is the term obtained from t by
increasing by one all variables with index i ≥ l, called the lifting of t at level l.
By induction,

↑l i = i            if i < l
↑l i = i + 1        if i ≥ l
↑l (t u) = (↑l t) (↑l u)
↑l (λ.t) = λ.(↑l+1 t)
The right way to think about it is that ↑l t is the term obtained from t by adding
a “new variable” of index l: the variables of index i ≥ l have to be increased
by 1 in order to make room for the new variable. Similarly, we can define a
function ↓l such that, for every term t which does not contain the variable l, ↓l t
is the term obtained by removing the variable l (the unlifting of t): all variables
of index i > l have to be decreased by one. It turns out that we will only need
it when t is a variable, so that we define

↓l i = i − 1        if i > l
↓l i = i            if i < l
(it is not defined when i = l). With those at hand, we can finally correctly
define substitution:
Definition 3.6.2.1 (Substitution). Given terms t and u and variable i, we define
the substitution of i by u in t
t[u/i]
by induction by
i[u/i] = u
j[u/i] = ↓i j        for j ≠ i
(t t′)[u/i] = (t[u/i]) (t′[u/i])
(λ.t)[u/i] = λ.(t[↑0 u/i + 1])
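These definitions translate directly to OCaml; a minimal sketch (type and function
names are ours):

(* λ-terms with de Bruijn indices. *)
type dterm =
  | DVar of int
  | DApp of dterm * dterm
  | DAbs of dterm

(* Lifting at level l: make room for a new variable of index l. *)
let rec lift l = function
  | DVar i -> DVar (if i < l then i else i + 1)
  | DApp (t, u) -> DApp (lift l t, lift l u)
  | DAbs t -> DAbs (lift (l + 1) t)

(* Substitution of the variable i by u, unlifting the other free variables. *)
let rec dsubst i u = function
  | DVar j -> if j = i then u else DVar (if j > i then j - 1 else j)
  | DApp (t, t') -> DApp (dsubst i u t, dsubst i u t')
  | DAbs t -> DAbs (dsubst (i + 1) (lift 0 u) t)

(* The β-reduction of a redex (λ.t) u is then dsubst 0 u t. *)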
S T U V −→ (T V ) (U V )        K T U −→ T        I T −→ T

together with the congruence rules: if T −→ T′ then T U −→ T′ U, and
if U −→ U′ then T U −→ T U′.
We implicitly bracket application on the left, i.e. T U V is read as (T U ) V . As
usual, we write −→∗ for the reflexive and transitive closure of the relation −→,
and ←→∗ for its reflexive, symmetric and transitive closure. A normal form is
a term which does not reduce. We write FV(T ) for the set of variables of a
term T . A combinator is a term without variables.
S K K T −→ K T (K T ) −→ T
Λx.x = I
Λx.T = K T        if x ∉ FV(T ),
Λx.(T U ) = S (Λx.T ) (Λx.U )        otherwise.
Λx.Λy.x = S (K K) I
Note that the term on the right is a normal form (in particular, it does not
reduce to K).
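This translation is easily implemented; a minimal OCaml sketch (the type cterm
and the function names are ours):

type cterm =
  | S | K | I
  | CVar of string
  | CApp of cterm * cterm

let rec occurs x = function
  | CVar y -> x = y
  | CApp (t, u) -> occurs x t || occurs x u
  | S | K | I -> false

(* Λx.T, following the three cases above. *)
let rec abstract x = function
  | CVar y when y = x -> I
  | t when not (occurs x t) -> CApp (K, t)
  | CApp (t, u) -> CApp (CApp (S, abstract x t), abstract x u)
  | _ -> assert false (* unreachable: covered by the first two cases *)

(* Λx.Λy.x = S (K K) I, as claimed above: *)
let () =
  assert (abstract "x" (abstract "y" (CVar "x"))
          = CApp (CApp (S, CApp (K, K)), I))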
Given terms T, U , we write T [U/x] for the term T where the variable x has been
replaced by U .
Lemma 3.6.3.5. For any terms T, U and variable x, we have
(Λx.T ) U −→∗ T [U/x]
Proof. By induction on T .
⟦x⟧cl = x                    ⟦x⟧λ = x
⟦t u⟧cl = ⟦t⟧cl ⟦u⟧cl        ⟦T U⟧λ = ⟦T⟧λ ⟦U⟧λ
⟦λx.t⟧cl = Λx.⟦t⟧cl          ⟦S⟧λ = λxyz.(xz)(yz)
                             ⟦K⟧λ = λxy.x
                             ⟦I⟧λ = λx.x

For instance,

⟦λxy.x⟧cl = S (K K) I        ⟦λxy.yy⟧cl = K (S I I)
⟦S (K K) I⟧λ = (λxyz.(xz)(yz))((λxy.x)(λxy.x))(λx.x)
Lemma 3.6.3.7. For any terms T, U , if T −→∗ U then ⟦T⟧λ −→∗β ⟦U⟧λ.
Proof. By induction on T .
The reduction of combinatory terms can be simulated in λ-calculus:
Lemma 3.6.3.8. For any term T , ⟦Λx.T⟧λ −→∗β λx.⟦T⟧λ.
Proof. By induction on T .
Translating a λ-term back and forth has no effect up to β-equivalence:
Lemma 3.6.3.9. For any λ-term t, ⟦⟦t⟧cl⟧λ −→∗β t.
The previous theorem, together with lemma 3.6.3.7, can be seen as the fact that
combinatory logic embeds into λ-calculus (modulo β-reduction). It also implies
that the basic combinators S, K and I can be thought of as a “basis” from which
all the λ-terms can be generated:
Corollary 3.6.3.10. Every closed λ-term is β-equivalent to one obtained from S,
K and I by application.
This dictionary between λ-calculus and combinatory logic unfortunately has
a number of minor defects. First, it is not true that
t −→∗β u implies ⟦t⟧cl −→∗ ⟦u⟧cl

For instance, we have

⟦λx.(λy.y) x⟧cl = S (K I) I        ⟦λx.x⟧cl = I        (3.4)
where both combinatory terms are normal forms. If we try to go through the
induction, the problem comes from the fact that β-reduction satisfies the rule
on the left below, often called (ξ), whereas the corresponding principle on the
right is not valid in combinatory logic:
(ξ): from t −→β t′ deduce λx.t −→β λx.t′
     from T −→ T′ deduce Λx.T −→ Λx.T′
as the above example illustrates. Intuitively, this is due to the fact that we have
not yet provided enough arguments to the terms. Namely, if we apply both
terms of (3.4) to an arbitrary term T , we obtain the same result:
S (K I) I T −→ (K I T ) (I T ) −→ I (I T ) −→ I T −→ T        and        I T −→ T
In general, it can be shown that

t −→∗β u implies ⟦t⟧cl T1 . . . Tn −→∗ ⟦u⟧cl T1 . . . Tn

for all terms Ti, provided that n is a large enough natural number depending
on t and u. It is also not true that the translation of a combinatory term in
normal form is a normal λ-term:

⟦K x⟧λ = (λxy.x) x −→β λy.x
Again, the term K x is intuitively a normal form only because it is not applied
to enough arguments. Finally, given a term T , the terms ⟦⟦T⟧λ⟧cl and T are
not convertible in general. For instance

⟦⟦K⟧λ⟧cl = ⟦λxy.x⟧cl = S (K K) I ≠ K
Both terms are normal forms and if they were convertible, they would reduce to
a common term by theorem 3.6.3.3. This is again due to the lack of arguments:
for every term T , we have

S (K K) I T −→∗ K T
Two combinatory terms T and T′ are extensionally equivalent when, for every
term U , we have T U ←→∗ T′ U . It can be shown that combinatory terms modulo
reduction and extensional equivalence are in bijection with λ-terms modulo β
and η, via the translations we have defined.
ι = λx.x S K
we have
I = ι ι        K = ι (ι (ι ι))        S = ι (ι (ι (ι ι)))

We can therefore base combinatory logic on the single combinator ι, the reduction
rule being

ι T −→ T S K = T (ι (ι (ι (ι ι)))) (ι (ι (ι ι)))
In the sense described above, any λ-term can thus be encoded as a combinator
based on ι, i.e. as a term generated by the grammar
t, u ::= ι | t u
Such a term can in turn be encoded as a sequence of bits, by [ι] = 1 and
[t u] = 0[t][u].
4 Simply typed λ-calculus

4.1 Typing
4.1.1 Types. A simple type is an expression made of variables and arrows.
Those are generated by the grammar
A, B ::= X | A → B
A context

Γ = x1 : A1, . . . , xn : An
is a list of pairs consisting of a variable xi (in the sense of λ-calculus, see sec-
tion 3.1.1) and a type Ai . A context is thus either the empty context or of
the form Γ, x : A for some context Γ, which is useful to reason by induction on
contexts. The domain dom(Γ) of the context Γ is the set of variables occurring
in it:
dom(Γ) = {x1 , . . . , xn }
Given a variable x ∈ dom(Γ), we sometimes write Γ(x) for the type associated
with it. Here, we do not require that in a context all the variables xi are distinct:
to handle this, Γ(x) can be defined by induction by (Γ, x : A)(x) = A and
(Γ, y : B)(x) = Γ(x) for y ≠ x.
Church vs Curry style for λ-terms. The above convention, where abstractions
are typed, is called Church style λ-terms. We will see that adopting it greatly
simplifies the questions one is usually interested in for those terms (such as type
checking, see section 4.1.6), at the cost of requiring small annotations from the
user (the type of the abstractions).
A variant of the theory where abstractions are not typed can also be devel-
oped and is called Curry style, see section 4.4. This is for instance the convention
used in OCaml: one would typically write
let f = fun x -> x
although the Church style is also supported, i.e. we can also write
let f = fun (x:int) -> x
Γ`t:A (4.1)
(ax)
Γ ` x : Γ(x)
Γ, x : A ` t : B
(→I )
Γ ` λxA .t : A → B
Γ`t:A→B Γ`u:A
(→E )
Γ ` tu : B
λf^{A→A}.λx^A.f (f x)

has type

(A → A) → A → A

Namely, writing Γ = f : A → A, x : A, we have the following typing derivation:
by (ax), Γ ⊢ f : A → A and Γ ⊢ x : A, so that Γ ⊢ f x : A by (→E); by (ax)
again, Γ ⊢ f : A → A, so that Γ ⊢ f (f x) : A by (→E); and finally, by (→I)
twice,

f : A → A ⊢ λx^A.f (f x) : A → A        and        ⊢ λf^{A→A}.λx^A.f (f x) : (A → A) → A → A
Remark 4.1.4.2. Although this will mostly remain implicit in the following, we
consider sequents up to α-conversion: this means that, in a sequent Γ ` t : A,
we can change a variable x into y both in Γ and in t at the same time, provided
that y 6∈ dom(Γ). Because of this, we can always assume that all the variables
are distinct in the contexts we consider. This assumption is sometimes useful
to reason about proofs, e.g. with this convention, the axiom rule is equivalent
to
(ax)
Γ, x : A, Γ′ ⊢ x : A
We do however feel bad about systematically assuming this because, in practice,
implementations of logical or typing systems do not maintain this invariant.
4.1.5 Basic properties of the typing system. We state here some basic
properties of the typing system, which will be used in the sequel. First, the
following variant of the structural rules (see section 2.2.10) hold.
Lemma 4.1.5.1. The weakening and exchange rules

Γ, Γ′ ⊢ t : B
(wk)
Γ, x : A, Γ′ ⊢ t : B

Γ, x : A, y : B, Γ′ ⊢ t : C
(xch)
Γ, y : B, x : A, Γ′ ⊢ t : C

are admissible, provided that x ≠ y.
Proof. By induction on the proof of the premise.
Similarly, the contraction rule

Γ, x : A, y : A, Γ′ ⊢ t : B
(contr)
Γ, x : A, Γ′ ⊢ t[x/y] : B

is admissible.
4.1.6 Type checking, type inference and typability. The three most im-
portant algorithmic questions when considering a typing system are the follow-
ing ones:
– type checking: given a context Γ, a term t and a type A, decide whether
Γ ⊢ t : A is derivable,
– type inference: given a context Γ and a term t, compute a type A such that
Γ ⊢ t : A is derivable, or report that there is none,
– typability: given a context Γ and a term t, decide whether there exists a
type A such that Γ ⊢ t : A is derivable.
In simply-typed λ-calculus all those three problems are very easy: they can
be answered in linear time over the size of the term t (neglecting the size of Γ):
Theorem 4.1.6.1 (Uniqueness of typing). Given a context Γ and a term t there
is at most one type A such that t has type A in the context Γ and at most one
derivation of Γ ` t : A.
Proof. By induction on the term t. We have the following cases depending on
its shape:
– if the term is of the form x then it is typable iff x ∈ dom(Γ) and, in this
case, the typing derivation is

(ax)
Γ ⊢ x : A

with A = Γ(x),
– if the term is of the form t u then it is typable iff both t and u are typable
in Γ, with respective types of the form A → B and A, and in this case the
typing derivation ends with

Γ ⊢ t : A → B        Γ ⊢ u : A
(→E)
Γ ⊢ t u : B

– if the term is of the form λx^A.t then it is typable iff t is typable in con-
text Γ, x : A with some type B, and in this case the typing derivation
ends with

Γ, x : A ⊢ t : B
(→I)
Γ ⊢ λx^A.t : A → B

In each case, the uniqueness of the type and of the derivation follows from the
induction hypothesis.
For instance, the typing rule (ax), deriving Γ, x : A, Γ′ ⊢ x : A, erases to the
logical rule (ax), deriving Γ, A, Γ′ ⊢ A, and the typing rule (→I), deriving
Γ ⊢ λx^A.t : A → B from Γ, x : A ⊢ t : B, erases to the rule (⇒I), deriving
Γ ⊢ A ⇒ B from Γ, A ⊢ B. Conversely, the logical rule (ax), deriving
Γ, A, Γ′ ⊢ A, can be decorated with terms as the typing rule (ax), deriving
Γ, x : A, Γ′ ⊢ x : A.
(** Types. *)
type ty =
| TVar of string
| Arr of ty * ty
(** Terms. *)
type term =
| Var of var
| App of term * term
| Abs of var * ty * term
exception Type_error
(** Typability. *)
let typable env t =
try let _ = infer env t in true
with Type_error -> false
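The function infer used above is not reproduced in this copy; a minimal sketch,
assuming env is an association list from variables to types:

(** Type inference (a sketch). *)
let rec infer env = function
  | Var x -> (try List.assoc x env with Not_found -> raise Type_error)
  | Abs (x, a, t) -> Arr (a, infer ((x, a) :: env) t)
  | App (t, u) ->
    (match infer env t with
     | Arr (a, b) when a = infer env u -> b
     | _ -> raise Type_error)

(** Type checking then reduces to inference. *)
let check env t a = if infer env t <> a then raise Type_error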
π                    π′
Γ ⊢ A ⇒ B        Γ ⊢ A
(⇒E)
Γ ⊢ B

π
Γ, A ⊢ B
(⇒I)
Γ ⊢ A ⇒ B
and they are clearly different (they respectively correspond to the first and the
second projection). This sheds a new light on our remark of section 2.2.10,
stating that contexts should be lists and not sets in proof systems. If we han-
dled them as sets, we would not be able to distinguish them since both would
correspond, via the “Curry-Howard correspondence”, to the proof
(ax)
A ⊢ A
(⇒I)
A ⊢ A ⇒ A
(⇒I)
⊢ A ⇒ A ⇒ A
(A ⇒ A ⇒ B) ⇔ (A ⇒ B)
t : (A → A → B) → (A → B) u : (A → B) → (A → A → B)
In the above example, the first equality does hold, but not the second since
Γ, x : A, Γ0 ` t : B Γ, Γ0 ` u : A
0
Γ, Γ ` t[u/x] : B
is admissible.
– If it is of the form

π1                                    π2
Γ, x : A, Γ′ ⊢ t : B ⇒ C        Γ, x : A, Γ′ ⊢ t′ : B
(⇒E)
Γ, x : A, Γ′ ⊢ t t′ : C

then we conclude with

π′1                            π′2
Γ, Γ′ ⊢ t[u/x] : B ⇒ C        Γ, Γ′ ⊢ t′[u/x] : B
(⇒E)
Γ, Γ′ ⊢ (t t′)[u/x] : C

where π′1 and π′2 are respectively obtained from π1 and π2 by induction
hypothesis.
– If it is of the form

π
Γ, x : A, Γ′, y : B ⊢ t : C
(⇒I)
Γ, x : A, Γ′ ⊢ λy.t : B ⇒ C

then we conclude with

π′
Γ, Γ′, y : B ⊢ t[u/x] : C
(⇒I)
Γ, Γ′ ⊢ (λy.t)[u/x] : B ⇒ C

where π′ is obtained from π by induction hypothesis.
Remark 4.1.8.2. Note that, through the Curry-Howard correspondence, the sub-
stitution lemma precisely corresponds to the “proof substitution” of proposi-
tion 2.3.2.1: the term erasure of the rule of lemma 4.1.8.1 is the cut rule
Γ, A, Γ′ ⊢ B        Γ, Γ′ ⊢ A
(cut)
Γ, Γ′ ⊢ B
and the typing derivation of the term on the left is of the form

Γ, x : A ⊢ t : B
(→I)
Γ ⊢ λx.t : A → B        Γ ⊢ u : A
(→E)
Γ ⊢ (λx.t) u : B

and we conclude with a derivation of Γ ⊢ t[u/x] : B, obtained by the
substitution lemma 4.1.8.1.
t u −→β t′ u

with t −→β t′, and the typing derivation of the term on the left is of the
form

π1                    π2
Γ ⊢ t : A → B        Γ ⊢ u : A
(→E)
Γ ⊢ t u : B

We conclude with the derivation

π′1                    π2
Γ ⊢ t′ : A → B        Γ ⊢ u : A
(→E)
Γ ⊢ t′ u : B

where π′1 is obtained from π1 by induction hypothesis.
The proof of the above theorem deserves some attention. It should be observed
that, by erasing the terms, the β-reduction of a typable term described in the
above proof corresponds precisely to the procedure we used in section 2.3.3 in
order to eliminate a cut in the corresponding proof:
Γ, A ⊢ B
(⇒I)
Γ ⊢ A ⇒ B        Γ ⊢ A
(⇒E)
Γ ⊢ B

which reduces to a direct derivation of Γ ⊢ B.
Thus,
Theorem 4.1.8.5 (Dynamical Curry-Howard correspondence). Through the Curry-
Howard correspondence, β-reduction corresponds to eliminating cuts.
This explains the remark already made in section 2.3.3: although cut-free proofs
are “simpler” in the sense that they do not contain cuts, they can be much bigger
than the corresponding proof with cuts, in the same way that executing a program
can give rise to a much bigger result than the program itself (e.g. a program
computing the factorial of 1000). As a direct consequence of the previous theorem,
we have that
we have that
Corollary 4.1.8.6. Through the Curry-Howard correspondence, typable terms
in normal form correspond to cut-free proofs.
t −→η λxA .t x
4.1.10 Confluence. Recall from section 3.4 that β-reduction of λ-terms is con-
fluent. By theorem 4.1.8.3, we can immediately extend this result to typable
terms:
Theorem 4.1.10.1 (Confluence). The β-reduction is confluent on typable terms
(in some fixed context): given typable terms t, u1 and u2 such that t −→∗β u1
and t −→∗β u2, there exists a typable term v such that u1 −→∗β v and u2 −→∗β v.
For instance, consider the reductions starting from (λxy.yxx)(I I): reducing the
argument gives (λxy.yxx) I, whereas contracting the outer redex gives
λy.y (I I) (I I). We see that the redex I I −→β I on the top line has become
two redexes in the bottom line: this is because the term λxy.yxx contains the
variable x twice and the vertical reduction thus causes the term substituted
for x to be duplicated.
Following the terminology introduced in section 3.5.1, what theorem 2.3.3.1
establishes is thus that the innermost reduction strategies, such as call-by-value,
terminate for typable λ-terms.
Failure of the naive proof. A first attempt to show the result would consist
in showing that, for any derivable sequent Γ ` t : A, the term t is strongly
normalizing by induction on the proof of the sequent.
– For the rule (ax), this is immediate since a variable is strongly normalizing
(it is even a normal form).
– For the rule (→I ), we have to show that a term λx.t is strongly normal-
izing knowing that t is strongly normalizing. A sequence of reductions
starting from λx.t is of the form λx.t −→β λx.t1 −→β λx.t2 −→β . . . with
t −→β t1 −→β t2 −→β . . ., and is thus finite since t is strongly normalizing
by induction hypothesis.
– For the rule (→E ), we have to show that a term t u is strongly normalizing
knowing that both t and u are strongly normalizing. However, a reduction
in t u is not necessarily generated by a reduction in t or in u in the case
where t is an abstraction, and we cannot conclude.
If we try to identify the cause of the failure, we see that we do not really use the
fact that the terms are typable in the last case. We are left proving that if t and
u are normalizable then t u is normalizable, and there is a counter-example to
that, already encountered in section 3.2.6: take t = λx.xx and u = λx.xx, both
are strongly normalizable, but t u is not since it leads to an infinite sequence of
reductions. This is however not a counter-example to the strong normalizability
property, because λx.xx cannot be typed, but we have no easy way of exploiting
this fact.
In the first case, we have not been particularly subtle: we wanted a set of
strongly normalizable terms which contains all the terms of type X, and we
simply took all strongly normalizable terms. However, in the second case, we
have crafted our definition to avoid the previous problem: in the case of the rule
(→E), it will be immediate to deduce that, given t ∈ RA→B and u ∈ RA, we have
t u ∈ RB. However, it is not immediate that every term in RA→B is strongly
normalizing and we will have to prove that. A term is said to be reducible when
it belongs to a set of reducibility candidates RA for some type A.
We begin by showing that every term t ∈ RA is strongly normalizing by
induction on the type A, but in order to do so we need to strengthen the
induction hypothesis and show additional properties of A at the same time. A term is
neutral when it is not an abstraction; otherwise said, a neutral term is of the
form t u or x.
Proposition 4.2.2.1. Given a type A and a term t, we have:
(CR1) if t ∈ RA then t is strongly normalizing,
(CR2) if t ∈ RA and t −→β t′ then t′ ∈ RA,
(CR3) if t is neutral and t′ ∈ RA for every term t′ such that t −→β t′, then t ∈ RA.
Lemma 4.2.2.3. Suppose given a term t such that Γ ⊢ t : A is derivable for some
context Γ = x1 : A1, . . . , xn : An and type A. Then, for all terms ti ∈ RAi,
with 1 ≤ i ≤ n, we have t[t1/x1, . . . , tn/xn] ∈ RA.
Proof. We write t[t∗/x∗] for the above substitution, and show the result by
induction on the derivation of Γ ⊢ t : A.
– If the last rule is

(ax)
Γ ⊢ xi : Ai

then t[t∗/x∗] = ti ∈ RAi.
– If the last rule is

Γ ⊢ u : A → B        Γ ⊢ v : A
(→E)
Γ ⊢ u v : B

then, by induction hypothesis, u[t∗/x∗] ∈ RA→B and v[t∗/x∗] ∈ RA, and
therefore t[t∗/x∗] = (u[t∗/x∗])(v[t∗/x∗]) ∈ RB.
– If the last rule is

Γ, x : A ⊢ u : B
(→I)
Γ ⊢ λx.u : A → B

then, by induction hypothesis, for all terms ti ∈ RAi and for every
term v ∈ RA, we have u[t∗/x∗][v/x] = u[t∗/x∗, v/x] ∈ RB. Therefore, by
lemma 4.2.2.2, we have t[t∗/x∗] = λx.(u[t∗/x∗]) ∈ RA→B.
RΓ⊢A = {t | Γ ⊢ t : A is derivable}

However, the way the definition is formulated allows us to perform the proofs by
induction!
t −→∗β t̂        u −→∗β û        t̂ ≟ û

We have thus reduced the problem of deciding whether two terms are convertible
to deciding whether two terms are equal, which is easily done. Using the func-
tions defined in section 3.5, the following function eq tests for the β-equivalence
of two λ-terms which are supposed to be typable:
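The function is not reproduced in this copy; given the normalization function of
section 3.5, whose readback produces canonical variable names, a minimal sketch is:

(* Two typable terms are β-equivalent iff their normal forms coincide. *)
let eq t u = normalize t = normalize u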
– t −→ u implies t −→β u,
– if t −→ u and t −→ u′ then u = u′,
the last point being the determinism of the strategy. In other words, the strategy
−→ picks, for every term t, at most one way to reduce it. We can then show
that every closed typable term is terminating with respect to −→ using a much
simplified version of the above proof, see [Pie02, Chapter 12].
We define sets RA of λ-terms by induction on A by
– t ∈ RX if ⊢ t : X is derivable and t is strongly normalizing,
– t ∈ RA→B if ⊢ t : A → B is derivable, t is strongly normalizing and
t u ∈ RB for every u ∈ RA.
Contrary to section 4.2.2, by lemma 4.1.5.4, the sets RA contain only closed
terms of type A.
Above, we have been using the following property of reduction when showing
the properties of reducibility candidates in proposition 4.2.2.1:
Lemma 4.2.5.1. If t −→ t′ and t is strongly normalizing then t′ is.
Proof. An infinite sequence of reductions t′ −→ . . . starting from t′ can be
extended into one t −→ t′ −→ . . . starting from t, i.e. if t′ is not strongly normalizing
then t is not either. We conclude by contraposition.
The main consequence of having a deterministic reduction is that the converse
now also holds:
Lemma 4.2.5.2. If t −→ t′ and t′ is strongly normalizing then t is.
Proof. By determinism, an infinite sequence of reductions starting from t is
necessarily of the form t −→ t′ −→ . . ., and thus induces one starting from t′.
Remark 4.2.5.3. Again, this property would not be true with the relation −→β,
which is not deterministic. For instance, we have (λx.y)Ω −→β y, where (λx.y)Ω
is not strongly normalizing but y is.
We can now easily show variants of the properties of proposition 4.2.2.1. Note
that the proof is greatly simplified because we do not need to prove them all at
once.
Lemma 4.2.5.4 (CR1). If t ∈ RA then t is strongly normalizing.
Proof. By induction on A, immediate by definition of RA.
Lemma 4.2.5.5 (CR2). If t ∈ RA and t −→ t′ then t′ ∈ RA.
Proof. By induction on the type A.
The last one uses lemma 4.2.5.2 and thus relies on the fact that we have a
deterministic reduction:
Lemma 4.2.5.6 (CR3). If t −→ t′ and t′ ∈ RA then t ∈ RA.
Proof. By induction on the type A.
(ax)
Γ ⊢ x : Γ(x)

Γ ⊢ t : A → B        Γ ⊢ u : A                Γ, x : A ⊢ t : B
(→E)                                    (→I)
Γ ⊢ t u : B                             Γ ⊢ λx^A.t : A → B

Γ ⊢ t : A        Γ ⊢ u : B            Γ ⊢ t : A × B            Γ ⊢ t : A × B
(×I)                            (×lE)                    (×rE)
Γ ⊢ ⟨t, u⟩ : A × B              Γ ⊢ πl(t) : A            Γ ⊢ πr(t) : B

(1I)
Γ ⊢ ⟨⟩ : 1

Γ ⊢ t : A + B        Γ, x : A ⊢ u : C        Γ, y : B ⊢ v : C
(+E)
Γ ⊢ case(t, x ↦ u, y ↦ v) : C

Γ ⊢ t : A                        Γ ⊢ t : B
(+lI)                      (+rI)
Γ ⊢ ι_l^B(t) : A + B       Γ ⊢ ι_r^A(t) : A + B

Γ ⊢ t : 0
(0E)
Γ ⊢ case_A(t) : A

Figure 4.3: Typing rules for λ-calculus with products and sums.
t, u ::= x | t u | λx^A.t
       | ⟨t, u⟩ | πl(t) | πr(t) | ⟨⟩
       | ι_l^A(t) | ι_r^A(t) | case(t, x ↦ u, y ↦ v) | case_A(t)
Moreover, each such connective will give rise to typing rules and the full list of
rules is given in figure 4.3. In addition, we need to add new rules for β-reduction,
which correspond to cut elimination for the new rules (see section 4.1.8), re-
sulting in the rules of figure 4.4, and η-expansion rules which correspond to
introducing “co-cuts” (see section 4.1.9). We now gradually introduce each of
those.
Most of the important theorems extend to the λ-calculus with these new
added constructors and types, although we will not detail these:
– confluence (theorem 3.4.3.7),
β-reduction rules:

(λx^A.t) u −→β t[u/x]
πl(⟨t, u⟩) −→β t
πr(⟨t, u⟩) −→β u
case(ι_l^B(t), x ↦ u, y ↦ v) −→β u[t/x]
case(ι_r^A(t), x ↦ u, y ↦ v) −→β v[t/y]

Figure 4.4: Reduction rules for λ-calculus with products and sums.
where
– ⟨t, u⟩ is the pair of two λ-terms t and u,
– πl(t) and πr(t) are the left and right projections of a term t.
The typing rules for these constructions are

Γ ⊢ t : A × B            Γ ⊢ t : A × B            Γ ⊢ t : A        Γ ⊢ u : B
(×lE)                (×rE)                (×I)
Γ ⊢ πl(t) : A            Γ ⊢ πr(t) : B            Γ ⊢ ⟨t, u⟩ : A × B

The first one states that if t is a term, which is a pair consisting of an element
of A and an element of B, then the term πl(t), obtained by taking its first
projection, has type A. The second rule is similar. The last rule establishes
that if t is of type A and u is of type B then the pair ⟨t, u⟩ is of type A × B.
If we apply our term erasing procedure of section 4.1.7, and replace the symbol
× by ∧, we recover the rules for conjunction:

Γ ⊢ A ∧ B            Γ ⊢ A ∧ B            Γ ⊢ A        Γ ⊢ B
(∧lE)            (∧rE)            (∧I)
Γ ⊢ A                Γ ⊢ B                Γ ⊢ A ∧ B
This means that our extension of simply typed λ-calculus is compatible with
the Curry-Howard correspondence (theorem 4.1.7.1).
Recall that the cut-elimination rules for conjunction consist in the following
two cases: the derivation

π            π′
Γ ⊢ A        Γ ⊢ B
(∧I)
Γ ⊢ A ∧ B
(∧lE)
Γ ⊢ A

reduces to π, deriving Γ ⊢ A, and, symmetrically, the derivation ending
with (∧rE), deriving Γ ⊢ B, reduces to π′, deriving Γ ⊢ B.
In the presence of terms, the derivation

π                π′
Γ ⊢ t : A        Γ ⊢ u : B
(×I)
Γ ⊢ ⟨t, u⟩ : A × B
(×lE)
Γ ⊢ πl(⟨t, u⟩) : A

reduces to π, deriving Γ ⊢ t : A, and the derivation ending with (×rE), deriving
Γ ⊢ πr(⟨t, u⟩) : B, reduces to π′, deriving Γ ⊢ u : B, which indicates that the
reduction rules associated to the new constructions should be

πl(⟨t, u⟩) −→β t        πr(⟨t, u⟩) −→β u

as expected: taking the first component of a pair ⟨t, u⟩ returns t, and similarly
for the second component.
Finally, the η-expansion rule corresponds to the following transformation of
the proof derivation:
a derivation π of Γ ⊢ t : A × B is expanded into

π                        π
Γ ⊢ t : A × B        Γ ⊢ t : A × B
(×lE)                (×rE)
Γ ⊢ πl(t) : A        Γ ⊢ πr(t) : B
(×I)
Γ ⊢ ⟨πl(t), πr(t)⟩ : A × B
Γ ⊢ π_l^{A,B} : A × B → A        Γ ⊢ π_r^{A,B} : A × B → B
An alternative elimination construction for products is

unpair(t, xy ↦ u)

which corresponds to the OCaml construction let (x,y) = t in u, with the
typing rule

Γ ⊢ t : A × B        Γ, x : A, y : B ⊢ u : C
Γ ⊢ unpair(t, xy ↦ u) : C
This is the flavor of rules which has to be used when working with dependent
types, see section 8.3.3. We did not use it here because, through the Curry-
Howard correspondence, it corresponds to the following variant of elimination
rule for conjunction
Γ ⊢ A ∧ B        Γ, A, B ⊢ C
(∧E)
Γ ⊢ C
which is not the one which is traditionally used (the main reason is that it
involves a “new” formula C, whereas the usual rules (∧lE ) and (∧rE ) only use A
and B).
The types A × B → C and A → B → C are isomorphic, see remark 4.1.7.3,
where the first type is implicitly bracketed as (A × B) → C. In OCaml, this
means that it is roughly the same to write a function of the form
let f x y = ...
or a function of the form
let f (x, y) = ...
More precisely, the isomorphism between the two types means that we have
λ-terms which allow converting elements of one type into elements of the other
type, in both directions:
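The terms witnessing the isomorphism are not reproduced in this copy; in λ-calculus
they would be λf.λx.λy.f ⟨x, y⟩ and λf.λp.f πl(p) πr(p), and in OCaml:

let curry f x y = f (x, y)
let uncurry f (x, y) = f x y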
called unit. It corresponds to the unit type of OCaml and, through the Curry-
Howard correspondence to the formula >. We add a new constant λ-term hi
which is the only element of 1 (up to β-equivalence), and corresponds to () in
OCaml. The typing rule is
(1I)
Γ ⊢ ⟨⟩ : 1

which corresponds to the usual rule for truth by term erasure:

(⊤I)
Γ ⊢ ⊤
We now introduce a new binary construction

A + B

on types, which represents the coproduct of the two types A and B. Intuitively,
this corresponds to the set-theoretic disjoint union: an element of A + B is either
an element of type A or an element of type B. We add three new constructions
to the syntax of λ-terms:

t, u, v ::= . . . | case(t, x ↦ u, y ↦ v) | ι_l^A(t) | ι_r^A(t)
where t, u and v are terms, x and y are variables and A is a type. Since
A + B is the disjoint union of A and B, we should be able to see a term t of
type A (resp. B) as a term of type A + B: this is precisely represented by the
term ιB A
l (t) (resp. ιr (t)), which can be thought of as the term t “cast” into an
element of type A + B. For this reason, ιA A
l and ιr are often called the canonical
injections. Conversely, any element of A + B should either be an element of A
or an element of B. This means that we should be able to construct new values
by case analysis: typically, given a term t of type A + B,
– if t is of type A then we return u(t),
– if t is of type B then we return v(t).
Above, u (resp. v) should be a λ-term with a distinguished free variable x
(resp. y) which is to be replaced by t. In formal notation, such a case analysis
is written
case(t, x 7→ u, y 7→ v)
The symbol “↦” is purely formal here (it indicates bound variables), and our
operation takes 5 arguments: t, x, u, y and v. With the above intuitions, it
should be no surprise that the typing rules are

Γ ⊢ t : A + B        Γ, x : A ⊢ u : C        Γ, y : B ⊢ v : C
(+E)
Γ ⊢ case(t, x ↦ u, y ↦ v) : C

Γ ⊢ t : A                        Γ ⊢ t : B
(+lI)                      (+rI)
Γ ⊢ ι_l^B(t) : A + B       Γ ⊢ ι_r^A(t) : A + B
to the derivation

π′[π/x]
Γ ⊢ u[t/x] : C
as well as the symmetric one, obtained by suitably using ιr instead of ιl . The
β-reduction rules are thus
case(ι_l^B(t), x ↦ u, y ↦ v) −→β u[t/x]
case(ι_r^A(t), x ↦ u, y ↦ v) −→β v[t/y]
Church vs Curry style. Note that if we remove the type annotations on the
injections, i.e. write ι_l(t) instead of ι_l^B(t), then the typing of a λ-term is not
unique anymore. Namely, the typing rules become

Γ ⊢ t : A                    Γ ⊢ t : B
(+lI)                  (+rI)
Γ ⊢ ι_l(t) : A + B     Γ ⊢ ι_r(t) : A + B
and, in the first rule, there is no way of guessing the type B in the conclusion
from the premise (and similarly for the other rule). Similar issues happen if we
remove the type annotations from abstractions and are detailed in section 4.4.
α-conversion and substitution. The reason why we use the symbol “↦” in terms
case(t, x ↦ u, y ↦ v) is that it indicates that x is bound in u (and similarly y
in v), in the sense that our α-equivalence should include, for instance,

case(t, x ↦ u, y ↦ v) = case(t, x′ ↦ u[x′/x], y ↦ v)

for x′ ∉ FV(u). This also means that substitution should take care not to
accidentally bind variables. A variant of the construction takes functions as
branches, the reduction rules then being

case(ι_l^B(t), u, v) −→β u t        case(ι_r^A(t), u, v) −→β v t
4.3.4 Empty type. The empty type is usually denoted 0. We extend the
syntax of λ-terms
t ::= . . . | case_A(t)
by adding one eliminator caseA (t) which allows us to construct an element of an
arbitrary type A, provided that we have constructed an element of the empty
type 0 (which we do not expect to be possible). The typing rule is thus
Γ ⊢ t : 0
(0E)
Γ ⊢ case_A(t) : A

which corresponds, through term erasure, to the elimination rule for falsity:

Γ ⊢ ⊥
(⊥E)
Γ ⊢ A
The η-rule transforms a derivation π of Γ ⊢ t : 0 into the derivation

π
Γ ⊢ t : 0
(0E)
Γ ⊢ case_0(t) : 0
π                π′                    π″
Γ ⊢ A ∨ B        Γ, A ⊢ C ∧ D        Γ, B ⊢ C ∧ D
(∨E)
Γ ⊢ C ∧ D
(∧lE)
Γ ⊢ C

which reduces to

                 π′                π″
π            Γ, A ⊢ C ∧ D    Γ, B ⊢ C ∧ D
             (∧lE)            (∧lE)
Γ ⊢ A ∨ B    Γ, A ⊢ C        Γ, B ⊢ C
(∨E)
Γ ⊢ C
Correspondingly, the typed derivation

π                    π′                        π″
Γ ⊢ t : A + B        Γ, x : A ⊢ u : C × D      Γ, y : B ⊢ v : C × D
(+E)
Γ ⊢ case(t, x ↦ u, y ↦ v) : C × D
(×lE)
Γ ⊢ πl(case(t, x ↦ u, y ↦ v)) : C

should reduce to

                     π′                        π″
π                Γ, x : A ⊢ u : C × D      Γ, y : B ⊢ v : C × D
                 (×lE)                     (×lE)
Γ ⊢ t : A + B    Γ, x : A ⊢ πl(u) : C      Γ, y : B ⊢ πl(v) : C
(+E)
Γ ⊢ case(t, x ↦ πl(u), y ↦ πl(v)) : C
which states that projections can “go through” case operators. Other rules are
obtained similarly.
CHAPTER 4. SIMPLY TYPED λ-CALCULUS 188
A, B ::= X | A → B | Nat
where the newly added type Nat stands for natural numbers. Terms are gener-
ated by
t, u, v ::= x | t u | λx.t | Z | S(t) | rec(t, u, xy 7→ v)
where the term Z stands for the zero constant, and S(t) for the successor of a
term t (supposed to be a natural number). The construction rec(t, u, xy ↦ v)
allows one to define a function by induction:
– if t is Z, it returns u,
– if t is S(t′), it returns v where x is replaced by the value recursively
computed from t′ and y by t′.
Rules. The typing rules for the new terms are the following ones:

(ZI)
Γ ⊢ Z : Nat

Γ ⊢ t : Nat
(SI)
Γ ⊢ S(t) : Nat

Γ ⊢ t : Nat        Γ ⊢ u : A        Γ, x : A, y : Nat ⊢ v : A
(rec)
Γ ⊢ rec(t, u, xy ↦ v) : A
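The behavior of the recursor is perhaps best understood through an OCaml sketch
(the names nat and recur are ours):

type nat = Z | S of nat

(* recur t u v: returns u on Z and, on S n, applies v to the
   recursively computed value and to n. *)
let rec recur t u v =
  match t with
  | Z -> u
  | S n -> v (recur n u v) n

(* For instance, addition and predecessor: *)
let add m n = recur m n (fun r _ -> S r)
let pred n = recur n Z (fun _ m -> m)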
Properties. It can be shown, see section 4.3.7, that this system is terminating
and confluent. Moreover, the functions of type
Nat → Nat
which can be implemented in this system are precisely the recursive functions
which are provably total (in Peano Arithmetic, see section 5.2.5), i.e. recursive
functions for which there is a proof that they terminate on every input. This
class of functions strictly includes the primitive recursive ones, and is strictly
included in the class of total recursive functions.
We also extend the notion of neutral term: a term is neutral when it is not of
one of the forms

λx.t        ⟨t, u⟩        ⟨⟩        ι_l^B(t)        ι_r^A(t)        Z        S(t)

which correspond to the possible introduction rules in our system.
definitions, the proofs can be performed following the same structure as in
section 4.2.2.
A, B ::= X | A → B
t, u ::= x | λx.t | t u
(ax)
Γ ⊢ x : Γ(x)

Γ, x : A ⊢ t : B
(→I)
Γ ⊢ λx.t : A → B

Γ ⊢ t : A → B        Γ ⊢ u : A
(→E)
Γ ⊢ t u : B
This seemingly minor change of not writing types for abstractions has strong
consequences on the properties of typing. In particular, theorem 4.1.6.1 does
not hold anymore: a given λ-term might admit multiple types. For instance,
the identity λ-term admits the following types:
(ax)                            (ax)
x : X ⊢ x : X            x : Y → Z ⊢ x : Y → Z
(→I)                            (→I)
⊢ λx.x : X → X        ⊢ λx.x : (Y → Z) → (Y → Z)
and in fact, every type of the form A → A for some type A is an admissible
type for the identity.
The reason for this is that when we derive a type containing a type variable
in this system, we can always replace this variable by any other type. Formally,
given types A and B and a type variable X, we write A[B/X] for the type
obtained from A by replacing every occurrence of X by B in A. Similarly, given
a context Γ, we write Γ[B/X] for the context where X has been replaced by B
in every type. We have
Lemma 4.4.1.1. If Γ ` t : A is derivable then Γ[B/X] ` t : A[B/X] is also
derivable for every type B and variable X.
Proof. By induction on the derivation of Γ ` t : A.
For instance, since the identity admits the type X → X, it also admits the same
type where X has been replaced by Y → Z, i.e. (Y → Z) → (Y → Z). The
first type is “more general” than the second, in the sense that the second can
be obtained by substituting type variables in the first. We will see that any
term admits a type which is “most general”, in the sense that it is more general
than any other of its types. For instance, the most general type for identity
is X → X. Again, this phenomenon is not present in Church style typing,
e.g. the two terms

λx^X.x : X → X        λx^{Y→Z}.x : (Y → Z) → (Y → Z)

are different.
dom(σ) = {X | σ(X) ≠ X}
This set will always be finite for the substitutions we consider here, so that, in
practice, a substitution can be described by the images of the variables X in its
domain. Given a type A, we write A[σ] for the type A where every variable X
has been replaced by σ(X). Formally, it is defined by induction on the type A
by
X[σ] = σ(X)
(A → B)[σ] = A[σ] → B[σ]
In this case, we sometimes say that the context Γ[σ] is a refinement of the
context Γ. It is easily shown that if a term admits a type, it also admits a less
general type: lemma 4.4.1.1 generalizes as follows.
Lemma 4.4.2.2. Given a term t such that Γ ` t : A is derivable and a substitu-
tion σ then Γ[σ] ` t : A[σ] is also derivable.
Proof. By induction on the derivation of Γ ` t : A.
A principal type for a term t in a context Γ consists of a substitution σ and a
type A such that
– Γ[σ] ⊢ t : A is derivable,
– for every substitution τ such that Γ[τ] ⊢ t : B is derivable, there exists a
substitution τ′ such that τ = τ′ ◦ σ and B = A[τ′].
In other words, the most general type is a type A for t in some refinement of
the context Γ such that every other type can be obtained by substitution, in
the sense of lemma 4.4.2.2.
This is often used in the case where the context Γ is empty, in which case the
substitution σ is not relevant. In this case, the principal type for t is a type A
such that ⊢ t : A is derivable and which is minimal: given a type B, we have
⊢ t : B derivable if and only if A ⊑ B.
Example 4.4.2.4. The principal type for t = λx.x is X → X: the types of t are
those of the form A → A for some type A.
Example 4.4.2.5. The principal types for the λ-terms

λxyz.(x z)(y z)        and        λxy.x

are respectively

(X → Y → Z) → (X → Y) → X → Z        and        X → Y → X
A type equation system is a finite set of equations between types

E = {A1 ≟ B1, . . . , An ≟ Bn}        (4.2)

and a solution of E is a substitution σ such that Ai[σ] = Bi[σ] for every index i.
Typing with constraints. The idea is that to every context Γ and λ-term t, we
are going to associate a type A and a type equation system E which are complete
in the sense that, for every solution σ of E, the sequent

Γ[σ] ⊢ t : A[σ]

is derivable and, conversely, every derivable sequent of the form

Γ[σ] ⊢ t : B

arises in this way.
refinements of the context Γ. Its elements are sometimes called constraints
since they encode constraints on acceptable substitutions. We will do so by
imposing the “minimal amount of equations” to E so that t admits a type A in
the context Γ. As usual, this is performed by induction on t, distinguishing the
three possible cases:
– x: we have a type A if and only if x ∈ dom(Γ), in which case A = Γ(x),
– λx.t: the type A should be of the form B → C, where B is the type of x
and C is the type of t. Writing At for the type inferred for t, we thus
define A = X → At for some fresh variable X,
– t u: we have a type A if and only if t is of type B → A and u is
of type B. Writing At for the type inferred for t and Au for the type
inferred for u, we thus define A = X for some fresh variable X and add
the equation

At ≟ Au → X
Above, the fact that X is “fresh” means that it does not occur somewhere else
(in the contexts, the types or the equation systems).
This is formalized by a judgment Γ ⊢ t : A | E, defined by the rules

(ax)
Γ ⊢ x : Γ(x) | ∅

Γ, x : X ⊢ t : At | Et
(→I) with X fresh
Γ ⊢ λx.t : X → At | Et

Γ ⊢ t : At | Et        Γ ⊢ u : Au | Eu
(→E) with X fresh
Γ ⊢ t u : X | Et ∪ Eu ∪ {At ≟ Au → X}
Example 4.4.3.1. For instance, for the term λf.λx.f x, we have the following
derivation:

(ax)                                        (ax)
f : Z, x : X ⊢ f : Z | ∅        f : Z, x : X ⊢ x : X | ∅
(→E)
f : Z, x : X ⊢ f x : Y | {Z ≟ X → Y}
(→I)
f : Z ⊢ λx.f x : X → Y | {Z ≟ X → Y}
(→I)
⊢ λf.λx.f x : Z → (X → Y) | {Z ≟ X → Y}
The type A and the equations E describe exactly all the possible types for t
in the context Γ, in the following sense.
Theorem 4.4.3.2.
– For every solution σ of E, the sequent Γ[σ] ⊢ t : A[σ] is derivable (in the
sense of section 4.1.4).
– If there is a substitution σ and a type B such that Γ[σ] ⊢ t : B is derivable,
then σ is a solution of E and B = A[σ].
Proof. By induction on the derivation of Γ ⊢ t : A | E.
It is easily seen that, given a context Γ and a term t, there is exactly one type A
and one system E such that Γ ⊢ t : A | E is derivable (up to the choice of type
variables): we can thus speak of the type A and the system E associated to a
term t in a context Γ. Moreover, the above rules are easily translated into a
method for computing those. An implementation of the resulting algorithm is
provided in figure 4.5: the function infer generates, given an environment
env describing the context Γ, the type A and the equation system E, encoded
as a list of pairs of types.
We will see in section 5.4 that if a system of equations admits a solution then
it admits a most general one: provided there is a solution, there is a solution σ
such that the solutions are exactly substitutions of the form τ ◦ σ for some
substitution τ . Moreover, we will see an algorithm to actually compute this
most general solution: this is called the unification algorithm. This finally
provides us with what we were looking for.
Theorem 4.4.3.3. Suppose given a context Γ and a term t. Consider the type A
and the system E such that Γ ` t : A | E is derivable, and write σ for the most
general solution of E. Then the substitution σ together with the type A[σ] is a
principal type for t in the environment Γ.
(** Types *)
type ty =
| TVar of int
| TArr of ty * ty
(** Terms. *)
type term =
| Var of string
| Abs of string * term
| App of term * term
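The body of figure 4.5 (the functions unify and infer) is not reproduced in
this copy. The value printed below suggests that type variables are actually
represented as mutable references (with constructors Link and AVar), so a sketch
consistent with that output, which differs from the simpler type ty above, might be:

type ty =
  | TVar of tvar ref
  | TArr of ty * ty
and tvar =
  | AVar of int   (* a yet-unknown type variable *)
  | Link of ty    (* a variable unified with a type *)

exception Type_error

let fresh_ty =
  let n = ref (-1) in
  fun () -> incr n; TVar (ref (AVar !n))

(* Follow the links of already-unified variables. *)
let rec unlink = function TVar {contents = Link a} -> unlink a | a -> a

(* Occurs check: does the variable x occur in the type? *)
let rec occurs x = function
  | TVar {contents = Link a} -> occurs x a
  | TVar v -> x == v
  | TArr (a, b) -> occurs x a || occurs x b

let rec unify a b =
  let a = unlink a and b = unlink b in
  if a == b then () else
    match a, b with
    | TVar v, b -> if occurs v b then raise Type_error; v := Link b
    | a, TVar v -> if occurs v a then raise Type_error; v := Link a
    | TArr (a1, a2), TArr (b1, b2) -> unify a1 b1; unify a2 b2

let rec infer env = function
  | Var x -> (try List.assoc x env with Not_found -> raise Type_error)
  | Abs (x, t) -> let a = fresh_ty () in TArr (a, infer ((x, a) :: env) t)
  | App (t, u) ->
    let a = fresh_ty () in
    unify (infer env t) (TArr (infer env u, a)); a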
If we infer its type (in the empty environment) using the above function infer,
we obtain the following result
TArr
  (TVar
    {contents =
      Link (TArr (TVar {contents = AVar 1}, TVar {contents = AVar 2}))},
   TArr (TVar {contents = AVar 1}, TVar {contents = AVar 2}))
which is OCaml’s way of saying
(X → Y ) → (X → Y )
(in OCaml, references are implemented as records with one mutable field la-
beled contents).
Remark 4.4.3.5. In the unification function, when facing an equation X ≟ A,
it is important to check that X does not occur in A. For instance, let us try
to type λx.xx, which is not expected to be typable. The inference will roughly
proceed as follows.
1. Since it is an abstraction, the type of λx.xx must be of the form X → A,
where A is the type of xx. Let's find the type of xx assuming x of type X.
2. Since xx is the application of x to itself, the type X of x must unify
with X → Y, where Y is the type of xx.
With the above implementation, the algorithm will raise an error: the unification
of X and X → Y will fail because X ∈ FV(X → Y ). If we forgot to check this,
we would generate for x the type X → Y where X is (physically) the type itself.
This would intuitively correspond to allowing the infinite type
(((. . . → Y ) → Y ) → Y ) → Y
Typability. The above algorithms can also be used to decide the typability of a
term t, i.e. answer the following question: is there a context in which t admits
a type?
Theorem 4.4.3.6. The typability problem for λ-calculus is decidable.
Proof. Suppose given a term t. We write FV(t) = {x1 , . . . , xn } for the set of free
variables and define the context Γ = x1 : X1 , . . . , xn : Xn . Using lemma 4.1.5.1,
it is not difficult to show that t admits a type if and only if it admits a type in
the context Γ, which can be decided as above.
Type schemes. Formally, a type A is defined as before, and type schemes A are
generated by the grammar
A ::= A | ∀X.A
where X is a type variable and A is a type. In other words, a type scheme is a
type with some universally quantified variables at toplevel, i.e. a formula of the
form
∀X1 . . . . ∀Xn .A
Having such a “type” for a term means that it can have any type in the set
{A[A1/X1, . . . , An/Xn] | A1, . . . , An types}, i.e. any type obtained by replacing
the universally quantified type variables by some types. As usual, in a type
scheme ∀X.A, the variable X is bound in A and could be renamed. The free
variables of a type scheme are thus given by FV(∀X.A) = FV(A) \ {X}.
Given a variable X and a type B, we write A[B/X] for the type scheme A where
the variable X has been replaced by B (as usual, one has to properly take care
of bound variables):
The generality preorder ⊑ on type schemes is defined by: ∀X1. . . . ∀Xn.A ⊑
∀Y1. . . . ∀Ym.B if and only if there are types A1, . . . , An such that
B = A[A1/X1, . . . , An/Xn] and the Yi are variables which are not free
in ∀X1. . . . ∀Xn.A.
When we have A v B, this thus means that B was obtained from A by replacing
some universally quantified variables Xi by types Ai , but not only: we can also
universally quantify some of the fresh variables introduced by the Ai afterwards.
For instance, we have
∀X.X → X v ∀Y.(Y → Y ) → (Y → Y ) v (Z → Z) → (Z → Z)
Hindley-Milner typing system. We are now going to give a typing system for a
programming language whose terms are
t, u ::= x | λx.t | t u | let x = t in u
i.e. λ-terms extended with a let construction, as in the example below.
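For instance, OCaml accepts the following program, in which the let-bound identity is applied both to an integer and to a string:

let () =
  let id = fun x -> x in
  print_int (id 3);
  print_string (id "a")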
This is allowed because the type inferred for id is ∀X.X → X, which is poly-
morphic. On the other hand, the following code is rejected:
let () =
  (fun id ->
    print_int (id 3); print_string (id "a")
  ) (fun x -> x)
Namely, the type inferred for the argument id of the function is X → X. During
the type inference, OCaml sees that it is applied to 3, and therefore replaces
X by int, i.e. it guesses that the type of id must be int → int and thus
raises a type error when we also apply it to a string. The identity argument is
monomorphic: it can be applied to an integer, or to a string, but not both.
We now present an algorithm, due to Hindley and Milner [Hin69, Mil78]
which infers such types. A context Γ is a list
x1 : A1, . . . , xn : An
consisting of pairs of variables and type schemes. The free variables of such a
context are
FV(Γ) = FV(A1) ∪ · · · ∪ FV(An)
The typing rules are the following ones:

Γ(x) = A    A ⊑ B
────────────────── (ax)
Γ ` x : B

Γ ` t : A → B    Γ ` u : A
─────────────────────────── (→E)
Γ ` t u : B

Γ, x : A ` t : B
────────────────── (→I)
Γ ` λx.t : A → B

Γ ` t : A    Γ, x : ∀Γ A ` u : B
───────────────────────────────── (let)
Γ ` let x = t in u : B
The rules (→E ) and (→I ) for elimination and introduction of functions are the
usual ones. The rule (ax) allows to specialize the type of a variable in the
context: if x has type A in the context Γ, then we can assume that it actually
has any type B with A v B: with our above example, if id has the type scheme
∀X.X → X in the context, then we can assume that it has type int → int
(or string → string) when we use it, and we can make different assumptions
at each use. Finally, the rule let states that if we can show that t has type A,
then we can assume that it has the more general type scheme
∀Γ A = ∀X1. . . . ∀Xn.A
where {X1, . . . , Xn} = FV(A) \ FV(Γ) are the free variables of A which do not
occur in the context. The restriction to those variables is important:
suppose that we did not impose it and quantified over all the variables
of A. We would then have the derivation
x : X ` x : X                          (ax)
x : X, y : ∀X.X ` y : Y                (ax)
x : X ` let y = x in y : Y             (let)
` λx.let y = x in y : X → Y            (→I)
This is clearly incorrect since the term λx.let y = x in y is essentially the iden-
tity, and thus should have X → X as principal type.
The following proposition shows that this typing system amounts to the
simple one of section 4.4.1, where we allow ourselves to infer the type of an
expression each time we use it:
Proposition 4.4.4.3. The sequent Γ ` let x = t in u : A is derivable if and only
if t is typable in the context Γ and Γ ` u[t/x] : A is derivable.
The type inference algorithm can be presented by rules deriving judgments of
the form Γ ` t : A | σ, meaning that the type A and substitution σ are inferred
for the term t:

Γ(x) = A
───────────────── (ax)
Γ ` x : !A | id

Γ, x : X ` t : B | σ    X fresh
──────────────────────────────── (→I)
Γ ` λx.t : X[σ] → B | σ

Γ ` t : C | σ    Γ ` u : A | σ′    X fresh    σ″ = mgu(A → X, C)
────────────────────────────────────────────────────────────────── (→E)
Γ ` t u : X[σ″] | σ″ ◦ σ′ ◦ σ

Γ ` t : A | σ    Γ[σ], x : ∀Γ[σ] A ` u : B | σ′
──────────────────────────────────────────────── (let)
Γ ` let x = t in u : B | σ′ ◦ σ

These rules can be read as follows.
(ax) In order to infer a type for a variable x, we look up its type scheme
A = Γ(x) and instantiate it with fresh variables (which do not already occur
in Γ). If the type scheme A is ∀X1. . . . ∀Xn.A′, the type !A is thus
A′[Y1/X1, . . . , Yn/Xn]
where the variables Yi are fresh and distinct.
(→I ) In order to infer the type of λx.t, we have to guess a type for x and infer
the type of t in the context where x has this type. Since we have no idea of
what this type should be, we simply infer the type of t in the environment
where x has type X, a fresh type variable. This will result in a type B
and a substitution σ such that Γ[σ], x : X[σ] ` t : B and therefore we can
deduce that λx.t has type X[σ] → B.
(→E) We first infer a type A for u, and a type C for t. In order for t u
to be typable, C should be of the form A → B. We therefore use the
unification procedure described in section 5.4 in order to compute the
most general substitution σ″ such that (A → X)[σ″] = C[σ″] for some
fresh variable X, and we will have B = X[σ″]; this substitution is
written σ″ = mgu(A → X, C) (here, “mgu” means most general unifier, see
section 5.4.2). We deduce that t u has the type B we have computed.
(let) There is no real novelty in this rule compared to earlier. In order to infer
the type of let x = t in u, we infer a type A for t and then infer a type
B for u in the environment where x has the type scheme obtained by
generalizing A with respect to Γ.
This algorithm generates a valid type according to the previous rules:
Theorem 4.4.4.4 (Correctness). If Γ ` t : A | σ is derivable then Γ[σ] ` t : A is
derivable.
Moreover, it is actually the most general one that could be inferred:
Theorem 4.4.4.5 (Principal types). Suppose that Γ ` t : A | σ is derivable.
Then, for every substitution τ and type B such that Γ[τ ] ` t : B there exists a
substitution τ 0 such that τ = τ 0 ◦ σ and B = A[τ 0 ].
Example 4.4.4.6. Here are some principal types which can be computed with
the algorithm:
λx.let y = x in y : X → X
λx.let y = λz.x in y : X → Y → X
λx.let y = λz.x z in y : (X → Y ) → (X → Y )
In OCaml, types can be represented by
type ty =
  | EVar of int  (* non-quantified variable *)
  | UVar of int  (* universally quantified variable *)
  | TArr of ty * ty
Here, instead of universally quantifying some variables, we use two constructors:
UVar n is a variable which is universally quantified, and EVar n is a variable
which is not. The generation of fresh type variables can be achieved with a
counter, as usual:
let fresh =
  let n = ref (-1) in
  fun () -> incr n; EVar !n
Next, the instantiation of a type scheme is performed by replacing each universal
variable with a fresh existential one (we use a list tenv in order to remember
when a universal variable has already been replaced by some variable, in order
to always replace it by the same variable):
let inst =
  let tenv = ref [] in
  let rec inst = function
    | UVar x ->
      if not (List.mem_assoc x !tenv) then tenv := (x, fresh ()) :: !tenv;
      List.assoc x !tenv
    | EVar x -> EVar x
    | TArr (a, b) -> TArr (inst a, inst b)
  in
  inst
The following function checks whether a variable occurs in a type:
let rec occurs x = function
  | EVar y -> x = y
  | UVar _ -> false
  | TArr (a, b) -> occurs x a || occurs x b
We can then generalize a type with respect to a given context by changing
each variable EVar n which does not occur in the context into the corresponding
universal variable UVar n:
let rec gen env = function
  | EVar x ->
    if List.exists (fun (_, a) -> occurs x a) env
    then EVar x else UVar x
  | UVar x -> UVar x
  | TArr (a, b) -> TArr (gen env a, gen env b)
We can finally implement the function which will infer the type of a term in a
given environment and return it together with the corresponding substitution.
The four cases of the match correspond to the four different rules above:
let rec infer env = function
  | Var x ->
    let a = try List.assoc x env with Not_found -> raise Type_error in
    inst a, Subst.id
  | Abs (x, t) ->
    let a = fresh () in
    let b, s = infer ((x, a) :: env) t in
    TArr (Subst.app s a, b), s
  | App (t, u) ->
    let a, su = infer env u in
    let b = fresh () in
    let c, st = infer env t in
    let s = unify (TArr (a, b)) c in
    Subst.app s b, Subst.comp s (Subst.comp su st)
  | Let (x, t, u) ->
    let a, st = infer env t in
    let b, su = infer ((x, gen (Subst.app_env st env) a) :: env) u in
    b, Subst.comp su st
We have implemented substitutions as functions int -> ty associating a type to
a type variable. The functions Subst.id, Subst.comp, Subst.app and Subst.app_env
respectively compute the identity substitution, the composite of substitutions,
and the application of a substitution to a type and to an environment. Their
implementation is left to the reader. Finally, above, the function unify imple-
ments the unification algorithm described in section 5.4:
let rec unify l =
  match l with
  | (EVar x, b) :: l ->
    if occurs x b then raise Type_error;
    let s = Subst.make [x, b] in
    (* propagate the substitution [b/x] in the remaining equations *)
    let l = List.map (fun (a, b) -> Subst.app s a, Subst.app s b) l in
    Subst.comp (unify l) s
  | (a, EVar x) :: l -> unify ((EVar x, a) :: l)
  | (TArr (a, b), TArr (a', b')) :: l -> unify ([a, a'; b, b'] @ l)
  | (UVar _, _) :: _ | (_, UVar _) :: _ -> assert false
  | [] -> Subst.id

let unify a b = unify [a, b]
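For completeness, here is one possible implementation of the Subst module used above (a minimal sketch, consistent with the way its functions are called):

module Subst = struct
  (* The identity substitution. *)
  let id n = EVar n
  (* The substitution determined by an association list. *)
  let make l n = try List.assoc n l with Not_found -> EVar n
  (* Apply a substitution to a type. *)
  let rec app s = function
    | EVar n -> s n
    | UVar n -> UVar n
    | TArr (a, b) -> TArr (app s a, app s b)
  (* Compose two substitutions. *)
  let comp s s' n = app s (s' n)
  (* Apply a substitution to an environment. *)
  let app_env s env = List.map (fun (x, a) -> (x, app s a)) env
end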
As before, the implementation can be made more efficient by representing non-
quantified variables as mutable references, so that substitution is performed in
place. The type of types becomes
type ty =
  | EVar of tvar ref  (* non-quantified variable *)
  | UVar of int       (* universally quantified variable *)
  | TArr of ty * ty
and tvar =
  | Unbd of int  (* unbound variable *)
  | Link of ty   (* substituted variable *)
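The functions below use a helper unlink which follows Link indirections until an unbound variable or a non-variable type is reached; a minimal sketch:

let rec unlink = function
  | EVar x as a ->
    (match !x with
    | Link b -> unlink b
    | Unbd _ -> a)
  | a -> a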
Most functions are straightforwardly adapted. The main novelties are the uni-
fication function, which now performs the modification of types in place:
let rec unify a b =
  match unlink a, unlink b with
  | EVar x, _ ->
    if occurs x b then raise Type_error else x := Link b
  | _, EVar y -> unify b a
  | TArr (a1, a2), TArr (b1, b2) -> unify a1 b1; unify a2 b2
  | _ -> raise Type_error
and the type inference function which is simpler to write, because it does not
need to propagate the substitutions:
let rec infer env = function
  | Var x ->
    (try inst (List.assoc x env) with Not_found -> raise Type_error)
  | Abs (x, t) ->
    let a = fresh () in
    let b = infer ((x, a) :: env) t in
    TArr (a, b)
  | App (t, u) ->
    let a = infer env u in
    let b = fresh () in
    let c = infer env t in
    unify (TArr (a, b)) c;
    b
  | Let (x, t, u) ->
    let a = infer env t in
    infer ((x, gen env a) :: env) u
The substitutions are now very efficiently performed because we do not have
to go through terms anymore, references are doing the job for us. There is
however one last source of inefficiency in this code: in the function unify, the
function occurs x b has to go through all the type b to see whether the variable
x occurs in it or not. There is a very elegant solution to this due to Rémy [Rém92]
that we learned from [Kis13]. To each type variable, we are going to assign an
integer called its level, which indicates the depth of let-declarations at the
point where it was created. Initially, the level is 0 by convention and, in an
expression let x = t in u at level n, the variables created when typing t will
have level n + 1, whereas the variables of u will still have level n (it is some
sort of de Bruijn index). For instance, in
the term
let a = (let b = λx.x in λy.y) in λz.z
the type variables associated to x, y and z will have level 2, 1 and 0 respectively.
There is a catch however: it might happen that, during unification (see the
function unify above), a variable X with low level ` gets substituted with a
type A containing variables of high level. In this case, all the levels of the
variables in A should be lowered to the minimum of their level and ` before
performing the substitution: their level gets “contaminated” by the one of the
variable they are unified with. However, we are smart and see that the function
occurs is already going through the type just before we substitute, and it is the
only place where it is used, so that we can use it to both check the occurrence
and update the levels. We therefore change it to
let rec occurs x a =
  match unlink a with
  | EVar y when x = y -> raise Type_error
  | EVar y ->
    (* lower the level of y to the minimum of its level and that of x; in this
       version, unbound variables carry a name and a level (Unbd of string * int),
       with accessors tname and tlevel *)
    let l = tlevel y in
    let l = match !x with Unbd (_, l') -> min l l' | _ -> l in
    y := Unbd (tname y, l)
  | UVar _ -> ()
  | TArr (a, b) -> occurs x a; occurs x b
which lowers the level of every variable to the minimum of its old level and
the current level. Without this modification of occurs, for the term
λx.let y = λz.x z in y
principal type, see section 4.4.2, but here we do not want to make any choice
for the user).
This suggests splitting the usual typing judgment Γ ` t : A in two:
– Γ ` t ⇒ A: we infer the type A for the term t in the context Γ,
– Γ ` t ⇐ A: we check that the term t has type A in the context Γ.
We will consider terms of the form
t, u ::= x | λx.t | t u | (t : A)
The only new construction is the last one, (t : A), which means “check that t
has type A”. It comes in handy since it allows bringing type information into
terms and is already present in languages such as OCaml, where we can define
the identity function on integers by
let id = fun x -> (x : int)
The rules for type inference and checking are the following ones:
──────────────── (ax)
Γ ` x ⇒ Γ(x)

Γ ` t ⇒ A → B    Γ ` u ⇐ A
─────────────────────────── (→E)
Γ ` t u ⇒ B

Γ, x : A ` t ⇐ B
────────────────── (→I)
Γ ` λx.t ⇐ A → B

Γ ` t ⇐ A
──────────────── (cast)
Γ ` (t : A) ⇒ A

Γ ` t ⇒ A
────────── (sub)
Γ ` t ⇐ A
They read as follows:
(ax) If we know that x has type A then we can come up with a type for x:
namely A.
(→E ) If we can infer a type A → B for t and check that u has type A then we
can infer the type B for t u.
(→I ) In order to check that λx.t has type A → B, we should check that t has
type B when x has type A.
(cast) We can infer the type A for (t : A) provided that t actually has type A.
(sub) This subsumption rule states that, as last resort, if we do not know how
to check that a term t has type A, we can go back to the old method of
inferring a type for it and ensuring that this type is A.
Note that there is no rule for inferring the type of λx.t, because there is no way
to come up with a type for x without type annotations. Again, this means that
we cannot infer a type for the identity λx.x, but we can in presence of type
annotations:
x : A ` x ⇒ A                    (ax)
x : A ` x ⇐ A                    (sub)
` λx.x ⇐ A → A                   (→I)
` (λx.x : A → A) ⇒ A → A         (cast)
Γ`r⇒R
for every real number r. We also suppose that we have access to the usual
mathematical functions (addition, multiplication), as well as a function which
computes the mean of a function between two points, i.e. Γ contains
mean : (R → R) → R → R → R
We can then check the term mean (λx.x × x) 5 7. The function argument is
checked by
Γ, x : R ` x ⇒ R                  (ax)
Γ, x : R ` x ⇐ R                  (sub)
Γ, x : R ` x × x ⇐ R              (using the rule for ×)
Γ ` λx.x × x ⇐ R → R              (→I)
and the applications are then inferred:

Γ ` mean ⇒ (R → R) → R → R → R    Γ ` λx.x × x ⇐ R → R
──────────────────────────────────────────────────────── (→E)
Γ ` mean (λx.x × x) ⇒ R → R → R    Γ ` 5 ⇐ R
───────────────────────────────────────────── (→E)
Γ ` mean (λx.x × x) 5 ⇒ R → R    Γ ` 7 ⇐ R
─────────────────────────────────────────── (→E)
Γ ` mean (λx.x × x) 5 7 ⇒ R

where 5 ⇐ R and 7 ⇐ R follow from (sub) and the rule for real constants.
On the other hand, there is no way to infer a type for the term
λf.λx.λy.(f x + f y)/2
since we cannot guess a type for f.
This is why in a programming language such as Agda you have to declare the
type of a function when defining it:
mean : (R → R) → R → R → R
mean f x y = (f x + f y) / 2
Remark 4.4.5.2. If we omit the rule (cast), it is interesting to note that the
terms v such that Γ ` v ⇐ A is derivable and the terms n such that Γ ` n ⇒ A
is derivable for some context Γ and type A are respectively generated by the
grammars
v ::= λx.v | n        n ::= x | n v
which is precisely the traditional definition of values (also called normal forms)
and neutral terms (already encountered in section 3.5.2 for instance).
(** Types. *)
type ty =
  | TVar of string
  | TArr of ty * ty

(** Terms. *)
type term =
  | Var of var
  | App of term * term
  | Abs of var * term
  | Cast of term * ty

exception Cannot_infer
exception Type_error
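The two judgments can then be implemented as two mutually recursive functions (a sketch following the rules (ax), (→E), (→I), (cast) and (sub) literally; var is assumed to be string):

let rec infer env = function
  | Var x -> (try List.assoc x env with Not_found -> raise Type_error)
  | App (t, u) ->
    (match infer env t with
    | TArr (a, b) -> check env u a; b
    | _ -> raise Type_error)
  | Cast (t, a) -> check env t a; a
  | Abs _ -> raise Cannot_infer

and check env t a =
  match t, a with
  | Abs (x, u), TArr (a, b) -> check ((x, a) :: env) u b
  | _ -> if infer env t <> a then raise Type_error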
Recall the combinators
S = λxyz.(x z)(y z)        K = λxy.x
We have also seen in example 4.4.2.5 that the principal types of those terms are
respectively
(X → Y → Z) → (X → Y) → X → Z        and        X → Y → X
Renaming type variables, typed combinatory terms are generated by the rules

──────────────────────────────────────── (S)
Γ ` S : (A → B → C) → (A → B) → A → C

────────────────── (K)
Γ ` K : A → B → A

Γ ` t : A → B    Γ ` u : A
─────────────────────────── (→E)
Γ ` t u : B

If we erase the terms and write ⇒ instead of →, these rules become

──────────────────────────────────── (S)
Γ ` (A ⇒ B ⇒ C) ⇒ (A ⇒ B) ⇒ A ⇒ C

──────────────── (K)
Γ ` A ⇒ B ⇒ A

Γ ` A ⇒ B    Γ ` A
──────────────────── (⇒E)
Γ ` B
which is precisely the Hilbert calculus described in section 2.7! In other words,
in the same way that natural deduction corresponds, via the Curry-Howard cor-
respondence, to simply typed λ-calculus, Hilbert calculus corresponds to typed
combinatory terms.
Γ ` ¬¬A
───────── (¬¬E)
Γ ` A
The second rule somehow states that the application to the argument u goes
through under C: if our calculus had products or coproducts, similar rules would
have to be added in order to enforce their compatibility with C.
Let us try to understand what this could mean. Suppose given a term v
of type ¬¬A. Since ¬¬A = (A → ⊥) → ⊥, this means that v must be an
abstraction taking an argument k of type A → ⊥ and return a value of type ⊥,
i.e. v will typically reduce to a term of the form λk A→⊥ .u. Since there is
no introduction rule for ⊥ (there is no way of directly constructing a term of
type ⊥), at some point during the evaluation of u, it must apply k to some
argument t of type A in order to produce the value of type ⊥, i.e. v will reduce
to λk A→⊥ .k t. Thus, C(v) will reduce to C(λk A→⊥ .k t), which will reduce to t.
A typical reduction is thus
C(v) −→∗β C(λk¬A.u) −→∗β C(λk¬A.k t) −→β t
This means that C(v) waits for v to apply its argument k to some term t of
type A and returns this argument t. The term k can thus be thought of as
analogous to return in languages such as C, or to the raise
operator of OCaml which raises exceptions (more on this later on). However,
things are more subtle here because the returned term might itself use some of
the terms computed during the evaluation of v. In order to see that in action,
let us compute the term associated to the usual proof of ¬A ∨ A:
k : ¬(¬A ∨ A), a : A ` a : A                                   (ax)
k : ¬(¬A ∨ A), a : A ` ιr(a) : ¬A ∨ A                          (∨rI)
k : ¬(¬A ∨ A), a : A ` k ιr(a) : ⊥                             (¬E)
k : ¬(¬A ∨ A) ` λa^A.k ιr(a) : ¬A                              (¬I)
k : ¬(¬A ∨ A) ` ιl(λa^A.k ιr(a)) : ¬A ∨ A                      (∨lI)
k : ¬(¬A ∨ A) ` k ιl(λa^A.k ιr(a)) : ⊥                         (¬E)
` λk^¬(¬A∨A).k ιl(λa^A.k ιr(a)) : ¬¬(¬A ∨ A)                   (¬I)
` C(λk^¬(¬A∨A).k ιl(λa^A.k ιr(a))) : ¬A ∨ A                    (¬¬E)
As indicated above, the term C(λk^¬(¬A∨A).k ιl(λa^A.k ιr(a))) cannot reasonably
reduce to
t = ιl(λa^A.k ιr(a))
because the variable k occurs in t. The additional rules make it so that it
however acts as t, i.e. it states that it is a proof of ¬A = A → ⊥, albeit being
surrounded by C(λk ¬A .k . . .). If, at some point, we use this proof and apply it
to some argument u of type A, the term will thus reduce to
C(λk ¬A .k ιr (u))
which in turn will reduce to ιr(u) by the reduction rule associated to C. It thus
fakes being a proof of ¬A until we actually use this proof and apply it to some
argument u of type A, at which point it changes its mind and declares that it
was actually a proof of A, namely u. This is exactly the behavior we were
describing in section 2.5.2, when explaining that classical logic allows
“resetting” proofs.
Variants of the calculus. The operator C is due to Felleisen [FH92] and the
observation that it could be typed by double negation elimination was first made
by Griffin [Gri89], see also [SU06, chapter 7]. There are many small possible
variants of the calculus. First note that we could add C (as opposed to C(t))
as a constant to the language, which corresponds to adding double negation
elimination as an axiom instead of a rule:
Γ ` C : ¬¬A → A
If we use Clavius’ law instead of double negation elimination, see theo-
rem 2.5.1.1, we obtain an operator cc called callcc (for call with current
continuation):
Γ ` cc : (¬A → A) → A
This means that we now add a construction µx^A.t to our terms, which corre-
sponds to
µx^A.t = C(λx^¬A.t)
in the previous calculus. In the next section, we present an alternative calculus,
based on similar ideas, with much nicer and more intuitive reduction rules.
The λµ-calculus. The terms of the λµ-calculus, due to Parigot [Par92], are
generated by
t, u ::= x | λx.t | t u | µα.t | [α]t
where α ranges over a new kind of variables, called control variables.
The first constructions are the usual ones from the λ-calculus. A term of the
form µα.t should be thought of as a term catching an exception named α and a
term [α]t as raising the exception α with the value t. The reduction will make
it so that the place where it is caught is replaced by t. For instance, we will
have a reduction
∗
t (µα.u ([α]v)) −→ t v
meaning that during the evaluation of the argument of t, the exception α will
be raised with value v and will thus replace the term at the corresponding µα.
The constructor µ is considered as a binder and terms are considered modulo
α-equivalence: µα.t = µβ.(t[β/α]). Beware of the unfortunate similarity in
notation between raising and substitution.
The three reduction rules of the calculus are
– the usual β-reduction:
(λx.t) u −→β t[u/x]
– the reduction rule for the application of a µ-abstraction:
(µα.t) u −→β µβ.t[[β](v u)/[α]v]
where the weird notation for the substitution means that we should
replace every subterm of t of the form [α]v by [β](v u),
– the following reduction rule for µ, stating that if we catch exceptions raised
on α and immediately re-raise them on β, we might as well raise them directly
on β:
[β](µα.t) −→β t[β/α]
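The OCaml program under discussion was presumably of the following shape (a reconstruction based on the λµ-term given below): the function f is defined by catching, around its definition, an exception Alpha which it raises later, when applied.

let f =
  let exception Alpha of (int -> int) in
  try fun n -> raise (Alpha (fun x -> n * x))
  with Alpha f -> f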
let () = print_int (f 3)
Although the exception Alpha seems to be caught (the raise is surrounded by
a try ... with), executing the program results in
Fatal error: exception Alpha(_)
meaning that it was not: when executing f 3, f is replaced by its value
and the reduction raises the exception. The analogue of this program in λµ is
f = µα.[α](λn.[α](λx.n × x))
(we allow ourselves to use integers and multiplication). It does not suffer from
this problem, and corresponds to a function which, when applied to an argu-
ment n, turns into the function which multiplies by n. When we apply it to 3,
it thus turns to the function which multiplies its argument (which is 3) by 3 and
the result will actually be 9 as expected:
f 3 −→ µβ.[β]((λn.[β]((λx.n × x) 3)) 3)
    −→ µβ.[β]([β]((λx.3 × x) 3))
    −→ µβ.[β][β](3 × 3)
The types of the λµ-calculus are
A, B ::= X | A → B | ⊥
A typing judgment is of the form
Γ ` t : A | ∆
with
Γ = x1 : B1, . . . , xm : Bm        ∆ = α1 : A1, . . . , αn : An
where the variables of Γ are regular ones, whereas those of ∆ are control vari-
ables. Namely, Γ provides the type of the free variables of t as usual, whereas ∆
gives the type of exceptions that might be raised. Finally, A is the type of the
result of t, which might never be actually given if some exception is raised. In
particular, a term of type ⊥ is called a command: we know that it will never
return a value, and thus necessarily raises some exception.
The typing rules for the λµ-calculus are

──────────────────────── (ax)
Γ, x : A, Γ′ ` x : A | ∆

Γ ` t : A → B | ∆    Γ ` u : A | ∆
─────────────────────────────────── (→E)
Γ ` t u : B | ∆

Γ, x : A ` t : B | ∆
──────────────────────── (→I)
Γ ` λx^A.t : A → B | ∆

Γ ` t : ⊥ | ∆, α : A, ∆′
───────────────────────── (⊥E)
Γ ` µα^A.t : A | ∆, ∆′

Γ ` t : A | ∆, α : A, ∆′
────────────────────────── (⊥I)
Γ ` [α]t : ⊥ | ∆, α : A, ∆′
The rule (⊥E) says that, in a term µα^A.t, the subterm t is a command which
raises some value of type A on α, so that µα^A.t itself has type A; the rule (⊥I)
says that a term [α]t is a command (of type ⊥, not returning anything) and that
the type A of the raised term t has to match the one expected for α.
Exercise 4.6.3.1. Show that Peirce’s law
((A → B) → A) → A
is derivable in this system.
It can be shown that this system has the expected properties [Par92, Par97],
which were detailed above in the case of simply typed λ-calculus. Erasing the
terms in the typing rules, we obtain the following presentation of classical
natural deduction:
────────────── (ax)
Γ, A, Γ′ ` A, ∆

Γ ` A ⇒ B, ∆    Γ ` A, ∆
────────────────────────── (⇒E)
Γ ` B, ∆

Γ, A ` B, ∆
────────────── (⇒I)
Γ ` A ⇒ B, ∆

Γ ` ⊥, ∆, A, ∆′
──────────────── (⊥E)
Γ ` A, ∆, ∆′

Γ ` A, ∆, A, ∆′
──────────────── (⊥I)
Γ ` ⊥, ∆, A, ∆′
All the rules are the usual ones except for the rule (⊥I), which combines
weakening, contraction and exchange:
Γ ` A, ∆, A, ∆′
Γ ` ∆, A, A, ∆′        (xch)
Γ ` ∆, A, ∆′           (contr)
Γ ` ⊥, ∆, A, ∆′        (wk)
For instance, we have the derivation
Γ ` B, A, ∆, B, ∆′
Γ ` ⊥, A, ∆, B, ∆′        (⊥I)
Γ ` A, ∆, B, ∆′           (⊥E)
which, decorated with terms, corresponds to
Γ ` t : B | α : A, ∆, β : B, ∆′
Γ ` [β]t : ⊥ | α : A, ∆, β : B, ∆′        (⊥I)
Γ ` µα^A.[β]t : A | ∆, β : B, ∆′          (⊥E)
Adding the usual rules for coproducts, we can show the excluded middle as
follows in this setting (we leave implicit the type annotations on the
injections ιl and ιr):

x : A ` x : A | α : ¬A ∨ A                            (ax)
x : A ` ιr(x) : ¬A ∨ A | α : ¬A ∨ A                   (∨rI)
x : A ` [α]ιr(x) : ⊥ | α : ¬A ∨ A                     (⊥I)
` λx^A.[α]ιr(x) : ¬A | α : ¬A ∨ A                     (¬I)
` ιl(λx^A.[α]ιr(x)) : ¬A ∨ A | α : ¬A ∨ A             (∨lI)
` [α]ιl(λx^A.[α]ιr(x)) : ⊥ | α : ¬A ∨ A               (⊥I)
` µα^¬A∨A.[α]ιl(λx^A.[α]ιr(x)) : ¬A ∨ A |             (⊥E)
In order to give a more concrete idea of this program, let us try to implement it in
OCaml. Remember from section 1.5 that the empty type ⊥ can be implemented
as
type bot
and negation as
type 'a neg = 'a -> bot
From those, the above term proving excluded middle can roughly be translated
as
let em (type a) () : (a neg, a) sum =
  let exception Alpha of (a neg, a) sum in
  try Left (fun x -> raise (Alpha (Right x)))
  with Alpha x -> x
As explained above, this does not behave exactly as it should in OCaml, because
exceptions are not properly scoped there...
4.6.4 A more symmetric calculus. The reduction rule for (µα.t) u in the
λµ-calculus involves a slightly awkward substitution. In order to overcome this
defect and reveal the symmetry of terms and environments, Curien and Herbelin
have introduced a variant of the λµ-calculus called the λµµ̃-calculus [CH00]. In
this calculus there are three kinds of “terms”: terms t, environments e and
commands c, typed by judgments of the respective forms
Γ ` t : A | ∆        Γ | e : A ` ∆        c : (Γ ` ∆)
The typing rules are

Γ ` t : A | ∆    Γ | e : B ` ∆
─────────────────────────────── (→L)
Γ | t · e : A → B ` ∆

Γ, x : A ` t : B | ∆
────────────────────── (→R)
Γ ` λx.t : A → B | ∆

c : (Γ, x : A ` ∆)
─────────────────── (⊥L)
Γ | µ̃x.c : A ` ∆

c : (Γ ` α : A, ∆)
─────────────────── (⊥R)
Γ ` µα.c : A | ∆

Γ ` t : A | ∆    Γ | e : A ` ∆
───────────────────────────────
⟨t | e⟩ : (Γ ` ∆)
You are strongly encouraged to observe their beautiful symmetry and find out
their meaning by yourselves. In particular, Lafont’s critical pair presented in
section 2.5.4 corresponds to the fact that a command of the form ⟨µα.c | µ̃x.c′⟩
can reduce in two different ways, showing that the calculus is not confluent
(for good reasons!).
First-order logic
5.1 Definition
5.1.1 Signature. A signature Σ is a set of function symbols together with a
function a : Σ → N associating an arity to each symbol: f can be thought of as
a formal operation with a(f ) inputs. In particular, symbols of arity 0 are called
constants.
Terms are generated by the grammar
t ::= x | f(t1, . . . , tn)
where x is a variable and f is a function symbol of arity n.
In the following, we generally omit parentheses for constants, e.g. we write 0
instead of 0().
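In OCaml, such terms can be represented by the following type (a sketch; these constructors are the ones used by the unification code of section 5.4):

type term =
  | Var of string              (* a variable *)
  | App of string * term list  (* a function symbol applied to arguments *)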
5.1.5 Bound and free variables. In a formula of the form ∀x.A or ∃x.A, the
variable x is said to be bound in A. This means that the name of the variable x
does not really matter and we could have renamed it to some other variable
name, without changing the formula. We thus implicitly consider formulas up
to proper (or capture avoiding) renaming of variables (by “proper”, we mean
here that we should take care of not renaming a variable to some already bound
variable name). For instance, we consider that the two formulas
are the same (the second is obtained from the first by renaming x to z), but
they are different from the formula
∀x.∃x.x + x = x
where, given a term t, we write FV(t) for the set of all the variables occurring
in t. A formula A is closed when it has no free variable, i.e. FV(A) = ∅. We
sometimes write
A(x1 , . . . , xn )
for a formula A whose free variables are among x1 , . . . , xn . In this case, we write
A(t1 , . . . , tn ) instead of A[t1 /x1 , . . . , tn /xn ].
Given a formula A and a term t, we write A[t/x] for the formula A where all
the free occurrences of x have been substituted by t avoiding captures, i.e. we
suppose that all bound variables are different from the variables of t. For in-
stance, with A being
(∃y.x + x = y) ∨ (∃x.x = y)
we have that A[z + z/x] is
(∃y.(z + z) + (z + z) = y) ∨ (∃x.x = y)
but in order to compute A[y + y/y], we have to rename the bound variable y
(say, to z) and the result will be
(∃z.(y + y) + (y + y) = z) ∨ (∃x.x = y)
and not
(∃y.(y + y) + (y + y) = y) ∨ (∃x.x = y)
5.1.6 Natural deduction rules. The rules for first order logic in intuitionistic
natural deduction are the usual ones (see figure 2.1) together with the following
introduction and elimination rules for universal and existential quantification:

Γ ` ∀x.A
────────── (∀E)
Γ ` A[t/x]

Γ ` A
───────── (∀I)
Γ ` ∀x.A

Γ ` ∃x.A    Γ, A ` B
───────────────────── (∃E)
Γ ` B

Γ ` A[t/x]
─────────── (∃I)
Γ ` ∃x.A

with the side conditions that x does not occur free in Γ for (∀I), and that x
does not occur free in Γ or B for (∃E).
Remark 5.1.6.2. The side conditions avoid clearly problematic proofs such as
A(x) ` A(x)                    (ax)
A(x) ` ∀x.A(x)                 (∀I)
` A(x) ⇒ ∀x.A(x)               (⇒I)
` ∀x.(A(x) ⇒ ∀x.A(x))          (∀I)
` A(t) ⇒ ∀x.A(x)               (∀E)
which can be read as: if the formula A holds for some term t then it holds for
any term. The problematic rule is the (∀I ) just after the (ax) rule: x is not
fresh.
Properties of the calculus. We do not detail this here, but the usual properties
of natural deduction generalize to first order logic. In particular, the structural
rules (contraction, exchange, weakening) are admissible, see section 2.2.7. We
will also see in section 5.1.9 that cuts can be eliminated.
5.1.7 Classical first order logic. Following section 2.5, classical first order
logic, is the system obtained from the above one by adding one of the following
rules
──────────── (lem)
Γ ` ¬A ∨ A

Γ ` ¬¬A
───────── (¬¬E)
Γ ` A

Γ, ¬A ` A
────────── (raa)
Γ ` A
Example 5.1.7.1. Consider again the drinker formula of example 5.1.4.2:
A = ∃x.(D(x) ⇒ (∀y.D(y)))
It can be proved using classical logic.
Example 5.1.7.2. In classical logic, one can also prove the formula
¬(∀x.¬A(x)) ⇒ ∃x.A(x)
which states that if it is not the case that every element x does not satisfy A(x),
then we can actually produce an element which satisfies A(x). It can be proved
as follows, writing Γ for the context ¬∀x.¬A(x), ¬∃x.A(x): from the axiom
Γ, A(x) ` A(x), we derive Γ, A(x) ` ∃x.A(x) by (∃I); combined by (¬E) with the
axiom Γ, A(x) ` ¬∃x.A(x), this gives Γ, A(x) ` ⊥, hence Γ ` ¬A(x) by (¬I) and
Γ ` ∀x.¬A(x) by (∀I). Combined by (¬E) with the axiom Γ ` ¬∀x.¬A(x), this
gives Γ ` ⊥, hence ¬∀x.¬A(x) ` ¬¬∃x.A(x) by (¬I), then ¬∀x.¬A(x) ` ∃x.A(x)
by (¬¬E), and finally
` ¬(∀x.¬A(x)) ⇒ ∃x.A(x)
by (⇒I).
As in example 5.1.7.1, it is enough to show that it is not the case that there is
no x satisfying A.
Exercise 5.1.7.3. Another proof for the drinker formula of example 5.1.7.1 is the
following. We have two possibilities for the pub:
– either everybody drinks: in this case, we can take anybody as the universal
drinker,
– otherwise, there is someone who does not drink: we can take him as
universal drinker.
Formalize this reasoning in natural deduction.
Example 5.1.7.5. The formula of example 5.1.7.2 can be put into prenex form
as follows:
5.1.8 Sequent calculus rules. The rules for first-order quantifiers in classical
sequent calculus are
Γ, ∀x.A, A[t/x] ` ∆
──────────────────── (∀L)
Γ, ∀x.A ` ∆

Γ ` A, ∆
───────────── (∀R)
Γ ` ∀x.A, ∆

Γ, A ` ∆
───────────── (∃L)
Γ, ∃x.A ` ∆

Γ ` A[t/x], ∃x.A, ∆
──────────────────── (∃R)
Γ ` ∃x.A, ∆
with the side condition for (∀R ) and (∃L ) that x 6∈ FV(Γ)∪FV(∆). Intuitionistic
rules are obtained, as usual, by restricting to sequents with one formula on the
right:
Γ, ∀x.A, A[t/x] ` B
──────────────────── (∀L)
Γ, ∀x.A ` B

Γ ` A
────────── (∀R)
Γ ` ∀x.A

Γ, A ` B
──────────── (∃L)
Γ, ∃x.A ` B

Γ ` A[t/x]
─────────── (∃R)
Γ ` ∃x.A

with the expected side conditions for (∀R) and (∃L).
Remark 5.1.8.1. In the rules (∀R ) and (∃L ), we have been careful to keep a copy
of the hypothesis: with this formulation, contraction is admissible.
Example 5.1.8.2. The drinker formula from example 5.1.4.2 can be proved clas-
sically by
D(x), D(y) ` D(y), ∀y.D(y), ∃x.(D(x) ⇒ (∀y.D(y)))        (ax)
D(x) ` D(y), D(y) ⇒ (∀y.D(y)), ∃x.(D(x) ⇒ (∀y.D(y)))     (⇒R)
D(x) ` D(y), ∃x.(D(x) ⇒ (∀y.D(y)))                       (∃R)
D(x) ` ∀y.D(y), ∃x.(D(x) ⇒ (∀y.D(y)))                    (∀R)
` D(x) ⇒ (∀y.D(y)), ∃x.(D(x) ⇒ (∀y.D(y)))                (⇒R)
` ∃x.(D(x) ⇒ (∀y.D(y)))                                  (∃R)
5.1.9 Cut elimination. The properties and proof techniques developed in sec-
tion 2.3 extend to first order natural deduction, allowing to prove that it has
the cut elimination property:
Theorem 5.1.9.1. A sequent Γ ` A admits a proof if and only if it admits a cut
free proof.
In the cut elimination procedure, there are two new cases, which can be handled
as follows:
– a cut where (∀I), introducing ∀x.A(x) from a proof π of Γ ` A(x), is
immediately followed by (∀E) with a term t, reduces to the proof π[t/x]
of Γ ` A(t);
– a cut where (∃I), introducing ∃x.A(x) from a proof π of Γ ` A(t), is
immediately followed by (∃E) with a proof π′ of Γ, A(x) ` B, reduces to
the proof π′[t/x][π/A] of Γ ` B.
Above, π[t/x] denotes the proof π where all the free occurrences of the variable x
have been replaced by the term t (details left to the reader). As in the case of
propositional logic, it can be shown that a proof of a formula in an empty context
necessarily ends with an introduction rule (proposition 2.3.3.2) and thus deduce
(as in theorem 2.3.4.2):
Theorem 5.1.9.2 (Consistency). First order (intuitionistic or classical) natural
deduction is consistent: there is no proof of ` ⊥.
Another important consequence is that the logic has the existence property:
if we can prove that there exists a term satisfying some property, then we can
actually construct such a term:
Theorem 5.1.9.3 (Existence property). A formula of the form ∃x.A is provable
in intuitionistic first order natural deduction if and only if there exists a term t
such that A[t/x] is provable.
Proof. For the left-to-right implication, if we have a proof of ∃x.A then, by
theorem 5.1.9.1, we have a cut-free one which, by proposition 2.3.3.2, ends with
an introduction rule, i.e. is of the form
π
` A[t/x]
(∃I )
` ∃x.A
We therefore have a proof π of A[t/x] for some term t. The right-to-left impli-
cation is given by an application of the rule (∃I ).
In contrast, we do not expect this property to hold in classical logic. For in-
stance, consider the drinker formula of example 5.1.7.2. We can feel that the
proof we have given is not constructive: there is no way of determining who is
the drinker in general (i.e. without performing a reasoning specific to the bar in
which we currently are).
Note that in the premise of the (∃I) rule, we use the fact that > = >[x/x], i.e. we
use x as witness for the existence. A variation on the previous example is the
following proof, which expresses the fact that if a property A is satisfied for
every term x, then we can exhibit a term satisfying A. Again, we would expect
this not to be provable when models are allowed to be empty and, moreover,
it does not feel very constructive:
∀x.A ` ∀x.A            (ax)
∀x.A ` A               (∀E)
∀x.A ` ∃x.A            (∃I)
` (∀x.A) ⇒ ∃x.A        (⇒I)
Here also, in the premise of the (∃I ) rule, we use the fact that A = A[x/x],
i.e. we use x as witness for the existence. We will see in section 5.2.3 that this
is the reason why models are usually supposed to be non-empty, while there is
no good reason to exclude this particular case.
In order to fix that, we should keep track of the variables which are declared
in the context, which are sometimes called eigenvariables. This can be done by
adding a new context Ξ to our sequents, which is a list of first order variables
which are declared. We thus consider sequents of the form
Ξ|Γ`A
the vertical bar being there to mark the delimitation between the context of
eigenvariables and the traditional context. The rules for logical connectives
simply “propagate” the new context Ξ, and the rules for quantifiers become
Ξ | Γ ` ∀x.A
────────────── (∀E)
Ξ | Γ ` A[t/x]

Ξ, x | Γ ` A
────────────── (∀I)
Ξ | Γ ` ∀x.A

Ξ | Γ ` ∃x.A    Ξ, x | Γ, A ` B
──────────────────────────────── (∃E)
Ξ | Γ ` B

Ξ | Γ ` A[t/x]
──────────────── (∃I)
Ξ | Γ ` ∃x.A
where we suppose that the term t only uses variables declared in Ξ, i.e.
FV(t) ⊆ Ξ, and that the variable x does not already occur in Ξ.
Example 5.1.10.2. We cannot prove ∃x.> in this system. In particular, the proof
| ` >[x/x]        (>I)
| ` ∃x.>          (∃I)
is not valid because the side condition is not satisfied for the rule (∃I ).
Exercise 5.1.10.3. Show that the formula (∀x.⊥) ⇒ ⊥ is provable with tradi-
tional rules, but not with the rules presented in this section.
Expressions. We begin with the language for proofs introduced in chapter 4, the
simply typed λ-calculus. In this section, we call expressions its terms in order
not to confuse them with first order terms, and write e for an expression. The
syntax for expressions is thus
e, e′ ::= λx^A.e | e e′ | . . .
In order to account for first order logic, we extend them with constructions for
the quantifiers, typed as follows:

Γ ` e : A
────────────────── (∀I)
Γ ` λ∀x.e : ∀x.A

Γ ` e : ∀x.A
────────────────── (∀E)
Γ ` e t : A[t/x]

Γ ` e : ∃x.A    Γ, y : A ` e′ : B
────────────────────────────────── (∃E)
Γ ` unpair(e, xy ↦ e′) : B

Γ ` e : A[t/x]
────────────────── (∃I)
Γ ` ⟨t, e⟩ : ∃x.A
and can be read as follows:
– (∀I ): a proof of ∀x.A is a function which takes a term x as argument and
returns a proof of A,
– (∀E ): using a proof of ∀x.A consists in applying it to a term t,
– (∃I ): a proof of ∃x.A(x) is a pair consisting of a term t and a proof that
A(t) is satisfied,
– (∃E ): we can use a proof of ∃x.A by extracting its components.
Example 5.1.11.1. Consider again the derivation of example 5.1.6.1. It can be
decorated with terms as follows:
f : ∀x.¬A, e : ∃x.A ` e : ∃x.A                                        (ax)
f : ∀x.¬A, e : ∃x.A, a : A ` f : ∀x.¬A                                (ax)
f : ∀x.¬A, e : ∃x.A, a : A ` f x : ¬A                                 (∀E)
f : ∀x.¬A, e : ∃x.A, a : A ` a : A                                    (ax)
f : ∀x.¬A, e : ∃x.A, a : A ` f x a : ⊥                                (¬E)
f : ∀x.¬A, e : ∃x.A ` unpair(e, xa ↦ f x a) : ⊥                       (∃E)
f : ∀x.¬A ` λe^∃x.A.unpair(e, xa ↦ f x a) : ¬(∃x.A)                   (¬I)
` λf^∀x.¬A.λe^∃x.A.unpair(e, xa ↦ f x a) : (∀x.¬A) ⇒ ¬(∃x.A)          (⇒I)
Among the corresponding extensionality rules, we have the η-reduction for
universal quantification
λ∀x.e x −→η e
and, for existential quantification, the simplification of the derivation

π                                 (ax)
Γ ` e : ∃x.A    Γ, y : A ` y : A
───────────────────────────────── (∃E)
Γ ` unpair(e, xy ↦ y) : A
──────────────────────────────────── (∃I)
Γ ` ⟨x, unpair(e, xy ↦ y)⟩ : ∃x.A

into the proof π of Γ ` e : ∃x.A, i.e. the reduction
⟨x, unpair(e, xy ↦ y)⟩ −→η e
5.2 Theories
A first-order theory Θ, on a given signature and set of predicates, is a (possibly
infinite) set of closed formulas called axioms. A formula A is provable in a
theory Θ if there is a finite subset Γ ⊆ Θ such that Γ ` A is provable. Unless
otherwise specified, the ambient first order logic is usually taken to be classical
when considering first order theories.
5.2.1 Equality. We often consider theories with equality. This means that we
suppose that we have a predicate “=” of arity 2, together with axioms
∀x.x = x
∀x.∀y.x = y ⇒ y = x
∀x.∀y.∀z.x = y ⇒ y = z ⇒ x = z
and, for every function symbol f of arity n, we have an axiom
∀x1. . . . ∀xn.∀y1. . . . ∀yn.(x1 = y1 ⇒ · · · ⇒ xn = yn ⇒ f(x1, . . . , xn) = f(y1, . . . , yn))
and similarly for every predicate, so that equality is a congruence.
5.2.3 Models. Theories are thought of as denoting structures made of sets and
functions satisfying the axioms. For instance, the theory of groups of exam-
ple 5.2.1.1 can be seen as a syntax for groups in the traditional sense. These
structures are called models of the theory and we very briefly recall them here.
We do not even scratch the surface of model theory, and the reader interested
in knowing more is urged to read some standard textbook on the subject such
as [CK90].
Suppose given a structure M, i.e. a set together with an interpretation
Jf K : M^n → M of every function symbol f of arity n (and a relation inter-
preting every predicate). Every term t whose free variables are among
{x1, . . . , xk} then induces an interpretation
JtKk : M^k → M
defined by induction: Jxi Kk : M^k → M is the canonical i-th projection and, for
every function symbol f of arity n and (m1, . . . , mk) ∈ M^k,
Jf(t1, . . . , tn)Kk (m1, . . . , mk) = Jf K(Jt1 Kk (m1, . . . , mk), . . . , Jtn Kk (m1, . . . , mk))
where Jf K is given by the structure and the Jti Kk are computed inductively for
every index i. In other words, the interpretation of terms is the only extension
of the structure which is compatible with composition. Given k ∈ N and a
formula A whose free variables are among {x1, . . . , xk}, we define its interpre-
tation JAKk as the subset of M^k defined inductively as follows:
J⊥Kk = ∅                         J>Kk = M^k
JA ∧ BKk = JAKk ∩ JBKk           JA ∨ BKk = JAKk ∪ JBKk
J¬AKk = M^k \ JAKk               JA ⇒ BKk = J¬A ∨ BKk
together with
J∀xk+1.AKk = ⋂_{m∈M} { (m1, . . . , mk) ∈ M^k | (m1, . . . , mk, m) ∈ JAKk+1 }
and
J∃xk+1.AKk = ⋃_{m∈M} { (m1, . . . , mk) ∈ M^k | (m1, . . . , mk, m) ∈ JAKk+1 }
The interpretation of A is thus intuitively the set of values in M for its free
variables making it true.
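For a finite structure, this interpretation can be turned into an algorithm deciding satisfaction; here is a small OCaml sketch (the names are illustrative, and variables are handled by de Bruijn indices rather than the named variables used above):

(* Formulas with variables given by de Bruijn indices. *)
type fla =
  | Pred of string * int list          (* predicate applied to variables *)
  | Top | Bot
  | And of fla * fla | Or of fla * fla | Imp of fla * fla
  | All of fla | Ex of fla             (* quantification over variable 0 *)

(* Satisfaction in a structure with finite domain m, where pred interprets
   predicates and env assigns elements to the free variables. *)
let rec sat m pred env = function
  | Pred (p, xs) -> pred p (List.map (List.nth env) xs)
  | Top -> true
  | Bot -> false
  | And (a, b) -> sat m pred env a && sat m pred env b
  | Or (a, b) -> sat m pred env a || sat m pred env b
  | Imp (a, b) -> not (sat m pred env a) || sat m pred env b
  | All a -> List.for_all (fun x -> sat m pred (x :: env) a) m
  | Ex a -> List.exists (fun x -> sat m pred (x :: env) a) m

The two quantifier cases mirror the intersection and union over m ∈ M above.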
Satisfaction for closed formulas. Given a closed formula A, its interpretation JAK0
is a subset of M 0 = {()}, which is a set with one element, conventionally de-
noted (). There are therefore two possible values for JAK0 : ∅ and {()}. In the
second case, we say that the formula A is satisfied in the structure.
For instance, a structure on the signature of the theory of groups consists of a
set M together with
– a binary operation J×K : M × M → M,
– a constant J1K : M^0 → M,
– a relation J=K ⊆ M × M.
We say that such a structure has strict equality when the interpretation of the
equality is the diagonal relation
J=K = {(m, m) | m ∈ M }
Such a structure M is a model of the theory of groups, i.e. is a model for all
its axioms, precisely if (M, J×K, J1K) is a group in the traditional sense, and
conversely every group gives rise to a model where equality is interpreted in
such a way: the models with strict equality of the theory of groups are precisely
groups.
Remark 5.2.3.2. As can be seen in the previous example, it is often useful to
restrict to models with strict equality. Since equality is always a congruence
(because of the axioms imposed in section 5.2.1), from any model we can con-
struct a model with strict equality by quotienting the model under the relation
interpreting equality, so that this assumption is not very restrictive.
Validity. A sequent
y1 : A1, . . . , yn : An ` A
is satisfied in M if, for every k ∈ N such that the free variables of the sequent
are in {x1, . . . , xk}, we have
JA1 Kk ∩ · · · ∩ JAn Kk ⊆ JAKk
and it is valid when it is satisfied in every structure.
Correctness. We can now formally state the fact that our notion of semantics is
compatible with our logical system.
Theorem 5.2.3.3 (Correctness). Every derivable sequent is valid.
Corollary 5.2.3.4. For every theory Θ and closed formula A such that Θ ` A is
derivable, every model of Θ is also a model of A.
Example 5.2.3.5. In the theory of groups, one can show
∀x.∀y.∀y′.(x × y = 1 ⇒ y′ × x = 1 ⇒ y = y′)
i.e. that left and right inverses coincide: supposing x × y = 1 and y′ × x = 1,
we have
y′ × (x × y) = y′ × 1
y′ × (x × y) = y′
(y′ × x) × y = y′
1 × y = y′
y = y′
On the other hand, consider the formula
∀x.∀y.x × y = y × x
We know that there exist non-abelian groups, for instance the symmetric group
on 3 elements S3 . Such a non-abelian group being a model for the theory of
groups but not for the above formula, we can conclude that this formula cannot
be deduced in the theory of groups.
Finally, a major consequence of the theorem is the following. A theory is satis-
fiable when it admits a model.
Proposition 5.2.3.8. A satisfiable theory is consistent.
Proof. If a theory Θ could derive ⊥ then, by correctness, ⊥ would be satisfied
in any model of Θ, which is impossible since J⊥K0 = ∅.
A formula of the form
∀x.∃y.A(x, y)
states that for every element x there exists a y such that A(x, y) is satisfied.
When this formula admits a model, we can construct a function f which, to
every x, associates one of the associated y. Thus it implies that the formula
∀x.A(x, f(x))
is satisfiable, for a suitable interpretation of the new function symbol f.
Presburger arithmetic is the theory with equality, on the signature
Σ = {0 : 0, S : 1, + : 2}, whose axioms are those of equality together with
∀x.0 = S(x) ⇒ ⊥
∀x.∀y.S(x) = S(y) ⇒ x = y
∀x.0 + x = x
∀x.∀y.S(x) + y = S(x + y)
together with, for every formula A(x) with one free variable x, an induction
axiom
(A(0) ∧ ∀x.(A(x) ⇒ A(S(x)))) ⇒ ∀x.A(x)
For instance, the formula ∀x.x + 0 = x, with A(x) being x + 0 = x, can be
proved by induction:
– A(0): 0 + 0 = 0, by the third axiom.
– Supposing A(x), we have A(S(x)): namely, S(x) + 0 = S(x + 0) = S(x).
This theory was shown by Presburger to be consistent, complete and decid-
able [Pre29]. In the worst case, any decision algorithm has a complexity
O(2^(2^(cn))), for some constant c, with respect to the size n of the formula
to decide [FR98], although it is useful in practice (it is for example implemented
in the tactic omega of Coq). It is also very weak: for instance, one cannot define
the multiplication function in it (if we could, the theory would not be decidable,
see the next section).
5.2.5 Peano and Heyting arithmetic. The Peano arithmetic, often written
PA, extends Presburger arithmetic by also axiomatizing multiplication. It is
the theory with equality on the signature Σ = {0 : 0, S : 1, + : 2, × : 2} whose
axioms are those of equality, those of Presburger arithmetic, and
∀x.0 × x = 0
∀x.∀y.S(x) × y = y + (x × y)
This theory is implicitly understood with an ambient classical first order logic.
When the logic is intuitionistic, the theory is called Heyting arithmetic (or HA).
Exercise 5.2.5.1. In HA, prove ∀x.x + 0 = x.
[Figure: two trees illustrating one round of the Hydra game, with a leaf x, its
parent y and grandparent z marked on the left tree.]
Consider the following Hydra game on trees [KP82]. This game with two players starts
with a tree as above and at each turn
– the first player removes a leaf x (a node without sons) of the tree,
– the second player chooses a number n, looks for the parent y of x and the
parent z of y (it does nothing if no such parents exist), and adds n copies
of the tree with y as root as new children of z.
The game stops when the tree is reduced to its root. We now see where the
game draws its name from: the first player cuts the head of the Hydra, but in
response the Hydra grows many new heads! For instance, in the figure above,
the tree on the right is obtained from the one on the left after one round. Given
trees α and β, it can be shown that α > β if and only if β can be obtained after
some finite number of rounds of the game starting from α. Believing that ε0 is
well-founded is thus equivalent to believing that every such game will necessarily
end (try it, to convince yourself that it always does!).
5.3.1 Naive set theory. The naive set theory is the theory with a binary pred-
icate “∈” and the following axiom scheme, called unrestricted comprehension
∃y.∀x.x ∈ y ⇔ A
for every formula A with x as only free variable. Informally, this states, for
every property A(x), the existence of the set
y = {x | A(x)}
Russell’s paradox. There is only a “slight” problem with this theory: it is in-
consistent, meaning that we can in fact prove any formula, which explains why
everything was so simple. This was first formalized by Russell in 1901, using
what is known nowadays as the Russell paradox, which goes as follows. Consider
the property
A = ¬(x ∈ x)
The unrestricted comprehension scheme ensures the existence of a set y such
that
∀x.x ∈ y ⇔ ¬(x ∈ x)
In particular, for x being y, we have
y ∈ y ⇔ ¬(y ∈ y)
which is contradictory: the formula
(A ⇒ ¬A) ⇒ (¬A ⇒ A) ⇒ ⊥
can be proved by
the following derivation, writing Γ for the context A ⇒ ¬A, ¬A ⇒ A. On the
one hand, from the axioms Γ, ¬A ` ¬A ⇒ A and Γ, ¬A ` ¬A, we get Γ, ¬A ` A
by (⇒E), hence Γ, ¬A ` ⊥ by (¬E) with the axiom Γ, ¬A ` ¬A, and Γ ` ¬¬A
by (¬I). On the other hand, from the axioms Γ, A ` A ⇒ ¬A and Γ, A ` A, we
get Γ, A ` ¬A by (⇒E), hence Γ, A ` ⊥ by (¬E) with the axiom Γ, A ` A, and
Γ ` ¬A by (¬I). Combining both by (¬E) gives Γ ` ⊥ and, by (⇒I) twice,
` (A ⇒ ¬A) ⇒ (¬A ⇒ A) ⇒ ⊥
Size issues. The problem with naive set theory is due to size: the collection of
all sets is “too big” to actually form a set. Once this issue was identified, sub-
sequent attempts at formalizing set theory strived to take it into account.
We should not be able to consider this collection as a set, and therefore we
cannot form the set of all sets which satisfy a property, such as not belonging
to itself...
Axiom of extensionality. This axiom states that two sets with the same elements
are equal:
∀x.∀y.((∀z.z ∈ x ⇔ z ∈ y) ⇒ x = y)
If we introduce the notation x ⊆ y for the formula ∀z.(z ∈ x ⇒ z ∈ y), which
expresses that the set x is included in the set y, the axiom of extensionality can
be rephrased as
∀x.∀y.(x ⊆ y ∧ y ⊆ x) ⇒ x = y
i.e. two sets are equal precisely when they have the same elements.
Axiom of union. This axiom states that the union of the elements of a set is
still a set:
∀x.∃y.∀i.(i ∈ y ⇔ ∃z.(i ∈ z ∧ z ∈ x))
In more usual notation, this states the existence, for every set x, of the set
y = ⋃ x = ⋃_{z∈x} z
where the set {x, y} is constructed using the axiom schema of replacement, see
below.
Axiom of powerset. This axiom states that given a set x, there is a set whose
elements are precisely the subsets of x, usually called the powerset of x and
denoted P(x):
∀x.∃y.∀z.(z ∈ y ⇔ (∀i.i ∈ z ⇒ i ∈ x))
In more usual notation,
∀x.∃y.∀z.(z ∈ y ⇔ z ⊆ x)
i.e. this states the existence of the set
y = P(x) = {z | z ⊆ x}
Axiom of infinity. The axiom of infinity states the existence of an infinite set:
∃x.(∅ ∈ x ∧ ∀y.(y ∈ x ⇒ S(y) ∈ x))
where the empty set ∅ is defined using the axiom schema of replacement below
and S(y) = y ∪ {y} is the successor of a set. A set is called inductive when
it contains the empty set and is closed under successor: the axiom states the
existence of an inductive set. In particular, the set N of natural numbers can be
constructed as the intersection of all inductive sets. Here, the natural numbers
are encoded following the von Neumann convention:
0 = ∅        1 = {0}        2 = {0, 1}        . . .        S(n) = {0, 1, . . . , n}
Axiom schema of replacement. This axiom states that the image of a set under
a partial function is a set. For simplicity, we consider the case where the formula
contains only i and j as free variables, and is thus denoted A(i, j). In this case
the axiom reads as
∀x.((∀i.∀j.∀j′.A(i, j) ⇒ A(i, j′) ⇒ j = j′) ⇒ ∃y.∀j.(j ∈ y ⇔ ∃i.(i ∈ x ∧ A(i, j))))
Axiom of foundation. The axiom of foundation states that every non-empty set
contains a member which is disjoint from the whole set:
∀x.x ≠ ∅ ⇒ ∃y ∈ x.y ∩ x = ∅
The axiom of choice. The axiom of choice states that given a collection x of
non-empty sets, we can pick an element in each of the sets:
∀x.(∅ ∉ x ⇒ ∃(f : x → ⋃x).∀y ∈ x.f(y) ∈ y)
The function f chooses, in every set y of the collection x, an
element of y (i.e. f(y) ∈ y): this is called a choice function for x. The careful
reader will notice that the existence of a function is not a formal statement of
our language but it can be encoded: the formula ∃(f : x → y).A asserting the
existence of a function f from x to y such that A, is a notation for a formula of
the form
∃f.f ⊆ x × y ∧ . . .
which would state (details left to the reader) the existence of a subset f of x × y
which, as a relation, encodes a total function such that A is satisfied.
The axiom of choice has a number of classically equivalent formulations
among which
– every relation defined everywhere contains a function,
– every surjective function admits a section,
– the product of a family of non-empty sets is non-empty,
– every set can be well-ordered,
and so on.
5.3.3 Intuitionistic set theory. Set theory, like any other theory, can also
be considered within intuitionistic first order logic, in which case it is called
IZF. The reason for doing so is the usual one: we want to be able to exhibit explicit
witnesses when constructing elements of sets. We will however see that there is
a price to pay for this, which is that things behave much differently than usual:
intuitionism is not necessarily intuitive, see [Bau17] for a very good general
introduction to the subject.
For instance, deciding membership implies the excluded middle: given a for-
mula A, deciding whether 0 belongs to the set {x ∈ N | A} amounts to deciding
(0 ∈ {x ∈ N | A}) ∨ (0 ∉ {x ∈ N | A})
which is equivalent to
A ∨ ¬A
and we conclude.
The intuition behind this result is the following one. In a constructive world,
an element of x = {y ∈ N | A} consists of as an element of N together with a
proof that A holds. Therefore, in order to decide whether 0 belongs to x or not,
we have to decide whether A holds or not.
Considering the variant of the excluded middle recalled in lemma 2.3.5.3,
similarly, we cannot test a set for emptiness either:
Lemma 5.3.3.2. In IZF, the formula
∀x.(x = ∅) ∨ (x ≠ ∅)
implies, for every formula A,
¬A ∨ ¬¬A
More generally, deciding the equality of sets, i.e. the formula
∀x.∀y.(x = y) ∨ (x ≠ y)
would imply that we can test for emptiness as a particular case. Of course, this
does not imply that we cannot decide the equality of some particular sets. For
instance, one can show that 0 = ∅ ≠ {∅} = 1 (because ∅ belongs to 1 but not
to 0) and therefore, writing
B = {0, 1} = {x ∈ N | x = 0 ∨ x = 1}
for the set of booleans, we can decide the equality of booleans. By a similar
reasoning, we can decide the equality of natural numbers.
Many other “unexpected” properties of IZF (compared to the classical case)
can be proved along similar lines. For instance, the fact that every subset of
a finite set is finite is equivalent to the logic being classical. By a finite set, we
mean here a set x for which there is a natural number n and a bijection
f : {0, . . . , n − 1} → x.
Lemma 5.3.3.3. In IZF, every subset of a finite set is finite if and only if the law
of excluded middle is satisfied.
Proof. Suppose that every subset of a finite set is finite. Given a prop-
erty A, consider the set x = {y ∈ B | A}, which is a subset of the finite set B of
booleans. By hypothesis, this set is finite and therefore there exists a natural
number n and a function f as above. Since we can decide equality for natural
numbers as argued above, we have either n = 0 or n 6= 0: in the first case x = ∅
and thus ¬A holds, in the second case, f (0) ∈ x and thus A holds. We therefore
have A ∨ ¬A. Conversely, in classical logic, every subset of a finite set is finite,
as everybody knows.
The axiom of choice. Seen from a constructive perspective the axiom of choice
is quite dubious: it allows the construction of an element in each set of a family
of non-empty sets, without having to provide any hint at how such an element
could be constructed. In particular, given a non-empty set x, the axiom of
choice provides a function f : {x} → x, i.e. an element of x (the image of x
under f ), and allows proving
x 6= ∅ ⇒ ∃y.y ∈ x
i.e. we can construct an element in x by only knowing that there exists one.
This is precisely the kind of behavior we invoked in section 2.5.2 in order to
motivate the fact that double negation elimination was not constructive. In
fact, we will see below that having the axiom of choice implies that the ambient
logic is classical.
Another reason why the axiom of choice can be questioned is that it al-
lows proving quite counter-intuitive results, the most famous perhaps being the
Banach-Tarski theorem recalled below. Two sets A and B of points in R3 are
congruent if one can be obtained from the other by an isometry, i.e. by using
translations, rotations and reflections.
Theorem 5.3.3.4 (Banach-Tarski). Given two bounded subsets of R3 of non-
empty interior, there are partitions
A = A1 ] . . . ] An B = B1 ] . . . ] Bn
such that Ai is congruent to Bi for 1 6 i 6 n.
Proof. Using the axiom of choice and other ingredients...
In particular, consider the case where A is a ball in R3 and B is two copies of
the ball A. The theorem states that there is a way to partition the ball A and
move the subsets of the partition using isometries only, in order to make two
balls. If you try at home, you should convince yourself that there is no easy
way to do so.
For such reasons, people started to investigate the status of the axiom of
choice with respect to ZF. In 1938, Gödel constructed a model of ZFC (i.e. a
model of ZF satisfying the axiom of choice) inside an arbitrary model of ZF,
thus showing that ZFC is consistent if ZF is [Göd38]. In 1963, Cohen showed
that the situation is similar with the negation of the axiom of choice. The
axiom of choice is thus independent of ZF: neither this axiom nor its negation
is a consequence of the axioms of ZF and one can add it or its negation without
affecting consistency.
Constructivists however will reject the axiom of choice, because it implies
the excluded middle:
Theorem 5.3.3.5. In IZF with the axiom of choice, the law of elimination of
double negation holds.
Proof. Fix a formula A and suppose that ¬¬A holds; it suffices to establish
¬A ∨ A. Writing B for the set of booleans, consider the two sets
x = {z ∈ B | (z = 0) ∨ A}        and        y = {z ∈ B | (z = 1) ∨ A}
Those sets are not empty since 0 ∈ x and 1 ∈ y. By the axiom of choice, there
is therefore a function f : {x, y} → B such that f (x) ∈ x and f (y) ∈ y. Now,
f (x) and f (y) are booleans, where equality is decidable, so that we can reason
by case analysis on those.
– If f (x) = f (y) = 0 then 0 ∈ y thus (0 = 1) ∨ A holds, thus A holds.
– If f (x) = f (y) = 1 then 1 ∈ x thus (1 = 0) ∨ A holds, thus A holds.
– If f(x) = 0 ≠ 1 = f(y) then x ≠ y (otherwise, f(x) = f(y) would hold),
and we have ¬A: namely, supposing A, we would have x = B = y, and thus
⊥ since x ≠ y.
– If f(x) = 1 ≠ 0 = f(y) then we can show both A and ¬A as above (so
that this case cannot happen).
Therefore, we have ¬A ∨ A.
This motivates, for the reader convinced of the interest of intuitionistic logic,
which we hope you are by now, the exploration of set theory without choice;
the reader should however be warned that this theory behaves much differently
than usual. For instance, Blass has shown the following result [Bla84]:
Theorem 5.3.3.7. In ZF, the axiom of choice is equivalent to the fact that every
vector space has a basis.
In fact, we know models of ZF in which some vector space admits no basis,
and models in which some vector space admits two bases of different cardinal-
ities.
f′(x) = (f(x + ε) − f(x))/ε
for any non-zero infinitesimal ε. Namely, f′(x) should be the slope of the line
tangent to the graph of f at x, i.e.
f(x + ε) = f(x) + f′(x)ε
More precisely, by “almost 0”, we mean here that it should capture first-order
variations, i.e. it should be so small that ε2 = 0. If we are ready to accept
the existence of such entities, we find out that computations which traditionally
involve subtle concepts such as limits, become simple algebraic manipulations.
For instance, consider the function f(x) = x². We have
f(x + ε) = (x + ε)² = x² + 2xε + ε² = x² + 2xε
so that f′(x) = 2x. Writing
D = {ε ∈ R | ε² = 0}
for the set of infinitesimals, the principle at work is that every function
f : D → R should be of the form
f(ε) = a + bε
for a unique choice of a and b (this is the Kock–Lawvere axiom of synthetic
differential geometry).
Once this axiom is postulated, we necessarily have a = f(0) and we can define
f′(x) to be the coefficient b. We have already given an example of such a
computation above. We can similarly compute the derivative of a product of
two functions by
(f × g)(x + ε)
= f(x + ε) × g(x + ε)
= (f(x) + f′(x)ε) × (g(x) + g′(x)ε)
= f(x)g(x) + (f′(x)g(x) + f(x)g′(x))ε + f′(x)g′(x)ε²
= f(x)g(x) + (f′(x)g(x) + f(x)g′(x))ε
and therefore (f × g)′(x) = f′(x)g(x) + f(x)g′(x) as expected. Similarly, the
derivative of the composite of two functions can be computed algebraically.
Note that a non-zero infinitesimal ε cannot be invertible: otherwise we would
have
ε = ε²/ε = 0/ε = 0
a contradiction. In fact, for every ε in D, one can show
¬¬(ε = 0)
even though ε = 0 itself cannot be proved: such reasoning only makes sense in
intuitionistic logic.
5.4 Unification
Suppose fixed a signature. Given two terms t and u, a very natural question
is: is there a way to substitute their variables in order to make them equal? In
other words, we are trying to solve the equation
t=u
One quickly finds out that there is quite often an infinite number of solutions,
and we refine the question to: is there a “smallest” way of substituting the
variables? An equation is a pair of terms, written
t =? u
where t and u are respectively called the left and right member of the equation.
A substitution σ, see section 5.1.3, is a solution of the equation when
t[σ] = u[σ]
– f(x, b()) =? f(a(), y) has one unifier: [a()/x, b()/y],
– x =? f(y, z) has many unifiers: [f(y, z)/x], [f(a(), z)/x, a()/y], etc.
– f(x, y) =? g(x, y) has no unifier,
– x =? f(x, y) has no unifier.
Since the solution to an equation system is not unique in general, we can wonder
whether there is a best one in some sense when there is one. We will see that it
is indeed the case.
Suppose fixed an equation system E. It easy to see that its set of solutions
is upward closed:
Lemma 5.4.2.3. Given substitutions σ and τ such that σ ⩽ τ, if σ is a solution
of E then τ is also a solution of E.
A solution σ of E is a most general unifier when it generates all the solutions
by upward closure, i.e. when τ is a solution of E if and only if σ ⩽ τ. We will
see in the next section that when an equation system admits a unifier, it always
admits a most general one, and we have an algorithm to efficiently compute it.
We will thus prove, in a constructive way, the following:
Theorem 5.4.2.4. An equation system E has a solution if and only if it has a
most general unifier.
An equation system E is in solved form when it is of the form
E = {x1 =? t1, . . . , xn =? tn}
where the variables xi are distinct and do not occur in the terms tj. To every
equation system in solved form E as above, one can canonically associate the
substitution
σE = [t1/x1, . . . , tn/xn]
Lemma 5.4.3.1. Given an equation system in solved form E, the substitution σE
is a most general unifier of E.
Given an equation system E, the unification algorithm, due to Herbrand [Her30]
and Robinson [Rob65], applies the transformations of figure 5.2, in an arbitrary
order, and terminates when no transformation applies. We write E ⤳ E′ when
the system E is transformed into E′ by a rule, and E ⤳ ⊥ when the rule makes
the unification fail. The transformations of figure 5.2 are:

{f(t1, . . . , tn) =? f(u1, . . . , un)} ∪ E  ⤳  {t1 =? u1, . . . , tn =? un} ∪ E   (Decompose)
{f(t1, . . . , tn) =? g(u1, . . . , um)} ∪ E  ⤳  ⊥   when f ≠ g                    (Clash)
{f(t1, . . . , tn) =? f(t1, . . . , tn)} ∪ E  ⤳  E                                 (Delete)
{f(t1, . . . , tn) =? x} ∪ E  ⤳  {x =? f(t1, . . . , tn)} ∪ E                      (Orient)
{x =? f(t1, . . . , tn)} ∪ E  ⤳  ⊥   when x ∈ FV(f(t1, . . . , tn))                (Occurs-check)
{x =? t} ⊎ E′  ⤳  {x =? t} ∪ E′[t/x]   when x ∈ FV(E′) and x ∉ FV(t)               (Propagate)
Example 5.4.3.2. Consider the system E reduced to the single equation
f(a(), g(x), g(x)) =? f(a(), y, g(h(z))). We have
E ⤳ {a() =? a(), g(x) =? y, g(x) =? g(h(z))}    by Decompose,
  ⤳ {g(x) =? y, g(x) =? g(h(z))}                by Delete,
  ⤳ {y =? g(x), g(x) =? g(h(z))}                by Orient,
  ⤳ {y =? g(x), x =? h(z)}                      by Decompose,
  ⤳ {y =? g(h(z)), x =? h(z)}                   by Propagate.
The size |t| of a term is the number of function symbols occurring in it:

|x| = 0        |f(t1, . . . , tn)| = 1 + |t1| + · · · + |tn|
Theorem 5.4.3.3. Given any equation system E as input the unification algo-
rithm always terminates. It fails if and only if E has no solution, otherwise the
equation system E 0 at the end of the execution is in solved form and σE 0 is a
most general unifier of E.
Proof. This is detailed in [BN99, section 4.6]. Termination can be shown by
observing that the rules make the size of the equation system E decrease: here,
the size is the triple (n1 , n2 , n3 ) of natural numbers, ordered lexicographically,
where n1 is the number of unsolved variables (a variable is solved when it occurs exactly once in E, as the left member of an equation), n2 = Σ_{(t =? u) ∈ E} (|t| + |u|) is the size of the equation system, and n3 is the number of equations of the form t =? x in E. The other properties result from the fact that the transformations preserve the set of unifiers (⊥ has no unifier by convention) and that the resulting equation system is in solved form.
Example 5.4.3.4. The most general unifier of example 5.4.3.2 is
[g(h(z))/y, h(z)/x]
Remark 5.4.3.5. The side conditions of Propagate are quite important (and often forgotten by students when first implementing unification). Without those, unification problems such as {x =? f(x)} would lead to an infinite number of applications of the rules Propagate and Decompose, and thus fail to terminate:

{x =? f(x)} ⇝ {f(x) =? f(f(x))} ⇝ {x =? f(x)} ⇝ . . .

The side condition avoids this, and the rule Occurs-check makes the unification fail instead: the solution would intuitively be the “infinite term”

f(f(f(. . .)))
Note also that most general unifiers can be large: in the equation system

{x1 =? f(x0, x0), x2 =? f(x1, x1), . . . , xn =? f(xn−1, xn−1)}

the most general unifier substitutes for xn a term whose size is exponential in n.
This function raises the exception Not_unifiable when the system has no solution. The unifier of example 5.4.3.2 can then be computed with

let s =
  let t = App ("f", [App ("a", []); App ("g", [Var "x"]); App ("g", [Var "x"])]) in
  let u = App ("f", [App ("a", []); Var "y"; App ("g", [App ("h", [Var "z"])])]) in
  unify t u
A variable will initially be Var r with the reference r containing None (we do not use a string to indicate the name of the variable since the name is not relevant: only the position in memory where the None is stored matters), and substituting the variable by a term t will amount to replacing this value by Some t. We can thus generate a
fresh variable with
let var () = Var (ref None)
and the right notion of equality between terms is given by the following function
let rec eq t u =
match t, u with
| Var {contents = Some t}, u -> eq t u
| t, Var {contents = Some u} -> eq t u
| Var x, Var y -> x == y
| App (f, tt), App (g, uu) ->
f = g && List.for_all2 eq tt uu
| _ -> false
where we use the fact that a reference is implemented in OCaml as a record
with contents as its only field, which is mutable. We can check whether a variable occurs in a term with a function along the following lines:
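let rec occurs x t =
  match t with
  | Var {contents = Some t} -> occurs x t  (* follow instantiated variables *)
  | Var y -> x == y                        (* compare positions in memory *)
  | App (_, tt) -> List.exists (occurs x) tt

Unification itself then amounts to recursively traversing both terms and instantiating variables as we go. The following is a minimal sketch of such a function, assuming the representation of terms above:

exception Not_unifiable

let rec unify t u =
  match t, u with
  | Var {contents = Some t}, u -> unify t u
  | t, Var {contents = Some u} -> unify t u
  | Var x, Var y when x == y -> ()
  | Var x, u ->
    (* occurs check before instantiating the variable *)
    if occurs x u then raise Not_unifiable else x := Some u
  | t, Var y ->
    if occurs y t then raise Not_unifiable else y := Some t
  | App (f, tt), App (g, uu) ->
    if f <> g || List.length tt <> List.length uu then raise Not_unifiable;
    List.iter2 unify tt uu

For instance, the unification problem of example 5.4.3.2 can be solved with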
let () =
let x = var () in
let y = var () in
let z = var () in
let t =
App ("f", [
App ("a", []);
App ("g", [x]);
App ("g", [x])
]) in
let u =
App ("f", [
App ("a", []);
y;
App ("g", [App ("h", [z])])
]) in
unify t u
Clausal form. In order to do so, we must first generalize the notion of clausal form:

– a literal L is an atomic formula (a predicate applied to terms) or the negation of one,
– a clause C is a disjunction of literals

C ::= L1 ∨ L2 ∨ . . . ∨ Lk
The resolution rule. We can assimilate a theory Γ in clausal form with a first
order context. The resolution rule of section 2.5.8 can then be modified as
follows in order to account for first-order:
Γ ⊢ C ∨ P(t1, . . . , tn)    Γ ⊢ ¬P(u1, . . . , un) ∨ D
(res)
Γ ⊢ (C ∨ D)[σ]

where σ is the most general unifier of the system {t1 =? u1, . . . , tn =? un}.
The factoring rule. As is, the resolution rule is not complete (see example 5.4.6.6
below). We can however make the system complete by adding the following
factoring rule
Γ ⊢ C ∨ P(t1, . . . , tn) ∨ P(u1, . . . , un)
(fac)
Γ ⊢ (C ∨ P(t1, . . . , tn))[σ]

where σ is the most general unifier of {t1 =? u1, . . . , tn =? un}. With this rule,
the completeness theorem 2.5.8.7 generalizes as follows:
Theorem 5.4.6.5 (Refutation completeness). A set Γ of clauses is not satisfiable
if and only if we can show Γ ⊢ ⊥ using the axiom, resolution and factoring rules only.
Example 5.4.6.6. Given a unary predicate P , consider the theory
Γ = {P (x) ∨ P (y), ¬P (x) ∨ ¬P (y)}
which is not satisfiable. The resolution rule only allows us to deduce the clauses
P (x) ∨ ¬P (x) and P (y) ∨ ¬P (y), from which we cannot deduce any other clause:
without factoring, the resolution rule is not complete. With factoring, we can show that Γ is inconsistent as follows: starting from the axiom Γ ⊢ P(x) ∨ P(y), factoring (unifying x with y) yields Γ ⊢ P(x); resolving it against the axiom Γ ⊢ ¬P(x) ∨ ¬P(y) yields Γ ⊢ ¬P(y); finally, resolving Γ ⊢ P(x) against Γ ⊢ ¬P(y) yields Γ ⊢ ⊥.
Agda
6.1.1 Features of proof assistants. We shall first present some of the general features that Agda has, or has not. There is no room here for a detailed comparison with other provers; the interested reader can find details in [Wie06] for instance. Let us simply mention some differences with the main competitor, which is currently Coq. Other well-known proof assistants include ACL2, HOL Light, Isabelle, Lean, Mizar, PVS, etc.
the first one, and it will be more involved to define a function of the second than
of first type (although not considerably so).
Programs vs tactics. The Agda code looks pretty much like a program in a functional programming language. For instance, the proof of A × B → A is, as expected, a program which takes a pair (a, b) and returns a:
postulate A B : Set
proj : A × B → A
proj (a , b) = a
which is easily compared with the corresponding definition in the OCaml toplevel
# let proj (a , b) = a;;
val proj : 'a * 'b -> 'a = <fun>
On the contrary, Coq uses tactics, which describe how to progress in the proof. The same proof in Coq would look something like this:

Variables A B : Prop.
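Theorem proj : A /\ B -> A.
Proof.
  intros [a b].
  exact a.
Qed.

This is one possible tactic script: intros introduces the hypothesis (destructing the pair into a and b) and exact concludes with the proof term a.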
The difference between the two is mostly a matter of taste: both are quite convenient to use and have the same expressive power. The reason we chose to use Agda in this course is that it makes the Curry-Howard correspondence, one of the main objects of this course, more apparent.
Automation. There is however one main advantage of using tactics over programs: they allow more easily for automation, i.e. Coq can automatically build parts of the proofs for us. For instance, the previous example can be proved in essentially one line, which will automatically generate all the above steps:

Variables A B : Prop.
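Theorem proj : A /\ B -> A.
Proof. tauto. Qed.

Here, one plausible one-liner is the tauto tactic, a decision procedure which automatically proves tautologies of intuitionistic propositional logic.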
Tactics can also automate entire classes of reasoning: for instance, arithmetic decision procedures can automatically prove formulas such as

∀m ∈ Z.∀n ∈ Z.(1 + 2 × m) ≠ (n + n)

stating that an odd number is never even.
Program extraction. A major feature of Coq is that the typing system allows performing what is called program extraction: once a program is proved correct, one can extract the program (in OCaml) and forget about the parts which are present only to prove its correctness. In contrast, the support for program extraction in Agda is less efficient and more experimental.
Correctness. It might seem obvious, but let us state this anyway: a proof assis-
tant should be correct, in the sense that when it accepts a proof then the proof
should actually be correct. Otherwise, it would be very easy to write a proof
assistant:
let () =
while true do
let _ = read_line () in
print_endline "Your proof is correct."
done
We will see that sometimes the logic implemented in proof assistants is not
consistent for very subtle reasons (for instance, in section 8.2.2): in this case, the
program allows proving ⊥ and thus any formula, and thus essentially amounts to the above, although this is not obvious at all. For modern and well-developed
proof assistants, we however have good reasons to trust that this is not the case,
see below.
Small kernel. An important design point for a theorem prover is that it should
have a small kernel, whose correctness ensures the correctness of the whole
program: this is called the de Bruijn criterion. A proof assistant is made of a
large number of lines of code (roughly 100 000 lines of Haskell for Agda and
225 000 lines of OCaml for Coq), those lines are written by humans and there is
always the possibility that there is a bug in the proof assistant. For this reason,
it is desirable that the part of the software that we really have to trust, its
“kernel”, which mainly consists of the typechecker, is as small as possible and
isolated from the rest of the software, so that all the efforts for correctness can
be focused on this part. For instance, in Coq, a tactic can produce roughly any
proof in order to automate part of the reasoning: this is not really a problem
because, in the end, the typechecker will ensure that the proof is correct, so
that we do not have to trust the tactic. In Coq, the kernel is roughly 10% of the
software; in Agda, the kernel is a bit larger, because it contains more features
(dependent pattern matching in particular), which means that programming is
easier on some aspects, but the trust that we have in the proof checker is a bit
smaller.
In order to have a small kernel, it is desirable to reuse as much as possible
existing features; this principle is followed by most proof assistants. For instance
in OCaml, there is a type bool of booleans, but those could already have been
implemented using inductive types by
type bool = False | True
It is reasonable for OCaml to have a dedicated type for performance reasons but, in a proof assistant, this would mean more code to trust, which is a bad thing: if we can encode a feature using an already existing one, this is good.
In Agda, booleans are actually implemented as above:
data Bool : Set where false true : Bool
as well as in Coq:

Inductive bool : Set := true : bool | false : bool.
Bootstrapping. A nice idea in order to gain confidence in the proof checker would
be to bootstrap and prove its correctness inside itself: OCaml is programmed in
OCaml, why couldn’t we prove Agda in Agda? Gödel’s second incompleteness
theorem unfortunately shows that this is impossible. However, a fair amount
can be done, and has been done in the case of Coq [BW97]: the part which
is out of reach is to show the termination of Coq programs inside Coq (we
already faced a similar situation, in the simpler case of Peano arithmetic, see
section 5.2.5).
All the functions that you can write in proof assistants such as Agda are total: they
always produce a result in a finite amount of time. In order to ensure this,
heavy restrictions are imposed on the programs which can be implemented in
proof assistants. Firstly, since all functions are total, the language is not Turing-
complete: there are some programs that you can write in usual programming
languages that you will not be able to write in a proof assistant. Fortunately,
those are “rare” and typically arise when trying to bootstrap as explained above.
Secondly, since the problem of deciding whether a function terminates or not
is undecidable, the proof assistant actually implements conditions which ensure
that accepted programs will terminate, but some terminating programs will ac-
tually get rejected for “no good reason”. These issues are detailed in section 6.8.
6.1.2 Installation. In order to use Agda you will need two pieces of software: Agda itself and an editor which supports interacting with Agda. The advised editor is Emacs. Agda is written in Haskell and can typically be installed through the cabal package manager, by running

cabal update
cabal install Agda
agda-mode setup
Atom. For people thinking that Emacs looks too old, a more modern-looking editor compatible with Agda is Atom, which is available for most platforms.
In order to activate Agda support, you should go into the menu
Edit > Preferences > Install
search for “agda”, and install both agda-mode and language-agda. In the pref-
erences, under the Editor tab, it is also recommended to activate the Scroll
Past End checkbox. If Atom has trouble finding the standard libraries, in the
settings of the Package agda-mode, change the Agda path to something like
/usr/bin/agda -i /usr/share/agda-stdlib
As usual you can also search on the web. In particular, there are also various
forums such as Stackoverflow:
https://stackoverflow.com/questions/tagged/agda
A very good introduction to Agda is [WK19].
6.2.2 Shortcuts. When writing a proof in Agda, we do not have to write the
whole program directly: this would be almost impossible in practice. The editor
allows us to leave “holes” in proofs (written ?) and provides us with shortcuts
which can be used in order to fill those holes and refine programs. Below we
provide the shortcuts for the most helpful ones, writing C-x for the control key
+ the x key. They might seem a bit difficult to learn at first, but you will see that they are easy to get along with, and we can live our whole Agda life with only six shortcuts.
CHAPTER 6. AGDA 268
Agda. The main shortcuts for Agda that we will need are the following ones,
their use is explained below in section 6.2.6.
C-c C-l typecheck and highlight the current file
C-c C-, get information about the hole under the cursor
C-c C-space give a solution
C-c C-c case analysis on a variable
C-c C-r refine the hole
C-c C-a automatic fill
middle click definition of the term
A complete list can be found in the online documentation. Shortcuts which are
also sometimes useful are C-c C-. which is like C-c C-, but also shows the
inferred type for the proposed term for a hole, and C-c C-n which normalizes a
term (useful to test computations).
Symbols. Agda allows for using fancy UTF-8 symbols: those are entered using
\ (backslash) followed by the name of the symbol (many names are shared with
LaTeX). Most of them can be found in the documentation. The most useful
ones are for logic

∧ \and    ⊤ \top    → \to     ∀ \all    Π \Pi       λ \Gl
∨ \or     ⊥ \bot    ¬ \neg    ∃ \ex     Σ \Sigma    ≡ \equiv

and some other useful ones are

ℕ \bN     × \times    ≤ \le    ∈ \in
⊎ \uplus  ∷ \::       ∎ \qed

Indices and exponents such as in x₁ and x¹ are respectively typed \_1 and \^1, and similarly for others.
6.2.3 The standard library. The standard library defines most of the ex-
pected data types. The default path is /usr/share/agda-stdlib and you are
encouraged to have a look in there or in the online documentation. We list
below some of the most useful modules.
Logic. Not much is defined in the core of the Agda language and most of the
type constructors are also defined in the standard library:
Data.Sum Sum types (⊎, ∨)
Data.Product Product types (×, ∧, ∃, Σ)
Relation.Nullary Negation (¬)
Relation.Binary.PropositionalEquality Equality (≡)
Algebra. The standard library contains modules for useful algebraic structures
in Algebra.*: monoids, rings, groups, lattices, etc.
6.2.4 Hello world. A mandatory example is the “hello world” program, see
section 1.1.1. We can of course write it in Agda:

open import IO

main = run (putStrLn "Hello, world!")
6.2.5 Our first proof. As a first proof, let’s show that the propositional for-
mula
A∧B ⇒B∧A
is valid. By the Curry-Howard correspondence, we want a program of the type
A×B →B×A
showing that × is commutative. In OCaml, we would have typed
# let prod_comm (a , b) = (b , a);;
val prod_comm : 'a * 'b -> 'b * 'a = <fun>
The full proof in Agda goes as follows:

open import Data.Product

×-comm : (A B : Set) → (A × B) → (B × A)
×-comm A B (a , b) = (b , a)
Importing modules. The programs in Agda, including the standard library, are
organized in modules which are collections of functions dedicated to some fea-
ture. Here, we want to use the product, and therefore we have to import the
corresponding module, which is called Data.Product in the standard library. In
order to do so, we use the open import command which loads all its functions.
Function definitions. The definition of the function is then the expected one
×-comm A B (a , b) = (b , a)
We take three arguments: A, B and a pair (a , b) and return the pair (b , a).
Note that the fact that we can write (a , b) for the third argument is because
Agda allows definitions by pattern matching (just as OCaml): here, the product
has only one constructor, the pair.
Typesetting UTF-8 symbols. Since we want our proofs to look fancy, we have
used some nice UTF-8 symbols: for instance “→” and “×”. In the editor, such
symbols are typed by commands such as \to or \times as indicated above, in
section 6.2.2. There are usually text replacements (e.g. we could have written
-> and *), but those are not used much in Agda.
6.2.6 Our first proof, step by step. The above proof is very short, so that we could have typed it at once and then made sure that it typechecks, but even for moderately sized proofs, it is out of the question to write them in one go. Fortunately, we can input those gradually, by leaving “holes” in the proofs
which are refined later. Let us detail how one would have done this proof step
by step, in order to introduce all the shortcuts.
We begin by giving the type of the function and its declaration as
×-comm : (A B : Set) → (A × B) → (B × A)
×-comm A B p = ?
We know that our function takes three arguments (A, B and p), which is obvious
from the type, but we did not think hard enough yet of the result so that we
have written ? instead, which can be thought of as a “hole” in the proof. We
can then typecheck the proof by typing C-c C-l. Basically, this makes sure that
Agda is aware of what is in the editor (and report errors) so that you should
use it whenever you have changed something in the file (outside a hole). Once
we do that, the file is highlighted and changed to
×-comm : (A B : Set) → (A × B) → (B × A)
×-comm A B p = { }0
The hole has been replaced by { }0, meaning that Agda is waiting for some
term here (the 0 is the number of the hole). Now, place your cursor in the hole.
We can see the variables at our disposal (i.e. the context) by typing C-c C-,:
Goal: B × A
------------------------------------------------------------
p : A × B
B : Set
A : Set
This is useful to know where we are exactly in the proof: here we want to prove
B × A with A, B and p of given types. Now, we want to reason by case analysis
on p. We therefore use the shortcut C-c C-c; Agda then asks for the variable on which to perform the case analysis, to which we reply p (and enter). The
file is then changed to
×-comm : (A B : Set) → (A × B) → (B × A)
×-comm A B (fst , snd) = { }0
Since the type of p is a product, p must be a pair and therefore Agda changes p
to the pattern (fst , snd). Since we do not like the default names given by
Agda to the variables, we rename fst to a and snd to b:
×-comm : (A B : Set) → (A × B) → (B × A)
×-comm A B (a , b) = { }0
We should then do C-c C-l so that Agda knows of this change (remember that
we have to do it each time we modify something outside a hole). Now, we place
our cursor into the hole. By the same reasoning, the hole has a product as a
type, so that it must be a pair. We therefore use the command C-c C-r which
“refines” the hole, i.e. introduces the constructor if there is only one possible for
the given type. The file is then changed to
×-comm : (A B : Set) → (A × B) → (B × A)
×-comm A B (a , b) = { }1 , { }2
The hole was changed into a pair of two holes. In the hole { }1, we know that
the value should be b. We can therefore write b inside it and type C-c C-space
to indicate that we have given the value to fill the hole:
×-comm : (A B : Set) → (A × B) → (B × A)
×-comm A B (a , b) = b , { }2
We could do the same for the second hole (by giving a), but we get bored: this
hole is of type A so that the only possible value for it was a anyway. Agda is
actually able to find that by typing C-c C-a which is the command for letting
the prover try to automatically fill a hole:
×-comm : (A B : Set) → (A × B) → (B × A)
×-comm A B (a , b) = b , a
6.2.7 Our first proof, again. We would like to point out that these steps ac-
tually (secretly) correspond to constructing a proof. For simplicity, we suppose that A and B are two fixed types; this can be done by typing

postulate A B : Set
and consider the proof
×-comm : (A × B) → (B × A)
×-comm (a , b) = b , a
which is a small variant of the previous one. We now explain how constructing this proof corresponds to constructing a proof in sequent calculus.
6.3.1 The type of types. In Agda, there is by default a type named Set,
which can be thought of as the type of all types: an element of type Set is a
type.
6.3.2 Arrow types. In Agda, we have the possibility of forming function types:
given types A and B, one can form the type
A → B
of functions taking an argument of type A and returning a value of type B. For instance, the function isEven, which determines whether a natural number is even, will be given the type

isEven : ℕ → Bool
Type constructors. Functions in Agda can operate on types. For instance, the type of lists is a type constructor: it is a function which takes a type A as argument and produces a new type List A, the type of lists whose elements are of type A. We can thus give it the type

List : Set → Set

The type List A can also be seen as a type which is parametrized by another type, just as in OCaml the type 'a list of lists is parametrized by the type 'a.
Dependent function types generalize arrow types: given a type A and a type B, one can form the type

(x : A) → B

For instance, the isEven function could also have been given the type

isEven : (x : ℕ) → Bool
However, the added power comes from the fact that the type B is also allowed
to make use of the variable x. For instance, the function which constructs a
singleton list of some type can be given the following type (see section 6.3.3 for
the full definition of this function):
singleton : (A : Set) → A → List A
Both the second argument and the result use the type A which is given as first
argument. Such a type is called a dependent type: it can depend on a value,
which is given as an argument.
Logically, the type (x : A) → B corresponds to a universal quantification: it can be read as

∀x ∈ A.B
For instance, we can define the type of equalities between two elements of a
given type A by
eq : (A : Set) → A → A → Set
and a proof that this equality is reflexive is given the type
refl : (A : Set) → (x : A) → eq A x x
which corresponds to the usual formula
∀A.∀x ∈ A.x = x
Implicit arguments. Sometimes, some arguments can be deduced from the type
of other arguments. For instance, in the singleton function above, A is the type
of the second argument. In this case, we can make the first argument implicit,
which means that we will not have to write it and we will let Agda guess it
instead. This is done by using curly brackets in the type
singleton : {A : Set} → A → List A
This allows us to simply write
singleton 3
and Agda will be able to find out that A has to be ℕ, since this is the type of 3. In case we want to specify the implicit argument, we have to use the same brackets:

singleton {ℕ} 3
Another way of having Agda make a guess is to use _, which is a placeholder that
has to be filled automatically by Agda. For instance, we could try to let Agda
guess the type of A (which is Set) by declaring
singleton : {A : _} → A → List A
which can equivalently be written
singleton : ∀ {A} → A → List A
Infix notations. In function names, underscores (_) are handled as places where the arguments should be put, which makes it easy to define infix operators. For instance, we can define the addition with type

_+_ : ℕ → ℕ → ℕ

and then write 3 + 2, which is a notation for

_+_ 3 2
The priorities of binary operators can be specified by commands such as
infix 6 _+_
which states that the priority of addition is 6 (the higher the number, the
higher the priority). Operations can also be specified to be left (resp. right)
associative by replacing infix by infixl (resp. infixr). For instance, addition
and multiplications are usually given priorities
infixl 6 _+_
infixl 7 _*_
6.3.5 Records. Records in Agda are pretty similar to those in other languages (e.g. OCaml) and will not be used much here. In order to illustrate the syntax, we provide here an implementation of pairs using records:
record Pair (A B : Set) : Set where
field
fst : A
snd : B
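For instance, assuming the types ℕ and Bool are in scope, a pair can be constructed with the record syntax, and its components recovered with the projections that Agda generates in the module Pair (a small illustrative example):

p : Pair ℕ Bool
p = record { fst = zero ; snd = true }

n : ℕ
n = Pair.fst p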
A module Name can be opened with the command

open Name

which will expose all the functions of the module Name. This command can be followed by the modifiers hiding (...) or renaming (... to ...) in order to hide or rename some of the functions.
6.4.1 Natural numbers. As a first example, the natural numbers are defined
as the inductive type in the module Data.Nat by
data ℕ : Set where
  zero : ℕ
  suc : ℕ → ℕ

The first constructor is zero, which does not take any argument, and the second constructor is suc, which takes a natural number as argument. The values of type ℕ are thus

zero    suc zero    suc (suc zero)    suc (suc (suc zero))

and so on. As a commodity, the usual notation for natural numbers is also supported and we can write 2 as a shorthand for suc (suc zero).
6.4.2 Pattern matching. The way one typically uses elements of an inductive
type is by pattern matching: it allows inspecting a value of an inductive type
and return a result depending on the constructor of the value. As explained
above, the cases are usually generated by using the C-c C-c shortcut which
instructs the editor to perform case analysis on some variable. For instance, in
order to define the predecessor function, we start with
pred : ℕ → ℕ
pred n = ?

Case analysis on n (with C-c C-c) then generates one case per constructor, and we can complete the definition as expected:

pred : ℕ → ℕ
pred zero = zero
pred (suc n) = n

As a more elaborate example, we can define the remainder of the division of m by n using the with construction, which allows matching on the result of an auxiliary computation, here whether m < n holds:

_mod_ : ℕ → ℕ → ℕ
m mod n with m <? n
m mod n | p = ?
and we can then reason by case analysis on p. Incidentally, we can avoid typing the match on the arguments of the function again by simply writing “...”:
_mod_ : ℕ → ℕ → ℕ
m mod n with m <? n
... | p = ?
At this point, we reason by case analysis on p (with C-c C-c p) which will
produce two cases depending on the value of p:
_mod_ : ℕ → ℕ → ℕ
m mod n with m <? n
m mod n | yes _ = ?
m mod n | no _ = ?
We can finally fill those two cases, as indicated by the above formula:

_mod_ : ℕ → ℕ → ℕ
m mod n with m <? n
m mod n | yes _ = m
m mod n | no _ = (m ∸ n) mod n
As a side note, if you actually try the above definition in Agda, you will see that
it gets rejected because it is not clear for Agda that it is actually terminating.
The actual definition is slightly more involved because of this, see section 6.8.
Empty pattern matching. Some inductive types do not have any element. For
instance, we can define the empty type as
data ⊥ : Set where

This type has no constructors. We can then define a function from the empty type to any other type A by

⊥-elim : {A : Set} → ⊥ → A
⊥-elim ()
Of course, since the type A is arbitrary, there is no way for us in the proof to
actually exhibit a term of this type. But we do not have to: the pattern ()
states that there are no cases to handle when matching on the argument of
type ⊥, so that we are done.
It might seem at first that this is not so useful, unless one insists on using the type ⊥ (which is actually done quite often, since negation is defined using it, as you can expect). This is not so, because there are many less obvious ways of constructing empty inductive types in Agda. For instance, the type zero ≡ suc zero of equalities between 0 and 1 is also an empty inductive type.
Pattern matching can also be used in anonymous functions: for instance, the predecessor function can be written

pred : ℕ → ℕ
pred = λ { zero → zero ; (suc n) → n }
6.4.3 The induction principle. We would now like to briefly mention that
pattern matching in Agda corresponds to the presence of a recurrence or induc-
tion principle. For instance, if we define a function f from natural numbers to
some type A, we will typically define it using pattern matching by
f : ℕ → A
f zero = t
f (suc n) = u'
where t and u' are terms of type A. Here, u' might make use of the natural number n provided by the argument, as well as the result of the recursive call f n: we can suppose that u' is of the form u n (f n) for some function u of type ℕ → A → A. Any such terms t and u will give rise to a function of type ℕ → A in this way. The recurrence principle expresses this as a function which takes two arguments (of type A and ℕ → A → A, respectively corresponding to t and u) and produces the resulting function:
rec : {A : Set} → A → (ℕ → A → A) → ℕ → A
rec t u zero = t
rec t u (suc n) = u n (rec t u n)
This is precisely the recursor we have already met when adding natural numbers
to simply typed λ-calculus in section 4.3.6. Moreover, any function of type ℕ → A defined using pattern matching can be defined using this function instead: this
recurrence function encapsulates all the expressive power of pattern matching.
For instance, the predecessor function would be defined as
pred : ℕ → ℕ
pred = rec zero (λ n _ → n)
From a logical point of view, the recurrence principle corresponds to the elimination rule: for this reason, it is sometimes also called an eliminator.
Pattern matching in Agda is more powerful than this however: it can also be
used in order to define functions where the return type depends on the argument.
This means that we now consider functions of the form
f : (n : ℕ) → P n
f zero = t
f (suc n) = u n (f n)
Given

– a predicate P,
– an element t of P zero, and
– a function u of type (n : ℕ) → P n → P (suc n),

this function allows us to construct a function of type (n : ℕ) → P n; a possible implementation of this dependent recursor is sketched below.
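rec : (P : ℕ → Set) → P zero → ((n : ℕ) → P n → P (suc n)) → (n : ℕ) → P n
rec P t u zero = t
rec P t u (suc n) = u n (rec P t u n)

This formulation, with the predicate P as an explicit first argument, is one possible way of writing it, consistent with its use in the example below.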
lowing the Curry-Howard correspondence, we read the type as a logical formula
(see section 6.5), we precisely recover the usual recurrence principle over natural
numbers:
P (0) ⇒ (∀n ∈ N.P (n) ⇒ P (n + 1)) ⇒ ∀n ∈ N.P (n)
For instance, the following proof by recurrence that n + 0 = n for every natural
number n
+-zero : (n : ℕ) → n + zero ≡ n
+-zero zero = refl
+-zero (suc n) = cong suc (+-zero n)
can be expressed as follows using the dependent induction principle:
+-zero : (n : ℕ) → n + zero ≡ n
+-zero = rec (λ n → n + zero ≡ n) refl (λ n p → cong suc p)
6.4.7 Vectors. A vector is a list of given length. The type of vectors is defined
in Data.Vec by
data Vec (A : Set) : ℕ → Set where
  [] : Vec A zero
  _∷_ : {n : ℕ} → A → Vec A n → Vec A (suc n)
An element of Vec A n can be seen as a list whose elements are of type A and whose length is n. In this type, we thus have both a parameter A of type Set and an index of type ℕ, corresponding to the length of the vector, indicated by the fact that the return type is ℕ → Set. Indices are roughly the same as parameters, excepting that they can vary with constructors, as seen above: the constructor [] produces a vector of length zero, whereas the constructor _∷_ produces a vector of length suc n. It is “pure coincidence” if the names of the constructors are the same as for lists: they have nothing to do with those and could have been named differently (however, people chose to name them in the same way because vectors are usually used as a replacement for lists).
Dependent types. It should be observed that the type Vec A n of vectors depends
on a term n, the natural number indicating its length: this is a defining feature
of dependent types. We can also define functions such that the type of the result depends on the argument. For instance, we have the following function, building
a vector containing n occurrences of a given value:
replicate : {A : Set} → A → (n : ℕ) → Vec A n
replicate x zero = []
replicate x (suc n) = x ∷ replicate x n
Dependent pattern matching. Another natural function on this type is the func-
tion returning the head of the list:
head : {n : ℕ} {A : Set} → Vec A (suc n) → A
head (x ∷ xs) = x
This is a good illustration of the dependent pattern matching present in Agda.
Since the argument is a list of type Vec A (suc n), Agda automatically infers
that this function will never be applied to an empty list, because it cannot have
such a type, thus avoiding the problem we had when defining the same function
on lists in section 6.4.6.
Convertibility. Even though the type is more informative than the one of lists,
typical functions are not significantly harder to write. For instance, the con-
catenation of vectors is comparable to the one of lists:
_++_ : {m n : ℕ} {A : Set} → Vec A m → Vec A n → Vec A (m + n)
[] ++ l = l
(x ∷ l) ++ l' = x ∷ (l ++ l')
Looking closely at the first case of the pattern matching, we can note that the
result l we are providing is of type Vec A n whereas the type of the function
indicates that we should provide a result of type Vec A (zero + n). This illus-
trates the fact that Agda is able to compare types up to β-reduction on terms
(zero + n reduces to n): we can never distinguish between two β-convertible
terms.
6.4.8 Finite sets. In section 6.4.4, we have defined the set of booleans, which
contains two elements, and clearly we could have defined a set with n elements
for any fixed natural number n. For instance, the following type has four ele-
ments:
data Four : Set where
a : Four
b : Four
c : Four
d : Four
In fact, we can define, once and for all, a type Fin n which depends on a natural number n and has n elements. The definition is done in Data.Fin by
data Fin : ℕ → Set where
  zero : {n : ℕ} → Fin (suc n)
  suc : {n : ℕ} → Fin n → Fin (suc n)
Looking at it, we can see that Fin n is essentially the collection of natural
numbers restricted to
Fin n = {0, . . . , n − 1}
The above inductive type namely corresponds to the following inductive set-
theoretic definition:
Fin 0 = ∅
Fin (n + 1) = {0} ∪ {i + 1 | i ∈ Fin n}
As for vectors, the fact that the constructors have the same name as for natural numbers is “pure coincidence”: the elements of Fin n are not elements of ℕ, although there is obviously a canonical mapping:

to : {n : ℕ} → Fin n → ℕ
to zero = zero
to (suc i) = suc (to i)
Some black magic in Agda allows it to determine, using types, whether we are
using the constructors of Fin or those of ℕ.
The lookup function. The type Fin n is typically used to index elements over
finite sets. For instance, consider the lookup function, which returns the i-th
element of a vector l of length n. Clearly, this function is only well defined when
i < n, otherwise said when i belongs to Fin n. We can define this function as
follows:
lookup : {n : ℕ} {A : Set} → Fin n → Vec A n → A
lookup zero (x ∷ l) = x
lookup (suc i) (x ∷ l) = lookup i l
The typing ensures that the index will always be such that the function is well-
defined, i.e. that we will never request an element outside the boundaries of the
vector.
Let us present other possible implementations of this function, using natural numbers as the type of i, in order to show that they are more involved and less elegant. Since the function is not defined for every natural number i, a first
possibility would be to have a return value of type Maybe A, where nothing
would indicate that the function is not defined:
lookup : ℕ → {A : Set} {n : ℕ} → Vec A n → Maybe A
lookup zero [] = nothing
lookup zero (x ∷ l) = just x
lookup (suc i) [] = nothing
lookup (suc i) (x ∷ l) = lookup i l
This is quite heavy to use in practice, because we have to account for the
possibility that the function is not defined each time we use it. Another option could be to add, as an argument, a proof of i < n, ensuring that the index is not
out of bounds. This is more acceptable in practice, but the definition is not as
direct as the one above:
lookup : {i n : ℕ} {A : Set} → i < n → Vec A n → A
lookup {i} {.0} () []
lookup {zero} {.(suc _)} i<n (x ∷ l) = x
lookup {suc i} {.(suc _)} i<n (x ∷ l) = lookup (≤-pred i<n) l
6.5.1 Implication. The first logical connective we have at our disposal is implication, which corresponds to the arrow → in types. For instance, the classical formulas

A ⇒ B ⇒ A        (A ⇒ B ⇒ C) ⇒ (A ⇒ B) ⇒ A ⇒ C

can be proved as follows.
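The proofs are the usual K and S combinators (the names below are ours, the formulas being their types):

K : {A B : Set} → A → B → A
K a b = a

S : {A B C : Set} → (A → B → C) → (A → B) → A → C
S f g a = f a (g a)

Implications from a product type correspond to uncurried functions; for instance: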
→-× : {A B C : Set} → (A → B → C) → (A × B → C)
→-× f (x , y) = f x y
Conjunction corresponds to the product type _×_: the pairing constructor corresponds to the introduction rule

Γ ⊢ A    Γ ⊢ B
(∧I)
Γ ⊢ A ∧ B
This is a general fact: when defining logical connectives with inductive types,
constructors correspond to introduction rules. We see below that the elimination
rule corresponds to the associated induction principle.
Γ, A, B ⊢ P    Γ ⊢ A ∧ B
(∧E)
Γ ⊢ P

It namely states that if the premises are true then the conclusion is also true. The dependent induction principle corresponds to the elimination rule in dependent types, as we will see in section 8.3.3.
Γ ⊢ P    Γ ⊢ ⊤
(⊤E)
Γ ⊢ P
This is not very interesting from a logical point of view: if we know that P holds
and > holds then we can deduce that P holds, which we already knew.
Γ ⊢ ⊥
(⊥E)
Γ ⊢ P
We recall from section 6.4.2 that () is the empty pattern in Agda, which in-
dicates here that there are no cases to handle when matching on a value of type ⊥.
As an example involving negation, one can show that the excluded middle for A entails double-negation elimination for A:

(A ∨ ¬A) ⇒ ¬¬A ⇒ A

A proof is sketched below.
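Writing ⊎ for the sum type of Data.Sum, ¬ for the negation of Relation.Nullary and ⊥-elim for the eliminator of Data.Empty, a possible proof is (the name is ours):

lem→dne : {A : Set} → (A ⊎ ¬ A) → ¬ ¬ A → A
lem→dne (inj₁ a) nna = a
lem→dne (inj₂ na) nna = ⊥-elim (nna na)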
The two constructors of the sum type correspond to the two introduction rules

Γ ⊢ A          Γ ⊢ B
(∨lI)          (∨rI)
Γ ⊢ A ∨ B      Γ ⊢ A ∨ B

and the non-dependent variant of the induction principle to the elimination rule

Γ, A ⊢ P    Γ, B ⊢ P    Γ ⊢ A ∨ B
(∨E)
Γ ⊢ P
6.5.8 Σ-types. Σ-types are a dependent variant of product types, whose ele-
ments are of the form a , b where a is of type A and b is of type B a: the type
of the second component depends on the first component. They are defined in
Data.Product by
data Σ (A : Set) (B : A → Set) : Set where
_,_ : (a : A) → B a → Σ A B
(for technical reasons the actual definition in Agda is done using a record, but
is equivalent to the above one). As for usual products, we can define two pro-
jections by
proj ₁ : {A : Set} {B : A → Set} → Σ A B → A
proj ₁ (a , b) = a
and
proj ₂ : {A : Set} {B : A → Set} → (s : Σ A B) → B (proj ₁ s)
proj ₂ (a , b) = b
Again, in the second projection, note that the returned type depends on the
first component.
Σ-types also allow encoding set-theoretic comprehensions

{x ∈ A | B(x)}

an element of Σ A B being an element x of A together with a proof of B x. For instance, in set theory, given a function f : A → B (from a set A to a set B), its image Im(f) is the subset of B consisting of elements in the image of f. It is formally defined as

Im(f) = {y ∈ B | ∃x ∈ A. f(x) = y}

This immediately translates as a definition in Agda, with two Σ-types (one for the comprehension and one for the existential quantification):
Im : {A B : Set} (f : A → B) → Set
Im {A} {B} f = Σ B (λ y → Σ A (λ x → f x ≡ y))
and one can for instance show that every function f : A → B has a right inverse
(or section) g : Im(f ) → A:
sec : {A B : Set} (f : A → B) → Im f → A
sec f (y , x , p) = x
The axiom of choice. In a similar vein, the axiom of choice states that for every relation R ⊆ A × B which is total, in the sense that every x ∈ A is related to at least one y ∈ B, there is a function f : A → B choosing one such element, i.e. satisfying

∀x ∈ A.(x, f(x)) ∈ R
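Perhaps surprisingly, this can be proved in Agda: a proof that every x is related to some y already contains a witness together with the proof that it is suitable, and it suffices to project these components out (a sketch, using the Σ-type and projections above):

ac : {A B : Set} {R : A → B → Set} →
     ((x : A) → Σ B (λ y → R x y)) →
     Σ (A → B) (λ f → (x : A) → R x (f x))
ac h = (λ x → proj₁ (h x)) , (λ x → proj₂ (h x))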
Truth values. In classical logic, the set B of booleans is the set of truth values,
i.e. the values in which we evaluate predicates: a predicate on a set A can either
be false or true, and can thus be modeled as a function A → B. In Agda, we
use intuitionistic logic and therefore we are not so much interested in whether a
predicate is true or not, but rather in its proofs, so that the role of truth values
is now played by Set. A predicate P on a type A can thus be seen as a term of
type
A → Set
Similarly, a binary relation on a set A is classically a function A × A → B or, in curried form,

A → A → B

and will correspondingly be represented in Agda as a term of type A → A → Set.
Inductive predicates. In Agda, we can define types inductively, and these types
can depend on other types (inductive types can have parameters and indices).
This means that we can define predicates by induction! For instance, the pred-
icate on natural numbers of being even can be defined by induction by
data isEven : ℕ → Set where
  even-z : isEven zero
  even-s : {n : ℕ} → isEven n → isEven (suc (suc n))
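Similarly, the order on natural numbers can be defined inductively; the definition used in the standard library is

data _≤_ : ℕ → ℕ → Set where
  z≤n : {n : ℕ} → zero ≤ n
  s≤s : {m n : ℕ} → m ≤ n → suc m ≤ suc n

We can then show, by induction, that this relation is reflexive

≤-refl : (n : ℕ) → n ≤ n
≤-refl zero = z≤n
≤-refl (suc n) = s≤s (≤-refl n)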
and transitive
≤-trans : {m n p : } → (m ≤ n) → (n ≤ p) → (m ≤ p)
≤-trans z≤n n≤p = z≤n
≤-trans (s≤s m≤n) (s≤s n≤p) = s≤s (≤-trans m≤n n≤p)
Because of the support in Agda for reasoning by induction (and dependent pattern matching), this is often the best style for defining predicates, leading to the simplest proofs, although there are many other possibilities. In order to illustrate this, the order on natural numbers could also have been defined by

_≤_ : ℕ → ℕ → Set
m ≤ n = Σ ℕ (λ m' → m + m' ≡ n)
which is based on the classical equivalence, for m, n ∈ ℕ,

m ≤ n ⇔ ∃m′ ∈ ℕ. m + m′ = n

Yet another option would be to first define a boolean-valued comparison function le : ℕ → ℕ → Bool and then set

_≤_ : ℕ → ℕ → Set
m ≤ n = le m n ≡ true
We leave as an exercise to the reader to show reflexivity and transitivity with
those formalizations.
Finally, as a more involved example, the implicational fragment of intuition-
istic natural deduction is formalized in section 7.2: there, the relation Γ ⊢ A between a context Γ and a type A, which holds when the sequent is provable, is defined inductively.
6.6 Equality
Even equality is defined as an inductive type in Agda. The definition is given
in Relation.Binary.PropositionalEquality by
data _≡_ {A : Set} (x : A) : A → Set where
refl : x ≡ x
The equality is typed, in the sense that we can compare only elements of the
same type A. Moreover, there is only one way to show that two elements are
equal: it is when they are the same! Because of dependent pattern matching,
we will see that it is not as dumb as it might seem at first.
6.6.1 Equality and pattern matching. As a first proof with equality, let us
show that the successor function on natural numbers is injective. Otherwise said, for all natural numbers m and n, we have

m + 1 = n + 1 ⇒ m = n
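This can be proved by pattern matching on the equality proof: matching it against refl forces Agda to unify m and n, after which the goal becomes trivial. A short sketch, under a name of our choosing:

suc-injective : {m n : ℕ} → suc m ≡ suc n → m ≡ n
suc-injective refl = refl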
6.6.2 Main properties. Apart from reflexivity, which is ensured by the con-
structor refl, equality can be shown to be a congruence: it is symmetric, tran-
sitive and compatible with every operation.
sym : {A : Set} {x y : A} → x ≡ y → y ≡ x
sym refl = refl
trans : {A : Set} {x y z : A} → x ≡ y → y ≡ z → x ≡ z
trans refl refl = refl
cong : {A B : Set} (f : A → B) {x y : A} → x ≡ y → f x ≡ f y
cong f refl = refl
Two other important operations on equality are substitutivity, which allows transporting the elements of a type along an equality

subst : {A : Set} (P : A → Set) → {x y : A} → x ≡ y → P x → P y
subst P refl p = p

and coercion, which allows converting an element of a type into an element of another, equal, type

coe : {A B : Set} → A ≡ B → A → B
coe p x = subst (λ A → A) p x
The properties of equality will be discussed again in section 9.1.
The predicate isEven, which indicates whether a natural number is even or not, was already defined in section 6.5.9, and we can thus formalize the property that every even number is the double of some natural number as follows:

even-half : {n : ℕ} → isEven n → Σ ℕ (λ m → m + m ≡ n)
even-half even-z = zero , refl
even-half (even-s e) with even-half e
even-half (even-s e) | m , p =
suc m , cong suc (trans (+-suc m m) (cong suc p))
In this proof, we used the lemma +-suc, which states that

m + (n + 1) = (m + n) + 1

Chains of equalities quickly become difficult to read when written with trans and cong only, and the standard library provides notations for equational reasoning. After opening the corresponding module with

open ≡-Reasoning

one can write a proof of t₀ ≡ tₙ in the form

begin t₀ ≡⟨ P₁ ⟩ t₁ ≡⟨ P₂ ⟩ ... ≡⟨ Pₙ ⟩ tₙ ∎

where each Pᵢ is a proof of the corresponding step. For instance, the inductive step of the proof that addition is commutative can be written informally as

m + (n + 1) = (m + n) + 1    by +-suc,
            = (n + m) + 1    by induction hypothesis.
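With these notations, the full proof reads (a sketch relying on the lemmas +-zero and +-suc):

+-comm : (m n : ℕ) → m + n ≡ n + m
+-comm m zero = +-zero m
+-comm m (suc n) = begin
  m + suc n   ≡⟨ +-suc m n ⟩
  suc (m + n) ≡⟨ cong suc (+-comm m n) ⟩
  suc (n + m) ∎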
For comparison, a direct proof of this fact, using the properties of equality of
section 6.6.2, would have been
+-comm : (m n : ℕ) → m + n ≡ n + m
+-comm m zero = +-zero m
+-comm m (suc n) = trans (+-suc m n) (cong suc (+-comm m n))
As usual in Agda, these notations are not builtin, but defined in the standard library by
begin_ : {A : Set} {x y : A} → x ≡ y → x ≡ y
begin_ x≡y = x≡y
_≡⟨_⟩_ : {A : Set} (x {y z} : A) → x ≡ y → y ≡ z → x ≡ z
_ ≡⟨ x≡y ⟩ y≡z = trans x≡y y≡z
_∎ : {A : Set} (x : A) → x ≡ x
_∎ _ = refl
6.6.5 Definitional equality. In Agda, two terms which are convertible (i.e. re-
duce to a common term) are considered to be “equal”. The equality we are
referring to here is not ≡, but the equality which is internal to Agda, sometimes
referred to as definitional equality: one cannot distinguish between two defini-
tionally equal terms. For instance, over natural numbers, the term zero + n is
definitionally equal to n, because this is the way we defined addition. Of course,
definitional equality implies equality by refl:
+-zero' : (n : ℕ) → zero + n ≡ n
+-zero' n = refl
On the other hand, the terms n + zero and n are not definitionally equal (there
is nothing in the definition of addition which immediately allows to conclude
that). The equality between these two terms can of course be proved, but
requires some more work:
+-zero : (n : ℕ) → n + zero ≡ n
+-zero zero = refl
+-zero (suc n) = cong suc (+-zero n)
Because of this, subtle variations in the definitions, even though they axioma-
tize isomorphic structures, can have a large impact on the length of the proofs,
and one should take care of choosing the “best definition” for a concept, which
requires some practice. Typically, for properties involving multiple natural num-
bers, the choice of the one on which we perform the induction can drastically
change the size of the proof.
6.6.6 More properties with equality. Having introduced the notion of equality, we show here some more examples of properties involving it, for natural numbers and lists.
Natural numbers. We can show that zero is not the successor of any natural
number (which is one of the axioms of Presburger and Peano arithmetic, see
section 5.2.4), by a direct use of pattern matching:
zero-suc : {n : ℕ} → zero ≡ suc n → ⊥
zero-suc ()
Namely, when matching on the argument of type zero ≡ suc n, Agda knows
that there can be no proof of such a type because zero and suc n do not begin
with the same constructor. We can thus use the empty pattern () to indicate
that the pattern matching contains no case to handle. This behavior is detailed
in section 8.4.5.
We can show that addition is associative by a simple induction:
+-assoc : (m n o : ℕ) → (m + n) + o ≡ m + (n + o)
+-assoc zero n o = refl
+-assoc (suc m) n o = cong suc (+-assoc m n o)
Showing that multiplication is associative follows the same pattern, but requires
some algebraic reasoning
*-assoc : (m n o : ℕ) → (m * n) * o ≡ m * (n * o)
*-assoc zero n o = refl
*-assoc (suc m) n o = begin
  (m * n + n) * o     ≡⟨ *-+-dist-r (m * n) n o ⟩
  m * n * o + n * o   ≡⟨ cong (λ m → m + n * o) (*-assoc m n o) ⟩
  m * (n * o) + n * o ∎
where we use the fact that multiplication distributes over the addition:
*-+-dist-r : (m n o : ℕ) → (m + n) * o ≡ m * o + n * o
*-+-dist-r zero n o = refl
*-+-dist-r (suc m) n o = begin
  (m + n) * o + o     ≡⟨ cong (λ n → n + o) (*-+-dist-r m n o) ⟩
  (m * o + n * o) + o ≡⟨ +-assoc (m * o) (n * o) o ⟩
  m * o + (n * o + o) ≡⟨ cong (λ n → m * o + n) (+-comm (n * o) o) ⟩
  m * o + (o + n * o) ≡⟨ sym (+-assoc (m * o) o (n * o)) ⟩
  m * o + o + n * o ∎
Lists. Concatenation of lists is not commutative in general, which we can prove by exhibiting a counterexample:

++-not-comm :
  ¬ ({A : Set} → (l l' : List A) → (l ++ l') ≡ (l' ++ l))
++-not-comm f with f (1 ∷ []) (2 ∷ [])
++-not-comm f | ()
We can also show that the concatenation of two lists produces a list whose
length is the sum of the lengths of the original lists:
++-length : {A : Set} → (l l' : List A) →
length (l ++ l') ≡ length l + length l'
++-length [] l' = refl
++-length (x ∷ l) l' = cong suc (++-length l l')
Finally, let us present an all-time classic. We can define a function rev which reverses the order of the elements of a list, and show that applying this function twice to a list gets us back to the original list. We begin by introducing a function snoc (this is cons backwards) which adds an element at the end of a list.
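The two functions can be defined along the following lines:

snoc : {A : Set} → List A → A → List A
snoc [] x = x ∷ []
snoc (y ∷ l) x = y ∷ snoc l x

rev : {A : Set} → List A → List A
rev [] = []
rev (x ∷ l) = snoc (rev l) x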
This proof requires to first show the following auxiliary lemma, stating that
reversing a list l with x as last element will produce a list with x as first element,
followed by the reversal of the rest of the list:
rev-snoc : {A : Set} → (l : List A) → (x : A) →
rev (snoc l x) ≡ x ∷ (rev l)
rev-snoc [] x = refl
rev-snoc (y ∷ l) x = cong (λ l → snoc l y) (rev-snoc l x)
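The theorem itself then follows by induction on the list, using the lemma in the inductive case (a sketch):

rev-rev : {A : Set} → (l : List A) → rev (rev l) ≡ l
rev-rev [] = refl
rev-rev (x ∷ l) =
  trans (rev-snoc (rev l) x) (cong (λ l → x ∷ l) (rev-rev l))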
6.6.8 Decidable equality. Recall from section 6.5.6 that a type A is decidable
when either A or ¬A is provable, and we write Dec A for the type of proofs of
decidability of A: such a proof is either yes p, where p is a proof of A, or no
q, where q is a proof of ¬A. A relation R on a type A is decidable when the type R x y is decidable for all elements x and y of type A. The standard library
defines, in the module Relation.Binary, the following predicate:
Decidable : {A : Set} (R : A → A → Set) → Set
Decidable {A} R = (x y : A) → Dec (R x y)
A term of type Decidable R is a proof that the relation R is decidable.
A type A has decidable equality when the equality relation _≡_ on A is
decidable. This means that we have a function (i.e. an algorithm) which is able
to determine, given two elements of A, whether they are equal or not. To be
precise, we not only have the information of whether they are equal or not,
which would be a boolean, but actually a proof of their equality or a proof of
their inequality (see section 8.4.5 for a use of this).
Equality on any finite type is always decidable. For instance, in the case of
booleans:
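_≟_ : (x y : Bool) → Dec (x ≡ y)
true ≟ true = yes refl
true ≟ false = no (λ ())
false ≟ true = no (λ ())
false ≟ false = yes refl

This is one possible definition; the standard library provides an equivalent one.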
To see the limits of propositional equality, suppose that we want to prove that the concatenation of vectors is associative. Given vectors l, l' and l'' of respective lengths m, n and o, the terms (l ++ l') ++ l'' and l ++ (l' ++ l'') have respective types

Vec A ((m + n) + o)        Vec A (m + (n + o))

which are not the same. Of course, the two types are propositionally equal: we can namely prove

Vec A ((m + n) + o) ≡ Vec A (m + (n + o))
by
cong (Vec A) (+-assoc m n o)
But the two types are not definitionally equal, which is what is required in order
to compare terms with ≡.
Proof with standard equality. In order to perform our comparison, we can use coe and the above propositional equality in order to cast one of the members to have the same type as the other one. Namely, the term
coe (cong (Vec A) (+-assoc m n o))
has type
Vec A ((m + n) + o) → Vec A (m + (n + o))
and we can use it to “cast” (l ++ l’) ++ l” in order to change its type to the
same one as l ++ (l’ ++ l”), after which we can compare the two with ≡. We
can finally prove associativity of concatenation of vectors as follows:
++-assoc : {A : Set} {m n o : ℕ} →
(l : Vec A m) → (l' : Vec A n) → (l'' : Vec A o) →
coe (cong (Vec A) (+-assoc m n o))
((l ++ l') ++ l'') ≡ l ++ (l' ++ l'')
++-assoc [] l' l'' = refl
++-assoc {_} {suc m} {n} {o} (x ∷ l) l' l'' =
∷-cong x (+-assoc m n o) (++-assoc l l' l'')
The above proof uses the following auxiliary lemma, which states that if l and l' are propositionally equal vectors, up to propositional equality of their types as above, then x ∷ l and x ∷ l' are also propositionally equal:

∷-cong : {A : Set} → {m n : ℕ} {l : Vec A m} {l' : Vec A n} →
(x : A) → (p : m ≡ n) → coe (cong (Vec A) p) l ≡ l' →
coe (cong (Vec A) (cong suc p)) (x ∷ l) ≡ x ∷ l'
∷-cong x refl refl = refl
As you can observe, the statement of those properties is considerably obscured
by the use of coe, which is used to coerce the type of terms so that they can be
compared to other terms, as explained above.
Proof with heterogeneous equality. In order to overcome this problem, we can use
the heterogeneous equality relation, also sometimes called John Major equality,
which is defined by
data _≅_ : {A B : Set} (x : A) (y : B) → Set where
  refl : {A : Set} {x : A} → x ≅ x
in the module Relation.Binary.HeterogeneousEquality. It is a variant of
propositional equality, which allows comparing two elements x and y of dis-
tinct types A and B. It is however a reasonable notion of equality because the
constructor refl only allows to construct an heterogeneous equality when A
and B are the same. This ability of comparing elements of distinct types allows
formulating and proving in a much easier way the associativity of vectors:
++-assoc : {A : Set} {m n o : ℕ}
  (l : Vec A m) (l' : Vec A n) (l'' : Vec A o) →
  ((l ++ l') ++ l'') ≅ (l ++ (l' ++ l''))
++-assoc [] l' l'' = refl
++-assoc {_} {suc m} {n} {o} (x ∷ l) l' l'' =
∷-cong x (+-assoc m n o) (++-assoc l l' l'')
List ℕ → List ℕ

as usual, and then show that this function actually sorts a list, i.e. prove a proposition of the form

(l : List ℕ) → Sorted (sort l)
This example is detailed in section 6.7.2 for the insertion sort algorithm. The
intrinsic approach usually results in shorter code, and is not significantly harder
than the extrinsic one, although it usually requires more thought in order to
formulate the property we want to prove in a way which will give rise to an
elegant proof.
which both defines the concatenation and shows the property we were looking
for at once.
The insertion sort algorithm then proceeds, in order to sort a given list, by
iteratively inserting all its elements in a list which is initially the empty list. We
write sort(l) for the list obtained in this way:
sort([]) = []
sort(x :: l) = insert(x, sort(l))
Extrinsic approach. The correctness of insertion sort using the extrinsic ap-
proach is shown in figure 6.2. We can define the function insert, to insert an
element in a list, and sort, to sort a list, by a direct translation of the above
definitions (for simplicity, we only handle the case of lists of natural numbers).
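A definition along the following lines will do (the complete development appears in figure 6.2):

insert : ℕ → List ℕ → List ℕ
insert x [] = x ∷ []
insert x (y ∷ l) with x ≤? y
... | yes _ = x ∷ y ∷ l
... | no _ = y ∷ insert x l

sort : List ℕ → List ℕ
sort [] = []
sort (x ∷ l) = insert x (sort l)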
Note that the sorting function has the usual type
sort : List ℕ → List ℕ
In the definition of the insertion function, we use the predicate

_≤?_ : (m n : ℕ) → Dec (m ≤ n)

which shows that the order on natural numbers is decidable; it is proved similarly to decidable equality, see section 6.6.8.
Since lists are defined by induction, and all the reasoning about those will
be performed by induction, it is better to define the predicate of being sorted
for a list by induction. In order to do so, we first define a relation ≤* between natural numbers and lists such that x ≤* l whenever x ≤ y for every element y of l, i.e. the elements of l are bounded below by x. This is defined by induction on l by

– x ≤* l always holds when l is the empty list,
– x ≤* (y :: l) whenever x ≤ y and x ≤* l.
We can then define the predicate of being sorted for a list by induction on the list by

– the empty list is always sorted,
– a list x :: l is sorted whenever x ≤* l and l is sorted.
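These two definitions translate to Agda as follows (a sketch consistent with the lemmas below, where ⊤ is the unit type, whose unique element is tt, and × is the product):

_≤*_ : ℕ → List ℕ → Set
x ≤* [] = ⊤
x ≤* (y ∷ l) = (x ≤ y) × (x ≤* l)

Sorted : List ℕ → Set
Sorted [] = ⊤
Sorted (x ∷ l) = (x ≤* l) × Sorted l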
Finally, using two easy lemmas involving the relation ≤*, we can show that, given any number x and any sorted list l, the list insert(x, l) is also sorted (this is insert-sorting), from which we can deduce that, for any list l, the list sort(l) is sorted (this is sorting). The two lemmas are the following ones:
≤*-trans : {x y : ℕ} → (x ≤ y) → (l : List ℕ) →
  y ≤* l → x ≤* l
≤*-trans x≤y [] tt = tt
≤*-trans x≤y (z ∷ l) (y≤z , y≤*l) =
  ≤-trans x≤y y≤z , ≤*-trans x≤y l y≤*l

≤*-insert : {x y : ℕ} → (x ≤ y) → (l : List ℕ) →
  x ≤* l → x ≤* (insert y l)
≤*-insert x≤y [] tt = x≤y , tt
≤*-insert {x} {y} x≤y (z ∷ l) x≤*zl with y ≤? z
≤*-insert x≤y (z ∷ l) x≤*zl | yes _ = x≤y , x≤*zl
≤*-insert x≤y (z ∷ l) (x≤z , x≤*l) | no _ = x≤z , ≤*-insert x≤y l x≤*l
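The specification we just proved is however weaker than one might think. Consider the following function (a deliberately degenerate example):

sort' : List ℕ → List ℕ
sort' l = []

sorting' : (l : List ℕ) → Sorted (sort' l)
sorting' l = tt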
This function always returns the empty list, whichever list is provided as input. It will not usually be considered a valid sorting function, although it fills the bill: the empty list is, after all, a sorted list. The culprit is neither the proof assistant nor the proof here, but the specification itself: what we expect from a
assistant nor the proof here, but the specification itself: what we expect from a
sorting function is not only to return a sorted list, but also that the returned
list has the same elements as the one given as argument.
Exercise 6.7.3.1. Show that the insertion sort function satisfies the strengthened
specification.
6.8 Termination
6.8.1 Termination and consistency. In order to maintain consistency, Agda
ensures that all the defined functions are terminating, by which we mean that
they will always give a result after a finite amount of time. To understand
why this is required, we can force Agda to accept a non-terminating function and see what happens (spoiler: inconsistency). This can be achieved by using the pragma {-# TERMINATING #-} before the definition of a function, which means “trust me, this function is terminating”. For instance, the function f defined on natural numbers by f(n) = f(n + 1) is clearly not terminating. It can be given the type ℕ → ⊥, from which it is easy to make the system inconsistent:
{-# TERMINATING #-}
f : ℕ → ⊥
f n = f (suc n)

absurd : ⊥
absurd = f zero

0≡1 : 0 ≡ 1
0≡1 = ⊥-elim absurd
Yes, we have managed to prove 0 = 1. If we do not use the pragma, Agda correctly detects that the function f is problematic and prevents us from defining it.
6.8.2 Structural recursion. In order to ensure that the programs are termi-
nating, Agda uses a “rough” criterion, which is simple to check and safe, in
the sense that it ensures every accepted program is terminating. This criterion
is that recursive programs must be structurally recursive, meaning that all the
recursive calls must be done on strict subterms of the argument (we say that
the argument is structurally decreasing).
For instance, the following function computes the n-th term of the Fibonacci
sequence, defined by f0 = 0, f1 = 1 and fn+2 = fn+1 + fn :
fib : ℕ → ℕ
fib zero = zero
fib (suc zero) = suc zero
fib (suc (suc n)) = fib n + fib (suc n)
In the third case, the argument is suc (suc n), whose strict subterms are suc n
and n, see section 5.1.2. Since the recursive calls are performed with those as
argument, the program is accepted. If we had instead used recursive calls of
one of the following forms then the program would be rejected
– fib (suc (suc n)): the argument suc (suc n) is a subterm of itself, but
not a strict one,
– fib (zero + n): the term zero + n is not a strict subterm of suc (suc n), i.e. the first does not occur in the second; as you can see, the notion of subterm has to be taken purely syntactically here, no reduction is performed (the fact that zero + n reduces to n is not taken into account).
Multiple arguments. In the case where the function has two arguments (and
this generalizes to multiple arguments), either the first argument must be struc-
turally decreasing (in which case there is no restriction on the second one) or it
should stay the same and the second argument must be structurally decreasing.
Pairs of arguments are thus compared using the lexicographic order, see ap-
pendix A.3.3. For instance, the following Ackermann function is also accepted:
ack : (x y : ℕ) → ℕ
ack zero n = suc n
ack (suc m) zero = ack m (suc zero)
ack (suc m) (suc n) = ack m (ack (suc m) n)
In the second case, the first argument m is a subterm of the first argument suc m
of the function. In the third case, one recursive call is performed with m as
first argument, which is a subterm of the first argument suc m, and the second
recursive call is performed with suc m as first argument, which stays unchanged,
and n as second argument, which is a subterm of the second argument suc n.
Programming with total functions. The programming language Agda has one
particularity compared to usual programming languages: since every function
is terminating, all the functions which can be implemented are total. From this
follows the following property.
Theorem 6.8.3.1. In a programming language such as Agda in which all the func-
tions which can be implemented are total, there is a total computable function
which cannot be implemented.
Proof. The idea is that if all total computable functions were implementable
in Agda then some partial function would also be implementable. Here is a de-
tailed sketch of the proof. The functions f : N → N which can be implemented
in Agda are described by a string, and are therefore countable: we can enu-
merate those and write fᵢ for the i-th implementable function. The function
g : N × N → N such that g(i, n) = fᵢ(n) is also computable: given an argu-
ment (i, n), the function g enumerates all strings in order and, for each string,
tests whether it is a valid Agda definition of a function of type N → N, until it
finds the i-th such function fᵢ, at which point it returns the evaluation of fᵢ
on the argument n (this would require programming an evaluator of Agda func-
tions in OCaml, which can be done). Suppose that g can be implemented in
Agda (otherwise, we can conclude immediately). Then the function d : N → N
defined by d(n) = g(n, n) + 1 is clearly also implementable. Therefore, there is
an index i such that d = fᵢ and we have
fᵢ(i) = d(i) = g(i, i) + 1 = fᵢ(i) + 1
Contradiction.
Functions which cannot be implemented are rare. In practice, all the usual func-
tions that one manipulates can be implemented in Agda. A typical function
which cannot be implemented in Agda is an interpreter for the Agda language
itself.
6.8.4 The number of bits. We have mentioned that the restriction to struc-
turally recursive functions is quite strong and rejects perfectly terminating func-
tions. Let us study an example and see the available workarounds. We consider
the function bits which associates, to every natural number n, the number of
bits necessary to write it in base 2. For instance, bits(5) = 3 since 5 is written
101 in base 2; in general,
bits(n) = ⌊log₂(n)⌋ + 1
where, by convention, log₂(0) = −1. This function satisfies the recurrence
relation
bits(0) = 0        bits(n) = 1 + bits(⌊n/2⌋) for n > 0
which is not structurally recursive, since ⌊n/2⌋ is not a strict subterm of n.
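A direct transcription of this recurrence in Agda would presumably look as
follows, where div2 computes the half of a natural number, as in the rest of this
section (a sketch of the rejected definition):

bits : ℕ → ℕ
bits zero = zero
bits (suc n) = suc (bits (div2 (suc n)))

Agda rejects this definition: the recursive call is performed on div2 (suc n),
which is not a strict subterm of suc n, so the recursion is not structural.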
6.8.5 The fuel technique. In order to define our function, a general technique
consists in adding new arguments to it, so that the recursive calls are performed
with one of these arguments being structurally decreasing. Typically, we can
add as argument a natural number which will decrease at each call (when the
function is called with suc n, the recursive call is performed with n), provided
that we know in advance a bound on the number of recursive calls (i.e. we also
have to add a proof that this argument will be non-zero so that we can decrease
it). This is called the fuel technique because this natural number can be thought
of as some fuel which we are consuming in order to perform the recursive calls.
For instance, in order to define the bits function, we can add a natural
number fuel as argument to the function, which is to be structurally decreasing:
bits-fuel : (n : ℕ) → (fuel : ℕ) → ℕ
bits-fuel zero f = zero
bits-fuel (suc n) zero = ?
bits-fuel (suc n) (suc f) = suc (bits-fuel (div2 (suc n)) f)
In order to overcome the problem encountered in the last case, we have to ensure
that we never “run out of fuel”, i.e. that the fuel is strictly positive when we need
to perform a recursive call. This can be achieved by adding a second additional
argument which ensures an invariant on the fuel which will enforce this. For
instance, we can add the requirement that the fuel is always greater than the
original argument n. When performing a recursive call, we will have to show
that this invariant is preserved: under the hypothesis n + 1 ≤ f + 1, we have to
show (n + 1)/2 ≤ f, which can be done as follows:
(n + 1)/2 ≤ (f + 1)/2 ≤ f
We thus define
bits-fuel : (n : ℕ) → (fuel : ℕ) → (n ≤ fuel) → ℕ
bits-fuel zero f p = zero
bits-fuel (suc n) zero ()
bits-fuel (suc n) (suc f) p = suc (bits-fuel (div2 (suc n)) f n+1/2≤f)
where
n+1/2≤f : div2 (suc n) ≤ f
n+1/2≤f = begin
  div2 (suc n) ≤⟨ ≤-div2 p ⟩
  div2 (suc f) ≤⟨ ≤-div2-suc f ⟩
  f ∎
This follows the same pattern as the previous definition, except that we know
that the problematic case where the original argument is suc n and the fuel is
zero cannot happen: by the third argument, we would have suc n ≤ zero,
which is impossible. The code is longer than above because, when performing
the recursive call on div2 (suc n) with f as fuel, we have to provide a third
argument, which shows that the invariant is preserved, i.e. that div2 (suc n) ≤ f
holds. This is shown in the lemma named n+1/2≤f, using two auxiliary lemmas
≤-div2 : {m n : ℕ} → m ≤ n → div2 m ≤ div2 n
and
≤-div2-suc : (n : ℕ) → div2 (suc n) ≤ n
whose proofs are left to the reader. We can finally define the bits function by
providing, as fuel, a high enough number. For instance, n itself is suitable:
bits : ℕ → ℕ
bits n = bits-fuel n n ≤-refl
↓0 = ∅        ↓(n + 1) = {n}
for every n ∈ N. If we use the following generic notation for the image of n
under the function associated to r₀ and r,
f(n) = rec(n, r₀, r)
we have
f(0) = r₀        f(n + 1) = r(n, f(n))
which are precisely the rules for the usual recursor associated to natural numbers
in λ-calculus, see section 4.3.6.
The well-founded subterm order. We now explain that the kind of recursion
which can be found in Agda is a particular case of the one given in theorem 6.8.6.6.
Suppose given a first order signature Σ and consider the set TΣ of terms it
generates, see section 5.1.2. Given a term t, we write |t| for its size, defined as
the number of operators it contains:
|f(t₁, . . . , tₙ)| = 1 + Σᵢ |tᵢ|        |x| = 0
Given two terms s and t, we write s < t when s is a strict subterm of t. Note
that s < t implies |s| < |t|.
Lemma 6.8.6.8. The relation < on terms is well-founded.
Proof. Suppose that there is an infinite sequence of terms tᵢ such that
t₀ > t₁ > t₂ > · · ·. Then the sizes |tᵢ| would form an infinite strictly decreasing
sequence of natural numbers, which is impossible.
Accessible elements. Suppose given a set A and a relation R on it, not supposed
to be well-founded. We can define a subset of A, written AccR(A), which is
the largest subset of A on which well-founded induction and recursion work, as
follows.
A subset B ⊆ A is R-closed when, for every x ∈ A, ↓x ⊆ B implies x ∈ B:
if an element has all its predecessors in B then it is also in B. We define the
set AccR (A) as the smallest R-closed subset of A (such a set exists since it can
be obtained as the intersection of all R-closed subsets of A). An element of A
is accessible with respect to R when it belongs to AccR (A).
Theorem 6.8.6.9. A relation R on a set A is well-founded if and only if every
element of A is accessible, i.e. A = AccR (A).
In particular, given a relation R on a set A, the restriction of R to AccR (A) is
always well-founded.
Example 6.8.6.10. In N equipped with the relation ≺ or <, every element is
accessible.
Example 6.8.6.11. On the set Z equipped with the relation <, no element is
accessible.
Example 6.8.6.12. On the set R equipped with the relation ≺ such that x ≺ x+1
for every x in R+ , the set of positive reals, we have Acc≺ (R+ ) = N.
In Agda, this accessibility predicate is used as a witness that the function is
terminating. For instance, the function bits becomes
bits-wf : (n : ℕ) → Acc _<_ n → ℕ
bits-wf zero _ = zero
bits-wf (suc n) (acc a) =
suc
(bits-wf
(div2 (suc n))
(a (div2 (suc n)) (s≤s (≤-div2-suc n))))
In order to perform the recursive call on div2 (suc n), we have to show that
this number is accessible, which is deduced from the fact that div2 (suc n) <
suc n holds, as explained above. Finally, the usual bits function, without the
extra argument, is defined by providing the proof that every natural number is
accessible as second argument (which is precisely the fact that the relation < is
well-founded on natural numbers):
bits : ℕ → ℕ
bits n = bits-wf n (<-wellFounded n)
where the second argument, of type
(m : ℕ) → m < n → ℕ
provides a function to compute the recursive calls on strictly smaller argu-
ments. We thus obtain the following function:
bits-rec : (n : ℕ) → ((m : ℕ) → m < n → ℕ) → ℕ
bits-rec zero r = zero
bits-rec (suc n) r = suc (r (div2 (suc n)) (s≤s (≤-div2-suc n)))
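The bits function itself can then presumably be recovered by tying the knot
with the well-founded recursor <-rec of the standard library, exactly as done
for euclidean division below; a minimal sketch:

bits : ℕ → ℕ
bits n = <-rec (λ _ → ℕ) bits-rec n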
q = m / n        r = m mod n
and they can be computed using the following classical algorithm, here imple-
mented in OCaml:
let rec euclid m n =
if m < n then (0, m) else
let (q, r) = euclid (m - n) n in
(q + 1, r)
In Agda, we would like to define a function of type
(m n : ℕ) → 0 < n → ℕ × ℕ
taking m and n as arguments, as well as a proof of 0 < n, and returning the pair
(q , r) as result. Since we perform the induction on the first argument, we are
going to define, by well-founded recursion on m, a function of type
(n : ℕ) → 0 < n → ℕ × ℕ
for every natural number m. In order not to have to type this every time, we
define the notation Euclid m for this type:
Euclid : ℕ → Set
Euclid m = (n : ℕ) → 0 < n → ℕ × ℕ
We can then implement euclidean division by well-founded recursion, following
the above definition: when computing the result of the division of m by n, we
first check whether m < n holds or not, and provide an answer appropriately,
which requires performing a recursive call in the case m ≮ n (and additional
code, because we now have to provide a proof that m ∸ n < m holds).
The definition is
div : (m : ℕ) → Euclid m
div m = <-rec Euclid rec m
  where
  rec : (m : ℕ) → ((m' : ℕ) → m' < m → Euclid m') → Euclid m
  rec m f n 0<n with m <? n
  rec m f n 0<n | yes m<n = zero , m
  rec m f n 0<n | no m≮n with
    f (m ∸ n) (m∸n<m m n (<-transˡ 0<n (≮⇒≥ m≮n)) 0<n) n 0<n
  rec m f n 0<n | no m≮n | q , r = suc q , r
and uses the following auxiliary lemma (in addition to those already present in
the standard library):
m∸n<m : (m n : ℕ) → 0 < m → 0 < n → m ∸ n < m
m∸n<m (suc m) (suc n) _ _ = s≤s (n∸m≤n n m)
Finally, it can be shown that this implementation is correct, in the sense that
it satisfies the specification (6.2). Formally, we can show
div-correct :
  (m n : ℕ) (0<n : 0 < n) →
  m ≡ proj₁ (div m n 0<n) * n + (proj₂ (div m n 0<n)) ×
  (proj₂ (div m n 0<n)) < n
However, the proof is not immediate, due to the use of well-founded induc-
tion, and a more satisfactory approach is detailed in the next section.
Euclid : ℕ → Set
Euclid m = (n : ℕ) → 0 < n →
  Σ ℕ (λ q → Σ ℕ (λ r → m ≡ q * n + r × r < n))
so that euclidean division will have type
(m : ℕ) → Euclid m
Given m, a function of this type will thus take as arguments
– a natural number n,
– a proof of 0 < n,
and will return a dependent 4-tuple consisting of
– a natural number q (the quotient),
– a natural number r (the remainder),
– a proof of m ≡ q * n + r,
– a proof of r < n,
which is a type theoretic description of the specification (6.2). The implemen-
tation is very similar to the above one, except that we now have to return,
in addition to the quotient and the remainder, the two proofs indicated above,
which show that those results are correct. The full code is
div : (m : ℕ) → Euclid m
div m = <-rec Euclid rec m
  where
  rec : (m : ℕ) → ((m' : ℕ) → m' < m → Euclid m') → Euclid m
  rec m f n 0<n with m <? n
  rec m f n 0<n | yes m<n = zero , m , refl , m<n
  rec m f n 0<n | no m≮n with
    f (m ∸ n) (m∸n<m m n (<-transˡ 0<n (≮⇒≥ m≮n)) 0<n) n 0<n
  rec m f n 0<n | no m≮n | q , r , e , r<n = suc q , r , lem , r<n
    where
    lem : m ≡ n + q * n + r
    lem = begin
      m               ≡⟨ sym (m+[n∸m]≡n (≮⇒≥ m≮n)) ⟩
      n + (m ∸ n)     ≡⟨ cong (λ x → n + x) e ⟩
      n + (q * n + r) ≡⟨ sym (+-assoc n (q * n) r) ⟩
      n + q * n + r   ∎
Instead of trying to read it, the reader is urged to redo it by himself.
Inductive definition. For general culture, we shall also mention that it is possible
to implement euclidean division by a structural definition, avoiding the use of
well-founded induction. This is the approach followed in Agda’s standard
library, in Data.Nat.DivMod. The trick consists in adding two extra arguments q
and r' to the naive function, which keep track of the quotient and remainder
(or, more precisely, of n minus the remainder). Namely, given m and n, we
perform our definition by induction on m. Initially, q is 0 and r' is n; each time
m is decreased by one,
– if r' is strictly positive, we decrease it by one,
– if r' is 0, we increase q by one and reset r' to n.
Formally, the code follows:
euclid : (m n q r' : ℕ) → ℕ × ℕ
euclid zero n q r' = q , n ∸ r'
euclid (suc m) n q zero = euclid m n (suc q) n
euclid (suc m) n q (suc r') = euclid m n q r'
It can be shown that, for every m and n, the result of
euclid m n zero n
computes the quotient and remainder of the division of m by suc n (we consider
suc n in order to ensure that the denominator is non-zero). For instance,
euclid 5 1 zero 1 evaluates to (2 , 1), which is indeed the result of dividing 5
by 2.
Exercise 6.8.7.1. Show that this function is correct.
Exercise 6.8.7.2. Give an intrinsic inductive definition of euclidean division.
Chapter 7
Formalization of important
results
-- boolean comparison on natural numbers; the <? in the with clause refers
-- to the decision procedure for _<_ of the standard library (presumably
-- imported under a suitable name, to avoid clashing with this definition)
_<?_ : ℕ → ℕ → Bool
m <? n with m <? n
(m <? n) | yes _ = true
(m <? n) | no _ = false
We then define the reduction relation as an inductive binary predicate _⇒_, so
that, given programs p and q, a proof of p ⇒ q corresponds to a derivation of
⊢ p ⟶ q using the rules of figure 1.1: we add one constructor to this inductive
predicate for each inference rule.
data _⇒_ : Prog → Prog → Set where
  ⇒-Add : (m n : ℕ) →
    V (VNat m) + V (VNat n) ⇒ V (VNat (m + n))
  ⇒-Add-l : {p p' : Prog} → p ⇒ p' → (q : Prog) →
    p + q ⇒ p' + q
  ⇒-Add-r : {q q' : Prog} → (p : Prog) → q ⇒ q' →
    p + q ⇒ p + q'
  ⇒-Lt : (m n : ℕ) →
    V (VNat m) < V (VNat n) ⇒ V (VBool (m <? n))
  ⇒-Lt-l : {p p' : Prog} → p ⇒ p' → (q : Prog) →
    p < q ⇒ p' < q
  ⇒-Lt-r : {q q' : Prog} → (p : Prog) → q ⇒ q' →
    p < q ⇒ p < q'
  ⇒-If : {p p' : Prog} → p ⇒ p' → (q r : Prog) →
    if p then q else r ⇒ if p' then q else r
  ⇒-If-t : (p q : Prog) →
    if V (VBool true) then p else q ⇒ p
  ⇒-If-f : (p q : Prog) →
    if V (VBool false) then p else q ⇒ q
Typing. We now define the typing system of our language, starting with the
definition of a type which is either a natural number or a boolean:
data Type : Set where
TNat TBool : Type
We then define the typing relation as an inductive binary predicate _∷_, so
that a proof of p ∷ A, for a program p and a type A, corresponds precisely to a
proof of ⊢ p : A using the type inference rules given in section 1.4.3:
data _∷_ : Prog → Type → Set where
  ⊢-Nat : (n : ℕ) →
    V (VNat n) ∷ TNat
  ⊢-Bool : (b : Bool) →
    V (VBool b) ∷ TBool
  ⊢-Add : {p q : Prog} → p ∷ TNat → q ∷ TNat →
    p + q ∷ TNat
  ⊢-Lt : {p q : Prog} → p ∷ TNat → q ∷ TNat →
    p < q ∷ TBool
  ⊢-If : {p q r : Prog} {A : Type} →
    p ∷ TBool → q ∷ A → r ∷ A →
    if p then q else r ∷ A
For instance, in the first case (the program is a natural number), Agda infers
that its type is necessarily TNat and therefore A and A’ must be equal (to TNat).
Subject reduction. The subject reduction theorem (theorem 1.4.3.2) states that
if a program p reduces to p' and p admits the type A, then p' also admits the
type A. The proof is most easily done by induction on the derivation of p ⟶ p':
sred : {p p' : Prog} {A : Type} → (p ⇒ p') → p ∷ A → p' ∷ A
Progress. The last important property of our typed language is progress (theo-
rem 1.4.3.3), which states that a typable program is either a value or reduces to
some other program. Given a program p which admits a type A, the proof is
performed by induction on the derivation of ⊢ p : A:
prgs : {p : Prog} {A : Type} → p ∷ A →
  Σ Value (λ v → p ≡ V v) ⊎ Σ Prog (λ p' → p ⇒ p')
We could also have formalized contexts as lists of formulas, but the above for-
malization allows for a slightly more natural notation. We write Γ ,, Δ for the
concatenation of two contexts Γ and Δ:
_,,_ : Context → Context → Context
Γ ,, ε = Γ
Γ ,, (Δ , A) = (Γ ,, Δ) , A
(ax)
Γ, A ⊢ A

Γ ⊢ B
(wk)
Γ, A ⊢ B
Admissible rules. This can be used to show that the usual rules are admissible.
For instance, we can prove that the contraction rule
Γ, A, A, Γ′ ⊢ B
(contr)
Γ, A, Γ′ ⊢ B
is admissible (see section 2.2.7) by induction, both on the context Γ′ and on the
proof of the premise.
Similarly, the cut rule
Γ ⊢ A        Γ, A, Γ′ ⊢ B
(cut)
Γ, Γ′ ⊢ B
is shown by
cut : ∀ {Γ A B} → ∀ Γ' → Γ ⊢ A → Γ , A ,, Γ' ⊢ B → Γ ,, Γ' ⊢ B
cut ε p ax = p
cut ε p (wk q) = q
cut ε p (⇒E q r) = ⇒E (cut ε p q) (cut ε p r)
cut ε p (⇒I q) = ⇒I (cut (ε , _) p q)
cut (Γ' , A) p ax = ax
cut (Γ' , A) p (wk q) = wk (cut Γ' p q)
cut (Γ' , A) p (⇒E q r) = ⇒E (cut (Γ' , A) p q) (cut (Γ' , A) p r)
cut (Γ' , A) p (⇒I q) = ⇒I (cut (Γ' , A , _) p q)
7.3.1 Naive approach. We can first think of directly translating the definition
of λ-terms given in section 3.1. We suppose fixed an infinite set of variables (say,
the strings),
Var : Set
Var = String
and define the syntax of λ-terms as
data Tm : Set where
var : Var → Tm
_·_ : Tm → Tm → Tm
ƛ_,_ : Var → Tm → Tm
meaning that a term is either of the form var x (the variable x), or t · u (the
application of t to u) or ƛ x , t (the function which to x associates t). The
weird choice of symbols in the last case comes from the fact that the dot (.)
and lambda (λ) are reserved in Agda.
We could proceed in this way, but one should remember that λ-terms are
not the terms generated by the above syntax, but rather their quotient under
α-equivalence (section 3.1.3). This means that we would have to define this equiv-
alence and show that all the constructions we are going to make are compatible
with it. This is rather long and painful.
Exercise 7.3.1.1. Try to properly define β-reduction with this formalization.
Lifting. The next thing we want to do is define β-reduction, but before being
able to do this, we first need to introduce helper functions in order to explicitly
manipulate variables, following section 3.6.2.
The first one is lifting, which can be thought of as creating a fresh variable
numbered x. After performing this operation, all the variable indices y which
are greater than or equal to x have to be increased by one in order to make room
for x. The new index of y after the creation of x is written ↑x y and defined by
↑x y = y if y < x        ↑x y = y + 1 if y ≥ x
Dually, assuming that the variable x does not occur, we can remove it: the new
index of a variable y after the removal of x is written ↓x y and defined by
↓x y = y if y < x        ↓x y = y − 1 if y > x
This function is not defined when y = x, because we have supposed that the
variable x is not used. In Agda, this can be defined as
↓ : (x y : ℕ) → x ≢ y → ℕ
↓ zero zero ¬p = ⊥-elim (¬p refl)
↓ zero (suc y) ¬p = y
↓ (suc x) zero ¬p = zero
↓ (suc x) (suc y) ¬p = suc (↓ x y (λ p → ¬p (cong suc p)))
and we write ↓ x y p for ↓x y: in addition to x and y, the Agda function takes
a third argument p which is a proof that x is different from y.
The above lifting operation can be extended to λ-terms. Given a variable x
and a λ-term t, the term ↑x t obtained after creating a fresh variable x will be
written here wk x t, because it is thought of as some form of weakening of the
term t. The weakening function wk is defined by
wk : ℕ → Tm → Tm
wk x (var y) = var (↑ x y)
wk x (t · t') = wk x t · wk x t'
wk x (ƛ t) = ƛ (wk (suc x) t)
This definition uses lifting on variables, and recursively applies weakening in ap-
plications and abstractions. There is a subtlety in the last case: since the
abstraction binds the variable 0 in the term t, a variable x in λ.t corresponds to
the variable x + 1 in t, which explains why we have to increase by one the
weakened variable when going under abstractions.
7.3.3 Keeping track of free variables. As a side note, let us present a re-
finement of the above formalization. Since the implementation of λ-calculus
with de Bruijn indices is quite technical and error-prone, it is sometimes useful
to have a type which is as precise as possible, in order to detect errors early. One
way to do so is to keep track of the free variables used in a term. Instead of
defining the type Tm of all terms, we can define, for each natural number n,
the type Tm n of terms whose free variables x are natural numbers such that
0 ≤ x < n. This last constraint is conveniently described by requiring that x is
an element of type Fin n, see section 6.4.8. This refinement of the formalization
avoids inadvertently getting wrong names for free variables and allows reasoning
by induction on the number of free variables of terms. We thus define
terms as
data Tm (n : ℕ) : Set where
var : Fin n → Tm n
_·_ : Tm n → Tm n → Tm n
ƛ_ : Tm (suc n) → Tm n
In the last case, the term t may use one more free variable, so that its
type is of the form Tm (suc n), and the abstraction will have one less free
variable since one variable was bound, so that the return type is Tm n.
Most previous functions can be adapted directly to this setting, so we
only give the refined types. The type now makes it clear that lifting
inserts a fresh variable
↑ : {n : ℕ} → Fin (suc n) → Fin n → Fin (suc n)
and so does weakening.
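The refined type of weakening would presumably be the following, where the
first argument is the freshly created variable (a sketch):

wk : {n : ℕ} → Fin (suc n) → Tm n → Tm (suc n)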
The rest of the developments can be performed in this way. We did not present
those here because they are more cumbersome to perform: in all the proofs,
we have to show that the number of free variables is correctly handled. The
formalization of section 7.5 is also quite close to this one: here, in addition to
keeping track of the number of variables, we will also keep track of their type.
⇉-sub x (⇉v y) ru with x ≟ y
⇉-sub x (⇉v y) ru | yes p = ru
⇉-sub x (⇉v y) ru | no ¬p = ⇉v (↓ x y ¬p)
⇉-sub x (⇉a rt₁ rt₂) ru = ⇉a (⇉-sub x rt₁ ru) (⇉-sub x rt₂ ru)
⇉-sub x (⇉λ rt) ru = ⇉λ (⇉-sub (suc x) rt (⇉-wk 0 ru))
⇉-sub x (⇉β {t} {t'} {u} {u'} rt₁ rt₂) ru =
  subst₂ _⇉_ refl
    (sym (sub-sub t' u' _ 0 x z≤n))
    (⇉β (⇉-sub (suc x) rt₁ (⇉-wk 0 ru)) (⇉-sub x rt₂ ru))
This function itself uses two auxiliary lemmas. The first one states that parallel
reduction is compatible with weakening:
⇉-wk : {t t' : Tm} (x : ℕ) → t ⇉ t' → wk x t ⇉ wk x t'
The second one consists of commutation properties of lifting and lowering, on
variables and on terms, holding under suitable hypotheses on x and y (typically
x ≤ y):
↑x (↑y z) = ↑(y+1) (↑x z)
↓x (↓y z) = ↓y (↓(x+1) z)
↑x (↑y t) = ↑(y+1) (↑x t)
⇒*→⇉* : {t u : Tm} → t ⇒* u → t ⇉* u
⇒*→⇉* ε = ε
⇒*→⇉* (r ◅ rr) = ⇒→⇉ r ◅ ⇒*→⇉* rr
Conversely, we can show that iterated parallel β-reduction implies iterated
β-reduction (see lemma 3.4.3.3, the formal proof being left to the reader):
⇉*→⇒* : {t u : Tm} → t ⇉* u → t ⇒* u
We can finally use this to deduce the confluence of β-reduction (theorem 3.4.4.1)
from that of parallel β-reduction shown above:
⇒-confl : {t u v : Tm} →
  t ⇒* u → t ⇒* v → Σ Tm (λ w → u ⇒* w × v ⇒* w)
⇒-confl rr ss with ⇉-confl (⇒*→⇉* rr) (⇒*→⇉* ss)
... | w , ss' , rr' = w , ⇉*→⇒* ss' , ⇉*→⇒* rr'
and
ilam : CL → Tm
ilam (var x) = var x
ilam (T · T') = ilam T · ilam T'
ilam S = ƛ ƛ ƛ (var 2 · var 0 · (var 1 · var 0))
ilam K = ƛ ƛ var (suc 0)
ilam I = ƛ (var 0)
and show the various lemmas expressing preservation of reduction, such as
lemma 3.6.3.7.
Types. We suppose fixed a countably infinite set of type variables, say the
natural numbers:
TVar : Set
TVar = ℕ
and the types are inductively defined as type variables or arrows between
types:
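The corresponding definitions, elided here, can presumably be given as follows
(a sketch consistent with the constructors used in the remainder of this section:
X builds a type variable, _⇒_ an arrow type, contexts are lists of types, and
_⊢_ is the type of typing derivations, or equivalently of intrinsically typed
terms):

data Type : Set where
  X : TVar → Type
  _⇒_ : Type → Type → Type

data Ctxt : Set where
  ∅ : Ctxt
  _,_ : Ctxt → Type → Ctxt

data _⊢_ : Ctxt → Type → Set where
  var : ∀ {Γ A} → Γ ∋ A → Γ ⊢ A
  _·_ : ∀ {Γ A B} → Γ ⊢ A ⇒ B → Γ ⊢ A → Γ ⊢ B
  ƛ_ : ∀ {Γ A B} → Γ , A ⊢ B → Γ ⊢ A ⇒ B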
where the constructors of the inductive type correspond to the typing rules of
simply typed λ-calculus given in section 4.1.4. In this formalization, we are
right in the middle of the Curry-Howard correspondence: a proof that Γ ⊢ A is
derivable is precisely a λ-term t of type A in the context Γ. In the constructor
corresponding to variables, we use Γ ∋ A, which is the type of proofs that a type
A belongs to Γ: such a proof essentially consists of a natural number n such that
the n-th element of Γ is A. Formally, it can be defined as follows:
data _∋_ : Ctxt → Type → Set where
  zero : ∀ {Γ A} → (Γ , A) ∋ A
  suc : ∀ {Γ B A} → Γ ∋ A → (Γ , B) ∋ A
Note that this corresponds to identifying variables by their de Bruijn index in
the context.
We will need the weakening rule
Γ ⊢ t : A
(wk)
Γ, x : B ⊢ t : A
and thus need to show that this rule is admissible. A naive approach would
consist in trying to show the following corresponding lemma:
wk : ∀ {Γ A B} → Γ ⊢ A → Γ , B ⊢ A
However, we cannot manage to prove it because the induction hypothesis is not
strong enough in the case of abstraction: we have to show
Γ, x : B, y : A ⊢ t : A′
(wk)
Γ, x : B ⊢ λy^A.t : A → A′
and we cannot use the induction hypothesis on the premise because the weak-
ened variable x is not in the last position in the context. In order to overcome
this problem, we could prove the following generalization of the weakening rule:
Γ, Δ ⊢ t : A
(wk)
Γ, x : B, Δ ⊢ t : A
It will turn out equally easy and more natural to prove the following even more
general version:
Γ ⊢ t : A
(wk)
Δ ⊢ t : A
whenever Γ is obtained from Δ by removing multiple typed variables, which
we write Γ ⊆ Δ (this corresponds to performing at once multiple weakening
rules in the previous version). We thus define the “inclusion” relation between
contexts as
data _⊆_ : Ctxt → Ctxt → Set where
∅⊆∅ : ∅ ⊆ ∅
keep : ∀ {Γ Δ A} → Γ ⊆ Δ → Γ , A ⊆ Δ , A
drop : ∀ {Γ Δ A} → Γ ⊆ Δ → Γ ⊆ Δ , A
Weakening is first defined on variables, by induction on the inclusion:
wk-var : ∀ {Γ Δ A} → Γ ⊆ Δ → Γ ∋ A → Δ ∋ A
wk-var (keep i) zero = zero
wk-var (keep i) (suc x) = suc (wk-var i x)
wk-var (drop i) x = suc (wk-var i x)
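The generalized weakening on terms, used below, then presumably follows by a
routine induction on terms (a sketch):

wk : ∀ {Γ Δ A} → Γ ⊆ Δ → Γ ⊢ A → Δ ⊢ A
wk i (var x) = var (wk-var i x)
wk i (t · u) = wk i t · wk i u
wk i (ƛ t) = ƛ (wk (keep i) t)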
Finally, we can show that the first weakening rule considered above can be
deduced as the particular case where the inclusion is of the form Γ ⊆ Γ , B:
wk-last : ∀ {Γ A B} → Γ ⊢ A → Γ , B ⊢ A
wk-last t = wk (drop ⊆-refl) t
where ⊆-refl is a proof that inclusion is reflexive:
⊆-refl : ∀ {Γ} → Γ ⊆ Γ
⊆-refl {∅} = ∅⊆∅
⊆-refl {Γ , A} = keep ⊆-refl
⟶β : ∀ {A B} {t : Γ , A ⊢ B} (u : Γ ⊢ A) →
  (ƛ t) · u ⟶ t [ u /0]
⟶l : ∀ {A B} {t t' : Γ ⊢ A ⇒ B} → t ⟶ t' → (u : Γ ⊢ A) →
  t · u ⟶ t' · u
⟶r : ∀ {A B} (t : Γ ⊢ A ⇒ B) → {u u' : Γ ⊢ A} → u ⟶ u' →
  t · u ⟶ t · u'
⟶λ : ∀ {A B} {t t' : Γ , A ⊢ B} → t ⟶ t' →
  ƛ t ⟶ ƛ t'
where we use substitution in the first case, as indicated above.
Strong normalizability. We first have to define what it means for the reduction
relation to be halting, or strongly normalizing. A term t is halting when there
is no infinite reduction starting from it. It is however generally a bad idea to
define concepts by negation, because we lose constructivity, and we will not
directly adopt this definition. Instead, we define by induction that a
term t is halting whenever all the terms it can reduce to are themselves halting:
halts : ∀ {Γ A} → Γ ⊢ A → Set
halts t = Acc _⟵_ t
where the opposite of the reduction relation is
_⟵_ : ∀ {Γ A} → Γ ⊢ A → Γ ⊢ A → Set
u ⟵ t = t ⟶ u
and iterated reduction is its reflexive and transitive closure:
_⟶*_ : ∀ {Γ A} → Γ ⊢ A → Γ ⊢ A → Set
_⟶*_ = Star _⟶_
Given a halting term t, the reduction relation is terminating on the terms u such
that t ⟶* u. We can thus reason by well-founded induction on it, i.e. we have
the following induction principle:
we deduce that t must also be halting. The last step is taken care of by the
lemma halts-vapp, which states that if the term t x is terminating then t is also
terminating:
halts-vapp : ∀ {Γ A B} (t : Γ ⊢ A ⇒ B) → (x : Γ ∋ A) →
  halts (t · var x) → halts t
The (easy) proof is left to the reader. We have to confess that we have been
cheating in the above proof: there is no reason that we should have a variable x
such that Γ ⊢ x : A, unless A belongs to Γ, which nothing guarantees here (this
was not a problem in the proof of proposition 4.2.2.1, because we were working
in an untyped setting). In our Agda proof, we have simply postulated the
existence of such a variable:
postulate x? : ∀ {Γ A} → Γ ∋ A
Of course, this is wrong, but it can be mitigated in two ways. First, if we had
a more full-fledged programming language with data types (natural numbers,
booleans, strings, etc.), we could prove in the same way that every program of
a type which does not contain type variables is terminating, by using values
instead of variables, and this would cover most cases of interest. For instance,
supposing that we have a type N of natural numbers, we can show that t ∈ R_N→N
because, by induction hypothesis, we have that t 5 is terminating, and reason as
above. Another way to solve the problem is to change slightly the proof of
the second case of CR1. Suppose that t ∈ R_Γ⊢A; by weakening we have
Γ, x : A ⊢ t : A, and now we have a variable x such that Γ, x : A ⊢ x : A: by
induction hypothesis we have that t x is halting, therefore the weakening of t is
terminating, and therefore t is terminating. In practice, this makes the proof
much more delicate, because we have to explicitly deal with matters related to
weakening: in Agda, the weakening of t is not the same as t. Moreover, the
definition of reducibility candidates has to be slightly generalized in order to
take weakening into account and have the right induction hypothesis [Sak14]:
R : {Γ : Ctxt} {A : Type} (t : Γ ⊢ A) → Set
R {Γ} {X _} t = halts t
R {Γ} {A ⇒ B} t = {Γ' : Ctxt} {u : Γ' ⊢ A} →
  (i : Γ ⊆ Γ') → R u → R (wk i t · u)
Strong normalization. Finally, we can deduce that simply typed λ-terms are
strongly normalizing by following section 4.2.2. We do not detail the proofs
here. Lemma 4.2.2.2 can be formalized as
R-abs : ∀ {Γ A B} (t : Γ , A ⊢ B) →
  ((u : Γ ⊢ A) → R (t [ u /0])) → R (ƛ t)
lemma 4.2.2.3 as
R-sub : ∀ {Γ A} (t : Γ ⊢ A)
  (σ : ∀ {B} → Γ ∋ B → Γ ⊢ B) →
  (∀ {B} → (x : Γ ∋ B) → R (σ x)) → R (t [ σ ])
the adequacy proposition 4.2.2.4 as
R-all : ∀ {Γ A} (t : Γ ⊢ A) → R t
Terms. Our actual formalization is inspired by [Arn17]. We use the same defi-
nitions as above for types, contexts, and λ-terms. Inspired by the notation for
bidirectional typechecking (section 4.4.5), we write Γ ⇓ A (resp. Γ ⇑ A) for the
type of normal forms (resp. neutral terms) of type A in the context Γ, defined
as the following inductive types:
data _⇓_ : Ctxt → Type → Set
data _⇑_ : Ctxt → Type → Set

data _⇓_ where
  abs : ∀ {Γ A B} → Γ , A ⇓ B → Γ ⇓ A ⇒ B
  neu : ∀ {Γ A} → Γ ⇑ A → Γ ⇓ A

data _⇑_ where
  var : ∀ {Γ A} → Γ ∋ A → Γ ⇑ A
  app : ∀ {Γ A B} → Γ ⇑ A ⇒ B → Γ ⇓ A → Γ ⇑ B
Note that those are not characterized here by a predicate on terms as before,
but rather implemented as new inductive types. For this reason, we need to
implement again weakening along context inclusions on those types:
_⇓[_] : ∀ {Γ Δ A} → Γ ⇓ A → Γ ⊆ Δ → Δ ⇓ A
_⇑[_] : ∀ {Γ Δ A} → Γ ⇑ A → Γ ⊆ Δ → Δ ⇑ A
abs t ⇓[ σ ] = abs (t ⇓[ keep σ ])
neu t ⇓[ σ ] = neu (t ⇑[ σ ])
var x ⇑[ σ ] = var (x v[ σ ])
app t u ⇑[ σ ] = app (t ⇑[ σ ]) (u ⇓[ σ ])
where the case of variables is handled by
_v[_] : ∀ {Γ Δ A} → Γ ∋ A → Γ ⊆ Δ → Δ ∋ A
zero v[ keep σ ] = zero
suc x v[ keep σ ] = suc (x v[ σ ])
x v[ drop σ ] = suc (x v[ σ ])
Reflection and reification. The reflection and reification functions are defined
by mutual induction, following their definition given above. We also need to
define a function Var which is the element of ⟦Γ ⊢ A⟧ (written Γ ⊨ A in Agda)
corresponding to the last variable of the context:
Var : ∀ {Γ A} → Γ , A ⊨ A
↑ : ∀ {Γ A} → Γ ⇑ A → Γ ⊨ A
↓ : ∀ {Γ A} → Γ ⊨ A → Γ ⇓ A
↑ {Γ} {X i} t = t
↑ {Γ} {A ⇒ B} t σ u = ↑ (app (t ⇑[ σ ]) (↓ u))
↓ {Γ} {X i} t = neu t
↓ {Γ} {A ⇒ B} f = abs (↓ (f (drop ⊆-refl) Var))
These are the environments adapted to terms whose free variables are in Γ: we
write Δ ⊨* Γ for the type of environments providing, in the context Δ, a
semantic value for each variable of Γ. We can define the interpretation of terms
following the above definition by
⟦_⟧ : ∀ {Γ Δ A} → Γ ⊢ A → Δ ⊨* Γ → Δ ⊨ A
⟦ var x ⟧ ρ = ρ x
⟦ t · u ⟧ ρ = (⟦ t ⟧ ρ) ⊆-refl (⟦ u ⟧ ρ)
⟦ ƛ t ⟧ ρ = λ σ u → ⟦ t ⟧ ((λ x → wk* σ (ρ x)) ,* u)
In this definition, we have used the following auxiliary function, which extends
an environment with a new value
_,*_ : ∀ {Γ Δ A} → Δ ⊨* Γ → Δ ⊨ A → Δ ⊨* (Γ , A)
(ρ ,* t) zero = t
(ρ ,* t) (suc x) = ρ x
as well as the following weakening principle for semantic values:
wk* : ∀ {Γ Δ A} → Γ ⊆ Δ → Γ ⊨ A → Δ ⊨ A
wk* {Γ} {Δ} {X i} σ t = t ⇑[ σ ]
wk* {Γ} {Δ} {A ⇒ B} σ f = λ τ t → f (⊆-trans σ τ) t
Finally, normalization consists in interpreting a term in the trivial environment
and reifying the result:
normalize : ∀ {Γ A} → Γ ⊢ A → Γ ⇓ A
normalize t = ↓ (⟦ t ⟧ id*)
where the trivial environment is
id* : ∀ {Γ} → Γ ⊨* Γ
id* x = ↑ (var x)
An example. For instance, we can define the term t = (λx.λy.x) x, whose typing
is x : X₀ ⊢ t : X₁ ⇒ X₀, by
K : ∅ , X 0 ⊢ X 0 ⇒ X 1 ⇒ X 0
K = ƛ (ƛ var (suc zero))
V : ∅ , X 0 ⊢ X 0
V = var zero
t : ∅ , X 0 ⊢ X 1 ⇒ X 0
t = K · V
If we ask Agda to compute (i.e. normalize) the normalized term normalize t,
we obtain abs (neu (var (suc zero))), i.e. the expected normal form λy.x.
Handling η-conversion. The above algorithm can be used in order to test whether
two λ-terms are β-convertible: in order to know whether t and u are convert-
ible, we simply need to check whether their respective normal forms t̂ and û are
equal. However, this does not work if we want to test for βη-convertibility. For
instance, the terms t = λx.λy.x y and u = λx.x, of type (X → Y) → X → Y,
are η-convertible and in normal form: we have t̂ = t ≠ u = û. In order to
overcome this problem, a nice solution consists in slightly tightening the notion
of normal form we consider, and requiring that terms with an arrow type be
abstractions: normal forms satisfying this are called η-long normal forms. In
the definition of normal forms, this amounts to allowing neutral terms to be
considered as normal ones only when they have a base type (and not an arrow
type), i.e. the definition of normal forms becomes
data _⇓_ where
  abs : ∀ {Γ A B} → Γ , A ⇓ B → Γ ⇓ A ⇒ B
  neu : ∀ {Γ i} → Γ ⇑ X i → Γ ⇓ X i
Exercise 7.5.3.1. Modify the above normalization by evaluation algorithm in
order to compute η-long normal forms.
Chapter 8

Dependent type theory
We now introduce the logic we have seen at work in Agda. The type theory that
we are presenting here was originally introduced by Martin-Löf in 1972 [ML75,
ML82, ML98], most of Martin-Löf’s work being freely accessible at [ML]. Its
types are said to be dependent because they can depend on values. For instance,
we can define a type Vec n of lists of length n, which depends on the natural
number n. Another major feature of this type theory is that we can manipulate
types as any other data: for instance, we can define functions which create types
from other types, etc. In order to make this possible, the distinction between
types and terms is dropped: types are simply the terms which admit a particular
type, called “Type”. Making all this work together nicely requires quite some
care.
The core of the type theory is presented in section 8.1, universes are
properly added in section 8.2, other usual type constructors in section 8.3 and
inductive types in section 8.4. The way a proof assistant based on dependent
types can be implemented is discussed in section 8.5.
In the following, we keep the old habit of writing t and A for expressions thought
of as terms and as types, even though we cannot syntactically distinguish be-
tween both. The expressions can be read as follows:
– x: a term or a type variable,
– t u: application of a term to a term (or a type),
– λx^A.t: the function (the λ-term) which, to an element x of type A, asso-
ciates t,
– Π(x : A).B: the type of (dependent) functions from A to B,
– Type: the type of all types.
In Agda notation, Π(x : A).B is written (x : A) → B and Type is written Set.
The set of free variables of an expression is defined by
FV(x) = {x}
FV(t u) = FV(t) ∪ FV(u)
FV(λx^A.t) = FV(A) ∪ (FV(t) \ {x})
FV(Π(x : A).B) = FV(A) ∪ (FV(B) \ {x})
FV(Type) = ∅
and substitution by
x[u/x] = u
y[u/x] = y    if x ≠ y
(t t′)[u/x] = (t[u/x]) (t′[u/x])
(λy^A.t)[u/x] = λy^{A[u/x]}.t[u/x]    with y ∉ FV(u) ∪ {x}
(Π(y : A).B)[u/x] = Π(y : A[u/x]).B[u/x]    with y ∉ FV(u) ∪ {x}
Type[u/x] = Type
A context Γ is a list
Γ = x₁ : A₁, . . . , xₙ : Aₙ
where the xᵢ are variables and the Aᵢ are expressions. We sometimes write ∅
for the empty context, although we usually omit writing it. The set of free
variables of a context is defined by
FV(Γ) = FV(A₁) ∪ . . . ∪ FV(Aₙ)
Note that the order of declarations in a context matters: in a context such as
l : Vec n, n : Nat
the variable n is used before being introduced.
8.1.5 Sequents. In order to take all of this into account, we need three
different forms of judgments in the sequent calculus:
– Γ ⊢ means that Γ is a well-formed context,
– Γ ⊢ t : A means that t has type A in the context Γ,
– Γ ⊢ t = u : A means that t and u are equal (i.e. convertible) terms of
type A in the context Γ.
As usual, we will give rules which allow the derivation of those judgments
through derivation trees. The derivation rules for all these three kinds of judg-
ments mutually depend on each other, so that they all have to be defined at
once.
As indicated above, there is no syntactic distinction between terms and
types: both are expressions. The logic will however allow us to distinguish
between the two. An expression A for which Γ ⊢ A : Type is derivable for some
context Γ is called a type. An expression t for which Γ ⊢ t : A is derivable for
some context Γ and type A is called a term.
8.1.6 Rules for contexts. There are two rules for contexts:
∅ ⊢
and
Γ ⊢ A : Type
Γ, x : A ⊢
The first one states that the empty context ∅ is always well-formed. The second
one states that if A is a well-formed type in a context Γ, then Γ, x : A is a well-
formed context. In the second rule, one would expect that we require that Γ is
a well-formed context as a premise, as in
Γ ⊢        Γ ⊢ A : Type
Γ, x : A ⊢
but we will see in section 8.1.11 that from the premise Γ ⊢ A : Type, we
will actually be able to deduce that Γ is a well-formed context (and similar
observations could be made on subsequent rules). As indicated above, the reason
why we need to ensure that A is a well-formed type in the context Γ is to avoid
considering a context such as
n : Bool, l : Vec n
8.1.7 Rules for equality. We now give the rules for definitional equality.
First, we have three rules ensuring that equality is an equivalence relation,
by respectively imposing reflexivity, symmetry and transitivity:
Γ ⊢ t : A
Γ ⊢ t = t : A

Γ ⊢ t = u : A
Γ ⊢ u = t : A

Γ ⊢ t = u : A        Γ ⊢ u = v : A
Γ ⊢ t = v : A
We will need the definitional equality to be not only an equivalence relation,
but a congruence: rules expressing compatibility with type constructors will be
added later on for each type constructor.
Finally, we add rules expressing the fact that a type can be substituted by
an equal one in a typing derivation:
Γ ⊢ t : A        Γ ⊢ A = B : Type
Γ ⊢ t : B

Γ ⊢ t = u : A        Γ ⊢ A = B : Type
Γ ⊢ t = u : B
8.1.8 Axiom rule. We now turn to rules allowing the typing of a term. The
axiom rule is
Γ, x : A, Γ′ ⊢
(ax)
Γ, x : A, Γ′ ⊢ x : A
with the following side conditions:
– x ∉ dom(Γ′), and
– FV(A) ∩ dom(Γ′) = ∅.
We follow the convention that a variable always refers to the rightmost occur-
rence of the variable in a context. With this in mind, the side conditions avoid
clearly wrong derivations such as
x : A, x : B ⊢ x : A (ax)        n : Nat, l : Vec n, n : Bool ⊢ l : Vec n (ax)
Alternatively, we could use the convention that the variables declared in a con-
text are always distinct, which we can always do because we consider terms up
to α-conversion, although this is a bad habit because we do not want to spend
our time performing α-conversions when implementing a proof assistant.
8.1.9 Terms and rules for type constructors. We now give the rules for
Π-types, which are generalized function types. As for any type constructor in
this type theory, we will need three constructions for expressions: the type
constructor Π(x : A).B itself, a way of constructing its terms, here the abstrac-
tion λx^A.t, and a way of eliminating them, here the application t u.
8.1.10 Rules for Π-types. The Π-types are dependent function types: they
are like the plain old function types, except that the type of the result might
depend on the argument. Such a type is noted
Π(x : A).B
(in Agda, (x : A) → B) and should be read as the type of functions taking an
argument x of type A and returning a value of type B. Here, the variable x
might occur in the type B, i.e. the type B can depend on x. For instance, a
function taking a natural number n as argument and returning a vector of
length n will have the type
Π(n : Nat). Vec n
see section 6.4.7 for actual uses of such functions. In a Π-type as above, the vari-
able x is bound in the type B, and we can rename bound variables. For instance,
the previous type is α-equivalent to Π(m : Nat). Vec m. From a logical point of
view, a type Π(x : A).B can be read as a universal quantification
∀x ∈ A.B
The formation rule
Γ ⊢ A : Type        Γ, x : A ⊢ B : Type
(ΠF)
Γ ⊢ Π(x : A).B : Type
allows constructing a type Π(x : A).B whenever A and B are well-formed
types. The introduction rule
Γ, x : A ⊢ t : B
(ΠI)
Γ ⊢ λx^A.t : Π(x : A).B
allows abstracting a term over a variable, and the elimination rule
Γ ⊢ t : Π(x : A).B        Γ ⊢ u : A
(ΠE)
Γ ⊢ t u : B[u/x]
types the application of a function to an argument. The computation rule is
Γ, x : A ⊢ t : B        Γ ⊢ u : A
(ΠC)
Γ ⊢ (λx^A.t) u = t[u/x] : B[u/x]
and the uniqueness rule is
Γ ⊢ t : Π(x : A).B
(ΠU)
Γ ⊢ t = λx^A.t x : Π(x : A).B
Finally, we have the congruence rules
Γ ⊢ A = A′ : Type        Γ, x : A ⊢ B = B′ : Type
Γ ⊢ Π(x : A).B = Π(x : A′).B′ : Type

Γ ⊢ A = A′ : Type        Γ, x : A ⊢ B : Type        Γ, x : A ⊢ t = t′ : B
Γ ⊢ λx^A.t = λx^{A′}.t′ : Π(x : A).B
and
Γ ⊢ t = t′ : Π(x : A).B        Γ ⊢ u = u′ : A
Γ ⊢ t u = t′ u′ : B[u/x]
They express the expected compatibility of equality with all the constructors
for expressions: Π-types, λ-abstractions, and applications. We will generally
omit the congruence rules in the following, but they should be formulated in a
similar way for every constructor.
Example 8.1.10.1. The polymorphic identity function, which takes a type A and
returns the identity function from A to A, can be typed as follows:
⋮
A : Type, x : A ⊢
(ax)
A : Type, x : A ⊢ x : A
(ΠI)
A : Type ⊢ λx^A.x : Π(x : A).A
(ΠI)
⊢ λA^Type.λx^A.x : Π(A : Type).Π(x : A).A
Arrow types. The traditional arrow type A → B can be recovered as the partic-
ular case of a Π-type Π(x : A).B which is not dependent, meaning that x does
not occur as a free variable in B. We thus write
A → B = Π(_ : A).B
where “_” is a variable name which is supposed to never occur in any type; in
particular, we always have B[t/_] = B. It can be checked that all the rules give
back the usual ones, up to notations. For instance, (ΠE) allows us to recover
the elimination rule:
Γ ⊢ t : A → B        Γ ⊢ u : A
(→E)
Γ ⊢ t u : B
8.1.11 Admissible rules. Many basic properties of the logical system can be
expressed as the admissibility of some rules, some of which we now indicate. We
concentrate on typing rules, i.e. judgments of the form Γ ⊢ t : A, but similar
admissible rules can usually be formulated for the two other kinds of judgments:
well-formation of contexts (Γ ⊢) and convertibility (Γ ⊢ t = u : A), details being
left to the reader. The proofs are, as usual, performed by induction on the
derivation of the judgment in the premise.
Before stating those, we first make the following simple, but useful, obser-
vation:
Lemma 8.1.11.1. For every derivable sequent Γ ⊢ t : A, we have the inclusions
FV(t) ⊆ dom(Γ) and FV(A) ⊆ dom(Γ).
Basic checks. The rules ensure that only well-formed types and contexts can be
manipulated at any point in a proof. This can be formulated as the admissibility
of the following rules:
Γ ⊢ t : A
Γ ⊢

Γ ⊢ t : A
Γ ⊢ A : Type
Weakening rule. The following weakening rule is admissible, accounting for the
fact that if some typing judgment holds in some context, it also holds with more
hypotheses in the context:
Γ ⊢ A : Type        Γ, Γ′ ⊢ t : B
(wk)
Γ, x : A, Γ′ ⊢ t : B
Exchange rule. The exchange rule states that we can swap two entries x : A and
y : B in a context, provided that there is no dependency between them, i.e. B
does not have x as free variable:
Γ ⊢ B : Type        Γ, x : A, y : B, Δ ⊢ t : C
Γ, y : B, x : A, Δ ⊢ t : C
Here, the hypothesis Γ ⊢ B : Type ensures that B does not depend on x by
lemma 8.1.11.1.
Cut rule. The type theory has the cut elimination property, which corresponds
to the admissibility of the following rule:
Γ ⊢ t : A        Γ, x : A, Δ ⊢ u : B
(cut)
Γ, Δ[t/x] ⊢ u[t/x] : B[t/x]
see sections 2.3.3 and 4.1.8.
8.2 Universes
8.2.1 The type of Type. There is one thing missing in the type theory we have
given up to now. Everything should have a type in the sequents we manipulate,
but the constant Type does not, because there is no rule allowing us to derive
one. For instance, in order to type the polymorphic identity in example 8.1.10.1,
we have to show that the context
A : Type, x : A
is well-formed, which will at some point require showing
⊢ Type : Type
which we have no rule to derive for now.
There is an obvious candidate for the rule we are lacking: we are tempted
to add the rule
Γ ⊢
Γ ⊢ Type : Type
which is sometimes called the type-in-type rule. This rule was in fact present
in the original Martin-Löf type system, but Girard showed that the resulting
system was inconsistent [Gir72]. A variant of this proof is presented below.
Encoding finite sets in OCaml. As a starter, let us first see how to implement
finite sets in OCaml. A finite set
A = {a₁, . . . , aₙ}
whose elements aᵢ belong to some fixed type 'a, can be described by giving its
elements: we can encode it as an array of elements, thus defining the type of
sets of elements of 'a as the type 'a array.
Encoding set theory in type theory. We can play the same game in type theory
and define finite sets of elements in a type A in the same way. Instead of using
arrays however, it is more natural to encode a finite set as a function of type
Fin n → A
(we recall that Fin n is the type whose elements are (isomorphic to) natural
numbers from 0 to n − 1). We can thus define
data finset (A : Set) : Set where
  Finset : {n : ℕ} → (Fin n → A) → finset A
In order to define “sets” of elements of A, instead of finite ones, we can allow
indexing by any type instead of Fin n. Finally, we can encode sets (in the sense
of set theory) as sets of sets. This suggests the following encoding of sets
data U : Set₁ where
  set : (I : Set) → (I → U) → U
which is due to Aczel [Acz78, Wer97]: a set consists of a type I of indices and a
function which assigns a set to every element of I. In order to avoid confusion
with the notation Set of Agda, we write U for the type of our sets.
With this encoding the usual constructions can be performed. For instance,
we can define the empty set:
∅ : U
∅ = set ⊥ (λ ())
the pairing of two sets:
_,_ : (A B : U) → U
A , B = set Bool (λ {false → A ; true → B})
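The singleton of a set, used below for the von Neumann numerals, can presum-
ably be encoded similarly, indexing by the unit type (a sketch):

[_] : U → U
[ A ] = set ⊤ (λ _ → A)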
the product of two sets:
prod : (A B : U) → U
prod (set I f) (set J g) =
set (I × J) (λ { (i , j) → f i , g j })
the equality of two sets (which implements the extensionality axiom):
_==_ : (A B : U) → Set
set I f == set J g =
((i : I) → Σ J (λ j → f i == g j)) ∧
((j : J) → Σ I (λ i → f i == g j))
the membership relation:
_∈_ : (A B : U) → Set
A ∈ set I f = Σ I (λ i → A == f i)
the union of sets (which implements the axiom of union):
⋃ : (A : U) → U
⋃ (set I f) =
  set (Σ I (λ i → dom (f i))) (λ { (i , j) → F (f i) j })
F : (A : U) → dom A → U
F (set _ f) = f
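Here dom returns the indexing type of a set, and the binary union of two sets
is derivable from pairing and ⋃; presumably (a sketch):

dom : U → Set
dom (set I _) = I

_∪_ : (A B : U) → U
A ∪ B = ⋃ (A , B)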
the von Neumann natural numbers (which implement the axiom of infinity):
vonN : ℕ → U
vonN zero = ∅
vonN (suc n) = vonN n ∪ [ vonN n ]
Nat : U
Nat = set ℕ vonN
and so on.
The Russell paradox. Now, suppose that we accept this type-in-type rule which
tells us that Type has type Type. This behavior can be achieved in Agda by
using the flag
{-# OPTIONS --type-in-type #-}
at the beginning of the file. We define sets as before, call a set regular when it
does not belong to itself,
regular : U → Set
regular A = ¬ (A ∈ A)
and consider Russell’s paradoxical set R of all sets which do not contain them-
selves, see section 5.3.1:
R : U
R = set (Σ U (λ A → regular A)) proj₁
comp : {A B C : Ord} {b : ∥ B ∥} {c : ∥ C ∥}
(f : BEmb A B b) (g : BEmb B C c) →
BEmb A C (fun (emb g) b)
whose proof is left to the reader. If we suppose that we have the type-in-type
rule, we can then derive an inconsistency along the following lines.
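The type U used in the code below, consistent with its constructor c, is pre-
sumably of the following non-strictly-positive shape (a sketch; Agda only accepts
such a declaration with positivity checking disabled):

data U : Set where
  c : (U → U) → U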
empty : {A : Set} → U → A
empty (c f) = empty (f (c (λ z → z)))
absurd : {A : Set} → A
absurd = empty (c (λ z → z))
8.2.4 The hierarchy of universes. How should we fix this? If we think of the
situation we already faced when considering naive set theory, the explanation
was that the collection of all sets was “too big” to be a set. Similarly, we think
of Type as being “too big” to be a type. However, we still need to give it a
type, and the natural next move is to introduce a new constructor, say TYPE,
which is the type of “big types”, together with the rule
Γ ⊢
Γ ⊢ Type : TYPE
stating that Type is a big type. However, we now need to give a type to TYPE,
which forces us to introduce a type of “very big types” and so on.
In the end, we introduce a hierarchy of types Typeᵢ indexed by natural
numbers i ∈ N, together with the rule
Γ ⊢
Γ ⊢ Typeᵢ : Typeᵢ₊₁
for every i ∈ N. The type Type is simply a notation for Type₀, Type₁ is the
type of “big types”, Type₂ is the type of “very big types”, Type₃ is the type of
“very very big types”, and so on:
Type₀ : Type₁ : Type₂ : Type₃ : · · ·
The types Typeᵢ are called universes and i is called the level of the universe
Typeᵢ. In order to make the theory more manageable, we also add a cumulativity
rule
Γ ⊢ A : Typeᵢ
Γ ⊢ A : Typeᵢ₊₁
which states that a “small” type can always be seen as a “bigger” type. This
allows us to see a type in a given universe as a type in a universe of higher
level, so that all constructions can be cast into higher levels if necessary and
we do not have to precisely take care of the levels. Finally, we change all the
type formation rules by adding levels to occurrences of Type. For instance, the
formation rule for Π-types becomes
Γ ⊢ A : Typeᵢ        Γ, x : A ⊢ B : Typeᵢ
(ΠF)
Γ ⊢ Π(x : A).B : Typeᵢ
Universes in Agda. In Agda, Set is a notation for Type₀, Set₁ is a notation for
Type₁ and so on. For instance, we can define the type of predicates on a type A
as
Predicate : (A : Set) → Set₁
Predicate A = A → Set
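For instance (a made-up illustration), the property of being equal to zero is a
predicate on natural numbers:

isZero : Predicate ℕ
isZero n = n ≡ zero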
Cumulative universes. Systems like Coq have the cumulativity rule built in, but
systems such as Agda chose not to, mostly for technical reasons. Since we do
not have it, the type formation rules now have to allow constructors to have
different levels, and for instance the formation rule for Π-types has to be
changed to
Γ ⊢ A : Typeᵢ        Γ, x : A ⊢ B : Typeⱼ
(ΠF)
Γ ⊢ Π(x : A).B : Type_{max(i,j)}
A type can still be lifted explicitly into a higher universe. Although we could
give the lifting operation a type such as Typeᵢ → Typeⱼ, since Agda can compute
the maximum ⊔ of two levels, we actually rather give it the type
Typeᵢ → Type_{i⊔j}, which also ensures that the returned level is greater than
the one given as input. The definition is performed based on the observation
that the lifted type should have the “same” elements as the original one, which
can be expressed by the following inductive type:
data Lift {i} j (A : Set i) : Set (i ⊔ j) where
  lift : A → Lift j A
8.3.1 Empty type. For the empty type, or falsity, we add the following two
constructions to expressions
e ::= . . . | ⊥ | bot(e, x ↦ e′)
The type ⊥ is the type for falsity, which is empty, and the construction
bot(t, x ↦ A)
eliminates a proof t of ⊥ in order to produce an element of an arbitrary type A,
which might depend on t as x.
Formation.
Γ ⊢
(⊥F)
Γ ⊢ ⊥ : Type
Introduction. No rule.
Elimination.
Γ ⊢ t : ⊥        Γ, x : ⊥ ⊢ A : Type
(⊥E)
Γ ⊢ bot(t, x ↦ A) : A[t/x]
Computation. No rule.
8.3.2 Unit type. For the unit type, or truth, we add the following construc-
tions to expressions:
e ::= . . . | ⊤ | ⋆ | top(e, x ↦ e′, e″)
where ⊤ is the type for truth, ⋆ is the constructor for truth and
top(t, x ↦ A, u)
is the associated eliminator.
Formation.
Γ ⊢
(⊤F)
Γ ⊢ ⊤ : Type
Introduction.
Γ ⊢
(⊤I)
Γ ⊢ ⋆ : ⊤
Elimination.
Γ ⊢ t : ⊤        Γ, x : ⊤ ⊢ A : Type        Γ ⊢ u : A[⋆/x]
(⊤E)
Γ ⊢ top(t, x ↦ A, u) : A[t/x]
Computation.
Γ, x : ⊤ ⊢ A : Type        Γ ⊢ u : A[⋆/x]
(⊤C)
Γ ⊢ top(⋆, x ↦ A, u) = u : A[⋆/x]
Uniqueness.
Γ ⊢ t : ⊤
(⊤U)
Γ ⊢ t = ⋆ : ⊤
In OCaml. The type ⊤ corresponds to unit, the constructor ⋆ to (), the elimi-
nator top(t, x ↦ A, u) to
match t with
| () -> u
the computation rule says that
match () with
| () -> u
evaluates to u, and uniqueness says that () is the only value of type unit.
8.3.3 Products. For the product, or conjunction, of two types, we add the
following constructions to expressions:
e ::= . . . | e × e′ | ⟨e, e′⟩ | unpair(e, z ↦ e′, ⟨x, y⟩ ↦ e″)
The type A × B is the product of A and B (it is sometimes also written A ∧ B).
The term ⟨t, u⟩ is the pair of two terms t and u and
unpair(t, z ↦ A, ⟨x, y⟩ ↦ u)
eliminates a pair t, extracting its components x and y, in order to construct a
proof u whose type is A, which might depend on t as z.
Formation.
Γ ⊢ A : Type        Γ ⊢ B : Type
(×F)
Γ ⊢ A × B : Type
Introduction.
Γ ⊢ t : A        Γ ⊢ u : B
(×I)
Γ ⊢ ⟨t, u⟩ : A × B
Elimination.
Γ ⊢ t : A × B
Γ, z : A × B ⊢ C : Type        Γ, x : A, y : B ⊢ u : C[⟨x, y⟩/z]
(×E)
Γ ⊢ unpair(t, z ↦ C, ⟨x, y⟩ ↦ u) : C[t/z]
Computation.
Γ ⊢ t : A
Γ ⊢ u : B        Γ, z : A × B ⊢ C : Type        Γ, x : A, y : B ⊢ v : C[⟨x, y⟩/z]
(×C)
Γ ⊢ unpair(⟨t, u⟩, z ↦ C, ⟨x, y⟩ ↦ v) = v[t/x, u/y] : C[⟨t, u⟩/z]
Uniqueness.
Γ ⊢ t : A × B
(×U)
Γ ⊢ unpair(t, z ↦ A × B, ⟨x, y⟩ ↦ ⟨x, y⟩) = t : A × B
8.3.4 Dependent sums. A dependent sum, or Σ-type, is noted
Σ(x : A).B
and the elements of this type are the pairs ⟨t, u⟩ consisting of a term t of type
A and a term u of type B[t/x]. From a logical point of view, this corresponds
to an existential quantification
∃x ∈ A.B
Formation.
Γ ⊢ A : Type        Γ, x : A ⊢ B : Type
(ΣF)
Γ ⊢ Σ(x : A).B : Type
Introduction.
Γ, x : A ⊢ B : Type        Γ ⊢ t : A        Γ ⊢ u : B[t/x]
(ΣI)
Γ ⊢ ⟨t, u⟩ : Σ(x : A).B
Elimination.
Γ ⊢ t : Σ(x : A).B
Γ, z : Σ(x : A).B ⊢ C : Type        Γ, x : A, y : B ⊢ u : C[⟨x, y⟩/z]
(ΣE)
Γ ⊢ unpair(t, z ↦ C, ⟨x, y⟩ ↦ u) : C[t/z]
Computation.
Γ ⊢ t : A        Γ ⊢ u : B[t/x]
Γ, z : Σ(x : A).B ⊢ C : Type        Γ, x : A, y : B ⊢ v : C[⟨x, y⟩/z]
(ΣC)
Γ ⊢ unpair(⟨t, u⟩, z ↦ C, ⟨x, y⟩ ↦ v) = v[t/x, u/y] : C[⟨t, u⟩/z]
Uniqueness.
Γ ⊢ t : Σ(x : A).B
(ΣU)
Γ ⊢ unpair(t, z ↦ Σ(x : A).B, ⟨x, y⟩ ↦ ⟨x, y⟩) = t : Σ(x : A).B
and a product is a particular case of this where the family is constant (i.e. nᵢ = n
for every index i).
8.3.5 Coproducts. The type A + B is the coproduct of A and B, which logically
corresponds to their disjunction. The elements of this type are either a term t
of A, written ιₗᴮ(t), or a term u of B, written ιᵣᴬ(u), and the eliminator
case(t, z ↦ C, x ↦ u, y ↦ v)
eliminates t to construct a term of type C (which might depend on t as z) by
considering whether it is of the first or the second form, in which case u or v is
returned.
Formation.
Γ ⊢ A : Type        Γ ⊢ B : Type
(+F)
Γ ⊢ A + B : Type
Introduction.
Γ ⊢ t : A        Γ ⊢ B : Type
(+ₗI)
Γ ⊢ ιₗᴮ(t) : A + B

Γ ⊢ A : Type        Γ ⊢ t : B
(+ᵣI)
Γ ⊢ ιᵣᴬ(t) : A + B
Elimination.
Γ ⊢ t : A + B        Γ, z : A + B ⊢ C : Type
Γ, x : A ⊢ u : C[ιₗᴮ(x)/z]        Γ, y : B ⊢ v : C[ιᵣᴬ(y)/z]
(+E)
Γ ⊢ case(t, z ↦ C, x ↦ u, y ↦ v) : C[t/z]
Computation.
Γ ⊢ t : A        Γ, z : A + B ⊢ C : Type
Γ, x : A ⊢ u : C[ιₗᴮ(x)/z]        Γ, y : B ⊢ v : C[ιᵣᴬ(y)/z]
(+ₗC)
Γ ⊢ case(ιₗᴮ(t), z ↦ C, x ↦ u, y ↦ v) = u[t/x] : C[ιₗᴮ(t)/z]
and symmetrically for the right injection.
Uniqueness.
Γ ⊢ t : A + B
(+U)
Γ ⊢ case(t, z ↦ A + B, x ↦ ιₗᴮ(x), y ↦ ιᵣᴬ(y)) = t : A + B
In OCaml. The type A + B corresponds to a variant type with two constructors
Left and Right, to which the injections ιₗ and ιᵣ correspond, and the eliminator
case(t, z ↦ C, x ↦ u, y ↦ v)
to
match t with
| Left x -> u
| Right y -> v
The left computation rule says that
match Left t with
| Left x -> u
| Right y -> v
reduces to u where x has been replaced by t (and similarly for the right one),
and the uniqueness rule says that
match t with
| Left x -> Left x
| Right y -> Right y
is the same as t.
In Agda. The standard notation for + is ⊎ and the notations for ιₗ and ιᵣ are
respectively inj₁ and inj₂, see section 6.5.6.
8.3.6 Booleans. For booleans, we add the following constructions to expres-
sions:
e ::= . . . | Bool | 1 | 0 | ite(e, x ↦ e′, e″, e‴)
where Bool is the type of booleans, 1 and 0 are the two boolean values, and
ite(t, x ↦ A, u, v) returns u or v depending on whether t is 1 or 0.
Formation.
Γ ⊢
(BoolF)
Γ ⊢ Bool : Type
Introduction.
Γ ⊢
(Bool1I)
Γ ⊢ 1 : Bool

Γ ⊢
(Bool0I)
Γ ⊢ 0 : Bool
Elimination.
Γ ⊢ t : Bool
Γ, x : Bool ⊢ A : Type        Γ ⊢ u : A[1/x]        Γ ⊢ v : A[0/x]
(BoolE)
Γ ⊢ ite(t, x ↦ A, u, v) : A[t/x]
Computation.
Γ, x : Bool ⊢ A : Type        Γ ⊢ u : A[1/x]        Γ ⊢ v : A[0/x]
(Bool1C)
Γ ⊢ ite(1, x ↦ A, u, v) = u : A[1/x]
and similarly the rule (Bool0C) states that ite(0, x ↦ A, u, v) = v : A[0/x].
Uniqueness.
Γ ⊢ t : Bool
(BoolU)
Γ ⊢ ite(t, x ↦ Bool, 1, 0) = t : Bool
In OCaml. Bool corresponds to the type bool, 1 and 0 correspond to true and
false respectively and the eliminator ite(t, x ↦ A, u, v) corresponds to
if t then u else v
The computation rule says that
if true then u else v
reduces to u and that
if false then u else v
reduces to v, and the uniqueness rule says that
if t then true else false
is the same as t.
8.3.7 Natural numbers. For natural numbers, we add the following construc-
tions to expressions:
e ::= . . . | Nat | Z | S(e) | rec(e, x ↦ e′, e″, xy ↦ e‴)
where Nat is the type of natural numbers, Z is zero, S(t) is the successor of t
and rec(t, x ↦ A, u, xy ↦ v) is the induction principle on t: u is the base case
and v is the inductive case.
Formation.
Γ ⊢
(NatF)
Γ ⊢ Nat : Type
Introduction.
Γ ⊢
(NatZI)
Γ ⊢ Z : Nat

Γ ⊢ t : Nat
(NatSI)
Γ ⊢ S(t) : Nat
Elimination.
Γ ⊢ t : Nat        Γ, x : Nat ⊢ A : Type
Γ ⊢ u : A[Z/x]        Γ, x : Nat, y : A ⊢ v : A[S(x)/x]
(NatE)
Γ ⊢ rec(t, x ↦ A, u, xy ↦ v) : A[t/x]
Computation.
Γ, x : Nat ⊢ A : Type
Γ ⊢ u : A[Z/x]        Γ, x : Nat, y : A ⊢ v : A[S(x)/x]
(NatZC)
Γ ⊢ rec(Z, x ↦ A, u, xy ↦ v) = u : A[Z/x]
and, for a term t of type Nat,
(NatSC)
Γ ⊢ rec(S(t), x ↦ A, u, xy ↦ v) = v[t/x, rec(t, x ↦ A, u, xy ↦ v)/y] : A[S(t)/x]
Uniqueness.
Γ ⊢ t : Nat
(NatU)
Γ ⊢ rec(t, x ↦ Nat, Z, xy ↦ S(y)) = t : Nat
reduces to u and ind (S t) reduces to v where x has been replaced by t and y
by ind t, and the uniqueness rule says that the function which maps Z to Z
and S t to S (ind t) is the identity.
8.3.8 Other type constructors. There are two fundamental type construc-
tions which were not given in this section: inductive types are presented in
section 8.4 and identity types are presented in section 9.1.
Finite families of types. Suppose that our type theory contains the type ⊥ with 0
elements (section 8.3.1), the type ⊤ with 1 element (section 8.3.2) and coproducts
(section 8.3.5). Given a natural number n, we can build a type Finₙ with n
elements as
Finₙ = ⊤ + ⊤ + . . . + ⊤
the sum being ⊥ in the case n = 0. For instance, the type Fin₄ with 4 elements
is
Fin₄ = ⊤ + (⊤ + (⊤ + ⊤))
A typical element of this type is ιᵣ(ιᵣ(ιₗ(⋆))), but we will simplify the notations
and write 0, 1, 2 and 3 for its elements. In Agda, we have already encountered
this type in section 6.4.8. It can be noted that, given a type A, defining a
function f : Finₙ → A precisely amounts to specifying n elements of A, those
elements being f(0), . . . , f(n − 1).
W-types. Now that we have made the previous remark, we can reformulate our
definition of inductive types using types with a finite number of elements instead
of natural numbers. A polynomial type consists of
– a type A with n elements, for some natural number n,
– for every element x of type A, a type B(x) with nₓ elements for some
natural number nₓ.
In other words, it consists of two types
A : Type        B : A → Type
such that A = Finₙ for some natural number n and, for every x : A, we have
B(x) = Fin_{nₓ} for some natural number nₓ. It turns out that this restriction to
the case where A and B(x) are finite types is not very useful in the following,
so we will drop it. Having an infinite type A (e.g. natural numbers) corre-
sponds to having an infinite number of constructors, which seems worrying at
first, but we will see that it is actually reasonable and useful.
Given a type A, and a type B which might have x as a free variable, we write
W(x : A).B
for the inductive type defined by this data and call it a W-type. Again, this
should be thought of as an inductive type with a constructor for each element x
of type A, this constructor taking as many arguments as there are elements
in B(x). The constructor W binds x in B, and α-conversion allows us to
rename it as we want.
Example 8.4.1.1. The type of binary trees, with a nullary constructor leaf and a binary constructor node, can be defined as a W-type, as sketched below.
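A minimal Agda sketch of such a definition (the plain, non-indexed W-type and the names Tree, leaf and node are our choices):

open import Data.Bool using (Bool; true; false)
open import Data.Empty using (⊥)

data W (A : Set) (B : A → Set) : Set where
  sup : (a : A) → (B a → W A B) → W A B

-- Two constructors: true stands for leaf (no arguments, hence ⊥),
-- false stands for node (two arguments, indexed by Bool).
Tree : Set
Tree = W Bool (λ { true → ⊥ ; false → Bool })

leaf : Tree
leaf = sup true (λ ())

node : Tree → Tree → Tree
node l r = sup false (λ { true → l ; false → r })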
We now wonder what the terms of type W(x : A).B look like. Consider the type of binary trees as defined in Agda above. A typical element of this type is

node (node leaf (node leaf leaf)) leaf

which consists of the constructor node, applied to two binary trees: the trees node leaf (node leaf leaf) and leaf. More generally, an element of the type
W(x : A).B consists of
– a constructor, i.e. an element a of A, and
– n elements of W(x : A).B, where n is the number of elements of the type
B a, which is most naturally specified by giving a function B a → W(x : A).B.
Encoding into W-types. The class of types which we can handle looks quite restricted, because the arguments of constructors can only be of the W-type itself. It is actually not, thanks to the extra generality brought by the possibility of having arbitrary types as A and B(x), and not only finite types. For instance, the type of lists
data List (A : Set) : Set where
nil : List A
cons : A → List A → List A
is not obviously a W-type because the constructor cons takes an argument of
type A, whereas we are trying to define List A, and thus the arguments of
constructors should have this type. However, instead of thinking of cons as one
constructor, we can think of it as an infinite family of constructors cons a, one
for each element a of A, each of which takes one argument of type List A. In
this way, it is natural to take Maybe A as the type of constructors where nothing
corresponds to the constructor nil and just a corresponds to cons a, and we
define
List : (A : Set) → Set
List A = W (Maybe A) (λ { nothing → ⊥ ; (just x) → ⊤ })
8.4.2 Rules for W-types. In order to add support for W-types, one should add the following constructions to expressions:

e ::= … | W(x : e).e′ | sup(e, e′) | Wrec(e, x ↦ e′, xyz ↦ e″)
Formation.
Γ, x : A ⊢ B : Type
──────────────────── (WF)
Γ ⊢ W(x : A).B : Type
Introduction.
Γ ⊢ t : A    Γ ⊢ u : B[t/x] → W(x : A).B
───────────────────────── (WI)
Γ ⊢ sup(t, u) : W(x : A).B
Elimination.
Γ ⊢ t : W(x : A).B    Γ, x : W(x : A).B ⊢ C : Type
Γ, x : A, y : B → W(x : A).B, z : Π(w : B).C[(y w)/x] ⊢ u : C[sup(x, y)/x]
──────────────────────────────────── (WE)
Γ ⊢ Wrec(t, x ↦ C, xyz ↦ u) : C[t/x]
Computation.
Γ ⊢ t : A    Γ, x : W(x : A).B ⊢ C : Type    Γ ⊢ u : B[t/x] → W(x : A).B
Γ, x : A, y : B → W(x : A).B, z : Π(w : B).C[(y w)/x] ⊢ v : C[sup(x, y)/x]
──────────────────────────────────── (WC)
Γ ⊢ Wrec(sup(t, u), x ↦ C, xyz ↦ v) = v[t/x, u/y, λw.Wrec(u w, x ↦ C, xyz ↦ v)/z] : C[sup(t, u)/x]
8.4.3 More inductive types. W-types are fine if you want to perform a clean and simple implementation of inductive types, or want to study metatheoretic properties of types. In practice, provers have more involved implementations of inductive types. One reason is user-friendliness: we want to be able to give nice names to constructors, have a nice syntax for pattern matching, generate pattern-matching cases automatically, etc. Also, we do not want the user to have to explicitly encode their types into W-types, and more generally we want to implement extensions of W-types. The interested reader is advised to look at good descriptions of actual inductive types in Agda [Nor07], in Coq [PM93] or in the theory of inductive families [Dyb94]. We list below some common extensions of inductive types.
Indexed types. Inductive types can also be indexed by values of another type. A typical example is the type Fin n of finite sets, defined as an inductive family, so that Fin n is a type with n elements. Here, the type takes a natural number n as argument, and various values for this argument are needed for constructors, e.g. suc needs an argument of type Fin n to produce a Fin (n+1).
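A standard Agda definition, sketching the elided code:

open import Data.Nat using (ℕ; suc)

data Fin : ℕ → Set where
  zero : {n : ℕ} → Fin (suc n)
  suc  : {n : ℕ} → Fin n → Fin (suc n)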
The definition of W-types can be modified in order to account for indices as
follows. We only give here the implementation in Agda:
data W (I : Set) (A : I → Set) (B : (i : I) → A i → I → Set) : I → Set where
  sup : (i : I) (a : A i) → ((j : I) → B i a j → W I A B j) → W I A B i
In this type, I is the type for indices, A i is the type indicating the constructors
with index i, and B i a j indicates the number of arguments of index j of the constructor a.
Example 8.4.3.1. For instance, in the case of Fin,
– I is the type of natural numbers,
– A 0 is the empty type (there is no constructor for Fin 0) and, for i > 0, A i is the type Bool with two elements (there are two constructors for Fin i: respectively zero and suc),
– for indices i and j,
  – the constructor zero of type Fin j takes no argument of type Fin i,
  – the constructor suc of type Fin j takes one argument of type Fin i when suc i is j, and no argument otherwise,
which determines the types B i a j.
We thus define the type A as

A : ℕ → Set
A zero    = ⊥
A (suc n) = Bool
the type B as

B : (n : ℕ) → A n → ℕ → Set
B zero    ()    m
B (suc n) false m = ⊥
B (suc n) true  m with n ≟ m
B (suc n) true  m | yes _ = ⊤
B (suc n) true  m | no _  = ⊥
and finally, the type of finite sets as

Fin : ℕ → Set
Fin n = W ℕ A B n
Exercise 8.4.3.2. Define the types Vec A n of vectors of length n containing
elements of type A using indexed W-types.
Mutually inductive types. One might want to define two inductive types which mutually depend on each other. For instance, trees and forests can be defined in a mutually inductive fashion as follows:
data Tree : Set
data Forest : Set
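The two forward declarations above have to be completed by actual definitions, elided here; a plausible sketch, with constructor names of our choosing:

data Tree where
  node : Forest → Tree

data Forest where
  nil  : Forest
  cons : Tree → Forest → Forest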
Nested inductive types. One might want to define inductive types in which arguments are other inductive types applied to the type itself. For instance, trees can also be defined as nodes taking lists of trees as arguments, lists being themselves defined as an inductive type:
open import Data.List
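A sketch of the elided definition:

data Tree : Set where
  node : List Tree → Tree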
Coinductive types. Inductive types are defined as smallest fixpoints, see section 1.3.3. For instance, the type of natural numbers is the smallest type con-
taining zero and closed under successor. It is also possible to consider greatest
fixpoints, and the resulting types are called coinductive types.
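For instance, streams (infinite lists) form a coinductive type, which can be sketched in Agda as a coinductive record (this example is ours):

record Stream (A : Set) : Set where
  coinductive
  field
    head : A
    tail : Stream A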
8.4.4 The positivity condition. When adding more general forms of induc-
tive types, one should be very careful. Adding seemingly useful or natural
inductive types can make the system inconsistent.
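For instance, suppose that we encode λ-terms by representing abstractions as functions of the language, along the following lines (a sketch of the elided definition; the positivity check has to be disabled, as explained below):

{-# NO_POSITIVITY_CHECK #-}
data Term : Set where
  abs : (Term → Term) → Term

-- An abstraction applied to a term β-reduces:
app : Term → Term → Term
app (abs f) t = f t

We can then define the looping term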
loop : Term
loop = app ω ω
where ω is defined as
ω : Term
ω = abs (λ x → app x x)
which contradicts the postulate that all terms should be terminating in Agda.
Indeed, if we consider the small variation where we define terms
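as follows (again a sketch of the elided variant; the name Bad matches the error message quoted below):

{-# NO_POSITIVITY_CHECK #-}
data Bad : Set where
  bad : (Bad → ⊥) → Bad

then we can derive an inhabitant of the empty type:

self : Bad → ⊥
self (bad f) = f (bad f)

absurd : ⊥
absurd = self (bad self)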
The positivity condition. In practice, when defining the type Bad in Agda, we
get an error message stating that
Bad is not strictly positive, because it occurs to the left of an
arrow in the type of the constructor bad in the definition of Bad.
This message indicates that our type is rejected, thus preventing the logic from being inconsistent, because it does not satisfy the “strict positivity condition” explained below. In order to test the above examples, you can however disable this check by writing
this check by writing
{-# NO_POSITIVITY_CHECK #-}
just before the definition of the type.
In order to build intuition, first consider traditional functions between sets. We write A ⇒ B for the set of all functions from a set A to a set B. Given sets A, A′, B and B′, it can be noted that

B ⊆ B′  implies  (A ⇒ B) ⊆ (A ⇒ B′)
A ⊆ A′  implies  (A ⇒ B) ⊇ (A′ ⇒ B)
The arrow is thus “covariant” in its target and “contravariant” in its source. Similarly, consider simple types generated by the grammar

A, B ::= X | A → B

where X ranges over type variables. An occurrence of a variable in a type is positive (resp. negative) when it occurs to the left of an even (resp. odd) number of arrows. For instance, in the type

A → ((B → C) → (D → E))

the types A, C and D are negative and B and E are positive. The syntactic tree of the type can be written as follows
of the type can be written as follows
→⁺
├── A⁻
└── →⁺
    ├── →⁻
    │   ├── B⁺
    │   └── C⁻
    └── →⁺
        ├── D⁻
        └── E⁺
-- Polarities
data Polarity : Set where
  pos : Polarity
  neg : Polarity

-- Opposite of a polarity
op : Polarity → Polarity
op pos = neg
op neg = pos

-- Types
data Type : Set where
  var : ℕ → Type
  arr : Type → Type → Type
For instance, in the type A → ((B → C) → (D → E)) above, starting from the root of the tree:
– the whole type is positive, thus
– A is negative and (B → C) → (D → E) is positive, thus
– B → C is negative and D → E is positive, thus
– B is positive, C is negative, D is negative and E is positive.
In the implementation below, the body of an abstraction in a value is represented by a function of the programming language (OCaml in our case). Moreover, this allows for an efficient implementation of convertibility: instead of fully performing β-reduction, we can compute weak head normal forms, so that we can potentially detect that two terms are not equal without fully reducing them.
type expr =
| Var of string (** a variable *)
| Abs of string * expr (** a lambda-abstraction *)
| App of expr * expr (** an application *)
| Pi of string * expr * expr (** a Pi-type *)
| Type of int (** a universe *)
| Nat (** type of natural numbers *)
| Zero (** zero *)
| Succ of expr (** successor *)
| Rec of expr * expr * expr * expr (** recurrence *)
| Id of expr * expr * expr (** identity type *)
| Refl of expr (** reflexivity *)
| J of expr * expr * expr * expr * expr * expr (** id elim *)
An expression is thus either a variable, a λ-abstraction λx.t written Abs(x, t), an application t u written App(t, u), a Π-type Π(x : A).B written Pi(x, A, B), a universe of given level, the type of natural numbers, zero, the successor of a natural number, the recurrence principle
rec(n, x ↦ A, z, mr ↦ s)
written
Rec(n, Abs(x, a), z, Abs(m, Abs(r, s)))
(note that we use abstractions as arguments of Rec in order to avoid having
to handle α-conversion here, and only take care of it for abstractions, see sec-
tion 4.3.3), an identity type IdA(t, u) written Id(a, t, u), a reflexivity proof refl(t) written Refl(t), or a J eliminator J(e, xyz ↦ A, x ↦ r), again written with abstractions as arguments of J.
Values. A term will evaluate to a value which is, by definition, a term which
does not reduce anymore. The type corresponding to values is
type value =
| VAbs of (value -> value) (** a lambda-abstraction *)
| VPi of value * (value -> value) (** a Pi-type *)
| VType of int (** a universe *)
| VNat (** type of natural numbers *)
| VZero (** zero *)
| VSucc of value (** successor *)
| VId of value * value * value (** identity type *)
| VRefl of value (** reflexivity *)
| VNeutral of neutral (** a neutral value *)
which roughly corresponds to the definition of expressions, with a few notable
differences, as we now explain. For abstractions (VAbs), the body is not yet evaluated, because we are computing weak head normal forms: instead, we have a function which, given an argument, will compute the normal form of the body with the argument substituted in, as expected. Similarly, a Π-type Π(x : A).B is stored in VPi as the type A together with the function λx.B, which provides the type B given an argument x of type A. The last case corresponds to neutral
values: those are expressions in which the computation is not fully performed,
but is stuck because we do not know the value for some variable. For instance,
given a variable x and a term t, the term x t is a value: in order to evaluate
this application, we would need to know the value for x, which should be a
λ-abstraction. Neutral values are defined by the type
and neutral =
| NVar of string
| NApp of neutral * value
| NRec of neutral * value * value * value
| NJ of value * value * value * value * value * neutral
and thus consist either of a variable, or a neutral value applied to a value
(typically, x t), or a recurrence on a neutral value (e.g. a recurrence on a variable)
or an elimination of a neutral proof of identity.
8.5.2 Evaluation. We can then easily write a function which applies a value u to another value v. In the case where u is an abstraction, we apply it to v. Otherwise, if we assume that the terms are suitably typed, u has to be a neutral value (e.g. we cannot apply a natural number to some other term), in which case the result is still a neutral value:
let vapp u v =
match u with
| VAbs f -> f v
| VNeutral t -> VNeutral (NApp (t, v))
| _ -> assert false
Thanks to this helper function, we can write a function eval which evaluates an expression t to a value. The function also takes an environment env, which is a list of pairs associating to each free variable its value, in the case it is known.
A readback function converts a value back to an expression, and two values will be α-convertible when they have the same readback. We can therefore test the equality of two values t and u by comparing their readbacks.
More efficient equality. The above test for equality of values is not very efficient: it essentially requires evaluating whole terms, which can be very costly, and is unnecessary when the two terms are not equal. For instance, the two terms VAbs f and VZero are not equal, and there is no need to proceed to the evaluation of f in order to determine this. The following refined test for equality takes this into account: it combines readback and comparison, and amounts to computing the weak head normal forms of the two terms (see section 3.5.1) in order to compare them, only evaluating under abstractions when the two weak head normal forms are abstractions.
let rec veq k t u =
let rec neq k t u =
match t, u with
| NVar x, NVar y -> x = y
| NApp (t, v), NApp (t', v') ->
neq k t t' && veq k v v'
| NRec (n, a, z, s), NRec (n', a', z', s') ->
neq k n n' && veq k a a' && veq k z z' && veq k s s'
| NJ (a, p, r, t, u, e), NJ (a', p', r', t', u', e') ->
veq k a a' && veq k p p' && veq k r r' &&
veq k t t' && veq k u u' && neq k e e'
| _, _ -> false
in
match t, u with
| VAbs f, VAbs g ->
let x = var (fresh k) in
veq (k+1) (f x) (g x)
| VPi (a, b), VPi (a', b') ->
let x = var (fresh k) in
veq k a a' && veq (k+1) (b x) (b' x)
| VType i, VType j -> i = j
| VNeutral t, VNeutral u -> neq k t u
| VNat, VNat -> true
| VZero, VZero -> true
| VSucc t, VSucc u -> veq k t u
| VId (a, t, u), VId (a', t', u') ->
veq k a a' && veq k t t' && veq k u u'
| VRefl t, VRefl u -> veq k t u
| _, _ -> false
The helper function neq compares neutral values for equality.
Exercise 8.5.3.1. Modify this function in order to compare values for η-equi-
valence. You should start by adding a new argument to the function which is
the common type of the two values. See also exercise 7.5.3.1.
The type inference function infer is defined by mutual induction with a function universe which checks that an expression is a type and returns its universe level:
and universe k tenv env t =
match infer k tenv env t with
| VType i -> i
| _ -> raise Type_error
and with a function which checks that a term t has a given type a:
and check k tenv env t a : unit =
match t, a with
| Abs (x, t), VPi (a, b) ->
let y = var (fresh k) in
check (k+1) ((x,a)::tenv) ((x,y)::env) t (b y)
| Refl t, VId (_, u, v) ->
let t = eval env t in
if not (veq k t u) then raise Type_error;
if not (veq k t v) then raise Type_error
| t, a ->
let a' = infer k tenv env t in
if not (veq k a a') then raise Type_error
Note that the case where the term is an abstraction λx.t (constructor Abs) and the type is a Π-type Π(y : A).B (constructor VPi) is subtle: when checking that the body t has type B, we do so after replacing both x and y by a common fresh variable.
8.5.5 Testing. In order to test our implementation, we can check that the
addition has the type A = Nat → Nat → Nat:
let () =
let a = varr VNat (varr VNat VNat) in
let t =
Abs (
"m",
Rec (
Var "m",
Pi ("_", Nat, Pi ("_", Nat, Nat)),
Abs ("n", Var "n"),
Abs ("m",
Abs ("r",
Abs ("n", Succ (App (Var "r", Var "n")))))
)
)
in
check 0 [] [] t a
Of course, it is not reasonable to proceed in this way in order to use the implementation, and one should implement a proper lexer and parser. We do not describe this part here since it is outside the scope of this book.
Chapter 9

Homotopy type theory
+-zero' : (n : ℕ) → zero + n ≡ n
+-zero' n = refl

whereas the right unitality is more involved:

+-zero : (n : ℕ) → n + zero ≡ n
+-zero zero = refl
+-zero (suc n) = cong suc (+-zero n)
In this chapter, we mostly focus on propositional equality, leaving the def-
initional one implicit as it should be, and sometimes simply say equality for
propositional equality. The propositional equality is also referred to as identity
and a type t ≡ u as an identity type.
9.1.3 The rules. The rules for propositional equality, or identity types, follow
from the above definition of equality as an inductive type, but can also be
given directly, as for other connectives. These were first formulated by Martin-
Löf [ML75].
We extend the syntax of expressions with

e ::= … | Idₑ(e′, e″) | refl(e) | J(e, xyz ↦ e′, x′ ↦ e″)
The new constructions are the following:
– the type IdA (t, u) is called an identity type and expresses the fact that two
terms t and u of type A are equal,
– refl(t) is the reflexivity of t, and
– J is the eliminator for identities.
In the following, we will often simply write t ≡ u instead of IdA (t, u), in accor-
dance with Agda’s notation for equality types.
Formation. The formation rule states that we can consider the type of propo-
sitional equalities, or identities, between any two terms t and u of the same
type:
Γ ⊢ t : A    Γ ⊢ u : A
───────────────── (IdF)
Γ ⊢ IdA(t, u) : Type
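Introduction and elimination. The remaining rules are the standard Martin-Löf ones, stated here in the form matching the computation rule below: reflexivity introduces an identity, and J eliminates one.

Γ ⊢ t : A
───────── (IdI)
Γ ⊢ refl(t) : IdA(t, t)

Γ ⊢ p : IdA(t, u)
Γ, x : A, y : A, z : IdA(x, y) ⊢ B : Type    Γ, x : A ⊢ r : B[x/x, x/y, refl(x)/z]
──────────────────────────────── (IdE)
Γ ⊢ J(p, xyz ↦ B, x ↦ r) : B[t/x, u/y, p/z]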
Computation. The computation rule expresses the fact that, when we use a
proof constructed by J in the case where the considered proof of identity is
reflexivity, we recover the proof r we provided:
Γ ⊢ t : A    Γ, x : A, y : A, z : IdA(x, y) ⊢ B : Type
Γ, x : A ⊢ r : B[x/x, x/y, refl(x)/z]
──────────────────────────────── (IdC)
Γ ⊢ J(refl(t), xyz ↦ B, x ↦ r) = r[t/x] : B[t/x, t/y, refl(t)/z]
Uniqueness. Finally, the uniqueness rule states that eliminating a proof of identity by reconstructing an identity with reflexivity yields back the original proof:

Γ ⊢ p : IdA(t, u)
──────────────── (IdU)
Γ ⊢ J(p, xyz ↦ IdA(x, y), x ↦ refl(x)) = p : IdA(t, u)
9.1.4 Leibniz equality. The definition of equality given above is perhaps not the first one one might think of. Another definition, which is perhaps easier to accept, was proposed by Leibniz [Lei86]. In this context, two things are said to be indiscernible when every property satisfied by one is also satisfied by the other. The implication

identical ⇒ indiscernible

states that two equal things satisfy the same properties, and the converse implication

indiscernible ⇒ identical

is called the principle of identity of indiscernibles: it states that two things satisfying the same properties are the same. This is somehow an “interactive” point of view on the world, considering that in order for two things to be distinct, there should be some sort of experiment which allows distinguishing between the two. Leibniz postulated that both principles hold, i.e. the two notions are equivalent.
The reference often quoted for the second principle comes from [Lei86] (the passage goes on with assertions such as “One can even say that every substance bears in some way the character of God's infinite wisdom and omnipotence, and imitates it as much as it is capable of”, which are less clear from a logical point of view). If we also accept this implication, then we can in fact take indiscernibility as a definition of equality. This is sometimes called Leibniz equality:
Definition 9.1.4.1 (Leibniz equality). Two things are equal when every property satisfied by one is also satisfied by the other.

We write x ≐ y when x and y are equal according to Leibniz's definition, i.e. when for every predicate P(z), with a free variable z, P(x) implies P(y).
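In Agda, Leibniz equality and its reflexivity can be sketched as follows (the elided definitions, with the names used below):

_≐_ : {A : Set} (x y : A) → Set₁
_≐_ {A} x y = (P : A → Set) → P x → P y

≐-refl : {A : Set} {x : A} → x ≐ x
≐-refl P Px = Px

Equality implies Leibniz equality: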
≡-to-≐ : {A : Set} {x y : A} → x ≡ y → x ≐ y
≡-to-≐ refl = ≐-refl
and the converse implication can be obtained as a variant of the proof that ≐ is symmetric:
≐-to-≡ : {A : Set} {x y : A} → x ≐ y → x ≡ y
≐-to-≡ {x = x} e = e (λ z → x ≡ z) refl
More details can be found in [ACD+ 18].
Extensional equality on pairs. Two pairs are extensionally equal when their
members are equal. It is easy to show that two extensionally equal pairs are
equal:
×-≡ : {A B : Set} {x x' : A} {y y' : B} →
x ≡ x' → y ≡ y' → (x , y) ≡ (x' , y')
×-≡ refl refl = refl
and conversely, two equal pairs are extensionally so:
≡-× : {A B : Set} {x x' : A} {y y' : B} →
(x , y) ≡ (x' , y') → (x ≡ x') × (y ≡ y')
≡-× refl = refl , refl
Extensional equality on lists. Similarly, two lists are extensionally equal when
they have the same (i.e. equal) elements. In Agda, this relation can be defined
inductively by
data _==_ {A : Set} : (l l' : List A) → Set where
==-[] : [] == []
==-∷ : {x x' : A} {l l' : List A} →
x ≡ x' → l == l' → (x ∷ l) == (x' ∷ l')
This relation is easily shown to be reflexive by induction
==-refl : {A : Set} (l : List A) → l == l
==-refl [] = ==-[]
==-refl (x ∷ l) = ==-∷ refl (==-refl l)
from which one can show that equality implies extensional equality:
≡-== : {A : Set} {l l' : List A} → l ≡ l' → l == l'
≡-== {l = l} refl = ==-refl l
Conversely, one can show that two lists with the same head and the same tail
are equal:
≡-∷ : {A : Set} → {x x' : A} → {l l' : List A} →
x ≡ x' → l ≡ l' → x ∷ l ≡ x' ∷ l'
≡-∷ refl refl = refl
from which one can deduce that two extensionally equal lists are equal:
==-≡ : {A : Set} {l l' : List A} → l == l' → l ≡ l'
==-≡ ==-[] = refl
==-≡ (==-∷ x e) = ≡-∷ x (==-≡ e)
This diagram makes it plausible that showing that p is the same as q (in the sense that p ≡ q) should be equivalent to showing that the path from y to y, obtained as the concatenation of p taken backward followed by q, is the same as the reflexivity on y. And indeed, one can show the following implication:
loop-≡ : {A : Set} {x y : A} (p q : x ≡ y) →
trans (sym p) q ≡ refl → p ≡ q
loop-≡ refl q h = sym h
from which one deduces that URP implies UIP:
URP-UIP : URP → UIP
URP-UIP URP p q = loop-≡ p q (URP (trans (sym p) q))
Similarly, one can show that URP implies K:

URP-K : URP → K
URP-K URP P r p = subst P (sym (URP p)) r
and that K implies URP:
K-URP : K → URP
K-URP K p = K (λ p → p ≡ refl) refl p
Note that K is a slight variant of the eliminator J, where we consider propositions depending on proofs of x ≡ x (instead of x ≡ y), whence the name. However, K cannot be proved from J (try it!): this can be demonstrated by observing that there are non-trivial models of homotopy type theory which validate the latter but not the former. The first such model was found by Hofmann and Streicher [HS98], by interpreting types as groupoids.
Paths. The reason why this interpretation is useful to reason about identities is that we now have a representation for them: they correspond to paths, as we now explain. We write I for the interval space: concretely, it can be defined as the set I = [0, 1] of reals between 0 and 1 (both included) equipped with the euclidean topology. A path in a space A from a point x to a point y is a continuous function

p : I → A

such that p(0) = x and p(1) = y.
Interpreting types. From now on, we are going to work with the following interpretation of types in mind:
– we interpret a type A as a space,
– a term t of type A as a point of the space A,
– a term p of type IdA(x, y) as a path in A from x to y; in particular, refl(x) corresponds to the constant path from x to x.
We insist on the fact that the interpretations of functions are always continuous, even if we omit mentioning it.
Given two elements x, y : A and two identities p, q : IdA (x, y), consider an
identity α : IdIdA (x,y) (p, q) from p to q. Topologically, it will correspond to a
continuous way of deforming the path p into the path q within paths from x
to y, i.e. the endpoints are fixed. It thus corresponds to a surface α filling the space between p and q in A, with corners x and y.
Moreover, since the circle is hollow, there can be no continuous way of deform-
ing p into q. A type theory which can account for such a type will not validate
the principle of uniqueness of identity proofs.
(figure: a point x and a circle, with a map f from the point to the circle and a map g back, z being a point on the circle)
We can still define functions f and g in the same way. Moreover, given a point z
in B there is still a path from y = f ◦ g(z) to z. However, there is no way of
choosing such a path in a continuous way: when moving z around the circle, at
some point the path has to jump from turning counterclockwise to clockwise,
or from turning once around the circle to turning twice, etc. For this reason the
point and the circle are not homotopy equivalent.
Remark 9.2.1.1. In the previous example, it can be noted that the circle has a hole whereas the point does not. It can be shown that homotopy equivalence preserves the number of holes in any dimension [Hat02] (these are measured by the Betti numbers, which are closely related to homotopy groups), from which we could have easily seen that the two spaces are not equivalent. There is actually even a subtle converse to this property. A map f : A → B between spaces is a weak homotopy equivalence when it induces a bijection between the holes (in any dimension) of A and those of B (as a particular case, it should induce a bijection between the path components of A and those of B). When A and B are “nice” spaces, by which we mean gluings of disks (traditionally called CW-complexes), a map f : A → B is a weak homotopy equivalence if and only if it is a homotopy equivalence (this is Whitehead's theorem). The restriction to CW-complexes is not really a limitation here, because any space can be shown to be weakly homotopy equivalent to a CW-complex.
UIP : (x y : A) (p q : x ≡ y) → p ≡ q
UIP x .x refl q = ?

If we now also try to match the remaining proof q against refl, we end up with

UIP : (x y : A) (p q : x ≡ y) → p ≡ q
UIP x .x refl refl = ?
which means that we are now restricting to the space A″ reduced to a point, obtained from A′ by assimilating q to the constant path. This step should not be valid because, as we have seen, A′ and A″ are not homotopy equivalent. In fact, if we activate the flag --without-K of Agda, as we should always do, Agda rejects this last step by issuing an error:
I'm not sure if there should be a case for the constructor refl,
because I get stuck when trying to solve the following unification
problems (inferred index ≟ expected index):
x₁ ≟ x₁
Possible reason why unification failed:
Cannot eliminate reflexive equation x ₁ = x ₁ of type A ₁ because K
has been disabled.
when checking that the expression ? has type refl ≡ q
which is its verbose way of saying that we are trying to do something forbidden in the absence of axiom K.
Univalence. We will see that the type theory (without K) does not exactly match the intuition that we have of types as spaces: some properties which we expect to hold cannot be proved. The reason is that we lack some ways of constructing equalities. For instance, we cannot construct non-trivial equalities between functions: typically, we cannot prove function extensionality. In order for logic and topology to match precisely, one needs to assume an axiom, called univalence. It will only be presented in section 9.4, but we will mention some of the properties which it allows proving before that, in order to motivate the need for it (e.g. function extensionality will be a consequence of it).
9.2.2 The structure of paths. We shall now study the constructions and
operations which are available on paths. The first one, which we have seen many
times, is the construction of the constant path on a point x, which is simply
given by refl. Given two paths p : x ≡ y and q : y ≡ z such that the end of p
matches the beginning of q (both are y), we can build their concatenation p · q, which is a path from x to z. If we see them as continuous functions p : I → A and q : I → A, where I is the interval [0, 1], this is defined as

(p · q)(t) = p(2t)       if 0 ≤ t ≤ 1/2,
(p · q)(t) = q(2t − 1)   if 1/2 ≤ t ≤ 1.
In the following, we will generally not give such explicit constructions, and simply provide the formalization in Agda.
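In this case, a plausible sketch of that formalization (in the notations of this chapter) is:

_∙_ : ∀ {i} {A : Type i} {x y z : A} → x ≡ y → y ≡ z → x ≡ z
refl ∙ q = q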
Note that, topologically, the concatenation refl · p waits at x during the first half of the time and then follows p at double speed. It is thus a different function from p: we usually don't have (refl · p)(t) = p(t) for every t ∈ I. They are however homotopic, in the sense that there is a path (i.e. a deformation) from the former to the latter (exercise: explicitly define this path). Similarly, the constant path is also a unit on the right for concatenation:
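Plausible sketches of the elided definitions (the names ∙-unit-r, ! and ∙-inv-l are those used in the sequel):

∙-unit-r : ∀ {i} {A : Type i} {x y : A} (p : x ≡ y) → p ∙ refl ≡ p
∙-unit-r refl = refl

-- Reversal of a path:
! : ∀ {i} {A : Type i} {x y : A} → x ≡ y → y ≡ x
! refl = refl

-- Reversal is a left inverse for concatenation:
∙-inv-l : ∀ {i} {A : Type i} {x y : A} (p : x ≡ y) → ! p ∙ p ≡ refl
∙-inv-l refl = refl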
which means that taking a path backwards and then forward is the same (up to
homotopy) as doing nothing (try it in the street). The same holds on the right:
∙-inv-r : ∀ {i} {A : Type i} {x y : A} →
(p : x ≡ y) → p ∙ ! p ≡ refl
∙-inv-r refl = refl
and taking the inverse twice does nothing:
!-! : ∀ {i} {A : Type i} {x y : A} → (p : x ≡ y) → ! (! p) ≡ p
!-! refl = refl
Every type is thus equipped with a structure of points and paths such that
– concatenation is associative and admits constant paths as neutral elements on both sides,
– every path admits a path which is an inverse on both sides.
A groupoid is precisely this structure, if we assume that the two above axioms hold up to equality (as opposed to up to homotopy): it consists of a set (of points or objects), together with a set (of paths or morphisms) between any two points, equipped with a composition and identities (constant paths), such that composition is associative, unital and admits inverses. The first model of dependent type theory which did not validate UIP was precisely obtained by interpreting types as groupoids [HS98]. It can be seen as a “degenerate” version of the model of spaces, in the sense that the only paths between paths are constant paths.
9.3 n-types
Now that we have this point of view on types as spaces, we can start classify-
ing types depending on their topological properties. A particularly interesting
classification is given by n-types, which are types which contain no holes of
dimension k > n, for some natural number n.
9.3.1 Propositions. The simplest kind of types are propositions [Uni13, Section 3.3]. We can think of a proposition as a type which is either empty or reduced to a point (up to homotopy): it carries no more information than the fact of being inhabited or not, so that any two of its elements are equal. This suggests defining a predicate isProp, with the intended meaning that isProp A holds when the type A is a proposition.
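A plausible sketch of the elided definition:

isProp : ∀ {i} → Type i → Type i
isProp A = (x y : A) → x ≡ y

For instance, the unit type is a proposition: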
⊤-isProp : isProp ⊤
⊤-isProp tt tt = refl
We can also show that the type of booleans is not a proposition since it has two
points, true and false, which are not equal:
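A plausible sketch of the elided argument (the name is ours; the equality of the two constructors is refuted by an absurd pattern):

Bool-isProp-absurd : isProp Bool → ⊥
Bool-isProp-absurd P with P true false
... | ()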
The logical connectives and the corresponding type constructors are

on types:           ×   ⊔   →   Π   Σ
on propositions:    ∧   ∨   ⇒   ∀   ∃

(we are using here the more traditional notation ⊔ instead of the usual Agda notation ⊎ for coproduct). The Curry-Howard correspondence allowed us to identify both lines, but now that we have a rich type theory, we can tear logic and types apart again! In order for this to make sense, we should check that the operations are well-defined on propositions, i.e. that the result is a proposition when applied to propositions. We will see that it is actually not always the case and that their definitions have to be adapted to properly operate on propositions.
Propositions are closed under products, i.e. the product of two propositions
is itself a proposition:
×-isProp : ∀ {i j} {A : Type i} {B : Type j} →
isProp A → isProp B → isProp (A × B)
×-isProp PA PB (a , b) (a' , b') with PA a a' , PB b b'
×-isProp PA PB (a , b) (.a , .b) | refl , refl = refl
We can therefore simply define the conjunction of propositions as their product:
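A plausible sketch of the elided definition:

_∧_ : ∀ {i j} → Type i → Type j → Type (lmax i j)
A ∧ B = A × B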
The coproduct of two propositions, however, is not a proposition in general (we can only show that it is a set), so we defer the definition of disjunction. For the same reasons as for coproducts, propositions are not closed under Σ-types, and we also defer the definition of the ∃ quantifier.
As a final remark on connectives on propositions, we mention that it would
be cleaner and more conceptual to define them directly on hProp, i.e. have them
provide the proof that they produce propositions. For instance, conjunction
could be defined as
_∧_ : ∀ {i j} → hProp i → hProp j → hProp (lmax i j)
(A , PA) ∧ (B , PB) = (A × B) , ×-isProp PA PB
We choose not to do this here in order to keep closer to bare metal and avoid
small lemmas which would obfuscate the code at first read.
Predicates and propositions. For any type A, the type isProp A is itself a propo-
sition: being a proposition is a proposition. If this was not the case, there could
be multiple reasons why a proposition could be a proposition, and the meaning
of this would be rather obscure. The proof, which is called isProp-isProp, is
deferred to section 9.3.2.
Up to now, we have formalized a predicate on a type A as a function P
whose type is A → Set, see section 6.5.9. For the same reasons as above, such a
function really deserves the name of predicate only when it is the case that P x
is a proposition for every element x of type A. We can thus formalize the fact of
being a predicate as
isPred : ∀ {i j} {A : Type i} → (A → Set j) → Set (lmax i j)
isPred {_} {_} {A} P = (x : A) → isProp (P x)
The function isProp-isProp described above shows that isProp is a predicate:
isProp-isPred : ∀ {j} → isPred {j = j} isProp
isProp-isPred A = isProp-isProp
Similarly, as expected, being a predicate is itself a predicate:
isPred-isPred : ∀ {i j} {A} → isPred (isPred {i} {j} {A})
isPred-isPred P = Π-isProp (λ x → isProp-isProp)
We also expect two logically equivalent propositions to be equal,

(A ⇔ B) ⇒ (A ≡ B)

and, in particular, a proposition which does not hold to be equal to ⊥:

isProp(A) ⇒ ¬A ⇒ (A ≡ ⊥)

We currently cannot show this, but we will see in section 9.4.6 that it can be proved if we assume the univalence axiom.
9.3.2 Sets. After having considered propositions, the next interesting kind of types are sets [Uni13, section 3.1]. Those are types which are collections of points (up to homotopy): a typical set is thus a discrete collection of points, whereas a type with a non-trivial path, such as the circle, is not a set because it is not a collection of points. In a set, two points x and y are either in the same connected component, in which case they are equal in a unique way (up to homotopy), or they are in distinct components, in which case they are not equal. Otherwise said, if they are equal, they should be uniquely so. This suggests defining the following predicate for sets:
isSet : ∀ {i} → Type i → Type i
isSet A = (x y : A) (p q : x ≡ y) → p ≡ q
For instance, natural numbers form a set (using a lemma suc-pred-≡ relating a path p : suc m ≡ suc n to its image under pred and suc):

ℕ-isSet : isSet ℕ
ℕ-isSet zero zero refl refl = refl
ℕ-isSet (suc m) (suc n) p q =
  p                  ≡⟨ suc-pred-≡ p ⟩
  ap suc (ap pred p) ≡⟨ ap (ap suc) (ℕ-isSet m n _ _) ⟩
  ap suc (ap pred q) ≡⟨ sym (suc-pred-≡ q) ⟩
  q ∎
More generally, all the basic datatypes we usually use (natural numbers, strings,
etc.) are sets. This includes the types of previous section, since one can show
that every proposition is a set, see below. Moreover, all usual type constructors
(lists, vectors, etc.) preserve the fact of being a set.
Exercise 9.3.2.1. Show that the type List A is a set when A is a set.
Closure properties. Sets are closed under most usual operations (products, co-
products, arrows, Π-types, Σ-types), as expected from set theory. As an illus-
tration, let us show the closure under products. Recall from section 9.1.5 that,
given two types A and B, a pair of paths p : x ≡ x′ in A and q : y ≡ y′ in B canonically induces a path from (x, y) to (x′, y′) in A × B, which we abusively write (p, q) : (x, y) ≡ (x′, y′) here:
×-≡ : ∀ {i j} {A : Type i} {B : Type j} {x x' : A} {y y' : B} →
x ≡ x' → y ≡ y' → (x , y) ≡ (x' , y')
×-≡ refl refl = refl
Moreover, every path in A × B is equal to a path of this form. More precisely, a path p : (x, y) ≡ (x′, y′) in A × B induces, by congruence under the projections, paths pA : x ≡ x′ and pB : y ≡ y′, and the path induced by pA and pB using the previous function is equal to p, i.e. p ≡ (pA, pB):
×-≡-η : ∀ {i} {j} {A : Type i} {B : Type j}
{z z' : A × B} {p : z ≡ z'} →
p ≡ ×-≡ (ap fst p) (ap snd p)
×-≡-η {p = refl} = refl
Finally, we can use this to show that the product A × B of the sets A and B is
itself a set. Namely, given parallel paths p and q in A × B, we have
p ≡ (pA , pB ) ≡ (qA , qB ) ≡ q
where the first and last equalities come from the previous observation, and the
one in the middle follows from the fact that we have pA ≡ qA and pB ≡ qB
because both A and B are sets:
×-isSet : ∀ {i j} {A : Type i} {B : Type j} →
isSet A → isSet B → isSet (A × B)
×-isSet SA SB (x , y) (x' , y') p q =
  p                          ≡⟨ ×-≡-η ⟩
  ×-≡ (ap fst p) (ap snd p)  ≡⟨ ap2 ×-≡ (SA x x' (ap fst p) (ap fst q))
                                        (SB y y' (ap snd p) (ap snd q)) ⟩
  ×-≡ (ap fst q) (ap snd q)  ≡⟨ sym ×-≡-η ⟩
  q ∎
Propositions are sets. Any proposition is a set [Uni13, Lemma 3.3.4]. This is
intuitively expected because a proposition should be either empty or a point,
and thus a particular case of a collection of points. Consider a proposition A,
and two paths p, q : x ≡ y between points x and y of A. In order to show that A
is a set, we have to show that the paths p and q are equal, which is not easily
done directly. Instead, we are going to show that both are equal to a third
“canonical” path.
Fix a point z in A. Since A is a proposition, for every point x of A, there is a path px : z ≡ x. We now have a candidate for the canonical path: let us show that p ≡ px⁻¹ · py. By induction on p, this is immediate, since when p = refl(x), we have refl(x) ≡ px⁻¹ · px, see section 9.2.2.
Hedberg’s theorem. An abstract reason why most usual types are sets is because
they have decidable equality: Hedberg’s theorem states that any type with a
decidable equality is necessarily a set [Hed98, KECA16] and [Uni13, section 7.2].
For instance, we can decide the equality of natural numbers (see section 6.6.8),
therefore they form a set (which we have already proved directly above).
We recall that a type A is said to be decidable when ¬A ⊔ A holds, i.e. we can either show that it is empty or produce an element of it:

isDec : ∀ {i} → Type i → Type i
isDec A = ¬ A ⊎ A
In particular, a type A has decidable equality when we can decide whether any
two elements of A are equal or not:
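A plausible sketch of the corresponding predicate (the name isDecEq is ours):

isDecEq : ∀ {i} → Type i → Type i
isDecEq A = (x y : A) → isDec (x ≡ y)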
Any decidable type satisfies the double negation elimination property ¬¬A → A, and therefore decidability of equality implies that equality has the double negation elimination property. In this section, by “having a decidable equality”, we will therefore without loss of generality mean “having an equality with the double negation elimination property”, which we write isNNEq:
Suppose that the type A has decidable equality. In order to show that A
is a set, we have to show that any two paths p, q : x ≡ y are equal. The
proof strategy here is the same as above: we should show that p is equal to
a “canonical” path of type x ≡ y, the path q will similarly be equal to this
path and we will be able to conclude. The fact that A has decidable equality
provides us with a canonical path between x and y. Namely, the existence of
the path p implies that we have a proof λk.kp of ¬¬(x ≡ y) and the double
negation elimination property provides us with a path x ≡ y:
nnePath : ∀ {i} {A : Type i} → isNNEq A →
{x y : A} → (p : x ≡ y) → x ≡ y
nnePath N {x} {y} p = N x y (λ k → k p)
This path is canonical, in the sense that it does not depend on the choice of
the path p. Namely, we know from section 9.3.1 that the type ¬¬(x ≡ y) is a
proposition (any negation of a type is). In particular, given two paths p and q
of type x ≡ y, the proofs λk.kp and λk.kq of ¬¬(x ≡ y) are equal and therefore
induce equal paths of type x ≡ y by elimination of double negation:
nnePathIndep : ∀ {i} {A : Type i} (N : isNNEq A) {x y : A}
(p q : x ≡ y) → nnePath N p ≡ nnePath N q
nnePathIndep N {x} {y} p q =
ap (N x y) ((¬-isProp (λ k → k p) (λ k → k q)))
In this way, we have constructed a canonical path px,y : x ≡ y, which depends only on x and y. Finally, we want to show that p ≡ px,y, i.e. that an arbitrary path p of type x ≡ y agrees with the canonical one, which is done essentially as in the previous proof that propositions are sets.
Groupoids. It can be observed that the definition of being a set can be reformu-
lated as:
isSet : ∀ {i} → Type i → Type i
isSet A = (x y : A) → isProp (x ≡ y)
i.e. a set is a type such that, for every pair of points x and y, the type x ≡ y is a proposition. This reformulation suggests the next thing to try: we define a
groupoid as a type such that for every pair of points x and y, the type x ≡ y is
a set:
isGroupoid : ∀ {i} → Type i → Type i
isGroupoid A = (x y : A) → isSet (x ≡ y)
In a groupoid, two points x and y might be equal in multiple ways, but there should be at most one equality between two paths p, q : x ≡ y. Typically, the circle is a groupoid, but the sphere is not: between two points x and y of the sphere there are two paths p and q, and between those paths there are two non-homotopic deformations (through the front or the back hemisphere).
The hierarchy. Continuing in this way, we define the notion of n-type, or a type
of homotopy level n, by recurrence on n [Uni13, Chapter 7]:
– a 0-type is a set, and
– an (n+1)-type is a type such that the type x ≡ y is an n-type, for all points x and y.
In particular, a 1-type is a groupoid.
The intuition is that an n-type is a type which is trivial in dimension higher than n, in the sense that it does not contain any non-trivial k-sphere for k > n. In low dimensions k, the k-spheres can be described as follows: the 0-sphere is a pair of points, the 1-sphere is the circle and the 2-sphere is the usual sphere.
Negative types. The choice of n = 0 for sets is made in order to agree with traditional conventions in mathematics, but the hierarchy can be extended a bit to negative numbers. We have seen that a set is such that x ≡ y is a proposition for all points x and y, so that it makes sense to define a (−1)-type to be a proposition: with this convention, a 0-type is a type in which x ≡ y is a (−1)-type, in accordance with the above definition.
Can we also make sense of a (−2)-type? In a (−1)-type, i.e. a proposition,
for every pair of points x and y, we should have that x ≡ y is a (−2)-type. Since
in a proposition every pair of points is related by a unique path, a (−2)-type can
be defined as a contractible type, i.e. a type which is a point up to homotopy,
see below. If we go on with this reasoning, we find that a (−3)-type should still
be a contractible type, so that we stop at dimension n = −2.
Contractible types. In Agda, the predicate of being contractible for a type can
be defined as
isContr : ∀ {i} → Type i → Type i
isContr A = Σ A (λ x → (y : A) → x ≡ y)
It expresses the fact that a type is contractible when it contains a point x such
that for every point y there is a path py from x to y. Typically, the type ⊤ is contractible since every point of it is equal to the only constructor tt:

⊤-isContr : isContr ⊤
⊤-isContr = tt , (λ { tt → refl })
Once again, it might seem that the circle is contractible because there is a path between any two points, but it is not so, because the choice of the path py has to be made continuously in y, which is not possible for the circle. A contractible type is homotopy equivalent to a point.
n-types in Agda. We can define a predicate hasLevel such that hasLevel (n+2)
A holds when A is an n-type (we start at n = −2 instead of n = 0) by
hasLevel : ∀ {i} → ℕ → Type i → Type i
hasLevel zero A = isContr A
hasLevel (suc n) A = (x y : A) → hasLevel n (x ≡ y)
Remark 9.3.3.1. Note that for a type A, being a (−1)-type according to the
above definition (i.e. satisfying hasLevel 1) requires slightly more than the pre-
vious definition of propositions: for every pair of points x and y, there should
be a path p : x ≡ y as before, but we should also show that for every other
path q : x ≡ y, we have p ≡ q. However, the second requirement is automatic if we carefully choose the paths, so that the two definitions coincide:
isProp-is1Type : ∀ {i} → {A : Type i} → isProp A → hasLevel 1 A
isProp-is1Type p x y = ! (p x x) ∙ p x y ,
λ { refl → ∙-inv-l (p x x) }
(we are using the same trick here as for Hedberg's theorem, see section 9.3.2).
The property of being an n-type. One can show that the property of being an
n-type is a proposition: a type either is an n-type or not, but there cannot be
multiple ways in which a type is an n-type.
For the base case, one has to show that being contractible is a proposition.
Suppose given two proofs (x, p) and (y, q) that a type A is contractible, where x
(resp. y) is a point of A and p (resp. q) associates to every point z of A a
path x ≡ z (resp. y ≡ z). Showing that these two proofs are equal amounts
to showing that x ≡ y, which is given by py, and that p ≡ q. Assuming function
extensionality, this last point is equivalent to showing that, for every point z
in A, the paths pz : x ≡ z and qz : x ≡ z are equal, up to some transport of
the first. Since A is contractible (we have a proof (x, p) of it), it is a 0-type
(i.e. a set) by cumulativity, and therefore any two parallel paths in it are equal,
thus pz ≡ qz :
isContr-isProp : ∀ {i} {A : Type i} → isProp (isContr A)
isContr-isProp {_} {A} (x , p) (y , q) =
Σ-≡ (p y) (funext (λ z → fst (A-isSet y z _ (q z))))
where
A-isSet : hasLevel 2 A
A-isSet = hasLevel-cumulative (hasLevel-cumulative (x , p))
The inductive case is handled immediately using function extensionality:
hasLevel-isProp : ∀ {i} {A : Type i} (n : ℕ) → isProp (hasLevel n A)
hasLevel-isProp zero = isContr-isProp
hasLevel-isProp (suc n) f g =
funext2 (λ x y → hasLevel-isProp n (f x y) (g x y))
of an actual proof. Therefore, the type ‖A‖ should be empty when A is, and reduced to a point otherwise. If A is decidable, this operation is easy to define: either A or ¬A holds, and we respectively define ‖A‖ = ⊤ or ‖A‖ = ⊥. However, since we do not live in a classical world, we cannot define propositional truncation in this way. A more faithful description is that the propositional truncation starts from the type A and adds a path between any pair of points in order to turn it into a proposition, see section 9.5.4.
The syntax of expressions is extended with

e ::= … | ‖e‖ | ‖e‖isProp | |e| | rec(e, e′, x ↦ e″)

where
– ‖A‖ is the propositional truncation of A,
– ‖A‖isProp is a proof that ‖A‖ is a proposition,
– |t| provides a proof that ‖A‖ is non-empty when there is a term t of type A,
– rec(t, B, x ↦ u) is the eliminator for truncated types.
The formation rules state that the propositional truncation ‖A‖ exists for every type A and is a proposition:

Γ ⊢ A : Type
──────────── (‖‖F)
Γ ⊢ ‖A‖ : Type

Γ ⊢ A : Type
─────────────────────── (‖‖F′)
Γ ⊢ ‖A‖isProp : isProp(‖A‖)
The introduction rule states that the propositional truncation ‖A‖ is non-empty when A is:

Γ ⊢ t : A
────────── (‖‖I)
Γ ⊢ |t| : ‖A‖
The elimination rule states that if we have an element of ‖A‖, then we can assume that we have an element of A, provided that the type we are currently proving (or “eliminating into”) is a proposition:

Γ ⊢ t : ‖A‖    Γ, x : A ⊢ u : B    Γ ⊢ P : isProp(B)
─────────────────────── (‖‖E)
Γ ⊢ rec(t, B, x ↦ u) : B
The computation rule states that the element of A given by the elimination rule above is t when the witness given for ‖A‖ is |t|:

Γ ⊢ t : A    Γ, x : A ⊢ u : B    Γ ⊢ P : isProp(B)
────────────────────────────── (‖‖C)
Γ ⊢ rec(|t|, B, x ↦ u) = u[t/x] : B
Finally, the uniqueness rule is

Γ ⊢ t : ‖A‖    Γ ⊢ P : isProp(A)
──────────────────────── (‖‖U)
Γ ⊢ |rec(t, A, x ↦ x)| = t : ‖A‖
Remark 9.3.4.1. For simplicity, we have given the rules in the non-dependent case, which is the most useful one in practice. For full generality, we should allow B to depend on ‖A‖ and adapt the rules accordingly. For instance, the elimination rule should be

Γ ⊢ t : ‖A‖
Γ, x : ‖A‖ ⊢ B : Type    Γ, x : A ⊢ u : B[|x|/x]    Γ, x : ‖A‖ ⊢ P : isProp(B)
───────────────────────────── (‖‖E)
Γ ⊢ rec(t, x ↦ B, x ↦ u) : B[t/x]
In Agda, the eliminator ∥∥-rec can be postulated; its computation rule is
postulate ∥∥-comp : ∀ {i j} {A : Type i} {B : Type j} →
(P : isProp B) (f : A → B) (x : A) →
∥∥-rec P f ∣ x ∣ ≡ f x
and uniqueness is
postulate ∥∥-eta : ∀ {i} {A : Type i} (P : isProp A) (x : ∥ A ∥) →
∣ ∥∥-rec P id x ∣ ≡ x
Logical connectives. Remember from section 9.3.1 that we had difficulties defin-
ing the disjunction of propositions because the coproduct of two propositions is
not a proposition in general (we can only show that it is a set). Now that we
have the propositional truncation at hand, we can use it in order to squash the result of the coproduct into a proposition. We can thus define disjunction as

_∨_ : ∀ {i j} → Type i → Type j → Type (lmax i j)
A ∨ B = ∥ A ⊎ B ∥
The disjunction of two propositions is now a proposition by definition. Similarly, the existential quantification is a truncated variant of Σ-types: ∃ A P can be defined as ∥ Σ A P ∥. As an application, consider the axiom of choice. Given types A and B and a relation R between them, the type

(x : A) → Σ B (λ y → R x y)
witnesses the fact that every element of A is in relation with some element of B.
A term of this type is a function r which to every x ∈ A associates a pair
consisting of an element y ∈ B together with a proof that the pair (x, y) is in
the relation R. From this data it is easy to construct a function A → B (by
post-composing r with the first projection) and a proof that we have (x, r(x))
in the relation R for every x ∈ A:
cac : ∀ {i j k} → CAC {i} {j} {k}
cac R f = (λ x → fst (f x)) , (λ x → snd (f x))
In some sense this was “too easy”, because the function r directly provided us
with a way to construct a suitable element of B from an element of A.
A more faithful way of implementing the axiom of choice in type theory con-
sists, instead of supposing that we have a function r as above, in only supposing
the existence of such a function, i.e. that its propositional truncation is inhab-
ited, otherwise said we use an existential quantification instead of a Σ-type.
Similarly, as a result, we only want to show that there exists a suitable function
from A to B, without explicitly constructing it. The “right” formulation of the
axiom of choice is thus:
AC : ∀ {i j k} → Type (lmax (lmax (lsuc i) (lsuc j)) (lsuc k))
AC {i} {j} {k} = {A : Type i} {B : Type j} →
isSet A → isSet B →
(R : A → B → Type k) →
((x : A) (y : B) → isProp (R x y)) →
(r : (x : A) → ∃ B (λ y → R x y)) →
∃ (A → B) (λ f → (x : A) → R x (f x))
Note that, since we are serious about homotopy levels, we have also restricted
to the case where A and B are sets and R x y is a proposition for every element
x of A and y of B (the axiom without this restriction would be inconsistent
with univalence [Uni13, Lemma 3.8.5]). There is also a dependent variant of
this axiom (where the type B is allowed to depend on A):
F : U
F = (λ b → b ≡ false ∨ P) , ∣ false , ∣ inl refl ∣ ∣
T : U
T = ((λ b → b ≡ true ∨ P)) , ∣ true , ∣ inl refl ∣ ∣
An element Q of U consists of a subset Q′ of Bool together with a proof Q″ that Q′ is non-empty. The family consisting of all Q′ such that Q belongs to U is thus a family of non-empty sets and, by the axiom of choice, it admits a choice function: we have a function f which to every element Q of U associates an element of Q′.
We will prove in the function
dec : ((Q : U) → Σ Bool (fst Q)) → P ∨ ¬ P
that this entails that P ∨ ¬P holds, from which we will be able to conclude as
explained above:
Diaconescu : isProp P → PAC → P ∨ ¬ P
Diaconescu prop ac = ∥∥-rec ∥∥-isProp dec
(ac {A = U} {B = (λ Q → Σ Bool (fst Q))} (λ Q → snd Q))
The crux of this proof is thus the function dec. It proceeds by case analysis
on f F and f T :
– if f F is true then true ≡ false ∨ P holds and thus P holds,
– if f T is false then false ≡ true ∨ P holds and thus P holds,
– if f F is false and f T is true then we can show that ¬P holds.
The subtle case is the last one, when f F is false and f T is true, because this
entails that false ≡ false ∨ P and true ≡ true ∨ P hold, from which we
cannot extract information. However, we can show that ¬P holds in this case.
Namely, suppose that P holds (we write x for its proof) and let us deduce ⊥.
Since P holds, by definition of F and T we have F b ⇔ T b for every boolean b,
thus F b ≡ T b by propositional extensionality, and thus F ≡ T by function
extensionality:
F≡T : F ≡ T
F≡T =
Σ-≡
(funext
λ {
false → propext ∥∥-isProp ∥∥-isProp
((λ _ → right x) , (λ _ → right x)) ;
true → propext ∥∥-isProp ∥∥-isProp
((λ _ → right x) , (λ _ → right x))
})
(∥∥-isProp (transport (∃ Bool) _ (snd F)) (snd T))
From there, we can deduce that the boolean of f F is equal to the boolean
of f T (recall that f Q is a pair consisting of a boolean and a proof that it
belongs to Q):
We are thus able to extract a witness from knowing its existence. Note that the fact that ℕ can be enumerated is crucial here: the implication ‖A‖ ⇒ A does not hold in general, for an arbitrary type A. For instance, if f was of type (ℕ → ℕ) → ℕ, we would not expect to be able to construct a root from knowing its existence, because the type of functions ℕ → ℕ is not countable.
So, suppose that we have a proof E of ∃(n : ℕ).f(n) = 0 and we want to prove R, which is Σ(n : ℕ).f(n) = 0. We cannot directly provide the required natural number n (we cannot magically guess the root) and we cannot use the hypothesis E: in order to do so, we would have to use the eliminator for propositional truncation, which we cannot do because the goal we are proving is not a proposition. Namely, the type R is a set, the set of all roots of f, and not a proposition (f might admit multiple roots). However, we can take a variant of this type in order to have a proposition: instead of constructing an arbitrary root of f, we are going to construct a particular one, say the smallest one. Namely, the set R′ of natural numbers which are a smallest root of f contains exactly one element (the smallest root of f) and will thus be a proposition. We can prove R′ by using elimination of propositional truncation on E and then conclude that we have an element of R because we have the implication R′ ⇒ R (a smallest root of f is a root).
In Agda, we are going to reason on an arbitrary predicate P on natural
numbers, our above example being the particular case where P n is f (n) = 0.
We can define a predicate isFirst such that a natural number n satisfies isFirst n
when n is the smallest natural number for which P holds:
isFirst : ∀ {i} (P : ℕ → Type i) → ℕ → Type i
isFirst P n = P n × ((m : ℕ) → P m → n ≤ m)
In order to perform inductions, it will be useful to consider the type of the smallest natural number greater than a fixed number k satisfying the predicate: one shows a lemma find-first-from stating that, when P is decidable, any m ≥ k satisfying P gives rise to a smallest n ≥ k satisfying P (proof left to the reader). We can thus construct the first natural number n satisfying P, by applying this lemma to the case k = 0:
find-first : ∀ {i} (P : ℕ → Type i) → ((n : ℕ) → isDec (P n)) →
             (m : ℕ) → P m → Σ ℕ (λ n → isFirst P n)
find-first P dec m Pm with find-first-from P dec m Pm 0 z≤n
find-first P dec m Pm | n , (_ , Pn) , Fn =
  n , (Pn , λ n Pn → Fn n (z≤n , Pn))
It is now time to apply this to our original problem. Given a function f : ℕ → ℕ for which we have a proof E of ∃(n : ℕ).f(n) = 0, we can use the elimination principle for propositional truncation in order to show Σ(n : ℕ). isFirst (λ n → f n ≡ 0) n, which is a proposition, and we are left with showing that, knowing a root of f, we can construct the smallest one, which is precisely the purpose of our find-first function above:
extract-first-root : (f : ℕ → ℕ) →
  ∃ ℕ (λ n → f n ≡ zero) →
  Σ ℕ (isFirst (λ n → f n ≡ zero))
extract-first-root f E =
  ∥∥-rec
    (first-isProp P (λ n → ℕ-isSet (f n) 0))
    (λ { (n , Pn) → find-first P (λ n → f n ≟ 0) n Pn })
    E
  where
  P : ℕ → Type₀
  P n = f n ≡ zero
Finally, we can conclude with our root extraction procedure:
extract-root : (f : ℕ → ℕ) →
  ∃ ℕ (λ n → f n ≡ zero) →
  Σ ℕ (λ n → f n ≡ zero)
extract-root f E with extract-first-root f E
extract-root f E | n , Pn , _ = n , Pn
Relationship with double negation. Given a type A, the type ¬¬A is a proposition (as is the negation of any type) and there is a canonical map from the former to the latter:
¬¬-trunc : ∀ {i} {A : Type i} → A → ¬ (¬ A)
¬¬-trunc x k = k x
In this sense, double negation is very similar to propositional truncation, except that the resulting type is “classical”, in the sense that it satisfies the law of elimination of double negation (or, equivalently, the excluded middle). The propositional truncation ‖A‖ can be seen as a quotient of A (we identify all proofs), and ¬¬A can be thought of as a further quotient, making the type classical. This is witnessed by the existence of a canonical function ‖A‖ → ¬¬A, which can be constructed as follows.
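A plausible sketch (the name ∥∥-to-¬¬ is ours; ¬-isProp is the above-mentioned fact that any negation is a proposition):

∥∥-to-¬¬ : ∀ {i} {A : Type i} → ∥ A ∥ → ¬ (¬ A)
∥∥-to-¬¬ = ∥∥-rec ¬-isProp ¬¬-trunc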
9.4 Univalence
As indicated before, we still lack ways to prove equalities which ought to hold
in our geometric model. We now introduce the univalence axiom, due to Vo-
evodsky, which fixes this in a satisfactory way.
Application. The first one, called ap, states that all functions preserve paths:
given a function f : A → B and a path p : x ≡ y in A, we can construct a path
f (x) ≡ f (y) in B, sometimes abusively written f (p), by “applying” (thus the
name) f to p:
ap : ∀ {i j} {A : Type i} {B : Type j} {x y : A} →
(f : A → B) → x ≡ y → f x ≡ f y
ap f refl = refl
It can also be seen as a witness for the fact that equality is a congruence
(and we have already met this function under the name cong in section 6.6).
This application is compatible with concatenation of paths in the sense that
f (p · q) ≡ f (p) · f (q):
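A plausible sketch of the elided proof (the name ap-∙ is ours):

ap-∙ : ∀ {i j} {A : Type i} {B : Type j} (f : A → B) {x y z : A}
       (p : x ≡ y) (q : y ≡ z) → ap f (p ∙ q) ≡ ap f p ∙ ap f q
ap-∙ f refl refl = refl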
Similarly, if two functions are equal and we apply them to the same argument,
the results will also be equal:
happly : ∀ {i j} {A : Type i} {B : A → Type j}
{f g : (x : A) → B x} →
f ≡ g → (x : A) → f x ≡ g x
happly refl x = refl
(figures: a path p : x ≡ y in A lifts a point a of the fiber B(x) to a point b of the fiber B(y) inside the total space Σ(x : A).B; and a path p : A ≡ B between two types, seen as points of Type, transports a point x of A to the point coe p x of B.) Formally,
coe : ∀ {i} {A B : Type i} → (A ≡ B) → A → B
coe p x = transport (λ A → A) p x
Given a dependent function f : (x : A) → B x and a path p : x ≡ y in A, the images f x and f y do not live in the same type, so they cannot be compared directly; however, the transport of f x along p is equal to f y. The intuitive reason for this is that f has to be a continuous function. Formally,
apd : ∀ {i j} {A : Type i} {B : A → Type j} {x y : A} →
(f : (x : A) → B x) → (p : x ≡ y) →
transport B p (f x) ≡ f y
apd f refl = refl
9.4.2 Equivalences. We consider that two spaces are equivalent when they are
“isomorphic up to homotopy”, i.e. they are homotopy equivalent, in the sense
defined in section 9.2. We now formalize this notion, see [Uni13, Chapter 4] for
details. We will see that it behaves much like the notion of isomorphism.
isQinv : ∀ {i j} {A : Type i} {B : Type j} → (A → B) → Type (lmax i j)
isQinv {A = A} {B = B} f =
  Σ (B → A) (λ g → ((g ∘ f) ∼ id) × ((f ∘ g) ∼ id))
The name comes from the fact that a function f satisfying this property is, in this context, said to be quasi-invertible. Above, the identity is defined as
id : ∀ {i} {A : Type i} → A → A
id x = x
and the composition as follows; we also include sketches of the elided definitions of pointwise equality _∼_ and of isEquiv, as used below:
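-- Composition of functions:
_∘_ : ∀ {i j k} {A : Type i} {B : Type j} {C : Type k} →
      (B → C) → (A → B) → (A → C)
g ∘ f = λ x → g (f x)

-- Pointwise equality of (dependent) functions (a sketch matching its
-- use in this section):
_∼_ : ∀ {i j} {A : Type i} {B : A → Type j} →
      ((x : A) → B x) → ((x : A) → B x) → Type (lmax i j)
_∼_ {A = A} f g = (x : A) → f x ≡ g x

-- Equivalences: a function with both a left and a right inverse
-- (a sketch matching the pattern-matches used below):
isEquiv : ∀ {i j} {A : Type i} {B : Type j} → (A → B) → Type (lmax i j)
isEquiv {A = A} {B = B} f =
  Σ (B → A) (λ g → (g ∘ f) ∼ id) × Σ (B → A) (λ h → (f ∘ h) ∼ id)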
and one can show that isEquiv(f ) is a proposition for every function f [Uni13,
Theorem 4.2.13]. Note that every quasi-invertible map is canonically an equiv-
alence:
isQinv-isEquiv : ∀ {i j} {A : Type i} {B : Type j} {f : A → B} →
isQinv f → isEquiv f
isQinv-isEquiv (g , gf , fg) = (g , gf) , (g , fg)
There is also a converse map (which is not obvious to define), the subtle point being that the resulting pair of maps does not form an equivalence between isQinv f and isEquiv f. That being said, all the equivalences we will construct in practice will come from quasi-inverses.
We then say that a map is contractible when all its fibers are:

isContrMap : ∀ {i j} {A : Type i} {B : Type j} →
             (A → B) → Type (lmax i j)
isContrMap {B = B} f = (y : B) → isContr (fib f y)

It can be shown that, for a map f, the types isEquiv(f) and isContrMap(f) are equivalent, so that we could use contractibility as an alternative definition of being an equivalence.
Equivalence of types. Two types A and B are equivalent when there is an equivalence from A to B, which we write A ≃ B:

_≃_ : ∀ {i j} (A : Type i) (B : Type j) → Type (lmax i j)
A ≃ B = Σ (A → B) isEquiv
This relation is an equivalence relation. It is reflexive:
≃-refl : ∀ {i} {A : Type i} → A ≃ A
≃-refl = id , (id , (λ x → refl)) , (id , λ x → refl)
transitive:
≃-trans : ∀ {i j k} {A : Type i} {B : Type j} {C : Type k} →
  A ≃ B → B ≃ C → A ≃ C
≃-trans (f , (g , gf) , (g' , fg')) (h , (i , ih) , (i' , hi')) =
  (h ∘ f) ,
  (((g ∘ i) , λ x → trans (ap g (ih (f x))) (gf x)) ,
  ((g' ∘ i') , λ x → trans (ap h (fg' (i' x))) (hi' x)))
but also symmetric, which is not immediate because the definition of equivalence
is not:
≃-sym : ∀ {i j} {A : Type i} {B : Type j} → A ≃ B → B ≃ A
≃-sym {B = B} (f , (g , gf) , (g' , fg')) =
  g , (f , left) , (f , gf)
where
g-g' : (x : B) → g x ≡ g' x
g-g' x = trans (sym (ap g (fg' x))) (gf (g' x))
left : (x : B) → f (g x) ≡ x
left x = trans (ap f (g-g' x)) (fg' x)
An equivalence e consists of a map f : A → B together with two maps g, g′ : B → A which are respectively left and right inverse for f. We can define a function which to such an equivalence associates the corresponding f:

≃-→ : ∀ {i j} {A : Type i} {B : Type j} → A ≃ B → A → B
≃-→ (f , _) = f

and one associating the corresponding g:

≃-← : ∀ {i j} {A : Type i} {B : Type j} → A ≃ B → B → A
≃-← (_ , ((g , _) , _)) = g
It will also be useful to have a notation for the proof that g is a left inverse
for f , i.e. x = g(f (x)) for every x in A:
≃-η : ∀ {i j} {A : Type i} {B : Type j}
  (e : A ≃ B) (x : A) → x ≡ ≃-← e (≃-→ e x)
≃-η (f , (g , gl) , (h , hr)) x = sym (gl x)
We also define one providing a proof that g is a right inverse for f, i.e. we have f(g(y)) = y for every y in B:

≃-ε : ∀ {i j} {A : Type i} {B : Type j}
  (e : A ≃ B) (y : B) → ≃-→ e (≃-← e y) ≡ y
≃-ε (f , (g , gl) , (h , hr)) y =
  f (g y)         ≡⟨ sym (ap (λ y → f (g y)) (hr y)) ⟩
  f (g (f (h y))) ≡⟨ ap f (gl (h y)) ⟩
  f (h y)         ≡⟨ hr y ⟩
  y ∎
Note that the proof is slightly more complicated than the previous one because we show here that g, and not g′, is a right inverse for f.
Finally, we show a last useful theorem. In set theory, a function f : A → B which is bijective, i.e. which admits an inverse g, is always injective. This means that for all elements x and y of A, if f(x) = f(y) then x = y. Namely, we have

x = g(f(x)) = g(f(y)) = y

This property also holds in our context:
≃-inj : ∀ {i j} {A : Type i} {B : Type j}
  (e : A ≃ B) {x y : A} → ≃-→ e x ≡ ≃-→ e y → x ≡ y
≃-inj e {x} {y} p =
  x                ≡⟨ ≃-η e x ⟩
  ≃-← e (≃-→ e x)  ≡⟨ ap (≃-← e) p ⟩
  ≃-← e (≃-→ e y)  ≡⟨ sym (≃-η e y) ⟩
  y ∎
9.4.3 Univalence. We can easily define a function which shows that two equal
types A and B are equivalent:
id-to-equiv' : ∀ {i} {A B : Type i} → (A ≡ B) → (A ≃ B)
id-to-equiv' refl = id , ((id , (λ _ → refl)) , id , (λ _ → refl))
Namely, by induction on the equality p : A ≡ B we can suppose that A and B are the same, in which case we can take the identity function as the equivalence between the two types, the left and right inverses being the identity. Given a path p : A ≡ B, note that the induced function A → B is precisely coe p, so that it is conceptually better to define this operator as

id-to-equiv : ∀ {i} {A B : Type i} → (A ≡ B) → (A ≃ B)
id-to-equiv p = coe p , coe-isEquiv p
where the proof that coercion gives rise to equivalences is
coe-isEquiv : ∀ {i} {A B : Type i} (p : A ≡ B) → isEquiv (coe p)
coe-isEquiv refl = (id , (λ x → refl)) , (id , λ x → refl)
The univalence axiom introduced by Voevodsky states that this function is itself an equivalence [Uni13, section 2.10]. In particular, it provides a map

ua : A ≃ B → A ≡ B

turning equivalences into equalities. For instance, the coproduct A ⊔ B can be described as the type

Σ(b : Bool).δA,B b
where δA,B : Bool → Type is the function such that δA,B false = A and δA,B true = B. This means that we can describe an element of A ⊔ B as a pair (b, x) where b is a boolean and x is an element of A (resp. B) when b is false (resp. true). An equivalence between the two types can easily be constructed and, by univalence, it induces an equality

(A ⊔ B) ≡ (Σ(b : Bool).δA,B b)

meaning that we can convert any property on one representation into a property on the other representation. Similarly, the type A × B can be described as a dependent function type:

(A × B) ≡ (Π(b : Bool).δA,B b)
As a more programming-oriented example, natural numbers can either be
defined in unary or binary representation, giving rise to equivalent types. By
univalence, we can automatically transport any operation on one representation
(e.g. addition) into the other.
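As an illustration — a hypothetical sketch, where the binary representation Bin and the equivalence e are simply postulated rather than constructed, ua is the map provided by univalence, and _+_ is the addition on ℕ from previous chapters — an operation can be transported along the induced path:

postulate
  Bin : Type₀      -- binary natural numbers (definition omitted)
  e   : ℕ ≃ Bin    -- assumed equivalence between the two representations

-- transporting the type of binary operations along the path ua e
-- turns unary addition into an addition on the binary representation
_+ᵇ_ : Bin → Bin → Bin
_+ᵇ_ = transport (λ A → A → A → A) (ua e) _+_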
9.4.5 Describing identity types. Using univalence, we can describe the identity types of most type constructions.

Identity types in products. Given types A and B, we expect that a path in A × B consists of a pair of paths in A and B respectively, i.e. given x, x′ in A and y, y′ in B, we should have

Id_{A×B}((x, y), (x′, y′)) ≡ Id_A(x, x′) × Id_B(y, y′)

By univalence, this amounts to showing the corresponding equivalence between types

Id_{A×B}((x, y), (x′, y′)) ≃ Id_A(x, x′) × Id_B(y, y′)

which is easily constructed:
×-≃ : ∀ {i j} {A : Type i} {B : Type j} {x y : A × B} →
  (x ≡ y) ≃ ((fst x ≡ fst y) × (snd x ≡ snd y))
×-≃ {x = x} {y = y} =
  f , (g , λ { refl → refl }) , (g , λ { (refl , refl) → refl })
where
f : x ≡ y → (fst x ≡ fst y) × (snd x ≡ snd y)
f refl = refl , refl
g : (fst x ≡ fst y) × (snd x ≡ snd y) → x ≡ y
g (refl , refl) = refl
Identity types over natural numbers. For data types, similar characterizations can be achieved. For instance, for natural numbers, we expect that there is one proof of equality in Id_ℕ(n, n) for any natural number n and none in Id_ℕ(m, n) for m ≠ n. Otherwise said, we expect Id_ℕ(n, n) = ⊤ and Id_ℕ(m, n) = ⊥ for m ≠ n. We can therefore code the expected type for identity types between any two natural numbers as
code : ℕ → ℕ → Type₀
code zero    zero    = ⊤
code zero    (suc n) = ⊥
code (suc m) zero    = ⊥
code (suc m) (suc n) = code m n
By univalence, in order to show that natural numbers have the expected identity types, it is enough to show that there is an equivalence

Id_ℕ(m, n) ≃ code m n
To this aim we define an encoding function
enc : {m n : ℕ} → m ≡ n → code m n
enc {zero} {.zero} refl = tt
enc {suc n} {.(suc n)} refl = enc {n} {n} refl
and a decoding function in the other direction
dec : {m n : ℕ} → code m n → m ≡ n
dec {zero} {zero} tt = refl
dec {suc m} {suc n} c = ap suc (dec c)
and finally show that they form an equivalence:
ℕ-eq : (m n : ℕ) → (m ≡ n) ≃ code m n
ℕ-eq m n =
  enc , ((dec , dec-enc) , (dec , enc-dec {m}))
  where
  dec-enc : {m n : ℕ} → (p : m ≡ n) → dec (enc p) ≡ p
  dec-enc {zero} {.zero} refl = refl
  dec-enc {suc m} {.(suc m)} refl = ap (ap suc) (dec-enc refl)
  enc-suc : {m n : ℕ} → (p : m ≡ n) → enc (ap suc p) ≡ enc p
  enc-suc refl = refl
  enc-dec : {m n : ℕ} → (c : code m n) → enc (dec {m} c) ≡ c
  enc-dec {zero} {zero} tt = refl
  enc-dec {suc m} {suc n} c =
    trans (enc-suc (dec {m} {n} c)) (enc-dec {m} {n} c)
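As a small application — a sketch using the names above, where zero≢suc is ours — the equivalence immediately shows that zero differs from every successor, since code zero (suc n) is the empty type:

zero≢suc : {n : ℕ} → ¬ (zero ≡ suc n)
-- encoding a path from zero to suc n yields an element of ⊥
zero≢suc {n} p = ≃-→ (ℕ-eq zero (suc n)) p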
From there, one easily deduces that a proposition is equivalent to ⊥ when it is empty

Prop-≃-⊥ : ∀ {i} {A : Type i} → isProp A → ¬ A → A ≃ ⊥
Prop-≃-⊥ PA k = ¬-≃-⊥ k

and to ⊤ when it is non-empty:

Prop-≃-⊤ : ∀ {i} {A : Type i} → isProp A → A → A ≃ ⊤
Prop-≃-⊤ PA x = Contr-≃-⊤ (aProp-isContr PA x)
Univalence also implies that the type of all types is not a set. Namely, the negation function on booleans is its own inverse and thus an equivalence Bool ≃ Bool, so that univalence induces a path

p : Bool ≡ Bool

which will not be the identity path. Geometrically, we can picture the situation as follows. The type Bool is a point in the space of all types, which contains a loop p on it induced by negation:

[Figure: the point Bool in the space of all types, with the loop p on it.]
If we assume that Type is a set, then we will assimilate this path to the identity
path, which will lead to a contradiction, because it will also force us to identify
false and true, which we know is not the case:
false≢true : ¬ (false ≡ true)
false≢true ()
Namely, the function coe p : Bool → Bool transports a boolean along p, and the computation rule for univalence tells us that it is precisely negation. Now, if we assume that Type is a set, the path p will be equal to the path refl : Bool ≡ Bool and therefore we will have coe p ≡ coe refl; otherwise said, the boolean negation function is equal to the identity. If we apply both to true (using happly), we get that false is equal to true, hence a contradiction.
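This argument can be formalized; a sketch, where ua and its computation rule coe-ua are postulated for self-containedness (the names are hypothetical), and the hypothesis expresses that Type₀ is a set:

neg : Bool → Bool
neg false = true
neg true  = false

-- negation is its own inverse, hence an equivalence
neg-≃ : Bool ≃ Bool
neg-≃ = neg , (neg , lem) , (neg , lem)
  where
  lem : (b : Bool) → neg (neg b) ≡ b
  lem false = refl
  lem true  = refl

postulate
  ua     : ∀ {i} {A B : Type i} → A ≃ B → A ≡ B
  coe-ua : ∀ {i} {A B : Type i} (e : A ≃ B) (x : A) →
           coe (ua e) x ≡ ≃-→ e x

-- if all paths Bool ≡ Bool were equal, coercion along ua neg-≃ would
-- agree with coercion along refl, identifying false and true
Type₀-isNotSet : ((A B : Type₀) (p q : A ≡ B) → p ≡ q) → ⊥
Type₀-isNotSet S = false≢true
  (trans (sym (coe-ua neg-≃ true))
         (ap (λ p → coe p true) (S Bool Bool (ua neg-≃) refl)))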
An equality in the product is thus the same as an equality in each of the components. Now, we have seen in section 9.4.4 that a product is a particular case of a dependent function type:

(A × B) ≡ (Π(b : Bool).δA,B b)

More precisely, we expect that for all functions f, g : Π(x : A).B, we have

Id_{Π(x:A).B}(f, g) ≡ (f ∼ g)

i.e. the two functions f and g are equal when we have f x ≡ g x for every element x of A. While we will see that this is true, the proof performed for products above does not generalize easily. Namely, our first hope is to prove this identity using univalence, by showing an equivalence between the two types. However, this is not easy. Constructing a map from left to right is not a problem: the function happly defined in section 9.4.1 provides us with such a function

Id_{Π(x:A).B}(f, g) → (f ∼ g)

but constructing a map in the other direction

(f ∼ g) → Id_{Π(x:A).B}(f, g)

is precisely the difficult part.
General approach. Instead, the trick is to show the equality for all pairs of functions f and g at once, i.e. to show

Σ(f : Π(x : A).B).Σ(g : Π(x : A).B). Id_{Π(x:A).B}(f, g)
≡
Σ(f : Π(x : A).B).Σ(g : Π(x : A).B). f ∼ g

We are thus led to consider the type of homotopies and the chain of equivalences

Homotopy(A, B) = Π(x : A).Path(B) ≃ Π(x : A).B ≃ Path(Π(x : A).B)
Paths. Let us first define the type Path(A) of all paths in a type A, as well as
simple helper functions. This type can be formalized in Agda as
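-- (a sketch: reconstructed so as to match the helper functions used below)
Path : ∀ {i} → Type i → Type i
Path A = Σ A (λ x → Σ A (λ y → x ≡ y))

-- the element of Path A corresponding to a path p : x ≡ y
Path-of : ∀ {i} {A : Type i} {x y : A} → x ≡ y → Path A
Path-of {x = x} {y = y} p = x , (y , p)

-- target of a path
Path-tgt : ∀ {i} {A : Type i} → Path A → A
Path-tgt (x , y , p) = y

The type of homotopies and the helper functions used below can then be defined as (again, a sketch consistent with the uses that follow):

-- a homotopy is a family of paths in the fibers of B
Homotopy : ∀ {i j} (A : Type i) (B : A → Type j) → Type (lmax i j)
Homotopy A B = (x : A) → Path (B x)

-- the family of targets of a homotopy
Homotopy-tgt : ∀ {i j} {A : Type i} {B : A → Type j} →
  Homotopy A B → (x : A) → B x
Homotopy-tgt h x = Path-tgt (h x)

-- the constant homotopy on a function f
Homotopy-cst : ∀ {i j} {A : Type i} {B : A → Type j} →
  ((x : A) → B x) → Homotopy A B
Homotopy-cst f x = f x , (f x , refl)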
We can see a homotopy between two given functions as an element of this type
Homotopy-of : ∀ {i j} {A : Type i} {B : A → Type j}
  {f g : (x : A) → B x} → f ∼ g → Homotopy A B
Homotopy-of h x = Path-of (h x)
where the function ≃-to is detailed below. From there, non-dependent function extensionality is easily deduced, its type being

FE : ∀ {i j} → Type (lsuc (lmax i j))
FE {i} {j} = {A : Type i} {B : Type j} → {f g : A → B} →
  ((x : A) → f x ≡ g x) → f ≡ g
Namely, we can proceed as explained before, by considering the constant homotopy h₀ on f and the homotopy h between f and g, showing that they have the same image under the function ≃-→ Homotopy-≃-Path (the proof is simply refl because we have carefully defined ≃-to, see below), deducing by injectivity that h₀ = h, and finally deducing that f = g by projecting on the respective targets of h₀ and h.
funext-nd : ∀ {i j} → FE {i} {j}
funext-nd {A = A} {B = B} {f = f} {g = g} h =
  ap (λ h x → Homotopy-tgt h x) p
  where
  p : Homotopy-cst f ≡ Homotopy-of h
  p = ≃-inj Homotopy-≃-Path refl
The key step is to construct an equivalence

(A → B) ≃ (A → B′)

from an equivalence

B ≃ B′
This is actually the only place where the univalence axiom is used. Since we
have application of functions to equivalences, this is actually pretty easy to
define:
≃-to : ∀ {i j} → {A : Type i} → {B B' : Type j} →
  B ≃ B' → (A → B) ≃ (A → B')
≃-to {A = A} e = ≃-ap (λ B → A → B) e
Given a function f : B → B′ which is an equivalence, we have “no control” over the function (A → B) → (A → B′) produced as the equivalence, which complicates the proofs. However, there is a natural candidate, namely the function

λg.λx.f(g x) : (A → B) → (A → B′)

It simplifies the proofs considerably if we enforce this choice. This can be done by defining instead:
≃-to : ∀ {i j} → {A : Type i} → {B B' : Type j} →
  B ≃ B' → (A → B) ≃ (A → B')
≃-to {i} {j} {A} {B} {B'} e = (λ f x → (≃-→ e) (f x)) , lem e
  where
  lem : {B B' : Type j} (e : B ≃ B') →
    isEquiv (λ (f : A → B) x → (≃-→ e) (f x))
  lem = ≃-ind
    (λ {B} e → isEquiv (λ (f : A → B) x → (≃-→ e) (f x)))
    (λ {B} → snd (≃-refl {A = A → B}))
which we respectively call f′ and g′, the definition of the latter using the fact that we have a homotopy. Recall that the type Σ(y : B).Id_B(f(x), y) is what we called the singleton at f(x), and that it is contractible, see section 9.3.3; therefore, by weak function extensionality, the above type is also contractible. The functions f′ and g′ being elements of a contractible type, they are necessarily equal, from which one easily deduces that f and g are equal.
funext : ∀ {i j} → DFE {i} {j}
funext {A = A} {B = B} {f = f} {g = g} p =
ap (λ f x → fst (f x)) p'
where
f' : (x : A) → Singleton (f x)
f' x = f x , refl
g' : (x : A) → Singleton (f x)
g' x = g x , p x
contr : isContr ((x : A) → Singleton (f x))
contr = wfunext (λ x → Singleton-isContr (f x))
p' : f' ≡ g'
p' = Contr-isProp contr f' g'
The above proof does not use univalence and therefore, without univalence, WFE implies DFE. The converse also holds, as explained above.
9.4.10 Propositional extensionality. Recall from section 9.3.1 that the propo-
sitional extensionality axiom states that two logically equivalent propositions A
and B are equal:
PE : ∀ {i} → Type (lsuc i)
PE {i} = ∀ {A B : Type i} → isProp A → isProp B → A ↔ B → A ≡ B
This is intuitively justified because, since A and B are both propositions they
are either empty or a point, and since they are equivalent they are both empty
or both non-empty. We show here that this principle follows from univalence.
Namely, two logically equivalent propositions A and B are equivalent: the log-
ical equivalence provides functions f : A → B and g : B → A and we have
g ◦ f (x) ≡ x and f ◦ g(y) ≡ y for every x in A and y in B because A and B are
propositions (and thus any two elements are equal).
↔-to-≃ : ∀ {i} {A B : Type i} →
  isProp A → isProp B → A ↔ B → A ≃ B
↔-to-≃ PA PB (f , g) =
  f ,
  (g , (λ x → PA (g (f x)) x)) ,
  (g , (λ x → PB (f (g x)) x))
Finally, univalence provides us with the required equality:
propext : ∀ {i} → PE {i}
propext PA PB e = ua (↔-to-≃ PA PB e)
By transport, this means that given two equivalent propositions, one can be
substituted for the other. We have already encountered an instance of this in
lemma 2.2.9.1.
9.5.1 The interval type. Let us start with the type corresponding to the interval:

[Figure: the interval, generated by two points beg and end together with a path path between them.]
This type is of course a set (and even a contractible type), but the approach will
generalize to types which are not. The corresponding type, that we are going
to write I, can be thought of as freely generated by two points beg and end, as
well as a path path : beg ≡ end, as figured above, which suggests the following
rules. The formation rule states that I is a well-formed type in any well-formed
context
Γ ⊢
──────────────── (I_F)
Γ ⊢ I : Type
The introduction rules state that beg and end are elements of the interval and that path is a path between them:

Γ ⊢
──────────────── (I_I^beg)
Γ ⊢ beg : I

Γ ⊢
──────────────── (I_I^end)
Γ ⊢ end : I

Γ ⊢
──────────────────────────── (I_I^path)
Γ ⊢ path : Id_I(beg, end)
[Figure: b : A(beg) and e : A(end), with a path p from b′ = transport(A, path, b) to e in A(end).]
Γ ⊢ t : I    Γ, x : I ⊢ A : Type
Γ ⊢ b : A[beg/x]    Γ ⊢ e : A[end/x]    Γ ⊢ p : Id_{A[end/x]}(b′, e)
───────────────────────────────────────────────────────────── (I_E)
Γ ⊢ rec(t, x ↦ A, b, e, p) : A[t/x]
where b′ is a shorthand for transport(A, path, b). The computation rules state that when we apply the elimination rule in the case where t is beg, end or path, we recover b, e and p respectively:

Γ, x : I ⊢ A : Type
Γ ⊢ b : A[beg/x]    Γ ⊢ e : A[end/x]    Γ ⊢ p : Id_{A[end/x]}(b′, e)
───────────────────────────────────────────────────────────── (I_C^beg)
Γ ⊢ rec(beg, x ↦ A, b, e, p) = b : A[beg/x]

Γ, x : I ⊢ A : Type
Γ ⊢ b : A[beg/x]    Γ ⊢ e : A[end/x]    Γ ⊢ p : Id_{A[end/x]}(b′, e)
───────────────────────────────────────────────────────────── (I_C^end)
Γ ⊢ rec(end, x ↦ A, b, e, p) = e : A[end/x]

Γ, x : I ⊢ A : Type
Γ ⊢ b : A[beg/x]    Γ ⊢ e : A[end/x]    Γ ⊢ p : Id_{A[end/x]}(b′, e)
───────────────────────────────────────────────────────────── (I_C^path)
Γ ⊢ apd(rec(−, x ↦ A, b, e, p), path) = p : Id_{A[end/x]}(b′, e)
We do not include a uniqueness rule because it can be shown to hold propositionally (this is detailed in section 9.5.3 in the case of the circle type).
The circle type. A type Circle corresponding to the circle can easily be imple-
mented, if we think of the circle as being freely generated by a point, that we
call base, and a path loop : base ≡ base:
[Figure: the circle, generated by a point base with a loop loop on it.]
In other words, it is the above interval type, where the beginning and end point
have been identified.
The formation rule states that Circle is a well-formed type in a well-formed
context:
Γ ⊢
──────────────────── (Circle_F)
Γ ⊢ Circle : Type
The introduction rules allow typing the point base and the path loop:

Γ ⊢
──────────────────── (Circle_I^base)
Γ ⊢ base : Circle

Γ ⊢
──────────────────────────────── (Circle_I^loop)
Γ ⊢ loop : Id_Circle(base, base)
The elimination rule states that a map from the circle Circle into an arbitrary type A is determined by a point b of A (the image of base) and a path p (which determines the image of loop, as explained above):

Γ ⊢ t : Circle
Γ, x : Circle ⊢ A : Type    Γ ⊢ b : A[base/x]    Γ ⊢ p : Id_{A[base/x]}(b′, b)
───────────────────────────────────────────────────────────── (Circle_E)
Γ ⊢ rec(t, x ↦ A, b, p) : A[t/x]

where b′ is a shorthand for transport(A, loop, b). The computation rules are left to the reader, who should also be convinced that the rules for the types corresponding to the usual low-dimensional spaces could be written in this way.
Exercise 9.5.1.1. This is not the only way of implementing the circle. For instance, formalize the type corresponding to the following description of the circle:

[Figure: two points x and y with two parallel paths p and q between them.]

i.e. freely generated by two points x and y and two paths p and q.
Exercise 9.5.1.2. Write down the rules for the type corresponding to the sphere.
9.5.2 Paths over. As noted above, when writing the elimination rule of types involving paths as constructors, one needs to compare elements (say, b and e) of distinct types (say, A[beg/x] and A[end/x]), and the way we proceeded consisted in transporting the first along p into b′, so that it lies in the same type as the second. Here, a path between b′ and e can be thought of as representing a path between b and e, i.e. as a way of comparing two elements which do not live in the same type. This is similar to what we have done in section 6.6.9 when defining heterogeneous equality, although we have to be more precise about equalities here.
Given a path p : x ≡ y in a type A, a dependent type B : A → Type, and two elements t : B(x) and u : B(y), we write t ≡ᴮₚ u for the type of paths over p between t and u. This intuitively corresponds to the collection of paths between t and u whose projection onto A gives the path p:

[Figure: a path over p between t : B(x) and u : B(y), projecting onto the path p : x ≡ y in A.]

In Agda, this type can be defined by induction on p, and is written

t ≡ u [ B ↓ p ]

which corresponds to what we have been writing t ≡ᴮₚ u earlier.
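A sketch of this definition, following the usual formalization (the auxiliary name PathOver is ours):

PathOver : ∀ {i j} {A : Type i} (B : A → Type j)
  {x y : A} (p : x ≡ y) → B x → B y → Type j
-- when p is refl, a path over p is an ordinary path in the fiber
PathOver B refl t u = (t ≡ u)

syntax PathOver B p t u = t ≡ u [ B ↓ p ]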
to simplify the types of functions in various places. For instance, the function
apd, see section 9.4.1, could be defined as
apd : ∀ {i j} {A : Type i} {B : A → Type j} (f : (a : A) → B a)
{x y : A} → (p : x ≡ y) → f x ≡ f y [ B ↓ p ]
apd f refl = refl
9.5.4 Useful higher inductive types. In order to further illustrate the use
of higher inductive types, we briefly present two quite useful ones: suspension
and propositional truncation.
[Figure: the suspension ΣA of a space A, with two poles N and S.]

In particular, if we iteratively apply this suspension operation starting from the empty space, we obtain the spheres:

[Figure: the iterated suspensions Σ⁰∅, Σ¹∅, Σ²∅, Σ³∅, … of the empty space, which are the spheres.]
More precisely, the n-sphere is the (n+1)-th suspension of the empty space (the empty space could thus be considered as a good notion of (−1)-sphere). In Agda, the suspension Susp A of a type A can be defined as a higher inductive type, and the spheres are then obtained by iterating it from the empty type:

Sphere : ℕ → Type₀
Sphere zero = Susp ⊥
Sphere (suc n) = Susp (Sphere n)
Propositional truncation. The propositional truncation ‖A‖ of a type A can similarly be defined as a higher inductive type with two constructors. The first constructor (∣_∣) states that any point of A is a point of ‖A‖, and the second one (∥∥-isProp) adds all the required paths. The resulting type is trivially a proposition by ∥∥-isProp, and the associated recurrence principle, which corresponds to the elimination rule (‖‖_E), can be shown as follows:
∥∥-rec : ∀ {i j} {A : Type i} {B : Type j} →
isProp B → (A → B) → ∥ A ∥ → B
∥∥-rec PB f ∣ x ∣ = f x
∥∥-rec PB f (∥∥-isProp x y ι) =
PB (∥∥-rec PB f x) (∥∥-rec PB f y) ι
It can, for instance, be used to construct the canonical map ‖A‖ → ¬¬A for an arbitrary type A described in section 9.3.4:
∥∥-¬¬ : ∀ {i} {A : Type i} → ∥ A ∥ → ¬ (¬ A)
∥∥-¬¬ = ∥∥-rec ¬-isProp (λ x f → f x)
Appendix A
Appendix
A.1 Relations
A.1.1 Definition. Given a set A, a relation R on A is a subset R ⊆ A × A. We sometimes write a R b when (a, b) ∈ R. It is

– reflexive if a R a for every a ∈ A,
– transitive if a R c for every a, c ∈ A such that there exists b ∈ A for which a R b and b R c.

A.1.2 Closures. The reflexive closure of a relation R is the smallest reflexive relation containing R, namely

R ∪ {(a, a) | a ∈ A}

and its symmetric closure is the smallest symmetric relation containing R, namely

R ∪ {(b, a) | (a, b) ∈ R}
The following characterization is often useful (and similar results hold for other
closure operations):
Lemma A.1.2.1. The reflexive and transitive closure R∗ of a relation R on a set A is the smallest relation on A such that

– a R∗ a for every a ∈ A,
– a R∗ c for every a, c ∈ A such that there exists b ∈ A for which a R b and b R∗ c.
A.2 Monoids

A.2.1 Definition. A monoid (M, ·, 1) is a set M equipped with

– a function _·_ : M × M → M called multiplication, and
– an element 1 ∈ M called unit,

such that multiplication is associative and admits the unit as neutral element:

(u · v) · w = u · (v · w)        1 · u = u = u · 1

Such a monoid is commutative when u · v = v · u for all u, v ∈ M.
A.2.2 Free monoids. Given a set A, we write (A∗ , ·, 1) for the monoid such
that A∗ is the set of words on A, i.e. finite sequences a1 . . . an of elements of A,
multiplication is concatenation, i.e.
(a1 . . . an ) · (b1 . . . bm ) = a1 . . . an b1 . . . bm
and unit 1 is the empty sequence. We write |a1 . . . an | = n for the length of a
word.
Proposition A.2.2.1. The monoid (A∗, ·, 1) is the free monoid on A: given a monoid (M, ·, 1) and a function f : A → M, there exists a unique morphism of monoids f̄ : A∗ → M such that f̄(a) = f(a) for every a ∈ A.
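In Agda, words over A can be represented as lists and the unique extension as a fold; a minimal sketch, assuming the standard library's List (the names are ours):

open import Data.List using (List; []; _∷_)

module _ {A M : Set} (_·_ : M → M → M) (ε : M) (f : A → M) where

  -- the unique monoid morphism extending f: it sends the empty word
  -- to the unit and concatenation to multiplication
  ext : List A → M
  ext []      = ε
  ext (a ∷ w) = f a · ext w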
Given a set A, we define in appendix A.3.5 below the set A# of all multisets on A. It is a monoid when equipped with disjoint union ⊎ as multiplication and the empty multiset ∅ as unit.
(equivalently, every decreasing sequence is eventually stationary). Namely, if the principle failed, we could construct an infinite sequence of elements aᵢ such that aᵢ > aᵢ₊₁ and P(aᵢ) does not hold. Since (A, ≤) is well-founded, such a sequence cannot exist.
Remark A.3.2.2. None of the above reasoning exploits the fact that the order is reflexive, transitive or antisymmetric, and it would hold for any relation R in place of <. A relation R on a set A is well-founded if there is no infinite sequence of elements aᵢ of A such that

a₀ R a₁ R a₂ R …

It is easily shown that u < t implies ht(u) < ht(t). Therefore, if the subterm order were not well-founded, (ℕ, ≤) would not be well-founded either.
A.3.3 Lexicographic order. Given two posets (A, ≤_A) and (B, ≤_B), we define the lexicographic order ≤ on A × B by (a, b) < (a′, b′) whenever a < a′, or a = a′ and b < b′.

Lemma A.3.3.1. The relation ≤ on A × B is a partial order.

Lemma A.3.3.2. The partial order ≤ is total when both ≤_A and ≤_B are.

Theorem A.3.3.3. The partial order ≤ is well-founded when both ≤_A and ≤_B are.
Proof. Suppose given an infinite strictly decreasing sequence

(a₀, b₀) > (a₁, b₁) > (a₂, b₂) > …

By definition of >, for every index i, we either have aᵢ > aᵢ₊₁ or bᵢ > bᵢ₊₁. The sets

{i ∈ ℕ | aᵢ > aᵢ₊₁}    and    {i ∈ ℕ | bᵢ > bᵢ₊₁}

are such that their union is ℕ, therefore one of them must be infinite. We thus have an infinite strictly decreasing sequence of elements of A or of elements of B. This is impossible since both posets (A, ≤_A) and (B, ≤_B) are supposed to be well-founded.
The elements of T are called the nodes of the tree and x₀ is called the root node. Given x ∈ T, τ(x) is called the parent of x, and x is a child of τ(x). A node x such that τ⁻¹(x) = ∅ is called a leaf. Given a node x ∈ T, the subtree at x is the tree

Tₓ = {y ∈ T | ∃n ∈ ℕ. τⁿ(y) = x}

with parent function τₓ such that τₓ(y) = τ(y) for y ≠ x.
Lemma A.3.4.1. The set of nodes of a tree T satisfies

T = {x₀} ∪ ⋃_{x ∈ τ⁻¹(x₀)} Tₓ
A labeled tree is a tree equipped with a function which to every node asso-
ciates a label, which is an element of some fixed set.
A multiset M on a set A is a function

M : A → ℕ

associating to every a ∈ A its multiplicity M(a).

Suppose that (A, ≤) is a poset. We define a partial order ≤# on A#, called the multiset extension of ≤, by M ≤# N whenever there exist finite multisets X, Y ∈ A# such that

M = (N \ X) ⊎ Y

and every element of Y is strictly smaller than some element of X.
This order is such that we get a smaller multiset by removing an element and replacing it with an arbitrary number of smaller elements: the elements get smaller and smaller, but also more and more numerous. It can still be shown that the resulting order is well-founded when the original one is [DM79].
Theorem A.3.5.1. The poset (A#_fin, ≤#) of finite multisets is well-founded if and only if (A, ≤) is well-founded.
Proof. The left-to-right implication is easy, so we show the right-to-left implication. We define a relation ◁ on A# by M ◁ N when there exist a ∈ A and a finite multiset Y such that

M = (N \ {a}) ⊎ Y

and b < a for every b ∈ Y. The relation ≤# is easily shown to be the reflexive and transitive closure of ◁. Now, by contradiction, suppose that there is an infinite decreasing sequence for ≤#. This means that there exists an infinite sequence

M₀ ▷ M₁ ▷ M₂ ▷ …

where

Mᵢ₊₁ = (Mᵢ \ {xᵢ}) ⊎ Yᵢ
Theorem A.4.1.2. Given sets A and B such that B contains at least two distinct elements y₀ and y₁, there is no injection from A → B to A.

Proof. Suppose given an injection ψ : (A → B) → A. We define a function φ : A → (A → B) by

φ(x) =
  – the constant function x ↦ y₀, if there is no f : A → B such that ψ(f) = x,
  – some f : A → B such that ψ(f) = x, otherwise.

For every f : A → B, we then have

ψ ◦ φ ◦ ψ(f) = ψ(f)

Thus, by injectivity of ψ,

φ(ψ(f)) = f

and φ is surjective. We conclude using theorem A.4.1.1.
Note that the above proof implicitly requires the excluded middle in order to
construct the function φ. It is apparently not possible to prove this theorem in
a constructive setting [Bau11].
Corollary A.4.1.3. Given a set A, write P(A) for its powerset. There is no
surjection A → P(A) and no injection P(A) → A. In particular, there is no
bijection between A and P(A).
Proof. Taking B = {0, 1} in the previous theorems, we have P(A) ≅ (A → B) and we conclude.
Corollary A.4.1.4. There is no bijection between N → N and N.
Proof. Take A = B = N in the previous theorems.
Lemma A.4.1.5. The set P of programs (in any reasonable language) is count-
able.
Proof. A program is a finite sequence of characters: writing Σ for the finite set of characters (e.g. the UTF-8 characters), programs are elements of Σ∗. Otherwise said, writing P for the set of programs, we have P ⊆ Σ∗. The set Σ can be totally ordered (e.g. a < b < c < …), thus Σ∗ is totally ordered by the deglex order (theorem A.3.3.4), and thus P is totally ordered, as a subset of a totally ordered set. Given a program p ∈ P ⊆ Σ∗, writing n for its length, the elements below it belong to the set ⋃_{i ≤ n} Σⁱ, which is finite, as a finite union of finite sets. We can thus associate, to every program p ∈ P, the natural number nₚ defined as the cardinal of the longest ascending chain in P with p as maximal element (which is finite by the previous argument). The function P → ℕ thus defined is easily seen to be a bijection.
Corollary A.4.1.6. There is a function N → N which is not computable by a
program.
Proof. By contradiction, suppose that this is not the case. This means that there is a surjection φ : P → (ℕ → ℕ) and, by precomposing with the isomorphism ℕ ≅ P of lemma A.4.1.5, a surjection ℕ → (ℕ → ℕ). We conclude by theorem A.4.1.1.
[Abe17] Andreas Abel. How safe is Type:Type? Mail on the Agda mailing list, 2017. Available at https://lists.chalmers.se/pipermail/agda/2017/009337.html.
[ACD+18] Andreas Abel, Jesper Cockx, Dominique Devriese, Amin Timany, and Philip Wadler. ≐ ≃ ≡: Leibniz Equality is Isomorphic to Martin-Löf Identity, Parametrically. Unpublished, 2018.
[Ack28] Wilhelm Ackermann. Zum Hilbertschen Aufbau der reellen
Zahlen. Mathematische Annalen, 99(1):118–133, 1928.
[Acz78] Peter Aczel. The Type Theoretic Interpretation of Constructive
Set Theory. In Studies in Logic and the Foundations of Mathe-
matics, volume 96, pages 55–66. Elsevier, 1978.
[Alt19] Thorsten Altenkirch. Naïve type theory. In Reflections on the
Foundations of Mathematics, pages 101–136. Springer, 2019.
[Arn17] Michael Arntzenius. Normalisation by evaluation for the simply-typed lambda calculus, in Agda, 2017. Available at https://gist.github.com/rntz/2543cf9ef5ee4e3d990ce3485a0186e2/revisions.
[Bae18] John Baez. Patterns That Eventually Fail. Azimuth blog, 2018. https://johncarlosbaez.wordpress.com/2018/09/20/patterns-that-eventually-fail/.
[Bar84] Hendrik Pieter Barendregt. The Lambda Calculus: Its Syntax and Semantics. North-Holland, 1984.
[Bar91] Henk P. Barendregt. Self-interpretation in lambda calculus. 1991.
[Bau11] Andrej Bauer. An injection from N^N to N. Unpublished note, 2011.
[Bau12] Andrej Bauer. How to implement dependent type theory. Blog post, 2012. Available at http://math.andrej.com/2012/11/08/how-to-implement-dependent-type-theory-i/.
[Bau17] Andrej Bauer. Five stages of accepting constructive mathemat-
ics. Bulletin of the American Mathematical Society, 54(3):481–
498, 2017.
[BB01] David Borwein and Jonathan M Borwein. Some remarkable prop-
erties of sinc and related integrals. The Ramanujan Journal,
5(1):73–89, 2001.
[Bel98] John Lane Bell. A primer of infinitesimal analysis. Cambridge
University Press, 1998.
[DM82] Luis Damas and Robin Milner. Principal type-schemes for func-
tional programs. In Proceedings of the 9th ACM SIGPLAN-
SIGACT symposium on Principles of programming languages,
pages 207–212, 1982.
[FH92] Matthias Felleisen and Robert Hieb. The revised report on the
syntactic theories of sequential control and state. Theoretical com-
puter science, 103(2):235–271, 1992.
[FR98] Michael J Fischer and Michael O Rabin. Super-exponential com-
plexity of Presburger arithmetic. In Quantifier Elimination and
Cylindrical Algebraic Decomposition, pages 122–135. Springer,
1998.
[Fre79] Gottlob Frege. Begriffsschrift, eine der arithmetischen nachge-
bildete Formelsprache des reinen Denkens. Nebert, 1879.
[KL20] Chris Kapulkin and Peter LeFanu Lumsdaine. The law of excluded
middle in the simplicial model of type theory, 2020.
[Kna28] Bronisław Knaster. Un théorème sur les fonctions d'ensembles. Ann. Soc. Polon. Math., 6:133–134, 1928.