
The Dao of Functional Programming

Bartosz Milewski
(Last updated: July 5, 2024)

Contents

Contents i
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Set theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x

1 Clean Slate 1
1.1 Types and Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Yin and Yang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Elements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4 The Object of Arrows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

2 Composition 5
2.1 Composition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 Function application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3 Identity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.4 Monomorphisms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.5 Epimorphisms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

3 Isomorphisms 13
3.1 Isomorphic Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.2 Naturality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.3 Reasoning with Arrows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Reversing the Arrows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19


4 Sum Types 21
4.1 Bool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.2 Enumerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Short Haskell Digression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.3 Sum Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Maybe . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.4 Cocartesian Categories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
One Plus Zero . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Something Plus Zero . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Commutativity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Associativity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Functoriality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Symmetric Monoidal Category . . . . . . . . . . . . . . . . . . . . . . . . . . 31

5 Product Types 33
Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Tuples and Records . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
5.1 Cartesian Category . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Tuple Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Functoriality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
5.2 Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
5.3 Monoidal Category . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
Monoids . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

6 Function Types 43
Elimination rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Introduction rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
Currying . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Relation to lambda calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
Modus ponens . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
6.1 Sum and Product Revisited . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
Sum types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
Product types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
Functoriality revisited . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
6.2 Functoriality of the Function Type . . . . . . . . . . . . . . . . . . . . . . . . 49
6.3 Bicartesian Closed Categories . . . . . . . . . . . . . . . . . . . . . . . . . . 50
Distributivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

7 Recursion 55
7.1 Natural Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
Introduction Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
Elimination Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
In Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
7.2 Lists . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Elimination Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
7.3 Functoriality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

8 Functors 63
8.1 Categories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
Category of sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
Opposite categories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
Product categories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
Slice categories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
Coslice categories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
8.2 Functors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
Functors between categories . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
8.3 Functors in Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
Endofunctors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
Bifunctors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
Contravariant functors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
Profunctors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
8.4 The Hom-Functor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
8.5 Functor Composition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
Category of categories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

9 Natural Transformations 75
9.1 Natural Transformations Between Hom-Functors . . . . . . . . . . . . . . . . 75
9.2 Natural Transformation Between Functors . . . . . . . . . . . . . . . . . . . . 77
9.3 Natural Transformations in Programming . . . . . . . . . . . . . . . . . . . . 78
Vertical composition of natural transformations . . . . . . . . . . . . . . . . . 80
Functor categories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
Horizontal composition of natural transformations . . . . . . . . . . . . . . . . 82
Whiskering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
Interchange law . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
9.4 Universal Constructions Revisited . . . . . . . . . . . . . . . . . . . . . . . . 86
Picking objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
Cospans as natural transformations . . . . . . . . . . . . . . . . . . . . . . . . 87
Functoriality of cospans . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
Sum as a universal cospan . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
Product as a universal span . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
Exponentials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
9.5 Limits and Colimits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
Equalizers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
Coequalizers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
The existence of the terminal object . . . . . . . . . . . . . . . . . . . . . . . 96
9.6 The Yoneda Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
Yoneda lemma in programming . . . . . . . . . . . . . . . . . . . . . . . . . 99
The contravariant Yoneda lemma . . . . . . . . . . . . . . . . . . . . . . . . . 99
9.7 Yoneda Embedding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
9.8 Representable Functors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
The guessing game . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
Representable functors in programming . . . . . . . . . . . . . . . . . . . . . 103
9.9 2-category 𝐂𝐚𝐭 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
9.10 Useful Formulas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

10 Adjunctions 105
10.1 The Currying Adjunction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
10.2 The Sum and the Product Adjunctions . . . . . . . . . . . . . . . . . . . . . . 106
The diagonal functor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
The sum adjunction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
The product adjunction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
Distributivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
10.3 Adjunction between functors . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
10.4 Limits and Colimits as Adjunctions . . . . . . . . . . . . . . . . . . . . . . . 110
10.5 Unit and Counit of an Adjunction . . . . . . . . . . . . . . . . . . . . . . . . . 111
Triangle identities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
The unit and counit of the currying adjunction . . . . . . . . . . . . . . . . . . 114
10.6 Adjunctions Using Universal Arrows . . . . . . . . . . . . . . . . . . . . . . . 115
Comma category . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
Universal arrow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
Universal arrows from adjunctions . . . . . . . . . . . . . . . . . . . . . . . . 117
Adjunction from universal arrows . . . . . . . . . . . . . . . . . . . . . . . . 117
10.7 Properties of Adjunctions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
Left adjoints preserve colimits . . . . . . . . . . . . . . . . . . . . . . . . . . 118
Right adjoints preserve limits . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
10.8 Freyd’s adjoint functor theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 120
Freyd’s theorem in a preorder . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
Solution set condition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
Defunctionalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
10.9 Free/Forgetful Adjunctions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
The category of monoids . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
Free monoid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
Free monoid in programming . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
10.10 The Category of Adjunctions . . . . . . . . . . . . . . . . . . . . . . . . . . 129
10.11 Levels of Abstraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130

11 Dependent Types 131


11.1 Dependent Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
11.2 Dependent Types Categorically . . . . . . . . . . . . . . . . . . . . . . . . . . 133
Fibrations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
Type families as fibrations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
Pullbacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
Substitution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
Dependent environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
Weakening . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
Base-change functor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
11.3 Dependent Sum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
Adding the atlas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
Existential quantification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
11.4 Dependent Product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
Dependent product in Haskell . . . . . . . . . . . . . . . . . . . . . . . . . . 142
Dependent product of sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
Dependent product categorically . . . . . . . . . . . . . . . . . . . . . . . . . 143

Adding the atlas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145


Universal quantification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
11.5 Equality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
Equational reasoning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
Equality vs isomorphism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
Equality types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
Introduction rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
𝛽-reduction and 𝜂-conversion . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
Induction principle for natural numbers . . . . . . . . . . . . . . . . . . . . . 151
Equality elimination rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152

12 Algebras 155
12.1 Algebras from Endofunctors . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
12.2 Category of Algebras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
Initial algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
12.3 Lambek’s Lemma and Fixed Points . . . . . . . . . . . . . . . . . . . . . . . . 158
Fixed point in Haskell . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
12.4 Catamorphisms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
Lists as initial algebras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
12.5 Initial Algebra from Universality . . . . . . . . . . . . . . . . . . . . . . . . . 163
12.6 Initial Algebra as a Colimit . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
The proof . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166

13 Coalgebras 169
13.1 Coalgebras from Endofunctors . . . . . . . . . . . . . . . . . . . . . . . . . . 169
13.2 Category of Coalgebras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
13.3 Anamorphisms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
Infinite data structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
13.4 Hylomorphisms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
The impedance mismatch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
13.5 Terminal Coalgebra from Universality . . . . . . . . . . . . . . . . . . . . . . 174
13.6 Terminal Coalgebra as a Limit . . . . . . . . . . . . . . . . . . . . . . . . . . 176

14 Monads 179
14.1 Programming with Side Effects . . . . . . . . . . . . . . . . . . . . . . . . . . 179
Partiality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
Logging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
State . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
Nondeterminism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
Input/Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
Continuation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
14.2 Composing Effects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
14.3 Alternative Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
14.4 Monad Instances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
Partiality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
Logging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186

Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
State . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
Nondeterminism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
Continuation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
Input/Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
14.5 Do Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
14.6 Continuation Passing Style . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
Tail recursion and CPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
Using named functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
Defunctionalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
14.7 Monads Categorically . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
Substitution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
Monad as a monoid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
14.8 Free Monads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
Category of monads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
Free monad . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
Stack calculator example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
14.9 Monoidal Functors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
Lax monoidal functors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
Functorial strength . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
Applicative functors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
Closed functors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
Monads and applicatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206

15 Monads and Adjunctions 209


15.1 String Diagrams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
String diagrams for the monad . . . . . . . . . . . . . . . . . . . . . . . . . . 212
String diagrams for the adjunction . . . . . . . . . . . . . . . . . . . . . . . . 214
15.2 Monads from Adjunctions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
15.3 Examples of Monads from Adjunctions . . . . . . . . . . . . . . . . . . . . . 216
Free monoid and the list monad . . . . . . . . . . . . . . . . . . . . . . . . . . 216
The currying adjunction and the state monad . . . . . . . . . . . . . . . . . . . 217
M-sets and the writer monad . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
Pointed objects and the Maybe monad . . . . . . . . . . . . . . . . . . . . . . 221
The continuation monad . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
15.4 Monad Transformers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
State monad transformer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
15.5 Monad Algebras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
Eilenberg-Moore category . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
Kleisli category . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228

16 Comonads 231
16.1 Comonads in Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
The Stream comonad . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
16.2 Comonads Categorically . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
Comonoids . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
16.3 Comonads from Adjunctions . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
Costate comonad . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236

Comonad coalgebras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238


Lenses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238

17 Ends and Coends 241


17.1 Profunctors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
Collages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
Profunctors as relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
Profunctor composition in Haskell . . . . . . . . . . . . . . . . . . . . . . . . 243
17.2 Coends . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244
Extranatural transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
Profunctor composition using coends . . . . . . . . . . . . . . . . . . . . . . . 248
Colimits as coends . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
17.3 Ends . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
Natural transformations as an end . . . . . . . . . . . . . . . . . . . . . . . . 251
Limits as ends . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
17.4 Continuity of the Hom-Functor . . . . . . . . . . . . . . . . . . . . . . . . . . 253
17.5 Fubini Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
17.6 Ninja Yoneda Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
Yoneda lemma in Haskell . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
17.7 Day Convolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
Applicative functors as monoids . . . . . . . . . . . . . . . . . . . . . . . . . 257
Free Applicatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258
17.8 The Bicategory of Profunctors . . . . . . . . . . . . . . . . . . . . . . . . . . 259
Monads in a bicategory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
Prearrows as monads in 𝐏𝐫𝐨𝐟 . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
17.9 Existential Lens . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
Existential lens in Haskell . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
Existential lens in category theory . . . . . . . . . . . . . . . . . . . . . . . . 263
Type-changing lens in Haskell . . . . . . . . . . . . . . . . . . . . . . . . . . 263
Lens composition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
Category of lenses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
17.10 Lenses and Fibrations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
Transport law . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
Identity law . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
Composition law . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
Type-changing lens . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
17.11 Important Formulas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268

18 Tambara Modules 271


18.1 Tannakian Reconstruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
Monoids and their Representations . . . . . . . . . . . . . . . . . . . . . . . . 271
Tannakian reconstruction of a monoid . . . . . . . . . . . . . . . . . . . . . . 272
Cayley’s theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
Proof of Tannakian reconstruction . . . . . . . . . . . . . . . . . . . . . . . . 276
Tannakian reconstruction in Haskell . . . . . . . . . . . . . . . . . . . . . . . 277
Tannakian reconstruction with adjunction . . . . . . . . . . . . . . . . . . . . 278
18.2 Profunctor Lenses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
Iso . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280

Profunctors and lenses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281


Tambara module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
Profunctor lenses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
Profunctor lenses in Haskell . . . . . . . . . . . . . . . . . . . . . . . . . . . 284
18.3 General Optics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
Prisms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
Traversals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
18.4 Mixed Optics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289

19 Kan Extensions 291


19.1 Closed Monoidal Categories . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
Internal hom for Day convolution . . . . . . . . . . . . . . . . . . . . . . . . . 292
Powering and co-powering . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
19.2 Inverting a functor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
19.3 Right Kan extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
Right Kan extension as an end . . . . . . . . . . . . . . . . . . . . . . . . . . 297
Right Kan extension in Haskell . . . . . . . . . . . . . . . . . . . . . . . . . . 298
Limits as Kan extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
Left adjoint as a right Kan extension . . . . . . . . . . . . . . . . . . . . . . . 301
Codensity monad . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
Codensity monad in Haskell . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
19.4 Left Kan extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304
Left Kan extension as a coend . . . . . . . . . . . . . . . . . . . . . . . . . . 305
Left Kan extension in Haskell . . . . . . . . . . . . . . . . . . . . . . . . . . 306
Colimits as Kan extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
Right adjoint as a left Kan extension . . . . . . . . . . . . . . . . . . . . . . . 308
Day convolution as a Kan extension . . . . . . . . . . . . . . . . . . . . . . . 308
19.5 Useful Formulas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309

20 Enrichment 311
20.1 Enriched Categories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
Set-theoretical foundations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
Hom-Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312
Enriched Categories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312
Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314
Preorders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314
Self-enrichment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
20.2 𝒱-Functors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
The Hom-functor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316
Enriched co-presheaves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
Functorial strength and enrichment . . . . . . . . . . . . . . . . . . . . . . . . 317
20.3 𝒱-Natural Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319
20.4 Yoneda Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
20.5 Weighted Limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
20.6 Ends as Weighted Limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322
20.7 Kan Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
20.8 Useful Formulas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325

Index 327

Preface
Most programming texts, following Brian Kernighan, start with “Hello World!”. It’s natural
to want to get the immediate gratification of making the computer do your bidding and print
these famous words. But the real mastery of computer programming goes deeper than that, and
rushing into it may only give you a false feeling of power, when in reality you’re just parroting
the masters. If your ambition is just to learn a useful, well-paid skill, then by all means write
your “Hello World!” program. There are tons of books and courses that will teach you how to
write code in any language of your choice. However, if you really want to get to the essence of
programming, you need to be patient and persistent.
Category theory is the branch of mathematics that provides the abstractions that accord
with the practical experience of programming. Paraphrasing von Clausewitz: Programming is
merely the continuation of mathematics by other means. A lot of complex ideas of category
theory become obvious to programmers when explained in terms of data types and functions. In
this sense, category theory might be more accessible to programmers than it is to professional
mathematicians.
When faced with a new categorical concept, I would often look it up on Wikipedia or
nLab, or re-read a chapter from Mac Lane or Kelly. These are great sources, but they require
some up front familiarity with the topics and the ability to fill in the gaps. One of the goals of
this book is to provide the necessary bootstrap to continue studying category theory.
There is a lot of folklore knowledge in category theory and in computer science that is
nowhere to be found in the literature. It’s very difficult to acquire useful intuitions when going
through dry definitions and theorems. I tried, as much as possible, to provide the missing
intuitions and explain not only the whats but also the whys.
The title of this book alludes to Benjamin Hoff’s “The Tao of Pooh” and to Robert Pirsig’s
“Zen and the Art of Motorcycle Maintenance,” both being attempts by Westerners to assimilate
elements of Eastern philosophy. Loosely speaking, the idea is that category theory is to programming
as the Dao¹ is to Western philosophy. Many of the definitions of category theory make no
sense on first reading but in time you learn to appreciate their deep wisdom. If category theory
were to be summarized in one soundbite, it would be: “Things are defined by their relationship
to the Universe.”

Set theory
Traditionally, set theory was considered the foundation of mathematics, although more recently
type theory is vying for this title. In a sense, set theory is the assembly language of mathematics,
and as such contains a lot of implementation details that often obscure the presentation of high
level ideas.
Category theory is not trying to replace set theory, and it’s often used to build abstractions
that are later modeled using sets. In fact, the fundamental theorem of category theory, the
Yoneda lemma, connects categories to their models in set theory. We can find useful intuition in
computer graphics, where we build and manipulate abstract worlds only to, at the last moment,
project and sample them for a digital display.
It’s not necessary to be fluent in set theory in order to study category theory. But some
familiarity with the basics is necessary. For instance, the idea that sets contain elements. We say
that, given a set 𝑆 and an element 𝑎, it makes sense to ask whether 𝑎 is an element of 𝑆 or not.
¹ Dao is the more modern spelling of Tao.

This statement is written as 𝑎 ∈ 𝑆 (𝑎 is a member of 𝑆). It’s also possible to have an empty set
that contains no elements.
The important property of elements of a set is that they can be compared for equality. Given
two elements 𝑎 ∈ 𝑆 and 𝑏 ∈ 𝑆, we can ask: Is 𝑎 equal to 𝑏? Or we may impose the condition
that 𝑎 = 𝑏, in case 𝑎 and 𝑏 are results of two different recipes for selecting elements of the set
𝑆. Equality of set elements is the essence of all the commuting diagrams in category theory.
A cartesian product of two sets 𝑆 × 𝑇 is defined as a set of all pairs of elements ⟨𝑠, 𝑡⟩, such
that 𝑠 ∈ 𝑆 and 𝑡 ∈ 𝑇 .
A function 𝑓 ∶ 𝑆 → 𝑇 , from the source set called the domain of 𝑓 to the target set called
the codomain, is also defined as a set of pairs. These are the pairs of the form ⟨𝑠, 𝑡⟩ where
𝑡 = 𝑓 𝑠. Here 𝑓 𝑠 is the result of the action of the function 𝑓 on the argument 𝑠. You might be
more familiar with the notation 𝑓 (𝑠) for function application, but here I’ll follow the Haskell
convention of omitting the parentheses (and commas, for functions of multiple variables).
In programming we are used to functions being defined by a sequence of instructions. We
provide an argument 𝑠 and apply the instructions to eventually produce the result 𝑡. We are often
worried about how long it may take to evaluate the result, or if the algorithm terminates at all.
In mathematics we assume that, for any given argument 𝑠 ∈ 𝑆, the result 𝑡 ∈ 𝑇 is immediately
available, and that it’s unique. In programming we call such functions pure and total.
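For illustration, here is a pure, total function in Haskell (the name square is mine, not from the text):
-- A pure, total function: defined for every Int, with a unique result.
square :: Int -> Int
square n = n * n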

Conventions
I tried to keep the notation coherent throughout the book. It’s based loosely on the prevailing
style in nLab.
In particular, I decided to use lowercase letters like 𝑎 or 𝑏 for objects in a category and up-
percase letters like 𝑆 for sets (even though sets are objects in the category of sets and functions).
Generic categories have names like 𝒞 or 𝒟, whereas specific categories have names like 𝐒𝐞𝐭 or
𝐂𝐚𝐭.
Programming examples are written in Haskell. Although this is not a Haskell manual, the
introduction of language constructs is gradual enough to help the reader navigate the code. The
fact that Haskell syntax is often based on mathematical notation is an additional advantage.
Program fragments are written in the following format:
apply :: (a -> b, a) -> b
apply (f, x) = f x
Chapter 1

Clean Slate

Programming starts with types and functions. You probably have some preconceptions about
what types and functions are: get rid of them! They will cloud your mind.
Don’t think about how things are implemented in hardware. What computers are is just
one of the many models of computation. We shouldn’t get attached to it. You can perform
computations in your mind, or with pen and paper. The physical substrate is irrelevant to the
idea of programming.

1.1 Types and Functions


Paraphrasing Lao Tzu¹: The type that can be described is not the eternal type. In other words,
type is a primitive notion. It cannot be defined.
Instead of calling it a type, we could as well call it an object or a proposition. These are
the words that are used to describe it in different areas of mathematics (type theory, category
theory, and logic, respectively).
There may be more than one type, so we need a way to name them. We could do it by
pointing fingers at them, but since we want to effectively communicate with other people, we
usually name them. So we’ll talk about type 𝑎, 𝑏, 𝑐; or Int, Bool, Double, and so on. These
are just names.
A type by itself has no meaning. What makes it special is how it connects to other types.
The connections are described by arrows. An arrow has one type as its source and one type as
its target. The target could be the same as the source, in which case the arrow loops around.
An arrow between types is called a function. An arrow between objects is called a morphism.
An arrow between propositions is called an entailment. These are just words that are used to
describe arrows in different areas of mathematics. You can use them interchangeably.
A proposition is something that may be true. In logic, we interpret an arrow between two
objects as 𝑎 entails 𝑏, or 𝑏 is derivable from 𝑎.

¹ The modern spelling of Lao Tzu is Laozi, but I’ll be using the traditional one. Lao Tzu was the semi-legendary author of Tao Te Ching (or Daodejing), a classic text on Daoism.


There may be more than one arrow between two types, so we need to name them. For
instance, here’s an arrow called 𝑓 that goes from type 𝑎 to type 𝑏
𝑓 ∶ 𝑎 → 𝑏
One way to interpret this is to say that the function 𝑓 takes an argument of type 𝑎 and
produces a result of type 𝑏. Or that 𝑓 is a proof that if 𝑎 is true then 𝑏 is also true.
Note: The connection between type theory, lambda calculus (which is the foundation of
programming), logic, and category theory is known as the Curry-Howard-Lambek correspondence.

1.2 Yin and Yang


An object is defined by its connections. An arrow is a proof, a witness, of the fact that two objects
are connected. Sometimes there’s no proof and the objects are disconnected; sometimes there are
many proofs; and sometimes there’s a single proof—a unique arrow between two objects.
What does it mean to be unique? It means that if you can find two of those, then they must
be equal.
An object that has a unique outgoing arrow to every object is called the initial object.
Its dual is an object that has a unique incoming arrow from every object. It’s called the
terminal object.
In mathematics, the initial object is often denoted by 0 and the terminal object by 1.
The arrow from 0 to any object 𝑎 is denoted by ¡𝑎 , often abbreviated to ¡.
The arrow from any object 𝑎 to 1 is denoted by !𝑎 , often abbreviated to !.
The initial object is the source of everything. As a type it’s known in Haskell as Void.
It symbolizes the chaos from which everything arises. Since there is an arrow from Void to
everything, there is also an arrow from Void to itself.

Thus Void begets Void and everything else.
The terminal object unites everything. As a type it’s known as Unit. It symbolizes the
ultimate order.
In logic, the terminal object signifies the ultimate truth, symbolized by 𝑇 or ⊤. The fact that
there’s an arrow to it from any object means that ⊤ is true no matter what your assumptions are.
Dually, the initial object signifies logical falsehood, contradiction, or a counterfactual. It’s
written as False and symbolized by an upside down T, ⊥. The fact that there is an arrow from it
to any object means that you can prove anything starting from false premises.
In English, there is a special grammatical construct for counterfactual implications. When
we say, “If wishes were horses, beggars would ride,” we mean that the equality between wishes
and horses implies that beggars would be able to ride. But we know that the premise is false.
A programming language lets us communicate with each other and with computers. Some
languages are easier for the computer to understand, others are closer to the theory. We will use
Haskell as a compromise.
In Haskell, the name for the terminal type is (), a pair of empty parentheses, pronounced
Unit. This notation will make sense later.

There are infinitely many types in Haskell, and there is a unique function/arrow from Void
to each one of them. All these functions are known under the same name: absurd.
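For instance (the alias names below are mine; Void and absurd themselves come from Haskell’s Data.Void):
import Data.Void (Void, absurd)

-- The unique arrow from Void to any type is called absurd.
voidToInt :: Void -> Int
voidToInt = absurd

voidToBool :: Void -> Bool
voidToBool = absurd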

Programming      Category theory         Logic
type             object                  proposition
function         morphism (arrow)        implication
Void             initial object, 0       False, ⊥
()               terminal object, 1      True, ⊤

1.3 Elements
An object has no parts but it may have structure. The structure is defined by the arrows pointing
at the object. We can probe the object with arrows.
In programming and in logic we want our initial object to have no structure. So we’ll assume
that it has no incoming arrows (other than the one that’s looping back from it). Therefore Void
has no structure.
The terminal object has the simplest structure. There is only one incoming arrow from any
object to it: there is only one way of probing it from any direction. In this respect, the terminal
object behaves like an indivisible point. Its only property is that it exists, and the arrow from
any other object proves it.
Because the terminal object is so simple, we can use it to probe other, more complex objects.
If there is more than one arrow coming from the terminal object to some object 𝑎, it means
that 𝑎 has some structure: there is more than one way of looking at it. Since the terminal object
behaves like a point, we can visualize each arrow from it as picking a different point or element
of its target.
In category theory we say that 𝑥 is a global element of 𝑎 if it’s an arrow
𝑥 ∶ 1 → 𝑎
We’ll often simply call it an element (omitting “global”).
In type theory, 𝑥 ∶ 𝐴 means that 𝑥 is of type 𝐴.
In Haskell, we use the double-colon notation instead:
x :: A
(Haskell uses capitalized names for concrete types, and lower-cased names for type variables.)
We say that x is a term of type A but, categorically, we’ll interpret it as an arrow 𝑥 ∶ 1 → 𝐴,
a global element of A.²
In logic, such 𝑥 is called the proof of 𝐴, since it corresponds to the implication ⊤ → 𝐴 (if
True is true then A is true). Notice that there may be many different proofs of 𝐴.
Since we have mandated there be no arrows from any other object to Void, there is no arrow
from the terminal object to it. Therefore Void has no elements. This is why we think of Void
as empty.
The terminal object has just one element, since there is a unique arrow coming from it to
itself, 1 → 1. This is why we sometimes call it a singleton.
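Here is a quick Haskell sketch of these element counts (the names are mine):
-- The two global elements of Bool: two different arrows from ().
trueEl, falseEl :: () -> Bool
trueEl () = True
falseEl () = False

-- The single element of the terminal object: the unique arrow 1 -> 1.
unitEl :: () -> ()
unitEl () = ()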
Note: In category theory there is no prohibition against the initial object having incoming
arrows from other objects. However, in the cartesian closed categories that we’re studying here,
this is not allowed.
² The Haskell type system distinguishes between x :: A and x :: () -> A. However, they denote the same thing in categorical semantics.

1.4 The Object of Arrows


Arrows between any two objects form a set³. This is why some knowledge of set theory is a
prerequisite to the study of category theory.
In programming we talk about the type of functions from a to b. In Haskell we write:
f :: a -> b
meaning that f is of the type “function from a to b”. Here, a->b is just the name we are giving
to this type.
If we want function types to be treated the same way as other types, we need an object that
would represent a set of arrows from a to b.
To fully define this object, we would have to describe its relation to other objects, in particular
to a and b. We don’t have the tools to do that yet, but we’ll get there.
For now, let’s keep in mind the following distinction: On the one hand we have arrows which
connect two objects a and b. These arrows form a set. On the other hand we have an object of
arrows from a to b. An “element” of this object is defined as an arrow from the terminal object
() to the object we call a->b.
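In Haskell the two views can be sketched side by side (the names f and fEl are mine):
-- f viewed as an arrow between the objects Int and Bool.
f :: Int -> Bool
f = even

-- The "same" f viewed as a global element of the object of arrows.
fEl :: () -> (Int -> Bool)
fEl () = f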
The notation we use in programming tends to blur this distinction. This is why in category
theory we call the object of arrows an exponential and write it as 𝑏ᵃ (the source object is in the
exponent). So the statement:
f :: a -> b
is equivalent to
𝑓 ∶ 1 → 𝑏ᵃ
In logic, an arrow 𝐴 → 𝐵 is an implication: it states the fact that “if A then B.” An exponential
object 𝐵ᴬ is the corresponding proposition. It could be true or it could be false, we don’t know.
You have to prove it. Such a proof is an element of 𝐵ᴬ.
Show me an element of 𝐵ᴬ and I’ll know that 𝐵 follows from 𝐴.
Consider again the statement, “If wishes were horses, beggars would ride”—this time as an
object. It’s not an empty object, because you can point at a proof of it—something along these
lines: “A person who has a horse rides it. Beggars have wishes. Since wishes are horses, beggars
have horses. Therefore beggars ride.” But, even though you have a proof of this statement, it’s
of no use to you, because you can never prove its premise: “wish = horse”.

³ Strictly speaking, this is true only in a locally small category.
Chapter 2

Composition

2.1 Composition
Programming is about composition. Paraphrasing Wittgenstein, one could say: “Of that which
cannot be decomposed one should not speak.” This is not a prohibition, it’s a statement of
fact. The process of studying, understanding, and describing is the same as the process of
decomposing; and our language reflects this.
The reason we have built the vocabulary of objects and arrows is precisely to express the
idea of composition.
Given an arrow 𝑓 from 𝑎 to 𝑏 and an arrow 𝑔 from 𝑏 to 𝑐, their composition is an arrow that
goes directly from 𝑎 to 𝑐. In other words, if there are two arrows, the target of one being the
same as the source of the other, we can always compose them to get a third arrow.

𝑎 --𝑓--> 𝑏 --𝑔--> 𝑐

In math we denote composition using a little circle

ℎ = 𝑔◦𝑓

We read this: “ℎ is equal to 𝑔 after 𝑓.” The choice of the word “after” suggests temporal ordering
of actions, which in most cases is a useful intuition.
The order of composition might seem backward, but this is because we think of functions
as taking arguments on the right. In Haskell we replace the circle with a dot:
h = g . f
This is every program in a nutshell. In order to accomplish h, we decompose it into simpler
problems, f and g. These, in turn, can be decomposed further, and so on.
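For example (a small sketch, with a name of my choosing): to decide whether a string has an even number of characters, we compose two standard functions:
-- Decompose the problem: count the characters, then test the count.
hasEvenLength :: String -> Bool
hasEvenLength = even . length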
Now suppose that we were able to decompose 𝑔 itself into 𝑗◦𝑘. We have

ℎ = (𝑗◦𝑘)◦𝑓

We want this decomposition to be the same as

ℎ = 𝑗◦(𝑘◦𝑓 )


We want to be able to say that we have decomposed ℎ into three simpler problems

ℎ = 𝑗◦𝑘◦𝑓

and not have to keep track of which decomposition came first. This is called associativity of
composition, and we will assume it from now on.
Composition is the source of two mappings of arrows called pre-composition and post-composition.
When you post-compose an arrow ℎ with an arrow 𝑓 , it produces the arrow 𝑓 ◦ℎ (the arrow
𝑓 is applied after the arrow ℎ). Of course, you can post-compose ℎ only with arrows whose
source is the target of ℎ. Post-composition by 𝑓 is written as (𝑓 ◦−), leaving a hole for ℎ. As
Lao Tzu would say, “Usefulness of post-composition comes from what is not there.”
Thus an arrow 𝑓 ∶ 𝑎 → 𝑏 induces a mapping of arrows (𝑓 ◦−) that maps arrows which are
probing 𝑎 to arrows which are probing 𝑏.


Since objects have no internal structure, when we say that 𝑓 transforms 𝑎 to 𝑏, this is exactly
what we mean.
Post-composition lets us shift focus from one object to another.
Dually, you can pre-compose by 𝑓 , or apply (−◦𝑓 ) to arrows originating in 𝑏 to map them
to arrows originating in 𝑎 (notice the change of direction).


Pre-composition lets us shift the perspective from observer 𝑏 to observer 𝑎.


Pre- and post-composition are mappings of arrows. Since arrows form sets, these are functions between sets.
Another way of looking at pre- and post-composition is that they are the result of partial
application of the two-hole composition operator (−◦−), in which we can pre-fill one hole or
the other with a fixed arrow.
In programming, an outgoing arrow is interpreted as extracting data from its source. An
incoming arrow is interpreted as producing or constructing the target. Outgoing arrows define
the interface, incoming arrows define the constructors.
For Haskell programmers, here’s the implementation of post-composition as a higher-order
function:
postCompWith :: (a -> b) -> (x -> a) -> (x -> b)
postCompWith f = \h -> f . h
And similarly for pre-composition:

preCompWith :: (a -> b) -> (b -> x) -> (a -> x)
preCompWith f = \h -> h . f
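As a usage sketch (probe and shifted are my names), post-composing with show shifts a probe of Int into a probe of String:
probe :: Bool -> Int
probe b = if b then 1 else 0

-- Same as show . probe; the focus moves from Int to String.
shifted :: Bool -> String
shifted = postCompWith show probe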
Do the following exercises to convince yourself that shifts in focus and perspective are composable.

Exercise 2.1.1. Suppose that you have two arrows, 𝑓 ∶ 𝑎 → 𝑏 and 𝑔 ∶ 𝑏 → 𝑐. Their composition
𝑔◦𝑓 induces a mapping of arrows ((𝑔◦𝑓 )◦−). Show that the result is the same if you first apply
(𝑓 ◦−) and follow it by (𝑔◦−). Symbolically:

((𝑔◦𝑓 )◦−) = (𝑔◦−)◦(𝑓 ◦−)

Hint: Pick an arbitrary object 𝑥 and an arrow ℎ ∶ 𝑥 → 𝑎 and see if you get the same result.
Note that ◦ is overloaded here. On the right, it means regular function composition when put
between two post-compositions.

Exercise 2.1.2. Convince yourself that the composition from the previous exercise is associative.
Hint: Start with three composable arrows.

Exercise 2.1.3. Show that pre-composition (−◦𝑓 ) is composable, but the order of composition
is reversed:
(−◦(𝑔◦𝑓 )) = (−◦𝑓 )◦(−◦𝑔)

2.2 Function application


We are ready to write our first program. There is a saying: “A journey of a thousand miles
begins with a single step.” Consider a journey from 1 to 𝑏. Our single step can be an arrow from
the terminal object 1 to some 𝑎. It’s an element of 𝑎. We can write it as:
𝑥 ∶ 1 → 𝑎

The rest of the journey is the arrow:


𝑓 ∶ 𝑎 → 𝑏
These two arrows are composable (they share the object 𝑎 in the middle) and their composition
is the arrow 𝑦 from 1 to 𝑏. In other words, 𝑦 is an element of 𝑏:
1 --𝑥--> 𝑎 --𝑓--> 𝑏, with the composite 𝑦 going directly from 1 to 𝑏

We can write it as:


𝑦 = 𝑓 ◦𝑥
We used 𝑓 to map an element of 𝑎 to an element of 𝑏. Since this is something we do quite
often, we call it the application of a function 𝑓 to 𝑥, and use the shorthand notation

𝑦 = 𝑓𝑥

Let’s translate it to Haskell. We start with an element 𝑥 of 𝑎 (a shorthand for x :: () -> a)



x :: a
We declare a function 𝑓 as an element of the “object of arrows” from 𝑎 to 𝑏
f :: a -> b
with the understanding (which will be elaborated upon later) that it corresponds to an arrow
from a to b. The result is an element of 𝑏
y :: b
and it is defined as
y = f x
We call this the application of a function to an argument, but we were able to express it purely
in terms of function composition. (Note: In other programming languages function application
requires the use of parentheses, e.g., y = f(x).)

2.3 Identity
You may think of arrows as representing change: object 𝑎 becomes object 𝑏. An arrow that loops
back represents a change in an object itself. But change has its dual: lack of change, inaction
or, as Lao Tzu would say, wu wei.
Every object has a special arrow called the identity, which leaves the object unchanged. It
means that, when you compose this arrow with any other arrow, either incoming or outgoing,
you get that other arrow back. Thought of as an action, the identity arrow does nothing and
takes no time.
An identity arrow on the object 𝑎 is called 𝑖𝑑𝑎 . So if we have an arrow 𝑓 ∶ 𝑎 → 𝑏, we can
compose it with identities on either side

𝑖𝑑𝑏 ◦𝑓 = 𝑓 = 𝑓 ◦𝑖𝑑𝑎
or, pictorially: 𝑎 --𝑓--> 𝑏, with the identity loop 𝑖𝑑𝑎 at 𝑎 and 𝑖𝑑𝑏 at 𝑏.
We can easily check what an identity does to elements. Let’s take an element 𝑥 ∶ 1 → 𝑎 and
compose it with 𝑖𝑑𝑎 . The result is:
𝑖𝑑𝑎 ◦𝑥 = 𝑥
which means that identity leaves elements unchanged.
In Haskell, we use the same name id for all identity functions (we don’t subscript it with the
type it’s acting on). The above equation, which specifies the action of 𝑖𝑑 on elements, translates
directly to:
id x = x
and it becomes the definition of the function id.
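We can spot-check the identity laws at any point (a throwaway sketch; the function negate and the argument are arbitrary):
-- id . f and f . id agree with f on every argument.
idLawsHold :: Bool
idLawsHold = (id . negate) x == negate x && (negate . id) x == negate x
  where x = 5 :: Int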
We’ve seen before that both the initial object and the terminal object have unique arrows
circling back to them. Now we are saying that every object has an identity arrow circling back
to it. Remember what we said about uniqueness: If you can find two of those, then they must be
equal. We must conclude that these unique looping arrows we talked about must be the identity
arrows. We can now label these diagrams: the loop on Void is 𝑖𝑑, and so is the loop on ().
In logic, the identity arrow translates to a tautology. It’s a trivial proof that, “if 𝑎 is true then 𝑎
is true.” It’s also called the identity rule.
If identity does nothing then why do we care about it? Imagine going on a trip, composing
a few arrows, and finding yourself back at the starting point. The question is: Have you done
anything, or have you wasted your time? The only way to answer this question is to compare
your path to the identity arrow.
Some round trips bring change, others don’t.
More importantly, identity arrows will allow us to compare objects. They are an integral
part of the definition of an isomorphism.

Exercise 2.3.1. What does (𝑖𝑑𝑎 ◦−) do to arrows terminating in 𝑎? What does (−◦𝑖𝑑𝑎 ) do to
arrows originating from 𝑎?

2.4 Monomorphisms
Consider the function even that tests whether its input is divisible by two:
even :: Int -> Bool
It’s a many-to-one function: all even numbers are mapped to True and all odd numbers to
False. Most information about the input is discarded; we are only interested in its evenness, not
its actual value. By discarding information we arrive at an abstraction¹. Functions (and later
functors) epitomize abstractions.
Contrast this with the function injectBool:
injectBool :: Bool -> Int
injectBool b = if b then 1 else 0
This function doesn’t discard information. You can recover its argument from its result.
Functions that don’t discard information are also useful: they can be thought of as injecting
their source into their target. You may imagine the type of the source as a shape that is being
embedded in the target. Here, we are embedding a two-element shape Bool into the type of
integers.


Injective functions, or injections, are defined to always assign different values to different
arguments. In other words, they don’t collapse multiple elements into one.
Here’s another, slightly convoluted, way of saying this: An injection maps two elements to
one only when the two elements are equal.
¹ To abstract literally means to draw away.

We could translate this definition to the categorical language by replacing “elements” with
arrows from the terminal object. We would say that 𝑓 ∶ 𝑎 → 𝑏 is an injection if, for any pair of
global elements 𝑥1 ∶ 1 → 𝑎 and 𝑥2 ∶ 1 → 𝑎, the following implication holds:

𝑓 ◦𝑥1 = 𝑓 ◦𝑥2 ⟹ 𝑥1 = 𝑥2


The problem with this definition is that not every category has a terminal object. A better
definition would replace global elements with arbitrary shapes. Thus the notion of injectivity is
generalized to that of monomorphism.
An arrow 𝑓 ∶ 𝑎 → 𝑏 is monomorphic if, for any choice of an object 𝑐 and a pair of arrows
𝑔1 ∶ 𝑐 → 𝑎 and 𝑔2 ∶ 𝑐 → 𝑎 we have the following implication:

𝑓 ◦𝑔1 = 𝑓 ◦𝑔2 ⟹ 𝑔1 = 𝑔2

To show that an arrow 𝑓 ∶ 𝑎 → 𝑏 is not a monomorphism, it’s enough to find a counterexample:
two different shapes in 𝑎, such that 𝑓 maps them to the same shape in 𝑏.
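For instance, here is such a counterexample showing that even is not a monomorphism (the element names are mine):
-- Two different global elements of Int...
x1, x2 :: () -> Int
x1 () = 0
x2 () = 2

-- ...that even collapses: both composites pick True, yet x1 and x2
-- differ, so even is not monic.
collapsed :: Bool
collapsed = (even . x1) () == (even . x2) ()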
Monomorphisms, or “monos” for short, are often denoted using special arrows, as in 𝑎 ↪ 𝑏
or 𝑎 ↣ 𝑏.
In category theory objects are indivisible, so we can only talk about subobjects using arrows.
We say that a monomorphism 𝑎 ↪ 𝑏 picks a subobject of 𝑏 in the shape of 𝑎.

Exercise 2.4.1. Show that any arrow from the terminal object is a monomorphism.

2.5 Epimorphisms
The function injectBool is injective (hence a monomorphism), but it only covers a small
subset of its target — just two integers out of infinitely many.
injectBool :: Bool -> Int
injectBool b = if b then 1 else 0
In contrast, the function even covers the whole of Bool (it can produce both True and False).
A function that covers the whole of its target is called a surjection.
To generalize injections we used additional mappings-in. To generalize surjections, we’ll
use mappings-out. The categorical counterpart of a surjection is called an epimorphism.
An arrow 𝑓 ∶ 𝑎 → 𝑏 is an epimorphism if for any choice of an object 𝑐 and a pair of arrows
𝑔1 ∶ 𝑏 → 𝑐 and 𝑔2 ∶ 𝑏 → 𝑐 we have the following implication:

𝑔1 ◦𝑓 = 𝑔2 ◦𝑓 ⟹ 𝑔1 = 𝑔2

Conversely, to show that 𝑓 is not an epimorphism, it’s enough to pick an object 𝑐 and two
different arrows 𝑔1 and 𝑔2 that agree when precomposed with 𝑓 .
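Here is such a counterexample for injectBool, sketched in Haskell (g1 and g2 are my names):
-- g1 and g2 agree on 0 and 1, the whole image of injectBool,
-- but differ at, say, 5, so injectBool is not an epimorphism.
g1, g2 :: Int -> Bool
g1 n = n > 0
g2 n = n == 1

agreeAfterInject :: Bool
agreeAfterInject =
  map (g1 . injectBool) [False, True] == map (g2 . injectBool) [False, True]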
To get some insight into this definition, we have to visualize mappings out. Just as a mapping
into an object can be thought of as defining a shape, a mapping out of an object can be thought of
as defining properties of that object.
This is clear when dealing with sets, especially if the target set is finite. You may think of an
element of the target set as defining a color. All elements of the source that are mapped to that
element are “painted” a particular color. For instance, the function even paints all even integers
the True color, and all odd ones the False color.
In the definition of an epimorphism we have two such mappings, 𝑔1 and 𝑔2 . Suppose that
they differ only slightly. Most of the object 𝑏 is painted alike by both of them.

(Picture 𝑔1 , 𝑔2 ∶ 𝑏 → 𝑐 as two colorings of 𝑏 that differ only on a small region.)

If 𝑓 is not an epimorphism, it’s possible that its image only covers the part that is painted
alike by 𝑔1 and 𝑔2 . The two arrows then agree on painting 𝑎 when precomposed with 𝑓 , even
though they are different on the whole.

(The image of 𝑓 ∶ 𝑎 → 𝑏 lands entirely in the region where the two colorings agree.)

Of course, this is just an illustration. In an actual category there is no peeking inside objects.
Epimorphisms, or “epis” for short, are often denoted by a special arrow 𝑎 ↠ 𝑏.
In sets, a function that is both injective and surjective is called a bijection. It provides a one-
to-one invertible mapping between elements of two sets. This role is played by isomorphisms
in category theory. However, in general it’s not true that an arrow that is both mono and epi is
an isomorphism.

Exercise 2.5.1. Show that any arrow to the terminal object is an epimorphism.
Chapter 3

Isomorphisms

When we say that:


𝑓 ◦(𝑔◦ℎ) = (𝑓 ◦𝑔)◦ℎ
or:
𝑓 = 𝑓 ◦𝑖𝑑
we are asserting the equality of arrows. The arrow on the left is the result of one operation, and
the arrow on the right is the result of another. But the results are equal.
We often illustrate such equalities by drawing commuting diagrams, e.g.,

(One diagram chases ℎ, 𝑔, 𝑓 through 𝑎 → 𝑏 → 𝑐 → 𝑑 and shows that the composites 𝑓 ◦(𝑔◦ℎ)
and (𝑓 ◦𝑔)◦ℎ coincide; the other shows 𝑓 ∶ 𝑎 → 𝑏 alongside 𝑓 ◦𝑖𝑑.)

Thus we compare arrows for equality.
We do not compare objects for equality. (Half-jokingly, invoking equality of objects is considered
“evil” in category theory.) We see objects as confluences of arrows, so if we
want to compare two objects, we look at the arrows.

3.1 Isomorphic Objects


The simplest relation between two objects is an arrow.
The simplest round trip is a composition of two arrows going in opposite directions.

𝑓 ∶ 𝑎 → 𝑏 and 𝑔 ∶ 𝑏 → 𝑎

There are two possible round trips. One is 𝑔◦𝑓 , which goes from 𝑎 to 𝑎. The other is 𝑓 ◦𝑔 ,
which goes from 𝑏 to 𝑏.

If both of them result in identities, then we say that 𝑔 is the inverse of 𝑓

𝑔◦𝑓 = 𝑖𝑑𝑎

𝑓 ◦𝑔 = 𝑖𝑑𝑏
and we write it as 𝑔 = 𝑓 −1 (pronounced 𝑓 inverse). The arrow 𝑓 −1 undoes the work of the
arrow 𝑓 .
Such a pair of arrows is called an isomorphism and the two objects are called isomorphic.
What does the existence of an isomorphism tell us about the two objects it connects?
We have said that objects are described by their interactions with other objects. So let’s
consider what the two isomorphic objects look like from the perspective of an observer 𝑥. Take
an arrow ℎ coming from 𝑥 to 𝑎.


There is a corresponding arrow coming from 𝑥 to 𝑏. It’s just the composite 𝑓 ◦ℎ, that is, the
action of (𝑓 ◦−) on ℎ.
(In the triangle, ℎ ∶ 𝑥 → 𝑎 and 𝑓 ◦ℎ ∶ 𝑥 → 𝑏 sit above the isomorphism pair 𝑓 , 𝑓 −1 .)

Similarly, for any arrow probing 𝑏 there is a corresponding arrow probing 𝑎. It is given by the
action of (𝑓 −1 ◦−).
We can move focus back and forth between 𝑎 and 𝑏 using the mappings (𝑓 ◦−) and (𝑓 −1 ◦−).
We can combine these two mappings (see exercise 2.1.1) to form a round trip. The result
is the same as if we applied the composite ((𝑓 −1 ◦𝑓 )◦−). But this is equal to (𝑖𝑑𝑎 ◦−) which, as
we know from exercise 2.3.1, leaves the arrows unchanged.
Similarly, the round trip induced by 𝑓 ◦𝑓 −1 leaves the arrows 𝑥 → 𝑏 unchanged.
This creates a “buddy system” between the two groups of arrows. Imagine each arrow
sending a message to its buddy, as determined by 𝑓 or 𝑓 −1 . Each arrow would then receive
exactly one message, and that would be a message from its buddy. No arrow would be left
behind, and no arrow would receive more than one message. Mathematicians call this kind of
buddy system a bijection or one-to-one correspondence.
Therefore, arrow by arrow, the two objects 𝑎 and 𝑏 look exactly the same from the perspective
of 𝑥. Arrow-wise, there is no difference between the two objects.
Two isomorphic objects have exactly the same properties.
In particular, if you replace 𝑥 with the terminal object 1, you’ll see that the two objects have
the same elements. For every element 𝑥 ∶ 1 → 𝑎 there is a corresponding element 𝑦 ∶ 1 → 𝑏,
namely 𝑦 = 𝑓 ◦𝑥, and vice versa. There is a bijection between the elements of isomorphic
objects.
Such indistinguishable objects are called isomorphic because they have “the same shape.”
You’ve seen one, you’ve seen them all.
We write this isomorphism as:

𝑎≅𝑏
When dealing with objects, we use isomorphism in place of equality.
In programming, two isomorphic types have the same external behavior. One type can be
implemented in terms of the other and vice versa. One can be replaced by the other without
changing the behavior of the system (except, possibly, the performance).
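For instance, here is a pair of functions, each the inverse of the other, witnessing a (non-identity) isomorphism of Int with itself (a sketch that ignores overflow at the ends of the Int range):
shift :: Int -> Int
shift n = n + 1

shiftInv :: Int -> Int
shiftInv n = n - 1
Composing them in either order gives the identity.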
In classical logic, if B follows from A and A follows from B then A and B are logically
equivalent. We often say that B is true “if and only if” A is true. However, unlike previous
parallels between logic and type theory, this one is not as straightforward if you consider proofs
to be relevant. In fact, it led to the development of a new branch of fundamental mathematics
called homotopy type theory, or HoTT for short.

Exercise 3.1.1. Make an argument that there is a bijection between arrows that are outgoing
from two isomorphic objects. Draw the corresponding diagrams.

Exercise 3.1.2. Show that every object is isomorphic to itself.

Exercise 3.1.3. If there are two terminal objects, show that they are isomorphic.

Exercise 3.1.4. Show that the isomorphism from the previous exercise is unique.

3.2 Naturality
We’ve seen that, when two objects are isomorphic, we can switch focus from one to another
using post-composition: either (𝑓 ◦−) or (𝑓 −1 ◦−).
Conversely, to switch between different observers, we would use pre-composition.
Indeed, an arrow ℎ probing 𝑎 from 𝑥 is related to the arrow ℎ◦𝑔 probing the same object
from 𝑦.
(Here 𝑔 ∶ 𝑦 → 𝑥; the arrow ℎ ∶ 𝑥 → 𝑎 corresponds to ℎ◦𝑔 ∶ 𝑦 → 𝑎, above the pair 𝑓 , 𝑓 −1 .)

Similarly, an arrow ℎ′ probing 𝑏 from 𝑥 corresponds to the arrow ℎ′ ◦𝑔 probing it from 𝑦.



In both cases, we change perspective from 𝑥 to 𝑦 by applying precomposition (−◦𝑔).


The important observation is that the change of perspective preserves the buddy system
established by the isomorphism. If two arrows were buddies from the perspective of 𝑥, they are
still buddies from the perspective of 𝑦. This is as simple as saying that it doesn’t matter if you
first pre-compose with 𝑔 (switch perspective) and then post-compose with 𝑓 (switch focus), or
first post-compose with 𝑓 and then pre-compose with 𝑔. Symbolically, we write it as:

(−◦𝑔)◦(𝑓 ◦−) = (𝑓 ◦−)◦(−◦𝑔)

and we call it the naturality condition.


The meaning of this equation is revealed when you apply it to a morphism ℎ ∶ 𝑥 → 𝑎. Both
sides evaluate to 𝑓 ◦ℎ◦𝑔.
           (−◦𝑔)
  ℎ ──────────→ ℎ◦𝑔
  │ (𝑓 ◦−)         │ (𝑓 ◦−)
  ↓                ↓
 𝑓 ◦ℎ ─────────→ 𝑓 ◦ℎ◦𝑔
           (−◦𝑔)

Here, the naturality condition is satisfied automatically due to associativity, but we’ll soon
see it generalized to less trivial circumstances.
Arrows are used to broadcast information about an isomorphism. Naturality tells us that all
objects get a consistent view of it, independent of the path.
We can also reverse the roles of observers and subjects. For instance, using an arrow ℎ ∶ 𝑎 →
𝑥, the object 𝑎 can probe an arbitrary object 𝑥. If there is an arrow 𝑔 ∶ 𝑥 → 𝑦, it can switch
focus to 𝑦. Switching the perspective to 𝑏 is done by precomposition with 𝑓 −1 .

(Here ℎ ∶ 𝑎 → 𝑥 and 𝑔 ∶ 𝑥 → 𝑦; post-composition (𝑔◦−) switches focus from 𝑥 to 𝑦, and
pre-composition with 𝑓 −1 switches the observer from 𝑎 to 𝑏.)

Again, we have the naturality condition, this time from the point of view of the isomorphic pair:

(−◦𝑓 −1 )◦(𝑔◦−) = (𝑔◦−)◦(−◦𝑓 −1 )

This situation, where we have to take two steps to move from one place to another, is typical
in category theory. Here, the operations of pre-composition and post-composition can be done
in any order—we say that they commute. But in general the order in which we take steps leads
to different outcomes. We often impose commutation conditions and say that one operation is
compatible with another if these conditions hold.

Exercise 3.2.1. Show that both sides of the naturality condition for 𝑓 −1 , when acting on ℎ,
reduce to:
𝑔◦ℎ◦𝑓 −1 ∶ 𝑏 → 𝑦

3.3 Reasoning with Arrows


Master Yoneda says: “At the arrows look!”
If two objects are isomorphic, they have the same sets of incoming arrows.
If two objects are isomorphic, they also have the same sets of outgoing arrows.
If you want to see if two objects are isomorphic, at the arrows look!

When two objects 𝑎 and 𝑏 are isomorphic, any isomorphism 𝑓 induces a one-to-one mapping
(𝑓 ◦−) between corresponding sets of arrows.

The function (𝑓 ◦−) maps every arrow ℎ ∶ 𝑥 → 𝑎 to an arrow 𝑓 ◦ℎ ∶ 𝑥 → 𝑏. Its inverse (𝑓 −1 ◦−)
maps every arrow ℎ′ ∶ 𝑥 → 𝑏 to an arrow 𝑓 −1 ◦ℎ′ ∶ 𝑥 → 𝑎.
Suppose that we don’t know if the objects are isomorphic, but we know that there is an
invertible mapping, 𝛼𝑥 , between sets of arrows impinging on 𝑎 and 𝑏 from every object 𝑥. In
other words, for every 𝑥, 𝛼𝑥 is a bijection of arrows.

Before, the bijection of arrows was generated by the isomorphism 𝑓 . Now, the bijection of
arrows is given to us by 𝛼𝑥 . Does it mean that the two objects are isomorphic? Can we construct
the isomorphism 𝑓 from the family of mappings 𝛼𝑥 ? The answer is “yes”, as long as the family
𝛼𝑥 satisfies the naturality condition.
Here’s the action of 𝛼𝑥 on a particular arrow ℎ.
(It takes ℎ ∶ 𝑥 → 𝑎 to 𝛼𝑥 ℎ ∶ 𝑥 → 𝑏.)

This mapping, along with its inverse 𝛼𝑥−1 , which takes arrows 𝑥 → 𝑏 to arrows 𝑥 → 𝑎, would
play the role of (𝑓 ◦−) and (𝑓 −1 ◦−), if there were indeed an isomorphism 𝑓 . The family of maps
𝛼 describes an “artificial” way of switching focus from 𝑎 to 𝑏.
Here’s the same situation from the point of view of another observer 𝑦:
(An arrow ℎ′ ∶ 𝑦 → 𝑎 is taken by 𝛼𝑦 to 𝛼𝑦 ℎ′ ∶ 𝑦 → 𝑏.)
Notice that 𝑦 is using a different mapping 𝛼𝑦 from the same family.
These two mappings, 𝛼𝑥 and 𝛼𝑦 , become entangled whenever there is a morphism 𝑔 ∶ 𝑦 → 𝑥.
In that case, pre-composition with 𝑔 allows us to switch perspective from 𝑥 to 𝑦 (notice the
direction)
(Pre-composition with 𝑔 ∶ 𝑦 → 𝑥 takes ℎ ∶ 𝑥 → 𝑎 to ℎ◦𝑔 ∶ 𝑦 → 𝑎.)

We have separated the switching of focus from the switching of perspective. The former is done
by 𝛼, the latter by pre-composition. Naturality imposes a compatibility condition between those
two.
Indeed, starting with some ℎ, we can either apply (−◦𝑔) to switch to 𝑦’s point of view, and
then apply 𝛼𝑦 to switch focus to 𝑏:
𝛼𝑦 ◦(−◦𝑔)
or we can first let 𝑥 switch focus to 𝑏 using 𝛼𝑥 , and then switch perspective using (−◦𝑔):

(−◦𝑔)◦𝛼𝑥

In both cases we end up looking at 𝑏 from 𝑦. We’ve done this exercise before, when we had an
isomorphism between 𝑎 and 𝑏, and we’ve found out that the results were the same. We called it
the naturality condition.
If we want the 𝛼’s to give us an isomorphism, we have to impose the equivalent naturality
condition:
𝛼𝑦 ◦(−◦𝑔) = (−◦𝑔)◦𝛼𝑥
When acting on some arrow ℎ ∶ 𝑥 → 𝑎, we want this diagram to commute:

           (−◦𝑔)
  ℎ ──────────→ ℎ◦𝑔
  │ 𝛼𝑥             │ 𝛼𝑦
  ↓                ↓
 𝛼𝑥 ℎ ────────→ (𝛼𝑥 ℎ)◦𝑔 = 𝛼𝑦 (ℎ◦𝑔)
           (−◦𝑔)

This way we know that replacing all 𝛼’s with (𝑓 ◦−) will work. But does such 𝑓 exist? Can we
reconstruct 𝑓 from the 𝛼’s? The answer is yes, and we’ll use the Yoneda trick to accomplish
that.
Since 𝛼𝑥 is defined for any object 𝑥, it is also defined for 𝑎 itself. By definition, 𝛼𝑎 takes a
morphism 𝑎 → 𝑎 to a morphism 𝑎 → 𝑏. We know for sure that there is at least one morphism
𝑎 → 𝑎, namely the identity 𝑖𝑑𝑎 . It turns out that the isomorphism 𝑓 we are seeking is given by:

𝑓 = 𝛼𝑎 (𝑖𝑑𝑎 )

or, pictorially:
(𝛼𝑎 takes 𝑖𝑑𝑎 ∶ 𝑎 → 𝑎 to 𝑓 = 𝛼𝑎 (𝑖𝑑𝑎 ) ∶ 𝑎 → 𝑏.)
Let’s verify this. If 𝑓 is indeed our isomorphism then, for any 𝑥, 𝛼𝑥 should be equal to
(𝑓 ◦−). To see that, let’s rewrite the naturality condition replacing 𝑥 with 𝑎. We get:

𝛼𝑦 (ℎ◦𝑔) = (𝛼𝑎 ℎ)◦𝑔

as illustrated in the following diagram:


(Here 𝑔 ∶ 𝑦 → 𝑎 and ℎ ∶ 𝑎 → 𝑎; both paths from 𝑦 to 𝑏 yield 𝛼𝑦 (ℎ◦𝑔) = (𝛼𝑎 ℎ)◦𝑔.)

Since both the source and the target of ℎ are 𝑎, this equality must also hold for ℎ = 𝑖𝑑𝑎 :

𝛼𝑦 (𝑖𝑑𝑎 ◦𝑔) = (𝛼𝑎 (𝑖𝑑𝑎 ))◦𝑔

But 𝑖𝑑𝑎 ◦𝑔 is equal to 𝑔 and 𝛼𝑎 (𝑖𝑑𝑎 ) is our 𝑓 , so we get:

𝛼𝑦 𝑔 = 𝑓 ◦𝑔 = (𝑓 ◦−)𝑔

In other words, 𝛼𝑦 = (𝑓 ◦−) for every object 𝑦 and every morphism 𝑔 ∶ 𝑦 → 𝑎.


Notice that, even though 𝛼𝑥 was defined individually for every 𝑥 and every arrow 𝑥 → 𝑎, it
turned out to be completely determined by its value at a single identity arrow. This is the power
of naturality!
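The Yoneda trick has a direct rendering in Haskell (a sketch; it requires the RankNTypes pragma, and fromAlpha is our own name):
{-# language RankNTypes #-}

fromAlpha :: (forall x. (x -> a) -> (x -> b)) -> (a -> b)
fromAlpha alpha = alpha id
Parametricity guarantees that any such polymorphic alpha is automatically natural, so applying it to the identity arrow recovers the underlying function.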

Reversing the Arrows


As Lao Tzu would say, the duality between the observer and the observed cannot be complete
unless the observer is allowed to switch roles with the observed.
Again, we want to show that two objects 𝑎 and 𝑏 are isomorphic, but this time we want to
treat them as observers. An arrow ℎ ∶ 𝑎 → 𝑥 probes an arbitrary object 𝑥 from the perspective
of 𝑎. Previously, when we knew that the two objects were isomorphic, we were able to switch
perspective to 𝑏 using (−◦𝑓 −1 ). This time we have at our disposal a transformation 𝛽𝑥 instead.
It establishes the bijection between arrows impinging on 𝑥.

(It takes ℎ ∶ 𝑎 → 𝑥 to 𝛽𝑥 ℎ ∶ 𝑏 → 𝑥.)

If we want to observe another object, 𝑦, we will use 𝛽𝑦 to switch perspectives between 𝑎 and 𝑏,
and so on.

If the two objects 𝑥 and 𝑦 are connected by an arrow 𝑔 ∶ 𝑥 → 𝑦 then we also have an option
of switching focus using (𝑔◦−). If we want to do both: switch perspective and switch focus,
there are two ways of doing it. Naturality demands that the results be equal:

(𝑔◦−)◦𝛽𝑥 = 𝛽𝑦 ◦(𝑔◦−)

Indeed, if we replace 𝛽 with (−◦𝑓 −1 ), we recover the naturality condition for an isomorphism.

Exercise 3.3.1. Use the trick with the identity morphism to recover 𝑓 −1 from the family of
mappings 𝛽.

Exercise 3.3.2. Using 𝑓 −1 from the previous exercise, evaluate 𝛽𝑦 𝑔 for an arbitrary object 𝑦
and an arbitrary arrow 𝑔 ∶ 𝑎 → 𝑦.

As Lao Tzu would say: To show an isomorphism, it is often easier to define a natural trans-
formation between ten thousand arrows than it is to find a pair of arrows between two objects.
Chapter 4

Sum Types

4.1 Bool
We know how to compose arrows. But how do we compose objects?
We have defined 0 (the initial object) and 1 (the terminal object). What is 2 if not 1 plus 1?
A 2 is an object with two elements: two arrows coming from 1. Let’s call one arrow True
and the other False. Don’t confuse those names with the logical interpretations of the initial
and the terminal objects. These two are arrows.

(Two arrows True and False from 1 to 2.)
This simple idea can be immediately expressed in Haskell as the definition of a type, traditionally
called Bool, after its inventor George Boole (1815-1864). (This style of definition is called
Generalized Algebraic Data Types, or GADTs, in Haskell.)
data Bool where
  True :: () -> Bool
  False :: () -> Bool
It corresponds to the same diagram (only with some Haskell renamings):

(The same two arrows, now written True, False ∶ () → Bool.)

As we’ve seen before, there is a shortcut notation for elements, so here’s a more compact
version:
data Bool where
  True :: Bool
  False :: Bool

We can now define a term of the type Bool, for instance


x :: Bool
x = True
The first line declares x to be an element of Bool (secretly a function ()->Bool), and the second
line tells us which one of the two.
The functions True and False that we used in the definition of Bool are called data con-
structors. They can be used to construct specific terms, like in the example above. As a side note,
in Haskell, function names start with lower-case letters, except when they are data constructors.
Our definition of the type Bool is still incomplete. We know how to construct a Bool term,
but we don’t know what to do with it. We have to be able to define arrows that go out of Bool—
the mappings out of Bool.
The first observation is that, if we have an arrow h from Bool to some concrete type A then
we automatically get two arrows x and y from unit to A, just by composition. The following two
(distorted) triangles commute:

(The two triangles: True, False ∶ () → Bool, followed by ℎ ∶ Bool → A; the composites are
𝑥 and 𝑦 ∶ () → A.)
In other words, every function Bool->A produces a pair of elements of A.
Given a concrete type A:
h :: Bool -> A
we have:
x = h True
y = h False
where
x :: A
y :: A
Notice the use of the shorthand notation for the application of a function to an element:
h True -- meaning: h . True
We are now ready to complete our definition of Bool by adding the condition that any func-
tion from Bool to A not only produces but is equivalent to a pair of elements of A. In other
words, a pair of elements uniquely determines a function from Bool.
What this means is that we can interpret the diagram above in two ways: Given h, we can
easily get x and y. But the converse is also true: a pair of elements x and y uniquely defines h.
We have a bijection at work here. This time it’s a one-to-one mapping between a pair of
elements (𝑥, 𝑦) and an arrow ℎ.
In Haskell, this definition of h is encapsulated in the if, then, else construct. Given
x :: A
y :: A

we define the mapping out


h :: Bool -> A
h b = if b then x else y
Here, b is a term of the type Bool.
In general, a data type is created using introduction rules and deconstructed using elimina-
tion rules. The Bool data type has two introduction rules, one using True and another using
False. The if, then, else construct defines the elimination rule.
The fact that, given the above definition of h, we can retrieve the two terms that were used
to define it, is called the computation rule. It tells us how to compute the result of h. If we call
h with True, the result is x; if we call it with False, the result is y.
We should never lose sight of the purpose of programming: to decompose complex problems
into a series of simpler ones. The definition of Bool illustrates this idea. Whenever we have to
construct a mapping out of Bool, we decompose it into two smaller tasks of constructing a pair
of elements of the target type. We traded one larger problem for two simpler ones.

Examples
Let’s do a few examples. We haven’t defined many types yet, so we’ll be limited to mappings
out of Bool to either Void, (), or Bool. Such edge cases, however, may offer new insights into
well known results.
We have decided that there can be no functions (other than identity) with Void as a target, so
we don’t expect any functions from Bool to Void. And indeed, we have zero pairs of elements
of Void.
What about functions from Bool to ()? Since () is terminal, there can be only one function
from Bool to it. And, indeed, this function corresponds to the single possible pair of functions
from () to ()—both being identities. So far so good.

(Both composites () → () are the identity.)

The interesting case is functions from Bool to Bool. Let’s plug Bool in place of A:

(The same diagram with A replaced by Bool; 𝑥 and 𝑦 are now elements of Bool.)

How many pairs (𝑥, 𝑦) of functions from () to Bool do we have at our disposal? There are
only two such functions, True and False, so we can form four pairs. These are (True, True),
(False, False), (True, False), and (False, True). Therefore there can only be four functions
from Bool to Bool.
We can write them in Haskell using the if, then, else construct. For instance, the last one,
which we’ll call not, is defined as:
not :: Bool -> Bool
not b = if b then False else True
We can also look at functions from Bool to A as elements of the object of arrows, or the
exponential object 𝐴², where 2 is the Bool object. According to our count, we have zero ele-
ments in 0², one element in 1², and four elements in 2². This is exactly what we’d expect from
high-school algebra, where numbers actually mean numbers.

Exercise 4.1.1. Write the implementations of the three other functions Bool->Bool.

4.2 Enumerations
What comes after 0, 1, and 2? An object with three data constructors. For instance:
data RGB where
  Red :: RGB
  Green :: RGB
  Blue :: RGB
If you’re tired of redundant syntax, there is a shorthand for this type of definition:
data RGB = Red | Green | Blue
This introduction rule allows us to construct terms of the type RGB, for instance:
c :: RGB
c = Blue
To define mappings out of RGB, we need a more general elimination pattern. Just like a function
from Bool was determined by two elements, a function from RGB to A is determined by a triple
of elements of A: x, y, and z. We write such a function using pattern matching syntax:
h :: RGB -> A
h Red = x
h Green = y
h Blue = z
This is just one function whose definition is split into three cases.
It’s possible to use the same syntax for Bool as well, in place of if, then, else:
h :: Bool -> A
h True = x
h False = y
In fact, there is a third way of writing the same thing using the case statement:
h c = case c of
  Red -> x
  Green -> y
  Blue -> z
or even

h :: Bool -> A
h b = case b of
  True -> x
  False -> y
You can use any of these at your convenience when programming.
These patterns will also work for types with four, five, and more data constructors. For
instance, a decimal digit is one of:
data Digit = Zero | One | Two | Three | ... | Nine
There is a giant enumeration of Unicode characters called Char. Their constructors are
given special names: you write the character itself between two apostrophes, e.g.,
c :: Char
c = 'a'
As Lao Tzu would say, a pattern of ten thousand things would take many years to complete,
therefore people came up with the wildcard pattern, the underscore, which matches everything.
Because the patterns are matched in order, you should use the wildcard pattern as the last
one in a series:
yesno :: Char -> Bool
yesno c = case c of
  'y' -> True
  'Y' -> True
  _ -> False
But why should we stop at that? The type Int could be thought of as an enumeration of
integers in the range between −2²⁹ and 2²⁹ (or more, depending on the implementation). Of
course, exhaustive pattern matching on such ranges is out of the question, but the principle
holds.
In practice, the types Char for Unicode characters, Int for fixed-precision integers, Double
for double-precision floating point numbers, and several others, are built into the language.
These are not infinite types. Their elements can be enumerated, even if it would take ten
thousand years. The type Integer is infinite, though.

Short Haskell Digression


Since we are going to write more Haskell code, we have to establish some preliminaries. To
define data types using functions, we need to use the language pragma called GADTs (it stands
for Generalized Algebraic Data Types). The pragma has to be put at the top of the source file.
For instance:
{-# language GADTs #-}

data Bool where
  True :: () -> Bool
  False :: () -> Bool
The Void data type can be defined as:
data Void where
with the empty where clause (no data constructor!).

The function absurd works with any type as its target (it’s a polymorphic function), so it
is parameterized by a type variable. Unlike concrete types, type variables must start with a
lowercase letter. Here, a is such a type variable:
absurd :: Void -> a
absurd v = undefined
We use undefined to placate the compiler. In this case, we are absolutely sure that the function
absurd can never be called, because it’s impossible to construct an argument of type Void.
You may use undefined when you’re only interested in compiling, as opposed to running,
your code. For instance, you may need to plug a function f to check if your definitions work
together:
f :: a -> x
f = undefined
If you want to experiment with defining your own versions of standard types, like Either,
you have to tell the compiler to hide the originals that are defined in the standard library called
the Prelude. Put this line at the top of the file, after the language pragmas:
import Prelude hiding (Either, Left, Right)

4.3 Sum Types


The Bool type could be seen as the sum 2 = 1 + 1. But nothing stops us from replacing 1
with another type, or even replacing each of the 1s with different types. We can define a new
type 𝑎 + 𝑏 by using two arrows. Let’s call them Left and Right. The defining diagram is the
introduction rule:
𝑎 𝑏
Left Right
𝑎+𝑏

In Haskell, the type 𝑎 + 𝑏 is called Either a b. By analogy with Bool, we can define it as
data Either a b where
  Left :: a -> Either a b
  Right :: b -> Either a b
(Note the use of lower-case letters for type variables.)
Similarly, the mapping out from 𝑎 + 𝑏 to some type 𝑐 is determined by this commuting
diagram:
𝑎        𝑏
Left   Right
   𝑎+𝑏
𝑓     ℎ     𝑔
    𝑐

Given a function ℎ, we get a pair of functions 𝑓 and 𝑔 just by composing it with Left and Right.
Conversely, such a pair of functions uniquely determines ℎ. This is the elimination rule.

When we want to translate this diagram to Haskell, we need to select elements of the two
types. We can do it by defining the arrows 𝑎 and 𝑏 from the terminal object.

(From the top: the elements 𝑎 ∶ 1 → 𝑎 and 𝑏 ∶ 1 → 𝑏, then the injections Left and Right
into 𝑎 + 𝑏, and finally the arrows 𝑓 , ℎ, and 𝑔 into 𝑐.)

Follow the arrows in this diagram to get:

ℎ◦Left◦𝑎 = 𝑓 ◦𝑎

ℎ◦Right◦𝑏 = 𝑔◦𝑏

Haskell syntax repeats these equations almost literally, resulting in this pattern-matching
syntax for the definition of h:
h :: Either a b -> c
h (Left a) = f a
h (Right b) = g b
(Again, notice the use of lower-case letters for type variables and the same letters for terms of
that type. Unlike humans, the compilers don’t get confused by this.)
You can also read these equations right to left, and you will see the computation rules for
sum types: The two functions that were used to define h can be recovered by applying h to
(Left a) and (Right b).
You can also use the case syntax to define h:
h e = case e of
  Left a -> f a
  Right b -> g b
So what is the essence of a data type? It is but a recipe for manipulating arrows.

Maybe
A very useful data type, Maybe is defined as a sum 1 + 𝑎, for any 𝑎. This is its definition in
Haskell:
data Maybe a where
  Nothing :: () -> Maybe a
  Just :: a -> Maybe a
The data constructor Nothing is an arrow from the unit type, and Just constructs Maybe a
from a. Maybe a is isomorphic to Either () a. It can also be defined using the shorthand
notation
data Maybe a = Nothing | Just a
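The isomorphism with Either () a can be witnessed by a pair of inverse functions (a sketch; the names are our own):
toEither :: Maybe a -> Either () a
toEither Nothing = Left ()
toEither (Just a) = Right a

fromEither :: Either () a -> Maybe a
fromEither (Left _) = Nothing
fromEither (Right a) = Just a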

Maybe is mostly used to encode the return type of partial functions: ones that are unde-
fined for some values of their arguments. In that case, instead of failing, such functions return
Nothing. In other programming languages partial functions are often implemented using ex-
ceptions (or core dumps).
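For example, here is a sketch of a total version of integer division that signals division by zero with Nothing:
safeDiv :: Int -> Int -> Maybe Int
safeDiv _ 0 = Nothing
safeDiv m n = Just (m `div` n)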

Logic
In logic, the proposition 𝐴 + 𝐵 is called the alternative, or logical or. You can prove it by
providing the proof of 𝐴 or the proof of 𝐵. Either one will suffice.
If you want to prove that 𝐶 follows from 𝐴+𝐵, you have to be prepared for two eventualities:
either somebody proved 𝐴 + 𝐵 by proving 𝐴 (and 𝐵 could be false) or by proving 𝐵 (and 𝐴
could be false). In the first case, you have to show that 𝐶 follows from 𝐴. In the second case
you need a proof that 𝐶 follows from 𝐵. These are exactly the arrows in the elimination rule for
𝐴 + 𝐵.

4.4 Cocartesian Categories


In Haskell, we can define a sum of any two types using Either. A category in which all sums
exist, and the initial object exists, is called cocartesian, and the sum is called a coproduct. You
might have noticed that sum types mimic addition of numbers. It turns out that the initial object
plays the role of zero.

One Plus Zero


Let’s first show that 1 + 0 ≅ 1, meaning the sum of the terminal object and the initial object is
isomorphic to the terminal object. The standard procedure for this kind of proof is to use the
Yoneda trick. Since sum types are defined by mapping out, we should compare arrows coming
out of either side.
The Yoneda argument says that two objects are isomorphic if there is a bijection 𝛽𝑎 between
the sets of arrows coming out of them to an arbitrary object 𝑎, and this bijection is natural.
Let’s look at the definition of 1 + 0 and its mapping out to any object 𝑎. This mapping is
defined by a pair (𝑥, ¡), where 𝑥 is an element of 𝑎 and ¡ is the unique arrow from the initial
object to 𝑎 (the absurd function in Haskell).

(On the left of the diagram, ℎ ∶ 1 + 0 → 𝑎 is determined by the pair (𝑥, ¡); on the right, just
the element 𝑥 ∶ 1 → 𝑎.)
We want to establish a one-to-one mapping between arrows originating in 1 + 0 and the ones
originating in 1. The arrow ℎ is determined by the pair (𝑥, ¡). Since there is only one ¡, there is
a bijection between ℎ’s and 𝑥’s.
We define 𝛽𝑎 to map any ℎ defined by a pair (𝑥, ¡) to 𝑥. Conversely, 𝛽𝑎−1 maps 𝑥 to the pair
(𝑥, ¡). But is it a natural transformation?
To answer that, we need to consider what happens when we change focus from 𝑎 to some 𝑏
that is connected to it through an arrow 𝑔 ∶ 𝑎 → 𝑏. We have two options now:

• Make ℎ switch focus by post-composing both 𝑥 and ¡ with 𝑔. We get a new pair (𝑦 =
𝑔◦𝑥, ¡). Follow it by 𝛽𝑏 .

• Use 𝛽𝑎 to map (𝑥, ¡) to 𝑥. Follow it with the post-composition (𝑔◦−).

In both cases we get the same arrow 𝑦 = 𝑔◦𝑥. So the mapping 𝛽 is natural. Therefore 1 + 0 is
isomorphic to 1.
In Haskell, we can define the two functions that form the isomorphism, but there is no way
of directly expressing the fact that they are the inverse of each other.
f :: Either () Void -> ()
f (Left ()) = ()
f (Right _) = ()

f_1 :: () -> Either () Void
f_1 _ = Left ()
The underscore wildcard in a function definition means that the argument is ignored. The second
clause in the definition of f is redundant, since there are no terms of the type Void.

Something Plus Zero


A very similar argument can be used to show that 𝑎 + 0 ≅ 𝑎. The following diagram explains it.

(On the left, ℎ ∶ 𝑎 + 0 → 𝑥 is determined by the pair (𝑓 , ¡); on the right, just the arrow
𝑓 ∶ 𝑎 → 𝑥.)

We can translate this argument to Haskell by implementing a (polymorphic) function h that
works for any type a.

Exercise 4.4.1. Implement, in Haskell, the two functions that form the isomorphism between
(Either a Void) and a.

We could use a similar argument to show that 0 + 𝑎 ≅ 𝑎, but there is a more general property
of sum types that obviates that.

Commutativity
There is a nice left-right symmetry in the diagrams that define the sum type, which suggests that
it satisfies the commutativity rule, 𝑎 + 𝑏 ≅ 𝑏 + 𝑎.
Let’s consider mappings out of both sides of this formula. You can easily see that, for every
ℎ that is determined by a pair (𝑓 , 𝑔) on the left, there is a corresponding ℎ′ given by a pair (𝑔, 𝑓 )
on the right. That establishes the bijection of arrows.

(On the left, ℎ ∶ 𝑎 + 𝑏 → 𝑥 is given by the pair (𝑓 , 𝑔); on the right, ℎ′ ∶ 𝑏 + 𝑎 → 𝑥 is given
by the pair (𝑔, 𝑓 ).)

Exercise 4.4.2. Show that the bijection defined above is natural. Hint: Both 𝑓 and 𝑔 change
focus by post-composition with 𝑘 ∶ 𝑥 → 𝑦.

Exercise 4.4.3. Implement, in Haskell, the function that witnesses the isomorphism between
(Either a b) and (Either b a). Notice that this function is its own inverse.

Associativity
Just like in arithmetic, the sum that we have defined is associative:

(𝑎 + 𝑏) + 𝑐 ≅ 𝑎 + (𝑏 + 𝑐)

It’s easy to write the mapping out for the left-hand side:
h :: Either (Either a b) c -> x
h (Left (Left a)) = f1 a
h (Left (Right b)) = f2 b
h (Right c) = f3 c
Notice the use of nested patterns like (Left (Left a)), etc. The mapping is fully defined by a
triple of functions. The same functions can be used to define the mapping out of the right-hand
side:
h' :: Either a (Either b c) -> x
h' (Left a) = f1 a
h' (Right (Left b)) = f2 b
h' (Right (Right c)) = f3 c
This establishes a one-to-one mapping between triples of functions that define the two mappings
out. This mapping is natural because all changes of focus are done using post-composition.
Therefore the two sides are isomorphic.
This code can also be displayed in diagrammatical form. Here’s the diagram for the left
hand side of the isomorphism:

((𝑎 + 𝑏) + 𝑐 is built in two stages with 𝐿 and 𝑅; the three composites into 𝑥 are 𝑓1 , 𝑓2 , and 𝑓3 .)

Functoriality
Since the sum is defined by its mapping-out property, it was easy to see what happens when
we change focus: it changes “naturally” with the foci of the arrows that define the sum. But
what happens when we move the sources of those arrows?
Suppose that we have arrows that map 𝑎 and 𝑏 to some 𝑎′ and 𝑏′ :

𝑓 ∶ 𝑎 → 𝑎′
𝑔 ∶ 𝑏 → 𝑏′

The composition of these arrows with the constructors Left and Right, respectively, can be used
to define the mapping between the sums:

(The pair Left◦𝑓 ∶ 𝑎 → 𝑎′ + 𝑏′ and Right◦𝑔 ∶ 𝑏 → 𝑎′ + 𝑏′ determines ℎ ∶ 𝑎 + 𝑏 → 𝑎′ + 𝑏′ .)

The pair of arrows, (Left◦𝑓 , Right◦𝑔) uniquely defines the arrow ℎ ∶ 𝑎 + 𝑏 → 𝑎′ + 𝑏′ .


This property of the sum is called functoriality. You can imagine it as allowing you to
transform the two objects inside the sum and get a new sum. We also say that functoriality lets
us lift a pair of arrows in order to operate on sums.
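This lifting can be implemented directly in Haskell (a sketch; the standard library provides the same operation as bimap in Data.Bifunctor):
liftSum :: (a -> a') -> (b -> b') -> Either a b -> Either a' b'
liftSum f _ (Left a) = Left (f a)
liftSum _ g (Right b) = Right (g b)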

Exercise 4.4.4. Show that functoriality preserves composition. Hint: take two composable
arrows, 𝑔 ∶ 𝑏 → 𝑏′ and 𝑔 ′ ∶ 𝑏′ → 𝑏′′ and show that applying 𝑔 ′ ◦𝑔 gives the same result as first
applying 𝑔 to transform 𝑎 + 𝑏 to 𝑎 + 𝑏′ and then applying 𝑔 ′ to transform 𝑎 + 𝑏′ to 𝑎 + 𝑏′′ .

Exercise 4.4.5. Show that functoriality preserves identity. Hint: use 𝑖𝑑𝑏 and show that it is
mapped to 𝑖𝑑𝑎+𝑏 .

Symmetric Monoidal Category


When a child learns addition we call it arithmetic. When a grownup learns addition we call it a
cocartesian category.
Whether we are adding numbers, composing arrows, or constructing sums of objects, we
are re-using the same idea of decomposing complex things into their simpler components.
As Lao Tzu would say, when things come together to form a new thing, and the operation
is associative, and it has a neutral element, we know how to deal with ten thousand things.
The sum type we have defined satisfies these properties:

𝑎+0≅𝑎
𝑎+𝑏≅𝑏+𝑎
(𝑎 + 𝑏) + 𝑐 ≅ 𝑎 + (𝑏 + 𝑐)

and it’s functorial. A category with this type of operation is called symmetric monoidal. When
the operation is the sum (coproduct), it’s called cocartesian. In the next chapter we’ll see another
monoidal structure that’s called cartesian without the “co.”
Chapter 5

Product Types

We can use sum types to enumerate possible values of a given type, but the encoding can be
wasteful. We needed ten constructors just to encode numbers between zero and nine.
data Digit = Zero | One | Two | Three | ... | Nine
But if we combine two digits into a single data structure, a two-digit decimal number, we’ll be
able to encode a hundred numbers. Or, as Lao Tzu would say, with just four digits you can
encode ten thousand numbers.
A data type that combines two types in this manner is called a product, or a cartesian product.
Its defining quality is the elimination rule: there are two arrows coming from 𝑎 × 𝑏; one called
“fst” goes to 𝑎, and another called “snd” goes to 𝑏. They are called (cartesian) projections. They
let us retrieve 𝑎 and 𝑏 from the product 𝑎 × 𝑏.

𝑎×𝑏

fst snd
𝑎 𝑏
Suppose that somebody gave you an element of a product, that is, an arrow ℎ from the ter-
minal object 1 to 𝑎 × 𝑏. You can easily retrieve a pair of elements, just by using composition:
an element of 𝑎 given by
𝑥 = fst◦ℎ

and an element of 𝑏 given by


𝑦 = snd◦ℎ

(ℎ ∶ 1 → 𝑎 × 𝑏 followed by the projections gives 𝑥 and 𝑦.)
In fact, given an arrow from an arbitrary object 𝑐 to 𝑎 × 𝑏, we can define, by composition, a
pair of arrows 𝑓 ∶ 𝑐 → 𝑎 and 𝑔 ∶ 𝑐 → 𝑏


(ℎ ∶ 𝑐 → 𝑎 × 𝑏 followed by the projections gives 𝑓 = fst◦ℎ and 𝑔 = snd◦ℎ.)
As we did before with the sum type, we can turn this idea around, and use this diagram to
define the product type: We impose the condition that a pair of functions 𝑓 and 𝑔 be in one-
to-one correspondence with a mapping in from 𝑐 to 𝑎 × 𝑏. This is the introduction rule for the
product.
In particular, the mapping out of the terminal object is used in Haskell to define a product
type. Given two elements, a :: A and b :: B, we construct the product
(a, b) :: (A, B)
The built-in syntax for products is just that: a pair of parentheses and a comma in between. It
works both for defining the product of two types (A, B) and the data constructor (a, b) that
takes two elements and pairs them together.
We should never lose sight of the purpose of programming: to decompose complex problems
into a series of simpler ones. We see it again in the definition of the product. Whenever we have
to construct a mapping into the product, we decompose it into two smaller tasks of constructing
a pair of functions, each mapping into one of the components of the product. This is as simple
as saying that, in order to implement a function that returns a pair of values, it’s enough to
implement two functions, each returning one of the elements of the pair.
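Here is a minimal sketch of this decomposition, with two hypothetical component functions f and g:
f :: Int -> Bool
f = even

g :: Int -> String
g n = show n

h :: Int -> (Bool, String)
h n = (f n, g n)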

Logic
In logic, a product type corresponds to logical conjunction. In order to prove 𝐴 × 𝐵 (𝐴 and 𝐵),
you need to provide the proofs of both 𝐴 and 𝐵. These are the arrows targeting 𝐴 and 𝐵. The
elimination rule says that if you have a proof of 𝐴 × 𝐵, then you automatically get the proof of
𝐴 (through fst) and the proof of 𝐵 (through snd).

Tuples and Records


As Lao Tzu would say, a product of ten thousand objects is just an object with ten thousand
projections.
We can form arbitrary products in Haskell using the tuple notation. For instance, a product
of three types is written as (A, B, C). A term of this type can be constructed from three
elements: (a, b, c).
In what mathematicians call “abuse of notation”, a product of zero types is written as (), an
empty tuple, which happens to be the same as the terminal object, or unit type. This is because
the product behaves very much like multiplication of numbers, with the terminal object playing
the role of one.
In Haskell, rather than defining separate projections for all tuples, we use the pattern-matching
syntax. For instance, to extract the third component from a triple we would write
thrd :: (a, b, c) -> c
thrd (_, _, c) = c

We use wildcards for the components that we want to ignore.


Lao Tzu said that “Naming is the origin of all particular things.” In programming, keeping
track of the meaning of the components of a particular tuple is difficult without giving them
names. Record syntax allows us to give names to projections. This is the definition of a product
written in record style:
data Product a b = Pair { fst :: a, snd :: b }
Pair is the data constructor and fst and snd are the projections.
This is how it could be used to declare and initialize a particular pair:
ic :: Product Int Char
ic = Pair 10 'A'

5.1 Cartesian Category


In Haskell, we can define a product of any two types. A category in which all products exist,
and the terminal object exists, is called cartesian.

Tuple Arithmetic
The identities satisfied by the product can be derived using the mapping-in property. For in-
stance, to show that 𝑎 × 𝑏 ≅ 𝑏 × 𝑎 consider the following two diagrams:

(On the left, ℎ ∶ 𝑥 → 𝑎 × 𝑏 is determined by the pair (𝑓 , 𝑔); on the right, ℎ′ ∶ 𝑥 → 𝑏 × 𝑎 is
determined by the pair (𝑔, 𝑓 ).)
They show that, for any object 𝑥 the arrows to 𝑎 × 𝑏 are in one-to-one correspondence with
arrows to 𝑏 × 𝑎. This is because each of these arrows is determined by the same pair 𝑓 and 𝑔.
You can check that the naturality condition is satisfied because, when you shift the perspec-
tive using 𝑘 ∶ 𝑥′ → 𝑥, all arrows originating in 𝑥 are shifted by pre-composition (−◦𝑘).
In Haskell, this isomorphism can be implemented as a function which is its own inverse:
swap :: (a, b) -> (b, a)
swap x = (snd x, fst x)
Here’s the same function written using pattern matching:
swap (x, y) = (y, x)
It’s important to keep in mind that the product is symmetric only “up to isomorphism.” It
doesn’t mean that swapping the order of pairs won’t change the behavior of a program. Sym-
metry means that the information content of a swapped pair is the same, but access to it needs
to be modified.
The terminal object is the unit of the product, 1 × 𝑎 ≅ 𝑎. The arrow that witnesses the
isomorphism between 1 × 𝑎 and 𝑎 is called the left unitor:

𝜆∶ 1 × 𝑎 → 𝑎

It can be implemented as 𝜆 = snd. Its inverse 𝜆−1 is defined as the unique arrow in the following
diagram:
(𝜆−1 ∶ 𝑎 → 1 × 𝑎 is the unique arrow satisfying fst◦𝜆−1 = ! and snd◦𝜆−1 = id.)
The arrow from 𝑎 to 1 is called ! (pronounced, bang). This indeed shows that

snd◦𝜆−1 = id

We still have to prove that 𝜆−1 is the left inverse of snd. Consider the following diagram:

(Any ℎ ∶ 1 × 𝑎 → 1 × 𝑎 satisfying fst◦ℎ = ! and snd◦ℎ = snd is unique.)

It obviously commutes for ℎ = 𝑖𝑑. It also commutes for ℎ = 𝜆−1 ◦snd, because we have:

snd◦𝜆−1 ◦snd = snd

Since ℎ is supposed to be unique, we conclude that:

𝜆−1 ◦snd = 𝑖𝑑

This kind of reasoning with universal constructions is pretty standard.
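In Haskell, the left unitor and its inverse can be written as (the names are our own):
lunit :: ((), a) -> a
lunit (_, a) = a

lunitInv :: a -> ((), a)
lunitInv a = ((), a)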


Here are some other isomorphisms written in Haskell (without proofs of having the inverse).
This is associativity:
assoc :: ((a, b), c) -> (a, (b, c))
assoc ((a, b), c) = (a, (b, c))
And this is the right unit
runit :: (a, ()) -> a
runit (a, _) = a
These two functions correspond to the associator

𝛼 ∶ (𝑎 × 𝑏) × 𝑐 → 𝑎 × (𝑏 × 𝑐)

and the right unitor:


𝜌∶ 𝑎 × 1 → 𝑎

Exercise 5.1.1. Show that the bijection in the proof of left unit is natural. Hint: change focus
using an arrow 𝑔 ∶ 𝑎 → 𝑏.

Exercise 5.1.2. Construct an arrow

ℎ ∶ 𝑏 + 𝑎 × 𝑏 → (1 + 𝑎) × 𝑏

Is this arrow unique?


Hint: It’s a mapping into a product, so it’s given by a pair of arrows. These arrows, in turn,
map out of a sum, so each is given by a pair of arrows.
Hint: The mapping 𝑏 → 1 + 𝑎 is given by (Left ◦ !)

Exercise 5.1.3. Redo the previous exercise, this time treating ℎ as a mapping out of a sum.

Exercise 5.1.4. Implement a Haskell function maybeAB :: Either b (a, b) -> (Maybe a, b).
Is this function uniquely defined by its type signature or is there some leeway?

Functoriality
Suppose that we have arrows that map 𝑎 and 𝑏 to some 𝑎′ and 𝑏′ :

𝑓 ∶ 𝑎 → 𝑎′
𝑔 ∶ 𝑏 → 𝑏′

The composition of these arrows with the projections fst and snd, respectively, can be used to
define the mapping ℎ between the products:

(The pair (𝑓 ◦fst, 𝑔◦snd) determines ℎ ∶ 𝑎 × 𝑏 → 𝑎′ × 𝑏′ .)
The shorthand notation for this diagram is:
𝑎 × 𝑏 ──𝑓 ×𝑔──→ 𝑎′ × 𝑏′

This property of the product is called functoriality. You can imagine it as allowing you to
transform the two objects inside the product to get the new product. We also say that functoriality
lets us lift a pair of arrows in order to operate on products.
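A one-line sketch in Haskell (the standard library offers the same operation as bimap for pairs, or (***) from Control.Arrow):
liftPair :: (a -> a') -> (b -> b') -> (a, b) -> (a', b')
liftPair f g (a, b) = (f a, g b)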

5.2 Duality
When a child sees an arrow, it knows which end points at the source, and which points at the
target
𝑎→𝑏
But maybe this is just a preconception. Would the Universe be very different if we called 𝑏 the
source and 𝑎 the target?
We would still be able to compose this arrow with this one

𝑏→𝑐

whose “target” 𝑏 is the same as the “source” of 𝑎 → 𝑏, and the result would still be
an arrow
𝑎→𝑐
only now we would say that it goes from 𝑐 to 𝑎.
In this dual Universe, the object that we call “initial” would be called “terminal,” because
it’s the “target” of unique arrows coming from all objects. Conversely, the terminal object would
be called initial.
Now consider this diagram that we used to define the sum object:

𝑎        𝑏
Left   Right
   𝑎+𝑏
𝑓     ℎ     𝑔
    𝑐

In the new interpretation, the arrow ℎ would go “from” an arbitrary object 𝑐 “to” the object we
call 𝑎 + 𝑏. This arrow is uniquely defined by a pair of arrows (𝑓 , 𝑔) whose “source” is 𝑐. If we
rename Left to fst and Right to snd, we will get the defining diagram for a product.
A product is the sum with arrows reversed.
Conversely, a sum is the product with arrows reversed.
Every construction in category theory has its dual.
If the direction of arrows is just a matter of interpretation, then what makes sum types so
different from product types, in programming? The difference goes back to one assumption we
made at the start: There are no incoming arrows to the initial object (other than the identity
arrow). This is in contrast with the terminal object having lots of outgoing arrows, arrows that
we used to define (global) elements. In fact, we assume that every object of interest has elements,
and the ones that don’t are isomorphic to Void.
We’ll see an even deeper difference when we talk about function types.

5.3 Monoidal Category


We have seen that the product satisfies these simple rules:

1×𝑎≅𝑎
𝑎×𝑏≅𝑏×𝑎
(𝑎 × 𝑏) × 𝑐 ≅ 𝑎 × (𝑏 × 𝑐)

and is functorial.
A category in which an operation with these properties is defined is called symmetric monoidal.
(Strictly speaking, a product of two objects is defined up to isomorphism, whereas the product
in a monoidal category must be defined on the nose. But we can get a monoidal category by
making a choice of a product.)
We’ve seen a similar structure before, when working with sums and the initial object.
A category can have multiple monoidal structures at the same time. When you don’t want to
name your monoidal structure, you replace the plus sign or the product sign with a tensor sign,
and the neutral element with the letter 𝐼. The rules of a symmetric monoidal category can then
be written as:

𝐼 ⊗𝑎≅𝑎
𝑎⊗𝑏≅𝑏⊗𝑎
(𝑎 ⊗ 𝑏) ⊗ 𝑐 ≅ 𝑎 ⊗ (𝑏 ⊗ 𝑐)

These isomorphisms are often written as families of invertible arrows called associators and
unitors. If the monoidal category is not symmetric, there is a separate left and right unitor.

𝛼 ∶ (𝑎 ⊗ 𝑏) ⊗ 𝑐 → 𝑎 ⊗ (𝑏 ⊗ 𝑐)
𝜆∶ 𝐼 ⊗ 𝑎 → 𝑎
𝜌∶ 𝑎 ⊗ 𝐼 → 𝑎

The symmetry is witnessed by:


𝛾∶ 𝑎⊗𝑏 → 𝑏⊗𝑎
Functoriality lets us lift a pair of arrows:

𝑓 ∶ 𝑎 → 𝑎′
𝑔 ∶ 𝑏 → 𝑏′

to operate on tensor products:


𝑎 ⊗ 𝑏 ──𝑓 ⊗𝑔──→ 𝑎′ ⊗ 𝑏′
If we think of morphisms as actions, their tensor product corresponds to performing two
actions in parallel. Contrast this with the serial composition of morphisms, which suggests
their temporal ordering.
You may think of a tensor product as the lowest common denominator of product and sum.
It still has an introduction rule, which requires both objects 𝑎 and 𝑏; but it has no elimination
rule. Once created, a tensor product “forgets” how it was created. Unlike a cartesian product, it
has no projections.
Some interesting examples of tensor products are not even symmetric.

Monoids
Monoids are very simple structures equipped with a binary operation and a unit. Natural num-
bers with addition and zero form a monoid. So do natural numbers with multiplication and
one.
The intuition is that a monoid lets you combine two things to get another thing. There is also
one special thing, such that combining it with anything else gives back the same thing. That’s
the unit. And the combining must be associative.
What’s not assumed is that the combining is symmetric, or that there is an inverse element.
The rules that define a monoid are reminiscent of the rules of a category. The difference is
that, in a monoid, any two things are composable, whereas in a category this is usually not the
case: You can only compose two arrows if the target of one is the source of another. Except,
that is, when the category contains only one object, in which case all arrows are composable.
A category with a single object is called a monoid. The combining operation is the compo-
sition of arrows and the unit is the identity arrow.

This is a perfectly valid definition. In practice, however, we are often interested in monoids
that are embedded in larger categories. In particular, in programming, we want to be able to
define monoids inside the category of types and functions.
However, in a category, rather than looking at individual elements, we prefer to define oper-
ations in bulk. So we start with an object 𝑚. A binary operation is a function of two arguments.
Since elements of a product are pairs of elements, we can characterize a binary operation as an
arrow from a product 𝑚 × 𝑚 to 𝑚:
𝜇∶ 𝑚 × 𝑚 → 𝑚
The unit element can be defined as an arrow from the terminal object 1:

𝜂∶ 1 → 𝑚

We can translate this description directly to Haskell by defining a class of types equipped
with two methods, traditionally called mappend and mempty:
class Monoid m where
  mappend :: (m, m) -> m
  mempty :: () -> m
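For example, here is a sketch of an instance for integers under addition (since this class shadows the standard Monoid, the Prelude version would have to be hidden, as we did with Either):
instance Monoid Int where
  mappend (m, n) = m + n
  mempty () = 0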
The two arrows 𝜇 and 𝜂 have to satisfy monoid laws but, again, we have to formulate them
in bulk, without any recourse to elements.
To formulate the left unit law, we first create the product 1 × 𝑚. We then use 𝜂 to “pick the
unit element in 𝑚” or, in terms of arrows, turn 1 into 𝑚. Since we are operating on a product
1 × 𝑚, we have to lift the pair ⟨𝜂, 𝑖𝑑𝑚 ⟩, which ensures that we “do not touch” the 𝑚. Finally we
perform the “multiplication” using 𝜇.
We want the result to be the same as the original element of 𝑚, but without mentioning
elements. So we just use the left unitor 𝜆 to go from 1 × 𝑚 to 𝑚 without “stirring things up.”
(Left unit law: 𝜇◦(𝜂 × 𝑖𝑑𝑚 ) = 𝜆, as arrows 1 × 𝑚 → 𝑚.)
Here is the analogous law for the right unit:
(Right unit law: 𝜇◦(𝑖𝑑𝑚 × 𝜂) = 𝜌, as arrows 𝑚 × 1 → 𝑚.)
To formulate the law of associativity, we have to start with a triple product and act on it in bulk.
Here, 𝛼 is the associator that rearranges the product without “stirring things up.”
(Associativity law: 𝜇◦(𝜇 × 𝑖𝑑) = 𝜇◦(𝑖𝑑 × 𝜇)◦𝛼, as arrows (𝑚 × 𝑚) × 𝑚 → 𝑚.)
Notice that we didn’t have to assume a lot about the categorical product that we used with
the objects 𝑚 and 1. In particular we never had to use projections. This suggests that the above

definition will work equally well for a tensor product in an arbitrary monoidal category. It
doesn’t even have to be symmetric. All we have to assume is that: there is a unit object, that the
product is functorial, and that it satisfies the unit and associativity laws up to isomorphism.
Thus if we replace × with ⊗ and 1 with 𝐼, we get a definition of a monoid in an arbitrary
monoidal category.
A monoid in a monoidal category is an object 𝑚 equipped with two morphisms:

𝜇∶ 𝑚 ⊗ 𝑚 → 𝑚

𝜂∶ 𝐼 → 𝑚
satisfying the unit and associativity laws:

(Unit laws: 𝜇◦(𝜂 ⊗ 𝑖𝑑𝑚 ) = 𝜆 ∶ 𝐼 ⊗ 𝑚 → 𝑚 and 𝜇◦(𝑖𝑑𝑚 ⊗ 𝜂) = 𝜌 ∶ 𝑚 ⊗ 𝐼 → 𝑚.)

(Associativity law: 𝜇◦(𝜇 ⊗ 𝑖𝑑𝑚 ) = 𝜇◦(𝑖𝑑𝑚 ⊗ 𝜇)◦𝛼.)
We used the functoriality of ⊗ to lift pairs of arrows, as in 𝜂 ⊗ 𝑖𝑑𝑚 , 𝜇 ⊗ 𝑖𝑑𝑚 , etc.
Chapter 6

Function Types

There is another kind of composition that is at the heart of functional programming. It happens
when you pass a function as an argument to another function. The outer function can then use
this argument as a pluggable part of its own machinery. It lets you implement, for instance, a
generic sorting algorithm that accepts an arbitrary comparison function.
If we model functions as arrows between objects, then what does it mean to have a function
as an argument?
We need a way to objectify functions in order to define arrows that have an “object of ar-
rows” as a source or as a target. A function that takes a function as an argument or returns a
function is called a higher-order function. Higher-order functions are the work-horses of func-
tional programming.

Elimination rule
The defining quality of a function is that it can be applied to an argument to produce the result.
We have defined function application in terms of composition:

(The element 𝑥 ∶ 1 → 𝑎 composed with 𝑓 ∶ 𝑎 → 𝑏 gives 𝑦 = 𝑓 ◦𝑥.)

Here 𝑓 is represented as an arrow from 𝑎 to 𝑏, but we would like to be able to replace 𝑓 with
an element of the object of arrows or, as mathematicians call it, the exponential object 𝑏𝑎 ; or as
we call it in programming, a function type a->b.
Given an element of 𝑏𝑎 and an element of 𝑎, function application should produce an element
of 𝑏. In other words, given a pair of elements:

𝑓 ∶ 1 → 𝑏𝑎
𝑥∶ 1 → 𝑎

it should produce an element:


𝑦∶ 1 → 𝑏
Keep in mind that, here, 𝑓 denotes an element of 𝑏𝑎 . Previously, it was an arrow from 𝑎 to
𝑏.


We know that a pair of elements (𝑓 , 𝑥) is equivalent to an element of the product 𝑏𝑎 × 𝑎. We
can therefore define function application as a single arrow:

𝜀𝑎𝑏 ∶ 𝑏𝑎 × 𝑎 → 𝑏

This way 𝑦, the result of the application, is defined by this commuting diagram:

(The element (𝑓 , 𝑥) ∶ 1 → 𝑏𝑎 × 𝑎 followed by 𝜀𝑎𝑏 gives 𝑦.)

Function application is the elimination rule for the function type.
When somebody gives you an element of the function object, the only thing you can do with
it is to apply it to an element of the argument type using 𝜀.

Introduction rule
To complete the definition of the function object, we also need the introduction rule.
First, suppose that there is a way of constructing a function object 𝑏𝑎 from some other object
𝑐. It means that there is an arrow
ℎ ∶ 𝑐 → 𝑏𝑎
We know that we can eliminate the result of ℎ using 𝜀𝑎𝑏 , but we first have to multiply it by 𝑎.
So let’s first multiply 𝑐 by 𝑎 and then use functoriality to map it to 𝑏𝑎 × 𝑎.
Functoriality lets us apply a pair of arrows to a product to get another product. Here, the
pair of arrows is (ℎ, 𝑖𝑑𝑎 ) (we want to turn 𝑐 into 𝑏𝑎 , but we’re not interested in modifying 𝑎)
𝑐 × 𝑎 ──ℎ×𝑖𝑑𝑎──→ 𝑏𝑎 × 𝑎

We can now follow this with function application to get to 𝑏


𝑐 × 𝑎 ──ℎ×𝑖𝑑𝑎──→ 𝑏𝑎 × 𝑎 ──𝜀𝑎𝑏──→ 𝑏

This composite arrow defines a mapping we’ll call 𝑓 :

𝑓∶ 𝑐×𝑎→𝑏

Here’s the corresponding diagram

(The triangle: 𝑓 = 𝜀◦(ℎ × 𝑖𝑑𝑎 ) ∶ 𝑐 × 𝑎 → 𝑏.)
This commuting diagram tells us that, given an ℎ, we can construct an 𝑓 ; but we can also demand
the converse: Every mapping out, 𝑓 ∶ 𝑐 × 𝑎 → 𝑏, should uniquely define a mapping into the
exponential, ℎ ∶ 𝑐 → 𝑏𝑎 .
We can use this property, this one-to-one correspondence between two sets of arrows, to
define the exponential object. This is the introduction rule for the function object 𝑏𝑎 .
We’ve seen that product was defined using its mapping-in property. Function application,
on the other hand, is defined as a mapping out of a product.

Currying
There are several ways of looking at this definition. One is to see it as an example of currying.
So far we’ve been only considering functions of one argument. This is not a real limitation,
since we can always implement a function of two arguments as a (single-argument) function
from a product. The 𝑓 in the definition of the function object is such a function:
f :: (c, a) -> b
h on the other hand is a function that returns a function:
h :: c -> (a -> b)
Currying is the isomorphism between these two sets of arrows.
This isomorphism can be represented in Haskell by a pair of (higher-order) functions. Since,
in Haskell, currying works for any types, these functions are written using type variables—they
are polymorphic:
curry :: ((c, a) -> b) -> (c -> (a -> b))

uncurry :: (c -> (a -> b)) -> ((c, a) -> b)
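Their implementations are one-liners (these definitions are equivalent to the Prelude’s own curry and uncurry):
curry f = \c -> \a -> f (c, a)

uncurry h = \(c, a) -> h c a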


In other words, the ℎ in the definition of the function object can be written as
ℎ = 𝑐𝑢𝑟𝑟𝑦 𝑓
Of course, written this way, the types of curry and uncurry correspond to function objects
rather than arrows. This distinction is usually glossed over because there is a one-to-one cor-
respondence between the elements of the exponential and the arrows that define them. This is
easy to see when we replace the arbitrary object 𝑐 with the terminal object. We get:

(The same triangle with 𝑐 replaced by 1: 𝑓 = 𝜀𝑎𝑏 ◦(ℎ × 𝑖𝑑𝑎 ) ∶ 1 × 𝑎 → 𝑏.)
In this case, ℎ is an element of the object 𝑏𝑎 , and 𝑓 is an arrow from 1 × 𝑎 to 𝑏. But we know
that 1 × 𝑎 is isomorphic to 𝑎 so, effectively, 𝑓 is an arrow from 𝑎 to 𝑏.
Therefore, from now on, we’ll freely pass between an element of a->b and an arrow 𝑎 → 𝑏,
without making much fuss about it. The correct incantation for this kind of phenomenon is to
say that the category is self-enriched.
We can write 𝜀𝑎𝑏 as a Haskell function apply:
apply :: (a -> b, a) -> b
apply (f, x) = f x
but it’s just a syntactic trick: function application is built into the language: f x means f ap-
plied to x. Other programming languages require the arguments to a function to be enclosed in
parentheses, not so in Haskell.
Even though defining function application as a separate function may seem redundant, the
Haskell library does provide an infix operator $ for that purpose:
($) :: (a -> b) -> a -> b
f $ x = f x
The trick, though, is that regular function application binds to the left, e.g., f x y is the same
as (f x) y; but the dollar sign binds to the right, so that

f $ g x
is the same as f (g x). In the first example, f must be a function of (at least) two arguments;
in the second, it could be a function of one argument.
In Haskell, currying is ubiquitous. A function of two arguments is almost always written
as a function returning a function. Because the function arrow -> binds to the right, there is no
need to parenthesize such types. For instance, the pair constructor has the signature:
pair :: a -> b -> (a, b)
You may think of it as a function of two arguments returning a pair, or as a function of one argument
returning a function of one argument, b->(a, b). This way it’s okay to partially apply such a
function, the result being another function. For instance, we can define:
pairWithTen :: a -> (Int, a)
pairWithTen = pair 10 -- partial application of pair

Relation to lambda calculus


Another way of looking at the definition of the function object is to interpret 𝑐 as the type of the
environment in which 𝑓 is defined. In that case it’s customary to call the environment Γ. The
arrow is interpreted as an expression that uses the variables defined in Γ.
Consider a simple example, the expression:

𝑎𝑥² + 𝑏𝑥 + 𝑐

You may think of it as being parameterized by a triple of real numbers (𝑎, 𝑏, 𝑐) and a variable 𝑥,
taken to be, let’s say, a complex number. The triple is an element of a product ℝ × ℝ × ℝ. This
product is the environment Γ for our expression.
The variable 𝑥 is an element of ℂ. The expression is an arrow from the product Γ × ℂ to the
result type (here, also ℂ)
𝑓∶ Γ×ℂ→ℂ
This is a mapping-out from a product, so we can use it to construct a function object ℂ^ℂ and
define a mapping ℎ ∶ Γ → ℂ^ℂ
           Γ × ℂ
             |     \
   ℎ×𝑖𝑑ℂ     |      \  𝑓
             v       v
   ℂ^ℂ × ℂ  ---𝜀--->  ℂ
This new mapping ℎ can be seen as a constructor of the function object. The resulting function
object represents all functions from ℂ to ℂ that have access to the environment Γ; that is, to the
triple of parameters (𝑎, 𝑏, 𝑐).
Corresponding to our original expression 𝑎𝑥² + 𝑏𝑥 + 𝑐 there is a particular function in ℂ^ℂ
that we write as:
𝜆𝑥. 𝑎𝑥² + 𝑏𝑥 + 𝑐
or, in Haskell, with the backslash replacing 𝜆,
\x -> a * x^2 + b * x + c
The arrow ℎ ∶ Γ → ℂ^ℂ is uniquely determined by the arrow 𝑓. This mapping produces a
function that we call 𝜆𝑥.𝑓 .

In general, the defining diagram for the function object becomes:

           Γ × 𝑎
             |     \
   ℎ×𝑖𝑑𝑎     |      \  𝑓
             v       v
   𝑏^𝑎 × 𝑎  ---𝜀--->  𝑏

The environment Γ that provides free parameters for the expression 𝑓 is a product of multiple
objects representing the types of the parameters (in our example, it was ℝ × ℝ × ℝ).
An empty environment is represented by the terminal object 1, the unit of the product. In
that case, 𝑓 is just an arrow 𝑎 → 𝑏, and ℎ simply picks an element from the function object 𝑏𝑎
that corresponds to 𝑓 .
It’s important to keep in mind that, in general, a function object represents functions that
depend on external parameters. Such functions are called closures. Closures are functions that
capture values from their environment.
Here’s our example translated to Haskell. Corresponding to 𝑓 we have an expression:
(a :+ 0) * x * x + (b :+ 0) * x + (c :+ 0)
If we use Double to approximate ℝ, our environment is a product (Double, Double, Double).
The type Complex is parameterized by another type—here we used Double again:
type C = Complex Double
The conversion from Double to C is done by setting the imaginary part to zero, as in (a :+ 0).
The corresponding arrow ℎ takes the environment and produces a closure of the type C -> C:

h :: (Double, Double, Double) -> (C -> C)


h (a, b, c) = \x -> (a :+ 0) * x * x + (b :+ 0) * x + (c :+ 0)
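For instance (with made-up coefficients), we can fix the environment once and reuse the resulting closure; this assumes the definitions of C and h above are in scope, together with Data.Complex:
quadratic :: C -> C
quadratic = h (1, -3, 2)   -- the polynomial x^2 - 3x + 2

root :: C
root = quadratic (1 :+ 0)  -- evaluates to 0 :+ 0, since 1 is a root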

Modus ponens
In logic, the function object corresponds to an implication. An arrow from the terminal object
to the function object is the proof of that implication. Function application 𝜀 corresponds to
what logicians call modus ponens: if you have a proof of the implication 𝐴 ⇒ 𝐵 and a proof of
𝐴 then this constitutes the proof of 𝐵.
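As a small illustration (not part of the original argument), modus ponens is literally the apply function under the propositions-as-types reading:
modusPonens :: (a -> b, a) -> b
modusPonens = apply  -- a proof of a => b paired with a proof of a yields a proof of b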

6.1 Sum and Product Revisited


When functions gain the same status as elements of other types, we have the tools to directly
translate diagrams into code.

Sum types
Let’s start with the definition of the sum.
         Left            Right
   𝑎 -----------> 𝑎 + 𝑏 <----------- 𝑏
     \              |              /
   𝑓   \            | ℎ          /  𝑔
         v          v          v
                    𝑐

We said that the pair of arrows (𝑓 , 𝑔) uniquely determines the mapping ℎ out of the sum. We
can write it concisely using a higher-order function:
h = mapOut (f, g)
where:
mapOut :: (a -> c, b -> c) -> (Either a b -> c)
mapOut (f, g) = \aorb -> case aorb of
Left a -> f a
Right b -> g b
This function takes a pair of functions as an argument and it returns a function.
First, we pattern-match the pair (f, g) to extract f and g. Then we construct a new function
using a lambda. This lambda takes an argument of the type Either a b, which we call aorb,
and does the case analysis on it. If it was constructed using Left, we apply f to its contents,
otherwise we apply g.
Note that the function we are returning is a closure. It captures f and g from its environment.
The function we have implemented closely follows the diagram, but it’s not written in the
usual Haskell style. Haskell programmers prefer to curry functions of multiple arguments. Also,
if possible, they prefer to eliminate lambdas.
Here’s the version of the same function taken from the Haskell standard library, where it
goes under the name (lower-case) either:
either :: (a -> c) -> (b -> c) -> Either a b -> c
either f _ (Left x) = f x
either _ g (Right y) = g y
The other direction of the bijection, from ℎ to the pair (𝑓 , 𝑔), also follows the arrows of the
diagram.
unEither :: (Either a b -> c) -> (a -> c, b -> c)
unEither h = (h . Left, h . Right)
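For example (a hypothetical use), both handlers must agree on the result type:
toInt :: Either String Int -> Int
toInt = either length id

-- toInt (Left "hello") evaluates to 5
-- toInt (Right 42) evaluates to 42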

Product types
Product types are dually defined by their mapping-in property.

                𝑐
             /  |  \
        𝑓  /    | ℎ  \  𝑔
          v     v     v
   𝑎 <------- 𝑎 × 𝑏 -------> 𝑏
         fst          snd

Here’s the direct Haskell reading of this diagram


h :: (c -> a, c -> b) -> (c -> (a, b))
h (f, g) = \c -> (f c, g c)
And this is the stylized version written in Haskell style as an infix operator &&&
(&&&) :: (c -> a) -> (c -> b) -> (c -> (a, b))
(f &&& g) c = (f c, g c)
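For example (a made-up snippet), it lets us compute two views of the same input in one expression:
minMax :: [Int] -> (Int, Int)
minMax = minimum &&& maximum

-- minMax [3, 1, 2] evaluates to (1, 3)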

The other direction of the bijection is given by:


fork :: (c -> (a, b)) -> (c -> a, c -> b)
fork h = (fst . h, snd . h)
which also closely follows the reading of the diagram.

Functoriality revisited
Both sum and product are functorial, which means that we can apply functions to their contents.
We are ready to translate those diagrams into code.
This is the functoriality of the sum type:

         Left            Right
   𝑎 -----------> 𝑎 + 𝑏 <----------- 𝑏
   |                |                |
 𝑓 |                | ℎ              | 𝑔
   v                v                v
   𝑎′ ----------> 𝑎′ + 𝑏′ <--------- 𝑏′
         Left            Right
Reading this diagram we can immediately write ℎ using either:
h f g = either (Left . f) (Right . g)
Or we could expand it and call it bimap:
bimap :: (a -> a') -> (b -> b') -> Either a b -> Either a' b'
bimap f g (Left a) = Left (f a)
bimap f g (Right b) = Right (g b)
Similarly for the product type:

          fst            snd
   𝑎 <----------- 𝑎 × 𝑏 -----------> 𝑏
   |                |                |
 𝑓 |                | ℎ              | 𝑔
   v                v                v
   𝑎′ <---------- 𝑎′ × 𝑏′ ----------> 𝑏′
          fst            snd
ℎ can be written as:
h f g = (f . fst) &&& (g . snd)
Or it could be expanded to
bimap :: (a -> a') -> (b -> b') -> (a, b) -> (a', b')
bimap f g (a, b) = (f a, g b)
In both cases we call this higher-order function bimap since, in Haskell, both the sum and the
product are instances of a more general class called Bifunctor.

6.2 Functoriality of the Function Type


The function type, or the exponential, is also functorial, but with a twist. We are interested in a
mapping from 𝑏^𝑎 to 𝑏′^𝑎′, where the primed objects are related to the non-primed ones through

some arrows—to be determined.


The exponential is defined by its mapping-in property, so if we're looking for

𝑘 ∶ 𝑏^𝑎 → 𝑏′^𝑎′

we should draw the diagram that has 𝑘 as a mapping into 𝑏′^𝑎′. We get this diagram from the
original definition by substituting 𝑏^𝑎 for 𝑐 and primed objects for the non-primed ones:

           𝑏^𝑎 × 𝑎′
              |      \
   𝑘×𝑖𝑑𝑎′     |       \  𝑔
              v        v
   𝑏′^𝑎′ × 𝑎′ ---𝜀--->  𝑏′
The question is: can we find an arrow 𝑔 to complete this diagram?

𝑔 ∶ 𝑏^𝑎 × 𝑎′ → 𝑏′

If we find such a 𝑔, it will uniquely define our 𝑘.


The way to think about this problem is to consider how we would implement 𝑔. It takes the
product 𝑏^𝑎 × 𝑎′ as its argument. Think of it as a pair: an element of the function object from
𝑎 to 𝑏 and an element of 𝑎′. The only thing we can do with the function object is to apply it to
something. But 𝑏^𝑎 requires an argument of type 𝑎, and all we have at our disposal is 𝑎′. We can't
do anything unless somebody gives us an arrow 𝑎′ → 𝑎. This arrow applied to 𝑎′ will generate
the argument for 𝑏^𝑎. However, the result of the application is of type 𝑏, and 𝑔 is supposed to
produce a 𝑏′ . Again, we’ll need an arrow 𝑏 → 𝑏′ to complete our assignment.
This may sound complicated, but the bottom line is that we require two arrows between the
primed and non-primed objects. The twist is that the first arrow goes from 𝑎′ to 𝑎, which feels
backward from the usual functoriality considerations. In order to map 𝑏^𝑎 to 𝑏′^𝑎′ we need a pair
of arrows:

𝑓 ∶ 𝑎′ → 𝑎
𝑔 ∶ 𝑏 → 𝑏′

This is somewhat easier to explain in Haskell. Our goal is to implement a function a' -> b',
given a function h :: a -> b.
This new function takes an argument of the type a' so, before we can pass it to h, we need
to convert a' to a. That’s why we need a function f :: a' -> a.
Since h produces a b, and we want to return a b', we need another function g :: b -> b'.
All this fits nicely into one higher-order function:
dimap :: (a' -> a) -> (b -> b') -> (a -> b) -> (a' -> b')
dimap f g h = g . h . f
Similar to bimap being an interface to the typeclass Bifunctor, dimap is a member of the
typeclass Profunctor.

6.3 Bicartesian Closed Categories


A category in which both the product and the exponential are defined for any pair of objects,
and which has a terminal object, is called cartesian closed. The idea is that hom-sets are not
something alien to the category in question: the category is “closed” under the operation of
forming hom-sets.
If the category also has sums (coproducts) and the initial object, it’s called bicartesian
closed.
This is the minimum structure for modeling programming languages.
Data types constructed using these operations are called algebraic data types. We have
addition, multiplication, and exponentiation (but not subtraction or division) of types; with all
the familiar laws we know from high-school algebra. They are satisfied up to isomorphism.
There is one more algebraic law that we haven’t discussed yet.

Distributivity
Multiplication of numbers distributes over addition. Should we expect the same in a bicartesian
closed category?
𝑏 × 𝑎 + 𝑐 × 𝑎 ≅ (𝑏 + 𝑐) × 𝑎

The left to right mapping is easy to construct, since it’s simultaneously a mapping out of a
sum and a mapping into a product. We can construct it by gradually decomposing it into simpler
mappings. In Haskell, this means implementing a function
dist :: Either (b, a) (c, a) -> (Either b c, a)
A mapping out of the sum on the left is given by a pair of arrows:

𝑓 ∶ 𝑏 × 𝑎 → (𝑏 + 𝑐) × 𝑎
𝑔 ∶ 𝑐 × 𝑎 → (𝑏 + 𝑐) × 𝑎

           Left                       Right
   𝑏 × 𝑎 --------> 𝑏 × 𝑎 + 𝑐 × 𝑎 <-------- 𝑐 × 𝑎
        \               |               /
      𝑓   \             | dist        /  𝑔
            v           v           v
               (𝑏 + 𝑐) × 𝑎

We write it in Haskell as:


dist = either f g
where
f :: (b, a) -> (Either b c, a)
g :: (c, a) -> (Either b c, a)
The where clause is used to introduce the definitions of sub-functions.
Now we need to implement 𝑓 and 𝑔. They are mappings into the product, so each of them
is equivalent to a pair of arrows. For instance, the first one is given by the pair:

𝑓 ′ ∶ 𝑏 × 𝑎 → (𝑏 + 𝑐)
𝑓 ′′ ∶ 𝑏 × 𝑎 → 𝑎

                𝑏 × 𝑎
              /   |   \
        𝑓′  /     | 𝑓   \  𝑓′′
          v       v      v
   𝑏 + 𝑐 <---- (𝑏 + 𝑐) × 𝑎 ----> 𝑎
           fst              snd
In Haskell:
f = f' &&& f''
f' :: (b, a) -> Either b c
f'' :: (b, a) -> a
The first arrow can be implemented by projecting the first component 𝑏 and then using Left to
construct the sum. The second is just the projection snd:
𝑓 ′ = Left◦fst
𝑓 ′′ = snd
Similarly, we decompose 𝑔 into a pair 𝑔 ′ and 𝑔 ′′ :
                𝑐 × 𝑎
              /   |   \
        𝑔′  /     | 𝑔   \  𝑔′′
          v       v      v
   𝑏 + 𝑐 <---- (𝑏 + 𝑐) × 𝑎 ----> 𝑎
           fst              snd
Combining all these together, we get:
dist = either f g
where
f = f' &&& f''
f' = Left . fst
f'' = snd
g = g' &&& g''
g' = Right . fst
g'' = snd
These are the type signatures of the helper functions:
f :: (b, a) -> (Either b c, a)
g :: (c, a) -> (Either b c, a)
f' :: (b, a) -> Either b c
f'' :: (b, a) -> a
g' :: (c, a) -> Either b c
g'' :: (c, a) -> a
They can also be inlined to produce this terse form:
dist = either ((Left . fst) &&& snd) ((Right . fst) &&& snd)
This style of programming is called point free because it omits the arguments (points). For
readability reasons, Haskell programmers prefer a more explicit style. The above function would
normally be implemented as:

dist (Left (b, a)) = (Left b, a)
dist (Right (c, a)) = (Right c, a)
Notice that we have only used the definitions of sums and products. The other direction of
the isomorphism requires the use of the exponential, so it’s only valid in a bicartesian closed
category. This is not immediately clear from the straightforward Haskell implementation:
undist :: (Either b c, a) -> Either (b, a) (c, a)
undist (Left b, a) = Left (b, a)
undist (Right c, a) = Right (c, a)
but that’s because currying is implicit in Haskell.
Here’s the point-free version of this function:
undist = uncurry (either (curry Left) (curry Right))
This may not be the most readable implementation, but it underscores the fact that we need the
exponential: we use both curry and uncurry to implement the mapping.
We’ll come back to this identity later, when we are equipped with more powerful tools:
adjunctions.

Exercise 6.3.1. Show that:


2 × 𝑎 ≅ 𝑎 + 𝑎
where 2 is the Boolean type. Do the proof diagrammatically first, and then implement two
Haskell functions witnessing the isomorphism.
Chapter 7

Recursion

When you step between two mirrors, you see your reflection, the reflection of your reflection,
the reflection of that reflection, and so on. Each reflection is defined in terms of the previous
reflection, but together they produce infinity.
Recursion is a decomposition pattern that splits a single task into many steps, the number
of which is potentially unbounded.
Recursion is based on suspension of disbelief. You are faced with a task that may take
arbitrarily many steps. You tentatively assume that you know how to solve it. Then you ask
yourself the question: "How would I make the last step if I had the solution to everything but
the last step?"

7.1 Natural Numbers


An object of natural numbers 𝑁 does not contain numbers. Objects have no internal structure.
Structure is defined by arrows.
We can use an arrow from the terminal object to define one special element. By convention,
we’ll call this arrow 𝑍 for “zero.”
𝑍∶ 1 → 𝑁

But we have to be able to define infinitely many arrows to account for the fact that, for every
natural number, there is another number that is one larger than it.
We can formalize this statement by saying: Suppose that we know how to create a natural
number 𝑛 ∶ 1 → 𝑁. How do we make the next step, the step that will point us to the next
number—its successor?
This next step doesn’t have to be any more complex than just post-composing 𝑛 with an
arrow that loops back from 𝑁 to 𝑁. This arrow should not be the identity, because we want the
successor of a number to be different from that number. But a single such arrow, which we’ll
call 𝑆 for “successor” will suffice.
The element corresponding to the successor of 𝑛 is given by the composition:

      𝑛         𝑆
   1 -----> 𝑁 -----> 𝑁

(We sometimes draw the same object multiple times in a single diagram, if we want to straighten
the looping arrows.)


In particular, we can define 𝑂𝑛𝑒 as the successor of 𝑍:

      𝑍         𝑆
   1 -----> 𝑁 -----> 𝑁        (𝑂𝑛𝑒 = 𝑆◦𝑍)

and 𝑇 𝑤𝑜 as the successor of the successor of 𝑍

      𝑍         𝑆         𝑆
   1 -----> 𝑁 -----> 𝑁 -----> 𝑁        (𝑇𝑤𝑜 = 𝑆◦𝑆◦𝑍)

and so on.

Introduction Rules
The two arrows, 𝑍 and 𝑆, serve as the introduction rules for the natural number object 𝑁. The
twist is that one of them is recursive: 𝑆 uses 𝑁 as its source as well as its target.

      𝑍
   1 -----> 𝑁 ⟲ 𝑆

The two introduction rules translate directly to Haskell


data Nat where
Z :: Nat
S :: Nat -> Nat
They can be used to define arbitrary natural numbers; for instance:
zero, one, two :: Nat
zero = Z
one = S zero
two = S one
This definition of natural number type is not very useful in practice. However, it’s often
used in defining type-level naturals, where each number is its own type.
You may encounter this construction under the name of Peano arithmetic.

Elimination Rules
The fact that the introduction rules are recursive complicates matters slightly when it comes
to defining elimination rules. We will follow the pattern from previous chapters of first assuming
that we are given a mapping out of 𝑁:

ℎ∶ 𝑁 → 𝑎

and see what we can deduce from there.


Previously, we were able to decompose such an ℎ into simpler mappings (pairs of mappings
for sum and product; a mapping out of a product for the exponential).

The introduction rules for 𝑁 look similar to those for the sum (it’s either 𝑍 or the successor),
so we would expect that ℎ could be split into two arrows. And, indeed, we can easily get the
first one by composing ℎ◦𝑍. This is an arrow that picks an element of 𝑎. We call it 𝑖𝑛𝑖𝑡:

𝑖𝑛𝑖𝑡 ∶ 1 → 𝑎

But there is no obvious way to find the second one.


To see that, let’s expand the definition of 𝑁:

      𝑍         𝑆         𝑆
   1 -----> 𝑁 -----> 𝑁 -----> 𝑁  ...

and plug ℎ and init into it:

      𝑍         𝑆         𝑆
   1 -----> 𝑁 -----> 𝑁 -----> 𝑁  ...
     \      |        |        |
 𝑖𝑛𝑖𝑡  \    | ℎ      | ℎ      | ℎ
        v   v        v        v
        𝑎   𝑎        𝑎        𝑎

The intuition is that an arrow from 𝑁 to 𝑎 represents a sequence 𝑎𝑛 of elements of 𝑎. The
zeroth element is given by
𝑎0 = 𝑖𝑛𝑖𝑡
The next element is
𝑎1 = ℎ◦𝑆◦𝑍
followed by
𝑎2 = ℎ◦𝑆◦𝑆◦𝑍
and so on.
We have thus replaced one arrow ℎ with infinitely many arrows 𝑎𝑛 . Granted, the new arrows
are simpler, since they represent elements of 𝑎, but there are infinitely many of them.
The problem is that, no matter how you look at it, an arbitrary mapping out of 𝑁 contains an
infinite amount of information.
We have to drastically simplify the problem. Since we used a single arrow 𝑆 to generate all
natural numbers, we can try to use a single arrow 𝑎 → 𝑎 to generate all the elements 𝑎𝑛 . We’ll
call this arrow 𝑠𝑡𝑒𝑝:
      𝑍         𝑆
   1 -----> 𝑁 -----> 𝑁
     \      |        |
 𝑖𝑛𝑖𝑡  \    | ℎ      | ℎ
        v   v        v
        𝑎 -------->  𝑎
            𝑠𝑡𝑒𝑝
The mappings out of 𝑁 that are generated by such pairs, 𝑖𝑛𝑖𝑡 and 𝑠𝑡𝑒𝑝, are called recursive. Not
all mappings out of 𝑁 are recursive. In fact very few are; but recursive mappings are enough to
define the object of natural numbers.
We use the above diagram as the elimination rule. We decree that every recursive mapping
ℎ out of 𝑁 is in one-to-one correspondence with a pair 𝑖𝑛𝑖𝑡 and 𝑠𝑡𝑒𝑝.
This means that the evaluation rule (extracting (𝑖𝑛𝑖𝑡, 𝑠𝑡𝑒𝑝) for a given ℎ) cannot be for-
mulated for an arbitrary arrow ℎ ∶ 𝑁 → 𝑎, only for those arrows that have been previously
recursively defined using a pair (𝑖𝑛𝑖𝑡, 𝑠𝑡𝑒𝑝).
The arrow 𝑖𝑛𝑖𝑡 can be always recovered by composing ℎ◦𝑍. The arrow 𝑠𝑡𝑒𝑝 is a solution to
the equation:
𝑠𝑡𝑒𝑝◦ℎ = ℎ◦𝑆

If ℎ was defined using some 𝑖𝑛𝑖𝑡 and 𝑠𝑡𝑒𝑝, then this equation obviously has a solution.
The important part is that we demand that this solution be unique.
Intuitively, the pair 𝑖𝑛𝑖𝑡 and 𝑠𝑡𝑒𝑝 generate the sequence of elements 𝑎0 , 𝑎1 , 𝑎2 , ... If two
arrows ℎ and ℎ′ are given by the same pair (𝑖𝑛𝑖𝑡, 𝑠𝑡𝑒𝑝), it means that the sequences they generate
are the same.
So if ℎ were somehow different from ℎ′ , it would mean that 𝑁 contains more than just the
sequence of elements 𝑍, 𝑆𝑍, 𝑆(𝑆𝑍), ... For instance, if we added −1 to 𝑁 (that is, made 𝑍
somebody’s successor), we could have ℎ and ℎ′ differ at −1 and yet be generated by the same
𝑖𝑛𝑖𝑡 and 𝑠𝑡𝑒𝑝. Uniqueness means there are no natural numbers before, after, or in between the
numbers generated by 𝑍 and 𝑆.
The elimination rule we’ve discussed here corresponds to primitive recursion. We’ll see a
more advanced version of this rule, corresponding to the induction principle, in the chapter on
dependent types.

In Programming
The elimination rule can be implemented as a recursive function in Haskell:
rec :: a -> (a -> a) -> (Nat -> a)
rec init step = \n ->
case n of
Z -> init
(S m) -> step (rec init step m)
This single function, which is called a recursor, is enough to implement all recursive func-
tions of natural numbers. For instance, this is how we could implement addition:
plus :: Nat -> Nat -> Nat
plus n = rec init step
where
init = n
step = S
This function takes n as an argument and produces a function (a closure) that takes another
number and adds n to it.
In practice, programmers prefer to implement recursion directly—an approach that is equiv-
alent to inlining the recursor rec. The following implementation is arguably easier to under-
stand:
plus n m = case m of
Z -> n
(S k) -> S (plus k n)
It can be read as: If m is zero then the result is n. Otherwise, if m is a successor of some k,
then the result is the successor of k + n. This is exactly the same as saying that init = n and
step = S.
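For instance, we can check the definition on small numbers:
three :: Nat
three = plus one two  -- evaluates to S (S (S Z))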
In imperative languages recursion is often replaced by iteration. Conceptually, iteration
seems to be easier to understand, as it corresponds to sequential decomposition. The steps in
the sequence usually follow some natural order. This is in contrast with recursive decomposition,
where we assume that we have done all the work up to the 𝑛’th step, and we combine that result
with the next consecutive step.

On the other hand, recursion is more natural when processing recursively defined data struc-
tures, such as lists or trees.
The two approaches are equivalent, and compilers often convert recursive functions to loops
in what is called tail recursion optimization.

Exercise 7.1.1. Implement a curried version of addition as a mapping out of 𝑁 into the function
object 𝑁 𝑁 . Hint: use these types in the recursor:
init :: Nat -> Nat
step :: (Nat -> Nat) -> (Nat -> Nat)

7.2 Lists
A list of things is either empty or a thing followed by a list of things.
This recursive definition translates into two introduction rules for the type 𝐿𝑎 , the list of 𝑎:

Nil ∶ 1 → 𝐿𝑎
Cons ∶ 𝑎 × 𝐿𝑎 → 𝐿𝑎

The Nil element describes an empty list, and Cons constructs a list from a head and a tail.
The following diagram depicts the relationship between projections and list constructors.
The projections extract the head and the tail of the list that was constructed using Cons.

              Cons             Nil
   𝑎 × 𝐿𝑎 ----------> 𝐿𝑎 <---------- 1
     |   \
  fst|    \ snd
     v     v
     𝑎     𝐿𝑎

This description can be immediately translated to Haskell:


data List a where
Nil :: List a
Cons :: (a, List a) -> List a

Elimination Rule
Suppose that we have a mapping out from a list of 𝑎 to some arbitrary type 𝑐:

ℎ ∶ 𝐿𝑎 → 𝑐

This is how we would plug it into our definition of the list:

         Nil            Cons
   1 ---------> 𝐿𝑎 <---------- 𝑎 × 𝐿𝑎
     \           |                |
 𝑖𝑛𝑖𝑡  \         | ℎ              | 𝑖𝑑𝑎×ℎ
         v       v                v
          𝑐 <--------------- 𝑎 × 𝑐
                  𝑠𝑡𝑒𝑝

We used the functoriality of the product to apply the pair (𝑖𝑑𝑎 , ℎ) to the product 𝑎 × 𝐿𝑎 .

Similar to the natural number object, we can try to define two arrows, 𝑖𝑛𝑖𝑡 = ℎ◦Nil and 𝑠𝑡𝑒𝑝.
The arrow 𝑠𝑡𝑒𝑝 is a solution to:

𝑠𝑡𝑒𝑝◦(𝑖𝑑𝑎 × ℎ) = ℎ◦Cons

Again, not every ℎ can be reduced to such a pair of arrows.


However, given 𝑖𝑛𝑖𝑡 and 𝑠𝑡𝑒𝑝, we can define an ℎ. Such a function is called a fold, or a list
catamorphism.
This is the list recursor in Haskell:
recList :: c -> ((a, c) -> c) -> (List a -> c)
recList init step = \as ->
case as of
Nil -> init
Cons (a, as) -> step (a, recList init step as)
Given init and step, it produces a mapping out of a list.
A list is such a basic data type that Haskell has a built-in syntax for it. The type (List a)
is written as [a]. The Nil constructor is an empty pair of square brackets, [], and the Cons
constructor is an infix colon (:).
We can pattern-match on these constructors. A generic mapping out of a list has the form:
h :: [a] -> c
h [] = -- empty-list case
h (a : as) = -- case for the head and the tail of a non-empty list
Corresponding to the list recursor, recList, here’s the type signature of the function foldr
(fold right), which you can find in the standard library:
foldr :: (a -> c -> c) -> c -> [a] -> c
Here’s one possible implementation:
foldr step init = \as ->
case as of
[] -> init
a : as -> step a (foldr step init as)
As an example, we can use foldr to calculate the sum of the elements of a list of natural
numbers:
sum :: [Nat] -> Nat
sum = foldr plus Z
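Another simple fold (a made-up example) measures the length of a list as a Nat, ignoring the elements themselves:
len :: [a] -> Nat
len = foldr (\_ n -> S n) Z

-- len "abc" evaluates to S (S (S Z))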

Exercise 7.2.1. Consider what happens when you replace 𝑎 in the definition of a list with the
terminal object. Hint: What is base-one encoding of natural numbers?

Exercise 7.2.2. How many mappings ℎ ∶ 𝐿𝑎 → 1 + 𝑎 are there? Can we get all of them using
a list recursor? How about Haskell functions of the signature:
h :: [a] -> Maybe a

Exercise 7.2.3. Implement a function that extracts the third element from a list, if the list is long
enough. Hint: Use Maybe a for the result type.

7.3 Functoriality
Functoriality means, roughly, the ability to transform the “contents” of a data structure. The
contents of a list 𝐿𝑎 is of the type 𝑎. Given an arrow 𝑓 ∶ 𝑎 → 𝑏, we need to define a mapping
of lists ℎ ∶ 𝐿𝑎 → 𝐿𝑏 .
Lists are defined by the mapping out property, so let’s replace the target 𝑐 of the elimination
rule by 𝐿𝑏 . We get:

         Nil𝑎           Cons𝑎
   1 ---------> 𝐿𝑎 <---------- 𝑎 × 𝐿𝑎
     \           |                |
 𝑖𝑛𝑖𝑡  \         | ℎ              | 𝑖𝑑𝑎×ℎ
         v       v                v
         𝐿𝑏 <--------------- 𝑎 × 𝐿𝑏
                  𝑠𝑡𝑒𝑝

Since we are dealing with two different lists here, we have to distinguish between their construc-
tors. For instance, we have:

Nil𝑎 ∶ 1 → 𝐿𝑎
Nil𝑏 ∶ 1 → 𝐿𝑏

and similarly for Cons.


The only candidate for 𝑖𝑛𝑖𝑡 is Nil𝑏 , which is to say that ℎ acting on an empty list of 𝑎’s
produces an empty list of 𝑏’s:
ℎ◦Nil𝑎 = Nil𝑏
What remains is to define the arrow:

𝑠𝑡𝑒𝑝 ∶ 𝑎 × 𝐿𝑏 → 𝐿𝑏

We can guess:
𝑠𝑡𝑒𝑝 = Cons𝑏 ◦(𝑓 × 𝑖𝑑𝐿𝑏 )
This corresponds to the Haskell function:
mapList :: (a -> b) -> List a -> List b
mapList f = recList init step
where
init = Nil
step (a, bs) = Cons (f a, bs)
or, using the built-in list syntax and inlining the recursor,
map :: (a -> b) -> [a] -> [b]
map f [] = []
map f (a : as) = f a : map f as
You might wonder what prevents us from choosing 𝑠𝑡𝑒𝑝 = 𝑠𝑛𝑑, resulting in:
badMap :: (a -> b) -> [a] -> [b]
badMap f [] = []
badMap f (a : as) = badMap f as
We’ll see, in the next chapter, why this is a bad choice. (Hint: What happens when we apply
badMap to 𝑖𝑑?)
Chapter 8

Functors

8.1 Categories
So far we’ve only seen only one category—that of types and functions. So let’s quickly gather
the essential info about a category.
A category is a collection of objects and arrows that go between them. Every pair of com-
posable arrows can be composed. The composition is associative, and there is an identity arrow
looping back on every object.
The fact that types and functions form a category can be expressed in Haskell by defining
composition as:
(.) :: (b -> c) -> (a -> b) -> (a -> c)
g . f = \x -> g (f x)
The composition of two functions g after f is a new function that first applies f to its argument
and then applies g to the result.
The identity is a polymorphic “do nothing” function:
id :: a -> a
id x = x
You can easily convince yourself that such composition is associative, and composing with id
does nothing to a function.
Based on the definition of a category, we can come up with all kinds of weird categories.
For instance, there is a category that has no objects and no arrows. It satisfies all the conditions
of a category vacuously. There’s another one that contains a single object and a single arrow
(can you guess what arrow it is?). There’s one with two unconnected objects, and one where
the two objects are connected by a single arrow (plus two identity arrows), and so on. These are
examples of what I call stick-figure categories—categories with a small handful of objects and
arrows.

Category of sets
We can also strip a category of all arrows (except for the identity arrows). Such a bare-object
category is called a discrete category or a set¹. Since we associate arrows with structure, a set
is a category with no structure.
¹ Ignoring "size" issues.


Sets form their own category called 𝐒𝐞𝐭². The objects in that category are sets, and the
arrows are functions between sets. Such functions are defined as special kinds of relations, which
themselves are defined as sets of pairs.
To the lowest approximation, we can model programming in the category of sets. We often
think of types as sets of values, and functions as set-theoretical functions. There’s nothing wrong
with that. In fact, all of the categorical constructions we've described so far have their set-theoretical
roots. The categorical product is a generalization of the cartesian product of sets, the sum is the
disjoint union, and so on.
What category theory offers is more precision: the fine distinction between the structure
that is absolutely necessary, and the superfluous details.
A set-theoretical function, for instance, doesn't fit the definition of a function we work with
as programmers. Our functions must have underlying algorithms because they have to be com-
putable by some physical systems, be it computers or human brains.

Opposite categories
In programming, the focus is on the category of types and functions, but we can use this category
as a starting point to construct other categories.
One such category is called the opposite category. This is the category in which all the
original arrows are inverted: what is called the source of an arrow in the original category is
now called its target, and vice versa.
The opposite of a category 𝒞 is called 𝒞ᵒᵖ. We've had a glimpse of this category when we
discussed duality. The objects of 𝒞ᵒᵖ are the same as those of 𝒞.
Whenever there is an arrow 𝑓 ∶ 𝑎 → 𝑏 in 𝒞, there is a corresponding arrow 𝑓ᵒᵖ ∶ 𝑏 → 𝑎 in 𝒞ᵒᵖ.

The composition 𝑔ᵒᵖ◦𝑓ᵒᵖ of two such arrows 𝑓ᵒᵖ ∶ 𝑎 → 𝑏 and 𝑔ᵒᵖ ∶ 𝑏 → 𝑐 is given by the
arrow (𝑓◦𝑔)ᵒᵖ (notice the reversed order).
The terminal object in 𝒞 is the initial object in 𝒞ᵒᵖ, the product in 𝒞 is the sum in 𝒞ᵒᵖ, and
so on.

Product categories
Given two categories 𝒞 and 𝒟, we can construct a product category 𝒞 × 𝒟. The objects in this
category are pairs of objects ⟨𝑐, 𝑑⟩, and the arrows are pairs of arrows.
If we have an arrow 𝑓 ∶ 𝑐 → 𝑐′ in 𝒞 and an arrow 𝑔 ∶ 𝑑 → 𝑑′ in 𝒟 then there is a corre-
sponding arrow ⟨𝑓, 𝑔⟩ in 𝒞 × 𝒟. This arrow goes from ⟨𝑐, 𝑑⟩ to ⟨𝑐′, 𝑑′⟩, both being objects in
𝒞 × 𝒟. Two such arrows can be composed if their components are composable in, respectively,
𝒞 and 𝒟. An identity arrow is a pair of identity arrows.
The two product categories we're most interested in are 𝒞 × 𝒞 and 𝒞ᵒᵖ × 𝒞, where 𝒞 is our
familiar category of types and functions.
In both of these categories, objects are pairs of objects from 𝒞. In the first category, 𝒞 × 𝒞,
a morphism from ⟨𝑎, 𝑏⟩ to ⟨𝑎′, 𝑏′⟩ is a pair ⟨𝑓 ∶ 𝑎 → 𝑎′, 𝑔 ∶ 𝑏 → 𝑏′⟩. In the second category,
𝒞ᵒᵖ × 𝒞, a morphism is a pair ⟨𝑓 ∶ 𝑎′ → 𝑎, 𝑔 ∶ 𝑏 → 𝑏′⟩, in which the first arrow goes in the
opposite direction.

² Again, ignoring "size" issues, in particular the non-existence of the set of all sets.

Slice categories
In a neatly organized universe, objects are always objects and arrows are always arrows. Except
that sometimes sets of arrows can be thought of as objects. But slice categories break this neat
separation: they turn individual arrows into objects.
A slice category 𝒞∕𝑐 describes how a particular object 𝑐 is seen from the perspective of its
category 𝒞. It's the totality of arrows pointing at 𝑐. But to specify an arrow we need to specify
both of its ends. Since one of these ends is fixed to be 𝑐, we only have to specify the other.
An object in the slice category 𝒞∕𝑐 (also known as an over-category) is a pair ⟨𝑒, 𝑝⟩, with
𝑝 ∶ 𝑒 → 𝑐.
An arrow between two objects ⟨𝑒, 𝑝⟩ and ⟨𝑒′, 𝑝′⟩ is an arrow 𝑓 ∶ 𝑒 → 𝑒′ of 𝒞, which makes
the following triangle commute:

         𝑓
   𝑒 --------> 𝑒′
     \        /
   𝑝   \    /  𝑝′
         v v
          𝑐

Coslice categories
There is a dual notion of a coslice category 𝑐∕𝒞, also known as an under-category. It's a category
of arrows emanating from a fixed object 𝑐. Objects in this category are pairs ⟨𝑎, 𝑖 ∶ 𝑐 → 𝑎⟩.
Morphisms in 𝑐∕𝒞 are arrows that make the relevant triangles commute.

         𝑐
       /   \
    𝑖 /     \ 𝑗
     v       v
     𝑎 --𝑓--> 𝑏

In particular, if the category 𝒞 has a terminal object 1, then the coslice 1∕𝒞 has, as objects,
global elements of all the objects of 𝒞.
Morphisms of 1∕𝒞 that correspond to arrows 𝑓 ∶ 𝑎 → 𝑏 map the set of global elements of
𝑎 to the set of global elements of 𝑏.

         1
       /   \
    𝑥 /     \ 𝑦
     v       v
     𝑎 --𝑓--> 𝑏

In particular, the construction of a coslice category from the category of types and functions
justifies our intuition of types as sets of values, with values represented by global elements of
types.

8.2 Functors
We’ve seen examples of functoriality when discussing algebraic data types. The idea is that
such a data type “remembers” the way it was created, and we can manipulate this memory by
applying an arrow to its “contents.”
In some cases this intuition is very convincing: we think of a product type as a pair that
“contains” its ingredients. After all, we can retrieve them using projections.

This is less obvious in the case of function objects. You can visualize a function object
as secretly storing all possible results and using the function argument to index into them. A
function from Bool is obviously equivalent to a pair of values, one for True and one for False.
It’s a known programming trick to implement some functions as lookup tables. It’s called mem-
oization.
Even though it’s not practical to memoize functions that take, say, natural numbers as argu-
ments; we can still conceptualize them as (infinite, or even uncountable) lookup tables.
If you can think of a data type as a container of values, it makes sense to apply a function
to transform all these values, and create a transformed container. When this is possible, we say
that the data type is functorial.
Again, function types require some more suspension of disbelief. You visualize a function
object as a lookup table, keyed by some type. If you want to use another, related type as your
key, you need a function that translates the new key to the original key. This is why functoriality
of the function object has one of the arrows reversed:
dimap :: (a' -> a) -> (b -> b') -> (a -> b) -> (a' -> b')
dimap f g h = g . h . f
You are applying the transformation to a function h :: a -> b that has a “receptor” that re-
sponds to values of type a, and you want to use it to process input of type a'. This is only
possible if you have a converter from a' to a, namely f :: a' -> a.
The idea of a data type “containing” values of another type can be also expressed by saying
that one data type is parameterized by another. For instance, the type List a is parameterized
by the type a.
In other words, List maps the type a to the type List a. List by itself, without the
argument, is called a type constructor.

Functors between categories


In category theory, a type constructor is modeled as a mapping of objects to objects. It’s a
function on objects. This is not to be confused with arrows between objects, which are part of
the structure of the category.
In fact, it’s easier to imagine a mapping between categories. Every object in the source cate-
gory is mapped to an object in the target category. If 𝑎 is an object in 𝒞, there is a corresponding
object 𝐹𝑎 in 𝒟.
A functorial mapping, or a functor, not only maps objects but also arrows between them.
Every arrow
𝑓∶ 𝑎→𝑏
in the first category has a corresponding arrow in the second category:

𝐹𝑓 ∶ 𝐹𝑎 → 𝐹𝑏

   𝑎          𝐹𝑎
   |           |
 𝑓 |           | 𝐹𝑓
   v           v
   𝑏          𝐹𝑏
We use the same letter, here 𝐹 , to name both, the mapping of objects and the mapping of arrows.
If categories distill the essence of structure, then functors are mappings that preserve this
structure. Objects that are related in the source category are related in the target category.

The structure of a category is defined by arrows and their composition. Therefore a functor
must preserve composition. What is composed in one category:

ℎ = 𝑔◦𝑓

should remain composed in the second category:

𝐹 ℎ = 𝐹 (𝑔◦𝑓 ) = 𝐹 𝑔◦𝐹 𝑓

We can either compose two arrows in 𝒞 and map the composite to 𝒟, or we can map individual
arrows and then compose them in 𝒟. We demand that the result be the same.

   𝑎            𝐹𝑎
   | 𝑓           | 𝐹𝑓
   v             v
   𝑏            𝐹𝑏          𝐹(𝑔◦𝑓) = 𝐹𝑔◦𝐹𝑓
   | 𝑔           | 𝐹𝑔
   v             v
   𝑐            𝐹𝑐
Finally, a functor must preserve identity arrows:

𝐹 𝑖𝑑𝑎 = 𝑖𝑑𝐹 𝑎

   𝑎 ⟲ 𝑖𝑑𝑎          𝐹𝑎 ⟲ 𝐹𝑖𝑑𝑎 = 𝑖𝑑𝐹𝑎
These conditions taken together define what it means for a functor to preserve the structure
of a category.
It’s also important to realize what conditions are not part of the definition. For instance, a
functor is allowed to map multiple objects into the same object. It can also map multiple arrows
into the same arrow, as long as the endpoints match.
In the extreme, any category can be mapped to a singleton category with one object and one
arrow.
Also, not all objects or arrows in the target category must be covered by a functor. In the
extreme, we can have a functor from the singleton category to any (non-empty) category. Such
a functor picks a single object together with its identity arrow.
A constant functor Δ𝑐 is an example of a functor that maps all objects from the source
category to a single object 𝑐 in the target category, and all arrows from the source category to a
single identity arrow 𝑖𝑑𝑐 .
In category theory, functors are often used to create models of one category inside another.
The fact that they can merge multiple objects and arrows into one means that they produce
simplified views of the source category. They “abstract” some aspects of the source category.
The fact that they may only cover parts of the target category means that the models are
embedded in a larger environment.
Functors from some minimalistic, stick-figure, categories can be used to define patterns in
larger categories.

Exercise 8.2.1. Describe a functor whose source is the “walking arrow” category. It’s a stick-
figure category with two objects and a single arrow between them (plus the mandatory identity
arrows).
   𝑖𝑑𝑎 ⟲ 𝑎 ---𝑓---> 𝑏 ⟲ 𝑖𝑑𝑏

Exercise 8.2.2. The “walking iso” category is just like the “walking arrow” category, plus one
more arrow going back from 𝑏 to 𝑎. Show that a functor from this category always picks an
isomorphism in the target category.

8.3 Functors in Programming


Endofunctors are the class of functors that are the easiest to express in a programming language.
These are functors that map a category (here, the category of types and functions) to itself.

Endofunctors
The first part of the endofunctor is the mapping of types to types. This is done using type
constructors, which are type-level functions.
The list type constructor, List, maps an arbitrary type a to the type List a.
The Maybe type constructor maps a to Maybe a.
The second part of an endofunctor is the mapping of arrows. Given a function a -> b, we
want to be able to define a function List a -> List b, or Maybe a -> Maybe b. This is the
“functoriality” property of these data types that we have discussed before. Functoriality lets us
lift an arbitrary function to a function between transformed types.
Functoriality can be expressed in Haskell using a typeclass. In this case, the typeclass is
parameterized by a type constructor f (in Haskell we use lower case names for type-constructor
variables). We say that f is a Functor if there is a corresponding mapping of functions called
fmap:
class Functor f where
fmap :: (a -> b) -> (f a -> f b)
The compiler knows that f is a type constructor because it’s applied to types, as in f a and f b.
To prove to the compiler that a particular type constructor is a Functor, we have to provide
the implementation of fmap for it. This is done by defining an instance of the typeclass Functor.
For example:
instance Functor Maybe where
fmap g Nothing = Nothing
fmap g (Just a) = Just (g a)
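For example:
bumped :: Maybe Int
bumped = fmap (+1) (Just 2)  -- Just 3; applying it to Nothing would give Nothing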
A functor must also satisfy some laws: it must preserve composition and identity. These laws
cannot be expressed in Haskell, but should be checked by the programmer. We have previously
seen a definition of badMap that didn’t satisfy the identity laws, yet it would be accepted by the
compiler. It would define an “unlawful” instance of Functor for the list type constructor [].

Exercise 8.3.1. Show that WithInt is a functor.



data WithInt a = WithInt a Int

There are some elementary functors that might seem trivial, but they serve as building blocks
for other functors.
We have the identity endofunctor that maps all objects to themselves, and all arrows to
themselves.
data Id a = Id a

Exercise 8.3.2. Show that Id is a Functor. Hint: implement the Functor instance for it.

We also have a constant functor Δ𝑐 that maps all objects to a single object 𝑐, and all arrows
to the identity arrow on this object. In Haskell, it’s a family of functors parameterized by the
target object c:
data Const c a = Const c
This type constructor ignores its second argument.

Exercise 8.3.3. Show that (Const c) is a Functor. Hint: The type constructor takes two ar-
guments, but in the Functor instance it’s partially applied to the first argument. It is functorial
in the second argument.

Bifunctors
We have also seen data constructors that take two types as arguments: the product and the
sum. They were functorial as well, but instead of lifting a single function, they lifted a pair
of functions. In category theory, we would define these as functors from the product category
𝒞 × 𝒞 to 𝒞.
Such functors map a pair of objects to an object, and a pair of arrows to an arrow.
In Haskell, we treat such functors as members of a separate class called Bifunctor.
class Bifunctor f where
bimap :: (a -> a') -> (b -> b') -> (f a b -> f a' b')
Again, the compiler deduces that f is a two-argument type constructor because it sees it applied
to two types, e.g., f a b.
To prove to the compiler that a particular type constructor is a Bifunctor, we define an
instance. For example, bifunctoriality of a pair can be defined as:
instance Bifunctor (,) where
bimap g h (a, b) = (g a, h b)
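The sum type gets an analogous instance (it's also in the standard library); it packages the bimap for Either that we wrote in the chapter on function types:
instance Bifunctor Either where
  bimap f _ (Left a) = Left (f a)
  bimap _ g (Right b) = Right (g b)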

Exercise 8.3.4. Show that MoreThanA is a bifunctor.


data MoreThanA a b = More a (Maybe b)

Contravariant functors
Functors from the opposite category 𝒞ᵒᵖ are called contravariant. They have the property of
lifting arrows that go in the opposite direction. Regular functors are sometimes called covariant.
In Haskell, contravariant functors form the typeclass Contravariant:

class Contravariant f where
  contramap :: (b -> a) -> (f a -> f b)
It’s often convenient to think of functors in terms of producers and consumers. In this anal-
ogy, a (covariant) functor is a producer. You can turn a producer of a’s into a producer of b’s
by applying (using fmap) a function a->b. Conversely, to turn a consumer of a’s to a consumer
of b’s you need a function going in the opposite direction, b->a.
Example: A predicate is a function returning True or False:
data Predicate a = Predicate (a -> Bool)
It’s easy to see that it’s a contravariant functor:
instance Contravariant Predicate where
contramap f (Predicate h) = Predicate (h . f)
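For example (a hypothetical predicate), we can turn a predicate on numbers into a predicate on strings by pre-processing the input:
shortString :: Predicate String
shortString = contramap length (Predicate (< 10))  -- accepts strings of fewer than 10 characters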
In practice, the only non-trivial examples of contravariant functors are variations on the
theme of function objects.
One way to tell if a given function type is covariant or contravariant in one of the type
arguments is by assigning polarities to the types used in its definition. We say that the return
type of a function is in a positive position, so it’s covariant; and the argument type is in the
negative position, so it’s contravariant. But if you put the whole function object in the negative
position of another function, then its polarities get reversed.
Consider this data type:
data Tester a = Tester ((a -> Bool) -> Bool)
It has a in a double-negative, and therefore positive, position. This is why it's a covariant Functor.
It acts as a producer of a’s:
instance Functor Tester where
fmap f (Tester g) = Tester g'
where g' h = g (h . f)
Notice that parentheses are important here. A similar function a -> Bool -> Bool has a
in a negative position. That’s because it’s a function of a returning a function (Bool -> Bool).
Equivalently, you may uncurry it to get a function that takes a pair: (a, Bool) -> Bool.
Either way, a ends up in the negative position.

Profunctors
We’ve seen before that the function type is functorial. It lifts two functions at a time, just like
Bifunctor, except that one of the functions goes in the opposite direction.
In category theory this corresponds to a functor from a product of two categories, one of
them being the opposite category: it's a functor from 𝒞ᵒᵖ × 𝒞. Functors from 𝒞ᵒᵖ × 𝒞 to 𝐒𝐞𝐭 are
called profunctors.
In Haskell, profunctors form a typeclass:
class Profunctor f where
dimap :: (a' -> a) -> (b -> b') -> (f a b -> f a' b')
You may think of a profunctor as a type that’s simultaneously a producer and a consumer.
It consumes one type and produces another.
The function type, which can be written as an infix operator (->), is an instance of Profunctor:

instance Profunctor (->) where
  dimap f g h = g . h . f
This is in accordance with our intuition that a function a->b consumes arguments of the type a
and produces results of the type b.
In programming, all non-trivial profunctors are variations on the function type.

8.4 The Hom-Functor


Arrows between any two objects form a set. This set is called a hom-set and is usually written
using the name of the category followed by the names of the objects:

𝒞(𝑎, 𝑏)

We can interpret the hom-set 𝒞(𝑎, 𝑏) as all the ways 𝑏 can be observed from 𝑎.
Another way of looking at hom-sets is to say that they define a mapping that assigns a set
𝒞(𝑎, 𝑏) to every pair of objects. Sets themselves are objects in the category 𝐒𝐞𝐭. So we have a
mapping between categories.
This mapping is functorial. To see that, let’s consider what happens when we transform the
two objects 𝑎 and 𝑏. We are interested in a transformation that would map the set 𝒞(𝑎, 𝑏) to
the set 𝒞(𝑎′, 𝑏′). Arrows in 𝐒𝐞𝐭 are regular functions, so it's enough to define their action on
individual elements of a set.
An element of 𝒞(𝑎, 𝑏) is an arrow ℎ ∶ 𝑎 → 𝑏 and an element of 𝒞(𝑎′, 𝑏′) is an arrow
ℎ′ ∶ 𝑎′ → 𝑏′. We know how to transform one into another: we need to pre-compose ℎ with an
arrow 𝑔′ ∶ 𝑎′ → 𝑎 and post-compose it with an arrow 𝑔 ∶ 𝑏 → 𝑏′.
In other words, the mapping that takes a pair ⟨𝑎, 𝑏⟩ to the set 𝒞(𝑎, 𝑏) is a profunctor:

𝒞ᵒᵖ × 𝒞 → 𝐒𝐞𝐭
 𝑜𝑝 ×  → 𝐒𝐞𝐭

Frequently, we are interested in varying only one of the objects, keeping the other fixed.
When we fix the source object and vary the target, the result is a functor that is written as:

𝒞(𝑎, −) ∶ 𝒞 → 𝐒𝐞𝐭

The action of this functor on an arrow 𝑔 ∶ 𝑏 → 𝑏′ is written as:

𝒞(𝑎, 𝑔) ∶ 𝒞(𝑎, 𝑏) → 𝒞(𝑎, 𝑏′)

and is given by post-composition:

𝒞(𝑎, 𝑔) = (𝑔◦−)

Varying 𝑏 means switching focus from one object to another, so the complete functor 𝒞(𝑎, −)
combines all the arrows emanating from 𝑎 into a coherent view of the category from the per-
spective of 𝑎. It is "the world according to 𝑎."
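In Haskell, the covariant hom-functor 𝒞(𝑎, −) has a concrete counterpart: the reader functor ((->) a), whose fmap is exactly post-composition. Its instance (already provided by the standard library) could be written as:
instance Functor ((->) a) where
  fmap g h = g . h  -- lifting g :: b -> b' post-composes it with h :: a -> b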
Conversely, when we fix the target and vary the source of the hom-functor, we get a con-
travariant functor:

𝒞(−, 𝑏) ∶ 𝒞ᵒᵖ → 𝐒𝐞𝐭

whose action on an arrow 𝑔′ ∶ 𝑎′ → 𝑎 is written as:

𝒞(𝑔′, 𝑏) ∶ 𝒞(𝑎, 𝑏) → 𝒞(𝑎′, 𝑏)

and is given by pre-composition:

𝒞(𝑔′, 𝑏) = (−◦𝑔′)

The functor 𝒞(−, 𝑏) organizes all the arrows pointing at 𝑏 into one coherent view. It is the
picture of 𝑏 "as it's seen by the world."
We can now reformulate the results from the chapter on isomorphisms. If two objects 𝑎 and
𝑏 are isomorphic, then their hom-sets are also isomorphic. In particular:

𝒞(𝑎, 𝑥) ≅ 𝒞(𝑏, 𝑥)

and

𝒞(𝑥, 𝑎) ≅ 𝒞(𝑥, 𝑏)
We’ll discuss naturality conditions in the next chapter.
Another way of looking at the hom-functor 𝒞(𝑎, −) is as an oracle that provides answers to
the question: "Is 𝑎 connected to me?" If the set 𝒞(𝑎, 𝑥) is empty, the answer is negative: "𝑎 is
not connected to 𝑥." Otherwise, every element of the set 𝒞(𝑎, 𝑥) is a proof that such a connection
exists.
Conversely, the contravariant functor 𝒞(−, 𝑎) answers the question: "Am I connected to 𝑎?"
Taken together, the profunctor 𝒞(𝑥, 𝑦) establishes a proof-relevant relation between objects.
Every element of the set 𝒞(𝑥, 𝑦) is a proof that 𝑥 is connected to 𝑦. If the set is empty, the two
objects are unrelated.

8.5 Functor Composition


Just like we can compose functions, we can compose functors. Two functors are composable if
the target category of one is the source category of the other.
On objects, functor composition of 𝐺 after 𝐹 first applies 𝐹 to an object, then applies 𝐺 to
the result; and similarly on arrows.
Obviously, you can only compose composable functors. However, all endofunctors are com-
posable, since their target category is the same as the source category.
In Haskell, a functor is a parameterized data type, so the composition of two functors is
again a parameterized data type. On objects, we define:
data Compose g f a = Compose (g (f a))
The compiler figures out that f and g must be type constructors because they are applied to
types: f is applied to the type parameter a, and g is applied to the resulting type.
Alternatively, you can tell the compiler that the first two arguments to Compose are type
constructors. You do this by providing a kind signature, which requires a language extension
KindSignatures that you put at the top of the source file:
{-# language KindSignatures #-}
You should also import the Data.Kind library that defines Type:
import Data.Kind
A kind signature is just like a type signature, except that it can be used to describe functions
operating on types.
Regular types have the kind Type. Type constructors have the kind Type -> Type, since
they map types to types.

Compose takes two type constructors and produces a type constructor, so its kind signature
is:
(Type -> Type) -> (Type -> Type) -> (Type -> Type)
and the full definition is:
data Compose :: (Type -> Type) -> (Type -> Type) -> (Type -> Type)
where
Compose :: (g (f a)) -> Compose g f a
Any two type constructors can be composed this way. There is no requirement, at this point,
that they be functors.
However, if we want to lift a function using the composition of type constructors, g af-
ter f, then they must be functors. This requirement is encoded as a constraint in the instance
declaration:
instance (Functor g, Functor f) => Functor (Compose g f) where
fmap h (Compose gfa) = Compose (fmap (fmap h) gfa)
The constraint (Functor g, Functor f) expresses the condition that both type constructors
be instances of the Functor class. The constraints are followed by a double arrow.
The type constructor whose functoriality we are establishing is Compose f g, which is a
partial application of Compose to two functors.
In the implementation of fmap, we pattern match on the data constructor Compose. Its
argument gfa is of the type g (f a). We use one fmap to “get under” g. Then we use (fmap h)
to get under f. The compiler knows which fmap to use by analyzing the types.
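For example, composing the list functor with Maybe gives a list of optional values; a single fmap then transforms every value that is present:
bumpAll :: Compose [] Maybe Int
bumpAll = fmap (+1) (Compose [Just 1, Nothing, Just 3])
-- Compose [Just 2, Nothing, Just 4]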
You may visualize a composite functor as a container of containers. For instance, the com-
position of [] with Maybe is a list of optional values.

Exercise 8.5.1. Define a composition of a Functor after Contravariant. Hint: You can reuse
Compose, but you have to provide a different instance declaration.

Category of categories
We can view functors as arrows between categories. As we’ve just seen, functors are composable
and it’s easy to check that this composition is associative. We also have an identity (endo-)
functor for every category. So categories themselves seem to form a category, let’s call it 𝐂𝐚𝐭.
And this is where mathematicians start worrying about “size” issues. It’s a shorthand for
saying that there are paradoxes lurking around. So the correct incantation is that 𝐂𝐚𝐭 is a cat-
egory of small categories. But as long as we are not engaged in proofs of existence, we can
ignore size problems.
Chapter 9

Natural Transformations

We’ve seen that, when two objects 𝑎 and 𝑏 are isomorphic, they generate bijections between sets
of arrows, which we can now express as isomorphisms between hom-sets. For all 𝑥, we have:

𝒞(𝑎, 𝑥) ≅ 𝒞(𝑏, 𝑥)
𝒞(𝑥, 𝑎) ≅ 𝒞(𝑥, 𝑏)

The converse is not true, though. An isomorphism between hom-sets does not result in an
isomorphism between objects unless additional naturality conditions are satisfied. We'll now
re-formulate these naturality conditions in progressively more general settings.

9.1 Natural Transformations Between Hom-Functors


One way an isomorphism between two objects can be established is by directly providing two
arrows—one the inverse of the other. But quite often it’s easier to do it indirectly, by defining
bijections between arrows, either the ones impinging on the two objects, or the ones emanating
from the two objects.
For instance, as we’ve seen before, we may have, for every 𝑥, an invertible mapping of
arrows 𝛼𝑥 .
            𝑥
          /   \
         v     v
         𝑎     𝑏          𝛼𝑥 ∶ (𝑥 → 𝑎) ⟼ (𝑥 → 𝑏)
In other words, for every 𝑥, there is a mapping of hom-sets:

𝛼𝑥 ∶ 𝒞(𝑥, 𝑎) → 𝒞(𝑥, 𝑏)

When we vary 𝑥, the two hom-sets become two (contravariant) functors, 𝒞(−, 𝑎) and 𝒞(−, 𝑏),
and 𝛼 can be seen as a mapping between them. Such a mapping of functors, called a transfor-
mation, is really a family of individual mappings 𝛼𝑥, one for each object 𝑥 in the category 𝒞.
The functor 𝒞(−, 𝑎) describes the way the world sees 𝑎, and the functor 𝒞(−, 𝑏) describes
the way the world sees 𝑏.
The transformation 𝛼 switches back and forth between these two views. Every component
of 𝛼, the bijection 𝛼𝑥, shows that the view of 𝑎 from 𝑥 is isomorphic to the view of 𝑏 from 𝑥.


The naturality condition we discussed before was the condition:

𝛼𝑦 ◦(−◦𝑔) = (−◦𝑔)◦𝛼𝑥
It relates components of 𝛼 taken at different objects. In other words, it relates the views from
two different observers 𝑥 and 𝑦, who are connected by an arrow 𝑔 ∶ 𝑦 → 𝑥.
Both sides of this equation are acting on the hom-set 𝒞(𝑥, 𝑎). The result is in the hom-set
𝒞(𝑦, 𝑏). We can rewrite the two sides as:

𝒞(𝑥, 𝑎) --(−◦𝑔)--> 𝒞(𝑦, 𝑎) --𝛼𝑦--> 𝒞(𝑦, 𝑏)
𝒞(𝑥, 𝑎) --𝛼𝑥--> 𝒞(𝑥, 𝑏) --(−◦𝑔)--> 𝒞(𝑦, 𝑏)

Precomposition with 𝑔 ∶ 𝑦 → 𝑥 is also a mapping of hom-sets. In fact it is the lifting of 𝑔
by the contravariant hom-functor. We can write it as 𝒞(𝑔, 𝑎) and 𝒞(𝑔, 𝑏), respectively.

𝒞(𝑥, 𝑎) --𝒞(𝑔,𝑎)--> 𝒞(𝑦, 𝑎) --𝛼𝑦--> 𝒞(𝑦, 𝑏)
𝒞(𝑥, 𝑎) --𝛼𝑥--> 𝒞(𝑥, 𝑏) --𝒞(𝑔,𝑏)--> 𝒞(𝑦, 𝑏)

The naturality condition can therefore be rewritten as:

𝛼𝑦◦𝒞(𝑔, 𝑎) = 𝒞(𝑔, 𝑏)◦𝛼𝑥

It can be illustrated by this commuting diagram:

              𝒞(𝑔,𝑎)
   𝒞(𝑥, 𝑎) ----------> 𝒞(𝑦, 𝑎)
      |                    |
   𝛼𝑥 |                    | 𝛼𝑦
      v                    v
   𝒞(𝑥, 𝑏) ----------> 𝒞(𝑦, 𝑏)
              𝒞(𝑔,𝑏)
We can now re-formulate our previous result: An invertible transformation 𝛼 between the
functors 𝒞(−, 𝑎) and 𝒞(−, 𝑏) that satisfies the naturality condition is equivalent to an isomor-
phism between 𝑎 and 𝑏.
We can follow exactly the same reasoning for the outgoing arrows. This time we start with
a transformation 𝛽 whose components are:
𝛽𝑥 ∶ 𝒞(𝑎, 𝑥) → 𝒞(𝑏, 𝑥)
The two (covariant) functors 𝒞(𝑎, −) and 𝒞(𝑏, −) describe the view of the world from the per-
spective of 𝑎 and 𝑏, respectively. The invertible transformation 𝛽 tells us that these two views
are equivalent, and the naturality condition
(𝑔◦−)◦𝛽𝑥 = 𝛽𝑦 ◦(𝑔◦−)
tells us that they behave nicely when we switch focus.
Here’s the commuting diagram that illustrates the naturality condition:
(𝑎,𝑔)
(𝑎, 𝑥) (𝑎, 𝑦)
𝛽𝑥 𝛽𝑦
(𝑏,𝑔)
(𝑏, 𝑥) (𝑏, 𝑦)
Again, such an invertible natural transformation 𝛽 establishes the isomorphism between 𝑎
and 𝑏.

9.2 Natural Transformation Between Functors


The two hom-functors from the previous section were

𝐹𝑥 = 𝒞(𝑎, 𝑥)
𝐺𝑥 = 𝒞(𝑏, 𝑥)

They both map the category 𝒞 to 𝐒𝐞𝐭, because that's where the hom-sets live. We can say that
they create two different models of 𝒞 inside 𝐒𝐞𝐭.
A natural transformation is a structure-preserving mapping between two such models.

             𝒞(𝑎,−)
       𝑥 ------------> 𝒞(𝑎, 𝑥)
         \                 |
           \               | 𝛽𝑥
             \             v
               -------->  𝒞(𝑏, 𝑥)
             𝒞(𝑏,−)

This idea naturally extends to functors between any pair of categories. Any two functors

𝐹∶  →
𝐺∶  → 

may be seen as two different models of  inside .


To transform one model into another we connect the corresponding dots using arrows in 𝒟.
For every object 𝑥 in 𝒞 we pick an arrow that goes from 𝐹𝑥 to 𝐺𝑥:

𝛼𝑥 ∶ 𝐹 𝑥 → 𝐺𝑥

A natural transformation thus maps objects to arrows.

             𝐹
       𝑥 --------> 𝐹𝑥
         \           |
           \         | 𝛼𝑥
             \       v
               ---> 𝐺𝑥
             𝐺

The structure of a model, though, has as much to do with objects as it does with arrows, so
let’s see what happens to arrows. For every arrow 𝑓 ∶ 𝑥 → 𝑦 in , we have two corresponding
arrows in :

𝐹𝑓 ∶ 𝐹𝑥 → 𝐹𝑦
𝐺𝑓 ∶ 𝐺𝑥 → 𝐺𝑦

These are the two liftings of 𝑓 . We can use them to move within the bounds of each of the two
models. Then there are the components of 𝛼 which let us switch between models.
Naturality says that it shouldn’t matter whether you first move inside the first model and
then jump to the second one, or first jump to the second one and then move within it. This is
illustrated by the commuting naturality square:

            𝐹𝑓
   𝐹𝑥 ----------> 𝐹𝑦
    |              |
 𝛼𝑥 |              | 𝛼𝑦
    v              v
   𝐺𝑥 ----------> 𝐺𝑦
            𝐺𝑓
Such a family of arrows 𝛼𝑥 that satisfies the naturality condition is called a natural transfor-
mation.
This is a diagram that shows a pair of categories, two functors between them, and a natural
transformation 𝛼 between the functors:

          𝐹
       -------->
   𝒞      ⇓ 𝛼      𝒟
       -------->
          𝐺

Since for every arrow in 𝒞 there is a corresponding naturality square, we can say that a
natural transformation maps objects to arrows, and arrows to commuting squares.
If every component 𝛼𝑥 of a natural transformation is an isomorphism, 𝛼 is called a natural
isomorphism.
We can now restate the main result about isomorphisms: Two objects are isomorphic if and
only if there is a natural isomorphism between their hom-functors (either the covariant, or the
contravariant ones—either one will do).
Natural transformations provide a very convenient high-level way of expressing commuting
conditions in a variety of situations. We’ll use them in this capacity to reformulate the definitions
of algebraic data types.

9.3 Natural Transformations in Programming


A natural transformation is a family of arrows parameterized by objects. In programming, this
corresponds to a family of functions parameterized by types, that is a polymorphic function.
The type of the argument to a natural transformation is described using one functor, and the
return type using another.
In Haskell, we can define a data type that accepts two type constructors representing two
functors, and produces a type of natural transformations:
data Natural :: (Type -> Type) -> (Type -> Type) -> Type where
Natural :: (forall a. f a -> g a) -> Natural f g
The forall quantifier tells the compiler that the function is polymorphic—that is, it’s defined
for every type a. As long as f and g are functors, this formula defines a natural transformation.
The types defined by forall are very special, though. They are polymorphic in the sense of
parametric polymorphism. It means that a single formula is used for all types. We’ve seen the
example of the identity function, which can be written as:
id :: forall a. a -> a
id x = x
The body of this function is very simple, just the variable x. It doesn’t matter what type x is, the
formula remains the same.
9.3. NATURAL TRANSFORMATIONS IN PROGRAMMING 79

This is in contrast to ad-hoc polymorphism. An ad-hoc polymorphic function may use
different implementations for different types. An example of such a function is fmap, the member
function of the Functor typeclass. There is one implementation of fmap for lists, a different
one for Maybe, and so on, case by case.
The standard definition of a (parametric) natural transformation in Haskell uses a type
synonym:
type Natural f g = forall a. f a -> g a
A type declaration introduces an alias, a shorthand, for the right-hand side.
It turns out that limiting the type of a natural transformation to adhere to parametric
polymorphism has far-reaching consequences. Such a function automatically satisfies the
naturality condition. It’s an example of parametricity producing so-called theorems for free.
We can’t express equalities of arrows in Haskell, but we can use naturality to transform
programs. In particular, if alpha is a natural transformation, we can replace:
fmap h . alpha
with:
alpha . fmap h
Here, the compiler will automatically figure out what versions of fmap and which components
of alpha to use.
We can also use more advanced language options to make the choices explicit. We can
express naturality using a pair of functions:
oneWay ::
forall f g a b. (Functor f, Functor g) =>
Natural f g -> (a -> b) -> f a -> g b
oneWay alpha h = fmap @g h . alpha @a

otherWay ::
forall f g a b. (Functor f, Functor g) =>
Natural f g -> (a -> b) -> f a -> g b
otherWay alpha h = alpha @b . fmap @f h
The annotations @a and @b specify the components of the parametrically polymorphic function
alpha, and the annotations @f and @g specify the functors for which the ad-hoc polymorphic
fmap is instantiated.
The following Haskell extensions have to be specified at the top of the file:
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE ScopedTypeVariables #-}
Here’s an example of a useful function that is a natural transformation between the list
functor and the Maybe functor:
safeHead :: Natural [] Maybe
safeHead [] = Nothing
safeHead (a : as) = Just a
(The standard library head function is “unsafe” in that it faults when given an empty list.)
Another example is the function reverse, which reverses a list. It’s a natural transformation
from the list functor to the list functor:
reverse :: Natural [] []
reverse [] = []
reverse (a : as) = reverse as ++ [a]

Incidentally, this is a very inefficient implementation. The actual library function uses an
optimized algorithm.
A useful intuition for understanding natural transformations builds on the idea that functors
act like containers for data. There are two completely orthogonal things that you can do with a
container: You can transform the data it contains, without changing the shape of the container.
This is what fmap does. Or you can transfer the data, without modifying it, to another
container. This is what a natural transformation does: It’s a procedure for moving “stuff” between
containers without knowing what kind of “stuff” it is.
In other words, a natural transformation repackages the contents of one container into an-
other container. It does it in a way that is agnostic of the type of the contents, which means it
cannot inspect, create, or modify the contents. All it can do is to move it to a new location, or
drop it.
The naturality condition enforces the orthogonality of these two operations. It doesn’t matter if
you first modify the data and then move it to another container; or first move it, and then modify it.
This is another example of successfully decomposing a complex problem into a sequence
of simpler ones. Keep in mind, though, that not every operation with containers of data can be
decomposed in that way. Filtering, for instance, requires both examining the data, as well as
changing the size or even the shape of the container.
On the other hand, almost every parametrically polymorphic function is a natural
transformation. In some cases you may have to consider the identity or the constant functor as either
the source or the target. For instance, the polymorphic identity function can be thought of as a natural
transformation between two identity functors.

Vertical composition of natural transformations

Natural transformations can only be defined between parallel functors, that is functors that share
the same source category and the same target category. Such parallel functors form a functor
category. The standard notation for a functor category between two categories 𝒞 and 𝒟 is [𝒞, 𝒟].
You just put the names of the two categories between square brackets.
The objects in [𝒞, 𝒟] are functors, the arrows are natural transformations.
To show that this is indeed a category, we have to define the composition of natural
transformations. This is easy if we keep in mind that components of natural transformations are regular
arrows in the target category. These arrows compose.
Indeed, suppose that we have a natural transformation 𝛼 between two functors 𝐹 and 𝐺. We
want to compose it with another natural transformation 𝛽 that goes from 𝐺 to 𝐻.
        𝐹
     ───────▶
       ⇓𝛼
  𝒞  ───𝐺──▶  𝒟
       ⇓𝛽
     ───────▶
        𝐻
Let’s look at the components of these transformations at some object 𝑥

𝛼𝑥 ∶ 𝐹 𝑥 → 𝐺 𝑥

𝛽𝑥 ∶ 𝐺 𝑥 → 𝐻 𝑥

These are just two arrows in 𝒟 that are composable. So we can define a composite natural
transformation 𝛾 as follows:
𝛾∶ 𝐹 → 𝐻
𝛾𝑥 = 𝛽𝑥 ◦𝛼𝑥
This is called the vertical composition of natural transformations. You’ll see it written using a
dot 𝛾 = 𝛽 ⋅ 𝛼 or a simple juxtaposition 𝛾 = 𝛽𝛼.
The naturality condition for 𝛾 can be shown by pasting together (vertically) the two naturality
squares for 𝛼 and 𝛽:

          𝐹𝑓
   𝐹𝑥 ─────────▶ 𝐹𝑦
    │              │
    𝛼𝑥             𝛼𝑦
    ▼     𝐺𝑓      ▼
   𝐺𝑥 ─────────▶ 𝐺𝑦
    │              │
    𝛽𝑥             𝛽𝑦
    ▼     𝐻𝑓      ▼
   𝐻𝑥 ─────────▶ 𝐻𝑦

(The outer vertical edges compose to 𝛾𝑥 = 𝛽𝑥◦𝛼𝑥 and 𝛾𝑦 = 𝛽𝑦◦𝛼𝑦.)
In Haskell, vertical composition of natural transformations is just regular function composition
applied to polymorphic functions. Using the intuition that natural transformations move
items between containers, vertical composition combines two such moves, one after another.
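As a minimal sketch (using the Natural type synonym from above and the safeHead function
defined earlier; vertCompose and safeLast are names made up here):

vertCompose :: Natural g h -> Natural f g -> Natural f h
vertCompose beta alpha = \fa -> beta (alpha fa)

-- Example: reverse :: Natural [] [] followed by safeHead :: Natural [] Maybe
-- yields a natural transformation that returns the last element, if any.
safeLast :: Natural [] Maybe
safeLast = vertCompose safeHead reverse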

Functor categories
Since the composition of natural transformations is defined in terms of composition of arrows,
it is automatically associative.
There is also an identity natural transformation 𝑖𝑑𝐹 defined for every functor 𝐹 . Its compo-
nent at 𝑥 is the usual identity arrow at the object 𝐹 𝑥:

(𝑖𝑑𝐹 )𝑥 = 𝑖𝑑𝐹 𝑥

To summarize, for every pair of categories 𝒞 and 𝒟 there is a category of functors [𝒞, 𝒟]
with natural transformations as arrows.
The hom-set in that category is the set of natural transformations between two functors 𝐹
and 𝐺. Following the standard notational convention, we write it as:

[𝒞, 𝒟](𝐹, 𝐺)
with the name of the category followed by the names of the two objects (here, functors) in
parentheses.
In category theory objects and arrows are drawn differently. Objects are dots and arrows are
pointy lines.
In 𝐂𝐚𝐭, the category of categories, functors are drawn as arrows. But in a functor category
[𝒞, 𝒟] functors are dots and natural transformations are arrows.
What is an arrow in one category could be an object in another.

Exercise 9.3.1. Prove the naturality condition of the composition of natural transformations:

𝛾𝑦 ◦𝐹 𝑓 = 𝐻𝑓 ◦𝛾𝑥

Hint: Use the definition of 𝛾 and the two naturality conditions for 𝛼 and 𝛽.

Horizontal composition of natural transformations


The second kind of composition of natural transformations is induced by the composition of
functors. Suppose that we have a pair of composable functors

𝐹 ∶ 𝒞 → 𝒟        𝐺 ∶ 𝒟 → ℰ

and, in parallel, another pair of composable functors:

𝐹′ ∶ 𝒞 → 𝒟        𝐺′ ∶ 𝒟 → ℰ

We also have two natural transformations:

𝛼 ∶ 𝐹 → 𝐹′        𝛽 ∶ 𝐺 → 𝐺′

Pictorially:

        𝐹                𝐺
     ───────▶         ───────▶
  𝒞    ⇓𝛼     𝒟        ⇓𝛽      ℰ
     ───────▶         ───────▶
        𝐹′               𝐺′

The horizontal composition 𝛽◦𝛼 maps 𝐺◦𝐹 to 𝐺′◦𝐹′.

         𝐺◦𝐹
      ─────────▶
  𝒞     ⇓𝛽◦𝛼      ℰ
      ─────────▶
         𝐺′◦𝐹′

Let’s pick an object 𝑥 in 𝒞 and try to define the component of the composite (𝛽◦𝛼) at 𝑥. It
should be a morphism in ℰ:
(𝛽◦𝛼)𝑥 ∶ 𝐺(𝐹 𝑥) → 𝐺′(𝐹′𝑥)
We can use 𝛼 to map 𝑥 to an arrow

𝛼𝑥 ∶ 𝐹 𝑥 → 𝐹 ′ 𝑥
We can lift this arrow using 𝐺

𝐺(𝛼𝑥 ) ∶ 𝐺(𝐹 𝑥) → 𝐺(𝐹 ′ 𝑥)

To get from there to 𝐺′ (𝐹 ′ 𝑥) we can use the appropriate component of 𝛽

𝛽𝐹 ′ 𝑥 ∶ 𝐺(𝐹 ′ 𝑥) → 𝐺′ (𝐹 ′ 𝑥)

Altogether, we have
(𝛽◦𝛼)𝑥 = 𝛽𝐹 ′ 𝑥 ◦𝐺(𝛼𝑥 )
But there is another equally plausible candidate:

(𝛽◦𝛼)𝑥 = 𝐺′ (𝛼𝑥 )◦𝛽𝐹 𝑥

Fortunately, they are equal due to naturality of 𝛽.

               𝐺(𝛼𝑥)
   𝐺(𝐹 𝑥) ──────────▶ 𝐺(𝐹′𝑥)
      │                   │
      𝛽𝐹 𝑥                𝛽𝐹′𝑥
      ▼                   ▼
   𝐺′(𝐹 𝑥) ─────────▶ 𝐺′(𝐹′𝑥)
               𝐺′(𝛼𝑥)

The proof of naturality of 𝛽◦𝛼 is left as an exercise to a dedicated reader.


We can translate this directly to Haskell. We start with two natural transformations:
alpha :: forall x. F x -> F' x
beta :: forall x. G x -> G' x
Their horizontal composition has the following type signature:
beta_alpha :: forall x. G (F x) -> G' (F' x)
It has two equivalent implementations. The first one is:
beta_alpha = beta . fmap alpha
The compiler will automatically pick the correct version of fmap, the one for the functor G. The
second implementation is:
beta_alpha = fmap alpha . beta
Here, the compiler will pick the version of fmap for the functor G'.
What’s the intuition for horizontal composition? We’ve seen before that a natural transformation
can be seen as repackaging data between two containers (functors). Here we are dealing
with nested containers. We start with the outer container described by G that is filled with inner
containers, each described by F. We have two natural transformations: alpha for transferring the
contents of F to F', and beta for moving the contents of G to G'. There are two ways of moving
data from G (F x) to G' (F' x). We can use fmap alpha to repackage all inner containers,
and then use beta to repackage the outer container.

Or we can first use beta to repackage the outer container, and then apply fmap alpha to
repackage all the inner containers. The end result is the same.
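For a concrete sketch (the names here are made up, and safeHead is the function defined
earlier), take alpha = maybeToList, going from the Maybe functor to the list functor, and
beta = safeHead, going from the list functor to Maybe. Their horizontal composition moves
data from [Maybe x] to Maybe [x], and both orders agree:

import Data.Maybe (maybeToList)

beta_alpha' :: [Maybe x] -> Maybe [x]
beta_alpha' = safeHead . fmap maybeToList
-- equivalently: fmap maybeToList . safeHead
-- e.g. beta_alpha' [Just 1, Nothing] == Just [1]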


Exercise 9.3.2. Implement two versions of horizontal composition of safeHead after reverse.
Compare their results acting on various arguments.

Exercise 9.3.3. Do the same with the horizontal composition of reverse after safeHead.

Whiskering
Quite often horizontal composition is used with one of the natural transformations being the
identity. There is a shorthand notation for such composition. For instance, 𝛼◦𝑖𝑑𝐹 is written as
𝛼◦𝐹 .
Because of the characteristic shape of the diagram, such composition is called “whiskering”.


                  𝐺
               ───────▶
  𝒞 ──𝐹──▶ 𝒟     ⇓𝛼     ℰ
               ───────▶
                  𝐺′

In components, we have:
(𝛼◦𝐹 )𝑥 = 𝛼𝐹 𝑥
Let’s consider how we would translate this to Haskell. A natural transformation is a
polymorphic function. Because of parametricity, it’s defined by the same formula for all types. So
whiskering on the right doesn’t change the formula; it changes the function’s signature.
For instance, if this is the declaration of alpha:
alpha :: forall x. G x -> G' x
then its whiskered version would be:
alpha_f :: forall x. G (F x) -> G' (F x)
alpha_f = alpha
Because of Haskell’s type inference, this shift is implicit. When a polymorphic function is
called, we don’t have to specify which component of the natural transformation is executed; the
type checker figures it out by looking at the type of the argument.
The intuition in this case is that we are repackaging the outer container leaving the inner
containers intact.
Similarly, 𝑖𝑑𝐻 ◦𝛼 is written as 𝐻◦𝛼.

        𝐺
     ───────▶
  𝒞    ⇓𝛼     𝒟 ──𝐻──▶ ℰ
     ───────▶
        𝐺′

In components:
(𝐻◦𝛼)𝑥 = 𝐻(𝛼𝑥 )

In Haskell, the lifting of 𝛼𝑥 by 𝐻 is done using fmap, so given:


alpha :: forall x. G x -> G' x
the whiskered version would be:
h_alpha :: forall x. H (G x) -> H (G' x)
h_alpha = fmap alpha
Again, Haskell’s type inference engine figures out which version of fmap to use (here, it’s the
one from the Functor instance of G).
The intuition is that we are repackaging the contents of the inner containers leaving the outer
container intact.
Finally, in many applications a natural transformation is whiskered on both sides:


                  𝐺
               ───────▶
  ℬ ──𝐹──▶ 𝒞     ⇓𝛼     𝒟 ──𝐻──▶ ℰ
               ───────▶
                  𝐺′

In components, we have:
(𝐻◦𝛼◦𝐹 )𝑥 = 𝐻(𝛼𝐹 𝑥 )

and in Haskell:
h_alpha_f :: forall x. H (G (F x)) -> H (G' (F x))
h_alpha_f = fmap alpha
Here the intuition is that we have a triple layer of containers; and we are rearranging the
middle one, leaving the outer container and all the inner containers intact.
86 CHAPTER 9. NATURAL TRANSFORMATIONS

Interchange law
We can combine vertical composition with horizontal composition, as seen in the following
diagram:
        𝐹                𝐹′
     ───────▶         ───────▶
       ⇓𝛼                ⇓𝛼′
  𝒞  ───𝐺──▶   𝒟      ───𝐺′──▶   ℰ
       ⇓𝛽                ⇓𝛽′
     ───────▶         ───────▶
        𝐻                𝐻′
The interchange law states that the order of composition doesn’t matter: we can first do the vertical
compositions and then the horizontal one, or first do the horizontal compositions and then the
vertical one:

(𝛽′ ⋅ 𝛼′)◦(𝛽 ⋅ 𝛼) = (𝛽′◦𝛽) ⋅ (𝛼′◦𝛼)
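As a quick sanity check in Haskell (a minimal sketch, not from the text: take 𝛼 = safeHead and
𝛽 = maybeToList on one side, and the same pair again on the other), composing vertically first
and composing horizontally first give the same function:

import Data.Maybe (maybeToList)

safeHead :: [a] -> Maybe a
safeHead []      = Nothing
safeHead (a : _) = Just a

-- Vertical composites first, then their horizontal composition:
lhs :: [[a]] -> [[a]]
lhs = (maybeToList . safeHead) . fmap (maybeToList . safeHead)

-- Horizontal composites first, then their vertical composition:
rhs :: [[a]] -> [[a]]
rhs = (maybeToList . fmap maybeToList) . (safeHead . fmap safeHead)

-- Both map [[1,2],[3]] to [[1]], and [] to [].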

9.4 Universal Constructions Revisited


Lao Tzu says, the simplest pattern is the clearest.
We’ve seen definitions of sums, products, exponentials, natural numbers, and lists.
The old-school approach to defining such data types is to explore their internals. It’s the
set-theory way: we look at how the elements of new sets are constructed from the elements of
old sets. An element of a sum is either an element of the first set, or the second set. An element
of a product is a pair of elements. And so on. We are looking at objects from the engineering
point of view.
In category theory we take the opposite approach. We are not interested in what’s inside
the object or how it’s implemented. We are interested in the purpose of the object, how it can
be used, and how it interacts with other objects. We are looking at objects from the utilitarian
point of view.
Both approaches have their advantages. The categorical approach came later, because you
need to study a lot of examples before clear patterns emerge. But once you see the patterns, you
discover unexpected connections between things, like the duality between sums and products.
Defining particular objects through their connections requires looking at a possibly infinite
number of objects with which they interact.
“Tell me your relation to the Universe, and I’ll tell you who you are.”
Defining an object by its mappings-out or mappings-in with respect to all objects in the
category is called a universal construction.
Why are natural transformations so important? It’s because most categorical constructions
involve commuting diagrams. If we can re-cast these diagrams as naturality squares, we move
one level up the abstraction ladder and gain new valuable insights.
Being able to compress a lot of facts into small elegant formulas helps us see new patterns.
We’ll see, for instance, that natural isomorphisms between hom-sets pop up all over category
theory and eventually lead to the idea of an adjunction.
But first we’ll study several examples in greater detail to get some understanding of the terse
language of category theory. We’ll try, for instance, to decode the statement that the sum, or the
coproduct of two objects, is defined by the following natural isomorphism:
[𝟐, 𝒞](𝐷, Δ𝑥) ≅ 𝒞(𝑎 + 𝑏, 𝑥)

Picking objects
Even such a simple task as pointing at an object has a special interpretation in category theory.
We have already seen that pointing at an element of a set is equivalent to selecting a function
from the singleton set to it. Similarly, picking an object in a category is equivalent to selecting
a functor from the single-object category. Alternatively, it can be done using a constant functor
from another category.
Quite often we want to pick a pair of objects. That, too, can be accomplished by selecting
a functor from a two-object stick-figure category. Similarly, picking an arrow is equivalent to
selecting a functor from the “walking arrow” category, and so on.
By judiciously selecting our functors and natural transformations between them, we can
reformulate all the universal constructions we’ve seen so far.

Cospans as natural transformations


The definition of a sum requires the selection of two objects to be summed; and a third one to
serve as the target of the mapping out.

   𝑎                 𝑏
     ╲ Left   Right ╱
      ▼            ▼
         𝑎 + 𝑏
           │ ℎ
   𝑓       ▼       𝑔
           𝑐

(𝑓 ∶ 𝑎 → 𝑐 and 𝑔 ∶ 𝑏 → 𝑐 run along the outside, factoring through ℎ.)
This diagram can be further decomposed into two simpler shapes called cospans:

   𝑎         𝑏
     ╲       ╱
      ▼     ▼
        𝑐

To construct a cospan we first have to pick a pair of objects. To do that we’ll start with a
two-object category 𝟐. We’ll call its objects 1 and 2. We’ll use a functor

𝐷 ∶ 𝟐 → 𝒞

to select the objects 𝑎 and 𝑏:

𝐷1 = 𝑎
𝐷2 = 𝑏

(𝐷 stands for “diagram”, since the two objects form a very simple diagram consisting of two
dots in 𝒞.)
We’ll use the constant functor

Δ𝑥 ∶ 𝟐 → 𝒞

to select the object 𝑥. This functor maps both 1 and 2 to 𝑥 (and the two identity arrows to 𝑖𝑑𝑥).
Since both functors go from 𝟐 to 𝒞, we can define a natural transformation 𝛼 between them.
In this case, it’s just a pair of arrows:

𝛼1 ∶ 𝐷 1 → Δ𝑥 1
𝛼2 ∶ 𝐷 2 → Δ𝑥 2

These are exactly the two arrows in the cospan.


The naturality condition for 𝛼 is trivial, since there are no arrows (other than identities) in 𝟐.
There may be many cospans sharing the same three objects—meaning: there may be many
natural transformations between the two functors 𝐷 and Δ𝑥. These natural transformations form
a hom-set in the functor category [𝟐, 𝒞], namely:

[𝟐, 𝒞](𝐷, Δ𝑥)

Functoriality of cospans
Let’s consider what happens when we start varying the object 𝑥 in a cospan. We have a mapping
𝐹 that takes 𝑥 to the set of cospans over 𝑥:

𝐹 𝑥 = [𝟐, 𝒞](𝐷, Δ𝑥)

This mapping turns out to be functorial in 𝑥.


To see that, consider an arrow 𝑚 ∶ 𝑥 → 𝑦. The lifting of this arrow is a mapping between
two sets of natural transformations:

[𝟐, 𝒞](𝐷, Δ𝑥) → [𝟐, 𝒞](𝐷, Δ𝑦)

This might look very abstract until you remember that natural transformations have
components, and these components are just regular arrows. An element of the left-hand side is a
natural transformation:
𝜇 ∶ 𝐷 → Δ𝑥
It has two components corresponding to the two objects in 𝟐. For instance, we have

𝜇1 ∶ 𝐷 1 → Δ𝑥 1

or, using the definitions of 𝐷 and Δ:


𝜇1 ∶ 𝑎 → 𝑥
This is just the left leg of our cospan.
Similarly, the element of the right-hand side is a natural transformation:

𝜈 ∶ 𝐷 → Δ𝑦

Its component at 1 is an arrow


𝜈1 ∶ 𝑎 → 𝑦
We can get from 𝜇1 to 𝜈1 simply by post-composing it with 𝑚 ∶ 𝑥 → 𝑦. So the lifting of 𝑚 is a
component-by-component post-composition (𝑚◦−):

𝜈1 = 𝑚◦𝜇1
𝜈2 = 𝑚◦𝜇2
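In Haskell terms (a sketch, reading 𝒞 as the category of types, with a cospan modeled as a pair
of functions; the names are made up here), this lifting is just post-composition on both legs:

-- A cospan over x: the two components of a natural transformation D -> Δx.
type Cospan a b x = (a -> x, b -> x)

-- Functoriality in x: an arrow m :: x -> y acts by post-composition.
liftCospan :: (x -> y) -> Cospan a b x -> Cospan a b y
liftCospan m (f, g) = (m . f, m . g)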
Sum as a universal cospan


Of all the cospans that you can build on the pair 𝑎 and 𝑏, the one with the arrows we called Left
and Right converging on 𝑎 + 𝑏 is very special. There is a unique mapping out of it to any other
cospan—a mapping that makes two triangles commute.

   𝑎                 𝑏
     ╲ Left   Right ╱
      ▼            ▼
         𝑎 + 𝑏
           │ ℎ
   𝑓       ▼       𝑔
           𝑥

(𝑓 ∶ 𝑎 → 𝑥 and 𝑔 ∶ 𝑏 → 𝑥 run along the outside; ℎ is the unique mapping.)
We are now in the position to translate this condition to a statement about natural
transformations and hom-sets. The arrow ℎ is an element of the hom-set

𝒞(𝑎 + 𝑏, 𝑥)

A cospan over 𝑥 is a natural transformation, that is an element of the hom-set in the functor
category:

[𝟐, 𝒞](𝐷, Δ𝑥)

Both are hom-sets in their respective categories. And both are just sets, that is objects in the
category 𝐒𝐞𝐭. This category forms a bridge between the functor category [𝟐, 𝒞] and a “regular”
category 𝒞, even though, conceptually, they seem to be at very different levels of abstraction.
Paraphrasing Sigmund Freud, “Sometimes a set is just a set.”
Our universal construction is the bijection or the isomorphism of sets:

[𝟐, 𝒞](𝐷, Δ𝑥) ≅ 𝒞(𝑎 + 𝑏, 𝑥)

Moreover, if we vary the object 𝑥, the two sides behave like functors from 𝒞 to 𝐒𝐞𝐭.
Therefore it makes sense to ask if this mapping of functors is a natural isomorphism.
Indeed, it can be shown that the naturality condition for this isomorphism translates into
commuting conditions for the triangles in the definition of the sum. So the definition of the sum
can be replaced by a single equation.
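In Haskell (a sketch for the category of types; the helper names are made up), the two
directions of this isomorphism are witnessed by either and by pre-composition with the injections:

-- From a cospan (f, g) to the unique mapping out of the sum...
cospanToArrow :: (a -> x, b -> x) -> (Either a b -> x)
cospanToArrow (f, g) = either f g

-- ...and back, by pre-composing with Left and Right.
arrowToCospan :: (Either a b -> x) -> (a -> x, b -> x)
arrowToCospan h = (h . Left, h . Right)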

Product as a universal span


An analogous argument can be made about the universal construction for the product. Again,
we start with the stick-figure category 𝟐 and the functor 𝐷. But this time we use a natural
transformation going in the opposite direction

𝛼 ∶ Δ𝑥 → 𝐷

Such a natural transformation is a pair of arrows that form a span:

         𝑥
    𝑓  ╱   ╲  𝑔
      ▼     ▼
     𝑎       𝑏

Collectively, these natural transformations form a hom-set in the functor category [𝟐, 𝒞]:

[𝟐, 𝒞](Δ𝑥, 𝐷)
Every element of this hom-set is in one-to-one correspondence with a unique mapping ℎ into
the product 𝑎 × 𝑏. Such a mapping is a member of the hom-set 𝒞(𝑥, 𝑎 × 𝑏). This correspondence
is expressed as the isomorphism:

[𝟐, 𝒞](Δ𝑥, 𝐷) ≅ 𝒞(𝑥, 𝑎 × 𝑏)

It can be shown that the naturality of this isomorphism guarantees that the triangles in this
diagram commute:
                 𝑥
    𝑓=𝛼1       │ ℎ       𝑔=𝛼2
         ╲      ▼      ╱
             𝑎 × 𝑏
        fst ╱       ╲ snd
           ▼         ▼
      𝑎 = 𝐷1        𝑏 = 𝐷2
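Dually to the sum, in Haskell (again a sketch with made-up names) the isomorphism pairs a
span with the unique mapping into the product:

-- From a span (f, g) to the unique mapping into the product...
spanToArrow :: (x -> a, x -> b) -> (x -> (a, b))
spanToArrow (f, g) = \x -> (f x, g x)

-- ...and back, by post-composing with the projections.
arrowToSpan :: (x -> (a, b)) -> (x -> a, x -> b)
arrowToSpan h = (fst . h, snd . h)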

Exponentials
The exponentials, or function objects, are defined by this commuting diagram:

    𝑥 × 𝑎
      │            ╲
      │ ℎ×𝑖𝑑𝑎       ╲ 𝑓
      ▼              ▼
   𝑏𝑎 × 𝑎 ──𝜀𝑎𝑏──▶ 𝑏

Here, 𝑓 is an element of the hom-set 𝒞(𝑥 × 𝑎, 𝑏) and ℎ is an element of 𝒞(𝑥, 𝑏𝑎).


The isomorphism between these sets, natural in 𝑥, defines the exponential object.

𝒞(𝑥 × 𝑎, 𝑏) ≅ 𝒞(𝑥, 𝑏𝑎)

The 𝑓 in the diagram above is an element of the left-hand side, and ℎ is the corresponding
element of the right-hand side. The transformation 𝛼𝑥 (which also depends on 𝑎 and 𝑏) maps 𝑓
to ℎ.
𝛼𝑥 ∶ 𝒞(𝑥 × 𝑎, 𝑏) → 𝒞(𝑥, 𝑏𝑎)
In Haskell, we call it curry. Its inverse, 𝛼 −1 is known as uncurry.
Unlike in the previous examples, here both hom-sets are in the same category, and it’s easy
to analyze the isomorphism in more detail. In particular, we’d like to see how the commuting
condition:
𝑓 = 𝜀𝑎𝑏 ◦(ℎ × 𝑖𝑑𝑎 )
arises from naturality.
The standard Yoneda trick is to make a substitution for 𝑥 that would reduce one of the hom-sets
to an endo-hom-set, that is a hom-set whose source is the same as the target. This will allow
us to pick a canonical element of that hom-set, namely the identity arrow.
In our case, substituting 𝑏𝑎 for 𝑥 will allow us to pick ℎ = 𝑖𝑑(𝑏𝑎).

   𝑏𝑎 × 𝑎
      │                 ╲
      │ 𝑖𝑑(𝑏𝑎)×𝑖𝑑𝑎       ╲ 𝑓
      ▼                   ▼
   𝑏𝑎 × 𝑎 ──𝜀𝑎𝑏──▶ 𝑏
The commuting condition in this case tells us that 𝑓 = 𝜀𝑎𝑏 . In other words, we get the formula
for 𝜀𝑎𝑏 in terms of 𝛼:
𝜀𝑎𝑏 = 𝛼𝑥−1 (𝑖𝑑𝑥 )

where 𝑥 is equal to 𝑏𝑎 .
Since we recognize 𝛼 −1 as uncurry, and 𝜀 as function application, we can write it in Haskell
as:
apply :: (a -> b, a) -> b
apply = uncurry id
This may be surprising at first, until you realize that the currying of (a->b, a) -> b leads to
(a->b) -> (a->b).
We can also encode the two sides of the main isomorphism as Haskell functors:
data LeftFunctor a b x = LF ((x, a) -> b)

data RightFunctor a b x = RF (x -> (a -> b))


They are both contravariant functors in the type variable x.
instance Contravariant (LeftFunctor a b) where
  contramap g (LF f) = LF (f . bimap g id)
This says that the lifting of 𝑔 ∶ 𝑥 → 𝑦 is given by the following pre-composition:

𝒞(𝑦 × 𝑎, 𝑏) ──(−◦(𝑔×𝑖𝑑𝑎))──▶ 𝒞(𝑥 × 𝑎, 𝑏)

Similarly:
instance Contravariant (RightFunctor a b) where
  contramap g (RF h) = RF (h . g)
translates to:
𝒞(𝑦, 𝑏𝑎) ──(−◦𝑔)──▶ 𝒞(𝑥, 𝑏𝑎)

The natural transformation 𝛼 is just a thin encapsulation of curry; and its inverse is uncurry:
alpha :: forall a b x. LeftFunctor a b x -> RightFunctor a b x
alpha (LF f) = RF (curry f)

alpha_1 :: forall a b x. RightFunctor a b x -> LeftFunctor a b x
alpha_1 (RF h) = LF (uncurry h)
Using the two formulas for the lifting of 𝑔 ∶ 𝑥 → 𝑦, here’s the naturality square:

                  (−◦(𝑔×𝑖𝑑𝑎))
   𝒞(𝑦 × 𝑎, 𝑏) ────────────▶ 𝒞(𝑥 × 𝑎, 𝑏)
        │                         │
        𝛼𝑦                        𝛼𝑥
        ▼                         ▼
   𝒞(𝑦, 𝑏𝑎) ─────────────────▶ 𝒞(𝑥, 𝑏𝑎)
                   (−◦𝑔)

Let’s now apply the Yoneda trick to it and replace 𝑦 with 𝑏𝑎. This also allows us to substitute
ℎ for 𝑔, which now goes from 𝑥 to 𝑏𝑎.
                  (−◦(ℎ×𝑖𝑑𝑎))
   𝒞(𝑏𝑎 × 𝑎, 𝑏) ────────────▶ 𝒞(𝑥 × 𝑎, 𝑏)
        │                         │
        𝛼(𝑏𝑎)                     𝛼𝑥
        ▼                         ▼
   𝒞(𝑏𝑎, 𝑏𝑎) ─────────────────▶ 𝒞(𝑥, 𝑏𝑎)
                   (−◦ℎ)

We know that the hom-set 𝒞(𝑏𝑎, 𝑏𝑎) contains at least the identity arrow, so we can pick the
element 𝑖𝑑(𝑏𝑎) in the lower left corner.
Reversing the arrow on the left, we know that 𝛼−1 acting on the identity produces 𝜀𝑎𝑏 in the
upper left corner (that’s the uncurry id trick).
Pre-composition with ℎ acting on the identity produces ℎ in the lower right corner.
𝛼−1 acting on ℎ produces 𝑓 in the upper right corner.

             (−◦(ℎ×𝑖𝑑𝑎))
   𝜀𝑎𝑏   ⟼   𝑓
    ▲           ▲
    𝛼−1         𝛼−1
    │           │
  𝑖𝑑(𝑏𝑎)  ⟼   ℎ
             (−◦ℎ)

(The ↦ arrows denote the action of functions on elements of sets.)


So the selection of 𝑖𝑑(𝑏𝑎 ) in the lower left corner fixes the other three corners. In particular,
we can see that the upper arrow applied to 𝜀𝑎𝑏 produces 𝑓 , which is exactly the commuting
condition:
𝜀𝑎𝑏 ◦(ℎ × 𝑖𝑑𝑎 ) = 𝑓
the one that we set out to derive.

9.5 Limits and Colimits


In the previous section we defined the sum and the product using natural transformations. These
were transformations between diagrams defined as functors from a very simple stick-figure
category 𝟐, one of the functors being the constant functor.
Nothing prevents us from replacing the category 𝟐 with something more complex. For
instance, we could try categories that have non-trivial arrows between objects, or categories with
infinitely many objects.
There is a whole vocabulary built around such constructions.
We used objects in the category 𝟐 for indexing objects in the category 𝒞. We can replace 𝟐
with an arbitrary indexing category ℐ. A diagram in 𝒞 is still defined as a functor 𝐷 ∶ ℐ → 𝒞.
It picks objects in 𝒞, but it also picks arrows between them.
As the second functor we’ll still use the constant functor Δ𝑥 ∶ ℐ → 𝒞.
A natural transformation, that is an element of the hom-set

[ℐ, 𝒞](Δ𝑥, 𝐷)

is now called a cone. Its dual, an element of

[ℐ, 𝒞](𝐷, Δ𝑥)

is called a cocone. They generalize the span and the cospan, respectively.
Diagrammatically, cones and cocones look like this:

          𝑥                  𝐷1    𝐷2    𝐷3
       𝑓╱ │ ╲𝑔                  ╲   │   ╱
       ▼  ▼  ▼                𝑓  ╲  │  ╱  𝑔
     𝐷1  𝐷2  𝐷3                   ▼ ▼ ▼
                                     𝑥

Since the indexing category may now contain arrows, the naturality conditions for these
diagrams are no longer trivial. The constant functor Δ𝑥 shrinks all vertices to one, so naturality
squares shrink to triangles. Naturality means that all the triangles with 𝑥 at their apex must now
commute.
The universal cone, if it exists, is called the limit of the diagram 𝐷, and is written as Lim𝐷.
Universality means that it satisfies the following isomorphism, natural in 𝑥:

[ℐ, 𝒞](Δ𝑥, 𝐷) ≅ 𝒞(𝑥, Lim𝐷)

For each cone with the apex 𝑥 there is a unique mapping from 𝑥 into the limit Lim𝐷.
A limit of a 𝐒𝐞𝐭-valued functor has a particularly simple characterization. It’s a set of cones
with the singleton set at the apex. Indeed, elements of the limit, that is functions from the
singleton set to it, are in one-to-one correspondence with such cones:

[ℐ, 𝐒𝐞𝐭](Δ1, 𝐷) ≅ 𝐒𝐞𝐭(1, Lim𝐷)

Dually, the universal cocone is called a colimit, and is described by the following natural
isomorphism:

[ℐ, 𝒞](𝐷, Δ𝑥) ≅ 𝒞(Colim𝐷, 𝑥)
We can now say that a product is a limit (and a sum, a colimit) of a diagram from the indexing
category 𝟐.
Limits and colimits distill the essence of a pattern.
A limit, like a product, is defined by its mapping-in property. A colimit, like a sum, is
defined by its mapping-out property.
There are many interesting limits and colimits, and we’ll see some when we discuss algebras
and coalgebras.

Exercise 9.5.1. Show that the limit of a “walking arrow” category, that is a two-object category
with an arrow connecting the two objects, has the same elements as the first object in the diagram
(“elements” are the arrows from the terminal object).

Equalizers
A lot of high-school math involves learning how to solve equations or systems of equations. An
equation equates the outcomes of two different ways of producing something. If we are allowed
to subtract things, we usually shove everything to one side and simplify the problem to the one
of calculating the zeros of some expression. In geometry, the same idea is expressed as the
intersection of two geometric objects.
In category theory all these patterns are embodied in a single construction called an equalizer.
An equalizer is a limit of a diagram whose pattern is given by a stick-figure category with
two parallel arrows:
   𝑖, 𝑗 ∶ 1 ⇉ 2
The two arrows represent two ways of producing something.
A functor from this category picks a pair of objects and a pair of morphisms in the target
category. The limit of this diagram embodies an intersection of the two outcomes. It is an object
𝑒 with two arrows 𝑝 ∶ 𝑒 → 𝑎 and 𝑝′ ∶ 𝑒 → 𝑏.

         𝑒
     𝑝 ╱   ╲ 𝑝′
      ▼     ▼
             𝑓
      𝑎  ⇉  𝑏
             𝑔

We have two commuting conditions:

𝑝′ = 𝑓◦𝑝
𝑝′ = 𝑔◦𝑝

It means that 𝑝′ is fully determined by one of the equations, while the other turns into the
constraint:
𝑓◦𝑝 = 𝑔◦𝑝
Since the equalizer is the limit, it is the universal such pair, as illustrated in this diagram:

   𝑥
    │ ℎ
    ▼                𝑓
    𝑒 ──𝑝──▶ 𝑎  ⇉  𝑏
                     𝑔

To develop the intuition for equalizers it’s instructive to consider how it works for sets. As
usual, the trick is to replace 𝑥 with the singleton set 1:

   1
    │ 𝑒    ╲ 𝑎
    ▼       ▼       𝑓
    𝐸 ──𝑝──▶ 𝐴  ⇉  𝐵
                    𝑔

In this case 𝑎 is an element of 𝐴 such that 𝑓 𝑎 = 𝑔𝑎. That’s just a way of saying that 𝑎 is the
solution of a pair of equations. Universality means that there is a unique element 𝑒 of 𝐸 such
that 𝑝◦𝑒 = 𝑎. In other words, elements of 𝐸 are in one-to-one correspondence with the solutions
of the system of equations.
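For finite sets we can model this directly in Haskell (a toy sketch, with lists standing in for
sets; equalize is a name made up here):

-- The equalizer of f and g: the subset of elements on which they agree.
-- The inclusion p :: E -> A is the obvious embedding of the sublist.
equalize :: Eq b => (a -> b) -> (a -> b) -> [a] -> [a]
equalize f g = filter (\x -> f x == g x)

-- e.g. equalize (^2) (+2) [-2..3] == [-1,2]  (solutions of x^2 == x + 2)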

Coequalizers
What’s dual to the idea of equating or intersecting? It’s the process of discovering commonalities
and organizing things into buckets. For instance, we can distribute integers into even and odd
buckets. In category theory, this process of bucketizing is described by coequalizers.
A coequalizer is the colimit of the same diagram that we used to define the equalizer:

         𝑓
    𝑎  ⇉  𝑏
         𝑔
    𝑞′ ╲   │ 𝑞
        ▼  ▼
         𝑐

This time, the arrow 𝑞 ′ is fully determined by 𝑞; and 𝑞 must satisfy the equation:

𝑞◦𝑓 = 𝑞◦𝑔

Again, we can gain some intuition by considering a coequalizer of two functions acting on
sets.
         𝑓
    𝐴  ⇉  𝐵 ──𝑞──▶ 𝐶
         𝑔

An 𝑥 ∈ 𝐴 is mapped to two elements 𝑓 𝑥 and 𝑔𝑥 in 𝐵, but then 𝑞 maps them back to a single
element of 𝐶. This element represents the bucket. Universality means that 𝐶 is a copy of 𝐵 in
which the elements that were produced from the same 𝑥 have been identified.
Consider an example where 𝐴 is a set of pairs of integers (m, n), such that either both
are even or both are odd. We want to coequalize the two projections
(fst, snd). The coequalizer set 𝐶 will have two elements corresponding to the two buckets. We’ll
represent it as Bool. The coequalizing function q selects the bucket:
q :: Int -> Bool
q n = n `mod` 2 == 0
Any function q' that cannot distinguish between the components of our pairs can be uniquely
factorized through the function h:
h :: (Int -> a) -> Bool -> a
h q' True  = q' 0
h q' False = q' 1

Exercise 9.5.2. Run a few tests that show that, in the example above, the factorization (h q') . q
gives the same result as q' given by the following definition:

import Data.Bits

q' :: Int -> Bool
q' x = testBit x 0

Exercise 9.5.3. What is the coequalizer of the pair (id, reverse), both of the type String->String?
Test its universality by factorizing the following function:
q' :: String -> Maybe Char
q' s = if even len
then Nothing
else Just (s !! (len `div` 2))
where len = length s
The existence of the terminal object


Lao Tzu says: great acts are made up of small deeds.
So far we’ve been studying limits of tiny diagrams, that is functors from simple stick-figure
categories. Nothing, however, prevents us from defining limits and colimits where the patterns
are taken to be infinite categories. But there is a gradation of infinities. When the objects in a
category form a proper set, we call such a category small. Unfortunately, the very basic example,
the category 𝐒𝐞𝐭 of sets, is not small. We know that there is no set of all sets. 𝐒𝐞𝐭 is a large
category. But at least all the hom-sets in 𝐒𝐞𝐭 are sets. We say that 𝐒𝐞𝐭 is locally small. In what
follows we’ll be always working with locally small categories.
A small limit is a limit of a small diagram, that is a functor from a category whose objects
and morphisms form sets. A category in which all small limits exist is called small complete,
or just complete. In particular, in such a category, a product of an arbitrary set of objects exists.
You can also equalize an arbitrary set of arrows between two objects. If such a category is locally
small, that means all equalizers exist.
Conversely, a (small) cocomplete category has all small colimits. In particular, such a
category has all small coproducts and coequalizers.
The category 𝐒𝐞𝐭 is both complete and cocomplete.
In a cocomplete locally small category there is a simple criterion for the existence of the
terminal object: It’s enough that a weakly terminal set exists.
A weakly terminal object, just like the terminal object, has an arrow coming from any object;
except that such an arrow is not necessarily unique.
A weakly terminal set is a family of objects 𝑡𝑖 indexed by a set 𝐼 such that, for any object 𝑐
in 𝒞 there exists an 𝑖 and an arrow 𝑐 → 𝑡𝑖. Such a set is also called a solution set.



In a cocomplete category we can always construct the coproduct ∐𝑖∈𝐼 𝑡𝑖. This coproduct is a
weakly terminal object, because there is an arrow to it from every 𝑐. This arrow is the composite
of the arrow to some 𝑡𝑖 followed by the injection 𝜄𝑖 ∶ 𝑡𝑖 → ∐𝑗∈𝐼 𝑡𝑗.
Given a weakly terminal object, we can construct the (strongly) terminal object. We first
define a subcategory 𝒯 of 𝒞 whose objects are the 𝑡𝑖. Morphisms in 𝒯 are all the morphisms in 𝒞
that go between the objects of 𝒯. This is called a full subcategory of 𝒞. By our construction, 𝒯
is small.
There is an obvious inclusion functor 𝐹 that embeds 𝒯 in 𝒞. This functor defines a small
diagram in 𝒞. It turns out that the colimit of this diagram is the terminal object in 𝒞.
Dually, a similar construction can be used to define an initial object as a limit of a weakly
initial set.
This property of solution sets will come in handy in the proof of Freyd’s adjoint functor
theorem.
9.6 The Yoneda Lemma


A functor from some category 𝒞 to the category of sets can be thought of as a model of this
category in 𝐒𝐞𝐭. Modeling, in general, is a lossy process: it discards some information. A
constant 𝐒𝐞𝐭-valued functor is an extreme example: it maps the whole category to a single set
and its identity function.
A hom-functor produces a model of the category as viewed from a certain vantage point. The
functor 𝒞(𝑎, −), for instance, offers the panorama of 𝒞 from the vantage point of 𝑎. It organizes
all the arrows emanating from 𝑎 into neat packages that are connected by images of arrows that
go between them, all in accordance with the original structure of the source category.

(Illustration: the objects 𝑥, 𝑦, 𝑣, 𝑧 of 𝒞 and the arrows out of 𝑎, mapped to their hom-sets in 𝐒𝐞𝐭.)
Some vantage points are better than others. For instance, the view from the initial object is
quite sparse. Every object 𝑥 is mapped to a singleton set 𝒞(0, 𝑥) corresponding to the unique
mapping 0 → 𝑥.
The view from the terminal object is more interesting: it maps all objects to their sets of
(global) elements 𝒞(1, 𝑥).
The Yoneda lemma may be considered one of the most profound statements, or one of the
most trivial statements in category theory. Let’s start with the profound version.
Consider two models of 𝒞 in 𝐒𝐞𝐭. The first one is given by the hom-functor 𝒞(𝑎, −). It’s the
panoramic, very detailed view of 𝒞 from the vantage point of 𝑎. The second is given by some
arbitrary functor 𝐹 ∶ 𝒞 → 𝐒𝐞𝐭. Any natural transformation between them embeds one model in
the other. It turns out that the set of all such natural transformations is fully determined by the
set 𝐹 𝑎.
Since the set of natural transformations is the hom-set in the functor category [𝒞, 𝐒𝐞𝐭], the
formal statement of the Yoneda lemma takes the form:

[𝒞, 𝐒𝐞𝐭](𝒞(𝑎, −), 𝐹) ≅ 𝐹 𝑎


Moreover, this isomorphism is natural in both 𝑎 and 𝐹 .
The reason this works is because all the mappings involved in this theorem are bound by
the requirements of preserving the structure of the category 𝒞 and the structure of its models.
In particular, naturality conditions impose a huge set of constraints on the way the components
of a natural transformation propagate from one point to another.
The proof of the Yoneda lemma starts with a single identity arrow and lets naturality
propagate it across the whole category.
Here’s the sketch of the proof. It consists of two parts: First, given a natural transformation
we construct an element of 𝐹 𝑎. Second, given an element of 𝐹 𝑎 we construct the corresponding
natural transformation.
First, let’s pick an arbitrary element on the left-hand side of the Yoneda lemma: a natural
transformation 𝛼. Its component at 𝑥 is a function:
𝛼𝑥 ∶ 𝒞(𝑎, 𝑥) → 𝐹 𝑥
We can now apply the Yoneda trick: substitute 𝑎 for 𝑥:

𝛼𝑎 ∶ 𝒞(𝑎, 𝑎) → 𝐹 𝑎

and then pick the identity 𝑖𝑑𝑎 as the canonical element of 𝒞(𝑎, 𝑎). The result is an element
𝛼𝑎(𝑖𝑑𝑎) in the set 𝐹 𝑎. This defines a mapping in one direction, from natural transformations to
elements of the set 𝐹 𝑎.
Now the other way around. Given an element 𝑝 of the set 𝐹 𝑎 we want to construct a natural
transformation 𝛼. First, we assign 𝑝 to be the action of 𝛼𝑎 on 𝑖𝑑𝑎 ∈ 𝒞(𝑎, 𝑎).

              𝛼𝑥
   𝒞(𝑎, 𝑥) ──────▶ 𝐹 𝑥
      ▲               ▲
    (ℎ◦−)            𝐹 ℎ
      │               │
   𝒞(𝑎, 𝑎) ──────▶ 𝐹 𝑎
              𝛼𝑎

(The identity 𝑖𝑑𝑎 in 𝒞(𝑎, 𝑎) is sent by 𝛼𝑎 to 𝑝 ∈ 𝐹 𝑎, and by (ℎ◦−) to ℎ ∈ 𝒞(𝑎, 𝑥).)

Now let’s take an arbitrary object 𝑥 and an arbitrary element of 𝒞(𝑎, 𝑥). The latter is an
arrow ℎ ∶ 𝑎 → 𝑥. Our natural transformation must map it to an element of 𝐹 𝑥. We can do it by
lifting the arrow ℎ using 𝐹. We get a function:

𝐹 ℎ∶ 𝐹 𝑎 → 𝐹 𝑥

We can apply this function to 𝑝 and get an element of 𝐹 𝑥. We take this element to be the action
of 𝛼𝑥 on ℎ:
𝛼𝑥 ℎ = (𝐹 ℎ)𝑝

The isomorphism in the Yoneda lemma is natural both in 𝑎 and in 𝐹 . The latter means that
you can “move” from the functor 𝐹 to another functor 𝐺 by applying an arrow in the functor
category, that is a natural transformation. This is quite a leap in the levels of abstraction, but all
the definitions of functoriality and naturality work equally well in the functor category, where
objects are functors, and arrows are natural transformations.

Exercise 9.6.1. Fill in the gap in the proof when 𝐹 𝑎 is empty.

Exercise 9.6.2. Show that the mapping

𝒞(𝑎, 𝑥) → 𝐹 𝑥

defined above is a natural transformation. Hint: Vary 𝑥 using some 𝑓 ∶ 𝑥 → 𝑦.

Exercise 9.6.3. Show that the formula for 𝛼𝑥 can be derived from the assumption that 𝛼𝑎(𝑖𝑑𝑎) =
𝑝 and the naturality condition. Hint: The lifting of ℎ by the hom-functor 𝒞(𝑎, ℎ) is given by
post-composition.
Yoneda lemma in programming


Now for the trivial part: The proof of the Yoneda lemma translates directly to Haskell code. We
start with the type of natural transformation between the hom-functor a->x and some functor
f, and show that it’s equivalent to the type of f acting on a.
forall x. (a -> x) -> f x  -- is isomorphic to (f a)
We produce a value of the type (f a) using the standard Yoneda trick
yoneda :: Functor f => (forall x. (a -> x) -> f x) -> f a
yoneda g = g id
Here’s the inverse mapping:
yoneda_1 :: Functor f => f a -> (forall x. (a -> x) -> f x)
yoneda_1 y = \h -> fmap h y
Note that we are cheating a little by mixing types and sets. The Yoneda lemma in the present
formulation works with 𝐒𝐞𝐭-valued functors. Again, the correct incantation is to say that we use
the enriched version of the Yoneda lemma in a self-enriched category.
The Yoneda lemma has some interesting applications in programming. For instance, let’s
consider what happens when we apply the Yoneda lemma to the identity functor. We get the
isomorphism between the type a (the identity functor acting on a) and
forall x. (a -> x) -> x
We interpret this as saying that any data type a can be replaced by a higher order polymorphic
function. This function takes another function—called a handler, a callback, or a continuation—
as an argument.
This is the standard continuation passing transformation that’s used a lot in distributed
programming, for instance when the value of type a has to be retrieved from a remote server. It’s
also useful as a program transformation that turns recursive algorithms into tail-recursive
functions.
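Specialized to the identity functor, the two Yoneda mappings above become the continuation-passing
transformation explicitly (a sketch; toCPS and fromCPS are names made up here):

-- A value of type a, disguised as a function that feeds it
-- to an arbitrary continuation.
toCPS :: a -> (forall x. (a -> x) -> x)
toCPS a = \k -> k a

-- Recover the value by passing the identity continuation.
fromCPS :: (forall x. (a -> x) -> x) -> a
fromCPS g = g id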
Continuation-passing style is difficult to work with because the composition of continuations
is highly nontrivial, resulting in what programmers often call a “callback hell.” Fortunately
continuations form a monad, which means their composition can be hidden behind the do notation.

The contravariant Yoneda lemma


By reversing a few arrows, the Yoneda lemma can be applied to contravariant functors as well.
It works on natural transformations between the contravariant hom-functor 𝒞(−, 𝑎) and a
contravariant functor 𝐹:

[𝒞ᵒᵖ, 𝐒𝐞𝐭](𝒞(−, 𝑎), 𝐹) ≅ 𝐹 𝑎

This is the Haskell implementation of the mapping:


coyoneda :: Contravariant f => (forall x. (x -> a) -> f x) -> f a
coyoneda g = g id
And this is the inverse transformation:
coyoneda_1 :: Contravariant f => f a -> (forall x. (x -> a) -> f x)
coyoneda_1 y = \h -> contramap h y
9.7 Yoneda Embedding


In a closed category, we have exponential objects that serve as stand-ins for hom-sets. This is
obviously a thing in 𝐒𝐞𝐭, where hom-sets, being sets, are automatically objects in 𝐒𝐞𝐭.
On the other hand, in the category of categories 𝐂𝐚𝐭, hom-sets are sets of functors, and it’s
not immediately obvious that they can be promoted to objects in 𝐂𝐚𝐭—that is, categories. But,
as we’ve seen before, they can! Functors between any two categories form a functor category.
Because of that, it’s possible to curry functors just like we curried functions. A functor
from a product category can be viewed as a functor returning a functor. In other words, 𝐂𝐚𝐭 is
a closed (symmetric) monoidal category.
In particular, we can apply currying to the hom-functor 𝒞(𝑎, 𝑏). It is a profunctor, or a
functor from the product category:

𝒞ᵒᵖ × 𝒞 → 𝐒𝐞𝐭

But it’s also a contravariant functor in the first argument 𝑎. And for every 𝑎 in 𝒞ᵒᵖ it produces a
covariant functor 𝒞(𝑎, −), which is an object in the functor category [𝒞, 𝐒𝐞𝐭]. We can write this
mapping as:

𝒞ᵒᵖ → [𝒞, 𝐒𝐞𝐭]

Alternatively, we can fix 𝑏 and produce a contravariant functor 𝒞(−, 𝑏). This mapping can be
written as

𝒞 → [𝒞ᵒᵖ, 𝐒𝐞𝐭]

Both mappings are functorial, which means that, for instance, an arrow in 𝒞 is mapped to a
natural transformation in [𝒞ᵒᵖ, 𝐒𝐞𝐭].
These 𝐒𝐞𝐭-valued functor categories are common enough that they have special names. The
functors in [𝒞ᵒᵖ, 𝐒𝐞𝐭] are called presheaves, and the ones in [𝒞, 𝐒𝐞𝐭] are called co-presheaves.
(The names come from algebraic topology.)
Let’s focus our attention on the following reading of the hom-functor:

𝒴 ∶ 𝒞 → [𝒞ᵒᵖ, 𝐒𝐞𝐭]

It takes an object 𝑥 and maps it to a presheaf

𝒴𝑥 = 𝒞(−, 𝑥)

which can be visualized as the totality of views of 𝑥 from all possible directions.
Let’s also review its action on arrows. The functor 𝒴 lifts an arrow 𝑓 ∶ 𝑥 → 𝑦 to a mapping
of presheaves:
𝛼 ∶ 𝒞(−, 𝑥) → 𝒞(−, 𝑦)
The component of this natural transformation at some 𝑧 is a function between hom-sets:

𝛼𝑧 ∶ 𝒞(𝑧, 𝑥) → 𝒞(𝑧, 𝑦)

which is simply implemented as the post-composition (𝑓◦−).


Such a functor 𝒴 can be thought of as creating a model of 𝒞 in the presheaf category. But this
is no run-of-the-mill model—it’s an embedding of one category inside another. This particular
one is called the Yoneda embedding and the functor 𝒴 is called the Yoneda functor.
First of all, every object of 𝒞 is mapped to a different object (presheaf) in [𝒞ᵒᵖ, 𝐒𝐞𝐭]. We say
that it’s injective on objects.
But that’s not all: every arrow in 𝒞 is mapped to a different arrow. We say that the embedding
functor is faithful.
If that weren’t enough, the mapping of hom-sets is also surjective, meaning that every arrow
between objects in [𝒞ᵒᵖ, 𝐒𝐞𝐭] comes from some arrow in 𝒞. We say that the functor is full.
Altogether, the embedding is fully faithful, that is the mapping of arrows is one-to-one.
However, in general, the Yoneda embedding is not surjective on objects, hence the word
“embedding.”
The fact that the embedding is fully faithful is the direct consequence of the Yoneda lemma.
Indeed, we know that, for any functor 𝐹 ∶ 𝒞ᵒᵖ → 𝐒𝐞𝐭, we have a natural isomorphism:

[𝒞ᵒᵖ, 𝐒𝐞𝐭](𝒞(−, 𝑥), 𝐹) ≅ 𝐹 𝑥

In particular, we can substitute another hom-functor 𝒞(−, 𝑦) for 𝐹 to get:

[𝒞ᵒᵖ, 𝐒𝐞𝐭](𝒞(−, 𝑥), 𝒞(−, 𝑦)) ≅ 𝒞(𝑥, 𝑦)

The left-hand side is the hom-set in the presheaf category and the right-hand side is the hom-set
in 𝒞. They are isomorphic, which proves that the embedding is fully faithful. In fact, the Yoneda
lemma tells us that the isomorphism is natural in 𝑥 and 𝑦.
Let’s have a closer look at this isomorphism. Let’s pick an element of the right-hand set
𝒞(𝑥, 𝑦)—an arrow 𝑓 ∶ 𝑥 → 𝑦. The isomorphism maps it to a natural transformation whose
component at 𝑧 is a function:

𝒞(𝑧, 𝑥) → 𝒞(𝑧, 𝑦)

This mapping is implemented as post-composition (𝑓◦−).
In Haskell, we would write it as:
toNatural :: (x -> y) -> (forall z. (z -> x) -> (z -> y))
toNatural f = \h -> f . h
In fact, this syntax works too:
toNatural f = (f .)
The inverse mapping is:
fromNatural :: (forall z. (z -> x) -> (z -> y)) -> (x -> y)
fromNatural alpha = alpha id
(Notice the use of the Yoneda trick again.)
This isomorphism maps identity to identity and composition to composition. That’s because
it’s implemented as post-composition, and post-composition preserves both identity and
composition. We’ve shown this fact before, in the chapter on isomorphisms:

((𝑓◦𝑔)◦−) = (𝑓◦−)◦(𝑔◦−)

Because it preserves composition and identity, this isomorphism also preserves
isomorphisms. So if 𝑥 is isomorphic to 𝑦 then the presheaves 𝒞(−, 𝑥) and 𝒞(−, 𝑦) are isomorphic,
and vice versa.
This is exactly the result that we’ve been using all along to prove numerous isomorphisms
in previous chapters. If the hom-sets are naturally isomorphic, then the objects are isomorphic.
The Yoneda embedding builds on the idea that there is nothing more to an object than its
relationships with other objects. The presheaf 𝒞(−, 𝑎), like a hologram, encodes the totality
of views of 𝑎 from the perspective of the whole category 𝒞. The Yoneda embedding tells us
that, when we combine all these individual holograms, we get a perfect hologram of the whole
category.
9.8 Representable Functors


Objects in a co-presheaf category are functors that assign sets to objects in 𝒞. Some of these
functors work by picking a reference object 𝑎 and assigning, to all objects 𝑥, their hom-sets
𝒞(𝑎, 𝑥):
𝐹 𝑥 = 𝒞(𝑎, 𝑥)
Such functors, and all the functors isomorphic to those, are called representable. The whole
functor is “represented” by a single object 𝑎.
In a closed category, the functor which assigns to every object 𝑥 the set of elements of the
exponential object 𝑥𝑎 is represented by 𝑎. This is because the set of elements of 𝑥𝑎 is isomorphic
to 𝒞(𝑎, 𝑥):
𝒞(1, 𝑥𝑎) ≅ 𝒞(1 × 𝑎, 𝑥) ≅ 𝒞(𝑎, 𝑥)
Seen this way, the representing object 𝑎 is like a logarithm of a functor.
The analogy goes deeper: just like a logarithm of a product is a sum of logarithms, a
representing object for a product data type is a sum. For instance, the functor that squares its argument
using a product, 𝐹 𝑥 = 𝑥 × 𝑥, is represented by 2, which is the sum 1 + 1. Indeed, we’ve seen
before that 𝑥 × 𝑥 ≅ 𝑥2.
Representable functors play a very special role in the category of 𝐒𝐞𝐭-valued functors.
Notice that the Yoneda embedding maps all objects of 𝒞 to representable presheaves. It maps an
object 𝑥 to a presheaf represented by 𝑥:

𝒴 ∶ 𝑥 ↦ 𝒞(−, 𝑥)

We can find the entire category 𝒞, objects and morphisms, embedded inside the presheaf
category as representable functors. The question is, what else is there in the presheaf category
“in between” representable functors?
Just like rational numbers are dense among real numbers, so representables are “dense”
among (co-) presheaves. Every real number may be approximated by rational numbers. Every
presheaf is a colimit of representables (and every co-presheaf, a limit). We’ll come back to this
topic when we talk about (co-) ends.

Exercise 9.8.1. Describe limits and colimits as representing objects. What are the functors they
represent?

Exercise 9.8.2. Consider a singleton functor 𝐹 ∶ 𝒞 → 𝐒𝐞𝐭 that assigns to each object 𝑐 a
singleton set {𝑐} that contains just that object (that is, a different singleton for every object).
Define the action of 𝐹 on arrows. Show that 𝐹 being representable is equivalent to 𝒞 having
an initial object.

The guessing game


The idea that objects can be described by the way they interact with other objects is sometimes
illustrated by playing an imaginary guessing game. One category theorist picks a secret object
in a category, and the other has to guess which object it is (up to isomorphism, of course).
The guesser is allowed to point at objects, and use them as “probes” into the secret object.
The opponent is supposed to respond each time with a set: the set of arrows from the probing
object 𝑎 to the secret object 𝑥. This, of course, is the hom-set 𝒞(𝑎, 𝑥).
The totality of these answers, as long as the opponent is not cheating, will define a presheaf
𝐹 ∶ 𝒞ᵒᵖ → 𝐒𝐞𝐭, and the object they are hiding is its representing object.
But how do we know that the second category theorist is not cheating? To test that, we ask
questions about arrows. For every arrow we select, they should give us a function between two
sets—the sets they gave us for its endpoints. We can then check if all identity arrows are mapped
to identity functions, and whether compositions of arrows map to compositions of functions. In
other words, we’ll be able to verify that 𝐹 is indeed a functor.
However, a clever enough opponent may still fool us. The presheaf they are revealing to us
may describe a fantastic object—a figment of their imagination—and we won’t be able to tell.
It turns out that such fantastic beasts are often as interesting as the real ones.

Representable functors in programming


In Haskell, we define a class of representable functors using two functions that witness the
isomorphism:
𝐹 𝑥 = 𝒞(𝑎, 𝑥)
The first one, tabulate, turns a function into a lookup table; and the second, index, uses the
representing type Key to index into it.

class Representable f where
  type Key f :: Type
  tabulate :: (Key f -> x) -> f x
  index :: f x -> (Key f -> x)
Algebraic data types that use sums are not representable (there is no formula for taking a
logarithm of a sum). For instance, the list type is defined as a sum, so it’s not representable.
However, an infinite stream is.
Conceptually, a stream is like an infinite tuple, which is technically a product. Such a stream
is represented by the type of natural numbers. In other words, an infinite stream is equivalent to
a mapping out of natural numbers.
data Stream a = Stm a (Stream a)
Here’s the instance definition:
instance Representable Stream where
  type Key Stream = Nat
  tabulate g = tab Z
    where tab n = Stm (g n) (tab (S n))
  index stm = \n -> ind n stm
    where ind Z     (Stm a _)  = a
          ind (S n) (Stm _ as) = ind n as
Representable types are useful for implementing memoization of functions.
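A minimal sketch of this idea, specialized to the Stream instance above (memoNat is a name
made up here; Nat with constructors Z and S is the natural-number type used earlier): tabulate
builds a lazy lookup table once, and index reads from it, so repeated lookups share previously
forced entries.

{-# LANGUAGE ScopedTypeVariables #-}

-- Memoize a function of natural numbers by tabulating it into a Stream.
memoNat :: forall a. (Nat -> a) -> (Nat -> a)
memoNat f = index (tabulate f :: Stream a)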

Exercise 9.8.3. Implement the Representable instance for Pair:


data Pair x = Pair x x

Exercise 9.8.4. Is the constant functor that maps everything to the terminal object representable?
Hint: what’s the logarithm of 1?
In Haskell, such a functor could be implemented as:
data Unit a = U
Implement the instance of Representable for it.

Exercise 9.8.5. The list functor is not representable. But can it be considered a sum of
representables?

9.9 2-category 𝐂𝐚𝐭


In the category of categories, 𝐂𝐚𝐭, the hom-sets are not “just” sets. Each of them can be
promoted to a functor category, with natural transformations playing the role of arrows. This kind
of structure is called a 2-category.
In the language of 2-categories, objects are called 0-cells, arrows between them are called
1-cells, and arrows between arrows are called 2-cells.
The obvious generalization of that picture would be to have 3-cells that go between 2-cells
and so on. An 𝑛-category has cells going up to the 𝑛-th level.
But why not have arrows going all the way? Enter infinity categories. Far from being
a curiosity, ∞-categories have practical applications. For instance they are used in algebraic
topology to describe points, paths between points, surfaces swept by paths, volumes swept
by surfaces, and so on, ad infinitum.

9.10 Useful Formulas


• Yoneda lemma for covariant functors:

[𝒞, 𝐒𝐞𝐭](𝒞(𝑎, −), 𝐹) ≅ 𝐹 𝑎

• Yoneda lemma for contravariant functors:

[𝒞ᵒᵖ, 𝐒𝐞𝐭](𝒞(−, 𝑎), 𝐹) ≅ 𝐹 𝑎

• Corollaries to the Yoneda lemma:

[𝒞, 𝐒𝐞𝐭](𝒞(𝑥, −), 𝒞(𝑦, −)) ≅ 𝒞(𝑦, 𝑥)

[𝒞ᵒᵖ, 𝐒𝐞𝐭](𝒞(−, 𝑥), 𝒞(−, 𝑦)) ≅ 𝒞(𝑥, 𝑦)
Chapter 10

Adjunctions

A sculptor subtracts irrelevant stone until a sculpture emerges. A mathematician abstracts
irrelevant details until a pattern emerges.
We were able to define a lot of constructions using their mapping-in and mapping-out
properties. Those, in turn, could be compactly written as isomorphisms between hom-sets. This
pattern of natural isomorphisms between hom-sets is called an adjunction and, once recognized,
pops up virtually everywhere.

10.1 The Currying Adjunction


The definition of the exponential is the classic example of an adjunction that relates
mappings-out and mappings-in. Every mapping out of a product corresponds to a unique mapping into the
exponential:

𝒞(𝑒 × 𝑎, 𝑏) ≅ 𝒞(𝑒, 𝑏𝑎)
The object 𝑏 takes the role of the focus on the left hand side; the object 𝑒 becomes the observer
on the right hand side.
We can spot two functors at play. They are both parameterized by 𝑎. On the left we have the
product functor (− × 𝑎) applied to 𝑒. On the right we have the exponential functor (−)𝑎 applied
to 𝑏.
If we write these functors as:
𝐿𝑎 𝑒 = 𝑒 × 𝑎
𝑅𝑎 𝑏 = 𝑏𝑎
then the natural isomorphism
𝒞(𝐿𝑎 𝑒, 𝑏) ≅ 𝒞(𝑒, 𝑅𝑎 𝑏)
is called the adjunction between them.
In components, this isomorphism tells us that, given a mapping 𝜙 ∈ 𝒞(𝐿𝑎 𝑒, 𝑏), there is
a unique mapping 𝜙𝑇 ∈ 𝒞(𝑒, 𝑅𝑎 𝑏) and vice versa. These mappings are sometimes called the
transpose of each other—the nomenclature taken from matrix algebra.
The shorthand notation for the adjunction is 𝐿 ⊣ 𝑅. Substituting the product functor for 𝐿
and the exponential functor for 𝑅, we can write the currying adjunction concisely as:

(− × 𝑎) ⊣ (−)𝑎
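In Haskell, the two directions of this natural isomorphism are the familiar curry and uncurry
(the wrapper names below are made up here, shown only as a sketch at the level of types):

-- The transpose of phi :: (e, a) -> b across the currying adjunction.
transpose :: ((e, a) -> b) -> (e -> (a -> b))
transpose = curry

-- And back.
untranspose :: (e -> (a -> b)) -> ((e, a) -> b)
untranspose = uncurry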

The exponential object 𝑏𝑎 is sometimes called the internal hom and is written as [𝑎, 𝑏]. This
is in contrast to the external hom, which is the set 𝒞(𝑎, 𝑏). The external hom is not an object in
𝒞 (except when 𝒞 itself is 𝐒𝐞𝐭). With this notation, the currying adjunction can be written as:

𝒞(𝑒 × 𝑎, 𝑏) ≅ 𝒞(𝑒, [𝑎, 𝑏])

A category in which this adjunction holds is called cartesian closed.


Since functions play a central role in every programming language, cartesian closed categories
form the basis of all models of programming. We interpret the exponential 𝑏𝑎 as the function
type 𝑎 → 𝑏.
Here 𝑒 plays the role of the external environment—the Γ of the lambda calculus. The
morphism in 𝒞(Γ × 𝑎, 𝑏) is interpreted as an expression of type 𝑏 in the environment Γ extended by
a variable of type 𝑎. The function type 𝑎 → 𝑏 therefore represents a closure that may capture a
value of type 𝑒 from its environment.
Incidentally, the category of (small) categories 𝐂𝐚𝐭 is also cartesian closed, as reflected in
this adjunction between product categories and functor categories that uses the same internal-hom
notation:

𝐂𝐚𝐭(𝒜 × ℬ, 𝒞) ≅ 𝐂𝐚𝐭(𝒜, [ℬ, 𝒞])

Here, both sides are sets of functors.

10.2 The Sum and the Product Adjunctions


The currying adjunction relates two endofunctors, but an adjunction can be easily generalized
to functors that go between different categories. Let’s see some examples first.

The diagonal functor


The sum and the product types were defined using bijections where one of the sides was a single
arrow and the other was a pair of arrows. A pair of arrows can be seen as a single arrow in the
product category.
To explore this idea, we need to define the diagonal functor Δ, which is a special mapping
from 𝒞 to 𝒞 × 𝒞. It takes an object 𝑥 and duplicates it, producing a pair of objects ⟨𝑥, 𝑥⟩. It also
takes an arrow 𝑓 and duplicates it ⟨𝑓, 𝑓⟩.
Interestingly, the diagonal functor is related to the constant functor we’ve seen previously.
The constant functor can be thought of as a functor of two variables—it just ignores the second
one. We’ve seen this in the Haskell definition:
data Const c a = Const c
To see the connection, let’s look at the product category 𝒞 × 𝒞 as a functor category [𝟐, 𝒞],
in other words, the exponential object 𝒞𝟐 in 𝐂𝐚𝐭. Indeed, a functor from 𝟐 (the stick-figure
category with two objects) picks a pair of objects—which is equivalent to a single object in the
product category.
A functor 𝒞 → [𝟐, 𝒞] can be uncurried to 𝒞 × 𝟐 → 𝒞. The diagonal functor ignores the
second argument, the one coming from 𝟐: it does the same thing whether the second argument
is 1 or 2. That’s exactly what the constant functor does as well. This is why we use the same
symbol Δ for both.
Incidentally, this argument can be easily generalized to any indexing category, not just 𝟐.

The sum adjunction


Recall that the sum is defined by its mapping-out property. There is a one-to-one correspondence
between the arrows coming out of the sum 𝑎 + 𝑏 and pairs of arrows coming from 𝑎 and 𝑏
separately. In terms of hom-sets, we can write it as:

𝒞(𝑎 + 𝑏, 𝑥) ≅ 𝒞(𝑎, 𝑥) × 𝒞(𝑏, 𝑥)

where the product on the right-hand side is just a cartesian product of sets, that is, the set of pairs.
Moreover, we’ve seen earlier that this bijection is natural in 𝑥.
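In Haskell, one direction of this bijection is the Prelude's either; the inverse direction splits a mapping out of the sum into a pair of mappings (a sketch; fromSum and toSum are names of our choosing):

fromSum :: (Either a b -> x) -> (a -> x, b -> x)
fromSum h = (h . Left, h . Right)

toSum :: (a -> x, b -> x) -> (Either a b -> x)
toSum (f, g) = either f g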
We know that a pair of arrows is a single arrow in the product category. We can, therefore,
look at the elements on the right-hand side as arrows in 𝒞 × 𝒞 going from the object ⟨𝑎, 𝑏⟩ to the
object ⟨𝑥, 𝑥⟩. The latter can be obtained by acting with the diagonal functor Δ on 𝑥. We have:

𝒞(𝑎 + 𝑏, 𝑥) ≅ (𝒞 × 𝒞)(⟨𝑎, 𝑏⟩, Δ𝑥)

This is a bijection between hom-sets in two different categories. It satisfies naturality conditions,
so it’s a natural isomorphism.
We can spot a pair of functors here as well. On the left we have the functor that takes a pair
of objects ⟨𝑎, 𝑏⟩ and produces their sum 𝑎 + 𝑏:

(+) ∶ 𝒞 × 𝒞 → 𝒞

On the right-hand side, we have the diagonal functor Δ going in the opposite direction:

Δ ∶ 𝒞 → 𝒞 × 𝒞

Altogether, we have a pair of functors between a pair of categories:

(+) ∶ 𝒞 × 𝒞 → 𝒞 and Δ ∶ 𝒞 → 𝒞 × 𝒞

and an isomorphism between the hom-sets: an arrow 𝑎 + 𝑏 → 𝑥 on the 𝒞 side corresponds to
an arrow ⟨𝑎, 𝑏⟩ → ⟨𝑥, 𝑥⟩ on the 𝒞 × 𝒞 side.

In other words, we have the adjunction:

(+) ⊣ Δ

The product adjunction


We can apply the same reasoning to the definition of a product. This time we have a natural
isomorphism between pairs of arrows and mappings into the product:

𝒞(𝑥, 𝑎) × 𝒞(𝑥, 𝑏) ≅ 𝒞(𝑥, 𝑎 × 𝑏)

Replacing pairs of arrows with arrows in the product category we get:

(𝒞 × 𝒞)(Δ𝑥, ⟨𝑎, 𝑏⟩) ≅ 𝒞(𝑥, 𝑎 × 𝑏)
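In Haskell, the two directions of this bijection can be sketched as follows (fork and unfork are our names):

fork :: (x -> a, x -> b) -> (x -> (a, b))
fork (f, g) x = (f x, g x)

unfork :: (x -> (a, b)) -> (x -> a, x -> b)
unfork h = (fst . h, snd . h)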


These are the two functors going in opposite directions:

Δ ∶ 𝒞 → 𝒞 × 𝒞 and (×) ∶ 𝒞 × 𝒞 → 𝒞

and this is the isomorphism of hom-sets: an arrow ⟨𝑥, 𝑥⟩ → ⟨𝑎, 𝑏⟩ on the 𝒞 × 𝒞 side corresponds
to an arrow 𝑥 → 𝑎 × 𝑏 on the 𝒞 side.

In other words, we have the adjunction:

Δ ⊣ (×)

Distributivity
In a bicartesian closed category products distribute over sums. We’ve seen one direction of the
proof using universal constructions. Adjunctions combined with the Yoneda lemma give us
more powerful tools to tackle this problem.
We want to show the natural isomorphism:

(𝑏 + 𝑐) × 𝑎 ≅ 𝑏 × 𝑎 + 𝑐 × 𝑎

Instead of proving this identity directly, we’ll show that the mappings out from both sides to an
arbitrary object 𝑥 are isomorphic:

𝒞((𝑏 + 𝑐) × 𝑎, 𝑥) ≅ 𝒞(𝑏 × 𝑎 + 𝑐 × 𝑎, 𝑥)

The left-hand side is a mapping out of a product, so we can apply the currying adjunction to it:

𝒞((𝑏 + 𝑐) × 𝑎, 𝑥) ≅ 𝒞(𝑏 + 𝑐, 𝑥ᵃ)

This gives us a mapping out of a sum which, by the sum adjunction, is isomorphic to the product
of two mappings:

𝒞(𝑏 + 𝑐, 𝑥ᵃ) ≅ 𝒞(𝑏, 𝑥ᵃ) × 𝒞(𝑐, 𝑥ᵃ)

We can now apply the inverse of the currying adjunction to both components:

𝒞(𝑏, 𝑥ᵃ) × 𝒞(𝑐, 𝑥ᵃ) ≅ 𝒞(𝑏 × 𝑎, 𝑥) × 𝒞(𝑐 × 𝑎, 𝑥)

Using the inverse of the sum adjunction, we arrive at the final result:

𝒞(𝑏 × 𝑎, 𝑥) × 𝒞(𝑐 × 𝑎, 𝑥) ≅ 𝒞(𝑏 × 𝑎 + 𝑐 × 𝑎, 𝑥)

Every step in this proof was a natural isomorphism, so their composition is also a natural
isomorphism. By the Yoneda lemma, the two objects that form the left- and right-hand sides of
the distributivity law are therefore isomorphic.
A much shorter proof of this statement follows from the property of left adjoints that we’ll
discuss soon.
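Meanwhile, the isomorphism can be witnessed concretely in Haskell—a sketch composing the steps of the proof above (dist and undist are our names):

dist :: (Either b c, a) -> Either (b, a) (c, a)
dist (Left b, a)  = Left (b, a)
dist (Right c, a) = Right (c, a)

undist :: Either (b, a) (c, a) -> (Either b c, a)
undist (Left (b, a))  = (Left b, a)
undist (Right (c, a)) = (Right c, a)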

10.3 Adjunction between functors


In general, an adjunction relates two functors going in opposite directions between two categories:
the left functor
𝐿 ∶ 𝒞 → 𝒟
and the right functor
𝑅 ∶ 𝒟 → 𝒞
The adjunction 𝐿 ⊣ 𝑅 is defined as a natural isomorphism between two hom-sets:

𝒟(𝐿𝑥, 𝑦) ≅ 𝒞(𝑥, 𝑅𝑦)

In other words, we have a family of invertible functions between sets:

𝜙𝑥𝑦 ∶ 𝒟(𝐿𝑥, 𝑦) → 𝒞(𝑥, 𝑅𝑦)

natural in both 𝑥 and 𝑦. For instance, naturality in 𝑦 means that, for any 𝑓 ∶ 𝑦 → 𝑦′ the following
diagram commutes:
              𝒟(𝐿𝑥,𝑓)
  𝒟(𝐿𝑥, 𝑦) ─────────→ 𝒟(𝐿𝑥, 𝑦′)
     𝜙𝑥𝑦 │                 │ 𝜙𝑥𝑦′
          ↓                 ↓
  𝒞(𝑥, 𝑅𝑦) ─────────→ 𝒞(𝑥, 𝑅𝑦′)
              𝒞(𝑥,𝑅𝑓)

or, considering that a lifting of arrows by hom-functors is the same as post-composition:

               𝑓◦−
  𝒟(𝐿𝑥, 𝑦) ─────────→ 𝒟(𝐿𝑥, 𝑦′)
     𝜙𝑥𝑦 │                 │ 𝜙𝑥𝑦′
          ↓                 ↓
  𝒞(𝑥, 𝑅𝑦) ─────────→ 𝒞(𝑥, 𝑅𝑦′)
              𝑅𝑓◦−

The vertical arrows can be traversed in either direction (using 𝜙⁻¹𝑥𝑦 when going up), since
they are the components of an isomorphism.
Pictorially, we have two functors going in opposite directions:

𝐿 ∶ 𝒞 → 𝒟 and 𝑅 ∶ 𝒟 → 𝒞

and, for any pair of objects 𝑥 and 𝑦, two isomorphic hom-sets: 𝒟(𝐿𝑥, 𝑦) on the 𝒟 side and
𝒞(𝑥, 𝑅𝑦) on the 𝒞 side.
These hom-sets come from two different categories, but sets are just sets. We say that 𝐿 is the
left adjoint of 𝑅, or that 𝑅 is the right adjoint of 𝐿.
In Haskell, the simplified version of this could be encoded as a multi-parameter type class:
class (Functor left, Functor right) => Adjunction left right where
  ltor :: (left x -> y) -> (x -> right y)
  rtol :: (x -> right y) -> (left x -> y)
It requires the following pragma at the top of the file:
{-# LANGUAGE MultiParamTypeClasses #-}
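As a quick sanity check, here is a sketch of an instance of this class for the currying adjunction, with the environment r paired on the left (it additionally requires FlexibleInstances):

instance Adjunction ((,) r) ((->) r) where
  ltor f x = \r -> f (r, x)
  rtol g (r, x) = g x r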
Therefore, in a bicartesian category, the sum is the left adjoint to the diagonal functor; and
the product is its right adjoint. We can write this very concisely (or we could impress it in clay,
in a modern version of cuneiform):

(+) ⊣ Δ ⊣ (×)

Exercise 10.3.1. Draw the commuting square witnessing the naturality of the adjunction func-
tion 𝜙𝑥𝑦 in 𝑥.

Exercise 10.3.2. The hom-set (𝐿𝑥, 𝑦) on the left-hand side of the adjunction formula suggests
that 𝐿𝑥 could be seen as a representing object for some functor (a co-presheaf). What is this
functor? Hint: It maps 𝑦 to a set. What set is it?

Exercise 10.3.3. Conversely, a representing object 𝑎 for a presheaf 𝑃 is defined by:

𝑃 𝑥 ≅ (𝑥, 𝑎)

What is the presheaf for which 𝑅𝑦, in the adjunction formula, is the representing object?

10.4 Limits and Colimits as Adjunctions


The definition of a limit also involves a natural isomorphism between hom-sets:

[𝒥, 𝒞](Δ𝑥, 𝐷) ≅ 𝒞(𝑥, Lim 𝐷)

The hom-set on the left is in the functor category. Its elements are cones, or natural transfor-
mations between the constant functor Δ𝑥 and the diagram functor 𝐷. The one on the right is a
hom-set in .
In a category where all limits exist, we have the adjunction between these two functors:

Δ(−) ∶ 𝒞 → [𝒥, 𝒞]

and:

Lim(−) ∶ [𝒥, 𝒞] → 𝒞
Dually, the colimit is described by the following natural isomorphism:

[𝒥, 𝒞](𝐷, Δ𝑥) ≅ 𝒞(Colim 𝐷, 𝑥)

We can write both adjunctions using one terse formula:

Colim ⊣ Δ ⊣ Lim

In particular, since the product category 𝒞 × 𝒞 is equivalent to 𝒞², or the functor category
[𝟐, 𝒞], we can rewrite a product and a coproduct as a limit and a colimit:

[𝟐, 𝒞](Δ𝑥, ⟨𝑎, 𝑏⟩) ≅ 𝒞(𝑥, 𝑎 × 𝑏)
𝒞(𝑎 + 𝑏, 𝑥) ≅ [𝟐, 𝒞](⟨𝑎, 𝑏⟩, Δ𝑥)

where ⟨𝑎, 𝑏⟩ denotes a diagram that is the action of a functor 𝐷 ∶ 𝟐 → 𝒞 on the two objects of 𝟐.

10.5 Unit and Counit of an Adjunction


We compare arrows for equality, but we prefer to use isomorphisms for comparing objects.
We have a problem when it comes to functors, though. On the one hand, they are objects in
the functor category, so isomorphisms are the way to go; on the other hand, they are arrows in
𝐂𝐚𝐭, so maybe it's okay to compare them for equality?
To shed some light on this dilemma, we should ask ourselves why we use equality for arrows.
It’s not because we like equality, but because there’s nothing else for us to do in a set but to
compare elements for equality. Two elements of a hom-set are either equal or not, period.
That’s not the case in 𝐂𝐚𝐭 which, as we know, is a 2-category. Here, hom-sets themselves
have the structure of a category—the functor category. In a 2-category we have arrows between
arrows so, in particular, we can define isomorphisms between arrows. In 𝐂𝐚𝐭 these would be
natural isomorphisms between functors.
However, even though we have the option of replacing arrow equalities with isomorphisms,
categorical laws in 𝐂𝐚𝐭 are still expressed as equalities. For instance, the composition of a
functor 𝐹 with the identity functor is equal to 𝐹 , and the same for associativity. A 2-category
in which the laws are satisfied “on the nose” is called strict, and 𝐂𝐚𝐭 is an example of a strict
2-category.
But as far as comparing categories goes, we have more options. Categories are objects in
𝐂𝐚𝐭, so it’s possible to define an isomorphism of categories as a pair of functors 𝐿 and 𝑅:
Id Id

 
𝑅

such that:
𝐿◦𝑅 = Id
Id = 𝑅◦𝐿

This definition involves equality of functors, though. What’s worse, acting on objects, it involves
equality of objects:

𝐿(𝑅𝑥) = 𝑥
𝑦 = 𝑅(𝐿𝑦)

This is why it’s more proper to talk about a weaker notion of equivalence of categories, where
equalities are replaced by isomorphisms:

𝐿◦𝑅 ≅ Id
Id ≅ 𝑅◦𝐿

On objects, an equivalence of categories means that a round trip produces an object that is
isomorphic, rather than equal, to the original one. In most cases, this is exactly what we want.
An adjunction is also defined as a pair of functors going in opposite directions, so it makes
sense to ask what the result of a round trip is.
The isomorphism that defines an adjunction works for any pair of objects 𝑥 and 𝑦:

𝒟(𝐿𝑥, 𝑦) ≅ 𝒞(𝑥, 𝑅𝑦)

so, in particular, it works if we replace 𝑦 with 𝐿𝑥:

𝒟(𝐿𝑥, 𝐿𝑥) ≅ 𝒞(𝑥, 𝑅(𝐿𝑥))

We can now use the Yoneda trick and pick the identity arrow 𝑖𝑑𝐿𝑥 on the left. The isomorphism
maps it to a unique arrow on the right, which we’ll call 𝜂𝑥 :

𝜂𝑥 ∶ 𝑥 → 𝑅(𝐿𝑥)

Not only is this mapping defined for every 𝑥, but it’s also natural in 𝑥. The natural transformation
𝜂 is called the unit of the adjunction. If we observe that the 𝑥 on the left is the action of the
identity functor on 𝑥, we can write:

𝜂 ∶ Id → 𝑅◦𝐿

As an example, let’s evaluate the unit of the coproduct adjunction:

𝒞(𝑎 + 𝑏, 𝑥) ≅ (𝒞 × 𝒞)(⟨𝑎, 𝑏⟩, Δ𝑥)

by replacing 𝑥 with 𝑎 + 𝑏. We get:

𝜂⟨𝑎,𝑏⟩ ∶ ⟨𝑎, 𝑏⟩ → Δ(𝑎 + 𝑏)

This is a pair of arrows that are exactly the two injections ⟨Left, Right⟩.
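In Haskell, this component of the unit is, quite literally, the pair of injections (etaSum is our name):

etaSum :: (a -> Either a b, b -> Either a b)
etaSum = (Left, Right)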
We can do a similar trick by replacing 𝑥 with 𝑅𝑦:

𝒟(𝐿(𝑅𝑦), 𝑦) ≅ 𝒞(𝑅𝑦, 𝑅𝑦)

Corresponding to 𝑖𝑑𝑅𝑦 on the right, we get an arrow on the left:

𝜀𝑦 ∶ 𝐿(𝑅𝑦) → 𝑦

These arrows form another natural transformation called the counit of the adjunction:

𝜀 ∶ 𝐿◦𝑅 → Id

Notice that, if those two natural transformations were invertible, they would witness the
equivalence of categories. But even if they’re not, this kind of “half-equivalence” is still very
interesting in the context of category theory.
As an example, let’s evaluate the counit of the product adjunction:

(𝒞 × 𝒞)(Δ𝑥, ⟨𝑎, 𝑏⟩) ≅ 𝒞(𝑥, 𝑎 × 𝑏)

by replacing 𝑥 with 𝑎 × 𝑏. We get:

𝜀⟨𝑎,𝑏⟩ ∶ Δ(𝑎 × 𝑏) → ⟨𝑎, 𝑏⟩

This is a pair of arrows that are exactly the two projections ⟨fst, snd⟩.
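In Haskell, this component of the counit is the pair of projections (epsilonProd is our name):

epsilonProd :: ((a, b) -> a, (a, b) -> b)
epsilonProd = (fst, snd)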

Exercise 10.5.1. Derive the counit of the coproduct adjunction and the unit of the product
adjunction.

Triangle identities
We can use the unit/counit pair to formulate an equivalent definition of an adjunction. To do
that, we start with a pair of natural transformations:

𝜂 ∶ Id → 𝑅◦𝐿
𝜀 ∶ 𝐿◦𝑅 → Id

and impose additional triangle identities.


These identities can be derived from the standard definition of the adjunction by noticing
that 𝜂 can be used to replace an identity functor with the composite 𝑅◦𝐿, effectively letting us
insert 𝑅◦𝐿 anywhere an identity functor would work.
Similarly, 𝜀 can be used to eliminate the composite 𝐿◦𝑅 (i.e., replace it with identity).
So, for instance, starting with 𝐿:
𝐿 = 𝐿◦Id ─(𝐿◦𝜂)→ 𝐿◦𝑅◦𝐿 ─(𝜀◦𝐿)→ Id◦𝐿 = 𝐿

Here, we used the horizontal composition of natural transformations, with one of them being the
identity transformation (a.k.a. whiskering).
The first triangle identity is the condition that this chain of transformations results in the
identity natural transformation. Pictorially:

         𝐿◦𝜂
    𝐿 ──────→ 𝐿◦𝑅◦𝐿
      ╲          │
  𝑖𝑑𝐿   ╲        │ 𝜀◦𝐿
          ↘      ↓
              𝐿
Similarly, we want the following chain of natural transformations to also compose to identity:

𝑅 = Id◦𝑅 ─(𝜂◦𝑅)→ 𝑅◦𝐿◦𝑅 ─(𝑅◦𝜀)→ 𝑅◦Id = 𝑅

or, pictorially:

         𝜂◦𝑅
    𝑅 ──────→ 𝑅◦𝐿◦𝑅
      ╲          │
  𝑖𝑑𝑅   ╲        │ 𝑅◦𝜀
          ↘      ↓
              𝑅
It turns out that an adjunction can be alternatively defined in terms of the two natural trans-
formations, 𝜂 and 𝜀, satisfying the triangle identities:

(𝜀◦𝐿) ⋅ (𝐿◦𝜂) = 𝑖𝑑𝐿


(𝑅◦𝜀) ⋅ (𝜂◦𝑅) = 𝑖𝑑𝑅

From those, the mapping of hom-sets can be easily recovered. For instance, let's start with
an arrow 𝑓 ∶ 𝑥 → 𝑅𝑦, which is an element of 𝒞(𝑥, 𝑅𝑦). We can lift it to

𝐿𝑓 ∶ 𝐿𝑥 → 𝐿(𝑅𝑦)

We can then use the counit 𝜀 to collapse the composite 𝐿◦𝑅 to identity. The result, 𝜀𝑦◦𝐿𝑓, is an
arrow 𝐿𝑥 → 𝑦, which is an element of 𝒟(𝐿𝑥, 𝑦).
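In Haskell, anticipating the unit/counit formulation of the Adjunction type class defined later in this section, this recovery might be sketched as (fromTranspose is our name):

fromTranspose :: Adjunction left right => (x -> right y) -> (left x -> y)
fromTranspose f = counit . fmap f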
The definition of the adjunction using unit and counit is more general in the sense that it can
be translated to an arbitrary 2-category setting.
Exercise 10.5.2. Given an arrow 𝑔 ∶ 𝐿𝑥 → 𝑦 implement an arrow 𝑥 → 𝑅𝑦 using 𝜂 and the fact
that 𝑅 is a functor. Hint: Start with the object 𝑥 and see how you can get from there to 𝑅𝑦 with
one stopover.

The unit and counit of the currying adjunction


Let’s calculate the unit and the counit of the currying adjunction:

𝒞(𝑒 × 𝑎, 𝑏) ≅ 𝒞(𝑒, 𝑏ᵃ)

If we replace 𝑏 with 𝑒 × 𝑎, we get:

𝒞(𝑒 × 𝑎, 𝑒 × 𝑎) ≅ 𝒞(𝑒, (𝑒 × 𝑎)ᵃ)

Corresponding to the identity arrow on the left, we get the unit of the adjunction on the right:

𝜂 ∶ 𝑒 → (𝑒 × 𝑎)ᵃ

This is a curried version of the product constructor. In Haskell, we write it as:
unit :: e -> (a -> (e, a))
unit = curry id
The counit is more interesting. Replacing 𝑒 with 𝑏ᵃ we get:

𝒞(𝑏ᵃ × 𝑎, 𝑏) ≅ 𝒞(𝑏ᵃ, 𝑏ᵃ)

Corresponding to the identity arrow on the right, we get:

𝜀 ∶ 𝑏ᵃ × 𝑎 → 𝑏

which is the function application arrow.


In Haskell:

counit :: (a -> b, a) -> b
counit = uncurry id
When the adjunction is between two endofunctors, we can write an alternative Haskell def-
inition of it using the unit and the counit:
class (Functor left, Functor right) =>
      Adjunction left right | left -> right, right -> left where
  unit :: x -> right (left x)
  counit :: left (right x) -> x
The additional two clauses left -> right and right -> left tell the compiler that, when
using an instance of the adjunction, one functor can be derived from the other. This definition
requires the following compiler extensions:
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE FunctionalDependencies #-}
The two functors that form the currying adjunction can be written as:
data L r x = L (x, r) deriving (Functor, Show)
data R r x = R (r -> x) deriving Functor
and the Adjunction instance for currying is:
instance Adjunction (L r) (R r) where
  unit x = R (\r -> L (x, r))
  counit (L (R f, r)) = f r
The first triangle identity states that the following polymorphic function:
triangle :: L r x -> L r x
triangle = counit . fmap unit
is the identity, and so is the second one:
triangle' :: R r x -> R r x
triangle' = fmap counit . unit
Notice that these two functions require the use of functional dependencies to be well-defined.
Triangle identities cannot be expressed in Haskell, so it’s up to the implementor of the adjunction
to prove them.
Exercise 10.5.3. Test a few examples of the first triangle identity for the currying adjunction.
Here’s an example:
triangle (L (2, 'a'))

Exercise 10.5.4. How would you test the second triangle identity for the currying adjunction?
Hint: the result of triangle' is a function, so you can’t display it, but you could call it.

10.6 Adjunctions Using Universal Arrows


We’ve seen the definition of an adjunction using the isomorphism of hom-sets, and another one
using the pair of unit/counit. It turns out that we can define an adjunction using just one element
of this pair, as long as it satisfies certain universality condition. To see that, we will construct a
new category whose objects are arrows.

We’ve seen before an example of such a category—the slice category ∕𝑐 that collects all
the arrows that converge on 𝑐. Such a category describes the view of the object 𝑐 from every
possible angle in .

Comma category
When dealing with an adjunction:

(𝐿𝑑, 𝑐) ≅ (𝑑, 𝑅𝑐)

we are observing the object 𝑐 from a narrower perspective defined by the functor 𝐿. Think of
𝐿 as defining a model of the category  inside . We are interested in the view of 𝑐 from the
perspective of this model. The arrows that describe this view form the comma category 𝐿∕𝑐.

𝐿𝑑 𝑑

𝑐
𝐿
 

An object in the comma category 𝐿∕𝑐 is a pair ⟨𝑑, 𝑓⟩, where 𝑑 is an object of 𝒞 and
𝑓 ∶ 𝐿𝑑 → 𝑐 is an arrow in 𝒟.
A morphism from ⟨𝑑, 𝑓⟩ to ⟨𝑑′, 𝑓′⟩ is an arrow ℎ ∶ 𝑑 → 𝑑′ in 𝒞 that makes the following
triangle in 𝒟 commute:

        𝐿ℎ
  𝐿𝑑 ──────→ 𝐿𝑑′
    ╲         │
   𝑓  ╲       │ 𝑓′
        ↘     ↓
           𝑐

Universal arrow
The universal arrow from 𝐿 to 𝑐 is defined as the terminal object in the comma category 𝐿∕𝑐.
Let’s unpack this definition. The terminal object in 𝐿∕𝑐 is a pair ⟨𝑡, 𝜏⟩ with a unique morphism
from any object ⟨𝑑, 𝑓 ⟩. Such a morphism is an arrow ℎ ∶ 𝑑 → 𝑡 that satisfies the commuting
condition:
𝐿ℎ
𝐿𝑑 𝐿𝑡

𝑓 𝜏
𝑐

In other words, for any 𝑓 in the hom-set (𝐿𝑑, 𝑐) there is a unique element ℎ in the hom-set
(𝑑, 𝑡) such that:
𝑓 = 𝜏◦𝐿ℎ

Such a one-to-one mapping between elements of two hom-sets hints at the underlying adjunc-
tion.

Universal arrows from adjunctions


Let’s first convince ourselves that, when the functor 𝐿 has a right adjoint 𝑅, then for every 𝑐
there exists a universal arrow from 𝐿 to 𝑐. Indeed, this arrow is given by the pair ⟨𝑅𝑐, 𝜀𝑐 ⟩, where
𝜀 is the counit of the adjunction. First of all, the component of the counit has the right signature
for the object in the comma category 𝐿∕𝑐:

𝜀𝑐 ∶ 𝐿(𝑅𝑐) → 𝑐

We’d like to show that ⟨𝑅𝑐, 𝜀𝑐 ⟩ is the terminal object in 𝐿∕𝑐. That is, for any object
⟨𝑑, 𝑓 ∶ 𝐿𝑑 → 𝑐⟩ there is a unique ℎ ∶ 𝑑 → 𝑅𝑐 such that 𝑓 = 𝜀𝑐 ◦𝐿ℎ:

𝐿ℎ
𝐿𝑑 𝐿(𝑅𝑐)

𝑓 𝜀𝑐
𝑐

To prove this, let's write the naturality condition for 𝜙𝑑𝑐, viewed as a function of 𝑑:

𝜙𝑑𝑐 ∶ 𝒟(𝐿𝑑, 𝑐) → 𝒞(𝑑, 𝑅𝑐)

For any arrow ℎ ∶ 𝑑 → 𝑑′ the following diagram must commute:

               −◦𝐿ℎ
  𝒟(𝐿𝑑′, 𝑐) ─────────→ 𝒟(𝐿𝑑, 𝑐)
    𝜙𝑑′,𝑐 │                 │ 𝜙𝑑,𝑐
           ↓                 ↓
  𝒞(𝑑′, 𝑅𝑐) ─────────→ 𝒞(𝑑, 𝑅𝑐)
               −◦ℎ

We can use the Yoneda trick by setting 𝑑′ to 𝑅𝑐:

                 −◦𝐿ℎ
  𝒟(𝐿(𝑅𝑐), 𝑐) ─────────→ 𝒟(𝐿𝑑, 𝑐)
     𝜙𝑅𝑐,𝑐 │                  │ 𝜙𝑑,𝑐
            ↓                  ↓
  𝒞(𝑅𝑐, 𝑅𝑐) ─────────→ 𝒞(𝑑, 𝑅𝑐)
                 −◦ℎ
We can now pick the special element of the hom-set 𝒞(𝑅𝑐, 𝑅𝑐), namely the identity arrow 𝑖𝑑𝑅𝑐,
and propagate it through the rest of the diagram. The upper left corner becomes 𝜀𝑐 (the arrow
that 𝜙𝑅𝑐,𝑐 maps to the identity), the lower right corner becomes ℎ, and the upper right corner
becomes the adjoint to ℎ, which we called 𝑓:

           −◦𝐿ℎ
    𝜀𝑐 ─────────→ 𝑓
     │             │
  𝜙𝑅𝑐,𝑐         𝜙𝑑,𝑐
     ↓             ↓
  𝑖𝑑𝑅𝑐 ─────────→ ℎ
           −◦ℎ

The upper arrow then gives us the sought-after equality 𝑓 = (−◦𝐿ℎ)𝜀𝑐 = 𝜀𝑐◦𝐿ℎ.

Adjunction from universal arrows


The converse result is even more interesting. If, for every 𝑐, we have a universal arrow from 𝐿 to
𝑐, that is a terminal object ⟨𝑡𝑐 , 𝜀𝑐 ⟩ in the comma category 𝐿∕𝑐, then we can construct a functor

𝑅 that is the right adjoint to 𝐿. The action of this functor on objects is given by 𝑅𝑐 = 𝑡𝑐 , and
the family 𝜀𝑐 is automatically natural in 𝑐, and it forms the counit of the adjunction.
There is also a dual statement: An adjunction can be constructed starting from a family of
universal arrows 𝜂𝑑 , which form initial objects in the comma category 𝑑∕𝑅.
These results will help us prove Freyd's adjoint functor theorem.

10.7 Properties of Adjunctions


Left adjoints preserve colimits
We defined colimits as universal cocones. For every cocone—that is, a natural transformation
from the diagram 𝐷 ∶ 𝒥 → 𝒞 to the constant functor Δ𝑥—there's supposed to be a unique
factorizing morphism from the colimit Colim 𝐷 to 𝑥. This condition can be written as a one-to-
one correspondence between the set of cocones and a particular hom-set:

[𝒥, 𝒞](𝐷, Δ𝑥) ≅ 𝒞(Colim 𝐷, 𝑥)

The factorizing condition is encoded in the naturality of this isomorphism.


It turns out that the set of cocones, which is an object in 𝐒𝐞𝐭, is itself a limit of the following
𝐒𝐞𝐭-valued functor 𝐹 ∶ 𝒥 → 𝐒𝐞𝐭:
𝐹𝑗 = 𝒞(𝐷𝑗, 𝑥)
To show this, we’ll start with the limit of 𝐹 and end up with the set of cocones. You may
recall that a limit of a 𝐒𝐞𝐭-valued functor is equal to a set of cones with the apex 1 (the singleton
set). In our case, each such cone describes a selection of morphisms from the corresponding
hom-sets 𝒞(𝐷𝑗, 𝑥):

                 1
             ↙   ↓   ↘
  𝒞(𝐷𝑗₁, 𝑥)  𝒞(𝐷𝑗₂, 𝑥)  𝒞(𝐷𝑗₃, 𝑥)
Each of these morphisms has the same object 𝑥 as its target, so they form the sides of a cocone
with the apex 𝑥:

  𝐷𝑗₁    𝐷𝑗₂    𝐷𝑗₃
     ↘     ↓     ↙
           𝑥
The commuting conditions for the cone with the apex 1 are simultaneously the commuting conditions
for this cocone with the apex 𝑥. But these are exactly the cocones in the set [𝒥, 𝒞](𝐷, Δ𝑥).
We can therefore replace the original set of cocones with the limit of 𝒞(𝐷−, 𝑥) to get:

Lim 𝒞(𝐷−, 𝑥) ≅ 𝒞(Colim 𝐷, 𝑥)



The contravariant hom-functor is sometimes notated as:

ℎ𝑥 = 𝒞(−, 𝑥)

In this notation we can write:

Lim (ℎ𝑥◦𝐷) ≅ ℎ𝑥(Colim 𝐷)

The limit of a hom-functor acting on a diagram 𝐷 is isomorphic to the hom-functor acting on
a colimit of this diagram. This is usually abbreviated to: the hom-functor preserves colimits.
(With the understanding that the contravariant hom-functor turns colimits into limits.)
A functor that preserves colimits is called co-continuous. Thus the contravariant hom-
functor is co-continuous.
Now suppose that we have the adjunction 𝐿 ⊣ 𝑅, where 𝐿 ∶ 𝒞 → 𝒟 and 𝑅 goes in the
opposite direction. We want to show that the left functor 𝐿 preserves colimits, that is:

𝐿(Colim 𝐷) ≅ Colim(𝐿◦𝐷)

for any diagram 𝐷 ∶ 𝒥 → 𝒞 for which the colimit exists.


We’ll use the Yoneda lemma to show that the mappings out of both sides to an arbitrary 𝑥
are isomorphic:
(𝐿(Colim 𝐷), 𝑥) ≅ (Colim(𝐿◦𝐷), 𝑥)

We apply the adjunction to the left-hand side to get:

𝒟(𝐿(Colim 𝐷), 𝑥) ≅ 𝒞(Colim 𝐷, 𝑅𝑥)

Preservation of colimits by the hom-functor gives us:

≅ Lim (𝐷−, 𝑅𝑥)

Using the adjunction again, we get:

≅ Lim ((𝐿◦𝐷)−, 𝑥)

And the second application of preservation of colimits gives us the desired result:

≅ 𝒟(Colim (𝐿◦𝐷), 𝑥)

Since this is true for any 𝑥, we get our result.


We can use this result to reformulate our earlier proof of distributivity in a cartesian closed
category. We use the fact that the product is the left adjoint of the exponential. Left adjoints
preserve colimits. A coproduct is a colimit, therefore:

(𝑏 + 𝑐) × 𝑎 ≅ 𝑏 × 𝑎 + 𝑐 × 𝑎

Here, the left functor is 𝐿𝑥 = 𝑥 × 𝑎, and the diagram 𝐷 selects a pair of objects 𝑏 and 𝑐.

Right adjoints preserve limits


Using a dual argument, we can show that right adjoints preserve limits, that is:

𝑅(Lim 𝐷) ≅ Lim (𝑅◦𝐷)

We start by showing that the (covariant) hom-functor preserves limits.

Lim (𝑥, 𝐷−) ≅ (𝑥, Lim 𝐷)

This follows from the argument that the set of cones that defines the limit is isomorphic to the
limit of the 𝐒𝐞𝐭-valued functor:
𝐹𝑗 = 𝒞(𝑥, 𝐷𝑗)

A functor that preserves limits is called continuous.


To show that, given the adjunction 𝐿 ⊣ 𝑅, the right functor 𝑅 ∶ 𝒟 → 𝒞 preserves limits,
we use the Yoneda argument:

𝒞(𝑥, 𝑅(Lim 𝐷)) ≅ 𝒞(𝑥, Lim (𝑅◦𝐷))

Indeed, we have:

𝒞(𝑥, 𝑅(Lim 𝐷)) ≅ 𝒟(𝐿𝑥, Lim 𝐷) ≅ Lim 𝒟(𝐿𝑥, 𝐷−) ≅ 𝒞(𝑥, Lim (𝑅◦𝐷))

10.8 Freyd’s adjoint functor theorem


In general, functors are lossy—they are not invertible. In some cases we can make up for the lost
information by replacing it with the "best guess." If we do it in an organized manner, we end
up with an adjunction. The question is: given a functor between two categories, what are the
conditions under which we can construct its adjoint?
The answer to this question is given by Freyd's adjoint functor theorem. At first it might
seem like this is a technical theorem involving a very abstract construction called the solution
set condition. We'll see later that this condition translates directly to a programming technique
called defunctionalization.
In what follows, we'll focus our attention on constructing the right adjoint to a functor
𝐿 ∶ 𝒞 → 𝒟. A dual reasoning can be used to solve the converse problem of finding the left
adjoint to a functor 𝑅 ∶ 𝒟 → 𝒞.
The first observation is that, since the left functor in an adjunction preserves colimits, we
have to postulate that our functor 𝐿 preserves colimits. This gives us a hint that the construction
of the right adjoint relies on the ability to construct colimits in 𝒞, and on being able to somehow
transport them to 𝒟 using 𝐿.
We could demand that all colimits, large and small, exist in 𝒞, but this condition is too
strong. Even a small category that has all colimits is automatically a preorder—that is, it can't
have more than one morphism between any two objects.
But let's ignore size problems for a moment, and see how one would construct the right
adjoint to a colimit-preserving functor 𝐿, whose source category 𝒞 is small and has all colimits,
large and small (thus it is a preorder).

Freyd’s theorem in a preorder


The easiest way to define the right adjoint to 𝐿 is to construct, for every object 𝑐, a universal
arrow from 𝐿 to 𝑐. Such an arrow is the terminal object in the comma category 𝐿∕𝑐—the
category of arrows which originate in the image of 𝐿 and converge on the object 𝑐.


The important observation is that this comma category describes a cocone in 𝒟. The base
of this cocone is formed by those objects in the image of 𝐿 that have an unobstructed view of 𝑐.
The arrows in the base of the cocone are the morphisms in 𝐿∕𝑐. These are exactly the arrows
that make the sides of the cocone commute:

        𝐿ℎ
  𝐿𝑑 ──────→ 𝐿𝑑′
    ╲         │
   𝑓  ╲       │ 𝑓′
        ↘     ↓
           𝑐

The base of this cocone can then be projected back to 𝒞. There is a projection 𝜋𝑐 which maps
every pair ⟨𝑑, 𝑓⟩ in 𝐿∕𝑐 back to 𝑑, thus forgetting the arrow 𝑓. It also maps every morphism in
𝐿∕𝑐 to the arrow in 𝒞 that gave rise to it. This way 𝜋𝑐 defines a diagram in 𝒞. The colimit of
this diagram exists, because we have assumed that all colimits exist in 𝒞. Let's call this colimit
𝑡𝑐:
𝑡𝑐 = colim 𝜋𝑐

(Picture: the projection 𝜋𝑐 takes the comma category 𝐿∕𝑐, whose arrows converge on 𝑐 in 𝒟,
to a diagram in 𝒞 whose colimit is 𝑡𝑐.)

Let’s see if we can use this 𝑡𝑐 to construct a terminal object in 𝐿∕𝑐. We have to find an
arrow, let’s call it 𝜀𝑐 ∶ 𝐿𝑡𝑐 → 𝑐, such that the pair ⟨𝑡𝑐 , 𝜀𝑐 ⟩ is terminal in 𝐿∕𝑐.
Notice that 𝐿 maps the diagram generated by 𝜋𝑐 back to the base of the cocone defined by
𝐿∕𝑐. The projection 𝜋𝑐 did nothing more than to ignore the sides of this cocone, leaving its base
intact.
We now have two cocones in 𝒟 with the same base: the original one with the apex 𝑐 and the
new one obtained by applying 𝐿 to the cocone in 𝒞. Since 𝐿 preserves colimits, the colimit of
the new cocone is 𝐿𝑡𝑐—the image of the colimit 𝑡𝑐:

colim (𝐿◦𝜋𝑐 ) = 𝐿(colim 𝜋𝑐 ) = 𝐿𝑡𝑐


By universal construction, we deduce that there must be a unique cocone morphism from the
colimit 𝐿𝑡𝑐 to 𝑐. That morphism, which we’ll call 𝜀𝑐 , makes all the relevant triangles commute.
What remains to be shown is that ⟨𝑡𝑐 , 𝜀𝑐 ⟩ is terminal in 𝐿∕𝑐, that is, for any ⟨𝑑, 𝑓 ∶ 𝐿𝑑 → 𝑐⟩
there is a unique comma-category morphism ℎ ∶ 𝑑 → 𝑡𝑐 that makes the following triangle
commute:

        𝐿ℎ
  𝐿𝑑 ──────→ 𝐿𝑡𝑐
    ╲         │
   𝑓  ╲       │ 𝜀𝑐
        ↘     ↓
           𝑐
Notice that any such 𝑑 is automatically part of the diagram produced by 𝜋𝑐 (it's the result
of 𝜋𝑐 acting on ⟨𝑑, 𝑓⟩). We know that 𝑡𝑐 is the colimit of the 𝜋𝑐 diagram. So there must be a wire
from 𝑑 to 𝑡𝑐 in the colimit cocone. We pick this wire as our ℎ.

(Picture: in 𝒞, the wire ℎ ∶ 𝑑 → 𝑡𝑐 is a side of the colimit cocone; 𝐿 maps it to 𝐿ℎ ∶ 𝐿𝑑 → 𝐿𝑡𝑐
in 𝒟, where it connects the side 𝑓 with the side 𝜀𝑐.)

The commuting condition then follows from 𝜀𝑐 being a morphism between two cocones. It is a
unique cocone morphism simply because 𝒞 is a preorder.
This proves that there is a universal arrow ⟨𝑡𝑐 , 𝜀𝑐 ⟩ for every 𝑐, therefore we have a functor
𝑅 defined on objects as 𝑅𝑐 = 𝑡𝑐 that is the right adjoint to 𝐿.

Solution set condition


The problem with the previous proof is that comma categories in most practical cases are large:
their objects don’t form a set. But maybe we can approximate the comma category by selecting
a smaller but representative set of objects and arrows?
To select the objects, we’d use a mapping from some indexing set 𝐼. We define a set of
objects 𝑑𝑖 where 𝑖 ∈ 𝐼. Since we are trying to approximate the comma category 𝐿∕𝑐, we select
objects together with arrows 𝑓𝑖 ∶ 𝐿𝑑𝑖 → 𝑐.
The relevant part of the comma category is encoded in morphisms between objects satisfying
the commuting condition. We could try to specialize this condition to only apply inside our
family of objects, but that would not be enough. We have to find a way to probe all other objects
of the comma category.
To do this, we reinterpret the commuting condition as a recipe for factorizing an arbitrary
𝑓 ∶ 𝐿𝑑 → 𝑐 through some pair ⟨𝑑𝑖 , 𝑓𝑖 ⟩:

        𝐿ℎ
  𝐿𝑑 ──────→ 𝐿𝑑𝑖
    ╲         │
   𝑓  ╲       │ 𝑓𝑖
        ↘     ↓
           𝑐

A solution set is a family of pairs ⟨𝑑𝑖 , 𝑓𝑖 ∶ 𝐿𝑑𝑖 → 𝑐⟩ indexed by a set 𝐼 that can be used
to factor any pair ⟨𝑑, 𝑓 ∶ 𝐿𝑑 → 𝑐⟩. It means that there exists an index 𝑖 ∈ 𝐼 and an arrow
ℎ ∶ 𝑑 → 𝑑𝑖 that factorizes 𝑓 :
𝑓 = 𝑓𝑖 ◦𝐿ℎ

Another way of expressing this property is to say that there exists a weakly terminal set of
objects in the comma category 𝐿∕𝑐. A weakly terminal set has the property that for any object
in the category there is a morphism to at least one object in the set.
Previously we’ve seen that having the terminal object in the comma category 𝐿∕𝑐 for every
𝑐 is enough to define the adjunction. It turns out that we can achieve the same goal using the
solution set.
The assumptions of Freyd's adjoint functor theorem state that we have a colimit-preserving
functor 𝐿 ∶ 𝒞 → 𝒟 from a small cocomplete category. Both these conditions relate to small
diagrams. If we can pick a solution set ⟨𝑑𝑖, 𝑓𝑖 ∶ 𝐿𝑑𝑖 → 𝑐⟩ for every 𝑐, then the right adjoint 𝑅
exists. Solution sets for different 𝑐's may be different.
We’ve seen before that in a cocomplete category the existence of a weakly terminal set is
enough to define a terminal object. In our case it means that, for any 𝑐, we can construct the
universal arrow from 𝐿 to 𝑐. And this is enough to define the whole adjunction.
A dual version of the adjoint functor theorem can be used to construct the left adjoint.

Defunctionalization
Every programming language lets us define functions, but not all languages support higher-order
functions (functions taking functions as arguments, functions returning functions, or data types
constructed from functions) or anonymous functions (a.k.a. lambdas). It turns out that, even in
such languages, higher-order functions can be implemented using a process called defunction-
alization. This technique is based on the adjoint functor theorem. Moreover, defunctionalization
can be used whenever passing functions around is impractical, for instance in a distributed sys-
tem.
The idea behind defunctionalization is that the function type is defined as the right adjoint
to the product:
𝒞(𝑒 × 𝑎, 𝑏) ≅ 𝒞(𝑒, 𝑏ᵃ)

The adjoint functor theorem can be used to approximate this adjoint.


In general, any finite program can only have a finite number of function definitions. These
functions (together with the environments they capture) form the solution set that we can use
to construct the function type. In practice, we do it only for a small subset of functions which
occur as arguments to, or are returned from, other functions.
A typical example of the usage of higher order functions is in continuation passing style.
For instance, here’s a function that calculates the sum of the elements of a list. But instead of
returning the sum it calls a continuation k with the result:
sumK :: [Int] -> (Int -> r) -> r
sumK [] k = k 0
sumK (i : is) k =
  sumK is (\s -> k (i + s))
If the list is empty, the function calls the continuation with zero. Otherwise it calls itself recur-
sively, with two arguments: the tail of the list is, and a new continuation:

\s -> k (i + s)
This new continuation calls the previous continuation k, passing it the sum of the head of the
list and its argument s (which is the accumulated sum).
Notice that this lambda is a closure: It’s a function of one variable s, but it also has access
to k and i from its environment.
To extract the final sum, we call our recursive function with the trivial continuation, the
identity:
sumList :: [Int] -> Int
sumList as = sumK as (\i -> i)
Anonymous functions are convenient, but nothing prevents us from using named functions.
However, if we want to factor out the continuations, we have to be explicit about passing in the
environments.
For instance, we can replace our first lambda:
\s -> k (i + s)
with the function more, but we have to explicitly pass the pair (i, k) as the environment of
the type (Int, Int -> r):
more :: (Int, Int -> r) -> Int -> r
more (i, k) s = k (i + s)
The other lambda, the identity, uses the empty environment, so it becomes:
done :: Int -> Int
done i = i
Here’s the implementation of our algorithm using these two named functions:
sumK' :: [Int] -> (Int -> r) -> r
sumK' [] k = k 0
sumK' (i : is) k =
  sumK' is (more (i, k))

sumList :: [Int] -> Int
sumList is = sumK' is done
In fact, if all we are interested in is calculating the sum, we can replace the polymorphic
type r with Int with no other changes.
This implementation still uses higher-order functions. In order to eliminate them, we have
to analyze what it means to pass a function as an argument. Such a function can only be used
in one way: it can be applied to its arguments. This property of a function type is expressed as
the counit of the currying adjunction:

𝜀 ∶ 𝑏ᵃ × 𝑎 → 𝑏

or, in Haskell, as a higher-order function:

apply :: (a -> b, a) -> b
This time we are interested in constructing the counit from first principles. We’ve seen that this
can be accomplished using the comma category. In our case, an object of the comma category
for the product functor 𝐿𝑎 = (−) × 𝑎 is a pair

(𝑒, 𝑓 ∶ (𝑒 × 𝑎) → 𝑏)

or, in Haskell:
data Comma a b e = Comma e ((e, a) -> b)
A morphism in this category between (𝑒, 𝑓) and (𝑒′, 𝑓′) is an arrow ℎ ∶ 𝑒 → 𝑒′ which satisfies
the commuting condition:
𝑓′◦(ℎ × 𝑎) = 𝑓

We interpret this morphism as “reducing” the environment 𝑒 down to 𝑒′ . The arrow 𝑓 ′ is able
to produce the same output of the type 𝑏 using a potentially smaller environment given by ℎ(𝑒).
For instance 𝑒 may contain variables that are irrelevant for computing 𝑏 from 𝑎, and ℎ projects
them out.

           ℎ×𝑎
  𝑒 × 𝑎 ──────→ 𝑒′ × 𝑎
       ╲          │
      𝑓  ╲        │ 𝑓′
           ↘      ↓
              𝑏

In fact, we performed this kind of reduction when defining more and done. In principle, we
could have passed the tail is to both functions, since it’s accessible at the point of call. But we
knew that they didn’t need it.
Using the Freyd’s theorem, we could define the function object 𝑎 → 𝑏 as the colimit of the
diagram defined by the comma category. Such a colimit is essentially a giant coproduct of all
environments modulo identifications given by comma-category morphisms. This identification
does the job of reducing the environment needed by 𝑎 → 𝑏 to the bare minimum.
In our example, the continuations we're interested in are functions Int -> Int. In fact, we
are not interested in generating the generic function type Int -> Int; just the minimal one that
would accommodate our two functions more and done. We can do it by creating a very small
solution set.
In our case the solution set consists of pairs (𝑒𝑖, 𝑓𝑖 ∶ 𝑒𝑖 × 𝑎 → 𝑏) such that any pair (𝑒, 𝑓 ∶ 𝑒 ×
𝑎 → 𝑏) can be factorized through one of the 𝑓𝑖's. More precisely, the only two environments
we're interested in are (Int, Int -> Int) for more, and the empty environment () for done.
In principle, our solution set should allow for the factorization of every object of the comma
category, that is a pair of the type:
(e, (e, Int) -> Int)
but here we are only interested in two specific functions. Also, we are not concerned with the
uniqueness of the representation so, instead of using a colimit (as we did for the adjoint functor
theorem), we'll just use a coproduct of the environments of interest. We end up with the following
data type, the sum of the two environments () and (Int, Int -> Int) we're interested in:
data Kont = Done | More Int Kont
Notice that we have recursively encoded the Int->Int part of the environment as Kont. Thus
we have also removed the need to use functions as arguments to data constructors.
If you look at this definition carefully, you will discover that it’s the definition of a list of
Int, modulo some renamings. Every call to More pushes another integer on the Kont stack.
This interpretation agrees with our intuition that recursive algorithms require some kind of a
runtime stack.

We are now ready to implement our approximation to the counit of the adjunction. It’s
composed from the bodies of the two functions, with the understanding that recursive calls also
go through apply:
apply :: (Kont, Int) -> Int
apply (Done, i) = i
apply (More i k, s) = apply (k, i + s)
Compare this with our earlier:
done i = i
more (i, k) s = k (i + s)
The main algorithm can now be rewritten without any higher order functions or lambdas:
sumK'' :: [Int] -> Kont -> Int
sumK'' [] k = apply (k, 0)
sumK'' (i : is) k = sumK'' is (More i k)
sumList'' is = sumK'' is Done
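For instance, evaluating sumList'' [1, 2, 3] first builds up the Kont stack and then unwinds it through apply:

-- sumK'' [1, 2, 3] Done
-- = sumK'' [2, 3] (More 1 Done)
-- = sumK'' [3] (More 2 (More 1 Done))
-- = sumK'' [] (More 3 (More 2 (More 1 Done)))
-- = apply (More 3 (More 2 (More 1 Done)), 0)
-- = apply (More 2 (More 1 Done), 3)
-- = apply (More 1 Done, 5)
-- = apply (Done, 6)
-- = 6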
The main advantage of defunctionalization is that it can be used in distributed environments.
Arguments to remote functions, as long as they are data structures and not functions, can be
serialized and sent along the wire. All that’s needed is for the receiver to have access to apply.

10.9 Free/Forgetful Adjunctions


The two functors in the adjunction play different roles: the picture of the adjunction is not
symmetric. Nowhere is this illustrated better than in the case of the free/forgetful adjunctions.
A forgetful functor is a functor that “forgets” some of the structure of its source category.
This is not a rigorous definition but, in most cases, it’s pretty obvious what structure is being
forgotten. Very often the target category is just the category of sets, which is considered the
epitome of structurelessness. The result of the forgetful functor in that case is called the “un-
derlying” set, and the functor itself is often called 𝑈 .
More precisely, we say that a functor forgets structure if the mapping of hom-sets is not
surjective, that is, there are arrows in the target hom-set that have no corresponding arrows in
the source hom-set. Intuitively, it means that the arrows in the source have some structure to
preserve, so there are fewer of them; and that structure is absent in the target.
The left adjoint to a forgetful functor is called a free functor.

𝒟(𝐹𝑥, 𝑦) ≅ 𝒞(𝑥, 𝑈𝑦)

A classic example of a free/forgetful adjunction is the construction of the free monoid.

The category of monoids


Monoids in a monoidal category 𝒞 form their own category 𝐌𝐨𝐧(𝒞). Its objects are monoids,
and its arrows are the arrows of 𝒞 that preserve the monoidal structure.

The following diagram explains what it means for 𝑓 to be a monoid morphism, going from
a monoid (𝑀1 , 𝜂1 , 𝜇1 ) to a monoid (𝑀2 , 𝜂2 , 𝜇2 ):

        𝜂₁          𝜇₁
   𝐼 ─────→ 𝑀₁ ←────── 𝑀₁ ⊗ 𝑀₁
   ║         │             │
   ║         │ 𝑓           │ 𝑓⊗𝑓
   ║         ↓             ↓
   𝐼 ─────→ 𝑀₂ ←────── 𝑀₂ ⊗ 𝑀₂
        𝜂₂          𝜇₂

A monoid morphism 𝑓 must map unit to unit, which means that:

𝑓 ◦𝜂1 = 𝜂2

and it must map multiplication to multiplication:

𝑓 ◦𝜇1 = 𝜇2 ◦(𝑓 ⊗ 𝑓 )

Remember, the tensor product ⊗ is functorial, so it can lift pairs of arrows, as in 𝑓 ⊗ 𝑓 .
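Jumping ahead to Haskell for a moment, here's a minimal sketch of a concrete monoid morphism (toLen is our name): taking the length maps the list monoid, with concatenation as multiplication, to the additive monoid of integers, and both conditions hold:

import Data.Monoid (Sum (..))

toLen :: [a] -> Sum Int
toLen = Sum . length

-- unit to unit:                   toLen [] == mempty
-- multiplication to multiplication: toLen (xs ++ ys) == toLen xs <> toLen ys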


In particular, the category 𝐒𝐞𝐭 is monoidal, with the cartesian product and the terminal object
providing the monoidal structure.
Monoids in 𝐒𝐞𝐭 are then sets with additional structure. They form their own category
𝐌𝐨𝐧(𝐒𝐞𝐭), and there is a forgetful functor 𝑈 that simply maps a monoid to the set of its
elements. When we say that a monoid is a set, we mean the underlying set.

Free monoid
We want to construct the free functor

𝐹 ∶ 𝐒𝐞𝐭 → 𝐌𝐨𝐧(𝐒𝐞𝐭)

that is adjoint to the forgetful functor 𝑈 .


We start with an arbitrary set 𝑋 and an arbitrary monoid 𝑚. On the right-hand side of the
adjunction we have the set of functions from 𝑋 to 𝑈 𝑚. On the left-hand side, we have a set of
highly constrained structure-preserving monoid morphisms from 𝐹 𝑋 to 𝑚. How can these two
hom-sets be isomorphic?
In 𝐌𝐨𝐧(𝐒𝐞𝐭), monoids are sets of elements, and monoid morphisms are functions between
such sets, satisfying additional constraints: preserving unit and multiplication.
Arrows in 𝐒𝐞𝐭, on the other hand, are just functions with no additional constraints. So, in
general, there are fewer arrows between monoids than there are between their underlying sets.

𝐌𝐨𝐧(𝐒𝐞𝐭)(𝐹𝑋, 𝑚) ≅ 𝐒𝐞𝐭(𝑋, 𝑈𝑚)

Here’s the idea: if we want to have a one to one matching between arrows, we want 𝐹 𝑋 to
be much larger than 𝑋. This way, there will be many more functions from it to 𝑚—so many
128 CHAPTER 10. ADJUNCTIONS

that, even after rejecting the ones that don’t preserve the structure, we’ll still have enough to
match every function 𝑓 ∶ 𝑋 → 𝑈 𝑚.
We’ll construct the monoid 𝐹 𝑋 starting from the set 𝑋, and adding more and more elements
as we go. We’ll call the initial set 𝑋 the generators of 𝐹 𝑋. We’ll construct a monoid morphism
𝑔 ∶ 𝐹 𝑋 → 𝑚 starting with the original function 𝑓 and extending it to act on more and more
elements.
On generators, 𝑥 ∈ 𝑋, 𝑔 works the same as 𝑓 :

𝑔𝑥 = 𝑓 𝑥

Since 𝐹 𝑋 is supposed to be a monoid, it has to have a unit. We can’t pick one of the
generators to be the unit, because it would impose constraints on the part of 𝑔 that is already
fixed by 𝑓 —it would have to map it to the unit 𝑒′ of 𝑚. So we’ll just add an extra element 𝑒 to
𝐹 𝑋 and call it the unit. We’ll define the action of 𝑔 on it by saying that it is mapped to the unit
𝑒′ of 𝑚:
𝑔𝑒 = 𝑒′
We also have to define monoidal multiplication in 𝐹 𝑋. Let’s start with a product of two
generators 𝑎 and 𝑏. The result of the multiplication cannot be another generator because, again,
that would constrain the part of 𝑔 that’s fixed by 𝑓 —products must be mapped to products. So
we have to make all products of generators new elements of 𝐹 𝑋. Again, the action of 𝑔 on those
products is fixed:
𝑔(𝑎 ⋅ 𝑏) = 𝑔𝑎 ⋅ 𝑔𝑏
Continuing with this construction, any new multiplication produces a new element of 𝐹 𝑋,
except when it can be reduced to an existing element by applying monoid laws. For instance,
the new unit 𝑒 times a generator 𝑎 must be equal to 𝑎. But we have made sure that 𝑒 is mapped
to the unit of 𝑚, so the product 𝑔𝑒 ⋅ 𝑔𝑎 is automatically equal to 𝑔𝑎.
Another way of looking at this construction is to think of the set 𝑋 as an alphabet. The
elements of 𝐹 𝑋 are then strings of characters from this alphabet. The generators are single-
letter strings, “𝑎”, “𝑏”, and so on. The unit is an empty string “”. Multiplication is string
concatenation, so “𝑎” times “𝑏” is a new string “𝑎𝑏”. Concatenation is automatically associative
and unital, with the empty string as the unit.
The intuition behind free functors is that they generate structure “freely,” as in “with no ad-
ditional constraints.” They also do it lazily: instead of performing operations, they just record
them. They create generic domain-specific programs that can be executed later by specific in-
terpreters.
The free monoid “remembers to do the multiplication” at a later time. It stores the arguments
to multiplication in a string, but doesn’t perform the multiplication. It’s only allowed to simplify
its records based on generic monoidal laws. For instance, it doesn’t have to store the command
to multiply by the unit. It can also “skip the parentheses” because of associativity.
Exercise 10.9.1. What is the unit and the counit of the free monoid adjunction 𝐹 ⊣ 𝑈 ?

Free monoid in programming


In Haskell, monoids are defined using the following typeclass:
class Monoid m where
  mappend :: m -> m -> m
  mempty :: m

Here, mappend is the curried form of the mapping from the product: (m, m) -> m. The mempty
element corresponds to the arrow from the terminal object (unit of the monoidal category), or
simply an element of m.
A free monoid generated by some type a, which serves as a set of generators, is represented
by a list type [a]. An empty list serves as the unit; and monoid multiplication is implemented
as list concatenation, traditionally written in infix form:
(++) :: [a] -> [a] -> [a]
(++) [] ys = ys
(++) (x:xs) ys = x : xs ++ ys
A list is an instance of a Monoid:
instance Monoid [a] where
  mempty = []
  mappend = (++)
To show that it’s a free monoid, we have to be able to construct a monoid morphism from
the list of a to an arbitrary monoid m, provided we have an (unconstrained) mapping from a to
(the underlying set of) m. We can’t express all of this in Haskell, but we can define the function:
foldMap :: Monoid m => (a -> m) -> ([a] -> m)
foldMap f = foldr mappend mempty . fmap f
This function transforms the elements of the list to monoidal values using f and then folds them
using mappend, starting with the unit mempty.
It’s easy to see that an empty list is mapped to the monoidal unit. It’s not too hard to see
that a concatenation of two lists is mapped to the monoidal product of the results. So, indeed,
foldMap is a monoid morphism.
Following the intuition of a free monoid being a domain-specific program for multiplying
stuff, foldMap provides an interpreter for this program. It performs all the multiplications that
have been postponed. Note that the same program may be interpreted in many different ways,
depending on the choice of the concrete monoid and the function f.
We’ll come back to free monoids as lists in the chapter on algebras.

Exercise 10.9.2. Write a program that takes a list of integers and interprets it in two different
ways: once using the additive and once using the multiplicative monoid of integers.

10.10 The Category of Adjunctions


We can define composition of adjunctions by taking advantage of the composition of functors
that define them. Two adjunctions, 𝐿 ⊣ 𝑅 and 𝐿′ ⊣ 𝑅′ , are composable if they share the
category in the middle:
      𝐿          𝐿′
  ℰ ─────→ 𝒟 ─────→ 𝒞
  ℰ ←───── 𝒟 ←───── 𝒞
      𝑅          𝑅′

By composing the functors we get a new adjunction (𝐿′ ◦𝐿) ⊣ (𝑅◦𝑅′ ).


Indeed, let’s consider the hom-set:

𝒞(𝐿′(𝐿𝑒), 𝑐)

Using the 𝐿′ ⊣ 𝑅′ adjunction, we can transpose 𝐿′ to the right, where it becomes 𝑅′ :

𝒟(𝐿𝑒, 𝑅′𝑐)

and using 𝐿 ⊣ 𝑅 we can similarly transpose 𝐿:

ℰ(𝑒, 𝑅(𝑅′𝑐))

Combining these two isomorphisms, we get the composite adjunction:

𝒞((𝐿′◦𝐿)𝑒, 𝑐) ≅ ℰ(𝑒, (𝑅◦𝑅′)𝑐)

Because functor composition is associative, the composition of adjunctions is also associative.
It's easy to see that a pair of identity functors forms a trivial adjunction that serves as
the identity with respect to composition of adjunctions. Therefore we can define a category
𝐀𝐝𝐣(𝐂𝐚𝐭) in which objects are categories and arrows are adjunctions (by convention, pointing
in the direction of the left adjoint).
Adjunctions can be defined purely in terms of functors and natural transformations, that
is 1-cells and 2-cells in the 2-category 𝐂𝐚𝐭. There is nothing special about 𝐂𝐚𝐭, and in fact
adjunctions can be defined in any 2-category. Moreover, the category of adjunctions is itself a
2-category.

10.11 Levels of Abstraction


Category theory is about structuring our knowledge. In particular, it can be applied to the knowl-
edge of category theory itself. Hence we see a lot of mixing of abstraction levels in category
theory. The structures that we see at one level can be grouped into higher-level structures which
exhibit even higher levels of structure, and so on.
In programming we are used to building hierarchies of abstractions. Values are grouped into
types, types into kinds. Functions that operate on values are treated differently than functions
that operate on types. We often use different syntax to separate levels of abstractions. Not so in
category theory.
A set, categorically speaking, can be described as a discrete category. Elements of the set are
objects of this category and, other than the obligatory identity morphisms, there are no arrows
between them.
The same set can then be seen as an object in the category 𝐒𝐞𝐭. Arrows in this category are
functions between sets.
The category 𝐒𝐞𝐭, in turn, is an object in the category 𝐂𝐚𝐭. Arrows in 𝐂𝐚𝐭 are functors.
Functors between any two categories 𝒞 and 𝒟 are objects in the functor category [𝒞, 𝒟].
Arrows in this category are natural transformations.
We can define functors between functor categories, product categories, opposite categories,
and so on, ad infinitum.
Completing the circle, hom-sets in every category are sets. We can define mappings and
isomorphisms between them, reaching across disparate categories. Adjunctions are possible
because we can compare hom-sets that live in different categories.
Chapter 11

Dependent Types

We’ve seen types that depend on other types. They are defined using type constructors with type
parameters, like Maybe or []. Most programming languages have some support for generic data
types—data types parameterized by other data types.
Categorically, such types are modeled as functors.¹
A natural generalization of this idea is to have types that are parameterized by values. For
instance, it’s often advantageous to encode the length of a list in its type. A list of length zero
would have a different type than a list of length one, and so on.
Obviously, you cannot change the length of such a list, since it would change its type. This
is not a problem in functional programming, where all data types are immutable anyway. When
you prepend an element to a list, you create a new list, at least conceptually. With a length-
encoded list, this new list is of a different type, that’s all!
Types parameterized by values are called dependent types. There are languages like Idris
or Agda that have full support for dependent types. It’s also possible to implement dependent
types in Haskell, but support for them is still rather patchy.
The reason for using dependent types in programming is to make programs provably correct.
In order to do that, the compiler must be able to check the assumptions made by the programmer.
Haskell, with its strong type system, is able to uncover a lot of bugs at compile time. For
instance, it won’t let you write a <> b (infix notation for mappend), unless you provide the
Monoid instance for the type of your variables.
However, within Haskell’s type system, there is no way to express or, much less enforce,
the unit and associativity laws for the monoid. For that, the instance of the Monoid type class
would have to carry with itself proofs of equality (not actual code):
assoc :: m <> (n <> p) = (m <> n) <> p
lunit :: mempty <> m = m
runit :: m <> mempty = m
Dependent types, and equality types in particular, pave the way towards this goal.
The material in this chapter is more advanced, and not used in the rest of the book, so you
may safely skip it on first reading. Also, to avoid confusion between fibers and functions, I
decided to use capital letters for objects in parts of this chapter.

¹ A type constructor that has no Functor instance can be thought of as a functor from a discrete category—a
category with no arrows other than identities.


11.1 Dependent Vectors


We’ll start with the standard example of a counted list, or a vector:
data Vec n a where
  VNil :: Vec Z a
  VCons :: a -> Vec n a -> Vec (S n) a
The compiler will recognize this definition as dependently typed if you include the following
language pragmas:
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
The first argument to the type constructor is a natural number n. Notice: this is a value, not a
type. The type checker is able to figure this out from the usage of n in the two data constructors.
The first one creates a vector of the type Vec Z a, and the second creates a vector of the type
Vec (S n) a, where Z and S are defined as the constructors of natural numbers:
data Nat = Z | S Nat
We can be more explicit about the parameters if we use the pragma:
{-# LANGUAGE KindSignatures #-}
and import the library:
import Data.Kind
We can then specify that n is a Nat, whereas a is a Type:
data Vec (n :: Nat) (a :: Type) where
  VNil :: Vec Z a
  VCons :: a -> Vec n a -> Vec (S n) a
Using one of these definitions we can, for instance, construct a vector (of integers) of length
zero:
emptyV :: Vec Z Int
emptyV = VNil
It has a different type than a vector of length one:
singleV :: Vec (S Z) Int
singleV = VCons 42 VNil
and so on.
We can now define a dependently typed function that returns the first element of a vector:
headV :: Vec (S n) a -> a
headV (VCons a _) = a
This function is guaranteed to work exclusively with non-zero-length vectors. These are the
vectors whose size matches (S n), which cannot be Z. If you try to call this function with
emptyV, the compiler will flag the error.
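For example (firstElem is our name):

firstElem :: Int
firstElem = headV singleV  -- compiles, returns 42

-- headV emptyV  -- rejected at compile time:
--                  Vec Z Int doesn't match the expected Vec (S n) Int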
Another example is a function that zips two vectors together. Encoded in its type signature
is the requirement that the two vectors be of the same size n (the result is also of the size n):
zipV :: Vec n a -> Vec n b -> Vec n (a, b)
zipV (VCons a as) (VCons b bs) = VCons (a, b) (zipV as bs)
zipV VNil VNil = VNil


Dependent types are especially useful when encoding the shapes of containers. For instance,
the shape of a list is encoded in its length. A more advanced example would encode the shape
of a tree as a runtime value.

Exercise 11.1.1. Implement the function tailV that returns the tail of the non-zero-length vec-
tor. Try calling it with emptyV.

11.2 Dependent Types Categorically


The easiest way to visualize dependent types is to think of them as families of types indexed by
elements of a set. In the case of counted vectors, the indexing set would be the set of natural
numbers ℕ.
The zeroth type would be the unit type () representing an empty vector. The type corre-
sponding to (S Z) would be a; then we’d have a pair (a, a), a triple (a, a, a) and so on,
with higher and higher powers of a.
If we want to talk about the whole family as one big set, we can take the sum of all these
types. For instance, the sum of all powers of 𝑎 is the familiar list type, a.k.a., a free monoid:

𝐿𝑖𝑠𝑡(𝑎) = 1 + 𝑎 + 𝑎 × 𝑎 + 𝑎 × 𝑎 × 𝑎 + ... = ∑ₙ∶ℕ 𝑎ⁿ

Fibrations
Although intuitively easy to visualize, this point of view doesn’t generalize nicely to category
theory, where we don’t like mixing sets with objects. So we turn this picture on its head and
instead of talking about injecting family members into the sum, we consider a mapping that
goes in the opposite direction.
This, again, we can first visualize using sets. We have one big set 𝐸 describing the entire
family, and a function 𝑝 called the projection, or a display map, that goes from 𝐸 down to the
indexing set 𝐵 (also called the base).
This function will, in general, map multiple elements to one. We can then talk about the
inverse image of a particular element 𝑥 ∈ 𝐵 as the set of elements that get mapped down to it by
𝑝. This set is called the fiber and is written 𝑝⁻¹𝑥 (even though, in general, 𝑝 is not invertible in
the usual sense). Seen as a collection of fibers, 𝐸 is often called a fiber bundle or just a bundle.

(Picture: the bundle 𝐸, with the fiber 𝑝⁻¹𝑥 sitting above the point 𝑥 of the base 𝐵.)

Now forget about sets. A fibration in an arbitrary category is a pair of objects 𝑒 and 𝑏 and
an arrow 𝑝 ∶ 𝑒 → 𝑏.
So this is really just an arrow, but the context is everything. When an arrow is called a
fibration, we use the intuition from sets, and imagine its source 𝑒 as a collection of fibers, with
𝑝 projecting each fiber down to a single point in the base 𝑏.

We can go even further: since (small) categories form a category 𝐂𝐚𝐭 with functors as
arrows, we can define a fibration of a category, taking another category as its base.

Type families as fibrations


We will therefore model type families as fibrations. For instance, our counted-vector family can
be represented as a fibration whose base is the type of natural numbers. The whole family is a
sum (coproduct) of consecutive powers (products) of 𝑎:

𝐿𝑖𝑠𝑡(𝑎) = 𝑎⁰ + 𝑎¹ + 𝑎² + ... = ∑ₙ∶ℕ 𝑎ⁿ

with the zeroth power—the terminal object—representing the vector of size zero.

(Picture: the fibers 𝑎⁰, 𝑎¹, 𝑎², ... planted over the points 0, 1, 2, 3, 4, ... of the base ℕ.)

The projection 𝑝 ∶ 𝐿𝑖𝑠𝑡(𝑎) → ℕ is the familiar 𝑙𝑒𝑛𝑔𝑡ℎ function.


In category theory we like to describe things in bulk—defining internal structure of things
by structure-preserving maps between them. Such is the case with fibrations. If we fix the base
object 𝑏 and consider all possible source objects in the category 𝒞, and all possible projections
down to 𝑏, we get a slice category 𝒞∕𝑏. This category represents all the ways we can slice the
objects of 𝒞 over the base 𝑏.
Recall that the objects in the slice category are pairs ⟨𝑒, 𝑝 ∶ 𝑒 → 𝑏⟩, and a morphism between
two objects ⟨𝑒, 𝑝⟩ and ⟨𝑒′ , 𝑝′ ⟩ is an arrow 𝑓 ∶ 𝑒 → 𝑒′ that commutes with the projections, that
is:
𝑝′ ◦𝑓 = 𝑝
The best way to visualize this is to notice that such a morphism maps fibers of 𝑝 to fibers of 𝑝′ .
It’s a “fiber-preserving” mapping between bundles.

         𝑓
   𝑒 ------> 𝑒′
    \       /
   𝑝 \     / 𝑝′
      v   v
        𝑏
Our counted vectors can be seen as objects in the slice category 𝒞∕ℕ given by pairs ⟨𝐿𝑖𝑠𝑡(𝑎), 𝑙𝑒𝑛𝑔𝑡ℎ⟩.
A morphism in this category maps vectors of length 𝑛 to vectors of the same length 𝑛.

Pullbacks
We’ve seen a lot of examples of commuting squares. Such a square is a graphical representation
of an equation: two paths between opposite corners of a square, each a result of a composition
of two morphisms, are equal.
Like with every equality we may want to replace one or more of its components with an
unknown, and try to solve the resulting equation. For instance, we may ask the question: Is
there an object together with two arrows that would complete a commuting square? If many
such objects exist, is there a universal one? If the missing piece of the puzzle is the upper left
corner of a square (the source), we call it a pullback. If it’s the lower right corner (the target),
we call it a pushout.

        ?                        𝑓
   ? ------> 𝐸             𝐸 ------> 𝐸′
   |         |             |         |
  ?|         |𝑝           𝑝|         |?
   v         v             v         v
   𝐴 ------> 𝐵             𝐵 ------> ?
        𝑓                        ?
Let’s start with a particular fibration 𝑝 ∶ 𝐸 → 𝐵 and ask ourselves the question: what
happens when we change the base from 𝐵 to some 𝐴 that is related to it through a mapping
𝑓 ∶ 𝐴 → 𝐵. Can we “pull the fibers back” along 𝑓 ?
Again, let’s think about sets first. Imagine picking a fiber in 𝐸 over some point 𝑦 ∈ 𝐵 that
is in the image of 𝑓 . If 𝑓 were invertible, there would be an element 𝑥 = 𝑓 −1 𝑦. We’d plant our
fiber over it. In general, though, 𝑓 is not invertible. It means that there could be more elements
of 𝐴 that are mapped to our 𝑦. In the picture below you see two such elements, 𝑥1 and 𝑥2 . We’ll
just duplicate the fiber above 𝑦 and plant it over all elements that map to 𝑦. This way, every
point in 𝐴 will have a fiber sticking out of it. The sum of all these fibers will form a new bundle
𝐸 ′.

[Figure: the fiber over 𝑦 ∈ 𝐵 in 𝐸 is duplicated over the points 𝑥1 and 𝑥2 in 𝐴 that 𝑓 maps to 𝑦, forming the new bundle 𝐸′.]

We have thus constructed a new fibration with the base 𝐴. Its projection 𝑝′ ∶ 𝐸 ′ → 𝐴 maps
each point in a given fiber to the point over which this fiber was planted. There is also an obvious
mapping 𝑔 ∶ 𝐸 ′ → 𝐸 that maps fibers to their corresponding fibers.
By construction, this new fibration ⟨𝐸 ′ , 𝑝′ ⟩ satisfies the condition:

𝑝◦𝑔 = 𝑓 ◦𝑝′

which can be represented as a commuting square:


        𝑔
   𝐸′ ------> 𝐸
   |          |
  𝑝′|         |𝑝
   v          v
   𝐴 -------> 𝐵
        𝑓

In 𝐒𝐞𝐭, we can explicitly construct 𝐸 ′ as a subset of the cartesian product 𝐴×𝐸 with 𝑝′ = 𝜋1
and 𝑔 = 𝜋2 (the two cartesian projections). An element of 𝐸 ′ is a pair ⟨𝑎, 𝑒⟩, such that:

𝑓 (𝑎) = 𝑝(𝑒)

This commuting square is the starting point for the categorical generalization. However,
even in 𝐒𝐞𝐭 there are many different fibrations over 𝐴 that make this diagram commute. We
have to pick the universal one. Such a universal construction is called a pullback, or a fibered
product.
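The explicit 𝐒𝐞𝐭 construction above is easy to transcribe into code. Here's a hedged sketch,
with finite sets modeled as Haskell lists and pullbackSet our own name:

pullbackSet :: Eq b => (a -> b) -> (e -> b) -> [a] -> [e] -> [(a, e)]
pullbackSet f p as es = [ (a, e) | a <- as, e <- es, f a == p e ]

The two cartesian projections fst and snd then play the roles of 𝑝′ and 𝑔.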

In category theory, a pullback of 𝑝 ∶ 𝑒 → 𝑏 along 𝑓 ∶ 𝑎 → 𝑏 is an object 𝑒′ together with
two arrows 𝑝′ ∶ 𝑒′ → 𝑎 and 𝑔 ∶ 𝑒′ → 𝑒 that make the following diagram commute

        𝑔
   𝑒′ ------> 𝑒
   |          |
  𝑝′|         |𝑝
   v          v
   𝑎 -------> 𝑏
        𝑓

and that satisfies the universal condition.


The universal condition says that, for any other candidate object 𝑥 with two arrows 𝑞 ′ ∶ 𝑥 →
𝑒 and 𝑞 ∶ 𝑥 → 𝑎 such that 𝑝◦𝑞 ′ = 𝑓 ◦𝑞 (making the bigger “square” commute), there is a unique
arrow ℎ ∶ 𝑥 → 𝑒′ that makes the two triangles commute, that is:

𝑞 = 𝑝′ ◦ℎ
𝑞 ′ = 𝑔◦ℎ

Pictorially:
   𝑥 ----------- 𝑞′ -----------.
   | \                          \
   |  ℎ                          v
   |   '-->  𝑒′ ------ 𝑔 -----> 𝑒
   𝑞        ⌟|                   |
   |        𝑝′|                  |𝑝
   v          v                  v
   '------>  𝑎 ------ 𝑓 ------> 𝑏
The angle symbol in the upper corner of the square is used to mark pullbacks.
If we look at the pullback through the prism of sets and fibrations, 𝑒 is a bundle over 𝑏, and
we are constructing a new bundle 𝑒′ out of the fibers taken from 𝑒. Where we plant these fibers
over 𝑎 is determined by (the inverse image of) 𝑓 . This procedure makes 𝑒′ a bundle over both 𝑎
and 𝑏, the latter with the projection 𝑝◦𝑔 = 𝑓 ◦𝑝′ .
The 𝑥 in this picture is some other bundle over 𝑎 with the projection 𝑞. It is simultaneously
a bundle over 𝑏 with the projection 𝑓 ◦𝑞 = 𝑝◦𝑞 ′ . The unique mapping ℎ maps the fibers of 𝑥
given by 𝑞 −1 to fibers of 𝑒′ given by 𝑝′−1 .
All mappings in this picture work on fibers. Some of them rearrange fibers over new bases—
that’s what a pullback does. Other mappings modify individual fibers—the mapping ℎ ∶ 𝑥 → 𝑒′
works like this.
If you think of bundles as containers of fibers, the rearrangements of fibers correspond to
natural transformations, and the modifications of fibers correspond to the action of fmap.
The universal condition then tells us that 𝑞 ′ can be factored into a modification of fibers ℎ,
followed by the rearrangement of fibers 𝑔.
It’s worth noting that picking the terminal object or the singleton set as the pullback target
gives us automatically the definition of the cartesian product:
          𝜋2
   𝑏×𝑒 ------> 𝑒
    |          |
  𝜋1|          |!
    v          v
    𝑏 -------> 1
          !

Alternatively, we can think of this picture as planting as many copies of 𝑒 as there are ele-
ments in 𝑏. We’ll use this analogy when we talk about the dependent sum and product.

Notice also that a single fiber can be extracted from a fibration by pulling it back to the
terminal object. In this case the mapping 𝑥 ∶ 1 → 𝑏 picks an element of the base, and the
pullback along it extracts a single fiber 𝜑:
        𝑔
   𝜑 ------> 𝑒
   |         |
  !|         |𝑝
   v         v
   1 ------> 𝑏
        𝑥

The arrow 𝑔 injects this fiber back into 𝑒. By varying 𝑥 we can pick different fibers in 𝑒.

Exercise 11.2.1. Show that the pullback with the terminal object as the target is the product.

Exercise 11.2.2. Show that a pullback can be defined as a limit of the diagram from a stick-figure
category with three objects:
𝑎→𝑏←𝑐

Exercise 11.2.3. Show that a pullback in 𝒞 with the target 𝑏 is a product in the slice category
𝒞∕𝑏. Hint: Define two projections as morphisms in the slice category. Use universality of the
pullback to show the universality of the product.

Substitution
We have two alternative descriptions of dependent types: one as fibrations and another as type
families. It’s in the latter framework that the pullback along a morphism 𝑓 can be interpreted as
a substitution. When we have a type family 𝑇 𝑦 parameterized by elements 𝑦 ∶ 𝐵, we can
define a new type family by substituting 𝑓 𝑥 for 𝑦.

   𝑇 (𝑓 𝑥) ------> 𝑇 𝑦
     |              |
     v       𝑓      v
     𝑥 -----------> 𝑦
The new type family is thus parameterized by elements of a different base.
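Here's a small Haskell illustration of this substitution, assuming DataKinds, TypeFamilies,
and the counted vectors from before; the doubling function Double is our own, hypothetical
choice of 𝑓:

type family Double (n :: Nat) :: Nat where
  Double 'Z     = 'Z
  Double ('S n) = 'S ('S (Double n))

-- The pullback of the family Vec along Double: a new family, indexed by n,
-- whose fiber over n is the fiber of Vec over (Double n).
newtype EvenVec a n = EvenVec (Vec (Double n) a)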

Dependent environments
When modeling lambda calculus, we used the objects of a cartesian closed category to serve
both as types and environments. An empty environment was modeled as the terminal object
(unit type), and we were building more complex environments using products. The order in
which we multiply types doesn’t matter since the product is symmetric (up to isomorphism).
When dealing with dependent types, we have to take into account that the type we are adding
to the environment may depend on the values of the types already present in the environment.
As before, we start with an empty environment modeled as the terminal object.

Weakening
Base-change functor
We used a cartesian closed category as a model for programming. To model dependent types,
we need to impose an additional condition: We require the category to be locally cartesian
closed. This is a category in which all slice categories are cartesian closed.

In particular, such categories have all pullbacks, so it’s always possible to change the base
of any fibration. Base change induces a mapping between slice categories that is functorial.
Given two slice categories 𝒞∕𝑏 and 𝒞∕𝑎 and an arrow between bases 𝑓 ∶ 𝑏 → 𝑎, the base-
change functor 𝑓 ∗ ∶ 𝒞∕𝑎 → 𝒞∕𝑏 maps a fibration ⟨𝑒, 𝑝⟩ to the fibration 𝑓 ∗ ⟨𝑒, 𝑝⟩ = ⟨𝑓 ∗ 𝑒, 𝑓 ∗ 𝑝⟩,
which is given by the pullback:
           𝑔
   𝑓 ∗𝑒 ------> 𝑒
     |          |
  𝑓 ∗𝑝|         |𝑝
     v          v
     𝑏 -------> 𝑎
           𝑓
Notice that the functor 𝑓 ∗ goes in the opposite direction to the arrow 𝑓 .
To visualize the base-change functor let’s consider how it works on sets.
           𝑔
   𝑓 ∗𝐸 ------> 𝐸
     |          |
  𝑓 ∗𝑝|         |𝑝
     v          v
     𝐵 -------> 𝐴
           𝑓

We have the intuition that the fibration 𝑝 decomposes the set 𝐸 into fibers over each point of 𝐴.
We can think of 𝑓 as another fibration that similarly decomposes 𝐵. Let’s call these fibers
in 𝐵 “patches.” For instance, if 𝐴 is just a two-element set, then the fibration given by 𝑓 splits
𝐵 into two patches. The pullback takes a fiber from 𝐸 and plants it over the whole patch in
𝐵. The resulting set 𝑓 ∗ 𝐸 looks like a patchwork, where each patch is planted with clones of a
single fiber from 𝐸.

[Figure: the patchwork 𝑓 ∗𝐸 over 𝐵, each patch planted with clones of a single fiber of 𝐸 over 𝐴.]

Since we have a function from 𝐵 to 𝐴 that may map many elements to one, the fibration
over 𝐵 has a finer grain than the coarser fibration over 𝐴. The simplest, least-effort way to turn
the fibration of 𝐸 over 𝐴 into a fibration over 𝐵 is to spread the existing fibers over the patches
defined by (the inverse of) 𝑓 . This is the essence of the universal construction of the pullback.
In particular, if 𝐴 is a singleton set (the terminal object), then we have only one fiber (the
whole of 𝐸) and the bundle 𝑓 ∗ 𝐸 is a cartesian product 𝐵 × 𝐸. Such a bundle is called a trivial
bundle.
A non-trivial bundle is not a product, but it can be locally decomposed into products. Just
as 𝐵 is a sum of patches, so 𝑓 ∗ 𝐸 is a sum of products of these patches and the corresponding
fibers of 𝐸.
You may also think of 𝐴 as providing an atlas that enumerates all the patches in the base 𝐵.
Imagine that 𝐴 is a set of countries and 𝐵 is a set of cities. The mapping 𝑓 assigns a country to
each city.
Continuing with this example, let 𝐸 be the set of languages fibrated by the country. If we
assume that in each city the language of the given country is spoken, the base-change functor
replants the country’s languages over each of its cities.

By the way, this idea of using local patches and an atlas goes back to differential geom-
etry and general relativity, where we often glue together local coordinate systems to describe
topologically nontrivial bundles, like Möbius strips or Klein bottles.
As we’ll see soon, in a locally cartesian closed category, the base-change functor has both
a left and a right adjoint. The left adjoint to 𝑓 ∗ is called 𝑓! (sometimes pronounced “f
down-shriek”) and the right adjoint is called 𝑓∗ (“f down-star”):

𝑓! ⊣ 𝑓 ∗ ⊣ 𝑓∗

In programming, the left adjoint is called the dependent sum, and the right adjoint is called the
dependent product or dependent function:

Σ𝑓 ⊣ 𝑓 ∗ ⊣ Π𝑓

Exercise 11.2.4. Define the action of the base-change functor on morphisms in 𝒞∕𝑎, that is,
given a morphism ℎ construct its counterpart 𝑓 ∗ ℎ

           𝑓 ∗ℎ                        ℎ
   𝑓 ∗𝑒′ ------> 𝑓 ∗𝑒           𝑒′ ------> 𝑒
       \         /                \        /
  𝑓 ∗𝑝′ \       / 𝑓 ∗𝑝          𝑝′ \      / 𝑝
         v     v                    v    v
           𝑏 - - - - - - 𝑓 - - - - -> 𝑎

Hint: Use the universality of the pullback and the commuting condition: 𝑝◦ℎ◦𝑔 ′ = 𝑓 ◦𝑓 ∗ 𝑝′ .

   𝑓 ∗𝑒′ -------- ℎ◦𝑔 ′ ---------.
   |   \                          \
   |   𝑓 ∗ℎ                        v
   |     '--> 𝑓 ∗𝑒 ----- 𝑔 -----> 𝑒
  𝑓 ∗𝑝′      ⌟ |                   |
   |        𝑓 ∗𝑝|                  |𝑝
   v            v                  v
   '--------->  𝑏 ----- 𝑓 ------> 𝑎

Here 𝑔 ′ ∶ 𝑓 ∗ 𝑒′ → 𝑒′ is the pullback projection for 𝑒′ .

11.3 Dependent Sum


In type theory, the dependent sum, or the sigma type Σ𝑥∶𝐵 𝑇 (𝑥), is defined as a type of pairs in
which the type of the second component depends on the value of the first component.
Conceptually, the sum type is defined using its mapping-out property. The mapping out of
a sum is a pair of mappings, as illustrated in this adjunction:

𝒞(𝐹1 + 𝐹2 , 𝐹 ) ≅ (𝒞 × 𝒞)(⟨𝐹1 , 𝐹2 ⟩, Δ𝐹 )

Here, we have a pair of arrows (𝐹1 → 𝐹 , 𝐹2 → 𝐹 ) that define the mapping out of the sum
𝑆 = 𝐹1 + 𝐹2 . In 𝐒𝐞𝐭, the sum is a tagged union. A dependent sum is a sum that is tagged by
elements of another set.
Our counted vector type can be thought of as a dependent sum tagged by natural numbers.
An element of this type is a natural number n (a value) paired with an element of the n-tuple
type (a, a, ... a). Here are some counted vectors of integers written in this representation:
(0, ())
(1, 42)
(2, (64, 7))
(5, (8, 21, 14, -1, 0))
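In Haskell, such dependent pairs are commonly encoded as existentials. A sketch, using the
singleton type SNat (introduced in the next section) to carry the runtime copy of the tag:

data SomeVec a where
  SomeVec :: SNat n -> Vec n a -> SomeVec a

The constructor pairs a number n with a vector whose length is that very n, which is exactly
the introduction rule described below.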
More generally, the introduction rule for the dependent sum assumes that there is a family of
types 𝑇 (𝑥) indexed by elements of the base type 𝐵. Then an element of Σ𝑥∶𝐵 𝑇 (𝑥) is constructed
from a pair of elements 𝑥 ∶ 𝐵 and 𝑦 ∶ 𝑇 (𝑥).
Categorically, dependent sum is modeled as the left adjoint of the base-change functor.
To see this, let’s first revisit the definition of a pair, which is an element of a product. We’ve
noticed before that a product can be written as a pullback from the singleton set—the terminal
object. Here’s the universal construction for the product/pullback (the notation anticipates the
target of this construction):
   𝑆 ----------- 𝜙 ------------.
   | \                          \
   |  𝜙𝑇                         v
   |   '--> 𝐵×𝐹 ----- 𝜋2 ----> 𝐹
   𝑞         |                   |
   |        𝜋1|                  |!
   v          v                  v
   '------>  𝐵 ------ ! ------> 1
We have also seen that the product can be defined using an adjunction. We can spot this
adjunction in our diagram: for every pair of arrows ⟨𝜙, 𝑞⟩ there is a unique arrow 𝜙𝑇 that makes
the triangles commute.
Notice that, if we keep 𝑞 fixed, we get a one-to-one correspondence between the arrows 𝜙
and 𝜙𝑇 . This will be the adjunction we’re interested in.
We can now put our fibrational glasses on and notice that ⟨𝑆, 𝑞⟩ and ⟨𝐵 × 𝐹 , 𝜋1 ⟩ are two
fibrations over the same base 𝐵. The commuting triangle makes 𝜙𝑇 a morphism in the slice
category ∕𝐵, or a fiber-wise mapping. In other words 𝜙𝑇 is a member of the hom-set:
(⟨ ⟩ ⟨ ⟩)
𝑆 𝐵×𝐹
(∕𝐵) ,
𝑞 𝜋1

Since 𝜙 is a member of the hom-set 𝒞(𝑆, 𝐹 ), we can rewrite the one-to-one correspondence
between 𝜙𝑇 and 𝜙 as an isomorphism of hom-sets:

(𝒞∕𝐵)(⟨𝑆, 𝑞⟩, ⟨𝐵 × 𝐹 , 𝜋1 ⟩) ≅ 𝒞(𝑆, 𝐹 )

In fact, it’s an adjunction in which we have the forgetful functor 𝑈 ∶ 𝒞∕𝐵 → 𝒞 mapping ⟨𝑆, 𝑞⟩
to 𝑆, thus forgetting the fibration.
If you squint at this adjunction hard enough, you can see the outlines of the definition of 𝑆
as a categorical sum (coproduct).
Firstly, on the right you have a mapping out of 𝑆. Think of 𝑆 as the sum of fibers that are
defined by the fibration ⟨𝑆, 𝑞⟩.
Secondly, recall that the fibration ⟨𝐵 × 𝐹 , 𝜋1 ⟩ can be thought of as producing many copies
of 𝐹 planted over points in 𝐵. This is a generalization of the diagonal functor Δ that duplicates
𝐹 —here, we make “𝐵 copies” of 𝐹 . The left hand side of the adjunction is just a bunch of
arrows, each mapping a different fiber of 𝑆 to the target fiber 𝐹 .

[Figure: 𝜙𝑇 as a bunch of arrows, one per point of 𝐵, each mapping the corresponding fiber of 𝐵 × 𝐹 to the target fiber 𝐹.]

Applying this idea to our counted-vector example, 𝜙𝑇 stands for infinitely many functions,
one per every natural number. In practice, we define these functions using recursion. For in-
stance, here’s a mapping out of a vector of integers:
sumV :: Vec n Int -> Int
sumV VNil = 0
sumV (VCons n v) = n + sumV v

Adding the atlas


We can generalize our diagram by replacing the terminal object with an arbitrary base 𝐴 (an
atlas). Instead of a single fiber, we now have a fibration ⟨𝐹 , 𝑝⟩, and we use the pullback square
that defines the base-change functor 𝑓 ∗ :

   𝑆 ----------- 𝜙 ------------.
   | \                          \
   |  𝜙𝑇                         v
   |   '--> 𝑓 ∗𝐹 ----- 𝑔 -----> 𝐹
   𝑞         |                   |
   |      𝑓 ∗𝑝|                  |𝑝
   v          v                  v
   '------>  𝐵 ------ 𝑓 ------> 𝐴

We can imagine that the fibration over 𝐵 is finer-grained, since 𝑓 may map multiple points
to one. Think, for instance, of a function even :: Nat -> Bool that creates two bunches of
even and odd numbers. In this picture, 𝑓 defines a coarser “resampling” of the original 𝑆.
The universality of the pullback results in the following isomorphism of hom-sets:
(𝒞∕𝐵)(⟨𝑆, 𝑞⟩, 𝑓 ∗ ⟨𝐹 , 𝑝⟩) ≅ (𝒞∕𝐴)(⟨𝑆, 𝑓 ◦𝑞⟩, ⟨𝐹 , 𝑝⟩)

Here, 𝜙𝑇 is an element of the left-hand side, and 𝜙 is the corresponding element of the right-
hand side.
We interpret this isomorphism as the adjunction between the base-change functor 𝑓 ∗ on the
left and the dependent sum functor Σ𝑓 on the right:

(𝒞∕𝐵)(⟨𝑆, 𝑞⟩, 𝑓 ∗ ⟨𝐹 , 𝑝⟩) ≅ (𝒞∕𝐴)(Σ𝑓 ⟨𝑆, 𝑞⟩, ⟨𝐹 , 𝑝⟩)

The dependent sum is thus given by this formula:


Σ𝑓 ⟨𝑆, 𝑞⟩ = ⟨𝑆, 𝑓 ◦𝑞⟩
This says that, if 𝑆 is fibered over 𝐵 using 𝑞, and there is a mapping 𝑓 from 𝐵 to 𝐴, then 𝑆 is
automatically (more coarsely) fibered over 𝐴, the projection being the composition 𝑓 ◦𝑞.
We’ve seen before that, in 𝐒𝐞𝐭, 𝑓 defines patches within 𝐵. Fibers of 𝐹 are replanted in these
patches to form 𝑓 ∗ 𝐹 . Locally—that is within each patch—𝑓 ∗ 𝐹 looks like a cartesian product.

[Figure: fibers of 𝐹 over 𝐴 replanted over the patches of 𝐵 carved out by 𝑓, forming 𝑓 ∗𝐹.]

𝑆 itself is fibered in two ways: coarsely chopped over 𝐴 using 𝑓 ◦𝑞 and finely julienned over 𝐵
using 𝑞.
In category theory, the dependent sum, which is the left adjoint to the base-change functor
𝑓 ∗ , is denoted by 𝑓! . For a given 𝑓 ∶ 𝑏 → 𝑎, it’s a functor:

𝑓! ∶ 𝒞∕𝑏 → 𝒞∕𝑎

Its action on an object (𝑠, 𝑞 ∶ 𝑠 → 𝑏) is given by post-composition by 𝑓 :

𝑓! (𝑠, 𝑞) = (𝑠, 𝑓 ◦𝑞)

Existential quantification
In the propositions as types interpretation, type families correspond to families of propositions.
The dependent sum type Σ𝑥∶𝐵 𝑇 (𝑥) corresponds to the proposition: There exists an 𝑥 for which
𝑇 (𝑥) is true:
∃𝑥∶𝐵 𝑇 (𝑥)
Indeed, a term of the type Σ𝑥∶𝐵 𝑇 (𝑥) is a pair of an element 𝑥 ∶ 𝐵 and an element 𝑦 ∶ 𝑇 (𝑥)—
which shows that 𝑇 (𝑥) is inhabited for some 𝑥.

11.4 Dependent Product


In type theory, the dependent product, or dependent function, or pi-type Π𝑥∶𝐵 𝑇 (𝑥), is defined
as a function whose return type depends on the value of its argument.
It’s called a function, because you can evaluate it. Given a dependent function 𝑓 ∶ Π𝑥∶𝐵 𝑇 (𝑥),
you may apply it to an argument 𝑥 ∶ 𝐵 to get a value 𝑓 (𝑥) ∶ 𝑇 (𝑥).

Dependent product in Haskell


A simple example of a dependent product is a function that constructs a vector of a given size
and fills it with copies of a given value:
replicateV :: a -> SNat n -> Vec n a
replicateV _ SZ = VNil
replicateV x (SS n) = VCons x (replicateV x n)
At the time of this writing, Haskell’s support for dependent types is limited, so the imple-
mentation of dependent functions requires the use of singleton types. In this case, the number
that is the argument to replicateV is passed as a singleton natural:
data SNat n where
SZ :: SNat Z
SS :: SNat n -> SNat (S n)
(Note that replicateV is a function of two arguments, so it can be either considered a dependent
function of a pair, or a regular function returning a dependent function.)

Dependent product of sets


Before we describe the categorical model of dependent functions, it’s instructive to consider
how they work on sets. A dependent function selects one element from each set 𝑇 (𝑥).
You may visualize the totality of this selection as a giant tuple—an element of a cartesian
product. For instance, in the trivial case of 𝐵 a two-element set {1, 2}, a dependent function
type is just a cartesian product 𝑇 (1) × 𝑇 (2). In general, you get one tuple component per every
value of 𝑥. It’s a giant tuple indexed by elements of 𝐵. This is the meaning of the product
notation, Π𝑥∶𝐵 𝑇 (𝑥).
In our example, replicateV picks a particular counted vector for each value of n. Counted
vectors are equivalent to tuples so, for n equal zero, replicateV returns an empty tuple (); for
n = 1 it returns a single value x; for n equal two, it duplicates x returning (x, x); etc.
The function replicateV, evaluated at some x :: a, is equivalent to an infinite tuple of
tuples:
((), 𝑥, (𝑥, 𝑥), (𝑥, 𝑥, 𝑥), ...)
which is a specific element of the type:

((), 𝑎, (𝑎, 𝑎), (𝑎, 𝑎, 𝑎), ...)

Dependent product categorically


In order to build a categorical model of dependent functions, we need to change our perspective
from a family of types to a fibration. We start with a bundle 𝐸∕𝐵 fibered by the projection
𝑝 ∶ 𝐸 → 𝐵. A dependent function is called a section of this bundle.
If you visualize the bundle as a bunch of fibers sticking out from the base 𝐵, a section is like
a haircut: it cuts through each fiber to produce a corresponding value. In physics, such sections
are called fields—with spacetime as the base.
Just like we talked about a function object representing a set of functions, we can talk about
an object 𝑆(𝐸) that represents a set of sections of a given bundle 𝐸.
Just like we defined function application as a mapping out of the product:

𝜀_{𝐵𝐶} ∶ 𝐶^𝐵 × 𝐵 → 𝐶

we can define the dependent function application as a mapping:

𝜀 ∶ 𝑆(𝐸) × 𝐵 → 𝐸

We can visualize it as picking a section 𝑠 in 𝑆(𝐸) and an element 𝑥 of the base 𝐵 and producing
a value in the bundle 𝐸. (In physics, this would correspond to measuring a field at a particular
point in spacetime.)
But this time we have to insist that this value be in the correct fiber. If we project the result
of applying 𝜀 to (𝑠, 𝑥), it should fall back to the 𝑥.

[Figure: the bundle 𝐸 over 𝐵; the value 𝜀(𝑠, 𝑥) lands in the fiber 𝑝−1 𝑥 over the point 𝑥 ∈ 𝐵.]

In other words, this diagram must commute:


   𝑆(𝐸) × 𝐵 ---𝜀---> 𝐸
         \           /
        𝜋2\         /𝑝
           v       v
              𝐵

This makes 𝜀 a morphism in the slice category 𝒞∕𝐵.


And just like the exponential object was universal, so is the object of sections. The univer-
sality condition has the same form: For any other object 𝐺 with an arrow 𝜙 ∶ 𝐺 × 𝐵 → 𝐸 there
is a unique arrow 𝜙𝑇 ∶ 𝐺 → 𝑆(𝐸) that makes the following diagram commute:

   𝐺 × 𝐵 --------𝜙--------.
       |                   \
  𝜙𝑇 ×𝐵|                    v
       v          𝜀
   𝑆(𝐸) × 𝐵 -------------> 𝐸

The difference is that both 𝜀 and 𝜙 are now morphisms in the slice category 𝒞∕𝐵.
The one-to-one correspondence between 𝜙 and 𝜙𝑇 forms the adjunction:
(𝒞∕𝐵)(⟨𝐺 × 𝐵, 𝜋2 ⟩, ⟨𝐸, 𝑝⟩) ≅ 𝒞(𝐺, 𝑆(𝐸))
which we can use as the definition of the object of sections 𝑆(𝐸). The counit of this adjunction
is the dependent-function application. We get it by replacing 𝐺 with 𝑆(𝐸) and selecting the
identity morphism on the right. The counit is thus a member of the hom-set:
(𝒞∕𝐵)(⟨𝑆(𝐸) × 𝐵, 𝜋2 ⟩, ⟨𝐸, 𝑝⟩)
Compare the above adjunction with the currying adjunction that defines the function object
𝐸^𝐵:

𝒞(𝐺 × 𝐵, 𝐸) ≅ 𝒞(𝐺, 𝐸^𝐵)
Now recall that, in 𝐒𝐞𝐭, we interpret the product 𝐺 × 𝐵 as planting copies of 𝐺 as identical
fibers over each element of 𝐵. So a single element of the left-hand side of our adjunction is a
family of functions, one per fiber. Any given 𝑦 ∈ 𝐺 cuts a horizontal slice through 𝐺 ×𝐵. These
are the pairs (𝑦, 𝑏) for all 𝑏 ∈ 𝐵. Our family of functions maps this slice to the corresponding
fibers of 𝐸 thus creating a section of 𝐸.

[Figure: the horizontal slice of 𝐺 × 𝐵 at height 𝑦 ∈ 𝐺 is mapped fiber-by-fiber into 𝐸, producing a section of 𝐸.]

The adjunction tells us that this family of mappings uniquely determines a function from 𝐺
to 𝑆(𝐸). Every 𝑦 ∈ 𝐺 is thus mapped to a different element 𝑠 of 𝑆(𝐸). Therefore elements of
𝑆(𝐸) are in one-to-one correspondence with sections of 𝐸.
These are all set-theoretical intuitions. We can generalize them by first noticing that the
right-hand side of the adjunction can be easily expressed as a hom-set in the slice category 𝒞∕1
over the terminal object.
Indeed, there is a one-to-one correspondence between objects 𝑋 in 𝒞 and objects ⟨𝑋, !⟩ in
𝒞∕1 (here ! is the unique arrow to the terminal object). Arrows in 𝒞∕1 are arrows of 𝒞 with no
additional constraints. We therefore have:

(𝒞∕𝐵)(⟨𝐺 × 𝐵, 𝜋2 ⟩, ⟨𝐸, 𝑝⟩) ≅ (𝒞∕1)(⟨𝐺, !⟩, ⟨𝑆(𝐸), !⟩)

Adding the atlas


The next step is to “blur the focus” by replacing the terminal object with a more general base 𝐴,
serving as the atlas.
The right-hand side of the adjunction becomes a hom-set in the slice category 𝒞∕𝐴. 𝐺 itself
gets coarsely fibrated by some 𝑞 ∶ 𝐺 → 𝐴.
Remember that 𝐺 × 𝐵 can be understood as a pullback along the mapping ! ∶ 𝐵 → 1, or a
change of base from 1 to 𝐵. If we want to replace 1 with 𝐴, we should replace the product 𝐺 × 𝐵
with a more general pullback of 𝑞. Such a change of base is parameterized by a new morphism
𝑓 ∶ 𝐵 → 𝐴.

         𝜋1                           𝑔
   𝐺×𝐵 ------> 𝐺              𝑓 ∗𝐺 ------> 𝐺
    |⌟         |                |⌟         |
  𝜋2|          |!            𝑓 ∗𝑞|          |𝑞
    v          v                v          v
    𝐵 -------> 1                𝐵 -------> 𝐴
         !                           𝑓

The result is that, instead of a bunch of 𝐺 fibers over 𝐵, we get a pullback 𝑓 ∗ 𝐺 that is
populated by groups of fibers from the fibration 𝑞 ∶ 𝐺 → 𝐴. This way 𝐴 serves as an atlas that
enumerates all the patches populated by uniform fibers.
Imagine, for instance, that 𝐴 is a two-element set. The fibration 𝑞 will split 𝐺 into two fibers.
They will serve as our generic fibers. These fibers are now replanted over the two patches in 𝐵
to form 𝑓 ∗ 𝐺. The replanting is guided by 𝑓 −1 .

[Figure: the two fibers of 𝐺 over 𝐴, replanted over the patches of 𝐵 to form 𝑓 ∗𝐺, side by side with the bundle 𝐸 over 𝐵.]

The adjunction that defines the dependent function type is therefore:


(𝒞∕𝐵)(𝑓 ∗ ⟨𝐺, 𝑞⟩, ⟨𝐸, 𝑝⟩) ≅ (𝒞∕𝐴)(⟨𝐺, 𝑞⟩, Π𝑓 ⟨𝐸, 𝑝⟩)
This is a generalization of an adjunction that we used to define the object of sections 𝑆(𝐸). This
one defines a new object Π𝑓 𝐸 that is a rearrangement of the object of sections.
The adjunction is a mapping between morphisms in their respective slice categories:

           𝜙                            𝜙𝑇
   𝑓 ∗𝐺 ------> 𝐸                𝐺 ------> Π𝑓 𝐸
       \        /                 \         /
   𝑓 ∗𝑞 \      / 𝑝               𝑞 \       / Π𝑓 𝑝
         v    v                     v     v
           𝐵                           𝐴
To gain some intuition into this adjunction, let’s consider how it works on sets.

• The right hand side operates in a coarsely grained fibration over the atlas 𝐴. It is a family
of functions, one function per patch. For every patch we get a function from the “thick
fiber” of 𝐺 (drawn in blue below) to the “thick fiber” of Π𝑓 𝐸 (not shown).
• The left hand side operates in a more finely grained fibration over 𝐵. These fibers are
grouped into small bundles over patches. Once we pick a patch (drawn in red below),
we get a family of functions from that patch to the corresponding patch in 𝐸 (drawn in
green)—a section of a small bundle in 𝐸. So, patch-by-patch, we get small sections of 𝐸.

The adjunction tells us that the elements of the “thick fiber” of Π𝑓 𝐸 correspond to small sections
of 𝐸 over the same patch.

[Figure: a patch of 𝐵 (red), the corresponding small bundle in 𝐸 (green), and the thick fiber of 𝐺 over the matching point of the atlas 𝐴 (blue).]

In category theory, the dependent product, which is the right adjoint to the base-change
functor 𝑓 ∗ , is denoted by 𝑓∗ . For a given 𝑓 ∶ 𝑏 → 𝑎, it’s a functor:

𝑓∗ ∶ 𝒞∕𝑏 → 𝒞∕𝑎

The following exercises shed some light on the role played by 𝑓 . It can be seen as localizing
the sections of 𝐸 by restricting them to “neighborhoods” defined by 𝑓 −1 .

Exercise 11.4.1. Consider what happens when 𝐴 is a two-element set {0, 1} and 𝑓 maps the
whole of 𝐵 to one element, say 1. How would you define the function on the right-hand side of
the adjunction? What should it do to the fiber over 0?

Exercise 11.4.2. Let’s pick 𝐺 to be a singleton set 1, and let 𝑥 ∶ 1 → 𝐴 be a fibration that selects
an element in 𝐴. Using the adjunction, show that:

• 𝑓 ∗ 1 has two types of fibers: singletons over the elements of 𝑓 −1 (𝑥) and empty sets other-
wise.

• A mapping 𝜙 ∶ 𝑓 ∗ 1 → 𝐸 is equivalent to a selection of elements, one from each fiber of
𝐸 over the elements of 𝑓 −1 (𝑥). In other words, it’s a partial section of 𝐸 over the subset
𝑓 −1 (𝑥) of 𝐵.

• A fiber of Π𝑓 𝐸 over a given 𝑥 is such a partial section.

• What happens when 𝐴 is also a singleton set?

Universal quantification
The logical interpretation of the dependent product Π𝑥∶𝐵 𝑇 (𝑥) is a universally quantified propo-
sition. An element of Π𝑥∶𝐵 𝑇 (𝑥) is a section—the proof that it’s possible to select an element
from each member of the family 𝑇 (𝑥). It means that none of them is empty. In other words, it’s
a proof of the proposition:
∀𝑥∶𝐵 𝑇 (𝑥)

11.5 Equality
Our first experience in mathematics involves equality. We learn that

1+1=2

and we don’t think much of it afterwards.


But what does it mean that 1 + 1 is equal to 2? Two is a number, but one plus one is an
expression, so they are not the same thing. There is some mental processing that we have to
perform before we pronounce these two things equal.
Contrast this with the statement 0 = 0, in which both sides of equality are the same thing.
It makes sense that, if we are to define equality, we’ll have to at least make sure that every-
thing is equal to itself. We call this property reflexivity.
Recall our definition of natural numbers:
data Nat where
Z :: Nat
S :: Nat -> Nat
This is how we can define equality for natural numbers:
equal :: Nat -> Nat -> Bool
equal Z Z = True
equal (S m) (S n) = equal m n
equal _ _ = False
We are recursively stripping 𝑆’s in each number until one of them reaches 𝑍. If the other reaches
𝑍 at the same time, we pronounce the numbers we started with to be equal, otherwise they are
not.

Equational reasoning
Notice that, when defining equality in Haskell, we were already using the equal sign. For in-
stance, the equal sign in:
equal Z Z = True
tells us that wherever we see the expression equal Z Z we can replace it with True and vice
versa.
This is the principle of substituting equals for equals, which is the basis for equational rea-
soning in Haskell. We can’t encode proofs of equality directly in Haskell, but we can use equa-
tional reasoning to reason about Haskell programs. This is one of the main advantages of pure
functional programming. You can’t perform such substitutions in imperative languages, because
of side effects.
If we want to prove that 1 + 1 is 2, we have to first define addition. The definition can either
be recursive in the first or in the second argument. This one recurses in the second argument:
add :: Nat -> Nat -> Nat
add n Z = n
add n (S m) = S (add n m)
We encode 1 + 1 as:
add (S Z) (S Z)
We can now use the definition of add to simplify this expression. We try to match the first
clause, and we fail, because S Z is not the same as Z. But the second clause matches. In it, n is
an arbitrary number, so we can substitute S Z for it, and get:
add (S Z) (S Z) = S (add (S Z) Z)
In this expression we can perform another substitution of equals using the first clause of the
definition of add (again, with n replaced by S Z):
add (S Z) Z = (S Z)
We arrive at:
add (S Z) (S Z) = S (S Z)
We can clearly see that the right-hand side is the encoding of 2. But we haven’t shown that our
definition of equality is reflexive so, in principle, we don’t know if
equal (S (S Z)) (S (S Z))
yields True. We have to use step-by-step equational reasoning again:
equal (S (S Z)) (S (S Z)) =
{- second clause of the definition of equal -}
equal (S Z) (S Z) =
{- second clause of the definition of equal -}
equal Z Z =
{- first clause of the definition of equal -}
True
We can use this kind of reasoning to prove statements about concrete numbers, but we run
into problems when reasoning about generic numbers—for instance, showing that something is
true for all n. Using our definition of addition, we can easily show that add n Z is the same
as n. But we can’t prove that add Z n is the same as n. The latter proof requires the use of
induction.
We end up distinguishing between two kinds of equality. One is proven using substitutions,
or rewriting rules, and is called definitional equality. You can think of it as macro expansion or
inline expansion in programming languages. It also involves 𝛽-reductions: performing function
application by replacing formal parameters by actual arguments, as in:
(\x -> x + x) 2 =
{- beta reduction -}
2 + 2
The second more interesting kind of equality is called propositional equality and it may
require actual proofs.

Equality vs isomorphism
We said that category theorists prefer isomorphism over equality—at least when it comes to
objects. It is true that, within the confines of a category, there is no way to differentiate between
isomorphic objects. In general, though, equality is stronger than isomorphism. This is a prob-
lem, because it’s very convenient to be able to substitute equals for equals, but it’s not always
clear that one can substitute isomorphic for isomorphic.
Mathematicians have been struggling with this problem, mostly trying to modify the def-
inition of isomorphism—but a real breakthrough came when they decided to simultaneously
weaken the definition of equality. This led to the development of homotopy type theory, or
HoTT for short.
Roughly speaking, in type theory, specifically in Martin-Löf theory of dependent types,
equality is represented as a type, and in order to prove equality one has to construct an element
of that type—in the spirit of the Curry-Howard interpretation.
Furthermore, in HoTT, the proofs themselves can be compared for equality, and so on ad
infinitum. You can picture this by considering proofs of equality not as points but as some
abstract paths that can be morphed into each other; hence the language of homotopies.
In this setting, instead of isomorphism, which involves strict equalities of arrows:

𝑓 ◦𝑔 = 𝑖𝑑

𝑔◦𝑓 = 𝑖𝑑
one defines an equivalence, in which these equalities are treated as types.
The main idea of HoTT is that one can impose the univalence axiom which, roughly speak-
ing, states that equalities are equivalent to equivalences, or symbolically:

(𝐴 = 𝐵) ≅ (𝐴 ≅ 𝐵)

Notice that this is an axiom, not a theorem. We can either take it or leave it and the theory is
still valid (at least we think so).

Equality types
Suppose that you want to compare two terms for equality. The first requirement is that both
terms be of the same type. You can’t compare apples with oranges. Don’t get confused by some
programming languages allowing comparisons of unlike terms: in every such case there is an
implicit conversion involved, and the final equality is always between same-type values.
For every pair of values there is, in principle, a separate type of proofs of equality. There
is a type for 0 = 0, there is a type for 1 = 1, and there is a type for 1 = 0; the latter hopefully
uninhabited.
Equality type, a.k.a., identity type, is therefore a dependent type: it depends on the two
values that we are comparing. It’s usually written as 𝐼𝑑 𝐴 , where 𝐴 is the type of both values,
or using an infix notation as 𝑥 =𝐴 𝑦 (equal sign with the subscript 𝐴).
For instance, the type of equality of two zeros is written as 𝐼𝑑 ℕ (0, 0) or:

0 =ℕ 0

Notice: this is not a statement or a term. It’s a type, like Int or Bool. You can define a value of
this type if you have an introduction rule for it.

Introduction rule
The introduction rule for the equality type is the dependent function:

𝑟𝑒𝑓 𝑙𝐴 ∶ Π𝑥∶𝐴 𝐼𝑑 𝐴 (𝑥, 𝑥)

which can be interpreted in the spirit of propositions as types as the proof of the statement:

∀𝑥∶𝐴 𝑥 = 𝑥

This is the familiar reflexivity: it shows that, for all 𝑥 of type 𝐴, 𝑥 is equal to itself. You can
apply this function to some concrete value 𝑥 of type 𝐴, and it will produce a new value of type
𝐼𝑑 𝐴 (𝑥, 𝑥).
We can now prove that 0 = 0. We can execute 𝑟𝑒𝑓 𝑙ℕ (0) to get a value of the type 0 =ℕ 0.
This value is the proof that the type is inhabited, and therefore corresponds to a true proposition.
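Haskell’s closest approximation of the identity type is the GADT (:~:) from
Data.Type.Equality, whose only constructor is reflexivity:

data a :~: b where
  Refl :: a :~: a

A value Refl :: x :~: x plays the role of 𝑟𝑒𝑓 𝑙 applied to 𝑥, except that in Haskell the
“values” being compared are types rather than terms.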
This is the only introduction rule for equality, so you might think that all proofs of equality
boil down to “they are equal because they are the same.” Surprisingly, this is not the case.

𝛽-reduction and 𝜂-conversion


In type theory we have this interplay of introduction and elimination rules that essentially makes
them the inverse of each other.
Consider the definition of a product. We introduce it by providing two values, 𝑥 ∶ 𝐴 and
𝑦 ∶ 𝐵 and we get a value 𝑝 ∶ 𝐴 × 𝐵. We can then eliminate it by extracting two values using two
projections. But how do we know if these are the same values that we used to construct it? This
is something that we have to postulate. We call it the computation rule or the 𝛽-reduction rule.
Conversely, if we are given a value 𝑝 ∶ 𝐴 × 𝐵, we can extract the two components using
projections, and then use the introduction rule to recompose it. But how do we know that we’ll
get the same 𝑝? This too has to be postulated. This is sometimes called the uniqueness condition,
or the 𝜂-conversion rule.
In the categorical model of type theory these two rules follow from the universal construc-
tion.
The equality type also has the elimination rule, which we’ll discuss shortly, but we don’t
impose the uniqueness condition. It means that it’s possible that there are some equality proofs
that were not obtained using 𝑟𝑒𝑓 𝑙.
This is exactly the weakening of the notion of equality that makes HoTT interesting to math-
ematicians.

Induction principle for natural numbers


Before formulating the elimination rule for equality, it’s instructive to first discuss a simpler
elimination rule for natural numbers. We’ve already seen such rule describing primitive re-
cursion. It allowed us to define recursive functions by specifying a value 𝑖𝑛𝑖𝑡 and a function
𝑠𝑡𝑒𝑝.
Using dependent types, we can generalize this rule to define the dependent elimination rule
that is equivalent to the principle of mathematical induction.
The principle of induction can be described as a device to prove, in one fell swoop, whole
families of propositions indexed by natural numbers. For instance, the statement that add Z n
is equal to n is really an infinite number of propositions, one per each value of n.
We could, in principle, write a program that would meticulously verify this statement for
a very large number of cases, but we’d never be sure if it holds in general. There are some
conjectures about natural numbers that have been tested this way using computers but, obviously,
they can never exhaust an infinite set of cases.
Roughly speaking, we can divide all mathematical theorems into two groups: the ones that
can be easily formulated and the ones whose formulation is complex. They can be further
subdivided into the ones whose proofs are simple, and the ones that are hard or impossible to
prove. For instance, the famous Fermat’s Last Theorem was extremely easy to formulate, but
its proof required some massively complex mathematical machinery.
Here, we are interested in theorems about natural numbers that are both easy to formulate
and easy to prove. We’ll assume that we know how to generate a family of propositions or,
equivalently, a dependent type 𝑇 (𝑛), where 𝑛 is a natural number.
We’ll also assume that we have a value:
𝑖𝑛𝑖𝑡 ∶ 𝑇 (𝑍)
or, equivalently, the proof of the zeroth proposition; and a dependent function:
𝑠𝑡𝑒𝑝 ∶ Π𝑛∶ℕ (𝑇 (𝑛) → 𝑇 (𝑆𝑛))
This function is interpreted as generating a proof of the (𝑛 + 1)st proposition from the proof of
the 𝑛th proposition.
The dependent elimination rule for natural numbers postulates that, given such 𝑖𝑛𝑖𝑡 and 𝑠𝑡𝑒𝑝,
there exists a dependent function:
𝑓 ∶ Π𝑛∶ℕ 𝑇 (𝑛)
This function is interpreted as providing the proof that 𝑇 (𝑛) is true for all 𝑛.
Moreover, this function, when applied to zero reproduces 𝑖𝑛𝑖𝑡:
𝑓 (𝑍) = 𝑖𝑛𝑖𝑡

and, when applied to the successor of 𝑛, is consistent with taking a 𝑠𝑡𝑒𝑝:

𝑓 (𝑆𝑛) = (𝑠𝑡𝑒𝑝(𝑛))(𝑓 (𝑛))

(Here, 𝑠𝑡𝑒𝑝(𝑛) produces a function, which is then applied to the value 𝑓 (𝑛).) These are the two
computation rules for natural numbers.
Notice that the induction principle is not a theorem about natural numbers. It’s part of the
definition of the type of natural numbers.
Not all dependent mappings out of natural numbers can be decomposed into 𝑖𝑛𝑖𝑡 and 𝑠𝑡𝑒𝑝,
just as not all theorems about natural numbers can be proven inductively. There is no 𝜂-conversion
rule for natural numbers.
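For comparison, here is the non-dependent shadow of this rule in Haskell: when 𝑇 (𝑛) doesn’t
actually depend on 𝑛, 𝑖𝑛𝑖𝑡 and 𝑠𝑡𝑒𝑝 collapse to the ordinary recursor, and the two computation
rules become its defining clauses (a sketch; the name recNat is ours):

recNat :: c -> (Nat -> c -> c) -> Nat -> c
recNat init _    Z     = init
recNat init step (S n) = step n (recNat init step n)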

Equality elimination rule


The elimination rule for equality type is somewhat analogous to the induction principle for
natural numbers. There we used 𝑖𝑛𝑖𝑡 to ground ourselves at the start of the journey, and 𝑠𝑡𝑒𝑝
to make progress. The elimination rule for equality requires a more powerful grounding, but it
doesn’t have a 𝑠𝑡𝑒𝑝. There really is no good analogy for how it works, other than through a leap
of faith.
The idea is that we want to construct a mapping out of the equality type. But since equality
type is itself a two-parameter family of types, the mapping out should be a dependent function.
The target of this function is another family of types:

𝑇 (𝑥, 𝑦, 𝑝)

that depends on the pair of values that are being compared 𝑥, 𝑦 ∶ 𝐴, and the proof of equality
𝑝 ∶ 𝐼𝑑(𝑥, 𝑦).
The function we are trying to construct is:

𝑓 ∶ Π𝑥,𝑦∶𝐴 Π𝑝∶𝐼𝑑(𝑥,𝑦) 𝑇 (𝑥, 𝑦, 𝑝)

It’s convenient to think of it as generating a proof that for all points 𝑥 and 𝑦, and for every
proof that the two are equal, the proposition 𝑇 (𝑥, 𝑦, 𝑝) is true. Notice that, potentially, we have
a different proposition for every proof that the two points are equal.
The least that we can demand from 𝑇 (𝑥, 𝑦, 𝑝) is that it should be true when 𝑥 and 𝑦 are
literally the same, and the equality proof is the obvious 𝑟𝑒𝑓 𝑙. This requirement can be expressed
as a dependent function:
𝑡 ∶ Π𝑥∶𝐴 𝑇 (𝑥, 𝑥, 𝑟𝑒𝑓 𝑙(𝑥))

Notice that we are not even considering proofs of 𝑥 = 𝑥, other than those given by reflexivity.
Do such proofs exist? We don’t know and we don’t care.
So this is our grounding, the starting point of a journey that should lead us to defining our
𝑓 for all pairs of points and all proofs of equality. The intuition is that we are defining 𝑓 as a
function on a plane (𝑥, 𝑦), with a third dimension given by 𝑝. To do that, we’re given something
that’s defined on the diagonal (𝑥, 𝑥), with 𝑝 restricted to 𝑟𝑒𝑓 𝑙.

[Figure: the (𝑥, 𝑦) plane with a third dimension for proofs 𝑝 ∶ 𝐼𝑑(𝑥, 𝑦); the function 𝑡 is given only on the diagonal, where 𝑝 is restricted to 𝑟𝑒𝑓 𝑙.]

You would think that we need something more, some kind of a 𝑠𝑡𝑒𝑝 that would move us from
one point to another. But, unlike with natural numbers, there is no next point or next equality
proof to jump to. All we have at our disposal is the function 𝑡 and nothing else.
Therefore we postulate that, given a type family 𝑇 (𝑥, 𝑦, 𝑝) and a function:

𝑡 ∶ Π𝑥∶𝐴 𝑇 (𝑥, 𝑥, 𝑟𝑒𝑓 𝑙(𝑥))

there exists a function:


𝑓 ∶ Π𝑥,𝑦∶𝐴 Π𝑝∶𝐼𝑑(𝑥,𝑦) 𝑇 (𝑥, 𝑦, 𝑝)
such that (computation rule):
𝑓 (𝑥, 𝑥, 𝑟𝑒𝑓 𝑙(𝑥)) = 𝑡(𝑥)
Notice that the equality in the computation rule is definitional equality, not a type.
Equality elimination tells us that it’s always possible to extend the function 𝑡, which is de-
fined on the diagonal, to the whole 3-d space.
This is a very strong postulate. One way to understand it is to argue that, within the frame-
work of type theory—which is formulated using the language of introduction and elimination
rules, and the rules for manipulating those—it’s impossible to define a type family 𝑇 (𝑥, 𝑦, 𝑝)
that would not satisfy the equality elimination rule.
The closest analogy that we’ve seen so far is the result of parametricity, which states that,
in Haskell, all polymorphic functions between endofunctors are automatically natural transfor-
mations. Another example, this time from calculus, is that any analytic function defined on the
real axis has a unique extension to the whole complex plane.
The use of dependent types blurs the boundary between programming and mathematics.
There is a whole spectrum of languages, starting with Haskell barely dipping its toes in depen-
dent types while still firmly established in commercial usage, all the way to theorem provers,
which are helping mathematicians formalize mathematical proofs.
Chapter 12

Algebras

The essence of algebra is the formal manipulation of expressions. But what are expressions,
and how do we manipulate them?
The first thing to observe about algebraic expressions like 2(𝑥 + 𝑦) or 𝑎𝑥^2 + 𝑏𝑥 + 𝑐 is
that there are infinitely many of them. There is a finite number of rules for making them, but
these rules can be used in infinitely many combinations. This suggests that the rules are used
recursively.
In programming, expressions are virtually synonymous with (parse) trees. Consider this
simple example of an arithmetic expression:
data Expr = Val Int
| Plus Expr Expr

It’s a recipe for building trees. We start with little trees using the Val constructor. We then plant
these seedlings into nodes, and so on.
e2 = Val 2
e3 = Val 3
e5 = Plus e2 e3
e7 = Plus e5 e2

Such recursive definitions work perfectly well in a programming language. The problem is
that every new recursive data structure would require its own library of functions that operate
on it.
From type-theory point of view, we’ve been able to define recursive types, such as natural
numbers or lists, by providing, in each case, specific introduction and elimination rules. What
we need is something more general, a procedure for generating arbitrary recursive types from
simpler pluggable components.
There are two orthogonal concerns when it comes to recursive data structures. One is the
machinery of recursion. The other is the pluggable components.
We know how to work with recursion: We assume that we know how to construct small
trees. We then use the recursive step to plant those trees into nodes to make bigger trees.
Category theory tells us how to formalize this imprecise description.


12.1 Algebras from Endofunctors


The idea of planting smaller trees into nodes requires that we formalize what it means to have a
data structure with holes—a “container for stuff.” This is exactly what functors are for. Because
we want to use these functors recursively, they have to be endo-functors.
For instance, the endofunctor from our earlier example would be defined by the following
data structure, where x’s mark the spots:
data ExprF x = ValF Int
| PlusF x x
Information about all possible shapes of expressions is abstracted into one single functor.
The other important piece of information when defining an algebra is the recipe for evalu-
ating expressions. This, too, can be encoded using the same endofunctor.
Thinking recursively, let’s assume that we know how to evaluate all subtrees of a larger
expression. Then the remaining step is to plug these results into the top level node and evaluate
it.
For instance, suppose that the x’s in the functor were replaced by integers—the results of
evaluation of the subtrees. It’s pretty obvious what we should do in the last step. If the top of
our tree is just a leaf ValF (which means there were no subtrees to evaluate) we’ll just return
the integer stored in it. If it’s a PlusF node, we’ll add the two integers in it. This recipe can be
encoded as:
eval :: ExprF Int -> Int
eval (ValF n) = n
eval (PlusF m n) = m + n
We have made some seemingly obvious assumptions based on common sense. For instance,
since the node was called PlusF we assumed that we should add the two numbers. But multi-
plication or subtraction would work equally well.
Since the leaf ValF contained an integer, we assumed that the expression should evaluate
to an integer. But there is an equally plausible evaluator that pretty-prints the expression by
converting it to a string. This evaluator uses concatenation instead of addition:
pretty :: ExprF String -> String
pretty (ValF n) = show n
pretty (PlusF s t) = s ++ " + " ++ t

In fact there are infinitely many evaluators, some sensible, others less so, but we shouldn’t
be judgmental. Any choice of the target type and any choice of the evaluator should be equally
valid. This leads to the following definition:
An algebra for an endofunctor 𝐹 is a pair (𝑐, 𝛼). The object 𝑐 is called the carrier of the
algebra, and the evaluator 𝛼 ∶ 𝐹 𝑐 → 𝑐 is called the structure map.
In Haskell, given the functor f we define:
type Algebra f c = f c -> c

Notice that the evaluator is not a polymorphic function. It’s a specific choice of a function
for a specific type c. There may be many choices of the carrier types and there may be many different
evaluators for a given type. They all define separate algebras.
We have previously defined two algebras for ExprF. This one has Int as a carrier:
eval :: Algebra ExprF Int
eval (ValF n) = n
eval (PlusF m n) = m + n
and this one has String as a carrier:
pretty :: Algebra ExprF String
pretty (ValF n) = show n
pretty (PlusF s t) = s ++ " + " ++ t

12.2 Category of Algebras


Algebras for a given endofunctor 𝐹 form a category. An arrow in that category is an algebra
morphism, which is a structure-preserving arrow between their carrier objects.
Preserving structure in this case means that the arrow must commute with the two structure
maps. This is where functoriality comes into play. To switch from one structure map to another,
we have to be able to lift an arrow that goes between their carriers.
Given an endofunctor 𝐹 , an algebra morphism between two algebras (𝑎, 𝛼) and (𝑏, 𝛽) is an
arrow 𝑓 ∶ 𝑎 → 𝑏 that makes this diagram commute:

        𝐹𝑓
   𝐹𝑎 ------> 𝐹𝑏
   |           |
  𝛼|           |𝛽
   v           v
   𝑎 --------> 𝑏
        𝑓

In other words, the following equation must hold:

𝑓 ◦𝛼 = 𝛽◦𝐹 𝑓

The composition of two algebra morphisms is again an algebra morphism, which can be
seen by pasting together two such diagrams (a functor maps composition to composition). The
identity arrow is also an algebra morphism, because

𝑖𝑑𝑎 ◦𝛼 = 𝛼◦𝐹 (𝑖𝑑𝑎 )

(a functor maps identity to identity).


The commuting condition in the definition of an algebra morphism is very restrictive. Con-
sider for instance a function that maps an integer to a string. In Haskell there is a show function
(actually, a method of the Show class) that does it. It is not an algebra morphism from eval to
pretty.

Exercise 12.2.1. Show that show is not an algebra morphism. Hint: Consider what happens to
the PlusF node.

Initial algebra
The initial object in the category of algebras for a given functor 𝐹 is called the initial algebra
and, as we’ll see, it plays a very important role.
By definition, the initial algebra (𝑖, 𝜄) has a unique algebra morphism 𝑓 from it to any other
algebra (𝑎, 𝛼). Diagrammatically:

        𝐹𝑓
   𝐹𝑖 ------> 𝐹𝑎
   |           |
  𝜄|           |𝛼
   v           v
   𝑖 --------> 𝑎
        𝑓
This unique morphism is called a catamorphism for the algebra (𝑎, 𝛼). It is sometimes written
using “banana brackets” as ⦇𝛼⦈.

Exercise 12.2.2. Let’s define two algebras for the following functor:
data FloatF x = Num Float | Op x x
The first algebra:
addAlg :: Algebra FloatF Float
addAlg (Num x) = log x
addAlg (Op x y) = x + y
The second algebra:
mulAlg :: Algebra FloatF Float
mulAlg (Num x) = x
mulAlg (Op x y) = x * y
Make a convincing argument that log (logarithm) is an algebra morphism between these two.
(Float is a built-in floating-point number type.)

12.3 Lambek’s Lemma and Fixed Points


Lambek’s lemma says that the structure map 𝜄 of the initial algebra is an isomorphism.
The reason for it is the self-similarity of algebras. You can lift any algebra (𝑎, 𝛼) using 𝐹 ,
and the result (𝐹 𝑎, 𝐹 𝛼) is also an algebra with the structure map 𝐹 𝛼 ∶ 𝐹 (𝐹 𝑎) → 𝐹 𝑎.
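In Haskell, this lifting is a one-liner (a sketch; the name liftAlg is ours). Unfolding the
Algebra synonym, fmap alg turns an f (f a) into an f a, which is exactly the structure map 𝐹 𝛼:

liftAlg :: Functor f => Algebra f a -> Algebra f (f a)
liftAlg alg = fmap alg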
In particular, if you lift the initial algebra (𝑖, 𝜄), you get a new algebra with the carrier 𝐹 𝑖
and the structure map 𝐹 𝜄 ∶ 𝐹 (𝐹 𝑖) → 𝐹 𝑖. It follows then that there must be a unique algebra
morphism from the initial algebra to it:

        𝐹ℎ
   𝐹𝑖 ------> 𝐹 (𝐹 𝑖)
   |            |
  𝜄|            |𝐹𝜄
   v            v
   𝑖 ---------> 𝐹𝑖
        ℎ

This ℎ is the inverse of 𝜄. To see that, let’s consider the composition 𝜄◦ℎ. It is the arrow at the
bottom of the following diagram

        𝐹ℎ               𝐹𝜄
   𝐹𝑖 ------> 𝐹 (𝐹 𝑖) ------> 𝐹𝑖
   |            |              |
  𝜄|            |𝐹𝜄            |𝜄
   v            v              v
   𝑖 ---------> 𝐹𝑖 ----------> 𝑖
        ℎ                𝜄

This is a pasting of the original diagram with a trivially commuting diagram. Therefore the
whole rectangle commutes. We can interpret this as 𝜄◦ℎ being an algebra morphism from (𝑖, 𝜄)
to itself. But there already is such an algebra morphism—the identity. So, by uniqueness of the
mapping out from the initial algebra, these two must be equal:

𝜄◦ℎ = 𝑖𝑑𝑖

Knowing that, we can now go back to the previous diagram, which states that:

ℎ◦𝜄 = 𝐹 𝜄◦𝐹 ℎ

Since 𝐹 is a functor, it maps composition to composition and identity to identity. Therefore the
right hand side is equal to:
𝐹 (𝜄◦ℎ) = 𝐹 (𝑖𝑑𝑖 ) = 𝑖𝑑𝐹 𝑖
We have thus shown that ℎ is the inverse of 𝜄, which means that 𝜄 is an isomorphism. In
other words:
𝐹𝑖 ≅ 𝑖
We interpret this identity as stating that 𝑖 is a fixed point of 𝐹 (up to isomorphism). The action
of 𝐹 on 𝑖 “doesn’t change it.”
There may be many fixed points, but this one is the least fixed point because there is an
algebra morphism from it to any other fixed point. The least fixed point of an endofunctor 𝐹 is
denoted 𝜇𝐹 , so we write:
𝑖 = 𝜇𝐹

Fixed point in Haskell


Let’s consider how the definition of the fixed point works with our original example given by
the endofunctor:
data ExprF x = ValF Int | PlusF x x
Its fixed point is a data structure defined by the property that ExprF acting on it reproduces it.
If we call this fixed point Expr, the fixed point equation becomes (in pseudo-Haskell):
Expr = ExprF Expr
Expanding ExprF we get:
Expr = ValF Int | PlusF Expr Expr
Compare this with the recursive definition (actual Haskell):
data Expr = Val Int | Plus Expr Expr
We get a recursive data structure as a solution to the fixed-point equation.
In Haskell, we can define a fixed point data structure for any functor (or even just a type
constructor). As we’ll see later, this doesn’t always give us the carrier of the initial algebra. It
only works for those functors that have the “leaf” component.
Let’s call Fix f the fixed point of a functor f. Symbolically, the fixed-point equation can
be written as:
𝑓 (Fix𝑓 ) ≅ Fix𝑓
or, in code,
data Fix f where
In :: f (Fix f) -> Fix f

The data constructor In is exactly the structure map of the initial algebra whose carrier is Fix f.
Its inverse is:
out :: Fix f -> f (Fix f)
out (In x) = x
The Haskell standard library contains a more idiomatic definition:
newtype Fix f = Fix { unFix :: f (Fix f) }
To create terms of the type Fix f we often use “smart constructors.” For instance, with the
ExprF functor, we would define:
val :: Int -> Fix ExprF
val n = In (ValF n)

plus :: Fix ExprF -> Fix ExprF -> Fix ExprF


plus e1 e2 = In (PlusF e1 e2)
and use it to generate expression trees like this one:
e9 :: Fix ExprF
e9 = plus (plus (val 2) (val 3)) (val 4)

12.4 Catamorphisms
Our goal, as programmers, is to be able to perform a computation over a recursive data structure—
to “fold” it. We now have all the ingredients.
The data structure is defined as a fixed point of a functor. An algebra for this functor defines
the operation we want to perform. We’ve seen the fixed point and the algebra combined in the
following diagram:
        𝐹𝑓
   𝐹𝑖 ------> 𝐹𝑎
   |           |
  𝜄|           |𝛼
   v           v
   𝑖 --------> 𝑎
        𝑓
that defines the catamorphism 𝑓 for the algebra (𝑎, 𝛼).
The final piece of information is Lambek’s lemma, which tells us that 𝜄 can be inverted
because it’s an isomorphism. It means that we can read this diagram as:

𝑓 = 𝛼◦𝐹 𝑓 ◦𝜄−1

and interpret it as a recursive definition of 𝑓 .


Let’s redraw this diagram using Haskell notation. The catamorphism depends on the alge-
bra so, for the algebra with the carrier a and the evaluator alg, we’ll have the catamorphism
cata alg.

               fmap (cata alg)
   f (Fix f) -----------------> f a
       ^                         |
    out|                         |alg
       |       cata alg          v
     Fix f --------------------> a
By simply following the arrows, we get this recursive definition:
cata :: Functor f => Algebra f a -> Fix f -> a
cata alg = alg . fmap (cata alg) . out
Here’s what’s happening: We apply this definition to some Fix f. Every Fix f is obtained
by applying In to a functorful of Fix f:
data Fix f where
In :: f (Fix f) -> Fix f
The function out “strips” the data constructor In:
out :: Fix f -> f (Fix f)
out (In x) = x
We can now evaluate the functorful of Fix f by fmap’ing cata alg over it. This is a
recursive application. The idea is that the trees inside the functor are smaller than the original
tree, so the recursion eventually terminates. It terminates when it hits the leaves.
After this step, we are left with a functorful of values, and we apply the evaluator alg to it,
to get the final result.
The power of this approach is that all the recursion is encapsulated in one data type and one
library function: We have the definition of Fix and the catamorphism cata. The client of the
library provides only the non-recursive pieces: the functor and the algebra. These are much
easier to deal with. We have decomposed a complex problem into simpler components.

Examples
We can immediately apply this construction to our earlier examples. You can check that:
cata eval e9
evaluates to 9 and
cata pretty e9
evaluates to the string "2 + 3 + 4".
Sometimes we want to display the tree on multiple lines with indentation. This requires
passing a depth counter to recursive calls. There is a clever trick that uses a function type as a
carrier:
pretty' :: Algebra ExprF (Int -> String)
pretty' (ValF n) i = indent i ++ show n
pretty' (PlusF f g) i = f (i + 1) ++ "\n" ++
indent i ++ "+" ++ "\n" ++
g (i + 1)
The auxiliary function indent replicates the space character:
indent n = replicate (n * 2) ' '
The result of:
cata pretty' e9 0
when printed, looks like this:
2
+
3
+
4
Let’s try defining algebras for other familiar functors. The fixed point of the Maybe functor:
data Maybe x = Nothing | Just x
after some renaming, is equivalent to the type of natural numbers
data Nat = Z | S Nat
An algebra for this functor consists of a choice of the carrier a and an evaluator:
alg :: Maybe a -> a
The mapping out of Maybe is determined by two things: the value corresponding to Nothing
and a function a->a corresponding to Just. In our discussion of the type of natural numbers
we called these init and step. We can now see that the elimination rule for Nat is the cata-
morphism for this algebra.
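To see this concretely, here’s a sketch that folds a Fix Maybe natural down to an Int, with
init equal to 0 and step equal to (+1); the name toInt is ours:

toInt :: Fix Maybe -> Int
toInt = cata alg
  where
    alg Nothing  = 0
    alg (Just n) = n + 1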

Lists as initial algebras


The list type that we’ve seen previously is equivalent to a fixed point of the following functor,
which is parameterized by the type of the list contents a:
data ListF a x = NilF | ConsF a x
An algebra for this functor is a mapping out:
alg :: ListF a c -> c
alg NilF = init
alg (ConsF a c) = step (a, c)
which is determined by the value init and the function step:
init :: c
step :: (a, c) -> c
A catamorphism for such an algebra is the list recursor:
recList :: c -> ((a, c) -> c) -> (List a -> c)
where (List a) can be identified with the fixed point Fix (ListF a).
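The recursor itself is a thin wrapper over cata (a sketch):

recList :: c -> ((a, c) -> c) -> Fix (ListF a) -> c
recList init step = cata alg
  where
    alg NilF        = init
    alg (ConsF a c) = step (a, c)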
We’ve seen before a recursive function that reverses a list. It was implemented by appending
elements to the end of a list, which is very inefficient. It’s easy to rewrite this function using a
catamorphism, but the problem remains.
Prepending elements, on the other hand, is cheap. A better algorithm would traverse the list,
accumulating elements in a first-in-first-out queue, and then pop them one-by-one and prepend
them to a new list.
The queue regimen can be implemented using composition of closures: each closure is a
function that remembers its environment. Here’s the algebra whose carrier is a function type:
revAlg :: Algebra (ListF a) ([a]->[a])
revAlg NilF = id
revAlg (ConsF a f) = \as -> f (a : as)
At each step, this algebra creates a new function. This function, when executed, will apply
the previous function f to a list, which is the result of prepending the current element a to the
function’s argument as. The resulting closure remembers the current element a and the previous
function f.
The catamorphism for this algebra accumulates a queue of such closures. To reverse a list,
we apply the result of the catamorphism for this algebra to the empty list:
reverse :: Fix (ListF a) -> [a]
reverse as = (cata revAlg as) []
This trick is at the core of the fold-left function, foldl. Care should be taken when using it,
because of the danger of stack overflow.
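To make the connection explicit, here is a hedged sketch (foldlViaFoldr is an illustrative name) of foldl expressed through foldr—the list catamorphism—using the same function-type carrier:
foldlViaFoldr :: (b -> a -> b) -> b -> [a] -> b
foldlViaFoldr f z xs = foldr (\a k acc -> k (f acc a)) id xs z

-- In particular, reversing a list is foldlViaFoldr (flip (:)) []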
Lists are so common that their eliminators (called “folds”) are included in the standard li-
brary. But there are infinitely many possible recursive data structures, each generated by its own
functor, and we can use the same catamorphism on all of them.
It’s worth mentioning that the list construction works in any monoidal category with co-
products. We can replace the list functor with the more general:
𝐹𝑥 = 𝐼 + 𝑎 ⊗ 𝑥
where 𝐼 is the unit object and ⊗ is the tensor product. The solution to the fixed point equation:
𝐿𝑎 ≅ 𝐼 + 𝑎 ⊗ 𝐿𝑎
can be formally written as a series:
𝐿𝑎 = 𝐼 + 𝑎 + 𝑎 ⊗ 𝑎 + 𝑎 ⊗ 𝑎 ⊗ 𝑎 + ...
We interpret this as a definition of a list, which can be empty 𝐼, a singleton 𝑎, a two-element list
𝑎 ⊗ 𝑎 and so on.
Incidentally, if you squint hard enough, this solution can be obtained by following a sequence
of formal transformations:
𝐿𝑎 ≅ 𝐼 + 𝑎 ⊗ 𝐿 𝑎
𝐿𝑎 − 𝑎 ⊗ 𝐿 𝑎 ≅ 𝐼
(𝐼 − 𝑎) ⊗ 𝐿𝑎 ≅ 𝐼
𝐿𝑎 ≅ 𝐼∕(𝐼 − 𝑎)
𝐿𝑎 ≅ 𝐼 + 𝑎 + 𝑎 ⊗ 𝑎 + 𝑎 ⊗ 𝑎 ⊗ 𝑎 + ...
where the last step uses the formula for the sum of the geometric series. Admittedly, the inter-
mediate steps make no sense, since there is no subtraction or division defined on objects, yet the
final result makes sense and, as we’ll see later, it may be made rigorous by considering a colimit
of a chain of objects.

12.5 Initial Algebra from Universality


Another way of looking at the initial algebra, at least in 𝐒𝐞𝐭, is to view it as a collection of
catamorphisms that, as a whole, hint at the existence of an underlying object. Instead of seeing
𝜇𝐹 as a set of trees, we can look at it as a set of functions from algebras to their carriers.
In a way, this is just another manifestation of the Yoneda lemma: every data structure can
be described either by mappings in or mappings out. The mappings in, in this case, are the
constructors of the recursive data structure. The mappings out are all the catamorphisms that
can be applied to it.
First, let’s make the polymorphism in the definition of cata explicit:
cata :: Functor f => forall a. Algebra f a -> Fix f -> a


cata alg = alg . fmap (cata alg) . out
and then flip the arguments. We get:
cata' :: Functor f => Fix f -> forall a. Algebra f a -> a
cata' (In x) = \alg -> alg (fmap (flip cata' alg) x)
The function flip reverses the order of arguments to a function:
flip :: (a -> b -> c) -> (b -> a -> c)
flip f b a = f a b
This gives us a mapping from Fix f to a set of polymorphic functions.
Conversely, given a polymorphic function of the type:
forall a. Algebra f a -> a
we can reconstruct Fix f:
uncata :: Functor f => (forall a. Algebra f a -> a) -> Fix f
uncata alga = alga In
In fact, these two functions, cata' and uncata, are the inverse of each other, establishing the
isomorphism between Fix f and the type of polymorphic functions:
data Mu f = Mu (forall a. Algebra f a -> a)
We can now substitute Mu f everywhere we used Fix f.
Folding over Mu f is easy, since Mu carries in itself its own set of catamorphisms:
cataMu :: Algebra f a -> (Mu f -> a)
cataMu alg (Mu h) = h alg
You might be wondering how one can construct terms of the type Mu f for, let’s say, lists. It
can be done using recursion:
fromList :: forall a. [a] -> Mu (ListF a)
fromList as = Mu h
where h :: forall x. Algebra (ListF a) x -> x
h alg = go as
where
go [] = alg NilF
go (n: ns) = alg (ConsF n (go ns))
To compile this code you have to use the language pragma:
{-# LANGUAGE ScopedTypeVariables #-}
which puts the type variable a in the scope of the where clause.

Exercise 12.5.1. Write a test that takes a list of integers, converts it to the Mu form, and calculates
the sum using cataMu.
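One possible solution sketch (sumAlg and testSum are illustrative names):
sumAlg :: Algebra (ListF Int) Int
sumAlg NilF        = 0
sumAlg (ConsF n s) = n + s

testSum :: Int
testSum = cataMu sumAlg (fromList [1, 2, 3, 4]) -- expected: 10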

12.6 Initial Algebra as a Colimit


In general, there is no guarantee that the initial object in the category of algebras exists. But if it
exists, Lambek’s lemma tells us that it’s a fixed point of the endofunctor for those algebras. The
construction of this fixed point is a little mysterious, since it involves tying the recursive knot.
Loosely speaking, the fixed point is reached after we apply the functor infinitely many times.
Then, applying it once more won’t change anything. Infinity plus one is still infinity. This idea
can be made precise if we take it one step at a time. For simplicity, let’s consider algebras in the
category of sets, which has all the nice properties.
We’ve seen, in our examples, that building instances of recursive data structures always
starts with the leaves. The leaves are the parts in the definition of the functor that don’t depend
on the type parameter: the NilF of the list, the ValF of the tree, the Nothing of the Maybe, etc.
We can tease them out if we apply our functor 𝐹 to the initial object—the empty set 0. Since
the empty set has no elements, the instances of the type 𝐹 0 are leaves only.
Indeed, the only inhabitant of the type Maybe Void is constructed using Nothing. The only
inhabitants of the type ExprF Void are ValF n, where n is an Int.
In other words, 𝐹 0 is the “type of leaves” for the functor 𝐹 . Leaves are trees of depth one.
For the Maybe functor there’s only one. The type of leaves for this functor is a singleton:
m1 :: Maybe Void
m1 = Nothing
In the second iteration, we apply 𝐹 to the leaves from the previous step and get trees of
depth at most two. Their type is 𝐹 (𝐹 0).
For instance, these are all the terms of the type Maybe(Maybe Void):
m2, m2' :: Maybe (Maybe Void)
m2 = Nothing
m2' = Just Nothing
We can continue this process, adding deeper and deeper trees at each step. In the 𝑛-th
iteration, the type 𝐹 𝑛 0 (𝑛-fold application of 𝐹 to the initial object) describes all trees of depth
up to 𝑛. However, for every 𝑛, there are still infinitely many trees of depth greater than 𝑛 that
are not covered.
If we knew how to define 𝐹 ∞ 0, we would have covered all possible trees. The next best
thing that we could try is to add up all those partial trees and construct an infinite sum type. Just
like we have defined sums of two types, we can define sums of many types, including infinitely
many.
An infinite sum (a coproduct):

$\sum_{n=0}^{\infty} F^n 0$

is just like a finite sum, except that it has infinitely many constructors 𝑖ₙ:

[Diagram: the discrete objects 0, 𝐹0, 𝐹(𝐹0), ..., 𝐹ⁿ0, ... with injections 𝑖₀, 𝑖₁, 𝑖₂, ..., 𝑖ₙ, ... into the apex $\sum_{n=0}^{\infty} F^n 0$.]

It has the universal mapping-out property, just like the sum of two types, only with infinitely
many cases. (Obviously, we can’t express it in Haskell.)
To construct a tree of depth 𝑛, we would first select it from 𝐹 𝑛 0 and use the 𝑛-th constructor
𝑖𝑛 to inject it into the sum.
There is just one problem: the same tree shape can also be constructed using any of the
𝐹 𝑚 0, for 𝑚 > 𝑛.
Indeed, we’ve seen the leaf Nothing appear in Maybe Void and Maybe(Maybe Void). In
fact it shows up in any nonzero power of Maybe acting on Void.
Similarly, Just Nothing shows up in all powers starting with two.
Just(Just(Nothing)) shows up in all powers starting with three, and so on...
But there is a way to get rid of all these duplicates. The trick is to replace the sum by a
colimit. Instead of a diagram consisting of discrete objects, we can construct a chain (such
chains are called 𝜔-chains). Let’s call this chain Γ, and its colimit 𝑖:

𝑖 = Colim Γ

[Diagram: the chain Γ: 0 → 𝐹0 → 𝐹(𝐹0) → ... → 𝐹ⁿ0 → ..., with connecting arrows ¡, 𝐹¡, 𝐹(𝐹¡), ..., and cocone spokes 𝑖₀, 𝑖₁, 𝑖₂, ..., 𝑖ₙ into the apex 𝑖.]

It’s almost the same as the sum, but with additional arrows at the base of the cocone. These
arrows are the cumulative liftings of the unique arrow ¡ that goes from the initial object to 𝐹 0
(we called it absurd in Haskell). The effect of these arrows is to collapse the set of infinitely
many copies of the same tree down to just one representative.
To see that, consider for instance a tree of depth 3. It can be first found as an element of
𝐹³0, that is to say, as an arrow 𝑡 ∶ 1 → 𝐹³0. It is injected into the colimit 𝑖 as the composite
𝑖₃ ∘ 𝑡.

[Diagram: the arrows 𝑡 ∶ 1 → 𝐹³0 and 𝑡′ ∶ 1 → 𝐹⁴0, the connecting map 𝐹³(¡) ∶ 𝐹³0 → 𝐹⁴0, and the cocone spokes 𝑖₃ and 𝑖₄ into the apex 𝑖.]

The same shape of a tree is also found in 𝐹⁴0, as the composite 𝑡′ = 𝐹³(¡) ∘ 𝑡. It is injected into
the colimit as the composite 𝑖₄ ∘ 𝑡′ = 𝑖₄ ∘ 𝐹³(¡) ∘ 𝑡.
This time, however, we have the commuting triangle—the face of the cocone:

𝑖₄ ∘ 𝐹³(¡) = 𝑖₃

which means that:

𝑖₄ ∘ 𝑡′ = 𝑖₄ ∘ 𝐹³(¡) ∘ 𝑡 = 𝑖₃ ∘ 𝑡
The two copies of the tree have been identified in the colimit. You can convince yourself that
this procedure removes all duplicates.

The proof
We can prove directly that 𝑖 = Colim Γ is the initial algebra. There is however one assumption
that we have to make: the functor 𝐹 must preserve the colimits of 𝜔-chains. The colimit of 𝐹 Γ
must be equal to 𝐹 𝑖.
Colim(𝐹 Γ) ≅ 𝐹 𝑖
Fortunately, this assumption holds in 𝐒𝐞𝐭 (a consequence of the fact that colimits in 𝐒𝐞𝐭 are built from disjoint unions of sets).
Here’s the sketch of the proof: To show the isomorphism, we’ll first construct an arrow
𝑖 → 𝐹 𝑖 and then an arrow 𝜄 ∶ 𝐹 𝑖 → 𝑖 in the opposite direction. We’ll skip the proof that
they are the inverse of each other. Then we’ll show the universality of (𝑖, 𝜄) by constructing a
catamorphism to an arbitrary algebra.
All subsequent proofs follow a simple pattern. We start with a universal cocone that defines
a colimit. Then we construct another cocone based on the same chain. From universality, there
must be a unique arrow from the colimit to the apex of this new cocone.
We use this trick to construct the mapping 𝑖 → 𝐹 𝑖. If we can construct a cocone from
the chain Γ to Colim(𝐹 Γ) then, by universality, there must be an arrow from 𝑖 to Colim(𝐹 Γ).
The latter, by our assumption that 𝐹 preserves colimits, is isomorphic to 𝐹 𝑖. So we’ll have a
mapping 𝑖 → 𝐹 𝑖.
To construct this cocone, first notice that Colim(𝐹 Γ) is, by definition, the apex of a cocone
𝐹 Γ.

[Diagram: the chain 𝐹Γ: 𝐹0 → 𝐹(𝐹0) → 𝐹³0 → ... → 𝐹ⁿ0 → ..., with connecting arrows 𝐹¡, 𝐹(𝐹¡), 𝐹³¡, ..., and cocone spokes 𝑗₁, 𝑗₂, 𝑗₃, ..., 𝑗ₙ into the apex Colim(𝐹Γ).]

The diagram 𝐹 Γ is the same as Γ, except that it’s missing the naked initial object at the start of
the chain.
The spokes of the cocone we are looking for, from Γ to Colim(𝐹 Γ), are marked in red in the
diagram below:

[Diagram: the chain Γ: 0 → 𝐹0 → 𝐹(𝐹0) → ... → 𝐹ⁿ⁺¹0 → ... drawn above the chain 𝐹Γ: 𝐹0 → 𝐹(𝐹0) → 𝐹³0 → ... → 𝐹ⁿ0 → ...; the spokes of the new cocone (marked in red in the original figure) take each object of Γ along a connecting arrow into 𝐹Γ and then, via the spokes 𝑗₁, 𝑗₂, 𝑗₃, ..., 𝑗ₙ, down to Colim(𝐹Γ).]

Since 𝑖 = Colim Γ is the apex of the universal cocone based on Γ, there must be a unique
mapping out of it to Colim(𝐹 Γ) which, as we said, was equal to 𝐹 𝑖. This is the mapping we
were looking for:
𝑖 → 𝐹𝑖

Next, notice that the chain 𝐹 Γ is a sub-chain of Γ, so it can be embedded in it. It means that
we can construct a cocone from 𝐹 Γ to the apex 𝑖 by going through (a sub-chain of) Γ (the red
arrows below).

[Diagram: the chain 𝐹Γ: 𝐹0 → 𝐹(𝐹0) → ... → 𝐹ⁿ0 → ... embedded in the chain Γ: 0 → 𝐹0 → 𝐹²0 → ... → 𝐹ⁿ0 → ...; the red arrows follow this embedding and then the cocone spokes 𝑖₀, 𝑖₁, 𝑖₂, ..., 𝑖ₙ down to the apex 𝑖.]

From the universality of the Colim(𝐹 Γ) it follows that there is a mapping out

Colim(𝐹 Γ) → 𝑖

and thus we have the mapping in the other direction:

𝜄 ∶ 𝐹𝑖 → 𝑖

This shows that 𝑖 is a carrier of an algebra. In fact, it can be shown that the two mappings are
the inverse of each other, as we would expect from Lambek’s lemma.
To show that (𝑖, 𝜄) is indeed the initial algebra, we have to construct a mapping out of it to an
arbitrary algebra (𝑎, 𝛼 ∶ 𝐹 𝑎 → 𝑎). Again, we can use universality, as long as we can construct
a cocone from Γ to 𝑎.

[Diagram: the chain Γ: 0 → 𝐹0 → 𝐹(𝐹0) → ... → 𝐹ⁿ0 → ..., with cocone spokes 𝑓₀, 𝑓₁, 𝑓₂, ..., 𝑓ₙ into the apex 𝑎.]

The zeroth spoke of this cocone goes from 0 to 𝑎, so it’s just 𝑓₀ = ¡.
The first spoke, 𝐹0 → 𝑎, is 𝑓₁ = 𝛼 ∘ 𝐹𝑓₀, because 𝐹𝑓₀ ∶ 𝐹0 → 𝐹𝑎 and 𝛼 ∶ 𝐹𝑎 → 𝑎.
The second spoke, 𝐹(𝐹0) → 𝑎, is 𝑓₂ = 𝛼 ∘ 𝐹𝑓₁. And so on...
The unique mapping from 𝑖 to 𝑎 is then our catamorphism. With some more diagram chas-
ing, it can be shown that it’s indeed an algebra morphism.
Notice that this construction only works if we can “prime” the process by creating the leaves
of the functor. If, on the other hand, 𝐹 0 ≅ 0, then there are no leaves, and all further iterations
will keep reproducing 0.
Chapter 13

Coalgebras

Coalgebras are just algebras in the opposite category. End of chapter!


Well, maybe not... As we’ve seen before, the category in which we’re working is not sym-
metric with respect to duality. In particular, if we compare the terminal and the initial objects,
their properties are not symmetric. Our initial object has no incoming arrows, whereas the ter-
minal one, besides having unique incoming arrows, has lots of outgoing arrows.
Since initial algebras were constructed starting from the initial object, we might expect ter-
minal coalgebras—which are their duals, therefore generated from the terminal object—not to
be just their mirror images, but to add their own interesting twists.
We’ve seen that the main application of algebras was in processing recursive data structures:
in folding them. Dually, the main application of coalgebras is in generating, or unfolding, the
recursive, tree-like, data structures. The unfolding is done using an anamorphism.
We use catamorphisms to chop trees, we use anamorphisms to grow them.
We cannot produce information from nothing so, in general, both a catamorphism and an
anamorphism tend to reduce the amount of information that’s contained in their input.
After you sum a list of integers, it’s impossible to recover the original list.
By the same token, if you grow a recursive data structure using an anamorphism, the seed
must contain all the information that ends up in the tree. You don’t gain new information, but
the advantage is that the information you have is now stored in a form that’s more convenient
for further processing.

13.1 Coalgebras from Endofunctors


A coalgebra for an endofunctor 𝐹 is a pair consisting of a carrier 𝑎 and a structure map: an
arrow 𝑎 → 𝐹 𝑎.
In Haskell, we define:
type Coalgebra f a = a -> f a
We often think of the carrier as the type of a seed from which we grow the data structure, be it
a list or a tree.
For instance, here’s a functor that can be used to create a binary tree, with integers stored at
the nodes:
data TreeF x = LeafF | NodeF Int x x
deriving (Show, Functor)

We don’t even have to define the instance of Functor for it—the deriving clause tells the com-
piler to generate the canonical one for us (together with the Show instance to allow conversion
to String, if we want to display it).
A coalgebra is a function that takes a seed of the carrier type and produces a functorful of
new seeds. These new seeds can then be used to generate the subtrees, recursively.
Here’s a coalgebra for the functor TreeF that takes a list of integers as a seed:
split :: Coalgebra TreeF [Int]
split [] = LeafF
split (n : ns) = NodeF n left right
where
(left, right) = partition (<= n) ns
If the seed is empty, it generates a leaf; otherwise it creates a new node. This node stores the
head of the list and fills the node with two new seeds. The library function partition splits a
list using a user-defined predicate, here (<= n), less-than-or-equal to n. The result is a pair of
lists: the first one satisfying the predicate; and the second, not.
You can convince yourself that a recursive application of this coalgebra creates a binary
sorted tree. We’ll use this coalgebra later to implement a sort.

13.2 Category of Coalgebras


By analogy with algebra morphisms, we can define coalgebra morphisms as the arrows between
carriers that satisfy a commuting condition.
Given two coalgebras (𝑎, 𝛼) and (𝑏, 𝛽), the arrow 𝑓 ∶ 𝑎 → 𝑏 is a coalgebra morphism if the
following diagram commutes:

[Diagram: a commuting square with 𝑓 ∶ 𝑎 → 𝑏 on top, 𝛼 ∶ 𝑎 → 𝐹𝑎 and 𝛽 ∶ 𝑏 → 𝐹𝑏 on the sides, and 𝐹𝑓 ∶ 𝐹𝑎 → 𝐹𝑏 on the bottom.]

The interpretation is that it doesn’t matter if we first map the carriers and then apply the
coalgebra 𝛽, or first apply the coalgebra 𝛼 and then apply the arrow to its contents, using the
lifting 𝐹 𝑓 .
Coalgebra morphisms can be composed, and the identity arrow is automatically a coalgebra
morphism. It’s easy to see that coalgebras, just like algebras, form a category.
This time, however, we are interested in the terminal object in this category—a terminal
coalgebra. If a terminal coalgebra (𝑡, 𝜏) exists, it satisfies the dual of Lambek’s lemma.

Exercise 13.2.1. Lambek’s lemma: Show that the structure map 𝜏 of the terminal coalgebra
(𝑡, 𝜏) is an isomorphism. Hint: The proof is dual to the one for the initial algebra.

As a consequence of Lambek’s lemma, the carrier of the terminal coalgebra is a fixed point
of the endofunctor in question.
𝐹𝑡 ≅ 𝑡
with 𝜏 and 𝜏⁻¹ serving as the witnesses of this isomorphism.
It also follows that (𝑡, 𝜏⁻¹) is an algebra; just as (𝑖, 𝜄⁻¹) is a coalgebra, assuming that (𝑖, 𝜄) is
the initial algebra.
We’ve seen before that the carrier of the initial algebra is a fixed point. In principle, there
may be many fixed points for the same endofunctor. The initial algebra is the least fixed point
and the terminal coalgebra the greatest fixed point.
The greatest fixed point of an endofunctor 𝐹 is denoted by 𝜈𝐹 , so we have:
𝑡 = 𝜈𝐹
We can also see that there must be a unique algebra morphism (a catamorphism) from the
initial algebra to the terminal coalgebra. That’s because the terminal coalgebra is also an algebra.
Similarly, there is a unique coalgebra morphism from the initial algebra (which is also a
coalgebra) to the terminal coalgebra. In fact, it can be shown that it’s the same underlying
morphism 𝜌 ∶ 𝜇𝐹 → 𝜈𝐹 in both cases.
In the category of sets, the carrier set of the initial algebra is a subset of the carrier set of
the terminal coalgebra, with the function 𝜌 embedding the former in the latter.

[Diagram: the carrier 𝜇𝐹 embedded in 𝜈𝐹 by the morphism 𝜌.]

We’ll see later that in Haskell the situation is more subtle, because of lazy evaluation. But,
at least for functors that have the leaf component—that is, their action on the initial object is
non-trivial—Haskell’s fixed point type works as a carrier for both the initial algebra and the
terminal coalgebra.
data Fix f where
In :: f (Fix f) -> Fix f

Exercise 13.2.2. Show that, for the identity functor in 𝐒𝐞𝐭, every object is a fixed point, the
empty set is the least fixed point, and the singleton set is the greatest fixed point. Hint: The least
fixed point must have arrows going to all other fixed points, and the greatest fixed point must
have arrows coming from all other fixed points.
Exercise 13.2.3. Show that the empty set is the carrier of the initial algebra for the identity
functor in 𝐒𝐞𝐭. Dually, show that the singleton set is this functor’s terminal coalgebra. Hint:
Show that the unique arrows are indeed (co-) algebra morphisms.

13.3 Anamorphisms
The terminal coalgebra (𝑡, 𝜏) is defined by its universal property: there is a unique coalgebra
morphism ℎ from any coalgebra (𝑎, 𝛼) to (𝑡, 𝜏). This morphism is called the anamorphism.
Being a coalgebra morphism, it makes the following diagram commute:

[Diagram: a commuting square with ℎ ∶ 𝑎 → 𝑡 on top, 𝛼 ∶ 𝑎 → 𝐹𝑎 and 𝜏 ∶ 𝑡 → 𝐹𝑡 on the sides, and 𝐹ℎ ∶ 𝐹𝑎 → 𝐹𝑡 on the bottom.]

Just like with algebras, we can use Lambek’s lemma to “solve” for ℎ:

ℎ = 𝜏⁻¹ ∘ 𝐹ℎ ∘ 𝛼

The solution is called an anamorphism and is sometimes written using “lens brackets” as [(𝛼)].
Since the terminal coalgebra (just like the initial algebra) is a fixed point of a functor, the
above recursive formula can be translated directly to Haskell as:
ana :: Functor f => Coalgebra f a -> a -> Fix f
ana coa = In . fmap (ana coa) . coa
Here’s the interpretation of this formula: Given a seed of type a, we first act on it with the
coalgebra coa. This gives us a functorful of seeds. We expand these seeds by recursively
applying the anamorphism using fmap. We then apply the constructor In to get the final result.
As an example, we can apply the anamorphism to the split coalgebra we defined earlier:
ana split takes a list of integers and creates a sorted tree.
We can then use a catamorphism to fold this tree into a sorted list. We define the following
algebra:
toList :: Algebra TreeF [Int]
toList LeafF = []
toList (NodeF n ns ms) = ns ++ [n] ++ ms
It concatenates the left list with the singleton pivot and the right list. To sort a list we combine
the anamorphism with the catamorphism:
qsort = cata toList . ana split
This gives us a (very inefficient) implementation of quicksort. We’ll come back to it in the next
section.
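As a quick sanity check (a hedged sketch, assuming the definitions above are in scope):
sortedList :: [Int]
sortedList = qsort [27, 9, 11, 3] -- evaluates to [3,9,11,27]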

Infinite data structures


When studying algebras we relied on data structures that had a leaf component—that is endo-
functors that, when acting on the initial object, would produce a result different from the initial
object. When constructing recursive data structures we had to start somewhere, and that meant
constructing the leaves first.
With coalgebras, we are free to drop this requirement. We no longer have to construct recur-
sive data structures “by hand”—we have anamorphisms to do that for us. An endofunctor that
has no leaves is perfectly acceptable: its coalgebras are going to generate infinite data structures.
Infinite data structures are representable in Haskell because of its laziness. Things are eval-
uated on the need-to-know basis. Only those parts of an infinite data structure that are explicitly
demanded are calculated; the evaluation of the rest is kept in suspended animation.
To implement infinite data structures in strict languages, one must resort to representing val-
ues as functions—something Haskell does behind the scenes (these functions are called thunks).
Let’s look at a simple example: an infinite stream of values. To generate it, we first define a
functor that looks very much like the one we used to generate lists, except that it lacks the leaf
component (the empty-list constructor). You may recognize it as a product functor, with the
first component fixed to be the stream’s payload:
data StreamF a x = StreamF a x
deriving Functor
An infinite stream is the fixed point of this functor.
type Stream a = Fix (StreamF a)


Here’s a simple coalgebra that uses a single integer n as a seed:
step :: Coalgebra (StreamF Int) Int
step n = StreamF n (n+1)
It stores the current seed as a payload, and seeds the next budding stream with n + 1.
The anamorphism for this coalgebra, when seeded with zero, generates the stream of all
natural numbers.
allNats :: Stream Int
allNats = ana step 0
In a non-lazy language this anamorphism would run forever, but in Haskell it’s instantaneous.
The incremental price is paid only when we want to retrieve some of the data, for instance, using
these accessors:
head :: Stream a -> a
head (In (StreamF a _)) = a

tail :: Stream a -> Stream a


tail (In (StreamF _ s)) = s
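A hedged helper, not from the text, that forces a finite prefix of such a stream (it uses the head and tail accessors just defined, assuming their Prelude namesakes are hidden):
takeStream :: Int -> Stream a -> [a]
takeStream n s
  | n <= 0    = []
  | otherwise = head s : takeStream (n - 1) (tail s)

-- takeStream 5 allNats evaluates to [0,1,2,3,4]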

13.4 Hylomorphisms
The type of the output of an anamorphism is a fixed point of a functor, which is the same type
as the input to a catamorphism. In Haskell, they are both described by the same data type,
Fix f. Therefore it’s possible to compose them together, as we’ve done when implementing
quicksort. In fact, we can combine a coalgebra with an algebra in one recursive function called
a hylomorphism:
hylo :: Functor f => Algebra f b -> Coalgebra f a -> a -> b
hylo alg coa = alg . fmap (hylo alg coa) . coa
We can rewrite quicksort as a hylomorphism:
qsort = hylo toList split
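Here is another hedged example of the same pattern (down and mul are illustrative names): the coalgebra unfolds a non-negative n into a virtual list n, n-1, ..., 1, and the algebra folds it back with multiplication, yielding the factorial:
factorial :: Int -> Int
factorial = hylo mul down
  where
    down :: Coalgebra (ListF Int) Int
    down 0 = NilF
    down n = ConsF n (n - 1) -- assumes n >= 0

    mul :: Algebra (ListF Int) Int
    mul NilF        = 1
    mul (ConsF n m) = n * m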

Notice that there is no trace of the fixed point in the definition of the hylomorphism. Con-
ceptually, the coalgebra is used to build (unfold) the recursive data structure from the seed, and
the algebra is used to fold it into a value of type b. But because of Haskell’s laziness, the in-
termediate data structure doesn’t have to be materialized in full in memory. This is particularly
important when dealing with very large intermediate trees. Only the branches that are currently
being traversed are evaluated and, as soon as they have been processed, they are passed to the
garbage collector.
Hylomorphisms in Haskell are a convenient replacement for recursive backtracking algo-
rithms, which are very hard to implement correctly in imperative languages. We take advantage
of the fact that designing a data structure is easier than following complicated flow of control
and keeping track of our place in a recursive algorithm.
This way, data structures can be used to visualize complex flows of control.
The impedance mismatch


We’ve seen that, in the category of sets, the initial algebras don’t necessarily coincide with
terminal coalgebras. The identity functor, for instance, has the empty set as the carrier of the
initial algebra and the singleton set as the carrier of its terminal coalgebra.
We have other functors that have no leaf components, such as the stream functor. The initial
algebra for such a functor is the empty set as well.
In 𝐒𝐞𝐭, the initial algebra is the subset of the terminal coalgebra, and hylomorphisms can only
be defined for this subset. It means that we can use a hylomorphism only if the anamorphism
for a particular coalgebra lands us in this subset. In that case, because the embedding of initial
algebras in terminal coalgebras is injective, we can find the corresponding element in the initial
algebra and apply the catamorphism to it.
In Haskell, however, we have one type, Fix f, combining both, the initial algebra and the
terminal coalgebra. This is where the simplistic interpretation of Haskell types as sets of values
breaks down.
Let’s consider this simple stream algebra:
add :: Algebra (StreamF Int) Int
add (StreamF n sum) = n + sum
Nothing prevents us from using a hylomorphism to calculate the sum of all natural numbers:
sumAllNats :: Int
sumAllNats = hylo add step 1
It’s a perfectly well-formed Haskell program that passes the type checker. So what value does
it produce when we run it? (Hint: It’s not −1∕12.) The answer is: we don’t know, because this
program never terminates. It runs into infinite recursion and eventually exhausts the computer’s
resources.
This is the aspect of real-life computations that mere functions between sets cannot model.
Some computer functions may never terminate.
Recursive functions are formally described by domain theory as limits of partially defined
functions. If a function is not defined for a particular value of the argument, it is said to return a
bottom value ⊥. If we include bottoms as special elements of every type (these are then called
lifted types), we can say that our function sumAllNats returns a bottom of the type Int. In
general, catamorphisms for infinite types don’t terminate, so we can treat them as returning
bottoms.
It should be noted, however, that the inclusion of bottoms complicates the categorical inter-
pretation of Haskell. In particular, many of the universal constructions that rely on uniqueness
of mappings no longer work as advertised.
The “bottom” line is that Haskell code should be treated as an illustration of categorical
concepts rather than a source of rigorous proofs.

13.5 Terminal Coalgebra from Universality


The definition of an anamorphism can be seen as an expression of the universal property of the
terminal coalgebra. Here it is, with the universal quantification made explicit:
ana :: Functor f => forall a. Coalgebra f a -> (a -> Fix f)
ana coa = In . fmap (ana coa) . coa
What it tells us is that, given any coalgebra, there is a mapping from its carrier to the carrier
of the terminal coalgebra, Fix f. We know, from Lambek’s lemma, that this mapping is in
fact a coalgebra morphism.
Let’s uncurry this definition:
ana :: Functor f => forall a. (a -> f a, a) -> Fix f
ana (coa, x) = In (fmap (curry ana coa) (coa x))
We can use this formula as the alternative definition of the carrier for the terminal coalgebra.
We can replace Fix f with the type we are defining—let’s call it Nu f. The type signature:
forall a. (a -> f a, a) -> Nu f
tells us that we can construct an element of Nu f from a pair (a -> f a, a). It looks just like
a data constructor, except that it’s polymorphic in a.
Data types with a polymorphic constructor are called existential types. In pseudo-code (not
actual Haskell) we would define Nu f as:
data Nu f = Nu (exists a. (Coalgebra f a, a))
Compare this with the definition of the least fixed point of an algebra:
data Mu f = Mu (forall a. Algebra f a -> a)
To construct an element of an existential type, we have the option of picking the most con-
venient type—the type for which we have the data required by the constructor.
For instance, we can construct a term of the type Nu (StreamF Int) by picking Int as the
convenient type, and providing the pair:
nuArgs :: (Int -> StreamF Int Int, Int)
nuArgs = (\n -> StreamF n (n+1) , 0)
The clients of an existential data type have no idea what type was used in its construction.
All they know is that such a type exists—hence the name. If they want to use an existential type,
they have to do it in a way that is not sensitive to the choice that was made in its construction.
In practice, it means that an existential type must carry with itself both the producer and the
consumer of the hidden value.
This is indeed the case in our example: the producer is just the value of type a, and the
consumer is the function a -> f a.
Naively, all that the clients could do with this pair, without any knowledge of what the type
a was, is to apply the function to the value. But if f is a functor, they can do much more. They
can repeat the process by applying the lifted function to the contents of f a, and so on. They
end up with all the information that’s contained in the infinite stream.
There are several ways of defining existential data types in Haskell. We can use the uncurried
version of the anamorphism directly as the data constructor:
data Nu f where
Nu :: forall a f. (a -> f a, a) -> Nu f
Notice that, in Haskell, if we explicitly quantify one type, all other type variables must also be
quantified: here, it’s the type constructor f (however, Nu f is not existential in f, since it’s an
explicit parameter).
We can also omit the quantification altogether:
data Nu f where
Nu :: (a -> f a, a) -> Nu f
This is because type variables that are not arguments to the type constructor are automatically
treated as existentials.
We can also use the more traditional form:
data Nu f = forall a. Nu (a -> f a, a)
(This one requires the quantification of a.)
At the time of this writing there is a proposal to introduce the keyword exists to Haskell
that would make this definition work:
data Nu f = Nu (exists a. (a -> f a, a))
(Later we’ll see that existential data types correspond to coends in category theory.)
The constructor of Nu f is literally the (uncurried) anamorphism:
anaNu :: Coalgebra f a -> a -> Nu f
anaNu coa a = Nu (coa, a)
If we are given a stream in the form of Nu (StreamF a), we can access its elements using
accessor functions. This one extracts the first element:
head :: Nu (StreamF a) -> a
head (Nu (unf, s)) =
let (StreamF a _) = unf s
in a
and this one advances the stream:
tail :: Nu (StreamF a) -> Nu (StreamF a)
tail (Nu (unf, s)) =
let (StreamF _ s') = unf s
in Nu (unf, s')
You can test them on an infinite stream of integers:
allNats = Nu nuArgs
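A hedged test sketch (firstThree is an illustrative name; head and tail are the Nu-based accessors defined above, assuming their Prelude namesakes are hidden):
firstThree :: [Int]
firstThree = [head allNats, head (tail allNats), head (tail (tail allNats))]
-- evaluates to [0,1,2]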

13.6 Terminal Coalgebra as a Limit


In category theory we are not afraid of infinities—we make sense of them.
At face value, the idea that we could construct a terminal coalgebra by applying the functor
𝐹 infinitely many times to some object, let’s say the terminal object 1, makes no sense. But
the idea is very convincing: Applying 𝐹 one more time is like adding one to infinity—it’s still
infinity. So, naively, this is a fixed point of 𝐹 :

𝐹 (𝐹 ∞ 1) ≅ 𝐹 ∞+1 1 ≅ 𝐹 ∞ 1

To turn this loose reasoning into a rigorous proof, we have to tame the infinity, which means
we have to define some kind of limiting procedure.
As an example, let’s consider the product functor:

𝐹𝑎 𝑥 = 𝑎 × 𝑥

Its terminal coalgebra is an infinite stream. We’ll approximate it by starting with the terminal
object 1. The next step is:
𝐹𝑎 1 = 𝑎 × 1 ≅ 𝑎
which we could imagine is a stream of length one. We can continue with:

𝐹𝑎 (𝐹𝑎 1) = 𝑎 × (𝑎 × 1) ≅ 𝑎 × 𝑎

a stream of length two, and so on.


This looks promising, but what we need is one object that would combine all these approx-
imations. We need a way to glue the next approximation to the previous one.
Recall, from an earlier exercise, the limit of the “walking arrow” diagram. This limit has
the same elements as the starting object in the diagram. In particular, consider the limit of the
single-arrow diagram 𝐷1 :
[Diagram: the single-arrow diagram 𝐷₁: the arrow ! ∶ 𝐹1 → 1, with limit cone projections 𝜋₀ and 𝜋₁ from the apex Lim 𝐷₁.]

(! is the unique morphism targeting the terminal object 1). This limit has the same elements as
𝐹 1. Similarly, the limit of a two-arrow diagram 𝐷2 :

[Diagram: the two-arrow diagram 𝐷₂: 1 ← 𝐹1 ← 𝐹(𝐹1) with arrows ! and 𝐹!, and limit cone projections 𝜋₀, 𝜋₁, 𝜋₂ from the apex Lim 𝐷₂.]

has the same elements as 𝐹 (𝐹 1).


We can continue extending this diagram to infinity. It turns out that the limit of this infinite
chain is the fixed point we are after: the carrier of the terminal coalgebra.

[Diagram: the infinite chain 1 ← 𝐹1 ← 𝐹(𝐹1) ← ... ← 𝐹ⁿ1 ← ..., with arrows !, 𝐹!, 𝐹(𝐹!), ..., 𝐹ⁿ!, and limit cone projections 𝜋₀, 𝜋₁, 𝜋₂, ..., 𝜋ₙ from the apex 𝑡.]

The proof of this fact can be obtained from the analogous proof for initial algebras by reversing
the arrows.
Chapter 14

Monads

What do a wheel, a clay pot, and a wooden house have in common? They are all useful because
of the emptiness in their center.
Lao Tzu says: “The value comes from what is there, but the use comes from what is not
there.”
What do the Maybe functor, the list functor, and the reader functor have in common? They
all have emptiness in their center.
When monads are explained in the context of programming, it’s hard to see the common
pattern when you focus on the functors. To understand monads you have to look inside functors
and read between the lines of code.

14.1 Programming with Side Effects


So far we’ve been talking about programming in terms of computations that were modeled
mainly on functions between sets (with the exception of non-termination). In programming,
such functions are called total and pure.
A total function is defined for all values of its arguments.
A pure function is implemented purely in terms of its arguments and, in case of closures,
the captured values—it has no access to, much less the ability to modify the outside world.
Most real-world programs, though, have to interact with the external world: they read and
write files, process network packets, prompt users for data, etc. Most programming languages
solve this problem by allowing side effects. A side effect is anything that breaks the totality or
the purity of a function.
Unfortunately, this shotgun approach adopted by imperative languages makes reasoning
about programs extremely hard. When composing effectful computations one has to carefully
reason about the composition of effects on a case-by-case basis. To make things even harder,
most effects are hidden not only inside the implementation (as opposed to the interface) of a par-
ticular function, but also in the implementation of all the functions that it’s calling, recursively.
The solution adopted by purely functional languages, like Haskell, is to encode side effects
in the return types of pure functions. Amazingly, this is possible for all relevant effects.
The idea is that, instead of a computation of the type a->b with side effects, we use a function
a -> f b, where the type constructor f encodes the appropriate effect. At this point there are
no conditions imposed on f. It doesn’t even have to be a Functor, much less a monad. This
will come later, when we talk about effect composition.

Below is the list of common effects and their pure-function versions.

Partiality
In imperative languages, partiality is often encoded using exceptions. When a function is called
with the “wrong” value for its argument, it throws an exception. In some languages, the type of
exception is encoded in the signature of the function using special syntax.
In Haskell, a partial computation can be implemented by a function returning the result
inside the Maybe functor. Such a function, when called with the “wrong” argument, returns
Nothing, otherwise it wraps the result in the Just constructor.
If we want to encode more information about the type of the failure, we can use the Either
functor, with the Left traditionally passing the error data (often a simple String); and Right
encapsulating the real return, if available.
The caller of a Maybe-valued function cannot easily ignore the exceptional condition. In
order to extract the value, they have to pattern-match the result and decide how to deal with
Nothing. This is in contrast to the “poor-man’s Maybe” of some imperative languages where
the error condition is encoded using a null pointer.
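For illustration, here is a hedged sketch of a partial computation made total (safeSqrt is an illustrative name, not from the text):
safeSqrt :: Double -> Maybe Double
safeSqrt x = if x >= 0
             then Just (sqrt x)
             else Nothing -- the "wrong" arguments: negative numbers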

Logging
Sometimes a computation has to log some values in some external data structure. Logging or
auditing is a side effect that’s particularly dangerous in concurrent programs, where multiple
threads might try to access the same log simultaneously.
The simple solution is for a function to return the computed value paired with the item to be
logged. In other words, a logging computation of the type a -> b can be replaced by a pure
function:
a -> Writer w b
where the Writer functor is a thin encapsulation of the product:
newtype Writer w a = Writer (a, w)
with w being the type of the log.
The caller of this function is then responsible for extracting the value to be logged. This is
a common trick: make the function provide all the data, and let the caller deal with the effects.
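For instance, here is a hedged sketch of such a function (logIncrement is an illustrative name), with String as the type of the log:
logIncrement :: Int -> Writer String Int
logIncrement n = Writer (n + 1, "incremented " ++ show n ++ "; ")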

Environment
Some computations need read-only access to some external data stored in the environment.
The read-only environment, instead of being secretly accessed by a computation, can be simply
passed to a function as an additional argument. If we have a computation a -> b that needs
access to some environment e, we replace it with a function (a, e) -> b. At first, this doesn’t
seem to fit the pattern of encoding side effects in the return type. However, such a function can
always be curried to the form:
a -> (e -> b)
The return type of this function can be encoded in the reader functor, itself parameterized by
the environment type e:
newtype Reader e a = Reader (e -> a)
This is an example of a delayed side effect. The function:
a -> Reader e a
doesn’t want to deal with effects so it delegates the responsibility to the caller. You may think
of it as producing a script to be executed at a later time. The function runReader plays the role
of a very simple interpreter of this script:
runReader :: Reader e a -> e -> a
runReader (Reader h) e = h e

State
The most common side effect is related to accessing and potentially modifying some shared
state. Unfortunately, shared state is the notorious source of concurrency errors. This is a serious
problem in object-oriented languages where stateful objects can be transparently shared between
many clients. In Java, such objects may be provided with individual mutexes at the cost of
impaired performance and the risk of deadlocks.
In functional programming we make state manipulations explicit: we pass the state as an
additional argument and return the modified state paired with the return value. We replace a
stateful computation a -> b with
(a, s) -> (b, s)
where s is the type of state. As before, we can curry such a function to get it to the form:
a -> (s -> (b, s))
This return type can be encapsulated in the following functor:
newtype State s a = State (s -> (a, s))
The caller of such a function is supposed to retrieve the result and the modified state by providing
the initial state and calling the helper function, the interpreter, runState:
runState :: State s a -> s -> (a, s)
runState (State h) s = h s
Notice that, modulo constructor unpacking, runState is bona fide function application.

Nondeterminism
Imagine performing a quantum experiment that measures the spin of an electron. Half of the
time the spin will be up, and half of the time it will be down. The result is non-deterministic. One
way to describe it is to use the many-worlds interpretation: when we perform the experiment,
the Universe splits into two universes, one for each result.
What does it mean for a function to be non-deterministic? It means that it will return different
results every time it’s called. We can model this behavior using the many-worlds interpretation:
we let the function return all possible results at once. In practice, we’ll settle for a (possibly
infinite) list of results:
We replace a non-deterministic computation a -> b with a pure function returning a func-
torful of results—this time it’s the list functor:
a -> [b]
Again, it’s up to the caller to decide what to do with these results.
Input/Output
This is the trickiest side effect because it involves interacting with the external world. Obviously,
we cannot model the whole world inside a computer program. So, in order to keep the program
pure, the interaction has to happen outside of it. The trick is to let the program generate a script.
This script is then passed to the runtime to be executed. The runtime is the effectful virtual
machine that runs the program.
This script itself sits inside the opaque, predefined IO functor. The values hidden in this
functor are not accessible to the program: there is no runIO function. Instead, the IO value
produced by the program is executed, at least conceptually, after the program is finished.
In reality, because of Haskell’s laziness, the execution of I/O is interleaved with the rest of
the program. Pure functions that comprise the bulk of your program are evaluated on demand—
the demand being driven by the execution of the IO script. If it weren’t for I/O, nothing would
ever be evaluated.
The IO object that is produced by a Haskell program is called main and its type signature
is:
main :: IO ()
It’s the IO functor containing the unit—meaning: there is no useful value other than the in-
put/output script.
We’ll talk about how IO actions are created soon.

Continuation
We’ve seen that, as a consequence of the Yoneda lemma, we can replace a value of type a with
a function that takes a handler for that value. This handler is called a continuation. Calling a
handler is considered a side effect of a computation. In terms of pure functions, we encode it
as:
a -> Cont r b
where Cont r is the following functor:
newtype Cont r a = Cont ((a -> r) -> r)
It’s the responsibility of the caller of this function to provide the continuation, a function k :: a -> r,
and retrieve the result:
runCont :: Cont r a -> (a -> r) -> r
runCont (Cont f) k = f k
This is the Functor instance for Cont r:
instance Functor (Cont r) where
-- f :: a -> b
-- k :: b -> r
fmap f c = Cont (\k -> runCont c (k . f))
Notice that this is a covariant functor because the type a is in a doubly negative position.
In a cartesian closed category, continuations are generated by the endofunctor:

$K_r\, a = r^{r^a}$

14.2 Composing Effects


Now that we know how to make one giant leap using a function that produces both a value
and a side effect, the next problem is to figure out how to decompose this leap into smaller
human-sized steps. Or, conversely, how to combine such smaller steps into one larger step.
The way effectful computations are composed in imperative languages is to use regular
function composition for the values and let the side effects combine themselves willy-nilly.
When we represent effectful computations as pure functions, we are faced with the problem
of composing two functions of the form
g :: a -> f b
h :: b -> f c
In all cases of interest the type constructor f happens to be a Functor, so we’ll assume that in
what follows.
The naive approach would be to unpack the result of the first function, pass the value to the
next function, then compose the effects of both functions on the side, and combine them with
the result of the second function. This is not always possible, even for cases that we have studied
so far, much less for an arbitrary type constructor.
For the sake of the argument, it’s instructive to see how we could do it for the Maybe functor.
If the first function returns Just, we pattern match it to extract the contents and call the next
function with it.
But if the first function returns Nothing, we have no value with which to call the second
function. We have to short-circuit it, and return Nothing directly. So composition is possible,
but it means modifying flow of control by skipping the second call based on the side effect of
the first call.
For some functors the composition of side effects is possible, for others it’s not. How can
we characterize those “good” functors?
For a functor to encode composable side effects we must at least be able to implement the
following polymorphic higher-order function:
composeWithEffects :: Functor f =>
(b -> f c) -> (a -> f b) -> (a -> f c)
This is very similar to regular function composition:
(.) :: (b -> c) -> (a -> b) -> (a -> c)
so it’s natural to ask if there is a category in which the former defines a composition of arrows.
Let’s see what more is needed to construct such a category.
Objects in this new category are the same Haskell types as before. But an arrow 𝑎 ↠ 𝑏 is
implemented as a Haskell function:
g :: a -> f b
Our composeWithEffects can then be used to implement the composition of such arrows.
To have a category, we require that this composition be associative. We also need an identity
arrow for every object a. This is an arrow 𝑎 ↠ 𝑎, so it corresponds to a Haskell function:
idWithEffects :: a -> f a
It must behave like identity with respect to composeWithEffects.
Another way of looking at this arrow is that it lets you add a trivial effect to any value of
type a. It’s the effect that combined with any other effect does nothing to it.
We have just defined a monad! After some renaming and rearranging, we can write it as a
typeclass:
class Functor m => Monad m where
(<=<) :: (b -> m c) -> (a -> m b) -> (a -> m c)
return :: a -> m a
The infix operator <=< replaces the function composeWithEffects. The return function is
the identity arrow in our new category. (This is not the definition of the monad you’ll find in
Haskell’s Prelude but, as we’ll see soon, it’s equivalent to it.)
As an exercise, let’s define the Monad instance for Maybe. The “fish” operator <=< composes
two functions:
f :: a -> Maybe b
g :: b -> Maybe c
into one function of the type a -> Maybe c. The unit of this composition, return, encloses a
value in the Just constructor.
instance Monad Maybe where
g <=< f = \a -> case f a of
Nothing -> Nothing
Just b -> g b
return = Just
You can easily convince yourself that category laws are satisfied. In particular return <=< g
is the same as g and f <=< return is the same as f. The proof of associativity is also pretty
straightforward: If any of the functions returns Nothing, the result is Nothing; otherwise it’s
just a straightforward function composition, which is associative.
The category that we have just defined is called the Kleisli category for the monad m. The
functions a -> m b are called the Kleisli arrows. They compose using <=< and the identity
arrow is called return.
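Here is a hedged usage sketch (safeHead and safeRecip are illustrative helpers, not from the text): two partial computations fused into a single Kleisli arrow with the fish operator:
safeHead :: [a] -> Maybe a
safeHead []      = Nothing
safeHead (a : _) = Just a

safeRecip :: Double -> Maybe Double
safeRecip 0 = Nothing
safeRecip x = Just (1 / x)

-- The composition short-circuits on Nothing at either step.
recipOfHead :: [Double] -> Maybe Double
recipOfHead = safeRecip <=< safeHead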
All functors from the previous section are Monad instances. If you look at them as type
constructors, or even functors, it’s hard to see any similarities between them. The thing they
have in common is that they can be used to implement composable Kleisli arrows.
As Lao Tzu would say: Composition is something that happens between things. While
focusing our attention on things, we often lose sight of what’s in the gaps.

14.3 Alternative Definitions


The definition of a monad using Kleisli arrows has the advantage that the monad laws are simply
the associativity and the unit laws of a category. There are two other equivalent definitions of a
monad, one preferred by mathematicians, and one by programmers.
First, let’s notice that, when implementing the fish operator, we are given two functions as
arguments. The only thing a function is useful for is to be applied to an argument. When we
apply the first function f :: a -> m b we get a value of the type m b. At this point we
would be stuck, if it weren’t for the fact that m is a functor. Functoriality lets us apply the second
function g :: b -> m c to m b. Indeed the lifting of g by m is of the type:
m b -> m (m c)
This is almost the result we are looking for, if we could only flatten m(m c) to m c. This flat-
tening is called join. In other words, if we are given:
join :: m (m a) -> m a
we can implement <=<:
g <=< f = \a -> join (fmap g (f a))
or, using point free notation:
g <=< f = join . fmap g . f

Conversely, join can be implemented in terms of <=<:


join = id <=< id
The latter may not be immediately obvious, until you realize that the rightmost id is applied
to m (m a), and the leftmost is applied to m a. We interpret a Haskell function:
m (m a) -> m (m a)
as an arrow in the Kleisli category 𝑚(𝑚𝑎) ↠ 𝑚𝑎. Similarly, the function:
m a -> m a
implements a Kleisli arrow 𝑚𝑎 ↠ 𝑎. Their Kleisli composition produces a Kleisli arrow
𝑚(𝑚𝑎) ↠ 𝑎 or a Haskell function:
m (m a) -> m a
This leads us to the equivalent definition of a monad in terms of join and return:
class Functor m => Monad m where
join :: m (m a) -> m a
return :: a -> m a
This is still not the definition you will find in the standard Haskell Prelude. Since the fish
operator is a generalization of the dot operator, using it is equivalent to point-free programming.
It lets us compose arrows without naming intermediate values. Although some consider point-
free programs more elegant, most programmers find them difficult to follow.
But function composition is really done in two steps: We apply the first function, then apply
the second function to the result. Explicitly naming the intermediate result is often helpful in
understanding what’s going on.
To do the same with Kleisli arrows, we have to know how to apply the second Kleisli arrow
to a named monadic value—the result of the first Kleisli arrow. The function that does that is
called bind and is written as an infix operator:
(>>=) :: m a -> (a -> m b) -> m b
Obviously, we can implement Kleisli composition in terms of bind:
g <=< f = \a -> (f a) >>= g

Conversely, bind can be implemented in terms of the Kleisli arrow:


ma >>= k = (k <=< id) ma
This leads us to the following definition:
class Monad m where
(>>=) :: m a -> (a -> m b) -> m b
return :: a -> m a
This is almost the definition you’ll find in the Prelude, except that the Prelude version carries
an additional constraint: every instance of Monad must also be an instance of Applicative.
We will postpone the discussion of applicatives to the section on monoidal functors.
We can also implement join using bind:
join :: (Monad m) => m (m a) -> m a
join mma = mma >>= id
The Haskell function id goes from m a to m a or, as a Kleisli arrow, 𝑚𝑎 ↠ 𝑎.
Interestingly, a Monad defined using bind is automatically a functor. The lifting function for
it is called liftM:
liftM :: Monad m => (a -> b) -> (m a -> m b)
liftM f ma = ma >>= (return . f)

14.4 Monad Instances


We are now ready to define monad instances for the functors we used for side effects. This will
allow us to compose side effects.

Partiality
We’ve already seen the version of the Maybe monad implemented using Kleisli composition.
Here’s the more familiar implementation using bind:
instance Monad Maybe where
Nothing >>= k = Nothing
(Just a) >>= k = k a
return = Just
Adding a trivial effect to any value means enclosing it in Just.

Logging
In order to compose functions that produce logs, we need a way to combine individual log
entries. This is why the writer monad:
newtype Writer w a = Writer (a, w)
requires the type of the log to be an instance of Monoid. This allows us to append logs, and also
to create a trivial effect: an empty log.
instance Monoid w => Monad (Writer w) where
(Writer (a, w)) >>= k = let (Writer (b, w')) = k a
in Writer (b, mappend w w')
return a = Writer (a, mempty)
The let clause is used for introducing local bindings. Here, the result of applying k is pattern
matched, and the local variables b and w' are initialized. The let/in construct is an expression
whose value is given by the content of the in clause.

Environment
The reader monad is a thin encapsulation of a function from the environment to the return type:
newtype Reader e a = Reader (e -> a)


Here’s the Monad instance:
instance Monad (Reader e) where
ma >>= k = Reader (\e -> let a = runReader ma e
in runReader (k a) e)
return a = Reader (\e -> a)
The implementation of bind for the reader monad creates a function that takes the environment
as its argument. This environment is used twice, first to run ma to get the value of a, and then
to evaluate the value produced by k a.
The implementation of return ignores the environment.

Exercise 14.4.1. Define the Functor and the Monad instance for the following data type:
newtype E e a = E (e -> Maybe a)
Hint: You may use this handy function:
runE :: E e a -> e -> Maybe a
runE (E f) e = f e

State
Like reader, the state monad is a function type:
newtype State s a = State (s -> (a, s))
Its bind is similar, except that the result of k acting on a has to be run with the modified state
s'.
instance Monad (State s) where
st >>= k = State (\s -> let (a, s') = runState st s
in runState (k a) s')

return a = State (\s -> (a, s))


Applying bind to identity gives us the definition of join:
join :: State s (State s a) -> State s a
join mma = State (\s -> let (ma, s') = runState mma s
in runState ma s')
Notice that we are essentially passing the result of the first runState to the second runState,
except that we have to uncurry the second one so it can accept a pair:
join mma = State (\s -> (uncurry runState) (runState mma s))
In this form, it’s easy to convert it to point-free notation:
join mma = State (uncurry runState . runState mma)
There are two basic Kleisli arrows (the first one, conceptually, coming from the terminal
object ()) with which we can construct an arbitrary stateful computation. The first one retrieves
the current state:
get :: State s s
get = State (\s -> (s, s))
and the second one modifies it:
set :: s -> State s ()
set s = State (\_ -> ((), s))
A lot of monads come with their own libraries of predefined basic Kleisli arrows.
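For illustration, here is a hedged sketch (postIncrement is an illustrative name) of a stateful computation assembled from get and set with bind:
postIncrement :: State Int Int
postIncrement = get >>= \n ->
                set (n + 1) >>= \_ ->
                return n

-- runState postIncrement 7 evaluates to (7, 8)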

Nondeterminism
For the list monad, let’s consider how we would implement join. It must turn a list of lists
into a single list. This can be done by concatenating all the inner lists using the library function
concat. From there, we can derive the implementation of bind.
instance Monad [] where
as >>= k = concat (fmap k as)
return a = [a]
Here, return constructs a singleton list. Thus a trivial version of nondeterminism is determin-
ism.
What imperative languages do using nested loops we can do in Haskell using the list monad.
Think of as in bind as aggregating the result of running the inner loop and k as the code that
runs in the outer loop.
In many ways, Haskell’s list behaves more like what is called an iterator or a generator in
imperative languages. Because of laziness, the elements of the list are rarely stored in memory
all at once, so you may conceptualize a Haskell list as a pointer to the head and a recipe for
advancing it forward towards the tail. Or you may think of a list as a coroutine that produces,
on demand, elements of a sequence.

Continuation
The implementation of bind for the continuation monad:
newtype Cont r a = Cont ((a -> r) -> r)
requires some backward thinking, because of the inherent inversion of control—the “don’t call
us, we’ll call you” principle.
The result of bind is of the type Cont r b. To construct it, we need a function that takes,
as an argument k :: b -> r:
ma >>= fk = Cont (\k -> ...)
We have two ingredients at our disposal:
ma :: Cont r a
fk :: a -> Cont r b
We’d like to run ma, and for that we need a continuation that would accept an a.
runCont ma (\a -> ...)
Once we have an a, we can execute our fk. The result is of the type Cont r b, so we can run
it with our continuation k :: b -> r.
runCont (fk a) k
Taken together, this convoluted process produces the following implementation:
instance Monad (Cont r) where
ma >>= fk = Cont (\k -> runCont ma (\a -> runCont (fk a) k))
return a = Cont (\k -> k a)
As I mentioned earlier, composing continuations is not for the faint of heart. However, it has to
be implemented only once—in the definition of the continuation monad. From there on, the do
notation will make the rest relatively easy.
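For instance, here is a minimal sketch of composing continuation-monad Kleisli arrows, assuming the accessor runCont (Cont f) k = f k used in the instance above:
-- A sketch: chain two Kleisli arrows and run the result with the
-- trivial continuation id
double :: Int -> Cont r Int
double x = return (2 * x)

quadruple :: Int
quadruple = runCont (double 3 >>= double) id
-- quadruple == 12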

Input/Output
The IO monad’s implementation is baked into the language. The basic I/O primitives are avail-
able through the library. They are either in the form of Kleisli arrows, or IO objects (conceptu-
ally, Kleisli arrows from the terminal object ()).
For instance, the following object contains a command to read a line from the standard
input:
getLine :: IO String
There is no way to extract the string from it, since it’s not there yet; but the program can process
it through a further series of Kleisli arrows.
The IO monad is the ultimate procrastinator: the composition of its Kleisli arrows piles up
task after task to be executed later by the Haskell runtime.
To output a string followed by a newline, you can use this Kleisli arrow:
putStrLn :: String -> IO ()
Combining the two, you may construct a simple main object:
main :: IO ()
main = getLine >>= putStrLn
which echoes a string you type.

14.5 Do Notation
It’s worth repeating that the sole purpose of monads in programming is to let us decompose one
big Kleisli arrow into multiple smaller ones.
This can be done either directly, in a point-free style, using Kleisli composition <=<; or by
naming intermediate values and binding them to Kleisli arrows using >>=.
Some Kleisli arrows are defined in libraries, others are reusable enough to warrant out-of-
line implementation but, in practice, the majority are implemented as single-shot inline lambdas.
Here’s a simple example:
main :: IO ()
main =
getLine >>= \s1 ->
getLine >>= \s2 ->
putStrLn ("Hello " ++ s1 ++ " " ++ s2)

which uses an ad-hoc Kleisli arrow of the type String->IO () defined by the lambda expres-
sion:
\s1 ->
getLine >>= \s2 ->
putStrLn ("Hello " ++ s1 ++ " " ++ s2)
The body of this lambda is further decomposed using another ad-hoc Kleisli arrow:
\s2 -> putStrLn ("Hello " ++ s1 ++ " " ++ s2)
Such constructs are so common that there is special syntax called the do notation that cuts
through a lot of boilerplate. The above code, for instance, can be written as:
main = do
s1 <- getLine
s2 <- getLine
putStrLn ("Hello " ++ s1 ++ " " ++ s2)
The compiler will automatically convert it to a series of nested lambdas. The line s1<-getLine
is usually read as: “s1 gets the result of getLine.”
Here’s another example: a function that uses the list monad to generate all possible pairs of
elements taken from two lists.
pairs :: [a] -> [b] -> [(a, b)]
pairs as bs = do
a <- as
b <- bs
return (a, b)
Notice that the last line in a do block must produce a monadic value—here this is accomplished
using return.
Most imperative languages lack the abstraction power to generically define a monad, and
instead they attempt to hard-code some of the more common monads. For instance, they imple-
ment exceptions as an alternative to the Either monad, or concurrent tasks as an alternative to
the continuation monad. Some, like C++, introduce coroutines that mimic Haskell’s do nota-
tion.
Exercise 14.5.1. Implement the following function that works for any monad:
ap :: Monad m => m (a -> b) -> m a -> m b
Hint: Use do notation to extract the function and the argument. Use return to return the result.
Exercise 14.5.2. Rewrite the pairs function using the bind operators and lambdas.

14.6 Continuation Passing Style


I mentioned before that the do notation provides the syntactic sugar that makes working with
continuations more natural. One of the most important applications of continuations is in trans-
forming programs to use CPS (continuation passing style). The CPS transformation is common
in compiler construction. Another very important application of CPS is in converting recursion
to iteration.
The common problem with deeply recursive programs is that they may blow the runtime
stack. A function call usually starts by pushing function arguments, local variables, and the

return address on the stack. Thus deeply nested recursive calls may quickly exhaust the (usually
fixed-size) runtime stack resulting in a runtime error. This is the main reason why imperative
languages prefer looping to recursion, and why most programmers learn about loops before they
study recursion. However, even in imperative languages, when it comes to traversing recursive
data structures, such as linked lists or trees, recursive algorithms are more natural.
The problem with using loops, though, is that they require mutation. There is usually some
kind of a counter or a pointer that is advanced and checked with each turn of the loop. This is
why purely functional languages that shun mutation must use recursion in place of loops. But
since looping is more efficient and doesn’t consume the runtime stack, the compiler tries to
convert recursive calls to loops. In Haskell, all tail-recursive functions are turned into loops.

Tail recursion and CPS


Tail recursion means that the recursive call happens at the very end of the function. The function
doesn’t perform any additional operations on the result of the tail call. For instance, this program
is not tail recursive, because it has to add i to the result of the recursive call:
sum1 :: [Int] -> Int
sum1 [] = 0
sum1 (i : is) = i + sum1 is
In contrast, the following implementation is tail recursive because the result of the recursive call
to go is returned without further modification:
sum2 = go 0
where go n [] = n
go n (i : is) = go (n + i) is
The compiler can easily turn the latter into a loop. Instead of making the recursive call, it will
overwrite the value of the first argument n with n + i, overwrite the pointer to the head of the
list with the pointer to its tail, and then jump to the beginning of the function.
Note however that it doesn’t mean that the Haskell compiler won’t be able to cleverly op-
timize the first implementation. It just means that the second implementation, which is tail
recursive, is guaranteed to be turned into a loop.
In fact, it’s always possible to turn recursion into tail recursion by performing the CPS trans-
formation. This is because a continuation encapsulates the rest of the computation, so it’s always
the last call in a function.
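Before tackling trees, here is a minimal sketch of the CPS transformation applied to the earlier sum1, written with plain continuation-taking functions:
-- A sketch: CPS version of sum1; the recursive call is now in tail position,
-- and the pending addition is deferred to the continuation
sumk :: [Int] -> (Int -> r) -> r
sumk []       k = k 0
sumk (i : is) k = sumk is (\s -> k (i + s))

sum3 :: [Int] -> Int
sum3 is = sumk is id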
To see how it works in practice, consider a simple tree traversal. Let’s define a tree that
stores strings in both nodes and leaves:
data Tree = Leaf String
| Node Tree String Tree
To concatenate these strings we use the traversal that first recurses into the left subtree, and then
into the right subtree:
show :: Tree -> String
show (Leaf s) = s
show (Node lft s rgt) =
  let ls = show lft
      rs = show rgt
  in ls ++ s ++ rs

This is definitely not a tail recursive function, and it’s not obvious how to turn it into one.
However, we can almost mechanically rewrite it using the continuation monad:
showk :: Tree -> Cont r String
showk (Leaf s) = return s
showk (Node lft s rgt) = do
ls <- showk lft
rs <- showk rgt
return (ls ++ s ++ rs)
We can run the result with the trivial continuation id:
show :: Tree -> String
show t = runCont (showk t) id
This implementation is automatically tail recursive. We can see it clearly by desugaring the
do notation:
showk :: Tree -> (String -> r) -> r
showk (Leaf s) k = k s
showk (Node lft s rgt) k =
showk lft (\ls ->
showk rgt (\rs ->
k (ls ++ s ++ rs)))
Let’s analyze this code. The function calls itself, passing the left subtree lft and the following
continuation:
\ls ->
showk rgt (\rs ->
k (ls ++ s ++ rs))
This lambda in turn calls showk with the right subtree rgt and another continuation:
\rs -> k (ls ++ s ++ rs)
This innermost lambda has access to all three strings: left, middle, and right. It concatenates
them and calls the outermost continuation k with the result.
In each case, the recursive call to showk is the last call, and its result is immediately returned.
Moreover, the type of the result is the generic type r, which in itself guarantees that we can’t
perform any operations on it. When we finally run the result of showk, we pass it the identity
(instantiated for the String type):
show :: Tree -> String
show t = runCont (showk t) id

Using named functions


But suppose that our programming language doesn’t support anonymous functions. Is it possible
to replace the lambdas with named functions? We’ve done this before when we discussed the
adjoint functor theorem. We notice that the lambdas generated by the continuation monad are
closures—they capture some values from their environment. If we want to replace them with
named functions, we’ll have to pass the environment explicitly.
We replace the first lambda with the call to the function named next, and pass it the necessary
environment in the form of a tuple of three values (s, rgt, k):
showk :: Tree -> (String -> r) -> r
showk (Leaf s) k = k s
showk (Node lft s rgt) k =
showk lft (next (s, rgt, k))
The three values are the string from the current node of the tree, the right subtree, and the outer
continuation.
The function next makes the recursive call to showk passing to it the right subtree and a
continuation named conc:
next :: (String, Tree, String -> r) -> String -> r
next (s, rgt, k) ls = showk rgt (conc (ls, s, k))
Again, conc explicitly captures the environment containing two strings and the outer continua-
tion. It performs the concatenation and calls the outer continuation with the result:
conc :: (String, String, String -> r) -> String -> r
conc (ls, s, k) rs = k (ls ++ s ++ rs)
Finally, we define the trivial continuation:
done :: String -> String
done s = s
that we use to extract the final result:
show t = showk t done

Defunctionalization
Continuation passing style requires the use of higher order functions. If this is a problem, e.g.,
when implementing distributed systems, we can always use the adjoint functor theorem to de-
functionalize our program.
The first step is to create the sum of all relevant environments, including the empty one we
used in done:
data Kont = Done
| Next String Tree Kont
| Conc String String Kont
Notice that this data structure can be reinterpreted as a list or a stack. It can be seen as a list of
elements of the following sum type:
data Sum = Next' String Tree | Conc' String String
This list is our version of the runtime stack necessary to implement a recursive algorithm.
Since we are only interested in producing a string as the final result, we’re going to approx-
imate the String -> String function type. This is the approximate counit of the adjunction
that defines it (see the adjoint functor theorem):
apply :: (Kont, String) -> String
apply (Done, s) = s
apply (Next s rgt k, ls) = showk rgt (Conc ls s k)
apply (Conc ls s k, rs) = apply (k, ls ++ s ++ rs)
The showk function can be now implemented without recourse to higher order functions:
showk :: Tree -> Kont -> String
showk (Leaf s) k = apply (k, s)
showk (Node lft s rgt) k = showk lft (Next s rgt k)
To extract the result, we call it with Done:
showTree t = showk t Done
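For instance, showTree (Node (Leaf "a") "-" (Leaf "b")) first pushes a Next frame, then a Conc frame, and finally unwinds them through apply, producing "a-b".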

14.7 Monads Categorically


In category theory monads first arose in the study of algebras. In particular, the bind operator
can be used to implement the very important operation of substitution.

Substitution
Consider this simple expression type. It’s parameterized by the type x that we can use for naming
our variables:
data Ex x = Val Int
| Var x
| Plus (Ex x) (Ex x)
deriving (Functor, Show)
We can, for instance, construct an expression (2 + 𝑎) + 𝑏:
ex :: Ex Char
ex = Plus (Plus (Val 2) (Var 'a')) (Var 'b')
We can implement the Monad instance for Ex:
instance Monad Ex where
Val n >>= k = Val n
Var x >>= k = k x
Plus e1 e2 >>= k =
let x = e1 >>= k
y = e2 >>= k
in (Plus x y)

return x = Var x
Now suppose that you want to make a substitution by replacing the variable 𝑎 with 𝑥1 +2 and
𝑏 with 𝑥2 (for simplicity, let’s not worry about other letters of the alphabet). This substitution is
represented by the Kleisli arrow sub:
sub :: Char -> Ex String
sub 'a' = Plus (Var "x1") (Val 2)
sub 'b' = Var "x2"
As you can see, we were even able to change the type used for naming variables from Char to
String.
When we bind this Kleisli arrow to ex:
ex' :: Ex String
ex' = ex >>= sub
we get, as expected, a tree corresponding to (2 + (𝑥1 + 2)) + 𝑥2 .
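Printed with the derived Show instance, ex' reads: Plus (Plus (Val 2) (Plus (Var "x1") (Val 2))) (Var "x2").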

Monad as a monoid
Let’s analyze the definition of a monad that uses join:
class Functor m => Monad m where
join :: m (m a) -> m a
return :: a -> m a
We have an endofunctor m and two polymorphic functions.
In category theory, the functor that defines the monad is traditionally denoted by 𝑇 (probably
because monads were initially called “triples”). The two polymorphic functions become natural
transformations. The first one, corresponding to join, maps the “square” of 𝑇 (a composition
of 𝑇 with itself) to 𝑇 :
𝜇 ∶ 𝑇 ◦𝑇 → 𝑇
(Of course, only endofunctors can be squared this way.)
The second, corresponding to return, maps the identity functor to 𝑇 :

𝜂 ∶ Id → 𝑇

Compare this with our earlier definition of a monoid in a monoidal category:

𝜇∶ 𝑚 ⊗ 𝑚 → 𝑚
𝜂∶ 𝐼 → 𝑚

The similarity is striking. This is why we often call the natural transformation 𝜇 the monadic
multiplication. But in what category can the composition of functors be considered a tensor
product?
Enter the category of endofunctors. Objects in this category are endofunctors and arrows
are natural transformations.
But there’s more structure to that category. We know that any two endofunctors can be
composed. How can we interpret this composition if we want to treat endofunctors as objects?
An operation that takes two objects and produces a third object looks like a tensor product. The
only condition imposed on a tensor product is that it’s functorial in both arguments. That is,
given a pair of arrows:

𝛼∶ 𝑇 → 𝑇′
𝛽 ∶ 𝑆 → 𝑆′

we can lift it to the mapping of the tensor product:

𝛼 ⊗ 𝛽 ∶ 𝑇 ⊗ 𝑆 → 𝑇 ′ ⊗ 𝑆′

In the category of endofunctors, the arrows are natural transformations so, if we replace ⊗
with ◦, the lifting is the mapping:

𝛼◦𝛽 ∶ 𝑇 ◦𝑆 → 𝑇 ′ ◦𝑆 ′

But this is just horizontal composition of natural transformations (now you understand why it’s
denoted by a circle).
The unit object in this monoidal category is the identity endofunctor, and unit laws are
satisfied “on the nose,” meaning
Id◦𝑇 = 𝑇 = 𝑇 ◦Id

We don’t need any unitors. We don’t need any associators either, since functor composition is
automatically associative.
A monoidal category in which unitors and associators are identity morphisms is called a
strict monoidal category.
Notice, however, that composition is not symmetric, so this is not a symmetric monoidal
category.
So, all said, a monad is a monoid in the monoidal category of endofunctors.
A monad (𝑇 , 𝜂, 𝜇) consists of an object in the category of endofunctors—meaning an end-
ofunctor 𝑇 ; and two arrows—meaning natural transformations:

𝜂 ∶ Id → 𝑇
𝜇 ∶ 𝑇 ◦𝑇 → 𝑇

For this to be a monoid, these arrows must satisfy monoidal laws. Here are the unit laws (with
unitors replaced by strict equalities):

𝜇◦(𝜂◦𝑇 ) = 𝑖𝑑 = 𝜇◦(𝑇 ◦𝜂)

(The whiskerings 𝜂◦𝑇 ∶ Id◦𝑇 → 𝑇 ◦𝑇 and 𝑇 ◦𝜂 ∶ 𝑇 ◦Id → 𝑇 ◦𝑇 , followed by 𝜇, both collapse
to the identity on 𝑇 , using Id◦𝑇 = 𝑇 = 𝑇 ◦Id.)

and this is the associativity law:

𝜇◦(𝜇◦𝑇 ) = 𝜇◦(𝑇 ◦𝜇)

(Starting from the strict equality (𝑇 ◦𝑇 )◦𝑇 = 𝑇 ◦(𝑇 ◦𝑇 ), the two ways of collapsing 𝑇 ◦𝑇 ◦𝑇
down to 𝑇 agree.)
We used the whiskering notation for horizontal composition of 𝜇◦𝑇 and 𝑇 ◦𝜇.
These are the monad laws in terms of 𝜇 and 𝜂. They can be directly translated to the laws
for join and return. They are also equivalent to the laws of the Kleisli category built from
arrows 𝑎 → 𝑇 𝑏.
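Spelled out in Haskell (a sketch, with fmap being the monad’s functorial lifting), the laws for join and return read:
join . return      = id                 -- left unit
join . fmap return = id                 -- right unit
join . join        = join . fmap join   -- associativity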

14.8 Free Monads


A monad lets us specify a sequence of actions that may produce side effects. Such a sequence
tells the computer both what to do and how to do it. But sometimes more flexibility is required:
We’d like to separate the “what” from the “how.” A free monad lets us produce the sequence
without committing to a particular monad for its execution. This is analogous to defining a free
monoid (a list), which lets us postpone the choice of the algebra to apply to it; or to creating an
AST (abstract syntax tree) before compiling it to executable code.
Free constructions are defined as left adjoints to forgetful functors. So first we have to define
what it means to forget to be a monad. Since a monad is an endofunctor equipped with additional
structure, we’d like to forget this structure. We take a monad (𝑇 , 𝜂, 𝜇) and keep only 𝑇 . But in
order to define such a mapping as a functor, we first need to define the category of monads.

Category of monads
The objects in the category of monads 𝐌𝐨𝐧(𝒞) are monads (𝑇 , 𝜂, 𝜇). We can define an ar-
row between two monads (𝑇 , 𝜂, 𝜇) and (𝑇 ′ , 𝜂 ′ , 𝜇′ ) as a natural transformation between the two
endofunctors:
𝜆∶ 𝑇 → 𝑇 ′
However, since monads are endofunctors with structure, we want these natural transformations
to preserve the structure. Preservation of unit means that the following diagram must commute:

𝜆◦𝜂 = 𝜂 ′

(the triangle with 𝜂 ∶ Id → 𝑇 and 𝜂 ′ ∶ Id → 𝑇 ′ down the sides, and 𝜆 ∶ 𝑇 → 𝑇 ′ along the bottom).
Preservation of multiplication means that the following diagram must commute:
𝜆◦𝜇 = 𝜇′ ◦(𝜆◦𝜆)

(the square with the horizontal composition 𝜆◦𝜆 ∶ 𝑇 ◦𝑇 → 𝑇 ′ ◦𝑇 ′ on top, 𝜇 and 𝜇′ down the
sides, and 𝜆 along the bottom).
Another way of looking at 𝐌𝐨𝐧(𝒞) is that it’s a category of monoids in the monoidal
category ([𝒞, 𝒞], ◦, Id).

Free monad
Now that we have a category of monads, we can define the forgetful functor:

𝑈 ∶ 𝐌𝐨𝐧(𝒞) → [𝒞, 𝒞]

that maps every triple (𝑇 , 𝜂, 𝜇) to 𝑇 and every monad morphism to the underlying natural trans-
formation.
We would like a free monad to be generated by a left adjoint to this forgetful functor. The
problem is that this left adjoint doesn’t always exist. As usual, this is related to size issues:
monads tend to blow things up. The bottom line is that free monads exist for some, but not all,
endofunctors. Therefore we can’t define a free monad through an adjunction. Fortunately, in
most cases of interest, a free monad can be defined as a fixed point of an algebra.
The construction is analogous to how we defined a free monoid as an initial algebra for the
list functor:
data ListF a x = NilF | ConsF a x
or the more general:
𝐹𝑎 𝑥 = 𝐼 + 𝑎 ⊗ 𝑥
This time, however, the monoidal category in which a monad is defined as a monoid is the
category of endofunctors ([𝒞, 𝒞], ◦, Id). A free monoid in this category is the initial algebra for
the higher order “list” functor that maps functors to functors:

Φ𝐹 𝐺 = Id + 𝐹 ◦𝐺

Here, the coproduct of two functors is defined point-wise. On objects:

(𝐹 + 𝐺)𝑎 = 𝐹 𝑎 + 𝐺𝑎

and on arrows:
(𝐹 + 𝐺)𝑓 = 𝐹 𝑓 + 𝐺𝑓

(We form a coproduct of two morphisms using the functoriality of the coproduct. We assume
that 𝒞 is cocartesian, that is, all coproducts exist.)
The initial algebra is the (least) fixed point of this operator, or the solution to the recursive
equation:
𝐿𝐹 ≅ Id + 𝐹 ◦𝐿𝐹

This formula establishes a natural isomorphism between two functors. Going from right to left,
Id + 𝐹 ◦𝐿𝐹 → 𝐿𝐹 , we have a mapping out of the sum, which is equivalent to a pair of natural
transformations:

Id → 𝐿𝐹
𝐹 ◦𝐿𝐹 → 𝐿𝐹

When translating this to Haskell, the components of these transformations become two con-
structors. We define the following recursive data type parameterized by a functor f:
data FreeMonad f a where
Pure :: a -> FreeMonad f a
Free :: f (FreeMonad f a) -> FreeMonad f a
If we think of the functor f as a container of values, the constructor Free takes a functorful
of (FreeMonad f a) and stashes it away. An arbitrary value of the type FreeMonad f a is
therefore a tree in which every node is a functorful of branches, and each leaf contains a value
of the type a.
Because this definition is recursive, the Functor instance for it is also recursive:
instance Functor f => Functor (FreeMonad f) where
fmap g (Pure a) = Pure (g a)
fmap g (Free ffa) = Free (fmap (fmap g) ffa)
Here, the outer fmap uses the Functor instance of f, while the inner (fmap g) recurses into
the branches.
It’s easy to show that FreeMonad is a Monad. The monadic unit eta is just a thin encapsu-
lation of the identity functor:
eta :: a -> FreeMonad f a
eta a = Pure a
Monadic multiplication, or join, is defined recursively:
mu :: Functor f => FreeMonad f (FreeMonad f a) -> FreeMonad f a
mu (Pure fa) = fa
mu (Free ffa) = Free (fmap mu ffa)
The Monad instance for FreeMonad f is therefore:
instance Functor f => Monad (FreeMonad f) where
return a = eta a
m >>= k = mu (fmap k m)
We can also define bind directly:

(Pure a) >>= k = k a
(Free ffa) >>= k = Free (fmap (>>= k) ffa)
A free monad accumulates monadic actions in a tree-like structure without committing to
any particular evaluation strategy. This tree can be “interpreted” using an algebra. But this
time it’s an algebra in the category of endofunctors, so its carrier is an endofunctor 𝐺 and the
structure map 𝛼 is a natural transformation Φ𝐹 𝐺 → 𝐺:
𝛼 ∶ Id + 𝐹 ◦𝐺 → 𝐺
This natural transformation, being a mapping out of a sum, is equivalent to a pair of natural
transformations:
𝜆 ∶ Id → 𝐺
𝜌 ∶ 𝐹 ◦𝐺 → 𝐺
We can translate it to Haskell as a pair of polymorphic functions:
type MAlg f g a = (a -> g a, f (g a) -> g a)
Since the free monad is the initial algebra, there is a unique mapping—the catamorphism—
from it to any other algebra. Recall how we defined a catamorphism for a regular algebra:
cata :: Functor f => Algebra f a -> Fix f -> a
cata alg = alg . fmap (cata alg) . out
The out part unwraps the contents of the fixed point. Here we can do this by pattern-matching
on the two constructors of the free monad. If it’s a leaf, we apply our 𝜆 to it. If it’s a node, we
recursively process its contents, and apply our 𝜌 to the result:
mcata :: Functor f => MAlg f g a -> FreeMonad f a -> g a
mcata (l, r) (Pure a) = l a
mcata (l, r) (Free ffa) =
r (fmap (mcata (l, r)) ffa)
Many tree-like monads are in fact free monads for simple functors.
Exercise 14.8.1. A (non-empty) rose tree is defined as:
data Rose a = Leaf a | Rose [Rose a]
deriving Functor
Implement conversions back and forth between Rose a and FreeMonad [] a.
Exercise 14.8.2. Implement conversions between a binary tree and FreeMonad Bin a, where:
data Bin a = Bin a a

Exercise 14.8.3. Find a functor whose free monad is equivalent to the list monad [a].

Stack calculator example


As an example, let’s consider a stack calculator implemented as an embedded domain-specific
language (EDSL). We’ll use the free monad to accumulate simple commands written in this
language.
The commands are defined by the functor StackF. Think of the parameter k as the contin-
uation.

data StackF k = Push Int k


| Top (Int -> k)
| Pop k
| Add k
deriving Functor
For instance, Push is supposed to push an integer on the stack and then call the continuation k.
The free monad for this functor can be thought of as a tree, with most branches having just
one child, thus forming lists. The exception is the Top node, which has one child for every
value of Int.
Here’s the free monad for this functor:
type FreeStack = FreeMonad StackF
In order to create domain-specific programs we’ll define a few helper functions. There is a
generic one that lifts a functorful of values to a free monad:
liftF :: (Functor f) => f r -> FreeMonad f r
liftF fr = Free (fmap Pure fr)
We also need a series of “smart constructors,” which are Kleisli arrows for our free monad:
push :: Int -> FreeStack ()
push n = liftF (Push n ())

pop :: FreeStack ()
pop = liftF (Pop ())

top :: FreeStack Int


top = liftF (Top id)

add :: FreeStack ()
add = liftF (Add ())
Since a free monad is a monad, we can conveniently combine Kleisli arrows using the do
notation. For instance, here’s a toy program that adds two numbers and returns their sum:
calc :: FreeStack Int
calc = do
push 3
push 4
add
x <- top
pop
return x
In order to execute this program, we need to define an algebra whose carrier is an endo-
functor. Since we want to implement a stack-based calculator, we’ll use a version of the state
functor. Its state is a stack—a list of integers. The state functor is defined as a function type;
here it’s a function that takes a list and returns a new list coupled with the type parameter k:
data StackAction k = St ([Int] -> ([Int], k))
deriving Functor
To run the action, we apply the function to the stack:
runAction :: StackAction k -> [Int] -> ([Int], k)
runAction (St act) ns = act ns
We define the algebra as a pair of polymorphic functions corresponding to the two construc-
tors of the free monad, Pure and Free:
runAlg :: MAlg StackF StackAction a
runAlg = (stop, go)
The first function terminates the execution of the program and returns a value:
stop :: a -> StackAction a
stop a = St (\xs -> (xs, a))
The second function pattern matches on the type of the command. Each command carries with
it a continuation. This continuation has to be run with a (potentially modified) stack. Each
command modifies the stack in a different way:
go :: StackF (StackAction k) -> StackAction k
go (Pop k) = St (\ns -> runAction k (tail ns))
go (Top ik) = St (\ns -> runAction (ik (head ns)) ns)
go (Push n k) = St (\ns -> runAction k (n: ns))
go (Add k) = St (\ns -> runAction k
((head ns + head (tail ns)): tail (tail ns)))
For instance, Pop discards the top of the stack. Top takes an integer from top of the stack and
uses it to pick the branch to be executed. It does it by applying the function ik to the integer.
Add adds the two numbers at the top of the stack and pushes the result.
Notice that the algebra we have defined does not involve recursion. Separating recursion
from the actions is one of the advantages of the free monad approach. The recursion is instead
encoded once and for all in the catamorphism.
Here’s the function that can be used to run our toy program:
run :: FreeMonad StackF k -> ([Int], k)
run prog = runAction (mcata runAlg prog) []
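For instance, run calc first builds the command tree and then folds it with the catamorphism: the result is ([], 7), an empty stack paired with the computed sum.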
Obviously, the use of partial functions head and tail makes our interpreter fragile. A badly
formed program will cause a runtime error. A more robust implementation would use an algebra
that allows for error propagation.
The other advantage of using free monads is that the same program may be interpreted using
different algebras.

Exercise 14.8.4. Implement a “pretty printer” that displays the program constructed using our
free monad. Hint: Implement the algebra that uses the Const functor as the carrier:
showAlg :: MAlg StackF (Const String) a

14.9 Monoidal Functors


We’ve seen several examples of monoidal categories. Such categories are equipped with some
kind of binary operation, e.g., a cartesian product, a sum, composition (in the category of endo-
functors), etc. They also have a special object that serves as the unit with respect to that binary
operation. Unit and associativity laws are satisfied either on the nose (in strict monoidal cate-
gories) or up to isomorphism.

Every time we have more than one instance of some structure, we may ask ourselves the
question: is there a whole category of such things? In this case: do monoidal categories form
their own category? For this to work we would have to define arrows between monoidal cate-
gories.
A monoidal functor 𝐹 from a monoidal category (, ⊗, 𝑖) to another monoidal category
(, ⊕, 𝑗) maps tensor product to tensor product and unit to unit—all up to isomorphism:

𝐹 𝑎 ⊕ 𝐹 𝑏 ≅ 𝐹 (𝑎 ⊗ 𝑏)
𝑗 ≅ 𝐹𝑖

Here, on the left-hand side we have the tensor product and the unit in the target category, and
on the right their counterparts in the source category.
If the two monoidal categories in question are not strict, that is, if the unit and associativity
laws are satisfied only up to isomorphism, there are additional coherence conditions that ensure
that unitors are mapped to unitors and associators are mapped to associators.
The category of monoidal categories with monoidal functors as arrows is called 𝐌𝐨𝐧𝐂𝐚𝐭. In
fact it’s a 2-category, since one can define structure-preserving natural transformations between
monoidal functors.

Lax monoidal functors


One of the perks of monoidal categories is that they allow us to define monoids. You can easily
convince yourself that monoidal functors map monoids to monoids. It turns out that you don’t
need the full power of monoidal functors to accomplish that. Let’s consider what the minimal
requirements are for a functor to map monoids to monoids.
Let’s start with a monoid (𝑚, 𝜇, 𝜂) in the monoidal category (, ⊗, 𝑖). Consider a functor 𝐹
that maps 𝑚 to 𝐹 𝑚. We want 𝐹 𝑚 to be a monoid in the target monoidal category (, ⊕, 𝑗). For
that we need to find two mappings:

𝜂′ ∶ 𝑗 → 𝐹 𝑚
𝜇′ ∶ 𝐹 𝑚 ⊕ 𝐹 𝑚 → 𝐹 𝑚

satisfying monoidal laws.


Since 𝑚 is a monoid, we do have at our disposal the liftings of the original mappings:

𝐹𝜂∶ 𝐹𝑖 → 𝐹𝑚
𝐹 𝜇 ∶ 𝐹 (𝑚 ⊗ 𝑚) → 𝐹 𝑚

What we are missing, in order to implement 𝜂 ′ and 𝜇′ , are two additional arrows:

𝑗 → 𝐹𝑖
𝐹 𝑚 ⊕ 𝐹 𝑚 → 𝐹 (𝑚 ⊗ 𝑚)

A monoidal functor would provide such arrows. However, for what we’re trying to accomplish,
we don’t need these arrows to be invertible.
A lax monoidal functor is a functor equipped with a morphism 𝜙𝑖 and a natural transforma-
tion 𝜙𝑎𝑏 :

𝜙𝑖 ∶ 𝑗 → 𝐹 𝑖
𝜙𝑎𝑏 ∶ 𝐹 𝑎 ⊕ 𝐹 𝑏 → 𝐹 (𝑎 ⊗ 𝑏)

satisfying the appropriate unitality and associativity conditions.


Such a functor maps a monoid (𝑚, 𝜇, 𝜂) to a monoid (𝐹 𝑚, 𝜇′ , 𝜂 ′ ) with:

𝜂 ′ = 𝐹 𝜂◦𝜙𝑖
𝜇 ′ = 𝐹 𝜇◦𝜙𝑚𝑚

The simplest example of a lax monoidal functor is an endofunctor that preserves the usual
cartesian product. We can define it in Haskell as a typeclass:
class Monoidal f where
unit :: f ()
(>*<) :: f a -> f b -> f (a, b)
Corresponding to 𝜙𝑎𝑏 we have an infix operator which, according to Haskell conventions, is
written in its curried form.

Exercise 14.9.1. Implement the Monoidal instance for the list functor.

Functorial strength
There is another way a functor may interact with monoidal structure, one that hides in plain sight
when we do programming. We take it for granted that functions have access to the environment.
Such functions are called closures.
For instance, here’s a function that captures a variable a from the environment and pairs it
with its argument:
\x -> (a, x)
This definition makes no sense in isolation, but it does when the environment contains the vari-
able a, e.g.,
pairWith :: Int -> (String -> (Int, String))
pairWith a = \x -> (a, x)
The function returned by calling pairWith 5 “closes over” the 5 from its environment.
Now consider the following modification, which returns a singleton list that contains the
closure:
pairWith' :: Int -> [String -> (Int, String)]
pairWith' a = [\x -> (a, x)]
As a programmer you’d be very surprised if this didn’t work. But what we do here is highly
nontrivial: we are smuggling the environment under the list functor. According to our model of
lambda calculus, a closure is a morphism from the product of the environment and the function
argument. Here the lambda, which is really a function of (Int, String), is defined inside a
list functor, but it captures the value a that is defined outside the list.
The property that lets us smuggle the environment under a functor is called functorial
strength or tensorial strength and can be implemented in Haskell as:
strength :: Functor f => (e, f a) -> f (e, a)
strength (e, as) = fmap (e, ) as
The notation (e, ) is called a tuple section and is equivalent to the partial application of the
pair constructor: (,) e.
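For example, strength (5, "ab") evaluates to [(5,'a'), (5,'b')]: the captured value is copied into every element of the container.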

In category theory, strength for an endofunctor 𝐹 is defined as a natural transformation that


smuggles a tensor product into a functor:

𝜎 ∶ 𝑎 ⊗ 𝐹 (𝑏) → 𝐹 (𝑎 ⊗ 𝑏)

There are some additional conditions which ensure that it works nicely with the unitors and the
associator of the monoidal category in question.
The fact that we were able to implement strength for an arbitrary functor means that, in
Haskell, every functor is strong. This is the reason why we don’t have to worry about accessing
the environment from inside a functor.
Even more importantly, every monad in Haskell is strong by virtue of being a functor. This
is also why every monad is automatically Monoidal.
instance Monad m => Monoidal m where
unit = return ()
ma >*< mb = do
a <- ma
b <- mb
return (a, b)
If you desugar this code to use monadic bind and lambdas, you’ll notice that the final return
needs access to both a and b, which are defined in outer environments. This would be impossible
without a monad being strong.
In category theory, though, not every endofunctor in a monoidal category is strong. For
now, the magic incantation is that the category we’re working with is self-enriched, and every
endofunctor defined in Haskell is enriched. We’ll come back to it when we talk about enriched
categories. In Haskell, strength boils down to the fact that we can always fmap a partially applied
pair constructor (a, ).

Applicative functors
In programming, the idea of applicative functors arose from the following question: A functor
lets us lift a function of one variable. How can we lift a function of two or more variables?
By analogy with fmap, we’d like to have a function:
liftA2 :: (a -> b -> c) -> f a -> f b -> f c
A function of two arguments—here, in its curried form—is a function of one argument
returning a function. So, assuming that f is a functor, we can fmap the first argument of liftA2,
which has the type:
a -> (b -> c)
over the second argument (f a) to get:
f (b -> c)
The problem is, we don’t know how to apply f (b -> c) to the remaining argument (f b).
The class of functors that let us do that is called Applicative. It turns out that, once we
know how to lift a two-argument function, we can lift functions of any number of arguments,
except zero. A zero-argument function is just a value, so lifting it means implementing a func-
tion:

pure :: a -> f a
Here’s the Haskell definition:
class Functor f => Applicative f where
pure :: a -> f a
(<*>) :: f (a -> b) -> f a -> f b
The application of a functorful of functions to a functorful of arguments is defined as an infix
operator <*> that is customarily called “splat.”
There is also an infix version of fmap:
(<$>) :: Functor f => (a -> b) -> f a -> f b
which can be used in this terse implementation of liftA2:
liftA2 :: Applicative f => (a -> b -> c) -> f a -> f b -> f c
liftA2 g as bs = g <$> as <*> bs
Both operators bind to the left, which makes this syntax mimic regular function application.
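For instance, with the list instance derived from the list monad, liftA2 (+) [1, 2] [10, 20] evaluates to [11, 21, 12, 22].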
An applicative functor must also satisfy a set of laws:
pure id <*> v = v -- Identity
pure f <*> pure x = pure (f x) -- Homomorphism
u <*> pure y = pure ($ y) <*> u -- Interchange
pure (.) <*> u <*> v <*> w = u <*> (v <*> w) -- Composition

Exercise 14.9.2. Implement liftA3, a function that lifts a 3-argument function using an ap-
plicative functor.

Closed functors
If you squint at the definition of the splat operator:
(<*>) :: f (a -> b) -> (f a -> f b)
you may see it as mapping function objects to function objects.
This becomes clearer if you consider a functor between two categories, both of them closed.
You may start with a function object 𝑏^𝑎 in the source category and apply the functor 𝐹 to it:

𝐹 (𝑏^𝑎 )

Alternatively, you may map the two objects 𝑎 and 𝑏 and construct a function object between
them in the target category:
(𝐹 𝑏)^(𝐹 𝑎)
If we demand that the two ways be isomorphic, we get a definition of a strict closed functor.
But, as was the case with monoidal functors, we are more interested in the lax version, which is
equipped with a one-way natural transformation:

𝐹 (𝑏^𝑎 ) → (𝐹 𝑏)^(𝐹 𝑎)

If 𝐹 is an endofunctor, this translates directly into the definition of the splat operator.
The full definition of a lax closed functor includes the mapping of the monoidal unit and
some coherence conditions. All said, an applicative functor is a lax closed functor.

In a closed cartesian category, the exponential is related to the cartesian product through the
currying adjunction. It’s no surprise then, that in such a category lax monoidal and lax closed
(applicative) endofunctors are the same.
We can easily express this in Haskell:
instance (Functor f, Monoidal f) => Applicative f where
pure a = fmap (const a) unit
fs <*> as = fmap apply (fs >*< as)
where const is a function that ignores its second argument:
const :: a -> b -> a
const a b = a
and apply is the uncurried function application:
apply :: (a -> b, a) -> b
apply (f, a) = f a
And the other way around we have:
instance Applicative f => Monoidal f where
unit = pure ()
as >*< bs = (,) <$> as <*> bs
In the latter, we used the pair constructor (,) as a two-argument function.

Monads and applicatives


Since in a cartesian closed category every monad¹ is lax monoidal, it is automatically applica-
tive. We can show it directly by implementing ap, which has the same type signature as the
splat operator:
ap :: (Monad m) => m (a -> b) -> m a -> m b
ap fs as = do
f <- fs
a <- as
return (f a)
This connection is expressed in the Haskell definition of a Monad by having Applicative
as its superclass:
class Applicative m => Monad m where
(>>=) :: forall a b. m a -> (a -> m b) -> m b
return :: a -> m a
return = pure
Notice the default implementation of return as pure.
The converse is not true: not every Applicative is a Monad. The standard counterexample
is the Applicative instance for a list functor that uses zipping:
instance Applicative [] where
pure = repeat
fs <*> as = zipWith apply fs as

¹ Again, the correct incantation is “every enriched monad.”

Of course, the list functor is also a monad, so there is another Applicative instance based on
that. Its splat operator applies every function to every argument.
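For instance, [(+1), (*2)] <*> [10, 20] evaluates to [11, 40] with the zip instance, but to [11, 21, 20, 40] with the monad-derived instance.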
In programming, monad is more powerful than applicative. That’s because monadic code
lets you examine the contents of a monadic value and branch depending on it. This is true even
for the IO monad which otherwise provides no means of extracting the value. In this example
we are branching on the contents of an IO object:
main :: IO ()
main = do
s <- getLine
if s == "yes"
then putStrLn "Thank you!"
else putStrLn "Next time."
Of course, the inspection of the value is postponed until the runtime interpreter of IO gets hold
of this code.
Applicative composition using the splat operator doesn’t allow for one part of the computa-
tion to inspect the result of the other. This is a limitation that can be turned into an advantage.
The absence of dependencies makes it possible to run the computations in parallel. Haskell’s
parallel libraries use applicative programming extensively.
On the other hand, monads let us use the very convenient do syntax, which is arguably more
readable than the applicative syntax. Fortunately, there is a language extension ApplicativeDo,
which instructs the compiler to selectively use applicative constructs in interpreting do blocks,
whenever there are no dependencies.

Exercise 14.9.3. Verify Applicative laws for the zip instance of the list functor.
Chapter 15

Monads and Adjunctions

15.1 String Diagrams


A line partitions a plane. We can think of it as either dividing a plane or as connecting two
halves of the plane.
A dot partitions a line. We can think of it as either separating two half-lines or as joining
them together.
This is a diagram in which two categories are represented as dots, two functors as arrows,
and a natural transformation as a double arrow.

[Diagram: the categories 𝒞 and 𝒟 drawn as dots, two parallel functor arrows 𝐹 and 𝐺 between them, and a double arrow labeled 𝛼 from 𝐹 to 𝐺.]

But the same idea can be represented by drawing categories as areas of a plane, functors as
lines between areas, and natural transformations as dots that join line segments.
The idea is that a functor always goes between a pair of categories, therefore it can be drawn
as a boundary between them. A natural transformation always goes between a pair of functors,
therefore it can be drawn as a dot joining two segments of a line.

[String diagram: the plane is divided by a vertical string into areas 𝒞 (left) and 𝒟 (right); the lower segment of the string is labeled 𝐹 , the upper segment 𝐺, and the dot joining them is labeled 𝛼.]

This is an example of a string diagram. You read such a diagram bottom-up, left-to-right
(think of the (𝑥, 𝑦) system of coordinates).
The bottom of this diagram shows the functor 𝐹 that goes from  to . The top of the
diagram shows the functor 𝐺 that goes between the same two categories. The transition happens
in the middle, where a natural transformation 𝛼 maps 𝐹 to 𝐺.
In Haskell, this diagram is interpreted as a polymorphic function between two endofunctors:


alpha :: forall x. F x -> G x


So far it doesn’t seem like we gain a lot by using this new visual representation. But let’s
apply it to something more interesting: vertical composition of natural transformations:

[Diagram: categories 𝒞 and 𝒟 with three parallel functors 𝐹 , 𝐺, 𝐻 ∶ 𝒞 → 𝒟, and double arrows 𝛼 ∶ 𝐹 ⇒ 𝐺 and 𝛽 ∶ 𝐺 ⇒ 𝐻.]

The corresponding string diagram shows the two categories and three functors between them
joined by two natural transformations.

[String diagram: a single vertical string between areas 𝒞 and 𝒟; reading bottom-up, its segments are labeled 𝐹 , 𝐺, 𝐻, joined by the dots 𝛼 and 𝛽.]

As you can see, you can reconstruct the original diagram from the string diagram by scanning
it bottom-to-top.
Again, in Haskell we’ll be dealing with three endofunctors, and the vertical composition of
beta after alpha:
alpha :: forall x. F x -> G x
beta :: forall x. G x -> H x
is implemented using regular function composition:
beta_alpha :: forall x. F x -> H x
beta_alpha = beta . alpha
Let’s continue with the horizontal composition of natural transformations:

[Diagram: three categories 𝒞, 𝒟, ℰ; parallel functors 𝐹 , 𝐹 ′ ∶ 𝒞 → 𝒟 with 𝛼 ∶ 𝐹 ⇒ 𝐹 ′ , and parallel functors 𝐺, 𝐺′ ∶ 𝒟 → ℰ with 𝛽 ∶ 𝐺 ⇒ 𝐺′ .]

This time we have three categories, so we’ll have three areas.


The bottom of the string diagram corresponds to the composition of functors 𝐺◦𝐹 (in this
order). The top corresponds to 𝐺′ ◦𝐹 ′ . One natural transformation, 𝛼, connects 𝐹 to 𝐹 ′ ; the
other, 𝛽, connects 𝐺 to 𝐺′ .

[String diagram: two vertical strings separating areas 𝒞, 𝒟, ℰ; the left string reads 𝐹 below and 𝐹 ′ above its dot 𝛼, the right string reads 𝐺 below and 𝐺′ above its dot 𝛽.]

Parallel vertical lines in this new system correspond to functor composition.


You may think of the horizontal composition of natural transformations as happening along
the imaginary horizontal line in the middle of the diagram. But what if somebody was sloppy
in drawing the diagram, and one of the dots was a little higher than the other? As it turns out,
the exact positioning of the dots doesn’t matter, due to the interchange law.
But first, let’s illustrate whiskering: horizontal composition in which one of the natural
transformations is the identity. We can draw it like this:

[String diagram: the left string is 𝐹 throughout, with a dot marking 𝑖𝑑𝐹 ; the right string reads 𝐺 below and 𝐺′ above the dot 𝛽.]

But, really, the identity can be inserted at any point on a vertical line, so we don’t even have to
draw it. The following diagram represents the whiskering of 𝛽◦𝐹 .

[String diagram for 𝛽◦𝐹 : an unbroken 𝐹 string on the left; on the right, a string reading 𝐺 below and 𝐺′ above the dot 𝛽.]

In Haskell, where beta is a polymorphic function:


beta :: forall x. G x -> G' x
we read this diagram as:
beta_f :: forall x. G (F x) -> G' (F x)
beta_f = beta
with the understanding that the type checker instantiates the polymorphic function beta for the
correct type.
Similarly, you can easily imagine the diagram for 𝐺◦𝛼, and its Haskell realization:
g_alpha :: forall x. G (F x) -> G (F' x)
g_alpha = fmap alpha
with:

alpha :: forall x. F x -> F' x


Here’s the string diagram that corresponds to the interchange law:

[String diagram: two strings between areas 𝒞, 𝒟, ℰ; the left string carries the dots 𝛼 and 𝛼 ′ (segments 𝐹 , 𝐹 ′ , 𝐹 ′′ ), the right string the dots 𝛽 and 𝛽 ′ (segments 𝐺, 𝐺′ , 𝐺′′ ).]

This diagram is purposefully ambiguous. Are we supposed to first do vertical composition of


natural transformations and then the horizontal one? Or should we compose 𝛽◦𝛼 and 𝛽 ′ ◦𝛼 ′
horizontally, and then compose the results vertically? The interchange law says that it doesn’t
matter: the result is the same.
Now try to replace a pair of natural transformations in this diagram with identities. If you
replace 𝛼 ′ and 𝛽 ′ , you get the horizontal composition of 𝛽◦𝛼. If you replace 𝛼 ′ and 𝛽 with
identity natural transformations, and rename 𝛽 ′ to 𝛽, you get the diagram in which 𝛼 is shifted
down with respect to 𝛽, and so on.

[String diagram: like the previous one, but with the 𝛼 dot on the left string drawn lower than the 𝛽 dot on the right string.]

The interchange law tells us that all these diagrams are equal. We are free to slide natural
transformations like beads on a string.

String diagrams for the monad


A monad is defined as an endofunctor equipped with two natural transformations, as illustrated
by the following diagrams:

[Diagrams: over the category 𝒞, a double arrow 𝜂 from Id to 𝑇 , and a double arrow 𝜇 from 𝑇 ◦𝑇 to 𝑇 .]

Since we are dealing with just one category, when translating these diagrams to string dia-
grams, we can dispose of the naming (and the shading) of categories, and just draw the strings
alone.

[String diagrams: 𝜂 is a dot with a dashed Id string below and a 𝑇 string above; 𝜇 is a dot merging two 𝑇 strings from below into a single 𝑇 string above.]

In the first diagram it’s customary to skip the dashed line corresponding to the identity functor.
The 𝜂 dot can be used to freely inject a 𝑇 line into a diagram. Two 𝑇 lines can be joined by the
𝜇 dot.
String diagrams are especially useful in expressing monad laws. For instance, we have the
left identity law:
𝜇◦(𝜂◦𝑇 ) = 𝑖𝑑
which can be visualized as a commuting diagram:
[Diagram: the triangle Id◦𝑇 → 𝑇 ◦𝑇 (via 𝜂◦𝑇 ) followed by 𝜇 ∶ 𝑇 ◦𝑇 → 𝑇 , equal to 𝑖𝑑 ∶ Id◦𝑇 → 𝑇 .]

The corresponding string diagram represents the equality of the two paths through this diagram:

[String diagram: a 𝑇 string sprouting an 𝜂 dot whose new 𝑇 string merges back at 𝜇, equal to a plain 𝑇 string.]

You may think of this equality as the result of yanking the top and bottom strings resulting in
the 𝜂 appendage being retracted into the straight line.
There is a symmetric right identity law:

[String diagram: the mirror image of the previous one, with the 𝜂 dot attached on the other side of 𝜇.]

Finally, this is the associativity law in terms of string diagrams:

[String diagram: three 𝑇 strings merged by two 𝜇 dots; merging the left pair first equals merging the right pair first.]

String diagrams for the adjunction


As we discussed before, an adjunction is a relation between a pair of functors, 𝐿 ∶ 𝒞 → 𝒟 and
𝑅 ∶ 𝒟 → 𝒞. It can be defined by a pair of natural transformations, the unit 𝜂 and the counit 𝜀,
satisfying triangular identities.
The unit of the adjunction can be illustrated by a “cup”-shaped diagram:

[String diagram: a cup; the 𝜂 dot at the bottom emits an 𝐿 string on the left and an 𝑅 string on the right, with 𝒟 inside the cup and 𝒞 outside.]

The identity functor at the bottom of the diagram is omitted from the picture. The 𝜂 dot turns
the identity functor below it to the composition 𝑅◦𝐿 above it.
Similarly, the counit can be visualized as a “cap”-shaped string diagram with the implicit
identity functor at the top:


[String diagram: a cap; an 𝑅 string (left) and an 𝐿 string (right) rise and meet at the 𝜀 dot, with 𝒞 inside the cap and 𝒟 outside.]

Triangle identities can be easily expressed using string diagrams. They also make intuitive
sense, as you can imagine pulling on the string from both sides to straighten the curve.
For instance, this is the first triangle identity, sometimes called the zigzag identity:

[String diagram: an 𝑅 string with an 𝜂 cup attached at the bottom and an 𝜀 cap above it (a zigzag), equal to a straight 𝑅 string.]

Reading the left diagram bottom-to-top produces a series of mappings:


𝜂◦𝑅 ∶ Id◦𝑅 → 𝑅◦𝐿◦𝑅, followed by 𝑅◦𝜀 ∶ 𝑅◦𝐿◦𝑅 → 𝑅◦Id

This must be equal to the right-hand side, which may be interpreted as the (invisible) identity
natural transformation on 𝑅.
In the case where 𝑅 is an endofunctor, we can translate the first diagram directly to Haskell. The
whiskering of the unit of the adjunction 𝜂 by 𝑅 results in the polymorphic function unit being
instantiated at R x. The whiskering of 𝜀 results in the lifting of counit by the functor 𝑅. The
vertical composition translates to function composition:
triangle :: forall x. R x -> R x
triangle = fmap counit . unit

Exercise 15.1.1. Draw the string diagrams for the second triangle identity and translate them
to Haskell.

15.2 Monads from Adjunctions


You might have noticed that the same symbol 𝜂 is used for the unit of the adjunction and for the
unit of the monad. This is not a coincidence.
At first sight it might seem like we are comparing apples to oranges: an adjunction is defined
with two functors between two categories and a monad is defined by one endofunctor operating
on a single category. However, the composition of two functors going in opposite directions is
an endofunctor, and the unit of the adjunction maps the identity endofunctor to the endofunctor
𝑅◦𝐿.
Compare this diagram:
[String diagram: the 𝜂 cup with strings 𝐿 and 𝑅, as above.]

with the one defining the monadic unit:

[String diagram: the 𝜂 dot with a single 𝑇 string above it.]

It turns out that, for any adjunction 𝐿 ⊣ 𝑅, the endofunctor 𝑇 = 𝑅◦𝐿 is a monad, with the
multiplication 𝜇 defined by the following diagram:

[String diagram: four strings 𝐿, 𝑅, 𝐿, 𝑅 rise from the bottom; the middle pair 𝑅, 𝐿 is joined by an 𝜀 cap, leaving the outer 𝐿 and 𝑅 to continue to the top.]

Reading this diagram bottom-to-top, we get the following transformation (imagine slicing it
horizontally at the dot):
𝑅◦𝜀◦𝐿 ∶ 𝑅◦𝐿◦𝑅◦𝐿 → 𝑅◦𝐿

Compare this with the definition of the monadic 𝜇:


[String diagram: the 𝜇 dot merging two 𝑇 strings into one.]

We get the definition of 𝜇 for the monad 𝑅◦𝐿 as the double-whiskering of 𝜀:

𝜇 = 𝑅◦𝜀◦𝐿

The Haskell translation of the string diagram defining 𝜇 in terms of 𝜀 is always possible.
The monadic multiplication, or join, becomes:
join :: forall x. T (T x) -> T x
join = fmap counit
where fmap corresponds to the lifting by the functor 𝑅 (the left whiskering in 𝑅◦𝜀◦𝐿); the right
whiskering amounts to instantiating counit at the component 𝐿𝑥, which the type checker does
automatically. Notice that 𝒞 in this case is the Haskell category of types and functions, but 𝒟
can be an outside category.
To complete the picture, we can use string diagrams to derive monadic laws using triangle
identities. The trick is to replace all strings in monadic laws by pairs of parallel strings and then
rearrange them according to the rules.
To summarize, every adjunction 𝐿 ⊣ 𝑅 with the unit 𝜂 and counit 𝜀 defines a monad
(𝑅◦𝐿, 𝜂, 𝑅◦𝜀◦𝐿).
We’ll see later that, dually, the other composition, 𝐿◦𝑅, defines a comonad.

Exercise 15.2.1. Draw string diagrams to illustrate monadic laws (unit and associativity) for
the monad derived from an adjunction.

15.3 Examples of Monads from Adjunctions


We’ll go through several examples of adjunctions that generate some of the monads that we use
in programming. We’ll expand on these examples later, when we talk about monad transformers.
Most examples involve functors that leave the category of Haskell types and functions, even
though the round trip that generates the monad ends up being an endofunctor. This is why it’s
often impossible to express such adjunctions in Haskell.
To additionally complicate things, there is a lot of bookkeeping related to explicit naming of
data constructors, which is necessary for type inference to work. This may sometimes obscure
the simplicity of the underlying formulas.

Free monoid and the list monad


The list monad is generated by the free monoid adjunction we’ve seen before. The unit of this
adjunction, 𝜂𝑋 ∶ 𝑋 → 𝑈 (𝐹 𝑋), injects the elements of the set 𝑋 as the generators of the free
monoid 𝐹 𝑋, after which 𝑈 extracts the underlying set.
In Haskell, we represent the free monoid as a list type, and its generators are singleton lists.
The unit 𝜂𝑋 maps elements of 𝑋 to such singletons:

return x = [x]
To implement the counit, 𝜀𝑀 ∶ 𝐹 (𝑈 𝑀) → 𝑀, we take a monoid 𝑀, forget its multiplication,
and use its set of elements as generators for a new free monoid. A component of the counit at
𝑀 is then a monoid morphism from the free monoid back to 𝑀 or, in Haskell, [m]->m. It turns
out that this monoid morphism is a special case of a catamorphism.
First, recall the Haskell implementation of a general list catamorphism:
foldMap :: Monoid m => (a -> m) -> ([a] -> m)
foldMap f = foldr mappend mempty . fmap f
Here, we interpret (a -> m) as a regular function from a to the underlying set of a monoid m.
The result is interpreted as a monoid morphism from the free monoid generated by a (that is a
list of a’s) to m. This is just one direction of the adjunction:

𝐒𝐞𝐭(𝑎, 𝑈 𝑚) ≅ 𝐌𝐨𝐧(𝐹 𝑎, 𝑚)

To get the counit as a monoid morphism [m]->m we apply foldMap to identity. The result
is (foldMap id) or, in terms of foldr:
epsilon = foldr mappend mempty
It is a monoid morphism since it maps an empty list to the monoidal unit, and concatenation to
monoidal product.
Monadic multiplication, or join, is given by the whiskering of the counit:

𝜇 = 𝑈 ◦𝜀◦𝐹

You can easily convince yourself that whiskering on the left doesn’t do much here, since it’s just
a lifting of a monoid morphism by the forgetful functor (it keeps the function while forgetting
its special property of preserving structure).
The right whiskering by 𝐹 is more interesting. It means that the component 𝜇𝑋 corresponds
to the component of 𝜀 at 𝐹 𝑋, which is the free monoid generated from the set 𝑋. This free
monoid is defined by:
mempty = []
mappend = (++)
which gives us the definition of join:
join = foldr (++) []
As expected, this is the same as concat: In the list monad, multiplication is concatenation.

The currying adjunction and the state monad


The state monad is generated by the currying adjunction that we used to define the exponential
object. The left functor is defined by a product with some fixed object 𝑠:

𝐿𝑠 𝑎 = 𝑎 × 𝑠

We can, for instance, implement it as a Haskell type:


newtype L s a = L (a, s)

The right functor is the exponentiation, parameterized by the same object 𝑠:

𝑅𝑠 𝑐 = 𝑐^𝑠

In Haskell, it’s a thinly encapsulated function type:


newtype R s c = R (s -> c)
The monad is given by the composition of these two functors. On objects:

(𝑅𝑠 ◦𝐿𝑠 )𝑎 = (𝑎 × 𝑠)^𝑠

In Haskell we would write it as:


newtype St s a = St (R s (L s a))
If you expand this definition, it’s easy to recognize in it the State functor:
newtype State s a = State (s -> (a, s))
The unit of the adjunction 𝐿𝑠 ⊣ 𝑅𝑠 is a mapping:

𝜂𝑎 ∶ 𝑎 → (𝑎 × 𝑠)^𝑠

which can be implemented in Haskell as:


unit :: a -> R s (L s a)
unit a = R (\s -> L (a, s))
You may recognize in it a thinly veiled version of return for the state monad:
return :: a -> State s a
return a = State (\s -> (a, s))
Here’s the component of the counit of this adjunction at 𝑐:

𝜀𝑐 ∶ 𝑐^𝑠 × 𝑠 → 𝑐

It can be implemented in Haskell as:


counit :: L s (R s a) -> a
counit (L (R f, s)) = f s
which, after stripping data constructors, is equivalent to apply, or the uncurried version of
runState.
Monad multiplication 𝜇 is given by the whiskering of 𝜀 from both sides:

𝜇 = 𝑅𝑠 ◦𝜀◦𝐿𝑠

Here it is translated to Haskell:


mu :: R s (L s (R s (L s a))) -> R s (L s a)
mu = fmap counit
Whiskering on the right doesn’t do anything other than select a component of the natural trans-
formation. This is done automatically by Haskell’s type inference engine. Whiskering on the
left is done by lifting the component of the natural transformation. Again, type inference picks
the correct implementation of fmap—here, it’s equivalent to precomposition.
Compare this with the implementation of join:

join :: State s (State s a) -> State s a


join mma = State (fmap (uncurry runState) (runState mma))
Notice the dual use of runState:
runState :: State s a -> s -> (a, s)
runState (State h) s = h s
When it’s uncurried, its type signature becomes:
uncurry runState :: (State s a, s) -> (a, s)
which is equivalent to that of counit.
When partially applied, runState just strips the data constructor exposing the underlying
function type:
runState st :: s -> (a, s)

M-sets and the writer monad


The writer monad:
newtype Writer m a = Writer (a, m)
is parameterized by a monoid m. This monoid is used for accumulating log entries. The adjunc-
tion we are going to use involves a category of M-sets for that monoid.
An M-set is a set 𝑆 on which we define the action of a monoid 𝑀. Such an action is a
mapping:
𝑎∶ 𝑀 × 𝑆 → 𝑆
We often use the curried version of the action, with the monoid element in the subscript position.
Thus 𝑎𝑚 becomes a function 𝑆 → 𝑆.
This mapping has to satisfy some constraints. The action of the monoidal unit 1 must not
change the set, so it has to be the identity function:

𝑎1 = 𝑖𝑑𝑆

and two consecutive actions must combine to an action of their monoidal product:

𝑎𝑚₁ ◦𝑎𝑚₂ = 𝑎𝑚₁⋅𝑚₂

This choice of the order of multiplication defines what is called the left action. (The right action
has the two monoidal elements swapped on the right-hand side.)
M-sets form a category 𝐌𝐒𝐞𝐭. The objects are pairs (𝑆, 𝑎 ∶ 𝑀 × 𝑆 → 𝑆) and the arrows
are equivariant maps, that is functions between sets that preserve actions.
A function 𝑓 ∶ 𝑆 → 𝑅 is an equivariant mapping from (𝑆, 𝑎) to (𝑅, 𝑏) if the following
diagram commutes, for every 𝑚 ∈ 𝑀:

[Diagram: the square with 𝑓 ∶ 𝑆 → 𝑅 on top and bottom, 𝑎𝑚 on the left, and 𝑏𝑚 on the right; commutativity means 𝑓 ◦𝑎𝑚 = 𝑏𝑚 ◦𝑓 .]
In other words, it doesn’t matter if we first do the action 𝑎𝑚 , and then map the set; or first map
the set, and then do the corresponding action 𝑏𝑚 .

There is a forgetful functor 𝑈 from 𝐌𝐒𝐞𝐭 to 𝐒𝐞𝐭, which assigns the set 𝑆 to the pair (𝑆, 𝑎),
thus forgetting the action.
Corresponding to it there is a free functor 𝐹 . Its action on a set 𝑆 produces an M-set. It’s
a set that is a cartesian product of 𝑆 and 𝑀, where 𝑀 is treated as a set of elements (in other
words, the result of the action of a forgetful functor on a monoid). An element of this M-set is
a pair (𝑥 ∈ 𝑆, 𝑚 ∈ 𝑀) and the free action is defined by:

𝜙𝑛 ∶ (𝑥, 𝑚) ↦ (𝑥, 𝑛 ⋅ 𝑚)

leaving the element 𝑥 unchanged, and only multiplying the 𝑚-component.


To show that 𝐹 is left adjoint to 𝑈 we have to construct the following natural isomorphism:

𝐌𝐒𝐞𝐭(𝐹 𝑆, 𝑄) ≅ 𝐒𝐞𝐭(𝑆, 𝑈 𝑄)

for any set 𝑆 and any M-set 𝑄. If we represent 𝑄 as a pair (𝑅, 𝑏), the element of the right hand
side of the adjunction is a plain function 𝑢 ∶ 𝑆 → 𝑅. We can use this function to construct an
equivariant mapping on the left.
The trick here is to notice that such an equivariant mapping 𝑓 ∶ 𝐹 𝑆 → 𝑄 is fully determined
by its action on the elements of the form (𝑥, 1) ∈ 𝐹 𝑆, where 1 is the monoidal unit.
Indeed, from the equivariance condition it follows that:

[Diagram: 𝑓 maps (𝑥, 1) to 𝑟; applying 𝜙𝑚 on the left and 𝑏𝑚 on the right, 𝑓 maps (𝑥, 𝑚 ⋅ 1) to 𝑟′ = 𝑏𝑚 𝑟.]

or:
𝑓 (𝜙𝑚 (𝑥, 1)) = 𝑓 (𝑥, 𝑚) = 𝑏𝑚 (𝑓 (𝑥, 1))
Thus every function 𝑢 ∶ 𝑆 → 𝑅 uniquely defines an equivariant mapping 𝑓 ∶ 𝐹 𝑆 → 𝑄 given
by:
𝑓 (𝑥, 𝑚) = 𝑏_𝑚 (𝑢 𝑥)
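Here's a minimal Haskell sketch of this bijection (the names toEquivariant and fromEquivariant are made up for illustration; an M-set is modeled as its carrier r plus an action function):

-- One direction: a plain function u determines an equivariant map on
-- the free M-set, f (x, m) = b_m (u x).
toEquivariant :: (m -> r -> r) -> (s -> r) -> ((s, m) -> r)
toEquivariant act u (x, m) = act m (u x)

-- The other direction: restrict the equivariant map to elements (x, mempty).
fromEquivariant :: Monoid m => ((s, m) -> r) -> (s -> r)
fromEquivariant f x = f (x, mempty)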
The unit of this adjunction 𝜂_𝑆 ∶ 𝑆 → 𝑈 (𝐹 𝑆) maps an element 𝑥 to a pair (𝑥, 1). Compare
this with the definition of return for the writer monad:
return a = Writer (a, mempty)
The counit is given by an equivariant map:

𝜀_𝑄 ∶ 𝐹 (𝑈 𝑄) → 𝑄

The left hand side is the M-set constructed by taking the underlying set of 𝑄 and taking its
product with the underlying set of 𝑀. The original action of 𝑄 is forgotten and replaced by the
free action. The obvious choice for the counit is:

𝜀_𝑄 ∶ (𝑥, 𝑚) ↦ 𝑎_𝑚 𝑥

where 𝑥 is an element of (the underlying set of) 𝑄 and 𝑎 is the action defined in 𝑄.
Monad multiplication 𝜇 is given by the whiskering of the counit.

𝜇 = 𝑈 ◦𝜀◦𝐹
It means replacing 𝑄 in the definition of 𝜀_𝑄 with a free M-set whose action is the free action.
In other words, we replace 𝑥 with (𝑥, 𝑚) and 𝑎_𝑛 with 𝜙_𝑛. (Whiskering with 𝑈 doesn't change
anything.)

𝜇_𝑆 ∶ ((𝑥, 𝑚), 𝑛) ↦ 𝜙_𝑛 (𝑥, 𝑚) = (𝑥, 𝑛 ⋅ 𝑚)
Compare this with the definition of join for the writer monad:
join :: Monoid m => Writer m (Writer m a) -> Writer m a
join (Writer ( Writer (x, m), n)) = Writer (x, mappend n m)

Pointed objects and the Maybe monad


Pointed objects are objects with a designated element. Since picking an element is done using an
arrow from the terminal object, the category of pointed objects is defined using pairs (𝑎, 𝑝 ∶ 1 →
𝑎), where 𝑎 is an object in 𝒞.
The morphisms between these pairs are the arrows in 𝒞 that preserve the points. Thus a
morphism from (𝑎, 𝑝 ∶ 1 → 𝑎) to (𝑏, 𝑞 ∶ 1 → 𝑏) is an arrow 𝑓 ∶ 𝑎 → 𝑏 such that 𝑞 = 𝑓 ◦𝑝. This
category is also called a coslice category and is written as 1∕𝒞.
There is an obvious forgetful functor 𝑈 ∶ 1∕𝒞 → 𝒞 that forgets the point. Its left adjoint is
a free functor 𝐹 that maps an object 𝑎 to a pair (1 + 𝑎, Left). In other words, 𝐹 freely adds a point
to an object using a coproduct.
The Either monad is similarly constructed by replacing 1 with a fixed object 𝑒.

Exercise 15.3.1. Show that 𝑈 ◦𝐹 is the Maybe monad.

The continuation monad


The continuation monad is defined in terms of a pair of contravariant functors in the category
of sets. We don’t have to modify the definition of the adjunction to work with contravariant
functors. It’s enough to select the opposite category for one of the endpoints.
We’ll define the left functor as:

𝐿_𝑍 ∶ 𝐒𝐞𝐭^𝑜𝑝 → 𝐒𝐞𝐭

It maps a set 𝑋 to the hom-set in 𝐒𝐞𝐭:

𝐿_𝑍 𝑋 = 𝐒𝐞𝐭(𝑋, 𝑍)

This functor is parameterized by another set 𝑍. The right functor is defined by essentially the
same formula:
𝑅_𝑍 ∶ 𝐒𝐞𝐭 → 𝐒𝐞𝐭^𝑜𝑝
𝑅_𝑍 𝑋 = 𝐒𝐞𝐭^𝑜𝑝(𝑍, 𝑋) = 𝐒𝐞𝐭(𝑋, 𝑍)
The composition 𝑅◦𝐿 can be written in Haskell as ((x -> r) -> r), which is the same
as the (covariant) endofunctor that defines the continuation monad.
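As a quick sketch (written from scratch here; it coincides with the standard continuation monad), the composite endofunctor and the unit of the adjunction look like this:

newtype Cont r a = Cont ((a -> r) -> r)

instance Functor (Cont r) where
  fmap f (Cont k) = Cont (\c -> k (c . f))

-- The unit embeds a value by applying the continuation to it.
ret :: a -> Cont r a
ret x = Cont (\c -> c x)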

15.4 Monad Transformers


Suppose that you want to combine multiple effects, say, state with the possibility of failure. One
option is to define your own monad from scratch. You define a functor:
newtype MaybeState s a = MS (s -> Maybe (a, s))
  deriving Functor
together with the function to extract the result (or report failure):
runMaybeState :: MaybeState s a -> s -> Maybe (a, s)
runMaybeState (MS h) s = h s
You define the monad instance for it:
instance Monad (MaybeState s) where
  return a = MS (\s -> Just (a, s))
  ms >>= k = MS (\s -> case runMaybeState ms s of
                         Nothing -> Nothing
                         Just (a, s') -> runMaybeState (k a) s')
and, if you are diligent enough, check that it satisfies the monad laws.
There is no general recipe for combining monads. In that sense, monads are not composable.
However, we know that adjunctions are composable. We’ve also seen how to get monads from
adjunctions and, as we’ll soon see, every monad can be obtained this way. So, if we can match
adjunctions, the monads that they generate will automatically compose.
Consider two composable adjunctions:

𝐿′ 𝐿

  
𝑅′ 𝑅

There are three monads in this picture. There is the “inner” monad 𝑅′ ◦𝐿′ and the “outer” monad
𝑅◦𝐿 as well as the composite 𝑅◦𝑅′ ◦𝐿′ ◦𝐿.
If we call the inner monad 𝑇 = 𝑅′ ◦𝐿′ , then 𝑅◦𝑇 ◦𝐿 is the composite monad called the
monad transformer, because it transforms the monad 𝑇 into a new monad.

[diagram: the composite 𝑅◦𝑇◦𝐿, with 𝑇 = 𝑅′◦𝐿′ sandwiched between 𝐿 and 𝑅]

In our example, we can treat Maybe as the inner monad:

𝑇𝑎 = 1 + 𝑎

It is transformed using the outer adjunction 𝐿_𝑠 ⊣ 𝑅_𝑠, the one that generates the state monad:

𝐿_𝑠 𝑎 = 𝑎 × 𝑠

𝑅_𝑠 𝑐 = 𝑐^𝑠

The result is:

(𝑅_𝑠 ◦𝑇 ◦𝐿_𝑠 )𝑎 = (1 + 𝑎 × 𝑠)^𝑠
or, in Haskell:
s -> Maybe (a, s)


which matches the definition of our MaybeState monad.
In general, the inner monad 𝑇 is defined by its unit 𝜂^𝑖 and multiplication 𝜇^𝑖 (the superscript
𝑖 standing for “inner”). The “outer” adjunction is defined by its unit 𝜂^𝑜 and counit 𝜀^𝑜.
The unit of the composite monad is the natural transformation:

𝜂 ∶ 𝐼𝑑 → 𝑅◦𝑇 ◦𝐿

given by a string diagram: [the outer unit 𝜂^𝑜 creates the 𝐿 and 𝑅 strands; above it, the inner
unit 𝜂^𝑖 creates the 𝑇 strand between them].
It is the vertical composition of the whiskered inner unit 𝑅◦𝜂^𝑖 ◦𝐿 and the outer unit 𝜂^𝑜. In
components:

𝜂_𝑎 = 𝑅(𝜂^𝑖_{𝐿𝑎}) ◦ 𝜂^𝑜_𝑎
The multiplication of the composite monad is a natural transformation:

𝜇 ∶ 𝑅◦𝑇 ◦𝐿◦𝑅◦𝑇 ◦𝐿 → 𝑅◦𝑇 ◦𝐿

given by a string diagram: [the outer counit 𝜀^𝑜 annihilates the inner 𝑅 and 𝐿 strands, and the
inner multiplication 𝜇^𝑖 merges the two 𝑇 strands].
It's the vertical composition of the multiply whiskered outer counit:

𝑅◦𝑇 ◦𝜀^𝑜 ◦𝑇 ◦𝐿

followed by the whiskered inner multiplication 𝑅◦𝜇^𝑖 ◦𝐿. In components:

𝜇_𝑐 = 𝑅(𝜇^𝑖_{𝐿𝑐}) ◦ (𝑅◦𝑇 )(𝜀^𝑜_{(𝑇◦𝐿)𝑐})

State monad transformer


Let's unpack these equations for the case of the state monad transformer. The state monad is
generated by the currying adjunction. The left functor 𝐿_𝑠 is the product functor (a, s), and
the right functor 𝑅_𝑠 is the exponential, a.k.a. the reader functor (s -> a).
As we've seen before, the outer counit 𝜀^𝑜_𝑎 is function application:
counit :: (s -> a, s) -> a
counit (f, x) = f x
and the unit 𝜂^𝑜_𝑎 is the curried pair constructor:
unit :: a -> s -> (a, s)
unit x = \s -> (x, s)
We'll keep the inner monad (𝑇, 𝜂^𝑖, 𝜇^𝑖) arbitrary. In Haskell, we'll call this triple m, return,
and join.
The composite monad that we get by applying the state monad transformer to the monad 𝑇 ,
is the composition 𝑅◦𝑇 ◦𝐿 or, in Haskell:
newtype StateT s m a = StateT (s -> m (a, s))

runStateT :: StateT s m a -> s -> m (a, s)
runStateT (StateT h) s = h s
The unit of the monad transformer is the vertical composition of 𝜂^𝑜 and 𝑅◦𝜂^𝑖 ◦𝐿. In com-
ponents:

𝜂_𝑎 = 𝑅(𝜂^𝑖_{𝐿𝑎}) ◦ 𝜂^𝑜_𝑎

There are a lot of moving parts in this formula, so let’s analyze it step-by-step. We start from
the right: we have the 𝑎-component of the unit of the adjunction, which is an arrow from 𝑎 to
𝑅(𝐿𝑎). In Haskell, it’s the function unit.
unit :: a -> s -> (a, s)
Let’s evaluate this function at some x :: a. The result is another function s -> (a, s). We
pass this function as an argument to 𝑅(𝜂^𝑖_{𝐿𝑎}).
𝜂^𝑖_{𝐿𝑎} is the component of return of the inner monad taken at 𝐿𝑎. Here, 𝐿𝑎 is the type
(a, s). So we are instantiating the polymorphic function return :: a -> m a as a function
(a, s) -> m (a, s). (The type inferencer will do this automatically for us.)
Next, we are lifting this component of return using 𝑅. Here, 𝑅 is the exponential (−)^𝑠,
so it lifts a function by post-composition. It will post-compose return to whatever function is
passed to it. In our case, that’s the function that was produced by unit. Notice that the types
match: we are post-composing (a, s) -> m (a, s) after s -> (a, s).
We can write the result of this composition as:
return x = StateT (return . \s -> (x, s))
or, inlining function composition:
return x = StateT (\s -> return (x, s))
We inserted the data constructor StateT to make the type checker happy. This is the return
of the composite monad in terms of the return of the inner monad.
The same reasoning can be applied to the formula for the component of the composite 𝜇 at
some 𝑎:

𝜇_𝑎 = 𝑅(𝜇^𝑖_{𝐿𝑎}) ◦ (𝑅◦𝑇 )(𝜀^𝑜_{(𝑇◦𝐿)𝑎})

The inner 𝜇^𝑖 is the join of the monad m. Applying 𝑅 turns it into post-composition.
The outer 𝜀^𝑜 is function application taken at 𝑇 (𝐿𝑎), or m (a, s). It's a function of the type:
(s -> m (a, s), s) -> m (a, s)


which, inserting the appropriate data constructors, can be written as uncurry runStateT:
uncurry runStateT :: (StateT s m a, s) -> m (a, s)
The application of (𝑅◦𝑇 ) lifts this component of 𝜀 using the composition of functors 𝑅 and 𝑇 .
The former is implemented as post-composition, and the latter is the fmap of the monad m.
Putting all this together, we get a point-free formula for join of the state monad trans-
former:
join :: StateT s m (StateT s m a) -> StateT s m a
join mma = StateT (join . fmap (uncurry runStateT) . runStateT mma)
Here, the partially applied (runStateT mma) strips off the data constructor from the argument
mma:
runStateT mma :: s -> m (StateT s m a, s)
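For completeness, here's a hedged sketch of the bind this join induces (via m >>= k = join (fmap k m)), written against the StateT defined above rather than the MTL version:

bindST :: Monad m => StateT s m a -> (a -> StateT s m b) -> StateT s m b
bindST ma k = StateT $ \s -> do
  (a, s') <- runStateT ma s   -- run the outer layer in the inner monad
  runStateT (k a) s'          -- continue with the updated state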
Our earlier example of MaybeState can now be rewritten using a monad transformer:
type MaybeState s a = StateT s Maybe a

The vanilla State monad can be recovered by applying the StateT monad transformer to
the identity functor, which has a Monad instance defined in the library (notice that the last type
variable a is skipped in this definition):
type State s = StateT s Identity

Other monad transformers follow the same pattern. They are defined in the Monad Trans-
former Library, MTL.

15.5 Monad Algebras


Every adjunction generates a monad, and so far we’ve been able to define adjunctions for all
the monads of interest to us. But is every monad generated by an adjunction? The answer is
yes, and there are usually many adjunctions—in fact a whole category of adjunctions—for every
monad.
Finding an adjunction for a monad is analogous to factorization. We want to express a
functor as a composition of two other functors, 𝑇 = 𝑅◦𝐿. The problem is complicated by the
fact that this factorization also requires finding the appropriate intermediate category. We’ll find
such a category by studying algebras for a monad.
A monad is defined by an endofunctor, and we know that it’s possible to define algebras for
an endofunctor. Mathematicians often think of monads as tools for generating expressions and
algebras as tools for evaluating those expressions. However, expressions generated by monads
impose some compatibility conditions on those algebras.
For instance, you may notice that the monadic unit 𝜂𝑎 ∶ 𝑎 → 𝑇 𝑎 has the type signature
that looks like the inverse of the structure map of an algebra 𝛼 ∶ 𝑇 𝑎 → 𝑎. Of course, 𝜂 is a
natural transformation that is defined for every type, whereas an algebra has a fixed carrier type.
Nevertheless, we might reasonably expect that one might undo the action of the other.
Consider the earlier example of the expression monad Ex. An algebra for this monad is a
choice of the carrier type, let's say Char, and an arrow:
alg :: Ex Char -> Char


Since Ex is a monad, it defines a unit, or return, which is a polymorphic function that can be
used to generate simple expressions from values. The unit of Ex is:
return x = Var x
We can instantiate the unit for an arbitrary type, in particular for the carrier type of our algebra.
It makes sense to demand that evaluating Var c, where c is a character, should give us back the
same c. In other words, we’d like:
alg . return = id
This condition will immediately eliminate a lot of algebras, such as:
alg (Var c) = 'a' -- not compatible with the monad Ex
The second condition we’d like to impose is that the algebra that’s compatible with a monad
respects substitution. A monad lets us flatten nested expressions using join. An algebra lets us
evaluate such expressions.
There are two ways of doing that: we can apply the algebra to a flattened expression, or we
can apply it to the inner expression first (using fmap), and then evaluate the resulting expression.
alg (join mma) = alg (fmap alg mma)
where mma is of the nested type Ex (Ex Char).
In category theory these two conditions define a monad algebra.
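For a concrete example (a standard fact, shown here for the list monad rather than Ex): every monoid gives a monad algebra for the list monad.

-- mconcat is a lawful algebra for []:
--   mconcat [x] == x                                    (unit condition)
--   mconcat (concat xss) == mconcat (map mconcat xss)   (substitution condition)
algList :: Monoid a => [a] -> a
algList = mconcat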
We say that (𝑎, 𝛼 ∶ 𝑇 𝑎 → 𝑎) is a monad algebra for the monad (𝑇 , 𝜇, 𝜂) if the following
conditions hold:

𝛼 ◦ 𝜂_𝑎 = 𝑖𝑑_𝑎        𝛼 ◦ 𝑇 𝛼 = 𝛼 ◦ 𝜇_𝑎
These laws are sometimes called the unit law and the multiplication law for monad algebras.
Since monad algebras are just special kinds of algebras, they form a sub-category of alge-
bras. Recall that algebra morphisms are arrows 𝑓 ∶ 𝑎 → 𝑏 that satisfy the following condition:

𝑓 ◦ 𝛼 = 𝛽 ◦ 𝑇 𝑓

In light of this definition, we can re-interpret the second monad-algebra condition as asserting
that the structure map 𝛼 of a monad algebra is also an algebra morphism from (𝑇 𝑎, 𝜇_𝑎 ) to
(𝑎, 𝛼). This will come in handy in what follows.

Eilenberg-Moore category
The category of monad algebras for a given monad 𝑇 on 𝒞 is called the Eilenberg-Moore cat-
egory and is denoted by 𝒞^𝑇. It turns out that it is a good choice for the intermediate category
that lets us factorize the monad 𝑇 as a composition of a pair of adjoint functors.
The process goes as follows: we define a pair of functors, show that they form an adjunction,
and then show that the monad generated by this adjunction is the original monad.
First of all, there is an obvious forgetful functor, which we'll call 𝑈^𝑇, from 𝒞^𝑇 to 𝒞. It maps
an algebra (𝑎, 𝛼) to its carrier 𝑎, and treats algebra morphisms as regular morphisms between
carriers.
More interestingly, there is a free functor 𝐹^𝑇 ∶ 𝒞 → 𝒞^𝑇 that is the left adjoint to 𝑈^𝑇:

𝐹^𝑇 ⊣ 𝑈^𝑇

On objects, 𝐹^𝑇 maps an object 𝑎 of 𝒞 to a monad algebra, an object in 𝒞^𝑇. For the carrier
of this algebra we pick not 𝑎 but 𝑇 𝑎. For the structure map, which is the mapping 𝑇 (𝑇 𝑎) → 𝑇 𝑎,
we pick the component of monad multiplication 𝜇_𝑎 ∶ 𝑇 (𝑇 𝑎) → 𝑇 𝑎.
It's easy to check that this algebra (𝑇 𝑎, 𝜇_𝑎 ) is indeed a monad algebra—the necessary com-
muting conditions follow from monad laws. Indeed, substituting the algebra (𝑇 𝑎, 𝜇_𝑎 ) into the
monad-algebra conditions, we get:

𝜇_𝑎 ◦ 𝜂_{𝑇𝑎} = 𝑖𝑑_{𝑇𝑎}        𝜇_𝑎 ◦ 𝑇 𝜇_𝑎 = 𝜇_𝑎 ◦ 𝜇_{𝑇𝑎}

The first equation is just the left monadic unit law in components; the 𝜂_{𝑇𝑎} arrow corresponds to
the whiskering 𝜂◦𝑇. The second equation is the associativity of 𝜇, with the two whiskerings
𝜇◦𝑇 and 𝑇◦𝜇 expressed in components.
To prove that we have an adjunction, we’ll define two natural transformations to serve as
the unit and the counit of the adjunction.
For the unit of the adjunction we pick the monadic unit 𝜂 of 𝑇. They both have the same
signature—in components, 𝜂_𝑎 ∶ 𝑎 → 𝑈^𝑇 (𝐹^𝑇 𝑎).
The counit is a natural transformation:

𝜀 ∶ 𝐹^𝑇 ◦𝑈^𝑇 → 𝐼𝑑

The component of 𝜀 at (𝑎, 𝛼) is an algebra morphism from the free algebra generated by 𝑎, that
is (𝑇 𝑎, 𝜇_𝑎 ), back to (𝑎, 𝛼). As we've seen earlier, 𝛼 itself is such a morphism. We can therefore
pick 𝜀_{(𝑎,𝛼)} = 𝛼.
Triangular identities for these definitions of 𝜂 and 𝜀 follow from unit laws for the monad and
the monad algebra.
As is true for all adjunctions, the composition 𝑈^𝑇 ◦𝐹^𝑇 is a monad. We'll show that this is the
same monad we started with. Indeed, on objects, the composition 𝑈^𝑇 (𝐹^𝑇 𝑎) first maps 𝑎 to a
free monad algebra (𝑇 𝑎, 𝜇_𝑎 ) and then forgets the structure map. The net result is the mapping of
𝑎 to 𝑇 𝑎, which is exactly what the original monad did.
On arrows, it lifts an arrow 𝑓 ∶ 𝑎 → 𝑏 using 𝑇. The fact that the arrow 𝑇 𝑓 is an algebra
morphism from (𝑇 𝑎, 𝜇_𝑎 ) to (𝑇 𝑏, 𝜇_𝑏 ) follows from the naturality of 𝜇:

𝑇 𝑓 ◦ 𝜇_𝑎 = 𝜇_𝑏 ◦ 𝑇 (𝑇 𝑓 )
Finally, we have to show that the unit and the multiplication of the monad 𝑈^𝑇 ◦𝐹^𝑇 are the same
as the unit and the multiplication of our original monad.
The units are the same by construction.
The monad multiplication of 𝑈^𝑇 ◦𝐹^𝑇 is given by the whiskering 𝑈^𝑇 ◦𝜀◦𝐹^𝑇 of the counit of
the adjunction. In components, this means instantiating 𝜀 at (𝑇 𝑎, 𝜇_𝑎 ), which gives us 𝜇_𝑎 (the
action of 𝑈^𝑇 on arrows is trivial). This is indeed the original monad multiplication.
We have thus shown that, for any monad 𝑇 we can define the Eilenberg-Moore category and
a pair of adjoint functors that factorize this monad.

Kleisli category
Inside every Eilenberg-Moore category there is a smaller Kleisli category struggling to get out.
This smaller category is the image of the free functor we have constructed in the previous section.
Despite appearances, the image of a functor does not necessarily define a subcategory.
Granted, it maps identities to identities and composition to composition. The problem may
arise if two arrows that were not composable in the source category become composable in the
target category. This may happen if the target of the first arrow is mapped to the same object
as the source of the second arrow. In the example below, 𝐹 𝑓 and 𝐹 𝑔 are composable, but their
composition 𝐹 𝑔◦𝐹 𝑓 may be absent from the image of the functor.

[diagram: arrows 𝑓 ∶ 𝑎 → 𝑏 and 𝑔 ∶ 𝑐 → 𝑑 in the source category; in the target category
𝐹 𝑏 = 𝐹 𝑐, so the composite 𝐹 𝑔◦𝐹 𝑓 ∶ 𝐹 𝑎 → 𝐹 𝑑 exists]
However, the free functor 𝐹^𝑇 maps distinct objects into distinct free algebras, so its image
is indeed a subcategory of 𝒞^𝑇.
We have encountered the Kleisli category before. There are many ways of constructing the
same category, and the simplest one is to describe the Kleisli category in terms of Kleisli
arrows.
A Kleisli category for the monad (𝑇 , 𝜂, 𝜇) is denoted by 𝒞_𝑇. Its objects are the same as the
objects of 𝒞, but an arrow in 𝒞_𝑇 from 𝑎 to 𝑏 is represented by an arrow in 𝒞 that goes from 𝑎 to
𝑇 𝑏. You may recognize it as the Kleisli arrow a -> m b we've defined before. Because 𝑇 is
a monad, these Kleisli arrows can be composed using the “fish” operator <=<.
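As a reminder, the fish operator is definable in terms of bind (this matches the version in Control.Monad):

(<=<) :: Monad m => (b -> m c) -> (a -> m b) -> (a -> m c)
g <=< f = \a -> f a >>= g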
To establish the adjunction 𝐿_𝑇 ⊣ 𝑅_𝑇,
we define the left functor 𝐿_𝑇 ∶ 𝒞 → 𝒞_𝑇 as identity on objects. We still have to define what it
does to arrows. It should map a regular arrow 𝑓 ∶ 𝑎 → 𝑏 to a Kleisli arrow from 𝑎 to 𝑏. This
Kleisli arrow 𝑎 ↠ 𝑏 is represented by an arrow 𝑎 → 𝑇 𝑏 in 𝒞. Such an arrow always exists as
the composite:

𝐿_𝑇 𝑓 = 𝜂_𝑏 ◦ 𝑓 ∶ 𝑎 → 𝑏 → 𝑇 𝑏   (first 𝑓, then 𝜂_𝑏)
The right functor 𝑅_𝑇 ∶ 𝒞_𝑇 → 𝒞 is defined on objects as a mapping that takes an 𝑎 in the
Kleisli category to an object 𝑇 𝑎 in 𝒞. Given a Kleisli arrow 𝑎 ↠ 𝑏, which is represented by an
arrow 𝑔 ∶ 𝑎 → 𝑇 𝑏, 𝑅_𝑇 will map it to an arrow 𝑅_𝑇 𝑎 → 𝑅_𝑇 𝑏, that is an arrow 𝑇 𝑎 → 𝑇 𝑏 in 𝒞.
We take this arrow to be the composite:

𝜇_𝑏 ◦ 𝑇 𝑔 ∶ 𝑇 𝑎 → 𝑇 (𝑇 𝑏) → 𝑇 𝑏   (first 𝑇 𝑔, then 𝜇_𝑏)

To establish the adjunction, we'll show the isomorphism of hom-sets:

𝒞_𝑇 (𝐿_𝑇 𝑎, 𝑏) ≅ 𝒞(𝑎, 𝑅_𝑇 𝑏)

An element of the left-hand side is a Kleisli arrow 𝑎 ↠ 𝑏, which is represented by 𝑓 ∶ 𝑎 → 𝑇 𝑏.
We can find the same arrow on the right-hand side, since 𝑅_𝑇 𝑏 is 𝑇 𝑏. So the isomorphism is
between Kleisli arrows in 𝒞_𝑇 and the arrows in 𝒞 that represent them.
The composite 𝑅_𝑇 ◦𝐿_𝑇 is equal to 𝑇 and, indeed, it can be shown that this adjunction gen-
erates the original monad.
In general, there may be many adjunctions that generate the same monad. Adjunctions them-
selves form a 2-category, so it’s possible to compare adjunctions using adjunction morphisms
(1-cells in the 2-category). It turns out that the Kleisli adjunction is the initial object among all
adjunctions that generate a given monad. Dually, the Eilenberg-Moore adjunction is terminal.
Chapter 16

Comonads

If it were easily pronounceable, we should probably call side effects “ntext,” because the dual
to side effects is “context."
Just like we were using Kleisli arrows to deal with side effects, we use co-Kleisli arrows to
deal with contexts.
Let’s start with the familiar example of an environment as a context. We have previously
constructed a reader monad from it, by currying the arrow:
(a, e) -> b
This time, however, we’ll treat it as a co-Kleisli arrow, which is an arrow from a “contextualized”
argument.
As was the case with monads, we are interested in being able to compose such arrows. This
is relatively easy for the environment-carrying arrows:
composeWithEnv :: ((b, e) -> c) -> ((a, e) -> b) -> ((a, e) -> c)
composeWithEnv g f = \(a, e) -> g (f (a, e), e)
It’s also straightforward to implement an arrow that serves as an identity with respect to this
composition:
idWithEnv :: (a, e) -> a
idWithEnv (a, e) = a
This strongly suggests the idea that there is a category in which co-Kleisli arrows serve as
morphisms.

Exercise 16.0.1. Show that the composition of co-Kleisli arrows using composeWithEnv is
associative.

16.1 Comonads in Programming


A functor w (consider it a stylized upside-down m) is a comonad if it supports composition of
co-Kleisli arrows:
class Functor w => Comonad w where
  (=<=) :: (w b -> c) -> (w a -> b) -> (w a -> c)
  extract :: w a -> a

Here, the composition is written in the form of an infix operator. The unit of composition is
called extract, since it extracts a value from the context.
Let’s try it with our example. It is convenient to pass the environment as the first component
of the pair. The comonad is then given by the functor that’s a partial application of the pair
constructor ((,) e).
instance Comonad ((,) e) where
  g =<= f = \ea -> g (fst ea, f ea)
  extract = snd
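For instance (a made-up pair of arrows, just to show the composition at work):

-- Two co-Kleisli arrows reading a shared environment of type Double.
scale :: (Double, Double) -> Double
scale (e, x) = e * x

offset :: (Double, Double) -> Double
offset (e, x) = x + e

-- Their composite still receives the same environment.
calibrate :: (Double, Double) -> Double
calibrate = scale =<= offset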
As with monads, co-Kleisli composition may be used in point-free style of programming.
But we can also use the dual to join called duplicate:
duplicate :: w a -> w (w a)
or the dual to bind called extend:
extend :: (w a -> b) -> w a -> w b
Here’s how we can implement co-Kleisli composition in terms of duplicate and fmap:
g =<= f = g . fmap f . duplicate

Exercise 16.1.1. Implement duplicate in terms of extend and vice versa.

The Stream comonad


Interesting examples of comonads deal with larger, sometimes infinite, contexts. Here’s an
infinite stream:
data Stream a = Cons a (Stream a)
  deriving Functor
If we consider such a stream as a value of the type a in the context of an infinite tail, we can
provide a Comonad instance for it:
instance Comonad Stream where
  extract (Cons a as) = a
  duplicate (Cons a as) = Cons (Cons a as) (duplicate as)
Here, extract returns the head of the stream and duplicate turns a stream into a stream of
streams, in which each consecutive stream is the tail of the previous one.
The intuition is that duplicate sets the stage for iteration, but it does it in a very general
way. The head of each of the substreams can be interpreted as a future “current position” in the
original stream.
It would be easy to perform a computation that goes over the head elements of these streams.
But that’s not where the power of a comonad lies. It lets us perform computations that require
an arbitrary look-ahead. Such a computation requires access not only to heads of consecutive
substreams, but to their tails as well.
This is what extend does: it applies a given co-Kleisli arrow f to all the streams generated
by duplicate:
extend f (Cons a as) = Cons (f (Cons a as)) (extend f as)
Here’s an example of a co-Kleisli arrow that averages the first five elements of a stream:
avg :: Stream Double -> Double
avg = (/ 5) . sum . stmTake 5
It uses a helper function that extracts the first n items:
stmTake :: Int -> Stream a -> [a]
stmTake 0 _ = []
stmTake n (Cons a as) = a : stmTake (n - 1) as

We can run avg over the whole stream using extend to smooth local fluctuation. Electri-
cal engineers might recognize this as a simple low-pass filter with extend implementing the
convolution. It produces a running average of the original stream.
smooth :: Stream Double -> Stream Double
smooth = extend avg

Comonads are useful for structuring computations in spatially or temporally extended data
structures. Such computations are local enough to define the “current location,” but require
gathering information from neighboring locations. Signal processing or image processing are
good examples. So are simulations, in which differential equations have to be iteratively solved
inside volumes: climate simulations, cosmological models, or nuclear reactions, to name a few.
Conway’s Game of Life is also a good testing ground for comonadic methods.
Sometimes it's convenient to perform calculations on continuous streams of data, postponing
the sampling until the very last step. Here's an example of a signal that is a function of time
(represented by Double):
data Signal a = Sig (Double -> a) Double
The first component is a continuous stream of a’s implemented as a function of time. The second
component is the current time.
This is the Comonad instance for the continuous stream:
instance Comonad Signal where
  extract (Sig f x) = f x
  duplicate (Sig f x) = Sig (\y -> Sig f (x - y)) x
  extend g (Sig f x) = Sig (\y -> g (Sig f (x - y))) x
Here, extend convolves the filter
g :: Signal a -> a
over the whole stream.
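For example (a made-up smoothing filter, just to show the shape of a co-Kleisli arrow over Signal):

-- Average the signal's value at the current time with its value dt earlier.
avgPair :: Double -> Signal Double -> Double
avgPair dt (Sig f x) = (f x + f (x - dt)) / 2

smoothSig :: Double -> Signal Double -> Signal Double
smoothSig dt = extend (avgPair dt)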

Exercise 16.1.2. Implement the Comonad instance for a bidirectional stream:


data BiStream a = BStr [a] [a]
Assume that both lists are infinite. Hint: Consider the first list as the past (in reverse order); the
head of the second list as the present, and its tail as the future.

Exercise 16.1.3. Implement a low-pass filter for BiStream from the previous exercise. It aver-
ages over three values: the current one, the one from the immediate past, and the one from the
immediate future. For electrical engineers: implement a Gaussian filter.
16.2 Comonads Categorically


We can get the definition of a comonad by reversing the arrows in the definition of a monad.
Our duplicate corresponds to the reversed join, and extract is the reversed return.
A comonad is thus an endofunctor 𝑊 equipped with two natural transformations:

𝛿 ∶ 𝑊 → 𝑊 ◦𝑊
𝜀 ∶ 𝑊 → Id

These transformations (corresponding to duplicate and extract, respectively) must sat-


isfy the same identities as the monad, except with the arrows reversed.
These are the counit laws (writing ⋅ for vertical composition):

(𝜀◦𝑊 ) ⋅ 𝛿 = 𝑖𝑑_𝑊 = (𝑊◦𝜀) ⋅ 𝛿

and this is the associativity law (up to the associator):

(𝛿◦𝑊 ) ⋅ 𝛿 = (𝑊◦𝛿) ⋅ 𝛿

Comonoids
We’ve seen how monadic laws follow from monoid laws. We can expect that comonad laws
should follow from a dual version of a monoid.
Indeed, a comonoid is an object 𝑤 in a monoidal category (𝒞, ⊗, 𝐼) equipped with two
morphisms called co-multiplication and a co-unit:

𝛿∶ 𝑤 → 𝑤 ⊗ 𝑤
𝜀∶ 𝑤 → 𝐼

We can replace the tensor product with endofunctor composition and the unit object with the
identity functor to get the definition of a comonad as a comonoid in the category of endofunctors.
In Haskell we can define a Comonoid typeclass for the cartesian product:
class Comonoid w where
  split :: w -> (w, w)
  destroy :: w -> ()
Comonoids are less talked about than their siblings, monoids, mainly because they are taken
for granted. In a cartesian category, every object can be made into a comonoid just by using the
diagonal map Δ𝑎 ∶ 𝑎 → 𝑎 × 𝑎 for co-multiplication, and the unique arrow to the terminal object
for counit.
In programming this is something we do without thinking. Co-multiplication means being
able to duplicate a value, and counit means being able to abandon a value.
In Haskell, we can easily implement the Comonoid instance for any type:
instance Comonoid w where
  split w = (w, w)
  destroy w = ()

In fact, we don't think twice about using the argument of a function twice, or not using it at all. But,
if we wanted to be explicit, functions like:
f x = x + x
g y = 42

could be written as:

f x = let (x1, x2) = split x
      in x1 + x2

g y = let () = destroy y
      in 42

There are some situations, though, when duplicating or discarding a variable is undesirable.
This is the case when the argument is an external resource, like a file handle, network port, or
a chunk of memory allocated on the heap. Such resources are supposed to have well-defined
lifetimes between being allocated and deallocated. Tracking lifetimes of objects that can be
easily duplicated or discarded is very difficult and a notorious source of programming errors.
A programming model based on a cartesian category will always have this problem. The
solution is to instead use a monoidal (closed) category that doesn’t support duplication or de-
struction of objects. Such a category is a natural setting for linear types. Linear
types are used in Rust and, at the time of this writing, are being tried in Haskell. In C++ there
are constructs that mimic linearity, like unique_ptr and move semantics.

16.3 Comonads from Adjunctions


We've seen that an adjunction 𝐿 ⊣ 𝑅 between two functors 𝐿 ∶ 𝒞 → 𝒟 and 𝑅 ∶ 𝒟 → 𝒞 gives
rise to a monad 𝑅◦𝐿 ∶ 𝒞 → 𝒞. The other composition, 𝐿◦𝑅, which is an endofunctor in 𝒟,
turns out to be a comonad.
The counit of the adjunction serves as the counit of the comonad. This can be illustrated by
a string diagram: [the counit 𝜀 caps the 𝑅 and 𝐿 strands].
The comultiplication is given by the whiskering of 𝜂:

𝛿 = 𝐿◦𝜂◦𝑅
as illustrated by a string diagram: [the unit 𝜂 splits the region between the 𝑅 and 𝐿 strands,
turning 𝑅 𝐿 at the bottom into 𝑅 𝐿 𝑅 𝐿 at the top].

As before, comonad laws can be derived from triangle identities.

Costate comonad
We've seen that the state monad can be generated from the currying adjunction between the
product and the exponential. The left functor was defined as a product with some fixed object
𝑠:

𝐿_𝑠 𝑎 = 𝑎 × 𝑠

and the right functor was the exponentiation, parameterized by the same object 𝑠:

𝑅_𝑠 𝑐 = 𝑐^𝑠

The composition 𝐿_𝑠 ◦𝑅_𝑠 generates a comonad called the costate comonad or the store comonad.
Translated to Haskell, the right functor assigns a function type s->c to c, and the left functor
pairs c with s. The result of the composition is the endofunctor:
data Store s c = St (s -> c) s
or, using GADT notation:
data Store s c where
  St :: (s -> c) -> s -> Store s c
The functor instance post-composes the function to the first component of Store:
instance Functor (Store s) where
  fmap g (St f s) = St (g . f) s
The counit of this adjunction, which becomes the comonadic extract, is function applica-
tion:
extract :: Store s c -> c
extract (St f s) = f s
The unit of this adjunction is a natural transformation 𝜂 ∶ Id → 𝑅_𝑠 ◦𝐿_𝑠. We've used it as the
return of the state monad. This is its component at c:
unit :: c -> (s -> (c, s))
unit c = \s -> (c, s)
To get duplicate we need to whisker 𝜂 between the two functors:

𝛿 = 𝐿_𝑠 ◦𝜂◦𝑅_𝑠

Whiskering on the right means taking the component of 𝜂 at the object 𝑅_𝑠 𝑐, and whiskering
on the left means lifting this component using 𝐿_𝑠. Since Haskell translation of whiskering is a
tricky process, let’s analyze it step-by-step.
For simplicity, let’s fix the type s to, say, Int. We encapsulate the left functor into a
newtype:
newtype Pair c = P (c, Int)
  deriving Functor
and keep the right functor a type synonym:
type Fun c = Int -> c
The unit of the adjunction can be written as a natural transformation using explicit forall:
eta :: forall c. c -> Fun (Pair c)
eta c = \s -> P (c, s)
We can now implement comultiplication as the whiskering of eta. The whiskering on the
right is encoded in the type signature, by using the component of eta at Fun c. The whiskering
on the left is done by lifting eta using the fmap defined for the Pair functor. We use the
language pragma TypeApplications to make it explicit which fmap is to be used:
delta :: forall c. Pair (Fun c) -> Pair (Fun (Pair (Fun c)))
delta = fmap @Pair eta
This can be rewritten more explicitly as:
delta (P (f, s)) = P (\s' -> P (f, s'), s)
The Comonad instance can thus be written as:
instance Comonad (Store s) where
extract (St f s) = f s
duplicate (St f s) = St (St f) s
The store comonad is a useful programming concept. To understand it, let’s consider again
the case where s is Int.
We interpret the first component of Store Int c, the function f :: Int -> c, to be an
accessor to an imaginary infinite stream of values, one for each integer.
The second component can be interpreted as the current index. Indeed, extract uses this
index to retrieve the current value.
With this interpretation, duplicate produces an infinite stream of streams, each shifted by
a different offset, and extend performs a convolution on this stream. Of course, laziness saves
the day: only the values we explicitly demand will be evaluated.
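For instance (a hypothetical smoothing filter, analogous to the earlier Stream example):

-- Average a window of three values around the current index.
windowAvg :: Store Int Double -> Double
windowAvg (St f n) = (f (n - 1) + f n + f (n + 1)) / 3

-- Extend the filter over the whole store: lift it after duplicating.
smoothStore :: Store Int Double -> Store Int Double
smoothStore = fmap windowAvg . duplicate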
Notice also that our earlier example of the Signal comonad is reproduced by Store Double.

Exercise 16.3.1. A cellular automaton can be implemented using the store comonad. This is
the co-Kleisli arrow describing rule 110:
step :: Store Int Cell -> Cell
step (St f n) =
  case (f (n-1), f n, f (n+1)) of
    (L, L, L) -> D
    (L, D, D) -> D
    (D, D, D) -> D
    _         -> L
A cell can be either live or dead:
data Cell = L | D
  deriving Show
Run a few generations of this automaton. Hint: Use the function iterate from the Prelude.

Comonad coalgebras
Dually to monad algebras we have comonad coalgebras. Given a comonad (𝑊 , 𝜀, 𝛿), we can
construct a coalgebra, which consists of a carrier object 𝑎 and an arrow 𝜙 ∶ 𝑎 → 𝑊 𝑎. For this
coalgebra to compose nicely with the comonad, we’ll require that we be able to extract the value
that was injected using 𝜙; and that the lifting of 𝜙, when acting on the result of 𝜙, be equivalent
to duplication:
𝜀_𝑎 ◦ 𝜙 = 𝑖𝑑_𝑎        𝑊 𝜙 ◦ 𝜙 = 𝛿_𝑎 ◦ 𝜙
Just like with monad algebras, comonad coalgebras form a category. Given a comonad
(𝑊 , 𝜀, 𝛿) in 𝒞, its comonad coalgebras form the category called the Eilenberg-Moore category
(sometimes prefixed with co-) 𝒞^𝑊.
There is a co-Kleisli subcategory of 𝒞^𝑊, denoted by 𝒞_𝑊.
Given a comonad 𝑊, we can construct an adjunction using either 𝒞^𝑊 or 𝒞_𝑊 that reproduces
the comonad 𝑊. The construction is fully analogous to the one for monads.

Lenses
The coalgebra for the Store comonad is of particular interest. We’ll do some renaming first.
Let’s call the carrier s and the state a.
data Store a s = St (a -> s) a
The coalgebra is given by a function:
phi :: s -> Store a s
which is equivalent to a pair of functions:
set :: s -> a -> s
get :: s -> a
Such a pair is called a lens: s is called the source, and a is the focus.
With this interpretation get lets us extract the focus, and set replaces the focus with a new
value to produce a new s.
Lenses were first introduced to describe the retrieval and modification of data in database
records. Then they found application in working with data structures. A lens objectifies the idea
of having read/write access to a part of a larger object. For instance, a lens can focus on one of
the components of a pair or a particular component of a record. We’ll discuss lenses and optics
in the next chapter.
Let’s apply the laws of the comonad coalgebra to a lens. For simplicity, let’s omit data
constructors from the equations. We get the following simplified definitions:
phi s = (set s, get s)
epsilon (f, a) = f a
delta (f, a) = (\x -> (f, x), a)
𝜀_𝑠 ◦ 𝜙 = 𝑖𝑑_𝑠        𝑊 𝜙 ◦ 𝜙 = 𝛿_𝑠 ◦ 𝜙
The first law tells us that applying the result of set to the result of get results in identity:
set s (get s) = s
This is called the set/get law of the lens. Nothing should change when you replace the focus
with the same focus.
The second law requires the application of fmap phi to the result of phi:
fmap phi (set s, get s) = (phi . set s, get s)
This should be equal to the application of delta:
delta (set s, get s) = (\x -> (set s, x), get s)
Comparing the two, we get:
phi . set s = \x -> (set s, x)
Let’s apply it to some a:
phi (set s a) = (set s, a)
Using the definition of phi gives us:
(set (set s a), get (set s a)) = (set s, a)
We have two equalities. The first components are functions, so we apply them to some a' and
get the set/set lens law:
set (set s a) a' = set s a'
Setting the focus to a and then overwriting it with a' is the same as setting the focus directly to
a'.
The second components give us the get/set law:
get (set s a) = a
After we set the focus to a, the result of get is a.
Lenses that satisfy these laws are called lawful lenses. They are comonad coalgebras for the
store comonad.
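Here's a minimal sketch of a lawful lens, focusing on the first component of a pair (illustrative names, not a lens-library API):

getFst :: (a, b) -> a
getFst = fst

setFst :: (a, b) -> a -> (a, b)
setFst (_, y) x = (x, y)

-- set/get: setFst s (getFst s) == s
-- get/set: getFst (setFst s x) == x
-- set/set: setFst (setFst s x) x' == setFst s x'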
Chapter 17

Ends and Coends

17.1 Profunctors
In the rarified air of category theory we encounter patterns that are so far removed from their
origins that we have problems visualizing them. It doesn’t help that the more abstract a pattern
gets the more dissimilar the concrete examples of it are.
An arrow from 𝑎 to 𝑏 is relatively easy to visualize. We have a very familiar model for it: a
function that consumes elements of 𝑎 and produces elements of 𝑏. A hom-set is a collection of
such arrows.
A functor is an arrow between categories. It consumes objects and arrows from one category
and produces objects and arrows from another. We can think of it as a recipe for building such
objects (and arrows) from materials provided by the source category. In particular, we often
think of an endofunctor as a container of building materials.
A profunctor maps a pair of objects ⟨𝑎, 𝑏⟩ to a set 𝑃 ⟨𝑎, 𝑏⟩ and a pair of arrows:
⟨𝑓 ∶ 𝑠 → 𝑎, 𝑔 ∶ 𝑏 → 𝑡⟩
to a function:
𝑃 ⟨𝑓 , 𝑔⟩ ∶ 𝑃 ⟨𝑎, 𝑏⟩ → 𝑃 ⟨𝑠, 𝑡⟩
A profunctor is an abstraction that combines elements of many other abstractions. Since it's
a functor 𝒞^𝑜𝑝 × 𝒞 → 𝐒𝐞𝐭, we can think of it as constructing a set from a pair of objects, and a
function from a pair of arrows (one of them going in the opposite direction). This doesn't help
our imagination though.
Fortunately, we have a good model for a profunctor: the hom-functor. The set of arrows
between two objects behaves like a profunctor when you vary the objects. It also makes sense
that there is a difference between varying the source and the target of the hom-set.
We can, therefore, think of an arbitrary profunctor as generalizing the hom-functor. A pro-
functor provides additional bridges between objects, on top of hom-sets that are already there.
There is, however, one big difference between an element of the hom-set 𝒞(𝑎, 𝑏) and an
element of the set 𝑃 ⟨𝑎, 𝑏⟩. Elements of hom-sets are arrows, and arrows can be composed. It's
not immediately obvious how to compose profunctors.
Granted, the lifting of arrows by a profunctor can be seen as generalizing composition—just
not between profunctors, but between hom-sets and profunctors. For instance, we can “precom-
pose” 𝑃 ⟨𝑎, 𝑏⟩ with an arrow 𝑓 ∶ 𝑠 → 𝑎 to obtain 𝑃 ⟨𝑠, 𝑏⟩:
𝑃 ⟨𝑓 , 𝑖𝑑𝑏 ⟩ ∶ 𝑃 ⟨𝑎, 𝑏⟩ → 𝑃 ⟨𝑠, 𝑏⟩

Similarly, we can “postcompose” it with 𝑔 ∶ 𝑏 → 𝑡:

𝑃 ⟨𝑖𝑑𝑎 , 𝑔⟩ ∶ 𝑃 ⟨𝑎, 𝑏⟩ → 𝑃 ⟨𝑎, 𝑡⟩

This kind of heterogenous composition takes a composable pair consisting of an arrow and an
element of a profunctor and produces an element of a profunctor.
A profunctor can be extended this way on both sides by lifting a pair of arrows: an arrow
𝑓 ∶ 𝑠 → 𝑎 extends it on the left and 𝑔 ∶ 𝑏 → 𝑡 on the right, turning the 𝑃-bridge from 𝑎 to 𝑏
into one from 𝑠 to 𝑡.

Collages
There is no reason to restrict a profunctor to a single category. We can easily define a profunctor
between two categories as a functor 𝑃 ∶ 𝒞^𝑜𝑝 × 𝒟 → 𝐒𝐞𝐭. Such a profunctor can be used to glue
two categories together by generating the missing hom-sets from the objects in 𝒞 to the objects
in 𝒟.
A collage (or a cograph) of two categories 𝒞 and 𝒟 is a category whose objects are objects
from both categories (a disjoint union). A hom-set between two objects 𝑥 and 𝑦 is either a hom-
set in 𝒞, if both objects are in 𝒞; a hom-set in 𝒟, if both are in 𝒟; or the set 𝑃 ⟨𝑥, 𝑦⟩ if 𝑥 is in 𝒞
and 𝑦 is in 𝒟. Otherwise the hom-set is empty.
Composition of morphisms is the usual composition, except if one of the morphisms is an
element of 𝑃 ⟨𝑥, 𝑦⟩. In that case we lift the morphism we’re trying to pre- or post-compose.
It’s easy to see that a collage is indeed a category. The new morphisms that go between the
two sides of the collage are sometimes called heteromorphisms. They can only go from 𝒞 to 𝒟,
never the other way around.
Seen this way, a profunctor 𝒞^𝑜𝑝 × 𝒞 → 𝐒𝐞𝐭 should really be called an endo-profunctor. It
defines a collage of 𝒞 with itself.

Exercise 17.1.1. Show that there is a functor from a collage of two categories to a stick-figure
“walking arrow” category that has two objects and one arrow between them (and two identity
arrows).

Exercise 17.1.2. Show that, if there is a functor from 𝒞 to the walking arrow category, then 𝒞
can be split into a collage of two categories.

Profunctors as relations
Under a microscope, a profunctor looks like a hom-functor, and the elements of the set 𝑃 ⟨𝑎, 𝑏⟩
look like individual arrows. But when we zoom out, we can view a profunctor as a relation
between objects. These are not the usual relations; they are proof-relevant relations.
To understand this concept better, let's consider a regular functor 𝐹 ∶ 𝒞 → 𝐒𝐞𝐭 (in other
words, a co-presheaf). One way to interpret it is to say that it defines a subset of objects of 𝒞,
namely those objects that are mapped to non-empty sets. Every element of 𝐹 𝑎 is then treated
as a proof that 𝑎 is a member of this subset. If, on the other hand, 𝐹 𝑎 is an empty set, then 𝑎 is
not a member of the subset.
We can apply the same interpretation to profunctors. If the set 𝑃 ⟨𝑎, 𝑏⟩ is empty, we say that
𝑏 is not related to 𝑎. If it’s not empty, we say that each element of the set 𝑃 ⟨𝑎, 𝑏⟩ represents a
proof that 𝑏 is related to 𝑎. We can then treat a profunctor as a proof-relevant relation.
Notice that we don’t assume anything about this relation. It doesn’t have to be reflexive, as
it’s possible for 𝑃 ⟨𝑎, 𝑎⟩ to be empty (in fact, 𝑃 ⟨𝑎, 𝑎⟩ makes sense only for endo-profunctors). It
doesn’t have to be symmetric either.
Since the hom-functor is an example of an (endo-) profunctor, this interpretation lets us
view the hom-functor in a new light: as a built-in proof-relevant relation between objects in a
category. If there’s an arrow between two objects, they are related. Notice that this relation is
reflexive, since 𝒞(𝑎, 𝑎) is never empty: at the very least, it contains the identity morphism.
Moreover, as we’ve seen before, hom-functors interact with profunctors. If 𝑎 is related to 𝑏
through 𝑃 , and the hom-sets 𝒞(𝑠, 𝑎) and 𝒞(𝑏, 𝑡) are non-empty, then automatically 𝑠 is related
to 𝑡 through 𝑃 . Profunctors are therefore proof-relevant relations that are compatible with the
structure of the categories in which they operate.
We know how to compose a profunctor with hom-functors, but how would we compose two
profunctors? We can get a clue from the composition of relations.
Suppose that you want to charge your cellphone, but you don’t have a charger. In order to
connect you to a charger it’s enough that you have a friend who owns a charger. Any friend will
do. You compose the relation of having a friend with the relation of a person having a charger
to get a relation of being able to charge your phone. The proof that you can charge your phone
is a pair of proofs, one of friendship and one of the possession of a charger.
In general, we say that two objects are related by the composite relation if there exists an
object in the middle that is related to both of them.

Profunctor composition in Haskell


Composition of relations can be translated to profunctor composition in Haskell. Let’s first
recall the definition of a profunctor:
class Profunctor p where
  dimap :: (s -> a) -> (b -> t) -> (p a b -> p s t)
The key to understanding profunctor composition is that it requires the existence of the object
in the middle. For object 𝑏 to be related to object 𝑎 through the composite 𝑃 ⋄ 𝑄 there has to
exist an object 𝑥 that bridges the gap: an element of 𝑄⟨𝑎, 𝑥⟩ paired with an element of 𝑃 ⟨𝑥, 𝑏⟩.
This can be encoded in Haskell using an existential type. Given two profunctors p and q,
their composition is a new profunctor Procompose p q:
data Procompose p q a b where
  Procompose :: q a x -> p x b -> Procompose p q a b
We are using a GADT to express the existential nature of the object x. The two arguments to the
data constructor can be seen as a pair of proofs: one proves that x is related to a, and the other
that b is related to x. This pair then constitutes the proof that b is related to a.
An existential type can be seen as a generalization of a sum type. We are summing over all
possible types x. Just like a finite sum can be constructed by injecting one of the alternatives
(think of the two constructors of Either), the existential type can be constructed by picking one
particular type for x and injecting it into the definition of Procompose.
Just as mapping out from a sum type requires a pair of functions, one per alternative,
a mapping out from an existential type requires a family of functions, one per every type. The
mapping out from Procompose, for instance, is given by a polymorphic function:
mapOut :: Procompose p q a b -> (forall x. q a x -> p x b -> c) -> c
mapOut (Procompose qax pxb) f = f qax pxb
The composition of profunctors is again a profunctor, as can be seen from this instance:
instance (Profunctor p, Profunctor q) => Profunctor (Procompose p q) where
  dimap l r (Procompose qax pxb) =
    Procompose (dimap l id qax) (dimap id r pxb)
This is just saying that you can extend the composite profunctor by extending the first one to the
left and the second one to the right.
The fact that this definition of profunctor composition happens to work in Haskell is due to
parametricity. The language constrains the types of profunctors in a way that makes this work.
In general, though, taking a simple sum over intermediate objects would result in over-counting,
so in category theory we have to compensate for that.

17.2 Coends
The over-counting in the naive definition of profunctor composition happens when two candi-
dates for the object in the middle are connected by a morphism: an element of 𝑄⟨𝑎, 𝑥⟩, a
morphism 𝑓 ∶ 𝑥 → 𝑦, and an element of 𝑃 ⟨𝑦, 𝑏⟩.

We can either extend 𝑄 on the right, by lifting 𝑄⟨𝑖𝑑, 𝑓 ⟩, and use 𝑦 as the middle object; or we
can extend 𝑃 on the left, by lifting 𝑃 ⟨𝑓 , 𝑖𝑑⟩, and use 𝑥 as the intermediary.
In order to avoid the double-counting, we have to tweak our definition of a sum type when
applied to profunctors. The resulting construction is called a coend.
First, let’s re-formulate the problem. We are trying to sum over all objects 𝑥 in the product:

𝑃 ⟨𝑎, 𝑥⟩ × 𝑄⟨𝑥, 𝑏⟩

The double-counting happens because we can open up the gap between the two profunctors, as
long as there is a morphism that we can fit between them. So we are really looking at a more
general product:
𝑃 ⟨𝑎, 𝑥⟩ × 𝑄⟨𝑦, 𝑏⟩

The important observation is that, if we fix the endpoints 𝑎 and 𝑏, this product is a profunctor
in ⟨𝑦, 𝑥⟩. This is easily seen after a little rearrangement (up to isomorphism):

𝑄⟨𝑦, 𝑏⟩ × 𝑃 ⟨𝑎, 𝑥⟩

We are interested in the sum of the diagonal parts of this profunctor, that is when 𝑥 is equal to
𝑦.
So let's see how we would go about defining the sum of all diagonal entries of a general
profunctor 𝑃. In fact, this construction works for any functor 𝑃 ∶ 𝒞^𝑜𝑝 × 𝒞 → 𝒟, not just for
𝐒𝐞𝐭-valued profunctors.
The sum of the diagonal objects is defined by injections; in this case, one per every object
in 𝒞: a family of arrows 𝑖_𝑥 ∶ 𝑃 ⟨𝑥, 𝑥⟩ → 𝑑, one for each object 𝑥.
If we were defining a sum, we’d make it a universal object equipped with such injections.
But because we are dealing with functors of two variables, we want to identify the injections
that are related by “extending” some common ancestor (here, 𝑃 ⟨𝑦, 𝑥⟩). We want the following
diagram to commute, whenever there is a connecting morphism 𝑓 ∶ 𝑥 → 𝑦:

[diagram: 𝑃 ⟨𝑦, 𝑥⟩ at the apex, with legs 𝑃 ⟨𝑖𝑑, 𝑓 ⟩ into 𝑃 ⟨𝑦, 𝑦⟩ and 𝑃 ⟨𝑓 , 𝑖𝑑⟩ into 𝑃 ⟨𝑥, 𝑥⟩,
followed by the injections 𝑖_𝑦 and 𝑖_𝑥 into 𝑑]

This diagram is called a co-wedge, and its commuting condition is called the co-wedge condi-
tion. For every 𝑓 ∶ 𝑥 → 𝑦, we demand that:

𝑖_𝑥 ◦ 𝑃 ⟨𝑓 , 𝑖𝑑_𝑦 ⟩ = 𝑖_𝑦 ◦ 𝑃 ⟨𝑖𝑑_𝑥 , 𝑓 ⟩
The universal co-wedge is called a coend.
Since a coend generalizes the sum to a potentially infinite domain, we write it using the
integral sign, with the “integration variable” at the top:
∫^{𝑥∶𝒞} 𝑃 ⟨𝑥, 𝑥⟩

Universality means that, whenever there is an object 𝑑 in 𝒟 equipped with a family of arrows
𝑔_𝑥 ∶ 𝑃 ⟨𝑥, 𝑥⟩ → 𝑑 satisfying the co-wedge condition, there is a unique mapping out from the
coend:

ℎ ∶ ∫^{𝑥∶𝒞} 𝑃 ⟨𝑥, 𝑥⟩ → 𝑑

that factorizes every 𝑔_𝑥 through the injection 𝑖_𝑥:

𝑔_𝑥 = ℎ ◦ 𝑖_𝑥
Pictorially: the co-wedge legs 𝑔_𝑦 and 𝑔_𝑥 factorize through the injections 𝑖_𝑦 and 𝑖_𝑥 into
∫^𝑥 𝑃 ⟨𝑥, 𝑥⟩, followed by the unique arrow ℎ into 𝑑.
246 CHAPTER 17. ENDS AND COENDS

Compare this with the definition of a sum of two objects: the injections Left ∶ 𝑎 → 𝑎 + 𝑏 and
Right ∶ 𝑏 → 𝑎 + 𝑏, with a unique arrow to 𝑑 factorizing any pair 𝑓 ∶ 𝑎 → 𝑑, 𝑔 ∶ 𝑏 → 𝑑.
Just like the sum was defined as a universal cospan, a coend is defined as a universal co-wedge.
In particular, if you were to construct a coend of a 𝐒𝐞𝐭-valued profunctor, you would start
with a sum (a discriminated union) of all the sets 𝑃 ⟨𝑥, 𝑥⟩. Then you would identify all the
elements of this sum that satisfy the co-wedge condition. You’d identify the element 𝑎 ∈ 𝑃 ⟨𝑥, 𝑥⟩
with the element 𝑏 ∈ 𝑃 ⟨𝑦, 𝑦⟩ whenever there is an element 𝑐 ∈ 𝑃 ⟨𝑦, 𝑥⟩ and a morphism 𝑓 ∶ 𝑥 →
𝑦, such that:
𝑃 ⟨𝑖𝑑, 𝑓 ⟩(𝑐) = 𝑏
and
𝑃 ⟨𝑓 , 𝑖𝑑⟩(𝑐) = 𝑎
Notice that, in a discrete category (which is just a set of objects with no arrows between
them) the co-wedge condition is trivial (there are no 𝑓 ’s other than identities), so a coend is just
a straightforward sum (coproduct) of the diagonal objects 𝑃 ⟨𝑥, 𝑥⟩.

Extranatural transformations
A family of arrows in the target category parameterized by the objects of the source category
can often be combined into a single natural transformation between two functors.
The injections in our definition of a cowedge form a family of functions that is parameterized
by objects, but they don’t neatly fit into a definition of a natural transformation.

The problem is that the functor 𝑃 ∶ 𝒞^𝑜𝑝 × 𝒞 → 𝒟 is contravariant in the first argument and
covariant in the second; so its diagonal part, which on objects is defined as 𝑥 ↦ 𝑃 ⟨𝑥, 𝑥⟩, is
neither.
The closest analog of naturality at our disposal is the cowedge condition:

𝑖_𝑥 ◦ 𝑃 ⟨𝑓 , 𝑖𝑑⟩ = 𝑖_𝑦 ◦ 𝑃 ⟨𝑖𝑑, 𝑓 ⟩
Indeed, as is the case with the naturality square, it involves the interaction between the lifting of
a morphism 𝑓 ∶ 𝑥 → 𝑦 (here, in two different ways) and the components of the transformation
𝑖.
17.2. COENDS 247

Granted, the standard naturality condition deals with pairs of functors. Here, the target of
the transformation is a fixed object 𝑑. But we can always reinterpret it as the output of a constant
functor Δ𝑑 ∶  𝑜𝑝 ×  → .
The cowedge condition can be interpreted as a special case of the more general extranatural
transformation. An extranatural transformation is a family of arrows:

𝛼_{𝑐𝑑} ∶ 𝑃 ⟨𝑐, 𝑐⟩ → 𝑄⟨𝑑, 𝑑⟩

between two functors of the form:

𝑃 ∶ 𝒞^𝑜𝑝 × 𝒞 → ℰ

𝑄 ∶ 𝒟^𝑜𝑝 × 𝒟 → ℰ

Extranaturality in 𝑐 means that the following condition holds for any morphism 𝑓 ∶ 𝑐 → 𝑐′:

𝛼_{𝑐𝑑} ◦ 𝑃 ⟨𝑓 , 𝑖𝑑⟩ = 𝛼_{𝑐′𝑑} ◦ 𝑃 ⟨𝑖𝑑, 𝑓 ⟩   (both sides starting from 𝑃 ⟨𝑐′, 𝑐⟩)

Extranaturality in 𝑑 means that the following condition holds for any morphism 𝑔 ∶ 𝑑 → 𝑑′:

𝑄⟨𝑖𝑑, 𝑔⟩ ◦ 𝛼_{𝑐𝑑} = 𝑄⟨𝑔, 𝑖𝑑⟩ ◦ 𝛼_{𝑐𝑑′}   (both sides landing in 𝑄⟨𝑑, 𝑑′⟩)

Given this definition, we get our cowedge condition as the extranaturality of the mapping
between the profunctor 𝑃 and the constant profunctor Δ_𝑑.
We can now reformulate the definition of the coend as a pair (𝑒, 𝑖), where 𝑒 is the object
equipped with the extranatural transformation 𝑖 ∶ 𝑃 → Δ_𝑒 that is universal among such pairs.
Universality means that for any object 𝑑 equipped with the extranatural transformation
𝛼 ∶ 𝑃 → Δ_𝑑 there is a unique morphism ℎ ∶ 𝑒 → 𝑑 that factorizes all the components of 𝛼
through the components of 𝑖:

𝛼_𝑥 = ℎ ◦ 𝑖_𝑥

We call this object 𝑒 the coend, and write it as:

𝑒 = ∫^𝑥 𝑃 ⟨𝑥, 𝑥⟩
Profunctor composition using coends


Equipped with the definition of a coend we can now formally define the composition of two
profunctors:
(𝑃 ⋄ 𝑄)⟨𝑎, 𝑏⟩ = ∫^{𝑥∶𝒞} 𝑄⟨𝑎, 𝑥⟩ × 𝑃 ⟨𝑥, 𝑏⟩

Compare this with:
data Procompose p q a b where
Procompose :: q a x -> p x b -> Procompose p q a b
The reason why in Haskell we don't have to worry about the co-wedge condition is analo-
gous to the reason why all parametrically polymorphic functions automatically satisfy the natu-
rality condition. A coend is defined using a family of injections; in Haskell all these injections are
defined by a single polymorphic function:
data Coend p where
  Coend :: p x x -> Coend p
Coends introduce a new level of abstraction in dealing with profunctors. Calculations using
coends usually take advantage of their mapping-out property. To define a mapping out of a
coend to some object 𝑑:
∫^𝑥 𝑃 ⟨𝑥, 𝑥⟩ → 𝑑

it's enough to define a family of functions from the diagonal entries of the functor to 𝑑:

𝑔_𝑥 ∶ 𝑃 ⟨𝑥, 𝑥⟩ → 𝑑
satisfying the cowedge condition. You can get a lot of mileage from this trick, especially when
combined with the Yoneda lemma. We’ll see examples of this in what follows.
Exercise 17.2.1. Define a Profunctor instance for the pair of profunctors:
newtype ProPair q p a b x y = ProPair (q a y, p x b)
Hint: Keep the first four parameters fixed:
instance (Profunctor p, Profunctor q) => Profunctor (ProPair q p a b)

Exercise 17.2.2. Profunctor composition can be expressed using a coend:


newtype CoEndCompose p q a b = CoEndCompose (Coend (ProPair q p a b))
Define a Profunctor instance for CoEndCompose.

Colimits as coends
A function of two variables that ignores one of its arguments is equivalent to a function of one
variable. Similarly, a profunctor that ignores one of its arguments is equivalent to a functor.
Conversely, given a functor 𝐹 , we can construct a profunctor:
𝑃 ⟨𝑥, 𝑦⟩ = 𝐹 𝑦
Similarly, its action on a pair of arrows ignores one of the arrows:
𝑃 ⟨𝑓 , 𝑔⟩ = 𝐹 𝑔
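In Haskell, such a profunctor can be built from any Functor (a sketch; the name IgnoreFst is made up):

-- A profunctor that ignores its contravariant argument.
newtype IgnoreFst f a b = IgnoreFst (f b)

instance Functor f => Profunctor (IgnoreFst f) where
  dimap _ g (IgnoreFst fb) = IgnoreFst (fmap g fb)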
For any 𝑓 ∶ 𝑥 → 𝑦, our definition of a coend for such a profunctor reduces to a co-wedge
with apex 𝐹 𝑥 and legs 𝐹 𝑓 ∶ 𝐹 𝑥 → 𝐹 𝑦 and 𝑖𝑑_{𝐹 𝑥}, followed by the injections 𝑖_𝑦 and 𝑖_𝑥
into ∫^𝑥 𝐹 𝑥, and the arrows 𝑔_𝑦, 𝑔_𝑥 into 𝑑.
After shrinking the identity arrows, the original co-wedge becomes a co-cone, and the universal
condition turns into the definition of a colimit. This justifies the use of the coend notation for
colimits:

∫^𝑥 𝐹 𝑥 = colim 𝐹

The functor 𝐹 defines a diagram in the target category. The source category, in this case, is the
pattern.
We can gain a useful intuition if we consider a discrete category, in which a profunctor
is a (possibly infinite) matrix and a coend is the sum (coproduct) of its diagonal elements. A
profunctor that is constant along one axis corresponds to a matrix whose rows are identical
(each given by a “vector” 𝐹 𝑥). The sum of the diagonal elements of such a matrix is equal to
the sum of all components of the vector 𝐹 𝑥.

⎛ 𝐹 𝑎  𝐹 𝑏  𝐹 𝑐  … ⎞
⎜ 𝐹 𝑎  𝐹 𝑏  𝐹 𝑐  … ⎟
⎜ 𝐹 𝑎  𝐹 𝑏  𝐹 𝑐  … ⎟
⎝  …    …    …   … ⎠
In a non-discrete category, this sum generalizes to a colimit.

17.3 Ends
Just like a coend generalizes a sum of the diagonal elements of a profunctor—its dual, an end,
generalizes the product. A product is defined by its projections, and so is an end.
The generalization of a span that we used in the definition of a product would be an object
𝑑 with a family of projections, one per every object 𝑥:

𝜋_𝑥 ∶ 𝑑 → 𝑃 ⟨𝑥, 𝑥⟩

The dual to a co-wedge is called a wedge: projections 𝜋_𝑥 ∶ 𝑑 → 𝑃 ⟨𝑥, 𝑥⟩ and 𝜋_𝑦 ∶ 𝑑 → 𝑃 ⟨𝑦, 𝑦⟩,
whose continuations 𝑃 ⟨𝑖𝑑, 𝑓 ⟩ and 𝑃 ⟨𝑓 , 𝑖𝑑⟩ meet in 𝑃 ⟨𝑥, 𝑦⟩.
For every arrow 𝑓 ∶ 𝑥 → 𝑦 we demand that:

𝑃 ⟨𝑓 , 𝑖𝑑_𝑦 ⟩ ◦ 𝜋_𝑦 = 𝑃 ⟨𝑖𝑑_𝑥 , 𝑓 ⟩ ◦ 𝜋_𝑥

The end is a universal wedge. We use the integral sign for it too, this time with the “inte-
gration variable” at the bottom.
∫_{𝑥∶𝒞} 𝑃 ⟨𝑥, 𝑥⟩
You might be wondering why integrals based on multiplication rather than summation are
rarely used in calculus. That’s because we can use a logarithm to replace multiplication with
addition. We don’t have this luxury in category theory, so ends and coends are equally important.
To summarize, an end is an object equipped with a family of morphisms (projections):
𝜋_𝑎 ∶ (∫_𝑥 𝑃 ⟨𝑥, 𝑥⟩) → 𝑃 ⟨𝑎, 𝑎⟩

satisfying the wedge condition.
It is universal among such objects; that is, for any other object 𝑑 equipped with a family
of arrows 𝑔_𝑥 satisfying the wedge condition, there is a unique morphism ℎ that factorizes the
family 𝑔_𝑥 through the family 𝜋_𝑥:

𝑔_𝑥 = 𝜋_𝑥 ◦ ℎ
Pictorially: the wedge legs 𝑔_𝑥 and 𝑔_𝑦 factorize through the unique arrow ℎ ∶ 𝑑 → ∫_𝑥 𝑃 ⟨𝑥, 𝑥⟩,
followed by the projections 𝜋_𝑥 and 𝜋_𝑦.

Equivalently, we can say that the end is a pair (𝑒, 𝜋) consisting of an object 𝑒 and an ex-
tranatural transformation 𝜋 ∶ Δ_𝑒 → 𝑃 that is universal among such pairs. The wedge condition
turns out to be a special case of the extranaturality condition.
If you were to construct an end of a 𝐒𝐞𝐭-valued profunctor, you’d start with a product of
all 𝑃 ⟨𝑥, 𝑥⟩ for all objects in the category and then prune the tuples that don’t satisfy the wedge
condition.
In particular, imagine using the singleton set 1 in place of 𝑑. The family 𝑔𝑥 would select
one element from each set 𝑃 ⟨𝑥, 𝑥⟩. This would give you a giant tuple. You’d weed out most of
these tuples, leaving only the ones that satisfy the wedge condition.
Again, in Haskell, due to parametricity, the wedge condition is automatically satisfied, and
the definition of an end for a profunctor p simplifies to:
type End p = forall x. p x x
The Haskell implementation of an End doesn’t showcase the fact that it is dual to the Coend.
This is because, at the time of this writing, Haskell doesn’t have a built-in syntax for existential
types. If it did, the Coend would be implemented as:
type Coend p = exists x. p x x


The existential/universal duality between a Coend and an End means that it’s easy to con-
struct a Coend—all you need is to pick one type x for which you have a value of the type p x x.
On the other end, to construct an End you have to provide a whole family of values p x x, one
for every type x. In other words, you need a polymorphic formula that is parameterized by x. A
definition of a polymorphic function is a canonical example of such a formula.
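To make this concrete, here's a minimal sketch (the example names are mine; it assumes the GADT encoding of a coend, matching the constructor used later in this chapter):

data Coend p where
  Coend :: p x x -> Coend p

-- To construct a coend, a single instantiation suffices: pick x = Int.
coendEx :: Coend (,)
coendEx = Coend (1 :: Int, 2 :: Int)

-- To construct an end, the formula must work for every x at once.
endEx :: End (->)
endEx = id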

Natural transformations as an end


The most interesting application of an end is in concisely defining natural transformations. Consider
two functors, F and G, going between two categories 𝒞 and 𝒟. A natural transformation
between them is a family of arrows α_x in 𝒟. You may think of it as picking one element α_x from
each hom-set 𝒟(F x, G x).
       F x
   F ↗  │
  x     │ α_x
   G ↘  ▾
       G x
We know that the mapping ⟨a, b⟩ ↦ 𝒞(a, b) defines a profunctor. It turns out that, for any
pair of functors, the mapping ⟨a, b⟩ ↦ 𝒟(F a, G b) also behaves like a profunctor. Its action on
a pair of arrows ⟨f, g⟩ is a combination of pre- and post-composition of lifted arrows:

(G g) ∘ − ∘ (F f)

Indeed, an element of the set 𝒟(F a, G b) is an arrow h ∶ F a → G b. We are trying to lift a
pair of arrows f ∶ s → a and g ∶ b → t. We can do it with a pair of arrows in 𝒟: the first one
is F f ∶ F s → F a, and the second one is G g ∶ G b → G t. The composition G g ∘ h ∘ F f gives us
the desired result F s → G t, which is an element of 𝒟(F s, G t).
F s ──F f──▸ F a ──h──▸ G b ──G g──▸ G t
The diagonal parts of this profunctor are good candidates for the components of a natural
transformation. In fact, the end:

∫_{x∶𝒞} 𝒟(F x, G x)

defines the set of natural transformations from F to G.
In Haskell, this is consistent with our earlier definition:
type Natural f g = forall x. f x -> g x
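For instance (a standard example, not from the text), safeHead is such a natural transformation from the list functor to Maybe:

safeHead :: Natural [] Maybe
safeHead []      = Nothing
safeHead (x : _) = Just x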
In category theory, though, we have to check the wedge condition. Plugging in our profunc-
tor, we get:

∫𝑥 (𝐹 𝑥, 𝐺𝑥)
𝜋𝑎 𝜋𝑏

(𝐹 𝑎, 𝐺𝑎) ⟨𝐹 𝑏, 𝐺𝑏⟩

(𝐹 𝑓 ◦ −) (− ◦ 𝐺𝑓 )
(𝐹 𝑎, 𝐺𝑏)

We can focus on a single element of the set ∫_x 𝒟(F x, G x) by instantiating the universal
condition for the singleton set:

                 1
        α_a ↙   α│   ↘ α_b
                 ▾
         ∫_x 𝒟(F x, G x)
        π_a ↙          ↘ π_b
  𝒟(F a, G a)          𝒟(F b, G b)
   (G f ∘ −) ↘        ↙ (− ∘ F f)
          𝒟(F a, G b)
It picks the component α_a from the hom-set 𝒟(F a, G a) and the component α_b from 𝒟(F b, G b).
The wedge condition then boils down to:

G f ∘ α_a = α_b ∘ F f

for any f ∶ a → b. This is exactly the naturality condition. So an element α of this end is indeed
a natural transformation.
The set of natural transformations, or the hom-set in the functor category, is thus given by
the end:

[𝒞, 𝒟](F, G) ≅ ∫_{x∶𝒞} 𝒟(F x, G x)
As we discussed earlier, to construct an End we have to give it a whole family of values
parameterized by types. Here, these values are the components of a polymorphic function.

Limits as ends
Just like we were able to express colimits as coends, we can express limits as ends. As before,
we define a profunctor that ignores its first argument:

𝑃 ⟨𝑥, 𝑦⟩ = 𝐹 𝑦
𝑃 ⟨𝑓 , 𝑔⟩ = 𝐹 𝑔

The universal condition that defines an end becomes the definition of a universal cone:

                 d
         g_x ↙  h│  ↘ g_y
                 ▾
            ∫_x F x
         π_x ↙     ↘ π_y
          F x        F y
        F f  ↘     ↙ id_{F y}
               F y

We can thus use the end notation for limits:

∫_x F x = lim F

17.4 Continuity of the Hom-Functor


In category theory, a functor is called continuous if it preserves limits (and co-continuous if it
preserves colimits). It means that, if you have a diagram in the source category, it doesn't matter
whether you first map the diagram using the functor and then take the limit, or first take the limit
in the source category and then map it using the functor.
The hom-functor is an example of a functor that is continuous in its second argument. Since
a product is the simplest example of a limit, this means, in particular, that:

𝒞(x, a × b) ≅ 𝒞(x, a) × 𝒞(x, b)

The left hand side applies the hom-functor to the product (a limit of a span). The right hand side
maps the diagram, here just a pair of objects, and takes the product (limit) in the target category.
The target category for the hom-functor is 𝐒𝐞𝐭, so this is just a cartesian product. The two sides
are isomorphic by the universal property of the product: the mapping into the product is defined
by a pair of mappings into the two objects.
Continuity of the hom-functor in the first argument is reversed: it maps colimits to limits.
Again, the simplest example of a colimit is the sum, so we have:

𝒞(a + b, x) ≅ 𝒞(a, x) × 𝒞(b, x)

This follows from the universality of the sum: a mapping out of the sum is defined by a pair of
mappings out of the two objects.
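Both isomorphisms are easy to exhibit in Haskell (a quick sketch; the witness names are my own):

fromPair :: (x -> (a, b)) -> (x -> a, x -> b)
fromPair h = (fst . h, snd . h)

toPair :: (x -> a, x -> b) -> (x -> (a, b))
toPair (f, g) = \x -> (f x, g x)

fromSum :: (Either a b -> x) -> (a -> x, b -> x)
fromSum h = (h . Left, h . Right)

toSum :: (a -> x, b -> x) -> (Either a b -> x)
toSum (f, g) = either f g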
It can be shown that an end can be expressed as a limit, and a coend as a colimit. Therefore,
by continuity of the hom-functor, we can always pull out the integral sign from inside a hom-set.
By analogy with the product, we have the mapping-in formula for an end:
𝒞(d, ∫_a P⟨a, a⟩) ≅ ∫_a 𝒞(d, P⟨a, a⟩)

By analogy with the sum, we have a mapping-out formula for the coend:

𝒞(∫^a P⟨a, a⟩, d) ≅ ∫_a 𝒞(P⟨a, a⟩, d)

Notice that, in both cases, the right-hand side is an end.

17.5 Fubini Rule


The Fubini rule in calculus states the conditions under which we can switch the order of integration
in double integrals. It turns out that we can similarly switch the order of double ends
and coends. The Fubini rule for ends works for functors of the form P ∶ 𝒞^op × 𝒞 × 𝒟^op × 𝒟 → 𝐒𝐞𝐭.
The following expressions, as long as they exist, are isomorphic:

∫_{c∶𝒞} ∫_{d∶𝒟} P⟨c, c⟩⟨d, d⟩ ≅ ∫_{d∶𝒟} ∫_{c∶𝒞} P⟨c, c⟩⟨d, d⟩ ≅ ∫_{⟨c,d⟩∶𝒞×𝒟} P⟨c, c⟩⟨d, d⟩

In the last end, the functor P is reinterpreted as P ∶ (𝒞 × 𝒟)^op × (𝒞 × 𝒟) → 𝐒𝐞𝐭.

The analogous rule works for coends as well.
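In Haskell, the Fubini rule for ends corresponds to the fact that the order of universal quantifiers is irrelevant (a minimal sketch, requiring RankNTypes):

fubini :: (forall a b. p a b) -> (forall b a. p a b)
fubini x = x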

17.6 Ninja Yoneda Lemma


Having expressed the set of natural transformations as an end, we can now rewrite the Yoneda
lemma. This is the original formulation:

[𝒞, 𝐒𝐞𝐭](𝒞(a, −), F) ≅ F a

Here, F is a (covariant) functor from 𝒞 to 𝐒𝐞𝐭 (a co-presheaf), and so is the hom-functor 𝒞(a, −).
Expressing the set of natural transformations as an end, we get:

∫_{x∶𝒞} 𝐒𝐞𝐭(𝒞(a, x), F x) ≅ F a

Similarly, we have the Yoneda lemma for a contravariant functor (a presheaf) G:

∫_{x∶𝒞} 𝐒𝐞𝐭(𝒞(x, a), G x) ≅ G a

These versions of the Yoneda lemma, expressed in terms of ends, are often half-jokingly
called ninja-Yoneda lemmas. The fact that the “integration variable” is explicit makes them
somewhat easier to use in complex formulas.
There is also a dual set of ninja co-Yoneda lemmas that use coends instead. For a covariant
functor, we have:

∫^{x∶𝒞} 𝒞(x, a) × F x ≅ F a

and for the contravariant one we have:

∫^{x∶𝒞} 𝒞(a, x) × G x ≅ G a

Physicists might notice the similarity of these formulas to integrals involving the Dirac delta
function (actually, a distribution). This is why profunctors are sometimes called distributors,
following the adage that “distributors are to functors as distributions are to functions.” Engineers
might notice the similarity of the hom-functor to the impulse function.
This intuition is often expressed by saying that we can perform the “integration over x” in
this formula, which results in replacing x with a in the integrand G x.
If 𝒞 is a discrete category, the coend reduces to the sum (coproduct), and the hom-functor
reduces to the unit matrix (the Kronecker delta). The co-Yoneda lemma becomes:

∑_j δ_i^j v_j = v_i

In fact, a lot of linear algebra translates directly to the theory of 𝐒𝐞𝐭-valued functors. You may
often view such functors as vectors in a vector space in which hom-functors form a basis.
Profunctors become matrices, and coends can be used to multiply such matrices, calculate their
traces, or multiply vectors by matrices.
Yet another name for profunctors, especially in Australia, is “bimodules.” This is because
the lifting of morphisms by a profunctor is somewhat similar to the left and right actions on sets.
The proof of the co-Yoneda lemma is quite instructive, as it uses a few common tricks. Most
importantly, we rely on the corollary of the Yoneda lemma, which says that if all the mappings
out from two objects to an arbitrary object are isomorphic, then the two objects are themselves
isomorphic. We'll start, therefore, with such a mapping-out to an arbitrary set S:

𝐒𝐞𝐭(∫^{x∶𝒞} 𝒞(x, a) × F x, S)

Using the co-continuity of the hom-functor, we can pull out the integral sign, replacing the
coend with an end:

∫_{x∶𝒞} 𝐒𝐞𝐭(𝒞(x, a) × F x, S)
Since the category of sets is cartesian closed, we can curry the product:

∫_{x∶𝒞} 𝐒𝐞𝐭(𝒞(x, a), S^{F x})

We can now use the (contravariant) ninja Yoneda lemma to “integrate over x.” The result is S^{F a}. Finally, in 𝐒𝐞𝐭, the
exponential object is isomorphic to the hom-set:

S^{F a} ≅ 𝐒𝐞𝐭(F a, S)

Since S was arbitrary, we conclude that:

∫^{x∶𝒞} 𝒞(x, a) × F x ≅ F a
Exercise 17.6.1. Prove the contravariant version of the co-Yoneda lemma.

Yoneda lemma in Haskell


We’ve already seen the Yoneda lemma implemented in Haskell. We can now rewrite it in terms
of an end. We start by defining a profunctor that will go under the end. Its type constructor
takes a functor f and a type a and generates a profunctor that’s contravariant in x and covariant
in y:
data Yo f a x y = Yo ((a -> x) -> f y)
The Yoneda lemma establishes the isomorphism between the end over this profunctor and the
type obtained by acting with the functor f on a. This isomorphism is witnessed by a pair of
functions:
yoneda :: Functor f => End (Yo f a) -> f a
yoneda (Yo g) = g id

yoneda_1 :: Functor f => f a -> End (Yo f a)


yoneda_1 fa = Yo (\h -> fmap h fa)
Similarly, the co-Yoneda lemma uses a coend over the following profunctor:
data CoY f a x y = CoY (x -> a) (f y)
The isomorphism is witnessed by a pair of functions. The first one says that if you have a
function x -> a and a functorful of x then you can make a functorful of a using the fmap:

coyoneda :: Functor f => Coend (CoY f a) -> f a
coyoneda (Coend (CoY g fa)) = fmap g fa
You can do it without knowing anything about the existential type x.
The second says that if you have a functorful of a, you can create a coend by injecting it
(together with the identity function) into the existential type:
coyoneda_1 :: Functor f => f a -> Coend (CoY f a)
coyoneda_1 fa = Coend (CoY id fa)
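For example (my own illustration), a functorful of values survives a round trip through the coend:

roundTrip :: [Int]
roundTrip = coyoneda (coyoneda_1 [1, 2, 3])  -- evaluates to [1,2,3]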

17.7 Day Convolution


Electrical engineers are familiar with the idea of convolution. We can convolve two streams by
shifting one of them and summing its product with the other one:

(𝑓 ⋆ 𝑔)(𝑥) = 𝑓 (𝑦)𝑔(𝑥 − 𝑦)𝑑𝑦
∫−∞

This formula can be translated almost verbatim to category theory. We can start by replacing the
integral with a coend. The problem is, we don’t know how to subtract objects. We do however
know how to add them, in a co-cartesian category.
Notice that the sum of the arguments to the two functions is equal to x. We could enforce
this condition by introducing the Dirac delta function, or the “impulse function,” δ(a + b − x).
In category theory, we use the hom-functor to do the same. Thus we can define a convolution of
two 𝐒𝐞𝐭-valued functors:

(F ⋆ G)x = ∫^{a,b} 𝒞(a + b, x) × F a × G b

Informally, if we could define subtraction as the right adjoint to the coproduct, we'd write:

∫^{a,b} 𝒞(a + b, x) × F a × G b ≅ ∫^{a,b} 𝒞(a, x − b) × F a × G b ≅ ∫^b F(x − b) × G b

There is nothing special about the coproduct so, in general, Day convolution is defined for any
monoidal category with a tensor product:

(F ⋆ G)x = ∫^{a,b} 𝒞(a ⊗ b, x) × F a × G b

In fact, Day convolution for a monoidal category (𝒞, ⊗, I) endows the category of co-presheaves
[𝒞, 𝐒𝐞𝐭] with a monoidal structure. Simply said, if you can multiply (tensor) objects
in 𝒞, you can multiply (tensor) the 𝐒𝐞𝐭-valued functors on 𝒞.
It’s easy to check that Day convolution is associative (up to isomorphism) and that 𝒞(I, −)
serves as the unit object. For instance, we have:

(𝒞(I, −) ⋆ G)x = ∫^{a,b} 𝒞(a ⊗ b, x) × 𝒞(I, a) × G b ≅ ∫^b 𝒞(I ⊗ b, x) × G b ≅ G x

So the unit of Day convolution is the Yoneda functor taken at the monoidal unit, which lends itself
to the anagrammatic slogan, “ONE of DAY is a YONEDA of ONE.”

If the tensor product is symmetric, then the corresponding Day convolution is also symmet-
ric (up to isomorphism).
In the special case of a cartesian closed category, we can use the currying adjunction to
simplify the formula:
(F ⋆ G)x = ∫^{a,b} 𝒞(a × b, x) × F a × G b ≅ ∫^{a,b} 𝒞(a, x^b) × F a × G b ≅ ∫^b F(x^b) × G b

In Haskell, the product-based Day convolution can be defined using an existential type:
data Day f g x where
Day :: ((a, b) -> x) -> f a -> g b -> Day f g x
If we think of functors as containers of values, Day convolution tells us how to combine two
different containers into one, given a function that combines two different values into one.
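For example (my own illustration), we can pair a Maybe with a list and combine their values with addition:

sums :: Day Maybe [] Int
sums = Day (uncurry (+)) (Just 1) [10, 20]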

Exercise 17.7.1. Define the Functor instance for Day.

Exercise 17.7.2. Implement the associator for Day.


assoc :: Day f (Day g h) x -> Day (Day f g) h x

Applicative functors as monoids


We’ve seen before the definition of applicative functors as lax monoidal functors. It turns out
that, just like monads, applicative functors can also be defined as monoids.
Recall that a monoid is an object in a monoidal category. The category we’re interested in
is the co-presheaf category [𝒞, 𝐒𝐞𝐭]. If 𝒞 is cartesian, then the co-presheaf category is monoidal
with respect to Day convolution, with the unit object 𝒞(I, −). A monoid in this category is a
functor F equipped with two natural transformations that serve as unit and multiplication:

𝜂 ∶ (𝐼, −) → 𝐹

𝜇∶ 𝐹 ⋆ 𝐹 → 𝐹
In particular, in a cartesian closed category where the unit is the terminal object, 𝒞(1, a) is
isomorphic to a, and the component of the unit at a is:

η_a ∶ a → F a

You may recognize this function as pure in the definition of Applicative.


pure :: a -> f a
Let’s consider the set of natural transformations from which μ is taken. We’ll write it as an
end:

μ ∈ ∫_x 𝐒𝐞𝐭((F ⋆ F)x, F x)
Plugging in the definition of Day convolution, we get:

∫_x 𝐒𝐞𝐭(∫^{a,b} 𝒞(a × b, x) × F a × F b, F x)

We can pull out the coend using the co-continuity of the hom-functor:

∫_{x,a,b} 𝐒𝐞𝐭(𝒞(a × b, x) × F a × F b, F x)

We can then use the currying adjunction in 𝐒𝐞𝐭 to obtain:

∫_{x,a,b} 𝐒𝐞𝐭(𝒞(a × b, x), 𝐒𝐞𝐭(F a × F b, F x))

Finally, we apply the Yoneda lemma to perform the integration over x:

∫_{a,b} 𝐒𝐞𝐭(F a × F b, F(a × b))

The result is the set of natural transformations from which to select the second part of the lax
monoidal functor:
(>*<) :: f a -> f b -> f (a, b)

Free Applicatives
We have just learned that applicative functors are monoids in the monoidal category:

([𝒞, 𝐒𝐞𝐭], 𝒞(I, −), ⋆)

It’s only natural to ask what a free monoid in that category is.
Just like we did with free monads, we’ll construct the free applicative as the initial algebra, that is,
the least fixed point of the list functor. Recall that the list functor was defined as:

Φ_a x = 1 + a ⊗ x

In our case it becomes:

Φ_F G = 𝒞(I, −) + F ⋆ G

Its fixed point is given by the recursive formula:

A_F ≅ 𝒞(I, −) + F ⋆ A_F

When translating this to Haskell, we observe that functions from the unit ()->a are isomor-
phic to elements of a.
Corresponding to the two addends in the definition of 𝐴𝐹 , we get two constructors:
data FreeA f x where
DoneA :: x -> FreeA f x
MoreA :: ((a, b) -> x) -> f a -> FreeA f b -> FreeA f x
I have inlined the definition of Day convolution:
data Day f g x where
Day :: ((a, b) -> x) -> f a -> g b -> Day f g x
The easiest way to show that FreeA f is an applicative functor is to go through Monoidal:

class Monoidal f where
  unit  :: f ()
  (>*<) :: f a -> f b -> f (a, b)
Since FreeA f is a generalization of a list, the Monoidal instance for free applicative gen-
eralizes the idea of list concatenation. We do the pattern matching on the first list, resulting in
two cases.
In the first case, instead of an empty list we have DoneA x. Prepending it to the second
argument doesn’t change the length of the list, but it modifies the type of the values stored in it.
It pairs each of them with x:
(DoneA x) >*< fry = fmap (x,) fry
The second case is a “list” whose head fa is a functorful of a’s, and the tail frb is of the
type FreeA f b. The two are glued using a function abx :: (a, b) -> x.
(MoreA abx fa frb) >*< fry = MoreA (reassoc abx) fa (frb >*< fry)
To produce the result, we concatenate the two tails using the recursive call to >*< and prepend
fa to it. To glue this head to the new tail we have to provide a function that re-associates the
pairs:
reassoc :: ((a, b)-> x) -> (a, (b, y)) -> (x, y)
reassoc abx (a, (b, y)) = (abx (a, b), y)
The complete instance is thus:
instance Functor f => Monoidal (FreeA f) where
unit = DoneA ()
(DoneA x) >*< fry = fmap (x,) fry
(MoreA abx fa frb) >*< fry = MoreA (reassoc abx) fa (frb >*< fry)
Once we have the Monoidal instance, it’s straightforward to produce the Applicative
instance:
instance Functor f => Applicative (FreeA f) where
pure a = DoneA a
ff <*> fx = fmap app (ff >*< fx)

app :: (a -> b, a) -> b
app (f, a) = f a
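A handy helper (a sketch; the name is mine) embeds a single functorful into the free applicative:

liftFreeA :: f a -> FreeA f a
liftFreeA fa = MoreA (\(a, _) -> a) fa (DoneA ())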

Exercise 17.7.3. Define the Functor instance for the free applicative.

17.8 The Bicategory of Profunctors


Since we know how to compose profunctors using coends, the question arises: is there a category
in which they serve as morphisms? The answer is yes, as long as we relax the rules a bit. The
problem is that the categorical laws for profunctor composition are not satisfied “on the nose,”
but only up to isomorphism.
For instance, we can try to show the associativity of profunctor composition. We start with:

((P ⋄ Q) ⋄ R)⟨s, t⟩ = ∫^b (∫^a P⟨s, a⟩ × Q⟨a, b⟩) × R⟨b, t⟩

and, after a few transformations, arrive at:

(P ⋄ (Q ⋄ R))⟨s, t⟩ = ∫^a P⟨s, a⟩ × (∫^b Q⟨a, b⟩ × R⟨b, t⟩)

We use the associativity of the product and the fact that we can switch the order of coends using
the Fubini theorem. Both are true only up to isomorphism. We don’t get associativity “on the
nose.”
The identity profunctor turns out to be the hom-functor, which can be written symbolically
as 𝒞(−, =), with placeholders for both arguments. For instance:

(𝒞(−, =) ⋄ P)⟨s, t⟩ = ∫^a 𝒞(s, a) × P⟨a, t⟩ ≅ P⟨s, t⟩
This is a consequence of the (contravariant) ninja co-Yoneda lemma, which is also an isomorphism,
not an equality.
A category in which categorical laws are satisfied up to isomorphism is called a bicategory.
Notice that such a category must be equipped with 2-cells—morphisms between morphisms,
which we’ve already seen in the definition of a 2-category. We need those in order to be able to
define isomorphisms between 1-cells.
A bicategory 𝐏𝐫𝐨𝐟 has (small) categories as objects, profunctors as 1-cells, and natural
transformations as 2-cells.
Since profunctors are functors 𝒞^op × 𝒟 → 𝐒𝐞𝐭, the standard definition of natural transformations
between them applies. It’s a family of functions parameterized by the objects of 𝒞^op × 𝒟,
which are themselves pairs of objects.
The naturality condition for a transformation α_{⟨a,b⟩} between two profunctors P and Q takes
the form of a commuting square:

  P⟨a, b⟩ ──P⟨f, g⟩──▸ P⟨s, t⟩
     │α_{⟨a,b⟩}            │α_{⟨s,t⟩}
     ▾                     ▾
  Q⟨a, b⟩ ──Q⟨f, g⟩──▸ Q⟨s, t⟩

for every pair of arrows:


⟨𝑓 ∶ 𝑠 → 𝑎, 𝑔 ∶ 𝑏 → 𝑡⟩

Monads in a bicategory
We’ve seen before that categories, functors, and natural transformations form a 2-category 𝐂𝐚𝐭.
Let’s focus on one object, a category 𝒞, that is, a 0-cell in 𝐂𝐚𝐭. The 1-cells that start and end
at this object form a regular category, in this case the functor category [𝒞, 𝒞]. The objects
in this category are endo-1-cells of the outer 2-category 𝐂𝐚𝐭; the arrows between them are the
2-cells of the outer 2-category.
This category of endo-1-cells is automatically equipped with a monoidal structure. We define
the tensor product as the composition of 1-cells (all 1-cells with the same source and target
compose). The monoidal unit object is the identity 1-cell I. In [𝒞, 𝒞] this product is the composition
of endofunctors, and the unit is the identity functor.

If we now focus our attention on just one endo-1-cell 𝐹 , we can “square” it, that is use the
monoidal product to multiply it by itself. In other words, use the 1-cell composition to create
𝐹 ◦𝐹 . We say that 𝐹 is a monad if we can find 2-cells:

𝜇 ∶ 𝐹 ◦𝐹 → 𝐹

𝜂∶ 𝐼 → 𝐹
that behave like multiplication and unit, that is they make the associativity and unit diagrams
commute.


In fact, a monad can be defined in an arbitrary bicategory, not just in the 2-category 𝐂𝐚𝐭.

Prearrows as monads in 𝐏𝐫𝐨𝐟


Since 𝐏𝐫𝐨𝐟 is a bicategory, we can define a monad in it. It is an endo-profunctor (a 1-cell):

P ∶ 𝒞^op × 𝒞 → 𝐒𝐞𝐭

equipped with two natural transformations (2-cells):

μ ∶ P ⋄ P → P
η ∶ 𝒞(−, =) → P
that satisfy the associativity and unit conditions.
We’ll look at these natural transformations as elements of ends. For instance:

μ ∈ ∫_{⟨a,b⟩} 𝐒𝐞𝐭(∫^x P⟨a, x⟩ × P⟨x, b⟩, P⟨a, b⟩)

By co-continuity, this is equivalent to:

∫_{⟨a,b⟩,x} 𝐒𝐞𝐭(P⟨a, x⟩ × P⟨x, b⟩, P⟨a, b⟩)

The unit is:

η ∈ ∫_{⟨a,b⟩} 𝐒𝐞𝐭(𝒞(a, b), P⟨a, b⟩)
In Haskell, such profunctor monads are called pre-arrows:

class Profunctor p => PreArrow p where
  (>>>) :: p a x -> p x b -> p a b
  arr   :: (a -> b) -> p a b
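The simplest example (my own sketch; it assumes the standard Profunctor instance for (->)) is the function profunctor itself:

instance PreArrow (->) where
  f >>> g = g . f
  arr f   = f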
An Arrow is a PreArrow that is also a Tambara module. We’ll talk about Tambara modules in
the next chapter.

17.9 Existential Lens


The first rule of category-theory club is that you don’t talk about the internals of objects.
The second rule of category-theory club is that, if you have to talk about the internals of
objects, use arrows only.

Existential lens in Haskell


What does it mean for an object to be a composite—to have parts? At the very minimum, you
should be able to retrieve a part of such an object. Even better if you can replace that part with
a new one. This pretty much defines a lens:
get :: s -> a
set :: s -> a -> s
Here, get extracts the part a from the whole s, and set replaces that part with a new a. Lens
laws help to reinforce this picture. And it’s all done in terms of arrows.
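For instance (my own example), focusing on the second component of a pair:

getSnd :: (c, a) -> a
getSnd = snd

setSnd :: (c, a) -> a -> (c, a)
setSnd (c, _) a = (c, a)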
Another way of describing a composite object is to say that it can be split into a focus and a
residue. The trick is that, although we want to know what type the focus is, we don’t care about
the type of the residue. All we need to know about the residue is that it can be combined with
the focus to recreate the whole object.
In Haskell, we would express this idea using an existential type:
data LensE s a where
LensE :: (s -> (c, a), (c, a) -> s) -> LensE s a
This tells us that there exists some unspecified type c such that s can be split into, and recon-
structed from, a product (c, a).


The get/set version of the lens can be derived from this existential form.
toGet :: LensE s a -> (s -> a)
toGet (LensE (l, r)) = snd . l

toSet :: LensE s a -> (s -> a -> s)
toSet (LensE (l, r)) s a = r (fst (l s), a)
Notice that we don’t need to know anything about the type of the residue. We take advantage
of the fact that the existential lens contains both the producer and the consumer of c and we’re
just mediating between the two.
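Conversely (a sketch; the name is mine), any get/set pair gives rise to an existential lens, with the whole s itself playing the role of the residue:

fromGetSet :: (s -> a) -> (s -> a -> s) -> LensE s a
fromGetSet get set = LensE (\s -> (s, get s), \(s, a) -> set s a)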

It’s impossible to extract a “naked” residue, as witnessed by the fact that the following code
doesn’t compile:

getResidue :: LensE s a -> c
getResidue (LensE (l, r)) = fst . l

Existential lens in category theory


We can easily translate the new definition of the lens to category theory by expressing the existential
type as a coend:

∫^c 𝒞(s, c × a) × 𝒞(c × a, s)

In fact, we can generalize it to a type-changing lens, in which the focus a can be replaced with
a new focus of a different type b. Replacing a with b produces a new composite object t.

The lens is now parameterized by two pairs of objects: ⟨s, t⟩ for the outer ones, and ⟨a, b⟩
for the inner ones. The existential residue c remains hidden:

ℒ⟨s, t⟩⟨a, b⟩ = ∫^c 𝒞(s, c × a) × 𝒞(c × b, t)

The product under the coend is the diagonal part of a profunctor that is covariant in y and
contravariant in x:

𝒞(s, y × a) × 𝒞(x × b, t)

Exercise 17.9.1. Show that:

𝒞(s, y × a) × 𝒞(x × b, t)

is a profunctor in ⟨x, y⟩.

Type-changing lens in Haskell


In Haskell, we can define a type-changing lens as the following existential type:
data LensE s t a b where
LensE :: (s -> (c, a)) -> ((c, b) -> t) -> LensE s t a b
As before, we can use the existential lens to get and set the focus:
toGet :: LensE s t a b -> (s -> a)
toGet (LensE l r) = snd . l

toSet :: LensE s t a b -> (s -> b -> t)
toSet (LensE l r) s b = r (fst (l s), b)

The two functions, s -> (c, a) and (c, b) -> t, are often called the forward and the backward
pass. The forward pass can be used to extract the focus a. The backward pass answers
the question: if we wanted the result of the forward pass to be some other b, what
t should we pass to it?
And sometimes we’re just asking: what change t should we make to the input if we wanted
to change the focus by b? The latter point of view is especially useful when using lenses to
describe neural networks.
The simplest example of a lens acts on a product. It can extract or replace one component
of the product, treating the other as the residue. In Haskell, we’d implement it as:
prodLens :: LensE (c, a) (c, b) a b
prodLens = LensE id id
Here, the type of the whole is the product (c, a). When we replace a with b we end up with
the target type (c, b). Since the source and the target are already products, the two functions
in the definition of the existential lens are just identities.

Lens composition
The main advantage of using lenses is that they compose. A composition of two lenses lets us
zoom in on a subcomponent of a component.
Suppose that we start with a lens that lets us access the focus a and change it to b. This
focus is part of a whole described by the source s and the target t.
We also have an inner lens that can access the focus a' inside the whole a, and replace
it with b' to produce a b.
We can now construct a composite lens that can access a' and b' inside of s and t. The
trick is to realize that we can take, as the new residue, a product of the two residues:

(Pictorially: s splits into a residue c and a focus a; the focus a splits further into a residue c′ and a focus a′. Going back, c′ combined with b′ gives b, and c combined with b gives t.)

compLens :: LensE a b a' b' -> LensE s t a b -> LensE s t a' b'
compLens (LensE l2 r2) (LensE l1 r1) = LensE l3 r3
where l3 = assoc' . bimap id l2 . l1
r3 = r1 . bimap id r2 . assoc
The left mapping in the new lens is given by the following composite:

s ──l1──▸ (c, a) ──(id, l2)──▸ (c, (c′, a′)) ──assoc′──▸ ((c, c′), a′)

and the right mapping is given by:

((c, c′), b′) ──assoc──▸ (c, (c′, b′)) ──(id, r2)──▸ (c, b) ──r1──▸ t
We have used the associativity and functoriality of the product:

assoc :: ((c, c'), b') -> (c, (c', b'))
assoc ((c, c'), b') = (c, (c', b'))

assoc' :: (c, (c', a')) -> ((c, c'), a')
assoc' (c, (c', a')) = ((c, c'), a')

instance Bifunctor (,) where
  bimap f g (a, b) = (f a, g b)
As an example, let’s compose two product lenses:
l3 :: LensE (c, (c', a')) (c, (c', b')) a' b'
l3 = compLens prodLens prodLens
and apply it to a nested product:
x :: (String, (Bool, Int))
x = ("Outer", (True, 42))
Our composite lens lets us not only retrieve the innermost component:
toGet l3 x
> 42
but also replace it with a value of a different type (here, Char):
toSet l3 x 'z'
> ("Outer",(True,'z'))

Category of lenses
Since lenses can be composed, you might be wondering if there is a category in which lenses
define hom-sets.
Indeed, there is a category 𝐋𝐞𝐧𝐬 whose objects are pairs of objects in 𝒞, and whose arrows from
⟨s, t⟩ to ⟨a, b⟩ are the elements of ℒ⟨s, t⟩⟨a, b⟩.
The formula for the composition of existential lenses is too complicated to be useful in
practice. In the next chapter we’ll see an alternative representation of lenses using Tambara
modules, in which composition is just a composition of functions.

17.10 Lenses and Fibrations


There is an alternative view of lenses using the language of fiber bundles. A projection 𝑝 that
defines a fibration can be seen as “decomposing” the bundle 𝐸 into fibers.
In this view, 𝑝 plays the role of get:
𝑝∶ 𝐸 → 𝐵
The base 𝐵 represents the type of the focus and 𝐸 represents the type of the composite from
which that focus can be extracted.

The other part of the lens, set, is a mapping:

𝑞∶ 𝐸 × 𝐵 → 𝐸

Let’s see how we can interpret it using fibrations.

Transport law
We interpret 𝑞 as “transporting” an element of the bundle 𝐸 to a new fiber. The new fiber is
specified by an element of 𝐵.
This property of the transport is expressed by the get/set lens law, or the transport law, that
says that “you get what you set”:
get (set s a) = a
We say that q(s, a) transports s to the fiber p⁻¹ a over the new point a of the base B.

We can rewrite this law in terms of 𝑝 and 𝑞:

𝑝◦𝑞 = 𝜋2

where 𝜋2 is the second projection from the product.


Equivalently, we can represent it as a commuting diagram:

  E × B ──q──▸ E
     ε×id ↘   ↙ p
           B

Here, instead of using the projection 𝜋2 , I used a comonoidal counit 𝜀:

𝜀∶ 𝐸 → 1

followed by the unit law for the product. Using a comonoid makes it easier to generalize this
construction to a tensor product in a monoidal category.

Identity law
Here’s the set/get law or the identity law. It says that “nothing changes if you set what you get”:
set s (get s) = s

We can write it in terms of a comonoidal comultiplication:

𝛿∶ 𝐸 → 𝐸 × 𝐸

The set/get law requires the following composite to be the identity:

E ──δ──▸ E × E ──id×p──▸ E × B ──q──▸ E

In the bundle picture, this says that if s already lies in the fiber p⁻¹ a over a, then transporting
it to that same fiber leaves it unchanged: s = q(s, a).

Composition law
Finally, here’s the set/set law, or the composition law. It says that “the last set wins”:
set (set s a) a' = set s a'
and the corresponding commuting diagram:

  E × B × B ──q×id──▸ E × B
      │id×ε×id            │q
      ▾                   ▾
    E × B ──────q─────▸   E

Again, to get rid of the middle 𝐵, I used the counit rather than a projection from the product.
In the bundle picture: first transporting s to s′ = q(s, a) in the fiber over a, and then transporting
s′ to the fiber over a′, lands at the same point as transporting s directly to the fiber over a′:
q(s′, a′) = q(s, a′).

Type-changing lens
A type-changing lens generalizes transport to act between bundles. We have to define a whole
family of bundles. We start with a category 𝒞 whose objects define the types that we will use
for the foci of our lens.

We construct the set B as the combined set of elements of all focus types. B is fibrated
over 𝒞, with the projection π sending an element of B to its corresponding type. You may think of
B as the set of objects of the coslice category 1∕𝒞.
The bundle of bundles E is a set that’s fibered over B with the projection p. Since B itself
is fibered over 𝒞, E is transitively fibered over 𝒞, with the composite projection π∘p. It’s this
coarser fibration that splits E into a family of bundles. Each of these bundles corresponds to a
different type of the composite for a given focus type. A type-changing lens will move between
these bundles.

E ──p──▸ B ──π──▸ 𝒞

The projection p takes an element s ∈ E and produces an element b ∈ B whose type is
given by π b. This is the generalization of get.
The transport q, which corresponds to set, takes an element s ∈ E and an element b ∈ B
and produces a new element t ∈ E. The important observation is that s and t may belong to
different sub-bundles of E.
The transport satisfies the following laws:

The get/set law (transport):
p(q(b, s)) = b

The set/get law (identity):
q(p(s), s) = s

The set/set law (composition):
q(c, q(b, s)) = q(c, s)

17.11 Important Formulas


This is a handy (co-)end calculus cheat-sheet.
• Continuity of the hom-functor:

𝒞(d, ∫_a P⟨a, a⟩) ≅ ∫_a 𝒞(d, P⟨a, a⟩)

• Co-continuity of the hom-functor:

𝒞(∫^a P⟨a, a⟩, d) ≅ ∫_a 𝒞(P⟨a, a⟩, d)

• Ninja Yoneda:

∫_x 𝐒𝐞𝐭(𝒞(a, x), F x) ≅ F a

• Ninja co-Yoneda:

∫^x 𝒞(x, a) × F x ≅ F a

• Ninja Yoneda for contravariant functors (presheaves):

∫_x 𝐒𝐞𝐭(𝒞(x, a), G x) ≅ G a

• Ninja co-Yoneda for contravariant functors:

∫^x 𝒞(a, x) × G x ≅ G a

• Day convolution:

(F ⋆ G)x = ∫^{a,b} 𝒞(a ⊗ b, x) × F a × G b

Chapter 18

Tambara Modules

It’s not often that an obscure corner of category theory gains sudden prominence in program-
ming. Tambara modules got a new lease on life in their application to profunctor optics. They
provide a clever solution to the problem of composing optics. We’ve seen that, in the case of
lenses, the getters compose nicely using function composition, but the composition of setters
involves some shenanigans. The existential representation doesn’t help much. The profunctor
representation, on the other hand, makes composition a snap.
The situation is somewhat analogous to the problem of composing geometric transforma-
tions in graphics programming. For instance, if you try to compose two rotations around two
different axes, the formula for the new axis and the angle is quite complicated. But if you
represent rotations as matrices, you can use matrix multiplication; or, if you represent them as
quaternions, you can use quaternion multiplication. Profunctor representation lets you compose
optics using straightforward function composition.

18.1 Tannakian Reconstruction


Monoids and their Representations
The theory of representations is a science in itself. Here, we’ll approach it from the categorical
perspective. Instead of groups, we’ll consider monoids. A monoid can be defined as a special
object in a monoidal category, but it can also be thought of as a single-object category ℳ. If
we call its object ∗, the hom-set ℳ(∗, ∗) contains all the information we need.
What we called “product” in a monoid is replaced by the composition of morphisms. By the
laws of a category, it’s associative and unital—the identity morphism serving as the monoidal
unit.
In this sense, every single-object category is automatically a monoid and all monoids can
be made into single-object categories.
For instance, a monoid of the whole numbers with addition can be thought of as a cate-
gory with a single abstract object ∗ and a morphism for every number. To compose two such
morphisms, you add their numbers, as in the example below:
∗ ──2──▸ ∗ ──3──▸ ∗   (composite: ∗ ──5──▸ ∗)

The morphism corresponding to zero is automatically the identity morphism.


We can represent a monoid as transformations of a set. Such a representation is given by
a functor F ∶ ℳ → 𝐒𝐞𝐭. It maps the single object ∗ to some set S, and it maps the hom-set
ℳ(∗, ∗) to a set of functions S → S. By functor laws, it maps identity to identity and
composition to composition, so it preserves the structure of the monoid.
If the functor is fully faithful, its image contains exactly the same information as the monoid
and nothing more. But, in general, functors cheat. The hom-set 𝐒𝐞𝐭(S, S) may contain some
other functions that are not in the image of ℳ(∗, ∗); and multiple morphisms in ℳ may be
mapped to a single function.
In the extreme, the whole hom-set ℳ(∗, ∗) may be mapped to the identity function id_S.
So, just by looking at the set S, the image of ∗ under the functor F, we cannot dream of
reconstructing the original monoid.
Not all is lost, though, if we are allowed to look at all the representations of a given monoid
simultaneously. Such representations form a category: the functor category [ℳ, 𝐒𝐞𝐭], a.k.a.
the co-presheaf category over ℳ. Arrows in this category are natural transformations.
Since the source category ℳ contains only one object, naturality conditions take a particularly
simple form. A natural transformation α ∶ F → G has only one component, a function
α ∶ F ∗ → G ∗. Given a morphism m ∶ ∗ → ∗, the naturality square reads:

  F∗ ──α──▸ G∗
    │F m       │G m
    ▾          ▾
  F∗ ──α──▸ G∗
It’s a relationship between three functions acting on two sets: F m acts on the set F∗, G m acts
on the set G∗, and α maps F∗ to G∗.

The naturality condition tells us that:

𝐺𝑚◦𝛼 = 𝛼◦𝐹 𝑚

In other words, if you pick any element 𝑥 in the set 𝐹 ∗, you can map it to 𝐺 ∗ using 𝛼 and
then apply the transformation 𝐺𝑚 corresponding to 𝑚; or you can first apply the transformation
𝐹 𝑚 and then map the result using 𝛼. The result must be the same.
Such functions are called equivariant. We often call 𝐹 𝑚 the action of 𝑚 on the set 𝐹 ∗.
An equivariant function connects an action on one set to its corresponding action on another set
using either pre-composition or post-composition.
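In Haskell, we could model a representation as a monoid action, with equivariance as an equation between functions (a sketch; the class is my own and needs MultiParamTypeClasses):

class Monoid m => MAction m s where
  act :: m -> s -> s

-- A function alpha :: s1 -> s2 between two actions is equivariant when,
-- for every m:  act m . alpha == alpha . act m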

Tannakian reconstruction of a monoid


How much information do we need to reconstruct a monoid from its representations? Just look-
ing at the sets is definitely not enough, since any monoid can be represented on any set. But if
we include structure-preserving functions between those sets, we might have a chance.
Given a functor F ∶ ℳ → 𝐒𝐞𝐭, consider the set of all functions F ∗ → F ∗, that is, the
hom-set 𝐒𝐞𝐭(F ∗, F ∗). At least some of these functions are the actions of the monoid: these
are the functions of the form F m, where m is an arrow in ℳ. Keep in mind, though, that there
may be many more functions in that hom-set that don’t correspond to actions.
Now let’s look at another set G ∗, the image of some other functor G. In its hom-set
𝐒𝐞𝐭(G ∗, G ∗) we’ll find, among others, the corresponding actions of the form G m. An
equivariant function, that is, a natural transformation in [ℳ, 𝐒𝐞𝐭], will connect each pair of related
actions F m and G m.
Now imagine creating a gigantic tuple by taking one function from each of the sets 𝐒𝐞𝐭(F ∗, F ∗),
for all functors F ∶ ℳ → 𝐒𝐞𝐭. We are interested only in tuples whose elements are
connected. Here’s what I mean: if we pick g ∈ 𝐒𝐞𝐭(G ∗, G ∗) and h ∈ 𝐒𝐞𝐭(H ∗, H ∗), and
there is a natural transformation (equivariant function) α between the two functors G and H,
we want the two functions to be related:

𝛼◦𝑔 = ℎ◦𝛼

or, pictorially:

  g ⟲ G∗ ──α──▸ H∗ ⟳ h

Notice that this correlation also works on pairs 𝑔 and ℎ which are not of the form 𝑔 = 𝐺𝑚 and
ℎ = 𝐻𝑚.
Such tuples are exactly the elements of the end:

∫_F 𝐒𝐞𝐭(F ∗, F ∗)

whose wedge condition provides the constraints we are looking for.

         ∫_F 𝐒𝐞𝐭(F∗, F∗)
        π_G ↙         ↘ π_H
  𝐒𝐞𝐭(G∗, G∗)        𝐒𝐞𝐭(H∗, H∗)
      (α∘−) ↘        ↙ (−∘α)
        𝐒𝐞𝐭(G∗, H∗)
Here α is a morphism in the functor category [ℳ, 𝐒𝐞𝐭]:

α ∶ G → H

This natural transformation has only one component, which we’ll also call 𝛼. It’s an equivariant
function between the two representations.
Here are some details. The profunctor under the end is given by:

𝑃 ⟨𝐺, 𝐻⟩ = 𝐒𝐞𝐭(𝐺 ∗, 𝐻 ∗)

It’s a functor of the form:

P ∶ [ℳ, 𝐒𝐞𝐭]^op × [ℳ, 𝐒𝐞𝐭] → 𝐒𝐞𝐭



Consider its action on pairs of morphisms in [ℳ, 𝐒𝐞𝐭]. Given a pair of natural transformations:

α ∶ G′ → G
β ∶ H → H′

their lifting is a function:

P⟨α, β⟩ ∶ P⟨G, H⟩ → P⟨G′, H′⟩

Substituting our definition of P, we have:

P⟨α, β⟩ ∶ 𝐒𝐞𝐭(G ∗, H ∗) → 𝐒𝐞𝐭(G′ ∗, H′ ∗)

We get this function by pre-composing with 𝛼 and post-composing with 𝛽 (these functions are
the only components of the two natural transformations 𝛼 and 𝛽):

𝑃 ⟨𝛼, 𝛽⟩ = 𝛽◦ − ◦𝛼

That is, given a function 𝑓 ∈ 𝐒𝐞𝐭(𝐺 ∗, 𝐻 ∗), we produce a function 𝛽◦𝑓 ◦𝛼 ∈ 𝐒𝐞𝐭(𝐺′ ∗, 𝐻 ′ ∗).
In the wedge condition, if we pick g to be an element of 𝐒𝐞𝐭(G ∗, G ∗) and h an element
of 𝐒𝐞𝐭(H ∗, H ∗), we reproduce our condition:

𝛼◦𝑔 = ℎ◦𝛼

The Tannakian reconstruction theorem, in this case, tells us that:

∫_F 𝐒𝐞𝐭(F ∗, F ∗) ≅ ℳ(∗, ∗)
In other words, we can recover the monoid from its representations. We’ll see the proof of this
theorem in the context of a more general statement.

Cayley’s theorem
In group theory, Cayley’s theorem states that every group is isomorphic to a (subgroup of the)
group of permutations. A group is just a monoid in which every element has an inverse. Per-
mutations are bijective functions that map a set to itself. They permute the elements of a set.
In category theory, Cayley’s theorem is practically built into the definition of a monoid and
its representations.
The connection between the single-object interpretation and the more traditional set-of-elements
interpretation of a monoid is easy to establish. We do this by constructing the functor
F ∶ ℳ → 𝐒𝐞𝐭 that maps ∗ to the special set S equal to the hom-set: S = ℳ(∗, ∗). Elements
of this set are identified with morphisms in ℳ. We define the action of F on morphisms
as post-composition:

(F m)n = m∘n

Here m is a morphism in ℳ and n is an element of S, which happens to also be a morphism in ℳ.
We can take this particular representation as an alternative definition of a monoid in the
monoidal category 𝐒𝐞𝐭. All we need is to implement unit and multiplication:

𝜂∶ 1 → 𝑆
𝜇∶ 𝑆 × 𝑆 → 𝑆

The unit picks the element of S that corresponds to id_∗ in ℳ(∗, ∗). Multiplication of two
elements m and n is given by the element that corresponds to m∘n.
At the same time, we can look at S as the image of F ∶ ℳ → 𝐒𝐞𝐭, in which case it’s the
functions S → S that form a representation of the monoid. This is the essence of Cayley’s
theorem: every monoid can be represented by a set of endo-functions.
In programming, the best example of applying Cayley’s theorem is the efficient implementation
of list reversal. Recall the naive recursive implementation of reversal:
reverse :: [a] -> [a]
reverse [] = []
reverse (a : as) = reverse as ++ [a]
It splits the list into the head and the tail, reverses the tail, and appends a singleton made out of
the head to the result. The problem is that every append has to traverse the growing list, resulting
in O(N²) performance.
Remember, however, that a list is a (free) monoid:
instance Monoid [a] where
mempty = []
mappend as bs = as ++ bs
We can use Cayley’s theorem to represent this monoid as functions on lists:
type DList a = [a] -> [a]
To represent a list, we turn it into a function. It’s a function (a closure) that prepends this list as
to its argument xs:
rep :: [a] -> DList a
rep as = \xs -> as ++ xs
This representation is called a difference list.
To turn a function back to a list, it’s enough to apply it to an empty list:
unRep :: DList a -> [a]
unRep f = f []
It’s easy to check that the representation of an empty list is an identity function, and that the
representation of a concatenation of two lists is a composition of representations:
rep [] = id
rep (xs ++ ys) = rep xs . rep ys
So this is exactly the Cayley representation of the list monoid.
We can now translate the reversal algorithm to produce this new representation:
rev :: [a] -> DList a
rev [] = rep []
rev (a : as) = rev as . rep [a]
and turn it back to a list:
fastReverse :: [a] -> [a]
fastReverse = unRep . rev
At first sight it might seem like we haven’t done much except add a layer of conversion on
top of our recursive algorithm. Except that the new algorithm has O(N) rather than O(N²)
performance. To see that, consider reversing a simple list [1, 2, 3]. The function rev turns
this list into a composition of functions:
rep [3] . rep [2] . rep [1]
It does it in linear time. The function unRep executes this composite starting with an empty list.
But notice that each rep prepends its argument to the cumulative result. In particular, the final
rep [3] executes:
[3] ++ [2, 1]
Unlike appending, prepending is a constant-time operation, so the whole algorithm takes 𝑂(𝑁)
time.
Another way of looking at it is to realize that rev queues up the actions in the order of the
elements of the list, starting at the head. But the queue of functions is executed in the first-in-
first-out (FIFO) order.
Because of Haskell’s laziness, list reversal using foldl has similar performance:
reverse = foldl (\as a -> a : as) []
This is because foldl, before returning the result, traverses the list left-to-right accumulating
functions (closures). It then executes them as necessary, in the FIFO order.

Proof of Tannakian reconstruction


Monoid reconstruction is a special case of a more general theorem in which, instead of a
single-object category, we use a regular category. As in the monoid case, we’ll reconstruct
the hom-set, only this time it will be a regular hom-set between two objects. We’ll prove the
formula:

∫_{F∶[𝒞,𝐒𝐞𝐭]} 𝐒𝐞𝐭(F a, F b) ≅ 𝒞(a, b)
The trick is to use the Yoneda lemma to represent the action of F:

F a ≅ [𝒞, 𝐒𝐞𝐭](𝒞(a, −), F)

and the same for F b. We get:

∫_{F∶[𝒞,𝐒𝐞𝐭]} 𝐒𝐞𝐭([𝒞, 𝐒𝐞𝐭](𝒞(a, −), F), [𝒞, 𝐒𝐞𝐭](𝒞(b, −), F))

Notice that the two sets of natural transformations here are hom-sets in [𝒞, 𝐒𝐞𝐭].
Recall the corollary to the Yoneda lemma that works for any category 𝒜:

[𝒜, 𝐒𝐞𝐭](𝒜(x, −), 𝒜(y, −)) ≅ 𝒜(y, x)

We can write it using an end:

∫_{z∶𝒜} 𝐒𝐞𝐭(𝒜(x, z), 𝒜(y, z)) ≅ 𝒜(y, x)

In particular, we can replace 𝒜 with the functor category [𝒞, 𝐒𝐞𝐭]. We get:

∫_{F∶[𝒞,𝐒𝐞𝐭]} 𝐒𝐞𝐭([𝒞, 𝐒𝐞𝐭](𝒞(a, −), F), [𝒞, 𝐒𝐞𝐭](𝒞(b, −), F)) ≅ [𝒞, 𝐒𝐞𝐭](𝒞(b, −), 𝒞(a, −))

We can then apply the Yoneda lemma again to the right-hand side and get:

𝒞(a, b)

which is exactly the result we were after.


It’s important to realize how the structure of the functor category enters the end through the
wedge condition. It does that through natural transformations. Every time we have a natural
transformation between two functors 𝛼 ∶ 𝐺 → 𝐻, the following diagram must commute:

         ∫_F 𝐒𝐞𝐭(F a, F b)
        π_G ↙           ↘ π_H
  𝐒𝐞𝐭(G a, G b)        𝐒𝐞𝐭(H a, H b)
  𝐒𝐞𝐭(id, α) ↘         ↙ 𝐒𝐞𝐭(α, id)
        𝐒𝐞𝐭(G a, H b)

To get some intuition about Tannakian reconstruction, you may recall that 𝐒𝐞𝐭-valued functors
can be interpreted as proof-relevant subsets. A functor F ∶ 𝒞 → 𝐒𝐞𝐭 (a co-presheaf)
defines a subset of the objects of (a small category) 𝒞. We say that an object a is in that subset
only if F a is non-empty. Each element of F a can then be interpreted as a proof of that.
But unless the category in question is discrete, not all subsets will correspond to co-presheaves.
In particular, whenever there is an arrow f ∶ a → b, there is also a function F f ∶ F a → F b.
According to our interpretation, such a function maps every proof that a is in the subset defined
by F to a proof that b is in that subset. Co-presheaves thus define proof-relevant subsets that are
compatible with the structure of the category.
Let’s reinterpret Tannakian reconstruction in the same spirit:

∫_{F∶[𝒞,𝐒𝐞𝐭]} 𝐒𝐞𝐭(F a, F b) ≅ 𝒞(a, b)

An element of the left-hand side is a proof that, for every subset compatible with the
structure of 𝒞, if a belongs to that subset, so does b. That is only possible if there is an arrow
from a to b.

Tannakian reconstruction in Haskell


We can immediately translate this result to Haskell. We replace the end by forall. The left
hand side becomes:
forall f. Functor f => f a -> f b
and the right hand side is the function type a->b.
We’ve seen polymorphic functions before: they were functions defined for all types, or
sometimes for classes of types. Here we have a function that is defined for all functors. It says:
give me a functorful of a’s and I’ll produce a functorful of b’s—no matter what functor you use.
The only way this can be implemented (using parametric polymorphism) is if this function has
secretly captured a function of the type a->b and is applying it using fmap.
Indeed, one direction of the isomorphism is just that: capturing a function and fmapping it
over the argument:

toRep :: (a -> b) -> (forall f. Functor f => f a -> f b)
toRep g fa = fmap g fa
The other direction uses the Yoneda trick:
fromRep :: (forall f. Functor f => f a -> f b) -> (a -> b)
fromRep g a = unId (g (Id a))
where the identity functor is defined as:
data Id a = Id a

unId :: Id a -> a
unId (Id a) = a

instance Functor Id where
  fmap g (Id a) = Id (g a)
This kind of reconstruction might seem trivial and pointless. Why would anyone want to
replace function type a->b with a much more complicated type:
type Getter a b = forall f. Functor f => f a -> f b
It’s instructive, though, to think of a->b as the precursor of all optics. It’s a lens that focuses
on the 𝑏 part of 𝑎. It tells us that 𝑎 contains enough information, in one form or another, to
construct a 𝑏. It’s a “getter” or an “accessor.”
Obviously, functions compose. What’s interesting though is that functor representations
also compose, and they compose using simple function composition, as seen in this example:
boolToStrGetter :: Getter Bool String
boolToStrGetter = toRep (show) . toRep (bool 0 1)
Other optics don’t compose so easily, but their functor (and profunctor) representations do.

Tannakian reconstruction with adjunction


The trick in generalizing Tannakian reconstruction is to define the end over some specialized
functor category 𝒯 by first applying a forgetful functor to its functors. Let’s assume that we
have a free/forgetful adjunction F ⊣ U between the two functor categories 𝒯 and [𝒞, 𝐒𝐞𝐭]:

𝒯(F Q, P) ≅ [𝒞, 𝐒𝐞𝐭](Q, U P)

where Q is a functor in [𝒞, 𝐒𝐞𝐭] and P a functor in 𝒯.


Our starting point for Tannakian reconstruction is the following end:

∫_{P∶𝒯} 𝐒𝐞𝐭((U P)a, (U P)s)

Incidentally, the mapping 𝒯 → 𝐒𝐞𝐭 parameterized by the object a and given by the formula:

P ↦ (U P)a

is sometimes called the fiber functor, so the end formula can be interpreted as a set of natural
transformations between two fiber functors. Conceptually, a fiber functor describes an “infinitesimal
neighborhood” of an object. It maps functors to sets but, more importantly, it maps natural
transformations to functions. These functions probe the environment in which the object lives.
In particular, natural transformations in 𝒯 are involved in the wedge conditions that define the end.
(In calculus, stalks of sheaves play a very similar role.)
As we did before, we first apply the Yoneda lemma to get:

∫_{P∶𝒯} 𝐒𝐞𝐭([𝒞, 𝐒𝐞𝐭](𝒞(a, −), U P), [𝒞, 𝐒𝐞𝐭](𝒞(s, −), U P))

We can now use the adjunction:

∫_{P∶𝒯} 𝐒𝐞𝐭(𝒯(F 𝒞(a, −), P), 𝒯(F 𝒞(s, −), P))

We end up with a mapping between two sets of natural transformations in the functor category 𝒯. We
can perform the integration using the corollary to the Yoneda lemma, giving us:

𝒯(F 𝒞(s, −), F 𝒞(a, −))

We can apply the adjunction once more:

[𝒞, 𝐒𝐞𝐭](𝒞(s, −), (U∘F)𝒞(a, −))

and the Yoneda lemma again:

((U∘F)𝒞(a, −))s
The final observation is that the composition U∘F of adjoint functors is a monad in the functor
category. Let’s call this monad Φ. The result is the following identity, which will serve as the
foundation for profunctor optics:

∫_{P∶𝒯} 𝐒𝐞𝐭((U P)a, (U P)s) ≅ (Φ 𝒞(a, −))s

The right-hand side is the action of the monad Φ = U∘F on the representable functor 𝒞(a, −),
which is then evaluated at s.
Compare this with the earlier formula for Tannakian reconstruction, especially if we rewrite
it in the following form:

∫_{F∶[𝒞,𝐒𝐞𝐭]} 𝐒𝐞𝐭(F a, F s) ≅ 𝒞(a, −)s
Keep in mind that, in the derivation of optics, we’ll replace a and s with pairs of objects
⟨a, b⟩ and ⟨s, t⟩ from 𝒞^op × 𝒞. In that case our functors will become profunctors.

18.2 Profunctor Lenses


Our goal is to find a functor representation for optics. We’ve seen before that, for instance, type-changing
lenses can be seen as hom-sets in the 𝐋𝐞𝐧𝐬 category. The objects in 𝐋𝐞𝐧𝐬 are pairs of
objects from some category 𝒞, and a hom-set from one such pair ⟨s, t⟩ to another ⟨a, b⟩ is given
by the coend formula:

ℒ⟨s, t⟩⟨a, b⟩ = ∫^c 𝒞(s, c × a) × 𝒞(c × b, t)

Notice that the pair of hom-sets in this formula can be seen as a single hom-set in the product
category 𝒞^op × 𝒞:

ℒ⟨s, t⟩⟨a, b⟩ = ∫^c (𝒞^op × 𝒞)(c ∙ ⟨a, b⟩, ⟨s, t⟩)

where we define the action of c on a pair ⟨a, b⟩ as:

c ∙ ⟨a, b⟩ = ⟨c × a, c × b⟩

This is a shorthand notation for the diagonal part of a more general action of 𝒞^op × 𝒞 on itself,
given by:

⟨c, c′⟩ ∙ ⟨a, b⟩ = ⟨c × a, c′ × b⟩

This suggests that, to represent such optics, we should be looking at co-presheaves on the
category 𝒞^op × 𝒞, that is, we should be considering profunctor representations.

Iso
As a quick check of this idea, let’s apply our reconstruction formula to the simple case of 𝒯 =
[𝒞^op × 𝒞, 𝐒𝐞𝐭] with no additional structure. In that case we don’t need the forgetful functors
or the monad Φ, and we just get a straightforward application of Tannakian reconstruction:

∫_{P∶𝒯} 𝐒𝐞𝐭(P⟨a, b⟩, P⟨s, t⟩) ≅ ((𝒞^op × 𝒞)(⟨a, b⟩, −))⟨s, t⟩

The right-hand side evaluates to:

(𝒞^op × 𝒞)(⟨a, b⟩, ⟨s, t⟩) = 𝒞(s, a) × 𝒞(b, t)

This optic is known in Haskell as Iso (or an adapter):

type Iso s t a b = (s -> a, b -> t)
and it, indeed, has a profunctor representation corresponding to the following end:
type IsoP s t a b = forall p. Profunctor p => p a b -> p s t
Given a pair of functions it’s easy to construct this profunctor-polymorphic function:
toIsoP :: (s -> a, b -> t) -> IsoP s t a b
toIsoP (f, g) = dimap f g
This is simply saying that any profunctor can be used to lift a pair of functions.
Conversely, we may ask the question: How can a single polymorphic function map the set
𝑃 ⟨𝑎, 𝑏⟩ to the set 𝑃 ⟨𝑠, 𝑡⟩ for every profunctor imaginable? The only thing this function knows
about the profunctor is that it can lift a pair of functions. Therefore it must be a closure that
either contains or is able to produce a pair of functions (s -> a, b -> t).

Exercise 18.2.1. Implement the function:

fromIsoP :: IsoP s t a b -> (s -> a, b -> t)

Hint: Use the fact that a pair of identity functions can be used to construct the following profunctor:

data Adapter a b s t = Ad (s -> a, b -> t)

Profunctors and lenses


Let’s try to apply the same logic to lenses. We have to find a class of profunctors to plug into
our profunctor representation. Let’s assume that the forgetful functor 𝑈 only forgets additional
structure but doesn’t change the sets, so the set 𝑃 ⟨𝑎, 𝑏⟩ is the same as the set (𝑈 𝑃 )⟨𝑎, 𝑏⟩.
Let’s start with the existential representation. We have at our disposal an object c and a pair
of functions:

⟨f, g⟩ ∶ 𝒞(s, c × a) × 𝒞(c × b, t)
We want to build a profunctor representation, so we have to be able to map the set P⟨a, b⟩
to the set P⟨s, t⟩. We could get P⟨s, t⟩ by lifting these two functions, but only if we start from
P⟨c × a, c × b⟩. Indeed:

P⟨f, g⟩ ∶ P⟨c × a, c × b⟩ → P⟨s, t⟩
What we are missing is the mapping:

𝑃 ⟨𝑎, 𝑏⟩ → 𝑃 ⟨𝑐 × 𝑎, 𝑐 × 𝑏⟩

And this is exactly the additional structure we shall demand from our profunctor class.

Tambara module
A profunctor 𝑃 equipped with the family of transformations:

𝛼⟨𝑎,𝑏⟩,𝑐 ∶ 𝑃 ⟨𝑎, 𝑏⟩ → 𝑃 ⟨𝑐 × 𝑎, 𝑐 × 𝑏⟩

is called a Tambara module.


We want this family to be natural in 𝑎 and 𝑏, but what should we demand from 𝑐? The
problem with 𝑐 is that it appears twice, once in the contravariant, and once in the covariant
position. So, if we want to interact nicely with arrows like ℎ ∶ 𝑐 → 𝑐 ′ , we have to modify the
naturality condition. We may consider a more general profunctor 𝑃 ⟨𝑐 ′ × 𝑎, 𝑐 × 𝑏⟩ and treat 𝛼 as
producing its diagonal elements, ones in which 𝑐 ′ is the same as 𝑐.
A transformation 𝛼 between diagonal parts of two profunctors 𝑃 and 𝑄 is called a dinatural
transformation (di-agonally natural) if the following diagram commutes for any 𝑓 ∶ 𝑐 → 𝑐 ′ :

            P⟨c′, c⟩
    P⟨f, c⟩ ↙       ↘ P⟨c′, f⟩
     P⟨c, c⟩          P⟨c′, c′⟩
       α_c │            │ α_{c′}
           ▾            ▾
     Q⟨c, c⟩          Q⟨c′, c′⟩
    Q⟨c, f⟩ ↘       ↙ Q⟨f, c′⟩
            Q⟨c, c′⟩

(I used the common shorthand 𝑃 ⟨𝑓 , 𝑐⟩, reminiscent of whiskering, for 𝑃 ⟨𝑓 , 𝑖𝑑𝑐 ⟩.)

In our case, the dinaturality condition simplifies to:

                P⟨a, b⟩
    α_{⟨a,b⟩,c} ↙       ↘ α_{⟨a,b⟩,c′}
  P⟨c × a, c × b⟩        P⟨c′ × a, c′ × b⟩
  P⟨c × a, f × b⟩ ↘    ↙ P⟨f × a, c′ × b⟩
          P⟨c × a, c′ × b⟩

(Here, again, P⟨c × a, f × b⟩ stands for P⟨id_{c×a}, f × id_b⟩, and similarly for the other arrow.)


There is one more consistency condition on Tambara modules: they must preserve the
monoidal structure. The action of multiplying by 𝑐 makes sense in a cartesian category: we
have to have a product for any pair of objects, and we want to have the terminal object to serve
as the unit of multiplication. Tambara modules have to respect unit and preserve multiplication.
For the unit (terminal object), we impose the following condition:

𝛼⟨𝑎,𝑏⟩,1 = 𝑖𝑑𝑃 ⟨𝑎,𝑏⟩

For multiplication, we have:

α_{⟨a,b⟩,c′×c} ≅ α_{⟨c×a,c×b⟩,c′} ∘ α_{⟨a,b⟩,c}

or, pictorially:

               α_{⟨a,b⟩,c′×c}
  P⟨a, b⟩ ────────────────▸ P⟨c′ × c × a, c′ × c × b⟩
     α_{⟨a,b⟩,c} ↘         ↗ α_{⟨c×a,c×b⟩,c′}
           P⟨c × a, c × b⟩

(Notice that the product is associative up to isomorphism, so there is a hidden associator in this
diagram.)
Since we want Tambara modules to form a category, we have to define morphisms between
them. These are natural transformations that preserve the additional structure. Let’s say we have
a natural transformation 𝜌 between two Tambara modules 𝜌 ∶ (𝑃 , 𝛼) → (𝑄, 𝛽). We can either
apply 𝛼 and then 𝜌, or do 𝜌 first and then 𝛽. We want the result to be the same:

  P⟨a, b⟩ ──α_{⟨a,b⟩,c}──▸ P⟨c × a, c × b⟩
     │ρ_{⟨a,b⟩}                 │ρ_{⟨c×a,c×b⟩}
     ▾                          ▾
  Q⟨a, b⟩ ──β_{⟨a,b⟩,c}──▸ Q⟨c × a, c × b⟩

Keep in mind that the structure of the Tambara category is encoded in these natural transforma-
tions. They will determine, through the wedge condition, the shape of the ends that enter the
definition of profunctor lenses.

Profunctor lenses
Now that we have some intuition about how Tambara modules are related to lenses, let’s go back
to our main formula:
𝒪⟨𝑠, 𝑡⟩⟨𝑎, 𝑏⟩ = ∫𝑃∶𝒯 𝐒𝐞𝐭((𝑈𝑃)⟨𝑎, 𝑏⟩, (𝑈𝑃)⟨𝑠, 𝑡⟩) ≅ (Φ(𝒞ᵒᵖ × 𝒞)(⟨𝑎, 𝑏⟩, −))⟨𝑠, 𝑡⟩
This time we’re taking the end over the Tambara category. The only missing part is the monad
Φ = 𝑈 ◦𝐹 or the functor 𝐹 that freely generates Tambara modules.
It turns out that, instead of guessing the monad, it’s easier to guess the comonad. There
is a comonad in the category of profunctors that takes a profunctor 𝑃 and produces another
profunctor Θ𝑃 . Here’s the formula:

(Θ𝑃)⟨𝑎, 𝑏⟩ = ∫𝑐 𝑃⟨𝑐 × 𝑎, 𝑐 × 𝑏⟩
You can check that this is indeed a comonad by implementing 𝜀 and 𝛿 (extract and duplicate).
For instance, 𝜀 maps Θ𝑃 → 𝑃 using the projection 𝜋1 for the terminal object (the unit of carte-
sian product).
What’s interesting about this comonad is that its coalgebras are Tambara modules. Again,
these are coalgebras that map profunctors to profunctors. They are natural transformations 𝑃 →
Θ𝑃 . We can write such a natural transformation as an element of the end:
∫𝑎,𝑏 𝐒𝐞𝐭(𝑃⟨𝑎, 𝑏⟩, (Θ𝑃)⟨𝑎, 𝑏⟩) = ∫𝑎,𝑏 ∫𝑐 𝐒𝐞𝐭(𝑃⟨𝑎, 𝑏⟩, 𝑃⟨𝑐 × 𝑎, 𝑐 × 𝑏⟩)

I used the continuity of the hom-functor to pull out the end over 𝑐. The resulting end encodes a
set of natural (dinatural in 𝑐) transformations that define a Tambara module:

𝛼⟨𝑎,𝑏⟩,𝑐 ∶ 𝑃 ⟨𝑎, 𝑏⟩ → 𝑃 ⟨𝑐 × 𝑎, 𝑐 × 𝑏⟩

In fact, these coalgebras are comonad coalgebras, that is they are compatible with the comonad
Θ. In other words, Tambara modules form the Eilenberg-Moore category of coalgebras for the
comonad Θ.
The left adjoint to Θ is a monad Φ given by the formula:
(Φ𝑃)⟨𝑠, 𝑡⟩ = ∫^{𝑢,𝑣,𝑐} (𝒞ᵒᵖ × 𝒞)(𝑐 ∙ ⟨𝑢, 𝑣⟩, ⟨𝑠, 𝑡⟩) × 𝑃⟨𝑢, 𝑣⟩

where I used the shorthand notation:
(𝒞ᵒᵖ × 𝒞)(𝑐 ∙ ⟨𝑢, 𝑣⟩, ⟨𝑠, 𝑡⟩) = 𝒞(𝑠, 𝑐 × 𝑢) × 𝒞(𝑐 × 𝑣, 𝑡)

This adjunction can be easily verified using some end/coend manipulations: The mapping
out of Φ𝑃 to some profunctor 𝑄 can be written as an end. The coends in Φ can then be taken
out using co-continuity of the hom-functor. Finally, applying the ninja-Yoneda lemma produces
the mapping into Θ𝑄. We get:

[𝒞ᵒᵖ × 𝒞, 𝐒𝐞𝐭](Φ𝑃, 𝑄) ≅ [𝒞ᵒᵖ × 𝒞, 𝐒𝐞𝐭](𝑃, Θ𝑄)

Replacing 𝑄 with 𝑃 we immediately see that the set of algebras for Φ is isomorphic to the set
of coalgebras for Θ. In fact they are monad algebras for Φ. This means that the Eilenberg-Moore
category for the monad Φ is the same as the Tambara category.
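In Haskell, we can sketch this free construction by turning the coends into existential type variables. This is a sketch under the 𝐒𝐞𝐭-based reading of the formula for Φ; the type and constructor names are mine, not from any library:

{-# LANGUAGE GADTs #-}

-- The free Tambara module on a profunctor p. The coend variables
-- u, v, and c are hidden as existentials by the GADT constructor.
data FreeTambara p s t where
  FreeTambara :: (s -> (c, u))    -- corresponds to C(s, c × u)
              -> ((c, v) -> t)    -- corresponds to C(c × v, t)
              -> p u v            -- corresponds to P<u, v>
              -> FreeTambara p s t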

Recall that the Eilenberg-Moore construction factorizes a monad into a free/forgetful ad-
junction. This is exactly the adjunction we were looking for when deriving the formula for
profunctor optics.
What remains is to evaluate the action of Φ on the representable functor:

(Φ(𝒞ᵒᵖ × 𝒞)(⟨𝑎, 𝑏⟩, −))⟨𝑠, 𝑡⟩ = ∫^{𝑢,𝑣,𝑐} (𝒞ᵒᵖ × 𝒞)(𝑐 ∙ ⟨𝑢, 𝑣⟩, ⟨𝑠, 𝑡⟩) × (𝒞ᵒᵖ × 𝒞)(⟨𝑎, 𝑏⟩, ⟨𝑢, 𝑣⟩)

Applying the co-Yoneda lemma, we get:


∫^𝑐 (𝒞ᵒᵖ × 𝒞)(𝑐 ∙ ⟨𝑎, 𝑏⟩, ⟨𝑠, 𝑡⟩) = ∫^𝑐 𝒞(𝑠, 𝑐 × 𝑎) × 𝒞(𝑐 × 𝑏, 𝑡)

which is exactly the existential representation of the lens.

Profunctor lenses in Haskell


To define profunctor representation in Haskell we introduce a class of profunctors that are Tam-
bara modules with respect to cartesian product (we’ll see more general Tambara modules later).
In the Haskell library this class is called Strong. It also appears in the literature as Cartesian:
class Profunctor p => Cartesian p where
alpha :: p a b -> p (c, a) (c, b)
The polymorphic function alpha has all the relevant naturality properties guaranteed by para-
metric polymorphism.
The profunctor lens is just a type synonym for a function type that is polymorphic in Cartesian
profunctors:
type LensP s t a b = forall p. Cartesian p => p a b -> p s t
The easiest way to implement such a function is to start from the existential representation
of a lens and apply alpha followed by dimap to the profunctor argument:
toLensP :: LensE s t a b -> LensP s t a b
toLensP (LensE from to) = dimap from to . alpha
Because profunctor lenses are just functions, we can compose them as such:
lens1 :: LensP s t x y
-- p s t -> p x y
lens2 :: LensP x y a b
-- p x y -> p a b
lens3 :: LensP s t a b
-- p s t -> p a b
lens3 = lens2 . lens1
The converse mapping from a profunctor representation to the set/get representation of the
lens is also possible. For that we need to guess the profunctor that we can feed into LensP. It
turns out that the get/set representation of the lens is such a profunctor, when we fix the pair of
types a and b. We define:
data FlipLens a b s t = FlipLens (s -> a) (s -> b -> t)
It’s easy to show that it’s indeed a profunctor:

instance Profunctor (FlipLens a b) where
  dimap f g (FlipLens get set) = FlipLens (get . f) (fmap g . set . f)
Not only that—it is also a Cartesian profunctor:
instance Cartesian (FlipLens a b) where
  alpha (FlipLens get set) = FlipLens get' set'
    where get' = get . snd
          set' = \(x, s) b -> (x, set s b)
We can now initialize FlipLens with a trivial pair of a getter and a setter and feed it to our
profunctor representation:
fromLensP :: LensP s t a b -> (s -> a, s -> b -> t)
fromLensP pp = (get', set')
where FlipLens get' set' = pp (FlipLens id (\s b -> b))
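As a quick sanity check, here is a sketch of a round trip through the profunctor representation. It assumes the LensE constructor from the existential representation introduced earlier, with the shape LensE :: (s -> (c, a)) -> ((c, b) -> t) -> LensE s t a b:

-- A lens focusing on the first component of a pair;
-- the residue c is the second component.
pairLens :: LensE (a, c) (b, c) a b
pairLens = LensE (\(a, c) -> (c, a)) (\(c, b) -> (b, c))

-- Convert to the profunctor representation and back to a getter/setter pair.
pairGetSet :: ((a, c) -> a, (a, c) -> b -> (b, c))
pairGetSet = fromLensP (toLensP pairLens)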

18.3 General Optics


Tambara modules were originally defined for an arbitrary monoidal category (in fact, they were first defined for a category enriched over vector spaces) with a tensor
product ⊗ and a unit object 𝐼. Their structure maps have the form:

𝛼⟨𝑎,𝑏⟩,𝑐 ∶ 𝑃 ⟨𝑎, 𝑏⟩ → 𝑃 ⟨𝑐 ⊗ 𝑎, 𝑐 ⊗ 𝑏⟩

You can easily convince yourself that all coherency laws translate directly to this case, and the
derivation of profunctor optics works without a change.

Prisms
From the programming point of view there are two obvious monoidal structures to explore: the
product and the sum. We’ve seen that the product gives rise to lenses. The sum, or the coproduct,
gives rise to prisms.
We get the existential representation simply by replacing the product by the sum in the
definition of a lens:

𝒪⟨𝑠, 𝑡⟩⟨𝑎, 𝑏⟩ = ∫^𝑐 𝒞(𝑠, 𝑐 + 𝑎) × 𝒞(𝑐 + 𝑏, 𝑡)

To simplify this, notice that the mapping out of a sum is equivalent to the product of mappings:
∫^𝑐 𝒞(𝑠, 𝑐 + 𝑎) × 𝒞(𝑐 + 𝑏, 𝑡) ≅ ∫^𝑐 𝒞(𝑠, 𝑐 + 𝑎) × 𝒞(𝑐, 𝑡) × 𝒞(𝑏, 𝑡)
Using the co-Yoneda lemma, we can get rid of the coend to get:

(𝑠, 𝑡 + 𝑎) × (𝑏, 𝑡)

In Haskell, this defines a pair of functions:


match :: s -> Either t a
build :: b -> t
To understand this, let’s first translate the existential form of the prism:


data Prism s t a b where
  Prism :: (s -> Either c a) -> (Either c b -> t) -> Prism s t a b
Here s either contains the focus a or the residue c. Conversely, t can be built either from the
new focus b, or from the residue c.
This logic is reflected in the conversion functions:
toMatch :: Prism s t a b -> (s -> Either t a)
toMatch (Prism from to) s =
case from s of
Left c -> Left (to (Left c))
Right a -> Right a

toBuild :: Prism s t a b -> (b -> t)
toBuild (Prism from to) b = to (Right b)

toPrism :: (s -> Either t a) -> (b -> t) -> Prism s t a b
toPrism match build = Prism from to
  where
    from = match
    to (Left c) = c
    to (Right b) = build b

The profunctor representation of the prism is almost identical to that of the lens, except for
swapping the product for the sum.
The class of Tambara modules for the sum type is called Choice in the Haskell library, or
Cocartesian in the literature:
class Profunctor p => Cocartesian p where
alpha' :: p a b -> p (Either c a) (Either c b)
The profunctor representation is a polymorphic function type:
type PrismP s t a b = forall p. Cocartesian p => p a b -> p s t

The conversion from the existential prism is virtually identical to that of the lens:
toPrismP :: Prism s t a b -> PrismP s t a b
toPrismP (Prism from to) = dimap from to . alpha'

Again, profunctor prisms compose using function composition.
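For a concrete example, here is a sketch of the prism that focuses on the Just constructor of Maybe (the name _just follows the convention of lens libraries; the definition uses only the pieces defined above):

-- The residue is Maybe b: the Nothing case is passed through unchanged.
_just :: Prism (Maybe a) (Maybe b) a b
_just = toPrism match build
  where
    match (Just a) = Right a
    match Nothing  = Left Nothing
    build          = Just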

Traversals
A traversal is a type of optic that focuses on multiple foci at once. Imagine, for instance, that
you have a tree that can have zero or more leaves of type a. A traversal should be able to get you
a list of those nodes. It should also let you replace these nodes with a new list. And here’s the
problem: the length of the list that delivers the replacements must match the number of nodes,
otherwise bad things happen.
A type-safe implementation of a traversal would require us to keep track of the sizes of lists.
In other words, it would require dependent types.
In Haskell, a (non-type-safe) traversal is often written as:

type Traversal s t a b = s -> ([b] -> t, [a])


with the understanding that the sizes of the two lists are determined by s and must be the same.
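For instance, here is a minimal sketch of such a traversal for a homogeneous pair (the name is mine):

-- Traverse both components of a pair. The rebuild function expects
-- exactly two replacements; the incomplete pattern match is where
-- the lack of type safety shows.
pairTraversal :: Traversal (a, a) (b, b) a b
pairTraversal (x, y) = (\[x', y'] -> (x', y'), [x, y])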
When translating traversals to categorical language, we’ll express this condition using a sum
over the sizes of the list. A counted list of size 𝑛 is an 𝑛-tuple, or an element of 𝑎ⁿ, so we can
write:

𝐓𝐫⟨𝑠, 𝑡⟩⟨𝑎, 𝑏⟩ = 𝐒𝐞𝐭(𝑠, ∑𝑛 (𝐒𝐞𝐭(𝑏ⁿ, 𝑡) × 𝑎ⁿ))
We interpret a traversal as a function that, given a source 𝑠, produces an existential type that is hiding an 𝑛. It says that there exists an 𝑛 and a pair consisting of a function 𝑏ⁿ → 𝑡 and an 𝑛-tuple 𝑎ⁿ.
The existential form of a traversal must take into account the fact that the residues for dif-
ferent 𝑛’s will have, in principle, different types. For instance, you can decompose a tree into
an 𝑛-tuple of leaves 𝑎ⁿ and the residue 𝑐𝑛 with 𝑛 holes. So the correct existential representation
for a traversal must involve a coend over all sequences 𝑐𝑛 that are indexed by natural numbers:

𝐓𝐫⟨𝑠, 𝑡⟩⟨𝑎, 𝑏⟩ = ∫^{𝑐𝑛} 𝒞(𝑠, ∑𝑚 𝑐𝑚 × 𝑎ᵐ) × 𝒞(∑𝑘 𝑐𝑘 × 𝑏ᵏ, 𝑡)

The sums here are coproducts in .


One way to look at sequences 𝑐𝑛 is to interpret them as fibrations. For instance, in 𝐒𝐞𝐭 we
would start with a set 𝐶 and a projection 𝑝 ∶ 𝐶 → ℕ, where ℕ is a set of natural numbers.
Similarly 𝑎ⁿ could be interpreted as a fibration of the free monoid on 𝑎 (the set of lists of 𝑎’s)
with the projection that extracts the length of the list.
Or we can look at the 𝑐𝑛’s as mappings from the set of natural numbers to 𝒞. In fact, we can treat the natural numbers as a discrete category ℕ, in which case the 𝑐𝑛’s are functors ℕ → 𝒞:

𝐓𝐫⟨𝑠, 𝑡⟩⟨𝑎, 𝑏⟩ = ∫^{𝑐∶[ℕ,𝒞]} 𝒞(𝑠, ∑𝑚 𝑐𝑚 × 𝑎ᵐ) × 𝒞(∑𝑘 𝑐𝑘 × 𝑏ᵏ, 𝑡)
To show the equivalence of the two representations, we first rewrite the mapping out of a
sum as a product of mappings:
𝑐 ∶ [ ,] ∑ ∏
(𝑠, 𝑐𝑚 × 𝑎𝑚 ) × (𝑐𝑘 × 𝑏𝑘 , 𝑡)
∫ 𝑚 𝑘

and then use the currying adjunction:


𝑐 ∶ [ ,] ∑ ∏
(𝑠, 𝑐𝑚 × 𝑎𝑚 ) × (𝑐𝑘 , [𝑏𝑘 , 𝑡])
∫ 𝑚 𝑘
𝑘
Here, [𝑏𝑘 , 𝑡] is the internal hom, which is an alternative notation for the exponential object 𝑡𝑏 .
The next step is to recognize that a product in this formula represents a set of natural trans-
formations in [ , ]. Indeed, we could write it as an end:

∏𝑘 𝒞(𝑐𝑘, [𝑏ᵏ, 𝑡]) ≅ ∫𝑘∶ℕ 𝒞(𝑐𝑘, [𝑏ᵏ, 𝑡])

This is because an end over a discrete category is just a product. Alternatively, we could write
it as a hom-set in the functor category:
[ℕ, 𝒞](𝑐₋, [𝑏⁻, 𝑡])

with placeholders replacing the arguments to the two functors in question:

𝑘 ↦ 𝑐𝑘

𝑘 ↦ [𝑏ᵏ, 𝑡]

We can now use the co-Yoneda lemma in the functor category [ℕ, 𝒞]:

𝑐 ∶ [ ,] ∑ ( ) ∑
(𝑠, 𝑐𝑚 × 𝑎𝑚 ) × [ , ] 𝑐− , [𝑏− , 𝑡] ≅ (𝑠, [𝑏𝑚 , 𝑡] × 𝑎𝑚 )
∫ 𝑚 𝑚

This result is more general than our original formula, but it turns into it when restricted to the
category of sets.
To derive a profunctor representation for traversals, we should look more closely at the kind
of transformations that are involved. We define the action of a functor 𝑐 ∶ [ℕ, 𝒞] on 𝑎 as:

𝑐 ∙ 𝑎 = ∑𝑚 𝑐𝑚 × 𝑎ᵐ

These actions can be composed by expanding the formula using distributivity laws:
𝑐 ∙ (𝑐′ ∙ 𝑎) = ∑𝑚 𝑐𝑚 × (∑𝑛 𝑐′𝑛 × 𝑎ⁿ)ᵐ

If the target category is 𝐒𝐞𝐭, this is equivalent to the following Day convolution (for non-𝐒𝐞𝐭
categories, one could use the enriched version of the Day convolution):
(𝑐 ⋆ 𝑐′)𝑘 = ∫^{𝑚,𝑛} ℕ(𝑚 + 𝑛, 𝑘) × 𝑐𝑚 × 𝑐′𝑛

This gives a monoidal structure to the category [ℕ, 𝒞].


The existential representation of traversals can be written in terms of the action of this monoidal category on 𝒞:

𝐓𝐫⟨𝑠, 𝑡⟩⟨𝑎, 𝑏⟩ = ∫^{𝑐∶[ℕ,𝒞]} 𝒞(𝑠, 𝑐 ∙ 𝑎) × 𝒞(𝑐 ∙ 𝑏, 𝑡)

To derive the profunctor representation of traversals, we have to generalize Tambara mod-


ules to the action of a monoidal category:

𝛼⟨𝑎,𝑏⟩,𝑐 ∶ 𝑃 ⟨𝑎, 𝑏⟩ → 𝑃 ⟨𝑐 ∙ 𝑎, 𝑐 ∙ 𝑏⟩

It turns out that the original derivation of profunctor optics still works for these generalized
Tambara modules, and traversals can be written as polymorphic functions:

𝐓𝐫⟨𝑠, 𝑡⟩⟨𝑎, 𝑏⟩ = ∫𝑃∶𝒯 𝐒𝐞𝐭((𝑈𝑃)⟨𝑎, 𝑏⟩, (𝑈𝑃)⟨𝑠, 𝑡⟩)

where the end is taken over a generalized Tambara module.
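As a rough Haskell sketch (the class and method names are mine), the generalized structure map can be captured by abstracting the action into a two-parameter type constructor. With act instantiated to (,) this specializes to Cartesian, and with act instantiated to Either it specializes to Cocartesian:

{-# LANGUAGE MultiParamTypeClasses #-}

class Profunctor p => TambaraGen act p where
  alphaG :: p a b -> p (act c a) (act c b)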



18.4 Mixed Optics


Whenever we have an action of a monoidal category ℳ on a category 𝒞, we can define the corresponding optic. A category with such an action is called an actegory. We can go even further by considering two separate actions. Suppose that ℳ can act on both 𝒞 and 𝒟. We’ll use the same notation for both actions:

∙ ∶ ℳ × 𝒞 → 𝒞
∙ ∶ ℳ × 𝒟 → 𝒟
We can then define the mixed optics as:
𝒪⟨𝑠, 𝑡⟩⟨𝑎, 𝑏⟩ = ∫^{𝑚∶ℳ} 𝒞(𝑠, 𝑚 ∙ 𝑎) × 𝒟(𝑚 ∙ 𝑏, 𝑡)

These mixed optics have profunctor representations in terms of profunctors:

𝑃 ∶  𝑜𝑝 ×  → 𝐒𝐞𝐭

and the corresponding Tambara modules that use two separate actions:

𝛼⟨𝑎,𝑏⟩,𝑚 ∶ 𝑃 ⟨𝑎, 𝑏⟩ → 𝑃 ⟨𝑚 ∙ 𝑎, 𝑚 ∙ 𝑏⟩

with 𝑎 an object of 𝒞, 𝑏 an object of 𝒟, and 𝑚 an object of ℳ.

Exercise 18.4.1. What are the mixed optics for the action of the cartesian product when one of the categories is the terminal category? What if the first category is 𝒞ᵒᵖ × 𝒞 and the second is terminal?
Chapter 19

Kan Extensions

If category theory keeps raising the levels of abstraction, it’s because it’s all about discovering patterns. Once patterns are discovered, it’s time to study patterns formed by these patterns, and so on.
We’ve seen the same recurring concepts described more and more tersely at higher and
higher levels of abstraction.
For instance, we first defined the product using a universal construction. Then we saw
that the spans in the definition of the product were natural transformations. That led to the
interpretation of the product as a limit. Then we saw that we could define it using adjunctions.
We were able to combine it with the coproduct in one terse formula:

(+) ⊣ Δ ⊣ (×)

Lao Tzu said: “If you want to shrink something, you must first allow it to expand.”
Kan extensions raise the level of abstraction even higher. Mac Lane said: “All concepts are
Kan extensions.”

19.1 Closed Monoidal Categories


We’ve seen how a function object can be defined as the right adjoint to the categorical product:

(𝑎 × 𝑏, 𝑐) ≅ (𝑎, [𝑏, 𝑐])

Here I used the alternative notation [𝑏, 𝑐] for the internal hom—the exponential 𝑐ᵇ.
An adjunction between two functors can be thought of as one being the pseudo-inverse of
the other. They don’t compose to identity, but their composition is related to the identity functor
through unit and counit. For instance, if you squint hard enough, the counit of the currying
adjunction:
𝜀𝑏𝑐 ∶ [𝑏, 𝑐] × 𝑏 → 𝑐
suggests that [𝑏, 𝑐] embodies, in a sense, the inverse of multiplication. It plays a similar role as
𝑐∕𝑏 in:
𝑐∕𝑏 × 𝑏 = 𝑐
In a typical categorical manner, we may ask the question: What if we replace the product
with something else? The obvious thing, replacing it with a coproduct, doesn’t work (thus we


have no analog of subtraction). But maybe there are other well-behaved binary operations that
have a right adjoint.
A natural setting for generalizing a product is a monoidal category with a tensor product ⊗
and a unit object 𝐼. If we have an adjunction:
(𝑎 ⊗ 𝑏, 𝑐) ≅ (𝑎, [𝑏, 𝑐])
we’ll call the category closed monoidal. In a typical categorical abuse of notation, unless it leads
to confusion, we’ll use the same symbol (a pair of square brackets) for the monoidal internal
hom as we did for the cartesian hom. There is an alternative lollipop notation for the right adjoint
to the tensor product:
(𝑎 ⊗ 𝑏, 𝑐) ≅ (𝑎, 𝑏 ⊸ 𝑐)
It is often used in the context of linear types.
The definition of the internal hom works well for a symmetric monoidal category. If the
tensor product is not symmetric, the adjunction defines a left closed monoidal category. The left
internal hom is adjoint to the “post-multiplication” functor (− ⊗ 𝑏). The right-closed structure
is defined as the right adjoint to the “pre-multiplication” functor (𝑏 ⊗ −). If both are defined
then the category is called bi-closed.

Internal hom for Day convolution


As an example, consider the symmetric monoidal structure in the category of co-presheaves
with Day convolution:
(𝐹 ⋆ 𝐺)𝑥 = ∫^{𝑎,𝑏} 𝒞(𝑎 ⊗ 𝑏, 𝑥) × 𝐹𝑎 × 𝐺𝑏

We are looking for the adjunction:
[𝒞, 𝐒𝐞𝐭](𝐹 ⋆ 𝐺, 𝐻) ≅ [𝒞, 𝐒𝐞𝐭](𝐹, [𝐺, 𝐻]Day)
The natural transformation on the left-hand side can be written as an end over 𝑥:
∫𝑥 𝐒𝐞𝐭(∫^{𝑎,𝑏} 𝒞(𝑎 ⊗ 𝑏, 𝑥) × 𝐹𝑎 × 𝐺𝑏, 𝐻𝑥)
We can use co-continuity to pull out the coends:
∫𝑥,𝑎,𝑏 𝐒𝐞𝐭(𝒞(𝑎 ⊗ 𝑏, 𝑥) × 𝐹𝑎 × 𝐺𝑏, 𝐻𝑥)
We can then use the currying adjunction in 𝐒𝐞𝐭 (the square brackets stand for the internal hom
in 𝐒𝐞𝐭):
∫𝑥,𝑎,𝑏 𝐒𝐞𝐭(𝐹𝑎, [𝒞(𝑎 ⊗ 𝑏, 𝑥) × 𝐺𝑏, 𝐻𝑥])
Finally, we use the continuity of the hom-set to move the two ends inside the hom-set:
∫𝑎 𝐒𝐞𝐭(𝐹𝑎, ∫𝑥,𝑏 [𝒞(𝑎 ⊗ 𝑏, 𝑥) × 𝐺𝑏, 𝐻𝑥])
We discover that the right adjoint to Day convolution is given by:
([𝐺, 𝐻]Day)𝑎 = ∫𝑥,𝑏 [𝒞(𝑎 ⊗ 𝑏, 𝑥), [𝐺𝑏, 𝐻𝑥]] ≅ ∫𝑏 [𝐺𝑏, 𝐻(𝑎 ⊗ 𝑏)]
The last transformation is the application of the Yoneda lemma in 𝐒𝐞𝐭.

Exercise 19.1.1. Implement the internal hom for Day convolution in Haskell. Hint: Use a type
alias.
Exercise 19.1.2. Implement witnesses to the adjunction:
ltor :: (forall a. Day f g a -> h a) -> (forall a. f a -> DayHom g h a)
rtol :: Functor h =>
(forall a. f a -> DayHom g h a) -> (forall a. Day f g a -> h a)

Powering and co-powering


In the category of sets, the internal hom (the function object, or the exponential) is isomorphic
to the external hom (the set of morphisms between two objects):

𝐶^𝐵 ≅ 𝐒𝐞𝐭(𝐵, 𝐶)

We can therefore rewrite the currying adjunction that defines the internal hom in 𝐒𝐞𝐭 as:
𝐒𝐞𝐭(𝐴 × 𝐵, 𝐶) ≅ 𝐒𝐞𝐭(𝐴, 𝐒𝐞𝐭(𝐵, 𝐶))

We can generalize this adjunction to the case where 𝐵 and 𝐶 are not sets but objects in some category 𝒞. The external hom in any category is always a set. But the left-hand side is no longer defined by a product. Instead it defines the action of a set 𝐴 on an object 𝑏:

𝒞(𝐴 ⋅ 𝑏, 𝑐) ≅ 𝐒𝐞𝐭(𝐴, 𝒞(𝑏, 𝑐))

that is called a co-power.


You may think of this action as adding together (taking a coproduct of) 𝐴 copies of 𝑏. For
instance, if 𝐴 is a two-element set 𝟐, we get:
𝒞(𝟐 ⋅ 𝑏, 𝑐) ≅ 𝐒𝐞𝐭(𝟐, 𝒞(𝑏, 𝑐)) ≅ 𝒞(𝑏, 𝑐) × 𝒞(𝑏, 𝑐) ≅ 𝒞(𝑏 + 𝑏, 𝑐)

In other words,
𝟐 ⋅ 𝑏 ≅ 𝑏 + 𝑏
In this sense a co-power defines multiplication in terms of iterated addition, the way we learned
it in school.
If we multiply 𝑏 by the hom-set 𝒞(𝑏, 𝑐) and take the coend over the 𝑏’s, the result is isomorphic to 𝑐:

∫^𝑏 𝒞(𝑏, 𝑐) ⋅ 𝑏 ≅ 𝑐

Indeed, the mappings to an arbitrary 𝑥 from both sides are isomorphic due to the Yoneda lemma:
𝒞(∫^𝑏 𝒞(𝑏, 𝑐) ⋅ 𝑏, 𝑥) ≅ ∫𝑏 𝐒𝐞𝐭(𝒞(𝑏, 𝑐), 𝒞(𝑏, 𝑥)) ≅ 𝒞(𝑐, 𝑥)
As expected, in 𝐒𝐞𝐭, the co-power decays to the cartesian product.
𝐒𝐞𝐭(𝐴 ⋅ 𝐵, 𝐶) ≅ 𝐒𝐞𝐭(𝐴, 𝐒𝐞𝐭(𝐵, 𝐶)) ≅ 𝐒𝐞𝐭(𝐴 × 𝐵, 𝐶)

Similarly, we can express powering as iterated multiplication. We use the same right-hand
side, but this time we use the mapping-in to define the power:
𝒞(𝑏, 𝐴 ⋔ 𝑐) ≅ 𝐒𝐞𝐭(𝐴, 𝒞(𝑏, 𝑐))

You may think of the power as multiplying together 𝐴 copies of 𝑐. Indeed, replacing 𝐴 with 𝟐
results in:
𝒞(𝑏, 𝟐 ⋔ 𝑐) ≅ 𝐒𝐞𝐭(𝟐, 𝒞(𝑏, 𝑐)) ≅ 𝒞(𝑏, 𝑐) × 𝒞(𝑏, 𝑐) ≅ 𝒞(𝑏, 𝑐 × 𝑐)
In other words:
𝟐 ⋔ 𝑐 ≅ 𝑐 × 𝑐
which is a fancy way of writing 𝑐².
If we power 𝑐 by the hom-set 𝒞(𝑐′, 𝑐) and take the end over all 𝑐’s, the result is isomorphic to 𝑐′:

∫𝑐 𝒞(𝑐′, 𝑐) ⋔ 𝑐 ≅ 𝑐′
This follows from the Yoneda lemma. Indeed the mappings from any 𝑥 to both sides are iso-
morphic:
𝒞(𝑥, ∫𝑐 𝒞(𝑐′, 𝑐) ⋔ 𝑐) ≅ ∫𝑐 𝐒𝐞𝐭(𝒞(𝑐′, 𝑐), 𝒞(𝑥, 𝑐)) ≅ 𝒞(𝑥, 𝑐′)
In 𝐒𝐞𝐭, the power decays to the exponential, which is isomorphic to the hom-set:

𝐴 ⋔ 𝐶 ≅ 𝐶^𝐴 ≅ 𝐒𝐞𝐭(𝐴, 𝐶)

This is the consequence of the symmetry of the product.

𝐒𝐞𝐭(𝐵, 𝐴 ⋔ 𝐶) ≅ 𝐒𝐞𝐭(𝐴, 𝐒𝐞𝐭(𝐵, 𝐶)) ≅ 𝐒𝐞𝐭(𝐴 × 𝐵, 𝐶)

≅ 𝐒𝐞𝐭(𝐵 × 𝐴, 𝐶) ≅ 𝐒𝐞𝐭(𝐵, 𝐒𝐞𝐭(𝐴, 𝐶))
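In Haskell, working in the category of sets, both constructions decay to familiar types. A sketch (the type synonyms are mine):

type Copower a b = (a, b)  -- A · b, the coproduct of A copies of b, is A × b in Set
type Power a c = a -> c    -- A ⋔ c, the product of A copies of c, is the exponential c^A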

19.2 Inverting a functor


One aspect of category theory is to discard information by performing lossy transformations;
the other is recovering the lost information. We’ve seen examples of making up for lost data
with free functors—the adjoints to forgetful functors. Kan extensions are another example. Both
make up for data that is lost by a functor that is not invertible.
There are two reasons why a functor might not be invertible. One is that it may map multiple
objects or arrows into a single object or arrow. In other words, it’s not injective on objects or
arrows. The other reason is that its image may not cover the whole target category. In other
words, it’s not surjective on objects or arrows.
Consider for instance an adjunction 𝐿 ⊣ 𝑅. Suppose that 𝑅 is not injective, and it collapses two objects 𝑐 and 𝑐′ into a single object 𝑑:

𝑅𝑐 = 𝑑
𝑅𝑐 ′ = 𝑑

𝐿 has no chance of undoing it. It can’t map 𝑑 to both 𝑐 and 𝑐 ′ at the same time. The best it can
do is to map 𝑑 to a “more general” object 𝐿𝑑 that has arrows to both 𝑐 and 𝑐 ′ . These arrows are
needed to define the components of the counit of the adjunction:

𝜀𝑐 ∶ 𝐿𝑑 → 𝑐
𝜀𝑐 ′ ∶ 𝐿𝑑 → 𝑐 ′

where 𝐿𝑑 is both 𝐿(𝑅𝑐) and 𝐿(𝑅𝑐′).

(The picture: 𝑅 maps both 𝑐 and 𝑐′ to 𝑑; going back, 𝐿 sends 𝑑 to 𝐿𝑑, which is connected to 𝑐 and 𝑐′ by the counit components 𝜀𝑐 and 𝜀𝑐′.)

Moreover, if 𝑅 is not surjective on objects, the functor 𝐿 must somehow be defined on those
objects of  that are not in the image of 𝑅. Again, naturality of the unit and counit will constrain
possible choices, as long as there are arrows connecting these objects to the image of 𝑅.
Obviously, all these constraints mean that an adjunction can only be defined in very special
cases.
Kan extensions are even weaker than adjunctions.
If adjoint functors work like inverses, Kan extensions work like fractions.
This is best seen if we redraw the diagrams defining the counit and the unit of an adjunction.
In the first diagram, 𝐿 seems to play the role of 1∕𝑅. In the second diagram 𝑅 pretends to be
1∕𝐿.

(These are the string diagrams for the counit 𝜀 ∶ 𝐿◦𝑅 → Id and the unit 𝜂 ∶ Id → 𝑅◦𝐿.)

The right Kan extension Ran𝑃 𝐹 and the left Kan extension Lan𝑃 𝐹 generalize these by
replacing the identity functor with some functor 𝐹 ∶ 𝒞 → 𝒟. The Kan extensions then play the
role of fractions 𝐹 ∕𝑃 . Conceptually, they undo the action of 𝑃 and follow it with the action of
𝐹.

(The string diagrams are analogous, with 𝐹 in place of Id: the counit 𝜀 ∶ (Ran𝑃 𝐹)◦𝑃 → 𝐹 and the unit 𝜂 ∶ 𝐹 → (Lan𝑃 𝐹)◦𝑃.)

Just like with adjunctions, the “undoing” is not complete. The composition Ran𝑃 𝐹 ◦𝑃
doesn’t reproduce 𝐹 ; instead it’s related to it through the natural transformation 𝜀 called the
counit. Similarly, the composition Lan𝑃 𝐹 ◦𝑃 is related to 𝐹 through the unit 𝜂.
Notice that the more information 𝐹 discards, the easier it is for Kan extensions to “invert” the functor 𝑃. In a sense, they only have to invert 𝑃 “modulo 𝐹”.
Here’s the intuition behind Kan extensions. We start with a functor 𝐹 ∶ 𝒞 → 𝒟.

There is a second functor 𝑃 ∶ 𝒞 → ℬ that squishes 𝒞 into another category ℬ. This embedding may be lossy and non-surjective. Our task is to somehow extend the definition of 𝐹 to cover the whole of ℬ.

In the ideal world we would like the triangle formed by 𝑃 and the extension Kan𝑃 𝐹 ∶ ℬ → 𝒟 to commute on the nose:

𝐹 = (Kan𝑃 𝐹)◦𝑃

But that would involve equality of functors, which is something we try to avoid at all cost.
The next best thing would be to ask for a natural isomorphism between the two paths through
this diagram. But even that seems like asking too much. So we finally settle down on demanding
that one path be deformable into another, meaning there is a one-way natural transformation
between them. The direction of this transformation distinguishes between right and left Kan
extensions.

19.3 Right Kan extension


The right Kan extension is a functor Ran𝑃 𝐹 equipped with a natural transformation 𝜀, called
the counit of the Kan extension, defined as:

𝜀 ∶ (Ran𝑃 𝐹 )◦𝑃 → 𝐹

(The triangle: 𝐹 ∶ 𝒞 → 𝒟, 𝑃 ∶ 𝒞 → ℬ, Ran𝑃 𝐹 ∶ ℬ → 𝒟, with 𝜀 filling it.)

The pair (Ran𝑃 𝐹, 𝜀) is universal among such pairs (𝐺, 𝛼), where 𝐺 is a functor 𝐺 ∶ ℬ → 𝒟 and 𝛼 is a natural transformation:

𝛼 ∶ 𝐺◦𝑃 → 𝐹

Universality means that for any such (𝐺, 𝛼) there is a unique natural transformation 𝜎 ∶ 𝐺 →
Ran𝑃 𝐹

which factorizes 𝛼, that is:
𝛼 = 𝜀 ⋅ (𝜎◦𝑃 )
This is a combination of vertical and horizontal compositions of natural transformations, in which 𝜎◦𝑃 is the whiskering of 𝜎.

If the right Kan extension along 𝑃 is defined for every functor 𝐹 , then the universal con-
struction can be generalized to an adjunction—this time it’s an adjunction between two functor
categories:
[𝒞, 𝒟](𝐺◦𝑃, 𝐹) ≅ [ℬ, 𝒟](𝐺, Ran𝑃 𝐹)
For every 𝛼 that is an element of the left-hand side, there is a unique 𝜎 that is an element of the
right-hand side.
In other words, the right Kan extension, if it exists for every 𝐹 , is the right adjoint to functor
pre-composition:
(−◦𝑃 ) ⊣ Ran𝑃
The component of the counit of this adjunction at 𝐹 is 𝜀.
This is somewhat reminiscent of the currying adjunction:

(𝑎 × 𝑏, 𝑐) ≅ (𝑎, [𝑏, 𝑐])

in which the product is replaced by functor composition. (The analogy is not perfect, since
composition can be considered a tensor product only in the category of endofunctors.)

Right Kan extension as an end


Recall the ninja Yoneda lemma:

𝐹𝑏 ≅ ∫𝑒 𝐒𝐞𝐭(𝒞(𝑏, 𝑒), 𝐹𝑒)

Here, 𝐹 is a co-presheaf, that is a functor from 𝒞 to 𝐒𝐞𝐭. The right Kan extension of 𝐹 along 𝑃 generalizes this formula:

(Ran𝑃 𝐹)𝑏 ≅ ∫𝑒 𝐒𝐞𝐭(ℬ(𝑏, 𝑃𝑒), 𝐹𝑒)

This works for a co-presheaf. In general we are interested in 𝐹 ∶ 𝒞 → 𝒟, so we need to replace the hom-set in 𝐒𝐞𝐭 by a power. Thus the right Kan extension is given by the following end (if it exists):

(Ran𝑃 𝐹)𝑏 ≅ ∫𝑒 ℬ(𝑏, 𝑃𝑒) ⋔ 𝐹𝑒
The proof essentially writes itself: at every step there is only one thing to do. We start with
the adjunction:
[𝒞, 𝒟](𝐺◦𝑃, 𝐹) ≅ [ℬ, 𝒟](𝐺, Ran𝑃 𝐹)

and rewrite it using ends:


∫𝑒 𝒟(𝐺(𝑃𝑒), 𝐹𝑒) ≅ ∫𝑏 𝒟(𝐺𝑏, (Ran𝑃 𝐹)𝑏)

We plug in our formula to get:


≅ ∫𝑏 𝒟(𝐺𝑏, ∫𝑒 ℬ(𝑏, 𝑃𝑒) ⋔ 𝐹𝑒)

We use the continuity of the hom-functor to pull the end to the front:
≅ ∫𝑏 ∫𝑒 𝒟(𝐺𝑏, ℬ(𝑏, 𝑃𝑒) ⋔ 𝐹𝑒)

Then we use the definition of power:


≅ ∫𝑏 ∫𝑒 𝐒𝐞𝐭(ℬ(𝑏, 𝑃𝑒), 𝒟(𝐺𝑏, 𝐹𝑒))

and apply the Yoneda lemma:


≅ ∫𝑒 𝒟(𝐺(𝑃𝑒), 𝐹𝑒)
This result is indeed the left-hand side of the adjunction.
If 𝐹 is a co-presheaf, the power in the formula for the right Kan extension decays to the
exponential/hom-set:
(Ran𝑃 𝐹)𝑏 ≅ ∫𝑒 𝐒𝐞𝐭(ℬ(𝑏, 𝑃𝑒), 𝐹𝑒)
Notice also that, if 𝑃 has a left adjoint, let’s call it 𝑃⁻¹, that is:

(𝑏, 𝑃 𝑒) ≅ (𝑃 −1 𝑏, 𝑒)

we could use the ninja Yoneda lemma to evaluate the end in:
(Ran𝑃 𝐹)𝑏 ≅ ∫𝑒 𝐒𝐞𝐭(ℬ(𝑏, 𝑃𝑒), 𝐹𝑒) ≅ ∫𝑒 𝐒𝐞𝐭(𝒞(𝑃⁻¹𝑏, 𝑒), 𝐹𝑒) ≅ 𝐹(𝑃⁻¹𝑏)

to get:
Ran𝑃 𝐹 ≅ 𝐹◦𝑃⁻¹
Since the adjunction is a weakening of the idea of an inverse, this result is in agreement with
the intuition that the Kan extension inverts 𝑃 and follows it with 𝐹 .

Right Kan extension in Haskell


The end formula for the right Kan extension can be immediately translated to Haskell:
newtype Ran p f b = Ran (forall e. (b -> p e) -> f e)
The counit 𝜀 of the right Kan extension is a natural transformation from the composition of
(Ran p f) after p to f:

counit :: forall p f e'. Ran p f (p e') -> f e'


To implement it, we have to produce a value of the type (f e') given a polymorphic function
h :: forall e. (p e' -> p e) -> f e
We do it by instantiating this function at the type e = e' and calling it with the identity on
(p e'):
counit (Ran h) = h id
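As an aside, instantiating 𝑃 to the identity functor turns Ran into the Yoneda encoding of f. A sketch, with Identity imported from Data.Functor.Identity (the helper names are mine):

import Data.Functor.Identity (Identity (..))

-- Ran Identity f b ≅ forall e. (b -> e) -> f e, the Yoneda encoding of f.
fromRanId :: Ran Identity f b -> f b
fromRanId (Ran h) = h Identity

toRanId :: Functor f => f b -> Ran Identity f b
toRanId fb = Ran (\k -> fmap (runIdentity . k) fb)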
The computational power of the right Kan extension comes from its universal property. We
start with a functor 𝐺 equipped with a natural transformation:

𝛼 ∶ 𝐺◦𝑃 → 𝐹

This can be expressed as a Haskell type synonym:
type Alpha p f g = forall e. g (p e) -> f e
Universality tells us that there is a unique natural transformation 𝜎 from this functor to the
corresponding right Kan extension:
sigma :: Functor g => Alpha p f g -> forall b. (g b -> Ran p f b)
sigma alpha gb = Ran (\b_pe -> alpha $ fmap b_pe gb)
that factorizes 𝛼 through the counit 𝜀:

𝛼 = 𝜀 ⋅ (𝜎◦𝑃 )

Recall that whiskering means that we instantiate sigma at b = p e. It is then followed by counit. The factorization of 𝛼 is thus implemented as:
factorize' :: Functor g => Alpha p f g -> forall e. g (p e) -> f e
factorize' alpha = counit . sigma alpha
The components of the three natural transformations 𝛼, 𝜀, and 𝜎◦𝑃 are all morphisms in the target category 𝒟.

Exercise 19.3.1. Implement the Functor instance for Ran.

Limits as Kan extensions


We have previously defined limits as universal cones. The definition of a cone involves two categories: the indexing category 𝒥 that defines the shape of the diagram, and the target category 𝒞. A diagram is a functor 𝐷 ∶ 𝒥 → 𝒞 that embeds the shape in the target category.
We can introduce a third category 𝟏: the terminal category that contains a single object and a single identity arrow. We can then use a functor 𝑋 from that category to pick the apex 𝑥 of the cone in 𝒞. Since 𝟏 is terminal in 𝐂𝐚𝐭, we also have the unique functor from 𝒥 to it, which we’ll call !. It maps all objects to the only object of 𝟏, and all arrows to its identity arrow.

It turns out that the limit of 𝐷 is the right Kan extension of the diagram 𝐷 along !. First, let’s observe that the composition 𝑋◦! maps the shape 𝒥 to a single object 𝑥, so it does the job of the constant functor Δ𝑥. It thus picks the apex of a cone. A cone with the apex 𝑥 is a natural transformation 𝛾 ∶ 𝑋◦! → 𝐷.

(The picture: on the left there are two categories, 𝟏 with a single object ∗, and 𝒥 with three objects forming the shape for the diagram; on the right there is the image of 𝐷, the objects 𝐷1, 𝐷2, 𝐷3, and the image of 𝑋◦!, which is the apex 𝑥. The three components 𝛾₁, 𝛾₂, 𝛾₃ connect the apex 𝑥 to the diagram. Naturality of 𝛾 ensures that the triangles that form the sides of the cone commute.)

The right Kan extension (Ran! 𝐷, 𝜀) is the universal such cone. Ran! 𝐷 is a functor from 𝟏 to 𝒞, so it selects an object in 𝒞. This is indeed the apex, Lim𝐷, of the universal cone.
Universality means that for any pair (𝑋, 𝛾) there is a natural transformation 𝜎 ∶ 𝑋 → Ran! 𝐷


which factorizes 𝛾.
The transformation 𝜎 has only one component 𝜎∗ , which is an arrow ℎ connecting the apex
𝑥 to the apex Lim𝐷. The factorization:

𝛾 = 𝜀 ⋅ (𝜎◦!)

reads, in components:
𝛾𝑖 = 𝜀𝑖 ◦ℎ

It makes the triangles commute: the apex 𝑥 is connected by ℎ to Lim𝐷, and for each 𝑖 the cone component 𝛾𝑖 ∶ 𝑥 → 𝐷𝑖 factors as 𝜀𝑖 ◦ ℎ.

This universal condition makes Lim𝐷 the limit of the diagram 𝐷.

Left adjoint as a right Kan extension


We started by describing Kan extensions as a generalization of adjunctions. Looking at the
pictures, if we have a pair of adjoint functors 𝐿 ⊣ 𝑅, we expect the left functor to be the right
Kan extension of the identity along the right functor.

𝐿 ≅ Ran𝑅 Id

Indeed, the counit of the Kan extension is the same as the counit of the adjunction:

𝜀 ∶ 𝐿◦𝑅 → Id

We also have to show universality: for any functor 𝐺 and natural transformation 𝛼 ∶ 𝐺◦𝑅 → Id there must be a unique 𝜎 ∶ 𝐺 → 𝐿 that factorizes 𝛼.

To do that, we have at our disposal the unit of the adjunction:

𝜂 ∶ Id → 𝑅◦𝐿

We construct 𝜎 as the composite:

𝐺 ≅ 𝐺◦Id --𝐺◦𝜂--> 𝐺◦𝑅◦𝐿 --𝛼◦𝐿--> Id◦𝐿 ≅ 𝐿

In other words, we define 𝜎 as:


𝜎 = (𝛼◦𝐿) ⋅ (𝐺◦𝜂)
We could ask the converse question: if Ran𝑅 Id exists, is it automatically the left adjoint to
𝑅? It turns out that we need one more condition for that: The Kan extension must be preserved
by 𝑅, that is:

𝑅◦Ran𝑅 Id ≅ Ran𝑅 𝑅
We’ll see in the next section that the right-hand side of this condition defines the codensity
monad.

Exercise 19.3.2. Show the factorization condition:

𝛼 = 𝜀 ⋅ (𝜎◦𝑅)

for the 𝜎 that was defined above. Hint: draw the corresponding string diagrams and use the
triangle identity for the adjunction.

Codensity monad
We’ve seen that every adjunction 𝐿 ⊣ 𝐹 produces a monad 𝐹 ◦𝐿. It turns out that this monad
is the right Kan extension of 𝐹 along 𝐹 . Interestingly, even if 𝐹 doesn’t have a left adjoint, the
Kan extension Ran𝐹 𝐹 is still a monad, called the codensity monad and denoted by 𝑇^𝐹:

𝑇^𝐹 = Ran𝐹 𝐹

If we were serious about the interpretation of Kan extensions as fractions, a codensity monad
would correspond to 𝐹∕𝐹. A functor for which this “fraction” is isomorphic to the identity is called codense.
To see that 𝑇^𝐹 is a monad, we have to define the monadic unit and multiplication:

𝜂 ∶ Id → 𝑇^𝐹
𝜇 ∶ 𝑇^𝐹◦𝑇^𝐹 → 𝑇^𝐹
Both follow from universality: for every pair (𝐺, 𝛼), with 𝛼 ∶ 𝐺◦𝐹 → 𝐹, there is a unique 𝜎 ∶ 𝐺 → 𝑇^𝐹 = Ran𝐹 𝐹.

To get the unit, we replace 𝐺 with the identity functor Id and 𝛼 with the identity natural
transformation.
To get multiplication, we replace 𝐺 with 𝑇^𝐹◦𝑇^𝐹 and note that we have at our disposal the counit of the Kan extension:

𝜀 ∶ 𝑇^𝐹◦𝐹 → 𝐹

We can choose 𝛼 of the type:

𝛼 ∶ 𝑇^𝐹◦𝑇^𝐹◦𝐹 → 𝐹

to be the composite:

𝑇^𝐹◦𝑇^𝐹◦𝐹 --𝑇^𝐹◦𝜀--> 𝑇^𝐹◦𝐹 --𝜀--> 𝐹

or, using the whiskering notation:

𝛼 = 𝜀 ⋅ (𝑇^𝐹◦𝜀)
The corresponding 𝜎 gives us the monadic multiplication.
Let’s now show that, if we start from an adjunction:

𝒟(𝐿𝑐, 𝑑) ≅ 𝒞(𝑐, 𝐹𝑑)

the codensity monad is given by 𝐹◦𝐿. Let’s start with the mapping from an arbitrary functor 𝐺 to 𝐹◦𝐿:

[𝒞, 𝒞](𝐺, 𝐹◦𝐿) ≅ ∫𝑐 𝒞(𝐺𝑐, 𝐹(𝐿𝑐))
We can rewrite it using the Yoneda lemma:
≅ ∫𝑐 ∫𝑑 𝐒𝐞𝐭(𝒟(𝐿𝑐, 𝑑), 𝒞(𝐺𝑐, 𝐹𝑑))
∫𝑐 ∫𝑑
Here, taking the end over 𝑑 has the effect of replacing 𝑑 with 𝐿𝑐. We can now use the adjunction:
≅ ∫𝑐 ∫𝑑 𝐒𝐞𝐭(𝒞(𝑐, 𝐹𝑑), 𝒞(𝐺𝑐, 𝐹𝑑))
∫𝑐 ∫𝑑
and perform the ninja-Yoneda integration over 𝑐 to get:

≅ ∫𝑑 𝒞(𝐺(𝐹𝑑), 𝐹𝑑)
This, in turn, defines a set of natural transformations:
≅ [𝒟, 𝒞](𝐺◦𝐹, 𝐹)
Pre-composition with 𝐹 is the left adjoint to the right Kan extension:
[𝒟, 𝒞](𝐺◦𝐹, 𝐹) ≅ [𝒞, 𝒞](𝐺, Ran𝐹 𝐹)
Since 𝐺 was arbitrary, we conclude that 𝐹 ◦𝐿 is indeed the codensity monad Ran𝐹 𝐹 .
Since every monad can be derived from some adjunction, it follows that every monad is a
codensity monad for some adjunction.

Codensity monad in Haskell


Translating the codensity monad to Haskell, we get:
newtype Codensity f c = C (forall d. (c -> f d) -> f d)
together with the extractor:
runCodensity :: Codensity f c -> forall d. (c -> f d) -> f d
runCodensity (C h) = h
This looks very similar to a continuation monad. In fact it turns into the continuation monad if we
choose f to be the identity functor. We can think of Codensity as taking a callback (c -> f d)
and calling it when the result of type c becomes available.
Here’s the monad instance:

instance Monad (Codensity f) where
  return x = C (\k -> k x)
  m >>= kl = C (\k -> runCodensity m (\a -> runCodensity (kl a) k))
Again, this is almost exactly like the continuation monad:
instance Monad (Cont r) where
  return x = Cont (\k -> k x)
  m >>= kl = Cont (\k -> runCont m (\a -> runCont (kl a) k))
This is why Codensity has the performance advantages of the continuation passing style. Since
it nests continuations “inside out,” it can be used to optimize long chains of binds that are pro-
duced by do blocks.
This property is especially important when working with free monads, which accumulate
binds in tree-like structures. When we finally interpret a free monad, these accumulated binds
require traversing the ever-growing tree. For every bind, the traversal starts at the root. Compare
this with the earlier example of reversing a list, which was optimized by accumulating functions
in a FIFO queue. The codensity monad offers the same kind of performance improvement.
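In practice the trick is to lift a monadic computation into Codensity, perform the binds there, and lower the result at the end. A minimal sketch (the helper names are mine; the kan-extensions library exports similar functions):

-- Embed a monadic value; subsequent binds are reassociated
-- by the continuation-passing representation.
liftCodensity :: Monad f => f c -> Codensity f c
liftCodensity m = C (m >>=)

-- Run the accumulated computation with the trivial continuation.
lowerCodensity :: Monad f => Codensity f c -> f c
lowerCodensity (C h) = h return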

Exercise 19.3.3. Implement the Functor instance for Codensity.

Exercise 19.3.4. Implement the Applicative instance for Codensity.

19.4 Left Kan extension


Just like the right Kan extension was defined as the right adjoint to functor pre-composition, the left Kan extension is defined as the left adjoint to functor pre-composition:

[ℬ, 𝒟](Lan𝑃 𝐹, 𝐺) ≅ [𝒞, 𝒟](𝐹, 𝐺◦𝑃)

(There are also adjoints to post-composition: they are called Kan lifts.)
Alternatively, Lan𝑃 𝐹 can be defined as a functor equipped with a natural transformation
called the unit:
𝜂 ∶ 𝐹 → Lan𝑃 𝐹 ◦𝑃
(The triangle: 𝐹 ∶ 𝒞 → 𝒟, 𝑃 ∶ 𝒞 → ℬ, Lan𝑃 𝐹 ∶ ℬ → 𝒟, with 𝜂 filling it.)

Notice that the direction of the unit of the left Kan extension is opposite of that of the counit of
the right Kan extension.
The pair (Lan𝑃 𝐹 , 𝜂) is universal, meaning that, for any other pair (𝐺, 𝛼), where

𝛼 ∶ 𝐹 → 𝐺◦𝑃


there is a unique mapping 𝜎 ∶ Lan𝑃 𝐹 → 𝐺


that factorizes 𝛼:
𝛼 = (𝜎◦𝑃 ) ⋅ 𝜂
Again, the direction of 𝜎 is reversed with respect to the right Kan extension.

This establishes the one-to-one mapping between two sets of natural transformations. For
every 𝛼 on the left there is a unique 𝜎 on the right:

[𝒞, 𝒟](𝐹, 𝐺◦𝑃) ≅ [ℬ, 𝒟](Lan𝑃 𝐹, 𝐺)

Left Kan extension as a coend


Recall the ninja co-Yoneda lemma. For every co-presheaf 𝐹, we have:

𝐹𝑏 ≅ ∫^𝑐 𝒞(𝑐, 𝑏) × 𝐹𝑐

The left Kan extension generalizes this formula to:

(Lan𝑃 𝐹)𝑏 ≅ ∫^𝑒 ℬ(𝑃𝑒, 𝑏) × 𝐹𝑒

For a general functor 𝐹 ∶ 𝒞 → 𝒟, we replace the product with the co-power:

(Lan𝑃 𝐹)𝑏 ≅ ∫^𝑒 ℬ(𝑃𝑒, 𝑏) ⋅ 𝐹𝑒

As long as the coend in question exists, we can prove this formula by considering a mapping
out to some functor 𝐺. We represent the set of natural transformations as the end over 𝑏:
∫𝑏 𝒟(∫^𝑒 ℬ(𝑃𝑒, 𝑏) ⋅ 𝐹𝑒, 𝐺𝑏)

Using cocontinuity, we pull out the coend, turning it into an end:


∫𝑏 ∫𝑒 𝒟(ℬ(𝑃𝑒, 𝑏) ⋅ 𝐹𝑒, 𝐺𝑏)

and we plug in the definition of co-power:


∫𝑏 ∫𝑒 𝐒𝐞𝐭(ℬ(𝑃𝑒, 𝑏), 𝒟(𝐹𝑒, 𝐺𝑏))

We can now use the Yoneda lemma to integrate over 𝑏, replacing 𝑏 with 𝑃 𝑒:
∫𝑒 𝒟(𝐹𝑒, 𝐺(𝑃𝑒)) ≅ [𝒞, 𝒟](𝐹, 𝐺◦𝑃)

This indeed gives us the left adjoint to functor pre-composition:

[ℬ, 𝒟](Lan𝑃 𝐹, 𝐺) ≅ [𝒞, 𝒟](𝐹, 𝐺◦𝑃)

In 𝐒𝐞𝐭, the co-power decays to a cartesian product, so we get a simpler formula:


(Lan𝑃 𝐹)𝑏 ≅ ∫^𝑒 ℬ(𝑃𝑒, 𝑏) × 𝐹𝑒

Notice that, if the functor 𝑃 has a right adjoint, let’s call it 𝑃⁻¹:

(𝑃 𝑒, 𝑏) ≅ (𝑒, 𝑃 −1 𝑏)

we can use the ninja co-Yoneda lemma to get:

(Lan𝑃 𝐹)𝑏 ≅ (𝐹◦𝑃⁻¹)𝑏

thus reinforcing the intuition that a Kan extension inverts 𝑃 and follows it with 𝐹 .

Left Kan extension in Haskell


When translating the formula for the left Kan extension to Haskell, we replace the coend with
the existential type. Symbolically:
type Lan p f b = exists e. (p e -> b, f e)
This is how we would encode the existential using GADT’s:
data Lan p f b where
Lan :: (p e -> b) -> f e -> Lan p f b
The unit of the left Kan extension is a natural transformation from the functor f to the
composition of (Lan p f) after p:
unit :: forall p f e'.
f e' -> Lan p f (p e')
To implement the unit, we start with a value of the type (f e'). We have to come up with some
type e, a function p e -> p e', and a value of the type (f e). The obvious choice is to pick
e = e' and use the identity at (p e'):

unit fe = Lan id fe
The computational power of the left Kan extension lies in its universal property. Given a
functor g and a natural transformation from f to the composition of g after p:
type Alpha p f g = forall e. f e -> g (p e)
there is a unique natural transformation 𝜎 from the corresponding left Kan extension to g:
sigma :: Functor g => Alpha p f g -> forall b. (Lan p f b -> g b)
sigma alpha (Lan pe_b fe) = fmap pe_b (alpha fe)
that factorizes 𝛼 through the unit 𝜂:

𝛼 = (𝜎◦𝑃 ) ⋅ 𝜂

The whiskering of 𝜎 means instantiating it at b = p e, so the factorization of 𝛼 is implemented


as:
factorize :: Functor g => Alpha p f g -> f e -> g (p e)
factorize alpha = sigma alpha . unit
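Dually to the earlier Ran example, instantiating 𝑃 to the identity functor turns Lan into the coYoneda encoding of f. A sketch, with Identity from Data.Functor.Identity (the helper names are mine):

import Data.Functor.Identity (Identity (..))

-- Lan Identity f b hides a pair (Identity e -> b, f e),
-- which is the coYoneda encoding of f.
fromLanId :: Functor f => Lan Identity f b -> f b
fromLanId (Lan pe_b fe) = fmap (pe_b . Identity) fe

toLanId :: f b -> Lan Identity f b
toLanId fb = Lan runIdentity fb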

Exercise 19.4.1. Implement the Functor instance for Lan.

Colimits as Kan extensions


Just like limits can be defined as right Kan extensions, colimits can be defined as left Kan extensions.
We start with an indexing category 𝒥 that defines the shape of the colimit. The functor 𝐷 ∶ 𝒥 → 𝒞 selects this shape in the target category 𝒞. The apex of the cocone is selected by a functor 𝑋 from the terminal single-object category 𝟏. The natural transformation 𝛾 ∶ 𝐷 → 𝑋◦! defines a cocone from 𝐷 to 𝑋.
Here’s an illustrative example: a simple shape consisting of three objects 1, 2, 3 and three morphisms (not counting identities) is mapped to 𝐷1, 𝐷2, 𝐷3 in 𝒞, and the components 𝛾₁, 𝛾₂, 𝛾₃ connect the diagram to the object 𝑥, the image of the single object ∗ under the functor 𝑋.
The colimit is the universal cocone, which is given by the left Kan extension of 𝐷 along the
functor !:
Colim 𝐷 = Lan! 𝐷

Right adjoint as a left Kan extension


We’ve seen that, when we have an adjunction 𝐿 ⊣ 𝑅, the left adjoint is related to the right Kan
extension. Dually, if the right adjoint exists, it can be expressed as the left Kan extension of the
identity functor:
𝑅 ≅ Lan𝐿 Id
Conversely, if the left Kan extension of the identity exists and is preserved by 𝐿:

𝐿◦Lan𝐿 Id ≅ Lan𝐿 𝐿

then Lan𝐿 Id is the right adjoint of 𝐿. The left Kan extension of 𝐿 along itself is called the density comonad.
density comonad.
The unit of this Kan extension is the same as the unit of the adjunction:

𝜂 ∶ Id → 𝑅◦𝐿

The proof of universality is analogous to the one for the right Kan extension.

Exercise 19.4.2. Implement the Comonad instance for the density comonad:
data Density f c where
D :: (f d -> c) -> f d -> Density f c

Day convolution as a Kan extension


We’ve seen Day convolution defined as a tensor product in the category of co-presheaves over
a monoidal category :
(𝐹 ⋆ 𝐺)𝑐 = ∫^{𝑎,𝑏} 𝒞(𝑎 ⊗ 𝑏, 𝑐) × 𝐹𝑎 × 𝐺𝑏

Co-presheaves, that is functors in [, 𝐒𝐞𝐭], can also be tensored using an external tensor product.
An external product of two objects, instead of producing an object in the same category, picks
an object in a different category. In our case, the product of two functors ends up in the category
of co-presheaves on  × :

̄ ∶ [, 𝐒𝐞𝐭] × [, 𝐒𝐞𝐭] → [ × , 𝐒𝐞𝐭]


The product of two co-presheaves acting on a pair of objects in  × , is given by the formula:

̄
(𝐹 ⊗𝐺)⟨𝑎, 𝑏⟩ = 𝐹 𝑎 × 𝐺𝑏

It turns out that Day convolution of two functors can be expressed as a left Kan extension
of their external product along the tensor product in :

𝐹 ⋆ 𝐺 ≅ Lan⊗(𝐹 ⊗̄ 𝐺)

Pictorially: 𝐹 ⊗̄ 𝐺 ∶ 𝒞 × 𝒞 → 𝐒𝐞𝐭 is extended along ⊗ ∶ 𝒞 × 𝒞 → 𝒞, yielding Lan⊗(𝐹 ⊗̄ 𝐺) ∶ 𝒞 → 𝐒𝐞𝐭.

Indeed, using the coend formula for the left Kan extension we get:
(Lan⊗(𝐹 ⊗̄ 𝐺))𝑐 ≅ ∫^{⟨𝑎,𝑏⟩} 𝒞(𝑎 ⊗ 𝑏, 𝑐) ⋅ (𝐹 ⊗̄ 𝐺)⟨𝑎, 𝑏⟩ ≅ ∫^{⟨𝑎,𝑏⟩} 𝒞(𝑎 ⊗ 𝑏, 𝑐) ⋅ (𝐹𝑎 × 𝐺𝑏)

Since the two functors are 𝐒𝐞𝐭-valued, the co-power decays into the cartesian product:

≅ ∫^{⟨𝑎,𝑏⟩} 𝒞(𝑎 ⊗ 𝑏, 𝑐) × 𝐹𝑎 × 𝐺𝑏

and reproduces the formula for Day convolution.

19.5 Useful Formulas


• Co-power:

𝒞(𝐴 ⋅ 𝑏, 𝑐) ≅ 𝐒𝐞𝐭(𝐴, 𝒞(𝑏, 𝑐))

• Power:

𝒞(𝑏, 𝐴 ⋔ 𝑐) ≅ 𝐒𝐞𝐭(𝐴, 𝒞(𝑏, 𝑐))

• Right Kan extension:

[𝒞, 𝒟](𝐺◦𝑃, 𝐹) ≅ [ℬ, 𝒟](𝐺, Ran𝑃 𝐹)

(Ran𝑃 𝐹)𝑏 ≅ ∫𝑒 ℬ(𝑏, 𝑃𝑒) ⋔ 𝐹𝑒

• Left Kan extension:

[ℬ, 𝒟](Lan𝑃 𝐹, 𝐺) ≅ [𝒞, 𝒟](𝐹, 𝐺◦𝑃)

(Lan𝑃 𝐹)𝑏 ≅ ∫^𝑒 ℬ(𝑃𝑒, 𝑏) ⋅ 𝐹𝑒

• Right Kan extension in 𝐒𝐞𝐭:

(Ran𝑃 𝐹)𝑏 ≅ ∫𝑒 𝐒𝐞𝐭(ℬ(𝑏, 𝑃𝑒), 𝐹𝑒)

• Left Kan extension in 𝐒𝐞𝐭:

(Lan𝑃 𝐹)𝑏 ≅ ∫^𝑒 ℬ(𝑃𝑒, 𝑏) × 𝐹𝑒

Chapter 20

Enrichment

Lao Tzu says: "To know you have enough is to be rich."

20.1 Enriched Categories


This might come as a surprise, but the Haskell definition of a Functor cannot be fully explained
without some background in enriched categories. In this chapter I’ll try to show that, at least
conceptually, enrichment is not a huge step from the ordinary category theory.
Additional motivation for studying enriched categories comes from the fact that a lot of liter-
ature, notably the website nLab, contains descriptions of concepts in most general terms, which
often means in terms of enriched categories. Most of the usual constructs can be translated
just by changing the vocabulary, replacing hom-sets with hom-objects and 𝐒𝐞𝐭 with a monoidal
category .
Some enriched concepts, like weighted limits and colimits, turn out to be powerful on their
own, to the extent that one might be tempted to replace Mac Lane’s adage, “All concepts are Kan extensions,” with “All concepts are weighted (co-)limits.”

Set-theoretical foundations
Category theory is very frugal at its foundations. But it (reluctantly) draws upon set theory. In
particular the idea of the hom-set, defined as a set of arrows between two objects, drags in set
theory as the prerequisite to category theory. Granted, arrows form a set only in a locally small
category, but that’s a small consolation, considering that dealing with things that are too big to
be sets requires even more theory.
It would be nice if category theory were able to bootstrap itself, for instance by replacing
hom-sets with more general objects. That’s exactly the idea behind enriched categories. These
hom-object, though, have to come from some other category that has hom-sets and, at some
point we have to fall back on set-theoretical foundations. Nevertheless, having the option of
replacing stuctureless hom-sets with something different expands our ability to model more
complex systems.
The main property of sets is that, unlike objects, they are not atomic: they have elements. In
category theory we sometimes talk about generalized elements, which are simply arrows point-
ing at an object; or global elements, which are arrows from the terminal object (or, sometimes,
from the monoidal unit 𝐼). But most importantly, sets define equality of elements.


Virtually all that we’ve learned about categories can be translated into the realm of enriched
categories. However, a lot of categorical reasoning involves commuting diagrams, which ex-
press the equality of arrows. In the enriched setting we don’t have arrows going between objects,
so all these constructions will have to be modified.

Hom-Objects
At first sight, replacing hom-sets with objects might seem like a step backward. After all, sets
have elements, while objects are formless blobs. However, the richness of hom-objects is en-
coded in the morphisms of the category they come from. Conceptually, the fact that sets are
structure-less means that there are lots of morphisms (functions) between them. Having fewer
morphisms often means having more structure.
The guiding principle in defining enriched categories is that we should be able to recover
ordinary category theory as a special case. After all hom-sets are objects in the category 𝐒𝐞𝐭.
In fact we’ve worked really hard to express properties of sets in terms of functions rather than
elements.
Having said that, the very definition of a category in terms of composition and identity
involves morphisms that are elements of hom-sets. So let’s first re-formulate the primitives of a
category without recourse to elements.
Composition of arrows can be defined in bulk as a function between hom-sets:

◦ ∶ 𝒞(𝑏, 𝑐) × 𝒞(𝑎, 𝑏) → 𝒞(𝑎, 𝑐)

Instead of talking about the identity arrow, we can use a function from the singleton set:

𝑗𝑎 ∶ 1 → 𝒞(𝑎, 𝑎)

This shows us that, if we want to replace the hom-sets 𝒞(𝑎, 𝑏) with objects from some category 𝒱, we have to be able to multiply these objects to define composition, and we need some kind of unit object to define identity. We could ask for 𝒱 to be cartesian but, in fact, a monoidal category works just fine. As we’ll see, the unit and associativity laws of a monoidal category translate directly to identity and associativity laws for composition.

Enriched Categories
Let  be a monoidal category with a tensor product ⊗, a unit object 𝐼, and the associator and
two unitors (as well as their inverses):

𝛼 ∶ (𝑎 ⊗ 𝑏) ⊗ 𝑐 → 𝑎 ⊗ (𝑏 ⊗ 𝑐)
𝜆∶ 𝐼 ⊗ 𝑎 → 𝑎
𝜌∶ 𝑎 ⊗ 𝐼 → 𝑎

A category  enriched over  has objects and, for any pair of objects 𝑎 and 𝑏, a hom-object
(𝑎, 𝑏). This hom-object is an object in . Composition is defined using arrows in :

◦ ∶ (𝑏, 𝑐) ⊗ (𝑎, 𝑏) → (𝑎, 𝑐)

Identity is defined by the arrow:

𝑗𝑎 ∶ 𝐼 → 𝒞(𝑎, 𝑎)

Associativity is expressed in terms of the associator in 𝒱. The two ways of going from (𝒞(𝑐, 𝑑) ⊗ 𝒞(𝑏, 𝑐)) ⊗ 𝒞(𝑎, 𝑏) to 𝒞(𝑎, 𝑑) must agree: composing twice directly, or first applying the associator 𝛼 and then composing twice:

◦ ∘ (◦ ⊗ 𝑖𝑑) = ◦ ∘ (𝑖𝑑 ⊗ ◦) ∘ 𝛼

Unit laws are expressed in terms of the unitors in 𝒱:

𝜆 = ◦ ∘ (𝑗𝑏 ⊗ 𝑖𝑑) ∶ 𝐼 ⊗ 𝒞(𝑎, 𝑏) → 𝒞(𝑎, 𝑏)
𝜌 = ◦ ∘ (𝑖𝑑 ⊗ 𝑗𝑎) ∶ 𝒞(𝑎, 𝑏) ⊗ 𝐼 → 𝒞(𝑎, 𝑏)

Notice that these are all diagrams in , where we do have arrows forming hom-sets. We still
fall back on set theory, but at a different level.
A category enriched over  is also called a -category. In what follows we’ll assume that the
enriching category is symmetric monoidal, so we can form opposite and product -categories.
The category  𝑜𝑝 opposite to a -category  is obtained by reversing hom-objects, that is:

 𝑜𝑝 (𝑎, 𝑏) = (𝑏, 𝑎)

Composition in the opposite category involves reversing the order of hom-objects, so it only
works if the tensor product is symmetric.
We can also define a tensor product of -categories; again, provided that  is symmetric.
The product of two -categories  ⊗ has, as objects, pairs of objects, one from each category.
The hom-objects between such pairs are defined to be tensor products:

(𝐶 ⊗ )(⟨𝑐, 𝑑⟩, ⟨𝑐 ′ , 𝑑 ′ ⟩) = (𝑐, 𝑐 ′ ) ⊗ (𝑑, 𝑑 ′ )

We need symmetry of the tensor product in order to define composition. Indeed, we need to swap the two hom-objects in the middle before we can apply the two available compositions:

◦ ∶ (𝒞(𝑐′, 𝑐″) ⊗ 𝒟(𝑑′, 𝑑″)) ⊗ (𝒞(𝑐, 𝑐′) ⊗ 𝒟(𝑑, 𝑑′)) → 𝒞(𝑐, 𝑐″) ⊗ 𝒟(𝑑, 𝑑″)

The identity arrow is the tensor product of two identities:

𝑗𝑐 ⊗ 𝑗𝑑 ∶ 𝐼 ⊗ 𝐼 → 𝒞(𝑐, 𝑐) ⊗ 𝒟(𝑑, 𝑑)

Exercise 20.1.1. Define composition and unit in the 𝒱-category 𝒞ᵒᵖ.

Exercise 20.1.2. Show that every 𝒱-category 𝒞 has an underlying ordinary category 𝒞₀ whose objects are the same, but whose hom-sets are given by the (monoidal global) elements of the hom-objects, that is, elements of 𝒱(𝐼, 𝒞(𝑎, 𝑏)).

Examples
Seen from this new perspective, the ordinary categories we’ve studied so far were trivially en-
riched over the monoidal category (𝐒𝐞𝐭, ×, 1), with the cartesian product as the tensor product,
and the singleton set as the unit.
Interestingly, a 2-category can be seen as enriched over 𝐂𝐚𝐭. Indeed, 1-cells in a 2-category
are themselves objects in another category. The 2-cells are just arrows in that category. In
particular the 2-category 𝐂𝐚𝐭 of small categories is enriched in itself. Its hom-objects are functor
categories, which are objects in 𝐂𝐚𝐭.

Preorders
Enrichment doesn’t always mean adding more stuff. Sometimes it looks more like impoverish-
ment, as is the case of enriching over a walking arrow category.
This category has just two objects which, for the purpose of this construction, we’ll call
False and True. There is a single arrow from False to True (not counting identity arrows),
which makes False the initial object and True the terminal one.
False --!--> True

(plus the identity arrows 𝑖𝑑False and 𝑖𝑑True)
To make this into a monoidal category, we define the tensor product, such that:
True ⊗ True = True
and all other combinations produce False. True is the monoidal unit, since:
True ⊗ 𝑥 = 𝑥
A category enriched over the monoidal walking arrow is called a preorder. A hom-object 𝒞(𝑎, 𝑏) between any two objects can be either False or True. We interpret True to mean that 𝑎 precedes 𝑏 in the preorder, which we write as 𝑎 ≤ 𝑏. False means that the two objects are unrelated.
The important property of composition, as defined by:

𝒞(𝑏, 𝑐) ⊗ 𝒞(𝑎, 𝑏) → 𝒞(𝑎, 𝑐)
is that, if both hom-objects on the left are True, then the right hand side must also be True. (It
can’t be False, because there is no arrow going from True to False.) In the preorder interpreta-
tion, it means that ≤ is transitive:
𝑏≤𝑐∧𝑎≤𝑏 ⟹ 𝑎≤𝑐
By the same reasoning, the existence of the identity arrow:

𝑗𝑎 ∶ True → 𝒞(𝑎, 𝑎)

means that 𝒞(𝑎, 𝑎) is always True. In the preorder interpretation, this means that ≤ is reflexive, 𝑎 ≤ 𝑎.
Notice that a preorder doesn’t preclude cycles and, in particular, it’s possible to have 𝑎 ≤ 𝑏
and 𝑏 ≤ 𝑎 without 𝑎 being equal to 𝑏.
A preorder may also be defined without resorting to enrichment as a thin category—a cat-
egory in which there is at most one arrow between any two objects.

Self-enrichment
Any cartesian closed category 𝒞 can be viewed as self-enriched. This is because every external hom-set 𝒞(𝑎, 𝑏) can be replaced by the internal hom 𝑏ᵃ (the object of arrows).
In fact every monoidal closed category 𝒱 is self-enriched. Recall that, in a monoidal closed category, we have the hom-functor adjunction:

𝒱(𝑎 ⊗ 𝑏, 𝑐) ≅ 𝒱(𝑎, [𝑏, 𝑐])

The counit of this adjunction works as the evaluation morphism:

𝜀𝑏𝑐 ∶ [𝑏, 𝑐] ⊗ 𝑏 → 𝑐

To define composition in this self-enriched category, we need an arrow:

◦ ∶ [𝑏, 𝑐] ⊗ [𝑎, 𝑏] → [𝑎, 𝑐]

The trick is to consider the whole hom-set at once and show that we can always pick a canonical
element in it. We start with the set:

𝒱([𝑏, 𝑐] ⊗ [𝑎, 𝑏], [𝑎, 𝑐])

We can use the adjunction to rewrite it as:

𝒱(([𝑏, 𝑐] ⊗ [𝑎, 𝑏]) ⊗ 𝑎, 𝑐)

All we have to do now is to pick an element of this hom-set. We do it by constructing the


following composite:

([𝑏, 𝑐] ⊗ [𝑎, 𝑏]) ⊗ 𝑎 --𝛼--> [𝑏, 𝑐] ⊗ ([𝑎, 𝑏] ⊗ 𝑎) --𝑖𝑑⊗𝜀𝑎𝑏--> [𝑏, 𝑐] ⊗ 𝑏 --𝜀𝑏𝑐--> 𝑐

We used the associator and the counit of the adjunction.
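In Haskell, read as self-enriched with the function type as the internal hom, this canonical element is ordinary function composition. A sketch:

-- The composition arrow between internal homs, [b, c] ⊗ [a, b] -> [a, c].
compInternal :: (b -> c, a -> b) -> (a -> c)
compInternal (g, f) = g . f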


We also need an arrow that defines the identity:

𝑗𝑎 ∶ 𝐼 → [𝑎, 𝑎]

Again, we can pick it as a member of the hom-set 𝒱(𝐼, [𝑎, 𝑎]). We use the adjunction:

𝒱(𝐼, [𝑎, 𝑎]) ≅ 𝒱(𝐼 ⊗ 𝑎, 𝑎)

We know that this hom-set contains the left unitor 𝜆, so we can use it to define 𝑗𝑎 .

20.2 𝒱-Functors
An ordinary functor maps objects to objects and arrows to arrows. Similarly, an enriched functor 𝐹 maps objects to objects but, instead of acting on individual arrows, it must map hom-objects to hom-objects. This is only possible if the hom-objects in the source category 𝒞 belong to the same category as the hom-objects in the target category 𝒟. In other words, both categories must be enriched over the same 𝒱. The action of 𝐹 on hom-objects is then defined using arrows in 𝒱:

𝐹𝑎𝑏 ∶ 𝒞(𝑎, 𝑏) → 𝒟(𝐹𝑎, 𝐹𝑏)

For clarity we specify the pair of objects in the subscript of 𝐹 .


A functor must preserve composition and identity. These conditions can be expressed as commuting diagrams in 𝒱:

𝐹𝑎𝑐 ∘ ◦ = ◦ ∘ (𝐹𝑏𝑐 ⊗ 𝐹𝑎𝑏) ∶ 𝒞(𝑏, 𝑐) ⊗ 𝒞(𝑎, 𝑏) → 𝒟(𝐹𝑎, 𝐹𝑐)
𝐹𝑎𝑎 ∘ 𝑗𝑎 = 𝑗𝐹𝑎 ∶ 𝐼 → 𝒟(𝐹𝑎, 𝐹𝑎)

Notice that I used the same symbol ◦ for two different compositions and the same 𝑗 for two
different identity mappings. Their meaning can be derived from the context.
As before, all diagrams are in the category .

The Hom-functor
The hom-functor in a category that is enriched over a monoidal closed category 𝒱 is an enriched functor:

Hom ∶ 𝒞ᵒᵖ ⊗ 𝒞 → 𝒱

Here, in order to define an enriched functor, we have to treat 𝒱 as self-enriched.
It’s clear how this functor works on (pairs of) objects:

Hom⟨𝑎, 𝑏⟩ = 𝒞(𝑎, 𝑏)

To define an enriched functor, we have to define the action of Hom on hom-objects. Here, the source category is 𝒞ᵒᵖ ⊗ 𝒞 and the target category is 𝒱, both enriched over 𝒱. Let’s consider a hom-object from ⟨𝑎, 𝑎′⟩ to ⟨𝑏, 𝑏′⟩. The action of the hom-functor on this hom-object is an arrow in 𝒱:

Hom⟨𝑎,𝑎′⟩⟨𝑏,𝑏′⟩ ∶ (𝒞ᵒᵖ ⊗ 𝒞)(⟨𝑎, 𝑎′⟩, ⟨𝑏, 𝑏′⟩) → [Hom⟨𝑎, 𝑎′⟩, Hom⟨𝑏, 𝑏′⟩]

By definition of the product category, the source is a tensor product of two hom-objects. The target is the internal hom in 𝒱. We are thus looking for an arrow:

𝒞(𝑏, 𝑎) ⊗ 𝒞(𝑎′, 𝑏′) → [𝒞(𝑎, 𝑎′), 𝒞(𝑏, 𝑏′)]

We can use the currying hom-functor adjunction to unpack the internal hom:
(𝒞(𝑏, 𝑎) ⊗ 𝒞(𝑎′, 𝑏′)) ⊗ 𝒞(𝑎, 𝑎′) → 𝒞(𝑏, 𝑏′)

We can construct this arrow by rearranging the product and applying the composition twice.
In the enriched setting, the closest we can get to defining an individual morphism from 𝑎 to 𝑏 is to use an arrow from the unit object. We define a (monoidal-global) element of a hom-object as a morphism in 𝒱:

𝑓 ∶ 𝐼 → 𝒞(𝑎, 𝑏)
We can define what it means to lift such an arrow using the hom-functor. For instance, keeping
the first argument constant, we’d define:

(𝑐, 𝑓 ) ∶ (𝑐, 𝑎) → 𝐶(𝑐, 𝑏)



as the composite:

𝒞(𝑐, 𝑎) --𝜆⁻¹--> 𝐼 ⊗ 𝒞(𝑐, 𝑎) --𝑓⊗𝑖𝑑--> 𝒞(𝑎, 𝑏) ⊗ 𝒞(𝑐, 𝑎) --◦--> 𝒞(𝑐, 𝑏)

Similarly, the contravariant lifting of 𝑓:

𝒞(𝑓, 𝑐) ∶ 𝒞(𝑏, 𝑐) → 𝒞(𝑎, 𝑐)

can be defined as:

𝒞(𝑏, 𝑐) --𝜌⁻¹--> 𝒞(𝑏, 𝑐) ⊗ 𝐼 --𝑖𝑑⊗𝑓--> 𝒞(𝑏, 𝑐) ⊗ 𝒞(𝑎, 𝑏) --◦--> 𝒞(𝑎, 𝑐)

A lot of the familiar constructions we’ve studied in ordinary category theory have their enriched counterparts, with products replaced by tensor products and 𝐒𝐞𝐭 replaced by 𝒱.

Exercise 20.2.1. What is a functor between two preorders?

Enriched co-presheaves
Co-presheaves, that is 𝐒𝐞𝐭-valued functors, play an important role in category theory, so it’s natural to ask what their counterparts are in the enriched setting. The generalization of a co-presheaf is a 𝒱-functor 𝒞 → 𝒱. This is only possible if 𝒱 can be made into a 𝒱-category, that is, when it’s monoidal closed.
An enriched co-presheaf maps objects of 𝒞 to objects of 𝒱 and it maps hom-objects of 𝒞 to internal homs of 𝒱:

𝐹𝑎𝑏 ∶ 𝒞(𝑎, 𝑏) → [𝐹𝑎, 𝐹𝑏]
In particular, the Hom-functor is an example of a 𝒱-valued 𝒱-functor:

Hom ∶ 𝒞ᵒᵖ ⊗ 𝒞 → 𝒱

The hom-functor is a special case of an enriched profunctor, which is defined as:

𝒞ᵒᵖ ⊗ 𝒟 → 𝒱

Exercise 20.2.2. The tensor product is a functor in 𝒱:

⊗ ∶ 𝒱 × 𝒱 → 𝒱

Show that if 𝒱 is monoidal closed, the tensor product defines a 𝒱-functor. Hint: Define its
action on internal homs.

Functorial strength and enrichment


When we were discussing monads, I mentioned an important property that made them work
in programming. The endofunctors that define monads must be strong, so that we can access
external contexts inside monadic code.
It turns out that the way we have defined endofunctors in Haskell makes them automatically
strong. The reason is that strength is related to enrichment and, as we’ve seen, a cartesian closed
category is self-enriched. Let’s start with some definitions.

Functorial strength for an endofunctor 𝐹 in a monoidal category is defined as a natural
transformation with components:

𝜎𝑎𝑏 ∶ 𝑎 ⊗ 𝐹 (𝑏) → 𝐹 (𝑎 ⊗ 𝑏)

There are some pretty obvious coherence conditions that make strength respect the properties
of the tensor product. This is the associativity condition:

(𝑎 ⊗ 𝑏) ⊗ 𝐹 (𝑐) ------𝜎(𝑎⊗𝑏)𝑐------> 𝐹 ((𝑎 ⊗ 𝑏) ⊗ 𝑐)
       |                                    |
       𝛼                                 𝐹 (𝛼)
       ↓                                    ↓
𝑎 ⊗ (𝑏 ⊗ 𝐹 (𝑐)) --𝑎⊗𝜎𝑏𝑐--> 𝑎 ⊗ 𝐹 (𝑏 ⊗ 𝑐) --𝜎𝑎(𝑏⊗𝑐)--> 𝐹 (𝑎 ⊗ (𝑏 ⊗ 𝑐))

and this is the unit condition:

𝐼 ⊗ 𝐹 (𝑎) --𝜎𝐼𝑎--> 𝐹 (𝐼 ⊗ 𝑎)
        \             /
       𝜆 \           / 𝐹 (𝜆)
          ↘         ↙
            𝐹 (𝑎)

In a general monoidal category this is called the left strength, and there is a corresponding
definition of the right strength. In a symmetric monoidal category, the two are equivalent.
An enriched endofunctor maps hom-objects to hom-objects:

𝐹𝑎𝑏 ∶ 𝒞(𝑎, 𝑏) → 𝒞(𝐹 𝑎, 𝐹 𝑏)

If we treat a monoidal closed category 𝒱 as self-enriched, the hom-objects are internal homs,
so an enriched endofunctor is equipped with the mapping:

𝐹𝑎𝑏 ∶ [𝑎, 𝑏] → [𝐹 𝑎, 𝐹 𝑏]

Compare this with our definition of a Haskell Functor:


class Functor f where
fmap :: (a -> b) -> (f a -> f b)
The function types involved in this definition, (a -> b) and (f a -> f b), are the internal
homs. So a Haskell Functor is indeed an enriched functor.
We don’t normally distinguish between external and internal homs in Haskell, since their
sets of elements are isomorphic. It’s a simple consequence of the currying adjunction:

𝒞(1 × 𝑏, 𝑐) ≅ 𝒞(1, [𝑏, 𝑐])

and the fact that the terminal object is the unit of the cartesian product.
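In Haskell, with the unit type () playing the role of the terminal object, this isomorphism is
witnessed by a pair of inverse functions; a sketch:

-- C(1 × b, c) ≅ C(1, [b, c]); both sides boil down to b -> c
fwd :: (((), b) -> c) -> (() -> (b -> c))
fwd f = \() b -> f ((), b)

bwd :: (() -> (b -> c)) -> (((), b) -> c)
bwd g = \((), b) -> g () b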
It turns out that in a self-enriched category 𝒱 every strong endofunctor is automatically
enriched. Indeed, to show that a functor 𝐹 is enriched we need to define the mapping between
internal homs, that is an element of the hom-set:

𝐹𝑎𝑏 ∈ 𝒱([𝑎, 𝑏], [𝐹 𝑎, 𝐹 𝑏])

Using the hom adjunction, this is isomorphic to:

𝒱([𝑎, 𝑏] ⊗ 𝐹 𝑎, 𝐹 𝑏)

We can construct this mapping by composing the strength and the counit of the adjunction (the
evaluation morphism):
[𝑎, 𝑏] ⊗ 𝐹 𝑎 --𝜎[𝑎,𝑏]𝑎--> 𝐹 ([𝑎, 𝑏] ⊗ 𝑎) --𝐹 𝜖𝑎𝑏--> 𝐹 𝑏
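For a Haskell Functor this composite is just the uncurried fmap; a sketch, with the strength
inlined:

-- Strength followed by the lifted evaluation:
enrichedMap :: Functor f => (a -> b, f a) -> f b
enrichedMap (g, fa) = fmap (\(h, x) -> h x) (fmap (\y -> (g, y)) fa)
-- which indeed simplifies to: enrichedMap (g, fa) = fmap g fa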

Conversely, every enriched endofunctor in 𝒱 is strong. To show strength, we need to define
the mapping 𝜎𝑎𝑏 , or equivalently (by the hom-adjunction):

𝑎 → [𝐹 𝑏, 𝐹 (𝑎 ⊗ 𝑏)]

Recall the definition of the unit of the hom adjunction, the coevaluation morphism:

𝜂𝑎𝑏 ∶ 𝑎 → [𝑏, 𝑎 ⊗ 𝑏]

We construct the following composite:


𝑎 --𝜂𝑎𝑏--> [𝑏, 𝑎 ⊗ 𝑏] --𝐹𝑏,𝑎⊗𝑏--> [𝐹 𝑏, 𝐹 (𝑎 ⊗ 𝑏)]

This can be translated directly to Haskell:


strength :: Functor f => (a, f b) -> f (a, b)
strength = uncurry (\a -> fmap (coeval a))
with the following definition of coeval:
coeval :: a -> (b -> (a, b))
coeval a = \b -> (a, b)
Since currying and evaluation are built into Haskell, we can further simplify this formula (the
tuple section (a,) requires the TupleSections extension):
strength :: Functor f => (a, f b) -> f (a, b)
strength (a, bs) = fmap (a,) bs
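For completeness, the right strength mentioned earlier can be written the same way; a sketch
(the name strengthR is ours):
strengthR :: Functor f => (f b, a) -> f (b, a)
strengthR (fb, a) = fmap (\b -> (b, a)) fb
In Haskell the symmetry of the pair type makes it interchangeable with the left strength.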

20.3 𝒱-Natural Transformations


An ordinary natural transformation between two functors 𝐹 and 𝐺 from 𝒞 to 𝒟 is a selection of
arrows from the hom-sets 𝒟(𝐹 𝑎, 𝐺𝑎). In the enriched setting, we don’t have arrows, so the next
best thing we can do is to use the unit object 𝐼 to do the selection. We define a component of a
𝒱-natural transformation at 𝑎 as an arrow:

𝜈𝑎 ∶ 𝐼 → 𝒟(𝐹 𝑎, 𝐺𝑎)

The naturality condition is a little tricky. The standard naturality square involves the lifting of
an arbitrary arrow 𝑓 ∶ 𝑎 → 𝑏 and the equality of the following compositions:

𝜈𝑏 ◦𝐹 𝑓 = 𝐺𝑓 ◦𝜈𝑎

Let’s consider the hom-sets that are involved in this equation. We are lifting a morphism
𝑓 ∈ 𝒞(𝑎, 𝑏). The composites on both sides of the equation are elements of 𝒟(𝐹 𝑎, 𝐺𝑏).
On the left, we have the arrow 𝜈𝑏 ◦𝐹 𝑓 . The composition itself is a mapping from the product
of two hom-sets:

𝒟(𝐹 𝑏, 𝐺𝑏) × 𝒟(𝐹 𝑎, 𝐹 𝑏) → 𝒟(𝐹 𝑎, 𝐺𝑏)

Similarly, on the right we have 𝐺𝑓 ◦𝜈𝑎 , which is a composition:

𝒟(𝐺𝑎, 𝐺𝑏) × 𝒟(𝐹 𝑎, 𝐺𝑎) → 𝒟(𝐹 𝑎, 𝐺𝑏)
In the enriched setting we have to work with hom-objects rather than hom-sets, and the
selection of the components of the natural transformation is done using the unit 𝐼. We can
always produce the unit out of thin air using the inverse of the left or the right unitor.
Altogether, the naturality condition is expressed as the following commuting diagram, which
equates two composites from 𝒞(𝑎, 𝑏) to 𝒟(𝐹 𝑎, 𝐺𝑏):

𝒞(𝑎, 𝑏) --𝜆⁻¹--> 𝐼 ⊗ 𝒞(𝑎, 𝑏) --𝜈𝑏 ⊗𝐹𝑎𝑏--> 𝒟(𝐹 𝑏, 𝐺𝑏) ⊗ 𝒟(𝐹 𝑎, 𝐹 𝑏) --◦--> 𝒟(𝐹 𝑎, 𝐺𝑏)

𝒞(𝑎, 𝑏) --𝜌⁻¹--> 𝒞(𝑎, 𝑏) ⊗ 𝐼 --𝐺𝑎𝑏 ⊗𝜈𝑎--> 𝒟(𝐺𝑎, 𝐺𝑏) ⊗ 𝒟(𝐹 𝑎, 𝐺𝑎) --◦--> 𝒟(𝐹 𝑎, 𝐺𝑏)
This also works for an ordinary category, where we can trace two paths through this diagram
by first picking an 𝑓 from 𝒞(𝑎, 𝑏). We can then use 𝜈𝑏 and 𝜈𝑎 to pick components of the natural
transformation. We also lift 𝑓 using either 𝐹 or 𝐺. Finally, we use composition to reproduce
the naturality equation.
This diagram can be further simplified if we use our earlier definition of the hom-functor’s
action on global elements of hom-objects. The components of a natural transformation are
defined as such global elements:
𝜈𝑎 ∶ 𝐼 → 𝒟(𝐹 𝑎, 𝐺𝑎)
There are two such liftings at our disposal:
𝒟(𝑑, 𝜈𝑏 ) ∶ 𝒟(𝑑, 𝐹 𝑏) → 𝒟(𝑑, 𝐺𝑏)
and:
𝒟(𝜈𝑎 , 𝑑) ∶ 𝒟(𝐺𝑎, 𝑑) → 𝒟(𝐹 𝑎, 𝑑)
We get something that looks more like the familiar naturality square, the two composites being
equal:

𝒞(𝑎, 𝑏) --𝐹𝑎𝑏--> 𝒟(𝐹 𝑎, 𝐹 𝑏) --𝒟(𝐹 𝑎,𝜈𝑏 )--> 𝒟(𝐹 𝑎, 𝐺𝑏)

𝒞(𝑎, 𝑏) --𝐺𝑎𝑏--> 𝒟(𝐺𝑎, 𝐺𝑏) --𝒟(𝜈𝑎 ,𝐺𝑏)--> 𝒟(𝐹 𝑎, 𝐺𝑏)
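In Haskell, where the category of types is self-enriched, the components are picked all at once
by a polymorphic function, and naturality comes for free by parametricity; a sketch (this needs
the RankNTypes extension):

type Nat f g = forall a. f a -> g a
-- naturality, for any nu :: Nat f g and h :: a -> b:
--   nu . fmap h == fmap h . nu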
𝒱-natural transformations between two 𝒱-functors 𝐹 and 𝐺 form a set we call 𝒱-nat(𝐹 , 𝐺).
Earlier we have seen that, in ordinary categories, the set of natural transformations can be
written as an end:

[𝒞, 𝒟](𝐹 , 𝐺) ≅ ∫𝑎 𝒟(𝐹 𝑎, 𝐺𝑎)

It turns out that ends and coends can be defined for enriched profunctors, so this formula works
for enriched natural transformations as well. The difference is that, instead of a set of natural
transformations 𝒱-nat(𝐹 , 𝐺), it defines the object of natural transformations [𝒞, 𝒟](𝐹 , 𝐺) in 𝒱.
The definition of the (co-)end of a 𝒱-profunctor 𝑃 ∶ 𝒞 ⊗ 𝒞ᵒᵖ → 𝒱 is analogous to the
definition we’ve seen for ordinary profunctors. For instance, the end is an object 𝑒 in 𝒱 equipped
with an extranatural transformation 𝜋 ∶ 𝑒 → 𝑃 that is universal among such objects.

20.4 Yoneda Lemma


The ordinary Yoneda lemma involves a 𝐒𝐞𝐭-valued functor 𝐹 and a set of natural
transformations:

[𝒞, 𝐒𝐞𝐭](𝒞(𝑐, −), 𝐹 ) ≅ 𝐹 𝑐
To generalize it to the enriched setting, we’ll consider a 𝒱-valued functor 𝐹 . As before, we’ll
use the fact that we can treat 𝒱 as self-enriched, as long as it’s closed, so we can talk about
𝒱-valued 𝒱-functors.
The weak version of the Yoneda lemma deals with a set of 𝒱-natural transformations. There-
fore, we have to turn the right-hand side into a set as well. This is done by taking the (monoidal-
global) elements of 𝐹 𝑐. We get:

𝒱-nat(𝒞(𝑐, −), 𝐹 ) ≅ 𝒱(𝐼, 𝐹 𝑐)

The strong version of the Yoneda lemma works with objects of 𝒱 and uses the end over the
internal hom in 𝒱 to represent the object of natural transformations:

∫𝑥 [𝒞(𝑐, 𝑥), 𝐹 𝑥] ≅ 𝐹 𝑐
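In Haskell the end over 𝑥 becomes a universally quantified type, so the strong Yoneda lemma
can be rendered directly; a sketch (cf. Data.Functor.Yoneda from the kan-extensions package,
which uses the same encoding):

newtype Yoneda f c = Yoneda { runYoneda :: forall x. (c -> x) -> f x }

toYoneda :: Functor f => f c -> Yoneda f c
toYoneda fc = Yoneda (\g -> fmap g fc)

fromYoneda :: Yoneda f c -> f c
fromYoneda y = runYoneda y id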

20.5 Weighted Limits


Limits (and colimits) are built around commuting triangles, so they are not immediately trans-
latable to the enriched setting. The problem is that cones are constructed from “wires,” that is
individual morphisms. You may think of hom-sets as a thick bundle of wires, each wire having
zero thickness. When constructing a cone, you are selecting a single wire from a hom-set. We
have to replace wires with something thicker.
Consider a diagram, that is a functor 𝐷 from the indexing category 𝒥 to the target category
𝒞. The wires for the cone with the apex 𝑥 are selected from hom-sets 𝒞(𝑥, 𝐷𝑗), where 𝑗 is an
object of 𝒥 .

[Diagram: a cone with apex 𝑥 over the diagram 𝐷. Its sides are wires selected from the
hom-sets 𝒞(𝑥, 𝐷𝑗), reaching down to the images 𝐷𝑗, 𝐷𝑘, 𝐷𝑙 of objects 𝑗, 𝑘, 𝑙 of 𝒥 .]
This selection of the 𝑗’th wire can be described as a function from the singleton set 1:

𝛾𝑗 ∶ 1 → 𝒞(𝑥, 𝐷𝑗)

We can try to gather these functions into a natural transformation:

𝛾 ∶ Δ1 → 𝒞(𝑥, 𝐷−)

where Δ1 is a constant functor mapping all objects of 𝒥 to the singleton set. Naturality
conditions ensure that the triangles forming the sides of the cone commute.

The set of all cones with the apex 𝑥 is then given by the set of natural transformations:

[𝒥 , 𝐒𝐞𝐭](Δ1 , 𝒞(𝑥, 𝐷−))

This reformulation gets us closer to the enriched setting, since it rephrases the problem in
terms of hom-sets rather than individual morphisms. We could start by considering both 𝒥 and
𝒞 to be enriched over 𝒱, in which case 𝐷 would be a 𝒱-functor.
There is just one problem: how do we define a constant 𝒱-functor Δ𝑥 ∶ 𝒥 → 𝒞? Its action
on objects is obvious: it maps all objects in 𝒥 to one object 𝑥 in 𝒞. But what should it do to
hom-objects?
An ordinary constant functor Δ𝑥 maps all morphisms in 𝒥 (𝑎, 𝑏) to the identity in 𝒞(𝑥, 𝑥). In
the enriched setting, though, 𝒞(𝑥, 𝑥) is an object with no internal structure. Even if it happened
to be the unit 𝐼, there’s no guarantee that for every hom-object 𝒥 (𝑎, 𝑏) we could find an arrow
to 𝐼; and even if there was one, it might not be unique. In other words, there is no reason to
believe that 𝐼 is the terminal object.
The solution is to “smear the singularity”: instead of using the constant functor to select
a single wire, we should use some other “weighting” functor 𝑊 ∶ 𝒥 → 𝐒𝐞𝐭 to select a thicker
“cylinder”. Such a weighted cone with the apex 𝑥 is an element of the set of natural transfor-
mations:

[𝒥 , 𝐒𝐞𝐭](𝑊 , 𝒞(𝑥, 𝐷−))
A weighted limit, also known as an indexed limit, lim^𝑊 𝐷, is then defined as the universal
weighted cone. It means that for any weighted cone with the apex 𝑥 there is a unique morphism
from 𝑥 to lim^𝑊 𝐷 that factorizes it. The factorization is guaranteed by the naturality of the
isomorphism that defines the weighted limit:

𝒞(𝑥, lim^𝑊 𝐷) ≅ [𝒥 , 𝐒𝐞𝐭](𝑊 , 𝒞(𝑥, 𝐷−))

The regular, non-weighted limit is often called a conical limit, and it corresponds to using
the constant functor as the weight.
This definition can be translated almost verbatim to the enriched setting by replacing 𝐒𝐞𝐭
with 𝒱:

𝒞(𝑥, lim^𝑊 𝐷) ≅ [𝒥 , 𝒱](𝑊 , 𝒞(𝑥, 𝐷−))
Of course, the meaning of the symbols in this formula is changed. Both sides are now objects
in 𝒱. The left-hand side is the hom-object in 𝒞, and the right-hand side is the object of natural
transformations between two 𝒱-functors.
Dually, a weighted colimit is defined by the natural isomorphism:

𝒞(colim^𝑊 𝐷, 𝑥) ≅ [𝒥 ᵒᵖ, 𝒱](𝑊 , 𝒞(𝐷−, 𝑥))

Here, the colimit is weighted by a functor 𝑊 ∶ 𝒥 ᵒᵖ → 𝒱 from the opposite category.


Weighted (co-)limits, both in ordinary and in enriched categories, play a fundamental role:
they can be used to re-formulate a lot of familiar constructions, like (co-)ends, Kan extensions,
etc.

20.6 Ends as Weighted Limits


An end has a lot in common with a product or, more generally, with a limit. If you squint
hard enough, the projections 𝜋𝑎 ∶ 𝑒 → 𝑃 ⟨𝑎, 𝑎⟩ form the sides of a cone; except that instead of
commuting triangles we have wedges. It turns out that we can express ends as weighted limits.
The advantage of this formulation is that it also works in the enriched setting.
We’ve seen that the end of a 𝒱-valued 𝒱-profunctor can be defined using the more funda-
mental notion of an extranatural transformation. This in turn allowed us to define the object of
natural transformations, which enabled us to define weighted limits. We can now go ahead and
extend the definition of the end to work with a more general 𝒱-functor of mixed variance with
values in a 𝒱-category 𝒟:

𝑃 ∶ 𝒞ᵒᵖ ⊗ 𝒞 → 𝒟
We’ll use this functor as a diagram in 𝒟.
At this point mathematicians start worrying about size issues. After all, we are embedding
a whole category (squared) as a single diagram in 𝒟. To avoid the size problems, we’ll just
assume that 𝒞 is small; that is, its objects form a set.
We want to take a weighted limit of the diagram defined by 𝑃 . The weight must be a 𝒱-
functor 𝒞ᵒᵖ ⊗ 𝒞 → 𝒱. There is one such functor that’s always at our disposal, the hom-functor
Hom𝒞 . We will use it to define the end as a weighted limit:

∫𝑐 𝑃 ⟨𝑐, 𝑐⟩ = lim^{Hom𝒞} 𝑃
First, let’s convince ourselves that this formula works in an ordinary (𝐒𝐞𝐭-enriched) category.
Since ends are defined by their mapping-in property, let’s consider a mapping from an arbitrary
object 𝑑 to the weighted limit and use the standard Yoneda trick to show the isomorphism. By
definition, we have:

𝒟(𝑑, lim^{Hom𝒞} 𝑃 ) ≅ [𝒞ᵒᵖ × 𝒞, 𝐒𝐞𝐭](𝒞(−, =), 𝒟(𝑑, 𝑃 ⟨−, =⟩))

We can rewrite the set of natural transformations as an end over pairs of objects ⟨𝑐, 𝑐 ′ ⟩:

𝐒𝐞𝐭((𝑐, 𝑐 ′ ), (𝑑, 𝑃 ⟨𝑐, 𝑐 ′ ⟩))


∫⟨𝑐,𝑐 ′ ⟩

Using the Fubini theorem, this is equivalent to the iterated end:

𝐒𝐞𝐭((𝑐, 𝑐 ′ ), (𝑑, 𝑃 ⟨𝑐, 𝑐 ′ ⟩))


∫𝑐 ∫𝑐 ′

We can now apply the ninja-Yoneda lemma to perform the integration over 𝑐 ′ . The result is:

∫𝑐 𝒟(𝑑, 𝑃 ⟨𝑐, 𝑐⟩) ≅ 𝒟(𝑑, ∫𝑐 𝑃 ⟨𝑐, 𝑐⟩)
where we used continuity to push the end under the hom-functor. Since 𝑑 was arbitrary, we
conclude that, for ordinary categories:

lim^{Hom𝒞} 𝑃 ≅ ∫𝑐 𝑃 ⟨𝑐, 𝑐⟩
This justifies our use of the weighted limit to define the end in the enriched case.
An analogous formula works for the coend, except that we use the colimit with the hom-
functor in the opposite category, Hom𝒞ᵒᵖ , as the weight:

∫^𝑐 𝑃 ⟨𝑐, 𝑐⟩ = colim^{Hom𝒞ᵒᵖ} 𝑃

Exercise 20.6.1. Show that for ordinary 𝐒𝐞𝐭-enriched categories the weighted colimit definition
of a coend reproduces the earlier definition. Hint: use the mapping out property of the coend.

Exercise 20.6.2. Show that, as long as both sides exist, the following identities hold in ordinary
(𝐒𝐞𝐭-enriched) categories (they can be generalized to the enriched setting):

lim^𝑊 𝐷 ≅ ∫𝑗∶𝒥 𝑊 𝑗 ⋔ 𝐷𝑗

colim^𝑊 𝐷 ≅ ∫^{𝑗∶𝒥 } 𝑊 𝑗 ⋅ 𝐷𝑗

Hint: Use the mapping in/out with the Yoneda trick and the definition of power and copower.
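In Haskell-as-𝐒𝐞𝐭 the first identity is easy to visualize: the power is a function type and the
end is a universally quantified type. A rough sketch (rough, because forall quantifies over all
types rather than just the objects of 𝒥 ):

type WeightedLimit w d = forall j. w j -> d j
-- With the constant singleton weight, w j = (), this degenerates to
-- the conical limit: forall j. d j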

20.7 Kan Extensions


We’ve seen how to express limits and colimits as Kan extensions using a functor from the sin-
gular category 1. Weighted limits let us dispose of the singularity, and a judicious choice of the
weight lets us express Kan extensions in terms of weighted limits.
First, let’s work out the formula for ordinary, 𝐒𝐞𝐭-enriched categories. The right Kan exten-
sion of a functor 𝐹 ∶ 𝒞 → 𝒟 along 𝑃 ∶ 𝒞 → ℬ is defined as:

(Ran𝑃 𝐹 )𝑒 ≅ ∫𝑐 ℬ(𝑒, 𝑃 𝑐) ⋔ 𝐹 𝑐
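This end-of-powers formula is mirrored by the usual Haskell encoding of the right Kan
extension (cf. Data.Functor.Kan.Ran from the kan-extensions package): the power becomes a
function type and the end a universally quantified type.

newtype Ran p f e = Ran { runRan :: forall c. (e -> p c) -> f c }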
We’ll consider the mapping into it from an arbitrary object 𝑑. The derivation follows a number
of simple steps, mostly by expanding the definitions.
We start with:
𝒟(𝑑, (Ran𝑃 𝐹 )𝑒)
and substitute the definition of the Kan extension:

(𝑑, (𝑒, 𝑃 𝑐) ⋔ 𝐹 𝑐)
∫𝑐

Using the continuity of the hom-functor, we can pull out the end:

(𝑑, (𝑒, 𝑃 𝑐) ⋔ 𝐹 𝑐)
∫𝑐

We then use the definition of the pitchfork:


𝒟(𝑑, 𝐴 ⋔ 𝑑′ ) ≅ 𝐒𝐞𝐭(𝐴, 𝒟(𝑑, 𝑑′ ))

to get:
(𝑑, (𝑒, 𝑃 𝑐) ⋔ 𝐹 𝑐) ≅ 𝐒𝐞𝐭((𝑒, 𝑃 𝑐), (𝑑, 𝐹 𝑐))
∫𝑐 ∫𝑐
This can be written as a set of natural transformations:

[𝒞, 𝐒𝐞𝐭](ℬ(𝑒, 𝑃 −), 𝒟(𝑑, 𝐹 −))

The weighted limit is also defined through the set of natural transformations:

𝒟(𝑑, lim^𝑊 𝐹 ) ≅ [𝒞, 𝐒𝐞𝐭](𝑊 , 𝒟(𝑑, 𝐹 −))



leading us to the final result:


(𝑑, lim(𝑒,𝑃 −) 𝐹 )
Since 𝑑 was arbitrary, we can use the Yoneda trick to conclude that:

(Ran𝑃 𝐹 )𝑒 = lim^{ℬ(𝑒,𝑃 −)} 𝐹

This formula becomes the definition of the right Kan extension in the enriched setting.
Similarly, the left Kan extension can be defined as a weighted colimit:

(Lan𝑃 𝐹 )𝑒 = colim^{ℬ(𝑃 −,𝑒)} 𝐹
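Its Haskell counterpart replaces the coend of copowers by an existentially quantified pair (cf.
Data.Functor.Kan.Lan, which uses an equivalent encoding; this one needs GADT syntax):

data Lan p f e where
  Lan :: (p c -> e) -> f c -> Lan p f e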

Exercise 20.7.1. Derive the formula for the left Kan extension for ordinary categories.

20.8 Useful Formulas


• Yoneda lemma:

  ∫𝑥 [𝒞(𝑐, 𝑥), 𝐹 𝑥] ≅ 𝐹 𝑐

• Weighted limit:

  𝒞(𝑥, lim^𝑊 𝐷) ≅ [𝒥 , 𝒱](𝑊 , 𝒞(𝑥, 𝐷−))

• Weighted colimit:

  𝒞(colim^𝑊 𝐷, 𝑥) ≅ [𝒥 ᵒᵖ, 𝒱](𝑊 , 𝒞(𝐷−, 𝑥))

• Right Kan extension:

  (Ran𝑃 𝐹 )𝑒 = lim^{ℬ(𝑒,𝑃 −)} 𝐹

• Left Kan extension:

  (Lan𝑃 𝐹 )𝑒 = colim^{ℬ(𝑃 −,𝑒)} 𝐹
Index

[()], 172
⦇⦈, 158
𝑛-category, 104
case statement, 24
flip, 164
if statement, 22
instance, 68
let, 186
typeclass, 68
type synonym, 79
where, 51
0-cells, 104
1-cells, 104
2-cells, 104
actegory, 289
ad-hoc polymorphism, 79
alternative, 28
associator, 36, 39
backslash, 46
banana brackets, 158
bi-closed monoidal category, 292
bicategory, 260
bimodule, 254
cartesian product, 33
cartesian projections, 33
class instance, 68
closure, 47
co-continuous functor, 119
co-presheaves, 100
co-wedge, 245
cograph, 242
comma category, 116
commuting operations, 16
comonoid, 234
computation rule, 23
constant functor, 67
continuous functor, 120
convolution, 233
copower, 305
coproduct of functors, 197
coslice, 221
coslice category, 65
costate comonad, 236
covariant functor, 69
data constructor, 22
density comonad, 308
difference list, 275
dinatural transformation, 281
discrete category, 63
display map, 133
distributors, 254
elimination rule, 23
epimorphism, 10
equivariant function, 272
equivariant map, 219
existential types, 175
external tensor product, 308
extranatural transformation, 247
faithful functor, 101
fiber, 133
fiber functor, 278
full functor, 101
full subcategory, 96
generators of a monoid, 128
group, 274
heteromorphism, 242
indexed limit, 322
infinity categories, 104
injection, 100
injections, 9
internal hom, 106
introduction rule, 23
iterator, 188
kind signatures, 72
Kleisli arrow, 184
Kleisli category, 184
large category, 96
lawful lenses, 239
lens bracket, 172
lifted types, 174
lifting, 31, 37, 68
limit, conical, 322
linear types, 235, 292
locally cartesian closed category, 137
locally small category, 96
lollipop, 292
low-pass filter, 233
M-set, 219
mapping in, 34
mapping out, 22
memoization, 103
mixed optics, 289
monoid, 41
monoid action, 272
monomorphism, 10
neural networks, 264
over category, 65
parametric polymorphism, 78
parametricity, 79
Peano numbers, 56
post-composition, 6
power, 297
pre-composition, 6
presheaves, 100
proof-relevant relation, 72, 242
proof-relevant subset, 242
pullback, 136
pushout, 135
Rust, 235
slice category, 65
small category, 96
solution set, 96, 123
stick-figure category, 63
store comonad, 236
strength, functorial, 203
strict monoidal category, 196
surjection, 10, 101
tail recursion, 191
theorems for free, 79
thin category, 314
trivial bundle, 138
tuple section, 203
type class, 40
under-category, 65
unique, 2
unitor, 36, 39
walking arrow, 68
walking arrow, limit, 93
weakly terminal object, 96
weakly terminal set, 96, 123
wedge, 249
Yoneda functor, 100
zigzag identity, 214