
An Introduction to Discrete Mathematics
(Text for Math 221 Winter 2024 at Drexel University)

Darij Grinberg
draft, March 12, 2024

Contents
0. Preface 6
0.1. What is this? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
0.2. Plan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
0.3. Notations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
0.4. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

1. Induction and recursion 9


1.1. The Tower of Hanoi . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.1.1. The puzzle . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.1.2. Some explorations . . . . . . . . . . . . . . . . . . . . . . . 9
1.1.3. The numbers m_n . . . . . . . . . . . . . . . . . . . . . . . . 11
1.1.4. In search of an explicit formula . . . . . . . . . . . . . . . . 14
1.2. The Principle of Mathematical Induction . . . . . . . . . . . . . . 15
1.3. Some more proofs by induction . . . . . . . . . . . . . . . . . . . . 18
1.3.1. The sum of the first n positive integers . . . . . . . . . . . 18
1.3.2. The sum of the squares of the first n positive integers . . 20
1.4. Notations for an induction proof . . . . . . . . . . . . . . . . . . . 22
1.5. The Fibonacci numbers . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.5.1. Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.5.2. The sum of the first n positive Fibonacci numbers . . . . . 24
1.6. Some more examples of induction . . . . . . . . . . . . . . . . . . 25
1.7. How not to use induction . . . . . . . . . . . . . . . . . . . . . . . 28
1.8. More on the Fibonacci numbers . . . . . . . . . . . . . . . . . . . . 29
1.8.1. The addition theorem . . . . . . . . . . . . . . . . . . . . . 30
1.8.2. Divisibility of Fibonacci numbers . . . . . . . . . . . . . . 33

Math 221 Winter 2024, version March 12, 2024 page 2

1.8.3. Binet’s formula . . . . . . . . . . . . . . . . . . . . . . . . . 36


1.9. Strong induction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
1.9.1. Reminder on regular induction . . . . . . . . . . . . . . . . 38
1.9.2. Strong induction . . . . . . . . . . . . . . . . . . . . . . . . 39
1.9.3. Example: Proof of Binet’s formula . . . . . . . . . . . . . . 41
1.9.4. Baseless strong induction . . . . . . . . . . . . . . . . . . . 43
1.9.5. Example: Prime factorizations exist . . . . . . . . . . . . . 44
1.9.6. Example: Paying with 3-cent and 5-cent coins . . . . . . . 46
1.10. More exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
1.10.1. A fake proof . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
1.10.2. Negative Fibonacci numbers . . . . . . . . . . . . . . . . . 49
1.10.3. More on the Hanoi tower . . . . . . . . . . . . . . . . . . . 49
1.10.4. More on recursively defined sequences . . . . . . . . . . . 50
1.10.5. More coin problems . . . . . . . . . . . . . . . . . . . . . . 51
1.10.6. A bit of matrix algebra . . . . . . . . . . . . . . . . . . . . . 51
1.10.7. More induction proofs . . . . . . . . . . . . . . . . . . . . . 52

2. Sums and products 53


2.1. Finite sums . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.2. Finite products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
2.3. Factorials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
2.4. Binomial coefficients: Definition . . . . . . . . . . . . . . . . . . . 64
2.5. Binomial coefficients: Properties . . . . . . . . . . . . . . . . . . . 67
2.5.1. Pascal’s identity . . . . . . . . . . . . . . . . . . . . . . . . . 67
2.5.2. The factorial formula . . . . . . . . . . . . . . . . . . . . . . 69
2.5.3. The symmetry of binomial coefficients . . . . . . . . . . . 70
2.5.4. Pascal’s triangle consists of integers . . . . . . . . . . . . . 72
2.5.5. Upper negation . . . . . . . . . . . . . . . . . . . . . . . . . 74
2.5.6. Finding Fibonacci numbers in Pascal’s triangle . . . . . . 75
2.6. The binomial formula . . . . . . . . . . . . . . . . . . . . . . . . . 75
2.7. More properties of binomial coefficients . . . . . . . . . . . . . . . 81

3. Elementary number theory 82


3.1. Divisibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
3.1.1. Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
3.1.2. Basic properties . . . . . . . . . . . . . . . . . . . . . . . . . 83
3.1.3. Divisibility criteria . . . . . . . . . . . . . . . . . . . . . . . 86
3.2. Congruence modulo n . . . . . . . . . . . . . . . . . . . . . . . . . 87
3.2.1. Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
3.2.2. Basic properties . . . . . . . . . . . . . . . . . . . . . . . . . 87
3.2.3. Proving the divisibility criteria . . . . . . . . . . . . . . . . 91
3.3. Division with remainder . . . . . . . . . . . . . . . . . . . . . . . . 92
3.3.1. The theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
3.3.2. The proof . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

3.3.3. An application: even and odd integers . . . . . . . . . . . 96


3.3.4. Basic properties of quotients and remainders . . . . . . . . 98
3.3.5. Base-b representation of nonnegative integers . . . . . . . 101
3.3.6. Congruence in terms of remainders . . . . . . . . . . . . . 109
3.3.7. The birthday lemma . . . . . . . . . . . . . . . . . . . . . . 110
3.4. Greatest common divisors . . . . . . . . . . . . . . . . . . . . . . . 112
3.4.1. Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
3.4.2. Basic properties . . . . . . . . . . . . . . . . . . . . . . . . . 114
3.4.3. The Euclidean algorithm . . . . . . . . . . . . . . . . . . . 117
3.4.4. Bezout’s theorem and the extended Euclidean algorithm . 120
3.4.5. The universal property of the gcd . . . . . . . . . . . . . . 124
3.4.6. Factoring out a common factor from a gcd . . . . . . . . . 126
3.5. Coprime integers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
3.5.1. Definition and examples . . . . . . . . . . . . . . . . . . . . 127
3.5.2. Three theorems about coprimality . . . . . . . . . . . . . . 129
3.5.3. Reducing a fraction . . . . . . . . . . . . . . . . . . . . . . . 131
3.6. Prime numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
3.6.1. Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
3.6.2. The friend-or-foe lemma . . . . . . . . . . . . . . . . . . . . 132
3.6.3. There are infinitely many primes, and some more exercises 133
3.6.4. Binomial coefficients and primes . . . . . . . . . . . . . . . 135
3.6.5. Fermat’s little theorem . . . . . . . . . . . . . . . . . . . . . 136
3.6.6. Prime divisor separation theorem . . . . . . . . . . . . . . 138
3.6.7. p-valuations: definition . . . . . . . . . . . . . . . . . . . . 139
3.6.8. p-valuations: basic properties . . . . . . . . . . . . . . . . . 141
3.6.9. Back to Hanoi . . . . . . . . . . . . . . . . . . . . . . . . . . 143
3.6.10. More exercises . . . . . . . . . . . . . . . . . . . . . . . . . 146
3.6.11. The p-valuation of n! . . . . . . . . . . . . . . . . . . . . . . 146
3.6.12. Prime factorization . . . . . . . . . . . . . . . . . . . . . . . 149
3.6.13. Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
3.7. Least common multiples . . . . . . . . . . . . . . . . . . . . . . . . 151
3.8. Sylvester’s xa + yb theorem (or the Chicken McNugget theorem) 154
3.9. Digression: An introduction to cryptography . . . . . . . . . . . . 160
3.9.1. Caesarian ciphers (alphabet rotation) . . . . . . . . . . . . 160
3.9.2. Keys and ciphers . . . . . . . . . . . . . . . . . . . . . . . . 164
3.9.3. The RSA cipher . . . . . . . . . . . . . . . . . . . . . . . . . 166

4. An informal introduction to enumeration 171


4.1. A refresher on sets . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
4.2. Counting, informally . . . . . . . . . . . . . . . . . . . . . . . . . . 176
4.3. Counting subsets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
4.3.1. Counting them all . . . . . . . . . . . . . . . . . . . . . . . 179
4.3.2. Counting the subsets of a given size . . . . . . . . . . . . . 181

4.4. Tuples (aka lists) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185


4.4.1. Definition and disambiguation . . . . . . . . . . . . . . . . 185
4.4.2. Counting pairs . . . . . . . . . . . . . . . . . . . . . . . . . 186
4.4.3. Cartesian products . . . . . . . . . . . . . . . . . . . . . . . 190
4.4.4. Counting strictly increasing tuples (informally) . . . . . . 192

5. Maps (aka functions) 194


5.1. Functions, informally . . . . . . . . . . . . . . . . . . . . . . . . . . 194
5.2. Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
5.3. Functions, formally . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
5.4. Some more examples of functions . . . . . . . . . . . . . . . . . . 202
5.5. Well-definedness . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
5.6. The identity function . . . . . . . . . . . . . . . . . . . . . . . . . . 205
5.7. More examples, and multivariate functions . . . . . . . . . . . . . 206
5.8. Composition of functions . . . . . . . . . . . . . . . . . . . . . . . 207
5.8.1. Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
5.8.2. Basic properties . . . . . . . . . . . . . . . . . . . . . . . . . 209
5.9. Jectivities (injectivity, surjectivity and bijectivity) . . . . . . . . . . 211
5.10. Inverses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
5.10.1. Definition and examples . . . . . . . . . . . . . . . . . . . . 216
5.10.2. Invertibility is bijectivity by another name . . . . . . . . . 219
5.10.3. Uniqueness of the inverse . . . . . . . . . . . . . . . . . . . 220
5.10.4. More examples . . . . . . . . . . . . . . . . . . . . . . . . . 221
5.10.5. Inverses of inverses and compositions . . . . . . . . . . . . 222
5.11. Some exercises on jectivities and inverses . . . . . . . . . . . . . . 223
5.11.1. Exercises with solutions . . . . . . . . . . . . . . . . . . . . 223
5.11.2. More exercises . . . . . . . . . . . . . . . . . . . . . . . . . 227
5.12. Isomorphic sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229

6. Enumeration revisited 233


6.1. Counting, formally . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
6.1.1. Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
6.1.2. Rules for sizes of finite sets . . . . . . . . . . . . . . . . . . 235
6.1.3. A ∪ B and A ∩ B revisited . . . . . . . . . . . . . . . . . . . 237
6.2. Redoing some proofs rigorously . . . . . . . . . . . . . . . . . . . 238
6.2.1. Integers in an interval . . . . . . . . . . . . . . . . . . . . . 239
6.2.2. Counting all subsets . . . . . . . . . . . . . . . . . . . . . . 240
6.2.3. Counting all k-element subsets . . . . . . . . . . . . . . . . 244
6.2.4. Recounting pairs . . . . . . . . . . . . . . . . . . . . . . . . 250
6.3. Where do we stand now? . . . . . . . . . . . . . . . . . . . . . . . 252
6.4. Lacunar subsets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
6.4.1. Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
6.4.2. The maximum size of a lacunar subset . . . . . . . . . . . 254
6.4.3. Counting all lacunar subsets of [n] . . . . . . . . . . . . . . 256

6.4.4. Counting all k-element lacunar subsets of [n] . . . . . . . . 261


6.4.5. A corollary . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
6.4.6. The domino tilings connection . . . . . . . . . . . . . . . . 268
6.5. Compositions and weak compositions . . . . . . . . . . . . . . . . 269
6.5.1. Compositions . . . . . . . . . . . . . . . . . . . . . . . . . . 269
6.5.2. Weak compositions . . . . . . . . . . . . . . . . . . . . . . . 273
6.6. Selections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
6.6.1. Unordered selections without repetition (= without replacement) . . 275
6.6.2. Ordered selections without repetition (= without replacement) . . . 276
6.6.3. Intermezzo: Listing n elements . . . . . . . . . . . . . . . . 281
6.6.4. Ordered selections with repetition (= with replacement) . 282
6.6.5. Unordered selections with repetition (= with replacement) 283
6.7. Anagrams and multinomial coefficients . . . . . . . . . . . . . . . 286
6.7.1. Counting anagrams . . . . . . . . . . . . . . . . . . . . . . 286
6.7.2. Multinomial coefficients . . . . . . . . . . . . . . . . . . . . 290
6.8. More counting problems . . . . . . . . . . . . . . . . . . . . . . . . 294
6.9. The pigeonhole principles . . . . . . . . . . . . . . . . . . . . . . . 295

7. (TODO) An introduction to combinatorial games 297


7.1. (TODO) Let’s play a game . . . . . . . . . . . . . . . . . . . . . . . 297
7.2. (TODO) The concept of a combinatorial game . . . . . . . . . . . 297
7.3. (TODO) Zermelo’s theorem . . . . . . . . . . . . . . . . . . . . . . 297
7.4. (TODO) Nim . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
7.5. (TODO) Wythoff’s game . . . . . . . . . . . . . . . . . . . . . . . . 297
7.6. (TODO) Symmetry, strategy stealing and other tricks . . . . . . . 297
7.7. (TODO) Games with payoffs . . . . . . . . . . . . . . . . . . . . . 297

This work is licensed under a Creative Commons “CC0 1.0 Universal” license.

∗∗∗
This is a set of lecture notes for my Math 221 course at Drexel University in
Winter 2024. At the moment, it is somewhat of a draft (and much of it is
copypasted from my Math 221 course in Winter 2023). Some parts of it are still
to be filled in (but the rest should already be readable).

0. Preface
0.1. What is this?
This is a course on discrete mathematics. To us, discrete mathematics means
the mathematics of finite, discrete objects: integers, finite sets, occasionally
some more complex creatures such as graphs and polynomials. Integer sequences,
while theoretically infinite, are also included since one usually makes
statements about finite pieces of the sequence. Much of linear algebra logically
belongs to discrete mathematics, but there are separate courses entirely devoted
to it, so we won’t touch on it here.
Discrete mathematics stands in contrast to continuous mathematics, which studies
real numbers, continuous functions and infinite sets. This mostly begins
with analysis (or calculus, which is its less rigorous variant).
So this course will introduce you to some of the major topics of discrete
mathematics:

• mathematical induction and recursion;

• elementary number theory (the properties of divisibility, prime numbers,
coprimality, possibly applications like the RSA cryptosystem);

• basic enumerative combinatorics (counting and binomial coefficients);

• basic combinatorial game theory (two-player games with no randomness
and full information).

We will neither go very deep nor be fully rigorous about everything. There
are deeper, more specific classes on most of these subjects:

• Math 222 (notes: [Grinbe19a], [Grinbe22]) is a quarter-length introduction
to enumerative combinatorics.

• Math 530 (notes: [Grinbe23a]) is an introduction to graph theory.

• CS 303 is a course on cryptography.

• Math 235 (notes: [Grinbe20] and [Grinbe23b]) is an introduction to
mathematical problem-solving. This can be viewed as a continuation of this
course, leading into more advanced techniques and more exotic results.

The reader is assumed to have learned the concept of a mathematical proof,
the language of sets, and fundamental logical rules and techniques (such as
proof by contradiction).
I do not intend to give a Bourbaki-style axiomatic treatment of the subject
here, nor to spell out each proof in maximum possible rigor (though I am more
rigorous than many other undergraduate texts). My goal is merely to give a
taste of each of several important and (if I dare say so) interesting topics, each
time veering deep enough to see some substance but not so deep as to get lost in the
jungle. Other introductions to discrete mathematics are [Levin21], [LeLeMe16],
[Newste23] and [GrKnPa94], just to name a few. There is no “standard” choice
of material for such a text; each author goes their own way through the vast
landscape. So do I in these notes. I have deliberately avoided anything analytic
or geometric in order to stick to the subject declared (discrete mathematics!),
but otherwise I have picked from different topics and fields. Some topics such
as graphs, posets and the construction of the number systems are nevertheless
missing from this introduction, as the lack of days in an academic quarter has
forced choices upon me.
The course that these notes were written for has a website:

https://www.cip.ifi.lmu.de/~grinberg/t/24wd/

on which you can find homework sets.

0.2. Plan
This text is split into 7 chapters:

1. Induction and recursion. Induction is one of the foundational principles
of mathematics; in a sense, it is the essence of the notion of an integer. We
explore two of its versions and many of its uses. Along the way, we introduce
some concepts (e.g., Fibonacci numbers and prime factorization)
that we will later revisit.

2. Sums and products. Here we define and study finite sums, finite products,
factorials and binomial coefficients. These basic algebraic concepts
appear all over mathematics, and also offer us some more practice in using
induction.

3. Elementary number theory. Here we explore the divisibility-related
properties of integers: congruence modulo n; division with remainder; prime
numbers; greatest common divisors and least common multiples. One of
the most famous algorithms in mathematics – the Euclidean algorithm –
is encountered here, as are some more curious results such as the xa + yb
theorem. As an application, we briefly discuss two cryptographic algorithms
(methods for encrypting data): Caesarian ciphers and RSA.

4. An informal introduction to enumeration. Enumeration is another word
for counting – specifically, counting objects satisfying some properties,
such as 3-element subsets of a given 7-element set. We take a first dip into
this subject here, without formally defining what counting means; this is
to be done in a later chapter.

5. Maps (aka functions). The notion of a map (or function, which is synonymous)
is absolutely fundamental to modern mathematics. It encompasses
the functions known from calculus, but is more general, as it allows any
kinds of input and output. We define it rigorously and introduce its basic
features: composition, inverses, injectivity, surjectivity, bijectivity.

6. Enumeration revisited. Now that we have learned the language of maps,
we define the size of a finite set, which allows us to rigorously speak
about counting. We reprove the results previously shown informally,
and dig deeper, answering several classes of counting questions: e.g., subsets,
(weak) compositions, or anagrams of a word.

7. A bit of combinatorial games. If time allows: We introduce the notion of a
(combinatorial) game (for two players, with complete information and no
chance, i.e., no randomness). We study a few classical examples of such
games, such as the game of Nim.

0.3. Notations
We shall use the following notations:

• We let N denote the set of all nonnegative integers, that is, {0, 1, 2, . . .}.

• The notation |S| denotes the size (i.e., the number of elements) of a set S.

• The symbol # means “number”. For example, “# of positive integers that
have two digits” means “number of positive integers that have two digits”.

• The abbreviation “LHS” means “left hand side” (of an equation). The
abbreviation “RHS” means “right hand side”.

• The symbol ∅ means the empty set.

• The notations [n] and [a, b] are defined in Definition 6.1.1.

• The notation “:=” means “is defined to be”. For example, “s_n := 1 + 2 +
· · · + n” means that we define s_n to be 1 + 2 + · · · + n.

0.4. Acknowledgments
I thank Keith Conrad, Karen Edwards, Andy Hicks and Tom Roby for helpful
advice and conversations about what a course on discrete mathematics should
contain. (Needless to say, I did not heed all of this advice in these notes.)
Your name could stand here: Please send corrections and comments to
[email protected] .

1. Induction and recursion


1.1. The Tower of Hanoi
1.1.1. The puzzle
Let me start with a puzzle called the Tower of Hanoi.
You have 3 pegs (or rods). The first peg has n disks stacked on it. The n
disks have n different sizes, and they are stacked in the order of their size, with
the smallest one on top. Here is what this looks like for n = 8 (with the 3 pegs
numbered 1, 2, 3 from left to right):

(image by User:Evanherk on Wikipedia, licensed under the CC Attribution-Share
Alike 3.0 Unported License).
You can make moves of a certain kind (“Hanoi moves”): You can take the
topmost disk from one peg and move it on top of another peg. However, you
are only allowed to do this if this disk is smaller than the other disks currently
on the latter peg; in other words, you must never stack a larger disk atop a
smaller disk.
Your goal is to move all n disks onto the third peg.

This game can actually be played online, e.g., at
https://codepen.io/eliortabeka/pen/yOrrxG. (Be warned that this site has n = 7
hardcoded into it. But you can
easily fix this by modifying “disksNum = 3” and changing “minMoves = 127”
to “minMoves = 0”. Also note that the game allows you to win by moving all
disks to peg 2 as well, but this is clearly not a significant difference.)

1.1.2. Some explorations


Let us analyze the case n = 3. In this case, one strategy to win the game (i.e.,
achieve the goal) is as follows:

1. Move the smallest disk from peg 1 to peg 3.

2. Move the middle disk from peg 1 to peg 2.

3. Move the smallest disk from peg 3 to peg 2.

4. Move the largest disk from peg 1 to peg 3.



5. Move the smallest disk from peg 2 to peg 1.

6. Move the middle disk from peg 2 to peg 3.

7. Move the smallest disk from peg 1 to peg 3.

So we can win in 7 moves for n = 3.
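
The seven moves above can be replayed mechanically. The following is a minimal sketch, not part of the original notes: the function name `replay` and the encoding of a move as a pair (source peg, target peg) are assumptions made for illustration. It checks at each step that no disk is placed atop a smaller one.

```python
# Replay the seven moves listed above and check that each is a legal Hanoi move.
# Disks are numbered 1 (smallest) to n (largest); pegs are numbered 1, 2, 3.
# The name "replay" and the (source, target) encoding are illustrative choices.

def replay(n, moves):
    """Apply (source, target) moves to the starting position; return the final pegs."""
    pegs = {1: list(range(n, 0, -1)), 2: [], 3: []}  # each list goes bottom -> top
    for source, target in moves:
        if not pegs[source]:
            raise ValueError(f"peg {source} is empty")
        disk = pegs[source][-1]
        if pegs[target] and pegs[target][-1] < disk:
            raise ValueError(f"cannot put disk {disk} on a smaller disk")
        pegs[target].append(pegs[source].pop())
    return pegs

strategy = [(1, 3), (1, 2), (3, 2), (1, 3), (2, 1), (2, 3), (1, 3)]
final = replay(3, strategy)
print(final)  # {1: [], 2: [], 3: [3, 2, 1]}
```

An illegal sequence such as `[(1, 3), (1, 3)]` makes `replay` raise an error, since the second move would put the middle disk on the smallest one.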

What about other values of n ? The questions we can ask are the following:

Question 1.1.1. (a) Can we always win the game?
(b) If so, then what is the smallest # of moves¹ we need to make?

Let us record the answers for small values of n:

• For n = 0, we win in 0 moves (since all disks – of which there are none
– are on peg 3 already). This sounds very pedantic and pointless, but it’s
not a bad start.

• For n = 1, we win in 1 move (just move the single disk directly).

• For n = 2, we win in 3 moves. Fewer moves are not enough, for fairly
simple logical reasons: We need 1 move to free the largest disk, then 1
move to move it to peg 3, then 1 more move to get the other disk on top
of it.

• For n = 3, we win in 7 moves (as we have seen above). But do we need 7
moves, or can we make do with fewer?

• For n = 4, what happens?

Solving the problem by brute force gets harder and harder as n grows. But
we can try to analyze our strategy for n = 3 and see if there is a pattern behind
it.
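
For small n, the brute-force approach can at least be delegated to a computer. Here is a rough sketch, not part of the original notes, that finds the smallest # of moves by breadth-first search over all positions. The function name `min_moves` and the encoding of a position as a tuple of peg indices (with pegs renumbered 0, 1, 2) are assumptions made for illustration. The number of positions is 3^n, so this is only practical for small n.

```python
from collections import deque

def min_moves(n):
    """Smallest number of Hanoi moves taking n disks from peg 0 to peg 2 (by BFS)."""
    # A state records, for each disk (smallest first), the peg it sits on (0, 1 or 2).
    start = (0,) * n
    goal = (2,) * n
    dist = {start: 0}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            return dist[state]
        for source in range(3):
            # The topmost disk on a peg is the smallest disk located there.
            on_source = [d for d in range(n) if state[d] == source]
            if not on_source:
                continue
            disk = on_source[0]
            for target in range(3):
                if target == source:
                    continue
                # Legal only if no smaller disk already sits on the target peg.
                if any(state[d] == target for d in range(disk)):
                    continue
                new_state = state[:disk] + (target,) + state[disk + 1:]
                if new_state not in dist:
                    dist[new_state] = dist[state] + 1
                    queue.append(new_state)
    return None  # the goal position cannot be reached

for n in range(5):
    print(n, min_moves(n))
```

The printed values for n = 0, 1, 2 can be compared with the ones recorded above.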
We observe that the largest disk moves only once, and its move is right in the
middle of the strategy. So our strategy for n = 3 can be summarized as follows:

1.–3. Move the two smaller disks from peg 1 onto peg 2.

4. Move the largest disk from peg 1 onto peg 3.

5.–7. Move the two smaller disks from peg 2 onto peg 3.

¹ The symbol “#” means “number”.



Moreover, the moves 1–3 in this strategy are essentially a Tower of Hanoi
game played only with the two smaller disks, except that the goal is not to
move them to peg 3 but to move them to peg 2 (but this doesn’t matter, because
the two games are clearly “isomorphic” – i.e., the roles of pegs 2 and 3 are
swapped but otherwise everything is the same). The largest disk stays at the
bottom of peg 1 all the time and thus does not prevent any of the moves (since
all the other disks are smaller than it and thus can fit on top of it).
Move 4 moves the newly liberated largest disk from peg 1 onto peg 3.
Moves 5–7 are again a little Tower of Hanoi game for the two smaller disks,
except that now they have to be moved from peg 2 to peg 3. Again, the largest
disk (which is now on the bottom of peg 3) does not interfere with any of the
moves.
Now the logic behind the above strategy has become clear (and also easier to
memorize).

Does this help us solve the n = 4 case?


Yes! We can win in 15 moves by a strategy that has the same structure:

1.–7. Move the three smaller disks from peg 1 onto peg 2. (This is a little Tower
of Hanoi game for these three smaller disks. The largest disk rests at the
bottom of peg 1 and does not interfere.)

8. Move the largest disk from peg 1 onto peg 3.

9.–15. Move the three smaller disks from peg 2 onto peg 3. (This is again a little
Tower of Hanoi game for these three smaller disks. The largest disk rests
at the bottom of peg 3 and does not interfere.)

Thus, we don’t just have a strategy for n = 3 and one for n = 4, but actually
a “meta-strategy” that lets us win the game for n disks if we know how to win
it for n − 1 disks. In a nutshell, it says “first move the n − 1 smaller disks onto
peg 2; then move the largest disk onto peg 3; then move the n − 1 smaller disks
onto peg 3”. We will still call this “meta-strategy” a strategy.
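
The strategy just described translates directly into a recursive procedure. The following is a minimal sketch, not part of the original notes; the function name `hanoi_moves` and its parameters `source`, `target` and `spare` are hypothetical names chosen for illustration.

```python
def hanoi_moves(n, source=1, target=3, spare=2):
    """Return the list of (from_peg, to_peg) moves that the strategy makes for n disks."""
    if n == 0:
        return []  # no disks, nothing to move
    return (
        hanoi_moves(n - 1, source, spare, target)    # n-1 smaller disks onto the spare peg
        + [(source, target)]                         # largest disk onto the target peg
        + hanoi_moves(n - 1, spare, target, source)  # n-1 smaller disks onto the largest
    )

print(hanoi_moves(3))
print(len(hanoi_moves(4)))  # 15
```

For n = 3, this reproduces the seven moves listed earlier, in the same order.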

1.1.3. The numbers m_n


Let us summarize what we gain from this strategy.

Definition 1.1.2. For any integer n ≥ 0, we let m_n denote the # of moves
needed to win the Tower of Hanoi game with n disks. If the game cannot be
won with n disks, then we set m_n = ∞ (where ∞ is not a number but just a
symbol).

Thus, both Question 1.1.1 (a) and Question 1.1.1 (b) boil down to computing
m_n.

Here is a table of small values of m_n obtained using our strategy:

n    |  0   1   2   3   4   5   6    7    8
m_n  |  0   1   3   7  15  31  63  127  255

Note that these values are easily computed using our strategy, because in order
to win the game for a given n, we have to win it for n − 1, then make one extra
move, then win it for n − 1 again. So we get m_n = m_{n−1} + 1 + m_{n−1} = 2m_{n−1} + 1
(for n ≥ 1).
Right?

Not so fast! We have proved that, e.g., the game can be won in 127 moves
for n = 7. We have not proved that it cannot be won in fewer moves. So the
formula m_n = 2m_{n−1} + 1 has been proved not for the # of moves needed to win,
but rather for the # of moves needed to win using our strategy. Maybe there is
a better strategy that wins for n = 7 in (say) 109 moves?

So what we really have proved is the following:

Proposition 1.1.3. Let n be a positive integer. If m_{n−1} is an integer (i.e., if
m_{n−1} ≠ ∞), then m_n ≤ 2m_{n−1} + 1.

To gain some writing experience, let us write out the proof in detail:
Proof. Assume that m_{n−1} is an integer. Thus, we can win the game for n − 1
disks in m_{n−1} moves. Let S be the strategy (i.e., the sequence of moves) needed
to do this. So the strategy S moves n − 1 disks from peg 1 onto peg 3 in m_{n−1}
moves.
Let S_{23} be the same strategy as S, but with the roles of pegs 2 and 3 swapped.
Thus, S_{23} moves n − 1 disks from peg 1 onto peg 2 in m_{n−1} moves.
Let S_{12} be the same strategy as S, but with the roles of pegs 1 and 2 swapped.
Thus, S_{12} moves n − 1 disks from peg 2 onto peg 3 in m_{n−1} moves.
Now, we proceed as follows to win the game with n disks:

A. We use strategy S_{23} to move the n − 1 smaller disks from peg 1 onto peg
2. (This is allowed because the largest disk rests at the bottom of peg 1
and does not interfere with the movement of smaller disks.)

B. We move the largest disk from peg 1 onto peg 3. (This is allowed because
this disk is free (i.e., there are no disks on top of it) and because peg 3 is
empty, since all the other disks are on peg 2.)

C. We use strategy S_{12} to move the n − 1 smaller disks from peg 2 onto peg
3. (Again, this is allowed since the largest disk rests at the bottom of peg
3 and does not interfere.)

This strategy wins the game (for n disks) in m_{n−1} + 1 + m_{n−1} = 2m_{n−1} + 1
many moves. So the game for n disks can be won in 2m_{n−1} + 1 many moves.
In other words, m_n ≤ 2m_{n−1} + 1. This proves Proposition 1.1.3.
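
The construction in this proof (relabel the pegs of S, then concatenate) can be imitated in a few lines. The following is an illustrative sketch, not part of the original notes: the helper names `swap_pegs` and `extend` are hypothetical, and a strategy is assumed to be encoded as a list of (source peg, target peg) pairs.

```python
def swap_pegs(strategy, a, b):
    """Relabel a strategy by exchanging the roles of pegs a and b."""
    relabel = {a: b, b: a}
    return [(relabel.get(s, s), relabel.get(t, t)) for s, t in strategy]

def extend(S):
    """Turn a winning strategy S for n-1 disks into one for n disks (steps A, B, C)."""
    S23 = swap_pegs(S, 2, 3)   # moves the n-1 smaller disks from peg 1 onto peg 2
    S12 = swap_pegs(S, 1, 2)   # moves the n-1 smaller disks from peg 2 onto peg 3
    return S23 + [(1, 3)] + S12

S = []                 # the empty strategy wins the game for 0 disks
for n in range(1, 5):
    S = extend(S)
    print(n, len(S))   # each length is twice the previous one, plus 1
```

Note that this only illustrates the upper bound: it produces a winning strategy of the stated length, but says nothing about whether a shorter one exists.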

Now, let us see if the inequality m_n ≤ 2m_{n−1} + 1 that we have proved is an
equality or just an inequality – i.e., whether the above strategy is optimal or
there is a faster one. I claim it is the former:

Proposition 1.1.4. Let n be a positive integer. If m_{n−1} is an integer (i.e., if
m_{n−1} ≠ ∞), then m_n = 2m_{n−1} + 1.

Proof. Again, assume that mn−1 is an integer.


We need to show that mn = 2mn−1 + 1. It suffices to show that mn ≥ 2mn−1 +
1 (since Proposition 1.1.3 yields mn ≤ 2mn−1 + 1, and we can combine these two
inequalities to get mn = 2mn−1 + 1). In other words, it suffices to show that any
winning strategy for n disks has at least 2mn−1 + 1 many moves.
So let us consider a winning strategy T for n disks. Somewhere during the
strategy T, the largest disk has to move (since it starts out on peg 1 but has to
end up on peg 3). Let us refer to these moves (the ones that move the largest
disk) as the special moves. There may be several special moves or just one, but
as we just said, there has to be at least one.
Before the first special move can happen, the smallest n − 1 disks have to
be moved away from peg 1 (since they would otherwise block the largest disk
from moving). Moreover, these smallest n − 1 disks must all be moved onto
the same peg (since otherwise, both pegs 2 and 3 would be occupied, and then
the largest disk would have nowhere to move). Thus, before the first special
move can happen, we must have won the Tower of Hanoi game for n − 1 disks.
Hence, before the first special move can happen, we already need to have made
mn−1 moves (since mn−1 is the smallest # of moves that can win the game for
n − 1 disks).
Now, consider what happens after the last special move. This last special
move necessarily moves the largest disk to peg 3 (since that’s where this disk
has to come to rest). After that, we still need to move all the other disks onto
peg 3. At the time we are making the last special moves, these other disks must
all be on the same peg (since they can be neither on the peg from which the
largest disk is moving, nor on the peg to which it is moving2 ). Therefore, after
the last special move, we still need to move all the remaining n − 1 disks from
one peg to another. And this is again tantamount to winning the game for n − 1
disks. So this again needs at least mn−1 moves.
So in total, we know that our strategy T needs to have

1. at least mn−1 moves before the first special move,

2 because in either case, they would block the move of the largest disk
Math 221 Winter 2024, version March 12, 2024 page 14

2. at least one special move, and


3. at least mn−1 moves after the last special move.
Thus, it needs to have at least $m_{n-1} + 1 + m_{n-1} = 2m_{n-1} + 1$ many moves
in total. This proves $m_n \geq 2m_{n-1} + 1$. As explained above, this completes the
proof of Proposition 1.1.4.

Proposition 1.1.4 confirms the table we have carelessly made before:

n 0 1 2 3 4 5 6 7 8
mn 0 1 3 7 15 31 63 127 255

Obviously, you can keep using Proposition 1.1.4 to compute m9 , m10 , m11 , . . ..
Indeed, the equation
\[ m_n = 2m_{n-1} + 1 \tag{1} \]
is what is called a recursive formula for the numbers mn . This means a formula
that allows you to compute mn using the previous values m0 , m1 , . . . , mn−1 . In
our case, we only need the direct predecessor mn−1 , so this is a particularly
convenient recursive formula.
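
As a quick sanity check (not a proof), the recursive formula can be run mechanically. The following is a minimal Python sketch; the name `min_moves` is chosen here for illustration and does not come from the text. It starts from m_0 = 0 and applies the recursion n times.

```python
def min_moves(n: int) -> int:
    """Compute m_n from m_0 = 0 and the recursion m_n = 2 * m_{n-1} + 1."""
    m = 0  # m_0 = 0: with no disks, no moves are needed
    for _ in range(n):
        m = 2 * m + 1  # one application of the recursive formula
    return m


# Reproduce the table for n = 0, 1, ..., 8.
table = [min_moves(n) for n in range(9)]
print(table)  # [0, 1, 3, 7, 15, 31, 63, 127, 255]
```

Note that computing m_n this way takes n steps, since each value is obtained from its direct predecessor.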

1.1.4. In search of an explicit formula


Still, can we perhaps do better? Can we find an explicit formula – i.e., one that
gives us mn directly?
You might have guessed a formula from our table of numbers already:
\[ m_n = 2^n - 1. \]
Is there a way to see this without guessing? Let’s try applying the recursive
formula (1) again and again, simplifying each time:
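\begin{align*}
m_n &= 2m_{n-1} + 1 && \text{(by (1))} \\
&= 2(2m_{n-2} + 1) + 1 && \text{(by (1), applied to } n-1\text{)} \\
&= 4m_{n-2} + 2 + 1 \\
&= 4(2m_{n-3} + 1) + 2 + 1 && \text{(by (1), applied to } n-2\text{)} \\
&= 8m_{n-3} + 4 + 2 + 1 \\
&= 8(2m_{n-4} + 1) + 4 + 2 + 1 && \text{(by (1), applied to } n-3\text{)} \\
&= 16m_{n-4} + 8 + 4 + 2 + 1 \\
&= \cdots && \text{(keep going until you reach } m_0\text{)} \\
&= 2^n m_0 + 2^{n-1} + 2^{n-2} + \cdots + 2^0 \\
&= 2^{n-1} + 2^{n-2} + \cdots + 2^0 && \text{(since } m_0 = 0\text{)} \\
&= 2^0 + 2^1 + 2^2 + \cdots + 2^{n-1}.
\end{align*}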

I claim that the right hand side is $2^n - 1$. Why?

1.2. The Principle of Mathematical Induction


At this place, I could explain why. But I prefer not to, since there is an
easier way to prove that $m_n = 2^n - 1$ (and anyway, the above proof of
$m_n = 2^0 + 2^1 + 2^2 + \cdots + 2^{n-1}$ through a long computation was rather messy and
untrustworthy, so I would rather avoid relying on it).
This easier way uses one of the fundamental proof techniques in mathemat-
ics. This technique is called proof by induction, and it relies on the following
principle:

Theorem 1.2.1 (Principle of Mathematical Induction). Let b be an integer.


Let P (n) be a mathematical statement defined for each integer n ≥ b.
(For example, P (n) can be “n + 1 > n” or “n is even” or “n is prime” or
“there exists a prime number larger than n”. Note that not every statement
needs to be true (for example, “n is even” is true for some n’s and false for
others). So P (n) is a statement that depends on n; in logic, such a statement
is called a predicate.)
Assume the following:

1. The statement P (b) holds (i.e., the statement P (n) holds for n = b).

2. For each integer n ≥ b, the implication P(n) =⇒ P(n + 1) holds (i.e.,
if P(n) holds, then P(n + 1) does as well)3 .

Then, the statement P (n) holds for every integer n ≥ b.

3 Let me recall the meaning of the “=⇒” symbol:


If A and B are two statements, then “A =⇒ B” means the statement “if A, then B”. This
statement is true whenever B is true, but also true whenever A is false; only in the remaining
case (i.e., when A is true but B is false) is it false. In other words, its truth table is as follows:

A       B       A =⇒ B
true    true    true
true    false   false
false   true    true
false   false   true

You can think of it as a contract: “If you make A true, then I make B true”. If you don’t
make A true, then this contract places no obligation on me, since you haven’t done your
part! The only way for me to violate the contract is if you make A true but I don’t make B
true. In other words, A =⇒ B is a “relative” statement, which is true by default if A is not.
Usually, if you want to prove an implication A =⇒ B, you start by assuming that A holds,
and you need to show that B holds (under this assumption).
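
The truth table of the implication can also be generated mechanically. The following Python sketch assumes the usual encoding of "A =⇒ B" as "(not A) or B"; the function name `implies` is an illustrative choice, not notation from the text.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Truth value of "A ==> B": false only when A is true and B is false."""
    return (not a) or b


# List all four combinations in the same order as the truth table.
rows = [(a, b, implies(a, b)) for a, b in product([True, False], repeat=2)]
for a, b, value in rows:
    print(a, b, value)
```

The only row whose last entry is False is the one with A true and B false, which matches the "contract" reading of the implication.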

Before we discuss the true meaning of this principle, let me show how to use
it to prove our mn = 2n − 1 claim. We state this claim as a theorem:

Theorem 1.2.2 (explicit answer to Tower of Hanoi). For each integer n ≥ 0,
we let $m_n$ be the # of moves needed to win the Tower of Hanoi game with n
disks (or ∞ if it cannot be won).
Then,
\[ m_n = 2^n - 1 \qquad \text{for each integer } n \geq 0. \]

Proof. We denote the statement “$m_n = 2^n - 1$” by P(n). So we must prove that
P(n) holds for each integer n ≥ 0.
According to the Principle of Mathematical Induction (applied to b = 0), it
suffices (for this purpose) to show that

1. the statement P (0) holds;

2. for each integer n ≥ 0, the implication P (n) =⇒ P (n + 1) holds.

Proving these two claims will be our two goals; we call them Goal 1 and Goal
2. Let us see if we can achieve them.
Goal 1 is easy: The statement P(0) is just saying that $m_0 = 2^0 - 1$, which is
true since both sides are 0.
We now start working towards Goal 2. Let n ≥ 0 be an integer. We must
prove the implication P (n) =⇒ P (n + 1). To prove this, we assume that P (n)
holds, and we set out to prove that P (n + 1) holds.
Our assumption says that P(n) holds, i.e., that
\[ m_n = 2^n - 1. \]
In particular, $m_n$ is an integer, so that the Tower of Hanoi game for n disks is
winnable.
We need to prove that P(n + 1) holds, i.e., that
\[ m_{n+1} \overset{?}{=} 2^{n+1} - 1. \]

(The question mark above the equality sign just serves to remind us that we
have not proved this equality yet.)
Proposition 1.1.4 yields that $m_n = 2m_{n-1} + 1$ if n ≥ 1 (and if $m_{n-1}$ is not ∞).
But this is not very helpful, since we are looking for $m_{n+1}$, not for $m_n$.
However, we can also apply Proposition 1.1.4 to n + 1 instead of n (since n
is just an arbitrary integer ≥ 1 in that proposition; it is not bound to be our
current n). This gives us
\[ m_{n+1} = 2m_n + 1. \]
Thus,
\begin{align*}
m_{n+1} &= 2m_n + 1 = 2 \cdot (2^n - 1) + 1 && \text{(since } m_n = 2^n - 1\text{)} \\
&= 2 \cdot 2^n - 2 + 1 = 2 \cdot 2^n - 1 \\
&= 2^{n+1} - 1 && \text{(since } 2 \cdot 2^n = 2^{n+1} \text{ by one of the laws of exponents)}.
\end{align*}

But this is precisely the statement P(n + 1). So we have shown that P(n + 1)
holds.
More precisely, we have shown that P (n + 1) holds under the assumption
that P (n) holds. In other words, we have proved the implication P (n) =⇒
P (n + 1). This achieves Goal 2.
So we have achieved both goals, and thus the Principle of Mathematical
Induction yields that P (n) holds for every integer n ≥ 0. In other words,
$m_n = 2^n - 1$ holds for every integer n ≥ 0. This proves the theorem.
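
For small n, the theorem can also be checked by brute force, independently of the recursion. The sketch below runs a breadth-first search over all configurations of the game and returns the smallest number of moves. It assumes the standard rules (only the top disk of a peg may be moved, and never onto a smaller disk) and the goal of moving all disks from peg 1 to peg 3; the function name `hanoi_min_moves` and the encoding of a configuration as a tuple are illustrative choices.

```python
from collections import deque


def hanoi_min_moves(n: int) -> int:
    """Smallest number of moves that brings n disks from peg 0 to peg 2.

    A configuration is a tuple whose entry number d is the peg (0, 1 or 2)
    holding disk d; disk 0 is the smallest, disk n - 1 the largest.
    """
    start = (0,) * n
    goal = (2,) * n
    dist = {start: 0}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            return dist[state]
        for disk in range(n):
            src = state[disk]
            # A disk can move only if no smaller disk sits on top of it.
            if any(state[s] == src for s in range(disk)):
                continue
            for dst in range(3):
                if dst == src:
                    continue
                # A disk must not be placed onto a smaller disk.
                if any(state[s] == dst for s in range(disk)):
                    continue
                nxt = state[:disk] + (dst,) + state[disk + 1:]
                if nxt not in dist:
                    dist[nxt] = dist[state] + 1
                    queue.append(nxt)
    raise ValueError("the goal configuration is not reachable")


for n in range(7):
    assert hanoi_min_moves(n) == 2**n - 1
```

Since there are 3^n configurations, this search is only practical for small n; the induction proof is what covers all n at once.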

What have we really done here? How did this proof work? What is the logic
underlying the Principle of Mathematical Induction?
Let us take a look at the structure of our above proof.
Our goal was to prove that P (n) holds for every n ≥ 0.
In other words, our goal was to prove the statements

P (0) , P (1) , P (2) , P (3) , . . . .

This is an infinite sequence of statements.


We have proved that P (0) holds; that was our Goal 1.
We have then proved that P (n) =⇒ P (n + 1) for each n. In other words, we
have proved that each statement in our sequence implies the next. In particular,
P (0) =⇒ P (1) and P (1) =⇒ P (2) and P (2) =⇒ P (3) and so on.
Combining P (0) with P (0) =⇒ P (1), we obtain P (1).
Combining P (1) with P (1) =⇒ P (2), we obtain P (2).
Combining P (2) with P (2) =⇒ P (3), we obtain P (3).
And so on. Continuing this logic, you obtain P (4), then P (5), then P (6),
and so on. By common sense, it is clear that if you keep going on like this,
you will eventually reach each statement in our infinite sequence; i.e., you will
obtain P (n) for any given integer n ≥ 0. Of course, this reasoning is informal
(“common sense” is not a mathematical concept, nor are the words “and so
on”).
Thus, if we want to use this kind of reasoning in a mathematical proof, we
need to state it as a precise principle and we need this principle to be true. The
Principle of Mathematical Induction is doing precisely that.

Remark 1.2.3. You can metaphorically think of our proof (or any proof using
the Principle of Mathematical Induction) as an infinite daisy chain of lamps,
which stand for the statements P (0) , P (1) , P (2) , . . .: Goal 1 turns the first
lamp on, whereas Goal 2 ensures that each lamp turns the next on when it is
turned on itself.
Or, to use a more commonplace illustration, you have an infinite sequence
of dominos arranged in a row, at sufficiently close distances so that tipping
over one domino will tip over the next. After you tip over the first domino,
all the dominos will eventually fall down. (The dominos here stand for the
statements P (0) , P (1) , P (2) , . . ..)

I called the Principle of Mathematical Induction a theorem, but I will not
prove it, since it is one of the fundamental axioms of mathematics. You can at
best replace it by a different axiom, but this doesn’t change much; you need
some kind of axiom that allows you to “chain together” arbitrarily many little
implications.

1.3. Some more proofs by induction


A proof that uses the Principle of Mathematical Induction is called a proof by
induction (or an induction proof, or an inductive proof4 ). So our above proof
of Theorem 1.2.2 was a proof by induction.

1.3.1. The sum of the first n positive integers


Let us see another (simpler) example of a proof by induction. We will prove
the following result:

Theorem 1.3.1 (“Little Gauss formula”). For every integer n ≥ 0, we have
\[ 1 + 2 + \cdots + n = \frac{n(n+1)}{2}. \]

The LHS (= left hand side) here is understood to be the sum of the first n
positive integers. For n = 0, this sum is an empty sum (i.e., it has no addends
at all), so its value is 0 by definition.
4 This has barely anything to do with “inductive reasoning” as understood by philosophers
(known to mathematics as “generalization”, and not considered as a method of proof per
se).

First proof of Theorem 1.3.1. We set
\[ s_n := 1 + 2 + \cdots + n \]
for each n ≥ 0. Thus, we must prove that $s_n = \frac{n(n+1)}{2}$ for each n ≥ 0.
Let us denote the statement “$s_n = \frac{n(n+1)}{2}$” by P(n). So we need to prove
that P(n) holds for every n ≥ 0.
According to the Principle of Mathematical Induction, it suffices to show that

1. the statement P (0) holds;

2. for each n ≥ 0, the implication P (n) =⇒ P (n + 1) holds.

Goal 1 is easy: To prove P(0), we must show that $s_0 = \frac{0(0+1)}{2}$, but this is
true because both sides equal 0.
Now to Goal 2. We let n ≥ 0 be an integer, and we want to prove the
implication P (n) =⇒ P (n + 1). So we assume that P (n) holds, and we set out
to prove P (n + 1).
By assumption, P(n) holds, so that we have
\[ s_n = \frac{n(n+1)}{2}. \]
We must prove P(n + 1); in other words, we must prove that
\[ s_{n+1} \overset{?}{=} \frac{(n+1)((n+1)+1)}{2}. \]
To do so, we observe that
\begin{align*}
s_{n+1} &= 1 + 2 + \cdots + (n+1) = \underbrace{(1 + 2 + \cdots + n)}_{= s_n} + (n+1) \\
&= s_n + (n+1) = \frac{n(n+1)}{2} + (n+1) && \left(\text{since } s_n = \frac{n(n+1)}{2}\right) \\
&= \frac{n(n+1)}{2} + \frac{2(n+1)}{2} = \frac{(n+2)(n+1)}{2} = \frac{(n+1)(n+2)}{2} \\
&= \frac{(n+1)((n+1)+1)}{2}.
\end{align*}
In other words, P (n + 1) holds. Thus, we have proved the implication P (n) =⇒
P ( n + 1).
We have now achieved both goals, so the Principle of Mathematical Induction
yields that P (n) holds for every n ≥ 0. This proves the theorem.
There is also a non-inductive proof; this is how Gauss supposedly did it:

Second proof of Theorem 1.3.1. We have
\begin{align*}
2 \cdot (1 + 2 + \cdots + n)
&= (1 + 2 + \cdots + n) + (1 + 2 + \cdots + n) \\
&= (1 + 2 + \cdots + n) + (n + (n-1) + \cdots + 1) \\
&\qquad \text{(here, we turned the second sum upside-down, i.e., we reversed the order of its addends)} \\
&= \underbrace{(1 + n)}_{= n+1} + \underbrace{(2 + (n-1))}_{= n+1} + \cdots + \underbrace{(n + 1)}_{= n+1} \\
&\qquad \text{(here, we rearranged the sum by matching up each addend inside the first pair of} \\
&\qquad \text{parentheses with the corresponding addend inside the second pair of parentheses)} \\
&= \underbrace{(n+1) + (n+1) + \cdots + (n+1)}_{n \text{ addends}} \\
&= n \cdot (n+1).
\end{align*}
Dividing this by 2, we find
\[ 1 + 2 + \cdots + n = \frac{n \cdot (n+1)}{2}, \]
and thus Theorem 1.3.1 is proved again.
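
Both proofs can be accompanied by a numerical check for small n. The following Python sketch compares the sum with the closed form and also replays the pairing step of the second proof for one value of n; the names `sum_first`, `gauss` and `pairs` are illustrative and not taken from the text.

```python
def sum_first(n: int) -> int:
    """The sum 1 + 2 + ... + n (an empty sum, hence 0, for n = 0)."""
    return sum(range(1, n + 1))


def gauss(n: int) -> int:
    """The closed form n(n + 1)/2; the product n(n + 1) is always even."""
    return n * (n + 1) // 2


# Check the formula for n = 0, 1, ..., 99.
for n in range(100):
    assert sum_first(n) == gauss(n)

# The pairing step of the second proof for n = 10:
# the k-th addend of the first sum is matched with the k-th addend of the
# reversed sum, and every such pair adds up to n + 1.
n = 10
pairs = [k + (n + 1 - k) for k in range(1, n + 1)]
print(pairs)  # [11, 11, 11, 11, 11, 11, 11, 11, 11, 11]
assert sum(pairs) == n * (n + 1) == 2 * sum_first(n)
```

Such a check only covers finitely many values of n, so it illustrates the theorem without replacing either proof.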

1.3.2. The sum of the squares of the first n positive integers


Here is a similar theorem:

Theorem 1.3.2. For every integer n ≥ 0, we have
\[ 1^2 + 2^2 + \cdots + n^2 = \frac{n(n+1)(2n+1)}{6}. \]

Proof. The following proof is almost a word-by-word copy of the first proof of
Theorem 1.3.1. The structure is the same; only the calculations change.
We set
\[ s_n := 1^2 + 2^2 + \cdots + n^2. \]
Thus, we must prove that $s_n = \frac{n(n+1)(2n+1)}{6}$ for each n ≥ 0.
Let us denote the statement “$s_n = \frac{n(n+1)(2n+1)}{6}$” by P(n). So we need
to prove that P(n) holds for every n ≥ 0.
According to the Principle of Mathematical Induction, it suffices to show that

1. the statement P (0) holds;

2. for each n ≥ 0, the implication P (n) =⇒ P (n + 1) holds.

Goal 1 is easy: To prove P(0), we must show that $s_0 = \frac{0(0+1)(2 \cdot 0+1)}{6}$,
but this is true because both sides equal 0.
Now to Goal 2. We let n ≥ 0 be an integer, and we want to prove the
implication P (n) =⇒ P (n + 1). So we assume that P (n) holds, and we set out
to prove P (n + 1).
By assumption, P(n) holds, so that we have
\[ s_n = \frac{n(n+1)(2n+1)}{6}. \]
We must prove P(n + 1); in other words, we must prove that
\[ s_{n+1} \overset{?}{=} \frac{(n+1)((n+1)+1)(2(n+1)+1)}{6}. \]
To do so, we observe that
\begin{align*}
s_{n+1} &= 1^2 + 2^2 + \cdots + (n+1)^2 \\
&= \underbrace{\left(1^2 + 2^2 + \cdots + n^2\right)}_{= s_n} + (n+1)^2 \\
&= s_n + (n+1)^2 \\
&= \frac{n(n+1)(2n+1)}{6} + (n+1)^2 && \left(\text{since } s_n = \frac{n(n+1)(2n+1)}{6}\right) \\
&= (n+1) \cdot \left( \frac{n(2n+1)}{6} + (n+1) \right) \\
&= (n+1) \cdot \frac{2n^2 + 7n + 6}{6} \\
&= \frac{(n+1)\left(2n^2 + 7n + 6\right)}{6} \\
&= \frac{(n+1)(n+2)(2n+3)}{6} && \text{(since } 2n^2 + 7n + 6 \text{ can be factored as } (n+2)(2n+3)\text{)} \\
&= \frac{(n+1)((n+1)+1)(2(n+1)+1)}{6}.
\end{align*}
In other words, P (n + 1) holds. Thus, we have proved the implication P (n) =⇒
P ( n + 1).
We have now achieved both goals, so the Principle of Mathematical Induction
yields that P (n) holds for every n ≥ 0. This proves the theorem.
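
The calculations in this proof are easy to get wrong by hand, so a numerical check of the individual steps can be reassuring. The Python sketch below tests the formula itself, the induction step, and the factorization of 2n^2 + 7n + 6 for small n; the function names are illustrative choices.

```python
def sum_squares(n: int) -> int:
    """The sum 1^2 + 2^2 + ... + n^2 (an empty sum, hence 0, for n = 0)."""
    return sum(k * k for k in range(1, n + 1))


def closed_form(n: int) -> int:
    """The closed form n(n + 1)(2n + 1)/6; the numerator is divisible by 6."""
    return n * (n + 1) * (2 * n + 1) // 6


for n in range(100):
    # The statement P(n) itself.
    assert sum_squares(n) == closed_form(n)
    # The induction step: adding (n + 1)^2 turns the formula for n
    # into the formula for n + 1.
    assert closed_form(n) + (n + 1) ** 2 == closed_form(n + 1)
    # The factorization used in the computation.
    assert 2 * n * n + 7 * n + 6 == (n + 2) * (2 * n + 3)

print(sum_squares(4), closed_form(4))  # 30 30
```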

As we said, our above proof of Theorem 1.3.2 was an almost verbatim copy of
our first proof of Theorem 1.3.1; we only needed to make the obvious changes
and calculate a little bit harder. Both proofs were more or less determined by
the idea to use induction. In contrast, the slick second proof of Theorem 1.3.1
cannot be adapted to Theorem 1.3.2. So the induction proof has the advantage
of better generalizability.
However, it has the disadvantage that it can only be used to prove a formula
(in our case, $1 + 2 + \cdots + n = \frac{n(n+1)}{2}$ or
$1^2 + 2^2 + \cdots + n^2 = \frac{n(n+1)(2n+1)}{6}$),
not to find this formula in the first place. We could not have used induction to
answer the question “what is 1 + 2 + · · · + n?”; we could only use it to prove
the answer after guessing it in some way.

Exercise 1.3.1. Prove that
\[ 1^3 + 2^3 + \cdots + n^3 = \left( \frac{n(n+1)}{2} \right)^2 \]
for each nonnegative integer n. (The left hand side here is the sum of the
cubes of the first n positive integers.)

1.4. Notations for an induction proof


Here is some standard terminology that is commonly used in proofs by induc-
tion. Let’s say that you are proving a statement of the form P (n) for every
integer n ≥ b (where b is some fixed integer).

• The n is called the induction variable; you say that you induct on n. It
does not have to be called n. Your statement might just as well be “for
every integer a ≥ 0, we have $1 + 2 + \cdots + a = \frac{a(a+1)}{2}$”, and then you
can prove it by inducting on a.

• The proof of P(b) (that is, Goal 1 in our above proofs) is called the
induction base or the base case. In our above examples, this was always
the proof of P(0), but in general b can be another integer. (For example,
if you are proving the statement “every integer n ≥ 4 satisfies $2^n \geq n^2$”,
then b will have to be 4, so your induction base consists in proving that
$2^4 \geq 4^2$.)

• The proof of “P(n) =⇒ P(n + 1) for every n ≥ b” (that is, Goal 2
in our above proofs) is called the induction step. For example, in the
proof of Theorem 1.3.2, this was the part where we assumed that
$s_n = \frac{n(n+1)(2n+1)}{6}$ and proved that
$s_{n+1} = \frac{(n+1)((n+1)+1)(2(n+1)+1)}{6}$.

  In the induction step, the assumption that P(n) holds is called the induction
  hypothesis or the induction assumption, and the claim that P(n + 1)
  holds (this is the claim that you are trying to prove) is called the induction
  goal. The induction step is complete when the induction goal is reached
  (i.e., proved).
As an example, let us rewrite our above proof of Theorem 1.2.2 using this
language:
Proof of Theorem 1.2.2, rewritten. We induct on n.
Base case: The theorem5 holds for n = 0, since both $m_0$ and $2^0 - 1$ equal 0.
Induction step: Let n ≥ 0 be an integer. We assume that the theorem holds for
n (this is what we previously called P(n)). We will now show that the theorem
holds for n + 1 as well (this is what we previously called P(n + 1)).
We have assumed that the theorem holds for n. In other words, $m_n = 2^n - 1$.
This is our induction hypothesis.
We must prove that the theorem holds for n + 1. In other words, we must
prove that $m_{n+1} \overset{?}{=} 2^{n+1} - 1$.
To prove this, we apply Proposition 1.1.4 to n + 1 instead of n (we can do
this, since $m_n = 2^n - 1$ is not ∞). This gives us
\begin{align*}
m_{n+1} &= 2m_n + 1 = 2 \cdot (2^n - 1) + 1 && \text{(since } m_n = 2^n - 1 \text{ by the induction hypothesis)} \\
&= 2 \cdot 2^n - 2 + 1 = 2 \cdot 2^n - 1 = 2^{n+1} - 1 && \text{(since } 2 \cdot 2^n = 2^{n+1}\text{)}.
\end{align*}

Thus, the induction goal is reached, and the induction is complete. Hence, the
theorem is proved.

1.5. The Fibonacci numbers


1.5.1. Definition
Our next applications of induction will be some properties of the Fibonacci
sequence. The Fibonacci sequence is defined recursively – i.e., a given en-
try is not defined directly, but rather defined in terms of the previous entries.
Specifically, it is defined as follows:

Definition 1.5.1. The Fibonacci sequence is the sequence $(f_0, f_1, f_2, \ldots)$ of
nonnegative integers defined recursively by setting
\[ f_0 = 0, \qquad f_1 = 1, \qquad \text{and} \qquad f_n = f_{n-1} + f_{n-2} \quad \text{for each } n \geq 2. \]

5 i.e., Theorem 1.2.2



In other words, the Fibonacci sequence starts with the two entries 0 and 1,
and then every next entry is the sum of the two previous entries.
The entries of the Fibonacci sequence are called the Fibonacci numbers. Let
us compute the first fourteen of them:

n     0  1  2  3  4  5  6   7   8   9  10  11   12   13
f_n   0  1  1  2  3  5  8  13  21  34  55  89  144  233

As we see, a recursive definition is a perfectly valid way to define (e.g.) a
sequence of numbers. It allows you to compute each entry of the sequence
eventually, as long as you compute the entries in order (i.e., first $f_0$, then $f_1$,
then $f_2$, and so on). In a sense, the reason why this works is the same as the
reason why induction works: You can get to any integer n ≥ 0 if you start at 0
and keep adding 1.
Note that it is important that our recursive definition of $f_n$ uses only previous
entries of the sequence (in our case, $f_{n-1}$ and $f_{n-2}$). If we had instead defined
the Fibonacci sequence by
\[ f_n = f_{n+1} - f_{n-1}, \]
then we could not even compute $f_2$, since this would require knowing $f_3$, which
would in turn require knowing $f_4$, and so on.
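
Computing the entries in order is exactly what a simple loop does. The following Python sketch builds the list of the first Fibonacci numbers directly from Definition 1.5.1; the function name `fibonacci` is an illustrative choice. Each new entry only reads entries that are already in the list.

```python
def fibonacci(count: int) -> list:
    """Return the list [f_0, f_1, ..., f_{count-1}] of Fibonacci numbers."""
    fs = []
    for n in range(count):
        if n == 0:
            fs.append(0)  # f_0 = 0
        elif n == 1:
            fs.append(1)  # f_1 = 1
        else:
            # f_n = f_{n-1} + f_{n-2}; both entries have been computed already
            fs.append(fs[n - 1] + fs[n - 2])
    return fs


print(fibonacci(14))
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233]
```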

1.5.2. The sum of the first n positive Fibonacci numbers


The Fibonacci sequence is famous for its many properties and patterns6 . Here
is a first one:

Theorem 1.5.2. For any integer n ≥ 0, we have
\[ f_1 + f_2 + \cdots + f_n = f_{n+2} - 1. \]

For example, for n = 8, this is saying that

1 + 1 + 2 + 3 + 5 + 8 + 13 + 21 = 55 − 1.

Proof of Theorem 1.5.2. We induct on n.


Base case: For n = 0, the theorem claims that $f_1 + f_2 + \cdots + f_0 = f_{0+2} - 1$.
This is true, since the LHS7 is an empty sum (thus = 0) whereas the RHS is
$f_2 - 1 = 1 - 1 = 0$.
Induction step: Let n ≥ 0 be an integer. Assume that the theorem holds for n.
We must prove that the theorem holds for n + 1.
6 There is an entire book about it (Vorobiev’s [Vorobi02], which I can recommend).
7 Recall that the abbreviations “LHS” and “RHS” mean “left hand side” and “right hand
side”, respectively.

So we assumed that
\[ f_1 + f_2 + \cdots + f_n = f_{n+2} - 1. \]
We must prove that
\[ f_1 + f_2 + \cdots + f_{n+1} \overset{?}{=} f_{(n+1)+2} - 1. \]
We have
\begin{align*}
f_1 + f_2 + \cdots + f_{n+1} &= (f_1 + f_2 + \cdots + f_n) + f_{n+1} \\
&= f_{n+2} - 1 + f_{n+1} && \text{(by our induction hypothesis)} \\
&= f_{n+2} + f_{n+1} - 1 \\
&= f_{n+3} - 1 && \text{(since the recursive definition of the Fibonacci sequence yields } f_{n+3} = f_{n+2} + f_{n+1}\text{)} \\
&= f_{(n+1)+2} - 1 && \text{(since } n + 3 = (n+1) + 2\text{)}.
\end{align*}

This is precisely what we wanted to prove – i.e., it says that the theorem holds
for n + 1. This completes the induction step. Thus, the theorem is proved.
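
The identity can be tested numerically for small n. The Python sketch below uses a helper `fib` (an illustrative name) that computes a single Fibonacci number by stepping through the recursion.

```python
def fib(n: int) -> int:
    """Return the Fibonacci number f_n (with f_0 = 0 and f_1 = 1)."""
    a, b = 0, 1  # a = f_0, b = f_1
    for _ in range(n):
        a, b = b, a + b  # shift the pair (f_k, f_{k+1}) to (f_{k+1}, f_{k+2})
    return a


# Theorem 1.5.2 for n = 0, 1, ..., 29 (the sum is empty for n = 0).
for n in range(30):
    assert sum(fib(k) for k in range(1, n + 1)) == fib(n + 2) - 1

# The example n = 8 from the text: both sides equal 54.
print(sum(fib(k) for k in range(1, 9)), fib(10) - 1)  # 54 54
```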

1.6. Some more examples of induction


Let us see some more examples of proofs by induction. The following theorem
I have already mentioned at the end of Section 1.1:

Theorem 1.6.1. For any integer n ≥ 0, we have
\[ 2^0 + 2^1 + 2^2 + \cdots + 2^{n-1} = 2^n - 1. \]

Proof. We induct on n.
Base case: For n = 0, the equality $2^0 + 2^1 + 2^2 + \cdots + 2^{n-1} = 2^n - 1$ is true,
because the LHS8 is an empty sum and thus equals 0, whereas the RHS is
$2^0 - 1 = 1 - 1 = 0$.
Induction step: Let n be an integer ≥ 0. Assume that Theorem 1.6.1 holds for
n, i.e., that we have
\[ 2^0 + 2^1 + 2^2 + \cdots + 2^{n-1} = 2^n - 1. \]
We must prove that Theorem 1.6.1 holds for n + 1 as well, i.e., that we have
\[ 2^0 + 2^1 + 2^2 + \cdots + 2^{(n+1)-1} = 2^{n+1} - 1. \]
However,
\begin{align*}
2^0 + 2^1 + 2^2 + \cdots + 2^{(n+1)-1} &= 2^0 + 2^1 + 2^2 + \cdots + 2^n \\
&= \left(2^0 + 2^1 + 2^2 + \cdots + 2^{n-1}\right) + 2^n \\
&= 2^n - 1 + 2^n && \text{(by the induction hypothesis)} \\
&= 2 \cdot 2^n - 1 = 2^{n+1} - 1 && \text{(since } 2 \cdot 2^n = 2^{n+1}\text{)},
\end{align*}
which is precisely what we want: This shows that Theorem 1.6.1 holds for n + 1.
Thus, our induction step is complete, and Theorem 1.6.1 is proved.

8 “LHS” means “left-hand side”. Likewise, “RHS” means “right-hand side”.
Theorem 1.6.1 can be generalized:
Theorem 1.6.2. Let x and y be any two numbers. Then, for any integer n ≥ 0,
we have
\[ (x - y) \left( x^{n-1} + x^{n-2} y + x^{n-3} y^2 + \cdots + x^2 y^{n-3} + x y^{n-2} + y^{n-1} \right) = x^n - y^n. \]
Here, the big sum in the parentheses is the sum of all products $x^i y^j$ where i
and j are nonnegative integers with i + j = n − 1.

Before we prove this, let us give some examples for what this theorem actu-
ally says:
• For n = 2, Theorem 1.6.2 says that
\[ (x - y)(x + y) = x^2 - y^2. \]

• For n = 3, Theorem 1.6.2 says that
\[ (x - y)\left(x^2 + xy + y^2\right) = x^3 - y^3. \]

• For n = 4, Theorem 1.6.2 says that
\[ (x - y)\left(x^3 + x^2 y + x y^2 + y^3\right) = x^4 - y^4. \]

• For x = 2 and y = 1, Theorem 1.6.2 says that
\[ (2 - 1)\left(2^{n-1} + 2^{n-2} \cdot 1 + 2^{n-3} \cdot 1^2 + \cdots + 2^2 \cdot 1^{n-3} + 2 \cdot 1^{n-2} + 1^{n-1}\right) = 2^n - 1^n. \]
Since any power of 1 is 1 (and since the 2 − 1 factor also equals 1), this
simplifies to
\[ 2^{n-1} + 2^{n-2} + 2^{n-3} + \cdots + 2^2 + 2 + 1 = 2^n - 1, \]
which is precisely Theorem 1.6.1. Thus, Theorem 1.6.2 generalizes Theorem 1.6.1.

Let us now prove Theorem 1.6.2:


Proof of Theorem 1.6.2. We induct on n.
Base case: For n = 0, the claim
\[ (x - y) \left( x^{n-1} + x^{n-2} y + x^{n-3} y^2 + \cdots + x^2 y^{n-3} + x y^{n-2} + y^{n-1} \right) = x^n - y^n \]
is true, since the LHS is 0 (because the second factor is an empty sum), while
the RHS is $x^0 - y^0 = 1 - 1 = 0$ as well.
Induction step: Let n ≥ 0 be an integer. Assume that Theorem 1.6.2 is true for
n. That is, assume that
\[ (x - y) \left( x^{n-1} + x^{n-2} y + x^{n-3} y^2 + \cdots + x^2 y^{n-3} + x y^{n-2} + y^{n-1} \right) = x^n - y^n. \]
We must prove that Theorem 1.6.2 is also true for n + 1. That is, we must prove
that
\[ (x - y) \left( x^n + x^{n-1} y + x^{n-2} y^2 + \cdots + x^3 y^{n-3} + x^2 y^{n-2} + x y^{n-1} + y^n \right) = x^{n+1} - y^{n+1}. \]
We begin by extracting the $y^n$ addend from the long sum in the second pair
of parentheses in this equation. We thus obtain
\begin{align*}
& (x - y) \left( x^n + x^{n-1} y + x^{n-2} y^2 + \cdots + x^3 y^{n-3} + x^2 y^{n-2} + x y^{n-1} + y^n \right) \\
&= (x - y) \left( x^n + x^{n-1} y + x^{n-2} y^2 + \cdots + x^3 y^{n-3} + x^2 y^{n-2} + x y^{n-1} \right) + (x - y) y^n \\
&= (x - y) \left( x^{n-1} + x^{n-2} y + x^{n-3} y^2 + \cdots + x^2 y^{n-3} + x y^{n-2} + y^{n-1} \right) x + (x - y) y^n \\
&\qquad \text{(here, we have factored out an } x \text{ from the sum)} \\
&= (x^n - y^n) x + (x - y) y^n \qquad \text{(by the induction hypothesis)} \\
&= x^{n+1} - x y^n + x y^n - y^{n+1} = x^{n+1} - y^{n+1}.
\end{align*}

This means precisely that Theorem 1.6.2 is also true for n + 1. Thus, the induc-
tion step is complete, and the theorem is proved.
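
Since Theorem 1.6.2 holds for any two numbers x and y, it can in particular be tested on small integers, where Python computes exactly. In the sketch below, `big_sum` (an illustrative name) is the sum of all products x^i y^j with i + j = n − 1.

```python
def big_sum(x: int, y: int, n: int) -> int:
    """Sum of all x**i * y**j with i, j >= 0 and i + j = n - 1.

    For n = 0 there are no such pairs, so the sum is empty and equals 0.
    """
    return sum(x ** (n - 1 - j) * y ** j for j in range(n))


# Check (x - y) * big_sum(x, y, n) == x^n - y^n on a small grid of integers.
for x in range(-4, 5):
    for y in range(-4, 5):
        for n in range(8):
            assert (x - y) * big_sum(x, y, n) == x ** n - y ** n

# The case n = 3, x = 5, y = 2: (5 - 2)(25 + 10 + 4) = 125 - 8.
print((5 - 2) * big_sum(5, 2, 3), 5 ** 3 - 2 ** 3)  # 117 117
```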
Another useful particular case of Theorem 1.6.2 is the following equality:9

Corollary 1.6.3. Let q be a number distinct from 1. Let n ≥ 0 be an integer.
Then,
\[ q^0 + q^1 + q^2 + \cdots + q^{n-1} = \frac{q^n - 1}{q - 1}. \]

9 A “corollary” means a theorem that follows easily from another theorem.

Proof. Apply Theorem 1.6.2 to x = q and y = 1. We obtain
\[ (q - 1) \left( q^{n-1} + q^{n-2} \cdot 1 + q^{n-3} \cdot 1^2 + \cdots + q^2 \cdot 1^{n-3} + q \cdot 1^{n-2} + 1^{n-1} \right) = q^n - 1^n. \]
Simplifying this, we obtain
\[ (q - 1) \left( q^{n-1} + q^{n-2} + q^{n-3} + \cdots + q^2 + q + 1 \right) = q^n - 1. \]
Thus (dividing both sides by q − 1, which is nonzero because q is distinct from 1),
\[ q^{n-1} + q^{n-2} + q^{n-3} + \cdots + q^2 + q + 1 = \frac{q^n - 1}{q - 1}. \]
In other words,
\[ q^0 + q^1 + q^2 + \cdots + q^{n-1} = \frac{q^n - 1}{q - 1} \]
(since the sum on the left hand side can be rearranged in any order). This
proves Corollary 1.6.3.
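
Corollary 1.6.3 involves a division, so a numerical check is best done with exact rational arithmetic rather than floating-point numbers. The following Python sketch uses `fractions.Fraction` from the standard library; the function names `geometric_sum` and `closed_form` are illustrative.

```python
from fractions import Fraction


def geometric_sum(q: Fraction, n: int) -> Fraction:
    """The sum q^0 + q^1 + ... + q^(n-1) (an empty sum, hence 0, for n = 0)."""
    return sum((q ** k for k in range(n)), Fraction(0))


def closed_form(q: Fraction, n: int) -> Fraction:
    """The closed form (q^n - 1)/(q - 1); requires q != 1."""
    return (q ** n - 1) / (q - 1)


# Check the corollary for a few rational values of q distinct from 1.
for q in [Fraction(3, 2), Fraction(-2), Fraction(1, 3), Fraction(0), Fraction(5)]:
    for n in range(12):
        assert geometric_sum(q, n) == closed_form(q, n)

# Example: 1 + 3/2 + 9/4 = 19/4.
print(geometric_sum(Fraction(3, 2), 3))  # 19/4
```

For q = 1 the closed form would divide by zero, which is why the corollary excludes this case.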

Exercise 1.6.1. Let N denote the set of all nonnegative integers (that is,
{0, 1, 2, . . .}). Let q and d be two real numbers such that q ≠ 1. Let
$(a_0, a_1, a_2, \ldots)$ be a sequence of real numbers. Assume that
\[ a_{n+1} = q a_n + d \qquad \text{for each } n \in N. \tag{2} \]
Prove that
\[ a_n = q^n a_0 + d \, \frac{q^n - 1}{q - 1} \qquad \text{for each } n \in N. \tag{3} \]

1.7. How not to use induction


Induction proofs can be slippery:

Theorem 1.7.1 (Fake theorem). In any set of n ≥ 1 horses, all the horses are
the same color.

Proof. We induct on n.
Base case: This is clearly true for n = 1, since a single horse always has the
same color as itself.
Induction step: Let n ≥ 1 be an integer. We assume that the theorem holds for
n, i.e., that any n horses are the same color.
We must prove that it also holds for n + 1, i.e., that any n + 1 horses are the
same color.
So let H1 , H2 , . . . , Hn+1 be n + 1 horses.
By our induction hypothesis, the first n horses H1 , H2 , . . . , Hn are the same
color.

Again by our induction hypothesis, the last n horses H2 , H3 , . . . , Hn+1 are the
same color.
Now, consider the first horse H1 and the last horse Hn+1 . They both have the
same color as the “middle horses” H2 , H3 , . . . , Hn (according to the preceding
two paragraphs). Thus, all the n + 1 horses have the same color, right?
When a claim is as obviously wrong as this one, there is an easy way to find
the mistake in the proof: You just look at some example in which the claim is
wrong, and you trace the proof on this example. The first time you see a wrong
conclusion, that’s where the error probably is.
Theorem 1.7.1 is wrong for n = 2 already, i.e., for two horses. So let us see
where the induction step goes wrong when n = 1 (that is, going from 1 horse
to 2 horses). In this induction step, we claim that H1 and Hn+1 = H2 both have
the same color as the “middle horses” H2 , H3 , . . . , H1 . But there are no “middle
horses”, so it makes no sense to have the same color as these “middle horses”.
So the argument doesn’t work.
Thus, our mistake was to implicitly treat the “middle horses” as if they ex-
isted. They do exist for any n > 1, but not for n = 1, and thus our induction
step breaks down for n = 1.
Note how one little mistake has brought down the entire proof! For an in-
duction proof to work, the induction step needs to work for all n; that is, we
need the implication P (n) =⇒ P (n + 1) to hold for every n. If even one of
these implications breaks down, the whole chain is disconnected, and all the
statements P (n) “to the right of” this breaking point are no longer guaranteed
to hold. For example, if we have a statement P (n) for each n ≥ 0, and we have
proved the base case P(0) and the implication P(n) =⇒ P(n + 1) for all n ≠ 4,
then we can conclude that P (0) , P (1) , P (2) , P (3) and P (4) hold, but we
cannot guarantee that any of P (5) , P (6) , P (7) , . . . hold. As so often, a chain
is only as strong as its weakest link.
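
The "chain" picture can be made concrete with a small simulation. The Python sketch below is only an illustration of the bookkeeping, under the assumption that the base case P(b) has been proved: given a function `step_proved` that tells for which n the implication P(n) =⇒ P(n + 1) has been proved, it lists the statements that are guaranteed to hold. All names are hypothetical.

```python
def guaranteed(b: int, step_proved, upto: int) -> list:
    """List the n in {b, ..., upto} for which P(n) is guaranteed to hold.

    Assumes that the base case P(b) has been proved, and that
    step_proved(n) is True exactly when P(n) ==> P(n + 1) has been proved.
    """
    result = []
    n = b
    while n <= upto:
        result.append(n)        # P(n) has been reached along the chain
        if not step_proved(n):  # the chain breaks between n and n + 1
            break
        n += 1
    return result


# The example from the text: the step is proved for all n except n = 4.
print(guaranteed(0, lambda n: n != 4, 10))  # [0, 1, 2, 3, 4]

# The horses: base case n = 1, but the step from 1 to 2 fails.
print(guaranteed(1, lambda n: n != 1, 10))  # [1]
```

In the horse example, the only statement that survives is the base case itself, which matches the fact that Theorem 1.7.1 is wrong for n = 2 already.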

1.8. More on the Fibonacci numbers


Recall the Fibonacci sequence, which we defined in Definition 1.5.1. We recall
that it is the sequence $(f_0, f_1, f_2, \ldots)$ of nonnegative integers defined recursively
by setting $f_0 = 0$ and $f_1 = 1$ and $f_n = f_{n-1} + f_{n-2}$ for each n ≥ 2.
The entries of the Fibonacci sequence are called the Fibonacci numbers. Here
are the first few:

n     0  1  2  3  4  5  6   7   8   9  10  11   12   13
f_n   0  1  1  2  3  5  8  13  21  34  55  89  144  233

We proved a first property of Fibonacci numbers (Theorem 1.5.2) a while
ago. In this section, we shall prove some deeper properties of the Fibonacci
sequence.

As a warm-up, we begin with two (inconsequential but neat) identities:

Exercise 1.8.1. (a) Prove that every nonnegative integer n satisfies
\[ f_1 + f_3 + f_5 + \cdots + f_{2n-1} = f_{2n}. \]
(The left hand side is the sum of all $f_{2i-1}$ with i ∈ {1, 2, . . . , n}.)
(b) Prove that every nonnegative integer n satisfies
\[ f_0 + f_2 + f_4 + \cdots + f_{2n} = f_{2n+1} - 1. \]
(The left hand side is the sum of all $f_{2i}$ with i ∈ {0, 1, . . . , n}.)

1.8.1. The addition theorem


The next theorem is one of the most important properties of the Fibonacci se-
quence.

Theorem 1.8.1 (addition theorem for Fibonacci numbers). We have
\[ f_{n+m+1} = f_n f_m + f_{n+1} f_{m+1} \qquad \text{for all integers } n, m \geq 0. \]

Proof. Can you induct on two variables at the same time? Not directly (although
you can induct on n and then induct on m in the induction step, so that you
have one induction proof inside another). Fortunately, we don’t need to do this
here. It suffices to induct on one of the variables.
To be specific, let us induct on n. To that purpose, for every integer n ≥ 0,
we define the statement P(n) to say
“for all integers m ≥ 0, we have $f_{n+m+1} = f_n f_m + f_{n+1} f_{m+1}$”.
(Don’t forget the “for all integers m ≥ 0” part! The statement P(n) is not
just a single equality $f_{n+m+1} = f_n f_m + f_{n+1} f_{m+1}$ for some specific value of
m, but rather combines infinitely many such equalities, one for each integer
m ≥ 0. If we fixed a value of m and defined P(n) to be just the single equality
$f_{n+m+1} = f_n f_m + f_{n+1} f_{m+1}$, then the induction proof below would not work,
because we are going to apply the induction hypothesis to a different m than
we start with.)
We shall now prove this statement P (n) for all n ≥ 0 by induction on n.
Base case: We must prove P(0). In other words, we must prove that
“for all integers m ≥ 0, we have $f_{0+m+1} = f_0 f_m + f_{0+1} f_{m+1}$”.
This is easy to show: For all integers m ≥ 0, we have $f_{0+m+1} = f_{m+1}$ and
$f_0 f_m + f_{0+1} f_{m+1} = 0 f_m + 1 f_{m+1} = f_{m+1}$ (since $f_0 = 0$ and $f_{0+1} = f_1 = 1$),
so the two sides are equal.

Induction step: Let n ≥ 0 be an integer. We assume that P (n) holds. We must


show that P (n + 1) holds.
Our induction hypothesis says that P (n) holds, i.e., that
“for all integers m ≥ 0, we have f n+m+1 = f n f m + f n+1 f m+1 ” holds.
We must prove that P (n + 1) holds, i.e., that
“for all integers m ≥ 0, we have f n+1+m+1 = f n+1 f m + f n+1+1 f m+1 ” holds.
To prove this, we let m ≥ 0 be an integer. Then,
f n +1 f m + f 1+1 f m +1
|n +
{z }
= f n +2
= f n +1 + f n
(by the recursive
definition of the
Fibonacci numbers)
= f n +1 f m + ( f n +1 + f n ) f m +1
= f n +1 f m + f n +1 f m +1 + f n f m +1
= f n +1 ( f m + f m +1 ) + f n f m +1
| {z }
= f m +1 + f m
= f m +2
(by the recursive
definition of the
Fibonacci numbers)
= f n +1 f m +2 + f n f m +1 = f n f m +1 + f n +1 f m +2 . (4)
Now, recall that the induction hypothesis says that P (n) holds, i.e., that
“for all integers m ≥ 0, we have f n+m+1 = f n f m + f n+1 f m+1 ” holds.
Note that the m in this statement is a bound variable, i.e., it has nothing to do
with the m that we have fixed; it just happens to have the same name. Thus,
we are free to apply our induction hypothesis P (n) not to the current m, but to
any other m as well. In particular, we can apply it to m + 1 instead of m. Thus,
we obtain
\[
f_{n+m+1+1} = f_n f_{m+1} + f_{n+1} f_{m+1+1}.
\]
(See footnote 10.) This can be trivially simplified to
\[
f_{n+m+2} = f_n f_{m+1} + f_{n+1} f_{m+2}.
\]
10 Let me explain this again in a slightly clearer (if longer) way.
Our induction hypothesis tells us that
“for all integers m ≥ 0, we have $f_{n+m+1} = f_n f_m + f_{n+1} f_{m+1}$” holds.
We can rename the variable m as p in this statement (since it is just a bound variable). Thus,
we obtain that
“for all integers p ≥ 0, we have $f_{n+p+1} = f_n f_p + f_{n+1} f_{p+1}$” holds.



The simplified equality $f_{n+m+2} = f_n f_{m+1} + f_{n+1} f_{m+2}$ has the same right hand side as (4). Thus, the left hand sides of
the two equalities must be equal as well. In other words, we must have
\[
f_{n+m+2} = f_{n+1} f_m + f_{n+1+1} f_{m+1}.
\]
Since n + m + 2 = n + 1 + m + 1, we can rewrite this as
\[
f_{n+1+m+1} = f_{n+1} f_m + f_{n+1+1} f_{m+1}.
\]
Thus, we have proved that for all integers m ≥ 0, we have
$f_{n+1+m+1} = f_{n+1} f_m + f_{n+1+1} f_{m+1}$. In other words, we have proved that P (n + 1) holds.
So the induction step is complete, and Theorem 1.8.1 is proved.
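A numerical spot check is no substitute for the proof, but it is a cheap way to catch index mistakes. The following is a minimal Python sketch that tests the addition formula $f_{n+m+1} = f_n f_m + f_{n+1} f_{m+1}$ for small n and m. The function names and the bound `LIMIT` are my own choices for illustration.

```python
def fib_list(count):
    """Return [f_0, f_1, ..., f_{count-1}], using f_0 = 0, f_1 = 1
    and f_n = f_{n-1} + f_{n-2}."""
    f = [0, 1]
    while len(f) < count:
        f.append(f[-1] + f[-2])
    return f[:count]


def addition_formula_holds(n, m, f):
    """Check f_{n+m+1} = f_n f_m + f_{n+1} f_{m+1} for one pair (n, m)."""
    return f[n + m + 1] == f[n] * f[m] + f[n + 1] * f[m + 1]


LIMIT = 30  # arbitrary bound for the spot check
f = fib_list(2 * LIMIT + 2)  # enough entries for the largest index n + m + 1
all_ok = all(
    addition_formula_holds(n, m, f)
    for n in range(LIMIT)
    for m in range(LIMIT)
)
print(all_ok)
```

Note that the check ranges over all pairs (n, m), mirroring the fact that P (n) is a statement about all m ≥ 0 at once.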
The next exercise gives two further properties of the Fibonacci sequence:

Exercise 1.8.2. (a) Show that every positive integer n satisfies

\[ f_{n+1} f_{n-1} - f_n^2 = (-1)^n . \]

(The word “Show” is a synonym for “Prove”.)


(b) Show that every nonnegative integer n satisfies

\[ f_1^2 + f_2^2 + \cdots + f_n^2 = f_n f_{n+1} . \]

(The left hand side here is the sum of the squares of the first n positive
Fibonacci numbers.)

The following exercise extends Theorem 1.8.1 to a more general class of
recursively defined sequences:

Exercise 1.8.3. Let u and v be two real numbers. Let $(x_0, x_1, x_2, \ldots)$ be a
sequence of real numbers such that $x_0 = 0$ and $x_1 = 1$ and
\[ x_n = u x_{n-1} + v x_{n-2} \qquad \text{for each } n \geq 2. \]
(When u = 1 and v = 1, this is the Fibonacci sequence.) Prove that
\[ x_{n+m+1} = v x_n x_m + x_{n+1} x_{m+1} \qquad \text{for all integers } n, m \geq 0. \]
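Before attempting a proof, it can help to test the claimed identity on a few concrete sequences. Here is a small Python sketch that does so. It restricts u and v to integers so that all arithmetic is exact (the exercise allows arbitrary real numbers), and the function names are hypothetical. Passing this check proves nothing; it only makes a typo in the identity unlikely.

```python
def recurrent_sequence(u, v, count):
    """Return [x_0, ..., x_{count-1}] with x_0 = 0, x_1 = 1
    and x_n = u*x_{n-1} + v*x_{n-2}."""
    x = [0, 1]
    while len(x) < count:
        x.append(u * x[-1] + v * x[-2])
    return x[:count]


def identity_holds(u, v, limit=12):
    """Check x_{n+m+1} = v*x_n*x_m + x_{n+1}*x_{m+1} for all 0 <= n, m < limit."""
    x = recurrent_sequence(u, v, 2 * limit + 2)
    return all(
        x[n + m + 1] == v * x[n] * x[m] + x[n + 1] * x[m + 1]
        for n in range(limit)
        for m in range(limit)
    )


# A few sample parameter pairs (u, v); (1, 1) is the Fibonacci case.
for u, v in [(1, 1), (2, 3), (-1, 2), (0, 5)]:
    print(u, v, identity_holds(u, v))
```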

(Footnote 10, continued.) Now, applying this latter statement to p = m + 1 (where m is the m that we fixed), we obtain
\[
f_{n+m+1+1} = f_n f_{m+1} + f_{n+1} f_{m+1+1}.
\]

1.8.2. Divisibility of Fibonacci numbers


Our next theorem involves divisibility of integers. We will study this in more
detail in Section 3.1 (it is the fundamental concept of number theory), but for
now let me give its definition:

Definition 1.8.2. Let a and b be two integers. We say that a divides b (and
we write a | b) if there exists an integer c such that b = ac. Equivalently, we
say that b is divisible by a in this case.

For example, we have 2 | 4 and 3 | 12 and 10 | 30 and 0 | 0 and 5 | 0. But
we don’t have 2 | 3 or 0 | 1. The integer 0 is divisible by every integer, but only
divides itself.
Now we can state a divisibility property of Fibonacci numbers:

Theorem 1.8.3. If a, b ≥ 0 are two integers that satisfy a | b, then f a | f b .

In other words, in our above table of Fibonacci numbers, if some entry of the
first row divides some other entry of the first row, then the same holds for the
corresponding entries of the second row. For example, 6 | 12 implies f 6 | f 12
(which is saying that 8 | 144).
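To get a feeling for the theorem, we can test it for small a and b. The sketch below is plain Python. The helper `divides` follows Definition 1.8.2 (including the convention that 0 divides only 0), and the bound 40 is an arbitrary choice of mine.

```python
def fib(n):
    """Return the Fibonacci number f_n for an integer n >= 0."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def divides(a, b):
    """Return True if a | b, i.e., if b = a*c for some integer c."""
    if a == 0:
        return b == 0  # 0 divides only itself
    return b % a == 0


def theorem_holds_up_to(limit):
    """Check f_a | f_b for all 0 <= a, b < limit satisfying a | b."""
    return all(
        divides(fib(a), fib(b))
        for a in range(limit)
        for b in range(limit)
        if divides(a, b)
    )


print(fib(6), fib(12), divides(fib(6), fib(12)))
print(theorem_holds_up_to(40))
```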
Proof of Theorem 1.8.3. It is reasonable to try induction. However, inducting on
a does not lead anywhere: The base case is easy, but in the induction step it
is completely unclear how to reach the goal, since the condition a | b in the
induction hypothesis usually has nothing to do with the condition a + 1 | b in
the induction goal.
Similar problems appear if you try to induct on b. So neither of the two
variables in the theorem is suitable for being inducted on.
What can we do? Give up on induction?
Not so fast. One thing we haven’t tried is to introduce a new variable and
then induct on that new variable.
To do so, we observe that two integers a, b ≥ 0 satisfy a | b if and only if there
exists an integer c such that b = ac (by the definition of “divides”). Moreover,
if this integer c exists, then it can be chosen to be ≥ 0 (this is automatic when
b ≠ 0, because $c = \frac{b}{a} > 0$ in this case; but otherwise we can achieve this by
simply choosing c = 0). Thus, two integers a, b ≥ 0 satisfy a | b if and only if
there exists an integer c ≥ 0 such that b = ac.
Hence, a pair of integers a, b ≥ 0 satisfying a | b is nothing but a pair of the
form a, ac where a, c ≥ 0 are integers. This allows us to restate Theorem 1.8.3
as follows:

Restated theorem: “For any integers a, c ≥ 0, we have f a | f ac .”



Now, we shall prove this restated theorem by induction on c. In other words,
for each c ≥ 0, we shall prove the statement

P (c) := (“for any integer a ≥ 0, we have f a | f ac ”) .

Base case: We must prove P (0). In other words, we must prove that

“for any integer a ≥ 0, we have f a | f a·0 ”.

But this is easy, because for any integer a ≥ 0, we have f a·0 = f 0 = 0, which is
divisible by any integer (thus in particular by f a ).
Induction step: Let c ≥ 0 be an integer. We assume that P (c) holds, i.e., that

“for any integer a ≥ 0, we have f a | f ac ” holds.

We must prove that P (c + 1) holds, i.e., that

“for any integer a ≥ 0, we have f a | f a(c+1) ” holds.

Let a ≥ 0 be any integer. Then, the induction hypothesis (i.e., our assumption
that P (c) holds) yields that f a | f ac . In other words, f ac = f a p for some integer
p. Now,

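\[
\begin{aligned}
f_{a(c+1)} = f_{ac+a} = f_{ac+(a-1)+1}
&= \underbrace{f_{ac}}_{= f_a p} f_{a-1} + f_{ac+1} f_a
\qquad \left(\text{by Theorem 1.8.1, applied to } n = ac \text{ and } m = a - 1\right) \\
&= f_a p f_{a-1} + f_{ac+1} f_a
= f_a \cdot \underbrace{\left(p f_{a-1} + f_{ac+1}\right)}_{\text{an integer}} .
\end{aligned}
\]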

This immediately yields that f a | f a(c+1) . Thus, we have shown that for any
integer a ≥ 0, we have f a | f a(c+1) . In other words, we have proved that P (c + 1)
holds. This completes the induction step, and thus the restated theorem is
proved. Therefore, the original Theorem 1.8.3 is also proved.
...............
Is it? There is a subtle gap in our above argument. Can you find it?
...............
Can you? Don’t look down just yet. The gap is somewhere above!
...............
This time, the theorem itself is correct, so you can’t find the gap by tracing
the proof through a case where the theorem is false. Though an example might
be useful...
...............

No, we didn’t misuse the principle of induction. The structure of the proof
is fine. (Actually, we could have made our statements a bit shorter by fixing
a ≥ 0, but this wouldn’t have made much of a difference.)
...............
The base case was fine, too.
...............
A computer, of course, would spot the problem.
If you tried to formalize the above proof in a computer language (e.g., Coq
or Lean), you would run into a type mismatch error. Some statement has been
proved for variables of a certain type, but is being used for variables of a dif-
ferent type. Very slightly different.
...............
The statement in question is Theorem 1.8.1. It is stated for one kind of vari-
ables, but we have used it for a slightly different kind.
...............
OK, I am spelling it out: Theorem 1.8.1 (i.e., the addition formula f n+m+1 =
f n f m + f n+1 f m+1 ) has been stated and proved for all integers n, m ≥ 0, but we
have applied it to n = ac and m = a − 1. For this to work, we need ac ≥ 0 and
a − 1 ≥ 0. Now, ac ≥ 0 is indeed satisfied (since a ≥ 0 and c ≥ 0), but a − 1 ≥ 0
holds only if a ≥ 1, which is not guaranteed. Thus, our use of Theorem 1.8.1
was illegal when a = 0. And indeed, if we apply Theorem 1.8.1 for a = 0,
then we end up with an f −1 term, which is undefined. Even if you define f −1
appropriately (and there is a good definition; see Subsection 1.10.2), we have
not proved Theorem 1.8.1 for negative n, m. So there is a gap in our proof. Can
we fix it?
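To make the "type mismatch" remark concrete, here is a tiny Lean 4 sketch. It assumes that the variables are modelled as natural numbers (`Nat`), which is my assumption and not something fixed by the text above. On `Nat`, subtraction is truncated, so the rewriting a = (a − 1) + 1 that our proof silently used is not available for a = 0:

```lean
-- On the natural numbers, Lean's subtraction is truncated: 0 - 1 evaluates to 0.
example : (0 : Nat) - 1 = 0 := rfl

-- For a ≥ 1, rewriting a as (a - 1) + 1 is fine, e.g. for a = 5:
example : (5 : Nat) - 1 + 1 = 5 := rfl

-- For a = 0, it fails: (0 - 1) + 1 evaluates to 1, not to 0.
example : (0 : Nat) - 1 + 1 ≠ 0 := by decide
```

So a proof assistant would only accept the step f_{ac+a} = f_{ac+(a−1)+1} once it is given a proof that a ≥ 1.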
...............
Fortunately, we can: Our argument breaks down only in the case when a = 0,
and we can just treat this case a = 0 manually, since it is an easy case. So we
build a case distinction into our above induction step. Thus, the induction step
takes the following form:
Induction step (corrected): Let c ≥ 0 be an integer. We assume that P (c) holds,
i.e., that
“for any integer a ≥ 0, we have f a | f ac ” holds.
We must prove that P (c + 1) holds, i.e., that

“for any integer a ≥ 0, we have f a | f a(c+1) ” holds.

Let a ≥ 0 be any integer. We must show that f a | f a(c+1) . We are in one of the
following two cases:
Case 1: We have a = 0.
Case 2: We have a ̸= 0.

In Case 1, we have a = 0, so that both $f_a$ and $f_{a(c+1)}$ equal $f_0 = 0$, and thus
$f_a \mid f_{a(c+1)}$ holds (since 0 | 0). Thus, the divisibility $f_a \mid f_{a(c+1)}$ is proved in Case 1.
Now, consider Case 2. In this case, a ̸= 0, so that a ≥ 1 (because a is an
integer and ≥ 0). Hence, a − 1 ≥ 0. This will allow us to apply Theorem 1.8.1
to n = ac and m = a − 1 in a few moments. The induction hypothesis (i.e., our
assumption that P (c) holds) yields that f a | f ac . In other words, f ac = f a p for
some integer p. Now,

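\[
\begin{aligned}
f_{a(c+1)} = f_{ac+a} = f_{ac+(a-1)+1}
&= \underbrace{f_{ac}}_{= f_a p} f_{a-1} + f_{ac+1} f_a
\qquad \left(\text{by Theorem 1.8.1, applied to } n = ac \text{ and } m = a - 1\right) \\
&= f_a p f_{a-1} + f_{ac+1} f_a
= f_a \cdot \underbrace{\left(p f_{a-1} + f_{ac+1}\right)}_{\text{an integer}} .
\end{aligned}
\]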

This immediately yields that f a | f a(c+1) .


So we have proved f a | f a(c+1) in both Cases 1 and 2. Therefore, f a | f a(c+1)
always holds.
Thus, P (c + 1) is proved. This completes the induction step, and thus the
restated theorem is proved. Therefore, Theorem 1.8.3 is proved – correctly this
time!

1.8.3. Binet’s formula


Is there an explicit formula for f n , that is, a formula that does not rely on the
previous entries of the Fibonacci sequence?
Yes, there is one; it is known as Binet’s formula:

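Theorem 1.8.4 (Binet’s formula). Let
\[
\varphi = \frac{1 + \sqrt{5}}{2} \approx 1.618\ldots
\qquad \text{and} \qquad
\psi = \frac{1 - \sqrt{5}}{2} \approx -0.618\ldots .
\]
Then,
\[
f_n = \frac{\varphi^n - \psi^n}{\sqrt{5}} \qquad \text{for every integer } n \geq 0.
\]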

Some remarks:

• The number φ is called the golden ratio, and is famous for many properties, including the fact that $\varphi^2 = \varphi + 1$ (which you can easily check by expanding both sides11). The number ψ is its so-called conjugate and also satisfies $\psi^2 = \psi + 1$.

• The numbers $f_n$ are integers, but Binet’s formula expresses them in terms of two irrational numbers φ and ψ. This should be rather unexpected.

• As n grows large, $\psi^n$ approaches 0 (since −1 < ψ < 1), whereas $\varphi^n$ grows exponentially (since φ > 1). So $f_n$ also grows exponentially (according to Binet’s formula), with growth rate φ ≈ 1.618 . . ..
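The remarks above are easy to observe numerically. The following Python sketch evaluates Binet's formula in floating-point arithmetic and compares it with the recursively computed $f_n$. Since floats are only approximations, the comparison rounds to the nearest integer and is restricted to small n; the names `binet_float`, `PHI`, `PSI` are my own.

```python
import math

SQRT5 = math.sqrt(5)
PHI = (1 + SQRT5) / 2  # the golden ratio, approximately 1.618
PSI = (1 - SQRT5) / 2  # its conjugate, approximately -0.618


def fib(n):
    """Return f_n via the recursive definition (computed iteratively)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def binet_float(n):
    """Evaluate (phi^n - psi^n) / sqrt(5) with floating-point numbers."""
    return (PHI ** n - PSI ** n) / SQRT5


for n in range(11):
    # Columns: n, exact f_n, floating-point value of Binet's formula, psi^n.
    print(n, fib(n), binet_float(n), PSI ** n)
```

The last column shrinks towards 0 in absolute value, which is why the term $\varphi^n / \sqrt{5}$ alone already approximates $f_n$ well for large n.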

Two questions arise:

1. How do we prove Binet’s formula?

2. How could we find Binet’s formula if we didn’t already know it?

We will answer Question 1 soon. Question 2 is significantly trickier and will
not be answered in this course12.

Let us try to prove Binet’s formula by induction on n:


Attempted proof of Binet’s formula. We induct on n:
Base case: For n = 0, we have $f_n = f_0 = 0$ and
\[
\frac{\varphi^n - \psi^n}{\sqrt{5}} = \frac{\varphi^0 - \psi^0}{\sqrt{5}} = \frac{1 - 1}{\sqrt{5}} = 0.
\]
Thus, Binet’s formula holds for n = 0.
Induction step: Let n ≥ 0 be an integer.
Assume (as induction hypothesis) that Binet’s formula holds for n; we must
prove that it holds for n + 1.


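11 Namely: From $\varphi = \frac{1 + \sqrt{5}}{2}$, we obtain
\[
\varphi^2 = \left(\frac{1 + \sqrt{5}}{2}\right)^2 = \frac{1 + 2\sqrt{5} + 5}{4} = \frac{6 + 2\sqrt{5}}{4} = \frac{3 + \sqrt{5}}{2} = \frac{1 + \sqrt{5}}{2} + 1 = \varphi + 1.
\]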

12 Answers at different levels of generality can be found in:

• [Grinbe20, Subsection 4.9.2] (which solves any linear recurrence of the form $x_n = a x_{n-1} + b x_{n-2}$ for constant numbers a and b in an explicit and elementary way);
• [Melian01] and [Ivanov08] (which solve the more general version $x_n = a_1 x_{n-1} + a_2 x_{n-2} + \cdots + a_k x_{n-k}$ in terms of the eigenvalues of a matrix).

Textbooks on combinatorics or advanced linear algebra also tend to discuss such sequences (called linearly recurrent sequences).

So we must prove that
\[
f_{n+1} = \frac{\varphi^{n+1} - \psi^{n+1}}{\sqrt{5}} .
\]
The recursive definition of the Fibonacci sequence yields
\[
f_{n+1} = f_n + f_{n-1} = \frac{\varphi^n - \psi^n}{\sqrt{5}} + f_{n-1} \qquad \text{(by the induction hypothesis)} .
\]
So far so good, but how can we simplify $f_{n-1}$? Our induction hypothesis only
tells us that $f_n = \frac{\varphi^n - \psi^n}{\sqrt{5}}$, but it says nothing about $f_{n-1}$.
So this induction proof does not work.13
Let us see how to fix this by introducing a more advanced version of induc-
tion.

1.9. Strong induction


1.9.1. Reminder on regular induction
Recall the (original) principle of mathematical induction:

Theorem 1.9.1 (Principle of Mathematical Induction). Let b be an integer.


Let P (n) be a mathematical statement defined for each integer n ≥ b.
Assume the following:

1. “Base case”: The statement P (b) holds.

2. “Induction step”: For each integer n ≥ b, the implication P (n) =⇒


P (n + 1) holds.

Then, the statement P (n) holds for every integer n ≥ b.

We can restate this principle slightly by renaming the n in the induction step
as n − 1 (so that the implication P (n) =⇒ P (n + 1) turns into P (n − 1) =⇒
P (n)). Thus, it takes the following form:

Theorem 1.9.2 (Principle of Mathematical Induction, restated). Let b be an


integer.
Let P (n) be a mathematical statement defined for each integer n ≥ b.
Assume the following:

1. “Base case”: The statement P (b) holds.


13 There is also one more little (fixable) gap in the above attempted proof. Do you see it?

2. “Induction step”: For each integer n > b, the implication P (n − 1) =⇒


P (n) holds.

Then, the statement P (n) holds for every integer n ≥ b.

The idea behind the principle (in either form) is that the base case gives us
P (b) whereas the induction step gives us the implications

P (b) =⇒ P (b + 1) ,
P (b + 1) =⇒ P (b + 2) ,
P (b + 2) =⇒ P (b + 3) ,
....

In the domino metaphor (see Remark 1.2.3), the base case tips over the first
domino, and the induction step ensures that each domino falls from the impact
of the previous domino’s falling.

1.9.2. Strong induction


Now, assume that the (b + 2)-domino (i.e., P (b + 2)) falls not from the impact of
the previous domino P (b + 1), but rather from the combined force of the domi-
nos P (b) and P (b + 1). This would still suffice, because the latter two dominos
have already fallen. In other words, instead of the implication P (b + 1) =⇒
P (b + 2), we could just as well prove the implication

( P (b) AND P (b + 1)) =⇒ P (b + 2) ,

which is somewhat weaker (since it assumes more to get to the same conclu-
sion) but nevertheless gives the same result. Likewise, we could just as well
replace the implication P (b + 2) =⇒ P (b + 3) by the weaker implication

( P (b) AND P (b + 1) AND P (b + 2)) =⇒ P (b + 3) .

More generally, for each n > b, instead of proving the implication P (n − 1) =⇒ P (n),
it will suffice to prove the weaker implication
\[
\underbrace{\left( P(b) \text{ AND } P(b+1) \text{ AND } P(b+2) \text{ AND } \cdots \text{ AND } P(n-1) \right)}_{\text{i.e., the statement } P(k) \text{ holds for each } k \in \{b, b+1, \ldots, n-1\}} \Longrightarrow P(n)
\]
(so that the domino P (n) is tipped over by the combined force of all the preceding
dominos, not just the one domino directly to its left).
This induction principle is called strong induction. Explicitly, it says the
following:

Theorem 1.9.3 (Principle of Strong Induction). Let b be an integer.


Let P (n) be a mathematical statement defined for each integer n ≥ b.
Assume the following:

1. “Base case”: The statement P (b) holds.

2. “Induction step”: For each integer n > b, the implication

( P (b) AND P (b + 1) AND P (b + 2) AND · · · AND P (n − 1)) =⇒ P (n)

holds.

Then, the statement P (n) holds for every integer n ≥ b.

Proofs using this principle are called proofs by strong induction (or strong
induction proofs). They differ from proofs by (regular) induction as follows:
In the induction step of a strong induction proof, you can use not just the pre-
ceding statement P (n − 1), but also all the statements before it (P (n − 2) and
P (n − 3) and so on, all the way down to P (b)). In other words, the induc-
tion hypothesis is now stronger (thus the name “strong induction”). Roughly
speaking, strong induction is “induction with a long memory” (as opposed to
regular induction, whose memory is only 1 step long).14
(We will later see a slightly nicer form of strong induction, in which the base
case is incorporated in the induction step.)

Before we see an example of a strong induction proof, let me explain why


it works. Let’s say you have proved a statement P (n) for all n ≥ 0 by strong
induction. Thus,

• you have proved P (0) (this is the base case);

• you have proved the implication P (0) =⇒ P (1) (this is the induction step
for n = 1), so you conclude that P (1) holds (since P (0) holds);

14 A remark for the logically inclined:


Surprisingly, the Principle of Strong Induction is logically equivalent to the regular Prin-
ciple of Mathematical Induction (i.e., each of the two principles can be derived from the
other). Thus, we don’t need to assume the former as an extra axiom (once we have assumed
the latter). See [Grinbe15, §2.8.1] for how the former can be derived from the latter.
In essence, this means that strong induction is just a “more convenient user interface” for
regular induction; everything that can be proved using strong induction can still be proved
using regular induction. (But it requires a little trick: If you can prove P (n) by strong
induction on n, then you can prove the statement

Q (n) := ( P (b) AND P (b + 1) AND P (b + 2) AND · · · AND P (n))

by regular induction on n, and then you can derive P (n) from Q (n).)

• you have proved the implication ( P (0) AND P (1)) =⇒ P (2) (this is the
induction step for n = 2), so you conclude that P (2) holds (since P (0)
and P (1) hold);

• you have proved the implication ( P (0) AND P (1) AND P (2)) =⇒ P (3)
(this is the induction step for n = 3), so you can conclude that P (3) holds
(since P (0) and P (1) and P (2) hold);

• and so on.

1.9.3. Example: Proof of Binet’s formula


Let us now prove Binet’s formula by strong induction:
Proof of Theorem 1.8.4 (i.e., of Binet’s formula). We strongly induct on n (i.e., we
use strong induction on n). That is, we let P (n) denote the statement
\[
\left( \text{“} f_n = \frac{\varphi^n - \psi^n}{\sqrt{5}} \text{”} \right)
\]
for each n ≥ 0, and we apply the Principle of Strong Induction (for b = 0) to
prove this statement P (n) for each n ≥ 0.
Base case: As above, we check that Binet’s formula (i.e., the statement P (n))
holds for n = 0.
Induction step: Let n > 0 be an integer. We must prove the implication

( P (0) AND P (1) AND P (2) AND · · · AND P (n − 1)) =⇒ P (n) .

Thus, we assume that P (0) AND P (1) AND P (2) AND · · · AND P (n − 1)
holds. In other words, we assume that Binet’s formula holds for 0, for 1, for 2,
and so on, all the way up to n − 1. (In other words, we assume that $f_k = \frac{\varphi^k - \psi^k}{\sqrt{5}}$
for each k ∈ {0, 1, . . . , n − 1}.)
We have to prove P (n). In other words, we have to prove that Binet’s formula
also holds for n. In other words, we have to prove that $f_n = \frac{\varphi^n - \psi^n}{\sqrt{5}}$.
We assumed that Binet’s formula holds for n − 1. That is, we have
$f_{n-1} = \frac{\varphi^{n-1} - \psi^{n-1}}{\sqrt{5}}$.
We assumed that Binet’s formula holds for n − 2. That is, we have
$f_{n-2} = \frac{\varphi^{n-2} - \psi^{n-2}}{\sqrt{5}}$.
As we have seen above, we have $\varphi^2 = \varphi + 1$ and $\psi^2 = \psi + 1$.

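But the recursive definition of the Fibonacci sequence yields
\[
\begin{aligned}
f_n = f_{n-1} + f_{n-2}
&= \frac{\varphi^{n-1} - \psi^{n-1}}{\sqrt{5}} + \frac{\varphi^{n-2} - \psi^{n-2}}{\sqrt{5}}
\qquad \left( \text{since } f_{n-1} = \frac{\varphi^{n-1} - \psi^{n-1}}{\sqrt{5}} \text{ and } f_{n-2} = \frac{\varphi^{n-2} - \psi^{n-2}}{\sqrt{5}} \right) \\
&= \frac{1}{\sqrt{5}} \left( \varphi^{n-1} - \psi^{n-1} + \varphi^{n-2} - \psi^{n-2} \right) \\
&= \frac{1}{\sqrt{5}} \Big( \underbrace{\left( \varphi^{n-1} + \varphi^{n-2} \right)}_{= \varphi^{n-2} (\varphi + 1)} - \underbrace{\left( \psi^{n-1} + \psi^{n-2} \right)}_{= \psi^{n-2} (\psi + 1)} \Big) \\
&= \frac{1}{\sqrt{5}} \Big( \varphi^{n-2} \underbrace{(\varphi + 1)}_{= \varphi^2} - \psi^{n-2} \underbrace{(\psi + 1)}_{= \psi^2} \Big) \\
&= \frac{1}{\sqrt{5}} \Big( \underbrace{\varphi^{n-2} \varphi^2}_{= \varphi^n} - \underbrace{\psi^{n-2} \psi^2}_{= \psi^n} \Big)
= \frac{1}{\sqrt{5}} \left( \varphi^n - \psi^n \right) = \frac{\varphi^n - \psi^n}{\sqrt{5}} .
\end{aligned}
\]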

So we have proved Binet’s formula for n. Right?


..............................
Wait a moment! We have assumed (as the induction hypothesis) that Binet’s
formula holds for each of the numbers 0, 1, . . . , n − 1. But then we have used
it for n − 2 and for n − 1. This tacitly relied on the fact that n − 2 and n − 1
are among the numbers 0, 1, . . . , n − 1. However, this fact is only true if n ≥ 2.
If n = 1, then n − 2 is not among the numbers 0, 1, . . . , n − 1 (because it is
negative).
So our induction step worked for n = 2, 3, 4, . . . but not for n = 1. What can
we do?
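We can fix this by just proving the claim for n = 1 by hand. So we must
prove that $f_1 = \frac{\varphi^1 - \psi^1}{\sqrt{5}}$. This can be checked by a direct computation:
\[
\frac{\varphi^1 - \psi^1}{\sqrt{5}} = \frac{\varphi - \psi}{\sqrt{5}} = \frac{\frac{1 + \sqrt{5}}{2} - \frac{1 - \sqrt{5}}{2}}{\sqrt{5}} = \frac{\sqrt{5}}{\sqrt{5}} = 1 = f_1 .
\]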
Now our induction step is really complete, and Binet’s formula is proved.
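Binet's formula can also be checked without any rounding, by computing with numbers of the form a + b√5 where a and b are rational. The following Python sketch represents such a number as a pair `(a, b)` of `Fraction`s. This representation and all function names are my own choices for illustration, not part of the proof above.

```python
from fractions import Fraction


def mul(x, y):
    """Multiply a + b*sqrt(5) by c + d*sqrt(5), both given as pairs."""
    a, b = x
    c, d = y
    # (a + b√5)(c + d√5) = (ac + 5bd) + (ad + bc)√5
    return (a * c + 5 * b * d, a * d + b * c)


def power(x, n):
    """Return x^n for an integer n >= 0 by repeated multiplication."""
    result = (Fraction(1), Fraction(0))  # the number 1
    for _ in range(n):
        result = mul(result, x)
    return result


PHI = (Fraction(1, 2), Fraction(1, 2))   # (1 + √5)/2
PSI = (Fraction(1, 2), Fraction(-1, 2))  # (1 - √5)/2


def binet_exact(n):
    """Return (phi^n - psi^n)/√5, computed exactly."""
    p = power(PHI, n)
    q = power(PSI, n)
    rational_part, sqrt5_part = p[0] - q[0], p[1] - q[1]
    # The difference must be a pure multiple of √5 for the quotient to be rational.
    assert rational_part == 0
    return sqrt5_part  # (c√5)/√5 = c


def fib(n):
    """Return f_n via the recursive definition (computed iteratively)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


print(all(binet_exact(n) == fib(n) for n in range(30)))
```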

Let us summarize: We have used strong induction in our above proof of


Theorem 1.8.4, because the “extra memory” in a strong induction step allowed
us to express not just f n−1 but also f n−2 via the induction hypothesis.

Note that we have had to handle the two cases n = 0 and n = 1 by hand
in our above proof, because we had to reach “2 steps back” in memory in the
induction step (i.e., we had to apply the induction hypothesis both to n − 1 and
to n − 2). 15 The case n = 0 was our base case, whereas the case n = 1 was
part of the induction step, but nevertheless had to be singled out for special
treatment (since n − 2 is negative for n = 1). Nevertheless, it makes sense to
think of the n = 1 case as a “second base case”, even if it is de-jure part of the
induction step.

1.9.4. Baseless strong induction


You can actually reformulate the principle of strong induction in a form that
does not have a de-jure base case at all:

Theorem 1.9.4 (Principle of Strong Induction, restated). Let b be an integer.


Let P (n) be a mathematical statement defined for each integer n ≥ b.
Assume the following:

• “Induction step”: For each integer n ≥ b, the implication

( P (b) AND P (b + 1) AND P (b + 2) AND · · · AND P (n − 1)) =⇒ P (n)

holds.

Then, the statement P (n) holds for every integer n ≥ b.

How does this restated principle work without a base case? Easy: We have
just repackaged the base case into the induction step. Indeed, note that the
induction step now says “n ≥ b”, not “n > b”. In particular, this means that the
implication

( P (b) AND P (b + 1) AND P (b + 2) AND · · · AND P (n − 1)) =⇒ P (n)

has to hold for n = b. However, for n = b, the antecedent (= if-part) of this
implication is a tautology (i.e., is an empty statement that is automatically true
by dint of its emptiness16), and thus proving this implication is tantamount to
just unconditionally proving P (b), which was what we previously viewed as
15 Had we reached further back, we would have needed extra cases (e.g., if we had applied the
induction hypothesis to n − 5, then we would have to handle all the cases n = 0, 1, 2, 3, 4 by
hand).
16 Don’t believe it? Observe that this antecedent

( P (b) AND P (b + 1) AND P (b + 2) AND · · · AND P (n − 1))

is a conjunction of n − b statements (since there are n − b numbers between b and n − 1
inclusive). If n = b, this means that it is a conjunction of b − b = 0 statements, i.e., of no
statements whatsoever. So it is an empty statement, automatically true.

our base case. So we have not magically removed the need for a base case;
we just have merged it into the induction step. Nevertheless, this makes for a
slightly cleaner version of strong induction.
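The convention that an empty conjunction is true is also what programming languages use. As a small illustration in Python (the function name `antecedent_holds` is hypothetical), the built-in `all` returns `True` on an empty collection, so the antecedent of the induction step holds for n = b no matter what P is:

```python
def antecedent_holds(P, b, n):
    """Evaluate the conjunction P(b) AND P(b+1) AND ... AND P(n-1)."""
    return all(P(k) for k in range(b, n))


def never_true(k):
    """A statement P(k) that is false for every k."""
    return False


# For n = b, the conjunction has no members and is therefore true,
# even though P(k) itself is never true.
print(antecedent_holds(never_true, 5, 5))

# For n = b + 1, the conjunction consists of P(b) alone, which is false here.
print(antecedent_holds(never_true, 5, 6))
```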

1.9.5. Example: Prime factorizations exist


Another example of a strong induction proof comes from elementary number
theory. We recall two basic definitions (more on this later, when we cover
number theory):

Definition 1.9.5. Let b be an integer. A divisor of b means an integer a


satisfying a | b.

For example, the divisors of 6 are 1, 2, 3, 6, −1, −2, −3, −6.

Definition 1.9.6. A prime (or prime number) means an integer p > 1 whose
only positive divisors are 1 and p.

So the primes (in increasing order) are

2, 3, 5, 7, 11, 13, 17, 19, 23, 29, . . . .

There are infinitely many primes, as we will show later.

Theorem 1.9.7. Every positive integer is a product of finitely many primes.

Here and in the following, I understand an empty product (i.e., a product of


no numbers whatsoever) to be 1. Thus, Theorem 1.9.7 does hold for 1, since 1
is a product of no primes.
Here are more interesting examples:

• 2023 = 7 · 17 · 17 is a product of three primes.

• 2024 = 2 · 2 · 2 · 11 · 23 is a product of five primes.

• 2 = 2 is a product of one prime (namely, 2 itself).

How do we prove Theorem 1.9.7 in general?


Proof of Theorem 1.9.7. We must prove the statement

P (n) = (“n is a product of finitely many primes”)

for each integer n ≥ 1.


We shall prove this by strong induction on n. (We use the original variant of
strong induction, with a base case.)
Base case: P (1) is true, since 1 is a product of finitely many primes (specifi-
cally, of 0 primes, as we saw).

Induction step: Let n > 1. We must prove the implication

( P (1) AND P (2) AND · · · AND P (n − 1)) =⇒ P (n) .

So we assume that P (1) AND P (2) AND · · · AND P (n − 1) holds. We must


prove that P (n) holds.
In other words, we must prove that n is a product of finitely many primes.
We are in one of the following two cases:
Case 1: The only positive divisors of n are 1 and n.
Case 2: There is a positive divisor d of n that is neither 1 nor n.
(Other cases are not possible, since 1 and n always are positive divisors of n.)
Consider Case 1 first. In this case, n itself is a prime (by the definition of a
prime), and thus is a product of finitely many primes (namely, of just 1 prime:
itself). Thus, P (n) holds in Case 1.
Now, consider Case 2. In this case, there is a positive divisor d of n that
is neither 1 nor n. Consider such a d (you might have to choose one, but
any choice is fine). Since d is a positive divisor of n, we have 1 ≤ d ≤ n
(strictly speaking, this needs to be proved, but we take this for granted here17 ).
Therefore, 1 < d < n (since d is neither 1 nor n). Hence, d is one of the numbers
1, 2, . . . , n − 1 (actually 2, 3, . . . , n − 1, but we don’t care).
Furthermore, $\frac{n}{d}$ is an integer (since d is a divisor of n) and positive (since n
and d are positive). Multiplying the inequality 1 < d by $\frac{n}{d}$, we obtain
$1 \cdot \frac{n}{d} < d \cdot \frac{n}{d}$ (since we can always multiply an inequality by a positive number18). In
other words, $\frac{n}{d} < n$. Since $\frac{n}{d}$ is a positive integer, we thus conclude that $\frac{n}{d}$ is
one of the numbers 1, 2, . . . , n − 1.
Now, our induction hypothesis says that P (1) AND P (2) AND · · · AND
P (n − 1) holds. In particular, P (d) holds (since d is one of the numbers 1, 2, . . . , n − 1).
In other words, d is a product of primes. That is, we can write d as
\[
d = p_1 p_2 \cdots p_k \qquad \text{for some primes } p_1, p_2, \ldots, p_k .
\]
Consider these primes $p_1, p_2, \ldots, p_k$.
Again, our induction hypothesis says that P (1) AND P (2) AND · · · AND
P (n − 1) holds. In particular, $P\left(\frac{n}{d}\right)$ holds (since $\frac{n}{d}$ is one of the numbers
1, 2, . . . , n − 1). In other words, $\frac{n}{d}$ is a product of primes. That is, we can write
$\frac{n}{d}$ as
\[
\frac{n}{d} = q_1 q_2 \cdots q_\ell \qquad \text{for some primes } q_1, q_2, \ldots, q_\ell .
\]
17 Actually, the inequality 1 ≤ d is obvious (since d is a positive integer), whereas the inequality
d ≤ n follows from Proposition 3.1.4 (c).
18 This is a basic fact that we are taking for granted.

Consider these primes $q_1, q_2, \ldots, q_\ell$.
Now,
\[
n = d \cdot \frac{n}{d} = p_1 p_2 \cdots p_k \cdot q_1 q_2 \cdots q_\ell
\]
(since $d = p_1 p_2 \cdots p_k$ and $\frac{n}{d} = q_1 q_2 \cdots q_\ell$). This shows that n is a product of
primes (since $p_1, p_2, \ldots, p_k$ as well as $q_1, q_2, \ldots, q_\ell$ are primes). In other words,
P (n) holds. Thus, we have proved P (n) in Case 2.
Now, we have proved P (n) both in Case 1 and Case 2. Therefore, P (n)
always holds. Thus, the induction step is complete, and Theorem 1.9.7 is
proven.
The above proof is just reflecting the elementary recursive algorithm for factoring
an integer n into a product of primes: We search for a positive divisor d
of n that is neither 1 nor n. If such a d does not exist, then n itself is a prime. If
it does, then we are reduced to the simpler problems of factoring d and $\frac{n}{d}$, and
just have to multiply the resulting factorizations at the end.
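The recursive algorithm just described can be sketched in a few lines of Python. This is a direct transcription of the proof, not an efficient factoring method: it searches for a divisor d by brute force, and the function name `factor` is my own.

```python
def factor(n):
    """Return a list of primes whose product is n, for an integer n >= 1.

    Mirrors the strong induction proof: find a divisor d with 1 < d < n,
    then factor d and n/d recursively (both are smaller than n).
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    if n == 1:
        return []  # the empty product equals 1 (the base case)
    for d in range(2, n):
        if n % d == 0:
            # Case 2: d is a positive divisor of n that is neither 1 nor n.
            return factor(d) + factor(n // d)
    # Case 1: the only positive divisors of n are 1 and n, so n is prime.
    return [n]


print(factor(2023))
print(factor(2024))
```

The recursion terminates for the same reason that the strong induction works: both d and n/d are strictly smaller than n.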

1.9.6. Example: Paying with 3-cent and 5-cent coins


Here is another example of how strong induction can be used:

Exercise 1.9.1. Assume that you have 3-cent coins and 5-cent coins (each in
infinite supply). What denominations can you pay with these coins?

Let’s make a table (“yes” means that you can pay it; “no” means that you
can’t):

denomination    can be paid?
0 cents         yes
1 cent          no
2 cents         no
3 cents         yes
4 cents         no
5 cents         yes
6 cents         yes: 2 · 3
7 cents         no
8 cents         yes: 3 + 5
9 cents         yes: 3 · 3
10 cents        yes: 2 · 5
11 cents        yes: 2 · 3 + 5
12 cents        yes: 4 · 3
13 cents        yes: 3 + 2 · 5
···             ···
Experimentally, we seem to observe that any denomination ≥ 8 cents can be
paid. Why?
We can notice that if a denomination k (that is, k cents) can be paid, then
so can k + 3 (just add a 3-cent coin). Thus, because we can pay 8 cents, we
can also pay 11, 14, 17, . . . cents. Because we can pay 9 cents, we can also pay
12, 15, 18, . . . cents. Because we can pay 10 cents, we can also pay 13, 16, 19, . . .
cents. Together, these three sequences account for all the integers ≥ 8. Thus,
any denomination of ≥ 8 cents can be paid.
Let us formalize this argument as an induction proof.
We define N to be the set of all nonnegative integers:

N = {0, 1, 2, . . .} .

Proposition 1.9.8. For any integer n ≥ 8, we can pay n cents with 3-cent and
5-cent coins. In other words, any integer n ≥ 8 can be written as n = 3a + 5b
with a, b ∈ N.

Proof. We proceed by strong induction on n:


Base case: For n = 8, the claim is true, since 8 = 3 · 1 + 5 · 1.
Induction step: Fix an integer n > 8. Assume that the proposition is already
proved for all the integers 8, 9, . . . , n − 1. We must prove that it also holds for n.

In other words, we must prove that we can pay n cents with 3-cent and 5-cent
coins.
We are in one of the following three cases (since n > 8):
Case 1: We have n = 9.
Case 2: We have n = 10.
Case 3: We have n ≥ 11.
In Case 1, we are done, since n = 9 = 3 · 3 + 5 · 0 (that is, n cents can be paid
with three 3-cent coins).
In Case 2, we are done, since n = 10 = 3 · 0 + 5 · 2 (that is, n cents can be paid
with two 5-cent coins).
Now, consider Case 3. In this case, we have n ≥ 11. Hence, n − 3 ≥ 8. This
shows that n − 3 is one of the numbers 8, 9, . . . , n − 1.
Thus, we can apply the induction hypothesis to n − 3. We conclude that
n − 3 cents can be paid with 3-cent and 5-cent coins, i.e., we can write n − 3 as
n − 3 = 3c + 5d with c, d ∈ N. Using these c, d ∈ N, we therefore have
n = 3 + 3c + 5d (since n − 3 = 3c + 5d)
= 3 (c + 1) + 5d,
which shows that n cents can also be paid with 3-cent and 5-cent coins. This
shows that the proposition is true for n, and thus the induction step is complete.
The proposition is thus proved.
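The proof above is constructive, and it translates directly into a recursive procedure. The Python sketch below is one way to write it down (the names `pay` and `payable` are my own): `pay(n)` follows the three cases of the proof, while `payable(n)` is an independent brute-force test used to reproduce the "no" rows of the table.

```python
def pay(n):
    """Return (a, b) with n = 3*a + 5*b and a, b >= 0, for an integer n >= 8."""
    if n < 8:
        raise ValueError("this procedure only handles n >= 8")
    if n == 8:
        return (1, 1)  # base case: 8 = 3*1 + 5*1
    if n == 9:
        return (3, 0)  # Case 1: 9 = 3*3 + 5*0
    if n == 10:
        return (0, 2)  # Case 2: 10 = 3*0 + 5*2
    # Case 3: n >= 11, so n - 3 >= 8; pay n - 3 cents and add a 3-cent coin.
    c, d = pay(n - 3)
    return (c + 1, d)


def payable(n):
    """Brute force: can n >= 0 be written as 3*a + 5*b with a, b >= 0?"""
    return any((n - 5 * b) % 3 == 0 for b in range(n // 5 + 1))


print(pay(13))
print([n for n in range(14) if not payable(n)])
```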
Note that the above proof had one “de-jure base case” (the case n = 8) and
two “de-facto base cases” (the cases n = 9 and n = 10, which were formally
part of the induction step but had to be treated separately because n − 3 would
be smaller than 8 in these cases). We could have just as well used the baseless
form of strong induction, in which case we would have to treat all three of these
cases as “de-facto base cases”. This would be a bit more uniform, although this
is entirely a matter of taste.

1.10. More exercises


Let us finish this chapter with some further exercises on induction.

1.10.1. A fake proof

Exercise 1.10.1. Find the error(s) in the following fake proof:


We claim that $3^n = 1$ for each n ∈ N.
“Proof:” We proceed by strong induction on n. So we let n ∈ N be arbitrary,
and we assume (as the induction hypothesis) that $3^k = 1$ for each k < n. We
must now prove that $3^n = 1$.
By our induction hypothesis, we have $3^{n-1} = 1$ (since n − 1 < n) and
$3^{n-2} = 1$ (since n − 2 < n). Now,
\[ 3^n = \frac{\left(3^{n-1}\right)^2}{3^{n-2}} \]
(since the laws of exponents yield
$\frac{\left(3^{n-1}\right)^2}{3^{n-2}} = 3^{2 \cdot (n-1) - (n-2)} = 3^n$).
In view of $3^{n-1} = 1$ and $3^{n-2} = 1$, this rewrites as
$3^n = \frac{1^2}{1} = 1$. This completes the induction step, and thus the
claim is proved.

1.10.2. Negative Fibonacci numbers


Recall again the Fibonacci sequence $(f_0, f_1, f_2, \ldots)$ from Definition 1.5.1. Let us
now extend this sequence “to the left” by defining $f_n$ not only for nonnegative
integers n, but also for negative integers n. To do so, we simply rewrite the
equation $f_n = f_{n-1} + f_{n-2}$ (which we used to recursively define the Fibonacci
sequence) as $f_{n-2} = f_n - f_{n-1}$. This allows us to compute $f_{n-2}$ from $f_n$ and
$f_{n-1}$. Thus, we can compute $f_{-1}$ from $f_1$ and $f_0$, then compute $f_{-2}$ from $f_0$ and
$f_{-1}$, and so on:

\begin{align*}
f_{-1} &= f_1 - f_0 = 1 - 0 = 1; \\
f_{-2} &= f_0 - f_{-1} = 0 - 1 = -1; \\
f_{-3} &= f_{-1} - f_{-2} = 1 - (-1) = 2; \\
f_{-4} &= f_{-2} - f_{-3} = (-1) - 2 = -3; \\
&\ldots.
\end{align*}

Thus, we gradually extend the Fibonacci sequence to the left, obtaining a
“two-sided sequence” $(\ldots, f_{-2}, f_{-1}, f_0, f_1, f_2, \ldots)$ that is “infinite in both directions”.
By virtue of its construction, it satisfies $f_n = f_{n-1} + f_{n-2}$ not only for all n ≥ 2,
but also for all integers n. However, a quick look at the first (say) 7 “extended”
Fibonacci numbers to the left of $f_0$ reveals that they are not as new as they
might seem: They are just copies of the positive Fibonacci numbers with signs.
More precisely, it looks like we have

\[ f_{-n} = (-1)^{n-1} f_n \qquad \text{for each } n \geq 0. \tag{5} \]
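Before attempting a proof, the conjectured pattern can be checked numerically. The following Python sketch extends the Fibonacci sequence in both directions exactly as described above and tests the sign pattern for small n (the helper name `fib_extended` is our own; a finite check like this is evidence, not a proof):

```python
def fib_extended(m):
    """Return a dict {n: f_n} for all integers n with -m <= n <= m (m >= 1)."""
    f = {0: 0, 1: 1}
    for n in range(2, m + 1):
        f[n] = f[n - 1] + f[n - 2]    # usual recursion, moving right
    for n in range(-1, -m - 1, -1):
        f[n] = f[n + 2] - f[n + 1]    # rewritten recursion, moving left
    return f


f = fib_extended(20)
for n in range(0, 21):
    sign = 1 if (n - 1) % 2 == 0 else -1   # this is (-1)^(n-1)
    assert f[-n] == sign * f[n]
```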

Exercise 1.10.2. (a) Try to prove (5) directly by induction on n. (So the
induction step involves assuming that $f_{-n} = (-1)^{n-1} f_n$ and proving that
$f_{-(n+1)} = (-1)^n f_{n+1}$. Don’t use strong induction yet!) Does this work?

(b) Now, instead, try to prove the stronger claim that “$f_{-n} = (-1)^{n-1} f_n$
and $f_{-n+1} = (-1)^{n-2} f_{n-1}$ for each n ≥ 0” by induction on n. Does this
work?
(c) Now, prove (5) by strong induction on n.

1.10.3. More on the Hanoi tower



Exercise 1.10.3. Let n ≥ 0 be an integer, and let k ∈ {1, 2, . . . , n}. In the proof
of Proposition 1.1.3, we presented a certain strategy for solving the Tower of
Hanoi puzzle with n disks.
Prove that the k-th largest disk is moved exactly $2^{k-1}$ many times during
this strategy.

1.10.4. More on recursively defined sequences

Exercise 1.10.4. Let $(a_0, a_1, a_2, \ldots)$ be a sequence of integers defined
recursively by

\[ a_0 = 2, \qquad a_1 = 3, \qquad a_n = 3a_{n-1} - 2a_{n-2} \quad \text{for all } n \geq 2. \]

Prove that $a_n = 2^n + 1$ for each integer n ≥ 0.

Exercise 1.10.5. Let $(a_0, a_1, a_2, \ldots)$ be a sequence of integers defined
recursively by

\[ a_0 = 2, \qquad a_1 = 1, \qquad a_n = a_{n-1} + 6a_{n-2} \quad \text{for all } n \geq 2. \]

Prove that $a_n = 3^n + (-2)^n$ for each n ∈ N.

Exercise 1.10.6. Recall the Fibonacci sequence (Definition 1.5.1) again.

(a) Let k be a nonnegative integer. Show that

\[ f_n^2 - f_{n+k} f_{n-k} = (-1)^{n-k} f_k^2 \qquad \text{for every integer } n \geq k. \]

(b) Which of the previously posed exercises does this generalize?

Exercise 1.10.7. Recall the Fibonacci sequence (Definition 1.5.1) again. Let
n ≥ 0. Prove that

$f_{3n}$ is even;
$f_{3n+1}$ is odd;
$f_{3n+2}$ is odd.

(In this exercise, you can freely use basic properties of even and odd num-
bers – such as Proposition 3.3.8.)

Exercise 1.10.8. Define a sequence $(t_0, t_1, t_2, \ldots)$ of positive rational numbers
recursively by setting

\[ t_0 = 1, \qquad t_1 = 1, \qquad t_2 = 1, \qquad \text{and} \qquad
t_n = \frac{1 + t_{n-1} t_{n-2}}{t_{n-3}} \quad \text{for each } n \geq 3. \]

(So its next entries after $t_2$ are

\begin{align*}
t_3 &= \frac{1 + t_2 t_1}{t_0} = \frac{1 + 1 \cdot 1}{1} = 2; \\
t_4 &= \frac{1 + t_3 t_2}{t_1} = \frac{1 + 2 \cdot 1}{1} = 3; \\
t_5 &= \frac{1 + t_4 t_3}{t_2} = \frac{1 + 3 \cdot 2}{1} = 7; \\
t_6 &= \frac{1 + t_5 t_4}{t_3} = \frac{1 + 7 \cdot 3}{2} = 11,
\end{align*}

and so on.)
(a) Prove that $t_{n+2} = 4t_n - t_{n-2}$ for each n ≥ 2.
(b) Prove that $t_n$ is a positive integer for each integer n ≥ 0.
[Hint: Use regular induction for part (a) and strong induction for part (b).
Note that the “positive” part is clear from the definition, so you only need to
prove the “integer” part in (b).]

1.10.5. More coin problems

Exercise 1.10.9. (a) Prove the following: For any integer n ≥ 12, we can pay
n cents with 3-cent and 7-cent coins. In other words, any integer n ≥ 12 can
be written as n = 3a + 7b with a, b ∈ N. (Here, again, N = {0, 1, 2, . . .}.)
(b) Find the largest integer k such that k cents cannot be paid with 2-cent
and 13-cent coins. Prove that for every integer n > k, we can pay n cents
with these kinds of coins.
(c) Is there a largest integer k such that k cents cannot be paid with 2-cent
and 6-cent coins?

1.10.6. A bit of matrix algebra


The next two exercises are about matrix multiplication. For an introduction
to matrix multiplication, see any textbook on linear algebra (e.g., [BoyVan18,
§10.1] covers it in detail). However, all we need for these exercises will be
2 × 2-matrices, so let us recall how matrix multiplication works for them:

• The product AX of two 2 × 2-matrices
$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ and
$X = \begin{pmatrix} x & y \\ z & w \end{pmatrix}$
is defined to be
$\begin{pmatrix} ax + bz & ay + bw \\ cx + dz & cy + dw \end{pmatrix}$.

• The n-th power $A^n$ of a 2 × 2-matrix A is defined to be the product
$\underbrace{AA \cdots A}_{n \text{ factors}}$.

Exercise 1.10.10. (a) Prove that
$\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}^n = \begin{pmatrix} 1 & n \\ 0 & 1 \end{pmatrix}$
for each positive integer n.
(b) Find a formula for $\begin{pmatrix} a & b \\ 0 & c \end{pmatrix}^n$, where a, b, c are real numbers and n is a
positive integer.

Exercise 1.10.11. Recall the Fibonacci sequence
$(f_0, f_1, f_2, \ldots) = (0, 1, 1, 2, 3, 5, \ldots)$. Prove that
\[ \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}^n
= \begin{pmatrix} f_{n+1} & f_n \\ f_n & f_{n-1} \end{pmatrix}
\qquad \text{for each positive integer } n. \]

1.10.7. More induction proofs

Exercise 1.10.12. Let m ∈ N. Prove that there exists a way to arrange the
first m positive integers (1, 2, . . . , m) in a row in such a way that the average
of two numbers never stands between these two numbers.
(For example, for m = 8, one such arrangement is 1, 5, 3, 7, 2, 6, 4, 8. The
arrangement 1, 3, 2, 7, 8, 5, 6, 4 is invalid because the average of 1 and 5 is 3,
which stands between 1 and 5.)
[Hint: First show that there is such an arrangement when m is a power of
2 (that is, when $m = 2^n$ for some n ∈ N). Then, choose a sufficiently large
power of 2 and remove all entries larger than m.]

More advanced and creative uses of induction can be found in [Grinbe20, Chap-
ter 2], [Grinbe23b, Lecture 1], [AndCri17] and [Gunder10].

2. Sums and products


2.1. Finite sums
Previously, we have encountered sums such as

\[ x^{n-1} + x^{n-2} y + x^{n-3} y^2 + \cdots + x^2 y^{n-3} + x y^{n-2} + y^{n-1} \]

(in Section 1.6). Such sums can be tricky to decipher: You need to guess the
pattern of the addends to understand what the “· · · ” means. There is a notation
that makes such sums both shorter and easier to understand. This is the finite
sum notation (also known as the sigma notation). In its simplest form, it is
defined as follows:

Definition 2.1.1. Let u and v be two integers. Let $a_u, a_{u+1}, \ldots, a_v$ be some
numbers. Then,
\[ \sum_{k=u}^{v} a_k \]
is defined to be the sum

\[ a_u + a_{u+1} + \cdots + a_v \]

(in more detail: $a_u + a_{u+1} + a_{u+2} + a_{u+3} + \cdots + a_{v-1} + a_v$). It is called the
sum of the numbers $a_k$ where k ranges from u to v. When v < u, this sum
is called empty and defined to be 0.

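For example:
\begin{align*}
\sum_{k=5}^{10} k &= 5 + 6 + 7 + 8 + 9 + 10 = 45; \\
\sum_{k=5}^{10} \frac{1}{k} &= \frac{1}{5} + \frac{1}{6} + \frac{1}{7} + \frac{1}{8} + \frac{1}{9} + \frac{1}{10} = \frac{2131}{2520}; \\
\sum_{k=5}^{10} k^k &= 5^5 + 6^6 + 7^7 + 8^8 + 9^9 + 10^{10}; \\
\sum_{k=5}^{5} k &= 5; \\
\sum_{k=5}^{4} k &= 0 \qquad \text{(an empty sum)}; \\
\sum_{k=5}^{3} k &= 0 \qquad \text{(an empty sum)}; \\
\sum_{k=5}^{8} 3 &= 3 + 3 + 3 + 3 = 12 \qquad \text{(a sum of four equal terms)}; \\
\sum_{k=0}^{n-1} q^k &= q^0 + q^1 + \cdots + q^{n-1} \qquad \text{for any } n \in \mathbb{N} \text{ and any number } q; \\
\sum_{k=0}^{n-1} x^k y^{n-1-k} &= x^0 y^{n-1} + x^1 y^{n-2} + x^2 y^{n-3} + \cdots + x^{n-3} y^2 + x^{n-2} y^1 + x^{n-1} y^0 \\
&= y^{n-1} + x y^{n-2} + x^2 y^{n-3} + \cdots + x^{n-3} y^2 + x^{n-2} y + x^{n-1} \\
&= x^{n-1} + x^{n-2} y + x^{n-3} y^2 + \cdots + x^2 y^{n-3} + x y^{n-2} + y^{n-1} \\
&\qquad \text{for any } n \in \mathbb{N} \text{ and any numbers } x \text{ and } y.
\end{align*}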
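Finite sums of this kind translate directly into code. The following Python sketch (the helper `finite_sum` is a name we introduce here) evaluates the sum of a(k) for k from u to v and reproduces some of the values above; note that `range(u, v + 1)` is empty when v < u, which matches the convention that an empty sum is 0:

```python
from fractions import Fraction


def finite_sum(a, u, v):
    """Sum of a(k) for k = u, u+1, ..., v; an empty sum (v < u) is 0."""
    return sum(a(k) for k in range(u, v + 1))


assert finite_sum(lambda k: k, 5, 10) == 45
assert finite_sum(lambda k: Fraction(1, k), 5, 10) == Fraction(2131, 2520)
assert finite_sum(lambda k: k, 5, 5) == 5
assert finite_sum(lambda k: k, 5, 4) == 0      # empty sum
assert finite_sum(lambda k: 3, 5, 8) == 12     # four equal terms
```

Using `Fraction` instead of floating-point division keeps the sum of reciprocals exact.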

Thus, Theorem 1.6.2 is saying that

\[ (x - y) \left( \sum_{k=0}^{n-1} x^k y^{n-1-k} \right) = x^n - y^n \]

for any numbers x and y and any n ∈ N.


The variable k is not set in stone; you can replace it by any other variable
(unless this other variable already stands for something else). For example,
\[ \sum_{k=u}^{v} a_k = \sum_{i=u}^{v} a_i = \sum_{Ⓢ=u}^{v} a_{Ⓢ} = \sum_{♠=u}^{v} a_{♠}. \]

Just don’t make it $\sum_{u=u}^{v} a_u$.

Here are a couple more examples: For any n ∈ N, we have

\begin{align*}
\sum_{k=1}^{n} k &= 1 + 2 + \cdots + n = \frac{n(n+1)}{2} \qquad \text{(by Theorem 1.3.1)}; \\
\sum_{k=1}^{n} k^2 &= 1^2 + 2^2 + \cdots + n^2 = \frac{n(n+1)(2n+1)}{6} \qquad \text{(by Theorem 1.3.2)}; \\
\sum_{k=1}^{n} 1 &= \underbrace{1 + 1 + \cdots + 1}_{n \text{ times}} = n \cdot 1 = n; \\
\sum_{k=1}^{n} (2k-1) &= (2 \cdot 1 - 1) + (2 \cdot 2 - 1) + (2 \cdot 3 - 1) + \cdots + (2n - 1) \\
&= 1 + 3 + 5 + \cdots + (2n-1) \\
&= (\text{the sum of the first } n \text{ odd positive integers}).
\end{align*}

We have not computed this last sum, so let us do this. I will use the following
“laws of summation”:

• We have
\[ \sum_{k=u}^{v} (a_k - b_k) = \sum_{k=u}^{v} a_k - \sum_{k=u}^{v} b_k \tag{6} \]
for any integers u, v and any numbers $a_k, b_k$. Indeed, if you rewrite this
without finite sum notation, it takes the form

\[ (a_u - b_u) + (a_{u+1} - b_{u+1}) + \cdots + (a_v - b_v)
= (a_u + a_{u+1} + \cdots + a_v) - (b_u + b_{u+1} + \cdots + b_v), \]

which is rather clear. (A formal proof can be given by induction on v.)

• We have
\[ \sum_{k=u}^{v} \lambda a_k = \lambda \sum_{k=u}^{v} a_k \tag{7} \]
for any integers u, v and any numbers $\lambda, a_k$. Indeed, rewritten without the
use of finite sum notation, this is just saying that

\[ \lambda a_u + \lambda a_{u+1} + \cdots + \lambda a_v = \lambda (a_u + a_{u+1} + \cdots + a_v), \]

which is again clear (and can be proved by induction on v).

Rules like these are a dime a dozen, and you should be able to come up with
them on the spot when you need them. (See [Grinbe15, §1.4.2] for these and
several others.)

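Let us now compute our sum:

\begin{align*}
\sum_{k=1}^{n} (2k-1) &= \sum_{k=1}^{n} 2k - \sum_{k=1}^{n} 1 \qquad \text{(by (6))} \\
&= 2 \underbrace{\sum_{k=1}^{n} k}_{= \frac{n(n+1)}{2}} - \underbrace{\sum_{k=1}^{n} 1}_{= n} \qquad \text{(by (7))} \\
&= 2 \cdot \frac{n(n+1)}{2} - n = n(n+1) - n = n^2.
\end{align*}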
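As a quick sanity check of this computation, the sum of the first n odd positive integers can be added up term by term and compared with n². A minimal Python sketch (the function name `odd_sum` is ours):

```python
def odd_sum(n):
    """Sum of the first n odd positive integers, added up term by term."""
    return sum(2 * k - 1 for k in range(1, n + 1))


for n in range(0, 50):
    assert odd_sum(n) == n ** 2
```

The case n = 0 is an empty sum, which equals 0 = 0², so the formula holds there as well.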
As another illustration of the use of our notation, we can rewrite Gauss’s
proof of the equality
\[ \sum_{k=1}^{n} k = 1 + 2 + \cdots + n = \frac{n(n+1)}{2} \tag{8} \]

(Theorem 1.3.1) using finite sum notation. We will need three new rules this
time:

• We have
\[ \sum_{k=u}^{v} a_k + \sum_{k=u}^{v} b_k = \sum_{k=u}^{v} (a_k + b_k) \tag{9} \]
for any integers u, v and any numbers $a_k, b_k$. Indeed, if you rewrite this
without finite sum notation, it takes the form

\[ (a_u + a_{u+1} + \cdots + a_v) + (b_u + b_{u+1} + \cdots + b_v)
= (a_u + b_u) + (a_{u+1} + b_{u+1}) + \cdots + (a_v + b_v). \]

• We have
\[ \sum_{k=u}^{v} a_k = \sum_{k=u}^{v} a_{u+v-k} \tag{10} \]
for any integers u, v and any numbers $a_k$. This is called “substituting
u + v − k for k in the sum” or just “turning the sum upside-down”, as
it amounts to reversing the order of the addends; restated without finite
sum notation, this is just saying that

\[ a_u + a_{u+1} + \cdots + a_v = a_v + a_{v-1} + \cdots + a_u, \]

which is saying that a sum of a bunch of numbers does not change if we
add its addends together in reverse order.

• For any integers u ≤ v and any number λ, we have

\[ \sum_{k=u}^{v} \lambda = (v - u + 1) \lambda. \tag{11} \]

(This is just saying that a sum of v − u + 1 many equal addends λ is
(v − u + 1) λ. Note that the sum on the left hand side has v − u + 1 addends,
because there are v − u + 1 numbers in the set {u, u + 1, . . . , v}.)

Now, Gauss’s proof of (8) takes the following shape:

\begin{align*}
2 \sum_{k=1}^{n} k &= \sum_{k=1}^{n} k + \sum_{k=1}^{n} k \\
&= \sum_{k=1}^{n} k + \sum_{k=1}^{n} (n + 1 - k) \\
&\qquad \left( \text{here, we substituted } n + 1 - k \text{ for } k \text{ in the second sum (i.e., rewrote it using (10))} \right) \\
&= \sum_{k=1}^{n} \underbrace{(k + (n + 1 - k))}_{= n+1} \qquad \text{(by (9))} \\
&= \sum_{k=1}^{n} (n+1) = n \cdot (n+1) \qquad \text{(by (11))}.
\end{align*}

Dividing both sides by 2, we recover (8) again.

We have found closed-form expressions (i.e., expressions without ∑ signs or
“· · · ”s) for several sums. Not every sum has a closed-form expression. For
instance, there is no closed form for
\[ \sum_{k=1}^{n} \frac{1}{k} \qquad \text{or for} \qquad \sum_{k=1}^{n} k^k. \]

Some more terminology:

The notation $\sum_{k=u}^{v} a_k$ is called sigma notation or finite sum notation. The
symbol ∑ itself is called the summation sign. The numbers u and v are called
the lower limit and the upper limit of the summation¹⁹. The variable k is called
the summation index or the running index, and is said to range (or run) from
u to v. The numbers $a_k$ are called the addends of the finite sum.

¹⁹ This use of the word “limit” is totally unrelated to the way this word is used in
analysis/calculus.

There are many similarities between finite sums $\sum_{k=u}^{v} a_k$ and integrals
$\int_u^v f(x) \, dx$, but the analogy should not be taken too far (e.g., an integral
$\int_u^u f(x) \, dx$ whose upper and lower limit are equal will always be 0, but an
“analogous” finite sum $\sum_{k=u}^{u} a_k$ will be $a_u$).

We note two more rules for finite sums:


• The “splitting-off rule”: For any integers u ≤ v and any numbers
$a_u, a_{u+1}, \ldots, a_v$, we have
\[ \sum_{k=u}^{v} a_k = \sum_{k=u}^{v-1} a_k + a_v = a_u + \sum_{k=u+1}^{v} a_k. \]
This is just saying that
\begin{align*}
a_u + a_{u+1} + \cdots + a_v &= (a_u + a_{u+1} + \cdots + a_{v-1}) + a_v \\
&= a_u + (a_{u+1} + a_{u+2} + \cdots + a_v).
\end{align*}
This rule allows us to split the first or the last addend out of a finite sum.
This is important for proofs by induction.

• More generally, any finite sum $\sum_{k=u}^{v} a_k$ can be split at any point: We have
\[ \sum_{k=u}^{v} a_k = \sum_{k=u}^{w} a_k + \sum_{k=w+1}^{v} a_k \]
for any integers u ≤ w ≤ v and any numbers $a_k$. This is just saying that
\[ a_u + a_{u+1} + \cdots + a_v = (a_u + a_{u+1} + \cdots + a_w) + (a_{w+1} + a_{w+2} + \cdots + a_v). \]
(Strictly speaking, this is true not just for u ≤ w ≤ v but more generally
for u − 1 ≤ w ≤ v. If you find this confusing, recall that an empty sum
equals 0 by definition.)
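The splitting rule, including the boundary case w = u − 1 in which the first partial sum is empty, can be checked on sample data. In the following Python sketch, the addends and the helper name `finite_sum` are arbitrary choices of ours:

```python
def finite_sum(a, u, v):
    """Sum of a[k] for k = u, ..., v; empty (hence 0) when v < u."""
    return sum(a[k] for k in range(u, v + 1))


u, v = 3, 9
a = {k: k * k + 1 for k in range(u, v + 1)}   # sample addends a_3, ..., a_9
total = finite_sum(a, u, v)

# Split at every point w with u - 1 <= w <= v.
for w in range(u - 1, v + 1):
    assert total == finite_sum(a, u, w) + finite_sum(a, w + 1, v)
```

For w = u − 1 the first sum is empty, and for w = v the second one is; in both cases the empty sum contributes 0.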

Finite sum notation, in the form defined above, is helpful when the summa-
tion index is running over an integer interval (i.e., a set of consecutive integers).
For more general situations, there is a more general version of finite sum nota-
tion, e.g.:
\[ \sum_{k \in \{1, 2, \ldots, n\} \text{ is even}} k = 2 + 4 + 6 + \cdots + m, \]

where m is the largest even element of {1, 2, . . . , n}. We won’t use it much, but
it is fairly self-explanatory; essentially, the writing under the summation sign
explains what k’s the sum is ranging over. See [Grinbe15, §1.4.1] for a more
precise explanation.

Exercise 2.1.1. Let n ∈ N. Prove that

\[ \sum_{k=0}^{n} k (n - k) = \frac{(n-1) n (n+1)}{6}. \]

Exercise 2.1.2. Let n ∈ N.

(a) Prove that
\[ \sum_{k=1}^{n} \frac{1}{k(k+1)} = \frac{n}{n+1}. \]
(b) More generally: Let b and d be two numbers, and let

\[ a_i := b + id \qquad \text{for each } i \in \{1, 2, \ldots, n+1\}. \]

(Thus, $(a_1, a_2, \ldots, a_{n+1})$ is what is called an arithmetic progression – i.e., a
sequence of numbers that increase from each to the next by the same amount
d.) Assume that all the n + 1 numbers $a_1, a_2, \ldots, a_{n+1}$ are nonzero. Prove that
\[ \sum_{k=1}^{n} \frac{1}{a_k a_{k+1}} = \frac{n}{a_1 a_{n+1}}. \]

Exercise 2.1.3. The floor ⌊x⌋ of a real number x means the largest integer that
is smaller than or equal to x. For instance, ⌊6.2⌋ = 6 and ⌊7.7⌋ = 7 and ⌊8⌋ = 8.
(In other words, ⌊x⌋ is what you get if you round x down. Beware: ⌊−1.3⌋ is
−2, not −1.)
Let n ∈ N. Prove that
\[ \sum_{k=1}^{n} \left\lfloor \frac{k}{2} \right\rfloor
= \left\lfloor \frac{n}{2} \right\rfloor \cdot \left\lfloor \frac{n+1}{2} \right\rfloor. \]

(In this exercise, you can freely use basic properties of even and odd num-
bers – such as Proposition 3.3.8.)

Exercise 2.1.4. Let n ∈ N, and let q be any number distinct from 1. Prove
that
\[ \sum_{k=1}^{n} k q^k = q \cdot \frac{n q^{n+1} - (n+1) q^n + 1}{(q-1)^2}. \]

Exercise 2.1.5. Let n ∈ N. Prove that

\[ \underbrace{1 + 2 + \cdots + n}_{= \sum_{k=1}^{n} k}
= \underbrace{n^2 - (n-1)^2 + (n-2)^2 - (n-3)^2 \pm \cdots + (-1)^{n-1} 1^2}_{= \sum_{k=1}^{n} (-1)^{n-k} k^2}. \]

2.2. Finite products


Finite products are analogous to finite sums, just using multiplication instead
of addition:

Definition 2.2.1. Let u and v be two integers. Let $a_u, a_{u+1}, \ldots, a_v$ be some
numbers. Then,
\[ \prod_{k=u}^{v} a_k \]
is defined to be the product

\[ a_u a_{u+1} \cdots a_v. \]

It is called the product of the numbers $a_k$ where k ranges from u to v. When
v < u, this product is called empty and defined to be 1.

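For example:
\begin{align*}
\prod_{k=5}^{10} k &= 5 \cdot 6 \cdot 7 \cdot 8 \cdot 9 \cdot 10 = 151\,200; \\
\prod_{k=1}^{5} \frac{1}{k} &= \frac{1}{1} \cdot \frac{1}{2} \cdot \frac{1}{3} \cdot \frac{1}{4} \cdot \frac{1}{5} = \frac{1}{120}; \\
\prod_{k=5}^{5} \frac{1}{k} &= \frac{1}{5}; \\
\prod_{k=6}^{5} \frac{1}{k} &= 1 \qquad \text{(an empty product)}; \\
\prod_{k=1}^{n} a &= \underbrace{a a \cdots a}_{n \text{ times}} = a^n \qquad \text{for any fixed number } a \text{ and any } n \in \mathbb{N}; \\
\prod_{k=1}^{n} a^k &= a^1 a^2 \cdots a^n \\
&= a^{1 + 2 + \cdots + n} \qquad \left( \text{by one of the laws of exponents: namely, the law } a^{i_1} a^{i_2} \cdots a^{i_n} = a^{i_1 + i_2 + \cdots + i_n} \right) \\
&= a^{n(n+1)/2} \qquad \text{for any fixed number } a \text{ and any } n \in \mathbb{N}.
\end{align*}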
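These product examples can be reproduced with Python’s standard `math.prod` (available since Python 3.8), whose default value for an empty product is 1, in line with the convention above. The wrapper name `finite_prod` is our own:

```python
from fractions import Fraction
from math import prod


def finite_prod(a, u, v):
    """Product of a(k) for k = u, ..., v; an empty product (v < u) is 1."""
    return prod(a(k) for k in range(u, v + 1))


assert finite_prod(lambda k: k, 5, 10) == 151200
assert finite_prod(lambda k: Fraction(1, k), 1, 5) == Fraction(1, 120)
assert finite_prod(lambda k: Fraction(1, k), 6, 5) == 1      # empty product
assert finite_prod(lambda k: 2 ** k, 1, 4) == 2 ** (4 * 5 // 2)
```

The last line is the instance a = 2, n = 4 of the formula for the product of the powers a^k.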

In a finite product $\prod_{k=u}^{v} a_k$, the k is called the product index or the running
index²⁰, and the symbol ∏ is called the product sign. The numbers $a_k$ are
called the factors of the product. Other terminology is analogous to the case of
a finite sum (e.g., lower limit, upper limit). Almost all rules for finite sums have
analogues for finite products. Let me only state the analogues of the “splitting-off
rule” and of the rule (6):

²⁰ And just like in a sum, you can use any letter for it (unless it already stands for something
different).

• The “splitting-off rule” for products: For any integers u ≤ v and any
numbers $a_u, a_{u+1}, \ldots, a_v$, we have
\[ \prod_{k=u}^{v} a_k = \left( \prod_{k=u}^{v-1} a_k \right) a_v = a_u \prod_{k=u+1}^{v} a_k. \]

This is just saying that

\[ a_u a_{u+1} \cdots a_v = (a_u a_{u+1} \cdots a_{v-1}) a_v = a_u (a_{u+1} a_{u+2} \cdots a_v). \]

This rule allows us to split the first or the last factor out of a finite product.
This is important for proofs by induction.

• The analogue of the rule (6) for products: We have

\[ \prod_{k=u}^{v} (a_k / b_k) = \left( \prod_{k=u}^{v} a_k \right) / \left( \prod_{k=u}^{v} b_k \right) \tag{12} \]

for any integers u, v and any numbers $a_k, b_k$, as long as the numbers $b_k$ are
nonzero. This is an analogue of (6), since the multiplicative counterpart
to subtraction is division. (We had to assume that the $b_k$ are nonzero in
order for the fractions $a_k / b_k$ to be well-defined.)

2.3. Factorials
Now, we define a sequence of integers that appears all over mathematics. Recall
that N = {0, 1, 2, . . .}.

Definition 2.3.1. For any n ∈ N, we define the positive integer n! (called the
factorial of n, and often pronounced “n factorial”) by
\[ n! = \prod_{k=1}^{n} k = 1 \cdot 2 \cdots n. \]

This is the product of the first n positive integers.



For example,
0! = (empty product) = 1;
1! = 1 = 1;
2! = 1 · 2 = 2;
3! = 1 · 2 · 3 = 6;
4! = 1 · 2 · 3 · 4 = 24;
5! = 1 · 2 · 3 · 4 · 5 = 120;
6! = 1 · 2 · 3 · 4 · 5 · 6 = 720;
7! = 5 040;
8! = 40 320;
9! = 362 880;
10! = 3 628 800.
Note the following:
Proposition 2.3.2 (recursion of the factorials). For any positive integer n, we
have
n! = (n − 1)! · n.

Proof. Let n be a positive integer. Then,

\[ n! = 1 \cdot 2 \cdots n = \underbrace{(1 \cdot 2 \cdots (n-1))}_{= (n-1)!} \cdot n = (n-1)! \cdot n. \]
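Together with 0! = 1, the recursion of Proposition 2.3.2 determines all factorials, so it can serve as a recursive definition. A minimal Python sketch of this (Python’s standard library already offers `math.factorial`; the version below only illustrates the recursion):

```python
def factorial(n):
    """n! for a nonnegative integer n, via 0! = 1 and n! = (n - 1)! * n."""
    if n < 0:
        raise ValueError("n must be a nonnegative integer")
    if n == 0:
        return 1                    # the empty product
    return factorial(n - 1) * n     # recursion of the factorials


print([factorial(n) for n in range(7)])   # [1, 1, 2, 6, 24, 120, 720]
```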

Exercise 2.3.1. Prove that

1 · 1! + 2 · 2! + 3 · 3! + · · · + n · n! = (n + 1)! − 1

for each n ∈ N.
(Meanwhile, there is no such simple formula for 1! + 2! + 3! + · · · + n!. Not
every sum can be simplified!)
Exercise 2.3.2. (a) Prove that
\[ \prod_{i=2}^{n} \left( 1 - \frac{1}{i^2} \right) = \frac{n+1}{2n} \]
for each positive integer n.

(b) Find and prove a closed-form expression (i.e., no ∏ or ∑ signs) for
\[ \prod_{i=2}^{n} \left( 1 - \frac{1}{i} \right). \]

Exercise 2.3.3. Prove that

\[ \prod_{k=0}^{n} k! = \prod_{k=1}^{n} k! = \prod_{k=1}^{n} k^{n-k+1} \qquad \text{for each } n \in \mathbb{N}. \]

Exercise 2.3.4. Prove that

\[ \prod_{i=1}^{n} \left( i! \cdot i^i \right) = n!^{\,n+1} \qquad \text{for each } n \in \mathbb{N}. \]

Exercise 2.3.5. Let $(a_0, a_1, a_2, \ldots)$ be a sequence of integers defined recursively
by
\[ a_n = 1 + a_0 a_1 \cdots a_{n-1} \qquad \text{for all } n \geq 0. \]
(In particular, $a_0 = 1 + \underbrace{a_0 a_1 \cdots a_{0-1}}_{= (\text{empty product}) = 1} = 1 + 1 = 2$.) Here are the first few
entries of this sequence:

n     0   1   2   3    4      5         6
a_n   2   3   7   43   1807   3263443   10650056950807

(notice the astronomical growth!).

(a) Prove that

\[ a_{n+1} = a_n^2 - a_n + 1 \qquad \text{for each } n \geq 0. \]

(b) Prove that

\[ \frac{1}{a_0} + \frac{1}{a_1} + \cdots + \frac{1}{a_{n-1}} = 1 - \frac{1}{a_n - 1} \qquad \text{for each } n \geq 0. \]

Exercise 2.3.6. Define a sequence $(s_0, s_1, s_2, \ldots)$ of integers recursively by

\[ s_0 = 1 \qquad \text{and} \qquad s_n = 2 + s_0 s_1 \cdots s_{n-1} \quad \text{for all } n \geq 1. \]
(Thus, $s_1 = 3$ and $s_2 = 5$ and $s_3 = 17$.)
(a) Prove that $s_n = s_{n-1}^2 - 2 s_{n-1} + 2$ for all n ≥ 1.
(b) Prove that
\[ s_n = 2^{2^{n-1}} + 1 \qquad \text{for all } n \geq 1. \]
(Keep in mind that a “power tower” of the form $a^{b^c}$ has to be understood as
$a^{(b^c)}$, not as $(a^b)^c$.)
(c) Does this equality $s_n = 2^{2^{n-1}} + 1$ also hold for n = 0?

2.4. Binomial coefficients: Definition


We shall now define one of the most important families of numbers in mathe-
matics:

Definition 2.4.1. Let n and k be any numbers. Then, we define a number
$\binom{n}{k}$ as follows:

• If k ∈ N, then we set
\[ \binom{n}{k} := \frac{n (n-1) (n-2) \cdots (n-k+1)}{k!}. \]
(The numerator here is the product of k factors, where the first factor is
n and each further factor is 1 smaller than the previous. You can also
write this product as $\prod_{i=0}^{k-1} (n - i)$.)

• If k ∉ N, then we set
\[ \binom{n}{k} := 0. \]

The number $\binom{n}{k}$ is called “n choose k”, and is known as the binomial
coefficient of n and k. Do not mistake the notation $\binom{n}{k}$ for a vector
$\begin{pmatrix} n \\ k \end{pmatrix}$.
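Since this definition allows arbitrary numbers n and k, it is more general than what most library routines compute. The following Python sketch follows the definition literally, using exact rational arithmetic; the function name `binom` and the choice of `Fraction` are our own assumptions, and non-integer values of n should be passed as `Fraction` objects to avoid rounding:

```python
from fractions import Fraction
from math import factorial


def binom(n, k):
    """Binomial coefficient "n choose k", following the definition above.

    n: an int or a Fraction.
    k: any number; the result is 0 unless k is a nonnegative integer.
    """
    if k < 0 or k != int(k):
        return Fraction(0)            # case k not in N
    k = int(k)
    numerator = Fraction(1)           # empty product when k == 0
    for i in range(k):
        numerator *= Fraction(n) - i  # factors n, n-1, ..., n-k+1
    return numerator / factorial(k)


print(binom(4, 2))                # 6
print(binom(-1, 3))               # -1
print(binom(Fraction(3, 2), 3))   # -1/16
print(binom(5, 2.5))              # 0
```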

Example 2.4.2. For any number n, we have

\begin{align*}
\binom{n}{3} &= \frac{n (n-1) (n-2)}{3!} = \frac{n (n-1) (n-2)}{6}; \\
\binom{n}{2} &= \frac{n (n-1)}{2!} = \frac{n (n-1)}{2}; \\
\binom{n}{1} &= \frac{n}{1!} = n; \\
\binom{n}{0} &= \frac{(\text{empty product})}{0!} = \frac{1}{1} = 1; \\
\binom{n}{2.5} &= 0 \qquad (\text{since } 2.5 \notin \mathbb{N}); \\
\binom{n}{-1} &= 0 \qquad (\text{since } -1 \notin \mathbb{N}).
\end{align*}

For any k ∈ N, we have

\[ \binom{0}{k} = \frac{0 (0-1) (0-2) \cdots (0-k+1)}{k!}
= \begin{cases} 1, & \text{if } k = 0; \\ 0, & \text{if } k \neq 0 \end{cases} \]
(since the product $0 (0-1) (0-2) \cdots (0-k+1)$ is empty for k = 0, and otherwise has a
factor equal to 0 and thus must be 0), and
\[ \binom{-1}{k} = \frac{(-1) (-2) (-3) \cdots (-k)}{k!}
= (-1)^k \cdot \underbrace{\frac{1 \cdot 2 \cdots k}{k!}}_{= 1} = (-1)^k. \]

Let us tabulate the values of $\binom{n}{k}$ for nonnegative integers n and k:

        k=0   k=1   k=2   k=3   k=4   k=5   k=6
n=0      1     0     0     0     0     0     0
n=1      1     1     0     0     0     0     0
n=2      1     2     1     0     0     0     0
n=3      1     3     3     1     0     0     0
n=4      1     4     6     4     1     0     0
n=5      1     5    10    10     5     1     0
n=6      1     6    15    20    15     6     1

What patterns can we spot in this table? (We are ignoring negative and non-
integer n’s for now.)
The following is probably the most visible one:

Proposition 2.4.3. Let n ∈ N and k > n. Then, $\binom{n}{k} = 0$.

Proof. If k ∉ N, then this is clear by definition. Otherwise, again by definition,
we have
\[ \binom{n}{k} = \frac{n (n-1) (n-2) \cdots (n-k+1)}{k!} = \frac{0}{k!} = 0 \]
(since the product $n (n-1) (n-2) \cdots (n-k+1)$ has a factor of n − n = 0, and
thus is 0). For example, for n = 3 and k = 6, we have
\[ \binom{3}{6} = \frac{3 \cdot 2 \cdot 1 \cdot 0 \cdot (-1) \cdot (-2)}{6!} = \frac{0}{6!} = 0. \]

Remark 2.4.4. Note that Proposition 2.4.3 would not hold without the n ∈ N
assumption. For example,

\[ \binom{1.5}{3} = \frac{1.5 \cdot 0.5 \cdot (-0.5)}{3!} \neq 0 \qquad \text{even though } 3 > 1.5. \]

The product in the numerator is not 0, since it “jumps over” the 0 factor.

Proposition 2.4.3 explains why our above table of $\binom{n}{k}$ has so many zeroes
in it. More precisely, it tells us that all entries above the main diagonal of the
table are zeroes (no matter how many more rows and columns we add). Thus,
we can redraw our table as a triangular table (and fill in a few more rows while
at that). Each row n lists $\binom{n}{0}, \binom{n}{1}, \ldots, \binom{n}{n}$ from left to right, so that the
entries with a fixed k form a diagonal running from the upper right to the lower left:

n=0 →                   1
n=1 →                 1   1
n=2 →               1   2   1
n=3 →             1   3   3   1
n=4 →           1   4   6   4   1
n=5 →         1   5  10  10   5   1
n=6 →       1   6  15  20  15   6   1
n=7 →     1   7  21  35  35  21   7   1
n=8 →   1   8  28  56  70  56  28   8   1

This table is known as Pascal’s triangle, and has a variety of wonderful prop-
erties. Here are just a few:

• Pascal’s identity, aka the recurrence of the binomial coefficients: For
any numbers n and k, we have

\[ \binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}. \]

• Symmetry of binomial coefficients: For any n ∈ N and any k, we have
$\binom{n}{k} = \binom{n}{n-k}$.

• We have $\binom{n}{n} = 1$ for each n ∈ N.

• Integrality of binomial coefficients: For any n ∈ Z and any k, we have
$\binom{n}{k} \in \mathbb{Z}$.

In the next section, we will prove these four propositions and more.

2.5. Binomial coefficients: Properties


2.5.1. Pascal’s identity
We begin with the most important property of binomial coefficients:
Theorem 2.5.1 (Pascal’s identity, aka the recurrence of the binomial coefficients).
For any numbers n and k, we have

\[ \binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}. \]

Example 2.5.2. For n = 7 and k = 3, this is claiming that
$\binom{7}{3} = \binom{6}{2} + \binom{6}{3}$,
which explicitly is saying that 35 = 15 + 20.
But note that Theorem 2.5.1 also can be applied when n or k is negative or
non-integer.
Proof of Theorem 2.5.1. Let n and k be two numbers. We are in one of the follow-
ing three cases:
Case 1: The number k is a positive integer.
Case 2: We have k = 0.
Case 3: None of the above.
Let us first consider Case 1 (this is the interesting case). Here, k is a positive
integer, so that both k and k − 1 belong to N. The definition of binomial
coefficients therefore yields the three formulas
\begin{align*}
\binom{n}{k} &= \frac{n (n-1) (n-2) \cdots (n-k+1)}{k!}; \\
\binom{n-1}{k-1} &= \frac{(n-1) (n-2) (n-3) \cdots ((n-1) - (k-1) + 1)}{(k-1)!}
= \frac{(n-1) (n-2) (n-3) \cdots (n-k+1)}{(k-1)!}; \\
\binom{n-1}{k} &= \frac{(n-1) (n-2) (n-3) \cdots ((n-1) - k + 1)}{k!}
= \frac{(n-1) (n-2) (n-3) \cdots (n-k)}{k!}.
\end{align*}

Let us set $a := (n-1) (n-2) (n-3) \cdots (n-k+1)$ (this is the common factor
in the numerators of all these three formulas). Then, these three formulas can
be rewritten as
\begin{align}
\binom{n}{k} &= \frac{na}{k!}; \tag{13} \\
\binom{n-1}{k-1} &= \frac{a}{(k-1)!}; \tag{14} \\
\binom{n-1}{k} &= \frac{a (n-k)}{k!}. \tag{15}
\end{align}

But the claim that we are trying to prove is

\[ \binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}. \]

Using the formulas (13), (14) and (15), this can be rewritten as

\[ \frac{na}{k!} = \frac{a}{(k-1)!} + \frac{a (n-k)}{k!}. \]

Multiplying both sides by k!, we can transform this into

\[ na = a \cdot \frac{k!}{(k-1)!} + a (n-k). \]
Since $\frac{k!}{(k-1)!} = k$ (because the recursion of the factorials (i.e., Proposition 2.3.2)
yields $k! = (k-1)! \cdot k$), we can simplify this further to

\[ na = a \cdot k + a (n-k), \]

which is obviously true. Thus, our claim is proved in Case 1.


Now, we consider Case 2. In this case, k = 0. Our claim

\[ \binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k} \]

thus rewrites as
\[ \binom{n}{0} = \binom{n-1}{0-1} + \binom{n-1}{0}, \]
which again is true (because Example 2.4.2 shows that $\binom{n}{0} = 1$ and
$\binom{n-1}{0} = 1$ and $\binom{n-1}{0-1} = \binom{n-1}{-1} = 0$).

Finally, we consider Case 3. In this case, k is neither a positive integer nor 0.
Hence, k ∉ N. Thus, k − 1 ∉ N as well. Hence, in our claim

\[ \binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}, \]

all three binomial coefficients are 0 (since a binomial coefficient $\binom{m}{\ell}$ is 0 by
definition when ℓ ∉ N). Thus, again, the claim is true (since 0 = 0 + 0).
We have now proved Theorem 2.5.1 in all three cases; thus, it is always true.

Pascal’s identity is highly useful for proving properties of binomial coefficients
$\binom{n}{k}$ by induction on n. (We will see an example of this very soon, in the
proof of Theorem 2.5.9.)
Pascal’s identity shows that every entry of Pascal’s triangle (except the 1 at
the apex) equals the sum of the two entries directly above it (i.e., of the entry one
step northwest of it and the entry one step northeast of it). But it also applies to
binomial coefficients that are not (commonly) considered to be part of Pascal’s
triangle, such as $\binom{-3}{5} = \binom{-4}{4} + \binom{-4}{5}$ and
$\binom{3.2}{2} = \binom{2.2}{1} + \binom{2.2}{2}$.
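The “sum of the two entries above” rule gives a simple way to generate Pascal’s triangle row by row without computing any factorials. In the following Python sketch (the function name `pascal_rows` is ours), an entry missing from the previous row is read as 0, which matches the values of the corresponding binomial coefficients with k = −1 or k > n − 1:

```python
def pascal_rows(num_rows):
    """Rows n = 0, ..., num_rows - 1 of Pascal's triangle via Pascal's identity."""
    rows = [[1]]                          # row n = 0 (the apex)
    for n in range(1, num_rows):
        prev = rows[-1]                   # row n - 1, with entries k = 0, ..., n - 1
        row = []
        for k in range(n + 1):
            northwest = prev[k - 1] if k >= 1 else 0   # binom(n-1, k-1)
            northeast = prev[k] if k < n else 0        # binom(n-1, k)
            row.append(northwest + northeast)
        rows.append(row)
    return rows


for row in pascal_rows(6):
    print(row)
```

Only additions are used here, which already suggests that all entries of the triangle are integers.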

2.5.2. The factorial formula

Binomial coefficients $\binom{n}{k}$ make sense for arbitrary numbers n and k. However,
when n and k are nonnegative integers with k ≤ n (that is, when n ∈ N and
k ∈ {0, 1, . . . , n}), there is a particularly simple formula for $\binom{n}{k}$, known as the
factorial formula:

Theorem 2.5.3 (factorial formula). Let n ∈ N and k ∈ {0, 1, . . . , n}. Then,

\[ \binom{n}{k} = \frac{n!}{k! \cdot (n-k)!}. \]

Proof. The definition of $\binom{n}{k}$ yields

\[ \binom{n}{k} = \frac{n (n-1) (n-2) \cdots (n-k+1)}{k!}. \]

Multiplying both sides by k!, we obtain

\begin{align*}
k! \cdot \binom{n}{k} &= n (n-1) (n-2) \cdots (n-k+1) \\
&= (n-k+1) (n-k+2) (n-k+3) \cdots n \\
&= \frac{1 \cdot 2 \cdots n}{1 \cdot 2 \cdots (n-k)} \\
&= \frac{n!}{(n-k)!}
\end{align*}
(where the third equality sign holds since n − k + 1, n − k + 2, n − k + 3, . . . , n are
precisely the factors of the product 1 · 2 · · · · · n
that do not appear in the product 1 · 2 · · · · · (n − k)).
Dividing this by k!, we obtain
\[ \binom{n}{k} = \frac{n!}{(n-k)!} / k! = \frac{n!}{k! \cdot (n-k)!}. \]

This proves the factorial formula.

Warning 2.5.4. The factorial formula can be used to compute $\binom{10}{4}$ for example,
but it cannot be used to compute $\binom{-1}{3}$ or $\binom{1.2}{2}$ (because the “n ∈ N
and k ∈ {0, 1, . . . , n}” conditions in the factorial formula are not satisfied
here). It is thus not as general as the definition of binomial coefficients!
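The difference in scope between the definition and the factorial formula can be seen in a small experiment. The Python sketch below implements both (the function names are ours) and compares them where the formula applies; for n = −1 the factorial-based version fails, because Python’s `math.factorial` rejects negative arguments:

```python
from fractions import Fraction
from math import factorial


def binom_by_definition(n, k):
    """n(n-1)...(n-k+1) / k! for any number n and any integer k >= 0."""
    numerator = Fraction(1)
    for i in range(k):
        numerator *= Fraction(n) - i
    return numerator / factorial(k)


def binom_by_factorials(n, k):
    """n! / (k! (n-k)!); only meaningful for integers 0 <= k <= n."""
    return Fraction(factorial(n), factorial(k) * factorial(n - k))


# The two agree wherever the factorial formula applies ...
for n in range(0, 12):
    for k in range(0, n + 1):
        assert binom_by_definition(n, k) == binom_by_factorials(n, k)

# ... but only the definition handles n = -1:
print(binom_by_definition(-1, 3))    # -1
try:
    binom_by_factorials(-1, 3)
except ValueError as err:            # math.factorial rejects negative integers
    print("factorial formula not applicable:", err)
```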

Exercise 2.5.1. Let n ∈ N. Prove that

\[ \prod_{i=0}^{n} \binom{2i}{i} = 2^n \prod_{i=0}^{n} \binom{n+i}{n-i}. \]

2.5.3. The symmetry of binomial coefficients


Here is another property of Pascal’s triangle: It has a vertical axis of symmetry,
meaning that the entries to the left of this axis equal the corresponding entries
to the right of the axis. Let us state this more precisely:

Theorem 2.5.5 (symmetry of Pascal’s triangle). Let n ∈ N, and let k be any
number. Then,
\[ \binom{n}{k} = \binom{n}{n-k}. \]

Proof. We are in one of the following four cases:
Case 1: We have k ∈ {0, 1, . . . , n}.
Case 2: We have k < 0.
Case 3: We have k > n.
Case 4: The number k is not an integer.
Let us first consider Case 1. In this case, we have k ∈ {0, 1, . . . , n} and thus
also n − k ∈ {0, 1, . . . , n}. Since k ∈ {0, 1, . . . , n}, we can apply the factorial
formula to obtain
\[ \binom{n}{k} = \frac{n!}{k! \cdot (n-k)!}. \]
Since n − k ∈ {0, 1, . . . , n}, we can also apply the factorial formula to n − k
instead of k, and thus we find
\[ \binom{n}{n-k} = \frac{n!}{(n-k)! \cdot (n - (n-k))!} = \frac{n!}{(n-k)! \cdot k!} = \frac{n!}{k! \cdot (n-k)!}. \]

The right hand sides of these two equalities are equal. Thus, the left hand sides
are equal as well. This proves $\binom{n}{k} = \binom{n}{n-k}$ in Case 1.
In Case 2, we have $\binom{n}{k} = 0$ by definition (since k < 0 entails k ∉ N), whereas
$\binom{n}{n-k} = 0$ by Proposition 2.4.3 (since n ∈ N and n − k > n, because k < 0). This proves
$\binom{n}{k} = \binom{n}{n-k}$ in Case 2.
Case 3 is analogous to Case 2, except that k and n − k trade places.
In Case 4, both $\binom{n}{k}$ and $\binom{n}{n-k}$ are 0 by definition (since neither k nor
n − k belongs to N).
Thus, $\binom{n}{k} = \binom{n}{n-k}$ is proved in all four cases, so that Theorem 2.5.5
follows.
Alternatively, Theorem 2.5.5 could have been proved by induction on n.

Warning 2.5.6. Theorem 2.5.5 holds only for n ∈ N. For n = −1 and k = 0,
it is false (since $\binom{-1}{0} = 1$ but $\binom{-1}{-1-0} = 0$).

One corollary of Theorem 2.5.5 is the fact that the “right border” of Pascal’s
triangle is filled with 1’s:

Corollary 2.5.7. For any n ∈ N, we have $\binom{n}{n} = 1$.

Proof. For any n ∈ N, Theorem 2.5.5 (applied to k = n) yields

\[ \binom{n}{n} = \binom{n}{n-n} = \binom{n}{0} = 1. \]

Warning 2.5.8. Corollary 2.5.7 does not hold for negative (or non-integer) n.

2.5.4. Pascal’s triangle consists of integers


The perhaps most surprising pattern in Pascal’s triangle is that all its entries
are integers! It is tempting
 to take this for granted, but this is not at all obvious
n
from our definition of as a fraction. Nevertheless, we can now prove it
k
without much trouble:
 
n
Theorem 2.5.9. For any n ∈ N and any number k, we have ∈ N.
k
Proof. We induct on n.
Base case: Theorem 2.5.9 holds for n = 0, since any number k satisfies
  (
0 1, if k = 0;
= (easy to see from the definition)
k 0, if k ̸= 0
∈ N.
Induction step: We will make an induction step from n − 1 to n (instead of the
more conventional step from n to n + 1). So we fix a positive integer n, and we
assume (as the induction hypothesis) that Theorem 2.5.9 holds for n − 1 instead
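of n. In other words, we assume that
\[ \binom{n-1}{k} \in \mathbb{N} \qquad \text{for all numbers } k. \tag{16} \]
Our goal is to prove that Theorem 2.5.9 also holds for n. In other words, we
must prove that
\[ \binom{n}{k} \in \mathbb{N} \qquad \text{for all numbers } k. \]
But this is easy: Pascal’s identity yields
\[ \binom{n}{k} = \underbrace{\binom{n-1}{k-1}}_{\substack{\in \mathbb{N} \\ \text{(by (16), applied to } k-1 \\ \text{instead of } k\text{)}}} + \underbrace{\binom{n-1}{k}}_{\substack{\in \mathbb{N} \\ \text{(by (16))}}} \in \mathbb{N} \qquad \text{for all numbers } k. \]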

So the induction step is complete, and the theorem is proved.
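
The induction step above is effectively an algorithm: each row of Pascal’s triangle is obtained from the previous one by additions only, so no fraction ever appears. Here is a minimal Python sketch of this idea (the function name `pascal_rows` is ours, chosen for illustration):

```python
def pascal_rows(count):
    """Return rows 0, 1, ..., count-1 of Pascal's triangle, using only addition."""
    rows = []
    row = [1]  # row 0 consists of the single entry binom(0, 0) = 1
    for _ in range(count):
        rows.append(row)
        # Pascal's identity: entry k of the next row is (entry k-1) + (entry k)
        # of the current row, where entries outside the row count as 0.
        padded = [0] + row + [0]
        row = [padded[k] + padded[k + 1] for k in range(len(row) + 1)]
    return rows

for r in pascal_rows(5):
    print(r)
```

Since the sketch only ever adds nonnegative integers, every entry it produces is a nonnegative integer, which mirrors the argument in the proof.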



Theorem 2.5.9 is crying for a better explanation: Certainly, a number shouldn’t


belong to N for no reason! (Actually, it can, but let’s be optimistic.) Such an
explanation does indeed exist:

Theorem 2.5.10 (combinatorial interpretation of binomial coefficients). Let


n ∈ N, and let k be any number. Let A be any n-element set. (Here, “n-
element set” means a set that has exactly n distinct elements. For example,
{2, 6, 11} is a 3-element set, and this does not change if I rewrite this set
as {2, 6, 2, 11}. Note that the sets {2, 3} and {3, 2} are identical, since a set
doesn’t care how its elements are ordered.)
Then, $\binom{n}{k}$ is the number of $k$-element subsets of $A$.

Example 2.5.11. Let n = 4 and k = 2 and A = {1, 2, 3, 4}. Then, the 2-element
subsets of A are

{1, 2} , {1, 3} , {1, 4} , {2, 3} , {2, 4} , {3, 4} .


 
So their number is 6. And this agrees with Theorem 2.5.10, since $\binom{n}{k} = \binom{4}{2} = 6$.
Another example: The 3-element subsets of {1, 2, 3, 4, 5} are

{1, 2, 3} , {1, 2, 4} , {1, 2, 5} , {1, 3, 4} , {1, 3, 5} , {1, 4, 5} ,


{2, 3, 4} , {2, 3, 5} , {2, 4, 5} , {3, 4, 5} .
 
There are 10 of them, just as Theorem 2.5.10 predicts (since $\binom{5}{3} = 10$).
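
The two counts in this example can be reproduced by brute force. The sketch below uses the standard library functions `itertools.combinations` (which lists the k-element subsets of a finite collection, each exactly once) and `math.comb`; it is only a sanity check for small sets, not a proof of Theorem 2.5.10.

```python
from itertools import combinations
from math import comb

# All 2-element subsets of {1, 2, 3, 4}, listed as sorted tuples.
two_subsets = list(combinations([1, 2, 3, 4], 2))

# All 3-element subsets of {1, 2, 3, 4, 5}.
three_subsets = list(combinations([1, 2, 3, 4, 5], 3))

print(two_subsets)
print(len(two_subsets), comb(4, 2))
print(len(three_subsets), comb(5, 3))
```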

We will prove Theorem 2.5.10 later in this course (see Theorem 4.3.3), as we
learn more about finite sets and their sizes. Note that the k-element subsets of
A are also known as combinations without replacement. Theorem 2.5.10 also
explains why $\binom{n}{k}$ is called “n choose k”: After all, a k-element subset of A is a
“choice” of k distinct elements (without regard for order) from A.
Note again that Theorem 2.5.10 says nothing about binomial coefficients $\binom{n}{k}$
with $n \notin \mathbb{N}$, since a number $n \notin \mathbb{N}$ cannot be the size of a set. So Theorem
2.5.10 explains why $\binom{5}{2}$ is an integer, but does not explain why $\binom{-5}{2}$ is an
integer.

2.5.5. Upper negation


Here is another property of binomial coefficients, called the upper negation
formula:

Theorem 2.5.12 (upper negation formula). For any numbers n and k ∈ Z, we
have
\[ \binom{-n}{k} = (-1)^k \binom{n+k-1}{k}. \]

Proof. If $k \notin \mathbb{N}$, then this is clear because both binomial coefficients are 0 by
definition.
Thus, we only need to prove the theorem in the case when $k \in \mathbb{N}$.
In this case, the definition of binomial coefficients yields
\[ \binom{-n}{k} = \frac{(-n)(-n-1)(-n-2)\cdots(-n-k+1)}{k!} = (-1)^k \cdot \frac{n(n+1)(n+2)\cdots(n+k-1)}{k!} \]
(here, we factored out all the minus signs from the numerator) and
\[ \binom{n+k-1}{k} = \frac{(n+k-1)(n+k-2)(n+k-3)\cdots n}{k!} = \frac{n(n+1)(n+2)\cdots(n+k-1)}{k!}. \]
Comparing these equalities, we find $\binom{-n}{k} = (-1)^k \binom{n+k-1}{k}$. This proves
the theorem.
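
As a sanity check, the upper negation formula can be tested with exact rational arithmetic, including for non-integer values of n. This Python sketch again assumes the chapter’s definition of $\binom{n}{k}$; the names `binom` and `sign` are ours.

```python
from fractions import Fraction
from math import factorial

def binom(n, k):
    """n(n-1)...(n-k+1)/k! if k is a nonnegative integer, and 0 otherwise."""
    if not (isinstance(k, int) and k >= 0):
        return Fraction(0)
    numerator = Fraction(1)
    for i in range(k):
        numerator *= Fraction(n) - i
    return numerator / factorial(k)

def sign(k):
    """(-1)^k for any integer k, computed without a negative exponent."""
    return 1 if k % 2 == 0 else -1

sample_n = [Fraction(5), Fraction(-3), Fraction(1, 2), Fraction(-7, 3)]
upper_negation_ok = all(
    binom(-n, k) == sign(k) * binom(n + k - 1, k)
    for n in sample_n
    for k in range(-2, 7)
)
print(upper_negation_ok)  # prints: True
print(binom(-3, 5))       # prints: -21
```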
 
Corollary 2.5.13. For any $n \in \mathbb{Z}$ and any number $k$, we have $\binom{n}{k} \in \mathbb{Z}$.

Proof. If $n \geq 0$, then this has already been proved in Theorem 2.5.9.
If $k \notin \mathbb{N}$, then this is clear because $\binom{n}{k} = 0$.
In the remaining case, use the upper negation formula. Details are left to the
reader (see [Grinbe19a, Theorem 1.3.16]).
Note that (as we said above) “negative” binomial coefficients such as $\binom{-3}{5} = -21$
have no immediate combinatorial meaning, because there is no such thing
as a (−3)-element set. Nevertheless, they have uses in algebra and elsewhere.

2.5.6. Finding Fibonacci numbers in Pascal’s triangle


The binomial coefficients are related to the Fibonacci numbers:

Theorem 2.5.14. For any $n \in \mathbb{N}$, the Fibonacci number $f_{n+1}$ is
\[ f_{n+1} = \binom{n-0}{0} + \binom{n-1}{1} + \binom{n-2}{2} + \cdots + \binom{n-n}{n} = \sum_{k=0}^{n} \binom{n-k}{k}. \]

For example, for n = 7, this is saying that
\[ f_8 = \binom{7-0}{0} + \binom{7-1}{1} + \binom{7-2}{2} + \cdots + \binom{7-7}{7} = 1 + 6 + 10 + 4 + 0 + 0 + 0 + 0 = 21. \]

We will prove Theorem 2.5.14 in Chapter 6 (as Corollary 6.4.8) using enumer-
ative combinatorics. You can find proofs of Theorem 2.5.14 in [Vorobi02, §15]
and in [Grinbe19a, §1.4.5, proof of Proposition 1.3.32] as well.
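
Before the proof arrives, Theorem 2.5.14 can at least be tested on a computer. The sketch below assumes the usual convention $f_0 = 0$, $f_1 = 1$ for the Fibonacci sequence (consistent with $f_8 = 21$ above) and uses `math.comb`, which returns 0 when the lower argument exceeds the upper one. The function names are ours.

```python
from math import comb

def fib(n):
    """The n-th Fibonacci number, with fib(0) = 0 and fib(1) = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def diagonal_sum(n):
    """The sum of binom(n - k, k) over k = 0, 1, ..., n."""
    return sum(comb(n - k, k) for k in range(n + 1))

print([diagonal_sum(n) for n in range(10)])
print([fib(n + 1) for n in range(10)])
```

Both lines print the same list, which is what the theorem predicts for these small values of n.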

2.6. The binomial formula


One of the most important properties of binomial coefficients (which, inciden-
tally, explains their name) is the binomial formula:

Theorem 2.6.1 (binomial formula, aka binomial theorem). Let a and b be any
numbers, and let n ∈ N. Then,
\[ (a+b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k}. \tag{17} \]
Restating this without the summation sign:
\[ (a+b)^n = \binom{n}{0} a^0 b^n + \binom{n}{1} a^1 b^{n-1} + \binom{n}{2} a^2 b^{n-2} + \cdots + \binom{n}{n} a^n b^0. \]
Equivalently:
\[ (a+b)^n = \sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^k. \tag{18} \]

Example 2.6.2. For n = 5, the formula (17) is saying that
\begin{align*}
(a+b)^5 &= \sum_{k=0}^{5} \binom{5}{k} a^k b^{5-k} \\
&= \binom{5}{0} a^0 b^5 + \binom{5}{1} a^1 b^4 + \binom{5}{2} a^2 b^3 + \binom{5}{3} a^3 b^2 + \binom{5}{4} a^4 b^1 + \binom{5}{5} a^5 b^0 \\
&= 1b^5 + 5ab^4 + 10a^2b^3 + 10a^3b^2 + 5a^4b + 1a^5 \\
&= b^5 + 5ab^4 + 10a^2b^3 + 10a^3b^2 + 5a^4b + a^5.
\end{align*}
For a more familiar example, for n = 2, the formula (17) becomes
\[ (a+b)^2 = b^2 + 2ab + a^2. \]
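
The binomial formula is easy to test numerically before proving it. The following Python sketch evaluates the right hand side of (17) with exact fractions and compares it with $(a+b)^n$; the function name `binomial_rhs` and the sample values are ours.

```python
from fractions import Fraction
from math import comb

def binomial_rhs(a, b, n):
    """The sum of binom(n, k) * a^k * b^(n-k) over k = 0, 1, ..., n."""
    return sum(comb(n, k) * a**k * b**(n - k) for k in range(n + 1))

a, b = Fraction(2, 3), Fraction(-5)
formula_ok = all(binomial_rhs(a, b, n) == (a + b) ** n for n in range(12))
print(formula_ok)               # prints: True
print(binomial_rhs(1, 1, 5))    # prints: 32
```

Note that Python evaluates `0**0` as 1, which matches the convention needed for the formula to hold when a or b is 0.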

Proof of Theorem 2.6.1. Clearly, the formula (18) is just the formula (17) with the
variables a and b swapped (since b + a = a + b). Thus, it will suffice to prove
(17).
We will prove (17) by induction on n:
Base case: For n = 0, this formula (17) is true, since
\[ (a+b)^0 = 1 \qquad \text{and} \qquad \sum_{k=0}^{0} \binom{0}{k} a^k b^{0-k} = \underbrace{\binom{0}{0}}_{=1} \underbrace{a^0}_{=1} \underbrace{b^{0-0}}_{=b^0=1} = 1. \]

Induction step: Let n ∈ N. We assume (as the induction hypothesis) that the
formula (17) holds for n. In other words, we assume that
\[ (a+b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k}. \tag{19} \]
We must show that the formula (17) also holds for n + 1. In other words, we
must prove that
\[ (a+b)^{n+1} = \sum_{k=0}^{n+1} \binom{n+1}{k} a^k b^{n+1-k}. \tag{20} \]

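Indeed, we have
\begin{align*}
(a+b)^{n+1} &= (a+b)^n \cdot (a+b) \\
&= \left( \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k} \right) \cdot (a+b) \qquad \text{(by (19))} \\
&= \left( \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k} \right) \cdot a + \left( \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k} \right) \cdot b \\
&= \sum_{k=0}^{n} \binom{n}{k} \underbrace{a^k b^{n-k} a}_{= a^{k+1} b^{n-k}} + \sum_{k=0}^{n} \binom{n}{k} a^k \underbrace{b^{n-k} b}_{= b^{n-k+1}} \\
&\qquad \left( \text{by distributivity for finite sums, i.e., by the rule } \left( \sum_{s=u}^{v} a_s \right) c = \sum_{s=u}^{v} a_s c \right) \\
&= \sum_{k=0}^{n} \binom{n}{k} a^{k+1} b^{n-k} + \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k+1}. \tag{21}
\end{align*}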

On the other hand, for each k, we have
\[ \binom{n+1}{k} = \binom{n}{k-1} + \binom{n}{k} \]
(indeed, this is just Theorem 2.5.1, applied to n + 1 instead of n). Hence,
\begin{align*}
\sum_{k=0}^{n+1} \binom{n+1}{k} a^k b^{n+1-k}
&= \sum_{k=0}^{n+1} \left( \binom{n}{k-1} + \binom{n}{k} \right) a^k b^{n+1-k} \\
&= \sum_{k=0}^{n+1} \left( \binom{n}{k-1} a^k b^{n+1-k} + \binom{n}{k} a^k b^{n+1-k} \right) \\
&= \sum_{k=0}^{n+1} \binom{n}{k-1} a^k b^{n+1-k} + \sum_{k=0}^{n+1} \binom{n}{k} a^k b^{n+1-k} \\
&= \Biggl( \underbrace{\binom{n}{0-1}}_{\substack{=0 \\ \text{(by definition,} \\ \text{since } 0-1 \notin \mathbb{N}\text{)}}} a^0 b^{n+1-0} + \sum_{k=1}^{n+1} \binom{n}{k-1} a^k b^{n+1-k} \Biggr) \\
&\qquad + \Biggl( \sum_{k=0}^{n} \binom{n}{k} a^k b^{n+1-k} + \underbrace{\binom{n}{n+1}}_{\substack{=0 \\ \text{(by Proposition 2.4.3,} \\ \text{since } n+1 > n\text{)}}} a^{n+1} b^{n+1-(n+1)} \Biggr) \\
&\qquad \left( \begin{array}{c} \text{here, we have split off the } k = 0 \text{ addend from the first sum,} \\ \text{and the } k = n+1 \text{ addend from the second sum} \end{array} \right) \\
&= \sum_{k=1}^{n+1} \binom{n}{k-1} a^k b^{n+1-k} + \sum_{k=0}^{n} \binom{n}{k} a^k b^{n+1-k}. \tag{22}
\end{align*}
Let us now compare the two equalities (21) and (22). Our goal is to prove
that their left hand sides are equal (because this equality will be precisely (20)).
Let us look at the right hand sides instead. The right hand side of (21) consists
of two finite sums, and so does the right hand side of (22). The second sums
of both right hand sides are equal, since $n - k + 1 = n + 1 - k$ for each k. If we
can also show that the respective first sums are equal, then we will conclude
that the right hand sides of (21) and (22) are equal, and therefore the left hand
sides are also equal, and thus we will conclude that
\[ (a+b)^{n+1} = \sum_{k=0}^{n+1} \binom{n+1}{k} a^k b^{n+1-k}, \]
which is precisely our goal.
So it remains to prove that the first sums on the right hand sides of (21) and
(22) are equal. In other words, it remains to prove that
\[ \sum_{k=0}^{n} \binom{n}{k} a^{k+1} b^{n-k} = \sum_{k=1}^{n+1} \binom{n}{k-1} a^k b^{n+1-k}. \tag{23} \]
But this becomes clear if we observe that these two sums contain the exact
same addends: Indeed, written out without using summation signs, both sums
become
\[ \binom{n}{0} a^1 b^n + \binom{n}{1} a^2 b^{n-1} + \binom{n}{2} a^3 b^{n-2} + \cdots + \binom{n}{n} a^{n+1} b^0. \]
0 1 2 n
This argument can be made more rigorous using an important summation
rule, known as substitution. In its simplest form, this rule says that
\[ \sum_{k=u}^{v} c_k = \sum_{k=u+\delta}^{v+\delta} c_{k-\delta} \tag{24} \]
for any integers $u, v, \delta$ and any numbers $c_u, c_{u+1}, \ldots, c_v$. This is the discrete
analogue of the formula
\[ \int_u^v f(x) \, dx = \int_{u+\delta}^{v+\delta} f(x - \delta) \, dx \]
from real analysis. A formal proof of (24) can easily be given by induction on v,
but intuitively (24) should be obvious (since both sides are $c_u + c_{u+1} + \cdots + c_v$).
When we use (24) to rewrite a sum of the form $\sum_{k=u}^{v} c_k$ as $\sum_{k=u+\delta}^{v+\delta} c_{k-\delta}$, we say
that we are substituting $k - \delta$ for $k$ in the sum. For example, taking $u = 4$ and
$v = 9$ and $c_k = k^k$ and $\delta = -2$, we see that
\[ \sum_{k=4}^{9} k^k = \sum_{k=4+(-2)}^{9+(-2)} (k - (-2))^{k-(-2)} = \sum_{k=2}^{7} (k+2)^{k+2}. \]
Now, substituting $k - 1$ for $k$ in the sum $\sum_{k=0}^{n} \binom{n}{k} a^{k+1} b^{n-k}$, we obtain
\[ \sum_{k=0}^{n} \binom{n}{k} a^{k+1} b^{n-k} = \sum_{k=1}^{n+1} \binom{n}{k-1} \underbrace{a^{(k-1)+1}}_{= a^k} \underbrace{b^{n-(k-1)}}_{= b^{n+1-k}} = \sum_{k=1}^{n+1} \binom{n}{k-1} a^k b^{n+1-k}. \]

This proves (23) rigorously.
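
The substitution rule (24) is just a re-indexing of a loop, which a short Python sketch makes concrete. The function name `shifted_sums` and the sample values of a, b, n are ours; the first check is the example $c_k = k^k$ from above, and the second is an instance of (23).

```python
from math import comb

def shifted_sums(c, u, v, delta):
    """Return both sides of (24) for the addends c(u), c(u+1), ..., c(v)."""
    left = sum(c(k) for k in range(u, v + 1))
    right = sum(c(k - delta) for k in range(u + delta, v + delta + 1))
    return left, right

# The example from the text: u = 4, v = 9, c_k = k^k, delta = -2.
left, right = shifted_sums(lambda k: k**k, 4, 9, -2)
print(left == right)  # prints: True

# An instance of (23) with sample values a = 3, b = 2, n = 6.
a, b, n = 3, 2, 6
lhs23 = sum(comb(n, k) * a**(k + 1) * b**(n - k) for k in range(0, n + 1))
rhs23 = sum(comb(n, k - 1) * a**k * b**(n + 1 - k) for k in range(1, n + 2))
print(lhs23 == rhs23)  # prints: True
```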


Having proved (23), we have shown that the first sums on the right hand
sides of (21) and (22) are equal. As we explained, this yields (20), and thus
completes the induction step. This proves (17), thus concluding the proof of
Theorem 2.6.1.

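Exercise 2.6.1. Let $n \in \mathbb{N}$.

(a) Prove that
\[ \sum_{k=0}^{n} \binom{n}{k} = 2^n \qquad \text{and} \qquad \sum_{k=0}^{n} (-1)^k \binom{n}{k} = \begin{cases} 0, & \text{if } n > 0; \\ 1, & \text{if } n = 0. \end{cases} \]
(For example, for n = 4, this is saying that $\binom{4}{0} + \binom{4}{1} + \binom{4}{2} + \binom{4}{3} + \binom{4}{4} = 2^4$
and that $\binom{4}{0} - \binom{4}{1} + \binom{4}{2} - \binom{4}{3} + \binom{4}{4} = 0$.)
(b) Assume that n is positive. Show that
\[ \sum_{\substack{k \in \{0, 1, \ldots, n\}; \\ k \text{ is even}}} \binom{n}{k} = 2^{n-1}. \]
(The left hand side can be explicitly written as $\binom{n}{0} + \binom{n}{2} + \binom{n}{4} + \cdots$,
ending at $\binom{n}{n-1}$ or $\binom{n}{n}$ depending on whether n is odd or even.)
[You can use Proposition 3.3.8 here.]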

Exercise 2.6.2. Recall the Fibonacci sequence (Definition 1.5.1). Prove that
every $n \in \mathbb{N}$ and $m \in \mathbb{N}$ satisfy
\[ \sum_{k=0}^{n} \binom{n}{k} f_{m+k} = f_{m+2n}. \]

Exercise 2.6.3. (a) Prove that $k \binom{n}{k} = n \binom{n-1}{k-1}$ for any two numbers n and
k.
(b) Prove that $\sum_{k=0}^{n} k \binom{n}{k} x^k = nx (x+1)^{n-1}$ for any positive integer n and
any number x.
[Hint: In part (a), don’t forget about cases like $k = 0$ and $k \notin \mathbb{N}$.]

Exercise 2.6.4. Let $(f_0, f_1, f_2, \ldots)$ be the Fibonacci sequence. Let $n \in \mathbb{N}$. Prove
that
\[ 2^{n-1} \cdot f_n = \sum_{k=0}^{n} \binom{n}{2k+1} \cdot 5^k. \]
[Hint: The 5 on the right hand side looks suspiciously like the 5 in $\frac{1+\sqrt{5}}{2}$,
whereas the binomial coefficients look like the binomial formula...]

2.7. More properties of binomial coefficients


Exercise 2.7.1. Prove that every $n \in \mathbb{N}$ and $a \in \mathbb{R}$ satisfy
\[ \sum_{k=0}^{n} \binom{k+a}{k} = \binom{n+a+1}{n}. \]
(For example, for n = 4 and a = 2, this is saying that
\[ \binom{2}{0} + \binom{3}{1} + \binom{4}{2} + \binom{5}{3} + \binom{6}{4} = \binom{7}{4}. \]
Keep in mind that a doesn’t have to be an integer in general!)

Exercise 2.7.2. Let $n \in \mathbb{N}$.

(a) Prove that $1 \cdot 3 \cdot 5 \cdot \cdots \cdot (2n-1)$ (that is, the product of the first n odd
positive integers) is $\frac{(2n)!}{2^n \cdot n!}$.
(b) Prove that $\binom{-1/2}{n} = \left( \frac{-1}{4} \right)^n \binom{2n}{n}$.

Exercise 2.7.3. Let $m, n \in \mathbb{N}$ be such that $n > 0$.

(a) Prove that $\binom{mn}{m} = n \binom{mn-1}{m-1}$.
(b) Prove that $\frac{(mn)!}{m!^n \cdot n!}$ is an integer.
[Hint: For part (b), induct on n.]

3. Elementary number theory


Number theory is commonly understood to be the study of integers, and par-
ticularly of those properties and features of integers that do not make much
sense for rational, real or complex numbers. Divisibility is one such property;
prime numbers are another. In this course, we will only cover the very basics of
elementary number theory; there is no shortage of texts that go much deeper
(some freely available ones are [Mileti22], [Hackma09], [Stein08], [Shoup08] and
[Martin17]).

3.1. Divisibility
3.1.1. Definition
We begin by defining the single most important concept in number theory:

Definition 3.1.1. Let a and b be two integers.


We write a | b (and we say that “a divides b”, or “b is divisible by a”, or
“b is a multiple of a”, or “a is a divisor of b”; yes, all these statements are
equivalent) if there exists an integer c such that b = ac.
We write a ∤ b if we don’t have a | b.

Example 3.1.2. (a) We have 4 | 12, because 12 = 4 · 3.


(b) We have 4 ∤ 11, because there exists no integer c such that 11 = 4c.
(c) We have 1 | b for every integer b, since b = 1 · b.
(d) We have a | a for every integer a, since a = a · 1. In particular, 0 | 0,
which is somewhat controversial (but true in our opinion). (Some authors
deliberately exclude 0 as a divisor on the grounds that $\frac{0}{0}$ is not well-defined,
but I believe that making this an exception is more trouble than it is worth.)
(e) We have a | 0 for every integer a, since 0 = a · 0.
(f) An integer b satisfies 0 | b if and only if b = 0.
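
Definition 3.1.1 translates into a short test on a computer. The Python sketch below is one possible way to do it (the function name `divides` is ours): for a ≠ 0 it uses the remainder operator `%`, and for a = 0 it falls back on part (f) of the example, since Python refuses to compute a remainder modulo 0.

```python
def divides(a, b):
    """Return True if a | b, that is, if b = a * c for some integer c."""
    if a == 0:
        # 0 * c is always 0, so 0 | b holds exactly when b = 0.
        return b == 0
    return b % a == 0

print(divides(4, 12))   # prints: True
print(divides(4, 11))   # prints: False
print(divides(0, 0))    # prints: True
print(divides(-7, 0))   # prints: True
```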

The well-known concepts of even and odd integers are instances of divisibil-
ity:

Definition 3.1.3. (a) An integer n is said to be even if 2 | n.


(b) An integer n is said to be odd if 2 ∤ n.

You probably know a few things about even and odd numbers already: e.g.,

1. The sum of two even numbers is even.


2. The sum of an even with an odd number is odd.

3. The sum of two odd numbers is even.

Strictly speaking, these claims (particularly the third one) are not at all ob-
vious. So we need to understand divisibility better to even convince ourselves
that such fundamental statements are true. We will do this soon (Corollary
3.3.9). First, let us prove some basic facts about divisibility.

3.1.2. Basic properties


In the next proposition, we shall let abs x denote the absolute value of a real
number x. Thus,
\[ \operatorname{abs} x = \begin{cases} x, & \text{if } x \geq 0; \\ -x, & \text{if } x < 0. \end{cases} \]
This absolute value abs x is normally called |x|, but I believe that writing “abs a |
abs b” is less confusing than writing “|a| | |b|” (where four of the bars stand for
absolute values, while the middle bar stands for divisibility).

Proposition 3.1.4. Let a and b be two integers. Then:


(a) We have a | b if and only if abs a | abs b.
(b) If a | b and b ̸= 0, then abs a ≤ abs b.
(c) If a | b and b | a, then abs a = abs b.
(d) Assume that $a \neq 0$. Then, $a \mid b$ if and only if $\frac{b}{a} \in \mathbb{Z}$.

Proof. (a) Proposition 3.1.4 (a) says that the divisibility a | b does not depend
on the signs of a and b; in other words, it says that we can replace the numbers
a and b by their absolute values without changing the truth (or falsity) of a | b.
Clearly, in order to prove this, it suffices to show the following two state-
ments:

1. We can replace a by − a without changing the truth (or falsity) of a | b;

2. We can replace b by −b without changing the truth (or falsity) of a | b;

But both of these statements are easy:


For the first statement, we assume that a | b. Thus, b = ac for some integer
c (by the definition of “a | b”). Hence, for this integer c, we have b = ac =
(− a) (−c), which allows us to conclude that − a | b (since −c is an integer, too).
Thus, we have shown that a | b implies − a | b. Conversely, a similar argument
shows that − a | b implies a | b (indeed, it is the same argument with the roles
of a and − a swapped, because − (− a) = a). Thus, the statements a | b and
− a | b are equivalent. In other words, we can replace a by − a without changing
the truth (or falsity) of a | b. This proves the first of our above two statements.

The proof of the second statement is similar. (This time, you need to argue
that a | b implies a | −b. Again, write b as b = ac, and conclude that −b =
− ac = a (−c), so that a | −b.)
Thus, both statements are proved, so that the proof of Proposition 3.1.4 (a) is
complete.
(b) Assume that a | b and b ̸= 0. We must show that abs a ≤ abs b.
Let x = abs a and y = abs b. Thus, x is a nonnegative integer and y is a
positive integer (since b ̸= 0). Thus, x ≥ 0 and y > 0.
Proposition 3.1.4 (a) yields that abs a | abs b (since a | b). In other words, x | y
(since x = abs a and y = abs b). In other words, y = xz for some integer z.
Consider this z.
If we had $z \leq 0$, then we would have $y = \underbrace{x}_{\geq 0} \underbrace{z}_{\leq 0} \leq 0$ (by the standard
rules for inequalities), which would contradict $y > 0$. Hence, we cannot have
$z \leq 0$. Thus, $z > 0$, so that $z \geq 1$ (since z is an integer). Hence, $xz \geq x \cdot 1$
(since $x \geq 0$ allows us to multiply any inequality by x without having to flip
the sign). Therefore, $y = xz \geq x \cdot 1 = x$. In other words, $x \leq y$. In other words,
abs a ≤ abs b (since x = abs a and y = abs b). This proves Proposition 3.1.4 (b).
(c) Let a | b and b | a. We must prove that abs a = abs b.
If a = 0, then this is easily done (because if a = 0, then 0 = a | b quickly leads
to b = 0, and therefore a = 0 = b, so that abs a = abs b).
Likewise, this is easily done if b = 0.
It remains to handle the third possible case, which is when both a and b are
̸= 0. Consider this case. In this case, Proposition 3.1.4 (b) yields abs a ≤ abs b
(since a | b and b ̸= 0). However, we can also apply Proposition 3.1.4 (b) with
the roles of a and b interchanged (since b | a and a ̸= 0), and thus obtain abs b ≤
abs a. Combining this with abs a ≤ abs b, we find abs a = abs b. Proposition
3.1.4 (c) is thus proved.
(d) This is quite straightforward:
Assume that a | b. Thus, there exists some integer c such that b = ac (by the
definition of “a | b”). This c must then be $\frac{b}{a}$ (since b = ac implies $c = \frac{b}{a}$ in view
of $a \neq 0$). Hence, $\frac{b}{a}$ is an integer, i.e., we have $\frac{b}{a} \in \mathbb{Z}$.
Forget that we assumed a | b. We thus have shown that $\frac{b}{a} \in \mathbb{Z}$ if a | b. The
same argument (done in reverse) yields that conversely, if $\frac{b}{a} \in \mathbb{Z}$, then a | b.
Combining these two facts, we conclude that a | b if and only if $\frac{b}{a} \in \mathbb{Z}$. This
proves Proposition 3.1.4 (d).
This was a warm-up (if somewhat laborious to write up). Here are some
slightly more substantial properties of divisibility:

Theorem 3.1.5 (rules for divisibility). (a) We have a | a for each a ∈ Z. (This
is called reflexivity of divisibility.)
(b) If a, b, c ∈ Z satisfy a | b and b | c, then a | c. (This is called transitivity
of divisibility.)
(c) If a1 , a2 , b1 , b2 ∈ Z satisfy a1 | b1 and a2 | b2 , then a1 a2 | b1 b2 . (This is
called multiplying two divisibilities.)
(d) If d, a, b ∈ Z satisfy d | a and d | b, then d | a + b. (This is often restated
as “a sum of two multiples of d is again a multiple of d”.)

Proof. (a) Let a ∈ Z. Then, a = a · 1, so that a | a (since 1 is an integer). This


proves Theorem 3.1.5 (a).
(b) Let a, b, c ∈ Z satisfy a | b and b | c.
From a | b, we see that there exists an integer x such that b = ax.
From b | c, we see that there exists an integer y such that c = by.
Consider these integers x and y. Now,

\[ c = \underbrace{b}_{= ax} y = axy. \]

Hence, there exists some integer z such that c = az (namely, z = xy). This
shows that a | c. Theorem 3.1.5 (b) is thus proven.
(c) Let a1 , a2 , b1 , b2 ∈ Z satisfy a1 | b1 and a2 | b2 .
From a1 | b1 , we see that b1 = a1 c1 for some integer c1 .
From a2 | b2 , we see that b2 = a2 c2 for some integer c2 .
Consider these integers c1 and c2 . Now,

\[ \underbrace{b_1}_{= a_1 c_1} \underbrace{b_2}_{= a_2 c_2} = a_1 c_1 a_2 c_2 = (a_1 a_2) \underbrace{(c_1 c_2)}_{\text{an integer}}. \]

Thus, a1 a2 | b1 b2 . This proves Theorem 3.1.5 (c).


(d) Let d, a, b ∈ Z satisfy d | a and d | b.
From d | a, we see that a = dx for some integer x.
From d | b, we see that b = dy for some integer y.
Consider these integers x and y. Now,

\[ a + b = dx + dy = d \underbrace{(x + y)}_{\text{an integer}}. \]

Thus, d | a + b. This proves Theorem 3.1.5 (d).

Theorem 3.1.5 (b) tells us that divisibilities can be chained together: If a | b


and b | c, then a | c. Therefore, you will often see a statement of the form “a | b

and b | c” rewritten as “a | b | c”, just like two inequalities a ≤ b and b ≤ c can


be chained together to form a ≤ b ≤ c. More generally, the statement

“$a_1 \mid a_2 \mid \cdots \mid a_k$”

shall mean that each of the numbers $a_1, a_2, \ldots, a_k$ divides the next (i.e., that
$a_1 \mid a_2$ and $a_2 \mid a_3$ and so on, ending with $a_{k-1} \mid a_k$). By induction on k, it is
easy to see that such a chain of divisibilities always entails $a_1 \mid a_k$. For example,
3 | 6 | 18 | 36 entails 3 | 36.

Exercise 3.1.1. Let a, b ∈ Z satisfy a | b. Prove that $a^k \mid b^k$ for each k ∈ N.

3.1.3. Divisibility criteria


How can you spot divisibilities between actual numbers? For small values of a,
there are several known divisibility criteria (also known as divisibility rules),
which give simple methods to check whether a given integer b is divisible by a
(without computing $\frac{b}{a}$). Here are some:
Theorem 3.1.6. Let b ∈ N. Write b in decimal notation. Then:
(a) We have 2 | b if and only if the last digit of b is 0 or 2 or 4 or 6 or 8.
(b) We have 5 | b if and only if the last digit of b is 0 or 5.
(c) We have 10 | b if and only if the last digit of b is 0.
(d) We have 3 | b if and only if the sum of the digits of b is divisible by 3.
(e) We have 9 | b if and only if the sum of the digits of b is divisible by 9.

Example 3.1.7. Let b = 10835. Then, 2 ∤ b, since the last digit of b is neither
0 nor 2 nor 4 nor 6 nor 8 (but 5). However, 5 | b, since the last digit of b is 0
or 5. Do we have 3 | b ? The sum of the digits of b is 1 + 0 + 8 + 3 + 5 = 17,
which is not divisible by 3. Thus, b is not divisible by 3. Hence, b is not
divisible by 9 either, because if we had 9 | b, then we would get 3 | 9 | b (by
Theorem 3.1.5 (b)), which would contradict the previous sentence.
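
The criteria of Theorem 3.1.6 look only at the decimal digits of b, so they are easy to mimic in code. The following Python sketch (with helper names `digits` and `criteria` of our choosing) applies them to b = 10835 from the example.

```python
def digits(b):
    """Decimal digits of a nonnegative integer, leading digit first."""
    return [int(ch) for ch in str(b)]

def criteria(b):
    """Evaluate the five criteria of Theorem 3.1.6 from the digits of b."""
    ds = digits(b)
    last, total = ds[-1], sum(ds)
    return {
        2: last in (0, 2, 4, 6, 8),
        5: last in (0, 5),
        10: last == 0,
        3: total % 3 == 0,
        9: total % 9 == 0,
    }

b = 10835
print(sum(digits(b)))  # prints: 17
print(criteria(b))
```

For this b, only the entry for 5 comes out true, in agreement with the example.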

How do we prove Theorem 3.1.6?


The easiest part is part (c): If you multiply a number (written in decimal) by
10, then its decimal representation just grows a new digit 0 at the end. Thus,
if 10 | b, then the last digit of b is 0. Conversely, if the last digit of b is 0, then
b = 10b′ , where b′ is the number b with its last digit removed. For example,
390 = 10 · 39.
Parts (a) and (b) of Theorem 3.1.6 are somewhat trickier, and parts (d) and (e)
more so. To get simple proofs for these parts, we will now introduce another
type of relation between integers, known as congruence modulo n.

3.2. Congruence modulo n


3.2.1. Definition

Definition 3.2.1. Let n, a, b ∈ Z. We say that a is congruent to b modulo n if


and only if n | a − b.
We shall use the notation “a ≡ b mod n” for “a is congruent to b modulo
n”.
We shall use the notation “a ̸≡ b mod n” for “a is not congruent to b mod-
ulo n”.

Example 3.2.2. (a) Is 3 ≡ 7 mod 2 ? This would mean that 2 | 3 − 7, which is


true (since 3 − 7 = −4 = 2 · (−2)). So yes, we do have 3 ≡ 7 mod 2.
(b) Is 3 ≡ 6 mod 2 ? This would mean that 2 | 3 − 6, which is false (since
3 − 6 = −3 is not divisible by 2). So we have 3 ̸≡ 6 mod 2.
(c) We have a ≡ b mod 1 for any integers a and b. This is because 1 | a − b
(since 1 divides every integer).
(d) Two integers a and b satisfy a ≡ b mod 0 if and only if a = b (since 0
divides only 0 itself).
(e) For any two integers a and b, we have a + b ≡ a − b mod 2, since
( a + b) − ( a − b) = 2b is clearly divisible by 2.
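
Definition 3.2.1 can be tested on a computer in the same spirit as divisibility. In the Python sketch below (the function name `congruent` is ours), the case n = 0 is treated separately, following part (d) of the example, because Python cannot take remainders modulo 0.

```python
def congruent(a, b, n):
    """Return True if a ≡ b mod n, that is, if n | a - b."""
    difference = a - b
    if n == 0:
        # 0 divides only 0 itself, so a ≡ b mod 0 means a = b.
        return difference == 0
    return difference % n == 0

print(congruent(3, 7, 2))  # prints: True
print(congruent(3, 6, 2))  # prints: False
print(all(congruent(a + b, a - b, 2) for a in range(-5, 6) for b in range(-5, 6)))  # prints: True
```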

The word “modulo” in the phrase “a is congruent to b modulo n” was
invented by Gauss and should be read as something like “with respect to”. You
can translate the statement “a is congruent to b modulo n” as “a equals b up
to a multiple of n”. Indeed, the definition of congruence can be restated as
follows:

a ≡ b mod n if and only if a = b + nc for some c ∈ Z.

As we will soon see, congruence modulo 2 is essentially parity:

• Two even numbers are always congruent (to each other) modulo 2.

• Two odd numbers are always congruent (to each other) modulo 2.

• An even number is never congruent to an odd number modulo 2.

We will prove this in Corollary 3.3.17.

3.2.2. Basic properties


First, we shall establish some fundamental properties of congruence.

Proposition 3.2.3. Let n, a ∈ Z. Then, a ≡ 0 mod n if and only if n | a.

Proof. By the definition of congruence, we have the following equivalences:

( a ≡ 0 mod n) ⇐⇒ (n | a − 0) ⇐⇒ (n | a) .

Proposition 3.2.3 thus follows.

Proposition 3.2.4. Let n ∈ Z. Then:


(a) We have a ≡ a mod n for every a ∈ Z. (This is called the reflexivity of
congruence.)
(b) If a, b ∈ Z satisfy a ≡ b mod n, then b ≡ a mod n. (This is called the
symmetry of congruence.)
(c) If a, b, c ∈ Z satisfy a ≡ b mod n and b ≡ c mod n, then a ≡ c mod n.
(This is called the transitivity of congruence.)
(d) If a1 , a2 , b1 , b2 ∈ Z satisfy

a1 ≡ b1 mod n and a2 ≡ b2 mod n,

then

a1 + a2 ≡ b1 + b2 mod n; (25)
a1 − a2 ≡ b1 − b2 mod n; (26)
a1 a2 ≡ b1 b2 mod n. (27)

(In other words, two congruences modulo n can be added, subtracted or


multiplied.)
(e) Let m ∈ Z be such that m | n. If a, b ∈ Z satisfy a ≡ b mod n, then
a ≡ b mod m.

Proof. (a) Let a ∈ Z. Then, n | a − a because a − a = 0 = n · 0. But this means


that a ≡ a mod n. Thus, Proposition 3.2.4 (a) follows.
(b) Let a, b ∈ Z be such that a ≡ b mod n. Thus, n | a − b.
We must prove that b ≡ a mod n, i.e., that n | b − a.
However, b − a = ( a − b) · (−1), so that a − b | b − a. Hence, n | a − b | b − a.
Therefore, by the transitivity of divisibility, n | b − a. But this means precisely
that b ≡ a mod n. Thus, Proposition 3.2.4 (b) is proved.
(c) Let a, b, c ∈ Z be such that a ≡ b mod n and b ≡ c mod n.
From a ≡ b mod n, we obtain n | a − b.
From b ≡ c mod n, we obtain n | b − c.
Recall that a sum of two multiples of n is again a multiple of n (this is Theo-
rem 3.1.5 (d)). Thus, from n | a − b and n | b − c, we obtain n | ( a − b) + (b − c).

Since ( a − b) + (b − c) = a − c, we can rewrite this as n | a − c. In other words,


a ≡ c mod n. This proves Proposition 3.2.4 (c).
(d) Let a1 , a2 , b1 , b2 ∈ Z satisfy

a1 ≡ b1 mod n and a2 ≡ b2 mod n.

Thus, n | a1 − b1 and n | a2 − b2 .
From n | a1 − b1 , we see that a1 − b1 = nc1 for some integer c1 .
From n | a2 − b2 , we see that a2 − b2 = nc2 for some integer c2 .
Consider these integers c1 and c2 .
From a1 − b1 = nc1 , we obtain a1 = b1 + nc1 . Similarly, a2 = b2 + nc2 .
Adding the equalities a1 = b1 + nc1 and a2 = b2 + nc2 together, we find

a1 + a2 = (b1 + nc1 ) + (b2 + nc2 ) = b1 + b2 + n (c1 + c2 ) .

Thus, a1 + a2 differs from b1 + b2 by a multiple of n (namely, by n (c1 + c2 )). In


other words, n | ( a1 + a2 ) − (b1 + b2 ). Hence,

a1 + a2 ≡ b1 + b2 mod n.

Subtracting the equalities a1 = b1 + nc1 and a2 = b2 + nc2 from one another,


we obtain

a1 − a2 = (b1 + nc1 ) − (b2 + nc2 ) = b1 − b2 + n (c1 − c2 ) .

Thus, a1 − a2 differs from b1 − b2 by a multiple of n (namely, by n (c1 − c2 )).


Hence,
a1 − a2 ≡ b1 − b2 mod n.
Multiplying the equalities a1 = b1 + nc1 and a2 = b2 + nc2 together, we find

a1 a2 = (b1 + nc1 ) (b2 + nc2 ) = b1 b2 + b1 nc2 + nc1 b2 + nc1 nc2


= b1 b2 + n (b1 c2 + c1 b2 + nc1 c2 ) .

Thus, a1 a2 differs from b1 b2 by a multiple of n (namely, by n (b1 c2 + c1 b2 + nc1 c2 )).


Therefore,
a1 a2 ≡ b1 b2 mod n.
Altogether, we have proved all claims of Proposition 3.2.4 (d) now.
(e) Let m ∈ Z be such that m | n. Let a, b ∈ Z satisfy a ≡ b mod n.
Thus, n | a − b. Hence, m | n | a − b, so that m | a − b (by the transitivity
of divisibility). But this means that a ≡ b mod m. Thus, Proposition 3.2.4 (e)
follows.

Proposition 3.2.4 (b) says that congruences can be turned around: From
a ≡ b mod n, we can always obtain b ≡ a mod n. (This is very different from
divisibilities, for which a | b almost never implies b | a.)
Proposition 3.2.4 (c) says that congruences can be chained together: From
a ≡ b mod n and b ≡ c mod n, we can always obtain a ≡ c mod n. This is
analogous to Theorem 3.1.5 (b), and leads to a similar convention: Instead
of writing “a ≡ b mod n and b ≡ c mod n”, we will often just write “a ≡ b ≡
c mod n”, understanding that (by Proposition 3.2.4 (c)) this chain of congruences
automatically implies a ≡ c mod n. More generally, the statement
“$a_1 \equiv a_2 \equiv \cdots \equiv a_k \bmod n$”
shall mean that each of the numbers $a_1, a_2, \ldots, a_k$ is congruent to the next modulo
n (i.e., that $a_i \equiv a_{i+1} \bmod n$ for each $i \in \{1, 2, \ldots, k-1\}$). By induction on k,
it is easy to see that such a chain of congruences always entails $a_1 \equiv a_k \bmod n$
(and, better yet: $a_i \equiv a_j \bmod n$ for all i and j).
Note that we can only chain together two congruences modulo the same n,
not two congruences modulo two different n’s. For example, if we know that
a ≡ b mod 2 and b ≡ c mod 3, then we cannot conclude any congruence between
a and c.
Proposition 3.2.4 (d) says that congruences modulo n (for a fixed integer n)
can be added, subtracted and multiplied together (just like equalities). Before
you get over-enthusiastic, keep in mind that
• they cannot be divided by one another: We have $2 \equiv 0 \bmod 2$ and
$2 \equiv 2 \bmod 2$ but $2/2 \not\equiv 0/2 \bmod 2$.
• they cannot be taken to each other’s power: We have $2 \equiv 2 \bmod 2$ and
$2 \equiv 0 \bmod 2$ but $2^2 \not\equiv 2^0 \bmod 2$.
However, we can take a congruence to a k-th power for a fixed k ∈ N:
Exercise 3.2.1. Let n, a, b ∈ Z be such that a ≡ b mod n. Let k ∈ N. Prove
that $a^k \equiv b^k \bmod n$.
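
Both observations can be explored numerically. The Python sketch below (with a helper `congruent` of our own, restricted to n ≠ 0) first reproduces the failure for exponents and then checks the claim of Exercise 3.2.1 on a small range of integers. Such a check is of course no substitute for the proof that the exercise asks for.

```python
def congruent(a, b, n):
    """Return True if a ≡ b mod n, for a nonzero modulus n."""
    return (a - b) % n == 0

# Replacing the exponent by a congruent one can fail:
# 2 ≡ 0 mod 2, but 2^2 = 4 and 2^0 = 1 are not congruent modulo 2.
exponent_swap_fails = not congruent(2**2, 2**0, 2)

# Raising both sides to the same fixed power k is fine (in this small range).
power_ok = all(
    congruent(a**k, b**k, n)
    for n in range(1, 8)
    for a in range(-10, 11)
    for b in range(-10, 11)
    if congruent(a, b, n)
    for k in range(0, 6)
)
print(exponent_swap_fails, power_ok)  # prints: True True
```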
Proposition 3.2.4 (e) shows that the n in a congruence a ≡ b mod n can be
replaced by any divisor of n. For example, if two integers a and b satisfy
a ≡ b mod 15, then a ≡ b mod 3, since 3 is a divisor of 15.
The next exercise shows that we can divide a congruence a ≡ b mod n by a
nonzero integer d as long as we divide all three numbers in it (a, b and n) by d
(rather than just a and b):
Exercise 3.2.2. Let n, d, a, b ∈ Z, and assume that d ̸= 0 and da ≡ db mod dn.
(a) Prove that a ≡ b mod n.
(b) Show by an example that a ≡ b mod dn is not necessarily true (i.e., we
cannot simply cancel the d from da and db while leaving the dn unchanged).

3.2.3. Proving the divisibility criteria


Now, let us prove Theorem 3.1.6 (e), restating it as follows:

Proposition 3.2.5. Let m ∈ N. Let s be the sum of the digits of m written in


decimal. (For instance, if m = 302, then s = 3 + 0 + 2 = 5.)
Then, 9 | m if and only if 9 | s.

Proof. Let the integer m have decimal representation $m_d m_{d-1} \cdots m_0$ (where $m_d$
is the leading digit). Thus,
\[ m = m_d \cdot 10^d + m_{d-1} \cdot 10^{d-1} + \cdots + m_0 \cdot 10^0 \qquad \text{and} \qquad s = m_d + m_{d-1} + \cdots + m_0. \]
However, $10 \equiv 1 \bmod 9$ (since 10 − 1 = 9 is divisible by 9). Hence, by Exercise
3.2.1, we have $10^k \equiv 1^k \bmod 9$ for every $k \in \{0, 1, \ldots, d\}$. Multiplying this
congruence with the obvious congruence $m_k \equiv m_k \bmod 9$, we obtain
\[ m_k \cdot 10^k \equiv m_k \cdot 1^k \bmod 9 \qquad \text{for every } k \in \{0, 1, \ldots, d\}. \]
(The reason why we can multiply two congruences together is Proposition 3.2.4 (d),
specifically (27).) In other words,
\[ m_k \cdot 10^k \equiv m_k \bmod 9 \qquad \text{for every } k \in \{0, 1, \ldots, d\} \]
(since $m_k \cdot \underbrace{1^k}_{=1} = m_k$). In other words, we have
\begin{align*}
m_d \cdot 10^d &\equiv m_d \bmod 9; \\
m_{d-1} \cdot 10^{d-1} &\equiv m_{d-1} \bmod 9; \\
m_{d-2} \cdot 10^{d-2} &\equiv m_{d-2} \bmod 9; \\
&\;\;\vdots \\
m_0 \cdot 10^0 &\equiv m_0 \bmod 9.
\end{align*}
Adding these d + 1 many congruences together, we obtain
\[ m_d \cdot 10^d + m_{d-1} \cdot 10^{d-1} + \cdots + m_0 \cdot 10^0 \equiv m_d + m_{d-1} + \cdots + m_0 \bmod 9. \]
(The reason why we can add two congruences together is Proposition 3.2.4 (d),
specifically (25). To be very pedantic, we have to apply (25) several times, since
we are adding not two but d + 1 many congruences together.) In other words,
\[ m \equiv s \bmod 9 \]
(since $m = m_d \cdot 10^d + m_{d-1} \cdot 10^{d-1} + \cdots + m_0 \cdot 10^0$ and $s = m_d + m_{d-1} + \cdots + m_0$).
Turning this congruence around (i.e., applying Proposition 3.2.4 (b)), we
obtain $s \equiv m \bmod 9$.
Now, if 9 | m, then m ≡ 0 mod 9 (by Proposition 3.2.3), whence s ≡ m ≡
0 mod 9 (here we are tacitly using Proposition 3.2.4 (c)), which entails 9 | s
(again by Proposition 3.2.3). Thus, we have shown that if 9 | m, then 9 | s.
Conversely, if 9 | s, then s ≡ 0 mod 9 (by Proposition 3.2.3), whence m ≡
s ≡ 0 mod 9, which in turn entails 9 | m (by Proposition 3.2.3). Thus, we have
shown that if 9 | s, then 9 | m.
Now we have proved that each of the statements 9 | m and 9 | s implies
the other. In other words, we have 9 | m if and only if 9 | s. This proves the
proposition.
In other words, Theorem 3.1.6 (e) is proven. A similar argument (with 9
replaced by 3) can be used to prove Theorem 3.1.6 (d). In fact, s ≡ m mod 9
entails s ≡ m mod 3 by Proposition 3.2.4 (e), because 3 | 9.
Parts (a) and (b) of Theorem 3.1.6 can be proved along similar lines, but are
in fact easier. Indeed, if m ∈ N has decimal representation m_d m_{d−1} · · · m_0,
then m ≡ m_0 mod 10 (since the number m − m_0 has decimal representation
m_d m_{d−1} · · · m_1 0 and thus is divisible by 10), and therefore (by Proposition 3.2.4
(e)) we have m ≡ m_0 mod 2 and m ≡ m_0 mod 5 as well.
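The congruences used in these proofs can be spot-checked numerically. The following is only a sanity check on small numbers, not a substitute for the proofs; the helper name digit_sum is a hypothetical one chosen for this sketch, and Python's % operator is used to test divisibility by a positive integer.

```python
def digit_sum(m: int) -> int:
    """Sum of the decimal digits of a nonnegative integer m (hypothetical helper)."""
    return sum(int(ch) for ch in str(m))


for m in range(10_000):
    s = digit_sum(m)
    last_digit = int(str(m)[-1])
    # m ≡ s mod 9, i.e. 9 divides m - s.
    assert (m - s) % 9 == 0
    # Consequently, 9 | m if and only if 9 | s, and likewise for 3.
    assert (m % 9 == 0) == (s % 9 == 0)
    assert (m % 3 == 0) == (s % 3 == 0)
    # m ≡ m_0 mod 10, where m_0 is the last digit of m.
    assert (m - last_digit) % 10 == 0
```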
Exercise 3.2.3. Let m be a positive integer, and let m_d m_{d−1} · · · m_0 be its decimal
representation, so that m_0, m_1, . . . , m_d are digits satisfying

m = m_d · 10^d + m_{d−1} · 10^{d−1} + · · · + m_0 · 10^0 = ∑_{k=0}^{d} m_k · 10^k.

Let a be the alternating sum of digits of m; this is defined by

a := m_0 − m_1 + m_2 − m_3 ± · · · + (−1)^d m_d = ∑_{k=0}^{d} (−1)^k m_k.

Prove that 11 | m if and only if 11 | a. (This is the classical divisibility test for
divisibility by 11.)

3.3. Division with remainder


3.3.1. The theorem
What comes next is the most fundamental theorem of number theory:
Theorem 3.3.1 (division-with-remainder theorem). Let n be an integer. Let d
be a positive integer. Then, there exists a unique pair (q, r ) of integers
q∈Z and r ∈ {0, 1, . . . , d − 1}

such that
n = qd + r.

We will prove this soon. First, let us introduce some notations:

Definition 3.3.2. Let n be an integer. Let d be a positive integer. Let (q, r ) be


the pair whose existence and uniqueness is claimed in Theorem 3.3.1. Then:

• The number q is called the quotient of the division of n by d, and will


be denoted by n//d.

• The number r is called the remainder of the division of n by d, and will


be denoted by n%d.

• The pair (q, r ) is called the quo-rem pair of n and d.

For now, of course, we do not yet know that these q and r exist and are
unique (because we haven’t proved the theorem yet). Thus, we will take care to
speak of “a quotient”, “a remainder” and “a quo-rem pair”, never taking their
existence and uniqueness for granted until we have proved it.

Example 3.3.3. What are 8//5 and 8%5 ? We have

8 = 1 · 5 + 3 (here, n = 8, q = 1, d = 5 and r = 3 ∈ {0, 1, 2, 3, 4}),

so 8//5 = 1 and 8%5 = 3. (This is taking the uniqueness of 8//5 and 8%5
for granted, but we will prove this soon.)

Example 3.3.4. What are 19//5 and 19%5 ? We have 19 = 3 · 5 + 4, so


19//5 = 3 and 19%5 = 4.

Example 3.3.5. What are (−7) //5 and (−7) %5 ? We have

−7 = (−2) · 5 + 3 (here, n = −7, q = −2, d = 5 and r = 3 ∈ {0, 1, 2, 3, 4}),

so (−7) //5 = −2 and (−7) %5 = 3.
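The three examples above can be reproduced in Python, whose built-in operators // and % agree with the notation of Definition 3.3.2 whenever d is positive. This is a small sketch; the wrapper name quo_rem is hypothetical and only serves to reject non-positive d.

```python
def quo_rem(n: int, d: int) -> tuple[int, int]:
    """Return the pair (n//d, n%d) for a positive integer d (hypothetical wrapper)."""
    if d <= 0:
        raise ValueError("d must be a positive integer")
    return divmod(n, d)  # divmod(n, d) == (n // d, n % d)


for n, d in [(8, 5), (19, 5), (-7, 5)]:
    q, r = quo_rem(n, d)
    # The defining properties of a quo-rem pair:
    assert n == q * d + r
    assert 0 <= r <= d - 1
    print(f"{n} = {q} * {d} + {r}")
```

Note that the remainder for n = −7 is 3, not −2. Some other languages (for example C and Java) round integer quotients toward zero instead, so their built-in operators give the pair (−1, −2) here, which is not a quo-rem pair in the sense of this text.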

So Theorem 3.3.1 is saying that for any integer n and any positive integer d,
there is a unique quo-rem pair of n and d. Let us now prove this.

3.3.2. The proof


Proof of Theorem 3.3.1. We need to prove two things: that a quo-rem pair of n
and d exists, and that it is unique. Let me prove the uniqueness part first.

Proof of the uniqueness part: Fix an integer n and a positive integer d. We must
show that there is at most one quo-rem pair (q, r ) of n and d. In other words,
we must show that there are no two distinct quo-rem pairs of n and d.
We shall prove this by contradiction. So we assume that (q1 , r1 ) and (q2 , r2 )
are two distinct quo-rem pairs of n and d. We want to derive a contradiction.
Since (q1 , r1 ) is a quo-rem pair of n and d, we have

q1 ∈ Z and r1 ∈ {0, 1, . . . , d − 1} and n = q1 d + r1 .

Since (q2 , r2 ) is a quo-rem pair of n and d, we have

q2 ∈ Z and r2 ∈ {0, 1, . . . , d − 1} and n = q2 d + r2 .

Subtracting the equation n = q2 d + r2 from n = q1 d + r1 , we find

0 = (q1 d + r1 ) − (q2 d + r2 ) = (r1 − r2 ) − (q2 d − q1 d) = (r1 − r2 ) − (q2 − q1 ) d.

In other words,
r1 − r2 = (q2 − q1 ) d. (28)
We are in one of the following three cases:
Case 1: We have q1 < q2 .
Case 2: We have q1 = q2 .
Case 3: We have q1 > q2 .
Let us first consider Case 1. In this case, we have q1 < q2 , so that q2 − q1 >
0. Since q2 − q1 is an integer, this entails that q2 − q1 ≥ 1. We can multiply
this inequality by d (since d > 0), thus obtaining (q2 − q1 ) d ≥ 1d = d. In
view of (28), we can rewrite this as r1 − r2 ≥ d. However, r1 ≤ d − 1 (since
r1 ∈ {0, 1, . . . , d − 1}) and r2 ≥ 0 (since r2 ∈ {0, 1, . . . , d − 1}). Hence,
r1 − r2 ≤ r1 (since r2 ≥ 0), so that r1 − r2 ≤ r1 ≤ d − 1 < d. This contradicts
r1 − r2 ≥ d. Thus, we have found a contradiction in Case 1.
Let us next consider Case 2. In this case, we have q1 = q2. Hence, we can
rewrite (28) as r1 − r2 = (q2 − q2) d = 0 (since q2 − q2 = 0), so that r1 = r2.
Combining q1 = q2 with r1 = r2, we obtain (q1, r1) = (q2, r2), which contradicts
our assumption that the two quo-rem pairs (q1, r1) and (q2, r2) are distinct.
Thus, we have found a contradiction in Case 2.
Finally, in Case 3, we have q1 > q2 and therefore q2 < q1 . Thus, Case 3 is just
a copy of Case 1 with the roles of the two pairs (q1 , r1 ) and (q2 , r2 ) switched
(since the two quo-rem pairs (q1 , r1 ) and (q2 , r2 ) are playing identical roles).
Hence, we obtain a contradiction in Case 3 (since we obtained one in Case 1).
We have now obtained contradictions in all three Cases 1, 2 and 3. Thus,
we always have a contradiction. Hence, our assumption was wrong. This
completes our proof of the uniqueness of the quo-rem pair of n and d.

Now, let us come to the existence part. It is reasonable to try induction, but
there is a hurdle: Induction on d does not work (there is no good way to use the
induction hypothesis), whereas induction on n cannot be used as long as n can
be negative. Fortunately, the latter hurdle is surmountable. One way around it
is to first prove the existence of a quo-rem pair in the case when n ∈ N (that is,
n ≥ 0), and afterwards generalize this result to arbitrary integers n.
So let us prove the n ∈ N case:
Lemma 3.3.6. Let n ∈ N, and let d be a positive integer. Then, there exists a
quo-rem pair of n and d.

Proof of Lemma 3.3.6. Fix d. We apply strong induction on n:


Induction step (see footnote 23): Let n ∈ N. Assume (as the induction hypothesis) that
Lemma 3.3.6 is proved for all nonnegative integers smaller than n instead of
n. In other words, assume that for each nonnegative integer k < n, there exists
a quo-rem pair of k and d. We must prove that Lemma 3.3.6 also holds for n,
i.e., that there exists a quo-rem pair of n and d.
If n < d, then such a pair can be explicitly constructed: it is (0, n). (Indeed,
n = 0d + n and n ∈ {0, 1, . . . , d − 1}).
Otherwise, we have n ≥ d, so that n − d ∈ N. Thus, we can apply the
induction hypothesis to n − d instead of n (since n − d < n). We conclude that
there exists a quo-rem pair of n − d and d. We denote this pair by (q, r ). Then,
I claim that (q + 1, r ) is a quo-rem pair of n and d. Indeed, since (q, r ) is a
quo-rem pair of n − d and d, we have

n − d = qd + r.

Thus,
n = (qd + r ) + d = qd + d + r = (q + 1) d + r,
which shows that (q + 1, r ) is a quo-rem pair of n and d (since r ∈ {0, 1, . . . , d − 1}).
Thus, there exists a quo-rem pair of n and d. This completes our induction step,
and thus Lemma 3.3.6 is proved.
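The induction step above is constructive, and it can be read as a recursive procedure. The following sketch mirrors it directly; the function name quo_rem_nonneg is hypothetical. It is meant as an illustration for small inputs only, since it performs one recursive call per subtraction of d.

```python
def quo_rem_nonneg(n: int, d: int) -> tuple[int, int]:
    """Find a quo-rem pair of n and d for n >= 0 and d > 0,
    following the strong induction in the proof of Lemma 3.3.6."""
    if n < 0 or d <= 0:
        raise ValueError("need n >= 0 and d > 0")
    if n < d:
        # Here n = 0*d + n with n in {0, 1, ..., d-1}.
        return (0, n)
    # Otherwise n - d is a smaller nonnegative integer, so the
    # "induction hypothesis" (the recursive call) applies to it.
    q, r = quo_rem_nonneg(n - d, d)
    # From n - d = q*d + r we get n = (q+1)*d + r.
    return (q + 1, r)


print(quo_rem_nonneg(19, 5))  # (3, 4)
```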
We now return to proving Theorem 3.3.1. We have shown that

• there is always at most one quo-rem pair of n and d, and

• there is at least one quo-rem pair of n and d if n ∈ N.

What remains to be done is proving that there is at least one quo-rem pair of
n and d if n < 0.
This can be done in several ways. One way is to proceed similarly to the
proof of Lemma 3.3.6, but using strong induction on −n.

23 Recall that a strong induction needs no base case (see Subsection 1.9.4).

Alternatively, there is a slicker argument: We can reduce the “negative n”


case to the “nonnegative n” case (which is already covered by Lemma 3.3.6).
Namely, let n ∈ Z be negative. Then, the product (1 − d) n is nonnegative
(since both factors 1 − d and n are ≤ 0), so we can apply Lemma 3.3.6 to
(1 − d) n instead of n. Thus, we conclude that there exists a quo-rem pair (q, r )
of (1 − d) n and d. This pair (q, r ) satisfies

(1 − d) n = qd + r

(by the definition of a quo-rem pair). In other words,

n − dn = qd + r.

Hence,
n = dn + qd + r = (n + q) d + r.
This shows that (n + q, r ) is a quo-rem pair of n and d. Hence, such a quo-rem
pair exists. Hence, we have proved the existence of a quo-rem pair in the case
when n is negative. This completes our proof of Theorem 3.3.1.
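The reduction from negative n to nonnegative n can also be written out as code. In this sketch (with the hypothetical name quo_rem_any), the built-in divmod is applied only to nonnegative numbers, standing in for Lemma 3.3.6; the negative case is handled exactly as in the argument above.

```python
def quo_rem_any(n: int, d: int) -> tuple[int, int]:
    """Quo-rem pair of an arbitrary integer n and a positive integer d,
    using division with remainder only on nonnegative numbers."""
    if d <= 0:
        raise ValueError("d must be a positive integer")
    if n >= 0:
        return divmod(n, d)
    # n < 0 and 1 - d <= 0, so the product (1 - d) * n is nonnegative.
    q, r = divmod((1 - d) * n, d)
    # (1 - d) * n = q*d + r  implies  n = (n + q)*d + r.
    return (n + q, r)


print(quo_rem_any(-7, 5))  # (-2, 3)
```

For n = −7 and d = 5, the product (1 − d) n is 28 = 5 · 5 + 3, so the resulting pair is (−7 + 5, 3) = (−2, 3), in agreement with Example 3.3.5.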

3.3.3. An application: even and odd integers


We shall now use this theorem to derive some basic properties of even and odd
numbers. Recall what these words mean:

Definition 3.3.7. (a) An integer n is said to be even if 2 | n.


(b) An integer n is said to be odd if 2 ∤ n.

In other words, an integer is called even if it is divisible by 2, and is called


odd if it is not even.
Now we shall show the following:

Proposition 3.3.8. Let n be an integer.


(a) The integer n is even if and only if there exists some k ∈ Z such that
n = 2k.
(b) The integer n is odd if and only if there exists some k ∈ Z such that
n = 2k + 1.

Proof. Part (a) is a direct consequence of the definition of divisibility. But part
(b) is not!
So let us prove part (b). This is an “if and only if” statement, so we need to
prove both directions:

(n is odd) =⇒ (there exists some k ∈ Z such that n = 2k + 1)



and

(there exists some k ∈ Z such that n = 2k + 1) =⇒ (n is odd) .

For the sake of brevity, I shall refer to these two directions as the “=⇒” and
“⇐=” directions (respectively).
Proof of the “=⇒” direction: Assume that n is odd. By Theorem 3.3.1, there
exists a quo-rem pair (q, r ) of n and 2. Consider this (q, r ). By the definition of
a quo-rem pair, this pair satisfies

q∈Z and r ∈ {0, 1} and n = 2q + r.

If r were 0, then we would thus get n = 2q + r = 2q (since r = 0), which would show
that n is even; but this is impossible because n is odd. Therefore, we must have
r ≠ 0, so that r = 1 (since r ∈ {0, 1}). Thus, n = 2q + r = 2q + 1 (since r = 1). Hence,
there exists some k ∈ Z such that n = 2k + 1 (namely, k = q). Thus we have
shown the “=⇒” direction.
Proof of the “⇐=” direction: Assume that there exists some k ∈ Z such that
n = 2k + 1. Consider this k.
We must show that n is odd. This means showing that 2 ∤ n. This means
proving that n cannot be written as 2c for an integer c.
To prove this, we assume the contrary. That is, we assume that n = 2c for
some integer c. Consider this c.
Now, the two pairs (k, 1) and (c, 0) both are quo-rem pairs of n and 2, because
we have n = 2k + 1 and n = 2c = 2c + 0 (and 1 and 0 belong to {0, 1}). However,
Theorem 3.3.1 says that the quo-rem pair of n and 2 is unique, so these two pairs
(k, 1) and (c, 0) must be identical. But this is absurd, since their second entries
1 and 0 are different. So we find a contradiction. This concludes our proof that
n is odd. Thus, we have shown the “⇐=” direction of Proposition 3.3.8 (b).
This completes the proof of Proposition 3.3.8 (b) (since both directions are
proved).

Corollary 3.3.9. (a) The sum of any two even integers is even.
(b) The sum of any even integer with any odd integer is odd.
(c) The sum of any two odd integers is even.

Proof. We will only prove part (c), since the other two parts are analogous (and
even simpler).
(c) Let a and b be two odd integers. We must prove that a + b is even.
The integer a is odd. Hence, Proposition 3.3.8 (b) shows that we can write a
as a = 2k + 1 for some integer k.
Similarly, we can write b as b = 2ℓ + 1 for some integer ℓ.

Consider these k and ℓ. Now, from a = 2k + 1 and b = 2ℓ + 1, we obtain

a + b = (2k + 1) + (2ℓ + 1) = 2k + 2ℓ + 2 = 2 (k + ℓ + 1) ,

which is clearly even. This proves Corollary 3.3.9 (c).

Remark 3.3.10. Corollary 3.3.9 (c) is a property specific to the number 2. For
example, it is not true that the sum of any two integers not divisible by 3 is
divisible by 3 (for instance, 1 + 1 = 2 is not divisible by 3).

3.3.4. Basic properties of quotients and remainders


Here are some elementary facts about quotients and remainders:

Proposition 3.3.11. Let n ∈ Z, and let d be a positive integer. Then:


(a) We have n%d ∈ {0, 1, . . . , d − 1} and n%d ≡ n mod d.
(b) We have d | n if and only if n%d = 0.
(c) If c ∈ {0, 1, . . . , d − 1} satisfies c ≡ n mod d, then c = n%d.
(d) We have n = (n//d) d + (n%d).
(e) If n ∈ N, then n//d ∈ N.

Note that part (a) of this proposition can be restated as follows: The remain-
der n%d is an element of {0, 1, . . . , d − 1} that is congruent to n modulo d. Part
(c) says that, conversely, any element c of {0, 1, . . . , d − 1} that is congruent to n
modulo d must be this remainder n%d. Thus, together, these two parts uniquely
characterize the remainder n%d as the only element of {0, 1, . . . , d − 1} that is
congruent to n modulo d. This characterization is good to keep in mind, as it
describes the remainder independently of the quotient.
Proof of Proposition 3.3.11. We set

q := n//d and r := n%d.

Thus, (q, r) is a quo-rem pair of n and d (by the definition of a quo-rem pair).
In other words, we have n = qd + r and q ∈ Z and r ∈ {0, 1, . . . , d − 1}. We can
now prove all five parts of the proposition:
(d) We have n = qd + r = (n//d) d + (n%d) (since q = n//d and r = n%d). This
proves Proposition 3.3.11 (d).
(a) We have n%d = r ∈ {0, 1, . . . , d − 1}. Moreover, from n = qd + r, we
obtain r − n = r − (qd + r ) = −qd, which is clearly divisible by d. Hence,
d | r − n. Equivalently, r ≡ n mod d. In other words, n%d ≡ n mod d (since

r = n%d). Thus, Proposition 3.3.11 (a) is proved (since we have shown that
n%d ∈ {0, 1, . . . , d − 1} as well).
(c) Let c ∈ {0, 1, . . . , d − 1} satisfy c ≡ n mod d. We must show that c = n%d.
From c ≡ n mod d, we obtain d | c − n. In other words, c − n = de for
some e ∈ Z. Consider this e. From c − n = de, we obtain c = n + de, so that
n = c − de = (−e) d + c. This (combined with c ∈ {0, 1, . . . , d − 1}) shows that
(−e, c) is a quo-rem pair of n and d. However, (q, r ) is also a quo-rem pair of
n and d (by its definition). Since there is only one quo-rem pair of n and d (by
Theorem 3.3.1), this shows that (−e, c) = (q, r ). Hence, c = r = n%d. This
proves Proposition 3.3.11 (c).
(b) Again, this is an “if and only if” statement, and we shall prove its “=⇒”
and “⇐=” directions separately:
=⇒: Assume that d | n. We must prove that n%d = 0. In other words, we
must prove that r = 0.
Indeed, d | n yields that n ≡ 0 mod d (by Proposition 3.2.3). In other words,
0 ≡ n mod d. Since we furthermore have 0 ∈ {0, 1, . . . , d − 1}, we can thus
apply Proposition 3.3.11 (c) to c = 0, and conclude that 0 = n%d. In other
words, n%d = 0. This proves the “=⇒” direction (i.e., it proves that if d | n,
then n%d = 0).
⇐=: If n%d = 0, then d | n because

n = qd + r = qd (since r = n%d = 0).

This proves the “⇐=” direction. Thus, both directions are proved, so that
Proposition 3.3.11 (b) holds.
(e) Assume that n ∈ N. Recall that r ∈ {0, 1, . . . , d − 1}, so that r ≤ d − 1 < d.
But n = qd + r, so that qd + r = n ≥ 0 (since n ∈ N). In other words, qd ≥
−r > −d (since r < d).
If we had q < 0, then we would have q ≤ −1 (since q is an integer) and
therefore qd ≤ (−1) d (since we can multiply the inequality q ≤ −1 by the
positive number d); but this would contradict qd > −d = (−1) d. Hence, we
cannot have q < 0. Thus, q ≥ 0, so that q ∈ N. In other words, n//d ∈ N
(since q = n//d). This proves Proposition 3.3.11 (e).
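The five parts of Proposition 3.3.11 can be sanity-checked by brute force on a small range of inputs. This is only a numerical illustration under the assumption that Python's // and % compute n//d and n%d for positive d; the helper names divides and check_proposition are hypothetical. To avoid circularity, divisibility is tested directly from its definition (by searching for a cofactor k) rather than via %.

```python
def divides(d: int, n: int) -> bool:
    """True if n = d*k for some integer k (found by brute-force search)."""
    return any(d * k == n for k in range(-abs(n), abs(n) + 1))


def check_proposition(n: int, d: int) -> bool:
    """Check parts (a)-(e) of Proposition 3.3.11 for one pair (n, d) with d > 0."""
    q, r = n // d, n % d
    part_a = r in range(d) and divides(d, r - n)   # r in {0,...,d-1} and r ≡ n mod d
    part_b = divides(d, n) == (r == 0)             # d | n if and only if n%d = 0
    part_c = all(c == r for c in range(d) if divides(d, c - n))
    part_d = n == q * d + r
    part_e = q >= 0 if n >= 0 else True            # n in N implies n//d in N
    return part_a and part_b and part_c and part_d and part_e


assert all(check_proposition(n, d) for n in range(-30, 31) for d in range(1, 8))
```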

Corollary 3.3.12. Let n ∈ Z. Then:


(a) The integer n is even if and only if n%2 = 0.
(b) The integer n is odd if and only if n%2 = 1.

Proof. (a) We have the following chain of logical equivalences:

(n is even) ⇐⇒ (2 | n) (by the definition of “even”)


⇐⇒ (n%2 = 0) (by Proposition 3.3.11 (b), applied to d = 2) .

Hence, Corollary 3.3.12 (a) is proved.


(b) Proposition 3.3.11 (a) (applied to d = 2) yields n%2 ∈ {0, 1, . . . , 2 − 1} and
n%2 ≡ n mod 2. Thus, n%2 ∈ {0, 1, . . . , 2 − 1} = {0, 1}. Hence, n%2 is either 0
or 1. Thus, n%2 ≠ 0 holds if and only if n%2 = 1. In other words, we have the
logical equivalence (n%2 ≠ 0) ⇐⇒ (n%2 = 1).
However, we have the following chain of logical equivalences:

(n is odd) ⇐⇒ (2 ∤ n) (by the definition of “odd”)
⇐⇒ (not 2 | n)
⇐⇒ (not n%2 = 0)
(since Proposition 3.3.11 (b) (applied to d = 2) yields that 2 | n holds if and only if n%2 = 0)
⇐⇒ (n%2 ≠ 0)
⇐⇒ (n%2 = 1).

This proves Corollary 3.3.12 (b).


Quotients and remainders are closely connected to the so-called floor func-
tion:

Definition 3.3.13. The integer part (aka floor) of a real number x is defined
to be the largest integer that is ≤ x. It is denoted by ⌊ x ⌋.

For example,

⌊3.8⌋ = 3, ⌊4.2⌋ = 4, ⌊5⌋ = 5, ⌊√2⌋ = 1,
⌊π⌋ = 3, ⌊0.5⌋ = 0, ⌊−1.2⌋ = −2

(make sure you understand the last example! −1 is not ≤ −1.2, but −2 is).
Now, here is the connection to quotients and remainders:

Proposition 3.3.14 (“explicit formulas” for quotient and remainder). Let n ∈ Z,
and let d be a positive integer. Then,

n//d = ⌊n/d⌋ and n%d = n − d · ⌊n/d⌋.

Proof. Proposition 3.3.11 (a) yields n%d ∈ {0, 1, . . . , d − 1}. Hence, n%d ≥ 0
and n%d ≤ d − 1 < d.
Proposition 3.3.11 (d) yields n = (n//d) d + (n%d). Thus,

n = (n//d) d + (n%d) < (n//d) d + d = ((n//d) + 1) d (since n%d < d).

Dividing both sides of this inequality by d (we can do this, since d > 0), we
obtain n/d < (n//d) + 1. In other words, (n//d) + 1 > n/d.
On the other hand,

n = (n//d) d + (n%d) ≥ (n//d) d (since n%d ≥ 0).

Dividing both sides of this inequality by d (we can do this, since d > 0), we
obtain n/d ≥ n//d.
Now, the integer n//d is ≤ n/d (since n/d ≥ n//d), but the next-larger integer
(n//d) + 1 is not (since (n//d) + 1 > n/d). Thus, n//d is the largest integer that
is ≤ n/d. In other words, n//d = ⌊n/d⌋ (by the definition of the floor ⌊n/d⌋).
Solving the equation n = (n//d) d + (n%d) for n%d, we find

n%d = n − (n//d) d = n − ⌊n/d⌋ d = n − d · ⌊n/d⌋ (since n//d = ⌊n/d⌋).
Thus, Proposition 3.3.14 is proved.
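Both formulas of Proposition 3.3.14 can be checked numerically. The sketch below uses exact rational arithmetic (Python's fractions.Fraction) to compute ⌊n/d⌋, so that no floating-point rounding interferes; the helper name floor_of_quotient is hypothetical.

```python
import math
from fractions import Fraction


def floor_of_quotient(n: int, d: int) -> int:
    """Compute the floor of the exact rational number n/d."""
    return math.floor(Fraction(n, d))


for n in range(-50, 51):
    for d in range(1, 10):
        f = floor_of_quotient(n, d)
        assert n // d == f          # n//d = floor(n/d)
        assert n % d == n - d * f   # n%d = n - d * floor(n/d)

# The floor rounds down, not toward zero:
assert math.floor(-1.2) == -2
assert int(-1.2) == -1  # truncation toward zero gives a different answer
```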
Division with remainder is one of the most fundamental facts about integers;
almost all of number theory is downstream of it. Here are some applications:

Exercise 3.3.1. Let n be any integer. Prove the following:


(a) If n is odd, then 8 | n^2 − 1.
(b) If 3 ∤ n, then 3 | n^2 − 1.
[Hint: In part (a), write n as 2k + 1. In part (b), write n as q · 3 + r and
consider the possible values for r.]

Exercise 3.3.2. Let p be a positive integer.


Assume that you are given p-cent coins and ( p + 1)-cent coins (each in
infinite supply).
Prove that you can pay n cents using these coins for every integer n ≥ p^2 − p.
In other words, prove that each integer n ≥ p^2 − p can be written as
a (p + 1) + bp with a, b ∈ N.

3.3.5. Base-b representation of nonnegative integers


Division with remainder is the main ingredient in a feature of integers that you
may well be taking for granted, but that actually needs to be proved: the fact that every

integer can be uniquely expressed in decimal notation, or, more generally, in


base-b notation for any given integer b > 1.
What does this mean? For example,
3401 = 3 · 1000 + 4 · 100 + 0 · 10 + 1 · 1
= 3 · 10^3 + 4 · 10^2 + 0 · 10^1 + 1 · 10^0.
Thus, we have written the fairly large number 3401 as a pretty short sum of
powers of 10, with the coefficients being integers between 0 and 9 (commonly
known as “digits”).
This can be done for any nonnegative integer n instead of 3401. This can also
be done with any fixed integer b > 1 instead of 10, except that the coefficients
(“generalized digits”) will then be integers between 0 and b − 1. This is called
the “base-b representation” of the integer n.
For instance, let us find the base-4 representation of the integer 3401: This
will be a representation of 3401 in the form

3401 = r_6 · 4^6 + r_5 · 4^5 + r_4 · 4^4 + r_3 · 4^3 + r_2 · 4^2 + r_1 · 4^1 + r_0 · 4^0,
where each r_i is a “base-4 digit” (i.e., an element of {0, 1, 2, 3}). Here, we are
tacitly assuming that 4^6 is the highest power of 4 that we need; but we don’t
actually know this yet, so we must be prepared to add higher powers (4^7, 4^8, 4^9, . . .)
if needed.
How do we find these base-4 digits r_0, r_1, . . . , r_6?
We start by identifying r_0. Indeed, on the RHS (see footnote 24) of the equation

3401 = r_6 · 4^6 + r_5 · 4^5 + r_4 · 4^4 + r_3 · 4^3 + r_2 · 4^2 + r_1 · 4^1 + r_0 · 4^0,
all but the last addends are multiples of 4, whereas the last addend is r_0 · 4^0 = r_0.
Hence, we can rewrite this equation as follows (factoring out the 4):

3401 = 4 · (r_6 · 4^5 + r_5 · 4^4 + r_4 · 4^3 + r_3 · 4^2 + r_2 · 4^1 + r_1 · 4^0) + r_0.

Since r_0 ∈ {0, 1, 2, 3}, this equation reveals that the pair

(r_6 · 4^5 + r_5 · 4^4 + r_4 · 4^3 + r_3 · 4^2 + r_2 · 4^1 + r_1 · 4^0, r_0)

is a quo-rem pair of 3401 and 4. In particular, we must have

r_0 = 3401%4 = 1 and
r_6 · 4^5 + r_5 · 4^4 + r_4 · 4^3 + r_3 · 4^2 + r_2 · 4^1 + r_1 · 4^0 = 3401//4 = 850.
Thus, we have identified the last base-4 digit r_0 as 1. In order to find the
remaining digits, we analyze the latter equation

850 = r_6 · 4^5 + r_5 · 4^4 + r_4 · 4^3 + r_3 · 4^2 + r_2 · 4^1 + r_1 · 4^0.
24 “RHS” means “right hand side”.

In this equation, the only addend on the RHS not divisible by 4 is r_1 · 4^0 = r_1, so
we can rewrite this equation as

850 = 4 · (r_6 · 4^4 + r_5 · 4^3 + r_4 · 4^2 + r_3 · 4^1 + r_2 · 4^0) + r_1,

and thus conclude that

r_1 = 850%4 = 2 and
r_6 · 4^4 + r_5 · 4^3 + r_4 · 4^2 + r_3 · 4^1 + r_2 · 4^0 = 850//4 = 212.

Thus, we have identified the base-4 digit r_1 as 2. In order to find the remaining
digits, we analyze the latter equation

212 = r_6 · 4^4 + r_5 · 4^3 + r_4 · 4^2 + r_3 · 4^1 + r_2 · 4^0.

In this equation, the only addend on the RHS not divisible by 4 is r_2 · 4^0 = r_2, so
we can rewrite this equation as

212 = 4 · (r_6 · 4^3 + r_5 · 4^2 + r_4 · 4^1 + r_3 · 4^0) + r_2,

and thus conclude that

r_2 = 212%4 = 0 and
r_6 · 4^3 + r_5 · 4^2 + r_4 · 4^1 + r_3 · 4^0 = 212//4 = 53.

Thus, we have identified the base-4 digit r_2 as 0. In order to find the remaining
digits, we analyze the latter equation

53 = r_6 · 4^3 + r_5 · 4^2 + r_4 · 4^1 + r_3 · 4^0.

In this equation, the only addend on the RHS not divisible by 4 is r_3 · 4^0 = r_3, so
we can rewrite this equation as

53 = 4 · (r_6 · 4^2 + r_5 · 4^1 + r_4 · 4^0) + r_3,

and thus conclude that

r_3 = 53%4 = 1 and
r_6 · 4^2 + r_5 · 4^1 + r_4 · 4^0 = 53//4 = 13.

Thus, we have identified the base-4 digit r_3 as 1. In order to find the remaining
digits, we analyze the latter equation

13 = r_6 · 4^2 + r_5 · 4^1 + r_4 · 4^0.

In this equation, the only addend on the RHS not divisible by 4 is r_4 · 4^0 = r_4, so
we can rewrite this equation as

13 = 4 · (r_6 · 4^1 + r_5 · 4^0) + r_4,

and thus conclude that

r_4 = 13%4 = 1 and
r_6 · 4^1 + r_5 · 4^0 = 13//4 = 3.
Thus, we have identified the base-4 digit r_4 as 1. In order to find the remaining
digits, we analyze the latter equation

3 = r_6 · 4^1 + r_5 · 4^0.
In this equation, the only addend on the RHS not divisible by 4 is r_5 · 4^0 = r_5, so
we can rewrite this equation as

3 = 4 · (r_6 · 4^0) + r_5,

and thus conclude that

r_5 = 3%4 = 3 and
r_6 · 4^0 = 3//4 = 0.
Thus, we have identified the base-4 digit r_5 as 3. Moreover, the equation
r_6 · 4^0 = 0 shows that r_6 = 0.
Thus, altogether, we have found the representation of 3401 we were looking
for:
3401 = r_6 · 4^6 + r_5 · 4^5 + r_4 · 4^4 + r_3 · 4^3 + r_2 · 4^2 + r_1 · 4^1 + r_0 · 4^0
with r_6 = 0, r_5 = 3, r_4 = 1, r_3 = 1, r_2 = 0, r_1 = 2 and r_0 = 1.

In analogy to the decimal system, we can state this as “the number 3401
written in base-4 is 0311021” (since the base-4 digits r6 , r5 , . . . , r0 have been
identified as 0, 3, 1, 1, 0, 2, 1). Commonly, one would omit the leading zeroes, so
this would become 311021.
The method we just used can be used for any given integer b > 1 instead
of 4 and any nonnegative integer n ∈ N instead of 3401: To find the “base-
b digits” of a nonnegative integer n, we first divide n by b with remainder,
then divide the resulting quotient again by b with remainder, then divide the
resulting quotient again by b with remainder, and so on, until we are left with
the quotient 0. The remainders obtained in the process will then be the base-b
digits of n (from right to left). This process must eventually come to an end
because (since b > 1) each quotient will be smaller than the preceding one.
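The repeated-division procedure just described can be sketched as a short loop. The function name base_digits is hypothetical, and returning the digits most-significant-first is a choice made for this sketch.

```python
def base_digits(n: int, b: int) -> list[int]:
    """Base-b digits of a nonnegative integer n, most significant digit first.

    Repeatedly divides by b with remainder; the remainders are the digits
    from right to left. Returns an empty list for n = 0.
    """
    if n < 0 or b <= 1:
        raise ValueError("need n >= 0 and b > 1")
    digits = []
    while n > 0:
        n, r = divmod(n, b)  # replace n by the quotient, record the remainder
        digits.append(r)
    return digits[::-1]


print(base_digits(3401, 4))  # [3, 1, 1, 0, 2, 1]
```

The loop terminates because b > 1 forces each nonzero n to be replaced by a strictly smaller quotient. The intermediate quotients for n = 3401 and b = 4 are 850, 212, 53, 13, 3 and 0, exactly as in the worked example.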
We can summarize this as a theorem:

Theorem 3.3.15. Let b > 1 be an integer. Let n ∈ N. Then:


(a) We can write n in the form

n = r_k · b^k + r_{k−1} · b^{k−1} + · · · + r_1 · b^1 + r_0 · b^0

with
k ∈ N and r_0, r_1, . . . , r_k ∈ {0, 1, . . . , b − 1}.
(b) If n < b^{k+1} for some k ∈ N, then we can write n in the form

n = r_k · b^k + r_{k−1} · b^{k−1} + · · · + r_1 · b^1 + r_0 · b^0

with
r_0, r_1, . . . , r_k ∈ {0, 1, . . . , b − 1}.
(c) These r_0, r_1, . . . , r_k are unique (when k is given). Moreover, they can be
explicitly computed by the formula

r_i = (n//b^i) % b for each i ∈ {0, 1, . . . , k}.

That is, they can be explicitly computed by

r_0 = n%b,
r_1 = (n//b) % b,
r_2 = (n//b^2) % b,
r_3 = (n//b^3) % b,
...,
r_k = (n//b^k) % b.
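The explicit formula in part (c) computes any single digit directly, without going through the lower digits first. Here is a small sketch (the function name digit is hypothetical) that applies it to n = 3401, b = 4 and k = 6:

```python
def digit(n: int, b: int, i: int) -> int:
    """The i-th base-b digit r_i of n, computed as (n // b^i) % b."""
    return (n // b**i) % b


n, b, k = 3401, 4, 6
digits = [digit(n, b, i) for i in range(k + 1)]  # r_0, r_1, ..., r_6

assert digits == [1, 2, 0, 1, 1, 3, 0]
# The digits reassemble n as r_k * b^k + ... + r_1 * b^1 + r_0 * b^0:
assert n == sum(r * b**i for i, r in enumerate(digits))
```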

Proof. Forget that n was fixed (but keep b fixed). We shall prove the following two
claims:

Claim 1: Let n ∈ N and k ∈ N be such that n < b^{k+1}. Then, we can write n
in the form

n = r_k · b^k + r_{k−1} · b^{k−1} + · · · + r_1 · b^1 + r_0 · b^0

with
r_0, r_1, . . . , r_k ∈ {0, 1, . . . , b − 1}.

Claim 2: Let n ∈ N and k ∈ N. Assume that n has been written in the form

n = r_k · b^k + r_{k−1} · b^{k−1} + · · · + r_1 · b^1 + r_0 · b^0

with
r_0, r_1, . . . , r_k ∈ {0, 1, . . . , b − 1}.
Then,
r_i = (n//b^i) % b for each i ∈ {0, 1, . . . , k}.

Once these two claims are proved, Theorem 3.3.15 will follow, because

• Theorem 3.3.15 (b) follows directly from Claim 1.

• Theorem 3.3.15 (c) follows directly from Claim 2.

• Theorem 3.3.15 (a) follows from Claim 1 (since we can pick k ∈ N high enough
that n < b^{k+1} holds; see footnote 25).

Hence, it remains to prove Claim 1 and Claim 2.

Proof of Claim 1. We proceed by induction on k:


Base case: For k = 0, Claim 1 is saying that every n ∈ N satisfying n < b can be
written in the form n = r_0 · b^0 with r_0 ∈ {0, 1, . . . , b − 1}. But this is obvious: Since
n ∈ N and n < b, we have n ∈ {0, 1, . . . , b − 1}, and thus we can just pick r_0 = n and
have n = r_0 · b^0 (since r_0 · b^0 = r_0 = n, because b^0 = 1). Hence, Claim 1 is proved for k = 0.
Induction step: We make a step from k − 1 to k. Thus, we let k be a positive integer.
Assume (as the induction hypothesis) that Claim 1 holds for k − 1 instead of k. We
must now show that Claim 1 holds for k as well.
So let n ∈ N be such that n < b^{k+1}. Then, Proposition 3.3.11 (e) (applied to d = b)
yields n//b ∈ N. Moreover, n%b ∈ {0, 1, . . . , b − 1} (by the definition of a remainder).
Hence, n%b ≥ 0. Now, Proposition 3.3.11 (d) (applied to d = b) yields

n = (n//b) b + (n%b) ≥ (n//b) b (since n%b ≥ 0).

Hence, (n//b) b ≤ n < b^{k+1}. Dividing this inequality by the positive number b, we
obtain n//b < b^{k+1}/b = b^k.
Now, recall our induction hypothesis, which says that Claim 1 holds for k − 1 instead
of k. In other words, if m ∈ N is such that m < b^{(k−1)+1}, then we can write m in the
form (see footnote 26)
m = s_{k−1} · b^{k−1} + s_{k−2} · b^{k−2} + · · · + s_1 · b^1 + s_0 · b^0
with
s_0, s_1, . . . , s_{k−1} ∈ {0, 1, . . . , b − 1}.

25 Indeed, the assumption b > 1 ensures that the sequence b^0, b^1, b^2, . . . is strictly increasing
and thus eventually outgrows any given integer, including our n. Or we can argue this
directly: An easy induction (on n) shows that n < b^{n+1}, and thus we can simply take k = n.
26 We are deliberately using the letters m and s_i instead of n and r_i here, since the letter n is
already taken (and the letters r_i will be needed for something different).

We can apply this to m = n//b (since n//b ∈ N and n//b < b^k = b^{(k−1)+1}), and
conclude that we can write n//b in the form

n//b = s_{k−1} · b^{k−1} + s_{k−2} · b^{k−2} + · · · + s_1 · b^1 + s_0 · b^0

with
s_0, s_1, . . . , s_{k−1} ∈ {0, 1, . . . , b − 1}.
Let us do this. Thus,

n = (n//b) b + (n%b)
= (s_{k−1} · b^{k−1} + s_{k−2} · b^{k−2} + · · · + s_1 · b^1 + s_0 · b^0) b + (n%b)
= s_{k−1} · b^k + s_{k−2} · b^{k−1} + · · · + s_1 · b^2 + s_0 · b^1 + (n%b)
= s_{k−1} · b^k + s_{k−2} · b^{k−1} + · · · + s_1 · b^2 + s_0 · b^1 + (n%b) · b^0
(since n%b = (n%b) · b^0).

Note that the coefficients n%b, s_0, s_1, . . . , s_{k−1} on the right hand side here all belong to
{0, 1, . . . , b − 1} (as we know). Thus, through this equality, we have written n in the
form
n = r_k · b^k + r_{k−1} · b^{k−1} + · · · + r_1 · b^1 + r_0 · b^0
with
r_0, r_1, . . . , r_k ∈ {0, 1, . . . , b − 1}
(namely, with r_0 = n%b and r_1 = s_0 and r_2 = s_1 and . . . and r_{k−1} = s_{k−2} and r_k = s_{k−1}).
Hence, n can be written in this form.
We have thus proved that if n ∈ N is such that n < b^{k+1}, then we can write n in the
form
n = r_k · b^k + r_{k−1} · b^{k−1} + · · · + r_1 · b^1 + r_0 · b^0
with
r_0, r_1, . . . , r_k ∈ {0, 1, . . . , b − 1}.
In other words, we have proved Claim 1 for our k. This completes the induction step.
Thus, Claim 1 is proved by induction.

Proof of Claim 2. We could prove this by induction as well, but let us instead go for a
direct proof.
By assumption, we have

n = r_k · b^k + r_{k−1} · b^{k−1} + · · · + r_1 · b^1 + r_0 · b^0 = ∑_{j=0}^{k} r_j · b^j = ∑_{j=0}^{k} r_j b^j.

Now, we must prove that r_i = (n//b^i) % b for each i ∈ {0, 1, . . . , k}. So let us fix an
i ∈ {0, 1, . . . , k}.
We have

n = ∑_{j=0}^{k} r_j b^j = ∑_{j=0}^{i−1} r_j b^j + ∑_{j=i}^{k} r_j b^j (29)

(here, we have split our sum into two parts: one part which contains the addends for
j ∈ {0, 1, . . . , i − 1}, and one part which contains the addends for j ∈ {i, i + 1, . . . , k}).
We can rewrite the second sum as follows:

∑_{j=i}^{k} r_j b^j = ∑_{j=i}^{k} r_j b^i b^{j−i} = b^i ∑_{j=i}^{k} r_j b^{j−i} (since b^j = b^i b^{j−i}).

Thus, we can rewrite (29) as

n = ∑_{j=0}^{i−1} r_j b^j + b^i ∑_{j=i}^{k} r_j b^{j−i}. (30)

Let us set

q′ := ∑_{j=i}^{k} r_j b^{j−i} and r′ := ∑_{j=0}^{i−1} r_j b^j.

With these notations, we can rewrite (30) as

n = r′ + b^i q′ = q′ b^i + r′. (31)

Note that both sums q′ = ∑_{j=i}^{k} r_j b^{j−i} and r′ = ∑_{j=0}^{i−1} r_j b^j are integers (indeed, b^{j−i} is always
an integer in the first sum, since j ≥ i entails j − i ∈ N).
We have assumed that r_0, r_1, . . . , r_k ∈ {0, 1, . . . , b − 1}. In particular, the integers
r_0, r_1, . . . , r_k are all ≥ 0 and ≤ b − 1. In other words, each j ∈ {0, 1, . . . , k} satisfies
r_j ≥ 0 and r_j ≤ b − 1. Hence, r′ = ∑_{j=0}^{i−1} r_j b^j ≥ 0 (since all the integers r_j are ≥ 0, and so
is b) and

r′ = ∑_{j=0}^{i−1} r_j b^j ≤ ∑_{j=0}^{i−1} (b − 1) b^j = (b − 1) ∑_{j=0}^{i−1} b^j = (b − 1) · (b^i − 1)/(b − 1) = b^i − 1

(here, the inequality holds because each r_j is ≤ b − 1, and the equality
∑_{j=0}^{i−1} b^j = b^0 + b^1 + · · · + b^{i−1} = (b^i − 1)/(b − 1) follows from Corollary 1.6.3,
applied to b and i instead of q and n).

Thus, r′ ∈ {0, 1, . . . , b^i − 1}.
The equality (31) says that n = q′ b^i + r′. In light of q′ ∈ Z and r′ ∈ {0, 1, . . . , b^i − 1},
this shows that (q′, r′) is a quo-rem pair of n and b^i. Therefore, in particular, q′ is the
quotient of the division of n by b^i. In other words,

q′ = n//b^i.

However,
\begin{align*}
q' = \sum_{j=i}^{k} r_j b^{j-i}
&= r_i b^0 + r_{i+1} b^1 + r_{i+2} b^2 + \cdots + r_k b^{k-i} \\
&= r_i \underbrace{b^0}_{=1} + \underbrace{\left( r_{i+1} b^1 + r_{i+2} b^2 + \cdots + r_k b^{k-i} \right)}_{= \left( r_{i+1} b^0 + r_{i+2} b^1 + \cdots + r_k b^{k-i-1} \right) b} \\
&= r_i + \left( r_{i+1} b^0 + r_{i+2} b^1 + \cdots + r_k b^{k-i-1} \right) b .
\end{align*}
Thus, $q' - r_i = \left( r_{i+1} b^0 + r_{i+2} b^1 + \cdots + r_k b^{k-i-1} \right) b$, which is clearly divisible by $b$. That
is, $b \mid q' - r_i$. In other words, $q' \equiv r_i \mod b$. In other words, $r_i \equiv q' \mod b$. Since
we furthermore have $r_i \in \{0, 1, \ldots, b-1\}$ (because $r_0, r_1, \ldots, r_k \in \{0, 1, \ldots, b-1\}$), we
thus conclude that $r_i = q' \% b$ (by Proposition 3.3.11 (c), applied to $q'$, $b$ and $r_i$ instead
of $n$, $d$ and $c$). In view of $q' = n // b^i$, we can rewrite this as $r_i = (n // b^i) \% b$.
Forget that we fixed $i$. We thus have shown that $r_i = (n // b^i) \% b$ for each $i \in
\{0, 1, \ldots, k\}$. This proves Claim 2.
Now, both Claims 1 and 2 are proved. As explained above, this completes the proof
of Theorem 3.3.15.

The inductive proof of Claim 1 in the above proof is just a formal avatar of the
algorithm for writing a nonnegative integer n in base b that we demonstrated on an
example before the theorem. The formula $r_i = (n // b^i) \% b$ from Claim 2, on the other
hand, gives an alternative way of computing each base-b digit of n directly.
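
The two ways of finding base-b digits can be compared in a short Python sketch. The function names `digits_by_division` and `digits_by_formula` are illustrative and do not come from the text; the first one assumes that the algorithm alluded to above is the usual repeated division with remainder. Both assume n ≥ 0 and b > 1, and both list the digits in the order r_0, r_1, r_2, ... (lowest digit first).

```python
def digits_by_division(n, b):
    """Base-b digits r_0, r_1, ... of n >= 0, found by repeated division with remainder."""
    digits = []
    while n > 0:
        digits.append(n % b)   # the lowest digit of the current n
        n = n // b             # remove that digit and continue with the rest
    return digits              # note: this is the empty list for n = 0

def digits_by_formula(n, b, k):
    """Digits r_0, ..., r_k computed one at a time via r_i = (n // b**i) % b."""
    return [(n // b**i) % b for i in range(k + 1)]
```

For instance, both `digits_by_division(13, 2)` and `digits_by_formula(13, 2, 3)` describe the binary representation 1101 of 13, read from the lowest digit upwards.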

3.3.6. Congruence in terms of remainders


Here is one more application of division with remainder: a new criterion for
congruence. Specifically, two integers a and b are congruent modulo a given
positive integer d if and only if they leave the same remainder when divided
by d (that is, satisfy a%d = b%d). In other words:

Proposition 3.3.16. Let d be a positive integer. Let a and b be two integers.


Then, a ≡ b mod d if and only if a%d = b%d.

Proof. Proposition 3.3.11 (a) (applied to n = a) yields that a%d ∈ {0, 1, . . . , d − 1}


and a%d ≡ a mod d. Similarly, b%d ∈ {0, 1, . . . , d − 1} and b%d ≡ b mod d.
We must prove the logical equivalence ( a ≡ b mod d) ⇐⇒ ( a%d = b%d). In
other words, we must prove the two implications

( a ≡ b mod d) =⇒ ( a%d = b%d)

and
( a%d = b%d) =⇒ ( a ≡ b mod d) .
Let us prove these implications separately:

Proof of ( a ≡ b mod d) =⇒ ( a%d = b%d): Assume that a ≡ b mod d. Thus,


b ≡ a mod d (by symmetry of congruence – i.e., by Proposition 3.2.4 (b)). Com-
bining b%d ≡ b mod d with b ≡ a mod d, we obtain b%d ≡ a mod d (by transi-
tivity of congruence – i.e., by Proposition 3.2.4 (c)).
Thus, we know that b%d ∈ {0, 1, . . . , d − 1} and b%d ≡ a mod d. Hence,
Proposition 3.3.11 (c) (applied to n = a and c = b%d) yields b%d = a%d. In
other words, a%d = b%d. Thus, we have proved the implication ( a ≡ b mod d) =⇒
( a%d = b%d).
Proof of ( a%d = b%d) =⇒ ( a ≡ b mod d): Assume that a%d = b%d. How-
ever, we know that a%d ≡ a mod d, so that a ≡ a%d mod d (by symmetry of
congruence). In view of a%d = b%d, we can rewrite this as a ≡ b%d mod d.
Combining this with b%d ≡ b mod d, we obtain a ≡ b mod d (by transitivity of
congruence – i.e., by Proposition 3.2.4 (c)). Thus, we have proved the implica-
tion ( a%d = b%d) =⇒ ( a ≡ b mod d).
Now, both implications are proved, so that Proposition 3.3.16 is proved.
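
Proposition 3.3.16 is easy to test numerically. The following sketch is only a sanity check on a finite range, not a proof; the helper names `congruent` and `same_remainder` are made up for this illustration. It relies on the fact that Python's `%` operator returns a value in {0, 1, ..., d − 1} whenever d is positive, even for negative a.

```python
def congruent(a, b, d):
    """Test a ≡ b mod d through the definition: d divides a - b."""
    return (a - b) % d == 0

def same_remainder(a, b, d):
    """Test whether a and b leave the same remainder when divided by d."""
    return a % d == b % d

def criteria_agree(bound, max_d):
    """Compare both criteria for all |a|, |b| <= bound and 1 <= d <= max_d."""
    return all(congruent(a, b, d) == same_remainder(a, b, d)
               for d in range(1, max_d + 1)
               for a in range(-bound, bound + 1)
               for b in range(-bound, bound + 1))
```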

Corollary 3.3.17. Let a and b be two integers. Then, a ≡ b mod 2 holds if and
only if the numbers a and b are either both even or both odd.

Proof. =⇒: Assume that a ≡ b mod 2. We must show that the numbers a and b
are either both even or both odd.
Proposition 3.3.16 (applied to d = 2) shows that a ≡ b mod 2 if and only if
a%2 = b%2. Thus, a%2 = b%2 (since a ≡ b mod 2). However, a%2 ∈ {0, 1}
(by Proposition 3.3.11 (a), applied to n = a and d = 2). In other words, a%2 is
either 0 or 1. If a%2 = 0, then b%2 = 0 as well (since a%2 = b%2), and therefore
both a and b are even (by Corollary 3.3.12 (a)). If a%2 = 1, then b%2 = 1 as
well (since a%2 = b%2), and therefore both a and b are odd (by Corollary 3.3.12
(b)). Other cases cannot occur, since we know that a%2 is either 0 or 1. Hence,
in every possible case, the numbers a and b are either both even or both odd.
This proves the “=⇒” direction of Corollary 3.3.17.
⇐=: Assume that the numbers a and b are either both even or both odd.
Thus, the numbers a%2 and b%2 are either both 0 (this happens when a and
b are both even, by Corollary 3.3.12 (a)) or both 1 (this happens when a and b
are both odd, by Corollary 3.3.12 (b)). In either case, we have a%2 = b%2. By
Proposition 3.3.16 (applied to d = 2), this entails a ≡ b mod 2. Hence, the “⇐=”
direction of Corollary 3.3.17 is proved.

3.3.7. The birthday lemma


If you have lived for exactly n days, then you are n//365 years and n%365 days
old (assuming, for simplicity, that every year has exactly 365 days; leap years
would complicate this a lot). On any “normal” day, the latter number (that
is, n%365) increases by 1 while the former number (that is, n//365) stays un-
changed. On a birthday, however, the latter number gets reset to 0 while the
former number increases by 1. This simple and intuitive observation is not


specific to 365, and is worth stating as a proposition:

Proposition 3.3.18 (birthday lemma). Let n ∈ Z, and let d be a positive inte-


ger. Then:
(a) If d | n, then

n//d = ((n − 1) //d) + 1 and


n%d = 0 and (n − 1) %d = d − 1.

(b) If d ∤ n, then

n//d = (n − 1) //d and n%d = ((n − 1) %d) + 1.

It should be easy to prove both parts of this lemma, but we give a proof for
the sake of completeness.
Proof of Proposition 3.3.18. (a) Assume that d | n. Thus, n = dq for some q ∈ Z. Con-
sider this q.
Recall Definition 3.3.2. We have q ∈ Z and 0 ∈ {0, 1, . . . , d − 1} and n = qd + 0
(since qd + 0 = qd = dq = n). In other words, (q, 0) is a quo-rem pair of n and d (by
the definition of a quo-rem pair). Hence, Definition 3.3.2 shows that n//d = q and
n%d = 0.
On the other hand, from n = dq, we obtain

n − 1 = dq − 1
= ( q − 1) d + ( d − 1) (since (q − 1) d + (d − 1) = qd − d + d − 1 = qd − 1) .
Thus, we have q − 1 ∈ Z and d − 1 ∈ {0, 1, . . . , d − 1} and n − 1 = (q − 1) d + (d − 1).
In other words, the pair (q − 1, d − 1) is a quo-rem pair of n − 1 and d (by the defini-
tion of a quo-rem pair). Hence, Definition 3.3.2 shows that (n − 1) //d = q − 1 and
(n − 1) %d = d − 1.
Now, from (n − 1) //d = q − 1, we obtain ((n − 1) //d) + 1 = q = n//d. In other
words, n//d = ((n − 1) //d) + 1. Combining this with n%d = 0 and (n − 1) %d =
d − 1, we see that Proposition 3.3.18 (a) has been proved.
(b) Assume that d ∤ n. Let q = n//d and r = n%d. Then, by the definition of quotient
and remainder, we have

q∈Z and r ∈ {0, 1, . . . , d − 1} and n = qd + r.

If we had r = 0, then we would have $n = qd + \underbrace{r}_{=0} = qd = dq$, which would entail
d | n; but this would contradict d ∤ n. Hence, we cannot have r = 0. In other words, r
is not 0.
So r is an element of the set {0, 1, . . . , d − 1} but is not 0. Therefore, r is one of the
remaining elements 1, 2, . . . , d − 1. Therefore, r − 1 is one of the elements 0, 1, . . . , d − 2.
Thus, r − 1 ∈ {0, 1, . . . , d − 1}.
Also, from n = qd + r, we obtain n − 1 = (qd + r ) − 1 = qd + (r − 1). So we know


that q ∈ Z and r − 1 ∈ {0, 1, . . . , d − 1} and n − 1 = qd + (r − 1). In other words, the
pair (q, r − 1) is a quo-rem pair of n − 1 and d (by the definition of a quo-rem pair).
Hence, Definition 3.3.2 shows that (n − 1) //d = q and (n − 1) %d = r − 1.
Thus, (n − 1) //d = q = n//d, so that n//d = (n − 1) //d. Also, from (n − 1) %d =
r − 1, we obtain ((n − 1) %d) + 1 = r = n%d, so that n%d = ((n − 1) %d) + 1. Thus,
we have proved Proposition 3.3.18 (b).
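
As a sanity check, both parts of the birthday lemma can be tested in Python, whose `//` and `%` operators agree with the quotient and remainder used here whenever d is positive. The function name `birthday_lemma_holds` is hypothetical, and the check covers only whatever finite range it is called on.

```python
def birthday_lemma_holds(n, d):
    """Check the claim of Proposition 3.3.18 for one integer n and one positive d."""
    if n % d == 0:
        # Part (a): d divides n, so the remainder wraps around.
        return (n // d == (n - 1) // d + 1
                and n % d == 0
                and (n - 1) % d == d - 1)
    # Part (b): d does not divide n, so only the remainder changes.
    return (n // d == (n - 1) // d
            and n % d == (n - 1) % d + 1)
```

For example, `birthday_lemma_holds(730, 365)` tests part (a) on a "birthday", while `birthday_lemma_holds(731, 365)` tests part (b) on the day after.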

Part of Proposition 3.3.18 can be restated using the floor notation:

Corollary 3.3.19. Let n ∈ Z, and let d be a positive integer. Then:


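(a) If d | n, then
\[
\left\lfloor \frac{n}{d} \right\rfloor = \left\lfloor \frac{n-1}{d} \right\rfloor + 1 .
\]
(b) If d ∤ n, then
\[
\left\lfloor \frac{n}{d} \right\rfloor = \left\lfloor \frac{n-1}{d} \right\rfloor .
\]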
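Proof. Proposition 3.3.14 yields $n // d = \left\lfloor \frac{n}{d} \right\rfloor$. The same argument (applied to n − 1
instead of n) yields $(n-1) // d = \left\lfloor \frac{n-1}{d} \right\rfloor$.
(a) Assume that d | n. Then, Proposition 3.3.18 (a) yields n//d = ((n − 1) //d) + 1.
In view of $n // d = \left\lfloor \frac{n}{d} \right\rfloor$ and $(n-1) // d = \left\lfloor \frac{n-1}{d} \right\rfloor$, we can rewrite this as
$\left\lfloor \frac{n}{d} \right\rfloor = \left\lfloor \frac{n-1}{d} \right\rfloor + 1$. This proves Corollary 3.3.19 (a).
(b) Assume that d ∤ n. Then, Proposition 3.3.18 (b) yields n//d = (n − 1) //d.
In view of $n // d = \left\lfloor \frac{n}{d} \right\rfloor$ and $(n-1) // d = \left\lfloor \frac{n-1}{d} \right\rfloor$, we can rewrite this as
$\left\lfloor \frac{n}{d} \right\rfloor = \left\lfloor \frac{n-1}{d} \right\rfloor$. This proves Corollary 3.3.19 (b).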
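
The link between the quotient n//d and the floor of the fraction n/d can be illustrated in Python without any floating-point rounding, by taking the floor of an exact `Fraction`. The helper name `floor_quotient` is an assumption made for this sketch.

```python
from fractions import Fraction
from math import floor

def floor_quotient(n, d):
    """The floor of the exact rational number n/d (for d positive)."""
    return floor(Fraction(n, d))

def floor_agrees_with_quotient(bound, max_d):
    """Compare floor(n/d) with n // d for all |n| <= bound and 1 <= d <= max_d."""
    return all(floor_quotient(n, d) == n // d
               for n in range(-bound, bound + 1)
               for d in range(1, max_d + 1))
```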

3.4. Greatest common divisors


3.4.1. Definition
The following definition plays a crucial role in number theory, particularly in
the study of prime numbers that will be the topic of Section 3.6.

Definition 3.4.1. Let a and b be two integers.


(a) The common divisors of a and b are the integers that divide a and
simultaneously divide b.
(b) The greatest common divisor of a and b is the largest among the com-
mon divisors of a and b, unless a = b = 0. In the case a = b = 0, it is defined
to be 0 instead.
We denote the greatest common divisor of a and b as gcd ( a, b), and we
refer to it as the gcd of a and b.

We will soon see that this greatest common divisor is well-defined (see Re-
mark 3.4.2 below). But first, some examples:

• What is gcd (4, 6) ?


The divisors of 4 are −4, −2, −1, 1, 2, 4.
The divisors of 6 are −6, −3, −2, −1, 1, 2, 3, 6.
Thus, the common divisors of 4 and 6 are −2, −1, 1, 2.
So the greatest common divisor of 4 and 6 is 2. That is, gcd (4, 6) = 2.

• What is gcd (0, 5) ?


The divisors of 0 are all integers (you cannot list them all).
The divisors of 5 are −5, −1, 1, 5.
Thus, the common divisors of 0 and 5 are just the divisors of 5, which are
−5, −1, 1, 5.
So the gcd is 5. That is, gcd (0, 5) = 5.

• What is gcd (0, 0) ?


The common divisors of 0 and 0 are all integers, so there is no greatest one
among them, but we have defined gcd (0, 0) to be 0. (This is the reason
why we had to make an exception for the a = b = 0 case in Definition
3.4.1 (b).)

Let us now convince ourselves that gcd ( a, b) is well-defined:

Remark 3.4.2. Let a, b ∈ Z. We want to show that gcd ( a, b) is well-defined


in Definition 3.4.1 (b).
If a = b = 0, then this is clear, since we defined this gcd to be 0.
Consider the remaining case – i.e., the case when a ̸= 0 or b ̸= 0 (or both).
For instance, let us assume that a ̸= 0. Then, the divisors d of a all satisfy
|d| ≤ | a| (since Proposition 3.1.4 (b) shows that they satisfy abs d ≤ abs a,
which in our present notations means |d| ≤ | a|). In other words, all these
divisors are integers in the interval [− | a| , | a|]. Hence, there are finitely
many of them. Therefore, there are finitely many common divisors of a and
b (since any common divisor of a and b is a divisor of a). On the other hand,
there is at least one common divisor of a and b (namely, 1). Therefore, the
set of all common divisors of a and b is nonempty and finite, and thus has
a maximum element. In other words, there is a (literally) largest among the


common divisors of a and b. This shows that gcd ( a, b) is well-defined when
a ̸= 0.
An analogous argument leads to the same conclusion when b ̸= 0. Thus,
we have shown that gcd ( a, b) is always well-defined.

This argument also gives us a slow and stupid algorithm to compute gcd ( a, b)
when a ̸= 0: We just go through all integers in the interval [− | a| , | a|], and
check which of them are common divisors of a and b. But there is a much
faster algorithm.
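
For comparison with what follows, here is a minimal Python sketch of the slow method just described. The names `common_divisors` and `gcd_slow` are illustrative only, and the code assumes a ≠ 0, exactly as the argument in Remark 3.4.2 does.

```python
def common_divisors(a, b):
    """All common divisors of a and b, assuming a != 0."""
    bound = abs(a)
    # Every divisor of a lies in the interval [-|a|, |a|]; 0 is skipped
    # because 0 does not divide the nonzero integer a.
    return [d for d in range(-bound, bound + 1)
            if d != 0 and a % d == 0 and b % d == 0]

def gcd_slow(a, b):
    """gcd(a, b) taken directly from Definition 3.4.1 (b), assuming a != 0."""
    return max(common_divisors(a, b))
```

The number of candidates tested grows linearly with |a|, which is what makes this approach impractical for large inputs.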

3.4.2. Basic properties


To find this algorithm, we first collect some basic properties of gcds:

Proposition 3.4.3. (a) We have gcd ( a, b) ∈ N for any a, b ∈ Z.


(b) We have gcd ( a, 0) = gcd (0, a) = | a| for any a ∈ Z.
(c) We have gcd ( a, b) = gcd (b, a) for any a, b ∈ Z.
(d) If a, b, c ∈ Z satisfy b ≡ c mod a, then gcd ( a, b) = gcd ( a, c).
(e) We have gcd ( a, b) = gcd ( a, ua + b) for any a, b, u ∈ Z.
(f) We have gcd ( a, b) = gcd ( a, b%a) for any positive integer a and any
b ∈ Z.
(g) We have gcd ( a, b) | a and gcd ( a, b) | b for any a, b ∈ Z.
(h) We have gcd (− a, b) = gcd ( a, b) and gcd ( a, −b) = gcd ( a, b) for any
a, b ∈ Z.
(i) If a, b ∈ Z satisfy a | b, then gcd ( a, b) = | a|.

Proof. (a) Let a, b ∈ Z. We must prove that gcd ( a, b) ∈ N.


If a = b = 0, then this follows from gcd (0, 0) = 0 ∈ N.
Thus, let us assume that not both of a and b are 0. Then, gcd ( a, b) is
literally the greatest common divisor of a and b. If gcd ( a, b) was negative,
then − gcd ( a, b) would be an even greater common divisor of a and b (since
− gcd ( a, b) divides whatever gcd ( a, b) divides, but the negativity of gcd ( a, b)
implies − gcd ( a, b) > gcd ( a, b)), which would contradict the previous sen-
tence. Hence, gcd ( a, b) cannot be negative. Thus, gcd ( a, b) ∈ N. This proves
Proposition 3.4.3 (a).
(b) Let a ∈ Z. Every integer is a divisor of 0. Thus, the common divisors of
a and 0 are just the divisors of a. However, the largest divisor of a is | a| (unless
a = 0, which case can be easily handled separately)27 . Hence, the greatest
common divisor of a and 0 is | a|. In other words, we have gcd ( a, 0) = | a|.
Similarly, we can see that gcd (0, a) = | a|. Thus, Proposition 3.4.3 (b) is proved.
27 This fact is a consequence of Proposition 3.1.4 (b) (recalling that | a| was called abs a back in
that proposition).

(c) Proposition 3.4.3 (c) follows from observing that a and b play equal roles
in Definition 3.4.1.
(d) Let a, b, c ∈ Z satisfy b ≡ c mod a. We must prove that gcd ( a, b) =
gcd ( a, c).
If a = 0, then this is clearly true (because in this case, b ≡ c mod a becomes
b ≡ c mod 0, which entails b = c).
It thus remains to consider the case a ̸= 0 only. In this case, gcd ( a, b) is
literally the greatest common divisor of a and b, whereas gcd ( a, c) is literally
the greatest common divisor of a and c. Hence, in order to prove that these
two gcds are equal, it will suffice to show that the common divisors of a and b
are precisely the common divisors of a and c. To do this, in turn, it suffices to
prove the following two claims:

Claim 1: Each common divisor of a and b is a common divisor of a


and c.

Claim 2: Each common divisor of a and c is a common divisor of a


and b.

Before we prove these two claims, let us recall that b ≡ c mod a; in other
words, c ≡ b mod a (by the symmetry of congruence). Hence, the numbers b
and c play equal roles in our setting. Thus, Claims 1 and 2 are analogous, so
that any proof of one of the two will also prove the other (once the roles of b
and c are switched).
Proof of Claim 1. Let d be a common divisor of a and b. Thus, d | a and d | b
(by the definition of a common divisor). In other words, we have a = dx and
b = dy for some integers x and y. Consider these x and y.
But b ≡ c mod a. In other words, a | b − c. Hence, d | a | b − c (by the
transitivity of divisibility). In other words, b − c = dz for some integer z.
Consider this z.
Now, b − (b − c) = c, so that
\[
c = \underbrace{b}_{= dy} - \underbrace{(b - c)}_{= dz} = dy - dz = d \underbrace{(y - z)}_{\text{an integer}} .
\]

Therefore, d | c. From d | a and d | c, we conclude that d is a common divisor


of a and c.
So we have shown that if d is a common divisor of a and b, then d is a
common divisor of a and c. In other words, each common divisor of a and b is
a common divisor of a and c. This proves Claim 1.

Proof of Claim 2. As we said, we can obtain a proof of Claim 2 by switching the


roles of b and c in the above proof of Claim 1 (because we have c ≡ b mod a).
Combining Claim 1 with Claim 2, we see that the common divisors of a and
b are precisely the common divisors of a and c. Therefore, the greatest common
divisor of a and b equals the greatest common divisor of a and c. In other
words, gcd ( a, b) = gcd ( a, c). This proves Proposition 3.4.3 (d).
(e) Proposition 3.4.3 (e) follows from Proposition 3.4.3 (d) (applied to c =
ua + b), since b ≡ ua + b mod a (because b − (ua + b) = −ua is divisible by a).
(f) Proposition 3.4.3 (f) follows from Proposition 3.4.3 (d) (applied to c =
b%a), since b ≡ b%a mod a (because Proposition 3.3.11 (a) yields b%a ≡ b mod a).

(g) is obvious when a = b = 0 (since 0 | 0), and otherwise follows from the
definition of gcd ( a, b).
(h) The divisors of a are precisely the divisors of − a. The divisors of b are
precisely the divisors of −b. Thus, the common divisors of a and b remain
unchanged if we replace a by − a or replace b by −b. Therefore, Proposition
3.4.3 (h) follows from the definition of the gcd.
(i) Let a, b ∈ Z satisfy a | b. Then, b ≡ 0 mod a. Hence, Proposition 3.4.3 (d)
(applied to c = 0) yields gcd ( a, b) = gcd ( a, 0) = | a| (by Proposition 3.4.3 (b)).
This proves Proposition 3.4.3 (i).
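
Several parts of Proposition 3.4.3 can be spot-checked with Python's built-in `math.gcd`, which follows the same conventions as Definition 3.4.1 (nonnegative result, and gcd (0, 0) = 0). The function `check_gcd_properties` is a hypothetical helper written for this sketch; it tests a finite range only and is no substitute for the proofs above.

```python
from math import gcd

def check_gcd_properties(bound):
    """Check parts (a)-(c) and (e)-(h) of Proposition 3.4.3 for |a|, |b| <= bound."""
    values = range(-bound, bound + 1)
    for a in values:
        assert gcd(a, 0) == gcd(0, a) == abs(a)                       # part (b)
        for b in values:
            g = gcd(a, b)
            assert g >= 0                                             # part (a)
            assert g == gcd(b, a)                                     # part (c)
            assert all(g == gcd(a, u * a + b) for u in range(-3, 4))  # part (e)
            if a > 0:
                assert g == gcd(a, b % a)                             # part (f)
            if g != 0:
                assert a % g == 0 and b % g == 0                      # part (g)
            assert gcd(-a, b) == g == gcd(a, -b)                      # part (h)
    return True
```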

Corollary 3.4.4 (Euclidean recursion for the gcd). Let a ∈ Z, and let b be a
positive integer. Then,

gcd ( a, b) = gcd (b, a%b) .

Proof. Proposition 3.4.3 (c) yields

gcd ( a, b) = gcd (b, a) = gcd (b, a%b)

(by Proposition 3.4.3 (f), applied to b and a instead of a and b). This proves
Corollary 3.4.4.

3.4.3. The Euclidean algorithm


By applying Corollary 3.4.4 repeatedly, we can compute gcds rather quickly:
For example,
 

gcd (93, 18) = gcd 18, 93%18


| {z }
 (by Corollary 3.4.4)
=3
= gcd (18, 3)
 

= gcd 3, 18%3


| {z }
 (by Corollary 3.4.4)
=0
= gcd (3, 0) = |3| (by Proposition 3.4.3 (b))
=3
Math 221 Winter 2024, version March 12, 2024 page 118

and
 

gcd (1145, 739) = gcd 739, 1145%739


| {z }
 (by Corollary 3.4.4)
=406
= gcd (739, 406)
 

= gcd 406, 739%406


| {z }
 (by Corollary 3.4.4)
=333
= gcd (406, 333)
 

= gcd 333, 406%333


| {z }
 (by Corollary 3.4.4)
=73
= gcd (333, 73)
= gcd (73, 333%73) (by Corollary 3.4.4)
= gcd (73, 41)
= gcd (41, 73%41) (by Corollary 3.4.4)
= gcd (41, 32)
= gcd (32, 41%32) (by Corollary 3.4.4)
= gcd (32, 9)
= gcd (9, 32%9) (by Corollary 3.4.4)
= gcd (9, 5)
= gcd (5, 9%5) (by Corollary 3.4.4)
= gcd (5, 4)
= gcd (4, 5%4) (by Corollary 3.4.4)
= gcd (4, 1)
= gcd (1, 4%1) (by Corollary 3.4.4)
= gcd (1, 0) = |1| (by Proposition 3.4.3 (b))
= 1.
These two computations are instances of a general algorithm for computing
gcd ( a, b) for any two numbers a ∈ Z and b ∈ N. This algorithm proceeds as
follows:

• If b = 0, then the gcd is | a|.


• If b > 0, then we replace a and b by b and a%b and recurse (i.e., we apply
the method again to b and a%b instead of a and b).

In Python code28 , this algorithm looks as follows:


28 I am using the Python programming language because of its ease of use and abundance

def gcd(a, b): # for b nonnegative
    if b == 0:
        return abs(a) # This is the absolute value of a.
    return gcd(b, a%b)
This algorithm is called the Euclidean algorithm. Let us convince ourselves
that it really terminates (rather than getting stuck in an endless loop):

Proposition 3.4.5. Let a ∈ Z and b ∈ N. Then, the Euclidean algorithm


terminates after at most b steps. (Here, we count each time that the algorithm
replaces a and b by b and a%b as a “step”.)

Proof. In each step of the Euclidean algorithm, the second argument b gets
replaced by a%b. This has the consequence that b decreases by at least 1 (since
the definition of a remainder yields a%b ∈ {0, 1, . . . , b − 1} and thus a%b ≤
b − 1). But b remains nonnegative throughout the algorithm. Thus, b cannot
decrease (by at least 1) more than b0 times in succession, where b0 is the original
value of b (as it was fed into the algorithm). Hence, the algorithm cannot have
more than b0 steps. In other words, the algorithm must terminate after at most
b0 steps. This proves Proposition 3.4.5 (since b0 is precisely the original value
of b).
Proposition 3.4.5 greatly overestimates the actual time that the Euclidean algorithm
needs to terminate: In truth, it terminates after at most $\log_2 (ab) + 2$
steps (if a and b are positive)29 , which is usually much fewer than b. Some variants
of the Euclidean algorithm get to the goal even faster. This speediness is
part of the reason why the Euclidean algorithm (and greatest common divisors)
is so useful in practical applications of number theory.
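
To get a feeling for these step counts, one can record every pair that the Euclidean algorithm visits. The following Python sketch does this with a loop rather than recursion; the names `euclid_trace` and `within_log_bound` are made up for this illustration, and the second function merely tests the logarithmic bound on a finite range of positive inputs.

```python
from math import log2

def euclid_trace(a, b):
    """The list of pairs (a, b) visited by the Euclidean algorithm, for b >= 0."""
    trace = [(a, b)]
    while b > 0:
        a, b = b, a % b        # one "step" in the sense of Proposition 3.4.5
        trace.append((a, b))
    return trace

def step_count(a, b):
    """Number of steps the algorithm takes on the input (a, b)."""
    return len(euclid_trace(a, b)) - 1

def within_log_bound(bound):
    """Check step_count(a, b) <= log2(a*b) + 2 for all 1 <= a, b <= bound."""
    return all(step_count(a, b) <= log2(a * b) + 2
               for a in range(1, bound + 1)
               for b in range(1, bound + 1))
```

On the input (1145, 739), the trace reproduces the chain of gcds computed above and has 10 steps, far fewer than the 739 steps allowed by Proposition 3.4.5.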
The Euclidean algorithm can be easily adapted to arbitrary b ∈ Z instead of
just b ∈ N (by adding a first step in which we replace b by −b if b is negative):

of inbuilt fundamental mathematical tools. All the algorithms can be implemented in any
other language as well, but the code looks best in Python.
29 Hints to the proof. Recall that each step of the algorithm replaces the numbers a and b by b
and a%b. Since b > a%b (because a%b ∈ {0, 1, . . . , b − 1} entails a%b < b), this yields that
after each step of the algorithm, the “current” numbers a and b satisfy a > b.
Now, consider the product ab of the two numbers a and b. We claim that each step of
the algorithm, except perhaps the first one, decreases this number by a factor of at least 2.
In order to see this, you need to show that $b (a \% b) \leq \frac{ab}{2}$ whenever a > b. But this follows
from $a \% b \leq \frac{a}{2}$, which in turn follows easily from a > b (why?).
Now you know that the product ab decreases by a factor of at least 2 at each step of the
algorithm except for the first one. In other words, its binary logarithm $\log_2 (ab)$ decreases
by at least 1 at each step of the algorithm except for the first one. At the first step, it also
decreases or stays unchanged. From this, it follows easily that the algorithm cannot have
more than $\log_2 (ab) + 1$ steps until it reaches a situation in which $\log_2 (ab) \leq 0$. But in such
a situation, we must have a = b = 1, and it will only take one more step to reach the end of
the algorithm.

def gcd(a, b): # for b arbitrary
    if b < 0:
        return gcd(a, -b) # replace b by -b.
    if b == 0:
        return abs(a) # This is the absolute value of a.
    return gcd(b, a%b)
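
The same computation can also be written with a loop instead of recursion, which avoids deep call stacks on large inputs. This variant is not from the text; the name `gcd_iterative` is chosen only for this sketch, and it is meant to return the same values as the recursive version for all integers a and b.

```python
def gcd_iterative(a, b):
    """Loop-based Euclidean algorithm for arbitrary integers a and b."""
    if b < 0:
        b = -b                 # gcd(a, b) = gcd(a, -b) by Proposition 3.4.3 (h)
    while b != 0:
        a, b = b, a % b        # Euclidean recursion (Corollary 3.4.4)
    return abs(a)              # gcd(a, 0) = |a| by Proposition 3.4.3 (b)
```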

3.4.4. Bezout’s theorem and the extended Euclidean algorithm


The Euclidean algorithm can be adapted so that it doesn’t only compute gcd ( a, b),
but also expresses gcd ( a, b) as an “integer linear combination” of a and b (that
is, as a multiple of a plus a multiple of b). This allows us to prove the following
theorem:
Theorem 3.4.6 (Bezout’s theorem for integers). Let a and b be two integers.
Then, there exist two integers x and y such that

gcd ( a, b) = xa + yb.

We will soon prove this theorem. First, we introduce a notation and give a
few examples:
Definition 3.4.7. Let a and b be two integers. Then, a Bezout pair for ( a, b)
means a pair ( x, y) of two integers satisfying gcd ( a, b) = xa + yb.
For instance, a Bezout pair for (4, 7) is a pair ( x, y) of integers satisfying
gcd (4, 7) = x · 4 + y · 7. In view of gcd (4, 7) = 1, this latter equation simplifies
to 1 = 4x + 7y. So a Bezout pair for (4, 7) is a solution to this equation 1 =
4x + 7y in integers x and y. This is similar to the coin problem from Subsection
1.9.6, in the sense that you can think of such a Bezout pair ( x, y) as a way to pay
1 cent with x many 4-cent coins and y many 7-cent coins, assuming that you
are allowed to get change (because x and y are allowed to be negative). Without
change, of course, you could not pay 1 cent using 4-cent coins and 7-cent coins.
But with change, it works: You pay two 4-cent coins and get one 7-cent coin
in return, and thus end up paying 2 · 4 + (−1) · 7 = 1 cent, which is what you
wanted. In other words, the pair ( x, y) = (2, −1) satisfies 1 = 4x + 7y. In other
words, (2, −1) is a Bezout pair for (4, 7). There are also other Bezout pairs for
(4, 7), for example (−5, 3) (since 4 (−5) + 7 · 3 = 1). So a Bezout pair is usually
not unique.
So Bezout’s theorem can be restated as follows: For any two integers a and
b, you can pay gcd ( a, b) cents with a-cent coins and b-cent coins, if you can get
change30 . What denominations can be paid without change is a more compli-
cated story, and we will return to this in Section 3.8.
30 more precisely: if you can get change in a-cent coins and b-cent coins (and there are infinitely
many coins of either denomination available)

Here is another example: A Bezout pair for (6, 16) is (3, −1), since gcd (6, 16) =
2 = 6x + 16y for ( x, y) = (3, −1).
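
Small examples like these can be found by brute force. The following Python sketch lists all Bezout pairs whose entries lie in a given range; the function names `is_bezout_pair` and `bezout_pairs_in_box` are illustrative, and `math.gcd` stands in for the gcd of Definition 3.4.1.

```python
from math import gcd

def is_bezout_pair(x, y, a, b):
    """Check whether (x, y) is a Bezout pair for (a, b) as in Definition 3.4.7."""
    return gcd(a, b) == x * a + y * b

def bezout_pairs_in_box(a, b, bound):
    """All Bezout pairs (x, y) for (a, b) with |x| <= bound and |y| <= bound."""
    return [(x, y)
            for x in range(-bound, bound + 1)
            for y in range(-bound, bound + 1)
            if is_bezout_pair(x, y, a, b)]
```

For (a, b) = (4, 7) and bound 5, this search finds exactly the two pairs (−5, 3) and (2, −1) mentioned above. Such a search is of course far too slow to serve as a general method.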

So Bezout’s theorem (Theorem 3.4.6) is saying that for any two integers a, b ∈
Z, there exists a Bezout pair for ( a, b).
How can we prove this theorem? Induction (particularly strong induction)
appears to be a reasonable method. Unfortunately, induction can only be used
to prove a statement about elements of a set of the form {k, k + 1, k + 2, . . .} for
a given integer k (that is, a statement about integers from a given lower bound
onwards). To put it differently, induction can only prove a statement that “starts
somewhere” (even if it is presented as a strong induction with no base case).
Meanwhile, in Bezout’s theorem, both a and b are just arbitrary integers, so
they can be arbitrarily low.
This hurdle can be surmounted: While we cannot prove Bezout’s theorem by
induction directly, we can first restrict it to the case when b ∈ N, and prove
this restriction by induction. In other words, we shall use induction to prove
the following particular case of Bezout’s theorem:

Lemma 3.4.8 (restricted Bezout’s theorem). Let a ∈ Z and b ∈ N. Then,


there exists a Bezout pair for ( a, b).

Once this lemma is proved, we will quickly deduce Bezout’s theorem in full
generality from it. So let us prove this lemma.
Proof of Lemma 3.4.8. We shall use strong induction on b. Here, we do not con-
sider a to be fixed. Thus, the statement that we will be proving for all b ∈ N
is
P (b) := (for each a ∈ Z, there exists a Bezout pair for ( a, b)) .
Our goal is to prove this statement P (b) for all b ∈ N. We shall do this by
strong induction on b:
Base case: Let us prove the statement P (0). Indeed, for each a ∈ Z, let us set
\[
\operatorname{sign} a :=
\begin{cases}
1, & \text{if } a > 0; \\
0, & \text{if } a = 0; \\
-1, & \text{if } a < 0.
\end{cases}
\]
Then, for each a ∈ Z, the pair (sign a, 0) is a Bezout pair for ( a, 0), since
\begin{align*}
\gcd(a, 0) &= |a| && \text{(by Proposition 3.4.3 (b))} \\
&= (\operatorname{sign} a) \cdot a \\
&= (\operatorname{sign} a) \cdot a + 0 \cdot 0
\end{align*}
(here, the equality $|a| = (\operatorname{sign} a) \cdot a$ is a general fact that holds for any real
number a, and can be easily verified by checking the cases a > 0, a = 0 and a < 0).

Hence, for each a ∈ Z, there exists a Bezout pair for ( a, 0). In other words, the
statement P (0) holds.
Induction step: Fix a positive integer b. We must prove the implication

( P (0) AND P (1) AND P (2) AND · · · AND P (b − 1)) =⇒ P (b) .

Thus, we assume (as the induction hypothesis) that P (0) AND P (1) AND P (2)
AND · · · AND P (b − 1) holds. In other words, we assume that the b statements
P (0) , P (1) , P (2) , . . . , P (b − 1) all hold. In other words, we assume that

(for each a ∈ Z, there exists a Bezout pair for ( a, 0)) and


(for each a ∈ Z, there exists a Bezout pair for ( a, 1)) and
(for each a ∈ Z, there exists a Bezout pair for ( a, 2)) and
··· and
(for each a ∈ Z, there exists a Bezout pair for ( a, b − 1)) .

In other words, we assume that for each a ∈ Z and each d ∈ {0, 1, . . . , b − 1},
there exists a Bezout pair for ( a, d). Renaming a as c here, we can restate this
as follows: We assume that for each c ∈ Z and each d ∈ {0, 1, . . . , b − 1}, there
exists a Bezout pair for (c, d). So this is our induction hypothesis (brought to
its most convenient form).
Our goal is now to prove P (b). In other words, we must prove that for each
a ∈ Z, there exists a Bezout pair for ( a, b).
So we fix an a ∈ Z, and we set out to find a Bezout pair for ( a, b).
The Euclidean recursion (Corollary 3.4.4) yields

gcd ( a, b) = gcd (b, a%b) . (32)

However, a%b ∈ {0, 1, . . . , b − 1} (by Proposition 3.3.11 (a), applied to n = a


and d = b).
Recall our induction hypothesis, which says that for each c ∈ Z and each
d ∈ {0, 1, . . . , b − 1}, there exists a Bezout pair for (c, d). We can apply this
to c = b and d = a%b (because b ∈ Z and a%b ∈ {0, 1, . . . , b − 1}), and thus
conclude that there exists a Bezout pair for (b, a%b). Let us denote this Bezout
pair by (u, v). Thus, by the definition of a Bezout pair, u and v are integers and
satisfy
gcd (b, a%b) = ub + v ( a%b) . (33)
However, Proposition 3.3.11 (d) (applied to n = a and d = b) yields

a = ( a//b) b + ( a%b) .

Solving this for a%b, we obtain

a%b = a − ( a//b) b. (34)



Now, (32) becomes

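\begin{align*}
\gcd(a, b) = \gcd(b, a \% b)
&= ub + v \underbrace{(a \% b)}_{= a - (a // b) b \text{ (by (34))}} && \text{(by (33))} \\
&= ub + v \left( a - (a // b) b \right) \\
&= ub + va - v (a // b) b \\
&= \underbrace{v}_{\text{an integer}} a + \underbrace{\left( u - v (a // b) \right)}_{\text{an integer}} b .
\end{align*}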

Thus, we have written gcd ( a, b) as a multiple of a plus a multiple of b. More


specifically, the pair
(v, u − v ( a//b))
is a Bezout pair for ( a, b). And so we conclude that there exists a Bezout pair
for ( a, b) (because we just found one). This proves the statement P (b) for our
b, and thus completes the induction step.
Hence, by induction, we have shown that P (b) holds for all b ∈ N. But this
is saying precisely that there exists a Bezout pair for ( a, b) whenever a ∈ Z and
b ∈ N. Thus, Lemma 3.4.8 is proved.
This inductive proof contains a recursive algorithm for finding a Bezout pair
for ( a, b) whenever a ∈ Z and b ∈ N. Written in the Python programming
language, this algorithm looks as follows:31
def bezout_pair(a, b): # for b nonnegative
    if b == 0:
        return (sign(a), 0)
    (u, v) = bezout_pair(b, a%b)
    return (v, u - v * (a//b))
This algorithm is known as the extended Euclidean algorithm.
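
Here is a self-contained version of this algorithm that can be run as is, together with a small checking function. The functions `sign` and `bezout_pair` follow the text; the helper `check_bezout_pair` is an addition made for this sketch, and it uses Python's built-in `math.gcd` as an independent reference for the gcd.

```python
from math import gcd

def sign(a):
    """The sign of a: 1, 0 or -1."""
    if a < 0:
        return -1
    if a == 0:
        return 0
    return 1

def bezout_pair(a, b):  # for b nonnegative
    """A Bezout pair for (a, b), following the proof of Lemma 3.4.8."""
    if b == 0:
        return (sign(a), 0)            # base case: gcd(a, 0) = sign(a) * a
    (u, v) = bezout_pair(b, a % b)     # Bezout pair for (b, a % b)
    return (v, u - v * (a // b))       # rewritten using a % b = a - (a // b) * b

def check_bezout_pair(a, b):
    """Verify that bezout_pair(a, b) really satisfies gcd(a, b) = x*a + y*b."""
    (x, y) = bezout_pair(a, b)
    return x * a + y * b == gcd(a, b)
```

On the two examples discussed earlier, `bezout_pair(4, 7)` returns the pair (2, −1), and `bezout_pair(6, 16)` returns (3, −1).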

Now that Lemma 3.4.8 has been proven, Bezout’s theorem in the general case
(Theorem 3.4.6) easily follows:
Proof of Theorem 3.4.6. We are in one of the following two cases:
Case 1: We have b ≥ 0.
Case 2: We have b < 0.
31 Here, sign(a) is what was called sign a in the above proof. In Python, this can be defined as
follows:
def sign(a):
    if a < 0:
        return -1
    if a == 0:
        return 0
    if a > 0:
        return 1


Let us first consider Case 1. In this case, b ≥ 0. Hence, b ∈ N. Thus, Lemma
3.4.8 yields that there exists a Bezout pair for ( a, b). In other words, there exists
a pair ( x, y) of two integers satisfying gcd ( a, b) = xa + yb (by the definition
of a Bezout pair). But this is precisely what Theorem 3.4.6 is claiming. Thus,
Theorem 3.4.6 is proved in Case 1.
Let us now consider Case 2. In this case, b < 0. Hence, −b > 0, so that −b ∈
N. Hence, Lemma 3.4.8 (applied to −b instead of b) yields that there exists a
Bezout pair for ( a, −b). Let (u, v) be this Bezout pair. Then, by the definition of
a Bezout pair, u and v are integers and satisfy gcd ( a, −b) = ua + v (−b).
However, Proposition 3.4.3 (h) yields gcd ( a, −b) = gcd ( a, b). Thus,

\[
\gcd(a, b) = \gcd(a, -b) = ua + \underbrace{v (-b)}_{= (-v) b} = ua + (-v) b .
\]

Thus, there exist two integers x and y such that gcd ( a, b) = xa + yb (namely,
x = u and y = −v). This proves Theorem 3.4.6 in Case 2.
We have now proved Theorem 3.4.6 in both Cases 1 and 2, so that the theorem
always holds.

Exercise 3.4.1. Recall the bezout_pair function defined above. This function
outputs a Bezout pair for any given pair ( a, b) with a ∈ Z and b ∈ N. Tweak
it so that it works for arbitrary b ∈ Z (not just for b ∈ N).
[Feel free to use your favorite programming language instead of Python,
but do not change the logic in the case when b ≥ 0.]

Exercise 3.4.2. (a) Prove that gcd (2n + 3, 3n + 4) = 1 for each n ∈ Z.


(b) Prove that gcd (15n + 4, 12n + 5) = 1 for each n ∈ Z.

Exercise 3.4.3. Let a ∈ Z and b ∈ Z be nonzero integers. Let ( x, y) be some


Bezout pair for ( a, b).
Let g = gcd ( a, b). Let a′ = a/g and b′ = b/g.
Prove that each Bezout pair for ( a, b) can be written in the form
( x + kb′ , y − ka′ ) for some k ∈ Z.
[Hint: It is probably easiest to first prove this in the case when a and b are
coprime. In this case, g = 1 and a′ = a and b′ = b.]

3.4.5. The universal property of the gcd


Bezout’s theorem is helpful for proving properties of gcds. Here is the most
important one, which is called the universal property of the gcd:

Theorem 3.4.9 (universal property of the gcd). Let a, b, m ∈ Z. Then, we have
the equivalence

(m | a and m | b) ⇐⇒ (m | gcd ( a, b)) .

In other words, the common divisors of a and b are precisely the divisors of
gcd ( a, b). In other words, gcd ( a, b) is not just the greatest among the common
divisors of a and b (if a and b are not both 0), but it also is divisible by all of
them.
Proof of Theorem 3.4.9. We must prove the two implications

(m | a and m | b) =⇒ (m | gcd ( a, b))

and
(m | gcd ( a, b)) =⇒ (m | a and m | b) .
The second of these two implications is easy to prove: If m | gcd ( a, b), then
m | a (since m | gcd ( a, b) | a) and m | b (similarly).
It thus remains to prove the first implication: i.e., to prove that

(m | a and m | b) =⇒ (m | gcd ( a, b)) .

To prove this, we assume that m | a and m | b. We must show that
m | gcd ( a, b).
Bezout’s theorem (Theorem 3.4.6) tells us that there exist two integers x and y
such that gcd ( a, b) = xa + yb. Consider these x and y. Then, m | a | xa, so that
xa is a multiple of m. Similarly, yb is a multiple of m. Thus, xa + yb is a multiple
of m as well (since a sum of two multiples of m is again a multiple of m). But
this is saying that gcd ( a, b) is a multiple of m (since gcd ( a, b) = xa + yb). In
other words, m | gcd ( a, b). But this is precisely what we wanted to show. Thus,
the first implication is proved, and the proof of Theorem 3.4.9 is complete.
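Theorem 3.4.9 can also be sanity-checked by brute force on a small range of integers. The following Python sketch does this. The helper names divides and universal_property_holds are ours (they do not appear elsewhere in these notes), and we assume that Python's math.gcd agrees with our gcd (in particular, that it is nonnegative and satisfies gcd (0, 0) = 0). Such a check is, of course, no substitute for the proof.

```python
from math import gcd

def divides(m: int, a: int) -> bool:
    """Return True if m | a, i.e., if a = m*c for some integer c."""
    if m == 0:
        return a == 0  # 0 divides only 0
    return a % m == 0

def universal_property_holds(a: int, b: int, m: int) -> bool:
    """Check (m | a and m | b) <=> (m | gcd(a, b)) for one triple (a, b, m)."""
    return (divides(m, a) and divides(m, b)) == divides(m, gcd(a, b))

R = range(-12, 13)  # small test range, chosen arbitrarily
print(all(universal_property_holds(a, b, m) for a in R for b in R for m in R))
```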
We note that Theorem 3.4.9 is commonly used in the “=⇒” direction (since
the “⇐=” direction is trivial). That is, the following fact is used most of the
time:

Corollary 3.4.10 (universal property of the gcd, forward direction). Let
a, b, m ∈ Z. If m | a and m | b, then m | gcd ( a, b).

Proof. This is the “=⇒” direction of Theorem 3.4.9.

Exercise 3.4.4. Let a_1 , a_2 , b_1 , b_2 ∈ Z be integers satisfying a_1 | b_1 and
a_2 | b_2 . Prove that gcd ( a_1 , a_2 ) | gcd (b_1 , b_2 ).

3.4.6. Factoring out a common factor from a gcd


The following theorem has an “intuitively obvious” feel, but its proof is not as
simple as you might suspect:

Theorem 3.4.11. Let s, a, b ∈ Z. Then,

gcd (sa, sb) = |s| · gcd ( a, b) .

This is saying that when two integers have a common factor s, then this
common factor can be pulled out of their gcd. (The caveat is, of course, that the
common factor must be replaced by its absolute value, since a gcd cannot be
negative by definition.)
Proof of Theorem 3.4.11. Let

g = gcd ( a, b) and h = gcd (sa, sb) .

Thus, we must prove that h = |s| · g. Note that h and g are nonnegative (because
Proposition 3.4.3 (a) shows that gcds are always nonnegative). Thus, h = |h|
and g = | g|, so that |s| · g = |s| · | g| = |sg| (since | x | · |y| = | xy| for any two real
numbers x and y).
Our goal is to prove that h = |s| · g. Since h = |h| and |s| · g = |sg|, this
amounts to proving that |h| = |sg|. So this is our goal now.
One good way to prove that two integers p and q satisfy | p| = |q| is by
showing that p | q and q | p. Indeed, from p | q and q | p, it follows that
| p| = |q| (by Proposition 3.1.4 (c)).
Thus, in order to prove that |h| = |sg|, it will suffice to show that h | sg and
sg | h. Now, let us do this.

• Proof of sg | h: We have g = gcd ( a, b) | a. Multiplying both sides by s, we
thus obtain sg | sa (see footnote 32). Similarly, sg | sb. Hence, Corollary
3.4.10 (applied to sg, sa and sb instead of m, a and b) yields
sg | gcd (sa, sb). In other words, sg | h (since h = gcd (sa, sb)).

• First proof of h | sg: If s = 0, then the claim h | sg is obvious (since
sg = 0 · g = 0 = h · 0). Thus, let us consider the case when s ≠ 0.
We have just shown that sg | h, but we also clearly have s | sg. Thus,
s | sg | h. Since s ≠ 0, this entails that h/s ∈ Z (by Proposition 3.1.4 (d),
applied to s and h instead of a and b).

32 “Multiplying both sides by s” means using the following simple fact: If two integers x and y
satisfy x | y, then sx | sy.

This integer h/s satisfies s · (h/s) = h = gcd (sa, sb) | sa. Dividing both
sides by s, we thus obtain h/s | a (see footnote 33). Similarly, h/s | b. Hence,
Corollary 3.4.10 (applied to m = h/s) yields h/s | gcd ( a, b). In other words,
h/s | g (since g = gcd ( a, b)). Multiplying both sides by s, we thus obtain
s · (h/s) | sg. In other words, h | sg. Thus, h | sg is proved.

• Second proof of h | sg: We have h = gcd (sa, sb) | sa. In other words, sa =
hu for some integer u. Similarly, sb = hv for some integer v. Consider
these integers u and v.
However, Bezout’s theorem (Theorem 3.4.6) shows that there exist two
integers x and y such that gcd ( a, b) = xa + yb. Consider these x and y.
Now, g = gcd ( a, b) = xa + yb, so that

sg = s ( xa + yb) = sxa + syb = (sa) x + (sb) y = hux + hvy = h (ux + vy)

(since sa = hu and sb = hv), where ux + vy is an integer.

This again proves that h | sg.

We have now proved that h | sg (proved in two different ways) and sg | h.
Hence, as explained above, we obtain |h| = |sg|. As we also explained above,
this completes our proof of Theorem 3.4.11.
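To see Theorem 3.4.11 (and the role of the absolute value) in action, here is a small Python sketch. It assumes that math.gcd matches our gcd, including for negative arguments; the test range is an arbitrary choice of ours.

```python
from math import gcd

# One concrete instance with a negative common factor s = -4:
s, a, b = -4, 6, 15
print(gcd(s * a, s * b), abs(s) * gcd(a, b))  # the two values should agree

# Brute-force check of gcd(sa, sb) = |s| * gcd(a, b) on a small range.
R = range(-10, 11)
factoring_ok = all(gcd(s * a, s * b) == abs(s) * gcd(a, b)
                   for s in R for a in R for b in R)
print(factoring_ok)
```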

3.5. Coprime integers


3.5.1. Definition and examples
Greatest common divisors are at their most useful when they are 1. This is
called “coprimality”:

Definition 3.5.1. Two integers a and b are said to be coprime (or relatively
prime) if gcd ( a, b) = 1.

Remark 3.5.2. This is a symmetric relation: If a and b are coprime, then b
and a are coprime (since gcd (b, a) = gcd ( a, b)).

Example 3.5.3. (a) An integer n is coprime to 2 if and only if n is odd. Indeed,
we know that gcd (n, 2) is a divisor of 2 and is a nonnegative integer (since
any gcd is a nonnegative integer). Thus, gcd (n, 2) must be either 1 or 2 (since
the only nonnegative divisors of 2 are 1 and 2). Now:
33 “Dividing both sides by s” means using the following simple fact: If two integers x and y
satisfy sx | sy, then x | y. (Note that this relies on s ≠ 0.)

• If gcd (n, 2) = 2, then n is even (since 2 = gcd (n, 2) | n).

• If gcd (n, 2) = 1, then n is odd (because otherwise, 2 would be a common
divisor of n and 2, but this cannot happen when the greatest common
divisor of n and 2 is 1).

(b) An integer n is coprime to 3 if and only if n is not divisible by 3. (This
can be proved just as part (a), since the only nonnegative divisors of 3 are 1
and 3.)
(c) An integer n is coprime to 4 if and only if n is odd. (If you expected
“... if n is not divisible by 4” here, then you were wrong. The nonnegative
divisors of 4 are not only 1 and 4 but also 2. Thus, gcd (n, 4) can be 1, 2 or 4.
Specifically:

• If gcd (n, 4) = 1, then n is odd (since otherwise, 2 would be a common
divisor of n and 4, but this cannot happen when the greatest common
divisor is 1).

• If gcd (n, 4) = 2, then n is even (since 2 = gcd (n, 4) | n).

• If gcd (n, 4) = 4, then n is even as well (since 2 | 4 = gcd (n, 4) | n). )

(d) An integer n is coprime to 5 if and only if n is not divisible by 5. (This
can be proved just as part (a), since the only nonnegative divisors of 5 are 1
and 5.)
(e) An integer n is coprime to 6 if and only if n is neither even nor divisible
by 3. (Indeed, the nonnegative divisors of 6 are 1, 2, 3 and 6. Thus, gcd (n, 6)
can be 1, 2, 3 or 6. Specifically:

• If gcd (n, 6) = 1, then n is odd (since otherwise, 2 would be a common
divisor of n and 6, but this cannot happen when the greatest common
divisor is 1) and not divisible by 3 (for similar reasons but using 3
instead of 2).

• If gcd (n, 6) = 2, then n is even (since 2 = gcd (n, 6) | n).

• If gcd (n, 6) = 3, then n is divisible by 3 (since 3 = gcd (n, 6) | n).

• If gcd (n, 6) = 6, then n is both even and divisible by 3.)
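The five characterizations in Example 3.5.3 are easy to test numerically. The Python sketch below compares each claimed criterion with the definition gcd (n, m) = 1 for all n in a small range. The names coprime and claims are ours, and the range is an arbitrary choice.

```python
from math import gcd

def coprime(a: int, b: int) -> bool:
    """Return True if a and b are coprime, i.e., gcd(a, b) = 1."""
    return gcd(a, b) == 1

# Claimed criteria from Example 3.5.3, keyed by the number m in "coprime to m".
claims = {
    2: lambda n: n % 2 != 0,                 # n is odd
    3: lambda n: n % 3 != 0,                 # n is not divisible by 3
    4: lambda n: n % 2 != 0,                 # n is odd (not merely "not divisible by 4")
    5: lambda n: n % 5 != 0,                 # n is not divisible by 5
    6: lambda n: n % 2 != 0 and n % 3 != 0,  # n is neither even nor divisible by 3
}

for m, claim in claims.items():
    agrees = all(coprime(n, m) == claim(n) for n in range(-60, 61))
    print(m, agrees)
```

Note that n = 2 is a witness for the warning in part (c): it is not divisible by 4, yet gcd (2, 4) = 2.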

Informally, I think of coprimality as some sort of “unrelatedness” or
“independence” or “orthogonality” or “noninterference” relation. In other words,
two integers a and b are coprime if and only if they have “nothing to do with
each other”, in some sense. This is nowhere near a rigorous statement, but it
motivates many properties of coprimality, including the ones we will see below.

3.5.2. Three theorems about coprimality


The following three theorems are useful properties of coprime integers:

Theorem 3.5.4 (coprime divisors theorem). Let a, b, c ∈ Z satisfy a | c and
b | c. Assume that a and b are coprime. Then, ab | c.
(In other words, a product of two coprime divisors of c is again a divisor
of c.)

Proof. We have ab | ac (since b | c) and ba | bc (because a | c). Since ba = ab and
ac = ca and bc = cb, we can rewrite this as follows: We have ab | ca and ab | cb.
Thus, Corollary 3.4.10 (applied to ab, ca and cb instead of m, a and b) yields

ab | gcd (ca, cb) = |c| · gcd ( a, b) = |c| · 1 = |c|

(by Theorem 3.4.11, and because gcd ( a, b) = 1 since a is coprime to b).

Since divisibility does not depend on signs (Proposition 3.1.4 (a)), we thus
obtain ab | c (see footnote 34). This proves Theorem 3.5.4.

Example 3.5.5. We have 4 | 56 and 7 | 56. Since 4 and 7 are coprime, we can
thus conclude (by Theorem 3.5.4, applied to a = 4, b = 7 and c = 56) that
4 · 7 | 56.
In contrast, from 6 | 12 and 4 | 12, we cannot conclude that 6 · 4 | 12, since
6 and 4 are not coprime.
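The two situations in Example 3.5.5 can be replayed in Python, together with a brute-force check of Theorem 3.5.4 on a small range. This is only a sketch: the helper divides is ours, and we assume math.gcd agrees with our gcd.

```python
from math import gcd

def divides(m: int, a: int) -> bool:
    """Return True if m | a (with the convention that 0 divides only 0)."""
    return a == 0 if m == 0 else a % m == 0

print(divides(4 * 7, 56))  # 4 | 56 and 7 | 56, and gcd(4, 7) = 1
print(divides(6 * 4, 12))  # 6 | 12 and 4 | 12, but gcd(6, 4) = 2

# Theorem 3.5.4 on all triples (a, b, c) in a small range that satisfy its hypotheses.
R = range(-15, 16)
theorem_ok = all(divides(a * b, c)
                 for a in R for b in R for c in R
                 if gcd(a, b) == 1 and divides(a, c) and divides(b, c))
print(theorem_ok)
```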

In terms of our “coprimality as independence” heuristic, Theorem 3.5.4 can be made
intuitive as follows: If a and b are two coprime divisors of c, then (because a and b are
coprime) a and b must divide “different parts” of c, and thus their product ab is still a
divisor of c. Of course, the notion of “different parts” here is not a real thing, but it is
helpful as a mnemonic device.

Theorem 3.5.6 (coprime removal theorem). Let a, b, c ∈ Z satisfy a | bc.
Assume that a is coprime to b. Then, a | c.

Proof. We have a | ca and a | bc = cb. Thus, Corollary 3.4.10 (applied to a, ca

34 Here is this argument in detail: We have just proved that ab | abs c (where we write abs x
for | x | in order to avoid confusing absolute-value bars with divisibility symbols). Propo-
sition 3.1.4 (a) shows that we have ab | c if and only if abs ( ab) | abs c. However, the
same proposition shows that we have ab | abs c if and only if abs ( ab) | abs (abs c). Since
abs (abs c) = abs c, the latter statement can be rewritten as abs ( ab) | abs c. Thus, both state-
ments ab | c and ab | abs c are equivalent to abs ( ab) | abs c, and thus are equivalent to each
other. Hence, from ab | abs c, we obtain ab | c.

and cb instead of m, a and b) yields

a | gcd (ca, cb) = |c| · gcd ( a, b) = |c| · 1 = |c|

(by Theorem 3.4.11, and because gcd ( a, b) = 1 since a is coprime to b).

Since divisibility does not depend on signs, this means that a | c. Thus, Theo-
rem 3.5.6 holds.

Example 3.5.7. We have 6 | 7 · 12, and 6 is coprime to 7. Thus, Theorem 3.5.6
(applied to a = 6, b = 7 and c = 12) yields 6 | 12 (as if you didn’t know this
already).
But we cannot obtain 6 | 7 from 6 | 12 · 7, since 6 is not coprime to 12.
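Here is Example 3.5.7 in Python, followed by a brute-force check of Theorem 3.5.6 on a small range. The helper names divides and hypotheses_hold are ours; the second call illustrates that the coprimality assumption cannot simply be dropped.

```python
from math import gcd

def divides(m: int, a: int) -> bool:
    """Return True if m | a (with the convention that 0 divides only 0)."""
    return a == 0 if m == 0 else a % m == 0

def hypotheses_hold(a: int, b: int, c: int) -> bool:
    """Hypotheses of Theorem 3.5.6: a | bc, and a is coprime to b."""
    return divides(a, b * c) and gcd(a, b) == 1

print(hypotheses_hold(6, 7, 12), divides(6, 12))  # theorem applies; conclusion holds
print(hypotheses_hold(6, 12, 7), divides(6, 7))   # theorem does not apply; 6 does not divide 7

R = range(-15, 16)
removal_ok = all(divides(a, c)
                 for a in R for b in R for c in R
                 if hypotheses_hold(a, b, c))
print(removal_ok)
```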

Again, Theorem 3.5.6 can be motivated using the “independence” view on
coprimality: If a is coprime to b, then b cannot be the “reason” for the
divisibility a | bc, and thus b can be removed from this divisibility. Again,
this is neither a proof nor even a rigorous statement, but it makes Theorem
3.5.6 look less surprising.

Theorem 3.5.8 (coprime product theorem). Let a, b, c ∈ Z. Assume that each
of the numbers a and b is coprime to c. Then, ab is also coprime to c.

Proof. Let g = gcd ( ab, c). Thus, we must prove that g = 1.


We have g = gcd ( ab, c) | ab and g = gcd ( ab, c) | c | ac. Hence, Corollary
3.4.10 (applied to g, ab and ac instead of m, a and b) yields

g | gcd ( ab, ac) = | a| · gcd (b, c) = | a| · 1 = | a|

(by Theorem 3.4.11, and because gcd (b, c) = 1 since b is coprime to c).

Hence, g | a (since divisibility does not depend on signs). Combining this with
g | c, we obtain g | gcd ( a, c) (by Corollary 3.4.10, applied to g, a and c instead
of m, a and b). However, gcd ( a, c) = 1 (since a is coprime to c), so we obtain
g | gcd ( a, c) = 1.
However, g is a nonnegative integer (since any gcd is a nonnegative integer).
Thus, g is a nonnegative divisor of 1 (since g | 1). Since the only nonnegative
divisor of 1 is 1, we thus conclude that g = 1. Hence, gcd ( ab, c) = g = 1. This
shows that ab is coprime to c, and we have proved Theorem 3.5.8.

Example 3.5.9. Each of the numbers 3 and 4 is coprime to 5. Thus, Theorem
3.5.8 (applied to a = 3, b = 4 and c = 5) yields that 3 · 4 is coprime to 5.
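A quick numerical companion to Theorem 3.5.8 and Example 3.5.9, written as a Python sketch (assuming, as before, that math.gcd agrees with our gcd; the range is an arbitrary choice of ours):

```python
from math import gcd

# Example 3.5.9: 3 and 4 are both coprime to 5, and so is their product 12.
print(gcd(3, 5), gcd(4, 5), gcd(3 * 4, 5))

# Theorem 3.5.8 for all triples (a, b, c) in a small range with a, b coprime to c.
R = range(-20, 21)
product_ok = all(gcd(a * b, c) == 1
                 for a in R for b in R for c in R
                 if gcd(a, c) == 1 and gcd(b, c) == 1)
print(product_ok)
```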

Again, Theorem 3.5.8 can be viewed within the “independence” paradigm: If each
of a and b is coprime to c, then so should be ab, because any “dependence” between

ab and c should come from a or from b. Alternatively, if you think of coprimality as
an analogue of orthogonality, then you can view Theorem 3.5.8 as an analogue of the
fact that if two vectors $\vec{a}$ and $\vec{b}$ are both orthogonal to a given vector
$\vec{c}$, then so is their sum $\vec{a} + \vec{b}$. Again, none of these metaphors
should be mistaken for a proof of Theorem 3.5.8.
Theorems 3.5.4, 3.5.6 and 3.5.8 can be generalized, dropping some of the coprimality
assumptions (but leading to less memorable results). Here is the generalization of
Theorem 3.5.4:

Theorem 3.5.10. Let a, b, c ∈ Z satisfy a | c and b | c. Then, ab | gcd ( a, b) · c.

Proof. Read our above proof of Theorem 3.5.4 until the point where it shows that ab |
|c| · gcd ( a, b). Now, observe that |c| divides c (since |c| is either c or −c), and thus
|c| · gcd ( a, b) divides c · gcd ( a, b). Hence,

ab | |c| · gcd ( a, b) | c · gcd ( a, b) = gcd ( a, b) · c.

This proves Theorem 3.5.10.

Here is the generalization of Theorem 3.5.6:

Theorem 3.5.11. Let a, b, c ∈ Z satisfy a | bc. Then, a | gcd ( a, b) · c.

Proof. Read our above proof of Theorem 3.5.6 until the point where it shows that a |
|c| · gcd ( a, b). Now, observe that |c| divides c (since |c| is either c or −c), and thus
|c| · gcd ( a, b) divides c · gcd ( a, b). Hence,

a | |c| · gcd ( a, b) | c · gcd ( a, b) = gcd ( a, b) · c.

This proves Theorem 3.5.11.
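The generalized statements (Theorems 3.5.10 and 3.5.11) can be tested in the same brute-force spirit. In the Python sketch below, divides is our own helper, and the instance a = 6, b = 4, c = 12 is the non-coprime situation from Example 3.5.5: the product 6 · 4 = 24 does not divide 12, but it does divide gcd (6, 4) · 12 = 24.

```python
from math import gcd

def divides(m: int, a: int) -> bool:
    """Return True if m | a (with the convention that 0 divides only 0)."""
    return a == 0 if m == 0 else a % m == 0

print(divides(6 * 4, 12), divides(6 * 4, gcd(6, 4) * 12))

R = range(-12, 13)
# Theorem 3.5.10: a | c and b | c imply ab | gcd(a, b) * c.
thm_3_5_10 = all(divides(a * b, gcd(a, b) * c)
                 for a in R for b in R for c in R
                 if divides(a, c) and divides(b, c))
# Theorem 3.5.11: a | bc implies a | gcd(a, b) * c.
thm_3_5_11 = all(divides(a, gcd(a, b) * c)
                 for a in R for b in R for c in R
                 if divides(a, b * c))
print(thm_3_5_10, thm_3_5_11)
```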

3.5.3. Reducing a fraction


Here is one more property of gcds:

Theorem 3.5.12. Let a and b be two integers that are not both 0. Let g =
gcd ( a, b). Then, the integers a/g and b/g are coprime.

This theorem is important for understanding rational numbers. Indeed, a
ratio u/v of two integers is said to be in reduced form if u and v are coprime.
Now, Theorem 3.5.12 shows that if we start with a ratio a/b of two integers, and
cancel gcd ( a, b) from the numerator and the denominator, then the result will
be a ratio in reduced form. Hence, each rational number can be brought to a
reduced form. For example, 12/21 = (12/3) / (21/3) = 4/7.

Proof of Theorem 3.5.12. Since a and b are not both 0, we have gcd ( a, b) ≠ 0
(since 0 cannot divide any nonzero integer). Since we know that gcd ( a, b) ∈
N, we thus conclude that gcd ( a, b) > 0. In other words, g > 0 (since g =
gcd ( a, b)). Thus, a/g and b/g are well-defined. Also, from g > 0, we obtain
| g| = g.
Since g = gcd ( a, b), we have g | a and g | b. Hence, a/g and b/g are integers.
Moreover,

g = gcd ( a, b)
  = gcd ( g · ( a/g) , g · (b/g))    (since a = g · ( a/g) and b = g · (b/g))
  = | g| · gcd ( a/g, b/g)    (by Theorem 3.4.11, since a/g and b/g are integers)
  = g · gcd ( a/g, b/g)    (since | g| = g).

Dividing this equality by g, we find

1 = gcd ( a/g, b/g)    (since g ≠ 0).

This shows that a/g and b/g are coprime. Thus, Theorem 3.5.12 is proven.
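Theorem 3.5.12 is exactly what makes the usual “cancel the gcd” procedure for reducing fractions work. A minimal Python sketch (the function name reduce_fraction is ours; we assume math.gcd agrees with our gcd):

```python
from math import gcd

def reduce_fraction(a: int, b: int):
    """Return (a/g, b/g) for g = gcd(a, b), where a and b are not both 0."""
    g = gcd(a, b)
    if g == 0:
        raise ValueError("a and b must not both be 0")
    # The integer divisions are exact, since g | a and g | b.
    return a // g, b // g

u, v = reduce_fraction(12, 21)
print(u, v, gcd(u, v))  # the reduced pair and its gcd, which should be 1
```

By Theorem 3.5.12, the two returned integers are always coprime; the last printed value illustrates this for 12/21.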

3.6. Prime numbers


3.6.1. Definition
The following is one of the most famous concepts in mathematics:

Definition 3.6.1. An integer n > 1 is said to be prime (or a prime) if the only
positive divisors of n are 1 and n.

The first few primes (= prime numbers) are


2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43.
It can be shown that there are infinitely many primes (see Exercise 3.6.1 (b)
for one proof).
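Definition 3.6.1 translates directly into a (very inefficient) Python test: list all positive divisors of n and compare with [1, n]. The function names below are ours, and this is merely a sketch that follows the definition literally.

```python
def positive_divisors(n: int) -> list:
    """Return the list of all positive divisors of a positive integer n."""
    return [d for d in range(1, n + 1) if n % d == 0]

def is_prime(n: int) -> bool:
    """Definition 3.6.1: n > 1, and the only positive divisors of n are 1 and n."""
    return n > 1 and positive_divisors(n) == [1, n]

print([n for n in range(45) if is_prime(n)])
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43]
```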

3.6.2. The friend-or-foe lemma


The first property of primes that we will show is an important result that we
call the friend-or-foe lemma:

Lemma 3.6.2 (friend-or-foe lemma). Let p be a prime. Let n ∈ Z. Then, n is
either divisible by p or coprime to p, but not both.

Proof. The number p is prime, and thus its only positive divisors are 1 and p.
Since gcd (n, p) is a positive divisor of p (this is easy to see; cf. footnote 35), we thus conclude
that gcd (n, p) must be either 1 or p. So we are in one of the following two cases:
Case 1: We have gcd (n, p) = 1.
Case 2: We have gcd (n, p) = p.
Let us first consider Case 1. In this case, we have gcd (n, p) = 1. In other
words, n is coprime to p. Furthermore, the greatest common divisor of n and
p is gcd (n, p) = 1; therefore, p cannot be a common divisor of n and p (since
p > 1). Thus, n is not divisible by p (since this would entail that p is a common
divisor of n and p). So we have shown that n is coprime to p and not divisible
by p. Thus, Lemma 3.6.2 is proved in Case 1.
Let us now consider Case 2. In this case, we have gcd (n, p) = p ≠ 1. Thus,
n is not coprime to p. Also, p = gcd (n, p) | n shows that n is divisible by p. So
we have shown that n is divisible by p and not coprime to p. Hence, Lemma
3.6.2 is proved in Case 2.
We have now proved Lemma 3.6.2 in both Cases 1 and 2; thus, Lemma 3.6.2
is fully proved.
(The moniker “friend-or-foe lemma” is metaphorical: You can think of inte-
gers that are divisible by p as “friends of p”, and think of integers coprime to
p as “foes of p”. Thus, a prime number cleanly divides the integers into its
“friends” and its “foes”. In contrast, the non-prime number 4 has a more “nu-
anced” relationship with certain integers such as 2 (since 2 is neither divisible
by 4 nor coprime to 4).)
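The “exactly one of the two” nature of Lemma 3.6.2 can be expressed as an exclusive or. In the Python sketch below, the name friend_or_foe is ours; the function makes sense for any integer p > 1, which lets us observe how the lemma fails for the non-prime 4.

```python
from math import gcd

def friend_or_foe(p: int, n: int) -> bool:
    """Return True if exactly one of "p | n" and "gcd(n, p) = 1" holds."""
    divisible = n % p == 0
    coprime = gcd(n, p) == 1
    return divisible != coprime  # "!=" on booleans is exclusive or

primes = [2, 3, 5, 7, 11, 13]
print(all(friend_or_foe(p, n) for p in primes for n in range(-50, 51)))
print(friend_or_foe(4, 2))  # 2 is neither divisible by 4 nor coprime to 4
```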

3.6.3. There are infinitely many primes, and some more exercises

Exercise 3.6.1. Let ( a_0 , a_1 , a_2 , . . .) be a sequence of integers defined recursively
by
a_n = 1 + a_0 a_1 · · · a_{n−1} for all n ≥ 0.
(This sequence has been studied in Exercise 2.3.5.)
(a) Prove that gcd ( a_n , a_m ) = 1 for any two distinct integers n, m ∈ N.
For each n ∈ N, let p_n be a prime that divides a_n . (Such a prime exists,
since a_n = 1 + a_0 a_1 · · · a_{n−1} ≥ 1 + 1 > 1, because the product
a_0 a_1 · · · a_{n−1} is ≥ 1. Of course, there will often be several
choices. In this case, just choose one.)
(b) Prove that the primes p_0 , p_1 , p_2 , . . . are distinct.
35 Proof. The number gcd (n, p) is a divisor of p, and thus is nonzero (since 0 does not divide
p). Furthermore, gcd (n, p) is nonnegative (since any gcd is nonnegative). Thus, gcd (n, p)
is positive. Hence, gcd (n, p) is a positive divisor of p.

Exercise 3.6.1 (b) shows that there are infinitely many primes. This is a famous
result of Euclid; many other proofs of it are known (see, e.g., [Conrad22]).
All primes except for 2 are odd. Thus, the distances between consecutive
primes (except for 2 and 3) are always even. Besides this, however, these
distances are rather unpredictable. For instance, the two consecutive primes 41
and 43 are a distance of 2 apart, whereas the two consecutive primes 113 and
127 are a distance of 14 apart. Even some very simple-sounding questions, such
as “are there infinitely many pairs of consecutive primes at a distance of 2 from
each other?” (such primes are called twin primes) are so far unresolved (this
one is called the twin primes conjecture). At least, one can show that three
consecutive primes cannot be at distances of 2 from each other, with the single
exception of 3, 5, 7:

Exercise 3.6.2. Let p be a prime such that p − 2 and p + 2 are also prime.
Prove that p = 5.
[Hint: Consider the remainders upon division by 6.]

Exercise 3.6.3. Let p be a prime larger than 3. Prove that p^2 ≡ 1 mod 24.
[Hint: Recall some older problems. Also note that the integers 3 and 8 are
coprime.]

The following exercise helps in checking whether a given integer n is prime:

Exercise 3.6.4. Let n be an integer such that n > 1 but n is not a prime. Let d
be the smallest divisor of n that is larger than 1. Prove that d^2 ≤ n.
(You can use standard properties of inequalities – e.g., the equivalence
(u ≤ v) ⇐⇒ (u^2 ≤ v^2) when u and v are positive.)
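To illustrate how Exercise 3.6.4 helps with primality checking: if n > 1 is not prime, then (by the claim of the exercise) its smallest divisor d > 1 satisfies d^2 ≤ n, so it suffices to try candidates d with d^2 ≤ n. The Python sketch below relies on this claim without proving it; the function names are ours.

```python
def smallest_divisor_above_1(n: int) -> int:
    """For n > 1, return the smallest divisor of n that is larger than 1."""
    d = 2
    while d * d <= n:       # only candidates with d^2 <= n need to be tried
        if n % d == 0:
            return d
        d += 1
    return n                # no candidate worked, so n itself is the answer

def is_prime(n: int) -> bool:
    """n is prime if and only if n > 1 and its smallest divisor above 1 is n."""
    return n > 1 and smallest_divisor_above_1(n) == n

print([n for n in range(2, 50) if is_prime(n)])
print(smallest_divisor_above_1(91))  # 91 = 7 * 13
```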

3.6.4. Binomial coefficients and primes


The friend-or-foe lemma has myriad applications. As a first example, recall
Pascal’s triangle (which we saw in Section 2.4):

n = 0 →  1
n = 1 →  1  1
n = 2 →  1  2  1
n = 3 →  1  3  3  1
n = 4 →  1  4  6  4  1
n = 5 →  1  5  10  10  5  1
n = 6 →  1  6  15  20  15  6  1
n = 7 →  1  7  21  35  35  21  7  1
n = 8 →  1  8  28  56  70  56  28  8  1

(In each row, the entries correspond to k = 0, 1, . . . , n from left to right.)

One property of Pascal’s triangle that you might have already noticed is the
following: All entries in its n = 7 row except for the two 1’s (i.e., all the binomial
coefficients $\binom{7}{1}, \binom{7}{2}, \ldots, \binom{7}{6}$) are divisible by 7; all entries in the n = 5 row
except for the two 1’s are divisible by 5; likewise for the n = 3 and n = 2 rows.
The pattern here can be generalized to any prime number instead of 7, 5, 3, 2:

Theorem 3.6.3. Let p be a prime. Let k ∈ {1, 2, . . . , p − 1}. Then, $p \mid \binom{p}{k}$.

Proof. Exercise 2.6.3 (a) (applied to n = p) yields

\[ k \binom{p}{k} = p \binom{p-1}{k-1} , \]

where $\binom{p-1}{k-1}$ is an integer (by Theorem 2.5.9). Thus, $p \mid k \binom{p}{k}$.
From k ∈ {1, 2, . . . , p − 1}, we furthermore obtain p ∤ k (because if we had
p | k, then Proposition 3.1.4 (b) would entail | p| ≤ |k |, which would contradict
|k| = k ≤ p − 1 < p = | p|). In other words, k is not divisible by p.

But the friend-or-foe lemma (Lemma 3.6.2, applied to n = k) says that k is
either divisible by p or coprime to p. Since k is not divisible by p, we thus
conclude that k must be coprime to p. In other words, p is coprime to k. Hence,
from $p \mid k \binom{p}{k}$, we obtain $p \mid \binom{p}{k}$ using the coprime removal theorem
(Theorem 3.5.6, applied to a = p and b = k and c = $\binom{p}{k}$). This proves
Theorem 3.6.3.
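Theorem 3.6.3 is easy to observe numerically. The following Python sketch uses math.comb (available in Python 3.8 and later) for the binomial coefficients; the list of primes and the comparison with the non-prime 4 are our own choices for illustration.

```python
from math import comb

# The n = 7 row of Pascal's triangle:
print([comb(7, k) for k in range(8)])

# For each prime p below, all entries of row p except the two 1's are divisible by p.
primes = [2, 3, 5, 7, 11, 13]
print(all(comb(p, k) % p == 0 for p in primes for k in range(1, p)))

# For the non-prime 4, the pattern breaks: these are the remainders of
# comb(4, 1), comb(4, 2), comb(4, 3) modulo 4.
print([comb(4, k) % 4 for k in range(1, 4)])
```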

Theorem 3.6.3 shows that if p is a prime, then all the binomial coefficients
$\binom{p}{i}$ in the p-th row of Pascal’s triangle are divisible by p (except for the two
1’s on the borders of the triangle). The following exercise, in contrast, claims
that the binomial coefficients in the ( p − 1)-st row are alternatingly congruent
to 1 and to −1 modulo p:

Exercise 3.6.5. Let p be a prime. Prove that

\[ \binom{p-1}{i} \equiv (-1)^i \bmod p \qquad \text{for each } i \in \{0, 1, \ldots, p-1\} . \]

[Hint: What connects the three binomial coefficients $\binom{p-1}{i}$, $\binom{p-1}{i-1}$
and $\binom{p}{i}$?]

3.6.5. Fermat’s little theorem


It is easy to see that every integer a satisfies a^2 ≡ a mod 2. Indeed, the difference
a^2 − a = a ( a − 1) is divisible by 2, since at least one of the two consecutive
integers a and a − 1 must be even and thus contributes a factor of 2 to the
product a ( a − 1).
Likewise, every integer a satisfies a^3 ≡ a mod 3, since the difference a^3 − a =
( a − 1) a ( a + 1) is divisible by 3 (because at least one of the three consecutive
integers a − 1, a and a + 1 must be divisible by 3).
This pattern does not persist for 4: Indeed, a^4 ≡ a mod 4 does not hold for
a = 2. However, for 5, the pattern emerges again: Every integer a satisfies
a^5 ≡ a mod 5. This is not as easy to see as the analogous claims for a^2 and
a^3 (since a^5 − a does not factor into linear factors any more), but still can be
checked with a bit of work (there are only 5 possible values for the remainder
a%5, and each of these values allows us to check a^5 ≡ a mod 5 by reducing both
sides modulo 5).
The pattern is lost again for 6 (the congruence a^6 ≡ a mod 6 fails for a = 2),
but reemerges for 7.
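The case-by-case checking described above (only the remainder of a matters, so finitely many values of a suffice) is exactly the kind of work a computer does well. Here is a Python sketch; the function name pattern_holds is ours, and pow(a, n, n) is Python's built-in way of computing the remainder of a^n upon division by n.

```python
def pattern_holds(n: int) -> bool:
    """Check a^n ≡ a mod n for all a, by testing the remainders a = 0, 1, ..., n-1."""
    return all(pow(a, n, n) == a % n for a in range(n))

print([n for n in range(2, 8) if pattern_holds(n)])  # [2, 3, 5, 7]
print(pow(2, 4, 4), pow(2, 6, 6))  # remainders of 2^4 mod 4 and of 2^6 mod 6
```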
As you may have guessed, there is a general result here:

Theorem 3.6.4 (Fermat’s Little Theorem). Let p be a prime. Let a ∈ Z. Then,

a^p ≡ a mod p.

Proof. We shall induct on a. This will only cover the case a ≥ 0, so we will have
to handle the case a < 0 by a separate argument afterwards.
Base case: The congruence a^p ≡ a mod p clearly holds for a = 0 (since 0^p =
0 ≡ 0 mod p).
Induction step: Let a ∈ N. Assume (as the induction hypothesis) that a^p ≡
a mod p. We must prove that ( a + 1)^p ≡ a + 1 mod p.
But the binomial formula (Theorem 2.6.1) yields

\[
\begin{aligned}
(a+1)^p &= \sum_{k=0}^{p} \binom{p}{k} a^k \, 1^{p-k} = \sum_{k=0}^{p} \binom{p}{k} a^k
  && (\text{since } 1^{p-k} = 1) \\
&= \binom{p}{0} a^0 + \sum_{k=1}^{p-1} \binom{p}{k} a^k + \binom{p}{p} a^p
  && (\text{here, we have split off the addends for } k = 0 \text{ and for } k = p \text{ from the sum}) \\
&= 1 + \sum_{k=1}^{p-1} \binom{p}{k} a^k + a^p
  && (\text{since } \binom{p}{0} = 1 \text{ and } a^0 = 1 \text{ and } \binom{p}{p} = 1) \\
&= \sum_{k=1}^{p-1} \binom{p}{k} a^k + a^p + 1 .
\end{aligned}
\]

In other words,

\[ (a+1)^p - (a^p + 1) = \sum_{k=1}^{p-1} \binom{p}{k} a^k . \tag{35} \]

However, Theorem 3.6.3 shows that each k ∈ {1, 2, . . . , p − 1} satisfies
$p \mid \binom{p}{k} \mid \binom{p}{k} a^k$. In other words, $\binom{p}{k} a^k$ is a multiple of p for each
k ∈ {1, 2, . . . , p − 1}. Hence, $\sum_{k=1}^{p-1} \binom{p}{k} a^k$ is a sum of multiples of p, and thus
itself a multiple of p. That is, we have $p \mid \sum_{k=1}^{p-1} \binom{p}{k} a^k$. In view of (35), we can
rewrite this as p | ( a + 1)^p − ( a^p + 1). In other words,

( a + 1)^p ≡ a^p + 1 mod p. (36)

However, the induction hypothesis says that a^p ≡ a mod p. Adding the obvious
congruence 1 ≡ 1 mod p to this, we obtain

a^p + 1 ≡ a + 1 mod p.

Combining this congruence with (36), we obtain

( a + 1)^p ≡ a^p + 1 ≡ a + 1 mod p,

which shows that ( a + 1)^p ≡ a + 1 mod p (by the transitivity of congruence).
This completes the induction step.
Thus, Theorem 3.6.4 is proved for all a ≥ 0. It remains to prove it for all a < 0
now. This can be done with a neat trick:
Let a ∈ Z satisfy a < 0. Then, we must prove that a^p ≡ a mod p.
But we already know that b^p ≡ b mod p for all integers b ≥ 0 (because we
have already proved Theorem 3.6.4 for all a ≥ 0). We can apply this to b = a%p
(since the remainder a%p is ≥ 0), and thus obtain

( a%p)^p ≡ a%p mod p.

However, Proposition 3.3.11 (a) (applied to n = a and d = p) shows that
a%p ∈ {0, 1, . . . , p − 1} and a%p ≡ a mod p. Taking the congruence a%p ≡
a mod p to the p-th power, we obtain ( a%p)^p ≡ a^p mod p (we have here used
Exercise 3.2.1). Therefore, a^p ≡ ( a%p)^p mod p. Combining all the congruences
we have obtained so far, we obtain

a^p ≡ ( a%p)^p ≡ a%p ≡ a mod p,

from which we can conclude that a^p ≡ a mod p (by transitivity of congruence).
Thus, we have proved Theorem 3.6.4 for a < 0. This completes the proof of
Theorem 3.6.4.
Fermat’s Little Theorem has a bunch of applications, some of which we might
see later.
One wrinkle in the pattern we have discussed above: Theorem 3.6.4 shows that
every prime p satisfies a^p ≡ a mod p for all a ∈ Z. But there are some positive integers
p that satisfy this even though they are not prime! The smallest such integers are
1, 561, 1105, 1729, 2465. See Carmichael numbers for more details.
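The list of these non-prime exceptions can be recomputed by a direct search. The Python sketch below is one way to do this; the function names and the search bound 2500 are ours, and math.isqrt requires Python 3.8 or later. As in the proof above, it suffices to test the remainders a = 0, 1, . . . , n − 1.

```python
from math import isqrt

def is_prime(n: int) -> bool:
    """Trial division: n > 1 is prime if no d with 2 <= d and d*d <= n divides n."""
    return n > 1 and all(n % d != 0 for d in range(2, isqrt(n) + 1))

def satisfies_fermat(n: int) -> bool:
    """Check a^n ≡ a mod n for all integers a (by testing a = 0, 1, ..., n-1)."""
    return all(pow(a, n, n) == a % n for a in range(n))

exceptions = [n for n in range(1, 2500) if not is_prime(n) and satisfies_fermat(n)]
print(exceptions)  # [1, 561, 1105, 1729, 2465]
```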

3.6.6. Prime divisor separation theorem


You can think of the primes as “inseparable” positive integers: They cannot be
written as products of two smaller positive integers. (Of course, 1 also has this
property but does not count as a prime. In a way, 1 is inseparable because there
is nothing to separate, so it doesn’t count as a prime.)
One useful consequence of this “inseparability” is that if a prime p divides a
product ab of two integers, then it must divide one of the two factors a and b,
since (speaking heuristically) it cannot be “separated” into a part that divides
a and a part that divides b. Never mind that this is not a valid argument; the
conclusion is a true fact:

Theorem 3.6.5 (prime divisor separation theorem). Let p be a prime. Let
a, b ∈ Z be such that p | ab. Then, p | a or p | b.

Proof of Theorem 3.6.5. We shall prove the claim of Theorem 3.6.5 in the follow-
ing equivalent form: “If p ∤ a, then p | b.”
Assume that p ∤ a. We must then prove that p | b.
The friend-or-foe lemma (Lemma 3.6.2) yields that a is either divisible by p
or coprime to p. Thus, a is coprime to p (since p ∤ a). In other words, p is
coprime to a. Hence, we can use the coprime removal theorem (Theorem
3.5.6, applied to p, a and b instead of a, b and c) to obtain p | b from p | ab. This
is precisely what we wanted to prove. Theorem 3.6.5 is thus proved.
Theorem 3.6.5 shows that if a prime number p divides a product ab, then it
must divide a or b (or both). In contrast, a non-prime number like 4 can divide
a product ab without dividing a or b. For example, 4 | 2 · 6 but 4 ∤ 2 and 4 ∤ 6.
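Both the theorem and the counterexample for 4 can be checked in a few lines of Python. In this sketch, the name separates is ours; it encodes the implication “if p | ab, then p | a or p | b” for a single triple.

```python
def separates(p: int, a: int, b: int) -> bool:
    """Return True if the implication (p | ab) => (p | a or p | b) holds."""
    return (a * b) % p != 0 or a % p == 0 or b % p == 0

primes = [2, 3, 5, 7, 11, 13]
R = range(-30, 31)
print(all(separates(p, a, b) for p in primes for a in R for b in R))
print(separates(4, 2, 6))  # 4 | 2 * 6, but 4 divides neither 2 nor 6
```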
We can extend Theorem 3.6.5 to products of several factors:

Corollary 3.6.6 (prime divisor separation theorem for k factors). Let p be a
prime. Let a_1 , a_2 , . . . , a_k ∈ Z be such that p | a_1 a_2 · · · a_k . Then, p | a_i for some
i ∈ {1, 2, . . . , k }.
(In words: If a prime divides a product of several integers, then it must
divide at least one of the factors.)

Proof sketch. Induct on k. In the induction step, use Theorem 3.6.5. (The base
case is the case k = 0, in which case Corollary 3.6.6 is vacuously true because
p ∤ 1.) (See [Grinbe19b, Proposition 2.13.7] for this proof in detail.)
The following exercise is another form of Fermat’s Little Theorem:

Exercise 3.6.6. Let p be a prime. Let a ∈ Z be an integer not divisible by p.
Prove that a^{p−1} ≡ 1 mod p.

3.6.7. p-valuations: definition


We will need the following simple lemma:

Lemma 3.6.7. Let p be a prime. Let n be a nonzero integer. Then, there exists
a largest m ∈ N such that p^m | n.

Proof. The relation p^m | n means that n / p^m ∈ Z. In other words, it means that
we can divide n by p at least m times without obtaining a non-integer. So the
claim of Lemma 3.6.7 is saying that there is a largest number of times that we
can divide n by p without obtaining a non-integer. But this is clear: Every time

we divide n by p, the absolute value |n| decreases (since p > 1), and obviously
this cannot go on forever without eventually yielding a non-integer (see footnote 36).
(See [Grinbe19b, Proof of Lemma 2.13.22] for a more formal proof of Lemma
3.6.7.)
Lemma 3.6.7 allows us to make the following definition:

Definition 3.6.8. Let p be a prime.
(a) Let n be a nonzero integer. Then, v_p (n) shall denote the largest m ∈ N
such that p^m | n. (This is well-defined by Lemma 3.6.7. Thus, v_p (n) is the
number of times that you can divide n by p without getting a non-integer.)
This number v_p (n) will be called the p-valuation (or the p-adic valuation)
of n.
(b) In order to have v_p (n) defined for all integers n (as opposed to just for
nonzero n), we also define v_p (0) to be ∞ (because 0 can be divided by p an
arbitrary number of times without any changes). This symbol ∞ is not an
actual number, but we shall pretend that it behaves like a number at least in
some regards. In particular, we will eventually add or compare it to other
numbers. In doing so, we shall follow the rules that

k+∞ = ∞+k = ∞ for all k ∈ Z;


∞ + ∞ = ∞;
k < ∞ and ∞ > k for all k ∈ Z.

Thus, ∞ acts like a “mythical number that is larger than any actual number”.
We can keep up this charade as long as we only add and compare, but never
subtract ∞ from anything (since 1 + ∞ = ∞ would turn into 1 = 0 if you
subtracted ∞).

Here are some examples:

• We have
v_3 (99) = 2 (since 3^2 | 99 but 3^3 ∤ 99);
v_3 (98) = 0 (since 3^0 | 98 but 3^1 ∤ 98);
v_3 (96) = 1 (since 3^1 | 96 but 3^2 ∤ 96);
v_3 (0) = ∞.

We can restate the definition of v_p (n) in yet another way: If p is a prime and
n is a positive integer, then v_p (n) is the number of zeroes at the end of the
36 Of course, we are also tacitly using the fact that n is an integer in the first place, so that m = 0
does satisfy p^m | n (since p^0 = 1 | n).

base-p representation of the number n. For example, the base-2 representation


of the number 344 is 101011000, which has three zeroes at its end (the other
zeroes don’t count!), so that v2 (344) = 3.
Note that Definition 3.6.8 can be generalized to any positive integer p > 1
(prime or not). But most of the useful properties of p-valuations hold only
when p is prime.
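The restatement via base-p representations can be tried out on small numbers. Here is a minimal Python sketch; the helper names `base_p_digits` and `trailing_zeroes` are invented for this illustration and do not come from the text.

```python
def base_p_digits(n: int, p: int) -> list[int]:
    """Digits of the positive integer n in base p, most significant digit first."""
    digits = []
    while n > 0:
        n, r = divmod(n, p)  # r is the next digit, read from the right
        digits.append(r)
    return digits[::-1]

def trailing_zeroes(digits: list[int]) -> int:
    """Count the zeroes at the end of a list of digits."""
    count = 0
    for d in reversed(digits):
        if d != 0:
            break
        count += 1
    return count

digits = base_p_digits(344, 2)
print("".join(map(str, digits)), trailing_zeroes(digits))
# prints: 101011000 3
```

Each trailing zero corresponds to one division by p that leaves an integer, so the count agrees with the definition of the p-valuation.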

3.6.8. p-valuations: basic properties


Let us now discuss some basic properties of p-valuations. We begin with a
lemma that is almost trivial, but quite helpful:

Lemma 3.6.9. Let p be a prime. Let i ∈ N and n ∈ Z. Then, $p^i \mid n$ if and only
if $v_p(n) \geq i$.

Proof. If n = 0, then this is clear (because in this case, we have both
$p^i \mid 0 = n$ and $v_p(n) = v_p(0) = \infty \geq i$).
It remains to deal with the case n ≠ 0. In this case, $v_p(n)$ is defined as the
largest $m \in \mathbb{N}$ such that $p^m \mid n$. Thus, in this case, we have
$p^i \mid p^{v_p(n)} \mid n$ whenever $i \leq v_p(n)$, whereas $p^i \nmid n$ whenever $i > v_p(n)$.
In other words, we have $p^i \mid n$ if and only if $i \leq v_p(n)$. In other words, we
have $p^i \mid n$ if and only if $v_p(n) \geq i$. Thus, Lemma 3.6.9 is proved in this case as well.
Recall some standard notations: For any two numbers x and y, we let min { x, y}
denote the smaller of these two numbers, and we let max { x, y} denote the
larger of these two numbers. More generally, if S is a set of numbers, then
min S means the smallest element of S (if it exists), and max S means the largest
element of S (if it exists). We extend these notations to sets that include ∞ in the
obvious way (recalling that ∞ is larger than any integer). Thus, in particular,

max {∞, k } = max {k, ∞} = ∞ for all k ∈ Z ∪ {∞} ;


min {∞, k } = min {k, ∞} = k for all k ∈ Z ∪ {∞} .

Now, we can state a bunch of rather important properties of p-valuations:

Theorem 3.6.10 (basic properties of p-valuations). Let p be a prime. Then:

(a) We have $v_p(ab) = v_p(a) + v_p(b)$ for any a, b ∈ Z.
(b) We have $v_p(a + b) \geq \min\{v_p(a), v_p(b)\}$ for any a, b ∈ Z.
(c) We have $v_p(1) = 0$.
(d) We have $v_p(p) = 1$.
(e) We have $v_p(q) = 0$ for any prime q ≠ p.

Proof. (a) Let a, b ∈ Z. We must prove that $v_p(ab) = v_p(a) + v_p(b)$.
If a = 0, then this is saying that $\infty = \infty + v_p(b)$, which follows from our
rules for ∞ (specifically, from the rules saying that ∞ + k = ∞ for all k ∈ Z and
that ∞ + ∞ = ∞). Likewise, we can prove our claim if b = 0.
It thus remains to handle the case when neither a nor b is 0. So let us consider
this case. Since a and b are nonzero, the numbers $v_p(a)$ and $v_p(b)$ are
nonnegative integers. Let us give them names: We set

\[ n = v_p(a) \qquad \text{and} \qquad m = v_p(b). \]

Thus, $p^n \mid a$ and $p^m \mid b$. In other words, there are integers x and y such that
$a = p^n x$ and $b = p^m y$. Consider these x and y.
If we had $p \mid x$, then we would readily obtain $p^{n+1} \mid a$ (because $p \mid x$ entails
that x = pz for some integer z, and thus this integer z must satisfy
$a = p^n x = p^n \cdot pz = p^{n+1} z$) and therefore $v_p(a) \geq n + 1$ (by Lemma 3.6.9, applied
to n + 1 and a instead of i and n), which would contradict $v_p(a) = n < n + 1$. Thus, we
cannot have $p \mid x$. For similar reasons, we cannot have $p \mid y$.
However, multiplying $a = p^n x$ with $b = p^m y$, we obtain
$ab = p^n x \cdot p^m y = p^{n+m} xy$, and thus $p^{n+m} \mid ab$. Therefore, $v_p(ab) \geq n + m$
(by Lemma 3.6.9, applied to ab and n + m instead of n and i).
Now, we shall show that this inequality is an equality. To do so, we must
show that $p^{n+m+1} \nmid ab$.
To prove this, we assume the contrary. Thus, $p^{n+m+1} \mid ab = p^{n+m} xy$. Dividing
both sides of this divisibility by $p^{n+m}$, we obtain $p \mid xy$.
However, the prime divisor separation theorem (Theorem 3.6.5) says that if
the prime number p divides a product of two integers, then it must divide one
of these two integers. Therefore, from $p \mid xy$, we obtain either $p \mid x$ or $p \mid y$
(since x and y are integers). But this contradicts the fact that we cannot have
$p \mid x$ and we cannot have $p \mid y$. This contradiction shows that our assumption
must have been wrong. Thus, we have shown that $p^{n+m+1} \nmid ab$.
So we know that $p^{n+m} \mid ab$ but $p^{n+m+1} \nmid ab$. In other words, the largest
i ∈ N that satisfies $p^i \mid ab$ is n + m. In other words, $v_p(ab) = n + m$ (by the
definition of $v_p(ab)$). Since $n = v_p(a)$ and $m = v_p(b)$, we can rewrite this as
$v_p(ab) = v_p(a) + v_p(b)$. This proves Theorem 3.6.10 (a).
(b) Let a, b ∈ Z. We must prove that $v_p(a + b) \geq \min\{v_p(a), v_p(b)\}$.
If $\min\{v_p(a), v_p(b)\} = \infty$, then this inequality boils down to ∞ ≥ ∞ (because
$\min\{v_p(a), v_p(b)\} = \infty$ yields $v_p(a) = \infty$ and $v_p(b) = \infty$, so that a = 0
and b = 0, and thus a + b = 0 as well, which in turn leads to $v_p(a + b) = \infty$),
which is true.
Thus, it remains to handle the case when $\min\{v_p(a), v_p(b)\} \neq \infty$. Thus,
$\min\{v_p(a), v_p(b)\} \in \mathbb{N}$. Set $k = \min\{v_p(a), v_p(b)\}$. Then, $k \leq v_p(a)$ and
$k \leq v_p(b)$. From $k \leq v_p(a)$, we obtain $v_p(a) \geq k$ and thus $p^k \mid a$ (by Lemma
3.6.9, applied to n = a and i = k). Similarly, $p^k \mid b$. Thus, a and b are multiples
of $p^k$. Hence, their sum a + b is also a multiple of $p^k$. In other words,
$p^k \mid a + b$.
Using Lemma 3.6.9, this in turn entails $v_p(a + b) \geq k = \min\{v_p(a), v_p(b)\}$.
Thus, Theorem 3.6.10 (b) is proved.
(c) This follows from $p^0 = 1 \mid 1$ and $p^1 = p \nmid 1$.
(d) This follows from $p^1 = p \mid p$ and $p^2 \nmid p$.
(e) Let q ≠ p be a prime. Then, the only positive divisors of q are 1 and q
(since q is a prime). Hence, p is not a positive divisor of q (since p ≠ 1 and
p ≠ q). Therefore, p is not a divisor of q (since p is positive). In other words,
$p \nmid q$. Now, from $p^0 = 1 \mid q$ and $p^1 = p \nmid q$, we obtain $v_p(q) = 0$. This proves
Theorem 3.6.10 (e).

Corollary 3.6.11. Let p be a prime. Then,

\[ v_p(a_1 a_2 \cdots a_k) = v_p(a_1) + v_p(a_2) + \cdots + v_p(a_k) \]

for any k integers $a_1, a_2, \ldots, a_k$.

Proof. Induct on k. The base case uses $v_p(1) = 0$. The induction step relies on
Theorem 3.6.10 (a).
Note that Theorem 3.6.10 (a) would fail if p were allowed to be non-prime.
For instance, $v_4(2 \cdot 2) = 1$ but $v_4(2) + v_4(2) = 0 + 0 = 0$.
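Parts (a) and (b) of Theorem 3.6.10, and the failure of part (a) for a non-prime base, can be observed experimentally on a small range of integers. The sketch below is a sanity check, not a proof; the helper names are hypothetical, and `math.inf` again plays the role of ∞.

```python
import math
from itertools import product

def valuation(p, n):
    """v_p(n) as in Definition 3.6.8, also usable for non-prime p > 1."""
    if n == 0:
        return math.inf
    m = 0
    while n % p == 0:
        n //= p
        m += 1
    return m

def multiplicative_on(p, values):
    """Check v_p(ab) = v_p(a) + v_p(b) for all pairs (a, b) from values."""
    return all(valuation(p, a * b) == valuation(p, a) + valuation(p, b)
               for a, b in product(values, repeat=2))

def min_bound_on(p, values):
    """Check v_p(a + b) >= min{v_p(a), v_p(b)} for all pairs (a, b) from values."""
    return all(valuation(p, a + b) >= min(valuation(p, a), valuation(p, b))
               for a, b in product(values, repeat=2))

values = range(-30, 31)
print(multiplicative_on(3, values))  # True (p = 3 is prime)
print(min_bound_on(3, values))       # True
print(multiplicative_on(4, values))  # False (fails e.g. for a = b = 2)
```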

3.6.9. Back to Hanoi


Let us take a closer look at 2-valuations. The sequence

\[ (v_2(1), v_2(2), v_2(3), v_2(4), v_2(5), \ldots) = (0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, 4, \ldots) \]

is called the ruler sequence, as it resembles the pattern of markings on a ruler
(a small marking at every inch, a slightly larger marking every 2 inches, an
even larger marking every 4 inches, and so on). It tends to appear every once
in a while in seemingly unexpected places. Case in point:

Proposition 3.6.12. Let n ∈ N.
In Section 1.1, we proposed a strategy for solving the Tower of Hanoi
puzzle with n disks. Let $S_n$ be this strategy.
Let $k \in \{1, 2, \ldots, 2^n - 1\}$. Then, the k-th move of the strategy $S_n$ moves the
$(v_2(k) + 1)$-th smallest disk.

Thus, in particular, every odd-numbered move (i.e., the 1st, the 3rd, the 5th
move, and so on) moves the smallest disk (since $v_2(k) = 0$ when k is odd).
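Before proving the proposition, one can watch it happen. The following Python sketch assumes the recursive strategy in the form recalled in the proof (move the n − 1 smaller disks out of the way, move the largest disk, move the n − 1 smaller disks back on top); the function names and the peg numbering in the default arguments are choices made for this illustration.

```python
def hanoi_moves(n, source=1, target=3, spare=2):
    """Yield the moves (disk, from_peg, to_peg) of the recursive strategy.

    Disks are numbered by size, so disk 1 is the smallest and disk n the largest.
    """
    if n == 0:
        return
    yield from hanoi_moves(n - 1, source, spare, target)  # clear the largest disk
    yield (n, source, target)                             # move the largest disk
    yield from hanoi_moves(n - 1, spare, target, source)  # restack on top of it

def v2(k):
    """2-valuation of a positive integer k."""
    count = 0
    while k % 2 == 0:
        k //= 2
        count += 1
    return count

n = 4
disks = [disk for disk, _, _ in hanoi_moves(n)]
ruler = [v2(k) + 1 for k in range(1, 2**n)]
print(disks)           # [1, 2, 1, 3, 1, 2, 1, 4, 1, 2, 1, 3, 1, 2, 1]
print(disks == ruler)  # True
```

The list of moved disks is the ruler sequence shifted by 1, as Proposition 3.6.12 claims.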
The proof of Proposition 3.6.12 relies on the following lemma about p-valuations:

Lemma 3.6.13. Let p be a prime. Let m ∈ N. Let k be an integer such that
$p^m \nmid k$. Then, $v_p(p^m + k) = v_p(k)$.

Proof of Lemma 3.6.13. From $p^m \nmid k$, we obtain k ≠ 0, so that $v_p(k) \neq \infty$. In other words,
$v_p(k) \in \mathbb{N}$.
Let $i = v_p(k)$. Thus, i ∈ N (since $v_p(k) \in \mathbb{N}$), and the definition of $v_p(k)$ shows that
$p^i \mid k$ and $p^{i+1} \nmid k$.
If we had m ≤ i, then we would have $p^m \mid p^i \mid k$, which would contradict $p^m \nmid k$.
Thus, we cannot have m ≤ i. In other words, we have i < m. Thus, i ≤ m − 1 (since i
and m are integers), so that i + 1 ≤ m. Therefore, $p^{i+1} \mid p^m$.
From the definition of p-valuations, it follows easily that $v_p(p^m) = m$ and $v_p(-p^m) = m$.
The numbers $p^m$ and k are multiples of $p^i$ (since $p^i \mid p^{i+1} \mid p^m$ and $p^i \mid k$). Thus, their
sum $p^m + k$ is a multiple of $p^i$ as well. In other words, $p^i \mid p^m + k$.
On the other hand, let us show that $p^{i+1} \nmid p^m + k$. Indeed, assume the contrary. Thus,
$p^{i+1} \mid p^m + k$.
Therefore, the numbers $p^m + k$ and $-p^m$ are multiples of $p^{i+1}$ (since $p^{i+1} \mid p^m + k$
and $p^{i+1} \mid p^m \mid p^m \cdot (-1) = -p^m$). Hence, their sum $(p^m + k) + (-p^m)$ is a multiple of
$p^{i+1}$ as well. In other words, k is a multiple of $p^{i+1}$ (since $(p^m + k) + (-p^m) = k$). But
this contradicts $p^{i+1} \nmid k$.
This contradiction shows that our assumption was wrong. Hence, $p^{i+1} \nmid p^m + k$ is
proved.
Combining $p^i \mid p^m + k$ with $p^{i+1} \nmid p^m + k$, we see that i is the largest j ∈ N satisfying
$p^j \mid p^m + k$. In other words, $i = v_p(p^m + k)$. Hence, $v_p(p^m + k) = i = v_p(k)$. This
proves Lemma 3.6.13.

Proof of Proposition 3.6.12. We will prove Proposition 3.6.12 by induction on n:
Base case: If n = 0, then there exists no $k \in \{1, 2, \ldots, 2^n - 1\}$ (since the set
$\{1, 2, \ldots, 2^n - 1\} = \{1, 2, \ldots, 2^0 - 1\} = \{1, 2, \ldots, 0\}$ is empty in this case).
Thus, in this case, Proposition
3.6.12 is vacuously true (i.e., true because it makes a claim about non-existing objects).
Induction step: Let n be a positive integer. Assume (as the induction hypothesis) that
Proposition 3.6.12 holds for n − 1 instead of n. We must now prove that Proposition
3.6.12 holds for n as well.
So let $k \in \{1, 2, \ldots, 2^n - 1\}$ be arbitrary. We must prove that the k-th move of the
strategy $S_n$ moves the $(v_2(k) + 1)$-th smallest disk.
If $k \neq 2^{n-1}$, then $0 < |k - 2^{n-1}| < 2^{n-1}$ and thus $2^{n-1} \nmid k - 2^{n-1}$. Hence, in this
case, Lemma 3.6.13 (applied to 2, n − 1 and $k - 2^{n-1}$ instead of p, m and k) yields

\[ v_2\left(2^{n-1} + \left(k - 2^{n-1}\right)\right) = v_2\left(k - 2^{n-1}\right), \]

so that

\[ v_2\left(k - 2^{n-1}\right) = v_2\left(2^{n-1} + \left(k - 2^{n-1}\right)\right) = v_2(k) \tag{37} \]

(since $2^{n-1} + (k - 2^{n-1}) = k$). We shall use (37) only in Case 3 below, where $k > 2^{n-1}$.
Recall that the strategy $S_n$ was defined recursively: It consists of first performing the
strategy $S_{n-1}$ (but with pegs 2 and 3 swapped), then moving the largest disk (from peg
1 to peg 3), and then again performing the strategy $S_{n-1}$ (but now with pegs 1 and 2
swapped). Since strategy $S_{n-1}$ requires $2^{n-1} - 1$ moves in total, we thus conclude that

1. the first $2^{n-1} - 1$ moves of strategy $S_n$ are identical with the corresponding moves
of strategy $S_{n-1}$ (except that pegs 2 and 3 are swapped);
2. the $2^{n-1}$-th move of strategy $S_n$ consists in moving the largest disk;
3. the next $2^{n-1} - 1$ moves of strategy $S_n$ (that is, the moves numbered
$2^{n-1} + 1, 2^{n-1} + 2, \ldots, 2^n - 1$) are identical with the moves of strategy $S_{n-1}$ (except
that pegs 1 and 2 are swapped).

Therefore, the k-th move of the strategy $S_n$
• moves the same disk as the k-th move of $S_{n-1}$ if $k < 2^{n-1}$;
• moves the largest disk if $k = 2^{n-1}$;
• moves the same disk as the $(k - 2^{n-1})$-th move of $S_{n-1}$ if $k > 2^{n-1}$.


We thus distinguish between the following three cases:


Case 1: We have $k < 2^{n-1}$.
Case 2: We have $k = 2^{n-1}$.
Case 3: We have $k > 2^{n-1}$.
Let us first consider Case 1. In this case, we have $k < 2^{n-1}$. Thus, the k-th move of
the strategy $S_n$ moves the same disk as the k-th move of $S_{n-1}$ (according to the first of
the three bullet points above). But our induction hypothesis shows that the latter move
moves the $(v_2(k) + 1)$-th smallest disk (since $k \in \{1, 2, \ldots, 2^n - 1\}$ and $k < 2^{n-1}$ entails
$k \in \{1, 2, \ldots, 2^{n-1} - 1\}$). Thus, the former move also moves the $(v_2(k) + 1)$-th smallest
disk. So the claim we are trying to prove has been proved in Case 1.
Let us now consider Case 2. In this case, we have $k = 2^{n-1}$. Thus, the k-th move of
the strategy $S_n$ moves the largest disk (according to the second of the three bullet points
above), i.e., the n-th smallest disk (since there are n disks in total, so the largest disk
is the n-th smallest). However, we have $k = 2^{n-1}$ and thus $v_2(k) = v_2(2^{n-1}) = n - 1$,
so that $n = v_2(k) + 1$. Thus, the k-th move of the strategy $S_n$ moves the $(v_2(k) + 1)$-th
smallest disk (because we have shown that it moves the n-th smallest disk). So the
claim we are trying to prove has been proved in Case 2.
Let us finally consider Case 3. In this case, we have $k > 2^{n-1}$. Thus, the k-th move of
the strategy $S_n$ moves the same disk as the $(k - 2^{n-1})$-th move of $S_{n-1}$ (according to the
third of the three bullet points above). But our induction hypothesis (applied to $k - 2^{n-1}$
instead of k) yields that the latter move moves the $\left(v_2(k - 2^{n-1}) + 1\right)$-th smallest disk
(since $k \in \{1, 2, \ldots, 2^n - 1\}$ and $k > 2^{n-1}$ entails $k - 2^{n-1} \in \{1, 2, \ldots, 2^{n-1} - 1\}$ quite
easily; see footnote 37). Thus, the former move moves the $\left(v_2(k - 2^{n-1}) + 1\right)$-th smallest
disk as well. In view of (37), we can restate this as follows: The former move moves the
$(v_2(k) + 1)$-th smallest disk. So the claim we are trying to prove has been proved in Case 3.
Thus, we have proved our claim in all three Cases 1, 2 and 3. In other words, we
have shown that the k-th move of the strategy $S_n$ moves the $(v_2(k) + 1)$-th smallest
disk. Hence, we have proved that Proposition 3.6.12 holds for n. This completes the
induction step. Thus, Proposition 3.6.12 is proved.
Footnote 37: Here are the details: From $k \in \{1, 2, \ldots, 2^n - 1\} \subseteq \mathbb{Z}$ and $k > 2^{n-1}$, we see
immediately that $k - 2^{n-1}$ is a positive integer. Furthermore, from $k \in \{1, 2, \ldots, 2^n - 1\}$, we obtain
$k \leq 2^n - 1 = 2 \cdot 2^{n-1} - 1 = 2^{n-1} + 2^{n-1} - 1$, so that $k - 2^{n-1} \leq 2^{n-1} - 1$. Since $k - 2^{n-1}$ is a
positive integer, this results in $k - 2^{n-1} \in \{1, 2, \ldots, 2^{n-1} - 1\}$.

The ruler sequence also has an appearance in data storage:

Remark 3.6.14. A “Tower of Hanoi” backup scheme is a backup scheme
where you have several backup drives for your system. Every odd day, you
back up to the first drive. Every even day that is not divisible by 4, you back
up to the second drive. Every day that is divisible by 4 but not by 8, you
back up to the third drive. And so on. Thus, on the k-th day, you back up
to the $(v_2(k) + 1)$-th drive. This scheme ensures that at every point in time,
you have both a fresh backup and several levels of older backups available.
(I only said “day” for simplicity; you can use any unit of time
instead. Naturally, the first drive will see the most traffic and therefore will
wear out and need replacement.)
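The rotation described in the remark is easy to tabulate. The sketch below is only an illustration under simplifying assumptions: the function name `drive_for_day` is hypothetical, days are numbered 1, 2, 3, ..., and the supply of drives is treated as unlimited (a real scheme would presumably cap the drive number at the number of drives available).

```python
def drive_for_day(k: int) -> int:
    """Drive used on day k >= 1 in the scheme above, i.e. v_2(k) + 1."""
    if k < 1:
        raise ValueError("days are numbered 1, 2, 3, ...")
    drive = 1
    while k % 2 == 0:  # each factor of 2 in k moves us one drive further
        k //= 2
        drive += 1
    return drive

print([drive_for_day(k) for k in range(1, 17)])
# prints: [1, 2, 1, 3, 1, 2, 1, 4, 1, 2, 1, 3, 1, 2, 1, 5]
```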

3.6.10. More exercises

Exercise 3.6.7. Let p be a prime, and let m ∈ N. Let a and b be two integers
such that $p^m \mid ab$ and $p^m \nmid a$. Prove that $p \mid b$.

The following exercise generalizes Theorem 3.6.3:

Exercise 3.6.8. Let p be a prime. Let m ∈ N, and let $k \in \{1, 2, \ldots, p^m - 1\}$.
Prove that $p \mid \binom{p^m}{k}$.

3.6.11. The p-valuation of n!


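What is the p-valuation of a factorial n!? There turns out to be a nice formula
for this (see footnote 38):

Theorem 3.6.15 (de Polignac’s formula). Let p be a prime. Let n ∈ N. Then,

\begin{align*}
v_p(n!) &= \left\lfloor \frac{n}{p^1} \right\rfloor + \left\lfloor \frac{n}{p^2} \right\rfloor + \left\lfloor \frac{n}{p^3} \right\rfloor + \cdots \\
&= \left(n // p^1\right) + \left(n // p^2\right) + \left(n // p^3\right) + \cdots .
\end{align*}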

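Proof sketch. First, these sums are infinite sums. Why do they make sense? (See footnote 39.)
Because we can discard all the addends that are zero, and then only finitely
many nonzero addends remain. For instance, if p = 2 and n = 13, then

\begin{align*}
& \left\lfloor \frac{n}{p^1} \right\rfloor + \left\lfloor \frac{n}{p^2} \right\rfloor + \left\lfloor \frac{n}{p^3} \right\rfloor + \cdots \\
&= \left\lfloor \frac{13}{2^1} \right\rfloor + \left\lfloor \frac{13}{2^2} \right\rfloor + \left\lfloor \frac{13}{2^3} \right\rfloor + \cdots \\
&= \lfloor 6.5 \rfloor + \lfloor 3.25 \rfloor + \lfloor 1.625 \rfloor + \lfloor 0.8125 \rfloor + \lfloor 0.40625 \rfloor + \cdots \\
&= 6 + 3 + 1 + \underbrace{0 + 0 + 0 + 0 + 0 + \cdots}_{\text{these are zeroes, thus don't contribute to the sum}} \\
&= 6 + 3 + 1 = 10,
\end{align*}

which is a well-defined (finite) value. More generally, for any prime p and any
n ∈ N, the sum $\left\lfloor \frac{n}{p^1} \right\rfloor + \left\lfloor \frac{n}{p^2} \right\rfloor + \left\lfloor \frac{n}{p^3} \right\rfloor + \cdots$ has only finitely many nonzero
addends (because for every i ≥ n, we have $p^i \geq p^n > n$ and thus $0 \leq \frac{n}{p^i} < 1$, so
that $\left\lfloor \frac{n}{p^i} \right\rfloor = 0$), and thus becomes a finite sum once we discard all its addends
that are zero; but a finite sum obviously has a well-defined value.
Moreover, for every positive integer d, we have $\left\lfloor \frac{n}{d} \right\rfloor = n // d$ (by Proposition
3.3.14). Thus, the two infinite sums

\[ \left\lfloor \frac{n}{p^1} \right\rfloor + \left\lfloor \frac{n}{p^2} \right\rfloor + \left\lfloor \frac{n}{p^3} \right\rfloor + \cdots \qquad \text{and} \qquad \left(n // p^1\right) + \left(n // p^2\right) + \left(n // p^3\right) + \cdots \]

are equal.

Footnote 38: See Definition 3.3.13 and Definition 3.3.2 for the notations we are using here.
The meaning of the infinite sums will be discussed in the proof of the theorem.
Footnote 39: It is trivially easy to concoct an infinite sum that does not make sense: for
instance, $1 + 1 + 1 + \cdots$, or $\frac{1}{1} + \frac{1}{2} + \frac{1}{3} + \cdots$. In general, “infinite” operations in
mathematics do not usually exist unless their existence has been justified.
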
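It remains to prove that these two sums equal $v_p(n!)$. In other words, we
must prove that

\[ v_p(n!) = \left\lfloor \frac{n}{p^1} \right\rfloor + \left\lfloor \frac{n}{p^2} \right\rfloor + \left\lfloor \frac{n}{p^3} \right\rfloor + \cdots . \tag{38} \]

We can prove this by induction on n:
The base case (n = 0) boils down to 0 = 0 + 0 + 0 + · · · , which is true.
For the induction step, we proceed from n − 1 to n. So we fix a positive integer
n, and we assume (as our induction hypothesis) that

\[ v_p((n-1)!) = \left\lfloor \frac{n-1}{p^1} \right\rfloor + \left\lfloor \frac{n-1}{p^2} \right\rfloor + \left\lfloor \frac{n-1}{p^3} \right\rfloor + \cdots , \tag{39} \]

and we set out to prove that

\[ v_p(n!) = \left\lfloor \frac{n}{p^1} \right\rfloor + \left\lfloor \frac{n}{p^2} \right\rfloor + \left\lfloor \frac{n}{p^3} \right\rfloor + \cdots . \tag{40} \]

We first compare the left hand sides: Let $k = v_p(n)$. We know that $n! = (n-1)! \cdot n$,
and therefore

\begin{align*}
v_p(n!) &= v_p((n-1)! \cdot n) \\
&= v_p((n-1)!) + v_p(n) \qquad \text{(by Theorem 3.6.10 (a))} \\
&= v_p((n-1)!) + k \qquad \text{(since $v_p(n) = k$)}.
\end{align*}

In other words, the LHS of (40) equals the LHS of (39) plus k. (See footnote 40 for the
abbreviations “LHS” and “RHS”.)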
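Now, we shall show that the RHSs of the two equations differ by k as well.
For each $i \in \{1, 2, \ldots, k\}$, we have $p^i \mid p^k \mid n$ (since $k = v_p(n)$) and therefore

\[ \left\lfloor \frac{n}{p^i} \right\rfloor = \left\lfloor \frac{n-1}{p^i} \right\rfloor + 1 \qquad \text{(by Corollary 3.3.19 (a), applied to $d = p^i$)}. \]

On the other hand, for each $i \in \{k+1, k+2, k+3, \ldots\}$, we have $p^i \nmid n$ (since
$i > k = v_p(n)$) and thus

\[ \left\lfloor \frac{n}{p^i} \right\rfloor = \left\lfloor \frac{n-1}{p^i} \right\rfloor \qquad \text{(by Corollary 3.3.19 (b), applied to $d = p^i$)}. \]

These two equalities together yield

\begin{align*}
& \left\lfloor \frac{n}{p^1} \right\rfloor + \left\lfloor \frac{n}{p^2} \right\rfloor + \left\lfloor \frac{n}{p^3} \right\rfloor + \cdots \\
&= \left( \left\lfloor \frac{n-1}{p^1} \right\rfloor + 1 \right) + \left( \left\lfloor \frac{n-1}{p^2} \right\rfloor + 1 \right) + \cdots + \left( \left\lfloor \frac{n-1}{p^k} \right\rfloor + 1 \right) \\
&\qquad + \left\lfloor \frac{n-1}{p^{k+1}} \right\rfloor + \left\lfloor \frac{n-1}{p^{k+2}} \right\rfloor + \left\lfloor \frac{n-1}{p^{k+3}} \right\rfloor + \cdots \\
&= \left( \left\lfloor \frac{n-1}{p^1} \right\rfloor + \left\lfloor \frac{n-1}{p^2} \right\rfloor + \left\lfloor \frac{n-1}{p^3} \right\rfloor + \cdots \right) + k.
\end{align*}

In other words, the RHS of (40) equals the RHS of (39) plus k.
But previously, we have shown the same for the LHSs. Thus, the equality
(40) is just the equality (39) with each side increased by k. Since (39) holds (by
the induction hypothesis), it thus follows that (40) also holds. In other words,

\[ v_p(n!) = \left\lfloor \frac{n}{p^1} \right\rfloor + \left\lfloor \frac{n}{p^2} \right\rfloor + \left\lfloor \frac{n}{p^3} \right\rfloor + \cdots . \]

But this completes the induction step, and thus Theorem 3.6.15 is proven.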
(For another proof of Theorem 3.6.15, see [Grinbe19b, Exercise 2.17.2 (c)] or
[Grinbe21, Theorem 5.3.1].)
Theorem 3.6.15 is known as de Polignac’s formula or Legendre’s formula.
Various uses of this formula can be found in [Grinbe21].

Footnote 40: “LHS” means “left hand side”, and “RHS” means “right hand side”.
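De Polignac’s formula lends itself to a quick numerical check. The Python sketch below (with hypothetical function names) computes the sum by stopping as soon as the power of p exceeds n, which is legitimate because all later addends are zero, and compares the result with the valuation of n! computed directly from the definition.

```python
from math import factorial

def legendre(p: int, n: int) -> int:
    """Sum of floor(n / p**i) over i = 1, 2, 3, ... (only the nonzero addends)."""
    total = 0
    power = p
    while power <= n:      # for power > n, the addend n // power is 0
        total += n // power
        power *= p
    return total

def valuation(p: int, n: int) -> int:
    """v_p(n) for a nonzero integer n, straight from the definition."""
    m = 0
    while n % p == 0:
        n //= p
        m += 1
    return m

print(legendre(2, 13))  # 10, matching the worked example 6 + 3 + 1
print(all(legendre(p, n) == valuation(p, factorial(n))
          for p in (2, 3, 5, 7) for n in range(60)))  # True
```

Note that the formula never requires computing n! itself, which is what makes it practical for large n.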

3.6.12. Prime factorization


We are now ready to prove one of the most important properties of primes: the
fact that every positive integer can be uniquely decomposed into a product of
some primes. For instance,

\[ 200 = 2 \cdot 100 = 2 \cdot 2 \cdot 50 = 2 \cdot 2 \cdot 5 \cdot 10 = \underbrace{2 \cdot 2 \cdot 5 \cdot 2 \cdot 5}_{\text{a product of primes}} . \]

The word “uniquely” means here that any two ways of decomposing a given
positive integer n into a product of primes are equal up to reordering the fac-
tors. For example, we can also decompose 200 as 5 · 2 · 2 · 5 · 2, but this is the
same product with the factors in a different order.
Let us state this fact in full generality. First, we introduce a name for these
decompositions:

Definition 3.6.16. Let n be a positive integer. A prime factorization of n
means a finite list $(p_1, p_2, \ldots, p_k)$ of primes (not necessarily distinct) such
that
\[ n = p_1 p_2 \cdots p_k . \]

Thus, (2, 2, 5, 2, 5) and (5, 2, 2, 5, 2) are prime factorizations of 200. Another
such is (2, 2, 2, 5, 5). There are more (see footnote 41), but all of them contain the number 2
thrice and the number 5 twice (and no other numbers), just as we said.
Let us state this as a general claim:

Theorem 3.6.17 (Fundamental Theorem of Arithmetic). Let n be a positive
integer. Then:
(a) There exists a prime factorization of n.
(b) This prime factorization is unique up to reordering its entries. In other
words, if $(p_1, p_2, \ldots, p_k)$ and $(q_1, q_2, \ldots, q_\ell)$ are two prime factorizations of
n, then $(q_1, q_2, \ldots, q_\ell)$ can be obtained from $(p_1, p_2, \ldots, p_k)$ by reordering the
entries.
(c) Let $(p_1, p_2, \ldots, p_k)$ be a prime factorization of n. Let p be any prime.
Then, the number of times that p appears in the list $(p_1, p_2, \ldots, p_k)$ (in other
words, the number of $i \in \{1, 2, \ldots, k\}$ satisfying $p_i = p$) is $v_p(n)$.

Proof. (a) This is Theorem 1.9.7.
(c) By the definition of a prime factorization, we have $n = p_1 p_2 \cdots p_k$. Thus,

\begin{align*}
v_p(n) &= v_p(p_1 p_2 \cdots p_k) \\
&= v_p(p_1) + v_p(p_2) + \cdots + v_p(p_k) \tag{41}
\end{align*}

(by Corollary 3.6.11).
The right hand side of this equality is a sum of k addends. Each of these
addends has the form $v_p(p_i)$ for some $i \in \{1, 2, \ldots, k\}$. Each such addend
$v_p(p_i)$ equals 1 if $p_i = p$ (by Theorem 3.6.10 (d)) and equals 0 if $p_i \neq p$ (by
Theorem 3.6.10 (e)).
Thus, our sum $v_p(p_1) + v_p(p_2) + \cdots + v_p(p_k)$ has an addend equal to 1 for
each $i \in \{1, 2, \ldots, k\}$ that satisfies $p_i = p$, and an addend equal to 0 for each i
that doesn’t.
Obviously, the addends that are equal to 0 do not affect the sum. Hence, the
sum equals the number of addends equal to 1. In other words, the sum equals
the number of $i \in \{1, 2, \ldots, k\}$ that satisfy $p_i = p$.
In view of (41), we can restate this as follows: $v_p(n)$ equals the number of
$i \in \{1, 2, \ldots, k\}$ that satisfy $p_i = p$. In other words, $v_p(n)$ equals the number of
times that p appears in the list $(p_1, p_2, \ldots, p_k)$. This proves Theorem 3.6.17 (c).

Footnote 41: In Remark 6.7.2, we will see how many.

(b) This follows easily from part (c). Namely:
Let $(p_1, p_2, \ldots, p_k)$ and $(q_1, q_2, \ldots, q_\ell)$ be two prime factorizations of n. We
must prove that $(q_1, q_2, \ldots, q_\ell)$ can be obtained from $(p_1, p_2, \ldots, p_k)$ by reordering
the entries.
Each prime p appears $v_p(n)$ times in the list $(p_1, p_2, \ldots, p_k)$ (by part (c)),
and appears $v_p(n)$ times in the list $(q_1, q_2, \ldots, q_\ell)$ (similarly). Thus, each prime
p appears the same number of times in either list. Since both lists consist
of primes, this shows that the two lists contain the same numbers the same
number of times. Therefore, $(q_1, q_2, \ldots, q_\ell)$ can be obtained from $(p_1, p_2, \ldots, p_k)$
by reordering the entries. This proves Theorem 3.6.17 (b).
(We have used the intuitively obvious fact that if two lists of numbers contain
the same numbers the same number of times, then one can be obtained from
the other by reordering. You are free to trust your intuition on this one; for a
formal proof, see [Grinbe19b, Lemma 2.13.20].)

Theorem 3.6.17 (a) shows that every positive integer n has a prime factoriza-
tion. Finding this prime factorization is a classical hard computational problem.
(Quite a few encryption standards rely on its hardness.)
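For small numbers, a prime factorization can be found by trial division. The Python sketch below is one simple way to do this (it is not the method of the text, which only proves existence, and it is far too slow for numbers of cryptographic size); the function name is hypothetical. Counting the entries of the resulting list illustrates Theorem 3.6.17 (c).

```python
from collections import Counter

def prime_factorization(n: int) -> list[int]:
    """Return a prime factorization of the positive integer n as a sorted list."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    factors = []
    d = 2
    while d * d <= n:
        # Whenever d divides n here, d is prime: all smaller primes
        # have already been divided out of n.
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)  # the remaining cofactor is a prime
    return factors

print(prime_factorization(200))           # [2, 2, 2, 5, 5]
print(Counter(prime_factorization(200)))  # Counter({2: 3, 5: 2})
```

The counts 3 and 2 are exactly $v_2(200)$ and $v_5(200)$, and the empty list returned for n = 1 is the (unique) prime factorization of 1.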

3.6.13. Applications
Prime factorizations can be rather useful. The next few exercises provide some
examples:

Exercise 3.6.9. Let n and m be integers. Prove that $n \mid m$ if and only if each
prime p satisfies $v_p(n) \leq v_p(m)$.
[Hint: For the “if” direction, start by picking prime factorizations of n and
m.]

Exercise 3.6.10. Let k be a positive integer. Let w be a rational number such
that $w^k$ is an integer. Prove that w is an integer.
[Hint: Use Exercise 3.6.9.]

Exercise 3.6.11. Let n ∈ N. Let a and b be two coprime positive integers.
Assume that ab is the n-th power of a positive integer. Prove that a and b are
n-th powers of positive integers.

3.7. Least common multiples


In Section 3.4, we have studied greatest common divisors in some detail. Let me
now briefly discuss least common multiples: a kind of counterpart to greatest
common divisors. The greatest common divisor of two positive integers a and
b is usually smaller than both a and b; in contrast, the least common multiple
is usually larger than both.

Definition 3.7.1. Let a and b be two integers.


(a) The common multiples of a and b are the integers that are divisible by
a and simultaneously divisible by b.
(b) The least common multiple (aka the lowest common multiple, or just
the lcm) of a and b is defined as follows:

• If a and b are nonzero, then it is the smallest positive common multiple


of a and b.

• Otherwise, it is 0.

It is denoted by lcm ( a, b).

Some examples:

• We have lcm (3, 4) = 12.

• We have lcm (6, 4) = 12.

• We have lcm (6, 8) = 24.

• We have lcm (2, 4) = 4.

• We have lcm (0, 5) = 0.

• We have lcm (−2, 3) = 6.
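Definition 3.7.1 can be turned into a (deliberately naive) search: walk through the positive integers until one is divisible by both a and b. The Python sketch below does this; the function name is hypothetical, and the search is only meant to mirror the definition, not to be efficient.

```python
def lcm_by_definition(a: int, b: int) -> int:
    """Least common multiple of a and b, found by brute-force search."""
    if a == 0 or b == 0:
        return 0  # the "otherwise" case of the definition
    m = 1
    # Search for the smallest positive integer divisible by both a and b.
    # The search ends at |ab| at the latest, since |ab| is a common multiple.
    while m % a != 0 or m % b != 0:
        m += 1
    return m

examples = [(3, 4), (6, 4), (6, 8), (2, 4), (0, 5), (-2, 3)]
print([lcm_by_definition(a, b) for a, b in examples])
# prints: [12, 12, 24, 4, 0, 6]
```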



Note that the lcm of two positive integers is a fairly well-known concept:
When you bring two fractions (of integers) to their lowest common denomina-
tor, this lowest common denominator is actually the lcm of the denominators
of the fractions.
Here are some properties of lcms:

Theorem 3.7.2. Let a and b be two integers. Then:


(a) The lcm of a and b exists.
(b) We have lcm ( a, b) ∈ N.
(c) We have lcm ( a, b) = lcm (b, a).
(d) We have a | lcm ( a, b) and b | lcm ( a, b).
(e) We have lcm (− a, b) = lcm ( a, b) and lcm ( a, −b) = lcm ( a, b).

Proof sketch. Easy consequences of the definitions. (For part (a), observe that
two nonzero integers a and b have at least one positive common multiple –
namely, | ab|.)
Here is a counterpart to the universal property of the gcd (Theorem 3.4.9):

Theorem 3.7.3 (universal property of the lcm). Let a, b, m ∈ Z. Then, we have


the equivalence

( a | m and b | m) ⇐⇒ (lcm ( a, b) | m) .

In other words, the common multiples of two integers a and b are precisely
the multiples of lcm ( a, b).
Proof sketch. (See [Grinbe19b, Theorem 2.11.7] for a detailed proof.)
⇐=: If lcm ( a, b) | m, then a | m (since Theorem 3.7.2 (d) yields a | lcm ( a, b) |
m) and b | m (similarly). Thus, the “⇐=” direction of the desired equivalence
is proved.
=⇒: Assume that a | m and b | m. We must show that lcm ( a, b) | m.
If one of a and b is 0, then this is easy (in fact, let’s say that a = 0; then,
0 = a | m, thus m = 0, and therefore lcm ( a, b) | 0 = m). Hence, we need only
to consider the case when a and b are nonzero.
In this case, set ℓ = lcm (a, b). Recall that ℓ is defined as the smallest positive
common multiple of a and b. Hence, ℓ is a positive integer and is a multiple of
a and of b. Let q and r be the quotient and the remainder of the division of m
by ℓ. Thus,

\[ q \in \mathbb{Z} \qquad \text{and} \qquad r \in \{0, 1, \ldots, \ell - 1\} \qquad \text{and} \qquad m = q\ell + r \]

(by the definition of quotient and remainder). From $r \in \{0, 1, \ldots, \ell - 1\}$, we
obtain r < ℓ.
From m = qℓ + r, we obtain r = m − qℓ. Since both m and ℓ are multiples of a,
we thus conclude that r is a multiple of a as well. Similarly, r is a multiple of b.
Thus, r is a common multiple of a and b. But ℓ is the smallest positive common
multiple of a and b. If r was positive, then r would contradict this minimality
(because r < ℓ). Hence, r cannot be positive. Since $r \in \{0, 1, \ldots, \ell - 1\}$, we
conclude that r must be 0. Hence, m = qℓ + r = qℓ (since r = 0), so that ℓ | m. In other
words, lcm (a, b) | m (since ℓ = lcm (a, b)). This proves the “=⇒” direction of
the desired equivalence.

The gcd and the lcm of two integers are connected to each other by the
following formula:

Theorem 3.7.4. Let a and b be two integers. Then,

gcd ( a, b) · lcm ( a, b) = | ab| .

Proof sketch. (See [Grinbe19b, Theorem 2.11.6] for a detailed proof.)
First, dispose of the case when a or b is 0. In the remaining case, argue that
$\frac{ab}{\gcd(a, b)}$ is an integer and is a common multiple of a and b. By Theorem
3.7.3, this entails that $\operatorname{lcm}(a, b) \mid \frac{ab}{\gcd(a, b)}$, so that $\gcd(a, b) \cdot \operatorname{lcm}(a, b) \mid ab$. On
the other hand, argue (again using Theorem 3.7.3) that $\frac{ab}{\operatorname{lcm}(a, b)}$ is an integer
and divides gcd (a, b) (because it divides each of a and b). Thus conclude that
$ab \mid \gcd(a, b) \cdot \operatorname{lcm}(a, b)$. Now, recall that two integers x and y that satisfy x | y
and y | x must satisfy |x| = |y|.

Both gcds and lcms have easily computable p-valuations:

Theorem 3.7.5. Let p be a prime. Let a and b be two integers. Then,

\[ v_p(\gcd(a, b)) = \min\{v_p(a), v_p(b)\} \qquad \text{and} \]
\[ v_p(\operatorname{lcm}(a, b)) = \max\{v_p(a), v_p(b)\} . \]

Proof sketch. This is a particular case of [Grinbe19b, Proposition 5.2.15]. Anyway,
the proof is a nice exercise in using the universal properties of the gcd and
the lcm (and the definition of p-valuation), so you should do it yourself.

Theorem 3.7.5 gives an easy way to compute gcd (a, b) and lcm (a, b) if you
know prime factorizations of two positive integers a and b. For example, knowing
that $18 = 2 \cdot 3^2$ and $12 = 2^2 \cdot 3$, we obtain

\[ \gcd(18, 12) = 2 \cdot 3 = 6 \qquad \text{and} \]
\[ \operatorname{lcm}(18, 12) = 2^2 \cdot 3^2 = 36. \]
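The recipe from Theorem 3.7.5 (take the minimum of the exponents for the gcd and the maximum for the lcm) can be sketched in Python as follows. The helper names are hypothetical, and the exponents are found by plain trial division, so this is only meant for small positive integers.

```python
from collections import Counter

def prime_exponents(n: int) -> Counter:
    """Map each prime p dividing the positive integer n to its exponent v_p(n)."""
    exponents = Counter()
    d = 2
    while d * d <= n:
        while n % d == 0:
            exponents[d] += 1
            n //= d
        d += 1
    if n > 1:
        exponents[n] += 1
    return exponents

def gcd_lcm_from_factorizations(a: int, b: int) -> tuple[int, int]:
    """Return (gcd(a, b), lcm(a, b)) for positive integers a and b."""
    ea, eb = prime_exponents(a), prime_exponents(b)
    g = l = 1
    for p in set(ea) | set(eb):
        # A Counter returns 0 for primes that do not divide the number.
        g *= p ** min(ea[p], eb[p])
        l *= p ** max(ea[p], eb[p])
    return g, l

print(gcd_lcm_from_factorizations(18, 12))  # (6, 36)
```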

If you don’t know the prime factorizations of a and b, the quickest way to
find lcm (a, b) is by using the Euclidean algorithm to find gcd (a, b) first, and
then solving the equality gcd (a, b) · lcm (a, b) = |ab| for lcm (a, b). This gives
(see footnote 42)

\[ \operatorname{lcm}(a, b) = \frac{|ab|}{\gcd(a, b)} = \left| \frac{a}{\gcd(a, b)} \cdot b \right| . \]

Gcds and lcms can be defined for multiple numbers (not just for two numbers).
Their properties are mostly analogous to the case of two numbers, with
some exceptions (e.g., the formula gcd (a, b) · lcm (a, b) = |ab| does not generalize
to gcd (a, b, c) · lcm (a, b, c) = |abc|, but rather to gcd (a, b, c) · lcm (bc, ca, ab) =
|abc|). See [Grinbe19b, §2.11] for more details.
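The gcd-first route to the lcm can be sketched in a few lines of Python. The function names below are hypothetical; the three-number gcd and lcm are computed by nesting the two-number versions, which is a standard fact assumed here rather than taken from the text.

```python
def gcd(a: int, b: int) -> int:
    """Greatest common divisor via the Euclidean algorithm."""
    a, b = abs(a), abs(b)
    while b:
        a, b = b, a % b
    return a

def lcm(a: int, b: int) -> int:
    """Least common multiple, obtained from gcd(a, b) * lcm(a, b) = |ab|."""
    if a == 0 or b == 0:
        return 0
    # Divide before multiplying to keep the intermediate numbers small.
    return abs(a) // gcd(a, b) * abs(b)

print(gcd(18, 12), lcm(18, 12))  # 6 36

# The three-number formula mentioned above, on one example:
a, b, c = 4, 6, 10
g3 = gcd(gcd(a, b), c)
print(g3 * lcm(lcm(a, b), c) == abs(a * b * c))              # False
print(g3 * lcm(lcm(b * c, c * a), a * b) == abs(a * b * c))  # True
```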

3.8. Sylvester’s xa + yb theorem (or the Chicken McNugget theorem)
We come to a rather curious (although not overly important) topic in elemen-
tary number theory: the N-linear combinations of two positive integers.
For this entire section, we let a and b be two positive integers.

Definition 3.8.1. (a) A Z-linear combination (short: Z-LC) of a and b will


mean a number of the form

xa + yb with x, y ∈ Z.

In other words, it means a number of cents that you can pay with a-cent coins
and b-cent coins if you can get change.
(b) An N-linear combination (short: N-LC) of a and b will mean a number
of the form
xa + yb with x, y ∈ N.
In other words, it means a number of cents that you can pay with a-cent coins
and b-cent coins without getting change.

Thus, Proposition 1.9.8 is saying that any integer n ≥ 8 is an N-LC of 3 and
5. Moreover, as we saw just above that proposition, the numbers 0, 3, 5, 6 are
N-LCs of 3 and 5 as well, whereas the numbers 1, 2, 4, 7 are not. Thus the
complete list of all N-LCs of 3 and 5 is

\[ 0, 3, 5, 6, \underbrace{8, 9, 10, \ldots}_{\text{all integers } n \geq 8} . \]

Footnote 42: Here we are assuming that a and b are nonzero. If a or b is 0, then lcm (a, b) is just 0.

This should prompt us to study N-LCs of a and b in the general case. We
shall begin with the Z-LCs, however, since they are much easier to describe.
Note that the N-LCs of a and b are always ≥ 0 (because if x, y ∈ N, then
xa + yb ≥ 0, since x ≥ 0, a > 0, y ≥ 0 and b > 0), whereas the Z-LCs of a and b
can have any sign.
Clearly, any N-LC of a and b is a Z-LC of a and b. However, a Z-LC of a and
b doesn’t have to be an N-LC of a and b, even if it is ≥ 0. For example, 1 is a
Z-LC of 3 and 5 (since 1 = 2 · 3 + (−1) · 5), but not an N-LC of 3 and 5.
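Whether a given number is an N-LC of a and b can be decided by a finite search, since only finitely many values of x can occur. The following Python sketch (with a hypothetical function name) tries every possible number of a-cent coins and checks whether the rest can be paid with b-cent coins.

```python
def is_n_lc(n: int, a: int, b: int) -> bool:
    """Decide whether n = x*a + y*b for some nonnegative integers x and y.

    Here a and b are assumed to be positive integers, as in this section.
    """
    if n < 0:
        return False
    # Try each possible x; the remainder n - x*a must be a multiple of b.
    return any((n - x * a) % b == 0 for x in range(n // a + 1))

print([n for n in range(16) if is_n_lc(n, 3, 5)])
# prints: [0, 3, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15]
print(is_n_lc(1, 3, 5))  # False, although 1 = 2*3 + (-1)*5 is a Z-LC
```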

We can easily describe the Z-LCs of a and b:

Proposition 3.8.2. The Z-LCs of a and b are exactly the multiples of gcd ( a, b).

Proof. We must prove the following two claims:

Claim 1: Any Z-LC of a and b is a multiple of gcd ( a, b).

Claim 2: Any multiple of gcd ( a, b) is a Z-LC of a and b.

But both claims are easy:

Proof of Claim 1. Let n be a Z-LC of a and b. We must show that n is a multiple of


gcd ( a, b).
Indeed, n is a Z-LC of a and b, and thus has the form n = xa + yb for some x, y ∈ Z.
Consider these x, y. We have gcd ( a, b) | a | xa and gcd ( a, b) | b | yb. In other
words, both numbers xa and yb are multiples of gcd ( a, b). Hence, their sum xa + yb
is a multiple of gcd ( a, b) as well. In other words, n is a multiple of gcd ( a, b) (since
n = xa + yb). This proves Claim 1.

Proof of Claim 2. Let n be a multiple of gcd ( a, b). We must prove that n is a Z-LC of a
and b.
Bezout’s theorem (Theorem 3.4.6) says that there exist two integers x and y such that
gcd ( a, b) = xa + yb. Consider these x and y. However, n is a multiple of gcd ( a, b); in
other words, there exists an integer c such that n = gcd ( a, b) · c. Consider this c. Now,

n = gcd ( a, b) · c = ( xa + yb) · c = xac + ybc = (cx ) a + (cy) b
(where the second equality sign holds because gcd ( a, b) = xa + yb).

This shows that n is a Z-LC of a and b (since cx and cy are integers). This proves Claim
2.

Combining Claim 1 with Claim 2, we conclude that the Z-LCs of a and b are exactly
the multiples of gcd ( a, b). Thus, Proposition 3.8.2 is proved.
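The two claims can be illustrated by a minimal Python sketch. The function names extended_gcd and as_z_lc are illustrative choices, not notation from these notes, and the code assumes that a and b are positive integers. It finds Bezout coefficients by the extended Euclidean algorithm and then scales them by c, as in the proof of Claim 2.

```python
def extended_gcd(a, b):
    """Return (g, x, y) with g = gcd(a, b) and g == x*a + y*b (a, b positive)."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        # Invariant: old_r == old_x*a + old_y*b and r == x*a + y*b.
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


def as_z_lc(n, a, b):
    """Return integers (x, y) with n == x*a + y*b, or None if n is not a Z-LC."""
    g, x, y = extended_gcd(a, b)
    if n % g != 0:
        return None          # Claim 1: every Z-LC is a multiple of gcd(a, b)
    c = n // g               # n = gcd(a, b) * c
    return c * x, c * y      # Claim 2: n = (cx) a + (cy) b
```

For instance, as_z_lc(7, 4, 6) returns None, since 7 is not a multiple of gcd(4, 6) = 2.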

Now we move on to the N-LCs. What are they? Can we describe them any
better than by their definition?
Let g = gcd ( a, b). Then, g divides each of a and b, so that the numbers a/g
and b/g are positive integers. We can simplify our problem by replacing a and
b with a/g and b/g. Clearly, the N-LCs of a and b are just the N-LCs of a/g and
b/g, multiplied by g. By Theorem 3.5.12, the two integers a/g and b/g are coprime.
Thus, understanding the N-LCs of the original integers a and b is equivalent to
understanding the N-LCs of the coprime integers a/g and b/g.
Hence, it suffices to solve our problem in the case when a and b are coprime.
In this case, Proposition 3.8.2 shows that every integer is a Z-LC of a and
b (since every integer is a multiple of 1 = gcd ( a, b)). The N-LCs are more
interesting. We have already listed the N-LCs of 3 and 5 above; let us now give
a somewhat more complicated example: The N-LCs of 5 and 9 are

0, 5, 9, 10, 14, 15, 18, 19, 20, 23, 24, 25, 27, 28, 29, 30, 32, 33, 34, . . .
(where 32, 33, 34, . . . are all integers n ≥ 32).

Note that every integer n ≥ 32 is an N-LC of 5 and 9. Among the first 32
nonnegative integers 0, 1, . . . , 31, exactly half (that is, 16) are N-LCs of 5 and 9.
A similar phenomenon can be seen in our above example with 3 and 5, except
that 32 is replaced by 8.
This phenomenon generalizes:

Theorem 3.8.3 (Sylvester’s two-coin theorem, or Chicken McNugget theorem).
Assume that the two positive integers a and b are coprime. Then:
(a) Every integer n > ab − a − b is an N-LC of a and b.
(b) The number ab − a − b is not an N-LC of a and b.
(c) Among the first ( a − 1) (b − 1) nonnegative integers 0, 1, . . . , ab − a −
b, exactly half are N-LCs of a and b.
(d) Let n ∈ Z. Then, exactly one of the two numbers n and ab − a − b − n
is an N-LC of a and b.

This theorem was discovered by J. J. Sylvester in 1884, as a side-product of
his work in invariant theory. Its more recent moniker is due to the McDonald’s
Chicken McNuggets, which used to be sold in packs of 9 or 20, prompting
mathematicians to wonder what numbers of nuggets could be bought.
The theorem stops short of explicitly answering which of the first ( a − 1) (b − 1)
nonnegative integers are N-LCs of a and b. There is no “easy formula” for this
answer. But Theorem 3.8.3 (a) gives you all the information you need to compute
all the N-LCs of a and b, since the first ( a − 1) (b − 1) nonnegative integers
can be checked one by one.
The particular case of Theorem 3.8.3 (a) where a = p and b = p + 1 was
Exercise 3.3.2.
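The "check one by one" remark can be sketched in Python as follows. This is a brute-force illustration under the assumption that a and b are coprime positive integers; the names is_n_lc and n_lcs_up_to_bound are hypothetical helpers, not notation from these notes.

```python
from math import gcd


def is_n_lc(n, a, b):
    """Check whether n == x*a + y*b for some nonnegative integers x, y."""
    if n < 0:
        return False  # any N-LC of a and b is >= 0
    # Try every possible number x of a-cent coins; the rest must be a
    # nonnegative multiple of b.
    return any((n - x * a) % b == 0 for x in range(n // a + 1))


def n_lcs_up_to_bound(a, b):
    """List the N-LCs of a and b among 0, 1, ..., ab - a - b."""
    assert gcd(a, b) == 1, "a and b are assumed to be coprime"
    return [n for n in range(a * b - a - b + 1) if is_n_lc(n, a, b)]
```

For example, n_lcs_up_to_bound(3, 5) returns [0, 3, 5, 6], which agrees with the list found above. All integers beyond ab − a − b are N-LCs by Theorem 3.8.3 (a), so this finite list determines all N-LCs.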
Before we prove Theorem 3.8.3, we show a basic lemma:

Lemma 3.8.4. Assume that the two positive integers a and b are coprime. Let
n ∈ Z. Then, there exist two integers u and v such that 0 ≤ u ≤ b − 1 and
ua + vb = n.

Proof of Lemma 3.8.4. Bezout’s theorem (Theorem 3.4.6) says that there exist two inte-
gers x and y such that gcd ( a, b) = xa + yb. Consider these x and y. Thus, xa + yb =
gcd ( a, b) = 1 (since a and b are coprime).
Recall that b is a positive integer. Thus, division with remainder by b is well-defined
(see Definition 3.3.2 for the terminology).
Let q = (nx ) //b and r = (nx ) %b. In other words, let q and r be the quotient and
the remainder of the division of nx by b. By the definition of quotient and remainder,
we thus have

q∈Z and r ∈ {0, 1, . . . , b − 1} and nx = qb + r.

From r ∈ {0, 1, . . . , b − 1}, we see that r is an integer satisfying 0 ≤ r ≤ b − 1.
On the other hand, nxa + nyb = n ( xa + yb) = n (since xa + yb = 1), so that

n = nxa + nyb = (qb + r ) a + nyb     (since nx = qb + r)
  = qba + ra + nyb = ra + qba + nyb = ra + (qa + ny) b     (since qba + nyb = (qa + ny) b).

In other words, ra + (qa + ny) b = n.
Altogether, we now know that r and qa + ny are two integers satisfying 0 ≤ r ≤ b − 1
and ra + (qa + ny) b = n. Thus, there exist two integers u and v such that 0 ≤ u ≤ b − 1
and ua + vb = n (namely, u = r and v = qa + ny). This proves Lemma 3.8.4.
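The proof above is constructive, and a short Python sketch can follow it step by step. The names bezout and lemma_uv are illustrative; the code assumes that a and b are coprime positive integers.

```python
def bezout(a, b):
    """Return (x, y) with x*a + y*b == gcd(a, b), for integers a, b >= 0."""
    if b == 0:
        return 1, 0
    x, y = bezout(b, a % b)
    # gcd = x*b + y*(a % b) = y*a + (x - (a // b)*y)*b
    return y, x - (a // b) * y


def lemma_uv(n, a, b):
    """Return (u, v) with 0 <= u <= b - 1 and u*a + v*b == n."""
    x, y = bezout(a, b)        # x*a + y*b == 1, since a and b are coprime
    q, r = divmod(n * x, b)    # n*x == q*b + r with 0 <= r <= b - 1
    return r, q * a + n * y    # u = r and v = qa + ny, as in the proof
```

Python's divmod returns a remainder in {0, 1, . . . , b − 1} for positive b even when nx is negative, which matches the convention for // and % used in these notes.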

Proof of Theorem 3.8.3. We shall first prove part (b) and then part (d). The other two
parts will follow quite easily from these.
(b) Assume the contrary. Thus, ab − a − b is an N-LC of a and b. In other words,
there exist x, y ∈ N such that ab − a − b = xa + yb. Consider these x and y.
From ab − a − b = xa + yb, we obtain ab = xa + yb + a + b = ( x + 1) a + (y + 1) b =
a ( x + 1) + b (y + 1). Hence,

b (y + 1) = ab − a ( x + 1) = a · (b − ( x + 1))     (where b − ( x + 1) is an integer).

This shows that a | b (y + 1). Thus, the coprime removal theorem (Theorem 3.5.6)
yields that a | y + 1 (since a is coprime to b). Therefore, (y + 1) /a is an integer (since
a ≠ 0). Since y + 1 ≥ 1 > 0 (because y ≥ 0) and a > 0, this integer (y + 1) /a is furthermore
positive, and thus is ≥ 1. In other words, y + 1 ≥ a. Hence, y ≥ a − 1. Now,

ab − a − b = xa + yb ≥ 0a + ( a − 1) b = ab − b     (since x ≥ 0 and y ≥ a − 1).

Subtracting ab − a − b from both sides of this inequality, we obtain 0 ≥ a, which
contradicts the positivity of a. This contradiction shows that our assumption was false.
Thus, Theorem 3.8.3 (b) is proved.
(d) Let m = ab − a − b − n. Hence, n + m = ab − a − b. Thus, n + m is not an N-LC
of a and b (since Theorem 3.8.3 (b) shows that ab − a − b is not an N-LC of a and b).
We shall now prove the following two claims:

Claim 1: At least one of the two numbers n and m is an N-LC of a and b.

Claim 2: At most one of the two numbers n and m is an N-LC of a and b.

Proof of Claim 1. Lemma 3.8.4 shows that there exist two integers u and v such that
0 ≤ u ≤ b − 1 and ua + vb = n. Consider these u and v. Now,

(b − 1 − u) a + (−v − 1) b = ba − a − ua − vb − b
                           = ba − a − b − (ua + vb)
                           = ab − a − b − n     (since ba = ab and ua + vb = n)
                           = m     (42)

(by the definition of m). We are in one of the following two cases:
Case 1: We have v ≥ 0.
Case 2: We have v < 0.
Let us first consider Case 1. In this case, we have v ≥ 0. Thus, v ∈ N. Also, u ∈ N
(since 0 ≤ u). Recall that ua + vb = n, so that n = ua + vb with u ∈ N and v ∈ N.
This shows that n is
an N-LC of a and b. Thus, at least one of the two numbers n and m is an N-LC of a
and b. So we have proved Claim 1 in Case 1.
Let us next consider Case 2. In this case, we have v < 0. Hence, −v > 0, so that
−v ≥ 1 (since −v is an integer) and therefore −v − 1 ≥ 0. Thus, −v − 1 ∈ N. Moreover,
from u ≤ b − 1, we obtain b − 1 − u ≥ 0, so that b − 1 − u ∈ N. However, (42) yields

m = (b − 1 − u) a + (−v − 1) b     (with b − 1 − u ∈ N and −v − 1 ∈ N).

This shows that m is an N-LC of a and b. Thus, at least one of the two numbers n and
m is an N-LC of a and b. So we have proved Claim 1 in Case 2.
Thus, Claim 1 holds in each of Cases 1 and 2. The proof of Claim 1 is therefore
complete.

Proof of Claim 2. Assume the contrary. Thus, both numbers n and m are N-LCs of a
and b. Therefore, we can write n as n = xa + yb for some x, y ∈ N (since n is an N-LC
of a and b). Furthermore, we can write m as m = za + wb for some z, w ∈ N (since m is
an N-LC of a and b). Consider these x, y, z, w. Now, adding the equalities n = xa + yb
and m = za + wb together, we obtain

n + m = ( xa + yb) + (za + wb) = ( x + z) a + (y + w) b     (with x + z ∈ N and y + w ∈ N).

This shows that n + m is an N-LC of a and b. This contradicts the fact that n + m is not
an N-LC of a and b. This contradiction shows that our assumption was wrong. Hence,
Claim 2 is proved.

Combining Claim 1 with Claim 2, we see that exactly one of the two numbers n
and m is an N-LC of a and b. In other words, exactly one of the two numbers n and
ab − a − b − n is an N-LC of a and b (since m = ab − a − b − n). This proves Theorem
3.8.3 (d).
(a) Let n > ab − a − b. Then, the integer ab − a − b − n is negative, and thus cannot
be an N-LC of a and b (since any N-LC of a and b is ≥ 0). However, Theorem 3.8.3 (d)
yields that exactly one of the two numbers n and ab − a − b − n is an N-LC of a and b.
Since ab − a − b − n cannot be an N-LC of a and b, we thus conclude that n is an N-LC
of a and b. This proves Theorem 3.8.3 (a).
(c) Consider the following table of integers:

0            1                2                · · ·   ab − a − b − 1   ab − a − b
ab − a − b   ab − a − b − 1   ab − a − b − 2   · · ·   1                0

(whose first row is listing the numbers 0, 1, 2, . . . , ab − a − b in increasing order, while
the second row is listing the same numbers in decreasing order). This table has ab −
a − b + 1 = ( a − 1) (b − 1) many columns.
Each column of this table contains the numbers n and ab − a − b − n for some n ∈
{0, 1, . . . , ab − a − b}. Thus, each column of this table contains exactly one N-LC of a
and b (by Theorem 3.8.3 (d)). Hence, in total, exactly ( a − 1) (b − 1) entries of our table
are N-LCs of a and b (since our table has ( a − 1) (b − 1) many columns). Since our
table contains each element of the set {0, 1, . . . , ab − a − b} exactly twice, this entails
that exactly ( a − 1) (b − 1) /2 elements of this set are N-LCs of a and b. In other words,
among the elements of the set {0, 1, . . . , ab − a − b}, exactly half are N-LCs of a and
b. But this is precisely the claim of Theorem 3.8.3 (c). Thus, Theorem 3.8.3 (c) is
proved.

Theorem 3.8.3 is one of the deepest results we will see in this course, but it
is only the beginning of a theory! See the Wikipedia page for “Coin problem”
for more general (and trickier) questions, such as describing the N-LCs of three
integers a, b, c. See also the slides of Drew Armstrong’s talk at FPSAC 2017 for
deep connections to algebraic combinatorics (and a visual proof different from
ours).

3.9. Digression: An introduction to cryptography


In this section, we shall make a short foray into cryptography (also known as
cryptology): the study of ciphers, i.e., methods of encrypting data, mostly for the
purpose of maintaining secrecy or proving authenticity. This is a wide field with a
several thousand years’ long history; while it is not fully part of mathematics (as it
is governed to a significant extent by real-life limitations and the “human factor”), it
relies on mathematical concepts and results.
We will see an ancient (Roman) as well as a modern (20th century) cipher. Both
are underlain by elementary number theory. The second is still in use (occasionally),
whereas the former is only used for recreational purposes (e.g., hiding spoilers in
forum posts). Many more ciphers have been invented over the ages, and much has been
learned about how to break them (“cryptanalysis”) and how to keep them secure. As
so often, we will only go skin-deep into the subject. The interested reader can learn
much more from popular introductions such as [Beutel94] as well as many number
theory texts such as [KraWas15] and [KraWas18]. (Ciphers can also be based on other
parts of mathematics, but the majority use number theory and abstract algebra.)

3.9.1. Caesarian ciphers (alphabet rotation)


We begin with an algorithm that was supposedly used by Julius Caesar to encrypt
military communications. We assume that our messages are textual and are written in
the modern Latin alphabet, all in uppercase letters.43
The modern Latin alphabet has 26 letters: A, B, . . ., Z. Let us assign a number to
each of these letters in the most natural way:

A B C D E F G H I J K L M
0 1 2 3 4 5 6 7 8 9 10 11 12

N O P Q R S T U V W X Y Z
13 14 15 16 17 18 19 20 21 22 23 24 25

Thus, each letter corresponds to a unique number in the set {0, 1, . . . , 25}. For instance,
the letter F corresponds to the number 5, and the letter X corresponds to the number 23.
This gives us a method to encode letters as numbers (and, conversely, decode numbers
back into letters); this method will be called numeric encoding of letters.
A word is just a finite list of letters: For example, the word “KITTEN” is the list
(K, I, T, T, E, N). If we encode each of these six letters numerically, then we obtain
the list (10, 8, 19, 19, 4, 13) (since the letter K corresponds to 10, the letter I to 8, and
so on). This way, we can encode any word as a finite list of numbers (specifically, of
numbers in the set {0, 1, . . . , 25}). Conversely, any finite list of such numbers can be
decoded into a word (although not necessarily a meaningful word): For instance, the
list (17, 0, 19) decodes as “RAT”, since the number 17 corresponds to the letter R, the
number 0 to the letter A, and the number 19 to the letter T.

43 Other alphabets (and lowercase letters) can be handled similarly. Note that the Romans had
a slightly different Latin alphabet than we do, but we shall use the modern one (with its 26
letters) for the sake of familiarity.
We can now formulate Caesar’s algorithm, which is nowadays known as the “Caesarian
cipher ROT3” (we will soon see other variants):

Caesarian cipher ROT3: To encrypt a word (written in the modern Latin
alphabet, all uppercase), proceed as follows:
1. Encode the word as a finite list of numbers ( a1 , a2 , . . . , an ) (using the
numeric encoding).
2. Replace each number ai in this list by ( ai + 3) %26.
3. Decode the resulting list back into a word.

Example 3.9.1. Let us encrypt the word “CRAZY” using the Caesarian cipher ROT3 .
First, we encode it as a finite list of numbers:

CRAZY −→ (2, 17, 0, 25, 24) .

Next, we replace each number ai in this list by ( ai + 3) %26. Thus,

• we replace the number 2 by (2 + 3) %26 = 5,

• we replace the number 17 by (17 + 3) %26 = 20,

• we replace the number 0 by (0 + 3) %26 = 3,

• we replace the number 25 by (25 + 3) %26 = 28%26 = 2, and

• we replace the number 24 by (24 + 3) %26 = 27%26 = 1.

Our list (2, 17, 0, 25, 24) thus turns into the new list (5, 20, 3, 2, 1). Decoding
the latter list back into a word, we find “FUDCB”.

An easy way to visualize the Caesarian cipher ROT3 is by placing the 26 letters of
the alphabet in the sectors of a “26-hour clock” (an analog clock with 26 hours instead
of the usual 12), in the order A, B, C, . . ., Z clockwise. This “alphabet clock” looks as
follows:

[Figure: the “alphabet clock”, a circle divided into 26 sectors that carry the letters
A, B, C, . . ., Z in clockwise order.]

Then, ROT3 simply shifts each letter forward by 3 “hours” (so A becomes D, whereas
B becomes E, and so on).
Thus, it is clear how we can decrypt a word encrypted using ROT3 : We just need to
shift each letter backward by 3 “hours”, i.e., replace each ai by ( ai − 3) %26. We can
denote this operation by ROT−3 .
More generally, we define the operation ROTk for any integer k as follows:

Caesarian cipher ROTk (for a given integer k): To encrypt a word (written
in the modern Latin alphabet, all uppercase), proceed as follows:
1. Encode the word as a finite list of numbers ( a1 , a2 , . . . , an ) (using the
numeric encoding).
2. Replace each number ai in this list by ( ai + k ) %26.
3. Decode the resulting list back into a word.

In terms of our “letter clock”, ROTk shifts each letter forward by k “hours”. It is easy
to see that a word encrypted using ROTk can be decrypted back using ROT−k , since
ROT−k shifts each letter backward by k “hours”. We can also prove this rigorously
using our definition of ROTk , using the following simple lemma:

Lemma 3.9.2. Let k be an integer. Let a, b ∈ {0, 1, . . . , 25} be two numbers satisfying
b = ( a + k ) %26. Then, a = (b − k) %26.

Proof. We have b = ( a + k ) %26 ≡ a + k mod 26 (by Proposition 3.3.11 (a), applied to
n = a + k and d = 26). Subtracting the trivial congruence k ≡ k mod 26 from this
congruence, we obtain b − k ≡ a mod 26, so that a ≡ b − k mod 26. Hence, Proposition
3.3.16 (applied to b − k and 26 instead of b and d) yields a%26 = (b − k ) %26. However,
from a ∈ {0, 1, . . . , 25}, we see that the remainder a%26 is a itself. Thus, a = a%26 =
(b − k) %26. This proves Lemma 3.9.2.
Of course, we used nothing special about the number 26 here; we could just as well
replace it by any fixed positive integer m in Lemma 3.9.2.
Lemma 3.9.2 shows that the Caesarian cipher ROT−k undoes the Caesarian cipher
ROTk : Indeed, when we apply ROTk to a word, each entry a in the corresponding list
of numbers gets replaced by b := ( a + k ) %26; then, a subsequent application of ROT−k
replaces this new number b by (b + (−k )) %26 = (b − k ) %26 = a (by Lemma 3.9.2),
which is the original entry before ROTk was applied.
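The three steps of ROTk translate almost literally into Python. The following is a minimal sketch that assumes the word consists only of the uppercase letters A, B, . . ., Z; the names ALPHABET and rot are illustrative choices.

```python
import string

# "ABCDEFGHIJKLMNOPQRSTUVWXYZ"; the position of a letter in this string
# is its numeric encoding (A = 0, B = 1, ..., Z = 25).
ALPHABET = string.ascii_uppercase


def rot(word, k):
    """Encrypt an uppercase word using the Caesarian cipher ROT_k."""
    numbers = [ALPHABET.index(letter) for letter in word]   # step 1: encode
    shifted = [(a + k) % 26 for a in numbers]                # step 2: shift by k
    return "".join(ALPHABET[b] for b in shifted)             # step 3: decode
```

With these definitions, rot("CRAZY", 3) returns "FUDCB" as in Example 3.9.1, and rot("FUDCB", -3) recovers "CRAZY". Python's % operator returns a remainder in {0, 1, . . . , 25} even when a + k is negative, so the same function also covers negative k.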

Exercise 3.9.1. (a) Encrypt the word “REED” using ROT4 .
(b) Encrypt the word “BOON” using ROT16 .
(c) Some word was encrypted using ROT10 and became “GSDROB”. Reconstruct
the original word.
(d) Some (meaningful) word was encrypted using ROTk for some unknown inte-
ger k and became “WBBSFACGH”. Reconstruct the original word. (This illustrates
why Caesarian ciphers are not very secure, to put it mildly. There are quick ways to
solve this without trying “all” possibilities.)

Let us make some further observations about Caesarian ciphers:

• The encryption method ROT0 does nothing: Each word is encrypted as itself
(since shifting by 0 “hours” on the letter clock changes nothing, or since ( a + 0) %26 =
a%26 = a for each a ∈ {0, 1, . . . , 25}).

• The encryption method ROT26 also does nothing: Each word is encrypted as
itself (since shifting by 26 “hours” on the letter clock amounts to a full revolution,
or since ( a + 26) %26 = a for each a ∈ {0, 1, . . . , 25}).

• The encryption method ROT27 does the same as ROT1 (since ( a + 27) %26 =
( a + 1) %26 for each a ∈ {0, 1, . . . , 25}).
• More generally, if two integers u and v satisfy u ≡ v mod 26, then ROTu = ROTv .
Thus, there are only 26 distinct Caesarian ciphers, namely

ROT0 , ROT1 , . . . , ROT25 .

Any other ROTk is just a copy of one of these. Of these 26 ciphers, only 25 are
useful, since ROT0 does nothing.

• The cipher ROT13 inverts itself: Any word encrypted using ROT13 can be de-
crypted by applying ROT13 again. Indeed, ROT13 is undone by ROT−13 , but
ROT−13 = ROT13 because −13 ≡ 13 mod 26.

• Encrypting a word using ROTu (for some integer u) and then encrypting the
result using ROTv (for some integer v) is the same as encrypting the original
word using ROTu+v .

Exercise 3.9.2. Prove the latter statement rigorously using the description in terms
of remainders. That is, prove the following fact:
If u and v are two integers, and if a, b, c ∈ {0, 1, . . . , 25} are three numbers satisfy-
ing b = ( a + u) %26 and c = (b + v) %26, then c = ( a + (u + v)) %26.

We have so far been encrypting single words. To encrypt an entire text, one must
decide what to do about whitespaces. There are different legitimate choices: e.g.,
one can leave them unchanged; one can remove them (at the risk of making the text
hard to read even after decryption); or one can treat them as a “27th letter” of the
alphabet (thus adapting the definition of Caesarian ciphers to use ( a + k ) %27 instead
of ( a + k ) %26). We shall not delve any deeper into these questions here.

3.9.2. Keys and ciphers


Ciphers such as ROTk are one-trick ponies: Once your enemy knows the method, he
will be able to decrypt anything you encrypt.
This is true to an extent even if the enemy does not know the k, but only knows that
you have used some Caesarian cipher. Indeed, there are only 26 Caesarian ciphers

ROT0 , ROT1 , . . . , ROT25 .

Thus, if your enemy finds a text you encrypted using some ROTk , he can just try to
decrypt it using
ROT−0 , ROT−1 , . . . , ROT−25 ,
and see which of the results gives a meaningful word/text rather than gibberish (see
Exercise 3.9.1 (d)).44
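The exhaustive attack just described takes only a few lines of Python. This is a sketch with illustrative names (rot, all_decryptions); deciding which of the 26 candidates is meaningful is left to a human reader (or a dictionary lookup, which is not shown here).

```python
import string

ALPHABET = string.ascii_uppercase


def rot(word, k):
    """Apply the Caesarian cipher ROT_k to an uppercase word."""
    return "".join(ALPHABET[(ALPHABET.index(c) + k) % 26] for c in word)


def all_decryptions(ciphertext):
    """Map each key k in {0, 1, ..., 25} to the word obtained by applying ROT_{-k}."""
    return {k: rot(ciphertext, -k) for k in range(26)}


if __name__ == "__main__":
    # Print all 26 candidates for the ciphertext of Example 3.9.1.
    for k, candidate in all_decryptions("FUDCB").items():
        print(k, candidate)
```

Among the 26 printed candidates, the entry for k = 3 is "CRAZY", the plaintext of Example 3.9.1.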
In modern language, this is saying that Caesarian ciphers have too small a key size
to be secure. The key here is the number k. While technically there are infinitely many
options for k, there are only 26 distinct ciphers obtained, so the “true” key is just an
element of {0, 1, . . . , 25}. No wonder the cipher is easily broken.
Another problem with Caesarian ciphers is that they are “too regular”: e.g., equal
letters in the original word remain equal after encryption. This, too, causes weaknesses
that render the cipher easy to break.
44 The answer might be non-unique when the word is short (see Exercise 3.9.1 (b)), but will
practically always be unique when the word/text is long enough.

So how can we create a cipher that is harder to break? We need a bigger key size,
and we need “more chaos” (e.g., don’t apply the same rule to each letter). Here are
some ciphers that are slightly better in some of these regards:

• Monoalphabetic substitution: Here we still do the same thing to each letter, but
this thing is no longer just a shift by k “hours”. Instead, we fix any permutation
of the alphabet (i.e., a rule that sends each letter to a letter, with distinct letters
being sent to distinct letters) and we
apply this permutation separately to each letter. For instance, we can use the
following permutation:

A B C D E F G H I J K L M
C Z X B N M P A D T S R Q

N O P Q R S T U V W X Y Z
K O E W Y U I J F L G H V

Then, the word “KITTEN” is encrypted as “SDIINK”.


The key size of this encryption method is huge: The number of possible keys is
the number of all permutations of the alphabet, which is (as we will soon see45 )
26! = 403 291 461 126 605 635 584 000 000. You cannot just try each of these keys to
see which one works. But you can still exploit certain patterns in the English
language (or whatever language the text is written in), such as frequencies of letters,
frequencies of two-letter combinations, and so on. If you have a ciphertext (i.e.,
encrypted text) of sufficient length (e.g., a page, but often a paragraph will be
enough), you can break a monoalphabetic substitution cipher using just statistics
and a bit of patience (see, e.g., [Beutel94, §1.6] for details). Essentially, this is
because the cipher is “not chaotic enough”.
• Vigenère substitution aka the running key cipher: Now the key is an infinite
(or finite but sufficiently long) sequence
(k1 , k2 , k3 , . . .)
of elements of {0, 1, . . . , 25} (or just of integers). To encrypt a word, we first
encode it as a tuple of integers ( a1 , a2 , . . . , an ), and then replace each number ai
by ( ai + k i ) %26; then, we decode the resulting tuple back into a word.
This is essentially a generalized Caesarian cipher, in which we let each letter get
a different key depending on its position.
This cipher is completely unbreakable if the key is chosen at random and never
reused, but it is also very inconvenient: You need
an infinitely long key, or at least a key that is at least as long as the text you want
to encrypt. Such keys are historically known as codebooks.
In many cases, this becomes impractical, so people have tried to “cheat”, e.g.,
by using a periodic sequence (k1 , k2 , k3 , . . .); but such cheats make the cipher
breakable when the ciphertext is sufficiently long (see [Beutel94, §2.3]). Likewise,
if you reuse the same sequence (k1 , k2 , k3 , . . .) as a key too many times, certain
frequency-based patterns will appear in your ciphertexts that will eventually give
away the key.

Many different algorithms have been invented over the ages, usually striking some
balance between practicality (ease of use, simplicity, shortness of the key) and security
(unbreakability). See [Singh01] for more classical algorithms and their history.
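Both ciphers from the list above can be sketched in Python. The string PERMUTED below transcribes the second rows of the permutation table shown above; the function names and the sample key used afterwards are illustrative assumptions.

```python
import string

ALPHABET = string.ascii_uppercase
# The i-th letter of ALPHABET is sent to the i-th letter of PERMUTED
# (A -> C, B -> Z, C -> X, ..., Z -> V), as in the table above.
PERMUTED = "CZXBNMPADTSRQKOEWYUIJFLGHV"


def substitute(word, permuted=PERMUTED):
    """Monoalphabetic substitution: apply the permutation to each letter."""
    return word.translate(str.maketrans(ALPHABET, permuted))


def substitute_inverse(word, permuted=PERMUTED):
    """Undo substitute() by applying the inverse permutation."""
    return word.translate(str.maketrans(permuted, ALPHABET))


def vigenere(word, key, decrypt=False):
    """Shift the i-th letter by key[i] (or by -key[i] when decrypting)."""
    if len(key) < len(word):
        raise ValueError("the key must be at least as long as the word")
    sign = -1 if decrypt else 1
    return "".join(
        ALPHABET[(ALPHABET.index(c) + sign * k) % 26]
        for c, k in zip(word, key)
    )
```

Here substitute("KITTEN") returns "SDIINK", in agreement with the example above. With the (arbitrarily chosen) key [3, 1, 4, 1, 5, 9], the call vigenere("KITTEN", [3, 1, 4, 1, 5, 9]) returns "NJXUJW"; note that the two equal letters T are encrypted as two different letters, unlike in a monoalphabetic substitution.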
45 See Corollary 6.6.6 and the discussion that follows it.

3.9.3. The RSA cipher


All ciphers invented until the early 20th century are classical ciphers: ciphers that can
be operated (that is, used for both encryption and decryption) by hand, without the use
of computers. Practically all these ciphers can be broken with the help of computers
(provided the ciphertext is long enough), and thus are obsolete in the 21st century.
(Actually, breaking ciphers was one of the earliest uses of computers: The quest to
break the German Enigma cipher during World War II was a major motivation for the
development of computers in the mid-20th century.)
Modern ciphers are ciphers that require a computer to encrypt and decrypt (at
realistically useful speeds). Using the computational power of modern electronics,
they can afford to rely on much longer calculations with much larger numbers than
0, 1, . . . , 25. Quite a few modern ciphers are nowadays considered unbreakable, in the
sense that no realistic methods for breaking them are known (unless the ciphers are
used incorrectly), and there are good reasons to assume that such methods do not exist
on any currently existing hardware.
In this subsection, we will discuss one such modern cipher: the RSA cipher, devel-
oped by Rivest, Shamir and Adleman in 1977. Like a Caesarian cipher, it is based upon
division with remainder, but in a much less “predictable” way.
Unlike most classical ciphers, the RSA cipher is surprisingly robust, in the sense
that it can be used in a much less “fair-weather” situation. For most classical ciphers,
the sender and the receiver must have privately agreed on the key (e.g., the k in a
Caesarian cipher) in advance (ideally at a private vis-a-vis long before the need for
secrecy arises). If the enemy has managed to eavesdrop on this agreement, he will
know the key, and the cipher will be useless. In contrast, in the RSA cipher, the parties
can agree on their keys whenever they need them, and they can do so even over a
completely public channel (e.g., screaming at each other from the rooftops, or posting
on reddit)!
This rather counterintuitive feature is achieved by using two types of keys: public
keys (which are sent out in plain text, so that every curious outsider knows them) and
private keys (which the sender and the receiver compute individually, and don’t share
with anyone – not even with each other!).
Let me describe (while omitting some technicalities) how the RSA cipher works.
Assume that Albert and Julia want to communicate securely over a public channel
(e.g., an internet forum with no private-message functionality). They cannot hide the
fact that they are talking to each other (at least not using the RSA cipher), but they
want to hide the contents of their communications by encrypting them in a way that
no eavesdroppers can decipher. Albert and Julia have not exchanged keys in advance.
What do they do?
First, they need to set up the cipher:

• Julia tells Albert (over the public channel) that she wants to communicate, and
thus he should start creating keys.

• Albert generates two distinct large and sufficiently random primes p and q.
[What exactly does this mean, and how does he do this? With modern hardware,
“large” means approximately 300 digits or more. “Sufficiently random” means
“pseudorandom”, i.e., (roughly speaking) practically unpredictable and devoid
of discernible patterns. Large pseudorandom primes can be generated fairly fast
by various algorithms, with a bit of input from the outside world (e.g., Geiger
counters) to generate randomness.]

• Albert computes the positive integers

m := pq and ℓ : = ( p − 1) ( q − 1) .

He makes the number m public (i.e., sends it to Julia over the public channel), but
keeps the number ℓ private (even Julia does not need to know it). Eavesdroppers
will thus learn m, but will struggle to find p and q, since no fast algorithm for
factoring numbers into primes is known. (If anyone finds such an algorithm, the
RSA cipher will be broken.)

• Albert randomly picks an integer e ∈ {2, 3, . . . , ℓ − 1} that is coprime to ℓ.


[The easiest way to do so is to pick a bunch of numbers in the set {2, 3, . . . , ℓ − 1}
at random, and grab the first of them that is coprime to ℓ. Coprimality can be
checked quickly using the Euclidean algorithm. If no chosen number is coprime
to ℓ, then roll the dice again.]

• Albert computes a positive integer d such that ed ≡ 1 mod ℓ.
[How? Bezout’s theorem (Theorem 3.4.6) shows that gcd (e, ℓ) = xe + yℓ for some
integers x and y. These x and y can be computed quickly using the Extended
Euclidean Algorithm (see Subsection 3.4.4). Having found these integers x and
y, we have gcd (e, ℓ) = xe + yℓ ≡ xe = ex mod ℓ and therefore ex ≡ gcd (e, ℓ) =
1 mod ℓ (since e is coprime to ℓ). So we just take d = x if x is positive. Otherwise,
we add a suitable multiple of ℓ to x to make it positive (this does not affect the
congruence), and take the result as d.]

• Albert publishes the pair (e, m) (so that Julia knows it, and so does anyone else
who cares to listen). This pair is his public key, whereas the (secret) pair (d, ℓ)
is his private key.
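Albert's setup steps can be sketched in Python as follows. This is a toy illustration only: the primes 61 and 53 used afterwards are far too small for real use, Python's random module is not meant for cryptographic purposes, and the names bezout and make_keys are hypothetical. The variable l stands for ℓ.

```python
import random
from math import gcd


def bezout(a, b):
    """Return (x, y) with x*a + y*b == gcd(a, b), for integers a, b >= 0."""
    if b == 0:
        return 1, 0
    x, y = bezout(b, a % b)
    return y, x - (a // b) * y


def make_keys(p, q):
    """Given two distinct primes p and q, return ((e, m), (d, l))."""
    m = p * q
    l = (p - 1) * (q - 1)
    while True:
        e = random.randrange(2, l)   # a random element of {2, 3, ..., l - 1}
        if gcd(e, l) == 1:           # keep rolling until e is coprime to l
            break
    x, _ = bezout(e, l)              # x*e + y*l == 1, so e*x ≡ 1 mod l
    d = x % l                        # add a multiple of l to make d positive
    return (e, m), (d, l)
```

For example, make_keys(61, 53) returns a public key of the form (e, 3233) and a private key of the form (d, 3120) with ed ≡ 1 mod 3120; the value of e varies from run to run.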

Encrypting a message:
Now, assume that Julia wants to send a message to Albert. She encodes this message
as an element a of the set {0, 1, . . . , m − 1}. (If it does not fit into this set, she just breaks
it up into size-m chunks and encrypts each chunk separately. Note that the encoding
has to be agreed on in advance, but this can be a public method.)
She computes the remainder a^e % m and sends this remainder to Albert.
[Practical issue: To compute a^e % m fast, she should not try to compute the huge
number a^e, since there is no space in the universe to store such a huge number. Instead,
she can “work modulo m”, and use binary exponentiation. For example, to compute
a^190, she should not use the definition a^190 = a · a · · · · · a (with 190 factors) but the
much faster formula

a^190 = ((((((a^2)^2 a)^2 a)^2 a)^2 a)^2 a)^2,

and moreover, since she only needs the
remainder a^190 % m, she can reduce each intermediate result modulo m (that is, replace
it by its remainder upon division by m), so that no overly large numbers should appear
in the process.]
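Binary exponentiation with reduction modulo m can be sketched as follows. The name power_mod is illustrative; Python's built-in pow(a, n, m) computes the same remainder and would be the practical choice.

```python
def power_mod(a, n, m):
    """Compute (a ** n) % m for integers n >= 0 and m >= 1 by repeated squaring."""
    result = 1 % m
    for bit in bin(n)[2:]:                # binary digits of n, leading digit first
        result = (result * result) % m    # squaring doubles the exponent so far
        if bit == "1":
            result = (result * a) % m     # multiplying by a adds 1 to the exponent
    return result
```

For n = 190 (binary 10111110), the loop performs one squaring per binary digit and one multiplication by a for each digit 1, and thus passes through the exponents 1, 2, 5, 11, 23, 47, 95, 190. Every intermediate result is smaller than m, so no overly large numbers appear.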

Decrypting a message:
Albert receives the remainder b = a^e % m. To recover Julia’s original message a, he
just needs to take the d-th power of b and take its remainder upon division by m. In other
words,
a = b^d % m.
[Just like Julia, Albert should use binary exponentiation and work modulo m to
compute this efficiently.]

So the encryption algorithm is just “take the e-th power and then take its remainder
when divided by m”, whereas the decryption algorithm is just “take the d-th power
and then take its remainder when divided by m” (although the implementation is a bit
more complex, in order to be efficient).
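A round trip through encryption and decryption can be tried out with a toy key. The sketch below assumes p = 61 and q = 53, so that m = 3233 and ℓ = 3120, together with e = 17 and d = 2753 (indeed, 17 · 2753 = 46801 = 15 · 3120 + 1). These numbers are illustrative and far too small to be secure; the function names are hypothetical.

```python
def encrypt(a, public_key):
    """Julia's step: send a^e % m, where (e, m) is Albert's public key."""
    e, m = public_key
    return pow(a, e, m)    # built-in modular exponentiation


def decrypt(b, d, m):
    """Albert's step: recover the message as b^d % m, using his private d."""
    return pow(b, d, m)


PUBLIC_KEY = (17, 3233)    # (e, m) for the toy primes 61 and 53
PRIVATE_D = 2753           # satisfies e*d ≡ 1 mod 3120

if __name__ == "__main__":
    message = 65
    ciphertext = encrypt(message, PUBLIC_KEY)
    print(ciphertext, decrypt(ciphertext, PRIVATE_D, PUBLIC_KEY[1]))
```

The second printed number is the original message 65. Only d and m are needed for decryption; the number ℓ is used only once, when d is computed.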
Why does this work? Obviously, we need to prove the following proposition:

Proposition 3.9.3 (correctness of RSA). Let p and q be two distinct primes. Let
m = pq and ℓ = ( p − 1) (q − 1). Let e and d be two positive integers such that
ed ≡ 1 mod ℓ.
Let a and b be two numbers in {0, 1, . . . , m − 1} such that b = a^e % m. Then, a =
b^d % m.
This is not at all obvious! The RSA cipher might resemble a Caesarian cipher in that it
uses remainders, but it is different in that it takes powers instead of adding/subtracting
a fixed k.
To prove Proposition 3.9.3, we will need a lemma, which resembles Fermat’s Little
Theorem (Theorem 3.6.4):

Lemma 3.9.4. Let p and q be two distinct primes. Let N be a positive integer such
that N ≡ 1 mod ( p − 1) (q − 1). Let a be any integer. Then,

a^N ≡ a mod pq.

Example 3.9.5. Let p = 3 and q = 5 and N = 9. Then, N ≡ 1 mod ( p − 1) (q − 1), so
that Lemma 3.9.4 yields that

a^9 ≡ a mod 15 for any integer a.
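The claim of Example 3.9.5 is easy to test numerically. The following sketch uses a hypothetical helper lemma_holds; it checks the congruence for finitely many values of a only, so it is a sanity check and not a proof.

```python
def lemma_holds(p, q, N, values):
    """Check a^N ≡ a mod pq for all a in values, assuming N ≡ 1 mod (p-1)(q-1)."""
    assert (N - 1) % ((p - 1) * (q - 1)) == 0, "N must be ≡ 1 mod (p-1)(q-1)"
    m = p * q
    # Two integers are congruent modulo m exactly when their remainders agree.
    return all(pow(a, N, m) == a % m for a in values)
```

For instance, lemma_holds(3, 5, 9, range(-30, 30)) returns True. Since a^9 % 15 depends only on the remainder a % 15, checking 15 consecutive values of a would already suffice for this example.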

Proof of Lemma 3.9.4. Fermat’s little theorem (Theorem 3.6.4) says that $a^p \equiv a \mod p$
and $a^q \equiv a \mod q$. Our claim looks similar, but not quite the same. Nevertheless, we
are on the right trail.
We must prove that $a^N \equiv a \mod pq$. In other words, we must prove that $pq \mid a^N - a$.
But p and q are two distinct primes, and thus are coprime (why?⁴⁶). Hence, $pq \mid a^N - a$
would follow from the coprime divisors theorem (Theorem 3.5.4), if we can show that
$p \mid a^N - a$ and $q \mid a^N - a$.
It thus remains to prove that $p \mid a^N - a$ and $q \mid a^N - a$. We will only show $p \mid a^N - a$,
since $q \mid a^N - a$ is analogous.

⁴⁶ Proof. The only positive divisors of q are 1 and q (since q is prime). Since p is neither 1 nor q,
it thus follows that p is not a positive divisor of q. In other words, q is not a multiple of p.
Hence, the friend-or-foe lemma (Lemma 3.6.2) shows that q is coprime to p.

So we must show that $p \mid a^N - a$. In other words, we must show that $a^N \equiv a \mod p$.
However, p − 1 | ( p − 1) (q − 1) | N − 1 (since N ≡ 1 mod ( p − 1) (q − 1)). In other
words, N − 1 = ( p − 1) c for some integer c. Consider this c. It is easy to see that c ≥ 0
(why?⁴⁷), so that c ∈ N. From N − 1 = ( p − 1) c, we obtain

N = 1 + ( p − 1) c. (43)

However, recall that $a^p \equiv a \mod p$. Using this fact, we can easily see that

$a^{1+(p-1)k} \equiv a \mod p$ (44)

for each k ∈ N.
[Proof of (44): We can prove this by induction on k:
Base case (k = 0): We have $a^{1+(p-1) \cdot 0} = a^1 = a \equiv a \mod p$. Thus, (44) holds for k = 0.
Induction step: Let k ∈ N. Assume (as the induction hypothesis) that (44) holds for
k; that is, we have $a^{1+(p-1)k} \equiv a \mod p$. We must then prove that (44) holds for k + 1
instead of k; in other words, we must prove that $a^{1+(p-1)(k+1)} \equiv a \mod p$.
But ( p − 1) (k + 1) = ( p − 1) k + ( p − 1), and therefore

\begin{align*}
a^{1+(p-1)(k+1)} &= a^{1+(p-1)k+(p-1)} = a^{1+(p-1)k} a^{p-1} \\
&\equiv a a^{p-1} \qquad \left(\text{here we multiplied the congruence } a^{1+(p-1)k} \equiv a \mod p \text{ by the trivial congruence } a^{p-1} \equiv a^{p-1} \mod p\right) \\
&= a^p \equiv a \mod p.
\end{align*}

This completes the induction step. Thus, (44) is proved.]


Now, (44) (applied to k = c) yields $a^{1+(p-1)c} \equiv a \mod p$ (since c ∈ N). In view of
(43), we can rewrite this as $a^N \equiv a \mod p$. In other words, $p \mid a^N - a$. Similarly, we can
show that $q \mid a^N - a$. As explained above, this completes the proof of Lemma 3.9.4.

Proof of Proposition 3.9.3. We have $b = a^e \% m \equiv a^e \mod m$ (by Proposition 3.3.11 (a),
applied to $a^e$ and m instead of n and d). Taking this congruence to the d-th power
(using Exercise 3.2.1), we obtain

$b^d \equiv (a^e)^d = a^{ed} \mod m$.

But ed ≡ 1 mod ℓ, that is, ed ≡ 1 mod ( p − 1) (q − 1) (since ℓ = ( p − 1) (q − 1)). Hence,
Lemma 3.9.4 (applied to N = ed) yields

$a^{ed} \equiv a \mod pq$.

In other words, $a^{ed} \equiv a \mod m$ (since pq = m). Combining what we have shown, we
obtain
$b^d \equiv a^{ed} \equiv a \mod m$.
Therefore, Proposition 3.3.16 (applied to $b^d$, a and m instead of a, b and d) yields
$b^d \% m = a \% m = a$ (since a ∈ {0, 1, . . . , m − 1}). This proves Proposition 3.9.3.

⁴⁷ Proof. Assume the contrary. Thus, c < 0. But p > 1 (since p is prime), so that p − 1 > 0.
Hence, ( p − 1) c < 0 (since c < 0). Thus, N − 1 = ( p − 1) c < 0, so that N < 1. But
N ≥ 1 (since N is a positive integer). This is in obvious contradiction to N < 1. Hence, our
assumption was false, qed.

The RSA cipher, as demonstrated above, lets Julia send secret messages to Albert. If
Albert wants to respond secretly, the two can switch roles (i.e., now Julia must set up
her two primes p′ and q′ and her m′ , ℓ′ , e′ and d′ , publish her public key (e′ , m′ ), and
let Albert encrypt his message using that public key).
The RSA cipher is not hard to implement in your favorite programming language,
provided that it supports sufficiently big integers. But there are some practical consid-
erations:

• You need sufficiently random primes. (Generally, any cipher requires something
sufficiently random that the eavesdroppers cannot guess.)

• Certain primes make for bad choices of p and q, since they allow certain tricks
for computing d. You want to avoid such primes.

• You want to avoid certain practical “side channels” (as with any cipher).

• You don’t want your message a to be much smaller than m. If it is, pad it with
random bits.

These and many other caveats are discussed on the Wikipedia page for the RSA
cipher, as well as in more serious textbooks on modern cryptography (e.g., [BaEdHa18],
[Buchma04] or [HoPiSi14]).
The RSA cipher can be used not just for encrypting secret messages, but also for
authentication (i.e., proving that a message is really coming from you). See [HoPiSi14,
Chapter 4] or [Buchma04, Chapter 12] for this application.
There are many other modern ciphers. In particular, elliptic curve cryptography (see
[Buchma04, §13.2] or [HoPiSi14, Chapter 6]) can be viewed as a more intricate version
of the RSA cipher.

Exercise 3.9.3. Prove the following three-primes version of Lemma 3.9.4:


Let p, q and r be three distinct primes. Let N be a positive integer such that
N ≡ 1 mod ( p − 1) (q − 1) (r − 1). Let a be any integer. Then,

$a^N \equiv a \mod pqr$.

4. An informal introduction to enumeration


Enumeration is a fancy word for counting – i.e., answering questions of the
form “how many things of a certain type are there?”. Here are some examples
of counting problems:

• How many ways are there to choose 3 odd integers between 0 and 20,
if the order matters (i.e., we count the choice 1, 3, 5 as different from the
choice 3, 1, 5)? (The answer is 1000.)

• How many ways are there to choose 3 odd integers between 0 and 20, if
the order does not matter? (The answer is 220.)

• How many ways are there to choose 3 distinct odd integers between 0 and
20, if the order matters? (The answer is 720.)

• How many ways are there to choose 3 distinct odd integers between 0 and
20, if the order does not matter? (The answer is 120.)

• How many prime factorizations does 200 have (where we count different
orderings as distinct)? (The answer is 10. This is a mix between a number
theory problem and a counting problem.)

• How many ways are there to tile a 2 × 15-rectangle with dominos (i.e.,
rectangles of size 1 × 2 or 2 × 1)? (The answer is 987. For instance, the
tiling
[Figure: a domino tiling of the 2 × 15-rectangle]
is one of these 987 ways.)

• How many addends do you get when you expand the product
(a + b) (c + d + e) (f + g)? (The answer is 12.)

• How many different monomials do you get when you expand the product
$(a - b)(a^2 + ab + b^2)$? (This one is more of an algebra problem, but I
wanted to list it because it is connected to counting. The answer is 2,
because $(a - b)(a^2 + ab + b^2) = a^3 - b^3$.)

• How many positive divisors does 24 have? (The answer is 8. We can actually list
them: 1, 2, 3, 4, 6, 8, 12, 24. This one is again a mix of a counting problem and
a number theory problem.)

We will first solve a few basic counting problems informally, and then (in
Chapter 6) make the underlying concepts rigorous.
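
Several of the answers quoted above are small enough to be confirmed by brute force. The following sketch uses Python's itertools module to list all the choices and count them; the variable names are assumptions of this sketch, and this is merely a sanity check, not the method we will develop.

```python
from itertools import (product, permutations, combinations,
                       combinations_with_replacement)

# The odd integers between 0 and 20 are 1, 3, ..., 19 (ten numbers).
odds = [x for x in range(0, 21) if x % 2 == 1]

# 3 odd integers, order matters, repetition allowed:
ordered_any = sum(1 for _ in product(odds, repeat=3))
# 3 odd integers, order does not matter, repetition allowed:
unordered_any = sum(1 for _ in combinations_with_replacement(odds, 3))
# 3 distinct odd integers, order matters:
ordered_distinct = sum(1 for _ in permutations(odds, 3))
# 3 distinct odd integers, order does not matter:
unordered_distinct = sum(1 for _ in combinations(odds, 3))

# The positive divisors of 24:
divisors_of_24 = [d for d in range(1, 25) if 24 % d == 0]
```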

4.1. A refresher on sets


In prerequisite courses, you have seen basic properties of sets, and basic nota-
tions around sets, but let me quickly remind you of them.
Formally, the notion of a set is fundamental and cannot be defined.
Informally, a set is a collection of objects (which can be anything: num-
bers, matrices, functions or other sets) that knows which objects it contains and
which objects it doesn’t.
That is, if S is a set and p is any object, then S can either contain p (in which
case we write p ∈ S) or not contain p (in which case we write p ∉ S). There is
no such thing as “containing p twice”.
The objects that a set S contains are called the elements of S; they are said to
belong to S (or lie in S, or be contained in S).
A set can be finite or infinite (i.e., contain finitely or infinitely many elements).
It can be empty (i.e., contain nothing) or nonempty (i.e., contain at least one
element).
An example of a set is the set of all odd integers. This is the set that contains
each odd integer and no other objects. Generally, “the set of X” means the set
that contains X and nothing else.
When a set is finite, it can be written by listing all its elements. For example,
the set of all odd integers between 0 and 10 can be written as

{1, 3, 5, 7, 9} .
The braces { and } around the list are there to signal that we mean the set of all
the elements, not the single elements themselves. These braces are called “set
braces”, and are involved in several different notations for sets.
Some more examples of finite sets are

{1, 2, 3, 4, 5} ,
{1, 2} ,
{1} (this is the set that only contains 1) ,
{} (the empty set, also denoted ∅) ,
{1, 2, . . . , 1000} (you understand what “ . . . ” means here) .
Some infinite sets can also be written in this form:

{1, 2, 3, . . .} (this is the set of all positive integers) ,


{0, 1, 2, . . .} (this is the set of all nonnegative integers) ,
{4, 5, 6, . . .} (this is the set of all integers ≥ 4) ,
{−1, −2, −3, . . .} (this is the set of all negative integers) ,
{. . . , −2, −1, 0, 1, 2, . . .} (this is the set of all integers) .
Some others cannot. For example, how would you list all the real numbers? Or
even all the rational numbers?

Another way to describe a set is just by putting a description of its elements
in set braces. For example:

{all integers} (this is the set of all integers) ,


{all integers between 3 and 9 inclusive} ,
{all real numbers} .
Often, you want to define a set that contains all objects of a certain type that
satisfy a certain condition. For example, let’s say you want the set of all integers
x that satisfy $x^2 < 13$. There is a notation for this:

$\{ x \text{ is an integer} \mid x^2 < 13 \}$.

The vertical bar | here should be read as “such that” (don’t mistake it for a
divisibility or absolute value bracket). The part before this bar says what type
of objects you are considering (in our case, it is the integers x); the part after this
bar imposes a condition (or several) on these objects (in our case, the condition
is $x^2 < 13$). What you get is the set of all objects of the former type that satisfy
the latter condition. For instance,
\begin{align*}
\{ x \text{ is an integer} \mid x^2 < 13 \}
&= \{\text{all integers whose square is smaller than } 13\} \\
&= \{-3, -2, -1, 0, 1, 2, 3\} .
\end{align*}
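
Python has a very similar notation, the set comprehension. Here is a small sketch; note that, unlike the mathematical notation, Python cannot run through all integers, so this sketch only searches the range from −10 to 10 (an assumption that is harmless here, since any integer x with x² < 13 lies between −3 and 3).

```python
# {x is an integer | x^2 < 13}, searching only the integers from -10 to 10.
small_squares = {x for x in range(-10, 11) if x * x < 13}
```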
Some authors write a colon (:) instead of the vertical bar |. Thus, they write
$\{ x \text{ is an integer} \mid x^2 < 13 \}$ as $\{ x \text{ is an integer} : x^2 < 13 \}$.


Some sets have standard names:

Z = {all integers} = {. . . , −2, −1, 0, 1, 2, . . .} ;


N = {all nonnegative integers} = {0, 1, 2, . . .}
(beware that some authors use N for {1, 2, 3, . . .} instead) ;
Q = {all rational numbers} ;
R = {all real numbers} (you barely need them in this course) ;
C = {all complex numbers} (you don’t need them in this course) ;
∅ = {} (this is the empty set) .
Using these notations, we can rewrite
$\{ x \text{ is an integer} \mid x^2 < 13 \}$ as $\{ x \in \mathbb{Z} \mid x^2 < 13 \}$.

Yet another way of defining sets is when you let a variable range over a given
set and collect certain derived quantities. For example,

$\{ x^2 + 2 \mid x \in \{1, 3, 5, 7, 9\} \}$

means the set whose elements are the numbers $x^2 + 2$ for all x ∈ {1, 3, 5, 7, 9}.
Thus,
\begin{align*}
\{ x^2 + 2 \mid x \in \{1, 3, 5, 7, 9\} \}
&= \{ 1^2 + 2, \ 3^2 + 2, \ 5^2 + 2, \ 7^2 + 2, \ 9^2 + 2 \} \\
&= \{3, 11, 27, 51, 83\} .
\end{align*}

In general, if S is a given set, then the notation

{an expression | x ∈ S}

stands for the set whose elements are the values of the given expression for all
x ∈ S.
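Some more examples of this:
\begin{align*}
\left\{ \frac{x+1}{x} \mid x \in \{1, 2, 3, 4, 5\} \right\}
&= \left\{ \frac{1+1}{1}, \frac{2+1}{2}, \frac{3+1}{3}, \frac{4+1}{4}, \frac{5+1}{5} \right\} \\
&= \left\{ 2, \frac{3}{2}, \frac{4}{3}, \frac{5}{4}, \frac{6}{5} \right\}
\end{align*}
and
\begin{align*}
\{ x^2 \% 5 \mid x \in \mathbb{N} \}
&= \{ 0^2 \% 5, \ 1^2 \% 5, \ 2^2 \% 5, \ 3^2 \% 5, \ 4^2 \% 5, \ 5^2 \% 5, \ 6^2 \% 5, \ \ldots \} \\
&= \{0, 1, 4, 4, 1, 0, 1, 4, 4, 1, 0, 1, 4, 4, 1, 0, \ldots\} .
\end{align*}

Note that the remainders $x^2 \% 5$ repeat every five steps, because every integer
x satisfies $(x + 5)^2 \equiv x^2 \mod 5$ and thus $(x + 5)^2 \% 5 = x^2 \% 5$ (by Proposition
3.3.16).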
Let me stress once again that a set cannot contain an element more than once.
Also, sets do not come with an ordering of their elements. Thus,

{1, 2} = {2, 1} = {2, 1, 1} = {1, 2, 1, 2, 1} ,

since each of these four sets contains 1 and 2 and nothing else. If S is a set
and p is an object, then S either contains p or does not contain p; it cannot
“contain p twice”, nor can it contain an element “before” another. So when
you write {2, 1, 1}, you aren’t making a set that contains 1 twice; you are just
saying twice that it contains 1, and this is equivalent to saying the same thing
once. Likewise, the sets {1, 2} and {2, 1} do not “contain 1 and 2 in different
orders”; you are just saying in different orders that they contain 1 and 2, but
the meaning is the same. So
\begin{align*}
\{ x^2 \% 5 \mid x \in \mathbb{N} \}
&= \{0, 1, 4, 4, 1, 0, 1, 4, 4, 1, 0, 1, 4, 4, 1, 0, \ldots\} \\
&= \{0, 1, 4\} .
\end{align*}

This is a finite set, even though N is infinite!
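
Python's built-in sets behave in the same way: they ignore repetitions and order. The following sketch illustrates this, along with the "derived quantities" notation. Since Python cannot run through all of N, the last line only uses x = 0, 1, ..., 999 (an assumption of this sketch; by the periodicity noted above, this does not lose any remainders).

```python
same_sets = ({1, 2} == {2, 1} == {2, 1, 1} == {1, 2, 1, 2, 1})

# {x^2 + 2 | x in {1, 3, 5, 7, 9}}
squares_plus_two = {x**2 + 2 for x in {1, 3, 5, 7, 9}}

# {x^2 % 5 | x in N}, restricted to x = 0, 1, ..., 999
square_remainders = {x**2 % 5 for x in range(1000)}
```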



Note that sets can contain any mathematical objects, not just numbers. In
particular, they can contain other sets. Make sure you understand what the
sets
{1, 2, 3} , {{1, 2, 3}} , {{1, 2} , {3}} , {{1} , {2} , {3}}
are and why they are different.⁴⁸

Sets can be compared and combined in several ways:

Definition 4.1.1. Let A and B be two sets.


(a) We say that A is a subset of B (and we write A ⊆ B) if every element
of A is an element of B.
(b) We say that A is a superset of B (and we write A ⊇ B) if every element
of B is an element of A. This is tantamount to saying B ⊆ A.
(c) We say that A = B if the sets A and B contain the same elements. This
is tantamount to saying that both A ⊆ B and A ⊇ B hold.
(d) We define the union of A and B to be the set

A ∪ B := {all elements that are contained in A or B}


= { x | x ∈ A or x ∈ B} .

(The “or” is non-exclusive, as usual. So this includes the elements that are
contained in both A and B.)
(e) We define the intersection of A and B to be the set

A ∩ B := {all elements that are contained in both A and B}


= { x | x ∈ A and x ∈ B} .

(f) We define the set difference of A and B to be the set

A \ B := {all elements that are contained in A but not in B}


= { x | x ∈ A and x ∉ B} = { x ∈ A | x ∉ B} .

This is also denoted by A − B by certain authors.


(g) We say that A and B are disjoint if A ∩ B = ∅ (that is, A and B have
no element in common).
⁴⁸ Answer:

• The set {1, 2, 3} contains three elements, namely the numbers 1, 2 and 3.
• The set {{1, 2, 3}} contains one element, namely the set {1, 2, 3}.
• The set {{1, 2} , {3}} contains two elements, namely the sets {1, 2} and {3}.
• The set {{1} , {2} , {3}} contains three elements, namely the sets {1}, {2} and {3}.

For example,

{1, 3, 5} ⊆ {1, 2, 3, 4, 5} ,
{1, 2, 3, 4, 5} ⊇ {1, 3, 5} ,
we don’t have {5, 6, 7} ⊆ {1, 2, 3, 4, 5} ,
{1, 2, 3} = {3, 2, 1} ,
{1, 3, 5} ∪ {3, 6} = {1, 3, 5, 3, 6} = {1, 3, 5, 6} ,
{1, 3, 5} ∩ {3, 6} = {3} ,
{1, 2, 4} ∩ {3, 5} = ∅ (so that the sets {1, 2, 4} and {3, 5} are disjoint) ,
{1, 3, 5} \ {3, 6} = {1, 5} ,
{3, 6} \ {1, 3, 5} = {6} ,
Z \ N = {−1, −2, −3, . . .} = {all negative integers} .
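
The finite examples above can be replayed with Python's built-in sets, whose operators correspond to the notions of Definition 4.1.1 (<= for ⊆, >= for ⊇, | for ∪, & for ∩, and - for \). The variable names below are assumptions of this sketch.

```python
A = {1, 3, 5}
B = {3, 6}
C = {1, 2, 3, 4, 5}

is_subset = A <= C                  # A ⊆ C
is_superset = C >= A                # C ⊇ A
not_subset = {5, 6, 7} <= C         # False, since 6 and 7 are not in C
union = A | B                       # A ∪ B
intersection = A & B                # A ∩ B
difference = A - B                  # A \ B
reverse_difference = B - A          # B \ A
disjoint = {1, 2, 4}.isdisjoint({3, 5})
```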

Definition 4.1.2. Several sets $A_1, A_2, \ldots, A_k$ are said to be disjoint if any two
of them (not counting a set and itself) are disjoint, i.e., if we have $A_i \cap A_j = \varnothing$
for all i < j.

For example, the three sets {1, 2}, {5} and {0, 7} are disjoint. On the other
hand, the three sets {1, 2}, {5} and {2, 3} are not (since {1, 2} ∩ {2, 3} = {2} ≠ ∅).
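
Definition 4.1.2 translates directly into a check over all pairs i < j. Here is a minimal sketch; the function name are_disjoint is an assumption of this sketch.

```python
from itertools import combinations

def are_disjoint(sets):
    """Check whether the given sets are disjoint in the sense of Definition 4.1.2,
    i.e., whether any two of them (at different positions) have no common element."""
    # combinations(sets, 2) runs over all pairs (sets[i], sets[j]) with i < j.
    return all(A.isdisjoint(B) for A, B in combinations(sets, 2))

first_example = are_disjoint([{1, 2}, {5}, {0, 7}])
second_example = are_disjoint([{1, 2}, {5}, {2, 3}])
```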

4.2. Counting, informally


Now, let us see how the elements of a set can be counted. Formally speaking,
we will define “counting” later, so we will play around with not-quite-rigorous
concepts for now. As long as we are working with finite sets, your intuitive
understanding of “counting” should not mislead you.
For example, the set of all odd integers between 0 and 10 has 5 elements
(1, 3, 5, 7, 9), and this doesn’t change if you write it redundantly as {1, 3, 5, 5, 5, 5, 7, 9}.
In other words, there are 5 odd integers between 0 and 10.
More generally, I claim:

Proposition 4.2.1. Let n ∈ N. Then, there are exactly $(n + 1) // 2 = \left\lfloor \frac{n+1}{2} \right\rfloor$
odd integers between 0 and n (inclusive).

Informal proof. The equality $(n + 1) // 2 = \left\lfloor \frac{n+1}{2} \right\rfloor$ follows from Proposition
3.3.14. It remains to show that there are exactly $\left\lfloor \frac{n+1}{2} \right\rfloor$ odd integers between
0 and n. (We shall always understand the word “between” to be inclusive, so
that n itself is counted if n is odd.)
We prove this by induction on n:
Base case: For n = 0, the claim is true, because there are $0 = \left\lfloor \frac{0+1}{2} \right\rfloor$ odd
integers between 0 and 0.
Induction step: Let n be a positive integer. Assume (as the induction hypothesis)
that the claim is true for n − 1. That is, assume that there are exactly $\left\lfloor \frac{n}{2} \right\rfloor$
odd integers between 0 and n − 1. We must show that the claim also holds for
n, i.e., that there are exactly $\left\lfloor \frac{n+1}{2} \right\rfloor$ odd integers between 0 and n.
Let me introduce a shorthand: The symbol “#” shall mean “number”. Thus,
our induction hypothesis says

(# of odd integers between 0 and n − 1) = $\left\lfloor \frac{n}{2} \right\rfloor$, (45)

and our goal is to prove that

(# of odd integers between 0 and n) = $\left\lfloor \frac{n+1}{2} \right\rfloor$.

We are in one of the following two cases:
Case 1: The number n is even.
Case 2: The number n is odd.
Let us consider Case 1 first. In this case, n is even. Thus, n is not odd.
Therefore, the odd integers between 0 and n are precisely the odd integers
between 0 and n − 1 (since the extra integer n does not qualify as odd). Hence,
(# of odd integers between 0 and n)
= (# of odd integers between 0 and n − 1)
= $\left\lfloor \frac{n}{2} \right\rfloor$ (by (45)) . (46)
However, n + 1 is odd (since n is even), and thus 2 ∤ n + 1. Therefore, Corollary
3.3.19 (b) (applied to 2 and n + 1 instead of d and n) yields
$\left\lfloor \frac{n+1}{2} \right\rfloor = \left\lfloor \frac{(n+1)-1}{2} \right\rfloor = \left\lfloor \frac{n}{2} \right\rfloor$.
Comparing this with (46), we find

(# of odd integers between 0 and n) = $\left\lfloor \frac{n+1}{2} \right\rfloor$.

Thus, we have achieved our goal in Case 1.
Let us now consider Case 2. In this case, n is odd. Thus, the odd integers
between 0 and n are precisely the odd integers between 0 and n − 1 along with
the new odd integer n. Hence,
(# of odd integers between 0 and n)
= (# of odd integers between 0 and n − 1) + 1
= $\left\lfloor \frac{n}{2} \right\rfloor + 1$ (by (45)) . (47)
However, n + 1 is even (since n is odd), and thus 2 | n + 1. Therefore, Corollary
3.3.19 (a) (applied to 2 and n + 1 instead of d and n) yields
$\left\lfloor \frac{n+1}{2} \right\rfloor = \left\lfloor \frac{(n+1)-1}{2} \right\rfloor + 1 = \left\lfloor \frac{n}{2} \right\rfloor + 1$.
Comparing this with (47), we find

(# of odd integers between 0 and n) = $\left\lfloor \frac{n+1}{2} \right\rfloor$.

Thus, we have achieved our goal in Case 2.
So the goal has been achieved in either case, and the induction step is com-
plete. This proves Proposition 4.2.1.
Note: We called the above proof “informal” because we still don’t have a
rigorous definition of the size of a set (i.e., of what “the number of” means).
But we will soon see such a definition. Once we have learnt this definition and
its basic properties, the above proof will become a formal proof with trivial
changes.
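
Proposition 4.2.1 is also easy to test by direct counting for small n. The following sketch does this; the function name count_odd_integers is an assumption of this sketch, and Python's // operator is the same quotient operation as in the proposition.

```python
def count_odd_integers(n):
    """Count the odd integers between 0 and n (inclusive) by checking each one."""
    return sum(1 for x in range(0, n + 1) if x % 2 == 1)

# Compare the direct count with the formula (n + 1) // 2 for n = 0, 1, ..., 199.
formula_agrees = all(count_odd_integers(n) == (n + 1) // 2 for n in range(200))
```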
Incidentally, it is worth stating the formula for the number of integers (not
just odd integers) in a given interval. Before we state it, let us agree that if a
and b are two integers, then the notation
{ a, a + 1, a + 2, . . . , b} (or, shorter, { a, a + 1, . . . , b})
stands for the set of all integers between a and b (inclusive), i.e., the set
{ x ∈ Z | a ≤ x ≤ b}. In particular, this set is just { a} if a = b, and is empty if
a > b. The following proposition gives its size whenever it is nonempty:
Proposition 4.2.2. Let a, b ∈ Z be such that a ≤ b + 1.
Then, there are exactly b − a + 1 numbers in the set
{ a, a + 1, a + 2, . . . , b}. In other words, there are exactly b − a + 1
integers between a and b (inclusive).
Informal proof. This is intuitively obvious and can be rigorously proved by in-
duction on b.
The hard part about Proposition 4.2.2 is not the proof, but rather remember-
ing the “+1”! If your intuition comes from calculus, you think of the interval
[ a, b] as having length b − a (if b ≥ a). But since we are doing discrete mathemat-
ics, we are computing not the geometric length of this interval, but rather the
number of integers on this interval, including both endpoints; and this number
is 1 larger than the length. (For example, if a = b, then the geometric interval
[ a, b] has zero length, but it contains one integer, namely a.)
It is also worth saying that if two integers a and b satisfy a ≤ b − 1, then there
are exactly b − a − 1 integers between a and b exclusive (meaning that we count
neither a nor b).
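
If you want to convince yourself of the “+1” (and of the “−1” in the exclusive case), a quick experiment helps. In the sketch below, the function names are assumptions of this sketch; note that Python's range(a, b + 1) includes a but stops before b + 1, so it lists exactly the integers between a and b inclusive.

```python
def integers_between(a, b):
    """List all integers x with a <= x <= b (the list is empty if a > b)."""
    return list(range(a, b + 1))

def integers_strictly_between(a, b):
    """List all integers x with a < x < b."""
    return list(range(a + 1, b))

inclusive_ok = all(len(integers_between(a, b)) == b - a + 1
                   for a in range(-5, 6) for b in range(a - 1, a + 10))
exclusive_ok = all(len(integers_strictly_between(a, b)) == b - a - 1
                   for a in range(-5, 6) for b in range(a + 1, a + 10))
```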

Convention 4.2.3. We agree to use the symbol “#” for “number”.

4.3. Counting subsets


4.3.1. Counting them all
Now, let us count something less trivial than numbers.
How many subsets does the set {1, 2, 3} have? These subsets are

{} , {1} , {2} , {3} ,


{1, 2} , {1, 3} , {2, 3} , {1, 2, 3} .
(Yes, every set A satisfies A ⊆ A and {} ⊆ A.) Thus, there are 8 subsets of
{1, 2, 3} in total.
Likewise,

• there are 4 subsets of {1, 2}, namely {} , {1} , {2} , {1, 2}.
• there are 2 subsets of {1}, namely {} and {1}.
• there is 1 subset of {}, namely {}.
• there are 16 subsets of {1, 2, 3, 4}.

The pattern here is hard to miss:⁴⁹

Theorem 4.3.1. Let n ∈ N. Then,

(# of subsets of {1, 2, . . . , n}) = $2^n$.

⁴⁹ The expression “{1, 2, . . . , n}” should be read as {1, 2} if n = 2, as {1} if n = 1, and as the
empty set {} if n = 0.

Informal proof. We induct on n.
The base case (n = 0) is easy: The set {1, 2, . . . , 0} is empty, and thus its only
subset is {} itself; hence, the # of subsets of {1, 2, . . . , 0} is $1 = 2^0$.
Induction step: We proceed from n − 1 to n. Thus, let n be a positive integer.
We assume (as the induction hypothesis) that Theorem 4.3.1 holds for n − 1
instead of n, and we set out to prove that it holds for n.
So our induction hypothesis says that

(# of subsets of {1, 2, . . . , n − 1}) = $2^{n-1}$.

Our goal is to prove that

(# of subsets of {1, 2, . . . , n}) = $2^n$.

We define


• a red set to be a subset of {1, 2, . . . , n} that contains n;


• a green set to be a subset of {1, 2, . . . , n} that does not contain n.
For example, if n = 3, then the red sets are
{3} , {1, 3} , {2, 3} , {1, 2, 3} ,
whereas the green sets are
{} , {1} , {2} , {1, 2} .
Each subset of {1, 2, . . . , n} is either red or green, but not both. Hence,
(# of subsets of {1, 2, . . . , n}) = (# of red sets) + (# of green sets) .
(This is an instance of a basic counting principle: If some objects are classified
into two types, then we can count these objects by counting the objects of each
type and adding the results. Later we will state this as a rigorous theorem,
called the sum rule for two sets.)
Thus it remains to count the red sets and the green sets separately.
The green sets are easy: They are just the subsets of {1, 2, . . . , n − 1}. Hence,

(# of green sets) = (# of subsets of {1, 2, . . . , n − 1}) = $2^{n-1}$

(by the induction hypothesis).
Counting the red sets is trickier, but we can reduce the problem to counting
the green sets: Indeed, the red sets are just the green sets with the element n
inserted into them. To be more precise: Each green set can be turned into a
red set by inserting n into it⁵⁰. Conversely, each red set can be turned into a
green set by removing the element n from it. These two operations are mutually
inverse, and thus set up a one-to-one correspondence between the green sets
and the red sets.⁵¹ This reveals that the # of red sets is the # of green sets. Thus,

(# of red sets) = (# of green sets) = $2^{n-1}$.

Combining what we have shown, we now obtain
\begin{align*}
(\text{\# of subsets of } \{1, 2, \ldots, n\})
&= \underbrace{(\text{\# of red sets})}_{= 2^{n-1}} + \underbrace{(\text{\# of green sets})}_{= 2^{n-1}} \\
&= 2^{n-1} + 2^{n-1} = 2 \cdot 2^{n-1} = 2^n .
\end{align*}
This is precisely what we needed to prove. This completes the induction step,
and thus Theorem 4.3.1 is proved.

⁵⁰ For example, if n = 3, then the green set {2} becomes {2, 3} in this way.
⁵¹ For instance, for n = 3, it looks like this:

green set    {}     {1}       {2}       {1, 2}
             ↕      ↕         ↕         ↕
red set      {3}    {1, 3}    {2, 3}    {1, 2, 3}

More generally, we have the following:

Theorem 4.3.2. Let n ∈ N. Let S be an n-element set. Then,

(# of subsets of S) = $2^n$.

Informal proof. This follows from Theorem 4.3.1, since we can rename the n elements
of S as 1, 2, . . . , n.
For example,

(# of subsets of {“cat”, “dog”, “bat”}) = $2^3$.
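
The red/green argument from the proof of Theorem 4.3.1 also tells us how to list all subsets of a set, not just how to count them. Here is a minimal recursive sketch in Python; the function name all_subsets is an assumption of this sketch, and it expects its input as a list of distinct elements (the last element plays the role of n).

```python
def all_subsets(elements):
    """List all subsets of a set, given as a list of distinct elements.

    Mirrors the proof of Theorem 4.3.1: the "green" subsets avoid the last
    element, and the "red" ones are the green ones with that element inserted.
    """
    if not elements:
        return [set()]                      # the empty set has only one subset
    *rest, last = elements
    green_sets = all_subsets(rest)          # subsets that do not contain `last`
    red_sets = [green | {last} for green in green_sets]
    return green_sets + red_sets

pets = all_subsets(["cat", "dog", "bat"])
```

The list pets then has 2³ = 8 entries, in agreement with Theorem 4.3.2.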

4.3.2. Counting the subsets of a given size


Let us now refine our question: Instead of counting all subsets of {1, 2, . . . , n},
we shall only count the ones that have a given size k. Here, the size of a
set means the # of its elements, i.e., how many distinct elements it has. (For
example, the set {1, 4, 1, 15} has size 3, never mind that I needlessly listed one
of its elements twice.) A set of size k is also known as a k-element set. (Soon
we will define these concepts rigorously.)
For instance, {1, 2, 3, 4} is a 4-element set. How many 2-element subsets does
it have? It has six:

{1, 2} , {1, 3} , {1, 4} , {2, 3} , {2, 4} , {3, 4} .

More generally, the answer to the question “how many k-element subsets
does a given n-element set have” turns out to be the binomial coefficient $\binom{n}{k}$.
Let us state this as a theorem and give an informal proof (which will easily
become rigorous once we have the basic concepts of counting pinned down):⁵²

Theorem 4.3.3. Let n ∈ N, and let k be any number (not necessarily an
integer). Let S be an n-element set. Then,

(# of k-element subsets of S) = $\binom{n}{k}$.

⁵² This theorem is exactly Theorem 2.5.10, which we left unproved a few chapters ago.

Informal proof. We induct on n (without fixing k). That is, we use induction on
n to prove the statement

P (n) := “for any number k and any n-element set S,
we have (# of k-element subsets of S) = $\binom{n}{k}$”

for each n ∈ N.
Base case: Let us prove P (0). Let k be any number. The only 0-element set
is ∅, and its only subset is ∅. Thus, a 0-element set S necessarily has one
0-element subset (∅) and no other subsets. Hence, it satisfies

(# of k-element subsets of S) = $\begin{cases} 1, & \text{if } k = 0; \\ 0, & \text{else.} \end{cases}$

However, we also have

$\binom{0}{k} = \begin{cases} 1, & \text{if } k = 0; \\ 0, & \text{else} \end{cases}$

(this follows easily from the definition of binomial coefficients). By comparing
these two equalities, we see that any 0-element set S satisfies

(# of k-element subsets of S) = $\binom{0}{k}$.

In other words, P (0) holds.


Induction step: Let n be a positive integer. Assume (as the induction hypoth-
esis) that P (n − 1) holds. We must prove that P (n) holds.
So we consider any number k and any n-element set S. We must prove that

(# of k-element subsets of S) = $\binom{n}{k}$.

We rename the n elements of S as 1, 2, . . . , n, so we must prove that

(# of k-element subsets of {1, 2, . . . , n}) = $\binom{n}{k}$.

To prove this, we define

• a red set to be a k-element subset of {1, 2, . . . , n} that contains n;

• a green set to be a k-element subset of {1, 2, . . . , n} that does not contain n.

For instance:


• For n = 4 and k = 2, the red sets are


{1, 4} , {2, 4} , {3, 4} ,
while the green sets are
{1, 2} , {1, 3} , {2, 3} .
• For n = 5 and k = 2, the red sets are
{1, 5} , {2, 5} , {3, 5} , {4, 5} ,
while the green sets are
{1, 2} , {1, 3} , {1, 4} , {2, 3} , {2, 4} , {3, 4} .
Each k-element subset of {1, 2, . . . , n} is either red or green (but not both).
Hence,
(# of k-element subsets of {1, 2, . . . , n})
= (# of red sets) + (# of green sets) . (48)
The green sets are just the k-element subsets of {1, 2, . . . , n − 1}. Thus,
(# of green sets) = (# of k-element subsets of {1, 2, . . . , n − 1}) = $\binom{n-1}{k}$
(by the statement P (n − 1), which we have assumed to hold).
Now, let’s try to count the red sets.
If T is a red set, then T \ {n} is a (k − 1)-element subset of {1, 2, . . . , n − 1}.
Let us refer to the (k − 1)-element subsets of {1, 2, . . . , n − 1} as blue sets.
Thus, if T is a red set, then T \ {n} is a blue set. Conversely, if U is a blue set,
then U ∪ {n} is a red set. This sets up a one-to-one correspondence between
the red sets and the blue sets: We turn red sets into blue sets by removing the
element n, and conversely we turn blue sets red by inserting the element n into
the set.⁵³ Hence,

(# of red sets) = (# of blue sets)
= (# of (k − 1)-element subsets of {1, 2, . . . , n − 1})
(since this is how the blue sets were defined)
= $\binom{n-1}{k-1}$

(again by the statement P (n − 1), but now applied to k − 1 instead of k). Note
that we deliberately did not fix k in our induction, so that we were now able to
apply P (n − 1) to k − 1 instead of k.
Now, (48) becomes
\begin{align*}
(\text{\# of } k\text{-element subsets of } \{1, 2, \ldots, n\})
&= \underbrace{(\text{\# of red sets})}_{= \binom{n-1}{k-1}} + \underbrace{(\text{\# of green sets})}_{= \binom{n-1}{k}} \\
&= \binom{n-1}{k-1} + \binom{n-1}{k} = \binom{n}{k}
\end{align*}
by Pascal’s recurrence (Theorem 2.5.1). But this is precisely the equality that
we have to prove. This completes the induction step, and thus Theorem 4.3.3 is
proved.

⁵³ For instance, for n = 4 and k = 2, this correspondence looks like this:

red set      {1, 4}    {2, 4}    {3, 4}
             ↕         ↕         ↕
blue set     {1}       {2}       {3}

The above proof can also be used to write an algorithm that lists all the k-
element subsets of {1, 2, . . . , n}. This algorithm is recursive and proceeds as
follows:

• If n = 0, then:
– if k = 0, then list ∅ (i.e., the resulting list will consist only of ∅).
– otherwise, list nothing.

• Otherwise,
– list the red sets (by listing all the (k − 1)-element subsets of {1, 2, . . . , n − 1},
and inserting n into each of them);
– list the green sets (i.e., the k-element subsets of {1, 2, . . . , n − 1});
– combine these two lists.

In Python, this algorithm (or one possible implementation of it) looks as
follows⁵⁴:

    def subsets(n, k):
        # listing all subsets of {1, 2, ..., n} that have size k.
        if n == 0:
            if k == 0:
                return [set([])]  # set([]) is the empty set
            return []  # empty list
        # Now, the case when n is not 0:
        green_sets = subsets(n-1, k)
        # This is the list of all green sets.
        red_sets = [U.union([n]) for U in subsets(n-1, k-1)]
        # This is the list of all red sets. We construct it by
        # taking all the (k-1)-element subsets of {1, 2, ..., n-1}
        # (i.e., the blue sets), and inserting n into each of
        # them.
        return red_sets + green_sets
        # In Python, the plus sign can be used to combine two lists.

With this code, subsets(4, 2) yields

    [{3, 4}, {2, 4}, {1, 4}, {2, 3}, {1, 3}, {1, 2}]

as an output, and this is indeed a list of all 2-element subsets of {1, 2, . . . , 4} =
{1, 2, 3, 4}.

⁵⁴ Note that lists are enclosed within brackets in Python: e.g., a list that we call (a, b, c) would
be written [a,b,c] in Python. Also, Python’s notation set([a,b,c]) corresponds to our
{a, b, c}.

Theorem 4.3.3 is often called the combinatorial interpretation of binomial
coefficients, since it reveals that the binomial coefficients $\binom{n}{k}$ (at least for n ∈
N) have a combinatorial meaning (viz., counting k-element subsets of a given
n-element set). However, it is just one of many such interpretations, and we
will see four others in Chapter 6!
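
Theorem 4.3.3 can be tested for small n using Python's standard library, which already knows how to list k-element subsets (itertools.combinations) and how to compute binomial coefficients for nonnegative integers (math.comb). The sketch below is only a spot check for integer k ≥ 0; the function name k_element_subsets is an assumption of this sketch.

```python
from itertools import combinations
from math import comb

def k_element_subsets(n, k):
    """List all k-element subsets of {1, 2, ..., n}, for integers n >= 0 and k >= 0."""
    return [set(c) for c in combinations(range(1, n + 1), k)]

# Compare the number of k-element subsets with the binomial coefficient.
counts_agree = all(len(k_element_subsets(n, k)) == comb(n, k)
                   for n in range(0, 9) for k in range(0, n + 3))

# Pascal's recurrence, which drove the induction step, checked on the same range.
pascal_agrees = all(comb(n, k) == comb(n - 1, k - 1) + comb(n - 1, k)
                    for n in range(1, 9) for k in range(1, n + 3))
```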

Exercise 4.3.1. Let n ≥ 2 be an integer. The symbol “#” means “number”.


(a) Compute the # of subsets of {1, 2, . . . , n} that contain both 1 and 2.
(b) Compute the # of 3-element subsets of {1, 2, . . . , n} that contain both 1
and 2.
[To “compute” a number means to find a closed-form expression for this
number (with no summation signs) and to prove this formula. I expect proofs
to be given at the level of detail and rigor seen in this chapter.]

4.4. Tuples (aka lists)


4.4.1. Definition and disambiguation
Speaking of lists: What is a finite list? Here is a somewhat awkward definition:

Definition 4.4.1. A finite list (aka tuple) is a list consisting of finitely many
objects. The objects appear in this list in a specified order, and they don’t
have to be distinct.
A finite list is delimited using parentheses: i.e., the list that contains the
objects $a_1, a_2, \ldots, a_n$ in this order is denoted by $(a_1, a_2, \ldots, a_n)$.
“Specified order” means that the list has a well-defined first entry, a
well-defined second entry, and so on. Thus, two lists $(a_1, a_2, \ldots, a_n)$ and
$(b_1, b_2, \ldots, b_m)$ are considered equal if and only if

• we have n = m, and

• we have $a_i = b_i$ for each i ∈ {1, 2, . . . , n}.

For example:

• The lists (1, 2) and (2, 1) are not equal (although the sets {1, 2} and {2, 1}
are equal).

• The lists (1, 2) and (1, 1, 2) are not equal (although the sets {1, 2} and
{1, 1, 2} are equal).
• The lists (1, 1, 2) and (1, 2, 2) are not equal (although the sets {1, 1, 2} and
{1, 2, 2} are equal).
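
Python's built-in tuples and sets behave in the same way, which gives a quick way to experiment with these examples. This is only an illustration: Python tuples stand in for finite lists and Python sets for finite sets, and the helper lists_equal is a name of our own that spells out the equality criterion of Definition 4.4.1.

```python
# Tuples (finite lists): order and repetition both matter.
assert (1, 2) != (2, 1)
assert (1, 2) != (1, 1, 2)
assert (1, 1, 2) != (1, 2, 2)

# Sets: neither order nor repetition matters.
assert {1, 2} == {2, 1}
assert {1, 2} == {1, 1, 2}
assert {1, 1, 2} == {1, 2, 2}

def lists_equal(a, b):
    # Equal iff the lengths agree and the entries agree position by position.
    return len(a) == len(b) and all(x == y for x, y in zip(a, b))
```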

Definition 4.4.2. (a) The length of a list $(a_1, a_2, \ldots, a_n)$ is defined to be the
number n.
(b) A list of length 2 is called a pair (or an ordered pair).
(c) A list of length 3 is called a triple.
(d) A list of length 4 is called a quadruple.
(e) A list of length n is called an n-tuple.

For example, (1, 3, 2, 2) is a list of length 4 (although it has only 3 distinct
entries), i.e., a quadruple or a 4-tuple. For another example, (5, 8) is a pair, i.e.,
a 2-tuple.
Note that there is exactly one list of length 0: the empty list (), which contains
nothing.
Lists of length 1 consist of just a single entry. For example, (3) is a list
containing only the entry 3.

4.4.2. Counting pairs


Now, let us count some pairs:

• How many pairs (a, b) are there with a, b ∈ {1, 2, 3} ? There are nine:

(1, 1) , (1, 2) , (1, 3) ,
(2, 1) , (2, 2) , (2, 3) ,
(3, 1) , (3, 2) , (3, 3) .

The fact that there are nine of them is not surprising given how I’ve laid
them out: They form a table with 3 rows and 3 columns, where the
row determines the first entry of the pair⁵⁵ and the column determines
the second entry. Thus, their total number is 3 · 3 = 9.

• How many pairs ( a, b) are there with a, b ∈ {1, 2, 3} and a < b ? There are
three:
(1, 2) , (1, 3) , (2, 3) .

• How many pairs (a, b) are there with a, b ∈ {1, 2, 3} and a = b ? Again,
three:
(1, 1) , (2, 2) , (3, 3) .

• How many pairs ( a, b) are there with a, b ∈ {1, 2, 3} and a > b ? Again,
three:
(2, 1) , (3, 1) , (3, 2) .

Let us generalize this:

Proposition 4.4.3. Let n ∈ N. Then:
(a) The # of pairs (a, b) with a, b ∈ {1, 2, . . . , n} is $n^2$.
(b) The # of pairs (a, b) with a, b ∈ {1, 2, . . . , n} and a < b is 1 + 2 + · · · +
(n − 1).
(c) The # of pairs (a, b) with a, b ∈ {1, 2, . . . , n} and a = b is n.
(d) The # of pairs (a, b) with a, b ∈ {1, 2, . . . , n} and a > b is 1 + 2 + · · · +
(n − 1).

55 i.e.:
– The first row contains the pairs that begin with 1.
– The second row contains the pairs that begin with 2.
– The third row contains the pairs that begin with 3.

Informal proof. (a) These pairs can be arranged in a table with n rows and n
columns, where the rows determine the first entry and the columns determine
the second. Here is what this table looks like:

(1, 1) , (1, 2) , . . . , (1, n) ,
(2, 1) , (2, 2) , . . . , (2, n) ,
   ⋮        ⋮                ⋮
(n, 1) , (n, 2) , . . . , (n, n) .

So there are $n \cdot n = n^2$ of these pairs.


(b) In the table we have just shown, a pair ( a, b) satisfies a < b if and only if it
is placed above the main diagonal (i.e., the diagonal starting at the northwestern
corner and ending at the southeastern corner of the table). Thus, the # of such
pairs is the # of cells above the main diagonal in this table. But this # is

0 + 1 + 2 + · · · + ( n − 1) ,

because there are 0 such cells in the first column, 1 such cell in the second, 2
such cells in the third, and so on. Hence,

(# of pairs (a, b) with a, b ∈ {1, 2, . . . , n} and a < b)
= 0 + 1 + 2 + · · · + (n − 1)
= 1 + 2 + · · · + (n − 1) .

(c) A pair ( a, b) with a = b is just a pair of the form ( a, a), that is, a single
element of {1, 2, . . . , n} written twice in succession. Counting such pairs is
therefore tantamount to counting single elements of {1, 2, . . . , n}; but there are
clearly n of them.
(d) The pairs ( a, b) that satisfy a > b are in one-to-one correspondence with
the pairs ( a, b) that satisfy a < b: Namely, each former pair becomes a latter pair
if we swap its two entries, and vice versa. Thus, the # of former pairs equals the
# of latter pairs. But we have already found (in part (b)) that the # of latter pairs
is 1 + 2 + · · · + (n − 1). Hence, the # of former pairs is 1 + 2 + · · · + (n − 1) as
well.
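
The four counts in Proposition 4.4.3 are easy to confirm by brute force for small n. The sketch below simply enumerates all pairs and sorts them according to how their two entries compare; the function name count_pairs is our own choice.

```python
def count_pairs(n):
    # Enumerate all pairs (a, b) with a, b in {1, 2, ..., n}.
    pairs = [(a, b) for a in range(1, n + 1) for b in range(1, n + 1)]
    less = sum(1 for (a, b) in pairs if a < b)
    equal = sum(1 for (a, b) in pairs if a == b)
    greater = sum(1 for (a, b) in pairs if a > b)
    return len(pairs), less, equal, greater

for n in range(8):
    total, less, equal, greater = count_pairs(n)
    triangular = sum(range(1, n))  # 1 + 2 + ... + (n - 1)
    assert total == n ** 2         # part (a)
    assert less == triangular      # part (b)
    assert equal == n              # part (c)
    assert greater == triangular   # part (d)
```

Such a check does not replace the proof, but it is a cheap way to catch an off-by-one error in a conjectured count.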

Proposition 4.4.3 has a nice consequence: For any n ∈ N, we have

\begin{align*}
n^2 &= (\text{\# of pairs } (a, b) \text{ with } a, b \in \{1, 2, \ldots, n\})
       \qquad (\text{by Proposition 4.4.3 (a)}) \\
    &= \underbrace{(\text{\# of pairs } (a, b) \text{ with } a, b \in \{1, 2, \ldots, n\} \text{ and } a < b)}_{\substack{= 1 + 2 + \cdots + (n-1) \\ \text{(by Proposition 4.4.3 (b))}}} \\
    &\qquad + \underbrace{(\text{\# of pairs } (a, b) \text{ with } a, b \in \{1, 2, \ldots, n\} \text{ and } a = b)}_{\substack{= n \\ \text{(by Proposition 4.4.3 (c))}}} \\
    &\qquad + \underbrace{(\text{\# of pairs } (a, b) \text{ with } a, b \in \{1, 2, \ldots, n\} \text{ and } a > b)}_{\substack{= 1 + 2 + \cdots + (n-1) \\ \text{(by Proposition 4.4.3 (d))}}} \\
    &\qquad \left( \begin{array}{c} \text{since each pair } (a, b) \text{ satisfies either } a < b \text{ or } a = b \text{ or } a > b, \\ \text{and never more than one of these three conditions} \end{array} \right) \\
    &= \underbrace{(1 + 2 + \cdots + (n-1)) + n}_{= 1 + 2 + \cdots + n} + \underbrace{(1 + 2 + \cdots + (n-1))}_{= (1 + 2 + \cdots + n) - n} \\
    &= (1 + 2 + \cdots + n) + (1 + 2 + \cdots + n) - n \\
    &= 2 \cdot (1 + 2 + \cdots + n) - n.
\end{align*}

Solving this for 1 + 2 + · · · + n, we obtain

\[ 1 + 2 + \cdots + n = \frac{n^2 + n}{2} = \frac{n (n + 1)}{2}. \]
Thus, we have recovered the Little Gauss formula (Theorem 1.3.1) by counting
pairs. This illustrates the fact that counting can be used to prove algebraic
identities.

Exercise 4.4.1. How many pairs ( a, b) are there with a ∈ {1, 2, 3} and b ∈
{1, 2, 3, 4, 5} ?

Solution. By the same reasoning as in Proposition 4.4.3 (a), there are 15 such
pairs, since the pairs can be arranged in a table with 3 rows and 5 columns.
The same reasoning gives the following more general result:

Theorem 4.4.4. Let n, m ∈ N. Let A be an n-element set. Let B be an
m-element set. Then,

(# of pairs (a, b) with a ∈ A and b ∈ B) = nm.

What about triples?



Theorem 4.4.5. Let n, m, p ∈ N. Let A be an n-element set. Let B be an
m-element set. Let C be a p-element set. Then,

(# of triples (a, b, c) with a ∈ A and b ∈ B and c ∈ C) = nmp.

Informal proof. You can think of these triples as occupying the cells of a 3-
dimensional table, but this kind of visualization is tricky (and gets even less
reliable when you get to higher dimensions).
A better approach: Re-encode each triple ( a, b, c) as a pair (( a, b) , c) (a pair
whose first entry is itself a pair). This is a pair whose first entry comes from
the set of all pairs ( a, b) with a ∈ A and b ∈ B, whereas its second entry comes
from C. Let U be the set of all pairs ( a, b) with a ∈ A and b ∈ B. Then, this set
U is an nm-element set, because

(# of elements of U ) = (# of pairs ( a, b) with a ∈ A and b ∈ B)


= nm (by Theorem 4.4.4) .
Now, we have re-encoded each triple ( a, b, c) as a pair (( a, b) , c) with ( a, b) ∈
U and c ∈ C. Thus,

(# of triples ( a, b, c) with a ∈ A and b ∈ B and c ∈ C )


= (# of pairs (( a, b) , c) with ( a, b) ∈ U and c ∈ C )
= (# of pairs (u, c) with u ∈ U and c ∈ C )
= (nm) p
(by Theorem 4.4.4, since U is an nm-element set while C is a p-element set). In
other words,

(# of triples ( a, b, c) with a ∈ A and b ∈ B and c ∈ C ) = nmp.


This proves Theorem 4.4.5.

4.4.3. Cartesian products


There is a general notation for sets of pairs:

Definition 4.4.6. Let A and B be two sets.


The set of all pairs ( a, b) with a ∈ A and b ∈ B is denoted by A × B, and is
called the Cartesian product (or just product) of the sets A and B.

For instance, {1, 2} × {7, 8, 9} is the set of all pairs ( a, b) with a ∈ {1, 2} and
b ∈ {7, 8, 9}. Explicitly, it consists of the following six pairs:

(1, 7) , (1, 8) , (1, 9) ,


(2, 7) , (2, 8) , (2, 9) .

Likewise, the set {1, 2} × {2, 3} consists of the four pairs


(1, 2) , (1, 3) ,
(2, 2) , (2, 3) .
A similar notation exists for sets of triples, of quadruples or of k-tuples in
general:

Definition 4.4.7. Let $A_1, A_2, \ldots, A_k$ be k sets.
The set of all k-tuples $(a_1, a_2, \ldots, a_k)$ with $a_1 \in A_1$ and $a_2 \in A_2$ and · · · and
$a_k \in A_k$ is denoted by
$A_1 \times A_2 \times \cdots \times A_k$,
and is called the Cartesian product (or just product) of the sets
$A_1, A_2, \ldots, A_k$.
For example, the set {1, 2} × {5} × {2, 7, 6} consists of all triples $(a_1, a_2, a_3)$
with $a_1$ ∈ {1, 2} and $a_2$ ∈ {5} and $a_3$ ∈ {2, 7, 6}. One such triple is (2, 5, 2);
another is (2, 5, 6). In total, there are 2 · 1 · 3 = 6 such triples (by Theorem 4.4.5).
The word “Cartesian” in “Cartesian product” honors René Descartes, who
observed that a point in the Euclidean plane can be characterized by its
two coordinates (i.e., a pair of real numbers), whereas a point in space can be
characterized by its three coordinates (i.e., a triple of real numbers). These two
observations allow us to think of the plane as the Cartesian product R × R, and
to think of space as the Cartesian product R × R × R.
Using the notation A × B, we can restate Theorem 4.4.4 as follows:

Theorem 4.4.8 (product rule for two sets). If A is an n-element set, and B is
an m-element set, then A × B is an nm-element set.
Likewise, we can restate Theorem 4.4.5 as follows:

Theorem 4.4.9 (product rule for three sets). If A is an n-element set, and B is
an m-element set, and C is a p-element set, then A × B × C is an nmp-element
set.
More generally:

Theorem 4.4.10 (product rule for k sets). Let $A_1, A_2, \ldots, A_k$ be k sets. If each
$A_i$ is an $n_i$-element set, then $A_1 \times A_2 \times \cdots \times A_k$ is an $n_1 n_2 \cdots n_k$-element set.
In other words, when you count k-tuples, with each entry coming from a
certain set, the total number is the product of the numbers of options for each
entry.
You can prove Theorem 4.4.10 by induction on k, using Theorem 4.4.8 and
the same “re-encode a tuple as a nested pair” trick that we used in our proof
of Theorem 4.4.5. We will later come back to this in more detail.
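
For small finite sets, both the product rule and the nested-pair trick can be observed directly. The following sketch uses itertools.product from Python's standard library to build Cartesian products; the set C and the helper name product_size are arbitrary choices of ours, made only for illustration.

```python
from itertools import product
from math import prod

A = {1, 2}
B = {7, 8, 9}
C = {"x", "y"}

# A x B as a set of pairs; its size is |A| * |B| (Theorem 4.4.8).
AxB = set(product(A, B))
assert len(AxB) == len(A) * len(B)

# Triples (a, b, c) re-encoded as nested pairs ((a, b), c) with (a, b) in A x B.
triples = set(product(A, B, C))
nested = set(product(AxB, C))
assert {((a, b), c) for (a, b, c) in triples} == nested
assert len(triples) == len(A) * len(B) * len(C)

def product_size(*sets):
    # Size of the Cartesian product of finitely many finite sets, by enumeration.
    return sum(1 for _ in product(*sets))

assert product_size({1, 2}, {5}, {2, 7, 6}) == prod([2, 1, 3])
```

Enumeration is of course far slower than multiplying the sizes, which is exactly why the product rule is useful.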

Exercise 4.4.2. Let n ∈ N. Compute the number of all pairs (a, b) ∈
{1, 2, . . . , n} × {1, 2, . . . , n} satisfying a ≡ b mod 2.
(The answer will depend on whether n is even or odd. You can find a
unified formula using the floor of a number, but you don’t have to.)

Exercise 4.4.3. Let A and B be two sets. Show that A × B = B × A holds if
and only if we have
A = B or A = ∅ or B = ∅.

4.4.4. Counting strictly increasing tuples (informally)


In Proposition 4.4.3 (b), we have seen that for any given n ∈ N, the # of pairs
(a, b) of elements of {1, 2, . . . , n} satisfying a < b is

\[ 1 + 2 + \cdots + (n - 1) = \frac{(n - 1) n}{2} = \binom{n}{2}. \]

What is the # of triples (a, b, c) of elements of {1, 2, . . . , n} satisfying a < b < c ?
Such a triple (a, b, c) always determines a 3-element subset {a, b, c} of {1, 2, . . . , n}
(and yes, this will really be a 3-element subset, because a < b < c entails that
a, b, c are distinct). Conversely, any 3-element subset of {1, 2, . . . , n} becomes a
triple (a, b, c) with a < b < c if we list its elements in increasing order. Thus,
the triples (a, b, c) of elements of {1, 2, . . . , n} satisfying a < b < c are just the
3-element subsets of {1, 2, . . . , n} in disguise.⁵⁶ Hence,

\begin{align*}
& (\text{\# of triples } (a, b, c) \text{ of elements of } \{1, 2, \ldots, n\} \text{ satisfying } a < b < c) \\
&= (\text{\# of 3-element subsets of } \{1, 2, \ldots, n\}) \\
&= \binom{n}{3} \qquad (\text{by Theorem 4.3.3, applied to } S = \{1, 2, \ldots, n\} \text{ and } k = 3).
\end{align*}

56 We are again being informal here. To be more rigorous, we should be speaking of a one-to-
one correspondence between the former triples and the latter subsets. But it is not yet the
time for this pedantry.

More generally, for any k ∈ N, we have

\[ (\text{\# of } k\text{-tuples } (a_1, a_2, \ldots, a_k) \text{ of elements of } \{1, 2, \ldots, n\} \text{ satisfying } a_1 < a_2 < \cdots < a_k) = \binom{n}{k} \]

(by a similar argument: these k-tuples are just the k-element subsets of {1, 2, . . . , n}
in disguise). For comparison, if we drop the “$a_1 < a_2 < \cdots < a_k$” requirement,
then we have

\begin{align*}
& (\text{\# of } k\text{-tuples } (a_1, a_2, \ldots, a_k) \text{ of elements of } \{1, 2, \ldots, n\}) \\
&= \underbrace{n n \cdots n}_{k \text{ times}} \qquad (\text{by Theorem 4.4.10}) \\
&= n^k .
\end{align*}

Other counting problems don’t have answers this simple. For instance, it is
not hard to see that

\begin{align*}
& (\text{\# of } k\text{-tuples } (a_1, a_2, \ldots, a_k) \text{ of elements of } \{1, 2, \ldots, n\} \text{ such that } a_1 \text{ is the largest entry}) \\
&= 1^{k-1} + 2^{k-1} + 3^{k-1} + \cdots + n^{k-1},
\end{align*}

but there is no way to express this without a “· · · ” or a ∑ sign. For each specific
k, however, we can simplify this:

\begin{align*}
1^0 + 2^0 + \cdots + n^0 &= \underbrace{1 + 1 + \cdots + 1}_{n \text{ times}} = n; \\
1^1 + 2^1 + \cdots + n^1 &= 1 + 2 + \cdots + n = \frac{n (n + 1)}{2}; \\
1^2 + 2^2 + \cdots + n^2 &= \frac{n (n + 1) (2n + 1)}{6}; \\
1^3 + 2^3 + \cdots + n^3 &= \frac{n^2 (n + 1)^2}{4}; \\
1^4 + 2^4 + \cdots + n^4 &= \frac{n (2n + 1) (n + 1) \left(3n + 3n^2 - 1\right)}{30}; \\
& \ldots.
\end{align*}

Such a closed-form expression for $1^m + 2^m + \cdots + n^m$ exists for any specific
value of m (see, e.g., [Grinbe22, Lecture 17, Theorem 2.5.3] for how to find it).
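
These formulas can be spot-checked numerically. The sketch below counts, by brute force, the k-tuples whose first entry is at least as large as every other entry (this is our reading of “$a_1$ is the largest entry”), compares the count with the corresponding power sum, and then tests the closed forms listed above for small n. All function names are ours.

```python
from itertools import product

def count_first_largest(n, k):
    # k-tuples over {1, ..., n} whose first entry is >= every other entry (k >= 1).
    return sum(1 for t in product(range(1, n + 1), repeat=k) if t[0] == max(t))

def power_sum(n, m):
    # 1^m + 2^m + ... + n^m
    return sum(j ** m for j in range(1, n + 1))

# The tuple count equals 1^(k-1) + 2^(k-1) + ... + n^(k-1).
for n in range(6):
    for k in range(1, 5):
        assert count_first_largest(n, k) == power_sum(n, k - 1)

# The closed forms for m = 0, 1, 2, 3, 4 (denominators cleared to stay in integers).
for n in range(20):
    assert power_sum(n, 0) == n
    assert 2 * power_sum(n, 1) == n * (n + 1)
    assert 6 * power_sum(n, 2) == n * (n + 1) * (2 * n + 1)
    assert 4 * power_sum(n, 3) == n ** 2 * (n + 1) ** 2
    assert 30 * power_sum(n, 4) == n * (2 * n + 1) * (n + 1) * (3 * n + 3 * n ** 2 - 1)
```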

In the next two chapters, we will learn what it means for a set to have n
elements, and what rules we have actually been using in our above informal
arguments. To do so, we must first get familiar with the concept of maps (also
known as functions).

5. Maps (aka functions)


5.1. Functions, informally
One of the main notions in mathematics is that of a function, aka map, aka
mapping, aka transformation.
Intuitively, a function is a “black box” that takes inputs and transforms them
into outputs. For example, the “$f(t) = t^2$” function takes a real number t and
outputs its square $t^2$.
You can thus think of a function as a rule for producing an output from an
input. This gives the following provisional definition of a function:

Definition 5.1.1 (Informal definition of a function). Let X and Y be two sets.


A function from X to Y is (provisionally) a rule that transforms each element
of X into some element of Y.
If this function is called f , then the result of applying it to a given x ∈ X
(that is, the output produced by f when x is the input) will be called f ( x )
(or sometimes f x).

This is not a real definition, as it only kicks the can down the road: It defines
“function” in terms of “rule”, but what is a rule? But it gives some good
intuition, provided that it is correctly understood. Here are some comments
that should clarify it:

• A function has to “work” for each element of X. It cannot decline to
operate on some elements! Thus, “take the reciprocal” is not a function
from R to R, since it does not operate on 0 (because 0 has no reciprocal).
However, “take the reciprocal” is a function from R \ {0} to R, since any
nonzero real number does have a reciprocal.
• A function must not be ambiguous. Each input must produce exactly
one output. Thus, “take your number to some random power” is not a
function from R to R, since different powers give different results. (There
is a “multi-valued” variant of the concept of a function, but such variants
aren’t called “functions”.)
• We write “ f : X → Y” for “ f is a function from X to Y”.
• Instead of saying “f(x) = y”, we can say “f transforms x into y” or “f
sends x to y” or “f maps x to y” or “f takes the value y at x” or “y is
the value of f at x” or “y is the image of x under f” or “applying f to x
yields y” or “f takes x to y” or “f : x ↦ y”. All of these statements are
synonyms.
For instance, if f is the “take the square” function from R to R, then
$f(2) = 2^2 = 4$, so that f transforms 2 into 4, or sends 2 to 4, or takes the
value 4 at 2, etc., or f : 2 ↦ 4.
Do not confuse the → arrow with the ↦ arrow! The former arrow is
written between the sets X and Y, whereas the latter is written between a
specific input and the corresponding output.

• As the above terminology suggests, the value of a function f at an input
x means the corresponding output f(x).

• The notation

X → Y,
x ↦ (some expression involving x)

(where X and Y are two sets) means “the function from X to Y that sends
each element x of X to the expression to the right of the “↦” symbol”.
Here, the expression can (for example) be $x^2$ or $\frac{1}{x+4}$ or $\frac{x}{x+2}$.
For example,

R → R,
x ↦ $x^2$

is the “take the square” function (sending each element x of R to $x^2$). For
another example,

R → R,
x ↦ $\frac{x}{\sin x + 15}$

is the function that takes the sine of the input, then adds 15, then divides
the input by the result. (Note that this is well-defined, since sin x + 15 is
never zero and thus the expression $\frac{x}{\sin x + 15}$ is always meaningful, so we
really get a function from R to R.)
For yet another example,

R → R,
x ↦ 2

is the function that sends each real number x to 2; this is an example of
a constant function. (This is a case where our “expression involving x”
does not actually contain x. This is perfectly fine; it’s just a very simple
particular case.)
For yet another example,

Z → Q,
x ↦ $2^x$

is a function (sending each integer x to $2^x$). Some of its values are listed
in the following table:

x      | −2  | −1  | 0 | 1 | 2
$2^x$  | 1/4 | 1/2 | 1 | 2 | 4

A more complicated example is the function

Z → Q,
x ↦ $\begin{cases} \dfrac{1}{x - 1}, & \text{if } x \neq 1; \\ 5, & \text{if } x = 1. \end{cases}$

• The notation

f : X → Y,
x ↦ (some expression involving x)

means that we take the function from X to Y that sends each x ∈ X to the
expression to the right of the “↦” symbol, and we call this function f.
(Or, if a function named f has already been defined, this notation means
that this f is the function from X to Y that sends each x ∈ X to the
expression to the right of the “↦” symbol.)
For example, if we write

f : R → R,
x ↦ $x^2 + 1$,

then f henceforth will denote the function from R to R that sends each
x ∈ R to $x^2 + 1$.

• If the set X is finite, then a function f : X → Y can be specified by simply
listing all its values. For example, I can define a function h : {0, 2, 4} → N
by setting

h (0) = 92,
h (2) = 20,
h (4) = 92.

The values here have been chosen at whim, for no particular reason. A
function does not have to be “natural” or “meaningful” in any way; all it
has to do is transform each element of X into some element of Y.

• If f is a function from X to Y, then the sets X and Y are part of the
function. Thus,
$g_1$ : Z → Q,
x ↦ $2^x$
and
$g_2$ : N → Q,
x ↦ $2^x$
and
$g_3$ : N → N,
x ↦ $2^x$
are three distinct functions! We distinguish between them, so that we can
later speak of the “domain” and the “target” of a function. Namely, the
domain of a function f : X → Y is defined to be the set X, whereas the
target of a function f : X → Y is defined to be the set Y. Thus, the above
function $g_2$ has target Q, whereas the function $g_3$ has target N. The above
function $g_1$ has domain Z, whereas the function $g_2$ has domain N.
• When are two functions equal? In programming, functions are often un-
derstood to be (implemented) algorithms, and two algorithms can be dif-
ferent even if they compute the same thing. In mathematics, it’s different:
Only the domain, the target and the output values matter; the way they
are computed does not (and indeed there might not even be a way to
compute them). Two algorithms that (always) compute the same thing
count for one function only.
So when are two functions considered to be equal?
Two functions $f_1 : X_1 \to Y_1$ and $f_2 : X_2 \to Y_2$ are said to be equal if and
only if
$X_1 = X_2$ and $Y_1 = Y_2$ and
$f_1(x) = f_2(x)$ for all $x \in X_1$.

An example of two equal functions is

$f_1$ : R → R,
x ↦ $x^2$
and
$f_2$ : R → R,
x ↦ $|x|^2$,

since each x ∈ R satisfies $x^2 = |x|^2$.


• The Caesarian ciphers from Section 3.9 can also be viewed as examples of maps
(i.e., functions). Specifically, if k is any integer, and if we denote the set of all
words (including nonsensical ones like “OQJCLA”) by W, then the Caesarian
cipher $\mathrm{ROT}_k$ is a map from W to W. For instance, $\mathrm{ROT}_1$(“KITTEN”) = “LJUUFO”.
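
To make the equality criterion for functions concrete, here is a small sketch that models a function on a finite domain as a triple (domain, target, table of values). The class name FiniteFunction and its fields are hypothetical, chosen only for this illustration. The point is that two differently computed rules give equal functions when domain, target and all values agree, whereas changing only the target gives a different function.

```python
class FiniteFunction:
    """A function X -> Y on a finite domain X, stored as (X, Y, values)."""

    def __init__(self, domain, target, rule):
        self.domain = frozenset(domain)
        self.target = frozenset(target)
        # Tabulate the rule; every output must be an element of the target.
        self.values = {x: rule(x) for x in self.domain}
        for x, y in self.values.items():
            if y not in self.target:
                raise ValueError(f"output {y!r} at input {x!r} is not in the target")

    def __call__(self, x):
        return self.values[x]

    def __eq__(self, other):
        # Equal iff same domain, same target, and same value at every input.
        return (self.domain == other.domain
                and self.target == other.target
                and self.values == other.values)

X = range(-3, 4)
f1 = FiniteFunction(X, range(0, 10), lambda x: x ** 2)
f2 = FiniteFunction(X, range(0, 10), lambda x: abs(x) ** 2)
f3 = FiniteFunction(X, range(0, 20), lambda x: x ** 2)

assert f1 == f2   # different rules, same function
assert f1 != f3   # same values, but a different target
```

The check in the constructor also rejects rules whose outputs leave the target, such as sending each n ∈ {1, 2, 3, 4} to n with target {1, 2, 3}.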

At this point, we have a good idea of what a function is, but the provisional
definition given above (Definition 5.1.1) wasn’t as precise as we would like.
Even worse, the word “rule” in that definition is still unclear, and prevents us
from dealing with functions that can neither be given by an explicit formula
(such as “take the square”) nor be specified by a complete list of values (e.g.,
since the domain is infinite). Thus, we need a better definition of a function.
This is what we will do in the present chapter. The trick is to first define
the more general concept of a relation, and then to characterize functions as
relations with a certain property.

5.2. Relations
Relations (to be specific: binary relations) are another concept that you have
already seen in myriad examples:

• The relation ⊆ is a relation between two sets. For example, we have
{1, 3} ⊆ {1, 2, 3, 4} but we don’t have {1, 5} ⊆ {1, 2, 3, 4}.
• The order relations ≤ and < and > and ≥ are relations between two
integers (or rational numbers, or real numbers). For example, 1 ≤ 5 but
1 ≰ −1.

• The containment relation ∈ is a relation between an object and a set. For
instance, 3 ∈ {1, 2, 3, 4} but 5 ∉ {1, 2, 3, 4}.

• The divisibility relation | is a relation between two integers.

• The relation “coprime” is a relation between two integers.

• In plane geometry, there are lots of relations: “parallel” (between two
lines), “perpendicular” (between two lines), “congruent” (between two
shapes), “similar”, “directly similar”, etc.

• For any given integer n, the relation “congruent modulo n” is a relation
between two integers. Let me call it $\overset{n}{\equiv}$. Thus, $a \overset{n}{\equiv} b$ holds if and only if
a ≡ b mod n. For example, $2 \overset{3}{\equiv} 8$ but $2 \overset{3}{\not\equiv} 7$.

What do these relations all have in common? They can be applied to pairs
of objects. Applying a relation to a pair of objects gives a statement which can
be true or false. For example, applying the relation “coprime” to the pair (5, 8)

yields the statement “5 is coprime to 8”, which is true. Applying it to the pair
(5, 10) yields the statement “5 is coprime to 10”, which is false.
A general relation R relates elements of a set X with elements of a set Y. For
any pair ( x, y) ∈ X × Y (that is, for any pair consisting of an element x ∈ X
and an element y ∈ Y), we can apply the relation R to the pair ( x, y), obtaining
a statement “x R y” which is either true or false. To describe this relation R,
we need to know which pairs ( x, y) ∈ X × Y do satisfy x R y and which pairs
don’t. In other words, we need to know the set of all pairs ( x, y) ∈ X × Y that
satisfy x R y. For a rigorous definition of a relation, we simply take the relation
R to be this set of pairs. In other words, we define relations as follows:

Definition 5.2.1. Let X and Y be two sets. A relation from X to Y is a subset
of X × Y (that is, a set of pairs (x, y) with x ∈ X and y ∈ Y).
If R is a relation from X to Y, and if (x, y) ∈ X × Y is any pair, then

• we write x R y if (x, y) ∈ R;

• we write $x \not\mathrel{R} y$ if (x, y) ∉ R.

All the relations we have seen so far can be recast in terms of this definition:

• The divisibility relation | is a subset of Z × Z, namely the subset

{( x, y) ∈ Z × Z | x divides y}
= {( x, y) ∈ Z × Z | there exists some z ∈ Z such that y = xz}
= {( x, xz) | x ∈ Z and z ∈ Z} .
For instance, the pairs (2, 4) and (3, 9) and (10, 20) belong to this subset,
whereas the pairs (2, 3) and (2, 15) and (10, 5) do not.
• The coprimality relation (“coprime to”) is a subset of Z × Z, namely the
subset

{( x, y) ∈ Z × Z | x is coprime to y}
= {( x, y) ∈ Z × Z | gcd ( x, y) = 1} .
It contains, for instance, (2, 3) and (7, 9), but not (4, 6).
• For any n ∈ Z, the “congruent modulo n” relation $\overset{n}{\equiv}$ is a subset of Z × Z,
namely the subset

{( x, y) ∈ Z × Z | x ≡ y mod n}
= {( x, x + nz) | x ∈ Z and z ∈ Z}
(because for a given integer x, the integers y that satisfy x ≡ y mod n are
precisely the integers of the form x + nz for z ∈ Z).

• A geometric example: Let P be the set of all points in the plane, and let
L be the set of all lines in the plane. Then, the “lies on” relation (as in “a
point lies on a line”) is a subset of P × L, namely the subset

{( p, ℓ) ∈ P × L | the point p lies on the line ℓ} .

• If A is any set, then the equality relation on A is the subset $E_A$ of A × A
given by

$E_A$ = {(x, y) ∈ A × A | x = y}
= {(x, x) | x ∈ A} .

Two elements x and y of A satisfy $x \mathrel{E_A} y$ if and only if they are equal.

• We can literally take any subset of X × Y (where X and Y are two sets)
and it will be a relation from X to Y. Just as with functions, a relation
does not have to follow any “meaningful” rule. For example, here is a
relation from {1, 2, 3} to {5, 6, 7}:

{(1, 6) , (1, 7) , (3, 5)} .

Equivalently, it can be specified by the table

    |  5  |  6  |  7
  1 | no  | yes | yes
  2 | no  | no  | no
  3 | yes | no  | no

(where a “yes” in row x and column y means that (x, y) belongs to the
relation). If we call this relation R, then we have 1 R 6 and 1 R 7 and 3 R 5
but not 1 R 5 or 2 R 6.
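
Since a relation is just a set of pairs, it can be stored literally as such. The sketch below builds the divisibility and coprimality relations restricted to a small finite range (the restriction to {1, ..., 12} is our assumption, made only so that the sets are finite), takes the arbitrary relation from the last example, and prints the yes/no table of the latter. The helper name relation_table is ours.

```python
from math import gcd

N = range(1, 13)  # finite stand-in for Z, assumed only for this illustration

# Relations as subsets of N x N.
divides = {(x, y) for x in N for y in N if y % x == 0}
coprime = {(x, y) for x in N for y in N if gcd(x, y) == 1}

assert (2, 4) in divides and (3, 9) in divides
assert (2, 3) not in divides and (10, 5) not in divides
assert (2, 3) in coprime and (7, 9) in coprime and (4, 6) not in coprime

# The arbitrary relation from {1, 2, 3} to {5, 6, 7} given in the text.
X, Y = [1, 2, 3], [5, 6, 7]
R = {(1, 6), (1, 7), (3, 5)}

def relation_table(R, X, Y):
    # One row per x in X; the entry is "yes" exactly when (x, y) belongs to R.
    return [["yes" if (x, y) in R else "no" for y in Y] for x in X]

for x, row in zip(X, relation_table(R, X, Y)):
    print(x, *row)
```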

A good way to visualize a relation R from a set X to a set Y (at least when
X and Y are finite) is by drawing the sets X and Y as blobs, drawing their
elements as nodes within these blobs, and drawing an arrow from the x-node
to the y-node for every pair ( x, y) that belongs to the relation R. For example,
the relation R in our last example can be visualized as follows:

[Figure (49): blob-and-arrow picture of the relation R. The X-blob contains the
nodes 1, 2, 3; the Y-blob contains the nodes 5, 6, 7; the arrows are 1 → 6,
1 → 7 and 3 → 5.]

5.3. Functions, formally


We can now define functions rigorously:

Definition 5.3.1 (Rigorous definition of a function). Let X and Y be two sets.
A function from X to Y means a relation R from X to Y that has the following
property:

• Output uniqueness: For each x ∈ X, there exists exactly one y ∈ Y
such that x R y.

If R is a function from X to Y, and if x is an element of X, then the unique
element y ∈ Y satisfying x R y will be called R(x).

In our above example, the relation

{(1, 6) , (1, 7) , (3, 5)}

(which we illustrated in (49)) is not a function from {1, 2, 3} to {5, 6, 7}. In fact,
it violates output uniqueness at x = 1 (since there are two y ∈ {5, 6, 7} that
satisfy 1 R y) and also violates it at x = 2 (since there are no y ∈ {5, 6, 7} that
satisfy 2 R y). Each of these two violations is reason enough to disqualify this
relation from being a function.
In our above list of relations, only the equality relation $E_A$ is a function.
Here is an example of a function from X = {1, 2, 3} to Y = {5, 6, 7, 8}: the
relation
{(1, 7) , (2, 5) , (3, 7)} .
This relation satisfies output uniqueness and thus is a function. Visualized by
blobs and arrows, it looks as follows:

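[Figure: blob-and-arrow picture of this function. The X-blob contains the
nodes 1, 2, 3; the Y-blob contains the nodes 5, 6, 7, 8; the arrows are 1 → 7,
2 → 5 and 3 → 7.]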

If we denote this function by f , then f (1) = 7 and f (2) = 5 and f (3) = 7.
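
Output uniqueness is easy to test mechanically when X and Y are finite. The following sketch (with helper names is_function and value_at of our own) counts, for each x ∈ X, the elements y with x R y and checks that there is exactly one; it is then run on the two relations discussed above.

```python
def is_function(R, X, Y):
    # R must be a relation from X to Y, i.e. a subset of X x Y.
    if not all(x in X and y in Y for (x, y) in R):
        return False
    # Output uniqueness: exactly one y with (x, y) in R, for each x in X.
    return all(sum(1 for y in Y if (x, y) in R) == 1 for x in X)

def value_at(R, x):
    # For a function R, return the unique y with x R y (called R(x) in the text).
    (y,) = [y for (a, y) in R if a == x]
    return y

bad = {(1, 6), (1, 7), (3, 5)}    # two outputs at x = 1, none at x = 2
good = {(1, 7), (2, 5), (3, 7)}

assert not is_function(bad, {1, 2, 3}, {5, 6, 7})
assert is_function(good, {1, 2, 3}, {5, 6, 7, 8})
assert [value_at(good, x) for x in (1, 2, 3)] == [7, 5, 7]
```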


Our way of visualizing relations by blobs and arrows makes the output
uniqueness property quite intuitive: This property just says that for each x ∈ X,
there is exactly one arrow starting at the x-node. In other words, each node in
the X-blob has to be the starting point of exactly one arrow. Thus, a function

is a relation whose visual picture has exactly one arrow coming out of each
X-node.

Now we have two definitions of a function: the provisional definition
(Definition 5.1.1) and the rigorous one (Definition 5.3.1). These two definitions are
equivalent. Indeed:

• If R is a function from X to Y in the sense of the rigorous definition (i.e.,
a relation from X to Y that satisfies output uniqueness), then R can also
be viewed as a rule that sends each element x of X to some element of
Y: namely, to the unique y ∈ Y that satisfies x R y. Thus, R becomes a
function in the provisional sense.

• Conversely, if f is a function from X to Y in the provisional sense (i.e., a
rule sending elements of X to elements of Y), then f can also be viewed
as a function in the rigorous sense (i.e., as a relation from X to Y that
satisfies output uniqueness), as follows: Let R be the set

{( x, f ( x )) | x ∈ X } .

This set R is a subset of X × Y, that is, a relation from X to Y. (In a
more intuitive language, this relation R is characterized as follows: Two
elements x ∈ X and y ∈ Y satisfy x R y if and only if y = f ( x ). That is,
roughly speaking, the relation R relates each input x ∈ X with the corre-
sponding output value f ( x ) ∈ Y and with nothing else.) This relation R
satisfies output uniqueness (because each input x ∈ X produces exactly
one output value f ( x )), and therefore is a function from X to Y in the
rigorous sense. Thus, f becomes a function in the rigorous sense (namely,
the rigorous function R).

Therefore, we can translate rigorous functions into provisional ones and vice
versa. We thus shall think of the two concepts as being the same (i.e., we
will regard the rigorous concept as a clarification of the provisional one). In
particular, all the notations we have introduced for provisional functions will
be used for rigorous ones.
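
On a finite domain, both directions of this translation can be spelled out in a few lines. In the sketch below, a “provisional” function is a Python callable together with its domain, and a “rigorous” function is a set of pairs; the names to_relation and to_rule are ours, and to_rule assumes that every element of the domain occurs as a first entry in R.

```python
def to_relation(f, X):
    # Rule -> relation: the set {(x, f(x)) | x in X}.
    return {(x, f(x)) for x in X}

def to_rule(R):
    # Relation with output uniqueness -> rule sending x to the unique y with x R y.
    table = dict(R)
    assert len(table) == len(R), "some input has more than one output"
    return lambda x: table[x]

X = {1, 2, 3, 4}
square = lambda x: x * x

R = to_relation(square, X)
assert R == {(1, 1), (2, 4), (3, 9), (4, 16)}

g = to_rule(R)
assert all(g(x) == square(x) for x in X)
assert to_relation(g, X) == R  # translating back and forth loses nothing
```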

5.4. Some more examples of functions


Let us give some examples of functions as well as some examples of things that
look like functions but are not.

Example 5.4.1. Consider the function

$f_0$ : {1, 2, 3, 4} → {1, 2, 3, 4}

that sends 1, 2, 3, 4 to 3, 2, 3, 3, respectively. As a rigorous function, it is the
relation R that satisfies

1 R 3, 2 R 2, 3 R 3, 4 R 3

(and nothing else). In other words, it is the relation

{(1, 3) , (2, 2) , (3, 3) , (4, 3)} .

Example 5.4.2. What about the function

$f_1$ : {1, 2, 3, 4} → {1, 2, 3} ,
n ↦ n ?

Such a function $f_1$ does not exist, since it would have to send 4 to 4, but 4 is
not in the target {1, 2, 3}.
This is a pedantic issue, but it should be kept in mind: Not every
expression that appears to define a function actually defines a function. Make
sure that the expression to the right of the “↦” symbol always is an actual
element of the target (which, in this case, is the set {1, 2, 3}).

Example 5.4.3. Consider the function

$f_2$ : {1, 2, 3, . . .} → {1, 2, 3, . . .} ,
n ↦ (the number of positive divisors of n) .

As a relation, it is

{(1, 1) , (2, 2) , (3, 2) , (4, 3) , (5, 2) , (6, 4) , (7, 2) , (8, 4) , (9, 3) , . . .} .

(We cannot list all the pairs, since there are infinitely many.) Thus, $f_2(1) = 1$
and $f_2(2) = 2$ and $f_2(3) = 2$ and so on.

Example 5.4.4. What about the function

$\tilde{f}_2$ : Z → {1, 2, 3, . . .} ,
n ↦ (the number of positive divisors of n) ?

There is no such function $\tilde{f}_2$, since $\tilde{f}_2(0)$ would have to be undefined or ∞
(because 0 has infinitely many positive divisors).
This is the exact same problem that we had with the non-function $f_1$ above.

Example 5.4.5. What about the function

f 3 : {1, 2, 3, . . .} → {1, 2, 3, . . .} ,
n 7→ (the smallest prime divisor of n) ?

Again, there is no such function f 3 , since f 3 (1) makes no sense (indeed, the
number 1 has no prime divisors, thus no smallest prime divisor).
This is essentially the same problem as with the function fe2 from the pre-
vious example, except that this time the value f 3 (1) is really undefined (as
opposed to just failing to belong to the target).
Note that the function f 3 “almost” exists: There is a relation “y is the
smallest prime divisor of x” from {1, 2, 3, . . .} to {1, 2, 3, . . .}, but this relation
fails the output uniqueness requirement at x = 1, and thus is not a function.
However, we can make it into a function by removing the offending element
1 from its domain. That is, there is a function

f̃ 3 : {2, 3, 4, . . .} → {1, 2, 3, . . .} ,
n ↦ (the smallest prime divisor of n) .

Example 5.4.6. What about the function

\[ f_4 : \mathbb{Q} \to \mathbb{Z}, \qquad \frac{a}{b} \mapsto a \quad (\text{for } a, b \in \mathbb{Z} \text{ with } b \neq 0) \, ? \]
Restated in words, this is to be a function that takes a rational number as
input, writes it as a ratio of two integers and outputs the numerator. Is there
such a function?
Again, the answer is no. Again, the problem is a failure of output unique-
ness, but this time, it fails not because the output does not exist (or does
not belong to the target), but rather because the output is non-unique. For
example, if f 4 was a function, then we would have the two equalities
\[ f_4(0.5) = f_4\left(\frac{1}{2}\right) = 1 \qquad \text{and} \qquad f_4(0.5) = f_4\left(\frac{3}{6}\right) = 3, \]
which contradict one another. The underlying issue is that a rational number
can be written as a fraction in several different ways, and the numerators of
these fractions will usually not be the same. Thus, if you follow the rule
a/b ↦ a to compute the output of f 4 for a given input, your output will depend
on how exactly you write your input as a fraction, and this is a violation of
output uniqueness.
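The ambiguity can be made concrete with a small Python sketch. The helper `numerator_of_pair` is a hypothetical name for the rule "a/b ↦ a" applied to one chosen representation (a, b); `fractions.Fraction` is used only to confirm that the two representations denote the same rational number.

```python
from fractions import Fraction

def numerator_of_pair(a: int, b: int) -> int:
    """The rule 'a/b -> a', applied to one particular way of writing the input."""
    if b == 0:
        raise ZeroDivisionError("b must be nonzero")
    return a

# 1/2 and 3/6 are the same rational number ...
same_input = Fraction(1, 2) == Fraction(3, 6)
# ... but the rule gives different outputs for the two representations.
outputs = (numerator_of_pair(1, 2), numerator_of_pair(3, 6))
```

Here `same_input` is `True` while the two entries of `outputs` differ, which is exactly the contradiction described above.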

5.5. Well-definedness
The issues that we have seen in the last few examples (supposed functions
failing to exist either because their output values make no sense, or because
these values don’t lie in Y, or because these values are ambiguous) are known
as well-definedness issues. Often, mathematicians say that “a function is well-
defined” when they mean that its definition does not suffer from such issues
(i.e., its definition really defines a function). So you should read “This function
is well-defined [or: not well-defined]” as “The definition we just gave really
defines a function [or: does not actually define a function]”.
For example, as we just saw, the function
\[ f_4 : \mathbb{Q} \to \mathbb{Z}, \qquad \frac{a}{b} \mapsto a \]
is not well-defined (i.e., there is no such function), but the function
\[ f_5 : \mathbb{Q} \to \mathbb{Q}, \qquad \frac{a}{b} \mapsto \frac{a^2}{b^2} \]
is well-defined (because if you write a given rational number as a/b for different
pairs ( a, b), the resulting quotients a^2/b^2 will all be equal). The function

f 1 : {1, 2, 3, 4} → {1, 2, 3} ,
n ↦ n

is not well-defined (since its supposed output f 1 (4) fails to lie in the target
{1, 2, 3}), whereas the function

f 6 : {1, 2, 3, 4} → {1, 2, 3} ,
n ↦ 1 + (n%3)

is well-defined (since its outputs at 1, 2, 3, 4 are 2, 3, 1, 2).
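For a finite domain, the "output lies in the target" part of well-definedness can be checked mechanically. The following Python sketch does this for f 1 and f 6, and also spot-checks f 5 on a few representations of the same rational number. The helper name `outputs_outside_target` is an assumption made for this illustration, and a finite spot check is of course not a proof.

```python
from fractions import Fraction

def outputs_outside_target(rule, domain, target):
    """Return the inputs x in `domain` whose output rule(x) is not in `target`."""
    return [x for x in domain if rule(x) not in target]

domain = [1, 2, 3, 4]
target = {1, 2, 3}

bad_for_f1 = outputs_outside_target(lambda n: n, domain, target)            # rule n -> n
bad_for_f6 = outputs_outside_target(lambda n: 1 + (n % 3), domain, target)  # rule n -> 1 + (n % 3)
values_f6 = [1 + (n % 3) for n in domain]

# f_5: three ways of writing 1/2 all lead to the same quotient a^2 / b^2.
squares = {Fraction(a * a, b * b) for (a, b) in [(1, 2), (3, 6), (-2, -4)]}
```

An empty list means that every output lands in the target; a one-element set `squares` means that the chosen representations agree.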

5.6. The identity function


Definition 5.6.1. For any set A, there is an identity function id A : A → A.
This is the function that sends each element a ∈ A to a itself. In other words,
it is precisely the relation E A defined in Section 5.2.

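Here is the blobs-and-arrows visualization of the identity function id A for A =
{1, 2, 3}:

[Figure: two blobs X and Y, each containing the nodes 1, 2, 3, with an arrow from each node of X to the equally labeled node of Y.]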

5.7. More examples, and multivariate functions


As we said before, a function f : X → Y can be described either by a rule or by
a list of values (if X is finite) or as a relation. For instance, the “take the square”
function on real numbers is the function
f : R → R,
x ↦ x^2 .
As a relation, it is the set
{( x, x^2 ) | x ∈ R} .

When the domain of a function f is a Cartesian product of several sets (i.e., its
inputs are tuples), f is called a multivariate function. For instance, the function
f : Z × Z → Z,
( a, b) ↦ a + b
(which sends each pair ( a, b) of two integers to their sum a + b) is a multivariate
function. Its input is a pair of two integers, i.e., it really has two inputs (a and
b). As a relation, it is the subset
{(( a, b) , a + b) | a, b ∈ Z}
= {(( a, b) , c) | a, b, c ∈ Z such that c = a + b}
of (Z × Z) × Z. Of course, this function has a name: It is the addition of
integers. Other multivariate functions are
Z × Z → Z,
( a, b) ↦ a − b
(known as the subtraction of integers) and
Z × Z → Z,
( a, b) ↦ ab

(the multiplication of integers), as well as similar functions defined for other
sets of numbers. Keep in mind that there is no “division” function

Z × Z → Z,
( a, b) ↦ a/b,

since a/b is not always an integer (and does not even exist when b = 0).
When f is a multivariate function whose inputs are k-tuples, we commonly
use the shorthand notation f ( a1 , a2 , . . . , ak ) for its values f (( a1 , a2 , . . . , ak )).
(That is, we commonly omit the outer pair of parentheses.) For instance, if
f is the addition of integers, then f ( a, b) = f (( a, b)) = a + b for all a, b ∈ Z.

5.8. Composition of functions


5.8.1. Definition
There are some ways to transform functions into other functions. The most
important one is composition:

Definition 5.8.1. Let X, Y and Z be three sets. Let f : Y → Z and g : X → Y
be two functions. Then, f ◦ g denotes the function

X → Z,
x ↦ f ( g ( x )) .

In other words, f ◦ g is the function that first applies g and then applies f .
This function f ◦ g is called the composition of f with g (and I pronounce it
“ f after g”).

In terms of relations, if we view f and g as two relations F and G (as in
Definition 5.3.1), then f ◦ g is the relation

{( x, z) | there exists y ∈ Y such that x G y and y F z} from X to Z.
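Both descriptions of f ◦ g (as a rule and as a relation) can be sketched in Python. The names `compose` and `compose_relations`, as well as the small sample functions, are assumptions made for this illustration.

```python
def compose(f, g):
    """Return f ∘ g, i.e. the function x -> f(g(x)) ("f after g")."""
    return lambda x: f(g(x))

def compose_relations(F, G):
    """Compose relations given as sets of pairs: G from X to Y, F from Y to Z."""
    return {(x, z) for (x, y1) in G for (y2, z) in F if y1 == y2}

# Hypothetical sample functions g : {1, 2, 3} -> {"a", "b"} and f : {"a", "b"} -> {10, 20}
g_table = {1: "a", 2: "b", 3: "a"}
f_table = {"a": 10, "b": 20}

f_after_g = compose(lambda y: f_table[y], lambda x: g_table[x])
values = [f_after_g(x) for x in (1, 2, 3)]

# The same composition, computed on the level of relations.
FG = compose_relations(set(f_table.items()), set(g_table.items()))
```

Note the order of the arguments: `compose(f, g)` applies `g` first, just as f ◦ g does.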

Example 5.8.2. Consider the two functions

\[ f : \mathbb{R} \to \mathbb{R}, \qquad x \mapsto x^3 \]

and

\[ g : \mathbb{R} \to \mathbb{R}, \qquad x \mapsto \frac{1}{x^2 + 7}. \]

Then, for any real x ∈ R, we have

\[ (f \circ g)(x) = f(g(x)) = f\left(\frac{1}{x^2 + 7}\right) = \left(\frac{1}{x^2 + 7}\right)^3 \]

whereas

\[ (g \circ f)(x) = g(f(x)) = g\left(x^3\right) = \frac{1}{\left(x^3\right)^2 + 7} = \frac{1}{x^6 + 7}. \]

Note that these two results are different. Thus, f ◦ g ≠ g ◦ f in general.

Example 5.8.3. Consider the two functions f : {1, 2, 3} → {1, 2, 3, 4} and
g : {1, 2, 3, 4} → {1, 2, 3} given by the following tables of values:

i 1 2 3
f (i ) 1 3 2

i 1 2 3 4
g (i ) 2 1 3 2

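These two functions can be visualized using blobs and arrows, and we can
even reuse the target-blob from g as the domain-blob for f :

[Figure: three blobs {1, 2, 3, 4}, {1, 2, 3} and {1, 2, 3, 4}, with the arrows of g going from the first blob to the second and the arrows of f going from the second blob to the third.]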

This allows us to visually construct f ◦ g by removing the middle blob and
merging each g-arrow with the f -arrow that starts where the g-arrow ends:

[Figure: two blobs {1, 2, 3, 4} and {1, 2, 3, 4}, with the arrows of f ◦ g going from the first blob to the second.]
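The arrow-merging procedure can be mimicked on tables of values. In the Python sketch below, the two tables of this example are stored as dictionaries; the helper name `compose_tables` is an assumption made for this illustration.

```python
f = {1: 1, 2: 3, 3: 2}          # f : {1, 2, 3} -> {1, 2, 3, 4}
g = {1: 2, 2: 1, 3: 3, 4: 2}    # g : {1, 2, 3, 4} -> {1, 2, 3}

def compose_tables(f, g):
    """Table of f ∘ g: follow each g-arrow, then the f-arrow that starts where it ends."""
    return {x: f[g[x]] for x in g}

f_after_g = compose_tables(f, g)   # a function from {1, 2, 3, 4} to {1, 2, 3, 4}
g_after_f = compose_tables(g, f)   # a function from {1, 2, 3} to {1, 2, 3}
```

In this example both f ◦ g and g ◦ f exist, but they have different domains, so they cannot be equal.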

Exercise 5.8.1. For any positive integer d, let us define the function

rd : Z → Z,
n ↦ n%d

(which sends each integer n to the remainder of the division of n by d). For
example, r5 (18) = 18%5 = 3 and r6 (18) = 18%6 = 0.
(a) Make a table of the values of the function r2 ◦ r3 on the inputs
0, 1, 2, 3, 4, 5.
(b) Prove that r2 ◦ r3 ≠ r2 .
(c) Let d and e be two positive integers such that d | e. Prove that rd ◦ re =
rd .

5.8.2. Basic properties


Let us recap. In Definition 5.8.1, we defined the composition of a function f
with a function g (recall that “function” and “map” mean the same thing) to be
the function
(domain of g) → (target of f ) ,
x ↦ f ( g ( x )) .
This composition is denoted by f ◦ g.
As we saw, the compositions f ◦ g and g ◦ f are usually not the same (in fact,
in many cases, one of these is defined and the other isn’t). In other words,
composition of functions does not satisfy commutativity. However, it has a few
other nice properties:

Theorem 5.8.4 (associativity of composition). Let X, Y, Z, W be four sets. Let
f : Z → W, g : Y → Z and h : X → Y be three functions. Then,

( f ◦ g) ◦ h = f ◦ ( g ◦ h) .

Proof. Both ( f ◦ g) ◦ h and f ◦ ( g ◦ h) are functions from X to W. Moreover, for
each x ∈ X, we have
( f ◦ ( g ◦ h)) ( x ) = f (( g ◦ h) ( x )) (by the definition of f ◦ ( g ◦ h))
= f ( g (h ( x ))) (since the definition of g ◦ h yields ( g ◦ h) ( x ) = g (h ( x )))
and
(( f ◦ g) ◦ h) ( x ) = ( f ◦ g) (h ( x )) (by the definition of ( f ◦ g) ◦ h)
= f ( g (h ( x ))) (by the definition of f ◦ g) ,

so that
( f ◦ ( g ◦ h)) ( x ) = f ( g (h ( x ))) = (( f ◦ g) ◦ h) ( x ) .
Since this holds for each x ∈ X, we conclude that f ◦ ( g ◦ h) = ( f ◦ g) ◦ h
(because two functions u and v from X to W are equal if and only if the equality
u ( x ) = v ( x ) holds for each x ∈ X). This proves the theorem.
Intuitively, the claim of Theorem 5.8.4 is pretty obvious: It is just saying that if
you can do three things (applying h, applying g and applying f ) in succession,
then it does not matter whether you view it as “first doing h followed by g, and
then doing f ” or as “first doing h, and then doing g followed by f ”.
Thanks to Theorem 5.8.4, we can write compositions of several functions
without parentheses: i.e., instead of writing f ◦ ( g ◦ h) or ( f ◦ g) ◦ h, we can just
write f ◦ g ◦ h.
The following property of composition of functions is even easier. We recall
that idP means the identity map on a given set P; this is the map from P to P
that sends each element p ∈ P to itself.

Theorem 5.8.5. Let f : X → Y be a function. Then,

f ◦ idX = idY ◦ f = f .

Proof. For each x ∈ X, we have

( f ◦ idX ) ( x ) = f (idX ( x ))
= f (x) (since the definition of idX yields idX ( x ) = x ) .
This shows that f ◦ idX = f (since both f ◦ idX and f are functions from X to
Y). A similar computation yields idY ◦ f = f . Thus, the theorem follows.
Thanks to Theorem 5.8.5, we can remove identity maps from compositions:
e.g., the composition f ◦ g ◦ idP ◦ h (where P is the target of h and the domain
of g) can be simplified to f ◦ g ◦ h.
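Theorems 5.8.4 and 5.8.5 can be sanity-checked on small examples. The Python sketch below uses made-up sample functions h, g, f (given as dictionaries) and a hypothetical helper `compose_tables`; it illustrates the two theorems on one instance and does not replace their proofs.

```python
def compose_tables(f, g):
    """Table of f ∘ g for functions given as dicts (g is applied first)."""
    return {x: f[g[x]] for x in g}

# Hypothetical sample functions h : X -> Y, g : Y -> Z, f : Z -> W
h = {1: "a", 2: "b", 3: "a"}
g = {"a": 10, "b": 20}
f = {10: True, 20: False}

left = compose_tables(compose_tables(f, g), h)    # (f ∘ g) ∘ h
right = compose_tables(f, compose_tables(g, h))   # f ∘ (g ∘ h)

id_X = {x: x for x in h}   # identity map on X = {1, 2, 3}
id_Y = {y: y for y in g}   # identity map on Y = {"a", "b"}
```

Comparing `left` with `right` checks associativity; composing `h` with `id_X` or `id_Y` should give back `h` itself.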

Exercise 5.8.2. Let s1 , s2 , s3 be the three functions from {1, 2, 3, 4} to
{1, 2, 3, 4} defined by the following tables of values:

i 1 2 3 4
s1 ( i ) 2 1 3 4
s2 ( i ) 1 3 2 4
s3 ( i ) 1 2 4 3

(That is, each si is the function that transforms the two numbers i and i + 1
into one another while leaving all other inputs unchanged.)

(a) Make a table of values of the function s2 ◦ s3 ◦ s1 ◦ s2 .


(b) Is s1 ◦ s3 = s3 ◦ s1 ?
(c) Is s1 ◦ s2 = s2 ◦ s1 ?
(d) Let w be the function from {1, 2, 3, 4} to {1, 2, 3, 4} with the following
table of values:

i 1 2 3 4
w (i ) 4 2 1 3

Write w as a composition of some of the functions s1 , s2 , s3 . (You can use
these functions in any order and any number of times, including none. For
example, “s2 ◦ s3 ◦ s1 ◦ s2 ” would be a valid answer if this function was w.)

5.9. Jectivities (injectivity, surjectivity and bijectivity)


Now we introduce some important properties of functions, which have to do
with how often they attain certain values. There are three of these properties,
and I refer to them as the “jectivity properties”, as they are called injectivity,
surjectivity and bijectivity.

Definition 5.9.1. Let f : X → Y be a function. Then:


(a) We say that f is injective (aka one-to-one, aka an injection) if

for each y ∈ Y, there exists at most one x ∈ X such that f ( x ) = y.

In other words: We say that f is injective if there are no two distinct elements
x1 , x2 ∈ X such that f ( x1 ) = f ( x2 ).
In other words: We say that f is injective if any two elements x1 , x2 ∈ X
satisfying f ( x1 ) = f ( x2 ) must also satisfy x1 = x2 .
(b) We say that f is surjective (aka onto, aka a surjection) if

for each y ∈ Y, there exists at least one x ∈ X such that f ( x ) = y.

In other words: We say that f is surjective if every element of Y is an output
value of f .
(c) We say that f is bijective (aka a one-to-one correspondence, aka a
bijection) if

for each y ∈ Y, there exists exactly one x ∈ X such that f ( x ) = y.

Thus, f is bijective if and only if f is both injective and surjective.

Here are some examples:



• The function

f : N → N,
k ↦ k^2

is injective (because no two distinct nonnegative integers x1 , x2 satisfy
x1^2 = x2^2 ) but not surjective (because, e.g., the nonnegative integer 2 ∈ N
is not the square of any nonnegative integer). Thus, it is not bijective.

• Let S = {0, 1, 4, 9, 16, . . .} be the set of all perfect squares (i.e., all squares
of integers). Then, the function

g : N → S,
k ↦ k^2

is injective (for the same reason as the f in the previous example) and
also surjective (since every perfect square can be written as k^2 for some
k ∈ N). Thus, it is bijective.
Take note: The functions f and g differ only in their choice of target! Other
than that, they are indistinguishable (both have domain N, and send each
element of this domain to its square). But of course, this little difference
matters for the surjectivity, since the surjectivity depends crucially on the
target. No wonder that g is surjective while f is not.

• Let S = {0, 1, 4, 9, 16, . . .} be the set of all perfect squares again. Consider
the function

gZ : Z → S,
k ↦ k^2 ,

which differs from g only in its domain (it allows all integers rather than
only nonnegative integers as inputs). This function gZ is not injective
(since gZ (1) = gZ (−1)), but is still surjective (since each perfect square
can be written as k^2 for some k ∈ Z). Since it is not injective, it cannot be
bijective.

• The function

h : N → N,
k ↦ k//2

(recall that k//2 is the quotient of the division of k by 2) is not injective (for
example, the two distinct elements 0, 1 ∈ N satisfy h (0) = h (1), because
both h (0) = 0//2 and h (1) = 1//2 are 0), but is surjective (because for
each y ∈ N, there exists an x ∈ N such that h ( x ) = y, namely for example
x = 2y). Hence, it is not bijective.

• Let E = {0, 2, 4, 6, . . .} be the set of all even nonnegative integers. The
function
heven : E → N,
k ↦ k//2
(note that k//2 = k/2 here, since k is even) is both injective and surjective,
thus bijective.
• Let O = {1, 3, 5, 7, . . .} be the set of all odd nonnegative integers. The
function
hodd : O → N,
k ↦ k//2
is also injective and surjective, thus bijective.
• The function
f : Z × Z → Z,
( a, b) ↦ a + b
(that is, the addition of integers) is not injective (because, for instance,
f (0, 1) = 1 = f (1, 0)), but is surjective (since each n ∈ Z satisfies n =
f (n, 0)). In other words, two pairs of integers can have the same sum, but
every integer can be written as a sum of two integers.
The following criterion for injectivity, surjectivity and bijectivity is just a re-
statement of Definition 5.9.1, but it can be quite useful for checking these prop-
erties:
Remark 5.9.2. Consider a function f : X → Y given by a table of all its values
(possibly an infinite table if X is infinite). Assume that all possible inputs
x ∈ X appear in the top row (each exactly once), and the corresponding
outputs f ( x ) appear in the bottom row, so the table looks as follows:

x a b c d ···
f (x) f ( a) f (b) f (c) f (d) ···

Then:
(a) The function f is injective if and only if the bottom row of this table has
no two equal entries.
(b) The function f is surjective if and only if every element of Y appears in
the bottom row.
(c) The function f is bijective if and only if every element of Y appears
exactly once in the bottom row.

For example:

• The function

f : {1, 2, 3} → {7, 8, 9} ,
k ↦ k + 6

is bijective, as you can see from its table of values:

k 1 2 3
f (k) 7 8 9

(by noticing that every element of {7, 8, 9} appears exactly once in the
bottom row of this table). Of course, this can also be shown logically (by
arguing that f is injective and surjective because adding 6 can be undone
by subtracting 6).

• The function58

f : {4, 6, 7} → {0, 1, 2} ,
k ↦ k%3

is neither injective nor surjective. Indeed, its table of values

k 4 6 7
f (k) 1 0 1

has the element 1 appear twice in the bottom row (so f is not injective)
and does not have the element 2 in its bottom row (so f is not surjective).
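The three criteria of Remark 5.9.2 translate directly into code when a function with finite domain is stored as a table. In the Python sketch below, a table is a dictionary (keys form the top row, values the bottom row); the function names are chosen for this illustration only.

```python
def is_injective(table):
    """True if the bottom row has no two equal entries."""
    outputs = list(table.values())
    return len(outputs) == len(set(outputs))

def is_surjective(table, target):
    """True if every element of the target appears in the bottom row."""
    return set(target) <= set(table.values())

def is_bijective(table, target):
    """True if every element of the target appears exactly once in the bottom row."""
    return is_injective(table) and is_surjective(table, target)

add6 = {k: k + 6 for k in (1, 2, 3)}   # the function k -> k + 6 with target {7, 8, 9}
mod3 = {k: k % 3 for k in (4, 6, 7)}   # the function k -> k % 3 with target {0, 1, 2}
```

Note that the target has to be passed separately for surjectivity and bijectivity, since a table of values alone does not record it.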

Here is yet another way to restate Definition 5.9.1:

Remark 5.9.3. If you visualize a function f : X → Y as a blobs-and-arrows


picture (as explained in Section 5.3), then

• the function f is injective if and only if no two arrows hit the same
Y-node;

• the function f is surjective if and only if every node in the Y-blob gets
hit by at least one arrow;

• the function f is bijective if and only if every node in the Y-blob gets
hit by exactly one arrow.

58 Recall that k%3 denotes the remainder of the division of k by 3.



This can be illustrated by the following four examples:

[Figure: four blobs-and-arrows diagrams of maps from X to Y, with the following captions:
• injective but not surjective (since 2 ∈ Y is not hit);
• neither injective nor surjective (since 1 ∈ Y is not hit, while 2 ∈ Y is hit twice);
• surjective but not injective (since 1 ∈ Y is hit twice);
• both injective and surjective (i.e., bijective).]

Exercise 5.9.1. For each of the following functions, determine whether it is
injective, surjective and/or bijective:
(a) The function
f : Z → Z,
x ↦ x^2 .
(b) The function
f : Z → Z,
x ↦ x^3 .
(c) The function
f : Z × Z → Z,
( x, y) ↦ x^2 + y^2 .
(d) The function
f : Z → Z,
x ↦ 3 − x.

(e) The function

f : Z → Z,
x ↦ 3 − 2x.

(f) The function

f : N → N,
x ↦ x!.

(Keep in mind that 0 ∈ N.)

(g) The function

f : Z × Z → Z × Z,
( x, y) ↦ ( x + y, x − y) .

(h) The function

f : Z × Z → Z × Z,
( x, y) ↦ ( x − y, y − x ) .

(i) The function

f : Z × Z → Z × Z,
( x, y) ↦ ( x + 2y, x + y) .

(j) The function

f : Z × Z → Z × Z,
( x, y) ↦ ( x + 2y, 2x + y) .

5.10. Inverses
5.10.1. Definition and examples
Bijective maps have a special power: They can be inverted. Here is what this
means:

Definition 5.10.1. Let f : X → Y be a function. An inverse of f means a


function g : Y → X such that

f ◦ g = idY and g ◦ f = idX .



In other words, an inverse of f means a function g : Y → X such that

f ( g (y)) = y for each y ∈ Y, and


g ( f ( x )) = x for each x ∈ X.

Roughly speaking, an inverse of f thus means a map that both undoes f and
is undone by f .
Not every function has an inverse. We shall soon see which ones do and
which ones don’t; we will also prove that an inverse of f is unique if it exists.
For now, however, let us explore a few examples:

• Let f : {1, 2, 3} → {7, 8, 9} be the “add 6” function – i.e., the function that
sends each x ∈ {1, 2, 3} to x + 6 ∈ {7, 8, 9}. Then, f has an inverse: the
“subtract 6” function (i.e., the function from {7, 8, 9} to {1, 2, 3} that sends
each y to y − 6). Indeed, if we denote the “subtract 6” function by g, then
we have

f ( g (y)) = f (y − 6) = (y − 6) + 6 = y for each y ∈ {7, 8, 9} , and


g ( f ( x )) = g ( x + 6) = ( x + 6) − 6 = x for each x ∈ {1, 2, 3} .

• Let f : {1, 2, 3} → {7, 8, 9} be the “subtract from 10” function – i.e., the
function that sends each x ∈ {1, 2, 3} to 10 − x ∈ {7, 8, 9}. Then, f has
an inverse: namely, the “subtract from 10” function from {7, 8, 9} to {1, 2, 3},
which is given by the same rule as f itself. This is because
10 − (10 − n) = n for each n ∈ Z.

• Let f : {1, 2, 3, 4, 5} → {1, 2, 3, 4, 5} be the function that sends 1, 2, 3, 4, 5 to


3, 4, 1, 5, 2, respectively. Then, f has an inverse: namely, the function g that
sends 1, 2, 3, 4, 5 to 3, 5, 1, 2, 4, respectively. We can check that f ( g (y)) =
y for each y ∈ {1, 2, 3, 4, 5}. For example, for y = 3, this is because
f ( g (3)) = f (1) = 3. Similarly we can check that g ( f ( x )) = x for each
x ∈ {1, 2, 3, 4, 5}.
This is best seen by drawing the blobs-and-arrows diagrams of f and g

side by side:

[Figure: the blobs-and-arrows diagrams of f (from X to Y) and of g (from Y to X), each drawn on the nodes 1, 2, 3, 4, 5.]
As you see, there is a “dual” relationship between these two diagrams:
Whenever the diagram of f has an arrow from some x ∈ X to some
y ∈ Y, the diagram of g has an arrow from y to x. In other words, the
diagram of g can be obtained from the diagram of f by swapping the X-
blob with the Y-blob and reversing the direction of each arrow. This rule
applies not just to our specific two maps f and g, but to any map f that
has an inverse. Thus, if you have drawn a blobs-and-arrows diagram of
a function f , it is fairly easy to construct its inverse (as long as such an
inverse exists).
This rule can also be restated in terms of tables of values: If you have a
table of all values of a function f : X → Y, then you can get an inverse of f
by swapping the two rows of this table. For instance, if f : {1, 2, 3, 4, 5} →
{1, 2, 3, 4, 5} is the function we just showed, then f has the table of values

k 1 2 3 4 5
f (k) 3 4 1 5 2

and thus you can get its inverse g by swapping the two rows:

k 3 4 1 5 2
g (k) 1 2 3 4 5

• Let f : {1, 2, 3, 4} → {1, 2, 3, 4} be the function that sends 1, 2, 3, 4 to


1, 2, 3, 3, respectively. Then, f has no inverse. Indeed, if g was an inverse
of f , then we would have
3 = g ( f (3)) (since g ( f ( x )) = x for each x ∈ {1, 2, 3, 4})
= g ( f (4)) (since f (3) = 3 = f (4))
=4 (since g ( f ( x )) = x for each x ∈ {1, 2, 3, 4}) ,

which is absurd.
The same argument shows that more generally, if a function f : X → Y
is to have an inverse, then f should be injective, because two distinct
elements x1 and x2 of X satisfying f ( x1 ) = f ( x2 ) would create a contra-
diction via x1 = g ( f ( x1 )) = g ( f ( x2 )) = x2 .

• Let f : {1, 2, 3} → {1, 2, 3, 4} be the function that sends 1, 2, 3 to 1, 2, 3,


respectively. Then, f has no inverse. Indeed, if g was an inverse of f , then
we would have f ( g (4)) = 4, but this is absurd, since 4 is not an output
of f .
The same argument shows that more generally, if a function f : X → Y is
to have an inverse, then f should be surjective, because each y ∈ Y will
satisfy y = f ( g (y)) and thus be an output value of f .
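The row-swapping rule and the two ways in which it can fail can be sketched in Python as follows. The helper name `invert_table` and the convention of returning `None` when no inverse exists are choices made for this illustration; functions are stored as dictionaries.

```python
def invert_table(f, target):
    """Return the table of the inverse of f : X -> target, or None if f has no inverse."""
    g = {}
    for x, y in f.items():
        if y in g:
            # Two inputs share the output y, so f is not injective.
            return None
        g[y] = x
    if set(g) != set(target):
        # Some element of the target is never hit, so f is not surjective.
        return None
    return g

f_good = {1: 3, 2: 4, 3: 1, 4: 5, 5: 2}     # sends 1, 2, 3, 4, 5 to 3, 4, 1, 5, 2
f_not_inj = {1: 1, 2: 2, 3: 3, 4: 3}        # sends 1, 2, 3, 4 to 1, 2, 3, 3
f_not_surj = {1: 1, 2: 2, 3: 3}             # target is {1, 2, 3, 4}, but 4 is not hit

g_good = invert_table(f_good, {1, 2, 3, 4, 5})
```

The three sample tables are the ones from the last three examples above.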

5.10.2. Invertibility is bijectivity by another name


Combining the morals of the last two examples, we conclude that if a function
f : X → Y is to have an inverse, then f should be both injective and surjective,
i.e., should be bijective. In other words, only bijective maps have a chance at
having inverses. This turns out to be sufficient as well: If a map is bijective,
then it has an inverse. Let us summarize this as a theorem:

Theorem 5.10.2. Let f : X → Y be a map between two sets X and Y. Then, f


has an inverse if and only if f is bijective.

Proof. We must prove the logical equivalence

( f has an inverse) ⇐⇒ ( f is bijective) . (50)

Let us prove the =⇒ and ⇐= directions separately:


=⇒: Assume that f has an inverse. We must show that f is bijective.59
We assumed that f has an inverse. Let g be this inverse.
Let us show that f is injective. Let x1 , x2 ∈ X satisfy f ( x1 ) = f ( x2 ). We must
prove that x1 = x2 . Set y = f ( x1 ); then, y = f ( x2 ) as well (since f ( x1 ) = f ( x2 )).
Since g is an inverse of f , we have x1 = g ( f ( x1 )) = g (y) (since f ( x1 ) = y) and
x2 = g ( f ( x2 )) = g (y) (since f ( x2 ) = y). Thus, x1 = g (y) = x2 . This completes our
proof that f is injective.
Let us show that f is surjective. Let y ∈ Y. Then, y = f ( g (y)) (since g is an inverse
of f ). Therefore, there exists an x ∈ X such that y = f ( x ) (namely, x = g (y)). So we
have proved for each y ∈ Y that there exists an x ∈ X such that y = f ( x ). In other
words, f is surjective.
So f is both injective and surjective, thus bijective. This proves the “=⇒” direction
of our equivalence (50).

59 We have already done this in the above examples, but we repeat it for the sake of completeness.

Let us now prove the “⇐=” direction:


⇐=: Assume that f is bijective. We must show that f has an inverse.
Since f is bijective, for each y ∈ Y, there exists a unique x ∈ X such that f ( x ) = y.
Thus, we can define a map
g : Y → X,
which sends each y ∈ Y to this unique x. It is easy to see that g is an inverse of f .
Thus, f has an inverse. This proves the “⇐=” direction of our equivalence (50). Thus,
the proof of (50) is complete, i.e., Theorem 5.10.2 is proved.
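For readers who want the "easy to see" step of the “⇐=” direction spelled out, here is one possible verification. It uses only the definition of g given in the proof (namely, that g (y) is the unique x ∈ X with f ( x ) = y).

```latex
\begin{align*}
f(g(y)) &= y
  && \text{for each } y \in Y
  && \text{(since $g(y)$ is, by definition, an element $x \in X$ with $f(x) = y$)}; \\
g(f(x)) &= x
  && \text{for each } x \in X
  && \text{(since $x$ is the unique element of $X$ that $f$ sends to $f(x)$,} \\
  &&&&& \text{\phantom{(}so the definition of $g$ forces $g(f(x)) = x$)}.
\end{align*}
```

These are exactly the two equalities f ◦ g = idY and g ◦ f = idX required by Definition 5.10.1.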

Theorem 5.10.2 says that bijective maps are the same as invertible maps (i.e.,
maps that have an inverse). This is a fundamental result that is used all over
mathematics.

5.10.3. Uniqueness of the inverse


As we promised, let us now show that an inverse of a map f is unique if it
exists:

Theorem 5.10.3. Let f : X → Y be a function. Then, f has at most one


inverse.
Proof. What does “at most one inverse” mean? It means that f has no two dis-
tinct inverses. In other words, it means that any two inverses of f are identical.
So let us prove this. Let g1 and g2 be two inverses of f . We must show that
g1 = g2 .
Since g1 is an inverse of f , we have g1 ◦ f = idX and f ◦ g1 = idY .
Since g2 is an inverse of f , we have g2 ◦ f = idX and f ◦ g2 = idY .
By associativity of composition (Theorem 5.8.4), the two maps ( g1 ◦ f ) ◦ g2
and g1 ◦ ( f ◦ g2 ) are equal. Thus, we can denote both of these maps by g1 ◦ f ◦
g2 .
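Comparing

\[ g_1 \circ \underbrace{f \circ g_2}_{= \operatorname{id}_Y} = g_1 \circ \operatorname{id}_Y = g_1 \qquad \text{with} \qquad \underbrace{g_1 \circ f}_{= \operatorname{id}_X} \circ g_2 = \operatorname{id}_X \circ g_2 = g_2, \]

we find g1 = g2 , qed.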

Definition 5.10.4. Let f : X → Y be a map that has an inverse. Then, this


inverse (which is unique by Theorem 5.10.3) is called f −1 .

Thus, if f : X → Y is a map that has an inverse (i.e., by Theorem 5.10.2, a


bijective map), then we have

f −1 ◦ f = idX and f ◦ f −1 = idY ,



that is,

f −1 ( f ( x )) = x for each x ∈ X, and (51)

f ( f −1 (y)) = y for each y ∈ Y. (52)

These equalities should explain why the notation f −1 was chosen for the inverse
of f .

5.10.4. More examples


Here are some further examples of inverses:

• Let E = {0, 2, 4, 6, . . .} be the set of all even nonnegative integers. Consider
the function

f : E → N,
k ↦ k/2.

Then, f has an inverse. This inverse is the function

f −1 : N → E,
k ↦ 2k.

• Let R≥0 = {all nonnegative real numbers}. Then, the function

f : R≥0 → R≥0 ,
x ↦ x^2

has an inverse. This inverse is the function

f −1 : R≥0 → R≥0 ,
x ↦ √x.

• In contrast, the function

f : R → R,
x ↦ x^2

has no inverse. In fact, this function is not injective (since f (1) = f (−1))
and not surjective (since −1 is not a square of a real number), so it is
certainly not bijective, and thus not invertible.

• The function
f : R → R,
x ↦ x^3
has an inverse. This inverse is the function
f −1 : R → R,
x ↦ ∛x.

• Another example of inverses comes from cryptography: If k is any integer, then
the Caesarian cipher ROTk (defined in Section 3.9, regarded as a map from W =
{words} to W) has an inverse, namely ROT−k . This is just saying that any word
encrypted with ROTk can be decrypted with ROT−k and vice versa.
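The last example can be tried out in Python. Since the definition of ROTk from Section 3.9 is not repeated here, the sketch below assumes that a word is a string over the 26 letters A to Z and that ROTk shifts each letter cyclically by k positions; the function name `rot` is likewise an assumption.

```python
import string

ALPHABET = string.ascii_uppercase  # assumption: words use only the letters A..Z

def rot(k, word):
    """Shift each letter of `word` by k positions, cyclically (assumed form of ROT_k)."""
    return "".join(ALPHABET[(ALPHABET.index(c) + k) % 26] for c in word)

word = "HELLO"
encrypted = rot(3, word)         # apply ROT_3
decrypted = rot(-3, encrypted)   # apply ROT_{-3}
```

Decrypting with ROT−k after encrypting with ROTk returns the original word, and the same holds with the roles of k and −k swapped.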

Exercise 5.10.1. Show that the function

f : Z × Z → Z × Z,
( x, y) ↦ ( x + 3y, 2x + 5y)

has an inverse. Give an explicit formula for this inverse (i.e., for f −1 ((u, v))).

[Hint: This is a linear algebra question, since f −1 ((u, v)) should be a pair
( x, y) ∈ Z × Z satisfying ( x + 3y, 2x + 5y) = (u, v).]

5.10.5. Inverses of inverses and compositions


Here are some more general properties of inverses:
Proposition 5.10.5. Let X be any set. Then, the identity map idX : X → X is
bijective, and is its own inverse.

Proof. The map idX is an inverse of itself (since idX ◦ idX = idX and idX ◦ idX =
idX ). Hence, it has an inverse, and thus is bijective (by Theorem 5.10.2).

Theorem 5.10.6. Let f : X → Y be a map that has an inverse f −1 : Y → X.


Then, f −1 has an inverse, namely f .

Proof. Since f −1 is an inverse of f , we have f ◦ f −1 = idY and f −1 ◦ f = idX .


But the same two equalities can be read as saying that f is an inverse of f −1 .
Theorem 5.10.7 (socks-and-shoes formula). Let X, Y and Z be three sets. Let
g : X → Y and f : Y → Z be two bijective functions. Then, the composition
f ◦ g : X → Z is bijective as well, and its inverse is

( f ◦ g ) −1 = g −1 ◦ f −1 .

Proof. This is obvious from the blobs-and-arrows picture; but let us check this
rigorously.
For any x ∈ X, we have

\[ \left(g^{-1} \circ f^{-1}\right)\left((f \circ g)(x)\right) = g^{-1}\Big(\underbrace{f^{-1}\left(f(g(x))\right)}_{= g(x)}\Big) = g^{-1}(g(x)) = x. \]

For any z ∈ Z, we have

\[ (f \circ g)\left(\left(g^{-1} \circ f^{-1}\right)(z)\right) = f\Big(\underbrace{g\left(g^{-1}\left(f^{-1}(z)\right)\right)}_{= f^{-1}(z)}\Big) = f\left(f^{-1}(z)\right) = z. \]

Thus, g−1 ◦ f −1 is an inverse of f ◦ g. Hence, f ◦ g has an inverse, and thus is
bijective (by Theorem 5.10.2).
Remark 5.10.8. Note that g−1 ◦ f −1 is not the same as f −1 ◦ g−1 . Indeed,
f −1 ◦ g−1 might not even exist in Theorem 5.10.7.
A surprising feature of the socks-and-shoes formula ( f ◦ g)−1 = g−1 ◦ f −1
is that the order in which the inverses f −1 and g−1 appear on the right hand
side is different from the order in which f and g appear on the left hand side.
However, this is completely natural: If you want to undo two things you have
done in some order, then you should undo them in the opposite order! For
example, if you have put on your socks and then your shoes in the morning,
then you need to first take off the shoes and then the socks when you go to bed.
(The formula owes its moniker to this metaphor.)
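The socks-and-shoes formula can be checked on a small example. The Python sketch below uses two made-up bijections g and f, stored as dictionaries; the helper names `compose_tables` and `invert_table` are assumptions made for this illustration.

```python
def compose_tables(f, g):
    """Table of f ∘ g for functions given as dicts (g is applied first)."""
    return {x: f[g[x]] for x in g}

def invert_table(f):
    """Swap the two rows of the table of a bijective function."""
    return {y: x for x, y in f.items()}

# Hypothetical bijections g : X -> Y and f : Y -> Z
g = {1: "a", 2: "b", 3: "c"}
f = {"a": "z", "b": "x", "c": "y"}

lhs = invert_table(compose_tables(f, g))                 # (f ∘ g)^{-1}
rhs = compose_tables(invert_table(g), invert_table(f))   # g^{-1} ∘ f^{-1}
```

Trying `compose_tables(invert_table(f), invert_table(g))` instead raises a `KeyError` here, which matches Remark 5.10.8: the composition in the "wrong" order need not even exist.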
Remark 5.10.9. Part of Theorem 5.10.7 says that a composition of two bijec-
tive functions is bijective. However, a composition f ◦ g of two non-bijective
functions f and g can sometimes also be bijective. Here is an example:

[Figure: blobs-and-arrows diagram of a map g from {1} to {1, 2}, followed by a map f from {1, 2} to {1}.]

5.11. Some exercises on jectivities and inverses


5.11.1. Exercises with solutions
Here are a few solved exercises on jectivity properties and inverses.

Exercise 5.11.1. Let X, Y and Z be three sets, and f : Y → Z and g : X → Y


be two maps. Which of the following are true?
(a) If f and g are injective, then f ◦ g is injective.
(b) If f ◦ g is injective, then f is injective.
(c) If f ◦ g is injective, then g is injective.
(d) If f and g are surjective, then f ◦ g is surjective.
(e) If f ◦ g is surjective, then f is surjective.
(f) If f ◦ g is surjective, then g is surjective.
(g) If f and g are bijective, then f ◦ g is bijective.
(h) If f ◦ g is bijective, then f is bijective.
(i) If f ◦ g is bijective, then g is bijective.

Solution. We shall use the following definitions of “injective”, “surjective” and
“bijective”60 :

• A map h : U → V is injective if and only if it has the following property:
For any u1 , u2 ∈ U satisfying h (u1 ) = h (u2 ), we have u1 = u2 .

• A map h : U → V is surjective if and only if it has the following property:
For any v ∈ V, there exists some u ∈ U such that h (u) = v.

• A map h : U → V is bijective if and only if h is both injective and
surjective.

(a) This is true.


[Proof: Assume that f and g are injective. We must prove that f ◦ g is injective.
Let u1 , u2 ∈ X satisfy ( f ◦ g) (u1 ) = ( f ◦ g) (u2 ). We shall show that u1 = u2 .
Indeed, we have ( f ◦ g) (u1 ) = f ( g (u1 )) (by the definition of f ◦ g), so that

f ( g (u1 )) = ( f ◦ g) (u1 ) = ( f ◦ g) (u2 ) = f ( g (u2 ))

(again by the definition of f ◦ g). Since f is injective, we thus conclude that


g (u1 ) = g (u2 ) 61 . Since g is injective, we thus conclude that u1 = u2 .
Forget that we fixed u1 , u2 . We thus have shown that for any u1 , u2 ∈ X
satisfying ( f ◦ g) (u1 ) = ( f ◦ g) (u2 ), we have u1 = u2 . In other words, the map
f ◦ g is injective (by our definition of “injective”). This completes our proof.]
60 We gave several equivalent definitions for “injective”, “surjective” and “bijective” in Defini-
tion 5.9.1; you can just as well use any of them instead.
61 In some more detail:

We know that f is injective. In other words, for any v1 , v2 ∈ Y satisfying f (v1 ) = f (v2 ),
we have v1 = v2 (by our definition of “injective”). Applying this to v1 = g (u1 ) and v2 =
g (u2 ), we obtain g (u1 ) = g (u2 ) (since f ( g (u1 )) = f ( g (u2 ))).

(b) This is false.


[Counterexample: For instance, we can set X = {1} and Y = {1, 2} and Z =
{1}, and let f : Y → Z be the map that sends both elements of Y to 1, while
g : X → Y is the map sending 1 to 1. Then, f ◦ g is injective (in fact, f ◦ g is the
identity map id{1} ), but f is not.]
(c) This is true.
[Proof: Assume that f ◦ g is injective. We must prove that g is injective.
Let u1 , u2 ∈ X satisfy g (u1 ) = g (u2 ). We shall show that u1 = u2 .
Indeed, we have ( f ◦ g) (u1 ) = f ( g (u1 )) (by the definition of f ◦ g) and
( f ◦ g) (u2 ) = f ( g (u2 )) (likewise). Hence,

\[ (f \circ g)(u_1) = f\bigl(\underbrace{g(u_1)}_{= g(u_2)}\bigr) = f(g(u_2)) = (f \circ g)(u_2) . \]

Since f ◦ g is injective, this entails u1 = u2 .


Forget that we fixed u1 , u2 . We thus have shown that for any u1 , u2 ∈ X
satisfying g (u1 ) = g (u2 ), we have u1 = u2 . In other words, the map g is
injective (by our definition of “injective”). This completes our proof.]
(d) This is true.
[Proof: Assume that f and g are surjective. We must prove that f ◦ g is
surjective.
Let z ∈ Z be arbitrary. We shall show that there exists some x ∈ X such that
( f ◦ g) ( x ) = z.
Indeed, recall that f is surjective. Thus, there exists some y ∈ Y such that
f (y) = z. Consider this y.
Recall now that g is surjective. Thus, there exists some w ∈ X such that
g (w) = y. Consider this w.
We have

\[ (f \circ g)(w) = f\bigl(\underbrace{g(w)}_{= y}\bigr) = f(y) = z . \]

Hence, there exists some x ∈ X such that ( f ◦ g) ( x ) = z (namely, x = w).
Forget that we fixed z. We thus have shown that for any z ∈ Z, there exists
some x ∈ X such that ( f ◦ g) ( x ) = z. In other words, the map f ◦ g is surjective
(by our definition of “surjective”). This completes our proof.]
(e) This is true.
[Proof: Assume that f ◦ g is surjective. We must prove that f is surjective.
Let z ∈ Z be arbitrary. We shall show that there exists some y ∈ Y such that
f (y) = z.
Indeed, recall that f ◦ g is surjective. Thus, there exists some x ∈ X such that
( f ◦ g) ( x ) = z. Consider this x.
Now, f ( g ( x )) = ( f ◦ g) ( x ) = z. Hence, there exists some y ∈ Y such that
f (y) = z (namely, y = g ( x )).

Forget that we fixed z. We thus have shown that for any z ∈ Z, there exists
some y ∈ Y such that f (y) = z. In other words, the map f is surjective (by our
definition of “surjective”). This completes our proof.]
(f) This is false.
[Counterexample: For instance, we can set X = {1} and Y = {1, 2} and Z =
{1}, and let f : Y → Z be the map that sends both elements of Y to 1, while
g : X → Y is the map sending 1 to 1. Then, f ◦ g is surjective (in fact, f ◦ g is
the identity map id{1} ), but g is not.]
(g) This is true.
[Proof: This is part of Theorem 5.10.7. But let us give a different proof as
well: Assume that f and g are bijective. Thus, f and g are both injective and
surjective. Hence, f ◦ g is injective (by Exercise 5.11.1 (a)) and surjective (by
Exercise 5.11.1 (d)). Thus, f ◦ g is bijective.]
(h) This is false.
[Counterexample: For instance, we can set X = {1} and Y = {1, 2} and Z =
{1}, and let f : Y → Z be the map that sends both elements of Y to 1, while
g : X → Y is the map sending 1 to 1. Then, f ◦ g is bijective (in fact, f ◦ g is the
identity map id{1} ), but f is not.]
(i) This is false.
[Counterexample: For instance, we can set X = {1} and Y = {1, 2} and Z =
{1}, and let f : Y → Z be the map that sends both elements of Y to 1, while
g : X → Y is the map sending 1 to 1. Then, f ◦ g is bijective (in fact, f ◦ g is the
identity map id{1} ), but g is not.]
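
The counterexample used in parts (b), (f), (h) and (i) is small enough to be checked by brute force. The following Python sketch is only an illustration: it models a map between finite sets as a dictionary, and the helper names (is_injective, is_surjective, is_bijective, compose) are our own choices, not notation from the text.

```python
def is_injective(h):
    """A finite map (dict) is injective if no two inputs share an output."""
    values = list(h.values())
    return len(values) == len(set(values))

def is_surjective(h, codomain):
    """A finite map (dict) is surjective if every element of the codomain is hit."""
    return set(h.values()) == set(codomain)

def is_bijective(h, codomain):
    return is_injective(h) and is_surjective(h, codomain)

def compose(f, g):
    """Return the composition f ∘ g, i.e. the map x -> f(g(x))."""
    return {x: f[g[x]] for x in g}

X, Y, Z = {1}, {1, 2}, {1}
g = {1: 1}          # g : X -> Y sends 1 to 1
f = {1: 1, 2: 1}    # f : Y -> Z sends both elements of Y to 1
fg = compose(f, g)  # f ∘ g : X -> Z

print(is_bijective(fg, Z))   # is f ∘ g bijective?
print(is_injective(f))       # is f injective?
print(is_surjective(g, Y))   # is g surjective?
```

Here f ∘ g comes out bijective while f fails injectivity and g fails surjectivity, which is exactly the behavior the counterexamples rely on.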

Exercise 5.11.2. Let f : X → Y be a map that has an inverse f^{-1} : Y → X. Let
x ∈ X and y ∈ Y. Prove that we have the logical equivalence

\[ \bigl(f(x) = y\bigr) \iff \bigl(f^{-1}(y) = x\bigr) . \]

Solution. We shall prove the “=⇒” and “⇐=” parts of this equivalence sepa-
rately:
=⇒: If we have f ( x ) = y, then

\[ f^{-1}\bigl(\underbrace{y}_{= f(x)}\bigr) = f^{-1}(f(x)) = x \qquad \text{(by (51))} . \]

Thus, the “=⇒” part of the equivalence holds.

⇐=: If we have f^{-1} (y) = x, then

\[ f\bigl(\underbrace{x}_{= f^{-1}(y)}\bigr) = f\bigl(f^{-1}(y)\bigr) = y \qquad \text{(by (52))} . \]

Thus, the “⇐=” part of the equivalence holds.

Exercise 5.11.3. Let A and B be two sets. As we know from Exercise 4.4.3,
the two sets A × B and B × A are usually not the same. However, I claim that
there is a bijective map from A × B to B × A. Prove this (by finding one such
map, and showing that it is bijective).

Solution. Consider the map

f : A × B → B × A,
( a, b) ↦ (b, a) .
This is the map that sends each pair ( a, b) ∈ A × B to the pair (b, a) ∈ B × A; in
other words, it swaps the two entries of the input pair. Likewise, consider the
map

g : B × A → A × B,
(b, a) ↦ ( a, b)
(which does the same as f , but does it to a pair in B × A instead of a pair in
A × B). Let us show that these two maps f and g are mutually inverse.
Indeed, in order to show this, we must check that f ◦ g = idB× A and g ◦ f =
id A× B .
Let us check that f ◦ g = idB× A . This means checking that ( f ◦ g) (y) =
idB× A (y) for each y ∈ B × A. So let y ∈ B × A be arbitrary. Thus, y is a pair
(b, a) with b ∈ B and a ∈ A. Consider these b and a. Hence, y = (b, a), so that
g (y) = g ((b, a)) = ( a, b) (by the definition of g). By the definition of f ◦ g, we
have

\[ (f \circ g)(y) = f\bigl(\underbrace{g(y)}_{= (a, b)}\bigr) = f((a, b)) = (b, a) \]

(by the definition of f ). Comparing this with idB× A (y) = y = (b, a), we obtain
( f ◦ g) (y) = idB× A (y).
Forget that we fixed y. We thus have shown that ( f ◦ g) (y) = idB× A (y) for
each y ∈ B × A. Thus, we have proved the equality f ◦ g = idB× A . Similarly we
can show the equality g ◦ f = id A× B (since the maps f and g are constructed
in the same way, just with the roles of A and B switched). These two equalities
(together) show that the map g is an inverse of f . Hence, the map f has an
inverse, and thus is bijective (by Theorem 5.10.2). Thus, there exists a bijective
map from A × B to B × A (namely, f ).
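
For concrete finite sets, the argument above can be replayed mechanically. The sketch below is an illustration only: the sets A and B are arbitrary sample choices of ours, and pairs are modeled as Python tuples.

```python
from itertools import product

A = {1, 2, 3}        # sample set, chosen only for illustration
B = {"x", "y"}       # sample set, chosen only for illustration

def f(pair):
    """The swap map A x B -> B x A, (a, b) -> (b, a)."""
    a, b = pair
    return (b, a)

def g(pair):
    """The swap map B x A -> A x B, (b, a) -> (a, b)."""
    b, a = pair
    return (a, b)

AxB = set(product(A, B))
BxA = set(product(B, A))

# g ∘ f = id on A x B and f ∘ g = id on B x A, so f and g are mutually inverse.
g_after_f_is_identity = all(g(f(p)) == p for p in AxB)
f_after_g_is_identity = all(f(g(q)) == q for q in BxA)
# f really lands in B x A and hits all of it.
image_of_f = {f(p) for p in AxB}

print(g_after_f_is_identity, f_after_g_is_identity, image_of_f == BxA)
```

Note that A × B and B × A are different sets here (their elements are different tuples), yet both have 6 elements.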

5.11.2. More exercises


Here are some more exercises.

Exercise 5.11.4. Let A, B, C, D be four sets. Let f : A → C and g : B → D be


two maps. Define a new map f ∗ g : A × B → C × D by setting

( f ∗ g) ( a, b) = ( f ( a) , g (b)) for every pair ( a, b) ∈ A × B.

Prove the following:


(a) If f and g are injective, then f ∗ g is injective.
(b) If f and g are surjective, then f ∗ g is surjective.
(c) If f ∗ g is injective and the sets A and B are nonempty, then f and g are
injective.
(d) If f ∗ g is surjective and the sets C and D are nonempty, then f and g
are surjective.

Exercise 5.11.5. Let X and Y be two sets. Let f : X → Y be a map.


A left inverse of f means a map g : Y → X that satisfies g ◦ f = idX (but
not necessarily f ◦ g = idY ).
A right inverse of f means a map g : Y → X that satisfies f ◦ g = idY (but
not necessarily g ◦ f = idX ).
(a) Prove that f has a right inverse if and only if f is surjective.
(b) Assume that X ̸= ∅. Prove that f has a left inverse if and only if f is
injective.
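(c) Find two distinct left inverses of the map

[Figure: arrow diagram of a map from X to Y; labels shown: X, Y, 1, 2, 3.]

(d) Find two distinct right inverses of the map

[Figure: arrow diagram of a map from X to Y; labels shown: X, Y, 1, 2, 3.]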

Exercise 5.11.6. For each of the following functions, determine whether it is


injective, surjective and/or bijective:
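(a) The function

\[ f : \mathbb{Q} \to \mathbb{Q}, \qquad x \mapsto \frac{x}{1 + x^2} . \]

(b) The function

\[ f : \mathbb{Z} \to \mathbb{Q}, \qquad x \mapsto \frac{x}{1 + x^2} . \]

(c) The function

\[ f : \{\text{finite nonempty subsets of } \mathbb{Z}\} \to \mathbb{Z}, \qquad S \mapsto \min S \]

(which sends each set to its smallest element).

(d) The function

\[ f : \mathbb{Z} \times \mathbb{Z} \to \mathbb{Z}, \qquad (x, y) \mapsto 2x + 3y . \]

(e) The function

\[ f : \mathbb{N} \times \mathbb{N} \to \mathbb{N}, \qquad (x, y) \mapsto 2x + 3y . \]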

5.12. Isomorphic sets


As an application of inverses, we can define the concept of isomorphic sets:

Definition 5.12.1. Let X and Y be two sets. We say that these two sets X
and Y are isomorphic as sets (or, for short, isomorphic, or in bijection, or
in one-to-one correspondence, or equinumerous) if there exists a bijective
map from X to Y.

Note that this relation “isomorphic as sets” is symmetric (i.e., if X and Y


are isomorphic, then Y and X are isomorphic). This is because if f : X → Y
is a bijective map, then f has an inverse f −1 (by Theorem 5.10.2), and this
inverse f −1 is again bijective (since Theorem 5.10.6 shows that f −1 again has an
inverse).

Some examples:

• The sets {1, 2} and {1, 2, 3} are not isomorphic. In fact, there is no sur-
jective map f : {1, 2} → {1, 2, 3} (since, informally, a map from {1, 2} to
{1, 2, 3} has only two arrows, but two arrows cannot hit all three elements
of {1, 2, 3}). Thus, there is no bijective map f : {1, 2} → {1, 2, 3} either.

• The sets {1, 2, 3} and {6, 7, 8} are isomorphic. In fact, the map

{1, 2, 3} → {6, 7, 8} ,
k ↦ k + 5

(that is, the “add 5” map) is bijective (and its inverse sends k ↦ k − 5).

• The sets {1, 2, 3} and {3, 8, 19} are isomorphic. In fact, the map f :
{1, 2, 3} → {3, 8, 19} with the table of values

x 1 2 3
f (x) 3 8 19

is bijective.

• The sets {1, 2, 3} and {1, 3, 5} are isomorphic. In fact, the map

{1, 2, 3} → {1, 3, 5} ,
k ↦ 2k − 1

is a bijection.

• The sets N and E := {all even nonnegative integers} are isomorphic,


since the map

N → E,
n ↦ 2n

is a bijection.

• The sets N and O := {all odd nonnegative integers} are isomorphic, since
the map

N → O,
n ↦ 2n + 1

is a bijection.

• The sets N and Z are isomorphic, since there is a bijection from N to Z


that sends
0, 1, 2, 3, 4, 5, 6, 7, 8, . . . to
0, 1, −1, 2, −2, 3, −3, 4, −4, . . . , respectively.
Explicitly, this map f can be defined by the following formula:

\[ f(n) = \begin{cases} -n/2, & \text{if } n \text{ is even}; \\ (n+1)/2, & \text{if } n \text{ is odd} \end{cases} \qquad \text{for each } n \in \mathbb{N} . \]
(This formula ensures that the values f (0) , f (2) , f (4) , f (6) , . . . cover
exactly the integers 0, −1, −2, −3, . . . that are ≤ 0, whereas the values
f (1) , f (3) , f (5) , f (7) , . . . cover exactly the positive integers 1, 2, 3, 4, . . ..)
There are, of course, many other bijections from N to Z.
• The sets N and Q are isomorphic, since there is a bijection from N to Q
that sends
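0, 1, 2, 3, 4, 5, 6, 7, 8, 9, . . . to

\[
\underbrace{\frac{-1}{1}, \frac{0}{1}, \frac{1}{1}}_{\substack{\text{all reduced fractions} \\ \text{whose numerator and} \\ \text{denominator are } \leq 1 \\ \text{in absolute value} \\ \text{(ordered from smallest} \\ \text{to largest)}}} , \
\underbrace{\frac{-2}{1}, \frac{-1}{2}, \frac{1}{2}, \frac{2}{1}}_{\substack{\text{all reduced fractions} \\ \text{whose numerator and} \\ \text{denominator are } \leq 2 \\ \text{but not } \leq 1 \\ \text{in absolute value} \\ \text{(ordered from smallest} \\ \text{to largest)}}} , \
\underbrace{\frac{-3}{1}, \frac{-3}{2}, \frac{-2}{3}, \frac{-1}{3}, \frac{1}{3}, \frac{2}{3}, \frac{3}{2}, \frac{3}{1}}_{\substack{\text{all reduced fractions} \\ \text{whose numerator and} \\ \text{denominator are } \leq 3 \\ \text{but not } \leq 2 \\ \text{in absolute value} \\ \text{(ordered from smallest} \\ \text{to largest)}}} , \ \ldots ,
\]

respectively. (To be precise, we must only allow fully reduced fractions
a/b – i.e., fractions with a ∈ Z and b ∈ {1, 2, 3, . . .} and gcd ( a, b) = 1 – in
order to avoid having the same rational number appear twice.)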
• The sets N and N × N are isomorphic, since there is a bijection f from N
to N × N that sends
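0, 1, 2, 3, 4, 5, 6, 7, 8, 9, . . . to

\[
\underbrace{(0,0)}_{\substack{\text{all pairs} \\ \text{whose entries} \\ \text{sum to } 0}} , \
\underbrace{(0,1), (1,0)}_{\substack{\text{all pairs} \\ \text{whose entries} \\ \text{sum to } 1 \\ \text{(ordered by} \\ \text{increasing} \\ \text{first entry)}}} , \
\underbrace{(0,2), (1,1), (2,0)}_{\substack{\text{all pairs} \\ \text{whose entries} \\ \text{sum to } 2 \\ \text{(ordered by} \\ \text{increasing} \\ \text{first entry)}}} , \
\underbrace{(0,3), (1,2), (2,1), (3,0)}_{\substack{\text{all pairs} \\ \text{whose entries} \\ \text{sum to } 3 \\ \text{(ordered by} \\ \text{increasing} \\ \text{first entry)}}} , \ \ldots ,
\]

respectively. The inverse f^{-1} : N × N → N of this bijection can actually
be described by an explicit formula:

\[ f^{-1}(n, m) = \frac{(n+m)(n+m+1)}{2} + n \]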

(nice and not-too-easy exercise: prove this!). This is the so-called Cantor
pairing function.

• The sets N and R are not isomorphic, i.e., there exists no bijection from
N to R. Informally speaking, this is because there are “a lot more” real
numbers than there are nonnegative integers. This is not a proof at all (af-
ter all, N and Q are isomorphic, despite the rational numbers seemingly
outnumbering the nonnegative integers!); an actual proof can be found
(e.g.) in [Newste23, Theorem 6.2.21] or in [LeLeMe16, Corollary 8.1.17].
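
Two of the explicit formulas in the examples above (the bijection from N to Z, and the inverse of the bijection from N to N × N) are easy to tabulate. The following Python sketch is an illustration only; the function names nat_to_int and pair_to_nat are ours, and a finite table of values is of course not a proof of bijectivity.

```python
def nat_to_int(n):
    """The map f : N -> Z from the text: -n/2 for even n, (n+1)/2 for odd n."""
    return -(n // 2) if n % 2 == 0 else (n + 1) // 2

def pair_to_nat(n, m):
    """The formula for f^{-1}(n, m) from the text: (n+m)(n+m+1)/2 + n."""
    return (n + m) * (n + m + 1) // 2 + n

# Values of the N -> Z map at 0, 1, ..., 8.
first_integers = [nat_to_int(n) for n in range(9)]
print(first_integers)

# Pairs listed as in the text: by sum of entries, then by increasing first entry.
pairs = [(i, s - i) for s in range(4) for i in range(s + 1)]
positions = [pair_to_nat(n, m) for (n, m) in pairs]
print(pairs)
print(positions)
```

The second list of positions should count up through 0, 1, 2, . . . in order, matching the enumeration of pairs displayed above.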

6. Enumeration revisited
6.1. Counting, formally
6.1.1. Definition
As you might have noticed, isomorphic sets (at least when they are finite) have
the same number of elements – i.e., the same size. We shall now use this to
define the size of a set!
First, some notations:

Definition 6.1.1. (a) If n ∈ N, then [n] shall mean the set {1, 2, . . . , n}.
For example, [3] = {1, 2, 3} and [7] = {1, 2, 3, 4, 5, 6, 7} and [0] = ∅ and
[1] = {1}.
(b) If a, b ∈ Z, then [ a, b] shall mean the set

{ a, a + 1, a + 2, . . . , b} = {all integers x satisfying a ≤ x ≤ b}


= { x ∈ Z | a ≤ x ≤ b} .

If a > b, then this is understood to be the empty set.


For example, [2, 6] = {2, 3, 4, 5, 6} and [3, 3] = {3} and [4, 2] = ∅.

Now, let us define the size of a finite set:

Definition 6.1.2. Let n ∈ N. A set S is said to have size n if S is isomorphic


to [n] (that is, if there exists a bijection from S to [n]).

For example:

• The set {“cat”, “dog”, “rat”} has size 3, since the map

{“cat”, “dog”, “rat”} → [3] ,
“cat” ↦ 1,
“dog” ↦ 2,
“rat” ↦ 3

is a bijection.
• The set {4, 5, 6, 7} has size 4, since the map

{4, 5, 6, 7} → [4] ,
k ↦ k − 3

is a bijection.
• The set N is infinite, so there is no bijection from N to [n] for any n ∈ N.
Thus, N does not have size n for any n ∈ N.

Here is another equivalent definition of size:

Definition 6.1.3. We define the notion of a “set of size n” recursively as


follows:
(a) A set S is said to have size 0 if and only if it is empty.
(b) Let n be a positive integer. A set S is said to have size n if and only if
there exists an s ∈ S such that S \ {s} has size n − 1.

In other words, a set has size n (for n > 0) if and only if we can remove
a single element from it and obtain a set of size n − 1. This is a recursive
definition, as it reduces the question “what is a set of size n” to the (simpler)
question “what is a set of size n − 1”.
The following fact is not obvious, but can be proved:

Theorem 6.1.4. (a) The above two definitions of size (Definition 6.1.2 and
Definition 6.1.3) are equivalent.
(b) The size of a finite set is determined uniquely – i.e., a set cannot have
two different sizes at the same time.

Now, we are ready to introduce some notations for sizes of sets:

Definition 6.1.5. (a) An n-element set (for some n ∈ N) means a set of size
n.
(b) A set is said to be finite if it has size n for some n ∈ N.
(c) If S is a finite set, then |S| shall denote the size of S (which is unique
because of Theorem 6.1.4 (b)).
(d) We also refer to |S| as the cardinality of S, or as the number of elements
of S. In particular, the number of some things means the size of the set of
these things.

Thus, our examples above show that

|{“cat”, “dog”, “rat”}| = 3 and |{4, 5, 6, 7}| = 4.

The number of odd integers between 4 and 10 is the size of the set

{odd integers between 4 and 10} = {5, 7, 9} ,

and thus equals 3.


(Don’t forget that sets cannot “contain an element more than once”. Thus,
|{5, 6, 5}| is 2, not 3.)

6.1.2. Rules for sizes of finite sets


We have defined the size |S| of a finite set S in Subsection 6.1.1. Let us now state
some rules for these sizes that make them easier to compute. We will not prove
these rules, as they are all dictated by common sense and their rigorous proofs
would more properly belong in a text on formalized foundations of mathematics.

The most important rule is the following:

Theorem 6.1.6 (Bijection Principle). Let A and B be two finite sets. Then,
| A| = | B| if and only if there exists a bijection from A to B.

(As we recall, a “bijection” means a bijective map. By Theorem 5.10.2, this is


the same as a map that has an inverse.)
The next rule is obvious from one of our definitions of size:

Theorem 6.1.7. For each n ∈ N, we have |[n]| = n.

Here, we recall that [n] means the set {1, 2, . . . , n} consisting of the first n
positive integers.
The next rule classifies sets of small size:

Theorem 6.1.8. Let S be a set. Then:


(a) We have |S| = 0 if and only if S = ∅ (that is, if S is empty).
(b) We have |S| = 1 if and only if S = {s} for a single element s.
(c) We have |S| = 2 if and only if S = {s, t} for two distinct elements s and
t.
The next rule says that inserting a new element into a finite set increases the
size of this set by 1:

Theorem 6.1.9. Let S be a finite set. Let t be an object such that t ∉ S (that is,
t does not belong to S). Then,

|S ∪ {t}| = |S| + 1.

Here are some more substantial facts:

Theorem 6.1.10 (Sum rule for two sets). Let A and B be two disjoint finite
sets. (Recall that “disjoint” means A ∩ B = ∅.) Then, the set A ∪ B is again
finite, and has size
| A ∪ B| = | A| + | B| .

Theorem 6.1.11 (Sum rule for k sets). Let A1 , A2 , . . . , Ak be k disjoint finite


sets. (Recall that “disjoint” for k sets means that any two of them are disjoint,
i.e., that Ai ∩ A j = ∅ for any i < j.) Then, the set A1 ∪ A2 ∪ · · · ∪ Ak is finite,
and has size

| A1 ∪ A2 ∪ · · · ∪ A k | = | A1 | + | A2 | + · · · + | A k | .

Theorem 6.1.12 (Difference rule). Let T be a subset of a finite set S. Then:


(a) The set T is finite, and its size | T | satisfies | T | ≤ |S|.
(b) We have |S \ T | = |S| − | T |.
(c) If | T | = |S|, then T = S.

The following theorem has been previously stated (without using the “size”
terminology) as Theorem 4.4.8:

Theorem 6.1.13 (Product rule for two sets). Let A and B be any finite sets.
Then, the set

A × B = {all pairs ( a, b) with a ∈ A and b ∈ B}

is again finite and has size

| A × B| = | A| · | B| .

Likewise, the following theorem was Theorem 4.4.10:

Theorem 6.1.14 (Product rule for k sets). Let A1 , A2 , . . . , Ak be any k finite


sets. Then, the set

A1 × A2 × · · · × Ak = {all k-tuples ( a1 , a2 , . . . , ak ) with ai ∈ Ai for each i ∈ [k]}

is again finite and has size

| A1 × A2 × · · · × A k | = | A1 | · | A2 | · · · · · | A k | .

All the above theorems are foundational, and are perhaps the reason why the
arithmetic operations +, − and · on nonnegative integers were introduced
some millennia ago. Nevertheless, they can be rigorously proved, but this is
not something we will do here.62
62 For instance, Theorem 6.1.13 can be proved by induction on | B| using Theorem 6.1.11 and
Theorem 6.1.6, whereas Theorem 6.1.14 can be proved by induction on k using Theorem
6.1.13.

The above theorems are known as “basic counting rules” or “counting prin-
ciples”. There are a few more counting principles, which we might state later
on.
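
Python's built-in sets behave like finite sets in the above sense, with len playing the role of | · |, so the counting rules can be spot-checked on small examples. The sketch below is an illustration only (the sample sets are arbitrary choices of ours), and checking an instance is not a proof of the rule.

```python
from itertools import product

A = {1, 2, 3}
B = {10, 20}              # chosen to be disjoint from A
S = {1, 2, 3, 4, 5}
T = {2, 4}                # chosen to be a subset of S

# Sum rule for two sets (Theorem 6.1.10): A and B must be disjoint.
sum_rule_holds = A.isdisjoint(B) and len(A | B) == len(A) + len(B)

# Difference rule (Theorem 6.1.12 (a) and (b)): T must be a subset of S.
difference_rule_holds = T <= S and len(T) <= len(S) and len(S - T) == len(S) - len(T)

# Product rule for two sets (Theorem 6.1.13): pairs are modeled as tuples.
product_rule_holds = len(set(product(A, B))) == len(A) * len(B)

# Inserting a new element (Theorem 6.1.9): 99 does not belong to A.
insertion_rule_holds = 99 not in A and len(A | {99}) == len(A) + 1

print(sum_rule_holds, difference_rule_holds, product_rule_holds, insertion_rule_holds)
```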

6.1.3. A ∪ B and A ∩ B revisited


As a first application of these rules, let us derive the following “generalized
sum rule for two sets”:

Theorem 6.1.15. Let A and B be two finite sets (not necessarily disjoint).
Then, the set A ∪ B is finite and has size

| A ∪ B| = | A| + | B| − | A ∩ B| .

Partial proof. We shall take for granted that A ∪ B is finite, and only prove the
equality | A ∪ B| = | A| + | B| − | A ∩ B| here.
We first claim that
( A ∪ B) \ A = B \ ( A ∩ B) . (53)
This equality is obvious using Venn diagrams, but let us prove it rigorously
using “element chasing”:63
[Proof of (53): Let us first prove ( A ∪ B) \ A ⊆ B \ ( A ∩ B). In order to do so, we
must show that each x ∈ ( A ∪ B) \ A belongs to B \ ( A ∩ B). Let us do this:
Let x ∈ ( A ∪ B) \ A. Thus, x ∈ A ∪ B but x ∉ A. From x ∈ A ∪ B, we see
that x ∈ A or x ∈ B. But the first of these two possibilities is impossible (since
x ∉ A). Thus, the second possibility must hold. In other words, we have x ∈ B.
Furthermore, we have x ∉ A ∩ B (since x ∈ A ∩ B would entail x ∈ A ∩ B ⊆ A,
which would contradict x ∉ A). Combining x ∈ B with x ∉ A ∩ B, we obtain
x ∈ B \ ( A ∩ B ).
Forget that we fixed x. We thus have shown that each x ∈ ( A ∪ B) \ A belongs
to B \ ( A ∩ B). In other words, ( A ∪ B) \ A ⊆ B \ ( A ∩ B).
Next, let us prove that B \ ( A ∩ B) ⊆ ( A ∪ B) \ A. To do so, we must show
that each x ∈ B \ ( A ∩ B) belongs to ( A ∪ B) \ A. We do this as follows: Let
x ∈ B \ ( A ∩ B). Thus, x ∈ B but x ∉ A ∩ B. If we had x ∈ A, then we would
have x ∈ A ∩ B (since x ∈ A and x ∈ B), which would contradict x ∉ A ∩ B.
Hence, we cannot have x ∈ A. Thus, x ∉ A. Also, x ∈ B ⊆ A ∪ B. Combining
this with x ∉ A, we find x ∈ ( A ∪ B) \ A.
Forget that we fixed x. We thus have shown that each x ∈ B \ ( A ∩ B) belongs
to ( A ∪ B) \ A. In other words, B \ ( A ∩ B) ⊆ ( A ∪ B) \ A.
Now, combining the two inclusions

( A ∪ B) \ A ⊆ B \ ( A ∩ B) and B \ ( A ∩ B) ⊆ ( A ∪ B) \ A,
63 It is worth noting that both sides of (53) are equal to B \ A. However, we will not need this
fact.

we obtain ( A ∪ B) \ A = B \ ( A ∩ B). Thus, (53) is proved.]


We now turn to the counting. Taking sizes on both sides of (53), we obtain

|( A ∪ B) \ A| = | B \ ( A ∩ B)| . (54)

But A is a subset of A ∪ B. Thus, the difference rule (Theorem 6.1.12 (b),


applied to S = A ∪ B and T = A) yields

|( A ∪ B) \ A| = | A ∪ B| − | A| . (55)

Also, A ∩ B is a subset of B. Thus, the difference rule (Theorem 6.1.12 (b),


applied to S = B and T = A ∩ B) yields

| B \ ( A ∩ B)| = | B| − | A ∩ B| . (56)

But we know from (54) that the left hand sides of the two equalities (55) and
(56) are equal. Thus, their right hand sides are also equal. In other words,

| A ∪ B| − | A| = | B| − | A ∩ B| .

Solving this for | A ∪ B|, we find

| A ∪ B| = | A| + | B| − | A ∩ B| .

This proves Theorem 6.1.15.


(A nicer proof can be given using finite sums; this is done, e.g., in [Grinbe22,
Lecture 19, §2.7].)
Note that Theorem 6.1.15 has an analogue for three sets: If A, B, C are three
finite sets, then

| A ∪ B ∪ C | = | A| + | B| + |C | − | A ∩ B| − | A ∩ C | − | B ∩ C | + | A ∩ B ∩ C | .

More generally, such a formula can be stated for any k finite sets, and is known
as the “principle of inclusion and exclusion” or “Sylvester’s sieve formula”. See
[Grinbe22, Lecture 19, §2.7] or any textbook on combinatorics.
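
The formulas for two and three sets, as well as the equality (53), can be spot-checked on small examples. The following Python sketch is an illustration only; the sets A, B, C are arbitrary overlapping sample sets of ours.

```python
A = {1, 2, 3, 4}
B = {3, 4, 5}
C = {4, 5, 6, 7}

# Equality (53): (A ∪ B) \ A = B \ (A ∩ B); both sides also equal B \ A.
lhs_53 = (A | B) - A
rhs_53 = B - (A & B)

# Theorem 6.1.15: |A ∪ B| = |A| + |B| - |A ∩ B|.
two_set_formula = len(A) + len(B) - len(A & B)

# The analogue for three sets.
three_set_formula = (len(A) + len(B) + len(C)
                     - len(A & B) - len(A & C) - len(B & C)
                     + len(A & B & C))

print(lhs_53 == rhs_53 == B - A)
print(len(A | B), two_set_formula)
print(len(A | B | C), three_set_formula)
```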

Exercise 6.1.1. Let A and B be two finite sets. Prove that

| A ∪ B| = | A \ B| + | B \ A| + | A ∩ B| .

6.2. Redoing some proofs rigorously


Previously (in Section 4.2), we have proved some results using informal count-
ing arguments. Let us now revisit these results and make these arguments
rigorous.

6.2.1. Integers in an interval


We recall that the notation [ a, b] means the set

{ a, a + 1, a + 2, . . . , b} = { x ∈ Z | a ≤ x ≤ b}
whenever a and b are two integers. In particular, [n] = [1, n] for every n ∈ N.
We begin with Proposition 4.2.2 (rewritten using the notation [ a, b]):

Proposition 6.2.1. Let a, b ∈ Z be such that a ≤ b + 1.


Then, there are exactly b − a + 1 numbers in the set [ a, b]. In other words,
there are exactly b − a + 1 integers between a and b (inclusive).

Back in Section 4.2, we proved this informally by inducting on b. This proof


can be trivially made rigorous; the induction step relies on Theorem 6.1.9
(because [ a, b + 1] = [ a, b] ∪ {b + 1} and b + 1 ∉ [ a, b]).
But there is also a more direct proof:
Second proof of Proposition 6.2.1. Consider the map

\[ f : \underbrace{[b-a+1]}_{= \{1, 2, \ldots, b-a+1\}} \to \underbrace{[a, b]}_{= \{a, a+1, \ldots, b\}} , \qquad i \mapsto i + (a-1) . \]

This map f just adds a − 1 to its input. (Informally, we can view it as moving
numbers to the right by a − 1 units on the number line.)
It is easy to see that this map f has an inverse: Namely, the map

[ a, b] → [b − a + 1] ,
j ↦ j − ( a − 1)

is an inverse of f (since subtraction undoes addition). Thus, the map f is


bijective (by Theorem 5.10.2), i.e., is a bijection. Hence, there is a bijection from
[b − a + 1] to [ a, b] (namely, f ). The bijection principle (Theorem 6.1.6, applied
to A = [b − a + 1] and B = [ a, b]) thus yields

|[b − a + 1]| = |[ a, b]| .


Hence,
|[ a, b]| = |[b − a + 1]| = b − a + 1
(by Theorem 6.1.7, since a ≤ b + 1 yields b − a + 1 ∈ N). In other words, there
are exactly b − a + 1 numbers in the set [ a, b]. This proves Proposition 6.2.1
again.
We could also reprove Proposition 4.2.1 rigorously, but (again) the proof we
gave was already rigorous enough; we just need to rewrite it using Theorem
6.1.9.
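
The bijection from the second proof of Proposition 6.2.1 can be tested directly for specific a and b. The sketch below is an illustration only; the helper names bracket, interval and shift_is_bijection are ours, and the sample values of a and b (all satisfying a ≤ b + 1) are arbitrary.

```python
def bracket(n):
    """The set [n] = {1, 2, ..., n} (empty when n = 0)."""
    return set(range(1, n + 1))

def interval(a, b):
    """The set [a, b] = {a, a+1, ..., b} (empty when a > b)."""
    return set(range(a, b + 1))

def shift_is_bijection(a, b):
    """Check that i -> i + (a-1) and j -> j - (a-1) map [b-a+1] and [a, b] onto each other."""
    source, target = bracket(b - a + 1), interval(a, b)
    image = {i + (a - 1) for i in source}       # image of f
    preimage = {j - (a - 1) for j in target}    # image of the claimed inverse
    round_trip = all((i + (a - 1)) - (a - 1) == i for i in source)
    return image == target and preimage == source and round_trip

for a, b in [(3, 7), (-2, 5), (4, 4), (4, 3)]:
    print(a, b, shift_is_bijection(a, b), len(interval(a, b)) == b - a + 1)
```

The last sample pair, a = 4 and b = 3, is the boundary case a = b + 1, in which both sets are empty.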

6.2.2. Counting all subsets


Now, we recall Theorem 4.3.1 (but shorten it using the notation [n] for {1, 2, . . . , n}):

Theorem 6.2.2. Let n ∈ N. Then,

(# of subsets of [n]) = 2^n .

The proof we gave in Section 4.3 had some informal steps; let us now make
it rigorous:64
Rigorous proof of Theorem 6.2.2. We induct on n.
The base case (n = 0) is easy: The set [0] is empty, and thus its only subset is
{} itself; hence, the # of subsets of [0] is 1 = 2^0 . In other words, Theorem 6.2.2
holds for n = 0.
Induction step: We proceed from n − 1 to n. Thus, let n be a positive integer.
We assume (as the induction hypothesis) that Theorem 6.2.2 holds for n − 1
instead of n, and we set out to prove that it holds for n.
So our induction hypothesis says that

(# of subsets of [n − 1]) = 2^{n−1} .

Our goal is to prove that

(# of subsets of [n]) = 2^n .

We define

• a red set to be a subset of [n] that contains n;

• a green set to be a subset of [n] that does not contain n.

For example, if n = 3, then the red sets are

{3} , {1, 3} , {2, 3} , {1, 2, 3} ,

whereas the green sets are

{} , {1} , {2} , {1, 2} .

A set cannot be red and green at the same time. In other words, the sets

{red sets} and {green sets}

64 Most of the proof below is copied verbatim from Section 4.3.



are disjoint65 . Hence, the sum rule for two sets (Theorem 6.1.10, applied to
A = {red sets} and B = {green sets}) yields

|{red sets} ∪ {green sets}| = |{red sets}| + |{green sets}| .

(This is just a formal way to say “the # of all sets that are red or green equals
the # of red sets plus the # of green sets”. Indeed, the notation {red sets} means
the set of all red sets, and thus the expression |{red sets}| means the size of the
set of all red sets, i.e., the # of all red sets.)
Furthermore, each subset of [n] is either red or green (and conversely, each
red or green set is a subset of [n]). Hence,

{subsets of [n]} = {red sets} ∪ {green sets} .

Therefore,

|{subsets of [n]}| = |{red sets} ∪ {green sets}|


= |{red sets}| + |{green sets}|

(as we have proved above). In other words,

(# of subsets of [n]) = (# of red sets) + (# of green sets) . (57)

Thus it remains to count the red sets and the green sets separately.
The green sets are easy: They are just the subsets of [n − 1]. Hence,

(# of green sets) = (# of subsets of [n − 1]) = 2^{n−1}

(by the induction hypothesis).


Counting the red sets is trickier. In Section 4.3, we did this by setting up a
one-to-one correspondence between the red sets and the green sets. Formally, a
one-to-one correspondence is just a bijection. Thus, our one-to-one correspon-
dence should become a bijection from {green sets} to {red sets} (i.e., from the
set of all green sets to the set of all red sets).
As we recall, we obtained this correspondence as follows: To turn a green set
red, we insert n into it; conversely, to turn a red set green, we remove n from it.
Rigorously, this means that we define two maps

ins_n : {green sets} → {red sets} ,
G ↦ G ∪ {n}

65 Keep in mind: The notation “{red sets}” stands for the set of all red sets. For example, if
n = 3, then
{red sets} = {{3} , {1, 3} , {2, 3} , {1, 2, 3}} .

and

rem_n : {red sets} → {green sets} ,
R ↦ R \ {n} .

It is easy to see that both of these maps ins_n and rem_n are well-defined66 . A
little bit of set-theoretic computation shows that

ins_n (rem_n ( R)) = R for every red set R

(because if R is a red set, then

\begin{align*}
\mathrm{ins}_n(\mathrm{rem}_n(R)) &= \underbrace{\mathrm{rem}_n(R)}_{\substack{= R \setminus \{n\} \\ \text{(by the definition of } \mathrm{rem}_n\text{)}}} \cup \{n\} && \text{(by the definition of } \mathrm{ins}_n\text{)} \\
&= (R \setminus \{n\}) \cup \{n\} = R && \text{(since } n \in R \text{ (because } R \text{ is red))}
\end{align*}

). Similarly,

rem_n (ins_n ( G )) = G for every green set G.

These two equalities show that the map rem_n is an inverse of ins_n . Hence, the
map ins_n has an inverse, i.e., is bijective (by Theorem 5.10.2). In other words,
ins_n is a bijection. Hence, there exists a bijection from {green sets} to {red sets}
(namely, ins_n ). Thus, the bijection principle yields

|{green sets}| = |{red sets}| .

In other words,

(# of green sets) = (# of red sets) ,

and thus

(# of red sets) = (# of green sets) = 2^{n−1} .

Combining what we have shown, we now obtain

\[ (\text{\# of subsets of } [n]) = \underbrace{(\text{\# of red sets})}_{= 2^{n-1}} + \underbrace{(\text{\# of green sets})}_{= 2^{n-1}} = 2^{n-1} + 2^{n-1} = 2 \cdot 2^{n-1} = 2^n . \]
This is precisely what we needed to prove. This completes the induction step,
and thus Theorem 6.2.2 is proved.
66 Indeed, we need to show that
• if G is a green set, then G ∪ {n} is a red set;
• if R is a red set, then R \ {n} is a green set.
Both of these claims are very easy. For instance, if G is a green set, then G is a subset of
[n], and thus G ∪ {n} is a subset of [n] as well (since n ∈ [n]), and furthermore is red (since
n ∈ {n} ⊆ G ∪ {n}).
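
The red/green argument in the induction step can be replayed on a computer for a fixed small n. The following Python sketch is an illustration only: subsets are modeled as frozensets (so that they can themselves be elements of a set), the two maps are stored as dictionaries named ins and rem, and n = 4 is an arbitrary sample value.

```python
from itertools import combinations

def subsets(n):
    """The set of all subsets of [n] = {1, ..., n}, each modeled as a frozenset."""
    elements = range(1, n + 1)
    return {frozenset(c) for k in range(n + 1) for c in combinations(elements, k)}

n = 4
all_sets = subsets(n)
red = {s for s in all_sets if n in s}        # subsets of [n] that contain n
green = {s for s in all_sets if n not in s}  # subsets of [n] that do not contain n

# The maps "insert n" and "remove n", tabulated as dictionaries.
ins = {G: G | {n} for G in green}
rem = {R: R - {n} for R in red}

green_are_subsets_of_smaller = (green == subsets(n - 1))
ins_lands_in_red = set(ins.values()) <= red
rem_lands_in_green = set(rem.values()) <= green
mutually_inverse = (all(rem[ins[G]] == G for G in green)
                    and all(ins[rem[R]] == R for R in red))

print(len(red), len(green), len(all_sets))
print(green_are_subsets_of_smaller, ins_lands_in_red, rem_lands_in_green, mutually_inverse)
```

The first two checks correspond to the well-definedness claims in the footnote, and the last one to the two equalities that make rem_n an inverse of ins_n.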

Theorem 4.3.2 says the following:

Theorem 6.2.3. Let n ∈ N. Let S be an n-element set. Then,

(# of subsets of S) = 2^n .

Rigorous proof. Informally, we derived this from Theorem 6.2.2 by renaming the
elements of S as 1, 2, . . . , n (so that S became the set [n]).
Rigorously, this means setting up a one-to-one correspondence between the
subsets of S and the subsets of [n], and then using the bijection principle to
argue that the # of the former equals the # of the latter.
How do we get this correspondence? First, we set up a one-to-one corre-
spondence between the elements of S and the elements of [n]. (This is what
the “renaming” in our informal proof was secretly doing.) Formally, this can
be done as follows:
The set S is an n-element set, i.e., has size n. Hence, by the definition of size,
the set S is isomorphic to [n]. In other words, there is a bijection α : S → [n].
Consider this α. Being a bijection, the map α has an inverse α^{-1} (by Theorem
5.10.2).
Now, define a map

α_∗ : {subsets of S} → {subsets of [n]} ,
T ↦ {α (t) | t ∈ T } .

Explicitly, this map α_∗ sends every subset {s1 , s2 , . . . , sk } of S to the subset
{α (s1 ) , α (s2 ) , . . . , α (sk )} of [n]; that is, it applies α to every element of the
input subset. (For example, if n = 3 and S = {“cat”, “dog”, “rat”} and if
α (“cat”) = 1 and α (“dog”) = 2 and α (“rat”) = 3, then α_∗ ({“cat”, “rat”}) =
{1, 3}.)
Conversely, we can define a map

(α^{-1})_∗ : {subsets of [n]} → {subsets of S} ,
T ↦ {α^{-1} (t) | t ∈ T } .

(This map (α^{-1})_∗ is defined in the same way as α_∗ , but using the map α^{-1}
instead of α. For example, if n = 3 and S = {“cat”, “dog”, “rat”} and if
α (“cat”) = 1 and α (“dog”) = 2 and α (“rat”) = 3, then (α^{-1})_∗ ({2, 3}) =
{“dog”, “rat”}.)
It is easy to see that the map (α^{-1})_∗ is an inverse of α_∗ (because applying α
to each element of a given set and then applying α^{-1} to the results will recover
the original set, and likewise if you apply α^{-1} first and then α). Thus, the
map α_∗ has an inverse, i.e., is a bijection (by Theorem 5.10.2). Thus, we have

found a bijection from {subsets of S} to {subsets of [n]} (namely, α_∗ ). Hence,
the bijection principle (Theorem 6.1.6) yields

|{subsets of S}| = |{subsets of [n]}| .

In other words,

(# of subsets of S) = (# of subsets of [n]) = 2^n

(by Theorem 6.2.2).
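
The "apply α to every element" construction can be tried out on the running example S = {“cat”, “dog”, “rat”}. The Python sketch below is an illustration only: α is stored as a dictionary, subsets are modeled as frozensets, and the helper names all_subsets and push are ours.

```python
from itertools import combinations

def all_subsets(elements):
    """The set of all subsets of a finite collection, each modeled as a frozenset."""
    elements = list(elements)
    return {frozenset(c)
            for k in range(len(elements) + 1)
            for c in combinations(elements, k)}

def push(h, T):
    """Apply the map h (a dict) to every element of the set T."""
    return frozenset(h[t] for t in T)

S = {"cat", "dog", "rat"}
alpha = {"cat": 1, "dog": 2, "rat": 3}             # a bijection S -> [3]
alpha_inv = {v: k for k, v in alpha.items()}       # its inverse [3] -> S

subsets_of_S = all_subsets(S)
subsets_of_3 = all_subsets({1, 2, 3})

# The induced map on subsets hits every subset of [3] ...
image = {push(alpha, T) for T in subsets_of_S}
# ... and the map induced by the inverse of alpha undoes it, in both orders.
round_trips = (all(push(alpha_inv, push(alpha, T)) == T for T in subsets_of_S)
               and all(push(alpha, push(alpha_inv, U)) == U for U in subsets_of_3))

print(sorted(push(alpha, {"cat", "rat"})))
print(sorted(push(alpha_inv, {2, 3})))
print(image == subsets_of_3, round_trips, len(subsets_of_S))
```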

6.2.3. Counting all k-element subsets


We move on to counting subsets of a given size.
Theorem 4.3.3 says:

Theorem 6.2.4. Let n ∈ N, and let k be any number (not necessarily an
integer). Let S be an n-element set. Then,

\[ (\text{\# of } k\text{-element subsets of } S) = \binom{n}{k} . \]

Rigorous proof. We induct on n (without fixing k). That is, we use induction on
n to prove the statement

\[ P(n) := \Bigl( \text{“for any number } k \text{ and any } n\text{-element set } S \text{, we have } (\text{\# of } k\text{-element subsets of } S) = \binom{n}{k} \text{”} \Bigr) \]

for each n ∈ N.
Base case: Let k be any number. The only 0-element set is ∅, and its only
subset is ∅. Thus, a 0-element set S necessarily has one 0-element subset (∅)
and no other subsets. Hence, it satisfies
\[
(\text{\# of } k\text{-element subsets of } S) = \begin{cases} 1, & \text{if } k = 0; \\ 0, & \text{else.} \end{cases}
\]
However, we also have
\[
\binom{0}{k} = \begin{cases} 1, & \text{if } k = 0; \\ 0, & \text{else} \end{cases}
\]
(this follows easily from the definition of binomial coefficients). By comparing
these two equalities, we see that any 0-element set S satisfies
\[
(\text{\# of } k\text{-element subsets of } S) = \binom{0}{k}.
\]
In other words, P(0) holds.


Induction step: Let n be a positive integer. Assume (as the induction
hypothesis) that P(n − 1) holds. We must prove that P(n) holds.
So we consider any number k and any n-element set S. We must prove that
\[
(\text{\# of } k\text{-element subsets of } S) = \binom{n}{k}.
\]
We rename the n elements of S as 1, 2, . . . , n (this corresponds formally to
constructing a bijection α : S → [n] and applying it elementwise to subsets of
S, as we did in the proof of Theorem 6.2.3), so we must prove that
\[
(\text{\# of } k\text{-element subsets of } [n]) = \binom{n}{k}.
\]

To prove this, we define

• a red set to be a k-element subset of [n] that contains n;

• a green set to be a k-element subset of [n] that does not contain n.

For instance, for n = 4 and k = 2, the red sets are

{1, 4} , {2, 4} , {3, 4} ,

while the green sets are

{1, 2} , {1, 3} , {2, 3} .

Each k-element subset of [n] is either red or green (but not both). Hence,
using the sum rule for two sets, we find

\[
(\text{\# of } k\text{-element subsets of } [n]) = (\text{\# of red sets}) + (\text{\# of green sets}). \tag{58}
\]
(This is proved just as we proved (57) in the rigorous proof of Theorem 6.2.2.)
The green sets are just the k-element subsets of [n − 1]. Thus,
\[
(\text{\# of green sets}) = (\text{\# of } k\text{-element subsets of } [n-1]) = \binom{n-1}{k}
\]
(by the statement P(n − 1), which we have assumed to hold).
Now, let’s try to count the red sets.

Let us refer to the (k − 1)-element subsets of [n − 1] as blue sets. If R is a red
set, then R \ {n} is a blue set⁶⁷. Thus, we obtain a map
\[
\operatorname{rem}_n : \{\text{red sets}\} \to \{\text{blue sets}\}, \qquad R \mapsto R \setminus \{n\}.
\]
Conversely, if B is a blue set, then B ∪ {n} is a red set⁶⁸. Thus, we obtain a map
\[
\operatorname{ins}_n : \{\text{blue sets}\} \to \{\text{red sets}\}, \qquad B \mapsto B \cup \{n\}.
\]
These two maps remₙ and insₙ are mutually inverse⁶⁹. Thus, the map remₙ
has an inverse, i.e., is bijective (by Theorem 5.10.2). Hence, we have found a
bijection from {red sets} to {blue sets} (namely, remₙ). The bijection principle
therefore yields
|{red sets}| = |{blue sets}| .
In other words,
\begin{align*}
(\text{\# of red sets}) &= (\text{\# of blue sets}) \\
&= (\text{\# of } (k-1)\text{-element subsets of } [n-1]) \qquad \text{(since this is how the blue sets were defined)} \\
&= \binom{n-1}{k-1}
\end{align*}
(again by the statement P(n − 1), but now applied to k − 1 instead of k). Note
that we deliberately formulated P(n) as a “for any k” statement (rather than
fixing k at the outset of our proof), so that we were now able to apply P(n − 1)
to k − 1 instead of k.

⁶⁷ Proof. Let R be a red set. Then, R is a k-element set (by the definition of a red set), so that
|R| = k. Moreover, n ∈ R (by the definition of a red set), so that {n} ⊆ R. Hence, the
difference rule (Theorem 6.1.12 (b), applied to S = R and T = {n}) yields |R \ {n}| =
|R| − |{n}| = k − 1 (since |R| = k and |{n}| = 1). Hence, R \ {n} is a (k − 1)-element set.
Since R \ {n} is furthermore a subset of [n − 1] (because R is a subset of [n], and we are
removing n from it), we thus conclude that R \ {n} is a (k − 1)-element subset of [n − 1],
that is, a blue set.
⁶⁸ Proof. Let B be a blue set. Then, B is a (k − 1)-element subset of [n − 1] (by the definition
of “blue set”). In other words, |B| = k − 1 and B ⊆ [n − 1]. From B ⊆ [n − 1], we obtain
n ∉ B (since n ∉ [n − 1]). Hence, Theorem 6.1.9 (applied to S = B and t = n) yields
|B ∪ {n}| = |B| + 1 = k (since |B| = k − 1). Hence, B ∪ {n} is a k-element set. Furthermore,
B ∪ {n} is a subset of [n] (since B ⊆ [n − 1] ⊆ [n] and {n} ⊆ [n]) that contains n (since
n ∈ {n} ⊆ B ∪ {n}). Thus, B ∪ {n} is a k-element subset of [n] that contains n. In other
words, B ∪ {n} is a red set (by the definition of “red set”).
⁶⁹ This can be proved just as in our above proof of Theorem 6.2.2.

Now, (58) becomes
\[
(\text{\# of } k\text{-element subsets of } [n])
= \underbrace{(\text{\# of red sets})}_{= \binom{n-1}{k-1}} + \underbrace{(\text{\# of green sets})}_{= \binom{n-1}{k}}
= \binom{n-1}{k-1} + \binom{n-1}{k} = \binom{n}{k}
\]
by Pascal’s recurrence (Theorem 2.5.1). But this is precisely the equality that
we have to prove. This completes the induction step, and thus Theorem 6.2.4 is
proved.
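
The red/green split in the induction step can be tested by brute force for small values. The sketch below is not part of the proof; it only handles integer values of k (the theorem allows any number k), and the helper names `binom` and `red_green_counts` are chosen here for illustration.

```python
from itertools import combinations
from math import comb

def binom(n, k):
    """Binomial coefficient for integers n >= 0 and any integer k (0 if k < 0)."""
    return comb(n, k) if k >= 0 else 0

def red_green_counts(n, k):
    """Count the k-element subsets of [n] that contain n (red) or do not (green)."""
    red = green = 0
    for subset in combinations(range(1, n + 1), k):
        if n in subset:
            red += 1
        else:
            green += 1
    return red, green

for n in range(1, 8):
    for k in range(0, n + 2):
        red, green = red_green_counts(n, k)
        assert red == binom(n - 1, k - 1)     # red sets <-> blue sets
        assert green == binom(n - 1, k)       # green sets are subsets of [n - 1]
        assert red + green == binom(n, k)     # Pascal's recurrence

print(red_green_counts(4, 2))   # (3, 3), matching the example with n = 4, k = 2
```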
Remark 6.2.5. Our above proof of Theorem 6.2.4 can be simplified: There is
no need to “rename” the elements of S as 1, 2, . . . , n in the induction step.
Instead, we could have just as well picked an arbitrary element t of S (such
an element exists, since |S| = n > 0 entails that S is nonempty) and defined

• a red set to be a k-element subset of S that contains t;

• a green set to be a k-element subset of S that does not contain t.

Then, a simple application of Theorem 6.1.12 (b) would have shown that
S \ {t} is an (n − 1)-element set, so we could apply our induction hypothesis
P (n − 1) to it. Thus, the above argument could be made using S, t and
S \ {t} instead of [n], n and [n − 1]. In particular, the green sets would be
precisely the k-element subsets of S \ {t}, whereas the red sets would be in
one-to-one correspondence (i.e., bijection) with the (k − 1)-element subsets
of S \ {t} (and the bijection would be given by removing/inserting t). This
argument would be not only shorter but also more conceptual than the one
we gave above.
However, I chose to give the proof I gave because it has the advantage of
familiarity (the set [n] = {1, 2, . . . , n} is easier to visualize than an arbitrary
n-element set), and in order to illustrate how the bijection principle can be
used to rename the elements of a given set in a convenient way.
Likewise, Theorem 6.2.3 could also be proved more directly: Instead of
deducing it from Theorem 6.2.2 via “renaming”, we could have proved it
by induction, again picking an element t of S in the induction step, defin-
ing red and green sets, and counting both kinds of sets using the induction
hypothesis (applied to the (n − 1)-element set S \ {t}).
Let us derive a nice, if simple, corollary from our last few theorems:
Corollary 6.2.6. Let n ∈ N. Then,
\[
\sum_{k=0}^{n} \binom{n}{k} = 2^n.
\]

Proof. Consider the n-element set [n] = {1, 2, . . . , n}. This set has size n, so each
subset of [n] must have size ≤ n (by Theorem 6.1.12 (a)). Hence, each subset of
[n] has size 0 or size 1 or size 2 or · · · or size n. Thus, we can write the set

{subsets of [n]}

as a union

{0-element subsets of [n]}


∪ {1-element subsets of [n]}
∪ {2-element subsets of [n]}
∪···
∪ {n-element subsets of [n]} .

Furthermore, this union is a union of disjoint sets (since a subset of [n] cannot
have several distinct sizes at once). Therefore, the sum rule for k sets (Theorem
6.1.11) yields

|{subsets of [n]}|
= |{0-element subsets of [n]}|
+ |{1-element subsets of [n]}|
+ |{2-element subsets of [n]}|
+···
+ |{n-element subsets of [n]}| .⁷⁰

⁷⁰ In more detail:
The n + 1 sets

{0-element subsets of [n]} ,
{1-element subsets of [n]} ,
{2-element subsets of [n]} ,
...,
{n-element subsets of [n]}

are finite (since [n] has only finitely many subsets) and disjoint (since a subset of [n] cannot
have several distinct sizes at once). Thus, the sum rule for k sets (Theorem 6.1.11) yields

|{0-element subsets of [n]} ∪ {1-element subsets of [n]}
∪ {2-element subsets of [n]} ∪ · · · ∪ {n-element subsets of [n]}|
= |{0-element subsets of [n]}|
+ |{1-element subsets of [n]}|
+ |{2-element subsets of [n]}|
+···
+ |{n-element subsets of [n]}| .

Since
{subsets of [n]}
is the union

{0-element subsets of [n]}
∪ {1-element subsets of [n]}
∪ {2-element subsets of [n]}
∪···
∪ {n-element subsets of [n]} ,

we can rewrite this equality as

|{subsets of [n]}|
= |{0-element subsets of [n]}|
+ |{1-element subsets of [n]}|
+ |{2-element subsets of [n]}|
+···
+ |{n-element subsets of [n]}| .

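In other words,
\begin{align*}
(\text{\# of subsets of } [n])
&= (\text{\# of 0-element subsets of } [n]) + (\text{\# of 1-element subsets of } [n]) \\
&\qquad + (\text{\# of 2-element subsets of } [n]) + \cdots + (\text{\# of } n\text{-element subsets of } [n]) \\
&= \sum_{k=0}^{n} \underbrace{(\text{\# of } k\text{-element subsets of } [n])}_{\substack{= \binom{n}{k} \\ \text{(by Theorem 6.2.4, applied to } S = [n] \text{)}}} \\
&= \sum_{k=0}^{n} \binom{n}{k}.
\end{align*}
Thus,
\[
\sum_{k=0}^{n} \binom{n}{k} = (\text{\# of subsets of } [n]) = 2^n
\]
(by Theorem 6.2.2).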
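
The two ways of counting the subsets of [n] can be compared numerically. The following sketch groups the subsets of [n] by size (the function name `subsets_by_size` is an ad-hoc choice) and checks both counts for small n; it is a sanity check, not a replacement for the proof.

```python
from itertools import combinations
from math import comb

def subsets_by_size(n):
    """Return a list whose k-th entry is the number of k-element subsets of [n]."""
    return [sum(1 for _ in combinations(range(1, n + 1), k))
            for k in range(n + 1)]

for n in range(8):
    counts = subsets_by_size(n)
    assert counts == [comb(n, k) for k in range(n + 1)]   # Theorem 6.2.4
    assert sum(counts) == 2 ** n                            # Theorem 6.2.2

print(subsets_by_size(4))   # [1, 4, 6, 4, 1], which sums to 16
```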
Corollary 6.2.6 can also be easily obtained from the binomial formula (this
was part of Exercise 2.6.1 (a)). Our above proof, however, reveals its combina-
torial meaning: It comes from the comparison of two different ways to count
one and the same thing (viz., the subsets of [n]). This technique of proving
equalities is called double counting, and has multiple other applications (see,
e.g., [Newste23, §8.1]).

6.2.4. Recounting pairs


Proposition 4.4.3 says:

Proposition 6.2.7. Let n ∈ N. Then:

(a) The # of pairs (a, b) with a, b ∈ [n] is n^2.
(b) The # of pairs (a, b) with a, b ∈ [n] and a < b is 1 + 2 + · · · + (n − 1).
(c) The # of pairs (a, b) with a, b ∈ [n] and a = b is n.
(d) The # of pairs (a, b) with a, b ∈ [n] and a > b is 1 + 2 + · · · + (n − 1).

Let us reprove part (b) of this proposition rigorously:


Rigorous proof of Proposition 6.2.7 (b) (sketched). If (a, b) is a pair with a, b ∈ [n]
and a < b, then the first entry of this pair (that is, the number a) must be one
of the numbers 1, 2, . . . , n − 1 (because a < b ≤ n forces a to be ≤ n − 1). Thus,
by the sum rule for k sets (Theorem 6.1.11), we have
\begin{align*}
(\text{\# of pairs } (a, b) \text{ with } a, b \in [n] \text{ and } a < b)
&= \sum_{k=1}^{n-1} \underbrace{(\text{\# of pairs } (a, b) \text{ with } a, b \in [n] \text{ and } a < b \text{ and } a = k)}_{= n - k} \\
&= \sum_{k=1}^{n-1} (n - k) = (n-1) + (n-2) + \cdots + (n - (n-1)) \\
&= (n-1) + (n-2) + \cdots + 1 \\
&= 1 + 2 + \cdots + (n-1)
\end{align*}
(where the equality under the brace holds because these pairs are
(k, k + 1), (k, k + 2), . . . , (k, n); strictly speaking, this argument is an
application of the bijection principle), and thus Proposition 6.2.7 (b) is proven.
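
As a quick sanity check of this computation, one can count the pairs by brute force. This is a minimal sketch with made-up function names; it verifies both the inner count n − k for each fixed first entry k and the final sum.

```python
def count_pairs_with_first_entry(n, k):
    """Count the pairs (a, b) with a, b in [n], a < b and a == k."""
    return sum(1 for b in range(1, n + 1) if k < b)

def count_increasing_pairs(n):
    """Count the pairs (a, b) with a, b in [n] and a < b by brute force."""
    return sum(1 for a in range(1, n + 1) for b in range(1, n + 1) if a < b)

for n in range(0, 10):
    # Each inner count is n - k, as claimed under the brace.
    assert all(count_pairs_with_first_entry(n, k) == n - k for k in range(1, n))
    # Summing over k = 1, ..., n - 1 gives 1 + 2 + ... + (n - 1).
    assert count_increasing_pairs(n) == sum(range(1, n))

print(count_increasing_pairs(5))   # 10 = 1 + 2 + 3 + 4
```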

Exercise 6.2.1. Let n ∈ N. Consider the set [2n] = {1, 2, . . . , 2n}.


A set of integers will be called parity-ambivalent if it contains at least
one even element and at least one odd element. (For instance, {2, 4, 5} is
parity-ambivalent, but {2, 4, 10} is not.)
Compute the # of all parity-ambivalent subsets of [2n].
[Hint: How many subsets of [2n] contain no even element? How many
contain no odd element? How many contain neither?]

Exercise 6.2.2. Let n ∈ N. Compute the # of pairs ( A, B) of subsets of [n]


that satisfy A ∩ B = ∅.
(For example, if n = 2, then this # is 9, since there are 9 such pairs:

(∅, ∅) , (∅, {1}) , (∅, {2}) , (∅, {1, 2}) ,


({1} , ∅) , ({1} , {2}) , ({2} , ∅) , ({2} , {1}) ,
({1, 2} , ∅) .)

Exercise 6.2.3. Let n ∈ N. Compute the # of pairs ( A, B) of subsets of [n]


that satisfy A ⊆ B.
(For example, if n = 2, then this # is 9, since there are 9 such pairs:

(∅, ∅) , (∅, {1}) , (∅, {2}) , (∅, {1, 2}) ,


({1} , {1}) , ({1} , {1, 2}) , ({2} , {2}) , ({2} , {1, 2}) ,
({1, 2} , {1, 2}) .

)

6.3. Where do we stand now?


Recall the introductory counting problems from the start of Chapter 4 (before
Section 4.2). We can now answer some of these:

• How many ways are there to choose 3 odd integers between 0 and 20,
if the order matters (i.e., we count the choice 1, 3, 5 as different from the
choice 3, 1, 5)? (The answer is 1000.)
We can solve this now: To choose 3 odd integers between 0 and 20, if
the order matters, amounts to choosing a 3-tuple ( a, b, c) where a, b, c ∈
{1, 3, 5, . . . , 19}. Since this set {1, 3, 5, . . . , 19} is a 10-element set (because
Proposition 4.2.1 yields that the # of odd integers between 0 and 20 is
(20 + 1) //2 = 10), the # of these 3-tuples is 10 · 10 · 10 = 1000 (by Theo-
rem 4.4.5).

• How many ways are there to choose 3 odd integers between 0 and 20, if
the order does not matter? (The answer is 220.)
We cannot solve this yet, at least not if the values 3 and 20 are generalized
to k and n. This will be done in Theorem 6.6.9.

• How many ways are there to choose 3 distinct odd integers between 0 and
20, if the order matters? (The answer is 720.)
We cannot solve this yet, at least not if the values 3 and 20 are generalized
to k and n. (Theorem 6.2.4 does not help here, since it counts subsets, for
which the order does not matter.) This will be done later.

• How many ways are there to choose 3 distinct odd integers between 0 and
20, if the order does not matter? (The answer is 120.)
We can solve this now: This amounts to counting the 3-element subsets
of {1, 3, 5, . . . , 19}; but Theorem 6.2.4 answers such questions. Since the
set {1, 3, 5, . . . , 19} has size 10, its number of 3-element subsets is
$\binom{10}{3} = \frac{10 \cdot 9 \cdot 8}{3!} = 120$.
• How many prime factorizations does 200 have (where we count different
orderings as distinct)? (The answer is 10. This is a mix between a number
theory problem and a counting problem.)
We can solve this now, at least for 200: We know that 200 = 2 · 2 · 2 · 5 · 5.
Thus, by the fundamental theorem of arithmetic, all prime factorizations
of 200 consist of five factors, three of which are 2’s and two of which
are 5’s. The only freedom is in choosing where to place the three 2’s
among the five positions (of course, the two 5’s will then have to occupy
the remaining positions). There are 5 factors in total, so 5 positions, and
we have to choose 3 of these 5 positions to put our three 2’s in. This
is tantamount to choosing a 3-element subset of [5] (the subset of the
positions in which we put the 2’s), and the # of ways to do this is
$\binom{5}{3} = 10$ (by Theorem 6.2.4). Thus, 200 has 10 prime factorizations
(if we count different orderings as distinct).
However, it is trickier to extend this reasoning to prime factorizations of
150. Indeed, 150 = 2 · 3 · 5 · 5, so a prime factorization of 150 has one 2,
one 3 and two 5’s. How many ways are there to place one 2, one 3 and
two 5’s in altogether four positions? I’ll leave this one to you for now, but
we will come back to this later.

• How many ways are there to tile a 2 × 15-rectangle with dominos (i.e.,
rectangles of size 1 × 2 or 2 × 1) ? (The answer is 987.)
We cannot solve this yet. But we will outline a solution in Subsection
6.4.6.

• How many addends do you get when you expand the product
( a + b) (c + d + e) ( f + g) ? (The answer is 12.)
We can solve this now: Each addend consists of exactly one of a and b,
exactly one of c, d and e, and exactly one of f and g. So the addends are
in one-to-one correspondence with the triples ( x, y, z) where x ∈ { a, b}
and y ∈ {c, d, e} and z ∈ { f , g}. Thus, their # is 2 · 3 · 2 (since { a, b} is a
2-element set, {c, d, e} is a 3-element set, and { f , g} is a 2-element set).
Note that we are using the fact that the addends all end up distinct, so
they don’t cancel or combine.

• How many different monomials do you get when you expand the product
(a − b)(a^2 + ab + b^2)? (This one is more of an algebra problem, but I
wanted to list it because it is connected to counting. The answer is 2,
because (a − b)(a^2 + ab + b^2) = a^3 − b^3.)
This is not a combinatorics problem: The answer is 2, because we have
(a − b)(a^2 + ab + b^2) = a^3 − b^3. The other addends all cancel out, so you
get an answer much less than 6.


In general, problems like this (where you count addends after cancellation
and combination) cannot be solved combinatorially; you have to actually
expand and collect.

• How many positive divisors does 24 have? (We can actually list them:
1, 2, 3, 4, 6, 8, 12, 24. This one is again a mix of a counting problem and
a number theory problem.)
Okay, but let us generalize: How many positive divisors does a given
positive integer n have? We cannot solve this yet. In this course, we will
not get to solve it, but it is not too hard to solve using the methods we
have learned (see [Grinbe19b, §2.18.1] or [Grinbe23b, Lecture 5, Exercise
5.3.3 (a)]).
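
Several of the answers quoted in the list above can be double-checked by brute force, whether or not we can already prove them. The following Python sketch does this using the standard `itertools` module; the variable names are ad-hoc, and the check says nothing about the general formulas.

```python
from itertools import (combinations, combinations_with_replacement,
                       permutations, product)

odds = range(1, 20, 2)   # the odd integers between 0 and 20

ordered_with_repeats = sum(1 for _ in product(odds, repeat=3))
unordered_with_repeats = sum(1 for _ in combinations_with_replacement(odds, 3))
ordered_distinct = sum(1 for _ in permutations(odds, 3))
unordered_distinct = sum(1 for _ in combinations(odds, 3))

# Prime factorizations of 200 = 2 * 2 * 2 * 5 * 5, counting orderings as distinct.
factorizations_of_200 = len(set(permutations([2, 2, 2, 5, 5])))

# Addends in the expansion of (a + b)(c + d + e)(f + g).
addends = sum(1 for _ in product("ab", "cde", "fg"))

# Positive divisors of 24.
divisors_of_24 = [d for d in range(1, 25) if 24 % d == 0]

print(ordered_with_repeats, unordered_with_repeats,
      ordered_distinct, unordered_distinct)   # 1000 220 720 120
print(factorizations_of_200, addends)         # 10 12
print(divisors_of_24)                         # [1, 2, 3, 4, 6, 8, 12, 24]
```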

6.4. Lacunar subsets


6.4.1. Definition
Another type of objects that can be counted are the so-called lacunar subsets
(also known as sparse subsets to some authors). Here is their definition:

Definition 6.4.1. A set S of integers is said to be lacunar if it contains no two


consecutive integers (i.e., if there is no integer i such that both i and i + 1
belong to S).

The word “lacunar” comes from Latin “lacuna” (= “gap”). The idea is that a
lacunar set has a “gap” (or “buffer zone”) between any two distinct elements.
For example, the set {2, 4, 7} is lacunar, but the set {2, 4, 5} is not (since 4 and
5 are consecutive integers). Any 1-element set of integers is lacunar, and so is
the empty set.
Now we can ask ourselves some natural questions: For given n ∈ N,
1. how many lacunar subsets does the set [n] = {1, 2, . . . , n} have?
2. how many k-element lacunar subsets does [n] have for a given k ∈ N?
3. what is the largest size of a lacunar subset of [n] ?
We shall answer all these three questions in this section.

6.4.2. The maximum size of a lacunar subset


We start with the third question, as it is the easiest one to answer. Recall the
floor notation (Definition 3.3.13).

Proposition 6.4.2. Let n ∈ N. Then, the maximum size of a lacunar subset
of [n] is ⌊(n + 1)/2⌋.

Proof. The set


{all odd numbers in [n]} = {all odd integers between 1 and n (inclusive)}
= {all odd integers between 0 and n (inclusive)}
= {1, 3, 5, . . .} ∩ [n]

is a lacunar subset of [n], and has size ⌊(n + 1)/2⌋ (by Proposition 4.2.1). Thus, the
size ⌊(n + 1)/2⌋ is attainable (for a lacunar subset of [n]).
It remains to show that this size is the largest possible, i.e., that if L is a
lacunar subset of [n], then
|L| ≤ ⌊(n + 1)/2⌋ .
So let L be a lacunar subset of [n]. Our goal is to prove that |L| ≤ ⌊(n + 1)/2⌋.
We shall first prove that |L| ≤ (n + 1)/2.
Here are two different ways to prove this (each way illustrates a nice
technique):
First proof of |L| ≤ (n + 1)/2. Let ℓ₁, ℓ₂, . . . , ℓₖ be the elements of L, listed in
increasing order, so that L = {ℓ₁, ℓ₂, . . . , ℓₖ} and ℓ₁ < ℓ₂ < · · · < ℓₖ. Thus, |L| = k.
Now, we assume (for the moment) that k > 0. Thus, k ≥ 1 (since k is
an integer). We have ℓ₁ ∈ L ⊆ [n], so that ℓ₁ ≥ 1. Moreover, the elements
ℓ₁ and ℓ₂ of L satisfy ℓ₁ < ℓ₂ and ℓ₂ ≠ ℓ₁ + 1 (since L is lacunar), so that
ℓ₂ ≥ ℓ₁ + 2 ≥ 1 + 2 = 3 (since ℓ₁ ≥ 1). Furthermore, the elements ℓ₂ and ℓ₃ of L
satisfy ℓ₂ < ℓ₃ and ℓ₃ ≠ ℓ₂ + 1 (since L is lacunar), so that
ℓ₃ ≥ ℓ₂ + 2 ≥ 3 + 2 = 5 (since ℓ₂ ≥ 3).
Proceeding in the same way, we find that

ℓᵢ ≥ 2i − 1 for each i ∈ [k] . (59)

(Strictly speaking, this can be proved by induction on i. The base case follows
from ℓ₁ ≥ 1 = 2 · 1 − 1, whereas the induction step requires deriving ℓᵢ₊₁ ≥
2 (i + 1) − 1 from ℓᵢ ≥ 2i − 1, which can be done by observing that L is lacunar
and therefore ℓᵢ₊₁ ≥ ℓᵢ + 2 ≥ 2i − 1 + 2 = 2 (i + 1) − 1, where the second
inequality uses ℓᵢ ≥ 2i − 1.)
Now, we can apply (59) to i = k, and thus obtain ℓₖ ≥ 2k − 1. However,
ℓₖ ∈ L ⊆ [n], so that ℓₖ ≤ n. Thus, n ≥ ℓₖ ≥ 2k − 1, so that n + 1 ≥ 2k and
thus (n + 1)/2 ≥ k. We have proved this under the assumption that k > 0, but this
also holds in the opposite case (because if k ≤ 0, then (n + 1)/2 ≥ 0 ≥ k). Thus, we
always have (n + 1)/2 ≥ k (independently of any assumptions). In other words,
we have (n + 1)/2 ≥ |L| (since k = |L|). In other words, we have |L| ≤ (n + 1)/2.
Second proof of |L| ≤ (n + 1)/2. Define a new set
L⁺ := {ℓ + 1 | ℓ ∈ L} .
This set L⁺ consists of each element of L, incremented by 1. For example, if
L = {3, 5, 9}, then L⁺ = {4, 6, 10}. Another way to view L⁺ is as follows:

L⁺ = {i ∈ Z | i − 1 ∈ L}

(because an integer i satisfies i − 1 ∈ L if and only if it has the form ℓ + 1 for
some ℓ ∈ L).
The set L⁺ is just L with each element incremented by 1. Thus, |L⁺| = |L|.
Moreover, since L is a subset of [n] = {1, 2, . . . , n}, we conclude that L⁺ is
a subset of {2, 3, . . . , n + 1}. Hence, both sets L and L⁺ are subsets of [n + 1].
Their union L ∪ L⁺ is thus a subset of [n + 1] as well. Therefore (by Theorem
6.1.12 (a), applied to S = [n + 1] and T = L ∪ L⁺), we conclude that

|L ∪ L⁺| ≤ |[n + 1]| = n + 1.

If the sets L and L⁺ had an element j in common, then both j − 1 and j would
belong to L (indeed, j ∈ L⁺ = {i ∈ Z | i − 1 ∈ L} would entail j − 1 ∈ L),
which would contradict the fact that L is lacunar (since j − 1 and j are two
consecutive integers). Thus, the sets L and L⁺ have no element in common. In
other words, they are disjoint. Hence, by the sum rule (Theorem 6.1.10, applied
to A = L and B = L⁺), we have |L ∪ L⁺| = |L| + |L⁺| = |L| + |L| = 2 · |L|
(since |L⁺| = |L|). Hence,
2 · |L| = |L ∪ L⁺| ≤ n + 1.
In other words, |L| ≤ (n + 1)/2.
We have now proved (in two different ways) that |L| ≤ (n + 1)/2. Now, recall
the definition of the floor of a real number: If x is a real number, then ⌊x⌋ is the
largest integer that is ≤ x. Hence, ⌊(n + 1)/2⌋ is the largest integer that is ≤ (n + 1)/2.
Therefore, any integer that is ≤ (n + 1)/2 must also be ≤ ⌊(n + 1)/2⌋. Applying this
to the integer |L|, we conclude that |L| ≤ ⌊(n + 1)/2⌋ (since |L| ≤ (n + 1)/2). As
explained above, this completes the proof of Proposition 6.4.2.
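
For small n, Proposition 6.4.2 can be confirmed by exhaustive search. The sketch below uses pure Python (no SageMath); `max_lacunar_size` is a name chosen here, and the search is exponential in n, so it is only meant for small values.

```python
from itertools import combinations

def is_lacunar(S):
    """Test whether the collection S of integers contains no two consecutive integers."""
    return all(i + 1 not in S for i in S)

def max_lacunar_size(n):
    """Largest size of a lacunar subset of [n], found by exhaustive search."""
    return max(k
               for k in range(n + 1)
               for S in combinations(range(1, n + 1), k)
               if is_lacunar(S))

for n in range(10):
    assert max_lacunar_size(n) == (n + 1) // 2
    # The odd numbers in [n] attain the maximum.
    odd_numbers = set(range(1, n + 1, 2))
    assert is_lacunar(odd_numbers) and len(odd_numbers) == (n + 1) // 2

print([max_lacunar_size(n) for n in range(10)])   # [0, 1, 1, 2, 2, 3, 3, 4, 4, 5]
```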

6.4.3. Counting all lacunar subsets of [n]


Now let us count the lacunar subsets of [n]. We shall first count them all, then
count the ones of a given size k.
First, a few words about how to find answers to counting questions like this.
For any specific value of n, finding the # of lacunar subsets of [n] is a “finite
problem”: You can just count them all. Or, better, you can have your computer
do this. In SageMath (a computer algebra system, one of the best suited to
combinatorial questions), this takes just a few lines:⁷¹

def is_lacunar(S): # test if the set S is lacunar
    return all(i+1 not in S for i in S)

def num_lacs(n): # number of lacunar subsets of [n]
    return sum(1 for S in Subsets(n) if is_lacunar(S))

for n in range(10):
    print("For n = " + str(n) + ", the number is " + str(num_lacs(n)))

The first two lines here speak for themselves (once you know that all is
the universal quantifier). The function Subsets computes the set of all subsets
of a given set, or (if we provide it an integer n as input) all subsets of [n].
The sum(1 for S in SomeSet) construction is just a slick way of counting the
elements of SomeSet, exploiting the fact that a sum of the form 1 + 1 + · · · + 1
equals the number of its addends. The last two lines are prompting SageMath
to compute the # of lacunar subsets of [n] for each n ∈ [0, 9] (note that range(a,
b) means the integer interval [ a, b − 1] in SageMath) and to output these 10
numbers. I refer to [Grinbe19a, §1.4.3] for more hints on the use of SageMath,
and to its documentation for a more systematic introduction. Note that you
can use SageMathCell to easily call SageMath from your browser (although the
computations you call are limited by 30 seconds each, since they happen on the
server).
The answers we get from SageMath are interesting:

n                              0    1    2    3    4    5    6    7    8    9
# of lacunar subsets of [n]    1    2    3    5    8   13   21   34   55   89 .

Haven’t we seen these numbers before?


Yes, we have: In Definition 1.5.1, we defined the Fibonacci sequence. This
is the sequence (f₀, f₁, f₂, . . .) of nonnegative integers defined recursively by
setting

f₀ = 0, f₁ = 1, and
fₙ = fₙ₋₁ + fₙ₋₂ for each n ≥ 2.

⁷¹ SageMath is built on top of the Python programming language, so you will recognize a
lot of Python syntax. Actually, the only piece of non-Python code in the following code
snippet is the Subsets(n) part. If you want to use (pure) Python instead of SageMath,
you can replace sum(1 for S in Subsets(n) if is_lacunar(S)) by sum(1 for i in
range(n+1) for S in combinations(range(1, n+1), i) if is_lacunar(S)), after first
importing the combinations function from the itertools package (using from itertools
import combinations).

Its first few entries are

n      0    1    2    3    4    5    6    7    8    9   10   11   12   13
fₙ     0    1    1    2    3    5    8   13   21   34   55   89  144  233 .

The two above tables have the same entries, if you discount the fact that the
first two Fibonacci numbers f₀ = 0 and f₁ = 1 are missing from the former
table. So we have good reasons to suspect that

(# of lacunar subsets of [n]) = fₙ₊₂

for each n ∈ N. And indeed, this is true:

Theorem 6.4.3. For any integer n ≥ −1, we have

(# of lacunar subsets of [n]) = fₙ₊₂ .

Here, we agree that [−1] := ∅. More generally, we agree that [k] := ∅ for
any k ≤ 0.

Example 6.4.4. The lacunar subsets of [4] are

∅, {1} , {2} , {3} , {4} , {1, 3} , {1, 4} , {2, 4} .

So there are 8 of them, as predicted by Theorem 6.4.3 (since f₄₊₂ = f₆ = 8).

(Are you wondering why we are allowing n to be −1 in Theorem 6.4.3? The
answer is “because we can”, and more precisely “because it will make our
induction easier”. The case n = −1 is not interesting by itself; the claim of
Theorem 6.4.3 in this case is just that the # of lacunar subsets of ∅ is 1.)
Proof of Theorem 6.4.3. For any integer n ≥ −1, let us set

ℓₙ := (# of lacunar subsets of [n]) .

Thus, we must prove that

ℓₙ = fₙ₊₂ for each n ≥ −1. (60)

We have ℓ₋₁ = 1 (since the set [−1] = ∅ has only one lacunar subset, namely
∅ itself) and f₋₁₊₂ = f₁ = 1. Hence, ℓ₋₁ = 1 = f₋₁₊₂. In other words, (60)
holds for n = −1. A similar computation shows that (60) holds for n = 0.
Let us next show the following:

Claim 1: We have ℓₙ = ℓₙ₋₁ + ℓₙ₋₂ for each integer n ≥ 1.

Proof of Claim 1. Let n ≥ 1 be an integer. We shall call a subset of [n]

• red if it contains n, and

• green if it does not contain n.

Then, the definition of ℓₙ shows that

ℓₙ = (# of lacunar subsets of [n])
= (# of red lacunar subsets of [n]) + (# of green lacunar subsets of [n])

(by the sum rule, since each lacunar subset of [n] is either red or green but
cannot be both at the same time⁷²).
The green lacunar subsets of [n] are just the lacunar subsets of [n − 1] (since
“green” means “does not contain n”). Thus,

(# of green lacunar subsets of [n])
= (# of lacunar subsets of [n − 1]) = ℓₙ₋₁

(by the definition of ℓₙ₋₁).
Counting the red lacunar subsets is trickier. We shall show that their # is
ℓₙ₋₂.
If R is a red lacunar subset of [n], then R contains n (by the definition of
“red”), so that R does not contain n − 1 (by lacunarity), and therefore R \ {n}
is a lacunar subset of [n − 2] (since R \ {n} contains neither n nor n − 1). Thus,
we obtain a map
\[
\operatorname{rem}_n : \{\text{red lacunar subsets of } [n]\} \to \{\text{lacunar subsets of } [n-2]\}, \qquad R \mapsto R \setminus \{n\}.
\]
Conversely, if L is a lacunar subset of [n − 2], then L ∪ {n} is a lacunar subset
of [n] (indeed, the integer n − 1 is a “buffer zone” between the elements of L
and the new element n, so that the lacunarity of L is preserved when we insert
n into the set), and is red (since n ∈ {n} ⊆ L ∪ {n}). Thus, we obtain a map
\[
\operatorname{ins}_n : \{\text{lacunar subsets of } [n-2]\} \to \{\text{red lacunar subsets of } [n]\}, \qquad L \mapsto L \cup \{n\}.
\]
It is easy to see (just as in the proof of Theorem 6.2.2) that the map remₙ is an
inverse of insₙ. Thus, the map insₙ has an inverse, i.e., is bijective (by Theorem
5.10.2). Hence, we have found a bijection

from {lacunar subsets of [n − 2]} to {red lacunar subsets of [n]}

(namely, insₙ). Therefore, by the bijection principle, we have

|{lacunar subsets of [n − 2]}| = |{red lacunar subsets of [n]}| .

In other words,
(# of lacunar subsets of [n − 2]) = (# of red lacunar subsets of [n]) .
Thus,
(# of red lacunar subsets of [n]) = (# of lacunar subsets of [n − 2]) = ℓₙ₋₂
(by the definition of ℓₙ₋₂).
Altogether,
ℓₙ = (# of red lacunar subsets of [n]) + (# of green lacunar subsets of [n])
= ℓₙ₋₂ + ℓₙ₋₁ = ℓₙ₋₁ + ℓₙ₋₂
(since the # of red lacunar subsets of [n] is ℓₙ₋₂, and the # of green ones is ℓₙ₋₁).
This proves Claim 1.

⁷² This is the same argument that has been used in the proof of Theorem 6.2.2.
Now we still need to prove (60). In other words, we need to prove that the
two sequences (ℓ₋₁, ℓ₀, ℓ₁, . . .) and (f₁, f₂, f₃, . . .) are identical. But at this point,
this is very easy: These two sequences
• have the same two starting entries ℓ₋₁ = f₁ and ℓ₀ = f₂ (this can be easily
checked directly),
• and satisfy the same recursive equation: namely, each entry of either
sequence is the sum of the preceding two entries (since Claim 1 yields
ℓₙ = ℓₙ₋₁ + ℓₙ₋₂, whereas the definition of the Fibonacci numbers yields
fₙ₊₂ = fₙ₊₁ + fₙ).
Since a recursively defined sequence is uniquely determined by its starting
entries and its recursive equation, we thus conclude that the two sequences
(ℓ₋₁, ℓ₀, ℓ₁, . . .) and (f₁, f₂, f₃, . . .) are identical. Thus, (60) follows. This slightly
informal argument can be formalized as a straightforward strong induction⁷³.

⁷³ Proof. Let us prove (60) by strong induction on n:
Base case: We have already checked that (60) holds for n = −1.
Induction step: Let n ≥ 0 be an integer. Assume (as the induction hypothesis) that the
claim (60) holds for each of −1, 0, 1, . . . , n − 1 instead of n. We must prove that (60) holds
for n as well, i.e., that we have ℓₙ = fₙ₊₂.
If n = 0, then this follows from the fact (observed above) that (60) holds for n = 0. It thus
remains to consider the case when n ≠ 0. So let us assume that n ≠ 0. Since n ≥ 0, we thus
obtain n ≥ 1, so that n − 1 ≥ 0 and n − 2 ≥ −1.
In particular, n − 1 ≥ 0 ≥ −1. Hence, our induction hypothesis yields that the claim (60)
holds for n − 1 instead of n. In other words, we have ℓₙ₋₁ = f₍ₙ₋₁₎₊₂ = fₙ₊₁.
Also, our induction hypothesis yields that the claim (60) holds for n − 2 instead of n (since
n − 2 ≥ −1). In other words, we have ℓₙ₋₂ = f₍ₙ₋₂₎₊₂ = fₙ.
Now, Claim 1 yields ℓₙ = ℓₙ₋₁ + ℓₙ₋₂ = fₙ₊₁ + fₙ (since ℓₙ₋₁ = fₙ₊₁ and ℓₙ₋₂ = fₙ). But
the recursive definition of the Fibonacci sequence also yields fₙ₊₂ = fₙ₊₁ + fₙ. Comparing
these two equalities, we find ℓₙ = fₙ₊₂. In other words, (60) holds for n. This completes
the induction step. Thus, (60) is proved.

Thus we have proved (60). In other words, we have proved Theorem 6.4.3
(because we have ℓₙ = (# of lacunar subsets of [n])).
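
Claim 1 also gives a fast way to compute the numbers ℓₙ without listing any subsets. Here is a pure-Python sketch that compares the recurrence with brute-force counting and with the Fibonacci numbers; the function names are chosen here for illustration, and the brute-force count treats [n] as empty for n ≤ 0, following the convention of Theorem 6.4.3.

```python
from itertools import combinations

def is_lacunar(S):
    """Test whether the collection S of integers contains no two consecutive integers."""
    return all(i + 1 not in S for i in S)

def count_lacunar_brute_force(n):
    """Count the lacunar subsets of [n] by listing all subsets ([n] is empty if n <= 0)."""
    m = max(n, 0)
    return sum(1
               for k in range(m + 1)
               for S in combinations(range(1, m + 1), k)
               if is_lacunar(S))

def count_lacunar_recursive(n):
    """Compute l_n for n >= -1 from l_{-1} = l_0 = 1 and Claim 1."""
    previous, current = 1, 1          # l_{-1} and l_0
    for _ in range(n):                # apply Claim 1 n times (not at all if n <= 0)
        previous, current = current, current + previous
    return current

def fibonacci(m):
    """Return f_m, where f_0 = 0, f_1 = 1 and f_m = f_{m-1} + f_{m-2}."""
    a, b = 0, 1
    for _ in range(m):
        a, b = b, a + b
    return a

for n in range(-1, 12):
    assert count_lacunar_recursive(n) == fibonacci(n + 2)
    assert count_lacunar_brute_force(n) == count_lacunar_recursive(n)

print([count_lacunar_recursive(n) for n in range(-1, 10)])
# [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```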

6.4.4. Counting all k-element lacunar subsets of [n]


Let us now address the remaining question about lacunar subsets: counting
k-element lacunar subsets of [n] for given n and k.
Again, we start by asking SageMath for some data:

def is_lacunar(S): # test if the set S is lacunar
    return all(i+1 not in S for i in S)

def num_lacs(n, k): # number of k-element lacunar subsets of [n]
    return sum(1 for S in Subsets(n, k) if is_lacunar(S))

for n in range(10):
    print("For n = " + str(n) + ", the numbers are " + \
          str([num_lacs(n, k) for k in range(n+1)]))

We obtain the following table:

      k=0  k=1  k=2  k=3  k=4  k=5
n=0     1
n=1     1    1
n=2     1    2
n=3     1    3    1
n=4     1    4    3
n=5     1    5    6    1
n=6     1    6   10    4
n=7     1    7   15   10    1
n=8     1    8   21   20    5
n=9     1    9   28   35   15    1

(where each entry is the # of lacunar k-element subsets of [n] for the correspond-
ing values of n and k, and where an empty box means that the corresponding
# is 0). The many 0’s are unsurprising (they are predicted by Proposition 6.4.2),
and likewise the values for k = 0 and k = 1 are clear (since every subset that
has size ≤ 1 is lacunar). But staring at the table for a bit longer reveals some-
thing subtler: It is a sheared Pascal’s triangle! For example, the n = 7 row
contains the numbers 1, 7, 15, 10, 1, which appear along a diagonal in Pascal’s
triangle. All the entries are binomial coefficients, and a bit of work reveals the
exact formula:

Theorem 6.4.5. Let n ∈ Z and k ∈ N be such that k ≤ n + 1. Then,

$$(\text{\# of } k\text{-element lacunar subsets of } [n]) = \binom{n+1-k}{k}.$$

For instance, for n = 7 and k = 3, this yields

$$(\text{\# of 3-element lacunar subsets of } [7]) = \binom{7+1-3}{3} = \binom{5}{3} = 10,$$

which agrees with our above table.
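As a sanity check on Theorem 6.4.5, the following sketch recomputes the table in plain Python (no SageMath) and compares each entry with the binomial coefficient. The helper names are chosen here for illustration, and only the range k ≤ n + 1 with n ≥ 0 is tested, since `math.comb` does not accept a negative top argument.

```python
from itertools import combinations
from math import comb

def is_lacunar(subset):
    """Return True if the collection contains no two consecutive integers."""
    elements = set(subset)
    return all(i + 1 not in elements for i in elements)

def num_lacs(n, k):
    """Count the k-element lacunar subsets of [n] = {1, ..., n} by brute force."""
    return sum(1 for c in combinations(range(1, n + 1), k) if is_lacunar(c))

# Compare the brute-force count with binom(n + 1 - k, k) whenever k <= n + 1.
for n in range(10):
    for k in range(n + 2):
        assert num_lacs(n, k) == comb(n + 1 - k, k)
```

A brute-force check like this proves nothing, but it does catch off-by-one slips in the formula.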


Note that the condition k ≤ n + 1 in Theorem 6.4.5 is needed. If k > n + 1,
then the # of k-element lacunar subsets of [n] is 0 (since a subset of [n] cannot
have more than n elements, let alone more than n + 1 elements, and even less so
when it is lacunar), but the binomial coefficient $\binom{n+1-k}{k}$ is nonzero (since
the n + 1 − k on its top is negative).
You can prove Theorem 6.4.5 by induction on n, using a similar red/green
coloring as in our above proof of Theorem 6.4.3 (and carefully checking that the
condition k ≤ n + 1 is satisfied whenever you apply the induction hypothesis74 ).
Such a proof can be found in [Grinbe17, Exercise 3 (a)]75 .
There is, however, a nicer proof, which proceeds by constructing a bijection

from {k-element lacunar subsets of [n]}


to {k-element subsets of [n + 1 − k ]} ,

and observing that the # of k-element subsets of [n + 1 − k] is $\binom{n+1-k}{k}$ (by
Theorem 6.2.4). Such a proof has the advantage of not just proving Theorem
6.4.5 but also explaining “why” it holds (at least if you consider it as a given
that binomial coefficients count k-element subsets).
This second proof rests upon a basic feature of finite sets of integers:

Proposition 6.4.6. Let k ∈ N. Let S be a k-element set of integers. Then,


there exists a unique k-tuple (s1 , s2 , . . . , sk ) of integers satisfying

{ s1 , s2 , . . . , s k } = S and s1 < s2 < · · · < s k .

74 This necessitates a bit of casework.
75 To be very pedantic: [Grinbe17, Exercise 3 (a)] only states Theorem 6.4.5 in the case when
n ∈ N. But the remaining case is trivial (since k ≤ n + 1 leads to k = 0 when n is negative,
and thus we have to count 0-element subsets of an empty set, which is not a deep question).

This proposition is just saying that if you are given a k-element set S of
integers, then there is a unique way to list the elements of S in increasing order
(with no repetitions). Intuitively, this is clear (just write down the smallest
element of S, then the second-smallest element, then the third-smallest, and so
on, until you run out of elements; it’s not like you have any other options!).
But intuition is not proof. Nevertheless, we will not stoop down to this low a
foundational level here76 , and just take Proposition 6.4.6 for granted.
In connection with Proposition 6.4.6, we introduce a notation:

Convention 6.4.7. Let s1 , s2 , . . . , sk be some integers. Then, the notation


“{s1 < s2 < · · · < sk }” shall mean the set {s1 , s2 , . . . , sk } and additionally sig-
nify that the chain of inequalities s1 < s2 < · · · < sk holds.

Thus, for example, {2 < 4 < 5} is the set {2, 4, 5}, whereas the expression
{4 < 2 < 5} is meaningless.
Proposition 6.4.6 can now be restated as follows: If k ∈ N, then any k-element
set of integers can be written in the form {s1 < s2 < · · · < sk } for a unique k-
tuple (s1 , s2 , . . . , sk ) of integers.
We are now ready to prove Theorem 6.4.5:
Proof of Theorem 6.4.5. Let m := n + 1 − k. Then, m = n + 1 − k ≥ 0 (since
k ≤ n + 1), so that [m] is an m-element set. Also, m = n + 1 − k = n − (k − 1),
so that m + (k − 1) = n.


Now, if S = {s1 < s2 < · · · < sk } is a k-element lacunar subset of [n], then $\overleftarrow{S}$
shall mean the set

$$\{ s_i - (i-1) \mid i \in [k] \} = \{ s_1,\ s_2 - 1,\ s_3 - 2,\ \ldots,\ s_k - (k-1) \}.$$

This set $\overleftarrow{S}$ is obtained from S by the following process:

• Leave the smallest element of S unchanged.

• Decrease the second-smallest element of S by 1.

• Decrease the third-smallest element of S by 2.

• And so on, until eventually decreasing the largest (= k-th-smallest) element of S by k − 1.

76 A boring and detailed (but ultimately very simple) proof of Proposition 6.4.6 can be found
in [Grinbe15, proof of Theorem 2.46].

We refer to this process as the compression process, as it causes the elements
of S to come closer together (in such a way that the distance between any two
“positionally adjacent” elements77 of S shrinks by 1). Consequently, we call the
resulting set $\overleftarrow{S}$ the compression of S. For example, if S = {3 < 5 < 9 < 11},
then $\overleftarrow{S}$ = {3 < 4 < 7 < 8}. Let us illustrate this example graphically:

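[Figure: the elements 3, 5, 9, 11 of S in a top row, each joined by a red arrow to the corresponding element 3, 4, 7, 8 of $\overleftarrow{S}$ in a bottom row.]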

(note that each of the red arrows is slightly more horizontal than the previous
one).
We note the following properties of compression: If S = {s1 < s2 < · · · < sk }
is a k-element lacunar subset of [n], then its compression $\overleftarrow{S}$ is still a k-element
set (i.e., the compression process does not cause any two distinct elements to
“collide”) and can be written as

$$\{ s_1 < s_2 - 1 < s_3 - 2 < \cdots < s_k - (k-1) \}$$

(since S is lacunar, so that any two “positionally adjacent” elements $s_i$ and $s_{i+1}$
of S satisfy $s_i < s_{i+1} - 1$ and thus $s_i - (i-1) < (s_{i+1} - 1) - (i-1) = s_{i+1} - i$).
Furthermore, $\overleftarrow{S}$ is a subset of [m] (because its smallest element is $s_1 \geq 1$ (since
$s_1 \in S \subseteq [n]$), whereas its largest element is $s_k - (k-1) \leq n - (k-1) = m$
(since $s_k \in S \subseteq [n]$ yields $s_k \leq n$)). Thus, we can define a map

$$\operatorname{compress} : \{k\text{-element lacunar subsets of } [n]\} \to \{k\text{-element subsets of } [m]\}, \qquad S \mapsto \overleftarrow{S}.$$


Conversely, if T = {t1 < t2 < · · · < tk } is a k-element subset of [m], then $\overrightarrow{T}$
shall mean the set

$$\{ t_i + (i-1) \mid i \in [k] \} = \{ t_1,\ t_2 + 1,\ t_3 + 2,\ \ldots,\ t_k + (k-1) \}.$$

This set $\overrightarrow{T}$ is obtained from T by the following process:

• Leave the smallest element of T unchanged.


77 We call two elements i and j of S “positionally adjacent” if they satisfy i < j but there are
no other elements of S lying between them (i.e., there are no elements s ∈ S satisfying
i < s < j). For example, the elements 4 and 6 of the set {2, 4, 6, 8} are positionally adjacent,
but the elements 4 and 6 of the set {2, 3, 4, 5, 6} are not (since the element 5 lies between
them).

• Increase the second-smallest element of T by 1.

• Increase the third-smallest element of T by 2.

• And so on, until eventually increasing the largest (= k-th-smallest) element of T by k − 1.

We refer to this process as the expansion process, as it causes the elements
of T to drift further apart (in such a way that the distance between any two
“positionally adjacent” elements of T increases by 1). Consequently, we call the
resulting set $\overrightarrow{T}$ the expansion of T. For example, if T = {3 < 4 < 7 < 8}, then
$\overrightarrow{T}$ = {3 < 5 < 9 < 11}. Let us illustrate this example graphically:

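[Figure: the elements 3, 4, 7, 8 of T in a top row, each joined by a red arrow to the corresponding element 3, 5, 9, 11 of $\overrightarrow{T}$ in a bottom row.]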

(note that each of the red arrows is slightly more horizontal than the previous
one).
We note the following properties of expansion: If T = {t1 < t2 < · · · < tk } is
a k-element subset of [m], then its expansion $\overrightarrow{T}$ is still a k-element set (i.e., the
expansion process does not cause any two distinct elements to “collide”) and
can be written as

$$\{ t_1 < t_2 + 1 < t_3 + 2 < \cdots < t_k + (k-1) \}$$

(since each i ∈ [k − 1] satisfies $t_i < t_{i+1}$ and thus $t_i + (i-1) < t_{i+1} + (i-1) < t_{i+1} + i$).
Furthermore, $\overrightarrow{T}$ is a subset of [n] (because its smallest element is $t_1 \geq 1$
(since $t_1 \in T \subseteq [m]$), whereas its largest element is $t_k + (k-1) \leq m + (k-1) = n$
(since $t_k \in T \subseteq [m]$ yields $t_k \leq m$)), and is lacunar (since the expansion process ensures that
the distance between any two “positionally adjacent” elements of T has been
increased by 1 in $\overrightarrow{T}$, so they can no longer be consecutive integers). Thus, we
can define a map

$$\operatorname{expand} : \{k\text{-element subsets of } [m]\} \to \{k\text{-element lacunar subsets of } [n]\}, \qquad T \mapsto \overrightarrow{T}.$$

It is easy to see that the map expand is an inverse of compress 78 . Hence,


the map compress has an inverse, i.e., is bijective. Thus, it is a bijection from
{k-element lacunar subsets of [n]} to {k-element subsets of [m]}. Hence, the
bijection principle yields

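$$\begin{aligned}
(\text{\# of } k\text{-element lacunar subsets of } [n])
&= (\text{\# of } k\text{-element subsets of } [m]) \\
&= \binom{m}{k} && \text{(by Theorem 6.2.4, applied to } m \text{ and } [m] \text{ instead of } n \text{ and } S) \\
&= \binom{n+1-k}{k} && \text{(since } m = n+1-k).
\end{aligned}$$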

This proves Theorem 6.4.5.
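The compression and expansion maps from this proof are easy to experiment with. The sketch below is a plain-Python illustration (the function names mirror the maps compress and expand, but the code itself is not from the text); it represents sets as Python sets and checks the bijection for the single case n = 7, k = 3.

```python
from itertools import combinations

def compress(S):
    """Subtract i - 1 from the i-th smallest element of S (i = 1, 2, ..., k)."""
    # enumerate() counts from 0, so `index` plays the role of i - 1.
    return {s - index for index, s in enumerate(sorted(S))}

def expand(T):
    """Add i - 1 to the i-th smallest element of T (i = 1, 2, ..., k)."""
    return {t + index for index, t in enumerate(sorted(T))}

n, k = 7, 3
m = n + 1 - k

# All k-element lacunar subsets of [n]: consecutive entries differ by more than 1.
lacunar = [set(c) for c in combinations(range(1, n + 1), k)
           if all(b - a > 1 for a, b in zip(c, c[1:]))]
# All k-element subsets of [m].
targets = [set(c) for c in combinations(range(1, m + 1), k)]

# compress hits every k-element subset of [m] exactly once, and expand undoes it.
assert sorted(sorted(compress(S)) for S in lacunar) == sorted(sorted(T) for T in targets)
assert all(expand(compress(S)) == S for S in lacunar)
assert all(compress(expand(T)) == T for T in targets)
```

For the running example, `compress({3, 5, 9, 11})` gives `{3, 4, 7, 8}`, and `expand` maps it back.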

6.4.5. A corollary
Combining Theorem 6.4.5 with Theorem 6.4.3, we obtain a curious formula for
the Fibonacci numbers in terms of binomial coefficients:

Corollary 6.4.8. Let n ∈ N. Then, the Fibonacci number $f_{n+1}$ is

$$f_{n+1} = \sum_{k=0}^{n} \binom{n-k}{k} = \binom{n-0}{0} + \binom{n-1}{1} + \cdots + \binom{n-n}{n}.$$

78 In fact, each k-element subset T of [m] satisfies compress (expand T ) = T, because if we write
T as T = {t1 < t2 < · · · < tk }, then

expand T = expand ({t1 < t2 < · · · < tk }) = {t1 < t2 + 1 < t3 + 2 < · · · < tk + (k − 1)}

and therefore

compress (expand T ) = compress ({t1 < t2 + 1 < t3 + 2 < · · · < tk + (k − 1)})


= {t1 < (t2 + 1) − 1 < (t3 + 2) − 2 < · · · < (tk + (k − 1)) − (k − 1)}
= {t1 < t2 < · · · < tk } = T.

A similar argument shows that any k-element lacunar subset S of [n] satisfies
expand (compress S) = S.

Example 6.4.9. For n = 6, Corollary 6.4.8 says that

$$\begin{aligned}
f_7 &= \binom{6-0}{0} + \binom{6-1}{1} + \binom{6-2}{2} + \binom{6-3}{3} + \binom{6-4}{4} + \binom{6-5}{5} + \binom{6-6}{6} \\
&= \binom{6}{0} + \binom{5}{1} + \binom{4}{2} + \binom{3}{3} + \binom{2}{4} + \binom{1}{5} + \binom{0}{6} \\
&= 1 + 5 + 6 + 1 + 0 + 0 + 0,
\end{aligned}$$

which is indeed true (both sides equal 13). Of course, the three summands that are 0 could just
as well be excluded from the sum, and the sum $\sum_{k=0}^{n} \binom{n-k}{k}$ in Corollary
6.4.8 could be replaced by the smaller sum $\sum_{k=0}^{\lfloor n/2 \rfloor} \binom{n-k}{k}$ (since $\binom{n-k}{k} = 0$
whenever ⌊n/2⌋ < k ≤ n); but I find it more important to keep the sum
simple than to minimize the number of its addends.

Proof of Corollary 6.4.8. It is easy to see that any subset of [n − 1] has a size
between 0 and n (inclusive)79 . (Actually, it cannot have size n unless n = 0,
but I find it more convenient to nevertheless include the “unnecessary” value
n among the theoretically possible sizes; I am not saying that all of these sizes
actually are achievable.)
Now, from n ∈ N, we obtain n ≥ 0, thus n − 1 ≥ −1. Hence, Theorem 6.4.3
(applied to n − 1 instead of n) yields

$$(\text{\# of lacunar subsets of } [n-1]) = f_{(n-1)+2} = f_{n+1}.$$

79 Proof. Let T be a subset of [n − 1]. We must show that T has a size between 0 and n (inclusive).
In other words, we must prove that | T | ∈ {0, 1, . . . , n}.
However, we have T ⊆ [n − 1] ⊆ [n] and therefore | T | ≤ |[n]| (by Theorem 6.1.12 (a),
applied to S = [n]). Hence, | T | ≤ |[n]| = n. Since | T | is a nonnegative integer, we thus
obtain | T | ∈ {0, 1, . . . , n}, as desired.

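Therefore,

$$\begin{aligned}
f_{n+1} &= (\text{\# of lacunar subsets of } [n-1]) \\
&= (\text{\# of lacunar subsets of } [n-1] \text{ having size } 0) \\
&\qquad + (\text{\# of lacunar subsets of } [n-1] \text{ having size } 1) \\
&\qquad + (\text{\# of lacunar subsets of } [n-1] \text{ having size } 2) \\
&\qquad + \cdots \\
&\qquad + (\text{\# of lacunar subsets of } [n-1] \text{ having size } n) \\
&\qquad \left(\text{by the sum rule (Theorem 6.1.11), since any subset of } [n-1] \text{ has a size between } 0 \text{ and } n \text{ (inclusive)}\right) \\
&= \sum_{k=0}^{n} (\text{\# of lacunar subsets of } [n-1] \text{ having size } k) \\
&= \sum_{k=0}^{n} \binom{(n-1)+1-k}{k} \\
&\qquad \left(\text{since Theorem 6.4.5, applied to } n-1 \text{ instead of } n, \text{ yields } (\text{\# of } k\text{-element lacunar subsets of } [n-1]) = \binom{(n-1)+1-k}{k}, \text{ because } k \leq n = (n-1)+1\right) \\
&= \sum_{k=0}^{n} \binom{n-k}{k} \qquad (\text{since } (n-1)+1 = n) \\
&= \binom{n-0}{0} + \binom{n-1}{1} + \cdots + \binom{n-n}{n}.
\end{aligned}$$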

This proves Corollary 6.4.8.
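Corollary 6.4.8 lends itself to a quick numerical check. The following plain-Python sketch (with illustrative function names, and assuming the usual convention f_0 = 0, f_1 = 1 for the Fibonacci sequence) compares the diagonal sum of binomial coefficients with the Fibonacci numbers.

```python
from math import comb

def fibonacci(n):
    """Return f_n, where f_0 = 0, f_1 = 1 and f_n = f_{n-1} + f_{n-2}."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def diagonal_sum(n):
    """Return the sum of binom(n - k, k) over k = 0, 1, ..., n."""
    # math.comb(top, k) returns 0 when k > top >= 0, matching the zero addends.
    return sum(comb(n - k, k) for k in range(n + 1))

# The corollary claims f_{n+1} = diagonal_sum(n) for every n in N.
for n in range(30):
    assert diagonal_sum(n) == fibonacci(n + 1)
```

For n = 6, the addends are 1, 5, 6, 1, 0, 0, 0, which sum to 13 = f_7.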

6.4.6. The domino tilings connection


At the beginning of Chapter 4, I asked for the # of ways to tile a 2 × 15-rectangle
with dominos (i.e., rectangles of size 1 × 2 or 2 × 1), such as the following:

Of course, the same problem can be asked for n × m-rectangles for arbitrary
n and m, but we shall focus on the case n = 2 (that is, a rectangle of height
2). (See [Grinbe19a, §1.1] for some references on the much harder cases when
n > 2.)
It turns out that the ways to tile a 2 × m-rectangle with dominos are in bi-
jection with the lacunar subsets of [m − 1]. Indeed, if T is a way to tile the
2 × m-rectangle, then we let C (T ) be the set of all columns (counted from the
left) in which horizontal dominos of T start (where we say that a horizontal
domino is a domino of height 1 and width 2, and it starts in the leftmost of the
two columns that it spans). For example, if T is the tiling shown above, then

C (T ) = {2, 6, 8, 11}. Now, it is not hard to see (but not completely obvious; see
[Grinbe19a, §1.4.4, Second proof of Proposition 1.4.9]) that the map

$$\{\text{ways to tile a } 2 \times m\text{-rectangle with dominos}\} \to \{\text{lacunar subsets of } [m-1]\}, \qquad T \mapsto C(T)$$

is a bijection, and therefore the bijection principle yields

$$(\text{\# of ways to tile a } 2 \times m\text{-rectangle with dominos}) = (\text{\# of lacunar subsets of } [m-1]) = f_{m+1}$$

(by Theorem 6.4.3, applied to n = m − 1). In particular, for m = 15, we obtain

$$(\text{\# of ways to tile a } 2 \times 15\text{-rectangle with dominos}) = f_{15+1} = f_{16} = 987.$$
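The tiling bijection can be tested by brute force for small m. The sketch below is a plain-Python illustration with hypothetical names: it enumerates all domino tilings of the 2 × m rectangle by always covering the first free cell, computes the set C(T) of columns where horizontal dominos start, and checks that these sets are distinct lacunar subsets of [m − 1] whose number is f_{m+1}.

```python
def fibonacci(n):
    """Return f_n, where f_0 = 0, f_1 = 1 and f_n = f_{n-1} + f_{n-2}."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def tilings(m):
    """Yield every domino tiling of the 2 x m rectangle as a list of dominos.

    A cell is a pair (row, column) with row in {0, 1} and column in {1, ..., m};
    a domino is a pair of adjacent cells.
    """
    def fill(covered, dominos):
        # Find the first uncovered cell, scanning column by column, top row first.
        free = [(r, c) for c in range(1, m + 1) for r in (0, 1)
                if (r, c) not in covered]
        if not free:
            yield dominos
            return
        r, c = free[0]
        # Option 1: a vertical domino covering both cells of column c.
        if r == 0 and (1, c) not in covered:
            yield from fill(covered | {(0, c), (1, c)},
                            dominos + [((0, c), (1, c))])
        # Option 2: a horizontal domino starting in column c.
        if c < m and (r, c + 1) not in covered:
            yield from fill(covered | {(r, c), (r, c + 1)},
                            dominos + [((r, c), (r, c + 1))])
    yield from fill(frozenset(), [])

def start_columns(tiling):
    """C(T): the columns in which horizontal dominos of the tiling start."""
    return frozenset(c for (r1, c), (r2, _) in tiling if r1 == r2)

for m in range(1, 11):
    sets = [start_columns(t) for t in tilings(m)]
    # Distinct tilings give distinct sets, and there are f_{m+1} of them.
    assert len(set(sets)) == len(sets) == fibonacci(m + 1)
    # Each set is a lacunar subset of [m - 1].
    assert all(s <= set(range(1, m)) and all(c + 1 not in s for c in s)
               for s in sets)
```

Since there are exactly f_{m+1} lacunar subsets of [m − 1] by Theorem 6.4.3, these checks together confirm bijectivity for the tested values of m; they are no substitute for the proof cited above.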

Exercise 6.4.1. A set S of integers will be called pseudolacunar if it has the
property that no two elements s, t of S satisfy |s − t| = 2. For instance, the
set {2, 5, 6} is pseudolacunar, but the set {2, 5, 7} is not (since |5 − 7| = 2).
For each n ∈ N, let $p_n$ be the # of pseudolacunar subsets of [n].
Prove that

$$p_n = p_{n-1} + p_{n-3} + p_{n-4} \qquad \text{for each } n \geq 4.$$

[Hint: To each pseudolacunar subset, assign one of three colors.]

Exercise 6.4.2. A set S of integers shall be called self-starting if its size |S| is
also its smallest element. (For example, {3, 5, 6} is self-starting, while {2, 3, 4}
and {3} are not.)
Let n ∈ N.
(a) For any k ∈ [n], find the number of self-starting subsets of [n] having
size k.
(b) Find the number of all self-starting subsets of [n].

6.5. Compositions and weak compositions


Two other useful objects to count are compositions and weak compositions.

6.5.1. Compositions
How many ways are there to write the integer 5 as a sum of 3 positive integers,
if the order matters? Since 5 and 3 are not very large numbers, we can just list
all these ways:

5 = 2+2+1 = 2+1+2 = 1+2+2


= 3 + 1 + 1 = 1 + 3 + 1 = 1 + 1 + 3.

So there are 6 such ways.


What if we replace 5 and 3 by arbitrary nonnegative integers n and k ? So
we want to count the k-tuples ( a1 , a2 , . . . , ak ) of positive integers satisfying a1 +
a2 + · · · + ak = n. These tuples have a name:

Definition 6.5.1. (a) If n ∈ N, then a composition of n shall mean a tuple


(i.e., finite list) of positive integers whose sum is n.
(b) If n, k ∈ N, then a composition of n into k parts shall mean a k-tuple
of positive integers whose sum is n.
(The word “composition” here is completely unrelated to the notion of
composition of two functions.)

Example 6.5.2. (a) The compositions of 5 into 3 parts are

(2, 2, 1) , (2, 1, 2) , (1, 2, 2) ,


(3, 1, 1) , (1, 3, 1) , (1, 1, 3) .

These are exactly the 6 ways we found above (but written as 3-tuples).
(b) The compositions of 3 are

(1, 1, 1) , (2, 1) , (1, 2) , (3) .

(c) The only composition of 0 is the empty list (), which is a 0-tuple. It is a
composition into 0 parts.

Let us now count compositions of n into k parts. (Later, we will count all
compositions of n.) Again, the answer turns out to be a binomial coefficient:

Theorem 6.5.3. Let n, k ∈ N. Then,

$$(\text{\# of compositions of } n \text{ into } k \text{ parts}) = \binom{n-1}{n-k}. \tag{61}$$

If n > 0, then we furthermore have

$$(\text{\# of compositions of } n \text{ into } k \text{ parts}) = \binom{n-1}{k-1}. \tag{62}$$

Proof sketch. The proof is straightforward in the case when n = 0. (Indeed,
if n = 0, then the only composition of n is the empty list (), and this is a
composition of n into 0 parts. Thus, if n = 0, then we have

$$(\text{\# of compositions of } n \text{ into } k \text{ parts}) = \begin{cases} 1, & \text{if } k = 0; \\ 0, & \text{if } k \neq 0; \end{cases}$$

but we also have

$$\binom{n-1}{n-k} = \binom{0-1}{0-k} = \binom{-1}{-k} = \begin{cases} 1, & \text{if } k = 0; \\ 0, & \text{if } k \neq 0 \end{cases} \qquad \text{(check this!)}$$

in this case, and we obtain (61) by comparing these two equalities. Thus, Theorem
6.5.3 holds for n = 0 (because the equality (62) is claimed for n > 0
only).)
Thus, we only need to consider the case when n ̸= 0. Let us thus focus on
this case. From n ̸= 0, we obtain n ≥ 1 (since n ∈ N), thus n − 1 ∈ N.
For any composition a = ( a1 , a2 , . . . , ak ) of n into k parts, we define the partial
sum set C ( a) to be the set

$$\{a_1,\ a_1 + a_2,\ a_1 + a_2 + a_3,\ \ldots,\ a_1 + a_2 + \cdots + a_{k-1}\} = \{a_1 + a_2 + \cdots + a_i \mid i \in [k-1]\}.$$

This set C ( a) consists of all the “partial sums” a1 + a2 + · · · + ai of the sum


a1 + a2 + · · · + ak , except for the empty partial sum a1 + a2 + · · · + a0 (which is 0
by definition) and the full sum a1 + a2 + · · · + ak (which is n, since a is a compo-
sition of n). Thus, all elements of C ( a) are integers between 0 and n (exclusive)
(since they have more addends than the empty partial sum, but fewer than the
full sum80 ). In other words, C ( a) is a subset of {1, 2, . . . , n − 1} = [n − 1].
We can visualize the partial sum set C ( a) of a composition a = ( a1 , a2 , . . . , ak )
as follows: The interval [0, n]R := { x ∈ R | 0 ≤ x ≤ n} on the real line has
length n. If we split this interval into blocks of lengths a1 , a2 , . . . , ak (from left
to right), then the elements of C ( a) are precisely the endpoints of these blocks
(i.e., the points at which one block ends and the next begins), except for the
leftmost endpoint 0 and the rightmost endpoint n. See this picture:

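[Figure: the interval from 0 to n, split from left to right into consecutive blocks of lengths a1, a2, . . . , ak; the block endpoints are labeled 0, s1, s2, . . . , sk−1, n.]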

(on which the partial sums a1 + a2 + · · · + ai are denoted by si ).


It is thus easy to see that if a is a composition of n into k parts, then C ( a) is
a (k − 1)-element subset of [n − 1]. Thus, we obtain a map

$$C : \{\text{compositions of } n \text{ into } k \text{ parts}\} \to \{(k-1)\text{-element subsets of } [n-1]\}, \qquad a \mapsto C(a).$$

80 and since all these addends are positive (because a composition has positive entries)

Furthermore, it is not hard to see that this map C has an inverse81 , and thus is
a bijection. Hence, the bijection principle yields

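$$\begin{aligned}
(\text{\# of compositions of } n \text{ into } k \text{ parts})
&= (\text{\# of } (k-1)\text{-element subsets of } [n-1]) \\
&= \binom{n-1}{k-1} && \text{(by Theorem 6.2.4, applied to } k-1,\ n-1 \text{ and } [n-1] \text{ instead of } k,\ n \text{ and } S) \\
&= \binom{n-1}{(n-1)-(k-1)} && \text{(by the symmetry of Pascal's triangle (Theorem 2.5.5), since } n-1 \in \mathbb{N}) \\
&= \binom{n-1}{n-k} && \text{(since } (n-1)-(k-1) = n-k).
\end{aligned}$$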

Thus, both (61) and (62) have been proved. This completes the proof of Theorem
6.5.3.
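The two maps used in this proof, C (partial sum set) and its inverse A (block lengths), can be sketched in a few lines of plain Python. The function names below are illustrative, compositions are represented as tuples, and the check is restricted to n ≥ 1 and 1 ≤ k ≤ n to stay within the case treated by the bijection.

```python
from itertools import accumulate, combinations
from math import comb

def partial_sum_set(a):
    """C(a): the partial sums a_1 + ... + a_i for i = 1, ..., k - 1."""
    return set(list(accumulate(a))[:-1])  # drop the full sum a_1 + ... + a_k

def composition_from_set(I, n):
    """A(I): the lengths of the blocks into which the points of I cut [0, n]."""
    points = [0] + sorted(I) + [n]
    return tuple(q - p for p, q in zip(points, points[1:]))

for n in range(1, 8):
    for k in range(1, n + 1):
        subsets = list(combinations(range(1, n), k - 1))
        compositions = [composition_from_set(I, n) for I in subsets]
        # Every A(I) is a k-tuple of positive integers with sum n ...
        assert all(len(a) == k and min(a) >= 1 and sum(a) == n
                   for a in compositions)
        # ... and applying C recovers the subset I.
        assert all(partial_sum_set(a) == set(I)
                   for a, I in zip(compositions, subsets))
        # The number of subsets agrees with (61) and (62).
        assert len(subsets) == comb(n - 1, n - k) == comb(n - 1, k - 1)
```

For instance, the composition (2, 1, 2) of 5 has partial sum set {2, 3}, and cutting the interval from 0 to 5 at the points 2 and 3 gives back the block lengths 2, 1, 2.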
We can also count all compositions of a given n:

Theorem 6.5.4. Let n be a positive integer. Then, the # of all compositions of
n is $2^{n-1}$.

Proof sketch. This can be proved using a similar argument as in Theorem 6.5.3
(but now we need to count all subsets of [n − 1]). See [Grinbe19c, Exercise 1
(b)] for details.
Note that Theorem 6.5.4 does not hold for n = 0 (since 0 has 1 composition,
but $2^{0-1} = \frac{1}{2}$).
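Theorem 6.5.4 and Theorem 6.5.3 can both be checked on small cases by listing all compositions. The recursive generator below is a plain-Python sketch (not taken from the text): it picks the first entry and then recurses on the remainder.

```python
from collections import Counter
from math import comb

def compositions(n):
    """Yield all compositions of n as tuples of positive integers."""
    if n == 0:
        yield ()  # the empty list is the only composition of 0
        return
    for first in range(1, n + 1):
        for rest in compositions(n - first):
            yield (first,) + rest

for n in range(1, 11):
    all_compositions = list(compositions(n))
    # Theorem 6.5.4: there are 2^(n-1) compositions of a positive integer n.
    assert len(all_compositions) == 2 ** (n - 1)
    # Theorem 6.5.3: exactly binom(n-1, k-1) of them have k parts.
    by_length = Counter(len(a) for a in all_compositions)
    assert all(by_length[k] == comb(n - 1, k - 1) for k in range(1, n + 1))

# The case n = 0 is exceptional: there is 1 composition, not 2^(-1).
assert list(compositions(0)) == [()]
```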
81 This is easiest to see using the visual description of C ( a) that we showed above: Given a
(k − 1)-element subset I of [n − 1], we can use the elements of I to subdivide the interval
[0, n]R into k blocks. The lengths of these blocks (listed from left to right) form a composition
a of n into k parts, and this composition satisfies C ( a) = I. Moreover, this composition is
the only one with this property. Thus, the map that sends each (k − 1)-element subset I
of [n − 1] to the corresponding composition a (whose construction we just explained) is an
inverse map of C.
Rigorously, this can be restated as follows: For each (k − 1)-element subset
$I = \{i_1 < i_2 < \cdots < i_{k-1}\}$ of [n − 1] (where we are using Convention 6.4.7 again), we can define
a composition

$$A(I) := (i_1 - i_0,\ i_2 - i_1,\ i_3 - i_2,\ \ldots,\ i_{k-1} - i_{k-2},\ i_k - i_{k-1}),$$

where we set $i_0 := 0$ and $i_k := n$. Then, the map

$$A : \{(k-1)\text{-element subsets of } [n-1]\} \to \{\text{compositions of } n \text{ into } k \text{ parts}\}, \qquad I \mapsto A(I)$$

is easily seen to be an inverse map of C. A detailed proof can be found in [Grinbe19c,
solution to Exercise 1 (b)] (except that the latter solution does not pay attention to the size
of the subset).

6.5.2. Weak compositions


One particularly useful variant of compositions are the so-called weak compo-
sitions. These are defined as tuples of nonnegative integers (i.e., they differ
from compositions in that their entries are allowed to be 0). In other words:

Definition 6.5.5. (a) If n ∈ N, then a weak composition of n shall mean a


tuple of nonnegative integers whose sum is n.
(b) If n, k ∈ N, then a weak composition of n into k parts shall mean a
k-tuple of nonnegative integers whose sum is n.

For instance:

• The weak compositions of 2 into 3 parts are


(1, 1, 0) , (1, 0, 1) , (0, 1, 1) ,
(2, 0, 0) , (0, 2, 0) , (0, 0, 2) .

• The weak compositions of 2 into 2 parts are


(2, 0) , (1, 1) , (0, 2) .
(Note that any composition is a weak composition, but there are usually
more weak compositions than that.)
• The weak compositions of 1 are all tuples of the form

$$(\underbrace{0, 0, \ldots, 0}_{\text{any number of zeroes}},\ 1,\ \underbrace{0, 0, \ldots, 0}_{\text{any number of zeroes}}).$$

Here, “any number” allows for the possibility of “none”, and in particular
the 1-tuple (1) is a weak composition of 1.

Counting all weak compositions of a given n is no longer possible, since


there are infinitely many (as we just saw). But we can still count all weak
compositions of a given n into k parts for a given k.

Theorem 6.5.6. Let n, k ∈ N. Then,

$$(\text{\# of weak compositions of } n \text{ into } k \text{ parts}) = \binom{n+k-1}{n}.$$

Moreover, if n + k > 0 (that is, if n and k are not both 0), then

$$(\text{\# of weak compositions of } n \text{ into } k \text{ parts}) = \binom{n+k-1}{k-1}.$$

Proof. We shall deduce this from Theorem 6.5.3.


Indeed, if b is a nonnegative integer, then b + 1 is a positive integer. Thus,
if $(a_1, a_2, \ldots, a_k)$ is a weak composition of n into k parts, then the k-tuple
$(a_1 + 1, a_2 + 1, \ldots, a_k + 1)$ is a composition of n + k into k parts (since the
sum of its entries is

$$(a_1 + 1) + (a_2 + 1) + \cdots + (a_k + 1) = (a_1 + a_2 + \cdots + a_k) + k = n + k,$$

where we used $a_1 + a_2 + \cdots + a_k = n$, which holds since $(a_1, a_2, \ldots, a_k)$ is a weak composition of n). Thus, the map

$$\{\text{weak compositions of } n \text{ into } k \text{ parts}\} \to \{\text{compositions of } n + k \text{ into } k \text{ parts}\}, \qquad (a_1, a_2, \ldots, a_k) \mapsto (a_1 + 1, a_2 + 1, \ldots, a_k + 1)$$

is well-defined. Similarly, the map

$$\{\text{compositions of } n + k \text{ into } k \text{ parts}\} \to \{\text{weak compositions of } n \text{ into } k \text{ parts}\}, \qquad (a_1, a_2, \ldots, a_k) \mapsto (a_1 - 1, a_2 - 1, \ldots, a_k - 1)$$

is well-defined. These two maps are clearly inverses of each other (since adding
1 and subtracting 1 are inverse operations). Therefore, they are bijections. The
bijection principle thus yields

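$$\begin{aligned}
(\text{\# of weak compositions of } n \text{ into } k \text{ parts})
&= (\text{\# of compositions of } n + k \text{ into } k \text{ parts}) \\
&= \binom{n+k-1}{n+k-k} && \text{(by (61), applied to } n + k \text{ instead of } n) \\
&= \binom{n+k-1}{n}.
\end{aligned}$$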

If n + k > 0, then n + k ≥ 1 and thus n + k − 1 ∈ N, so that this becomes

$$\begin{aligned}
(\text{\# of weak compositions of } n \text{ into } k \text{ parts})
&= \binom{n+k-1}{n} \\
&= \binom{n+k-1}{(n+k-1)-n} && \text{(by the symmetry of Pascal's triangle (Theorem 2.5.5), since } n + k - 1 \in \mathbb{N}) \\
&= \binom{n+k-1}{k-1}
\end{aligned}$$

in this case.

Thus, Theorem 6.5.6 is fully proved.
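The shifting trick in this proof is short enough to replay by machine. The following plain-Python sketch uses brute-force enumeration with illustrative helper names. The pair n = k = 0 is skipped in the binomial check only because `math.comb` rejects a negative top argument, not because the theorem fails there.

```python
from itertools import product
from math import comb

def weak_compositions(n, k):
    """All k-tuples of nonnegative integers with sum n (brute force)."""
    return [a for a in product(range(n + 1), repeat=k) if sum(a) == n]

def compositions_into(n, k):
    """All k-tuples of positive integers with sum n (brute force)."""
    return [a for a in product(range(1, n + 1), repeat=k) if sum(a) == n]

for n in range(5):
    for k in range(4):
        weak = weak_compositions(n, k)
        # Adding 1 to every entry turns weak compositions of n into k parts
        # into compositions of n + k into k parts, and hits each one once.
        shifted = sorted(tuple(x + 1 for x in a) for a in weak)
        assert shifted == sorted(compositions_into(n + k, k))
        if n + k > 0:
            assert len(weak) == comb(n + k - 1, n)
            if k > 0:
                assert len(weak) == comb(n + k - 1, k - 1)
```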



Exercise 6.5.1. Let n ∈ N.


A {1, 2}-composition of n shall mean a composition ( a1 , a2 , . . . , ak ) of n
such that a1 , a2 , . . . , ak ∈ {1, 2}.
For example, the {1, 2}-compositions of 5 are

(1, 1, 1, 1, 1) , (1, 1, 1, 2) , (1, 1, 2, 1) , (1, 2, 1, 1) ,


(2, 1, 1, 1) , (2, 2, 1) , (2, 1, 2) , (1, 2, 2) .

(a) Prove that

(# of {1, 2} -compositions of n) = f n+1

(where ( f 0 , f 1 , f 2 , . . .) denotes the Fibonacci sequence, as defined in Definition


1.5.1).
(b) Let k ∈ N. A {1, 2}-composition of n into k parts shall mean a composition
$(a_1, a_2, \ldots, a_k)$ of n into k parts such that $a_1, a_2, \ldots, a_k \in \{1, 2\}$.
Prove that

$$(\text{\# of } \{1, 2\}\text{-compositions of } n \text{ into } k \text{ parts}) = \binom{k}{n-k}.$$

6.6. Selections
We now come back to a class of problems that we have posed at the start of
Chapter 4 (before Section 4.1) but haven’t fully answered yet: counting the
ways to select a bunch of elements from a given set.
To be more specific, these are problems of the following form: Given an n-
element set S, how many ways are there to select k elements from S (where n
and k are fixed nonnegative integers)?
The words “k elements” in this question are ambiguous, as they allow for
several interpretations:
1. Do we want k arbitrary elements or k distinct elements?
2. Does the order of these k elements matter or not? (In other words, would
“1, 2” and “2, 1” count as two different selections?)
In total, these decisions leave you with 4 options, leading to 4 different prob-
lems. In this section, we shall address them all.

6.6.1. Unordered selections without repetition (= without replacement)


Let us begin with the case when we want to select k distinct elements, and the
order does not matter. This just means selecting a k-element subset of S. We
already know how to count these subsets (Theorem 6.2.4):

Theorem 6.6.1. Let n ∈ N, and let k be any number. Let S be an n-element
set. Then,

$$(\text{\# of } k\text{-element subsets of } S) = \binom{n}{k}.$$

In other words, the # of ways to choose k distinct elements from a given
n-element set S, if the order does not matter, is $\binom{n}{k}$.

6.6.2. Ordered selections without repetition (= without replacement)


Now, let us consider the case when the order does matter. Thus, we are looking
not for subsets, but for k-tuples. But these k-tuples are not arbitrary k-tuples;
they are k-tuples of distinct elements. We shall call such k-tuples injective (in
analogy to injective functions):

Definition 6.6.2. Let k ∈ N. A k-tuple $(i_1, i_2, \ldots, i_k)$ is said to be injective if
its k entries $i_1, i_2, \ldots, i_k$ are distinct (i.e., if we have $i_a \neq i_b$ for all a ≠ b).

For example, the 3-tuple (6, 1, 2) is injective, but (2, 1, 2) is not.


Note that injective k-tuples and injective functions are closely related: A func-
tion f : [k ] → S (for a set S and a number k ∈ N) is injective if and only if the
k-tuple ( f (1) , f (2) , . . . , f (k)) is injective.
Next, we introduce another convenient notation:

Definition 6.6.3. Let S be any set, and let k ∈ N. Then, $S^k$ shall mean the
Cartesian product

$$\underbrace{S \times S \times \cdots \times S}_{k \text{ times}} = \{(a_1, a_2, \ldots, a_k) \mid a_1, a_2, \ldots, a_k \in S\} = \{k\text{-tuples all of whose entries belong to } S\}.$$

For example, $\{5, 6\}^3$ is the set

$$\{5, 6\} \times \{5, 6\} \times \{5, 6\} = \{(5, 5, 5), (5, 5, 6), (5, 6, 5), (5, 6, 6), (6, 5, 5), (6, 5, 6), (6, 6, 5), (6, 6, 6)\}.$$

None of the 3-tuples (i.e., triples) in this set is injective, but it is easy to find
an example where injective k-tuples do appear: For instance, the set $\{1, 2, 3, 4\}^3$
contains both injective 3-tuples such as (1, 4, 3) and non-injective 3-tuples such
as (3, 3, 1).
Now, we can define rigorously what we are looking for: A way to select k
distinct elements from a given set S, if the order matters, is the same as an
injective k-tuple in $S^k$. We shall now count such ways:

Theorem 6.6.4. Let n ∈ N and k ∈ N. Let S be an n-element set. Then,

$$(\text{\# of injective } k\text{-tuples in } S^k) = n (n-1) (n-2) \cdots (n-k+1).$$

Example 6.6.5. Applying Theorem 6.6.4 to n = 5, k = 3 and S = {1, 2, 3, 4, 5},
we find that

$$(\text{\# of injective 3-tuples in } \{1, 2, 3, 4, 5\}^3) = 5 (5-1) (5-2) = 5 \cdot 4 \cdot 3 = 60.$$

And indeed, there are 60 injective 3-tuples in $\{1, 2, 3, 4, 5\}^3$. For example,
(2, 5, 4) and (5, 3, 2) are two of them.
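The count in Example 6.6.5 can be reproduced by filtering all tuples in S^k for injectivity. The sketch below is plain Python with illustrative helper names; the tested sets S = {1, . . . , n} stand in for arbitrary n-element sets.

```python
from itertools import product
from math import prod

def is_injective(t):
    """Return True if all entries of the tuple t are distinct."""
    return len(set(t)) == len(t)

def count_injective_tuples(S, k):
    """Count the injective k-tuples in S^k by brute force."""
    return sum(1 for t in product(S, repeat=k) if is_injective(t))

def falling_product(n, k):
    """Return n (n-1) (n-2) ... (n-k+1); the empty product (k = 0) is 1."""
    return prod(n - j for j in range(k))

# Compare brute force with Theorem 6.6.4, including cases with k > n,
# where the product contains the factor 0.
for n in range(6):
    for k in range(5):
        assert count_injective_tuples(range(1, n + 1), k) == falling_product(n, k)
```

Python's `itertools.permutations(S, k)` generates exactly these injective k-tuples directly; the filter above is used only to stay close to Definition 6.6.2.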

Note that the right hand side in Theorem 6.6.4 is precisely the numerator in
the definition of the binomial coefficient $\binom{n}{k}$ (Definition 2.4.1), and thus can be
rewritten as $k! \cdot \binom{n}{k}$ (since k! is the denominator). Thus, the claim of Theorem
6.6.4 can be restated as

$$(\text{\# of injective } k\text{-tuples in } S^k) = k! \cdot \binom{n}{k}.$$

Now, how do we prove the theorem? Let us first give an informal proof:
Informal proof of Theorem 6.6.4. Let us look at an example (which is representative
of the general case): We let n = 5 and k = 3 and S = { a, b, c, d, e}. How
many injective k-tuples are there in $S^k$? In other words (since k = 3): How
many injective 3-tuples are there in $S^3$?
Such a 3-tuple has the form ( x, y, z), where x, y, z are three distinct elements
of S. Let us see how such a 3-tuple can be chosen:

1. First, we choose its first entry x. There are 5 options for this, since S has 5
elements (and x can be any of these 5).

2. Then, we choose its second entry y. There are 4 options for it, since y
can be any of the 5 elements of S except for x (because the injectivity of
( x, y, z) demands y to be distinct from x).
3. Finally, we choose its third entry z. There are 3 options for it, since z can
be any of the 5 elements of S except for x and y (because the injectivity
of ( x, y, z) demands z to be distinct from x and y) and since x and y are
already distinct.

Altogether, we have 5 options at the first step, then 4 options at the second
step (no matter which option has been chosen at the first step), and finally

3 options at the third step. Altogether, we can therefore choose our 3-tuple in
5 · 4 · 3 many different ways, because the numbers of options multiply. Here, we
have used a counting rule called “dependent product rule”, which informally
says that if we perform a multi-step construction, and we have

• exactly n1 options in step 1,

• exactly n2 options in step 2,

• . . .,

• exactly nk options in step k,

then the entire construction can be performed in n1 n2 · · · nk many different


ways. We shall not formalize this rule (let alone prove it); the reader can find
rigorous versions of this rule in [Loehr11, §1.8] and in [Newste23, Theorem
8.1.19]. However, we shall next give a more rigorous proof of Theorem 6.6.4,
which uses induction on k instead of this “dependent product rule” (although
the underlying idea is the same).
Rigorous proof of Theorem 6.6.4. Forget that we fixed S and n. We thus must
prove the statement

$$P(k) := \left( \text{“for all } n \in \mathbb{N} \text{ and all } n\text{-element sets } S, \text{ we have } (\text{\# of injective } k\text{-tuples in } S^k) = n (n-1) (n-2) \cdots (n-k+1)\text{”} \right)$$

for each k ∈ N. We shall prove this by induction on k.


Base case: We must prove that P(0) holds. In other words, we must prove
that for all n ∈ N and all n-element sets S, we have

$$(\text{\# of injective 0-tuples in } S^0) = n (n-1) (n-2) \cdots (n-0+1).$$

But this is an easy exercise in understanding emptiness: Let n ∈ N, and let S
be an n-element set. The only 0-tuple in $S^0$ is (), and this 0-tuple is injective.
Thus,

$$(\text{\# of injective 0-tuples in } S^0) = 1.$$

Comparing this with

$$n (n-1) (n-2) \cdots (n-0+1) = (\text{empty product}) = 1,$$

we obtain $(\text{\# of injective 0-tuples in } S^0) = n (n-1) (n-2) \cdots (n-0+1)$. Thus,
P(0) is proved. This completes the base case.


Induction step: Let k be a positive integer. Assume (as the induction hypothe-
sis) that P (k − 1) holds. Our goal is to prove P (k).

We have assumed that P(k − 1) holds. In other words, for all n ∈ N and all
n-element sets S, we have

$$(\text{\# of injective } (k-1)\text{-tuples in } S^{k-1}) = n (n-1) (n-2) \cdots (n-(k-1)+1). \tag{63}$$

Now, let us focus on proving P(k). Thus, we fix an n ∈ N and an n-element
set S. Our goal is then to prove that

$$(\text{\# of injective } k\text{-tuples in } S^k) \overset{?}{=} n (n-1) (n-2) \cdots (n-k+1).$$

(Again, the question mark atop the equality sign reminds us that this is not
proved yet.)
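Let $s_1, s_2, \ldots, s_n$ be the n elements of S (listed without repetition). Then, any
k-tuple in $S^k$ ends82 with exactly one of $s_1, s_2, \ldots, s_n$. Hence, by the sum rule,
we have

$$\begin{aligned}
(\text{\# of injective } k\text{-tuples in } S^k)
&= (\text{\# of injective } k\text{-tuples in } S^k \text{ that end with } s_1) \\
&\qquad + (\text{\# of injective } k\text{-tuples in } S^k \text{ that end with } s_2) \\
&\qquad + \cdots \\
&\qquad + (\text{\# of injective } k\text{-tuples in } S^k \text{ that end with } s_n) \\
&= \sum_{i=1}^{n} (\text{\# of injective } k\text{-tuples in } S^k \text{ that end with } s_i).
\end{aligned} \tag{64}$$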

Now, we shall compute the addends in this sum.


Fix any $i \in [n]$. An injective $k$-tuple in $S^k$ that ends with $s_i$ must have the form
\[
(\ldots, s_i),
\]
where the “$\ldots$” are $k-1$ distinct elements of $S \setminus \{s_i\}$ (not merely of $S$, but actually of $S \setminus \{s_i\}$, because if any of them was $s_i$, then our $k$-tuple would contain the entry $s_i$ twice and thus fail to be injective). In other words, an injective $k$-tuple in $S^k$ that ends with $s_i$ is an injective $(k-1)$-tuple in $(S \setminus \{s_i\})^{k-1}$ followed by the entry $s_i$. Thus, we obtain a map
\begin{align*}
\left\{ \text{injective } k\text{-tuples in } S^k \text{ that end with } s_i \right\} &\to \left\{ \text{injective } (k-1)\text{-tuples in } (S \setminus \{s_i\})^{k-1} \right\}, \\
(\ldots, s_i) &\mapsto (\ldots)
\end{align*}
82 We say that a k-tuple ends with a given element b if b is the last entry of this k-tuple. Note
that every k-tuple does indeed have a last entry, since k is positive.

(which removes the last entry from our $k$-tuple and leaves the other entries as they are)⁸³. Conversely, we have a map
\begin{align*}
\left\{ \text{injective } (k-1)\text{-tuples in } (S \setminus \{s_i\})^{k-1} \right\} &\to \left\{ \text{injective } k\text{-tuples in } S^k \text{ that end with } s_i \right\}, \\
(\ldots) &\mapsto (\ldots, s_i)
\end{align*}
(which inserts an $s_i$ after the end of a $(k-1)$-tuple; the result is still injective⁸⁴)⁸⁵. These two maps are clearly inverses of each other⁸⁶, and thus are bijections. Hence, the bijection principle yields
\[
\left( \text{\# of injective } k\text{-tuples in } S^k \text{ that end with } s_i \right)
= \left( \text{\# of injective } (k-1)\text{-tuples in } (S \setminus \{s_i\})^{k-1} \right).
\]

However, recall our induction hypothesis (63). We have $|S| = n$ (since $S$ is an $n$-element set). Since $s_i$ is an element of $S$, the set $\{s_i\}$ is a subset of $S$. Thus, the difference rule (Theorem 6.1.12 (b)) yields
\[
|S \setminus \{s_i\}| = \underbrace{|S|}_{=n} - \underbrace{|\{s_i\}|}_{=1} = n - 1,
\]
so that $S \setminus \{s_i\}$ is an $(n-1)$-element set, and we have $n - 1 = |S \setminus \{s_i\}| \in \mathbb{N}$. Hence, we can apply (63) to $n-1$ and $S \setminus \{s_i\}$ instead of $n$ and $S$ (because (63) is a “for all $n \in \mathbb{N}$” statement, not just a statement about the specific $n$ that we have fixed right now!). As a result, we obtain
\begin{align*}
\left( \text{\# of injective } (k-1)\text{-tuples in } (S \setminus \{s_i\})^{k-1} \right)
&= (n-1) \underbrace{((n-1)-1)}_{=n-2} \underbrace{((n-1)-2)}_{=n-3} \cdots \underbrace{((n-1)-(k-1)+1)}_{=n-k+1} \\
&= (n-1)(n-2)(n-3)\cdots(n-k+1).
\end{align*}
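Combining what we have found, we obtain
\begin{align*}
\left( \text{\# of injective } k\text{-tuples in } S^k \text{ that end with } s_i \right)
&= \left( \text{\# of injective } (k-1)\text{-tuples in } (S \setminus \{s_i\})^{k-1} \right) \\
&= (n-1)(n-2)(n-3)\cdots(n-k+1).
\end{align*}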
83 For example, if $k = 4$, then this map sends a $k$-tuple $(a, b, c, s_i)$ to $(a, b, c)$.
84 Proof. We must show that if we insert an $s_i$ after the end of an injective $(k-1)$-tuple in $(S \setminus \{s_i\})^{k-1}$, then the result is still injective.
Indeed, the only way this could fail is if the newly inserted entry $s_i$ would already appear in the original $(k-1)$-tuple. However, this is impossible, since the original $(k-1)$-tuple belongs to $(S \setminus \{s_i\})^{k-1}$ and thus cannot contain the entry $s_i$.
85 For example, if $k = 4$, then this map sends a $(k-1)$-tuple $(a, b, c)$ to $(a, b, c, s_i)$.
86 because a $k$-tuple that ends with $s_i$ stays unchanged if we replace its last entry with $s_i$

Now, forget that we fixed $i$. We have thus proved that
\[
\left( \text{\# of injective } k\text{-tuples in } S^k \text{ that end with } s_i \right) = (n-1)(n-2)(n-3)\cdots(n-k+1) \tag{65}
\]
for every $i \in [n]$. Therefore, (64) becomes
\begin{align*}
\left( \text{\# of injective } k\text{-tuples in } S^k \right)
&= \sum_{i=1}^{n} \underbrace{\left( \text{\# of injective } k\text{-tuples in } S^k \text{ that end with } s_i \right)}_{\substack{= (n-1)(n-2)(n-3)\cdots(n-k+1) \\ \text{(by (65))}}} \\
&= \sum_{i=1}^{n} (n-1)(n-2)(n-3)\cdots(n-k+1) \\
&= n \cdot (n-1)(n-2)(n-3)\cdots(n-k+1) \qquad \left( \text{since } \sum_{i=1}^{n} a = na \text{ for any number } a \right) \\
&= n(n-1)(n-2)\cdots(n-k+1).
\end{align*}
Forget that we fixed $n$ and $S$. We thus have proved that for all $n \in \mathbb{N}$ and all $n$-element sets $S$, we have
\[
\left( \text{\# of injective } k\text{-tuples in } S^k \right) = n(n-1)(n-2)\cdots(n-k+1).
\]
In other words, we have proved $P(k)$. Thus, the induction step is complete, and Theorem 6.6.4 is proved.
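As a sanity check, the formula of Theorem 6.6.4 can be compared with a brute-force count for small values. The following is only a sketch, assuming S = {1, 2, ..., n}; the function names are illustrative and not part of the text. The second function mirrors the induction step: it sums over the n possible last entries and recurses on an (n − 1)-element set.

```python
from itertools import product
from math import prod


def count_injective_tuples(n: int, k: int) -> int:
    """Brute force: count k-tuples over {1, ..., n} whose entries are all distinct."""
    elements = range(1, n + 1)
    return sum(1 for t in product(elements, repeat=k) if len(set(t)) == k)


def count_by_last_entry(n: int, k: int) -> int:
    """Mirror the induction: a sum over the n choices of the last entry."""
    if k == 0:
        return 1  # the empty tuple () is the only 0-tuple, and it is injective
    # Each of the n possible last entries leaves an (n - 1)-element set
    # for the remaining k - 1 entries.
    return sum(count_by_last_entry(n - 1, k - 1) for _ in range(n))


def falling_factorial(n: int, k: int) -> int:
    """n (n - 1) (n - 2) ... (n - k + 1), which is an empty product (= 1) for k = 0."""
    return prod(n - j for j in range(k))


for n in range(6):
    for k in range(5):
        expected = falling_factorial(n, k)
        assert count_injective_tuples(n, k) == expected
        assert count_by_last_entry(n, k) == expected
```

Such a check does not replace the proof, but it catches off-by-one mistakes in the last factor n − k + 1.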

6.6.3. Intermezzo: Listing n elements


Theorem 6.6.4 tells us that if $S$ is an $n$-element set, then the # of ways to choose $k$ distinct elements from $S$, if the order matters, is
\[
n(n-1)(n-2)\cdots(n-k+1) = k! \cdot \binom{n}{k}.
\]
In particular, applying this to $k = n$, we conclude that the # of ways to choose $n$ distinct elements from $S$, if the order matters, is
\[
n(n-1)(n-2)\cdots(n-n+1) = n! \cdot \underbrace{\binom{n}{n}}_{\substack{=1 \\ \text{(by Corollary 2.5.7)}}} = n!.
\]

Of course, when we are choosing $n$ distinct elements from an $n$-element set, we are not actually choosing the elements (since all elements have to be chosen⁸⁷); we are only choosing the order in which we list them. So what we have just shown (if somewhat informally) is the following result:

Corollary 6.6.6. Let $n \in \mathbb{N}$. Let $S$ be an $n$-element set. Then, the # of ways to list the $n$ elements of $S$ in some order (that is, the # of $n$-tuples that contain each element of $S$ exactly once) is $n!$.

Example 6.6.7. Applying Corollary 6.6.6 to $n = 3$ and $S = \{1, 2, 3\}$, we see that the # of ways to list the 3 numbers 1, 2, 3 in some order (i.e., the # of 3-tuples that contain each of the numbers 1, 2, 3 exactly once) is $3! = 6$. And indeed, here are these 6 ways:

(1, 2, 3) , (1, 3, 2) , (2, 1, 3) , (2, 3, 1) , (3, 1, 2) , (3, 2, 1) .
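The list in Example 6.6.7 can be reproduced mechanically. Here is a small sketch using Python's standard `itertools.permutations`; the variable name `listings` is just an illustrative choice.

```python
from itertools import permutations
from math import factorial

# All ways to list the elements of S = {1, 2, 3} in some order.
listings = list(permutations([1, 2, 3]))

# Corollary 6.6.6 predicts 3! = 6 of them.
assert len(listings) == factorial(3)

for listing in listings:
    print(listing)
```

For a sorted input, `permutations` produces the listings in lexicographic order, which is the order used in the example.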

Corollary 6.6.6 is one of the reasons why factorials are ubiquitous in com-
binatorics. The n! ways to list the n elements of a given n-element set S are
sometimes called the “permutations” of S, but this name is more frequently
used for the bijective maps from S to S. (The # of the latter maps is also n!, and
the two concepts are closely related. For details, see [Grinbe22, §1.7.4 in Lecture
13]. See also [Grinbe22, Lectures 26–28] for much more about permutations.)

Exercise 6.6.1. (a) How many 7-digit numbers are there? (A “k-digit number”
means a nonnegative integer that has k digits when written in the decimal
system (without leading zeroes). For example, 3902 is a 4-digit number, not
a 5-digit number.)
(b) How many 7-digit numbers are there that have no two equal digits?
(c) How many 7-digit numbers have an even sum of digits?
(d) How many 7-digit numbers are palindromes? (A “palindrome” is a
number such that reading its digits from right to left yields the same number.
For example, 5 and 1331 and 49094 are palindromes.)
[If your answer is a product or power, you do not need to simplify it to a
number.]

6.6.4. Ordered selections with repetition (= with replacement)


We have now solved two variants of our “select k out of n” counting question.
We have two more variants to go: the ones where the k elements are arbitrary
(not necessarily distinct). Again, we have the choice of caring or not caring
about their order.
87 This follows from Theorem 6.1.12 (c).

If we care about their order, then we are just counting all $k$-tuples in $S^k$. The answer to this question is simple:

Theorem 6.6.8. Let $n \in \mathbb{N}$ and $k \in \mathbb{N}$. Let $S$ be an $n$-element set. Then,
\[
\left( \text{\# of all } k\text{-tuples in } S^k \right) = n^k.
\]

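Proof. The set $S$ is an $n$-element set; in other words, $|S| = n$. Now,
\begin{align*}
\left( \text{\# of all } k\text{-tuples in } S^k \right)
&= \left| S^k \right| = \big| \underbrace{S \times S \times \cdots \times S}_{k \text{ times}} \big| \qquad \big( \text{since } S^k \text{ is defined to be } \underbrace{S \times S \times \cdots \times S}_{k \text{ times}} \big) \\
&= \underbrace{|S| \cdot |S| \cdot \cdots \cdot |S|}_{k \text{ times}} \qquad \left( \text{by the product rule for } k \text{ sets (Theorem 6.1.14)} \right) \\
&= |S|^k = n^k \qquad \left( \text{since } |S| = n \right).
\end{align*}
This proves Theorem 6.6.8.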
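For small sets, Theorem 6.6.8 is easy to confirm by enumeration. The sketch below uses `itertools.product` to generate all k-tuples; the helper name `count_all_tuples` is an assumption made for illustration.

```python
from itertools import product


def count_all_tuples(elements, k: int) -> int:
    """Count all k-tuples with entries from the given collection of distinct elements."""
    return sum(1 for _ in product(elements, repeat=k))


for n in range(5):
    for k in range(5):
        # Python evaluates 0 ** 0 as 1, matching the single 0-tuple () over the empty set.
        assert count_all_tuples(range(n), k) == n ** k
```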

Exercise 6.6.2. Let $n \in \mathbb{N}$. Compute the # of 4-tuples $(a, b, c, d) \in [n]^4$ that satisfy $a \leq b < c \leq d$. (Not a typo: the second sign is a $<$, not a $\leq$.)
(Recall that $[n]^4 = [n] \times [n] \times [n] \times [n]$, so that a 4-tuple $(a, b, c, d) \in [n]^4$ means a 4-tuple of integers $a, b, c, d \in \{1, 2, \ldots, n\}$.)

6.6.5. Unordered selections with repetition (= with replacement)


Now only one question remains: What is the # of ways to choose k arbitrary
elements from an n-element set S if we don’t care about their order?
There are several equivalent ways to rigorously define what this means:

1. We can define the notion of a multiset, which is “like a finite set but
allowing an element to be contained multiple times”. This is done, e.g., in
[Grinbe22, §2.9 (Lectures 21–22)] or (in more detail) in [Grinbe19a, §2.11].
Then, a selection of k arbitrary elements from a set S, disregarding the
order, can be formalized as a size-k multisubset of the set S.
2. Alternatively, we can define the notion of an unordered k-tuple, which
is “a k-tuple up to reordering its entries”. Formally, these unordered
k-tuples are defined as the equivalence classes of usual (i.e., ordered) k-
tuples with respect to a certain equivalence relation. (See, e.g., [Grinbe19a,
Example 3.3.24] for the details.) Then, a selection of k arbitrary elements
from a set S, disregarding the order, can be formalized as an unordered
k-tuple of elements of S.

3. Finally, if we restrict ourselves to the case when $S = [n]$ (which case is sufficient for all practical purposes, since we can otherwise rename the elements of $S$ as $1, 2, \ldots, n$), then the following “low-tech” solution becomes available: We say that a $k$-tuple $(i_1, i_2, \ldots, i_k) \in S^k$ is weakly increasing (aka sorted in weakly increasing order) if it satisfies $i_1 \leq i_2 \leq \cdots \leq i_k$. Now, a selection of $k$ arbitrary elements from $S = [n]$, disregarding the order, can be defined as a weakly increasing $k$-tuple in $S^k$ (because if we don’t care about the order of our $k$ elements, then we can just as well sort them in increasing order, and the result of such a sorting operation is clearly unique⁸⁸).
These three definitions yield different objects, but these objects are equiv-
alent, in the sense that there are bijections from each one to each other. In
particular, the # of selections of k arbitrary elements from S (without regard for
their order) does not depend on which way we define these selections. Thus,
when it comes to counting them, we can pick whatever definition we prefer.
Now that all the requisite warnings and disclaimers have been said, we can
finally count these selections:
Theorem 6.6.9. Let $n \in \mathbb{N}$ and $k \in \mathbb{N}$. Let $S$ be an $n$-element set. Then,
\[
\left( \text{\# of all ways to select } k \text{ elements from } S \text{ (if order does not matter)} \right) = \binom{k+n-1}{k}
\]
(where our $k$ elements don’t have to be distinct).

Example 6.6.10. Applying Theorem 6.6.9 to $n = 5$ and $k = 2$ and $S = [5] = \{1, 2, 3, 4, 5\}$, we obtain
\[
\left( \text{\# of all ways to select 2 elements from } [5] \text{ (if order does not matter)} \right) = \binom{2+5-1}{2} = \binom{6}{2} = 15.
\]
And indeed, here are these 15 ways:

(1, 1), (1, 2), (1, 3), (1, 4), (1, 5),
(2, 2), (2, 3), (2, 4), (2, 5),
(3, 3), (3, 4), (3, 5),
(4, 4), (4, 5),
(5, 5).

Here, we have represented each of these selections as a weakly increasing $k$-tuple in $S^k$ (as explained above).
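The 15 selections of Example 6.6.10 can also be generated by machine. The sketch below relies on `itertools.combinations_with_replacement`, which (for a sorted input) yields exactly the weakly increasing k-tuples; the helper name `multiset_selections` is illustrative. The loop starts at n = 1 because Python's `math.comb` rejects a negative top argument, which would arise for n = 0 and k = 0.

```python
from itertools import combinations_with_replacement
from math import comb


def multiset_selections(n: int, k: int) -> list:
    """All weakly increasing k-tuples with entries in {1, ..., n}."""
    return list(combinations_with_replacement(range(1, n + 1), k))


selections = multiset_selections(5, 2)
assert len(selections) == comb(2 + 5 - 1, 2) == 15

# Compare with the formula of Theorem 6.6.9 for several small n and k.
for n in range(1, 6):
    for k in range(5):
        assert len(multiset_selections(n, k)) == comb(k + n - 1, k)
```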
88 Clearly if you believe in common sense. Not so clearly if you want a formal proof. See, e.g., [Grinbe19a, Exercise 2.11.2] for such a proof.

Informal proof of Theorem 6.6.9 (sketched). For the sake of simplicity, we assume that $S = [n]$ (since otherwise, we can rename the $n$ elements of $S$ as $1, 2, \ldots, n$). Then, as we said above, a selection of $k$ arbitrary elements from $S = [n]$ (disregarding the order) can be defined as a weakly increasing $k$-tuple in $S^k$. But a weakly increasing $k$-tuple in $S^k$ must always look as follows:
\[
\Big( \underbrace{1, 1, \ldots, 1}_{a_1 \text{ many 1's}}, \ \underbrace{2, 2, \ldots, 2}_{a_2 \text{ many 2's}}, \ \ldots, \ \underbrace{n, n, \ldots, n}_{a_n \text{ many } n\text{'s}} \Big)
\]
for some numbers $a_1, a_2, \ldots, a_n \in \mathbb{N}$ (in particular, each $a_i$ can be 0, which means that $i$ does not appear in our $k$-tuple) that satisfy $a_1 + a_2 + \cdots + a_n = k$ (because we want a $k$-tuple). Such a $k$-tuple is uniquely determined by these numbers $a_1, a_2, \ldots, a_n$, and conversely, any choice of these numbers $a_1, a_2, \ldots, a_n$ leads to a different $k$-tuple.
Thus, there is a bijection
\begin{align*}
&\text{from } \left\{ \text{weakly increasing } k\text{-tuples in } S^k \right\} \\
&\text{to } \left\{ n\text{-tuples } (a_1, a_2, \ldots, a_n) \in \mathbb{N}^n \text{ satisfying } a_1 + a_2 + \cdots + a_n = k \right\}.
\end{align*}
Hence, the bijection principle yields
\begin{align*}
\left( \text{\# of weakly increasing } k\text{-tuples in } S^k \right)
&= \left( \text{\# of } n\text{-tuples } (a_1, a_2, \ldots, a_n) \in \mathbb{N}^n \text{ satisfying } a_1 + a_2 + \cdots + a_n = k \right) \\
&= \left( \text{\# of weak compositions of } k \text{ into } n \text{ parts} \right) \\
&\qquad \left( \text{since the } n\text{-tuples } (a_1, a_2, \ldots, a_n) \in \mathbb{N}^n \text{ satisfying } a_1 + a_2 + \cdots + a_n = k \text{ are precisely the weak compositions of } k \text{ into } n \text{ parts} \right) \\
&= \binom{k+n-1}{k} \qquad \left( \text{by Theorem 6.5.6, applied to } k \text{ and } n \text{ instead of } n \text{ and } k \right).
\end{align*}
This proves Theorem 6.6.9 (because these weakly increasing $k$-tuples in $S^k$ are the ways to select $k$ elements from $S$ (if order does not matter)).
For a rigorous proof, see [Grinbe19a, Corollary 2.11.3] (but note that the meanings of the letters $n$ and $k$ are switched in [Grinbe19a, Corollary 2.11.3]).
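The bijection used in this informal proof can be made concrete. The following sketch (with illustrative function names, and assuming S = [n]) sends a weakly increasing tuple to its tuple of multiplicities (a_1, ..., a_n) and back, and checks on one small case that the images are exactly the weak compositions of k into n parts.

```python
from collections import Counter
from itertools import combinations_with_replacement, product


def tuple_to_multiplicities(t: tuple, n: int) -> tuple:
    """Send a weakly increasing tuple over {1, ..., n} to (a_1, ..., a_n), where a_i = # of i's."""
    counts = Counter(t)
    return tuple(counts[i] for i in range(1, n + 1))


def multiplicities_to_tuple(a: tuple) -> tuple:
    """Inverse map: write a_1 many 1's, then a_2 many 2's, and so on."""
    return tuple(i for i, a_i in enumerate(a, start=1) for _ in range(a_i))


def weak_compositions(k: int, n: int) -> list:
    """All n-tuples of nonnegative integers with sum k (brute force)."""
    return [a for a in product(range(k + 1), repeat=n) if sum(a) == k]


n, k = 4, 3
sorted_tuples = list(combinations_with_replacement(range(1, n + 1), k))
images = [tuple_to_multiplicities(t, n) for t in sorted_tuples]

# The images are precisely the weak compositions of k into n parts, each hit once.
assert sorted(images) == sorted(weak_compositions(k, n))
# Applying the inverse map recovers the original tuple.
assert all(multiplicities_to_tuple(tuple_to_multiplicities(t, n)) == t for t in sorted_tuples)
```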

Theorem 6.6.9 is our fifth combinatorial interpretation of binomial coefficients so far! Previously, we have seen that they count subsets (Theorem 6.2.4),
lacunar subsets (Theorem 6.4.5), compositions (Theorem 6.5.3) and weak com-
positions (Theorem 6.5.6). This all is not too surprising, since we proved four
of these five theorems using the bijection principle (reducing them to previ-
ously proved theorems), but it is impressive to see so many counting problems
answered by the same family of numbers.

We have now solved all our four selection problems. We now come to a
different counting problem.

6.7. Anagrams and multinomial coefficients


6.7.1. Counting anagrams
An anagram of a given word w means a word that consists of the same letters
as w but possibly in a different order. For example:
• The anagrams of the word “cat” are “act”, “atc”, “cat”, “cta”, “tac” and
“tca”.
• The word “labl” is an anagram of “ball” (and so are several others).
As you see here, we make no distinction between meaningful and meaning-
less words. (Also, being logically coherent at the expense of common sense, we
consider each word w to be an anagram of itself.)
Now, we can take a given word w and ask how many anagrams w has. For
instance:
• How many anagrams does the word “cat” have?
It has six (and we have just listed them above). In fact, we can put the
three letters in any order, and there are 6 possible orders (by Corollary
6.6.6).
• How many anagrams does the word “dud” have?
It has three (“dud”, “ddu”, “udd”). Note that the answer does not directly
follow from Corollary 6.6.6, since two of the three letters are equal.
• How many anagrams does the word “ball” have?
It has 12 of them: In fact, if the two “l”s were two different letters, then it would have 24 anagrams (again by Corollary 6.6.6), but since the two “l”s are the same, these 24 anagrams merge into pairs of equal words (you get “ball” twice, you get “blal” twice, etc.), so the answer is $\frac{24}{2} = 12$.
(Not convinced? Good; it is worth being skeptical about arguments like this. Still, this argument can be made precise and rigorous. See [Loehr11, first proof of Theorem 1.46] for this.)
• How many anagrams does the word “bookkeeper” have?
Too many to list by brute force, and the “divide by 2” technique from
the previous example gets muddled somewhat as there are several equal
letters⁸⁹.
89 Actually, the technique can be salvaged, but this requires some carefulness that I am too lazy for right now. (Once again, see [Loehr11, first proof of Theorem 1.46].)

Thus, let us try a new strategy. The word “bookkeeper” has 10 letters.
Hence, any anagram of it is a 10-letter word as well. Its letters are

1 “b”, 3 “e”s, 2 “k”s, 2 “o”s, 1 “p” and 1 “r”.

In order to choose an anagram of “bookkeeper”, we have to distribute all these letters into 10 positions. In other words, we have to choose which position the 1 “b” will occupy, which positions the 3 “e”s will occupy, and so on. Let us do this step by step:
– We first choose the position of the 1 “b”. There are $\binom{10}{1}$ many options for this, since we need to choose a 1-element subset of the set of all 10 positions.
– We then choose the positions of the 3 “e”s. There are $\binom{9}{3}$ many options for this, since we need to choose a 3-element subset of the set of all 9 positions not already occupied.
– We then choose the positions of the 2 “k”s. There are $\binom{6}{2}$ many options for this, since we need to choose a 2-element subset of the set of all 6 positions not already occupied.
– We then choose the positions of the 2 “o”s. There are $\binom{4}{2}$ many options for this, since we need to choose a 2-element subset of the set of all 4 positions not already occupied.
– We then choose the position of the 1 “p”. There are $\binom{2}{1}$ many options for this, since we need to choose a 1-element subset of the set of all 2 positions not already occupied.
– We then choose the position of the 1 “r”. There are $\binom{1}{1}$ many options for this, since we need to choose a 1-element subset of the set of the 1 position not already occupied.
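By the dependent product rule (see the informal proof of Theorem 6.6.4 above), the total # of ways to perform this construction is therefore
\begin{align*}
\binom{10}{1} \cdot \binom{9}{3} \cdot \binom{6}{2} \cdot \binom{4}{2} \cdot \binom{2}{1} \cdot \binom{1}{1}
&= \frac{10!}{1! \cdot 9!} \cdot \frac{9!}{3! \cdot 6!} \cdot \frac{6!}{2! \cdot 4!} \cdot \frac{4!}{2! \cdot 2!} \cdot \frac{2!}{1! \cdot 1!} \cdot \frac{1!}{1! \cdot 0!} \\
&\qquad \left( \text{by the factorial formula (Theorem 2.5.3)} \right) \\
&= \frac{10!}{1! \cdot 3! \cdot 2! \cdot 2! \cdot 1! \cdot 1! \cdot 0!} \qquad \left( \text{by cancellations} \right) \\
&= \frac{10!}{1! \cdot 3! \cdot 2! \cdot 2! \cdot 1! \cdot 1!} \qquad \left( \text{since } 0! = 1 \right) \\
&= 151\,200.
\end{align*}
Thus, the word “bookkeeper” has $151\,200 = \frac{10!}{1! \cdot 3! \cdot 2! \cdot 2! \cdot 1! \cdot 1!}$ anagrams.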
• How many anagrams does the word “anteater” have?
By the same logic as we just used, it has $\frac{8!}{2! \cdot 2! \cdot 1! \cdot 1! \cdot 2!} = 5\,040$ anagrams.
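The anagram counts from the examples above can be double-checked numerically. The sketch below (function names are illustrative) computes the count from the letter multiplicities, and compares it with a brute-force enumeration for the shorter words.

```python
from collections import Counter
from itertools import permutations
from math import factorial, prod


def count_anagrams(word: str) -> int:
    """(total # of letters)! divided by the product of (multiplicity)! over the distinct letters."""
    multiplicities = Counter(word).values()
    return factorial(len(word)) // prod(factorial(m) for m in multiplicities)


def count_anagrams_brute_force(word: str) -> int:
    """Generate all orderings of the letters and discard duplicates (small words only)."""
    return len(set(permutations(word)))


for word in ["cat", "dud", "ball", "anteater"]:
    assert count_anagrams(word) == count_anagrams_brute_force(word)

print(count_anagrams("bookkeeper"))
```

The brute-force function is deliberately not applied to “bookkeeper”, since it would first build all 10! orderings.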
The same argument works in the general case:
Theorem 6.7.1. Let $s_1, s_2, \ldots, s_n$ be $n$ distinct objects, and let $a_1, a_2, \ldots, a_n$ be $n$ nonnegative integers. Then, the # of tuples that consist of

$a_1$ copies of $s_1$,
$a_2$ copies of $s_2$,
...,
$a_n$ copies of $s_n$

is
\[
\frac{(a_1 + a_2 + \cdots + a_n)!}{a_1! \cdot a_2! \cdot \cdots \cdot a_n!} = \prod_{k=1}^{n} \binom{a_k + a_{k+1} + \cdots + a_n}{a_k}.
\]

Informal proof (sketched). Follow the same logic as we used for “bookkeeper” above. To construct such a tuple, we
• first choose the positions for the $a_1$ many $s_1$’s among its entries (there are $\binom{a_1 + a_2 + \cdots + a_n}{a_1}$ many options for this);
• then choose the positions for the $a_2$ many $s_2$’s among its entries (there are $\binom{a_2 + a_3 + \cdots + a_n}{a_2}$ many options for this);
• then choose the positions for the $a_3$ many $s_3$’s among its entries (there are $\binom{a_3 + a_4 + \cdots + a_n}{a_3}$ many options for this);
• and so on, until finally choosing the positions for the $a_n$ many $s_n$’s among its entries (there are $\binom{a_n}{a_n}$ many options for this).

By the dependent product rule, the total # of such tuples is therefore
\begin{align*}
&\binom{a_1 + a_2 + \cdots + a_n}{a_1} \binom{a_2 + a_3 + \cdots + a_n}{a_2} \binom{a_3 + a_4 + \cdots + a_n}{a_3} \cdots \binom{a_n}{a_n} \\
&= \prod_{k=1}^{n} \binom{a_k + a_{k+1} + \cdots + a_n}{a_k} \\
&= \prod_{k=1}^{n} \frac{(a_k + a_{k+1} + \cdots + a_n)!}{a_k! \, ((a_k + a_{k+1} + \cdots + a_n) - a_k)!} \qquad \left( \text{by the factorial formula (Theorem 2.5.3)} \right) \\
&= \prod_{k=1}^{n} \frac{(a_k + a_{k+1} + \cdots + a_n)!}{a_k! \, (a_{k+1} + a_{k+2} + \cdots + a_n)!} \\
&= \frac{\prod_{k=1}^{n} (a_k + a_{k+1} + \cdots + a_n)!}{\left( \prod_{k=1}^{n} a_k! \right) \left( \prod_{k=1}^{n} (a_{k+1} + a_{k+2} + \cdots + a_n)! \right)} \\
&= \frac{(a_1 + a_2 + \cdots + a_n)! \cdot (a_2 + a_3 + \cdots + a_n)! \cdot \cdots \cdot a_n!}{\left( \prod_{k=1}^{n} a_k! \right) \left( (a_2 + a_3 + \cdots + a_n)! \cdot (a_3 + a_4 + \cdots + a_n)! \cdot \cdots \cdot a_n! \cdot 0! \right)} \\
&= \frac{(a_1 + a_2 + \cdots + a_n)!}{\left( \prod_{k=1}^{n} a_k! \right) \cdot 0!} \qquad \left( \text{here, we have cancelled factors that appear both in the numerator and the denominator} \right) \\
&= \frac{(a_1 + a_2 + \cdots + a_n)!}{\prod_{k=1}^{n} a_k!} \qquad \left( \text{since } 0! = 1 \right) \\
&= \frac{(a_1 + a_2 + \cdots + a_n)!}{a_1! \cdot a_2! \cdot \cdots \cdot a_n!}.
\end{align*}
This proves Theorem 6.7.1.
(For a rigorous proof, see [Grinbe19a, Proposition 2.12.13]. Note that the objects $s_1, s_2, \ldots, s_n$ are required to be $1, 2, \ldots, n$ in [Grinbe19a, Proposition 2.12.13], but this makes no serious difference, since we can always rename them at will.)
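The equality of the two expressions in Theorem 6.7.1 can be checked numerically. In this sketch (with illustrative function names), the second function follows the step-by-step construction from the informal proof: it keeps track of how many positions remain free and multiplies the corresponding binomial coefficients.

```python
from itertools import product
from math import comb, factorial, prod


def multinomial_via_factorials(a: tuple) -> int:
    """(a_1 + ... + a_n)! / (a_1! * ... * a_n!)."""
    return factorial(sum(a)) // prod(factorial(a_k) for a_k in a)


def multinomial_via_binomials(a: tuple) -> int:
    """Product over k of binom(a_k + a_{k+1} + ... + a_n, a_k)."""
    result = 1
    remaining = sum(a)  # positions not yet occupied
    for a_k in a:
        result *= comb(remaining, a_k)  # choose positions for the a_k copies of s_k
        remaining -= a_k
    return result


# The letter multiplicities of "bookkeeper": 1 b, 3 e, 2 k, 2 o, 1 p, 1 r.
assert multinomial_via_binomials((1, 3, 2, 2, 1, 1)) == 151200

# Both expressions agree on all small inputs with three parameters.
for a in product(range(4), repeat=3):
    assert multinomial_via_factorials(a) == multinomial_via_binomials(a)
```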

Remark 6.7.2. We can now answer the question “how many prime factorizations does a given number have?” from Subsection 3.6.12. For example, consider the number $600 = 2^3 \cdot 3 \cdot 5^2$. A prime factorization of 600 is a tuple that consists of three 2’s, one 3 and two 5’s, in an arbitrary order. Thus, the # of such prime factorizations is $\frac{6!}{3! \cdot 1! \cdot 2!}$ (by Theorem 6.7.1). Similarly, we can proceed for any positive integer instead of 600.
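For the number 600, the expression in Remark 6.7.2 evaluates to 720 / 12 = 60. The sketch below carries out the same computation for an arbitrary positive integer, assuming (as in the remark) that a prime factorization is a tuple of primes in which order matters. The trial-division helper and both function names are illustrative choices, not taken from the text.

```python
from collections import Counter
from math import factorial, prod


def prime_factors(n: int) -> list:
    """Primes dividing n, listed with multiplicity in increasing order (trial division)."""
    factors, p = [], 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)  # whatever is left is itself a prime
    return factors


def count_prime_factorizations(n: int) -> int:
    """# of tuples of primes with product n, via the formula of Theorem 6.7.1."""
    factors = prime_factors(n)
    multiplicities = Counter(factors).values()
    return factorial(len(factors)) // prod(factorial(m) for m in multiplicities)


print(prime_factors(600), count_prime_factorizations(600))
```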

6.7.2. Multinomial coefficients


The number
\[
\frac{(a_1 + a_2 + \cdots + a_n)!}{a_1! \cdot a_2! \cdot \cdots \cdot a_n!}
\]
in Theorem 6.7.1 has a name: It is called a multinomial coefficient. By Theorem 6.7.1, it is an integer (since it counts something), and can be rewritten as $\prod_{k=1}^{n} \binom{a_k + a_{k+1} + \cdots + a_n}{a_k}$. Note that for $n = 2$, it becomes a binomial coefficient:
\[
\frac{(a+b)!}{a! \cdot b!} = \binom{a+b}{a}.
\]
Multinomial coefficients have some further properties. There is a standard notation for them: Namely, if $a_1, a_2, \ldots, a_n \in \mathbb{N}$ are any nonnegative integers, and if we set $b = a_1 + a_2 + \cdots + a_n$, then the multinomial coefficient
\[
\frac{(a_1 + a_2 + \cdots + a_n)!}{a_1! \cdot a_2! \cdot \cdots \cdot a_n!} = \frac{b!}{a_1! \cdot a_2! \cdot \cdots \cdot a_n!}
\]
is denoted by
\[
\binom{b}{a_1, a_2, \ldots, a_n}.
\]
As already mentioned, multinomial coefficients generalize the binomial coefficients that are found in Pascal’s triangle: With our new notation, a binomial coefficient $\binom{n}{k}$ with $n \in \mathbb{N}$ and $k \in \{0, 1, \ldots, n\}$ equals the multinomial coefficient $\binom{n}{k, n-k}$. Pascal’s identity (Theorem 2.5.1, at least for $n > 0$ and $k \in \{0, 1, \ldots, n\}$) thus can be rewritten as
\[
\binom{b}{a_1, a_2} = \binom{b-1}{a_1 - 1, a_2} + \binom{b-1}{a_1, a_2 - 1}
\qquad \text{for } b > 0 \text{ and } a_1, a_2 \in \mathbb{N} \text{ with } a_1 + a_2 = b,
\]
where we agree to interpret a multinomial coefficient with a negative number at the bottom to mean 0. An analogue of this identity holds for multinomial coefficients with more parameters:

Theorem 6.7.3 (Recurrence of the multinomial coefficients). Let $b \in \mathbb{N}$ and $a_1, a_2, \ldots, a_n \in \mathbb{N}$ be such that $a_1 + a_2 + \cdots + a_n = b > 0$. Then,
\[
\binom{b}{a_1, a_2, \ldots, a_n} = \sum_{i=1}^{n} \underbrace{\binom{b-1}{a_1, \ldots, a_{i-1}, a_i - 1, a_{i+1}, \ldots, a_n}}_{\text{This should be interpreted as } 0 \text{ if } a_i = 0}.
\]

Proof. Nice and fairly easy exercise! (See [Grinbe19a, Exercise 2.12.6] for a proof.)

Just like the binomial coefficients $\binom{n}{k}$ with $n \in \mathbb{N}$ and $k \in \{0, 1, \ldots, n\}$ can be arranged into Pascal’s triangle, the multinomial coefficients $\binom{b}{a_1, a_2, \ldots, a_n}$ (for a given $n$) can be arranged into an $n$-dimensional analogue of Pascal’s triangle, called Pascal’s simplex (or, for $n = 3$, Pascal’s pyramid). Theorem 6.7.3 then says that each entry in this simplex (except for the 1 at the apex) is the sum of its $n$ adjacent entries just above it.
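The recurrence of Theorem 6.7.3 is easy to test on small inputs. The following sketch uses the convention from the text that a multinomial coefficient with a negative bottom entry is 0; the function names are illustrative.

```python
from itertools import product
from math import factorial, prod


def multinomial(a: tuple) -> int:
    """Multinomial coefficient with top entry sum(a); 0 if some bottom entry is negative."""
    if any(a_i < 0 for a_i in a):
        return 0
    return factorial(sum(a)) // prod(factorial(a_i) for a_i in a)


def recurrence_rhs(a: tuple) -> int:
    """Sum over i of the coefficient obtained by decreasing the i-th bottom entry by 1."""
    return sum(
        multinomial(a[:i] + (a[i] - 1,) + a[i + 1:])
        for i in range(len(a))
    )


# Check the recurrence for all (a_1, a_2, a_3) with entries below 5 and positive sum.
for a in product(range(5), repeat=3):
    if sum(a) > 0:
        assert multinomial(a) == recurrence_rhs(a)
```

For n = 2, this is just Pascal's identity in the rewritten form given above.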
Multinomial coefficients owe their name to another fundamental property
they satisfy: a generalization of the binomial formula, called the multinomial
formula:

Theorem 6.7.4 (the multinomial formula). Let $x_1, x_2, \ldots, x_n$ be $n$ numbers. Let $b \in \mathbb{N}$. Then,
\[
(x_1 + x_2 + \cdots + x_n)^b = \sum_{\substack{(a_1, a_2, \ldots, a_n) \in \mathbb{N}^n; \\ a_1 + a_2 + \cdots + a_n = b}} \binom{b}{a_1, a_2, \ldots, a_n} x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n}.
\]

Proof. See [Grinbe19a, Theorem 2.12.17] (which gives two references). Here is
the simplest proof in a nutshell:
We expand $(x_1 + x_2 + \cdots + x_n)^b$ and collect equal terms. For instance, if $n = 2$ and $b = 3$, then
\begin{align*}
(x_1 + x_2 + \cdots + x_n)^b
&= (x_1 + x_2)^3 \\
&= (x_1 + x_2)(x_1 + x_2)(x_1 + x_2) \\
&= x_1 x_1 x_1 + x_1 x_1 x_2 + x_1 x_2 x_1 + x_1 x_2 x_2 + x_2 x_1 x_1 + x_2 x_1 x_2 + x_2 x_2 x_1 + x_2 x_2 x_2 \\
&= x_1^3 + 3 x_1^2 x_2 + 3 x_1 x_2^2 + x_2^3.
\end{align*}

What terms do we get for general $n$ and $b$? Well, if we expand the product
\[
(x_1 + x_2 + \cdots + x_n)^b = \underbrace{(x_1 + x_2 + \cdots + x_n)(x_1 + x_2 + \cdots + x_n) \cdots (x_1 + x_2 + \cdots + x_n)}_{b \text{ times}},
\]
then we obtain the sum of all $n^b$ possible products of the form
\[
x_{i_1} x_{i_2} \cdots x_{i_b} \qquad \text{with } i_1, i_2, \ldots, i_b \in [n].
\]
Each such product can be rewritten as the monomial $x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n}$, where $a_1$ is the # of 1’s in the $b$-tuple $(i_1, i_2, \ldots, i_b)$, where $a_2$ is the # of 2’s in this $b$-tuple, and so on. Moreover, this monomial satisfies $a_1 + a_2 + \cdots + a_n = b$, since the total # of entries of the $b$-tuple $(i_1, i_2, \ldots, i_b)$ is $b$.
Thus, expanding $(x_1 + x_2 + \cdots + x_n)^b$, we obtain a sum of monomials of the form $x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n}$ with $a_1 + a_2 + \cdots + a_n = b$, but each such monomial can appear several times in this sum. The total # of copies of a given monomial $x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n}$ that appear in this sum equals the # of all $b$-tuples that consist of
$a_1$ copies of 1,
$a_2$ copies of 2,
...,
$a_n$ copies of $n$
(because of the previous paragraph). But this latter # equals
\begin{align*}
\frac{(a_1 + a_2 + \cdots + a_n)!}{a_1! \cdot a_2! \cdot \cdots \cdot a_n!} \qquad &\left( \text{by Theorem 6.7.1} \right) \\
= \frac{b!}{a_1! \cdot a_2! \cdot \cdots \cdot a_n!} \qquad &\left( \text{since } a_1 + a_2 + \cdots + a_n = b \text{ (because our } b\text{-tuple } (i_1, i_2, \ldots, i_b) \text{ has } b \text{ entries in total)} \right) \\
= \binom{b}{a_1, a_2, \ldots, a_n} \qquad &\left( \text{by the definition of } \binom{b}{a_1, a_2, \ldots, a_n} \right).
\end{align*}
Thus, each monomial $x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n}$ with $a_1 + a_2 + \cdots + a_n = b$ appears exactly $\binom{b}{a_1, a_2, \ldots, a_n}$ times in the sum that we obtain by expanding $(x_1 + x_2 + \cdots + x_n)^b$. Collecting all copies of each monomial in this expansion, we thus obtain
\[
(x_1 + x_2 + \cdots + x_n)^b = \sum_{\substack{(a_1, a_2, \ldots, a_n) \in \mathbb{N}^n; \\ a_1 + a_2 + \cdots + a_n = b}} \binom{b}{a_1, a_2, \ldots, a_n} x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n}.
\]
This proves Theorem 6.7.4.
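The “expand and collect” argument can be imitated directly. The sketch below (with an illustrative function name) runs over all n^b index tuples (i_1, ..., i_b), records the exponent vector (a_1, ..., a_n) of the resulting monomial, and then compares the number of copies of each monomial with the multinomial coefficient.

```python
from collections import Counter
from itertools import product
from math import factorial, prod


def expand_and_collect(n: int, b: int) -> Counter:
    """Map each exponent vector (a_1, ..., a_n) to the # of b-tuples producing that monomial."""
    copies = Counter()
    for indices in product(range(1, n + 1), repeat=b):
        # a_i = # of entries of the b-tuple that are equal to i
        exponents = tuple(indices.count(i) for i in range(1, n + 1))
        copies[exponents] += 1
    return copies


# n = 2, b = 3 reproduces the coefficients 1, 3, 3, 1 of the worked example.
print(expand_and_collect(2, 3))

# For n = 3, b = 4, every monomial appears exactly "multinomial coefficient" many times.
for exponents, count in expand_and_collect(3, 4).items():
    assert count == factorial(4) // prod(factorial(a_i) for a_i in exponents)
```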


We note that this yields a new proof of the binomial formula (Theorem 2.6.1),
since the latter formula is the particular case of Theorem 6.7.4 for n = 2.

Remark 6.7.5. We note that Theorem 6.7.3 can be used to give a second proof of Theorem 6.7.1. Here is a rough outline of this proof:
A tuple that consists of

$a_1$ copies of $s_1$,
$a_2$ copies of $s_2$,
...,
$a_n$ copies of $s_n$

will be called an $\begin{pmatrix} s_1 & s_2 & \cdots & s_n \\ a_1 & a_2 & \cdots & a_n \end{pmatrix}$-tuple. Thus, Theorem 6.7.1 is claiming that the # of $\begin{pmatrix} s_1 & s_2 & \cdots & s_n \\ a_1 & a_2 & \cdots & a_n \end{pmatrix}$-tuples is $\binom{b}{a_1, a_2, \ldots, a_n}$, where $b := a_1 + a_2 + \cdots + a_n$.
We shall now prove this by induction on $b$. The base case ($b = 0$) is trivial (since $b = 0$ entails $a_1 = a_2 = \cdots = a_n = 0$, so we are counting 0-tuples). In the induction step (from $b - 1$ to $b$), we separate the $\begin{pmatrix} s_1 & s_2 & \cdots & s_n \\ a_1 & a_2 & \cdots & a_n \end{pmatrix}$-tuples according to their last entry (just as in our above rigorous proof of Theorem 6.6.4). This last entry is either $s_1$ or $s_2$ or $\cdots$ or $s_n$. Hence, the sum rule yields
\begin{align*}
&\left( \text{\# of } \begin{pmatrix} s_1 & s_2 & \cdots & s_n \\ a_1 & a_2 & \cdots & a_n \end{pmatrix}\text{-tuples} \right) \\
&= \sum_{i=1}^{n} \left( \text{\# of } \begin{pmatrix} s_1 & s_2 & \cdots & s_n \\ a_1 & a_2 & \cdots & a_n \end{pmatrix}\text{-tuples that end with } s_i \right) \\
&= \sum_{i=1}^{n} \left( \text{\# of } \begin{pmatrix} s_1 & s_2 & \cdots & s_{i-1} & s_i & s_{i+1} & \cdots & s_n \\ a_1 & a_2 & \cdots & a_{i-1} & a_i - 1 & a_{i+1} & \cdots & a_n \end{pmatrix}\text{-tuples} \right) \\
&\qquad \left( \text{by a bijection argument, just as in the proof of Theorem 6.6.4, using the bijection that removes the last entry from a tuple} \right) \\
&= \sum_{i=1}^{n} \binom{b-1}{a_1, \ldots, a_{i-1}, a_i - 1, a_{i+1}, \ldots, a_n} \\
&\qquad \left( \text{by the induction hypothesis if } a_i > 0, \text{ and for obvious reasons if } a_i = 0 \right) \\
&= \binom{b}{a_1, a_2, \ldots, a_n} \qquad \left( \text{by Theorem 6.7.3} \right),
\end{align*}
which completes the induction step. This proof is less conceptual than the proof we sketched above, but it is easier to formalize, since it does not use the dependent product rule.

6.8. More counting problems


Exercise 6.8.1. Let $n, m \in \mathbb{N}$. Let $X$ be an $n$-element set. Let $Y$ be an $m$-element set. Let $f : X \to Y$ be an injective map. Prove that $f$ has exactly $n^{m-n}$ many left inverses.
If $S$ is any set, and $n$ is any nonnegative integer, then the Cartesian product $\underbrace{S \times S \times \cdots \times S}_{n \text{ times}}$ is denoted by $S^n$. For example, $S^3 = S \times S \times S$.
Recall that a $k$-tuple $(i_1, i_2, \ldots, i_k)$ is called injective if its $k$ entries $i_1, i_2, \ldots, i_k$ are all distinct (i.e., if $i_a \neq i_b$ for all $a \neq b$).
Exercise 6.8.2. Let $n \in \mathbb{N}$. How many injective $(2n)$-tuples $(i_1, i_2, \ldots, i_{2n}) \in [2n]^{2n}$ are there such that all of the first $n$ entries $i_1, i_2, \ldots, i_n$ are even?
(For instance, for $n = 2$, there are 4 such tuples: $(2, 4, 1, 3)$, $(2, 4, 3, 1)$, $(4, 2, 1, 3)$ and $(4, 2, 3, 1)$.)

Exercise 6.8.3. Let $n \geq 2$ be an integer.
(a) How many injective $n$-tuples $(i_1, i_2, \ldots, i_n) \in [n]^n$ begin with the entry 2?
(b) How many injective $n$-tuples $(i_1, i_2, \ldots, i_n) \in [n]^n$ contain the entry 1 before the entry 2? (“Before” means “somewhere to the left of”, not necessarily “immediately before”. For instance, for $n = 4$, the 4-tuple $(1, 3, 2, 4)$ qualifies, but the 4-tuple $(2, 3, 1, 4)$ does not.)
(c) How many injective $n$-tuples $(i_1, i_2, \ldots, i_n) \in [n]^n$ contain the entry 1 immediately preceding the entry 2? (Here, $(1, 3, 2, 4)$ no longer qualifies, but $(4, 1, 2, 3)$ does.)
If h : S → S is any map from a set to itself, then a fixed point of h means an
element s ∈ S satisfying h (s) = s. The set of all fixed points of h will be called
Fix h.
Exercise 6.8.4. Let $X$ and $Y$ be two finite sets (not necessarily of the same size).
Let $f : X \to Y$ and $g : Y \to X$ be two maps. Prove that
\[
|\operatorname{Fix}(f \circ g)| = |\operatorname{Fix}(g \circ f)|.
\]
[Hint: Show that $f(x) \in \operatorname{Fix}(f \circ g)$ for each $x \in \operatorname{Fix}(g \circ f)$. Thus, there is a map
\begin{align*}
f' : \operatorname{Fix}(g \circ f) &\to \operatorname{Fix}(f \circ g), \\
x &\mapsto f(x).
\end{align*}
Construct a similar map $g'$ in the opposite direction. Prove that these two maps $f'$ and $g'$ are inverse to each other.]

Now, recall Exercise 6.4.1. In that exercise, we decided to call a set $S$ of integers pseudolacunar if no two elements $s, t$ of $S$ satisfy $|s - t| = 2$. We denoted the # of pseudolacunar subsets of $[n]$ (for a given $n \in \mathbb{N}$) by $p_n$.
Recall also the Fibonacci sequence $(f_0, f_1, f_2, \ldots)$ that we introduced in Definition 1.5.1, and the floor function introduced in Definition 3.3.13.

Exercise 6.8.5. Prove that
\[
p_n = f_{\lfloor (n+1)/2 \rfloor + 2} \cdot f_{\lfloor n/2 \rfloor + 2} \qquad \text{for each } n \geq 2.
\]
[Hint: What does the pseudolacunarity of a set $S$ mean for the even elements of $S$? What does it mean for the odd elements of $S$?]

Exercise 6.8.6. Let $n \in \mathbb{N}$. An $n$-bitstring shall mean an $n$-tuple $(a_1, a_2, \ldots, a_n) \in \{0, 1\}^n$ (that is, an $n$-tuple of 0’s and 1’s). The product rule shows that there are $2^n$ many $n$-bitstrings. (For example, $(1, 1, 0, 1)$ is a 4-bitstring.)

(a) An $n$-bitstring $(a_1, a_2, \ldots, a_n)$ is said to be lacunar if it contains no two consecutive 1’s (that is, there exists no $i \in \{1, 2, \ldots, n-1\}$ such that $a_i = a_{i+1} = 1$). How many lacunar $n$-bitstrings are there?
[Example: The bitstring $(0, 1, 0, 0, 1)$ is lacunar, but the bitstring $(0, 0, 1, 1, 0)$ is not.]
(b) An $n$-bitstring $(a_1, a_2, \ldots, a_n)$ is said to be slow if it contains no entry that differs from both its neighbors (i.e., there exists no $i \in \{2, 3, \ldots, n-1\}$ such that $a_i$ is distinct from both $a_{i-1}$ and $a_{i+1}$). How many slow $n$-bitstrings are there?
[Example: The bitstring $(0, 0, 1, 1, 0)$ is slow, but the bitstring $(0, 0, 1, 0, 0)$ is not.]

6.9. The pigeonhole principles


While studying maps, you might have observed an intuitively obvious fact: A
map f : X → Y between two finite sets cannot be injective if | X | > |Y | (since
there are “too many arrows” to hit each element of Y only once), and cannot be
surjective if | X | < |Y | (since there are “not enough arrows” to hit each element
of Y). This is indeed true, and a bit more can be said:

Theorem 6.9.1 (pigeonhole principles for maps). Let X and Y be two finite
sets. Let f : X → Y be a map. Then:
(a) If | X | > |Y |, then f cannot be injective.
(b) If f is injective and | X | = |Y |, then f is bijective.

(c) If | X | < |Y |, then f cannot be surjective.


(d) If f is surjective and | X | = |Y |, then f is bijective.

Theorem 6.9.1 is known as the pigeonhole principle (or principles) because
of a traditional way to state it in terms of pigeons and pigeonholes. For
example, part (a) says that if n pigeons are placed in m pigeonholes where n > m,
then there are (at least) two pigeons in the same hole. (Here, the pigeons are
the elements of X, the pigeonholes are the elements of Y, and the assignment of
a hole to each pigeon is the map f.) Similarly, the other three parts of Theorem
6.9.1 can be reformulated. All parts of Theorem 6.9.1 are intuitively obvious⁹⁰,
but surprisingly useful (see, e.g., [Grinbe23b, Worksheet 3] for multiple
applications).
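
The four parts of Theorem 6.9.1 can be checked experimentally on small finite sets. The Python sketch below enumerates every map f : X → Y for X = {0, ..., m − 1} and Y = {0, ..., k − 1} and tests each claim. The function names (all_maps, is_injective, is_surjective, check_pigeonhole) and the size bound max_size are assumptions made for this illustration; such a check is of course no substitute for a proof.

```python
from itertools import product


def all_maps(X, Y):
    """Yield every map f: X -> Y, represented as a dict from X to Y."""
    X, Y = list(X), list(Y)
    for values in product(Y, repeat=len(X)):
        yield dict(zip(X, values))


def is_injective(f):
    """A map is injective if distinct inputs have distinct outputs."""
    return len(set(f.values())) == len(f)


def is_surjective(f, Y):
    """A map is surjective if every element of Y is taken as a value."""
    return set(f.values()) == set(Y)


def check_pigeonhole(max_size=4):
    """Test parts (a)-(d) of the theorem for all |X|, |Y| <= max_size."""
    for m in range(max_size + 1):
        for k in range(max_size + 1):
            X, Y = range(m), range(k)
            for f in all_maps(X, Y):
                inj, sur = is_injective(f), is_surjective(f, Y)
                if m > k:
                    assert not inj      # part (a)
                if m < k:
                    assert not sur      # part (c)
                if m == k:
                    assert inj == sur   # parts (b) and (d)
    return True


if __name__ == "__main__":
    print(check_pigeonhole())
```

When |X| = |Y|, parts (b) and (d) together say that injectivity and surjectivity are equivalent, which is why the sketch tests them with a single comparison.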

⁹⁰ But beware of extending your intuition to infinite sets! It is easy to construct an injective but
not surjective map f : N → N (for example, the map that sends each n to n + 1).

7. (TODO) An introduction to combinatorial games
In this chapter, we will explore the beginnings of combinatorial game theory
– a subject that is among the most exotic in this course, yet highly elementary
and concrete. It is also full of surprises.
I will only scratch the surface of this by now extensive field. A readable
textbook is [AlNoWo19], and an introduction that goes beyond the present
notes is [KarPer16, Chapter 1].

7.1. (TODO) Let’s play a game


TODO

7.2. (TODO) The concept of a combinatorial game


TODO

7.3. (TODO) Zermelo’s theorem


TODO

7.4. (TODO) Nim


TODO

7.5. (TODO) Wythoff’s game


TODO

7.6. (TODO) Symmetry, strategy stealing and other tricks


TODO

7.7. (TODO) Games with payoffs


TODO

References
[AlNoWo19] Michael H. Albert, Richard J. Nowakowski, David Wolfe, Lessons
in Play: An Introduction to Combinatorial Game Theory, 2nd edition,
CRC Press 2019.

[AndCri17] Titu Andreescu, Vlad Crisan, Mathematical Induction: A powerful
and elegant method of proof, XYZ Press 2017.

[AndFen04] Titu Andreescu, Zuming Feng, A Path to Combinatorics for
Undergraduates: Counting Strategies, Springer 2004.

[BaEdHa18] Mohamed Barakat, Christian Eder, Timo Hanke, An Introduction
to Cryptography, 20 September 2018.
https://agag-ederc.math.rptu.de/~ederc/download/Cryptography.pdf

[Beutel94] Albrecht Beutelspacher, Cryptology, MAA Spectrum, MAA 1994.

[BoyVan18] Stephen Boyd, Lieven Vandenberghe, Introduction to Applied Linear
Algebra: Vectors, Matrices, and Least Squares, Cambridge University
Press 2018.
https://web.stanford.edu/~boyd/vmls/vmls.pdf

[Buchma04] Johannes A. Buchmann, Introduction to Cryptography, 2nd edition,
Springer 2004.
See https://web.archive.org/web/20210514033344/https://www.springer.com/cda/content/document/cda_downloaddocument/9780387207568-e1.pdf?SGWID=0-0-45-148352-p27166260
for errata.

[Conrad22] Keith Conrad, The Infinitude of the Primes, 24 April 2023.
https://kconrad.math.uconn.edu/blurbs/ugradnumthy/infinitudeofprimes.pdf

[Day16] Martin V. Day, An Introduction to Proofs and the Mathematical
Vernacular, 7 December 2016.
https://web.archive.org/web/20180712152432/https://www.math.vt.edu/people/day/ProofsBook/IPaMV.pdf

[GrKnPa94] Ronald L. Graham, Donald E. Knuth, Oren Patashnik, Concrete
Mathematics, Second Edition, Addison-Wesley 1994.
See https://www-cs-faculty.stanford.edu/~knuth/gkp.html for errata.

[Grinbe15] Darij Grinberg, Notes on the combinatorial fundamentals of algebra,
15 September 2022, arXiv:2008.09862v3.

[Grinbe17] Darij Grinberg, UMN Fall 2017 Math 4707 & Math 4990 homework set
#2 with solutions.
http://www.cip.ifi.lmu.de/~grinberg/t/17f/hw2s.pdf

[Grinbe19a] Darij Grinberg, Enumerative Combinatorics: class notes,
13 September 2022.
http://www.cip.ifi.lmu.de/~grinberg/t/19fco/n/n.pdf
Also available on the mirror server
http://darijgrinberg.gitlab.io/t/19fco/n/n.pdf

[Grinbe19b] Darij Grinberg, Introduction to Modern Algebra (UMN Spring 2019
Math 4281 notes), 29 June 2019.
http://www.cip.ifi.lmu.de/~grinberg/t/19s/notes.pdf

[Grinbe19c] Darij Grinberg, Drexel Fall 2019 Math 222 homework set #0 with
solutions.
http://www.cip.ifi.lmu.de/~grinberg/t/19fco/hw0s.pdf

[Grinbe20] Darij Grinberg, Math 235: Mathematical Problem Solving, 10 August
2021.
https://www.cip.ifi.lmu.de/~grinberg/t/20f/mps.pdf

[Grinbe21] Darij Grinberg, Math 235 Fall 2021, Worksheet 5: p-valuations,
29 December 2021.
https://www.cip.ifi.lmu.de/~grinberg/t/21f/lec5.pdf

[Grinbe22] Darij Grinberg, Math 222: Enumerative Combinatorics, Fall 2022.
https://www.cip.ifi.lmu.de/~grinberg/t/22fco/

[Grinbe23a] Darij Grinberg, An introduction to graph theory (Text for Math 530 in
Spring 2022 at Drexel University), arXiv:2308.04512v1.

[Grinbe23b] Darij Grinberg, Math 235: Mathematical Problem Solving, Fall 2023,
worksheets.
https://www.cip.ifi.lmu.de/~grinberg/t/23f/

[Guicha20] David Guichard, An Introduction to Combinatorics and Graph Theory,
4 March 2023.
https://www.whitman.edu/mathematics/cgt_online/book/

[Gunder10] David S. Gunderson, Handbook of Mathematical Induction: Theory
and Applications, CRC Press 2010.

[Hackma09] Peter Hackman, Elementary Number Theory, 1 November 2009.
https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=d345c68fcb70874805be1f100a82d6a0c8256b6d

[HoPiSi14] Jeffrey Hoffstein, Jill Pipher, Joseph H. Silverman, An Introduction
to Mathematical Cryptography, 2nd edition, Springer 2014.
See https://www.math.brown.edu/johsilve/MathCrypto/MathCryptoErrata2ndEd.pdf
for errata.

[KarPer16] Anna R. Karlin, Yuval Peres, Game Theory, Alive, 13 December
2016.
https://homes.cs.washington.edu/~karlin/GameTheoryBook.pdf

[KraWas15] James S. Kraft, Lawrence C. Washington, Elementary Number
Theory, CRC Press 2015.
See https://www.math.umd.edu/~lcw/ENTerrata.pdf for errata.

[KraWas18] James S. Kraft, Lawrence C. Washington, An Introduction to Number
Theory with Cryptography, 2nd edition, CRC Press 2018.
See https://www.math.umd.edu/~lcw/NT2ndErrata.pdf for errata.

[LeLeMe16] Eric Lehman, F. Thomson Leighton, Albert R. Meyer, Mathematics
for Computer Science, revised Tuesday 6th June 2018.
https://courses.csail.mit.edu/6.042/spring18/mcs.pdf

[Levin21] Oscar Levin, Discrete Mathematics: An Open Introduction, 3rd
edition 2021.
https://discrete.openmathbooks.org/dmoi3.html

[Loehr11] Nicholas A. Loehr, Bijective Combinatorics, Chapman & Hall/CRC
2011.

[Martin17] Kimball Martin, An (algebraic) introduction to Number Theory, Fall
2017, December 25, 2017.

[Melian01] María Victoria Melián, Linear recurrence relations with constant
coefficients, 9 April 2001.
http://matematicas.uam.es/~mavi.melian/CURSO_15_16/web_Discreta/recurrence.pdf

[Mileti22] Joseph R. Mileti, Combinatorics and Number Theory, 16 August 2022.
https://mileti.math.grinnell.edu/ComboNumber.pdf

[Ivanov08] Nikolai V. Ivanov, Linear Recurrences, 21 January 2008.
https://nikolaivivanov.files.wordpress.com/2014/02/ivanov2008arecurrence.pdf

[Newste23] Clive Newstead, An Infinite Descent into Pure Mathematics, version
1.0 preview, 10 January 2024.
https://infinitedescent.xyz

[Shoup08] Victor Shoup, A Computational Introduction to Number Theory and
Algebra, 2nd edition, Cambridge University Press 2008, with errata
2017.

[Singh01] Simon Singh, The Code Book, Delacorte Press 2001.

[Stein08] William Stein, Elementary Number Theory: Primes, Congruences, and
Secrets, Springer 2008, updated version 2017.

[UspHea39] J. V. Uspensky, M. A. Heaslet, Elementary Number Theory,
McGraw-Hill 1939.

[Vorobi02] Nicolai N. Vorobiev, Fibonacci Numbers, Translated from the
Russian by Mircea Martin, Springer 2002 (translation of the 6th
Russian edition).

[Yashin15] Allan Yashinski, Math 325 – Equivalence Relations, Well-definedness,
Modular Arithmetic, and the Rational Numbers, 13 October 2015.
https://math.hawaii.edu/~allan/WellDefinedness.pdf
