24 WD
Mathematics
(Text for Math 221 Winter 2024 at Drexel University)
Darij Grinberg
draft, March 12, 2024
Contents
0. Preface
0.1. What is this?
0.2. Plan
0.3. Notations
0.4. Acknowledgments
Math 221 Winter 2024, version March 12, 2024 page 2
∗∗∗
This is a set of lecture notes for my Math 221 course at Drexel University in
Winter 2024. At the moment, it is somewhat of a draft (and much of it is
copypasted from my Math 221 course in Winter 2023). Some parts of it are still
to be filled in (but the rest should already be readable).
0. Preface
0.1. What is this?
This is a course on discrete mathematics. To us, discrete mathematics means
the mathematics of finite, discrete objects: integers, finite sets, occasionally
some more complex creatures such as graphs and polynomials. Integer sequences,
while theoretically infinite, are also included since one usually makes
statements about finite pieces of the sequence. Much of linear algebra logically
belongs to discrete mathematics, but there are separate courses entirely devoted
to it, so we won’t touch on it here.
Discrete mathematics is in contrast to continuous mathematics, which studies
real numbers, continuous functions and infinite sets. This mostly begins
with analysis (or calculus, which is its less rigorous variant).
So this course will introduce you to some of the major topics of discrete
mathematics:
We will neither go very deep nor be fully rigorous about everything. There
are deeper, more specific classes on most of these subjects:
taste of each of several important and (if I dare say so) interesting topics, each
time veering deep enough to see some substance but not to get lost in the
jungle. Other introductions to discrete mathematics are [Levin21], [LeLeMe16],
[Newste23] and [GrKnPa94], just to name a few. There is no “standard” choice
of material for such a text; each author goes their own way through the vast
landscape. So do I in these notes. I have deliberately avoided anything analytic
or geometric in order to stick to the subject declared (discrete mathematics!),
but otherwise I have picked from different topics and fields. Some topics such
as graphs, posets and the construction of the number systems are nevertheless
missing from this introduction, as the lack of days in an academic quarter has
forced choices upon me.
The course that these notes were written for has a website:
https://www.cip.ifi.lmu.de/~grinberg/t/24wd/
0.2. Plan
This text is split into 7 chapters:
2. Sums and products. Here we define and study finite sums, finite products,
factorials and binomial coefficients. These basic algebraic concepts
appear all over mathematics, and also offer us some more practice in using
induction.
5. Maps (aka functions). The notion of a map (or function, which is synonymous)
is absolutely fundamental to modern mathematics. It encompasses
the functions known from calculus, but is more general, as it allows any
kinds of input and output. We define it rigorously and introduce its basic
features: composition, inverses, injectivity, surjectivity, bijectivity.
0.3. Notations
We shall use the following notations:
• We let N denote the set of all nonnegative integers, that is, {0, 1, 2, . . .}.
• The notation |S| denotes the size (i.e., the number of elements) of a set S.
• The abbreviation “LHS” means “left hand side” (of an equation). The
abbreviation “RHS” means “right hand side”.
• The notation “:=” means “is defined to be”. For example, “$s_n := 1 + 2 + \cdots + n$”
means that we define $s_n$ to be $1 + 2 + \cdots + n$.
0.4. Acknowledgments
I thank Keith Conrad, Karen Edwards, Andy Hicks and Tom Roby for helpful
advice and conversations about what a course on discrete mathematics should
contain. (Needless to say, I did not heed all of this advice in these notes.)
Your name could stand here: Please send corrections and comments to
[email protected] .
What about other values of n ? The questions we can ask are the following:
• For n = 0, we win in 0 moves (since all disks – of which there are none
– are on peg 3 already). This sounds very pedantic and pointless, but it’s
not a bad start.
• For n = 2, we win in 3 moves. Fewer moves are not enough, for fairly
simple logical reasons: We need 1 move to free the largest disk, then 1
move to move it to peg 3, then 1 more move to get the other disk on top
of it.
Solving the problem by brute force gets harder and harder as n grows. But
we can try to analyze our strategy for n = 3 and see if there is a pattern behind
it.
We observe that the largest disk moves only once, and its move is right in the
middle of the strategy. So our strategy for n = 3 can be summarized as follows:
1.–3. Move the two smaller disks from peg 1 onto peg 2.
5.–7. Move the two smaller disks from peg 2 onto peg 3.
Moreover, the moves 1–3 in this strategy are essentially a Tower of Hanoi
game played only with the two smaller disks, except that the goal is not to
move them to peg 3 but to move them to peg 2 (but this doesn’t matter, because
the two games are clearly “isomorphic” – i.e., the roles of pegs 2 and 3 are
swapped but otherwise everything is the same). The largest disk stays at the
bottom of peg 1 all the time and thus does not prevent any of the moves (since
all the other disks are smaller than it and thus can fit on top of it).
Move 4 moves the newly liberated largest disk from peg 1 onto peg 3.
Moves 5–7 are again a little Tower of Hanoi game for the two smaller disks,
except that now they have to be moved from peg 2 to peg 3. Again, the largest
disk (which is now on the bottom of peg 3) does not interfere with any of the
moves.
Now the logic behind the above strategy has become clear (and also easier to
memorize).
1.–7. Move the three smaller disks from peg 1 onto peg 2. (This is a little Tower
of Hanoi game for these three smaller disks. The largest disk rests at the
bottom of peg 1 and does not interfere.)
9.–15. Move the three smaller disks from peg 2 onto peg 3. (This is again a little
Tower of Hanoi game for these three smaller disks. The largest disk rests
at the bottom of peg 3 and does not interfere.)
Thus, we don’t just have a strategy for n = 3 and one for n = 4, but actually
a “meta-strategy” that lets us win the game for n disks if we know how to win
it for n − 1 disks. In a nutshell, it says “first move the n − 1 smaller disks onto
peg 2; then move the largest disk onto peg 3; then move the n − 1 smaller disks
onto peg 3”. We will still call this “meta-strategy” a strategy.
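The meta-strategy can be sketched as a short recursive Python function. This is only an illustration under assumptions of ours: the name `hanoi_moves`, the peg numbers passed as arguments, and the encoding of a move as a pair (from peg, to peg) are not part of the text.

```python
def hanoi_moves(n, source=1, target=3, spare=2):
    """Return the moves (from_peg, to_peg) that the meta-strategy makes for n disks."""
    if n == 0:
        return []  # no disks: nothing to move
    # First move the n - 1 smaller disks out of the way, onto the spare peg.
    moves = hanoi_moves(n - 1, source, spare, target)
    # Then move the largest disk onto the target peg.
    moves.append((source, target))
    # Finally move the n - 1 smaller disks from the spare peg onto the target peg.
    moves += hanoi_moves(n - 1, spare, target, source)
    return moves


for n in range(5):
    print(n, len(hanoi_moves(n)))
```

For n = 3, the list has 7 entries, and its middle entry (1, 3) is the single move of the largest disk.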
Thus, both Question 1.1.1 (a) and Question 1.1.1 (b) boil down to computing
$m_n$.
n     0  1  2  3   4   5   6    7    8
m_n   0  1  3  7  15  31  63  127  255
Note that these values are easily computed using our strategy, because in order
to win the game for a given n, we have to win it for n − 1, then make one extra
move, then win it for n − 1 again. So we get $m_n = m_{n-1} + 1 + m_{n-1} = 2m_{n-1} + 1$
(for n ≥ 1).
Right?
Not so fast! We have proved that, e.g., the game can be won in 127 moves
for n = 7. We have not proved that it cannot be won in fewer moves. So the
formula $m_n = 2m_{n-1} + 1$ has been proved not for the # of moves needed to win,
but rather for the # of moves needed to win using our strategy. Maybe there is
a better strategy that wins for n = 7 in (say) 109 moves?
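For small n, this doubt can be tested experimentally by a brute-force search over all positions of the game. The following sketch is not a proof for general n, only a check; the function name `min_moves` and the encoding of a position as a tuple (entry i is the peg holding disk i, with disk 0 the smallest) are assumptions made for this illustration.

```python
from collections import deque


def min_moves(n):
    """Fewest moves that bring n disks from peg 1 to peg 3 (breadth-first search)."""
    start = (1,) * n  # state[i] = peg on which disk i lies; disk 0 is the smallest
    goal = (3,) * n
    dist = {start: 0}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            return dist[state]
        for src in (1, 2, 3):
            on_src = [d for d in range(n) if state[d] == src]
            if not on_src:
                continue  # peg src is empty
            disk = min(on_src)  # the top disk of peg src
            for dst in (1, 2, 3):
                if dst == src:
                    continue
                on_dst = [d for d in range(n) if state[d] == dst]
                if on_dst and min(on_dst) < disk:
                    continue  # a smaller disk lies on top of peg dst
                new = state[:disk] + (dst,) + state[disk + 1:]
                if new not in dist:
                    dist[new] = dist[state] + 1
                    queue.append(new)
    return None  # never reached: the goal can always be attained


print([min_moves(n) for n in range(6)])
```

Since the number of positions is $3^n$, this search is only practical for small n.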
To gain some writing experience, let us write out the proof in detail:
Proof. Assume that $m_{n-1}$ is an integer. Thus, we can win the game for n − 1
disks in $m_{n-1}$ moves. Let S be the strategy (i.e., the sequence of moves) needed
to do this. So the strategy S moves n − 1 disks from peg 1 onto peg 3 in $m_{n-1}$
moves.
Let $S_{23}$ be the same strategy as S, but with the roles of pegs 2 and 3 swapped.
Thus, $S_{23}$ moves n − 1 disks from peg 1 onto peg 2 in $m_{n-1}$ moves.
Let $S_{12}$ be the same strategy as S, but with the roles of pegs 1 and 2 swapped.
Thus, $S_{12}$ moves n − 1 disks from peg 2 onto peg 3 in $m_{n-1}$ moves.
Now, we proceed as follows to win the game with n disks:
A. We use strategy $S_{23}$ to move the n − 1 smaller disks from peg 1 onto peg
2. (This is allowed because the largest disk rests at the bottom of peg 1
and does not interfere with the movement of smaller disks.)
B. We move the largest disk from peg 1 onto peg 3. (This is allowed because
this disk is free (i.e., there are no disks on top of it) and because peg 3 is
empty, since all the other disks are on peg 2.)
C. We use strategy $S_{12}$ to move the n − 1 smaller disks from peg 2 onto peg
3. (Again, this is allowed since the largest disk rests at the bottom of peg
3 and does not interfere.)
This strategy wins the game (for n disks) in $m_{n-1} + 1 + m_{n-1} = 2m_{n-1} + 1$
many moves. So the game for n disks can be won in $2m_{n-1} + 1$ many moves.
In other words, $m_n \leq 2m_{n-1} + 1$. This proves Proposition 1.1.3.
2 because in either case, they would block the move of the largest disk
n     0  1  2  3   4   5   6    7    8
m_n   0  1  3  7  15  31  63  127  255
Obviously, you can keep using Proposition 1.1.4 to compute $m_9, m_{10}, m_{11}, \ldots$.
Indeed, the equation
\[ m_n = 2m_{n-1} + 1 \tag{1} \]
is what is called a recursive formula for the numbers $m_n$. This means a formula
that allows you to compute $m_n$ using the previous values $m_0, m_1, \ldots, m_{n-1}$. In
our case, we only need the direct predecessor $m_{n-1}$, so this is a particularly
convenient recursive formula.
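As a small illustration of how such a recursive formula is used in practice, here is a sketch that computes the values $m_0, m_1, \ldots$ one after another from $m_0 = 0$ and equation (1). The function name `moves_by_recursion` is an assumption made for this example.

```python
def moves_by_recursion(n_max):
    """Return the list [m_0, m_1, ..., m_{n_max}] computed via m_n = 2 m_{n-1} + 1."""
    m = [0]  # m_0 = 0
    for n in range(1, n_max + 1):
        m.append(2 * m[n - 1] + 1)  # each value needs only its direct predecessor
    return m


print(moves_by_recursion(11))
```

The first nine entries reproduce the table above; the last three are $m_9 = 511$, $m_{10} = 1023$ and $m_{11} = 2047$.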
1. The statement P (b) holds (i.e., the statement P (n) holds for n = b).
A      B      A =⇒ B
true   true   true
true   false  false
false  true   true
false  false  true
You can think of it as a contract: “If you make A true, then I make B true”. If you don’t
make A true, then this contract places no obligation on me, since you haven’t done your
part! The only way for me to violate the contract is if you make A true but I don’t make B
true. In other words, A =⇒ B is a “relative” statement, which is true by default if A is not.
Usually, if you want to prove an implication A =⇒ B, you start by assuming that A holds,
and you need to show that B holds (under this assumption).
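The truth table of the implication can be reproduced in a few lines of Python. This is a sketch under the assumption (ours, though it agrees with the table) that A =⇒ B is modelled as “(not A) or B”; the helper name `implies` is hypothetical.

```python
def implies(a, b):
    """Truth value of the implication a => b: false only if a is true and b is false."""
    return (not a) or b


for a in (True, False):
    for b in (True, False):
        print(a, b, implies(a, b))
```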
Before we discuss the true meaning of this principle, let me show how to use
it to prove our $m_n = 2^n - 1$ claim. We state this claim as a theorem:
Proving these two claims will be our two goals; we call them Goal 1 and Goal
2. Let us see if we can achieve them.
Goal 1 is easy: The statement P (0) is just saying that $m_0 = 2^0 - 1$, which is
true since both sides are 0.
We now start working towards Goal 2. Let n ≥ 0 be an integer. We must
prove the implication P (n) =⇒ P (n + 1). To prove this, we assume that P (n)
holds, and we set out to prove that P (n + 1) holds.
Our assumption says that P (n) holds, i.e., that
\[ m_n = 2^n - 1. \]
We must prove that P (n + 1) holds, i.e., that
\[ m_{n+1} \overset{?}{=} 2^{n+1} - 1. \]
(The question mark above the equality sign just serves to remind us that we
have not proved this equality yet.)
Proposition 1.1.4 yields that $m_n = 2m_{n-1} + 1$ if n ≥ 1 (and if $m_{n-1}$ is not ∞).
But this is not very helpful, since we are looking for $m_{n+1}$, not for $m_n$.
However, we can also apply Proposition 1.1.4 to n + 1 instead of n (since n
is just an arbitrary integer ≥ 1 in that proposition; it is not bound to be our
current n). This gives us
\[ m_{n+1} = 2m_n + 1. \]
Thus,
\begin{align*}
m_{n+1} &= 2 \underbrace{m_n}_{= 2^n - 1} + 1 = 2 \cdot (2^n - 1) + 1 = 2 \cdot 2^n - 2 + 1 \\
&= \underbrace{2 \cdot 2^n}_{\substack{= 2^{n+1} \\ \text{(by one of the laws of exponents)}}} - 1 = 2^{n+1} - 1.
\end{align*}
What have we really done here? How did this proof work? What is the logic
underlying the Principle of Mathematical Induction?
Let us take a look at the structure of our above proof.
Our goal was to prove that P (n) holds for every n ≥ 0.
In other words, our goal was to prove the statements
Remark 1.2.3. You can metaphorically think of our proof (or any proof using
the Principle of Mathematical Induction) as an infinite daisy chain of lamps,
which stand for the statements P (0) , P (1) , P (2) , . . .: Goal 1 turns the first
lamp on, whereas Goal 2 ensures that each lamp turns the next on when it is
turned on itself.
Or, to use a more commonplace illustration, you have an infinite sequence
of dominos arranged in a row, at sufficiently close distances so that tipping
over one domino will tip over the next. After you tip over the first domino,
all the dominos will eventually fall down. (The dominos here stand for the
statements P (0) , P (1) , P (2) , . . ..)
\[ 1 + 2 + \cdots + n = \frac{n(n+1)}{2}. \]
The LHS (= left hand side) here is understood to be the sum of the first n
positive integers. For n = 0, this sum is an empty sum (i.e., it has no addends
at all), so its value is 0 by definition.
First proof of Theorem 1.3.1. We set
\[ s_n := 1 + 2 + \cdots + n \]
for each n ≥ 0. Thus, we must prove that $s_n = \frac{n(n+1)}{2}$ for each n ≥ 0.
Let us denote the statement “$s_n = \frac{n(n+1)}{2}$” by P (n). So we need to prove
that P (n) holds for every n ≥ 0.
According to the Principle of Mathematical Induction, it suffices to show that
Goal 1 is easy: To prove P (0), we must show that $s_0 = \frac{0(0+1)}{2}$, but this is
true because both sides equal 0.
Now to Goal 2. We let n ≥ 0 be an integer, and we want to prove the
implication P (n) =⇒ P (n + 1). So we assume that P (n) holds, and we set out
to prove P (n + 1).
By assumption, P (n) holds, so that we have
\[ s_n = \frac{n(n+1)}{2}. \]
We must prove P (n + 1); in other words, we must prove that
\[ s_{n+1} \overset{?}{=} \frac{(n+1)((n+1)+1)}{2}. \]
To do so, we observe that
\begin{align*}
s_{n+1} &= 1 + 2 + \cdots + (n+1) = \underbrace{(1 + 2 + \cdots + n)}_{= s_n} + (n+1) \\
&= s_n + (n+1) = \frac{n(n+1)}{2} + (n+1) \qquad \left( \text{since } s_n = \frac{n(n+1)}{2} \right) \\
&= \frac{n(n+1)}{2} + \frac{2(n+1)}{2} = \frac{(n+2)(n+1)}{2} = \frac{(n+1)(n+2)}{2} \\
&= \frac{(n+1)((n+1)+1)}{2}.
\end{align*}
In other words, P (n + 1) holds. Thus, we have proved the implication P (n) =⇒
P ( n + 1).
We have now achieved both goals, so the Principle of Mathematical Induction
yields that P (n) holds for every n ≥ 0. This proves the theorem.
There is also a non-inductive proof; this is how Gauss supposedly did it:
\begin{align*}
2 \cdot (1 + 2 + \cdots + n)
&= (1 + 2 + \cdots + n) + (1 + 2 + \cdots + n) \\
&= (1 + 2 + \cdots + n) + (n + (n-1) + \cdots + 1) \\
&\qquad \text{(here, we turned the second sum upside-down, i.e., we reversed the order of its addends)} \\
&= \underbrace{(1 + n)}_{= n+1} + \underbrace{(2 + (n-1))}_{= n+1} + \cdots + \underbrace{(n + 1)}_{= n+1} \\
&\qquad \text{(here, we rearranged the sum by matching up each addend inside the first pair of} \\
&\qquad \text{parentheses with the corresponding addend inside the second pair of parentheses)} \\
&= \underbrace{(n+1) + (n+1) + \cdots + (n+1)}_{n \text{ addends}} \\
&= n \cdot (n+1).
\end{align*}
Dividing both sides of this equality by 2, we obtain
\[ 1 + 2 + \cdots + n = \frac{n \cdot (n+1)}{2}, \]
and thus Theorem 1.3.1 is proved again.
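Neither proof needs a computer, but a quick numerical experiment can make the pairing idea of the second proof tangible. The following sketch uses hypothetical helper names (`triangular_by_sum`, `triangular_by_formula`, `pairing`) and checks the formula only for finitely many n, so it illustrates the theorem rather than proving it.

```python
def triangular_by_sum(n):
    """1 + 2 + ... + n, computed addend by addend (an empty sum, hence 0, for n = 0)."""
    return sum(range(1, n + 1))


def triangular_by_formula(n):
    """The closed form n (n + 1) / 2; the product n (n + 1) is always even."""
    return n * (n + 1) // 2


def pairing(n):
    """Add the k-th addend of 1 + 2 + ... + n to the k-th addend of the reversed sum."""
    forward = list(range(1, n + 1))
    backward = forward[::-1]
    return [a + b for a, b in zip(forward, backward)]


print(pairing(5))
print(all(triangular_by_sum(n) == triangular_by_formula(n) for n in range(100)))
```

Each of the n entries of `pairing(n)` equals n + 1, which is exactly the observation behind the equality $2 \cdot (1 + 2 + \cdots + n) = n \cdot (n + 1)$.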
\[ 1^2 + 2^2 + \cdots + n^2 = \frac{n(n+1)(2n+1)}{6}. \]
Proof. The following proof is almost a word-by-word copy of the first proof of
Theorem 1.3.1. The structure is the same; only the calculations change.
We set
\[ s_n := 1^2 + 2^2 + \cdots + n^2. \]
Thus, we must prove that $s_n = \frac{n(n+1)(2n+1)}{6}$ for each n ≥ 0.
Let us denote the statement “$s_n = \frac{n(n+1)(2n+1)}{6}$” by P (n). So we need
to prove that P (n) holds for every n ≥ 0.
According to the Principle of Mathematical Induction, it suffices to show that
Goal 1 is easy: To prove P (0), we must show that $s_0 = \frac{0(0+1)(2 \cdot 0+1)}{6}$,
but this is true because both sides equal 0.
Now to Goal 2. We let n ≥ 0 be an integer, and we want to prove the
implication P (n) =⇒ P (n + 1). So we assume that P (n) holds, and we set out
to prove P (n + 1).
By assumption, P (n) holds, so that we have
\[ s_n = \frac{n(n+1)(2n+1)}{6}. \]
We must prove P (n + 1); in other words, we must prove that
\[ s_{n+1} \overset{?}{=} \frac{(n+1)((n+1)+1)(2(n+1)+1)}{6}. \]
To do so, we observe that
\begin{align*}
s_{n+1} &= 1^2 + 2^2 + \cdots + (n+1)^2 \\
&= \underbrace{\left( 1^2 + 2^2 + \cdots + n^2 \right)}_{= s_n} + (n+1)^2 \\
&= s_n + (n+1)^2 \\
&= \frac{n(n+1)(2n+1)}{6} + (n+1)^2 \qquad \left( \text{since } s_n = \frac{n(n+1)(2n+1)}{6} \right) \\
&= (n+1) \cdot \left( \frac{n(2n+1)}{6} + (n+1) \right) \\
&= (n+1) \cdot \frac{2n^2 + 7n + 6}{6} \\
&= \frac{(n+1)\left(2n^2 + 7n + 6\right)}{6} \\
&= \frac{(n+1)(n+2)(2n+3)}{6} \qquad \left( \text{since } 2n^2 + 7n + 6 \text{ can be factored as } (n+2)(2n+3) \right) \\
&= \frac{(n+1)((n+1)+1)(2(n+1)+1)}{6}.
\end{align*}
In other words, P (n + 1) holds. Thus, we have proved the implication P (n) =⇒
P ( n + 1).
We have now achieved both goals, so the Principle of Mathematical Induction
yields that P (n) holds for every n ≥ 0. This proves the theorem.
As we said, our above proof of Theorem 1.3.2 was an almost verbatim copy of
our first proof of Theorem 1.3.1; we only needed to make the obvious changes
and calculate a little bit harder. Both proofs were more or less determined by
the idea to use induction. In contrast, the slick second proof of Theorem 1.3.1
cannot be adapted to Theorem 1.3.2. So the induction proof has the advantage
of better generalizability.
However, it has the disadvantage that it can only be used to prove a formula
(in our case, $1 + 2 + \cdots + n = \frac{n(n+1)}{2}$ or $1^2 + 2^2 + \cdots + n^2 = \frac{n(n+1)(2n+1)}{6}$),
not to find this formula in the first place. We could not have used induction to
answer the question “what is 1 + 2 + · · · + n?”; we could only use it to prove
the answer after guessing it in some way.
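One way to gain confidence in a guessed formula before attempting an induction proof is to compare it with directly computed values. The sketch below does this for the sum of squares; the names `sum_of_squares` and `guessed_formula` are assumptions made for this example, and a successful comparison for finitely many n is of course evidence, not a proof.

```python
def sum_of_squares(n):
    """1^2 + 2^2 + ... + n^2, computed directly (an empty sum for n = 0)."""
    return sum(k * k for k in range(1, n + 1))


def guessed_formula(n):
    """The candidate closed form n (n + 1) (2n + 1) / 6."""
    return n * (n + 1) * (2 * n + 1) // 6


mismatches = [n for n in range(200) if sum_of_squares(n) != guessed_formula(n)]
print(mismatches)  # an empty list means the guess survived the test
```

A wrong guess would typically show up as a nonempty list of mismatches, sparing us a doomed induction step.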
for each nonnegative integer n. (The left hand side here is the sum of the
cubes of the first n positive integers.)
• The n is called the induction variable; you say that you induct on n. It
does not have to be called n. Your statement might just as well be “for
every integer a ≥ 0, we have $1 + 2 + \cdots + a = \frac{a(a+1)}{2}$”, and then you
can prove it by inducting on a.
• The proof of P (b) (that is, Goal 1 in our above proofs) is called the
induction base or the base case. In our above examples, this was always
the proof of P (0), but in general b can be another integer. (For example,
if you are proving the statement “every integer n ≥ 4 satisfies $2^n \geq n^2$”,
then b will have to be 4, so your induction base consists in proving that
$2^4 \geq 4^2$.)
In the induction step, the assumption that P (n) holds is called the induction
hypothesis or the induction assumption, and the claim that P (n + 1)
holds (this is the claim that you are trying to prove) is called the induction
goal. The induction step is complete when the induction goal is reached
(i.e., proved).
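To see why the base b matters in the example “every integer n ≥ 4 satisfies $2^n \geq n^2$”, one can simply test the inequality for small n. The helper name `holds` below is hypothetical, and the range of tested values is an arbitrary choice.

```python
def holds(n):
    """The statement P(n): 2^n >= n^2."""
    return 2 ** n >= n ** 2


failures = [n for n in range(50) if not holds(n)]
print(failures)
```

The only failure in this range is n = 3 (since 8 < 9), which explains why the statement is claimed only for n ≥ 4 and why the induction has to start at b = 4.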
As an example, let us rewrite our above proof of Theorem 1.2.2 using this
language:
Proof of Theorem 1.2.2, rewritten. We induct on n.
Base case: The theorem holds for n = 0, since both $m_0$ and $2^0 - 1$ equal 0.
Induction step: Let n ≥ 0 be an integer. We assume that the theorem holds for
n (this is what we previously called P (n)). We will now show that the theorem
holds for n + 1 as well (this is what we previously called P (n + 1)).
We have assumed that the theorem holds for n. In other words, $m_n = 2^n - 1$.
This is our induction hypothesis.
We must prove that the theorem holds for n + 1. In other words, we must
prove that $m_{n+1} \overset{?}{=} 2^{n+1} - 1$.
To prove this, we apply Proposition 1.1.4 to n + 1 instead of n (we can do
this, since $m_n = 2^n - 1$ is not ∞). This gives us
\begin{align*}
m_{n+1} &= 2 \underbrace{m_n}_{\substack{= 2^n - 1 \\ \text{(by the induction hypothesis)}}} + 1 = 2 \cdot (2^n - 1) + 1 \\
&= 2 \cdot 2^n - 2 + 1 = \underbrace{2 \cdot 2^n}_{= 2^{n+1}} - 1 = 2^{n+1} - 1.
\end{align*}
Thus, the induction goal is reached, and the induction is complete. Hence, the
theorem is proved.
$f_0 = 0$, $f_1 = 1$, and
$f_n = f_{n-1} + f_{n-2}$ for each n ≥ 2.
In other words, the Fibonacci sequence starts with the two entries 0 and 1,
and then every next entry is the sum of the two previous entries.
The entries of the Fibonacci sequence are called the Fibonacci numbers. Let
us compute the first fourteen of them:
n     0  1  2  3  4  5  6   7   8   9  10  11   12   13
f_n   0  1  1  2  3  5  8  13  21  34  55  89  144  233
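The recursive definition translates directly into a loop. Here is a minimal sketch; the function name `fibonacci` and its argument `count` (how many entries to compute) are assumptions made for this illustration.

```python
def fibonacci(count):
    """Return the list [f_0, f_1, ..., f_{count-1}] of the first `count` Fibonacci numbers."""
    f = [0, 1]  # the two starting entries f_0 = 0 and f_1 = 1
    for n in range(2, count):
        f.append(f[n - 1] + f[n - 2])  # every next entry is the sum of the two previous ones
    return f[:count]


print(fibonacci(14))
```

Calling `fibonacci(14)` reproduces the second row of the table above.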
\[ f_1 + f_2 + \cdots + f_n = f_{n+2} - 1. \]
\[ 1 + 1 + 2 + 3 + 5 + 8 + 13 + 21 = 55 - 1. \]
So we assumed that
\[ f_1 + f_2 + \cdots + f_n = f_{n+2} - 1. \]
We have
\begin{align*}
f_1 + f_2 + \cdots + f_{n+1}
&= \underbrace{(f_1 + f_2 + \cdots + f_n)}_{\substack{= f_{n+2} - 1 \\ \text{(by our induction hypothesis)}}} + f_{n+1} = f_{n+2} - 1 + f_{n+1} \\
&= \underbrace{f_{n+2} + f_{n+1}}_{\substack{= f_{n+3} \\ \text{(since the recursive definition of the Fibonacci sequence} \\ \text{yields } f_{n+3} = f_{n+2} + f_{n+1} \text{)}}} - 1 = f_{n+3} - 1 \\
&= f_{(n+1)+2} - 1 \qquad \left( \text{since } n + 3 = (n+1) + 2 \right).
\end{align*}
This is precisely what we wanted to prove – i.e., it says that the theorem holds
for n + 1. This completes the induction step. Thus, the theorem is proved.
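The identity $f_1 + f_2 + \cdots + f_n = f_{n+2} - 1$ just proved can also be observed numerically. The following sketch (with hypothetical helper names `fib` and `partial_sum`) checks it for the first few dozen values of n; this adds nothing to the proof, but it is a useful habit for catching misstated formulas.

```python
def fib(n):
    """The n-th Fibonacci number f_n, with f_0 = 0 and f_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def partial_sum(n):
    """f_1 + f_2 + ... + f_n (an empty sum, hence 0, when n = 0)."""
    return sum(fib(i) for i in range(1, n + 1))


print(partial_sum(8), fib(10) - 1)
print(all(partial_sum(n) == fib(n + 2) - 1 for n in range(40)))
```

For n = 8, both sides equal 54, matching the example 1 + 1 + 2 + 3 + 5 + 8 + 13 + 21 = 55 − 1.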
\[ 2^0 + 2^1 + 2^2 + \cdots + 2^{n-1} = 2^n - 1. \]
Proof. We induct on n.
Base case: For n = 0, the equality $2^0 + 2^1 + 2^2 + \cdots + 2^{n-1} = 2^n - 1$ is true,
because the LHS is an empty sum and thus equals 0, whereas the RHS is
$2^0 - 1 = 1 - 1 = 0$. (Here, “LHS” means “left-hand side”. Likewise, “RHS” means
“right-hand side”.)
Induction step: Let n be an integer ≥ 0. Assume that Theorem 1.6.1 holds for
n, i.e., that we have
\[ 2^0 + 2^1 + 2^2 + \cdots + 2^{n-1} = 2^n - 1. \]
We must prove that Theorem 1.6.1 holds for n + 1 as well, i.e., that we have
\[ 2^0 + 2^1 + 2^2 + \cdots + 2^{(n+1)-1} = 2^{n+1} - 1. \]
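However,
\begin{align*}
2^0 + 2^1 + 2^2 + \cdots + 2^{(n+1)-1} &= 2^0 + 2^1 + 2^2 + \cdots + 2^n \\
&= \underbrace{\left( 2^0 + 2^1 + 2^2 + \cdots + 2^{n-1} \right)}_{\substack{= 2^n - 1 \\ \text{(by the induction hypothesis)}}} + 2^n \\
&= 2^n - 1 + 2^n = \underbrace{2 \cdot 2^n}_{= 2^{n+1}} - 1 = 2^{n+1} - 1,
\end{align*}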
which is precisely what we want: This shows that Theorem 1.6.1 holds for n + 1.
Thus, our induction step is complete, and Theorem 1.6.1 is proved.
Theorem 1.6.1 can be generalized:
Theorem 1.6.2. Let x and y be any two numbers. Then, for any integer n ≥ 0,
we have
\[ (x - y) \left( x^{n-1} + x^{n-2} y + x^{n-3} y^2 + \cdots + x^2 y^{n-3} + x y^{n-2} + y^{n-1} \right) = x^n - y^n. \]
Here, the big sum in the parentheses is the sum of all products $x^i y^j$ where i
and j are nonnegative integers with i + j = n − 1.
Before we prove this, let us give some examples for what this theorem actually
says:
• For n = 2, Theorem 1.6.2 says that
\[ (x - y)(x + y) = x^2 - y^2. \]
Since any power of 1 is 1 (and since the 2 − 1 factor also equals 1), this
simplifies to
\[ 2^{n-1} + 2^{n-2} + 2^{n-3} + \cdots + 2^2 + 2 + 1 = 2^n - 1, \]
which is precisely Theorem 1.6.1. Thus, Theorem 1.6.2 generalizes
Theorem 1.6.1.
is true, since the LHS is 0 (because the second factor is an empty sum), while
the RHS is $x^0 - y^0 = 1 - 1 = 0$ as well.
Induction step: Let n ≥ 0 be an integer. Assume that Theorem 1.6.2 is true for
n. That is, assume that
\[ (x - y) \left( x^{n-1} + x^{n-2} y + x^{n-3} y^2 + \cdots + x^2 y^{n-3} + x y^{n-2} + y^{n-1} \right) = x^n - y^n. \]
We must prove that Theorem 1.6.2 is also true for n + 1. That is, we must prove
that
\[ (x - y) \left( x^n + x^{n-1} y + x^{n-2} y^2 + \cdots + x^3 y^{n-3} + x^2 y^{n-2} + x y^{n-1} + y^n \right) = x^{n+1} - y^{n+1}. \]
We begin by extracting the $y^n$ addend from the long sum in the second pair
of parentheses in this equation. We thus obtain
\begin{align*}
& (x - y) \left( x^n + x^{n-1} y + x^{n-2} y^2 + \cdots + x^3 y^{n-3} + x^2 y^{n-2} + x y^{n-1} + y^n \right) \\
&= (x - y) \underbrace{\left( x^n + x^{n-1} y + x^{n-2} y^2 + \cdots + x^3 y^{n-3} + x^2 y^{n-2} + x y^{n-1} \right)}_{\substack{= \left( x^{n-1} + x^{n-2} y + x^{n-3} y^2 + \cdots + x^2 y^{n-3} + x y^{n-2} + y^{n-1} \right) x \\ \text{(here, we have factored out an } x \text{ from the sum)}}} + (x - y) y^n \\
&= \underbrace{(x - y) \left( x^{n-1} + x^{n-2} y + x^{n-3} y^2 + \cdots + x^2 y^{n-3} + x y^{n-2} + y^{n-1} \right)}_{\substack{= x^n - y^n \\ \text{(by the induction hypothesis)}}} x + (x - y) y^n \\
&= (x^n - y^n) x + (x - y) y^n = x^{n+1} - x y^n + x y^n - y^{n+1} = x^{n+1} - y^{n+1}.
\end{align*}
This means precisely that Theorem 1.6.2 is also true for n + 1. Thus, the induction
step is complete, and the theorem is proved.
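Theorem 1.6.2 is easy to test on integers, which can help in getting a feeling for the big sum in the parentheses. In the sketch below, the function names `lhs` and `rhs` and the tested ranges are assumptions chosen for illustration; the sum runs over all pairs (i, j) with i + j = n − 1 by letting i range from 0 to n − 1 and setting j = n − 1 − i.

```python
def lhs(x, y, n):
    """(x - y) times the sum of all x^i * y^j with i, j >= 0 and i + j = n - 1."""
    big_sum = sum(x ** i * y ** (n - 1 - i) for i in range(n))  # empty sum if n = 0
    return (x - y) * big_sum


def rhs(x, y, n):
    """The right hand side x^n - y^n."""
    return x ** n - y ** n


for n in range(5):
    print(n, lhs(3, 2, n), rhs(3, 2, n))
```

Taking x = 2 and y = 1 recovers Theorem 1.6.1: the value `lhs(2, 1, n)` is the sum of the powers $2^0, 2^1, \ldots, 2^{n-1}$.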
Another useful particular case of Theorem 1.6.2 is the following equality:
Thus,
\[ q^{n-1} + q^{n-2} + q^{n-3} + \cdots + q^2 + q + 1 = \frac{q^n - 1}{q - 1}. \]
In other words,
\[ q^0 + q^1 + q^2 + \cdots + q^{n-1} = \frac{q^n - 1}{q - 1} \]
(since the sum on the left hand side can be rearranged in any order). This
proves Corollary 1.6.3.
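The formula for $q^0 + q^1 + \cdots + q^{n-1}$ can be checked with exact rational arithmetic, which avoids the rounding errors of floating-point numbers. This sketch assumes q ≠ 1 (so that the division by q − 1 makes sense); the function names `geometric_sum` and `closed_form` and the sample values of q are our own choices.

```python
from fractions import Fraction


def geometric_sum(q, n):
    """q^0 + q^1 + ... + q^(n-1), computed addend by addend."""
    return sum(q ** k for k in range(n))


def closed_form(q, n):
    """(q^n - 1) / (q - 1); requires q != 1."""
    return (q ** n - 1) / (q - 1)


for q in (Fraction(1, 2), Fraction(-3), Fraction(5, 3)):
    print(q, all(geometric_sum(q, n) == closed_form(q, n) for n in range(10)))
```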
Exercise 1.6.1. Let N denote the set of all nonnegative integers (that is,
{0, 1, 2, . . .}). Let q and d be two real numbers such that q ≠ 1. Let
$(a_0, a_1, a_2, \ldots)$ be a sequence of real numbers. Assume that
Prove that
\[ a_n = q^n a_0 + \frac{q^n - 1}{q - 1} d \qquad \text{for each } n \in N. \tag{3} \]
Theorem 1.7.1 (Fake theorem). In any set of n ≥ 1 horses, all the horses are
the same color.
Proof. We induct on n.
Base case: This is clearly true for n = 1, since a single horse always has the
same color as itself.
Induction step: Let n ≥ 1 be an integer. We assume that the theorem holds for
n, i.e., that any n horses are the same color.
We must prove that it also holds for n + 1, i.e., that any n + 1 horses are the
same color.
So let $H_1, H_2, \ldots, H_{n+1}$ be n + 1 horses.
By our induction hypothesis, the first n horses $H_1, H_2, \ldots, H_n$ are the same
color.
Again by our induction hypothesis, the last n horses $H_2, H_3, \ldots, H_{n+1}$ are the
same color.
Now, consider the first horse $H_1$ and the last horse $H_{n+1}$. They both have the
same color as the “middle horses” $H_2, H_3, \ldots, H_n$ (according to the preceding
two paragraphs). Thus, all the n + 1 horses have the same color, right?
When a claim is as obviously wrong as this one, there is an easy way to find
the mistake in the proof: You just look at some example in which the claim is
wrong, and you trace the proof on this example. The first time you see a wrong
conclusion, that’s where the error probably is.
Theorem 1.7.1 is wrong for n = 2 already, i.e., for two horses. So let us see
where the induction step goes wrong when n = 1 (that is, going from 1 horse
to 2 horses). In this induction step, we claim that $H_1$ and $H_{n+1} = H_2$ both have
the same color as the “middle horses” $H_2, H_3, \ldots, H_1$. But there are no “middle
horses”, so it makes no sense to have the same color as these “middle horses”.
So the argument doesn’t work.
Thus, our mistake was to implicitly treat the “middle horses” as if they existed.
They do exist for any n > 1, but not for n = 1, and thus our induction
step breaks down for n = 1.
Note how one little mistake has brought down the entire proof! For an induction
proof to work, the induction step needs to work for all n; that is, we
need the implication P (n) =⇒ P (n + 1) to hold for every n. If even one of
these implications breaks down, the whole chain is disconnected, and all the
statements P (n) “to the right of” this breaking point are no longer guaranteed
to hold. For example, if we have a statement P (n) for each n ≥ 0, and we have
proved the base case P (0) and the implication P (n) =⇒ P (n + 1) for all n ≠ 4,
then we can conclude that P (0) , P (1) , P (2) , P (3) and P (4) hold, but we
cannot guarantee that any of P (5) , P (6) , P (7) , . . . hold. As so often, a chain
is only as strong as its weakest link.
n     0  1  2  3  4  5  6   7   8   9  10  11   12   13
f_n   0  1  1  2  3  5  8  13  21  34  55  89  144  233
\[ f_1 + f_3 + f_5 + \cdots + f_{2n-1} = f_{2n}. \]
(The left hand side is the sum of all $f_{2i-1}$ with i ∈ {1, 2, . . . , n}.)
(b) Prove that every nonnegative integer n satisfies
\[ f_0 + f_2 + f_4 + \cdots + f_{2n} = f_{2n+1} - 1. \]
(The left hand side is the sum of all $f_{2i}$ with i ∈ {0, 1, . . . , n}.)
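Before setting out to prove such identities, it can be reassuring to test them numerically. The sketch below does this for both sums; it is not a solution of the exercise (a finite check proves nothing about all n), and the helper names `fib`, `odd_index_sum` and `even_index_sum` are assumptions made for this illustration.

```python
def fib(n):
    """The n-th Fibonacci number f_n, with f_0 = 0 and f_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def odd_index_sum(n):
    """f_1 + f_3 + ... + f_{2n-1}, i.e., the sum of all f_{2i-1} with i in {1, ..., n}."""
    return sum(fib(2 * i - 1) for i in range(1, n + 1))


def even_index_sum(n):
    """f_0 + f_2 + ... + f_{2n}, i.e., the sum of all f_{2i} with i in {0, ..., n}."""
    return sum(fib(2 * i) for i in range(0, n + 1))


print(all(odd_index_sum(n) == fib(2 * n) for n in range(25)))
print(all(even_index_sum(n) == fib(2 * n + 1) - 1 for n in range(25)))
```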
Proof. Can you induct on two variables at the same time? Not directly (although
you can induct on n and then induct on m in the induction step, so that you
have one induction proof inside another). Fortunately, we don’t need to do this
here. It suffices to induct on one of the variables.
To be specific, let us induct on n. To that purpose, for every integer n ≥ 0,
we define the statement P (n) to say
“for all integers m ≥ 0, we have $f_{n+m+1} = f_n f_m + f_{n+1} f_{m+1}$”.
(Don’t forget the “for all integers m ≥ 0” part! The statement P (n) is not
just a single equality $f_{n+m+1} = f_n f_m + f_{n+1} f_{m+1}$ for some specific value of
m, but rather combines infinitely many such equalities, one for each integer
m ≥ 0. If we fixed a value of m and defined P (n) to be just the single equality
$f_{n+m+1} = f_n f_m + f_{n+1} f_{m+1}$, then the induction proof below would not work,
because we are going to apply the induction hypothesis to a different m than
we start with.)
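The equality $f_{n+m+1} = f_n f_m + f_{n+1} f_{m+1}$ involves two variables, and a numerical test naturally runs over both of them. In the following sketch, the helper names `fib` and `statement_P` are hypothetical; since a program cannot check “all integers m ≥ 0”, the sketch cuts m off at an arbitrary bound, so it only tests finitely many of the equalities combined in P (n).

```python
def fib(n):
    """The n-th Fibonacci number f_n, with f_0 = 0 and f_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def statement_P(n, m_bound=30):
    """Check f_{n+m+1} = f_n f_m + f_{n+1} f_{m+1} for all m with 0 <= m < m_bound."""
    return all(
        fib(n + m + 1) == fib(n) * fib(m) + fib(n + 1) * fib(m + 1)
        for m in range(m_bound)
    )


print(all(statement_P(n) for n in range(30)))
```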
We shall now prove this statement P (n) for all n ≥ 0 by induction on n.
Base case: We must prove P (0). In other words, we must prove that
“for all integers m ≥ 0, we have $f_{0+m+1} = f_0 f_m + f_{0+1} f_{m+1}$”.
This is easy to show: For all integers m ≥ 0, we have $f_{0+m+1} = f_{m+1}$ and
$\underbrace{f_0}_{= 0} f_m + \underbrace{f_{0+1}}_{= f_1 = 1} f_{m+1} = 0 f_m + 1 f_{m+1} = f_{m+1}$, so the two sides are equal.
We can rename the variable m as p in this statement (since it is just a bound variable). Thus,
we obtain that
This equality has the same right hand side as (4). Thus, the left hand sides of
the two equalities must be equal as well. In other words, we must have
\[ f_{n+m+2} = f_{n+1} f_m + f_{n+1+1} f_{m+1}. \]
\[ f_{n+1+m+1} = f_{n+1} f_m + f_{n+1+1} f_{m+1}. \]
\[ f_1^2 + f_2^2 + \cdots + f_n^2 = f_n f_{n+1}. \]
(The left hand side here is the sum of the squares of the first n positive
Fibonacci numbers.)
Now, applying this latter statement to p = m + 1 (where m is the m that we fixed), we obtain
\[ f_{n+m+1+1} = f_n f_{m+1} + f_{n+1} f_{m+1+1}. \]
Definition 1.8.2. Let a and b be two integers. We say that a divides b (and
we write a | b) if there exists an integer c such that b = ac. Equivalently, we
say that b is divisible by a in this case.
In other words, in our above table of Fibonacci numbers, if some entry of the
first row divides some other entry of the first row, then the same holds for the
corresponding entries of the second row. For example, 6 | 12 implies $f_6 \mid f_{12}$
(which is saying that 8 | 144).
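This divisibility pattern can be explored experimentally. The sketch below implements Definition 1.8.2 for integers (treating the divisor 0 separately, since 0 | b holds only for b = 0) and tests the claim “a | b implies $f_a \mid f_b$” for small nonnegative a and b. The helper names `divides` and `fib` and the bound 40 are assumptions made for this example.

```python
def divides(a, b):
    """Return True if a | b, i.e., if b = a * c for some integer c."""
    if a == 0:
        return b == 0  # 0 * c is always 0, so 0 divides only 0
    return b % a == 0


def fib(n):
    """The n-th Fibonacci number f_n, with f_0 = 0 and f_1 = 1."""
    x, y = 0, 1
    for _ in range(n):
        x, y = y, x + y
    return x


counterexamples = [
    (a, b)
    for a in range(40)
    for b in range(40)
    if divides(a, b) and not divides(fib(a), fib(b))
]
print(counterexamples)
```

An empty list means that no pair (a, b) in the tested range violates the claim.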
Proof of Theorem 1.8.3. It is reasonable to try induction. However, inducting on
a does not lead anywhere: The base case is easy, but in the induction step it
is completely unclear how to reach the goal, since the condition a | b in the
induction hypothesis usually has nothing to do with the condition a + 1 | b in
the induction goal.
Similar problems appear if you try to induct on b. So neither of the two
variables in the theorem is suitable for being inducted on.
What can we do? Give up on induction?
Not so fast. One thing we haven’t tried is to introduce a new variable and
then induct on that new variable.
To do so, we observe that two integers a, b ≥ 0 satisfy a | b if and only if there
exists an integer c such that b = ac (by the definition of “divides”). Moreover,
if this integer c exists, then it can be chosen to be ≥ 0 (this is automatic when
b ≠ 0, because $c = \frac{b}{a} > 0$ in this case; but otherwise we can achieve this by
simply choosing c = 0). Thus, two integers a, b ≥ 0 satisfy a | b if and only if
there exists an integer c ≥ 0 such that b = ac.
Hence, a pair of integers a, b ≥ 0 satisfying a | b is nothing but a pair of the
form a, ac where a, c ≥ 0 are integers. This allows us to restate Theorem 1.8.3
as follows:
Base case: We must prove P (0). In other words, we must prove that
But this is easy, because for any integer a ≥ 0, we have $f_{a \cdot 0} = f_0 = 0$, which is
divisible by any integer (thus in particular by $f_a$).
Induction step: Let c ≥ 0 be an integer. We assume that P (c) holds, i.e., that
Let a ≥ 0 be any integer. Then, the induction hypothesis (i.e., our assumption
that P (c) holds) yields that $f_a \mid f_{ac}$. In other words, $f_{ac} = f_a p$ for some integer
p. Now,
This immediately yields that $f_a \mid f_{a(c+1)}$. Thus, we have shown that for any
integer a ≥ 0, we have $f_a \mid f_{a(c+1)}$. In other words, we have proved that P (c + 1)
holds. This completes the induction step, and thus the restated theorem is
proved. Therefore, the original Theorem 1.8.3 is also proved.
...............
Is it? There is a subtle gap in our above argument. Can you find it?
...............
Can you? Don’t look down just yet. The gap is somewhere above!
...............
This time, the theorem itself is correct, so you can’t find the gap by tracing
the proof through a case where the theorem is false. Though an example might
be useful...
...............
No, we didn’t misuse the principle of induction. The structure of the proof
is fine. (Actually, we could have made our statements a bit shorter by fixing
a ≥ 0, but this wouldn’t have made much of a difference.)
...............
The base case was fine, too.
...............
A computer, of course, would spot the problem.
If you tried to formalize the above proof in a computer language (e.g., Coq
or Lean), you would run into a type mismatch error. Some statement has been
proved for variables of a certain type, but is being used for variables of a dif-
ferent type. Very slightly different.
...............
The statement in question is Theorem 1.8.1. It is stated for one kind of vari-
ables, but we have used it for a slightly different kind.
...............
OK, I am spelling it out: Theorem 1.8.1 (i.e., the addition formula $f_{n+m+1} =
f_n f_m + f_{n+1} f_{m+1}$) has been stated and proved for all integers n, m ≥ 0, but we
have applied it to n = ac and m = a − 1. For this to work, we need ac ≥ 0 and
a − 1 ≥ 0. Now, ac ≥ 0 is indeed satisfied (since a ≥ 0 and c ≥ 0), but a − 1 ≥ 0
holds only if a ≥ 1, which is not guaranteed. Thus, our use of Theorem 1.8.1
was illegal when a = 0. And indeed, if we apply Theorem 1.8.1 for a = 0,
then we end up with an $f_{-1}$ term, which is undefined. Even if you define $f_{-1}$
appropriately (and there is a good definition; see Subsection 1.10.2), we have
not proved Theorem 1.8.1 for negative n, m. So there is a gap in our proof. Can
we fix it?
...............
Fortunately, we can: Our argument breaks down only in the case when a = 0,
and we can just treat this case a = 0 manually, since it is an easy case. So we
build a case distinction into our above induction step. Thus, the induction step
takes the following form:
Induction step (corrected): Let c ≥ 0 be an integer. We assume that P (c) holds,
i.e., that
“for any integer a ≥ 0, we have $f_a \mid f_{ac}$” holds.
We must prove that P (c + 1) holds, i.e., that
“for any integer a ≥ 0, we have $f_a \mid f_{a(c+1)}$” holds.
Let a ≥ 0 be any integer. We must show that $f_a \mid f_{a(c+1)}$. We are in one of the
following two cases:
Case 1: We have a = 0.
Case 2: We have a ≠ 0.
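As a sanity check on the statement being proved (not a substitute for the proof), the following Python sketch tests the divisibility $f_a \mid f_{ac}$ for small a and c, including the delicate case a = 0. The helper names `fib` and `divides` are illustrative and do not come from the notes; `divides` follows the definition "a | b if and only if b = ac for some integer c", so that 0 divides only 0.

```python
def fib(n):
    """Return the Fibonacci number f_n for an integer n >= 0 (f_0 = 0, f_1 = 1)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def divides(a, b):
    """Return True if a | b, i.e., if b = a*c for some integer c."""
    if a == 0:
        return b == 0  # 0 * c is always 0, so 0 divides only 0
    return b % a == 0


# Check f_a | f_{ac} for all 0 <= a, c <= 12 (this includes a = 0 and c = 0).
for a in range(13):
    for c in range(13):
        assert divides(fib(a), fib(a * c)), (a, c)
```

The case a = 0 passes because $f_0 = 0$ and $f_{0 \cdot c} = f_0 = 0$, which is exactly the case that the corrected induction step treats by hand.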
Some remarks:
• The number φ is called the golden ratio, and is famous for many properties,
including the fact that $\varphi^2 = \varphi + 1$ (which you can easily check by
expanding both sides; see footnote 11). The number ψ is its so-called conjugate and also
satisfies $\psi^2 = \psi + 1$.
• The numbers f n are integers, but Binet’s formula expresses them in terms
of two irrational numbers φ and ψ. This should be rather unexpected.
\[ \frac{\varphi^n - \psi^n}{\sqrt{5}} = \frac{\varphi^0 - \psi^0}{\sqrt{5}} = \frac{1 - 1}{\sqrt{5}} = 0 . \]
Thus, Binet’s formula holds for n = 0.
Induction step: Let n ≥ 0 be an integer.
Assume (as induction hypothesis) that Binet’s formula holds for n; we must
prove that it holds for n + 1.
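11 Namely: From $\varphi = \frac{1 + \sqrt{5}}{2}$, we obtain
\[ \varphi^2 = \left( \frac{1 + \sqrt{5}}{2} \right)^2 = \frac{1 + 2\sqrt{5} + 5}{4} = \frac{6 + 2\sqrt{5}}{4} = \frac{3 + \sqrt{5}}{2} = \frac{1 + \sqrt{5}}{2} + 1 = \varphi + 1 . \]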
• [Grinbe20, Subsection 4.9.2] (which solves any linear recurrence of the form $x_n =
a x_{n-1} + b x_{n-2}$ for constant numbers a and b in an explicit and elementary way);
• [Melian01] and [Ivanov08] (which solve the more general version $x_n = a_1 x_{n-1} +
a_2 x_{n-2} + \cdots + a_k x_{n-k}$ in terms of the eigenvalues of a matrix).
Textbooks on combinatorics or advanced linear algebra also tend to discuss such
sequences (called linearly recurrent sequences).
\[ f_{n+1} = \frac{\varphi^{n+1} - \psi^{n+1}}{\sqrt{5}} . \]
The recursive definition of the Fibonacci sequence yields
\[ f_{n+1} = f_n + f_{n-1} = \frac{\varphi^n - \psi^n}{\sqrt{5}} + f_{n-1} \qquad \text{(by the induction hypothesis)} . \]
So far so good, but how can we simplify $f_{n-1}$? Our induction hypothesis only
tells us that $f_n = \frac{\varphi^n - \psi^n}{\sqrt{5}}$, but it says nothing about $f_{n-1}$.
So this induction proof does not work.13
Let us see how to fix this by introducing a more advanced version of induc-
tion.
We can restate this principle slightly by renaming the n in the induction step
as n − 1 (so that the implication P (n) =⇒ P (n + 1) turns into P (n − 1) =⇒
P (n)). Thus, it takes the following form:
The idea behind the principle (in either form) is that the base case gives us
P (b) whereas the induction step gives us the implications
P (b) =⇒ P (b + 1) ,
P (b + 1) =⇒ P (b + 2) ,
P (b + 2) =⇒ P (b + 3) ,
....
In the domino metaphor (see Remark 1.2.3), the base case tips over the first
domino, and the induction step ensures that each domino falls from the impact
of the previous domino’s falling.
which is somewhat weaker (since it assumes more to get to the same conclu-
sion) but nevertheless gives the same result. Likewise, we could just as well
replace the implication P (b + 2) =⇒ P (b + 3) by the weaker implication
(so that the domino P (n) is tipped over by the combined force of all the pre-
ceding dominos, not just the one domino directly to its left).
This induction principle is called strong induction. Explicitly, it says the
following:
holds.
Proofs using this principle are called proofs by strong induction (or strong
induction proofs). They differ from proofs by (regular) induction as follows:
In the induction step of a strong induction proof, you can use not just the pre-
ceding statement P (n − 1), but also all the statements before it (P (n − 2) and
P (n − 3) and so on, all the way down to P (b)). In other words, the induc-
tion hypothesis is now stronger (thus the name “strong induction”). Roughly
speaking, strong induction is “induction with a long memory” (as opposed to
regular induction, whose memory is only 1 step long).14
(We will later see a slightly nicer form of strong induction, in which the base
case is incorporated in the induction step.)
• you have proved the implication P (0) =⇒ P (1) (this is the induction step
for n = 1), so you conclude that P (1) holds (since P (0) holds);
by regular induction on n, and then you can derive P (n) from Q (n).)
• you have proved the implication ( P (0) AND P (1)) =⇒ P (2) (this is the
induction step for n = 2), so you conclude that P (2) holds (since P (0)
and P (1) hold);
• you have proved the implication ( P (0) AND P (1) AND P (2)) =⇒ P (3)
(this is the induction step for n = 3), so you can conclude that P (3) holds
(since P (0) and P (1) and P (2) hold);
• and so on.
\[ \text{“} f_n = \frac{\varphi^n - \psi^n}{\sqrt{5}} \text{”} \]
for each n ≥ 0, and we apply the Principle of Strong Induction (for b = 0) to
prove this statement P (n) for each n ≥ 0.
Base case: As above, we check that Binet’s formula (i.e., the statement P (n))
holds for n = 0.
Induction step: Let n > 0 be an integer. We must prove the implication
\[ \left( P(0) \text{ AND } P(1) \text{ AND } P(2) \text{ AND } \cdots \text{ AND } P(n-1) \right) \Longrightarrow P(n) . \]
Thus, we assume that P (0) AND P (1) AND P (2) AND · · · AND P (n − 1)
holds. In other words, we assume that Binet’s formula holds for 0, for 1, for 2,
and so on, all the way up to n − 1. (In other words, we assume that $f_k = \frac{\varphi^k - \psi^k}{\sqrt{5}}$
for each k ∈ {0, 1, . . . , n − 1}.)
We have to prove P (n). In other words, we have to prove that Binet’s formula
also holds for n. In other words, we have to prove that $f_n = \frac{\varphi^n - \psi^n}{\sqrt{5}}$.
We assumed that Binet’s formula holds for n − 1. That is, we have $f_{n-1} = \frac{\varphi^{n-1} - \psi^{n-1}}{\sqrt{5}}$.
We assumed that Binet’s formula holds for n − 2. That is, we have $f_{n-2} = \frac{\varphi^{n-2} - \psi^{n-2}}{\sqrt{5}}$.
As we have seen above, we have $\varphi^2 = \varphi + 1$ and $\psi^2 = \psi + 1$.
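\begin{align*}
f_n &= f_{n-1} + f_{n-2} = \frac{\varphi^{n-1} - \psi^{n-1}}{\sqrt{5}} + \frac{\varphi^{n-2} - \psi^{n-2}}{\sqrt{5}} \\
&\qquad \left( \text{since } f_{n-1} = \frac{\varphi^{n-1} - \psi^{n-1}}{\sqrt{5}} \text{ and } f_{n-2} = \frac{\varphi^{n-2} - \psi^{n-2}}{\sqrt{5}} \right) \\
&= \frac{1}{\sqrt{5}} \left( \varphi^{n-1} - \psi^{n-1} + \varphi^{n-2} - \psi^{n-2} \right) \\
&= \frac{1}{\sqrt{5}} \Bigl( \underbrace{\left( \varphi^{n-1} + \varphi^{n-2} \right)}_{= \varphi^{n-2} (\varphi + 1)} - \underbrace{\left( \psi^{n-1} + \psi^{n-2} \right)}_{= \psi^{n-2} (\psi + 1)} \Bigr) \\
&= \frac{1}{\sqrt{5}} \Bigl( \varphi^{n-2} \underbrace{(\varphi + 1)}_{= \varphi^2} - \psi^{n-2} \underbrace{(\psi + 1)}_{= \psi^2} \Bigr) \\
&= \frac{1}{\sqrt{5}} \Bigl( \underbrace{\varphi^{n-2} \varphi^2}_{= \varphi^n} - \underbrace{\psi^{n-2} \psi^2}_{= \psi^n} \Bigr) = \frac{1}{\sqrt{5}} \left( \varphi^n - \psi^n \right) = \frac{\varphi^n - \psi^n}{\sqrt{5}} .
\end{align*}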
Note that we have had to handle the two cases n = 0 and n = 1 by hand
in our above proof, because we had to reach “2 steps back” in memory in the
induction step (i.e., we had to apply the induction hypothesis both to n − 1 and
to n − 2). 15 The case n = 0 was our base case, whereas the case n = 1 was
part of the induction step, but nevertheless had to be singled out for special
treatment (since n − 2 is negative for n = 1). Nevertheless, it makes sense to
think of the n = 1 case as a “second base case”, even if it is de-jure part of the
induction step.
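Binet’s formula can also be tested by exact computation, which may help to make the interplay between the irrational numbers φ, ψ and the integers $f_n$ less mysterious. The Python sketch below assumes that ψ = (1 − √5)/2 (the usual conjugate of φ = (1 + √5)/2) and represents a number a + b√5 with rational a, b as the pair (a, b), so no floating-point rounding is involved. The names `mul`, `binet` and `fib` are illustrative only.

```python
from fractions import Fraction


def mul(x, y):
    """Multiply x = (a, b) and y = (c, d), where (a, b) stands for a + b*sqrt(5)."""
    a, b = x
    c, d = y
    # (a + b*sqrt5)(c + d*sqrt5) = (ac + 5bd) + (ad + bc)*sqrt5
    return (a * c + 5 * b * d, a * d + b * c)


def binet(n):
    """Evaluate (phi^n - psi^n) / sqrt(5) exactly, for an integer n >= 0."""
    half = Fraction(1, 2)
    phi = (half, half)    # phi = (1 + sqrt(5)) / 2
    psi = (half, -half)   # psi = (1 - sqrt(5)) / 2  (assumed definition)
    p = q = (Fraction(1), Fraction(0))  # phi^0 = psi^0 = 1
    for _ in range(n):
        p, q = mul(p, phi), mul(q, psi)
    rational, root = p[0] - q[0], p[1] - q[1]  # phi^n - psi^n = rational + root*sqrt5
    assert rational == 0  # the rational parts of phi^n and psi^n cancel
    return root           # (root * sqrt5) / sqrt5 = root


def fib(n):
    """Return f_n computed from the recursion f_n = f_{n-1} + f_{n-2}."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


assert all(binet(n) == fib(n) for n in range(40))
```

Of course, such a check only covers finitely many n; the strong induction above is what proves the formula for all n ≥ 0.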
holds.
How does this restated principle work without a base case? Easy: We have
just repackaged the base case into the induction step. Indeed, note that the
induction step now says “n ≥ b”, not “n > b”. In particular, this means that the
implication
our base case. So we have not magically removed the need for a base case;
we just have merged it into the induction step. Nevertheless, this makes for a
slightly cleaner version of strong induction.
Definition 1.9.6. A prime (or prime number) means an integer p > 1 whose
only positive divisors are 1 and p.
Exercise 1.9.1. Assume that you have 3-cent coins and 5-cent coins (each in
infinite supply). What denominations can you pay with these coins?
Let’s make a table (“yes” means that you can pay it; “no” means that you
can’t):
  Denomination   Payable?
   0 cents       yes
   1 cent        no
   2 cents       no
   3 cents       yes
   4 cents       no
   5 cents       yes
   6 cents       yes: 2 · 3
   7 cents       no
   8 cents       yes: 3 + 5
   9 cents       yes: 3 · 3
  10 cents       yes: 2 · 5
  11 cents       yes: 2 · 3 + 5
  12 cents       yes: 4 · 3
  13 cents       yes: 3 + 2 · 5
  ···            ···
Experimentally, we seem to observe that any denomination ≥ 8 cents can be
paid. Why?
We can notice that if a denomination k (that is, k cents) can be paid, then
so can k + 3 (just add a 3-cent coin). Thus, because we can pay 8 cents, we
can also pay 11, 14, 17, . . . cents. Because we can pay 9 cents, we can also pay
12, 15, 18, . . . cents. Because we can pay 10 cents, we can also pay 13, 16, 19, . . .
cents. Together, these three sequences account for all the integers ≥ 8. Thus,
any denomination of ≥ 8 cents can be paid.
Let us formalize this argument as an induction proof.
We define N to be the set of all nonnegative integers:
N = {0, 1, 2, . . .} .
Proposition 1.9.8. For any integer n ≥ 8, we can pay n cents with 3-cent and
5-cent coins. In other words, any integer n ≥ 8 can be written as n = 3a + 5b
with a, b ∈ N.
In other words, we must prove that we can pay n cents with 3-cent and 5-cent
coins.
We are in one of the following three cases (since n > 8):
Case 1: We have n = 9.
Case 2: We have n = 10.
Case 3: We have n ≥ 11.
In Case 1, we are done, since n = 9 = 3 · 3 + 5 · 0 (that is, n cents can be paid
with three 3-cent coins).
In Case 2, we are done, since n = 10 = 3 · 0 + 5 · 2 (that is, n cents can be paid
with two 5-cent coins).
Now, consider Case 3. In this case, we have n ≥ 11. Hence, n − 3 ≥ 8. This
shows that n − 3 is one of the numbers 8, 9, . . . , n − 1.
Thus, we can apply the induction hypothesis to n − 3. We conclude that
n − 3 cents can be paid with 3-cent and 5-cent coins, i.e., we can write n − 3 as
n − 3 = 3c + 5d with c, d ∈ N. Using these c, d ∈ N, we therefore have
n = 3 + 3c + 5d (since n − 3 = 3c + 5d)
= 3 (c + 1) + 5d,
which shows that n cents can also be paid with 3-cent and 5-cent coins. This
shows that the proposition is true for n, and thus the induction step is complete.
The proposition is thus proved.
Note that the above proof had one “de-jure base case” (the case n = 8) and
two “de-facto base cases” (the cases n = 9 and n = 10, which were formally
part of the induction step but had to be treated separately because n − 3 would
be smaller than 8 in these cases). We could have just as well used the baseless
form of strong induction, in which case we would have to treat all three of these
cases as “de-facto base cases”. This would be a bit more uniform, although this
is entirely a matter of taste.
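The strong induction in this proof is effectively a recursive algorithm: the three small cases are answered directly, and every n ≥ 11 is reduced to n − 3. The following Python sketch makes this explicit. The function name `pay` and the returned pair (a, b) (meaning a coins of 3 cents and b coins of 5 cents) are illustrative choices, not notation from the notes.

```python
def pay(n):
    """Return (a, b) with n == 3*a + 5*b and a, b >= 0, for an integer n >= 8.

    This mirrors the proof: n = 8, 9, 10 are handled by hand, and each
    n >= 11 is reduced to n - 3 by adding one more 3-cent coin.
    """
    if n < 8:
        raise ValueError("the proposition only covers n >= 8")
    by_hand = {8: (1, 1), 9: (3, 0), 10: (0, 2)}
    if n in by_hand:
        return by_hand[n]
    c, d = pay(n - 3)   # "induction hypothesis" applied to n - 3 >= 8
    return (c + 1, d)   # n = 3(c + 1) + 5d


for n in range(8, 200):
    a, b = pay(n)
    assert a >= 0 and b >= 0 and 3 * a + 5 * b == n
```

The recursion always terminates in one of the three hand-checked cases, which is the computational counterpart of the “de-jure” and “de-facto” base cases discussed above.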
yield $\frac{\left( 3^{n-1} \right)^2}{3^{n-2}} = 3^{2 \cdot (n-1) - (n-2)} = 3^n$). In view of $3^{n-1} = 1$ and $3^{n-2} = 1$, this
rewrites as $3^n = \frac{1^2}{1} = 1$. This completes the induction step, and thus the
claim is proved.
\begin{align*}
f_{-1} &= f_1 - f_0 = 1 - 0 = 1; \\
f_{-2} &= f_0 - f_{-1} = 0 - 1 = -1; \\
f_{-3} &= f_{-1} - f_{-2} = 1 - (-1) = 2; \\
f_{-4} &= f_{-2} - f_{-3} = (-1) - 2 = -3; \\
&\;\; \vdots
\end{align*}
Thus, we gradually extend the Fibonacci sequence to the left, obtaining a “two-sided
sequence” $( \ldots , f_{-2} , f_{-1} , f_0 , f_1 , f_2 , \ldots )$ that is “infinite in both directions”.
By virtue of its construction, it satisfies $f_n = f_{n-1} + f_{n-2}$ not only for all n ≥ 2,
but also for all integers n. However, a quick look at the first (say) 7 “extended”
Fibonacci numbers to the left of $f_0$ reveals that they are not as new as they
might seem: They are just copies of the positive Fibonacci numbers with signs.
More precisely, it looks like we have
Exercise 1.10.2. (a) Try to prove (5) directly by induction on n. (So the
induction step involves assuming that $f_{-n} = (-1)^{n-1} f_n$ and proving that
$f_{-(n+1)} = (-1)^n f_{n+1}$. Don’t use strong induction yet!) Does this work?
(b) Now, instead, try to prove the stronger claim that “$f_{-n} = (-1)^{n-1} f_n$
and $f_{-n+1} = (-1)^{n-2} f_{n-1}$ for each n ≥ 0” by induction on n. Does this
work?
(c) Now, prove (5) by strong induction on n.
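Before attempting a proof, it can be reassuring to test the conjectured pattern numerically. The Python sketch below extends the Fibonacci sequence to the left using $f_n = f_{n+2} - f_{n+1}$ (a rearrangement of the recursion) and compares $f_{-n}$ with $(-1)^{n-1} f_n$. The function name `extended_fib` and the dictionary representation are illustrative assumptions; this is an experiment, not a solution to the exercise.

```python
def extended_fib(lo, hi):
    """Return a dict mapping n to f_n for lo <= n <= hi (needs lo <= 0 and hi >= 1)."""
    f = {0: 0, 1: 1}
    for n in range(2, hi + 1):        # to the right: f_n = f_{n-1} + f_{n-2}
        f[n] = f[n - 1] + f[n - 2]
    for n in range(-1, lo - 1, -1):   # to the left:  f_n = f_{n+2} - f_{n+1}
        f[n] = f[n + 2] - f[n + 1]
    return f


f = extended_fib(-20, 20)
for n in range(0, 21):
    sign = 1 if n % 2 == 1 else -1    # equals (-1)^(n-1)
    assert f[-n] == sign * f[n]
```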
Exercise 1.10.3. Let n ≥ 0 be an integer, and let k ∈ {1, 2, . . . , n}. In the proof
of Proposition 1.1.3, we presented a certain strategy for solving the Tower of
Hanoi puzzle with n disks.
Prove that the k-th largest disk is moved exactly $2^{k-1}$ many times during
this strategy.
\[ a_0 = 2, \quad a_1 = 3, \qquad a_n = 3 a_{n-1} - 2 a_{n-2} \text{ for all } n \geq 2 . \]
\[ a_0 = 2, \quad a_1 = 1, \qquad a_n = a_{n-1} + 6 a_{n-2} \text{ for all } n \geq 2 . \]
Exercise 1.10.7. Recall the Fibonacci sequence (Definition 1.5.1) again. Let
n ≥ 0. Prove that
$f_{3n}$ is even;
$f_{3n+1}$ is odd;
$f_{3n+2}$ is odd.
(In this exercise, you can freely use basic properties of even and odd num-
bers – such as Proposition 3.3.8.)
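\[ t_0 = 1, \quad t_1 = 1, \quad t_2 = 1, \quad \text{and} \quad t_n = \frac{1 + t_{n-1} t_{n-2}}{t_{n-3}} \quad \text{for each } n \geq 3 . \]
\begin{align*}
t_3 &= \frac{1 + t_2 t_1}{t_0} = \frac{1 + 1 \cdot 1}{1} = 2; \\
t_4 &= \frac{1 + t_3 t_2}{t_1} = \frac{1 + 2 \cdot 1}{1} = 3; \\
t_5 &= \frac{1 + t_4 t_3}{t_2} = \frac{1 + 3 \cdot 2}{1} = 7; \\
t_6 &= \frac{1 + t_5 t_4}{t_3} = \frac{1 + 7 \cdot 3}{2} = 11,
\end{align*}
and so on.)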
(a) Prove that $t_{n+2} = 4 t_n - t_{n-2}$ for each n ≥ 2.
(b) Prove that $t_n$ is a positive integer for each integer n ≥ 0.
[Hint: Use regular induction for part (a) and strong induction for part (b).
Note that the “positive” part is clear from the definition, so you only need to
prove the “integer” part in (b).]
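Since the recursion for $t_n$ involves a division, it is not obvious that the values stay integers. A quick experiment with exact rational arithmetic can make both claims plausible before you prove them. The Python sketch below is only such an experiment; the name `t_sequence` is an illustrative choice.

```python
from fractions import Fraction


def t_sequence(count):
    """Return [t_0, t_1, ..., t_{count-1}] as exact fractions (count >= 3)."""
    t = [Fraction(1), Fraction(1), Fraction(1)]
    for n in range(3, count):
        t.append((1 + t[n - 1] * t[n - 2]) / t[n - 3])
    return t


t = t_sequence(30)
# Part (b), experimentally: every computed value has denominator 1.
assert all(x.denominator == 1 and x > 0 for x in t)
# Part (a), experimentally: t_{n+2} = 4 t_n - t_{n-2} for all n >= 2 in range.
assert all(t[n + 2] == 4 * t[n] - t[n - 2] for n in range(2, 28))
```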
Exercise 1.10.9. (a) Prove the following: For any integer n ≥ 12, we can pay
n cents with 3-cent and 7-cent coins. In other words, any integer n ≥ 12 can
be written as n = 3a + 7b with a, b ∈ N. (Here, again, N = {0, 1, 2, . . .}.)
(b) Find the largest integer k such that k cents cannot be paid with 2-cent
and 13-cent coins. Prove that for every integer n > k, we can pay n cents
with these kinds of coins.
(c) Is there a largest integer k such that k cents cannot be paid with 2-cent
and 6-cent coins?
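• The product AX of two 2 × 2-matrices $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ and $X = \begin{pmatrix} x & y \\ z & w \end{pmatrix}$
is defined to be $\begin{pmatrix} ax + bz & ay + bw \\ cx + dz & cy + dw \end{pmatrix}$.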
Exercise 1.10.12. Let m ∈ N. Prove that there exists a way to arrange the
first m positive integers (1, 2, . . . , m) in a row in such a way that the average
of two numbers never stands between these two numbers.
(For example, for m = 8, one such arrangement is 1, 5, 3, 7, 2, 6, 4, 8. The
arrangement 1, 3, 2, 7, 8, 5, 6, 4 is invalid because the average of 1 and 5 is 3,
which stands between 1 and 5.)
[Hint: First show that there is such an arrangement when m is a power of
2 (that is, when $m = 2^n$ for some n ∈ N). Then, choose a sufficiently large
power of 2 and remove all entries larger than m.]
More advanced and creative uses of induction can be found in [Grinbe20, Chap-
ter 2], [Grinbe23b, Lecture 1], [AndCri17] and [Gunder10].
(in Section 1.6). Such sums can be tricky to decipher: You need to guess the
pattern of the addends to understand what the “· · · ” means. There is a notation
that makes such sums both shorter and easier to understand. This is the finite
sum notation (also known as the sigma notation). In its simplest form, it is
defined as follows:
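\[ a_u + a_{u+1} + \cdots + a_v . \]
For example:
\begin{align*}
\sum_{k=5}^{10} k &= 5 + 6 + 7 + 8 + 9 + 10 = 45; \\
\sum_{k=5}^{10} \frac{1}{k} &= \frac{1}{5} + \frac{1}{6} + \frac{1}{7} + \frac{1}{8} + \frac{1}{9} + \frac{1}{10} = \frac{2131}{2520}; \\
\sum_{k=5}^{10} k^k &= 5^5 + 6^6 + 7^7 + 8^8 + 9^9 + 10^{10}; \\
\sum_{k=5}^{5} k &= 5; \\
\sum_{k=5}^{4} k &= 0 \qquad \text{(an empty sum)}; \\
\sum_{k=5}^{3} k &= 0 \qquad \text{(an empty sum)}; \\
\sum_{k=5}^{8} 3 &= 3 + 3 + 3 + 3 = 12 \qquad \text{(a sum of four equal terms)}; \\
\sum_{k=0}^{n-1} q^k &= q^0 + q^1 + \cdots + q^{n-1} \qquad \text{for any } n \in \mathbb{N} \text{ and any number } q; \\
\sum_{k=0}^{n-1} x^k y^{n-1-k} &= x^0 y^{n-1} + x^1 y^{n-2} + x^2 y^{n-3} + \cdots + x^{n-3} y^2 + x^{n-2} y^1 + x^{n-1} y^0 \\
&= y^{n-1} + x y^{n-2} + x^2 y^{n-3} + \cdots + x^{n-3} y^2 + x^{n-2} y + x^{n-1} \\
&= x^{n-1} + x^{n-2} y + x^{n-3} y^2 + \cdots + x^2 y^{n-3} + x y^{n-2} + y^{n-1} \\
&\qquad \text{for any } n \in \mathbb{N} \text{ and any numbers } x \text{ and } y .
\end{align*}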
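Finite sum notation translates almost literally into a loop over the integers from u to v. The Python sketch below (with the illustrative helper name `finite_sum`) recomputes some of the examples above; note that `range(u, v + 1)` is empty when v < u, which matches the convention that an empty sum is 0.

```python
from fractions import Fraction


def finite_sum(u, v, term):
    """Return term(u) + term(u+1) + ... + term(v); an empty sum (v < u) is 0."""
    return sum(term(k) for k in range(u, v + 1))


assert finite_sum(5, 10, lambda k: k) == 45
assert finite_sum(5, 10, lambda k: Fraction(1, k)) == Fraction(2131, 2520)
assert finite_sum(5, 5, lambda k: k) == 5
assert finite_sum(5, 4, lambda k: k) == 0   # an empty sum
assert finite_sum(5, 3, lambda k: k) == 0   # an empty sum
assert finite_sum(5, 8, lambda k: 3) == 12  # a sum of four equal terms
```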
Just don’t make it $\sum_{u=u}^{v} a_u$.
We have not computed this last sum, so let us do this. I will use the following
“laws of summation”:
• We have
\[ \sum_{k=u}^{v} ( a_k - b_k ) = \sum_{k=u}^{v} a_k - \sum_{k=u}^{v} b_k \tag{6} \]
for any integers u, v and any numbers $a_k$, $b_k$. Indeed, if you rewrite this
without finite sum notation, it takes the form
\begin{align*}
& ( a_u - b_u ) + ( a_{u+1} - b_{u+1} ) + \cdots + ( a_v - b_v ) \\
&= ( a_u + a_{u+1} + \cdots + a_v ) - ( b_u + b_{u+1} + \cdots + b_v ) ,
\end{align*}
• We have
\[ \sum_{k=u}^{v} \lambda a_k = \lambda \sum_{k=u}^{v} a_k \tag{7} \]
for any integers u, v and any numbers λ, $a_k$. Indeed, rewritten without the
use of finite sum notation, this is just saying that
Rules like this are a dime a dozen, and you should be able to come up with
them on the spot when you need them. (See [Grinbe15, §1.4.2] for these and
several others.)
(Theorem 1.3.1) using finite sum notation. We will need three new rules this
time:
• We have
\[ \sum_{k=u}^{v} a_k + \sum_{k=u}^{v} b_k = \sum_{k=u}^{v} ( a_k + b_k ) \tag{9} \]
for any integers u, v and any numbers $a_k$, $b_k$. Indeed, if you rewrite this
without finite sum notation, it takes the form
\begin{align*}
& ( a_u + a_{u+1} + \cdots + a_v ) + ( b_u + b_{u+1} + \cdots + b_v ) \\
&= ( a_u + b_u ) + ( a_{u+1} + b_{u+1} ) + \cdots + ( a_v + b_v ) .
\end{align*}
• We have
\[ \sum_{k=u}^{v} a_k = \sum_{k=u}^{v} a_{u+v-k} \tag{10} \]
for any integers u, v and any numbers $a_k$. This is called “substituting
u + v − k for k in the sum” or just “turning the sum upside-down”, as
it amounts to reversing the order of the addends; restated without finite
sum notation, this is just saying that
\[ a_u + a_{u+1} + \cdots + a_v = a_v + a_{v-1} + \cdots + a_u , \]
19 This use of the word “limit” is totally unrelated to the way this word is used in
analysis/calculus.
There are many similarities between finite sums $\sum_{k=u}^{v} a_k$ and integrals $\int_u^v f(x) \, dx$,
but the analogy should not be taken too far (e.g., an integral $\int_u^u f(x) \, dx$ whose
upper and lower limit are equal will always be 0, but an “analogous” finite sum
$\sum_{k=u}^{u} a_k$ will be $a_u$).
for any integers u ≤ w ≤ v and any numbers $a_k$. This is just saying that
\[ a_u + a_{u+1} + \cdots + a_v = ( a_u + a_{u+1} + \cdots + a_w ) + ( a_{w+1} + a_{w+2} + \cdots + a_v ) . \]
(Strictly speaking, this is true not just for u ≤ w ≤ v but more generally
for u − 1 ≤ w ≤ v. If you find this confusing, recall that an empty sum
equals 0 by definition.)
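The summation rules (6), (7), (9), (10) and the splitting rule just discussed can all be spot-checked on a concrete example. The Python sketch below does this for one arbitrary choice of addends and limits; the helper `S` and the sample functions `a`, `b` are assumptions made for illustration. Note that the splitting rule is also tested for w = u − 1, where the first sum is empty.

```python
def S(u, v, a):
    """Return a(u) + a(u+1) + ... + a(v); an empty sum (v < u) is 0."""
    return sum(a(k) for k in range(u, v + 1))


def a(k):
    return k * k + 1   # arbitrary sample addends


def b(k):
    return 3 * k - 2   # arbitrary sample addends


u, v, lam = 2, 9, 5

assert S(u, v, lambda k: a(k) - b(k)) == S(u, v, a) - S(u, v, b)   # rule (6)
assert S(u, v, lambda k: lam * a(k)) == lam * S(u, v, a)           # rule (7)
assert S(u, v, a) + S(u, v, b) == S(u, v, lambda k: a(k) + b(k))   # rule (9)
assert S(u, v, a) == S(u, v, lambda k: a(u + v - k))               # rule (10)

# Splitting a sum at w, for every w with u - 1 <= w <= v.
for w in range(u - 1, v + 1):
    assert S(u, v, a) == S(u, w, a) + S(w + 1, v, a)
```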
Finite sum notation, in the form defined above, is helpful when the summa-
tion index is running over an integer interval (i.e., a set of consecutive integers).
For more general situations, there is a more general version of finite sum nota-
tion, e.g.:
\[ \sum_{k \in \{1, 2, \ldots, n\} \text{ is even}} k = 2 + 4 + 6 + \cdots + m , \]
where m is the largest even element of {1, 2, . . . , n}. We won’t use it much, but
it is fairly self-explanatory; essentially, the writing under the summation sign
explains what k’s the sum is ranging over. See [Grinbe15, §1.4.1] for a more
precise explanation.
Exercise 2.1.3. The floor ⌊x⌋ of a real number x means the largest integer that
is smaller than or equal to x. For instance, ⌊6.2⌋ = 6 and ⌊7.7⌋ = 7 and ⌊8⌋ = 8.
(In other words, ⌊x⌋ is what you get if you round x down. Beware: ⌊−1.3⌋ is
−2, not −1.)
Let n ∈ N. Prove that
\[ \sum_{k=1}^{n} \left\lfloor \frac{k}{2} \right\rfloor = \left\lfloor \frac{n}{2} \right\rfloor \cdot \left\lfloor \frac{n+1}{2} \right\rfloor . \]
(In this exercise, you can freely use basic properties of even and odd num-
bers – such as Proposition 3.3.8.)
Exercise 2.1.4. Let n ∈ N, and let q be any number distinct from 1. Prove
that
\[ \sum_{k=1}^{n} k q^k = q \cdot \frac{n q^{n+1} - (n+1) q^n + 1}{(q-1)^2} . \]
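\[ a_u a_{u+1} \cdots a_v . \]
For example:
\begin{align*}
\prod_{k=5}^{10} k &= 5 \cdot 6 \cdot 7 \cdot 8 \cdot 9 \cdot 10 = 151\,200; \\
\prod_{k=1}^{5} \frac{1}{k} &= \frac{1}{1} \cdot \frac{1}{2} \cdot \frac{1}{3} \cdot \frac{1}{4} \cdot \frac{1}{5} = \frac{1}{120}; \\
\prod_{k=5}^{5} \frac{1}{k} &= \frac{1}{5}; \\
\prod_{k=6}^{5} \frac{1}{k} &= 1 \qquad \text{(an empty product)}; \\
\prod_{k=1}^{n} a &= \underbrace{a a \cdots a}_{n \text{ times}} = a^n \qquad \text{for any fixed number } a \text{ and any } n \in \mathbb{N}; \\
\prod_{k=1}^{n} a^k &= a^1 a^2 \cdots a^n = a^{1 + 2 + \cdots + n} \\
&\qquad \left( \text{by one of the laws of exponents: namely, the law } a^{i_1} a^{i_2} \cdots a^{i_n} = a^{i_1 + i_2 + \cdots + i_n} \right) \\
&= a^{n(n+1)/2} \qquad \text{for any fixed number } a \text{ and any } n \in \mathbb{N} .
\end{align*}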
In a finite product $\prod_{k=u}^{v} a_k$, the k is called the product index or the running
index (see footnote 20), and the symbol ∏ is called the product sign. The numbers $a_k$ are
called the factors of the product. Other terminology is analogous to the case of
a finite sum (e.g., lower limit, upper limit). Almost all rules for finite sums have
analogues for finite products. Let me only state the analogues of the “splitting-off
rule” and of the rule (6):
20 And just like in a sum, you can use any letter for it (unless it already stands for something
different).
• The “splitting-off rule” for products: For any integers u ≤ v and any
numbers $a_u, a_{u+1}, \ldots, a_v$, we have
\[ \prod_{k=u}^{v} a_k = \left( \prod_{k=u}^{v-1} a_k \right) a_v = a_u \prod_{k=u+1}^{v} a_k . \]
\[ a_u a_{u+1} \cdots a_v = ( a_u a_{u+1} \cdots a_{v-1} ) a_v = a_u ( a_{u+1} a_{u+2} \cdots a_v ) . \]
This rule allows us to split the first or the last factor out of a finite product.
This is important for proofs by induction.
for any integers u, v and any numbers $a_k$, $b_k$, as long as the numbers $b_k$ are
nonzero. This is an analogue of (6), since the multiplicative counterpart
to subtraction is division. (We had to assume that the $b_k$ are nonzero in
order for the fractions $a_k / b_k$ to be well-defined.)
2.3. Factorials
Now, we define a sequence of integers that appears all over mathematics. Recall
that N = {0, 1, 2, . . .}.
Definition 2.3.1. For any n ∈ N, we define the positive integer n! (called the
factorial of n, and often pronounced “n factorial”) by
\[ n! = \prod_{k=1}^{n} k = 1 \cdot 2 \cdot \cdots \cdot n . \]
For example,
0! = (empty product) = 1;
1! = 1 = 1;
2! = 1 · 2 = 2;
3! = 1 · 2 · 3 = 6;
4! = 1 · 2 · 3 · 4 = 24;
5! = 1 · 2 · 3 · 4 · 5 = 120;
6! = 1 · 2 · 3 · 4 · 5 · 6 = 720;
7! = 5 040;
8! = 40 320;
9! = 362 880;
10! = 3 628 800.
Note the following:
Proposition 2.3.2 (recursion of the factorials). For any positive integer n, we
have
n! = (n − 1)! · n.
1 · 1! + 2 · 2! + 3 · 3! + · · · + n · n! = (n + 1)! − 1
for each n ∈ N.
(Meanwhile, there is no such simple formula for 1! + 2! + 3! + · · · + n!. Not
every sum can be simplified!)
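The definition of n! as a finite product, the recursion n! = (n − 1)! · n, and the identity 1 · 1! + 2 · 2! + · · · + n · n! = (n + 1)! − 1 are all easy to test on a computer. The Python sketch below does so for small n; the function name `fact` is illustrative, and `math.prod` (available in Python 3.8 and later) returns 1 for an empty range, in line with the empty-product convention.

```python
from math import prod


def fact(n):
    """Return n! = 1 * 2 * ... * n for n >= 0; the empty product gives 0! = 1."""
    return prod(range(1, n + 1))


assert fact(0) == 1
assert fact(10) == 3628800

for n in range(1, 15):
    # Recursion of the factorials (Proposition 2.3.2).
    assert fact(n) == fact(n - 1) * n

for n in range(0, 15):
    # 1*1! + 2*2! + ... + n*n! = (n+1)! - 1 (an empty sum for n = 0).
    assert sum(k * fact(k) for k in range(1, n + 1)) == fact(n + 1) - 1
```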
Exercise 2.3.2. (a) Prove that
\[ \prod_{i=2}^{n} \left( 1 - \frac{1}{i^2} \right) = \frac{n+1}{2n} \]
  n     0   1   2   3    4      5         6
  a_n   2   3   7   43   1807   3263443   10650056950807
Definition 2.4.1. Let n and k be any numbers. Then, we define a number $\binom{n}{k}$
as follows:
• If k ∈ N, then we set
\[ \binom{n}{k} := \frac{n (n-1) (n-2) \cdots (n-k+1)}{k!} . \]
(The numerator here is the product of k factors, where the first factor is
n and each further factor is 1 smaller than the previous. You can also
write this product as $\prod_{i=0}^{k-1} (n-i)$.)
• If k ∉ N, then we set
\[ \binom{n}{k} := 0 . \]
The number $\binom{n}{k}$ is called “n choose k”, and is known as the binomial
coefficient of n and k. Do not mistake the notation $\binom{n}{k}$ for a vector $\begin{pmatrix} n \\ k \end{pmatrix}$.
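\begin{align*}
\binom{n}{3} &= \frac{n (n-1) (n-2)}{3!} = \frac{n (n-1) (n-2)}{6}; \\
\binom{n}{2} &= \frac{n (n-1)}{2!} = \frac{n (n-1)}{2}; \\
\binom{n}{1} &= \frac{n}{1!} = n; \\
\binom{n}{0} &= \frac{\text{(empty product)}}{0!} = \frac{1}{1} = 1; \\
\binom{n}{2.5} &= 0 \qquad \text{(since } 2.5 \notin \mathbb{N} \text{)}; \\
\binom{n}{-1} &= 0 \qquad \text{(since } -1 \notin \mathbb{N} \text{)} .
\end{align*}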
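Because Definition 2.4.1 allows arbitrary numbers n and k, it differs from the binomial coefficient functions found in most libraries (which typically expect nonnegative integers). The Python sketch below implements the definition directly, using exact fractions; the function name `binom` and the way membership k ∈ N is tested are illustrative assumptions.

```python
from fractions import Fraction
from math import factorial


def binom(n, k):
    """Return the binomial coefficient of n and k in the sense of Definition 2.4.1.

    n may be any int, Fraction or float; k may be any real number.
    """
    if not (k >= 0 and k == int(k)):   # k is not a nonnegative integer
        return Fraction(0)
    k = int(k)
    numerator = Fraction(1)            # empty product for k = 0
    for i in range(k):
        numerator *= Fraction(n) - i   # factors n, n-1, ..., n-k+1
    return numerator / factorial(k)


assert binom(7, 3) == 35
assert binom(4, 0) == 1
assert binom(5, 2.5) == 0                       # since 2.5 is not in N
assert binom(5, -1) == 0                        # since -1 is not in N
assert binom(-3, 5) == -21                      # negative n is allowed
assert binom(Fraction(1, 2), 2) == Fraction(-1, 8)
```

Converting n to a `Fraction` keeps the arithmetic exact; for a float such as 3.2, this uses the exact binary value of the float, which may differ slightly from the decimal number that was typed.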
Let us tabulate the values of $\binom{n}{k}$ for nonnegative integers n and k:
What patterns can we spot in this table? (We are ignoring negative and non-
integer n’s for now.)
The following is probably the most visible one:
Proposition 2.4.3. Let n ∈ N and k > n. Then, $\binom{n}{k} = 0$.
Remark 2.4.4. Note that Proposition 2.4.3 would not hold without the n ∈ N
assumption. For example,
The product in the numerator is not 0, since it “jumps over” the 0 factor.
Proposition 2.4.3 explains why our above table of $\binom{n}{k}$ has so many zeroes
in it. More precisely, it tells us that all entries above the main diagonal of the
table are zeroes (no matter how many more rows and columns we add). Thus,
we can redraw our table as a triangular table (and fill in a few more rows while
at that):
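n=0 →                 1
n=1 →               1   1
n=2 →             1   2   1
n=3 →           1   3   3   1
n=4 →         1   4   6   4   1
n=5 →       1   5  10  10   5   1
n=6 →     1   6  15  20  15   6   1
n=7 →   1   7  21  35  35  21   7   1
n=8 → 1   8  28  56  70  56  28   8   1
(Each row is labeled by its n. The diagonals running down to the left (↙) are labeled
by k = 0, 1, 2, . . ., starting with the left border of the triangle as k = 0.)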
This table is known as Pascal’s triangle, and has a variety of wonderful prop-
erties. Here are just a few:
\[ \binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k} . \]
• Symmetry of binomial coefficients: For any n ∈ N and any k, we have
\[ \binom{n}{k} = \binom{n}{n-k} . \]
• We have $\binom{n}{n} = 1$ for each n ∈ N.
• Integrality of binomial coefficients: For any n ∈ Z and any k, we have
\[ \binom{n}{k} \in \mathbb{Z} . \]
In the next section, we will prove these four propositions and more.
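The properties listed above can be observed experimentally by generating Pascal’s triangle row by row, where each entry is the sum of the two entries above it. The Python sketch below does this and cross-checks the result against `math.comb` from the standard library (Python 3.8 and later), which computes binomial coefficients for nonnegative integers. The function name `pascal_rows` is an illustrative choice.

```python
from math import comb


def pascal_rows(count):
    """Return the first `count` rows of Pascal's triangle (rows n = 0, ..., count-1)."""
    rows = [[1]]
    for n in range(1, count):
        # Pad the previous row with zeros on both sides; these zeros stand for
        # the binomial coefficients of n-1 with k = -1 and with k = n.
        padded = [0] + rows[-1] + [0]
        rows.append([padded[k] + padded[k + 1] for k in range(n + 1)])
    return rows


rows = pascal_rows(12)
for n, row in enumerate(rows):
    assert row == [comb(n, k) for k in range(n + 1)]   # agrees with math.comb
    assert row == row[::-1]                            # symmetry
    assert row[n] == 1                                 # right border consists of 1's
    assert all(isinstance(x, int) for x in row)        # integrality
```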
\[ \binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k} . \]
Example 2.5.2. For n = 7 and k = 3, this is claiming that $\binom{7}{3} = \binom{6}{2} + \binom{6}{3}$,
which explicitly is saying that 35 = 15 + 20.
But note that Theorem 2.5.1 also can be applied when n or k is negative or
non-integer.
Proof of Theorem 2.5.1. Let n and k be two numbers. We are in one of the follow-
ing three cases:
Case 1: The number k is a positive integer.
Case 2: We have k = 0.
Case 3: None of the above.
Let us first consider Case 1 (this is the interesting case). Here, k is a posi-
tive integer, so that both k and k − 1 belong to N. The definition of binomial
coefficients therefore yields the three formulas
\begin{align*}
\binom{n}{k} &= \frac{n (n-1) (n-2) \cdots (n-k+1)}{k!}; \\
\binom{n-1}{k-1} &= \frac{(n-1) (n-2) (n-3) \cdots ((n-1) - (k-1) + 1)}{(k-1)!} \\
&= \frac{(n-1) (n-2) (n-3) \cdots (n-k+1)}{(k-1)!}; \\
\binom{n-1}{k} &= \frac{(n-1) (n-2) (n-3) \cdots ((n-1) - k + 1)}{k!} \\
&= \frac{(n-1) (n-2) (n-3) \cdots (n-k)}{k!} .
\end{align*}
\[ \binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k} . \]
Using the formulas (13), (14) and (15), this can be rewritten as
\[ \frac{na}{k!} = \frac{a}{(k-1)!} + \frac{a (n-k)}{k!} . \]
\[ na = a \cdot \frac{k!}{(k-1)!} + a (n-k) . \]
Since $\frac{k!}{(k-1)!} = k$ (because the recursion of the factorials (i.e., Proposition 2.3.2)
yields k! = (k − 1)! · k), we can simplify this further to
\[ na = a \cdot k + a (n-k) , \]
\[ \binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k} \]
thus rewrites as
\[ \binom{n}{0} = \binom{n-1}{0-1} + \binom{n-1}{0} , \]
which again is true (because Example 2.4.2 shows that $\binom{n}{0} = 1$ and $\binom{n-1}{0} = 1$
and $\binom{n-1}{0-1} = \binom{n-1}{-1} = 0$).
\[ \binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k} , \]
all three binomial coefficients are 0 (since a binomial coefficient $\binom{m}{\ell}$ is 0 by
definition when ℓ ∉ N). Thus, again, the claim is true (since 0 = 0 + 0).
We have now proved Theorem 2.5.1 in all three cases; thus, it is always true.
Pascal’s identity is highly useful for proving properties of binomial coefficients
$\binom{n}{k}$ by induction on n. (We will see an example of this very soon, in the
proof of Theorem 2.5.9.)
Pascal’s identity shows that every entry of Pascal’s triangle (except the 1 at
the apex) equals the sum of the two entries directly above it (i.e., of the entry one
step northwest of it and the entry one step northeast of it). But it also applies to
binomial coefficients that are not (commonly) considered to be part of Pascal’s
triangle, such as $\binom{-3}{5} = \binom{-4}{4} + \binom{-4}{5}$ and $\binom{3.2}{2} = \binom{2.2}{1} + \binom{2.2}{2}$.
Proof. The definition of $\binom{n}{k}$ yields
\[ \binom{n}{k} = \frac{n (n-1) (n-2) \cdots (n-k+1)}{k!} . \]
One corollary of Theorem 2.5.5 is the fact that the “right border” of Pascal’s
triangle is filled with 1’s:
Corollary 2.5.7. For any n ∈ N, we have $\binom{n}{n} = 1$.
Warning 2.5.8. Corollary 2.5.7 does not hold for negative (or non-integer) n.
Example 2.5.11. Let n = 4 and k = 2 and A = {1, 2, 3, 4}. Then, the 2-element
subsets of A are
We will prove Theorem 2.5.10 later in this course (see Theorem 4.3.3), as we
learn more about finite sets and their sizes. Note that the k-element subsets of
A are also known as combinations without replacement. Theorem 2.5.10 also
explains why $\binom{n}{k}$ is called “n choose k”: After all, a k-element subset of A is a
“choice” of k distinct elements (without regard for order) from A.
Note again that Theorem 2.5.10 says nothing about binomial coefficients $\binom{n}{k}$
with n ∉ N, since a number n ∉ N cannot be the size of a set. So Theorem
2.5.10 explains why $\binom{5}{2}$ is an integer, but does not explain why $\binom{-5}{2}$ is an
integer.
\[ \binom{n+k-1}{k} = \frac{(n+k-1) (n+k-2) (n+k-3) \cdots n}{k!} = \frac{n (n+1) (n+2) \cdots (n+k-1)}{k!} . \]
Comparing these equalities, we find $\binom{-n}{k} = (-1)^k \binom{n+k-1}{k}$. This proves
the theorem.
Corollary 2.5.13. For any n ∈ Z and any number k, we have $\binom{n}{k} \in \mathbb{Z}$.
We will prove Theorem 2.5.14 in Chapter 6 (as Corollary 6.4.8) using enumer-
ative combinatorics. You can find proofs of Theorem 2.5.14 in [Vorobi02, §15]
and in [Grinbe19a, §1.4.5, proof of Proposition 1.3.32] as well.
Theorem 2.6.1 (binomial formula, aka binomial theorem). Let a and b be any
numbers, and let n ∈ N. Then,
\[ (a+b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k} . \tag{17} \]
Equivalently:
\[ (a+b)^n = \sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^k . \tag{18} \]
\begin{align*}
(a+b)^5 &= \sum_{k=0}^{5} \binom{5}{k} a^k b^{5-k} \\
&= \binom{5}{0} a^0 b^5 + \binom{5}{1} a^1 b^4 + \binom{5}{2} a^2 b^3 + \binom{5}{3} a^3 b^2 + \binom{5}{4} a^4 b^1 + \binom{5}{5} a^5 b^0 \\
&= 1 b^5 + 5 a b^4 + 10 a^2 b^3 + 10 a^3 b^2 + 5 a^4 b + 1 a^5 \\
&= b^5 + 5 a b^4 + 10 a^2 b^3 + 10 a^3 b^2 + 5 a^4 b + a^5 .
\end{align*}
\[ (a+b)^2 = b^2 + 2ab + a^2 . \]
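The binomial formula (17) can be spot-checked for integer values of a and b by evaluating both sides. The Python sketch below does this for a small grid of values; the function name `binomial_rhs` is an illustrative choice, and `math.comb` (Python 3.8 and later) supplies the binomial coefficients for n, k ∈ N.

```python
from math import comb


def binomial_rhs(a, b, n):
    """Return the right hand side of (17): the sum of C(n,k) * a^k * b^(n-k) over k = 0..n."""
    return sum(comb(n, k) * a ** k * b ** (n - k) for k in range(n + 1))


for a in range(-3, 4):
    for b in range(-3, 4):
        for n in range(0, 8):
            assert (a + b) ** n == binomial_rhs(a, b, n)
```

Note that Python evaluates `0 ** 0` as 1, which agrees with the convention used in the base case n = 0 of the proof.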
Proof of Theorem 2.6.1. Clearly, the formula (18) is just the formula (17) with the
variables a and b swapped (since b + a = a + b). Thus, it will suffice to prove
(17).
We will prove (17) by induction on n:
Base case: For n = 0, this formula (17) is true, since
\[ (a+b)^0 = 1 \qquad \text{and} \qquad \sum_{k=0}^{0} \binom{0}{k} a^k b^{0-k} = \underbrace{\binom{0}{0}}_{=1} \underbrace{a^0}_{=1} \underbrace{b^{0-0}}_{= b^0 = 1} = 1 . \]
Induction step: Let n ∈ N. We assume (as the induction hypothesis) that the
formula (17) holds for n. In other words, we assume that
\[ (a+b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k} . \tag{19} \]
We must show that the formula (17) also holds for n + 1. In other words, we
must prove that
\[ (a+b)^{n+1} = \sum_{k=0}^{n+1} \binom{n+1}{k} a^k b^{n+1-k} . \tag{20} \]
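Indeed, we have
\begin{align*}
(a+b)^{n+1} &= (a+b)^n \cdot (a+b) \\
&= \left( \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k} \right) \cdot (a+b) \qquad \text{(by (19))} \\
&= \left( \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k} \right) \cdot a + \left( \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k} \right) \cdot b \\
&= \sum_{k=0}^{n} \binom{n}{k} \underbrace{a^k b^{n-k} a}_{= a^{k+1} b^{n-k}} + \sum_{k=0}^{n} \binom{n}{k} a^k \underbrace{b^{n-k} b}_{= b^{n-k+1}} \\
&\qquad \left( \text{by distributivity for finite sums, i.e., by the rule } \left( \sum_{s=u}^{v} a_s \right) c = \sum_{s=u}^{v} a_s c \right) \\
&= \sum_{k=0}^{n} \binom{n}{k} a^{k+1} b^{n-k} + \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k+1} . \tag{21}
\end{align*}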
So it remains to prove that the first sums on the right hand sides of (21) and
(22) are equal. In other words, it remains to prove that
\[ \sum_{k=0}^{n} \binom{n}{k} a^{k+1} b^{n-k} = \sum_{k=1}^{n+1} \binom{n}{k-1} a^k b^{n+1-k} . \tag{23} \]
But this becomes clear if we observe that these two sums contain the exact
same addends: Indeed, written out without using summation signs, both sums
become
\[ \binom{n}{0} a^1 b^n + \binom{n}{1} a^2 b^{n-1} + \binom{n}{2} a^3 b^{n-2} + \cdots + \binom{n}{n} a^{n+1} b^0 . \]
This argument can be made more rigorously using an important summation
rule, known as substitution. In its simplest form, this rule says that
\[ \sum_{k=u}^{v} c_k = \sum_{k=u+\delta}^{v+\delta} c_{k-\delta} \tag{24} \]
for any integers u, v, δ and any numbers $c_u, c_{u+1}, \ldots, c_v$. This is the discrete
analogue of the formula
\[ \int_u^v f(x) \, dx = \int_{u+\delta}^{v+\delta} f(x - \delta) \, dx \]
from real analysis. A formal proof of (24) can easily be given by induction on v,
but intuitively (24) should be obvious (since both sides are $c_u + c_{u+1} + \cdots + c_v$).
When we use (24) to rewrite a sum of the form $\sum_{k=u}^{v} c_k$ as $\sum_{k=u+\delta}^{v+\delta} c_{k-\delta}$, we say
that we are substituting k − δ for k in the sum. For example, taking u = 4 and
v = 9 and $c_k = k^k$ and δ = −2, we see that
\[ \sum_{k=4}^{9} k^k = \sum_{k=4+(-2)}^{9+(-2)} (k - (-2))^{k-(-2)} = \sum_{k=2}^{7} (k+2)^{k+2} . \]
Now, substituting k − 1 for k in the sum $\sum_{k=0}^{n} \binom{n}{k} a^{k+1} b^{n-k}$, we obtain
\[ \sum_{k=0}^{n} \binom{n}{k} a^{k+1} b^{n-k} = \sum_{k=1}^{n+1} \binom{n}{k-1} \underbrace{a^{(k-1)+1}}_{= a^k} \underbrace{b^{n-(k-1)}}_{= b^{n+1-k}} = \sum_{k=1}^{n+1} \binom{n}{k-1} a^k b^{n+1-k}. \]
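As a numerical sanity check (not a proof), the substitution rule (24) and the index shift in (23) can be tested on concrete numbers. The following is a minimal Python sketch; the helper names `shifted_sum_matches` and `binomial_shift_matches` are illustrative and do not come from these notes, and `math.comb` is used for the binomial coefficients.

```python
from math import comb

def shifted_sum_matches(c, u, v, delta):
    """Check rule (24): sum_{k=u}^{v} c(k) == sum_{k=u+delta}^{v+delta} c(k-delta)."""
    left = sum(c(k) for k in range(u, v + 1))
    right = sum(c(k - delta) for k in range(u + delta, v + delta + 1))
    return left == right

def binomial_shift_matches(a, b, n):
    """Check (23): substituting k-1 for k in sum_{k=0}^{n} C(n,k) a^(k+1) b^(n-k)."""
    left = sum(comb(n, k) * a ** (k + 1) * b ** (n - k) for k in range(0, n + 1))
    right = sum(comb(n, k - 1) * a ** k * b ** (n + 1 - k) for k in range(1, n + 2))
    return left == right

# The example from the text: u = 4, v = 9, c_k = k^k, delta = -2.
print(shifted_sum_matches(lambda k: k ** k, 4, 9, -2))
# One sample instance of (23); the values a = 3, b = 5, n = 6 are arbitrary.
print(binomial_shift_matches(3, 5, 6))
```

Both sums are computed with exact integer arithmetic, so the comparison is not affected by rounding.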
Exercise 2.6.2. Recall the Fibonacci sequence (Definition 1.5.1). Prove that
every n ∈ N and m ∈ N satisfy
\[ \sum_{k=0}^{n} \binom{n}{k} f_{m+k} = f_{m+2n}. \]
Exercise 2.6.3. (a) Prove that $k \binom{n}{k} = n \binom{n-1}{k-1}$ for any two numbers n and
k.
(b) Prove that $\sum_{k=0}^{n} k \binom{n}{k} x^k = nx (x+1)^{n-1}$ for any positive integer n and
any number x.
[Hint: In part (a), don’t forget about cases like k = 0 and k ∉ N.]
3.1. Divisibility
3.1.1. Definition
We begin by defining the one most important concept in number theory:
The well-known concepts of even and odd integers are instances of divisibil-
ity:
You probably know a few things about even and odd numbers already: e.g.,
Strictly speaking, these claims (particularly the third one) are not at all ob-
vious. So we need to understand divisibility better to even convince ourselves
that such fundamental statements are true. We will do this soon (Corollary
3.3.9). First, let us prove some basic facts about divisibility.
Proof. (a) Proposition 3.1.4 (a) says that the divisibility a | b does not depend
on the signs of a and b; in other words, it says that we can replace the numbers
a and b by their absolute values without changing the truth (or falsity) of a | b.
Clearly, in order to prove this, it suffices to show the following two state-
ments:
The proof of the second statement is similar. (This time, you need to argue
that a | b implies a | −b. Again, write b as b = ac, and conclude that −b =
− ac = a (−c), so that a | −b.)
Thus, both statements are proved, so that the proof of Proposition 3.1.4 (a) is
complete.
(b) Assume that a | b and b ≠ 0. We must show that abs a ≤ abs b.
Let x = abs a and y = abs b. Thus, x is a nonnegative integer and y is a
positive integer (since b ≠ 0). Thus, x ≥ 0 and y > 0.
Proposition 3.1.4 (a) yields that abs a | abs b (since a | b). In other words, x | y
(since x = abs a and y = abs b). In other words, y = xz for some integer z.
Consider this z.
If we had z ≤ 0, then we would have
\[ y = \underbrace{x}_{\geq 0} \underbrace{z}_{\leq 0} \leq 0 \]
(by the standard rules for inequalities), which would contradict y > 0. Hence, we cannot have
z ≤ 0. Thus, z > 0, so that z ≥ 1 (since z is an integer). Hence, xz ≥ x · 1
(since x ≥ 0 allows us to multiply any inequality by x without having to flip
the sign). Therefore, y = xz ≥ x · 1 = x. In other words, x ≤ y. In other words,
abs a ≤ abs b (since x = abs a and y = abs b). This proves Proposition 3.1.4 (b).
(c) Let a | b and b | a. We must prove that abs a = abs b.
If a = 0, then this is easily done (because if a = 0, then 0 = a | b quickly leads
to b = 0, and therefore a = 0 = b, so that abs a = abs b).
Likewise, this is easily done if b = 0.
It remains to handle the third possible case, which is when both a and b are
≠ 0. Consider this case. In this case, Proposition 3.1.4 (b) yields abs a ≤ abs b
(since a | b and b ≠ 0). However, we can also apply Proposition 3.1.4 (b) with
the roles of a and b interchanged (since b | a and a ≠ 0), and thus obtain abs b ≤
abs a. Combining this with abs a ≤ abs b, we find abs a = abs b. Proposition
3.1.4 (c) is thus proved.
(d) This is quite straightforward:
Assume that a | b. Thus, there exists some integer c such that b = ac (by the
definition of “a | b”). This c must then be b/a (since b = ac implies c = b/a in view
of a ≠ 0). Hence, b/a is an integer, i.e., we have b/a ∈ Z.
Forget that we assumed a | b. We thus have shown that b/a ∈ Z if a | b. The
same argument (done in reverse) yields that conversely, if b/a ∈ Z, then a | b.
Combining these two facts, we conclude that a | b if and only if b/a ∈ Z. This
proves Proposition 3.1.4 (d).
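The claims proved above for Proposition 3.1.4 can be spot-checked on small integers. The following Python sketch is only an illustration; the function name `divides` is an assumption of this example, and the special case a = 0 is handled separately because Python's `%` operator cannot take 0 as its right operand.

```python
def divides(a, b):
    """Return True if a | b, i.e., if b = a*c for some integer c."""
    if a == 0:
        return b == 0  # 0*c is always 0, so 0 | b holds only when b = 0
    return b % a == 0

# Spot-check parts (a), (b), (c) of Proposition 3.1.4 on a small range.
R = range(-12, 13)
for a in R:
    for b in R:
        assert divides(a, b) == divides(abs(a), abs(b))   # part (a)
        if divides(a, b) and b != 0:
            assert abs(a) <= abs(b)                       # part (b)
        if divides(a, b) and divides(b, a):
            assert abs(a) == abs(b)                       # part (c)
```

A finite check like this cannot replace the proof, but it catches misreadings of the statements (for example, forgetting the condition b ≠ 0 in part (b)).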
This was a warm-up (if somewhat laborious to write up). Here are some
slightly more substantial properties of divisibility:
Theorem 3.1.5 (rules for divisibility). (a) We have a | a for each a ∈ Z. (This
is called reflexivity of divisibility.)
(b) If a, b, c ∈ Z satisfy a | b and b | c, then a | c. (This is called transitivity
of divisibility.)
(c) If a1 , a2 , b1 , b2 ∈ Z satisfy a1 | b1 and a2 | b2 , then a1 a2 | b1 b2 . (This is
called multiplying two divisibilities.)
(d) If d, a, b ∈ Z satisfy d | a and d | b, then d | a + b. (This is often restated
as “a sum of two multiples of d is again a multiple of d”.)
\[ c = \underbrace{b}_{= ax} y = axy. \]
Hence, there exists some integer z such that c = az (namely, z = xy). This
shows that a | c. Theorem 3.1.5 (b) is thus proven.
(c) Let a1 , a2 , b1 , b2 ∈ Z satisfy a1 | b1 and a2 | b2 .
From a1 | b1 , we see that b1 = a1 c1 for some integer c1 .
From a2 | b2 , we see that b2 = a2 c2 for some integer c2 .
Consider these integers c1 and c2 . Now,
\[ \underbrace{b_1}_{= a_1 c_1} \underbrace{b_2}_{= a_2 c_2} = a_1 c_1 a_2 c_2 = (a_1 a_2) \underbrace{(c_1 c_2)}_{\text{an integer}}. \]
\[ a + b = dx + dy = d \underbrace{(x + y)}_{\text{an integer}}. \]
“a1 | a2 | · · · | ak ”
shall mean that each of the numbers a1 , a2 , . . . , ak divides the next (i.e., that
a1 | a2 and a2 | a3 and so on, ending with ak−1 | ak ). By induction on k, it is
easy to see that such a chain of divisibilities always entails a1 | ak . For example,
3 | 6 | 18 | 36 entails 3 | 36.
Example 3.1.7. Let b = 10835. Then, 2 ∤ b, since the last digit of b is neither
0 nor 2 nor 4 nor 6 nor 8 (but 5). However, 5 | b, since the last digit of b is 0
or 5. Do we have 3 | b ? The sum of the digits of b is 1 + 0 + 8 + 3 + 5 = 17,
which is not divisible by 3. Thus, b is not divisible by 3. Hence, b is not
divisible by 9 either, because if we had 9 | b, then we would get 3 | 9 | b (by
Theorem 3.1.5 (b)), which would contradict the previous sentence.
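The claims of Example 3.1.7 are easy to confirm by direct computation. Here is a short Python sketch (the variable names are chosen for this illustration only):

```python
b = 10835
digit_sum = sum(int(ch) for ch in str(b))  # 1 + 0 + 8 + 3 + 5

print(b % 2 == 0)               # False: 2 does not divide b
print(b % 5 == 0)               # True: 5 divides b
print(digit_sum)                # 17
print(b % 3 == 0, b % 9 == 0)   # False False
```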
• Two even numbers are always congruent (to each other) modulo 2.
• Two odd numbers are always congruent (to each other) modulo 2.
( a ≡ 0 mod n) ⇐⇒ (n | a − 0) ⇐⇒ (n | a) .
then
a1 + a2 ≡ b1 + b2 mod n; (25)
a1 − a2 ≡ b1 − b2 mod n; (26)
a1 a2 ≡ b1 b2 mod n. (27)
Thus, n | a1 − b1 and n | a2 − b2 .
From n | a1 − b1 , we see that a1 − b1 = nc1 for some integer c1 .
From n | a2 − b2 , we see that a2 − b2 = nc2 for some integer c2 .
Consider these integers c1 and c2 .
From a1 − b1 = nc1 , we obtain a1 = b1 + nc1 . Similarly, a2 = b2 + nc2 .
Adding the equalities a1 = b1 + nc1 and a2 = b2 + nc2 together, we find
a1 + a2 ≡ b1 + b2 mod n.
Proposition 3.2.4 (b) says that congruences can be turned around: From
a ≡ b mod n, we can always obtain b ≡ a mod n. (This is very different from
divisibilities, for which a | b almost never implies b | a.)
Proposition 3.2.4 (c) says that congruences can be chained together: From
a ≡ b mod n and b ≡ c mod n, we can always obtain a ≡ c mod n. This is
analogous to Theorem 3.1.5 (b), and leads to a similar convention: Instead
of writing “a ≡ b mod n and b ≡ c mod n”, we will often just write “a ≡ b ≡
c mod n”, understanding that (by Proposition 3.2.4 (c)) this chain of congruences
automatically implies a ≡ c mod n. More generally, the statement
“a1 ≡ a2 ≡ · · · ≡ ak mod n”
shall mean that each of the numbers a_1, a_2, ..., a_k is congruent to the next
modulo n (i.e., that a_i ≡ a_{i+1} mod n for each i ∈ {1, 2, ..., k − 1}). By induction on k,
it is easy to see that such a chain of congruences always entails a_1 ≡ a_k mod n
(and, better yet: a_i ≡ a_j mod n for all i and j).
Note that we can only chain together two congruences modulo the same n,
not two congruences modulo two different n’s. For example, if we know that
a ≡ b mod 2 and b ≡ c mod 3, then we cannot conclude any congruence between
a and c.
Proposition 3.2.4 (d) says that congruences modulo n (for a fixed integer n)
can be added, subtracted and multiplied together (just like equalities). Before
you get over-enthusiastic, keep in mind that
• they cannot be divided by one another: We have 2 ≡ 0 mod 2 and 2 ≡
2 mod 2 but 2/2 ≢ 0/2 mod 2.
• they cannot be taken to each other’s power: We have 2 ≡ 2 mod 2 and
2 ≡ 0 mod 2 but 2^2 ≢ 2^0 mod 2.
However, we can take a congruence to a k-th power for a fixed k ∈ N:
Exercise 3.2.1. Let n, a, b ∈ Z be such that a ≡ b mod n. Let k ∈ N. Prove
that a^k ≡ b^k mod n.
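The rules for calculating with congruences discussed above can be tried out on sample numbers. The Python sketch below is an illustration only (it proves nothing); the helper `congruent` and the sample values are assumptions of this example, and the helper only covers the case n ≠ 0.

```python
def congruent(a, b, n):
    """Return True if a ≡ b mod n, i.e., if n | a - b (this sketch assumes n != 0)."""
    return (a - b) % n == 0

n = 7
a1, b1 = 23, 2     # 23 ≡ 2 mod 7
a2, b2 = -4, 10    # -4 ≡ 10 mod 7
assert congruent(a1, b1, n) and congruent(a2, b2, n)

assert congruent(a1 + a2, b1 + b2, n)   # (25)
assert congruent(a1 - a2, b1 - b2, n)   # (26)
assert congruent(a1 * a2, b1 * b2, n)   # (27)
assert all(congruent(a1 ** k, b1 ** k, n) for k in range(10))  # a fixed exponent k

# Exponents cannot be replaced by congruent ones:
# 2 ≡ 0 mod 2, but 2^2 = 4 and 2^0 = 1 are not congruent modulo 2.
assert congruent(2, 0, 2) and not congruent(2 ** 2, 2 ** 0, 2)
```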
Proposition 3.2.4 (e) shows that the n in a congruence a ≡ b mod n can be
replaced by any divisor of n. For example, if two integers a and b satisfy
a ≡ b mod 15, then a ≡ b mod 3, since 3 is a divisor of 15.
The next exercise shows that we can divide a congruence a ≡ b mod n by a
nonzero integer d as long as we divide all three numbers in it (a, b and n) by d
(rather than just a and b):
Exercise 3.2.2. Let n, d, a, b ∈ Z, and assume that d ≠ 0 and da ≡ db mod dn.
(a) Prove that a ≡ b mod n.
(b) Show by an example that a ≡ b mod dn is not necessarily true (i.e., we
cannot simply cancel the d from da and db while leaving the dn unchanged).
In other words,
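\begin{align*}
m_d \cdot 10^d &\equiv m_d \mod 9; \\
m_{d-1} \cdot 10^{d-1} &\equiv m_{d-1} \mod 9; \\
m_{d-2} \cdot 10^{d-2} &\equiv m_{d-2} \mod 9; \\
&\ldots; \\
m_0 \cdot 10^0 &\equiv m_0 \mod 9.
\end{align*}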
In other words,
m ≡ s mod 9
21 The reason why we can multiply two congruences together is Proposition 3.2.4 (d) (specifi-
cally, (27)).
22 The reason why we can add two congruences together is Proposition 3.2.4 (d) (specifically,
(25)). To be very pedantic, we have to apply (25) several times, since we are adding not two
but d + 1 many congruences together.
\[ m = m_d \cdot 10^d + m_{d-1} \cdot 10^{d-1} + \cdots + m_0 \cdot 10^0 = \sum_{k=0}^{d} m_k \cdot 10^k. \]
\[ a := m_0 - m_1 + m_2 - m_3 \pm \cdots + (-1)^d m_d = \sum_{k=0}^{d} (-1)^k m_k. \]
Prove that 11 | m if and only if 11 | a. (This is the classical divisibility test for
divisibility by 11.)
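Both digit-based tests can be checked experimentally before attempting a proof. The Python sketch below assumes that s in the congruence m ≡ s mod 9 denotes the sum of the decimal digits of m; the helper names are illustrative and not taken from the notes.

```python
def digits(m):
    """Decimal digits m_0, m_1, ..., m_d of a nonnegative integer m (units digit first)."""
    return [int(ch) for ch in reversed(str(m))]

def digit_sum(m):
    """Sum of the decimal digits of m."""
    return sum(digits(m))

def alternating_digit_sum(m):
    """The number a = m_0 - m_1 + m_2 - m_3 +- ... from the exercise."""
    return sum((-1) ** k * mk for k, mk in enumerate(digits(m)))

for m in range(0, 20000):
    assert (m - digit_sum(m)) % 9 == 0                            # m ≡ s mod 9
    assert (m % 11 == 0) == (alternating_digit_sum(m) % 11 == 0)  # test for 11
```

The loop only covers finitely many m, so it supports the claims without proving them.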
such that
n = qd + r.
For now, of course, we do not yet know that these q and r exist and are
unique (because we haven’t proved the theorem yet). Thus, we will take care to
speak of “a quotient”, “a remainder” and “a quo-rem pair”, never taking their
existence and uniqueness for granted until we have proved it.
\[ \underbrace{8}_{= n} = \underbrace{1}_{= q} \cdot \underbrace{5}_{= d} + \underbrace{3}_{= r \in \{0,1,2,3,4\}}, \]
so 8//5 = 1 and 8%5 = 3. (This is taking the uniqueness of 8//5 and 8%5
for granted, but we will prove this soon.)
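The notations n//d and n%d agree with Python's built-in operators `//` and `%` whenever d is a positive integer, so small examples can be checked directly. The following sketch uses a hypothetical helper name `quo_rem_builtin` and a few arbitrary sample inputs.

```python
def quo_rem_builtin(n, d):
    """Quotient and remainder of n by a positive integer d, via Python's // and %."""
    if d <= 0:
        raise ValueError("d must be a positive integer")
    return n // d, n % d

for n, d in [(8, 5), (-7, 5), (0, 3), (14, 7)]:
    q, r = quo_rem_builtin(n, d)
    assert n == q * d + r and 0 <= r < d   # the defining properties of a quo-rem pair
    print(n, d, q, r)
```

Note that many other programming languages round the quotient of a negative number toward zero instead, so this agreement is specific to languages with floored division.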
\[ \underbrace{-7}_{= n} = \underbrace{(-2)}_{= q} \cdot \underbrace{5}_{= d} + \underbrace{3}_{= r \in \{0,1,2,3,4\}}, \]
So Theorem 3.3.1 is saying that for any integer n and any positive integer d,
there is a unique quo-rem pair of n and d. Let us now prove this.
Proof of the uniqueness part: Fix an integer n and a positive integer d. We must
show that there is at most one quo-rem pair (q, r ) of n and d. In other words,
we must show that there are no two distinct quo-rem pairs of n and d.
We shall prove this by contradiction. So we assume that (q1 , r1 ) and (q2 , r2 )
are two distinct quo-rem pairs of n and d. We want to derive a contradiction.
Since (q1 , r1 ) is a quo-rem pair of n and d, we have
In other words,
r1 − r2 = (q2 − q1 ) d. (28)
We are in one of the following three cases:
Case 1: We have q1 < q2 .
Case 2: We have q1 = q2 .
Case 3: We have q1 > q2 .
Let us first consider Case 1. In this case, we have q1 < q2 , so that q2 − q1 >
0. Since q2 − q1 is an integer, this entails that q2 − q1 ≥ 1. We can multiply
this inequality by d (since d > 0), thus obtaining (q2 − q1 ) d ≥ 1d = d. In
view of (28), we can rewrite this as r1 − r2 ≥ d. However, r1 ≤ d − 1 (since
r1 ∈ {0, 1, . . . , d − 1}) and r2 ≥ 0 (since r2 ∈ {0, 1, . . . , d − 1}). Hence,
r1 − r2 ≤ r1 (since r2 ≥ 0), so that r1 − r2 ≤ r1 ≤ d − 1 < d. This contradicts
r1 − r2 ≥ d. Thus, we have found a contradiction in Case 1.
Let us next consider Case 2. In this case, we have q1 = q2 . Hence, we can
rewrite (28) as r1 − r2 = (q2 − q2 ) d = 0 · d = 0, so that r1 = r2 . Combining q1 = q2
with r1 = r2 , we obtain (q1 , r1 ) = (q2 , r2 ), which contradicts our assumption
that the two quo-rem pairs (q1 , r1 ) and (q2 , r2 ) are distinct. Thus, we have
found a contradiction in Case 2.
Finally, in Case 3, we have q1 > q2 and therefore q2 < q1 . Thus, Case 3 is just
a copy of Case 1 with the roles of the two pairs (q1 , r1 ) and (q2 , r2 ) switched
(since the two quo-rem pairs (q1 , r1 ) and (q2 , r2 ) are playing identical roles).
Hence, we obtain a contradiction in Case 3 (since we obtained one in Case 1).
We have now obtained contradictions in all three Cases 1, 2 and 3. Thus,
we always have a contradiction. Hence, our assumption was wrong. This
completes our proof of the uniqueness of the quo-rem pair of n and d.
Now, let us come to the existence part. It is reasonable to try induction, but
there is a hurdle: Induction on d does not work (there is no good way to use the
induction hypothesis), whereas induction on n cannot be used as long as n can
be negative. Fortunately, the latter hurdle is surmountable. One way around it
is to first prove the existence of a quo-rem pair in the case when n ∈ N (that is,
n ≥ 0), and afterwards generalize this result to arbitrary integers n.
So let us prove the n ∈ N case:
Lemma 3.3.6. Let n ∈ N, and let d be a positive integer. Then, there exists a
quo-rem pair of n and d.
n − d = qd + r.
Thus,
n = (qd + r ) + d = qd + d + r = (q + 1) d + r,
which shows that (q + 1, r ) is a quo-rem pair of n and d (since r ∈ {0, 1, . . . , d − 1}).
Thus, there exists a quo-rem pair of n and d. This completes our induction step,
and thus Lemma 3.3.6 is proved.
We now return to proving Theorem 3.3.1. We have shown that
What remains to be done is proving that there is at least one quo-rem pair of
n and d if n < 0.
This can be done in several ways. One way is to proceed similarly to the
proof of Lemma 3.3.6, but using strong induction on −n.
23 Recall that a strong induction needs no base case (see Subsection 1.9.4).
(1 − d) n = qd + r
n − dn = qd + r.
Hence,
n = dn + qd + r = (n + q) d + r.
This shows that (n + q, r ) is a quo-rem pair of n and d. Hence, such a quo-rem
pair exists. Hence, we have proved the existence of a quo-rem pair in the case
when n is negative. This completes our proof of Theorem 3.3.1.
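The existence proof is constructive, and its two steps can be turned into a (slow) procedure. The Python sketch below follows the induction step of Lemma 3.3.6 (passing from n − d to n) and the reduction of negative n to the nonnegative number (1 − d) n. The function names are hypothetical, and the case n < d, where (0, n) is returned, is an assumption of this sketch about how the omitted part of the induction is handled.

```python
def quo_rem_nonneg(n, d):
    """Quo-rem pair of n >= 0 and d > 0, following the induction in Lemma 3.3.6."""
    if n < d:
        return (0, n)             # assumed easy case: n = 0*d + n with 0 <= n < d
    q, r = quo_rem_nonneg(n - d, d)
    return (q + 1, r)             # n - d = qd + r entails n = (q + 1) d + r

def quo_rem(n, d):
    """Quo-rem pair of any integer n and any positive integer d."""
    if n >= 0:
        return quo_rem_nonneg(n, d)
    # (1 - d) n >= 0, since 1 - d <= 0 and n < 0.
    q, r = quo_rem_nonneg((1 - d) * n, d)
    return (n + q, r)             # (1 - d) n = qd + r entails n = (n + q) d + r

print(quo_rem(8, 5), quo_rem(-7, 5))
```

This recursion makes about n/d nested calls, so it is meant to mirror the proof rather than to be efficient; for large inputs, Python's recursion limit would be reached.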
Proof. Part (a) is a direct consequence of the definition of divisibility. But part
(b) is not!
So let us prove part (b). This is an “if and only if” statement, so we need to
prove both directions:
and
For the sake of brevity, I shall refer to these two directions as the “=⇒” and
“⇐=” directions (respectively).
Proof of the “=⇒” direction: Assume that n is odd. By Theorem 3.3.1, there
exists a quo-rem pair (q, r ) of n and 2. Consider this (q, r ). By the definition of
a quo-rem pair, this pair satisfies
Corollary 3.3.9. (a) The sum of any two even integers is even.
(b) The sum of any even integer with any odd integer is odd.
(c) The sum of any two odd integers is even.
Proof. We will only prove part (c), since the other two parts are analogous (and
even simpler).
(c) Let a and b be two odd integers. We must prove that a + b is even.
The integer a is odd. Hence, Proposition 3.3.8 (b) shows that we can write a
as a = 2k + 1 for some integer k.
Similarly, we can write b as b = 2ℓ + 1 for some integer ℓ.
a + b = (2k + 1) + (2ℓ + 1) = 2k + 2ℓ + 2 = 2 (k + ℓ + 1) ,
Remark 3.3.10. Corollary 3.3.9 (c) is a property specific to the number 2. For
example, it is not true that the sum of any two integers not divisible by 3 is
divisible by 3 (for instance, 1 + 1 = 2 is not divisible by 3).
Note that part (a) of this proposition can be restated as follows: The remain-
der n%d is an element of {0, 1, . . . , d − 1} that is congruent to n modulo d. Part
(c) says that, conversely, any element c of {0, 1, . . . , d − 1} that is congruent to n
modulo d must be this remainder n%d. Thus, together, these two parts uniquely
characterize the remainder n%d as the only element of {0, 1, . . . , d − 1} that is
congruent to n modulo d. This characterization is good to keep in mind, as it
describes the remainder independently of the quotient.
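This characterization can be used literally as a (slow) way of computing the remainder: search {0, 1, ..., d − 1} for the element congruent to n modulo d. The following Python sketch does this; the function name is an illustrative assumption, and Python's `%` is used only to test divisibility by d and for the final comparison.

```python
def remainder_by_congruence(n, d):
    """The unique c in {0, 1, ..., d-1} with c ≡ n mod d (d a positive integer)."""
    candidates = [c for c in range(d) if (c - n) % d == 0]   # d | c - n
    assert len(candidates) == 1   # existence and uniqueness of such a c
    return candidates[0]

# Compare with the remainder n%d for some sample values.
for d in range(1, 10):
    for n in range(-25, 26):
        assert remainder_by_congruence(n, d) == n % d
```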
Proof of Proposition 3.3.11. We set
Thus, (q, r ) is a quo-rem pair of n and d (by the definition of a quo-rem pair).
In other words, we have n = qd + r and q ∈ Z and r ∈ {0, 1, . . . , d − 1}. We can
now prove all five parts of the proposition:
(d) We have
\[ n = \underbrace{q}_{= n//d} d + \underbrace{r}_{= n \% d} = (n//d) d + (n \% d). \]
This proves Proposition 3.3.11 (d).
(a) We have n%d = r ∈ {0, 1, . . . , d − 1}. Moreover, from n = qd + r, we
obtain r − n = r − (qd + r ) = −qd, which is clearly divisible by d. Hence,
d | r − n. Equivalently, r ≡ n mod d. In other words, n%d ≡ n mod d (since
r = n%d). Thus, Proposition 3.3.11 (a) is proved (since we have shown that
n%d ∈ {0, 1, . . . , d − 1} as well).
(c) Let c ∈ {0, 1, . . . , d − 1} satisfy c ≡ n mod d. We must show that c = n%d.
From c ≡ n mod d, we obtain d | c − n. In other words, c − n = de for
some e ∈ Z. Consider this e. From c − n = de, we obtain c = n + de, so that
n = c − de = (−e) d + c. This (combined with c ∈ {0, 1, . . . , d − 1}) shows that
(−e, c) is a quo-rem pair of n and d. However, (q, r ) is also a quo-rem pair of
n and d (by its definition). Since there is only one quo-rem pair of n and d (by
Theorem 3.3.1), this shows that (−e, c) = (q, r ). Hence, c = r = n%d. This
proves Proposition 3.3.11 (c).
(b) Again, this is an “if and only if” statement, and we shall prove its “=⇒”
and “⇐=” directions separately:
=⇒: Assume that d | n. We must prove that n%d = 0. In other words, we
must prove that r = 0.
Indeed, d | n yields that n ≡ 0 mod d (by Proposition 3.2.3). In other words,
0 ≡ n mod d. Since we furthermore have 0 ∈ {0, 1, . . . , d − 1}, we can thus
apply Proposition 3.3.11 (c) to c = 0, and conclude that 0 = n%d. In other
words, n%d = 0. This proves the “=⇒” direction (i.e., it proves that if d | n,
then n%d = 0).
⇐=: If n%d = 0, then d | n because
\[ n = qd + \underbrace{r}_{= n \% d = 0} = qd. \]
This proves the “⇐=” direction. Thus, both directions are proved, so that
Proposition 3.3.11 (b) holds.
(e) Assume that n ∈ N. Recall that r ∈ {0, 1, . . . , d − 1}, so that r ≤ d − 1 < d.
But n = qd + r, so that qd + r = n ≥ 0 (since n ∈ N). In other words, qd ≥
−r > −d (since r < d).
If we had q < 0, then we would have q ≤ −1 (since q is an integer) and
therefore qd ≤ (−1) d (since we can multiply the inequality q ≤ −1 by the
positive number d); but this would contradict qd > −d = (−1) d. Hence, we
cannot have q < 0. Thus, q ≥ 0, so that q ∈ N. In other words, n//d ∈ N
(since q = n//d). This proves Proposition 3.3.11 (e).
Definition 3.3.13. The integer part (aka floor) of a real number x is defined
to be the largest integer that is ≤ x. It is denoted by ⌊ x ⌋.
For example,
⌊3.8⌋ = 3, ⌊4.2⌋ = 4, ⌊5⌋ = 5, ⌊√2⌋ = 1,
⌊π⌋ = 3, ⌊0.5⌋ = 0, ⌊−1.2⌋ = −2
(make sure you understand the last example! −1 is not ≤ −1.2, but −2 is).
Now, here is the connection to quotients and remainders:
Proof. Proposition 3.3.11 (a) yields n%d ∈ {0, 1, . . . , d − 1}. Hence, n%d ≥ 0
and n%d ≤ d − 1 < d.
Proposition 3.3.11 (d) yields n = (n//d) d + (n%d). Thus,
Dividing both sides of this inequality by d (we can do this, since d > 0), we
obtain n/d < (n//d) + 1. In other words, (n//d) + 1 > n/d.
On the other hand,
Dividing both sides of this inequality by d (we can do this, since d > 0), we
obtain n/d ≥ n//d.
Now, the integer n//d is ≤ n/d (since n/d ≥ n//d), but the next-larger integer
(n//d) + 1 is not (since (n//d) + 1 > n/d). Thus, n//d is the largest integer that
is ≤ n/d. In other words, n//d = ⌊n/d⌋ (by the definition of the floor ⌊n/d⌋).
Solving the equation n = (n//d) d + (n%d) for n%d, we find
\[ n \% d = n - \underbrace{(n//d)}_{= \lfloor n/d \rfloor} d = n - \left\lfloor \frac{n}{d} \right\rfloor d = n - d \cdot \left\lfloor \frac{n}{d} \right\rfloor. \]
Thus, Proposition 3.3.14 is proved.
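The two formulas just proved, n//d = ⌊n/d⌋ and n%d = n − d · ⌊n/d⌋, can be compared against exact rational arithmetic. The sketch below uses Python's `fractions.Fraction` so that no floating-point rounding is involved; the function name `check_floor_formulas` is an assumption of this example.

```python
from fractions import Fraction
from math import floor

def check_floor_formulas(n, d):
    """Compare n//d and n%d with floor(n/d) and n - d*floor(n/d), for d > 0."""
    f = floor(Fraction(n, d))     # exact floor of the rational number n/d
    return n // d == f and n % d == n - d * f

print(all(check_floor_formulas(n, d) for n in range(-50, 51) for d in range(1, 13)))
```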
Division with remainder is one of the most fundamental facts about integers;
almost all of number theory is downstream of it. Here are some applications:
\[ 3401 = r_6 4^6 + r_5 4^5 + r_4 4^4 + r_3 4^3 + r_2 4^2 + r_1 4^1 + r_0 4^0, \]
where each r_i is a “base-4 digit” (i.e., an element of {0, 1, 2, 3}). Here, we are
tacitly assuming that 4^6 is the highest power of 4 that we need; but we don’t
actually know this yet, so we must be prepared to add higher powers (4^7, 4^8, 4^9, ...)
if needed.
How do we find these base-4 digits r_0, r_1, ..., r_6?
We start by identifying r_0. Indeed, on the RHS (i.e., the right hand side) of the equation
\[ 3401 = r_6 4^6 + r_5 4^5 + r_4 4^4 + r_3 4^3 + r_2 4^2 + r_1 4^1 + r_0 4^0, \]
all but the last addends are multiples of 4, whereas the last addend is r_0 4^0 = r_0.
Hence, we can rewrite this equation as follows (factoring out the 4):
\[ 3401 = 4 \cdot \left( r_6 4^5 + r_5 4^4 + r_4 4^3 + r_3 4^2 + r_2 4^1 + r_1 4^0 \right) + r_0. \]
\[ 850 = r_6 4^5 + r_5 4^4 + r_4 4^3 + r_3 4^2 + r_2 4^1 + r_1 4^0. \]
\[ r_1 = 850 \% 4 = 2 \qquad \text{and} \qquad r_6 4^4 + r_5 4^3 + r_4 4^2 + r_3 4^1 + r_2 4^0 = 850 // 4 = 212. \]
Thus, we have identified the base-4 digit r_1 as 2. In order to find the remaining
digits, we analyze the latter equation
\[ 212 = r_6 4^4 + r_5 4^3 + r_4 4^2 + r_3 4^1 + r_2 4^0. \]
\[ r_2 = 212 \% 4 = 0 \qquad \text{and} \qquad r_6 4^3 + r_5 4^2 + r_4 4^1 + r_3 4^0 = 212 // 4 = 53. \]
Thus, we have identified the base-4 digit r_2 as 0. In order to find the remaining
digits, we analyze the latter equation
\[ 53 = r_6 4^3 + r_5 4^2 + r_4 4^1 + r_3 4^0. \]
\[ r_3 = 53 \% 4 = 1 \qquad \text{and} \qquad r_6 4^2 + r_5 4^1 + r_4 4^0 = 53 // 4 = 13. \]
Thus, we have identified the base-4 digit r_3 as 1. In order to find the remaining
digits, we analyze the latter equation
\[ 13 = r_6 4^2 + r_5 4^1 + r_4 4^0. \]
\[ 3 = r_6 4^1 + r_5 4^0. \]
In this equation, the only addend on the RHS not divisible by 4 is r_5 4^0 = r_5, so
we can rewrite this equation as
\[ 3 = 4 \cdot r_6 4^0 + r_5, \]
In analogy to the decimal system, we can state this as “the number 3401
written in base-4 is 0311021” (since the base-4 digits r6 , r5 , . . . , r0 have been
identified as 0, 3, 1, 1, 0, 2, 1). Commonly, one would omit the leading zeroes, so
this would become 311021.
The method we just used can be used for any given integer b > 1 instead
of 4 and any nonnegative integer n ∈ N instead of 3401: To find the “base-
b digits” of a nonnegative integer n, we first divide n by b with remainder,
then divide the resulting quotient again by b with remainder, then divide the
resulting quotient again by b with remainder, and so on, until we are left with
the quotient 0. The remainders obtained in the process will then be the base-b
digits of n (from right to left). This process must eventually come to an end
because (since b > 1) each quotient will be smaller than the preceding one.
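The procedure just described (repeated division with remainder until the quotient is 0) can be written out as a short program. The following Python sketch is one possible rendering; the function name `base_digits` and the choice to return the digits from right to left are assumptions of this example.

```python
def base_digits(n, b):
    """Base-b digits of n >= 0, least significant digit first (requires b > 1)."""
    if n < 0 or b <= 1:
        raise ValueError("need n >= 0 and b > 1")
    digits = []
    while n > 0:
        n, r = n // b, n % b      # divide with remainder; continue with the quotient
        digits.append(r)
    return digits

print(base_digits(3401, 4))       # [1, 2, 0, 1, 1, 3], i.e., 311021 read from right to left
```

For n = 0, the loop never runs and the list of digits is empty, which corresponds to omitting all leading zeroes.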
We can summarize this as a theorem:
\[ n = r_k \cdot b^k + r_{k-1} \cdot b^{k-1} + \cdots + r_1 \cdot b^1 + r_0 \cdot b^0 \]
with
k ∈ N and r_0, r_1, ..., r_k ∈ {0, 1, ..., b − 1}.
(b) If n < b^{k+1} for some k ∈ N, then we can write n in the form
\[ n = r_k \cdot b^k + r_{k-1} \cdot b^{k-1} + \cdots + r_1 \cdot b^1 + r_0 \cdot b^0 \]
with
r_0, r_1, ..., r_k ∈ {0, 1, ..., b − 1}.
(c) These r_0, r_1, ..., r_k are unique (when k is given). Moreover, they can be
explicitly computed by the formula
\[ r_i = \left( n // b^i \right) \% b \qquad \text{for each } i \in \{0, 1, \ldots, k\}. \]
\begin{align*}
r_0 &= n \% b, \\
r_1 &= (n // b) \% b, \\
r_2 &= \left( n // b^2 \right) \% b, \\
r_3 &= \left( n // b^3 \right) \% b, \\
&\ldots, \\
r_k &= \left( n // b^k \right) \% b.
\end{align*}
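The explicit formula r_i = (n//b^i) % b translates directly into code. Here is a minimal Python sketch, with the hypothetical function name `digit`, applied to n = 3401 and b = 4 with k = 6 (note that 3401 < 4^7):

```python
def digit(n, b, i):
    """r_i = (n // b^i) % b, the coefficient of b^i in the base-b expansion of n."""
    return (n // b ** i) % b

n, b, k = 3401, 4, 6
r = [digit(n, b, i) for i in range(k + 1)]
print(r)                                        # [1, 2, 0, 1, 1, 3, 0]
assert n == sum(r[i] * b ** i for i in range(k + 1))
```

Unlike the repeated-division procedure, this formula computes each digit independently of the others.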
Proof. Forget that n was fixed (but keep b fixed). We shall prove the following two
claims:
Claim 1: Let n ∈ N and k ∈ N be such that n < b^{k+1}. Then, we can write n
in the form
\[ n = r_k \cdot b^k + r_{k-1} \cdot b^{k-1} + \cdots + r_1 \cdot b^1 + r_0 \cdot b^0 \]
with
r_0, r_1, ..., r_k ∈ {0, 1, ..., b − 1}.
Claim 2: Let n ∈ N and k ∈ N. Assume that n has been written in the form
\[ n = r_k \cdot b^k + r_{k-1} \cdot b^{k-1} + \cdots + r_1 \cdot b^1 + r_0 \cdot b^0 \]
with
r_0, r_1, ..., r_k ∈ {0, 1, ..., b − 1}.
Then,
\[ r_i = \left( n // b^i \right) \% b \qquad \text{for each } i \in \{0, 1, \ldots, k\}. \]
Once these two claims are proved, Theorem 3.3.15 will follow, because
• Theorem 3.3.15 (a) follows from Claim 1 (since we can pick k ∈ N high enough
that n < b^{k+1} holds; see footnote 25).
Hence, (n//b) b ≤ n < b^{k+1}. Dividing this inequality by the positive number b, we
obtain n//b < b^{k+1}/b = b^k.
Now, recall our induction hypothesis, which says that Claim 1 holds for k − 1 instead
of k. In other words, if m ∈ N is such that m < b^{(k−1)+1}, then we can write m in the
form (see footnote 26)
\[ m = s_{k-1} \cdot b^{k-1} + s_{k-2} \cdot b^{k-2} + \cdots + s_1 \cdot b^1 + s_0 \cdot b^0 \]
with
s_0, s_1, ..., s_{k−1} ∈ {0, 1, ..., b − 1}.
25 Indeed, the assumption b > 1 ensures that the sequence b^0, b^1, b^2, ... is strictly increasing
and thus eventually outgrows any given integer, including our n. Or we can argue this
directly: An easy induction (on n) shows that n < b^{n+1}, and thus we can simply take k = n.
26 We are deliberately using the letters m and s_i instead of n and r_i here, since the letter n is
already taken (and the letters r_i will be needed for something different).
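We can apply this to m = n//b (since n//b ∈ N and n//b < b^k = b^{(k−1)+1}), and
conclude that we can write n//b in the form
\[ n // b = s_{k-1} \cdot b^{k-1} + s_{k-2} \cdot b^{k-2} + \cdots + s_1 \cdot b^1 + s_0 \cdot b^0 \]
with
s_0, s_1, ..., s_{k−1} ∈ {0, 1, ..., b − 1}.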
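Let us do this. Thus,
\begin{align*}
n &= \underbrace{(n // b)}_{= s_{k-1} \cdot b^{k-1} + s_{k-2} \cdot b^{k-2} + \cdots + s_1 \cdot b^1 + s_0 \cdot b^0} b + (n \% b) \\
&= \left( s_{k-1} \cdot b^{k-1} + s_{k-2} \cdot b^{k-2} + \cdots + s_1 \cdot b^1 + s_0 \cdot b^0 \right) b + (n \% b) \\
&= s_{k-1} \cdot b^k + s_{k-2} \cdot b^{k-1} + \cdots + s_1 \cdot b^2 + s_0 \cdot b^1 + \underbrace{(n \% b)}_{= (n \% b) \cdot b^0} \\
&= s_{k-1} \cdot b^k + s_{k-2} \cdot b^{k-1} + \cdots + s_1 \cdot b^2 + s_0 \cdot b^1 + (n \% b) \cdot b^0.
\end{align*}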
Note that the coefficients n%b, s_0, s_1, ..., s_{k−1} on the right hand side here all belong to
{0, 1, ..., b − 1} (as we know). Thus, through this equality, we have written n in the
form
\[ n = r_k \cdot b^k + r_{k-1} \cdot b^{k-1} + \cdots + r_1 \cdot b^1 + r_0 \cdot b^0 \]
with
r_0, r_1, ..., r_k ∈ {0, 1, ..., b − 1}
(namely, with r_0 = n%b and r_1 = s_0 and r_2 = s_1 and ... and r_{k−1} = s_{k−2} and r_k = s_{k−1}).
Hence, n can be written in this form.
We have thus proved that if n ∈ N is such that n < b^{k+1}, then we can write n in the
form
\[ n = r_k \cdot b^k + r_{k-1} \cdot b^{k-1} + \cdots + r_1 \cdot b^1 + r_0 \cdot b^0 \]
with
r_0, r_1, ..., r_k ∈ {0, 1, ..., b − 1}.
In other words, we have proved Claim 1 for our k. This completes the induction step.
Thus, Claim 1 is proved by induction.
Proof of Claim 2. We could prove this by induction as well, but let us instead go for a
direct proof.
By assumption, we have
\[ n = r_k \cdot b^k + r_{k-1} \cdot b^{k-1} + \cdots + r_1 \cdot b^1 + r_0 \cdot b^0 = \sum_{j=0}^{k} r_j \cdot b^j = \sum_{j=0}^{k} r_j b^j. \]
Now, we must prove that r_i = (n//b^i) % b for each i ∈ {0, 1, ..., k}. So let us fix an
i ∈ {0, 1, ..., k}.
We have
\[ n = \sum_{j=0}^{k} r_j b^j = \sum_{j=0}^{i-1} r_j b^j + \sum_{j=i}^{k} r_j b^j \tag{29} \]
(here, we have split our sum into two parts: one part which contains the addends for
j ∈ {0, 1, ..., i − 1}, and one part which contains the addends for j ∈ {i, i + 1, ..., k}).
We can rewrite the second sum as follows:
\[ \sum_{j=i}^{k} r_j \underbrace{b^j}_{= b^i b^{j-i}} = \sum_{j=i}^{k} r_j b^i b^{j-i} = b^i \sum_{j=i}^{k} r_j b^{j-i}. \]
Let us set
\[ q' := \sum_{j=i}^{k} r_j b^{j-i} \qquad \text{and} \qquad r' := \sum_{j=0}^{i-1} r_j b^j. \]
\[ n = r' + b^i q' = q' b^i + r'. \tag{31} \]
Note that both sums $q' = \sum_{j=i}^{k} r_j b^{j-i}$ and $r' = \sum_{j=0}^{i-1} r_j b^j$ are integers (indeed, b^{j−i} is always
an integer in the first sum, since j ≥ i entails j − i ∈ N).
We have assumed that r_0, r_1, ..., r_k ∈ {0, 1, ..., b − 1}. In particular, the integers
r_0, r_1, ..., r_k are all ≥ 0 and ≤ b − 1. In other words, each j ∈ {0, 1, ..., k} satisfies
r_j ≥ 0 and r_j ≤ b − 1. Hence, $r' = \sum_{j=0}^{i-1} r_j b^j \geq 0$ (since all the integers r_j are ≥ 0, and so
is b) and
\[ r' = \sum_{j=0}^{i-1} \underbrace{r_j}_{\leq b-1} b^j \leq \sum_{j=0}^{i-1} (b-1) b^j = (b-1) \underbrace{\sum_{j=0}^{i-1} b^j}_{\substack{= b^0 + b^1 + \cdots + b^{i-1} = \frac{b^i - 1}{b-1} \\ \text{(by Corollary 1.6.3, applied to } b \text{ and } i \\ \text{instead of } q \text{ and } n)}} = (b-1) \cdot \frac{b^i - 1}{b-1} = b^i - 1. \]
Thus, r′ ∈ {0, 1, ..., b^i − 1}.
this shows that (q′, r′) is a quo-rem pair of n and b^i. Therefore, in particular, q′ is the
quotient of the division of n by b^i. In other words,
\[ q' = n // b^i. \]
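However,
\begin{align*}
q' &= \sum_{j=i}^{k} r_j b^{j-i} = r_i b^0 + r_{i+1} b^1 + r_{i+2} b^2 + \cdots + r_k b^{k-i} \\
&= r_i \underbrace{b^0}_{= 1} + \underbrace{r_{i+1} b^1 + r_{i+2} b^2 + \cdots + r_k b^{k-i}}_{= \left( r_{i+1} b^0 + r_{i+2} b^1 + \cdots + r_k b^{k-i-1} \right) b} \\
&= r_i + \left( r_{i+1} b^0 + r_{i+2} b^1 + \cdots + r_k b^{k-i-1} \right) b.
\end{align*}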
The inductive proof of Claim 1 in the above proof is just a formal avatar of the
algorithm for writing a nonnegative integer n in base b that we demonstrated on an
example before the theorem. The formula r_i = (n//b^i) % b from Claim 2, on the other
and
( a%d = b%d) =⇒ ( a ≡ b mod d) .
Let us prove these implications separately:
Corollary 3.3.17. Let a and b be two integers. Then, a ≡ b mod 2 holds if and
only if the numbers a and b are either both even or both odd.
Proof. =⇒: Assume that a ≡ b mod 2. We must show that the numbers a and b
are either both even or both odd.
Proposition 3.3.16 (applied to d = 2) shows that a ≡ b mod 2 if and only if
a%2 = b%2. Thus, a%2 = b%2 (since a ≡ b mod 2). However, a%2 ∈ {0, 1}
(by Proposition 3.3.11 (a), applied to n = a and d = 2). In other words, a%2 is
either 0 or 1. If a%2 = 0, then b%2 = 0 as well (since a%2 = b%2), and therefore
both a and b are even (by Corollary 3.3.12 (a)). If a%2 = 1, then b%2 = 1 as
well (since a%2 = b%2), and therefore both a and b are odd (by Corollary 3.3.12
(b)). Other cases cannot occur, since we know that a%2 is either 0 or 1. Hence,
in every possible case, the numbers a and b are either both even or both odd.
This proves the “=⇒” direction of Corollary 3.3.17.
⇐=: Assume that the numbers a and b are either both even or both odd.
Thus, the numbers a%2 and b%2 are either both 0 (this happens when a and
b are both even, by Corollary 3.3.12 (a)) or both 1 (this happens when a and b
are both odd, by Corollary 3.3.12 (b)). In either case, we have a%2 = b%2. By
Proposition 3.3.16 (applied to d = 2), this entails a ≡ b mod 2. Hence, the “⇐=”
direction of Corollary 3.3.17 is proved.
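Proposition 3.3.16 (the equivalence of a ≡ b mod d and a%d = b%d) and its special case d = 2 can be spot-checked on a range of integers. The Python sketch below uses two helper names, `same_remainder` and `congruent_mod`, that are assumptions of this example.

```python
def same_remainder(a, b, d):
    """Return True if a%d = b%d (d a positive integer)."""
    return a % d == b % d

def congruent_mod(a, b, d):
    """Return True if a ≡ b mod d, i.e., if d | a - b."""
    return (a - b) % d == 0

for d in range(1, 8):
    for a in range(-20, 21):
        for b in range(-20, 21):
            assert congruent_mod(a, b, d) == same_remainder(a, b, d)

# The case d = 2: congruent modulo 2 means "both even or both odd".
print(congruent_mod(10, -4, 2), congruent_mod(7, -3, 2), congruent_mod(7, 4, 2))
```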
(b) If d ∤ n, then
It should be easy to prove both parts of this lemma, but we give a proof for
the sake of completeness.
Proof of Proposition 3.3.18. (a) Assume that d | n. Thus, n = dq for some q ∈ Z. Con-
sider this q.
Recall Definition 3.3.2. We have q ∈ Z and 0 ∈ {0, 1, . . . , d − 1} and n = qd + 0
(since qd + 0 = qd = dq = n). In other words, (q, 0) is a quo-rem pair of n and d (by
the definition of a quo-rem pair). Hence, Definition 3.3.2 shows that n//d = q and
n%d = 0.
On the other hand, from n = dq, we obtain
n − 1 = dq − 1
= ( q − 1) d + ( d − 1) (since (q − 1) d + (d − 1) = qd − d + d − 1 = qd − 1) .
Thus, we have q − 1 ∈ Z and d − 1 ∈ {0, 1, . . . , d − 1} and n − 1 = (q − 1) d + (d − 1).
In other words, the pair (q − 1, d − 1) is a quo-rem pair of n − 1 and d (by the defini-
tion of a quo-rem pair). Hence, Definition 3.3.2 shows that (n − 1) //d = q − 1 and
(n − 1) %d = d − 1.
Now, from (n − 1) //d = q − 1, we obtain ((n − 1) //d) + 1 = q = n//d. In other
words, n//d = ((n − 1) //d) + 1. Combining this with n%d = 0 and (n − 1) %d =
d − 1, we see that Proposition 3.3.18 (a) has been proved.
(b) Assume that d ∤ n. Let q = n//d and r = n%d. Then, by the definition of quotient
and remainder, we have
(b) The greatest common divisor of a and b is the largest among the com-
mon divisors of a and b, unless a = b = 0. In the case a = b = 0, it is defined
to be 0 instead.
We denote the greatest common divisor of a and b as gcd ( a, b), and we
refer to it as the gcd of a and b.
We will soon see that this greatest common divisor is well-defined (see Re-
mark 3.4.2 below). But first, some examples:
This argument also gives us a slow and stupid algorithm to compute gcd ( a, b)
when a ≠ 0: We just go through all integers in the interval [−|a|, |a|], and
check which of them are common divisors of a and b. But there is a much
faster algorithm.
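The slow algorithm just described can be written down in a few lines. The following Python sketch is an illustration under the stated restriction a ≠ 0; the names `is_divisor` and `gcd_naive` are hypothetical.

```python
def is_divisor(d, n):
    """Return True if d | n (with the convention that 0 | n only for n = 0)."""
    return n == 0 if d == 0 else n % d == 0

def gcd_naive(a, b):
    """Largest common divisor of a and b, found by brute force (requires a != 0)."""
    if a == 0:
        raise ValueError("this brute-force search needs a != 0")
    common = [d for d in range(-abs(a), abs(a) + 1)
              if is_divisor(d, a) and is_divisor(d, b)]
    return max(common)            # the list is nonempty, since 1 is a common divisor

print(gcd_naive(12, 18), gcd_naive(-4, 0), gcd_naive(7, 13))
```

The search inspects 2|a| + 1 candidates, which is why this method becomes impractical for large a.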
(c) Proposition 3.4.3 (c) follows from observing that a and b play equal roles
in Definition 3.4.1.
(d) Let a, b, c ∈ Z satisfy b ≡ c mod a. We must prove that gcd ( a, b) =
gcd ( a, c).
If a = 0, then this is clearly true (because in this case, b ≡ c mod a becomes
b ≡ c mod 0, which entails b = c).
It thus remains to consider the case a ≠ 0 only. In this case, gcd ( a, b) is
literally the greatest common divisor of a and b, whereas gcd ( a, c) is literally
the greatest common divisor of a and c. Hence, in order to prove that these
two gcds are equal, it will suffice to show that the common divisors of a and b
are precisely the common divisors of a and c. To do this, in turn, it suffices to
prove the following two claims:
Before we prove these two claims, let us recall that b ≡ c mod a; in other
words, c ≡ b mod a (by the symmetry of congruence). Hence, the numbers b
and c play equal roles in our setting. Thus, Claims 1 and 2 are analogous, so
that any proof of one of the two will also prove the other (once the roles of b
and c are switched).
Proof of Claim 1. Let d be a common divisor of a and b. Thus, d | a and d | b
(by the definition of a common divisor). In other words, we have a = dx and
b = dy for some integers x and y. Consider these x and y.
But b ≡ c mod a. In other words, a | b − c. Hence, d | a | b − c (by the
transitivity of divisibility). In other words, b − c = dz for some integer z.
Consider this z.
Now, $b - (b - c) = c$, so that
$$c = \underbrace{b}_{= dy} - \underbrace{(b - c)}_{= dz} = dy - dz = d \underbrace{(y - z)}_{\text{an integer}}.$$
(g) is obvious when a = b = 0 (since 0 | 0), and otherwise follows from the
definition of gcd ( a, b).
(h) The divisors of a are precisely the divisors of − a. The divisors of b are
precisely the divisors of −b. Thus, the common divisors of a and b remain
unchanged if we replace a by − a or replace b by −b. Therefore, Proposition
3.4.3 (h) follows from the definition of the gcd.
(i) Let a, b ∈ Z satisfy a | b. Then, b ≡ 0 mod a. Hence, Proposition 3.4.3 (d)
(applied to c = 0) yields gcd ( a, b) = gcd ( a, 0) = | a| (by Proposition 3.4.3 (b)).
This proves Proposition 3.4.3 (i).
Corollary 3.4.4 (Euclidean recursion for the gcd). Let a ∈ Z, and let b be a
positive integer. Then,
(by Proposition 3.4.3 (f), applied to b and a instead of a and b). This proves
Corollary 3.4.4.
and
Proof. In each step of the Euclidean algorithm, the second argument b gets
replaced by a%b. This has the consequence that b decreases by at least 1 (since
the definition of a remainder yields a%b ∈ {0, 1, . . . , b − 1} and thus a%b ≤
b − 1). But b remains nonnegative throughout the algorithm. Thus, b cannot
decrease (by at least 1) more than b0 times in succession, where b0 is the original
value of b (as it was fed into the algorithm). Hence, the algorithm cannot have
more than b0 steps. In other words, the algorithm must terminate after at most
b0 steps. This proves Proposition 3.4.5 (since b0 is precisely the original value
of b).
Proposition 3.4.5 greatly overestimates the actual time that the Euclidean
algorithm needs to terminate: In truth, it terminates after at most $\log_2(ab) + 2$
steps (if a and b are positive)29, which is usually much fewer than b. Some
variants of the Euclidean algorithm get to the goal even faster. This speediness is
part of the reason why the Euclidean algorithm (and greatest common divisors)
is so useful in practical applications of number theory.
The Euclidean algorithm can be easily adapted to arbitrary b ∈ Z instead of
just b ∈ N (by adding a first step in which we replace b by −b if b is negative):
of inbuilt fundamental mathematical tools. All the algorithms can be implemented in any
other language as well, but the code looks best in Python.
29 Hints to the proof. Recall that each step of the algorithm replaces the numbers $a$ and $b$ by $b$
and $a\%b$. Since $b > a\%b$ (because $a\%b \in \{0, 1, \ldots, b-1\}$ entails $a\%b < b$), this yields that
after each step of the algorithm, the “current” numbers $a$ and $b$ satisfy $a > b$.
Now, consider the product $ab$ of the two numbers $a$ and $b$. We claim that each step of
the algorithm, except perhaps the first one, decreases this number by a factor of at least 2.
In order to see this, you need to show that $b \, (a\%b) \leq \frac{ab}{2}$ whenever $a > b$. But this follows
from $a\%b \leq \frac{a}{2}$, which in turn follows easily from $a > b$ (why?).
Now you know that the product $ab$ decreases by a factor of at least 2 at each step of the
algorithm except for the first one. In other words, its binary logarithm $\log_2(ab)$ decreases
by at least 1 at each step of the algorithm except for the first one. At the first step, it also
decreases or stays unchanged. From this, it follows easily that the algorithm cannot have
more than $\log_2(ab) + 1$ steps until it reaches a situation in which $\log_2(ab) \leq 0$. But in such
a situation, we must have $a = b = 1$, and it will only take one more step to reach the end of
the algorithm.
gcd ( a, b) = xa + yb.
We will soon prove this theorem. First, we introduce a notation and give a
few examples:
Definition 3.4.7. Let a and b be two integers. Then, a Bezout pair for ( a, b)
means a pair ( x, y) of two integers satisfying gcd ( a, b) = xa + yb.
For instance, a Bezout pair for (4, 7) is a pair ( x, y) of integers satisfying
gcd (4, 7) = x · 4 + y · 7. In view of gcd (4, 7) = 1, this latter equation simplifies
to 1 = 4x + 7y. So a Bezout pair for (4, 7) is a solution to this equation 1 =
4x + 7y in integers x and y. This is similar to the coin problem from Subsection
1.9.6, in the sense that you can think of such a Bezout pair ( x, y) as a way to pay
1 cent with x many 4-cent coins and y many 7-cent coins, assuming that you
are allowed to get change (because x and y are allowed to be negative). Without
change, of course, you could not pay 1 cent using 4-cent coins and 7-cent coins.
But with change, it works: You pay two 4-cent coins and get one 7-cent coin
in return, and thus end up paying 2 · 4 + (−1) · 7 = 1 cent, which is what you
wanted. In other words, the pair ( x, y) = (2, −1) satisfies 1 = 4x + 7y. In other
words, (2, −1) is a Bezout pair for (4, 7). There are also other Bezout pairs for
(4, 7), for example (−5, 3) (since 4 (−5) + 7 · 3 = 1). So a Bezout pair is usually
not unique.
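The examples above are easy to verify by machine. The following Python sketch tests whether a given pair is a Bezout pair; the helper name `is_bezout_pair` is ours, and we rely on the standard `math.gcd`, which returns the nonnegative gcd (and 0 for gcd(0, 0)), in agreement with Definition 3.4.1.

```python
import math

def is_bezout_pair(x, y, a, b):
    """Return True if (x, y) is a Bezout pair for (a, b), i.e., gcd(a, b) = x*a + y*b."""
    return math.gcd(a, b) == x * a + y * b

print(is_bezout_pair(2, -1, 4, 7))   # paying two 4-cent coins, getting one 7-cent coin back
print(is_bezout_pair(-5, 3, 4, 7))   # another Bezout pair for (4, 7)
print(is_bezout_pair(1, 1, 4, 7))    # 4 + 7 = 11, not 1

# A whole family of Bezout pairs for (4, 7): 4*(2 + 7t) + 7*(-1 - 4t) = 1 for every integer t.
print(all(is_bezout_pair(2 + 7 * t, -1 - 4 * t, 4, 7) for t in range(-5, 6)))
```

The last line illustrates why Bezout pairs are far from unique: shifting x by 7t and y by −4t does not change the value of 4x + 7y.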
So Bezout’s theorem can be restated as follows: For any two integers a and
b, you can pay gcd ( a, b) cents with a-cent coins and b-cent coins, if you can get
change30 . What denominations can be paid without change is a more compli-
cated story, and we will return to this in Section 3.8.
30 more precisely: if you can get change in a-cent coins and b-cent coins (and there are infinitely
many coins of either denomination available)
Here is another example: A Bezout pair for (6, 16) is (3, −1), since gcd (6, 16) =
2 = 6x + 16y for ( x, y) = (3, −1).
So Bezout’s theorem (Theorem 3.4.6) is saying that for any two integers a, b ∈
Z, there exists a Bezout pair for ( a, b).
How can we prove this theorem? Induction (particularly strong induction)
appears to be a reasonable method. Unfortunately, induction can only be used
to prove a statement about elements of a set of the form {k, k + 1, k + 2, . . .} for
a given integer k (that is, a statement about integers from a given lower bound
onwards). To put it differently, induction can only prove a statement that “starts
somewhere” (even if it is presented as a strong induction with no base case).
Meanwhile, in Bezout’s theorem, both a and b are just arbitrary integers, so
they can be arbitrarily low.
This hurdle can be surmounted: While we cannot prove Bezout’s theorem by
induction directly, we can first restrict it to the case when b ∈ N, and prove
this restriction by induction. In other words, we shall use induction to prove
the following particular case of Bezout’s theorem:
Once this lemma is proved, we will quickly deduce Bezout’s theorem in full
generality from it. So let us prove this lemma.
Proof of Lemma 3.4.8. We shall use strong induction on b. Here, we do not con-
sider a to be fixed. Thus, the statement that we will be proving for all b ∈ N
is
P (b) := (for each a ∈ Z, there exists a Bezout pair for ( a, b)) .
Our goal is to prove this statement P (b) for all b ∈ N. We shall do this by
strong induction on b:
Base case: Let us prove the statement P (0). Indeed, for each a ∈ Z, let us set
$$\operatorname{sign} a := \begin{cases} 1, & \text{if } a > 0; \\ 0, & \text{if } a = 0; \\ -1, & \text{if } a < 0. \end{cases}$$
Then, for each a ∈ Z, the pair (sign a, 0) is a Bezout pair for ( a, 0), since
Hence, for each a ∈ Z, there exists a Bezout pair for ( a, 0). In other words, the
statement P (0) holds.
Induction step: Fix a positive integer b. We must prove the implication
Thus, we assume (as the induction hypothesis) that P (0) AND P (1) AND P (2)
AND · · · AND P (b − 1) holds. In other words, we assume that the b statements
P (0) , P (1) , P (2) , . . . , P (b − 1) all hold. In other words, we assume that
In other words, we assume that for each a ∈ Z and each d ∈ {0, 1, . . . , b − 1},
there exists a Bezout pair for ( a, d). Renaming a as c here, we can restate this
as follows: We assume that for each c ∈ Z and each d ∈ {0, 1, . . . , b − 1}, there
exists a Bezout pair for (c, d). So this is our induction hypothesis (brought to
its most convenient form).
Our goal is now to prove P (b). In other words, we must prove that for each
a ∈ Z, there exists a Bezout pair for ( a, b).
So we fix an a ∈ Z, and we set out to find a Bezout pair for ( a, b).
The Euclidean recursion (Corollary 3.4.4) yields
a = ( a//b) b + ( a%b) .
Now that Lemma 3.4.8 has been proven, Bezout’s theorem in the general case
(Theorem 3.4.6) easily follows:
Proof of Theorem 3.4.6. We are in one of the following two cases:
Case 1: We have b ≥ 0.
31 Here, sign(a) is what was called sign a in the above proof. In Python, this can be defined as
follows:
    def sign(a):
        if a < 0:
            return -1
        if a == 0:
            return 0
        if a > 0:
            return 1
Thus, there exist two integers x and y such that gcd ( a, b) = xa + yb (namely,
x = u and y = −v). This proves Theorem 3.4.6 in Case 2.
We have now proved Theorem 3.4.6 in both Cases 1 and 2, so that the theorem
always holds.
Exercise 3.4.1. Recall the bezout_pair function defined above. This function
outputs a Bezout pair for any given pair ( a, b) with a ∈ Z and b ∈ N. Tweak
it so that it works for arbitrary b ∈ Z (not just for b ∈ N).
[Feel free to use your favorite programming language instead of Python,
but do not change the logic in the case when b ≥ 0.]
In other words, the common divisors of a and b are precisely the divisors of
gcd ( a, b). In other words, gcd ( a, b) is not just the greatest among the common
divisors of a and b (if a and b are not both 0), but it also is divisible by all of
them.
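This characterization of the common divisors can be spot-checked with a few lines of Python. The sketch below compares the two sets of divisors inside a finite window (a necessary restriction when a = b = 0, since then every integer is a common divisor). The names `divisors` and `common_divisors_match` are ours.

```python
import math

def divisors(n, bound):
    """All integers d with |d| <= bound that divide n (where 0 divides only 0)."""
    return {d for d in range(-bound, bound + 1)
            if (n == 0 if d == 0 else n % d == 0)}

def common_divisors_match(a, b):
    """Compare the common divisors of a and b with the divisors of gcd(a, b)."""
    bound = max(abs(a), abs(b), 1)   # window containing all divisors of nonzero inputs
    common = divisors(a, bound) & divisors(b, bound)
    return common == divisors(math.gcd(a, b), bound)

print(common_divisors_match(12, 18), common_divisors_match(0, 6), common_divisors_match(0, 0))
```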
Proof of Theorem 3.4.9. We must prove the two implications
and
(m | gcd ( a, b)) =⇒ (m | a and m | b) .
The second of these two implications is easy to prove: If m | gcd ( a, b), then
m | a (since m | gcd ( a, b) | a) and m | b (similarly).
It thus remains to prove the first implication: i.e., to prove that
This is saying that when two integers have a common factor s, then this
common factor can be pulled out of their gcd. (The caveat is, of course, that the
common factor must be replaced by its absolute value, since a gcd cannot be
negative by definition.)
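A brief numerical check of this rule, written as a Python sketch (the function name `scaling_holds` is ours; `math.gcd` is the standard library gcd, which is always nonnegative):

```python
import math

def scaling_holds(s, a, b):
    """Check gcd(s*a, s*b) == |s| * gcd(a, b) for one triple (s, a, b)."""
    return math.gcd(s * a, s * b) == abs(s) * math.gcd(a, b)

# The absolute value matters: for s = -3, the gcd is still nonnegative.
print(math.gcd(-3 * 4, -3 * 6), abs(-3) * math.gcd(4, 6))
print(all(scaling_holds(s, a, b)
          for s in range(-6, 7) for a in range(-6, 7) for b in range(-6, 7)))
```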
Proof of Theorem 3.4.11. Let
Thus, we must prove that h = |s| · g. Note that h and g are nonnegative (because
Proposition 3.4.3 (a) shows that gcds are always nonnegative). Thus, h = |h|
and g = | g|, so that |s| · g = |s| · | g| = |sg| (since | x | · |y| = | xy| for any two real
numbers x and y).
Our goal is to prove that h = |s| · g. Since h = |h| and |s| · g = |sg|, this
amounts to proving that |h| = |sg|. So this is our goal now.
One good way to prove that two integers p and q satisfy | p| = |q| is by
showing that p | q and q | p. Indeed, from p | q and q | p, it follows that
| p| = |q| (by Proposition 3.1.4 (c)).
Thus, in order to prove that |h| = |sg|, it will suffice to show that h | sg and
sg | h. Now, let us do this.
32 “Multiplying both sides by s” means using the following simple fact: If two integers x and y
satisfy x | y, then sx | sy.
This integer satisfies $s \cdot \frac{h}{s} = h = \gcd(sa, sb) \mid sa$. Dividing both sides
by $s$, we thus obtain $\frac{h}{s} \mid a$ 33 . Similarly, $\frac{h}{s} \mid b$. Hence, Corollary 3.4.10
(applied to $m = \frac{h}{s}$) yields $\frac{h}{s} \mid \gcd(a, b)$. In other words, $\frac{h}{s} \mid g$ (since
$g = \gcd(a, b)$). Multiplying both sides by $s$, we thus obtain $s \cdot \frac{h}{s} \mid sg$. In
other words, $h \mid sg$. Thus, $h \mid sg$ is proved.
• Second proof of h | sg: We have h = gcd (sa, sb) | sa. In other words, sa =
hu for some integer u. Similarly, sb = hv for some integer v. Consider
these integers u and v.
However, Bezout’s theorem (Theorem 3.4.6) shows that there exist two
integers x and y such that gcd ( a, b) = xa + yb. Consider these x and y.
Now, g = gcd ( a, b) = xa + yb, so that
Definition 3.5.1. Two integers a and b are said to be coprime (or relatively
prime) if gcd ( a, b) = 1.
Since divisibility does not depend on signs (Proposition 3.1.4 (a)), we thus ob-
tain ab | c 34 . This proves Theorem 3.5.4.
Example 3.5.5. We have 4 | 56 and 7 | 56. Since 4 and 7 are coprime, we can
thus conclude (by Theorem 3.5.4, applied to a = 4, b = 7 and c = 56) that
4 · 7 | 56.
In contrast, from 6 | 12 and 4 | 12, we cannot conclude that 6 · 4 | 12, since
6 and 4 are not coprime.
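Both halves of this example can be replayed in Python. The sketch below also tests Theorem 3.5.4 on a small range of inputs; the helper name `theorem_holds` is ours.

```python
import math

def theorem_holds(a, b, c):
    """If a | c, b | c and gcd(a, b) = 1, check that a*b | c; otherwise nothing is claimed."""
    hypotheses = c % a == 0 and c % b == 0 and math.gcd(a, b) == 1
    return (c % (a * b) == 0) if hypotheses else True

print(56 % 4 == 0, 56 % 7 == 0, 56 % (4 * 7) == 0)   # coprime case: 4 * 7 | 56
print(12 % 6 == 0, 12 % 4 == 0, 12 % (6 * 4) == 0)   # non-coprime case: 6 * 4 does not divide 12
print(all(theorem_holds(a, b, c)
          for a in range(1, 13) for b in range(1, 13) for c in range(-60, 61)))
```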
34 Here is this argument in detail: We have just proved that ab | abs c (where we write abs x
for | x | in order to avoid confusing absolute-value bars with divisibility symbols). Propo-
sition 3.1.4 (a) shows that we have ab | c if and only if abs ( ab) | abs c. However, the
same proposition shows that we have ab | abs c if and only if abs ( ab) | abs (abs c). Since
abs (abs c) = abs c, the latter statement can be rewritten as abs ( ab) | abs c. Thus, both state-
ments ab | c and ab | abs c are equivalent to abs ( ab) | abs c, and thus are equivalent to each
other. Hence, from ab | abs c, we obtain ab | c.
Since divisibility does not depend on signs, this means that a | c. Thus, Theo-
rem 3.5.6 holds.
Again, Theorem 3.5.6 can be motivated using the “independence” view on coprimal-
ity: If a is coprime to b, then b cannot be the “reason” for the divisibility a | bc, and
thus b can be removed from this divisibility. Again, this is neither a proof nor even a
rigorous statement, but it makes Theorem 3.5.6 look less surprising.
Hence, g | a (since divisibility does not depend on signs). Combining this with
g | c, we obtain g | gcd ( a, c) (by Corollary 3.4.10, applied to g, a and c instead
of m, a and b). However, gcd ( a, c) = 1 (since a is coprime to c), so we obtain
g | gcd ( a, c) = 1.
However, g is a nonnegative integer (since any gcd is a nonnegative integer).
Thus, g is a nonnegative divisor of 1 (since g | 1). Since the only nonnegative
divisor of 1 is 1, we thus conclude that g = 1. Hence, gcd ( ab, c) = g = 1. This
shows that ab is coprime to c, and we have proved Theorem 3.5.8.
Again, Theorem 3.5.8 can be viewed within the “independence” paradigm: If each
of a and b is coprime to c, then so should be ab, because any “dependence” between
Proof. Read our above proof of Theorem 3.5.4 until the point where it shows that ab |
|c| · gcd ( a, b). Now, observe that |c| divides c (since |c| is either c or −c), and thus
|c| · gcd ( a, b) divides c · gcd ( a, b). Hence,
Proof. Read our above proof of Theorem 3.5.6 until the point where it shows that a |
|c| · gcd ( a, b). Now, observe that |c| divides c (since |c| is either c or −c), and thus
|c| · gcd ( a, b) divides c · gcd ( a, b). Hence,
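Theorem 3.5.12. Let $a$ and $b$ be two integers that are not both 0. Let $g = \gcd(a, b)$.
Then, the integers $\frac{a}{g}$ and $\frac{b}{g}$ are coprime.
Proof of Theorem 3.5.12. Since $a$ and $b$ are not both 0, we have $\gcd(a, b) \neq 0$
(since 0 cannot divide any nonzero integer). Since we know that $\gcd(a, b) \in \mathbb{N}$,
we thus conclude that $\gcd(a, b) > 0$. In other words, $g > 0$ (since $g = \gcd(a, b)$).
Thus, $\frac{a}{g}$ and $\frac{b}{g}$ are well-defined. Also, from $g > 0$, we obtain $|g| = g$.
Since $g = \gcd(a, b)$, we have $g \mid a$ and $g \mid b$. Hence, $\frac{a}{g}$ and $\frac{b}{g}$ are integers.
Moreover,
$$\begin{aligned}
g = \gcd(a, b) &= \gcd\left(g \cdot \frac{a}{g}, \; g \cdot \frac{b}{g}\right) && \left(\text{since } a = g \cdot \frac{a}{g} \text{ and } b = g \cdot \frac{b}{g}\right) \\
&= \underbrace{|g|}_{= g} \cdot \gcd\left(\frac{a}{g}, \frac{b}{g}\right) && \left(\text{by Theorem 3.4.11, since } \frac{a}{g} \text{ and } \frac{b}{g} \text{ are integers}\right) \\
&= g \cdot \gcd\left(\frac{a}{g}, \frac{b}{g}\right).
\end{aligned}$$
Dividing this equality by $g$, we find
$$1 = \gcd\left(\frac{a}{g}, \frac{b}{g}\right) \qquad (\text{since } g \neq 0).$$
This shows that $\frac{a}{g}$ and $\frac{b}{g}$ are coprime. Thus, Theorem 3.5.12 is proven.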
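Theorem 3.5.12 is what makes "reducing a fraction to lowest terms" work. Here is a minimal Python sketch of this; the name `reduce_pair` is ours, and the division is exact because g divides both a and b.

```python
import math

def reduce_pair(a, b):
    """Divide a and b by g = gcd(a, b); requires that a and b are not both 0."""
    g = math.gcd(a, b)
    if g == 0:
        raise ValueError("a and b must not both be 0")
    return a // g, b // g      # exact divisions, since g | a and g | b

x, y = reduce_pair(12, 18)
print(x, y, math.gcd(x, y))
x, y = reduce_pair(-6, 16)
print(x, y, math.gcd(x, y))
```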
Definition 3.6.1. An integer n > 1 is said to be prime (or a prime) if the only
positive divisors of n are 1 and n.
Proof. The number p is prime, and thus its only positive divisors are 1 and p.
Since gcd (n, p) is a positive divisor of p (this is easy to see35 ), we thus conclude
that gcd (n, p) must be either 1 or p. So we are in one of the following two cases:
Case 1: We have gcd (n, p) = 1.
Case 2: We have gcd (n, p) = p.
Let us first consider Case 1. In this case, we have gcd (n, p) = 1. In other
words, n is coprime to p. Furthermore, the greatest common divisor of n and
p is gcd (n, p) = 1; therefore, p cannot be a common divisor of n and p (since
p > 1). Thus, n is not divisible by p (since this would entail that p is a common
divisor of n and p). So we have shown that n is coprime to p and not divisible
by p. Thus, Lemma 3.6.2 is proved in Case 1.
Let us now consider Case 2. In this case, we have gcd (n, p) = p ̸= 1. Thus,
n is not coprime to p. Also, p = gcd (n, p) | n shows that n is divisible by p. So
we have shown that n is divisible by p and not coprime to p. Hence, Lemma
3.6.2 is proved in Case 2.
We have now proved Lemma 3.6.2 in both Cases 1 and 2; thus, Lemma 3.6.2
is fully proved.
(The moniker “friend-or-foe lemma” is metaphorical: You can think of inte-
gers that are divisible by p as “friends of p”, and think of integers coprime to
p as “foes of p”. Thus, a prime number cleanly divides the integers into its
“friends” and its “foes”. In contrast, the non-prime number 4 has a more “nu-
anced” relationship with certain integers such as 2 (since 2 is neither divisible
by 4 nor coprime to 4).)
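The dichotomy can be observed experimentally. The following Python sketch (with our own helper name `classify`) reports, for given n and p, whether n is divisible by p and whether n is coprime to p. For a prime p, exactly one of the two answers is True; for p = 4 and n = 2, both are False.

```python
import math

def classify(n, p):
    """Return (n is divisible by p, n is coprime to p) for a positive integer p."""
    return n % p == 0, math.gcd(n, p) == 1

print(classify(10, 5))   # a "friend" of 5
print(classify(12, 5))   # a "foe" of 5
print(classify(2, 4))    # neither, since 4 is not prime
```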
3.6.3. There are infinitely many primes, and some more exercises
Exercise 3.6.1 (b) shows that there are infinitely many primes. This is a famous
result of Euclid; many other proofs of it are known (see, e.g., [Conrad22]).
All primes except for 2 are odd. Thus, the distances between consecutive
primes (except for 2 and 3) are always even. Beside this, however, these dis-
tances are rather unpredictable. For instance, the two consecutive primes 41
and 43 are a distance of 2 apart, whereas the two consecutive primes 113 and
127 are a distance of 14 apart. Even some very simple-sounding questions, such
as “are there infinitely many pairs of consecutive primes at a distance of 2 from
each other?” (such primes are called twin primes) are so-far unresolved (this
one is called the twin primes conjecture). At least, one can show that three
consecutive primes cannot be at distances of 2 from each other:
Exercise 3.6.2. Let p be a prime such that p − 2 and p + 2 are also prime.
Prove that p = 5.
[Hint: Consider the remainders upon division by 6.]
Exercise 3.6.3. Let p be a prime larger than 3. Prove that $p^2 \equiv 1 \bmod 24$.
[Hint: Recall some older problems. Also note that the integers 3 and 8 are
coprime.]
Exercise 3.6.4. Let n be an integer such that n > 1 but n is not a prime. Let d
be the smallest divisor of n that is larger than 1. Prove that $d^2 \leq n$.
(You can use standard properties of inequalities, e.g., the equivalence
$(u \leq v) \iff (u^2 \leq v^2)$ when $u$ and $v$ are positive.)
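n=0 →                 1
n=1 →               1   1
n=2 →             1   2   1
n=3 →           1   3   3   1
n=4 →         1   4   6   4   1
n=5 →       1   5  10  10   5   1
n=6 →     1   6  15  20  15   6   1
n=7 →   1   7  21  35  35  21   7   1
n=8 → 1   8  28  56  70  56  28   8   1
(Rows are indexed by n; within each row, the entries are indexed by k = 0, 1, ..., n from left to right, so that each diagonal running from the upper right to the lower left corresponds to a fixed k.)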
One property of Pascal’s triangle that you might have already noticed is the
following: All entries in its n = 7 row except for the two 1’s (i.e., all the binomial
coefficients $\binom{7}{1}, \binom{7}{2}, \ldots, \binom{7}{6}$) are divisible by 7; all entries in the n = 5 row
except for the two 1’s are divisible by 5; likewise for the n = 3 and n = 2 rows.
The pattern here can be generalized to any prime number instead of 7, 5, 3, 2:
Theorem 3.6.3. Let p be a prime. Let $k \in \{1, 2, \ldots, p-1\}$. Then, $p \mid \binom{p}{k}$.
$$k \binom{p}{k} = p \underbrace{\binom{p-1}{k-1}}_{\substack{\text{an integer} \\ \text{(by Theorem 2.5.9)}}}.$$
Thus, $p \mid k \binom{p}{k}$.
From k ∈ {1, 2, . . . , p − 1}, we furthermore obtain p ∤ k (because if we had
p | k, then Proposition 3.1.4 (b) would entail | p| ≤ |k |, which would contradict
|k| = k ≤ p − 1 < p = | p|). In other words, k is not divisible by p.
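The claim of Theorem 3.6.3 is easy to test numerically. The sketch below uses `math.comb` from the Python standard library (available since Python 3.8); the function name `interior_divisible` is ours. It also shows that the primality assumption is needed, by trying a few composite row indices.

```python
import math

def interior_divisible(n):
    """Return True if n divides every entry of row n of Pascal's triangle except the two 1's."""
    return all(math.comb(n, k) % n == 0 for k in range(1, n))

print([n for n in range(2, 14) if interior_divisible(n)])
print(math.comb(4, 2), math.comb(6, 2))   # 6 is not divisible by 4; 15 is not divisible by 6
```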
Theorem 3.6.3 shows that if p is a prime, then all the binomial coefficients
$\binom{p}{i}$ in the p-th row of Pascal’s triangle are divisible by p (except for the two
1’s on the borders of the triangle). The following exercise, in contrast, claims
that the binomial coefficients in the (p − 1)-st row are alternatingly congruent
to 1 and to −1 modulo p:
$$\binom{p-1}{i} \equiv (-1)^i \bmod p \qquad \text{for each } i \in \{0, 1, \ldots, p-1\}.$$
[Hint: What connects the three binomial coefficients $\binom{p-1}{i}$, $\binom{p-1}{i-1}$
and $\binom{p}{i}$?]
$a^p \equiv a \bmod p$.
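Before proving this congruence, we can test it in Python. The sketch below relies on the built-in three-argument `pow(a, p, p)`, which computes the remainder of a^p upon division by p without ever forming the huge number a^p; the helper name `fermat_holds` is ours.

```python
def fermat_holds(a, p):
    """Check a^p ≡ a mod p by comparing remainders modulo p."""
    return pow(a, p, p) == a % p

print(all(fermat_holds(a, p) for p in (2, 3, 5, 7, 11, 13) for a in range(-20, 21)))
print(fermat_holds(2, 4))   # 2^4 = 16 leaves remainder 0, not 2, modulo the non-prime 4
```

Note that negative a are covered as well, since Python's `%` returns a remainder in {0, 1, ..., p − 1} for positive p. The converse direction is not claimed: the congruence holding for some non-prime modulus does not make that modulus prime.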
Proof. We shall induct on a. This will only cover the case a ≥ 0, so we will have
to handle the case a < 0 by a separate argument afterwards.
Base case: The congruence $a^p \equiv a \bmod p$ clearly holds for $a = 0$ (since $0^p =
0 \equiv 0 \bmod p$).
Induction step: Let $a \in \mathbb{N}$. Assume (as the induction hypothesis) that $a^p \equiv
a \bmod p$. We must prove that $(a+1)^p \equiv a + 1 \bmod p$.
But the binomial formula (Theorem 2.6.1) yields
$$\begin{aligned}
(a+1)^p &= \sum_{k=0}^{p} \binom{p}{k} a^k \underbrace{1^{p-k}}_{=1} = \sum_{k=0}^{p} \binom{p}{k} a^k \\
&= \underbrace{\binom{p}{0}}_{=1} \underbrace{a^0}_{=1} + \sum_{k=1}^{p-1} \binom{p}{k} a^k + \underbrace{\binom{p}{p}}_{=1} a^p \\
&\qquad \left(\text{here, we have split off the addends for } k = 0 \text{ and for } k = p \text{ from the sum}\right) \\
&= 1 + \sum_{k=1}^{p-1} \binom{p}{k} a^k + a^p = \sum_{k=1}^{p-1} \binom{p}{k} a^k + a^p + 1.
\end{aligned}$$
In other words,
$$(a+1)^p - (a^p + 1) = \sum_{k=1}^{p-1} \binom{p}{k} a^k. \tag{35}$$
However, Theorem 3.6.3 shows that each $k \in \{1, 2, \ldots, p-1\}$ satisfies
$p \mid \binom{p}{k} \mid \binom{p}{k} a^k$. In other words, $\binom{p}{k} a^k$ is a multiple of $p$ for each $k \in \{1, 2, \ldots, p-1\}$.
Hence, $\sum_{k=1}^{p-1} \binom{p}{k} a^k$ is a sum of multiples of $p$, and thus itself a multiple of
$p$. That is, we have $p \mid \sum_{k=1}^{p-1} \binom{p}{k} a^k$. In view of (35), we can rewrite this as
$p \mid (a+1)^p - (a^p + 1)$. In other words,
$$(a+1)^p \equiv a^p + 1 \bmod p. \tag{36}$$
However, the induction hypothesis says that $a^p \equiv a \bmod p$. Adding the
obvious congruence $1 \equiv 1 \bmod p$ to this, we obtain
$$a^p + 1 \equiv a + 1 \bmod p.$$
$(a+1)^p \equiv a^p + 1 \equiv a + 1 \bmod p$,
Proof of Theorem 3.6.5. We shall prove the claim of Theorem 3.6.5 in the follow-
ing equivalent form: “If p ∤ a, then p | b.”
Assume that p ∤ a. We must then prove that p | b.
The friend-or-foe lemma (Lemma 3.6.2) yields that a is either divisible by p
or coprime to p. Thus, a is coprime to p (since p ∤ a). In other words, p is
coprime to a. Hence, we can use the coprime cancellation theorem (Theorem
3.5.6, applied to p, a and b instead of a, b and c) to obtain p | b from p | ab. This
is precisely what we wanted to prove. Theorem 3.6.5 is thus proved.
Theorem 3.6.5 shows that if a prime number p divides a product ab, then it
must divide a or b (or both). In contrast, a non-prime number like 4 can divide
a product ab without dividing a or b. For example, 4 | 2 · 6 but 4 ∤ 2 and 4 ∤ 6.
We can extend Theorem 3.6.5 to products of several factors:
Proof sketch. Induct on k. In the induction step, use Theorem 3.6.5. (The base
case is the case k = 0, in which case Corollary 3.6.6 is vacuously true because
p ∤ 1.) (See [Grinbe19b, Proposition 2.13.7] for this proof in detail.)
The following exercise is another form of Fermat’s Little Theorem:
Lemma 3.6.7. Let p be a prime. Let n be a nonzero integer. Then, there exists
a largest $m \in \mathbb{N}$ such that $p^m \mid n$.
Proof. The relation $p^m \mid n$ means that $\frac{n}{p^m} \in \mathbb{Z}$. In other words, it means that
we can divide n by p at least m times without obtaining a non-integer. So the
claim of Lemma 3.6.7 is saying that there is a largest number of times that we
can divide n by p without obtaining a non-integer. But this is clear: Every time
we divide n by p, the absolute value |n| decreases (since p > 1), and obviously
this cannot go on forever without eventually yielding a non-integer.36
(See [Grinbe19b, Proof of Lemma 2.13.22] for a more formal proof of Lemma
3.6.7.)
Lemma 3.6.7 allows us to make the following definition:
Thus, ∞ acts like a “mythical number that is larger than any actual number”.
We can keep up this charade as long as we only add and compare, but never
subtract ∞ from anything (since 1 + ∞ = ∞ would turn into 1 = 0 if you
subtracted ∞).
• We have
$v_3(99) = 2$ (since $3^2 \mid 99$ but $3^3 \nmid 99$);
$v_3(98) = 0$ (since $3^0 \mid 98$ but $3^1 \nmid 98$);
$v_3(96) = 1$ (since $3^1 \mid 96$ but $3^2 \nmid 96$);
$v_3(0) = \infty$.
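These values can be recomputed by literally "dividing by p as often as possible", as in the proof of Lemma 3.6.7. Here is a minimal Python sketch of this idea; the function name `valuation` is ours, and we represent ∞ by the float `math.inf`, which is a choice of convenience rather than anything canonical.

```python
import math

def valuation(p, n):
    """Return v_p(n): the largest m with p^m | n, or infinity if n == 0 (requires p > 1)."""
    if n == 0:
        return math.inf
    m = 0
    while n % p == 0:      # n is still divisible by p
        n //= p            # exact division by p
        m += 1
    return m

print(valuation(3, 99), valuation(3, 98), valuation(3, 96), valuation(3, 0))
```

The loop terminates for every nonzero n because |n| shrinks with each division, which is exactly the argument used to prove Lemma 3.6.7.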
We can restate the definition of $v_p(n)$ in yet another way: If p is a prime and
n is a positive integer, then $v_p(n)$ is the number of zeroes at the end of the
36 Of course, we are also tacitly using the fact that n is an integer in the first place, so that $m = 0$
does satisfy $p^m \mid n$ (since $p^0 = 1 \mid n$).
$$n = v_p(a) \qquad \text{and} \qquad m = v_p(b).$$
Thus, $p^n \mid a$ and $p^m \mid b$. In other words, there are integers $x$ and $y$ such that
$a = p^n x$ and $b = p^m y$. Consider these $x$ and $y$.
If we had $p \mid x$, then we would readily obtain $p^{n+1} \mid a$ (because $p \mid x$ entails
that $x = pz$ for some integer $z$, and thus this integer $z$ must satisfy
$a = p^n \underbrace{x}_{= pz} = p^n pz = p^{n+1} z$) and therefore $v_p(a) \geq n + 1$ (by Lemma 3.6.9, applied to $n + 1$
and $a$ instead of $i$ and $n$), which would contradict $v_p(a) = n < n + 1$. Thus, we
cannot have $p \mid x$. For similar reasons, we cannot have $p \mid y$.
However, multiplying $a = p^n x$ with $b = p^m y$, we obtain $ab = p^n x \cdot p^m y =
p^{n+m} xy$, and thus $p^{n+m} \mid ab$. Therefore, $v_p(ab) \geq n + m$ (by Lemma 3.6.9,
$$v_p(a_1 a_2 \cdots a_k) = v_p(a_1) + v_p(a_2) + \cdots + v_p(a_k)$$
Proof. Induct on k. The base case uses $v_p(1) = 0$. The induction step relies on
Theorem 3.6.10 (a).
Note that Theorem 3.6.10 (a) would fail if p were allowed to be non-prime.
For instance, $v_4(2 \cdot 2) = 1$ but $v_4(2) + v_4(2) = 0 + 0 = 0$.
Thus, in particular, every odd move (i.e., the 1-st, the 3-rd, the 5-th, and so
on moves) moves the smallest disk (since $v_2(k) = 0$ when k is odd).
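The claim that the k-th move moves the (v_2(k) + 1)-th smallest disk can be watched in action. The Python sketch below follows the usual recursive strategy (solve for n − 1 disks, move the largest disk, solve for n − 1 disks again) and records only which disk is moved, since the pegs play no role for this question. The names `hanoi_disks` and `v2` are ours.

```python
def hanoi_disks(n):
    """Yield, move by move, which disk (1 = smallest, n = largest) the recursive strategy moves."""
    if n == 0:
        return
    yield from hanoi_disks(n - 1)   # first, move the n - 1 smaller disks out of the way
    yield n                         # then, move the largest disk
    yield from hanoi_disks(n - 1)   # finally, move the n - 1 smaller disks back on top

def v2(k):
    """Return the 2-valuation of a positive integer k."""
    m = 0
    while k % 2 == 0:
        k //= 2
        m += 1
    return m

moves = list(hanoi_disks(3))
print(moves)
print([v2(k) + 1 for k in range(1, len(moves) + 1)])
```

Both printed lists agree, and every odd-numbered move shows the disk 1, as observed above.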
The proof of Proposition 3.6.12 relies on the following lemma about p-valuations:
so that
$$v_2\left(k - 2^{n-1}\right) = v_2\Big(\underbrace{2^{n-1} + k - 2^{n-1}}_{= k}\Big) = v_2(k). \tag{37}$$
Recall that the strategy $S_n$ was defined recursively: It consists of first performing the
strategy $S_{n-1}$ (but with pegs 2 and 3 swapped), then moving the largest disk (from peg
1 to peg 3), and then again performing the strategy $S_{n-1}$ (but now with pegs 1 and 2
swapped). Since strategy $S_{n-1}$ requires $2^{n-1} - 1$ moves in total, we thus conclude that
1. the first $2^{n-1} - 1$ moves of strategy $S_n$ are identical with the corresponding moves
of strategy $S_{n-1}$ (except that pegs 2 and 3 are swapped);
2. the $2^{n-1}$-th move of strategy $S_n$ consists in moving the largest disk;
3. the next $2^{n-1} - 1$ moves of strategy $S_n$ (that is, the moves numbered $2^{n-1} + 1,
2^{n-1} + 2, \ldots, 2^n - 1$) are identical with the moves of strategy $S_{n-1}$ (except
that pegs 1 and 2 are swapped).
Therefore, the k-th move of the strategy $S_n$
• moves the same disk as the k-th move of $S_{n-1}$ if $k < 2^{n-1}$;
• moves the largest disk if $k = 2^{n-1}$;
• moves the same disk as the $(k - 2^{n-1})$-th move of $S_{n-1}$ if $k > 2^{n-1}$.
so that $n = v_2(k) + 1$. Thus, the k-th move of the strategy $S_n$ moves the $(v_2(k) + 1)$-th
smallest disk (because we have shown that it moves the n-th smallest disk). So the
claim we are trying to prove has been proved in Case 2.
Let us finally consider Case 3. In this case, we have $k > 2^{n-1}$. Thus, the k-th move of
the strategy $S_n$ moves the same disk as the $(k - 2^{n-1})$-th move of $S_{n-1}$ (according to the
third of the three bullet points above). But our induction hypothesis (applied to $k - 2^{n-1}$
instead of $k$) yields that the latter move moves the $\left(v_2\left(k - 2^{n-1}\right) + 1\right)$-th smallest disk
(since $k \in \{1, 2, \ldots, 2^n - 1\}$ and $k > 2^{n-1}$ entails $k - 2^{n-1} \in \{1, 2, \ldots, 2^{n-1} - 1\}$ quite
easily37 ). Thus, the former move moves the $\left(v_2\left(k - 2^{n-1}\right) + 1\right)$-th smallest disk as well.
In view of (37), we can restate this as follows: The former move moves the $(v_2(k) + 1)$-th
smallest disk. So the claim we are trying to prove has been proved in Case 3.
Thus, we have proved our claim in all three Cases 1, 2 and 3. In other words, we
have shown that the k-th move of the strategy $S_n$ moves the $(v_2(k) + 1)$-th smallest
disk. Hence, we have proved that Proposition 3.6.12 holds for n. This completes the
induction step. Thus, Proposition 3.6.12 is proved.
37 Here are the details: From $k \in \{1, 2, \ldots, 2^n - 1\} \subseteq \mathbb{Z}$ and $k > 2^{n-1}$, we see immediately
that $k - 2^{n-1}$ is a positive integer. Furthermore, from $k \in \{1, 2, \ldots, 2^n - 1\}$, we obtain
$k \leq 2^n - 1 = 2 \cdot 2^{n-1} - 1 = 2^{n-1} + 2^{n-1} - 1$, so that $k - 2^{n-1} \leq 2^{n-1} - 1$. Since $k - 2^{n-1}$ is a
positive integer, this results in $k - 2^{n-1} \in \{1, 2, \ldots, 2^{n-1} - 1\}$.
Exercise 3.6.7. Let p be a prime, and let $m \in \mathbb{N}$. Let $a$ and $b$ be two integers
such that $p^m \mid ab$ and $p^m \nmid a$. Prove that $p \mid b$.
Proof sketch. First, these sums are infinite sums. Why do they make sense?39
38 See Definition 3.3.13 and Definition 3.3.2 for the notations we are using here. The meaning
of the infinite sums will be discussed in the proof of the theorem.
39 It is trivially easy to concoct an infinite sum that does not make sense: for instance,
$1 + 1 + 1 + \cdots$, or $\frac{1}{1} + \frac{1}{2} + \frac{1}{3} + \cdots$. In general, “infinite” operations in mathematics do not usually
exist unless their existence has been justified.
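Because we can discard all the addends that are zero, and then only finitely
many nonzero addends remain. For instance, if p = 2 and n = 13, then
$$\begin{aligned}
&\left\lfloor \frac{n}{p^1} \right\rfloor + \left\lfloor \frac{n}{p^2} \right\rfloor + \left\lfloor \frac{n}{p^3} \right\rfloor + \cdots \\
&= \left\lfloor \frac{13}{2^1} \right\rfloor + \left\lfloor \frac{13}{2^2} \right\rfloor + \left\lfloor \frac{13}{2^3} \right\rfloor + \cdots \\
&= \lfloor 6.5 \rfloor + \lfloor 3.25 \rfloor + \lfloor 1.625 \rfloor + \lfloor 0.8125 \rfloor + \lfloor 0.40625 \rfloor + \cdots \\
&= 6 + 3 + 1 + \underbrace{0 + 0 + 0 + 0 + 0 + \cdots}_{\text{these are zeroes, thus do not contribute to the sum}} \\
&= 6 + 3 + 1 = 10,
\end{aligned}$$
which is a well-defined (finite) value. More generally, for any prime $p$ and any
$n \in \mathbb{N}$, the sum $\left\lfloor \frac{n}{p^1} \right\rfloor + \left\lfloor \frac{n}{p^2} \right\rfloor + \left\lfloor \frac{n}{p^3} \right\rfloor + \cdots$ has only finitely many nonzero
addends (because for every $i \geq n$, we have $p^i \geq p^n > n$ and thus $0 \leq \frac{n}{p^i} < 1$, so
that $\left\lfloor \frac{n}{p^i} \right\rfloor = 0$), and thus becomes a finite sum once we discard all its addends
that are zero; but a finite sum obviously has a well-defined value.
Moreover, for every positive integer $d$, we have $\left\lfloor \frac{n}{d} \right\rfloor = n//d$ (by Proposition
3.3.14). Thus, the two infinite sums
$$\left\lfloor \frac{n}{p^1} \right\rfloor + \left\lfloor \frac{n}{p^2} \right\rfloor + \left\lfloor \frac{n}{p^3} \right\rfloor + \cdots \qquad \text{and} \qquad n//p^1 + n//p^2 + n//p^3 + \cdots$$
are equal.
It remains to prove that these two sums equal $v_p(n!)$. In other words, we
must prove that
$$v_p(n!) = \left\lfloor \frac{n}{p^1} \right\rfloor + \left\lfloor \frac{n}{p^2} \right\rfloor + \left\lfloor \frac{n}{p^3} \right\rfloor + \cdots. \tag{38}$$
We can prove this by induction on n:
The base case (n = 0) boils down to $0 = 0 + 0 + 0 + \cdots$, which is true.
For the induction step, we proceed from n − 1 to n. So we fix a positive integer
n, and we assume (as our induction hypothesis) that
$$v_p((n-1)!) = \left\lfloor \frac{n-1}{p^1} \right\rfloor + \left\lfloor \frac{n-1}{p^2} \right\rfloor + \left\lfloor \frac{n-1}{p^3} \right\rfloor + \cdots, \tag{39}$$
and we set out to prove that
$$v_p(n!) = \left\lfloor \frac{n}{p^1} \right\rfloor + \left\lfloor \frac{n}{p^2} \right\rfloor + \left\lfloor \frac{n}{p^3} \right\rfloor + \cdots. \tag{40}$$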
We first compare the left hand sides: Let k = v p (n). We know that n! =
(n − 1)! · n, and therefore
v p (n!) = v p ((n − 1)! · n)
= v p ((n − 1)!) + v p (n) (by Theorem 3.6.10 (a))
| {z }
=k
= v p ((n − 1)!) + k.
In other words, the LHS40 of (40) equals the LHS of (39) plus k.
Now, we shall show that the RHSs of the two equations differ by k as well.
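For each $i \in \{1, 2, \ldots, k\}$, we have $p^i \mid p^k \mid n$ (since $k = v_p(n)$) and therefore
\[ \left\lfloor \frac{n}{p^i} \right\rfloor = \left\lfloor \frac{n-1}{p^i} \right\rfloor + 1 \qquad \text{(by Corollary 3.3.19 (a), applied to } d = p^i \text{)}. \]
On the other hand, for each $i \in \{k+1, k+2, k+3, \ldots\}$, we have $p^i \nmid n$ (since $i > k = v_p(n)$) and thus
\[ \left\lfloor \frac{n}{p^i} \right\rfloor = \left\lfloor \frac{n-1}{p^i} \right\rfloor \qquad \text{(by Corollary 3.3.19 (b), applied to } d = p^i \text{)}. \]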
These two equalities together yield
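\begin{align*}
& \left\lfloor \frac{n}{p^1} \right\rfloor + \left\lfloor \frac{n}{p^2} \right\rfloor + \left\lfloor \frac{n}{p^3} \right\rfloor + \cdots \\
&= \left( \left\lfloor \frac{n-1}{p^1} \right\rfloor + 1 \right) + \left( \left\lfloor \frac{n-1}{p^2} \right\rfloor + 1 \right) + \cdots + \left( \left\lfloor \frac{n-1}{p^k} \right\rfloor + 1 \right) \\
&\qquad + \left\lfloor \frac{n-1}{p^{k+1}} \right\rfloor + \left\lfloor \frac{n-1}{p^{k+2}} \right\rfloor + \left\lfloor \frac{n-1}{p^{k+3}} \right\rfloor + \cdots \\
&= \left( \left\lfloor \frac{n-1}{p^1} \right\rfloor + \left\lfloor \frac{n-1}{p^2} \right\rfloor + \left\lfloor \frac{n-1}{p^3} \right\rfloor + \cdots \right) + k.
\end{align*}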
In other words, the RHS of (40) equals the RHS of (39) plus k.
But previously, we have shown the same for the LHSs. Thus, the equality
(40) is just the equality (39) with each side increased by k. Since (39) holds (by
the induction hypothesis), it thus follows that (40) also holds. In other words,
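\[ v_p(n!) = \left\lfloor \frac{n}{p^1} \right\rfloor + \left\lfloor \frac{n}{p^2} \right\rfloor + \left\lfloor \frac{n}{p^3} \right\rfloor + \cdots . \]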
But this completes the induction step, and thus Theorem 3.6.15 is proven.
(For another proof of Theorem 3.6.15, see [Grinbe19b, Exercise 2.17.2 (c)] or
[Grinbe21, Theorem 5.3.1].)
Theorem 3.6.15 is known as de Polignac’s formula or Legendre’s formula.
Various uses of this formula can be found in [Grinbe21].
40 The word “LHS” means “left hand side”.
The word “RHS” means “right hand side”.
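As a sanity check of Theorem 3.6.15, here is a minimal Python sketch that computes $v_p(n!)$ in two ways: directly from the definition of $v_p$, and via the sum $n//p^1 + n//p^2 + n//p^3 + \cdots$. The function names `p_adic_valuation` and `legendre` are made up for this illustration; they do not come from the notes.

```python
from math import factorial

def p_adic_valuation(n, p):
    """Return v_p(n) for a nonzero integer n, i.e. the largest k with p**k dividing n."""
    if n == 0:
        raise ValueError("v_p(0) is not handled in this sketch")
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return k

def legendre(n, p):
    """Return n//p + n//p**2 + n//p**3 + ..., dropping the addends that are zero."""
    total = 0
    power = p
    while power <= n:          # once power > n, the addend n // power is 0
        total += n // power
        power *= p
    return total

# The example from the text: p = 2 and n = 13 give 6 + 3 + 1 = 10.
print(legendre(13, 2))                       # 10
print(p_adic_valuation(factorial(13), 2))    # 10
```

The loop in `legendre` stops as soon as $p^i > n$, which is exactly the observation that all later addends are zero.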
The word “uniquely” means here that any two ways of decomposing a given
positive integer n into a product of primes are equal up to reordering the fac-
tors. For example, we can also decompose 200 as 5 · 2 · 2 · 5 · 2, but this is the
same product with the factors in a different order.
Let us state this fact in full generality. First, we introduce a name for these
decompositions:
\begin{align*}
v_p(n) &= v_p(p_1 p_2 \cdots p_k) \\
&= v_p(p_1) + v_p(p_2) + \cdots + v_p(p_k) \tag{41}
\end{align*}
41 In Remark 6.7.2, we will see how many.
Theorem 3.6.17 (a) shows that every positive integer n has a prime factoriza-
tion. Finding this prime factorization is a classical hard computational problem.
(Quite a few encryption standards rely on its hardness.)
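For small numbers, a prime factorization can be found by naive trial division. The following Python sketch illustrates this; the function name `prime_factorization` is an assumption of this example, and the method is far too slow for the several-hundred-digit numbers used in cryptography.

```python
def prime_factorization(n):
    """Return the primes in a prime factorization of a positive integer n,
    listed in weakly increasing order (the empty list for n = 1)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    factors = []
    d = 2
    while d * d <= n:
        # Divide out d as often as possible; any d reached here that
        # divides n is prime, since smaller factors are already removed.
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)      # what remains is a prime factor
    return factors

print(prime_factorization(200))   # [2, 2, 2, 5, 5]
```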
3.6.13. Applications
Prime factorizations can be rather useful. The next few exercises provide some
examples:
Exercise 3.6.9. Let n and m be integers. Prove that n | m if and only if each
prime p satisfies $v_p(n) \leq v_p(m)$.
[Hint: For the “if” direction, start by picking prime factorizations of n and
m.]
• Otherwise, it is 0.
Some examples:
Note that the lcm of two positive integers is a fairly well-known concept:
When you bring two fractions (of integers) to their lowest common denomina-
tor, this lowest common denominator is actually the lcm of the denominators
of the fractions.
Here are some properties of lcms:
Proof sketch. Easy consequences of the definitions. (For part (a), observe that
two nonzero integers a and b have at least one positive common multiple –
namely, | ab|.)
Here is a counterpart to the universal property of the gcd (Theorem 3.4.9):
( a | m and b | m) ⇐⇒ (lcm ( a, b) | m) .
In other words, the common multiples of two integers a and b are precisely
the multiples of lcm ( a, b).
Proof sketch. (See [Grinbe19b, Theorem 2.11.7] for a detailed proof.)
⇐=: If lcm ( a, b) | m, then a | m (since Theorem 3.7.2 (d) yields a | lcm ( a, b) |
m) and b | m (similarly). Thus, the “⇐=” direction of the desired equivalence
is proved.
=⇒: Assume that a | m and b | m. We must show that lcm ( a, b) | m.
If one of a and b is 0, then this is easy (in fact, let’s say that a = 0; then,
0 = a | m, thus m = 0, and therefore lcm ( a, b) | 0 = m). Hence, we need only
to consider the case when a and b are nonzero.
In this case, set ℓ = lcm ( a, b). Recall that ℓ is defined as the smallest positive
common multiple of a and b. Hence, ℓ is a positive integer and is a multiple of
a and of b. Let q and r be the quotient and the remainder of the division of m
by ℓ. Thus,
The gcd and the lcm of two integers are connected to each other by the
following formula:
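Theorem 3.7.5 gives an easy way to compute gcd (a, b) and lcm (a, b) if you
know prime factorizations of two positive integers a and b. For example, knowing
that $18 = 2 \cdot 3^2$ and $12 = 2^2 \cdot 3$, we obtain
\[ \gcd(18, 12) = 2^{\min\{1,2\}} \cdot 3^{\min\{2,1\}} = 2 \cdot 3 = 6 \qquad \text{and} \qquad \operatorname{lcm}(18, 12) = 2^{\max\{1,2\}} \cdot 3^{\max\{2,1\}} = 2^2 \cdot 3^2 = 36. \]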
If you don’t know the prime factorizations of a and b, the quickest way to
find lcm (a, b) is by using the Euclidean algorithm to find gcd (a, b) first, and
then solving the equality gcd (a, b) · lcm (a, b) = |ab| for lcm (a, b). This gives⁴²
\[ \operatorname{lcm}(a, b) = \frac{|ab|}{\gcd(a, b)} = \left| \frac{a}{\gcd(a, b)} \cdot b \right| . \]
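The recipe just described (Euclidean algorithm first, then division) can be sketched in Python as follows. The function names are chosen for this illustration only, and the convention lcm = 0 when one of the two numbers is 0 is assumed here.

```python
def gcd(a, b):
    """Euclidean algorithm: return the (nonnegative) gcd of two integers."""
    a, b = abs(a), abs(b)
    while b != 0:
        a, b = b, a % b
    return a

def lcm(a, b):
    """Return lcm(a, b), computed as |a / gcd(a, b) * b|."""
    if a == 0 or b == 0:
        return 0               # assumed convention when a or b is 0
    return abs(a // gcd(a, b) * b)

print(gcd(18, 12), lcm(18, 12))   # 6 36
```

Dividing a by the gcd before multiplying by b keeps the intermediate numbers small, which is the point of the second form of the formula.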
Gcds and lcms can be defined for multiple numbers (not just for two numbers).
Their properties are mostly analogous to the case of two numbers, with
some exceptions (e.g., the formula gcd (a, b) · lcm (a, b) = |ab| does not generalize
to gcd (a, b, c) · lcm (a, b, c) = |abc|, but rather to gcd (a, b, c) · lcm (bc, ca, ab) =
|abc|). See [Grinbe19b, §2.11] for more details.
xa + yb with x, y ∈ Z.
In other words, it means a number of cents that you can pay with a-cent coins
and b-cent coins if you can get change.
(b) An N-linear combination (short: N-LC) of a and b will mean a number
of the form
xa + yb with x, y ∈ N.
In other words, it means a number of cents that you can pay with a-cent coins
and b-cent coins without getting change.
N-LCs of 3 and 5 as well, whereas the numbers 1, 2, 4, 7 are not. Thus the
complete list of all N-LCs of 3 and 5 is
\[ 0, 3, 5, 6, \underbrace{8, 9, 10, \ldots}_{\text{all integers } n \geq 8} . \]
Proposition 3.8.2. The Z-LCs of a and b are exactly the multiples of gcd ( a, b).
Proof of Claim 2. Let n be a multiple of gcd ( a, b). We must prove that n is a Z-LC of a
and b.
Bezout’s theorem (Theorem 3.4.6) says that there exist two integers x and y such that
gcd ( a, b) = xa + yb. Consider these x and y. However, n is a multiple of gcd ( a, b); in
other words, there exists an integer c such that n = gcd (a, b) · c. Consider this c. Now,
\[ n = \gcd(a, b) \cdot c = (xa + yb) \cdot c = (cx) a + (cy) b. \]
This shows that n is a Z-LC of a and b (since cx and cy are integers). This proves Claim
2.
Combining Claim 1 with Claim 2, we conclude that the Z-LCs of a and b are exactly
the multiples of gcd ( a, b). Thus, Proposition 3.8.2 is proved.
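The argument for Claim 2 is constructive, and can be turned into a short Python sketch: the extended Euclidean algorithm produces integers x and y with gcd (a, b) = xa + yb, and scaling them by c gives a representation of n. The function names `bezout` and `as_z_lc` are hypothetical names used only in this example.

```python
def bezout(a, b):
    """Return (g, x, y) with g = gcd(a, b) >= 0 and g == x*a + y*b."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        # Invariant: old_r == old_x*a + old_y*b and r == x*a + y*b.
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    if old_r < 0:              # normalize the gcd to be nonnegative
        old_r, old_x, old_y = -old_r, -old_x, -old_y
    return old_r, old_x, old_y

def as_z_lc(n, a, b):
    """Return integers (u, v) with n == u*a + v*b, or None if n is not a Z-LC."""
    g, x, y = bezout(a, b)
    if g == 0:                 # happens only for a == b == 0
        return (0, 0) if n == 0 else None
    if n % g != 0:
        return None            # n is not a multiple of gcd(a, b)
    c = n // g
    return c * x, c * y        # n = g*c = (cx)*a + (cy)*b

print(bezout(18, 12))          # (6, 1, -1)
```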
Now we move on to the N-LCs. What are they? Can we describe them any
better than by their definition?
Let g = gcd (a, b). Then, g divides each of a and b, so that the numbers $\frac{a}{g}$
and $\frac{b}{g}$ are positive integers. We can simplify our problem by replacing a and
b with $\frac{a}{g}$ and $\frac{b}{g}$. Clearly, the N-LCs of a and b are just the N-LCs of $\frac{a}{g}$ and
$\frac{b}{g}$, multiplied by g. By Theorem 3.5.12, the two integers $\frac{a}{g}$ and $\frac{b}{g}$ are coprime.
Thus, understanding the N-LCs of the original integers a and b is equivalent to
understanding the N-LCs of the coprime integers $\frac{a}{g}$ and $\frac{b}{g}$.
Hence, it suffices to solve our problem in the case when a and b are coprime.
In this case, Proposition 3.8.2 shows that every integer is a Z-LC of a and
b (since every integer is a multiple of 1 = gcd ( a, b)). The N-LCs are more
interesting. We have already listed the N-LCs of 3 and 5 above; let us now give
a somewhat more complicated example: The N-LCs of 5 and 9 are
\[ 0, 5, 9, 10, 14, 15, 18, 19, 20, 23, 24, 25, 27, 28, 29, 30, \underbrace{32, 33, 34, \ldots}_{\text{all integers } n \geq 32} . \]
answer. But Theorem 3.8.3 (a) gives you all the information you need to com-
pute all the N-LCs of a and b, since the first ( a − 1) (b − 1) nonnegative integers
can be checked one by one.
The particular case of Theorem 3.8.3 (a) where a = p and b = p + 1 was
Exercise 3.3.2.
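The "check one by one" procedure is easy to sketch in Python. The helper names `is_n_lc` and `n_lcs_below` are invented for this illustration, and a and b are assumed to be positive integers.

```python
def is_n_lc(n, a, b):
    """Check whether n = x*a + y*b for some nonnegative integers x and y
    (a and b are assumed to be positive integers)."""
    if n < 0:
        return False
    # Try every x with x*a <= n; the rest n - x*a must be a multiple of b.
    return any((n - x * a) % b == 0 for x in range(n // a + 1))

def n_lcs_below(a, b, bound):
    """List all N-LCs of a and b among 0, 1, ..., bound - 1."""
    return [n for n in range(bound) if is_n_lc(n, a, b)]

# For coprime a and b, it suffices to look below (a - 1) * (b - 1).
print(n_lcs_below(3, 5, (3 - 1) * (5 - 1)))   # [0, 3, 5, 6]
print(n_lcs_below(5, 9, (5 - 1) * (9 - 1)))
```

The second call prints the N-LCs of 5 and 9 smaller than 32, matching the beginning of the list given above.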
Before we prove Theorem 3.8.3, we show a basic lemma:
Lemma 3.8.4. Assume that the two positive integers a and b are coprime. Let
n ∈ Z. Then, there exist two integers u and v such that 0 ≤ u ≤ b − 1 and
ua + vb = n.
Proof of Lemma 3.8.4. Bezout’s theorem (Theorem 3.4.6) says that there exist two inte-
gers x and y such that gcd ( a, b) = xa + yb. Consider these x and y. Thus, xa + yb =
gcd ( a, b) = 1 (since a and b are coprime).
Recall that b is a positive integer. Thus, division with remainder by b is well-defined
(see Definition 3.3.2 for the terminology).
Let q = (nx ) //b and r = (nx ) %b. In other words, let q and r be the quotient and
the remainder of the division of nx by b. By the definition of quotient and remainder,
we thus have
\[ n = \underbrace{nx}_{= qb + r} a + nyb = (qb + r) a + nyb \]
Proof of Theorem 3.8.3. We shall first prove part (b) and then part (d). The other two
parts will follow quite easily from these.
(b) Assume the contrary. Thus, ab − a − b is an N-LC of a and b. In other words,
there exist nonnegative integers x and y such that ab − a − b = xa + yb. Consider these x and y.
From ab − a − b = xa + yb, we obtain ab = xa + yb + a + b = (x + 1) a + (y + 1) b =
a (x + 1) + b (y + 1). Hence,
\[ b (y + 1) = ab - a (x + 1) = a \cdot \underbrace{(b - (x + 1))}_{\text{an integer}} . \]
This shows that a | b (y + 1). Thus, the coprime removal theorem (Theorem 3.5.6)
yields that a | y + 1 (since a is coprime to b). Therefore, $\frac{y+1}{a}$ is an integer (since
a ≠ 0). Since $\underbrace{y}_{\geq 0} + 1 \geq 1 > 0$ and a > 0, this integer $\frac{y+1}{a}$ is furthermore positive,
and thus is ≥ 1. In other words, y + 1 ≥ a. Hence, y ≥ a − 1. Now,
\[ ab - a - b = \underbrace{x}_{\geq 0} a + \underbrace{y}_{\geq a - 1} b \geq 0a + (a - 1) b = ab - b. \]
Proof of Claim 1. Lemma 3.8.4 shows that there exist two integers u and v such that
0 ≤ u ≤ b − 1 and ua + vb = n. Consider these u and v. Now,
\begin{align*}
(b - 1 - u) a + (-v - 1) b &= ba - a - ua - vb - b \\
&= \underbrace{ba}_{= ab} - a - b - \underbrace{(ua + vb)}_{= n} \\
&= ab - a - b - n = m \tag{42}
\end{align*}
(by the definition of m). We are in one of the following two cases:
Case 1: We have v ≥ 0.
Case 2: We have v < 0.
Let us first consider Case 1. In this case, we have v ≥ 0. Thus, v ∈ N. Also, u ∈ N
(since 0 ≤ u). Recall that ua + vb = n, so that $n = \underbrace{u}_{\in \mathbb{N}} a + \underbrace{v}_{\in \mathbb{N}} b$. This shows that n is
an N-LC of a and b. Thus, at least one of the two numbers n and m is an N-LC of a
and b. So we have proved Claim 1 in Case 1.
Let us next consider Case 2. In this case, we have v < 0. Hence, −v > 0, so that
−v ≥ 1 (since −v is an integer) and therefore −v − 1 ≥ 0. Thus, −v − 1 ∈ N. Moreover,
from u ≤ b − 1, we obtain b − 1 − u ≥ 0, so that b − 1 − u ∈ N. However, (42) yields
\[ m = \underbrace{(b - 1 - u)}_{\in \mathbb{N}} a + \underbrace{(-v - 1)}_{\in \mathbb{N}} b. \]
This shows that m is an N-LC of a and b. Thus, at least one of the two numbers n and
m is an N-LC of a and b. So we have proved Claim 1 in Case 2.
Thus, Claim 1 holds in each of Cases 1 and 2. The proof of Claim 1 is therefore
complete.
Proof of Claim 2. Assume the contrary. Thus, both numbers n and m are N-LCs of a
and b. Therefore, we can write n as n = xa + yb for some x, y ∈ N (since n is an N-LC
of a and b). Furthermore, we can write m as m = za + wb for some z, w ∈ N (since m is
an N-LC of a and b). Consider these x, y, z, w. Now, adding the equalities n = xa + yb
and m = za + wb together, we obtain
\[ n + m = (xa + yb) + (za + wb) = (x + z) a + (y + w) b. \]
This shows that n + m is an N-LC of a and b (since x + z and y + w belong to N). This contradicts the fact that n + m is not
an N-LC of a and b. This contradiction shows that our assumption was wrong. Hence,
Claim 2 is proved.
Combining Claim 1 with Claim 2, we see that exactly one of the two numbers n
and m is an N-LC of a and b. In other words, exactly one of the two numbers n and
ab − a − b − n is an N-LC of a and b (since m = ab − a − b − n). This proves Theorem
3.8.3 (d).
(a) Let n > ab − a − b. Then, the integer ab − a − b − n is negative, and thus cannot
be an N-LC of a and b (since any N-LC of a and b is ≥ 0). However, Theorem 3.8.3 (d)
yields that exactly one of the two numbers n and ab − a − b − n is an N-LC of a and b.
Since ab − a − b − n cannot be an N-LC of a and b, we thus conclude that n is an N-LC
of a and b. This proves Theorem 3.8.3 (a).
(c) Consider the following table of integers:
\[
\begin{array}{cccccc}
0 & 1 & 2 & \cdots & ab - a - b - 1 & ab - a - b \\
ab - a - b & ab - a - b - 1 & ab - a - b - 2 & \cdots & 1 & 0
\end{array}
\]
Theorem 3.8.3 is one of the deepest results we will see in this course, but it
is only the beginning of a theory! See the Wikipedia page for “Coin problem”
for more general (and trickier) questions, such as describing the N-LCs of three
integers a, b, c. See also the slides of Drew Armstrong’s talk at FPSAC 2017 for
deep connections to algebraic combinatorics (and a visual proof different from
ours).
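For readers who like to experiment, here is a brute-force Python sketch that tests three of the claims used in the proof above for a given pair of coprime positive integers: every n > ab − a − b is an N-LC (part (a)), the number ab − a − b itself is not (part (b)), and exactly one of n and ab − a − b − n is an N-LC (part (d)). The function names and the finite search windows are assumptions of this example; a finite check is of course no substitute for the proof.

```python
from math import gcd

def is_n_lc(n, a, b):
    """Check whether n = x*a + y*b with nonnegative integers x, y (a, b > 0)."""
    if n < 0:
        return False
    return any((n - x * a) % b == 0 for x in range(n // a + 1))

def check_coin_claims(a, b):
    """Test parts (a), (b), (d) on a finite window of integers n."""
    assert gcd(a, b) == 1, "a and b must be coprime"
    f = a * b - a - b
    part_a = all(is_n_lc(n, a, b) for n in range(f + 1, f + 1 + a * b))
    part_b = not is_n_lc(f, a, b)
    part_d = all(is_n_lc(n, a, b) != is_n_lc(f - n, a, b)
                 for n in range(-a * b, 2 * a * b))
    return part_a and part_b and part_d

print(check_coin_claims(3, 5), check_coin_claims(5, 9))   # True True
```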
A   B   C   D   E   F   G   H   I   J   K   L   M
0   1   2   3   4   5   6   7   8   9   10  11  12
N   O   P   Q   R   S   T   U   V   W   X   Y   Z
13  14  15  16  17  18  19  20  21  22  23  24  25
Thus, each letter corresponds to a unique number in the set {0, 1, . . . , 25}. For instance,
the letter F corresponds to the number 5, and the letter X corresponds to the number 23.
This gives us a method to encode letters as numbers (and, conversely, decode numbers
back into letters); this method will be called numeric encoding of letters.
A word is just a finite list of letters: For example, the word “KITTEN” is the list
(K, I, T, T, E, N). If we encode each of these six letters numerically, then we obtain
the list (10, 8, 19, 19, 4, 13) (since the letter K corresponds to 10, the letter I to 8, and
so on). This way, we can encode any word as a finite list of numbers (specifically, of
numbers in the set {0, 1, . . . , 25}). Conversely, any finite list of such numbers can be
decoded into a word (although not necessarily a meaningful word): For instance, the
list (17, 0, 19) decodes as “RAT”, since the number 17 corresponds to the letter R, the
number 0 to the letter A, and the number 19 to the letter T.
43 Other alphabets (and lowercase letters) can be handled similarly. Note that the Romans had
a slightly different Latin alphabet than we do, but we shall use the modern one (with its 26
letters) for the sake of familiarity.
We can now formulate Caesar’s algorithm, which is nowadays known as the “Caesarian
cipher $\mathrm{ROT}_3$” (we will soon see other variants):
Example 3.9.1. Let us encrypt the word “CRAZY” using the Caesarian cipher $\mathrm{ROT}_3$.
First, we encode it as a finite list of numbers:
Our list (2, 17, 0, 25, 24) thus turns into the new list (5, 20, 3, 2, 1). Decoding
the latter list back into a word, we find “FUDCB”.
An easy way to visualize the Caesarian cipher $\mathrm{ROT}_3$ is by placing the 26 letters of
the alphabet in the sectors of a “26-hour clock” (an analog clock with 26 hours instead
of the usual 12), in the order A, B, C, . . ., Z clockwise. This “alphabet clock” looks as
follows:
[Figure: the “alphabet clock”, a circle whose 26 sectors carry the letters A, B, C, . . ., Z in clockwise order.]
Then, $\mathrm{ROT}_3$ simply shifts each letter forward by 3 “hours” (so A becomes D, whereas
B becomes E, and so on).
Thus, it is clear how we can decrypt a word encrypted using $\mathrm{ROT}_3$: We just need to
shift each letter backward by 3 “hours”, i.e., replace each $a_i$ by $(a_i - 3) \% 26$. We can
denote this operation by $\mathrm{ROT}_{-3}$.
More generally, we define the operation $\mathrm{ROT}_k$ for any integer k as follows:
Caesarian cipher $\mathrm{ROT}_k$ (for a given integer k): To encrypt a word (written
in the modern Latin alphabet, all uppercase), proceed as follows:
1. Encode the word as a finite list of numbers $(a_1, a_2, \ldots, a_n)$ (using the
numeric encoding).
2. Replace each number $a_i$ in this list by $(a_i + k) \% 26$.
3. Decode the resulting list back into a word.
In terms of our “letter clock”, $\mathrm{ROT}_k$ shifts each letter forward by k “hours”. It is easy
to see that a word encrypted using $\mathrm{ROT}_k$ can be decrypted back using $\mathrm{ROT}_{-k}$, since
$\mathrm{ROT}_{-k}$ shifts each letter backward by k “hours”. We can also prove this rigorously
using our definition of $\mathrm{ROT}_k$, using the following simple lemma:
Lemma 3.9.2. Let k be an integer. Let $a, b \in \{0, 1, \ldots, 25\}$ be two numbers satisfying
$b = (a + k) \% 26$. Then, $a = (b - k) \% 26$.
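The three steps of the Caesarian cipher translate almost literally into Python. The sketch below is only an illustration; the names `ALPHABET`, `encode`, `decode` and `rot` are chosen for this example. It relies on the fact that Python's `%` operator returns a remainder in {0, 1, . . . , 25} even when a + k is negative, which agrees with the remainders used in the text.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode(word):
    """Numeric encoding: A -> 0, B -> 1, ..., Z -> 25."""
    return [ALPHABET.index(letter) for letter in word]

def decode(numbers):
    """Inverse of encode: turn a list of numbers in {0, ..., 25} into a word."""
    return "".join(ALPHABET[a] for a in numbers)

def rot(word, k):
    """Apply the Caesarian cipher ROT_k to an uppercase word."""
    return decode([(a + k) % 26 for a in encode(word)])

print(encode("KITTEN"))     # [10, 8, 19, 19, 4, 13]
print(rot("CRAZY", 3))      # FUDCB
print(rot("FUDCB", -3))     # CRAZY
```

The last line illustrates Lemma 3.9.2: applying the shift by −k undoes the shift by k.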
• The encryption method $\mathrm{ROT}_0$ does nothing: Each word is encrypted as itself
(since shifting by 0 “hours” on the letter clock changes nothing, or since $(a + 0) \% 26 =
a \% 26 = a$ for each $a \in \{0, 1, \ldots, 25\}$).
• The encryption method $\mathrm{ROT}_{26}$ also does nothing: Each word is encrypted as
itself (since shifting by 26 “hours” on the letter clock amounts to a full revolution,
or since $(a + 26) \% 26 = a$ for each $a \in \{0, 1, \ldots, 25\}$).
• The encryption method $\mathrm{ROT}_{27}$ does the same as $\mathrm{ROT}_1$ (since $(a + 27) \% 26 =
(a + 1) \% 26$ for each $a \in \{0, 1, \ldots, 25\}$).
• More generally, if two integers u and v satisfy u ≡ v mod 26, then $\mathrm{ROT}_u = \mathrm{ROT}_v$.
Thus, there are only 26 distinct Caesarian ciphers, namely
\[ \mathrm{ROT}_0, \mathrm{ROT}_1, \ldots, \mathrm{ROT}_{25}. \]
Any other $\mathrm{ROT}_k$ is just a copy of one of these. Of these 26 ciphers, only 25 are
useful, since $\mathrm{ROT}_0$ does nothing.
• The cipher $\mathrm{ROT}_{13}$ inverts itself: Any word encrypted using $\mathrm{ROT}_{13}$ can be
decrypted by applying $\mathrm{ROT}_{13}$ again. Indeed, $\mathrm{ROT}_{13}$ is undone by $\mathrm{ROT}_{-13}$, but
$\mathrm{ROT}_{-13} = \mathrm{ROT}_{13}$ because −13 ≡ 13 mod 26.
• Encrypting a word using $\mathrm{ROT}_u$ (for some integer u) and then encrypting the
result using $\mathrm{ROT}_v$ (for some integer v) is the same as encrypting the original
word using $\mathrm{ROT}_{u+v}$.
Exercise 3.9.2. Prove the latter statement rigorously using the description in terms
of remainders. That is, prove the following fact:
If u and v are two integers, and if $a, b, c \in \{0, 1, \ldots, 25\}$ are three numbers satisfying
$b = (a + u) \% 26$ and $c = (b + v) \% 26$, then $c = (a + (u + v)) \% 26$.
We have so far been encrypting single words. To encrypt an entire text, one must
decide what to do about whitespaces. There are different legitimate choices: e.g.,
one can leave them unchanged; one can remove them (at the risk of making the text
hard to read even after decryption); or one can treat them as a “27th letter” of the
alphabet (thus adapting the definition of Caesarian ciphers to use ( a + k ) %27 instead
of ( a + k ) %26). We shall not delve any deeper into these questions here.
Thus, if your enemy finds a text you encrypted using some $\mathrm{ROT}_k$, he can just try to
decrypt it using
\[ \mathrm{ROT}_{-0}, \mathrm{ROT}_{-1}, \ldots, \mathrm{ROT}_{-25}, \]
and see which of the results gives a meaningful word/text rather than gibberish (see
Exercise 3.9.1 (d)).⁴⁴
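This brute-force attack takes only a few lines of Python. The sketch below is a toy illustration; the names `rot` and `all_decryptions` are made up for this example, and picking out the meaningful candidate is left to the human reader.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def rot(word, k):
    """Apply the Caesarian cipher ROT_k to an uppercase word."""
    return "".join(ALPHABET[(ALPHABET.index(letter) + k) % 26] for letter in word)

def all_decryptions(ciphertext):
    """Return the 26 candidates (k, ROT_{-k}(ciphertext)) for k = 0, 1, ..., 25."""
    return [(k, rot(ciphertext, -k)) for k in range(26)]

for k, candidate in all_decryptions("FUDCB"):
    print(k, candidate)     # the line for k = 3 reads: 3 CRAZY
```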
In modern language, this is saying that Caesarian ciphers have too small a key size
to be secure. The key here is the number k. While technically there are infinitely many
options for k, there are only 26 distinct ciphers obtained, so the “true” key is just an
element of {0, 1, . . . , 25}. No wonder the cipher is easily broken.
Another problem with Caesarian ciphers is that they are “too regular”: e.g., equal
letters in the original word remain equal after encryption. This, too, causes weaknesses
that render the cipher easy to break.
So how can we create a cipher that is harder to break? We need a bigger key size,
and we need “more chaos” (e.g., don’t apply the same rule to each letter). Here are
some ciphers that are slightly better in some of these regards:
• Monoalphabetic substitution: Here we still do the same thing to each letter, but
this thing is no longer just a shift by k “hours”. Instead, we fix any permutation
of the alphabet (i.e., a rule that sends each letter to a different letter) and we
apply this permutation separately to each letter. For instance, we can use the
following permutation:
A B C D E F G H I J K L M
C Z X B N M P A D T S R Q
N O P Q R S T U V W X Y Z
K O E W Y U I J F L G H V
44 The answer might be non-unique when the word is short (see Exercise 3.9.1 (b)), but will
practically always be unique when the word/text is long enough.
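The permutation tabulated above can be applied mechanically. Here is a short Python sketch using the standard string methods `str.maketrans` and `str.translate`; the constant and function names are assumptions of this example, and the key string `CIPHER` is simply the second and fourth rows of the table read off in order.

```python
PLAIN = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
CIPHER = "CZXBNMPADTSRQKOEWYUIJFLGHV"   # images of A, B, ..., Z under the permutation

ENCRYPT_TABLE = str.maketrans(PLAIN, CIPHER)
DECRYPT_TABLE = str.maketrans(CIPHER, PLAIN)   # the inverse permutation

def encrypt(word):
    """Replace each letter by its image under the permutation."""
    return word.translate(ENCRYPT_TABLE)

def decrypt(word):
    """Undo encrypt by applying the inverse permutation."""
    return word.translate(DECRYPT_TABLE)

print(encrypt("KITTEN"))    # SDIINK
print(decrypt("SDIINK"))    # KITTEN
```

Note that the two T's of “KITTEN” still encrypt to equal letters, since the same rule is applied to every letter.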
Many different algorithms have been invented over the ages, usually striking some
balance between practicality (ease of use, simplicity, shortness of the key) and security
(unbreakability). See [Singh01] for more classical algorithms and their history.
45 See Corollary 6.6.6 and the discussion that follows it.
• Julia tells Albert (over the public channel) that she wants to communicate, and
thus he should start creating keys.
• Albert generates two distinct large and sufficiently random primes p and q.
[What exactly does this mean, and how does he do this? With modern hardware,
“large” means approximately 300 digits or more. “Sufficiently random” means
\[ m := pq \qquad \text{and} \qquad \ell := (p - 1)(q - 1). \]
He makes the number m public (i.e., sends it to Julia over the public channel), but
keeps the number ℓ private (even Julia does not need to know it). Eavesdroppers
will thus learn m, but will struggle to find p and q, since no fast algorithm for
factoring numbers into primes is known. (If anyone finds such an algorithm, the
RSA cipher will be broken.)
• Albert publishes the pair (e, m) (so that Julia knows it, and so does anyone else
who cares to listen). This pair is his public key, whereas the (secret) pair (d, ℓ)
is his private key.
Encrypting a message:
Now, assume that Julia wants to send a message to Albert. She encodes this message
as an element a of the set {0, 1, . . . , m − 1}. (If it does not fit into this set, she just breaks
it up into size-m chunks and encrypts each chunk separately. Note that the encoding
has to be agreed on in advance, but this can be a public method.)
She computes the remainder $a^e \% m$ and sends this remainder to Albert.
[Practical issue: To compute $a^e \% m$ fast, she should not try to compute the huge
number $a^e$, since there is no space in the universe to store such a huge number. Instead,
she can “work modulo m”, and use binary exponentiation. For example, to compute
$a^{190}$, she should not use the definition $a^{190} = \underbrace{aa \cdots a}_{190 \text{ times}}$ but the much faster formula
\[ a^{190} = \left( \left( \left( \left( \left( \left( a^2 \right)^2 a \right)^2 a \right)^2 a \right)^2 a \right)^2 a \right)^2 , \]
and moreover, since she only needs the
remainder $a^{190} \% m$, she can reduce each intermediate result modulo m (that is, replace
it by its remainder upon division by m), so that no overly large numbers should appear
in the process.]
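The practical remark above can be made concrete with a short Python sketch of binary exponentiation modulo m. The function name `power_mod` is made up for this example; it reads the binary digits of the exponent from left to right, squaring at every digit and multiplying by a at every digit 1, which for the exponent 190 (binary 10111110) reproduces the nested formula above. Python's built-in `pow(a, e, m)` computes the same remainder.

```python
def power_mod(a, e, m):
    """Return (a ** e) % m for integers e >= 0 and m >= 1,
    reducing every intermediate result modulo m."""
    result = 1 % m
    for bit in bin(e)[2:]:                 # binary digits of e, most significant first
        result = (result * result) % m     # squaring doubles the exponent so far
        if bit == "1":
            result = (result * a) % m      # multiplying by a adds 1 to it
    return result

print(power_mod(7, 190, 1000) == pow(7, 190, 1000))   # True
```

The number of multiplications grows with the number of binary digits of e, not with e itself.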
Decrypting a message:
Albert receives the remainder $b = a^e \% m$. To recover Julia’s original message a, he
just needs to take the d-th power and take its remainder upon division by m. In other
words,
\[ a = b^d \% m. \]
[Just like Julia, Albert should use binary exponentiation and work modulo m to
compute this efficiently.]
So the encryption algorithm is just “take the e-th power and then take its remainder
when divided by m”, whereas the decryption algorithm is just “take the d-th power
and then take its remainder when divided by m” (although the implementation is a bit
more complex, in order to be efficient).
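To see the two algorithms in action, here is a toy run in Python. All concrete numbers (the primes 61 and 53 and the exponent e = 17) are hypothetical choices made for this illustration and are far too small to be secure; the only property used is that e and d are positive integers with ed ≡ 1 mod ℓ. The call `pow(e, -1, ell)` computes an inverse of e modulo ℓ and requires Python 3.8 or newer.

```python
p, q = 61, 53                  # toy primes (assumed for illustration only)
m = p * q                      # public modulus: 3233
ell = (p - 1) * (q - 1)        # kept private: 3120
e = 17                         # public exponent, coprime to ell
d = pow(e, -1, ell)            # private exponent with e * d ≡ 1 mod ell

def encrypt(a):
    """Julia's step: take the e-th power and reduce modulo m."""
    return pow(a, e, m)

def decrypt(b):
    """Albert's step: take the d-th power and reduce modulo m."""
    return pow(b, d, m)

message = 65
print(d)                               # 2753
print(decrypt(encrypt(message)))       # 65
```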
Why does this work? Obviously, we need to prove the following proposition:
Proposition 3.9.3 (correctness of RSA). Let p and q be two distinct primes. Let
m = pq and ℓ = (p − 1)(q − 1). Let e and d be two positive integers such that
ed ≡ 1 mod ℓ.
Let a and b be two numbers in {0, 1, . . . , m − 1} such that $b = a^e \% m$. Then, $a =
b^d \% m$.
This is not at all obvious! The RSA cipher might resemble a Caesarian cipher in that it
uses remainders, but it is different in that it takes powers instead of adding/subtracting
a fixed k.
To prove Proposition 3.9.3, we will need a lemma, which resembles Fermat’s Little
Theorem (Theorem 3.6.4):
Lemma 3.9.4. Let p and q be two distinct primes. Let N be a positive integer such
that N ≡ 1 mod (p − 1)(q − 1). Let a be any integer. Then,
\[ a^N \equiv a \mod pq. \]
Proof of Lemma 3.9.4. Fermat’s little theorem (Theorem 3.6.4) says that $a^p \equiv a \mod p$
and $a^q \equiv a \mod q$. Our claim looks similar, but not quite the same. Nevertheless, we
are on the right trail.
We must prove that $a^N \equiv a \mod pq$. In other words, we must prove that $pq \mid a^N - a$.
But p and q are two distinct primes, and thus are coprime (why?⁴⁶). Hence, $pq \mid a^N - a$
would follow from the coprime divisors theorem (Theorem 3.5.4), if we can show that
$p \mid a^N - a$ and $q \mid a^N - a$.
It thus remains to prove that $p \mid a^N - a$ and $q \mid a^N - a$. We will only show $p \mid a^N - a$,
since $q \mid a^N - a$ is analogous.
46 Proof. The only positive divisors of q are 1 and q (since q is prime). Since p is neither 1 nor q,
it thus follows that p is not a positive divisor of q. In other words, q is not a multiple of p.
Hence, the friend-or-foe lemma (Lemma 3.6.2) shows that q is coprime to p.
So we must show that $p \mid a^N - a$. In other words, we must show that $a^N \equiv a \mod p$.
However, p − 1 | (p − 1)(q − 1) | N − 1 (since N ≡ 1 mod (p − 1)(q − 1)). In other
words, N − 1 = (p − 1) c for some integer c. Consider this c. It is easy to see that c ≥ 0
(why?⁴⁷), so that c ∈ N. From N − 1 = (p − 1) c, we obtain
\[ N = 1 + (p - 1) c. \tag{43} \]
However, recall that $a^p \equiv a \mod p$. Using this fact, we can easily see that
\[ a^{1 + (p-1) k} \equiv a \mod p \tag{44} \]
for each k ∈ N.
[Proof of (44): We can prove this by induction on k:
Base case (k = 0): We have $a^{1 + (p-1) \cdot 0} = a^1 = a \equiv a \mod p$. Thus, (44) holds for k = 0.
Induction step: Let k ∈ N. Assume (as the induction hypothesis) that (44) holds for
k; that is, we have $a^{1 + (p-1) k} \equiv a \mod p$. We must then prove that (44) holds for k + 1
instead of k; in other words, we must prove that $a^{1 + (p-1)(k+1)} \equiv a \mod p$.
But ( p − 1) (k + 1) = ( p − 1) k + ( p − 1), and therefore
\[ b^d \equiv (a^e)^d = a^{ed} \mod m. \]
47 Proof. Assume the contrary. Thus, c < 0. But p > 1 (since p is prime), so that p − 1 > 0.
Hence, (p − 1) c < 0 (since c < 0). Thus, N − 1 = (p − 1) c < 0, so that N < 1. But
N ≥ 1 (since N is a positive integer). This is in obvious contradiction to N < 1. Hence, our
assumption was false, qed.
In other words, $a^{ed} \equiv a \mod m$ (since pq = m). Combining what we have shown, we
obtain
\[ b^d \equiv a^{ed} \equiv a \mod m. \]
Therefore, Proposition 3.3.16 (applied to $b^d$, a and m instead of a, b and d) yields
$b^d \% m = a \% m = a$ (since a ∈ {0, 1, . . . , m − 1}). This proves Proposition 3.9.3.
The RSA cipher, as demonstrated above, lets Julia send secret messages to Albert. If
Albert wants to respond secretly, the two can switch roles (i.e., now Julia must set up
her two primes p′ and q′ and her m′ , ℓ′ , e′ and d′ , publish her public key (e′ , m′ ), and
let Albert encrypt his message using that public key).
The RSA cipher is not hard to implement in your favorite programming language,
provided that it supports sufficiently big integers. But there are some practical consid-
erations:
• You need sufficiently random primes. (Generally, any cipher requires something
sufficiently random that the eavesdroppers cannot guess.)
• Certain primes make for bad choices of p and q, since they allow certain tricks
for computing d. You want to avoid such primes.
• You want to avoid certain practical “side channels” (as with any ciphers).
• You don’t want your message a to be much smaller than m. If it is, pad it with
random bits.
These and many other caveats are discussed on the Wikipedia page for the RSA
cipher, as well as in more serious textbooks on modern cryptography (e.g., [BaEdHa18],
[Buchma04] or [HoPiSi14]).
The RSA cipher can be used not just for encrypting secret messages, but also for
authentication (i.e., proving that a message is really coming from you). See [HoPiSi14,
Chapter 4] or [Buchma04, Chapter 12] for this application.
There are many other modern ciphers. In particular, elliptic curve cryptography (see
[Buchma04, §13.2] or [HoPiSi14, Chapter 6]) can be viewed as a more intricate version
of the RSA cipher.
\[ a^N \equiv a \mod pqr. \]
• How many ways are there to choose 3 odd integers between 0 and 20,
if the order matters (i.e., we count the choice 1, 3, 5 as different from the
choice 3, 1, 5)? (The answer is 1000.)
• How many ways are there to choose 3 odd integers between 0 and 20, if
the order does not matter? (The answer is 220.)
• How many ways are there to choose 3 distinct odd integers between 0 and
20, if the order matters? (The answer is 720.)
• How many ways are there to choose 3 distinct odd integers between 0 and
20, if the order does not matter? (The answer is 120.)
• How many prime factorizations does 200 have (where we count different
orderings as distinct)? (The answer is 10. This is a mix between a number
theory problem and a counting problem.)
• How many ways are there to tile a 2 × 15-rectangle with dominos (i.e.,
rectangles of size 1 × 2 or 2 × 1) ? (The answer is 987. For instance, the
tiling
• How many addends do you get when you expand the product
( a + b) (c + d + e) ( f + g) ? (The answer is 12.)
• How many different monomials do you get when you expand the product
$(a - b)(a^2 + ab + b^2)$? (This one is more of an algebra problem, but I
wanted to list it because it is connected to counting. The answer is 2,
because $(a - b)(a^2 + ab + b^2) = a^3 - b^3$.)
• How many positive divisors does 24 have? (We can actually list them:
1, 2, 3, 4, 6, 8, 12, 24. This one is again a mix of a counting problem and
a number theory problem.)
We will first solve a few basic counting problems informally, and then (in
Chapter 6) make the underlying concepts rigorous.
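Several of the answers quoted in the list above are small enough to confirm by brute force. The following Python sketch does this with the standard `itertools` module; the variable and function names are chosen for this illustration. The domino count uses the standard recurrence t(n) = t(n − 1) + t(n − 2) with t(0) = t(1) = 1 for tilings of a 2 × n rectangle, which is taken for granted here rather than quoted from the notes.

```python
from itertools import (combinations, combinations_with_replacement,
                       permutations, product)

odds = [k for k in range(0, 21) if k % 2 == 1]          # 1, 3, ..., 19

ordered = len(list(product(odds, repeat=3)))                        # 1000
unordered = len(list(combinations_with_replacement(odds, 3)))      # 220
ordered_distinct = len(list(permutations(odds, 3)))                # 720
unordered_distinct = len(list(combinations(odds, 3)))              # 120

# Prime factorizations of 200 = 2*2*2*5*5, counting different orderings.
factorizations_200 = len(set(permutations([2, 2, 2, 5, 5])))       # 10

def domino_tilings(n):
    """Number of domino tilings of a 2 x n rectangle (assumed recurrence)."""
    a, b = 1, 1                      # t(0) and t(1)
    for _ in range(n):
        a, b = b, a + b
    return a                         # t(n)

addends = len(list(product("ab", "cde", "fg")))                    # 12
divisors_24 = [d for d in range(1, 25) if 24 % d == 0]

print(ordered, unordered, ordered_distinct, unordered_distinct)
print(factorizations_200, domino_tilings(15), addends, divisors_24)
```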
{1, 3, 5, 7, 9} .
The braces { and } around the list are there to signal that we mean the set of all
the elements, not the single elements themselves. These braces are called “set
braces”, and are involved in several different notations for sets.
Some more examples of finite sets are
{1, 2, 3, 4, 5} ,
{1, 2} ,
{1} (this is the set that only contains 1) ,
{} (the empty set, also denoted ∅) ,
{1, 2, . . . , 1000} (you understand what “ . . . ” means here) .
Some infinite sets can also be written in this form:
The vertical bar | here should be read as “such that” (don’t mistake it for a
divisibility or absolute value bracket). The part before this bar says what type
of objects you are considering (in our case, it is the integers x); the part after this
bar imposes a condition (or several) on these objects (in our case, the condition
is $x^2 < 13$). What you get is the set of all objects of the former type that satisfy
the latter condition. For instance,
\begin{align*}
\left\{ x \text{ is an integer} \mid x^2 < 13 \right\}
&= \{\text{all integers whose square is smaller than } 13\} \\
&= \{-3, -2, -1, 0, 1, 2, 3\} .
\end{align*}
Some authors write a colon (:) instead of the vertical bar |. Thus, they write
$\left\{ x \text{ is an integer} \mid x^2 < 13 \right\}$ as $\left\{ x \text{ is an integer} : x^2 < 13 \right\}$.
Yet another way of defining sets is when you let a variable range over a given
set and collect certain derived quantities. For example,
\[ \left\{ x^2 + 2 \mid x \in \{1, 3, 5, 7, 9\} \right\} \]
means the set whose elements are the numbers $x^2 + 2$ for all $x \in \{1, 3, 5, 7, 9\}$.
Thus,
\begin{align*}
\left\{ x^2 + 2 \mid x \in \{1, 3, 5, 7, 9\} \right\}
&= \left\{ 1^2 + 2, \ 3^2 + 2, \ 5^2 + 2, \ 7^2 + 2, \ 9^2 + 2 \right\} \\
&= \{3, 11, 27, 51, 83\} .
\end{align*}
{an expression | x ∈ S}
stands for the set whose elements are the values of the given expression for all
x ∈ S.
Some more examples of this:
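\begin{align*}
\left\{ \frac{x+1}{x} \mid x \in \{1, 2, 3, 4, 5\} \right\}
&= \left\{ \frac{1+1}{1}, \frac{2+1}{2}, \frac{3+1}{3}, \frac{4+1}{4}, \frac{5+1}{5} \right\} \\
&= \left\{ 2, \frac{3}{2}, \frac{4}{3}, \frac{5}{4}, \frac{6}{5} \right\}
\end{align*}
and
\begin{align*}
\left\{ x^2 \% 5 \mid x \in \mathbb{N} \right\}
&= \left\{ 0^2 \% 5, \ 1^2 \% 5, \ 2^2 \% 5, \ 3^2 \% 5, \ 4^2 \% 5, \ 5^2 \% 5, \ 6^2 \% 5, \ \ldots \right\} \\
&= \{0, 1, 4, 4, 1, 0, 1, 4, 4, 1, 0, 1, 4, 4, 1, 0, \ldots\} .
\end{align*}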
Note that the remainders $x^2 \% 5$ repeat every five steps, because every integer
x satisfies $(x + 5)^2 \equiv x^2 \mod 5$ and thus $(x + 5)^2 \% 5 = x^2 \% 5$ (by Proposition
3.3.16).
Let me stress once again that a set cannot contain an element more than once.
Also, sets do not come with an ordering of their elements. Thus,
since each of these four sets contains 1 and 2 and nothing else. If S is a set
and p is an object, then S either contains p or does not contain p; it cannot
“contain p twice”, nor can it contain an element “before” another. So when
you write {2, 1, 1}, you aren’t making a set that contains 1 twice; you are just
saying twice that it contains 1, and this is equivalent to saying the same thing
once. Likewise, the sets {1, 2} and {2, 1} do not “contain 1 and 2 in different
orders”; you are just saying in different orders that they contain 1 and 2, but
the meaning is the same. So
\begin{align*}
\left\{ x^2 \% 5 \mid x \in \mathbb{N} \right\}
&= \{0, 1, 4, 4, 1, 0, 1, 4, 4, 1, 0, 1, 4, 4, 1, 0, \ldots\} \\
&= \{0, 1, 4\} .
\end{align*}
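Python's set comprehensions mirror the set-builder notation closely, and Python sets likewise ignore repetitions and order. The sketch below recomputes the examples from the text; the variable names are chosen for this illustration, and the infinite ranges (all integers, all of N) are replaced by finite windows, which suffices here because $x^2 < 13$ forces |x| ≤ 3 and because the remainders $x^2 \% 5$ repeat every five steps.

```python
from fractions import Fraction

small_squares = {x for x in range(-10, 11) if x ** 2 < 13}
shifted_squares = {x ** 2 + 2 for x in {1, 3, 5, 7, 9}}
ratios = {Fraction(x + 1, x) for x in {1, 2, 3, 4, 5}}
remainders = {x ** 2 % 5 for x in range(100)}    # finite stand-in for x in N

print(sorted(small_squares))      # [-3, -2, -1, 0, 1, 2, 3]
print(sorted(shifted_squares))    # [3, 11, 27, 51, 83]
print(sorted(remainders))         # [0, 1, 4]
print({2, 1, 1} == {1, 2})        # True
```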
Note that sets can contain any mathematical objects, not just numbers. In
particular, they can contain other sets. Make sure you understand what the
sets
{1, 2, 3} , {{1, 2, 3}} , {{1, 2} , {3}} , {{1} , {2} , {3}}
are and why they are different⁴⁸.
(The “or” is non-exclusive, as usual. So this includes the elements that are
contained in both A and B.)
(e) We define the intersection of A and B to be the set
• The set {1, 2, 3} contains three elements, namely the numbers 1, 2 and 3.
• The set {{1, 2, 3}} contains one element, namely the set {1, 2, 3}.
• The set {{1, 2} , {3}} contains two elements, namely the sets {1, 2} and {3}.
• The set {{1} , {2} , {3}} contains three elements, namely the sets {1}, {2} and {3}.
For example,
{1, 3, 5} ⊆ {1, 2, 3, 4, 5} ,
{1, 2, 3, 4, 5} ⊇ {1, 3, 5} ,
we don’t have {5, 6, 7} ⊆ {1, 2, 3, 4, 5} ,
{1, 2, 3} = {3, 2, 1} ,
{1, 3, 5} ∪ {3, 6} = {1, 3, 5, 3, 6} = {1, 3, 5, 6} ,
{1, 3, 5} ∩ {3, 6} = {3} ,
{1, 2, 4} ∩ {3, 5} = ∅ (so that the sets {1, 2, 4} and {3, 5} are disjoint) ,
{1, 3, 5} \ {3, 6} = {1, 5} ,
{3, 6} \ {1, 3, 5} = {6} ,
Z \ N = {−1, −2, −3, . . .} = {all negative integers} .
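The finite examples above can be replayed with Python's built-in `set` type. This is only an illustrative sketch; the variable names `A` and `B` are chosen here and are not part of the notes.

```python
A = {1, 3, 5}
B = {3, 6}

print(A <= {1, 2, 3, 4, 5})           # subset test: True
print({5, 6, 7} <= {1, 2, 3, 4, 5})   # False
print({1, 2, 3} == {3, 2, 1})         # True: order does not matter
print(sorted(A | B))                  # union: [1, 3, 5, 6]
print(sorted(A & B))                  # intersection: [3]
print(sorted(A - B))                  # set difference: [1, 5]
print(sorted(B - A))                  # [6]
print({1, 2, 4}.isdisjoint({3, 5}))   # True
```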
For example, the three sets {1, 2}, {5} and {0, 7} are disjoint. On the other
hand, the three sets {1, 2}, {5} and {2, 3} are not (since {1, 2} ∩ {2, 3} = {2} ≠ ∅).
Base case: For n = 0, the claim is true, because there are $0 = \left\lfloor \frac{0+1}{2} \right\rfloor$ odd integers between 0 and 0.
Induction step: Let n be a positive integer. Assume (as the induction hypothesis) that the claim is true for n − 1. That is, assume that there are exactly $\left\lfloor \frac{n}{2} \right\rfloor$ odd integers between 0 and n − 1. We must show that the claim also holds for n, i.e., that there are exactly $\left\lfloor \frac{n+1}{2} \right\rfloor$ odd integers between 0 and n.
Let me introduce a shorthand: The symbol “#” shall mean “number”. Thus,
our induction hypothesis says
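\[ (\text{\# of odd integers between } 0 \text{ and } n - 1) = \left\lfloor \frac{n}{2} \right\rfloor , \tag{45} \]
and our goal is to prove that
\[ (\text{\# of odd integers between } 0 \text{ and } n) = \left\lfloor \frac{n+1}{2} \right\rfloor . \]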
We are in one of the following two cases:
Case 1: The number n is even.
Case 2: The number n is odd.
Let us consider Case 1 first. In this case, n is even. Thus, n is not odd.
Therefore, the odd integers between 0 and n are precisely the odd integers
between 0 and n − 1 (since the extra integer n does not qualify as odd). Hence,
\[ (\text{\# of odd integers between } 0 \text{ and } n) = (\text{\# of odd integers between } 0 \text{ and } n - 1) = \left\lfloor \frac{n}{2} \right\rfloor \qquad (\text{by (45)}) . \tag{46} \]
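However, n + 1 is odd (since n is even), and thus 2 ∤ n + 1. Therefore, Corollary 3.3.19 (b) (applied to 2 and n + 1 instead of d and n) yields
\[ \left\lfloor \frac{n+1}{2} \right\rfloor = \left\lfloor \frac{(n+1) - 1}{2} \right\rfloor = \left\lfloor \frac{n}{2} \right\rfloor . \]
Comparing this with (46), we find
\[ (\text{\# of odd integers between } 0 \text{ and } n) = \left\lfloor \frac{n+1}{2} \right\rfloor . \]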
Thus, we have achieved our goal in Case 1.
Let us now consider Case 2. In this case, n is odd. Thus, the odd integers
between 0 and n are precisely the odd integers between 0 and n − 1 along with
the new odd integer n. Hence,
\[ (\text{\# of odd integers between } 0 \text{ and } n) = (\text{\# of odd integers between } 0 \text{ and } n - 1) + 1 = \left\lfloor \frac{n}{2} \right\rfloor + 1 \qquad (\text{by (45)}) . \tag{47} \]
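As a sanity check (not a proof), the claim that there are exactly $\lfloor (n+1)/2 \rfloor$ odd integers between 0 and n can be tested by brute force for small n. The sketch below also checks the two floor identities that the even and odd cases of the induction step rely on; the function name `count_odd` is chosen here for illustration.

```python
def count_odd(n):
    """Count the odd integers i with 0 <= i <= n by brute force."""
    return sum(1 for i in range(n + 1) if i % 2 == 1)

# Compare the brute-force count with floor((n + 1) / 2) for small n.
claim_holds = all(count_odd(n) == (n + 1) // 2 for n in range(200))

# Case 1 (n even): floor((n+1)/2) equals floor(n/2).
even_case = all((n + 1) // 2 == n // 2 for n in range(2, 200, 2))

# Case 2 (n odd): floor((n+1)/2) equals floor(n/2) + 1.
odd_case = all((n + 1) // 2 == n // 2 + 1 for n in range(1, 200, 2))

print(claim_holds, even_case, odd_case)  # True True True
```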
• there are 4 subsets of {1, 2}, namely {} , {1} , {2} , {1, 2}.
• there are 2 subsets of {1}, namely {} and {1}.
• there is 1 subset of {}, namely {}.
• there are 16 subsets of {1, 2, 3, 4}.
This is precisely what we needed to prove. This completes the induction step,
and thus Theorem 4.3.1 is proved.
More generally, we have the following:
(# of subsets of S) = $2^n$.
Informal proof. This follows from Theorem 4.3.1, since we can rename the n elements of S as 1, 2, . . . , n.
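The count $2^n$ can be checked experimentally for small sets. The following sketch lists all subsets by going through the possible sizes with `itertools.combinations`; the helper name `all_subsets` is an assumption of this example, not notation from the notes.

```python
from itertools import combinations

def all_subsets(S):
    """Return a list of all subsets of the finite set S."""
    elems = list(S)
    return [set(c) for k in range(len(elems) + 1)
            for c in combinations(elems, k)]

for n in range(6):
    S = set(range(1, n + 1))
    assert len(all_subsets(S)) == 2**n

print(len(all_subsets({1, 2})))        # 4
print(len(all_subsets({1, 2, 3, 4})))  # 16
```

The choice of elements does not matter here, only their number: `all_subsets({"a", "b"})` also has 4 entries.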
For example,
More generally, the answer to the question “how many k-element subsets does a given n-element set have” turns out to be the binomial coefficient $\binom{n}{k}$.
Let us state this as a theorem and give an informal proof (which will easily
become rigorous once we have the basic concepts of counting pinned down):52
52 This theorem is exactly Theorem 2.5.10, which we left unproved a few chapters ago.
Informal proof. We induct on n (without fixing k). That is, we use induction on
n to prove the statement
for each n ∈ N.
Base case: Let us prove P (0). Let k be any number. The only 0-element set
is ∅, and its only subset is ∅. Thus, a 0-element set S necessarily has one
0-element subset (∅) and no other subsets. Hence, it satisfies
\[ (\text{\# of } k\text{-element subsets of } S) = \begin{cases} 1, & \text{if } k = 0; \\ 0, & \text{else.} \end{cases} \]
For instance:
(again by the statement P (n − 1), but now applied to k − 1 instead of k). Note
that we deliberately did not fix k in our induction, so that we were now able to
apply P (n − 1) to k − 1 instead of k.
Now, (48) becomes
by Pascal’s recurrence (Theorem 2.5.1). But this is precisely the equality that
we have to prove. This completes the induction step, and thus Theorem 4.3.3 is
proved.
The above proof can also be used to write an algorithm that lists all the k-
element subsets of {1, 2, . . . , n}. This algorithm is recursive and proceeds as
follows:
• If n = 0, then:
– if k = 0, then list ∅ (i.e., the resulting list will consist only of ∅).
– otherwise, list nothing.
• Otherwise,
– list the red sets (by listing all the (k − 1)-element subsets of {1, 2, . . . , n − 1},
and inserting n into each of them);
– list the green sets (i.e., the k-element subsets of {1, 2, . . . , n − 1});
– combine these two lists.
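The recursive procedure just described might be sketched in Python as follows. This is one possible rendering, not necessarily the code intended by the notes: the function name `k_subsets` and the use of `frozenset` (so that subsets can themselves be collected into a set) are choices made here.

```python
from math import comb

def k_subsets(n, k):
    """List all k-element subsets of {1, 2, ..., n} as frozensets."""
    if n == 0:
        # The only subset of the empty set is the empty set itself.
        return [frozenset()] if k == 0 else []
    # Red sets: (k-1)-element subsets of {1, ..., n-1} with n inserted.
    red = [s | {n} for s in k_subsets(n - 1, k - 1)]
    # Green sets: k-element subsets of {1, ..., n-1}.
    green = k_subsets(n - 1, k)
    return red + green

print(len(k_subsets(4, 2)))  # 6

# The number of listed subsets agrees with the binomial coefficient.
for n in range(7):
    for k in range(n + 1):
        assert len(k_subsets(n, k)) == comb(n, k)
```

The recursion calls itself twice per step, so it is meant to illustrate the proof rather than to be efficient.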
54 Note that lists are enclosed within brackets in Python: e.g., a list that we call ( a, b, c) would
be written [a,b,c] in Python. Also, Python’s notation set([a,b,c]) corresponds to our
{ a, b, c}.
Definition 4.4.1. A finite list (aka tuple) is a list consisting of finitely many
objects. The objects appear in this list in a specified order, and they don’t
have to be distinct.
A finite list is delimited using parentheses: i.e., the list that contains the
objects a1 , a2 , . . . , an in this order is denoted by ( a1 , a2 , . . . , an ).
“Specified order” means that the list has a well-defined first entry, a
well-defined second entry, and so on. Thus, two lists ( a1 , a2 , . . . , an ) and
(b1 , b2 , . . . , bm ) are considered equal if and only if
• we have n = m, and
• we have $a_i = b_i$ for each i ∈ {1, 2, . . . , n}.
For example:
• The lists (1, 2) and (2, 1) are not equal (although the sets {1, 2} and {2, 1}
are equal).
• The lists (1, 2) and (1, 1, 2) are not equal (although the sets {1, 2} and
{1, 1, 2} are equal).
• The lists (1, 1, 2) and (1, 2, 2) are not equal (although the sets {1, 1, 2} and
{1, 2, 2} are equal).
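The contrast between lists and sets can be observed directly in Python, where tuples play the role of finite lists. A small sketch (the variable names are chosen here for illustration):

```python
# Tuples care about order and repetition; sets do not.
order_matters = (1, 2) != (2, 1)
sets_ignore_order = {1, 2} == {2, 1}
repeats_matter = (1, 2) != (1, 1, 2)
sets_ignore_repeats = {1, 2} == {1, 1, 2}
last_example = (1, 1, 2) != (1, 2, 2) and {1, 1, 2} == {1, 2, 2}

print(order_matters, sets_ignore_order,
      repeats_matter, sets_ignore_repeats, last_example)
# True True True True True
```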
• How many pairs ( a, b) are there with a, b ∈ {1, 2, 3} ? There are nine:
The fact that there are nine of them is not surprising given how I’ve laid
them out: They are forming a table with 3 rows and 3 columns, where the
row determines the first entry of the pair55 and the column determines
the second entry. Thus, their total number is 3 · 3 = 9.
• How many pairs ( a, b) are there with a, b ∈ {1, 2, 3} and a < b ? There are
three:
(1, 2) , (1, 3) , (2, 3) .
• How many pairs ( a, b) are there with a, b ∈ {1, 2, 3} and a > b ? Again,
three:
(2, 1) , (3, 1) , (3, 2) .
Informal proof. (a) These pairs can be arranged in a table with n rows and n
columns, where the rows determine the first entry and the columns determine
55 i.e.:
0 + 1 + 2 + · · · + ( n − 1) ,
because there are 0 such cells in the first column, 1 such cell in the second, 2
such cells in the third, and so on. Hence,
(c) A pair ( a, b) with a = b is just a pair of the form ( a, a), that is, a single
element of {1, 2, . . . , n} written twice in succession. Counting such pairs is
therefore tantamount to counting single elements of {1, 2, . . . , n}; but there are
clearly n of them.
(d) The pairs ( a, b) that satisfy a > b are in one-to-one correspondence with
the pairs ( a, b) that satisfy a < b: Namely, each former pair becomes a latter pair
if we swap its two entries, and vice versa. Thus, the # of former pairs equals the
# of latter pairs. But we have already found (in part (b)) that the # of latter pairs
is 1 + 2 + · · · + (n − 1). Hence, the # of former pairs is 1 + 2 + · · · + (n − 1) as
well.
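The counts discussed in this proof can be confirmed by enumeration for small n. The sketch below (with a helper name `count_pairs` chosen here) lists all pairs (a, b) with a, b ∈ {1, 2, . . . , n} and sorts them into the three classes a < b, a = b and a > b.

```python
def count_pairs(n):
    """Return (total, #a<b, #a=b, #a>b) over all pairs from {1..n}."""
    elems = range(1, n + 1)
    pairs = [(a, b) for a in elems for b in elems]
    less = sum(1 for a, b in pairs if a < b)
    equal = sum(1 for a, b in pairs if a == b)
    greater = sum(1 for a, b in pairs if a > b)
    return len(pairs), less, equal, greater

print(count_pairs(3))  # (9, 3, 3, 3)

for n in range(1, 20):
    total, less, equal, greater = count_pairs(n)
    assert total == n * n
    assert less == greater == sum(range(n))  # 0 + 1 + ... + (n - 1)
    assert equal == n
    assert less + equal + greater == total
```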
\[ 1 + 2 + \cdots + n = \frac{n^2 + n}{2} = \frac{n(n+1)}{2} . \]
Thus, we have recovered the Little Gauss formula (Theorem 1.3.1) by counting
pairs. This illustrates the fact that counting can be used to prove algebraic
identities.
Exercise 4.4.1. How many pairs ( a, b) are there with a ∈ {1, 2, 3} and b ∈
{1, 2, 3, 4, 5} ?
Solution. By the same reasoning as in Proposition 4.4.3 (a), there are 15 such
pairs, since the pairs can be arranged in a table with 3 rows and 5 columns.
The same reasoning gives the following more general result:
Informal proof. You can think of these triples as occupying the cells of a 3-
dimensional table, but this kind of visualization is tricky (and gets even less
reliable when you get to higher dimensions).
A better approach: Re-encode each triple ( a, b, c) as a pair (( a, b) , c) (a pair
whose first entry is itself a pair). This is a pair whose first entry comes from
the set of all pairs ( a, b) with a ∈ A and b ∈ B, whereas its second entry comes
from C. Let U be the set of all pairs ( a, b) with a ∈ A and b ∈ B. Then, this set
U is an nm-element set, because
For instance, {1, 2} × {7, 8, 9} is the set of all pairs ( a, b) with a ∈ {1, 2} and
b ∈ {7, 8, 9}. Explicitly, it consists of the following six pairs:
Theorem 4.4.8 (product rule for two sets). If A is an n-element set, and B is
an m-element set, then A × B is an nm-element set.
Likewise, we can restate Theorem 4.4.5 as follows:
Theorem 4.4.9 (product rule for three sets). If A is an n-element set, and B is
an m-element set, and C is a p-element set, then A × B × C is an nmp-element
set.
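Both product rules can be checked on small examples with `itertools.product`, which enumerates Cartesian products. In this sketch, the sets A and B are the ones from the example {1, 2} × {7, 8, 9} above, while the set C is a hypothetical third set added for illustration.

```python
from itertools import product

A = {1, 2}
B = {7, 8, 9}
C = {"x", "y"}  # hypothetical third set, not taken from the notes

AxB = set(product(A, B))
AxBxC = set(product(A, B, C))

print(len(AxB))    # 6  = 2 * 3
print(len(AxBxC))  # 12 = 2 * 3 * 2

# Exercise 4.4.1: pairs (a, b) with a in {1,2,3} and b in {1,...,5}.
exercise_pairs = set(product({1, 2, 3}, {1, 2, 3, 4, 5}))
print(len(exercise_pairs))  # 15
```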
More generally:
\[ 1 + 2 + \cdots + (n - 1) = \frac{(n-1) n}{2} = \binom{n}{2} . \]
(by a similar argument: these k-tuples are just the k-element subsets of {1, 2, . . . , n}
in disguise). For comparison, if we drop the “a1 < a2 < · · · < ak ” requirement,
56 We are again being informal here. To be more rigorous, we should be speaking of a one-to-
one correspondence between the former triples and the latter subsets. But it is not yet the
time for this pedantry.
then we have
Other counting problems don’t have answers this simple. For instance, it is
not hard to see that
but there is no way to express this without a “· · · ” or a ∑ sign. For each specific
k, however, we can simplify this:
\begin{align*}
1^0 + 2^0 + \cdots + n^0 &= \underbrace{1 + 1 + \cdots + 1}_{n \text{ times}} = n; \\
1^1 + 2^1 + \cdots + n^1 &= 1 + 2 + \cdots + n = \frac{n(n+1)}{2}; \\
1^2 + 2^2 + \cdots + n^2 &= \frac{n(n+1)(2n+1)}{6}; \\
1^3 + 2^3 + \cdots + n^3 &= \frac{n^2 (n+1)^2}{4}; \\
1^4 + 2^4 + \cdots + n^4 &= \frac{n(2n+1)(n+1)\left(3n + 3n^2 - 1\right)}{30}; \\
&\ \ \vdots
\end{align*}
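The closed forms above can be tested numerically for small n. This is a plain brute-force check in exact integer arithmetic, not a proof; the names `power_sum` and `closed_forms` are chosen here.

```python
def power_sum(n, k):
    """Compute 1^k + 2^k + ... + n^k directly."""
    return sum(i**k for i in range(1, n + 1))

# The closed forms listed above, indexed by the exponent k.
closed_forms = {
    0: lambda n: n,
    1: lambda n: n * (n + 1) // 2,
    2: lambda n: n * (n + 1) * (2 * n + 1) // 6,
    3: lambda n: n**2 * (n + 1)**2 // 4,
    4: lambda n: n * (2 * n + 1) * (n + 1) * (3 * n + 3 * n**2 - 1) // 30,
}

for k, formula in closed_forms.items():
    for n in range(0, 50):
        assert power_sum(n, k) == formula(n)

print(power_sum(10, 4), closed_forms[4](10))  # 25333 25333
```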
In the next two chapters, we will learn what it means for a set to have n
elements, and what rules we have actually been using in our above informal
arguments. To do so, we must first get familiar with the concept of maps (also
known as functions).
This is not a real definition, as it only kicks the can down the road: It defines
“function” in terms of “rule”, but what is a rule? But it gives some good
intuition, provided that it is correctly understood. Here are some comments
that should clarify it:
Do not confuse the → arrow with the ↦ arrow! The former arrow is written between the sets X and Y, whereas the latter is written between a specific input and the corresponding output.
• The notation
X → Y,
x ↦ (some expression involving x)
(where X and Y are two sets) means “the function from X to Y that sends each element x of X to the expression to the right of the “↦” symbol”. Here, the expression can (for example) be $x^2$ or $\frac{1}{x+4}$ or $\frac{x}{x+2}$.
For example,
R → R,
x ↦ $x^2$
R → R,
x ↦ $\frac{x}{\sin x + 15}$
is the function that takes the sine of the input, then adds 15, then divides the input by the result. (Note that this is well-defined, since sin x + 15 is never zero and thus the expression $\frac{x}{\sin x + 15}$ is always meaningful, so we really get a function from R to R.)
For yet another example,
R → R,
x ↦ 2
Z → Q,
x ↦ $2^x$
x      −2     −1     0     1     2
2^x    1/4    1/2    1     2     4
Z → Q,
x ↦ $\begin{cases} \frac{1}{x-1}, & \text{if } x \neq 1; \\ 5, & \text{if } x = 1. \end{cases}$
• The notation
f : X → Y,
x ↦ (some expression involving x)
means that we take the function from X to Y that sends each x ∈ X to the expression to the right of the “↦” symbol, and we call this function f. (Or, if a function named f has already been defined, this notation means that this f is the function from X to Y that sends each x ∈ X to the expression to the right of the “↦” symbol.)
For example, if we write
f : R → R,
x ↦ $x^2 + 1$,
then f henceforth will denote the function from R to R that sends each x ∈ R to $x^2 + 1$.
h (0) = 92,
h (2) = 20,
h (4) = 92.
The values here have been chosen at whim, for no particular reason. A
function does not have to be “natural” or “meaningful” in any way; all it
has to do is transform each element of X into some element of Y.
• The Caesarian ciphers from Section 3.9 can also be viewed as examples of maps (i.e., functions). Specifically, if k is any integer, and if we denote the set of all words (including nonsensical ones like “OQJCLA”) by W, then the Caesarian cipher $\mathrm{ROT}_k$ is a map from W to W. For instance, $\mathrm{ROT}_1$ (“KITTEN”) = “LJUUFO”.
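To see the cipher as a map from W to W, here is a minimal Python sketch. It assumes that $\mathrm{ROT}_k$ shifts each letter of an uppercase word k places forward in the alphabet A, B, ..., Z, wrapping around after Z; this reading is consistent with the single value quoted above, but the precise definition lives in Section 3.9. The function name `rot` is chosen here.

```python
import string

def rot(k, word):
    """Shift each letter of an uppercase word k places forward (cyclically).

    Assumption: the word consists of the letters A..Z only.
    """
    alphabet = string.ascii_uppercase
    return "".join(alphabet[(alphabet.index(c) + k) % 26] for c in word)

print(rot(1, "KITTEN"))   # LJUUFO
print(rot(-1, "LJUUFO"))  # KITTEN
```

Under this assumption, shifting by −k undoes shifting by k.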
At this point, we have a good idea of what a function is, but the provisional
definition given above (Definition 5.1.1) wasn’t as precise as we would like.
Even worse, the word “rule” in that definition is still unclear, and prevents us
from dealing with functions that can neither be given by an explicit formula
(such as “take the square”) nor be specified by a complete list of values (e.g.,
since the domain is infinite). Thus, we need a better definition of a function.
This is what we will do in the present chapter. The trick is to first define
the more general concept of a relation, and then to characterize functions as
relations with a certain property.
5.2. Relations
Relations (to be specific: binary relations) are another concept that you have
already seen in myriad examples:
What do these relations all have in common? They can be applied to pairs
of objects. Applying a relation to a pair of objects gives a statement which can
be true or false. For example, applying the relation “coprime” to the pair (5, 8)
yields the statement “5 is coprime to 8”, which is true. Applying it to the pair
(5, 10) yields the statement “5 is coprime to 10”, which is false.
A general relation R relates elements of a set X with elements of a set Y. For
any pair ( x, y) ∈ X × Y (that is, for any pair consisting of an element x ∈ X
and an element y ∈ Y), we can apply the relation R to the pair ( x, y), obtaining
a statement “x R y” which is either true or false. To describe this relation R,
we need to know which pairs ( x, y) ∈ X × Y do satisfy x R y and which pairs
don’t. In other words, we need to know the set of all pairs ( x, y) ∈ X × Y that
satisfy x R y. For a rigorous definition of a relation, we simply take the relation
R to be this set of pairs. In other words, we define relations as follows:
• we write x R y if (x, y) ∈ R;
• we write $x \not R \; y$ if (x, y) ∉ R.
All the relations we have seen so far can be recast in terms of this definition:
{( x, y) ∈ Z × Z | x divides y}
= {( x, y) ∈ Z × Z | there exists some z ∈ Z such that y = xz}
= {( x, xz) | x ∈ Z and z ∈ Z} .
For instance, the pairs (2, 4) and (3, 9) and (10, 20) belong to this subset,
whereas the pairs (2, 3) and (2, 15) and (10, 5) do not.
• The coprimality relation (“coprime to”) is a subset of Z × Z, namely the
subset
{( x, y) ∈ Z × Z | x is coprime to y}
= {( x, y) ∈ Z × Z | gcd ( x, y) = 1} .
It contains, for instance, (2, 3) and (7, 9), but not (4, 6).
• For any n ∈ Z, the “congruent modulo n” relation $\overset{n}{\equiv}$ is a subset of Z × Z,
namely the subset
{( x, y) ∈ Z × Z | x ≡ y mod n}
= {( x, x + nz) | x ∈ Z and z ∈ Z}
(because for a given integer x, the integers y that satisfy x ≡ y mod n are
precisely the integers of the form x + nz for z ∈ Z).
• A geometric example: Let P be the set of all points in the plane, and let
L be the set of all lines in the plane. Then, the “lies on” relation (as in “a
point lies on a line”) is a subset of P × L, namely the subset
$E_A$ = {(x, y) ∈ A × A | x = y}
= {(x, x) | x ∈ A} .
• We can literally take any subset of X × Y (where X and Y are two sets)
and it will be a relation from X to Y. Just as with functions, a relation
does not have to follow any “meaningful” rule. For example, here is a
relation from {1, 2, 3} to {5, 6, 7}:
       5      6      7
1      no     yes    yes
2      no     no     no
3      yes    no     no
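Since a relation is just a set of pairs, it can be modeled directly as a Python set of tuples. The sketch below does this for a few of the relations above; because Z is infinite, the divisibility and coprimality relations are restricted to the window {1, 2, . . . , 20} (an assumption made only to keep the sets finite).

```python
from math import gcd

W = range(1, 21)  # finite window standing in for Z (assumption)

divides = {(x, y) for x in W for y in W if y % x == 0}
coprime = {(x, y) for x in W for y in W if gcd(x, y) == 1}

# The relation from {1, 2, 3} to {5, 6, 7} given by the yes/no table.
R = {(1, 6), (1, 7), (3, 5)}

def related(relation, x, y):
    """The statement 'x R y' is true exactly when (x, y) lies in R."""
    return (x, y) in relation

print(related(divides, 2, 4), related(divides, 10, 5))  # True False
print(related(coprime, 7, 9), related(coprime, 4, 6))   # True False
print(related(R, 1, 6), related(R, 2, 5))               # True False
```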
A good way to visualize a relation R from a set X to a set Y (at least when
X and Y are finite) is by drawing the sets X and Y as blobs, drawing their
elements as nodes within these blobs, and drawing an arrow from the x-node
to the y-node for every pair ( x, y) that belongs to the relation R. For example,
the relation R in our last example can be visualized as follows:
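[Figure (49): blobs-and-arrows diagram of R, with X = {1, 2, 3} and Y = {5, 6, 7}, and with arrows 1 → 6, 1 → 7 and 3 → 5.]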
(which we illustrated in (49)) is not a function from {1, 2, 3} to {5, 6, 7}. In fact,
it violates output uniqueness at x = 1 (since there are two y ∈ {5, 6, 7} that
satisfy 1 R y) and also violates it at x = 2 (since there are no y ∈ {5, 6, 7} that
satisfy 2 R y). Each of these two violations is reason enough to disqualify this
relation from being a function.
In our above list of relations, only the equality relation $E_A$ is a function.
Here is an example of a function from X = {1, 2, 3} to Y = {5, 6, 7, 8}: the
relation
{(1, 7) , (2, 5) , (3, 7)} .
This relation satisfies output uniqueness and thus is a function. Visualized by
blobs and arrows, it looks as follows:
[Figure: blobs-and-arrows diagram with X = {1, 2, 3} and Y = {5, 6, 7, 8}, and with arrows 1 → 7, 2 → 5 and 3 → 7.]
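Output uniqueness can be tested mechanically for finite relations: every x ∈ X must have exactly one y ∈ Y with x R y. The following sketch (with a helper name `is_function` chosen here) applies this test to the two relations just discussed.

```python
def is_function(relation, X, Y):
    """Check output uniqueness: each x in X has exactly one partner in Y."""
    return all(
        sum(1 for y in Y if (x, y) in relation) == 1
        for x in X
    )

# The relation illustrated in (49): fails at x = 1 (two outputs)
# and at x = 2 (no output).
R = {(1, 6), (1, 7), (3, 5)}
print(is_function(R, {1, 2, 3}, {5, 6, 7}))     # False

# The relation {(1, 7), (2, 5), (3, 7)} from {1, 2, 3} to {5, 6, 7, 8}.
F = {(1, 7), (2, 5), (3, 7)}
print(is_function(F, {1, 2, 3}, {5, 6, 7, 8}))  # True
```

Note that F is a function even though 7 is hit twice and 6 and 8 are never hit; output uniqueness only constrains what happens at each input.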
is a relation whose visual picture has exactly one arrow coming out of each
X-node.
{( x, f ( x )) | x ∈ X } .
Therefore, we can translate rigorous functions into provisional ones and vice
versa. We thus shall think of the two concepts as being the same (i.e., we
will regard the rigorous concept as a clarification of the provisional one). In
particular, all the notations we have introduced for provisional functions will
be used for rigorous ones.
$f_0$ : {1, 2, 3, 4} → {1, 2, 3, 4}
1 R 3, 2 R 2, 3 R 3, 4 R 3
$f_1$ : {1, 2, 3, 4} → {1, 2, 3} ,
n ↦ n ?
Such a function $f_1$ does not exist, since it would have to send 4 to 4, but 4 is not in the target {1, 2, 3}.
This is a pedantic issue, but it should be kept in mind: Not every expression that appears to define a function actually defines a function. Make sure that the expression to the right of the “↦” symbol always is an actual element of the target (which, in this case, is the set {1, 2, 3}).
$f_2$ : {1, 2, 3, . . .} → {1, 2, 3, . . .} ,
n ↦ (the number of positive divisors of n) .
As a relation, it is
(We cannot list all the pairs, since there are infinitely many.) Thus, $f_2(1) = 1$ and $f_2(2) = 2$ and $f_2(3) = 2$ and so on.
$\tilde{f}_2$ : Z → {1, 2, 3, . . .} ,
n ↦ (the number of positive divisors of n) ?
There is no such function $\tilde{f}_2$, since $\tilde{f}_2(0)$ would have to be undefined or ∞ (because 0 has infinitely many positive divisors).
This is the exact same problem that we had with the non-function $f_1$ above.
$f_3$ : {1, 2, 3, . . .} → {1, 2, 3, . . .} ,
n ↦ (the smallest prime divisor of n) ?
Again, there is no such function $f_3$, since $f_3(1)$ makes no sense (indeed, the number 1 has no prime divisors, thus no smallest prime divisor).
This is essentially the same problem as with the function $\tilde{f}_2$ from the previous example, except that this time the value $f_3(1)$ is really undefined (as opposed to just failing to belong to the target).
Note that the function $f_3$ “almost” exists: There is a relation “y is the smallest prime divisor of x” from {1, 2, 3, . . .} to {1, 2, 3, . . .}, but this relation fails the output uniqueness requirement at x = 1, and thus is not a function. However, we can make it into a function by removing the offending element 1 from its domain. That is, there is a function
$f_4$ : Q → Z,
$\frac{a}{b}$ ↦ a (for a, b ∈ Z with b ≠ 0) ?
Restated in words, this is to be a function that takes a rational number as
input, writes it as a ratio of two integers and outputs the numerator. Is there
such a function?
Again, the answer is no. Again, the problem is a failure of output uniqueness, but this time, it fails not because the output does not exist (or does not belong to the target), but rather because the output is non-unique. For example, if $f_4$ was a function, then we would have the two equalities
\[ f_4(0.5) = f_4\left(\frac{1}{2}\right) = 1 \qquad \text{and} \qquad f_4(0.5) = f_4\left(\frac{3}{6}\right) = 3, \]
which contradict one another. The underlying issue is that a rational number can be written as a fraction in several different ways, and the numerators of these fractions will usually not be the same. Thus, if you follow the rule $\frac{a}{b} \mapsto a$ to compute the output of $f_4$ for a given input, your output will depend on how exactly you write your input as a fraction, and this is a violation of output uniqueness.
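The failure of $f_4$ can be made tangible in Python. In the sketch below, `naive_numerator` is a hypothetical helper that follows the rule "a/b ↦ a" literally on a pair (a, b), while `fractions.Fraction` represents the rational number itself. The two pairs (1, 2) and (3, 6) describe the same rational number but yield different outputs.

```python
from fractions import Fraction

def naive_numerator(a, b):
    """Follow the rule a/b -> a literally on the pair (a, b)."""
    return a

same_number = Fraction(1, 2) == Fraction(3, 6)
print(same_number)                                   # True
print(naive_numerator(1, 2), naive_numerator(3, 6))  # 1 3
```

Python's `Fraction` sidesteps the ambiguity by always reducing to lowest terms, so `Fraction(3, 6).numerator` is 1; but that describes a different rule (“numerator of the reduced fraction”), not the rule that was supposed to define $f_4$.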
5.5. Well-definedness
The issues that we have seen in the last few examples (supposed functions
failing to exist either because their output values make no sense, or because
these values don’t lie in Y, or because these values are ambiguous) are known
as well-definedness issues. Often, mathematicians say that “a function is well-
defined” when they mean that its definition does not suffer from such issues
(i.e., its definition really defines a function). So you should read “This function
is well-defined [or: not well-defined]” as “The definition we just gave really
defines a function [or: does not actually define a function]”.
For example, as we just saw, the function
$f_4$ : Q → Z,
$\frac{a}{b}$ ↦ a
is not well-defined (i.e., there is no such function), but the function
$f_5$ : Q → Q,
$\frac{a}{b}$ ↦ $\frac{a^2}{b^2}$
is well-defined (because if you write a given rational number as $\frac{a}{b}$ for different pairs (a, b), the resulting quotients $\frac{a^2}{b^2}$ will all be equal). The function
$f_1$ : {1, 2, 3, 4} → {1, 2, 3} ,
n ↦ n
is not well-defined (since its supposed output $f_1(4)$ fails to lie in the target {1, 2, 3}), whereas the function
$f_6$ : {1, 2, 3, 4} → {1, 2, 3} ,
n ↦ 1 + (n%3)
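These well-definedness checks can be imitated on small data. The sketch below is an illustration under stated assumptions: `f5_from_pair` is a hypothetical helper that applies the rule for $f_5$ to a specific pair (a, b), and the three pairs are sample representations of the same rational number 1/2. For $f_1$ and $f_6$, the code simply tests whether every output lands in the target {1, 2, 3}.

```python
from fractions import Fraction

def f5_from_pair(a, b):
    """Apply the rule a/b -> a^2/b^2 to one representation (a, b)."""
    return Fraction(a * a, b * b)

# Three ways of writing 1/2 (sample data) all give the same output.
representations = [(1, 2), (3, 6), (-5, -10)]
f5_values = {f5_from_pair(a, b) for a, b in representations}
print(f5_values == {Fraction(1, 4)})  # True

target = {1, 2, 3}
f1_outputs = {n: n for n in (1, 2, 3, 4)}
f6_outputs = {n: 1 + n % 3 for n in (1, 2, 3, 4)}

print(all(v in target for v in f1_outputs.values()))  # False (4 is not in the target)
print(all(v in target for v in f6_outputs.values()))  # True
```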
When the domain of a function f is a Cartesian product of several sets (i.e., its
inputs are tuples), f is called a multivariate function. For instance, the function
f : Z × Z → Z,
(a, b) ↦ a + b
(which sends each pair ( a, b) of two integers to their sum a + b) is a multivariate
function. Its input is a pair of two integers, i.e., it really has two inputs (a and
b). As a relation, it is the subset
{(( a, b) , a + b) | a, b ∈ Z}
= {(( a, b) , c) | a, b, c ∈ Z such that c = a + b}
of (Z × Z) × Z. Of course, this function has a name: It is the addition of
integers. Other multivariate functions are
Z × Z → Z,
(a, b) ↦ a − b
(known as the subtraction of integers) and
Z × Z → Z,
(a, b) ↦ ab
Z × Z → Z,
(a, b) ↦ a/b,
since a/b is not always an integer (and does not even exist when b = 0).
When f is a multivariate function whose inputs are k-tuples, we commonly
use the shorthand notation f ( a1 , a2 , . . . , ak ) for its values f (( a1 , a2 , . . . , ak )).
(That is, we commonly omit the outer pair of parentheses.) For instance, if
f is the addition of integers, then f ( a, b) = f (( a, b)) = a + b for all a, b ∈ Z.
X → Z,
x ↦ f (g (x)) .
In other words, f ◦ g is the function that first applies g and then applies f .
This function f ◦ g is called the composition of f with g (and I pronounce it
“ f after g”).
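f : R → R,
x ↦ $x^3$
and
g : R → R,
x ↦ $\frac{1}{x^2 + 7}$.
Then,
\[ (f \circ g)(x) = f(g(x)) = f\left(\frac{1}{x^2 + 7}\right) = \left(\frac{1}{x^2 + 7}\right)^3 = \frac{1}{(x^2 + 7)^3}, \]
whereas
\[ (g \circ f)(x) = g(f(x)) = g\left(x^3\right) = \frac{1}{(x^3)^2 + 7} = \frac{1}{x^6 + 7} . \]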
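The difference between f ◦ g and g ◦ f can be seen numerically. The sketch below restricts attention to rational inputs and uses `fractions.Fraction` for exact arithmetic (an assumption of this example; the functions themselves are defined on all of R). The helper `compose` is a name chosen here.

```python
from fractions import Fraction

def f(x):
    return x**3

def g(x):
    return 1 / (x**2 + 7)

def compose(outer, inner):
    """Return the function 'outer after inner'."""
    return lambda x: outer(inner(x))

f_after_g = compose(f, g)
g_after_f = compose(g, f)

x = Fraction(2)
print(f_after_g(x))  # 1/1331, i.e. 1 / (2^2 + 7)^3
print(g_after_f(x))  # 1/71,   i.e. 1 / (2^6 + 7)
```

So composition is sensitive to the order of the two functions, even when both orders make sense.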
i       1   2   3
f(i)    1   3   2

i       1   2   3   4
g(i)    2   1   3   2
These two functions can be visualized using blobs and arrows, and we can
even reuse the target-blob from g as the domain-blob for f :
[Figure: blobs-and-arrows diagrams of g and f drawn side by side, with the target-blob of g reused as the domain-blob of f.]
[Figure: blobs-and-arrows diagram of f ◦ g, drawn from {1, 2, 3, 4} to {1, 2, 3, 4}.]
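For functions given by tables of values, composing amounts to chaining lookups. Here is a minimal sketch using Python dictionaries for the tables of f and g above (the dictionary representation is a choice made here):

```python
# Tables of values of f and g, stored as dictionaries.
f = {1: 1, 2: 3, 3: 2}
g = {1: 2, 2: 1, 3: 3, 4: 2}

# (f o g)(i) = f(g(i)): first look up i in g, then the result in f.
f_after_g = {i: f[g[i]] for i in g}
print(f_after_g)  # {1: 3, 2: 1, 3: 2, 4: 3}
```

This corresponds to following an arrow of g and then an arrow of f in the side-by-side picture.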
Exercise 5.8.1. For any positive integer d, let us define the function
$r_d$ : Z → Z,
n ↦ n%d
(which sends each integer n to the remainder of the division of n by d). For example, $r_5(18) = 18\%5 = 3$ and $r_6(18) = 18\%6 = 0$.
(a) Make a table of the values of the function $r_2 \circ r_3$ on the inputs 0, 1, 2, 3, 4, 5.
(b) Prove that $r_2 \circ r_3 \neq r_2$.
(c) Let d and e be two positive integers such that d | e. Prove that $r_d \circ r_e = r_d$.
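Before attempting the proofs, it can help to experiment. The sketch below builds $r_d$ as a Python function (Python's `%` returns a remainder in {0, 1, . . . , d − 1} for positive d, also for negative n) and tabulates $r_2 \circ r_3$ next to $r_2$. The numerical check for part (c) covers only finitely many cases and is no substitute for a proof.

```python
def r(d):
    """Return the function r_d that sends n to n % d."""
    return lambda n: n % d

# Rows (n, (r_2 o r_3)(n), r_2(n)) for n = 0, ..., 5.
table = [(n, r(2)(r(3)(n)), r(2)(n)) for n in range(6)]
for row in table:
    print(row)

# Part (c), tested on a finite range only: if d | e, then r_d o r_e = r_d.
part_c_holds = all(
    r(d)(r(e)(n)) == r(d)(n)
    for d in range(1, 8)
    for e in range(d, 40, d)   # multiples of d
    for n in range(-50, 50)
)
print(part_c_holds)  # True
```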
( f ◦ g) ◦ h = f ◦ ( g ◦ h) .
so that
( f ◦ ( g ◦ h)) ( x ) = f ( g (h ( x ))) = (( f ◦ g) ◦ h) ( x ) .
Since this holds for each x ∈ X, we conclude that f ◦ ( g ◦ h) = ( f ◦ g) ◦ h
(because two functions u and v from X to W are equal if and only if the equality
u ( x ) = v ( x ) holds for each x ∈ X). This proves the theorem.
Intuitively, the claim of Theorem 5.8.4 is pretty obvious: It is just saying that if
you can do three things (applying h, applying g and applying f ) in succession,
then it does not matter whether you view it as “first doing h followed by g, and
then doing f ” or as “first doing h, and then doing g followed by f ”.
Thanks to Theorem 5.8.4, we can write compositions of several functions
without parentheses: i.e., instead of writing f ◦ ( g ◦ h) or ( f ◦ g) ◦ h, we can just
write f ◦ g ◦ h.
The following property of composition of functions is even easier. We recall that $\mathrm{id}_P$ means the identity map on a given set P; this is the map from P to P that sends each element p ∈ P to itself.
\[ f \circ \mathrm{id}_X = \mathrm{id}_Y \circ f = f . \]
\[ (f \circ \mathrm{id}_X)(x) = f(\mathrm{id}_X(x)) = f(x) \qquad (\text{since the definition of } \mathrm{id}_X \text{ yields } \mathrm{id}_X(x) = x) . \]
This shows that $f \circ \mathrm{id}_X = f$ (since both $f \circ \mathrm{id}_X$ and f are functions from X to Y). A similar computation yields $\mathrm{id}_Y \circ f = f$. Thus, the theorem follows.
Thanks to Theorem 5.8.5, we can remove identity maps from compositions: e.g., the composition $f \circ g \circ \mathrm{id}_P \circ h$ (where P is the target of h and the domain of g) can be simplified to f ◦ g ◦ h.
i         1   2   3   4
s_1(i)    2   1   3   4
s_2(i)    1   3   2   4
s_3(i)    1   2   4   3
(That is, each $s_i$ is the function that transforms the two numbers i and i + 1 into one another while leaving all other inputs unchanged.)
In other words: We say that f is injective if there are no two distinct elements
x1 , x2 ∈ X such that f ( x1 ) = f ( x2 ).
In other words: We say that f is injective if any two elements x1 , x2 ∈ X
satisfying f ( x1 ) = f ( x2 ) must also satisfy x1 = x2 .
(b) We say that f is surjective (aka onto, aka a surjection) if
• The function
f : N → N,
k ↦ $k^2$
• Let S = {0, 1, 4, 9, 16, . . .} be the set of all perfect squares (i.e., all squares of integers). Then, the function
g : N → S,
k ↦ $k^2$
is injective (for the same reason as the f in the previous example) and also surjective (since every perfect square can be written as $k^2$ for some k ∈ N). Thus, it is bijective.
Take note: The functions f and g differ only in their choice of target! Other
than that, they are indistinguishable (both have domain N, and send each
element of this domain to its square). But of course, this little difference
matters for the surjectivity, since the surjectivity depends crucially on the
target. No wonder that g is surjective while f is not.
• Let S = {0, 1, 4, 9, 16, . . .} be the set of all perfect squares again. Consider the function
$g_{\mathbb{Z}}$ : Z → S,
k ↦ $k^2$,
which differs from g only in its domain (it allows all integers rather than only nonnegative integers as inputs). This function $g_{\mathbb{Z}}$ is not injective (since $g_{\mathbb{Z}}(1) = g_{\mathbb{Z}}(-1)$), but is still surjective (since each perfect square can be written as $k^2$ for some k ∈ Z). Since it is not injective, it cannot be bijective.
• The function
h : N → N,
k 7→ k//2
(recall that k//2 is the quotient of the division of k by 2) is not injective (for
example, the two distinct elements 0, 1 ∈ N satisfy h (0) = h (1), because
both h (0) = 0//2 and h (1) = 1//2 are 0), but is surjective (because for
each y ∈ N, there exists an x ∈ N such that h ( x ) = y, namely for example
x = 2y). Hence, it is not bijective.
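The two observations about h translate directly into Python, whose `//` operator agrees with the quotient notation k//2 for nonnegative k. The check of surjectivity below only covers the finitely many targets y = 0, 1, . . . , 999 (an assumption of this sketch, since N is infinite).

```python
def h(k):
    """Quotient of the division of k by 2."""
    return k // 2

# Not injective: two distinct inputs with the same output.
print(h(0), h(1))  # 0 0

# Surjective (checked on a finite range): x = 2y is sent to y.
hits_every_target = all(h(2 * y) == y for y in range(1000))
print(hits_every_target)  # True
```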
x       a      b      c      d      ···
f(x)    f(a)   f(b)   f(c)   f(d)   ···
Then:
(a) The function f is injective if and only if the bottom row of this table has
no two equal entries.
(b) The function f is surjective if and only if every element of Y appears in
the bottom row.
(c) The function f is bijective if and only if every element of Y appears
exactly once in the bottom row.
For example:
• The function
f : {1, 2, 3} → {7, 8, 9} ,
k ↦ k + 6
k       1   2   3
f(k)    7   8   9
(by noticing that every element of {7, 8, 9} appears exactly once in the
bottom row of this table). Of course, this can also be shown logically (by
arguing that f is injective and surjective because adding 6 can be undone
by subtracting 6).
• The function58
f : {4, 6, 7} → {0, 1, 2} ,
k ↦ k%3
k       4   6   7
f(k)    1   0   1
has the element 1 appear twice in the bottom row (so f is not injective)
and does not have the element 2 in its bottom row (so f is not surjective).
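The table criteria can be turned into short tests for functions with finite domain. In this sketch, a function is stored as a dictionary (its table of values), and the target Y is passed separately, since surjectivity depends on the choice of target. The helper names are chosen here.

```python
def is_injective(f):
    """No two equal entries in the bottom row of the table."""
    values = list(f.values())
    return len(values) == len(set(values))

def is_surjective(f, Y):
    """Every element of the target Y appears in the bottom row."""
    return set(Y) <= set(f.values())

def is_bijective(f, Y):
    return is_injective(f) and is_surjective(f, Y)

add6 = {k: k + 6 for k in (1, 2, 3)}   # target {7, 8, 9}
mod3 = {k: k % 3 for k in (4, 6, 7)}   # target {0, 1, 2}

print(is_bijective(add6, {7, 8, 9}))                         # True
print(is_injective(mod3), is_surjective(mod3, {0, 1, 2}))    # False False
```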
• the function f is injective if and only if no two arrows hit the same
Y-node;
• the function f is surjective if and only if every node in the Y-blob gets
hit by at least one arrow;
• the function f is bijective if and only if every node in the Y-blob gets
hit by exactly one arrow.
[Figure: four blobs-and-arrows diagrams of maps from X to Y. One is captioned “neither injective nor surjective (since 1 ∈ Y is not hit, while 2 ∈ Y is hit twice)”; another is captioned “injective but not surjective (since 2 ∈ Y is not hit)”.]
f : Z → Z,
x ↦ 3 − 2x.
f : N → N,
x ↦ x!.
f : Z × Z → Z × Z,
(x, y) ↦ (x + y, x − y) .
f : Z × Z → Z × Z,
(x, y) ↦ (x − y, y − x) .
f : Z × Z → Z × Z,
(x, y) ↦ (x + 2y, x + y) .
f : Z × Z → Z × Z,
(x, y) ↦ (x + 2y, 2x + y) .
5.10. Inverses
5.10.1. Definition and examples
Bijective maps have a special power: They can be inverted. Here is what this
means:
Roughly speaking, an inverse of f thus means a map that both undoes f and
is undone by f .
Not every function has an inverse. We shall soon see which ones do and
which ones don’t; we will also prove that an inverse of f is unique if it exists.
For now, however, let us explore a few examples:
• Let f : {1, 2, 3} → {7, 8, 9} be the “add 6” function – i.e., the function that
sends each x ∈ {1, 2, 3} to x + 6 ∈ {7, 8, 9}. Then, f has an inverse: the
“subtract 6” function (i.e., the function from {7, 8, 9} to {1, 2, 3} that sends
each y to y − 6). Indeed, if we denote the “subtract 6” function by g, then
we have
• Let f : {1, 2, 3} → {7, 8, 9} be the “subtract from 10” function – i.e., the function that sends each x ∈ {1, 2, 3} to 10 − x ∈ {7, 8, 9}. Then, f has an inverse: the “subtract from 10” function from {7, 8, 9} to {1, 2, 3} (i.e., the function that sends each y to 10 − y). This is because 10 − (10 − n) = n for each n ∈ Z.
side by side:
[Figure: blobs-and-arrows diagrams of f (from X to Y) and of g (from Y to X), each blob containing the nodes 1, 2, 3, 4, 5.]
As you see, there is a “dual” relationship between these two diagrams:
Whenever the diagram of f has an arrow from some x ∈ X to some
y ∈ Y, the diagram of g has an arrow from y to x. In other words, the
diagram of g can be obtained from the diagram of f by swapping the X-
blob with the Y-blob and reversing the direction of each arrow. This rule
applies not just to our specific two maps f and g, but to any map f that
has an inverse. Thus, if you have drawn a blobs-and-arrows diagram of
a function f , it is fairly easy to construct its inverse (as long as such an
inverse exists).
This rule can also be restated in terms of tables of values: If you have a
table of all values of a function f : X → Y, then you can get an inverse of f
by swapping the two rows of this table. For instance, if f : {1, 2, 3, 4, 5} →
{1, 2, 3, 4, 5} is the function we just showed, then f has the table of values
k       1   2   3   4   5
f(k)    3   5   1   2   4
and thus you can get its inverse g by swapping the two rows:
k       3   5   1   2   4
g(k)    1   2   3   4   5
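Swapping the two rows of a table is a one-liner when the table is stored as a Python dictionary. The sketch below inverts the table of f above and checks both defining properties of an inverse (g undoes f, and f undoes g).

```python
# Table of values of f, stored as a dictionary.
f = {1: 3, 2: 5, 3: 1, 4: 2, 5: 4}

# Swap the two rows: each pair (k, f(k)) becomes (f(k), k).
g = {value: key for key, value in f.items()}
print(g)  # {3: 1, 5: 2, 1: 3, 2: 4, 4: 5}

g_undoes_f = all(g[f[x]] == x for x in f)
f_undoes_g = all(f[g[y]] == y for y in g)
print(g_undoes_f, f_undoes_g)  # True True
```

If f were not injective, the dictionary comprehension would silently overwrite entries, and the first check would fail; this mirrors the fact that only bijective maps have inverses.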
which is absurd.
The same argument shows that more generally, if a function f : X → Y
is to have an inverse, then f should be injective, because two distinct
elements x1 and x2 of X satisfying f ( x1 ) = f ( x2 ) would create a contra-
diction via x1 = g ( f ( x1 )) = g ( f ( x2 )) = x2 .
59 We have already done this in the above examples, but we repeat it for the sake of completeness.
Theorem 5.10.2 says that bijective maps are the same as invertible maps (i.e.,
maps that have an inverse). This is a fundamental result that is used all over
mathematics.
g1 ◦ ( f ◦ g2 ) = g1 ◦ idY = g1      (since f ◦ g2 = idY )
with
( g1 ◦ f ) ◦ g2 = idX ◦ g2 = g2      (since g1 ◦ f = idX ),
we find g1 = g2 , qed.
that is,
These equalities should explain why the notation f^{−1} was chosen for the inverse
of f .
f : E → N,
k ↦ k/2.
f^{−1} : N → E,
k ↦ 2k.
f : R≥0 → R≥0 ,
x ↦ x^2
f^{−1} : R≥0 → R≥0 ,
x ↦ √x.
f : R → R,
x ↦ x^2
has no inverse. In fact, this function is not injective (since f (1) = f (−1))
and not surjective (since −1 is not a square of a real number), so it is
certainly not bijective, and thus not invertible.
• The function
f : R → R,
x ↦ x^3
has an inverse. This inverse is the function
f^{−1} : R → R,
x ↦ ∛x.
f : Z × Z → Z × Z,
( x, y) ↦ ( x + 3y, 2x + 5y)
has an inverse. Give an explicit formula for this inverse (i.e., for f^{−1} ((u, v))).
[Hint: This is a linear algebra question, since f^{−1} ((u, v)) should be a pair
( x, y) ∈ Z × Z satisfying ( x + 3y, 2x + 5y) = (u, v).]
Proof. The map idX is an inverse of itself (since idX ◦ idX = idX and idX ◦ idX =
idX ). Hence, it has an inverse, and thus is bijective (by Theorem 5.10.2).
( f ◦ g)^{−1} = g^{−1} ◦ f^{−1} .
Proof. This is obvious from the blobs-and-arrows picture; but let us check this
rigorously.
For any x ∈ X, we have
( g^{−1} ◦ f^{−1} ) (( f ◦ g) ( x )) = g^{−1} ( f^{−1} ( f ( g ( x )))) = g^{−1} ( g ( x )) = x
(since f^{−1} ( f ( g ( x ))) = g ( x )).
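For readers who like to experiment, the formula for the inverse of a composition can be checked on small finite sets. The following Python sketch is only an illustration: the maps f and g below are hypothetical examples (not taken from these notes), stored as dictionaries, and the helper names compose and inverse are our own.

```python
def compose(f, g):
    """Return f ∘ g (first apply g, then f) for maps stored as dictionaries."""
    return {x: f[g[x]] for x in g}

def inverse(f):
    """Return the inverse of a bijective map stored as a dictionary."""
    inv = {y: x for x, y in f.items()}
    # If two inputs shared a value, one of them would have been overwritten.
    assert len(inv) == len(f), "the map is not injective"
    return inv

# Two hypothetical bijections g : X -> Y and f : Y -> Z.
g = {1: "a", 2: "b", 3: "c"}
f = {"a": 30, "b": 10, "c": 20}

left = inverse(compose(f, g))             # the inverse of f ∘ g
right = compose(inverse(g), inverse(f))   # g's inverse composed with f's inverse
assert left == right
print(left)  # {30: 1, 10: 2, 20: 3}
```

Note the reversed order on the right hand side: to undo "first g, then f", we first undo f and then undo g.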
[Figure: blobs-and-arrows diagram of two maps g : {1} → {1, 2} and f : {1, 2} → {1}.]
We know that f is injective. In other words, for any v1 , v2 ∈ Y satisfying f (v1 ) = f (v2 ),
we have v1 = v2 (by our definition of “injective”). Applying this to v1 = g (u1 ) and v2 =
g (u2 ), we obtain g (u1 ) = g (u2 ) (since f ( g (u1 )) = f ( g (u2 ))).
Forget that we fixed z. We thus have shown that for any z ∈ Z, there exists
some y ∈ Y such that f (y) = z. In other words, the map f is surjective (by our
definition of “surjective”). This completes our proof.]
(f) This is false.
[Counterexample: For instance, we can set X = {1} and Y = {1, 2} and Z =
{1}, and let f : Y → Z be the map that sends both elements of Y to 1, while
g : X → Y is the map sending 1 to 1. Then, f ◦ g is surjective (in fact, f ◦ g is
the identity map id{1} ), but g is not.]
(g) This is true.
[Proof: This is part of Theorem 5.10.7. But let us give a different proof as
well: Assume that f and g are bijective. Thus, f and g are both injective and
surjective. Hence, f ◦ g is injective (by Exercise 5.11.1 (a)) and surjective (by
Exercise 5.11.1 (d)). Thus, f ◦ g is bijective.]
(h) This is false.
[Counterexample: For instance, we can set X = {1} and Y = {1, 2} and Z =
{1}, and let f : Y → Z be the map that sends both elements of Y to 1, while
g : X → Y is the map sending 1 to 1. Then, f ◦ g is bijective (in fact, f ◦ g is the
identity map id{1} ), but f is not.]
(i) This is false.
[Counterexample: For instance, we can set X = {1} and Y = {1, 2} and Z =
{1}, and let f : Y → Z be the map that sends both elements of Y to 1, while
g : X → Y is the map sending 1 to 1. Then, f ◦ g is bijective (in fact, f ◦ g is the
identity map id{1} ), but g is not.]
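The counterexample used in parts (f), (h) and (i) is small enough to verify mechanically. Here is a minimal Python sketch, assuming that maps between finite sets are stored as dictionaries; the helper names compose, is_injective and is_surjective are our own.

```python
def compose(f, g):
    """Return f ∘ g (first apply g, then f)."""
    return {x: f[g[x]] for x in g}

def is_injective(f):
    # No value is taken twice.
    return len(set(f.values())) == len(f)

def is_surjective(f, codomain):
    # Every element of the codomain is taken as a value.
    return set(f.values()) == set(codomain)

X, Y, Z = {1}, {1, 2}, {1}
g = {1: 1}          # g : X -> Y sends 1 to 1
f = {1: 1, 2: 1}    # f : Y -> Z sends both elements of Y to 1
fg = compose(f, g)  # f ∘ g : X -> Z

assert is_injective(fg) and is_surjective(fg, Z)  # f ∘ g is bijective
assert not is_injective(f)                        # but f is not injective
assert not is_surjective(g, Y)                    # and g is not surjective
```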
Solution. We shall prove the “=⇒” and “⇐=” parts of this equivalence sepa-
rately:
=⇒: If we have f ( x ) = y, then
f^{−1} ( y) = f^{−1} ( f ( x )) = x      (since y = f ( x ), and by (51)) .
Exercise 5.11.3. Let A and B be two sets. As we know from Exercise 4.4.3,
the two sets A × B and B × A are usually not the same. However, I claim that
there is a bijective map from A × B to B × A. Prove this (by finding one such
map, and showing that it is bijective).
f : A × B → B × A,
( a, b) ↦ (b, a) .
This is the map that sends each pair ( a, b) ∈ A × B to the pair (b, a) ∈ B × A; in
other words, it swaps the two entries of the input pair. Likewise, consider the
map
g : B × A → A × B,
(b, a) ↦ ( a, b)
(which does the same as f , but does it to a pair in B × A instead of a pair in
A × B). Let us show that these two maps f and g are mutually inverse.
Indeed, in order to show this, we must check that f ◦ g = idB× A and g ◦ f =
id A× B .
Let us check that f ◦ g = idB× A . This means checking that ( f ◦ g) (y) =
idB× A (y) for each y ∈ B × A. So let y ∈ B × A be arbitrary. Thus, y is a pair
(b, a) with b ∈ B and a ∈ A. Consider these b and a. Hence, y = (b, a), so that
g (y) = g ((b, a)) = ( a, b) (by the definition of g). By the definition of f ◦ g, we
have
( f ◦ g) (y) = f ( g (y)) = f (( a, b)) = (b, a)
(by the definition of f ). Comparing this with idB× A (y) = y = (b, a), we obtain
( f ◦ g) (y) = idB× A (y).
Forget that we fixed y. We thus have shown that ( f ◦ g) (y) = idB× A (y) for
each y ∈ B × A. Thus, we have proved the equality f ◦ g = idB× A . Similarly we
can show the equality g ◦ f = id A× B (since the maps f and g are constructed
in the same way, just with the roles of A and B switched). These two equalities
(together) show that the map g is an inverse of f . Hence, the map f has an
inverse, and thus is bijective (by Theorem 5.10.2). Thus, there exists a bijective
map from A × B to B × A (namely, f ).
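The swapping map can also be tried out on concrete finite sets. The following Python sketch is an illustration only: the sets A and B are arbitrary examples chosen here, and the single function swap plays the roles of both f and g (since both maps act by the same rule).

```python
from itertools import product

def swap(pair):
    """Send a pair (a, b) to the pair (b, a)."""
    a, b = pair
    return (b, a)

A = {1, 2, 3}
B = {"x", "y"}
AxB = set(product(A, B))   # the Cartesian product A × B
BxA = set(product(B, A))   # the Cartesian product B × A

assert AxB != BxA                              # the two products differ ...
assert {swap(p) for p in AxB} == BxA           # ... but swap maps A × B onto B × A
assert all(swap(swap(p)) == p for p in AxB)    # g ∘ f is the identity on A × B
assert all(swap(swap(q)) == q for q in BxA)    # f ∘ g is the identity on B × A
assert len(AxB) == len(BxA) == len(A) * len(B)
```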
[Figure: two blobs-and-arrows diagrams on the sets X and Y.]
f : Q → Q,
x ↦ x / (1 + x^2) .
(b) The function
f : Z → Q,
x ↦ x / (1 + x^2) .
(c) The function
f : Z × Z → Z,
( x, y) ↦ 2x + 3y.
f : N × N → N,
( x, y) ↦ 2x + 3y.
Definition 5.12.1. Let X and Y be two sets. We say that these two sets X
and Y are isomorphic as sets (or, for short, isomorphic, or in bijection, or
in one-to-one correspondence, or equinumerous) if there exists a bijective
map from X to Y.
Some examples:
• The sets {1, 2} and {1, 2, 3} are not isomorphic. In fact, there is no sur-
jective map f : {1, 2} → {1, 2, 3} (since, informally, a map from {1, 2} to
{1, 2, 3} has only two arrows, but two arrows cannot hit all three elements
of {1, 2, 3}). Thus, there is no bijective map f : {1, 2} → {1, 2, 3} either.
• The sets {1, 2, 3} and {6, 7, 8} are isomorphic. In fact, the map
{1, 2, 3} → {6, 7, 8} ,
k ↦ k + 5
(that is, the “add 5” map) is bijective (and its inverse sends k ↦ k − 5).
• The sets {1, 2, 3} and {3, 8, 19} are isomorphic. In fact, the map f :
{1, 2, 3} → {3, 8, 19} with the table of values
    x       1  2  3
    f (x)   3  8  19
is bijective.
• The sets {1, 2, 3} and {1, 3, 5} are isomorphic. In fact, the map
{1, 2, 3} → {1, 3, 5} ,
k ↦ 2k − 1
is a bijection.
N → E,
n ↦ 2n
is a bijection.
• The sets N and O := {all odd nonnegative integers} are isomorphic, since
the map
N → O,
n ↦ 2n + 1
is a bijection.
(nice and not-too-easy exercise: prove this!). This is the so-called Cantor
pairing function.
• The sets N and R are not isomorphic, i.e., there exists no bijection from
N to R. Informally speaking, this is because there are “a lot more” real
numbers than there are nonnegative integers. This is not a proof at all (af-
ter all, N and Q are isomorphic, despite the rational numbers seemingly
outnumbering the nonnegative integers!); an actual proof can be found
(e.g.) in [Newste23, Theorem 6.2.21] or in [LeLeMe16, Corollary 8.1.17].
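The finite examples in this list can be double-checked by brute force. Below is a minimal Python sketch; the helper is_bijection is our own name, and it only makes sense for finite sets (for infinite sets such as N and E, a computer can at best test finitely many values, which proves nothing).

```python
from itertools import product

def is_bijection(rule, domain, codomain):
    """Check whether `rule` defines a bijective map from `domain` to `codomain`."""
    images = [rule(x) for x in domain]
    return (all(y in codomain for y in images)   # the map lands in the codomain
            and len(set(images)) == len(images)  # injective
            and set(images) == set(codomain))    # surjective

assert is_bijection(lambda k: k + 5, {1, 2, 3}, {6, 7, 8})
assert is_bijection(lambda k: 2 * k - 1, {1, 2, 3}, {1, 3, 5})
assert is_bijection({1: 3, 2: 8, 3: 19}.get, {1, 2, 3}, {3, 8, 19})

# There are 3 * 3 = 9 maps from {1, 2} to {1, 2, 3}; none of them is bijective.
all_maps = [dict(zip((1, 2), values)) for values in product((1, 2, 3), repeat=2)]
assert len(all_maps) == 9
assert not any(is_bijection(m.get, {1, 2}, {1, 2, 3}) for m in all_maps)
```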
6. Enumeration revisited
6.1. Counting, formally
6.1.1. Definition
As you might have noticed, isomorphic sets (at least when they are finite) have
the same number of elements – i.e., the same size. We shall now use this to
define the size of a set!
First, some notations:
Definition 6.1.1. (a) If n ∈ N, then [n] shall mean the set {1, 2, . . . , n}.
For example, [3] = {1, 2, 3} and [7] = {1, 2, 3, 4, 5, 6, 7} and [0] = ∅ and
[1] = {1}.
(b) If a, b ∈ Z, then [ a, b] shall mean the set
For example:
• The set {“cat”, “dog”, “rat”} has size 3, since the map
is a bijection.
• The set {4, 5, 6, 7} has size 4, since the map
{4, 5, 6, 7} → [4] ,
k ↦ k − 3
is a bijection.
• The set N is infinite, so there is no bijection from N to [n] for any n ∈ N.
Thus, N does not have size n for any n ∈ N.
In other words, a set has size n (for n > 0) if and only if we can remove
a single element from it and obtain a set of size n − 1. This is a recursive
definition, as it reduces the question “what is a set of size n” to the (simpler)
question “what is a set of size n − 1”.
The following fact is not obvious, but can be proved:
Theorem 6.1.4. (a) The above two definitions of size (Definition 6.1.2 and
Definition 6.1.3) are equivalent.
(b) The size of a finite set is determined uniquely – i.e., a set cannot have
two different sizes at the same time.
Definition 6.1.5. (a) An n-element set (for some n ∈ N) means a set of size
n.
(b) A set is said to be finite if it has size n for some n ∈ N.
(c) If S is a finite set, then |S| shall denote the size of S (which is unique
because of Theorem 6.1.4 (b)).
(d) We also refer to |S| as the cardinality of S, or as the number of elements
of S. In particular, the number of some things means the size of the set of
these things.
The number of odd integers between 4 and 10 is the size of the set
Theorem 6.1.6 (Bijection Principle). Let A and B be two finite sets. Then,
| A| = | B| if and only if there exists a bijection from A to B.
Here, we recall that [n] means the set {1, 2, . . . , n} consisting of the first n
positive integers.
The next rule classifies sets of small size:
|S ∪ {t}| = |S| + 1.
Theorem 6.1.10 (Sum rule for two sets). Let A and B be two disjoint finite
sets. (Recall that “disjoint” means A ∩ B = ∅.) Then, the set A ∪ B is again
finite, and has size
| A ∪ B| = | A| + | B| .
| A1 ∪ A2 ∪ · · · ∪ A k | = | A1 | + | A2 | + · · · + | A k | .
The following theorem has been previously stated (without using the “size”
terminology) as Theorem 4.4.8:
Theorem 6.1.13 (Product rule for two sets). Let A and B be any finite sets.
Then, the set A × B is again finite, and has size
| A × B| = | A| · | B| .
| A1 × A2 × · · · × A k | = | A1 | · | A2 | · · · · · | A k | .
All the above theorems are foundational, and are perhaps the reason why the
arithmetic operations +, − and · on nonnegative integers have been introduced
some millennia ago. They can nevertheless be proved rigorously, but we will
not do so here.62
62 For instance, Theorem 6.1.13 can be proved by induction on | B| using Theorem 6.1.11 and
Theorem 6.1.6, whereas Theorem 6.1.14 can be proved by induction on k using Theorem
6.1.13.
The above theorems are known as “basic counting rules” or “counting prin-
ciples”. There are a few more counting principles, which we might state later
on.
Theorem 6.1.15. Let A and B be two finite sets (not necessarily disjoint).
Then, the set A ∪ B is finite and has size
| A ∪ B| = | A| + | B| − | A ∩ B| .
Partial proof. We shall take for granted that A ∪ B is finite, and only prove the
equality | A ∪ B| = | A| + | B| − | A ∩ B| here.
We first claim that
( A ∪ B) \ A = B \ ( A ∩ B) . (53)
This equality is obvious using Venn diagrams, but let us prove it rigorously
using “element chasing”:63
Proof of (53). Let us first prove ( A ∪ B) \ A ⊆ B \ ( A ∩ B). In order to do so, we
must show that each x ∈ ( A ∪ B) \ A belongs to B \ ( A ∩ B). Let us do this:
Let x ∈ ( A ∪ B) \ A. Thus, x ∈ A ∪ B but x ∉ A. From x ∈ A ∪ B, we see
that x ∈ A or x ∈ B. But the first of these two possibilities is impossible (since
x ∉ A). Thus, the second possibility must hold. In other words, we have x ∈ B.
Furthermore, we have x ∉ A ∩ B (since x ∈ A ∩ B would entail x ∈ A ∩ B ⊆ A,
which would contradict x ∉ A). Combining x ∈ B with x ∉ A ∩ B, we obtain
x ∈ B \ ( A ∩ B ).
Forget that we fixed x. We thus have shown that each x ∈ ( A ∪ B) \ A belongs
to B \ ( A ∩ B). In other words, ( A ∪ B) \ A ⊆ B \ ( A ∩ B).
Next, let us prove that B \ ( A ∩ B) ⊆ ( A ∪ B) \ A. To do so, we must show
that each x ∈ B \ ( A ∩ B) belongs to ( A ∪ B) \ A. We do this as follows: Let
x ∈ B \ ( A ∩ B). Thus, x ∈ B but x ∉ A ∩ B. If we had x ∈ A, then we would
have x ∈ A ∩ B (since x ∈ A and x ∈ B), which would contradict x ∉ A ∩ B.
Hence, we cannot have x ∈ A. Thus, x ∉ A. Also, x ∈ B ⊆ A ∪ B. Combining
this with x ∉ A, we find x ∈ ( A ∪ B) \ A.
Forget that we fixed x. We thus have shown that each x ∈ B \ ( A ∩ B) belongs
to ( A ∪ B) \ A. In other words, B \ ( A ∩ B) ⊆ ( A ∪ B) \ A.
Now, combining the two inclusions
( A ∪ B) \ A ⊆ B \ ( A ∩ B) and B \ ( A ∩ B) ⊆ ( A ∪ B) \ A,
63 It is worth noting that both sides of (53) are equal to B \ A. However, we will not need this
fact.
|( A ∪ B) \ A| = | B \ ( A ∩ B)| . (54)
|( A ∪ B) \ A| = | A ∪ B| − | A| . (55)
| B \ ( A ∩ B)| = | B| − | A ∩ B| . (56)
But we know from (54) that the left hand sides of the two equalities (55) and
(56) are equal. Thus, their right hand sides are also equal. In other words,
| A ∪ B| − | A| = | B| − | A ∩ B| .
| A ∪ B| = | A| + | B| − | A ∩ B| .
| A ∪ B ∪ C | = | A| + | B| + |C | − | A ∩ B| − | A ∩ C | − | B ∩ C | + | A ∩ B ∩ C | .
More generally, such a formula can be stated for any k finite sets, and is known
as the “principle of inclusion and exclusion” or “Sylvester’s sieve formula”. See
[Grinbe22, Lecture 19, §2.7] or any textbook on combinatorics.
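These formulas are easy to test on examples, since Python has finite sets built in (with | for union, & for intersection and - for set difference). The sets A, B and C in the following sketch are arbitrary examples chosen here; of course, such a test is a sanity check and not a proof.

```python
A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7}
C = {1, 5, 7, 9}

# The set identity (53); both of its sides also equal B \ A.
assert (A | B) - A == B - (A & B) == B - A

# Theorem 6.1.15: the size of a union of two sets that need not be disjoint.
assert len(A | B) == len(A) + len(B) - len(A & B)

# The analogous formula for three sets.
assert len(A | B | C) == (len(A) + len(B) + len(C)
                          - len(A & B) - len(A & C) - len(B & C)
                          + len(A & B & C))
```

In this example, | A ∪ B| = 7 = 5 + 4 − 2, so simply adding | A| and | B| would count the two common elements 4 and 5 twice.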
| A ∪ B| = | A \ B| + | B \ A| + | A ∩ B| .
{ a, a + 1, a + 2, . . . , b} = { x ∈ Z | a ≤ x ≤ b}
whenever a and b are two integers. In particular, [n] = [1, n] for every n ∈ N.
We begin with Proposition 4.2.2 (rewritten using the notation [ a, b]):
f : [b − a + 1] → [ a, b] ,
i ↦ i + ( a − 1)
(recall that [b − a + 1] = {1, 2, . . . , b − a + 1} and [ a, b] = { a, a + 1, . . . , b}).
This map f just adds a − 1 to its input. (Informally, we can view it as moving
numbers to the right by a − 1 units on the number line.)
It is easy to see that this map f has an inverse: Namely, the map
[ a, b] → [b − a + 1] ,
j ↦ j − ( a − 1)
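To see these two shifting maps in action, here is a minimal Python sketch for one hypothetical choice of a and b (namely a = 4 and b = 10). The helper interval is our own name for the integer interval [ a, b]; the maps are stored as dictionaries.

```python
def interval(a, b):
    """The integer interval [a, b] = {a, a+1, ..., b} (empty if a > b)."""
    return set(range(a, b + 1))

a, b = 4, 10
f = {i: i + (a - 1) for i in interval(1, b - a + 1)}   # adds a - 1
g = {j: j - (a - 1) for j in interval(a, b)}           # subtracts a - 1

assert set(f.values()) == interval(a, b)   # f really lands in (and fills) [a, b]
assert all(g[f[i]] == i for i in f)        # g undoes f
assert all(f[g[j]] == j for j in g)        # f undoes g
assert len(interval(a, b)) == b - a + 1    # so [a, b] has b - a + 1 elements
```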
(# of subsets of [n]) = 2^n .
The proof we gave in Section 4.3 had some informal steps; let us now make
it rigorous:64
Rigorous proof of Theorem 6.2.2. We induct on n.
The base case (n = 0) is easy: The set [0] is empty, and thus its only subset is
{} itself; hence, the # of subsets of [0] is 1 = 2^0 . In other words, Theorem 6.2.2
holds for n = 0.
Induction step: We proceed from n − 1 to n. Thus, let n be a positive integer.
We assume (as the induction hypothesis) that Theorem 6.2.2 holds for n − 1
instead of n, and we set out to prove that it holds for n.
So our induction hypothesis says that
(# of subsets of [n − 1]) = 2^{n−1} .
Our goal is to prove that
(# of subsets of [n]) = 2^n .
We define
A set cannot be red and green at the same time. In other words, the sets
{red sets} and {green sets} are disjoint65 . Hence, the sum rule for two sets (Theorem 6.1.10, applied to
A = {red sets} and B = {green sets}) yields
(This is just a formal way to say “the # of all sets that are red or green equals
the # of red sets plus the # of green sets”. Indeed, the notation {red sets} means
the set of all red sets, and thus the expression |{red sets}| means the size of the
set of all red sets, i.e., the # of all red sets.)
Furthermore, each subset of [n] is either red or green (and conversely, each
red or green set is a subset of [n]). Hence,
Therefore,
Thus it remains to count the red sets and the green sets separately.
The green sets are easy: They are just the subsets of [n − 1]. Hence,
65 Keep in mind: The notation “{red sets}” stands for the set of all red sets. For example, if
n = 3, then
{red sets} = {{3} , {1, 3} , {2, 3} , {1, 2, 3}} .
and
remn : {red sets} → {green sets} ,
R ↦ R \ {n} .
It is easy to see that both of these maps insn and remn are well-defined66 . A
little bit of set-theoretic computation shows that
insn (remn ( R)) = R for every red set R
(because if R is a red set, then
insn (remn ( R)) = remn ( R) ∪ {n}      (by the definition of insn )
= ( R \ {n}) ∪ {n}      (since remn ( R) = R \ {n} by the definition of remn )
= R      (since n ∈ R (because R is red))
). Similarly,
remn (insn ( G )) = G for every green set G.
These two equalities show that the map remn is an inverse of insn . Hence, the
map insn has an inverse, i.e., is bijective (by Theorem 5.10.2). In other words,
insn is a bijection. Hence, there exists a bijection from {green sets} to {red sets}
(namely, insn ). Thus, the bijection principle yields
|{green sets}| = |{red sets}| .
In other words,
(# of green sets) = (# of red sets) ,
and thus
(# of red sets) = (# of green sets) = 2^{n−1} .
Combining what we have shown, we now obtain
(# of subsets of [n]) = (# of red sets) + (# of green sets)
= 2^{n−1} + 2^{n−1}      (since (# of red sets) = 2^{n−1} and (# of green sets) = 2^{n−1} )
= 2 · 2^{n−1} = 2^n .
This is precisely what we needed to prove. This completes the induction step,
and thus Theorem 6.2.2 is proved.
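The induction step can be replayed on a computer for any specific n. The following Python sketch does this for n = 4; it assumes (as in the proof above) that the red sets are the subsets of [n] that contain n and the green sets are the ones that do not. The function names subsets, ins and rem are our own stand-ins for the notions used in the proof.

```python
from itertools import combinations

def subsets(n):
    """All subsets of [n] = {1, 2, ..., n}, as frozensets."""
    elements = range(1, n + 1)
    return {frozenset(c) for size in range(n + 1)
            for c in combinations(elements, size)}

n = 4
red = {S for S in subsets(n) if n in S}          # subsets of [n] containing n
green = {S for S in subsets(n) if n not in S}    # subsets of [n] not containing n

def ins(G):
    """Insert n into a green set."""
    return G | {n}

def rem(R):
    """Remove n from a red set."""
    return R - {n}

assert green == subsets(n - 1)                       # green sets = subsets of [n-1]
assert {ins(G) for G in green} == red                # ins maps green sets onto red sets
assert {rem(R) for R in red} == green                # rem maps red sets onto green sets
assert all(rem(ins(G)) == G for G in green)          # rem undoes ins
assert all(ins(rem(R)) == R for R in red)            # ins undoes rem
assert len(red) == len(green) == 2 ** (n - 1)
assert len(subsets(n)) == len(red) + len(green) == 2 ** n
```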
66 Indeed, we need to show that
• if G is a green set, then G ∪ {n} is a red set;
• if R is a red set, then R \ {n} is a green set.
Both of these claims are very easy. For instance, if G is a green set, then G is a subset of
[n], and thus G ∪ {n} is a subset of [n] as well (since n ∈ [n]), and furthermore is red (since
n ∈ {n} ⊆ G ∪ {n}).
(# of subsets of S) = 2^n .
Rigorous proof. Informally, we derived this from Theorem 6.2.2 by renaming the
elements of S as 1, 2, . . . , n (so that S became the set [n]).
Rigorously, this means setting up a one-to-one correspondence between the
subsets of S and the subsets of [n], and then using the bijection principle to
argue that the # of the former equals the # of the latter.
How do we get this correspondence? First, we set up a one-to-one corre-
spondence between the elements of S and the elements of [n]. (This is what
the “renaming” in our informal proof was secretly doing.) Formally, this can
be done as follows:
The set S is an n-element set, i.e., has size n. Hence, by the definition of size,
the set S is isomorphic to [n]. In other words, there is a bijection α : S → [n].
Consider this α. Being a bijection, the map α has an inverse α^{−1} (by Theorem
5.10.2).
Now, define a map
(This map (α^{−1})∗ is defined in the same way as α∗ , but using the map α^{−1}
to each element of a given set and then applying α^{−1} to the results will recover
the original set, and likewise if you apply α^{−1} first and then α). Thus, the
map α∗ has an inverse, i.e., is a bijection (by Theorem 5.10.2). Thus, we have
In other words,
Rigorous proof. We induct on n (without fixing k). That is, we use induction on
n to prove the statement
for each n ∈ N.
Base case: Let k be any number. The only 0-element set is ∅, and its only
subset is ∅. Thus, a 0-element set S necessarily has one 0-element subset (∅)
and no other subsets. Hence, it satisfies
(# of k-element subsets of S) = { 1, if k = 0;
                                { 0, else.
Each k-element subset of [n] is either red or green (but not both). Hence,
using the sum rule for two sets, we find
(This is proved just as we proved (57) in the rigorous proof of Theorem 6.2.2.)
The green sets are just the k-element subsets of [n − 1]. Thus,
Conversely, if B is a blue set, then B ∪ {n} is a red set68 . Thus, we obtain a map
These two maps remn and insn are mutually inverse69 . Thus, the map remn
has an inverse, i.e., is bijective (by Theorem 5.10.2). Hence, we have found a
bijection from {red sets} to {blue sets} (namely, remn ). The bijection principle
therefore yields
|{red sets}| = |{blue sets}| .
In other words,
(again by the statement P (n − 1), but now applied to k − 1 instead of k). Note
that we deliberately formulated P (n) as a “for any k” statement (rather than
fixing k at the onset of our proof), so that we were now able to apply P (n − 1)
to k − 1 instead of k.
67 Proof. Let R be a red set. Then, R is a k-element set (by the definition of a red set), so that
| R| = k. Moreover, n ∈ R (by the definition of a red set), so that {n} ⊆ R. Hence, the
difference rule (Theorem 6.1.12 (b), applied to S = R and T = {n}) yields | R \ {n}| =
| R| − |{n}| = k − 1 (since | R| = k and |{n}| = 1). Hence, R \ {n} is a (k − 1)-element set. Since R \ {n} is furthermore
a subset of [n − 1] (because R is a subset of [n], and we are removing n from it), we thus
conclude that R \ {n} is a (k − 1)-element subset of [n − 1], that is, a blue set.
68 Proof. Let B be a blue set. Then, B is a ( k − 1)-element subset of [ n − 1] (by the definition
Then, a simple application of Theorem 6.1.12 (b) would have shown that
S \ {t} is an (n − 1)-element set, so we could apply our induction hypothesis
P (n − 1) to it. Thus, the above argument could be made using S, t and
S \ {t} instead of [n], n and [n − 1]. In particular, the green sets would be
precisely the k-element subsets of S \ {t}, whereas the red sets would be in
one-to-one correspondence (i.e., bijection) with the (k − 1)-element subsets
of S \ {t} (and the bijection would be given by removing/inserting t). This
argument would be not only shorter but also more conceptual than the one
we gave above.
However, I chose to give the proof I gave because it has the advantage of
familiarity (the set [n] = {1, 2, . . . , n} is easier to visualize than an arbitrary
n-element set), and in order to illustrate how the bijection principle can be
used to rename the elements of a given set in a convenient way.
Likewise, Theorem 6.2.3 could also be proved more directly: Instead of
deducing it from Theorem 6.2.2 via “renaming”, we could have proved it
by induction, again picking an element t of S in the induction step, defin-
ing red and green sets, and counting both kinds of sets using the induction
hypothesis (applied to the (n − 1)-element set S \ {t}).
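The red/green/blue bookkeeping from the proof of Theorem 6.2.4 can be checked experimentally for specific values. The following Python sketch does so for the hypothetical choice n = 6 and k = 3; it assumes that k is a nonnegative integer (the theorem itself does not need this), and the helper name k_subsets is our own. The function comb from Python's math module computes binomial coefficients.

```python
from itertools import combinations
from math import comb

def k_subsets(n, k):
    """All k-element subsets of [n] = {1, 2, ..., n}, as frozensets."""
    return {frozenset(c) for c in combinations(range(1, n + 1), k)}

n, k = 6, 3
red = {S for S in k_subsets(n, k) if n in S}         # k-element subsets containing n
green = {S for S in k_subsets(n, k) if n not in S}   # k-element subsets avoiding n
blue = k_subsets(n - 1, k - 1)                       # (k-1)-element subsets of [n-1]

assert green == k_subsets(n - 1, k)        # green sets = k-element subsets of [n-1]
assert {R - {n} for R in red} == blue      # removing n sends red sets onto blue sets
assert {B | {n} for B in blue} == red      # inserting n sends blue sets onto red sets
assert len(red) + len(green) == comb(n - 1, k - 1) + comb(n - 1, k) == comb(n, k)
```

The last line is the recurrence of the binomial coefficients, which is what drives the induction step.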
Let us derive a nice, if simple, corollary from our last few theorems:
Corollary 6.2.6. Let n ∈ N. Then,
∑_{k=0}^{n} \binom{n}{k} = 2^n .
Proof. Consider the n-element set [n] = {1, 2, . . . , n}. This set has size n, so each
subset of [n] must have size ≤ n (by Theorem 6.1.12 (a)). Hence, each subset of
[n] has size 0 or size 1 or size 2 or · · · or size n. Thus, we can write the set
{subsets of [n]}
as a union
Furthermore, this union is a union of disjoint sets (since a subset of [n] cannot
have several distinct sizes at once). Therefore, the sum rule for k sets (Theorem
6.1.11) yields
|{subsets of [n]}|
= |{0-element subsets of [n]}|
+ |{1-element subsets of [n]}|
+ |{2-element subsets of [n]}|
+···
+ |{n-element subsets of [n]}| .
70
70 In more detail:
The n + 1 sets
are finite (since [n] has only finitely many subsets) and disjoint (since a subset of [n] cannot
have several distinct sizes at once). Thus, the sum rule for k sets (Theorem 6.1.11) yields
Since
{subsets of [n]}
is the union
|{subsets of [n]}|
= |{0-element subsets of [n]}|
+ |{1-element subsets of [n]}|
+ |{2-element subsets of [n]}|
+···
+ |{n-element subsets of [n]}| .
In other words,
(# of subsets of [n])
= (# of 0-element subsets of [n])
+ (# of 1-element subsets of [n])
+ (# of 2-element subsets of [n])
+···
+ (# of n-element subsets of [n])
= ∑_{k=0}^{n} (# of k-element subsets of [n])
= ∑_{k=0}^{n} \binom{n}{k}      (by Theorem 6.2.4, applied to S = [n]) .
Thus,
∑_{k=0}^{n} \binom{n}{k} = (# of subsets of [n]) = 2^n
(by Theorem 6.2.2).
Corollary 6.2.6 can also be easily obtained from the binomial formula (this
was part of Exercise 2.6.1 (a)). Our above proof, however, reveals its combina-
torial meaning: It comes from the comparison of two different ways to count
one and the same thing (viz., the subsets of [n]). This technique of proving
equalities is called double counting, and has multiple other applications (see,
e.g., [Newste23, §8.1]).
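The double counting behind Corollary 6.2.6 can be watched directly: we list the subsets of [n] sorted by size, count each size class, and add up. Here is a minimal Python sketch for n = 0, 1, . . . , 7 (the variable name by_size is our own); it is a sanity check, not a replacement for the proof.

```python
from itertools import combinations
from math import comb

for n in range(8):
    # by_size[k] is the number of k-element subsets of [n], found by listing them.
    by_size = [sum(1 for _ in combinations(range(1, n + 1), k))
               for k in range(n + 1)]
    # First way to count: size class by size class (Theorem 6.2.4).
    assert by_size == [comb(n, k) for k in range(n + 1)]
    # Second way to count: all subsets at once (Theorem 6.2.2).
    assert sum(by_size) == 2 ** n
```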
• How many ways are there to choose 3 odd integers between 0 and 20,
if the order matters (i.e., we count the choice 1, 3, 5 as different from the
choice 3, 1, 5)? (The answer is 1000.)
We can solve this now: To choose 3 odd integers between 0 and 20, if
the order matters, amounts to choosing a 3-tuple ( a, b, c) where a, b, c ∈
{1, 3, 5, . . . , 19}. Since this set {1, 3, 5, . . . , 19} is a 10-element set (because
Proposition 4.2.1 yields that the # of odd integers between 0 and 20 is
(20 + 1) //2 = 10), the # of these 3-tuples is 10 · 10 · 10 = 1000 (by Theo-
rem 4.4.5).
• How many ways are there to choose 3 odd integers between 0 and 20, if
the order does not matter? (The answer is 220.)
We cannot solve this yet, at least not if the values 3 and 20 are generalized
to k and n. This will be done in Theorem 6.6.9.
• How many ways are there to choose 3 distinct odd integers between 0 and
20, if the order matters? (The answer is 720.)
We cannot solve this yet, at least not if the values 3 and 20 are generalized
to k and n. This will be done in Theorem 6.2.4.
• How many ways are there to choose 3 distinct odd integers between 0 and
20, if the order does not matter? (The answer is 120.)
We can solve this now: This amounts to counting the 3-element subsets
of {1, 3, 5, . . . , 19}; but Theorem 6.2.4 answers such questions. Since the
set {1, 3, 5, . . . , 19} has size 10, its number of 3-element subsets is
\binom{10}{3} = (10 · 9 · 8) / 3! = 120.
• How many prime factorizations does 200 have (where we count different
orderings as distinct)? (The answer is 10. This is a mix between a number
theory problem and a counting problem.)
We can solve this now, at least for 200: We know that 200 = 2 · 2 · 2 · 5 · 5.
Thus, by the fundamental theorem of arithmetic, all prime factorizations
of 200 consist of five factors, three of which are 2’s and two of which
are 5’s. The only freedom is in choosing where to place the three 2’s
among the five positions (of course, the two 5’s will then have to occupy
the remaining positions). There are 5 factors in total, so 5 positions, and
we have to choose 3 of these 5 positions to put our three 2’s in. This
• How many ways are there to tile a 2 × 15-rectangle with dominos (i.e.,
rectangles of size 1 × 2 or 2 × 1) ? (The answer is 987.)
We cannot solve this yet. But we will outline a solution in Subsection
6.4.6.
• How many addends do you get when you expand the product
( a + b) (c + d + e) ( f + g) ? (The answer is 12.)
We can solve this now: Each addend consists of exactly one of a and b,
exactly one of c, d and e, and exactly one of f and g. So the addends are
in one-to-one correspondence with the triples ( x, y, z) where x ∈ { a, b}
and y ∈ {c, d, e} and z ∈ { f , g}. Thus, their # is 2 · 3 · 2 (since { a, b} is a
2-element set, {c, d, e} is a 3-element set, and { f , g} is a 2-element set).
Note that we are using the fact that the addends all end up distinct, so
they don’t cancel or combine.
• How many different monomials do you get when you expand the product
( a − b) ( a^2 + ab + b^2 ) ? (This one is more of an algebra problem, but I
wanted to list it because it is connected to counting. The answer is 2,
because ( a − b) ( a^2 + ab + b^2 ) = a^3 − b^3 .)
This is not a combinatorics problem: The answer is 2, because we have
( a − b) ( a^2 + ab + b^2 ) = a^3 − b^3 . The other addends all cancel out, so you
• How many positive divisors does 24 have? (We can actually list them:
1, 2, 3, 4, 6, 8, 12, 24. This one is again a mix of a counting problem and
a number theory problem.)
Okay, but let us generalize: How many positive divisors does a given
positive integer n have? We cannot solve this yet. In this course, we will
not get to solve it, but it is not too hard to solve using the methods we
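Several of the answers quoted in the list above are small enough to confirm by brute force. The following Python sketch does this using the standard itertools module; the helper count is our own, and the identification of each counting problem with an itertools function (e.g., unordered choices with repetition with combinations_with_replacement) is our reading of the questions, not something stated in these notes.

```python
from itertools import (product, permutations, combinations,
                       combinations_with_replacement)

def count(iterable):
    """Count the items of an iterable by adding up 1's."""
    return sum(1 for _ in iterable)

odds = range(1, 20, 2)   # the odd integers between 0 and 20: 1, 3, ..., 19
assert len(odds) == 10

assert count(product(odds, repeat=3)) == 1000                 # order matters
assert count(combinations_with_replacement(odds, 3)) == 220   # order does not matter
assert count(permutations(odds, 3)) == 720                    # distinct, order matters
assert count(combinations(odds, 3)) == 120                    # distinct, order ignored

# Orderings of the prime factors 2, 2, 2, 5, 5 of 200.
assert len(set(permutations((2, 2, 2, 5, 5)))) == 10

# Addends in the expansion of (a + b)(c + d + e)(f + g).
assert count(product("ab", "cde", "fg")) == 12

# Positive divisors of 24.
assert count(d for d in range(1, 25) if 24 % d == 0) == 8
```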
The word “lacunar” comes from Latin “lacuna” (= “gap”). The idea is that a
lacunar set has a “gap” (or “buffer zone”) between any two distinct elements.
For example, the set {2, 4, 7} is lacunar, but the set {2, 4, 5} is not (since 4 and
5 are consecutive integers). Any 1-element set of integers is lacunar, and so is
the empty set.
Now we can ask ourselves some natural questions: For given n ∈ N,
1. how many lacunar subsets does the set [n] = {1, 2, . . . , n} have?
2. how many k-element lacunar subsets does [n] have for a given k ∈ N?
3. what is the largest size of a lacunar subset of [n] ?
We shall answer all these three questions in this section.
It remains to show that this size is the largest possible – i.e., that if L is a
lacunar subset of [n], then
| L| ≤ ⌊(n + 1) /2⌋ .
So let L be a lacunar subset of [n]. Our goal is to prove that | L| ≤ ⌊(n + 1) /2⌋.
We shall first prove that | L| ≤ (n + 1) /2.
Here are two different ways to prove this (each way illustrates a nice technique):
First proof of | L| ≤ (n + 1) /2. Let ℓ_1 , ℓ_2 , . . . , ℓ_k be the elements of L, listed in increasing
order, so that L = {ℓ_1 , ℓ_2 , . . . , ℓ_k } and ℓ_1 < ℓ_2 < · · · < ℓ_k . Thus, | L| = k.
Now, we assume (for the moment) that k > 0. Thus, k ≥ 1 (since k is
an integer). We have ℓ_1 ∈ L ⊆ [n], so that ℓ_1 ≥ 1. Moreover, the elements
ℓ_1 and ℓ_2 of L satisfy ℓ_1 < ℓ_2 and ℓ_2 ≠ ℓ_1 + 1 (since L is lacunar), so that
ℓ_2 ≥ ℓ_1 + 2 ≥ 1 + 2 = 3 (since ℓ_1 ≥ 1). Furthermore, the elements ℓ_2 and ℓ_3 of L satisfy
ℓ_2 < ℓ_3 and ℓ_3 ≠ ℓ_2 + 1 (since L is lacunar), so that ℓ_3 ≥ ℓ_2 + 2 ≥ 3 + 2 = 5 (since ℓ_2 ≥ 3).
Proceeding in the same way, we find that
ℓ_i ≥ 2i − 1 for each i ∈ {1, 2, . . . , k} . (59)
(Strictly speaking, this can be proved by induction on i. The base case follows
from ℓ_1 ≥ 1 = 2 · 1 − 1, whereas the induction step requires deriving ℓ_{i+1} ≥
2 (i + 1) − 1 from ℓ_i ≥ 2i − 1, which can be done by observing that L is lacunar
and therefore ℓ_{i+1} ≥ ℓ_i + 2 ≥ (2i − 1) + 2 = 2 (i + 1) − 1, where the second inequality uses ℓ_i ≥ 2i − 1.)
Now, we can apply (59) to i = k, and thus obtain ℓ_k ≥ 2k − 1. However,
ℓ_k ∈ L ⊆ [n], so that ℓ_k ≤ n. Thus, n ≥ ℓ_k ≥ 2k − 1, so that n + 1 ≥ 2k and
thus (n + 1) /2 ≥ k. We have proved this under the assumption that k > 0, but this
also holds in the opposite case (because if k ≤ 0, then (n + 1) /2 ≥ 0 ≥ k). Thus, we
always have (n + 1) /2 ≥ k (independently of any assumptions). In other words,
we have (n + 1) /2 ≥ | L| (since k = | L|). In other words, we have | L| ≤ (n + 1) /2.
Second proof of | L| ≤ (n + 1) /2. Define a new set
L+ := {ℓ + 1 | ℓ ∈ L} .
L+ = {i ∈ Z | i − 1 ∈ L }
| L ∪ L+ | ≤ |[n + 1]| = n + 1.
If the sets L and L+ had an element j in common, then both j − 1 and j would
belong to L (indeed, j ∈ L+ = {i ∈ Z | i − 1 ∈ L} would entail j − 1 ∈ L),
which would contradict the fact that L is lacunar (since j − 1 and j are two
consecutive integers). Thus, the sets L and L+ have no element in common. In
other words, they are disjoint. Hence, by the sum rule (Theorem 6.1.10, applied
to A = L and B = L+ ), we have | L ∪ L+ | = | L| + | L+ | = | L| + | L| = 2 · | L| (since | L+ | = | L|).
Hence,
2 · | L| = | L ∪ L+ | ≤ n + 1.
In other words, | L| ≤ (n + 1) /2.
We have now proved (in two different ways) that | L| ≤ (n + 1) /2. Now, recall
the definition of the floor of a real number: If x is a real number, then ⌊ x ⌋ is the
largest integer that is ≤ x. Hence, ⌊(n + 1) /2⌋ is the largest integer that is ≤ (n + 1) /2.
Therefore, any integer that is ≤ (n + 1) /2 must also be ≤ ⌊(n + 1) /2⌋. Applying this
to the integer | L|, we conclude that | L| ≤ ⌊(n + 1) /2⌋ (since | L| ≤ (n + 1) /2). As
explained above, this completes the proof of Proposition 6.4.2.
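Both the bound and the key step of the second proof can be tested by brute force for small n. The following Python sketch is an illustration under our own naming (is_lacunar, lacunar_subsets, L_plus); it assumes that "lacunar" means "containing no two consecutive integers", and it uses the fact that (n + 1) // 2 computes ⌊(n + 1) /2⌋ in Python.

```python
from itertools import combinations

def is_lacunar(S):
    """No element of S is followed by its successor in S."""
    return all(x + 1 not in S for x in S)

def lacunar_subsets(n):
    """All lacunar subsets of [n] = {1, 2, ..., n}, as Python sets."""
    return [set(c) for size in range(n + 1)
            for c in combinations(range(1, n + 1), size)
            if is_lacunar(set(c))]

for n in range(9):
    lac = lacunar_subsets(n)
    for L in lac:
        L_plus = {x + 1 for x in L}                 # the set L+ from the second proof
        assert L.isdisjoint(L_plus)                 # L and L+ are disjoint
        assert L | L_plus <= set(range(1, n + 2))   # both are subsets of [n + 1]
        assert 2 * len(L) <= n + 1                  # hence |L| <= (n + 1)/2
    # The largest size of a lacunar subset of [n] is the floor of (n + 1)/2.
    assert max(len(L) for L in lac) == (n + 1) // 2
```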
for n in range(10):
    print("For n = " + str(n) + ", the number is " + str(num_lacs(n)))
The first two lines here speak for themselves (once you know that all is
the universal quantifier). The function Subsets computes the set of all subsets
of a given set, or (if we provide it an integer n as input) all subsets of [n].
The sum(1 for S in SomeSet) construction is just a slick way of counting the
elements of SomeSet, exploiting the fact that a sum of the form 1 + 1 + · · · + 1
equals the number of its addends. The last two lines are prompting SageMath
to compute the # of lacunar subsets of [n] for each n ∈ [0, 9] (note that range(a,
b) means the integer interval [ a, b − 1] in SageMath) and to output these 10
numbers. I refer to [Grinbe19a, §1.4.3] for more hints on the use of SageMath,
and to its documentation for a more systematic introduction. Note that you
can use SageMathCell to easily call SageMath from your browser (although the
computations you call are limited by 30 seconds each, since they happen on the
server).
The answers we get from SageMath are interesting:
n                            | 0  1  2  3  4  5   6   7   8   9
# of lacunar subsets of [n]  | 1  2  3  5  8  13  21  34  55  89
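For readers without SageMath, the same numbers can be reproduced in pure Python. The following is only a sketch: it follows the replacement suggested in the footnote (itertools.combinations instead of Subsets), and the bodies of is_lacunar and num_lacs are my own guess at a straightforward implementation, not necessarily the code used in these notes.

```python
from itertools import combinations

def is_lacunar(S):
    """Return True if S contains no two consecutive integers."""
    elements = set(S)
    return all(s + 1 not in elements for s in elements)

def num_lacs(n):
    """Count the lacunar subsets of [n] = {1, 2, ..., n} by brute force."""
    return sum(1
               for i in range(n + 1)
               for S in combinations(range(1, n + 1), i)
               if is_lacunar(S))

counts = [num_lacs(n) for n in range(10)]
print(counts)  # [1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```

The brute force inspects all 2^n subsets, so it is only practical for small n.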
$f_0 = 0$, $f_1 = 1$, and
$f_n = f_{n-1} + f_{n-2}$ for each $n \geq 2$.
71 SageMath is built on top of the Python programming language, so you will recognize a
lot of Python syntax. Actually, the only piece of non-Python code in the following code
snippet is the Subsets(n) part. If you want to use (pure) Python instead of SageMath,
you can replace sum(1 for S in Subsets(n) if is_lacunar(S)) by sum(1 for i in
range(n+1) for S in combinations(range(1, n+1), i) if is_lacunar(S)), after first
importing the combinations function from the itertools package (using from itertools
import combinations).
n    | 0  1  2  3  4  5  6  7   8   9   10  11  12   13
f_n  | 0  1  1  2  3  5  8  13  21  34  55  89  144  233
The two above tables have the same entries, if you discount the fact that the
first two Fibonacci numbers f 0 = 0 and f 1 = 1 are missing from the former
table. So we have good reasons to suspect that
We have $\ell_{-1} = 1$ (since the set $[-1] = \varnothing$ has only one lacunar subset, namely
$\varnothing$ itself) and $f_{-1+2} = f_1 = 1$. Hence, $\ell_{-1} = 1 = f_{-1+2}$. In other words, (60)
holds for $n = -1$. A similar computation shows that (60) holds for $n = 0$.
Let us next show the following:
(by the sum rule, since each lacunar subset of [n] is either red or green but
cannot be both at the same time72 ).
The green lacunar subsets of [n] are just the lacunar subsets of [n − 1] (since
“green” means “does not contain n”). Thus,
It is easy to see (just as in the proof of Theorem 6.2.2) that the map $\operatorname{rem}_n$ is an
inverse of $\operatorname{ins}_n$. Thus, the map $\operatorname{ins}_n$ has an inverse, i.e., is bijective (by Theorem
5.10.2). Hence, we have found a bijection
In other words,
(# of lacunar subsets of [n − 2]) = (# of red lacunar subsets of [n]) .
Thus,
(# of red lacunar subsets of $[n]$) = (# of lacunar subsets of $[n-2]$) $= \ell_{n-2}$
(by the definition of $\ell_{n-2}$).
Altogether,
$\ell_n = \underbrace{(\text{\# of red lacunar subsets of } [n])}_{= \ell_{n-2}} + \underbrace{(\text{\# of green lacunar subsets of } [n])}_{= \ell_{n-1}} = \ell_{n-2} + \ell_{n-1} = \ell_{n-1} + \ell_{n-2}$.
This proves Claim 1.
Now we still need to prove (60). In other words, we need to prove that the
two sequences $(\ell_{-1}, \ell_0, \ell_1, \ldots)$ and $(f_1, f_2, f_3, \ldots)$ are identical. But at this point,
this is very easy: These two sequences
• have the same two starting entries $\ell_{-1} = f_1$ and $\ell_0 = f_2$ (this can be easily
checked directly),
• and satisfy the same recursive equation: namely, each entry of either
sequence is the sum of the preceding two entries (since Claim 1 yields
$\ell_n = \ell_{n-1} + \ell_{n-2}$, whereas the definition of the Fibonacci numbers yields
$f_{n+2} = f_{n+1} + f_n$).
Since a recursively defined sequence is uniquely determined by its starting
entries and its recursive equation, we thus conclude that the two sequences
$(\ell_{-1}, \ell_0, \ell_1, \ldots)$ and $(f_1, f_2, f_3, \ldots)$ are identical. Thus, (60) follows. This slightly
informal argument can be formalized as a straightforward strong induction (see footnote 73).
73 Proof. Let us prove (60) by strong induction on $n$:
Base case: We have already checked that (60) holds for $n = -1$.
Induction step: Let $n \geq 0$ be an integer. Assume (as the induction hypothesis) that the
claim (60) holds for each of $-1, 0, 1, \ldots, n-1$ instead of $n$. We must prove that (60) holds
for $n$ as well, i.e., that we have $\ell_n = f_{n+2}$.
If $n = 0$, then this follows from the fact (observed above) that (60) holds for $n = 0$. It thus
remains to consider the case when $n \neq 0$. So let us assume that $n \neq 0$. Since $n \geq 0$, we thus
obtain $n \geq 1$, so that $n - 1 \geq 0$ and $n - 2 \geq -1$.
In particular, $n - 1 \geq 0 \geq -1$. Hence, our induction hypothesis yields that the claim (60)
holds for $n - 1$ instead of $n$. In other words, we have $\ell_{n-1} = f_{(n-1)+2} = f_{n+1}$.
Also, our induction hypothesis yields that the claim (60) holds for $n - 2$ instead of $n$ (since
$n - 2 \geq -1$). In other words, we have $\ell_{n-2} = f_{(n-2)+2} = f_n$.
Now, Claim 1 yields $\ell_n = \underbrace{\ell_{n-1}}_{= f_{n+1}} + \underbrace{\ell_{n-2}}_{= f_n} = f_{n+1} + f_n$. But the recursive definition of the
Fibonacci sequence also yields $f_{n+2} = f_{n+1} + f_n$. Comparing these two equalities, we find
$\ell_n = f_{n+2}$. In other words, (60) holds for $n$. This completes the induction step. Thus, (60) is
proved.
Thus we have proved (60). In other words, we have proved Theorem 6.4.3
(because we have $\ell_n = (\text{\# of lacunar subsets of } [n])$).
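The two ingredients of this proof can be checked numerically for small n. The sketch below (with helper names lacunar_subsets and fibonacci chosen by me) splits the lacunar subsets of [n] into red ones (containing n) and green ones (not containing n), confirms the counts from Claim 1, and then compares the totals with f_{n+2}. It assumes, as in the text, that [n] is empty for n ≤ 0.

```python
from itertools import combinations

def lacunar_subsets(n):
    """All lacunar subsets of [n] as increasing tuples ([n] is empty for n <= 0)."""
    ground = range(1, n + 1)
    return [S
            for size in range(len(ground) + 1)
            for S in combinations(ground, size)
            if all(b - a > 1 for a, b in zip(S, S[1:]))]

def fibonacci(n):
    """Return f_n, where f_0 = 0, f_1 = 1 and f_n = f_{n-1} + f_{n-2}."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

N = 12
ell = {n: len(lacunar_subsets(n)) for n in range(-1, N + 1)}

# Claim 1: red subsets correspond to lacunar subsets of [n-2], green ones to those of [n-1].
for n in range(1, N + 1):
    red = [S for S in lacunar_subsets(n) if n in S]
    green = [S for S in lacunar_subsets(n) if n not in S]
    assert len(red) == ell[n - 2]
    assert len(green) == ell[n - 1]

# Theorem 6.4.3 for n = -1, 0, ..., N.
assert all(ell[n] == fibonacci(n + 2) for n in range(-1, N + 1))
```

A finite check like this is of course no substitute for the induction; it only guards against misreading the indices.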
for n in range(10):
    print("For n = " + str(n) + ", the numbers are " + \
          str([num_lacs(n, k) for k in range(n+1)]))
(where each entry is the # of lacunar k-element subsets of [n] for the correspond-
ing values of n and k, and where an empty box means that the corresponding
# is 0). The many 0’s are unsurprising (they are predicted by Proposition 6.4.2),
and likewise the values for k = 0 and k = 1 are clear (since every subset that
has size ≤ 1 is lacunar). But staring at the table for a bit longer reveals some-
thing subtler: It is a sheared Pascal’s triangle! For example, the n = 7 row
contains the numbers 1, 7, 15, 10, 1, which appear along a diagonal in Pascal’s
triangle. All the entries are binomial coefficients, and a bit of work reveals the
exact formula:
(# of $k$-element lacunar subsets of $[n]$) $= \binom{n+1-k}{k}$.
(# of 3-element lacunar subsets of $[7]$) $= \binom{7+1-3}{3} = \binom{5}{3} = 10$,
and observing that the # of $k$-element subsets of $[n+1-k]$ is $\binom{n+1-k}{k}$ (by
Theorem 6.2.4). Such a proof has the advantage of not just proving Theorem
6.4.5 but also explaining “why” it holds (at least if you consider it as a given
that binomial coefficients count k-element subsets).
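As a sanity check of the formula, one can count k-element lacunar subsets by brute force and compare with the binomial coefficient. This is a pure-Python sketch; the two-argument num_lacs below is my own stand-in for the SageMath function of the same name, restricted to 0 ≤ k ≤ n so that math.comb receives nonnegative arguments.

```python
from itertools import combinations
from math import comb

def num_lacs(n, k):
    """Count the k-element lacunar subsets of [n] by brute force."""
    return sum(1
               for S in combinations(range(1, n + 1), k)
               if all(b - a > 1 for a, b in zip(S, S[1:])))

for n in range(10):
    row = [num_lacs(n, k) for k in range(n + 1)]
    assert row == [comb(n + 1 - k, k) for k in range(n + 1)]

row7 = [num_lacs(7, k) for k in range(5)]
print(row7)  # [1, 7, 15, 10, 1]
```

The printed row is the n = 7 row mentioned above, read off up to k = 4.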
This second proof rests upon a basic feature of finite sets of integers:
74 This necessitates a bit of casework.
75 To be very pedantic: [Grinbe17, Exercise 3 (a)] only states Theorem 6.4.5 in the case when
n ∈ N. But the remaining case is trivial (since k ≤ n + 1 leads to k = 0 when n is negative,
and thus we have to count 0-element subsets of an empty set, which is not a deep question).
This proposition is just saying that if you are given a k-element set S of
integers, then there is a unique way to list the elements of S in increasing order
(with no repetitions). Intuitively, this is clear (just write down the smallest
element of S, then the second-smallest element, then the third-smallest, and so
on, until you run out of elements; it’s not like you have any other options!).
But intuition is not proof. Nevertheless, we will not stoop down to this low a
foundational level here76 , and just take Proposition 6.4.6 for granted.
In connection with Proposition 6.4.6, we introduce a notation:
Thus, for example, {2 < 4 < 5} is the set {2, 4, 5}, whereas the expression
{4 < 2 < 5} is meaningless.
Proposition 6.4.6 can now be restated as follows: If k ∈ N, then any k-element
set of integers can be written in the form {s1 < s2 < · · · < sk } for a unique k-
tuple (s1 , s2 , . . . , sk ) of integers.
We are now ready to prove Theorem 6.4.5:
Proof of Theorem 6.4.5. Let m := n + 1 − k. Then, m = n + 1 − k ≥ 0 (since
k ≤ n + 1), so that [m] is an m-element set. Also, m = n + 1 − k = n − (k − 1),
so that m + (k − 1) = n.
Now, if $S = \{s_1 < s_2 < \cdots < s_k\}$ is a $k$-element lacunar subset of $[n]$, then $\overleftarrow{S}$
shall mean the set
76 A boring and detailed (but ultimately very simple) proof of Proposition 6.4.6 can be found
in [Grinbe15, proof of Theorem 2.46].
[Figure: red arrows send the elements 3, 5, 9, 11 of $S$ to the elements 3, 4, 7, 8 of $\overleftarrow{S}$, respectively.]
(note that each of the red arrows is slightly more horizontal than the previous
one).
We note the following properties of compression: If $S = \{s_1 < s_2 < \cdots < s_k\}$
is a $k$-element lacunar subset of $[n]$, then its compression $\overleftarrow{S}$ is still a $k$-element
set (i.e., the compression process does not cause any two distinct elements to
“collide”) and can be written as
(since $S$ is lacunar, so that any two “positionally adjacent” elements $s_i$ and $s_{i+1}$
of $S$ satisfy $s_i < s_{i+1} - 1$ and thus $s_i - (i-1) < (s_{i+1} - 1) - (i-1) = s_{i+1} - i$).
Furthermore, $\overleftarrow{S}$ is a subset of $[m]$ (because its smallest element is $s_1 \geq 1$ (since
$s_1 \in S \subseteq [n]$), whereas its largest element is $\underbrace{s_k}_{\leq n \text{ (since } s_k \in S \subseteq [n])} - (k-1) \leq n - (k-1) = m$). Thus, we can define a map
[Figure: red arrows send the elements 3, 4, 7, 8 of $T$ to the elements 3, 5, 9, 11 of $\overrightarrow{T}$, respectively.]
(note that each of the red arrows is slightly more horizontal than the previous
one).
We note the following properties of expansion: If $T = \{t_1 < t_2 < \cdots < t_k\}$ is
a $k$-element subset of $[m]$, then its expansion $\overrightarrow{T}$ is still a $k$-element set (i.e., the
expansion process does not cause any two distinct elements to “collide”) and
can be written as
(since each $i \in [k-1]$ satisfies $t_i < t_{i+1}$ and thus $t_i + (i-1) < t_{i+1} + (i-1) < t_{i+1} + i$).
Furthermore, $\overrightarrow{T}$ is a subset of $[n]$ (because its smallest element is $t_1 \geq 1$
(since $t_1 \in T \subseteq [m]$), whereas its largest element is $\underbrace{t_k}_{\leq m \text{ (since } t_k \in T \subseteq [m])} + (k-1) \leq m + (k-1) = n$), and is lacunar (since the expansion process ensures that
the distance between any two “positionally adjacent” elements of $T$ has been
increased by 1 in $\overrightarrow{T}$, so they can no longer be consecutive integers). Thus, we
can define a map
6.4.5. A corollary
Combining Theorem 6.4.5 with Theorem 6.4.3, we obtain a curious formula for
the Fibonacci numbers in terms of binomial coefficients:
78 In fact, each k-element subset T of [m] satisfies compress (expand T ) = T, because if we write
T as T = {t1 < t2 < · · · < tk }, then
expand T = expand ({t1 < t2 < · · · < tk }) = {t1 < t2 + 1 < t3 + 2 < · · · < tk + (k − 1)}
and therefore
A similar argument shows that any k-element lacunar subset S of [n] satisfies
expand (compress S) = S.
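The compression and expansion maps are easy to experiment with. The following sketch represents sets as increasing tuples and uses 0-based positions, so that the i-th smallest element (counting from 1) is shifted by i − 1. The functions compress and expand are named after the maps in the text; the values n = 8 and k = 3 are an arbitrary test case of mine.

```python
from itertools import combinations

def compress(S):
    """Send {s1 < s2 < ... < sk} to {s1, s2 - 1, s3 - 2, ..., sk - (k-1)}."""
    return tuple(s - i for i, s in enumerate(sorted(S)))

def expand(T):
    """Send {t1 < t2 < ... < tk} to {t1, t2 + 1, t3 + 2, ..., tk + (k-1)}."""
    return tuple(t + i for i, t in enumerate(sorted(T)))

def is_lacunar(S):
    """Check that an increasing tuple has no two consecutive entries."""
    return all(b - a > 1 for a, b in zip(S, S[1:]))

print(compress((3, 5, 9, 11)))  # (3, 4, 7, 8)
print(expand((3, 4, 7, 8)))     # (3, 5, 9, 11)

n, k = 8, 3
m = n + 1 - k
lacunar = [S for S in combinations(range(1, n + 1), k) if is_lacunar(S)]
subsets_of_m = list(combinations(range(1, m + 1), k))

# compress maps the lacunar k-subsets of [n] onto all k-subsets of [m] ...
assert sorted(compress(S) for S in lacunar) == subsets_of_m
# ... and the two maps undo each other.
assert all(expand(compress(S)) == S for S in lacunar)
assert all(compress(expand(T)) == T for T in subsets_of_m)
```

The two printed examples agree with the pictures of compression and expansion shown earlier.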
which is indeed true. Of course, the three summands that are 0 could just
as well be excluded from the sum, and the sum $\sum_{k=0}^{n} \binom{n-k}{k}$ in Corollary
6.4.8 could be replaced by the smaller sum $\sum_{k=0}^{\lfloor n/2 \rfloor} \binom{n-k}{k}$ (since $\binom{n-k}{k} = 0$
whenever $\lfloor n/2 \rfloor < k \leq n$); but I find it more important to keep the sum
simple than to minimize the number of its addends.
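Both versions of the sum can be compared numerically. The sketch below assumes that Corollary 6.4.8 states $f_{n+1} = \sum_{k=0}^{n} \binom{n-k}{k}$ for each $n \in \mathbb{N}$; this is my reading of the corollary, based on how its proof combines Theorem 6.4.3 and Theorem 6.4.5. The helper fibonacci is mine.

```python
from math import comb

def fibonacci(n):
    """Return f_n, where f_0 = 0, f_1 = 1 and f_n = f_{n-1} + f_{n-2}."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

for n in range(20):
    # Full sum over k = 0, 1, ..., n; math.comb(m, k) is 0 whenever k > m >= 0.
    full_sum = sum(comb(n - k, k) for k in range(n + 1))
    # Shorter sum over k = 0, 1, ..., floor(n/2).
    short_sum = sum(comb(n - k, k) for k in range(n // 2 + 1))
    assert full_sum == short_sum == fibonacci(n + 1)
```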
Proof of Corollary 6.4.8. It is easy to see that any subset of [n − 1] has a size
between 0 and n (inclusive)79 . (Actually, it cannot have size n unless n = 0,
but I find it more convenient to nevertheless include the “unnecessary” value
n among the theoretically possible sizes; I am not saying that all of these sizes
actually are achievable.)
Now, from n ∈ N, we obtain n ≥ 0, thus n − 1 ≥ −1. Hence, Theorem 6.4.3
(applied to n − 1 instead of n) yields
79 Proof. Let T be a subset of [n − 1]. We must show that T has a size between 0 and n (inclusive).
In other words, we must prove that | T | ∈ {0, 1, . . . , n}.
However, we have T ⊆ [n − 1] ⊆ [n] and therefore | T | ≤ |[n]| (by Theorem 6.1.12 (a),
applied to S = [n]). Hence, | T | ≤ |[n]| = n. Since | T | is a nonnegative integer, we thus
obtain | T | ∈ {0, 1, . . . , n}, as desired.
Therefore,
Of course, the same problem can be asked for n × m-rectangles for arbitrary
n and m, but we shall focus on the case n = 2 (that is, a rectangle of height
2). (See [Grinbe19a, §1.1] for some references on the much harder cases when
n > 2.)
It turns out that the ways to tile a 2 × m-rectangle with dominos are in bi-
jection with the lacunar subsets of [m − 1]. Indeed, if T is a way to tile the
2 × m-rectangle, then we let C (T ) be the set of all columns (counted from the
left) in which horizontal dominos of T start (where we say that a horizontal
domino is a domino of height 1 and width 2, and it starts in the leftmost of the
two columns that it spans). For example, if T is the tiling shown above, then
C (T ) = {2, 6, 8, 11}. Now, it is not hard to see (but not completely obvious; see
[Grinbe19a, §1.4.4, Second proof of Proposition 1.4.9]) that the map
$p_n = p_{n-1} + p_{n-3} + p_{n-4}$ for each $n \geq 4$.
Exercise 6.4.2. A set S of integers shall be called self-starting if its size |S| is
also its smallest element. (For example, {3, 5, 6} is self-starting, while {2, 3, 4}
and {3} are not.)
Let n ∈ N.
(a) For any k ∈ [n], find the number of self-starting subsets of [n] having
size k.
(b) Find the number of all self-starting subsets of [n].
6.5.1. Compositions
How many ways are there to write the integer 5 as a sum of 3 positive integers,
if the order matters? Since 5 and 3 are not very large numbers, we can just list
all these ways:
These are exactly the 6 ways we found above (but written as 3-tuples).
(b) The compositions of 3 are
(c) The only composition of 0 is the empty list (), which is a 0-tuple. It is a
composition into 0 parts.
Let us now count compositions of n into k parts. (Later, we will count all
compositions of n.) Again, the answer turns out to be a binomial coefficient:
(# of compositions of $n$ into $k$ parts) $= \binom{n-1}{n-k}$. (61)
(# of compositions of $n$ into $k$ parts) $= \binom{n-1}{k-1}$. (62)
in this case, and we obtain (61) by comparing these two equalities. Thus, The-
orem 6.5.3 holds for n = 0 (because the equality (62) is claimed for n > 0
only).)
Thus, we only need to consider the case when $n \neq 0$. Let us thus focus on
this case. From $n \neq 0$, we obtain $n \geq 1$ (since $n \in \mathbb{N}$), thus $n - 1 \in \mathbb{N}$.
For any composition $a = (a_1, a_2, \ldots, a_k)$ of $n$ into $k$ parts, we define the partial
sum set $C(a)$ to be the set
$\{a_1, \; a_1 + a_2, \; a_1 + a_2 + a_3, \; \ldots, \; a_1 + a_2 + \cdots + a_{k-1}\} = \{a_1 + a_2 + \cdots + a_i \mid i \in [k-1]\}$.
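[Figure: the points $0, s_1, s_2, \ldots, s_{k-1}, n$ marked on a line, with the blocks between consecutive points labeled $a_1, a_2, \ldots, a_k$.]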
80 and since all these addends are positive (because a composition has positive entries)
Furthermore, it is not hard to see that this map C has an inverse81 , and thus is
a bijection. Hence, the bijection principle yields
Thus, both (61) and (62) have been proved. This completes the proof of Theorem
6.5.3.
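The bijection between compositions and partial sum sets can be tried out on the opening example (n = 5, k = 3). In this sketch, compositions, partial_sum_set and from_subset are names I made up. The inverse map takes differences of consecutive cut points after adding the endpoints 0 and n, which is my reading of the block picture (an assumption about the conventions $i_0 = 0$ and $i_k = n$). The code is meant for n ≥ 1 and k ≥ 1 only.

```python
from itertools import combinations, product
from math import comb

def compositions(n, k):
    """All compositions of n into k parts (k-tuples of positive integers with sum n)."""
    return [a for a in product(range(1, n + 1), repeat=k) if sum(a) == n]

def partial_sum_set(a):
    """C(a) = {a1, a1 + a2, ..., a1 + ... + a_{k-1}}, as an increasing tuple."""
    sums, total = [], 0
    for part in a[:-1]:
        total += part
        sums.append(total)
    return tuple(sums)

def from_subset(I, n):
    """Recover the composition from its cut points by taking consecutive differences."""
    points = (0,) + tuple(I) + (n,)
    return tuple(q - p for p, q in zip(points, points[1:]))

n, k = 5, 3
comps = compositions(n, k)
print(comps)
# [(1, 1, 3), (1, 2, 2), (1, 3, 1), (2, 1, 2), (2, 2, 1), (3, 1, 1)]

assert len(comps) == comb(n - 1, k - 1) == comb(n - 1, n - k) == 6
# C is a bijection onto the (k-1)-element subsets of [n-1], with inverse from_subset.
assert sorted(partial_sum_set(a) for a in comps) == list(combinations(range(1, n), k - 1))
assert all(from_subset(partial_sum_set(a), n) == a for a in comps)
```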
We can also count all compositions of a given n:
Proof sketch. This can be proved using a similar argument as in Theorem 6.5.3
(but now we need to count all subsets of [n − 1]). See [Grinbe19c, Exercise 1
(b)] for details.
Note that Theorem 6.5.4 does not hold for $n = 0$ (since 0 has 1 composition,
but $2^{0-1} = \frac{1}{2}$).
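A short recursive enumeration illustrates both the count $2^{n-1}$ for positive n and the exceptional case n = 0. This is only a sketch; the function all_compositions is my own and builds each composition by choosing its first entry.

```python
def all_compositions(n):
    """List all compositions of n (tuples of positive integers with sum n)."""
    if n == 0:
        return [()]  # the empty list is the only composition of 0
    return [(first,) + rest
            for first in range(1, n + 1)
            for rest in all_compositions(n - first)]

print(all_compositions(3))  # [(1, 1, 1), (1, 2), (2, 1), (3,)]

for n in range(1, 11):
    assert len(all_compositions(n)) == 2 ** (n - 1)

# For n = 0 there is exactly one composition, which is not of the form 2^(n-1).
assert all_compositions(0) == [()]
```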
81 This is easiest to see using the visual description of C ( a) that we showed above: Given a
(k − 1)-element subset I of [n − 1], we can use the elements of I to subdivide the interval
$[0, n]_{\mathbb{R}}$ into $k$ blocks. The lengths of these blocks (listed from left to right) form a composition
a of n into k parts, and this composition satisfies C ( a) = I. Moreover, this composition is
the only one with this property. Thus, the map that sends each (k − 1)-element subset I
of [n − 1] to the corresponding composition a (whose construction we just explained) is an
inverse map of C.
Rigorously, this can be restated as follows: For each $(k-1)$-element subset $I =
\{i_1 < i_2 < \cdots < i_{k-1}\}$ of $[n-1]$ (where we are using Convention 6.4.7 again), we can define
a composition
$A(I) := (i_1 - i_0, \; i_2 - i_1, \; i_3 - i_2, \; \ldots, \; i_{k-1} - i_{k-2}, \; i_k - i_{k-1})$,
For instance:
$\left( \underbrace{0, 0, \ldots, 0}_{\text{any number of zeroes}}, \; 1, \; \underbrace{0, 0, \ldots, 0}_{\text{any number of zeroes}} \right)$.
Here, “any number” allows for the possibility of “none”, and in particular
the 1-tuple (1) is a weak composition of 1.
(# of weak compositions of $n$ into $k$ parts) $= \binom{n+k-1}{n}$.
Moreover, if $n + k > 0$ (that is, if $n$ and $k$ are not both 0), then
(# of weak compositions of $n$ into $k$ parts) $= \binom{n+k-1}{k-1}$.
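Both formulas can be tested by brute force for small n and k. The sketch below uses a helper weak_compositions of my own. It skips the case n = k = 0 in the loop because math.comb rejects the upper entry −1 (the binomial coefficient $\binom{-1}{0}$ equals 1, matching the single empty weak composition). The last lines illustrate the "add 1 to each entry" idea, which turns weak compositions of n into compositions of n + k.

```python
from itertools import product
from math import comb

def weak_compositions(n, k):
    """All weak compositions of n into k parts (k-tuples of nonnegative integers with sum n)."""
    return [a for a in product(range(n + 1), repeat=k) if sum(a) == n]

for n in range(5):
    for k in range(5):
        if n + k == 0:
            continue  # math.comb does not accept the upper entry -1
        count = len(weak_compositions(n, k))
        assert count == comb(n + k - 1, n)
        if k >= 1:
            assert count == comb(n + k - 1, k - 1)

# The case n = k = 0: only the empty tuple.
assert weak_compositions(0, 0) == [()]

# Adding 1 to each entry turns weak compositions of 3 into compositions of 3 + 2 = 5.
shifted = sorted(tuple(x + 1 for x in a) for a in weak_compositions(3, 2))
print(shifted)  # [(1, 4), (2, 3), (3, 2), (4, 1)]
```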
$(a_1 + 1) + (a_2 + 1) + \cdots + (a_k + 1) = \underbrace{(a_1 + a_2 + \cdots + a_k)}_{= n \text{ (since } (a_1, a_2, \ldots, a_k) \text{ is a weak composition of } n)} + k = n + k$
is well-defined. These two maps are clearly inverses of each other (since adding
1 and subtracting 1 are inverse operations). Therefore, they are bijections. The
bijection principle thus yields
6.6. Selections
We now come back to a class of problems that we have posed at the start of
Chapter 4 (before Section 4.1) but haven’t fully answered yet: counting the
ways to select a bunch of elements from a given set.
To be more specific, these are problems of the following form: Given an n-
element set S, how many ways are there to select k elements from S (where n
and k are fixed nonnegative integers)?
The words “k elements” in this question are ambiguous, as they allow for
several interpretations:
1. Do we want k arbitrary elements or k distinct elements?
2. Does the order of these k elements matter or not? (In other words, would
“1, 2” and “2, 1” count as two different selections?)
In total, these decisions leave you with 4 options, leading to 4 different prob-
lems. In this section, we shall address them all.
Definition 6.6.3. Let $S$ be any set, and let $k \in \mathbb{N}$. Then, $S^k$ shall mean the
Cartesian product
$\underbrace{S \times S \times \cdots \times S}_{k \text{ times}} = \{(a_1, a_2, \ldots, a_k) \mid a_1, a_2, \ldots, a_k \in S\} = \{k\text{-tuples whose all entries belong to } S\}$.
And indeed, there are 60 injective 3-tuples in $\{1, 2, 3, 4, 5\}^3$. For example,
(2, 5, 4) and (5, 3, 2) are two of them.
Note that the right hand side in Theorem 6.6.4 is precisely the numerator in
the definition of the binomial coefficient $\binom{n}{k}$ (Definition 2.4.1), and thus can be
rewritten as $k! \cdot \binom{n}{k}$ (since $k!$ is the denominator). Thus, the claim of Theorem
6.6.4 can be restated as
(# of injective $k$-tuples in $S^k$) $= k! \cdot \binom{n}{k}$.
Now, how do we prove the theorem? Let us first give an informal proof:
Informal proof of Theorem 6.6.4. Let us look at an example (which is representative
of the general case): We let $n = 5$ and $k = 3$ and $S = \{a, b, c, d, e\}$. How
many injective $k$-tuples are there in $S^k$? In other words (since $k = 3$): How
many injective 3-tuples are there in $S^3$?
Such a 3-tuple has the form ( x, y, z), where x, y, z are three distinct elements
of S. Let us see how such a 3-tuple can be chosen:
1. First, we choose its first entry x. There are 5 options for this, since S has 5
elements (and x can be any of these 5).
2. Then, we choose its second entry y. There are 4 options for it, since y
can be any of the 5 elements of S except for x (because the injectivity of
( x, y, z) demands y to be distinct from x).
3. Finally, we choose its third entry z. There are 3 options for it, since z can
be any of the 5 elements of S except for x and y (because the injectivity
of ( x, y, z) demands z to be distinct from x and y) and since x and y are
already distinct.
Altogether, we have 5 options at the first step, then 4 options at the second
step (no matter which option has been chosen at the first step), and finally
3 options at the third step. Altogether, we can therefore choose our 3-tuple in
5 · 4 · 3 many different ways, because the numbers of options multiply. Here, we
have used a counting rule called “dependent product rule”, which informally
says that if we perform a multi-step construction, and we have
• . . .,
n (n − 1) (n − 2) · · · (n − 0 + 1) = (empty product) = 1,
We have assumed that P (k − 1) holds. In other words, for all n ∈ N and all
n-element sets S, we have
(# of injective $(k-1)$-tuples in $S^{k-1}$)
$= n (n-1) (n-2) \cdots (n - (k-1) + 1)$. (63)
(Again, the question mark atop the equality sign reminds us that this is not
proved yet.)
Let s1 , s2 , . . . , sn be the n elements of S (listed without repetition). Then, any
k-tuple in Sk ends82 with exactly one of s1 , s2 , . . . , sn . Hence, by the sum rule,
we have
(# of injective $k$-tuples in $S^k$)
= (# of injective $k$-tuples in $S^k$ that end with $s_1$)
+ (# of injective $k$-tuples in $S^k$ that end with $s_2$)
+ $\cdots$
+ (# of injective $k$-tuples in $S^k$ that end with $s_n$)
$= \sum_{i=1}^{n}$ (# of injective $k$-tuples in $S^k$ that end with $s_i$). (64)
(which removes the last entry from our k-tuple and leaves the other entries as
they are)83 . Conversely, we have a map
$\{\text{injective } (k-1)\text{-tuples in } (S \setminus \{s_i\})^{k-1}\} \to \{\text{injective } k\text{-tuples in } S^k \text{ that end with } s_i\}$,
$(\ldots) \mapsto (\ldots, s_i)$
(which inserts an si after the end of a (k − 1)-tuple; the result is still injec-
tive84 )85 . These two maps are clearly inverses of each other86 , and thus are
bijections. Hence, the bijection principle yields
(# of injective $k$-tuples in $S^k$ that end with $s_i$)
= (# of injective $(k-1)$-tuples in $(S \setminus \{s_i\})^{k-1}$).
Forget that we fixed n and S. We thus have proved that for all n ∈ N and all
n-element sets S, we have
(# of injective $k$-tuples in $S^k$) $= n (n-1) (n-2) \cdots (n-k+1)$.
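The formula is easy to test by brute force: list all k-tuples and keep those with pairwise distinct entries. This is a sketch with a helper injective_tuples of my own; the falling product $n(n-1)\cdots(n-k+1)$ is computed with math.prod, and the restated form $k! \cdot \binom{n}{k}$ is checked alongside it.

```python
from itertools import product
from math import comb, factorial, prod

def injective_tuples(S, k):
    """All k-tuples of elements of S whose entries are pairwise distinct."""
    return [t for t in product(S, repeat=k) if len(set(t)) == k]

tuples3 = injective_tuples([1, 2, 3, 4, 5], 3)
print(len(tuples3))  # 60
assert (2, 5, 4) in tuples3 and (5, 3, 2) in tuples3

for n in range(6):
    for k in range(n + 2):  # k = n + 1 is included: then the count is 0
        count = len(injective_tuples(range(1, n + 1), k))
        falling = prod(n - j for j in range(k))  # n (n-1) ... (n-k+1); empty product = 1
        assert count == falling == factorial(k) * comb(n, k)
```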
Corollary 6.6.6 is one of the reasons why factorials are ubiquitous in com-
binatorics. The n! ways to list the n elements of a given n-element set S are
sometimes called the “permutations” of S, but this name is more frequently
used for the bijective maps from S to S. (The # of the latter maps is also n!, and
the two concepts are closely related. For details, see [Grinbe22, §1.7.4 in Lecture
13]. See also [Grinbe22, Lectures 26–28] for much more about permutations.)
Exercise 6.6.1. (a) How many 7-digit numbers are there? (A “k-digit number”
means a nonnegative integer that has k digits when written in the decimal
system (without leading zeroes). For example, 3902 is a 4-digit number, not
a 5-digit number.)
(b) How many 7-digit numbers are there that have no two equal digits?
(c) How many 7-digit numbers have an even sum of digits?
(d) How many 7-digit numbers are palindromes? (A “palindrome” is a
number such that reading its digits from right to left yields the same number.
For example, 5 and 1331 and 49094 are palindromes.)
[If your answer is a product or power, you do not need to simplify it to a
number.]
If we care about their order, then we are just counting all $k$-tuples in $S^k$. The
answer to this question is simple:
1. We can define the notion of a multiset, which is “like a finite set but
allowing an element to be contained multiple times”. This is done, e.g., in
[Grinbe22, §2.9 (Lectures 21–22)] or (in more detail) in [Grinbe19a, §2.11].
Then, a selection of k arbitrary elements from a set S, disregarding the
order, can be formalized as a size-k multisubset of the set S.
2. Alternatively, we can define the notion of an unordered k-tuple, which
is “a k-tuple up to reordering its entries”. Formally, these unordered
k-tuples are defined as the equivalence classes of usual (i.e., ordered) k-
tuples with respect to a certain equivalence relation. (See, e.g., [Grinbe19a,
Example 3.3.24] for the details.) Then, a selection of k arbitrary elements
from a set S, disregarding the order, can be formalized as an unordered
k-tuple of elements of S.
(# of all ways to select $k$ elements from $S$ (if order does not matter))
$= \binom{k+n-1}{k}$
Informal proof of Theorem 6.6.9 (sketched). For the sake of simplicity, we assume
that S = [n] (since otherwise, we can rename the n elements of S as 1, 2, . . . , n).
Then, as we said above, a selection of $k$ arbitrary elements from $S = [n]$ (disregarding
the order) can be defined as a weakly increasing $k$-tuple in $S^k$. But a
weakly increasing $k$-tuple in $S^k$ must always look as follows:
$\left( \underbrace{1, 1, \ldots, 1}_{a_1 \text{ many 1's}}, \; \underbrace{2, 2, \ldots, 2}_{a_2 \text{ many 2's}}, \; \ldots, \; \underbrace{n, n, \ldots, n}_{a_n \text{ many } n\text{'s}} \right)$
This proves Theorem 6.6.9 (because these weakly increasing $k$-tuples in $S^k$ are
the ways to select k elements from S (if order does not matter)).
For a rigorous proof, see [Grinbe19a, Corollary 2.11.3] (but note that the
meanings of the letters n and k are switched in [Grinbe19a, Corollary 2.11.3]).
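In Python, the weakly increasing k-tuples in $[n]^k$ are exactly what itertools.combinations_with_replacement produces when it is fed the sorted range 1, 2, ..., n. The sketch below (with a helper selections named by me) uses this to check Theorem 6.6.9 for small values. It restricts to n ≥ 1 so that math.comb never receives a negative upper entry.

```python
from itertools import combinations_with_replacement
from math import comb

def selections(n, k):
    """All weakly increasing k-tuples with entries in [n] = {1, 2, ..., n}."""
    return list(combinations_with_replacement(range(1, n + 1), k))

print(selections(3, 2))
# [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)]

for n in range(1, 6):
    for k in range(6):
        assert len(selections(n, k)) == comb(k + n - 1, k)
```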
We have now solved all our four selection problems. We now come to a
different counting problem.
Thus, let us try a new strategy. The word “bookkeeper” has 10 letters.
Hence, any anagram of it is a 10-letter word as well. Its letters are
$a_1$ copies of $s_1$,
$a_2$ copies of $s_2$,
...,
$a_n$ copies of $s_n$
is
$\frac{(a_1 + a_2 + \cdots + a_n)!}{a_1! \cdot a_2! \cdots a_n!} = \prod_{k=1}^{n} \binom{a_k + a_{k+1} + \cdots + a_n}{a_k}$.
Informal proof (sketched). Follow the same logic as we used for “bookkeeper”
above. To construct such a tuple, we
• first choose the positions for the $a_1$ many $s_1$'s among its entries (there are
$\binom{a_1 + a_2 + \cdots + a_n}{a_1}$ many options for this);
• then choose the positions for the $a_2$ many $s_2$'s among its entries (there are
$\binom{a_2 + a_3 + \cdots + a_n}{a_2}$ many options for this);
• then choose the positions for the $a_3$ many $s_3$'s among its entries (there are
$\binom{a_3 + a_4 + \cdots + a_n}{a_3}$ many options for this);
• and so on, until finally choosing the positions for the $a_n$ many $s_n$'s among
its entries (there are $\binom{a_n}{a_n}$ many options for this).
\begin{align*}
& \binom{a_1 + a_2 + \cdots + a_n}{a_1} \binom{a_2 + a_3 + \cdots + a_n}{a_2} \binom{a_3 + a_4 + \cdots + a_n}{a_3} \cdots \binom{a_n}{a_n} \\
&= \prod_{k=1}^{n} \binom{a_k + a_{k+1} + \cdots + a_n}{a_k} \\
&= \prod_{k=1}^{n} \frac{(a_k + a_{k+1} + \cdots + a_n)!}{a_k! \, ((a_k + a_{k+1} + \cdots + a_n) - a_k)!}
   \qquad \text{(by the factorial formula (Theorem 2.5.3))} \\
&= \prod_{k=1}^{n} \frac{(a_k + a_{k+1} + \cdots + a_n)!}{a_k! \, (a_{k+1} + a_{k+2} + \cdots + a_n)!} \\
&= \frac{\prod_{k=1}^{n} (a_k + a_{k+1} + \cdots + a_n)!}{\prod_{k=1}^{n} a_k! \cdot \prod_{k=1}^{n} (a_{k+1} + a_{k+2} + \cdots + a_n)!} \\
&= \frac{(a_1 + a_2 + \cdots + a_n)! \cdot (a_2 + a_3 + \cdots + a_n)! \cdots a_n!}{\prod_{k=1}^{n} a_k! \cdot \left( (a_2 + a_3 + \cdots + a_n)! \cdot (a_3 + a_4 + \cdots + a_n)! \cdots a_n! \cdot 0! \right)} \\
&= \frac{(a_1 + a_2 + \cdots + a_n)!}{\prod_{k=1}^{n} a_k! \cdot 0!}
   \qquad \text{(here, we have cancelled factors that appear both in the numerator and the denominator)} \\
&= \frac{(a_1 + a_2 + \cdots + a_n)!}{\prod_{k=1}^{n} a_k!}
   \qquad \text{(since } 0! = 1 \text{)} \\
&= \frac{(a_1 + a_2 + \cdots + a_n)!}{a_1! \cdot a_2! \cdots a_n!}.
\end{align*}
This proves Theorem 6.7.1.
(For a rigorous proof, see [Grinbe19a, Proposition 2.12.13]. Note that the
objects s1 , s2 , . . . , sn are required to be 1, 2, . . . , n in [Grinbe19a, Proposition
2.12.13], but this makes no serious difference, since we can always rename them
at will.)
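The two expressions in Theorem 6.7.1 can be compared with each other and with a brute-force count. In the sketch below, all function names are mine, and the short word "keeper" is a hypothetical test case chosen so that listing all rearrangements stays cheap. For "bookkeeper", only the formula is evaluated.

```python
from collections import Counter
from itertools import permutations
from math import comb, factorial, prod

def count_by_factorials(multiplicities):
    """(a1 + ... + an)! / (a1! * ... * an!)."""
    return factorial(sum(multiplicities)) // prod(factorial(a) for a in multiplicities)

def count_by_binomials(multiplicities):
    """Product of binomial coefficients: choose positions for one letter at a time."""
    remaining = sum(multiplicities)
    result = 1
    for a in multiplicities:
        result *= comb(remaining, a)  # positions for this letter among the remaining ones
        remaining -= a
    return result

def count_by_brute_force(word):
    """Number of distinct rearrangements of the letters of word."""
    return len(set(permutations(word)))

small = list(Counter("keeper").values())  # multiplicities of k, e, p, r
assert count_by_brute_force("keeper") == count_by_factorials(small) == count_by_binomials(small) == 120

big = list(Counter("bookkeeper").values())  # multiplicities of b, o, k, e, p, r
assert count_by_factorials(big) == count_by_binomials(big)
print(count_by_factorials(big))  # 151200
```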
Remark 6.7.2. We can now answer the question “how many prime factorizations
does a given number have?” from Subsection 3.6.12. For example,
consider the number $600 = 2^3 \cdot 3 \cdot 5^2$. A prime factorization of 600 is a tuple
that consists of three 2’s, one 3 and two 5’s, in an arbitrary order. Thus, the
# of such prime factorizations is $\frac{6!}{3! \cdot 1! \cdot 2!}$ (by Theorem 6.7.1). Similarly, we
can proceed for any positive integer instead of 600.
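For 600, the count is small enough to confirm by listing all orderings of the prime factors. The following is a brute-force sketch of mine, not a method from the text:

```python
from itertools import permutations
from math import factorial, prod

primes = (2, 2, 2, 3, 5, 5)  # the prime factors of 600, with multiplicity
factorizations = set(permutations(primes))  # distinct orderings only

assert all(prod(t) == 600 for t in factorizations)
print(len(factorizations))  # 60
assert len(factorizations) == factorial(6) // (factorial(3) * factorial(1) * factorial(2))
```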
Proof. Nice and fairly easy exercise! (See [Grinbe19a, Exercise 2.12.6] for a
proof.)
Just like the binomial coefficients $\binom{n}{k}$ with $n \in \mathbb{N}$ and $k \in \{0, 1, \ldots, n\}$ can
be arranged into Pascal’s triangle, the multinomial coefficients $\binom{b}{a_1, a_2, \ldots, a_n}$
(for a given $n$) can be arranged into an $n$-dimensional analogue of Pascal’s
triangle, called Pascal’s simplex (or, for n = 3, Pascal’s pyramid). Theorem
6.7.3 then says that each entry in this simplex (except for the 1 at the apex) is
the sum of its n adjacent entries just above it.
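The recurrence behind this picture can be checked numerically. The sketch below assumes (my reading of Theorem 6.7.3) that $\binom{b}{a_1, \ldots, a_n} = \sum_{i=1}^{n} \binom{b-1}{a_1, \ldots, a_i - 1, \ldots, a_n}$ for $b > 0$, where a multinomial coefficient with a negative lower entry is understood to be 0. The function multinomial is my own helper.

```python
from itertools import product
from math import factorial, prod

def multinomial(b, parts):
    """b! / (a1! * ... * an!) if all parts are >= 0 and sum to b; otherwise 0 (assumed convention)."""
    if any(a < 0 for a in parts) or sum(parts) != b:
        return 0
    return factorial(b) // prod(factorial(a) for a in parts)

n = 3  # Pascal's pyramid
for parts in product(range(5), repeat=n):
    b = sum(parts)
    if b == 0:
        continue  # the 1 at the apex has no entries above it
    # Sum over the n adjacent entries in the layer above (lower one a_i at a time).
    above = sum(multinomial(b - 1, parts[:i] + (parts[i] - 1,) + parts[i + 1:])
                for i in range(n))
    assert above == multinomial(b, parts)
```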
Multinomial coefficients owe their name to another fundamental property
they satisfy: a generalization of the binomial formula, called the multinomial
formula:
Proof. See [Grinbe19a, Theorem 2.12.17] (which gives two references). Here is
the simplest proof in a nutshell:
We expand $(x_1 + x_2 + \cdots + x_n)^b$ and collect equal terms. For instance, if
$n = 2$ and $b = 3$, then
\begin{align*}
(x_1 + x_2 + \cdots + x_n)^b
&= (x_1 + x_2)^3 \\
&= (x_1 + x_2)(x_1 + x_2)(x_1 + x_2) \\
&= x_1 x_1 x_1 + x_1 x_1 x_2 + x_1 x_2 x_1 + x_1 x_2 x_2 + x_2 x_1 x_1 + x_2 x_1 x_2 + x_2 x_2 x_1 + x_2 x_2 x_2 \\
&= x_1^3 + 3 x_1^2 x_2 + 3 x_1 x_2^2 + x_2^3.
\end{align*}
What terms do we get for general $n$ and $b$? Well, if we expand the product
$(x_1 + x_2 + \cdots + x_n)^b = \underbrace{(x_1 + x_2 + \cdots + x_n)(x_1 + x_2 + \cdots + x_n) \cdots (x_1 + x_2 + \cdots + x_n)}_{b \text{ times}}$,
Remark 6.7.5. We note that Theorem 6.7.3 can be used to give a second proof of
Theorem 6.7.1. Here is a rough outline of this proof:
A tuple that consists of
$a_1$ copies of $s_1$,
$a_2$ copies of $s_2$,
...,
$a_n$ copies of $s_n$
will be called an $\begin{pmatrix} s_1 & s_2 & \cdots & s_n \\ a_1 & a_2 & \cdots & a_n \end{pmatrix}$-tuple. Thus, Theorem 6.7.1 is claiming that
the # of $\begin{pmatrix} s_1 & s_2 & \cdots & s_n \\ a_1 & a_2 & \cdots & a_n \end{pmatrix}$-tuples is $\binom{b}{a_1, a_2, \ldots, a_n}$, where $b := a_1 + a_2 + \cdots + a_n$.
We shall now prove this by induction on $b$. The base case ($b = 0$) is trivial (since
$b = 0$ entails $a_1 = a_2 = \cdots = a_n = 0$, so we are counting 0-tuples). In the induction
step (from $b - 1$ to $b$), we separate the $\begin{pmatrix} s_1 & s_2 & \cdots & s_n \\ a_1 & a_2 & \cdots & a_n \end{pmatrix}$-tuples according to their
last entry (just as in our above rigorous proof of Theorem 6.6.4). This last entry is
either $s_1$ or $s_2$ or $\cdots$ or $s_n$. Hence, the sum rule yields
\begin{align*}
& \left( \text{\# of } \begin{pmatrix} s_1 & s_2 & \cdots & s_n \\ a_1 & a_2 & \cdots & a_n \end{pmatrix} \text{-tuples} \right) \\
&= \sum_{i=1}^{n} \left( \text{\# of } \begin{pmatrix} s_1 & s_2 & \cdots & s_n \\ a_1 & a_2 & \cdots & a_n \end{pmatrix} \text{-tuples that end with } s_i \right) \\
&= \sum_{i=1}^{n} \left( \text{\# of } \begin{pmatrix} s_1 & s_2 & \cdots & s_{i-1} & s_i & s_{i+1} & \cdots & s_n \\ a_1 & a_2 & \cdots & a_{i-1} & a_i - 1 & a_{i+1} & \cdots & a_n \end{pmatrix} \text{-tuples} \right) \\
& \qquad \text{(by a bijection argument, just as in the proof of Theorem 6.6.4,} \\
& \qquad \text{using the bijection that removes the last entry from a tuple)} \\
&= \sum_{i=1}^{n} \binom{b-1}{a_1, \ldots, a_{i-1}, a_i - 1, a_{i+1}, \ldots, a_n} \\
& \qquad \text{(by the induction hypothesis if } a_i > 0 \text{, and for obvious reasons if } a_i = 0 \text{)} \\
&= \binom{b}{a_1, a_2, \ldots, a_n} \qquad \text{(by Theorem 6.7.3)},
\end{align*}
which completes the induction step. This proof is less conceptual than the proof
we sketched above, but it is easier to formalize, since it does not use the dependent
product rule.
$f' : \operatorname{Fix}(g \circ f) \to \operatorname{Fix}(f \circ g)$,
$x \mapsto f(x)$.
Construct a similar map g′ in the opposite direction. Prove that these two
maps f ′ and g′ are inverse to each other.]
[Hint: What does the pseudolacunarity of a set S mean for the even ele-
ments of S ? What does it mean for the odd elements of S ?]
Theorem 6.9.1 (pigeonhole principles for maps). Let X and Y be two finite
sets. Let f : X → Y be a map. Then:
(a) If | X | > |Y |, then f cannot be injective.
(b) If f is injective and | X | = |Y |, then f is bijective.
90 But beware of extending your intuition to infinite sets! It is easy to construct an injective but
not surjective map f : N → N.
References
[AlNoWo19] Michael H. Albert, Richard J. Nowakowski, David Wolfe, Lessons
in Play: An Introduction to Combinatorial Game Theory, 2nd edition,
CRC Press 2019.
[Grinbe17] Darij Grinberg, UMN Fall 2017 Math 4707 & Math 4990 homework set
#2 with solutions, http://www.cip.ifi.lmu.de/~grinberg/t/17f/hw2s.pdf
[Grinbe19c] Darij Grinberg, Drexel Fall 2019 Math 222 homework set #0 with solutions,
http://www.cip.ifi.lmu.de/~grinberg/t/19fco/hw0s.pdf
[Grinbe23a] Darij Grinberg, An introduction to graph theory (Text for Math 530 in
Spring 2022 at Drexel University), arXiv:2308.04512v1.
[Grinbe23b] Darij Grinberg, Math 235: Mathematical Problem Solving, Fall 2023,
worksheets.
https://www.cip.ifi.lmu.de/~grinberg/t/23f/