
Counting Strategies

Hiram Golze and Matthew Babbitt

June 2020
Contents

1 Counting Review
1.1 The Addition and Multiplication Principles
1.2 Permutations, Word Rearrangements, and the Multinomial Theorem
1.3 Stars and Bars: An Application of Word Rearrangements
1.4 Complementary Counting

2 Combinatorial Identities
2.1 Introduction
2.2 Committee Forming
2.3 Blockwalking and Pascal's Triangle
2.4 Other Identities

3 Principle of Inclusion-Exclusion
3.1 Motivation
3.2 Examples
3.3 Connecting PIE with Complementary Counting

4 Indistinguishability and Rotational Symmetry
4.1 Tables and Necklaces
4.2 Indistinguishable Objects
4.3 Burnside's Lemma

5 Recursion
5.1 Tiling
5.2 Counting Recursively
5.3 Theory of Linear Recurrences

6 Bijections
6.1 Introduction
6.2 Examples
6.3 Partitions
6.4 The Catalan Numbers

7 Invariants and Monovariants
7.1 Invariants
7.2 Monovariants

8 The Pigeonhole Principle
8.1 Introduction
8.2 Trickier Examples

9 Double Counting
9.1 Introduction
9.2 Examples

10 Generating Functions 1
10.1 Introduction
10.1.1 Sequences
10.2 Appendix: Partial Fraction Decomposition

11 Generating Functions 2
11.1 Partitions
11.2 Roots of Unity Filters
11.3 Multiple Variables
11.4 Catalan Numbers

12 Probability
12.1 Basics
12.2 Conditional Probability
12.3 Geometric Probability

13 Expected Value
13.1 Introduction
13.2 Linearity of Expectation
13.3 The Geometric Distribution

14 Graph Theory 1 - Introduction
14.1 Basics
14.2 Parts of a Graph
14.3 Examples

15 Graph Theory 2 - Planarity and Connectedness
15.1 Connectedness
15.2 Planarity
Chapter 1

Counting Review

Combinatorics is fundamentally the study of counting objects. The most basic way of accomplishing this is
enumeration: list the objects out in a row, and start chanting “1, 2, 3, . . . ” until you run out of objects. The
question of counting becomes harder, and therefore more interesting, when we twist the problem in different
ways, for example:
• Counting a list where we are only given a method of constructing individual objects,
• Counting a list of familiar objects with restrictions placed on them,
• Solving a problem phrased in terms of algebra or number theory, which can be re-interpreted as a counting
problem.
These complications efficiently create very difficult problems, so in this first chapter we take the opportunity
to review basic techniques that you should be familiar with before continuing. If this material is new to you,
we highly recommend that you switch to a gentler combinatorics course.

1.1 The Addition and Multiplication Principles


The two most basic counting ideas that students learn first are the addition and multiplication principles.
Recall that given a set S, we denote the size of S, or the cardinality of S, by |S|.
Proposition 1.1.1. (Addition Principle) Given two disjoint (empty intersection) sets A and B, their union
A ∪ B will have |A| + |B| elements.
Proposition 1.1.2. (Multiplication Principle) Given two (not necessarily disjoint) sets A and B, let P be
the set of all pairs (a, b), where a ∈ A and b ∈ B. Then P will have |A| · |B| elements.
These propositions are so basic that their names are taken for granted, and are used implicitly. For exam-
ple, look at the following example, and see where you can find applications of the addition and multiplication
principles.
Example 1.1.1.
Find the number of ways of choosing three vertices of a cube so that they form the vertices of a right triangle.
Solution: Assume, without loss of generality, that we have a unit cube. Then any segment between two vertices must be a cube edge, a face diagonal, or a space diagonal. These have lengths 1, √2, and √3 respectively. Therefore, the only possible side lengths for the right triangle are 1-1-√2 and 1-√2-√3. We count these two cases separately.
The 1-1-√2 triangles must be one half of a face of the cube. The cube has six faces, and there are four ways to choose one half of a face to be a right triangle, so there are 6 · 4 = 24 triangles in this case.
The 1- 2- 3 triangles must include a space diagonal. Once we choose the space diagonal, we need
to choose one cube edge adjacent to this diagonal. Once we make this choice, the face diagonal connecting


these two segments is uniquely determined. Therefore the number of such triangles is equal to the number
of ways to choose a space diagonal and an adjacent cube edge. There are only four space diagonals, and each has six adjacent cube edges, so there are 4 · 6 = 24 right triangles in this case.
The two cases above are disjoint, so the total number of right triangles among the vertices of the cube is
24 + 24 = 48 . 
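For readers who like to verify counts by computer, the case analysis above can be checked by brute force. The following sketch (not part of the original solution) enumerates every triple of cube vertices and tests the Pythagorean condition on squared side lengths:

```python
# Brute-force check: count unordered triples of unit-cube vertices
# that form a right triangle.
from itertools import combinations, product

vertices = list(product((0, 1), repeat=3))  # the 8 vertices of a unit cube

def dist2(p, q):
    """Squared distance between two vertices."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

count = 0
for tri in combinations(vertices, 3):
    a, b, c = sorted(dist2(p, q) for p, q in combinations(tri, 2))
    # With squared side lengths a <= b <= c, the triangle is right
    # exactly when a + b = c (and non-degenerate when a > 0).
    if a > 0 and a + b == c:
        count += 1

print(count)  # 48
```

The only triangle type that gets excluded is the equilateral √2-√2-√2 triangle formed by three face diagonals, which matches the case analysis in the text.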

1.2 Permutations, Word Rearrangements, and the Multinomial Theorem
A popular archetype of combinatorial problem is counting rearrangements. When not dealing with restric-
tions, the problem is stated thus:
Example 1.2.1. How many ways are there to rearrange the numbers in the list
1, 2, 3, . . . , n?
The way to count these arrangements is to construct one iteratively, element by element. There are n
numbers we can choose to be the first list element. After we choose this first element, there are n − 1 numbers remaining for the second element, no matter which first element we chose. Therefore there are n(n − 1) ways to choose the first two elements. Continuing, we have
n · (n − 1) · (n − 2) · · · 2 · 1 = n!
total ways to rearrange the n things in the list.

Perhaps the most famous counting problem of all time is an application of the idea of rearrangements,
and falls under the category of word problems.
Example 1.2.2. How many ways are there to arrange the letters in the word “MISSISSIPPI?”
Despite there being eleven letters, the answer is not 11!, since some of the letters are now indistinguish-
able. The method that we use to count these rearrangements must take this into account.

Solution: Call MISSISSIPPI by W1. We first count the number of arrangements of


W2 = M I1 S1 S2 I2 S3 S4 I3 P1 P2 I4 ,
where we add a subscript to each letter. Since the letters are now distinct, there are 11! rearrangements of
this new “word”.
Now we connect the rearrangements of this new word to the rearrangements of “MISSISSIPPI”. Notice
that there are four I’s, and there are 4! = 24 ways to arrange them when they are distinguishable. When
we make them indistinguishable like in W3 = M I S1 S2 I S3 S4 I P1 P2 I, we lower the number of arrangements by a factor of 24, since for each arrangement of W3 we have 24 arrangements of W2. We conclude that the number of arrangements of W3 is \frac{1}{4!} \cdot 11!.
Similarly, since there are four copies of S and two copies of P in MISSISSIPPI, we need to reduce the count by additional factors of 4! and 2!, so the number of arrangements of this word is

\frac{11!}{4! \cdot 4! \cdot 2!} = 34650.

We can also write this as 11!/(4! · 4! · 2! · 1!), counting the singular M. 
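The division-by-multiplicity argument above packages into a short computation. In the sketch below (the function name `arrangements` is ours, not the text's), we divide 11! by the factorial of each letter's multiplicity:

```python
# Count distinct rearrangements of a word with repeated letters:
# n! divided by the factorial of each letter's multiplicity.
from math import factorial
from collections import Counter

def arrangements(word):
    """Multinomial count of distinct rearrangements of `word`."""
    n = factorial(len(word))
    for repeats in Counter(word).values():
        n //= factorial(repeats)
    return n

print(arrangements("MISSISSIPPI"))  # 34650
```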
The problem of rearranging indistinguishable objects is related to binomials: \binom{n}{k} is the number of ways to rearrange k A's and n − k B's, and is given by

\binom{n}{k} = \frac{n!}{k! \cdot (n − k)!}.
However, the ability to easily count rearrangements for words with many different letters encourages us to
extend this definition.

Definition 1.2.1. A multinomial coefficient is defined by non-negative integers n_1, n_2, \ldots, n_k and their total N = \sum_{\ell=1}^{k} n_\ell. We denote it and define it as

\binom{N}{n_1, n_2, \ldots, n_k} = \frac{N!}{n_1! \cdot n_2! \cdots n_k!}.
Recall that the Binomial Theorem is stated in terms of binomial coefficients:

Theorem 1.2.1. (Binomial Theorem) For every positive integer n, the expansion of (x + y)^n is given by

(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n−k}.

We can easily generalize this to multinomials with our new definition.

Theorem 1.2.2. (Multinomial Theorem) For all positive integers n and k, the expansion of (x_1 + x_2 + \cdots + x_k)^n is given by

(x_1 + x_2 + \cdots + x_k)^n = \sum_{\substack{n_1, \ldots, n_k \ge 0 \\ n_1 + \cdots + n_k = n}} \binom{n}{n_1, n_2, \ldots, n_k} x_1^{n_1} x_2^{n_2} \cdots x_k^{n_k}.
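As a quick numeric sanity check of the Multinomial Theorem (ours, not part of the text), we can expand (x + y + z)^4 by brute force, choosing one variable from each of the four factors, and compare one coefficient with its multinomial formula:

```python
# Expand (x + y + z)^4 by picking one of x, y, z from each factor,
# then compare the coefficient of x^2*y*z with 4!/(2!*1!*1!).
from itertools import product
from math import factorial
from collections import Counter

n = 4
coeffs = Counter()
for choice in product("xyz", repeat=n):
    exponents = (choice.count("x"), choice.count("y"), choice.count("z"))
    coeffs[exponents] += 1  # each choice contributes one monomial

multinomial = factorial(4) // (factorial(2) * factorial(1) * factorial(1))
print(coeffs[(2, 1, 1)], multinomial)  # 12 12
```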

1.3 Stars and Bars: An Application of Word Rearrangements


Stars and Bars is a phrase that refers to a family of problems that, with some work, can be reduced to rearranging a string of stars ? and bars |. This rearrangement problem is quite easy; the difficulty in this method stems from having to recognize the relationship between the problem setup and arranging stars and bars.

Example 1.3.1. The number of ways to arrange n stars ? and k bars | is \binom{n+k}{k}.

To solve the following example, we are going to relate it to the previous one.
Example 1.3.2.
(a) What is the connection between part (b) and arranging stars and bars?
(b) How many non-negative integer solutions (a, b, c, d) are there to a + b + c + d = 10?
Solution:
(a) Let us think about what stars and bars can represent. Stars can represent different objects, and bars
are normally thought of as separators. So, if we have some solution (a, b, c, d) to a + b + c + d = 10, we can list out the
values of the variables with stars, and separate them with bars. So there are a stars before the first bar,
b between the first and second, c between the second and third, and d after the third. In other words:

Each solution (a, b, c, d) to a + b + c + d = 10 corresponds to an arrangement of 10 stars and 3 bars.

We can see that different arrangements of 10 stars and 3 bars give different solutions (a, b, c, d), and
each solution can be obtained by some arrangement (since we can go from solutions to arrangements).
Therefore, the number of arrangements must be equal to the number of solutions.

(b) The number of ways to arrange 10 stars and 3 bars is \binom{13}{3} = 286, so there are 286 non-negative integer solutions to a + b + c + d = 10.
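The correspondence can also be verified computationally; this brute-force sketch (ours, not the text's) counts the solutions directly and compares with the stars-and-bars binomial:

```python
# Count non-negative integer solutions to a + b + c + d = 10 directly.
from itertools import product
from math import comb

# d = 10 - a - b - c is forced, so three free variables suffice.
brute = sum(1 for a, b, c in product(range(11), repeat=3) if a + b + c <= 10)
print(brute, comb(13, 3))  # 286 286
```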

An immediate question we can ask is: if we change the conditions of the problem, will we still be able to
use the same method? The answer is, in most cases, yes.

Example 1.3.3. Find the number of positive integer solutions (a, b, c, d) to a + b + c + d = 10.

Solution: We present two methods to approaching this problem.

Method 1: When we consider our arrangement of 10 stars and 3 bars, we need the spaces before and
after each bar to have at least one star. This is equivalent to having the bars each adjacent to two stars. To
construct such an arrangement, we first place 10 stars in a row:

? ? ? ? ? ? ? ? ? ?

Then we need to place the three bars in three different slots between the stars. The stars form 9 slots, so this gives \binom{9}{3} = 84 arrangements. These are the arrangements that correspond to positive solutions (a, b, c, d), so there are 84 positive integer solutions to a + b + c + d = 10.

Method 2: Let a = i + 1, b = j + 1, c = k + 1, and d = ℓ + 1. Then i, j, k, and ℓ form a non-negative integer solution to the equation

i + j + k + ℓ + 4 = 10.

Thus we are looking for the non-negative solutions to i + j + k + ℓ = 6, and by an argument analogous to the one in the last example, there are \binom{6+3}{3} = 84 of these. 
Notice that the second method generalizes very well: if we are looking for integer solutions (a, b, c, d)
to some equation a + b + c + d = n under the condition that each variable is at least k, we can make a
substitution (subtracting k from each variable) so that each variable is instead non-negative, which reduces the problem to a previous
one.
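This substitution generalizes into a small formula, sketched below; the helper name `solutions` and its parameters are our own illustration, not notation from the text:

```python
# Integer solutions to x1 + ... + x_nvars = total with each xi >= minimum:
# substitute xi = yi + minimum to reduce to the non-negative case,
# then apply stars and bars.
from math import comb

def solutions(total, nvars, minimum=0):
    """Count integer solutions with every variable at least `minimum`."""
    reduced = total - nvars * minimum
    if reduced < 0:
        return 0
    return comb(reduced + nvars - 1, nvars - 1)

print(solutions(10, 4))             # 286 non-negative solutions
print(solutions(10, 4, minimum=1))  # 84 positive solutions
```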
There are many different settings for stars and bars to show up, but they usually rest on finding integer
solutions to some restriction. Here are some examples of other restrictions:

• Finding non-negative solutions to a + b + c ≤ n. This is equivalent to finding all non-negative solutions to a + b + c + d = n, where we let d be the difference between a + b + c and n.

• Finding the number of solutions to 0 ≤ a ≤ b ≤ c ≤ n. In this case, we can consider the differences
d1 = a − 0, d2 = b − a, d3 = c − b, and d4 = n − c. We need these differences to sum to n, and each
solution to d1 + d2 + d3 + d4 = n corresponds to a unique solution to 0 ≤ a ≤ b ≤ c ≤ n.

• Finding solutions with parity restrictions, i.e. a + b + c + d = 10 given that the variables are odd. This is
done similarly as finding positive solutions, i.e. by letting a = 2i + 1, b = 2j + 1, c = 2k + 1, and d = 2` + 1.

• Finding the number of solutions under an upper-bound restriction, i.e. a + b + c + d + e + f = 30 given that
the variables are between 1 and 6 inclusive. Here we can take advantage of the symmetry of the bounds:
letting a = 7 − i, etc. gives a new equation: we need to find six positive integers at most 6 that sum to
12. This is easier to count.
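The symmetry in the last bullet can be confirmed by brute force. The following sketch (ours) counts both the original equation and the substituted one directly:

```python
# Solutions to a+b+c+d+e+f = 30 with each variable in 1..6, and the
# count after substituting each variable v -> 7 - v (target sum 12).
from itertools import product

count30 = sum(1 for t in product(range(1, 7), repeat=6) if sum(t) == 30)
count12 = sum(1 for t in product(range(1, 7), repeat=6) if sum(t) == 12)
print(count30, count12)  # the substitution is a bijection, so these are equal
```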

Before we continue, we issue the following warning.

Warning: There is no stars and bars “formula”. If you solve a proof problem and your entire solution is
“By stars and bars the answer is 192”, then your proof will earn no points. This is because stars and bars is
a method, not a formula. The method is to reinterpret the problem as finding the number of arrangements
of stars and bars. The restrictions on your arrangements will depend on how you make them correspond
to the problem. As we saw with the non-negative and positive integer solutions to the linear equation
a + b + c + d = 10, there are different ways to relate problems to stars and bars. In your solution, you must
clearly explain how your objects correspond to stars and bars rearrangements.

Example 1.3.4. (AIME; 2008) There are two distinguishable flagpoles, and there are 19 flags, of which 10
are identical blue flags, and 9 are identical green flags. Let N be the number of distinguishable arrangements
using all of the flags in which each flagpole has at least one flag and no two green flags on either pole are
adjacent. Find the remainder when N is divided by 1000.

Solution: Since the green flags are not allowed to be adjacent, they will probably represent bars, and the
blue flags will represent stars. However, we’ve been given the added complication that we need to distribute
the blue and green flags between the poles. For each distribution, we would need to solve two stars and bars
problems, and then we would need to enumerate all distributions. It would be much more convenient if all
of the flags were on one pole.

To put all the flags on one pole, we're going to combine the two poles into one in a way that preserves the number of arrangements. One way we could try to do this is to put one flagpole
on top of another, but we would then lose the information of where one pole ends and another pole begins.
We account for this by including a twentieth red flag: this red flag will indicate where the first pole ends and
the second pole begins. For example, the following two flagpoles would be combined in the following way:

[Figure: two example flagpoles, Pole A and Pole B, stacked into a single combined pole, with a red flag marking where Pole A ends and Pole B begins.]

We then have the following restrictions:

(a) No two green flags may be adjacent,

(b) Since each flagpole must have at least one flag, the red separator flag may not be on either end of the
combined flagpole.

We can confirm that all of these flag arrangements on the combined flagpole correspond to all the flag
arrangements on the original two flagpoles.
Now suppose hypothetically that the red flag were actually blue. Then we would need to find the number of ways to place 9 green flags among 11 blue ones so they are not adjacent. The blue flags form 12 slots, including the slots at the ends, so we would have \binom{12}{9} arrangements. Once we change the blue flag back to red, and ignore condition (b), there are 11 ways to arrange the 10 blue and 1 red flags once we place the green ones, so there would be 11 \cdot \binom{12}{9} = 2420 arrangements.
Unfortunately, the restriction that the red flag not be at the ends complicates our counting process: if green flags are at both ends, then the red flag automatically satisfies the condition and all 11 arrangements of blue and red flags work, but if an end is not green, then it could be either blue or red, so some of the 11 arrangements will not give valid flag placements. We get around this in the following way: we count all of the arrangements satisfying (a), regardless of whether they satisfy (b), and then we count all of the arrangements satisfying (a) that break (b).

The number of arrangements that satisfy (a) is 2420, as determined before. If we wanted to break (b),
there are two ways to do it: either the red flag is on the bottom, or the top. If we place the red flag on
the bottom, then the relative arrangement of the red and blue flags is fixed, and we need to place the 9 green flags into the 11 remaining slots (since we can't place any green flags below the red one). This gives \binom{11}{9} = 55 arrangements. We also get 55 arrangements if the red flag is on the top, so there are 110 total
arrangements that satisfy (a) but not (b). Therefore the number of arrangements that satisfy both (a) and
(b) is 2420 − 110 = 2310. The remainder when this is divided by 1000 is 310 . 
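As an independent check of this answer (ours, not the intended AIME method), we can brute-force the original two-pole problem, trying every split of the flags between the poles and every placement of the green flags:

```python
# Distribute 10 identical blue and 9 identical green flags over two
# distinguishable poles, each pole nonempty, no two greens adjacent.
from itertools import combinations

def pole_arrangements(blue, green):
    """Arrangements of one pole with no two green flags adjacent."""
    n = blue + green
    return sum(
        1
        for pos in combinations(range(n), green)
        if all(b - a > 1 for a, b in zip(pos, pos[1:]))
    )

N = 0
for b1 in range(11):          # blue flags on pole 1
    for g1 in range(10):      # green flags on pole 1
        b2, g2 = 10 - b1, 9 - g1
        if b1 + g1 == 0 or b2 + g2 == 0:
            continue          # each pole needs at least one flag
        N += pole_arrangements(b1, g1) * pole_arrangements(b2, g2)

print(N, N % 1000)  # 2310 310
```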
Notice that in the previous problem, we counted some group of objects satisfying some condition by
figuring out how many didn’t satisfy that condition. This leads into our final review technique.

1.4 Complementary Counting


Complementary counting is the idea that if you have a set of objects S, and a subset T ⊆ S that you want
to count, it is equivalent to count the number of elements in S that are not in T , i.e. it is equivalent to find
|S \ T |. Then you can subtract this from |S| to get |T |. This technique is most useful when T is hard to
count, but the complement set S \ T is not.

Example 1.4.1. (AIME; 2004) A convex polyhedron P has 26 vertices, 60 edges, and 36 faces, 24 of
which are triangular, and 12 of which are quadrilaterals. A space diagonal is a line segment connecting two
non-adjacent vertices that do not belong to the same face. How many space diagonals does P have?

Solution: We can easily count the number of segments connecting two arbitrary vertices: this is \binom{26}{2} = 325. However, counting space diagonals is much more involved, because we don't know the geometry of P.
Instead, we count all of the segments that are not space diagonals, and subtract this from our total of 325.
The segments that are not space diagonals are either edges or face diagonals. There are 60 edges. Also, triangles have no interior diagonals and quadrilaterals have two interior diagonals, hence there are a total of 2 · 12 = 24 face diagonals. Thus the total number of segments that do not lie on the surface of the polyhedron is 325 − 60 − 24 = 241. This is the number of space diagonals. 
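The arithmetic of this solution is short enough to check in one line (a sketch, ours):

```python
# Complementary count: all vertex pairs, minus edges and face diagonals.
from math import comb

total_segments = comb(26, 2)        # segments between any two vertices
edges = 60
face_diagonals = 24 * 0 + 12 * 2    # triangles have none, quadrilaterals two
print(total_segments - edges - face_diagonals)  # 241
```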
Complementary counting can be a powerful problem-solving tool in applicable problems. However, prob-
lems using this technique aren’t always apparent. Here are a couple of pointers for recognizing complementary
counting problems:

• If a problem asks you to count the objects that satisfy at least one of several requirements, then the
complement is the objects that satisfy none of the requirements. This can be much easier to count, as
otherwise you would have to figure out which objects satisfy which combinations of conditions.
• If a problem gives a restriction that’s only broken by a small number of objects, as in the previous example,
then it is probably a good idea to directly count those complementary objects.

This list is not comprehensive, but as you solve more combinatorics problems, you will get better at recog-
nizing ones that can be solved with complementary counting.
Chapter 2

Combinatorial Identities

2.1 Introduction
A combinatorial identity is an equation, true for all values of its parameters, that can be proven by appealing to counting arguments. In a combinatorial proof, we generally try to show that two expressions are equal by showing that they count the same thing. Combinatorial proofs are often more elucidating than algebraic proofs, allowing us to prove statements through storytelling.
In this lecture, we will discuss some of the important characters and methods of combinatorial arguments.

Definition 2.1.1. If n and k are nonnegative integers, we define \binom{n}{k} to be the number of ways to pick k objects from n objects if order does not matter. Sometimes, these are called binomial coefficients.

Having defined this, we will make our first combinatorial argument to prove the formula for \binom{n}{k}.

Proposition 2.1.1. If n and k are nonnegative integers with k ≤ n, then

n! = \binom{n}{k} \cdot k! \cdot (n − k)!.

Proof. The left hand side counts the number of ways to order a set with n elements. On the right hand side, we begin by choosing k elements to occupy the first k positions, which can be done in \binom{n}{k} ways. These k elements can be ordered in k! ways. Additionally, we also need to order the remaining (n − k) elements. This can be done in (n − k)! ways. So overall, we can order the n elements in \binom{n}{k} \cdot k! \cdot (n − k)! ways. We conclude that

n! = \binom{n}{k} \cdot k! \cdot (n − k)!.

From this, we obtain the algebraic formula for \binom{n}{k} as an easy corollary.

Corollary 2.1.2. If n and k are nonnegative integers with k ≤ n, then

\binom{n}{k} = \frac{n!}{k!\,(n − k)!}.

Most of the time, we don’t evaluate every single factorial in this formula, because many terms cancel. In
fact, the following definition provides a nice way to formalize these cancellations.

Definition 2.1.2. For any real number α and nonnegative integer k, we can define the falling factorial (α)k
as
α · (α − 1) · (α − 2) · · · (α − (k − 1)).
This is the product of k numbers starting with α. By convention, (α)0 = 1 for all α.


With this definition, we can define binomial coefficients for more than just nonnegative integers.

Definition 2.1.3. Given a real number α and a nonnegative integer k, generalized binomial coefficients are defined as

\binom{\alpha}{k} = \frac{(\alpha)_k}{k!}.

Note that these agree with our previous formulas, but they extend the meaning of binomial coefficients. In particular, note that if k > n, then one of the factors in the falling factorial (n)_k will be 0, so \binom{n}{k} will be 0 when k > n.
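These definitions translate directly into code. In the sketch below (function names are ours), exact rational arithmetic lets us evaluate generalized binomial coefficients such as α = 1/2, while agreeing with the ordinary binomial coefficient for nonnegative integers:

```python
# Falling factorial (alpha)_k and the generalized binomial coefficient.
from math import comb, factorial
from fractions import Fraction

def falling(alpha, k):
    """(alpha)_k = alpha * (alpha - 1) * ... * (alpha - (k - 1))."""
    result = Fraction(1)
    for i in range(k):
        result *= Fraction(alpha) - i
    return result

def binomial(alpha, k):
    """Generalized binomial coefficient: (alpha)_k / k!."""
    return falling(alpha, k) / factorial(k)

print(binomial(5, 2))               # 10, agrees with math.comb(5, 2)
print(binomial(4, 7))               # 0, since k > n
print(binomial(Fraction(1, 2), 2))  # -1/8
```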
While these formulas are useful in their own right, you are not allowed to use them today. Today, we
will focus on the art of using combinatorial arguments to prove things. While algebra can be a useful tool,
it is often less elegant and masks the true nature of mathematical truths.
Now we will go over some of the more common forms of stories told in combinatorial arguments.

2.2 Committee Forming


One of the most common stories that we tell when making a combinatorial argument is committee forming. Note that \binom{n}{k} counts the number of ways to form a committee of size k from a group of n people. In using
committee forming, one usually counts the same scenario differently, often taking advantage of casework.
Consider the following example.

Example 2.2.1. (Pascal's Identity) Prove that if n and k are nonnegative integers, then

\binom{n+1}{k+1} = \binom{n}{k+1} + \binom{n}{k}.

Proof. The left hand side counts the number of ways to form a committee of size k + 1 from n + 1 people. On
the other hand, suppose that one of the n + 1 people is named George. George has two possibilities—either
he is on the committee or off the committee.

• If George is not on the committee, then we still need to pick k + 1 people for the committee from the other n people. This can be done in \binom{n}{k+1} ways.

• If George is on the committee, then we need to pick the remaining k people for the committee from the other n people. This can be done in \binom{n}{k} ways.

Together, these two cases cover all possibilities. Hence, matching the two different ways of counting the same thing, we deduce that

\binom{n+1}{k+1} = \binom{n}{k+1} + \binom{n}{k}.

Example 2.2.2. (Symmetric Identity) Prove that if n is a nonnegative integer and 0 ≤ k ≤ n, then

\binom{n}{k} = \binom{n}{n−k}.

Proof. The left hand side counts the number of ways to choose a committee of size k from a group of n people. Alternatively, if we want k people to be on the committee, we need n − k people to be off the committee, and then the remaining k people will be on the committee. Choosing a group of n − k people to be off the committee can be done in \binom{n}{n−k} ways. Therefore, since both situations end up with the same result, we deduce that

\binom{n}{k} = \binom{n}{n−k}.

Example 2.2.3. Prove that if n is a nonnegative integer, then

\binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{n} = 2^n.

Proof. Suppose that we have n people. Since \binom{n}{k} counts the number of ways to form a committee of size k, we see that the left hand side counts the number of ways to form a committee of any size from n people.
Alternatively, suppose we walk up to each person and ask, “Would you like to be on the committee?” Each person clearly has 2 choices. Therefore, since each of the n people has 2 choices, there are a total of 2^n ways to form a committee. Therefore,

\binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{n} = 2^n.

For the next identity, we will state it in a form that is easier to remember, though it is easier to prove
with a slight modification.
Proposition 2.2.1. (Absorption Identity) If n and k are positive integers, then

\binom{n}{k} = \frac{n}{k} \binom{n−1}{k−1}.

Proof. Note that this is equivalent to proving

k \cdot \binom{n}{k} = n \cdot \binom{n−1}{k−1}.

To interpret the left side of the identity, we begin by selecting a committee of size k from n people, which we can do in \binom{n}{k} ways. After selecting the committee, we elect a chairperson of the committee from among the k committee members. The number of ways to select a committee of size k from a group of n people, with a chairperson chosen who is also on the committee, is therefore k \cdot \binom{n}{k}.
On the other hand, instead of selecting the committee members first, we could pick the chairperson first. The chairperson could be any of the n people. Since the chairperson must be on the committee, we have k − 1 spots left on the committee, and we must fill them from the remaining n − 1 people. This can be done in \binom{n−1}{k−1} ways. Therefore, in our alternate count, the number of ways to select a committee of size k from a group of n people with a chairperson chosen who is also on the committee is n \cdot \binom{n−1}{k−1}. It follows that

k \cdot \binom{n}{k} = n \cdot \binom{n−1}{k−1}.

In the previous proof, note that our first step was to multiply by k, thereby clearing denominators.
This is generally a good idea when making combinatorial arguments—addition and multiplication have
easy combinatorial interpretations (casework and independent events, respectively), while division is usually
harder to work with.
Example 2.2.4. Prove that if n ≥ 1, then

\binom{n}{0} + \binom{n}{2} + \binom{n}{4} + \cdots = \binom{n}{1} + \binom{n}{3} + \binom{n}{5} + \cdots.
Proof. The left hand side counts the number of ways to form a committee of even size, while the right hand
side counts the number of ways to form a committee of odd size. When n is odd, we can argue that choosing
a committee of even size is the same as choosing an anti-committee (everyone not on the committee) of odd
size. Therefore, when n is odd, the result is true.
However, when n is even, the complement of a committee of even size also has even size. To deal with
this, we will construct a committee as follows:

Step 1: Go to the first n − 1 people and ask them if they would like to be on the committee. Each person has
two choices: either they are on the committee or off the committee.
Step 2: The final person looks at the committee that has already been formed. They can choose to be either
on the committee or off the committee. However, one choice will lead to the committee having odd
size, and the other choice will lead to the committee having even size.
Therefore, the final person always has two choices: make the committee have even size or make the committee
have odd size. Hence every even committee corresponds to exactly one odd committee, so the two quantities
must be equal.
Note: One easy consequence of the above proof is that the number of odd committees (and likewise the number of even committees) that can be formed from n people is $2^{n-1}$, because each of the initial n − 1 people has 2 choices.
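Both the equality of the two sums and the value $2^{n-1}$ can be verified numerically; the following Python sketch (our addition) checks them together:

```python
from math import comb

for n in range(1, 16):
    evens = sum(comb(n, k) for k in range(0, n + 1, 2))
    odds = sum(comb(n, k) for k in range(1, n + 1, 2))
    # the even and odd committee counts agree, and each is half of 2^n
    assert evens == odds == 2 ** (n - 1)
print("even/odd committee counts verified for n < 16")
```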

2.3 Blockwalking and Pascal’s Triangle


Our next form of story is that of blockwalking. In a blockwalking argument, we make a connection with
counting paths on a city block.
Example 2.3.1. How many ways can you travel from point A(0, 0) to point B(6, 4) using exactly 10 steps,
if each step has length 1? One such path is shown below.
(figure: a sample 10-step path from A(0, 0) to B(6, 4))

Solution: While this might initially be challenging to count directly, it is easier to start with a smaller grid.
For example, the number of 1-step paths from A to (1, 0) is 1, and the number of 1-step paths from A to
(0, 1) is 1. Next, the number of paths from A to (1, 1) is 2. We can iteratively construct these numbers, because any
path that ends at (x, y) must have passed through either (x, y − 1) or (x − 1, y) as the penultimate point.
Therefore, the number of paths ending at (x, y) is equal to the number of paths that pass through (x, y − 1)
plus the number of paths that pass through (x − 1, y). Using this idea, we can successively compute the
number of paths through each point, which we label in the diagram below.
1   5   15   35   70   126   210 (B)
1   4   10   20   35    56    84
1   3    6   10   15    21    28
1   2    3    4    5     6     7
1   1    1    1    1     1     1
(A, at the bottom-left corner)

It follows that there are 210 paths from A to B. □
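The iterative construction above is exactly a dynamic-programming computation, and it is easy to automate. Here is a short Python sketch (the function name `count_paths` is ours, not the book's):

```python
def count_paths(width, height):
    # paths[x][y] = number of up/right lattice paths from (0, 0) to (x, y),
    # built iteratively just as in the labeled grid above
    paths = [[0] * (height + 1) for _ in range(width + 1)]
    paths[0][0] = 1
    for x in range(width + 1):
        for y in range(height + 1):
            if x == y == 0:
                continue
            from_left = paths[x - 1][y] if x > 0 else 0
            from_below = paths[x][y - 1] if y > 0 else 0
            paths[x][y] = from_left + from_below
    return paths[width][height]

print(count_paths(6, 4))  # 210
```

The same function handles the 40 × 60 grid mentioned next with no extra effort, which is precisely why computers like this recurrence even when humans prefer the closed form.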


However, if we changed the grid to a 40 × 60 grid, this would become significantly more tedious; we
would have to write out 41 · 61 different numbers. To deal with this, we will tell a blockwalking story.
Solution: Every path from A to B must consist of six right steps and four up steps. Therefore, we can

represent a path as a rearrangement of the word RRRRRRUUUU. For example, the sample path given
in the problem statement can be represented by RRURUURRUR. Conversely, every rearrangement of
the word RRRRRRUUUU will correspond to a valid path. The number of rearrangements of the word
RRRRRRUUUU is the number of ways that we can place 6 R's into 10 spaces, which is $\binom{10}{6}$.
This evaluates to 210, as before. □
You may recognize the numbers in the first solution from Pascal’s triangle. In fact, if we turn a city block
diagonally, we will see Pascal’s triangle exactly.
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
6 15 20 15 6
21 35 35 21
56 70 56
126 126
252

Using our combinatorial reimagining of blockwalking, where the number of paths from (0, 0) to (x, y) is
$\binom{x+y}{x}$, we can rewrite Pascal's triangle in the following form.

$\binom{0}{0}$
$\binom{1}{0}$ $\binom{1}{1}$
$\binom{2}{0}$ $\binom{2}{1}$ $\binom{2}{2}$
$\binom{3}{0}$ $\binom{3}{1}$ $\binom{3}{2}$ $\binom{3}{3}$
$\binom{4}{0}$ $\binom{4}{1}$ $\binom{4}{2}$ $\binom{4}{3}$ $\binom{4}{4}$
$\binom{5}{0}$ $\binom{5}{1}$ $\binom{5}{2}$ $\binom{5}{3}$ $\binom{5}{4}$ $\binom{5}{5}$
$\binom{6}{1}$ $\binom{6}{2}$ $\binom{6}{3}$ $\binom{6}{4}$ $\binom{6}{5}$
$\binom{7}{2}$ $\binom{7}{3}$ $\binom{7}{4}$ $\binom{7}{5}$
$\binom{8}{3}$ $\binom{8}{4}$ $\binom{8}{5}$
$\binom{9}{4}$ $\binom{9}{5}$
$\binom{10}{5}$
Example 2.3.2. Reprove the following identities using blockwalking:

(a) Pascal's Identity: $\binom{n+1}{k+1} = \binom{n}{k} + \binom{n}{k+1}$.

(b) Symmetric Identity: $\binom{n}{k} = \binom{n}{n-k}$.

(c) $\binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{n} = 2^n$.

Solution:

(a) $\binom{n+1}{k+1}$ represents the number of (n + 1)-step paths from (0, 0) to (k + 1, n − k). Note that any path that
ends at (k + 1, n − k) must have passed through either (k, n − k) or (k + 1, n − k − 1) in the previous
step. Therefore, the number of (n + 1)-step paths from (0, 0) to (k + 1, n − k) is equal to the number of
n-step paths from (0, 0) to (k, n − k) plus the number of n-step paths from (0, 0) to (k + 1, n − k − 1).
Hence
$$\binom{n+1}{k+1} = \binom{n}{k} + \binom{n}{k+1}.$$

(b) $\binom{n}{k}$ represents the number of n-step paths from (0, 0) to (k, n − k). Such paths must use k right steps
and n − k up steps. Alternatively, we can reflect these paths across the line y = x, so that right steps
become up steps and up steps become right steps. Under this reflection, a path from (0, 0) to (k, n − k)
becomes a path from (0, 0) to (n − k, k). There are $\binom{n}{n-k}$ such paths, so $\binom{n}{k} = \binom{n}{n-k}$.
(c) The left hand side represents the number of paths of length n starting at the origin and ending at any
point, so long as all steps are either right or up. Alternatively, for each step, we have 2 choices: we can
choose to step to the right or step up. Thus there are $2^n$ such paths, so
$$\binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{n} = 2^n.$$


As an example of blockwalking in action, consider the following example.
Example 2.3.3. Prove that
$$\binom{2n}{n} = \binom{n}{0}^2 + \binom{n}{1}^2 + \binom{n}{2}^2 + \cdots + \binom{n}{n}^2.$$

Solution: Note that $\binom{2n}{n}$ counts the number of ways to walk from A(0, 0) to B(n, n) in 2n steps. However,
any path from (0, 0) to (n, n) must pass through exactly one point (x, y) with x + y = n; these are the points
on the anti-diagonal of the grid.
To compute the number of paths passing through (k, n − k), note that we must have an initial path from A
to (k, n − k), followed by another path from (k, n − k) to B. The number of paths from A to (k, n − k) is $\binom{n}{k}$,
while the number of paths from (k, n − k) to (n, n) is also $\binom{n}{k}$ by symmetry. Therefore, the total number of
paths from (0, 0) to (n, n) that pass through (k, n − k) is $\binom{n}{k}^2$. Summing these over all possible k, we deduce
that the total number of paths is the right side of the equation, so
$$\binom{2n}{n} = \binom{n}{0}^2 + \binom{n}{1}^2 + \binom{n}{2}^2 + \cdots + \binom{n}{n}^2. \qquad \square$$
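This sum-of-squares identity (a special case of Vandermonde's identity) is another one that a two-line Python check (our addition) confirms instantly:

```python
from math import comb

# C(2n, n) counts all paths; the right side groups them by the
# anti-diagonal point (k, n - k) each path must cross
for n in range(0, 25):
    assert comb(2 * n, n) == sum(comb(n, k) ** 2 for k in range(n + 1))
print("sum-of-squares identity verified for n < 25")
```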


2.4 Other Identities


In this section, we will demonstrate proofs to a few identities using similar techniques. Can you come up
with better stories?
Example 2.4.1. If n is a nonnegative integer, prove that
$$1\cdot\binom{n}{0} + 4\cdot\binom{n}{1} + 4^2\cdot\binom{n}{2} + \cdots + 4^n\cdot\binom{n}{n} = 5^n.$$
Solution: A school has four different science clubs and one history club. Each student must choose
exactly one club to be a member of. Note that each student has five choices for which club to join, so overall,
there are 5n ways for the students to choose clubs to join.
On the other hand, we can count this by doing casework on how many students are in science clubs. If
k students are in science clubs, then there are $\binom{n}{k}$ ways to select the students who are in science clubs (the
remaining n − k students join the history club). Each of these k students must choose membership in one of
the four science clubs, so each has 4 choices. Therefore, there are $4^k \cdot \binom{n}{k}$ ways for the students to choose
clubs when exactly k of them join science clubs.
Since both methods count the same thing, we conclude that
$$1\cdot\binom{n}{0} + 4\cdot\binom{n}{1} + 4^2\cdot\binom{n}{2} + \cdots + 4^n\cdot\binom{n}{n} = 5^n. \qquad \square$$

The next two identities are so named because of their shape when drawn in Pascal’s triangle.
Example 2.4.2. (Hockey Stick Identities) Prove that
$$\binom{n}{0} + \binom{n+1}{1} + \binom{n+2}{2} + \cdots + \binom{n+m}{m} = \binom{n+m+1}{m}$$
and
$$\binom{n}{n} + \binom{n+1}{n} + \binom{n+2}{n} + \cdots + \binom{n+m}{n} = \binom{n+m+1}{n+1}.$$

1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
6 15 20 15 6
21 35 35 21
56 70 56
126 126
252

This can be proven in many ways—by committee forming, blockwalking, or inductive use of Pascal’s triangle.
We will illustrate a proof using committee forming.
Proof. First, note that the two identities are equivalent by use of the symmetric identity. We will prove the
second version. 
First, note that $\binom{n+m+1}{n+1}$ represents the number of ways to form a committee of size n + 1 from a group
of n + m + 1 people.

To count the left side, suppose that the n + m + 1 people arrive at the committee-forming event, and
you give each person a number (from 0 to n + m) based on the order in which they arrive. To choose the
committee, you start by picking person k to be the "lowest-numbered person" on the committee (so the
people with numbers 0 through k − 1 are not on the committee). This leaves n + m − k candidates, namely
the people with numbers k + 1, . . . , n + m. From these candidates, you must choose n more people to
be on the committee. Therefore, there are $\binom{n+m-k}{n}$ ways to select a committee such that person k is the
"lowest-numbered person" on the committee. If the last n + 1 people are the people chosen for the committee,
then the "lowest-numbered person" will be person m. Thus the lowest-numbered person can be anyone
from person 0 to person m.
Adding the number of committees for each choice of k from k = 0 to k = m, we find that the total
number of committees that can be formed is
$$\binom{n}{n} + \binom{n+1}{n} + \binom{n+2}{n} + \cdots + \binom{n+m}{n}.$$

From this, the identity clearly follows.
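A numerical check of the hockey stick identity over a range of n and m (a Python sketch of ours) guards against the off-by-one errors that this kind of argument invites:

```python
from math import comb

for n in range(0, 12):
    for m in range(0, 12):
        # sum down the "stick": C(n, n) + C(n+1, n) + ... + C(n+m, n)
        lhs = sum(comb(n + j, n) for j in range(m + 1))
        assert lhs == comb(n + m + 1, n + 1)
print("hockey stick identity verified for n, m < 12")
```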

For the next identity, we will present three proofs. The first is another committee-forming story, while the
latter two use known combinatorial identities algebraically to help us form a proof. This illustrates
the utility of knowing several combinatorial identities, because sometimes a proof is easier to obtain
algebraically than combinatorially.

Example 2.4.3. Prove that
$$1\cdot\binom{n}{1} + 2\cdot\binom{n}{2} + 3\cdot\binom{n}{3} + \cdots + n\cdot\binom{n}{n} = n\cdot 2^{n-1}.$$

Proof. On the left-hand side, note that terms are of the form $k\cdot\binom{n}{k}$. We know that we can choose a committee
of size k in $\binom{n}{k}$ ways. Having selected a committee, we can elect a chairperson from the k members in k
ways. Therefore, the number of ways to elect a committee of size k with a chairperson is $k\cdot\binom{n}{k}$. Adding this
over all k, we obtain the left hand side (note that it is impossible for a committee to have size 0—since the
chairperson must also be on the committee, the committee has size at least 1).
On the other hand, instead of selecting the committee first, we can select the chairperson of our committee
from among the n people in n ways. After this, each of the remaining n − 1 people has 2 choices—either
they are on the committee or off the committee. Therefore, the committee/chairperson combination can be
selected in n · 2n−1 ways. Thus the left hand side and right hand side both count the same thing, so they
must be equal.

Proof. (By Absorption)

Applying the absorption identity $\binom{n}{k} = \frac{n}{k}\binom{n-1}{k-1}$ to each $\binom{n}{k}$ on the left hand side, we find that the left hand side is equal
to
$$1\cdot\frac{n}{1}\cdot\binom{n-1}{0} + 2\cdot\frac{n}{2}\cdot\binom{n-1}{1} + 3\cdot\frac{n}{3}\cdot\binom{n-1}{2} + \cdots + n\cdot\frac{n}{n}\cdot\binom{n-1}{n-1}.$$
This simplifies to
$$n\left[\binom{n-1}{0} + \binom{n-1}{1} + \binom{n-1}{2} + \cdots + \binom{n-1}{n-1}\right].$$
From a previous identity, we know that the sum of the (n − 1)st row of Pascal's triangle is $2^{n-1}$. Therefore,
this expression simplifies to $n \cdot 2^{n-1}$, which is what we wanted to prove.

Proof. (Symmetry and Sum Manipulation)

Let S be the value of the sum on the left hand side. We can reverse the sum, noting that by the Symmetric
Identity, $\binom{n}{k} = \binom{n}{n-k}$. Therefore,
$$S = 1\cdot\binom{n}{1} + 2\cdot\binom{n}{2} + \cdots + (n-1)\cdot\binom{n}{n-1} + n\cdot\binom{n}{n},$$
$$S = n\cdot\binom{n}{0} + (n-1)\cdot\binom{n}{1} + (n-2)\cdot\binom{n}{2} + \cdots + 1\cdot\binom{n}{n-1}.$$

Adding these, we find that
$$2S = n\left[\binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{n-1} + \binom{n}{n}\right].$$

Again, using the fact that the sum of the nth row of Pascal's Triangle is $2^n$, we deduce that $2S = n \cdot 2^n$.
Hence $S = n \cdot 2^{n-1}$.
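All three proofs establish the same fact, which a direct Python check (our addition) also confirms:

```python
from math import comb

# sum of k * C(n, k): committees of any size with a designated chairperson
for n in range(1, 20):
    assert sum(k * comb(n, k) for k in range(1, n + 1)) == n * 2 ** (n - 1)
print("chairperson-sum identity verified for n < 20")
```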
Most combinatorial identities can be proved in multiple ways. Generally, we have a preference for
combinatorial proofs, because when we learn a combinatorial proof, it feels like we understand the structure of
why something happens. Algebraic manipulations can also be beautiful, but they often mask the underlying
ideas. As you proceed through the problems, try to tell your own stories.
Chapter 3

Principle of Inclusion-Exclusion

3.1 Motivation
On day 1, we reviewed the addition principle: if two sets A and B are disjoint, then their union A ∪ B has
|A| + |B| elements. However, this is only true if the sets are disjoint, i.e. if |A ∩ B| = 0. What happens when
|A ∩ B| > 0? Are we still able to calculate |A ∪ B|?

Example 3.1.1. (AMC 8; 2008) Each of the 39 students in the eighth grade at Lincoln Middle School has
one dog or one cat or both a dog and a cat. Twenty students have a dog and 26 students have a cat. How
many students have both a dog and a cat?

Solution: We visualize this with a Venn diagram. Each circle corresponds to a pet that the students
inside own, the region in both circles represents students who own both pets, and the regions outside the
intersection represent the students who own only one pet.

(Venn diagram: the "has a dog" circle contains 20 − x, the overlap contains x, and the "has a cat" circle contains 26 − x)

We let the number of students who have both pets be x. Then there are 20 − x students who only own a dog,
and 26 − x students who only own a cat. Thus the total number of students is, in terms of x,

x + 20 − x + 26 − x = 46 − x.

The total number of students is also 39, so x = 7 . 

If we analyze how we computed x, we see that it ends up being 20 + 26 − 39. This is the sum of the sizes
of the two sets, minus the size of the union. This gives us the following two equalities:

|A ∩ B| = |A| + |B| − |A ∪ B|,


|A ∪ B| = |A| + |B| − |A ∩ B|.
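Python's built-in set operations make the two-set formula concrete. The rosters below are hypothetical data chosen to match the Lincoln Middle School numbers:

```python
# hypothetical rosters consistent with the problem: 39 students, numbered 0..38
dogs = set(range(20))        # 20 dog owners
cats = set(range(13, 39))    # 26 cat owners

# |A ∪ B| = |A| + |B| − |A ∩ B|
assert len(dogs | cats) == len(dogs) + len(cats) - len(dogs & cats)
print(len(dogs & cats))  # 7 students own both, matching the Venn diagram answer
```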

The natural question is: how far can we extend these equalities? The way this is done is by increasing
the number of sets, and expressing the size of the union in terms of all the intersections. Let us see this in
action for three sets:


(three Venn diagrams of sets A, B, and C, labeled in turn with the running totals:)

|A| + |B| + |C|

|A| + |B| + |C| − |A ∩ B| − |B ∩ C| − |C ∩ A|

|A| + |B| + |C| − |A ∩ B| − |B ∩ C| − |C ∩ A| + |A ∩ B ∩ C|

We start by considering the sum of the elements of the sets A, B, and C. This overcounts all the elements
in the intersections A ∩ B, B ∩ C, and C ∩ A, so we can subtract the sizes of these intersections. This results
in counting everything except the elements of A ∩ B ∩ C, so we add those back in. We conclude that:

|A ∪ B ∪ C| = |A| + |B| + |C| − |A ∩ B| − |B ∩ C| − |C ∩ A| + |A ∩ B ∩ C|.

This gives us a calculation for three sets: the size of the union can be expressed entirely in terms of the size of
the intersections, and ends up being an alternating sum based on the number of sets inside the intersection.
This in fact generalizes to any number of sets:
Theorem 3.1.1. (The Principle of Inclusion and Exclusion) For sets $A_1, A_2, \ldots, A_n$, the size of their
union can be calculated as
$$|A_1 \cup A_2 \cup \cdots \cup A_n| = \sum_{i=1}^{n}|A_i| - \sum_{1\le i<j\le n}|A_i \cap A_j| + \cdots + (-1)^{n+1}|A_1 \cap \cdots \cap A_n|
= \sum_{k=1}^{n}(-1)^{k+1}\sum_{1\le i_1<i_2<\cdots<i_k\le n}|A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_k}|.$$

Proof. We prove this similarly to how we derived the identity for three sets: we count the number of times an
arbitrary element of the union is counted on the right-hand side. Suppose some element e ∈ A1 ∪ · · · ∪ An is
contained in exactly k of the sets, namely e ∈ An1 ∩ An2 ∩ · · · ∩ Ank , and is contained in no other sets. Then e will
only be counted in the intersections involving these sets.
The element e will first be counted in each of the single sets, i.e. $|A_{n_1}|$, and so on. Since e is in exactly k sets,
this contributes a count of k. Then we will subtract all intersections of two sets. There are $\binom{k}{2}$ ways to choose
two different sets containing e, so this will subtract $\binom{k}{2}$ from our count. In general, suppose we consider
intersections of ℓ sets: there are $\binom{k}{\ell}$ such intersections containing e, so we will add $(-1)^{\ell-1}\binom{k}{\ell}$ to our count.
In all, e will be counted
$$k - \binom{k}{2} + \binom{k}{3} - \cdots + (-1)^{k-1}\binom{k}{k}$$
times on the right-hand side. Notice that
$$-(1-1)^k = -\binom{k}{0} + \binom{k}{1} - \binom{k}{2} + \cdots + (-1)^{k-1}\binom{k}{k} = 0.$$
Since the left-hand side is zero, this rearranges to
$$1 = \binom{k}{0} = k - \binom{k}{2} + \binom{k}{3} - \cdots + (-1)^{k-1}\binom{k}{k}.$$

Therefore e is counted exactly once on the right-hand side. This is true for all e in the union A1 ∪A2 ∪· · ·∪An ,
so we must have that the right-hand side counts the number of elements in the union. This proves the equality
in the theorem statement.
An alternative proof exists involving indicator functions, polynomials, and Vieta’s Formulas, which will
appear in the handout.
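The theorem translates almost verbatim into code. The sketch below (our addition, using only the standard library) evaluates the alternating sum and compares it against Python's direct union:

```python
from itertools import combinations

def union_size_by_pie(sets):
    # alternating sum over every nonempty subfamily of the given sets
    total = 0
    for k in range(1, len(sets) + 1):
        sign = (-1) ** (k + 1)
        for group in combinations(sets, k):
            total += sign * len(set.intersection(*group))
    return total

family = [{1, 2, 3, 4}, {3, 4, 5}, {4, 5, 6, 7}, {2, 4, 8}]
assert union_size_by_pie(family) == len(set().union(*family))
print(union_size_by_pie(family))  # 8
```

Of course this brute-force evaluation has $2^n$ terms; the point of PIE in problems is that the intersection sizes are often easy to compute in closed form, as the next examples show.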

3.2 Examples
We begin with a direct application of the Principle of Inclusion and Exclusion, i.e. PIE, mixed with a bit of
number theory.
Example 3.2.1. (AIME; 2005) Find the number of positive integers that are divisors of at least one of
1010 , 157 , 1811 .

Solution: Note that $10^{10} = 2^{10}\cdot 5^{10}$, $15^7 = 3^7\cdot 5^7$, and $18^{11} = 2^{11}\cdot 3^{22}$. These numbers have (10 + 1)(10 + 1) = 121, (7 + 1)(7 + 1) = 64, and (11 + 1)(22 + 1) = 276 divisors, respectively. Next, $\gcd(10^{10}, 15^7) = 5^7$,
so there are 7 + 1 = 8 numbers that are divisors of both $10^{10}$ and $15^7$. Similarly, $\gcd(15^7, 18^{11}) = 3^7$, so there
are 7 + 1 = 8 common divisors of those two numbers, and $\gcd(10^{10}, 18^{11}) = 2^{10}$, so there are
10 + 1 = 11 common divisors there. Finally, only 1 is a divisor of all three numbers.
Therefore, by Inclusion-Exclusion, there are

121 + 64 + 276 − 8 − 8 − 11 + 1 = 435

numbers that are a divisor of at least one of those numbers. 
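The numbers are far too large to test every candidate divisor directly, but generating each divisor set from the prime factorizations is fast. This Python sketch (the helper `divisors_of` is our own) reproduces the answer:

```python
def divisors_of(factorization):
    # build every divisor p1^a1 * p2^a2 * ... from a {prime: exponent} dict
    divs = [1]
    for p, e in factorization.items():
        divs = [d * p ** i for d in divs for i in range(e + 1)]
    return set(divs)

numbers = [{2: 10, 5: 10},   # 10^10
           {3: 7, 5: 7},     # 15^7
           {2: 11, 3: 22}]   # 18^11
print(len(set().union(*map(divisors_of, numbers))))  # 435
```

Note that taking the union of the three sets performs the inclusion-exclusion bookkeeping for us, which makes this a handy cross-check on the hand computation.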

Notice that in the PIE calculation, the alternating signs are a way to correct for overcounting. We can
use this idea to solve other problems as well. The next example demonstrates using the idea behind PIE
instead of being a direct application of PIE.

Example 3.2.2. (ARML; 1995) Compute the number of distinct planes passing through at least three vertices
of a given cube.

Solution: A given plane may pass through at most four vertices of the cube. Thus we compute the
number of planes that pass through exactly four vertices, then take the number of triples of vertices, $\binom{8}{3}$, and
correct for overcounting these four-vertex planes.
If a plane passes through four vertices, then it either is a face plane, or passes through the diagonals of two opposite
faces. There are six face planes, and there are six planes along face diagonals, so there are 12 planes that
pass through four vertices. When we count the number of triples, $\binom{8}{3} = 56$, each of these planes is counted
$\binom{4}{3} = 4$ times. Therefore, to correct for overcounting, we subtract each of these planes three times. To conclude, the
answer is
$$\binom{8}{3} - 12\cdot\left(\binom{4}{3} - 1\right) = 56 - 12\cdot 3 = 20.$$

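A brute-force check of the 20 planes is a nice exercise in canonicalization. The sketch below (our own construction; `plane_through` is a hypothetical helper, not from the text) represents each plane by a normalized (normal, offset) tuple and counts the distinct ones:

```python
from itertools import combinations
from math import gcd

def plane_through(a, b, c):
    # normal vector via the cross product of two edge vectors
    u = tuple(b[i] - a[i] for i in range(3))
    v = tuple(c[i] - a[i] for i in range(3))
    n = (u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    d = sum(n[i] * a[i] for i in range(3))
    # canonical form: divide out the gcd and fix the overall sign
    g = gcd(gcd(abs(n[0]), abs(n[1])), gcd(abs(n[2]), abs(d)))
    key = tuple(x // g for x in (*n, d))
    return max(key, tuple(-x for x in key))

vertices = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
planes = {plane_through(*t) for t in combinations(vertices, 3)}
print(len(planes))  # 20
```

No three cube vertices are collinear, so every triple determines a valid plane and the cross product is never the zero vector.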

3.3 Connecting PIE with Complementary Counting


The last set of examples we look at share a common theme: the objects we need to count are normally very
hard to analyze, but instead we are able to use complementary counting to produce an application of PIE
instead.

Example 3.3.1. Compute the number of ways to arrange the numbers 1 through n in a line, such that
number i is not in position i for all i from 1 to n. Such an arrangement is called a derangement of the first
n numbers.

Solution: If we try to calculate this directly, after we place 1 in any of the last n − 1 slots, we don’t
know how many positions we have to place the number 2. If number 1 is in position 2, then there are n − 1
positions we can place 2. Otherwise, there are only n − 2 positions we can place 2. This sort of analysis
creates many cases very quickly.
Instead, we use complementary counting: we count the number of arrangements where number i is in
position i for at least one index i. Let $A_i$ be the set of arrangements where i is in the correct position.
We then wish to calculate $|A_1 \cup A_2 \cup \cdots \cup A_n|$, which is the complement of what we've been asked to count.
Recall that, from PIE,

|A1 ∪ A2 ∪ · · · ∪ An | = (|A1 | + |A2 | + · · · + |An |)


− (|A1 ∩ A2 | + |A1 ∩ A3 | + · · · + |An−1 ∩ An |)
+ ···
+ (−1)n−1 |A1 ∩ A2 ∩ · · · ∩ An |.

To calculate the size of an intersection of k of these sets, notice that $A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_k}$ is the set of
arrangements where $i_1, i_2, \ldots, i_k$ are all in the correct positions. This leaves one possibility for the positions
of these numbers. The remaining n − k numbers can be arranged arbitrarily in (n − k)! ways, so we conclude that
$$|A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_k}| = (n-k)!.$$


In the PIE calculation, there are $\binom{n}{k}$ terms that are intersections of k sets, so we conclude that
$$|A_1 \cup A_2 \cup \cdots \cup A_n| = \sum_{k=1}^{n}(-1)^{k-1}\binom{n}{k}(n-k)!.$$

Recall that this is the complement of what we wish to count, so the total number of derangements of n
numbers is
$$n! - \sum_{k=1}^{n}(-1)^{k-1}\binom{n}{k}(n-k)! = \sum_{k=0}^{n}(-1)^{k}\binom{n}{k}(n-k)!.$$

Our last example connects back to number theory.

Example 3.3.2. Euler's Totient Function ϕ(n) counts the number of integers at most n that are relatively
prime to n. Show that if n ≥ 2, and if n has prime factorization $p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$, then
$$\varphi(n) = n\cdot\left(1 - \frac{1}{p_1}\right)\left(1 - \frac{1}{p_2}\right)\cdots\left(1 - \frac{1}{p_k}\right).$$

Solution: Recall that a number m is relatively prime to $n = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$ if m is divisible by none of $p_1$,
$p_2$, . . . , $p_k$. We then use complementary counting: we count the number of m ≤ n such that m is divisible
by at least one of $p_1, p_2, \ldots, p_k$.

Note that there are $n/p_1$ numbers at most n that are divisible by $p_1$. More generally, suppose we choose ℓ primes $p_{k_1}$,
$p_{k_2}, \ldots, p_{k_\ell}$ to divide the number m. Then there are
$$\frac{n}{p_{k_1} p_{k_2} \cdots p_{k_\ell}}$$

possibilities for the value of m. Therefore the number of m ≤ n that share a prime factor with n is, from
PIE,
$$n - \varphi(n) = \left(\frac{n}{p_1} + \frac{n}{p_2} + \cdots + \frac{n}{p_k}\right) - \left(\frac{n}{p_1 p_2} + \frac{n}{p_1 p_3} + \cdots + \frac{n}{p_{k-1} p_k}\right) + \cdots + (-1)^{k-1}\cdot\frac{n}{p_1 p_2 \cdots p_k}.$$

We solve for ϕ(n) to get
$$\varphi(n) = n - \left(\frac{n}{p_1} + \frac{n}{p_2} + \cdots + \frac{n}{p_k}\right) + \left(\frac{n}{p_1 p_2} + \frac{n}{p_1 p_3} + \cdots + \frac{n}{p_{k-1} p_k}\right) - \cdots + (-1)^{k}\cdot\frac{n}{p_1 p_2 \cdots p_k}.$$

By Vieta’s Formulas, this factors to


    
1 1 1
ϕ(n) = n 1 − 1− ··· 1 − .
p1 p2 pk
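The product formula can be checked against the definition of ϕ directly. In the Python sketch below (ours, not the book's), `phi_from_formula` applies the factor $(1 - 1/p)$ once per prime via exact integer arithmetic, and `phi_direct` counts relatively prime integers with `math.gcd`:

```python
from math import gcd

def phi_from_formula(n):
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            result -= result // p          # multiply by (1 - 1/p)
            while m % p == 0:
                m //= p
        p += 1
    if m > 1:                              # one prime factor may remain
        result -= result // m
    return result

def phi_direct(n):
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

assert all(phi_from_formula(n) == phi_direct(n) for n in range(2, 300))
print(phi_from_formula(100))  # 40
```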


Chapter 4

Indistinguishability and Rotational


Symmetry

Today we talk about a rather niche topic that is popular in the United States: indistinguishability. This
manifests itself in the form of counting arrangements where some are classified as indistinguishable, and most
commonly the arrangements that are considered the same are rotations and reflections of each other. This
leads normal methods of counting arrangements to end up overcounting, so we must find ways to correct
this error. As we’ll see today, this correction can become very complicated as we vary the conditions of the
problem.

4.1 Tables and Necklaces


We begin with the two classical introductions to rotational symmetry: sitting people at a table, and arranging
beads on a necklace. This will let us begin to formulate general strategies for these sorts of problems.
Example 4.1.1. How many ways are there to sit n people at a circular table, if having everyone rotate by
the same amount is not considered to change the arrangement?

Solution: First we count the number of arrangements without regard for rotation. There are n
distinct people and n spaces to put them, so there are n! arrangements before accounting for rotations.
When we consider rotations, many of these arrangements end up being the same. We consider the groups
of seatings that are classified as being the same: these seatings are rotations of each other. If we group our
n! original seatings into collections of indistinguishable arrangements, then we get n arrangements in each
collection. This is due to the n different ways to rotate a seating. Once we group all of the arrangements
like this, the different groups give distinguishably different seatings, so the number of different seatings is
the number of groups. There are n arrangements in each group, so there are n!/n = (n − 1)! different
arrangements of the people around the table. 
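The grouping argument can be tested by brute force for small n. In this Python sketch (our addition), each seating is reduced to a canonical representative, the lexicographically smallest rotation, so two seatings are counted once exactly when they are rotations of each other:

```python
from itertools import permutations
from math import factorial

def distinct_round_tables(n):
    seen = set()
    for p in permutations(range(n)):
        # canonical representative: lexicographically smallest rotation
        seen.add(min(p[i:] + p[:i] for i in range(n)))
    return len(seen)

for n in range(1, 7):
    assert distinct_round_tables(n) == factorial(n - 1)
print("round-table count matches (n - 1)! for n < 7")
```

This canonical-form trick reappears later when the objects being arranged are no longer all distinct.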

This example was fairly straightforward: including rotations meant that we overcounted each arrangement
by the same amount each time. This holds true in the second context as well.
Example 4.1.2. We are given n > 2 different colored beads to put on a necklace. How many necklaces can
we construct, if rotating or reflecting the necklace is not considered to change the arrangement?
Solution: We do this similarly to the previous problem. There are n! arrangements not accounting for
rotations and reflections. Rotations give a group of n arrangements that are indistinguishable. However, if
we reflect as well, then we can turn the necklace over and get n more rotations, so in all there are groups of
2n arrangements that are indistinguishable. Thus, there are n!/(2n) groups of arrangements, and different
groups give different arrangements, so there are (n − 1)!/2 ways to construct the necklace. □
The idea behind our previous two solutions was that we can group up all of the arrangements into
categories where the patterns in each category are considered the same, and then we find the number of


different patterns by counting the number of different categories. Let’s see an example that extends this idea
into three dimensions:

Example 4.1.3. (HMMT; 2013) I have 8 unit cubes of different colors, which I want to glue together into a
2 × 2 × 2 cube. How many distinct 2 × 2 × 2 cubes can I make? Rotations of the same cube are not considered
distinct, but reflections are.

Solution: Again, we try to group the colorings into categories based on which are classified as the same.
Note that the unit cubes are all different colors, so any rotation will change how the colors appear. This
means that each group should have size R, where R is the number of rotations of the cube. There are 6 ways
to choose the front face, and 4 ways to rotate the cube once this front is chosen, so there are R = 6 · 4 = 24
rotations of the cube. Without rotations there are 8! ways to order the colors, so there are

$$\frac{8!}{24} = \frac{7!}{3} = 1680$$
distinct cubes we can make. □

4.2 Indistinguishable Objects


The three examples above have a common feature: the objects we are arranging are all different. What
happens when we let objects be indistinguishable as well? Are we able to count the arrangements in the
same way? It turns out that indistinguishable objects drastically complicate computations.

Example 4.2.1. How many ways are there to set n identical chairs at a round table? How many ways are
there to use n > 2 identical beads to make a necklace?

Solution: Recall the computations we made for the problems when we were using distinct objects: we
took the number of arrangements without regard for rotations and reflections, and we divided by the number
of ways to rotate and reflect. Let's see what answers we get when we apply this idea here: there is only
one way to set the identical chairs at the table without regard for rotations, so the computation we get is
1/n. There is only one way to string the identical beads to make a necklace without regard for rotations or
reflections, so the computation we get is 1/(2n). Clearly these are wrong; they are not even integers. Therefore the solution isn't just to
divide by the number of rotations or reflections we have.
However, the grouping idea still (trivially) works: if we consider all the collections of arrangements that
are the same under rotation and reflection, and count those, we get that there is only one group, and it has
only one element, so there is only one way to sit the chairs or string the beads. 
The grouping idea became harder in this example, because for the arrangements we could make, we could
no longer easily say how many different rotations and reflections we have. This is because, in some cases,
the number of different rotations of an arrangement may be smaller due to the indistinguishability of the
objects. Let’s see if we can extend this to more complicated settings.
First, we create a definition that lets us talk about these collections of arrangements explicitly.

Definition 4.2.1. Suppose we have a set X of arrangements, and a list of transformations gi : X → X that
each turn one arrangement into another. Then for a fixed arrangement x ∈ X, the set of arrangements that
the gi send x to is called the orbit of x with respect to these transformations.

In the HMMT example, given a coloring x of the 2 × 2 × 2 cube, the 24 rotations mean that x has
an orbit of size 24, corresponding to each of the colorings that look different from x, but are rotationally
indistinguishable from x.

Example 4.2.2. (Mandelbrot; 2010) How many distinct ways are there to arrange six identical black chairs
and six identical red chairs around a large circular table in an evenly spaced manner? (Two arrangements
are considered to be the same if one can be obtained from the other by rotating the table and chairs.)
4.3. BURNSIDE’S LEMMA 29

Solution: We again need to count the number of collections of indistinguishable rotations, i.e. we need
to count all the orbits of the arrangements. Because we're now switching to a mindset centered around orbits,
we now consider arrangements of chairs to be rotationally distinct, and focus on counting the different orbits
instead. The difficulty in this problem stems from the fact that different arrangements can have different
orbit sizes: for example, alternating the colors of the chairs will give an orbit of size 2, but putting all the
red chairs next to each other gives an orbit of size 12.
To approach this problem, we count the orbits by performing casework on the possible sizes of the orbits.
Since there are 12 chairs, the only possible orbit sizes are divisors of 12, so they are 1, 2, 3, 4, 6, and 12.
The only way an arrangement can have an orbit of size 1 is if the arrangement has chairs of only one color,
so there are no size-1 orbits. The only way we can have an orbit size of 2 is if the colors alternate, so this
gives two arrangements with orbit size 2 (and these two arrangements are indistinguishable from each other,
due to rotation).
There are no orbits of size 3, as this would force the number of black and red chairs to be a multiple of
12/3 = 4. To calculate orbits of size 4, we notice that we can just fill in the first four chairs around, and the
rest falls into place. We need to use two black and two red chairs to do this, so the possible arrangements
of the first four chairs are:

BBRR, BRBR, BRRB, RBBR, RBRB, RRBB.

Starting the table with BRBR or RBRB forces the table to alternate colors, and we’ve already counted
these arrangements. So, the only arrangements for size-4 orbits stem from BBRR, BRRB, RBBR, and
RRBB. These four arrangements are rotationally indistinguishable, so this will give one orbit of size 4.
To find orbits of size 6, we can once again arrange the first six chairs, which must have three black
and three red chairs. We can do this in $\binom{6}{3} = 20$ ways. However, two of those configurations result in
arrangements of orbit size 2, so there are only 20 − 2 = 18 chair arrangements that give size-6 orbits.
Therefore the number of size-6 orbits is 18/6 = 3.
We finally find the orbits of size 12. There are $\binom{12}{6} = 924$ total arrangements of the chairs without accounting for rotation. We have used 2 + 4 + 18 = 24 of these arrangements for smaller orbits, so the number
of arrangements with size-12 orbits is 924 − 24 = 900. Therefore, the number of size-12 orbits is 900/12 = 75.

We conclude that there is one orbit of size 2, one orbit of size 4, three orbits of size 6, and 75 orbits of
size 12, so the total number of orbits is

1 + 1 + 3 + 75 = 80.

We conclude that the number of arrangements of the chairs, counting rotations as the same, is 80 . 
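The orbit-by-orbit casework above is delicate, so it is reassuring to confirm the count of 80 by brute force. The Python sketch below (ours) canonicalizes every 6-black/6-red arrangement by its smallest rotation and counts the distinct representatives, i.e. the orbits:

```python
from itertools import combinations

def canonical(s):
    # smallest rotation represents the whole orbit
    return min(s[i:] + s[:i] for i in range(len(s)))

orbits = {canonical(''.join('B' if i in blacks else 'R' for i in range(12)))
          for blacks in combinations(range(12), 6)}
print(len(orbits))  # 80
```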
This gives us a powerful mindset we can use to solve these sorts of problems: we can perform casework
on how "symmetric" the arrangements are, i.e. on how many other arrangements are considered the same as
a given one.

4.3 Burnside’s Lemma


Our last tactic is an advanced result from abstract algebra, and it requires some more definitions and ideas.

Definition 4.3.1. Suppose we have a set of arrangements X, and some transformation g : X → X that
sends arrangements in X to arrangements in X. We say that x ∈ X is a fixed point with respect to g
if g(x) = x, i.e. if applying the transformation g to x produces x.

Examples include the monochromatic necklaces we considered earlier: if we only use one bead color, then
this arrangement is a fixed point of every rotation and reflection.

Theorem 4.3.1. (Burnside’s Lemma) Consider a set X of arrangements of objects. Let {g1 , g2 , . . . , gk }
be a collection of transformations that act on X, such that this collection is closed under composition, i.e.
we can’t construct new transformations from the ones we already have. This collection will also contain the

“identity transformation”, where we don't transform the object. Then the number of orbits in X is equal to
the average number of fixed points of the transformations $g_i$. In other words,
$$\#\{\text{orbits of } X\} = \frac{1}{k}\sum_{i=1}^{k}\#\{\text{fixed points of } g_i\}.$$

The proof of this theorem requires the following concepts:


• Groups,
• Cosets of groups,
• Group actions on sets,
• Orbits, fixed points, and stabilizers,
• The orbit-stabilizer theorem,
• Lagrange’s Theorem on groups,
• A nice bijection that is hidden under a lot of terminology.
Therefore we omit the proof of this theorem. However, if you’re interested in the proof and are not afraid
of some group theory, the proof can be found online and will probably only take a day’s worth of dedicated
effort to understand.
Example 4.3.1. (Purple Comet; 2015) You have a collection of small wooden blocks that are rectangular
solids measuring 3 × 4 × 6. Each of the six faces of each block is to be painted a solid color, and you have
three colors of paint to use. Find the number of distinguishable ways you could paint the blocks. (Two blocks
are distinguishable if you cannot rotate one block so that it looks identical to the other block.)

Solution:
Method 1: We solve this with Burnside’s Lemma: we figure out how many rotations there are, calculate
the number of fixed points for each rotation, and then take the average.
Fix a 3 × 4 × 6 block in three-dimensional space. Then we can either keep it there (which counts as a rotation!), or rotate it by 180◦ about an axis through the centers of one of its pairs of opposite faces. Composing these rotations does not produce any new ones, so there are only four different rotations.
For the trivial rotation, every arrangement of colors is a fixed point, and there are $3^6 = 729$ ways to arrange the colors, so there are 729 fixed points of this rotation.
When rotating about the axis through the 3 × 4 faces, these two faces can be anything, but the 4 × 6 faces must be the same since they swap, and the 3 × 6 faces must be the same as well. Therefore, to create a fixed point of this rotation, there are $3^2$ choices for the 3 × 4 faces, 3 choices for the 4 × 6 faces, and 3 choices for the 3 × 6 faces. We conclude that there are $3^2 \cdot 3 \cdot 3 = 81$ fixed points of this rotation.
The above calculation works for the other two rotations about the 4 × 6 and 3 × 6 faces, so there are 81
fixed points for each of those.
In conclusion, the average number of fixed points for these four rotations is
$$\frac{729 + 81 + 81 + 81}{4} = 243.$$
This must also be the number of distinguishable arrangements of X, from Burnside’s Lemma.

Method 2: We can also solve this problem by performing casework on the sizes of the orbits of the arrangements. Since there are only four different rotations we can make, each orbit has a size of at most 4. We can also argue that each orbit size must divide 4, so the only possible orbit sizes are 1, 2, and 4. As before, we count arrangements with the block fixed in space, and then group them into orbits.
Each rotation has the effect of swapping the faces within two of the pairs of opposite faces, so the only way we can have orbits of size 1 is if each pair of opposite faces is painted the same color. There are $3^3 = 27$ ways to paint the box such that opposite faces are the same color, so there are 27 orbits of size 1.

The only way we can have orbits of size 2 is if two pairs of opposite faces have matching colors and the third pair has two different colors. There are 3 ways to choose which pair of faces have different colors, 3 · 2 = 6 ways to choose their colors, and $3^2$ ways to color the remaining pairs of opposite faces so that each pair matches. This gives $3 \cdot 6 \cdot 3^2 = 162$ fixed-position arrangements of colors with orbit size 2, so this gives 162/2 = 81 orbits.
Finally, there are $3^6 = 729$ fixed-position arrangements total, and we’ve used 27 + 162 = 189 of them to create orbits of size 1 and 2, so there are 729 − 189 = 540 left, which must go into orbits of size 4. This gives 540/4 = 135 more orbits.
In conclusion, the number of distinct orbits of the paintings of this box is

27 + 81 + 135 = 243 .


Both mindsets work, but you can see how much more tedious it is to work with orbit casework instead
of using Burnside’s Lemma.
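Both methods are small enough to verify exhaustively. The following Python sketch (our own check, not part of the original solutions) lists the six faces as three opposite pairs, applies the four rotations, and counts orbits both by Burnside averaging and by keeping one representative per orbit:

```python
from itertools import product

# List the six faces as three opposite pairs: (a1, a2, b1, b2, c1, c2).
# Each 180-degree rotation keeps one pair of opposite faces in place and
# swaps the two faces within each of the other two pairs.
def images(c):
    a1, a2, b1, b2, c1, c2 = c
    return [
        (a1, a2, b1, b2, c1, c2),  # identity
        (a1, a2, b2, b1, c2, c1),  # axis through the a-pair
        (a2, a1, b1, b2, c2, c1),  # axis through the b-pair
        (a2, a1, b2, b1, c1, c2),  # axis through the c-pair
    ]

colorings = list(product(range(3), repeat=6))

# Method 1 (Burnside): average the number of fixed colorings per rotation.
total_fixed = sum(1 for c in colorings for g in images(c) if g == c)
print(total_fixed // 4)  # 243

# Method 2 (orbit casework): keep one representative (the minimum) per orbit.
print(len({min(images(c)) for c in colorings}))  # 243
```

Both counts agree with the answer of 243 obtained above.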
Example 4.3.2. (PUMaC; 2012) Two white pentagonal pyramids, with side lengths all the same, are glued
to each other at their regular pentagon bases. Some of the resulting 10 faces are colored black. How many
rotationally distinguishable colorings may result?

Solution: We present a solution with Burnside’s Lemma. Let r be the rotation by 72◦ about the
polyhedron’s axis, i.e. a rotation that sends each face to an adjacent one. Let f be the result of flipping the
polyhedron so that its top becomes its bottom and vice versa. Then there are ten different transformations of the solid: the identity transformation, the four rotations without a flip (which we denote $r, r^2, r^3, r^4$), the flip $f$, and the four flips combined with rotations (which we denote $rf, r^2f, r^3f, r^4f$).
We now find the number of fixed points of these ten transformations. The identity transformation fixes every coloring, and there are $2^{10} = 1024$ colorings, so the identity transformation has 1024 fixed points. The rotation $r$ only has fixed points for colorings where the top is all one color and the bottom is all one color. There are 2 color choices each for the top and the bottom, so there are $2^2 = 4$ fixed points of $r$. Similarly, there are 4 fixed points of $r^2$, $r^3$, and $r^4$.
The flip $f$ only has a fixed point when each face on the top is the same color as the corresponding face on the bottom. There are $2^5 = 32$ ways to color the top, so there are 32 ways to color the whole solid to be a fixed point of $f$. Now note that if we apply $f$ twice to the solid, it goes back to its original position. The same can be said for $rf$, $r^2f$, $r^3f$, and $r^4f$, so by logic similar to that for $f$, each of these other flips also has 32 fixed points.
We conclude that the average number of fixed points is equal to
$$\frac{1024 + 4 \cdot 4 + 5 \cdot 32}{10} = 120.$$
This must be the number of distinguishable colorings of the solid. 
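A brute-force check of this count is also quick. The sketch below assumes a labeling in which faces 0–4 form the top pyramid, faces 5–9 the bottom, face 5+i sits below face i, and a flip reverses the cyclic order of the faces; any consistent labeling of the ten symmetries yields the same average:

```python
from itertools import product

# The five rotations r^k shift the top and bottom 5-cycles by k.
rotations = [[(i + k) % 5 if i < 5 else 5 + (i - 5 + k) % 5 for i in range(10)]
             for k in range(5)]
# The five flips swap top and bottom and reverse the cyclic order; each is
# an involution made of five top-bottom transpositions.
flips = [[5 + (k - i) % 5 if i < 5 else (k - (i - 5)) % 5 for i in range(10)]
         for k in range(5)]
group = rotations + flips

# Burnside: count (coloring, symmetry) pairs where the symmetry fixes the
# coloring, then divide by the group size.
total_fixed = sum(1 for col in product([0, 1], repeat=10)
                  for g in group
                  if all(col[i] == col[g[i]] for i in range(10)))
print(total_fixed // len(group))  # 120
```

The fixed-point totals split exactly as in the solution: 1024 for the identity, 4 for each nontrivial rotation, and 32 for each flip.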
Chapter 5

Recursion

Suppose that we have a quantity that depends on n, call it an . In most of what we’ve done so far, we’ve
counted an directly for each value of n. In harder counting problems, it’s sometimes helpful to express an in
terms of the previous case(s). This allows us to build on previous work, rather than having to split things
into nasty cases. The following example illustrates the power of a recursive mindset.
Example 5.0.1. Determine the maximum number of regions (finite or infinite) formed by n lines in the
plane.
Solution: Let an be the maximum number of regions formed by n lines in the plane. Clearly, a0 = 1
and a1 = 2. More generally, suppose we have n − 1 lines that achieve an−1 regions. Then when we draw
the nth line, how many regions does it create? Instead of focusing on regions, we first focus on intersection
points. Note that the new line can intersect each of the already-drawn lines at most once, creating at most
n − 1 new intersection points. In the below diagram, the bold line is the newly drawn line.

Note that the n − 1 new intersection points split the new line into n distinct rays/segments. Each of these
segments splits an existing region into two regions, thereby creating one new region. Therefore, since each
segment contributes one new region, we can create at most n new regions. Therefore,

an = an−1 + n.

Iterating this formula, we find

$$\begin{aligned}
a_n &= n + a_{n-1}\\
&= n + (n-1) + a_{n-2}\\
&\;\;\vdots\\
&= n + (n-1) + (n-2) + \cdots + 1 + a_0\\
&= \frac{n(n+1)}{2} + 1.
\end{aligned}$$



While this is a relatively simple example, it illustrates how we can simplify a problem by thinking about
it recursively. Instead of counting everything at once, we were able to focus on the outcome of drawing a
single line.
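The recursion is also easy to check numerically against the closed form; a minimal sketch:

```python
# Iterate a_n = a_{n-1} + n starting from a_0 = 1, and compare each term
# with the closed form n(n + 1)/2 + 1.
a = 1
for n in range(1, 11):
    a += n  # the n-th line adds at most n new regions
    assert a == n * (n + 1) // 2 + 1
print(a)  # a_10 = 56
```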

5.1 Tiling
Next, we illustrate a beautiful example of recursion in solving a tiling problem.

Example 5.1.1. Find the number of ways to completely tile a 1 × n board using only 1 × 1 and 1 × 2 tiles.
One such tiling of a 1 × 8 board is shown below.

Solution: Let fn be the number of tilings of the 1 × n board. To approach this recursively, we do
casework based on the last tile. There are two cases: either the last tile is a 1 × 1 tile, or else it is a 1 × 2
tile. In the first case, we can tile the remaining 1 × (n − 1) board in fn−1 ways. In the second case, we can
tile the remaining 1 × (n − 2) board in fn−2 ways.


Therefore, fn = fn−1 + fn−2 . Since f1 = 1 and f2 = 2, we can compute any term in the sequence using this
recursion.

Note: By convention, we will say that f0 = 1—there is 1 way to tile a 1 × 0 board—simply place 0 tiles.
This answer may be a bit unsatisfying, because in order to compute $f_n$, we need to compute all of the previous terms. We’ll discuss a method for coming up with a closed formula later. However, it’s still relatively easy to compute $f_n$ for small $n$.
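Either way, the recursion is easy to check against a direct enumeration. Here is a short Python sketch that counts tilings directly as sequences of tile lengths from {1, 2}:

```python
from itertools import product

def tilings_brute(n):
    """Count tilings of a 1 x n board directly: a tiling is an ordered
    sequence of tile lengths from {1, 2} summing to n."""
    return sum(1 for k in range(n + 1)
               for tiles in product((1, 2), repeat=k)
               if sum(tiles) == n)

# Recursive count: f_0 = f_1 = 1 and f_n = f_{n-1} + f_{n-2}.
f = [1, 1]
for n in range(2, 11):
    f.append(f[n - 1] + f[n - 2])

assert all(tilings_brute(n) == f[n] for n in range(9))
print(f[8])  # 34 tilings of a 1 x 8 board
```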
You might recognize the sequence fn as being related to another famous sequence called the Fibonacci
sequence.

Definition 5.1.1. The Fibonacci sequence Fn is defined by F1 = 1, F2 = 1, and for n ≥ 3, Fn = Fn−1 +Fn−2 .
The first few terms of the Fibonacci sequence are given by

1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, . . .

The main difference between the two sequences is the starting conditions—we find that F3 = 2, so f1 = F2
and f2 = F3 . Therefore all terms beyond this point will match, so fn = Fn+1 , i.e., the Fibonacci sequence
is one term behind the tiling sequence.
Tiling allows us to prove some nice identities about Fibonacci numbers. For convenience, we will state
these identities in terms of fn .

Example 5.1.2. Prove that if n ≥ 0, then

f0 + f1 + f2 + · · · + fn = fn+2 − 1.

Proof. The right hand side is one less than the number of tilings of a 1 × (n + 2) board. In fact, we can say
that it is equal to the number of tilings of a 1 × (n + 2) board that use at least one 1 × 2 tile.
To count the left hand side, we do casework based on the first time a 1 × 2 tile appears. It can either
appear as the first tile, second tile, third tile, and so on, and in each case, we count the number of ways we
can tile the remaining board.


Since these cases are disjoint and cover every single possible tiling that contains at least one 1 × 2 tile, we
deduce that
f0 + f1 + f2 + · · · + fn = fn+2 − 1.

Example 5.1.3. Prove that
$$f_{2n} = f_n^2 + f_{n-1}^2.$$

Proof. We note that f2n counts the number of tilings of a 1 × 2n board. A 1 × 2n board can be thought of
as two 1 × n boards placed next to each other. We do casework based on whether or not the tiling contains
a 1 × 2 tile in the exact center of the board. If the tiling contains a 1 × 2 tile in the center of the board, then
we can tile either side of the board in fn−1 ways. If the tiling does not contain a 1 × 2 tile in the center of
the board, then we can tile either side of the board in fn ways.


It follows that $f_{2n} = f_{n-1}^2 + f_n^2$.
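Both identities are easy to spot-check numerically; a quick sketch:

```python
# Check the two tiling identities for small n, with f_0 = f_1 = 1 and
# f_n = f_{n-1} + f_{n-2}.
f = [1, 1]
for n in range(2, 25):
    f.append(f[n - 1] + f[n - 2])

for n in range(11):
    assert sum(f[:n + 1]) == f[n + 2] - 1          # Example 5.1.2
for n in range(1, 12):
    assert f[2 * n] == f[n] ** 2 + f[n - 1] ** 2   # Example 5.1.3
print("both identities verified")
```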

5.2 Counting Recursively


When we try to solve a recursive counting problem, the basic idea is to try to express the number of ways
to count the nth case in terms of previous cases. Consider the following example.

Example 5.2.1. (AMC 12; 2009) Ten women sit in 10 seats in a line. All of the 10 get up and then reseat
themselves using all 10 seats, each sitting in the seat she was in before or a seat next to the one she occupied
before. In how many ways can the women be reseated?

Solution: First, we note that the woman sitting in chair #10 must sit in either chair #9 or chair #10.
Let sn be the number of ways that n women sitting in a line can reseat themselves according to the given
conditions.

• If Woman #10 sits in chair #9, then the only woman who can occupy chair #10 is the woman who
was previously sitting in chair #9, so the women sitting in chairs #9 and #10 must switch spots. The
remaining 8 women can reseat themselves in s8 ways.

• If Woman #10 sits in chair #10, then the remaining 9 women can reseat themselves in s9 ways.

Therefore, s10 = s9 + s8 . More generally, similar logic shows that sn = sn−1 + sn−2 . Since s1 = 1 and s2 = 2,
we can compute successive terms of the sequence, finding

1, 2, 3, 5, 8, 13, 21, 34, 55, 89.

Therefore, s10 = 89 . 
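For a problem this small we can also brute-force the answer as a check on the recursion; a sketch:

```python
from itertools import permutations

def reseatings(n):
    """Count permutations p of n seats with |p(i) - i| <= 1 for every i."""
    return sum(1 for p in permutations(range(n))
               if all(abs(p[i] - i) <= 1 for i in range(n)))

# The recursion s_n = s_{n-1} + s_{n-2} with s_1 = 1, s_2 = 2.
s = [1, 1, 2]
for n in range(3, 11):
    s.append(s[n - 1] + s[n - 2])

assert all(reseatings(n) == s[n] for n in range(1, 8))
print(s[10])  # 89
```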

Example 5.2.2. (AIME; 2006) A collection of 8 cubes consists of one cube with edge-length k for each
integer k, 1 ≤ k ≤ 8. A tower is to be built using all 8 cubes according to the rules:
• Any cube may be the bottom cube in the tower.
• The cube immediately on top of a cube with edge-length k must have edge-length at most k + 2.
Let T be the number of different towers that can be constructed. What is the remainder when T is divided by 1000?
Solution: Let $T_n$ be the number of valid towers using $n$ cubes with edge-lengths $1, 2, 3, \ldots, n$. Then $T_1 = 1$, $T_2 = 2$, and $T_3 = 6$. To compute $T_{n+1}$ for $n \ge 2$, we note that the cube of edge-length $n + 1$ can either go on the bottom, on top of the cube of edge-length $n$, or on top of the cube of edge-length $n - 1$. Conversely, if we have a tower of $n + 1$ cubes and remove the cube of edge-length $n + 1$, the remaining tower is a valid tower of $n$ cubes. Therefore, each tower with $n$ cubes corresponds to exactly 3 towers with $n + 1$ cubes, so $T_{n+1} = 3T_n$. Therefore, $T_8 = 2 \cdot 3^6 = 1458$, and the answer is 458.

Example 5.2.3. An alphabet contains three letters: A, B, C. Find a recurrence that computes the number
of “words” of length n that contain an even number of A’s.
Solution: Let xn be the number of words of length n that contain an even number of A’s. It’s easy to
check that x1 = 2 and x2 = 5. Now we do casework based on the last letter of an n-letter word.
• If the last letter is a B, then the previous n − 1 letters form a valid (n − 1)-letter word. Since there
are xn−1 valid (n − 1)-letter words, there must be xn−1 valid n-letter words that end in B.
• If the last letter is a C, then the previous n − 1 letters form a valid (n − 1)-letter word. Since there
are xn−1 valid (n − 1)-letter words, there must be xn−1 valid n-letter words that end in C.
• If the last letter is an A, then there must be another A, because A’s appear an even number of times.
We can do casework based on where the second to last A occurs. In the cases listed below, an X
represents either a B or a C.

$$AA, \quad AXA, \quad AXXA, \quad \ldots, \quad AXX\cdots XA,$$
where in each case the letters before the displayed suffix ($n - 2$ of them in the first case, $n - 3$ in the second, $n - 4$ in the third, and so on) form a word with an even number of A’s; in the last case the suffix is the entire word.

Since we have two choices for each X, we find that there are
$$x_{n-2} + 2x_{n-3} + 2^2 x_{n-4} + \cdots + 2^{n-3}x_1 + 2^{n-2}$$
words that end with A.


It follows that
$$x_n = 2x_{n-1} + x_{n-2} + 2x_{n-3} + 2^2 x_{n-4} + \cdots + 2^{n-3}x_1 + 2^{n-2}. \tag{1}$$
However, this seems somewhat impractical: unlike in previous problems, where we only needed the last two terms, this recurrence requires knowing all of the previous terms, so the computations become progressively more intense. One way to deal with this is to write out the previous instance of the recurrence.

$$x_{n-1} = 2x_{n-2} + x_{n-3} + 2x_{n-4} + 2^2 x_{n-5} + \cdots + 2^{n-4}x_1 + 2^{n-3}. \tag{2}$$

Subtracting twice (2) from (1), we find
$$x_n - 2x_{n-1} = 2x_{n-1} - 3x_{n-2}.$$



Therefore, xn = 4xn−1 − 3xn−2 . 


This solution was somewhat messy: we ended up with lots of cases, but we were able to collapse them by subtracting a previous instance of the recurrence. One way to avoid this messiness is to introduce another helpful sequence.
Solution: Let yn be the number of words of length n with an odd number of A’s. Again, we do casework
based on the last letter of an n-letter word.

• If the last letter is a B, then the previous n − 1 letters form an (n − 1)-letter word with an even number
of A’s. Thus there are xn−1 valid n-letter words that end in B.

• If the last letter is a C, then the previous n − 1 letters form an (n − 1)-letter word with an even number
of A’s. Thus there are xn−1 valid n-letter words that end in C.

• If the last letter is an A, then the previous n − 1 letters form an (n − 1)-letter word with an odd number
of A’s. Thus there are yn−1 valid n-letter words that end in A.

Therefore,
xn = 2xn−1 + yn−1 . (3)
On the other hand, since we introduced a new sequence, we must develop a recursion for yn . Computing yn ,
we find:

• If the last letter is a B, then the previous n − 1 letters form an (n − 1)-letter word with an odd number
of A’s. Thus there are yn−1 valid n-letter words that end in B.

• If the last letter is a C, then the previous n − 1 letters form an (n − 1)-letter word with an odd number
of A’s. Thus there are yn−1 valid n-letter words that end in C.

• If the last letter is an A, then the previous n − 1 letters form an (n − 1)-letter word with an even
number of A’s. Thus there are xn−1 valid n-letter words that end in A.

Therefore,
yn = 2yn−1 + xn−1 . (4)
We can use (3) and (4) to quickly compute terms, which would be a big improvement over our previous
method.
However, we can also use these recurrences to obtain the same recursion we found above. We can rewrite
(3) as yn−1 = xn − 2xn−1 . Similarly, yn = xn+1 − 2xn . Substituting these into (4), we find

(xn+1 − 2xn ) = 2(xn − 2xn−1 ) + xn−1 .

Hence xn+1 = 4xn − 3xn−1 . 


Often the introduction of a second sequence makes it easier to obtain a recurrence. While this is useful
in its own right, by clever substitution, we can often use the recurrences to obtain a recurrence in terms of
the original sequence.
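All three descriptions of $x_n$ (brute force over words, the pair of recurrences (3) and (4), and the single collapsed recurrence) can be checked against each other; a sketch:

```python
from itertools import product

def even_a_words(n):
    """Brute force: words over {A, B, C} with an even number of A's."""
    return sum(1 for w in product("ABC", repeat=n) if w.count("A") % 2 == 0)

# The coupled recurrences x_n = 2x_{n-1} + y_{n-1}, y_n = 2y_{n-1} + x_{n-1}.
x, y = [None, 2], [None, 1]
for n in range(2, 11):
    x.append(2 * x[n - 1] + y[n - 1])
    y.append(2 * y[n - 1] + x[n - 1])

assert all(even_a_words(n) == x[n] for n in range(1, 9))
# The collapsed recurrence x_n = 4x_{n-1} - 3x_{n-2}.
assert all(x[n] == 4 * x[n - 1] - 3 * x[n - 2] for n in range(3, 11))
print(x[1:7])  # [2, 5, 14, 41, 122, 365]
```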
The first few terms of the above sequence are
$$2, 5, 14, 41, 122, 365, \ldots.$$
If we take the differences between consecutive terms, a pattern arises:

2   5   14   41   122   365
  3    9    27    81    243

The differences are powers of 3, so it appears that $x_n = 1 + (1 + 3 + 3^2 + \cdots + 3^{n-1}) = 1 + \frac{3^n - 1}{2}$. Using induction, one could prove that this formula holds for all $n$. In fact, there is a general method for solving these recurrences, which we will now discuss.

5.3 Theory of Linear Recurrences


Before we demonstrate how to solve linear recurrences, we will approach things informally. Consider the
following example, which we approach by experimentation (the final details in each step are left as an exercise
to the reader).
Example 5.3.1. Suppose that the sequence xn is defined by the recursive relation xn+1 = 5xn − 6xn−1
for n ≥ 2, and some initial conditions. Find a general formula for xn given each of the following initial
conditions:
(a) x1 = 1, x2 = 2
(b) x1 = 1, x2 = 3
(c) x1 = 1, x2 = 4
(d) x1 = 2, x2 = 5
Solution:
(a) We compute the first few terms: $x_1 = 1, x_2 = 2, x_3 = 4, x_4 = 8, x_5 = 16, \ldots$. It appears that $x_n = 2^{n-1}$. This is true, and one can prove it by induction.

(b) We compute the first few terms: $x_1 = 1, x_2 = 3, x_3 = 9, x_4 = 27, x_5 = 81, \ldots$. It appears that $x_n = 3^{n-1}$. This is true, and one can prove it by induction.

(c) We compute the first few terms: $x_1 = 1, x_2 = 4, x_3 = 14, x_4 = 46, x_5 = 146, \ldots$. We don’t see an obvious pattern here, so unfortunately, the patterns of parts (a) and (b) will not hold in general.

(d) We compute the first few terms: $x_1 = 2, x_2 = 5, x_3 = 13, x_4 = 35, x_5 = 97, \ldots$. The pattern here is not as obvious, but if we look carefully, we see that it is actually the sum of the two sequences in (a) and (b), i.e., $x_n = 2^{n-1} + 3^{n-1}$. This is true, and one can prove it by induction.

In this example, we found that the solutions to (a) and (b) were geometric sequences. But the solution to (d) was the sum of two geometric sequences. In fact, the very astute reader may have been able to find that the general formula for (c) was $x_n = 2 \cdot 3^{n-1} - 2^{n-1}$. This suggests that in the recurrence relation $x_{n+1} = 5x_n - 6x_{n-1}$, there is something special about the geometric sequences $2^{n-1}$ and $3^{n-1}$. We now explore why this is the case.
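The experiments above are easy to reproduce; a minimal sketch that iterates the recurrence and checks all four proposed formulas:

```python
def iterate(x1, x2, terms=12):
    """Iterate x_{n+1} = 5 x_n - 6 x_{n-1} from the given first two terms."""
    xs = [x1, x2]
    while len(xs) < terms:
        xs.append(5 * xs[-1] - 6 * xs[-2])
    return xs

ns = range(1, 13)
assert iterate(1, 2) == [2 ** (n - 1) for n in ns]                     # (a)
assert iterate(1, 3) == [3 ** (n - 1) for n in ns]                     # (b)
assert iterate(1, 4) == [2 * 3 ** (n - 1) - 2 ** (n - 1) for n in ns]  # (c)
assert iterate(2, 5) == [2 ** (n - 1) + 3 ** (n - 1) for n in ns]      # (d)
print("all four closed forms verified")
```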
Definition 5.3.1. A homogeneous linear recursive sequence is a sequence defined by the values of the first
k terms, x1 , x2 , . . . , xk , and a recurrence relation

xn = c1 xn−1 + c2 xn−2 + c3 xn−3 + · · · + ck xn−k .

In light of our prior examples, one might guess that $x_n$ behaves like a geometric sequence. Suppose that $x_n = a\lambda^n$. If this were the case, then we could substitute it into the recurrence relation to find that
$$a\lambda^n = c_1(a\lambda^{n-1}) + c_2(a\lambda^{n-2}) + \cdots + c_k(a\lambda^{n-k}).$$
Dividing both sides by $a\lambda^{n-k}$, we obtain
$$\lambda^k = c_1\lambda^{k-1} + c_2\lambda^{k-2} + \cdots + c_k. \tag{1}$$

The equation (1) is called the characteristic equation of the sequence $x_n$. If the roots of the characteristic equation are $\lambda_1, \lambda_2, \ldots, \lambda_k$, then every geometric sequence of the form $a\lambda_i^n$ will satisfy the original recurrence relation. In fact, we can add these geometric sequences together, getting sequences of the form
$$x_n = a_1\lambda_1^n + a_2\lambda_2^n + \cdots + a_k\lambda_k^n, \tag{2}$$
and these sequences will still satisfy the recurrence relation. Therefore, if we can find a set of coefficients $\{a_1, a_2, \ldots, a_k\}$ such that (2) matches the values of the $k$ initial conditions $x_1, \ldots, x_k$, then by induction, the formula of the form (2) must hold for all $n$.

Recall that in the ABC problem from above, we obtained the recurrence $x_n = 4x_{n-1} - 3x_{n-2}$. The characteristic equation is $\lambda^2 - 4\lambda + 3 = 0$, which has roots $\lambda = 3$ and $\lambda = 1$. Hence the formula must be of the form $x_n = a \cdot 3^n + b \cdot 1^n$. Using the initial conditions, we can deduce that $a = 1/2$ and $b = 1/2$, so we obtain the same formula as before.
However, recurrences won’t always lead to such nice formulas. Consider the following example.
Example 5.3.2. Find a closed-form expression for the number of distinguishable tilings of a 1 × n board
using 1 × 1 and 1 × 2 tiles, if there are two colors of 1 × 1 tiles (red and green) and only one color of 1 × 2
tile (blue).
Solution: Let $c_n$ be the number of tilings of a $1 \times n$ board. A tiling can end in three ways: a red $1 \times 1$ tile, a green $1 \times 1$ tile, or a blue $1 \times 2$ tile. Tiling the remaining board in each case, we deduce that
$$c_n = c_{n-1} + c_{n-1} + c_{n-2}.$$
Therefore, $c_n = 2c_{n-1} + c_{n-2}$, where $c_1 = 2$ and $c_2 = 5$. The characteristic equation for this recurrence is $\lambda^2 - 2\lambda - 1 = 0$, which has roots $\lambda = 1 \pm \sqrt{2}$. Therefore, for some constants $\alpha$ and $\beta$, we must have
$$c_n = \alpha(1 + \sqrt{2})^n + \beta(1 - \sqrt{2})^n.$$
Plugging in $n = 1$ and $n = 2$, we find
$$2 = (1 + \sqrt{2})\alpha + (1 - \sqrt{2})\beta,$$
$$5 = (1 + \sqrt{2})^2\alpha + (1 - \sqrt{2})^2\beta.$$
Multiplying the first equation by $1 + \sqrt{2}$ and subtracting the second equation, we find
$$2(1 + \sqrt{2}) - 5 = \left(-1 - (3 - 2\sqrt{2})\right)\beta.$$
Hence $-3 + 2\sqrt{2} = (-4 + 2\sqrt{2})\beta$. Multiplying both sides by $-4 - 2\sqrt{2}$, we find $4 - 2\sqrt{2} = 8\beta$. Hence $\beta = \frac{2 - \sqrt{2}}{4}$. Substituting this into the first equation, we find
$$2 = (1 + \sqrt{2})\alpha + (1 - \sqrt{2})\left(\frac{2 - \sqrt{2}}{4}\right).$$
This simplifies to $(1 + \sqrt{2})\alpha = \frac{4 + 3\sqrt{2}}{4}$. Multiplying by $\sqrt{2} - 1$, we find $\alpha = \frac{2 + \sqrt{2}}{4}$. Therefore,
$$c_n = \left(\frac{2 + \sqrt{2}}{4}\right)(1 + \sqrt{2})^n + \left(\frac{2 - \sqrt{2}}{4}\right)(1 - \sqrt{2})^n.$$
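Algebra like this is error-prone, so it is worth checking the closed form against the recurrence numerically; a sketch (floating-point arithmetic, so we round):

```python
from math import sqrt

r = sqrt(2)
alpha, beta = (2 + r) / 4, (2 - r) / 4

def closed_form(n):
    return alpha * (1 + r) ** n + beta * (1 - r) ** n

# The recurrence c_n = 2c_{n-1} + c_{n-2}, with c_1 = 2 and c_2 = 5
# (and c_0 = 1 for the empty tiling).
c = [1, 2]
for n in range(2, 15):
    c.append(2 * c[n - 1] + c[n - 2])

assert all(round(closed_form(n)) == c[n] for n in range(15))
print(c[:8])  # [1, 2, 5, 12, 29, 70, 169, 408]
```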

Using this method, we can obtain a closed-form expression for the Fibonacci sequence. We leave the
proof of this formula as an exercise to the reader.
Proposition 5.3.1. (Binet’s Formula) If $F_1 = F_2 = 1$ and $F_n = F_{n-1} + F_{n-2}$ for all $n \ge 3$, then
$$F_n = \frac{1}{\sqrt{5}}\left(\left(\frac{1 + \sqrt{5}}{2}\right)^n - \left(\frac{1 - \sqrt{5}}{2}\right)^n\right).$$
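As with the previous example, Binet's formula is easy to check numerically before attempting the proof; a sketch:

```python
from math import sqrt

phi, psi = (1 + sqrt(5)) / 2, (1 - sqrt(5)) / 2

def binet(n):
    return (phi ** n - psi ** n) / sqrt(5)

# F_0 = 0 extends the Fibonacci sequence backwards consistently.
F = [0, 1]
for n in range(2, 31):
    F.append(F[n - 1] + F[n - 2])

assert all(round(binet(n)) == F[n] for n in range(31))
print([round(binet(n)) for n in range(1, 11)])  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```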

One potential difficulty arises when the characteristic polynomial has multiple roots. For these scenarios,
we have the following proposition:
Proposition 5.3.2. If $x_n$ is a homogeneous linear recursive sequence whose characteristic polynomial $P(\lambda)$ is divisible by $(\lambda - \lambda_i)^d$ (i.e., $\lambda = \lambda_i$ is a root with multiplicity $d$), then the following sequences are all solutions to the recurrence for $x_n$:
$$x_n = c_1\lambda_i^n, \quad x_n = c_2 n\lambda_i^n, \quad x_n = c_3 n^2\lambda_i^n, \quad \ldots, \quad x_n = c_d n^{d-1}\lambda_i^n.$$
In particular, to solve the recurrence, we express $x_n$ as a sum of sequences of the above types and solve for the coefficients of each type of term.

Example 5.3.3. The sequence bn is defined by b1 = 20, b2 = 125, and bn+1 = 10bn − 25bn−1 . Find a
closed-form expression for bn .

Solution: The characteristic equation for $b_n$ is $\lambda^2 - 10\lambda + 25 = 0$. This factors as $(\lambda - 5)^2 = 0$. Therefore, $b_n$ can be written in the form $p \cdot 5^n + qn \cdot 5^n$. Substituting the initial conditions at $n = 1$ and $n = 2$, we find
$$(p + q) \cdot 5 = 20 \quad\text{and}\quad (p + 2q) \cdot 5^2 = 125.$$
Therefore, $p + q = 4$ and $p + 2q = 5$. We solve this to find $q = 1$ and $p = 3$. Therefore, $b_n = (3 + n) \cdot 5^n$. 
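A two-line check confirms the repeated-root formula; a sketch:

```python
# Verify b_n = (3 + n) * 5^n against b_{n+1} = 10 b_n - 25 b_{n-1}.
b = [20, 125]  # b_1, b_2
for _ in range(10):
    b.append(10 * b[-1] - 25 * b[-2])

assert all(b[n - 1] == (3 + n) * 5 ** n for n in range(1, len(b) + 1))
print(b[2])  # b_3 = 750
```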


In the next example, we find the general solution to a non-homogeneous linear recurrence. A non-homogeneous linear recursive sequence is like a linear recursive sequence, but with one extra term that may depend on $n$. The recursive relation looks like
$$x_n = c_1 x_{n-1} + c_2 x_{n-2} + c_3 x_{n-3} + \cdots + c_k x_{n-k} + f(n).$$

Consider the following example.

Example 5.3.4. Solve the recurrence:

an+2 = an+1 + an + n − 1, a0 = 0; a1 = 0.

Solution: Computing the first few terms of the sequence, we find

0, 0, −1, −1, −1, 0, 2, 6, 13, . . .

This doesn’t seem to help much. However, if we add n to each term, we obtain the sequence

0, 1, 1, 2, 3, 5, 8, 13, 21, . . .

It appears that this is the Fibonacci sequence, so we might guess that an = Fn − n. In fact, if we define the
sequence bn by bn = an + n, then substituting an = bn − n into the original recurrence, we find

bn+2 − (n + 2) = (bn+1 − (n + 1)) + (bn − n) + (n − 1),

which simplifies to bn+2 = bn+1 + bn . Since b0 = 0 and b1 = 1, we see that b2 = 1, so indeed, bn is the
Fibonacci sequence. Thus an = Fn − n. 
It might seem like we pulled this out of a hat, which is somewhat true. However, there is a general
process to solving inhomogeneous linear recurrences, which we describe below.

• Step 1: Find a sequence xn = g(n) that satisfies

xn = c1 xn−1 + c2 xn−2 + c3 xn−3 + · · · + ck xn−k + f (n)

without using the initial conditions. This is called the particular solution.

• Step 2: Find the characteristic equation to the recurrence

xn = c1 xn−1 + c2 xn−2 + c3 xn−3 + · · · + ck xn−k .

If this has distinct roots $\lambda_1, \lambda_2, \ldots, \lambda_j$ with multiplicities $d_1, d_2, \ldots, d_j$, then we can write
$$\begin{aligned}
x_n ={}& (a_{1,1} + a_{1,2}n + a_{1,3}n^2 + \cdots + a_{1,d_1}n^{d_1 - 1})\lambda_1^n \\
&+ (a_{2,1} + a_{2,2}n + a_{2,3}n^2 + \cdots + a_{2,d_2}n^{d_2 - 1})\lambda_2^n \\
&+ \cdots + (a_{j,1} + a_{j,2}n + a_{j,3}n^2 + \cdots + a_{j,d_j}n^{d_j - 1})\lambda_j^n + g(n).
\end{aligned}$$
We use the initial conditions to solve for the coefficients $a_{r,s}$.



The only thing we haven’t specified is how to find the particular solution. This can be a difficult process, because it requires some experience. Usually, the particular solution will look very similar to the non-homogeneous piece of the recurrence, $f(n)$. Here are some common forms of particular solutions:
• If $f(n) = d_k n^k + d_{k-1}n^{k-1} + \cdots + d_1 n + d_0$, then the particular solution will be of the form
$$p_n = e_k n^k + e_{k-1}n^{k-1} + \cdots + e_1 n + e_0.$$
To determine the values of $e_i$, we simply substitute this into the recurrence and match coefficients.

• If $f(n) = r^n$, then the particular solution will be of the form
$$p_n = C \cdot r^n.$$

• If $f(n) = n^k \cdot r^n$, then the particular solution will be of the form
$$p_n = r^n(e_k n^k + e_{k-1}n^{k-1} + \cdots + e_1 n + e_0).$$
Again, we must substitute this into the recurrence and match coefficients to determine the values of $e_i$.
In the above example, since the non-homogeneous part is $f(n) = n - 1$, we guess that the particular solution is a linear polynomial, say $a_n = cn + d$. Substituting this into $a_{n+2} = a_{n+1} + a_n + (n - 1)$, we find
$$c(n + 2) + d = (c(n + 1) + d) + (cn + d) + (n - 1).$$
Matching coefficients of $n$, we find $c = 2c + 1$, and matching constant terms, we find $2c + d = (c + d) + d - 1$, or rather $2c + d = c + 2d - 1$. From the coefficient of $n$, we find $c = -1$; plugging this into the second equation gives $-2 + d = -1 + 2d - 1$, which reduces to $d = 0$. Therefore, the particular solution is $x_n = -n$. The homogeneous part of the recurrence is exactly the Fibonacci recurrence, so the general solution is $a_n = b_n - n$, where $b_{n+2} = b_{n+1} + b_n$. Since $b_0 = a_0 + 0 = 0$ and $b_1 = a_1 + 1 = 1$, the sequence $b_n$ is the Fibonacci sequence, and $a_n = F_n - n$, as we found above.
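A quick numeric check of the final answer $a_n = F_n - n$; a sketch:

```python
# Iterate a_{n+2} = a_{n+1} + a_n + n - 1 and the Fibonacci numbers in
# parallel, and compare with a_n = F_n - n.
F, a = [0, 1], [0, 0]
for m in range(2, 20):
    F.append(F[m - 1] + F[m - 2])
    a.append(a[m - 1] + a[m - 2] + (m - 2) - 1)  # recurrence with n = m - 2

assert all(a[n] == F[n] - n for n in range(20))
print(a[:9])  # [0, 0, -1, -1, -1, 0, 2, 6, 13]
```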
Chapter 6

Bijections

6.1 Introduction
The key idea behind using bijections is as follows: if we have a set A of objects we’d like to count that is difficult to manage, we can find a set B with the same number of elements that is easier to count. Then we count B, and conclude that A has |B| elements.

Definition 6.1.1. A bijection is a function $f : A \to B$ (mapping from A to B) such that $f$ is one-to-one and onto. The function $f$ being one-to-one means that if $a_1, a_2 \in A$ are not equal, then $f(a_1) \ne f(a_2)$. The function $f$ being onto means that if $b \in B$, then we can find $a \in A$ such that $f(a) = b$.

If f is a bijection, then we have a one-to-one correspondence between the elements of A and those of B:
for each a ∈ A we can find one b ∈ B it maps to, and for each b ∈ B we can find one a ∈ A that maps to it.
The implication is that if we can find a bijection f : A → B between two finite sets, then the sets A and B
must have the same number of elements.

Example 6.1.1. There exists a bijection $f_1$ between the arrangements of 10 stars and 3 bars, and the non-negative solutions $(a, b, c, d)$ to $a + b + c + d = 10$. The bijection takes in an arrangement of stars and bars as input, and outputs the counts of stars between consecutive bars. For example:

$f_1$ maps $\star\star\star \mid \star\star\star\star\star \mid\mid \star\star$ to $(3, 5, 0, 2)$.

We conclude that the number of non-negative solutions to $a + b + c + d = 10$ is $\binom{13}{3}$.

Furthermore, there exists a bijection $f_2$ between walks along the sides of a $3 \times 4$ grid of squares going up and to the right, and the arrangements of 4 R’s and 3 U’s. The bijection takes a walk as input and sends it to the arrangement of R’s and U’s corresponding to the individual steps of the walk. For example:

$f_2$ maps the walk with steps right, up, right, right, up, right, up to RURRURU.

We conclude that the number of walks from the lower-left corner to the upper-right corner of a $3 \times 4$ grid is $\binom{7}{3}$.
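The bijection $f_1$ can be implemented directly: choose the 3 bar positions among 13 slots and read off the star counts between them. Here is a sketch (our own illustration) that also confirms the count:

```python
from itertools import combinations
from math import comb

# An arrangement of 10 stars and 3 bars is a choice of the 3 bar positions
# among 13 slots; f_1 reads off the star counts between consecutive bars.
solutions = set()
for bars in combinations(range(13), 3):
    cuts = (-1,) + bars + (13,)
    counts = tuple(cuts[i + 1] - cuts[i] - 1 for i in range(4))
    solutions.add(counts)

assert all(len(s) == 4 and sum(s) == 10 for s in solutions)
assert len(solutions) == comb(13, 3)  # f_1 is one-to-one and onto
print(len(solutions))  # 286
```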

Mastery of bijections comes with experience more so than through theoretical knowledge, so today we will be focusing primarily on examples.


6.2 Examples
Example 6.2.1. (HMMT; 2007) On the Cartesian grid, Johnny wants to travel from (0, 0) to (5, 1), and he wants to pass through all twelve points in the set S = {(i, j) | 0 ≤ i ≤ 5, 0 ≤ j ≤ 1, i, j ∈ Z}. Each step, Johnny may go from one point in S to another point in S by a line segment connecting the two points. How many ways are there for Johnny to start at (0, 0) and end at (5, 1) so that he never crosses his own path?

Solution: Suppose the path goes to some point (a, 0), for a ≥ 1. Then the path must have visited every point (1, 0), (2, 0), . . . , (a − 1, 0) previously, or else the path wouldn’t be able to access these points later without crossing itself. Similarly, if the path goes to some point (b, 1), it must have visited all of (0, 1), (1, 1), . . . , (b − 1, 1) previously. Therefore the order in which the path visits the points on the bottom is fixed, as is the order in which it visits the top points. Therefore, the path is entirely determined by whether each next step is a top point or a bottom point.
As a result, we prove that the paths biject to the arrangements of five T ’s and five B’s in a row. Given
any arrangement of T ’s and B’s, we can construct a valid path from (0, 0) to (5, 1) by moving to the first
un-visited point in the corresponding row. This will visit ten points, at which point we move to (5, 1) to
complete the path. This creates a one-to-one mapping f from arrangements of T ’s and B’s to valid paths.
Conversely, we can take a valid path and construct an arrangement of T ’s and B’s, which f will then map
back to the valid path. Therefore f is one-to-one and onto, and hence a bijection between these two sets of
objects.

The number of arrangements of five T’s and five B’s is $\binom{10}{5} = 252$, so there are 252 valid paths from (0, 0) to (5, 1). □

Example 6.2.2. (Purple Comet; 2020) A deck of eight cards has cards numbered 1, 2, 3, 4, 5, 6, 7, 8,
in that order, and a deck of five cards has cards numbered 1, 2, 3, 4, 5, in that order. The two decks are
riffle-shuffled together to form a deck with 13 cards with the cards from each deck in the same order as they
were originally. Thus, numbers on the cards might end up in the order 1122334455678 or 1234512345678
but not 1223144553678. Find the number of possible sequences of the 13 numbers.

Solution: We claim that there is a bijection between riffle-shuffled sequences and rearrangements of the string AAAAAAAABBBBB, where as we move from left to right, the total number of A’s that have appeared is always at least as many as the number of B’s that have appeared.
To see this, suppose we have a riffle-shuffled sequence. Then whenever we see a number for the first time
(from left to right), we replace it with an A, and when we see a number for the second time, we replace it
with a B. The resulting string will certainly have eight A’s and five B’s, because exactly five numbers appear
twice, and eight numbers appear once. Moreover, when the kth B is written, then k was just written for a
second time, so the kth A must have been written before this. Therefore, the number of A’s written will
always be at least as many as the number of B’s written. Therefore, this process converts the shuffled deck
into a string of the desired type.
On the other hand, if we are given a string of 8 A’s and 5 B’s with the desired property, we can undo
this process by simply replacing the kth A by the number k and the kth B by the number k. The resulting
sequence will be riffle-shuffled, because it will have resulted from the riffle shuffle where the deck with cards
1–8 has the letter A on the back of the cards, and the deck with cards 1–5 has the letter B on the back of
the cards.
Thus we have a bijection, and we will count the resulting sequences with block walking. Suppose that A
represents moving one unit to the right and B represents moving one unit up. Then we wish to compute
the number of paths from (0, 0) to (8, 5) that never move above the line y = x (so that the number of A’s is
always at least as many as the number of B’s). We can count this using the grid below.
                             42  132  297  572
                        14   42   90  165  275
                    5   14   28   48   75  110
                2    5    9   14   20   27   35
           1    2    3    4    5    6    7    8
      1    1    1    1    1    1    1    1    1

Thus there are 572 such paths, so there must be 572 possible riffle sequences. 
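The grid above can be regenerated with a short dynamic program (the function name is mine): each cell is the sum of the cell to its left and the cell below, exactly as in block walking.

```python
def paths_below_diagonal(right, up):
    """Number of gridline paths from (0, 0) to (right, up) that never go
    above the line y = x, filled in exactly like the block-walking grid."""
    ways = [[0] * (up + 1) for _ in range(right + 1)]
    ways[0][0] = 1
    for x in range(right + 1):
        for y in range(up + 1):
            if y > x or (x, y) == (0, 0):
                continue  # points above the diagonal are unreachable
            ways[x][y] = ways[x - 1][y] if x > 0 else 0   # arrive by a right step
            if y > 0:
                ways[x][y] += ways[x][y - 1]              # arrive by an up step
    return ways[right][up]

print(paths_below_diagonal(8, 5))  # 572, matching the grid
```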

6.3 Partitions
As another example of bijections, we turn to integer partitions.

Definition 6.3.1. A partition of a positive integer n is a way of writing n as a sum of positive integers
n = λ1 + λ2 + · · · + λk , where λ1 ≥ λ2 ≥ · · · ≥ λk . For example, n = 4 has five partitions:

4, 3 + 1, 2 + 2, 2 + 1 + 1, 1 + 1 + 1 + 1.

We say that the λi are the parts of the partition. The number of partitions of n is denoted as p(n), so
p(4) = 5.

The partitions of a positive integer n are of interest in particular because there is not an easy way to
count them. There is no known closed-form expression, and while there is a recursive expression (see Euler’s
Pentagonal Number Theorem), it becomes progressively more complicated. When we prove results about
partitions, we often resort to the use of bijections.
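Even without a closed form, p(n) is easy to compute by dynamic programming. Here is a small sketch (not from the text; the function name is mine) that processes parts in increasing order, so each partition is built exactly once:

```python
def partition_count(n):
    """Count partitions of n: introduce parts 1, 2, ..., n in turn, so each
    partition is counted once, with its parts in a canonical order."""
    ways = [1] + [0] * n
    for part in range(1, n + 1):
        for total in range(part, n + 1):
            ways[total] += ways[total - part]
    return ways[n]

print([partition_count(n) for n in range(1, 8)])  # [1, 2, 3, 5, 7, 11, 15]
```

In particular, partition_count(4) returns 5, matching p(4) = 5 above.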
One of the most common tools in studying partitions from a combinatorial perspective is the use of
Ferrers diagrams. A Ferrers diagram is a pictorial representation of a partition using an array of dots. If
the partition is λ1 + λ2 + · · · + λk , then we draw a diagram with λ1 dots in the first row, λ2 dots in the
second row, and so forth. All rows are aligned to the left. For example, the Ferrers diagram corresponding
to 7 + 5 + 4 + 3 + 3 + 1 + 1 is shown below.

• • • • • • •
• • • • •
• • • •
• • •
• • •
•
•

As an example of the utility of Ferrers diagrams, consider the next example.

Example 6.3.1. Prove that the number of partitions of n whose largest part is k is equal to the number of
partitions of n with k parts.

Proof. We introduce the notion of the conjugate of a partition. Given a partition λ of n, we say that its
conjugate λ′ is the partition resulting from flipping the Ferrers diagram across its main diagonal (the dashed
line in the diagram below).

λ = 6 + 5 + 4 + 2 + 1 + 1 + 1          λ′ = 7 + 4 + 3 + 3 + 2 + 1

Suppose we define f (λ) = λ′ to be the function that takes a partition to its conjugate. Note that if λ has
largest part k, then when λ is conjugated, this largest part rotates to become the number of parts. Therefore,
f maps partitions of n with largest part k to partitions of n with k parts. Conversely, f maps partitions of
n with k parts to partitions of n with largest part k. We claim that f creates a one-to-one correspondence
between the two sets of partitions.
To see this, let A be the set of all partitions of n with largest part k, and let B be the set of all partitions
of n with k parts. Note that f (f (λ)) = λ (taking the conjugate of the conjugate of a partition will flip it
back to itself).¹ In particular, it follows that f : A → B must be one-to-one, because if f (λ1 ) = f (λ2 ), we
can apply f to both sides, finding λ1 = f (f (λ1 )) = f (f (λ2 )) = λ2 . Also, f : A → B must be onto, because
if λ ∈ B, then f (λ) ∈ A, and f (f (λ)) = λ. Hence f is a bijection, so A and B must have the same size. We
conclude that the number of partitions of n whose largest part is k is equal to the number of partitions of n
with k parts.
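The conjugate is simple to compute directly from the list of parts. Here is a sketch (the function name is mine) that checks both the worked example above and the involution property f (f (λ)) = λ:

```python
def conjugate(parts):
    """Conjugate of a partition given as a weakly decreasing list of parts:
    column i of the Ferrers diagram has one dot for every part larger than i."""
    return [sum(1 for p in parts if p > i) for i in range(parts[0])]

lam = [6, 5, 4, 2, 1, 1, 1]
print(conjugate(lam))             # [7, 4, 3, 3, 2, 1], as in the diagram
print(conjugate(conjugate(lam)))  # back to [6, 5, 4, 2, 1, 1, 1]
```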

The above proof introduced the notion of the conjugate of a partition. Based on this, we make the
following definition.

Definition 6.3.2. A partition is self-conjugate if λ = λ′ (i.e., the partition is its own conjugate).

Our next example demonstrates a bijection involving self-conjugate partitions.

Example 6.3.2. Prove that the number of self-conjugate partitions of n is equal to the number of partitions
of n whose parts are all odd and distinct.

Proof. Given the Ferrers diagram of a self-conjugate partition, we highlight the dots that are in either the
first row or first column. Then we highlight the dots in either the first remaining row or the first remaining
column, and we continue doing this until we’ve covered all of the dots. Straightening out each of these
L-shapes, we create a new partition, as shown below right.

Since the partition is self-conjugate, each L-shape is symmetric, so each L-shape has the same number of
dots extending downward as it does extending to its right. It follows that each L-shape contains an odd
number of dots. Also, as we move right/down through the L-shapes, we see that the size of the L’s must
decrease strictly, because each successive L has one fewer row and one fewer column it can occupy. Thus
this operation will convert a self-conjugate partition into a partition with odd distinct parts.
On the other hand, we can reverse this process—given a partition with odd distinct parts, we circle the
middle dot of each row, and fold the row into an L-shape with vertex at the circled dot.
¹ If a function satisfies f (f (x)) = x, it is called an involution. It is fairly easy to prove that an involution is bijective.

Note that this will undo the original operation. This ensures that the original operation is one-to-one,
because if two self-conjugate partitions mapped to the same odd distinct partition, then the reversal would
have to take them back to the same self-conjugate partitions.
Also, by symmetry, the reverse operation takes an odd distinct partition to a self-conjugate partition.
Thus the original operation is onto, because if we apply the reverse operation to an odd distinct partition,
then when we apply the original operation again to the newly self-conjugate partition, we will return to the
same odd distinct partition. Hence this operation is a bijection, so the number of self-conjugate partitions
of n is equal to the number of partitions of n into odd distinct parts.
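The correspondence can be confirmed numerically for small n; in the sketch below the generator and helper names are mine, not from the text.

```python
def partitions(n, max_part=None):
    """Yield the partitions of n as weakly decreasing tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def is_self_conjugate(parts):
    """Compare a partition with its conjugate (column counts of its diagram)."""
    conj = tuple(sum(1 for p in parts if p > i) for i in range(parts[0]))
    return conj == parts

for n in range(1, 16):
    self_conj = sum(1 for lam in partitions(n) if is_self_conjugate(lam))
    odd_distinct = sum(1 for lam in partitions(n)
                       if all(p % 2 == 1 for p in lam) and len(set(lam)) == len(lam))
    assert self_conj == odd_distinct
print("counts agree for n = 1..15")
```

For instance, n = 8 has two partitions of each kind: 4 + 2 + 1 + 1 and 3 + 3 + 2 are self-conjugate, while 7 + 1 and 5 + 3 have odd distinct parts.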
Example 6.3.3. Prove that the number of partitions of n into odd parts is equal to the number of partitions
of n into distinct parts.
Proof. In order to turn a partition with odd parts into a partition with distinct parts, we perform the
following process:
• If two numbers in the partition are equal, you may combine them into a single part to form a new
partition.
• Continue repeating the above step until you can no longer continue.
For example, given the partition 5 + 5 + 3 + 3 + 3 + 1 + 1 + 1 + 1, the process might look something like this:

(5 + 5) + (3 + 3) + 3 + (1 + 1) + (1 + 1) → 10 + 6 + 3 + 2 + 2 → 10 + 6 + 3 + (2 + 2) → 10 + 6 + 3 + 4.

Hence the process converts the partition 5 + 5 + 3 + 3 + 3 + 1 + 1 + 1 + 1 into 10 + 6 + 4 + 3, which has
distinct parts.
At this point, we might have two questions:
(1) Will this process end, and if so, will it yield a partition with distinct parts?
(2) If we start with a particular partition λ, will every choice of steps end up with the same partition?
To answer the first question, note that in each step of the process, the total number of parts decreases by
exactly one, so the process cannot continue forever. Also, if any two numbers are equal in the final result,
then by the definition of the process, we could combine them using the process, contradicting the fact that
it is the final result. Therefore, the process will certainly yield a partition with distinct parts.
To answer (2), we will pay attention to the largest odd divisor of each part. Suppose that we combine
two numbers of the form (2k − 1) · 2^m, so 2k − 1 is the largest odd divisor of each number. Then

(2k − 1) · 2^m + (2k − 1) · 2^m = (2k − 1) · 2^(m+1),

so when we combine the two numbers, the largest odd divisor will remain the same. For example, if our odd
partition has a number of 1’s and a number of 3’s, then the 1’s can combine into 2’s, any 2’s can combine into
4’s, and so on, but the largest odd divisor of any numbers that started with 1’s will always be 1. Similarly,
the 3’s can combine into 6’s, the 6’s can combine into 12’s, and so on, but the largest odd divisor will always
be 3. Therefore, you can never combine terms that descended from 1’s with terms that descended from 3’s,
or more generally, this is true for any distinct pair of odd numbers. So given a partition into odd parts, we
will group equal parts. For example, with 5 + 5 + 3 + 3 + 3 + 1 + 1 + 1 + 1, we group the parts as

(5 + 5) + (3 + 3 + 3) + (1 + 1 + 1 + 1).

Any number with largest odd factor (2k − 1) can be written in the form (2k − 1) · 2^m. Suppose that the
given partition of n into odd parts has a total of a parts equal to (2k − 1). Then these parts will combine
into numbers with largest odd factor (2k − 1), which are of the form (2k − 1) · 2^m. Hence at the end of the
process, we can write the sum of the resulting parts as

a · (2k − 1) = (2k − 1) · 2^(m1) + (2k − 1) · 2^(m2) + · · · + (2k − 1) · 2^(mr),

where m1 > m2 > · · · > mr (otherwise we could combine more parts). Hence a = 2^(m1) + 2^(m2) + · · · + 2^(mr).
But by the uniqueness of binary representations, there is a unique way to write a positive integer as the sum
of distinct powers of 2, so no matter how we go through the process, the numbers m1 , m2 , . . . , mr will be
the same. It follows that every choice of steps will end up with the same partition of n into distinct parts.
The above discussion also provides a nice way to undo the process—given a partition of n into distinct
parts, say λ = λ1 + λ2 + · · · + λk , we find the largest odd divisor of each part, so λi = (2ki − 1) · 2^(mi). Then
we split λi into

(2ki − 1) + (2ki − 1) + · · · + (2ki − 1),

with 2^(mi) copies of (2ki − 1) in total (so that the copies still sum to λi ).
Since λi came from parts with the same largest odd divisor, the odd parts from which it descended must all
be (2ki − 1). If we do this for each λi , we will convert a partition of n into distinct parts into a partition
of n into odd parts, and it exactly reverses the above process. Pairing these two processes together shows
that the process is a bijection, so the number of partitions of n into odd parts is the same as the number of
partitions of n into distinct parts.
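The merging process is easy to simulate. This sketch (the function names are mine) reproduces the worked example and checks, for small n, that the process maps the odd-part partitions onto the distinct-part partitions one-to-one:

```python
from collections import Counter

def merge_to_distinct(parts):
    """Repeatedly replace two equal parts by a single part of twice the size."""
    counts = Counter(parts)
    merged = True
    while merged:
        merged = False
        for value in sorted(counts):
            if counts[value] >= 2:
                counts[value] -= 2
                counts[2 * value] += 1
                merged = True
                break
    return tuple(sorted(counts.elements(), reverse=True))

def partitions(n, max_part=None):
    """Yield the partitions of n as weakly decreasing tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

print(merge_to_distinct([5, 5, 3, 3, 3, 1, 1, 1, 1]))  # (10, 6, 4, 3)

# for small n, merging maps odd-part partitions bijectively onto distinct-part ones
for n in range(1, 13):
    odd = [lam for lam in partitions(n) if all(p % 2 for p in lam)]
    distinct = {lam for lam in partitions(n) if len(set(lam)) == len(lam)}
    assert {merge_to_distinct(lam) for lam in odd} == distinct
    assert len(odd) == len(distinct)
```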

6.4 The Catalan Numbers


The second major construct we analyze today is the Catalan numbers. The primary interest behind this
sequence stems from the fact that it counts a remarkable variety of sets of objects, implying that there are
bijections between these sets.
Example 6.4.1. Compute the number Cn of paths from (0, 0) to (n, n) in the coordinate plane that:
• travel along the gridlines, and
• do not go above the line y = x.
Solution: We approach this problem by complementary counting: we find the number Un of paths from
(0, 0) to (n, n) that go above the line y = x at some point. Then the number Cn of paths that don't go
above y = x is

Cn = \binom{2n}{n} − Un .
To count Un , we create a bijection:

{all paths from (0, 0) to (n, n) going above y = x}  bijects to  {all paths from (0, 0) to (n − 1, n + 1)}.
To construct this bijection, we take a “bad” path from (0, 0) to (n, n) and alter it as follows. Consider the
first point P on the path that is above y = x. Then we reflect the entire path after P . For example, the
following bad paths from (0, 0) to (4, 4) reflect to paths from (0, 0) to (3, 5):

[Figure: two bad paths from (0, 0) to (4, 4), each marked at its first point P above y = x and reflected after P into a path ending at (3, 5).]

To prove that this is a bijection for the general case, we must prove all of the following:

1. This reflection always gives a path from (0, 0) to (n − 1, n + 1),

2. It will give different paths from (0, 0) to (n − 1, n + 1) when given different bad paths,

3. It will give every possible path from (0, 0) to (n − 1, n + 1) as we reflect all the bad paths.

The first item is clear: since P is the first point above the line y = x, the path up to P contains one more
up move than right moves, so the remaining moves contain one more right move than up moves. Reflecting
swaps the remaining moves, so they now contain one more up move than right moves, giving n − 1 rights
and n + 1 ups in total. Therefore the reflected path will always end at (n − 1, n + 1).
The second item is also clear: the reflection is reversible, so if two bad paths have the same reflection,
then they will “un-reflect” to the same path, so they must have been the same bad paths originally.
The third item is a bit harder to see. It stems from the fact that the reflection is reversible. Given a path
from (0, 0) to (n − 1, n + 1), we can take the first point P above the line y = x, and reflect the entire path
after P . This will result in a path from (0, 0) to (n, n), which will be a bad path since it goes through P . If
we act the reflection process on this bad path, we’ll then get the original path from (0, 0) to (n − 1, n + 1).
This shows that every path to (n − 1, n + 1) is attainable through this process.

We conclude that this reflection process gives a bijection between bad paths and paths from (0, 0) to
(n − 1, n + 1). There are \binom{2n}{n−1} paths between these two points, so Un = \binom{2n}{n−1}. We
conclude that the number of good paths, from complementary counting, is

Cn = \binom{2n}{n} − \binom{2n}{n−1} = \binom{2n}{n} · (1 − n/(n + 1)) = (1/(n + 1)) · \binom{2n}{n}.
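The closed form can be cross-checked against a brute-force path count; the function names in this sketch are mine:

```python
from functools import lru_cache
from math import comb

def catalan(n):
    """Closed form derived above: C_n = binom(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)

@lru_cache(maxsize=None)
def good_paths(x, y, n):
    """Count paths from (x, y) to (n, n) moving right/up, never above y = x."""
    if (x, y) == (n, n):
        return 1
    total = 0
    if x < n:
        total += good_paths(x + 1, y, n)
    if y < x:                 # an up step keeps y <= x only when y < x
        total += good_paths(x, y + 1, n)
    return total

print([catalan(n) for n in range(1, 7)])           # [1, 2, 5, 14, 42, 132]
print([good_paths(0, 0, n) for n in range(1, 7)])  # the same list
```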


To end the session, we give some bijections that demonstrate what sorts of objects the Catalan numbers
count.

Example 6.4.2. A balanced string of parentheses of length 2n is a string of n open and n closed parentheses
in which each open parenthesis is paired with a closed parenthesis appearing later in the string; equivalently,
every prefix contains at least as many open parentheses as closed ones. Find a bijection between these strings
and the paths from (0, 0) to (n, n) not going above the line y = x.

Solution: We do not rigorously prove the bijection, but we describe it here. Each open parenthesis
corresponds to moving to the right in the path to (n, n), and each closed parenthesis corresponds to moving
up. Try proving that this gives a bijection. 
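A quick machine check of this correspondence for n = 4 (helper names are mine): generate the balanced strings of length 8, map each one to a path, and confirm that every path stays at or below y = x and that there are 14 of them.

```python
from itertools import combinations

def balanced_strings(n):
    """All strings of n '(' and n ')' in which every prefix has at least
    as many '(' as ')'."""
    for open_positions in combinations(range(2 * n), n):
        s = [')'] * (2 * n)
        for i in open_positions:
            s[i] = '('
        depth = 0
        for ch in s:
            depth += 1 if ch == '(' else -1
            if depth < 0:
                break
        else:
            yield ''.join(s)

def to_path(s):
    """The claimed bijection: '(' is a right step, ')' is an up step."""
    x = y = 0
    points = [(0, 0)]
    for ch in s:
        x, y = (x + 1, y) if ch == '(' else (x, y + 1)
        points.append((x, y))
    return points

count = 0
for s in balanced_strings(4):
    assert all(y <= x for x, y in to_path(s))  # never above y = x
    count += 1
print(count)  # 14, the Catalan number C_4
```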

Example 6.4.3. Consider the set of full binary trees with 2n + 1 nodes: these are binary trees in which each
interior node has exactly two children, each of which is either an interior node or a leaf node. For example,
the following is a full binary tree for n = 6:

Construct a bijection between full binary trees with 2n + 1 nodes, and walks from (0, 0) to (n, n) not going
above the line y = x.

Solution: Again, we only outline the bijection, and we leave the rigorous proof to the reader. We
traverse the tree and label each node starting at the top root node, following the paradigm that:

• If at an interior node, we map out its left subtree then its right subtree,

• Each time we hit a leaf node, we go back to the most recent incomplete interior node and go to its right
subtree.
For example, for the tree in the example diagram, we have the following numbering:
[Figure: the example tree with its nodes numbered 1–13 in this traversal order.]

Now given such a numbering, we look at the numbers from 1 to 2n. If the corresponding node is an interior
node, the path from (0, 0) to (n, n) goes to the right. If the corresponding node is a leaf, the path goes up.
For example, the path for the tree labeling above is as shown:

Try proving that this process gives a bijection between full binary trees with 2n + 1 nodes, and such block
walks. 
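The count of full binary trees can also be computed from the recursion suggested by their structure: the root's two subtrees split the remaining interior nodes between them. This sketch (the function name is mine) confirms that the counts match the Catalan numbers:

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def full_trees(interior):
    """Full binary trees with `interior` interior nodes (2*interior + 1 nodes
    total): the root is interior, and the remaining interior nodes are split
    between its left and right subtrees in every possible way."""
    if interior == 0:
        return 1  # a single leaf
    return sum(full_trees(k) * full_trees(interior - 1 - k) for k in range(interior))

print([full_trees(n) for n in range(1, 7)])  # [1, 2, 5, 14, 42, 132]
```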
Chapter 7

Invariants and Monovariants

7.1 Invariants
Today, we begin our discussion of combinatorial methods of proof for proving possibility or impossibility.
The first idea is the idea of an invariant.

Definition 7.1.1. An invariant is a quantity or quality that never changes after a process is performed.

An invariant helps simplify a problem by focusing on one particular aspect of the process. A typical
setting involves a game or a process, where at each step, there is a certain set of allowable moves. Then we
wish to determine if it is possible to move from an initial configuration to a different configuration in a finite
number of steps.
If we can determine a quantity that is unchanged after each allowable move and show that the two
configurations have different values of this quantity, then we can conclude that it is impossible to achieve
the second configuration in any finite series of steps. This will be true regardless of the choices we make.
When using an invariant, instead of having to consider all possible choices of moves, we simply prove that
the two configurations can be put into completely different categories based on the invariant.
The first type of invariant we will talk about is parity. One way to describe parity is the property of
being even or odd—we could say that the parity of the integer 4 is even, while the parity of the integer 5 is
odd. However, we also think of it as the process of separating objects into two categories.

Example 7.1.1. Six chips are placed on an 8 × 8 grid in the arrangement shown below left. Every minute,
each chip must move to a horizontally or vertically adjacent square (chips can occupy the same square). Is
it possible that after some period of time that the chips can be arranged as shown below right?

Solution: Suppose we color the grid in a checkerboard pattern as shown below.


Note that in the initial arrangement, four chips are on black squares and two chips are on white squares.
Every minute, a chip on a black square must move to a white square, and a chip on a white square must
move to a black square. Let Bn be the number of chips on black squares after n minutes, and let Wn
be the number of chips on white squares after n minutes. Then Bn = Wn−1 and Wn = Bn−1 . Hence
|Bn − Wn | = |Bn−1 − Wn−1 |. Hence the quantity |Bn − Wn | is invariant—it must always be the same value.
Initially, we know that |B0 − W0 | = |4 − 2| = 2. However, if the arrangement matches the right board
after n minutes, then |Bn − Wn | = |3 − 3| = 0. Since we initially have |B0 − W0 | = 2, and since |Bn − Wn | is
an invariant, it is impossible to obtain an arrangement with |Bn − Wn | = 0. Thus the desired arrangement
is impossible. 
Imposing colorings on grids is a useful tactic to take advantage of parity. However, we can also take
advantage of more traditional usages of parity, where we simply look at the evenness or oddness of a number.
Example 7.1.2. Six 0’s and five 1’s are written on a whiteboard. At each step we erase two of the numbers
and:
• If the two numbers are equal, we write 1 on the board.
• Otherwise we write 0.
After 10 steps, a single number remains. What is that number?
Solution: While the numbers in this problem are small enough that we could try an example, that
would not be a proof. In a proof problem, you need to find more than the answer—we would also need to
prove why we always obtain that answer, regardless of the steps we take.
To do this, we look at the parity of the number of 0’s on the list. Initially, we have six 0’s on the list.
Then for each type of replacement, we find
• If we replace 0, 0 by 1, then the number of 0’s decreases by 2.
• If we replace 1, 1 by 1, then the number of 0’s stays the same.
• If we replace 0, 1 by 0, then the number of 0’s stays the same.
Hence the number of 0’s on the list will either stay the same or decrease by 2 at each step, so the number of
0’s on the list will always be even—we say that the parity of the number of 0’s on the list is invariant. Hence
when there is one number remaining on the list, the only way to have an even number of 0’s is if there are
zero 0’s. Hence the final number will be 1. 
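Because the argument shows the result is independent of the choices made, a randomized simulation must always produce 1. A small sketch (the function name is mine):

```python
import random

def play(board, rng):
    """Repeatedly erase two random numbers, writing 1 if they are equal
    and 0 otherwise, until one number remains."""
    board = list(board)
    while len(board) > 1:
        i, j = rng.sample(range(len(board)), 2)
        a, b = board[i], board[j]
        for k in sorted((i, j), reverse=True):
            board.pop(k)
        board.append(1 if a == b else 0)
    return board[0]

rng = random.Random(0)
results = {play([0] * 6 + [1] * 5, rng) for _ in range(500)}
print(results)  # {1}: every order of play ends with 1
```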
Often, invariants will depend on the computation of some related statistic, by which we mean an algebraic
quantity that can be computed for each step. Consider the following example.
Example 7.1.3. An ordered triple of numbers is given. It is permitted to perform the following operation
on the triple: to change two of them, say a and b, to (a + b)/√2 and (a − b)/√2. Is it possible to obtain the
triple (1, 2, 1 + √2) from the triple (2, √2, 1/√2) using this operation?

Solution: Note that

((a + b)/√2)² + ((a − b)/√2)² = a² + b².

Therefore, the sum of the squares of the entries in the triplet is an invariant.
We find that

1² + 2² + (1 + √2)² = 8 + 2√2

and

2² + (√2)² + (1/√2)² = 13/2.

Therefore, as the sums of the squares of the two triples are not equal, this is impossible. 
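Reading the two triples as (2, √2, 1/√2) and (1, 2, 1 + √2) (reconstructing the radicals from the typeset original), we can watch the invariant numerically; the helper name below is mine:

```python
import math
import random

def apply_operation(triple, i, j):
    """Replace entries i and j of the triple by (a+b)/sqrt(2) and (a-b)/sqrt(2)."""
    t = list(triple)
    a, b = t[i], t[j]
    t[i], t[j] = (a + b) / math.sqrt(2), (a - b) / math.sqrt(2)
    return tuple(t)

start = (2.0, math.sqrt(2), 1 / math.sqrt(2))
target = (1.0, 2.0, 1 + math.sqrt(2))
invariant = sum(x * x for x in start)  # 13/2

rng = random.Random(0)
t = start
for _ in range(1000):
    i, j = rng.sample(range(3), 2)
    t = apply_operation(t, i, j)
    # the sum of squares is preserved (up to floating-point error)
    assert abs(sum(x * x for x in t) - invariant) < 1e-6

print(invariant, sum(x * x for x in target))  # 6.5 versus 8 + 2*sqrt(2)
```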
The next example combines the previous ideas of parity and an invariant statistic.
Example 7.1.4. Suppose that n is odd, and that (a1 , a2 , . . . , an ) is a permutation of (1, 2, . . . , n). Prove
that
(a1 − 1)(a2 − 2) · · · (an − n)
is even.
Solution: Note that a1 + a2 + · · · + an = 1 + 2 + · · · + n, regardless of the chosen permutation. Therefore,
the sum of the numbers in the product, or

(a1 − 1) + (a2 − 2) + · · · + (an − n) (1)

must always be equal to 0, i.e., it will always be even.


We claim that at least one of the terms a1 − 1, a2 − 2, . . . , an − n is even. Otherwise, if all of the terms
are odd, then (1) will consist of the sum of an odd number of odd numbers, which is odd. This contradicts
the fact that the sum in (1) is equal to 0. Hence at least one of these terms is even, which means that their
product must be even. 
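For small odd n the claim can also be verified exhaustively; this quick check is not a substitute for the proof:

```python
from itertools import permutations
from math import prod

# for odd n, every permutation of (1, ..., n) gives an even product
for n in (1, 3, 5):
    for perm in permutations(range(1, n + 1)):
        value = prod(a - i for i, a in enumerate(perm, start=1))
        assert value % 2 == 0
print("verified for n = 1, 3, 5")
```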
We also can turn to number theory and the tools of modular arithmetic to help us form invariants. These
extend the use of parity and allow us to work with smaller numbers. The following is an example of an
invariant.
Proposition 7.1.1. A number is congruent to the sum of its base-ten digits mod 9.
Proof. Suppose that the number has digits an an−1 an−2 . . . a0 . Then its base-ten expansion is

an · 10^n + an−1 · 10^(n−1) + an−2 · 10^(n−2) + · · · + a1 · 10 + a0 .

Since 10 ≡ 1 (mod 9), taking this mod 9 we find

an · 10^n + an−1 · 10^(n−1) + · · · + a1 · 10 + a0 ≡ an + an−1 + · · · + a1 + a0 (mod 9).

It follows that an an−1 an−2 . . . a0 ≡ an + an−1 + an−2 + · · · + a1 + a0 (mod 9), which is what we wanted to
prove.
Suppose that we denote S(n) to be the sum of the base-ten digits of the positive integer n. It can be
shown that the sequence n, S(n), S(S(n)), S(S(S(n))), . . . is a non-increasing sequence of positive integers, which
decreases strictly until reaching a one-digit number (at which point it is constant). The resulting constant
value is called the digital root of n. By the above proposition, a number is congruent to its digital root mod
9.
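A small sketch of the digital root (the function name is mine), checking the congruence on a few values:

```python
def digital_root(n):
    """Repeatedly take the digit sum until a single digit remains."""
    while n >= 10:
        n = sum(int(d) for d in str(n))
    return n

# a positive integer is congruent to its digital root mod 9
for n in (7, 81, 12345, 999999937):
    assert (digital_root(n) - n) % 9 == 0

print(digital_root(12345))  # 1+2+3+4+5 = 15, then 1+5 = 6
```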
As an application, consider the following problem.
Example 7.1.5. (IMO; 1975) When 4444^4444 is written in decimal notation, the sum of its digits is A. Let
B be the sum of the digits of A. Find the sum of the digits of B. (A and B are written in decimal notation.)

Solution: By the above proposition, 4444^4444 ≡ A ≡ B ≡ S(B) (mod 9), where S(B) is the sum of the
digits of B. We wish to find S(B). We claim that it is the digital root of 4444^4444 .

To prove this, we will roughly estimate the size of each number. Note that

4444^4444 < 10000^4444 = 10^(4·4444) = 10^17776 .



Since 4444^4444 has at most 17,776 digits, each of which is at most 9, we know that A ≤ 9 · 17,776 <
9 · 18,000 = 162,000. Therefore, the sum of the digits of A is at most 1 + 9 + 9 + 9 + 9 + 9 = 46, so B ≤ 46.
Hence S(B) ≤ 4 + 9 = 13.
On the other hand, 4444^4444 ≡ (4 + 4 + 4 + 4)^4444 ≡ 7^4444 (mod 9). Then 7³ ≡ 1 (mod 9) and
4444 ≡ 1 (mod 3), so 7^4444 ≡ 7^1 (mod 9). It follows that S(B) ≡ 7 (mod 9). The only positive integer
less than or equal to 13 that is 7 (mod 9) is 7, hence S(B) = 7. 
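Python's arbitrary-precision integers let us check the answer directly (the helper name is mine); this is a verification, not part of the intended solution:

```python
def digit_sum(n):
    """Sum of the base-ten digits of n."""
    return sum(int(d) for d in str(n))

A = digit_sum(4444 ** 4444)
B = digit_sum(A)
print(digit_sum(B))        # 7
print(pow(4444, 4444, 9))  # 7, the mod-9 computation from the solution
```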

7.2 Monovariants
Monovariants are similar to invariants in that they are a way to study iterative processes. More formally,
we can define them as follows.

Definition 7.2.1. A monovariant is a quantity that either always increases or always decreases after a
process is performed.

Similar to invariants, we often compute some sort of algebraic statistic, and we show that this statistic
must always increase or always decrease. This is especially useful in situations such as the following:

• If the problem asks whether we can return to the initial configuration: a strictly changing statistic can
never return to its original value.

• If we wish to show that such a process cannot go on forever: for example, if a certain statistic is always a
positive integer and it is decreasing at each step, then the process cannot go on forever.

The following illustrates an algebraic monovariant.

Example 7.2.1. Suppose that we have n ≥ 3 numbers on a blackboard, none of which is equal to 0. Every
minute, you erase two numbers, a and b, replacing them with the numbers a + b/2 and b − a/2. Is it possible,
after a finite amount of time, to have the same n numbers on the blackboard as you had in the beginning?

Solution: Given two terms a and b, the sum of the squares of the terms is a² + b². The sum of the
squares of the replaced numbers is

(a + b/2)² + (b − a/2)² = (5/4)(a² + b²).

This is greater than or equal to the sum of the squares of the previous terms, with equality if and only if
a = b = 0. Hence the sum of the squares of the numbers on the blackboard is a monovariant—it is always
at least as large as the previous sum of squares.
To prove that we cannot obtain the same n numbers, we use the fact that none of the initial numbers is
0. Therefore, after the first step, the sum of the squares of the terms will be strictly larger than the sum of
the squares of the terms in the previous step. After this step, the sum of the squares never decreases, so we
can never return to our initial set of n numbers. 
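The algebraic identity behind this monovariant can be spot-checked numerically:

```python
import random

rng = random.Random(1)
for _ in range(1000):
    a = rng.uniform(-100, 100)
    b = rng.uniform(-100, 100)
    new_sum = (a + b / 2) ** 2 + (b - a / 2) ** 2
    # (a + b/2)^2 + (b - a/2)^2 = (5/4)(a^2 + b^2): the cross terms +ab and -ab cancel
    assert abs(new_sum - 1.25 * (a * a + b * b)) < 1e-6
print("identity verified on 1000 random pairs")
```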
Using a monovariant on the next problem provides a nice way to turn some of our intuitive thinking into
something more rigorous.

Example 7.2.2. A certain mansion has 144 rooms. A total of 2020 people come to a party at the mansion,
and they play the following game. Every minute, one person leaves the room that they are in, and they move
to a room that has at least as many people as their previous room had in it. The party will end as soon as
everyone is in the same room. Prove that the party will end in a finite amount of time.

Proof. First, we could note an invariant—the number of people in the mansion is always 2020. However,
this does not appear to be very helpful.
Since the numbers are so large, we might investigate by testing out smaller numbers. Certainly, if we
have two rooms, the result is obvious—if a room has more people in it, then the people in the other room
will always come to this room. In fact, no one can ever leave the room with a higher number of people.
However, when we move to three rooms, this is no longer true—for example, if the numbers of people in each

room are initially 2, 3, 4, then the room with 4 people in it might not always be the largest number—after
the first minute, the numbers might be 1, 4, 4, in which case either the second or third room could have the
most people after a series of steps. However, as a general rule, it appears that people prefer to be in rooms
where there are more people.
As people move to rooms with more people, the number of possible interactions will always increase.
We could measure this by counting how many handshakes could take place. For example, suppose that two
rooms have a and b people in them, respectively, where a ≤ b. A person who moves from the first room to the
second room will go from shaking a − 1 hands to shaking b hands, a difference of b − (a − 1) = (b − a) + 1 ≥ 1.
Therefore, whenever a person moves to a larger room, the total number of possible handshakes will increase
by at least one. Hence the total number of possible handshakes is a strictly increasing monovariant.
Since at most \binom{2020}{2} handshakes can occur among the 2020 people, we know that the total number of
handshakes is bounded above, so at some point, we will not be able to increase our monovariant, i.e., no
one will be able to move out of their room. However, if we have two nonempty rooms with a and b people
with a ≤ b, then certainly at least one person can move out of their room. Therefore, the only way that our
monovariant could not increase is if all people are in the same room. Therefore, eventually everyone will be
in the same room, at which point the party will end. In particular, our proof shows that it must end after
no more than \binom{2020}{2} minutes (nearly 4 years!).

The next problem shows how looking at extreme values (i.e. maximums or minimums) can be exploited
when using monovariants.

Example 7.2.3. We have a deck of n cards numbered 1, 2, . . . , n. They are shuffled in random order. At
each step, if the top card is k, then the order of the first k cards is reversed. For example, the following
represents one step in this process, where the top number on the deck is initially 4.

452316 → 325416

Prove that from any initial shuffling, card 1 will eventually be at the top of the deck.

Proof. To approach this problem, we would initially try some cases. We are more interested in cases that
take a longer time to return to the final state. Consider the following shuffling, with the process illustrated:

3 1 4 5 2 → 4 1 3 5 2 → 5 3 1 4 2 → 2 4 1 3 5 → 4 2 1 3 5 → 3 1 2 4 5 → 2 1 3 4 5 → 1 2 3 4 5.

This arrangement has some interesting properties—when 1 is at the top of the deck, all of the cards are in
order. However, this need not happen in general. For example, 2 1 4 3 5 → 1 2 4 3 5 does not end with all
cards in order. Another observation is that when 5 is the top card, it moves to the bottom of the deck, and
it stays at the bottom of the deck. Since there are no other numbers that reverse the bottom card, it cannot
be moved again. After 5 is at the bottom of the deck, a similar thing happens with 4—when it is moved to
the second-to-last position, it cannot be moved again. This has the feel of a monovariant—first we found 5
couldn’t move, then we found that 4 couldn’t move.
To extend this idea, suppose that the sequence t0 , t1 , t2 , t3 , . . . is defined so that tk is the top card
on the deck after k steps. Let M (k) = max{tk , tk+1 , tk+2 , . . . }. This is well defined, because the cards take
values between 1 and n inclusive, so this set contains only finitely many distinct values. Since each successive
M (k) is the maximum over a subset of the previous set, it follows that

M (1) ≥ M (2) ≥ M (3) ≥ · · · . (1)

This is a monovariant, but it would help if we knew it was strictly decreasing. Given a particular maximum
value M (k) > 1, we know that there exists some j ≥ k such that the top number on the deck after j steps
is M (k), i.e., tj = M (k). We claim that M (j + 1) < M (k). It suffices to show that M (k) can never be
achieved again as the top card. Since the top number on the deck is M (k), we know that M (k) is placed in
the M (k)th position when the first M (k) cards are reversed. Therefore, the only way that it can be moved
again is if the top number on the deck is greater than or equal to M (k). But since M (k) is fixed in the
deck until it is moved, the top number that first causes M (k) to move cannot be M (k). Therefore, the top
number must be strictly greater than M (k) at some later time. But this contradicts the fact that M (k) was

the largest remaining value of the top card. Therefore, M (k) can never be moved again, and in particular,
it can never appear as the top card again. Hence M (j + 1) < M (k).
Thus for each term in (1), there exists a strictly smaller value later on in the sequence, so we can form
a strictly decreasing subsequence. But all terms are positive integers, so this cannot continue indefinitely.
Therefore, since we can keep on decreasing until M (k) = 1, there must be some point in the sequence where
M (k) = 1, and after this point, the top number on the deck is always 1. Therefore, 1 will eventually appear
at the top of the deck.
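This shuffling process is easy to simulate directly, which is a good way to build intuition alongside the proof. A quick sketch in Python (the function name is ours); the `while` loop terminates precisely because of the monovariant argument above:

```python
from itertools import permutations

def steps_until_one_on_top(deck):
    """Repeatedly reverse the first t cards, where t is the current top card,
    until card 1 reaches the top; return the number of steps taken."""
    deck = list(deck)
    steps = 0
    while deck[0] != 1:
        t = deck[0]
        deck[:t] = deck[t - 1::-1]   # reverse the top t cards
        steps += 1
    return steps

# The example in the text, 3 1 4 5 2, takes 7 steps, and every shuffle
# of a 5-card deck terminates.
assert steps_until_one_on_top([3, 1, 4, 5, 2]) == 7
assert all(steps_until_one_on_top(p) >= 0 for p in permutations(range(1, 6)))
```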
The next example demonstrates a geometric monovariant.
Example 7.2.4. Nine unit squares of a 10 × 10 square are infected. At each step, the unit squares that have
two infected neighbors become infected themselves. Can the infection spread into all the unit squares?

Solution: We claim that the sum of the perimeters of the contiguous infected regions forms a monovariant (by perimeter, we mean the number of unit edges that border exactly one infected cell).
Initially, each infected square can contribute at most 4 to the sum of the perimeters. Therefore, the sum of
the perimeters of the infected regions at the start is at most 4 · 9 = 36.
If a square becomes infected, then at least two of its neighbors were infected, and up to rotations and reflections, there are only four ways this can happen: the two infected neighbors are opposite each other, the two infected neighbors are adjacent, three neighbors are infected, or all four neighbors are infected.

Note that in the first and second cases, the net change to the perimeter is 0, because the newly infected cell
removes two edges from the perimeter (they are now inside the infected region) and adds two edges to the
perimeter. In the third case, the net change to the perimeter is −2, because the newly infected cell removes
three edges and adds one edge to the perimeter. In the fourth case, the net change to the perimeter is −4,
because the newly infected cell removes four edges and adds zero edges to the perimeter. It follows that
when we add a new infected square, the perimeter cannot increase.
In particular, since the perimeter of the 10 × 10 square is 40, we can never infect the entire 10 × 10 square.
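The perimeter monovariant can also be checked by simulation. The sketch below (function names and the random starting set are ours) runs the infection to a fixed point and asserts that the perimeter never increases and the board never fills:

```python
import random

NEIGHBORS = ((-1, 0), (1, 0), (0, -1), (0, 1))

def perimeter(infected):
    """Number of unit edges bordering exactly one infected cell."""
    return sum((r + dr, c + dc) not in infected
               for r, c in infected for dr, dc in NEIGHBORS)

def spread(infected, n=10):
    """One step: uninfected cells with at least two infected neighbors become infected."""
    new = set(infected)
    for r in range(n):
        for c in range(n):
            if (r, c) not in infected:
                nbrs = sum((r + dr, c + dc) in infected for dr, dc in NEIGHBORS)
                if nbrs >= 2:
                    new.add((r, c))
    return new

random.seed(0)
infected = set(random.sample([(r, c) for r in range(10) for c in range(10)], 9))
while True:
    p, nxt = perimeter(infected), spread(infected)
    assert perimeter(nxt) <= p <= 36   # the monovariant never increases
    if nxt == infected:
        break
    infected = nxt
assert len(infected) < 100             # the whole 10 x 10 board is never infected
```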

Chapter 8

The Pigeonhole Principle

8.1 Introduction
Our next method of combinatorial proof is the Pigeonhole Principle. Sometimes, this is called Dirichlet’s
Box Principle, because the mathematician Peter Gustav Lejeune Dirichlet wrote a book on the technique,
where he referred to using boxes. Eventually people started calling it the Pigeonhole Principle—a pigeonhole
is an open space in a desk, cabinet, or wall that is used for keeping letters or papers. However, we often
think of the Pigeonhole Principle as putting a certain number of pigeons into a certain number of holes (or
boxes). Formally, the Pigeonhole Principle states the following:

Theorem 8.1.1. (Pigeonhole Principle) If n + 1 pigeons are distributed into n holes, then there must exist
some hole with at least 2 pigeons.

While this statement may appear self-evident, we provide a brief proof.

Proof. Suppose for sake of contradiction that every hole contains fewer than 2 pigeons, i.e., 1 or fewer pigeons.
Then there are at most n · 1 = n pigeons, which is a contradiction. Therefore, some hole must contain at
least 2 pigeons.

While seemingly simple, the Pigeonhole Principle can be a very powerful tool, especially when we use
more abstract notions of pigeons and holes. Before we proceed into some examples, we state two other
versions of the Pigeonhole Principle.

Theorem 8.1.2. (Pigeonhole Principle, Part 2) If n − 1 pigeons are distributed into n holes, then there
must exist some hole with 0 pigeons.

Theorem 8.1.3. (Pigeonhole Principle, Part 3) If kn + 1 pigeons are distributed into n holes, then there
must exist some hole with at least k + 1 pigeons.

The Pigeonhole Principle has some surprising consequences. For example, we could make the following statement:

Example 8.1.1. The maximum number of hairs on a person’s head is about 500, 000. Therefore, there are
at least two people in New York City (population around 8.4 million) that have the same number of hairs on
their head.

While the statement of the Pigeonhole Principle makes it sound quite simple, it can be applied in very
intricate ways. Often the intricacy lies in how we choose to decide what objects are the pigeons and what
objects are the holes. Sometimes, the pigeons represent physical objects that we are categorizing into different
properties, and the possible properties are the holes. Consider the following example.

Example 8.1.2. At a dinner party, 12 guests are seated at a round table. Unfortunately, everyone’s nametag
was covered up, so everyone sat in the wrong seats. Prove that you can rotate the guests around the table
such that at least two guests are sitting in the correct spot.


Proof. Certainly, by rotating person #k to their correct spot, we can guarantee that at least one guest will
be in the correct seat. But how can we guarantee that two guests will be in the correct seat? Thinking in terms of the Pigeonhole Principle, we will use the 11 possible nontrivial rotations of the guests as the boxes (we do not count the original position, since no guest starts in the correct seat). For each guest, there is exactly one rotation that brings them to their correct seat. We give each guest a token, and each guest places their token in the box representing the rotation for which they are in the correct position. Therefore, a total of 12 tokens are placed into 11 boxes.
By the Pigeonhole Principle, some box must contain at least 2 tokens, which represents at least two
people sitting in the correct position for that rotation. Thus there must be some rotation for which at least
two guests are in the correct positions.
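The token-counting here can be replayed on random wrong seatings. In the sketch below (modeling a seating as a derangement of positions; names ours), each guest is fixed by exactly one of the 11 nontrivial rotations, so some rotation fixes at least two guests:

```python
import random

def random_wrong_seating(n=12):
    """A seating in which no guest occupies their own seat (a derangement)."""
    while True:
        seats = list(range(n))            # seats[i] = guest sitting at position i
        random.shuffle(seats)
        if all(seats[i] != i for i in range(n)):
            return seats

random.seed(1)
for _ in range(1000):
    seats = random_wrong_seating()
    # For each of the 11 nontrivial rotations, count guests who land correctly.
    correct = [sum((i + r) % 12 == seats[i] for i in range(12))
               for r in range(1, 12)]
    assert sum(correct) == 12    # each guest is fixed by exactly one rotation
    assert max(correct) >= 2     # so some rotation fixes at least two guests
```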
Example 8.1.3. Among the 12 guests at the dinner party, some are friends and some are strangers. As-
suming that friendship is always mutual (i.e., if you are my friend, then I am your friend), prove that two
people at the party have the same number of friends at the party.
Proof. Note that everyone has a certain number of friends at the party, ranging from 0 to 11. Therefore,
using “# of friends” as boxes, we can place the 12 people into 12 boxes. It seems like we’re stuck, because
the Pigeonhole Principle requires us to have n + 1 pigeons and n holes. However, note that it is impossible
for one person to have 0 friends and another person to have 11 friends, because on the one hand, there would
be a person who is friends with no one, and on the other hand, there would be a person who is friends with
everyone. This is impossible. Therefore, we use a single box for having 0/11 friends.
Thus we now have 12 people that we are distributing into 11 boxes, so two people must be in the same
box. Thus these two people will have the same number of friends at the party.
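This conclusion is easy to check on random friendship graphs; a quick sketch (not a proof):

```python
import random

random.seed(2)
for _ in range(500):
    n = 12
    # A random mutual friendship relation on 12 people.
    friends = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            friends[i][j] = friends[j][i] = random.random() < 0.5
    degrees = [sum(friends[i]) for i in range(n)]
    # Two people always have the same number of friends.
    assert len(set(degrees)) < n
```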
The next example involves a geometric use of the Pigeonhole Principle.
Example 8.1.4. Prove that if we select any five points inside a unit square, then two of them must lie within √2/2 units of each other.
Proof. Suppose that we split the unit square into four congruent 1/2 × 1/2 squares. Using these four squares as boxes and the five points as pigeons, we know by the Pigeonhole Principle that some square must contain at least two points. The furthest distance between two points in a 1/2 × 1/2 square is the length of its diagonal, √2/2. Therefore, these two points are at most √2/2 units away from each other, so we have shown that such points must exist.
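A quick numerical sanity check (a spot check, not a proof) of the √2/2 bound:

```python
import math
import random

random.seed(3)
for _ in range(2000):
    pts = [(random.random(), random.random()) for _ in range(5)]
    closest = min(math.dist(p, q)
                  for i, p in enumerate(pts) for q in pts[i + 1:])
    # Some pair of the five points is always within sqrt(2)/2 of each other.
    assert closest <= math.sqrt(2) / 2 + 1e-12
```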
The next example is an elementary problem from Ramsey Theory. In this example, instead of putting
people into boxes, we will put relationships into boxes.
Example 8.1.5. Prove that among any 6 people, there are either at least 3 people who know each other or
3 people who are pairwise strangers (assume that “knowing someone” is symmetric).
Proof. It may be more convenient to think in terms of colors—if two people know each other, we say they have a green relationship; if two people are strangers, we say they have a red relationship.
Consider Person #1. There are five possible relationships that Person #1 has with other people, each of
which is colored red or green. By the Pigeonhole Principle, using the relationships as pigeons and the colors
as boxes, one of these two colors must have at least three relationships, i.e., Person #1 is either friends with
at least three people or strangers with at least three people. Without loss of generality (swapping colors if
necessary), assume that Person #1 has a green relationship with at least three people, which we call Persons
#2, #3, and #4. If any of these three people are friends with each other (say Person #2 and Person #3),
then we are done, because Persons #1, #2, and #3 will all know each other. Otherwise, none of Persons
#2, #3, and #4 will know each other, so they are all pairwise strangers, so we are once again done.
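Since there are only $2^{15} = 32768$ two-colorings of the 15 pairs among 6 people, this claim can also be confirmed by brute force, and the sharpness of 6 can be exhibited with a 5-cycle. A sketch (function name ours):

```python
from itertools import combinations, product

def has_mono_triangle(n, color):
    """color maps each pair (i, j) with i < j to 0 (stranger) or 1 (friend)."""
    return any(color[(a, b)] == color[(a, c)] == color[(b, c)]
               for a, b, c in combinations(range(n), 3))

pairs = list(combinations(range(6), 2))          # the 15 pairs among 6 people
assert all(has_mono_triangle(6, dict(zip(pairs, bits)))
           for bits in product([0, 1], repeat=15))

# With only 5 people the conclusion can fail: make the 5-cycle edges green
# and the diagonals red; neither color contains a triangle.
five = {(a, b): int((b - a) % 5 in (1, 4))
        for a, b in combinations(range(5), 2)}
assert not has_mono_triangle(5, five)
```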

The next example is a tricky example of how we can construct pigeons to meet our needs.
Example 8.1.6. Prove that any set of integers with n elements has a non-empty subset where the sum of
elements of the subset is divisible by n.
Proof. Suppose that the set is {x1 , x2 , . . . , xn }. Let Sk = {x1 , x2 , . . . , xk }. If any of the sums of the Sk ’s
are divisible by n, then we are done. Otherwise, we may assume that the n sums, sum(S1 ), sum(S2 ),
. . . , sum(Sn ) have nonzero residues mod n. However, there are only n − 1 nonzero residues mod n, hence
by the Pigeonhole Principle, there must be at least two subsets, say Si and Sj with i < j, such that
sum(Si ) ≡ sum(Sj ) (mod n).
However, since i < j, we know that Si is a proper subset of Sj , so Sj \ Si is a nonempty subset of the
original set. Furthermore, this also implies that

sum(Sj \ Si ) = sum(Sj ) − sum(Si ) ≡ 0 (mod n).

Hence Sj \ Si is a nonempty subset whose sum of elements is divisible by n. Thus such a subset must always
exist.
In this problem, looking at the partial sums allowed us to focus on the sum of a subset rather than
elements of the subset. Partial sums are often a useful trick in this scenario—we’ll come back to them in a
later example.
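The partial-sum argument is constructive and translates directly into an algorithm. A sketch (the function name is ours; we also include the empty prefix, which absorbs the "some $S_k$ is already divisible" case):

```python
def subset_with_sum_divisible(xs):
    """Return a non-empty run of consecutive elements whose sum is divisible
    by n = len(xs), using the prefix-sum pigeonhole argument above."""
    n = len(xs)
    seen = {0: 0}                  # residue of a prefix sum -> prefix length
    total = 0
    for j in range(1, n + 1):
        total += xs[j - 1]
        r = total % n
        if r in seen:              # two prefixes share a residue mod n
            i = seen[r]
            return xs[i:j]         # sum(S_j) - sum(S_i) is divisible by n
        seen[r] = j
    # Unreachable: n + 1 prefixes (counting the empty one) occupy n residues.

subset = subset_with_sum_divisible([3, 7, 11, 5, 2])
assert subset and sum(subset) % 5 == 0
```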
Our next example proves a surprising fact and demonstrates a use of the Pigeonhole Principle in number
theory.
Example 8.1.7. Prove that there is a Fibonacci number divisible by 2017.
Proof. In fact, we will prove something stronger—given a positive integer n, there exists a Fibonacci number
divisible by n. Experimenting with small cases, we see that the Fibonacci sequence appears to be periodic
mod n, but sometimes it might take some time before it repeats. For example, the Fibonacci sequence mod
5 is
1, 1, 2, 3, 0, 3, 3, 1, 4, 0, 4, 4, 3, 2, 0, 2, 2, 4, 1, 0, 1, 1, . . . .
We see that in this case, the Fibonacci sequence has period 20 mod 5. While it achieved 0 (mod 5) much
earlier, it takes a while for the Fibonacci sequence to start over again.
Indeed, since each term depends on the previous two terms, we will need to keep track of ordered pairs
of consecutive terms, (Fj , Fj+1 ). We see that there are n2 possible values for what the two terms in the
ordered pair could be mod n. Therefore, since the Fibonacci sequence is infinite, we can use ordered pairs
of consecutive Fibonacci numbers as an infinite number of pigeons, with the n2 possible ordered pairs of
remainders mod n as the holes, to see that there exists some j < k such that (Fj , Fj+1 ) ≡ (Fk , Fk+1 )
(mod n).
But this still does not necessarily prove that there is a Fibonacci number that is 0 (mod n)—it merely
proves that the Fibonacci sequence is eventually periodic mod n. However, we can use this in conjunction
with backtracking—note that if Fi+1 = Fi + Fi−1 , then Fi−1 = Fi+1 − Fi . Thus we can backtrack to find
previous values of the Fibonacci sequence. In particular, if Fj ≡ Fk (mod n) and Fj+1 ≡ Fk+1 (mod n),
then Fj−1 = Fj+1 − Fj and Fk−1 = Fk+1 − Fk , so Fj−1 ≡ Fk−1 (mod n). Therefore, inducting downward,
we can show that the Fibonacci sequence is completely periodic mod n—it doesn’t just eventually fall into
a repeating pattern, it will repeat the first terms of the sequence after a period of time.
Additionally, if F1 = F2 = 1, then using the backtracking equation, we find that F0 = F2 − F1 = 0,
so F0 ≡ 0 (mod n). While we did not start with F0 being defined, the backtracking equation allows us to
define F0 so that it matches the periodic pattern mod n. Additionally, since F0 ≡ 0 (mod n), and since the Fibonacci sequence is completely periodic mod n, it follows that there must be some later term in the sequence that
is also 0 (mod n). This number will be divisible by n, so we have proven that if n is a positive integer, then
there exists a Fibonacci number divisible by n.
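The argument is effective: scanning the Fibonacci sequence mod n must eventually hit a zero. A sketch (the function name is ours; we make no claim about how large the returned index is):

```python
def first_fib_index_divisible(n):
    """Smallest k >= 1 with F_k divisible by n, where F_1 = F_2 = 1."""
    a, b, k = 1, 1, 1              # a = F_k mod n, b = F_{k+1} mod n
    while a % n != 0:
        a, b = b, (a + b) % n
        k += 1
    return k

assert first_fib_index_divisible(5) == 5       # F_5 = 5
k = first_fib_index_divisible(2017)
# Recompute F_k mod 2017 independently to confirm divisibility.
a, b = 1, 1
for _ in range(k - 1):
    a, b = b, (a + b) % 2017
assert a == 0
```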
So far, all of our examples have focused on a discrete use of the pigeonhole principle—we’ve always
dealt with an integer number of pigeons and an integer number of holes. However, we can also apply the
Pigeonhole Principle in non-discrete settings, which we illustrate with the next version of the Pigeonhole
Principle.

Proposition 8.1.4. If a1 , a2 , . . . , an are real numbers such that a1 + a2 + · · · + an = S, then at least one of the ai ’s is greater than or equal to the average, i.e., ai ≥ S/n for some i.

We make use of this version of the Pigeonhole Principle in our next example.

Example 8.1.8. On a circular route, there are n identical cars. Together they have enough gas for one car
to make a complete tour. Prove that there is a car that can make a complete tour by taking gas from all the
cars that it encounters.

Proof. Suppose that we label the cars in order going in the direction of motion as c1 , c2 , . . . , cn .
Suppose that car ci has enough gas to cover a distance of di miles, and that the distance between cars ci
and ci+1 (where cn+1 is defined to be c1 ) is ei miles. Then d1 + d2 + · · · + dn and e1 + e2 + · · · + en must both
be equal to the total distance around the loop. In particular, since d1 + d2 + · · · + dn = e1 + e2 + · · · + en , it
follows that
(d1 − e1 ) + (d2 − e2 ) + · · · + (dn − en ) = 0.
Thus by the Pigeonhole Principle, at least one di − ei is greater than or equal to 0. If di − ei ≥ 0, then ci
has enough gas (di ) to travel the distance to car ci+1 (which is ei ).
We proceed by induction. Clearly, if n = 1, then the car has exactly enough gas to make the complete
tour. Assume that the result holds true for n = k cars. If n = k + 1, then by the above argument, at least
one car ci has enough gas to make it to the next car ci+1 . Therefore, we could just as soon give ci+1 ’s gas to ci at the start, because ci will be able to use all of ci+1 ’s gas. By removing ci+1 from the scenario, we now have k cars, so by the inductive hypothesis, some car can make a complete tour. Therefore, by induction,
for any positive integer number of cars, some car will be able to complete a full tour.
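A simulation over random configurations (with integer gas amounts so the totals match exactly) supports the claim; names ours:

```python
import random

def can_complete(ds, es, start):
    """Can car `start` finish the loop, collecting each car's gas on the way?
    ds[i] is car i's gas; es[i] is the distance from car i to car i + 1."""
    n, fuel = len(ds), 0
    for i in range(n):
        j = (start + i) % n
        fuel += ds[j] - es[j]      # pick up gas, then drive to the next car
        if fuel < 0:
            return False
    return True

random.seed(4)
for _ in range(300):
    n = 8
    ds = [random.randint(0, 10) for _ in range(n)]
    total = sum(ds)
    # Random gaps e_1, ..., e_n summing to the same total.
    cuts = sorted(random.randint(0, total) for _ in range(n - 1))
    es = [b - a for a, b in zip([0] + cuts, cuts + [total])]
    assert any(can_complete(ds, es, s) for s in range(n))
```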

8.2 Trickier Examples


The next three examples illustrate some novel uses of the Pigeonhole Principle. First, we illustrate another
use of the partial sum method.

Example 8.2.1. While on a four-week vacation, Herbert will play at least one set of tennis each day, but
he will not play more than 40 sets total during this time. Prove that no matter how he distributes his sets
during the four weeks, there is a span of consecutive days during which he will play exactly 15 sets.

Proof. During his four-week vacation, there are 28 days, so suppose that Herbert plays xi sets on day i,
where 1 ≤ i ≤ 28, and xi ≥ 1 for all i. Then define si = x1 + x2 + · · · + xi to be the ith partial sum of the
number of sets played. We know that s28 ≤ 40. We will use the residue classes mod 15 as boxes to hold the si . In the diagram below, each column is one box: the numbers in a column are exactly the values from 1 to 40 in one residue class mod 15.

31 32 33 34 35 36 37 38 39 40
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15

Note that if si < sj are in the same box, where the box is one of the last five boxes, then sj − si = 15. In particular, in this scenario Herbert would play exactly 15 sets between days i + 1 and j inclusive. Therefore, we assume for sake of contradiction that at most one of the si lies in each of these last five boxes. Then Herbert must place at least 28 − 5 = 23 of the si into the first ten boxes. By the Pigeonhole Principle, at least one of the first ten boxes must contain at least three si . Since the si are distinct, these three must be the three values r, r + 15, and r + 30 of that box, so there exist si < sj with sj − si = 15. As before, Herbert will play exactly 15 sets between days i + 1 and j, inclusive. Therefore, such a period of time must always exist.
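The prefix-sum search in this proof is algorithmic: given any valid schedule, two prefix sums differing by 15 can be found directly. A sketch (names ours):

```python
import random

def span_with_fifteen_sets(sets_per_day):
    """Return (i, j) such that days i + 1 through j hold exactly 15 sets."""
    prefix = [0]
    for x in sets_per_day:
        prefix.append(prefix[-1] + x)
    seen = {}
    for j, v in enumerate(prefix):
        if v - 15 in seen:
            return seen[v - 15], j
        seen[v] = j
    return None

random.seed(5)
for _ in range(1000):
    # At least one set per day for 28 days, at most 40 sets in total.
    days = [1] * 28
    for _ in range(random.randint(0, 12)):
        days[random.randrange(28)] += 1
    i, j = span_with_fifteen_sets(days)
    assert sum(days[i:j]) == 15
```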

The next example combines the idea of parity with the Pigeonhole Principle.

Example 8.2.2. A 6 × 6 board is tiled completely with dominoes. Prove that one can cut the board along one of the vertical or horizontal grid lines of the board so that no dominoes are cut.

Solution: First, note that each of the 18 dominoes crosses exactly one of the 10 interior grid lines (namely, the line separating its two squares). Since there are 10 grid lines, on average each grid line is crossed by 18/10 = 1.8 dominoes. Thus by the averaging form of the Pigeonhole Principle, at least one of the grid lines is crossed by fewer than 1.8 dominoes, i.e., by 1 or fewer dominoes.
We claim that each grid line is covered by an even number of dominoes. Each grid line splits the 6 × 6
grid into two rectangular regions, each with an even number of squares. A domino either covers two squares
in a region or one square in each region (in which case it covers the grid line). Since the total number of
squares in each region is even, the total number of dominoes that cover one square in each region must be
even. Hence there are an even number of dominoes that cover each grid line.
It follows that the grid line covered by fewer than 1.8 dominoes cannot be covered by just one domino,
so it must be covered by 0 dominoes. Hence when we cut the board along this grid line, no domino will be
cut. 
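Because the 6 × 6 board has only finitely many domino tilings (6728 of them), the statement can also be confirmed by exhaustive search. A backtracking sketch (function names ours):

```python
def tilings(n=6):
    """Enumerate all domino tilings of an n x n board, each as a list of
    ((r1, c1), (r2, c2)) dominoes, by branching on the first empty cell."""
    board = [[False] * n for _ in range(n)]
    tiling = []

    def first_empty():
        for r in range(n):
            for c in range(n):
                if not board[r][c]:
                    return r, c
        return None

    def rec():
        cell = first_empty()
        if cell is None:
            yield list(tiling)
            return
        r, c = cell
        for r2, c2 in ((r, c + 1), (r + 1, c)):   # place right or down
            if r2 < n and c2 < n and not board[r2][c2]:
                board[r][c] = board[r2][c2] = True
                tiling.append(((r, c), (r2, c2)))
                yield from rec()
                tiling.pop()
                board[r][c] = board[r2][c2] = False

    yield from rec()

def has_fault_line(tiling, n=6):
    """True if some grid line is crossed by no domino of the tiling."""
    h = {k: 0 for k in range(1, n)}   # horizontal lines between rows k-1 and k
    v = {k: 0 for k in range(1, n)}   # vertical lines between columns k-1 and k
    for (r1, c1), (r2, c2) in tiling:
        if r1 != r2:
            h[max(r1, r2)] += 1       # a vertical domino crosses a horizontal line
        else:
            v[max(c1, c2)] += 1       # a horizontal domino crosses a vertical line
    return 0 in h.values() or 0 in v.values()

count = 0
for t in tilings():
    assert has_fault_line(t)          # every 6 x 6 tiling has an uncut line
    count += 1
assert count == 6728                  # the known number of 6 x 6 domino tilings
```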
We end with this example, because it illustrates a unique use of pigeons and holes, but it also has an
unexpected answer.

Example 8.2.3. (Putnam; 1996) Suppose that each of 20 students has made a choice of anywhere from 0
to 6 courses from a total of 6 courses offered. Prove or disprove: there are 5 students and 2 courses such
that all 5 have chosen both courses or all 5 have chosen neither course.

Solution: Note that there are $2^6 = 64$ possible class schedules, so we certainly cannot guarantee that two students have exactly the same class schedule. However, for a given student, there is a certain set of pairs of courses for which they either take both courses or take neither course. For example, suppose that Student 1 takes Courses 1, 3, and 4 and does not take Courses 2, 5, or 6.

This student takes both courses in the pairs (1, 3), (1, 4), and (3, 4), and they take neither course in the pairs (2, 5), (2, 6), and (5, 6). For each of the $\binom{6}{2} = 15$ pairs of classes, we create a box, and we place the label (a, b) on the box (representing Course a and Course b). If a student takes both classes in a pair (a, b), they receive a red token to place in the (a, b) box; if the student takes neither class, they receive a blue token to place in the (a, b) box. For example, Student 1 will receive 3 red and 3 blue tokens, for a total of 6 tokens. In fact, we note the following:

• If a student takes all 6 courses, they will receive $\binom{6}{2} = 15$ tokens and place them in different boxes.

• If a student takes exactly 5 courses, they will receive $\binom{5}{2} = 10$ tokens and place them in different boxes.

• If a student takes exactly 4 courses, they will receive $\binom{4}{2} + \binom{2}{2} = 7$ tokens and place them in different boxes.

• If a student takes exactly 3 courses, they will receive $\binom{3}{2} + \binom{3}{2} = 6$ tokens and place them in different boxes.

• If a student takes exactly 2 courses, they will receive $\binom{2}{2} + \binom{4}{2} = 7$ tokens and place them in different boxes.

• If a student takes exactly 1 course, they will receive $\binom{5}{2} = 10$ tokens and place them in different boxes.

• If a student takes exactly 0 courses, they will receive $\binom{6}{2} = 15$ tokens and place them in different boxes.

Therefore, each of the 20 students will place at least 6 tokens into the 15 boxes, so at least 120 tokens will
be placed into boxes. Therefore, by the Pigeonhole Principle, some box must contain at least 120/15 = 8
tokens. By a second application of the Pigeonhole Principle, at least four tokens in this box must be the same color. However, this is not enough—this only proves that there must be four students who either take both classes or take neither class.
Our argument is not completely wasted, though—note that if 121 tokens are placed into boxes, then some box must contain at least 9 tokens, so in this box there must be at least five tokens of the same color, which corresponds to at least five students who either take both classes or take neither class. Every student contributes at least 6 tokens, and a student who does not take exactly 3 courses contributes at least 7, so if even one student does not take exactly 3 courses, then at least 19 · 6 + 7 = 121 tokens are distributed. It follows that if the statement is false, then each student must take exactly 3 courses.
There are $\binom{6}{3} = 20$ ways for a student to pick three courses, so suppose that we distribute every possible combination of three courses to the 20 students. Then no two students will have the same set of three courses.
Also, if we look at all students who take both course a and course b, then each such student takes exactly one of the remaining four courses, and no two of them can take the same one, so there are at most 4 students that take both course a and course b. Thus there cannot be five students who all take the same pair of courses. A symmetric argument applies to students who avoid the same two courses. Therefore, the statement is false. 
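The counterexample can be verified mechanically; a sketch:

```python
from itertools import combinations

# The 20 students take the 20 distinct 3-element subsets of the 6 courses.
students = [set(s) for s in combinations(range(6), 3)]
assert len(students) == 20

for a, b in combinations(range(6), 2):
    both = sum({a, b} <= s for s in students)
    neither = sum(not ({a, b} & s) for s in students)
    # For every pair of courses, only 4 students take both and 4 take neither.
    assert both == 4 and neither == 4
```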
The next example illustrates how the choices of pigeons and holes may be constructed in strange ways.
Example 8.2.4. The positive integers 1 to 101 are written down in any order. Prove that you can choose
11 of these numbers that are monotonically increasing or decreasing.
Solution: Suppose that the kth number is xk . For each k, define (ik , dk ) such that ik is the length of
the largest increasing subsequence starting with xk , and define dk to be the length of the largest decreasing
subsequence starting with xk .
It is impossible for two of these ordered pairs to be equal. Indeed, suppose there exist k < j with (ik , dk ) = (ij , dj ). If xk < xj , then any increasing subsequence starting with xj can be extended to a longer increasing subsequence starting with xk , so ik > ij , a contradiction. A similar argument with decreasing subsequences rules out xk > xj , and xk = xj is impossible because the numbers are distinct.
However, if ik , dk ≤ 10 for every k, then there are at most $10^2 = 100$ possible ordered pairs, so by the Pigeonhole Principle, as there are 101 integers, there must be two that are represented by the same ordered pair (ik , dk ). This contradicts the above paragraph. Hence there is at least one ordered pair (ik , dk ) where either ik ≥ 11 or dk ≥ 11, which is what we wanted to prove. 
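The pairs (ik , dk ) in this proof can be computed with a simple right-to-left dynamic program; a sketch (the function name is ours):

```python
import random

def monotone_lengths(xs):
    """For each k, the pair (i_k, d_k): lengths of the longest increasing and
    decreasing subsequences starting at x_k (entries assumed distinct)."""
    n = len(xs)
    inc, dec = [1] * n, [1] * n
    for k in range(n - 2, -1, -1):
        for j in range(k + 1, n):
            if xs[j] > xs[k]:
                inc[k] = max(inc[k], 1 + inc[j])
            else:
                dec[k] = max(dec[k], 1 + dec[j])
    return inc, dec

random.seed(6)
for _ in range(20):
    xs = random.sample(range(1, 102), 101)
    inc, dec = monotone_lengths(xs)
    assert len(set(zip(inc, dec))) == 101          # the pairs are distinct
    assert max(inc) >= 11 or max(dec) >= 11        # a monotone run of 11 exists
```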
Chapter 9

Double Counting

9.1 Introduction
We recently covered bijections, which is the tactic of showing that two different sets have the same size
to solve problems. Today we will cover a very similar idea called Double Counting, which is the tactic of
counting the same set, but in two different ways. The implication is that the two different calculations must
therefore be equal, which will let us derive interesting results.

Example 9.1.1. In how many ways can one choose a committee of k people from n − 1 senators and one
senate head?

Solution: Note that there are n distinct people, so a committee of k can be formed in $\binom{n}{k}$ ways.
Now notice that we can also count this using casework on whether or not we include the senate head in the committee. If we do not include the head, then there are $\binom{n-1}{k}$ ways to form the committee. If we do include the head, then there are $\binom{n-1}{k-1}$ ways to form the rest of the committee. Therefore there are a total of $\binom{n-1}{k-1} + \binom{n-1}{k}$ ways to form a committee of k people.
We have just computed the number of ways to form a committee of k people in two different ways, so the two counts we got must be equal. We can therefore conclude Pascal’s Identity to be true:

$$\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}.$$

This is the general technique behind Double Counting. The method stems from a simple idea, but
applications can become quite complicated: it may be difficult to choose which set to count, and it may be
difficult to find two different ways to count it. Today, we see several different examples of Double Counting
in action.
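Identities proved by Double Counting can always be spot-checked numerically; for Pascal's Identity, for instance (a sanity check, not a proof):

```python
from math import comb

# Spot-check Pascal's Identity over a range of n and k.
for n in range(1, 25):
    for k in range(1, n):
        assert comb(n, k) == comb(n - 1, k - 1) + comb(n - 1, k)
```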

9.2 Examples
The first two examples are from Number Theory, and we give them to demonstrate the power of combinatorics
when applied in other fields.
Example 9.2.1. Prove that for any positive integer n,

$$\sum_{d \mid n} \varphi(d) = n,$$

where φ(d) is Euler’s totient function, representing the number of positive integers less than or equal to d that are relatively prime to d.


Solution: The right-hand side being n suggests that we should be counting the set {1, 2, 3, . . . , n}. The
left-hand side is the sum of several different terms, all organized by the divisors d of n, so we should be
thinking about how to sort {1, 2, 3, . . . , n} by the divisors of n. Finally, we’re summing φ(d) for each divisor
d | n, so we should be thinking about numbers relatively prime to each divisor.
One way we can construct numbers relatively prime to d is by considering all fractions m/d, where gcd(m, d) = 1. Therefore, φ(d) counts all reduced fractions with denominator d. However, since d | n, we can rewrite each such fraction to have denominator n. So this suggests that we instead count the list

$$\frac{1}{n}, \frac{2}{n}, \frac{3}{n}, \dots, \frac{n}{n}.$$
This list clearly has n elements. Now suppose that we completely reduce each fraction. If a fraction has
denominator d | n, then its numerator will be relatively prime to d and at most d. Since there are φ(d)
possible numerators, we have φ(d) fractions with denominator d. This is true for all d | n, so the total
number of fractions must be $\sum_{d \mid n} \varphi(d)$.

The number of fractions is also n, so this sum must equal n. 
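Both the identity and the bookkeeping in this proof are easy to verify by direct computation; a sketch (the helper name is ours):

```python
from fractions import Fraction
from math import gcd

def phi(d):
    """Euler's totient function, by direct count."""
    return sum(gcd(m, d) == 1 for m in range(1, d + 1))

# The identity itself.
for n in range(1, 200):
    assert sum(phi(d) for d in range(1, n + 1) if n % d == 0) == n

# The proof's bookkeeping: reducing 1/n, ..., n/n sorts the list by
# denominator, with exactly phi(d) fractions landing on each divisor d of n.
n = 12
denoms = [Fraction(k, n).denominator for k in range(1, n + 1)]
assert all(n % d == 0 for d in denoms)
assert all(denoms.count(d) == phi(d) for d in set(denoms))
```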

Example 9.2.2. Let vp (j) denote the exponent of p in the prime factorization of j. Legendre’s Factorial Formula says that vp (n!) can be evaluated as the infinite sum

$$v_p(n!) = \left\lfloor \frac{n}{p} \right\rfloor + \left\lfloor \frac{n}{p^2} \right\rfloor + \left\lfloor \frac{n}{p^3} \right\rfloor + \cdots.$$

Prove this formula.
Solution: When we multiply two numbers, we add their exponents of p, so we can rewrite vp (n!) as

$$v_p(n!) = \sum_{i=1}^{n} v_p(i).$$

We visualize this sum by creating a grid of dots. The bottom axis will have numbers 1 through n, and above
each number i we place vp (i) dots. For example, for p = 2 and n = 8, we obtain the following diagram:

              •
      •       •
  •   •   •   •
1 2 3 4 5 6 7 8

Then vp (i) counts the number of dots in the ith column from the left. The sum of these numbers from 1 to
n will give vp (n!), so the total number of dots in this grid is vp (n!).
Now consider the dots in the kth row. A column i will have a dot in the kth row if and only if i is a multiple of $p^k$. There are $\lfloor n/p^k \rfloor$ numbers at most n that are multiples of $p^k$, so there are $\lfloor n/p^k \rfloor$ dots in the kth row. This is true for all positive integers k (including those for which $p^k > n$, in which case the kth row has no dots). Therefore, we can count the dots in the grid by summing over the rows, to get

$$v_p(n!) = \sum_{k=1}^{\infty} \left\lfloor \frac{n}{p^k} \right\rfloor.$$

This proves the Legendre Factorial Formula. 
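The two sides of Legendre's formula can be compared directly for small cases; a sketch (function names ours):

```python
from math import factorial

def vp(m, p):
    """Exponent of the prime p in m."""
    e = 0
    while m % p == 0:
        m //= p
        e += 1
    return e

def legendre(n, p):
    """Sum of floor(n / p^k) over k >= 1 (only finitely many nonzero terms)."""
    total, q = 0, p
    while q <= n:
        total += n // q
        q *= p
    return total

for p in (2, 3, 5, 7):
    for n in range(1, 60):
        assert vp(factorial(n), p) == legendre(n, p)
```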


The next few examples are more combinatorial in nature, and will line up with problems that you will
see in Olympiads.
Example 9.2.3. Twenty-five people form several committees. Each committee has five members, and any
two committees have at most one common member. Prove that there are at most 30 committees.

Solution: Since each committee has five members, we can count the pairs (m, c) of members m in committees c. If we count these over all committees, we will get a total count of 5c pairs, where c here denotes the total number of committees.
Now we count these pairs by looking at the members. Suppose we fix a person m, and look at all the
committees they can be in. Any two committees that m is in can only share one member, but they already
share m, so all of m’s committees cannot share any other people. There are only 24 other members and each
of m’s committees can only hold 4 of them, so m can only be in 24/4 = 6 committees at most, and thus only
in 6 pairs (m, c).
This is true for all twenty-five people, so if we count the pairs (m, c) over members we get at most
6 · 25 = 150 pairs. The number of pairs is also 5c, so 5c ≤ 150, and hence c ≤ 30. 

Example 9.2.4. (IMO; 1977) In a finite sequence of real numbers the sum of any seven consecutive terms
is negative and the sum of any eleven consecutive terms is positive. Determine the maximum number of
terms in the sequence.

Solution: Naı̈vely, we can say that if we have 77 elements in the sequence, then we can split them into
groups of 11 to get a positive sum, and groups of 7 to get a negative sum, which shows that having 77
elements is impossible. However, it is unlikely that we will be able to get 76 elements in our sequence, so we
need a stronger argument.
Instead, we organize the sequence elements in a different fashion to get an alternative Double Counting argument. We create a two-dimensional 7 × 11 array in which each row contains 11 adjacent terms and each column contains 7 adjacent terms. The following array has this property:
x1  x2  x3  x4  x5  x6  x7  x8  x9  x10 x11
x2  x3  x4  x5  x6  x7  x8  x9  x10 x11 x12
x3  x4  x5  x6  x7  x8  x9  x10 x11 x12 x13
x4  x5  x6  x7  x8  x9  x10 x11 x12 x13 x14
x5  x6  x7  x8  x9  x10 x11 x12 x13 x14 x15
x6  x7  x8  x9  x10 x11 x12 x13 x14 x15 x16
x7  x8  x9  x10 x11 x12 x13 x14 x15 x16 x17

Summing by the rows gives a positive total, and summing by the columns gives a negative total, so there
cannot exist 17 different elements. Therefore, there are at most 16 elements in the sequence.
Now the question is: How can we prove that the maximum is 16? It is possible to find an example
sequence with 16 elements using some experimentation, but here we present a solution that produces a
method of finding such sequences. We can express the conditions in the problem in terms of partial sums
of the sequence: if we let sn be the sum of the first n terms, then the conditions rewrite to sn > sn+7 and
sn < sn+11 . We can then use these inequalities in the following way:

s10 < s3 < s14 < s7 < 0 < s11 < s4 < s15 < s8 < s1 < s12 < s5 < s16 < s9 < s2 < s13 < s6 .

So if we let s10 = −4, s3 = −3, s14 = −2, . . . , and s6 = 12, then we can solve for the nth sequence element
as sn − sn−1 , and obtain a sequence that satisfies all of the conditions:

5, 5, −13, 5, 5, 5, −13, 5, 5, −13, 5, 5, 5, −13, 5, 5.

Therefore 16 is in fact the maximum possible length of the sequence. 
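It is straightforward to verify that the 16-term sequence produced above really does satisfy both conditions:

```python
seq = [5, 5, -13, 5, 5, 5, -13, 5, 5, -13, 5, 5, 5, -13, 5, 5]

assert len(seq) == 16
# Every 7 consecutive terms sum to a negative number...
assert all(sum(seq[i:i + 7]) < 0 for i in range(len(seq) - 6))
# ...and every 11 consecutive terms sum to a positive number.
assert all(sum(seq[i:i + 11]) > 0 for i in range(len(seq) - 10))
```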

Example 9.2.5. (IMO; 1989) Let n and k be positive integers and let S be a set of n points in the plane
such that

(i) No three points of S are collinear, and

(ii) For any point P of S there are at least k points of S equidistant from P .

Prove that

$$k < \frac{1}{2} + \sqrt{2n}.$$

Solution: The Double Counting will have to somehow take advantage of the points equidistant from each point P . Notice that these equal distances imply isosceles triangles, so we can approach this problem by counting the number of isosceles triangles among these points.
Consider the ordered triples (A, B, C) of points in S such that triangle ABC is isosceles with AB = AC. Let there be ∆ such triples. We can count these by considering the apex A: there are n choices for what A can be, and since at least k points of S are equidistant from A, there are at least k(k − 1) ordered choices for B and C. Hence there are at least nk(k − 1) such triples, so ∆ ≥ nk(k − 1).
We now count these triples by looking at the base corners B and C. Given any two points B and C, any
A must be on the perpendicular bisector of BC. However, since no three points in S are collinear, there can
only be at most two such points. There are n(n − 1) ways to choose B and C, and for each choice we have
at most two choices for A, so there are at most 2n(n − 1) such triples, so ∆ ≤ 2n(n − 1).
Combining these two inequalities yields

nk(k − 1) ≤ ∆ ≤ 2n(n − 1).



This implies that k^2 − k ≤ 2n − 2, so (k − 1/2)^2 ≤ 2n − 7/4 < 2n. Taking the square root yields k − 1/2 < √(2n), which proves that k < 1/2 + √(2n). □

Chapter 10

Generating Functions 1

10.1 Introduction
A generating function is a tool to encode counting problems using polynomials. The reason that polynomials
are useful is that they satisfy the distributive law, which corresponds nicely to combinatorial interpretations.
Before we use polynomials, suppose that we have three events that occur, call them A, B, and C (something
like tossing a coin or rolling a die). Suppose each event can take on two values; i.e., in event A, either a1 or
a2 happens (but not both). There are eight possibilities for how the three events could occur, namely,

a1 b1 c1 , a1 b1 c2 , a1 b2 c1 , a1 b2 c2 , a2 b1 c1 , a2 b1 c2 , a2 b2 c1 , a2 b2 c2 .

However, we could also represent these eight events using addition and the distributive law. Consider the
product
(a1 + a2 )(b1 + b2 )(c1 + c2 ),
where the three expressions in parentheses represent the events A, B, and C, respectively. Therefore, when we
expand the product using the distributive law, a term in the expansion looks like ai bj ck , where 1 ≤ i, j, k ≤ 2.
We obtain
a1 b1 c1 + a1 b1 c2 + a1 b2 c1 + a1 b2 c2 + a2 b1 c1 + a2 b1 c2 + a2 b2 c1 + a2 b2 c2 .
This looks exactly like the terms we obtained before. Therefore, we can use the distributive law as a sort of
bookkeeping tool.
Polynomials allow us to keep track of more information. In general, each event has a number of values
that it can take on. For example, in rolling a die, one can obtain any value of 1 through 6. We represent
this event with the generating function

D(x) = x + x2 + x3 + x4 + x5 + x6 ,

where the coefficient of xk represents the number of ways one can roll a k in a single toss of the die. What
happens if we roll two dice? We multiply D(x) by itself, finding

(x + x2 + x3 + x4 + x5 + x6 )(x + x2 + x3 + x4 + x5 + x6 ).

To find a term in the product using the distributive law, we take a term from the first set of parentheses,
and then multiply it by a term from the second set of parentheses, i.e., xi · xj = xi+j . This represents rolling
i on the first die, and j on the second die, for a total sum of i + j. Therefore, when the polynomials are
multiplied together, an xk appears every time the sum of the rolls is k. Hence the coefficient of xk is the
number of ways to roll a sum of k. With two dice, we find that

D(x)2 = x2 + 2x3 + 3x4 + 4x5 + 5x6 + 6x7 + 5x8 + 4x9 + 3x10 + 2x11 + x12 .

In particular, as the coefficient of x8 is 5, we conclude that there are 5 ways to roll a total of 8. Of course,
we could have easily found this by listing the possibilities ((2, 6), (3, 5), (4, 4), (5, 3), (6, 2)). Does this really
make the problem easier? We’ll come back to this after we discuss some more tools.
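Before moving on, the expansion of D(x)^2 above can be double-checked mechanically by convolving coefficient lists (a Python sketch, not part of the original text):

```python
# Coefficients of D(x) = x + x^2 + ... + x^6: die[k] = ways to roll k.
die = [0, 1, 1, 1, 1, 1, 1]

# Polynomial multiplication is a convolution of the coefficient lists.
prod = [0] * 13
for i, a in enumerate(die):
    for j, b in enumerate(die):
        prod[i + j] += a * b

print(prod)  # [0, 0, 1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]
```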


Example 10.1.1. Red candies come in packs of 1 or 2. Blue candies come in packs of 2 or 3. Green
candies come in packs of 1 or 3. Suppose that three children Alyssa, Brad, and Chris, each buy one pack of
candy. Find a generating function that describes all possible outcomes—in particular, when you expand your
generating function, the coefficient of xk should be the number of ways that the three children can collectively
buy k candies.

Solution: For each child, there are six possibilities: they can obtain a 1-pack in two ways (red or green),
a 2-pack in two ways (red or blue), and a 3-pack in two ways (blue or green). Therefore, we can represent
the event of one child with the generating function

(x + x + x2 + x2 + x3 + x3 ) = 2x + 2x2 + 2x3 .

Therefore, the generating function for all three children is given by

(2x + 2x2 + 2x3 )3 .


Expanding this might be a chore, and usually we won’t expand it. However, we may be interested in the
coefficients of particular terms. For this, we introduce the following notation.

Definition 10.1.1. Given a polynomial P(x), we use the notation [x^k]P(x) to denote the coefficient of x^k in P(x). For example,

[x^k](1 + x)^n = C(n, k)

by the Binomial Theorem.
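Computationally, if a polynomial is stored as its list of coefficients, then [x^k]P(x) is just an index lookup. The sketch below (the helper name `coeff` is our own, not from the text) builds (1 + x)^5 by repeated multiplication and checks its coefficients against the Binomial Theorem:

```python
from math import comb

def coeff(poly, k):
    """Return [x^k] of a polynomial given as a coefficient list."""
    return poly[k] if 0 <= k < len(poly) else 0

# Build (1 + x)^5 by multiplying by (1 + x) five times.
p = [1]
for _ in range(5):
    q = [0] * (len(p) + 1)
    for i, a in enumerate(p):
        q[i] += a      # the "1" part of (1 + x)
        q[i + 1] += a  # the "x" part of (1 + x)
    p = q

print([coeff(p, k) for k in range(6)])  # [1, 5, 10, 10, 5, 1]
assert all(coeff(p, k) == comb(5, k) for k in range(6))
```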

The advantage of using generating functions is that we can use the tools of algebra to simplify our
work. We will state some of these results now, as these will be crucial tools in your journey to becoming a
generatingfunctionologist. The first tools are the geometric series formulas for finite and infinite geometric
series.

Proposition 10.1.1. (Geometric Series) If x ≠ 1, then

(1 − x^n)/(1 − x) = 1 + x + x^2 + x^3 + · · · + x^(n−1). (G1)

Also, if |x| < 1, then

1/(1 − x) = 1 + x + x^2 + x^3 + · · · . (G2)
Additionally, we have various forms of the Binomial Theorem. The first three equations are all special cases of the last equation, though we may have to do some work to derive them. Recall our definition of C(n, k) in terms of falling factorials as C(n, k) = (n)_k / k!.

Proposition 10.1.2. (The Binomial Theorem)

1. (Regular Binomial Theorem) If n is a positive integer and x is a real number, then

   (1 + x)^n = C(n, 0) + C(n, 1)x + C(n, 2)x^2 + · · · + C(n, n)x^n. (B1)

2. (Negative Binomial Theorem, Version 1) If n is a positive integer and |x| < 1 is a real number, then

   (1 + x)^(−n) = C(−n, 0) + C(−n, 1)x + C(−n, 2)x^2 + · · · . (B2)

   Note that this is an infinite series!


10.1. INTRODUCTION 69

3. (Negative Binomial Theorem, Version 2) If n is a positive integer and |x| < 1 is a real number, then

   (1 − x)^(−n) = 1 + C(n, 1)x + C(n + 1, 2)x^2 + C(n + 2, 3)x^3 + · · · (B3)
               = 1 + C(n, n − 1)x + C(n + 1, n − 1)x^2 + C(n + 2, n − 1)x^3 + · · · (B3′)

   Again, this is an infinite series, and it is a different way of writing (B2).

4. (Generalized Binomial Theorem) If α is any real number and |x| < 1 is a real number, then

   (1 + x)^α = 1 + C(α, 1)x + C(α, 2)x^2 + C(α, 3)x^3 + · · · . (B4)

We will not prove the generalized binomial theorem, because the proof requires calculus. However, it is an essential tool in our toolbox. Briefly, we note that if n is a positive integer, then

C(−n, k) = (−n)_k / k! = [(−n)(−n − 1)(−n − 2) · · · (−n − (k − 1))] / k! = (−1)^k · C(n + k − 1, k).

This can be used to show the equivalence of (B2) and (B3).
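This identity is easy to spot-check numerically. Here is a Python sketch (the helper name `gen_binom` is our own) comparing the falling-factorial definition with the closed form:

```python
from math import comb, factorial

def gen_binom(alpha, k):
    """C(alpha, k) = (alpha)_k / k!, computed via a falling factorial."""
    prod = 1
    for j in range(k):
        prod *= alpha - j
    return prod // factorial(k)  # exact for integer alpha

# Check C(-n, k) = (-1)^k * C(n + k - 1, k) for a range of n and k.
for n in range(1, 8):
    for k in range(8):
        assert gen_binom(-n, k) == (-1) ** k * comb(n + k - 1, k)
print("identity holds for all tested n, k")
```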


As an example of the Negative Binomial Theorem, we will approach the problem of stars and bars in a
new light.

Example 10.1.2. Find a generating function that represents the number of nonnegative integer solutions
(a, b, c) to a + b + c = n.

Solution: Since a can be any nonnegative integer, we represent its possibilities with the generating
function
1
1 + x + x2 + x3 + · · · = . (1)
1−x
(Here, the term xk represents the possibility when a = k). Expanding

(1 + x + x2 + · · · )(1 + x + x2 + · · · )(1 + x + x2 + · · · ), (2)

let xa be the term from the first factor, xb be the term from the second factor, and xc the term from the
third factor. When multiplying these terms, we obtain the term xa+b+c in the expansion. Therefore, each
term xn in the expansion was created by multiplying xa · xb · xc , yielding a + b + c = n. So the number of
solutions to the equation a + b + c = n is the coefficient of xn in the expansion of (2).
However, we can also write (2) using the right side of (1), so it can more compactly be written as (1 − x)^(−3). Using the typical setting of stars and bars, we know that the number of solutions will be the same as the number of arrangements of n stars and 2 bars, which is just C(n + 2, 2). Indeed, if we compare this to (B3′) of the Negative Binomial Theorem when n = 3, we see that C(n + 2, 2) is exactly the coefficient of x^n. □
As an example of how helpful these tools are, we return to the problem of dice.

Example 10.1.3. Suppose that six dice are rolled. How many ways can you roll a sum of 27?

Solution: Recall from above that the generating function for a single die was D(x) = x+x2 +x3 +· · ·+x6 .
Therefore, the generating function representing the possible results of six dice is

D(x)6 = (x + x2 + x3 + x4 + x5 + x6 )6 .

Using geometric series,

x + x^2 + x^3 + x^4 + x^5 + x^6 = x(1 − x^6)/(1 − x).
70 CHAPTER 10. GENERATING FUNCTIONS 1

Therefore, we can write the above generating function as

( x(1 − x^6)/(1 − x) )^6 = x^6 (1 − x^6)^6 (1 − x)^(−6).

We want to find the coefficient of x^27 in this expansion. We find that

[x^27] x^6 (1 − x^6)^6 (1 − x)^(−6) = [x^21] (1 − x^6)^6 (1 − x)^(−6)
                                    = [x^21] (1 − 6x^6 + 15x^12 − 20x^18 + · · · )(1 − x)^(−6).

We need not worry about any other terms in the expansion of (1 − x^6)^6, because those terms have degree greater than 21 (and every term of (1 − x)^(−6) has nonnegative degree). Thus we can write the answer as

[x^21](1 − x)^(−6) − 6[x^15](1 − x)^(−6) + 15[x^9](1 − x)^(−6) − 20[x^3](1 − x)^(−6)
    = −C(−6, 21) + 6 C(−6, 15) − 15 C(−6, 9) + 20 C(−6, 3)
    = C(26, 5) − 6 C(20, 5) + 15 C(14, 5) − 20 C(8, 5)
    = 1,666.

This last step is computation heavy: the first term alone is C(26, 5) = 65,780, which doesn't seem practical to compute by hand. However, it is possible to improve this method using symmetry. In particular, the number of ways to roll a sum of 27 with six dice is the same as the number of ways to roll a sum of 15 with six dice, because you can replace each roll x with the roll 7 − x, and then the sum becomes 7 · 6 − 27 = 15.
It's much easier to compute the coefficient of x^15 in the expansion. We find

[x^15] x^6 (1 − x^6)^6 (1 − x)^(−6) = [x^9] (1 − x^6)^6 (1 − x)^(−6)
                                    = [x^9] (1 − 6x^6 + 15x^12 + · · · )(1 − x)^(−6)
                                    = (−1)^9 C(−6, 9) − 6 · (−1)^3 C(−6, 3)
                                    = C(14, 5) − 6 C(8, 5)
                                    = 2002 − 6 · 56
                                    = 1,666.

This simple step saved us from an enormous amount of computation. 
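Both answers can be confirmed by expanding D(x)^6 directly with a small dynamic program (a Python sketch, not part of the original solution):

```python
# Repeatedly convolve with the single-die polynomial x + x^2 + ... + x^6.
poly = [1]  # generating function for zero dice: just 1
for _ in range(6):
    new = [0] * (len(poly) + 6)
    for i, a in enumerate(poly):
        for face in range(1, 7):
            new[i + face] += a
    poly = new

print(poly[27], poly[15])  # 1666 1666
```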


Generating functions are particularly powerful when dealing with restrictions. Consider the following
example.
Example 10.1.4. (MATHCOUNTS; 2019) Chris flips a coin 16 times. Given that exactly 12 of the flips
land heads, what is the probability that Chris never flips five heads in a row? Express your answer as a
common fraction.

Solution: First, there are C(16, 12) sequences of 12 H's (representing heads) and 4 T's (representing tails). These each occur with equal probability.
To compute the number of such sequences where Chris never flips five heads in a row, we will use
generating functions. Since Chris flips four tails, his sequence will look like this:

_ T _ T _ T _ T _

where each of the five spaces (before, between, and after the T's) holds anywhere from 0 to 4 H's (at most 4, since five heads in a row are forbidden). Therefore, each space can be represented by the
generating function (1 + x + x2 + x3 + x4 ). Thus the generating function

(1 + x + x2 + x3 + x4 )5 (1)

represents all of the possible ways to place anywhere from 0 to 4 H’s in each space (where the coefficient of
xk represents the number of sequences with k H’s). Our sequence has 12 H’s, so we wish to determine the
coefficient of x^12 in (1). Using finite geometric series, we can write (1) as

( (1 − x^5)/(1 − x) )^5 = (1 − x^5)^5 (1 − x)^(−5). (2)

Therefore, as (1 − x^5)^5 = 1 − 5x^5 + 10x^10 − · · · , we see that the coefficient of x^12 in (2) will be

(−1)^12 C(−5, 12) − 5(−1)^7 C(−5, 7) + 10(−1)^2 C(−5, 2).

This can be rewritten as

C(16, 4) − 5 C(11, 4) + 10 C(6, 4) = 320.

Thus the desired probability is 320 / C(16, 4) = 16/91. □
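A direct count over the five gaps (a Python sketch, not part of the original solution) confirms both the 320 and the final probability:

```python
from fractions import Fraction
from itertools import product
from math import comb

# Each of the 5 gaps around the 4 tails holds 0..4 heads; count the ways
# to place exactly 12 heads in total.
count = sum(1 for gaps in product(range(5), repeat=5) if sum(gaps) == 12)

prob = Fraction(count, comb(16, 4))
print(count, prob)  # 320 16/91
```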

10.1.1 Sequences
We can also use generating functions to encode sequences. In general, given a sequence {an }, n = 0, 1, 2, . . . ,
we define the generating function A(x) of the sequence {an } to be

A(x) = a0 + a1 x + a2 x2 + · · · .

For example, the boring sequence an = 1 for all nonnegative integers n has the generating function

A(x) = 1 + x + x^2 + x^3 + · · · .

This is just a geometric series, with sum 1/(1 − x) for |x| < 1. Therefore, we say that the closed form of A(x) is A(x) = 1/(1 − x). Studying closed forms is nice, because it allows us to study the entire sequence at once. As an example, let's derive the Fibonacci generating function.

Example 10.1.5. Find the closed form of the generating function F (x) for the Fibonacci sequence defined
by F1 = F2 = 1 and Fn+1 = Fn + Fn−1 .

Solution: First, we write out the expansion of the generating function, so

F (x) = F1 x + F2 x2 + F3 x3 + · · · .

For each term with n ≥ 3, we can apply the recurrence, hence

F (x) = x + x2 + (F2 + F1 )x3 + (F3 + F2 )x4 + (F4 + F3 )x5 + · · · .

Rearranging terms, we can write this as

F (x) = x + (x2 + F2 x3 + F3 x4 + F4 x5 + · · · ) + (F1 x3 + F2 x4 + F3 x5 + · · · ).

Hence F(x) = x + xF(x) + x^2 F(x), or rather (1 − x − x^2)F(x) = x. It follows that F(x) = x/(1 − x − x^2). □

Example 10.1.6. If Fn is the nth Fibonacci number, find the sum of the series

F1/2^1 + F2/2^2 + F3/2^3 + F4/2^4 + · · · .

Solution: Since F(x) = F1x + F2x^2 + F3x^3 + · · · = x/(1 − x − x^2), we can substitute x = 1/2 to find that

F1/2^1 + F2/2^2 + F3/2^3 + F4/2^4 + · · · = (1/2) / (1 − (1/2) − (1/2)^2) = 2.


Warning! This only works if the series converges. For example, we could not plug in x = 1, because the series would not converge. It turns out that this series converges only when |x| < (√5 − 1)/2 ≈ 0.618.
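A quick numeric check (a Python sketch, not part of the original text) shows the partial sums of the series indeed approaching 2:

```python
# Accumulate F_n / 2^n; since 1/2 is inside the radius of convergence,
# the partial sums should approach 2.
total = 0.0
f_prev, f_curr = 0, 1  # F_0, F_1
for n in range(1, 201):
    total += f_curr / 2 ** n
    f_prev, f_curr = f_curr, f_prev + f_curr

print(round(total, 9))  # 2.0
```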
In the recursion lecture, we connected tiling problems to the Fibonacci sequence. Now we show how the
two relate via generating functions.
Example 10.1.7. Find the generating function for the number of tilings of a 1 × n board using 1 × 1 and
1 × 2 tiles, and compare this to the generating function of the Fibonacci sequence.
Solution: Let G(x) be the desired generating function. Each tile has two choices: it can either be a 1 × 1 tile or a 1 × 2 tile. Therefore, the generating function of a single tile is x + x^2, and the generating function that represents using k tiles is (x + x^2)^k. For example, if we have three tiles, we represent the eight possible tilings as shown below.

x^(1+1+1), x^(1+1+2), x^(1+2+1), x^(2+1+1), x^(1+2+2), x^(2+1+2), x^(2+2+1), x^(2+2+2)

Therefore, summing over all possible k, we find that

G(x) = 1 + (x + x2 )1 + (x + x2 )2 + (x + x2 )3 + · · · .

This is an infinite geometric series with common ratio x + x^2. Therefore,

G(x) = 1 / (1 − (x + x^2)).

Note that this is nearly the same as the Fibonacci generating function. In fact, F (x) = xG(x). This matches
with our interpretation that Fn+1 represents the number of tilings of a 1 × n board. 
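The relation between tiling counts and Fibonacci numbers can be sanity-checked with a direct recursive count (a Python sketch; the function name `tilings` is our own):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def tilings(n):
    """Number of tilings of a 1 x n board by 1x1 and 1x2 tiles."""
    if n < 0:
        return 0
    if n <= 1:
        return 1  # empty tiling for n = 0, a single 1x1 tile for n = 1
    # Condition on the last tile: it covers 1 or 2 cells.
    return tilings(n - 1) + tilings(n - 2)

fib = [1, 1]  # F_1, F_2
for _ in range(10):
    fib.append(fib[-1] + fib[-2])

# tilings(n) should equal F_{n+1} for every n.
assert all(tilings(n) == fib[n] for n in range(11))
print([tilings(n) for n in range(8)])  # [1, 1, 2, 3, 5, 8, 13, 21]
```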
The closed-form generating function of a recursive sequence can help us derive a closed-form expression for its terms when we use partial fraction decompositions. As an example, we will revisit the following example from when we talked about recursive sequences.
Example 10.1.8. Find a closed-form expression for an , where a1 = 1, a2 = 4, and an+1 = 5an − 6an−1 .
Solution: Suppose that A(x) = a1 x + a2 x2 + a3 x3 + · · · . Then iterating the recurrence on all of the
terms with n ≥ 3, we find

A(x) = x + 4x2 + (5a2 − 6a1 )x3 + (5a3 − 6a2 )x4 + (5a4 − 6a3 )x5 + · · · .

Rearranging terms, we can write this as

A(x) = x − x2 + 5(x2 + a2 x3 + a3 x4 + a4 x5 + · · · ) − 6(a1 x3 + a2 x4 + a3 x5 + · · · ).



Hence A(x) = (x − x^2) + 5xA(x) − 6x^2 A(x), or rather (1 − 5x + 6x^2)A(x) = x − x^2. Therefore,

A(x) = (x − x^2) / (1 − 5x + 6x^2).

The denominator factors, so A(x) = (x − x^2) / ((1 − 2x)(1 − 3x)). Using the method of partial fraction decomposition (described below), we can rewrite this as

(x − x^2) / ((1 − 2x)(1 − 3x)) = (1 − x) ( 1/(1 − 3x) − 1/(1 − 2x) ).
Inside the parentheses, we have geometric series. Therefore,

(x − x^2) / ((1 − 2x)(1 − 3x)) = (1 − x) ( Σ_{n≥0} (3x)^n − Σ_{n≥0} (2x)^n )
                               = ( Σ_{n≥0} (3x)^n − Σ_{n≥0} (2x)^n ) − x ( Σ_{n≥0} (3x)^n − Σ_{n≥0} (2x)^n ).

Recall that a_n is the coefficient of x^n. In the above expression, we find that the coefficient of x^n is 3^n − 2^n − (3^(n−1) − 2^(n−1)), or rather a_n = 2 · 3^(n−1) − 2^(n−1). □
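The closed form matches the recurrence, as a quick Python check (not part of the original text) confirms:

```python
# a_1 = 1, a_2 = 4, a_{n+1} = 5 a_n - 6 a_{n-1}; claimed closed form
# a_n = 2 * 3^(n-1) - 2^(n-1).
a = {1: 1, 2: 4}
for n in range(3, 21):
    a[n] = 5 * a[n - 1] - 6 * a[n - 2]

assert all(a[n] == 2 * 3 ** (n - 1) - 2 ** (n - 1) for n in a)
print([a[n] for n in range(1, 6)])  # [1, 4, 14, 46, 146]
```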


Generating functions also provide motivation for some of our guessing-based ideas in recursive sequences.
In the next example, we find the generating function of a recursive sequence whose characteristic polynomial
has a double root.
Example 10.1.9. Find the generating function and closed-form expression for bn , where b0 = 1, b1 = 6,
and bn+1 = 4bn − 4bn−1 .
Before we begin, we will note that the characteristic polynomial is λ^2 − 4λ + 4, which has a double root at λ = 2.
Solution: Suppose that B(x) = b0 + b1x + b2x^2 + b3x^3 + · · · . Then iterating the recurrence on all of the terms with n ≥ 2, we find
B(x) = 1 + 6x + (4b1 − 4b0 )x2 + (4b2 − 4b1 )x3 + (4b3 − 4b2 )x4 + · · · .
Rearranging terms, we can write this as
B(x) = 1 + 2x + 4(b0 x + b1 x2 + b2 x3 + · · · ) − 4(b0 x2 + b1 x3 + b2 x4 + · · · ).
Hence B(x) = 1 + 2x + 4xB(x) − 4x^2 B(x). It follows that (1 − 4x + 4x^2)B(x) = 1 + 2x, so

B(x) = (1 + 2x) / (1 − 2x)^2

is the generating function for the sequence. We can write this as (1 + 2x)(1 − 2x)^(−2). By the Negative Binomial Theorem, this is just

B(x) = (1 + 2x) Σ_{n≥0} C(−2, n)(−2x)^n = (1 + 2x) Σ_{n≥0} C(n + 1, 1) · 2^n · x^n.

Distributing (1 + 2x), we find

B(x) = Σ_{n≥0} C(n + 1, 1) · 2^n · x^n + 2x Σ_{n≥0} C(n + 1, 1) · 2^n · x^n.

Hence the coefficient of x^n is given by

C(n + 1, 1) · 2^n + 2 · C(n, 1) · 2^(n−1) = (2n + 1) · 2^n.

It follows that bn = (2n + 1) · 2^n for all nonnegative integers n. □


We can also use this method to attack non-homogeneous recurrence relations.

Example 10.1.10. Find the generating function and closed-form expression for cn , where c0 = 3, c1 = 10,
and for n ≥ 1, cn+1 = 5cn − 6cn−1 + 5n .
Solution: As in previous solutions, let C(x) = c0 + c1x + c2x^2 + c3x^3 + · · · . When we iterate the recurrence on all terms with n ≥ 2, we find

C(x) = 3 + 10x + (5c1 − 6c0 + 51 )x2 + (5c2 − 6c1 + 52 )x3 + (5c3 − 6c2 + 53 )x4 + · · · .

Rearranging terms, we can write this as

C(x) = 3 − 6x + 5(c0 x + c1 x2 + c2 x3 + c3 x4 + · · · ) − 6(c0 x2 + c1 x3 + c2 x4 + · · · ) + (x + 51 x2 + 52 x3 + 53 x4 + · · · ).


This can be written as C(x) = 3 − 6x + 5xC(x) − 6x^2 C(x) + x/(1 − 5x), where we summed the right-most series as a geometric series. Therefore,

(1 − 5x + 6x^2)C(x) = 3 − 6x + x/(1 − 5x).

Since 1 − 5x + 6x^2 = (1 − 2x)(1 − 3x), we can write this as

C(x) = (3 − 6x)/((1 − 2x)(1 − 3x)) + x/((1 − 2x)(1 − 3x)(1 − 5x)) = 3/(1 − 3x) + x/((1 − 2x)(1 − 3x)(1 − 5x)).

Using the method of partial fractions, suppose that

x / ((1 − 2x)(1 − 3x)(1 − 5x)) = r/(1 − 2x) + s/(1 − 3x) + t/(1 − 5x).

Then

x = r(1 − 3x)(1 − 5x) + s(1 − 2x)(1 − 5x) + t(1 − 2x)(1 − 3x).

Plugging in x = 1/2, we find 1/2 = r(−1/2)(−3/2), so r = 2/3. Plugging in x = 1/3, we find 1/3 = s(1/3)(−2/3), so s = −3/2. Plugging in x = 1/5, we find 1/5 = t(3/5)(2/5), so t = 5/6. Therefore,

C(x) = 3/(1 − 3x) + (2/3) · 1/(1 − 2x) − (3/2) · 1/(1 − 3x) + (5/6) · 1/(1 − 5x)
     = 3 Σ_{n≥0} 3^n x^n + (2/3) Σ_{n≥0} 2^n x^n − (3/2) Σ_{n≥0} 3^n x^n + (5/6) Σ_{n≥0} 5^n x^n.

It follows that the coefficient of x^n is 3^(n+1) + (2/3) · 2^n − (3/2) · 3^n + (5/6) · 5^n, which we can rewrite as

cn = (2^(n+2) + 3^(n+2) + 5^(n+1)) / 6. □
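Again the closed form can be verified against the recurrence (a Python sketch, not part of the original text):

```python
# c_0 = 3, c_1 = 10, c_{n+1} = 5 c_n - 6 c_{n-1} + 5^n for n >= 1;
# claimed closed form c_n = (2^(n+2) + 3^(n+2) + 5^(n+1)) / 6.
c = [3, 10]
for n in range(1, 20):
    c.append(5 * c[n] - 6 * c[n - 1] + 5 ** n)

assert all(c[n] == (2 ** (n + 2) + 3 ** (n + 2) + 5 ** (n + 1)) // 6
           for n in range(len(c)))
print(c[:5])  # [3, 10, 37, 150, 653]
```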

10.2 Appendix: Partial Fraction Decomposition


The method of partial fraction decomposition can be used to split up fractions of the form a(x)/b(x), where a(x) and b(x) are nonzero polynomials. This is usually based on the factorization of b(x). Suppose we can factor b(x) as
b(x) = (x − r1 )(x − r2 ) · · · (x − rn ),
where r1 , r2 , . . . , rn are distinct. Then we might guess that we can write

a(x)/b(x) = A1/(x − r1) + A2/(x − r2) + · · · + An/(x − rn),

because when we find a common denominator, the right hand side will have the same denominator as the
left hand side. However, we also note that after finding a common denominator on the right hand side, the
numerator is a polynomial of degree less than or equal to n − 1. So we usually want deg(a(x)) < deg(b(x)).
This can be achieved using the polynomial division algorithm, where we can find the quotient q(x) and
remainder r(x) when we divide a(x) by b(x). The relationship between these polynomials is

a(x) = q(x)b(x) + r(x).


Therefore, a(x)/b(x) = q(x) + r(x)/b(x). The division algorithm produces r(x) such that deg(r(x)) < deg(b(x)). So when deg(a(x)) ≥ deg(b(x)), we can apply the division algorithm, and then we use partial fractions on r(x)/b(x).
We illustrate the method with an example.

Example 10.2.1. Find A and B such that


2/((x + 1)(x + 3)) = A/(x + 1) + B/(x + 3).

Solution: When x ≠ −1, −3, we multiply both sides of the equation by (x + 1)(x + 3). Hence

2 = A(x + 3) + B(x + 1). (1)

This can be written as 2 = (A + B)x + (3A + B). This equation must hold for all x ≠ −1, −3. But if A + B ≠ 0, the right side is a nonconstant linear function, and a linear equation holds for only one value of x. Hence A + B = 0, and it follows that 3A + B = 2. Solving this system of equations, we find A = 1 and B = −1. Hence

2/((x + 1)(x + 3)) = 1/(x + 1) − 1/(x + 3).


This method is fully rigorous, but it can sometimes be tedious. There is another faster method to finish
things off from (1). However, it might not always work, and it might seem like we are breaking rules.
Solution: In (1), we plug in x = −3, finding 2 = −2B. We also plug in x = −1, finding 2 = 2A. Hence
A = 1 and B = −1. 
In this solution, we carefully selected values to make half of the right side of (1) become 0, and the values popped out! However, at the beginning of the first solution, we also assumed that x ≠ −1, −3, so are we breaking any rules? It turns out that this is completely legal, because the functions on both sides of (1) are continuous for all real x; a careful limiting argument shows that this method is valid.
When b(x) has non-distinct roots, the method of partial fractions changes somewhat. In particular, when b(x) has a factor of the form (x − r)^d (where d ≥ 2), we will use

A1/(x − r) + A2/(x − r)^2 + A3/(x − r)^3 + · · · + Ad/(x − r)^d

to obtain the factor (x − r)^d in the common denominator. Also, sometimes b(x) has irreducible factors.
If b(x) has an irreducible factor f (x) of degree d, then we will use a term of the form

(Ad−1 x^(d−1) + Ad−2 x^(d−2) + · · · + A1 x + A0) / f(x)

to obtain the factor f (x) in the common denominator. Note that the degree of the numerator is at most one
less than the degree of f (x). Consider the following example.

Example 10.2.2. Use the method of partial fractions to decompose


2 / ((x^2 + 1)(x − 1)^2).

Solution: Following the above discussion, we will attempt to write this in the form

2/((x^2 + 1)(x − 1)^2) = (Ax + B)/(x^2 + 1) + C/(x − 1) + D/(x − 1)^2.

Clearing denominators, we find

2 = (Ax + B)(x − 1)2 + C(x − 1)(x2 + 1) + D(x2 + 1).

If we follow the faster method from above, we can plug in x = 1 to find 2 = 2D, so D = 1. While we might
try plugging in x = ±i, this won’t yet get us to the value of C. Instead, we combine like terms, finding

2 = (A + C)x3 + (B − 2A − C + D)x2 + (A − 2B + C)x + (B − C + D).

Therefore, A + C = 0, B − 2A − C + D = 0, A − 2B + C = 0, and B − C + D = 2. Substituting the fourth equation into the second, we find 2 − 2A = 0, so A = 1. Therefore C = −1, and then B = 0 (with D = 1 as found above). It follows that
2/((x^2 + 1)(x − 1)^2) = x/(x^2 + 1) − 1/(x − 1) + 1/(x − 1)^2.
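As a final check, we can compare both sides of the decomposition at a few rational points away from the poles (a Python sketch using exact arithmetic, not part of the original text):

```python
from fractions import Fraction

def lhs(x):
    return Fraction(2, (x ** 2 + 1) * (x - 1) ** 2)

def rhs(x):
    return (Fraction(x, x ** 2 + 1)
            - Fraction(1, x - 1)
            + Fraction(1, (x - 1) ** 2))

# Two rational functions of this degree that agree at enough points are equal.
assert all(lhs(x) == rhs(x) for x in (0, 2, 3, -1, -2, 5, 7))
print("decomposition checks out")
```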

Chapter 11

Generating Functions 2

11.1 Partitions
In this section, we will use generating functions to study partitions. We first studied partitions when we
talked about bijections. We mentioned that there was no easy way to count the number of partitions p(n),
which is why bijections were so useful in the study of partitions. However, partitions have very nice generating
functions, so it’s often even easier to talk about the generating function of partitions.

Example 11.1.1. Find the generating function for the number of partitions p(n) of a number n.

Solution: When counting the number of partitions of n, recall that order does not matter, so 3 + 1 + 1
and 1 + 1 + 3 are the same partition of 5. Therefore, we only care about the number of times each part
appears in the partition. We find:

• The generating function for the number of ones that can appear in a partition is 1 + x + x^2 + x^3 + · · · = 1/(1 − x), because there is exactly 1 way for k ones to appear, namely 1 + 1 + · · · + 1 (k ones), contributing x^k.

• The generating function for the number of twos that can appear in a partition is 1 + x^2 + x^4 + x^6 + · · · = 1/(1 − x^2), because there is exactly 1 way for k twos to appear, namely 2 + 2 + · · · + 2 (k twos), contributing x^(2k).

• The generating function for the number of threes that can appear in a partition is 1 + x^3 + x^6 + x^9 + · · · = 1/(1 − x^3), because there is exactly 1 way for k threes to appear, namely 3 + 3 + · · · + 3 (k threes).

• In general, the generating function for the number of k's that can appear in a partition is 1 + x^k + x^(2k) + x^(3k) + · · · = 1/(1 − x^k).

This holds true for every possible part size. Therefore, since the choices for different part sizes are made independently, we multiply them together to find that

p(0) + p(1)x + p(2)x^2 + p(3)x^3 + · · · = 1/(1 − x) · 1/(1 − x^2) · 1/(1 − x^3) · · · .

We can write this more compactly in product notation as

∏_{k=1}^{∞} 1/(1 − x^k).
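Truncating the infinite product at degree N lets us actually compute p(n); dividing by (1 − x^k) corresponds to the simple in-place update below (a Python sketch, not part of the original text):

```python
# Compute p(0), ..., p(N) from the product of 1/(1 - x^k), k = 1..N.
# Multiplying by 1/(1 - x^k) = 1 + x^k + x^{2k} + ... is the update
# p[n] += p[n - k], taken in increasing order of n.
N = 10
p = [1] + [0] * N
for k in range(1, N + 1):
    for n in range(k, N + 1):
        p[n] += p[n - k]

print(p)  # [1, 1, 2, 3, 5, 7, 11, 15, 22, 30, 42]
```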


Using generating functions, we can re-attack other problems that we solved in the bijections section.


Example 11.1.2. Prove that the number of partitions of n into odd parts is equal to the number of partitions
of n into distinct parts.
Solution: Using a similar argument to above, we find that the number of partitions of n into odd parts
is
(1 + x + x2 + x3 + · · · )(1 + x3 + x6 + x9 + · · · )(1 + x5 + x10 + x15 + · · · ) · · · ,
where the first set of parentheses represents the number of 1's, the second set the number of 3's, the third set the number of 5's, and so on. This can be written as the infinite product

1/(1 − x) · 1/(1 − x^3) · 1/(1 − x^5) · · · = ∏_{k=1}^{∞} 1/(1 − x^(2k−1)). (1)
On the other hand, the generating function for the number of partitions of n into distinct parts is

(1 + x)(1 + x^2)(1 + x^3)(1 + x^4) · · · = ∏_{k=1}^{∞} (1 + x^k). (2)

In this product, the factor (1 + x^k) represents having the part k appear either 0 or 1 times. We wish to prove that (1) and (2) are equal. We can rewrite (2) as

∏_{k=1}^{∞} (1 + x^k) = ∏_{k=1}^{∞} (1 − x^(2k))/(1 − x^k).

In each numerator factor, x^(2k) is an even power of x, so the even-exponent factors in the denominator cancel with matching factors in the numerator, leaving only the odd-exponent factors in the denominator. Hence

∏_{k=1}^{∞} (1 + x^k) = ∏_{k=1}^{∞} 1/(1 − x^(2k−1)).

It follows that (1) and (2) are equal. Since the generating function for partitions of n into odd parts equals the generating function for partitions of n into distinct parts, the two counts agree for every n. □
The next example illustrates how we can use generating functions to deal with restrictions that might
prove more challenging for bijective methods.
Example 11.1.3. Fix q ≥ 1. For each n ≥ 1, prove that the number of partitions of n into parts that are
not divisible by q + 1 is equal to the number of partitions of n in which no part appears more than q times.
Solution: As in the original partition generating function, the factor 1/(1 − x^k) is the generating function for the number of parts of size k. We can have parts of any size except the sizes divisible by q + 1, so we must
omit them from our generating function. Equivalently, we can multiply the partition generating function by
(1 − xk ) for all k divisible by q + 1. Therefore, the generating function for the number of partitions of n into
parts that are not divisible by q + 1 is

∏_{k=1}^{∞} 1/(1 − x^k) · ∏_{k=1}^{∞} (1 − x^(k(q+1))). (1)

On the other hand, if the part k appears no more than q times, then its contribution can be represented by

1 + x^k + x^(2k) + · · · + x^(qk) = (1 − x^((q+1)k)) / (1 − x^k).
Multiplying this over all positive integers k, we find that the generating function for the number of partitions of n in which no part appears more than q times is

∏_{k=1}^{∞} (1 − x^((q+1)k)) / (1 − x^k). (2)

Since (1) and (2) are equal, it follows that the number of partitions of n into parts that are not divisible by
q + 1 is equal to the number of partitions of n in which no part appears more than q times. 
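Both this identity and Example 11.1.2 (the case q = 1) can be spot-checked by brute force (a Python sketch; the helper name `count_partitions` is our own):

```python
def count_partitions(n, sizes, max_repeats=None):
    """Partitions of n into parts drawn from `sizes`, each part used at
    most `max_repeats` times (None means no limit)."""
    def rec(remaining, idx):
        if remaining == 0:
            return 1
        if idx == len(sizes):
            return 0
        k = sizes[idx]
        limit = remaining // k
        if max_repeats is not None:
            limit = min(limit, max_repeats)
        # Try every multiplicity m of the part k, then recurse.
        return sum(rec(remaining - m * k, idx + 1) for m in range(limit + 1))
    return rec(n, 0)

for q in (1, 2, 3):
    for n in range(1, 15):
        not_div = count_partitions(n, [k for k in range(1, n + 1)
                                       if k % (q + 1) != 0])
        bounded = count_partitions(n, list(range(1, n + 1)), max_repeats=q)
        assert not_div == bounded
print("identity verified for q = 1, 2, 3 and n < 15")
```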

11.2 Roots of Unity Filters


Next, we will discuss the idea of a roots of unity filter. One of the motivations behind this idea is the
following problem.

Example 11.2.1. In a group of 2020 people, find a closed-form expression for the following:

(a) The number of ways to form a committee of even size.

(b) The number of ways to form a committee where the size of the committee is divisible by 4.

Solution:

(a) The number of ways to form a committee of even size is

C(2020, 0) + C(2020, 2) + C(2020, 4) + C(2020, 6) + · · · .

To compute this, we will use the Binomial Theorem twice. We find that

(1 + 1)^2020 = C(2020, 0) + C(2020, 1) + C(2020, 2) + C(2020, 3) + C(2020, 4) + · · · + C(2020, 2020),
(1 − 1)^2020 = C(2020, 0) − C(2020, 1) + C(2020, 2) − C(2020, 3) + C(2020, 4) − · · · + C(2020, 2020).

Adding these, we notice that the odd terms cancel out and the even terms double up. Hence

2^2020 + 0 = 2 C(2020, 0) + 2 C(2020, 2) + 2 C(2020, 4) + · · · .

Dividing by 2, we find

2^2019 = C(2020, 0) + C(2020, 2) + C(2020, 4) + · · · .

Hence the number of ways to form a committee of even size is 2^2019.

(b) First, note that the number of ways to form a committee whose size is divisible by 4 is

C(2020, 0) + C(2020, 4) + C(2020, 8) + C(2020, 12) + · · · .

Let's reexamine the above solution. We can write the first two lines as

(1 + x)^2020 = C(2020, 0) + C(2020, 1)x + C(2020, 2)x^2 + C(2020, 3)x^3 + · · · + C(2020, 2020)x^2020,
(1 − x)^2020 = C(2020, 0) − C(2020, 1)x + C(2020, 2)x^2 − C(2020, 3)x^3 + · · · + C(2020, 2020)x^2020.

Therefore, when we add these and divide by 2, we cancel out the odd terms, so the sum is

((1 + x)^2020 + (1 − x)^2020) / 2 = C(2020, 0) + C(2020, 2)x^2 + C(2020, 4)x^4 + · · · .

We want the sum of every other term. To do this, we might try to mimic our original solution. In order to cancel out the term with coefficient C(2020, 2), we might try plugging in x^2 = 1 and x^2 = −1, i.e., x = 1

and x = i. We find

((1 + 1)^2020 + (1 − 1)^2020) / 2 = C(2020, 0) + C(2020, 2) + C(2020, 4) + C(2020, 6) + C(2020, 8) + · · · ,
((1 + i)^2020 + (1 − i)^2020) / 2 = C(2020, 0) + C(2020, 2)i^2 + C(2020, 4)i^4 + C(2020, 6)i^6 + C(2020, 8)i^8 + · · · .

Note that i^(2k) = −1 when k is odd and i^(2k) = 1 when k is even. Therefore, when we add these two equations and divide by 2, we find

((1 + 1)^2020 + (1 − 1)^2020 + (1 + i)^2020 + (1 − i)^2020) / 4 = C(2020, 0) + C(2020, 4) + C(2020, 8) + · · · .

To evaluate these expressions, note that (1 + i)^2 = 2i and (1 − i)^2 = −2i. Thus the number of ways to form a committee whose size is divisible by 4 from 2020 people is

(2^2020 + (2i)^1010 + (−2i)^1010) / 4 = (2^2020 − 2^1010 − 2^1010) / 4 = (2^2020 − 2^1011) / 4.
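The same filter works for any number of people, which makes it easy to test on small cases (a Python sketch using floating-point complex arithmetic; the name `count_div4` is our own):

```python
from math import comb

def count_div4(n):
    """Sum of C(n, k) over k divisible by 4, via the 4th roots of unity."""
    total = ((1 + 1) ** n + (1 + 1j) ** n + (1 - 1) ** n + (1 - 1j) ** n) / 4
    return round(total.real)

# Compare the filter against a direct sum for n = 1..24.
for n in range(1, 25):
    assert count_div4(n) == sum(comb(n, k) for k in range(0, n + 1, 4))
print(count_div4(8))  # 72 = C(8,0) + C(8,4) + C(8,8)
```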


This concept can be used more generally with roots of unity. We will not go into detail about roots of
unity here, but we will recall some definitions.
Definition 11.2.1. The nth roots of unity are the roots to the polynomial z n − 1 = 0.
Using the algebra of complex numbers, we can come up with a formula for the nth roots of unity.
Proposition 11.2.1. The nth roots of unity are given by rk = cos(2πk/n) + i sin(2πk/n) = e^(2πik/n), where k = 0, 1, 2, . . . , n − 1. Often, we set ω = e^(2πi/n), so that the nth roots of unity are 1, ω, ω^2, . . . , ω^(n−1).
In the complex plane, the nth roots of unity are equally spaced around the unit circle, as we demonstrate
below when n = 7.
[Figure: the 7th roots of unity 1, ω, ω^2, . . . , ω^6 equally spaced around the unit circle.]

For our purposes, the most useful aspect of the nth roots of unity is the fact that they are periodic with
period n. Additionally, we have the following useful identity.
Proposition 11.2.2. If ω 6= 1 is an nth root of unity, then

1 + ω + ω 2 + ω 3 + · · · + ω n−1 = 0.

Proof. The left-hand side is a finite geometric series, hence its sum is

(1 − ω^n) / (1 − ω).

However, as ω is an nth root of unity, we know that ω^n = 1. Hence the sum must be 0.
This leads us to the following general theorem on roots of unity filters.

Theorem 11.2.3. (Roots of Unity Filter) If $P(x)$ is a polynomial, then we can take a roots of unity filter to find the sum of every nth coefficient of $P(x)$. If $P(x) = a_0 + a_1x + a_2x^2 + \cdots + a_dx^d$, then
\[
\frac{P(1) + P(\omega) + P(\omega^2) + \cdots + P(\omega^{n-1})}{n} = a_0 + a_n + a_{2n} + a_{3n} + \cdots.
\]
This filters out everything except for the coefficients of $x^k$, where $k$ is a multiple of $n$.
The proof of this relies on the identity in the previous proposition. Note that the fourth roots of unity are $1$, $i$, $-1$, and $-i$. Therefore, given the polynomial $P(x) = a_0 + a_1x + a_2x^2 + \cdots + a_nx^n$, we can say that
\[
a_0 + a_4 + a_8 + a_{12} + \cdots = \frac{P(1) + P(i) + P(-1) + P(-i)}{4}.
\]
This was the essential ingredient in our solution to finding the number of committees whose size is divisible
by 4.
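To see the fourth-roots filter in action numerically, consider the small polynomial $P(x) = (1+x)^8$, whose coefficients are $\binom{8}{k}$ (our own illustration, not from the text):

```python
from math import comb

def P(x):
    return (1 + x) ** 8

# Roots of unity filter with n = 4; the fourth roots of unity are 1, i, -1, -i.
filtered = (P(1) + P(1j) + P(-1) + P(-1j)) / 4

# Every fourth coefficient summed directly: C(8,0) + C(8,4) + C(8,8).
direct = comb(8, 0) + comb(8, 4) + comb(8, 8)

print(round(filtered.real), direct)  # → 72 72
```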
As another example, we present the following beautiful solution.
Example 11.2.2. (AIME; 2016) The figure below shows a ring made of six small sections which you are to paint on a wall. You have four paint colors available and you will paint each of the six sections a solid color. Find the number of ways you can choose to paint the sections if no two adjacent sections can be painted with the same color.

[Figure: a ring divided into six sections.]
Solution: In order to use a generating function, suppose that we label the colors as 0, 1, 2, 3. As we proceed around a valid coloring of the ring in the clockwise direction, we know that between two adjacent sections with colors $s_i$ and $s_{i+1}$, there exists a number $d_i \in \{1, 2, 3\}$ such that $s_{i+1} \equiv s_i + d_i \pmod 4$. Therefore, we can represent each border between sections by the generating function $x + x^2 + x^3$, where $x$, $x^2$, $x^3$ correspond to increasing the color number by 1, 2, 3 (mod 4), respectively. Thus the generating function that represents going through all six borders is $A(x) = (x + x^2 + x^3)^6$, where the coefficient of $x^n$ represents the total number of colorings where the color numbers are increased by a total of $n$ as we proceed around the ring.
However, if we go through all six borders, we must return to the original section, whose color has already been chosen. Therefore, the net change in color is 0 (mod 4), so we wish to find the sum of the coefficients of $x^n$ in $A(x)$ with $n \equiv 0 \pmod 4$. Using a roots of unity filter, the sum of the coefficients of $A(x)$ with powers congruent to 0 (mod 4) is
\[
\frac{A(1) + A(i) + A(-1) + A(-i)}{4} = \frac{3^6 + (-1)^6 + (-1)^6 + (-1)^6}{4} = \frac{732}{4}.
\]
We multiply this by 4 to account for the initial choice of color, so the answer is 732.
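Since there are only $4^6 = 4096$ colorings of the ring, the answer is easy to confirm by brute force (a sketch of ours):

```python
from itertools import product

count = 0
for coloring in product(range(4), repeat=6):
    # Adjacent sections, including the wrap-around pair, must differ.
    if all(coloring[i] != coloring[(i + 1) % 6] for i in range(6)):
        count += 1

print(count)  # → 732
```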
11.3 Multiple Variables


Sometimes, we might need to keep track of more things, and in this case, it may be prudent to use more
variables.

Example 11.3.1. Find a generating function $f(x, y)$ such that the coefficient of $y^mx^n$ is the number of partitions of $n$ that have exactly $m$ parts.

Solution: As we did with the normal partition generating function, we look at each possible part size.
For parts of size $k$, we find that the generating function is
\[
1 + yx^k + y^2x^{2k} + y^3x^{3k} + \cdots = \frac{1}{1 - yx^k},
\]
because if we have $j$ parts of size $k$, it will contribute $y^j \cdot x^{jk}$ to a term. Therefore, the desired generating function is
\[
f(x, y) = \prod_{k=1}^{\infty} \frac{1}{1 - yx^k}.
\]
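We can expand a truncation of this product with a small dictionary-based polynomial multiplication (our own sketch; the bound N and variable names are ours) and read coefficients off directly:

```python
# coeff[(m, n)] = number of partitions of n with exactly m parts,
# built by multiplying the truncated factors 1/(1 - y x^k) for k = 1..N.
N = 10
coeff = {(0, 0): 1}
for k in range(1, N + 1):          # part size k
    new = {}
    for (m, n), c in coeff.items():
        j = 0
        while n + j * k <= N:      # use j parts of size k
            key = (m + j, n + j * k)
            new[key] = new.get(key, 0) + c
            j += 1
    coeff = new

print(coeff[(2, 6)])  # → 3, the partitions 5+1, 4+2, 3+3
```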
We demonstrate another example of the use of multiple variables. The method of generating functions is much more high-tech than is needed to solve the problem, but it is an interesting example of how we can combine multiple variables with a roots of unity filter.
Example 11.3.2. (AIME; 2019) A standard six-sided fair die is rolled four times. The probability that the product of all four numbers rolled is a perfect square is $\frac{m}{n}$, where $m$ and $n$ are relatively prime positive integers. Find $m + n$.
Solution: We use generating functions. Note that the prime factorizations of the six numbers on a die are $1$, $2^1$, $3^1$, $2^2$, $5^1$, and $2^1 \cdot 3^1$. Therefore, let $a$, $b$, and $c$ be variables that represent the events of contributing a single 2, 3, or 5 to the prime factorization of the product. Then the possible outcomes of a single roll can be represented by the generating function
\[
1 + a + b + a^2 + c + ab.
\]
Therefore, the generating function
\[
f(a, b, c) = (1 + a + b + a^2 + c + ab)^4
\]
will enumerate all possible events when rolling four dice. We wish to find the sum of the coefficients of the terms $a^ib^jc^k$ where $i$, $j$, $k$ are all even.
Note that $\frac{f(1,b,c) + f(-1,b,c)}{2}$ will find the sum of all of the terms with an even power of $a$, and this is
\[
g(b, c) = \frac{(3 + 2b + c)^4 + (1 + c)^4}{2}.
\]
Similarly, $\frac{g(b,1) + g(b,-1)}{2}$ will find the sum of all terms with both an even power of $a$ and an even power of $c$, and this is
\[
h(b) = \frac{(4 + 2b)^4 + 2^4 + (2 + 2b)^4 + 0^4}{4}.
\]
Finally, $\frac{h(1) + h(-1)}{2}$ represents the sum of all terms with an even power of each of $a$, $b$, and $c$, hence the number of possible ways that the product is a perfect square is
\[
\frac{(6^4 + 2^4 + 4^4) + (2^4 + 2^4 + 0^4)}{8} = \frac{1600}{8} = 200.
\]
Hence the desired probability is $\frac{200}{6^4} = \frac{25}{162}$, and the answer is $25 + 162 = 187$.
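The answer is small enough to confirm by enumerating all $6^4 = 1296$ equally likely rolls:

```python
from itertools import product
from math import isqrt

count = 0
for roll in product(range(1, 7), repeat=4):
    p = roll[0] * roll[1] * roll[2] * roll[3]
    if isqrt(p) ** 2 == p:   # perfect square test
        count += 1

print(count)  # → 200
```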

11.4 Catalan Numbers


In this section, we will apply generating functions to derive the general formula for Catalan numbers. This
demonstrates the full power of the Generalized Binomial Theorem.
Recall that when we talked about bijections, we defined the Catalan numbers as follows.

Definition 11.4.1. The nth Catalan number Cn is the number of paths from (0, 0) to (n, n) in the coordinate
plane that:

• travel along the gridlines, and

• do not go above the line y = x.

Using this definition, we will develop a recursion satisfied by Catalan numbers, after which we will derive
the Catalan generating function.
Example 11.4.1. Find a closed-form expression for the generating function $C(x) = C_0 + C_1x + C_2x^2 + C_3x^3 + \cdots$. Use this to derive the formula $C_n = \frac{1}{n+1}\binom{2n}{n}$ for the nth Catalan number.

Solution: As a first step, we will prove the following recurrence:
\[
C_n = C_0C_{n-1} + C_1C_{n-2} + C_2C_{n-3} + \cdots + C_{n-1}C_0.
\]
We will prove this combinatorially, based on the given definition. To compute $C_n$, we do casework based on the first time that a path reaches the line $y = x$ after $(0, 0)$.
If a path first reaches $y = x$ at $(k, k)$, then we know that the path cannot touch $y = x$ before $x = k$, which means that it must stay at or below the line $y = x - 1$ prior to reaching $(k, k)$. We can draw the following representation of our path.

[Figure: the path split at $(k, k)$ into a lower shaded region below $y = x - 1$ and an upper shaded region.]

Thus we may think of our path in two segments: one in the lower shaded region and one in the upper shaded region. The lower portion of the path goes from $(1, 0)$ to $(k, k - 1)$ without crossing above $y = x - 1$, and there are $C_{k-1}$ such paths. The upper portion of the path goes from $(k, k)$ to $(n, n)$, and there are $C_{n-k}$ such paths. Thus there are a total of $C_{k-1}C_{n-k}$ paths that first touch the line $y = x$ at $(k, k)$. Summing this from $k = 1$ to $n$, we find
\[
C_n = C_0C_{n-1} + C_1C_{n-2} + C_2C_{n-3} + \cdots + C_{n-1}C_0.
\]
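The recurrence can be checked numerically by counting the lattice paths directly with dynamic programming (our own sketch):

```python
def paths_below_diagonal(n):
    """Count gridline paths from (0,0) to (n,n) that never go above y = x."""
    dp = [[0] * (n + 1) for _ in range(n + 1)]  # dp[x][y], only y <= x used
    dp[0][0] = 1
    for x in range(n + 1):
        for y in range(x + 1):
            if x > 0:
                dp[x][y] += dp[x - 1][y]   # step right from (x-1, y)
            if y > 0:
                dp[x][y] += dp[x][y - 1]   # step up from (x, y-1)
    return dp[n][n]

C = [paths_below_diagonal(n) for n in range(11)]
print(C[:6])  # → [1, 1, 2, 5, 14, 42]

# Verify the convolution recurrence C_n = C_0 C_{n-1} + ... + C_{n-1} C_0.
for n in range(1, 11):
    assert C[n] == sum(C[k] * C[n - 1 - k] for k in range(n))
```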

Now suppose that we square $C(x)$. Then we find
\[
C(x)^2 = \sum_{k=0}^{\infty} (C_0C_k + C_1C_{k-1} + C_2C_{k-2} + \cdots + C_kC_0)x^k.
\]
By the above recursion, this simplifies to
\[
C(x)^2 = \sum_{k=0}^{\infty} C_{k+1}x^k.
\]

Hence $xC(x)^2 + 1 = C(x)$. We can write this as a quadratic in $C(x)$: $xC(x)^2 - C(x) + 1 = 0$. Using the quadratic formula, we find
\[
C(x) = \frac{1 \pm \sqrt{1 - 4x}}{2x}. \tag{1}
\]
It might seem strange to write a generating function as a square root, but when we write $\sqrt{1 - 4x}$ as $(1 - 4x)^{1/2}$, we can apply the Generalized Binomial Theorem. But first, we must settle whether to use the + or − sign from the quadratic formula. By definition, the coefficients of $C(x)$ are the Catalan numbers, which are all positive. We can compute that
\[
(1 - 4x)^{1/2} = \sum_{k=0}^{\infty} \binom{1/2}{k}(-4x)^k.
\]

Using the formula for binomial coefficients, we find that when $k > 0$,
\begin{align*}
\binom{1/2}{k} \cdot (-4)^k &= \frac{(1/2)(1/2 - 1)(1/2 - 2)(1/2 - 3) \cdots (1/2 - (k-1))}{k!} \cdot (-1)^k \cdot 2^k \cdot 2^k \\
&= \frac{1 \cdot (-1) \cdot (-3) \cdot (-5) \cdots (-(2k-3))}{k!} \cdot (-1)^k \cdot 2^k \\
&= \frac{1 \cdot 3 \cdot 5 \cdots (2k-3)}{k!} \cdot (-1)^{k-1} \cdot (-1)^k \cdot 2^k.
\end{align*}
Note that $(-1)^{k-1} \cdot (-1)^k = -1$, because it is an odd power of $-1$. Also, in order to create a complete factorial in the numerator, we multiply by $2 \cdot 4 \cdot 6 \cdots (2k-2) = 2^{k-1} \cdot (k-1)!$ in both numerator and denominator, so
\begin{align*}
\binom{1/2}{k} \cdot (-4)^k &= -\frac{(1 \cdot 3 \cdot 5 \cdots (2k-3)) \cdot (2 \cdot 4 \cdot 6 \cdots (2k-2))}{k! \cdot (2^{k-1} \cdot (k-1)!)} \cdot 2^k \\
&= -\frac{(2k-2)!}{k! \cdot (k-1)!} \cdot 2 \\
&= -\frac{2}{k}\binom{2k-2}{k-1}.
\end{align*}
Hence
\[
(1 - 4x)^{1/2} = 1 - \sum_{k=1}^{\infty} \frac{2}{k}\binom{2k-2}{k-1}x^k.
\]

Therefore, since almost all of the coefficients of this series are negative, but the coefficients in $C(x)$ are positive, we know that in (1), the ± symbol must be a minus sign. Therefore,
\begin{align*}
C(x) &= \left(1 - \left(1 - \sum_{k=1}^{\infty} \frac{2}{k}\binom{2k-2}{k-1}x^k\right)\right)\bigg/(2x) \\
&= \sum_{k=1}^{\infty} \frac{1}{k}\binom{2k-2}{k-1}x^{k-1} \\
&= \sum_{k=0}^{\infty} \frac{1}{k+1}\binom{2k}{k}x^k.
\end{align*}
It follows that $C_k = [x^k]C(x) = \frac{1}{k+1}\binom{2k}{k}$, which is what we wanted to prove.
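As a sanity check, the closed form agrees with the recurrence for as many terms as we care to test:

```python
from math import comb

def catalan_closed(n):
    # C_n = (1/(n+1)) * binom(2n, n); the division is always exact.
    return comb(2 * n, n) // (n + 1)

# Rebuild the sequence from C_n = C_0 C_{n-1} + C_1 C_{n-2} + ... + C_{n-1} C_0.
C = [1]
for n in range(1, 15):
    C.append(sum(C[k] * C[n - 1 - k] for k in range(n)))

print(all(C[n] == catalan_closed(n) for n in range(15)))  # → True
```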
Chapter 12

Probability

12.1 Basics
Probability is the analysis of the likelihood that events happen in random trials. The setup of the concept of probability is involved, so we develop it carefully and precisely to ensure that later on we will have an easier time developing more complicated ideas.

Definition 12.1.1. 1. In an experiment, the sample space $\Omega$ is defined to be the set of all possible outcomes of the experiment.

2. An event $F$ is a set of outcomes, and the set of all events is called the event space $\mathcal{F}$ of the experiment.

3. A probability function is a function $P$ that maps every outcome in $\Omega$ to a number between 0 and 1.

Example 12.1.1. Consider an experiment where we roll two standard six-sided dice. The sample space $\Omega$ is the set of all ordered pairs $(a, b)$ where $a, b \in \{1, 2, 3, 4, 5, 6\}$. An example of an event $F \in \mathcal{F}$ is the event where the two dice sum to 10, and this event includes the outcomes $(4, 6)$, $(5, 5)$, and $(6, 4)$. The probability function $P$ is the function such that $P((a, b)) = \frac{1}{36}$, since each roll $(a, b)$ is equally likely.

Under this definition, probability satisfies several axioms.

Proposition 12.1.1. 1. For any event $A \subseteq \Omega$, $P(A) \geq 0$. (Note that $P(A) = 0$ does not imply that the event $A$ is impossible.)

2. $P(\Omega) = 1$: in other words, the probability that an experiment gives a result in the sample space is 1.

3. For any sequence of disjoint (i.e., never happening simultaneously) events $A_1, A_2, \ldots, A_n$, we have
\[
\sum_{k=1}^{n} P(A_k) = P(A_1 \cup A_2 \cup \cdots \cup A_n).
\]
In other words, the probability that at least one of $A_1, A_2, \ldots, A_n$ occurs is equal to the sum of their probabilities. (This also holds for infinite sequences of events, as long as they are disjoint.)

The way we generally compute the probability of an event F is by listing out all of the possible outcomes
in the event F , and then dividing this count by the size of the sample space Ω.

Example 12.1.2. (AIME; 2001) Club Truncator is in a soccer league with six other teams, each of which it plays once. In any of its 6 matches, the probabilities that Club Truncator will win, lose, or tie are each $\frac{1}{3}$. The probability that Club Truncator will finish the season with more wins than losses is $\frac{m}{n}$, where $m$ and $n$ are relatively prime positive integers. Find $m + n$.


Solution: We can express each of the possible outcomes as a string of W's, L's, and T's. Since the possible outcomes of each individual game are equally likely, each string of six letters will be equally likely. Therefore, we wish to find the number of six-letter strings with more W's than L's. This will be our event, and we will divide the size of this event by the size of the sample space of all six-letter strings of W's, L's, and T's.
To do this we use complementary counting and symmetry. We first find the number of strings with the same number of W's as L's, and subtract this from the total count. If we have no W's or L's we must have six T's, so there is only one such arrangement. One W and one L gives four T's, and $\binom{6}{4,1,1} = 30$ arrangements. Two W's and two L's gives $\binom{6}{2,2,2} = 90$ strings, and three W's and three L's gives $\binom{6}{3,3} = 20$ strings. Hence there are
\[
1 + 30 + 90 + 20 = 141
\]
strings of six letters with the same number of W's as L's. Therefore the number of strings with an unequal number of W's and L's is $3^6 - 141 = 588$, half of which have more W's than L's. We conclude that 294 out of 729 strings have more W's than L's, which gives us a probability of
\[
\frac{294}{729} = \frac{98}{243}
\]
of having more wins than losses. We conclude that $(m, n) = (98, 243)$ and $m + n = 341$.
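With only $3^6 = 729$ equally likely outcome strings, the count is easy to confirm:

```python
from itertools import product
from fractions import Fraction

more_wins = 0
for season in product("WLT", repeat=6):
    if season.count("W") > season.count("L"):
        more_wins += 1

prob = Fraction(more_wins, 3**6)
print(more_wins, prob)  # → 294 98/243
```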
This approach of dividing the size of the event by the size of the sample space generally works, but we must be careful to use it correctly.
Example 12.1.3. What is wrong with the following argument?

When rolling two dice, there are 11 possible sums of the faces that come up. These outcomes are 2 through 12. Therefore, the probability of rolling a 2 is $\frac{1}{11}$, as is the probability of rolling a 7, or an 11.

Solution: The issue with this argument is that it assumes that all outcomes are equally likely. However, the outcomes defined by the experiment are not equally likely. In fact, if we consider the two dice as independent events (which we will explain shortly), we will find that rolling a seven has a probability of $\frac{1}{6}$, because this event contains 6 outcomes out of the 36 equally likely outcomes making up the probability space.
Example 12.1.4. (AHSME; 1974) A fair die is rolled six times. Find the probability of rolling at least a five at least five times.
Solution: Let G denote the outcome of rolling at least a five on a given roll, and let L denote the outcome of rolling less than five. Then we wish to find the probability that, when we roll a die six times, we obtain a string of outcomes containing at least five G's.
The probability of the string being GGGGGL is $\left(\frac{2}{6}\right)^5 \cdot \frac{4}{6} = \frac{2}{729}$. The probability of the string being any other arrangement of these letters is the same, and there are six of these arrangements, so the probability of any of these six arrangements appearing is
\[
6 \cdot \frac{2}{729} = \frac{4}{243}.
\]
The probability of the string GGGGGG occurring is $\left(\frac{2}{6}\right)^6 = \frac{1}{729}$. We conclude that the probability of rolling at least a five at least five times is
\[
\frac{4}{243} + \frac{1}{729} = \frac{13}{729}.
\]
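Exact rational arithmetic confirms the total; the six arrangements with exactly five G's are counted by $\binom{6}{5}$:

```python
from fractions import Fraction
from math import comb

p = Fraction(1, 3)   # probability of G (a five or six) on one roll

# Exactly five G's, plus all six G's.
answer = comb(6, 5) * p**5 * (1 - p) + p**6

print(answer)  # → 13/729
```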
Example 12.1.5. A set of $\frac{1}{2}n(n+1)$ distinct numbers is arranged at random in a triangular array:
\[
\begin{array}{ccccc}
\times \\
\times & \times \\
\times & \times & \times \\
 & \vdots \\
\times & \times & \times & \cdots & \times
\end{array}
\]
Let $M_k$ be the largest number in the $k$th row from the top. Find the probability that $M_1 < M_2 < \cdots < M_n$.
Solution: We approach the problem in the following way: we find the probability $P_n$ that $M_n$ is the largest of $M_1$ through $M_n$, then the probability $P_{n-1}$ that $M_{n-1}$ is the largest of $M_1$ through $M_{n-1}$, and so on. The total probability that $M_1 < M_2 < \cdots < M_n$ will then be
\[
P_n \cdot P_{n-1} \cdots P_2 \cdot P_1.
\]
Consider any arrangement of the $\frac{1}{2}n(n+1)$ numbers in the triangle. The only way $M_n$ can be the largest of all the maximums $M_1, M_2, \ldots, M_n$ is if the $n$th row contains the largest element in the triangle. The probability that this largest element is in the last row is
\[
P_n = \frac{n}{\frac{1}{2}n(n+1)} = \frac{2}{n+1}.
\]
Now, given the elements of the $n$th row, we wish to examine $M_{n-1}$. The only way $M_{n-1}$ can be the greatest of the maximums $M_1, M_2, \ldots, M_{n-1}$ is if the $(n-1)$st row contains the largest element in the first $n-1$ rows, which happens with probability
\[
P_{n-1} = \frac{n-1}{\frac{1}{2}(n-1)n} = \frac{2}{n}.
\]
We can continue in this fashion, and in general $P_k = \frac{2}{k+1}$. Therefore the probability that $M_1 < M_2 < \cdots < M_n$ is
\[
\frac{2}{n+1} \cdot \frac{2}{n} \cdots \frac{2}{2} = \frac{2^n}{(n+1)!}.
\]
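For small $n$ we can verify the product of the $P_k$ by checking every arrangement. The sketch below (ours) takes $n = 3$, placing the numbers 1 through 6 into rows of sizes 1, 2, 3; the product $P_3 \cdot P_2 \cdot P_1 = \frac{2}{4} \cdot \frac{2}{3} \cdot \frac{2}{2} = \frac{1}{3}$:

```python
from itertools import permutations
from fractions import Fraction

rows = [(0, 1), (1, 3), (3, 6)]   # slice bounds for rows of sizes 1, 2, 3

good = total = 0
for perm in permutations(range(1, 7)):
    total += 1
    m1, m2, m3 = (max(perm[a:b]) for a, b in rows)
    if m1 < m2 < m3:
        good += 1

print(Fraction(good, total))  # → 1/3
```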
12.2 Conditional Probability


Sometimes, in probability problems, we are given additional information as to which events have occurred. For example, in a roll of two dice we may be given the information that the sum of their values is even. This new information will change the probabilities of relevant events: newly impossible events will have probability 0, while events still possible will have their probabilities scaled up.
Definition 12.2.1. Let $E$ and $F$ be events. We denote the probability that $E$ happens given that $F$ has happened by $P(E \mid F)$. This probability is then calculated as
\[
P(E \mid F) = \frac{P(E \cap F)}{P(F)}.
\]
What's interesting about this definition is that $P(E \cap F)$ is an expression that is symmetric with respect to $E$ and $F$, but $P(E \mid F)$ and $P(F)$ are asymmetric. Therefore, switching the positions of $E$ and $F$ lets us also conclude that
\[
P(F \mid E) = \frac{P(E \cap F)}{P(E)}.
\]
We can then solve for $P(E \cap F)$ in both equalities to obtain Bayes' Rule.
Proposition 12.2.1. (Bayes) For any events $E$ and $F$,
\[
P(E \mid F) \cdot P(F) = P(F \mid E) \cdot P(E) = P(E \cap F).
\]
Bayes' Rule is usually written as the less intuitive
\[
P(E \mid F) = \frac{P(F \mid E) \cdot P(E)}{P(F)},
\]
in order to emphasize that we may convert between $P(F \mid E)$ and $P(E \mid F)$.

Example 12.2.1. (AHSME; 1973) There are two cards; one is red on both sides and the other is red on one
side and blue on the other. The cards have the same probability (1/2) of being chosen, and one is chosen
and placed on the table. If the upper side of the card on the table is red, then find the probability that the
under-side is also red.
Solution: Denote the event that a red side is shown by $R$, and denote the event that the wholly-red card was chosen by $W$. We are then asked to compute $P(W \mid R)$. From Bayes' Rule, this is
\[
P(W \mid R) = \frac{P(R \mid W) \cdot P(W)}{P(R)}.
\]
Note that if $W$ happens then the side that shows up is automatically red, so $P(R \mid W) = 1$. The probability that the wholly-red card is chosen is $\frac12$, so $P(W) = \frac12$. Finally, the only way that $R$ does not occur is if we pick the red/blue card and flip the blue side up, which happens with probability $\frac12 \cdot \frac12 = \frac14$. Thus $P(R) = 1 - \frac14 = \frac34$. Substituting, we calculate
\[
P(W \mid R) = \frac{1 \cdot \frac12}{\frac34} = \frac{2}{3}.
\]
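The four equally likely (card, face-up side) outcomes make this easy to check by enumeration:

```python
from fractions import Fraction

# Each tuple is (card, side shown, side hidden); the wholly-red card "RR"
# can show either of its two red sides, so it contributes two outcomes.
outcomes = [("RR", "R", "R"), ("RR", "R", "R"), ("RB", "R", "B"), ("RB", "B", "R")]

red_up = [o for o in outcomes if o[1] == "R"]
red_both = [o for o in red_up if o[2] == "R"]

print(Fraction(len(red_both), len(red_up)))  # → 2/3
```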
Example 12.2.2. (HMMT; 2014) A particular coin has a $\frac13$ chance of landing on heads (H), $\frac13$ chance of landing on tails (T), and $\frac13$ chance of landing vertically in the middle (M). When continuously flipping this coin, what is the probability of observing the continuous sequence HMMT before HMT?
Solution: To solve this problem, we let $P_S$ be the probability that we see the sequence HMMT before HMT, given that the sequence of flips started with the string $S$. Then, for example,
\[
P_H = \frac13 P_{HH} + \frac13 P_{HM} + \frac13 P_{HT}.
\]
Notice that, once we get to the sequence HH, the given event is just as likely to occur as if we were at the sequence H, so $P_{HH} = P_H$. Similarly, $P_{HT} = P_T$, so
\[
P_H = \frac13 P_H + \frac13 P_{HM} + \frac13 P_T. \tag{1}
\]
We can do similar analysis with $P_M$ and $P_T$: we get
\[
P_M = P_T = \frac13 P_H + \frac13 P_M + \frac13 P_T.
\]
This of course implies that $P_M = \frac13 P_H + \frac23 P_M$, so we also have that $P_M = P_H = P_T$. For the rest of this solution, we let this common value be $x$, for convenience.
We now continue the analysis to find $P_{HM}$:
\[
P_{HM} = \frac13 P_{HMH} + \frac13 P_{HMM} + \frac13 P_{HMT} = \frac13 P_H + \frac13 P_{HMM} = \frac13 x + \frac13 P_{HMM}.
\]
We finally analyze $P_{HMM}$:
\[
P_{HMM} = \frac13 P_{HMMH} + \frac13 P_{HMMM} + \frac13 P_{HMMT} = \frac13 P_H + \frac13 P_M + \frac13 = \frac23 x + \frac13.
\]
We can now substitute into equation (1) to get
\[
x = \frac23 x + \frac13\left(\frac13 x + \frac13\left(\frac23 x + \frac13\right)\right).
\]
This simplifies to $\frac{4}{27}x = \frac{1}{27}$, which yields $x = \frac14$. Therefore $P_H = P_M = P_T = \frac14$. The probability that we see HMMT before HMT is the same regardless of the first coin toss, so the overall probability must be $\frac14$.
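The same answer falls out of value iteration on the four "progress" states (a sketch of ours; each state is labeled by the longest suffix of the flips so far that is a prefix of HMMT or HMT):

```python
# From each state the flips H, M, T occur with probability 1/3 each.
# Completing HMMT wins (value 1); completing HMT loses (value 0).
v = {"": 0.0, "H": 0.0, "HM": 0.0, "HMM": 0.0}
for _ in range(1000):
    v = {
        "":    (v["H"] + v[""] + v[""]) / 3,
        "H":   (v["H"] + v["HM"] + v[""]) / 3,
        "HM":  (v["H"] + v["HMM"] + 0.0) / 3,   # T here completes HMT: lose
        "HMM": (v["H"] + v[""] + 1.0) / 3,      # T here completes HMMT: win
    }

print(round(v[""], 6))  # → 0.25
```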

12.3 Geometric Probability


The final topic we cover is that of geometric probability. We may encounter a situation where we are
randomly selecting a point from a region, instead of selecting from a finite set of possibilities. Here we
review how probability is defined in this context.
Definition 12.3.1. Suppose we are given a region $R$ (in two-dimensional space, three-dimensional space, or what have you) with finite area. Suppose $S \subseteq R$ is a subset of $R$. Then when we select a point uniformly at random from $R$, the probability that it lies in $S$ is equal to
\[
\frac{[S]}{[R]},
\]
where $[S]$ denotes the area (or volume) of $S$, and likewise for $[R]$. Here $R$ is the sample space, and the event we are analyzing is the event of a randomly chosen point falling in $S$.
The challenge in geometric probability problems generally stems from identifying the regions corresponding to the event and the sample space. Furthermore, the question may not even be framed in terms of
geometric probability: we will need to take a problem and reframe it as a geometry problem. The only
advantage that we are given is that such problems are easy to spot: if we are choosing an object uniformly at
random over some infinite set, we will almost certainly be analyzing the problem with geometric probability.
Example 12.3.1. (AMC 12; 2003) Three points are chosen randomly and independently on a circle. What
is the probability that all three pairwise distances between the points are less than the radius of the circle?

Solution: It is easy to see that this will only be possible if the three points lie within a $60^\circ$ arc of the circle. Therefore, we wish to find the probability that three randomly chosen points lie on the same $60^\circ$ arc.
Without loss of generality, we fix one point on the circle. We can then place the other two points anywhere from $180^\circ$ clockwise to $180^\circ$ counterclockwise of this point. Let $x$ and $y$ be the degree measures of the other two points.
First, note that the desired event requires that $x, y \in [-60, 60]$.
If $x, y \in [0, 60]$, then clearly all of those points satisfy the desired property. Similarly, if $x, y \in [-60, 0]$, then all such points satisfy the desired property.
If $x \in [0, 60]$ and $y \in [-60, 0]$, then the minor arc between $x$ and $y$ is $x - y$, and we want this to be less than or equal to 60, hence $x - y \leq 60$, or $y \geq x - 60$.
By the same logic, if $y \in [0, 60]$ and $x \in [-60, 0]$, then $x \geq y - 60$. Now we graph these regions.

[Figure: the square $[-180, 180]^2$ in the $xy$-plane with the region described above shaded.]

The two triangles can be combined into a single shaded square, hence the shaded region is equal to three squares of side length 60. Thus the probability that we are in the shaded region is $\frac{3 \cdot 60^2}{360^2} = \frac{1}{12}$, and the desired probability is $\frac{1}{12}$.
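A numerical check of the area ratio: sample the midpoints of a fine grid over the square $[-180, 180]^2$ and test the conditions above (our own sketch; the exact ratio is $\frac{1}{12} \approx 0.0833$):

```python
N = 720                  # grid cells per axis
hits = 0
for i in range(N):
    x = -180 + (i + 0.5) * 360 / N        # cell midpoints
    for j in range(N):
        y = -180 + (j + 0.5) * 360 / N
        # The shaded region: both points within 60 degrees of the fixed
        # point, and within 60 degrees of each other.
        if -60 <= x <= 60 and -60 <= y <= 60 and abs(x - y) <= 60:
            hits += 1

ratio = hits / N**2
print(ratio)   # close to 1/12 ≈ 0.08333
```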
Chapter 13

Expected Value

13.1 Introduction
When we talk about expected value, we are generally talking about the average value of an event—i.e., if we
continuously iterate an event that has a numerical value, then on average, what will that value be? More
formally, we can define it as follows:
Definition 13.1.1. Given a random variable $X$ with a finite number of possible outcomes $x_1, x_2, \ldots, x_n$, such that the probability of $x_i$ occurring is $p_i$, the expected value of $X$, denoted $E[X]$, is defined to be
\[
E[X] = p_1x_1 + p_2x_2 + \cdots + p_nx_n.
\]
To illustrate this, we start with one of the most basic examples.
Example 13.1.1. If a single standard six-sided die is rolled, what is the expected value of the number shown?
Solution: Here, the random variable is the value of a single roll of the die, which we call $D$. We know that $D \in \{1, 2, 3, 4, 5, 6\}$. The probability of each value occurring is $\frac16$, hence the expected value is
\[
E[D] = \frac16 \cdot 1 + \frac16 \cdot 2 + \frac16 \cdot 3 + \frac16 \cdot 4 + \frac16 \cdot 5 + \frac16 \cdot 6 = 3.5.
\]
Here is another example of how we can compute an expected value.
Example 13.1.2. (AMC 10; 2007) A player chooses one of the numbers 1 through 4. After the choice has
been made, two regular four-sided (tetrahedral) dice are rolled, with the sides of the dice numbered 1 through
4. If the number chosen appears on the bottom of exactly one die after it has been rolled, then the player wins
1 dollar. If the number chosen appears on the bottom of both of the dice, then the player wins 2 dollars. If
the number chosen does not appear on the bottom of either of the dice, the player loses 1 dollar. What is the
expected return to the player, in dollars, for one roll of the dice?
Solution: The player rolls the chosen number on a single die with probability $\frac14$. Therefore, both dice show the number with probability $\left(\frac14\right)^2 = \frac{1}{16}$. Also, the probability that the first die does not match the chosen number and the second die does not match the chosen number is $\frac34 \cdot \frac34 = \frac{9}{16}$. Therefore, the remaining probability (that exactly one die matches the chosen number) is $1 - \frac{1}{16} - \frac{9}{16} = \frac38$.
Thus the expected return to the player is
\[
\frac{1}{16} \cdot 2 + \frac{3}{8} \cdot 1 + \frac{9}{16} \cdot (-1) = -\frac{1}{16}.
\]
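Enumerating the 16 equally likely ordered rolls gives the same expectation exactly:

```python
from itertools import product
from fractions import Fraction

chosen = 1   # by symmetry, which number is chosen does not matter
total = Fraction(0)
for d1, d2 in product(range(1, 5), repeat=2):
    matches = (d1 == chosen) + (d2 == chosen)
    payoff = {0: -1, 1: 1, 2: 2}[matches]
    total += Fraction(payoff, 16)

print(total)  # → -1/16
```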
In the next example, we illustrate a mysterious application of expected value.


Example 13.1.3. A casino offers you a game of chance. On each turn, a fair coin is tossed. Initially, there are \$2 in the pot. On each toss, if the coin lands on heads, the money in the pot is doubled, while if the coin lands on tails, the game stops and you win whatever is in the pot. What would be a fair price to pay the casino for entering the game?
This example is known as the St. Petersburg Paradox. Approaching this with expected value, note that the game ends on the first tails, so the possible game sequences are T, HT, HHT, HHHT, . . . . The probability of each sequence is $\frac12, \frac{1}{2^2}, \frac{1}{2^3}, \frac{1}{2^4}, \ldots$, and we note that the winnings in these game sequences are $2^1, 2^2, 2^3, 2^4, \ldots$, respectively. Therefore, the desired expected value is
\[
\frac{1}{2} \cdot 2 + \frac{1}{2^2} \cdot 2^2 + \frac{1}{2^3} \cdot 2^3 + \frac{1}{2^4} \cdot 2^4 + \cdots = 1 + 1 + 1 + 1 + \cdots.
\]
We see that the expected value is infinite! Therefore, we should be willing to pay any price to play this game. However, suppose that the casino starts selling tickets for \$50. Would you buy a ticket? The issue is that you have a very small probability of recouping your money (a $\frac{1}{32}$ chance). So expected value might not be practical, unless you have an unlimited supply of resources and time.

13.2 Linearity of Expectation


Sometimes, the formula given in the definition of expected value is difficult to apply directly. In these cases,
it may be more useful to change perspective and approach expected value constructively. To illustrate this
idea, we begin with an example that appears to not be directly related to expected value.
Example 13.2.1. Let $s(n)$ be the sum of the base-10 digits of $n$. Find the sum of $s(n)$ over all four-digit base-10 positive integers.
Solution: If we were to compute s(n) for each number directly, we would get a sum of

(1 + 0 + 0 + 0) + (1 + 0 + 0 + 1) + (1 + 0 + 0 + 2) + · · · + (9 + 9 + 9 + 9).

This seems tricky to evaluate. The values of s(n) seem to have some patterns (i.e., s(n) increases by 1 for the
first 10 integers), but these patterns are interrupted. Instead of trying to completely describe these patterns,
we will instead look at each digit.
• In the thousands place, each digit $1 \le d \le 9$ occurs $10^3$ times, contributing $1000d$ to the sum. Therefore, the total contribution of the thousands digits is $1000(1 + 2 + \cdots + 9) = 45{,}000$.

• In the hundreds place, each digit $0 \le d \le 9$ occurs $9 \cdot 10^2$ times, contributing $900d$ to the sum. Therefore, the total contribution of the hundreds digits is $900(0 + 1 + 2 + \cdots + 9) = 40{,}500$.

• In the tens place, each digit $0 \le d \le 9$ likewise occurs $9 \cdot 10^2$ times, so the total contribution of the tens digits is also $900(0 + 1 + 2 + \cdots + 9) = 40{,}500$.

• In the units place, each digit $0 \le d \le 9$ again occurs $9 \cdot 10^2$ times, so the total contribution of the units digits is $40{,}500$ as well.

Therefore, the sum of $s(n)$ over all four-digit numbers is $45{,}000 + 3 \cdot 40{,}500 = 166{,}500$.
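This total takes one line to confirm directly:

```python
# Sum of the digit sums of every four-digit number.
total = sum(sum(int(d) for d in str(n)) for n in range(1000, 10000))
print(total)  # → 166500
```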
We’ll revisit this example momentarily from the lens of expected value. However, we first note that the
key idea here was to change perspective. Instead of looking at each number as a whole, we broke it down into
smaller pieces, looking at the individual digits of each number. This is one of the main ideas of constructive
expectation, to look more narrowly at an easier problem. One of the key tools in this process is the following
proposition.

Proposition 13.2.1. (Linearity of Expectation) If $X$ and $Y$ are two random variables, then
\[
E[X + Y] = E[X] + E[Y].
\]
Also, if $c$ is a constant, then
\[
E[cX] = c \cdot E[X].
\]
Note that these do not require that $X$ and $Y$ be independent. More generally, if $X_1, X_2, \ldots, X_n$ are random variables, then
\[
E[X_1 + X_2 + X_3 + \cdots + X_n] = E[X_1] + E[X_2] + E[X_3] + \cdots + E[X_n].
\]

We now present a second solution to the above problem.

Solution: We will compute $E[s(n)]$ for a randomly chosen four-digit number with digits $a$, $b$, $c$, $d$. By linearity of expectation, this is just
\[
E[a + b + c + d] = E[a] + E[b] + E[c] + E[d].
\]
Since $a$ is a nonzero digit, $E[a] = 5$. Since $b$, $c$, $d$ can be 0, it follows that $E[b] = E[c] = E[d] = 4.5$. Therefore,
\[
E[a + b + c + d] = 5 + 4.5 + 4.5 + 4.5 = 18.5.
\]
Alternatively, if the sum of $s(n)$ over all four-digit numbers is $S$, then the expected value of $s(n)$ is $\frac{S}{9000}$. Therefore, $S = 9000 \cdot 18.5 = 166{,}500$.
Using expected value on this problem simplified many of the computations. We were able to look at very
specific pieces of the problem.
One useful companion of the linearity of expectation is the idea of an indicator function.

Definition 13.2.1. The indicator function of a set $A$ is the function $\mathbf{1}_A$ defined by
\[
\mathbf{1}_A(x) = \begin{cases} 1 & \text{if } x \in A, \\ 0 & \text{otherwise.} \end{cases}
\]
Indicator functions allow us to count the number of objects in a set. For example, if $x_1, x_2, \ldots, x_n$ are random variables, then the number of these random variables in the set $A$ is given by
\[
\mathbf{1}_A(x_1) + \mathbf{1}_A(x_2) + \cdots + \mathbf{1}_A(x_n).
\]

To see how indicator functions combine with expected value, consider the following example.

Example 13.2.2. (CMIMC; 2018) Victor shuffles a standard 54-card deck then flips over cards one at a
time onto a pile stopping after the first ace. However, if he ever reveals a joker he discards the entire pile,
including the joker, and starts a new pile; for example, if the sequence of cards is 2-3-Joker-A, the pile ends
with one card in it. Find the expected number of cards in the end pile.

Solution: The important cards in the deck are the two jokers and four aces. Suppose we start by laying these six cards down as shown below, leaving seven blanks for the other cards:
\[
\_\_ \;\text{Joker}\; \_\_ \;\text{A}\; \_\_ \;\text{A}\; \_\_ \;\text{Joker}\; \_\_ \;\text{A}\; \_\_ \;\text{A}\; \_\_\,.
\]
The order of these six cards doesn't particularly matter, because the cards that end up in the final pile are those that appear in the blank immediately preceding the first ace (together with that ace). Let $S_i$ be the set of cards that appear in the $i$th blank, where $1 \leq i \leq 7$.
If the remaining 48 cards are $x_1, x_2, \ldots, x_{48}$, then the number of cards that appear in the $i$th blank is given by
\[
\mathbf{1}_{S_i}(x_1) + \mathbf{1}_{S_i}(x_2) + \cdots + \mathbf{1}_{S_i}(x_{48}).
\]
Then by linearity of expectation,
\[
E[\mathbf{1}_{S_i}(x_1) + \mathbf{1}_{S_i}(x_2) + \cdots + \mathbf{1}_{S_i}(x_{48})] = E[\mathbf{1}_{S_i}(x_1)] + E[\mathbf{1}_{S_i}(x_2)] + \cdots + E[\mathbf{1}_{S_i}(x_{48})].
\]
There are seven blanks, and a card is equally likely to appear in any of the seven blanks. It follows that $E[\mathbf{1}_{S_i}(x_1)] = \frac{1}{7}$ (and the same is true for $x_2, x_3, \ldots, x_{48}$). Therefore,
\[
E[\mathbf{1}_{S_i}(x_1) + \mathbf{1}_{S_i}(x_2) + \cdots + \mathbf{1}_{S_i}(x_{48})] = 48 \cdot \frac{1}{7}.
\]
Since the end pile also includes the first ace, the expected number of cards in the end pile is $\frac{48}{7} + 1 = \frac{55}{7}$.
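A Monte Carlo simulation with a fixed seed (our own check, not part of the solution) lands close to $\frac{55}{7} \approx 7.857$:

```python
import random

random.seed(0)
deck = ["J"] * 2 + ["A"] * 4 + ["X"] * 48   # jokers, aces, other cards

def end_pile_size():
    order = random.sample(deck, len(deck))  # a shuffled copy of the deck
    pile = 0
    for card in order:
        if card == "J":
            pile = 0          # a joker discards the current pile
        else:
            pile += 1
            if card == "A":
                return pile   # stop after the first ace
    return pile

trials = 100_000
mean = sum(end_pile_size() for _ in range(trials)) / trials
print(round(mean, 2))   # close to 55/7 ≈ 7.86
```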
Example 13.2.3. (Utah Math Olympiad; 2020) In a 3 × 3 square grid, four of the nine squares are chosen at random and shaded. In the resulting figure, a region is a maximal set of shaded squares that are connected vertically or horizontally (not diagonally). For example, the following grid has two regions, one containing 3 squares and the other containing 1 square:

[Figure: a 3 × 3 grid with four shaded squares forming one region of 3 squares and one region of 1 square.]

Find the expected value of the number of regions.
Solution: The number of regions in the grid can be calculated in the following formula:

(# regions) = 4 − (# adjacencies) + (# boxes), (*)

where an adjacency is two filled squares that are horizontally or vertically adjacent, and a box is four filled
squares in a 2 × 2 box. Why is this true? First, we overcount the number of regions by counting the number
of filled squares, which is 4. Next, whenever two filled squares are adjacent, that combines two regions into
one; so we subtract 1 for each adjacency. However, there is a case where this might now undercount: if
when combining two regions into one, those were already the same region. This happens only if there is a
2 × 2 box, where we have four filled squares, then four adjacencies, so that the final adjacency is between
two regions that were already combined. To correct for this, we add 1 if there is a 2 × 2 box in the figure.
Now from equation (*), we apply linearity of expectation to get:

E[# regions] = 4 − E[# adjacencies] + E[# boxes].

(Here E[X] denotes the expected value of X; linearity of expectation states that for any X and Y , E[X +Y ] =
E[X] + E[Y ].)
• To calculate E[# adjacencies], there are 12 possible adjacencies, and each adjacency happens with
probability \binom{7}{2} / \binom{9}{4}, because there are \binom{9}{4} ways to select four filled squares, and if two adjacent squares
are filled in, the two remaining filled squares can be chosen in \binom{7}{2} ways. So

E[# adjacencies] = 12 · \binom{7}{2} / \binom{9}{4} = 12 · 21/126 = 2.

• To calculate E[# boxes], there are 4 ways to have a 2 × 2 box, so we have

E[# boxes] = 4 / \binom{9}{4} = 4/126 = 2/63.

Therefore, our final answer is

4 − 2 + 2/63 = 128/63.
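Since there are only \binom{9}{4} = 126 equally likely shadings, the answer can also be verified by brute force. The following Python sketch (an addition for checking, not from the original solution) counts regions directly with a flood fill:

```python
from fractions import Fraction
from itertools import combinations

def num_regions(filled):
    """Count orthogonally connected regions among the filled cells."""
    filled = set(filled)
    seen, regions = set(), 0
    for cell in filled:
        if cell in seen:
            continue
        regions += 1
        stack = [cell]
        while stack:
            r, c = stack.pop()
            if (r, c) in seen:
                continue
            seen.add((r, c))
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in filled:
                    stack.append(nb)
    return regions

cells = [(r, c) for r in range(3) for c in range(3)]
total = sum(num_regions(choice) for choice in combinations(cells, 4))
print(Fraction(total, 126))  # 128/63
```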

This next example makes a very similar use of indicator functions.
Example 13.2.4. (PUMaC; 2015) Andrew has 10 balls in a bag, each a different color. He randomly picks
a ball from the bag 4 times, with replacement. Find the expected number of distinct colors among the balls
he picks.
Solution: Let Xi be 1 if color #i appears in the four balls and 0 otherwise. Then we wish to find

E[X1 + X2 + · · · + X10 ].

By linearity of expectation, this is just

E[X1 + X2 + · · · + X10 ] = E[X1 ] + E[X2 ] + · · · + E[X10 ].



Now in general, the probability that color #i does NOT appear in the four balls is (9/10)^4 = 6561/10000. Hence the
probability that color #i does appear in the four balls is 1 − 6561/10000 = 3439/10000. It follows that E[Xi ] = 3439/10000.
Hence

E[X1 + X2 + · · · + X10 ] = 10 · 3439/10000 = 3439/1000.
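The arithmetic can be reproduced exactly with Python's Fraction type (an illustrative addition, not part of the original solution):

```python
from fractions import Fraction

p_missing = Fraction(9, 10) ** 4        # probability that color #i never appears
expected_colors = 10 * (1 - p_missing)  # linearity of expectation over the ten colors
print(expected_colors)  # 3439/1000
```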

The previous examples made use of the fact that expected value is additive. It might be natural to ask
if expected value is also multiplicative. In some cases (when X and Y are independent), the answer is yes.
Proposition 13.2.2. If X and Y are two independent random variables, then

E[XY ] = E[X]E[Y ].

This is not necessarily true if the two variables are dependent.


As an example of this proposition in the case of independent events, we present the following basic
example.
Example 13.2.5. When two dice are rolled, the two numbers shown are a and b. Find the expected value
of a · b in two different ways.
Solution: Note that the probability that the two rolls are (a, b) is (1/6)^2 = 1/36. Therefore, the desired
expected value is

E[ab] = (1/36)(1 · 1 + 1 · 2 + 1 · 3 + · · · + 1 · 6 + 2 · 1 + 2 · 2 + · · · + 6 · 6)
      = (1/36)(1 + 2 + · · · + 6)(1 + 2 + · · · + 6)
      = 21^2 / 36
      = 49/4.

On the other hand, we know that E[a] = E[b] = 7/2. Since two dice rolls are independent events, the above
proposition implies that E[ab] = E[a] · E[b] = (7/2)^2 = 49/4.
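Both computations are easy to replicate (a small check, not part of the original text):

```python
from fractions import Fraction

# brute force over all 36 equally likely outcomes
direct = Fraction(sum(a * b for a in range(1, 7) for b in range(1, 7)), 36)

# using independence: E[ab] = E[a] * E[b]
mean = Fraction(sum(range(1, 7)), 6)  # E[a] = E[b] = 7/2
print(direct, mean * mean)  # 49/4 49/4
```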
Next, consider what happens when we have two dependent events.

Example 13.2.6. Computer A randomly picks a number x from the set S = {1, 2}. Computer B also
randomly picks a number y from the same set, with the condition that it will only pick a number at least as
large as A’s number. Find E[x], E[y], and E[xy].

Solution: Clearly, E[x] = (1/2) · 1 + (1/2) · 2 = 1.5. The only way that Computer B will pick a 1 is if Computer
A also picks a 1. Therefore, Computer B will pick a 1 with probability (1/2)^2 = 1/4, and it will pick a 2 with
probability 1 − 1/4 = 3/4. Hence E[y] = (1/4) · 1 + (3/4) · 2 = 7/4.
On the other hand, there are three possible choices for (x, y): either (1, 1), (1, 2), or (2, 2). We can
compute that these events happen with probability 1/4, 1/4, and 1/2, respectively. Therefore,

E[xy] = (1/4)(1 · 1) + (1/4)(1 · 2) + (1/2)(2 · 2) = 11/4.
Note that E[x] · E[y] = 1.5 · 1.75 = 2.625, while E[xy] = 2.75. 
This illustrates that when we have two dependent events (Computer B’s choice depends on Computer
A’s choice), it need not be the case that E[xy] = E[x] · E[y].
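A short computation over the three outcomes confirms the gap (an illustrative addition, not part of the original text):

```python
from fractions import Fraction

# the three possible outcomes (x, y) and their probabilities
outcomes = {(1, 1): Fraction(1, 4), (1, 2): Fraction(1, 4), (2, 2): Fraction(1, 2)}

Ex = sum(p * x for (x, y), p in outcomes.items())
Ey = sum(p * y for (x, y), p in outcomes.items())
Exy = sum(p * x * y for (x, y), p in outcomes.items())
print(Ex, Ey, Exy, Ex * Ey)  # 3/2 7/4 11/4 21/8
```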

13.3 The Geometric Distribution


Definition 13.3.1. If in a certain process, each individual step is independently successful with probability p, we say
that the process follows the geometric distribution. In a geometric distribution, the probability that the first
instance of success is at step k is (1 − p)^{k−1} · p.

Proposition 13.3.1. If the random variable X denotes the first instance of success in a geometric distri-
bution (where each step is successful with probability p), then

1
E[X] = .
p

Proof. By definition of expected value,

E[X] = p · 1 + (1 − p) · p · 2 + (1 − p)^2 · p · 3 + (1 − p)^3 · p · 4 + · · · .

By the negative binomial theorem, this is equal to

E[X] = p(1 − (1 − p))^{−2} = p/p^2 = 1/p,

which is what we wanted to prove.

This proposition makes a lot of intuitive sense. To see this, suppose that we have a geometric process
that is successful with probability 1/3. It would make sense that the first instance of success will, on average,
be on the third attempt.
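This can also be checked numerically (an addition, not from the original text): with p = 1/3, the partial sums of the defining series approach 3.

```python
# partial sums of E[X] = sum over k >= 1 of k * (1 - p)^(k - 1) * p, with p = 1/3
p = 1 / 3
approx = sum(k * (1 - p) ** (k - 1) * p for k in range(1, 2000))
print(round(approx, 9))  # 3.0
```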

Example 13.3.1. 5 people stand in a line facing one direction. In every round, the person at the front
moves randomly to any position in the line, including the front or the end. Find the expected number of
rounds needed for the last person of the initial line to appear at the front of the line.

Solution: Suppose that Carl is the last person in the initial line, so Carl is initially in position 5. At
each round, Carl will either stay in the same position (if the person at the front moves to a position in front
of Carl), or he will move up one place (if the person at the front moves to a position behind Carl).
At the start, Carl is in the fifth position. Since the person at the front moves to each of five possible
positions (one through five) with equal probability, there is a 1/5 probability that Carl will move up by one
place. This probability will remain the same until Carl has success in moving up a place, so it follows a
geometric distribution. Therefore, the expected number of rounds it takes for Carl to move to position 4 is
1/(1/5) = 5.

If Carl is in the fourth position, then in each round, the probability that he moves to the third position
is 2/5. Hence the expected number of rounds it takes Carl to move from position 4 to position 3 is 1/(2/5) = 5/2.
If Carl is in the third position, then in each round, the probability that he moves to the second position
is 3/5. Hence the expected number of rounds it takes Carl to move from position 3 to position 2 is 1/(3/5) = 5/3.
If Carl is in the second position, then in each round, the probability that he moves to the first position
is 4/5. Hence the expected number of rounds it takes Carl to move from position 2 to position 1 is 1/(4/5) = 5/4.
It follows that the expected number of rounds that it takes Carl to move to the first position is

5 + 5/2 + 5/3 + 5/4 = 125/12.
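The final tally is just the sum of the four expected waiting times 1/p, which we can compute exactly (an illustrative addition, not part of the original solution):

```python
from fractions import Fraction

# moving from position 6 - k to the next position succeeds with probability k/5,
# so by the geometric distribution it takes 5/k rounds on average
total_rounds = sum(Fraction(5, k) for k in range(1, 5))
print(total_rounds)  # 125/12
```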


Chapter 14

Graph Theory 1 - Introduction

The field of Graph Theory originated from a particular problem: the bridges in the Prussian city of
Königsberg (now the Russian city of Kaliningrad) connected four different landmasses together, as shown
below. A rather natural question was whether there existed a tour of the city that would cross each bridge
exactly once, so visitors could get a nice view from as many angles as possible.

The solution was found by none other than Leonhard Euler, whose method laid down the foundations
of graph theory. He recognized that the only information that mattered in the tour was the sequence of
bridges that was traversed, which let him reformulate the problem in simpler abstract terms by replacing
the landmasses with points:

He could then argue, using methods we will cover today, that such a tour is impossible. Notably, his
own reaction to the solution was that it had very little to do with mathematics; to him it was simply a
puzzle that now had a solution. This demonstrates another feature of Graph Theory: it doesn’t
feel like a rigid mathematical field, and so it will require a new perspective to learn.

14.1 Basics
Definition 14.1.1. A graph G = (V, E) is a pair such that V is a finite set of vertices (we represent these
as points) and E is a set of edges (i.e. 2-element subsets of V ). We can denote an edge between vertices u
and v in G by uv or vu.
More general definitions also allow multiple edges between two vertices, or even edges whose endpoints
coincide (called loops). In this course, we will focus mainly on simple graphs, which have no multiple
edges or loops. However, we will occasionally be interested in directed graphs.


Definition 14.1.2. A graph with vertex set V and edge set E is directed if each edge in E is an ordered
pair of two vertices. To draw a directed graph, instead of drawing segments between connected vertices we
draw arrows from some vertices to other ones.

14.2 Parts of a Graph


We say two vertices u, v ∈ V are adjacent if there is an edge between them (i.e. uv ∈ E). The degree of a
vertex v ∈ V , denoted d(v), is the number of edges v is an endpoint of. A subgraph H of a graph G is a
graph such that V (H) ⊆ V (G) and E(H) ⊆ E(G).
A walk in a graph is a sequence of vertices where consecutive vertices are adjacent in the graph. A path
is a walk such that all vertices are distinct; a cycle is a walk such that all vertices are distinct except that
the last vertex is the same as the first vertex. We say a graph G is connected if for all pairs of vertices
(u, v) ∈ V (G), there exists a path from u to v. A component of G is a maximal (i.e. we cannot add anything
to it) connected subgraph of G.

With these definitions, we can talk about special types of graphs.


Some special types of graphs

• A tree is a connected, acyclic graph. A leaf of a tree is a vertex of degree 1.

• A graph is bipartite if we can partition our vertices into two sets X and Y such that X ∪ Y = V (G),
X ∩ Y = ∅, no two vertices in X are adjacent, and no two vertices in Y are adjacent.

• The complete graph on n vertices (denoted Kn ) is the graph of n vertices with all possible edges.

• A complete bipartite graph is a bipartite graph such that 2 vertices are adjacent if and only if they are
in different parts of our bipartition. We denote the complete bipartite graph with partitions of size a
and b by Ka,b .

• We denote the graph that is simply a path on n vertices by Pn .

• We denote the graph that is simply a cycle on n vertices by Cn .

14.3 Examples
Example 14.3.1. Among any six people, show that either three know each other, or three of them are
strangers to each other.

Solution:
One way we can think about this problem is by letting the people represent vertices, and connect two
people if they know each other. However, it will be more helpful for our argument if all of the people are
connected, so instead we consider a complete graph on six vertices, and color an edge red if the people at
the endpoints know each other, and color it blue if the people at the endpoints don’t know each other. This
creates a coloring of the edges of K6 with red and blue, called a two-coloring. It now suffices to show that
there’s either a fully-red triangle or a fully-blue triangle.
Consider any vertex. It has degree 5, so from the Pigeonhole Principle, three of its connecting edges will
have the same color. Without loss of generality, suppose this color is red. We will then have three red edges
meeting at a single vertex. Now consider the three edges e1 , e2 , e3 that connect these red edges together.
If any of e1 , e2 , or e3 are red, then we form a fully-red triangle with two of the red edges we already have.
Otherwise, all of e1 , e2 , e3 are blue, in which case we have a fully-blue triangle. In either case, we have
a monochromatic triangle, which means that there are three people who either all know each other or are all
strangers.
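Because K6 has only 15 edges, the claim can also be verified exhaustively over all 2^15 two-colorings. The sketch below (an addition, not part of the original argument) encodes a coloring as a bitmask:

```python
from itertools import combinations

edges = list(combinations(range(6), 2))  # the 15 edges of K6
idx = {e: i for i, e in enumerate(edges)}
triangles = list(combinations(range(6), 3))

def has_mono_triangle(mask):
    """mask encodes a red/blue coloring of the 15 edges as bits."""
    for a, b, c in triangles:
        bits = {mask >> idx[(a, b)] & 1,
                mask >> idx[(a, c)] & 1,
                mask >> idx[(b, c)] & 1}
        if len(bits) == 1:
            return True
    return False

ok = all(has_mono_triangle(m) for m in range(1 << 15))
print(ok)  # True
```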

We now count these pairs by the edges instead. Each edge e will have two endpoints v1 and v2 , so each
e is in exactly two pairs (v, e). There are |E| edges, so there are 2|E| pairs. This must equal to the previous
total, which proves the given equality. 

Example 14.3.2. (Handshake Lemma) In any graph G(V, E), we have that
X
d(v) = 2|E|.
v∈V

Solution: We proceed with a Double Counting argument. Note that d(v) counts all of the edges adjacent
to v. Therefore, summing this over all v counts the ordered pairs (v, e) of vertices v and edges e such that e
connects to v.
We now count these pairs by the edges instead. Each edge e will have two endpoints v1 and v2 , so each
e is in exactly two pairs (v, e). There are |E| edges, so there are 2|E| pairs. This must equal to the previous
total, which proves the given equality. 
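The lemma is easy to confirm computationally on any graph; here is a quick check on a randomly generated simple graph (an illustrative addition, not from the original text):

```python
import random
from itertools import combinations

random.seed(0)
# a random simple graph on 8 labelled vertices
edges = [e for e in combinations(range(8), 2) if random.random() < 0.4]

degree = [0] * 8
for u, v in edges:
    degree[u] += 1
    degree[v] += 1
assert sum(degree) == 2 * len(edges)  # the Handshake Lemma
```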

Example 14.3.3. (Erdös) Prove that if n + 1 numbers are selected from the set {1, 2, . . . , 2n}, then two of
these numbers will have an integer ratio.
Solution: Note that n + 1 is just above half of the numbers in the set. This suggests that we use a
Pigeonhole argument. What we will do is construct a graph with 2n vertices whose construction focuses on the
odd numbers in the set {1, 2, . . . , 2n}.
Consider all the odd numbers from 1 to 2n − 1. Then for each even number e ∈ {2, 4, . . . , 2n}, group e
with its largest odd divisor. So for example, 1, 2, 4, 8, etc. are grouped together, 3, 6, 12, etc. are grouped
together, and so on. Finally, consider a graph G on 2n vertices labelled 1 through 2n where a and b are
connected if a | b or b | a. Then each group, e.g. (1, 2, 4, . . . ) or (3, 6, 12, . . . ), is a complete subgraph of G.
This is because each group of numbers forms a geometric sequence with common ratio 2, so the ratio of any
two numbers in a group is a power of 2.
We have found n complete subgraphs of G covering all 2n vertices, so if we choose n + 1 vertices, from the
Pigeonhole Principle two of them must fall within the same complete subgraph. As a result, one of the
numbers corresponding to these two vertices will divide the other, which proves the result.
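For small n, both the grouping by largest odd divisor and the conclusion can be checked by brute force (the helper names below are illustrative additions, not part of the original proof):

```python
from itertools import combinations

def largest_odd_divisor(m):
    while m % 2 == 0:
        m //= 2
    return m

n = 4
nums = range(1, 2 * n + 1)
# numbers sharing a largest odd divisor form a chain under divisibility
groups = {d: [m for m in nums if largest_odd_divisor(m) == d]
          for d in range(1, 2 * n, 2)}
assert len(groups) == n  # exactly n chains

# any n + 1 numbers therefore contain one dividing another
for subset in combinations(nums, n + 1):
    assert any(b % a == 0 for a, b in combinations(sorted(subset), 2))
```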

Example 14.3.4. (Balkan MO; 1985) There are 1985 participants to an international meeting. In any group
of three participants there are at least two who speak the same language. It is known that each participant
speaks at most five languages. Prove that there exist at least 200 participants who speak the same language.
Solution: The main difficulty with this problem is that it doesn’t give us any useful information on the
number of languages there are in the meeting. To get around this, we will analyze particular people, and
look at their languages.
Choose a person X, and assume for the sake of argument that they speak five languages. If everyone at
the meeting speaks one of these five languages, then we are done: by the Pigeonhole Principle, at least ⌈1984/5⌉ = 397
people speak the same language. Otherwise, assume there is some person Y that doesn’t speak any of the
languages that X speaks. For the sake of argument, suppose that Y speaks five other languages.
We now have two people, X and Y , that speak ten languages between them. If we take any of the
1983 other people, call them Z, and consider the triple (X, Y, Z), by the problem statement two of them
must speak the same language. This implies that Z speaks one of the ten languages that X and Y speak.
Therefore, all of the 1983 other attendees speak one of these ten languages. By the Pigeonhole Principle,
at least ⌈1983/10⌉ = 199 of them speak the same one of these ten languages. If we group the corresponding X or Y
with these people, we will obtain 200 people who speak the same language.
If X or Y spoke fewer languages, then these calculations become even more favorable, so we will still be
able to produce 200 people who can speak the same language. This proves the result. 

Example 14.3.5. (USAMO; 1989) The 20 members of a local tennis club have scheduled exactly 14 two-
person games among themselves, with each member playing in at least one game. Prove that within this
schedule there must be a set of 6 games with 12 distinct players.

Solution: Consider the graph G with 20 vertices where vertices correspond to members, and two vertices
are connected if their corresponding members played a game against each other. Then this graph G has 20
vertices and 14 edges, with each vertex having degree at least 1. The problem then is equivalent to showing
that there are six edges with twelve distinct endpoints.
Suppose we can choose k of the edges, say e1 , e2 , . . . , ek , with 2k distinct endpoints, where k is as large as
possible. It suffices to show that k ≥ 6. Since k is as large as possible, there can’t be any edges disconnected
from e1 , e2 , . . . , ek . Therefore each of the remaining edges is adjacent to one of e1 , e2 , . . . , ek .
We now count the vertices of G in two different ways. First, there are clearly 20 vertices. Now suppose
we count by edges. We have 2k vertices from the edges e1 , e2 , . . . , ek . Each of the 14 − k remaining edges
will be connected to these ei , and will possibly have an endpoint that is not included in our 2k. Therefore,
each of the 14 − k remaining edges will add at most 1 vertex to our count, so the number of vertices is at
most 2k + (14 − k). We conclude that
2k + 14 − k ≥ 20,
which rearranges to k ≥ 6. 
Chapter 15

Graph Theory 2 - Planarity and


Connectedness

15.1 Connectedness
We start with a review of some definitions from the previous chapter.

Definition 15.1.1. A graph G is called connected if between any pair of vertices of G there exists a path
connecting them. A tree is a connected graph with no cycles, and a leaf is a vertex of a tree with degree 1.

In order to prove statements about connected graphs, we need to answer the following question: What
are the “smallest” connected graphs? In other words, how many edges can we remove from a graph until it
starts being disconnected? It turns out that trees form the smallest connected graphs in this sense. We will
make this result precise over the next few results.

Proposition 15.1.1. Suppose a graph G with n vertices has at least n edges. Then it must contain some
cycle.

Proof. We proceed by induction on n. For the base case, suppose a graph G on 3 vertices has at least 3 edges: then
G must be K3 , in which case it has a cycle. Now suppose as the inductive hypothesis that for some constant
k ≥ 3, all graphs on k vertices will have a cycle if they have at least k edges.
Consider a graph G on k + 1 vertices with at least k + 1 edges.
Case 1: G has a vertex with degree 1 or fewer. Then we can consider the subgraph H obtained by
deleting this vertex, along with whatever edge it may possibly be connected to. Then H has k vertices, and
at least k edges. From the inductive hypothesis, H has a cycle, so G must have this same cycle.
Case 2: G has no vertices with degree 1 or fewer. Then all vertices of G have degree 2 or more. We
construct a cycle manually. Start at some vertex v1 ∈ V (G). Since d(v1 ) ≥ 2, we can path to some adjacent
vertex v2 . Now for each i ≥ 2, run the following process:

• If vi has appeared in our path before, stop. We will have formed a cycle.

• If vi is a new vertex, then since it has degree at least 2, we can path out of it along an edge other than
the one we entered on: proceed to some adjacent vertex vi+1 .

Using this process, whenever we arrive at a new vertex, we will always be able to path out of it. However,
there are only finitely many vertices of G, so we will eventually path to a vertex that was previously in our
path. We can then take the entire path between the two instances of this repeated vertex, and this will form
a cycle of G. This proves the inductive step in Case 2.
This completes the proof of the claim.
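For a small number of vertices, Proposition 15.1.1 can also be confirmed exhaustively; the sketch below (not part of the original proof) checks all 1024 graphs on 5 vertices with a union-find cycle test:

```python
from itertools import combinations

def has_cycle(n, edges):
    """Union-find: an edge joining an already-connected pair closes a cycle."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return True
        parent[ru] = rv
    return False

n = 5
pairs = list(combinations(range(n), 2))
for mask in range(1 << len(pairs)):
    edges = [pairs[i] for i in range(len(pairs)) if mask >> i & 1]
    if len(edges) >= n:
        assert has_cycle(n, edges)  # at least n edges forces a cycle
```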

Proposition 15.1.2. Any connected graph G with n vertices has at least n − 1 edges.


Proof. We proceed via induction on n. For n = 2, clearly a graph with 2 vertices has at least 1 edge. We use
this as our base case. For the inductive step, assume for some k ≥ 2 that a connected graph on k vertices
will have at least k − 1 edges.
Suppose, for contradiction, that there is a connected graph on k + 1 vertices with fewer than k edges.
Therefore, from the Handshake Lemma,
X
2(k − 1) ≥ 2 · |E| = d(v).
v∈V (G)

The sum of the degrees of the k + 1 vertices of G is at most 2k − 2, so from the Pigeonhole Principle at least one of
these vertices has degree 1 or fewer. Since G is connected, this vertex must have degree exactly 1. Let H be
the subgraph obtained by deleting this vertex, along with its edge. Then H must still be connected, but it
has k vertices and fewer than k − 1 edges. This violates the induction hypothesis, which is a contradiction.
Therefore we have completed the inductive step, and have consequently shown that any connected graph on
n vertices has at least n − 1 edges.
Proposition 15.1.3. The following statements are equivalent:
(a) G is a tree (i.e. connected and has no cycles)
(b) G is connected and has n − 1 edges
(c) G has n − 1 edges and no cycles
Proof. We first show that (a) implies (b). Since G is connected, it has at least n − 1 edges. Since G has no
cycles, it has fewer than n edges. Consequently, it has n − 1 edges.
We now show that (b) implies (c). Suppose for contradiction that there is a connected graph G with
n − 1 edges, containing a cycle. Since G is connected, there are paths between any two vertices of G.
Deleting an edge from the cycle will delete some paths, but we can simply reroute them around the cycle.
Therefore deleting an edge from the cycle does not disrupt the connectivity of G. Therefore there must exist
a connected graph with n − 2 edges, which is a contradiction. Therefore if G is connected and has n − 1
edges, then it has no cycles, and hence (b) implies (c).
We now show that (c) implies (a). We must show that if G has n − 1 edges and no cycles, then it must
be connected. Suppose for contradiction that such a graph G is not connected. Choose two vertices v and
w in V (G) such that there is no path from v to w. Then the edge vw between v and w must not be drawn.
Consider the graph H obtained by adding this edge. This graph H cannot contain cycles not using vw, or
else the graph G would also have a cycle. Therefore the cycles of H must all contain vw. However, this cycle
produces two paths between v and w: one that is simply the edge vw, and another path that is simply the
rest of the cycle. This path must be in G, which is impossible since G is not connected. We conclude that H
has no cycles. However, we got H by adding an edge to G, so H has n vertices and n edges. Consequently
H must have a cycle, which is a contradiction. We conclude that our graph G must be connected.

We have shown that (a) implies (b), (b) implies (c), and (c) implies (a), so if any one of these statements
is true, then all three are true. This proves their equivalence.
Example 15.1.1. A volleyball net has the form of a rectangular lattice with dimensions 50 × 600. What is
the maximum number of unit strings you can cut without the net falling apart into more than one piece? We
assume that the fibers are of sufficiently good quality that they will not fray, so the net only splits into two
pieces if the cut strings disconnect it.
Solution: We consider this volleyball net as a graph with 51 · 601 = 30651 = v vertices, and

50 · 601 + 51 · 600 = 60650 = e

edges, where the edges correspond to the unit segments. The problem then rephrases as the following:
how many of the edges can we possibly cut while maintaining the graph’s connectivity?
Note that if there are any cycles in the graph, we can delete one of its edges and maintain the graph’s
connectivity. Furthermore, if there are at least v edges, then there will be a cycle, so we can cut unit segments

until we get to v − 1 edges. At this point, we will have a connected graph with v vertices and v − 1 edges.
From the previous proposition, this graph must be a tree, and will therefore have no cycles. Note that if we
delete any additional edge, we will have fewer than v − 1 edges on v vertices, so the graph will no longer be
connected. Therefore the maximum number of edges we can cut is e − (v − 1). This is equal to

60650 − (30651 − 1) = 30000 .
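The vertex and edge counts above can be tallied in a few lines (an arithmetic check, not part of the original solution):

```python
rows, cols = 50, 600
v = (rows + 1) * (cols + 1)                  # 30651 lattice points (vertices)
e = rows * (cols + 1) + (rows + 1) * cols    # 60650 unit strings (edges)
print(e - (v - 1))  # 30000
```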

15.2 Planarity
When we draw graphs, we have a tendency to try to not make edges intersect. This idea can be formalized
like so:
Definition 15.2.1. We say that a graph is planar if it can be drawn in the plane such that no two edges
intersect.
Example 15.2.1. Is the following graph planar?

Solution: Note that the definition only requires that it can be drawn in the plane with no two intersecting
edges, so even if a graph appears to have intersecting edges, it might still be planar. In fact, this graph is
isomorphic to the following graph.

Therefore it is planar. 
Now if a graph is drawn to have no edge intersections, it will divide the plane into well-defined regions
called faces. It is an interesting question of how to count the faces of a planar graph. The following theorem
is a very powerful relationship between the number of faces, and the numbers of vertices and edges of the
graph.
Theorem 15.2.1. (Euler’s Theorem) If a connected planar graph has V vertices, E edges, and F faces,
then V − E + F = 2.
Proof. We first prove the result for trees. Every tree must have F = 1, because there is only the infinite
face. So it suffices to show that V − E + 1 = 2, or rather V = E + 1. This can be shown by induction on
the number of vertices. If a graph has one vertex, then it has no edges, so V = 1 and E = 0, and the result
holds true. Suppose that the result holds true for all trees with V = k, and suppose we are given a tree
with V = k + 1. It can be shown that every tree contains at least one leaf (we leave this as an exercise
to the reader), so by deleting a leaf and its corresponding edge, we create a tree with k vertices and
E − 1 edges. Thus by the inductive hypothesis, k = (E − 1) + 1. It follows that for the (k + 1)-vertex tree,
V = k + 1 = E + 1, so by induction, for every tree, we have V = E + 1. Therefore, since F = 1, it follows
that for every tree, V − E + F = 2.
If a graph is not a tree, then there exists some edge that can be removed without disconnecting the
graph. If the original graph has V vertices, E edges, and F faces, then after removing this edge, the graph
has Vnew = V vertices, Enew = E − 1 edges, and Fnew = F − 1 faces (the two faces on either side of the edge
will combine to form one new face—note that if they were already part of the same face, then removing the
edge would disconnect the graph). Therefore,

Vnew − Enew + Fnew = V − (E − 1) + (F − 1) = V − E + F.



It follows that when we remove edges that do not disconnect the graph, the quantity V − E + F is an
invariant. Since the graph has a finite number of edges, this cannot continue forever, and when it is no
longer possible, the graph will be a tree. By the initial paragraph, a tree satisfies V − E + F = 2, hence the
value of the invariant is 2, so V − E + F = 2 for every connected graph.
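As a sanity check of the theorem (an addition, not from the original text), consider m × n grid graphs drawn in the plane, whose face counts are easy to write down:

```python
# an m-by-n grid of lattice points: a connected planar graph whose faces are
# the (m - 1)(n - 1) unit squares plus the one infinite face
checks = []
for m in range(2, 6):
    for n in range(2, 6):
        V = m * n
        E = m * (n - 1) + n * (m - 1)
        F = (m - 1) * (n - 1) + 1
        checks.append((V, E, F))
assert all(V - E + F == 2 for V, E, F in checks)  # Euler's Theorem
```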

This result is usually applied in one of two ways: we either use it on a planar graph to make some
definitive calculation, or we use it to show that a graph cannot be planar by proving inequalities based on
V , E, and F , as in the following example.

Example 15.2.2. (Utilities Problem) Suppose there are three houses on a plane, and each needs to be
connected to the water, gas, and electricity companies. Without using the third dimension, is there some way
to make these nine connections such that no two lines cross?

Water Gas Elec.

Solution: Note that we can represent the problem with the following graph.

We can rephrase the question as asking if this graph is planar. Suppose for sake of contradiction that the
graph is planar. Note that V = 6 and E = 9, and the graph is clearly bipartite. Because the graph is
bipartite, we can color the top three vertices red and the bottom three vertices green, so every edge has one
vertex of each color. Therefore, every face must be bordered by an even number of edges, because any two
adjacent vertices on the face must have different colors.
Let N be the total number of pairs of the form (face, edge), where an edge borders a face. Then since
each face borders at least four edges, we find N ≥ 4F . On the other hand, since each edge borders exactly two
faces, we know N = 2E. Hence 2E ≥ 4F , or F ≤ E/2. By Euler’s Formula, we know that 2 = V − E + F ,
hence

2 = V − E + F ≤ V − E + E/2.
It follows that E ≤ 2V − 4 for any bipartite planar graph. However, in the “utilities graph,” we know that
E = 9 and 2V − 4 = 8, so we have a contradiction. Hence the utilities graph cannot be planar. 
Recall that Euler’s Theorem was stated for connected graphs. A natural question that we can ask is
how the formula is affected if the graph becomes disconnected. In fact, the formula is altered in an intuitive
way:

Corollary 15.2.2. If a planar graph has C connected components, then V − E + F − C = 1.

Proof. We can induct on the number of connected components. Euler’s Theorem proves the result for
C = 1. Suppose that it holds true for all graphs with C = k connected components. Starting with a graph
with V vertices, E edges, F faces, and k connected components, suppose that we add a new connected
component that by itself contains V1 vertices, E1 edges, and F1 faces. Then by Euler’s Formula, we know
V1 − E1 + F1 = 2. When we add this graph to the side of our graph with k components, then clearly
Vnew = V + V1 and Enew = E + E1 . Also, note that the infinite face of the new component is the same as the

infinite face of the original graph, so instead of adding F1 to the number of faces, we find Fnew = F +(F1 −1).
Therefore,

Vnew − Enew + Fnew − Cnew = (V + V1 ) − (E + E1 ) + (F + (F1 − 1)) − (k + 1)


= (V − E + F − k) + (V1 − E1 + F1 − 2)
= 1 + 0,

where in the last step, we used the inductive hypothesis and Euler’s Theorem for the new component.
Therefore, the formula must be true for k + 1 connected components. So by induction, the statement must
be true for all planar graphs.
We end the section with a profound application of Euler’s Theorem in a setting that is not a problem in
graph theory a priori.
Example 15.2.3. (Utah Math Olympiad; 2020; Variation) In a 3 × 3 square grid, each square is shaded
either black or white, each with probability 1/2. In the resulting figure, a region is a set of shaded squares
that are vertically or horizontally (not diagonally) adjacent. For example, the following grid has two regions,
one containing 3 squares and the other containing 1 square:

Find the expected value of the number of regions.


Solution: Given a coloring of the grid, we can define the following graph: the vertices are filled in
squares, and the edges are pairs of squares that are horizontally or vertically adjacent. For example, if there
are four squares colored in a 2 × 2 box, this will be a graph with four vertices and four edges (a 4-cycle).
The graph is planar: if we draw each vertex at the center of its square and edges between the centers of
adjacent squares, it is clear that the drawing of the graph is a planar drawing. Therefore,

C = V − E + F − 1. (1)

From this we can calculate the expected number of regions by linearity of expectation:

E[C] = E[V ] − E[E] + E[F ] − 1.

Now, the first two of these are easy to calculate: E[V ] = 9/2 by linearity of expectation, since there are 9
vertices and each is filled with probability 1/2; and E[E] = 12 · (1/4) = 3, since there are 12 possible edges—2 in each row and 2 in each column—and
each edge has a probability of 1/4 of existing in the graph.
For the final term E[F ], we need to think about what a face is. A bounded face occurs, for starters, if there are four filled
in squares in a 2 × 2 box. For a larger face to exist, it requires an open space inside a cycle of squares; this
can only happen if the eight border squares of the 3 × 3 grid are filled and the interior (center square) is unfilled.
(Anything else would break up into smaller faces coming from 2 × 2 boxes.) So the expected number of faces is the total expected
number of 2 × 2 boxes plus the expected number of 3 × 3 faces. The expected number of 2 × 2 boxes is
4 · (1/16) = 1/4,

because there are four possible such boxes and each occurs with probability 1/16; and the expected number of
3 × 3 faces (with an open square in the middle) is just

(1/2)^9 = 1/512,

because there is only one possible such face, and only one possible coloring that achieves it. Finally, we will
always have the infinite face, so E[F ] = 1/4 + 1/512 + 1.
Therefore, our answer is

E[V ] − E[E] + E[F ] − 1 = 9/2 − 3 + 1/4 + 1/512 = 7/4 + 1/512 = 897/512.
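Since there are only 2^9 = 512 equally likely colorings, this answer can be confirmed by direct enumeration, without Euler's Theorem (an illustrative addition, not part of the original solution):

```python
from fractions import Fraction

def regions(filled):
    """Count orthogonally connected regions among the filled cells."""
    seen, count = set(), 0
    for cell in filled:
        if cell in seen:
            continue
        count += 1
        stack = [cell]
        while stack:
            r, c = stack.pop()
            if (r, c) in seen:
                continue
            seen.add((r, c))
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in filled:
                    stack.append(nb)
    return count

cells = [(r, c) for r in range(3) for c in range(3)]
total = 0
for mask in range(1 << 9):
    filled = {cells[i] for i in range(9) if mask >> i & 1}
    total += regions(filled)
print(Fraction(total, 512))  # 897/512
```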
